# Data mining

# Lesson 4

# Analyzing Association Rules Using Apriori and FP-Growth

### **Objectives:**
- Learn to implement and compare Apriori and FP-Growth algorithms.
- Understand and calculate key metrics for association rules (support, confidence, and lift).
- Analyze the efficiency and output differences between the two algorithms.

### **Description**

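Association rule mining is a fundamental technique for discovering interesting relationships or patterns in transactional datasets. In this lab, we will work with a simulated dataset of transactions and use the Apriori and FP-Growth algorithms to mine frequent itemsets and extract association rules. These rules will be evaluated using three metrics: support (how often the itemset occurs), confidence (how often the consequent occurs when the antecedent does), and lift (how much more often the two occur together than expected if they were independent).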
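
To make the three metrics concrete, here is a minimal sketch that computes them by hand for one rule. The toy transactions and the helper names `support`, `confidence`, and `lift` are illustrative assumptions and are not part of the lab dataset or of mlxtend.

```python
# Toy transactions (hypothetical data, not the lab's data.csv)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """confidence(antecedent -> consequent) / support(consequent)."""
    return confidence(antecedent, consequent) / support(consequent)


# Metrics for the rule {bread} -> {milk}
rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
rule_lift = lift({"bread"}, {"milk"})

print(f"support={rule_support:.2f}, "
      f"confidence={rule_confidence:.2f}, lift={rule_lift:.2f}")
```

A lift below 1 means the two items appear together less often than independence would predict, a lift of 1 means no association, and a lift above 1 suggests a positive association.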

### What we will learn:
- Preprocessing transactional data.
- Generating frequent itemsets using Apriori and FP-Growth.
- Extracting and evaluating association rules.
- Visualizing and interpreting the results.

### Libraries that we use:

- [Pandas](https://pandas.pydata.org/) - a library for working with tabular data, which will help us in the data preparation phase.
- [Matplotlib](https://matplotlib.org/) and [Seaborn](https://seaborn.pydata.org/) - for data visualization and identifying interesting patterns.
- [Scikit-learn](https://scikit-learn.org/stable/) - machine learning library for building and evaluating models.
- [NumPy](https://numpy.org/) - support for large, multi-dimensional arrays and matrices, along with high-level mathematical functions that operate on them.
- [mlxtend](https://rasbt.github.io/mlxtend/) - for Apriori and FP-Growth algorithms.


#### Structure: Rows represent transactions, and columns represent items.

- 1 = item purchased.
- 0 = item not purchased.
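
If the raw data arrives as one list of items per transaction instead of this table, it can be converted into the same 0/1 layout. The sketch below uses plain Python and made-up item names, and it assumes nothing about the contents of `data.csv`.

```python
# Hypothetical raw transactions: one list of purchased items per row
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk"],
]

# One column per distinct item, in a stable (alphabetical) order
items = sorted({item for t in transactions for item in t})

# One row per transaction: 1 if the item was purchased, otherwise 0
rows = [{item: int(item in t) for item in items} for t in transactions]

for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` yields a table with the structure described above.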

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the transaction data (read_csv already returns a DataFrame)
basket = pd.read_csv('data.csv')

# Preview the first rows
print(basket.head())


## **Exercise 1:** Preprocess the Data
- Analyze the dataset to understand:
 1) Total number of transactions.
 2) Frequency of each item in the dataset.
- Convert the dataset into a binary format (if not already binary).

In [None]:
# Analyze the data
print("\nNumber of transactions and items:")
print(f"Transactions: {basket.shape[0]}, Items: {basket.shape[1]}")

# Convert to binary format: any positive value counts as "purchased".
# This assumes every column is numeric; mlxtend works best with boolean columns.
basket = basket > 0

# Count item frequencies (number of transactions containing each item)
item_frequencies = basket.sum().sort_values(ascending=False)
print("\nItem Frequencies:")
print(item_frequencies)

# Optional: filter out items with very low frequency (purchased in fewer than 10 transactions)
min_item_count = 10
basket = basket.loc[:, basket.sum() >= min_item_count]
print("\nFiltered dataset shape (after removing rare items):")
print(basket.shape)


## **Exercise 2:** Generate Frequent Itemsets Using Apriori
- Use the Apriori algorithm to identify frequent itemsets.
- Set a minimum support threshold to filter out infrequent itemsets.
- Display the top-10 frequent itemsets based on support.

In [None]:
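from mlxtend.frequent_patterns import apriori
import time

# Apply the Apriori algorithm to find frequent itemsets
min_support = 0.05  # Minimum support threshold (fraction of transactions)

# Measure the runtime so it can be compared with FP-Growth later
apriori_start = time.perf_counter()
frequent_itemsets_apriori = apriori(basket, min_support=min_support, use_colnames=True)
apriori_runtime = time.perf_counter() - apriori_start

# Sort and display the top-10 itemsets by support
print("\nTop-10 Frequent Itemsets (Apriori):")
print(frequent_itemsets_apriori.sort_values(by='support', ascending=False).head(10))

# Print runtime
print(f"\nApriori Runtime: {apriori_runtime:.2f} seconds")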
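
To show what happens inside the library call, here is a minimal pure-Python sketch of the level-wise Apriori idea: count single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates and prune any candidate that has an infrequent subset. The function name `apriori_sketch` and the toy data are assumptions made for illustration; this is not mlxtend's implementation and it is not optimized.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate by scanning all transactions
        result = {}
        for candidate in candidates:
            s = sum(candidate <= t for t in transactions) / n
            if s >= min_support:
                result[candidate] = s
        return result

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    level = keep_frequent({frozenset([item]) for item in items})
    all_frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Hypothetical toy data
toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
result = apriori_sketch(toy, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(s, 2))
```

The prune step is what keeps the candidate set small: an itemset can only be frequent if all of its subsets are frequent.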


## **Exercise 3:** Extract Association Rules (Apriori)
- Generate association rules using the frequent itemsets obtained from Apriori.
- Calculate key metrics for each rule:
1) Support
2) Confidence
3) Lift
- Display the top-10 rules sorted by lift.

In [None]:
from mlxtend.frequent_patterns import association_rules

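# Generate association rules (keep rules with lift >= 1.0)
# num_itemsets should be the number of transactions in the original dataset
rules_apriori = association_rules(frequent_itemsets_apriori, metric="lift", min_threshold=1.0, num_itemsets=len(basket))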

# Sort and display top-10 rules by lift
print("\nTop-10 Association Rules (Apriori):")
print(rules_apriori.sort_values(by='lift', ascending=False).head(10))
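
The rule-generation step can also be sketched by hand: every frequent itemset with at least two items is split into an antecedent and a consequent, and the metrics follow from the stored supports. The support values and the function name `generate_rules` below are hypothetical and chosen only for illustration; the sketch is not how mlxtend implements `association_rules`.

```python
from itertools import combinations

# Supports of frequent itemsets (hypothetical values for illustration)
supports = {
    frozenset({"bread"}): 0.8,
    frozenset({"milk"}): 0.8,
    frozenset({"butter"}): 0.4,
    frozenset({"bread", "milk"}): 0.6,
    frozenset({"bread", "butter"}): 0.4,
}


def generate_rules(supports, min_lift=1.0):
    """Enumerate antecedent -> consequent splits and keep rules with lift >= min_lift."""
    rules = []
    for itemset, itemset_support in supports.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so both lookups succeed
                confidence = itemset_support / supports[antecedent]
                lift = confidence / supports[consequent]
                if lift >= min_lift:
                    rules.append({
                        "antecedent": antecedent,
                        "consequent": consequent,
                        "support": itemset_support,
                        "confidence": confidence,
                        "lift": lift,
                    })
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)


rules = generate_rules(supports, min_lift=1.0)
for rule in rules:
    print(sorted(rule["antecedent"]), "->", sorted(rule["consequent"]),
          f"confidence={rule['confidence']:.2f}", f"lift={rule['lift']:.2f}")
```

With these values, the two rules between bread and milk are dropped because their lift is below 1, while both directions of the bread and butter rule are kept.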


## **Exercise 4:** Generate Frequent Itemsets Using FP-Growth
- Apply the FP-Growth algorithm to find frequent itemsets.
- Compare the results with Apriori in terms of runtime and frequent itemsets generated.

In [None]:
from mlxtend.frequent_patterns import fpgrowth
import time

# Measure runtime for FP-Growth (perf_counter is a high-resolution timer meant for intervals)
start_time = time.perf_counter()
frequent_itemsets_fpgrowth = fpgrowth(basket, min_support=min_support, use_colnames=True)
end_time = time.perf_counter()

# Display top-10 frequent itemsets
print("\nTop-10 Frequent Itemsets (FP-Growth):")
print(frequent_itemsets_fpgrowth.sort_values(by='support', ascending=False).head(10))

# Print runtime
print(f"\nFP-Growth Runtime: {end_time - start_time:.2f} seconds")
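
FP-Growth avoids candidate generation by first compressing the transactions into a prefix tree (the FP-tree), in which transactions that share frequent items share a path. The sketch below covers only that construction phase, under simplifying assumptions: the class `FPNode`, the helpers `build_fp_tree` and `count_nodes`, and the toy data are hypothetical, and the header table and the recursive mining step are omitted.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Pass 1: count items and keep only the frequent ones
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Pass 2: insert each transaction, most frequent items first,
    # so that common prefixes are shared between transactions
    for t in transactions:
        ordered = sorted((item for item in set(t) if item in frequent),
                         key=lambda item: (-frequent[item], item))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy data
toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
root, frequent = build_fp_tree(toy, min_count=2)
n_nodes = count_nodes(root)
n_occurrences = sum(len(t) for t in toy)
print(f"{n_occurrences} item occurrences stored in {n_nodes} tree nodes")
```

The more the transactions overlap, the smaller the tree is relative to the raw data, which is one reason FP-Growth often runs faster than Apriori on dense datasets.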


## **Exercise 5:** Extract Association Rules (FP-Growth)
- Generate association rules using frequent itemsets from FP-Growth.
- Compare the rules with those obtained from Apriori.
- Compare the runtime of Apriori and FP-Growth.
- Compare the number of frequent itemsets and association rules generated.

In [None]:
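# Generate association rules using FP-Growth (keep rules with lift >= 1.0)
# num_itemsets should be the number of transactions in the original dataset
rules_fpgrowth = association_rules(frequent_itemsets_fpgrowth, metric="lift", min_threshold=1.0, num_itemsets=len(basket))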

# Display top-10 rules sorted by lift
print("\nTop-10 Association Rules (FP-Growth):")
print(rules_fpgrowth.sort_values(by='lift', ascending=False).head(10))

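# Compare the number of frequent itemsets and rules
print("\nComparison of Apriori and FP-Growth:")
print(f"Number of Apriori Frequent Itemsets: {len(frequent_itemsets_apriori)}")
print(f"Number of FP-Growth Frequent Itemsets: {len(frequent_itemsets_fpgrowth)}")
print(f"Number of Apriori Rules: {len(rules_apriori)}")
print(f"Number of FP-Growth Rules: {len(rules_fpgrowth)}")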
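
Exercise 5 also asks for a runtime comparison. One way to measure both algorithms in the same manner is a small timing helper. The function `timed` and its `repeats` parameter are hypothetical names introduced here, and the built-in `sum` stands in for the mining call so that the sketch runs on its own.

```python
import time


def timed(func, *args, repeats=3, **kwargs):
    """Run func(*args, **kwargs) `repeats` times.

    Returns the last result and the best (smallest) elapsed time in seconds.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


# Stand-in workload; in the notebook this would be apriori or fpgrowth
total, seconds = timed(sum, range(100_000))
print(f"Result: {total}, best of 3 runs: {seconds:.6f} seconds")
```

In the notebook, the same helper could be called as `timed(apriori, basket, min_support=min_support, use_colnames=True)` and `timed(fpgrowth, basket, min_support=min_support, use_colnames=True)`. Both algorithms are exact, so for the same `min_support` they should return the same frequent itemsets and differ only in runtime.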



## **Exercise 6:** Visualize Association Rules
- Visualize rules based on support, confidence, and lift.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Visualize Apriori rules
plt.figure(figsize=(10, 6))
sns.scatterplot(x='support', y='confidence', size='lift', hue='lift', data=rules_apriori, alpha=0.7)
plt.title('Association Rules (Apriori)')
plt.xlabel('Support')
plt.ylabel('Confidence')
plt.legend(title='Lift')
plt.show()

# Visualize FP-Growth rules
plt.figure(figsize=(10, 6))
sns.scatterplot(x='support', y='confidence', size='lift', hue='lift', data=rules_fpgrowth, alpha=0.7)
plt.title('Association Rules (FP-Growth)')
plt.xlabel('Support')
plt.ylabel('Confidence')
plt.legend(title='Lift')
plt.show()


## Conclusion:

In this lab, we covered:

- Preprocessing transactional data.
- Generating frequent itemsets using Apriori and FP-Growth.
- Extracting and evaluating association rules.
- Visualizing and interpreting the results.

This lab provides students with a practical understanding of Apriori and FP-Growth algorithms for association rule mining, using synthetic transactional data.


