# Data mining

# Lesson 4

# Analyzing Association Rules Using Apriori and FP-Growth

### **Objectives:**
- Learn to implement and compare Apriori and FP-Growth algorithms.
- Understand and calculate key metrics for association rules (support, confidence, and lift).
- Analyze the efficiency and output differences between the two algorithms.

### **Description**

Association rule mining is a fundamental technique for discovering interesting relationships or patterns in transactional datasets. In this lab, we will work with a simulated (synthetic) dataset of transactions and use the Apriori and FP-Growth algorithms to mine frequent itemsets and extract association rules. These rules will be evaluated using three metrics: support, confidence, and lift.
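For reference, here are the standard definitions of these metrics. Let $T$ be a set of $N$ transactions, and let $X$ (antecedent) and $Y$ (consequent) be disjoint itemsets. Then $\sigma(X)$ is the number of transactions that contain every item of $X$:

$$
\begin{aligned}
\operatorname{support}(X) &= \frac{\sigma(X)}{N}, \qquad \sigma(X) = \lvert \{\, t \in T : X \subseteq t \,\} \rvert \\
\operatorname{confidence}(X \Rightarrow Y) &= \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{confidence}(X \Rightarrow Y)}{\operatorname{support}(Y)} = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)\,\operatorname{support}(Y)}
\end{aligned}
$$

The support of a rule $X \Rightarrow Y$ is the support of $X \cup Y$. A lift above 1 means $X$ and $Y$ appear together more often than they would if they were independent. A lift below 1 means they appear together less often.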

### What we will learn:
- Preprocessing transactional data.
- Generating frequent itemsets using Apriori and FP-Growth.
- Extracting and evaluating association rules.
- Visualizing and interpreting the results.

### Libraries that we use:

- [Pandas](https://pandas.pydata.org/) - a library for working with tabular data, which will help us in the data preparation phase.
- [Matplotlib](https://matplotlib.org/) and [Seaborn](https://seaborn.pydata.org/) - for data visualization and identifying interesting patterns.
- [Scikit-learn](https://scikit-learn.org/stable/) - machine learning library for building and evaluating models.
- [NumPy](https://numpy.org/) - a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on them.
- [mlxtend](https://rasbt.github.io/mlxtend/) - for Apriori and FP-Growth algorithms.


#### Dataset structure

Rows represent transactions, and columns represent items:

- 1 = item purchased.
- 0 = item not purchased.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the transaction data (rows = transactions, columns = items)
basket = pd.read_csv('data.csv')

# Preview the first five rows
print(basket.head())
```


## **Exercise 1:** Preprocess the Data
- Analyze the dataset to understand:
  1. The total number of transactions.
  2. The frequency of each item in the dataset.
- Convert the dataset into a binary format (if it is not already binary).
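The raw data may arrive as lists of purchased items rather than as a 0/1 table. In that case, both the conversion and the two counts asked for above can be sketched with pandas alone. The toy `transactions` list, the helper name `to_binary`, and the `min_count` threshold are hypothetical and for illustration only. If `data.csv` is already binary, only the counting and filtering lines apply.

```python
import pandas as pd

def to_binary(transactions):
    """Convert a list of item lists into a 0/1 DataFrame (rows = transactions, columns = items)."""
    rows = [{item: 1 for item in t} for t in transactions]
    # Items absent from a transaction become NaN, which we turn into 0
    return pd.DataFrame(rows).fillna(0).astype(int)

# Hypothetical toy data, not the lab's data.csv
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "eggs"],
    ["bread", "butter"],
]
basket = to_binary(transactions)

# 1) Total number of transactions (rows) and items (columns)
n_transactions, n_items = basket.shape
print("Transactions:", n_transactions, "Items:", n_items)

# 2) Column sums of a 0/1 table = number of transactions containing each item
item_counts = basket.sum().sort_values(ascending=False)
print(item_counts)

# Optional: keep only items bought in at least `min_count` transactions (assumed threshold)
min_count = 2
filtered = basket.loc[:, item_counts[item_counts >= min_count].index]
print("Filtered shape:", filtered.shape)
```

Recent mlxtend versions may warn when given non-boolean columns, so `basket.astype(bool)` is a reasonable last step before calling `apriori` or `fpgrowth`.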

```python
# Analyze the data
print("\nNumber of transactions and items:")


# Count item frequencies
print("\nItem Frequencies:")


# Optional: Filter out items with very low frequency (e.g., purchased in fewer than 10 transactions)
print("\nFiltered dataset shape (after removing rare items):")
```

## **Exercise 2:** Generate Frequent Itemsets Using Apriori
- Use the Apriori algorithm to identify frequent itemsets.
- Set a minimum support threshold to filter out infrequent itemsets.
- Display the top-10 frequent itemsets based on support.
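To see what the library does under the hood, here is a minimal from-scratch sketch of Apriori's level-wise search, with join, prune, and count steps at each level. It is a teaching aid that assumes transactions are given as Python sets of item names. It is not mlxtend's implementation and would be much slower on real data. The function name `apriori_sketch` and the toy data are hypothetical.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset(itemset): support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    singles = {frozenset([i]) for t in transactions for i in t}
    scored = {s: support(s) for s in singles}
    current = {s for s, v in scored.items() if v >= min_support}
    frequent = {s: scored[s] for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset (Apriori property)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: compute support for each surviving candidate
        scored = {c: support(c) for c in candidates}
        current = {c for c, v in scored.items() if v >= min_support}
        frequent.update({c: scored[c] for c in current})
        k += 1
    return frequent

# Hypothetical toy transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "eggs"},
    {"bread", "butter"},
]
result = apriori_sketch(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

With `min_support=0.5`, `{milk, butter}` falls below the threshold. The prune step then discards `{bread, butter, milk}` without counting it, which is the saving that the Apriori property provides.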

```python
from mlxtend.frequent_patterns import apriori

# Apply the Apriori algorithm to find frequent itemsets


# Sort and display the top-10 itemsets by support
```


## **Exercise 3:** Extract Association Rules (Apriori)
- Generate association rules using the frequent itemsets obtained from Apriori.
- Calculate key metrics for each rule:
  1. Support
  2. Confidence
  3. Lift
- Display the top-10 rules sorted by lift.
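The three metrics can also be computed by hand for a single rule, which is a useful check on the library's output. This sketch assumes a boolean one-hot DataFrame like the lab's `basket`. The toy values and the helper names `support` and `rule_metrics` are hypothetical.

```python
import pandas as pd

# Hypothetical toy one-hot table: rows = transactions, columns = items (True = purchased)
basket = pd.DataFrame(
    {
        "bread":  [1, 1, 0, 1, 1],
        "milk":   [1, 1, 0, 0, 1],
        "butter": [0, 1, 1, 1, 0],
    }
).astype(bool)

def support(df, items):
    """Fraction of transactions that contain every item in `items`."""
    return df[list(items)].all(axis=1).mean()

def rule_metrics(df, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_xy = support(df, antecedent + consequent)
    s_x = support(df, antecedent)
    s_y = support(df, consequent)
    confidence = s_xy / s_x
    lift = confidence / s_y
    return {"support": s_xy, "confidence": confidence, "lift": lift}

print(rule_metrics(basket, ["milk"], ["bread"]))
print(rule_metrics(basket, ["milk"], ["butter"]))
```

Here `milk -> bread` has confidence 1.0, because every basket with milk also has bread, and lift 1.25. In contrast, `milk -> butter` has a lift of about 0.56: the two items co-occur less often than independence would predict.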

```python
from mlxtend.frequent_patterns import association_rules

# Generate association rules

# Sort and display top-10 rules by lift
```


## **Exercise 4:** Generate Frequent Itemsets Using FP-Growth
- Apply the FP-Growth algorithm to find frequent itemsets.
- Compare the results with those of Apriori in terms of runtime and the frequent itemsets generated.
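FP-Growth avoids Apriori's candidate generation. Instead, it compresses the transactions into a prefix tree (an FP-tree) and mines that tree recursively through conditional pattern bases. The sketch below is a compact from-scratch illustration of the idea, assuming transactions are lists of item names. The names `FPNode`, `build_fp_tree`, `_mine`, and `fp_growth_sketch` are hypothetical, and this is not mlxtend's `fpgrowth` implementation.

```python
import time
from collections import defaultdict

class FPNode:
    """FP-tree node: an item, how many transactions pass through it, parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs; return frequent item counts and header lists."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node that holds that item
    for items, weight in weighted_transactions:
        # Most frequent items first, so transactions share prefixes (ties broken by name)
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return frequent, header

def _mine(weighted_transactions, min_count, suffix, out):
    frequent, header = build_fp_tree(weighted_transactions, min_count)
    for item, count in frequent.items():
        itemset = suffix | {item}
        out[itemset] = count
        # Conditional pattern base: the prefix path above each node holding `item`
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:
            _mine(base, min_count, itemset, out)

def fp_growth_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    n = len(transactions)
    out = {}
    _mine([(t, 1) for t in transactions], min_support * n, frozenset(), out)
    return {itemset: c / n for itemset, c in out.items()}

# Hypothetical toy transactions, timed with time.perf_counter()
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "eggs"],
    ["bread", "butter"],
]
start = time.perf_counter()
result = fp_growth_sketch(transactions, min_support=0.5)
elapsed = time.perf_counter() - start
for itemset, s in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
print(f"FP-Growth sketch runtime: {elapsed:.6f} s")
```

Both Apriori and FP-Growth find every itemset above the threshold, so at the same `min_support` they should agree on the itemsets. The runtime is what differs.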

```python
from mlxtend.frequent_patterns import fpgrowth
import time

# Measure runtime for FP-Growth

# Display top-10 frequent itemsets

# Print runtime
```



## **Exercise 5:** Extract Association Rules (FP-Growth)
- Generate association rules using frequent itemsets from FP-Growth.
- Compare the rules with those obtained from Apriori.
- Compare the runtime of Apriori and FP-Growth.
- Compare the number of frequent itemsets and association rules generated.
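Both algorithms are exact, so at the same `min_support` they should yield the same frequent itemsets and therefore the same rules. A difference usually points to mismatched thresholds or inputs. Below is a minimal sketch for checking this and for timing each call. It assumes rule tables with the `antecedents` and `consequents` frozenset columns that mlxtend's `association_rules` returns. The helpers `timed`, `rule_set`, and `compare_rules` and the toy tables are hypothetical.

```python
import time
import pandas as pd

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def rule_set(rules):
    """Set of (antecedents, consequents) pairs; frozensets make the order of items irrelevant."""
    return set(zip(rules["antecedents"], rules["consequents"]))

def compare_rules(rules_a, rules_b):
    """Count rules found by both tables, only by A, and only by B."""
    a, b = rule_set(rules_a), rule_set(rules_b)
    return {"both": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}

# Hypothetical rule tables: the same two rules listed in a different order
rules_apriori = pd.DataFrame({
    "antecedents": [frozenset({"milk"}), frozenset({"bread"})],
    "consequents": [frozenset({"bread"}), frozenset({"milk"})],
})
rules_fp = pd.DataFrame({
    "antecedents": [frozenset({"bread"}), frozenset({"milk"})],
    "consequents": [frozenset({"milk"}), frozenset({"bread"})],
})
print(compare_rules(rules_apriori, rules_fp))

# In the notebook (threshold assumed), e.g.:
# frequent_ap, t_ap = timed(apriori, basket, min_support=0.05, use_colnames=True)
# frequent_fp, t_fp = timed(fpgrowth, basket, min_support=0.05, use_colnames=True)
```

Comparing `len()` of the two itemset tables and the two rule tables covers the last bullet of this exercise.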

```python
# Generate association rules using FP-Growth

# Display top-10 rules sorted by lift

# Runtime comparison
```




## **Exercise 6:** Visualize Association Rules
- Visualize rules based on support, confidence, and lift.
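A common way to show all three metrics at once is a scatter plot of support against confidence, with lift encoded as color. This sketch assumes a rules DataFrame with `support`, `confidence`, and `lift` columns, which mlxtend's `association_rules` output includes. The function `plot_rules` and the toy values are hypothetical. Calling it once for the Apriori rules and once for the FP-Growth rules gives one plot per algorithm.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_rules(rules, title):
    """Scatter support vs. confidence, coloring each rule by its lift."""
    fig, ax = plt.subplots(figsize=(6, 4))
    points = ax.scatter(rules["support"], rules["confidence"],
                        c=rules["lift"], cmap="viridis", s=60)
    fig.colorbar(points, ax=ax, label="lift")
    ax.set_xlabel("support")
    ax.set_ylabel("confidence")
    ax.set_title(title)
    return fig, ax

# Hypothetical rules table with the metric columns used above
rules = pd.DataFrame({
    "support": [0.6, 0.2, 0.4],
    "confidence": [1.0, 0.33, 0.67],
    "lift": [1.25, 0.56, 1.11],
})
fig, ax = plot_rules(rules, "Apriori rules")
# In the notebook, plt.show() displays the figure
```

Rules in the upper right with a bright color (high support, high confidence, high lift) are usually the most interesting candidates.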

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Visualize Apriori rules

# Visualize FP-Growth rules
```



## Conclusion

We learned: 

- Preprocessing transactional data.
- Generating frequent itemsets using Apriori and FP-Growth.
- Extracting and evaluating association rules.
- Visualizing and interpreting the results.

This lab gives students a practical understanding of the Apriori and FP-Growth algorithms for association rule mining on synthetic transactional data.


