# Apriori Case Study
Let's explore a market basket analysis dataset using the Apriori algorithm and association rule mining.

## Setup
Make sure the mlxtend library is installed; it provides an efficient implementation of the Apriori algorithm. You can install it with pip:

In [None]:
!pip install mlxtend



## Load the Dataset
For this tutorial, we'll use the "Market Basket Analysis Data" dataset. Upload the `.csv` file to your Google Colab session, and set `index_col=0` when calling `pd.read_csv()` so that the first column of the file is used as the row index.

In [None]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Load the uploaded dataset
df = pd.read_csv('/content/basket_analysis.csv', index_col=0)
print(df.info())
df[:10]

<class 'pandas.core.frame.DataFrame'>
Int64Index: 999 entries, 0 to 998
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Apple         999 non-null    bool 
 1   Bread         999 non-null    bool 
 2   Butter        999 non-null    bool 
 3   Cheese        999 non-null    bool 
 4   Corn          999 non-null    bool 
 5   Dill          999 non-null    bool 
 6   Eggs          999 non-null    bool 
 7   Ice cream     999 non-null    bool 
 8   Kidney Beans  999 non-null    bool 
 9   Milk          999 non-null    bool 
 10  Nutmeg        999 non-null    bool 
 11  Onion         999 non-null    bool 
 12  Sugar         999 non-null    bool 
 13  Unicorn       999 non-null    bool 
 14  Yogurt        999 non-null    bool 
 15  chocolate     999 non-null    bool 
dtypes: bool(16)
memory usage: 23.4 KB
None



Unnamed: 0,Apple,Bread,Butter,Cheese,Corn,Dill,Eggs,Ice cream,Kidney Beans,Milk,Nutmeg,Onion,Sugar,Unicorn,Yogurt,chocolate
0,False,True,False,False,True,True,False,True,False,False,False,False,True,False,True,True
1,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
2,True,False,True,False,False,True,False,True,False,True,False,False,False,False,True,True
3,False,False,True,True,False,True,False,False,False,True,True,True,False,False,False,False
4,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False
5,True,True,True,True,False,True,False,True,False,False,True,False,False,True,True,True
6,False,False,True,False,False,False,True,True,True,True,True,True,False,False,True,False
7,True,False,False,True,False,False,True,False,False,False,True,False,True,False,True,False
8,True,False,False,False,True,True,True,True,False,True,True,True,True,True,True,True
9,True,False,False,False,False,True,True,True,False,True,False,True,True,True,False,True
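The table above is in one-hot form: one row per transaction and one boolean column per item. If your own data arrives as lists of purchased items instead, it has to be converted to this shape first. The following is a minimal plain-Python sketch of that conversion; the function name `one_hot_encode` and the three toy baskets are hypothetical and are not part of the dataset. mlxtend's `TransactionEncoder` is designed for the same job.

```python
def one_hot_encode(transactions):
    """Turn lists of items into one boolean row per transaction."""
    # Column order: every distinct item, sorted alphabetically
    items = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)
        rows.append({item: (item in present) for item in items})
    return items, rows


# Hypothetical toy baskets, for illustration only
toy_transactions = [
    ["Bread", "Milk"],
    ["Apple"],
    ["Apple", "Bread"],
]

items, rows = one_hot_encode(toy_transactions)
print(items)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` would give a boolean DataFrame with the same layout as `df`.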


## Perform Frequent Itemset Mining using Apriori
Now, we'll use the Apriori algorithm to mine frequent itemsets from the one-hot encoded Market Basket Analysis dataset loaded above.

In [None]:
from mlxtend.frequent_patterns import apriori

# Define the minimum support threshold (0.2 means an itemset must appear in at least 20% of transactions)
min_support = 0.2

# Perform frequent itemset mining using Apriori
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)

print("Frequent Itemsets:")
print(frequent_itemsets)

Frequent Itemsets:
     support                itemsets
0   0.383383                 (Apple)
1   0.384384                 (Bread)
2   0.420420                (Butter)
3   0.404404                (Cheese)
4   0.407407                  (Corn)
5   0.398398                  (Dill)
6   0.384384                  (Eggs)
7   0.410410             (Ice cream)
8   0.408408          (Kidney Beans)
9   0.405405                  (Milk)
10  0.401401                (Nutmeg)
11  0.403403                 (Onion)
12  0.409409                 (Sugar)
13  0.389389               (Unicorn)
14  0.420420                (Yogurt)
15  0.421421             (chocolate)
16  0.207207     (Butter, Ice cream)
17  0.202202  (Butter, Kidney Beans)
18  0.202202     (Butter, chocolate)
19  0.200200  (Cheese, Kidney Beans)
20  0.202202  (chocolate, Ice cream)
21  0.211211       (Milk, chocolate)



## Generate Association Rules
Next, we'll use the frequent itemsets to generate association rules and calculate various association metrics such as confidence, lift, and support.

In [None]:
from mlxtend.frequent_patterns import association_rules

# Generate association rules with minimum confidence threshold (e.g., 0.5)
min_confidence = 0.5
association_rules_df = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

print("\nAssociation Rules:")
print(association_rules_df)


Association Rules:
   antecedents  consequents  antecedent support  consequent support   support  \
0  (Ice cream)     (Butter)            0.410410            0.420420  0.207207   
1       (Milk)  (chocolate)            0.405405            0.421421  0.211211   
2  (chocolate)       (Milk)            0.421421            0.405405  0.211211   

   confidence      lift  leverage  conviction  zhangs_metric  
0    0.504878  1.200889  0.034662    1.170579       0.283728  
1    0.520988  1.236263  0.040365    1.207857       0.321413  
2    0.501188  1.236263  0.040365    1.192021       0.330310  



## Interpret the Results
The outputs above are DataFrames: `frequent_itemsets` lists each frequent itemset with its support, and `association_rules_df` lists each rule with its support, confidence, lift, and other metrics.

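You can interpret the results to identify associations between items in transactions. Confidence estimates how often the consequent appears in transactions that contain the antecedent, and a lift above 1 means the two sides co-occur more often than they would if they were independent. For example, the rule Milk -> chocolate has a confidence of about 0.52 and a lift of about 1.24: roughly half of the transactions containing Milk also contain chocolate, which is somewhat more than chocolate's overall support of about 0.42 would suggest. All three rules here only just clear the 0.5 confidence threshold, so they indicate modest associations rather than strong ones.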
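When comparing rules, it often helps to rank them. The sketch below copies the three rules from the printed output into plain Python dictionaries and sorts them by lift, using confidence to break ties. The dictionary layout is an assumption made for this example; on the DataFrame itself, `sort_values` would do the same job.

```python
# Values copied from the printed association rules
rules = [
    {"antecedent": "Ice cream", "consequent": "Butter",
     "confidence": 0.504878, "lift": 1.200889},
    {"antecedent": "Milk", "consequent": "chocolate",
     "confidence": 0.520988, "lift": 1.236263},
    {"antecedent": "chocolate", "consequent": "Milk",
     "confidence": 0.501188, "lift": 1.236263},
]

# Highest lift first; ties are broken by higher confidence
ranked = sorted(rules, key=lambda r: (r["lift"], r["confidence"]), reverse=True)

for r in ranked:
    print(
        f'{r["antecedent"]} -> {r["consequent"]}: '
        f'lift={r["lift"]:.3f}, confidence={r["confidence"]:.3f}'
    )
```

Note that Milk -> chocolate and chocolate -> Milk share the same lift, because lift is symmetric in its two sides, while their confidences differ because each is divided by a different antecedent support.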

In this case study, we demonstrated how to perform association rule mining using the Apriori algorithm with the "Market Basket Analysis Data" dataset. The Apriori algorithm is a well-established technique for finding frequent itemsets, from which association rules can be generated, and it is widely used for market basket analysis and recommendation systems.

Feel free to experiment with different datasets and to adjust the support and confidence thresholds: lowering them yields more itemsets and rules, while raising them keeps only the most frequent itemsets and the most reliable rules for your specific use case.
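As a quick illustration of how sensitive the result is to the support threshold, the sketch below copies the 22 supports printed earlier and counts how many itemsets would survive at stricter thresholds. The helper name `count_surviving` is hypothetical. Lower thresholds cannot be explored this way, since itemsets below 0.2 were never mined; for those, `apriori` has to be rerun.

```python
# Supports copied from the printed frequent itemsets (min_support = 0.2)
supports = {
    ("Apple",): 0.383383,
    ("Bread",): 0.384384,
    ("Butter",): 0.420420,
    ("Cheese",): 0.404404,
    ("Corn",): 0.407407,
    ("Dill",): 0.398398,
    ("Eggs",): 0.384384,
    ("Ice cream",): 0.410410,
    ("Kidney Beans",): 0.408408,
    ("Milk",): 0.405405,
    ("Nutmeg",): 0.401401,
    ("Onion",): 0.403403,
    ("Sugar",): 0.409409,
    ("Unicorn",): 0.389389,
    ("Yogurt",): 0.420420,
    ("chocolate",): 0.421421,
    ("Butter", "Ice cream"): 0.207207,
    ("Butter", "Kidney Beans"): 0.202202,
    ("Butter", "chocolate"): 0.202202,
    ("Cheese", "Kidney Beans"): 0.200200,
    ("chocolate", "Ice cream"): 0.202202,
    ("Milk", "chocolate"): 0.211211,
}


def count_surviving(supports, min_support):
    """Count itemsets whose support meets the threshold."""
    return sum(s >= min_support for s in supports.values())


counts = {t: count_surviving(supports, t) for t in (0.2, 0.21, 0.4)}
for threshold, count in counts.items():
    print(f"min_support={threshold}: {count} itemsets")
```

Raising the threshold from 0.2 to 0.21 already removes five of the six pairs, which shows how close the two-item itemsets sit to the cutoff in this dataset.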