# Apriori Demo
This notebook is a tutorial on association rule mining with the Apriori algorithm, using a dummy dataset of transactions in a grocery store.

## Setup
Make sure you have the mlxtend library installed, as it provides an efficient implementation of the Apriori algorithm.

You can install the mlxtend library using pip:

In [None]:
!pip install mlxtend



## Create a Dummy Dataset of 1,000 Transactions in a Grocery Store
For this tutorial, we'll create a dummy dataset representing 1,000 transactions in a grocery store. Each transaction contains a random number (from 1 to 10) of distinct items, drawn at random from a list of 10 unique items.

In [None]:
import pandas as pd
import numpy as np
from mlxtend.preprocessing import TransactionEncoder

# Set a random seed for reproducibility
np.random.seed(42)

# Number of records (transactions) in the dataset
num_records = 1000

# Number of unique items in the grocery store
num_items = 10

# Generate the dummy dataset
transactions = []
for _ in range(num_records):
    num_items_in_transaction = np.random.randint(1, num_items + 1)
    items = np.random.choice(range(1, num_items + 1), num_items_in_transaction, replace=False)
    transactions.append([f"Item{item}" for item in items])

# Convert the dataset into a one-hot encoded format
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
print(df_encoded)

     Item1  Item10  Item2  Item3  Item4  Item5  Item6  Item7  Item8  Item9
0     True    True   True   True  False  False   True   True  False   True
1     True    True   True   True  False  False   True   True   True   True
2    False   False  False   True  False  False  False   True  False  False
3     True    True   True   True   True   True   True   True   True   True
4     True    True   True   True   True   True   True  False   True   True
..     ...     ...    ...    ...    ...    ...    ...    ...    ...    ...
995   True   False   True   True  False  False  False  False  False   True
996  False   False  False  False  False  False  False   True  False  False
997   True   False  False   True   True   True  False   True  False   True
998  False    True  False   True  False  False  False   True  False  False
999  False   False  False  False   True  False  False   True   True   True

[1000 rows x 10 columns]
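Notice that the columns come out as `Item1, Item10, Item2, ...`. This is consistent with the item labels being sorted as strings, which places `Item10` right after `Item1`. The pure-Python sketch below illustrates the one-hot idea on three small transactions. It is not mlxtend's implementation, and the helper name `one_hot_encode` is hypothetical.

```python
def one_hot_encode(transactions):
    """Hypothetical helper: return (columns, rows), one boolean list per transaction."""
    # Collect every distinct item and sort the labels as strings,
    # which puts "Item10" between "Item1" and "Item2"
    columns = sorted({item for t in transactions for item in t})
    # Mark True where the transaction contains the column's item
    rows = [[col in items for col in columns] for items in map(set, transactions)]
    return columns, rows


sample = [["Item3", "Item10"], ["Item1"], ["Item2", "Item1", "Item10"]]
columns, rows = one_hot_encode(sample)
print(columns)
for row in rows:
    print(row)
```

`TransactionEncoder` produces the same kind of boolean matrix, which is why the notebook wraps its result in a `pd.DataFrame` with `te.columns_` as the column labels.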


## Perform Frequent Itemset Mining using Apriori
Now, we'll use the Apriori algorithm to mine frequent itemsets from the one-hot encoded grocery dataset.
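Before calling mlxtend, it may help to see the idea behind Apriori. The sketch below is a simplified, unoptimized level-wise version written only for illustration. The name `apriori_sketch` and its dictionary return format are hypothetical, not mlxtend's API. It grows itemsets one item at a time and relies on the Apriori property: every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset is pruned before its support is counted.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Hypothetical helper: level-wise Apriori returning {frozenset: support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: score every single item and keep the frequent ones
    singles = {frozenset([i]): support(frozenset([i])) for b in baskets for i in b}
    current = {s for s, sup in singles.items() if sup >= min_support}
    frequent = {s: singles[s] for s in current}

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        scored = {c: support(c) for c in candidates}
        current = {c for c, sup in scored.items() if sup >= min_support}
        frequent.update({c: scored[c] for c in current})
        k += 1
    return frequent


baskets = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"], ["bread"]]
for itemset, sup in apriori_sketch(baskets, min_support=0.5).items():
    print(sorted(itemset), sup)
```

In this toy run, `{bread, eggs}` appears in only one of four baskets, so the 3-itemset `{milk, bread, eggs}` is pruned without ever being counted.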

In [None]:
from mlxtend.frequent_patterns import apriori

# Define the minimum support threshold (0.35 means an itemset must appear in at least 35% of transactions)
min_support = 0.35

# Perform frequent itemset mining using Apriori
frequent_itemsets = apriori(df_encoded, min_support=min_support, use_colnames=True)

print("Frequent Itemsets:")
print(frequent_itemsets)

Frequent Itemsets:
    support         itemsets
0     0.530          (Item1)
1     0.546         (Item10)
2     0.543          (Item2)
3     0.545          (Item3)
4     0.527          (Item4)
5     0.540          (Item5)
6     0.533          (Item6)
7     0.532          (Item7)
8     0.533          (Item8)
9     0.545          (Item9)
10    0.358   (Item1, Item5)
11    0.358  (Item2, Item10)
12    0.359  (Item3, Item10)
13    0.351  (Item4, Item10)
14    0.353  (Item5, Item10)
15    0.364  (Item9, Item10)
16    0.351   (Item2, Item4)
17    0.357   (Item2, Item6)
18    0.352   (Item8, Item2)
19    0.362   (Item2, Item9)
20    0.354   (Item3, Item4)
21    0.362   (Item3, Item5)
22    0.357   (Item3, Item6)
23    0.360   (Item3, Item7)
24    0.362   (Item3, Item8)
25    0.356   (Item3, Item9)
26    0.358   (Item4, Item5)
27    0.350   (Item8, Item4)
28    0.352   (Item4, Item9)
29    0.352   (Item5, Item6)
30    0.352   (Item8, Item5)
31    0.356   (Item5, Item9)
32    0.354   (Item8, It

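The supports above can be checked against the way the data were generated. The generator picks a transaction size k uniformly from 1 to 10 and then k distinct items at random. A fixed m-itemset is therefore contained in a size-k basket with probability C(k, m) / C(10, m), averaged over the ten equally likely sizes. The short check below uses a hypothetical helper, `expected_support`. It gives 0.55 for single items and about 0.367 for pairs, both close to the observed values. For triples it gives 0.275, which suggests that no 3-itemset should be expected to clear `min_support = 0.35`.

```python
from fractions import Fraction
from math import comb


def expected_support(m, num_items=10):
    """Hypothetical helper: expected support of one fixed m-itemset when the
    basket size k is uniform on 1..num_items and items are drawn without replacement."""
    total = Fraction(0)
    for k in range(1, num_items + 1):
        # Probability that a random k-subset of num_items contains the fixed m items
        # (comb(k, m) is 0 when k < m)
        total += Fraction(comb(k, m), comb(num_items, m))
    # Average over the num_items equally likely basket sizes
    return total / num_items


for m in (1, 2, 3):
    s = expected_support(m)
    print(m, s, float(s))

# Expected confidence of any pair rule {A} -> {B}: support(A, B) / support(A)
print(expected_support(2) / expected_support(1))
```

The expected pair confidence works out to exactly 2/3 (about 0.667). This helps explain why a confidence threshold of 0.67 keeps only a handful of rules: only pairs whose sampled support happens to run slightly above expectation clear it.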


## Generate Association Rules
Next, we'll use the frequent itemsets to generate association rules and calculate various association metrics such as confidence, lift, and support.
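Conceptually, rule generation splits each frequent itemset into an antecedent A and a consequent C. It keeps the splits whose confidence, support(A and C) / support(A), clears the threshold. Lift then divides that confidence by support(C). The sketch below illustrates the idea on hand-made toy supports. The helper name `rules_from_itemsets` and its tuple output are hypothetical; mlxtend's `association_rules` returns a DataFrame with more metrics.

```python
from itertools import combinations


def rules_from_itemsets(supports, min_confidence):
    """Hypothetical helper: supports maps frozenset -> support (as Apriori returns).
    Yields (antecedent, consequent, support, confidence, lift) tuples."""
    for itemset, s_ac in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                cons = itemset - ante
                # Subsets of a frequent itemset are frequent, so both supports exist
                confidence = s_ac / supports[ante]
                lift = confidence / supports[cons]
                if confidence >= min_confidence:
                    yield ante, cons, s_ac, confidence, lift


# Toy supports, made up for this example
supports = {
    frozenset({"milk"}): 0.75,
    frozenset({"bread"}): 0.75,
    frozenset({"eggs"}): 0.5,
    frozenset({"milk", "bread"}): 0.5,
    frozenset({"milk", "eggs"}): 0.5,
}
for rule in rules_from_itemsets(supports, min_confidence=0.8):
    print(rule)
```

With a threshold of 0.8, only `{eggs} -> {milk}` survives. Every basket with eggs also has milk, so its confidence is 1.0 and its lift is 1.0 / 0.75 ≈ 1.33.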

In [None]:
from mlxtend.frequent_patterns import association_rules

# Generate association rules with a minimum confidence threshold (0.67 means the consequent must appear in at least 67% of transactions containing the antecedent)
min_confidence = 0.67
association_rules_df = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

print("\nAssociation Rules:")
print(association_rules_df)


Association Rules:
  antecedents consequents  antecedent support  consequent support  support  \
0     (Item1)     (Item5)               0.530               0.540    0.358   
1     (Item4)     (Item3)               0.527               0.545    0.354   
2     (Item5)     (Item3)               0.540               0.545    0.362   
3     (Item7)     (Item3)               0.532               0.545    0.360   
4     (Item8)     (Item3)               0.533               0.545    0.362   
5     (Item4)     (Item5)               0.527               0.540    0.358   

   confidence      lift  leverage  conviction  zhangs_metric  
0    0.675472  1.250874  0.071800    1.417442       0.426721  
1    0.671727  1.232526  0.066785    1.386040       0.398855  
2    0.670370  1.230037  0.067700    1.380337       0.406558  
3    0.676692  1.241636  0.070060    1.407326       0.415836  
4    0.679174  1.246192  0.071515    1.418216       0.423031  
5    0.679317  1.257994  0.073420    1.434438       0.4

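Each metric column can be recomputed from the three support columns. The sketch below does this by hand for rule 0, `(Item1) -> (Item5)`, using the standard definitions of confidence, lift, leverage, and conviction. For Zhang's metric it uses a leverage-based form that reproduces the reported value. Treat that form as an assumption about how mlxtend computes it. The results should match the first row of the table up to rounding.

```python
# Supports reported for rule 0: (Item1) -> (Item5)
s_a, s_c, s_ac = 0.530, 0.540, 0.358

confidence = s_ac / s_a                    # estimated P(C | A)
lift = confidence / s_c                    # > 1: A and C co-occur more often than if independent
leverage = s_ac - s_a * s_c                # observed joint support minus the independence baseline
conviction = (1 - s_c) / (1 - confidence)  # expected vs. observed rate of A occurring without C
# Zhang's metric (assumed form): leverage scaled into [-1, 1]
zhang = leverage / max(s_ac * (1 - s_a), s_a * (s_c - s_ac))

for name, value in [("confidence", confidence), ("lift", lift), ("leverage", leverage),
                    ("conviction", conviction), ("zhangs_metric", zhang)]:
    print(f"{name:>14}: {value:.6f}")
```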
