# Apriori Algorithm Implementation Using JupyterLab

This notebook demonstrates association rule mining with the **Apriori algorithm** (via the `mlxtend` library) in **JupyterLab**.

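## Objectives:
- Generate random user transaction data for a **grocery superstore**.
- Mine frequent itemsets with the Apriori algorithm and derive association rules, reporting their confidence and lift.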


In [1]:
import pandas as pd
import numpy as np
import random
from mlxtend.frequent_patterns import apriori, association_rules

**List of 100 Frequently Bought Items**

This list will be used to generate the random transactions.


In [2]:
items = [
    'Apples', 'Bananas', 'Oranges', 'Grapes', 'Tomatoes', 'Lettuce', 'Carrots', 'Potatoes', 'Cucumbers', 'Onions',
    'Broccoli', 'Bell Peppers', 'Spinach', 'Zucchini', 'Eggplant', 'Cabbage', 'Kale', 'Cauliflower', 'Mushrooms',
    'Avocados', 'Lemons', 'Limes', 'Garlic', 'Fresh Herbs', 'Baby Diapers', 'Baby Wipes', 'Baby Lotion', 'Baby Shampoo',
    'Baby Powder', 'Baby Clothes', 'Baby Bottles', 'Baby Formula', 'Baby Snacks', 'Baby Rash Cream', 'Baby Carrier',
    'Diaper Bag', 'Baby Crib', 'Baby High Chair', 'Baby Stroller', 'Baby Teething Rings', 'Lipstick', 'Lip Gloss',
    'Lip Balm', 'Face Powder', 'Blush', 'Mascara', 'Eyeliner', 'Foundation', 'Concealer', 'Nail Polish',
    'Makeup Brushes', 'Setting Spray', 'Face Cream', 'Face Masks', 'Hand Cream', 'Hair Shampoo',
    'Hair Conditioner', 'Hair Gel', 'Hair Oil', 'Hairbrush', 'Facial Cleanser', 'Makeup Remover',
    'Cotton Swabs', 'Toothpaste', 'Toothbrush', 'Mouthwash', 'Deodorant',
    'Body Wash', 'Body Lotion', 'Razor', 'Shaving Cream', 'Bath Towels', 'Shower Curtain', 'Bath Mat', 'Loofah',
    'Laundry Detergent', 'Fabric Softener', 'Ironing Board', 'Iron', 'Vacuum Cleaner', 'Broom', 'Dustpan', 'Mop',
    'Toilet Paper', 'Paper Towels', 'Trash Bags', 'Dish Soap', 'Dishwashing Sponges', 'Cutting Boards', 'Knives',
    'Pots and Pans', 'Cooking Utensils', 'Mixing Bowls', 'Bakeware', 'Can Opener', 'Tupperware', 'Coffee Maker',
    'Toaster', 'Blender', 'Plates, Bowls, and Cups'
]

**Function to generate synthetic grocery transaction data from the items above**

In [3]:
def generate_grocery_dataset(num_transactions=1000, max_transaction_size=20):
    """Return a one-hot DataFrame: one row per transaction, one 0/1 column per item."""
    transactions = []

    for transaction_id in range(1, num_transactions + 1):
        # Draw the basket size uniformly from 1..max_transaction_size (inclusive)
        transaction_size = np.random.randint(1, max_transaction_size + 1)
        # Pick that many distinct items (sampling without replacement)
        transaction = set(random.sample(items, transaction_size))
        # One-hot encode: 1 if the item is in the basket, else 0
        transaction_data = [1 if item in transaction else 0 for item in items]
        transactions.append([transaction_id] + transaction_data)

    df = pd.DataFrame(transactions, columns=['Transaction_ID'] + items)

    return df
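The function above mixes `np.random.randint` (basket size) with `random.sample` (item choice) and never seeds either, so every run produces different data. A minimal stdlib-only sketch of the same generation scheme, assuming a single seeded `random.Random` instance is acceptable for reproducibility (`generate_baskets`, `one_hot`, and `demo_items` are hypothetical names for illustration, not part of the notebook):

```python
import random


def generate_baskets(items, num_transactions, max_transaction_size, seed=42):
    """Hypothetical helper: return a list of baskets (sets of item names)."""
    rng = random.Random(seed)  # one seeded generator -> identical baskets on every run
    baskets = []
    for _ in range(num_transactions):
        size = rng.randint(1, max_transaction_size)  # inclusive on both ends
        baskets.append(set(rng.sample(items, size)))  # distinct items, no replacement
    return baskets


def one_hot(baskets, items):
    """Hypothetical helper: encode baskets as 0/1 rows in the column order of `items`."""
    return [[1 if item in basket else 0 for item in items] for basket in baskets]


demo_items = ['Apples', 'Bananas', 'Lettuce', 'Mushrooms', 'Toothpaste']
baskets = generate_baskets(demo_items, num_transactions=5, max_transaction_size=3)
rows = one_hot(baskets, demo_items)
for row in rows:
    print(row)
```

`max_transaction_size` must not exceed the number of items, otherwise `sample()` raises `ValueError`; with the notebook's 100 items and a maximum of 15 this is not an issue.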

**Generate 500 random transactions, each with a transaction ID**

In [4]:
df = generate_grocery_dataset(num_transactions=500, max_transaction_size=15)
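With the parameters in this call (500 transactions, `max_transaction_size=15`) and the 100-item list, the supports to expect can be worked out before running Apriori. A worked check in plain Python with exact fractions, assuming the basket size is uniform on 1..15 and items are then sampled uniformly without replacement, as the generator does (the constant names are only for this sketch):

```python
from fractions import Fraction

N_ITEMS = 100          # length of the `items` list
MAX_SIZE = 15          # max_transaction_size used above
N_TRANSACTIONS = 500   # num_transactions used above
sizes = range(1, MAX_SIZE + 1)  # each basket size 1..MAX_SIZE is equally likely

# A basket of size k contains one given item with probability k / N_ITEMS
p_single = sum(Fraction(k, N_ITEMS) for k in sizes) / MAX_SIZE
# ...and one given pair of items with probability k(k-1) / (N_ITEMS * (N_ITEMS - 1))
p_pair = sum(Fraction(k * (k - 1), N_ITEMS * (N_ITEMS - 1)) for k in sizes) / MAX_SIZE

print(f"expected single-item support: {float(p_single):.4f}")
print(f"expected pair support:        {float(p_pair):.4f}")
print(f"expected pair count in {N_TRANSACTIONS} baskets: {float(p_pair * N_TRANSACTIONS):.2f}")
```

A single item is therefore expected in about 8% of baskets (consistent with the single-item supports of roughly 0.06 to 0.09 in the results), while any particular pair is expected in only about 0.75%, i.e. roughly 4 of 500 baskets. Because the data is random by construction, pairs that clear a 2% threshold are plausibly chance fluctuations rather than real buying patterns.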

**Print the first five rows of the generated dataset, including transaction IDs**

In [6]:
print("Random Grocery Store Transactions:")
print(df.head())

Random Grocery Store Transactions:
   Transaction_ID  Apples  Bananas  Oranges  Grapes  Tomatoes  Lettuce  \
0               1       0        0        1       0         0        0   
1               2       0        0        0       0         0        0   
2               3       0        0        0       0         0        0   
3               4       0        0        0       0         0        0   
4               5       0        1        0       0         0        0   

   Carrots  Potatoes  Cucumbers  ...  Pots and Pans  Cooking Utensils  \
0        0         0          0  ...              0                 0   
1        0         0          0  ...              0                 0   
2        0         0          0  ...              0                 1   
3        0         0          0  ...              0                 0   
4        0         0          0  ...              0                 0   

   Mixing Bowls  Bakeware  Can Opener  Tupperware  Coffee Maker  Toaster  \
0    

**Reset the index before applying Apriori (effectively a no-op here: `df` already has a default integer index, and 'Transaction_ID' is already a regular column)**


In [7]:
df_apriori = df.reset_index(drop=True)

**Drop the 'Transaction_ID' column so that only the one-hot item columns are passed to Apriori**

In [8]:
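# Keep only the item columns. mlxtend's apriori() accepts 0/1 or True/False values;
# bool dtype is the recommended input (recent versions warn about non-bool DataFrames)
df_apriori_items = df_apriori.drop(columns=['Transaction_ID']).astype(bool)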

**Apply the Apriori algorithm with a low `min_support` of 0.02 (an itemset must appear in at least 2% of transactions, i.e. 10 of the 500) to increase the chances of finding frequent itemsets**

In [9]:
frequent_itemsets = apriori(df_apriori_items, min_support=0.02, use_colnames=True)
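`apriori()` hides a level-wise search: frequent (k-1)-itemsets are joined into k-item candidates, any candidate with an infrequent (k-1)-subset is pruned (the Apriori property: every subset of a frequent itemset is itself frequent), and the survivors are counted against the data. A minimal pure-Python sketch of that loop, assuming transactions are given as Python sets rather than a one-hot DataFrame (`apriori_sketch` and the toy baskets are hypothetical; mlxtend's own implementation uses different data structures):

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Hypothetical helper: level-wise Apriori over a list of item sets.

    Returns a dict mapping each frequent itemset (a frozenset) to its support.
    """
    n = len(transactions)

    # Level 1: count every single item and keep the frequent ones
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: cnt / n for s, cnt in counts.items() if cnt / n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions for this level
        current = {}
        for c in candidates:
            supp = sum(1 for t in transactions if c <= t) / n
            if supp >= min_support:
                current[c] = supp
        frequent.update(current)
        k += 1

    return frequent


baskets = [
    {'Lettuce', 'Bell Peppers', 'Tomatoes'},
    {'Lettuce', 'Bell Peppers'},
    {'Lettuce', 'Mushrooms'},
    {'Bell Peppers', 'Tomatoes'},
    {'Lettuce', 'Bell Peppers', 'Mushrooms'},
]
frequent = apriori_sketch(baskets, min_support=0.4)
for itemset, supp in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

On these five baskets all four single items and three pairs are frequent at 0.4; both 3-item candidates are pruned without a counting pass because each contains an infrequent pair.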



In [10]:
# Generate association rules, keeping rules with confidence >= 0.05 (far below mlxtend's default min_threshold of 0.8)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.05)
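Each rule A → C is scored from the supports of A, C, and A ∪ C: confidence is supp(A ∪ C) / supp(A), and lift is confidence divided by supp(C), which equals supp(A ∪ C) / (supp(A) · supp(C)) and is therefore the same in both directions. A small exact-arithmetic sketch of both metrics (the helper names and the five toy baskets are illustrative assumptions, not notebook data):

```python
from fractions import Fraction


def support(itemset, transactions):
    """Exact fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


def confidence(antecedent, consequent, transactions):
    # conf(A -> C) = supp(A | C) / supp(A): how often C appears given that A appears
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    # lift(A -> C) = conf(A -> C) / supp(C); 1 means A and C co-occur as if independent
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


baskets = [
    {'Lettuce', 'Bell Peppers', 'Tomatoes'},
    {'Lettuce', 'Bell Peppers'},
    {'Lettuce', 'Mushrooms'},
    {'Bell Peppers', 'Tomatoes'},
    {'Lettuce', 'Bell Peppers', 'Mushrooms'},
]

for a, c in [('Lettuce', 'Bell Peppers'), ('Bell Peppers', 'Lettuce'),
             ('Mushrooms', 'Lettuce'), ('Lettuce', 'Mushrooms')]:
    conf = float(confidence({a}, {c}, baskets))
    lft = float(lift({a}, {c}, baskets))
    print(f"{a} -> {c}: confidence={conf:.4f}, lift={lft:.4f}")
```

Here Mushrooms → Lettuce has confidence 1.0 while Lettuce → Mushrooms has only 0.5, yet both rules have lift 1.25: confidence depends on direction, lift does not.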

In [11]:
# association_rules() already returns a 'lift' column; recomputing it here makes the formula explicit:
# lift(A -> C) = confidence(A -> C) / support(C), i.e. divide by the *consequent* support
rules['lift'] = rules['confidence'] / rules['consequent support']


**Display results**

In [12]:
print("\nFrequent Itemsets:")
print(frequent_itemsets)

print("\nAssociation Rules with Confidence and Lift:")
print(rules[['antecedents', 'consequents', 'confidence', 'lift']])


Frequent Itemsets:
     support                        itemsets
0      0.064                        (Apples)
1      0.062                       (Bananas)
2      0.076                       (Oranges)
3      0.092                        (Grapes)
4      0.088                      (Tomatoes)
..       ...                             ...
128    0.020         (Face Masks, Dish Soap)
129    0.020  (Face Masks, Cooking Utensils)
130    0.020      (Hair Oil, Makeup Remover)
131    0.020     (Fabric Softener, Hair Oil)
132    0.022    (Cooking Utensils, Hair Oil)

[133 rows x 2 columns]

Association Rules with Confidence and Lift:
           antecedents         consequents  confidence      lift
0            (Lettuce)      (Bell Peppers)    0.203704  2.546296
1       (Bell Peppers)           (Lettuce)    0.275000  2.546296
2          (Mushrooms)           (Lettuce)    0.244898  2.267574
3            (Lettuce)         (Mushrooms)    0.222222  2.267574
4        (Fresh Herbs)           (Lettuce)    