# Association rule mining

You will use:
* orders.csv
* order_products__prior.csv
* products.csv
* aisles.csv (optional but VERY powerful later)

### Step 1: Merge the tables

In [5]:
import pandas as pd

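# Raw strings keep the Windows backslashes literal ("\d" and "\o" are invalid escape sequences).
orders = pd.read_csv(r"..\data_raw\orders.csv")
order_products = pd.read_csv(r"..\data_raw\order_products__prior.csv")
products = pd.read_csv(r"..\data_raw\products.csv")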

# Merge product names
df = order_products.merge(products, on="product_id")

df.head()


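|   | order_id | product_id | add_to_cart_order | reordered | product_name          | aisle_id | department_id |
| - | -------- | ---------- | ----------------- | --------- | --------------------- | -------- | ------------- |
| 0 | 2        | 33120      | 1                 | 1         | Organic Egg Whites    | 86       | 16            |
| 1 | 2        | 28985      | 2                 | 1         | Michigan Organic Kale | 83       | 4             |
| 2 | 2        | 9327       | 3                 | 0         | Garlic Powder         | 104      | 13            |
| 3 | 2        | 45918      | 4                 | 1         | Coconut Butter        | 19       | 13            |
| 4 | 2        | 30035      | 5                 | 0         | Natural Sweetener     | 17       | 13            |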
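
A quick sanity check on this kind of merge, sketched on toy frames (the rows and the `_toy` names are made up for illustration). `validate="many_to_one"` asks pandas to raise if `product_id` is duplicated in the products table, and `how="left"` with `indicator=True` surfaces order lines that found no matching product instead of silently dropping them.

```python
import pandas as pd

# Toy stand-ins for order_products and products (values are hypothetical).
order_products_toy = pd.DataFrame({
    "order_id": [2, 2, 3],
    "product_id": [33120, 28985, 99999],  # 99999 has no match in products_toy
    "add_to_cart_order": [1, 2, 1],
})
products_toy = pd.DataFrame({
    "product_id": [33120, 28985],
    "product_name": ["Organic Egg Whites", "Michigan Organic Kale"],
})

merged = order_products_toy.merge(
    products_toy,
    on="product_id",
    how="left",               # keep every order line, matched or not
    validate="many_to_one",   # raise if products_toy repeats a product_id
    indicator=True,           # adds a "_merge" column describing the match
)

# Order lines that did not find a product name.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), "rows,", len(unmatched), "without a product name")
```

If `unmatched` is empty on the real data, the default inner merge used above loses nothing.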


### Step 2: Create baskets

We group products per order:

In [6]:
transactions = df.groupby('order_id')['product_name'].apply(list)
transactions.head()

order_id
2    [Organic Egg Whites, Michigan Organic Kale, Ga...
3    [Total 2% with Strawberry Lowfat Greek Straine...
4    [Plain Pre-Sliced Bagels, Honey/Lemon Cough Dr...
5    [Bag of Organic Bananas, Just Crisp, Parmesan,...
6    [Cleanse, Dryer Sheets Geranium Scent, Clean D...
Name: product_name, dtype: object
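
To make the grouping concrete, here is a small sketch on a toy frame (the rows are invented for illustration). It builds one list of product names per `order_id`, the same way as the cell above, and then looks at basket sizes, which is a cheap check before encoding.

```python
import pandas as pd

# Toy order lines: one row per (order, product) pair, values are hypothetical.
df_toy = pd.DataFrame({
    "order_id": [1, 1, 1, 2, 2],
    "product_name": ["Banana", "Milk", "Yogurt", "Yogurt", "Bread"],
})

# One list of product names per order; row order inside each order is preserved.
transactions_toy = df_toy.groupby("order_id")["product_name"].apply(list)

# Number of products in each basket.
basket_sizes = transactions_toy.apply(len)

print(transactions_toy)
print(basket_sizes)
```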

### Step 3: One-Hot Encoding (Basket Matrix)

Association algorithms need a one-hot basket matrix, with one row per order and one column per product:

| order_id | Banana | Milk | Yogurt | Bread |
| -------- | ------ | ---- | ------ | ----- |
| 1        | 1      | 1    | 1      | 0     |
| 2        | 0      | 0    | 1      | 1     |

We create it:


In [None]:
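# Run the install once in its own cell, then restart the kernel before importing:
#   %pip install --upgrade mlxtend numpy
# Upgrading numpy under a running kernel can leave old and new modules loaded together,
# which is a likely cause of:
#   AttributeError: module 'numpy._core._multiarray_umath' has no attribute '_blas_supports_fpe'
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
# For a large catalogue, transform(transactions, sparse=True) combined with
# pd.DataFrame.sparse.from_spmatrix(...) avoids building a dense matrix.
te_ary = te.fit(transactions).transform(transactions)

basket = pd.DataFrame(te_ary, columns=te.columns_)
basket.head()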
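
To show what the basket matrix looks like without depending on `mlxtend`, here is a sketch that builds the same kind of one-hot table with plain pandas. This is a swapped-in technique for illustration, not what `TransactionEncoder` does internally, and the two toy baskets mirror the table above.

```python
import pandas as pd

# Two toy baskets, indexed by a hypothetical order_id.
transactions_toy = pd.Series(
    [["Banana", "Milk", "Yogurt"], ["Yogurt", "Bread"]],
    index=pd.Index([1, 2], name="order_id"),
)

# One row per (order, product) pair.
long_form = transactions_toy.explode().reset_index(name="product_name")

# Count occurrences per order and product, then reduce counts to True/False.
basket_toy = pd.crosstab(long_form["order_id"], long_form["product_name"]).astype(bool)

print(basket_toy)
```

Columns come out in alphabetical order, and each cell says whether that product appears in that order.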

In [8]:
from mlxtend.frequent_patterns import apriori, association_rules

frequent_itemsets = apriori(basket, min_support=0.01, use_colnames=True)

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.1)
rules.sort_values("lift", ascending=False).head(10)


NameError: name 'basket' is not defined (the Step 3 cell failed, so `basket` was never created; rerun this cell once Step 3 succeeds)
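
The cell above filters rules by lift. As a reminder of what support, confidence, and lift measure, here is a hand-rolled sketch on four toy baskets. The baskets and the helper names `support` and `rule_metrics` are hypothetical and are not part of `mlxtend`.

```python
# Four toy baskets (hypothetical data).
baskets = [
    {"Banana", "Milk", "Yogurt"},
    {"Yogurt", "Bread"},
    {"Banana", "Milk"},
    {"Banana", "Bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    s_a = support(antecedent, baskets)
    s_c = support(consequent, baskets)
    s_ac = support(set(antecedent) | set(consequent), baskets)
    confidence = s_ac / s_a   # P(consequent | antecedent)
    lift = confidence / s_c   # above 1 means the items co-occur more than chance
    return {"support": s_ac, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"Banana"}, {"Milk"}, baskets)
print(metrics)
```

Here Banana appears in 3 of 4 baskets, Milk in 2 of 4, and both together in 2 of 4, so the rule Banana → Milk has confidence 2/3 and lift 4/3.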

Now you are ready for mining 🔥

## 3) The Algorithms (What each one really does)

### A) Apriori: The Foundational Algorithm

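Idea: find sets of items that frequently appear together in the same order.

It uses support pruning. Support can only drop as an itemset grows, so every superset of an infrequent itemset is also infrequent:
if {Milk, Bread} is not frequent → {Milk, Bread, Eggs} can NEVER be frequent.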
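
The pruning rule can be sketched in a few lines of pure Python. This is a minimal teaching version under simplifying assumptions (tiny in-memory baskets, no optimisation), not the `mlxtend` implementation; `apriori_sketch` and the toy baskets are hypothetical names and data.

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Level-wise frequent itemset search with support pruning."""
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for b in baskets for item in b})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset,
        # without ever counting its support.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


toy_baskets = [
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Eggs"},
    {"Bread", "Eggs"},
    {"Milk", "Eggs"},
]
frequent = apriori_sketch(toy_baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With these baskets, {Milk, Bread} appears in only 1 of 4 orders, so it is not frequent, and {Milk, Bread, Eggs} is discarded in the prune step before its support is counted.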

Run Apriori