Market Basket Analysis (MBA) is a data mining technique used to understand consumer purchase behavior. It is based on the idea that if a customer buys a specific set of items, they are more likely to buy other related items in the future.

The technique analyzes transaction data to identify patterns, associations, and relationships between products that are frequently purchased together. For example, if customers who buy bread often also buy butter, the retailer can use this information to suggest butter to customers who purchase bread.

Common applications of Market Basket Analysis include:

*   Changing the store layout according to trends
*   Customer behavior analysis
*   Catalog design
*   Cross-marketing on online stores
*   Customized emails with add-on sales, etc.





#### Support 🛍️

Support measures how popular an item or itemset is among all transactions: it is the fraction of transactions that contain the itemset.
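
As a minimal sketch of this definition, the snippet below computes support on a handful of made-up transactions. The `transactions` list and the `support` helper are illustrative assumptions and are not part of the Instacart data used later.

```python
# Toy transactions (hypothetical); each transaction is a set of item names.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(itemset <= t for t in transactions)  # subset test per transaction
    return hits / len(transactions)


print(support({"bread"}, transactions))            # bread is in 4 of 5 transactions
print(support({"bread", "butter"}, transactions))  # the pair is in 3 of 5 transactions
```

Support of a larger itemset can never exceed the support of any of its subsets, which is the property that frequent-itemset algorithms rely on for pruning.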

#### Confidence 🎯

Confidence measures the likelihood of buying item B when A is purchased. It helps answer: "If someone buys A, how likely are they to buy B?"

```
Confidence(A → B) = Support(A,B) / Support(A)
```

Example:

If bread and butter appear together in 20 transactions and bread appears in 30 transactions, then

Confidence(bread → butter) = 20/30 ≈ 0.67, or 67%.

This means 67% of customers who buy bread also buy butter.
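
The same arithmetic can be sketched in a few lines of Python. The counts 20 and 30 come from the example above; the total of 100 transactions is a hypothetical value added only to show that dividing supports gives the same answer as dividing raw counts, because the total cancels out.

```python
import math

count_bread_and_butter = 20  # from the example above
count_bread = 30             # from the example above
n_transactions = 100         # hypothetical total, not given in the example

# Confidence from raw counts.
confidence_from_counts = count_bread_and_butter / count_bread

# Confidence from supports: Support(A,B) / Support(A).
support_ab = count_bread_and_butter / n_transactions
support_a = count_bread / n_transactions
confidence_from_supports = support_ab / support_a

print(round(confidence_from_counts, 2))
print(math.isclose(confidence_from_counts, confidence_from_supports))
```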

#### Lift 📈

Lift measures how much more likely items are to be bought together than would be expected if they were purchased independently of each other.


```
Lift(A → B) = Confidence(A → B) / Support(B)
```
#### Interpreting Lift Values:

*   Lift = 1: Items are independent. Products have no effect on each other's sales.
*   Lift > 1: Positive correlation. Products complement each other. Example: Lift = 2 means customers are twice as likely to buy B when they buy A.
*   Lift < 1: Negative correlation. Products substitute each other. Example: Lift = 0.5 means customers are half as likely to buy B when they buy A.
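
To make the formula and its interpretation concrete, here is a small sketch with hypothetical counts (100 transactions in total, bread in 30, butter in 40, both in 20). The `lift` and `interpret` helpers are illustrative names, not library functions.

```python
import math


def lift(count_ab, count_a, count_b, n):
    """Lift(A -> B) = Confidence(A -> B) / Support(B)."""
    confidence = count_ab / count_a
    support_b = count_b / n
    return confidence / support_b


def interpret(value, tol=1e-9):
    """Map a lift value onto the three cases listed above."""
    if math.isclose(value, 1.0, abs_tol=tol):
        return "independent"
    return "positive correlation" if value > 1 else "negative correlation"


# Hypothetical counts, chosen only for illustration.
bread_to_butter = lift(count_ab=20, count_a=30, count_b=40, n=100)
butter_to_bread = lift(count_ab=20, count_a=40, count_b=30, n=100)

print(round(bread_to_butter, 3), interpret(bread_to_butter))
```

Unlike confidence, lift is symmetric: Lift(A → B) equals Lift(B → A), so `bread_to_butter` and `butter_to_bread` agree up to floating-point rounding.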


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import warnings
warnings.filterwarnings('ignore')

root = '/content/drive/MyDrive/instacart-data/'

## Data

In [None]:
orders = pd.read_csv(root + 'orders.csv')
order_products_prior = pd.read_csv(root + 'order_products__prior.csv')
order_products_train = pd.read_csv(root + 'order_products__train.csv')
products = pd.read_csv(root + 'products.csv')



In [None]:
order_products = pd.concat([order_products_prior, order_products_train], ignore_index = True)
print(order_products.shape)



(33819106, 4)


In [None]:
order_products.head()




| | order_id | product_id | add_to_cart_order | reordered |
|---|---|---|---|---|
| 0 | 2 | 33120 | 1 | 1 |
| 1 | 2 | 28985 | 2 | 1 |
| 2 | 2 | 9327 | 3 | 0 |
| 3 | 2 | 45918 | 4 | 1 |
| 4 | 2 | 30035 | 5 | 0 |


In [None]:
order_products.product_id.nunique()




49685

Of the 49,685 unique products, we keep only the 100 most frequent.



In [None]:
product_counts = order_products.groupby('product_id')['order_id'].count().reset_index().rename(columns = {'order_id':'frequency'})
product_counts = product_counts.sort_values('frequency', ascending=False)[0:100].reset_index(drop = True)
product_counts = product_counts.merge(products, on = 'product_id', how = 'left')
product_counts.head()

| | product_id | frequency | product_name | aisle_id | department_id |
|---|---|---|---|---|---|
| 0 | 24852 | 491291 | Banana | 24 | 4 |
| 1 | 13176 | 394930 | Bag of Organic Bananas | 24 | 4 |
| 2 | 21137 | 275577 | Organic Strawberries | 24 | 4 |
| 3 | 21903 | 251705 | Organic Baby Spinach | 123 | 4 |
| 4 | 47209 | 220877 | Organic Hass Avocado | 24 | 4 |


Filter the order_products dataframe down to these 100 most frequent products.



In [None]:
freq_products = list(product_counts.product_id)
freq_products[1:10]

[13176, 21137, 21903, 47209, 47766, 47626, 16797, 26209, 27845]

In [None]:
print(len(freq_products))
order_products = order_products[order_products.product_id.isin(freq_products)]
order_products.shape

100


(7795471, 4)

In [None]:
order_products.order_id.nunique()
# total number of orders placed that include at least one of the top 100 products

2444982

In [None]:
order_products = order_products.merge(products, on = 'product_id', how='left')
order_products.head()

| | order_id | product_id | add_to_cart_order | reordered | product_name | aisle_id | department_id |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 28985 | 2 | 1 | Michigan Organic Kale | 83 | 4 |
| 1 | 2 | 17794 | 6 | 1 | Carrots | 83 | 4 |
| 2 | 3 | 24838 | 2 | 1 | Unsweetened Almondmilk | 91 | 16 |
| 3 | 3 | 21903 | 4 | 1 | Organic Baby Spinach | 123 | 4 |
| 4 | 3 | 46667 | 6 | 1 | Organic Ginger Root | 83 | 4 |


Structuring the data to feed into the algorithm


In [None]:
basket = order_products.groupby(['order_id', 'product_name'])['reordered'].count().unstack().reset_index().fillna(0).set_index('order_id')
basket.head()

| order_id | 100% Raw Coconut Water | 100% Whole Wheat Bread | 2% Reduced Fat Milk | Apple Honeycrisp Organic | Asparagus | Bag of Organic Bananas | Banana | Bartlett Pears | Blueberries | Boneless Skinless Chicken Breasts | ... | Sparkling Natural Mineral Water | Sparkling Water Grapefruit | Spring Water | Strawberries | Uncured Genoa Salami | Unsalted Butter | Unsweetened Almondmilk | Unsweetened Original Almond Breeze Almond Milk | Whole Milk | Yellow Onions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
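
The reshaping step is easier to follow on a tiny example. The sketch below applies the same `groupby` / `count` / `unstack` / `fillna` chain to a made-up frame; the `toy` data and its values are assumptions for illustration, with only the column names borrowed from the notebook.

```python
import pandas as pd

# Hypothetical long-format order lines: one row per (order, product).
toy = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3],
    "product_name": ["Banana", "Limes", "Banana", "Banana", "Limes", "Whole Milk"],
    "reordered": [1, 0, 1, 0, 1, 1],
})

toy_basket = (
    toy.groupby(["order_id", "product_name"])["reordered"]
    .count()           # counts rows per pair, regardless of the 0/1 value in `reordered`
    .unstack()         # products become columns, one row per order
    .fillna(0)         # products absent from an order show up as NaN, so fill with 0
    .astype("int32")
)

print(toy_basket)
```

Because `count()` counts rows rather than summing the `reordered` flag, a product bought for the first time (`reordered = 0`) still gets a 1 in the basket matrix.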


In [None]:
basket.shape

(2444982, 100)

In [None]:
del product_counts, products, order_products, order_products_prior, order_products_train

In [None]:
basket = basket.astype('int32')

In [None]:
basket.head()

| order_id | 100% Raw Coconut Water | 100% Whole Wheat Bread | 2% Reduced Fat Milk | Apple Honeycrisp Organic | Asparagus | Bag of Organic Bananas | Banana | Bartlett Pears | Blueberries | Boneless Skinless Chicken Breasts | ... | Sparkling Natural Mineral Water | Sparkling Water Grapefruit | Spring Water | Strawberries | Uncured Genoa Salami | Unsalted Butter | Unsweetened Almondmilk | Unsweetened Original Almond Breeze Almond Milk | Whole Milk | Yellow Onions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 5 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


Creating frequent itemsets and rules



In [None]:
frequent_items = apriori(basket, min_support=0.01, use_colnames=True, low_memory=True)
frequent_items.head()

| | support | itemsets |
|---|---|---|
| 0 | 0.016062 | (100% Raw Coconut Water) |
| 1 | 0.025814 | (100% Whole Wheat Bread) |
| 2 | 0.0158 | (2% Reduced Fat Milk) |
| 3 | 0.035694 | (Apple Honeycrisp Organic) |
| 4 | 0.029101 | (Asparagus) |


In [None]:
frequent_items.shape


(129, 2)
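
To illustrate what a minimum-support threshold does, here is a simplified, pure-Python level-wise search in the spirit of Apriori. It is a sketch on made-up transactions, not mlxtend's implementation: the `frequent_itemsets` helper is hypothetical and it skips the subset-pruning optimization of the full algorithm.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting `min_support`."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # level 1: single items
    result = {}
    k = 1
    while current:
        survivors = []
        for candidate in current:
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                result[candidate] = support
                survivors.append(candidate)
        k += 1
        # Join step: build size-k candidates only from surviving smaller itemsets.
        current = list({a | b for a, b in combinations(survivors, 2) if len(a | b) == k})
    return result


# Hypothetical transactions for illustration.
toy_transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

found = frequent_itemsets(toy_transactions, min_support=0.5)
for itemset, support in sorted(found.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

With a threshold of 0.5, the three single items and the pair {bread, butter} survive, while pairs involving milk are dropped. Lowering the threshold admits more itemsets at the cost of more candidates to count.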

In [None]:
rules = association_rules(frequent_items, metric="lift", min_threshold=1)
rules = rules.sort_values('lift', ascending=False).reset_index(drop=True)
rules.head(10)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Large Lemon) | (Limes) | 0.065764 | 0.059984 | 0.01186 | 0.180345 | 3.006544 | 0.007915 | 1.146843 | 0.714372 |
| 1 | (Limes) | (Large Lemon) | 0.059984 | 0.065764 | 0.01186 | 0.197723 | 3.006544 | 0.007915 | 1.16448 | 0.70998 |
| 2 | (Organic Strawberries) | (Organic Raspberries) | 0.112711 | 0.058325 | 0.014533 | 0.12894 | 2.210731 | 0.007959 | 1.081069 | 0.61723 |
| 3 | (Organic Raspberries) | (Organic Strawberries) | 0.058325 | 0.112711 | 0.014533 | 0.249174 | 2.210731 | 0.007959 | 1.181751 | 0.581582 |
| 4 | (Organic Avocado) | (Large Lemon) | 0.075348 | 0.065764 | 0.010538 | 0.139862 | 2.126728 | 0.005583 | 1.086147 | 0.572966 |
| 5 | (Large Lemon) | (Organic Avocado) | 0.065764 | 0.075348 | 0.010538 | 0.160244 | 2.126728 | 0.005583 | 1.101097 | 0.567088 |
| 6 | (Organic Strawberries) | (Organic Blueberries) | 0.112711 | 0.042956 | 0.010235 | 0.090809 | 2.114024 | 0.005394 | 1.052633 | 0.593909 |
| 7 | (Organic Blueberries) | (Organic Strawberries) | 0.042956 | 0.112711 | 0.010235 | 0.238274 | 2.114024 | 0.005394 | 1.16484 | 0.550621 |
| 8 | (Organic Raspberries) | (Organic Hass Avocado) | 0.058325 | 0.090339 | 0.010966 | 0.188018 | 2.081257 | 0.005697 | 1.120298 | 0.551699 |
| 9 | (Organic Hass Avocado) | (Organic Raspberries) | 0.090339 | 0.058325 | 0.010966 | 0.121389 | 2.081257 | 0.005697 | 1.071777 | 0.571115 |
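
As a sanity check, the confidence, lift, and leverage of the top rule (Large Lemon → Limes) can be recomputed from the support columns using the formulas defined earlier. The input values are copied from the first row of the output above; since they are already rounded to six decimals, the recomputed figures match only approximately.

```python
import math

# Values copied from the first row of the rules output (rounded to 6 decimals).
antecedent_support = 0.065764   # Support(Large Lemon)
consequent_support = 0.059984   # Support(Limes)
pair_support = 0.01186          # Support(Large Lemon, Limes)

confidence = pair_support / antecedent_support
lift = confidence / consequent_support
# Leverage: observed joint support minus the support expected under independence.
leverage = pair_support - antecedent_support * consequent_support

print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.4f}")
print(f"leverage   = {leverage:.6f}")
```

A lift of about 3 means lemons and limes appear together roughly three times as often as they would if the two purchases were independent. The reverse rule in row 1 has the same lift but a different confidence, because confidence depends on which item is the antecedent.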
