# **MARKET BASKET ANALYSIS**

**Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions.**
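
To make "occur together frequently" concrete, here is a minimal sketch of the three measures usually reported for an association rule: support, confidence and lift. The transactions and the helper names (`support`, `confidence`, `lift`) are hypothetical and only illustrate the definitions; they are not part of the Instacart data or of any library used below.

```python
# Toy transactions (hypothetical data, not taken from the Instacart dataset)
transactions = [
    {"lemon", "lime", "banana"},
    {"lemon", "lime"},
    {"banana", "apple"},
    {"lemon", "banana"},
    {"apple"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

sup = support({"lemon", "lime"}, transactions)
conf = confidence({"lemon"}, {"lime"}, transactions)
lft = lift({"lemon"}, {"lime"}, transactions)
print(sup, conf, lft)
```

A lift above 1 means the two items appear together more often than they would if they were bought independently.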

For market basket analysis we will merge the dataframes order_products_prior, orders, products, aisles and departments into a single dataframe, so that every purchased item carries its order, product, aisle and department details.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

color = sns.color_palette()
pd.options.mode.chained_assignment = None  # silence SettingWithCopyWarning (default='warn')
# Note: this mlxtend `apriori` is shadowed later by `from akapriori import apriori`
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [2]:
aisles = pd.read_csv("../input/instacart-market-basket-analysis/aisles.csv")
departments = pd.read_csv("../input/instacart-market-basket-analysis/departments.csv")
orders = pd.read_csv("../input/instacart-market-basket-analysis/orders.csv")
products = pd.read_csv("../input/instacart-market-basket-analysis/products.csv")
order_products_prior= pd.read_csv("../input/instacart-market-basket-analysis/order_products__prior.csv")

In [3]:
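# Attach order, product, aisle and department details to every purchased item
# (pandas merges are inner joins by default)
merged_df = order_products_prior.merge(orders, on="order_id")
merged_df = merged_df.merge(products, on="product_id")
merged_df = merged_df.merge(aisles, on="aisle_id")
merged_df = merged_df.merge(departments, on="department_id")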
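
Because the merges above are inner joins, any row whose key is missing from a lookup table would be dropped silently, and a duplicated key would multiply rows. The sketch below shows one way to guard against both on hypothetical miniature tables (`toy_order_products` and `toy_products` are made up for illustration): the `validate` argument of `DataFrame.merge` raises an error if the lookup keys are not unique, and a row-count comparison catches dropped rows.

```python
import pandas as pd

# Hypothetical miniature stand-ins for order_products_prior and products
toy_order_products = pd.DataFrame(
    {"order_id": [1, 1, 2], "product_id": [10, 11, 10]}
)
toy_products = pd.DataFrame(
    {"product_id": [10, 11], "product_name": ["Banana", "Limes"]}
)

# validate="many_to_one" raises pandas.errors.MergeError if product_id
# is duplicated in the right-hand (lookup) table
toy_merged = toy_order_products.merge(
    toy_products, on="product_id", validate="many_to_one"
)

# An inner join keeps only matching keys, so equal lengths mean nothing was lost
rows_preserved = len(toy_merged) == len(toy_order_products)
print(rows_preserved)
```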

Next, we add a new column, product_id_str, which holds product_id converted to a string.

In [4]:
merged_df["product_id_str"]=merged_df["product_id"].astype(str)

In [5]:
g=merged_df.groupby("order_id")["product_name"]

In [6]:
# Replace commas inside product names with spaces, then join each order's products into one comma-separated string
g1=g.apply(lambda x: ','.join(x.str.replace(","," ")))

In [7]:
g1

order_id
2          Organic Egg Whites,Michigan Organic Kale,Carro...
3          Total 2% with Strawberry Lowfat Greek Strained...
4          Plain Pre-Sliced Bagels,Honey/Lemon Cough Drop...
5          2% Reduced Fat Milk,Mini Original Babybel Chee...
6          Cleanse,Dryer Sheets Geranium Scent,Clean Day ...
                                 ...                        
3421079                                        Moisture Soap
3421080    Organic Pasture Raised Brown Eggs,Organic Whol...
3421081    Pepper Jack Cheese Slices,Dijon Mustard,Classi...
3421082    Original Whipped Cream,Original Spray,Strawber...
3421083    Banana,Organic  Sweet & Salty Peanut Pretzel G...
Name: product_name, Length: 3214874, dtype: object

In [8]:
#Convert g1 into a dataframe
g2=pd.DataFrame(g1)

In [9]:
# Move order_id from the index into a regular column
g2.reset_index(inplace=True)

In [10]:
# Split each comma-separated string back into a tuple of product names, stored in prod_list
g2["prod_list"]=g2.apply(lambda x: tuple(x["product_name"].split(",")),axis=1)

In [11]:
g2.head()

   order_id                                       product_name                                          prod_list
0         2  Organic Egg Whites,Michigan Organic Kale,Carro...  (Organic Egg Whites, Michigan Organic Kale, Ca...
1         3  Total 2% with Strawberry Lowfat Greek Strained...  (Total 2% with Strawberry Lowfat Greek Straine...
2         4  Plain Pre-Sliced Bagels,Honey/Lemon Cough Drop...  (Plain Pre-Sliced Bagels, Honey/Lemon Cough Dr...
3         5  2% Reduced Fat Milk,Mini Original Babybel Chee...  (2% Reduced Fat Milk, Mini Original Babybel Ch...
4         6  Cleanse,Dryer Sheets Geranium Scent,Clean Day ...  (Cleanse, Dryer Sheets Geranium Scent, Clean D...
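
The steps above build each basket by joining product names into a string and splitting it again. As an alternative sketch, the baskets can be collected directly from the grouped column, which avoids the round trip through a string and leaves commas inside product names untouched (unlike the approach above, which replaces them with spaces). The `toy` dataframe is a hypothetical stand-in for merged_df.

```python
import pandas as pd

# Hypothetical miniature stand-in for merged_df
toy = pd.DataFrame(
    {
        "order_id": [2, 2, 3, 3, 3],
        "product_name": [
            "Organic Egg Whites",
            "Carrots",
            "Limes",
            "Large Lemon",
            "Salsa, Mild",  # a comma inside the name survives intact
        ],
    }
)

# One list of product names per order, in order_id order
baskets = toy.groupby("order_id")["product_name"].apply(list)

# Convert to the list-of-tuples shape used for prod_list above
transactions = [tuple(items) for items in baskets]
print(transactions)
```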


Generating apriori association rules.

In [12]:
!pip install akapriori
from akapriori import apriori

Collecting akapriori
  Downloading akapriori-0.1.0.tar.gz (1.9 kB)
Building wheels for collected packages: akapriori
  Building wheel for akapriori (setup.py) ... done
  Created wheel for akapriori: filename=akapriori-0.1.0-py3-none-any.whl size=2321 sha256=3e5c0ed1dc8e493fb68311751521d3ed61523aa2f87ca810ea413e336675ce6e
  Stored in directory: /root/.cache/pip/wheels/79/2f/2a/7d3c985d044f834811f6522d0f3adfee704bd0d9f443abaf54
Successfully built akapriori
Installing collected packages: akapriori
Successfully installed akapriori-0.1.0


In [13]:
rules = apriori(list(g2["prod_list"]), support=0.008, confidence=0.1)
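
# A rough sense of scale for the support threshold, assuming `support` is read
# as a fraction of all orders (the supports printed by the notebook are
# consistent with that reading, but it is an assumption about akapriori).

```python
import math

n_orders = 3_214_874   # number of orders, from the length of g1 shown earlier
min_support = 0.008    # the support value passed to apriori(...)

# Smallest number of orders an itemset must appear in to reach the threshold
min_orders = math.ceil(min_support * n_orders)
print(min_orders)
```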

In [14]:
rules_sorted = sorted(rules, key=lambda x: (x[4], x[3], x[2]), reverse=True) # ORDER BY lift DESC, confidence DESC, support DESC

In [15]:
len(rules_sorted)

36
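
The sort key above treats positions 2, 3 and 4 of each rule as support, confidence and lift. Assuming positions 0 and 1 are the antecedent and consequent item sets, the rules can also be loaded into a dataframe for easier reading. This is a sketch only: `sample_rules` holds two rules copied from this notebook's printed results, and the column names are chosen here for illustration.

```python
import pandas as pd

# Two rules copied from the notebook's printed results (the rest omitted)
sample_rules = [
    (frozenset({"Organic Fuji Apple"}), frozenset({"Banana"}),
     0.01055811207530995, 0.3786928775437344, 2.5762591093300085),
    (frozenset({"Large Lemon"}), frozenset({"Limes"}),
     0.008523817729715067, 0.1795069993514873, 4.10370970747519),
]

# Column names are assumptions based on the sort-key comment
columns = ["antecedent", "consequent", "support", "confidence", "lift"]
rules_df = pd.DataFrame(sample_rules, columns=columns)

# Render the frozensets as plain strings for display
for col in ("antecedent", "consequent"):
    rules_df[col] = rules_df[col].apply(lambda items: ", ".join(sorted(items)))

# Same ordering as rules_sorted: lift, then confidence, then support, descending
rules_df = rules_df.sort_values(
    ["lift", "confidence", "support"], ascending=False
).reset_index(drop=True)
print(rules_df)
```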

In [16]:
for r in rules_sorted:
    print('\n')
    print(r)
    print('**********') 



(frozenset({'Large Lemon'}), frozenset({'Limes'}), 0.008523817729715067, 0.1795069993514873, 4.10370970747519)
**********


(frozenset({'Limes'}), frozenset({'Large Lemon'}), 0.008523817729715067, 0.19486300639279797, 4.103709707475189)
**********


(frozenset({'Organic Raspberries'}), frozenset({'Organic Strawberries'}), 0.01053322774080726, 0.24707238594161554, 3.0009732007029744)
**********


(frozenset({'Organic Strawberries'}), frozenset({'Organic Raspberries'}), 0.01053322774080726, 0.12793794841376288, 3.0009732007029744)
**********


(frozenset({'Organic Hass Avocado'}), frozenset({'Organic Raspberries'}), 0.008023020497848438, 0.12076279122031613, 2.8326693102988)
**********


(frozenset({'Organic Raspberries'}), frozenset({'Organic Hass Avocado'}), 0.008023020497848438, 0.18819177422532232, 2.8326693102987996)
**********


(frozenset({'Organic Fuji Apple'}), frozenset({'Banana'}), 0.01055811207530995, 0.3786928775437344, 2.5762591093300085)
**********


(frozenset({'Organic

Thus, large lemons are associated with limes (lift of about 4.1), organic raspberries with organic strawberries (lift of about 3.0), and so on.
These association rules can serve as the basis for a simple recommendation system built on the given dataset.
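
As a consistency check on the printed rules, the standard definitions (confidence = support(A and B) / support(A), lift = confidence / support(B)) let us recover how often each single item appears. The sketch below assumes the rule tuples follow the layout (antecedent, consequent, support, confidence, lift) and uses the two lemon/lime rules from the output above; `implied_supports` is a helper name introduced here for illustration.

```python
import math

# The two lemon/lime rules copied from the output above
rules = [
    (frozenset({"Large Lemon"}), frozenset({"Limes"}),
     0.008523817729715067, 0.1795069993514873, 4.10370970747519),
    (frozenset({"Limes"}), frozenset({"Large Lemon"}),
     0.008523817729715067, 0.19486300639279797, 4.103709707475189),
]

def implied_supports(rule):
    """Return (support of antecedent, support of consequent) implied by a rule."""
    _, _, support, confidence, lift = rule
    return support / confidence, confidence / lift

lemon_as_antecedent, limes_as_consequent = implied_supports(rules[0])
limes_as_antecedent, lemon_as_consequent = implied_supports(rules[1])

# Both rules should imply the same single-item supports
print(math.isclose(lemon_as_antecedent, lemon_as_consequent, rel_tol=1e-6))
print(math.isclose(limes_as_antecedent, limes_as_consequent, rel_tol=1e-6))
print(round(limes_as_consequent, 4), round(lemon_as_consequent, 4))
```

If the two directions of a rule disagreed on these implied supports, that would suggest the tuple layout assumed here is wrong.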