# Background

In this demo project, we use a dataset from a French store that records the transactions of individual customers. Our aim is to find the pairs of items most often bought together, which the shop owner can then use to present a 'buy item 1 and get item 2 for free' offer to future customers. This project is a typical example of market basket analysis.

# Importing libraries and dataset

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
df = pd.read_csv('Market_Basket_Optimisation.csv',header=None)

In [3]:
df.head()

,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [5]:
#Installing the apyori package used in this exercise (run once if it is missing)
# pip install apyori

# Data Pre-Processing

In [6]:
#The apyori implementation of Apriori expects the data as a list of transactions
#Using the imported dataset, let us create this list first

transactions = []
for i in range(0,len(df)):
    #the inner list comprehension collects the value in each of the 20 columns
    #str() is applied because apyori expects string items; empty cells become the string 'nan'
    transactions.append([str(df.values[i,j]) for j in range(0,20)])

In [7]:
#Let us check how the first 10 observations of our dataframe are stored in the list
for i in transactions[:10]:
    print(i)
    print()

['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil']

['burgers', 'meatballs', 'eggs', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']

['chutney', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']

['turkey', 'avocado', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']

['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']

['low fat yogurt', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'
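
Because the baskets have different lengths, the padded cells come through as the string 'nan' in the lists above. A minimal sketch of one way to leave them out follows; `drop_missing` and the toy `rows` are hypothetical names used only for illustration, and with the notebook's data the input would be `df.values.tolist()`. A 'nan' pseudo-item appears in almost every transaction, so its lift stays close to 1 and the rules reported later should be the same either way; the benefit is mainly cleaner input.

```python
import math

def drop_missing(rows):
    """Turn rows of cells into transactions, skipping NaN padding."""
    return [
        [str(cell) for cell in row
         if not (isinstance(cell, float) and math.isnan(cell))]
        for row in rows
    ]

# Toy rows shaped like the CSV: short baskets are padded with NaN
rows = [
    ['burgers', 'meatballs', 'eggs'],
    ['chutney', float('nan'), float('nan')],
    ['turkey', 'avocado', float('nan')],
]

transactions_clean = drop_missing(rows)
print(transactions_clean)
```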

# Training the Apriori Model on the Dataset

- For support, we consider only items purchased at least 3 times a day over an entire week (a rule of thumb), i.e. 3 × 7 = 21 of the 7,501 transactions, which gives a minimum support of 21/7501 ≈ 0.0028
- For confidence, the default value in other packages is usually 0.8. However, we want to select our final rules from a large pool of possible rules, so we lower the threshold to 0.2
- For lift, we use a threshold of 3 to keep only good-quality rules
- We are using this problem to generate possible combo offers (buy 1 & get 1 free), so each rule should pair exactly one item with one other item and contain no more than 2 items. Hence, we also pass the min_length and max_length kwargs
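
To make the three thresholds concrete, here is a small sketch that computes support, confidence and lift by hand on a made-up list of five baskets. The `support` helper and the `toy` data are assumptions for illustration only; they are not part of apyori or of the dataset.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

# Made-up baskets, not taken from the dataset
toy = [
    ['pasta', 'shrimp'],
    ['pasta', 'shrimp', 'milk'],
    ['pasta', 'milk'],
    ['milk'],
    ['eggs'],
]

# Rule: pasta -> shrimp
sup_both = support(toy, ['pasta', 'shrimp'])      # how often the pair occurs
confidence = sup_both / support(toy, ['pasta'])   # P(shrimp | pasta)
lift = confidence / support(toy, ['shrimp'])      # P(shrimp | pasta) / P(shrimp)
print(sup_both, confidence, lift)

# The support threshold described in the first bullet
min_support = (3 * 7) / 7501
print(round(min_support, 4))
```

A lift above 1 means the two items occur together more often than they would if they were bought independently, which is why a threshold of 3 is a fairly strict filter.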

In [8]:
from apyori import apriori

rules = apriori(transactions=transactions, min_support = (3*7)/7501, min_confidence = 0.2, min_lift = 3,
               min_length = 2, max_length = 2)

# Visualising the results 

## 1. Displaying the first results coming directly from the output of the apriori function

In [9]:
results = list(rules)

In [10]:
for i in results:
    print(i)
    print()

RelationRecord(items=frozenset({'chicken', 'extra dark chocolate'}), support=0.0027996267164378083, ordered_statistics=[OrderedStatistic(items_base=frozenset({'extra dark chocolate'}), items_add=frozenset({'chicken'}), confidence=0.23333333333333334, lift=3.8894074074074076)])

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])

RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)])

RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope

**Interpretation:** 
Here, we need to read carefully which item is the base and which is the addition in each rule. For example, in the first rule extra dark chocolate is the base item: about 23% of the customers who buy extra dark chocolate also buy chicken (the confidence), which makes them roughly 3.9 times as likely to buy chicken as customers in general (the lift).
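
As a sanity check on this reading, the figures of the first rule can be turned back into transaction counts. The sketch below assumes the dataset holds 7,501 transactions (the denominator used for min_support); the counts are inferred from the printed figures, not read from the data.

```python
import math

n_transactions = 7501  # dataset size assumed from the min_support expression

# Figures copied from the first printed rule
support_pair = 0.0027996267164378083
confidence = 0.23333333333333334
lift = 3.8894074074074076

# Counts inferred from those figures (not read from the dataset)
both = round(support_pair * n_transactions)         # baskets with both items
base = round(both / confidence)                     # baskets with extra dark chocolate
added = round(confidence / lift * n_transactions)   # baskets with chicken
print(both, base, added)

# Recomputing the metrics from the counts should reproduce the printed values
confidence_check = both / base
lift_check = confidence_check / (added / n_transactions)
print(math.isclose(confidence_check, confidence), math.isclose(lift_check, lift))
```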

## 2. Putting the results into a well-organised pandas DataFrame object

**Method 1**

In [11]:
cols = ['items','support','items_base','items_add','confidence','lift']
df_results = pd.DataFrame(columns=cols)

In [12]:
def splitter(text):
    #Parses the printed form of one RelationRecord; every extracted value stays a string
    items = text.partition('items=frozenset({')[2].split('}',1)[0]
    support = text.partition('support=')[2].split(',',1)[0]
    items_base = text.partition('items_base=frozenset({')[2].split('}',1)[0]
    items_add = text.partition('items_add=frozenset({')[2].split('}',1)[0]
    confidence = text.partition('confidence=')[2].split(',',1)[0]
    lift = text.partition('lift=')[2].split(')])')[0]
    return pd.DataFrame([[items,support,items_base,items_add,confidence,lift]],columns=cols)

In [13]:
for i in results:
    df_results = pd.concat([df_results,splitter(str(i))], axis=0)

In [14]:
df_results.reset_index(drop=True,inplace=True)

**Method 2**

In [15]:
cols = ['items','support','items_base','items_add','confidence','lift']
df2_results = pd.DataFrame(columns=cols)

In [16]:
def inspect(row):
    #Reads the fields of results[row] by position
    #With max_length = 2, each rule has exactly one base item and one added item
    items = results[row][0]
    support = results[row][1]
    items_base = tuple(results[row][2][0][0])[0]
    items_add = tuple(results[row][2][0][1])[0]
    confidence = results[row][2][0][2]
    lift = results[row][2][0][3]
    return pd.DataFrame([[items,support,items_base,items_add,confidence,lift]],columns=cols)

In [17]:
for i in range(0,len(results)):
    df2_results = pd.concat([df2_results,inspect(i)],axis=0)

In [18]:
df2_results.reset_index(drop=True,inplace=True)
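
Both methods grow the DataFrame one rule at a time. As a possible alternative, the sketch below reads the fields by name and collects all rows in a single pass. The two namedtuples are stand-ins that copy the field names visible in the printed records, so the example runs without apyori installed; `to_rows` and `results_demo` are hypothetical names.

```python
from collections import namedtuple

# Stand-ins with the same field names as the records printed earlier
RelationRecord = namedtuple('RelationRecord', 'items support ordered_statistics')
OrderedStatistic = namedtuple('OrderedStatistic', 'items_base items_add confidence lift')

results_demo = [
    RelationRecord(
        frozenset({'chicken', 'extra dark chocolate'}), 0.0027996267164378083,
        [OrderedStatistic(frozenset({'extra dark chocolate'}), frozenset({'chicken'}),
                          0.23333333333333334, 3.8894074074074076)]),
    RelationRecord(
        frozenset({'light cream', 'chicken'}), 0.004532728969470737,
        [OrderedStatistic(frozenset({'light cream'}), frozenset({'chicken'}),
                          0.29059829059829057, 4.84395061728395)]),
]

def to_rows(records):
    """Flatten each record into one dict per rule, keeping numbers numeric."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                'items': tuple(sorted(record.items)),
                'support': record.support,
                'items_base': ', '.join(sorted(stat.items_base)),
                'items_add': ', '.join(sorted(stat.items_add)),
                'confidence': stat.confidence,
                'lift': stat.lift,
            })
    return rows

rule_rows = to_rows(results_demo)
print(rule_rows[0])
```

Passing `rule_rows` to `pd.DataFrame` then gives one row per rule with numeric support, confidence and lift columns, without calling `pd.concat` once per rule.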

## 3. Displaying the results - Before sorting

In [19]:
#Method 1 output
df_results

,items,support,items_base,items_add,confidence,lift
0,"'chicken', 'extra dark chocolate'",0.0027996267164378,'extra dark chocolate','chicken',0.2333333333333333,3.889407407407408
1,"'light cream', 'chicken'",0.0045327289694707,'light cream','chicken',0.2905982905982905,4.84395061728395
2,"'mushroom cream sauce', 'escalope'",0.0057325689908012,'mushroom cream sauce','escalope',0.3006993006993007,3.790832696715049
3,"'pasta', 'escalope'",0.0058658845487268,'pasta','escalope',0.3728813559322034,4.700811850163794
4,"'fromage blanc', 'honey'",0.0033328889481402,'fromage blanc','honey',0.2450980392156863,5.164270764485569
5,"'ground beef', 'herb & pepper'",0.0159978669510731,'herb & pepper','ground beef',0.3234501347708895,3.2919938411349285
6,"'ground beef', 'tomato sauce'",0.0053326223170243,'tomato sauce','ground beef',0.3773584905660377,3.840659481324083
7,"'light cream', 'olive oil'",0.0031995733902146,'light cream','olive oil',0.2051282051282051,3.1147098515519573
8,"'whole wheat pasta', 'olive oil'",0.0079989334755365,'whole wheat pasta','olive oil',0.2714932126696833,4.122410097642296
9,"'pasta', 'shrimp'",0.0050659912011731,'pasta','shrimp',0.3220338983050847,4.506672147735896


In [20]:
#Method 2 output
df2_results

,items,support,items_base,items_add,confidence,lift
0,"(chicken, extra dark chocolate)",0.0028,extra dark chocolate,chicken,0.233333,3.889407
1,"(light cream, chicken)",0.004533,light cream,chicken,0.290598,4.843951
2,"(mushroom cream sauce, escalope)",0.005733,mushroom cream sauce,escalope,0.300699,3.790833
3,"(pasta, escalope)",0.005866,pasta,escalope,0.372881,4.700812
4,"(fromage blanc, honey)",0.003333,fromage blanc,honey,0.245098,5.164271
5,"(ground beef, herb & pepper)",0.015998,herb & pepper,ground beef,0.32345,3.291994
6,"(ground beef, tomato sauce)",0.005333,tomato sauce,ground beef,0.377358,3.840659
7,"(light cream, olive oil)",0.0032,light cream,olive oil,0.205128,3.11471
8,"(whole wheat pasta, olive oil)",0.007999,whole wheat pasta,olive oil,0.271493,4.12241
9,"(pasta, shrimp)",0.005066,pasta,shrimp,0.322034,4.506672


## 4. Displaying the results - After sorting

In [21]:
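#Method 1 stores every value as a string, so we sort on the numeric value of lift
df_results.sort_values(by='lift',ascending=False,key=lambda s: s.astype(float))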

,items,support,items_base,items_add,confidence,lift
4,"'fromage blanc', 'honey'",0.0033328889481402,'fromage blanc','honey',0.2450980392156863,5.164270764485569
1,"'light cream', 'chicken'",0.0045327289694707,'light cream','chicken',0.2905982905982905,4.84395061728395
3,"'pasta', 'escalope'",0.0058658845487268,'pasta','escalope',0.3728813559322034,4.700811850163794
9,"'pasta', 'shrimp'",0.0050659912011731,'pasta','shrimp',0.3220338983050847,4.506672147735896
8,"'whole wheat pasta', 'olive oil'",0.0079989334755365,'whole wheat pasta','olive oil',0.2714932126696833,4.122410097642296
0,"'chicken', 'extra dark chocolate'",0.0027996267164378,'extra dark chocolate','chicken',0.2333333333333333,3.889407407407408
6,"'ground beef', 'tomato sauce'",0.0053326223170243,'tomato sauce','ground beef',0.3773584905660377,3.840659481324083
2,"'mushroom cream sauce', 'escalope'",0.0057325689908012,'mushroom cream sauce','escalope',0.3006993006993007,3.790832696715049
5,"'ground beef', 'herb & pepper'",0.0159978669510731,'herb & pepper','ground beef',0.3234501347708895,3.2919938411349285
7,"'light cream', 'olive oil'",0.0031995733902146,'light cream','olive oil',0.2051282051282051,3.1147098515519573
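
One caveat about sorting the Method 1 table: its values were parsed from text, so they are strings, and strings sort character by character rather than by magnitude. The ordering above is still correct because every lift has a single digit before the decimal point. The sketch below shows the difference; the value '12.5' is made up to expose the problem.

```python
# Two lift values from the table plus one made-up two-digit value
lifts_as_text = ['5.164270764485569', '3.1147098515519573', '12.5']

# Plain string sort: '12.5' lands last because '1' < '3' < '5'
text_order = sorted(lifts_as_text, reverse=True)

# Numeric sort: compare the float value of each string
numeric_order = sorted(lifts_as_text, key=float, reverse=True)

print(text_order)
print(numeric_order)
```

Method 2 keeps the numbers as floats, so sorting `df2_results` by lift does not have this issue.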


# Conclusion:

From the above market basket analysis output, we can suggest the following top 3 offers, ranked by lift, to the owner:
- Buy 'fromage blanc' and get 'honey' for free
- Buy 'light cream' and get 'chicken' for free
- Buy 'pasta' and get 'escalope' for free

**Note:** This project was done as a follow-along to the Udemy course at https://www.udemy.com/course/machinelearning/