# Market Basket Optimization using Apriori

**Scenario**: We are given one week of transaction data in which each row lists the grocery items purchased in a single transaction.

**Goal**: Find the strongest associations between items to inform retail decisions.

**Results**: 

There are many pairs of grocery items that can be considered for retail decisions. 

In this dataset, fromage blanc and honey have the strongest association, with a lift of 5.16: customers who purchase fromage blanc are about 5.2 times more likely to also purchase honey than customers in general. The confidence of 24.5% means that roughly one in four transactions containing fromage blanc also contains honey. The support of 0.33% shows that the pair appears in only a small fraction of all transactions, less often than most of the other pairings.

## Importing the libraries

In [1]:
!pip install apyori



In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import warnings

warnings.filterwarnings("ignore")

## Data Preprocessing

The data file has no header row, so we pass `header = None` to the `read_csv` function. Otherwise, the first transaction would be read as column names.

In [3]:
dataset = pd.read_csv("Market_Basket_Optimisation.csv", header = None)

In [4]:
len(dataset)

7501

In [5]:
len(dataset.columns)

20

In [6]:
dataset.head(1)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil


In [7]:
transactions = []
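# Build one list of item strings per row; str() turns empty cells (NaN) into the string 'nan'.
n_rows, n_cols = dataset.shape
for i in range(n_rows):
    transactions.append([str(dataset.values[i, j]) for j in range(n_cols)])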

In [8]:
print(transactions[:3])

[['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil'], ['burgers', 'meatballs', 'eggs', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], ['chutney', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']]
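
The `'nan'` strings in the output above are padding for rows with fewer than 20 items, not real products. The notebook keeps them, and they do not appear in the final rules. If you would rather drop them before training, a minimal sketch could look like the following. The names `sample` and `cleaned` are hypothetical, and the two rows are copied from the printed output.

```python
# Two padded rows copied from the printed output above (padding shortened for brevity).
sample = [
    ["burgers", "meatballs", "eggs", "nan", "nan", "nan"],
    ["chutney", "nan", "nan", "nan", "nan", "nan"],
]

# Keep only real items; 'nan' is the string produced by str() on an empty cell.
cleaned = [[item for item in basket if item != "nan"] for basket in sample]

print(cleaned)
```

Because support is measured against the total number of transactions, removing the padding should leave the support, confidence, and lift of the real item pairs unchanged.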


## Training the Apriori model on the dataset

We choose a minimum support of 3 transactions per day. Since the dataset covers a full week, this translates into `3 transactions * 7 days / 7501 transactions` ≈ `0.0028`, which we round to `0.003`.
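
The arithmetic behind the threshold can be checked with a short sketch. The variable names here are illustrative only; the numbers are the ones stated above.

```python
# Minimum support: items bought at least 3 times per day over a 7-day week,
# relative to the 7501 transactions in the dataset.
transactions_per_day = 3
days = 7
total_transactions = 7501

min_support = transactions_per_day * days / total_transactions

print(min_support)            # exact ratio, slightly below 0.003
print(round(min_support, 3))  # value passed to the model
```

Rounding up to `0.003` makes the threshold slightly stricter than 21 transactions per week.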

Since we want to find exactly one strongly associated item for a given item, we set both the minimum and maximum lengths to 2. Each rule then covers a pair of products, which suits a promotion such as "buy product A, get product B for free".

In [9]:
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)

## Visualising the results

### Displaying the raw results returned by the apriori function

In [10]:
results = list(rules)
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

The left-hand side of each rule (the item already in the basket) is denoted by `items_base`, while the right-hand side (the item the rule predicts) is denoted by `items_add`.

In other words, using the first row as an example with the set {'chicken', 'light cream'}: if a customer purchases `items_base` = `light cream`, then there is a 29% chance (confidence = 0.29) that they also purchase `items_add` = `chicken`.
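
To make the three metrics concrete, here is a minimal sketch that computes support, confidence, and lift by hand. The five-basket `toy` dataset and the `support` helper are made up for illustration; they are not the notebook's data or part of `apyori`.

```python
# Hypothetical toy baskets, invented for illustration only.
toy = [
    {"light cream", "chicken"},
    {"light cream"},
    {"chicken", "eggs"},
    {"eggs"},
    {"light cream", "chicken", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

base, add = {"light cream"}, {"chicken"}

pair_support = support(base | add, toy)           # how often both items appear together
confidence = pair_support / support(base, toy)    # P(add | base)
lift = confidence / support(add, toy)             # confidence relative to the baseline P(add)

print(pair_support, confidence, lift)
```

A lift above 1 means the right-hand item is bought more often alongside the left-hand item than it is overall.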

### Organising the results into a Pandas DataFrame

In [19]:
def inspect(results):
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
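
The index-based lookups in `inspect` (for example `result[2][0][3]`) can be hard to read. As an alternative sketch, the same fields can be accessed by the names that appear in the printed records above (`support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`). The namedtuples below are stand-ins that mirror those names so the example runs without `apyori`, and `inspect_named` is a hypothetical helper, not part of the library.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the printed output above.
# With apyori installed, the real records from list(rules) would be used instead.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

def inspect_named(records):
    """Return one (lhs, rhs, support, confidence, lift) row per ordered statistic."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return rows

# First record copied from the raw output above.
sample_records = [
    RelationRecord(
        items=frozenset({"chicken", "light cream"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"light cream"}),
                items_add=frozenset({"chicken"}),
                confidence=0.29059829059829057,
                lift=4.84395061728395,
            )
        ],
    )
]

rows = inspect_named(sample_records)
print(rows)
```

Unlike `inspect`, this variant emits a row for every ordered statistic in a record rather than only the first, which matters if a record ever holds more than one rule.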

### Displaying the unsorted results

In [20]:
resultsinDataFrame

Unnamed: 0,Left Hand Side,Right Hand Side,Support,Confidence,Lift
0,light cream,chicken,0.004533,0.290598,4.843951
1,mushroom cream sauce,escalope,0.005733,0.300699,3.790833
2,pasta,escalope,0.005866,0.372881,4.700812
3,fromage blanc,honey,0.003333,0.245098,5.164271
4,herb & pepper,ground beef,0.015998,0.32345,3.291994
5,tomato sauce,ground beef,0.005333,0.377358,3.840659
6,light cream,olive oil,0.0032,0.205128,3.11471
7,whole wheat pasta,olive oil,0.007999,0.271493,4.12241
8,pasta,shrimp,0.005066,0.322034,4.506672


### Displaying the results sorted by descending lift

In [24]:
resultsinDataFrame.nlargest(n = 10, columns = "Lift")

Unnamed: 0,Left Hand Side,Right Hand Side,Support,Confidence,Lift
3,fromage blanc,honey,0.003333,0.245098,5.164271
0,light cream,chicken,0.004533,0.290598,4.843951
2,pasta,escalope,0.005866,0.372881,4.700812
8,pasta,shrimp,0.005066,0.322034,4.506672
7,whole wheat pasta,olive oil,0.007999,0.271493,4.12241
5,tomato sauce,ground beef,0.005333,0.377358,3.840659
1,mushroom cream sauce,escalope,0.005733,0.300699,3.790833
4,herb & pepper,ground beef,0.015998,0.32345,3.291994
6,light cream,olive oil,0.0032,0.205128,3.11471


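The `Confidence` column shows the likelihood that the right-hand side product is purchased when the customer purchases the left-hand side product. The `Lift` column shows how many times higher that likelihood is than the right-hand side product's overall purchase rate, so a higher lift indicates a stronger association. One business application is to apply a sale or promotion to the right-hand side product whenever the left-hand side product is purchased.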
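
Lift is the confidence divided by the overall support of the right-hand side item, so the columns can be related to each other. As a rough check on the fromage blanc and honey row, the sketch below backs out honey's overall purchase rate and the number of transactions containing both items. The variable names are illustrative; the numbers come from the table and the raw output above.

```python
n_transactions = 7501

# Values for the rule fromage blanc -> honey, taken from the results above.
support_pair = 0.003332888948140248
confidence = 0.245098
lift = 5.164271

# lift = confidence / support(honey), so support(honey) = confidence / lift.
support_honey = confidence / lift

# Number of transactions that contain both items.
pair_count = round(support_pair * n_transactions)

print(f"honey appears in about {support_honey:.1%} of all transactions")
print(f"fromage blanc and honey appear together in {pair_count} transactions")
```

This shows why a high lift can coexist with a modest confidence: honey is bought in fewer than 5% of all transactions, so a 24.5% purchase rate among fromage blanc buyers is a large relative increase.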

Fromage blanc and honey are used together in a popular French dessert recipe. For example, a promotional bundle could apply a discount, or a fixed reduced price, when a customer buys a fromage blanc product and a honey product together.

This technique can be applied to any of the pairs of items shown above. 