**OBJECTIVE: TO APPLY THE APRIORI ALGORITHM TO A TRANSACTION DATASET**

The Apriori algorithm finds frequent itemsets in transaction data, from which association rules between items can be derived. An association rule describes how the presence of some items in a transaction relates to the presence of others. Apriori is a form of association rule learning: it surfaces patterns such as "people who bought product A also tended to buy product B".

In [1]:
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder
import time

In [54]:
data = pd.read_csv("/Users/prashastisaraf/Downloads/Groceries_dataset-2.csv")
data

Unnamed: 0,Member_number,Date,itemDescription
0,1000,27/05/15,soda
1,1000,15/03/15,whole milk
2,1000,24/06/14,whole milk
3,1000,15/03/15,yogurt
4,1001,12/12/14,whole milk
...,...,...,...
57,1014,03/10/15,yogurt
58,1014,06/10/14,rolls/buns
59,1015,04/05/15,citrus fruit
60,1015,04/05/15,whole milk


In [28]:
data.isnull().values.any()

False

In [40]:
## Combine all the items bought by each customer into one list per customer

transactions = [group['itemDescription'].tolist()
                for _, group in data.groupby('Member_number')]

In [41]:
transactions

[['soda', 'whole milk', 'whole milk', 'yogurt'],
 ['whole milk', 'soda', 'whole milk', 'soda', 'rolls/buns'],
 ['tropical fruit', 'whole milk', 'other vegetables'],
 ['root vegetables', 'rolls/buns', 'rolls/buns', 'rolls/buns'],
 ['other vegetables',
  'root vegetables',
  'rolls/buns',
  'whole milk',
  'other vegetables',
  'whole milk',
  'whole milk',
  'tropical fruit',
  'rolls/buns'],
 ['rolls/buns', 'rolls/buns'],
 ['whole milk', 'whole milk', 'rolls/buns', 'rolls/buns'],
 ['tropical fruit', 'soda', 'yogurt', 'root vegetables', 'yogurt'],
 ['tropical fruit', 'yogurt', 'yogurt'],
 ['rolls/buns'],
 ['whole milk', 'citrus fruit', 'other vegetables', 'yogurt', 'rolls/buns'],
 ['tropical fruit', 'root vegetables', 'yogurt', 'whole milk', 'rolls/buns'],
 ['whole milk',
  'tropical fruit',
  'root vegetables',
  'other vegetables',
  'whole milk'],
 ['whole milk', 'whole milk', 'yogurt', 'rolls/buns'],
 ['citrus fruit', 'whole milk', 'rolls/buns']]

In [42]:
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

Unnamed: 0,citrus fruit,other vegetables,rolls/buns,root vegetables,soda,tropical fruit,whole milk,yogurt
0,False,False,False,False,True,False,True,True
1,False,False,True,False,True,False,True,False
2,False,True,False,False,False,True,True,False
3,False,False,True,True,False,False,False,False
4,False,True,True,True,False,True,True,False
5,False,False,True,False,False,False,False,False
6,False,False,True,False,False,False,True,False
7,False,False,False,True,True,True,False,True
8,False,False,False,False,False,True,False,True
9,False,False,True,False,False,False,False,False


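Here, the TransactionEncoder object converts the list of transactions into a one-hot encoded boolean DataFrame: one row per customer and one column per item, with True where the customer bought that item at least once. This is the input format that apriori expects.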
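As a rough illustration of what this encoding step produces, the sketch below rebuilds the boolean table in plain Python for the first three customers listed above. It is not mlxtend's implementation. The names `columns` and `encoded` are made up for this example, and the alphabetical column order is an assumption chosen to match the DataFrame shown above.

```python
# Item lists of the first three customers, copied from the output above.
transactions = [
    ["soda", "whole milk", "whole milk", "yogurt"],
    ["whole milk", "soda", "whole milk", "soda", "rolls/buns"],
    ["tropical fruit", "whole milk", "other vegetables"],
]

# One column per distinct item, sorted alphabetically (assumed ordering).
columns = sorted({item for basket in transactions for item in basket})

# One row per customer: True if the item occurs at least once.
# Repeated purchases of the same item collapse into a single True.
encoded = [[item in set(basket) for item in columns] for basket in transactions]

print(columns)
for row in encoded:
    print(row)
```

Note that the encoding discards purchase counts: a customer who bought whole milk twice gets the same True as one who bought it once.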

In [46]:
apriori(df, min_support=0.2, use_colnames=True)

Unnamed: 0,support,itemsets
0,0.266667,(other vegetables)
1,0.666667,(rolls/buns)
2,0.333333,(root vegetables)
3,0.2,(soda)
4,0.4,(tropical fruit)
5,0.666667,(whole milk)
6,0.4,(yogurt)
7,0.2,"(tropical fruit, other vegetables)"
8,0.266667,"(other vegetables, whole milk)"
9,0.2,"(root vegetables, rolls/buns)"


A minimum support of 0.2 means that we keep only itemsets that appear in at least 20% of the transactions in the dataset. With the 15 transactions used here, that is at least 3 transactions.
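To make the support threshold concrete, the following sketch counts support by hand over the 15 per-customer item lists printed earlier. The helper `support_count` is a hypothetical name introduced for this illustration and is not part of mlxtend.

```python
# The 15 per-customer item lists shown in the notebook output above.
transactions = [
    ["soda", "whole milk", "whole milk", "yogurt"],
    ["whole milk", "soda", "whole milk", "soda", "rolls/buns"],
    ["tropical fruit", "whole milk", "other vegetables"],
    ["root vegetables", "rolls/buns", "rolls/buns", "rolls/buns"],
    ["other vegetables", "root vegetables", "rolls/buns", "whole milk",
     "other vegetables", "whole milk", "whole milk", "tropical fruit",
     "rolls/buns"],
    ["rolls/buns", "rolls/buns"],
    ["whole milk", "whole milk", "rolls/buns", "rolls/buns"],
    ["tropical fruit", "soda", "yogurt", "root vegetables", "yogurt"],
    ["tropical fruit", "yogurt", "yogurt"],
    ["rolls/buns"],
    ["whole milk", "citrus fruit", "other vegetables", "yogurt", "rolls/buns"],
    ["tropical fruit", "root vegetables", "yogurt", "whole milk", "rolls/buns"],
    ["whole milk", "tropical fruit", "root vegetables", "other vegetables",
     "whole milk"],
    ["whole milk", "whole milk", "yogurt", "rolls/buns"],
    ["citrus fruit", "whole milk", "rolls/buns"],
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset` (hypothetical helper)."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= set(basket))


n = len(transactions)
min_support = 0.2  # same threshold as in the apriori call above

for itemset in [{"soda"}, {"citrus fruit"}, {"other vegetables", "whole milk"}]:
    count = support_count(itemset, transactions)
    support = count / n
    # Print the itemset, its raw count, its support, and whether it is frequent.
    print(sorted(itemset), count, round(support, 6), support >= min_support)
```

Under these counts, soda sits exactly on the threshold (3 of 15 customers), while citrus fruit (2 of 15) falls below it, which is why it is absent from the frequent itemsets.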

In [48]:
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
frequent_itemsets

Unnamed: 0,support,itemsets,length
0,0.266667,(other vegetables),1
1,0.666667,(rolls/buns),1
2,0.333333,(root vegetables),1
3,0.2,(soda),1
4,0.4,(tropical fruit),1
5,0.666667,(whole milk),1
6,0.4,(yogurt),1
7,0.2,"(tropical fruit, other vegetables)",2
8,0.266667,"(other vegetables, whole milk)",2
9,0.2,"(root vegetables, rolls/buns)",2


In [None]:
# Select the frequent itemsets that contain exactly 3 items

frequent_itemsets[(frequent_itemsets['length'] == 3) &
                   (frequent_itemsets['support'] >= 0.2)]

In association rule mining, once we have identified frequent itemsets using Apriori, we can generate association rules from these frequent itemsets. 
Association rules are in the form of "IF antecedent THEN consequent," and they represent relationships between different items in a transaction.
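The step from a frequent itemset to candidate rules can be sketched as follows: every way of splitting the itemset into a non-empty antecedent and a non-empty consequent is a candidate, and each candidate is then kept or dropped according to a metric such as confidence. The function `candidate_rules` is a hypothetical helper written for this illustration; it is not how mlxtend is implemented internally.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of `itemset` (hypothetical helper)."""
    items = sorted(itemset)
    # Antecedent sizes run from 1 to len(items) - 1 so both sides stay non-empty.
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


splits = list(candidate_rules({"other vegetables", "tropical fruit", "whole milk"}))
for antecedent, consequent in splits:
    print("IF", antecedent, "THEN", consequent)
```

An itemset with k items yields 2^k - 2 candidate rules, so the three-item set above produces six.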

In [58]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
# rules.head()
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(other vegetables),(tropical fruit),0.266667,0.4,0.2,0.75,1.875,0.093333,2.4
1,(other vegetables),(whole milk),0.266667,0.666667,0.266667,1.0,1.5,0.088889,inf
2,(rolls/buns),(whole milk),0.666667,0.666667,0.466667,0.7,1.05,0.022222,1.111111
3,(whole milk),(rolls/buns),0.666667,0.666667,0.466667,0.7,1.05,0.022222,1.111111
4,(root vegetables),(tropical fruit),0.333333,0.4,0.266667,0.8,2.0,0.133333,3.0
5,"(tropical fruit, other vegetables)",(whole milk),0.2,0.666667,0.2,1.0,1.5,0.066667,inf
6,"(tropical fruit, whole milk)",(other vegetables),0.266667,0.266667,0.2,0.75,2.8125,0.128889,2.933333
7,"(other vegetables, whole milk)",(tropical fruit),0.266667,0.4,0.2,0.75,1.875,0.093333,2.4
8,(other vegetables),"(tropical fruit, whole milk)",0.266667,0.266667,0.2,0.75,2.8125,0.128889,2.933333
9,"(rolls/buns, yogurt)",(whole milk),0.2,0.666667,0.2,1.0,1.5,0.066667,inf


The confidence of a rule is a conditional probability: the probability that a transaction contains the consequent given that it already contains the antecedent. In other words, of all the transactions that contain the antecedent, it is the fraction that also contain the consequent.
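Written as a formula, with A as the antecedent and C as the consequent (symbols chosen here for illustration), and worked through for the rule (root vegetables) -> (tropical fruit) using the counts from the 15 transactions above:

```latex
\[
\operatorname{confidence}(A \Rightarrow C)
  = P(C \mid A)
  = \frac{\operatorname{support}(A \cup C)}{\operatorname{support}(A)}
\]
\[
\operatorname{confidence}(\{\text{root vegetables}\} \Rightarrow \{\text{tropical fruit}\})
  = \frac{4/15}{5/15}
  = 0.8
\]
```

This agrees with the confidence column in row 4 of the rules table (support 0.266667, antecedent support 0.333333).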

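Lift measures how much more often the antecedent and consequent occur together than would be expected if they were independent, which controls for how popular each item is on its own. A lift of 1 indicates no association, a value above 1 indicates a positive association, and a value below 1 a negative one; the further above 1, the stronger the association. All of the rules above have lift greater than 1, but the strength varies considerably: the rules between rolls/buns and whole milk have a lift of only 1.05, which is close to independence, whereas (tropical fruit, whole milk) -> (other vegetables) reaches 2.8125.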
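As a small check on the lift column, the sketch below recomputes lift for two of the rules in the table from raw transaction counts. It assumes the usual definition, lift = confidence / support(consequent). The function `lift` and its parameter names are hypothetical, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

N = 15  # number of transactions (customers) in this notebook


def lift(count_both, count_antecedent, count_consequent, n=N):
    """Lift from raw counts: confidence divided by the consequent's support."""
    confidence = Fraction(count_both, count_antecedent)
    consequent_support = Fraction(count_consequent, n)
    return confidence / consequent_support


# (root vegetables) -> (tropical fruit): 4 customers bought both,
# 5 bought root vegetables, 6 bought tropical fruit.
strong = lift(4, 5, 6)

# (rolls/buns) -> (whole milk): 7 customers bought both,
# 10 bought rolls/buns, 10 bought whole milk.
weak = lift(7, 10, 10)

print(float(strong), float(weak))
```

Both rules pass the 0.7 confidence threshold, yet the second is barely better than chance, because whole milk is already in two thirds of the baskets.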