## Market Basket Analysis in Python Using the Apriori Algorithm

Major steps:
1.    Find how many transactions and items there are in the data set.
2.    Find the top-selling items with a minimum support of 2%.
3.    Find all frequent itemsets with a minimum support of 5%.
4.    Find all frequent itemsets of length 2 with a minimum support of 2%.
5.    Find the top 10 association rules with a minimum support of 2%, sorted by confidence in descending order.
6.    Find association rules with a minimum support of 2% and a lift greater than 1.0.
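
The steps above rely on three metrics: support, confidence, and lift. As a minimal sketch, they can be computed by hand on a few toy baskets. The baskets and the helper functions `support`, `confidence`, and `lift` below are made up for illustration; they are not part of the groceries data or of mlxtend.

```python
# Toy baskets (hypothetical, not taken from groceries.csv)
baskets = [
    {"milk", "bread"},
    {"milk", "yogurt"},
    {"milk", "bread", "yogurt"},
    {"bread"},
    {"soda"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of the antecedent's transactions that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's support; 1.0 indicates independence."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

pair_support = support({"milk", "bread"}, baskets)          # 2 of 5 baskets
rule_confidence = confidence({"milk"}, {"bread"}, baskets)  # 2 of the 3 milk baskets
rule_lift = lift({"milk"}, {"bread"}, baskets)
print(pair_support, round(rule_confidence, 3), round(rule_lift, 3))
```

Here the rule milk -> bread has a lift slightly above 1, meaning the two items appear together a little more often than independence would suggest.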

In [1]:
# importing the libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from csv import reader
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [3]:
# reading the dataset
groceries = []
with open('/content/drive/MyDrive/Colab Notebooks/Task 2 Code Clause Internship /groceries.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    for row in csv_reader:
        groceries.append(row)
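
The reading loop above implies that the file stores one basket per line, with items separated by commas. A small sketch of what `csv.reader` yields for that layout, using a made-up three-basket sample held in memory rather than the real file:

```python
import io
from csv import reader

# Hypothetical sample in the assumed "one basket per line" layout
sample = io.StringIO(
    "citrus fruit,margarine\n"
    "whole milk\n"
    "\n"
    "yogurt,coffee,tropical fruit\n"
)

# Each row comes back as a list of strings; rows may differ in length.
# A blank line yields an empty list, so it is skipped here.
baskets = [row for row in reader(sample) if row]

print(baskets)
print([len(b) for b in baskets])
```

Because the rows have different lengths, the result is kept as a list of lists rather than being loaded straight into a rectangular table.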

In [4]:
# fitting the encoder on the transactions and one-hot encoding them as True/False
encoder = TransactionEncoder()
transactions = encoder.fit(groceries).transform(groceries)

In [5]:
# converting the true and false to 1 and 0
transactions = transactions.astype('int')

In [7]:
# converting the transactions array to a dataframe
df = pd.DataFrame(transactions, columns=encoder.columns_)

In [8]:
# viewing the first few rows of the dataframe
df.head()

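| | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | baby food | bags | baking powder | bathroom cleaner | beef | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |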
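
The encoding steps above turn a list of baskets into a table with one row per transaction and one 0/1 column per item. The plain-Python sketch below mimics that result on three made-up baskets. It illustrates the idea only and is not how `TransactionEncoder` is implemented internally.

```python
# Hypothetical baskets, for illustration only
baskets = [["yogurt", "whole milk"], ["soda"], ["whole milk", "soda", "beef"]]

# Column order: the distinct items, sorted alphabetically
columns = sorted({item for basket in baskets for item in basket})

# One row per basket, one 0/1 flag per item
matrix = [[int(item in basket) for item in columns] for basket in baskets]

print(columns)
for row in matrix:
    print(row)
```

Each row sums to the number of distinct items in that basket, which is a quick sanity check on the encoded table.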


**1) How many transactions and items are there in the data set?**

In [9]:
# finding the dimensions of the dataframe
df.shape

(9835, 169)

The data set contains 9,835 transactions (rows) and 169 distinct items (columns).

In [10]:
# applying the apriori algorithm with minimum support of 2%
frequent_itemsets = apriori(df, min_support=0.02, use_colnames=True)
# recording the number of items in each itemset
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
frequent_itemsets

| | support | itemsets | length |
|---|---|---|---|
| 0 | 0.033452 | (UHT-milk) | 1 |
| 1 | 0.052466 | (beef) | 1 |
| 2 | 0.033249 | (berries) | 1 |
| 3 | 0.026029 | (beverages) | 1 |
| 4 | 0.080529 | (bottled beer) | 1 |
| ... | ... | ... | ... |
| 117 | 0.032232 | (whipped/sour cream, whole milk) | 2 |
| 118 | 0.020742 | (whipped/sour cream, yogurt) | 2 |
| 119 | 0.056024 | (whole milk, yogurt) | 2 |
| 120 | 0.023183 | (whole milk, other vegetables, root vegetables) | 3 |
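
To make the level-wise idea behind Apriori concrete, here is a minimal plain-Python sketch on toy baskets. The function `apriori_sketch` and the data are hypothetical, and mlxtend's `apriori` is considerably more optimized. The sketch only shows the principle: an itemset can be frequent only if all of its subsets are frequent, so candidates of size k are built from frequent itemsets of size k - 1.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items
    current = keep_frequent({frozenset([item]) for t in transactions for item in t})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Toy baskets (hypothetical)
baskets = [
    {"milk", "bread"},
    {"milk", "yogurt"},
    {"milk", "bread", "yogurt"},
    {"bread"},
    {"soda"},
]
result = apriori_sketch(baskets, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With these baskets, the three-item candidate is discarded in the prune step because one of its pairs (bread, yogurt) is not frequent, so its support never has to be counted.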


**2) Find top selling items with minimum support of 2%.**

In [11]:
# sorting the dataframe
frequent_itemsets = frequent_itemsets.sort_values(by='support', ascending=False)

In [12]:
# finding top 5 items with minimum support of 2%
frequent_itemsets[ (frequent_itemsets['length'] == 1) &
                   (frequent_itemsets['support'] >= 0.02) ][0:5]

Unnamed: 0,support,itemsets,length
57,0.255516,(whole milk),1
39,0.193493,(other vegetables),1
43,0.183935,(rolls/buns),1
49,0.174377,(soda),1
58,0.139502,(yogurt),1
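
For single items, support is simply the share of baskets containing the item, so the ranking above can be cross-checked by counting directly. A small sketch on made-up baskets (the data and the 40% threshold are chosen for illustration only):

```python
from collections import Counter

# Hypothetical baskets, for illustration only
baskets = [
    ["milk", "bread"],
    ["milk", "yogurt"],
    ["milk", "bread", "yogurt"],
    ["milk", "bread"],
    ["soda"],
]
n = len(baskets)
min_support = 0.4

# set() guards against an item being listed twice in the same basket
counts = Counter(item for basket in baskets for item in set(basket))

# Items meeting the threshold, from most to least frequent
top = [(item, count / n) for item, count in counts.most_common()
       if count / n >= min_support]
print(top)
```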


**3) Find all frequent itemsets with minimum support of 5%.**

In [13]:
# finding itemsets with more than one item and minimum support of 5%
# (single items are excluded by the length filter)
frequent_itemsets[(frequent_itemsets['length'] > 1) &
                  (frequent_itemsets['support'] >= 0.05)]

| | support | itemsets | length |
|---|---|---|---|
| 91 | 0.074835 | (whole milk, other vegetables) | 2 |
| 103 | 0.056634 | (whole milk, rolls/buns) | 2 |
| 119 | 0.056024 | (whole milk, yogurt) | 2 |


**4) Find all frequent itemsets of length 2 with minimum support of 2%.**

In [14]:
# finding itemsets having length 2 and minimum support of 2%
frequent_itemsets[(frequent_itemsets['length'] == 2) & 
                  (frequent_itemsets['support'] >= 0.02)]

| | support | itemsets | length |
|---|---|---|---|
| 91 | 0.074835 | (whole milk, other vegetables) | 2 |
| 103 | 0.056634 | (whole milk, rolls/buns) | 2 |
| 119 | 0.056024 | (whole milk, yogurt) | 2 |
| 106 | 0.048907 | (whole milk, root vegetables) | 2 |
| 85 | 0.047382 | (other vegetables, root vegetables) | 2 |
| ... | ... | ... | ... |
| 75 | 0.020539 | (frankfurter, whole milk) | 2 |
| 60 | 0.020437 | (whole milk, bottled beer) | 2 |
| 76 | 0.020437 | (whole milk, frozen vegetables) | 2 |
| 96 | 0.020437 | (tropical fruit, pip fruit) | 2 |


**5) Find the top 10 association rules with minimum support of 2%, sorted by confidence in descending order.**

In [15]:
# generating all association rules with minimum support of 2%
rules = association_rules(frequent_itemsets, metric='support', min_threshold=0.02)
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (whole milk) | (other vegetables) | 0.255516 | 0.193493 | 0.074835 | 0.292877 | 1.513634 | 0.025394 | 1.140548 |
| 1 | (other vegetables) | (whole milk) | 0.193493 | 0.255516 | 0.074835 | 0.386758 | 1.513634 | 0.025394 | 1.214013 |
| 2 | (whole milk) | (rolls/buns) | 0.255516 | 0.183935 | 0.056634 | 0.221647 | 1.205032 | 0.009636 | 1.048452 |
| 3 | (rolls/buns) | (whole milk) | 0.183935 | 0.255516 | 0.056634 | 0.307905 | 1.205032 | 0.009636 | 1.075696 |
| 4 | (whole milk) | (yogurt) | 0.255516 | 0.139502 | 0.056024 | 0.219260 | 1.571735 | 0.020379 | 1.102157 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129 | (frozen vegetables) | (whole milk) | 0.048094 | 0.255516 | 0.020437 | 0.424947 | 1.663094 | 0.008149 | 1.294636 |
| 130 | (tropical fruit) | (pip fruit) | 0.104931 | 0.075648 | 0.020437 | 0.194767 | 2.574648 | 0.012499 | 1.147931 |
| 131 | (pip fruit) | (tropical fruit) | 0.075648 | 0.104931 | 0.020437 | 0.270161 | 2.574648 | 0.012499 | 1.226392 |
| 132 | (other vegetables) | (butter) | 0.193493 | 0.055414 | 0.020031 | 0.103521 | 1.868122 | 0.009308 | 1.053661 |
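
The metric columns in this table follow from the three support values in each row. As a worked check, the sketch below recomputes confidence, lift, leverage, and conviction for row 0, (whole milk) -> (other vegetables), using the standard definitions of these metrics. The inputs are the rounded values printed in the table, so the results agree with it only to about five decimal places.

```python
# Values copied from row 0 of the table above: (whole milk) -> (other vegetables)
antecedent_support = 0.255516   # support of whole milk
consequent_support = 0.193493   # support of other vegetables
rule_support = 0.074835         # support of both items together

# Share of whole-milk baskets that also contain other vegetables
confidence = rule_support / antecedent_support
# How much more often the pair occurs than expected under independence (ratio)
lift = confidence / consequent_support
# The same comparison expressed as a difference
leverage = rule_support - antecedent_support * consequent_support
# Ratio of expected to observed frequency of the rule being wrong
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 4))
```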


In [16]:
# sorting the rules in the descending order by confidence
rules.sort_values(by='confidence', ascending=False)[0:10]

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 98 | (other vegetables, yogurt) | (whole milk) | 0.043416 | 0.255516 | 0.022267 | 0.512881 | 2.007235 | 0.011174 | 1.52834 |
| 51 | (butter) | (whole milk) | 0.055414 | 0.255516 | 0.027555 | 0.497248 | 1.946053 | 0.013395 | 1.480817 |
| 60 | (curd) | (whole milk) | 0.053279 | 0.255516 | 0.026131 | 0.490458 | 1.919481 | 0.012517 | 1.461085 |
| 88 | (other vegetables, root vegetables) | (whole milk) | 0.047382 | 0.255516 | 0.023183 | 0.48927 | 1.914833 | 0.011076 | 1.457687 |
| 87 | (whole milk, root vegetables) | (other vegetables) | 0.048907 | 0.193493 | 0.023183 | 0.474012 | 2.44977 | 0.013719 | 1.53332 |
| 39 | (domestic eggs) | (whole milk) | 0.063447 | 0.255516 | 0.029995 | 0.472756 | 1.850203 | 0.013783 | 1.41203 |
| 30 | (whipped/sour cream) | (whole milk) | 0.071683 | 0.255516 | 0.032232 | 0.449645 | 1.759754 | 0.013916 | 1.352735 |
| 7 | (root vegetables) | (whole milk) | 0.108998 | 0.255516 | 0.048907 | 0.448694 | 1.756031 | 0.021056 | 1.350401 |
| 9 | (root vegetables) | (other vegetables) | 0.108998 | 0.193493 | 0.047382 | 0.434701 | 2.246605 | 0.026291 | 1.426693 |
| 129 | (frozen vegetables) | (whole milk) | 0.048094 | 0.255516 | 0.020437 | 0.424947 | 1.663094 | 0.008149 | 1.294636 |
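
Most of the rules above have whole milk as the consequent. Since whole milk is in about 25.5% of all baskets (its consequent support), ranking by confidence alone may favor rules that point to an already popular item. As a rough illustration, the sketch below takes three rows from the table and compares the ordering by confidence with the ordering by lift. The list `rules_sample` is just a hand-copied subset, not the full `rules` dataframe.

```python
# (antecedents, consequents, confidence, lift), copied from three rows of the table above
rules_sample = [
    ("other vegetables, yogurt", "whole milk", 0.512881, 2.007235),
    ("butter", "whole milk", 0.497248, 1.946053),
    ("whole milk, root vegetables", "other vegetables", 0.474012, 2.449770),
]

by_confidence = sorted(rules_sample, key=lambda r: r[2], reverse=True)
by_lift = sorted(rules_sample, key=lambda r: r[3], reverse=True)

print([r[0] for r in by_confidence])
print([r[0] for r in by_lift])
```

In this subset, the rule with the lowest confidence has the highest lift, which is why it can be useful to look at both columns before acting on a rule.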


**6) Find association rules with minimum support of 2% and lift of more than 1.0.**

In [17]:
# finding association rules with minimum support of 2% and having lift more than 1
rules[(rules['support'] >= 0.02) &
      (rules['lift'] > 1.0)]

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (whole milk) | (other vegetables) | 0.255516 | 0.193493 | 0.074835 | 0.292877 | 1.513634 | 0.025394 | 1.140548 |
| 1 | (other vegetables) | (whole milk) | 0.193493 | 0.255516 | 0.074835 | 0.386758 | 1.513634 | 0.025394 | 1.214013 |
| 2 | (whole milk) | (rolls/buns) | 0.255516 | 0.183935 | 0.056634 | 0.221647 | 1.205032 | 0.009636 | 1.048452 |
| 3 | (rolls/buns) | (whole milk) | 0.183935 | 0.255516 | 0.056634 | 0.307905 | 1.205032 | 0.009636 | 1.075696 |
| 4 | (whole milk) | (yogurt) | 0.255516 | 0.139502 | 0.056024 | 0.219260 | 1.571735 | 0.020379 | 1.102157 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129 | (frozen vegetables) | (whole milk) | 0.048094 | 0.255516 | 0.020437 | 0.424947 | 1.663094 | 0.008149 | 1.294636 |
| 130 | (tropical fruit) | (pip fruit) | 0.104931 | 0.075648 | 0.020437 | 0.194767 | 2.574648 | 0.012499 | 1.147931 |
| 131 | (pip fruit) | (tropical fruit) | 0.075648 | 0.104931 | 0.020437 | 0.270161 | 2.574648 | 0.012499 | 1.226392 |
| 132 | (other vegetables) | (butter) | 0.193493 | 0.055414 | 0.020031 | 0.103521 | 1.868122 | 0.009308 | 1.053661 |
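
A lift above 1.0 means the two sides of a rule occur together more often than they would if they were bought independently. The sketch below checks this reading for row 130, (tropical fruit) -> (pip fruit), by comparing the observed joint support with the product of the two individual supports. The inputs are the rounded table values, so the result matches the `lift` column only approximately. Because the rules were already generated with a minimum support of 2%, the support condition in the filter above does not remove any further rows; the lift condition is what does the filtering.

```python
# Values copied from row 130 of the table above: (tropical fruit) -> (pip fruit)
antecedent_support = 0.104931
consequent_support = 0.075648
observed_support = 0.020437

# Joint support expected if the two items were bought independently
expected_support = antecedent_support * consequent_support

# Lift is the ratio of observed to expected joint support
lift = observed_support / expected_support
print(round(expected_support, 6), round(lift, 3))
```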
