# **Market Basket Analysis in Python using Apriori Algorithm**

Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It first identifies the frequent individual items in the database and then extends them to larger and larger itemsets, as long as those itemsets appear sufficiently often.

Rules found this way are evaluated with three measures:
1. Support
2. Confidence
3. Lift
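
To make the three measures concrete, here is a minimal sketch that computes them by hand. The toy transactions and the helper names `support`, `confidence`, and `lift` are assumptions made for illustration; they are not part of apyori or of the dataset used in this notebook.

```python
# Toy transactions (made up for illustration, not taken from the dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """Confidence of lhs -> rhs divided by the baseline support of rhs."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

support_value = support({"bread", "milk"}, transactions)
confidence_value = confidence({"bread"}, {"milk"}, transactions)
lift_value = lift({"bread"}, {"milk"}, transactions)
print(support_value, confidence_value, lift_value)
```

A lift above 1 suggests the two sides of a rule occur together more often than they would if they were independent; a lift below 1, as in this toy case, suggests the opposite.
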

The Apriori algorithm rests on the following principles:
1. All subsets of a frequent itemset must be frequent.
2. All supersets of an infrequent itemset must be infrequent.
3. A minimum support threshold is fixed in advance. In this notebook it is set to 0.003 (0.3 percent of transactions) in the `apriori` call.
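
The principles above are what allow Apriori to prune its search. The following is a simplified sketch of the level-wise idea, not apyori's actual implementation; the function name `frequent_itemsets` and the toy data are assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, keeping only frequent ones."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def is_frequent(candidate):
        return sum(candidate <= t for t in transactions) / n >= min_support

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if is_frequent(frozenset([i]))}
    result = set(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate survives only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning.
        current = {c for c in candidates if is_frequent(c)}
        result |= current
        k += 1
    return result

# Toy data, made up for illustration.
toy = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
found = frequent_itemsets(toy, min_support=0.5)
print(sorted(sorted(s) for s in found))
```

In this toy run, `{"milk", "butter"}` is infrequent, so the three-item candidate containing it is discarded in the prune step without ever being counted.
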

In [1]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5956 sha256=903681239ce9f8f45f2fce17e29c0f4446d0b2a66ee9fb77a7c34a018faae94a
  Stored in directory: /root/.cache/pip/wheels/c4/1a/79/20f55c470a50bb3702a8cb7c94d8ada15573538c7f4baebe2d
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [2]:
# Import the required packages.
import pandas as pd
import numpy as np
from apyori import apriori

In [35]:
data = pd.read_csv('/content/Market_Basket_Optimisation.csv',header=None)

In [36]:
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [37]:
data.fillna(0,inplace=True)

In [38]:
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,chutney,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,turkey,avocado,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,mineral water,milk,energy bar,whole wheat rice,green tea,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


To use apriori, the data must be converted to a list of lists, with one inner list per transaction. For example:
1. transaction = [['apple','almonds'],['apple'],['banana','apple']]
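
As an aside, a list of this shape can also be built straight from CSV text with the standard library, without the fill-with-0 step. This is only a sketch of an alternative route, not the one this notebook takes; the in-memory string `raw` is a made-up stand-in for the file.

```python
import csv
import io

# Stand-in for the CSV file: one transaction per line, ragged rows (toy data).
raw = "apple,almonds\napple\nbanana,apple\n"

# csv.reader yields each row as a list of strings. Blank cells are dropped
# in case rows are padded with trailing commas.
transaction = [
    [item.strip() for item in row if item.strip()]
    for row in csv.reader(io.StringIO(raw))
    if row
]
print(transaction)
```
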

In [39]:
# Data pre-processing: build one list of item names per row,
# skipping the 0 placeholders that fillna inserted for missing cells.
transactions = []

for row in data.values:
    transactions.append([str(item) for item in row if str(item) != '0'])


In [40]:
transactions[0]

['shrimp',
 'almonds',
 'avocado',
 'vegetables mix',
 'green grapes',
 'whole weat flour',
 'yams',
 'cottage cheese',
 'energy drink',
 'tomato juice',
 'low fat yogurt',
 'green tea',
 'honey',
 'salad',
 'mineral water',
 'salmon',
 'antioxydant juice',
 'frozen smoothie',
 'spinach',
 'olive oil']

In [41]:
transactions[1]

['burgers', 'meatballs', 'eggs']

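Call the `apriori` function with thresholds for minimum support, confidence, and lift. Note that apyori expects the keyword `min_confidence`; the call below misspells it as `min_confidance`, and because the results further down include rules with confidence below 0.2, the confidence threshold evidently is not applied. `min_length` likewise does not appear to be a keyword apyori recognizes (it accepts `max_length`), so in effect only `min_support` and `min_lift` filter the results shown here.

In [43]:
rules = apriori(transactions, min_support=0.003, min_confidance=0.2, min_lift=3, min_length=2)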

In [44]:
rules

<generator object apriori at 0x7fe75d133220>
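
The output above shows that `apriori` returns a generator rather than a list. A generator can be iterated only once, which is why the results are materialized with `list(...)` before further use. The sketch below illustrates this with a hypothetical stand-in generator, `toy_rules`, so it runs without apyori.

```python
def toy_rules():
    """Hypothetical stand-in for a generator that yields rules."""
    yield "rule A"
    yield "rule B"

gen = toy_rules()
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # nothing is left to yield
print(first_pass, second_pass)
```
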

In [45]:
Results = list(rules)
Results

[RelationRecord(items=frozenset({'cottage cheese', 'brownies'}), support=0.0034662045060658577, ordered_statistics=[OrderedStatistic(items_base=frozenset({'brownies'}), items_add=frozenset({'cottage cheese'}), confidence=0.10276679841897232, lift=3.225329518580382), OrderedStatistic(items_base=frozenset({'cottage cheese'}), items_add=frozenset({'brownies'}), confidence=0.10878661087866107, lift=3.2253295185803816)]),
 RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'chicken'}), items_add=frozenset({'light cream'}), confidence=0.07555555555555556, lift=4.843950617283951), OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'escalope'}),

In [47]:
df_results = pd.DataFrame(Results)

df_results.head()

Unnamed: 0,items,support,ordered_statistics
0,"(cottage cheese, brownies)",0.003466,"[((brownies), (cottage cheese), 0.102766798418..."
1,"(chicken, light cream)",0.004533,"[((chicken), (light cream), 0.0755555555555555..."
2,"(escalope, mushroom cream sauce)",0.005733,"[((escalope), (mushroom cream sauce), 0.072268..."
3,"(pasta, escalope)",0.005866,"[((escalope), (pasta), 0.07394957983193277, 4...."
4,"(fresh bread, tomato juice)",0.004266,"[((fresh bread), (tomato juice), 0.09907120743..."


In [48]:
support = df_results.support

In [49]:
# Four empty lists that will hold lhs, rhs, confidence, and lift respectively.
first_values = []
second_values = []
third_values = []
fourth_value = []

# For each record, take its first ordered statistic (index 0) and append each field to its list.
# The first and second fields are frozensets, so they are converted to lists.
for i in range(df_results.shape[0]):
    single_list = df_results['ordered_statistics'][i][0]
    first_values.append(list(single_list[0]))
    second_values.append(list(single_list[1]))
    third_values.append(single_list[2])
    fourth_value.append(single_list[3])
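
The loop above keeps only the first ordered statistic of each record, so a record that holds rules in both directions contributes just one of them. A possible variant that keeps every rule is sketched below. The two namedtuples are stand-ins that mirror the field names printed in the `Results` output; they are not imported from apyori, and the single record is copied from that output.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the Results output (assumption).
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")

records = [
    RelationRecord(
        items=frozenset({"chicken", "light cream"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(frozenset({"chicken"}), frozenset({"light cream"}),
                             0.07555555555555556, 4.843950617283951),
            OrderedStatistic(frozenset({"light cream"}), frozenset({"chicken"}),
                             0.29059829059829057, 4.84395061728395),
        ],
    ),
]

# One row per ordered statistic, so both rule directions are kept.
rows = [
    {
        "lhs": ", ".join(sorted(stat.items_base)),
        "rhs": ", ".join(sorted(stat.items_add)),
        "support": record.support,
        "confidence": stat.confidence,
        "lift": stat.lift,
    }
    for record in records
    for stat in record.ordered_statistics
]
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame` if a tabular view is needed.
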

In [50]:
# Convert the four lists into DataFrames for further processing.
lhs = pd.DataFrame(first_values)
rhs = pd.DataFrame(second_values)

confidance=pd.DataFrame(third_values,columns=['Confidance'])

lift=pd.DataFrame(fourth_value,columns=['lift'])

In [51]:
df_final = pd.concat([lhs,rhs,support,confidance,lift], axis=1)
df_final

Unnamed: 0,0,1,0.1,1.1,2,support,Confidance,lift
0,brownies,,cottage cheese,,,0.003466,0.102767,3.225330
1,chicken,,light cream,,,0.004533,0.075556,4.843951
2,escalope,,mushroom cream sauce,,,0.005733,0.072269,3.790833
3,escalope,,pasta,,,0.005866,0.073950,4.700812
4,fresh bread,,tomato juice,,,0.004266,0.099071,3.259356
...,...,...,...,...,...,...,...,...
89,ground beef,pancakes,spaghetti,mineral water,,0.003066,0.211009,3.532991
90,ground beef,,spaghetti,tomatoes,mineral water,0.003066,0.031208,3.344117
91,olive oil,,milk,spaghetti,mineral water,0.003333,0.050607,3.216994
92,milk,mineral water,shrimp,spaghetti,,0.003066,0.063889,3.014029


In [52]:
df_final.fillna(value=' ', inplace=True)
df_final.head()

Unnamed: 0,0,1,0.1,1.1,2,support,Confidance,lift
0,brownies,,cottage cheese,,,0.003466,0.102767,3.22533
1,chicken,,light cream,,,0.004533,0.075556,4.843951
2,escalope,,mushroom cream sauce,,,0.005733,0.072269,3.790833
3,escalope,,pasta,,,0.005866,0.07395,4.700812
4,fresh bread,,tomato juice,,,0.004266,0.099071,3.259356


In [53]:
df_final.columns = ['lhs',1,'rhs',2,3,'support','confidance','lift']
df_final.head()

Unnamed: 0,lhs,1,rhs,2,3,support,confidance,lift
0,brownies,,cottage cheese,,,0.003466,0.102767,3.22533
1,chicken,,light cream,,,0.004533,0.075556,4.843951
2,escalope,,mushroom cream sauce,,,0.005733,0.072269,3.790833
3,escalope,,pasta,,,0.005866,0.07395,4.700812
4,fresh bread,,tomato juice,,,0.004266,0.099071,3.259356


In [54]:
df_final['lhs'] = df_final['lhs'] + ", " + df_final[1]

df_final['rhs'] = df_final['rhs'] + ", " + df_final[2] + ", " + df_final[3]
df_final.head()

Unnamed: 0,lhs,1,rhs,2,3,support,confidance,lift
0,"brownies,",,"cottage cheese, ,",,,0.003466,0.102767,3.22533
1,"chicken,",,"light cream, ,",,,0.004533,0.075556,4.843951
2,"escalope,",,"mushroom cream sauce, ,",,,0.005733,0.072269,3.790833
3,"escalope,",,"pasta, ,",,,0.005866,0.07395,4.700812
4,"fresh bread,",,"tomato juice, ,",,,0.004266,0.099071,3.259356
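
Concatenating padded columns leaves stray separators such as `cottage cheese, ,` in the table above. One way to avoid them, sketched here with a hypothetical helper `join_items`, is to join only the cells that are not blank.

```python
def join_items(*cells):
    """Join non-blank cells with ', ' so padding leaves no stray commas."""
    return ", ".join(c for c in cells if c and c.strip())

single = join_items("cottage cheese", " ", " ")
double = join_items("whole wheat pasta", "mineral water", " ")
print(repr(single), repr(double))
```
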


In [55]:
df_final.drop(columns=[1,2,3],inplace=True)
# This is the final output. It can be sorted by support, confidence, or lift.
df_final.head()

Unnamed: 0,lhs,rhs,support,confidance,lift
0,"brownies,","cottage cheese, ,",0.003466,0.102767,3.22533
1,"chicken,","light cream, ,",0.004533,0.075556,4.843951
2,"escalope,","mushroom cream sauce, ,",0.005733,0.072269,3.790833
3,"escalope,","pasta, ,",0.005866,0.07395,4.700812
4,"fresh bread,","tomato juice, ,",0.004266,0.099071,3.259356


In [56]:
df_final.sort_values('lift', ascending=False).head(10)

Unnamed: 0,lhs,rhs,support,confidance,lift
58,"olive oil,","whole wheat pasta, mineral water,",0.003866,0.058704,6.115863
6,"fromage blanc,","honey, ,",0.003333,0.245098,5.164271
49,"ground beef,","tomato sauce, spaghetti,",0.003066,0.031208,4.9806
1,"chicken,","light cream, ,",0.004533,0.075556,4.843951
3,"escalope,","pasta, ,",0.005866,0.07395,4.700812
28,"ground beef,","french fries, herb & pepper,",0.0032,0.032564,4.697422
11,"pasta,","shrimp, ,",0.005066,0.322034,4.506672
23,"ground beef,","chocolate, herb & pepper,",0.003999,0.040706,4.490183
69,"frozen vegetables,","chocolate, shrimp, mineral water",0.0032,0.033566,4.417225
10,"olive oil,","whole wheat pasta, ,",0.007999,0.121457,4.12241
