
**What is the Apriori Algorithm?**

The Apriori algorithm is one of the most common techniques for performing market basket analysis. It is used for association rule mining, a rule-based process that identifies which items tend to be purchased together.

The Apriori algorithm relies on three main metrics:

*   Support - the overall popularity of an item or itemset, measured as the fraction of all transactions that contain it
*   Confidence - the likelihood that a transaction containing item A also contains item B
*   Lift - how much more likely B is to be purchased when A is purchased, compared with how often B is purchased overall

Let me explain with an example.

Suppose we have a record of 1,000 customer transactions and want to find the support, confidence, and lift for milk and diapers. Out of the 1,000 transactions, 120 contain milk and 150 contain diapers. Of these, 30 transactions contain both milk and diapers.

Support(diaper) = (Transactions containing diaper) / (Total transactions)
Support(diaper) = 150 / 1000 = 15%

Confidence(milk → diaper) = (Transactions containing both milk and diaper) / (Transactions containing milk)
Confidence(milk → diaper) = 30 / 120 = 25%

Lift(milk → diaper) = Confidence(milk → diaper) / Support(diaper)
Lift(milk → diaper) = 25% / 15% ≈ 1.67

This means that customers who buy milk are about 1.67 times as likely to buy diapers as customers in general.
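
The arithmetic above can be checked with a few lines of plain Python. This is only a sketch of the three formulas using the counts from the example; the helper names `support`, `confidence`, and `lift` are made up for illustration and are not part of any library.

```python
# Counts taken from the worked example above.
total_transactions = 1000
milk_transactions = 120
diaper_transactions = 150
both_transactions = 30  # transactions containing both milk and diapers


def support(item_count, total):
    """Fraction of all transactions that contain the item(s)."""
    return item_count / total


def confidence(both_count, antecedent_count):
    """Share of antecedent transactions that also contain the consequent."""
    return both_count / antecedent_count


def lift(rule_confidence, consequent_support):
    """Rule confidence relative to the consequent's baseline support."""
    return rule_confidence / consequent_support


support_diaper = support(diaper_transactions, total_transactions)
confidence_milk_diaper = confidence(both_transactions, milk_transactions)
lift_milk_diaper = lift(confidence_milk_diaper, support_diaper)

print(f"Support(diaper) = {support_diaper:.0%}")
print(f"Confidence(milk -> diaper) = {confidence_milk_diaper:.0%}")
print(f"Lift(milk -> diaper) = {lift_milk_diaper:.2f}")
```

A lift above 1 suggests the two items appear together more often than their individual popularity would predict; a lift below 1 suggests the opposite.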



**Importing Libraries**

In [2]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

**Reading the Dataset**

In [3]:
df = pd.read_csv('/content/Groceries_dataset.csv')
df.head()

| | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit |
| 1 | 2552 | 05-01-2015 | whole milk |
| 2 | 2300 | 19-09-2015 | pip fruit |
| 3 | 1187 | 12-12-2015 | other vegetables |
| 4 | 3037 | 01-02-2015 | whole milk |


**Data Preparation for Market Basket Analysis**

In [5]:
# The data needs to be converted into a format that the Apriori algorithm can ingest.

# First, build a transaction key so that items with the same member number and date are grouped together:

df['single_transaction'] = df['Member_number'].astype(str)+'_'+df['Date'].astype(str)
df.head()

| | Member_number | Date | itemDescription | single_transaction |
|---|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit | 1808_21-07-2015 |
| 1 | 2552 | 05-01-2015 | whole milk | 2552_05-01-2015 |
| 2 | 2300 | 19-09-2015 | pip fruit | 2300_19-09-2015 |
| 3 | 1187 | 12-12-2015 | other vegetables | 1187_12-12-2015 |
| 4 | 3037 | 01-02-2015 | whole milk | 3037_01-02-2015 |


The “single_transaction” variable combines the member number and the date, so all rows that share the same value belong to one receipt.

In [6]:
# Now, let’s pivot this table so that each transaction becomes a row and each item becomes a column:
df2 = pd.crosstab(df['single_transaction'], df['itemDescription'])
df2.head()

| single_transaction | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | berries | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000_15-03-2015 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 1000_24-06-2014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1000_24-07-2015 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1000_25-11-2015 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1000_27-05-2015 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


The resulting table tells us how many times each item appears in each transaction.
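
To see what the pivot produces, here is a minimal sketch on a tiny, made-up data frame that reuses the column names of the groceries dataset. The rows are hypothetical; the first member buys whole milk twice on the same day, so that cell ends up holding a count of 2 rather than 1.

```python
import pandas as pd

# Hypothetical toy purchases (not taken from Groceries_dataset.csv).
toy = pd.DataFrame({
    "Member_number": [1, 1, 1, 2, 2],
    "Date": ["01-01-2015", "01-01-2015", "01-01-2015", "01-01-2015", "02-01-2015"],
    "itemDescription": ["whole milk", "whole milk", "yogurt", "whole milk", "soda"],
})

# Same transaction key as in the notebook: member number + "_" + date.
toy["single_transaction"] = (
    toy["Member_number"].astype(str) + "_" + toy["Date"].astype(str)
)

# One row per transaction, one column per item, cells hold purchase counts.
toy_counts = pd.crosstab(toy["single_transaction"], toy["itemDescription"])
print(toy_counts)
```

Counts greater than 1 are the reason the table still has to be reduced to a present/absent encoding before it is passed to Apriori.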

In [7]:
# Encode every value in the data frame above as present (True) or absent (False).
# Even if the same item appears several times in one transaction, it is encoded as True,
# because market basket analysis does not take purchase quantity into account.
# A vectorized comparison replaces the element-wise applymap call, which newer pandas versions deprecate.

basket_input = df2 > 0

**Build the Apriori Algorithm for Market Basket Analysis**

In [8]:
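# Keep itemsets that appear in at least 0.1% of all transactions.
frequent_itemsets = apriori(basket_input, min_support=0.001, use_colnames=True)

# Derive association rules from the frequent itemsets, filtered on lift
# (min_threshold is left at its default of 0.8, so rules with lift below 1 are kept).
rules = association_rules(frequent_itemsets, metric="lift")

rules.head()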



| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (bottled water) | (UHT-milk) | 0.060683 | 0.021386 | 0.001069 | 0.017621 | 0.823954 | -0.000228 | 0.996168 | -0.185312 |
| 1 | (UHT-milk) | (bottled water) | 0.021386 | 0.060683 | 0.001069 | 0.05 | 0.823954 | -0.000228 | 0.988755 | -0.179204 |
| 2 | (other vegetables) | (UHT-milk) | 0.122101 | 0.021386 | 0.002139 | 0.017515 | 0.818993 | -0.000473 | 0.99606 | -0.201119 |
| 3 | (UHT-milk) | (other vegetables) | 0.021386 | 0.122101 | 0.002139 | 0.1 | 0.818993 | -0.000473 | 0.975443 | -0.184234 |
| 4 | (UHT-milk) | (sausage) | 0.021386 | 0.060349 | 0.001136 | 0.053125 | 0.880298 | -0.000154 | 0.992371 | -0.121998 |


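Here, the “antecedents” and “consequents” columns hold the two sides of each rule: the items already in the basket and the items that are bought along with them.

The first row of the table describes the rule bottled water → UHT-milk: about 1.8% of the transactions that contain bottled water also contain UHT-milk. Because the lift is 0.82, which is below 1, buying bottled water makes a UHT-milk purchase slightly less likely than average rather than more likely.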
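
To make the columns of the rules table more concrete, the sketch below finds frequent itemsets level by level and then derives support, confidence, and lift for each rule by hand. It is a small pure-Python illustration on hypothetical baskets, not mlxtend's implementation; the `transactions` data and the `min_support` value of 0.4 are assumptions chosen only to keep the example small.

```python
from itertools import combinations

# Hypothetical toy baskets, not taken from the groceries dataset.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk", "butter"},
    {"bread"},
    {"milk", "bread"},
]
n = len(transactions)
min_support = 0.4


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= basket for basket in transactions) / n


# Level-wise search: a k-itemset can only be frequent if all of its
# (k-1)-item subsets are frequent, so other candidates are pruned early.
frequent = {}
level = [frozenset([item]) for item in sorted(set().union(*transactions))]
while level:
    kept = {s: support(s) for s in level}
    kept = {s: v for s, v in kept.items() if v >= min_support}
    frequent.update(kept)
    candidates = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
    level = [
        c for c in candidates
        if all(frozenset(sub) in kept for sub in combinations(c, len(c) - 1))
    ]

# Split every frequent itemset of size >= 2 into antecedent -> consequent.
rules = []
for itemset, sup in frequent.items():
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            consequent = itemset - antecedent
            conf = sup / frequent[antecedent]
            rule_lift = conf / frequent[consequent]
            rules.append((set(antecedent), set(consequent), sup, conf, rule_lift))

for antecedent, consequent, sup, conf, rule_lift in rules:
    print(antecedent, "->", consequent,
          f"support={sup:.2f} confidence={conf:.2f} lift={rule_lift:.2f}")
```

In this toy data, milk → bread has a lift of about 0.94 even though the pair is the most frequent one, while butter → milk has a lift of 1.25. High support and high lift are different things, which is worth keeping in mind when reading the rules table.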

In [9]:
# To get the most frequent item combinations in the entire dataset, sort the rules by support, confidence, and lift:
rules.sort_values(["support", "confidence", "lift"], axis=0, ascending=False).head(8)



| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 622 | (rolls/buns) | (whole milk) | 0.110005 | 0.157923 | 0.013968 | 0.126974 | 0.804028 | -0.003404 | 0.96455 | -0.214986 |
| 623 | (whole milk) | (rolls/buns) | 0.157923 | 0.110005 | 0.013968 | 0.088447 | 0.804028 | -0.003404 | 0.97635 | -0.224474 |
| 694 | (yogurt) | (whole milk) | 0.085879 | 0.157923 | 0.011161 | 0.129961 | 0.82294 | -0.002401 | 0.967861 | -0.190525 |
| 695 | (whole milk) | (yogurt) | 0.157923 | 0.085879 | 0.011161 | 0.070673 | 0.82294 | -0.002401 | 0.983638 | -0.203508 |
| 551 | (soda) | (other vegetables) | 0.097106 | 0.122101 | 0.009691 | 0.099794 | 0.817302 | -0.002166 | 0.975219 | -0.198448 |
| 550 | (other vegetables) | (soda) | 0.122101 | 0.097106 | 0.009691 | 0.079365 | 0.817302 | -0.002166 | 0.980729 | -0.202951 |
| 649 | (sausage) | (whole milk) | 0.060349 | 0.157923 | 0.008955 | 0.148394 | 0.939663 | -0.000575 | 0.988811 | -0.063965 |
| 648 | (whole milk) | (sausage) | 0.157923 | 0.060349 | 0.008955 | 0.056708 | 0.939663 | -0.000575 | 0.99614 | -0.070851 |


**Conclusion**

The resulting table shows that the four product combinations most frequently bought together, ranked by support, are:

1.   Rolls/buns and whole milk
2.   Yogurt and whole milk
3.   Soda and other vegetables
4.   Sausage and whole milk
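
Each pair appears twice in the sorted rules table, once per direction, so the ranking can be double-checked by collapsing A → B and B → A into a single unordered pair. The sketch below does this in plain Python with the support and lift values copied from the eight rows shown above; the variable names are illustrative only.

```python
# (antecedent, consequent, support, lift) copied from the sorted rules table.
top_rules = [
    ("rolls/buns", "whole milk", 0.013968, 0.804028),
    ("whole milk", "rolls/buns", 0.013968, 0.804028),
    ("yogurt", "whole milk", 0.011161, 0.822940),
    ("whole milk", "yogurt", 0.011161, 0.822940),
    ("soda", "other vegetables", 0.009691, 0.817302),
    ("other vegetables", "soda", 0.009691, 0.817302),
    ("sausage", "whole milk", 0.008955, 0.939663),
    ("whole milk", "sausage", 0.008955, 0.939663),
]

# A -> B and B -> A describe the same pair, so key each rule by an unordered pair.
pairs = {}
for antecedent, consequent, sup, rule_lift in top_rules:
    pairs[frozenset((antecedent, consequent))] = (sup, rule_lift)

# Rank the distinct pairs by support, highest first.
ranked = sorted(pairs.items(), key=lambda kv: kv[1][0], reverse=True)
for pair, (sup, rule_lift) in ranked:
    print(f"{' + '.join(sorted(pair))}: support={sup:.4f}, lift={rule_lift:.2f}")
```

On these eight rows the sketch yields four distinct pairs, with soda and other vegetables ranking ahead of sausage and whole milk by support. All four lifts are below 1, which suggests these pairs are frequent mainly because the individual items are popular, not because buying one makes the other more likely.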