
# MARKET BASKET ANALYSIS USING APRIORI ALGORITHM

# 1) Import the libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# 2) Load the dataset

In [2]:
dataset = pd.read_csv("/content/Market_Basket_Optimisation.csv", header = None)

# 3) Take a glance at the records

In [3]:
dataset

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs | | | | | | | | | | | | | | | | | |
| 2 | chutney | | | | | | | | | | | | | | | | | | | |
| 3 | turkey | avocado | | | | | | | | | | | | | | | | | | |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea | | | | | | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7496 | butter | light mayo | fresh bread | | | | | | | | | | | | | | | | | |
| 7497 | burgers | frozen vegetables | eggs | french fries | magazines | green tea | | | | | | | | | | | | | | |
| 7498 | chicken | | | | | | | | | | | | | | | | | | | |
| 7499 | escalope | green tea | | | | | | | | | | | | | | | | | | |

# 4) Look at the shape

In [5]:
dataset.shape

(7501, 20)

# 5) Convert Pandas DataFrame into a list of lists

In [6]:
# One list of item names per transaction (row), skipping the empty (NaN) padding cells
transactions = [[str(item) for item in row if pd.notna(item)] for row in dataset.values]
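Each row of the CSV is padded with empty cells up to 20 columns, so the padding has to be dropped before mining; otherwise it turns into a fake `'nan'` item. The same cleanup can be sketched with only the standard library. The three sample rows are copied from the preview above, and the `csv`-based reader (with the hypothetical name `sample_transactions`) is an illustrative alternative to the pandas code, not what the notebook itself runs.

```python
import csv
import io

# Three rows in the same layout as Market_Basket_Optimisation.csv:
# one transaction per line, padded with empty cells to 20 columns.
raw = "\n".join([
    "burgers,meatballs,eggs" + "," * 17,
    "chutney" + "," * 19,
    "turkey,avocado" + "," * 18,
])

# Keep only the non-empty cells, so the padding never becomes an "item".
sample_transactions = [
    [cell for cell in row if cell.strip()]
    for row in csv.reader(io.StringIO(raw))
]

print(sample_transactions)
```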

# Install apyori

In [7]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5955 sha256=66f2f6af04760573ac63a83e402c03c6db9294ee6dcaffbc84c33296af62df6e
  Stored in directory: /root/.cache/pip/wheels/c4/1a/79/20f55c470a50bb3702a8cb7c94d8ada15573538c7f4baebe2d
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


# 6) Build the Apriori model

We import the `apriori` function from the apyori module and store its output in the `rules` variable. We pass 6 parameters to `apriori`:

1. The `transactions` list, as the main input.

2. Minimum support, which we set to 0.003. We obtain this value by requiring that a product appear in at least 3 transactions a day. Our data was collected over a week, so the support value should be 3*7/7500 = 0.0028, which we round to 0.003.

3. Minimum confidence, which we set to 0.2 (chosen after analyzing various results).

4. Minimum lift, which we set to 3.

5. Minimum length, set to 2: we compute the lift of buying an item B given that another item A is bought, so we consider 2 items.

6. Maximum length, also set to 2, using the same logic [6].
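The thresholds above rely on three standard measures for a rule A → B: support is the fraction of transactions containing both A and B, confidence(A → B) = support(A, B) / support(A), and lift(A → B) = confidence(A → B) / support(B). A minimal sketch that computes them by hand, using a made-up five-basket example (the baskets and the helper names `support` and `rule_metrics` are hypothetical, not from the dataset or from apyori), together with the support arithmetic from item 2:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(wanted <= set(t) for t in transactions) / len(transactions)


def rule_metrics(transactions, a, b):
    """Support, confidence and lift of the rule {a} -> {b}."""
    s_ab = support(transactions, {a, b})
    confidence = s_ab / support(transactions, {a})
    lift = confidence / support(transactions, {b})
    return s_ab, confidence, lift


# Hypothetical toy baskets, for illustration only
baskets = [
    ["light cream", "chicken"],
    ["light cream", "chicken", "eggs"],
    ["chicken", "eggs"],
    ["eggs", "milk"],
    ["milk"],
]

print(rule_metrics(baskets, "light cream", "chicken"))  # confidence 1.0
print(rule_metrics(baskets, "chicken", "light cream"))  # lower confidence; lift is mathematically the same

# Item 2's threshold: 3 purchases a day, over 7 days, out of about 7500 transactions
print(3 * 7 / 7500)
```

Lift is symmetric in A and B while confidence is not, which is why the two `OrderedStatistic` entries of each record in step 8 share a lift but differ in confidence.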

In [8]:
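from apyori import apriori
# The outputs below came from a run with the typo "min_cinfidence", which apyori ignores, so no confidence filter was applied
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2,
                min_lift = 3, min_length = 2, max_length = 2)  # apyori has no min_length option; it is ignored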
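For intuition about what the call does when rules are limited to pairs (`max_length = 2`), here is a minimal pure-Python sketch of the level-wise Apriori idea: count single items, keep only those meeting the minimum support, count candidate pairs built only from surviving items (a pair containing an infrequent item cannot itself be frequent), then filter the resulting rules by confidence and lift. The function name `apriori_pairs` and the toy baskets are hypothetical; this is an illustration under these simplifications, not apyori's actual implementation.

```python
from collections import Counter
from itertools import combinations


def apriori_pairs(transactions, min_support, min_confidence, min_lift):
    """Return (lhs, rhs, support, confidence, lift) tuples for frequent item pairs."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}

    # Level 2: only count pairs whose items are both frequent (Apriori pruning)
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket & frequent), 2):
            pair_counts[pair] += 1

    rules = []
    for (a, b), count in pair_counts.items():
        pair_support = count / n
        if pair_support < min_support:
            continue
        # Emit the rule in both directions, like the two OrderedStatistic entries
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            if confidence >= min_confidence and lift >= min_lift:
                rules.append((lhs, rhs, pair_support, confidence, lift))
    return rules


# Hypothetical toy baskets, for illustration only
baskets = [
    ["light cream", "chicken"],
    ["light cream", "chicken", "eggs"],
    ["chicken", "eggs"],
    ["eggs", "milk"],
    ["milk"],
]
for rule in apriori_pairs(baskets, min_support=0.3, min_confidence=0.5, min_lift=1.2):
    print(rule)
```

Here the chicken/eggs pair is frequent but fails the lift threshold, so only the two chicken/light cream rules survive.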

# 7) Store the rules as a list

In [9]:
results = list(rules)
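apyori's `apriori` yields its records lazily instead of returning a list, which is why this step wraps it in `list(...)`. A generator can be consumed only once, so the materialized list in `results` is what the later steps work with, and `len(results)` gives the number of rules. The toy generator below is hypothetical (not apyori code) and only illustrates that behaviour.

```python
def toy_rule_generator():
    """Hypothetical stand-in for a lazily evaluated rule generator."""
    for pair in [("brownies", "cottage cheese"), ("chicken", "light cream")]:
        yield pair


toy_rules = toy_rule_generator()
toy_results = list(toy_rules)   # runs the generator to completion
print(len(toy_results))         # number of rules collected
print(list(toy_rules))          # the generator is now exhausted
```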

# 8) Have a glance at the rules

In [10]:
results

[RelationRecord(items=frozenset({'cottage cheese', 'brownies'}), support=0.0034662045060658577, ordered_statistics=[OrderedStatistic(items_base=frozenset({'brownies'}), items_add=frozenset({'cottage cheese'}), confidence=0.10276679841897232, lift=3.225329518580382), OrderedStatistic(items_base=frozenset({'cottage cheese'}), items_add=frozenset({'brownies'}), confidence=0.10878661087866107, lift=3.2253295185803816)]),
 RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'chicken'}), items_add=frozenset({'light cream'}), confidence=0.07555555555555556, lift=4.843950617283951), OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'escalope'}),

![](https://miro.medium.com/max/1400/1*0xyxcrJhOTncryWyJ7BJyQ.png)

# 9) Visualizing the results

In `lhs` we store the left-hand item of each rule (the item already bought, `items_base`), and in `rhs` the right-hand item that tends to be bought with it (`items_add`); both are taken from the first ordered statistic (rule direction) of each record.
`supports`, `confidences` and `lifts` hold the corresponding support, confidence and lift values from the results [6].
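Each `RelationRecord` printed in step 8 holds `items`, `support` and `ordered_statistics`, and each `OrderedStatistic` holds `items_base`, `items_add`, `confidence` and `lift`, so a record can be read either by position (`result[2][0][0]`) or by field name (`result.ordered_statistics[0].items_base`). The sketch below rebuilds the first printed record with plain namedtuples that mirror those field names; apyori's own classes are not imported here, and the values are copied from the output above.

```python
from collections import namedtuple

# Stand-ins mirroring the field names printed in step 8
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

record = RelationRecord(
    items=frozenset({"cottage cheese", "brownies"}),
    support=0.0034662045060658577,
    ordered_statistics=[
        OrderedStatistic(frozenset({"brownies"}), frozenset({"cottage cheese"}),
                         0.10276679841897232, 3.225329518580382),
        OrderedStatistic(frozenset({"cottage cheese"}), frozenset({"brownies"}),
                         0.10878661087866107, 3.2253295185803816),
    ],
)

first_rule = record.ordered_statistics[0]   # same object as record[2][0]
lhs = tuple(first_rule.items_base)[0]       # same as tuple(record[2][0][0])[0]
rhs = tuple(first_rule.items_add)[0]        # same as tuple(record[2][0][1])[0]
print(lhs, "->", rhs, record.support, first_rule.confidence, first_rule.lift)
```

Only the first direction (brownies → cottage cheese) is kept; the reverse rule in `ordered_statistics[1]` is discarded.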

In [12]:
def inspect(results):
    # For each RelationRecord, keep only its first OrderedStatistic (one rule direction)
    lhs         = [tuple(result.ordered_statistics[0].items_base)[0] for result in results]
    rhs         = [tuple(result.ordered_statistics[0].items_add)[0] for result in results]
    supports    = [result.support for result in results]
    confidences = [result.ordered_statistics[0].confidence for result in results]
    lifts       = [result.ordered_statistics[0].lift for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))

resultsinDataFrame = pd.DataFrame(inspect(results), columns = ["Left hand side", "Right hand side", "Support", "Confidence", "Lift"])

**Finally, we store these variables in a single DataFrame so that they are easier to view.**

In [13]:
resultsinDataFrame

|  | Left hand side | Right hand side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | brownies | cottage cheese | 0.003466 | 0.102767 | 3.22533 |
| 1 | chicken | light cream | 0.004533 | 0.075556 | 4.843951 |
| 2 | escalope | mushroom cream sauce | 0.005733 | 0.072269 | 3.790833 |
| 3 | escalope | pasta | 0.005866 | 0.07395 | 4.700812 |
| 4 | fresh bread | tomato juice | 0.004266 | 0.099071 | 3.259356 |
| 5 | fresh tuna | honey | 0.003999 | 0.179641 | 3.78507 |
| 6 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 7 | ground beef | herb & pepper | 0.015998 | 0.162822 | 3.291994 |
| 8 | ground beef | tomato sauce | 0.005333 | 0.054274 | 3.840659 |
| 9 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |


# 10) Sort the rules by lift

Now, we list the ten rules with the highest lift, in descending order.

In [15]:
resultsinDataFrame.nlargest(n = 10, columns = "Lift")

|  | Left hand side | Right hand side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 6 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 1 | chicken | light cream | 0.004533 | 0.075556 | 4.843951 |
| 3 | escalope | pasta | 0.005866 | 0.07395 | 4.700812 |
| 11 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 10 | olive oil | whole wheat pasta | 0.007999 | 0.121457 | 4.12241 |
| 8 | ground beef | tomato sauce | 0.005333 | 0.054274 | 3.840659 |
| 2 | escalope | mushroom cream sauce | 0.005733 | 0.072269 | 3.790833 |
| 5 | fresh tuna | honey | 0.003999 | 0.179641 | 3.78507 |
| 7 | ground beef | herb & pepper | 0.015998 | 0.162822 | 3.291994 |
| 4 | fresh bread | tomato juice | 0.004266 | 0.099071 | 3.259356 |
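`DataFrame.nlargest(n = 10, columns = "Lift")` returns the first ten rows ordered by lift in descending order, which is the same as sorting by lift and keeping the top ten. For illustration only, the same selection can be sketched in plain Python with `heapq.nlargest`, here on four (left, right, lift) rows copied from the table above (`sample_rules` is a hypothetical name):

```python
import heapq

# (left hand side, right hand side, lift) rows copied from the results table
sample_rules = [
    ("brownies", "cottage cheese", 3.22533),
    ("chicken", "light cream", 4.843951),
    ("fromage blanc", "honey", 5.164271),
    ("light cream", "olive oil", 3.11471),
]

# Keep the 2 rules with the highest lift, largest first
top = heapq.nlargest(2, sample_rules, key=lambda rule: rule[2])
for left, right, lift in top:
    print(f"{left} -> {right}: lift {lift}")
```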


**This is the final result of our Apriori implementation in Python. The supermarket can use these rules to boost its sales, prioritizing offers on the item pairs with the highest lift [6].**