# ****How to solve the Apriori algorithm in a simple way from scratch?****

****Note: All the contents of the images, including tables and calculations and codes have been investigated by me and there is no need to refer any references for them.****

![](https://miro.medium.com/max/1400/1*jAv5qpTW930FA629StRcUA.jpeg)

#  ****Introduction****

****There are several methods for machine learning such as association, correlation, classification & clustering, this tutorial primarily focuses on learning using association rules. By association rules, we identify the set of items or attributes that occur together in a table[1].****

# ****Association Rule Learning****

****The association rule learning is one of the very important concepts of machine learning,and it is employed in Market Basket analysis, Web usage mining,
continuous production, etc. Here market basket analysis is a technique used by the various big retailer to discover the associations between items.
We can understand it by taking an example of a supermarket, as in a supermarket,
all products that are purchased together are put together[2].****

 ****Association rule learning can be divided into three types of algorithms[2]:****

****1.     Apriori****

****2.     Eclat****

****3.     F-P Growth Algorithm****

# **** Introduction to APRIORI****

****Apriori is an algorithm used for Association Rule learning. It searches for a series of frequent sets of items in the datasets. It builds on associations and correlations between the itemsets. It is the algorithm behind “You may also like” that you commonly saw in recommendation platforms[3].****

![](https://miro.medium.com/max/1400/1*ivVrrcaltUDoVzWdw0748A.jpeg)


# ****What is an Apriori algorithm?****

****Apriori algorithm assumes that any subset of a frequent itemset must be frequent. Say, a transaction containing {milk, eggs, bread} also contains {eggs, bread}. So, according to the principle of Apriori, if {milk, eggs, bread} is frequent, then {eggs, bread} must also be frequent [4].****

# ****How Does the Apriori Algorithm Work?****

****In order to select the interesting rules out of multiple possible rules from this small business scenario, we will be using the following measures[4]:****

*****     Support****

*****     Confidence****

*****     Lift****

*****     Conviction****

![](https://miro.medium.com/max/1102/1*xMlBaEFymfCS7JtFDg3svQ.png)

# ****Support****

****Support of item x is nothing but the ratio of the number of transactions in which item x appears to the total number of transactions.****

# ****Confidence****

****Confidence (x => y) signifies the likelihood of the item y being purchased when item x is purchased. This method takes into account the popularity of item x.****

# ****Lift****

****Lift (x => y) is nothing but the ‘interestingness’ or the likelihood of the item y being purchased when item x is sold. Unlike confidence (x => y), this method takes into account the popularity of the item y.****

* ****Lift (x => y) = 1**** means that there is no correlation within the itemset.
* ****Lift (x => y) > 1**** means that there is a positive correlation within the itemset, i.e.,   products in the itemset, x and y, are more likely to be bought together.
* ****Lift (x => y) < 1**** means that there is a negative correlation within the itemset, i.e.,   products in itemset, x and y, are unlikely to be bought together.

# ****Conviction****

****Conviction of a rule can be defined as follows:**** 

![](https://miro.medium.com/max/218/0*BirEb6MzYN2WP8-i.png)

****Its value range is [0, +∞].****

*****     Conv(x => y) = 1 means that x has no relation with y.****

*****     Greater the conviction higher the interest in the rule.****

![](https://miro.medium.com/max/1400/1*3eQ7CesIRbMiZ2JePUdpjA.png)

# ****Now, we want to solve a problem of the Apriori algorithm in a simple way:****

****Part(a): Apply the Apriori algorithm to the following data set:****

![](https://miro.medium.com/max/1400/1*2_wzA0N-ytk6ya9R9nNi3A.jpeg)

 ****Step-1:****

In the first step, we index the data and then calculate the support for each one, if support was less than the minimum value we eliminate that from the table.

![](https://miro.medium.com/max/1400/1*7sZy0tdalHYz3AXYDqfMfA.png)

****Step-2:****

Calculate the support for each one

![](https://miro.medium.com/max/1400/1*xiO6tZqDLeHOc1ED5XyMTg.png)

****Step-3:****

Continue to calculate the support and select the best answer

![](https://miro.medium.com/max/1400/1*BhKcwrfIPAlvsVIpXUYiJQ.jpeg)

****Part(b): Show two rules that have a confidence of 70% or greater for an itemset containing three items from part a.****

# ****Step-1:****

****Calculate the confidence and follow the rules of question in part(b)****

![](https://miro.medium.com/max/1400/1*hr34WAD1r3QMyEdJt31Mdg.jpeg)

# ****Step-2:****

****In addition to the above rules, the following can also be considered, but in the question only two rules are required for calculation.****

![](https://miro.medium.com/max/1400/1*spVbFUgjYhFu_aS9UHoSRw.png)

# ****Hands-on: Apriori Algorithm in Python- Market Basket Analysis****

# ****Problem Statement:****

****For the implementation of the Apriori algorithm, we are using data collected from a SuperMarket, where each row indicates all the items purchased in a particular transaction.****

> ****The manager of a retail store is trying to find out an association rule between items, to figure out which items are more often bought together so that he can keep the items together in order to increase sales.
The dataset has 7,500 entries. Drive link to download dataset[4][6].****

# ****Environment Setup:****

****Before we move forward, we need to install the ‘apyori’ package first on command prompt.****

![](https://miro.medium.com/max/764/1*BuwVTV6dcOW1PuzKiriuJA.jpeg)

# ****Market Basket Analysis Implementation within Python****

****With the help of the apyori package, we will be implementing the Apriori algorithm in order to help the manager in market basket analysis [4].****

![](https://miro.medium.com/max/1400/1*uUTyXbR7RgRsfoJ4LBmP1A.png)

# ****Step-1: We import the necessary libraries required for the implementation****

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# ****Step-2: Load the dataset****

****Now we have to proceed by reading the dataset we have, that is in a csv format. We do that using pandas module’s read_csv function [6].****

In [2]:
dataset = pd.read_csv("../input/market-basket/Market_Basket_Optimisation.csv", header = None)
transactions = []
for i in range(0, 7501):
    transactions.append([str(dataset.values[i,j]) for j in range(0,20)])

# ****Step-3: Take a glance at the records****

In [3]:
dataset

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,butter,light mayo,fresh bread,,,,,,,,,,,,,,,,,
7497,burgers,frozen vegetables,eggs,french fries,magazines,green tea,,,,,,,,,,,,,,
7498,chicken,,,,,,,,,,,,,,,,,,,
7499,escalope,green tea,,,,,,,,,,,,,,,,,,


# ****Step-4: Look at the shape****

In [4]:
dataset.shape

(7501, 20)

# ****Step-5: Convert Pandas DataFrame into a list of lists****

In [5]:
for i in range(0, 7501):
    transactions.append([str(dataset.values[i,j]) for j in range(0,20)])

In [6]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... [?25l- done
[?25hBuilding wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... [?25l- \ done
[?25h  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=ded9511119ee58b9d5e1562b3a7db41ac2d1cd3913b6e7ae7d7039e68b7465aa
  Stored in directory: /root/.cache/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2
[0m

# ****Step-6: Build the Apriori model****

****We import the apriori function from the apyori module. We store the resulting         output from apriori function in the ‘rules’ variable. To the apriori function, 
we pass 6 parameters:****

****1.      The transactions List as the main inputs****

****2.      Minimum support, which we set as 0.003 We get that value by considering that a product should appear at least in 3 transactions in a day. Our data is collected over a week. Hence, the support value should be 3*7/7500 = 0.0028****

****3.       Minimum confidence, which we choose to be 0.2 (obtained over-analyzing various results)****

****4.      Minimum lift, which we’ve set to 3****

****5.     Minimum Length is set to 2, as we are calculating the lift values for buying an item B given another item A is bought, so we take 2 items into consideration.****

****6.      Minimum Length is set to 2 using the same logic[6].****

In [7]:
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_cinfidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)

# ****Step-7: Print out the number of rules as list****

In [8]:
results = list(rules)

# ****Step-8: Have a glance at the rules****

In [9]:
results

[RelationRecord(items=frozenset({'brownies', 'cottage cheese'}), support=0.0034662045060658577, ordered_statistics=[OrderedStatistic(items_base=frozenset({'brownies'}), items_add=frozenset({'cottage cheese'}), confidence=0.10276679841897232, lift=3.225329518580382), OrderedStatistic(items_base=frozenset({'cottage cheese'}), items_add=frozenset({'brownies'}), confidence=0.10878661087866107, lift=3.2253295185803816)]),
 RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'chicken'}), items_add=frozenset({'light cream'}), confidence=0.07555555555555556, lift=4.843950617283951), OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'escalope'}),

* ![](https://miro.medium.com/max/1400/1*0xyxcrJhOTncryWyJ7BJyQ.png)

# ****Step-9: Visualizing the results****

****In the LHS variable, we store the first item from all the results, from which we obtain the second item that is bought after that item is already bought, which is now stored in the RHS variable.
The supports, confidences and lifts store all the support, confidence and lift values from the results [6].****

In [10]:
def inspect(results):
    lhs         =[tuple(result[2][0][0])[0] for result in results]
    rhs         =[tuple(result[2][0][1])[0] for result in results]
    supports    =[result[1] for result in results]
    confidences =[result[2][0][2] for result in results]
    lifts        =[result[2][0][3] for result in results]
    return list (zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ["Left hand side", "Right hand side", "Support", "Confidence", "Lift"])

****Finally, we store these variables into one dataframe, so that they are easier to visualize.****

In [11]:
resultsinDataFrame

Unnamed: 0,Left hand side,Right hand side,Support,Confidence,Lift
0,brownies,cottage cheese,0.003466,0.102767,3.22533
1,chicken,light cream,0.004533,0.075556,4.843951
2,escalope,mushroom cream sauce,0.005733,0.072269,3.790833
3,escalope,pasta,0.005866,0.07395,4.700812
4,fresh bread,tomato juice,0.004266,0.099071,3.259356
5,fresh tuna,honey,0.003999,0.179641,3.78507
6,fromage blanc,honey,0.003333,0.245098,5.164271
7,ground beef,herb & pepper,0.015998,0.162822,3.291994
8,ground beef,tomato sauce,0.005333,0.054274,3.840659
9,light cream,olive oil,0.0032,0.205128,3.11471


# ****Now, we sort these final outputs in the descending order of lifts.****

In [12]:
resultsinDataFrame.nlargest(n = 10, columns = "Lift")

Unnamed: 0,Left hand side,Right hand side,Support,Confidence,Lift
6,fromage blanc,honey,0.003333,0.245098,5.164271
1,chicken,light cream,0.004533,0.075556,4.843951
3,escalope,pasta,0.005866,0.07395,4.700812
11,pasta,shrimp,0.005066,0.322034,4.506672
10,olive oil,whole wheat pasta,0.007999,0.121457,4.12241
8,ground beef,tomato sauce,0.005333,0.054274,3.840659
2,escalope,mushroom cream sauce,0.005733,0.072269,3.790833
5,fresh tuna,honey,0.003999,0.179641,3.78507
7,ground beef,herb & pepper,0.015998,0.162822,3.291994
4,fresh bread,tomato juice,0.004266,0.099071,3.259356


****This is the final result of our apriori implementation in python. The SuperMarket will use this data to boost their sales and prioritize giving offers on the pair of items with greater Lift values [6].****

# ****Why Apriori?****

****1.   It is an easy-to-implement and easy-to-understand algorithm.****
   
****2.   It can be easily implemented on large datasets.****

# ****Limitations of Apriori Algorithm****

****Despite being a simple one, Apriori algorithms have some limitations including:****

*   ****Waste of time when it comes to handling a large number of candidates with frequent itemsets.****

*    ****The efficiency of this algorithm goes down when there is a large number of transactions going on through a limited memory capacity.****

*     ****Required high computation power and need to scan the entire database[4].****

# ****Summary****

![](https://miro.medium.com/max/1400/1*MIXTYPvJwc7uU01LR397jQ.png)

****Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps accordingly so that it can be more profitable. It tries to find some interesting relations or associations among the variables of the dataset. It is based on different rules to discover the interesting relations between variables in the database. The flowchart above will help summarize the entire working of the algorithm[2].****

References:

[1] https://www.softwaretestinghelp.com/apriori-algorithm/

[2] https://www.javatpoint.com/association-rule-learning

[3] https://towardsdatascience.com/underrated-machine-learning- algorithms-apriori-1b1d7a8b7bc

[4] https://intellipaat.com/blog/data-science-apriori-algorithm/

[5] Patterns of user involvement in experiment-driven software development, authors.(S. Yaman), (F. Fagerholm), (M. Munezero), (T.Männistö).December 2019, https://www.journals.elsevier.com/information-and-software-technology

[6] https://djinit-ai.github.io/2020/09/22/apriori-algorithm.html#understanding-our-used-case

[7] https://www.datacamp.com/tutorial/market-basket-analysis-r

[8] https://www.researchgate.net/figure/Flowchart-of-Apriori-algorithm_fig2_351361530https://www.researchgate.net/figure/Flowchart-of-Apriori-algorithm_fig2_351361530

 # ****Writer: [Parisan Ahmadi](https://www.kaggle.com/parisanahmadi)****

****Contact: [Medium](https://medium.com/@parisan.ahmadi), [Github](https://github.com/parisa-ahmadi)**** , ****[Parisan's Published Article in LevelUpCoding Journal](https://levelup.gitconnected.com/how-to-solve-the-apriori-algorithm-in-a-simple-way-from-scratch-9540cfc5c11a)****