In [1]:
!pip install apyori



# Theory of Apriori Algorithm

## There are three major components of the Apriori algorithm:

    Support
    Confidence
    Lift


# Support

Support refers to the default popularity of an item. It is calculated by dividing the number of transactions containing a particular item by the total number of transactions. Suppose we want to find the support for item B. This can be calculated as:

    Support(B) = (Transactions containing (B))/(Total Transactions)

For instance, if 100 out of 1,000 transactions contain Ketchup, then the support for Ketchup is:

    Support(Ketchup) = (Transactions containing Ketchup)/(Total Transactions)

    Support(Ketchup) = 100/1000
                     = 10%
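The support formula can be sketched in a few lines of Python. The basket list below is hypothetical toy data (not from the groceries file), built so that 100 of its 1,000 transactions contain ketchup, matching the example above; the function name `support` is likewise just for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical toy data: 1,000 baskets, 100 of which contain ketchup.
transactions = (
    [["burger", "ketchup"]] * 50
    + [["burger"]] * 100
    + [["ketchup"]] * 50
    + [["fries"]] * 800
)

print(support(transactions, ["ketchup"]))  # 0.1, i.e. 10%
```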


# Confidence

Confidence refers to the likelihood that item B is also bought when item A is bought. It is calculated by dividing the number of transactions where A and B are bought together by the total number of transactions where A is bought. Mathematically, it can be represented as:

    Confidence(A→B) = (Transactions containing both (A and B))/(Transactions containing A)

Returning to our example, suppose Burger and Ketchup were bought together in 50 transactions, and Burger was bought in 150 transactions. The likelihood of buying Ketchup when a Burger is bought is the confidence of Burger -> Ketchup, which can be written as:

    Confidence(Burger→Ketchup) = (Transactions containing both (Burger and Ketchup))/(Transactions containing Burger)

    Confidence(Burger→Ketchup) = 50/150
                           = 33.3%
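A minimal sketch of the confidence calculation on the same hypothetical toy baskets (50 with burger and ketchup, 150 with burger in total). Returning 0.0 when A never occurs is a choice made here to avoid dividing by zero; it is not part of the definition.

```python
def confidence(transactions, antecedent, consequent):
    """Share of transactions containing antecedent that also contain consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    with_a = [t for t in transactions if a <= set(t)]
    if not with_a:
        # No transaction contains A, so the ratio is undefined; report 0.0.
        return 0.0
    return sum(1 for t in with_a if both <= set(t)) / len(with_a)


# Hypothetical toy data: 150 baskets with burger, 50 of them also with ketchup.
transactions = (
    [["burger", "ketchup"]] * 50
    + [["burger"]] * 100
    + [["ketchup"]] * 50
    + [["fries"]] * 800
)

print(round(confidence(transactions, ["burger"], ["ketchup"]), 3))  # 0.333
```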


# Lift

Lift(A -> B) measures how much more likely B is to be bought when A is bought, relative to how often B is bought overall. It is calculated by dividing Confidence(A -> B) by Support(B). Mathematically it can be represented as:

    Lift(A→B) = (Confidence (A→B))/(Support (B))

Coming back to our Burger and Ketchup problem, the Lift(Burger -> Ketchup) can be calculated as:

    Lift(Burger→Ketchup) = (Confidence (Burger→Ketchup))/(Support (Ketchup))

    Lift(Burger→Ketchup) = 33.3%/10%
                         = 3.33

Lift tells us that a customer who buys a Burger is 3.33 times more likely to buy Ketchup than a customer in general. A Lift of 1 means there is no association between products A and B. A Lift greater than 1 means A and B are bought together more often than they would be if they were independent. Finally, a Lift less than 1 means A and B are bought together less often than expected.
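The lift formula can be checked on the same hypothetical toy baskets. This sketch assumes both A and B occur at least once (otherwise the divisions fail); the function name `lift` is illustrative.

```python
def lift(transactions, antecedent, consequent):
    """Confidence(A -> B) divided by Support(B)."""
    a, b = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_b = sum(1 for t in transactions if b <= set(t))
    n_ab = sum(1 for t in transactions if (a | b) <= set(t))
    confidence_ab = n_ab / n_a           # Confidence(A -> B)
    support_b = n_b / len(transactions)  # Support(B)
    return confidence_ab / support_b


# Hypothetical toy data from the Support and Confidence examples.
transactions = (
    [["burger", "ketchup"]] * 50
    + [["burger"]] * 100
    + [["ketchup"]] * 50
    + [["fries"]] * 800
)

print(round(lift(transactions, ["burger"], ["ketchup"]), 2))  # 3.33
```

Because Lift(A→B) also equals Support(A and B) / (Support(A) × Support(B)), swapping A and B gives the same value, unlike confidence.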

In [2]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from apyori import apriori
import pandas as pd

In [3]:
# Import the dataset
dataset = pd.read_csv("groceries - groceries.csv")

In [4]:
dataset.head()

Unnamed: 0,Item(s),Item 1,Item 2,Item 3,Item 4,Item 5,Item 6,Item 7,Item 8,Item 9,...,Item 23,Item 24,Item 25,Item 26,Item 27,Item 28,Item 29,Item 30,Item 31,Item 32
0,4,citrus fruit,semi-finished bread,margarine,ready soups,,,,,,...,,,,,,,,,,
1,3,tropical fruit,yogurt,coffee,,,,,,,...,,,,,,,,,,
2,1,whole milk,,,,,,,,,...,,,,,,,,,,
3,4,pip fruit,yogurt,cream cheese,meat spreads,,,,,,...,,,,,,,,,,
4,4,other vegetables,whole milk,condensed milk,long life bakery product,,,,,,...,,,,,,,,,,


Now we will use the Apriori algorithm to find out which items are commonly sold together, so that the store owner can place related items together or advertise them together to increase profit.

In [9]:
shape_data = dataset.shape

# Data Preprocessing

The apyori library we are going to use requires our dataset to be a list of lists, where the whole dataset is one outer list and each transaction is an inner list within it. Currently our data is in a pandas DataFrame. To convert the DataFrame into a list of lists, execute the following script:

In [12]:
# Build one list of strings per transaction (row), keeping every column as-is
record = []
for i in range(shape_data[0]):
    record.append([str(dataset.values[i][j]) for j in range(shape_data[1])])
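The loop above converts every column, so the `Unnamed: 0` index column and the `Item(s)` count column end up as "items", and every empty cell becomes the string `'nan'`. This is why a bare number such as `'10'` shows up in the rules below. Here is a minimal sketch of a cleaner conversion, assuming (from `dataset.head()` above) that the first two columns are the row index and the item count; the helper name `to_transactions` is hypothetical.

```python
import math


def to_transactions(rows, skip_cols=2):
    """Turn table rows into a list of item lists for apyori.

    skip_cols drops the leading non-item columns (assumed here to be the
    unnamed index column and the Item(s) count column). Empty cells, which
    pandas reads as NaN, are skipped instead of becoming the string 'nan'.
    """
    transactions = []
    for row in rows:
        items = []
        for value in list(row)[skip_cols:]:
            if value is None or (isinstance(value, float) and math.isnan(value)):
                continue
            text = str(value).strip()
            if text:
                items.append(text)
        transactions.append(items)
    return transactions


nan = float("nan")
rows = [
    [0, 4, "citrus fruit", "semi-finished bread", "margarine", "ready soups", nan],
    [2, 1, "whole milk", nan, nan, nan, nan],
]
print(to_transactions(rows))
```

With the DataFrame above this would be called as `record = to_transactions(dataset.values, skip_cols=2)`. Rerunning Apriori on the cleaned records will give a different rule list from the one printed below.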

# Applying Apriori

The next step is to apply the Apriori algorithm to the dataset. To do so, we can use the apriori function that we imported from the apyori library.

The apriori function needs some parameter values to work. The first argument is the list of lists that you want to extract rules from. The min_support parameter keeps only itemsets whose support is at least the specified value. Next, min_confidence keeps only rules whose confidence reaches the threshold, and min_lift specifies the minimum lift for the shortlisted rules. apyori also accepts max_length, the maximum number of items in a rule. It does not define a min_length parameter, so passing min_length has no effect; with a lift threshold above 1, single-item records (whose lift is 1) are filtered out anyway.
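To see what these thresholds do, here is a small pure-Python sketch of the Apriori idea: grow frequent itemsets level by level, prune any candidate that has an infrequent subset, then derive rules with a one-item consequent and filter them by confidence and lift. It is a teaching sketch under those assumptions, not apyori's internal implementation; the function names and the burger/ketchup baskets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search returning {itemset: support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        scored = {c: sum(1 for basket in baskets if c <= basket) / n for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: single items that meet min_support.
    current = keep_frequent({frozenset([i]) for basket in baskets for i in basket})
    result = dict(current)
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {x | y for x, y in combinations(prev, 2) if len(x | y) == k}
        # Prune: every (k-1)-item subset must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        result.update(current)
        k += 1
    return result


def mine_rules(transactions, min_support, min_confidence=0.0, min_lift=0.0):
    """Rules base -> {item} built from frequent itemsets."""
    freq = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            add = frozenset([item])
            base = itemset - add
            # Subsets of a frequent itemset are frequent, so both keys exist.
            conf = supp / freq[base]
            rule_lift = conf / freq[add]
            if conf >= min_confidence and rule_lift >= min_lift:
                rules.append((set(base), set(add), supp, conf, rule_lift))
    return rules


# Hypothetical toy data reused from the Support/Confidence/Lift examples.
transactions = (
    [["burger", "ketchup"]] * 50
    + [["burger"]] * 100
    + [["ketchup"]] * 50
    + [["fries"]] * 800
)

for base, add, supp, conf, rule_lift in mine_rules(
    transactions, min_support=0.04, min_confidence=0.2, min_lift=3
):
    print(base, "->", add, round(supp, 3), round(conf, 3), round(rule_lift, 2))
```

The prune step is what makes Apriori efficient: a candidate with any infrequent subset cannot itself be frequent, so its support is never counted.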

Let's suppose that we want rules only for items that are purchased at least 5 times a day, or 7 × 5 = 35 times in one week, assuming the dataset covers one week. For about 7,500 transactions, the support for those items is 35/7500 ≈ 0.0047, which we round down to 0.0045; shape_data[0] holds the actual number of transactions if you want to recompute it. The minimum confidence for the rules is 20%, or 0.2. Similarly, we set the minimum lift to 3, and min_length to 2 since we want at least two products in our rules. These values are mostly arbitrary, so experiment with them and see how the rules you get back change.
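The threshold arithmetic can be written out as two small helpers. This is a sketch assuming an itemset passes when its support is at least min_support; the helper names are hypothetical, and 7500 should be replaced with `shape_data[0]` for the real dataset.

```python
import math


def support_threshold(times_per_day, days, n_transactions):
    """Turn 'bought at least times_per_day times a day' into a min_support value."""
    return times_per_day * days / n_transactions


def min_transaction_count(min_support, n_transactions):
    """Smallest number of transactions an itemset needs to reach min_support."""
    return math.ceil(min_support * n_transactions)


print(round(support_threshold(5, 7, 7500), 4))  # 0.0047
print(min_transaction_count(0.0045, 7500))      # 34
```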

Execute the following script:

In [34]:
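# min_support=0.0045, min_confidence=0.2 and min_lift=3, as discussed above.
# min_length is not one of apyori's keyword arguments, so it has no effect here.
association_rules = apriori(record, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
# apriori returns a generator; list() collects all the rules
association_results = list(association_rules)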

In [35]:
print(len(association_results))

129


In [36]:
print(association_results[0])

RelationRecord(items=frozenset({'10', 'domestic eggs'}), support=0.005083884087442806, ordered_statistics=[OrderedStatistic(items_base=frozenset({'10'}), items_add=frozenset({'domestic eggs'}), confidence=0.2032520325203252, lift=3.2034995830727535)])


For instance, the first record pairs '10' with 'domestic eggs'. '10' is not a product: the conversion loop above copies every column, including the Item(s) count column, so '10' marks transactions that contain ten items. The rule therefore says that ten-item baskets include domestic eggs more often than average, which is a side effect of the preprocessing rather than a product association.

The support of this rule is about 0.0051, the fraction of all transactions that contain both '10' and 'domestic eggs'. The confidence is 0.2033, which means that 20.33% of the transactions containing '10' also contain domestic eggs. Finally, the lift of 3.2 tells us that domestic eggs are 3.2 times more likely to appear in these transactions than in transactions overall.
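As the printed record shows, each apyori result has named fields: `items`, `support` and `ordered_statistics`, and each `OrderedStatistic` has `items_base`, `items_add`, `confidence` and `lift`. Reading them by name keeps the rule's real base -> add direction, which indexing into the unordered `items` set does not guarantee. So that the sketch runs without apyori, it rebuilds both record types as namedtuples with the field names from the printed output (stand-ins, not apyori's own classes); `format_rules` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins mirroring the field names in the printed RelationRecord above.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)


def format_rules(results):
    """Return one text line per ordered statistic, keeping base -> add order."""
    lines = []
    for record in results:
        for stat in record.ordered_statistics:
            base = ", ".join(sorted(stat.items_base))
            add = ", ".join(sorted(stat.items_add))
            lines.append(
                f"{base} -> {add} | support={record.support:.4f} "
                f"confidence={stat.confidence:.4f} lift={stat.lift:.4f}"
            )
    return lines


# The first record printed above, rebuilt with the stand-in types.
example = [
    RelationRecord(
        items=frozenset({"10", "domestic eggs"}),
        support=0.005083884087442806,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"10"}),
                items_add=frozenset({"domestic eggs"}),
                confidence=0.2032520325203252,
                lift=3.2034995830727535,
            )
        ],
    )
]

for line in format_rules(example):
    print(line)
```

With apyori's real results you would call `format_rules(association_results)`. Unlike the loop below, this prints every ordered statistic of a record, not just the first.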

In [40]:
for item in association_results:

    # item[0] (item.items) is the frozenset of all items in the rule.
    # Sets are unordered, so items[0] -> items[1] does not necessarily
    # follow the rule's base -> add direction.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))

    # item[2] is the list of ordered statistics; in its first entry,
    # index 2 is the confidence and index 3 is the lift
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: 10 -> domestic eggs
Support: 0.005083884087442806
Confidence: 0.2032520325203252
Lift: 3.2034995830727535
Rule: 12 -> other vegetables
Support: 0.0075241484494153535
Confidence: 0.6324786324786325
Lift: 3.2687479508288755
Rule: 13 -> other vegetables
Support: 0.004778851042196238
Confidence: 0.6025641025641024
Lift: 3.1141450072085903
Rule: other vegetables -> 14
Support: 0.00498220640569395
Confidence: 0.6363636363636364
Lift: 3.288826255195146
Rule: baking powder -> whipped/sour cream
Support: 0.004575495678698526
Confidence: 0.25862068965517243
Lift: 3.607850330154072
Rule: root vegetables -> beef
Support: 0.017386883579054397
Confidence: 0.3313953488372093
Lift: 3.0403668431100312
Rule: berries -> whipped/sour cream
Support: 0.009049313675648195
Confidence: 0.27217125382262997
Lift: 3.796885505454703
Rule: bottled beer -> liquor
Support: 0.004677173360447382
Confidence: 0.4220183486238532
Lift: 5.240594013529793
Rule: bottled beer -> red/blush wine
Support: 0.0048805287239450