# Apriori


**What is Apriori?**

- Apriori is an algorithm for frequent item set mining and association rule learning over relational databases.
- It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.

**Components of Apriori**

- Support

  - Support measures how frequently an item (or item set) appears in the data: the number of transactions containing the item divided by the total number of transactions.
  - $$Support(X) = \frac{Transactions\ with\ X}{Total\ Transactions}$$

- Confidence

  - Confidence refers to the likelihood that an item B is also bought if item A is bought.
  - It can be calculated by finding the number of transactions where A and B are bought together, divided by total number of transactions where A is bought.
  - $$Confidence(A \rightarrow B) = \frac{Transactions\ with\ A\ and\ B}{Transactions\ with\ A}$$

- Lift
  - Lift measures how much more likely B is to be bought when A is bought, compared with how often B is bought overall.
  - $$Lift(A \rightarrow B) = \frac{Confidence(A \rightarrow B)}{Support(B)}$$
  - A lift of $x$ means that a customer who buys A is $x$ times as likely to buy B as a customer picked at random. A lift of 1 means A and B are bought independently of each other.
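
The three measures above can be computed by hand on a small example. The following is a minimal sketch: the toy transactions and the helper names `support`, `confidence` and `lift` are made up for illustration and are not part of the `apyori` library.

```python
# Toy transactions, made up for illustration (not taken from the dataset).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(a, b, transactions):
    """Support of A and B together, divided by the support of A."""
    return support(set(a) | set(b), transactions) / support(a, transactions)


def lift(a, b, transactions):
    """Confidence of A -> B, divided by the support of B."""
    return confidence(a, b, transactions) / support(b, transactions)


s = support({"bread"}, transactions)                 # 3/5
c = confidence({"bread"}, {"butter"}, transactions)  # (2/5) / (3/5) = 2/3
l = lift({"bread"}, {"butter"}, transactions)        # (2/3) / (3/5) = 10/9
print(s, c, l)
```

Here the lift is only slightly above 1, so buying bread barely changes the chance of buying butter in this toy data.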

**Steps to perform Apriori**

- Set a minimum support and a minimum confidence.
- Keep all item sets whose support is higher than the minimum support.
- From these item sets, keep all rules whose confidence is higher than the minimum confidence.
- Sort the rules by decreasing lift.
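
The steps above can be sketched in plain Python for rules between two single items. This is a simplified illustration, not the `apyori` implementation: the function name `pair_rules` and the toy data are assumptions, only item sets of size 1 and 2 are considered, and thresholds are treated as inclusive (`>=`).

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence, lift) tuples, sorted by lift."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    # Step 2a: keep the frequent single items.
    items = sorted({item for t in transactions for item in t})
    frequent = [i for i in items if support({i}) >= min_support]

    rules = []
    # Step 2b: build candidate pairs only from frequent single items,
    # since a pair cannot be frequent if one of its items is not.
    for a, b in combinations(frequent, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Step 3: test the rule in both directions against min_confidence.
        for lhs, rhs in ((a, b), (b, a)):
            conf = pair_support / support({lhs})
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf, conf / support({rhs})))

    # Step 4: sort by decreasing lift.
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Toy data, made up for illustration.
toy = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread"],
    ["milk"],
    ["butter", "milk"],
]
rules = pair_rules(toy, min_support=0.4, min_confidence=0.6)
for rule in rules:
    print(rule)
```

The pair bread and milk appears in only one of the five toy transactions, so it is dropped at the support step and produces no rule.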


## Importing the libraries


In [3]:
%pip install apyori

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [4]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


## Data Preprocessing


In [5]:
dataset = pd.read_csv(
    "Market_Basket_Optimisation.csv", header=None
)  # header=None: the file has no header row, so the first line is read as data

transactions = []

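# apyori expects a list of lists as input:
# each inner list holds the products bought in one transaction,
# and each product is a string.
# Rows with fewer items than columns are padded with NaN by pandas,
# and str() turns those cells into the item 'nan'.

n_rows, n_cols = dataset.shape  # 7501 rows and 20 columns for this file
for i in range(n_rows):
    transactions.append([str(dataset.values[i, j]) for j in range(n_cols)])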
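
In the cell above, any row with fewer than 20 products is padded with NaN by pandas, and `str()` turns each padded cell into the literal item `'nan'`. A possible alternative, sketched here under the assumption that the file is a plain comma-separated list of products per line, reads the rows with the standard `csv` module so that no padding is introduced. The `sample` text below is a made-up stand-in for the file.

```python
import csv
import io

# Stand-in for the CSV file: rows of different lengths, made up for illustration.
sample = io.StringIO("shrimp,almonds,avocado\nburgers,eggs\nchutney\n")

# csv.reader yields each row as a list of strings without padding,
# so no 'nan' placeholder items appear; empty cells are dropped.
transactions = [[item for item in row if item] for row in csv.reader(sample)]

print(transactions)
```

With the real file, the same comprehension could be run over `csv.reader(open("Market_Basket_Optimisation.csv", newline=""))`.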


## Training the Apriori model on the dataset


In [6]:
from apyori import apriori

# the apriori function mines the dataset and returns the association rules
rules = apriori(
    transactions=transactions,
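    min_support=0.003,  # item set bought ~3 times a day over 7 days: 3*7/7500 = 0.0028, rounded up to 0.003
    min_confidence=0.2,  # right-hand side present in at least 20% of transactions containing the left-hand side
    min_lift=3,  # keep only rules with a lift of at least 3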
    min_length=2,
    max_length=2,
)
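
The arithmetic behind the `min_support` comment can be checked directly. This is a small sketch: the variable names are made up here, and 7501 is the transaction count used in the preprocessing loop.

```python
purchases_per_day = 3   # how often an item set should be bought per day
days = 7                # length of the period, in days
n_transactions = 7501   # number of rows read from the CSV file

threshold = purchases_per_day * days / n_transactions
print(round(threshold, 4))  # 0.0028, which the notebook rounds up to 0.003
```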


## Visualising the results


### Displaying the raw results returned by the apriori function


In [7]:
results = list(rules)


In [8]:
results


[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

### Organising the results into a pandas DataFrame


In [9]:
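def inspect(results):
    # Each result is (items, support, ordered_statistics); the first ordered
    # statistic is (items_base, items_add, confidence, lift).
    lhs = [tuple(result[2][0][0])[0] for result in results]  # item on the left-hand side
    rhs = [tuple(result[2][0][1])[0] for result in results]  # item on the right-hand side
    supports = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))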


resultsinDataFrame = pd.DataFrame(
    inspect(results),
    columns=["Left Hand Side", "Right Hand Side", "Support", "Confidence", "Lift"],
)
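
The positional indices in `inspect` map onto the field names visible in the raw output (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`). The sketch below shows the same extraction by field name. To stay self-contained it defines stand-in namedtuples with those field names rather than importing anything from `apyori`; the function name `inspect_by_name` is made up for illustration.

```python
from collections import namedtuple

# Stand-ins that mirror the field names shown in the raw output;
# they are illustrative and are not imported from apyori.
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

# First record of the raw output, copied by hand.
record = RelationRecord(
    items=frozenset({"chicken", "light cream"}),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)


def inspect_by_name(results):
    """Same extraction as inspect(), using field names instead of indices."""
    rows = []
    for result in results:
        stat = result.ordered_statistics[0]  # same as result[2][0]
        rows.append(
            (
                next(iter(stat.items_base)),  # single item on the left-hand side
                next(iter(stat.items_add)),   # single item on the right-hand side
                result.support,
                stat.confidence,
                stat.lift,
            )
        )
    return rows


rows = inspect_by_name([record])
print(rows)
```

Both versions read only the first ordered statistic and the first item of each side, which is sufficient here because every rule has exactly one item on each side.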


### Displaying the unsorted results


In [10]:
resultsinDataFrame


|   | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |


### Displaying the results sorted by descending lift


In [11]:
resultsinDataFrame.nlargest(n=10, columns="Lift")


|   | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
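
`nlargest` keeps the original row index, which is why the index column above is out of order. As a small sketch, the same ranking can be obtained with `sort_values` followed by `head`. The three rows below are copied from the table above, and the name `df` is made up for illustration.

```python
import pandas as pd

# Three rows copied from the results table, enough to show the ordering.
df = pd.DataFrame(
    {
        "Left Hand Side": ["light cream", "mushroom cream sauce", "fromage blanc"],
        "Right Hand Side": ["chicken", "escalope", "honey"],
        "Lift": [4.843951, 3.790833, 5.164271],
    }
)

top = df.nlargest(n=2, columns="Lift")
same = df.sort_values("Lift", ascending=False).head(2)

print(top)
print(top.equals(same))
```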
