# The Apriori Algorithm

[Prashant Brahmbhatt](https://www.github.com/hashbanger)

___

The Apriori algorithm is built on the Apriori principle:  
***"if an itemset is infrequent, then all its supersets must also be infrequent"***

This means that if {beer} is found to be infrequent, then {beer, pizza} can be at most as frequent as {beer}, so it must be infrequent too. When building the list of frequent itemsets, we therefore need not consider {beer, pizza}, nor any other itemset that contains beer.
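To make the pruning idea concrete, here is a minimal sketch of a level-wise frequent-itemset search. It is not how apyori is implemented; the function name `frequent_itemsets` and the toy baskets are made up for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, pruning infrequent ones."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: size-k candidates from unions of frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical toy data: beer appears in only 1 of 5 baskets
toy = [
    ["beer", "pizza"],
    ["pizza", "salad"],
    ["pizza", "salad", "soda"],
    ["salad", "soda"],
    ["pizza", "soda"],
]
result = frequent_itemsets(toy, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Because {beer} falls below the threshold at the first level, no larger itemset containing beer is ever generated or counted.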

### The Imports

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#### Getting the data

In [9]:
dataset = pd.read_csv("Market_Basket_Optimisation.csv", header = None)

In [10]:
dataset.head(20)

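|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs | | | | | | | | | | | | | | | | | |
| 2 | chutney | | | | | | | | | | | | | | | | | | | |
| 3 | turkey | avocado | | | | | | | | | | | | | | | | | | |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea | | | | | | | | | | | | | | | |
| 5 | low fat yogurt | | | | | | | | | | | | | | | | | | | |
| 6 | whole wheat pasta | french fries | | | | | | | | | | | | | | | | | | |
| 7 | soup | light cream | shallot | | | | | | | | | | | | | | | | | |
| 8 | frozen vegetables | spaghetti | green tea | | | | | | | | | | | | | | | | | |
| 9 | french fries | | | | | | | | | | | | | | | | | | | |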


![img](ap_overview.jpg)

In [11]:
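# Build one list of item names per transaction, skipping the empty (NaN) cells
# that pad short baskets; otherwise the string 'nan' would be counted as an item.
transactions = []
for row in dataset.values:
    transactions.append([str(item) for item in row if pd.notna(item)])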

In [12]:
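# Training the Apriori model
from apyori import apriori
# Keep itemsets found in at least 0.3% of transactions, with confidence >= 0.2 and lift >= 3.
# Note: apyori appears to read max_length but not min_length, so min_length likely has no effect.
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)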

___

![img2](formulas.jpg)

**Support** of an itemset is the fraction of all transactions that contain it, so it measures how popular the itemset is.   

**Confidence** signifies the likelihood of item Y being purchased when item X is purchased.  
It corresponds to the *conditional probability* 
$$P(Y|X)$$ 
that is, the probability of event Y occurring given that event X has occurred.  
Its major drawback is that it accounts for the popularity of X only, not of Y. If Y is popular on its own, confidence will be high even when X and Y are unrelated. To overcome this we use another measure, lift.  

**Lift** signifies the likelihood of item Y being purchased when X is purchased, while taking into account the popularity of Y as well.  
A lift > 1 means X and Y are bought together more often than would be expected if they were independent, a lift < 1 means less often, and a lift of 1 means no association.
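The three measures can be checked by hand on a tiny example. The sketch below uses the standard definitions (support as a fraction of transactions, confidence as support(X ∪ Y) / support(X), lift as confidence / support(Y)); the helper names and the toy baskets are made up for illustration and are not part of apyori.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items` (an iterable of names)."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Estimate of P(Y | X): support of X and Y together divided by support of X."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

def lift(transactions, x, y):
    """Confidence of X -> Y divided by the overall support of Y."""
    return confidence(transactions, x, y) / support(transactions, y)

# Hypothetical toy baskets
toy = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["milk"],
    ["butter"],
]

s = support(toy, ["bread", "butter"])        # 2 of 5 baskets
c = confidence(toy, ["bread"], ["butter"])   # (2/5) / (3/5) = 2/3
l = lift(toy, ["bread"], ["butter"])         # (2/3) / (3/5) = 10/9, slightly above 1
print(s, c, l)
```

Here butter is in 3 of 5 baskets overall but in 2 of the 3 baskets that contain bread, which is why the lift comes out a little above 1.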

___

### Getting the results

In [13]:
results = list(rules)
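Each entry of `results` bundles an itemset, its support, and one or more rules derived from it. The sketch below flattens such records into one row per rule and sorts them by lift. It assumes the record layout `(items, support, ordered_statistics)` with each statistic laid out as `(items_base, items_add, confidence, lift)`; the `Record` and `Stat` tuples and the sample values are stand-ins made up so the example runs without apyori, and `flatten_rules` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins mirroring the assumed shape of apyori's records
Stat = namedtuple("Stat", ["items_base", "items_add", "confidence", "lift"])
Record = namedtuple("Record", ["items", "support", "ordered_statistics"])

def flatten_rules(records):
    """Return one dict per rule, sorted by lift (highest first)."""
    rows = []
    for record in records:
        for stat in record[2]:
            rows.append({
                "antecedent": sorted(stat[0]),
                "consequent": sorted(stat[1]),
                "support": record[1],
                "confidence": stat[2],
                "lift": stat[3],
            })
    return sorted(rows, key=lambda r: r["lift"], reverse=True)

# Made-up sample records, not taken from the dataset
sample = [
    Record(frozenset({"a", "b"}), 0.10,
           [Stat(frozenset({"a"}), frozenset({"b"}), 0.5, 3.2)]),
    Record(frozenset({"c", "d"}), 0.05,
           [Stat(frozenset({"c"}), frozenset({"d"}), 0.4, 4.1),
            Stat(frozenset({"d"}), frozenset({"c"}), 0.3, 4.1)]),
]
table = flatten_rules(sample)
for row in table:
    print(row)
```

With the real data, `flatten_rules(results)` should work the same way as long as the records follow the layout assumed above.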

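#### Viewing the first 20 rules

In [17]:
# Read every field from the same record so the statistics belong to the itemset printed
for record in results[:20]:
    print('Rules: ', record[0])
    print('Support: ', record[1])
    print('Confidence: ', record[2][0][2])
    print('Lift: ', record[2][0][3])
    print()

The captured output below started at `results[1]` and read the statistics from `results[0]` on every iteration, so all of its entries show the same values.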

Rules:  frozenset({'mushroom cream sauce', 'escalope'})
Support:  0.004532728969470737
Confidence:  0.29059829059829057
Lift:  4.84395061728395

Rules:  frozenset({'pasta', 'escalope'})
Support:  0.004532728969470737
Confidence:  0.29059829059829057
Lift:  4.84395061728395

Rules:  frozenset({'fromage blanc', 'honey'})
Support:  0.004532728969470737
Confidence:  0.29059829059829057
Lift:  4.84395061728395

Rules:  frozenset({'ground beef', 'herb & pepper'})
Support:  0.004532728969470737
Confidence:  0.29059829059829057
Lift:  4.84395061728395

Rules:  frozenset({'ground beef', 'tomato sauce'})
Support:  0.004532728969470737
Confidence:  0.29059829059829057
Lift:  4.84395061728395

Rules:  frozenset({'light cream', 'olive oil'})
Support:  0.004532728969470737
Confidence:  0.29059829059829057
Lift:  4.84395061728395

Rules:  frozenset({'olive oil', 'whole wheat pasta'})
Support:  0.004532728969470737
Confidence:  0.29059829059829057
Lift:  4.84395061728395

Rules:  frozenset({'shrimp', 

As we can see, the dataset yields plenty of rules. **frozenset** is Python's built-in immutable set type; it appears in the output simply because each itemset is stored as a set that cannot be modified.
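A short illustration of what immutability means in practice for `frozenset`, using only built-in Python behaviour. The variable names and the support value are made up for the example.

```python
pair = frozenset({"escalope", "pasta"})

# Order does not matter: the same items always compare equal
same_pair = frozenset(["pasta", "escalope"])

# A frozenset is hashable, so it can be used as a dictionary key
support_by_itemset = {pair: 0.25}  # made-up value

# It has no add/remove methods, so it cannot be changed after creation
try:
    pair.add("shrimp")
    mutated = True
except AttributeError:
    mutated = False

# A regular set, by contrast, is mutable and therefore not hashable
try:
    {set(["pasta"]): 1}
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

print(pair == same_pair, support_by_itemset[same_pair], mutated, set_is_hashable)
```

Being hashable is what lets itemsets serve as dictionary keys or members of other sets, which is convenient when counting supports.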

### You're welcome!