## Apriori Algorithm
- Used to identify frequent item sets.
- Uses a bottom-up approach: first identify the individual items that meet a minimum occurrence threshold, then add one item at a time and check whether the resulting item set still meets the threshold.
- The algorithm stops when no more items can be added without falling below the minimum occurrence threshold.

### Apriori Algorithm in Action

Suppose the threshold is 3:  
**Order List**:  
order 1: bread, milk, butter  
order 2: bread, banana  
order 3: bread, butter  
order 4: bread, butter, banana  
order 5: banana, milk  

**Iteration 1**: Count the number of times each item appears across all the orders  
Item           Count  
(bread)  ->      4  
(butter) ->      3  
(banana) ->      3  
(milk)   ->      2  

Milk appears only twice, which is below the threshold, so it is removed.  


**Iteration 2**: From the items that survived iteration 1, build item sets of two items  
Itemset               Count  
(bread, butter)  ->     3  
(bread, banana)  ->     2  
(butter, banana) ->     1  

Only (bread, butter) meets the threshold. The algorithm stops here because a three-item set cannot be built from a single remaining pair.  
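
The level-by-level counting above can be sketched in a few lines of Python. This is a minimal illustration on the five example orders, not the implementation used by any library; the names `count_occurrences` and `apriori_counts` are made up for this sketch.

```python
from itertools import combinations

# The five orders from the example
orders = [
    {"bread", "milk", "butter"},
    {"bread", "banana"},
    {"bread", "butter"},
    {"bread", "butter", "banana"},
    {"banana", "milk"},
]
MIN_COUNT = 3  # minimum occurrence threshold from the example


def count_occurrences(itemset, orders):
    """Number of orders that contain every item of `itemset`."""
    return sum(1 for order in orders if itemset <= order)


def apriori_counts(orders, min_count):
    """Return {itemset: count} for every item set that meets `min_count`."""
    items = sorted({item for order in orders for item in order})
    # Level 1: single items that meet the threshold
    current = [frozenset([item]) for item in items]
    current = [s for s in current if count_occurrences(s, orders) >= min_count]
    frequent = {}
    size = 1
    while current:
        for s in current:
            frequent[s] = count_occurrences(s, orders)
        size += 1
        # Join step: combine surviving item sets into candidates one item larger
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        # Prune step: every smaller subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
        }
        current = [c for c in candidates if count_occurrences(c, orders) >= min_count]
    return frequent


frequent = apriori_counts(orders, MIN_COUNT)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
```

The loop ends on its own once a level produces no surviving item set, which is the stopping condition described above.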

## Association Rules 
Once item sets have been generated with Apriori, we can derive association rules from them. Because our item set contains 2 items, each rule has the form (A) -> (B). Rules are evaluated with three measures:  
$\text{Support} \longrightarrow \text{Confidence} \longrightarrow \text{Lift}$  

### Support

Support is the percentage of orders that contain the item set. In our case, there are 5 orders in total and (bread, butter) appear together in 3 of them:  

$$Support(bread, butter) = \frac{\text{No. of orders in which (bread, butter) appear together}}{\text{Total no. of orders}}$$


$Support(bread, butter) = \frac{3}{5} = 60\%$

### Confidence
Confidence measures the percentage of orders containing item A that also contain item B.  

$$Confidence(A \longrightarrow B) = \frac{Support(A,B)}{Support(A)}$$  
Confidence ranges from 0 to 1:  
0 -> B was never purchased when A was purchased  
1 -> B was always purchased when A was purchased  



$Confidence(Bread \longrightarrow Butter) = \frac{Support(Bread, Butter)}{Support(Bread)} = \frac{3/5}{4/5} = 0.75$  
$Confidence(Butter \longrightarrow Bread) = \frac{Support(Butter, Bread)}{Support(Butter)} = \frac{3/5}{3/5} = 1$  
The second result shows that whenever butter was purchased, bread was purchased too. Is this by chance, or is there a real relationship between the two items?
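
The support and confidence values above can be reproduced with a short Python sketch. The helper names `support` and `confidence` are illustrative, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# The five orders from the example
orders = [
    {"bread", "milk", "butter"},
    {"bread", "banana"},
    {"bread", "butter"},
    {"bread", "butter", "banana"},
    {"banana", "milk"},
]


def support(items):
    """Fraction of orders that contain every item in `items`."""
    items = set(items)
    return Fraction(sum(1 for order in orders if items <= order), len(orders))


def confidence(antecedent, consequent):
    """Support(A, B) / Support(A) for the rule A -> B."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


print(float(support({"bread", "butter"})))       # 0.6
print(float(confidence({"bread"}, {"butter"})))  # 0.75
print(float(confidence({"butter"}, {"bread"})))  # 1.0
```

Swapping the two arguments of `confidence` changes the denominator, which is why the measure is directional.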


### Lift
- Lift tells whether two items are related or appear together simply by chance.  
- Unlike confidence, lift has no direction.

$$Lift(A,B) = Lift(B,A) = \frac{Support(A,B)}{Support(A) \times Support(B)}$$
  
$Lift(Bread, Butter) = Lift(Butter, Bread) = \frac{Support(Bread, Butter)}{Support(Bread) \times Support(Butter)} = \frac{3/5}{(4/5) \times (3/5)} = 1.25$



* lift < 1 means that there is a negative relationship between A & B
* lift = 1 means no relationship between A & B
* lift > 1 means that there is a positive relationship between A & B
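
As a quick check of the lift formula and its symmetry, here is a small Python sketch on the five example orders. The helper names are illustrative, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

# The five orders from the example
orders = [
    {"bread", "milk", "butter"},
    {"bread", "banana"},
    {"bread", "butter"},
    {"bread", "butter", "banana"},
    {"banana", "milk"},
]


def support(items):
    """Fraction of orders that contain every item in `items`."""
    items = set(items)
    return Fraction(sum(1 for order in orders if items <= order), len(orders))


def lift(a, b):
    """Support(A, B) / (Support(A) * Support(B)); symmetric in A and B."""
    return support(set(a) | set(b)) / (support(a) * support(b))


bread_butter = lift({"bread"}, {"butter"})
butter_bread = lift({"butter"}, {"bread"})
print(float(bread_butter))           # 1.25
print(bread_butter == butter_bread)  # True
```

A lift of 1.25 is above 1, so bread and butter appear together more often than their individual supports alone would predict.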

### Algorithm

**Step 1:** Set a minimum support and a minimum confidence  
**Step 2:** Take all the item sets in the transactions whose support is higher than the minimum support  
**Step 3:** Take all the rules from these item sets whose confidence is higher than the minimum confidence  
**Step 4:** Sort the rules by decreasing lift  
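
The four steps can be sketched end to end on the five example orders. The thresholds below are illustrative values chosen for this sketch, not values from the text. For brevity, step 2 enumerates every possible item set by brute force instead of using Apriori's level-wise pruning, which is only reasonable for toy data.

```python
from fractions import Fraction
from itertools import combinations

# The five orders from the example
orders = [
    {"bread", "milk", "butter"},
    {"bread", "banana"},
    {"bread", "butter"},
    {"bread", "butter", "banana"},
    {"banana", "milk"},
]

# Step 1: thresholds (assumed values for this sketch)
MIN_SUPPORT = Fraction(2, 5)
MIN_CONFIDENCE = Fraction(3, 5)


def support(items):
    """Fraction of orders that contain every item in `items`."""
    return Fraction(sum(1 for order in orders if items <= order), len(orders))


# Step 2: keep every item set whose support meets the minimum (brute force)
all_items = sorted(set().union(*orders))
frequent = [
    frozenset(combo)
    for size in range(1, len(all_items) + 1)
    for combo in combinations(all_items, size)
    if support(frozenset(combo)) >= MIN_SUPPORT
]

# Step 3: split each frequent item set into antecedent -> consequent rules
rules = []
for itemset in frequent:
    for size in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(sorted(itemset), size)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= MIN_CONFIDENCE:
                lift = conf / support(consequent)
                rules.append((antecedent, consequent, conf, lift))

# Step 4: sort the rules by decreasing lift
rules.sort(key=lambda rule: rule[3], reverse=True)

for antecedent, consequent, conf, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"confidence={float(conf):.2f} lift={float(lift):.2f}")
```

Here lift is computed as confidence divided by the support of the consequent, which is algebraically the same as the lift formula given earlier.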


In [33]:
# Import the Libraries
import numpy as np
import pandas as pd
from apyori import apriori


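dataset = pd.read_csv('Data/Unsupervised/Market_Basket_Optimisation.csv', header = None)
# Build one list of item names per transaction.
# Short rows are padded with NaN by pandas; skip those cells so that 'nan' is not treated as an item.
X = []
for row in dataset.values:
    X.append([str(item) for item in row if pd.notna(item)])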

# Training Apriori on the dataset
asso_rules = apriori(X, min_support = 0.006, min_confidence = 0.33, min_lift = 2.5, min_length = 2)

# Storing the results
asso_rules = list(asso_rules)

print(asso_rules[0])

RelationRecord(items=frozenset({'ground beef', 'herb & pepper'}), support=0.015997866951073192, ordered_statistics=[OrderedStatistic(items_base=frozenset({'herb & pepper'}), items_add=frozenset({'ground beef'}), confidence=0.3234501347708895, lift=3.2919938411349285)])


In [35]:
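# Print each rule as antecedent->consequent together with its metrics
for record in asso_rules:
    # The first ordered statistic carries the rule direction (items_base -> items_add)
    stat = record.ordered_statistics[0]
    antecedent = ", ".join(sorted(stat.items_base))
    consequent = ", ".join(sorted(stat.items_add))
    print("Rule: ({0}->{1})".format(antecedent, consequent))
    print("Support: {0}".format(record.support))
    print("Confidence: {0}".format(stat.confidence))
    print("Lift: {0}".format(stat.lift))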
    



Rule: (ground beef->herb & pepper)
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
Rule: (whole wheat pasta->milk)
Support: 0.009865351286495135
Confidence: 0.33484162895927605
Lift: 2.5839990317114503
Rule: (tomato sauce->spaghetti)
Support: 0.006265831222503666
Confidence: 0.4433962264150943
Lift: 2.546642491837383
Rule: (chocolate->milk)
Support: 0.007998933475536596
Confidence: 0.34883720930232565
Lift: 2.6920040195234
Rule: (eggs->milk)
Support: 0.007332355685908546
Confidence: 0.3374233128834356
Lift: 2.6039220884142495
Rule: (eggs->ground beef)
Support: 0.008932142381015865
Confidence: 0.4466666666666667
Lift: 2.565426237876468
Rule: (ground beef->frozen vegetables)
Support: 0.008665511265164644
Confidence: 0.5118110236220472
Lift: 2.939582303360625
Rule: (shrimp->mineral water)
Support: 0.007199040127982935
Confidence: 0.30508474576271183
Lift: 3.200616332819722
Rule: (tomatoes->frozen vegetables)
Support: 0.006665777896280496
Confidence: 0