## How the Apriori Algorithm Works

Three important concepts:
### 1. Support
Support of an item is the fraction of all transactions that contain this item.

$$Support = \frac{Transactions\ with\ this\ item}{Total\ Transactions}$$
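As a minimal sketch of the formula above, support can be computed by counting matching transactions. The `toy_transactions` list and the `support` helper are made up for illustration and are not part of the dataset or of any library.

```python
# Toy transactions, invented for illustration (not from the dataset).
toy_transactions = [
    ['milk', 'bread'],
    ['milk', 'eggs'],
    ['bread', 'butter'],
    ['milk', 'bread', 'butter'],
    ['eggs'],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return float(hits) / len(transactions)

milk_support = support(['milk'], toy_transactions)           # 3 of 5 transactions
pair_support = support(['milk', 'bread'], toy_transactions)  # 2 of 5 transactions
print(milk_support)
print(pair_support)
```

The same helper works for a single item or for a set of items, which is how support is used for rules such as X and Y bought together.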

### 2. Confidence
Confidence(X->Y) measures how often item Y is purchased, given that item X is purchased.

$$Confidence(X \rightarrow Y) = \frac{Transactions\ where\ X\ and\ Y\ are\ bought\ together}{Transactions\ where\ X\ is\ bought}$$

One drawback of confidence is that it ignores the popularity of Y. For instance, suppose X appears in 5 transactions and Y is bought in all of them, so $Confidence(X \rightarrow Y) = 1$. If Y is a common item that is bought frequently anyway, this high confidence says little about X, and the rule $X \rightarrow Y$ carries little information. Lift takes this into account.
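The drawback can be seen on a small made-up example. In the sketch below, the `toy` list and the helper functions are assumptions for illustration: 'mineral water' appears in every transaction, so any rule pointing to it reaches a confidence of 1 regardless of the left-hand item.

```python
# Toy data, invented for illustration: 'mineral water' is in every basket.
toy = [
    ['shrimp', 'mineral water'],
    ['shrimp', 'mineral water', 'eggs'],
    ['mineral water'],
    ['mineral water', 'milk'],
    ['mineral water', 'eggs'],
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset.issubset(t))

def confidence(x, y, transactions):
    """Transactions with both X and Y, divided by transactions with X."""
    both = count(set(x) | set(y), transactions)
    return float(both) / count(x, transactions)

conf_shrimp = confidence(['shrimp'], ['mineral water'], toy)
conf_eggs = confidence(['eggs'], ['mineral water'], toy)
support_y = float(count(['mineral water'], toy)) / len(toy)
print(conf_shrimp)  # confidence of shrimp -> mineral water
print(conf_eggs)    # confidence of eggs -> mineral water
print(support_y)    # popularity of mineral water on its own
```

All three values come out as 1.0 here, so the confidence alone cannot distinguish a real association from the popularity of Y.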

### 3. Lift
Lift measures how often item Y is purchased when item X is purchased, relative to how often Y is purchased overall, so it accounts for the popularity of item Y.

$$ Lift (X \rightarrow Y) = \frac{Confidence(X \rightarrow Y)}{Support(Y)}$$

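Lift is therefore a better metric than confidence for deciding whether item Y should be associated with item X. A lift of 1 means X and Y are bought independently of each other, a lift above 1 means they are bought together more often than chance would predict, and a lift below 1 means less often. The analysis below keeps only rules with a lift of at least 3, treating them as strong associations.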
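A short worked example may help here. The sketch below uses made-up transactions and hypothetical helper names (`support`, `lift`) to compare a rule with lift above 1 against a rule with lift equal to 1. It relies on the equivalent form $Lift(X \rightarrow Y) = \frac{Support(X \cup Y)}{Support(X) \cdot Support(Y)}$, which follows from dividing the confidence by $Support(Y)$.

```python
# Toy data, invented for illustration (not from the dataset).
toy = [
    ['pasta', 'shrimp'],
    ['pasta', 'shrimp', 'milk'],
    ['milk'],
    ['milk', 'eggs'],
    ['eggs'],
    ['pasta', 'milk'],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return float(hits) / len(transactions)

def lift(x, y, transactions):
    """Confidence(X -> Y) divided by Support(Y)."""
    conf = support(set(x) | set(y), transactions) / support(x, transactions)
    return conf / support(y, transactions)

lift_shrimp = lift(['pasta'], ['shrimp'], toy)  # shrimp only appears with pasta
lift_milk = lift(['pasta'], ['milk'], toy)      # milk is popular everywhere
print(round(lift_shrimp, 3))
print(round(lift_milk, 3))
```

Both rules have the same confidence (2 of the 3 pasta transactions), but shrimp appears in only 2 of 6 transactions while milk appears in 4 of 6, so the lift is about 2 for shrimp and about 1 for milk.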

In [27]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math

X = pd.read_csv('Market_Basket_Optimisation.csv', header=None)


In [30]:
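# Convert each row (7501 rows, up to 20 items per row) to a list of strings.
# Empty cells are read as NaN and become the string 'nan'.
n_rows, n_cols = X.shape
transactions = []
for i in range(n_rows):
    transactions.append([str(X.values[i, j]) for j in range(n_cols)])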

In [38]:
transactions = [[y for y in x if y != 'nan'] for x in transactions]
for x in transactions[:5]:
    print(x)

['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil']
['burgers', 'meatballs', 'eggs']
['chutney']
['turkey', 'avocado']
['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea']


In [49]:
from apyori import apriori
# min_support=0.003 over 7501 transactions means an itemset must appear
# in at least 23 transactions (0.003 * 7501 is about 22.5)
rules = apriori(transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)
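
The `min_support` threshold is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so candidates are built level by level from the itemsets that survived the previous level. The following is a minimal pure-Python sketch of that idea on made-up data. The `frequent_itemsets` function is hypothetical and is not how `apyori` is implemented internally.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    sets = [frozenset(t) for t in transactions]
    n = float(len(sets))
    # Level 1: the candidates are the individual items.
    candidates = set(frozenset([item]) for t in sets for item in t)
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those that meet the threshold.
        level = {}
        for c in candidates:
            s = sum(1 for t in sets if c.issubset(t)) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join step: unions of frequent k-itemsets that have k + 1 items.
        k += 1
        joined = set(a | b for a in level for b in level if len(a | b) == k)
        # Prune step: every (k - 1)-item subset must itself be frequent.
        candidates = set(
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        )
    return frequent

# Toy transactions, invented for illustration.
toy = [
    ['milk', 'bread', 'eggs'],
    ['milk', 'bread'],
    ['milk', 'eggs'],
    ['bread', 'eggs'],
    ['milk', 'bread', 'eggs'],
]
result = frequent_itemsets(toy, min_support=0.5)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

With a threshold of 0.5, the three single items and the three pairs are frequent, while the triple appears in only 2 of 5 transactions and is dropped. Association rules are then derived from the frequent itemsets by applying the confidence and lift thresholds.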

In [52]:
print('first 6 rules')

# The rules are yielded in the order they are found, not ranked by lift.
for idx, x in enumerate(rules):
    if idx >= 6:
        break
    print(x.items)

first 6 rules
frozenset(['tomato sauce', 'ground beef'])
frozenset(['olive oil', 'light cream'])
frozenset(['olive oil', 'whole wheat pasta'])
frozenset(['pasta', 'shrimp'])
frozenset(['spaghetti', 'avocado', 'milk'])
frozenset(['cake', 'burgers', 'milk'])
