# Apriori
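The name comes from the Latin "a priori" ("from what comes before"): the algorithm uses prior knowledge of frequent itemsets, namely that every subset of a frequent itemset must also be frequent, to skip item combinations that cannot reach the minimum support.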

Apriori is an association rule learning algorithm built on 3 metrics:
- Support
- Confidence
- Lift

### Apriori Algorithm
Let's say we're solving a Market Basket Optimization problem.
- Step 1: Set a minimum support and a minimum confidence
    - Without these thresholds, Apriori would be slow, because it would have to evaluate every possible combination of items
- Step 2: Find all the itemsets (subsets of items in the transactions) whose support is higher than the minimum support
- Step 3: From these itemsets, take all the rules whose confidence is higher than the minimum confidence
- Step 4: Sort the rules by descending lift
    - The rule with the highest lift is the strongest rule, so it's the first rule
    - The rule with the lowest lift is the weakest rule, so it's the last rule
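The four steps above can be sketched in plain Python. This is a minimal, unoptimized sketch (it rescans every basket for each support count), not the apyori implementation used later in this notebook; the function name `apriori_rules`, the rule-tuple layout and the baskets are hypothetical, chosen for illustration.

```python
from itertools import combinations


def apriori_rules(transactions, min_support, min_confidence):
    """Mine rules as (antecedent, consequent, support, confidence, lift), highest lift first."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(1 for basket in baskets if itemset <= basket) / n

    # Step 2, level 1: frequent single items
    items = sorted({item for basket in baskets for item in basket})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {itemset: support(itemset) for itemset in level}

    # Step 2, levels 2, 3, ...: grow itemsets one item at a time
    k = 2
    while level:
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (the "prior knowledge"): drop candidates with an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1

    # Step 3: split every frequent itemset into antecedent -> consequent rules
    rules = []
    for itemset, itemset_support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                confidence = itemset_support / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, itemset_support, confidence, lift))

    # Step 4: strongest rule (highest lift) first
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules


# Hypothetical baskets, for illustration only
baskets = [
    ["fries", "burger"],
    ["fries", "burger", "soda"],
    ["fries", "soda"],
    ["burger"],
    ["salad", "water"],
]
for antecedent, consequent, supp, conf, lift in apriori_rules(baskets, 0.4, 0.5):
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The prune step is what makes the search cheaper than comparing every item with every other item: a candidate is never counted if one of its smaller subsets already failed the minimum support.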

### Support
Let's say we're performing Market Basket Optimization by discovering rules among items in a store.

The support of an item I is the number of transactions that contain I divided by the total number of transactions.

<img src="images/apriori/market_basket_support.png" height="65%" width="65%"></img>

For example, if 50 out of 100 total people purchased Fries, then the support for Fries is 50 / 100 = 50%.
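The support formula is a simple ratio, sketched below. The `support` helper and the transactions are hypothetical, built only to reproduce the 50-out-of-100 Fries example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


# Hypothetical data matching the example: 50 of 100 customers bought Fries
transactions = [["Fries"]] * 50 + [["Burger"]] * 50
print(support(transactions, ["Fries"]))  # 0.5
```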

### Confidence
Let's continue with the Market Basket Optimization problem from the "Support" section.

The confidence tells us how likely I2 is to be purchased when I1 is purchased.

<img src="images/apriori/market_basket_confidence.png" height="75%" width="75%"></img>
- I1 -> I2 is read as "I1 implies I2"

For example, if 50 out of 100 people purchased Fries (I1), and 25 of those 50 people also purchased a Burger (I2), then the confidence that Fries implies Burgers is 25 / 50 = 50%.
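Confidence can also be written as support(I1 and I2) / support(I1), which gives the same 25 / 50 once both counts are divided by the 100 customers. The helper names and the transactions below are hypothetical, built to match the Fries and Burger example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


def confidence(transactions, i1, i2):
    """Confidence of the rule i1 -> i2: support(i1 and i2) / support(i1)."""
    return support(transactions, set(i1) | set(i2)) / support(transactions, i1)


# Hypothetical data matching the example: 50 Fries buyers, 25 of whom also bought a Burger
transactions = [["Fries", "Burger"]] * 25 + [["Fries"]] * 25 + [["Soda"]] * 50
print(confidence(transactions, ["Fries"], ["Burger"]))  # 0.5
```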

### Lift
Let's continue with the Market Basket Optimization problem from the "Support" and "Confidence" sections.

Lift tells us how much more likely I2 is to be purchased when I1 is purchased, compared with how likely I2 is to be purchased in general.
- The problem with confidence is that it does not take the popularity (support) of I2 into account: if most customers buy I2 anyway, rules ending in I2 tend to have high confidence even when I1 has little to do with it
- Lift corrects for this by dividing the confidence by the support of I2

<img src="images/apriori/market_basket_lift.png" height="55%" width="55%"></img>
- I1 -> I2 is read as "I1 implies I2"

If Lift > 1, then I2 is more likely to be purchased when I1 is purchased than it is in general.  
If Lift = 1, then there is no association between I1 and I2.  
If Lift < 1, then I2 is less likely to be purchased when I1 is purchased than it is in general.

For example, suppose 50 out of 100 people purchased Fries (I1), 25 of those 50 people also purchased a Burger (I2), and 35 of the 100 people purchased Burgers overall.
- The confidence that I1 implies I2 is 25 / 50 = 0.5
- The support for Burgers is 35 / 100 = 0.35

The lift that Fries implies Burgers is 0.5 / 0.35 = ~1.43.

In conclusion, Burgers are likely to be purchased if Fries are purchased: customers who buy Fries are about 1.43 times as likely to buy Burgers as customers in general.
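The lift calculation and its interpretation can be checked from the raw counts in the example. The names `lift` and `describe` are illustrative, and using `math.isclose` for the "Lift = 1" case is my own choice, since floating-point results rarely equal 1 exactly.

```python
import math


def lift(n_total, n_i1, n_i2, n_both):
    """Lift of I1 -> I2 = confidence(I1 -> I2) / support(I2)."""
    confidence = n_both / n_i1   # 25 / 50 = 0.5 in the example
    support_i2 = n_i2 / n_total  # 35 / 100 = 0.35 in the example
    return confidence / support_i2


def describe(lift_value):
    """Map a lift value onto the three cases listed above."""
    if math.isclose(lift_value, 1.0):
        return "no association"
    return "positive association" if lift_value > 1 else "negative association"


value = lift(n_total=100, n_i1=50, n_i2=35, n_both=25)
print(round(value, 2), describe(value))  # 1.43 positive association
```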

```python
# import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

```python
# import the data set, specify that there is no header (no title row) in the csv file
basket_df = pd.read_csv("datasets/market_basket.csv", header=None)

# Each row is a transaction that shows the item(s) purchased by the customer
basket_df.head()
```

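|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2 | chutney |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 3 | turkey | avocado |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |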


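```python
# convert the basket data frame into a list of transactions (a 2D list of strings),
# dropping the NaN cells that pad shorter rows so that "nan" is not treated as an item
basket_lists = [row.dropna().astype(str).tolist() for _, row in basket_df.iterrows()]
```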

# Apriori Model

```python
# import the apriori function from the apyori.py file
from apyori import apriori
```

```python
"""
Create Apriori rules.
- min_support: the minimum support an itemset must have
    - if we assume an item is purchased 3 times a day, i.e. 21 times a week (3 * 7 = 21), then its support over the 7500 transactions is 21 / 7500 = ~0.003
- min_confidence: the minimum confidence a rule must have
    - confidence is inflated when I2 is a popular item, so this is kept low and min_lift does most of the filtering
- min_lift: the minimum lift a rule must have
- min_length: the minimum number of items within a rule
    - as far as I can tell, apyori's apriori() only reads min_support, min_confidence, min_lift and max_length,
      so this argument is ignored; single-item records are still dropped because their lift is 1, below min_lift
"""
rules = apriori(basket_lists, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)
```

# Visualization of Association Rules

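According to the first rule, people who purchase light cream are likely to also purchase chicken.
- Based on the confidence, there's a ~29% chance that a customer who buys light cream also buys chicken
- Based on the lift (~4.84), a customer who buys light cream is about 4.8 times as likely to buy chicken as a customer in general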

```python
# return the results of the apriori algorithm into a list
results = list(rules)
```

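```python
# print the first rule returned by apriori; apyori yields rules in the order it
# finds them, not sorted by lift, so this is not necessarily the strongest rule
first_rule = results[0]
print("Rule: " + str(first_rule.items))
print("Support: " + str(first_rule.support))
print("Confidence: " + str(first_rule.ordered_statistics[0].confidence))
print("Lift: " + str(first_rule.ordered_statistics[0].lift))
```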

```
Rule: frozenset({'chicken', 'light cream'})
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
```
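Step 4 of the algorithm sorts the rules by descending lift, but the cells above print `results[0]` directly. A small helper can flatten and sort the apyori output; it is a sketch that assumes the `RelationRecord` fields (`items`, `support`, `ordered_statistics`) and `OrderedStatistic` fields (`items_base`, `items_add`, `confidence`, `lift`) that apyori defines, and the name `rules_by_lift` is hypothetical.

```python
def rules_by_lift(records):
    """Flatten apyori records into (base, add, support, confidence, lift) rows, highest lift first."""
    rows = []
    for record in records:
        # One record per itemset; each ordered statistic is one direction of the rule
        for stat in record.ordered_statistics:
            rows.append((set(stat.items_base), set(stat.items_add),
                         record.support, stat.confidence, stat.lift))
    return sorted(rows, key=lambda row: row[4], reverse=True)


# Usage with the `results` list built from apriori(...) above:
# for base, add, supp, conf, lift in rules_by_lift(results)[:5]:
#     print(f"{base} -> {add}  support={supp:.4f}  confidence={conf:.2f}  lift={lift:.2f}")
```

Flattening also exposes the direction of each rule (`items_base` -> `items_add`), which the printed `frozenset` of items hides.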
