# Apriori Association Rule Learning

### Intuition

People who bought or did one thing also tend to buy or do another thing

Good for recommendations and market optimization

Part 1: Support

Support(M1) = number of watchlists, transactions, etc. containing item M1 divided by the total number of watchlists, transactions, etc.

Part 2: Confidence

Confidence(M1 -> M2) = number of people who have seen or bought both M1 and M2 divided by the number of people who have seen or bought M1

Part 3: Lift

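Lift(M1 -> M2) = Confidence(M1 -> M2) divided by Support(M2)

Improvement in prediction that results from utilizing prior knowledge (knowing that someone already has M1)

Takes the probability that someone who has item 1 also has item 2 (the confidence) and divides it by the probability that anyone has item 2 (the support of item 2). A lift above 1 means having item 1 makes item 2 more likely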
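
The three measures can be checked by hand on a tiny example. The sketch below uses five made-up watchlists (hypothetical data, not from the course data set) and hypothetical helper names `support`, `confidence`, and `lift`.

```python
# Hypothetical toy data: each set is one user's watchlist
watchlists = [
    {"M1", "M2"},
    {"M1", "M2", "M3"},
    {"M1"},
    {"M2", "M3"},
    {"M3"},
]

def support(items):
    """Fraction of watchlists that contain every item in `items`."""
    items = set(items)
    return sum(items <= w for w in watchlists) / len(watchlists)

def confidence(base, add):
    """Of the watchlists containing `base`, the fraction that also contain `add`."""
    return support(set(base) | set(add)) / support(base)

def lift(base, add):
    """Confidence of base -> add divided by the support of `add`."""
    return confidence(base, add) / support(add)

support_m1 = support({"M1"})                # 3 of 5 watchlists
confidence_m1_m2 = confidence({"M1"}, {"M2"})  # 2 of the 3 watchlists with M1
lift_m1_m2 = lift({"M1"}, {"M2"})           # (2/3) / (3/5)

print(support_m1, confidence_m1_m2, lift_m1_m2)
```

Here the lift is slightly above 1, so on this toy data having M1 makes M2 only a little more likely than it is overall.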

### Algorithm Steps

Step 1: Set a minimum support and a minimum confidence. We only want to consider items that are popular, which reduces computation cost and makes the rules more widely applicable.

Step 2: Take all the subsets (itemsets) of the transactions that have a support higher than the minimum support

Step 3: Take all the rules (connections between items) of these subsets that have a confidence higher than the minimum confidence

Step 4: Sort the rules by decreasing lift (highest to lowest)
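
The four steps can be sketched in plain Python. This is a simplified illustration, not the apyori implementation: it only builds rules between pairs of items, it uses `>=` for the thresholds, and the function name `simple_pair_rules` and the baskets are made up for the example.

```python
from itertools import combinations

def simple_pair_rules(transactions, min_support, min_confidence):
    """Return (base, add, confidence, lift) tuples for item pairs, sorted by lift."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain the whole itemset
        return sum(itemset <= b for b in baskets) / n

    # Step 2: keep frequent single items, then build pairs only from those
    # (a pair cannot be frequent if one of its items is not)
    items = sorted({item for b in baskets for item in b})
    frequent_items = [i for i in items if support({i}) >= min_support]
    frequent_pairs = [
        set(pair)
        for pair in combinations(frequent_items, 2)
        if support(set(pair)) >= min_support
    ]

    # Step 3: keep the rules base -> add that reach the minimum confidence
    rules = []
    for pair in frequent_pairs:
        for base in pair:
            (add,) = pair - {base}
            conf = support(pair) / support({base})
            if conf >= min_confidence:
                rules.append((base, add, conf, conf / support({add})))

    # Step 4: sort by decreasing lift
    return sorted(rules, key=lambda rule: rule[3], reverse=True)

# Hypothetical baskets; Step 1 is the choice of the two thresholds below
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
pair_rules = simple_pair_rules(baskets, min_support=0.4, min_confidence=0.6)
for base, add, conf, lift in pair_rules:
    print(f"{base} -> {add}: confidence={conf:.2f}, lift={lift:.2f}")
```

Building pairs only from items that are already frequent is the pruning idea that keeps the number of candidate itemsets manageable.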

## Import the Necessary Libraries

In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Preprocess the Data

In [2]:
# Data Preprocessing
# Import the data set; the CSV file has no header row
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
# Initialize an empty list because apriori needs a list of lists
transactions = []
# Loop over all transactions; there are 7501 transactions (rows) in the data set
for i in range(0, 7501):
    # For each transaction (i), we want a list of its products, which are held in the 20 columns (j)
    # Convert the values to strings so they work with the apriori library
    # Note: empty cells are read as NaN, so str() turns them into the item 'nan'
    transactions.append([str(dataset.values[i,j]) for j in range(0, 20)])
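
Because every cell is converted with `str()`, padding cells in short baskets end up as an item of their own. As a hedged alternative, the sketch below builds the list of lists while skipping empty cells. It uses the standard-library `csv` module on a small in-memory sample (hypothetical rows standing in for the real file), so it is an illustration rather than a drop-in replacement for the cell above.

```python
import csv
import io

# Hypothetical three-row sample standing in for Market_Basket_Optimisation.csv;
# shorter baskets are padded with empty cells
sample_csv = "shrimp,almonds,avocado\nburgers,meatballs,\nchutney,,\n"

transactions = []
for row in csv.reader(io.StringIO(sample_csv)):
    # Keep only non-empty cells so no placeholder item enters a basket
    basket = [cell.strip() for cell in row if cell.strip()]
    if basket:
        transactions.append(basket)

print(transactions)
```

With the real file, `io.StringIO(sample_csv)` would be replaced by an opened file handle.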

## Train Apriori on the Data Set

In [3]:
# Training Apriori on the dataset
from apyori import apriori
# Generate the rules with the apriori function
"""Parameters:
                transactions-the data you are using
                min_support-the minimum support. Support = transactions containing item / total transactions
                min_confidence-the minimum confidence. Confidence = transactions containing item 1 and item 2 / transactions containing item 1
                min_lift-the minimum lift
                min_length-the minimum number of items that must be used"""
# min_support: a product purchased 3 times a day is purchased 21 times a week. Divide by the total number of
# transactions (7501): 21/7501 = 0.0028, rounded to 0.003
# min_confidence: we want a confidence that is not too large so we do not only obtain obvious rules. Default is 0.8 in R.
# Halving this to 0.4 still gave obvious rules, so we halved that result again to get 0.2
# Each rule must hold in at least 0.2, or 20%, of the transactions that contain its left-hand side
# min_lift: we want rules with a lift of at least 3 so that they are good indicators of association
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)
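
The threshold arithmetic in the comments can be checked directly. This short sketch assumes, as the comments imply, that the 7501 transactions cover one week; the variable names are illustrative.

```python
purchases_per_day = 3        # popularity cut-off chosen in the comments
days_per_week = 7
total_transactions = 7501    # assumed to cover one week of purchases

# 21 purchases a week as a fraction of all transactions
min_support = purchases_per_day * days_per_week / total_transactions

# R's default confidence of 0.8, halved twice
min_confidence = 0.8 / 2 / 2

print(round(min_support, 4))     # 0.0028
print(round(min_support, 3))     # 0.003, the value passed to apriori
print(round(min_confidence, 2))  # 0.2
```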

## Visualize the results

In [4]:
# Visualising the results
results = list(rules)
results

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

In [None]:
# If someone bought light cream, they have a 29.06% chance of also buying chicken.
# The lift of 4.84 is high: they are about 4.84 times more likely to buy chicken than customers overall
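
The raw records are hard to read, so it can help to flatten them into one row per rule and sort by lift. The sketch below does not call apyori; it uses `namedtuple` stand-ins that mirror the field names visible in the output above and copies the first three complete records from that output. The helper name `flatten` is made up for the example.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the printed output
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

# First three complete records copied from the output above
results = [
    RelationRecord(frozenset({"light cream", "chicken"}), 0.004532728969470737,
                   [OrderedStatistic(frozenset({"light cream"}), frozenset({"chicken"}),
                                     0.29059829059829057, 4.84395061728395)]),
    RelationRecord(frozenset({"escalope", "mushroom cream sauce"}), 0.005732568990801226,
                   [OrderedStatistic(frozenset({"mushroom cream sauce"}), frozenset({"escalope"}),
                                     0.3006993006993007, 3.790832696715049)]),
    RelationRecord(frozenset({"escalope", "pasta"}), 0.005865884548726837,
                   [OrderedStatistic(frozenset({"pasta"}), frozenset({"escalope"}),
                                     0.3728813559322034, 4.700811850163794)]),
]

def flatten(records):
    """Return one dict per rule, sorted by decreasing lift."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "base": ", ".join(sorted(stat.items_base)),
                "add": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return sorted(rows, key=lambda row: row["lift"], reverse=True)

rows = flatten(results)
for row in rows:
    print(f"{row['base']} -> {row['add']}: support={row['support']:.4f}, "
          f"confidence={row['confidence']:.4f}, lift={row['lift']:.2f}")
```

The same `flatten` helper should work on the real `results` list, since it only relies on the attribute names shown in the output.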