# Chris Gomez

Import necessary libraries

In [1]:
import pandas as pd
import numpy as np
from apriori_python import apriori 

Read in dataset

In [19]:
df = pd.read_csv("Dataset.csv", header = None)
df

Unnamed: 0,0,1,2,3,4
0,Biscuit,Bournvita,Butter,Cornflakes,Tea
1,Bournvita,Bread,Butter,Cornflakes,
2,Butter,Coffee,Chocolate,Eggs,Jam
3,Bournvita,Butter,Cornflakes,Bread,Eggs
4,Bournvita,Bread,Coffee,Chocolate,Eggs
5,Jam,Sugar,,,
6,Biscuit,Bournvita,Butter,Cornflakes,Jam
7,Curd,Jam,Sugar,,
8,Bournvita,Bread,Butter,Coffee,Cornflakes
9,Bournvita,Bread,Coffee,Chocolate,Eggs


Making transactions

In [3]:
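transactions = []
for i in range(len(df)):
    # Keep only real items: empty cells are NaN and would otherwise become the item 'nan'
    transactions.append([str(item) for item in df.values[i] if pd.notna(item)])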

Using apriori method to build model

In [4]:
freq_items, rules = apriori(transactions, minSup = 0.249, minConf = 0.699)

freq_items = the frequent itemsets, i.e. the item combinations whose support clears the 24.9% minimum

rules = the association rules built from those itemsets whose confidence clears the 69.9% minimum
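
To make the two thresholds concrete, here is a minimal sketch that computes support and confidence by hand. The toy transactions and the helper names `support` and `confidence` are assumptions made for illustration; they are not part of `apriori_python` and not the assignment dataset.

```python
# Toy transactions, invented for illustration (not the assignment dataset).
toy_transactions = [
    {"Bread", "Butter"},
    {"Bread", "Butter", "Jam"},
    {"Bread", "Jam"},
    {"Coffee"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"Bread", "Butter"}, toy_transactions))       # 0.5
print(confidence({"Butter"}, {"Bread"}, toy_transactions))  # 1.0
```

With thresholds like the ones used here, `{Bread, Butter}` would count as frequent (0.5 is above 0.249) and `Butter -> Bread` would be kept as a rule (1.0 is above 0.699).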

## What are the top five rules on a given dataset? 

The call returns every rule whose confidence clears the 69.9% threshold, and in this output the rules appear in ascending order of confidence. To find the top five rules on this dataset, we can therefore take the last five entries of `rules`:

In [6]:
last_five_rules = rules[-5:]
last_five_rules

[[{'Chocolate'}, {'Eggs'}, 0.9],
 [{'Coffee', 'Eggs'}, {'Chocolate'}, 1.0],
 [{'Chocolate', 'Coffee'}, {'Eggs'}, 1.0],
 [{'Bournvita', 'Cornflakes'}, {'Butter'}, 1.0],
 [{'Butter', 'Cornflakes'}, {'Bournvita'}, 1.0]]

The list above shows the top five rules by confidence. Written out, from highest to lowest:

1. CONFIDENCE(Butter, Cornflakes -> Bournvita) = 1.0 or 100%
2. CONFIDENCE(Bournvita, Cornflakes -> Butter) = 1.0 or 100%
3. CONFIDENCE(Chocolate, Coffee -> Eggs) = 1.0 or 100%
4. CONFIDENCE(Coffee, Eggs -> Chocolate) = 1.0 or 100%
5. CONFIDENCE(Chocolate -> Eggs) = 0.9 or 90%
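
Taking the last five entries works only while the rules happen to come back sorted by confidence. A minimal sketch that sorts explicitly instead is shown below; `top_rules` is a hypothetical helper, and `sample_rules` holds a few entries copied from the notebook's `rules` output in the same `[antecedent, consequent, confidence]` form.

```python
# A few rules copied from the notebook's `rules` output.
sample_rules = [
    [{"Chocolate"}, {"Coffee"}, 0.7],
    [{"Chocolate"}, {"Eggs"}, 0.9],
    [{"Coffee", "Eggs"}, {"Chocolate"}, 1.0],
    [{"Butter"}, {"Bournvita"}, 0.75],
]

def top_rules(rules, n=5):
    """Return the n rules with the highest confidence, best first."""
    return sorted(rules, key=lambda rule: rule[2], reverse=True)[:n]

for antecedent, consequent, conf in top_rules(sample_rules, n=2):
    print(sorted(antecedent), "->", sorted(consequent), conf)
```

Calling `top_rules(rules)` on the full list would give the same five rules as the slice, without depending on the order in which they were returned.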



## What is your major learning outcome of this assignment?

For me the biggest learning outcome is how the apriori algorithm works in practice, using the apriori_python package, and how association rules can be applied to supermarket baskets. Looking at the results, the most obvious finding is that there are two very common groupings of food items:

1. (Chocolate, Eggs, Coffee)
2. (Bournvita, Butter, Cornflakes)

A supermarket could reasonably place the items in each group close together, for example in the same aisle.

## How can we specify the value of lift in extracting rules?

lift(x -> y) = confidence(x -> y) / support(y)

Let's start by finding the support of each item...

In [28]:
rules

[[{'Chocolate'}, {'Coffee'}, 0.7],
 [{'Chocolate'}, {'Coffee', 'Eggs'}, 0.7],
 [{'Butter'}, {'Bournvita'}, 0.75],
 [{'Coffee'}, {'Eggs'}, 0.7777777777777778],
 [{'Coffee'}, {'Chocolate'}, 0.7777777777777778],
 [{'Coffee'}, {'Chocolate', 'Eggs'}, 0.7777777777777778],
 [{'Chocolate', 'Eggs'}, {'Coffee'}, 0.7777777777777778],
 [{'Eggs'}, {'Chocolate'}, 0.8181818181818182],
 [{'Cornflakes'}, {'Bournvita'}, 0.8888888888888888],
 [{'Bread'}, {'Bournvita'}, 0.8888888888888888],
 [{'Cornflakes'}, {'Butter'}, 0.8888888888888888],
 [{'Cornflakes'}, {'Bournvita', 'Butter'}, 0.8888888888888888],
 [{'Bournvita', 'Butter'}, {'Cornflakes'}, 0.8888888888888888],
 [{'Chocolate'}, {'Eggs'}, 0.9],
 [{'Coffee', 'Eggs'}, {'Chocolate'}, 1.0],
 [{'Chocolate', 'Coffee'}, {'Eggs'}, 1.0],
 [{'Bournvita', 'Cornflakes'}, {'Butter'}, 1.0],
 [{'Butter', 'Cornflakes'}, {'Bournvita'}, 1.0]]

In [22]:
# Calculate the total number of transactions
total_transactions = len(df)

# Create an empty dictionary to store support values
support_values = {}

# Iterate over each column in the dataframe
for column in df.columns:
    # Count the occurrences of each unique item
    item_counts = df[column].value_counts().to_dict()
    
    # Calculate the support for each item
    for item, count in item_counts.items():
        support = count / total_transactions
        support_values[frozenset([item])] = support

print(support_values)

{frozenset({'Bournvita'}): 0.16, frozenset({'Biscuit'}): 0.16, frozenset({'Butter'}): 0.24, frozenset({'Jam'}): 0.08, frozenset({'Coffee'}): 0.04, frozenset({'Chocolate'}): 0.2, frozenset({'Curd'}): 0.04, frozenset({'Bread'}): 0.04, frozenset({'Juice'}): 0.04, frozenset({'Rice'}): 0.04, frozenset({'Sugar'}): 0.08, frozenset({'Cornflakes'}): 0.04, frozenset({'Milk'}): 0.04, frozenset({'Soap'}): 0.04, frozenset({'Eggs'}): 0.24, frozenset({'Tea'}): 0.04}
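
One thing to watch in the cell above: an item's entry is overwritten each time the item shows up in a later column, so the stored value is the share of rows holding that item in one particular column, not the share of transactions that contain it. The dictionary also holds single items only, so any multi-item set is missing from it. A minimal sketch of a row-wise alternative follows. `itemset_support` is a hypothetical helper, and the three rows are copied from the dataset preview purely for illustration.

```python
# Three rows copied from the dataset preview; None marks an empty cell.
# In the notebook the rows could come from df.values.tolist(), where empty cells are NaN.
rows = [
    ["Biscuit", "Bournvita", "Butter", "Cornflakes", "Tea"],
    ["Bournvita", "Bread", "Butter", "Cornflakes", None],
    ["Jam", "Sugar", None, None, None],
]

# One set of items per row; keeping only strings drops both None and NaN.
row_sets = [{cell for cell in row if isinstance(cell, str)} for row in rows]

def itemset_support(itemset, row_sets):
    """Fraction of rows containing every item in `itemset`, whatever column it sits in."""
    itemset = frozenset(itemset)
    return sum(itemset <= row for row in row_sets) / len(row_sets)

print(itemset_support({"Bournvita"}, row_sets))            # in 2 of the 3 sample rows
print(itemset_support({"Bournvita", "Butter"}, row_sets))  # also in 2 of the 3 sample rows
```

Because the helper checks whole rows, it works for itemsets of any size, which is what the lift calculation needs for rules such as Bournvita, Cornflakes -> Butter.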


Now let's find the lift using the confidence and support...

In [30]:
lift_values = []

for rule in rules:
    antecedent = frozenset(rule[0])
    consequent = frozenset(rule[1])
    confidence = rule[2]

    support_antecedent = support_values.get(antecedent, 0.0)
    support_consequent = support_values.get(consequent, 0.0)
    support_antecedent_consequent = support_values.get(antecedent.union(consequent), 0.0)

    if support_antecedent != 0.0 and support_consequent != 0.0:
        lift = confidence / (support_antecedent * support_consequent)
    else:
        lift = 0.0

    lift_values.append((antecedent, consequent, lift))

print(lift_values)


[(frozenset({'Chocolate'}), frozenset({'Coffee'}), 87.49999999999999), (frozenset({'Chocolate'}), frozenset({'Eggs', 'Coffee'}), 0.0), (frozenset({'Butter'}), frozenset({'Bournvita'}), 19.53125), (frozenset({'Coffee'}), frozenset({'Eggs'}), 81.01851851851853), (frozenset({'Coffee'}), frozenset({'Chocolate'}), 97.22222222222223), (frozenset({'Coffee'}), frozenset({'Eggs', 'Chocolate'}), 0.0), (frozenset({'Eggs', 'Chocolate'}), frozenset({'Coffee'}), 0.0), (frozenset({'Eggs'}), frozenset({'Chocolate'}), 17.045454545454547), (frozenset({'Cornflakes'}), frozenset({'Bournvita'}), 138.88888888888889), (frozenset({'Bread'}), frozenset({'Bournvita'}), 138.88888888888889), (frozenset({'Cornflakes'}), frozenset({'Butter'}), 92.5925925925926), (frozenset({'Cornflakes'}), frozenset({'Bournvita', 'Butter'}), 0.0), (frozenset({'Bournvita', 'Butter'}), frozenset({'Cornflakes'}), 0.0), (frozenset({'Chocolate'}), frozenset({'Eggs'}), 18.75), (frozenset({'Eggs', 'Coffee'}), frozenset({'Chocolate'}), 0.0
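
The values above are hard to read as lift. The cell divides the confidence by the product of both supports rather than by the consequent's support alone, and every rule with a multi-item side falls through to 0.0 because `support_values` only holds single items. Under the usual definition, lift(x -> y) = confidence(x -> y) / support(y). The sketch below applies that definition to a small made-up set of transactions; the data and the helper names `support` and `lift` are assumptions for illustration, not the notebook's dataset or any library API.

```python
# Toy transactions, invented for illustration (not the assignment dataset).
toy_transactions = [
    {"Chocolate", "Coffee", "Eggs"},
    {"Chocolate", "Eggs"},
    {"Bread", "Butter"},
    {"Bread", "Eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(antecedent, consequent, transactions):
    """confidence(antecedent -> consequent) / support(consequent)."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    if s_a == 0 or s_c == 0:
        return 0.0  # lift is undefined here; 0.0 is only a placeholder
    both = frozenset(antecedent) | frozenset(consequent)
    conf = support(both, transactions) / s_a
    return conf / s_c

print(lift({"Chocolate"}, {"Eggs"}, toy_transactions))
print(lift({"Coffee", "Chocolate"}, {"Eggs"}, toy_transactions))
```

A lift above 1 means the antecedent and consequent appear together more often than they would if they were independent, a lift near 1 suggests independence, and a lift below 1 suggests they tend not to occur together.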