# Data Science and Visualization (RUC F2023)

## Lecture 9: Association Rules

# Exercise Solution: Apriori Algorithm for Association Rule Mining

This exercise uses the mlxtend library to mine association rules from the *Bread Basket* dataset.

To install mlxtend, run the following command in Anaconda Prompt:

pip install mlxtend

## 0. Setup and data loading

In [3]:
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import association_rules, apriori

In [4]:
data = pd.read_csv('C:/Data/bread basket.csv')
data.shape

(20507, 5)

In [6]:
data

| | Transaction | Item | date_time | period_day | weekday_weekend |
|---|---|---|---|---|---|
| 0 | 1 | Bread | 30-10-2016 09:58 | morning | weekend |
| 1 | 2 | Scandinavian | 30-10-2016 10:05 | morning | weekend |
| 2 | 2 | Scandinavian | 30-10-2016 10:05 | morning | weekend |
| 3 | 3 | Hot chocolate | 30-10-2016 10:07 | morning | weekend |
| 4 | 3 | Jam | 30-10-2016 10:07 | morning | weekend |
| ... | ... | ... | ... | ... | ... |
| 20502 | 9682 | Coffee | 09-04-2017 14:32 | afternoon | weekend |
| 20503 | 9682 | Tea | 09-04-2017 14:32 | afternoon | weekend |
| 20504 | 9683 | Coffee | 09-04-2017 14:57 | afternoon | weekend |
| 20505 | 9683 | Pastry | 09-04-2017 14:57 | afternoon | weekend |


## 1. Data preparation

We first merge the items of the same transaction to form a transaction table in the format of *a list of lists*. The following function serves the purpose:

In [5]:
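def generateTransactions(df):
    """Merge consecutive rows that share a Transaction ID (column 0) into one list of items (column 1)."""
    transactions = []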

    # Process the first row
    pre_id = df.iloc[0, 0]
    transaction = [df.iloc[0, 1]]
    
    # Process the remaining rows: a row joins the current transaction
    # if it has the same Transaction ID as the previous row
    for i in range(1, df.shape[0]):
        cur_id = df.iloc[i, 0]
        if cur_id == pre_id:
            transaction.append(df.iloc[i, 1])
        else:
            transactions.append(transaction)
            transaction = [df.iloc[i, 1]]
        pre_id = cur_id
    
    # Don't forget the last transaction
    transactions.append(transaction)
    
    return transactions

In [6]:
transactions = generateTransactions(data)

In [25]:
len(transactions)

9465

In [26]:
print(transactions[9464])

['Smoothies']
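
As a cross-check on the loop above, the same grouping can be sketched with a pandas `groupby`. This is only an illustrative alternative, not part of the original solution: the `toy` DataFrame is a hand-typed copy of the first five rows shown earlier, and `toy_transactions` is a name introduced here. Unlike `generateTransactions`, `groupby` merges all rows with the same Transaction ID even when they are not adjacent, so the two approaches agree only when rows of one transaction are stored consecutively.

```python
import pandas as pd

# Hand-typed copy of the first five rows of the Bread Basket table (illustration only)
toy = pd.DataFrame({
    "Transaction": [1, 2, 2, 3, 3],
    "Item": ["Bread", "Scandinavian", "Scandinavian", "Hot chocolate", "Jam"],
})

# Collect the items of each transaction into a list;
# sort=False keeps the transactions in order of first appearance
toy_transactions = (
    toy.groupby("Transaction", sort=False)["Item"]
       .apply(list)
       .tolist()
)

print(toy_transactions)
# [['Bread'], ['Scandinavian', 'Scandinavian'], ['Hot chocolate', 'Jam']]
```

Note that a repeated item, such as the two Scandinavian rows in transaction 2, stays duplicated in the list at this stage.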


### Encode the transaction table into the format required by mlxtend:

In [7]:
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
# fit_transform merges the two steps, fit and transform, into one call;
# it is equivalent to: te_ary = te.fit(transactions).transform(transactions)
te_ary = te.fit_transform(transactions)

# We generate a DataFrame for the transformed dataset
df = pd.DataFrame(te_ary, columns=te.columns_)
df

| | Adjustment | Afternoon with the baker | Alfajores | Argentina Night | Art Tray | Bacon | Baguette | Bakewell | Bare Popcorn | Basket | ... | The BART | The Nomad | Tiffin | Toast | Truffles | Tshirt | Valentine's card | Vegan Feast | Vegan mincepie | Victorian Sponge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9460 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 9461 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | True | False | False | False | False | False |
| 9462 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 9463 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |


In [28]:
df.shape

(9465, 94)
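
To make the encoded format concrete, the sketch below builds the same kind of boolean table by hand for three made-up transactions. It only illustrates the shape of the result (one row per transaction, one column per distinct item) and is not mlxtend's implementation; `toy_transactions`, `items`, and `encoded` are names introduced here for illustration.

```python
import pandas as pd

# Three made-up transactions (illustration only, not taken from the dataset)
toy_transactions = [["Bread", "Coffee"], ["Coffee"], ["Bread", "Jam"]]

# One column per distinct item, in alphabetical order
items = sorted({item for t in toy_transactions for item in t})

# A cell is True when the transaction (row) contains the item (column)
encoded = pd.DataFrame(
    [[item in t for item in items] for t in toy_transactions],
    columns=items,
)

print(encoded)
```

Because each cell only records presence, an item bought twice in one transaction still yields a single `True`. In the real data, the shape `(9465, 94)` therefore means 9465 transactions over 94 distinct items.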

## 2. Generating Frequent Itemsets

In [8]:
from mlxtend.frequent_patterns import apriori

frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)

frequent_itemsets

| | support | itemsets |
|---|---|---|
| 0 | 0.036344 | (Alfajores) |
| 1 | 0.016059 | (Baguette) |
| 2 | 0.327205 | (Bread) |
| 3 | 0.040042 | (Brownie) |
| 4 | 0.103856 | (Cake) |
| ... | ... | ... |
| 56 | 0.023666 | (Toast, Coffee) |
| 57 | 0.014369 | (Sandwich, Tea) |
| 58 | 0.010037 | (Bread, Cake, Coffee) |
| 59 | 0.011199 | (Bread, Pastry, Coffee) |
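
The `support` column is the fraction of transactions that contain the itemset, so `min_support=0.01` keeps itemsets that occur in at least 1% of the 9465 transactions. The following is a minimal pure-Python sketch of the level-wise idea behind Apriori, run on made-up transactions. It is a teaching sketch under those assumptions, not mlxtend's implementation; `support`, `toy_apriori`, and `toy` are hypothetical names.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def toy_apriori(transactions, min_support):
    """Level-wise search: size-k candidates are built only from frequent (k-1)-itemsets."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: unions of two frequent k-itemsets that have size k + 1
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Made-up transactions (illustration only)
toy = [["Bread", "Coffee"], ["Coffee", "Cake"], ["Bread", "Coffee", "Cake"], ["Tea"]]
frequent = toy_apriori(toy, min_support=0.5)

for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this toy input, `{Bread, Coffee, Cake}` is never counted: its subset `{Bread, Cake}` is infrequent, so the prune step discards it. That pruning is what keeps the search tractable on a table with 94 items.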


## 3. Generating Association Rules

The function **association_rules()** derives rules from the frequent itemsets and returns them in a DataFrame. Here we keep only the rules whose lift is at least 1:

In [9]:
from mlxtend.frequent_patterns import association_rules

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

Display the first seven columns of the rules table:

In [10]:
rules.iloc[:, 0:7]

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift |
|---|---|---|---|---|---|---|---|
| 0 | (Alfajores) | (Coffee) | 0.036344 | 0.478394 | 0.019651 | 0.540698 | 1.130235 |
| 1 | (Coffee) | (Alfajores) | 0.478394 | 0.036344 | 0.019651 | 0.041078 | 1.130235 |
| 2 | (Bread) | (Pastry) | 0.327205 | 0.086107 | 0.02916 | 0.089119 | 1.034977 |
| 3 | (Pastry) | (Bread) | 0.086107 | 0.327205 | 0.02916 | 0.33865 | 1.034977 |
| 4 | (Brownie) | (Coffee) | 0.040042 | 0.478394 | 0.019651 | 0.490765 | 1.02586 |
| 5 | (Coffee) | (Brownie) | 0.478394 | 0.040042 | 0.019651 | 0.041078 | 1.02586 |
| 6 | (Cake) | (Coffee) | 0.103856 | 0.478394 | 0.054728 | 0.526958 | 1.101515 |
| 7 | (Coffee) | (Cake) | 0.478394 | 0.103856 | 0.054728 | 0.114399 | 1.101515 |
| 8 | (Cake) | (Hot chocolate) | 0.103856 | 0.05832 | 0.01141 | 0.109868 | 1.883874 |
| 9 | (Hot chocolate) | (Cake) | 0.05832 | 0.103856 | 0.01141 | 0.195652 | 1.883874 |
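
The confidence and lift columns can be recomputed from the three support columns. For a rule A -> C, confidence is support(A and C) / support(A), and lift is confidence / support(C). The sketch below checks this on row 8, (Cake) -> (Hot chocolate), using the rounded values displayed in the table; the variable names are introduced here for illustration.

```python
# Values copied from row 8 of the rules table: (Cake) -> (Hot chocolate)
support_both = 0.01141          # support of {Cake, Hot chocolate}
support_antecedent = 0.103856   # support of {Cake}
support_consequent = 0.05832    # support of {Hot chocolate}

# confidence: share of Cake transactions that also contain Hot chocolate
confidence = support_both / support_antecedent

# lift: how much more often the pair occurs than if the two items were independent
lift = confidence / support_consequent

print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.2f}")
```

The results agree with the table's 0.109868 and 1.883874 up to the rounding of the displayed supports. A lift of about 1.88 says that Cake and Hot chocolate appear together almost twice as often as independence would predict, which makes this pair the strongest association among the ten rules shown.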
