### Association

#### Overview
In this project, we perform association rule learning with the Apriori algorithm. We use a guitar parts dataset sourced from Kaggle.

#### Objectives
* Perform Association Rule Mining with Apriori algorithm.
* Discover frequent itemsets.

Source: https://www.kaggle.com/datasets/hakansaritas/dataset-for-apriori-analysis

#### Data Preparation

In [1]:
# import libraries
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [2]:
# load dataset
df = pd.read_csv("data/guitar_apriori_analysis.csv")
df.head(5)

| | products | String | Bridge | String Pegs | Tuning Keys | Nut | Pick Guard | Fret | Saddle | Pick | Tuner |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ('Nut', 'Saddle', 'Tuner') | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | ('Tuning Keys', 'Nut', 'Pick Guard') | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | ('Nut', 'Pick Guard', 'Fret', 'Pick') | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| 3 | ('Bridge', 'Tuning Keys', 'Nut', 'Pick Guard') | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 4 | ('String Pegs', 'Fret', 'Saddle', 'Tuner') | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |


In [3]:
df.dtypes

products       object
String          int64
Bridge          int64
String Pegs     int64
Tuning Keys     int64
Nut             int64
Pick Guard      int64
Fret            int64
Saddle          int64
Pick            int64
Tuner           int64
dtype: object

In [4]:
# drop the raw tuple strings; the one-hot item columns are kept
df.drop("products", axis=1, inplace=True)

In [5]:
# convert the 0/1 integer flags to booleans, the input type mlxtend's apriori works best with
df = df.astype(bool)

In [6]:
df.head()

| | String | Bridge | String Pegs | Tuning Keys | Nut | Pick Guard | Fret | Saddle | Pick | Tuner |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | True | False | False | True | False | True |
| 1 | False | False | False | True | True | True | False | False | False | False |
| 2 | False | False | False | False | True | True | True | False | True | False |
| 3 | False | True | False | True | True | True | False | False | False | False |
| 4 | False | False | True | False | False | False | True | True | False | True |


In [7]:
len(df)

100

#### Applying Apriori algorithm

In [8]:
freq_items = apriori(df, min_support=0.10, use_colnames=True)
freq_items

| | support | itemsets |
|---|---|---|
| 0 | 0.35 | (String) |
| 1 | 0.34 | (Bridge) |
| 2 | 0.38 | (String Pegs) |
| 3 | 0.31 | (Tuning Keys) |
| 4 | 0.3 | (Nut) |
| 5 | 0.31 | (Pick Guard) |
| 6 | 0.33 | (Fret) |
| 7 | 0.38 | (Saddle) |
| 8 | 0.27 | (Pick) |
| 9 | 0.37 | (Tuner) |

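`apriori` searches level by level. It keeps the single items that meet `min_support`, joins frequent itemsets into candidates one item larger, and discards any candidate with an infrequent subset. It then counts the rest. The pure-Python sketch below illustrates that idea on the five transactions from `df.head()`. It is a teaching aid, not mlxtend's implementation, and the function name `apriori_sketch` is hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items
    singles = {frozenset([item]) for basket in baskets for item in basket}
    level = {s: support(s) for s in singles}
    level = {s: sup for s, sup in level.items() if sup >= min_support}
    frequent = dict(level)

    k = 2
    while level:
        # Join: unite frequent (k-1)-itemsets whose union has exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count: keep the candidates that meet min_support
        level = {c: support(c) for c in candidates}
        level = {c: sup for c, sup in level.items() if sup >= min_support}
        frequent.update(level)
        k += 1
    return frequent


# The five transactions shown in df.head() above
transactions = [
    ("Nut", "Saddle", "Tuner"),
    ("Tuning Keys", "Nut", "Pick Guard"),
    ("Nut", "Pick Guard", "Fret", "Pick"),
    ("Bridge", "Tuning Keys", "Nut", "Pick Guard"),
    ("String Pegs", "Fret", "Saddle", "Tuner"),
]

result = apriori_sketch(transactions, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), -kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With `min_support=0.4`, an itemset must appear in at least 2 of the 5 transactions. The sketch then finds six single items, four pairs and one triple, {Nut, Pick Guard, Tuning Keys}, at support 0.4. The prune step is what keeps the search small: a triple is only counted if all three of its pairs are already frequent.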

In [9]:
# add a column with the number of items in each itemset
freq_items["length"] = freq_items["itemsets"].apply(len)
freq_items

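| | support | itemsets | length |
|---|---|---|---|
| 0 | 0.35 | (String) | 1 |
| 1 | 0.34 | (Bridge) | 1 |
| 2 | 0.38 | (String Pegs) | 1 |
| 3 | 0.31 | (Tuning Keys) | 1 |
| 4 | 0.3 | (Nut) | 1 |
| 5 | 0.31 | (Pick Guard) | 1 |
| 6 | 0.33 | (Fret) | 1 |
| 7 | 0.38 | (Saddle) | 1 |
| 8 | 0.27 | (Pick) | 1 |
| 9 | 0.37 | (Tuner) | 1 |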


#### Filtering Results

In [10]:
freq_items[(freq_items['length']>=2) & (freq_items['support']>=0.05)]

| | support | itemsets | length |
|---|---|---|---|
| 10 | 0.1 | (String, Bridge) | 2 |
| 11 | 0.12 | (String, String Pegs) | 2 |
| 12 | 0.1 | (String, Fret) | 2 |
| 13 | 0.11 | (String, Saddle) | 2 |
| 14 | 0.1 | (String, Pick) | 2 |
| 15 | 0.14 | (String, Tuner) | 2 |
| 16 | 0.12 | (Tuning Keys, Bridge) | 2 |
| 17 | 0.1 | (Nut, Bridge) | 2 |
| 18 | 0.11 | (Pick Guard, Bridge) | 2 |
| 19 | 0.11 | (Saddle, Bridge) | 2 |


#### Display Association Rules

In [11]:
rules = association_rules(freq_items, metric="lift", min_threshold=1)
# add the number of items on each side of every rule
rules['antecedents length'] = rules['antecedents'].apply(len)
rules['consequents length'] = rules['consequents'].apply(len)
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric | antecedents length | consequents length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (String) | (Pick) | 0.35 | 0.27 | 0.1 | 0.285714 | 1.058201 | 0.0055 | 1.022 | 0.084615 | 1 | 1 |
| 1 | (Pick) | (String) | 0.27 | 0.35 | 0.1 | 0.37037 | 1.058201 | 0.0055 | 1.032353 | 0.075342 | 1 | 1 |
| 2 | (String) | (Tuner) | 0.35 | 0.37 | 0.14 | 0.4 | 1.081081 | 0.0105 | 1.05 | 0.115385 | 1 | 1 |
| 3 | (Tuner) | (String) | 0.37 | 0.35 | 0.14 | 0.378378 | 1.081081 | 0.0105 | 1.045652 | 0.119048 | 1 | 1 |
| 4 | (Tuning Keys) | (Bridge) | 0.31 | 0.34 | 0.12 | 0.387097 | 1.13852 | 0.0146 | 1.076842 | 0.176329 | 1 | 1 |
| 5 | (Bridge) | (Tuning Keys) | 0.34 | 0.31 | 0.12 | 0.352941 | 1.13852 | 0.0146 | 1.066364 | 0.184343 | 1 | 1 |
| 6 | (Pick Guard) | (Bridge) | 0.31 | 0.34 | 0.11 | 0.354839 | 1.043643 | 0.0046 | 1.023 | 0.060606 | 1 | 1 |
| 7 | (Bridge) | (Pick Guard) | 0.34 | 0.31 | 0.11 | 0.323529 | 1.043643 | 0.0046 | 1.02 | 0.063361 | 1 | 1 |
| 8 | (String Pegs) | (Tuning Keys) | 0.38 | 0.31 | 0.12 | 0.315789 | 1.018676 | 0.0022 | 1.008462 | 0.02957 | 1 | 1 |
| 9 | (Tuning Keys) | (String Pegs) | 0.31 | 0.38 | 0.12 | 0.387097 | 1.018676 | 0.0022 | 1.011579 | 0.02657 | 1 | 1 |


In [12]:
rules[['antecedents','consequents','support','confidence','lift']][(rules['antecedents length'] >= 1) & (rules.lift > 1)].sort_values(by=["confidence","lift"], ascending=False)

| | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 19 | (Fret) | (Tuner) | 0.15 | 0.454545 | 1.228501 |
| 17 | (Nut) | (Saddle) | 0.13 | 0.433333 | 1.140351 |
| 18 | (Tuner) | (Fret) | 0.15 | 0.405405 | 1.228501 |
| 2 | (String) | (Tuner) | 0.14 | 0.4 | 1.081081 |
| 10 | (Fret) | (String Pegs) | 0.13 | 0.393939 | 1.036683 |
| 12 | (Pick Guard) | (Tuning Keys) | 0.12 | 0.387097 | 1.248699 |
| 13 | (Tuning Keys) | (Pick Guard) | 0.12 | 0.387097 | 1.248699 |
| 4 | (Tuning Keys) | (Bridge) | 0.12 | 0.387097 | 1.13852 |
| 9 | (Tuning Keys) | (String Pegs) | 0.12 | 0.387097 | 1.018676 |
| 3 | (Tuner) | (String) | 0.14 | 0.378378 | 1.081081 |

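Each metric column in the `rules` output from `association_rules` can be derived from three supports: the antecedent's, the consequent's and the pair's. The sketch below writes out the standard definitions and checks them against the String → Tuner and Tuner → String rows. Their supports (0.35, 0.37 and 0.14) are copied from the table. `rule_metrics` is a hypothetical helper name, and the Zhang's metric expression is the form that reproduces the `zhangs_metric` values shown.

```python
import math


def rule_metrics(sup_a, sup_c, sup_ac):
    """Metrics for a rule A -> C from supp(A), supp(C) and supp(A and C).

    Assumes 0 < supp(A) < 1 so that no denominator is zero.
    """
    confidence = sup_ac / sup_a          # estimate of P(C | A)
    lift = confidence / sup_c            # > 1: A and C co-occur more than if independent
    leverage = sup_ac - sup_a * sup_c    # observed minus expected joint support
    # Conviction is infinite when the rule is never wrong (confidence == 1)
    conviction = math.inf if confidence == 1 else (1 - sup_c) / (1 - confidence)
    # Zhang's metric ranges from -1 (dissociation) to 1 (association)
    zhang = leverage / max(sup_ac * (1 - sup_a), sup_a * (sup_c - sup_ac))
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhang,
    }


# String -> Tuner: supports copied from the rules table
m = rule_metrics(sup_a=0.35, sup_c=0.37, sup_ac=0.14)
print({name: round(value, 6) for name, value in m.items()})

# Reversing the rule (Tuner -> String) changes confidence, conviction and Zhang's metric,
# but not lift or leverage, which are symmetric in A and C
t = rule_metrics(sup_a=0.37, sup_c=0.35, sup_ac=0.14)
print({name: round(value, 6) for name, value in t.items()})
```

This symmetry explains why each pair of mirrored rows in the table shares the same `lift` and `leverage` but differs in `confidence`.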

In [13]:
rules[['antecedents','consequents', 'support','confidence','lift']][(rules['confidence'] >= 0.4) & (rules.lift > 1)].sort_values(by=["confidence","lift"], ascending = False)

| | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 19 | (Fret) | (Tuner) | 0.15 | 0.454545 | 1.228501 |
| 17 | (Nut) | (Saddle) | 0.13 | 0.433333 | 1.140351 |
| 18 | (Tuner) | (Fret) | 0.15 | 0.405405 | 1.228501 |
| 2 | (String) | (Tuner) | 0.14 | 0.4 | 1.081081 |

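`association_rules` turns each frequent itemset into candidate rules. It splits the itemset into a non-empty antecedent and consequent, then keeps the rules whose chosen metric passes the threshold. The minimal sketch below follows that idea and reproduces the confidence ≥ 0.4 selection above. It uses only the single-item and pair supports copied from the outputs, so it covers just the items in the final rules. `generate_rules` is a hypothetical name, this is not mlxtend's code, and the small tolerance on the confidence cut-off is an added assumption to absorb float rounding at 0.4.

```python
from itertools import combinations


def generate_rules(supports, min_lift=1.0):
    """Split each frequent itemset into antecedent -> consequent rules.

    `supports` maps frozenset itemsets to support. By the Apriori property,
    every subset of a frequent itemset is assumed to be present as a key.
    """
    found = []
    for itemset, sup_ac in supports.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset can serve as the antecedent
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                cons = itemset - ante
                confidence = sup_ac / supports[ante]
                lift = confidence / supports[cons]
                if lift >= min_lift:
                    found.append((ante, cons, sup_ac, confidence, lift))
    return found


# Supports copied from the frequent-itemset and rules outputs above
supports = {
    frozenset({"String"}): 0.35,
    frozenset({"Nut"}): 0.3,
    frozenset({"Fret"}): 0.33,
    frozenset({"Saddle"}): 0.38,
    frozenset({"Tuner"}): 0.37,
    frozenset({"String", "Tuner"}): 0.14,
    frozenset({"Fret", "Tuner"}): 0.15,
    frozenset({"Nut", "Saddle"}): 0.13,
}

sketch_rules = generate_rules(supports, min_lift=1.0)
# Keep confident rules; the 1e-9 tolerance guards against float rounding at 0.4
confident = [r for r in sketch_rules if r[3] >= 0.4 - 1e-9]
confident.sort(key=lambda r: (r[3], r[4]), reverse=True)
for ante, cons, sup, conf, lift in confident:
    print(f"{', '.join(sorted(ante))} -> {', '.join(sorted(cons))}  "
          f"support={sup:.2f}  confidence={conf:.6f}  lift={lift:.6f}")
```

All six directional rules pass the lift filter, because lift is the same in both directions. The confidence cut-off then removes Tuner → String and Saddle → Nut, leaving the same four rules, in the same order, as the filtered table.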

#### Analysis
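* No transformation was needed, because the dataset is already one-hot encoded: each item has its own 0/1 column, which is the input format `apriori` expects.
* The raw `products` column was dropped, and the 0/1 integer columns were converted to boolean values.
* The dataset contains 100 transactions over 10 items.
* The rules were generated with a minimum lift of 1. Filtering them to confidence ≥ 0.4 leaves four rules: Fret → Tuner, Nut → Saddle, Tuner → Fret, and String → Tuner. Fret → Tuner and Tuner → Fret come from the same itemset read in both directions, so they share support (0.15) and lift (1.228501).
* The longest frequent itemsets contain only 2 items. This may be because the dataset is small: with `min_support=0.10` and 100 transactions, an itemset must appear in at least 10 transactions, and no 3-item combination reaches that count here.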