# AssociationRules

Association rules describe relationships and dependencies between items in large datasets. <br><br>
A common application of association rule discovery is "shopping cart analysis" (market basket analysis). The items that customers place in their shopping carts are analyzed to understand buying habits, and identifying the relationships between products reveals recurring purchase patterns. <br><br>
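Three important parameters:
- Support measures how popular an itemset is: the fraction of all transactions that contain it.
- Confidence is the probability of buying item y given that item x is bought, for the rule x -> y.
- Lift is the confidence of x -> y divided by the support of y. It shows how much more often x and y occur together than would be expected if they were independent.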
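
The three parameters can be computed by hand on a small example. The following is a minimal sketch in plain Python; the five baskets and the helper names `support`, `confidence`, and `lift` are made up for illustration and are not part of the hypermarket dataset or of mlxtend.

```python
# Toy transactions (hypothetical data, not taken from the hypermarket dataset).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Estimated P(y | x): support of x and y together divided by support of x."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Confidence of x -> y divided by the support of y."""
    return confidence(x, y, transactions) / support(y, transactions)


s = support({"milk", "bread"}, transactions)
c = confidence({"milk"}, {"bread"}, transactions)
l = lift({"milk"}, {"bread"}, transactions)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.4f}")
```

In this toy data, milk and bread each appear in four of the five baskets and together in three, so the rule milk -> bread has a lift slightly below 1.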

To implement association rules in this project, we use the Apriori algorithm, which is one of the most widely used algorithms for mining frequent itemsets.

### How does the Lift parameter affect the probability of occurrence

The lift value measures how much the occurrence of one item changes the probability of another item occurring, while also accounting for how frequent each of the two items is on its own. <br>
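For a rule body -> head, the lift can be calculated using the following equation, where s(·) denotes support:

$$
\text{lift} = \frac{\text{confidence}}{\text{expected confidence}} = \frac{\text{confidence}}{s(\text{body}) \cdot s(\text{head}) / s(\text{body})} = \frac{\text{confidence}}{s(\text{head})}
$$
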
The lift value can range from 0 to infinity.

Three different scenarios can happen:<br>
- If the lift value is greater than 1, the body and head of the rule appear together more often than expected, meaning that the occurrence of the body has a positive effect on the occurrence of the head.<br>
- If the lift value is less than 1, the body and head appear together less often than expected, meaning that the occurrence of the body has a negative effect on the probability of the head occurring.<br>
- If the lift value is close to 1, the body and head occur together almost as often as expected, meaning that the body has little or no effect on the head.
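
As a quick check of the formula and the three scenarios, the sketch below recomputes the lift of the (bottled beer) -> (whole milk) rule from the confidence and consequent support reported in the rules table of this notebook. The helper names and the `tolerance` band around 1 are assumptions chosen for illustration only.

```python
def lift_from_confidence(confidence, head_support):
    """lift = confidence / s(head)."""
    return confidence / head_support


def interpret_lift(lift, tolerance=0.05):
    """Classify a lift value; `tolerance` is an arbitrary illustrative band around 1."""
    if lift > 1 + tolerance:
        return "positive association"
    if lift < 1 - tolerance:
        return "negative association"
    return "approximately independent"


# Values copied from the (bottled beer) -> (whole milk) row of the rules table.
value = lift_from_confidence(confidence=0.537964, head_support=0.458184)
print(round(value, 3), interpret_lift(value))
```

The result agrees with the `lift` column of that row up to rounding of the inputs.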

## Apriori Algorithm

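The algorithm fixes a minimum support threshold and works iteratively. It first finds the frequent single items, then repeatedly combines frequent itemsets of size k-1 into candidate itemsets of size k. Candidates whose support is below the threshold are removed, and because every superset of an infrequent itemset is also infrequent, those supersets are never generated. The process continues until no new frequent itemsets are found.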
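
The level-wise search can be sketched in a few lines of plain Python. This is a simplified illustration under the assumption that transactions are given as sets of item names; `apriori_sketch` is a hypothetical helper and does not reproduce how mlxtend implements `apriori`.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def frequent(candidates):
        # Count each candidate and keep those that reach the support threshold.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = frequent({frozenset([i]) for i in items})
    result = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = frequent(candidates)
        result.update(current)
        k += 1
    return result


# Toy baskets (hypothetical data).
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread", "eggs"},
]
frequent_itemsets = apriori_sketch(baskets, min_support=0.4)
for itemset, sup in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

In this example the candidate {milk, bread, butter} is pruned without being counted, because its subset {bread, butter} is not frequent.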

### Data Preparation

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
 
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [2]:
# Load the raw purchase records: one row per purchased item
data = pd.read_csv("../data/Hypermarket_dataset.csv")
# data = data.drop(["Date"], axis=1)
data

Unnamed: 0,Member_number,Date,itemDescription
0,1808,21-07-2015,tropical fruit
1,2552,05-01-2015,whole milk
2,2300,19-09-2015,pip fruit
3,1187,12-12-2015,other vegetables
4,3037,01-02-2015,whole milk
...,...,...,...
38760,4471,08-10-2014,sliced cheese
38761,2022,23-02-2014,candy
38762,1097,16-04-2014,cake bar
38763,1510,03-12-2014,fruit/vegetable juice


In [3]:
from mlxtend.preprocessing import TransactionEncoder

# Wrap each item in a list, then concatenate the lists per member (one basket per member)
data['itemDescription'] = data['itemDescription'].transform(lambda x: [x])
data = data.groupby(['Member_number'])['itemDescription'].sum().reset_index(drop=True)

# One-hot encode the baskets: one row per member, one 0/1 column per item
encoder = TransactionEncoder()
transactions = pd.DataFrame(encoder.fit(data).transform(data), columns=encoder.columns_)
transactions = transactions.astype(int)
display(transactions.head())

Unnamed: 0,Instant food products,UHT-milk,abrasive cleaner,artif. sweetener,baby cosmetics,bags,baking powder,bathroom cleaner,beef,berries,...,turkey,vinegar,waffles,whipped/sour cream,whisky,white bread,white wine,whole milk,yogurt,zwieback
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,1,0
1,0,0,0,0,0,0,0,0,1,0,...,0,0,0,1,0,1,0,1,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
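
To make the encoding step concrete, here is a minimal sketch in plain Python of the same idea: group the purchased items per member, then build one 0/1 column per distinct item. The `rows` below are invented for illustration, and the code only mimics the result of the encoding, not the internals of `TransactionEncoder`.

```python
from collections import defaultdict

# Hypothetical (member, item) rows mimicking the layout of the raw dataset.
rows = [
    (1000, "whole milk"),
    (1000, "yogurt"),
    (1001, "beef"),
    (1001, "whole milk"),
    (1002, "whole milk"),
]

# Step 1: collect every item bought by the same member into one basket.
baskets = defaultdict(set)
for member, item in rows:
    baskets[member].add(item)

# Step 2: one column per distinct item, in sorted order.
columns = sorted({item for _, item in rows})

# Step 3: one 0/1 row per member.
encoded = {
    member: [int(col in items) for col in columns]
    for member, items in sorted(baskets.items())
}

print(columns)
for member, row in encoded.items():
    print(member, row)
```

Because baskets are built per member rather than per visit, items bought on different dates by the same member end up in the same row, which matches the grouping used in this notebook.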


### Identifying Recurring Patterns

In [4]:
# Mine itemsets that appear in at least 7% of the baskets
frequent_patterns = apriori(transactions, min_support=0.07, use_colnames=True)
frequent_patterns



Unnamed: 0,support,itemsets
0,0.078502,(UHT-milk)
1,0.119548,(beef)
2,0.079785,(berries)
3,0.158799,(bottled beer)
4,0.213699,(bottled water)
...,...,...
78,0.075680,"(tropical fruit, yogurt)"
79,0.079785,"(whole milk, whipped/sour cream)"
80,0.150590,"(whole milk, yogurt)"
81,0.082093,"(other vegetables, whole milk, rolls/buns)"


### Extracting Association Rules

In [5]:
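def extract_rules(frequent_itemsets, confidence, lift):
    # Generate rules with lift >= 1; the threshold is fixed here and the `lift` argument is not used
    rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1, support_only=False)
    # Keep only rules that reach the requested minimum confidence
    rules = rules[rules['confidence'] >= confidence]
    return rules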
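
The way rules are derived from itemset supports can be sketched without mlxtend. In the illustration below, `rules_sketch` is a hypothetical helper that splits each frequent itemset into a body and a head and applies confidence and lift thresholds; it is an assumption-laden simplification, not the library's implementation. The support values are copied from the tables in this notebook.

```python
from itertools import combinations


def rules_sketch(supports, min_confidence, min_lift=1.0):
    """Derive rules body -> head from a {frozenset: support} mapping."""
    rules = []
    for itemset, s_both in supports.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset of the itemset can serve as a rule body.
        for size in range(1, len(itemset)):
            for body in map(frozenset, combinations(sorted(itemset), size)):
                head = itemset - body
                if body not in supports or head not in supports:
                    continue
                conf = s_both / supports[body]
                lift = conf / supports[head]
                if conf >= min_confidence and lift >= min_lift:
                    rules.append((set(body), set(head), conf, lift))
    return rules


# Supports copied from the frequent itemset and rules tables in this notebook.
supports = {
    frozenset({"bottled beer"}): 0.158799,
    frozenset({"whole milk"}): 0.458184,
    frozenset({"bottled beer", "whole milk"}): 0.085428,
}

found = rules_sketch(supports, min_confidence=0.5)
for body, head, conf, lift in found:
    print(body, "->", head, round(conf, 3), round(lift, 3))
```

Only the direction bottled beer -> whole milk passes the confidence threshold of 0.5; the reverse rule has the same lift but a much lower confidence, because whole milk is far more frequent than bottled beer.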

In [6]:
# Rules with a confidence of at least 0.3
rules = extract_rules(frequent_patterns, 0.3, 0.1)
display(rules.head())
print("Rules identified: ", len(rules))

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
1,(bottled beer),(whole milk),0.158799,0.458184,0.085428,0.537964,1.174124,0.012669,1.172672
3,(bottled water),(other vegetables),0.213699,0.376603,0.093894,0.439376,1.16668,0.013414,1.111969
4,(bottled water),(rolls/buns),0.213699,0.349666,0.079271,0.370948,1.060863,0.004548,1.033832
6,(bottled water),(soda),0.213699,0.313494,0.076193,0.356543,1.137318,0.009199,1.066902
8,(bottled water),(whole milk),0.213699,0.458184,0.112365,0.52581,1.147597,0.014452,1.142615


Rules identified:  59


In [7]:
# Stricter filter: rules with a confidence of at least 0.5
rules = extract_rules(frequent_patterns, 0.5, 1)
display(rules.head())
print("Rules identified: ", len(rules))

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
1,(bottled beer),(whole milk),0.158799,0.458184,0.085428,0.537964,1.174124,0.012669,1.172672
8,(bottled water),(whole milk),0.213699,0.458184,0.112365,0.52581,1.147597,0.014452,1.142615
11,(canned beer),(whole milk),0.165213,0.458184,0.087224,0.52795,1.152268,0.011526,1.147795
18,(domestic eggs),(whole milk),0.133145,0.458184,0.070292,0.527938,1.152242,0.009287,1.147766
20,(newspapers),(whole milk),0.139815,0.458184,0.072345,0.517431,1.12931,0.008284,1.122775


Rules identified:  15
