
<div dir>
<h1>Association Rules</h1>
Association rules describe the relationships and dependencies among items in a large dataset.

A common example of association rule mining is "Market Basket Analysis". Here, customers' buying habits are analyzed from the items they put in their baskets. Identifying the relationships between products reveals patterns that recur across shopping trips.

Three important parameters:

Support measures how popular an itemset is: the fraction of transactions that contain it.
Confidence is the probability that item y is purchased given that item x is purchased, for a rule x -> y.
Lift is the confidence of x -> y divided by the support of y, so it combines the two measures above.

In this exercise, we use the Apriori algorithm, one of the most widely used algorithms in this field, to mine association rules.

<font color='Green'>Question: Investigate the effect of different values of the Lift parameter on the probability of the outcome. </font>

<font color='white'>Answer: Lift measures the strength of the association between two items. A lift above 1 means the two items are bought together more often than would be expected if they were independent, so purchasing x raises the probability of purchasing y. A lift of 1 indicates independence, and a lift below 1 means purchasing x lowers the probability of purchasing y. Formally, lift is the ratio of the number of times the two items were actually purchased together to the number of times they would be expected to be purchased together under independence. </font>

</div>
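To make the three measures concrete, the following sketch computes them on a tiny hand-made set of baskets. The transactions and the helper names `support`, `confidence`, and `lift` are illustrative assumptions; they are not taken from the exercise dataset or from mlxtend.

```python
# Toy baskets (hypothetical data, not from Hypermarket_dataset)
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Estimated P(y | x): support of x and y together divided by support of x."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Confidence of x -> y divided by the support of y."""
    return confidence(x, y, transactions) / support(y, transactions)


print(f"support(milk, bread)     = {support({'milk', 'bread'}, transactions):.3f}")
print(f"confidence(milk -> bread) = {confidence({'milk'}, {'bread'}, transactions):.3f}")
print(f"lift(milk -> bread)       = {lift({'milk'}, {'bread'}, transactions):.3f}")
```

A lift above 1 for milk -> bread in this toy data would indicate that the two items appear together more often than independence predicts.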





<div dir>
<h1>Apriori Algorithm</h1>
The Apriori algorithm starts from a minimum support threshold and builds frequent itemsets iteratively, growing them by one item per pass. Any itemset whose support falls below the threshold is discarded together with all of its supersets, because a superset can never be more frequent than its subsets. The process stops when no new frequent itemsets can be generated.

In this exercise, we apply the Apriori algorithm to the Hypermarket_dataset, which contains customers' purchase orders from grocery stores.


</div>
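The level-wise procedure described above can be sketched in plain Python as follows. This is a minimal illustration under simplifying assumptions (small in-memory data, no performance tuning); the function name `apriori_sketch` and the toy baskets are hypothetical, and this is not how mlxtend implements `apriori` internally.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing the whole itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s: support(s) for s in singles}
    current = {s: sup for s, sup in current.items() if sup >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count support and keep only the candidates that reach the threshold
        current = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in current.items() if sup >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets (hypothetical data)
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter"},
]
result = apriori_sketch(baskets, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The loop ends as soon as a pass produces no frequent itemsets, which matches the stopping condition described in the text.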





<div dir>
<h1>Data Preprocessing</h1>
<font color='Green'>Question: To begin, prepare the dataset as a sparse matrix whose columns are the purchased products and whose index is the order number.

For convenience, encode each product in each order as 1 if it was purchased and 0 otherwise.

Your output matrix should look like this:</font>

<img src="https://drive.google.com/uc?id=1eD0jan1ZbeYqSklgK--ks7oeY-MyTA3p"></img>

</div>




In [2]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
 
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [3]:
dataset = pd.read_csv('data/Hypermarket_dataset.csv')

In [5]:
dataset.loc[dataset['Member_number'] == 1905]

Unnamed: 0,Member_number,Date,itemDescription
19,1905,07-07-2015,other vegetables
35,1905,21-02-2015,fruit/vegetable juice
875,1905,19-05-2015,sugar
5244,1905,18-10-2015,soda
6713,1905,31-07-2015,ham
7096,1905,07-07-2015,tropical fruit
7112,1905,21-02-2015,sausage
13447,1905,22-04-2014,photo/film
14090,1905,02-06-2014,cat food
15221,1905,31-01-2014,seasonal products


In [6]:
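def items_to_array(items_array: list, purchased_items: list):
  # One 0/1 slot per known item; the length follows the item list instead of a hard-coded 167
  vector = [0] * len(items_array)
  for item in purchased_items:
    # Mark the slot of every item that appears in this basket
    item_index = items_array.index(item)
    vector[item_index] = 1
  return vector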

In [7]:
dataset.loc[1]['itemDescription']

'whole milk'

In [8]:
# Collect the distinct items and each member's purchases, in order of first appearance
items = []
customer_IDs = []
member_transactions = {}

for customer_id, item in zip(dataset['Member_number'], dataset['itemDescription']):
  if customer_id not in member_transactions:
    member_transactions[customer_id] = []
    customer_IDs.append(customer_id)
  member_transactions[customer_id].append(item)

  if item not in items:
    items.append(item)

# Encode each member's set of purchased items as a 0/1 vector
transactions_vector = {}
for customer_id, purchased in member_transactions.items():
  transactions_vector[customer_id] = items_to_array(items, list(set(purchased)))

In [9]:
transactions_df = pd.DataFrame.from_dict(transactions_vector, orient='index', columns=items)
transactions_df

Unnamed: 0,tropical fruit,whole milk,pip fruit,other vegetables,rolls/buns,pot plants,citrus fruit,beef,frankfurter,chicken,...,flower (seeds),rice,tea,salad dressing,specialty vegetables,pudding powder,ready soups,make up remover,toilet cleaner,preservation products
1808,1,1,0,0,1,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2552,1,1,0,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2300,0,0,1,1,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1187,0,0,0,1,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3037,0,1,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4590,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4703,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3607,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4587,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
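
As a possible alternative to the explicit loop, the same kind of 0/1 matrix can be built with `pd.crosstab`. The sketch below uses a small made-up frame that only borrows the column names `Member_number` and `itemDescription` from the dataset; note that `crosstab` sorts the columns alphabetically, so the column order would differ from the matrix above.

```python
import pandas as pd

# Hypothetical mini-dataset with the same column names as Hypermarket_dataset.csv
toy = pd.DataFrame({
    "Member_number": [1, 1, 2, 2, 2, 3],
    "itemDescription": ["whole milk", "soda", "soda", "soda", "ham", "whole milk"],
})

# Count purchases per (member, item), then cap the counts at 1 to get presence flags
basket = pd.crosstab(toy["Member_number"], toy["itemDescription"]).clip(upper=1)
print(basket)
```

Member 2 bought soda twice in this toy data, which is why the counts are clipped before being treated as 0/1 flags.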



<div dir>
<h1>Finding frequent patterns</h1>
<font color='Green'>Question: Using the Apriori algorithm with a minimum support threshold of 0.07, generate all frequent patterns.</font>

</div>




In [10]:
# Mine all itemsets whose support is at least 0.07
frequent_itemsets = apriori(transactions_df, min_support=0.07)
# Record how many items each frequent itemset contains
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
frequent_itemsets



Unnamed: 0,support,itemsets,length
0,0.233710,(0),1
1,0.458184,(1),1
2,0.170600,(2),1
3,0.376603,(3),1
4,0.349666,(4),1
...,...,...,...
78,0.097486,"(17, 37)",2
79,0.077219,"(18, 37)",2
80,0.081324,"(37, 21)",2
81,0.082093,"(1, 3, 4)",3



<div dir>
<h1>Extracting Association Rules</h1>
<font color='Green'>Question: Write a function that takes two inputs, confidence and lift, and displays the resulting association rules in the output. <br>
Record your output for both cases in the report.</font>


</div>




In [13]:
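def extract_association_rules(freq_itemsets, confidence, lift):
    """Return the rules that meet the minimum confidence and whose lift exceeds `lift`."""
    # Generate rules filtered by the confidence threshold
    rules = association_rules(freq_itemsets, metric="confidence", min_threshold=confidence)
    # Keep only the rules whose lift is above the given value
    rules = rules[rules['lift'] > lift]
    return rules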

In [14]:
extract_association_rules(frequent_itemsets, 0.5, 1.2)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
13,"(3, 4)",(1),0.146742,0.458184,0.082093,0.559441,1.220996,0.014859,1.229837
14,"(17, 3)",(1),0.120318,0.458184,0.071832,0.597015,1.303003,0.016704,1.344507


In [18]:
extract_association_rules(frequent_itemsets, 0.3, 1.1)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
3,(0),(17),0.23371,0.282966,0.07568,0.32382,1.144379,0.009548,1.060419
4,(0),(37),0.23371,0.313494,0.081837,0.350165,1.116974,0.00857,1.056431
5,(2),(1),0.1706,0.458184,0.086968,0.509774,1.112598,0.008801,1.105239
6,(1),(3),0.458184,0.376603,0.19138,0.417693,1.109106,0.018827,1.070564
7,(3),(1),0.376603,0.458184,0.19138,0.508174,1.109106,0.018827,1.101643
8,(1),(4),0.458184,0.349666,0.178553,0.389698,1.114484,0.018342,1.065592
9,(4),(1),0.349666,0.458184,0.178553,0.510638,1.114484,0.018342,1.10719
11,(16),(1),0.213699,0.458184,0.112365,0.52581,1.147597,0.014452,1.142615
12,(1),(17),0.458184,0.282966,0.15059,0.328667,1.16151,0.02094,1.068076
13,(17),(1),0.282966,0.458184,0.15059,0.532185,1.16151,0.02094,1.158185
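
The metric columns in these tables are related by simple formulas, which can be checked by hand. The sketch below recomputes confidence, lift, leverage, and conviction for the rule (3, 4) -> (1) from the first output, using the standard definitions of these measures; the input numbers are copied from that row, so small rounding differences are to be expected.

```python
# Values copied from the rule (3, 4) -> (1) in the confidence=0.5, lift=1.2 output
antecedent_support = 0.146742
consequent_support = 0.458184
support = 0.082093

# confidence = support of the whole rule / support of the antecedent
confidence = support / antecedent_support
# lift = confidence / support of the consequent
lift = confidence / consequent_support
# leverage = observed joint support minus the joint support expected under independence
leverage = support - antecedent_support * consequent_support
# conviction = (1 - consequent support) / (1 - confidence)
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.4f} lift={lift:.4f} "
      f"leverage={leverage:.4f} conviction={conviction:.4f}")
```

The recomputed values should agree with the table row to about four decimal places.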
