# Association Rules - 360DIGITMG

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. ... In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.

Apriori is one of the association rule learning algorithms, which fall under the unsupervised branch of machine learning. Apriori does not require us to provide a target variable for the model. Instead, the algorithm identifies relationships between data points, subject to the constraints we specify.

## Support
The first step, for us and for the algorithm, is to find frequently bought items. Support is a straightforward frequency-based calculation: the share of all transactions that contain item A.

Support(A) = Transactions(A) / Total Transactions
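
As a minimal sketch of the support formula, the snippet below counts how many transactions contain a given itemset. The `transactions` list and the `support` helper are hypothetical, invented for illustration; they are not part of the phone dataset or of mlxtend.

```python
# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"eggs", "bacon", "bread"},
    {"eggs", "bacon"},
    {"eggs", "milk"},
    {"bread", "milk"},
    {"eggs", "bacon", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"eggs"}, transactions))           # 4 of 5 transactions -> 0.8
print(support({"eggs", "bacon"}, transactions))  # 3 of 5 transactions -> 0.6
```

The same helper works for a single item or for a combination of items, because an itemset counts as present only when all of its items appear in the transaction.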

## Confidence
Now that we have identified frequently bought items, let's calculate confidence. It tells us how confident we can be, based on our data, that item B will be purchased given that item A has been purchased.

Confidence(A→B) = Probability(A & B) / Support(A)
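
The confidence formula can be sketched the same way. The toy `transactions` list and the `count` and `confidence` helpers below are assumptions made for this example. The sketch divides raw counts, which is equivalent to the formula above because the total number of transactions cancels out.

```python
# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"eggs", "bacon", "bread"},
    {"eggs", "bacon"},
    {"eggs", "milk"},
    {"bread", "milk"},
    {"eggs", "bacon", "milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Confidence(A -> B) = count(A and B) / count(A)."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

print(confidence({"eggs"}, {"bacon"}, transactions))  # 3 / 4 = 0.75
print(confidence({"bacon"}, {"eggs"}, transactions))  # 3 / 3 = 1.0
```

Confidence is not symmetric: in this toy data every basket with bacon also has eggs, but only three of the four baskets with eggs have bacon. The sketch assumes the antecedent occurs at least once; otherwise the division fails.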

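## Lift
Given that different items are bought at different frequencies, how do we know that two items, say eggs and bacon, really do have a strong association, and how do we measure it? Lift lets us evaluate this objectively: it compares how often A and B are bought together with how often they would be bought together if they were independent. A lift above 1 indicates a positive association, a lift of 1 indicates independence, and a lift below 1 indicates a negative association.
There are multiple ways to express the formula to calculate lift.
1) Lift(A→B) = Probability(A & B) / (Support(A) * Support(B))
2) Lift(A→B) = Confidence(A→B) / Support(B)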
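
To check that the two expressions for lift agree, here is a small sketch on made-up data. The `transactions` list and the names `support`, `lift_v1` and `lift_v2` are hypothetical, chosen only for this illustration.

```python
import math

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"eggs", "bacon", "bread"},
    {"eggs", "bacon"},
    {"eggs", "milk"},
    {"bread", "milk"},
    {"eggs", "bacon", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift_v1(a, b):
    """Formula 1: Probability(A & B) / (Support(A) * Support(B))."""
    return support(a | b) / (support(a) * support(b))

def lift_v2(a, b):
    """Formula 2: Confidence(A -> B) / Support(B)."""
    confidence = support(a | b) / support(a)
    return confidence / support(b)

eggs, bacon, milk = {"eggs"}, {"bacon"}, {"milk"}

# Both formulas give the same value, up to floating-point rounding.
print(math.isclose(lift_v1(eggs, bacon), lift_v2(eggs, bacon)))
print(lift_v1(eggs, bacon))  # above 1: bought together more often than chance
print(lift_v1(eggs, milk))   # below 1: bought together less often than chance
```

Unlike confidence, lift is symmetric: swapping A and B leaves formula 1 unchanged.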

# Steps Involved in Apriori Algorithm
1. Set minimum values for support and confidence. This means we are only interested in rules for items that occur often enough (support) and that co-occur with other items often enough (confidence).
2. Extract all itemsets whose support is higher than the minimum threshold.
3. From those itemsets, select all rules whose confidence is higher than the minimum threshold.
4. Sort the rules in descending order of lift.
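
The four steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the mlxtend implementation: the toy `transactions`, the thresholds `MIN_SUPPORT` and `MIN_CONFIDENCE`, and all helper names are hypothetical, and the sketch keeps values at or above each threshold.

```python
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"eggs", "bacon", "bread"},
    {"eggs", "bacon"},
    {"eggs", "milk"},
    {"bread", "milk"},
    {"eggs", "bacon", "milk"},
]

# Step 1: choose the minimum thresholds (assumed values).
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.7

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 2: grow frequent itemsets level by level (1 item, 2 items, ...).
items = sorted(set().union(*transactions))
level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
frequent = {}
while level:
    for itemset in level:
        frequent[itemset] = support(itemset)
    # Join frequent k-itemsets into (k+1)-item candidates, keep the frequent ones.
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    level = [c for c in candidates if support(c) >= MIN_SUPPORT]

# Step 3: split each frequent itemset into antecedent -> consequent rules
# and keep those that reach the confidence threshold.
rules = []
for itemset, sup in frequent.items():
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            consequent = itemset - antecedent
            conf = sup / frequent[antecedent]
            if conf >= MIN_CONFIDENCE:
                lift = conf / frequent[consequent]
                rules.append((antecedent, consequent, conf, lift))

# Step 4: order the rules by descending lift.
rules.sort(key=lambda rule: rule[3], reverse=True)
for antecedent, consequent, conf, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2), round(lift, 2))
```

Looking up `frequent[antecedent]` is safe because every subset of a frequent itemset is itself frequent. That same property is what lets Apriori build larger candidates only from itemsets that already passed the support threshold.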

# Association Rules Problem No. 4
A mobile phone manufacturer wants to launch three brand-new phones. Before following its traditional marketing approach, this time it wants to analyze the sales data of its previous models in different regions. You have been hired as a data scientist to help: use association rules to provide insights that the company's marketing team can use to improve sales.

Dataset: myphonedata.csv


Business Objective :

In [None]:
# Implementing Apriori algorithm from mlxtend

# conda install mlxtend
# or
# pip install mlxtend



In [1]:
import pandas as pd # for data manipulation
from mlxtend.frequent_patterns import apriori, association_rules # Apriori and rule generation from mlxtend
import matplotlib.pyplot as plt # for data visualization


In [4]:
myphonedata = pd.read_csv("myphonedata.csv")
myphonedata

Unnamed: 0,V1,V2,V3,red,white,green,yellow,orange,blue
0,red,white,green,1,1,1,0,0,0
1,white,orange,,0,1,0,0,1,0
2,white,blue,,0,1,0,0,0,1
3,red,white,orange,1,1,0,0,1,0
4,red,blue,,1,0,0,0,0,1
5,white,blue,,0,1,0,0,0,1
6,red,blue,,1,0,0,0,0,1
7,red,white,blue,1,1,0,0,0,1
8,green,,,0,0,1,0,0,0
9,red,white,blue,1,1,0,0,0,1


In [None]:
myphonedata.describe()

In [3]:
# Deleting the unwanted columns
myphonedata_x = myphonedata.drop(['V1','V2','V3'],axis=1)
myphonedata_x

Unnamed: 0,red,white,green,yellow,orange,blue
0,1,1,1,0,0,0
1,0,1,0,0,1,0
2,0,1,0,0,0,1
3,1,1,0,0,1,0
4,1,0,0,0,0,1
5,0,1,0,0,0,1
6,1,0,0,0,0,1
7,1,1,0,0,0,1
8,0,0,1,0,0,0
9,1,1,0,0,0,1


In [None]:
# Find the frequent itemsets (up to 4 items each) whose support is at least min_support
myphonedata_y = apriori(myphonedata_x, min_support = 0.0075, max_len = 4, use_colnames = True)
myphonedata_y

In [None]:
# Most Frequent item sets based on support 
myphonedata_y.sort_values('support', ascending = False, inplace = True)

In [None]:
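# Recent matplotlib versions reject a color string such as 'rgmyk', so pass a list of colors instead
plt.bar(x = list(range(0, 9)), height = myphonedata_y.support[0:9], color = list('rgmyk'))
# Turn each frozenset into a readable tick label, e.g. "blue, red"
labels = [', '.join(sorted(s)) for s in myphonedata_y.itemsets[0:9]]
plt.xticks(list(range(0, 9)), labels, rotation = 20)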
plt.xlabel('item-sets')
plt.ylabel('support')
plt.show()

In [None]:
# To filter rules by a different metric of interest, adjust the metric
# and min_threshold arguments.
# E.g. to keep only the rules with a lift score >= 1:
rules = association_rules(myphonedata_y, metric = "lift", min_threshold = 1)
rules.head(20)
rules.sort_values('lift', ascending = False).head(10)

In [None]:
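################################# Extra part ###################################
# Remove redundant rules: A -> B and B -> A are built from the same itemset,
# so only one rule per itemset is kept.
def to_list(i):
    return sorted(i) # sorted() already returns a list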

In [None]:
# Combine the antecedent and consequent items of each rule into a single list
ma_X = rules.antecedents.apply(to_list) + rules.consequents.apply(to_list)
ma_X

In [None]:
ma_X = ma_X.apply(sorted)
ma_X

In [None]:
rules_sets = list(ma_X)
rules_sets

In [None]:
unique_rules_sets = [list(m) for m in set(tuple(i) for i in rules_sets)]
unique_rules_sets

In [None]:
index_rules = []

for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))

In [None]:
# Getting the rules without any redundancy (first rule found for each itemset)
rules_no_redudancy = rules.iloc[index_rules, :]
rules_no_redudancy
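
A shorter route to the same kind of de-duplication is sketched below, assuming only that `antecedents` and `consequents` hold frozensets, as they do in the output of `association_rules`. The small `rules` frame here is a hypothetical stand-in with placeholder items and lift values, not the result for myphonedata.csv.

```python
import pandas as pd

# Hypothetical stand-in for the `rules` frame; items and lift values are placeholders.
rules = pd.DataFrame({
    "antecedents": [frozenset({"a"}), frozenset({"b"}), frozenset({"a"}), frozenset({"c"})],
    "consequents": [frozenset({"b"}), frozenset({"a"}), frozenset({"c"}), frozenset({"a"})],
    "lift": [1.2, 1.2, 1.1, 1.1],
})

# The union of antecedent and consequent identifies the itemset behind each rule,
# so A -> B and B -> A map to the same key.
itemset_key = pd.Series(
    [a | c for a, c in zip(rules["antecedents"], rules["consequents"])],
    index=rules.index,
)

# Keep only the first rule seen for each itemset.
rules_unique = rules[~itemset_key.duplicated()]
print(rules_unique)
```

Because frozensets are hashable and compare by content, no sorting or tuple conversion is needed to detect rules that come from the same itemset.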

In [None]:
# Sorting them with respect to lift and getting the top 10 rules
rules_no_redudancy.sort_values('lift', ascending = False).head(10)

In [None]:
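# Save the de-duplicated rules to a CSV file in the current working directory
rules_no_redudancy.to_csv("myphonedataArules.csv", encoding = "utf-8")

# Show the directory the file was written to
import os
os.getcwd()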
