# Clustering Case Study 2: Apply Association Rules to the customer segments from Case Study 1 to create a recommendation engine 

## Overview of Association Rules and the Apriori algorithm behind them

Association Rules uncovers which items in a dataset occur together. Within the context of our ecommerce dataset, if customers normally purchase 

KDNuggets gives a quick overview [here](https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html). For a more mathematical overview, see [pg 497 of ESL by Hastie, Tibshirani, and Friedman](https://web.stanford.edu/~hastie/Papers/ESLII.pdf).

Association Rules are particularly useful for stock transaction data and are a good starting point for building recommendation engines.
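
To make the three standard measures concrete, here is a minimal sketch that computes support, confidence, and lift by hand. The transactions and item names are hypothetical and are not taken from the ecommerce dataset; the helper names `support` and `rule_metrics` are made up for this illustration.

```python
# Toy transactions (hypothetical items, for illustration only).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
    {"butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(set(antecedent) | set(consequent), transactions)
    confidence = s_ac / s_a      # P(consequent | antecedent)
    lift = confidence / s_c      # > 1 means the items co-occur more often than chance
    return {"support": s_ac, "confidence": confidence, "lift": lift}

metrics = rule_metrics({"bread"}, {"butter"}, transactions)
print(metrics)
```

A lift above 1 suggests the antecedent and consequent appear together more often than they would if purchases were independent, which is why the rules later in this notebook are ranked by lift.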

In [1]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.frequent_patterns import association_rules



## Implementing Association Rules on ecommerce data 

1. Read in the cleaned dataset you saved in Case Study 1
2. This dataset is not ready for Association Rules yet. Therefore, reshape the data so that each row is an invoice number and each column is a product
![alt text](stockcode.png)
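
The reshaping in step 2 can be sketched on a tiny table first. The frame `toy` below is hypothetical; only the column names `InvoiceNo` and `StockCode` come from the dataset. The sketch converts counts to booleans, which is one reasonable encoding (the cells below cap the counts at 1 instead); as far as I know, recent mlxtend releases prefer boolean input.

```python
import pandas as pd

# Hypothetical line-item data: one row per product line on an invoice.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A1", "A2", "A3", "A3"],
    "StockCode": ["X", "X", "Y", "Y", "X", "Z"],
})

# Count how often each product appears on each invoice...
basket = pd.crosstab(toy.InvoiceNo, toy.StockCode)
# ...then reduce the counts to a purchased / not purchased flag.
basket = basket > 0
print(basket)
```

Each row is now one invoice and each column one product, which is the layout the Apriori step expects.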

In [2]:
df = pd.read_csv('data/clean_data.csv', encoding='ISO-8859-1')

In [3]:
df1 = pd.crosstab(df.InvoiceNo,df.StockCode)

In [4]:
df1[df1>1]=1

3. Apply the apriori algorithm to the dataset generated above to get the frequent itemsets. You may find the `mlxtend` library useful
4. Apply association rules to the frequent itemsets from step 3 to generate confidence, support and lift measures for the data
5. What happens when you change the `min_threshold` parameter?

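__Changing min_threshold__
- Increasing `min_threshold` decreases the number of rules returned, because only rules whose chosen metric (here, lift) meets the threshold are kept. The number of frequent itemsets is controlled separately by `min_support` in `apriori`.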
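
The effect of the threshold can be sketched without running Apriori at all. `association_rules` keeps the rules whose chosen metric is at least `min_threshold`, so the behaviour resembles the filter below. The table `toy_rules` and its lift values are hypothetical; only the column name `lift` matches what mlxtend produces.

```python
import pandas as pd

# Hypothetical rules table; the lift values are made up for illustration.
toy_rules = pd.DataFrame({
    "rule": ["r1", "r2", "r3", "r4", "r5"],
    "lift": [0.9, 1.1, 1.5, 3.0, 12.0],
})

# Count how many rules survive each candidate threshold.
counts = {}
for threshold in (1.0, 1.2, 2.0, 5.0):
    kept = toy_rules[toy_rules["lift"] >= threshold]
    counts[threshold] = len(kept)

print(counts)
```

Running the same loop over real `association_rules` output is a quick way to pick a threshold that leaves a manageable number of rules.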

In [5]:
frequent_itemsets=apriori(df1, min_support=0.01,use_colnames=True)
# x=[]
# for i in range(0,df1.values.shape[1]):
#    x.append(df1.values[i].sum()/df1.values.shape[1])
# dfx= pd.DataFrame(x,columns=['x'])
# sns.distplot(dfx.x,kde=False)

In [6]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)

In [7]:
rules.sort_values('lift',ascending=False).head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
1100,"(23170, 23171)",(23172),0.012342,0.012179,0.010059,0.814978,66.915513,0.009908,5.338936
1101,(23172),"(23170, 23171)",0.012179,0.012342,0.010059,0.825893,66.915513,0.009908,5.672701
1103,(23171),"(23172, 23170)",0.014572,0.010711,0.010059,0.690299,64.446549,0.009903,3.19433
1098,"(23172, 23170)",(23171),0.010711,0.014572,0.010059,0.939086,64.446549,0.009903,16.17745
539,(23171),(23172),0.014572,0.012179,0.010983,0.753731,61.886727,0.010806,4.011151
538,(23172),(23171),0.012179,0.014572,0.010983,0.901786,61.886727,0.010806,10.033453
1097,(22746),"(22748, 22745)",0.013702,0.01381,0.010113,0.738095,53.445069,0.009924,3.765451
1092,"(22748, 22745)",(22746),0.01381,0.013702,0.010113,0.732283,53.445069,0.009924,3.684115
545,(23175),(23174),0.01468,0.014463,0.011092,0.755556,52.24127,0.010879,4.031743
544,(23174),(23175),0.014463,0.01468,0.011092,0.766917,52.24127,0.010879,4.227339
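
As a sanity check on how the columns relate, the derived measures can be recomputed from the first row of the table above (rule `(23170, 23171)` -> `(23172)`). This is a sketch using the standard definitions of lift, leverage, and conviction; the inputs are the rounded values printed in the table, so the results should agree only up to rounding.

```python
# Values copied from the first row of the table above.
antecedent_support = 0.012342
consequent_support = 0.012179
support = 0.010059
confidence = 0.814978

# Standard definitions of the derived measures.
lift = confidence / consequent_support
leverage = support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(round(lift, 2), round(leverage, 6), round(conviction, 2))
```

A lift of this size means the three products appear on the same invoice far more often than independent purchasing would predict, even though each appears on only about 1% of invoices.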


### Creating tailored recommendations by applying Association Rules to the customer segments produced from Case Study 1

1. In the previous notebook, we created a Gaussian mixture model (GMM) that clustered customers into n segments. Apply association rules to each segment from your chosen model.
2. Do the results differ between segments?

In [12]:
gmm_df= pd.read_csv('data/gmm_df.csv', sep=',',encoding='ISO-8859-1')

In [13]:
gmm_df.head()

Unnamed: 0,CustomerID,NoOfInvoices,NoOfUniqueItems,TotalQuantity,UnitPriceMean,UnitPriceStd,QuantityPerInvoice,UniqueItemsPerInvoice,Clusters
0,12347,0.028708,0.057111,0.013904,0.001217,0.0027,0.112347,0.066336,1
1,12348,0.014354,0.011758,0.013242,0.002752,0.016044,0.187463,0.024223,1
2,12349,0.0,0.040314,0.003565,0.003994,0.041939,0.202142,0.332724,1
3,12350,0.0,0.008959,0.001109,0.001806,0.011176,0.062889,0.076782,0
4,12352,0.033493,0.032475,0.003028,0.007753,0.064302,0.021177,0.032793,1


In [14]:
gmm_df_0= gmm_df[gmm_df['Clusters']==0][['CustomerID']]
gmm_df_1= gmm_df[gmm_df['Clusters']==1][['CustomerID']]

In [15]:
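def getRules(df_custid):
    """Mine association rules from the invoices of the customers in df_custid."""
    # Keep only the transactions of this segment's customers
    # (relies on the global df loaded above)
    df0=pd.merge(df,df_custid,how='inner',on='CustomerID')
    # One row per invoice, one column per product, with counts capped at 1
    df1=pd.crosstab(df0.InvoiceNo,df0.StockCode)
    df1[df1>1]=1
    frequent_itemsets=apriori(df1, min_support=0.01,use_colnames=True)
    rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
    # Strongest associations first
    return rules.sort_values('lift',ascending=False)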

In [17]:
gmm_0_rules=getRules(gmm_df_0)
gmm_1_rules=getRules(gmm_df_1)
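
One way to turn a segment's rules into recommendations is sketched below. The function `recommend` is a hypothetical helper, not part of mlxtend, and `toy_rules` is made-up data. The sketch assumes the rules table has `antecedents`, `consequents`, and `lift` columns holding sets, as mlxtend's output does.

```python
import pandas as pd

def recommend(rules, basket, top_n=3):
    """Suggest items from rules whose antecedents are all in `basket`."""
    basket = frozenset(basket)
    # A rule "fires" when every antecedent item is already in the basket.
    fires = rules["antecedents"].apply(lambda a: frozenset(a) <= basket).astype(bool)
    suggestions = []
    # Walk the firing rules from highest to lowest lift.
    for consequents in rules[fires].sort_values("lift", ascending=False)["consequents"]:
        for item in sorted(consequents, key=str):
            # Skip items the customer already has or that were already suggested.
            if item not in basket and item not in suggestions:
                suggestions.append(item)
    return suggestions[:top_n]

# Hypothetical rules, for illustration only.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"A", "B"}), frozenset({"C"})],
    "consequents": [frozenset({"B"}), frozenset({"D"}), frozenset({"E"})],
    "lift": [2.0, 5.0, 9.0],
})
print(recommend(toy_rules, {"A", "B"}))
```

With the real data, the call would presumably look like `recommend(gmm_0_rules, current_basket)` for a customer in cluster 0, where `current_basket` is a hypothetical set of stock codes.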

In [19]:
gmm_0_rules.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
153,(23254),(23256),0.013228,0.013971,0.010553,0.797753,57.098733,0.010368,4.875363
152,(23256),(23254),0.013971,0.013228,0.010553,0.755319,57.098733,0.010368,4.032893
167,(47590A),(47590B),0.01843,0.019917,0.014863,0.806452,40.491093,0.014496,5.063763
166,(47590B),(47590A),0.019917,0.01843,0.014863,0.746269,40.491093,0.014496,3.868539
74,(22144),(22142),0.018876,0.015012,0.010107,0.535433,35.667264,0.009824,2.120229
75,(22142),(22144),0.015012,0.018876,0.010107,0.673267,35.667264,0.009824,3.002833
117,(22578),(22579),0.023335,0.013228,0.010999,0.471338,35.631003,0.01069,1.866544
116,(22579),(22578),0.013228,0.023335,0.010999,0.831461,35.631003,0.01069,5.794877
212,"(22698, 22699)","(22423, 22697)",0.017985,0.017241,0.010999,0.61157,35.471074,0.010689,2.530081
209,"(22423, 22697)","(22698, 22699)",0.017241,0.017985,0.010999,0.637931,35.471074,0.010689,2.712233


In [20]:
gmm_1_rules.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
2570,(22916),"(22917, 22920)",0.011574,0.01106,0.010374,0.896296,81.04186,0.010246,9.53621
2567,"(22917, 22920)",(22916),0.01106,0.011574,0.010374,0.937984,81.04186,0.010246,15.938368
2576,(22918),"(22917, 22919)",0.01166,0.010888,0.010288,0.882353,81.037517,0.010161,8.40745
2573,"(22917, 22919)",(22918),0.010888,0.01166,0.010288,0.944882,81.037517,0.010161,17.931315
2569,(22917),"(22916, 22920)",0.012003,0.010717,0.010374,0.864286,80.648229,0.010245,7.289456
2568,"(22916, 22920)",(22917),0.010717,0.012003,0.010374,0.968,80.648229,0.010245,30.874914
2556,"(22918, 22916)",(22917),0.010631,0.012003,0.010288,0.967742,80.626728,0.01016,30.627915
2557,(22917),"(22918, 22916)",0.012003,0.010631,0.010288,0.857143,80.626728,0.01016,6.925583
2563,(22917),"(22916, 22919)",0.012003,0.01046,0.010117,0.842857,80.58267,0.009991,6.297076
2587,(22917),"(22920, 22919)",0.012003,0.01046,0.010117,0.842857,80.58267,0.009991,6.297076
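
The two top-10 tables above share no rules: cluster 0 is led by pairs such as 23254/23256 and 47590A/47590B, while cluster 1 is dominated by combinations of 22916 to 22920. To quantify the difference across the full rule sets rather than just the top 10, a comparison along the following lines could be used. `rule_keys` and `compare_segments` are hypothetical helpers, and the frames `seg_a` and `seg_b` are made-up examples.

```python
import pandas as pd

def rule_keys(rules):
    """Represent each rule as a hashable (antecedents, consequents) pair."""
    return set(zip(rules["antecedents"].map(frozenset),
                   rules["consequents"].map(frozenset)))

def compare_segments(rules_a, rules_b):
    """Count shared and segment-specific rules, plus their Jaccard overlap."""
    keys_a, keys_b = rule_keys(rules_a), rule_keys(rules_b)
    shared = keys_a & keys_b
    union = keys_a | keys_b
    return {
        "shared": len(shared),
        "only_a": len(keys_a - keys_b),
        "only_b": len(keys_b - keys_a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical rule sets for two segments.
seg_a = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"C"})],
    "consequents": [frozenset({"B"}), frozenset({"D"})],
})
seg_b = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"E"}), frozenset({"G"})],
    "consequents": [frozenset({"B"}), frozenset({"F"}), frozenset({"H"})],
})
overlap = compare_segments(seg_a, seg_b)
print(overlap)
```

A Jaccard value near 0 for `compare_segments(gmm_0_rules, gmm_1_rules)` would support using segment-specific rules instead of the global ones.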
