# Clustering Case Study 2: Apply Association Rules to the customer segments from Case Study 1 to create a recommendation engine 

## Overview of Association Rules and the Apriori algorithm behind them

Association Rules uncover which items in a dataset tend to occur together. Within the context of our ecommerce dataset, if customers normally purchase 

KDNuggets gives a quick overview [here](https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html). For a more mathematical treatment, see [pg 497 of ESL by Hastie, Tibshirani, and Friedman](https://web.stanford.edu/~hastie/Papers/ESLII.pdf).

Association Rules are particularly useful for stock transaction data and provide a good starting point for building recommendation engines.
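
To make the three standard rule measures concrete, here is a minimal sketch on five made-up baskets. The items and the `support` and `rule_metrics` helpers are illustrative assumptions, not part of the ecommerce dataset or of any library. Support is the share of transactions that contain an itemset, confidence of A → B is support(A and B) divided by support(A), and lift is that confidence divided by support(B).

```python
# Toy baskets, invented purely for illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return {"support": both, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"bread"}, {"butter"}, transactions)
print(metrics)
```

For bread → butter on these baskets, support is 0.6, confidence is about 0.75 and lift is about 0.94. A lift just under 1 says the two items co-occur slightly less often than they would if purchases were independent.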

## Implementing Association Rules on ecommerce data 

1. Read in the cleaned dataset you saved in Case Study 1
2. This dataset is not ready for Association Rules yet. Therefore, reshape the data so that each row is an invoice number and each column is a product
![alt text](stockcode.png)

In [None]:
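# Your code here 
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Count each (invoice, product) pair and pivot products into columns
stock_transaction = ecommerce.groupby(['InvoiceNo', 'StockCode']).size().unstack().fillna(0)
# Binarize: 1 if the product appears on the invoice, else 0
stock_transaction = (stock_transaction > 0).astype(int)
stock_transaction.head()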

In [None]:
temp = ecommerce.groupby(['StockCode', 'Description']).size()
stock_map = pd.DataFrame({'code':np.array(temp.index.get_level_values(0)), 'desc':np.array(temp.index.get_level_values(1))})

In [None]:
# If the two counts differ, some stock codes map to more than one description
print(len(stock_map['code']), len(np.unique(stock_map['code'])))


In [None]:
# Keep the first description for each stock code
stock_map1 = stock_map.groupby('code')['desc'].first()


In [None]:
# Rename columns from stock code to description; codes without a description keep their code
stock_transaction = stock_transaction.rename(columns=stock_map1.to_dict())


In [None]:
stock_transaction.head()


In [None]:
stock_freq_item = apriori(stock_transaction, min_support=0.02, use_colnames=True)
stock_freq_item

In [None]:
stock_assoc_rules = association_rules(stock_freq_item, metric="confidence", min_threshold=0.3)
stock_assoc_rules

3. Apply the Apriori algorithm to the dataset generated above to get the frequent itemsets. You may find the `mlxtend` library useful
4. Apply association rules to the frequent itemsets from step 3 to generate confidence, support and lift measures for the data
5. What happens when you change the `min_threshold` parameter?
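
One way to build intuition for the last question is to watch how many rules survive as the confidence cutoff rises. The sketch below does this in plain Python on made-up baskets. The `rules_at` helper is hypothetical and only mimics filtering on confidence with a minimum threshold; it is not the `mlxtend` implementation, and it only considers one-item → one-item rules.

```python
from itertools import permutations

# Toy baskets, invented purely for illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


items = sorted(set().union(*transactions))

# Confidence of every one-item -> one-item rule whose items co-occur at least once
confidences = {}
for a, b in permutations(items, 2):
    both = support({a, b})
    if both > 0:
        confidences[(a, b)] = both / support({a})


def rules_at(min_threshold):
    """Rules whose confidence is at least `min_threshold` (hypothetical helper)."""
    return sorted(rule for rule, conf in confidences.items() if conf >= min_threshold)


for threshold in (0.3, 0.6, 0.9):
    print(threshold, len(rules_at(threshold)), rules_at(threshold))
```

Raising the threshold can only shrink the rule set: a higher cutoff keeps fewer but more reliable rules, while a lower one admits many weak rules. The same trade-off applies when you vary `min_threshold` on the real data.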

### Creating tailored recommendations by applying Association Rules to the customer segments produced from Case Study 1

1. In the previous notebook, we fitted a Gaussian mixture model (GMM) that clustered customers into n segments. Apply association rules to each segment from your chosen model.
2. Do the results differ from segment to segment?
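
A simple way to answer the second question is to treat each rule as an (antecedents, consequents) pair and compare the sets of pairs between segments. The sketch below assumes rule tables with `antecedents` and `consequents` columns holding frozensets, which is the layout `mlxtend` rule tables use. The two small tables, the product names and the `rule_keys` helper are made up for illustration.

```python
import pandas as pd


def rule_keys(rules):
    """Set of (antecedents, consequents) pairs in a rules table (hypothetical helper)."""
    return set(zip(rules["antecedents"], rules["consequents"]))


# Made-up rule tables standing in for the output of two segments
segment_a = pd.DataFrame({
    "antecedents": [frozenset({"tea cup"}), frozenset({"candle"})],
    "consequents": [frozenset({"saucer"}), frozenset({"candle holder"})],
})
segment_b = pd.DataFrame({
    "antecedents": [frozenset({"tea cup"}), frozenset({"gift bag"})],
    "consequents": [frozenset({"saucer"}), frozenset({"gift tag"})],
})

shared = rule_keys(segment_a) & rule_keys(segment_b)  # rules found in both segments
only_a = rule_keys(segment_a) - rule_keys(segment_b)  # rules specific to segment A
only_b = rule_keys(segment_b) - rule_keys(segment_a)  # rules specific to segment B

print("shared:", shared)
print("only A:", only_a)
print("only B:", only_b)
```

Rules that appear in only one segment are the natural candidates for tailored recommendations, while shared rules reflect buying patterns common to all customers.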

In [None]:
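# Your code here 
# Read in the customer segment output saved in Case Study 1 (the file path is not given here)
customer_segment = ...  # replace ... with your own read call
customer_segment.head()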


In [None]:
# Loop over the segment labels actually present so that no segment is skipped
for p in sorted(customer_segment['Segment'].unique()):
    selected_customer = customer_segment['CustomerID'][customer_segment['Segment']==p]
    selected_customer = list(selected_customer.astype('str'))
    ecommerce_filtered = ecommerce[ecommerce['CustomerID'].isin(selected_customer)]
    stock_transaction_filtered = ecommerce_filtered.groupby(['InvoiceNo', 'StockCode']).size().unstack().fillna(0)
    # Binarize: 1 if the product appears on the invoice, else 0
    stock_transaction_filtered = (stock_transaction_filtered > 0).astype(int)
    stock_transaction_filtered = stock_transaction_filtered.rename(columns=stock_map1.to_dict())
    stock_freq_item_filtered = apriori(stock_transaction_filtered, min_support=0.02, use_colnames=True)
    stock_assoc_rules_filtered = association_rules(stock_freq_item_filtered, metric="confidence", min_threshold=0.3)
    # Show the ten strongest rules for this segment
    print("Segment", p, ":\n", stock_assoc_rules_filtered.sort_values(['confidence', 'support'], ascending=False).head(10))