# Association Rule Learning
Association rule learning is a rule-based method for discovering relationships between variables in large datasets. In retail point-of-sale (POS) transaction analytics, the variables are the retail products. The method discovers strong associations (rules) between products, where the strength of a rule is measured by several metrics.
Several algorithms have been developed for association rule mining, and Apriori is one of them.

****Theory of Apriori Algorithm****

The Apriori algorithm has three major components:

* Support
* Confidence
* Lift
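
As a rough illustration of these three metrics, the sketch below computes them for a single rule (bread -> milk) on a made-up list of baskets. The basket contents and the helper name `support` are assumptions for this example and are not part of the retail dataset.

```python
# Toy baskets (hypothetical data, not from the retail dataset)
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


# Rule: bread -> milk
support_x = support({"bread"}, baskets)           # P(X)
support_y = support({"milk"}, baskets)            # P(Y)
support_xy = support({"bread", "milk"}, baskets)  # P(X and Y)

confidence = support_xy / support_x               # P(Y | X)
lift = confidence / support_y                     # P(Y | X) / P(Y)

print(f"support={support_xy:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift below 1, as in this toy case, suggests that buying bread makes milk slightly less likely than its baseline frequency; a lift above 1 suggests a positive association.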

In [None]:
!pip install openpyxl

In [None]:
!pip install xlrd

In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)
from mlxtend.frequent_patterns import apriori, association_rules

In [None]:
df_ = pd.read_excel("../input/online-retail-2/online_retail_II.xlsx", sheet_name="Year 2010-2011")
df=df_.copy()

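Lower and upper thresholds are computed for outliers. Note that the function uses the 1st and 99th percentiles rather than the usual quartiles, so only extreme values fall outside the limits.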

In [None]:
def outlier_thresholds(dataframe, variable):
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit

The following function replaces outliers with the corresponding threshold values.

In [None]:
def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit
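
To see the effect of these two functions, the same idea can be tried on a small made-up series. The sketch below repeats the 1%/99% threshold calculation and caps values with pandas' `Series.clip`, which is an alternative to the `.loc` assignments above rather than the notebook's own approach. The sample values are invented.

```python
import pandas as pd

# Invented quantities: 1..200 plus one extreme value
quantity = pd.Series(list(range(1, 201)) + [100_000])

# Same threshold logic as outlier_thresholds: 1st and 99th percentiles
q_low = quantity.quantile(0.01)
q_high = quantity.quantile(0.99)
spread = q_high - q_low
low_limit = q_low - 1.5 * spread
up_limit = q_high + 1.5 * spread

# clip() caps values that fall outside [low_limit, up_limit]
capped = quantity.clip(lower=low_limit, upper=up_limit)

print("limits:", low_limit, up_limit)
print("largest value before:", quantity.max(), "after:", capped.max())
```

Only the single extreme value is changed here; the ordinary values stay as they are because the limits are deliberately wide.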

****Data preprocessing****

The following function cleans the retail dataset.

In [None]:
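def retail_data_prep(dataframe):
    # Drop rows with missing values
    dataframe = dataframe.dropna()
    # Remove cancelled invoices, whose invoice number contains "C"
    dataframe = dataframe[~dataframe["Invoice"].astype(str).str.contains("C", na=False)]
    # Keep only rows with a positive quantity and price
    dataframe = dataframe[dataframe["Quantity"] > 0]
    # Copy so the in-place capping below does not act on a view
    dataframe = dataframe[dataframe["Price"] > 0].copy()
    # Cap extreme values at the outlier thresholds
    replace_with_thresholds(dataframe, "Quantity")
    replace_with_thresholds(dataframe, "Price")
    return dataframe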

In [None]:
df = retail_data_prep(df)

The dataset is filtered to transactions from Germany.

In [None]:
df_ger = df[df['Country'] == "Germany"]

****Target format of the data (binary encoding)****

In [None]:
# Description   NINE DRAWER OFFICE TIDY   SET 2 TEA TOWELS I LOVE LONDON    SPACEBOY BABY GIFT SET
# Invoice
# 536370                              0                                 1                       0
# 536852                              1                                 0                       1
# 536974                              0                                 0                       0
# 537065                              1                                 0                       0
# 537463                              0                                 0                       1


`fillna(0)` replaces NaN with 0 for products that do not appear on an invoice. The quantities are then converted to 1 or 0 to indicate whether a product is in the basket.

In [None]:
df_ger.groupby(["Invoice", "Description"]).agg({"Quantity":"sum"}).unstack().fillna(0).iloc[0:5,0:5]

In [None]:
# gt(0).astype(int) flags every positive quantity as 1 and everything else as 0
df_ger.groupby(["Invoice", "Description"]).agg({"Quantity":"sum"}).\
unstack().fillna(0).\
gt(0).astype(int).iloc[0:5,0:5]

In [None]:
def create_invoice_product_df(dataframe, id=False):
    # Columns are StockCodes when id=True, otherwise product descriptions
    column = "StockCode" if id else "Description"
    # Sum quantities per invoice and product, then flag presence as 1/0
    return dataframe.groupby(["Invoice", column])["Quantity"].sum().unstack().fillna(0). \
        gt(0).astype(int)

With `id=True`, the columns of the matrix are StockCodes rather than product descriptions.

In [None]:
ger_inv_pro_df = create_invoice_product_df(df_ger, id=True)

****Creation of association rules****

The `apriori` function computes the support (probability of occurrence) of every product combination that reaches the minimum support.

`min_support`: the minimum support threshold an itemset must reach to be kept.

`apriori` calculates support values only.

In [None]:
frequent_itemsets = apriori(ger_inv_pro_df, min_support=0.01, use_colnames=True)
frequent_itemsets.sort_values("support", ascending=False).head(10)
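
To give a feel for what a function like `apriori` does, here is a minimal pure-Python sketch of the level-wise search behind the algorithm: itemsets of size k are extended only if they are frequent, because any superset of an infrequent itemset must also be infrequent. This is a simplified illustration on invented baskets, not mlxtend's implementation, and the function name `toy_apriori` is hypothetical.

```python
from itertools import combinations


def toy_apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    current = [s for s in current if support(s) >= min_support]

    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        current_set = set(current)
        # Join step: combine frequent itemsets into candidates of size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current_set for sub in combinations(c, k - 1))
        }
        current = [c for c in candidates if support(c) >= min_support]
    return frequent


# Invented baskets for illustration
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
result = toy_apriori(baskets, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(sup, 2))
```

In this toy run the triple bread, milk, butter is never counted, because its subset milk, butter already falls below the threshold.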

All other metrics are calculated with `association_rules`.

In [None]:
rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.01)

In [None]:
rules.sort_values("support", ascending=False).head(10)

In [None]:
rules.sort_values("lift", ascending=False).head(10)

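* antecedent support: Probability that X appears in a basket on its own
* consequent support: Probability that Y appears in a basket on its own
* support: Probability that X and Y appear together
* confidence: Probability of purchasing Y when X is purchased
* lift: Factor by which the probability of purchasing Y changes when X is purchased. A value above 1 indicates a positive association.
* leverage: Difference between the observed support of X and Y together and the support expected if X and Y were independent. It is similar to lift, but lift is more commonly used.
* conviction: Ratio of the expected frequency of X occurring without Y, if X and Y were independent, to the observed frequency of X without Y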
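
The metrics listed above can all be derived from three support values. The sketch below does this with the standard textbook definitions, which I assume match the columns produced by `association_rules`. The function name `rule_metrics` and the input numbers are made up for illustration.

```python
def rule_metrics(support_x, support_y, support_xy):
    """Derive metrics for the rule X -> Y from three support values."""
    confidence = support_xy / support_x
    lift = confidence / support_y
    # Observed joint support minus the support expected under independence
    leverage = support_xy - support_x * support_y
    # Conviction is infinite when the rule always holds (confidence == 1)
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_y) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Hypothetical supports: X in 40% of baskets, Y in 50%, both in 30%
metrics = rule_metrics(support_x=0.4, support_y=0.5, support_xy=0.3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these inputs, lift is above 1 and leverage is above 0, so both point to the same positive association; they differ in scale, since lift is a ratio and leverage is a difference.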

**Names of the products with the given IDs**

User 1 product ID: 21987

User 2 product ID: 23235

User 3 product ID: 22747


In [None]:
def check_id(dataframe, stock_code):
    product_name = dataframe[dataframe["StockCode"] == stock_code][["Description"]].values[0].tolist()
    print(product_name)

In [None]:
check_id(df_ger, 21987)

In [None]:
check_id(df_ger, 23235)

In [None]:
check_id(df_ger, 22747)

**Recommending products to users who are adding items to their basket**

The function below looks up, in the rule table, the product IDs that can be recommended for a given product ID.

In [None]:
def arl_recommender(rules_df, product_id, rec_count=1):
    # Rank rules by lift so the strongest associations come first
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    recommendation_list = []
    for i, product in enumerate(sorted_rules["antecedents"]):
        for j in list(product):
            if j == product_id:
                # Take the first item of the matching rule's consequents
                recommendation_list.append(list(sorted_rules.iloc[i]["consequents"])[0])

    return recommendation_list[0:rec_count]
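
Because `arl_recommender` takes the first consequent of every matching rule, the same product can appear more than once in the returned list. A possible variant, sketched below under the hypothetical name `arl_recommender_unique`, collects every consequent item and skips duplicates. The small `toy_rules` table is invented and only mimics the `antecedents`, `consequents` and `lift` columns of the rule table.

```python
import pandas as pd


def arl_recommender_unique(rules_df, product_id, rec_count=1):
    """Recommend up to rec_count distinct products for product_id."""
    # Strongest rules (highest lift) are considered first
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    recommendations = []
    for antecedents, consequents in zip(sorted_rules["antecedents"],
                                        sorted_rules["consequents"]):
        if product_id not in antecedents:
            continue
        for item in consequents:
            # Keep each recommended product only once
            if item not in recommendations:
                recommendations.append(item)
    return recommendations[:rec_count]


# Invented rule table with frozenset columns, as in the rule table above
toy_rules = pd.DataFrame({
    "antecedents": [frozenset([1]), frozenset([1, 2]), frozenset([1]), frozenset([3])],
    "consequents": [frozenset([10]), frozenset([10]), frozenset([20]), frozenset([30])],
    "lift": [5.0, 4.0, 3.0, 6.0],
})

print(arl_recommender_unique(toy_rules, 1, rec_count=2))
```

Product 10 is the consequent of two rules involving product 1, yet it is returned only once, which leaves room for product 20 as the second recommendation.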

In [None]:
arl_recommender(rules, 21987, 2)

In [None]:
arl_recommender(rules, 23235, 2)

In [None]:
arl_recommender(rules, 22747, 2)

**Names of recommended products**

Recommendations for product ID 21987

In [None]:
check_id(df_ger, 21988)
check_id(df_ger, 21086)

Recommendations for product ID 23235

In [None]:
check_id(df_ger, 23243)

Recommendations for product ID 22747

In [None]:
check_id(df_ger, 22746)
check_id(df_ger, 22745)