## **ASSOCIATION RULE LEARNING**

 1. Data Preprocessing
 2. Preparing the ARL Data Structure (Invoice-Product Matrix)
 3. Extracting the Association Rules
 4. Preparing the Script for the Workflow
 5. Recommending Products to Users at the Basket Stage
 
Source: https://archive.ics.uci.edu/ml/datasets/Online+Retail+II

#### **1. Data Preprocessing**

In [None]:
 !pip install mlxtend


In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
# keeps the output on a single line instead of wrapping it.
pd.set_option('display.expand_frame_repr', False)
from mlxtend.frequent_patterns import apriori, association_rules

In [None]:
df_ = pd.read_excel("/content/drive/MyDrive/DSMLBC10/week_7 (10.11.22-16.11.22)/datasets/online_retail_II.xlsx",
                    sheet_name="Year 2010-2011")

In [None]:
df = df_.copy()
df.head()

Unnamed: 0,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


In [None]:
print(df.describe().T)
print(df.isnull().sum())
print(df.shape)

                count          mean          std       min       25%       50%       75%      max
Quantity     541910.0      9.552234   218.080957 -80995.00      1.00      3.00     10.00  80995.0
Price        541910.0      4.611138    96.759765 -11062.06      1.25      2.08      4.13  38970.0
Customer ID  406830.0  15287.684160  1713.603074  12346.00  13953.00  15152.00  16791.00  18287.0
Invoice             0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
Price               0
Customer ID    135080
Country             0
dtype: int64
(541910, 8)


In [None]:
def outlier_thresholds(dataframe, variable):
    # Despite the names, these are the 1st and 99th percentiles, not quartiles,
    # so only very extreme values fall outside the limits.
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit

def replace_with_thresholds(dataframe, variable):
    # Cap values outside the limits in place instead of dropping the rows.
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit
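
The thresholds above are built from the 1st and 99th percentiles, so only extreme values get capped. As a rough illustration on made-up numbers (the series `s` is hypothetical and not taken from the dataset), the same limits can be computed and applied with `Series.clip`, which is swapped in here as an alternative to the `.loc` assignments:

```python
import pandas as pd

# Hypothetical toy data: the values 1..100 plus one extreme outlier.
s = pd.Series([float(v) for v in range(1, 101)] + [10000.0])

# Same rule as outlier_thresholds: 1st/99th percentiles widened by 1.5x their range.
q_low, q_high = s.quantile(0.01), s.quantile(0.99)
spread = q_high - q_low
low_limit = q_low - 1.5 * spread
up_limit = q_high + 1.5 * spread

# clip() caps values outside [low_limit, up_limit]; values inside stay unchanged.
capped = s.clip(lower=low_limit, upper=up_limit)

print(low_limit, up_limit)    # limits for the toy series
print(s.max(), capped.max())  # the outlier is pulled down to up_limit
```

Only the single outlier changes; the hundred ordinary values are left as they were.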

In [None]:
def retail_data_prep(dataframe):
    dataframe.dropna(inplace=True)  # drops rows with missing values in the caller's frame
    # Invoices containing "C" are cancellations; exclude them.
    dataframe = dataframe[~dataframe["Invoice"].str.contains("C", na=False)]
    dataframe = dataframe[dataframe["Quantity"] > 0]
    dataframe = dataframe[dataframe["Price"] > 0]
    # Cap extreme values at the thresholds computed above.
    replace_with_thresholds(dataframe, "Quantity")
    replace_with_thresholds(dataframe, "Price")
    return dataframe

In [None]:
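# Note: the result is only displayed here, not assigned back to df. Apart from the
# in-place dropna, df itself is unchanged, so the filters and capping do not persist.
retail_data_prep(df)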

Unnamed: 0,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6.0,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6.0,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8.0,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6.0,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6.0,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
...,...,...,...,...,...,...,...,...
541905,581587,22899,CHILDREN'S APRON DOLLY GIRL,6.0,2011-12-09 12:50:00,2.10,12680.0,France
541906,581587,23254,CHILDRENS CUTLERY DOLLY GIRL,4.0,2011-12-09 12:50:00,4.15,12680.0,France
541907,581587,23255,CHILDRENS CUTLERY CIRCUS PARADE,4.0,2011-12-09 12:50:00,4.15,12680.0,France
541908,581587,22138,BAKING SET 9 PIECE RETROSPOT,3.0,2011-12-09 12:50:00,4.95,12680.0,France


In [None]:
print(df.isnull().sum())
print(df.describe().T)

Invoice        0
StockCode      0
Description    0
Quantity       0
InvoiceDate    0
Price          0
Customer ID    0
Country        0
dtype: int64
                count          mean          std      min       25%       50%       75%      max
Quantity     406830.0     12.061276   248.693065 -80995.0      2.00      5.00     12.00  80995.0
Price        406830.0      3.460507    69.315080      0.0      1.25      1.95      3.75  38970.0
Customer ID  406830.0  15287.684160  1713.603074  12346.0  13953.00  15152.00  16791.00  18287.0
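
The summary above still shows a negative minimum `Quantity` even though `retail_data_prep` filters for `Quantity > 0`. The likely reason is that boolean filtering returns a new DataFrame, so the result has to be assigned to take effect; only `dropna(inplace=True)` changes the original frame. A small sketch on a hypothetical toy frame (`toy` and `prep` are illustrative names, not part of the notebook):

```python
import pandas as pd

def prep(dataframe):
    dataframe.dropna(inplace=True)                    # mutates the caller's frame
    dataframe = dataframe[dataframe["Quantity"] > 0]  # builds a new frame
    return dataframe

toy = pd.DataFrame({"Quantity": [5, -1, 3, 2],
                    "Customer ID": [1.0, 2.0, None, 4.0]})

prep(toy)                         # return value discarded
print(toy["Quantity"].min())      # the negative row is still there; only the NaN row is gone

cleaned = prep(toy)               # keep the returned frame
print(cleaned["Quantity"].min())  # the filter now takes effect
```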


#### **2. Preparing the ARL Data Structure (Invoice-Product Matrix)**

In [None]:
df_fr = df[df['Country'] == "France"]

df_fr.groupby(['Invoice', 'Description']).agg({"Quantity": "sum"}).head(20)

Invoice,Description,Quantity
536370,SET 2 TEA TOWELS I LOVE LONDON,24
536370,ALARM CLOCK BAKELIKE GREEN,12
536370,ALARM CLOCK BAKELIKE PINK,24
536370,ALARM CLOCK BAKELIKE RED,24
536370,CHARLOTTE BAG DOLLY GIRL DESIGN,20
536370,CIRCUS PARADE LUNCH BOX,24
536370,INFLATABLE POLITICAL GLOBE,48
536370,LUNCH BOX I LOVE LONDON,24
536370,MINI JIGSAW CIRCUS PARADE,24
536370,MINI JIGSAW SPACEBOY,24


In [None]:
df_fr.groupby(['Invoice', 'Description']).agg({"Quantity": "sum"}).unstack().iloc[0:5, 0:5]

Invoice,50'S CHRISTMAS GIFT BAG LARGE,DOLLY GIRL BEAKER,I LOVE LONDON MINI BACKPACK,NINE DRAWER OFFICE TIDY,SET 2 TEA TOWELS I LOVE LONDON
536370,,,,,24.0
536852,,,,,
536974,,,,,
537065,,,,,
537463,,,,,


In [None]:

df_fr.groupby(['Invoice', 'Description']).agg({"Quantity": "sum"}).unstack().fillna(0).iloc[0:5, 0:5]

Invoice,50'S CHRISTMAS GIFT BAG LARGE,DOLLY GIRL BEAKER,I LOVE LONDON MINI BACKPACK,NINE DRAWER OFFICE TIDY,SET 2 TEA TOWELS I LOVE LONDON
536370,0.0,0.0,0.0,0.0,24.0
536852,0.0,0.0,0.0,0.0,0.0
536974,0.0,0.0,0.0,0.0,0.0
537065,0.0,0.0,0.0,0.0,0.0
537463,0.0,0.0,0.0,0.0,0.0


In [None]:
df_fr.groupby(['Invoice', 'StockCode']). \
    agg({"Quantity": "sum"}). \
    unstack(). \
    fillna(0). \
    applymap(lambda x: 1 if x > 0 else 0).iloc[0:5, 0:5]  # applymap visits every cell of the frame.

Invoice,10002,10120,10125,10135,11001
536370,1,0,0,0,0
536852,0,0,0,0,0
536974,0,0,0,0,0
537065,0,0,0,0,0
537463,0,0,0,0,0


In [None]:
def create_invoice_product_df(dataframe, id=False):
    if id:
        return dataframe.groupby(['Invoice', "StockCode"])['Quantity'].sum().unstack().fillna(0). \
            applymap(lambda x: 1 if x > 0 else 0)
    else:
        return dataframe.groupby(['Invoice', 'Description'])['Quantity'].sum().unstack().fillna(0). \
            applymap(lambda x: 1 if x > 0 else 0)

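# Columns are product descriptions.
fr_inv_pro_df = create_invoice_product_df(df_fr)

# Columns are stock codes; this version overwrites the one above and is used from here on.
fr_inv_pro_df = create_invoice_product_df(df_fr, id=True)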
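
To make the target structure concrete, here is a minimal sketch on a hypothetical three-invoice dataset (`toy` and its values are made up). It uses `.gt(0).astype(int)` in place of `applymap`, which yields the same 0/1 flags and sidesteps the deprecation of `DataFrame.applymap` in pandas 2.1 and later:

```python
import pandas as pd

# Hypothetical toy invoices, not taken from the dataset.
toy = pd.DataFrame({
    "Invoice":   [1, 1, 2, 2, 3],
    "StockCode": ["A", "B", "A", "C", "B"],
    "Quantity":  [2, 1, 5, 1, 3],
})

basket = (toy.groupby(["Invoice", "StockCode"])["Quantity"].sum()
             .unstack(fill_value=0)  # invoices as rows, products as columns
             .gt(0)                  # True where the product appears on the invoice
             .astype(int))           # 1/0 flags

print(basket)
```

Each row is one invoice and each column one product, which is the shape `apriori` expects.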

In [None]:
def check_id(dataframe, stock_code):
    product_name = dataframe[dataframe["StockCode"] == stock_code][["Description"]].values[0].tolist()
    print(product_name)

In [None]:
check_id(df_fr, 10002)

['INFLATABLE POLITICAL GLOBE ']
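
`check_id` raises an `IndexError` when the code is absent, and it relies on the argument having the same type as the value stored in the column (integers such as `10002` sit alongside strings such as `85123A`). A hedged variant, using the hypothetical name `lookup_description` and a made-up two-row frame, compares codes as strings and returns `None` when nothing matches:

```python
import pandas as pd

def lookup_description(dataframe, stock_code):
    # Compare as strings so that 10002 and "10002" both match.
    mask = dataframe["StockCode"].astype(str) == str(stock_code)
    matches = dataframe.loc[mask, "Description"]
    # Return the first description without trailing spaces, or None if absent.
    return matches.iloc[0].strip() if not matches.empty else None

# Hypothetical toy frame reusing two product names from the output above.
toy = pd.DataFrame({
    "StockCode": [10002, "85123A"],
    "Description": ["INFLATABLE POLITICAL GLOBE ",
                    "WHITE HANGING HEART T-LIGHT HOLDER"],
})

print(lookup_description(toy, "10002"))
print(lookup_description(toy, "99999"))
```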


#### **3. Extracting the Association Rules**


In [None]:
frequent_itemsets = apriori(fr_inv_pro_df,
                            min_support=0.01,
                            use_colnames=True)

frequent_itemsets.sort_values("support", ascending=False)

Unnamed: 0,support,itemsets
440,0.657205,(POST)
324,0.159389,(23084)
93,0.152838,(21731)
203,0.146288,(22554)
205,0.144105,(22556)
...,...,...
6448,0.010917,"(23290, POST, 22382)"
6447,0.010917,"(23290, 23291, 22382)"
6446,0.010917,"(23289, 23291, 22382)"
6445,0.010917,"(23289, 23290, 22382)"
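
The `support` column is the fraction of invoices that contain every item of the itemset, and `min_support=0.01` keeps itemsets that appear in at least 1% of invoices. A brute-force sketch on four made-up baskets shows the calculation; it enumerates every combination and does not reproduce the candidate pruning that Apriori itself performs:

```python
from itertools import combinations

# Hypothetical baskets, each a set of product codes.
baskets = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}]

def support(itemset, baskets):
    # Share of baskets that contain every item of the itemset.
    return sum(itemset <= b for b in baskets) / len(baskets)

items = sorted(set().union(*baskets))
min_support = 0.5
frequent = {}
for size in (1, 2):
    for combo in combinations(items, size):
        s = support(set(combo), baskets)
        if s >= min_support:
            frequent[combo] = s

print(frequent)
```

The pair `("B", "C")` occurs in only one of four baskets, so it falls below the threshold and is dropped.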


In [None]:
rules = association_rules(frequent_itemsets,
                          metric="support",
                          min_threshold=0.01)

In [None]:
rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]  # rules are usually filtered with combined conditions like these.

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
646,(22382),(20726),0.102620,0.100437,0.054585,0.531915,5.296022,0.044278,1.921794
647,(20726),(22382),0.100437,0.102620,0.054585,0.543478,5.296022,0.044278,1.965689
1048,(21080),(21086),0.113537,0.117904,0.087336,0.769231,6.524217,0.073950,3.822416
1049,(21086),(21080),0.117904,0.113537,0.087336,0.740741,6.524217,0.073950,3.419214
1050,(21080),(21094),0.113537,0.109170,0.087336,0.769231,7.046154,0.074941,3.860262
...,...,...,...,...,...,...,...,...,...
76250,"(POST, 22727)","(22728, 22726)",0.076419,0.063319,0.050218,0.657143,10.378325,0.045380,2.731987
76251,"(22726, 22727)","(22728, POST)",0.067686,0.078603,0.050218,0.741935,9.439068,0.044898,3.570415
76252,(22728),"(POST, 22726, 22727)",0.087336,0.063319,0.050218,0.575000,9.081034,0.044688,2.203956
76254,(22726),"(22728, POST, 22727)",0.082969,0.058952,0.050218,0.605263,10.267057,0.045327,2.383988
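
The metric columns follow from the three support values. As a check, the sketch below recomputes them for the rule (21080) → (21086) from the table above; `rule_metrics` is a hypothetical helper name, and small differences from the table come from the rounded inputs:

```python
def rule_metrics(antecedent_support, consequent_support, support):
    # confidence: how often the consequent appears when the antecedent does
    confidence = support / antecedent_support
    # lift: confidence relative to the consequent's baseline frequency
    lift = confidence / consequent_support
    # leverage: observed joint support minus the support expected under independence
    leverage = support - antecedent_support * consequent_support
    # conviction: undefined (division by zero) when confidence equals 1
    conviction = (1 - consequent_support) / (1 - confidence)
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}

# Inputs copied from the row for (21080) -> (21086).
m = rule_metrics(0.113537, 0.117904, 0.087336)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

A lift above 1 means the two products appear together more often than they would if purchases were independent.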


In [None]:
check_id(df_fr, 21086)

['SET/6 RED SPOTTY PAPER CUPS']


In [None]:
rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]. \
sort_values("confidence", ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
12157,"(21080, 21094)",(21086),0.087336,0.117904,0.085153,0.975000,8.269444,0.074856,35.283843
12156,"(21080, 21086)",(21094),0.087336,0.109170,0.085153,0.975000,8.931000,0.075618,35.633188
44268,"(21080, POST, 21094)",(21086),0.072052,0.117904,0.069869,0.969697,8.224467,0.061374,29.109170
44266,"(21080, POST, 21086)",(21094),0.072052,0.109170,0.069869,0.969697,8.882424,0.062003,29.397380
1177,(21094),(21086),0.109170,0.117904,0.104803,0.960000,8.142222,0.091932,22.052402
...,...,...,...,...,...,...,...,...,...
647,(20726),(22382),0.100437,0.102620,0.054585,0.543478,5.296022,0.044278,1.965689
646,(22382),(20726),0.102620,0.100437,0.054585,0.531915,5.296022,0.044278,1.921794
26243,(22551),"(22554, 22556)",0.115721,0.087336,0.058952,0.509434,5.833019,0.048845,1.860430
26481,(22554),"(POST, 22551)",0.146288,0.098253,0.072052,0.492537,5.012935,0.057679,1.776971


#### **4. Preparing the Script for the Workflow**

In [None]:
def outlier_thresholds(dataframe, variable):
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit

def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit

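def retail_data_prep(dataframe):
    # Return a cleaned copy instead of mutating the caller's frame.
    dataframe = dataframe.dropna()
    # Invoices containing "C" are cancellations; cast to str because the column may mix integers and strings.
    dataframe = dataframe[~dataframe["Invoice"].astype(str).str.contains("C", na=False)]
    dataframe = dataframe[dataframe["Quantity"] > 0]
    dataframe = dataframe[dataframe["Price"] > 0].copy()  # explicit copy avoids SettingWithCopyWarning
    replace_with_thresholds(dataframe, "Quantity")
    replace_with_thresholds(dataframe, "Price")
    return dataframe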


def create_invoice_product_df(dataframe, id=False):
    # Columns are stock codes when id=True, otherwise product descriptions.
    column = "StockCode" if id else "Description"
    return dataframe.groupby(['Invoice', column])['Quantity'].sum().unstack().fillna(0). \
        applymap(lambda x: 1 if x > 0 else 0)


def check_id(dataframe, stock_code):
    product_name = dataframe[dataframe["StockCode"] == stock_code][["Description"]].values[0].tolist()
    print(product_name)


def create_rules(dataframe, id=True, country="France"):
    dataframe = dataframe[dataframe['Country'] == country]
    dataframe = create_invoice_product_df(dataframe, id)
    frequent_itemsets = apriori(dataframe, min_support=0.01, use_colnames=True)
    rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.01)
    return rules

df = df_.copy()

df = retail_data_prep(df)
rules = create_rules(df)

rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]. \
sort_values("confidence", ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
23707,"(21080, 21094)",(21086),0.102828,0.138817,0.100257,0.975000,7.023611,0.085983,34.447301
23706,"(21080, 21086)",(21094),0.102828,0.128535,0.100257,0.975000,7.585500,0.087040,34.858612
108820,"(21080, POST, 21086)",(21094),0.084833,0.128535,0.082262,0.969697,7.544242,0.071358,28.758355
108822,"(21080, POST, 21094)",(21086),0.084833,0.138817,0.082262,0.969697,6.985410,0.070486,28.419023
1777,(21094),(21086),0.128535,0.138817,0.123393,0.960000,6.915556,0.105550,21.529563
...,...,...,...,...,...,...,...,...,...
7212,(22629),(22630),0.125964,0.100257,0.071979,0.571429,5.699634,0.059351,2.099400
62249,(22630),"(POST, 22629)",0.100257,0.100257,0.053985,0.538462,5.370809,0.043933,1.949443
62244,"(POST, 22629)",(22630),0.100257,0.100257,0.053985,0.538462,5.370809,0.043933,1.949443
62248,(22629),"(POST, 22630)",0.125964,0.074550,0.053985,0.428571,5.748768,0.044594,1.619537


#### **5. Recommending Products to Users at the Basket Stage**

Example: the product id in the user's basket is 22492

In [None]:
product_id = 22492
check_id(df, product_id)

sorted_rules = rules.sort_values("lift", ascending=False)

recommendation_list = []

for i, product in enumerate(sorted_rules["antecedents"]):
    for j in list(product):
        if j == product_id:
            recommendation_list.append(list(sorted_rules.iloc[i]["consequents"])[0])

recommendation_list[0:3]

['MINI PAINT SET VINTAGE ']


[22556, 22551, 22326]

In [None]:
check_id(df, 22326)

def arl_recommender(rules_df, product_id, rec_count=1):
    # Rank rules by lift so the strongest associations come first.
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    recommendation_list = []
    for i, product in enumerate(sorted_rules["antecedents"]):
        for j in list(product):
            if j == product_id:
                # Take the first item of the matching rule's consequents.
                recommendation_list.append(list(sorted_rules.iloc[i]["consequents"])[0])

    return recommendation_list[0:rec_count]


arl_recommender(rules, 22492, 1)

['ROUND SNACK BOXES SET OF4 WOODLAND ']


[22556]

In [None]:
arl_recommender(rules, 22492, 2)

[22556, 22551]

In [None]:
arl_recommender(rules, 22492, 3)

[22556, 22551, 22326]
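
`arl_recommender` can return the same product more than once when several rules share a consequent, and it reads only the first item of each consequent set. One possible variant, sketched on made-up rules held as plain tuples (`recommend_unique` and the toy data are hypothetical), walks the rules in descending lift order, collects every consequent item, and skips duplicates:

```python
def recommend_unique(rules, product_id, rec_count=1):
    # rules: iterable of (antecedents, consequents, lift), item sets as frozensets.
    ranked = sorted(rules, key=lambda r: r[2], reverse=True)
    recommendations = []
    for antecedents, consequents, _lift in ranked:
        if product_id not in antecedents:
            continue
        # sorted() gives a deterministic order within one consequent set.
        for item in sorted(consequents):
            if item != product_id and item not in recommendations:
                recommendations.append(item)
    return recommendations[:rec_count]

# Hypothetical rules: (antecedents, consequents, lift).
toy_rules = [
    (frozenset({1}), frozenset({2}), 9.0),
    (frozenset({1, 3}), frozenset({2}), 8.0),
    (frozenset({1}), frozenset({4, 5}), 7.0),
    (frozenset({6}), frozenset({1}), 6.0),
]

print(recommend_unique(toy_rules, 1, rec_count=3))
```

With the mlxtend output, the tuples could be taken from the `antecedents`, `consequents`, and `lift` columns of `rules`.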