# Association Rule Learning Recommender

## Business Problem

Recommend products to users who are at the basket stage.

## Dataset Story

The Online Retail II dataset contains the sales of a UK-based online retail store between 01/12/2009 and 09/12/2011.

The company's product catalog consists of gift items, which can also be thought of as promotional products.

Most of its customers are known to be wholesalers.

## Project Tasks

The basket contents of 3 different users are given below.

Make the most suitable product recommendation for each basket.

**NOTE:** There may be one or more product recommendations.

**IMPORTANT NOTE:** Derive the decision rules from the 2010-2011 Germany customers.

- User 1 product id: 21987
- User 2 product id: 23235
- User 3 product id: 22747

# Libraries

In [181]:
!pip install mlxtend



In [182]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Loading the Data

In [183]:
df_ = pd.read_excel('online_retail_II.xlsx',sheet_name='Year 2010-2011')

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


In [None]:
df = df_.copy()
df.head(4)

## TASK 1: Data Preprocessing

In [184]:
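def outlier_thresholds(dataframe, variable):
    # IQR-style fences built on the 1st and 99th percentiles (not the usual
    # 25th/75th), so only very extreme values fall outside the limits.
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit

def replace_with_thresholds(dataframe, variable):
    # Cap values outside the limits in place instead of dropping the rows.
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit

def retail_data_prep(dataframe):
    # dropna() without inplace leaves the caller's frame untouched.
    dataframe = dataframe.dropna()
    # Invoices containing 'C' are cancellations; astype(str) guards against numeric invoice ids.
    dataframe = dataframe[~dataframe['Invoice'].astype(str).str.contains('C', na=False)]
    dataframe = dataframe[dataframe['Quantity'] > 0]
    dataframe = dataframe[dataframe['Price'] > 0]
    # Copy before the in-place capping to avoid SettingWithCopyWarning on a filtered slice.
    dataframe = dataframe.copy()
    replace_with_thresholds(dataframe, 'Quantity')
    replace_with_thresholds(dataframe, 'Price')
    return dataframe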

df = retail_data_prep(df)

df_ger = df[df['Country'] == 'Germany']
df_ger.head(2)

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
1109,536527,22809,SET OF 6 T-LIGHTS SANTA,6.0,2010-12-01 13:04:00,2.95,12662.0,Germany
1110,536527,84347,ROTATING SILVER ANGELS T-LIGHT HLDR,6.0,2010-12-01 13:04:00,2.55,12662.0,Germany


In [185]:
df_ger.shape

(9040, 8)

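The capping done by `replace_with_thresholds` is easier to follow on a small series. The sketch below is illustrative only: the series `s` is made-up data, and `Series.clip` is swapped in for the `.loc` assignments used in the notebook, which should give the same result for a numeric column.

```python
import pandas as pd

# Hypothetical data: the values 1..100 plus one extreme value.
s = pd.Series(list(range(1, 101)) + [10000], dtype="float64")

# Same fences as outlier_thresholds: 1st and 99th percentiles,
# widened by 1.5 times the distance between them.
q_low = s.quantile(0.01)
q_high = s.quantile(0.99)
spread = q_high - q_low
low_limit = q_low - 1.5 * spread
up_limit = q_high + 1.5 * spread

# clip() caps everything outside [low_limit, up_limit] and returns a new Series.
capped = s.clip(lower=low_limit, upper=up_limit)

print("limits:", low_limit, up_limit)
print("max before / after:", s.max(), capped.max())
```

Only the extreme value is pulled down to the upper limit; the remaining values are unchanged, which is why the row count stays the same after preprocessing.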
## TASK 2: Association Rule Learning

We need to reshape the data so that the index holds Invoice (the invoice number) and the columns hold product names. Which products appear in which transactions?

In [186]:
# Description   NINE DRAWER OFFICE TIDY   SET 2 TEA TOWELS I LOVE LONDON    SPACEBOY BABY GIFT SET
# Invoice
# 536370                              0                                 1                       0
# 536852                              1                                 0                       1
# 536974                              0                                 0                       0
# 537065                              1                                 0                       0
# 537463                              0                                 0                       1

In [187]:
df_ger.groupby(['Invoice','Description']).agg({'Quantity':'sum'}).head()

Invoice,Description,Quantity
536527,3 HOOK HANGER MAGIC GARDEN,12.0
536527,5 HOOK HANGER MAGIC TOADSTOOL,12.0
536527,5 HOOK HANGER RED MAGIC TOADSTOOL,12.0
536527,ASSORTED COLOUR LIZARD SUCTION HOOK,24.0
536527,CHILDREN'S CIRCUS PARADE MUG,12.0


In [188]:
df_ger.groupby(['Invoice','Description']). \
agg({'Quantity':'sum'}).unstack().fillna(0).iloc[0:5, 0:5]

Invoice,50'S CHRISTMAS GIFT BAG LARGE,DOLLY GIRL BEAKER,I LOVE LONDON MINI BACKPACK,RED SPOT GIFT BAG LARGE,SET 2 TEA TOWELS I LOVE LONDON
536527,0.0,0.0,0.0,0.0,0.0
536840,0.0,0.0,0.0,0.0,0.0
536861,0.0,0.0,0.0,0.0,0.0
536967,0.0,0.0,0.0,0.0,0.0
536983,0.0,0.0,0.0,0.0,0.0


In [189]:
df_ger.groupby(['Invoice','Description']). \
agg({'Quantity':'sum'}).unstack().fillna(0). \
gt(0).astype(int).iloc[0:5,0:5]  # gt(0).astype(int) replaces applymap (deprecated since pandas 2.1)

Invoice,50'S CHRISTMAS GIFT BAG LARGE,DOLLY GIRL BEAKER,I LOVE LONDON MINI BACKPACK,RED SPOT GIFT BAG LARGE,SET 2 TEA TOWELS I LOVE LONDON
536527,0,0,0,0,0
536840,0,0,0,0,0
536861,0,0,0,0,0
536967,0,0,0,0,0
536983,0,0,0,0,0

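Because the German subset is sparse, the 5x5 previews above show only zeros, which hides what the reshaping does. A minimal sketch on a toy frame makes the result visible. The rows in `toy` are invented for illustration (the product names are borrowed from the commented example earlier), and `unstack(fill_value=0)` is used here as a shorthand for `unstack().fillna(0)`.

```python
import pandas as pd

# Hypothetical transactions: one row per invoice line.
toy = pd.DataFrame({
    "Invoice": [536370, 536370, 536852, 536852, 536974],
    "Description": [
        "SET 2 TEA TOWELS I LOVE LONDON",
        "SET 2 TEA TOWELS I LOVE LONDON",
        "NINE DRAWER OFFICE TIDY",
        "SPACEBOY BABY GIFT SET",
        "NINE DRAWER OFFICE TIDY",
    ],
    "Quantity": [2, 1, 4, 6, 3],
})

# 1) total quantity per (invoice, product)
# 2) pivot products into columns, filling absent pairs with 0
# 3) keep only "was it bought?" as a boolean
basket = (
    toy.groupby(["Invoice", "Description"])["Quantity"].sum()
       .unstack(fill_value=0)
       .gt(0)
)

print(basket.astype(int))
```

The boolean frame can be cast to 0/1 for display, as done here. Recent mlxtend releases appear to prefer boolean input for `apriori`, so keeping `basket` as booleans may avoid a warning, though that depends on the installed version.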

## TASK 3: What are the names of the products whose IDs are given?

- 21987
- 23235
- 22747

In [193]:
def create_invoice_product(dataframe, id=False):
    # Rows: invoices; columns: StockCode (id=True) or Description; values: 1 if bought, else 0.
    # gt(0).astype(int) replaces applymap, which is deprecated since pandas 2.1.
    column = 'StockCode' if id else 'Description'
    return dataframe.groupby(['Invoice', column])['Quantity'].sum().unstack().fillna(0).gt(0).astype(int)

In [196]:
create_invoice_product(df_ger).iloc[0:3,0:6]

Invoice,50'S CHRISTMAS GIFT BAG LARGE,DOLLY GIRL BEAKER,I LOVE LONDON MINI BACKPACK,RED SPOT GIFT BAG LARGE,SET 2 TEA TOWELS I LOVE LONDON,SPACEBOY BABY GIFT SET
536527,0,0,0,0,0,0
536840,0,0,0,0,0,0
536861,0,0,0,0,0,0


In [197]:
create_invoice_product(df_ger, id=True).iloc[0:4,0:10]

Invoice,10002,10125,10135,11001,15034,15036,15039,16008,16011,16014
536527,0,0,0,0,0,0,0,0,0,0
536840,0,0,0,0,0,0,0,0,0,0
536861,0,0,0,0,0,0,0,0,0,0
536967,0,0,0,0,0,0,0,0,0,0


In [198]:
def check_id(dataframe,stock_code):
    product_name = dataframe[dataframe['StockCode'] == stock_code]['Description'].values[0]
    print(product_name)

In [199]:
check_id(df_ger,21987)

PACK OF 6 SKULL PAPER CUPS


In [200]:
check_id(df_ger,23235)

STORAGE TIN VINTAGE LEAF


In [281]:
check_id(df_ger,22747)

POPPY'S PLAYHOUSE BATHROOM


In [202]:
ger_inv_pro = create_invoice_product(df_ger,id=True)

## TASK 4: Recommend products for the items in the basket.

- 21987
- 23235
- 22747

In [203]:
ger_inv_pro.iloc[0:3,0:6]

Invoice,10002,10125,10135,11001,15034,15036
536527,0,0,0,0,0,0
536840,0,0,0,0,0,0
536861,0,0,0,0,0,0


In [204]:
frequent_itemsets = apriori(ger_inv_pro, min_support = 0.01, use_colnames = True)
frequent_itemsets

,support,itemsets
0,0.013129,(10125)
1,0.019694,(15036)
2,0.010941,(16016)
3,0.015317,(16045)
4,0.010941,(16235)
...,...,...
6950,0.010941,"(POST, 22326, 22328, 22554, 22555, 22556)"
6951,0.010941,"(21668, 21669, 21670, 21671, 21672, 21673, POST)"
6952,0.010941,"(21668, 21670, 21671, 21672, 21673, POST, 22326)"
6953,0.010941,"(21668, 21670, 21671, 21672, 21673, POST, 22423)"


In [205]:
frequent_itemsets.sort_values('support',ascending=False).head()

,support,itemsets
538,0.818381,(POST)
189,0.245077,(22326)
1864,0.225383,"(POST, 22326)"
191,0.157549,(22328)
1931,0.150985,"(22328, POST)"


In [206]:
rules = association_rules(frequent_itemsets, metric='support', min_threshold = 0.01)
rules.sort_values('support',ascending=False).head()

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
2650,(POST),(22326),0.818381,0.245077,0.225383,0.275401,1.123735,0.024817,1.04185
2651,(22326),(POST),0.245077,0.818381,0.225383,0.919643,1.123735,0.024817,2.260151
2784,(22328),(POST),0.157549,0.818381,0.150985,0.958333,1.171012,0.022049,4.358862
2785,(POST),(22328),0.818381,0.157549,0.150985,0.184492,1.171012,0.022049,1.033038
2414,(22328),(22326),0.157549,0.245077,0.131291,0.833333,3.400298,0.092679,4.52954

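The metric columns in the table above can be reproduced by hand, which helps when deciding which one to sort by. The sketch below recomputes the row for the rule (22328) -> (22326). The counts are an assumption: 457 invoices in total, 72 containing 22328, 112 containing 22326 and 60 containing both, back-computed from the printed supports rather than read from the data. `rule_metrics` is a hypothetical helper, not part of mlxtend.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Compute association-rule metrics for A -> C from raw invoice counts."""
    support_a = n_antecedent / n_total   # P(A)
    support_c = n_consequent / n_total   # P(C)
    support = n_both / n_total           # P(A and C)
    confidence = n_both / n_antecedent   # P(C | A)
    lift = confidence / support_c        # > 1 means A and C co-occur more than by chance
    leverage = support - support_a * support_c
    # Conviction is infinite when the rule never fails (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "antecedent support": support_a,
        "consequent support": support_c,
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Assumed counts for (22328) -> (22326), inferred from the printed supports.
metrics = rule_metrics(n_total=457, n_antecedent=72, n_consequent=112, n_both=60)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

Rules involving `POST` have high support and confidence but a lift close to 1, because postage appears on most invoices regardless of what else is bought. That is one reason to rank by lift rather than by support.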

In [207]:
rules.sort_values('lift',ascending=False).head()

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
39036,"(21987, 21989)","(21086, 21988, 21094)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf
39034,"(21086, 21989, 21094)","(21987, 21988)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf
39028,"(21987, 21989, 21094)","(21988, 21086)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf
39026,"(21987, 21988, 21094)","(21989, 21086)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf
24749,"(21989, 21086)","(21987, 21988)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf


In [209]:
# Build a rule set sorted by lift.
sorted_rules = rules.sort_values('lift',ascending=False)

sorted_rules.head()

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
39036,"(21987, 21989)","(21086, 21988, 21094)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf
39034,"(21086, 21989, 21094)","(21987, 21988)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf
39028,"(21987, 21989, 21094)","(21988, 21086)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf
39026,"(21987, 21988, 21094)","(21989, 21086)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf
24749,"(21989, 21086)","(21987, 21988)",0.010941,0.010941,0.010941,1.0,91.4,0.010821,inf


- 21987
- 23235
- 22747

In [211]:
product_id = 21987

In [213]:
recommendation_list = []

for i, product in enumerate(sorted_rules['antecedents']):
    for j in list(product):
        if j == product_id:
            recommendation_list.append(list(sorted_rules.iloc[i]['consequents'])[0])

In [220]:
recommendation_list[0:6]

[21086, 21988, 21989, 21988, 21989, 21086]

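The loop above keeps only the first consequent of each matching rule and appends it every time, so the list repeats itself. One possible way to package the lookup is sketched below. `recommend`, its `max_items` and `exclude` parameters, and the `toy_rules` frame are all hypothetical; the toy rows only mimic the `antecedents`, `consequents` and `lift` columns returned by `association_rules`, and the lift values are illustrative.

```python
import pandas as pd

def recommend(rules, product_id, max_items=3, exclude=()):
    """Return up to max_items distinct consequents of rules whose antecedents contain product_id."""
    recommendations = []
    # Visit the strongest rules (highest lift) first.
    ordered = rules.sort_values("lift", ascending=False)
    for antecedents, consequents in zip(ordered["antecedents"], ordered["consequents"]):
        if product_id not in antecedents:
            continue
        # Sort by string form so the order is deterministic for mixed int/str codes.
        for item in sorted(consequents, key=str):
            if item not in recommendations and item not in exclude:
                recommendations.append(item)
    return recommendations[:max_items]

# Toy rules for illustration only.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({21987, 21989}), frozenset({21987}), frozenset({22328})],
    "consequents": [frozenset({21086, 21988, 21094}), frozenset({"POST", 21988}), frozenset({22326})],
    "lift": [91.4, 1.2, 3.4],
})

print(recommend(toy_rules, 21987, max_items=5, exclude=("POST",)))
```

Passing `exclude=("POST",)` is a design choice rather than a requirement: postage is a shipping charge, so it may not be a useful recommendation even when the rules suggest it.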
## - After a while, the same products are recommended again.
## - Recommendations can be obtained for more than one product.

In [473]:
product_id_list = [21987,23235,22747]

for a in product_id_list:
    print(''.center(50,'#'))
    product_name = df_ger[df_ger['StockCode'] == a]['Description'].values[0]
    print('RECOMMENDS FOR {} ({})'.format(a,product_name))
    print(''.center(50,'#'))
    recommendation_list = []
    for i, product in enumerate(sorted_rules['antecedents']):
        
        for j in list(product):
            if j == a:
                b = list(sorted_rules.iloc[i]['consequents'])
                for k in b:
                    if k not in recommendation_list:
                        recommendation_list.append(k)
    
    print('  RECOMMENDED PRODUCT IDs  '.center(50,'-'))
    print(recommendation_list)
    print('  RECOMMENDED ITEM NAMES  '.center(50,'-'))
    
    for item in recommendation_list:
        check_id(df_ger,item)

##################################################
RECOMMENDS FOR 21987 (PACK OF 6 SKULL PAPER CUPS)
##################################################
-----------  RECOMMENDED PRODUCT IDs  ------------
[21086, 21988, 21094, 21989, 'POST']
------------  RECOMMENDED ITEM NAMES  ------------
SET/6 RED SPOTTY PAPER CUPS
PACK OF 6 SKULL PAPER PLATES
SET/6 RED SPOTTY PAPER PLATES
PACK OF 20 SKULL PAPER NAPKINS
POSTAGE
##################################################
RECOMMENDS FOR 23235 (STORAGE TIN VINTAGE LEAF)
##################################################
-----------  RECOMMENDED PRODUCT IDs  ------------
[23240, 23244, 23236, 23245, 'POST', 23243, 23237]
------------  RECOMMENDED ITEM NAMES  ------------
SET OF 4 KNICK KNACK TINS DOILEY 
ROUND STORAGE TIN VINTAGE LEAF
DOILEY STORAGE TIN
SET OF 3 REGENCY CAKE TINS
POSTAGE
SET OF TEA COFFEE SUGAR TINS PANTRY
SET OF 4 KNICK KNACK TINS LEAVES 
##################################################
RECOMMENDS FOR 22747 (POPPY'S PLAYHOUSE 