# ARMUT Association Rule Based Recommender System


Armut, Turkey's largest online service platform, brings together service providers and customers who need a service. It offers easy access to services such as cleaning, renovation, and transportation with a few taps on a computer or smartphone. The goal is to build a service recommendation system with Association Rule Learning, using a data set of the users, the services they received, and the categories of those services.

## Dataset
The data set consists of the services customers received and the categories of those services, together with the date and time each service was received.

*UserId* : Customer number<br>
*ServiceId* : Anonymized services belonging to each category (*Example: upholstery washing service under the cleaning category*). The same ServiceId can appear under different categories, where it refers to a different service (*Example: the service with CategoryId 7 and ServiceId 4 is radiator cleaning, while the service with CategoryId 2 and ServiceId 4 is furniture assembly*).<br>
*CategoryId* : Anonymized categories (*Example: cleaning, transportation, renovation*)<br>
*CreateDate* : The date and time the service was purchased

In [20]:
# libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori, association_rules

pd.set_option("display.max_columns",None)
pd.set_option("display.width",500)
sns.set(rc={"figure.figsize":(12,12)})



## Preparing the Data

In [31]:
data = pd.read_csv("datas/armut_data.csv")
data.head(10)

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate
0,25446,4,5,2017-08-06 16:11:00
1,22948,48,5,2017-08-06 16:12:00
2,10618,0,8,2017-08-06 16:13:00
3,7256,9,4,2017-08-06 16:14:00
4,25446,48,5,2017-08-06 16:16:00
5,14354,15,1,2017-08-06 16:27:00
6,14162,21,5,2017-08-06 16:28:00
7,21230,46,4,2017-08-06 16:34:00
8,25446,6,7,2017-08-06 16:39:00
9,10659,4,5,2017-08-06 16:44:00


*ServiceId* identifies a different service within each CategoryId. A new variable is therefore created to represent each distinct service by joining ServiceId and CategoryId with "**_**".

In [32]:
# Join ServiceId and CategoryId by column name rather than by position in data.values
data["Service"] = data["ServiceId"].astype(str) + "_" + data["CategoryId"].astype(str)
data.head()

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate,Service
0,25446,4,5,2017-08-06 16:11:00,4_5
1,22948,48,5,2017-08-06 16:12:00,48_5
2,10618,0,8,2017-08-06 16:13:00,0_8
3,7256,9,4,2017-08-06 16:14:00,9_4
4,25446,48,5,2017-08-06 16:16:00,48_5
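
A quick way to see why the combined key matters is to count how many categories each ServiceId appears under. The sketch below uses a small hand-made frame (the rows are illustrative and not taken from the Armut file) that mirrors the ServiceId 4 example from the dataset description; `toy`, `categories_per_service`, and `ambiguous_ids` are names introduced here for illustration only.

```python
import pandas as pd

# Illustrative rows only (not from the Armut file): ServiceId 4 under two categories.
toy = pd.DataFrame({"ServiceId": [4, 4, 9], "CategoryId": [7, 2, 4]})

# Number of distinct categories each ServiceId appears under.
categories_per_service = toy.groupby("ServiceId")["CategoryId"].nunique()

# ServiceIds that would be ambiguous if used on their own.
ambiguous_ids = categories_per_service[categories_per_service > 1].index.tolist()

# The combined key separates the two meanings of ServiceId 4.
toy["Service"] = toy["ServiceId"].astype(str) + "_" + toy["CategoryId"].astype(str)
print(ambiguous_ids)
print(toy["Service"].tolist())
```

Running the same `groupby(...).nunique()` check on the real `data` frame would show whether any ServiceId is actually shared across categories in this file.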


The data set records the date and time at which each service was received, but it has no basket definition (invoice, etc.). To apply **Association Rule Learning**, a basket definition must be created. Here, a basket is the set of services a customer receives in one calendar month. For example, the services 9_4 and 46_4 received by the customer with UserId 7256 in August 2017 form one basket, and the services 9_4 and 38_4 received by the same customer in October 2017 form another. Each basket must have a unique ID. To build it, a new date variable containing only the year and month is created first; UserId and this date variable are then joined with "**_**" and assigned to a new variable named BasketId.

In [33]:
data["CreateDate"].dtype

dtype('O')

In [34]:
data["CreateDate"] = pd.to_datetime(data["CreateDate"])
data["NEW_DATE"] = data["CreateDate"].dt.strftime("%Y-%m")
# Basket key: one basket per user per calendar month
data["BasketId"] = data["UserId"].astype(str) + "_" + data["NEW_DATE"]
data.head()

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate,Service,NEW_DATE,BasketId
0,25446,4,5,2017-08-06 16:11:00,4_5,2017-08,25446_2017-08
1,22948,48,5,2017-08-06 16:12:00,48_5,2017-08,22948_2017-08
2,10618,0,8,2017-08-06 16:13:00,0_8,2017-08,10618_2017-08
3,7256,9,4,2017-08-06 16:14:00,9_4,2017-08,7256_2017-08
4,25446,48,5,2017-08-06 16:16:00,48_5,2017-08,25446_2017-08


In [37]:
data[data["UserId"] == 7256 ]

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate,Service,NEW_DATE,BasketId
3,7256,9,4,2017-08-06 16:14:00,9_4,2017-08,7256_2017-08
1268,7256,46,4,2017-08-09 16:15:00,46_4,2017-08,7256_2017-08
9540,7256,46,4,2017-08-29 03:53:00,46_4,2017-08,7256_2017-08
24679,7256,9,4,2017-10-01 04:59:00,9_4,2017-10,7256_2017-10
24680,7256,38,4,2017-10-01 05:01:00,38_4,2017-10,7256_2017-10
28698,7256,9,4,2017-10-11 08:06:00,9_4,2017-10,7256_2017-10
65325,7256,15,1,2017-12-31 04:17:00,15_1,2017-12,7256_2017-12
67093,7256,2,0,2018-01-03 22:06:00,2_0,2018-01,7256_2018-01
70623,7256,38,4,2018-01-11 13:07:00,38_4,2018-01,7256_2018-01
160299,7256,18,4,2018-07-25 00:51:00,18_4,2018-07,7256_2018-07
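
The basket definition can be checked by hand on the rows for user 7256. The sketch below re-types the `CreateDate` and `Service` values from the output above into a small frame and collects the distinct services per BasketId; the names `user` and `baskets` are introduced here for illustration.

```python
import pandas as pd

# (CreateDate, Service) pairs copied from the user 7256 output above.
rows = [
    ("2017-08-06 16:14:00", "9_4"),
    ("2017-08-09 16:15:00", "46_4"),
    ("2017-08-29 03:53:00", "46_4"),
    ("2017-10-01 04:59:00", "9_4"),
    ("2017-10-01 05:01:00", "38_4"),
    ("2017-10-11 08:06:00", "9_4"),
    ("2017-12-31 04:17:00", "15_1"),
    ("2018-01-03 22:06:00", "2_0"),
    ("2018-01-11 13:07:00", "38_4"),
    ("2018-07-25 00:51:00", "18_4"),
]
user = pd.DataFrame(rows, columns=["CreateDate", "Service"])
user["UserId"] = 7256
user["CreateDate"] = pd.to_datetime(user["CreateDate"])

# Same key as in the notebook: UserId joined with the year-month of the purchase.
user["BasketId"] = user["UserId"].astype(str) + "_" + user["CreateDate"].dt.strftime("%Y-%m")

# Distinct services per basket; repeated purchases within a month collapse to one entry.
baskets = user.groupby("BasketId")["Service"].apply(lambda s: sorted(set(s))).to_dict()
for basket_id, services in baskets.items():
    print(basket_id, services)
```

The ten purchases fall into five monthly baskets, and the repeated 46_4 and 9_4 purchases within a month count only once per basket.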


In [38]:
data.groupby(["BasketId","Service"]).agg({"Service":"count"}).head()

BasketId,Service,Service (count)
0_2017-08,46_4,1
0_2017-08,48_5,1
0_2017-09,48_5,1
0_2017-09,4_5,1
0_2018-01,30_2,1


In [39]:
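# One row per basket, one column per service: 1 if the service occurs in the basket, 0 otherwise
basket_counts = data.groupby(["BasketId", "Service"])["Service"].count().unstack()
invoice_product_df = basket_counts.fillna(0).gt(0).astype(int)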
invoice_product_df.head()

BasketId,0_8,10_9,11_11,12_7,13_11,14_7,15_1,16_8,17_5,18_4,19_6,1_4,20_5,21_5,22_0,23_10,24_10,25_0,26_7,27_7,28_4,29_0,2_0,30_2,31_6,32_4,33_4,34_6,35_11,36_1,37_0,38_4,39_10,3_5,40_8,41_3,42_1,43_2,44_0,45_6,46_4,47_7,48_5,49_1,4_5,5_11,6_7,7_3,8_5,9_4
0_2017-08,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0
0_2017-09,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
0_2018-01,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
0_2018-04,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
10000_2017-08,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
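
As an alternative sketch, the same basket-by-service matrix can be built with `pd.crosstab` and stored as booleans, which is the input type mlxtend's `apriori` is designed around. The frame below is a toy example: the first four rows follow the `0_2017-08` and `0_2017-09` baskets shown earlier, and the repeated `4_5` row is an added assumption to show that duplicates within a basket do not matter. The names `toy` and `basket_matrix` are illustrative.

```python
import pandas as pd

# Toy long-format data; the duplicate 4_5 row is added for illustration.
toy = pd.DataFrame({
    "BasketId": ["0_2017-08", "0_2017-08", "0_2017-09", "0_2017-09", "0_2017-09"],
    "Service": ["46_4", "48_5", "48_5", "4_5", "4_5"],
})

# crosstab counts occurrences per (basket, service); "> 0" turns counts into True/False.
basket_matrix = pd.crosstab(toy["BasketId"], toy["Service"]) > 0
print(basket_matrix)
```

Whether this is preferable to the `groupby`/`unstack` version is a matter of taste; both should yield the same presence/absence pattern.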


In [40]:
frequent_itemsets = apriori(invoice_product_df, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.01)
rules.head()



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(2_0),(13_11),0.130286,0.056627,0.012819,0.098394,1.737574,0.005442,1.046325
1,(13_11),(2_0),0.056627,0.130286,0.012819,0.226382,1.737574,0.005442,1.124216
2,(15_1),(2_0),0.120963,0.130286,0.033951,0.280673,2.154278,0.018191,1.209066
3,(2_0),(15_1),0.130286,0.120963,0.033951,0.260588,2.154278,0.018191,1.188833
4,(15_1),(33_4),0.120963,0.02731,0.011233,0.092861,3.400299,0.007929,1.072262
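
The metric columns in this table are related by simple formulas, which can be checked against one row. The sketch below recomputes confidence, lift, leverage, and conviction for rule 2, (15_1) -> (2_0), from its three support values using the standard definitions. The inputs are the rounded numbers printed above, so the results should match the table only up to rounding; the variable names are introduced here for illustration.

```python
# Support values copied from rule 2 above: (15_1) -> (2_0)
antecedent_support = 0.120963   # share of baskets containing 15_1
consequent_support = 0.130286   # share of baskets containing 2_0
support = 0.033951              # share of baskets containing both

# confidence: how often 2_0 appears in baskets that contain 15_1
confidence = support / antecedent_support

# lift: confidence relative to the baseline frequency of 2_0
lift = confidence / consequent_support

# leverage: observed co-occurrence minus the co-occurrence expected under independence
leverage = support - antecedent_support * consequent_support

# conviction: how much more often the rule would fail if the two sides were independent
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 3), round(leverage, 4), round(conviction, 3))
```

A lift above 1 means the two services appear together more often than independence would predict, which is why the recommender below ranks rules by lift.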


In [41]:
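def arl_recommender(rules_df, product_id, rec_size=1):
    """Return up to rec_size consequents of the highest-lift rules whose antecedents contain product_id."""
    sorted_results = rules_df.sort_values("lift", ascending=False)
    recommendation_list = []

    for i, product in enumerate(sorted_results["antecedents"]):
        # Each antecedent is a set of services; match if product_id is one of them.
        if product_id in product:
            # Keep only the first item of the rule's consequent set.
            recommendation_list.append(list(sorted_results.iloc[i]["consequents"])[0])

    return recommendation_list[0:rec_size]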

In [42]:
arl_recommender(rules,"2_0",3)

['22_0', '25_0', '15_1']

The recommender system suggests *22_0*, *25_0*, and *15_1* to customers who have received service *2_0*.
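
Because several rules can share a consequent, `arl_recommender` may return the same service more than once. The sketch below is a hypothetical variant, `arl_recommender_unique`, that skips repeats and never recommends the queried service itself. The `toy_rules` frame and its lift values are made up for illustration and are not output of the notebook.

```python
import pandas as pd

def arl_recommender_unique(rules_df, product_id, rec_size=1):
    """Hypothetical variant: highest-lift consequents for product_id, without repeats."""
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    recommendations = []
    for antecedents, consequents in zip(sorted_rules["antecedents"], sorted_rules["consequents"]):
        # Skip rules whose antecedent set does not contain the queried service.
        if product_id not in antecedents:
            continue
        for item in consequents:
            # Do not recommend the queried service or a service already listed.
            if item != product_id and item not in recommendations:
                recommendations.append(item)
    return recommendations[:rec_size]

# Made-up rules in the same column layout as mlxtend's association_rules output.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"2_0"}), frozenset({"2_0", "15_1"}),
                    frozenset({"2_0"}), frozenset({"15_1"})],
    "consequents": [frozenset({"22_0"}), frozenset({"22_0"}),
                    frozenset({"25_0"}), frozenset({"33_4"})],
    "lift": [4.0, 3.5, 3.0, 3.4],
})

result = arl_recommender_unique(toy_rules, "2_0", 3)
print(result)
```

On this toy input the original function would list `22_0` twice, once for each matching rule, while the variant returns each service a single time and may therefore return fewer than `rec_size` items.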