# Business Problem

Armut, Turkey’s largest online service platform, connects service providers with customers who need services. It gives easy access to services such as cleaning, renovation, and transportation with a few taps on a computer or smartphone. The goal is to build a service recommendation system with Association Rule Learning, using a data set of customers and the services and categories they purchased.

# Dataset

The data set consists of the services received by customers and the categories of these services. It contains the date and time information of each service received.

* **UserId:** Customer ID.
* **ServiceId:** Anonymized service within a category (example: sofa washing under the cleaning category). The same ServiceId can appear under different categories, where it represents a different service. Example: CategoryId 7 with ServiceId 4 is radiator cleaning, while CategoryId 2 with ServiceId 4 is furniture assembly.
* **CategoryId:** Anonymized category (example: cleaning, transportation, renovation).
* **CreateDate:** Date the service was purchased.

# Imports & Loading the Dataset

In [1]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

df = pd.read_csv("/kaggle/input/armut-data-arl/armut_data.csv")

# Overview & Preprocessing

In [2]:
def check_df(dataframe, head=5):
    print("##################### Shape #####################")
    print(dataframe.shape)
    print("##################### Types #####################")
    print(dataframe.dtypes)
    print("##################### Head #####################")
    print(dataframe.head(head))
    print("##################### Tail #####################")
    print(dataframe.tail(head))
    print("##################### NA #####################")
    print(dataframe.isnull().sum())

check_df(df, 10)

##################### Shape #####################
(162523, 4)
##################### Types #####################
UserId         int64
ServiceId      int64
CategoryId     int64
CreateDate    object
dtype: object
##################### Head #####################
   UserId  ServiceId  CategoryId           CreateDate
0   25446          4           5  2017-08-06 16:11:00
1   22948         48           5  2017-08-06 16:12:00
2   10618          0           8  2017-08-06 16:13:00
3    7256          9           4  2017-08-06 16:14:00
4   25446         48           5  2017-08-06 16:16:00
5   14354         15           1  2017-08-06 16:27:00
6   14162         21           5  2017-08-06 16:28:00
7   21230         46           4  2017-08-06 16:34:00
8   25446          6           7  2017-08-06 16:39:00
9   10659          4           5  2017-08-06 16:44:00
##################### Tail #####################
        UserId  ServiceId  CategoryId           CreateDate
162513    6680         28           4  

In [3]:
df.describe()

             UserId  ServiceId  CategoryId
count      162523.0   162523.0    162523.0
mean   13089.803862   21.64114    4.325917
std      7325.81606  13.774405    3.129292
min             0.0        0.0         0.0
25%          6953.0       13.0         1.0
50%         13139.0       18.0         4.0
75%         19396.0       32.0         6.0
max         25744.0       49.0        11.0


A ServiceId represents a different service under each CategoryId, so we create a new variable that identifies each service uniquely by joining ServiceId and CategoryId with "_".

In [4]:
df["CustomService"] = df["ServiceId"].astype(str) + "_" + df["CategoryId"].astype(str)
df.head()

   UserId  ServiceId  CategoryId           CreateDate  CustomService
0   25446          4           5  2017-08-06 16:11:00            4_5
1   22948         48           5  2017-08-06 16:12:00           48_5
2   10618          0           8  2017-08-06 16:13:00            0_8
3    7256          9           4  2017-08-06 16:14:00            9_4
4   25446         48           5  2017-08-06 16:16:00           48_5


The data set records only the date and time each service was received; it has no basket definition (invoice, etc.). Association Rule Learning needs one, so we define a basket as the set of services a customer receives within one calendar month.

For example, the services 9_4 and 46_4 that customer 7256 received in August 2017 form one basket, and the services 9_4 and 38_4 the same customer received in October 2017 form another.

Each basket must have a unique ID. To build it, we first create a date variable that contains only the year and month, then join UserId and this variable with "_" and assign the result to a new variable called BasketId.

In [5]:
df["CreateDate"] = pd.to_datetime(df["CreateDate"])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 162523 entries, 0 to 162522
Data columns (total 5 columns):
 #   Column         Non-Null Count   Dtype         
---  ------         --------------   -----         
 0   UserId         162523 non-null  int64         
 1   ServiceId      162523 non-null  int64         
 2   CategoryId     162523 non-null  int64         
 3   CreateDate     162523 non-null  datetime64[ns]
 4   CustomService  162523 non-null  object        
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 6.2+ MB


In [6]:
df["NewDate"] = df["CreateDate"].dt.strftime("%Y-%m")

df["BasketId"] = df["UserId"].astype(str) + "_" + df["NewDate"].astype(str)

df.head()

   UserId  ServiceId  CategoryId           CreateDate  CustomService  NewDate       BasketId
0   25446          4           5  2017-08-06 16:11:00            4_5  2017-08  25446_2017-08
1   22948         48           5  2017-08-06 16:12:00           48_5  2017-08  22948_2017-08
2   10618          0           8  2017-08-06 16:13:00            0_8  2017-08  10618_2017-08
3    7256          9           4  2017-08-06 16:14:00            9_4  2017-08   7256_2017-08
4   25446         48           5  2017-08-06 16:16:00           48_5  2017-08  25446_2017-08
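
To see how BasketId turns individual rows into baskets, here is a minimal sketch on a toy frame for customer 7256. Only the first timestamp appears in the data shown above; the other three timestamps and the `toy` and `baskets` names are assumptions made for illustration.

```python
import pandas as pd

# Toy rows for one customer. Only the first timestamp comes from the data
# shown above; the remaining timestamps are made up for illustration.
toy = pd.DataFrame({
    "UserId": [7256, 7256, 7256, 7256],
    "CustomService": ["9_4", "46_4", "9_4", "38_4"],
    "CreateDate": pd.to_datetime([
        "2017-08-06 16:14:00",
        "2017-08-20 10:00:00",
        "2017-10-03 09:30:00",
        "2017-10-15 18:45:00",
    ]),
})

# Same construction as in the cell above: year-month string,
# then UserId + "_" + year-month.
toy["NewDate"] = toy["CreateDate"].dt.strftime("%Y-%m")
toy["BasketId"] = toy["UserId"].astype(str) + "_" + toy["NewDate"]

# Collect the services that fall into each monthly basket.
baskets = toy.groupby("BasketId")["CustomService"].apply(list).to_dict()
print(baskets)
```

Services bought by the same customer in different months land in different baskets, so they are never counted as co-occurring.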


# ARL

Association Rule Learning (ARL) is a popular data mining technique used to discover interesting relationships and patterns within large datasets. It is particularly useful for analyzing transactional data, such as customer purchases in a retail setting. The primary goal of ARL is to uncover strong associations or co-occurrence patterns among items in a dataset, which can be represented as rules in the form “If X, then Y.”

**Association Rules**

An association rule is an implication expression of the form “X → Y,” where X and Y are disjoint itemsets. Some key metrics for association rules are:

**Support(X, Y) = Freq(X, Y) / N**

The fraction of all N transactions that contain both X and Y.

**Confidence(X, Y) = Freq(X, Y) / Freq(X)**

The ratio of the number of transactions containing X and Y to the number of transactions containing X.

**Lift(X, Y) = Support(X, Y) / ( Support(X) * Support(Y) )**

The ratio of the observed support to the support expected if X and Y were independent. A lift above 1 means X and Y occur together more often than independence would predict.

Association rules with high confidence and lift can reveal interesting relationships between items that can be leveraged for recommendation purposes.
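
The three metrics can be worked through by hand on a handful of baskets. The sketch below uses five hypothetical baskets with made-up labels A, B, and C, and computes the metrics for the rule A → B directly from the definitions above.

```python
# Five hypothetical baskets; the service labels are made up for illustration.
baskets = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
    {"C"},
]
n = len(baskets)


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / n


# Rule A -> B
support_xy = support({"A", "B"})                        # 2 of 5 baskets
confidence = support_xy / support({"A"})                # (2/5) / (3/5)
lift = support_xy / (support({"A"}) * support({"B"}))   # (2/5) / ((3/5) * (3/5))

print(f"support={support_xy:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

Here the lift is slightly above 1, so A and B appear together a little more often than they would if they were independent.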

**Apriori Algorithm**

The Apriori algorithm is a classic algorithm used for mining frequent itemsets and association rules. It operates in two steps:

1. **Find frequent itemsets:** The algorithm generates candidate itemsets of a particular size and counts their occurrences in the data to determine which ones are frequent (i.e., have support above a specified threshold).
2. **Generate association rules:** For each frequent itemset, the algorithm generates association rules that meet a specified minimum confidence threshold.

The Apriori algorithm uses a bottom-up approach where it starts with frequent single items and progressively builds up to larger frequent itemsets.
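
The two steps can be sketched in plain Python. This is a simplified illustration of the level-wise idea, not the `mlxtend` implementation; the function names `apriori_sketch` and `rules_from` and the toy baskets are assumptions made for this example.

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(baskets)
    baskets = [frozenset(b) for b in baskets]

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for b in baskets for item in b})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules_from(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules meeting min_confidence."""
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    yield antecedent, itemset - antecedent, conf


# Hypothetical baskets with made-up labels.
toy_baskets = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}, {"C"}]
toy_frequent = apriori_sketch(toy_baskets, min_support=0.4)
toy_rules = list(rules_from(toy_frequent, min_confidence=0.6))
print(toy_frequent)
print(toy_rules)
```

The prune step is what keeps the search tractable: the candidate {A, B, C} is discarded without counting because its subset {A, C} is not frequent.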

In [7]:
# One row per basket, one column per service; 1 if the basket contains the service, else 0
df_pivot = (
    df.pivot_table(values="UserId", index="BasketId", columns="CustomService", aggfunc="count")
    .fillna(0)
    .map(lambda x: 1 if x > 0 else 0)
)
df_pivot.reset_index()

CustomService,BasketId,0_8,10_9,11_11,12_7,13_11,14_7,15_1,16_8,17_5,...,46_4,47_7,48_5,49_1,4_5,5_11,6_7,7_3,8_5,9_4
0,0_2017-08,0,0,0,0,0,0,0,0,0,...,1,0,1,0,0,0,0,0,0,0
1,0_2017-09,0,0,0,0,0,0,0,0,0,...,0,0,1,0,1,0,0,0,0,0
2,0_2018-01,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,0_2018-04,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,10000_2017-08,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
71215,99_2017-12,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
71216,99_2018-01,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
71217,99_2018-02,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
71218,9_2018-03,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
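
`DataFrame.map`, used in the cell above, is only available from pandas 2.1 onward (earlier versions call it `applymap`). A version-independent way to get the same basket-by-service matrix is `pd.crosstab`, sketched here on a toy frame whose rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the notebook's df; the rows are made up for illustration.
toy = pd.DataFrame({
    "BasketId": ["1_2017-08", "1_2017-08", "1_2017-08", "2_2017-08", "2_2017-09"],
    "CustomService": ["2_0", "15_1", "2_0", "15_1", "33_4"],
})

# crosstab counts rows per (basket, service) pair; comparing with 0 turns the
# counts into the boolean matrix that apriori expects.
basket_matrix = pd.crosstab(toy["BasketId"], toy["CustomService"]) > 0
print(basket_matrix)
```

A service bought twice in the same basket (2_0 in the first basket here) still yields a single `True`, matching the 0/1 encoding of the pivot table.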


# Creating Association Rules

In [8]:
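# Frequent itemsets: service combinations that appear in at least 1% of baskets
frequent_itemsets = apriori(df_pivot.astype("bool"),
                            min_support=0.01,
                            use_colnames=True)

# Sorted view for inspection only; the result is not assigned
frequent_itemsets.sort_values("support", ascending=False)

# Keep every rule whose support is at least 0.01
rules = association_rules(frequent_itemsets,
                          metric="support",
                          min_threshold=0.01)

rules.head(10)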

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(2_0),(13_11),0.130286,0.056627,0.012819,0.098394,1.737574,0.005442,1.046325,0.488074
1,(13_11),(2_0),0.056627,0.130286,0.012819,0.226382,1.737574,0.005442,1.124216,0.449965
2,(2_0),(15_1),0.130286,0.120963,0.033951,0.260588,2.154278,0.018191,1.188833,0.616073
3,(15_1),(2_0),0.120963,0.130286,0.033951,0.280673,2.154278,0.018191,1.209066,0.609539
4,(15_1),(33_4),0.120963,0.02731,0.011233,0.092861,3.400299,0.007929,1.072262,0.803047
5,(33_4),(15_1),0.02731,0.120963,0.011233,0.411311,3.400299,0.007929,1.493211,0.725728
6,(15_1),(38_4),0.120963,0.066568,0.011177,0.092397,1.388001,0.003124,1.028458,0.318007
7,(38_4),(15_1),0.066568,0.120963,0.011177,0.167897,1.388001,0.003124,1.056404,0.299475
8,(49_1),(15_1),0.067762,0.120963,0.010011,0.147741,1.221375,0.001815,1.03142,0.194425
9,(15_1),(49_1),0.120963,0.067762,0.010011,0.082763,1.221375,0.001815,1.016354,0.206192
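
As a sanity check, the confidence and lift columns can be recomputed from the support columns with the formulas given earlier. The snippet below does this for row 2 of the table, the rule (2_0) → (15_1); because the inputs are copied from the rounded table values, the results should match the table only to about four decimal places.

```python
# Values copied from row 2 of the rules table: (2_0) -> (15_1)
antecedent_support = 0.130286
consequent_support = 0.120963
support = 0.033951

# Confidence(X, Y) = Support(X, Y) / Support(X)
confidence = support / antecedent_support

# Lift(X, Y) = Support(X, Y) / (Support(X) * Support(Y))
lift = support / (antecedent_support * consequent_support)

print(round(confidence, 4), round(lift, 4))
```

Read in plain terms: about 26% of the baskets that contain 2_0 also contain 15_1, roughly 2.15 times what independence would predict.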


# Service Recommendations for the Cart

In [9]:
def arl_recommender(rules_df, product_id, rec_count=1, method="lift"):
    # Rank rules by the chosen metric, strongest first
    sorted_rules = rules_df.sort_values(method, ascending=False)
    recommendation_list = []
    for antecedents, consequents in zip(sorted_rules["antecedents"], sorted_rules["consequents"]):
        # Use a rule only if the given service is among its antecedents
        if product_id in antecedents:
            # Recommend the first service in the rule's consequents
            recommendation_list.append(list(consequents)[0])

    return recommendation_list[:rec_count]

arl_recommender(rules, "2_0", rec_count=3, method="support")

['15_1', '22_0', '25_0']

The recommendations come from services that customers purchased in the same monthly basket as the selected service. Because the rules capture which services are frequently bought together, the recommender can suggest additional services that are likely to be relevant to a customer who has just chosen a given service.
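
When several rules share the same consequent, `arl_recommender` can return the same service more than once. The sketch below is one possible variant that skips repeats; unlike the function above, it also considers every service in a rule's consequents rather than only the first. The name `arl_recommender_unique` and the `toy_rules` table are assumptions made for illustration.

```python
import pandas as pd


def arl_recommender_unique(rules_df, product_id, rec_count=1, method="lift"):
    """Variant of arl_recommender that returns each service at most once."""
    sorted_rules = rules_df.sort_values(method, ascending=False)
    recommendations = []
    for antecedents, consequents in zip(sorted_rules["antecedents"],
                                        sorted_rules["consequents"]):
        if product_id not in antecedents:
            continue
        # sorted() gives a deterministic order within a multi-item consequent.
        for service in sorted(consequents):
            if service not in recommendations:
                recommendations.append(service)
    return recommendations[:rec_count]


# Hypothetical rules table with the same column layout as the mlxtend output;
# the lift values are made up.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"2_0"}), frozenset({"2_0", "15_1"}),
                    frozenset({"15_1"}), frozenset({"2_0"})],
    "consequents": [frozenset({"15_1"}), frozenset({"22_0"}),
                    frozenset({"2_0"}), frozenset({"22_0"})],
    "lift": [2.1, 1.9, 3.0, 1.5],
})

unique_recs = arl_recommender_unique(toy_rules, "2_0", rec_count=3, method="lift")
print(unique_recs)
```

With fewer distinct consequents than `rec_count`, the function simply returns a shorter list.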