# Business Problem

## What Are Association Rules?

Association rule learning is a rule-based machine learning technique for finding patterns (relationships and structures) in data.

Association analysis is one of the most common applications in data science, and you have probably come across it in the form of recommendation systems.

You may have seen it as "customers who bought this product also bought...", "people who viewed this listing also viewed these listings", "we made a playlist for you", or "recommended next video".

These scenarios come up very often in e-commerce data science and data mining work.

Many platforms use recommendation systems, from large e-commerce companies in Turkey and around the world to familiar names such as Spotify, Amazon, and Netflix.

So, in short, what does association analysis do? It looks for items that frequently occur together across transactions and expresses those co-occurrences as rules of the form X → Y, read as "customers who buy X also tend to buy Y".

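## Apriori Algorithm

It is one of the most widely used algorithms in this area. Apriori finds frequent itemsets level by level (single items, then pairs, then triples, and so on), relying on the fact that every subset of a frequent itemset must itself be frequent. Any candidate with an infrequent subset can therefore be discarded without counting it.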
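To make the level-wise search concrete, here is a minimal pure-Python sketch of Apriori's frequent-itemset step. It assumes each transaction is a set of item names and that support is the fraction of transactions containing an itemset. The function name `apriori_sketch` and the five toy baskets are hypothetical; this is not mlxtend's implementation.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items
    singles = {frozenset([item]) for basket in baskets for item in basket}
    current = {s for s in singles if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        scored = {c: support(c) for c in candidates}
        current = {c for c, s in scored.items() if s >= min_support}
        frequent.update((c, scored[c]) for c in current)
        k += 1
    return frequent


# Hypothetical toy baskets, not taken from retail_dataset.csv
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]
result = apriori_sketch(baskets, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.6` (at least 3 of 5 baskets), the candidate `{Bread, Diaper, Beer}` is pruned before counting because `{Bread, Beer}` appears in only 2 baskets, which is exactly the shortcut that makes Apriori cheaper than checking every combination.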

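Association rule analysis is carried out by examining a few metrics:

- Support (the share of all transactions that contain both X and Y)
  - $\text{Support}(X, Y) = \dfrac{\text{Freq}(X, Y)}{N}$
- Confidence (among transactions that contain X, the share that also contain Y)
  - $\text{Confidence}(X, Y) = \dfrac{\text{Freq}(X, Y)}{\text{Freq}(X)}$
- Lift (how much more often X and Y occur together than they would if they were independent; lift > 1 indicates a positive association, lift < 1 a negative one)
  - $\text{Lift}(X, Y) = \dfrac{\text{Support}(X, Y)}{\text{Support}(X) \cdot \text{Support}(Y)}$

Here X and Y are products, Freq(·) is the number of transactions containing the given items, and N is the total number of transactions.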
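As a quick check of the three formulas, here is a minimal pure-Python sketch on a hypothetical five-basket dataset (not `retail_dataset.csv`). The helper names `freq`, `support`, `confidence` and `lift` are introduced only for this example.

```python
# Hypothetical toy baskets, for illustration only
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]
N = len(transactions)  # total number of transactions


def freq(*items):
    """Freq(...): number of transactions that contain all the given items."""
    wanted = set(items)
    return sum(wanted <= t for t in transactions)


def support(*items):
    return freq(*items) / N


def confidence(x, y):
    return freq(x, y) / freq(x)


def lift(x, y):
    return support(x, y) / (support(x) * support(y))


print(support("Diaper", "Beer"))     # 3/5 = 0.6
print(confidence("Diaper", "Beer"))  # 3/4 = 0.75
print(lift("Diaper", "Beer"))        # 0.6 / (0.8 * 0.6), approximately 1.25
```

By contrast, Bread and Milk give a lift of 0.6 / (0.8 × 0.8) = 0.9375 < 1 in this toy data: they appear together slightly less often than independence would predict, even though their support is high.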

In [1]:
!pip install mlxtend

Collecting mlxtend
  Downloading https://files.pythonhosted.org/packages/4c/0d/4a73b8bc49e2cfee178fe50dd8e84d5ba817d0b2454b09308397416e0e48/mlxtend-0.17.3-py2.py3-none-any.whl (1.3MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.17.3


In [2]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

In [4]:
df = pd.read_csv("retail_dataset.csv", sep =',')
df.head()

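|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | Bread | Wine | Eggs | Meat | Cheese | Pencil | Diaper |
| 1 | Bread | Cheese | Meat | Diaper | Wine | Milk | Pencil |
| 2 | Cheese | Meat | Eggs | Milk | Wine | NaN | NaN |
| 3 | Cheese | Meat | Eggs | Milk | Wine | NaN | NaN |
| 4 | Meat | Pencil | Wine | NaN | NaN | NaN | NaN |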


In [6]:
df.shape

(315, 7)

In [13]:
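# Collect every distinct product across all columns (column '0' alone could miss items), skipping empty (NaN) cells
items = [item for item in pd.unique(df.values.ravel()) if pd.notna(item)]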

In [14]:
# Manual one-hot encoding, written out by hand for illustration only
encoded_vals = []
for index, row in df.iterrows():
    labels = {}
    # Products that are NOT in this basket get 0
    uncommons = list(set(items) - set(row))
    # Products that ARE in this basket get 1
    commons = list(set(items).intersection(row))
    for uc in uncommons:
        labels[uc] = 0
    for com in commons:
        labels[com] = 1
    encoded_vals.append(labels)
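The same encoding can be written more compactly. Below is a minimal plain-Python sketch, assuming each row is a list padded with `None` wherever the CSV has an empty cell (standing in for pandas' `NaN`). The function `one_hot` and the sample rows are hypothetical.

```python
def one_hot(rows):
    """Return one {item: 0/1} dict per row, ignoring None padding."""
    # Build the vocabulary from every position in every row, not just the first column
    items = sorted({item for row in rows for item in row if item is not None})
    # 1 if the product is in the basket, 0 otherwise
    return [{item: int(item in row) for item in items} for row in rows]


rows = [
    ["Bread", "Wine", "Eggs"],
    ["Meat", "Pencil", "Wine", None],
]
encoded = one_hot(rows)
for labels in encoded:
    print(labels)
```

Sorting the vocabulary gives a stable column order, whereas the loop above inherits whatever order Python's `set` iteration happens to produce.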

In [15]:
ohe_df = pd.DataFrame(encoded_vals)

In [19]:
ohe_df.head()

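|   | Bagel | Milk | Meat | Cheese | Wine | Diaper | Eggs | Pencil | Bread |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| 2 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| 3 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| 4 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |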


In [40]:
freq_items = apriori(ohe_df, min_support = 0.2, use_colnames = True, verbose = 1) 

Processing 4 combinations | Sampling itemset size 4 3


In [48]:
freq_items

|   | support | itemsets |
|---|---|---|
| 0 | 0.425397 | (Bagel) |
| 1 | 0.501587 | (Milk) |
| 2 | 0.47619 | (Meat) |
| 3 | 0.501587 | (Cheese) |
| 4 | 0.438095 | (Wine) |
| 5 | 0.406349 | (Diaper) |
| 6 | 0.438095 | (Eggs) |
| 7 | 0.361905 | (Pencil) |
| 8 | 0.504762 | (Bread) |
| 9 | 0.225397 | (Bagel, Milk) |
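Since support is simply Freq / N, the supports above can be turned back into raw basket counts, and `min_support=0.2` can be read as a minimum count. A small worked sketch, assuming N = 315 from `df.shape` and using supports copied from the table (the rounding back to integers is an assumption that the printed values are rounded true fractions):

```python
import math
from fractions import Fraction

N = 315  # number of transactions (df.shape[0] above)

# min_support=0.2: an itemset must appear in at least 20% of the N baskets.
# Fraction avoids floating-point surprises in 0.2 * 315.
min_count = math.ceil(Fraction("0.2") * N)

# Supports copied from the freq_items output; support * N recovers the raw counts
supports = {
    "Bagel": 0.425397,
    "Milk": 0.501587,
    "Bread": 0.504762,
    "Bagel, Milk": 0.225397,
}
counts = {itemset: round(s * N) for itemset, s in supports.items()}

print(min_count)  # 63
print(counts)     # {'Bagel': 134, 'Milk': 158, 'Bread': 159, 'Bagel, Milk': 71}
```

So (Bagel, Milk) made the cut because it occurs in 71 baskets, above the 63-basket minimum.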


In [49]:
association_rules(freq_items, metric="confidence", min_threshold = 0.6)

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (Bagel) | (Bread) | 0.425397 | 0.504762 | 0.279365 | 0.656716 | 1.301042 | 0.064641 | 1.44265 |
| 1 | (Cheese) | (Milk) | 0.501587 | 0.501587 | 0.304762 | 0.607595 | 1.211344 | 0.053172 | 1.270148 |
| 2 | (Milk) | (Cheese) | 0.501587 | 0.501587 | 0.304762 | 0.607595 | 1.211344 | 0.053172 | 1.270148 |
| 3 | (Cheese) | (Meat) | 0.501587 | 0.47619 | 0.32381 | 0.64557 | 1.355696 | 0.084958 | 1.477891 |
| 4 | (Meat) | (Cheese) | 0.47619 | 0.501587 | 0.32381 | 0.68 | 1.355696 | 0.084958 | 1.55754 |
| 5 | (Eggs) | (Meat) | 0.438095 | 0.47619 | 0.266667 | 0.608696 | 1.278261 | 0.05805 | 1.338624 |
| 6 | (Wine) | (Cheese) | 0.438095 | 0.501587 | 0.269841 | 0.615942 | 1.227986 | 0.050098 | 1.297754 |
| 7 | (Eggs) | (Cheese) | 0.438095 | 0.501587 | 0.298413 | 0.681159 | 1.358008 | 0.07867 | 1.563203 |
| 8 | (Cheese, Meat) | (Milk) | 0.32381 | 0.501587 | 0.203175 | 0.627451 | 1.250931 | 0.040756 | 1.337845 |
| 9 | (Cheese, Milk) | (Meat) | 0.304762 | 0.47619 | 0.203175 | 0.666667 | 1.4 | 0.05805 | 1.571429 |
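The rule-generation step can be sketched in a few lines: for each frequent itemset with at least two items, try every non-empty proper subset as the antecedent and keep the rule if its confidence clears the threshold. This is a simplified illustration, not mlxtend's `association_rules` implementation; the basket counts below are back-calculated from the printed supports (support × 315, rounded), and `generate_rules` is a hypothetical name.

```python
from itertools import combinations

N = 315  # total transactions (df.shape[0] above)

# Basket counts back-calculated from the printed supports; an assumption for illustration
counts = {
    frozenset({"Cheese"}): 158,
    frozenset({"Milk"}): 158,
    frozenset({"Meat"}): 150,
    frozenset({"Cheese", "Milk"}): 96,
    frozenset({"Cheese", "Meat"}): 102,
    frozenset({"Meat", "Milk"}): 77,
    frozenset({"Cheese", "Meat", "Milk"}): 64,
}


def generate_rules(counts, min_confidence):
    """Return (antecedent, consequent, confidence, lift) for every rule above the threshold."""
    rules = []
    for itemset, together in counts.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset can serve as the antecedent
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                consequent = itemset - antecedent
                # Apriori guarantees both subsets are frequent, so their counts are known
                confidence = together / counts[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / (counts[consequent] / N)
                    rules.append((sorted(antecedent), sorted(consequent), confidence, lift))
    return rules


rules = generate_rules(counts, min_confidence=0.6)
for antecedent, consequent, conf, lift in rules:
    print(antecedent, "->", consequent, round(conf, 6), round(lift, 6))
```

Rules such as Meat → Milk (77 / 150 ≈ 0.51) fall below the 0.6 threshold and are dropped, which is why they do not appear in the table.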


In [50]:
df_ar = association_rules(freq_items, metric = "confidence", min_threshold = 0.6)

In [51]:
df_ar[(df_ar.support < 0.3) & (df_ar.confidence > 0.7)]

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 10 | (Meat, Milk) | (Cheese) | 0.244444 | 0.501587 | 0.203175 | 0.831169 | 1.657077 | 0.080564 | 2.952137 |
| 11 | (Eggs, Cheese) | (Meat) | 0.298413 | 0.47619 | 0.215873 | 0.723404 | 1.519149 | 0.073772 | 1.893773 |
| 12 | (Eggs, Meat) | (Cheese) | 0.266667 | 0.501587 | 0.215873 | 0.809524 | 1.613924 | 0.082116 | 2.616667 |
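The output also reports leverage and conviction, which the metric list above does not define. Using the usual definitions, leverage = Support(X, Y) - Support(X) · Support(Y) and conviction = (1 - Support(Y)) / (1 - Confidence(X, Y)), the first filtered rule (Meat, Milk) → Cheese can be reproduced by hand. The basket counts 77, 158 and 64 are back-calculated from that row's supports with N = 315, so treat them as an assumption.

```python
N = 315     # total transactions
n_x = 77    # baskets containing Meat and Milk (0.244444 * 315)
n_y = 158   # baskets containing Cheese (0.501587 * 315)
n_xy = 64   # baskets containing Meat, Milk and Cheese (0.203175 * 315)

support_x, support_y, support_xy = n_x / N, n_y / N, n_xy / N
confidence = support_xy / support_x
lift = confidence / support_y  # same as support_xy / (support_x * support_y)
leverage = support_xy - support_x * support_y
conviction = (1 - support_y) / (1 - confidence)

for name, value in [("confidence", confidence), ("lift", lift),
                    ("leverage", leverage), ("conviction", conviction)]:
    print(f"{name}: {value:.6f}")
```

Leverage measures the absolute gap between observed and expected co-occurrence, while conviction grows as the rule is violated less often than independence would suggest; a high conviction here (about 2.95) says baskets with Meat and Milk rarely lack Cheese.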
