# Business Problem

## What Are Association Rules?

Association rule learning is a rule-based machine learning technique used to find patterns (relationships, structures) in data.

Association analysis is among the most common applications in data science. You have most likely encountered it in the form of recommendation systems.

You may have seen these applications as "customers who bought that product also bought this one", "people who viewed that listing also viewed these listings", "we created a playlist for you", or "recommended as the next video".

These are the scenarios that come up most often in e-commerce data science and data mining work.

Large e-commerce companies in Turkey and around the world, along with many platforms we know more closely such as Spotify, Amazon, and Netflix, use recommendation systems.

So, in short, what do these association analyses do?

## Apriori Algorithm

It is the most widely used method in this field.

Association rule analysis is carried out by examining a few metrics:

- Support
  - Support(X, Y) = Freq(X, Y) / N
  - X: an item, Y: an item, N: total number of transactions
  - The share of all transactions that contain both X and Y.

- Confidence
  - Confidence(X, Y) = Freq(X, Y) / Freq(X)
  - Among the transactions that contain X, the share that also contain Y.

- Lift
  - Lift = Support(X, Y) / (Support(X) * Support(Y))
  - How much more often X and Y occur together than they would if they were independent; a value above 1 indicates a positive association.
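
The three metrics can be checked by hand on a few baskets. The sketch below is an illustration rather than part of the original notebook: it uses the first five baskets of the grocery data set loaded later on, and the helper names `support`, `confidence`, and `lift` are hypothetical, not mlxtend functions.

```python
# Hand-rolled versions of the three metrics (illustrative helper names,
# not functions from mlxtend).
baskets = [
    {"MILK", "BREAD", "BISCUIT"},
    {"BREAD", "MILK", "BISCUIT", "CORNFLAKES"},
    {"BREAD", "TEA", "BOURNVITA"},
    {"JAM", "MAGGI", "BREAD", "MILK"},
    {"MAGGI", "TEA", "BISCUIT"},
]


def support(items, baskets):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(x, y, baskets):
    """Among baskets containing x, the share that also contain y."""
    return support(set(x) | set(y), baskets) / support(x, baskets)


def lift(x, y, baskets):
    """Joint support divided by the product of the individual supports."""
    joint = support(set(x) | set(y), baskets)
    return joint / (support(x, baskets) * support(y, baskets))


print("support(MILK, BREAD)    =", support({"MILK", "BREAD"}, baskets))
print("confidence(MILK->BREAD) =", confidence({"MILK"}, {"BREAD"}, baskets))
print("lift(MILK->BREAD)       =", lift({"MILK"}, {"BREAD"}, baskets))
```

In this five-basket sample MILK and BREAD share 3 of 5 baskets, every MILK basket also contains BREAD, and the lift comes out at about 1.25. The figures for the full 20-basket data set differ.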

In [67]:
#pip install mlxtend  # run this if you don't have the mlxtend library

In [68]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

In [69]:
df = pd.read_csv('C:\\Users\\merta\\data\\GroceryStoreDataSet.csv',names=['products'],header=None) 
# import data to dataframe

In [70]:
df.head() # see first rows of dataset

| | products |
|---|---|
| 0 | MILK,BREAD,BISCUIT |
| 1 | BREAD,MILK,BISCUIT,CORNFLAKES |
| 2 | BREAD,TEA,BOURNVITA |
| 3 | JAM,MAGGI,BREAD,MILK |
| 4 | MAGGI,TEA,BISCUIT |


In [71]:
data = list(df["products"].apply(lambda x:x.split(',')))

data  # parse all rows: each comma-separated value becomes a list item, so data is a list of lists

[['MILK', 'BREAD', 'BISCUIT'],
 ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['JAM', 'MAGGI', 'BREAD', 'MILK'],
 ['MAGGI', 'TEA', 'BISCUIT'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['MAGGI', 'TEA', 'CORNFLAKES'],
 ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
 ['JAM', 'MAGGI', 'BREAD', 'TEA'],
 ['BREAD', 'MILK'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'COCK'],
 ['BREAD', 'SUGER', 'BISCUIT'],
 ['COFFEE', 'SUGER', 'CORNFLAKES'],
 ['BREAD', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES']]

In [72]:
df = pd.DataFrame(data)

In [73]:
df.head()

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | MILK | BREAD | BISCUIT | None |
| 1 | BREAD | MILK | BISCUIT | CORNFLAKES |
| 2 | BREAD | TEA | BOURNVITA | None |
| 3 | JAM | MAGGI | BREAD | MILK |
| 4 | MAGGI | TEA | BISCUIT | None |


In [74]:
df.shape

(20, 4)

In [75]:
list(df.iterrows())  # each element is an (index, Series) tuple; each Series is one shopping basket

[(0,
  0       MILK
  1      BREAD
  2    BISCUIT
  3       None
  Name: 0, dtype: object),
 (1,
  0         BREAD
  1          MILK
  2       BISCUIT
  3    CORNFLAKES
  Name: 1, dtype: object),
 (2,
  0        BREAD
  1          TEA
  2    BOURNVITA
  3         None
  Name: 2, dtype: object),
 (3,
  0      JAM
  1    MAGGI
  2    BREAD
  3     MILK
  Name: 3, dtype: object),
 (4,
  0      MAGGI
  1        TEA
  2    BISCUIT
  3       None
  Name: 4, dtype: object),
 (5,
  0        BREAD
  1          TEA
  2    BOURNVITA
  3         None
  Name: 5, dtype: object),
 (6,
  0         MAGGI
  1           TEA
  2    CORNFLAKES
  3          None
  Name: 6, dtype: object),
 (7,
  0      MAGGI
  1      BREAD
  2        TEA
  3    BISCUIT
  Name: 7, dtype: object),
 (8,
  0      JAM
  1    MAGGI
  2    BREAD
  3      TEA
  Name: 8, dtype: object),
 (9,
  0    BREAD
  1     MILK
  2     None
  3     None
  Name: 9, dtype: object),
 (10,
  0        COFFEE
  1          COCK
  2       BISCUIT
  3 

In [76]:
from mlxtend.preprocessing import TransactionEncoder  # TransactionEncoder does what we would otherwise do by writing our own function

In [81]:
# one-hot encoding, first way: TransactionEncoder
te = TransactionEncoder()
te_data = te.fit(data).transform(data)
ohe_df = pd.DataFrame(te_data, columns=te.columns_)
ohe_df  # one-hot encoded baskets: every unique item becomes a column of the DataFrame

| | BISCUIT | BOURNVITA | BREAD | COCK | COFFEE | CORNFLAKES | JAM | MAGGI | MILK | SUGER | TEA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | False | True | False | False | False | False | False | True | False | False |
| 1 | True | False | True | False | False | True | False | False | True | False | False |
| 2 | False | True | True | False | False | False | False | False | False | False | True |
| 3 | False | False | True | False | False | False | True | True | True | False | False |
| 4 | True | False | False | False | False | False | False | True | False | False | True |
| 5 | False | True | True | False | False | False | False | False | False | False | True |
| 6 | False | False | False | False | False | True | False | True | False | False | True |
| 7 | True | False | True | False | False | False | False | True | False | False | True |
| 8 | False | False | True | False | False | False | True | True | False | False | True |
| 9 | False | False | True | False | False | False | False | False | True | False | False |


In [79]:
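items = sorted({item for basket in data for item in basket})  # every unique item across all baskets, not only the first column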

In [80]:
# manual way of one-hot encoding
#encoded_vals = []
#for index, row in df.iterrows():
#    labels = {}
#    uncommons = list(set(items) - set(row))       # items missing from this basket
#    commons = list(set(items).intersection(row))  # items present in this basket
#
#    for uc in uncommons:
#        labels[uc] = 0
#    for com in commons:
#        labels[com] = 1
#    encoded_vals.append(labels)

In [56]:
#ohe_df = pd.DataFrame(encoded_vals) # create dataframe with list of dictionaries

In [82]:
ohe_df.head() # show first 5 rows

| | BISCUIT | BOURNVITA | BREAD | COCK | COFFEE | CORNFLAKES | JAM | MAGGI | MILK | SUGER | TEA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | False | True | False | False | False | False | False | True | False | False |
| 1 | True | False | True | False | False | True | False | False | True | False | False |
| 2 | False | True | True | False | False | False | False | False | False | False | True |
| 3 | False | False | True | False | False | False | True | True | True | False | False |
| 4 | True | False | False | False | False | False | False | True | False | False | True |
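
As a possible alternative to `TransactionEncoder`, pandas can build the same kind of table straight from the comma-separated strings with `Series.str.get_dummies`. The sketch below is an assumption about how that could look, not what the notebook does; the three-row `raw` series and the name `ohe_alt` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample in the same "item,item,item" format as the products column.
raw = pd.Series(
    ["MILK,BREAD,BISCUIT", "BREAD,TEA,BOURNVITA", "MAGGI,TEA,BISCUIT"],
    name="products",
)

# Split each string on commas and create one 0/1 indicator column per item,
# then cast to bool to match the True/False layout shown above.
ohe_alt = raw.str.get_dummies(sep=",").astype(bool)
print(ohe_alt)
```

This route skips the intermediate list of lists, but it only works while the baskets are still stored as delimited strings.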


In [83]:
from mlxtend.frequent_patterns import apriori

In [84]:
freq_items = apriori(ohe_df, min_support=0.02, use_colnames=True, verbose = 1) # support values for items
freq_items

Processing 30 combinations | Sampling itemset size 54


| | support | itemsets |
|---|---|---|
| 0 | 0.35 | (BISCUIT) |
| 1 | 0.20 | (BOURNVITA) |
| 2 | 0.65 | (BREAD) |
| 3 | 0.15 | (COCK) |
| 4 | 0.40 | (COFFEE) |
| ... | ... | ... |
| 78 | 0.05 | (TEA, BREAD, BISCUIT, MAGGI) |
| 79 | 0.10 | (COFFEE, COCK, BISCUIT, CORNFLAKES) |
| 80 | 0.05 | (MILK, JAM, MAGGI, BREAD) |
| 81 | 0.05 | (TEA, BREAD, JAM, MAGGI) |
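
To give a feel for what `apriori` does with the one-hot table, here is a minimal pure-Python sketch of the level-wise search the algorithm is known for. It is not mlxtend's actual implementation, which is vectorised and far more efficient; the function name `apriori_sketch` and the five-basket sample are assumptions made for illustration.

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: keep the single items that meet the support threshold.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate can only be frequent if all of its
        # (k-1)-item subsets are frequent (the Apriori property).
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


baskets = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "MILK", "BISCUIT", "CORNFLAKES"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["JAM", "MAGGI", "BREAD", "MILK"],
    ["MAGGI", "TEA", "BISCUIT"],
]
frequent = apriori_sketch(baskets, min_support=0.4)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what keeps the search tractable: an itemset is only counted once all of its smaller subsets have already passed the threshold.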


In [85]:
df_ar = association_rules(freq_items, metric="confidence", min_threshold=0.6)  # association rules with a minimum confidence of 0.6

In [86]:
df_ar.head()

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (COCK) | (BISCUIT) | 0.15 | 0.35 | 0.1 | 0.666667 | 1.904762 | 0.0475 | 1.95 |
| 1 | (BOURNVITA) | (BREAD) | 0.2 | 0.65 | 0.15 | 0.75 | 1.153846 | 0.02 | 1.4 |
| 2 | (JAM) | (BREAD) | 0.1 | 0.65 | 0.1 | 1.0 | 1.538462 | 0.035 | inf |
| 3 | (MAGGI) | (BREAD) | 0.25 | 0.65 | 0.15 | 0.6 | 0.923077 | -0.0125 | 0.875 |
| 4 | (MILK) | (BREAD) | 0.25 | 0.65 | 0.2 | 0.8 | 1.230769 | 0.0375 | 1.75 |
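
The `leverage` and `conviction` columns are not defined earlier in the notebook. The sketch below recomputes them for two rows of the table, using the standard definitions as given in the mlxtend documentation; the helper function names are hypothetical.

```python
import math


def leverage(support_xy, support_x, support_y):
    """Observed joint support minus the support expected under independence."""
    return support_xy - support_x * support_y


def conviction(support_y, confidence_xy):
    """(1 - support(Y)) / (1 - confidence(X -> Y)); infinite when confidence is 1."""
    if confidence_xy == 1.0:
        return math.inf
    return (1 - support_y) / (1 - confidence_xy)


# MILK -> BREAD: support 0.20, antecedent support 0.25, consequent support 0.65
print("leverage   MILK->BREAD:", leverage(0.20, 0.25, 0.65))
print("conviction MILK->BREAD:", conviction(0.65, 0.80))

# JAM -> BREAD has confidence 1.0, so conviction is reported as inf
print("conviction JAM->BREAD: ", conviction(0.65, 1.0))
```

Up to floating-point rounding, these reproduce the 0.0375, 1.75, and `inf` values in the table. A negative leverage, as in the MAGGI and BREAD row, means the pair occurs together less often than independence would predict.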


In [87]:
pd.DataFrame(df_ar["support"].sort_values(ascending = False))

| | support |
|---|---|
| 9 | 0.20 |
| 4 | 0.20 |
| 5 | 0.20 |
| 8 | 0.20 |
| 11 | 0.20 |
| ... | ... |
| 30 | 0.05 |
| 29 | 0.05 |
| 58 | 0.05 |
| 59 | 0.05 |


In [88]:
df_ar[(df_ar.support > 0.1) & (df_ar.confidence > 0.7)] #association rules for min support and confidence thresholds

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 1 | (BOURNVITA) | (BREAD) | 0.2 | 0.65 | 0.15 | 0.75 | 1.153846 | 0.02 | 1.4 |
| 4 | (MILK) | (BREAD) | 0.25 | 0.65 | 0.2 | 0.8 | 1.230769 | 0.0375 | 1.75 |
| 6 | (COCK) | (COFFEE) | 0.15 | 0.4 | 0.15 | 1.0 | 2.5 | 0.09 | inf |
| 11 | (MAGGI) | (TEA) | 0.25 | 0.35 | 0.2 | 0.8 | 2.285714 | 0.1125 | 3.25 |


### Comments

1.
- MILK and BREAD appear together in 20% of all transactions.
- 80% of customers who buy MILK also buy BREAD.
- COCK and COFFEE appear together in 15% of all transactions.
- All customers who buy COCK also buy COFFEE.
- MAGGI and TEA appear together in 20% of all transactions.
- 80% of customers who buy MAGGI also buy TEA.
2.
- For the COCK and COFFEE pair, every customer who buys COCK also buys COFFEE. The lift is high at 2.5: a basket that contains COCK is 2.5 times as likely to contain COFFEE as an average basket.
- COFFEE should be placed right next to COCK on the shelf. Beyond that, promotions could be run, such as a discount for customers who buy 5-10 units of COFFEE together with COCK. Since customers who buy COCK in this data set always buy COFFEE as well, this relationship can be used to increase COFFEE sales volume. Widening the COFFEE product range could also raise the profit from sales made together with COCK.
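
The lift of 2.5 for COCK and COFFEE can be traced back to raw basket counts. The sketch below is an illustrative check, not a cell from the original notebook; it re-enters the 20 baskets printed earlier and uses only plain Python.

```python
# The 20 baskets as printed in the notebook's `data` output.
data = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "MILK", "BISCUIT", "CORNFLAKES"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["JAM", "MAGGI", "BREAD", "MILK"],
    ["MAGGI", "TEA", "BISCUIT"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["MAGGI", "TEA", "CORNFLAKES"],
    ["MAGGI", "BREAD", "TEA", "BISCUIT"],
    ["JAM", "MAGGI", "BREAD", "TEA"],
    ["BREAD", "MILK"],
    ["COFFEE", "COCK", "BISCUIT", "CORNFLAKES"],
    ["COFFEE", "COCK", "BISCUIT", "CORNFLAKES"],
    ["COFFEE", "SUGER", "BOURNVITA"],
    ["BREAD", "COFFEE", "COCK"],
    ["BREAD", "SUGER", "BISCUIT"],
    ["COFFEE", "SUGER", "CORNFLAKES"],
    ["BREAD", "SUGER", "BOURNVITA"],
    ["BREAD", "COFFEE", "SUGER"],
    ["BREAD", "COFFEE", "SUGER"],
    ["TEA", "MILK", "COFFEE", "CORNFLAKES"],
]

n = len(data)
n_cock = sum("COCK" in basket for basket in data)
n_coffee = sum("COFFEE" in basket for basket in data)
n_both = sum("COCK" in basket and "COFFEE" in basket for basket in data)

support_both = n_both / n                 # share of baskets with both items
confidence = n_both / n_cock              # share of COCK baskets that contain COFFEE
lift = confidence / (n_coffee / n)        # confidence relative to COFFEE's base rate

print(f"COCK baskets: {n_cock}, COFFEE baskets: {n_coffee}, both: {n_both}")
print(f"support={support_both:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

The counts show that the rule rests on only three baskets containing COCK, which is worth keeping in mind before acting on it in a real store.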