# CLTV Prediction with BG-NBD and Gamma-Gamma

### Business Problem
FLO wants to set a roadmap for its sales and marketing activities. For the company to plan for the medium and long term, it needs an estimate of the potential value that its existing customers will bring in the future.


### Dataset Story
The dataset is built from the past shopping behavior of customers who made their most recent purchases from FLO in 2020-2021 as OmniChannel shoppers (shopping both online and offline).
- master_id: Unique customer ID
- order_channel: The channel of the shopping platform that was used (Android, ios, Desktop, Mobile)
- last_order_channel: The channel used for the most recent purchase
- first_order_date: Date of the customer's first purchase
- last_order_date: Date of the customer's most recent purchase
- last_order_date_online: Date of the customer's most recent online purchase
- last_order_date_offline: Date of the customer's most recent offline purchase
- order_num_total_ever_online: Total number of purchases the customer made online
- order_num_total_ever_offline: Total number of purchases the customer made offline
- customer_value_total_ever_offline: Total amount the customer paid for offline purchases
- customer_value_total_ever_online: Total amount the customer paid for online purchases
- interested_in_categories_12: List of categories the customer shopped in during the last 12 months

### Data Preparation


In [25]:
import numpy as np
import pandas as pd
import datetime as dt
from sklearn.preprocessing import MinMaxScaler

In [26]:
df_ = pd.read_csv("D:\\FLOMusteriSegmentasyonu\\flo_data_20k.csv")
df = df_.copy()

In [27]:
df.head(5)

Unnamed: 0,master_id,order_channel,last_order_channel,first_order_date,last_order_date,last_order_date_online,last_order_date_offline,order_num_total_ever_online,order_num_total_ever_offline,customer_value_total_ever_offline,customer_value_total_ever_online,interested_in_categories_12
0,cc294636-19f0-11eb-8d74-000d3a38a36f,Android App,Offline,2020-10-30,2021-02-26,2021-02-21,2021-02-26,4.0,1.0,139.99,799.38,[KADIN]
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,Android App,Mobile,2017-02-08,2021-02-16,2021-02-16,2020-01-10,19.0,2.0,159.97,1853.58,"[ERKEK, COCUK, KADIN, AKTIFSPOR]"
2,69b69676-1a40-11ea-941b-000d3a38a36f,Android App,Android App,2019-11-27,2020-11-27,2020-11-27,2019-12-01,3.0,2.0,189.97,395.35,"[ERKEK, KADIN]"
3,1854e56c-491f-11eb-806e-000d3a38a36f,Android App,Android App,2021-01-06,2021-01-17,2021-01-17,2021-01-06,1.0,1.0,39.99,81.98,"[AKTIFCOCUK, COCUK]"
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,Desktop,Desktop,2019-08-03,2021-03-07,2021-03-07,2019-08-03,1.0,1.0,49.99,159.99,[AKTIFSPOR]


In [28]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
order_num_total_ever_online,19945.0,3.110855,4.225647,1.0,1.0,2.0,4.0,200.0
order_num_total_ever_offline,19945.0,1.913913,2.06288,1.0,1.0,1.0,2.0,109.0
customer_value_total_ever_offline,19945.0,253.922597,301.532853,10.0,99.99,179.98,319.97,18119.14
customer_value_total_ever_online,19945.0,497.32169,832.601886,12.99,149.98,286.46,578.44,45220.13


In [31]:
# Outlier capping
def outlier_thresholds(dataframe,variable):
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3= dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range.round()
    low_limit = quartile1 - 1.5 * interquantile_range.round()
    return low_limit, up_limit
# Frequency values need to be integers for the CLTV calculation, which is why round() is used above.
# Note that it rounds only the interquantile range, so the limits themselves can still be fractional.
    
def replace_with_tresholds(dataframe,variable):
    low_limit, up_limit = outlier_thresholds(dataframe,variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit
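
To make the effect of the two helpers above concrete, here is a minimal sketch on made-up numbers. The `cap_outliers` name and the toy series are illustrative assumptions, not part of the FLO data, and `Series.clip` is swapped in for the `.loc` assignments since it should cap values the same way.

```python
import pandas as pd

def cap_outliers(series, low_q=0.01, high_q=0.99):
    """Hypothetical helper: cap values outside the quantile-based limits."""
    q_low = series.quantile(low_q)
    q_high = series.quantile(high_q)
    spread = round(q_high - q_low)      # same rounding step as in the notebook
    low_limit = q_low - 1.5 * spread
    up_limit = q_high + 1.5 * spread
    capped = series.clip(lower=low_limit, upper=up_limit)
    return capped, low_limit, up_limit

# Toy data: the values 1..100 plus one extreme observation
toy = pd.Series(list(range(1, 101)) + [1000], dtype=float)
capped, low_limit, up_limit = cap_outliers(toy)
print(low_limit, up_limit)  # limits derived from the 1% and 99% quantiles
print(capped.max())         # the extreme value is pulled down to up_limit
```

Only the single extreme observation changes; everything inside the limits is left untouched.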

In [32]:
replace_with_tresholds(df,"order_num_total_ever_online")
replace_with_tresholds(df,"order_num_total_ever_offline")
replace_with_tresholds(df,"customer_value_total_ever_offline")
replace_with_tresholds(df,"customer_value_total_ever_online")

In [33]:
df["total_order"] = df["order_num_total_ever_online"] + df["order_num_total_ever_offline"]
df["total_spend"] = df["customer_value_total_ever_online"] + df["customer_value_total_ever_offline"]
# New variables for each customer's total number of purchases and total spend

In [34]:
df[["total_order","total_spend"]]

Unnamed: 0,total_order,total_spend
0,5.0,939.37
1,21.0,2013.55
2,5.0,585.32
3,2.0,121.97
4,2.0,209.98
...,...,...
19940,3.0,401.96
19941,2.0,390.47
19942,3.0,632.94
19943,6.0,1009.77


In [35]:
df.dtypes

master_id                             object
order_channel                         object
last_order_channel                    object
first_order_date                      object
last_order_date                       object
last_order_date_online                object
last_order_date_offline               object
order_num_total_ever_online          float64
order_num_total_ever_offline         float64
customer_value_total_ever_offline    float64
customer_value_total_ever_online     float64
interested_in_categories_12           object
total_order                          float64
total_spend                          float64
dtype: object

In [36]:
# Convert the date variables to datetime
df_date = df.loc[:, df.columns.str.contains("date")]
df[df_date.columns] = df_date.apply(pd.to_datetime)

In [37]:
df.dtypes

master_id                                    object
order_channel                                object
last_order_channel                           object
first_order_date                     datetime64[ns]
last_order_date                      datetime64[ns]
last_order_date_online               datetime64[ns]
last_order_date_offline              datetime64[ns]
order_num_total_ever_online                 float64
order_num_total_ever_offline                float64
customer_value_total_ever_offline           float64
customer_value_total_ever_online            float64
interested_in_categories_12                  object
total_order                                 float64
total_spend                                 float64
dtype: object

### Building the CLTV Data Structure


In [38]:
date_temp = df["last_order_date"].max()

In [39]:
date_temp

Timestamp('2021-05-30 00:00:00')

In [41]:
# The analysis date is set to 2 days after the most recent purchase in the dataset
today_date = dt.datetime(2021,6,1)

In [42]:
today_date

datetime.datetime(2021, 6, 1, 0, 0)

In [43]:
cltv_df = pd.DataFrame()

- recency: time between the customer's first and last purchase, in weeks
- T: customer age, in weeks (how long before the analysis date the first purchase was made)
- frequency: total number of purchases (only customers with frequency > 1 are kept)
- monetary: average spend per purchase
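
As a quick worked example of these four definitions, the sketch below computes them by hand for the first customer shown in `df.head(5)`. It uses only the standard library; the variable names are illustrative.

```python
from datetime import date

# Values copied from the first row of the dataset shown earlier
first_order = date(2020, 10, 30)
last_order = date(2021, 2, 26)
analysis_date = date(2021, 6, 1)
total_orders = 5            # 4 online + 1 offline
total_spend = 939.37        # 799.38 online + 139.99 offline

recency_weeks = (last_order - first_order).days / 7   # first to last purchase
T_weeks = (analysis_date - first_order).days / 7      # first purchase to analysis date
monetary = total_spend / total_orders                 # average spend per purchase

print(recency_weeks, round(T_weeks, 6), round(monetary, 3))
# 17.0 30.571429 187.874
```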

In [44]:
# Build the cltv dataframe
cltv_df["customer_id"] = df["master_id"]
# Dividing by a one-day Timedelta yields float days and avoids astype("timedelta64[D]"), which newer pandas versions reject
cltv_df["recency"] = (df["last_order_date"] - df["first_order_date"]) / pd.Timedelta(days=1)
cltv_df["T"] = (today_date - df["first_order_date"]) / pd.Timedelta(days=1)
cltv_df["frequency"] = df["total_order"]
cltv_df["monetary"] = df["total_spend"]

In [45]:
cltv_df

Unnamed: 0,customer_id,recency,T,frequency,monetary
0,cc294636-19f0-11eb-8d74-000d3a38a36f,119.0,214.0,5.0,939.37
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,1469.0,1574.0,21.0,2013.55
2,69b69676-1a40-11ea-941b-000d3a38a36f,366.0,552.0,5.0,585.32
3,1854e56c-491f-11eb-806e-000d3a38a36f,11.0,146.0,2.0,121.97
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,582.0,668.0,2.0,209.98
...,...,...,...,...,...
19940,727e2b6e-ddd4-11e9-a848-000d3a38a36f,288.0,619.0,3.0,401.96
19941,25cd53d4-61bf-11ea-8dd8-000d3a38a36f,296.0,457.0,2.0,390.47
19942,8aea4c2a-d6fc-11e9-93bc-000d3a38a36f,621.0,629.0,3.0,632.94
19943,e50bb46c-ff30-11e9-a5e8-000d3a38a36f,689.0,797.0,6.0,1009.77


In [46]:
cltv_df["monetary"] = cltv_df["monetary"] / cltv_df["frequency"]
# Monetary is now the average spend per purchase

In [47]:
cltv_df = cltv_df[(cltv_df["frequency"] > 1)].copy()  # copy() avoids SettingWithCopyWarning on the assignments below
cltv_df["recency"] = cltv_df["recency"] / 7
cltv_df["T"] = cltv_df["T"] / 7
# recency and T are now expressed in weeks

In [48]:
cltv_df

Unnamed: 0,customer_id,recency,T,frequency,monetary
0,cc294636-19f0-11eb-8d74-000d3a38a36f,17.000000,30.571429,5.0,187.874000
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,209.857143,224.857143,21.0,95.883333
2,69b69676-1a40-11ea-941b-000d3a38a36f,52.285714,78.857143,5.0,117.064000
3,1854e56c-491f-11eb-806e-000d3a38a36f,1.571429,20.857143,2.0,60.985000
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,83.142857,95.428571,2.0,104.990000
...,...,...,...,...,...
19940,727e2b6e-ddd4-11e9-a848-000d3a38a36f,41.142857,88.428571,3.0,133.986667
19941,25cd53d4-61bf-11ea-8dd8-000d3a38a36f,42.285714,65.285714,2.0,195.235000
19942,8aea4c2a-d6fc-11e9-93bc-000d3a38a36f,88.714286,89.857143,3.0,210.980000
19943,e50bb46c-ff30-11e9-a5e8-000d3a38a36f,98.428571,113.857143,6.0,168.295000


### Fitting the BG/NBD and Gamma-Gamma Models and Calculating CLTV

In [49]:
pip install lifetimes

Note: you may need to restart the kernel to use updated packages.


In [50]:
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter

In [51]:
bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(cltv_df["frequency"],cltv_df["recency"],cltv_df["T"])
# Fit the BG/NBD model


<lifetimes.BetaGeoFitter: fitted with 19945 subjects, a: 0.00, alpha: 76.17, b: 0.00, r: 3.66>

Predict the purchases expected from each customer over the next 3 months (12 weeks) and add them to the cltv dataframe as exp_sales_3_month.
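
For intuition about where these predictions come from: the fit summary above prints the dropout parameters `a` and `b` as 0.00, so the model should behave almost like a plain gamma-Poisson (NBD) model, in which the expected number of purchases over `t` weeks is roughly `(r + x) / (alpha + T) * t`. The sketch below is an approximation under that no-dropout assumption, using the rounded `r` and `alpha` from the summary; it is not a replacement for `bgf.predict`, and the function name is hypothetical.

```python
def expected_purchases_nbd(t, frequency, T, r=3.66, alpha=76.17):
    """Gamma-Poisson approximation of expected purchases in the next t weeks.

    Assumes no customer dropout (a = b = 0). r and alpha are the rounded
    values printed in the fit summary, so results match only approximately.
    """
    return (r + frequency) / (alpha + T) * t

# First customer in cltv_df: frequency = 5, T = 30.571429 weeks
three_months = expected_purchases_nbd(12, 5, 30.571429)
six_months = expected_purchases_nbd(24, 5, 30.571429)
print(round(three_months, 2), round(six_months, 2))
# 0.97 1.95
```

Under this approximation the forecast is linear in `t`, so a 24-week prediction is exactly twice the 12-week one.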

In [52]:
cltv_df["exp_sales_3_month"] = bgf.predict(12,cltv_df["frequency"],cltv_df["recency"],cltv_df["T"])

In [53]:
cltv_df

Unnamed: 0,customer_id,recency,T,frequency,monetary,exp_sales_3_month
0,cc294636-19f0-11eb-8d74-000d3a38a36f,17.000000,30.571429,5.0,187.874000,0.973927
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,209.857143,224.857143,21.0,95.883333,0.983161
2,69b69676-1a40-11ea-941b-000d3a38a36f,52.285714,78.857143,5.0,117.064000,0.670586
3,1854e56c-491f-11eb-806e-000d3a38a36f,1.571429,20.857143,2.0,60.985000,0.700412
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,83.142857,95.428571,2.0,104.990000,0.396039
...,...,...,...,...,...,...
19940,727e2b6e-ddd4-11e9-a848-000d3a38a36f,41.142857,88.428571,3.0,133.986667,0.485785
19941,25cd53d4-61bf-11ea-8dd8-000d3a38a36f,42.285714,65.285714,2.0,195.235000,0.480429
19942,8aea4c2a-d6fc-11e9-93bc-000d3a38a36f,88.714286,89.857143,3.0,210.980000,0.481605
19943,e50bb46c-ff30-11e9-a5e8-000d3a38a36f,98.428571,113.857143,6.0,168.295000,0.610224


Predict the purchases expected from each customer over the next 6 months (24 weeks) and add them to the cltv dataframe as exp_sales_6_month.

In [54]:
cltv_df["exp_sales_6_month"] = bgf.predict(24,cltv_df["frequency"],cltv_df["recency"],cltv_df["T"])

In [55]:
cltv_df.sort_values("exp_sales_6_month",ascending=False)

Unnamed: 0,customer_id,recency,T,frequency,monetary,exp_sales_3_month,exp_sales_6_month
7330,a4d534a2-5b1b-11eb-8dbd-000d3a38a36f,62.714286,67.285714,52.5,164.637912,4.697962,9.395924
15611,4a7e875e-e6ce-11ea-8f44-000d3a38a36f,39.714286,40.000000,29.0,165.297586,3.373958,6.747915
8328,1902bf80-0035-11eb-8341-000d3a38a36f,28.857143,33.285714,25.0,97.439600,3.142396,6.284792
19538,55d54d9e-8ac7-11ea-8ec0-000d3a38a36f,52.571429,58.714286,31.0,228.530000,3.083779,6.167558
14373,f00ad516-c4f4-11ea-98f7-000d3a38a36f,38.000000,46.428571,27.0,141.354815,3.001287,6.002574
...,...,...,...,...,...,...,...
14562,7753092e-a69e-11e9-a2fc-000d3a38a36f,330.000000,374.714286,2.0,97.855000,0.150727,0.301453
7770,9b976186-a6cb-11e9-a2fc-000d3a38a36f,339.142857,377.000000,2.0,80.240000,0.149966,0.299933
11232,f486e45e-a691-11e9-a2fc-000d3a38a36f,350.571429,378.142857,2.0,45.240000,0.149589,0.299178
19823,4eb38320-a691-11e9-a2fc-000d3a38a36f,361.285714,378.428571,2.0,64.985000,0.149495,0.298990


In [56]:
ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(cltv_df["frequency"].astype(int),cltv_df["monetary"])
# Fit the Gamma-Gamma model

<lifetimes.GammaGammaFitter: fitted with 19945 subjects, p: 4.15, q: 0.47, v: 4.08>

In [57]:
cltv_df["expected_average_profit"] = ggf.conditional_expected_average_profit(cltv_df["frequency"],cltv_df["monetary"])
# Estimate the average value each customer is expected to bring and add it to the cltv dataframe as expected_average_profit
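
The Gamma-Gamma conditional expectation has a simple closed form, `p * (v + x * m) / (p * x + q - 1)`, where `x` is the frequency and `m` the observed average spend. The sketch below evaluates it with the rounded `p`, `q`, `v` from the fit summary above. It is meant as a sanity check on the idea, assuming the library follows this standard formula; the function name is hypothetical.

```python
def expected_avg_value(frequency, monetary, p=4.15, q=0.47, v=4.08):
    """Gamma-Gamma conditional expectation of the average transaction value.

    p, q, v are the rounded parameters from the fit summary (an approximation).
    """
    return p * (v + frequency * monetary) / (p * frequency + q - 1)

# First customer in cltv_df: frequency = 5, monetary = 187.874
estimate = expected_avg_value(5, 187.874)
print(round(estimate, 1))
# 193.6
```

The estimate sits slightly above the observed average of 187.874, and the adjustment shrinks as frequency grows because the customer's own history carries more weight.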

In [58]:
cltv_df

Unnamed: 0,customer_id,recency,T,frequency,monetary,exp_sales_3_month,exp_sales_6_month,expected_average_profit
0,cc294636-19f0-11eb-8d74-000d3a38a36f,17.000000,30.571429,5.0,187.874000,0.973927,1.947853,193.632662
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,209.857143,224.857143,21.0,95.883333,0.983161,1.966323,96.665046
2,69b69676-1a40-11ea-941b-000d3a38a36f,52.285714,78.857143,5.0,117.064000,0.670586,1.341172,120.967609
3,1854e56c-491f-11eb-806e-000d3a38a36f,1.571429,20.857143,2.0,60.985000,0.700412,1.400824,67.320131
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,83.142857,95.428571,2.0,104.990000,0.396039,0.792077,114.325083
...,...,...,...,...,...,...,...,...
19940,727e2b6e-ddd4-11e9-a848-000d3a38a36f,41.142857,88.428571,3.0,133.986667,0.485785,0.971569,141.360353
19941,25cd53d4-61bf-11ea-8dd8-000d3a38a36f,42.285714,65.285714,2.0,195.235000,0.480429,0.960859,210.722354
19942,8aea4c2a-d6fc-11e9-93bc-000d3a38a36f,88.714286,89.857143,3.0,210.980000,0.481605,0.963210,221.775178
19943,e50bb46c-ff30-11e9-a5e8-000d3a38a36f,98.428571,113.857143,6.0,168.295000,0.610224,1.220448,172.647445


In [59]:
cltv_df["cltv"] = ggf.customer_lifetime_value(bgf,
                                  cltv_df["frequency"],
                                  cltv_df["recency"],
                                  cltv_df["T"],
                                  cltv_df["monetary"],
                                  time=6,
                                  freq="W",
                                  discount_rate=0.01)
# 6-month CLTV calculation with the BG/NBD and Gamma-Gamma models
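
To show roughly how the pieces combine, the sketch below rebuilds a 6-month CLTV by hand for the first customer: expected orders per month times expected average value, discounted once per month. Two assumptions are baked in and labeled in the code: a constant weekly purchase rate (reasonable here because the 6-month forecast is twice the 3-month one) and a conversion of 4.345 weeks per month. The function name is hypothetical and this is not the library's implementation.

```python
def discounted_clv(weekly_rate, avg_value, months=6, discount_rate=0.01,
                   weeks_per_month=4.345):
    """Sum monthly expected revenue, discounted once per month.

    Assumptions: constant weekly purchase rate, and weeks_per_month = 4.345
    as the conversion between weekly time units and months.
    """
    clv = 0.0
    for month in range(1, months + 1):
        expected_orders = weekly_rate * weeks_per_month
        clv += avg_value * expected_orders / (1 + discount_rate) ** month
    return clv

# First customer: 0.973927 expected orders in 12 weeks,
# 193.632662 expected average value per order
weekly_rate = 0.973927 / 12
clv = discounted_clv(weekly_rate, 193.632662)
print(round(clv, 1))
# 395.7
```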

In [60]:
cltv_df

Unnamed: 0,customer_id,recency,T,frequency,monetary,exp_sales_3_month,exp_sales_6_month,expected_average_profit,cltv
0,cc294636-19f0-11eb-8d74-000d3a38a36f,17.000000,30.571429,5.0,187.874000,0.973927,1.947853,193.632662,395.733200
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,209.857143,224.857143,21.0,95.883333,0.983161,1.966323,96.665046,199.430689
2,69b69676-1a40-11ea-941b-000d3a38a36f,52.285714,78.857143,5.0,117.064000,0.670586,1.341172,120.967609,170.224170
3,1854e56c-491f-11eb-806e-000d3a38a36f,1.571429,20.857143,2.0,60.985000,0.700412,1.400824,67.320131,98.945505
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,83.142857,95.428571,2.0,104.990000,0.396039,0.792077,114.325083,95.011638
...,...,...,...,...,...,...,...,...,...
19940,727e2b6e-ddd4-11e9-a848-000d3a38a36f,41.142857,88.428571,3.0,133.986667,0.485785,0.971569,141.360353,144.101694
19941,25cd53d4-61bf-11ea-8dd8-000d3a38a36f,42.285714,65.285714,2.0,195.235000,0.480429,0.960859,210.722354,212.440731
19942,8aea4c2a-d6fc-11e9-93bc-000d3a38a36f,88.714286,89.857143,3.0,210.980000,0.481605,0.963210,221.775178,224.130740
19943,e50bb46c-ff30-11e9-a5e8-000d3a38a36f,98.428571,113.857143,6.0,168.295000,0.610224,1.220448,172.647445,221.078892


In [61]:
# Top 20 customers by CLTV
cltv_df.sort_values("cltv",ascending=False).head(20)

Unnamed: 0,customer_id,recency,T,frequency,monetary,exp_sales_3_month,exp_sales_6_month,expected_average_profit,cltv
9055,47a642fe-975b-11eb-8c2a-000d3a38a36f,2.857143,7.857143,4.0,1401.7867,1.094385,2.188769,1449.046567,3327.745116
13880,7137a5c0-7aad-11ea-8f20-000d3a38a36f,6.142857,13.142857,11.0,758.068218,1.970108,3.940216,767.343132,3172.322169
17323,f59053e2-a503-11e9-a2fc-000d3a38a36f,51.714286,101.0,7.0,1106.467143,0.722238,1.444476,1127.611454,1708.981954
12438,625f40a2-5bd2-11ea-98b0-000d3a38a36f,74.285714,74.571429,16.0,501.8619,1.565309,3.130618,506.154706,1662.57421
7330,a4d534a2-5b1b-11eb-8dbd-000d3a38a36f,62.714286,67.285714,52.5,164.637912,4.697962,9.395924,165.117026,1627.792555
8868,9ce6e520-89b0-11ea-a6e7-000d3a38a36f,3.428571,34.428571,8.0,601.22625,1.265456,2.530912,611.492582,1623.812595
6402,851de3b4-8f0c-11eb-8cb8-000d3a38a36f,8.285714,9.428571,2.0,862.69,0.793924,1.587847,923.679751,1538.855549
6666,53fe00d4-7b7a-11eb-960b-000d3a38a36f,9.714286,13.0,17.0,259.865294,2.780689,5.561378,262.0729,1529.227957
19538,55d54d9e-8ac7-11ea-8ec0-000d3a38a36f,52.571429,58.714286,31.0,228.53,3.083779,6.167558,229.606942,1485.819136
14858,031b2954-6d28-11eb-99c4-000d3a38a36f,14.857143,15.571429,3.0,743.586667,0.871564,1.743128,778.050253,1422.999459


### Creating Segments by CLTV

In [62]:
cltv_df["segment"] = pd.qcut(cltv_df["cltv"], 4, labels=["D","C","B","A"])
# Split all customers into 4 groups (segments) by their 6-month CLTV and add the segment to the dataset
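
As a small illustration of how `pd.qcut` assigns the labels, the sketch below uses eight made-up CLTV values. With four quantile bins, the lowest quarter gets "D" and the highest gets "A".

```python
import pandas as pd

# Made-up CLTV values, already sorted for readability
toy_cltv = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])
segments = pd.qcut(toy_cltv, 4, labels=["D", "C", "B", "A"])
print(segments.tolist())
# ['D', 'D', 'C', 'C', 'B', 'B', 'A', 'A']
```

Because the bins are quantile-based, each segment holds about the same number of customers regardless of how skewed the CLTV values are.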

In [63]:
cltv_df.drop(columns="customer_id").groupby("segment").agg(["sum","mean","count"])

Unnamed: 0_level_0,recency,recency,recency,T,T,T,frequency,frequency,frequency,monetary,...,exp_sales_3_month,exp_sales_6_month,exp_sales_6_month,exp_sales_6_month,expected_average_profit,expected_average_profit,expected_average_profit,cltv,cltv,cltv
Unnamed: 0_level_1,sum,mean,count,sum,mean,count,sum,mean,count,sum,...,count,sum,mean,count,sum,mean,count,sum,mean,count
segment,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
D,693193.857143,139.000172,4987,808807.714286,162.183219,4987,18795.0,3.768799,4987,464547.0,...,4987,4078.327732,0.817792,4987,492172.4,98.691071,4987,400657.9,80.340465,4987
C,461850.857143,92.629534,4986,562512.142857,112.81832,4986,21962.0,4.404733,4986,627181.6,...,4986,5239.769227,1.050896,4986,659401.4,132.250574,4986,689621.1,138.31149,4986
B,408794.0,81.988367,4986,500228.0,100.326514,4986,25392.5,5.09276,4986,800933.1,...,4986,5994.243721,1.202215,4986,837649.9,168.000385,4986,994870.5,199.532792,4986
A,336191.714286,67.427139,4986,411592.857143,82.549711,4986,33146.5,6.647914,4986,1140934.0,...,4986,7709.068468,1.546143,4986,1186769.0,238.02034,4986,1806499.0,362.314355,4986
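
The segment summary above can be hard to read in its wide form. Here is a minimal sketch of the same aggregation pattern on a made-up frame, showing how individual cells can be pulled out of the resulting two-level columns. The data and names are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    "segment": ["A", "A", "B", "B"],
    "cltv": [300.0, 500.0, 100.0, 200.0],
    "frequency": [6.0, 8.0, 2.0, 4.0],
})

# A list of aggregations gives a (column, statistic) MultiIndex on the columns
summary = toy.groupby("segment").agg(["sum", "mean", "count"])
print(summary)

# Select one statistic for one column across all segments
print(summary[("cltv", "mean")])
```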
