# Customer Segmentation with RFM Analysis

## Business Problem

FLO, an online shoe store, wants to divide its customers into segments and define marketing
strategies for each segment. To do this, customer behaviors will be defined and groups will be
formed based on how those behaviors cluster.

## Dataset Story

The dataset consists of information derived from the past shopping behavior of customers who made their most recent
purchases from FLO in 2020-2021 as OmniChannel shoppers (shopping both online and offline).

### 📊 Dataset Variables

| Variable                            | Description                                                              |
|-------------------------------------|--------------------------------------------------------------------------|
| master_id                           | Unique customer number                                                   |
| order_channel                       | Platform used for shopping (Android, iOS, Desktop, Mobile)               |
| last_order_channel                  | Channel of the most recent purchase                                      |
| first_order_date                    | Date of the customer's first purchase                                    |
| last_order_date                     | Date of the customer's most recent purchase                              |
| last_order_date_online              | Date of the customer's most recent purchase on the online platform       |
| last_order_date_offline             | Date of the customer's most recent purchase on the offline platform      |
| order_num_total_ever_online         | Total number of purchases on the online platform                         |
| order_num_total_ever_offline        | Total number of purchases on the offline platform                        |
| customer_value_total_ever_offline   | Total amount paid for offline purchases                                  |
| customer_value_total_ever_online    | Total amount paid for online purchases                                   |
| interested_in_categories_12         | Categories the customer shopped in during the last 12 months             |


## Setup

In [None]:
import datetime as dt
import pandas as pd

pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

## 1. Data Understanding and Preparation

In [None]:
# Read the dataset and create a working copy

df_ = pd.read_csv('/kaggle/input/flo-data/flo_data_20k.csv')
df = df_.copy()

In [None]:
# Inspect the dataset
# a. First 10 observations
df.head(10)

# b. Variable names
df.columns

# c. Descriptive statistics
df.describe().T

# d. Missing values
df.isnull().sum()

# e. Variable types
df.info()

In [None]:
# Omnichannel means that customers shop on both online and offline platforms.
# Create new variables for each customer's total number of orders and total spend.
df["total_order_num"] = df["order_num_total_ever_online"] + df["order_num_total_ever_offline"]
df["total_customer_value"] = df["customer_value_total_ever_offline"] + df["customer_value_total_ever_online"]

In [None]:
# Convert the variables that hold dates to datetime
date_columns = [col for col in df.columns if "date" in col]
df[date_columns] = df[date_columns].apply(pd.to_datetime)
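
The column-name filter above can be tried on a small frame first. The sketch below uses a hypothetical three-column toy frame (not the FLO file) to show that every column whose name contains "date" is parsed in a single pass, while other columns are left alone.

```python
import pandas as pd

# Hypothetical toy data; the notebook itself reads flo_data_20k.csv.
toy = pd.DataFrame({
    "master_id": ["a", "b"],
    "first_order_date": ["2020-10-30", "2017-02-08"],
    "last_order_date": ["2021-02-26", "2021-02-16"],
})

# Pick every column whose name contains "date" and parse them together.
date_columns = [col for col in toy.columns if "date" in col]
toy[date_columns] = toy[date_columns].apply(pd.to_datetime)

# The date columns are now datetime64; master_id is unchanged.
print(toy.dtypes)
```

Because the filter is a plain substring match, any future column with "date" in its name would also be converted, which is worth keeping in mind if the schema changes.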

In [None]:
# Examine how the number of customers, total number of orders, and total spend are distributed across shopping channels
df.groupby("order_channel").agg({"master_id": "count",
                                 "total_order_num": "sum",
                                 "total_customer_value": "sum"})

In [None]:
# Top 10 customers by total revenue
df.groupby("master_id").agg({"total_customer_value": "sum"}).sort_values("total_customer_value", ascending=False)[:10]

In [None]:
# Top 10 customers by number of orders
df.groupby("master_id").agg({"total_order_num": "sum"}).sort_values("total_order_num", ascending=False)[:10]

## Wrapping the Data Preparation Process in a Function

In [None]:
def data_preparation(dataframe):
    """Add total order/value columns, drop rows with missing values, and parse date columns.

    The input dataframe is modified in place and also returned for convenience.
    """
    dataframe["total_order_num"] = dataframe["order_num_total_ever_online"] + dataframe["order_num_total_ever_offline"]
    dataframe["total_customer_value"] = dataframe["customer_value_total_ever_offline"] + dataframe["customer_value_total_ever_online"]
    dataframe.dropna(inplace=True)
    date_columns = [col for col in dataframe.columns if "date" in col]
    dataframe[date_columns] = dataframe[date_columns].apply(pd.to_datetime)
    return dataframe

## 2. Calculating the RFM Metrics

In [None]:
# Analysis date (chosen as 2 days after the latest order date)
df["last_order_date"].max()
today_date = dt.datetime(2021, 6, 1)
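
As an alternative to hard-coding the analysis date, it could be derived from the data. The sketch below assumes a hypothetical series of order dates whose maximum is 2021-05-30 (the date implied by the two-day offset above) and shows that the derived value equals the constant used in this notebook.

```python
import datetime as dt
import pandas as pd

# Hypothetical order dates; only the maximum matters here.
last_orders = pd.Series(pd.to_datetime(["2021-02-26", "2021-05-30", "2020-11-27"]))

# Take the latest order date and shift it forward by two days.
derived_today = last_orders.max() + pd.Timedelta(days=2)

# pd.Timestamp is a datetime subclass, so it compares directly with dt.datetime.
print(derived_today == dt.datetime(2021, 6, 1))
```

Deriving the date keeps recency values meaningful if the notebook is rerun on a newer extract, at the cost of results changing whenever the data changes.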

In [None]:
# Recency, Frequency, Monetary
rfm = df.groupby("master_id").agg({"last_order_date": lambda last_order_date : (today_date - last_order_date.max()).days,
                                   "total_order_num": lambda total_order_num: total_order_num.sum(),
                                   "total_customer_value": lambda total_customer_value: total_customer_value.sum()})

rfm.columns = ['recency', 'frequency', 'monetary']
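
To make the three metrics concrete, here is a minimal sketch on a hypothetical three-customer frame. It uses named aggregation instead of renaming the columns afterwards, which is a stylistic choice for the example rather than what the notebook does.

```python
import datetime as dt
import pandas as pd

# Hypothetical customers with already-computed totals.
toy = pd.DataFrame({
    "master_id": ["a", "b", "c"],
    "last_order_date": pd.to_datetime(["2021-05-30", "2021-05-02", "2020-06-01"]),
    "total_order_num": [5.0, 2.0, 10.0],
    "total_customer_value": [939.37, 159.98, 2013.55],
})
today_date = dt.datetime(2021, 6, 1)

rfm_toy = toy.groupby("master_id").agg(
    # Recency: days between the analysis date and the last order.
    recency=("last_order_date", lambda s: (today_date - s.max()).days),
    # Frequency: total number of orders.
    frequency=("total_order_num", "sum"),
    # Monetary: total amount spent.
    monetary=("total_customer_value", "sum"),
)
print(rfm_toy)
```

Customer `a` ordered two days before the analysis date, so its recency is 2; customer `c` last ordered exactly one year earlier, giving 365.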

## 3. Calculating the RFM and RF Scores

In [None]:
rfm['recency_score'] = pd.qcut(rfm['recency'], 5, labels = [5, 4, 3, 2, 1])

rfm['frequency_score'] = pd.qcut(rfm['frequency'].rank(method="first"), 5, labels = [1, 2, 3, 4, 5])

rfm['monetary_score'] = pd.qcut(rfm['monetary'], 5, labels = [1, 2, 3, 4, 5])

rfm['RF_SCORE'] = (rfm['recency_score'].astype(str) +
                   rfm['frequency_score'].astype(str))
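
The frequency score is computed on ranks rather than raw values because order counts tend to contain many ties. The sketch below, using a hypothetical series of ten frequencies, shows `pd.qcut` failing on the raw values and succeeding once `rank(method="first")` has broken the ties.

```python
import pandas as pd

# Hypothetical frequencies with heavy ties, as is common for order counts.
freq = pd.Series([2, 2, 2, 2, 2, 2, 3, 3, 4, 10])

# On the raw values several quantile edges coincide, so qcut raises.
try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    raw_ok = True
except ValueError:
    raw_ok = False

# Ranking with method="first" gives every row a distinct rank (ties are
# broken by order of appearance), so the quintile edges are unique.
freq_score = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])

print(raw_ok)               # False
print(freq_score.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

One consequence of this choice is that customers with identical frequencies can land in different score bins depending on row order. The recency labels are reversed (`[5, 4, 3, 2, 1]`) because a smaller recency value means a more recent, and therefore better, customer.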

## 4. Defining Segments from the RF Scores

In [None]:
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

rfm['segment'] = rfm['RF_SCORE'].replace(seg_map, regex = True)
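
A quick way to sanity-check the mapping is to run all 25 possible RF scores through it. The sketch below copies `seg_map` as defined above and confirms that every recency/frequency combination is replaced by a segment name, so no score is left unmapped.

```python
import pandas as pd

seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

# Every combination of recency score (1-5) and frequency score (1-5).
all_scores = [f"{r}{f}" for r in range(1, 6) for f in range(1, 6)]
mapped = pd.Series(all_scores).replace(seg_map, regex=True)

# If any value were still a two-digit string, a pattern would be missing.
unmapped = mapped[mapped.str.isdigit()]
print(len(unmapped))  # 0
print(dict(zip(all_scores, mapped)))
```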

In [None]:
# Examine the mean recency, frequency, and monetary values (and the customer count) of each segment
rfm[['segment', 'recency', 'frequency', 'monetary']].groupby('segment').agg(["mean", "count"])

## 5. Finding Customers with the Target Profile Using RFM Analysis and Saving Their IDs to CSV

In [None]:
# a. FLO is adding a new women's shoe brand. The brand's prices are above what customers generally prefer to pay.
# For this reason, FLO wants to contact customers with a suitable profile directly for the brand's promotion and sales.
# These customers should be loyal (champions, loyal_customers) and should have shopped in the women's (KADIN) category.
# Save the IDs of these customers to a CSV file named yeni_marka_hedef_müşteri_id.csv.

yeni_marka_hedef_müşteri_segmenti = rfm[rfm['segment'].isin(['champions', 'loyal_customers'])]
yeni_marka_hedef_müşteri = df[df['master_id'].isin(yeni_marka_hedef_müşteri_segmenti.index) &
                           df['interested_in_categories_12'].str.contains("KADIN")]

yeni_marka_hedef_müşteri['master_id'].to_csv("yeni_marka_hedef_müşteri_id.csv", index=False)

In [None]:
# b. A discount of nearly 40% is planned for men's (ERKEK) and children's (COCUK) products. The campaign should target
# customers interested in these categories who were good customers in the past but have not shopped for a long time
# and must not be lost (cant_loose), those who are hibernating, and new customers.
# Save the IDs of these customers to a CSV file. The task names it indirim_hedef_müşteri_ids.csv;
# this notebook writes kampanya_40_hedef_musteri_id.csv.

kampanya_hedef_musteri_segmenti = rfm[rfm['segment'].isin(['cant_loose', 'hibernating', 'new_customers'])]
# Match on the segment frame's index (master_id), and use a regex alternation to match either category.
kampanya_hedef_musteri = df[df['master_id'].isin(kampanya_hedef_musteri_segmenti.index) &
                         df['interested_in_categories_12'].str.contains("ERKEK|COCUK")]

kampanya_hedef_musteri['master_id'].to_csv("kampanya_40_hedef_musteri_id.csv", index=False)
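
Two details in this kind of filter are easy to get wrong: `isin` has to receive the customer ids (the index of the segment frame), and `str.contains` takes a single pattern, with `case` as its second positional parameter rather than a second search term. The sketch below illustrates both on hypothetical toy frames; the bracketed category strings are an assumption about how the column is formatted.

```python
import pandas as pd

# Hypothetical customers and categories (string format is assumed).
df_toy = pd.DataFrame({
    "master_id": ["a", "b", "c", "d"],
    "interested_in_categories_12": ["[ERKEK]", "[COCUK]", "[KADIN]", "[KADIN, COCUK]"],
})
# Hypothetical RFM result, indexed by master_id like the rfm frame above.
rfm_toy = pd.DataFrame(
    {"segment": ["hibernating", "champions", "new_customers", "cant_loose"]},
    index=pd.Index(["a", "b", "c", "d"], name="master_id"),
)

target_segments = rfm_toy[rfm_toy["segment"].isin(["cant_loose", "hibernating", "new_customers"])]

# Membership test against the ids stored in the index.
in_segment = df_toy["master_id"].isin(target_segments.index)

# A single pattern finds only one category; alternation finds either.
only_erkek = df_toy["interested_in_categories_12"].str.contains("ERKEK")
either = df_toy["interested_in_categories_12"].str.contains("ERKEK|COCUK")

target_ids = df_toy.loc[in_segment & either, "master_id"]
print(only_erkek.tolist())  # [True, False, False, False]
print(either.tolist())      # [True, True, False, True]
print(target_ids.tolist())  # ['a', 'd']
```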

## 6. Wrapping the Entire Process in a Function

In [None]:
def create_rfm(dataframe, csv=False):

    # data preparation
    dataframe["total_order_num"] = (dataframe["order_num_total_ever_online"] +
                                    dataframe["order_num_total_ever_offline"])
    dataframe["total_customer_value"] = (dataframe["customer_value_total_ever_offline"] +
                                         dataframe["customer_value_total_ever_online"])
    dataframe.dropna(inplace=True)
    date_columns = [col for col in dataframe.columns if "date" in col]
    dataframe[date_columns] = dataframe[date_columns].apply(pd.to_datetime)

    # calculating RFM metrics
    # analysis date: 2 days after the latest order date in the data
    today_date = dt.datetime(2021, 6, 1)
    rfm = dataframe.groupby("master_id").agg(
        {"last_order_date": lambda last_order_date: (today_date - last_order_date.max()).days,
         "total_order_num": lambda total_order_num: total_order_num.sum(),
         "total_customer_value": lambda total_customer_value: total_customer_value.sum()})
    rfm.columns = ['recency', 'frequency', 'monetary']

    # calculating RFM scores
    rfm['recency_score'] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])
    rfm['frequency_score'] = pd.qcut(rfm['frequency'].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
    rfm['monetary_score'] = pd.qcut(rfm['monetary'], 5, labels=[1, 2, 3, 4, 5])
    rfm['RF_SCORE'] = (rfm['recency_score'].astype(str) +
                       rfm['frequency_score'].astype(str))

    # creating & analysing RFM segments
    seg_map = {
        r'[1-2][1-2]': 'hibernating',
        r'[1-2][3-4]': 'at_Risk',
        r'[1-2]5': 'cant_loose',
        r'3[1-2]': 'about_to_sleep',
        r'33': 'need_attention',
        r'[3-4][4-5]': 'loyal_customers',
        r'41': 'promising',
        r'51': 'new_customers',
        r'[4-5][2-3]': 'potential_loyalists',
        r'5[4-5]': 'champions'
    }

    rfm['segment'] = rfm['RF_SCORE'].replace(seg_map, regex=True)

    rfm = rfm[['recency', 'frequency', 'monetary', 'segment']]

    if csv:
        rfm.to_csv("rfm.csv")

    return rfm

In [None]:
df = df_.copy()

rfm_new = create_rfm(df, csv=True)
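
`create_rfm` adds columns to the frame it receives and calls `dropna(inplace=True)` on it, which is why a fresh copy of `df_` is passed in. The sketch below shows the same effect with a hypothetical helper, `add_total`, and made-up columns `x` and `y`: the copy is modified, the original is not.

```python
import pandas as pd

def add_total(dataframe):
    # Hypothetical helper mirroring the in-place steps of create_rfm.
    dataframe["total"] = dataframe["x"] + dataframe["y"]
    dataframe.dropna(inplace=True)
    return dataframe

raw = pd.DataFrame({"x": [1.0, 2.0, None], "y": [10.0, 20.0, 30.0]})
work = raw.copy()
add_total(work)

# The working copy gained a column and lost the row with a missing value;
# the original frame still has all three rows and no "total" column.
print(work.shape, raw.shape)
```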