# Calculating Potential Customer Revenue with Rule-Based Classification

## Business Problem
- A game company wants to use some of its customers' attributes to create new level-based customer definitions (personas). It then wants to group these personas into segments and use the segments to estimate how much revenue prospective new customers could bring the company on average.
- For example: the company wants to determine how much revenue a 25-year-old male iOS user from Turkey could bring in on average.
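Once age is bucketed, the example user above maps to a single persona key. The following is a minimal sketch. It assumes the age bins used later in this notebook and a maximum age of 66, which is taken from the `41_66` label in the outputs. `make_persona` is a hypothetical helper and is not part of the original notebook.

```python
import pandas as pd

# Age bins/labels mirroring the pd.cut call later in the notebook.
# Assumption: the maximum AGE in the data is 66 (from the "41_66" label).
BINS = [0, 18, 23, 30, 40, 66]
LABELS = ["0_18", "19_23", "24_30", "31_40", "41_66"]


def make_persona(country: str, source: str, sex: str, age: int) -> str:
    """Hypothetical helper: build a level-based persona key for one user."""
    # pd.cut uses right-closed intervals, so 25 falls into (23, 30] -> "24_30"
    age_cat = pd.cut([age], bins=BINS, labels=LABELS)[0]
    return "_".join([country, source, sex, str(age_cat)]).upper()


print(make_persona("tur", "ios", "male", 25))  # TUR_IOS_MALE_24_30
```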



## Dataset Story
- The Persona.csv dataset contains the prices of products sold by an international game company, along with some demographic information about the users who bought them. Each record in the dataset corresponds to one sales transaction, so the table is not deduplicated. In other words, a user with a given set of demographic attributes may have made more than one purchase.

- PRICE: Amount spent by the customer
- SOURCE: Type of device the customer connected from
- SEX: Gender of the customer
- COUNTRY: Country of the customer
- AGE: Age of the customer
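Each row is one transaction, so identical rows are expected. The quick check below runs on the five rows shown in the `df.head()` output further down; rows 0 and 1 are identical.

```python
import pandas as pd

# The first five rows of persona.csv, as printed by df.head() in this notebook
sample = pd.DataFrame({
    "PRICE": [39, 39, 49, 29, 49],
    "SOURCE": ["android"] * 5,
    "SEX": ["male"] * 5,
    "COUNTRY": ["bra", "bra", "bra", "tur", "tur"],
    "AGE": [17] * 5,
})

# duplicated() flags every row that repeats an earlier one
n_duplicates = int(sample.duplicated().sum())
print(n_duplicates)  # 1
```

Such rows are separate purchases, so dropping them before aggregating would undercount sales.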

In [3]:
################# Before #####################

#    PRICE   SOURCE   SEX COUNTRY  AGE
# 0     39  android  male     bra   17
# 1     39  android  male     bra   17
# 2     49  android  male     bra   17
# 3     29  android  male     tur   17
# 4     49  android  male     tur   17

################# After #####################

#       customers_level_based        PRICE SEGMENT
# 0   BRA_ANDROID_FEMALE_0_18  1139.800000       A
# 1  BRA_ANDROID_FEMALE_19_23  1070.600000       A
# 2  BRA_ANDROID_FEMALE_24_30   508.142857       A
# 3  BRA_ANDROID_FEMALE_31_40   233.166667       C
# 4  BRA_ANDROID_FEMALE_41_66   236.666667       C


## Import and read the data

In [5]:
import pandas as pd
pd.set_option("display.max_rows", None)
# Forward slashes avoid the invalid "\p" escape sequence and also work on Windows
df = pd.read_csv("datasets/persona.csv")
df.head()

|   | PRICE | SOURCE | SEX | COUNTRY | AGE |
|---|---|---|---|---|---|
| 0 | 39 | android | male | bra | 17 |
| 1 | 39 | android | male | bra | 17 |
| 2 | 49 | android | male | bra | 17 |
| 3 | 29 | android | male | tur | 17 |
| 4 | 49 | android | male | tur | 17 |


## Data Analysis

In [7]:
df.shape

(5000, 5)

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5000 entries, 0 to 4999
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   PRICE    5000 non-null   int64 
 1   SOURCE   5000 non-null   object
 2   SEX      5000 non-null   object
 3   COUNTRY  5000 non-null   object
 4   AGE      5000 non-null   int64 
dtypes: int64(2), object(3)
memory usage: 234.4+ KB


In [11]:
# Number of unique SOURCE values
df["SOURCE"].nunique()

2

In [12]:
# Frequency of SOURCE
df["SOURCE"].value_counts()

android    2974
ios        2026
Name: SOURCE, dtype: int64
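`value_counts` can also return shares instead of raw counts. The sketch below rebuilds the SOURCE column from the counts above (2974 android, 2026 ios) and normalizes them.

```python
import pandas as pd

# Rebuild a SOURCE column with the counts reported above (5000 rows in total)
source = pd.Series(["android"] * 2974 + ["ios"] * 2026, name="SOURCE")

# normalize=True divides each count by the number of rows
shares = source.value_counts(normalize=True)
print(shares)  # android -> 0.5948, ios -> 0.4052
```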

In [13]:
# Number of unique PRICE values
df["PRICE"].nunique()

6

In [15]:
# Number of sales at each PRICE
df["PRICE"].value_counts()

29    1305
39    1260
49    1031
19     992
59     212
9      200
Name: PRICE, dtype: int64

In [16]:
# Number of sales per COUNTRY
df["COUNTRY"].value_counts()

usa    2065
bra    1496
deu     455
tur     451
fra     303
can     230
Name: COUNTRY, dtype: int64

In [17]:
# Total earnings by country
df.groupby("COUNTRY")["PRICE"].sum()

COUNTRY
bra    51354
can     7730
deu    15485
fra    10177
tur    15689
usa    70225
Name: PRICE, dtype: int64

In [18]:
# Number of sales per SOURCE type
df["SOURCE"].value_counts()

android    2974
ios        2026
Name: SOURCE, dtype: int64

In [19]:
# Mean of PRICE by COUNTRY
df.groupby(by=['COUNTRY']).agg({"PRICE": "mean"})

| COUNTRY | PRICE |
|---|---|
| bra | 34.32754 |
| can | 33.608696 |
| deu | 34.032967 |
| fra | 33.587459 |
| tur | 34.78714 |
| usa | 34.007264 |
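The total-earnings and mean-price cells above describe the same groups. The mean is the sum divided by the number of sales: for `bra`, 51354 / 1496 ≈ 34.3275. Named aggregation computes all three values in one `groupby`. The sketch below runs it on the five `df.head()` rows rather than on the full data.

```python
import pandas as pd

# The five rows shown by df.head() earlier in the notebook
sample = pd.DataFrame({
    "PRICE": [39, 39, 49, 29, 49],
    "COUNTRY": ["bra", "bra", "bra", "tur", "tur"],
})

# Named aggregation: output column name = (input column, aggregation function)
summary = sample.groupby("COUNTRY").agg(
    n_sales=("PRICE", "count"),
    revenue=("PRICE", "sum"),
    mean_price=("PRICE", "mean"),
)
print(summary)
```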


In [21]:
# Mean of PRICE by SOURCE
df.groupby(by=['SOURCE']).agg({"PRICE": "mean"})

| SOURCE | PRICE |
|---|---|
| android | 34.174849 |
| ios | 34.069102 |


In [22]:
# Mean of PRICE by COUNTRY-SOURCE
df.groupby(by=["COUNTRY", 'SOURCE']).agg({"PRICE": "mean"})

| COUNTRY | SOURCE | PRICE |
|---|---|---|
| bra | android | 34.387029 |
| bra | ios | 34.222222 |
| can | android | 33.330709 |
| can | ios | 33.951456 |
| deu | android | 33.869888 |
| deu | ios | 34.268817 |
| fra | android | 34.3125 |
| fra | ios | 32.776224 |
| tur | android | 36.229437 |
| tur | ios | 33.272727 |
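The long COUNTRY-SOURCE table above is easier to compare in wide form. The sketch below uses `pivot_table`, which gives the same numbers as the `groupby(...).mean()` above followed by `unstack()`. The rows are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from persona.csv)
sample = pd.DataFrame({
    "COUNTRY": ["bra", "bra", "bra", "tur", "tur", "tur"],
    "SOURCE": ["android", "android", "ios", "android", "ios", "ios"],
    "PRICE": [30, 40, 20, 50, 10, 30],
})

# pivot_table puts COUNTRY on the rows and SOURCE on the columns
wide = sample.pivot_table(values="PRICE", index="COUNTRY", columns="SOURCE", aggfunc="mean")
print(wide)
```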


In [23]:
# Mean of PRICE by COUNTRY-SOURCE-SEX-AGE
df.groupby(["COUNTRY", 'SOURCE', "SEX", "AGE"]).agg({"PRICE": "mean"}).head()

| COUNTRY | SOURCE | SEX | AGE | PRICE |
|---|---|---|---|---|
| bra | android | female | 15 | 38.714286 |
| bra | android | female | 16 | 35.944444 |
| bra | android | female | 17 | 35.666667 |
| bra | android | female | 18 | 32.255814 |
| bra | android | female | 19 | 35.206897 |


In [24]:
agg_df = df.groupby(by=["COUNTRY", 'SOURCE', "SEX", "AGE"]).agg({"PRICE": "mean"}).sort_values("PRICE", ascending=False)
agg_df.head()

| COUNTRY | SOURCE | SEX | AGE | PRICE |
|---|---|---|---|---|
| bra | android | male | 46 | 59.0 |
| usa | android | male | 36 | 59.0 |
| fra | android | female | 24 | 59.0 |
| usa | ios | male | 32 | 54.0 |
| deu | android | female | 36 | 49.0 |


In [25]:
# Reset indexes
agg_df = agg_df.reset_index()
agg_df.head()

|   | COUNTRY | SOURCE | SEX | AGE | PRICE |
|---|---|---|---|---|---|
| 0 | bra | android | male | 46 | 59.0 |
| 1 | usa | android | male | 36 | 59.0 |
| 2 | fra | android | female | 24 | 59.0 |
| 3 | usa | ios | male | 32 | 54.0 |
| 4 | deu | android | female | 36 | 49.0 |
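`reset_index()` turns the four group keys back into columns. Passing `as_index=False` to `groupby` produces the same flat layout in one step. The sketch below runs it on the five `df.head()` rows.

```python
import pandas as pd

# The five rows shown by df.head() earlier in the notebook
sample = pd.DataFrame({
    "PRICE": [39, 39, 49, 29, 49],
    "SOURCE": ["android"] * 5,
    "SEX": ["male"] * 5,
    "COUNTRY": ["bra", "bra", "bra", "tur", "tur"],
    "AGE": [17] * 5,
})

keys = ["COUNTRY", "SOURCE", "SEX", "AGE"]

# as_index=False keeps the group keys as ordinary columns (no MultiIndex)
flat = (
    sample.groupby(keys, as_index=False)
    .agg({"PRICE": "mean"})
    .sort_values("PRICE", ascending=False)
)
print(flat)
```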


In [26]:
# Convert AGE into categorical age ranges; pd.cut uses right-closed intervals, e.g. (18, 23] -> '19_23'
bins = [0, 18, 23, 30, 40, agg_df["AGE"].max()]
mylabels = ['0_18', '19_23', '24_30', '31_40', '41_' + str(agg_df["AGE"].max())]

agg_df["age_cat"] = pd.cut(agg_df["AGE"], bins, labels=mylabels)
agg_df.head()

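|   | COUNTRY | SOURCE | SEX | AGE | PRICE | age_cat |
|---|---|---|---|---|---|---|
| 0 | bra | android | male | 46 | 59.0 | 41_66 |
| 1 | usa | android | male | 36 | 59.0 | 31_40 |
| 2 | fra | android | female | 24 | 59.0 | 24_30 |
| 3 | usa | ios | male | 32 | 54.0 | 31_40 |
| 4 | deu | android | female | 36 | 49.0 | 31_40 |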
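By default (`right=True`), `pd.cut` builds right-closed intervals, so each bin edge belongs to the lower group. For example, age 18 lands in `0_18` and age 23 lands in `19_23`. The sketch below checks this with boundary ages, assuming the upper edge is 66 as in the output above.

```python
import pandas as pd

# Same bins and labels as above, with the maximum age (66) written out
bins = [0, 18, 23, 30, 40, 66]
labels = ["0_18", "19_23", "24_30", "31_40", "41_66"]

ages = pd.Series([15, 18, 19, 23, 24, 30, 31, 40, 41, 66])
age_cat = pd.cut(ages, bins, labels=labels)  # intervals are (left, right]
print(pd.DataFrame({"AGE": ages, "age_cat": age_cat}))
```

With these bins, an age of exactly 0 or above 66 would become NaN. Passing `include_lowest=True` closes the first interval on the left.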


In [27]:
# Define level-based customer personas (COUNTRY_SOURCE_SEX_AGECAT) and add them to the dataset
agg_df['customers_level_based'] = agg_df[['COUNTRY', 'SOURCE', 'SEX', 'age_cat']].agg(lambda x: '_'.join(x).upper(), axis=1)
agg_df.head()

|   | COUNTRY | SOURCE | SEX | AGE | PRICE | age_cat | customers_level_based |
|---|---|---|---|---|---|---|---|
| 0 | bra | android | male | 46 | 59.0 | 41_66 | BRA_ANDROID_MALE_41_66 |
| 1 | usa | android | male | 36 | 59.0 | 31_40 | USA_ANDROID_MALE_31_40 |
| 2 | fra | android | female | 24 | 59.0 | 24_30 | FRA_ANDROID_FEMALE_24_30 |
| 3 | usa | ios | male | 32 | 54.0 | 31_40 | USA_IOS_MALE_31_40 |
| 4 | deu | android | female | 36 | 49.0 | 31_40 | DEU_ANDROID_FEMALE_31_40 |
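At this point `agg_df` still holds one row per age, so a persona such as `TUR_ANDROID_FEMALE_31_40` appears more than once (see the lookups at the end of the notebook). The "After" layout at the top lists each persona only once. One common way to reach that shape is to average PRICE per persona; this is an assumption, not a step the notebook performs. The minimal sketch below uses the two TUR rows from the lookup output.

```python
import pandas as pd

# Rows copied from the TUR_ANDROID_FEMALE_31_40 lookup at the end of the notebook
agg_sample = pd.DataFrame({
    "customers_level_based": ["TUR_ANDROID_FEMALE_31_40", "TUR_ANDROID_FEMALE_31_40"],
    "AGE": [32, 31],
    "PRICE": [43.0, 40.666667],
})

# Assumption: one row per persona, summarised by the mean of the per-age means
personas = (
    agg_sample.groupby("customers_level_based", as_index=False)["PRICE"]
    .mean()
)
print(personas)
```

This is a mean of per-age means, so each age counts equally regardless of how many purchases it had. Averaging the raw transactions in `df` instead would weight each age by its number of sales.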


In [29]:
# Segmentation: split PRICE into quartiles, labelled D (lowest) to A (highest)
agg_df["SEGMENT"] = pd.qcut(agg_df["PRICE"], 4, labels=["D", "C", "B", "A"])
agg_df.head()

|   | COUNTRY | SOURCE | SEX | AGE | PRICE | age_cat | customers_level_based | SEGMENT |
|---|---|---|---|---|---|---|---|---|
| 0 | bra | android | male | 46 | 59.0 | 41_66 | BRA_ANDROID_MALE_41_66 | A |
| 1 | usa | android | male | 36 | 59.0 | 31_40 | USA_ANDROID_MALE_31_40 | A |
| 2 | fra | android | female | 24 | 59.0 | 24_30 | FRA_ANDROID_FEMALE_24_30 | A |
| 3 | usa | ios | male | 32 | 54.0 | 31_40 | USA_IOS_MALE_31_40 | A |
| 4 | deu | android | female | 36 | 49.0 | 31_40 | DEU_ANDROID_FEMALE_31_40 | A |
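`pd.qcut` cuts PRICE at its 25th, 50th and 75th percentiles. It assigns the labels from the lowest quartile (`D`) to the highest (`A`). The sketch below runs it on made-up prices and also summarises each segment.

```python
import pandas as pd

# Made-up per-persona prices, for illustration only
prices = pd.Series([10, 20, 30, 40, 50, 60, 70, 80], name="PRICE")

# Quartile edges here are 10, 27.5, 45, 62.5, 80; labels go from lowest to highest quartile
segment = pd.qcut(prices, 4, labels=["D", "C", "B", "A"])

# Describe each segment by its mean, min and max price
summary = prices.groupby(segment, observed=True).agg(["mean", "min", "max"])
print(summary)
```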


In [30]:
# Classify new customers and predict how much revenue they can generate
new_user = "TUR_ANDROID_FEMALE_31_40"
agg_df[agg_df["customers_level_based"] == new_user]

|   | COUNTRY | SOURCE | SEX | AGE | PRICE | age_cat | customers_level_based | SEGMENT |
|---|---|---|---|---|---|---|---|---|
| 18 | tur | android | female | 32 | 43.0 | 31_40 | TUR_ANDROID_FEMALE_31_40 | A |
| 35 | tur | android | female | 31 | 40.666667 | 31_40 | TUR_ANDROID_FEMALE_31_40 | A |


In [31]:
new_user = "FRA_IOS_FEMALE_31_40"
agg_df[agg_df["customers_level_based"] == new_user]

|   | COUNTRY | SOURCE | SEX | AGE | PRICE | age_cat | customers_level_based | SEGMENT |
|---|---|---|---|---|---|---|---|---|
| 208 | fra | ios | female | 40 | 33.0 | 31_40 | FRA_IOS_FEMALE_31_40 | C |
| 221 | fra | ios | female | 31 | 32.636364 | 31_40 | FRA_IOS_FEMALE_31_40 | C |
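Each lookup returns several rows for the same persona, one per age. To answer the business question with a single number, one option is to average their PRICE and report the segment; this is an assumption, not the notebook's own step. The sketch below uses the four rows printed by the two lookups. `predict_revenue` is a hypothetical helper.

```python
import pandas as pd

# The four rows printed by the two lookups above
agg_df = pd.DataFrame({
    "customers_level_based": [
        "TUR_ANDROID_FEMALE_31_40", "TUR_ANDROID_FEMALE_31_40",
        "FRA_IOS_FEMALE_31_40", "FRA_IOS_FEMALE_31_40",
    ],
    "PRICE": [43.0, 40.666667, 33.0, 32.636364],
    "SEGMENT": ["A", "A", "C", "C"],
})


def predict_revenue(df: pd.DataFrame, persona: str) -> tuple[float, str]:
    """Hypothetical helper: mean PRICE and most frequent SEGMENT of a persona."""
    rows = df[df["customers_level_based"] == persona]
    if rows.empty:
        raise KeyError(f"unknown persona: {persona}")
    return float(rows["PRICE"].mean()), str(rows["SEGMENT"].mode().iloc[0])


print(predict_revenue(agg_df, "TUR_ANDROID_FEMALE_31_40"))
print(predict_revenue(agg_df, "FRA_IOS_FEMALE_31_40"))
```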
