**Imports and Reading Dataset**

In [1]:
import pandas as pd
df = pd.read_csv("persona.csv")
df.head()

Unnamed: 0,PRICE,SOURCE,SEX,COUNTRY,AGE
0,39,android,male,bra,17
1,39,android,male,bra,17
2,49,android,male,bra,17
3,29,android,male,tur,17
4,49,android,male,tur,17


**General Information About Dataset**

In [2]:
def information(df):
    print("###############################    Shape  ##################################")
    print(df.shape)
    print("###############################    Types  ##################################")
    print(df.dtypes)
    print("###############################    Head   ##################################")
    print(df.head())
    print("###############################    Tail   ##################################")
    print(df.tail())
    print("###############################    NA     ##################################")
    print(df.isnull().sum())
    print("############################### Quantiles ##################################")
    print(df.describe([0, 0.05, 0.50, 0.95, 0.99, 1]).T)

information(df)

###############################    Shape  ##################################
(5000, 5)
###############################    Types  ##################################
PRICE       int64
SOURCE     object
SEX        object
COUNTRY    object
AGE         int64
dtype: object
###############################    Head   ##################################
   PRICE   SOURCE   SEX COUNTRY  AGE
0     39  android  male     bra   17
1     39  android  male     bra   17
2     49  android  male     bra   17
3     29  android  male     tur   17
4     49  android  male     tur   17
###############################    Tail   ##################################
      PRICE   SOURCE     SEX COUNTRY  AGE
4995     29  android  female     bra   31
4996     29  android  female     bra   31
4997     29  android  female     bra   31
4998     39  android  female     bra   31
4999     29  android  female     bra   31
###############################    NA     ##################################
PRICE      0
SOURCE     0
S

**Number of Unique SOURCE Values**

In [3]:
df["SOURCE"].nunique()

2

**Frequencies of SOURCE**

In [4]:
df["SOURCE"].value_counts()

android    2974
ios        2026
Name: SOURCE, dtype: int64

**Number of Unique PRICE Values**

In [5]:
df["PRICE"].nunique()

6

**Frequencies of PRICE**

In [6]:
df["PRICE"].value_counts()

29    1305
39    1260
49    1031
19     992
59     212
9      200
Name: PRICE, dtype: int64

**Sales by Country**

In [7]:
df["COUNTRY"].value_counts()

usa    2065
bra    1496
deu     455
tur     451
fra     303
can     230
Name: COUNTRY, dtype: int64

**Total Earnings by Country**

In [8]:
df.groupby("COUNTRY").agg({"PRICE":"sum"})

Unnamed: 0_level_0,PRICE
COUNTRY,Unnamed: 1_level_1
bra,51354
can,7730
deu,15485
fra,10177
tur,15689
usa,70225


**Total Earnings by Source**

In [9]:
df.groupby("SOURCE").agg({"PRICE":"sum"})

Unnamed: 0_level_0,PRICE
SOURCE,Unnamed: 1_level_1
android,101636
ios,69024


**Average Prices by Country**

In [10]:
df.groupby("COUNTRY").agg({"PRICE":"mean"})

Unnamed: 0_level_0,PRICE
COUNTRY,Unnamed: 1_level_1
bra,34.32754
can,33.608696
deu,34.032967
fra,33.587459
tur,34.78714
usa,34.007264


**Average Prices by Source**

In [11]:
df.groupby("SOURCE").agg({"PRICE":"mean"})

Unnamed: 0_level_0,PRICE
SOURCE,Unnamed: 1_level_1
android,34.174849
ios,34.069102


**Average Prices by Country-Source**

In [12]:
df.groupby(["COUNTRY", "SOURCE"]).agg({"PRICE":"mean"})

Unnamed: 0_level_0,Unnamed: 1_level_0,PRICE
COUNTRY,SOURCE,Unnamed: 2_level_1
bra,android,34.387029
bra,ios,34.222222
can,android,33.330709
can,ios,33.951456
deu,android,33.869888
deu,ios,34.268817
fra,android,34.3125
fra,ios,32.776224
tur,android,36.229437
tur,ios,33.272727


**Average Earnings by Country-Source-Sex-Age**

In [13]:
avg_earn = df.groupby(["COUNTRY", "SOURCE", "SEX", "AGE"], as_index=False)["PRICE"]\
    .mean().sort_values("PRICE", ascending=False, ignore_index=True)

avg_earn.head()


Unnamed: 0,COUNTRY,SOURCE,SEX,AGE,PRICE
0,bra,android,male,46,59.0
1,usa,android,male,36,59.0
2,fra,android,female,24,59.0
3,usa,ios,male,32,54.0
4,deu,android,female,36,49.0


To build the new customer definitions, we first created a new variable, `avg_earn`, that holds the average price paid by customers for each Country-Source-Sex-Age combination.


**Conversion of the Age Variable to a Categorical Variable**

In [14]:
avg_earn['AGE_CAT'] = \
    pd.cut(avg_earn["AGE"],bins=[0,18,23,30,40,70],
           labels=['0_18','19_23','24_30','31_40','41_70'])

avg_earn

Unnamed: 0,COUNTRY,SOURCE,SEX,AGE,PRICE,AGE_CAT
0,bra,android,male,46,59.0,41_70
1,usa,android,male,36,59.0,31_40
2,fra,android,female,24,59.0,24_30
3,usa,ios,male,32,54.0,31_40
4,deu,android,female,36,49.0,31_40
...,...,...,...,...,...,...
343,usa,ios,female,38,19.0,31_40
344,usa,ios,female,30,19.0,24_30
345,can,android,female,27,19.0,24_30
346,fra,android,male,18,19.0,0_18


Because the new customer definitions are built from categories, the numeric age variable must be converted to a categorical one. To do this, we define age ranges and assign each age to its range with `pd.cut`.
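
By default, `pd.cut` treats each bin as right-inclusive, so with the edges used here an age of 18 lands in `0_18` and an age of 19 lands in `19_23`. The following is a minimal sketch on a few made-up ages (the `ages` values are illustrative and not taken from `persona.csv`):

```python
import pandas as pd

# Illustrative ages chosen to sit on or next to the bin edges (not from the dataset).
ages = pd.Series([17, 18, 19, 23, 24, 40, 41, 70])

# Same bins and labels as in the notebook; the intervals are (0, 18], (18, 23], ...
age_cat = pd.cut(ages, bins=[0, 18, 23, 30, 40, 70],
                 labels=['0_18', '19_23', '24_30', '31_40', '41_70'])

print(pd.DataFrame({"AGE": ages, "AGE_CAT": age_cat}))
```

Ages outside the outermost edges (0 or below, or above 70) would come back as `NaN`, so the last edge may need widening if the data contains older customers.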


**Define personas and add them as variables to the dataset**

In [15]:
columns = ['COUNTRY','SOURCE','SEX','AGE_CAT']

avg_earn['Customer_Level_Based'] = avg_earn[columns].apply('_'.join, axis=1).str.upper()

avg_earn

Unnamed: 0,COUNTRY,SOURCE,SEX,AGE,PRICE,AGE_CAT,Customer_Level_Based
0,bra,android,male,46,59.0,41_70,BRA_ANDROID_MALE_41_70
1,usa,android,male,36,59.0,31_40,USA_ANDROID_MALE_31_40
2,fra,android,female,24,59.0,24_30,FRA_ANDROID_FEMALE_24_30
3,usa,ios,male,32,54.0,31_40,USA_IOS_MALE_31_40
4,deu,android,female,36,49.0,31_40,DEU_ANDROID_FEMALE_31_40
...,...,...,...,...,...,...,...
343,usa,ios,female,38,19.0,31_40,USA_IOS_FEMALE_31_40
344,usa,ios,female,30,19.0,24_30,USA_IOS_FEMALE_24_30
345,can,android,female,27,19.0,24_30,CAN_ANDROID_FEMALE_24_30
346,fra,android,male,18,19.0,0_18,FRA_ANDROID_MALE_0_18
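
As a possible alternative to the row-wise `apply`, the persona string can also be assembled with vectorized string operations. This is only a sketch: the `toy` frame below is a hypothetical two-row stand-in built from rows shown in the output above, and it assumes the same bins and labels as the previous step.

```python
import pandas as pd

# Two rows copied from the notebook output above, used as a stand-in for avg_earn.
toy = pd.DataFrame({
    "COUNTRY": ["bra", "usa"],
    "SOURCE": ["android", "ios"],
    "SEX": ["male", "female"],
    "AGE": [46, 30],
})
toy["AGE_CAT"] = pd.cut(toy["AGE"], bins=[0, 18, 23, 30, 40, 70],
                        labels=['0_18', '19_23', '24_30', '31_40', '41_70'])

# AGE_CAT is categorical, so it is cast to str before concatenation.
toy["Customer_Level_Based"] = (
    toy["COUNTRY"]
    .str.cat([toy["SOURCE"], toy["SEX"], toy["AGE_CAT"].astype(str)], sep="_")
    .str.upper()
)

print(toy["Customer_Level_Based"].tolist())
```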


**Persona values need to be unique, so we check how often each one appears before grouping and averaging the prices**

In [16]:
avg_earn["Customer_Level_Based"].value_counts()

BRA_ANDROID_MALE_24_30      7
USA_ANDROID_MALE_41_70      7
USA_IOS_FEMALE_24_30        7
BRA_ANDROID_FEMALE_24_30    7
USA_ANDROID_MALE_24_30      7
                           ..
TUR_ANDROID_MALE_41_70      1
CAN_ANDROID_MALE_19_23      1
TUR_IOS_MALE_31_40          1
TUR_IOS_MALE_24_30          1
CAN_ANDROID_FEMALE_24_30    1
Name: Customer_Level_Based, Length: 109, dtype: int64

**Making the Personas Unique**

In [17]:
avg_earn_new = avg_earn.groupby("Customer_Level_Based", as_index = False).agg({"PRICE": "mean"})
avg_earn_new

Unnamed: 0,Customer_Level_Based,PRICE
0,BRA_ANDROID_FEMALE_0_18,35.645303
1,BRA_ANDROID_FEMALE_19_23,34.077340
2,BRA_ANDROID_FEMALE_24_30,33.863946
3,BRA_ANDROID_FEMALE_31_40,34.898326
4,BRA_ANDROID_FEMALE_41_70,36.737179
...,...,...
104,USA_IOS_MALE_0_18,33.983495
105,USA_IOS_MALE_19_23,34.901872
106,USA_IOS_MALE_24_30,34.838143
107,USA_IOS_MALE_31_40,36.206324
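
Note that this step averages values that are already per-age averages, so each age row counts equally regardless of how many sales it represents. Whether that weighting is intended depends on the analysis. The sketch below uses made-up data (the `sales` frame and the persona `X` are hypothetical) to show how the two approaches can differ:

```python
import pandas as pd

# Hypothetical raw sales: three cheap sales at age 25, one expensive sale at age 26.
sales = pd.DataFrame({
    "PERSONA": ["X", "X", "X", "X"],
    "AGE": [25, 25, 25, 26],
    "PRICE": [10, 10, 10, 40],
})

# Approach used in the notebook: average per age first, then average those averages.
per_age = sales.groupby(["PERSONA", "AGE"], as_index=False)["PRICE"].mean()
mean_of_means = per_age.groupby("PERSONA")["PRICE"].mean()

# Alternative: average directly over the raw rows, so each sale counts once.
direct_mean = sales.groupby("PERSONA")["PRICE"].mean()

print(mean_of_means["X"])  # 25.0
print(direct_mean["X"])    # 17.5
```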


**Segmentation**

In [19]:
avg_earn_new["SEGMENT"] = pd.qcut(avg_earn_new["PRICE"], 4, labels=["D", "C", "B", "A"])
avg_earn_new

Unnamed: 0,Customer_Level_Based,PRICE,SEGMENT
0,BRA_ANDROID_FEMALE_0_18,35.645303,B
1,BRA_ANDROID_FEMALE_19_23,34.077340,C
2,BRA_ANDROID_FEMALE_24_30,33.863946,C
3,BRA_ANDROID_FEMALE_31_40,34.898326,B
4,BRA_ANDROID_FEMALE_41_70,36.737179,A
...,...,...,...
104,USA_IOS_MALE_0_18,33.983495,C
105,USA_IOS_MALE_19_23,34.901872,B
106,USA_IOS_MALE_24_30,34.838143,B
107,USA_IOS_MALE_31_40,36.206324,A
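
`pd.qcut` splits the values at their quartiles into four roughly equal-sized groups and assigns the labels in ascending order, so "D" marks the lowest average prices and "A" the highest. A minimal sketch on made-up prices (the `prices` values are illustrative only):

```python
import pandas as pd

# Eight illustrative average prices, sorted for readability (not from the dataset).
prices = pd.Series([19.0, 29.0, 32.5, 33.5, 34.1, 35.0, 36.1, 45.4])

# Four quantile bins; labels are listed from the lowest bin to the highest.
segments = pd.qcut(prices, 4, labels=["D", "C", "B", "A"])

print(pd.DataFrame({"PRICE": prices, "SEGMENT": segments}))
```

With eight values and four bins, each segment receives two prices.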


**Segment Details**

In [20]:
avg_earn_new.groupby("SEGMENT").agg({"PRICE": ['mean','min','max','std','sum','count']}).sort_values("SEGMENT",ascending = False)

Unnamed: 0_level_0,PRICE,PRICE,PRICE,PRICE,PRICE,PRICE
Unnamed: 0_level_1,mean,min,max,std,sum,count
SEGMENT,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
A,38.691234,36.060606,45.428571,2.581762,1044.663328,27
B,34.999645,34.103727,36.0,0.636502,944.990411,27
C,33.509674,32.5,34.07734,0.492587,904.761209,27
D,29.20678,19.0,32.333333,3.638037,817.789833,28


**Classification of New Customers**

In [21]:
new_users = ["USA_ANDROID_FEMALE_31_40",
             "FRA_IOS_MALE_0_18",
             "BRA_ANDROID_FEMALE_41_70",
             "DEU_ANDROID_MALE_19_23",
             "TUR_IOS_FEMALE_24_30",
             "CAN_IOS_MALE_41_70"]

for user in new_users:
    print(avg_earn_new[avg_earn_new["Customer_Level_Based"] == user])
    print("\n----------------------------------------------\n")

        Customer_Level_Based     PRICE SEGMENT
92  USA_ANDROID_FEMALE_31_40  32.80303       C

----------------------------------------------

   Customer_Level_Based      PRICE SEGMENT
64    FRA_IOS_MALE_0_18  33.444444       C

----------------------------------------------

       Customer_Level_Based      PRICE SEGMENT
4  BRA_ANDROID_FEMALE_41_70  36.737179       A

----------------------------------------------

      Customer_Level_Based      PRICE SEGMENT
40  DEU_ANDROID_MALE_19_23  36.070707       A

----------------------------------------------

    Customer_Level_Based  PRICE SEGMENT
81  TUR_IOS_FEMALE_24_30   34.0       C

----------------------------------------------

   Customer_Level_Based  PRICE SEGMENT
33   CAN_IOS_MALE_41_70   31.0       D

----------------------------------------------
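
The lookup above starts from ready-made persona strings. When a new customer arrives as raw attributes instead, the same rules can be wrapped in small helpers. The sketch below is one way to do it: `to_persona` and `find_segment` are hypothetical names, the age 35 is an illustrative input, and the three-row `segments` table is copied from the output above as a stand-in for `avg_earn_new`.

```python
import pandas as pd

# Stand-in for avg_earn_new, using three rows shown in the notebook output.
segments = pd.DataFrame({
    "Customer_Level_Based": ["USA_ANDROID_FEMALE_31_40",
                             "BRA_ANDROID_FEMALE_41_70",
                             "CAN_IOS_MALE_41_70"],
    "PRICE": [32.80303, 36.737179, 31.0],
    "SEGMENT": ["C", "A", "D"],
})

def to_persona(country, source, sex, age):
    """Hypothetical helper: build the persona key from raw attributes."""
    # Same bins and labels as the AGE_CAT step in the notebook.
    age_cat = pd.cut([age], bins=[0, 18, 23, 30, 40, 70],
                     labels=['0_18', '19_23', '24_30', '31_40', '41_70'])[0]
    return f"{country}_{source}_{sex}_{age_cat}".upper()

def find_segment(table, persona):
    """Return (PRICE, SEGMENT) for a persona, or None if it is not in the table."""
    match = table.loc[table["Customer_Level_Based"] == persona]
    if match.empty:
        return None
    row = match.iloc[0]
    return row["PRICE"], row["SEGMENT"]

persona = to_persona("usa", "android", "female", 35)
print(persona, find_segment(segments, persona))
```

Returning `None` for an unknown persona makes missing combinations explicit instead of printing an empty frame.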

