PRICING: What should the item price be?

A game company has given the users of one of its games gift currency to spend on item purchases.
Users spend this virtual currency to buy various equipment for their characters.
The company did not set a price for one item and instead let users buy it at whatever price they chose.
For example, for the item named "shield", users pay whatever amount they see fit in order to buy it.
One user might pay 30 units of the virtual currency they were given, while another pays 45 units.
Users can therefore buy this item for whatever amount they are willing to pay.

Problems to solve:
1. Does the item's price differ across categories? Express the answer statistically.
2. Based on the answer to the first question, what should the item's price be? Explain why.
3. The business wants to stay flexible on price. Build a decision support system for the pricing strategy.
4. Simulate item purchases and revenue for possible price changes.

SCENARIO 1: ALL CATEGORIES SHARE ONE PRICE (30-50): total frequency * price

SCENARIO 2: CATEGORY PRICES DIFFER: category frequency * category price
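
Both scenarios reduce to the same arithmetic: revenue is the number of purchases multiplied by the price charged. The sketch below is a minimal illustration under stated assumptions. The `offers` list and the category labels "A" and "B" are made up, and the rule that a user buys only when the set price does not exceed the amount they offered is an assumption, not something the task statement specifies.

```python
# Hypothetical offers: (category label, amount the user was willing to pay).
offers = [
    ("A", 30.0), ("A", 45.0), ("A", 38.0),
    ("B", 33.0), ("B", 36.0), ("B", 50.0),
]


def revenue(amounts, price):
    """Revenue at a fixed price, assuming a user buys only if the amount
    they offered is at least the price (illustrative assumption)."""
    buyers = sum(1 for a in amounts if a >= price)  # purchase frequency
    return buyers * price


def scenario_single_price(offers, price):
    # Scenario 1: one price for every category.
    return revenue([a for _, a in offers], price)


def scenario_category_prices(offers, prices):
    # Scenario 2: each category has its own price.
    total = 0.0
    for cat, price in prices.items():
        total += revenue([a for c, a in offers if c == cat], price)
    return total


for p in (30, 35, 40):
    print(p, scenario_single_price(offers, p))
print(scenario_category_prices(offers, {"A": 38, "B": 36}))
```

Sweeping the candidate price over the 30-50 range in this way gives a simple revenue curve per scenario, which is one possible basis for the simulation asked for in problem 4.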

In [1]:
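import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed by has_outliers(plot=True)
import seaborn as sns            # needed by has_outliers(plot=True)
import statsmodels.stats.api as sms
import scipy.stats as stats
from scipy.stats import shapiro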

df = pd.read_csv("pricing.csv", sep=";")
df.head()

Unnamed: 0,category_id,price
0,489756,32.117753
1,361254,30.71137
2,361254,31.572607
3,489756,34.54384
4,489756,47.205824


In [2]:
df["price"].describe()

count      3448.000000
mean       3254.475770
std       25235.799009
min          10.000000
25%          31.890438
50%          34.798544
75%          41.536211
max      201436.991255
Name: price, dtype: float64

In [3]:
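# String aggregator names avoid the deprecation warning newer pandas versions raise for np.mean / np.median.
df.groupby("category_id").agg({"price": ["mean", "median"]})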

category_id,price mean,price median
201436,36.175498,33.534678
326584,1424.665182,31.748242
361254,1659.680663,34.459195
489756,3589.808526,35.635784
675201,3112.240362,33.835566
874521,4605.357258,34.40086
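
The gap between the mean and median columns is what a handful of extreme values produces: the mean is dragged toward them while the median barely moves. A toy illustration with made-up numbers (not taken from pricing.csv):

```python
from statistics import mean, median

# Made-up prices: five typical offers plus one extreme value.
typical = [31.0, 33.0, 34.0, 35.0, 37.0]
with_extreme = typical + [20000.0]

print(mean(typical), median(typical))
print(mean(with_extreme), median(with_extreme))
```

This is why the outlier handling in the next cells matters before any mean-based comparison or confidence interval is computed.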


In [4]:
category_id_list = df["category_id"].unique()
category_id_list

array([489756, 361254, 874521, 326584, 675201, 201436], dtype=int64)

In [5]:
# Check for outliers using the interquartile range (IQR) rule.
# The helper functions below are reused from earlier coursework.

def outlier_thresholds(dataframe, variable):
    quartile1 = dataframe[variable].quantile(0.25)
    quartile3 = dataframe[variable].quantile(0.75)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit


def has_outliers(dataframe, num_col_names, plot=False):
    # Return the names of the columns that contain at least one IQR outlier.
    variable_names = []
    for col in num_col_names:
        low_limit, up_limit = outlier_thresholds(dataframe, col)
        is_outlier = (dataframe[col] > up_limit) | (dataframe[col] < low_limit)
        if is_outlier.any():
            variable_names.append(col)
            if plot:
                sns.boxplot(x=dataframe[col])
                plt.show()
    return variable_names

def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit
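
To make the capping rule concrete, here is a small self-contained sketch of the same IQR logic on a made-up series. The values are illustrative only. The thresholds follow the Q1 - 1.5 * IQR and Q3 + 1.5 * IQR rule used by outlier_thresholds above, and pandas' `clip` is used as a compact stand-in for the two `.loc` assignments in replace_with_thresholds.

```python
import pandas as pd

# Made-up prices with one obvious outlier.
prices = pd.Series([30.0, 32.0, 34.0, 36.0, 38.0, 500.0])

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1
low_limit = q1 - 1.5 * iqr
up_limit = q3 + 1.5 * iqr

# Cap values outside [low_limit, up_limit] instead of dropping the rows.
capped = prices.clip(lower=low_limit, upper=up_limit)

print(low_limit, up_limit)
print(capped.tolist())
```

Capping keeps the sample size unchanged, which matters here because the later tests and confidence intervals depend on the number of observations per category.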

In [6]:
# Collect the prices of each category in a dictionary keyed by category id (as a string).
# sort=False keeps first-appearance order, so the entries line up with category_id_list.
dic = {
    str(cat): prices.tolist()
    for cat, prices in df.groupby("category_id", sort=False)["price"]
}

# Build one single-column DataFrame per category so the later steps can loop over them.
dataframe_list = [pd.DataFrame(values, columns=[cat]) for cat, values in dic.items()]

In [7]:
# Check each category for outliers and, where any are found, cap them at the
# lower or upper threshold using the functions defined above.
for i in dataframe_list:
    if has_outliers(i, i.columns):
        replace_with_thresholds(i, i.columns[0])

In [8]:
# H0: there is no statistically significant difference between the ...
# H1: ... there is a difference

def ab_test(df1, df2, alpha=0.05):
    # Each input is a single-column DataFrame; work on the underlying Series
    # so that every test returns a scalar p-value.
    x = df1.iloc[:, 0]
    y = df2.iloc[:, 0]

    # Shapiro-Wilk normality check on both samples.
    if shapiro(x)[1] > alpha and shapiro(y)[1] > alpha:
        # Levene's test decides between the equal-variance t-test and Welch's t-test.
        equal_var = stats.levene(x, y)[1] > alpha
        test_stat, pvalue = stats.ttest_ind(x, y, equal_var=equal_var)
    else:
        # At least one sample is not normal: fall back to the Mann-Whitney U test.
        print("Non-parametric distribution!")
        test_stat, pvalue = stats.mannwhitneyu(x, y)

    if pvalue < alpha:
        print("Hypothesis is rejected.")
    else:
        print("Hypothesis is accepted.")

    return pvalue
            
def combo_list(liste):
    # All index pairs (i, j) with i < j, in lexicographic order.
    from itertools import combinations
    return list(combinations(range(len(liste)), 2))
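
The decision flow in ab_test can be tried on synthetic data before running it on the real categories. The sketch below uses made-up, seeded samples (not pricing.csv). Both are right-skewed, so the Shapiro-Wilk check rejects normality and the comparison falls through to the Mann-Whitney U test. The explicit `alternative="two-sided"` argument is a choice made here for clarity; the notebook relies on the SciPy default.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Made-up, right-skewed samples; the second is shifted upward by 5 units.
group_a = rng.exponential(scale=1.0, size=200) + 30.0
group_b = rng.exponential(scale=1.0, size=200) + 35.0

# Step 1: normality check on each sample.
normal_a = stats.shapiro(group_a)[1] > 0.05
normal_b = stats.shapiro(group_b)[1] > 0.05

# Step 2: choose the test based on the normality result.
if normal_a and normal_b:
    equal_var = stats.levene(group_a, group_b)[1] > 0.05
    _, pvalue = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
else:
    _, pvalue = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print("reject H0" if pvalue < 0.05 else "fail to reject H0")
```

Note that a p-value above the threshold only means the test failed to detect a difference; it does not prove that two categories are identical.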

In [9]:
combo = combo_list(dataframe_list)
combo

[(0, 1),
 (0, 2),
 (0, 3),
 (0, 4),
 (0, 5),
 (1, 2),
 (1, 3),
 (1, 4),
 (1, 5),
 (2, 3),
 (2, 4),
 (2, 5),
 (3, 4),
 (3, 5),
 (4, 5)]

In [10]:
accepted = []
rejected = []
for i in combo:
    print(f"\n\nHypothesis check for the category id's: {category_id_list[i[0]]} and {category_id_list[i[1]]}")
    a = ab_test(dataframe_list[i[0]], dataframe_list[i[1]])
    if a > 0.05:
        accepted.append((category_id_list[i[0]], category_id_list[i[1]]))
    else:
        rejected.append((category_id_list[i[0]], category_id_list[i[1]]))



Hypothesis check for the category id's: 489756 and 361254
Non-parametric distribution!
Hypothesis is rejected.


Hypothesis check for the category id's: 489756 and 874521
Non-parametric distribution!
Hypothesis is rejected.


Hypothesis check for the category id's: 489756 and 326584
Non-parametric distribution!
Hypothesis is rejected.


Hypothesis check for the category id's: 489756 and 675201
Non-parametric distribution!
Hypothesis is rejected.


Hypothesis check for the category id's: 489756 and 201436
Non-parametric distribution!
Hypothesis is rejected.


Hypothesis check for the category id's: 361254 and 874521
Non-parametric distribution!
Hypothesis is rejected.


Hypothesis check for the category id's: 361254 and 326584
Non-parametric distribution!
Hypothesis is rejected.


Hypothesis check for the category id's: 361254 and 675201
Non-parametric distribution!
Hypothesis is accepted.


Hypothesis check for the category id's: 361254 and 201436
Non-parametric distribution!
Hypothe

In [11]:
# Pairs for which H0 was rejected, i.e. categories whose prices differ
# significantly from each other.
rejected

[(489756, 361254),
 (489756, 874521),
 (489756, 326584),
 (489756, 675201),
 (489756, 201436),
 (361254, 874521),
 (361254, 326584),
 (874521, 326584),
 (326584, 675201),
 (326584, 201436)]

In [12]:
# Pairs for which H0 was not rejected, i.e. categories between which no
# statistically significant price difference was detected.
accepted

[(361254, 675201),
 (361254, 201436),
 (874521, 675201),
 (874521, 201436),
 (675201, 201436)]

In [13]:
for i in dataframe_list:
    print(i.mean())

489756    41.876341
dtype: float64
361254    34.055929
dtype: float64
874521    35.823835
dtype: float64
326584    33.360228
dtype: float64
675201    35.522887
dtype: float64
201436    34.549115
dtype: float64


Items in category 489756 differ significantly from every other category, so their price should be estimated from that category's own values alone.

Categories 201436, 675201 and 874521 show no significant difference in any pairwise comparison, so their price can be estimated together.

Category 361254 shows no significant difference from 675201 or 201436 but does differ from 874521, so it can be priced together with either of those two or on its own.

Items in category 326584 also differ significantly from every other category, so their price should be estimated from that category's own values alone.
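
The grouping above can be checked mechanically against the pairwise results. The sketch below copies the accepted pairs from the output of cell 12 and confirms whether every pair inside a proposed group is present. The helper name `is_homogeneous` is hypothetical.

```python
from itertools import combinations

# Pairs for which H0 was not rejected, copied from the output of cell 12.
accepted = [
    (361254, 675201),
    (361254, 201436),
    (874521, 675201),
    (874521, 201436),
    (675201, 201436),
]

# Store each pair as an unordered set so (a, b) and (b, a) compare equal.
accepted_pairs = {frozenset(pair) for pair in accepted}


def is_homogeneous(group):
    """True if no pair inside `group` was found significantly different."""
    return all(frozenset(p) in accepted_pairs for p in combinations(group, 2))


print(is_homogeneous([201436, 675201, 874521]))
print(is_homogeneous([361254, 675201, 201436]))
print(is_homogeneous([361254, 874521]))
```

The first two groups pass and the third fails, which matches the reading that 361254 can join 675201 and 201436 but cannot be pooled with 874521.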

In [14]:
# 95% confidence interval of the mean price for category 489756
print(sms.DescrStatsW(dataframe_list[0]).tconfint_mean())

(array([41.25646306]), array([42.49621953]))


In [15]:
category_id_list

array([489756, 361254, 874521, 326584, 675201, 201436], dtype=int64)

In [16]:
# 95% confidence interval of the mean price for categories 201436, 675201 and 874521 pooled together
a = np.array(dataframe_list[5]["201436"])    
b = np.array(dataframe_list[4]["675201"]) 
c = np.array(dataframe_list[2]["874521"])
df_1 = pd.DataFrame(np.concatenate((a, b, c), axis=None))
print(sms.DescrStatsW(df_1).tconfint_mean())

(array([35.29936696]), array([36.01482213]))


In [17]:
# 95% confidence interval of the mean price for category 361254
print(sms.DescrStatsW(dataframe_list[1]).tconfint_mean())

(array([33.82469425]), array([34.28716354]))


In [18]:
# 95% confidence interval of the mean price for category 326584
print(sms.DescrStatsW(dataframe_list[3]).tconfint_mean())

(array([32.66450972]), array([34.05594561]))
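
For reference, the interval returned by tconfint_mean at its default 5% significance level is the usual t-based interval, mean ± t * s / sqrt(n) with n - 1 degrees of freedom. The sketch below reproduces that calculation on made-up numbers, using scipy.stats directly rather than statsmodels; the sample values are illustrative only.

```python
import numpy as np
from scipy import stats

# Made-up sample of prices (illustrative only).
sample = np.array([31.0, 33.5, 34.0, 35.2, 36.8, 38.1, 40.4])

n = sample.size
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)   # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)   # two-sided 95% critical value

lower, upper = mean - t_crit * sem, mean + t_crit * sem
print(lower, upper)

# Same interval via SciPy's helper, as a cross-check.
print(stats.t.interval(0.95, n - 1, loc=mean, scale=sem))
```

The lower and upper bounds of each category's interval give a natural range of candidate prices to feed into a revenue simulation.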
