## Dynamic Pricing - A/B Testing - Sales Simulation


##### In this project I apply A/B testing to find an optimum price with a confidence interval and then simulate sales at different prices. The goal is to reach the maximum profit :)

##### The A/B testing is done with pairwise tests instead of ANOVA, for educational purposes

##### The data we are going to use consists of 2 columns: shop_id and price.
##### shop_id identifies one of 6 different shops, and price is the amount a customer is willing to pay for a particular item.
##### The data set was created by gathering survey answers collected in the different shops.
##### This project assumes that survey participants answered the questions rationally.

### Imports

In [50]:
import pandas as pd
from scipy import stats
from scipy.stats import shapiro
import statsmodels.stats.api as sms


### Data Preparation

In [51]:
df= pd.read_csv("data.csv")

In [52]:
df.head()

,shop_id,price
0,shop_4,32.117753
1,shop_3,30.71137
2,shop_3,31.572607
3,shop_4,34.54384
4,shop_4,47.205824


In [53]:
df.shop_id.value_counts()

shop_4    1661
shop_6     733
shop_3     615
shop_2     144
shop_5     129
shop_1      97
Name: shop_id, dtype: int64

##### As we can see here, the survey participants are not balanced across shops, but we will go on as if they were.

##### Let's see the mean price in each shop

In [54]:
df.groupby("shop_id").agg({"price": "mean"})

shop_id,price
shop_1,36.175498
shop_2,35.69317
shop_3,35.477261
shop_4,43.872913
shop_5,37.443592
shop_6,40.376575


In [55]:
df.describe([0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99]).T

,count,mean,std,min,1%,5%,10%,25%,50%,75%,90%,95%,99%,max
price,3379.0,40.771387,19.730179,10.0,30.0,30.0,30.0,31.802522,34.744389,40.569827,56.921335,74.453248,135.848659,205.052944


##### When we look at the spread of the price variable, our assumption about the survey participants' answers seems all right, with a few exceptions in the upper tail: the 99th percentile is about 135.85 and the maximum about 205.05, while the median is about 34.74.

##### Let's get rid of prices higher than 100

In [56]:
df=df[df.price <= 100]

In [57]:
df.describe([0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99]).T

,count,mean,std,min,1%,5%,10%,25%,50%,75%,90%,95%,99%,max
price,3292.0,38.284972,11.657494,10.0,30.0,30.0,30.0,31.672294,34.678475,38.528301,53.188295,65.198083,88.622812,99.169972


##### Let's separate the shops for testing

In [58]:
shop_1 = df[(df["shop_id"] == "shop_1")]
shop_2 = df[(df["shop_id"] == "shop_2")]
shop_3 = df[(df["shop_id"] == "shop_3")]
shop_4 = df[(df["shop_id"] == "shop_4")]
shop_5 = df[(df["shop_id"] == "shop_5")]
shop_6 = df[(df["shop_id"] == "shop_6")]


### A/B Testing

##### To decide between a parametric and a non-parametric test, we need to check the distribution and variance of each shop's values.
##### If the values are normally distributed, we will use the independent t-test (parametric). If not, we will use the Mann–Whitney U test (non-parametric).
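
##### The decision rule above can be sketched as a small helper. This is only an illustration under stated assumptions: `compare_two_groups` and the toy samples are hypothetical and not part of this notebook, Levene's test is an assumed choice for the variance check, and the two-sided alternative is assumed as well.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Run a t-test if both samples pass Shapiro-Wilk, else Mann-Whitney U."""
    _, p_norm_a = stats.shapiro(a)
    _, p_norm_b = stats.shapiro(b)
    if p_norm_a > alpha and p_norm_b > alpha:
        # Normality not rejected: check variance homogeneity (Levene, assumed choice)
        _, p_var = stats.levene(a, b)
        _, p_value = stats.ttest_ind(a, b, equal_var=bool(p_var > alpha))
        return "t-test", p_value
    # Normality rejected for at least one sample: use the non-parametric test
    _, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "mann-whitney", p_value


# Deterministic toy samples built from distribution quantiles (not survey data)
q = (np.arange(1, 101) - 0.5) / 100
bell_shaped = 35 + 5 * stats.norm.ppf(q)  # follows a normal shape
skewed = 30 + 8 * stats.expon.ppf(q)      # long right tail

test_for_bell, _ = compare_two_groups(bell_shaped, bell_shaped + 2)
test_for_skewed, _ = compare_two_groups(skewed, skewed + 2)
print(test_for_bell, test_for_skewed)
```

##### With the notebook's data, a call would look like `compare_two_groups(shop_1["price"], shop_2["price"])`.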


#### Normal Distribution Test

##### Our hypotheses
##### H0 = Price variable is normally distributed
##### H1 = Price variable is not normally distributed

In [59]:
shop_list = [shop_1,shop_2,shop_3,shop_4,shop_5,shop_6]
for i in shop_list:
    test_statistics, pvalue = shapiro(i["price"])
    print('Test statistics = %.4f, p-value = %.4f' % (test_statistics, pvalue))

Test statistics = 0.6190, p-value = 0.0000
Test statistics = 0.5513, p-value = 0.0000
Test statistics = 0.4935, p-value = 0.0000
Test statistics = 0.7639, p-value = 0.0000
Test statistics = 0.6382, p-value = 0.0000
Test statistics = 0.6631, p-value = 0.0000


##### We reject the H0 hypothesis because, as you can see, all p-values are smaller than 0.05.

##### In this case we need to use the Mann–Whitney U test. Because the data is not normally distributed, we do not have to check variance homogeneity.

#### Mann–Whitney U test

##### Our hypotheses
##### H0 = Statistically, there is no difference between prices across shops
##### H1 = Statistically, there is a difference between prices across shops

In [60]:
shop_list = [shop_1, shop_2, shop_3, shop_4, shop_5, shop_6]  # the same shop list we used above
shop_names = ["shop_1", "shop_2", "shop_3", "shop_4", "shop_5", "shop_6"]  # shop names as strings
test_stat = []
p_values = []
first_shop = []
second_shop = []
for i in range(len(shop_list)):  # nested loop that visits every ordered pair of shops
    for j in range(len(shop_list)):
        if i == j:  # make sure a shop is not compared with itself
            continue
        test_statistics, pvalue = stats.mannwhitneyu(shop_list[i]["price"], shop_list[j]["price"])
        test_stat.append(test_statistics)
        p_values.append(pvalue.round(4))
        first_shop.append(shop_names[i])
        second_shop.append(shop_names[j])

# build a data frame from the lists filled above
p_value_table = pd.DataFrame({"first_shop": first_shop, "second_shop": second_shop, "test_stat": test_stat, "p_values": p_values})

##### Let's see what our p_value table looks like

In [61]:
p_value_table

,first_shop,second_shop,test_stat,p_values
0,shop_1,shop_2,5010.0,0.0001
1,shop_1,shop_3,29424.0,0.4252
2,shop_1,shop_4,60158.0,0.0001
3,shop_1,shop_5,6121.0,0.3905
4,shop_1,shop_6,34006.0,0.4124
5,shop_2,shop_1,5010.0,0.0001
6,shop_2,shop_3,31313.0,0.0
7,shop_2,shop_4,65110.0,0.0
8,shop_2,shop_5,6575.5,0.0
9,shop_2,shop_6,36586.0,0.0


##### As you can see, all possible combinations of price comparisons between shops are included in the table.
##### If you are wondering about the duplicated records above, do not worry; this was on purpose.
##### We will use the table as is
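
##### Why the duplicates appear can be sketched with `itertools`: the nested loop visits ordered pairs, so each comparison shows up once in each direction, and every shop ends up as `first_shop` five times. The variable names `ordered_pairs`, `unordered_pairs` and `rows_per_first_shop` are introduced here for illustration only.

```python
from itertools import combinations, permutations

shop_names = ["shop_1", "shop_2", "shop_3", "shop_4", "shop_5", "shop_6"]

# Ordered pairs: (shop_1, shop_2) and (shop_2, shop_1) both appear,
# which is what a nested loop with an i == j check produces
ordered_pairs = list(permutations(shop_names, 2))

# Unordered pairs: each comparison appears only once
unordered_pairs = list(combinations(shop_names, 2))

# With ordered pairs, every shop is in the first position exactly 5 times
rows_per_first_shop = {
    name: sum(1 for first, _ in ordered_pairs if first == name)
    for name in shop_names
}

print(len(ordered_pairs), len(unordered_pairs), rows_per_first_shop["shop_1"])
# prints: 30 15 5
```

##### Keeping the ordered pairs means a later group-by on `first_shop` covers all five comparisons of each shop.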

In [62]:
p_value_table.groupby("first_shop").agg({"p_values": "mean"})

first_shop,p_values
shop_1,0.24566
shop_2,2e-05
shop_3,0.2079
shop_4,4e-05
shop_5,0.24426
shop_6,0.2276


##### Here we took the average of the p-values to understand how different each shop is from the others in terms of price.
##### If the p-value is less than 0.05, we can say that there is a significant difference between shops, as stated in our hypothesis.
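
##### A per-shop count of significant comparisons can be read next to the mean p-value. The sketch below rebuilds only the ten rows of the p-value table displayed earlier (shop_1 and shop_2 as `first_shop`); the `significant`, `mean_p` and `n_significant` columns are names introduced here for illustration and are not part of the original analysis.

```python
import pandas as pd

# The ten rows of the p-value table that are displayed in the notebook output
visible_rows = pd.DataFrame({
    "first_shop": ["shop_1"] * 5 + ["shop_2"] * 5,
    "second_shop": ["shop_2", "shop_3", "shop_4", "shop_5", "shop_6",
                    "shop_1", "shop_3", "shop_4", "shop_5", "shop_6"],
    "p_values": [0.0001, 0.4252, 0.0001, 0.3905, 0.4124,
                 0.0001, 0.0, 0.0, 0.0, 0.0],
})

# Flag each pairwise comparison that falls below the 0.05 threshold
visible_rows["significant"] = visible_rows["p_values"] < 0.05

# Mean p-value and number of significant comparisons per first_shop
summary = visible_rows.groupby("first_shop").agg(
    mean_p=("p_values", "mean"),
    n_significant=("significant", "sum"),
)
print(summary)
```

##### For shop_1 the mean reproduces the 0.24566 from the averaged table, while two of its five comparisons (against shop_2 and shop_4) are individually below 0.05.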




##### Based on this information, shop_2 and shop_4 are statistically different from the other shops in terms of price, and this difference is unlikely to be due to chance.

##### We cannot say the same for the other shops

##### Now we know that the 2 shops mentioned above differ from the other shops in terms of the price customers are willing to pay.

##### As our goal is to ensure maximum profit, we need different prices in these 2 shops, as they are not alike with any of the other shops, including each other (check row 7 of the p-value table).

##### For the other shops, which are not statistically different from each other in terms of price, we can set the same price.

##### As a company policy, we could decide that the item price should be the same across all shops. But we are not going to do that.

### Confidence intervals

##### As we decided to go on with different prices, we need to determine a separate confidence interval for each price group.

In [63]:
alike_shops = ["shop_1", "shop_3", "shop_5", "shop_6"]  # shops that are alike in terms of price
shop_1356 = df[(df["shop_id"].isin(alike_shops))]  # gather them to calculate one confidence interval
shop_1356_con_int = sms.DescrStatsW(shop_1356["price"]).tconfint_mean()  # calculate the confidence interval of the mean
shop_1356_con_int

(35.803488556367476, 36.7997387818449)

##### Here we calculated the confidence interval for the alike shops
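
##### As far as I understand it, `tconfint_mean()` with its default settings returns the usual two-sided 95% t-interval of the mean, that is mean ± t × s / √n. The sketch below computes that interval by hand; `t_confidence_interval` and the toy values are hypothetical and only meant to show the arithmetic.

```python
import numpy as np
from scipy import stats


def t_confidence_interval(values, alpha=0.05):
    """Two-sided (1 - alpha) t-interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    # Standard error of the mean, using the sample standard deviation (ddof=1)
    sem = values.std(ddof=1) / np.sqrt(n)
    # Critical value of Student's t with n - 1 degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    return mean - t_crit * sem, mean + t_crit * sem


# Made-up values, not taken from the survey data
toy_values = [30.0, 32.0, 34.0, 36.0, 38.0]
lower, upper = t_confidence_interval(toy_values)
print(lower, upper)
```

##### Calling it on `shop_1356["price"]` should give an interval close to the one above, under the assumption stated in the lead-in.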

In [64]:
shop_2_con_int = sms.DescrStatsW(shop_2["price"]).tconfint_mean() 
shop_2_con_int

(33.244221031281384, 36.24683640338621)

##### Here we calculated the confidence interval for shop_2

In [65]:
shop_4_con_int = sms.DescrStatsW(shop_4["price"]).tconfint_mean()
shop_4_con_int

(39.891734916786355, 41.15449478737718)

##### Here we calculated the confidence interval for shop_4

### Simulations of Profit

#### We will calculate the revenue with the minimum and maximum values of the confidence intervals

#### First, let's calculate the revenue with the max value of the confidence intervals

##### Shop_1356 Revenue Calculation

In [66]:
freq_1356 = len(shop_1356[shop_1356["price"] >= shop_1356_con_int[1]])  # participants willing to pay at least the upper bound of the confidence interval
revenue_1356 = freq_1356 * shop_1356_con_int[1]  # revenue = number of buyers * upper bound
print(f'Revenue: {revenue_1356}')

Revenue: 10524.725291607641


##### Shop_2 Revenue Calculation

In [67]:
freq_2 = len(shop_2[shop_2["price"] >= shop_2_con_int[1]])  # participants willing to pay at least the upper bound of the confidence interval
revenue_2 = freq_2 * shop_2_con_int[1]  # revenue = number of buyers * upper bound
print(f'Revenue: {revenue_2}')

Revenue: 724.9367280677243


##### Shop_4 Revenue Calculation

In [68]:
freq_4 = len(shop_4[shop_4["price"] >= shop_4_con_int[1]])  # participants willing to pay at least the upper bound of the confidence interval
revenue_4 = freq_4 * shop_4_con_int[1]  # revenue = number of buyers * upper bound
print(f'Revenue: {revenue_4}')

Revenue: 20412.629414539082


In [69]:
total_revenue_with_max = revenue_1356 + revenue_2 + revenue_4 # here we calculate total revenue
total_revenue_with_max.round(2)

31662.29

##### As you can see, our total revenue is 31662.29 in the scenario where we price at the max value of each confidence interval
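
##### The revenue rule used in these cells (count the participants whose stated price is at least the asking price, then multiply by that price) can be wrapped in one helper. This is a minimal sketch: `revenue_at_price`, `toy_prices` and `toy_interval` are hypothetical names with made-up values, not the survey data.

```python
import pandas as pd


def revenue_at_price(willingness_to_pay, price):
    """Revenue if every participant willing to pay at least `price` buys one item."""
    buyers = int((willingness_to_pay >= price).sum())
    return buyers * price


# Made-up answers standing in for one shop's price column
toy_prices = pd.Series([30.0, 32.0, 35.0, 40.0, 55.0])
toy_interval = (33.0, 36.0)  # made-up (lower, upper) bounds

revenue_lower = revenue_at_price(toy_prices, toy_interval[0])  # 3 buyers at 33.0
revenue_upper = revenue_at_price(toy_prices, toy_interval[1])  # 2 buyers at 36.0
print(revenue_lower, revenue_upper)
```

##### With the notebook's variables, a call would look like `revenue_at_price(shop_2["price"], shop_2_con_int[1])`.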

##### Now let's calculate the revenue with the min value of the confidence intervals

##### Shop_1356 Revenue Calculation

In [70]:
freq_1356 = len(shop_1356[shop_1356["price"] >= shop_1356_con_int[0]])  # participants willing to pay at least the lower bound of the confidence interval
revenue_1356 = freq_1356 * shop_1356_con_int[0]  # revenue = number of buyers * lower bound
print(f'Revenue: {revenue_1356}')

Revenue: 12602.827971841352


##### Shop_2 Revenue Calculation

In [71]:
freq_2 = len(shop_2[shop_2["price"] >= shop_2_con_int[0]])  # participants willing to pay at least the lower bound of the confidence interval
revenue_2 = freq_2 * shop_2_con_int[0]  # revenue = number of buyers * lower bound
print(f'Revenue: {revenue_2}')

Revenue: 1662.2110515640693


##### Shop_4 Revenue Calculation

In [72]:
freq_4 = len(shop_4[shop_4["price"] >= shop_4_con_int[0]])  # participants willing to pay at least the lower bound of the confidence interval
revenue_4 = freq_4 * shop_4_con_int[0]  # revenue = number of buyers * lower bound
print(f'Revenue: {revenue_4}')

Revenue: 21780.88726456535


In [73]:
total_revenue_with_min = revenue_1356 + revenue_2 + revenue_4 # here we calculate total revenue
total_revenue_with_min.round(2)

36045.93

##### As you can see, our total revenue is 36045.93 in the scenario where we price at the min value of each confidence interval

### Conclusion

##### In this project our goal was to set an optimum price for a particular item sold in different shops

##### We evaluated the survey results by applying an A/B test to see whether there is a statistically significant difference between the prices people are willing to pay across the shops

##### We found that 2 shops are not alike with any other shop according to our A/B test results

##### Then we determined confidence intervals for the different shop groups

##### Finally, we simulated sales at the maximum and minimum values of the confidence intervals

##### According to our findings above, setting the price to the minimum value of the confidence interval resulted in higher revenue.
##### However, this does not mean that it is more profitable to sell the product at the min value of the confidence interval, as we do not know the cost of the product.
##### But our job is done here. We made the necessary tests and provided the information needed.
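
##### To see why higher revenue does not have to mean higher profit, here is a small sketch with a unit cost included. Everything in it is hypothetical: the function names, the five answers and the unit cost of 25 are made up for illustration, since the real cost of the product is not known.

```python
def buyers_at_price(willingness_to_pay, price):
    """Count participants whose stated price is at least the asking price."""
    return sum(1 for amount in willingness_to_pay if amount >= price)


def profit_at_price(willingness_to_pay, price, unit_cost):
    """Profit = number of buyers * (price - unit cost)."""
    return buyers_at_price(willingness_to_pay, price) * (price - unit_cost)


# Made-up survey answers and a made-up unit cost, for illustration only
answers = [30.0, 32.0, 35.0, 40.0, 55.0]
unit_cost = 25.0

for price in (32.0, 40.0):
    revenue = buyers_at_price(answers, price) * price
    profit = profit_at_price(answers, price, unit_cost)
    print(f"price={price}: revenue={revenue}, profit={profit}")
# prints:
# price=32.0: revenue=128.0, profit=28.0
# price=40.0: revenue=80.0, profit=30.0
```

##### In this toy case the lower price brings more revenue but the higher price brings more profit, which is why the cost is needed before choosing between the two bounds.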