### Introduction

The goal here is to evaluate whether a pricing test running on the site has been successful.
As always, you should focus on user segmentation and provide insights about segments
who behave differently as well as any other insights you might find.

Questions:
1. Should the company sell its software for $39 or $59?
The VP of Product is interested in having a holistic view into user behavior, especially
focusing on actionable insights that might increase conversion rate. What are your main
findings looking at the data?

2. The VP of Product feels that the test has been running for too long and that she should
have been able to get statistically significant results in a shorter time. Do you agree with
her intuition? After how many days would you have stopped the test? Please explain
why.

### My Investigation

To compare the conversion proportions of the two groups, I used a statistical test. My analysis plan is to use the sample data to complete the following computations, which give the test statistic and its associated P-value.

Pooled sample proportion. Since the null hypothesis states that P1 = P2, we use a pooled sample proportion (p) to compute the standard error of the sampling distribution.

p = (p1 * n1 + p2 * n2) / (n1 + n2)

where p1 is the sample proportion from population 1, p2 is the sample proportion from population 2, n1 is the size of sample 1, and n2 is the size of sample 2.

Standard error. Compute the standard error (SE) of the sampling distribution of the difference between two proportions.

SE = sqrt( p * ( 1 - p ) * ( (1/n1) + (1/n2) ) )

where p is the pooled sample proportion, n1 is the size of sample 1, and n2 is the size of sample 2.

Test statistic. The test statistic is a z-score (z) defined by the following equation.

z = (p1 - p2) / SE

where p1 is the proportion from sample 1, p2 is the proportion from sample 2, and SE is the standard error of the sampling distribution.

P-value. The P-value is the probability, under the null hypothesis, of observing a sample statistic at least as extreme as the test statistic. Since the test statistic is a z-score, this probability is read from the standard normal distribution.
The analysis described above is a two-proportion z-test.
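The steps above can be sketched as a small function. This is a minimal illustration that uses only the Python standard library (`statistics.NormalDist`) rather than `scipy`; the function name `two_proportion_ztest` and the counts in the usage line are hypothetical and are not taken from the test data.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided two-proportion z-test.

    x1, x2: number of conversions in each sample
    n1, n2: size of each sample
    Returns (z, p_value).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis P1 = P2;
    # (x1 + x2) / (n1 + n2) equals (p1 * n1 + p2 * n2) / (n1 + n2)
    p = (x1 + x2) / (n1 + n2)
    # Standard error of the difference between the two proportions
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, for illustration only
z, p_value = two_proportion_ztest(x1=200, n1=10_000, x2=150, n2=10_000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

Working from raw conversion counts rather than proportions avoids rounding the proportions before pooling them.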


### Hypothesis

The null hypothesis is that there is no difference between the two population proportions (i.e., d = P1 - P2 = 0), while the alternative hypothesis is that there is a difference between the proportions (i.e., d ≠ 0). The null and alternative hypotheses for a two-tailed test are often stated in the following form.

Ho: P1 = P2
Ha: P1 ≠ P2
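The choice between a two-tailed and a one-tailed alternative changes how a z-score is converted into a P-value. The sketch below illustrates the difference; the helper name `p_values_from_z` and the z-score passed to it are illustrative assumptions, not values from the test.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (two_sided, upper_tail) P-values for a z-score."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # One-tailed alternative Ha: P1 > P2 (only large positive z counts as extreme)
    upper_tail = 1 - std_normal.cdf(z)
    # Two-tailed alternative Ha: P1 != P2 (large |z| in either direction counts)
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    return two_sided, upper_tail


two_sided, upper_tail = p_values_from_z(1.96)  # illustrative z-score
print(f"two-sided p = {two_sided:.3f}, one-sided p = {upper_tail:.3f}")
```

For a positive z-score the one-tailed P-value is half the two-tailed one, so the alternative should be fixed before looking at the data.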

### Findings

### Recommendation

In [52]:
import numpy as np
import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt 
from matplotlib import dates
%matplotlib inline

import scipy.stats

In [14]:
users_df = pd.read_csv('Pricing_Test_data/user_table.csv')
users_df.head(10)

Unnamed: 0,user_id,city,country,lat,long
0,510335,Peabody,USA,42.53,-70.97
1,89568,Reno,USA,39.54,-119.82
2,434134,Rialto,USA,34.11,-117.39
3,289769,Carson City,USA,39.15,-119.74
4,939586,Chicago,USA,41.84,-87.68
5,229234,New York,USA,40.67,-73.94
6,339138,Durham,USA,35.98,-78.91
7,270353,New York,USA,40.67,-73.94
8,166748,Burke,USA,38.78,-77.27
9,167700,New York,USA,40.67,-73.94


In [15]:
test_results_df = pd.read_csv('Pricing_Test_data/test_results.csv')
test_results_df.head(10)

Unnamed: 0,user_id,timestamp,source,device,operative_system,test,price,converted
0,604839,2015-05-08 03:38:34,ads_facebook,mobile,iOS,0,39,0
1,624057,2015-05-10 21:08:46,seo-google,mobile,android,0,39,0
2,317970,2015-04-04 15:01:23,ads-bing,mobile,android,0,39,0
3,685636,2015-05-07 07:26:01,direct_traffic,mobile,iOS,1,59,0
4,820854,2015-05-24 11:04:40,ads_facebook,web,mac,0,39,0
5,169971,2015-04-13 12:07:08,ads-google,mobile,iOS,0,39,0
6,600150,2015-03-04 14:45:44,seo_facebook,web,windows,0,39,0
7,798371,2015-03-15 08:19:29,ads-bing,mobile,android,1,59,1
8,447194,2015-03-28 12:28:10,ads_facebook,web,windows,1,59,0
9,431639,2015-04-24 12:42:18,ads_facebook,web,windows,1,59,0


In [16]:
# Sample a few observations from the dataset
indices = [56, 245, 392]

# Create a DataFrame of the chosen samples
samples = pd.DataFrame(users_df.loc[indices], columns = users_df.keys()).reset_index(drop = True)
print ("Chosen samples of users dataset:")
display(samples)

Chosen samples of users dataset:


Unnamed: 0,user_id,city,country,lat,long
0,555109,Dearborn,USA,42.31,-83.21
1,521380,Brandon,USA,27.93,-82.29
2,504048,Raleigh,USA,35.82,-78.66


### Describe the customers ...


In [27]:
# sanity check: does any user have more than one entry in the test table?
sum(test_results_df.groupby('user_id').count().timestamp == 1)

# Great, now I'll bring in the user data
joined_table = test_results_df.set_index('user_id').join(users_df.set_index('user_id'), on = 'user_id')

joined_table.head()

Unnamed: 0_level_0,timestamp,source,device,operative_system,test,price,converted,city,country,lat,long
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
604839,2015-05-08 03:38:34,ads_facebook,mobile,iOS,0,39,0,Buffalo,USA,42.89,-78.86
624057,2015-05-10 21:08:46,seo-google,mobile,android,0,39,0,Lakeville,USA,44.68,-93.24
317970,2015-04-04 15:01:23,ads-bing,mobile,android,0,39,0,Parma,USA,41.38,-81.73
685636,2015-05-07 07:26:01,direct_traffic,mobile,iOS,1,59,0,Fayetteville,USA,35.07,-78.9
820854,2015-05-24 11:04:40,ads_facebook,web,mac,0,39,0,Fishers,USA,39.95,-86.02


In [28]:
# another sanity check: how did that join work?
print(sum(joined_table.groupby('user_id').count().timestamp == 1))
print(sum(joined_table.groupby('user_id').count().city == 1))

316800
275616


In [42]:
# making sure that the total number of users adds up (275616 + 41184 = 316800)
joined_table.isna().sum()

timestamp               0
source                  0
device                  0
operative_system        0
test                    0
price                   0
converted               0
city                41184
country             41184
lat                 41184
long                41184
dtype: int64

In [43]:
# I should probably keep this in mind in my segmentation analysis. 
# for now I'll assume that things are properly randomized and
# look at the results.
joined_table.groupby(['test']).count()



In [44]:
# now I can look at the conversion
# probability.
joined_table.groupby(['test']).mean()

Unnamed: 0_level_0,price,converted,lat,long
test,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,39.020718,0.019904,37.096686,-93.984342
1,58.972824,0.015543,37.138351,-93.977199


Before making the comparison, let me think about what sort of change would be meaningful from a business perspective. A higher price would probably mean fewer conversions, so how many customers could we lose at the higher price and still increase our revenue? I assume here that the price change does not affect the overall user base, i.e. a customer who does not convert at the higher price is simply lost. Under that assumption, revenue per visitor is the conversion probability times the price, so the new conversion probability should not drop below (P_con_old*price_old)/price_new.
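The break-even condition can be written out step by step. As a sketch, let R denote expected revenue per visitor (a symbol introduced here only for illustration), so that R is the conversion probability times the price. Requiring the new price to earn at least as much per visitor as the old one gives:

$$
R_{\text{new}} \ge R_{\text{old}}
\;\iff\;
P_{\text{con,new}} \cdot \text{price}_{\text{new}} \ge P_{\text{con,old}} \cdot \text{price}_{\text{old}}
\;\iff\;
P_{\text{con,new}} \ge \frac{P_{\text{con,old}} \cdot \text{price}_{\text{old}}}{\text{price}_{\text{new}}}
$$

With the control conversion rate of about 0.0199 shown above and prices of 39 and 59, the right-hand side is roughly 0.0199 × 39 / 59 ≈ 0.0132.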

In [36]:
price_old = 39
price_new = 59

P_con_old = joined_table.groupby(['test']).mean().converted[0]
P_con_new = joined_table.groupby(['test']).mean().converted[1]

n_old = joined_table.groupby(['test']).count().timestamp[0]
n_new = joined_table.groupby(['test']).count().timestamp[1]

rev_thresh = (price_old*P_con_old)/price_new
print('rev_thresh = ' + str(rev_thresh))

rev_thresh = 0.013156626348885488


So I need to analyze the test results with this in mind. Given the point estimate P_con_new of the new conversion probability, I want to test whether an estimate this high could plausibly have arisen if the true new conversion probability were only at this key value. To do this, I center the sampling distribution on the threshold and use the pooled standard error.

The pooled proportion is:

p = (p1 n1 + p2 n2) / (n1 + n2)

The pooled standard error is then:

SE = sqrt( p ( 1 - p ) ((1/n1) + (1/n2)) )

In [56]:
p = (P_con_old * n_old + P_con_new * n_new) / (n_old + n_new)
SE =  np.sqrt( p * ( 1 - p ) * ((1/n_old) + (1/n_new)))
print('standard_error = ' + str(SE))

# z-statistic: distance of the new conversion rate from the break-even threshold, in standard errors
z_statistic= (P_con_new- rev_thresh)/SE
print('Z-statistic = ' + str(z_statistic))

# one-sided (upper-tail) P-value
print('p = ' + str(1-scipy.stats.norm.cdf(z_statistic)))

standard_error = 0.0004965329774424502
Z-statistic = 4.8054259645381165
p = 7.721118726600196e-07


The point estimate for the new conversion rate works out to be about 4.8 standard errors above the key business threshold. The probability of observing a value this high if the true conversion rate were only at the threshold (the null hypothesis here) is very low. Given this result, I would increase the price.

Also, since the p-value is so low, it is worth mentioning that the experiment is probably overpowered. Had we known the original conversion rate before running the test, we could have performed a power calculation up front to determine the sample size needed to safely reject this null hypothesis.
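A power calculation of this kind could look roughly like the sketch below. It uses the standard normal-approximation sample-size formula for a two-sided comparison of two proportions, which is a related but simpler test than the threshold test used above. The function name, the significance level, the power, the smallest drop worth detecting, and the daily traffic figure are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # average rate under the null
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Baseline rate close to the control group's; the smallest drop worth
# detecting (here 0.0199 -> 0.0155) is an assumed design choice.
n = sample_size_per_group(p1=0.0199, p2=0.0155)

# Hypothetical traffic figure, only to show how a sample size maps to days
daily_visitors_per_group = 1_000
days = math.ceil(n / daily_visitors_per_group)
print(f"about {n} users per group, roughly {days} days at the assumed traffic")
```

Dividing the required sample size by the actual daily traffic per group gives a planned stopping day, which is preferable to stopping as soon as the running p-value dips below the significance level.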

In [57]:
# expected revenue per user at the old price, with an approximate 95% margin
print(round(P_con_old*price_old, 4))
print(round(SE*1.96*price_old, 4))

# expected revenue per user at the new price, with an approximate 95% margin
print(round(P_con_new*price_new, 4))
print(round(SE*1.96*price_new, 4))

0.7762
0.038
0.917
0.0574


In real dollars this means that with the new price we would make on average around \$0.92±0.06 per user, compared with around \$0.78±0.04 per user at the old price.
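The margins printed above multiply the pooled standard error of the difference by the price. An alternative is to give each group its own margin based on its own binomial standard error, sqrt(p * (1 - p) / n). The sketch below shows that variant; the function name `revenue_per_user` and the conversion rate and group size in the usage line are hypothetical, not the experiment's actual figures.

```python
import math
from statistics import NormalDist


def revenue_per_user(p_conv, n, price, confidence=0.95):
    """Expected revenue per user and its approximate margin of error.

    Uses the group's own binomial standard error sqrt(p * (1 - p) / n).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = math.sqrt(p_conv * (1 - p_conv) / n)
    return p_conv * price, z * se * price


# Hypothetical conversion rate and group size, for illustration only
revenue, margin = revenue_per_user(p_conv=0.02, n=10_000, price=39)
print(f"${revenue:.2f} ± ${margin:.2f} per user")
```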