Pricing Data Challenge:

The goal here is to evaluate whether a pricing test running on the site has been successful. 

Company XYZ sells a software product for \$39. Since revenue has been flat for some time, the VP of Product has decided to run a test increasing the price. She hopes that this will increase revenue. In the experiment, 66% of the users have seen the old price (\$39), while a random sample of 33% of users have seen a higher price (\$59).
The test has been running for some time and we are interested in understanding how it went and whether it would make sense to increase the price for all the users.

We want to answer the following questions:

* Should the company sell its software for \$39 or \$59?
    * Switch to the higher price: it results in fewer converting users, but not by enough to lose revenue (one-sided p = 7.9e-7).
    
    
* Has the test been running too long, and could statistically significant results have been obtained in a shorter time?
    * Yes, far too long.

In [1]:
DATA_DIR = '/Users/theodorelindsay/src.git/ds_projects/09_ds_challenge2_abtest/Pricing_Test_data/'

In [2]:
import os
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import scipy.stats

In [3]:
os.listdir(DATA_DIR)

['user_table.csv', 'test_results.csv']

In [4]:
user_table = pd.read_csv(DATA_DIR + 'user_table.csv')
test_results = pd.read_csv(DATA_DIR + 'test_results.csv')

In [5]:
# Sanity check: does any user appear more than once in the test table?
# Count the users that have exactly one entry.
sum(test_results.groupby('user_id').count().timestamp == 1)

316800

In [6]:
# Great, now I'll bring in the user data
joined_table = test_results.set_index('user_id').join(user_table.set_index('user_id'), on = 'user_id')

In [7]:
joined_table.head()

user_id,timestamp,source,device,operative_system,test,price,converted,city,country,lat,long
604839,2015-05-08 03:38:34,ads_facebook,mobile,iOS,0,39,0,Buffalo,USA,42.89,-78.86
624057,2015-05-10 21:08:46,seo-google,mobile,android,0,39,0,Lakeville,USA,44.68,-93.24
317970,2015-04-04 15:01:23,ads-bing,mobile,android,0,39,0,Parma,USA,41.38,-81.73
685636,2015-05-07 07:26:01,direct_traffic,mobile,iOS,1,59,0,Fayetteville,USA,35.07,-78.9
820854,2015-05-24 11:04:40,ads_facebook,web,mac,0,39,0,Fishers,USA,39.95,-86.02


In [8]:
# Another sanity check: how did that join work?
print(sum(joined_table.groupby('user_id').count().timestamp == 1))
print(sum(joined_table.groupby('user_id').count().city == 1))

316800
275616


In [9]:
# It seems some users in the experiment table don't exist
# in the user_table; let me check on that.
joined_table.isna().country[:10]

user_id
604839    False
624057    False
317970    False
685636    False
820854    False
169971    False
600150     True
798371    False
447194    False
431639    False
Name: country, dtype: bool

In [10]:
user_table[user_table.user_id == 600150]

user_id,city,country,lat,long


In [11]:
# I should probably keep this in mind in my segmentation analysis. 
# for now I'll assume that things are properly randomized and
# look at the results.
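
One way to make the unmatched users explicit is a left merge with an indicator column. The sketch below uses tiny made-up frames in place of `test_results` and `user_table` (the values are hypothetical), and `merge` here is an alternative to the `join` used in this notebook, not what the notebook ran.

```python
import pandas as pd

# Toy stand-ins for test_results and user_table (hypothetical values).
test_results = pd.DataFrame({'user_id': [1, 2, 3, 4], 'test': [0, 1, 0, 1]})
user_table = pd.DataFrame({'user_id': [1, 2, 4], 'country': ['USA', 'USA', 'USA']})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left merge.
merged = test_results.merge(user_table, on='user_id', how='left', indicator=True)
missing = merged['_merge'] == 'left_only'

# Number of experiment rows with no user record, and the share per test arm.
n_missing = int(missing.sum())
missing_rate_by_arm = missing.groupby(merged['test']).mean()
print(n_missing)
print(missing_rate_by_arm)
```

If the missing share differs noticeably between arms, that would be a reason to doubt the randomization before reading the results.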

In [12]:
joined_table.groupby(['test']).count()

test,timestamp,source,device,operative_system,price,converted,city,country,lat,long
0,202727,202727,202727,202727,202727,202727,176428,176428,176428,176428
1,114073,114073,114073,114073,114073,114073,99188,99188,99188,99188
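
The counts above already allow a quick check of two things: how traffic was split between the arms, and whether the users without location data are spread evenly across them. A minimal sketch using only the numbers from this table (the dictionary names are illustrative):

```python
# Row counts per arm (timestamp column) and rows with location data (city column),
# copied from the groupby output above.
rows = {0: 202727, 1: 114073}
with_location = {0: 176428, 1: 99188}

total = sum(rows.values())
traffic_share = {arm: rows[arm] / total for arm in rows}
missing_share = {arm: 1 - with_location[arm] / rows[arm] for arm in rows}

for arm in rows:
    print(arm, round(traffic_share[arm], 3), round(missing_share[arm], 3))
```

The split works out to roughly 64% / 36% rather than exactly 66% / 33%, and about 13% of users lack location data in both arms, so the missing records do not look tied to the treatment.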


In [13]:
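# Now I can look at the conversion probability.
# numeric_only=True skips the text columns, which recent pandas
# versions refuse to average.
joined_table.groupby(['test']).mean(numeric_only=True)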

test,price,converted,lat,long
0,39.020718,0.019904,37.096686,-93.984342
1,58.972824,0.015543,37.138351,-93.977199


In [14]:
# About a 2 percent conversion rate in the control group.
# With a rate this low, can I use the normal approximation
# when comparing these samples? The rule of thumb
# is that n*p and n*(1-p) should both be > 10.
print(202727*0.019904)

4035.0782080000004
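
The cell above checks only `n*p` for the control group. A small sketch that applies the same rule of thumb to both arms and to both `n*p` and `n*(1-p)`, using the counts and rates printed earlier (the helper name `normal_approx_ok` is made up for this example):

```python
def normal_approx_ok(n, p, minimum=10):
    """Rule of thumb: both expected successes and expected failures exceed `minimum`."""
    return n * p > minimum and n * (1 - p) > minimum

# (sample size, conversion rate) per arm, copied from the groupby outputs above.
groups = {'control': (202727, 0.019904), 'test': (114073, 0.015543)}

results = {}
for name, (n, p) in groups.items():
    results[name] = normal_approx_ok(n, p)
    print(name, round(n * p), round(n * (1 - p)), results[name])
```

Both arms have thousands of expected conversions, so the normal approximation is comfortable here.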


OK. Before making the comparison, let me think about what sort of change would be meaningful from a business perspective. A higher price would *probably* mean fewer conversions, so how many customers could we lose at the higher price and still increase our revenue? This assumes that the price change doesn't alter the overall user base, i.e. customers don't leave entirely. Since revenue per user is conversion probability times price, the new conversion probability must not fall below (P_con_old*price_old)/price_new.
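
Written out, the break-even condition follows from comparing expected revenue per visitor in the two arms. The symbol R is introduced only for this illustration, and the numeric value uses the rounded control conversion rate from the table above:

```latex
\begin{aligned}
R_{\text{old}} &= P_{\text{con,old}} \cdot \text{price}_{\text{old}},
\qquad
R_{\text{new}} = P_{\text{con,new}} \cdot \text{price}_{\text{new}} \\
R_{\text{new}} \ge R_{\text{old}}
&\iff
P_{\text{con,new}} \ge \frac{P_{\text{con,old}} \cdot \text{price}_{\text{old}}}{\text{price}_{\text{new}}}
= \frac{0.019904 \times 39}{59} \approx 0.0132
\end{aligned}
```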

In [15]:
price_old = 39
price_new = 59
# Conversion rate and sample size per arm (test == 0: old price, test == 1: new price)
conv_by_group = joined_table.groupby('test').converted.mean()
n_by_group = joined_table.groupby('test').converted.count()
P_con_old, P_con_new = conv_by_group[0], conv_by_group[1]
n_old, n_new = n_by_group[0], n_by_group[1]
# Break-even conversion rate at the new price
rev_thresh = (price_old*P_con_old)/price_new
print('rev_thresh = ' + str(rev_thresh))

rev_thresh = 0.0131566263489


So I need to analyze the test results with this threshold in mind. Given the point estimate P_con_new of the new conversion probability, what I want to calculate is the probability of seeing an estimate this high if the true conversion rate were only at the break-even value. To do this I center a normal sampling distribution on the threshold and use the pooled standard error as its width.

The pooled proportion is:

p = (p1 * n1 + p2 * n2) / (n1 + n2)

The pooled standard error is then:

SE = sqrt( p * ( 1 - p ) * ((1/n1) + (1/n2)) )

In [16]:
p = (P_con_old * n_old + P_con_new * n_new) / (n_old + n_new)

In [17]:
SE =  np.sqrt( p * ( 1 - p ) * ((1/n_old) + (1/n_new)))

The point estimate for the new conversion rate works out to be about 4.8 standard errors above the break-even business threshold. Under the null hypothesis that the true new conversion rate sits at that threshold, the probability of observing a value this high is very low. Given this result, I would increase the price.

Also, since the p-value is so low, it is worth mentioning that the experiment is probably over-powered. Assuming that we had access to the original conversion rate before running the test, we could have performed a power calculation up front to determine the sample size needed to safely reject this null hypothesis.
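
As a rough illustration of such a power calculation, the sketch below uses the textbook normal-approximation sample size for a one-sided, one-sample test of a proportion. It is a simplification of the analysis in this notebook: the threshold is treated as a fixed number, and `assumed_new_rate` is a hypothetical planning value that would have to be guessed before the test. The significance level and power are conventional defaults, not values taken from the challenge.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_sample_size(p_null, p_alt, alpha=0.05, power=0.8):
    """Users needed in the test arm to detect p_alt > p_null (one-sided, normal approx.)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(p_null * (1 - p_null))
                 + z_beta * sqrt(p_alt * (1 - p_alt)))
    return ceil((numerator / (p_alt - p_null)) ** 2)

rev_thresh = 0.0131566      # break-even conversion rate computed in this notebook
assumed_new_rate = 0.0155   # ASSUMPTION: hypothetical true rate at the new price

n_needed = required_sample_size(rev_thresh, assumed_new_rate)
print(n_needed)
```

Under these assumed inputs the required test-arm size comes out well below the roughly 114,000 users who actually saw the higher price, which is consistent with the test having run longer than necessary. A smaller assumed gap between the true rate and the threshold would raise the requirement quickly.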

In [18]:
# z-score: distance of the new conversion rate from the threshold, in standard errors
(P_con_new-rev_thresh)/SE

4.8054259645381165

In [19]:
# One-sided p-value, using the z-score above rounded to 4.8
print('p = ' + str(1-scipy.stats.norm.cdf(4.8)))

p = 7.93328151949e-07
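
For reference, the whole calculation can be reproduced from the summary numbers alone. This is a self-contained sketch using the rounded rates and counts printed earlier, so the last digits differ slightly from the cells above. The function name is illustrative, and `math.erfc` stands in for `scipy.stats.norm` to keep the example dependency-free.

```python
from math import erfc, sqrt

def z_test_against_threshold(p_old, n_old, p_new, n_new, threshold):
    """z-score of p_new above `threshold`, using the pooled standard error."""
    pooled = (p_old * n_old + p_new * n_new) / (n_old + n_new)
    se = sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (p_new - threshold) / se
    p_value = 0.5 * erfc(z / sqrt(2))  # upper-tail probability of a standard normal
    return z, p_value

# Rounded conversion rates and sample sizes from the groupby outputs above.
p_old, n_old = 0.019904, 202727
p_new, n_new = 0.015543, 114073
threshold = 39 * p_old / 59  # break-even conversion rate at the new price

z, p_value = z_test_against_threshold(p_old, n_old, p_new, n_new, threshold)
print(z, p_value)
```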


In real dollars this means that with the new price we would on average make around \$0.92±0.06 per user, compared with \$0.78±0.04 per user at the old price.

In [20]:
print(P_con_new*price_new)
print(SE*1.96*price_new)
print(P_con_old*price_old)
print(SE*1.96*price_old)

0.917018049845
0.0574190735114
0.776240954584
0.0379549807957
