### Goal: Evaluate whether a pricing test running on the site has been successful.
As always, you should focus on user segmentation and provide insights about segments
who behave differently as well as any other insights you might find.

- Should the company sell its software for \$39 or \$59?
- The VP of Product is interested in having a holistic view into user behavior, especially focusing on actionable insights that might increase conversion rate. What are your main findings looking at the data?
- [Bonus] The VP of Product feels that the test has been running for too long and that she should have been able to get statistically significant results in a shorter time. Do you agree with her intuition? After how many days would you have stopped the test? Why?

### Summary

The company should sell its software for \$59. At \$59, revenue per viewer is higher: \$0.92/viewer compared to \$0.78/viewer at \$39 (p < 0.001). As expected, the conversion rate does decrease at the higher price, dropping from about 0.0199 to about 0.0156. However, the higher price per sale more than makes up for the fewer units sold, leading to higher revenue per viewer overall.

Referrals from a friend have the highest conversion rate. To increase conversion rates, promoting referrals may have the greatest effect. 

I agree with the VP. A power analysis indicates that we needed far less data than we collected. Using an effect size of 0.15 (based on the roughly \$0.14 difference in revenue per viewer), even with a highly conservative significance level of 0.001 and power of 0.99, we would have needed only about 3,530 samples in the control group and about 2/3 that number in the test group.

Plan of Attack:
- Clean data
    - Check that test = 0/1 matches price = 39/59
    - Check for imbalanced data
    - A/A check: are the groups random with respect to source, device, and access method?
    - Effects of time: do we need to worry about seasonal effects? No, if the data were split truly at random
- Check assumptions of t-test
- Determine metric for 'did changing price result in an increase in revenue?'

My main aim is to evaluate whether increasing the price of the software increased revenue. To test this, I use:

Ho: Increasing the price did not increase revenue.

Ha: Increasing price to \$59 did increase revenue
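Stated in terms of mean revenue per viewer (notation introduced here for clarity: $\mu_{39}$ for the control group and $\mu_{59}$ for the test group), the hypotheses are one-sided:

$$
H_0:\ \mu_{59} \le \mu_{39} \qquad\qquad H_a:\ \mu_{59} > \mu_{39}, \qquad \text{where } \mu_g = \text{price}_g \times \text{conversion rate}_g
$$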

In [172]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
#import seaborn as sns
import statsmodels.api as sm
from datetime import datetime
from scipy.stats import mannwhitneyu, ttest_ind
%matplotlib inline

In [9]:
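# note: the names are swapped relative to the contents: user_data holds
# test_results.csv (one row per viewer) and test_data holds user_table.csv (user locations)
user_data_file = 'data/Pricing_Test_data/test_results.csv'
test_data_file = 'data/Pricing_Test_data/user_table.csv'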

In [10]:
user_data = pd.read_csv(user_data_file)
test_data = pd.read_csv(test_data_file)

In [11]:
user_data.head()

|   | user_id | timestamp | source | device | operative_system | test | price | converted |
|---|---|---|---|---|---|---|---|---|
| 0 | 604839 | 2015-05-08 03:38:34 | ads_facebook | mobile | iOS | 0 | 39 | 0 |
| 1 | 624057 | 2015-05-10 21:08:46 | seo-google | mobile | android | 0 | 39 | 0 |
| 2 | 317970 | 2015-04-04 15:01:23 | ads-bing | mobile | android | 0 | 39 | 0 |
| 3 | 685636 | 2015-05-07 07:26:01 | direct_traffic | mobile | iOS | 1 | 59 | 0 |
| 4 | 820854 | 2015-05-24 11:04:40 | ads_facebook | web | mac | 0 | 39 | 0 |


In [12]:
test_data.head()

|   | user_id | city | country | lat | long |
|---|---|---|---|---|---|
| 0 | 510335 | Peabody | USA | 42.53 | -70.97 |
| 1 | 89568 | Reno | USA | 39.54 | -119.82 |
| 2 | 434134 | Rialto | USA | 34.11 | -117.39 |
| 3 | 289769 | Carson City | USA | 39.15 | -119.74 |
| 4 | 939586 | Chicago | USA | 41.84 | -87.68 |


Clean the data, checking:
- Check that test = 0/1 matches price = 39/59
- Check for imbalanced data
- A/A check: are the groups random with respect to source, device, and access method?
- Effects of time: do we need to worry about seasonal effects? No, if the data were split truly at random

In [13]:
user_data.shape

(316800, 8)

In [139]:
# rows assigned to the test group (test=1) that were shown the control price ($39)
bad_data = user_data[(user_data.test==1) & (user_data.price==39)].index
bad_data

Int64Index([  1457,   1912,   2337,   3147,   4277,  11792,  11975,  13838,
             16633,  18009,
            ...
            304776, 307648, 308539, 311541, 312755, 313723, 314391, 314402,
            314696, 315864],
           dtype='int64', length=155)

In [140]:
clean_user = user_data.drop(bad_data)

In [141]:
# rows assigned to the control group (test=0) that were shown the test price ($59)
bad_data2 = clean_user[(clean_user.test==0) & (clean_user.price==59)].index
bad_data2

Int64Index([  8238,   8369,  11555,  12848,  14630,  15020,  15724,  17252,
             20223,  20623,
            ...
            305508, 307444, 308723, 310426, 312287, 312725, 313735, 314275,
            315529, 316663],
           dtype='int64', length=210)

In [142]:
clean_user = clean_user.drop(bad_data2)
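The two drop steps above amount to a single consistency rule: keep a row only if its price is the one its group should see. Below is a minimal library-free sketch of that rule, assuming each row is reduced to a `(test, price)` pair; the helper name and sample rows are hypothetical. In pandas the same idea could be written as `df[df['price'] == df['test'].map({0: 39, 1: 59})]`.

```python
# Expected price for each group: control (test=0) sees $39, test (test=1) sees $59
EXPECTED_PRICE = {0: 39, 1: 59}

def consistent_rows(rows):
    """Keep only (test, price) rows whose price matches the group's expected price."""
    return [(test, price) for test, price in rows if EXPECTED_PRICE.get(test) == price]

# Hypothetical rows: two valid ones and one mismatch of each kind
rows = [(0, 39), (1, 59), (1, 39), (0, 59)]
kept = consistent_rows(rows)
print(kept)  # [(0, 39), (1, 59)]
```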

In [143]:
#Check for imbalanced data
sum(clean_user['test']==1)/len(clean_user['test'])

0.36000442428934853

A 36/64 split is balanced enough not to cause problems for a t-test or most other basic statistical analyses.

In [58]:
#Check whether the split is random across sources (and devices, below): ratio of test to control users
source_tab = pd.crosstab(index=clean_user['test'],columns=clean_user['source'])
source_tab.iloc[1]/source_tab.iloc[0]

source
ads-bing           0.565483
ads-google         0.568027
ads-yahoo          0.573358
ads_facebook       0.565857
ads_other          0.548920
direct_traffic     0.556933
friend_referral    0.575947
seo-bing           0.613243
seo-google         0.534199
seo-other          0.572376
seo-yahoo          0.570478
seo_facebook       0.573657
dtype: float64

In [59]:
device_tab = pd.crosstab(index=clean_user['test'],columns=clean_user['device'])
device_tab.iloc[1]/device_tab.iloc[0]

device
mobile    0.547467
web       0.584555
dtype: float64
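Comparing these ratios by eye is informal. A chi-square test of independence between group and category (for example, control/test by mobile/web) formalizes the A/A check. The notebook prints only ratios, so the counts below are hypothetical; the function is a library-free sketch for a 2×2 table without continuity correction, using the fact that with 1 degree of freedom the upper tail is `erfc(sqrt(x / 2))`.

```python
import math

def chi2_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table [[a, b], [c, d]].

    Rows = group (control, test), columns = category (e.g. mobile, web).
    Returns (statistic, p_value); with 1 degree of freedom,
    P(chi2 > x) = erfc(sqrt(x / 2)). No continuity correction is applied.
    """
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical counts: rows = control/test, columns = mobile/web, built so the
# test:control ratio is 0.56 in both columns (i.e. perfectly balanced)
stat, p = chi2_independence_2x2([[1000, 800], [560, 448]])
print(stat, p)
```

`scipy.stats.chi2_contingency` runs the same test on real counts; note that it applies Yates' continuity correction to 2×2 tables by default, so its statistic will be slightly smaller.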

The data look reasonably balanced: for every source and device, the ratio of test to control users (about 0.53 to 0.61) is close to the overall ratio of 0.36/0.64 ≈ 0.56.

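Need to choose a metric:
- revenue per viewer
    - revenue per viewer = (# sold × price per unit) / # viewers; total revenue alone is not comparable because the test group is smaller
- conversion rate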
    - proportion who bought
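Revenue per viewer is simply price × conversion rate, which also gives a break-even conversion rate: the rate at which \$59 would earn exactly as much per viewer as \$39. The sketch below uses rates printed later in this notebook (the control conversion rate, and the test group's revenue per viewer divided by \$59); the function names are illustrative.

```python
def revenue_per_viewer(price, conversion_rate):
    """Expected revenue from one viewer: the price times the chance they buy."""
    return price * conversion_rate

def break_even_rate(base_price, base_rate, new_price):
    """Conversion rate at which new_price earns the same revenue per viewer as base_price."""
    return base_price * base_rate / new_price

# Rates taken from this notebook's outputs
control_rate = 0.019899563987220825
test_rate = 0.917747853719342 / 59

threshold = break_even_rate(39, control_rate, 59)
print(round(revenue_per_viewer(39, control_rate), 3))  # 0.776
print(round(threshold, 4))                             # 0.0132
print(test_rate > threshold)                           # True
```

The observed test-group rate (about 0.0156) sits above the 0.0132 threshold, which is why revenue per viewer rises even though fewer viewers buy.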

In [151]:
#Check whether the split is even across time
print(type(clean_user['timestamp'][0]))

datetime_object = datetime.strptime(clean_user['timestamp'][0], '%Y-%m-%d %H:%M:%S')
print(datetime_object)
print(datetime_object.month)

clean_user['timestamp'] = pd.to_datetime(clean_user['timestamp'], format='%Y-%m-%d %H:%M:%S', errors='coerce')

<class 'str'>
2015-05-08 03:38:34
5


In [157]:
print(clean_user['timestamp'].dt.year.unique())
print(clean_user['timestamp'].dt.month.unique())

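[2015.   nan]
[ 5.  4.  3. nan]

The `nan` entries are timestamps that failed to parse and were coerced to `NaT` by `errors='coerce'`. `crosstab` leaves these rows out, so the month-level tables cover about 10,000 fewer users than `clean_user` (316,435 rows versus 306,175 in the month-by-group counts further down).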

In [160]:
month_test = pd.crosstab(index=clean_user['test'],columns=clean_user['timestamp'].dt.month)
month_test.iloc[1]/month_test.iloc[0]

timestamp
3.0    0.565773
4.0    0.556058
5.0    0.567263
dtype: float64
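`errors='coerce'` in the conversion cell above turns any string that does not match the format into `NaT` instead of raising an error. The same behavior in plain Python looks like this; the malformed timestamp (minute 60) is a hypothetical example of a value that fails to parse, and `parse_or_none` is an illustrative helper name.

```python
from datetime import datetime

def parse_or_none(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a datetime, or None if the string is not a valid timestamp
    (similar in spirit to pd.to_datetime(..., errors='coerce'))."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

raw = ["2015-05-08 03:38:34", "2015-04-24 12:60:02"]  # second value is hypothetical
parsed = [parse_or_none(t) for t in raw]
n_failed = sum(p is None for p in parsed)
print(parsed, n_failed)
```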

In [60]:
#Calculate revenue per viewer for each group

In [66]:
39*len(clean_user[(clean_user['test']==0) & (clean_user['converted']==1)])/len(clean_user[clean_user['test']==0])

0.7760829955016122

In [67]:
59*len(clean_user[(clean_user['test']==1) & (clean_user['converted']==1)])/len(clean_user[clean_user['test']==1])

0.917747853719342

This is the unit price × (number of units purchased / number of viewers), i.e. price × conversion rate, which is the revenue earned per user who viewed the site. Overall, revenue per viewer is higher for viewers who saw the price of \$59 (\$0.92 vs \$0.78).

Run a t-test, where the response is the revenue from each viewer.

In [None]:
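#run a t test- the response is the revenue generated by each viewer (0 or the price)
# note: this reuses the name test_data, overwriting the user table loaded earlier
control_data = clean_user[clean_user['test']==0]['converted']*clean_user[clean_user['test']==0]['price']
test_data = clean_user[clean_user['test']==1]['converted']*clean_user[clean_user['test']==1]['price']
# Welch's t-test (unequal sizes and variances), one-sided to match Ha: test revenue > control.
# print() is needed because only a cell's last expression is displayed.
print(ttest_ind(test_data, control_data, equal_var=False, alternative='greater'))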

print(control_data.mean())
print(test_data.mean())
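Because every viewer's revenue is either \$0 or the price, the t-test above can be cross-checked from four counts. The sketch below computes the Welch (unequal-variance) statistic and uses the normal approximation for the p-value, which is practically identical to the t distribution at more than 100,000 viewers per group. The counts are back-calculated from this notebook's outputs (316,435 rows, 36.0% in test, control conversion rate 0.0199, test revenue per viewer 0.9177); the function names are illustrative.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal, via erfc to keep precision in the far tail."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def welch_z_from_counts(n1, s1, price1, n2, s2, price2):
    """Welch-style statistic for mean revenue per viewer when each viewer's revenue
    is either 0 or the group's price. Returns (z, one-sided p for mean2 > mean1)."""
    mean1, mean2 = price1 * s1 / n1, price2 * s2 / n2
    # sample variance (ddof=1) of a variable equal to price for s viewers and 0 otherwise
    var1 = price1 ** 2 * s1 * (n1 - s1) / (n1 * (n1 - 1))
    var2 = price2 ** 2 * s2 * (n2 - s2) / (n2 * (n2 - 1))
    z = (mean2 - mean1) / math.sqrt(var1 / n1 + var2 / n2)
    return z, upper_tail(z)

# control: 202,517 viewers, 4,030 sales at $39; test: 113,918 viewers, 1,772 sales at $59
z, p = welch_z_from_counts(202517, 4030, 39, 113918, 1772, 59)
print(z, p)
```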

In [107]:
n1 = len(clean_user[clean_user['test']==0])
n2 = len(clean_user[clean_user['test']==1])
s1 = len(clean_user[(clean_user['test']==0) & (clean_user['converted']==1)])
s2 = len(clean_user[(clean_user['test']==1) & (clean_user['converted']==1)])

In [108]:
zscore, pval = sm.stats.proportions_ztest([s1, s2], [n1, n2], alternative='smaller', value=0)


In [109]:
print(zscore)
print(pval)

8.74375299250435
1.0
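statsmodels' `alternative='smaller'` asks whether the first proportion (control) is smaller than the second (test). A library-free sketch of the pooled two-proportion z-test, with counts back-calculated from this notebook's outputs, reproduces the statistic and shows the p-value in both directions; the function name is illustrative.

```python
import math

def two_proportion_z(s1, n1, s2, n2):
    """Pooled two-proportion z-test of H0: p1 == p2.
    Returns (z, p for the alternative p1 < p2, p for the alternative p1 > p2)."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc keeps precision far in the tails, where 1 - cdf would round to 0
    p_smaller = 0.5 * math.erfc(-z / math.sqrt(2))
    p_larger = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_smaller, p_larger

# Control first, as in the statsmodels call above
z, p_smaller, p_larger = two_proportion_z(4030, 202517, 1772, 113918)
print(z, p_smaller, p_larger)
```

`p_smaller` rounds to 1.0, matching the statsmodels output, while `p_larger` is vanishingly small: the control group converted significantly more often.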


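The z-score of 8.74 shows that the control group converted significantly more often than the test group; the p-value of 1.0 belongs to the opposite one-sided alternative (`alternative='smaller'`, i.e. control converting less). So the \$59 price sold significantly fewer units. Did the higher price make up for this difference in the number sold?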

In [115]:
#count each dollar as a sample- did the company get that dollar?
n1d = len(clean_user[clean_user['test']==0])
n2d = len(clean_user[clean_user['test']==1])
s1d = len(clean_user[(clean_user['test']==0) & (clean_user['converted']==1)])
# Count each $59 conversion with a weight of 20, since it brings in $20 more.
# This is statistically invalid: it inflates the number of "successes" without
# adding observations, so the result below should not be trusted.
s2d = 20*len(clean_user[(clean_user['test']==1) & (clean_user['converted']==1)])
zscore, pval = sm.stats.proportions_ztest([s1d, s2d], [n1d, n2d], alternative='smaller', value=0)
print(zscore)
print(pval)

-237.96694230979944
0.0


In [111]:
s1d/n1d

0.019899563987220825

In [171]:
#try mann-whitney test on revenue?
#No: revenue takes only two values per group (0 or 39, 0 or 59). Ranks only see that 59 > 39,
#not how much larger it is, so the test cannot tell whether the higher price makes up for fewer sales.
#It would make sense if the shopping-cart total were continuous rather than 0 or 39/59.

### Other findings from the data

In [113]:
#groupby: conversions by device, by source
converted_source_tab = pd.crosstab(index=clean_user['converted'],columns=clean_user['source'])
converted_source_tab.iloc[1]/converted_source_tab.iloc[0]

source
ads-bing           0.012147
ads-google         0.021975
ads-yahoo          0.015015
ads_facebook       0.021657
ads_other          0.014559
direct_traffic     0.012447
friend_referral    0.040245
seo-bing           0.024367
seo-google         0.017271
seo-other          0.015936
seo-yahoo          0.016493
seo_facebook       0.016312
dtype: float64
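The printed values are row 1 divided by row 0 of the crosstab, i.e. odds (converted / not converted) rather than conversion rates. At rates this low the two are nearly identical, and the conversion `rate = odds / (1 + odds)` is monotonic, so the ranking of sources does not change. The sketch below converts the printed odds; in pandas, `pd.crosstab(..., normalize='columns')` would give the rates directly.

```python
# Converted / not-converted ratios (odds) by source, copied from the output above
odds_by_source = {
    'ads-bing': 0.012147, 'ads-google': 0.021975, 'ads-yahoo': 0.015015,
    'ads_facebook': 0.021657, 'ads_other': 0.014559, 'direct_traffic': 0.012447,
    'friend_referral': 0.040245, 'seo-bing': 0.024367, 'seo-google': 0.017271,
    'seo-other': 0.015936, 'seo-yahoo': 0.016493, 'seo_facebook': 0.016312,
}

def odds_to_rate(odds):
    """Convert odds (converted / not converted) to a rate (converted / total)."""
    return odds / (1 + odds)

rates = {src: odds_to_rate(o) for src, o in odds_by_source.items()}
best = max(rates, key=rates.get)
print(best, round(rates[best], 4))  # friend_referral 0.0387
```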

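Referrals from a friend convert far better than any other source: their ratio of converted to non-converted users (0.040) is well above the 0.012–0.024 range of every other source.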

In [114]:
converted_device_tab = pd.crosstab(index=clean_user['converted'],columns=clean_user['device'])
converted_device_tab.iloc[1]/converted_device_tab.iloc[0]

device
mobile    0.018905
web       0.018354
dtype: float64

Mobile and web users convert at nearly the same rate (converted-to-non-converted ratios of 0.0189 and 0.0184).

In [169]:
converted_month_test = pd.crosstab(index=[clean_user['converted'],clean_user['test']],columns=clean_user['timestamp'].dt.month)
print(converted_month_test)
print(converted_month_test.iloc[1]/converted_month_test.iloc[0])
print(converted_month_test.iloc[3]/converted_month_test.iloc[2])

timestamp         3.0    4.0    5.0
converted test                     
0         0     60968  62125  68875
          1     34617  34739  39236
1         0      1223   1291   1387
          1       569    524    621
timestamp
3.0    0.567790
4.0    0.559179
5.0    0.569670
dtype: float64
timestamp
3.0    0.465249
4.0    0.405887
5.0    0.447729
dtype: float64
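The ratios above compare group sizes within each outcome. Turning the same counts into a per-month conversion rate and revenue per viewer shows the price effect directly. The counts are copied from the crosstab above (rows with unparseable timestamps are not included); the loop is a small sketch.

```python
# Counts copied from the month x (converted, test) crosstab above:
# {month: {group: (converted, not_converted)}}
counts = {
    3: {'control': (1223, 60968), 'test': (569, 34617)},
    4: {'control': (1291, 62125), 'test': (524, 34739)},
    5: {'control': (1387, 68875), 'test': (621, 39236)},
}
PRICE = {'control': 39, 'test': 59}

summary = {}
for month, groups in counts.items():
    # conversion rate = converted / (converted + not converted)
    rates = {g: conv / (conv + other) for g, (conv, other) in groups.items()}
    # revenue per viewer = price * conversion rate
    revenue = {g: PRICE[g] * r for g, r in rates.items()}
    summary[month] = (rates, revenue)
    print(month,
          {g: round(r, 4) for g, r in rates.items()},
          {g: round(v, 3) for g, v in revenue.items()})
```

In each month the \$59 group converts less but earns more per viewer.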


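The test group's share of conversions is lower than its share of non-converters in every month (about 0.41 to 0.47 test conversions per control conversion, versus about 0.56 to 0.57 for non-converters), so the drop in conversion at \$59 is consistent across months and does not appear to be a seasonal effect.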

### Power analysis: how long did we need to run the test?

In [188]:
sm.stats.tt_ind_solve_power(effect_size = 0.15, alpha=.05, power=0.8, ratio=0.66)

878.5396182469023

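With a significance level of 0.05 and power of 0.8 (the probability of correctly rejecting a false null hypothesis), we would have needed only about 879 samples in the control group to reach the same conclusion. One caveat: `effect_size` in `tt_ind_solve_power` is a standardized effect size (Cohen's d: the difference in means divided by the pooled standard deviation), not a raw dollar difference. Revenue per viewer has a standard deviation of several dollars, so the observed difference of about \$0.14 corresponds to a much smaller d than 0.15, and these sample sizes understate what is actually required. To be more certain before raising the price, we can repeat the calculation with a smaller significance level and higher power.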

In [190]:
sm.stats.tt_ind_solve_power(effect_size = 0.15, alpha=.001, power=0.99, ratio=0.66)

3529.9844288995077

We still only needed 3,530 samples in the control group to get the same result. 
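To see how much the standardized effect matters, the sketch below computes Cohen's d from counts back-calculated from this notebook's outputs, then a normal-approximation sample size. The t-based `tt_ind_solve_power` gives slightly larger numbers (878.5 versus about 877 for d = 0.15). Function names are illustrative, and `ratio=0.66` is kept to match the cells above even though the observed split is closer to 0.36/0.64 ≈ 0.56.

```python
import math
from statistics import NormalDist

def revenue_stats(price, conversions, viewers):
    """Mean and sample variance (ddof=1) of per-viewer revenue that is either 0 or price."""
    mean = price * conversions / viewers
    var = price ** 2 * conversions * (viewers - conversions) / (viewers * (viewers - 1))
    return mean, var

def cohens_d(mean1, var1, n1, mean2, var2, n2):
    """Standardized difference in means, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean2 - mean1) / math.sqrt(pooled_var)

def n_control(d, alpha=0.05, power=0.8, ratio=0.66):
    """Normal-approximation control-group size for a two-sided two-sample test,
    where test size = ratio * control size (the convention tt_ind_solve_power uses)."""
    z = NormalDist().inv_cdf
    return (1 + 1 / ratio) * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2

# Counts back-calculated from the notebook's printed outputs
n1, s1, n2, s2 = 202517, 4030, 113918, 1772
m1, v1 = revenue_stats(39, s1, n1)
m2, v2 = revenue_stats(59, s2, n2)
d = cohens_d(m1, v1, n1, m2, v2, n2)

print(round(d, 4))             # standardized effect, far below 0.15
print(round(n_control(0.15)))  # close to statsmodels' 878.5
print(round(n_control(d)))     # control viewers needed for the observed effect
```

For the observed effect (d ≈ 0.023), the control group needs tens of thousands of viewers at α = 0.05 and power 0.8 rather than 879, though that is still well below the roughly 202,500 control viewers actually collected.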

In [200]:
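#how long does it take to collect this many control samples?
control_group = clean_user[clean_user['test']==0]
# Series.sort() no longer exists in pandas; sort_values() returns a sorted copy.
# Timestamps that failed to parse (NaT) are dropped before sorting.
control_times = control_group['timestamp'].dropna().sort_values()
n_needed = 3530  # control-group sample size from the power analysis above
print(control_times.iloc[n_needed - 1] - control_times.iloc[0])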

In [193]:
clean_user.shape

(316435, 8)

In [198]:
control_group.columns

Index(['user_id', 'timestamp', 'source', 'device', 'operative_system', 'test',
       'price', 'converted'],
      dtype='object')