# Effect of the Website Homepage on Downloads and Purchases

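Users are randomly assigned to an experiment group or a control group based on their cookie:
- Control group: the homepage is left unchanged; the daily number of users, downloads, and purchases is recorded
- Experiment group: the homepage prominently displays the 7-day trial; the daily number of users, downloads, and purchases is recorded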
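
One common way to implement a cookie-based split is to hash the cookie ID, so that each visitor always lands in the same group. The sketch below illustrates that idea only: the function name `assign_group`, the salt string, and the use of SHA-256 are assumptions, not details of this experiment.

```python
import hashlib

def assign_group(cookie_id: str, salt: str = "homepage-experiment") -> str:
    """Map a cookie ID to 'control' or 'experiment' (hypothetical helper)."""
    # Hash salt + cookie ID so the assignment is stable for each visitor
    # and independent of other experiments that use a different salt.
    digest = hashlib.sha256(f"{salt}:{cookie_id}".encode("utf-8")).digest()
    # Use the parity of the first byte as a roughly 50/50 coin flip.
    return "experiment" if digest[0] % 2 else "control"

# Assign 10,000 made-up cookie IDs and check the split is close to even
groups = [assign_group(f"cookie-{i}") for i in range(10_000)]
share_experiment = groups.count("experiment") / len(groups)
print(f"share assigned to experiment: {share_experiment:.3f}")
```

Because the assignment depends only on the cookie ID and the salt, a returning visitor sees the same homepage variant on every visit.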

## 1. Load the Data

Load the experiment data and take a first look at it.

In [1]:
# Import the required packages
import pandas as pd
import numpy as np
import scipy.stats as stats

# Load the data
data = pd.read_csv('homepage-experiment-data.csv')

# Data shape
print('Data shape:', data.shape)

# Preview the first rows
data.head(5)

Data shape: (29, 7)


Unnamed: 0,Day,Control Cookies,Control Downloads,Control Licenses,Experiment Cookies,Experiment Downloads,Experiment Licenses
0,1,1764,246,1,1850,339,3
1,2,1541,234,2,1590,281,2
2,3,1457,240,1,1515,274,1
3,4,1587,224,1,1541,284,2
4,5,1606,253,2,1643,292,3


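The data has 29 rows and 7 columns. The columns are:
- Day                  : day of the experiment
- Control Cookies      : number of cookies in the control group
- Control Downloads    : number of downloads in the control group
- Control Licenses     : number of license purchases in the control group
- Experiment Cookies   : number of cookies in the experiment group
- Experiment Downloads : number of downloads in the experiment group
- Experiment Licenses  : number of license purchases in the experiment group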

In [2]:
data.describe()

Unnamed: 0,Day,Control Cookies,Control Downloads,Control Licenses,Experiment Cookies,Experiment Downloads,Experiment Licenses
count,29.0,29.0,29.0,29.0,29.0,29.0,29.0
mean,15.0,1615.551724,260.482759,24.482759,1632.62069,294.758621,25.241379
std,8.514693,116.308268,28.338037,13.873461,113.02636,22.404807,13.76241
min,1.0,1457.0,223.0,1.0,1458.0,256.0,1.0
25%,8.0,1529.0,240.0,12.0,1555.0,279.0,20.0
50%,15.0,1602.0,254.0,30.0,1606.0,290.0,29.0
75%,22.0,1700.0,276.0,34.0,1728.0,300.0,36.0
max,29.0,1822.0,331.0,42.0,1861.0,349.0,44.0


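## 2. Invariant Metric

The number of cookies in the control and experiment groups is the invariant metric: the homepage change should not affect it. We test whether the two groups differ significantly; a non-significant result is consistent with a correct random split.

### 2.1 Sign Test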

In [110]:
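x = data['Control Cookies']
y = data['Experiment Cookies']
# Number of days without a tie
n = x.shape[0] - (x == y).sum()
# Days on which the control group had more cookies (the strict > already excludes ties)
k = (x > y).sum()
# Two-sided p-value: double the smaller tail, capped at 1
p_value = min(1, 2 * stats.binom(n, 0.5).cdf(min(k, n - k)))
print('Sign test p-value: {:.5f}'.format(p_value))

Sign test p-value: 0.71107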
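
As a cross-check, the paired sign test can be written with the standard library alone. This is a minimal sketch, assuming tied days are dropped and the null hypothesis is a 50/50 chance that either group is larger on a given day. `binom_cdf` and `sign_test` are illustrative names, and the sample values are the cookie counts for days 1 to 5 from the preview table above.

```python
from math import comb

def binom_cdf(k: int, n: int) -> float:
    """P(X <= k) for X ~ Binomial(n, 0.5)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) for i in range(min(k, n) + 1)) / 2 ** n

def sign_test(x, y, alternative="two-sided"):
    """Paired sign test on x vs. y; tied pairs are dropped.

    alternative='two-sided': H1 is that x and y differ.
    alternative='less':      H1 is that x tends to be smaller than y.
    """
    wins = sum(a > b for a, b in zip(x, y))    # pairs with x > y
    losses = sum(a < b for a, b in zip(x, y))  # pairs with x < y
    n = wins + losses
    if alternative == "two-sided":
        return min(1.0, 2 * binom_cdf(min(wins, losses), n))
    if alternative == "less":
        return binom_cdf(wins, n)
    raise ValueError("alternative must be 'two-sided' or 'less'")

# Cookie counts for days 1-5, copied from the preview table
control = [1764, 1541, 1457, 1587, 1606]
experiment = [1850, 1590, 1515, 1541, 1643]
p_two_sided = sign_test(control, experiment)
print(p_two_sided)  # 0.375
```

With only five days the test has very little power; the point of the example is the mechanics, not the conclusion.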


### 2.2 Z-Test

In [112]:
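data_sum = data.sum(axis=0)
n_ctrl = data_sum['Control Cookies']
n_exp = data_sum['Experiment Cookies']
n_obs = n_ctrl + n_exp
# Under the null hypothesis each cookie lands in either group with probability 0.5
p = 0.5
sd = np.sqrt(n_obs * p * (1 - p))
# Continuity correction: move the observed count 0.5 toward its expected value
diff = n_ctrl - n_obs * p
z = (diff - 0.5 * np.sign(diff)) / sd
print('Z-score: {:.4f}'.format(z))
# Two-sided p-value
p_value = 2 * stats.norm.cdf(-abs(z))
print('Z-test p-value: {:.4f}'.format(p_value))

Z-score: -1.6096
Z-test p-value: 0.1075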
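
The normal approximation with a continuity correction can also be sketched without SciPy. The code below is an illustration under the assumption of a 50/50 split: `split_z_test` is a hypothetical helper, the counts are toy values rather than experiment data, and the standard normal CDF is computed from `math.erfc`.

```python
from math import erfc, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * erfc(-z / sqrt(2))

def split_z_test(n_ctrl: int, n_exp: int, p: float = 0.5):
    """Two-sided z-test that n_ctrl is consistent with a fraction p of all units."""
    n_obs = n_ctrl + n_exp
    mean = n_obs * p
    sd = sqrt(n_obs * p * (1 - p))
    diff = n_ctrl - mean
    # Continuity correction: shrink the gap to the expected count by 0.5
    corrected = max(abs(diff) - 0.5, 0.0)
    z = corrected / sd if diff >= 0 else -corrected / sd
    p_value = 2 * normal_cdf(-abs(z))
    return z, p_value

# Toy counts (not from the experiment): 40 vs. 60 units
z, p_value = split_z_test(40, 60)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The correction always moves the observed count toward its expected value, so it makes the test slightly more conservative regardless of which group is larger.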


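## 3. Evaluation Metrics

The evaluation metrics cover downloads and license purchases. The two metrics are:
- Download rate: downloads / cookies
- Purchase rate: license purchases / cookies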
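
As a small worked example of these two definitions, the sketch below computes both rates for day 1, using the values from the preview table above. The dictionary layout and the `rates` helper are illustrative choices, not part of the original analysis.

```python
# Day 1 values, copied from the preview table
control = {"cookies": 1764, "downloads": 246, "licenses": 1}
experiment = {"cookies": 1850, "downloads": 339, "licenses": 3}

def rates(group):
    """Return (download rate, purchase rate) per cookie for one group."""
    return group["downloads"] / group["cookies"], group["licenses"] / group["cookies"]

for name, group in (("control", control), ("experiment", experiment)):
    download_rate, purchase_rate = rates(group)
    print(f"{name}: download rate {download_rate:.4f}, purchase rate {purchase_rate:.5f}")
```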

### 3.1 Download Rate
First compute the download-rate data, then test it with several methods.

In [104]:
# Daily download rates
drate_ctrl = data['Control Downloads'] / data['Control Cookies']
drate_expr = data['Experiment Downloads'] / data['Experiment Cookies']

# Overall download rates
dload_ctrl = data['Control Downloads'].sum()
n_ctrl = data['Control Cookies'].sum()
dload_expr = data['Experiment Downloads'].sum()
n_expr = data['Experiment Cookies'].sum()
n_obs = n_ctrl + n_expr
p_ctrl = dload_ctrl / n_ctrl
p_expr = dload_expr / n_expr
# Pooled download rate of both groups
p_null = (dload_ctrl + dload_expr) / (n_ctrl + n_expr)

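#### 3.1.1 Sign Test

In [105]:
# Sign test p-value (one-sided: is the control rate higher on unusually few days?)
n = drate_ctrl.shape[0] - (drate_ctrl == drate_expr).sum()
# Days on which the control rate is higher (the strict > already excludes ties)
k = (drate_ctrl > drate_expr).sum()
p_value = stats.binom(n, 0.5).cdf(k)
print('Sign test p-value: {:.8f}'.format(p_value))

Sign test p-value: 0.00000762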


#### 3.1.2 Z-Test

In [106]:
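# Z-test p-value
# Standard error of the difference between the two rates (unpooled)
z_sd = np.sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_expr * (1 - p_expr) / n_expr)
z = (p_expr - p_ctrl) / z_sd
print('Z-score: {:.2f}'.format(z))
# One-sided (upper-tail) p-value; for a z this large, 1 - cdf is limited by floating-point precision
p_value = 1 - stats.norm.cdf(z)
print('Z-test p-value: {}'.format(p_value))

Z-score: 7.88
Z-test p-value: 1.6653345369377348e-15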
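
The two-proportion Z statistic can be sketched with the standard library as well. This is an illustration, not the notebook's exact computation: `two_proportion_z` is a hypothetical helper, the inputs are the day 1 counts from the preview table rather than the 29-day totals, and the `pooled` option (a common alternative standard error under the null hypothesis) is added here only for comparison.

```python
from math import erfc, sqrt

def two_proportion_z(x_ctrl, n_ctrl, x_expr, n_expr, pooled=False):
    """One-sided z-test for H1: experiment rate > control rate."""
    p_ctrl, p_expr = x_ctrl / n_ctrl, x_expr / n_expr
    if pooled:
        # Single rate estimated from both groups, as assumed under H0
        p_null = (x_ctrl + x_expr) / (n_ctrl + n_expr)
        se = sqrt(p_null * (1 - p_null) * (1 / n_ctrl + 1 / n_expr))
    else:
        # Each group keeps its own rate
        se = sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_expr * (1 - p_expr) / n_expr)
    z = (p_expr - p_ctrl) / se
    # Upper-tail probability via erfc, which stays accurate for large z
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value

# Day 1 downloads and cookies, copied from the preview table
z, p_value = two_proportion_z(246, 1764, 339, 1850)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.2e}")
```

Computing the upper tail directly with `erfc` avoids the precision floor that `1 - cdf(z)` hits when `cdf(z)` is extremely close to 1.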


### 3.2 Purchase Rate

The purchase rate can be tested with both the sign test and the Z-test.

#### 3.2.1 Sign Test

In [109]:
# Use only the data from day 9 onward
data_lic = data[8:]
# Daily purchase rates
lic_ctrl = data_lic['Control Licenses'] / data_lic['Control Cookies']
lic_expr = data_lic['Experiment Licenses'] / data_lic['Experiment Cookies']
# Sign test p-value (one-sided)
n = lic_ctrl.shape[0] - (lic_ctrl == lic_expr).sum()
# Days on which the control rate is higher (the strict > already excludes ties)
k = (lic_ctrl > lic_expr).sum()
p_value = stats.binom(n, 0.5).cdf(k)
print('Sign test p-value: {:.8f}'.format(p_value))

Sign test p-value: 0.66818810


#### 3.2.2 Z-Test

In [93]:
# Cookies from the first 21 days only (the last 8 days are dropped)
data_cookie = data[:-8]
# Total cookies in the control and experiment groups
n_ctrl = data_cookie['Control Cookies'].sum()
n_expr = data_cookie['Experiment Cookies'].sum()
# Overall purchase rates: license purchases from all days divided by those cookies
plic_ctrl = data['Control Licenses'].sum() / n_ctrl
plic_expr = data['Experiment Licenses'].sum() / n_expr
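
To see which days the slices `data[8:]` and `data[:-8]` select in a 29-day table, the sketch below applies the same slices to a plain list of day numbers. The list `days` is a stand-in for the `Day` column, assuming it runs from 1 to 29 in order.

```python
# Stand-in for the Day column: days 1..29
days = list(range(1, 30))

license_days = days[8:]   # drops the first 8 days
cookie_days = days[:-8]   # drops the last 8 days

print(license_days[0], license_days[-1], len(license_days))  # 9 29 21
print(cookie_days[0], cookie_days[-1], len(cookie_days))     # 1 21 21
```

Both slices keep 21 days, but they cover different windows: days 9 to 29 and days 1 to 21.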

In [94]:
# Z-test
sd = np.sqrt(plic_ctrl * (1 - plic_ctrl) / n_ctrl + plic_expr * (1 - plic_expr) / n_expr)
z = (plic_expr - plic_ctrl) / sd
print('Z-score: {:.4f}'.format(z))
# One-sided (upper-tail) p-value
p_value = 1 - stats.norm.cdf(z)
print('Z-test p-value: {:.4f}'.format(p_value))

Z-score: 0.2587
Z-test p-value: 0.3979
