# Tutorial 2: How to perform AA and AB tests
*The AB-test is covered in section 3 below.*

## 0. Import Libraries

In [1]:
import pandas as pd
import numpy as np
from hypex.ab_test.ab_tester import AATest, ABTest
from hypex.utils.tutorial_data_creation import create_test_data

pd.options.display.float_format = '{:,.2f}'.format

np.random.seed(52)  # needed to create the example data

## 1. Create or upload your dataset
In this tutorial we create a random dataset with a known effect size.  
If you have your own dataset, skip to part 2.

In [2]:
data = create_test_data(rs=52, na_step=10, nan_cols=['age', 'gender'])
data

Unnamed: 0,user_id,signup_month,treat,pre_spends,post_spends,age,gender,industry
0,0,0,0,488.00,414.44,,M,E-commerce
1,3,0,0,501.50,424.33,31.00,,Logistics
2,10,0,0,522.50,416.22,64.00,M,E-commerce
3,12,0,0,472.00,423.78,43.00,M,E-commerce
4,13,0,0,508.50,424.22,36.00,F,E-commerce
...,...,...,...,...,...,...,...,...
5365,9991,0,0,482.50,421.89,23.00,F,E-commerce
5366,9992,0,0,491.50,424.00,44.00,M,E-commerce
5367,9994,0,0,486.00,423.78,27.00,F,Logistics
5368,9996,0,0,500.50,430.89,56.00,F,E-commerce


## 2. AATest 

### 2.0 Initialize parameters
`info_cols` defines informative attributes that should NOT take part in testing, such as `user_id` and `signup_month`.

In [3]:
info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

### 2.1 Simple AA-test
This is the easiest way to initialize an AA-test and calculate its metrics (10 iterations by default).<br>
Use it when the role of every attribute is clear and there are no additional task conditions (such as grouping).

In [4]:
experiment = AATest(info_cols=info_cols, target_fields=target)

In [5]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling(data=data, iterations=10)

`experiment_result` is a table with one row per experiment (one per `random_state`). It includes:
- the means of every target in the a and b samples,
- the p-values of Student's t-test and the Kolmogorov-Smirnov test,
- the test outcomes (whether the split for that `random_state` passed each uniformity test).
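The per-target columns of this table can be approximated on synthetic data as follows. This is only a sketch: the synthetic values, the 50/50 split, and the significance level `alpha = 0.05` are assumptions for illustration, and it uses SciPy directly rather than HypEx internals.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic metric standing in for a target column such as post_spends.
values = rng.normal(loc=425.0, scale=10.0, size=2000)

# Random 50/50 split into two samples, as in a single AA iteration.
shuffled = rng.permutation(values)
a, b = shuffled[:1000], shuffled[1000:]

alpha = 0.05  # assumed significance level
t_p = stats.ttest_ind(a, b).pvalue   # Student's t-test on the means
ks_p = stats.ks_2samp(a, b).pvalue   # Kolmogorov-Smirnov test on the distributions

row = {
    "a mean": a.mean(),
    "b mean": b.mean(),
    "ab delta": b.mean() - a.mean(),
    "t_test p_value": t_p,
    "ks_test p_value": ks_p,
    # A test counts as passed here when it finds no significant difference.
    "t_test passed": bool(t_p > alpha),
    "ks_test passed": bool(ks_p > alpha),
}
print(row)
```

A high p-value means the test found no evidence that the two samples differ, which is the desired outcome of an AA split.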

In [9]:
experiment_result.head(3)

Unnamed: 0,random_state,post_spends a mean,post_spends b mean,post_spends ab delta,post_spends ab delta %,post_spends t_test p_value,post_spends ks_test p_value,post_spends t_test passed,post_spends ks_test passed,pre_spends a mean,pre_spends b mean,pre_spends ab delta,pre_spends ab delta %,pre_spends t_test p_value,pre_spends ks_test p_value,pre_spends t_test passed,pre_spends ks_test passed,mean_tests_score
0,0,427.85,428.48,0.63,0.15,0.42,0.86,True,True,484.63,485.22,0.59,0.12,0.17,0.67,True,True,0.53
1,1,427.67,428.65,0.98,0.23,0.21,0.54,True,True,484.81,485.04,0.23,0.05,0.6,0.93,True,True,0.57
2,2,428.38,427.94,-0.44,-0.1,0.57,0.96,True,True,484.76,485.09,0.33,0.07,0.45,0.86,True,True,0.71


`dict_of_datas` is a dictionary with random states as keys and dataframes as values.<br>
The result of the split is stored in the `group` column, which contains the values 'test' and 'control'.

In [10]:
dict_of_datas[0].head(3)

Unnamed: 0,user_id,signup_month,treat,pre_spends,post_spends,age,gender,industry,group
0,3,0,0,501.5,424.33,31.0,,Logistics,test
1,10,0,0,522.5,416.22,64.0,M,E-commerce,test
2,12,0,0,472.0,423.78,43.0,M,E-commerce,test
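One possible way to combine the two outputs is to pick the split whose row has the highest `mean_tests_score` and look it up by its `random_state`. The sketch below rebuilds only the two relevant columns from the three rows printed above; the tiny stand-in dataframes and the "highest score wins" rule are assumptions for illustration.

```python
import pandas as pd

# The rows printed by experiment_result.head(3), reduced to two columns.
experiment_result = pd.DataFrame(
    {"random_state": [0, 1, 2], "mean_tests_score": [0.53, 0.57, 0.71]}
)

# Hypothetical stand-in for dict_of_datas: one small split per random_state.
dict_of_datas = {
    rs: pd.DataFrame({"user_id": [3, 10], "group": ["test", "control"]})
    for rs in experiment_result["random_state"].tolist()
}

# Row with the highest average score, then the matching split.
best_row = experiment_result.loc[experiment_result["mean_tests_score"].idxmax()]
best_random_state = int(best_row["random_state"])
best_split = dict_of_datas[best_random_state]

print(best_random_state)  # 2
```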


#### - Single experiment
To get reproducible results, let's fix `random_state`.

In [11]:
random_state = 11

To perform a single experiment, use `sampling_metrics()`.

In [12]:
experiment = AATest(info_cols=info_cols, target_fields=target)
metrics, dict_of_datas = experiment.sampling_metrics(data=data, random_state=random_state).values()

The result contains the same information as in multisampling, but for a single experiment.

In [13]:
metrics

{'random_state': 11,
 'post_spends a mean': 427.7893234016139,
 'post_spends b mean': 428.5347817090834,
 'post_spends ab delta': 0.7454583074695051,
 'post_spends ab delta %': 0.17395514653361088,
 'post_spends t_test p_value': 0.33561550504114157,
 'post_spends ks_test p_value': 0.6263469727648824,
 'post_spends t_test passed': True,
 'post_spends ks_test passed': True,
 'pre_spends a mean': 484.9912476722533,
 'pre_spends b mean': 484.8584729981378,
 'pre_spends ab delta': -0.13277467411546695,
 'pre_spends ab delta %': -0.027384212406245112,
 'pre_spends t_test p_value': 0.7577698697749307,
 'pre_spends ks_test p_value': 0.762662388584242,
 'pre_spends t_test passed': True,
 'pre_spends ks_test passed': True,
 'mean_tests_score': 0.6205986840412991}
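In this particular output, `mean_tests_score` coincides with the plain average of the four p-values. The check below only reproduces that arithmetic from the printed numbers; it is an observation about this output, not a statement about how HypEx defines the score in general.

```python
# p-values copied from the `metrics` output above.
p_values = {
    "post_spends t_test p_value": 0.33561550504114157,
    "post_spends ks_test p_value": 0.6263469727648824,
    "pre_spends t_test p_value": 0.7577698697749307,
    "pre_spends ks_test p_value": 0.762662388584242,
}

# Unweighted mean of all test p-values.
mean_p = sum(p_values.values()) / len(p_values)
print(round(mean_p, 4))  # 0.6206
```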

In [14]:
dict_of_datas[random_state]

Unnamed: 0,user_id,signup_month,treat,pre_spends,post_spends,age,gender,industry,group
0,3,0,0,501.50,424.33,31.00,,Logistics,test
1,14,0,0,497.00,421.78,26.00,M,Logistics,test
2,21,0,0,489.00,433.11,30.00,M,E-commerce,test
3,28,3,1,479.50,527.89,20.00,,E-commerce,test
4,29,0,0,505.00,414.33,30.00,M,E-commerce,test
...,...,...,...,...,...,...,...,...,...
5365,9983,0,0,494.50,428.33,29.00,F,E-commerce,control
5366,9984,0,0,460.00,417.11,66.00,M,Logistics,control
5367,9985,0,0,484.00,411.33,,F,E-commerce,control
5368,9994,0,0,486.00,423.78,27.00,F,Logistics,control


### 2.2 AA-test with grouping

To perform an experiment that splits the samples within groups, use `group_cols`.

In [15]:
info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

group_cols = 'industry'

In [16]:
experiment = AATest(info_cols=info_cols, target_fields=target, group_cols=group_cols)

In [17]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling(data=data)


The result has the same format as without groups.

In this mode, each group is divided equally between the samples (test and control):

In [18]:
dict_of_datas[0].groupby(['industry', 'group'])[['user_id']].count()

industry,group,user_id
E-commerce,control,1352
E-commerce,test,1351
Logistics,control,1334
Logistics,test,1333
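A balanced split like the one in this table can be sketched with plain pandas: shuffle the rows, then alternate test/control within each group. The synthetic dataframe and the alternating rule are assumptions for illustration, not HypEx's actual algorithm.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic users with a grouping column.
df = pd.DataFrame({
    "user_id": np.arange(1000),
    "industry": rng.choice(["E-commerce", "Logistics"], size=1000),
})

# Shuffle all rows, then number the rows inside each industry: 0, 1, 2, ...
shuffled = df.sample(frac=1.0, random_state=0)
position = shuffled.groupby("industry").cumcount()

# Even positions go to test, odd positions to control.
shuffled["group"] = np.where(position % 2 == 0, "test", "control")

counts = shuffled.groupby(["industry", "group"])["user_id"].count()
print(counts)
```

Within every industry the two sample sizes differ by at most one row, which matches the pattern of the counts shown above.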


## 3. AB-test

### 3.0 Data
Let's adjust the data to see how the AB-test works.

In [19]:
data_ab = data.copy()

half_data = len(data_ab) // 2
# the second group takes the remainder, so an odd row count also works
data_ab['group'] = ['test'] * half_data + ['control'] * (len(data_ab) - half_data)
data_ab.head(3)

Unnamed: 0,user_id,signup_month,treat,pre_spends,post_spends,age,gender,industry,group
0,0,0,0,488.0,414.44,,M,E-commerce,test
1,3,0,0,501.5,424.33,31.0,,Logistics,test
2,10,0,0,522.5,416.22,64.0,M,E-commerce,test


### 3.1 Full AB-test

The full (basic) version of the test calculates all available metrics: "diff in means", "diff in diff" and "cuped".<br>
Note that the "cuped" and "diff in diff" metrics require the target values from before the pilot.

In [20]:
model = ABTest()
results = model.execute(
    data=data_ab, 
    target_field='post_spends', 
    target_field_before='pre_spends', 
    group_field='group'
)
results

{'size': {'test': 2685, 'control': 2685},
 'difference': {'ate': 0.9805090006207325,
  'cuped': 0.9764245308837758,
  'diff_in_diff': 0.39224084419618066},
 'p_value': {'t_test': 0.20533212744131016,
  'mann_whitney': 0.08089945933651932}}
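For orientation, the three difference metrics can be sketched with their textbook definitions on synthetic data. The data-generating numbers, `true_effect`, and all variable names are assumptions for illustration; HypEx's own implementation may differ in details such as how the CUPED coefficient is estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_effect = 2.0  # hypothetical uplift added to the test group

# Pre-pilot and post-pilot target values; post depends on pre.
pre_test = rng.normal(485, 20, n)
pre_control = rng.normal(485, 20, n)
post_test = 0.5 * pre_test + rng.normal(185, 5, n) + true_effect
post_control = 0.5 * pre_control + rng.normal(185, 5, n)

# Diff in means: compare post-pilot means only.
ate = post_test.mean() - post_control.mean()

# Diff in diff: compare the before/after change of each group.
did = (post_test.mean() - pre_test.mean()) - (post_control.mean() - pre_control.mean())

# CUPED: remove the part of the target explained by the pre-pilot value.
pre_all = np.concatenate([pre_test, pre_control])
post_all = np.concatenate([post_test, post_control])
theta = np.cov(post_all, pre_all)[0, 1] / pre_all.var(ddof=1)
cuped_test = post_test - theta * (pre_test - pre_all.mean())
cuped_control = post_control - theta * (pre_control - pre_all.mean())
cuped = cuped_test.mean() - cuped_control.mean()

print(f"ate={ate:.2f} did={did:.2f} cuped={cuped:.2f}")
```

CUPED keeps the same expected effect as diff in means but has lower variance when the pre-pilot value is correlated with the target, which is why both approaches need the `target_field_before` column.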

To view the results in a more convenient form, use `show_beautiful_result`.

In [21]:
model.show_beautiful_result()

Unnamed: 0,size
test,2685
control,2685


Unnamed: 0,difference
ate,0.98
cuped,0.98
diff_in_diff,0.39


Unnamed: 0,p_value
t_test,0.21
mann_whitney,0.08
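Roughly the same three tables can be built from the nested `results` dictionary with plain pandas. This sketch uses the values printed by `execute` above and is independent of how `show_beautiful_result` is implemented.

```python
import pandas as pd

# Nested dictionary as returned by model.execute(...) in the cell above.
results = {
    "size": {"test": 2685, "control": 2685},
    "difference": {
        "ate": 0.9805090006207325,
        "cuped": 0.9764245308837758,
        "diff_in_diff": 0.39224084419618066,
    },
    "p_value": {
        "t_test": 0.20533212744131016,
        "mann_whitney": 0.08089945933651932,
    },
}

# One single-column dataframe per top-level key.
tables = {
    name: pd.Series(values, name=name).to_frame()
    for name, values in results.items()
}

for table in tables.values():
    print(table.round(2), end="\n\n")
```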


### 3.2 Simple AB-test
To estimate the effect without pre-pilot target data, use `calc_difference_method='ate'`: the effect is then estimated with the "diff in means" method.

In [22]:
model = ABTest(calc_difference_method='ate')
model.execute(data=data_ab, target_field='post_spends', group_field='group')

model.show_beautiful_result()

Unnamed: 0,size
test,2685
control,2685


Unnamed: 0,difference
ate,0.98


Unnamed: 0,p_value
t_test,0.21
mann_whitney,0.08
