# Tutorial 2: How to perform AA and AB tests
*AB-test is shown below*

## 0. Import Libraries

In [3]:
import pandas as pd
import numpy as np
from hypex.ab_test.ab_tester import AATest, ABTest
from hypex.dataset.dataset import Dataset

pd.options.display.float_format = '{:,.2f}'.format

np.random.seed(52)  # needed to create example data

## 1. Create or upload your dataset
In this tutorial we create a random dataset with a known effect size.  
If you have your own dataset, go to part 2.

In [21]:
data = Dataset(num_treatments=0, na_columns=['feature_col_1', 'feature_col_2'], num_outcomes=1, num_info_cols=1, num_main_causes_cols=2)
data.df

Unnamed: 0,info_col_1,feature_col_1,feature_col_2,feature_col_3,feature_col_4,outcome_1
0,3403,,Deposit,-0.59,1.00,2.37
1,12049,male,,0.56,3.00,8.39
2,12877,female,Credit,0.56,3.00,4.33
3,2431,male,Deposit,0.15,2.00,5.54
4,2683,male,Deposit,0.98,1.00,2.08
...,...,...,...,...,...,...
4995,8596,female,Deposit,0.05,1.00,2.73
4996,13519,female,Deposit,0.82,3.00,2.35
4997,4471,male,Deposit,-1.98,3.00,2.41
4998,3889,female,Deposit,1.24,2.00,2.44


## 2. AATest 

### 2.0 Initialize parameters
`info_cols` defines informative attributes that should NOT take part in testing, such as user_id and signup_month.<br>

In [22]:
info_cols = data.info_col_names[0]
target = data.outcome_name[0]

### 2.1 Simple AA-test
This is the easiest way to set up an AA test and calculate its metrics (10 iterations by default).<br>
Use it when you understand every attribute and have no additional task conditions (such as grouping).

In [23]:
experiment = AATest(info_cols=info_cols, target_fields=target)

In [24]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling(data.df, iterations=10)

100%|██████████| 10/10 [00:00<00:00, 32.80it/s]


`experiment_result` is a table with one row per experiment. It includes:
- the means of every target in the a and b samples,
- the p-values of Student's t-test and the Kolmogorov-Smirnov test,
- the test outcomes (whether the split produced by that `random_state` passed each test).
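To make these columns concrete, here is a minimal sketch of how one random split could be scored. It is not the HypEx implementation: the helper `aa_split_metrics`, the toy data, and the 0.05 threshold are assumptions for illustration. The threshold and the averaging of the two p-values into `mean_tests_score` are consistent with the output shown in this tutorial, but they are inferred from that output only.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance level, not taken from the HypEx source


def aa_split_metrics(df, target, random_state):
    """Split df into two random halves and compare the target between them."""
    rng = np.random.default_rng(random_state)
    order = rng.permutation(len(df))
    half = len(df) // 2
    values = df[target].to_numpy()
    a, b = values[order[:half]], values[order[half:]]

    t_p = stats.ttest_ind(a, b).pvalue   # Student's t-test on the means
    ks_p = stats.ks_2samp(a, b).pvalue   # Kolmogorov-Smirnov test on the distributions
    return {
        "random_state": random_state,
        "a mean": a.mean(),
        "b mean": b.mean(),
        "t_test p_value": t_p,
        "ks_test p_value": ks_p,
        # A split "passes" when the test finds no significant difference.
        "t_test passed": bool(t_p > ALPHA),
        "ks_test passed": bool(ks_p > ALPHA),
        "mean_tests_score": (t_p + ks_p) / 2,
    }


# Toy data standing in for data.df (hypothetical values).
toy = pd.DataFrame({"outcome_1": np.random.default_rng(0).normal(3.6, 2.0, 1000)})
rows = pd.DataFrame([aa_split_metrics(toy, "outcome_1", seed) for seed in range(10)])
print(rows.head(3))
```

Each seed yields one row, which mirrors the one-row-per-`random_state` layout of `experiment_result`.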

In [25]:
experiment_result.head(3)

Unnamed: 0,random_state,outcome_1 a mean,outcome_1 b mean,outcome_1 ab delta,outcome_1 ab delta %,outcome_1 t_test p_value,outcome_1 ks_test p_value,outcome_1 t_test passed,outcome_1 ks_test passed,mean_tests_score
0,0,3.69,3.51,-0.17,-4.95,0.04,0.02,False,False,0.03
1,1,3.61,3.59,-0.01,-0.42,0.86,0.51,True,True,0.69
2,2,3.58,3.62,0.03,0.96,0.68,0.11,True,True,0.4


`dict_of_datas` is a dictionary with random states as keys and dataframes as values.<br>
The result of the split is stored in the 'group' column, which contains the values 'test' and 'control'.

In [26]:
dict_of_datas[0].head(3)

Unnamed: 0,info_col_1,feature_col_1,feature_col_2,feature_col_3,feature_col_4,outcome_1,group
0,12049,male,,0.56,3.0,8.39,test
1,12877,female,Credit,0.56,3.0,4.33,test
2,2431,male,Deposit,0.15,2.0,5.54,test
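Because the two objects share `random_state` as a key, a split can be picked from the results table and then looked up in the dictionary. The sketch below shows one plausible selection rule (keep splits that passed both tests, then take the highest `mean_tests_score`); this rule is an assumption, not something the library prescribes. The table is rebuilt by hand from the first three rows of `experiment_result` shown above so that the snippet runs on its own.

```python
import pandas as pd

# First three rows of the experiment_result shown above (rounded values).
experiment_result = pd.DataFrame({
    "random_state": [0, 1, 2],
    "outcome_1 t_test passed": [False, True, True],
    "outcome_1 ks_test passed": [False, True, True],
    "mean_tests_score": [0.03, 0.69, 0.40],
})

# Keep only the splits where both tests passed.
passed = experiment_result[
    experiment_result["outcome_1 t_test passed"]
    & experiment_result["outcome_1 ks_test passed"]
]

# Take the random_state with the highest score among them.
best_state = int(
    passed.sort_values("mean_tests_score", ascending=False)["random_state"].iloc[0]
)
print(best_state)  # 1

# With the real objects from the cells above, the chosen split would be:
# best_split = dict_of_datas[best_state]
```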


#### - Single experiment
To get stable results, let's fix `random_state`.

In [27]:
random_state = 11

To perform a single experiment, use `sampling_metrics()`.

In [28]:
experiment = AATest(info_cols=info_cols, target_fields=target)
metrics, dict_of_datas = experiment.sampling_metrics(data=data.df, random_state=random_state).values()

The result contains the same information as multisampling, but for a single experiment.

In [29]:
metrics

{'random_state': 11,
 'outcome_1 a mean': 3.6704633575568995,
 'outcome_1 b mean': 3.5264609914727756,
 'outcome_1 ab delta': -0.14400236608412387,
 'outcome_1 ab delta %': -4.083481043242254,
 'outcome_1 t_test p_value': 0.08956264908378045,
 'outcome_1 ks_test p_value': 0.14492108072188525,
 'outcome_1 t_test passed': True,
 'outcome_1 ks_test passed': True,
 'mean_tests_score': 0.11724186490283285}
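The entries of this dictionary are related in simple ways that can be checked by hand. The sketch below copies the numbers printed above and verifies three relationships. They are inferred from this output, not from the HypEx source, so treat them as observations about this run: the delta is `b mean - a mean`, the percentage delta is taken relative to the b mean, and `mean_tests_score` equals the average of the two p-values.

```python
import math

# Values copied from the metrics dictionary printed above.
a_mean = 3.6704633575568995
b_mean = 3.5264609914727756
ab_delta = -0.14400236608412387
ab_delta_pct = -4.083481043242254
t_p = 0.08956264908378045
ks_p = 0.14492108072188525
mean_tests_score = 0.11724186490283285

# Absolute difference between the two sample means.
delta_ok = math.isclose(b_mean - a_mean, ab_delta, rel_tol=1e-9)

# Relative difference, expressed as a percentage of the b mean.
pct_ok = math.isclose(ab_delta / b_mean * 100, ab_delta_pct, rel_tol=1e-6)

# With a single target, the score matches the average of both p-values.
score_ok = math.isclose((t_p + ks_p) / 2, mean_tests_score, rel_tol=1e-9)

print(delta_ok, pct_ok, score_ok)  # True True True
```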

In [30]:
dict_of_datas[random_state]

Unnamed: 0,info_col_1,feature_col_1,feature_col_2,feature_col_3,feature_col_4,outcome_1,group
0,12049,male,,0.56,3.00,8.39,test
1,12877,female,Credit,0.56,3.00,4.33,test
2,1210,male,Credit,-0.82,3.00,2.25,test
3,13072,female,Deposit,1.33,3.00,9.91,test
4,2239,female,,-0.26,1.00,3.07,test
...,...,...,...,...,...,...,...
4995,6556,male,Credit,-0.06,2.00,6.28,control
4996,13249,male,Investment,0.16,3.00,7.61,control
4997,6523,male,,-0.79,0.00,-1.57,control
4998,598,male,Deposit,0.19,1.00,1.40,control


### 2.2 AA-test with grouping

To run an experiment that splits the samples within groups, pass `group_cols`.

In [32]:
info_cols = data.info_col_names[0]
target = data.outcome_name[0]

group_cols = 'feature_col_2'

In [33]:
experiment = AATest(info_cols=info_cols, target_fields=target, group_cols=group_cols)

In [34]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling(data=data.df)

100%|██████████| 10/10 [00:00<00:00, 14.62it/s]


The result has the same format as without groups.

In this mode each group is divided equally between the two samples (test and control):

In [35]:
dict_of_datas[0].groupby(['feature_col_2', 'group'])[['info_col_1']].count()

feature_col_2,group,info_col_1
Credit,control,747
Credit,test,746
Deposit,control,761
Deposit,test,761
Investment,control,743
Investment,test,742
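The near-equal counts per group are what a stratified split produces. As a rough illustration of the idea (not the HypEx algorithm), the sketch below shuffles the rows and then alternates test and control inside each stratum, so the two counts in a stratum differ by at most one. The helper `stratified_split` and the toy data are hypothetical, and rows with a missing group value are not handled here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "user_id": np.arange(1000),
    "feature_col_2": rng.choice(["Credit", "Deposit", "Investment"], size=1000),
})


def stratified_split(df, group_col, random_state):
    """Assign test/control so that every stratum is split roughly in half."""
    shuffled = df.sample(frac=1, random_state=random_state)
    # Position of each row inside its own stratum after shuffling.
    position = shuffled.groupby(group_col).cumcount()
    # Even positions go to test, odd positions to control.
    return shuffled.assign(group=np.where(position % 2 == 0, "test", "control"))


split = stratified_split(toy, "feature_col_2", random_state=0)
counts = split.groupby(["feature_col_2", "group"]).size().unstack()
print(counts)
```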


## 3. AB-test

### 3.0 Data
Let's prepare the data to see how the AB test works.

In [55]:
dataset_ab = Dataset(num_outcomes=2)
data_ab = dataset_ab.df.copy()
half_data = int(dataset_ab.df.shape[0]/2)
data_ab['group'] = ['test']*half_data + ['control']*half_data
data_ab.head(3)

Unnamed: 0,info_col_1,info_col_2,feature_col_1,feature_col_2,feature_col_3,feature_col_4,feature_col_5,feature_col_6,treatment_1,outcome_1,outcome_2,group
0,4795,B,male,Investment,0.42,-0.35,-0.62,1.0,0.0,0.62,0.62,test
1,8758,V,male,Investment,-0.34,0.26,0.05,0.0,0.0,-1.1,1.81,test
2,13723,B,female,Deposit,2.24,-0.24,-1.44,3.0,1.0,8.22,8.22,test


### 3.1 Full AB-test

The full (basic) version of the test calculates all available metrics: "diff in means", "diff in diff" and "cuped".<br>
Note that the "cuped" and "diff in diff" metrics require the target measured before the pilot.

In [56]:
model = ABTest()
results = model.execute(
    data=data_ab, 
    target_field=dataset_ab.outcome_name[1], 
    target_field_before=dataset_ab.outcome_name[0], 
    group_field='group'
)
results

{'size': {'test': 2500, 'control': 2500},
 'difference': {'ate': -0.011083229665972602,
  'cuped': 0.035543294166314965,
  'diff_in_diff': 0.049875421893031735},
 'p_value': {'t_test': 0.9141948432189249, 'mann_whitney': 0.7430650566070216}}
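For readers unfamiliar with the three estimators under `difference`, the sketch below implements their textbook definitions on synthetic data. It is an illustration under stated assumptions: the function names, the simulated data, and the injected effect of 0.5 are all hypothetical, and HypEx may differ in details such as how the CUPED coefficient is estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
is_test = np.arange(n) < n // 2             # first half is the test group
before = rng.normal(10.0, 2.0, size=n)      # target measured before the pilot
true_effect = 0.5                           # effect injected into the test group
after = before + rng.normal(0.0, 1.0, size=n) + true_effect * is_test


def diff_in_means(after, is_test):
    """Difference of post-pilot means (the 'ate' entry)."""
    return after[is_test].mean() - after[~is_test].mean()


def diff_in_diff(after, before, is_test):
    """Change in the test group minus change in the control group."""
    test_change = after[is_test].mean() - before[is_test].mean()
    control_change = after[~is_test].mean() - before[~is_test].mean()
    return test_change - control_change


def cuped(after, before, is_test):
    """Difference of means after removing variance explained by the pre-pilot target."""
    theta = np.cov(after, before)[0, 1] / np.var(before, ddof=1)
    adjusted = after - theta * (before - before.mean())
    return adjusted[is_test].mean() - adjusted[~is_test].mean()


ate_estimate = diff_in_means(after, is_test)
did_estimate = diff_in_diff(after, before, is_test)
cuped_estimate = cuped(after, before, is_test)
print(ate_estimate, did_estimate, cuped_estimate)
```

All three target the same effect. The last two use the pre-pilot measurement to reduce noise, which is why they need `target_field_before`.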

To display the results in a more convenient form, use `show_beautiful_result`.

In [57]:
model.show_beautiful_result()

Unnamed: 0,size
test,2500
control,2500


Unnamed: 0,difference
ate,-0.01
cuped,0.04
diff_in_diff,0.05


Unnamed: 0,p_value
t_test,0.91
mann_whitney,0.74


### 3.2 Simple AB-test
To estimate the effect without pre-pilot target data, use `calc_difference_method='ate'`; the effect is then estimated with the "diff in means" method.

In [59]:
model = ABTest(calc_difference_method='ate')
model.execute(data=data_ab, target_field=dataset_ab.outcome_name[0], group_field='group')

model.show_beautiful_result()

Unnamed: 0,size
test,2500
control,2500


Unnamed: 0,difference
ate,-0.06


Unnamed: 0,p_value
t_test,0.55
mann_whitney,0.76
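The two p-values in the result tables can be reproduced in spirit with SciPy. The sketch below is an assumption about which tests sit behind the `t_test` and `mann_whitney` labels (a two-sample t-test and a two-sided Mann-Whitney U test); it runs on synthetic data and does not call HypEx.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=500)
test_no_effect = rng.normal(0.0, 1.0, size=500)   # same distribution as control
test_shifted = rng.normal(1.0, 1.0, size=500)     # mean shifted by one standard deviation


def ab_p_values(test, control):
    """Return the p-values of a t-test and a Mann-Whitney U test."""
    return {
        "t_test": stats.ttest_ind(test, control).pvalue,
        "mann_whitney": stats.mannwhitneyu(test, control, alternative="two-sided").pvalue,
    }


p_no_effect = ab_p_values(test_no_effect, control)
p_shifted = ab_p_values(test_shifted, control)
print(p_no_effect)
print(p_shifted)
```

Large p-values, like those in the tables above, mean the test found no evidence of a difference between groups; the shifted sample shows what a clear effect looks like.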
