# Tutorial 2: How to perform AA and AB tests
*AB-test is shown below*

## 0. Import Libraries

In [25]:
import pandas as pd
from hypex.ab_test.ab_tester import AATest, ABTest
from hypex.dataset import DataGenerator

pd.options.display.float_format = '{:,.2f}'.format

## 1. Create or upload your dataset
In this example we generate a random dataset with a known effect size.  
If you have your own dataset, go to part 2.

In [2]:
data = DataGenerator(is_treatment=False, 
                     na_columns=['feature_1', 'feature_2'], 
                     num_info_cols=1, num_features=2)
data.df

Unnamed: 0,info,feature_1,feature_2,feature_3,feature_4,target
0,4492,,Investment,-0.93,1.00,2.35
1,11302,female,,-1.46,1.00,-1.72
2,1501,female,Investment,-0.75,2.00,-0.86
3,5767,male,Deposit,1.15,0.00,0.87
4,2344,female,Credit,1.42,0.00,4.00
...,...,...,...,...,...,...
4995,6055,female,Credit,0.53,3.00,8.03
4996,7279,male,Deposit,1.00,3.00,4.86
4997,13000,male,Deposit,1.90,3.00,4.54
4998,7381,female,Credit,-0.48,3.00,6.87


## 2. AATest 

### 2.0 Initialize parameters
`info_cols` defines informative attributes that should NOT take part in testing, such as user_id and signup_month.

In [4]:
info_cols = data.info_col_names[0]
target = data.target_names[0]

### 2.1 Simple AA-test
This is the simplest way to initialize an AA-test and calculate its metrics (10 iterations by default).<br>
Use it when the role of each attribute is clear or when you have no additional task conditions (such as grouping).

In [5]:
experiment = AATest(info_cols=info_cols, target_fields=target)

In [6]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling(data.df, iterations=10)

100%|██████████| 10/10 [00:00<00:00, 55.39it/s]


`experiment_result` is a table of experiment results, one row per `random_state`, which includes
- the means of every target in the a and b samples,
- the p-values of Student's t-test and the Kolmogorov-Smirnov test,
- and the test outcomes (whether the split for that `random_state` passed each test)

In [7]:
experiment_result.head(3)

Unnamed: 0,random_state,target a mean,target b mean,target ab delta,target ab delta %,target t_test p_value,target ks_test p_value,target t_test passed,target ks_test passed,mean_tests_score
0,0,2.62,2.64,0.02,0.86,0.78,0.97,True,True,0.87
1,1,2.64,2.62,-0.01,-0.55,0.86,0.98,True,True,0.92
2,2,2.54,2.72,0.19,6.88,0.02,0.01,False,False,0.02
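
The idea behind the repeated splits can be sketched in plain Python. This is only an illustration and not the HypEx implementation: the helper names `z_test_p_value` and `aa_search` are hypothetical, the data is synthetic, and a large-sample normal approximation stands in for the t-test so that only the standard library is needed. A split "passes" when its p-value is above an assumed significance level of 0.05.

```python
import random
from statistics import NormalDist, fmean, variance


def z_test_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (fmean(b) - fmean(a)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aa_search(values, iterations=10, alpha=0.05):
    """Split `values` in half once per random_state and test every split."""
    rows = []
    for random_state in range(iterations):
        shuffled = values[:]
        random.Random(random_state).shuffle(shuffled)
        half = len(shuffled) // 2
        a, b = shuffled[:half], shuffled[half:]
        p_value = z_test_p_value(a, b)
        rows.append({
            "random_state": random_state,
            "a mean": fmean(a),
            "b mean": fmean(b),
            "p_value": p_value,
            "passed": p_value > alpha,  # no significant difference found
        })
    return rows


# Synthetic target values, a hypothetical stand-in for the `target` column
rng = random.Random(0)
target_values = [rng.gauss(2.6, 2.0) for _ in range(5000)]

results = aa_search(target_values)
# The split with the highest p-value is the most balanced candidate
best = max(results, key=lambda row: row["p_value"])
print(best["random_state"], round(best["p_value"], 3))
```

Because every split is seeded with its own `random_state`, rerunning the search reproduces the same rows, which is what makes a chosen split reusable later.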


`dict_of_datas` is a dictionary with random states as keys and dataframes as values.<br>
The result of the split can be found in the 'group' column, which contains the values 'test' and 'control'.

In [8]:
dict_of_datas[0].head(3)

Unnamed: 0,info,feature_1,feature_2,feature_3,feature_4,target,group
0,11302,female,,-1.46,1.0,-1.72,test
1,1501,female,Investment,-0.75,2.0,-0.86,test
2,5767,male,Deposit,1.15,0.0,0.87,test


#### - Single experiment
To get reproducible results, let's fix `random_state`

In [9]:
random_state = 11

To perform a single experiment, use `sampling_metrics()`

In [10]:
experiment = AATest(info_cols=info_cols, target_fields=target)
metrics, dict_of_datas = experiment.sampling_metrics(data=data.df, random_state=random_state).values()

The result contains the same information as in multisampling, but for a single experiment

In [11]:
metrics

{'random_state': 11,
 'target a mean': 2.577879190129928,
 'target b mean': 2.6804315322428436,
 'target ab delta': 0.10255234211291553,
 'target ab delta %': 3.825963874820748,
 'target t_test p_value': 0.20106066484924415,
 'target ks_test p_value': 0.3857943719884363,
 'target t_test passed': True,
 'target ks_test passed': True,
 'mean_tests_score': 0.29342751841884024}
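
The derived fields in this dictionary can be checked against each other. The relationships below are inferred from the printed numbers only, not from the library's documentation, so treat them as an observation about this output: the delta equals the b mean minus the a mean, the percentage delta is relative to the b mean, and `mean_tests_score` equals the average of the two p-values (with a single target).

```python
import math

# Values copied from the output above
metrics = {
    "target a mean": 2.577879190129928,
    "target b mean": 2.6804315322428436,
    "target ab delta": 0.10255234211291553,
    "target ab delta %": 3.825963874820748,
    "target t_test p_value": 0.20106066484924415,
    "target ks_test p_value": 0.3857943719884363,
    "mean_tests_score": 0.29342751841884024,
}

# Recompute the derived fields from the raw ones
delta = metrics["target b mean"] - metrics["target a mean"]
delta_pct = delta / metrics["target b mean"] * 100
score = (metrics["target t_test p_value"] + metrics["target ks_test p_value"]) / 2

checks = {
    "delta": math.isclose(delta, metrics["target ab delta"], rel_tol=1e-6),
    "delta %": math.isclose(delta_pct, metrics["target ab delta %"], rel_tol=1e-6),
    "score": math.isclose(score, metrics["mean_tests_score"], rel_tol=1e-6),
}
print(checks)
```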

In [12]:
dict_of_datas[random_state]

Unnamed: 0,info,feature_1,feature_2,feature_3,feature_4,target,group
0,11302,female,,-1.46,1.00,-1.72,test
1,1501,female,Investment,-0.75,2.00,-0.86,test
2,14590,female,Investment,-1.02,1.00,3.11,test
3,14995,male,Investment,0.84,2.00,2.35,test
4,2671,male,,-0.41,0.00,0.57,test
...,...,...,...,...,...,...,...
4995,6682,male,Investment,-0.25,2.00,3.78,control
4996,3367,female,Investment,-0.43,2.00,4.45,control
4997,10159,female,,0.30,3.00,3.20,control
4998,5356,male,Investment,0.61,1.00,6.59,control


### 2.2 AA-test with grouping

To run an experiment that splits the samples within groups, use `group_cols`

In [14]:
info_cols = data.info_col_names[0]
target = data.target_names[0]

group_cols = 'feature_2'

In [15]:
experiment = AATest(info_cols=info_cols, target_fields=target, group_cols=group_cols)

In [16]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling(data=data.df)

100%|██████████| 10/10 [00:00<00:00, 23.47it/s]


The result has the same format as without groups.

In this mode, each group is divided equally between the two samples (test and control):

In [18]:
dict_of_datas[0].groupby(['feature_2', 'group'])[['info']].count()

feature_2,group,info
Credit,control,742
Credit,test,742
Deposit,control,774
Deposit,test,773
Investment,control,735
Investment,test,734
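
A grouped (stratified) split of this kind can be sketched with the standard library. The function `stratified_split` and the synthetic rows are hypothetical and only illustrate the idea; HypEx may implement the split differently. Each stratum is shuffled and cut in half, so the two samples differ by at most one row per stratum.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, group_key, random_state=0):
    """Label rows 'test'/'control' so every stratum is split as evenly as possible."""
    rng = random.Random(random_state)
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[group_key]].append(row)

    labelled = []
    for stratum_rows in by_stratum.values():
        rng.shuffle(stratum_rows)
        half = len(stratum_rows) // 2
        for i, row in enumerate(stratum_rows):
            # With an odd stratum size the extra row goes to 'control'
            labelled.append({**row, "group": "test" if i < half else "control"})
    return labelled


# Hypothetical rows standing in for the tutorial's dataframe
rng = random.Random(1)
rows = [
    {"info": i, "feature_2": rng.choice(["Credit", "Deposit", "Investment"])}
    for i in range(1001)
]

split = stratified_split(rows, group_key="feature_2")
counts = Counter((row["feature_2"], row["group"]) for row in split)
for key in sorted(counts):
    print(key, counts[key])
```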


## 3. AB-test

### 3.0 Data
Let's adjust the data to see how the AB-test works

In [19]:
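dataset_ab = DataGenerator(num_targets=2, is_treatment=False)
data_ab = dataset_ab.df.copy()
# Label the first half of the rows 'test' and the rest 'control'
# (also works when the number of rows is odd)
half_data = len(data_ab) // 2
data_ab['group'] = ['test'] * half_data + ['control'] * (len(data_ab) - half_data)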
data_ab.head(3)

Unnamed: 0,info_1,info_2,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,target_1,target_2,group
0,2560,W,male,Credit,1.52,-0.78,0.8,2.0,4.09,4.09,test
1,532,G,female,Deposit,3.12,-1.94,-2.24,0.0,-1.59,-1.59,test
2,10423,G,male,Investment,1.71,-1.65,-1.15,1.0,2.3,2.3,test


### 3.1 Full AB-test

The full (basic) version of the test calculates all available metrics: "diff in means", "diff in diff" and "cuped".<br>
Note that the "cuped" and "diff in diff" metrics require the target values from before the pilot.

In [21]:
model = ABTest()
results = model.execute(
    data=data_ab, 
    target_field=dataset_ab.target_names[1], 
    target_field_before=dataset_ab.target_names[0], 
    group_field='group'
)
results

{'size': {'test': 2500, 'control': 2500},
 'difference': {'ate': 0.025658157433372093,
  'cuped': -0.12723335811015196,
  'diff_in_diff': -0.16500729426782224},
 'p_value': {'t_test': 0.80048088853259, 'mann_whitney': 0.5701683822287207}}
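
For orientation, the three differences can be written out using their common textbook definitions. This is a sketch on synthetic data, not the HypEx code: the function names are hypothetical, and the library may differ in details such as how the CUPED coefficient is estimated. Here `x` denotes the target before the pilot and `y` the target during the pilot.

```python
import random
from statistics import fmean


def diff_in_means(y_test, y_control):
    """Difference of the pilot-period means ('ate' in the output above)."""
    return fmean(y_test) - fmean(y_control)


def diff_in_diff(y_test, x_test, y_control, x_control):
    """Change in the test group minus change in the control group."""
    return (fmean(y_test) - fmean(x_test)) - (fmean(y_control) - fmean(x_control))


def cuped(y_test, x_test, y_control, x_control):
    """Difference in means after removing the part of y explained by x."""
    x, y = x_test + x_control, y_test + y_control
    x_mean, y_mean = fmean(x), fmean(y)
    cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    var_x = sum((xi - x_mean) ** 2 for xi in x)
    theta = cov_xy / var_x  # pooled regression coefficient of y on x
    adj_test = [yi - theta * (xi - x_mean) for yi, xi in zip(y_test, x_test)]
    adj_control = [yi - theta * (xi - x_mean) for yi, xi in zip(y_control, x_control)]
    return fmean(adj_test) - fmean(adj_control)


# Synthetic data with a true effect of 0.5 in the test group
rng = random.Random(42)


def simulate(n, effect):
    before = [rng.gauss(0.0, 1.0) for _ in range(n)]
    after = [xi + rng.gauss(0.0, 0.5) + effect for xi in before]
    return before, after


x_test, y_test = simulate(2000, effect=0.5)
x_control, y_control = simulate(2000, effect=0.0)

ate = diff_in_means(y_test, y_control)
did = diff_in_diff(y_test, x_test, y_control, x_control)
cuped_effect = cuped(y_test, x_test, y_control, x_control)
print(round(ate, 3), round(did, 3), round(cuped_effect, 3))
```

When the pre-pilot target is strongly correlated with the pilot target, as in this synthetic setup, the "diff in diff" and "cuped" estimates are typically less noisy than the plain difference in means, which is why they require the extra column.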

To view the results in a more convenient form, use `show_beautiful_result`

In [22]:
model.show_beautiful_result()

Unnamed: 0,size
test,2500
control,2500


Unnamed: 0,difference
ate,0.03
cuped,-0.13
diff_in_diff,-0.17


Unnamed: 0,p_value
t_test,0.8
mann_whitney,0.57


### 3.2 Simple AB-test
To estimate the effect without pre-pilot target data, use `calc_difference_method='ate'`: the effect will then be estimated with the "diff in means" method

In [24]:
model = ABTest(calc_difference_method='ate')
model.execute(data=data_ab, target_field=dataset_ab.target_names[0], group_field='group')

model.show_beautiful_result()

Unnamed: 0,size
test,2500
control,2500


Unnamed: 0,difference
ate,0.19


Unnamed: 0,p_value
t_test,0.06
mann_whitney,0.13
