# How to perform AA and AB tests
*The AB-test is shown below, after the AA-test.*

## 0. Import Libraries

In [1]:
import pandas as pd
import numpy as np
from lightautoml.addons.hypex.ABTesting.ab_tester import AATest, ABTest
from lightautoml.addons.hypex.utils.tutorial_data_creation import create_test_data

pd.options.display.float_format = '{:,.2f}'.format

np.random.seed(52)  # needed to create example data

'nlp' extra dependecy package 'gensim' isn't installed. Look at README.md in repo 'LightAutoML' for installation instructions.
'nlp' extra dependecy package 'transformers' isn't installed. Look at README.md in repo 'LightAutoML' for installation instructions.




## 1. Create or upload your dataset
In this example we create a random dataset with a known effect size.  
If you have your own dataset, go to part 2.

In [None]:
data = create_test_data(rs=52, na_step=10, nan_cols=['age', 'gender'])
data

## 2. AATest 

### 2.0 Initialize parameters
`info_cols` defines informative attributes that should NOT be part of testing, such as user_id and signup_month.<br>

In [None]:
info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

### 2.1 Simple AA-test
This is the easiest way to initialize an AA-test and calculate its metrics (by default, on 10 iterations).<br>
Use it when every attribute is clear to you and you have no additional task conditions (such as grouping).

In [None]:
experiment = AATest(data=data, info_cols=info_cols, target_fields=target)

In [None]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling(iterations=10)

`experiment_result` is a table of experiment results, which includes 
- the means of all targets in the a and b samples, 
- the p-values of Student's t-test and the Kolmogorov-Smirnov test, 
- and the test results (whether the split for a given random_state passes the uniformity test)

In [None]:
experiment_result.head(3)
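
To make the contents of this table more concrete, here is a minimal sketch of what one row per `random_state` could look like. It is plain Python on synthetic numbers, not the library's implementation: the helpers `split_in_half`, `mean_test_p_value` and `ks_statistic` are hypothetical names, and the mean test uses a normal approximation instead of the exact Student t distribution.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist, mean, variance


def split_in_half(values, random_state):
    """Shuffle a copy with a fixed seed and cut it into two halves."""
    rng = random.Random(random_state)
    shuffled = list(values)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_test_p_value(a, b):
    """Two-sided p-value for equal means (normal approximation, large samples)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of the two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a_sorted, x) / len(a) - bisect_right(b_sorted, x) / len(b))
        for x in a_sorted + b_sorted
    )


# Synthetic spend values (an assumption), not the tutorial dataset
data_rng = random.Random(52)
spends = [data_rng.gauss(400, 50) for _ in range(1000)]

# One row per random_state, similar in spirit to experiment_result
rows = []
for random_state in range(3):
    a, b = split_in_half(spends, random_state)
    rows.append({
        "random_state": random_state,
        "mean_a": mean(a),
        "mean_b": mean(b),
        "mean_p_value": mean_test_p_value(a, b),
        "ks_statistic": ks_statistic(a, b),
    })

for row in rows:
    print(row)
```

A split is a good AA candidate when the means are close, the p-value is high and the KS statistic is small, because then the two halves are hard to tell apart.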

`dict_of_datas` is a dictionary with random_states as keys and dataframes as values.<br>
The result of the split can be found in the column 'group', which contains the values 'test' and 'control'.

In [None]:
dict_of_datas[0].head(3)

#### - Single experiment
To get stable results, let's fix `random_state`

In [None]:
random_state = 11

To perform a single experiment you can use `sampling_metrics()`

In [None]:
experiment = AATest(data=data, info_cols=info_cols, target_fields=target)
metrics, dict_of_datas = experiment.sampling_metrics(random_state=random_state).values()

The result contains the same information as in multisampling, but for a single experiment

In [None]:
metrics

In [None]:
dict_of_datas[random_state]

### 2.2 AA-test with grouping

To perform an experiment that splits the samples by groups, use `group_cols`

In [None]:
info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

group_cols = 'industry'

In [None]:
experiment = AATest(data=data, info_cols=info_cols, target_fields=target, group_cols=group_cols)

In [None]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling()

The result has the same format as without groups.

In this mode each group is divided equally between the two samples (test and control):

In [None]:
dict_of_datas[0].groupby(['industry', 'group'])[['user_id']].count()
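
The idea of dividing each group equally between the samples can be sketched as a stratified split. This is only an illustration in plain Python on made-up rows; `stratified_split` is a hypothetical helper and the library may perform the split differently.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, group_key, random_state):
    """Label half of every group as 'test' and the rest as 'control'."""
    rng = random.Random(random_state)

    # Collect the rows of each group separately
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    labelled = []
    for members in by_group.values():
        rng.shuffle(members)  # random order inside the group
        half = len(members) // 2
        for position, row in enumerate(members):
            label = "test" if position < half else "control"
            labelled.append({**row, "group": label})
    return labelled


# Made-up rows (an assumption), 4 users per industry
industries = ["retail", "finance", "it"]
rows = [{"user_id": i, "industry": industries[i % 3]} for i in range(12)]

labelled = stratified_split(rows, group_key="industry", random_state=0)

# Same check as the groupby/count above: users per (industry, group)
counts = Counter((row["industry"], row["group"]) for row in labelled)
for key in sorted(counts):
    print(key, counts[key])
```

With an odd group size the extra row goes to 'control' in this sketch, so the two counts differ by at most one.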

## 3. AB-test

### 3.0 Data
Let's add a `group` column to the data to see how the AB-test works

In [None]:
data_ab = data.copy()

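half_data = data_ab.shape[0] // 2
# first half is 'test', the remainder is 'control' (also works for an odd number of rows)
data_ab['group'] = ['test'] * half_data + ['control'] * (data_ab.shape[0] - half_data)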
data_ab.head(3)

### 3.1 Full AB-test

The full (basic) version of the test calculates all available metrics: "diff in means", "diff in diff" and "cuped".<br>
Note that the "cuped" and "diff in diff" metrics require the target measured before the pilot.

In [None]:
model = ABTest()
results = model.execute(
    data=data_ab, 
    target_field='post_spends', 
    target_field_before='pre_spends', 
    group_field='group'
)
results

To see the results in a more convenient form, use `show_beautiful_result`

In [None]:
model.show_beautiful_result()
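
For intuition, the three metrics named in this section can be sketched with their textbook definitions. This is plain Python on tiny made-up numbers and is not taken from the library's source, so its exact formulas (especially for "cuped") may differ; all function names here are hypothetical.

```python
from statistics import mean


def diff_in_means(test_after, control_after):
    """Difference between the post-pilot means of the two groups."""
    return mean(test_after) - mean(control_after)


def diff_in_diff(test_before, test_after, control_before, control_after):
    """Change in the test group minus change in the control group."""
    test_change = mean(test_after) - mean(test_before)
    control_change = mean(control_after) - mean(control_before)
    return test_change - control_change


def cuped(test_before, test_after, control_before, control_after):
    """Difference in means after removing the part explained by the pre-pilot target."""
    before = test_before + control_before
    after = test_after + control_after
    mean_before, mean_after = mean(before), mean(after)

    # theta = cov(after, before) / var(before), estimated on the pooled data
    covariance = sum((x - mean_before) * (y - mean_after) for x, y in zip(before, after))
    variance_before = sum((x - mean_before) ** 2 for x in before)
    theta = covariance / variance_before

    def adjust(pre, post):
        return [y - theta * (x - mean_before) for x, y in zip(pre, post)]

    adjusted_test = adjust(test_before, test_after)
    adjusted_control = adjust(control_before, control_after)
    return mean(adjusted_test) - mean(adjusted_control)


# Made-up spends (an assumption): the test group gains 3 more than control
test_before, test_after = [10, 20, 30], [15, 25, 35]
control_before, control_after = [10, 20, 30], [12, 22, 32]

print(diff_in_means(test_after, control_after))
print(diff_in_diff(test_before, test_after, control_before, control_after))
print(cuped(test_before, test_after, control_before, control_after))
```

Only the first function works without pre-pilot data, which is why the other two metrics need `target_field_before`.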

### 3.2 Simple AB-test
To estimate the effect without target data from before the pilot, use `calc_difference_method='ate'`. The effect is then estimated with the "diff in means" method.

In [None]:
model = ABTest(calc_difference_method='ate')
model.execute(data=data_ab, target_field='post_spends', group_field='group')

model.show_beautiful_result()