# Tutorial 2: How to perform AA and AB tests
*The AB test is covered in section 3 below.*

## 0. Import Libraries

In [1]:
import pandas as pd
from hypex.ab_test.ab_tester import AATest, ABTest
from hypex.dataset.dataset import Dataset

pd.options.display.float_format = '{:,.2f}'.format



## 1. Create or upload your dataset
In this example we create a random dataset with a known effect size.  
If you already have your own dataset, skip to section 2.

In [2]:
data = Dataset(is_treatment=False, na_columns=['feature_col_1', 'feature_col_2'], 
               num_info_cols=1, num_main_causes_cols=2)
data.df

| | info_col | feature_col_1 | feature_col_2 | feature_col_3 | feature_col_4 | outcome |
|---|---|---|---|---|---|---|
| 0 | 9217 | | Credit | 1.18 | 2.00 | 5.51 |
| 1 | 6037 | male | | 1.66 | 2.00 | 7.76 |
| 2 | 10774 | male | Deposit | 0.63 | 1.00 | 5.01 |
| 3 | 14344 | female | Deposit | -0.75 | 1.00 | 0.67 |
| 4 | 1987 | female | Deposit | 0.74 | 3.00 | 3.87 |
| ... | ... | ... | ... | ... | ... | ... |
| 4995 | 5119 | male | Investment | 1.89 | 3.00 | 9.64 |
| 4996 | 520 | female | Deposit | 0.36 | 2.00 | 2.64 |
| 4997 | 9982 | female | Deposit | 0.05 | 2.00 | 4.57 |
| 4998 | 14329 | male | Investment | -0.08 | 2.00 | 1.18 |


## 2. AATest 

### 2.0 Initialize parameters
`info_cols` defines informative attributes that should NOT take part in testing, such as user_id and signup_month.<br>

In [3]:
info_cols = data.info_col_names[0]
target = data.outcome_name[0]

### 2.1 Simple AA-test
This is the simplest way to initialize an AA test and calculate its metrics (10 iterations by default).<br>
Use it when the role of every attribute is clear and there are no additional task conditions (such as grouping).

In [4]:
experiment = AATest(info_cols=info_cols, target_fields=target)

In [5]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling(data.df, iterations=10)

100%|██████████| 10/10 [00:00<00:00, 15.96it/s]


`experiment_result` is a table of experiment results. It includes:
- the means of each target in the a and b samples,
- the p-values of Student's t-test and the Kolmogorov-Smirnov test,
- the test outcomes (whether the split produced with a given random_state passed the uniformity test).

In [6]:
experiment_result.head(3)

| | random_state | outcome a mean | outcome b mean | outcome ab delta | outcome ab delta % | outcome t_test p_value | outcome ks_test p_value | outcome t_test passed | outcome ks_test passed | mean_tests_score |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3.41 | 3.46 | 0.05 | 1.34 | 0.58 | 0.83 | True | True | 0.71 |
| 1 | 1 | 3.4 | 3.47 | 0.07 | 1.88 | 0.44 | 0.43 | True | True | 0.43 |
| 2 | 2 | 3.43 | 3.43 | -0.0 | -0.0 | 1.0 | 0.72 | True | True | 0.86 |
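
For intuition, one row of such a table can be approximated by hand. The sketch below is not HypEx's implementation: the helper `aa_split_metrics` is hypothetical, it uses only the standard library, and it swaps Student's t-test for a large-sample normal approximation. It also assumes that a test "passes" when its p-value exceeds a significance level `alpha` of 0.05.

```python
import random
from statistics import NormalDist, fmean, variance


def aa_split_metrics(values, random_state, alpha=0.05):
    """Randomly split `values` in half and compare the two halves."""
    rng = random.Random(random_state)
    shuffled = list(values)
    rng.shuffle(shuffled)

    half = len(shuffled) // 2
    a, b = shuffled[:half], shuffled[half:]

    mean_a, mean_b = fmean(a), fmean(b)
    # Standard error of the difference in means (unequal variances allowed).
    std_err = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean_b - mean_a) / std_err
    # Two-sided p-value from the normal approximation (assumption: large samples).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "random_state": random_state,
        "a mean": mean_a,
        "b mean": mean_b,
        "ab delta": mean_b - mean_a,
        "p_value": p_value,
        "passed": p_value > alpha,  # assumed pass rule, not taken from HypEx
    }


# Synthetic outcome values, unrelated to the tutorial's dataset.
data_rng = random.Random(0)
outcome = [data_rng.gauss(3.4, 3.0) for _ in range(5000)]

# One row per random_state, mirroring the multi-iteration search.
rows = [aa_split_metrics(outcome, random_state=seed) for seed in range(10)]
```

Because every split is driven by its own `random_state`, rerunning the helper with the same seed reproduces the same row.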


`dict_of_datas` is a dictionary with random_states as keys and dataframes as values.<br>
The result of the split is stored in the 'group' column, which contains the values 'test' and 'control'.

In [7]:
dict_of_datas[0].head(3)

| | info_col | feature_col_1 | feature_col_2 | feature_col_3 | feature_col_4 | outcome | group |
|---|---|---|---|---|---|---|---|
| 0 | 6037 | male | | 1.66 | 2.0 | 7.76 | test |
| 1 | 10774 | male | Deposit | 0.63 | 1.0 | 5.01 | test |
| 2 | 14344 | female | Deposit | -0.75 | 1.0 | 0.67 | test |
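
To carry one split forward, a random_state has to be chosen. A minimal sketch follows, assuming (the tutorial does not state this) that a higher `mean_tests_score` marks a better-balanced split. The rows are the three printed by `experiment_result.head(3)`, copied into plain dictionaries so the snippet runs without HypEx.

```python
# Values copied from the first three rows of experiment_result.
rows = [
    {"random_state": 0, "mean_tests_score": 0.71},
    {"random_state": 1, "mean_tests_score": 0.43},
    {"random_state": 2, "mean_tests_score": 0.86},
]

# Assumption: the split with the highest score is the preferred one.
best = max(rows, key=lambda row: row["mean_tests_score"])
best_random_state = best["random_state"]
```

With the real objects, the corresponding split would then be `dict_of_datas[best_random_state]`.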


#### - Single experiment
To get reproducible results, let's fix `random_state`.

In [8]:
random_state = 11

To perform a single experiment, use `sampling_metrics()`.

In [9]:
experiment = AATest(info_cols=info_cols, target_fields=target)
metrics, dict_of_datas = experiment.sampling_metrics(data=data.df, random_state=random_state).values()

The result contains the same information as the multi-sampling run, but for a single experiment.

In [10]:
metrics

{'random_state': 11,
 'outcome a mean': 3.4351840768250046,
 'outcome b mean': 3.4327940853777283,
 'outcome ab delta': -0.002389991447276252,
 'outcome ab delta %': -0.0696223364359927,
 'outcome t_test p_value': 0.9774393174825711,
 'outcome ks_test p_value': 0.9062946363493458,
 'outcome t_test passed': True,
 'outcome ks_test passed': True,
 'mean_tests_score': 0.9418669769159584}
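
The printed numbers are internally consistent, which helps when reading the keys. The check below uses only the values from this output. The formulas are inferred from those values rather than taken from the HypEx source, so treat them as an observation: the delta equals `b mean - a mean`, the percentage delta is that difference relative to `b mean`, and `mean_tests_score` equals the average of the two p-values (there is a single target here).

```python
import math

# Values copied from the metrics dictionary printed above.
metrics = {
    "outcome a mean": 3.4351840768250046,
    "outcome b mean": 3.4327940853777283,
    "outcome ab delta": -0.002389991447276252,
    "outcome ab delta %": -0.0696223364359927,
    "outcome t_test p_value": 0.9774393174825711,
    "outcome ks_test p_value": 0.9062946363493458,
    "mean_tests_score": 0.9418669769159584,
}

# Inferred relationships (assumptions checked against the printed numbers).
delta = metrics["outcome b mean"] - metrics["outcome a mean"]
delta_pct = delta / metrics["outcome b mean"] * 100
score = (metrics["outcome t_test p_value"] + metrics["outcome ks_test p_value"]) / 2

delta_matches = math.isclose(delta, metrics["outcome ab delta"], rel_tol=1e-6)
pct_matches = math.isclose(delta_pct, metrics["outcome ab delta %"], rel_tol=1e-6)
score_matches = math.isclose(score, metrics["mean_tests_score"], rel_tol=1e-9)
```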

In [11]:
dict_of_datas[random_state]

| | info_col | feature_col_1 | feature_col_2 | feature_col_3 | feature_col_4 | outcome | group |
|---|---|---|---|---|---|---|---|
| 0 | 6037 | male | | 1.66 | 2.00 | 7.76 | test |
| 1 | 10774 | male | Deposit | 0.63 | 1.00 | 5.01 | test |
| 2 | 9295 | female | Investment | -1.72 | 1.00 | -0.76 | test |
| 3 | 5191 | male | Deposit | -0.34 | 3.00 | 6.94 | test |
| 4 | 11497 | male | | -0.21 | 1.00 | -0.78 | test |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | 6229 | male | Investment | -1.50 | 3.00 | 1.56 | control |
| 4996 | 3970 | female | Credit | 0.55 | 3.00 | 8.27 | control |
| 4997 | 6634 | female | | 0.53 | 2.00 | 2.91 | control |
| 4998 | 13213 | female | Deposit | 0.84 | 1.00 | 4.85 | control |


### 2.2 AA-test with grouping

To run an experiment that splits the samples by group, use `group_cols`.

In [12]:
info_cols = data.info_col_names[0]
target = data.outcome_name[0]

group_cols = 'feature_col_2'

In [13]:
experiment = AATest(info_cols=info_cols, target_fields=target, group_cols=group_cols)

In [14]:
experiment_result, dict_of_datas = experiment.search_dist_uniform_sampling(data=data.df)

100%|██████████| 10/10 [00:00<00:00, 22.86it/s]


The result has the same format as without groups.

In this mode each group is divided equally between the two samples (test and control):

In [16]:
dict_of_datas[0].groupby(['feature_col_2', 'group'])[['info_col']].count()

| feature_col_2 | group | info_col |
|---|---|---|
| Credit | control | 728 |
| Credit | test | 728 |
| Deposit | control | 767 |
| Deposit | test | 767 |
| Investment | control | 755 |
| Investment | test | 755 |
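
The equal counts per group are what a stratified split produces. As a rough illustration of the idea (a hypothetical helper, `stratified_split`, written with the standard library only; HypEx's own procedure may differ), each group is shuffled separately and then divided in half:

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, group_key, random_state):
    """Label rows 'test'/'control' so every group is split as evenly as possible."""
    rng = random.Random(random_state)

    # Collect the rows of each group.
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    labelled = []
    for _, members in sorted(by_group.items()):
        members = list(members)
        rng.shuffle(members)  # randomize within the group only
        half = len(members) // 2
        for position, row in enumerate(members):
            label = "test" if position < half else "control"
            labelled.append({**row, "group": label})
    return labelled


# Toy rows; the category names mirror feature_col_2 in the tutorial.
products = ["Credit"] * 6 + ["Deposit"] * 8 + ["Investment"] * 4
rows = [{"id": i, "feature_col_2": p} for i, p in enumerate(products)]

split = stratified_split(rows, "feature_col_2", random_state=0)
counts = Counter((row["feature_col_2"], row["group"]) for row in split)
```

Here `counts` plays the role of the `groupby(...).count()` call above: each category contributes the same number of rows to both samples whenever its size is even.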


## 3. AB-test

### 3.0 Data
Let's prepare the data to see how the AB test works.

In [17]:
dataset_ab = Dataset(num_outcomes=2, is_treatment=False)
data_ab = dataset_ab.df.copy()
half_data = int(dataset_ab.df.shape[0]/2)
data_ab['group'] = ['test']*half_data + ['control']*half_data
data_ab.head(3)

| | info_col_1 | info_col_2 | feature_col_1 | feature_col_2 | feature_col_3 | feature_col_4 | feature_col_5 | feature_col_6 | outcome_1 | outcome_2 | group |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11056 | A | male | Credit | 1.3 | 2.48 | -0.52 | 3.0 | 6.32 | 6.32 | test |
| 1 | 11416 | A | female | Credit | -0.49 | 0.43 | -0.85 | 0.0 | -2.18 | -1.49 | test |
| 2 | 11602 | F | female | Credit | -1.47 | 2.02 | -1.4 | 0.0 | 0.93 | 0.93 | test |


### 3.1 Full AB-test

The full (basic) version of the test calculates all available metrics: "diff in means", "diff in diff" and "cuped".<br>
Note that the "cuped" and "diff in diff" metrics require the target measured before the pilot.

In [18]:
model = ABTest()
results = model.execute(
    data=data_ab, 
    target_field=dataset_ab.outcome_name[1], 
    target_field_before=dataset_ab.outcome_name[0], 
    group_field='group'
)
results

{'size': {'test': 2500, 'control': 2500},
 'difference': {'ate': -0.033053330033714676,
  'cuped': -0.007325100014748093,
  'diff_in_diff': -0.0022962851872462275},
 'p_value': {'t_test': 0.7396299733540724, 'mann_whitney': 0.8476875465382554}}
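
The three entries under `'difference'` correspond to well-known estimators. The sketch below gives their textbook forms on toy numbers. It is an assumption that HypEx follows the same definitions (its CUPED variant in particular may differ), and the function names are hypothetical.

```python
from statistics import fmean


def diff_in_means(test_after, control_after):
    """Difference between the post-period group means."""
    return fmean(test_after) - fmean(control_after)


def diff_in_diff(test_before, test_after, control_before, control_after):
    """Change in the test group minus change in the control group."""
    test_change = fmean(test_after) - fmean(test_before)
    control_change = fmean(control_after) - fmean(control_before)
    return test_change - control_change


def cuped(test_before, test_after, control_before, control_after):
    """Difference in means after adjusting for the pre-period covariate."""
    before = list(test_before) + list(control_before)
    after = list(test_after) + list(control_after)
    mean_before, mean_after = fmean(before), fmean(after)

    # theta = cov(after, before) / var(before), estimated on the pooled data.
    cov = sum((x - mean_before) * (y - mean_after) for x, y in zip(before, after))
    var = sum((x - mean_before) ** 2 for x in before)
    theta = cov / var

    def adjusted_mean(pre, post):
        return fmean([y - theta * (x - mean_before) for x, y in zip(pre, post)])

    return adjusted_mean(test_before, test_after) - adjusted_mean(
        control_before, control_after
    )


# Toy data: the test group starts 1 unit higher and gains 2 units.
test_before, test_after = [2, 3, 4, 5], [4, 5, 6, 7]
control_before, control_after = [1, 2, 3, 4], [1, 2, 3, 4]

ate = diff_in_means(test_after, control_after)
did = diff_in_diff(test_before, test_after, control_before, control_after)
cup = cuped(test_before, test_after, control_before, control_after)
```

On this toy data the plain difference in means also absorbs the pre-existing gap between the groups, while the two estimators that use the pre-period target correct for it to different degrees. This is why they need `target_field_before`.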

To see the results in a more convenient form, use `show_beautiful_result`.

In [19]:
model.show_beautiful_result()

| | size |
|---|---|
| test | 2500 |
| control | 2500 |

| | difference |
|---|---|
| ate | -0.03 |
| cuped | -0.01 |
| diff_in_diff | -0.0 |

| | p_value |
|---|---|
| t_test | 0.74 |
| mann_whitney | 0.85 |
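
A short sketch of how these p-values might be read, assuming the conventional significance level of 0.05 (the tutorial does not state one). The values are copied from the `results` dictionary returned by `execute`.

```python
# p-values copied from the full AB-test output.
p_values = {"t_test": 0.7396299733540724, "mann_whitney": 0.8476875465382554}

alpha = 0.05  # assumed significance level, not taken from the tutorial

# True means the test would reject "no difference between the groups".
significant = {name: p < alpha for name, p in p_values.items()}
```

Neither test flags a significant difference, which is the expected outcome here: the 'group' labels were assigned by position to data generated with `is_treatment=False`.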


### 3.2 Simple AB-test
To estimate the effect without pre-pilot target data, use `calc_difference_method='ate'`; the effect is then estimated with the "diff in means" method.

In [20]:
model = ABTest(calc_difference_method='ate')
model.execute(data=data_ab, target_field=dataset_ab.outcome_name[0], group_field='group')

model.show_beautiful_result()

| | size |
|---|---|
| test | 2500 |
| control | 2500 |

| | difference |
|---|---|
| ate | -0.03 |

| | p_value |
|---|---|
| t_test | 0.76 |
| mann_whitney | 0.96 |
