# AB test 

A/B testing is a research method for measuring the effect of a particular change in a product. The test shows which of two versions of a product or offer performs better on the selected metrics and whether the difference is statistically significant.

<ul>
  <li><a href="#creation-of-a-new-test-dataset-with-synthetic-data">Creation of a new test dataset with synthetic data.</a></li>
  <li><a href="#ab-test">AB test.</a></li>
  <li><a href="#additional-tests-in-ab-test">Additional tests in AB Test.</a></li>
  <li><a href="#abn-test">ABn Test.</a></li>
</ul>

In [1]:
import random

import numpy as np

from hypex.dataset import Dataset, InfoRole, TreatmentRole, TargetRole
from hypex.experiments.ab import ABTest

## Creation of a new test dataset with synthetic data. 

To work with data in HypEx, we first convert it into a `Dataset`. Each column should be marked with an appropriate role:
- FeatureRole: a role for columns that contain features or predictor variables. Our split will be based on them. Applied by default if the role is not specified for the column.
- TreatmentRole: a role for columns that show the treatment or intervention.
- TargetRole: a role for columns that show the target or outcome variable.
- InfoRole: a role for columns that contain information about the data, such as user IDs. 

In [2]:
data = Dataset(
    roles={
        "user_id": InfoRole(int),
        "treat": TreatmentRole(),
        "pre_spends": TargetRole(),
        "post_spends": TargetRole(),
        "gender": TargetRole(),
    },
    data="data.csv",
)
data

      user_id  signup_month  treat  pre_spends  post_spends   age gender  \
0           0             0      0       488.0   414.444444   NaN      M   
1           1             8      1       512.5   462.222222  26.0    NaN   
2           2             7      1       483.0   479.444444  25.0      M   
3           3             0      0       501.5   424.333333  39.0      M   
4           4             1      1       543.0   514.555556  18.0      F   
...       ...           ...    ...         ...          ...   ...    ...   
9995     9995            10      1       538.5   450.444444  42.0      M   
9996     9996             0      0       500.5   430.888889  26.0      F   
9997     9997             3      1       473.0   534.111111  22.0      F   
9998     9998             2      1       495.0   523.222222  67.0      F   
9999     9999             7      1       508.0   475.888889  38.0      F   

        industry  
0     E-commerce  
1     E-commerce  
2      Logistics  
3     E-com

To demonstrate a test with more than two groups, we overwrite the `treat` column with a random assignment to three groups (0, 1 and 2). In the outputs that follow, groups 1 and 2 are reported as test groups compared against the control.

In [3]:
data["treat"] = [random.choice([0, 1, 2]) for _ in range(len(data))]
data

      user_id  signup_month  treat  pre_spends  post_spends   age gender  \
0           0             0      2       488.0   414.444444   NaN      M   
1           1             8      2       512.5   462.222222  26.0    NaN   
2           2             7      1       483.0   479.444444  25.0      M   
3           3             0      1       501.5   424.333333  39.0      M   
4           4             1      0       543.0   514.555556  18.0      F   
...       ...           ...    ...         ...          ...   ...    ...   
9995     9995            10      2       538.5   450.444444  42.0      M   
9996     9996             0      2       500.5   430.888889  26.0      F   
9997     9997             3      0       473.0   534.111111  22.0      F   
9998     9998             2      1       495.0   523.222222  67.0      F   
9999     9999             7      2       508.0   475.888889  38.0      F   

        industry  
0     E-commerce  
1     E-commerce  
2      Logistics  
3     E-com

The data types of the roles are inferred automatically, as the output below shows. Columns that were not assigned a role receive the Feature role by default.

In [4]:
data.roles

{'user_id': Info(<class 'int'>),
 'treat': Treatment(<class 'int'>),
 'pre_spends': Target(<class 'float'>),
 'post_spends': Target(<class 'float'>),
 'gender': Target(<class 'str'>),
 'signup_month': Feature(<class 'int'>),
 'age': Feature(<class 'float'>),
 'industry': Feature(<class 'str'>)}

## AB test
Next we select one of the pre-assembled pipelines, in our case `ABTest`. A custom pipeline with custom executors can also be built for specific needs and requirements.
We then run the experiment by calling `execute` on the prepared `Dataset`; in the cell below it is passed directly, without an explicit `ExperimentData` wrapper.

In [5]:
test = ABTest()
result = test.execute(data)

### Experiment results
To show a summary report of the test, we access the `resume` attribute of the experiment result.

It displays the results of the test in the form of a table with the following columns:
- `feature`: name of the target feature whose change we want to analyze.
- `group`: name of the test group that is compared with the control group.
- `TTest pass`: outcome of the TTest, indicating whether the difference is statistically significant.
- `TTest p-value`: p-value of the TTest, that is, the probability of observing a result at least as extreme as the one obtained if the null hypothesis is true. The lower the value, the stronger the evidence against the null hypothesis.
- `control mean`: the mean of the feature value across the control group.
- `test mean`: the mean of the feature value across the test group.
- `difference`: the difference between the mean of the test group and the mean of the control group.
- `difference %`: the difference between the test mean and the control mean, expressed as a percentage of the control mean.
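
The `difference` and `difference %` columns can be reproduced by hand from the two means, and a p-value can be approximated with the standard library alone. The sketch below is illustrative only: the function names are hypothetical, and the p-value uses Welch's statistic with a normal approximation, which is reasonable for groups of several thousand rows but may differ from the exact test HypEx runs.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def relative_difference(control_mean, test_mean):
    """Return (difference, difference %) of the test mean vs the control mean."""
    difference = test_mean - control_mean
    return difference, difference / control_mean * 100


def approx_two_sided_p_value(control, test):
    """Approximate two-sided p-value for a difference in means.

    Uses Welch's standard error and a normal approximation to the
    t distribution (an assumption made to stay within the stdlib).
    """
    standard_error = sqrt(
        variance(control) / len(control) + variance(test) / len(test)
    )
    statistic = (mean(test) - mean(control)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(statistic)))


# Hypothetical samples: the test group is shifted by one unit.
control = [float(i % 10) for i in range(1000)]
test = [value + 1.0 for value in control]

difference, difference_pct = relative_difference(mean(control), mean(test))
p_value = approx_two_sided_p_value(control, test)
print(difference, difference_pct, p_value)
```

With the rounded means from the first row of the report below, `relative_difference(486.929299, 486.918048)` gives approximately `-0.011251` and `-0.002311`, in line with the `difference` and `difference %` columns.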

In [6]:
result.resume

       feature group TTest pass  TTest p-value  control mean   test mean  \
0   pre_spends     1     NOT OK       0.980263    486.929299  486.918048   
1   pre_spends     2     NOT OK       0.282812    452.216131  451.240568   
2  post_spends     1     NOT OK       0.309598    486.929299  487.432396   
3  post_spends     2     NOT OK       0.399857    452.216131  453.030412   

   difference  difference %  
0   -0.011251     -0.002311  
1   -0.975563     -0.215729  
2    0.503097      0.103320  
3    0.814281      0.180065  

The `sizes` attribute shows statistics on the group sizes.

The columns are:
- `control size`: the size of the control group.
- `test size`: the size of the test group.
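- `control size %`: the share of the control group. In the output below it appears to be computed within the compared pair (control plus the given test group) rather than over all rows.
- `test size %`: the share of the test group, computed on the same basis.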
- `group`: name of the test group.
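
The percentages printed by `sizes` can be checked by hand. The sketch below assumes that each share is taken within the control/test pair and truncated to an integer; this is inferred from the printed numbers, not from the HypEx source, and the helper name `pair_sizes` is hypothetical.

```python
def pair_sizes(control_size, test_size):
    """Sizes and integer percentage shares within one control/test pair."""
    total = control_size + test_size
    return {
        "control size": control_size,
        "test size": test_size,
        # Integer division truncates the share, e.g. 50.14 -> 50.
        "control size %": 100 * control_size // total,
        "test size %": 100 * test_size // total,
    }


# Group sizes taken from the report: control vs group 1, control vs group 2.
print(pair_sizes(3338, 3319))
print(pair_sizes(3338, 3343))
```

Under that assumption the two calls give shares of 50/49 and 49/50, the same values as in the report.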

In [7]:
result.sizes 

   control size  test size  control size %  test size % group
1          3338       3319              50           49     1
2          3338       3343              49           50     2

The `multitest` attribute holds the results of the multiple-testing correction. No `multitest_method` was passed to `ABTest` here, so it returns a message instead of a table.

In [8]:
result.multitest

"There was less than three groups or multitest method wasn't provided"

## Additional tests in AB Test 

It is possible to add the U-test and the chi-squared test to the pipeline through the `additional_tests` argument.

In [9]:
test = ABTest(additional_tests=['t-test', 'u-test', 'chi2-test'])
result = test.execute(data)

The additional columns are:
- `UTest pass`: outcome of the UTest, indicating whether the difference is statistically significant.
- `UTest p-value`: p-value of the UTest, that is, the probability of observing a result at least as extreme as the one obtained if the null hypothesis is true. The lower the value, the stronger the evidence against the null hypothesis.
- `Chi2Test pass`: outcome of the Chi2Test, indicating whether the difference is statistically significant.
- `Chi2Test p-value`: p-value of the Chi2Test, interpreted in the same way as the p-values above.
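
To give a feel for what a chi-squared test measures on a categorical target such as `gender`, here is a minimal standard-library sketch of Pearson's test of independence for a 2x2 table of counts. It is not HypEx's implementation: the function name and the counts are hypothetical, and no continuity correction is applied.

```python
from math import erfc, sqrt


def chi2_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    Returns (statistic, p_value) without a continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = n * (a * d - b * c) ** 2 / denominator
    # With one degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are control/test, columns are the two categories.
statistic, p_value = chi2_2x2([[1650, 1688], [1640, 1679]])
print(statistic, p_value)
```

A large p-value here would mean the category split looks similar in both groups, which is what one expects when groups are assigned at random.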

In [10]:
result.resume

       feature group TTest pass  TTest p-value UTest pass  UTest p-value  \
0   pre_spends     1     NOT OK       0.980263     NOT OK       0.886970   
1   pre_spends     2     NOT OK       0.282812     NOT OK       0.681137   
2  post_spends     1     NOT OK       0.309598     NOT OK       0.148130   
3  post_spends     2     NOT OK       0.399857     NOT OK       0.797893   

  Chi2Test pass  Chi2Test p-value  control mean   test mean  difference  \
0           NaN               NaN    486.929299  486.918048   -0.011251   
1           NaN               NaN    452.216131  451.240568   -0.975563   
2           NaN               NaN    486.929299  487.432396    0.503097   
3           NaN               NaN    452.216131  453.030412    0.814281   

   difference %  
0     -0.002311  
1     -0.215729  
2      0.103320  
3      0.180065  

In [11]:
result.multitest

"There was less than three groups or multitest method wasn't provided"

In [12]:
result.sizes

   control size  test size  control size %  test size % group
1          3338       3319              50           49     1
2          3338       3343              49           50     2

## ABn Test 

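Finally, when there are more than two groups, several comparisons are made at once, so we can apply a multiple-testing correction by passing `multitest_method`, here the Bonferroni method.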

In [13]:
test = ABTest(multitest_method="bonferroni")
result = test.execute(data)

In [14]:
result.resume

       feature group TTest pass  TTest p-value  control mean   test mean  \
0   pre_spends     1     NOT OK       0.980263    486.929299  486.918048   
1   pre_spends     2     NOT OK       0.282812    452.216131  451.240568   
2  post_spends     1     NOT OK       0.309598    486.929299  487.432396   
3  post_spends     2     NOT OK       0.399857    452.216131  453.030412   

   difference  difference %  
0   -0.011251     -0.002311  
1   -0.975563     -0.215729  
2    0.503097      0.103320  
3    0.814281      0.180065  

In [15]:
result.sizes

   control size  test size  control size %  test size % group
1          3338       3319              50           49     1
2          3338       3343              49           50     2

In [16]:
result.multitest

   correction        field  new p-value  old p-value  rejected   test group
0    0.980263   pre_spends          1.0     0.980263     False  TTest     1
1    0.282812  post_spends          1.0     0.282812     False  TTest     1
2    0.309598   pre_spends          1.0     0.309598     False  TTest     2
3    0.399857  post_spends          1.0     0.399857     False  TTest     2
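
The `new p-value` column above is consistent with the usual Bonferroni adjustment, in which each p-value is multiplied by the number of comparisons and capped at 1. The sketch below shows that arithmetic under the assumption that the four TTest comparisons form one family; the function name is hypothetical, and the exact procedure HypEx applies may differ.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust p-values and flag which hypotheses are rejected."""
    m = len(p_values)
    # Multiply each p-value by the number of tests and cap the result at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    rejected = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, rejected


# The "old p-value" column from the table above.
old_p_values = [0.980263, 0.282812, 0.309598, 0.399857]
new_p_values, rejected = bonferroni(old_p_values)
print(new_p_values, rejected)
```

Every adjusted value is capped at 1.0 and no hypothesis is rejected, which agrees with the `new p-value` and `rejected` columns.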