# AB test 

A/B testing is a research method for measuring how people react to a change. It compares two versions of a product or offer and shows which one performs better and by how much.

In [1]:
import random

from hypex.dataset import Dataset, InfoRole, TreatmentRole, TargetRole
from hypex import ABTest

## Creating a test dataset with synthetic data
Mark the data fields by assigning each column the appropriate role:

* FeatureRole: a role for columns that contain features or predictor variables; splitting is based on them. Note that columns with no role specified are reported as `Default` in the `data.roles` output further down.
* TreatmentRole: a role for columns that show the treatment or intervention.
* TargetRole: a role for columns that show the target or outcome variable.
* InfoRole: a role for columns that contain information about the data, such as user IDs.

In [2]:
# Load the CSV and assign a role to each column of interest.
data = Dataset(
    roles={
        "user_id": InfoRole(int),
        "treat": TreatmentRole(),
        "pre_spends": TargetRole(),
        "post_spends": TargetRole(),
        "gender": TargetRole(),
    },
    data="data.csv",
)
data

      user_id  signup_month  treat  pre_spends  post_spends   age gender  \
0           0             0      0       488.0   414.444444   NaN      M   
1           1             8      1       512.5   462.222222  26.0    NaN   
2           2             7      1       483.0   479.444444  25.0      M   
3           3             0      0       501.5   424.333333  39.0      M   
4           4             1      1       543.0   514.555556  18.0      F   
...       ...           ...    ...         ...          ...   ...    ...   
9995     9995            10      1       538.5   450.444444  42.0      M   
9996     9996             0      0       500.5   430.888889  26.0      F   
9997     9997             3      1       473.0   534.111111  22.0      F   
9998     9998             2      1       495.0   523.222222  67.0      F   
9999     9999             7      1       508.0   475.888889  38.0      F   

        industry  
0     E-commerce  
1     E-commerce  
2      Logistics  
3     E-com

In [3]:
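# Reassign every user at random to one of three groups (0, 1 or 2),
# so that the dataset has more than one test group.
data["treat"] = [random.choice([0, 1, 2]) for _ in range(len(data))]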
data

      user_id  signup_month  treat  pre_spends  post_spends   age gender  \
0           0             0      1       488.0   414.444444   NaN      M   
1           1             8      0       512.5   462.222222  26.0    NaN   
2           2             7      0       483.0   479.444444  25.0      M   
3           3             0      0       501.5   424.333333  39.0      M   
4           4             1      0       543.0   514.555556  18.0      F   
...       ...           ...    ...         ...          ...   ...    ...   
9995     9995            10      0       538.5   450.444444  42.0      M   
9996     9996             0      2       500.5   430.888889  26.0      F   
9997     9997             3      0       473.0   534.111111  22.0      F   
9998     9998             2      2       495.0   523.222222  67.0      F   
9999     9999             7      2       508.0   475.888889  38.0      F   

        industry  
0     E-commerce  
1     E-commerce  
2      Logistics  
3     E-com

In [4]:
data.roles

{'user_id': Info(<class 'int'>),
 'treat': Treatment(<class 'int'>),
 'pre_spends': Target(<class 'float'>),
 'post_spends': Target(<class 'float'>),
 'gender': Target(<class 'str'>),
 'signup_month': Default(<class 'int'>),
 'age': Default(<class 'float'>),
 'industry': Default(<class 'str'>)}

## Simple AB Test 

The simple pipeline reports group sizes, the differences between groups, and a t-test estimate.
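To make these quantities concrete, here is a minimal standard-library sketch of what such a summary computes for one control group and one test group. It is not HypEx code: the function name `compare_groups`, the synthetic samples, the choice of Welch's unequal-variance t statistic, and the normal approximation for the p-value (reasonable only for large samples) are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, variance


def compare_groups(control, test):
    """Summarise two samples: sizes, means, difference and a Welch t-test.

    The p-value uses a normal approximation to the t distribution, which is
    only reasonable for large samples (an assumption of this sketch).
    """
    control_mean, test_mean = mean(control), mean(test)
    difference = test_mean - control_mean
    # Standard error of the difference, allowing unequal variances (Welch).
    std_error = math.sqrt(
        variance(control) / len(control) + variance(test) / len(test)
    )
    t_stat = difference / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return {
        "control size": len(control),
        "test size": len(test),
        "control mean": control_mean,
        "test mean": test_mean,
        "difference": difference,
        "difference %": difference / control_mean * 100,
        "t statistic": t_stat,
        "p-value": p_value,
    }


# Synthetic data: both groups are drawn from the same distribution,
# so no real effect exists between them.
rng = random.Random(0)
control = [rng.gauss(487, 19) for _ in range(3000)]
test = [rng.gauss(487, 19) for _ in range(3000)]
summary = compare_groups(control, test)
print(summary)
```

The `difference %` here is the difference expressed relative to the control mean, which is consistent with the `difference` and `difference %` columns in the outputs of this notebook.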

In [5]:
test = ABTest()
result = test.execute(data)

In [6]:
result.resume

       feature group TTest pass  TTest p-value  control mean   test mean  \
0   pre_spends     1     NOT OK       0.298872    486.823175  487.136096   
1  post_spends     1     NOT OK       0.576805    486.823175  487.310187   

   difference  difference %  
0    0.312921      0.064278  
1    0.487012      0.100039  

In [7]:
result.sizes 

         control size  test size  control size %  test size % group
1┆treat          3391       3377              50           49     1
2┆treat          3391       3232              51           48     2

In [8]:
result.multitest

"There was less than three groups or multitest method wasn't provided"

## Additional tests in AB Test 

You can also add a U-test and a chi-squared test to the pipeline.
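Unlike the t-test and U-test, which compare numeric values, a chi-squared test compares category counts across groups, for example a column such as `gender`. The sketch below shows the idea for a 2x2 table using only the standard library. It is not the HypEx implementation: the function name `chi2_test_2x2` and the counts are hypothetical, and no continuity correction is applied.

```python
import math


def chi2_test_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 count table.

    No continuity correction is applied. With one degree of freedom the
    p-value has the closed form erfc(sqrt(chi2 / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and category were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are groups (control, test), columns are F and M.
counts = [[1700, 1691], [1650, 1727]]
statistic, p_value = chi2_test_2x2(counts)
print(f"chi2 = {statistic:.4f}, p-value = {p_value:.4f}")
```

A large p-value means the category proportions do not differ detectably between the groups.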

In [9]:
test = ABTest(additional_tests=['t-test', 'u-test', 'chi2-test'])
result = test.execute(data)

In [10]:
result.resume

       feature group TTest pass  TTest p-value UTest pass  UTest p-value  \
0   pre_spends     1     NOT OK       0.298872     NOT OK       0.264834   
1  post_spends     1     NOT OK       0.576805     NOT OK       0.851549   

  Chi2Test pass  Chi2Test p-value  control mean   test mean  difference  \
0           NaN               NaN    486.823175  487.136096    0.312921   
1           NaN               NaN    486.823175  487.310187    0.487012   

   difference %  
0      0.064278  
1      0.100039  

In [11]:
result.multitest

"There was less than three groups or multitest method wasn't provided"

In [12]:
result.sizes

          control size  test size  control size %  test size % group
1┆gender          3391       3377              50           49     1
2┆gender          3391       3232              51           48     2

## ABn Test 

Finally, when there are more than two groups, the pipeline can run a multiple-comparison (ABn) test with a chosen correction method, here Bonferroni.

In [13]:
test = ABTest(multitest_method="bonferroni")
result = test.execute(data)

In [14]:
result.resume

       feature group TTest pass  TTest p-value  control mean   test mean  \
0   pre_spends     1     NOT OK       0.298872    486.823175  487.136096   
1  post_spends     1     NOT OK       0.576805    486.823175  487.310187   

   difference  difference %  
0    0.312921      0.064278  
1    0.487012      0.100039  

In [15]:
result.sizes

          control size  test size  control size %  test size % group
1┆gender          3391       3377              50           49     1
2┆gender          3391       3232              51           48     2

In [16]:
result.multitest

   correction        field  new p-value  old p-value  rejected   test group
0    0.496498   pre_spends          1.0     0.496498     False  TTest     1
1    0.298872  post_spends          1.0     0.298872     False  TTest     1
2    0.896968   pre_spends          1.0     0.896968     False  TTest     2
3    0.576805  post_spends          1.0     0.576805     False  TTest     2
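
The Bonferroni correction multiplies each p-value by the number of comparisons and caps the result at 1. The following standard-library sketch applies it to the `old p-value` column from the table above. The function name `bonferroni` and the significance level `alpha=0.05` are assumptions of this sketch, not taken from HypEx.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, rejection flags) under Bonferroni."""
    m = len(p_values)
    # Multiply each p-value by the number of comparisons, capped at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    # Reject a hypothesis only if its adjusted p-value is within alpha.
    rejected = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, rejected


# Values of the "old p-value" column from the table above.
old_p_values = [0.496498, 0.298872, 0.896968, 0.576805]
new_p_values, rejected = bonferroni(old_p_values)
print(new_p_values)  # [1.0, 1.0, 1.0, 1.0]
print(rejected)      # [False, False, False, False]
```

With four comparisons every adjusted value exceeds 1 and is capped, which agrees with the `new p-value` and `rejected` columns reported above.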