# A/B test

A/B testing is a research method for measuring how people react to a change. The study compares two versions of a product or offer and shows which one performs better on the chosen metric.

In [1]:
import random

from hypex.dataset import Dataset, InfoRole, TreatmentRole, TargetRole
from hypex import HomogeneityTest

## Creating a test dataset from synthetic data
Each column of the data must be marked with the appropriate role:

* FeatureRole: for columns that contain features or predictor variables. The split is based on these columns. This role is applied by default when no role is specified for a column.
* TreatmentRole: for columns that indicate the treatment or intervention.
* TargetRole: for columns that contain the target or outcome variable.
* InfoRole: for columns that contain information about the data, such as user IDs.
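
The default-role rule can be illustrated without the library. The sketch below is plain Python, not HypEx code: `resolve_roles` is a hypothetical helper, and the roles are written as plain strings purely for illustration. It only shows how columns with no explicit role would end up as features.

```python
# Plain-Python illustration of role marking; this is not HypEx code.
columns = ["user_id", "signup_month", "treat", "pre_spends",
           "post_spends", "age", "gender", "industry"]

# Roles given explicitly, written as plain strings for the illustration.
explicit_roles = {
    "user_id": "Info",
    "treat": "Treatment",
    "pre_spends": "Target",
    "post_spends": "Target",
    "gender": "Target",
}


def resolve_roles(columns, explicit_roles, default="Feature"):
    """Return a role for every column, falling back to the default role."""
    return {col: explicit_roles.get(col, default) for col in columns}


roles = resolve_roles(columns, explicit_roles)
features = [col for col, role in roles.items() if role == "Feature"]
print(features)  # ['signup_month', 'age', 'industry']
```

With this rule only the info, treatment, and target columns need to be listed; every other column is treated as a feature.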

In [2]:
data = Dataset(
    roles={
        "user_id": InfoRole(int),
        "treat": TreatmentRole(),
        "pre_spends": TargetRole(),
        "post_spends": TargetRole(),
        "gender": TargetRole(),
    },
    data="data.csv",
)
data

      user_id  signup_month  treat  pre_spends  post_spends   age gender  \
0           0             0      0       488.0   414.444444   NaN      M   
1           1             8      1       512.5   462.222222  26.0    NaN   
2           2             7      1       483.0   479.444444  25.0      M   
3           3             0      0       501.5   424.333333  39.0      M   
4           4             1      1       543.0   514.555556  18.0      F   
...       ...           ...    ...         ...          ...   ...    ...   
9995     9995            10      1       538.5   450.444444  42.0      M   
9996     9996             0      0       500.5   430.888889  26.0      F   
9997     9997             3      1       473.0   534.111111  22.0      F   
9998     9998             2      1       495.0   523.222222  67.0      F   
9999     9999             7      1       508.0   475.888889  38.0      F   

        industry  
0     E-commerce  
1     E-commerce  
2      Logistics  
3     E-com

In [3]:
data.roles

{'user_id': Info(<class 'int'>),
 'treat': Treatment(<class 'int'>),
 'pre_spends': Target(<class 'float'>),
 'post_spends': Target(<class 'float'>),
 'gender': Target(<class 'str'>),
 'signup_month': Feature(<class 'int'>),
 'age': Feature(<class 'float'>),
 'industry': Feature(<class 'str'>)}

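## Homogeneity Test

The homogeneity test checks whether each target column is distributed the same way in the treatment groups. In the results, numeric targets are compared with a t-test and a Kolmogorov-Smirnov test, and categorical targets with a chi-square test.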

In [4]:
test = HomogeneityTest()
result = test.execute(data)

In [5]:
result.resume

       feature group TTest pass  TTest p-value KSTest pass  KSTest p-value  \
0   pre_spends     1     NOT OK   2.315047e-30      NOT OK    1.559150e-13   
1  post_spends     1     NOT OK   0.000000e+00      NOT OK    0.000000e+00   
2       gender     1        NaN            NaN         NaN             NaN   

  Chi2Test pass  Chi2Test p-value  
0           NaN               NaN  
1           NaN               NaN  
2            OK          0.351553  
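
To make the summary easier to read, the three kinds of checks can be sketched with SciPy on synthetic data. This is a minimal sketch, not HypEx's implementation: the 0.05 threshold, the choice of Welch's variant of the t-test, the `verdict` helper, and the generated samples are all assumptions made for illustration. A small p-value means the groups differ on that column, which corresponds to `NOT OK` in the table, while a large p-value gives no evidence of a difference and corresponds to `OK`.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance threshold, not taken from HypEx

rng = np.random.default_rng(0)

# Synthetic stand-ins for a numeric target in the control (0) and test (1) groups.
control = rng.normal(loc=480.0, scale=20.0, size=1000)
test = rng.normal(loc=500.0, scale=20.0, size=1000)

# Synthetic categorical target drawn from the same distribution in both groups.
control_gender = rng.choice(["M", "F"], size=1000)
test_gender = rng.choice(["M", "F"], size=1000)


def verdict(p_value, alpha=ALPHA):
    """Label a p-value the way the summary table does (assumed rule)."""
    return "OK" if p_value >= alpha else "NOT OK"


# Welch's t-test compares the group means.
t_p = stats.ttest_ind(control, test, equal_var=False).pvalue

# The two-sample Kolmogorov-Smirnov test compares the whole distributions.
ks_p = stats.ks_2samp(control, test).pvalue

# The chi-square test compares category counts across the groups.
counts = np.array([
    [np.sum(control_gender == "M"), np.sum(control_gender == "F")],
    [np.sum(test_gender == "M"), np.sum(test_gender == "F")],
])
chi2_p = stats.chi2_contingency(counts)[1]

for name, p in [("TTest", t_p), ("KSTest", ks_p), ("Chi2Test", chi2_p)]:
    print(f"{name}: p-value={p:.3g} -> {verdict(p)}")
```

Under this reading, the table above says that `pre_spends` and `post_spends` differ between the groups, while `gender` shows no detectable difference.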