# Experiments

## Intro

One of the most tedious tasks we face when analysing the results of an experiment is
running multiple statistical tests (z-tests). We often have to repeat this operation for
all the metrics in a test.

Here's how to do it quickly. :)

## Data prep

The key step is to prepare the data in a standardized way. What is absolutely necessary?
* each variant in a separate row;
* a `variant` column. Its name can be different (pass it via `variant_col`), but we need
the variant as a dimension;
* a `total_users` column. Again, you can name it differently (pass it via `total_col`), but
we need this number to calculate all the conversion rates;
* a list of metrics (column names) that we want to run the tests for.

Below is a good example (other columns are allowed; they are simply ignored in the analysis):

| date | market | variant | total\_users | converted\_users1 | converted\_users2 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 2021-04-23 | us | 0 | 24386 | 86 | 246 |
| 2021-04-23 | us | 1 | 24376 | 376 | 243 |
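
If your data starts out as one row per user, getting it into this shape could look roughly like the sketch below. It assumes pandas and a hypothetical per-user table (`user_id`, `variant`, and 0/1 flags `converted1` / `converted2`); none of these input names come from `da_toolkit`.

```python
import pandas as pd

# Hypothetical per-user export: one row per user, 1 = converted, 0 = not converted.
raw = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6, 7],
    "variant": [0, 0, 0, 1, 1, 1, 1],
    "converted1": [1, 0, 0, 1, 1, 0, 0],
    "converted2": [0, 0, 1, 0, 1, 0, 1],
})

# Collapse to one row per variant: user count plus converted users per metric.
prepared = (
    raw.groupby("variant")
    .agg(
        total_users=("user_id", "nunique"),
        converted_users1=("converted1", "sum"),
        converted_users2=("converted2", "sum"),
    )
    .reset_index()
)

# Sanity checks mirroring the requirements listed above.
required = {"variant", "total_users", "converted_users1", "converted_users2"}
assert required.issubset(prepared.columns)
assert prepared["variant"].is_unique

print(prepared)
```

The resulting `prepared` frame has the same layout as the example table, so it could be passed as `df` together with the two metric names.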

### Data for the analysis of statistical means
Analyzing means requires one more column (or set of columns) in the dataset: to calculate the t-statistic for a mean,
the variance of the metric has to be provided. The mean-analysis example further down shows what these columns look
like.
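
As a rough illustration of where the mean and variance columns could come from, here is a standard-library sketch that aggregates hypothetical per-user values into one row per variant. The metric name `answers` and the input values are made up; using the sample variance (n - 1 denominator) is an assumption about what `Analysis` expects.

```python
from statistics import mean, variance

# Hypothetical per-user values of one metric (e.g. answers per user), keyed by variant.
per_user = {
    0: [0, 1, 0, 3, 0, 2],
    1: [1, 0, 0, 4, 1, 0],
}

rows = []
for variant, values in per_user.items():
    rows.append({
        "variant": variant,
        "total_users": len(values),
        "answers": mean(values),          # the metric column holds the per-user mean
        "var_answers": variance(values),  # sample variance, stored with a var_ prefix
    })

for row in rows:
    print(row)
```

Each dictionary is one row of the prepared dataset: the metric column holds the mean, and a companion `var_` column holds its variance.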

## Running the test
With the data prepared this way, running the tests takes just
two lines of code.

```python
from da_toolkit.experiments import Analysis
Analysis(df=df, metrics=['converted_users1', 'converted_users2'])
```

The tests are run with a Python package called [statsmodels](https://www.google.com)
(specifically its `proportions_ztest` function). Don't worry: it produces the same
results as the spreadsheet-based solution you may be familiar with.
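
For intuition, here is a dependency-free sketch of the pooled two-proportion z-test, which is what `proportions_ztest` computes with its default settings as far as I know. The function name `two_proportion_ztest` is made up for this illustration and is not part of `da_toolkit` or statsmodels.

```python
from math import erfc, sqrt


def two_proportion_ztest(count_a, nobs_a, count_b, nobs_b):
    """Pooled, two-sided z-test for the difference between two proportions."""
    p_a = count_a / nobs_a
    p_b = count_b / nobs_b
    # Under the null hypothesis both groups share one conversion rate.
    p_pool = (count_a + count_b) / (nobs_a + nobs_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / nobs_a + 1 / nobs_b))
    z = (p_a - p_b) / se
    p_val = erfc(abs(z) / sqrt(2))  # two-sided p-value under the standard normal
    return z, p_val


# converted_users2 from the example table: variant 0 vs variant 1
z, p_val = two_proportion_ztest(246, 24386, 243, 24376)
print(f"z = {z:.3f}, p = {p_val:.3f}")
```

Called with the `users_add_answer` counts from the example further down (424 of 25280 vs 401 of 24900), this should land very close to the reported `z_stat` of 0.588118.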

That's it. Let's take a look at a real-life example.

## Example
### Z-test for proportions

```python
# importing components
from da_toolkit.databases import BigQuery
from da_toolkit.experiments import Analysis
```

Getting my experiment data:

```python
bq = BigQuery(project='brainly-tutoring')
query = "SELECT * FROM `brainly-tutoring.experiments.us_and_PlansInMetering` WHERE date = '2021-02-26'"
df = bq.query(query)

df
```

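| | date | market | variant | total\_users | users\_add\_answer | users\_add\_question | users\_app\_exception | events\_app\_exception | users\_content\_block | events\_content\_block | ... | users\_sign\_up | users\_answer\_display | users\_answer\_read | users\_tutoring\_intro | users\_subs\_form\_tut | total\_subs | tutoring\_subs | bplus\_subs | metering\_tutoring\_subs | metering\_bplus\_subs |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 2021-02-26 | us | 2 | 24830 | 416 | 925 | 115 | 212 | 7671 | 51052 | ... | 1074 | 20472 | 20226 | 1873 | 1559 | 274 | 72 | 204 | 53 | 203 |
| 1 | 2021-02-26 | us | 0 | 25280 | 424 | 911 | 118 | 182 | 7846 | 49936 | ... | 1089 | 21007 | 20744 | 1872 | 1559 | 242 | 72 | 170 | 51 | 170 |
| 2 | 2021-02-26 | us | 1 | 24900 | 401 | 947 | 117 | 203 | 7744 | 44397 | ... | 1046 | 20642 | 20355 | 1845 | 1508 | 272 | 72 | 200 | 52 | 200 |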


Running the tests for 3 metrics (by passing their column names as the `metrics` argument).

The results are stored in two attributes: a dictionary (`results`) and a (more convenient)
pandas DataFrame (`results_df`).

```python
exp = Analysis(df=df, metrics=['users_add_answer', 'bplus_subs', 'tutoring_subs'], alpha=0.1)
exp.results_df
```

| | | cvr | delta | z\_stat | p\_val | power | res |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| users\_add\_answer | 1 | 0: 0.016772, 1: 0.016104 | -0.039812 | 0.588118 | 0.556453 | 0.158106 | not significant |
| users\_add\_answer | 2 | 0: 0.016772, 1: 0.016754 | -0.001087 | 0.015888 | 0.987323 | 0.100043 | not significant |
| bplus\_subs | 1 | 0: 0.006725, 1: 0.008032 | 0.194425 | -1.711661 | 0.086959 | 0.527466 | significant! |
| bplus\_subs | 2 | 0: 0.006725, 1: 0.008216 | 0.221748 | -1.939096 | 0.05249 | 0.616473 | significant! |
| tutoring\_subs | 1 | 0: 0.002848, 1: 0.002892 | 0.015261 | -0.091006 | 0.927488 | 0.101405 | not significant |
| tutoring\_subs | 2 | 0: 0.002848, 1: 0.002900 | 0.018123 | -0.107922 | 0.914057 | 0.101975 | not significant |


We got the results as a table with one row per analyzed metric and tested variant (each
compared against variant 0) and the results of the test in columns.
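
Judging by the numbers, `delta` looks like the relative change of the variant's conversion rate versus variant 0, and `res` reflects whether `p_val` falls below `alpha`. The sketch below works under those assumptions; the helper names are made up, and whether `Analysis` uses `<` or `<=` for the comparison is a guess.

```python
def relative_delta(cvr_control, cvr_variant):
    """Relative change of the variant's conversion rate versus control."""
    return cvr_variant / cvr_control - 1


def verdict(p_val, alpha):
    """Label a p-value the way the `res` column appears to."""
    return "significant!" if p_val < alpha else "not significant"


# bplus_subs, variant 1 vs variant 0, using the counts from the data above
cvr_control = 170 / 25280
cvr_variant = 200 / 24900
delta = relative_delta(cvr_control, cvr_variant)
print(f"delta = {delta:.6f}")

# The same p-value leads to a different verdict depending on alpha.
print(verdict(0.086959, alpha=0.1))
print(verdict(0.086959, alpha=0.05))
```

This also shows why the choice of `alpha=0.1` matters here: with the default `alpha=0.05`, both `bplus_subs` rows would come out as not significant.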

### T-test for means
Getting my experiment data:

```python
# we're using a bit different data this time
query = "SELECT * FROM `brainly-tutoring.experiments.us_web_TutorVerified1day2`"
df = bq.query(query)

df
```

| | experimentVariant | user\_counter | times\_user\_logged\_prime | number\_of\_answers\_prime | number\_of\_questions\_asked\_prime | number\_of\_search\_prime | number\_of\_rates\_prime | number\_of\_thanks\_prime | var\_times\_user\_logged\_prime | var\_number\_of\_answers\_prime | var\_number\_of\_questions\_asked\_prime | var\_number\_of\_search\_prime | var\_number\_of\_rates\_prime | var\_number\_of\_thanks\_prime |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 114973 | 0.064111 | 0.018831 | 0.015491 | 1.462778 | 0.05383 | 0.063745 | 0.162826 | 0.184238 | 0.047172 | 33.307118 | 0.595552 | 0.717025 |
| 1 | 1 | 115698 | 0.066881 | 0.020337 | 0.01427 | 1.483621 | 0.054521 | 0.065498 | 0.207512 | 0.238962 | 0.04641 | 34.399228 | 0.526963 | 0.755833 |
| 2 | 2 | 114672 | 0.065543 | 0.017354 | 0.015758 | 1.48255 | 0.052062 | 0.062666 | 0.132914 | 0.162321 | 0.054421 | 35.230155 | 0.498323 | 0.639374 |


Similarly, we will run the tests for 3 metrics (again, by passing their column names as the `metrics` argument).
This time we also have to specify the columns that hold the variance of each metric (the `variances` argument). If we
don't, `Analysis` will assume that the data contains columns named like the metrics but with a `var_` prefix.

To run the test for means, we add one more argument: `kind='mean'`.
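
The default naming convention can be checked before running the analysis. This is a small sketch with made-up helper names (`default_variance_columns`, `missing_variance_columns`); it only mirrors the `var_` rule described above and says nothing about how `Analysis` implements it internally.

```python
def default_variance_columns(metrics, prefix="var_"):
    """Hypothetical helper: pair each metric with its assumed variance column."""
    return {metric: prefix + metric for metric in metrics}


def missing_variance_columns(columns, metrics):
    """Return the variance columns the default naming expects but `columns` lacks."""
    available = set(columns)
    expected = default_variance_columns(metrics).values()
    return [col for col in expected if col not in available]


columns = [
    "experimentVariant", "user_counter",
    "times_user_logged_prime", "var_times_user_logged_prime",
    "number_of_answers_prime",  # its var_ column is deliberately left out here
]
metrics = ["times_user_logged_prime", "number_of_answers_prime"]

print(missing_variance_columns(columns, metrics))
# ['var_number_of_answers_prime']
```

With a pandas DataFrame, `columns` would simply be `df.columns`.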

```python
metrics = ['times_user_logged_prime', 'number_of_answers_prime', 'number_of_questions_asked_prime']
exp = Analysis(df=df, metrics=metrics, variant_col='experimentVariant', total_col='user_counter',
               alpha=0.05, kind='mean')
exp.results_df
```

| | | mean | delta | t\_stat | p\_val | res |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| times\_user\_logged\_prime | 1 | [0.06411070425230271, 0.06688101782226144] | 0.043211 | -1.545706 | 0.122177 | not significant |
| times\_user\_logged\_prime | 2 | [0.06411070425230271, 0.06554346309473984] | 0.022348 | -0.892694 | 0.372022 | not significant |
| number\_of\_answers\_prime | 1 | [0.018830508032320645, 0.020337430206226578] | 0.080026 | -0.786519 | 0.431564 | not significant |
| number\_of\_answers\_prime | 2 | [0.018830508032320645, 0.017353844007255478] | -0.078419 | 0.84994 | 0.395359 | not significant |
| number\_of\_questions\_asked\_prime | 1 | [0.015490593443678091, 0.014269909592214226] | -0.078802 | 1.355171 | 0.175364 | not significant |
| number\_of\_questions\_asked\_prime | 2 | [0.015490593443678091, 0.015757988000558117] | 0.017262 | -0.284286 | 0.776192 | not significant |
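
The reported `t_stat` values are consistent with a pooled-variance two-sample t-statistic computed from the means, variances and user counts in the data. That is inferred from the numbers, not from the `da_toolkit` source, so treat the sketch below as an approximation. The function name is made up, and the p-value uses a normal approximation, which is very close to the t-distribution at these sample sizes.

```python
from math import erfc, sqrt


def pooled_t_from_summary(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Two-sample t-statistic from summary data, assuming equal variances."""
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b) / se
    # Normal approximation of the two-sided p-value (assumes very large samples).
    p_val = erfc(abs(t) / sqrt(2))
    return t, p_val


# times_user_logged_prime: variant 0 vs variant 1, values taken from the data above
t, p_val = pooled_t_from_summary(
    0.06411070425230271, 0.162826, 114973,
    0.06688101782226144, 0.207512, 115698,
)
print(f"t = {t:.4f}, p = {p_val:.4f}")
```

The output should land close to the first row of the results table (`t_stat` of -1.545706, `p_val` of 0.122177); small differences are expected because the variances shown in the data are rounded.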


I will keep updating the documentation of `da_toolkit` to describe all its capabilities.
You can always run `help()` to learn more about a module.

```python
help(Analysis)
```


```
Help on class Analysis in module da_toolkit.experiments:

class Analysis(builtins.object)
 |  Analysis(df, metrics, variances=None, total_col='total_users', variant_col='variant', kind='prop', alpha=0.05)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, df, metrics, variances=None, total_col='total_users', variant_col='variant', kind='prop', alpha=0.05)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  extract_variances(self)
 |  
 |  get_variants(self)
 |  
 |  run(self)
 |  
 |  save_to_df(self)
 |  
 |  sort_df(self)
 |  
 |  test_mean(self, metric, variance)
 |  
 |  test_prop(self, metric)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
```

