# Experiments

## Intro

One of the most tedious tasks we face when analysing the results of an experiment is
running multiple statistical tests (z-tests). We often have to repeat this operation for
every metric in a test.

Here's how to do it quickly. :)

## Data prep

The key step is to prepare the data in a standardized way. The following is required:
* each variant in a separate row
* a `variant` column. It can have a different name (passed as `variant_col`), but the
variant has to be available as a dimension
* a `total_users` column. Again, the name is up to you (passed as `total_col`), but this
number is needed to calculate all the conversion rates
* a list of the metrics (column names) that we want to run the tests for

Below is a good example (other columns are allowed, we'll simply omit them in the analysis):

| date | market | variant | total\_users | converted\_users1 | converted\_users2 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 2021-04-23 | us | 0 | 24386 | 86 | 246 |
| 2021-04-23 | us | 1 | 24376 | 376 | 243 |
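
The requirements listed above can be checked before running any tests. Below is a minimal sketch, assuming pandas is available. `check_experiment_frame` is a hypothetical helper written for this illustration and is not part of `da_toolkit`. The last check also assumes that every metric counts users, so it cannot exceed `total_users`.

```python
import pandas as pd


def check_experiment_frame(df, metrics, total_col="total_users", variant_col="variant"):
    """Hypothetical helper (not part of da_toolkit): return a list of problems found."""
    problems = []
    # Every column the analysis relies on must be present.
    required = [variant_col, total_col, *metrics]
    missing = [col for col in required if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    # Each variant must occupy exactly one row.
    duplicated = df[variant_col][df[variant_col].duplicated()].unique().tolist()
    if duplicated:
        problems.append(f"variants with more than one row: {duplicated}")
    # Assumption: metrics count users, so they cannot exceed the total.
    for metric in metrics:
        if (df[metric] > df[total_col]).any():
            problems.append(f"{metric} exceeds {total_col} in at least one row")
    return problems


# The example table from above, built by hand.
df = pd.DataFrame(
    {
        "date": ["2021-04-23", "2021-04-23"],
        "market": ["us", "us"],
        "variant": [0, 1],
        "total_users": [24386, 24376],
        "converted_users1": [86, 376],
        "converted_users2": [246, 243],
    }
)

problems = check_experiment_frame(df, ["converted_users1", "converted_users2"])
print(problems or "data looks ready")
```

If the table holds several dates or markets, the same variant appears in more than one row and the helper reports it, which is a hint to filter or aggregate first.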

## Running the test

With the data prepared this way, running the tests takes just
two lines of code.

```python
from da_toolkit.experiments import Analysis
Analysis(df=df, metrics=['converted_users1', 'converted_users2'])
```

That's it. Let's take a look at a real-life example.

## Example

```python
# importing components
from da_toolkit.databases import BigQuery
from da_toolkit.experiments import Analysis
```

Getting my experiments data:

```python
bq = BigQuery(project='brainly-tutoring')
query = "SELECT * FROM `brainly-tutoring.experiments.us_and_PlansInMetering` WHERE date = '2021-02-26'"
df = bq.query(query)

df
```

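|  | date | market | variant | total\_users | users\_add\_answer | users\_add\_question | users\_app\_exception | events\_app\_exception | users\_content\_block | events\_content\_block | ... | users\_sign\_up | users\_answer\_display | users\_answer\_read | users\_tutoring\_intro | users\_subs\_form\_tut | total\_subs | tutoring\_subs | bplus\_subs | metering\_tutoring\_subs | metering\_bplus\_subs |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 2021-02-26 | us | 2 | 24830 | 416 | 925 | 115 | 212 | 7671 | 51052 | ... | 1074 | 20472 | 20226 | 1873 | 1559 | 274 | 72 | 204 | 53 | 203 |
| 1 | 2021-02-26 | us | 0 | 25280 | 424 | 911 | 118 | 182 | 7846 | 49936 | ... | 1089 | 21007 | 20744 | 1872 | 1559 | 242 | 72 | 170 | 51 | 170 |
| 2 | 2021-02-26 | us | 1 | 24900 | 401 | 947 | 117 | 203 | 7744 | 44397 | ... | 1046 | 20642 | 20355 | 1845 | 1508 | 272 | 72 | 200 | 52 | 200 |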


Running the tests for 3 metrics (by passing their column names as the `metrics` argument).

The results are saved in 2 attributes: a dictionary (`results`) and a (more convenient)
pandas DataFrame (`results_df`).

```python
exp = Analysis(df=df, metrics=['users_add_answer', 'bplus_subs', 'tutoring_subs'])
exp.results_df
```

| metric | variant | cvr | delta | z\_stat | p\_val | power | res |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| users\_add\_answer | 1 | 0: 0.016754, 1: 0.016772 | 0.001088 | -0.015888 | 0.493662 | 0.050029 | not significant |
| users\_add\_answer | 2 | 0: 0.016754, 1: 0.016104 | -0.038768 | 0.569717 | 0.284435 | 0.087938 | not significant |
| bplus\_subs | 1 | 0: 0.008216, 1: 0.006725 | -0.181501 | 1.939096 | 0.026245 | 0.492341 | significant! |
| bplus\_subs | 2 | 0: 0.008216, 1: 0.008032 | -0.022364 | 0.228229 | 0.409734 | 0.055988 | not significant |
| tutoring\_subs | 1 | 0: 0.002900, 1: 0.002848 | -0.017801 | 0.107922 | 0.457029 | 0.051335 | not significant |
| tutoring\_subs | 2 | 0: 0.002900, 1: 0.002892 | -0.002811 | 0.016916 | 0.493252 | 0.050033 | not significant |


I got the results as a table listing all of my analyzed metrics and results of the test
in columns.
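
To make the `cvr`, `delta`, `z_stat` and `p_val` columns more concrete, here is a minimal sketch of a pooled two-proportion z-test using only the standard library. It is not `da_toolkit`'s implementation. The function name, the sign convention of the z statistic and the one-sided p-value are assumptions, chosen because they appear consistent with the table above. The `power` column is not reproduced.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def two_proportion_z_test(conv_a, total_a, conv_b, total_b, alpha=0.05):
    """Pooled two-proportion z-test comparing group A against group B (a sketch)."""
    cvr_a = conv_a / total_a
    cvr_b = conv_b / total_b
    # Relative change of B with respect to A.
    delta = cvr_b / cvr_a - 1.0
    # Pooled conversion rate under the null hypothesis of equal rates.
    pooled = (conv_a + conv_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    z_stat = (cvr_a - cvr_b) / std_err
    # One-sided p-value (assumed convention, see the note above).
    p_val = 1.0 - normal_cdf(abs(z_stat))
    return {
        "cvr": (cvr_a, cvr_b),
        "delta": delta,
        "z_stat": z_stat,
        "p_val": p_val,
        "significant": p_val < alpha,
    }


# bplus_subs and total_users from the first two rows of the DataFrame above.
result = two_proportion_z_test(conv_a=204, total_a=24830, conv_b=170, total_b=25280)
print(result)
```

With these inputs the sketch should land close to the first `bplus_subs` row of the results: a z statistic of about 1.94 and a p-value just above 0.026.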

I will continue updating the documentation of `da_toolkit` to describe all the capabilities.
You can always run `help()` to learn more about a module.

```python
help(Analysis)
```

```
Help on class Analysis in module da_toolkit.experiments:

class Analysis(builtins.object)
 |  Analysis(df, metrics, total_col='total_users', variant_col='variant', alpha=0.05)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, df, metrics, total_col='total_users', variant_col='variant', alpha=0.05)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  get_variants(self)
 |  
 |  run(self)
 |  
 |  save_to_df(self)
 |  
 |  test_metric(self, metric)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
```

