# `rosalie` usage examples

## Simple example

### Import dependencies

- Import `rosalie` and any other dependencies your evaluator functions need.

In [3]:
from scipy.stats import mannwhitneyu
from statsmodels.stats.weightstats import ttest_ind

import rosalie as ro

# High-resolution plots
%config InlineBackend.figure_format = 'retina'

### Load data

- You can load data any way and from anywhere you want.

- The only requirement is that the data has an `id` column and at least one metric column you want to evaluate.

- Below I use `rosalie`'s data reader, which will eventually make it very easy to read customer and restaurant level JET data. For now it reads a small customer-level dataset that I then process further to get into the needed shape.

- Specifying a `cache_path` is optional. If you provide one, the data is stored locally the first time it is read, which makes subsequent reads faster.
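
As a minimal sketch of the data requirement above, the following builds a tiny dataset by hand and checks that it has the needed columns. The helper `check_input_data` and the toy values are illustrative assumptions and not part of `rosalie`.

```python
import pandas as pd


def check_input_data(df, metrics, id_col="id"):
    """Raise if `df` lacks the id column or any requested metric column.

    Hypothetical helper for illustration; not part of rosalie.
    """
    missing = [col for col in [id_col, *metrics] if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return df


# Toy data with an `id` column and a single metric column.
toy_df = pd.DataFrame(
    {
        "id": ["a", "b", "c"],
        "order_price": [18.5, 25.0, 21.75],
    }
)

check_input_data(toy_df, metrics=["order_price"])
```

Any data frame that passes a check like this, however it was loaded, should be usable in the steps that follow.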

In [4]:
CACHE_PATH = "/Users/fabian.gunzinger/tmp/rosalie/customer.csv"

df = (
    ro.DataReader().load_data("customer", cache_path=CACHE_PATH)
    .groupby('id')
    .order_price
    .mean()
    .dropna()
    .reset_index()
)
df.head()

Reading data from cache...


|   | id | order_price |
|---|---|---|
| 0 | JE:IE:1000053 | 18.8512 |
| 1 | JE:IE:1000071 | 25.94375 |
| 2 | JE:IE:1000093 | 25.0 |
| 3 | JE:IE:1000153 | 17.041111 |
| 4 | JE:IE:100021 | 21.775 |


### Define evaluation functions

- You can specify any evaluation functions you want.

- The only requirements are that they take a DataFrame and a metric name as arguments and return a p-value.

In [5]:
def welch(df, metric):
    """Return p-value of Welch's t-test."""
    control_sample = df[df["assignments"] == "control"][metric]
    variant_sample = df[df["assignments"] == "treatment"][metric]
    _, p, _ = ttest_ind(control_sample, variant_sample, usevar="unequal")
    return p


def mmw(df, metric):
    """Return p-value of Mann-Whitney-Wilcoxon rank-sum test."""
    control_sample = df[df["assignments"] == "control"][metric]
    variant_sample = df[df["assignments"] == "treatment"][metric]
    _, p = mannwhitneyu(control_sample, variant_sample, alternative="two-sided")
    return p
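
Any function with the same signature works. As a sketch of a custom evaluator, the following implements a simple two-sided permutation test on the difference in means. It assumes, as `welch` and `mmw` above do, that the data passed to the evaluator has an `assignments` column with the values `"control"` and `"treatment"`. The function name, the number of permutations, and the fixed seed are illustrative choices.

```python
import numpy as np
import pandas as pd


def permutation_test(df, metric, num_permutations=1000, seed=0):
    """Return p-value of a two-sided permutation test on the difference in means."""
    rng = np.random.default_rng(seed)
    values = df[metric].to_numpy(dtype=float)
    is_treated = (df["assignments"] == "treatment").to_numpy()

    observed = values[is_treated].mean() - values[~is_treated].mean()

    # Shuffle the labels to build the null distribution of the difference.
    exceed = 0
    for _ in range(num_permutations):
        shuffled = rng.permutation(is_treated)
        diff = values[shuffled].mean() - values[~shuffled].mean()
        exceed += abs(diff) >= abs(observed)

    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (num_permutations + 1)


# Quick check on synthetic data with a large treatment effect.
rng = np.random.default_rng(1)
toy_df = pd.DataFrame(
    {
        "assignments": ["control"] * 200 + ["treatment"] * 200,
        "order_price": np.concatenate(
            [rng.normal(20, 2, 200), rng.normal(22, 2, 200)]
        ),
    }
)
p_value = permutation_test(toy_df, "order_price")
```

Running an evaluator on synthetic data with a known effect like this is a cheap way to confirm it returns a sensible p-value before passing it to the simulator. Note that a loop-based permutation test is slow, so it may not be practical for large simulations.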

### Run simulator

- To run the simulator, pass the data and the evaluator functions.

- There are a number of optional arguments we will use below, but the class has sensible defaults.

- The simulator's `run()` method returns a `Result` object, which contains the evaluation results and makes them easy to plot. To see the plot, use the `plot()` method.

In [6]:
# Avoid the name `eval`, which would shadow the Python built-in
simulator = ro.Simulator(
    df=df,
    metrics=['order_price'],
    evaluators=[welch, mmw],
)
result = simulator.run()
result.plot()

INFO - Initializing Simulator with specified evaluators: ['welch', 'mmw']
INFO - Generating datasets...


100%|██████████| 20/20 [00:41<00:00,  2.10s/it]
INFO - Evaluating experiments...
100%|██████████| 2000/2000 [00:50<00:00, 39.51it/s] 


To access the precise results, use the `data` attribute.

In [7]:
result.data

|   | metric | evaluator | sample_size | mdes | power |
|---|---|---|---|---|---|
| 0 | order_price | mmw | 100 | 0.01 | 0.1 |
| 1 | order_price | mmw | 7369 | 0.01 | 0.14 |
| 2 | order_price | mmw | 14637 | 0.01 | 0.3 |
| 3 | order_price | mmw | 21906 | 0.01 | 0.44 |
| 4 | order_price | mmw | 29174 | 0.01 | 0.6 |
| 5 | order_price | mmw | 36443 | 0.01 | 0.72 |
| 6 | order_price | mmw | 43711 | 0.01 | 0.78 |
| 7 | order_price | mmw | 50980 | 0.01 | 0.82 |
| 8 | order_price | mmw | 58248 | 0.01 | 0.82 |
| 9 | order_price | mmw | 65517 | 0.01 | 0.92 |
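
Since the results come back as a table, standard pandas operations apply. As a sketch, assuming `result.data` is a pandas DataFrame with the columns shown above, the following finds the smallest simulated sample size at which each evaluator reaches 80% power. The rows are copied from the output above so that the snippet runs on its own, and the 0.8 threshold is an illustrative choice.

```python
import pandas as pd

# Rows copied from the `result.data` output shown above.
data = pd.DataFrame(
    {
        "metric": ["order_price"] * 10,
        "evaluator": ["mmw"] * 10,
        "sample_size": [
            100, 7369, 14637, 21906, 29174,
            36443, 43711, 50980, 58248, 65517,
        ],
        "mdes": [0.01] * 10,
        "power": [0.10, 0.14, 0.30, 0.44, 0.60, 0.72, 0.78, 0.82, 0.82, 0.92],
    }
)

TARGET_POWER = 0.8

# Smallest sample size per metric, evaluator, and MDE that reaches the target.
required_sample = (
    data[data["power"] >= TARGET_POWER]
    .groupby(["metric", "evaluator", "mdes"])["sample_size"]
    .min()
    .reset_index()
)
print(required_sample)
```

For the rows shown, this gives a sample size of 50,980 for `mmw`. Because power is estimated from a finite number of simulation runs, it need not increase monotonically with sample size, so treat the result as a rough guide.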


## More complex example -- comparing CUPED implementations

### Import dependencies

In [8]:
import logging

import numpy as np
import pandas as pd
import statsmodels.api as sm

from causaljet.experiment_evaluation.models import Cuped
import rosalie as ro

# Raise the root logger level to WARNING to suppress Cuped's info logging
logging.getLogger().setLevel(logging.WARNING)


%config InlineBackend.figure_format ='retina'
%load_ext autoreload
%autoreload 2

### Load data

In [9]:
UNIT_LEVEL = "customer"
UNIT_ID = 'id'
METRICS = ['order_price', 'gmv']
PRE_PERIOD = ('1 Jan 2023', '31 May 2023')
POST_PERIOD = ('1 Jun 2023', '31 Aug 2023')

CACHE_PATH = f'/Users/fabian.gunzinger/tmp/rosalie/{UNIT_LEVEL}.csv'

df = (
    ro.DataReader().load_data(UNIT_LEVEL, cache_path=CACHE_PATH)
    .pipe(ro.add_artificial_gmw)
    .pipe(ro.create_pre_post_data, 
          id_col=UNIT_ID,
          metrics=METRICS,
          pre_period=PRE_PERIOD,
          post_period=POST_PERIOD)
)
ro.data_info(df)

Reading data from cache...
Shape: (62279, 5)
Units: 62,279
                  id  order_price        gmv  order_price_pre    gmv_pre
53943  JE:IE:1000071    22.625000  25.030966        27.049999  25.716751
53944  JE:IE:1000153    14.245000  26.105558        17.390625  27.075264
53945   JE:IE:100021    12.950000  23.238115        30.600000  21.306284
53947  JE:IE:1000294    23.299999  22.887318        29.455999  26.708223
53948  JE:IE:1000373    27.950001  25.307949        26.139999  24.921795


### Define evaluation functions

In [10]:
def causal_jet_cuped(df, metric):
    """Run Causal Jet CUPED implementation and return p-value.

    Because data is already pre-processed, we only need to supply the following:
    - A cross-section dataframe with `metric` and `metric_pre` columns to `ass_w_cov_panel_df`
    - The metric name to `metric_name`
    - The unit identifier to `unit_identifier`

    All other parameters can be left as default.
    """
    result = Cuped(
        ass_w_cov_panel_df=df,
        metric_name=metric,
        unit_identifier=UNIT_ID,
        cluster_identifier=UNIT_ID,
        is_treated_col='is_treated',
        weight_col='assignments_freq',
        additional_regressors=[],
        start_date=None,
        date_identifier=None,
        lookback=None,
    )._get_results()
    return result.pvalues[1]


def traditional_cuped(df, metric):
    """Run traditional CUPED and return p-value."""
    
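    def _cuped_adjusted_metric(df, metric, metric_pre):
        """Return metric adjusted by its pre-period value."""
        y = df[metric].values
        x = df[metric_pre].values
        # Estimate theta = cov(y, x) / var(x) on rows where both values are present
        valid_indices = (~np.isnan(y)) & (~np.isnan(x))
        y_valid, x_valid = y[valid_indices], x[valid_indices]
        m = np.cov(y_valid, x_valid)
        theta = m[0, 1] / m[1, 1]
        # Subtract the part of y explained by the centred pre-period metric
        return y - (x - np.nanmean(x)) * theta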

    # Perform experiment evaluation and return p-value
    # (Use WLS to be consistent with CausalJet)
    y = _cuped_adjusted_metric(df, metric, f"{metric}_pre")
    x = sm.add_constant(df["is_treated"].astype(float))
    w = df["assignments_freq"]
    model = sm.WLS(endog=y, exog=x, weights=w)
    results = model.fit()
    return results.pvalues["is_treated"]
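
The adjustment in `traditional_cuped` subtracts `theta * (x - mean(x))` from the metric, with `theta = cov(y, x) / var(x)`. As a sketch of why this helps, the following applies the same adjustment to synthetic data. The data-generating numbers are arbitrary assumptions chosen only to produce a pre-period metric that is correlated with the post-period metric.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic pre-period metric and a correlated post-period metric.
x = rng.normal(20, 5, n)
y = 0.8 * x + rng.normal(4, 3, n)

# Same adjustment as in `traditional_cuped` (no missing values here).
m = np.cov(y, x)
theta = m[0, 1] / m[1, 1]
y_cuped = y - (x - x.mean()) * theta

print(f"theta:             {theta:.3f}")
print(f"mean (raw, cuped): {y.mean():.3f}, {y_cuped.mean():.3f}")
print(f"var  (raw, cuped): {y.var(ddof=1):.3f}, {y_cuped.var(ddof=1):.3f}")
```

The adjusted metric keeps the same mean but has a lower variance, and the reduction grows with the correlation between the pre- and post-period metric. The lower variance is what gives CUPED-based evaluators more power at a given sample size.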

In [11]:
evaluators = [traditional_cuped, causal_jet_cuped]

simulator = ro.Simulator(
    df=df,
    evaluators=evaluators,
    # preprocessors=None,
    baseline_evaluator="wls",
    id_col=UNIT_ID,
    # time_col="timeframe",
    metrics=METRICS,
    # sample_min=None,
    # sample_max=None,
    num_steps=20,
    # sample_timestamps=False,
    num_runs=50,
    # random_seed=2312,
    mdes=[0.01],
    # alpha=0.05,
    )

result = simulator.run()
print(result.data.head())
result.plot()

INFO - Initializing Simulator with specified evaluators: ['wls', 'traditional_cuped', 'causal_jet_cuped']


2023-11-02 15:01:16,697 | rosalie.simulator    | simulator.py:295 | INFO     | Initializing Simulator with specified evaluators: ['wls', 'traditional_cuped', 'causal_jet_cuped']


INFO - Generating datasets...


2023-11-02 15:01:16,701 | rosalie.simulator    | simulator.py:110 | INFO     | Generating datasets...


100%|██████████| 20/20 [00:17<00:00,  1.15it/s]
INFO - Evaluating experiments...


2023-11-02 15:01:34,175 | rosalie.simulator    | simulator.py:127 | INFO     | Evaluating experiments...


100%|██████████| 6000/6000 [00:30<00:00, 197.64it/s]


  metric         evaluator  sample_size  mdes  power
0    gmv  causal_jet_cuped          100  0.01   0.04
1    gmv  causal_jet_cuped         3373  0.01   0.80
2    gmv  causal_jet_cuped         6645  0.01   0.98
3    gmv  causal_jet_cuped         9918  0.01   1.00
4    gmv  causal_jet_cuped        13190  0.01   1.00


## Gotchas

- Ensure that your evaluator methods don't have side effects. For instance, if your evaluator changes values in the dataset it is passed, this can affect the results of evaluators used later on.

- For example, if you first run an evaluator that CUPED-adjusts the metric and overwrites the metric values in the dataset, subsequent evaluators that are meant to evaluate the unadjusted metric will effectively evaluate the CUPED-adjusted metric instead.

- To avoid this, make sure that your evaluator methods don't change the dataset they are passed. If you need to change the dataset, make a copy of it first and change that copy.

- We could also handle this inside `rosalie` by copying the sample data before passing it to each evaluator. But this would be inefficient, so it is better to handle it in the evaluator methods themselves.
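
To illustrate the gotcha, the sketch below contrasts a function that overwrites the metric column in place with one that works on a copy. Both functions are hypothetical, and de-meaning stands in for a real adjustment such as CUPED purely to keep the example short.

```python
import pandas as pd


def demean_in_place(df, metric):
    """Bad: overwrites the metric column of the dataset it is passed."""
    df[metric] = df[metric] - df[metric].mean()
    return df[metric].mean()


def demean_on_copy(df, metric):
    """Good: works on a copy, so the caller's dataset stays untouched."""
    df = df.copy()
    df[metric] = df[metric] - df[metric].mean()
    return df[metric].mean()


original = pd.DataFrame(
    {"id": ["a", "b", "c"], "order_price": [10.0, 20.0, 30.0]}
)

# The copying version leaves its input as it was.
safe_input = original.copy()
demean_on_copy(safe_input, "order_price")
unchanged = safe_input.equals(original)

# The in-place version silently changes its input.
unsafe_input = original.copy()
demean_in_place(unsafe_input, "order_price")
changed = not unsafe_input.equals(original)

print(f"Copying version left data unchanged: {unchanged}")
print(f"In-place version modified data: {changed}")
```

A check like this, comparing the dataset before and after calling an evaluator, is a quick way to confirm that an evaluator is free of side effects.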