# Custom Experiment Analysis with Optimizely Stats Engine Service (Abbreviated)

## The experiment

We'll use simulated data from the following Optimizely Full Stack "experiment" in this notebook:

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/optimizely/ses-research-public/master/img/control.png" alt="Control" style="width:100%; padding-left:0px">
        </td>
        <td>
            <img src="https://raw.githubusercontent.com/optimizely/ses-research-public/master/img/message_1.png" alt="Message #1" style="width:100%; padding-right:0px">
        </td>
        <td>
            <img src="https://raw.githubusercontent.com/optimizely/ses-research-public/master/img/message_2.png" alt="Message #2" style="width:100%; padding-right:0px">
        </td>
    </tr>
    <tr>
        <td style="background-color:white; text-align:center">
            "control"
        </td>
        <td style="background-color:white; text-align:center">
            "message_1"
        </td>
        <td style="background-color:white; text-align:center">
            "message_2"
        </td>
    </tr>
</table>

## The challenge

What impact did this experiment have on customer support call center volumes?  

Customer support calls are managed and tracked by a third party and do not appear on Optimizely's results page.  We're going to load metric "observation" data computed from a variety of sources and use Optimizely's Stats Engine Service to compute sequential p-values and confidence intervals for these metrics.

## Initialize global variables

You'll need your Optimizely account ID and a valid Optimizely personal access token to use Stats Engine Service.  

In [None]:
from getpass import getpass

# When this cell is run, you will be prompted to enter your Optimizely account ID
# and personal access token (getpass hides the token as you type it)
OPTIMIZELY_ACCOUNT_ID = input("Enter your Optimizely account ID: ")
OPTIMIZELY_API_TOKEN = getpass("Enter your Optimizely personal access token: ")

print("Done.")
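Prompting works well interactively, but it blocks when the notebook runs unattended. A minimal sketch of one alternative, assuming you export the credentials as environment variables beforehand. The helper `read_credential` and the variable names are hypothetical choices, not anything Optimizely defines:

```python
import os
from getpass import getpass


def read_credential(env_var, prompt, secret=False, environ=None):
    """Return a credential from the environment, prompting only if it is unset.

    `env_var` names are hypothetical; choose whatever suits your setup.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(env_var)
    if value:
        return value
    # Fall back to an interactive prompt; getpass hides secret input
    return getpass(prompt) if secret else input(prompt)


# Example usage (hypothetical variable names):
# OPTIMIZELY_ACCOUNT_ID = read_credential("OPTIMIZELY_ACCOUNT_ID", "Enter your Optimizely account ID: ")
# OPTIMIZELY_API_TOKEN = read_credential("OPTIMIZELY_API_TOKEN", "Enter your Optimizely personal access token: ", secret=True)
```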

## Load time-aggregated metric data from our experiment

We'll use [Pandas](https://pandas.pydata.org/) to load and manipulate data.

In [None]:
import pandas as pd

# Load time-aggregated metric data
metric_data = pd.read_csv(
  "https://raw.githubusercontent.com/optimizely/ses-research-public/master/time_aggregated_metric_data.csv", 
  dtype={"variation_id" : str, "experiment_id" : str, "reference_variation_id" : str},
  parse_dates=["interval_timestamp"]
)

We can use the DataFrame `head` method to examine the first few rows of our data:

In [None]:
metric_data.head()

SQL-style queries over Pandas dataframes are also possible with the separate third-party `pandasql` package, though this abbreviated notebook doesn't use it.

## Computing sequential statistics with Stats Engine Service

We'll start by transforming our metric data into the request format expected by Stats Engine Service.

In [None]:
# The CSV file we loaded contains timeseries data associated with several 
# different business metrics.  We're going to send data for two of these
# metrics to Stats Engine Service:
ses_metric_names = [
    "Customer support calls per visitor",
    "Total customer support minutes per visitor"
]

# Stats Engine Service expects a specific set of columns with each datapoint.
ses_metric_input_columns = [
  "interval_timestamp",
  "variation_id",
  "unit_count",
  "unit_observation_sum",
  "unit_observation_sum_of_squares"
]

# metric_data is a single dataframe containing time-aggregated data for several
# different business metrics.  Stats Engine Service expects input data to be 
# split out by metric, so we start by splitting metric_data into a list of 
# separate dataframes, one for each metric in our ses_metric_names list.
metric_dfs = [
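    (
        metric_data[metric_data.metric_name == metric_name]
        # Convert datetimes to Unix timestamps in seconds
        # (datetime64[ns] -> int64 nanoseconds since the epoch -> seconds)
        .assign(interval_timestamp=lambda df: df.interval_timestamp.astype("int64") / 10**9)
        [ses_metric_input_columns]
        .sort_values("interval_timestamp")
    )
    for metric_name in ses_metric_names
]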

# Construct the request headers expected by Stats Engine Service
ses_request_headers = {
  "Content-Type": "application/json",
  "account": f"{OPTIMIZELY_ACCOUNT_ID}",
  "Authorization": f"Bearer {OPTIMIZELY_API_TOKEN}"
}

# Construct the request data expected by Stats Engine Service
ses_request_data = {
  "config": {
    "reference_variation_id": "control",
    "use_stats_resets": True,
  },
  "metrics": [
    {
      "config": {
        "is_binary": False,
      },
      "data": obs_df.to_dict("records") # Convert each dataframe to a JSON-serializable list of dicts
    }
    for obs_df in metric_dfs
  ]
}
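To see the request body's shape on a small scale, here is a self-contained sketch that applies the same steps (filter one metric, convert timestamps to Unix seconds, sort, `to_dict("records")`) to a tiny hypothetical dataframe and then checks that the body serializes to JSON. The rows and numbers are made up, and `to_ses_records` is a hypothetical helper. It converts timestamps by subtracting the epoch and dividing by one second rather than via integer nanoseconds, which avoids depending on the datetime resolution but assumes timezone-naive timestamps.

```python
import json

import pandas as pd

# Hypothetical long-format rows shaped like metric_data (values are made up)
toy = pd.DataFrame({
    "metric_name": ["Calls per visitor", "Calls per visitor", "Some other metric"],
    "interval_timestamp": pd.to_datetime(["2020-01-02", "2020-01-01", "2020-01-01"]),
    "variation_id": ["control", "control", "control"],
    "unit_count": [100, 80, 50],
    "unit_observation_sum": [7.0, 5.0, 3.0],
    "unit_observation_sum_of_squares": [9.0, 6.0, 4.0],
})

columns = ["interval_timestamp", "variation_id", "unit_count",
           "unit_observation_sum", "unit_observation_sum_of_squares"]


def to_ses_records(df, metric_name):
    """Keep one metric, convert timestamps to Unix seconds, and return JSON-ready dicts."""
    subset = df[df.metric_name == metric_name]
    # Whole seconds since 1970-01-01 (assumes timezone-naive timestamps)
    seconds = (subset.interval_timestamp - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
    return (
        subset.assign(interval_timestamp=seconds)[columns]
        .sort_values("interval_timestamp")
        .to_dict("records")
    )


body = {
    "config": {"reference_variation_id": "control", "use_stats_resets": True},
    "metrics": [
        {"config": {"is_binary": False}, "data": to_ses_records(toy, "Calls per visitor")}
    ],
}

# json.dumps raises TypeError if any value (e.g. a raw Timestamp) is not serializable
payload = json.dumps(body)
```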

Let's take a look at the input data we'll send to Stats Engine Service:

In [None]:
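import pprint

# Show the headers with the API token masked so the token isn't saved in the notebook's output
pprint.pprint({**ses_request_headers, "Authorization": "Bearer <redacted>"})

# The request body is long, so only show its first 2,000 characters
print(f"{pprint.pformat(ses_request_data)[:2000]}...")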

The input data contains two high-level components:

- `config` - a set of high-level configuration options (here, the reference variation and whether to use stats resets)
- `metrics` - a list with one entry per input metric

Each "metric" object in the `metrics` list contains a metric-specific configuration (`config`) and a list of datapoints (`data`). Each datapoint is associated with a particular interval in time and contains the following fields:

- `interval_timestamp` - the Unix timestamp (in seconds) associated with this datapoint
- `variation_id` - the ID of the variation (e.g. `control`) whose units this datapoint describes
- `unit_count` - the number of units observed during this time interval. "Units" are the things that are exposed to treatments in your experiment. In most experiments a "unit" is a website visitor or app user, but some experiments use alternatives such as visitor sessions or service requests.
- `unit_observation_sum` - the sum of the numerical "observations" we've made about the units in this time interval. For a conversion rate metric, this value would be the number of visitors who took some specified action _at least once_ during our experiment.
- `unit_observation_sum_of_squares` - the sum of the squares of the numerical observations made about the units in this time interval.  Stats Engine Service uses this value to estimate the variance in the input data.
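The three aggregate fields are enough to recover a mean and a variance without sending per-unit data. The sketch below shows the arithmetic, assuming you start from one numeric observation per unit (for example, support calls per visitor). It uses the standard unbiased sample variance; the exact estimator Stats Engine Service applies internally isn't documented here, so treat this as illustrative. `Datapoint`, `aggregate`, and `mean_and_variance` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Datapoint:
    """One interval's aggregates, mirroring the fields described above."""
    unit_count: int
    unit_observation_sum: float
    unit_observation_sum_of_squares: float


def aggregate(observations):
    """Collapse per-unit observations (one number per unit) into a datapoint."""
    return Datapoint(
        unit_count=len(observations),
        unit_observation_sum=sum(observations),
        unit_observation_sum_of_squares=sum(x * x for x in observations),
    )


def mean_and_variance(dp):
    """Sample mean and unbiased sample variance recovered from the aggregates."""
    n = dp.unit_count
    s = dp.unit_observation_sum
    ss = dp.unit_observation_sum_of_squares
    mean = s / n
    # sum((x - mean)^2) == ss - s^2 / n, then divide by n - 1
    variance = (ss - s * s / n) / (n - 1)
    return mean, variance


# Hypothetical: support calls made by each of five visitors in one interval
dp = aggregate([0, 2, 0, 1, 0])
print(dp)
print(mean_and_variance(dp))
```

For a binary (0/1) metric every observation equals its own square, so `unit_observation_sum_of_squares` equals `unit_observation_sum`.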


Now that we've constructed our input data, we're ready to send it to Stats Engine Service in order to compute sequential p-values and confidence intervals.

In [None]:
import requests

STATS_ENGINE_SERVICE_URI = "https://api.optimizely.com/stats-engine/v0/batch"

# Send the request to Stats Engine Service. `timeout` (in seconds) stops the
# cell from hanging if the service doesn't respond; 120 is an arbitrary choice.
ses_response = requests.post(
  STATS_ENGINE_SERVICE_URI,
  headers=ses_request_headers,
  json=ses_request_data,
  timeout=120
)

# Check to make sure SES did not return an error
if ses_response.status_code != 200:
  raise RuntimeError(f"Error: received {ses_response.status_code} from Stats Engine Service ({STATS_ENGINE_SERVICE_URI}): {ses_response.text}")

# Convert the data returned by SES into a list of DataFrames so that it is
# easier to explore
stats_dfs = [pd.DataFrame(stats_json) for stats_json in ses_response.json()]

# Combine the SES response dataframes into a single dataframe
results = pd.concat(
    stats_dfs, 
    keys=ses_metric_names, 
    names=["metric_name"],
)

We've stored the response from Stats Engine Service in a list of dataframes, `stats_dfs`. The dataframe `stats_dfs[i]` contains the sequential statistics for the metric whose input data is `metric_dfs[i]` (that is, for `ses_metric_names[i]`).

To make the Stats Engine Service output easier to examine, we've combined these metric-specific results dataframes into one large `results` dataframe.
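Passing `keys` and `names` to `pd.concat` stacks the per-metric frames and adds an outer index level called `metric_name`. That level is what lets a later `groupby` mix an index level name with the `variation_id` column. A minimal sketch with two made-up frames standing in for `stats_dfs` (the values are illustrative, not real SES output):

```python
import pandas as pd

# Two hypothetical per-metric frames standing in for stats_dfs
stats_a = pd.DataFrame({"variation_id": ["message_1", "message_2"],
                        "corrected_p_value": [0.40, 0.03]})
stats_b = pd.DataFrame({"variation_id": ["message_1", "message_2"],
                        "corrected_p_value": [0.90, 0.20]})

combined = pd.concat(
    [stats_a, stats_b],
    keys=["Metric A", "Metric B"],  # one key per frame, in the same order
    names=["metric_name"],          # name for the new outer index level
)

# The outer level can be selected with .loc or used as a groupby key
print(combined.index.names)
print(combined.loc["Metric B"])
print(combined.groupby(["metric_name", "variation_id"]).size())
```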

Let's take a look at our `results` data:

In [None]:
results.head()

**That's it**: we've used Stats Engine Service to compute sequential, always-valid p-values and confidence intervals for our experiment data.

The dataset returned by Stats Engine Service contains many fields, but we are primarily concerned with just a few of them:

- `corrected_p_value` - the cumulative "always valid" p-value corresponding to a particular variation during a particular time interval.  "`corrected`" refers to the False Discovery Rate correction that Stats Engine applies to correct for multiple-comparisons errors.
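- `corrected_conf_interval_lower` and `corrected_conf_interval_upper` - the cumulative "always valid" lower and upper bounds on the "true value" of the metric for a specified variation
- `lift_estimate` - the estimated lift of a variation relative to the reference variation (`control`)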

Since each of these fields is cumulative, we can look at the last time interval in the response to get the "current" values of each:

In [None]:
results.groupby(["metric_name", "variation_id"]).last()[[
    "lift_estimate",
    "corrected_p_value",
    "corrected_conf_interval_lower", 
    "corrected_conf_interval_upper"
]]
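One subtlety: `GroupBy.last()` returns the last *non-null* value of each column within each group, so it matches "the last time interval" only when rows are in chronological order and the final row has no missing values. If you want the literal final row per group, sorting by a timestamp column (assuming the response includes one) and using `tail(1)` is one alternative. A small illustration with made-up numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative results for one variation; the final interval's p-value is missing
toy = pd.DataFrame({
    "variation_id": ["message_1"] * 3,
    "interval_timestamp": [1, 2, 3],
    "corrected_p_value": [0.50, 0.20, np.nan],
})

# Last non-null value per column: the p-value comes from the second interval
last_non_null = toy.groupby("variation_id").last()

# Literal final row per group after sorting by time: keeps the NaN
final_row = toy.sort_values("interval_timestamp").groupby("variation_id").tail(1)

print(last_non_null)
print(final_row)
```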

## Rendering a results report

In this section we'll render a simple experiment report with our data. To do this, we'll download a small Python rendering library, along with a set of HTML and CSS templates, from the `ses-research-public` GitHub repository.

In [None]:
%%bash

# Remove old copies of this library so that this cell
# may be run more than once
rm -rf master.zip ses-research-public-master lib

# Download a zipped copy of the github repository containing our rendering library
curl -L -O https://github.com/optimizely/ses-research-public/archive/master.zip 

# Unzip the repository
unzip -q master.zip

# Move the rendering library to our working directory
mv ses-research-public-master/lib .
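If `curl` or `unzip` isn't available (for example, on some Windows setups), the same steps can be done from Python with only the standard library. This is a sketch rather than part of the original notebook: `download_repo_zip` and `install_lib` are hypothetical names, and the archive layout (`ses-research-public-master/lib`) is taken from the shell commands above.

```python
import io
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

REPO_ZIP_URL = "https://github.com/optimizely/ses-research-public/archive/master.zip"


def download_repo_zip(url=REPO_ZIP_URL):
    """Fetch the zipped repository; urlopen follows GitHub's redirect like `curl -L`."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def install_lib(zip_bytes, dest=Path("lib"), subdir="ses-research-public-master/lib"):
    """Extract `subdir` from the archive into `dest`, replacing any old copy."""
    dest = Path(dest)
    if dest.exists():
        shutil.rmtree(dest)  # like `rm -rf lib`, so this can be re-run
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
            archive.extractall(tmp)
        # Move only the library directory; the rest is discarded with `tmp`
        shutil.move(str(Path(tmp) / subdir), str(dest))
    return dest


# Usage (needs network access):
# install_lib(download_repo_zip())
```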

In [None]:
from lib.report import render
from IPython.display import display, HTML

for i, metric_name in enumerate(ses_metric_names):
    table_html = render.render_se_metric_overview_table(
        observations_timeseries=metric_dfs[i],
        statistics=stats_dfs[i],
        reference_variation_id="control",
        metric_name=metric_name
    )
    display(HTML(table_html))