# Run Campaign and Analyze Results

In [1]:
%load_ext autoreload
%autoreload 2

::: {.content-hidden}
Import necessary Python modules
:::

In [2]:
import os
import sys
from calendar import month_name
from datetime import datetime
from typing import List, Tuple

import mlflow.sklearn
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats import proportion

::: {.content-hidden}
Get relative path to project root directory
:::

In [3]:
PROJ_ROOT_DIR = os.path.join(os.pardir)
src_dir = os.path.join(PROJ_ROOT_DIR, "src")
sys.path.append(src_dir)

::: {.content-hidden}
Import custom Python modules
:::

In [4]:
%aimport audience_size_helpers
import audience_size_helpers as ash

%aimport model_helpers
import model_helpers as modh

%aimport statistical_checks
import statistical_checks as sc

%aimport utils
import utils as ut

## About

### Overview
This step assesses the impact of running the marketing campaign. It can be performed retrospectively, at the end of the campaign, once the return-visit outcome of the first-time visitors to the merchandise store (during the inference period) is known. This is step 5 of a [typical A/B Testing workflow](https://www.datacamp.com/blog/data-demystified-what-is-a-b-testing).

For the current use-case, if the marketing campaign results in more conversions in the test cohort than in the control cohort, this could suggest that the campaign has grown the customer base and thereby met the objective of this project. However, a test of statistical significance is needed to ensure that the observed impact of the campaign (growth in conversions) was not a random occurrence.

This step compares the proportions (conversions) taken from two independent samples (test and control cohorts). The purpose is to determine if the conversion rate (KPI) of the test cohort is statistically different from that of the control cohort. If the

1. conversion rate is higher in the test cohort
2. difference in conversion rate between the test and control cohorts is statistically significant at some level of confidence (e.g. 95%)

then it is possible to say, at that level of confidence (95% in this example), that the campaign has grown the customer base.

### Implementation
In Python, [such a comparison is implemented](https://stats.stackexchange.com/a/544507/144450) in the `statsmodels` library by the [`proportions_chisquare()` function](https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_chisquare.html), where the `count` parameter is the number of conversions in each cohort (test or control) and `nobs` is the overall size of the same cohort.
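For two cohorts, the chi-square test of equal proportions (without continuity correction) is equivalent to a two-sided pooled two-proportion z-test, because the chi-square statistic equals the square of the z statistic. The standard-library sketch below illustrates that calculation. The function name `two_proportion_test` is hypothetical, and this is not the code used by `statsmodels` or by the project's helper functions. The counts are the ones reported in the results table of this notebook.

```python
import math


def two_proportion_test(count, nobs):
    """Pooled two-proportion z-test; returns (chi2, two-sided p-value)."""
    (x1, x2), (n1, n2) = count, nobs
    p1, p2 = x1 / n1, x2 / n2
    # conversion rate under the null hypothesis of equal proportions
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided tail probability of a standard normal variable
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z**2, p_value


# (test, control) conversions and cohort sizes
chi2, p_value = two_proportion_test(count=(353, 325), nobs=(6249, 6249))
print(f"chi2 = {chi2:.3f}, p-value = {p_value:.3f}")
```

With these counts the p-value comes out at roughly 0.27, in line with the `p_value` column reported for the chi-squared check.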

## User Inputs

Define the following

1. start and end dates for inference data
2. confidence levels at which the difference in the cohort conversion rates is to be checked

In [5]:
#|echo: true
# 1. start and end dates
infer_start_date = "20170301"
infer_end_date = "20170331"

# 2. confidence levels to check difference in conversion rates
ci_levels = np.arange(0.20, 1.00, 0.05, dtype=float)

::: {.content-hidden}
Get path to data sub-folders
:::

In [6]:
data_dir = os.path.join(PROJ_ROOT_DIR, "data")
raw_data_dir = os.path.join(data_dir, "raw")
processed_data_dir = os.path.join(data_dir, "processed")

::: {.content-hidden}
Get the name of the month covering the inference data period
:::

In [7]:
# characters 4-5 of a YYYYMMDD string hold the two-digit month (also handles months 10-12)
infer_month = month_name[int(infer_end_date[4:6])]

::: {.content-hidden}
Define MLFlow storage paths
:::

In [8]:
mlruns_db_fpath = f"{raw_data_dir}/mlruns.db"
mlflow.set_tracking_uri(f"sqlite:///{mlruns_db_fpath}")

::: {.content-hidden}
Set environment variable to silence MLFlow `git` warning message
:::

In [9]:
os.environ["GIT_PYTHON_REFRESH"] = "quiet"

The following helper functions are defined in the module `src/statistical_checks.py` and are used here

1. `get_inference_data_with_cohorts()`
   - loads inference predictions data with audience groups and cohorts assigned as separate columns
2. `get_outcome_labels()`
   - loads the outcome (ML label) of the inference predictions data
   - for demonstration purposes only, this outcome is randomly generated here
3. `get_cohorts()`
   - filters audience data to retrieve visitors that were placed in one of the two audience cohorts (test or control)
4. `get_overall_and_converted_cohort_sizes()`
   - calculates the size of the overall cohort and the number of conversions, for both test and control cohorts
5. `check_significance_using_chisq()`
   - checks significance of the difference in conversions between test and control cohort
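
As a rough illustration of what the cohort size and conversion counts involve, the plain-Python sketch below tallies both per cohort for a handful of toy records. It is an assumption-laden stand-in, not the helper's actual code: the `converted` field and the toy records are hypothetical, and only the `cohort` column name comes from this notebook. The conversion rate is expressed as a percentage, matching the scale of the rates reported later.

```python
from collections import Counter

# toy visitor records; "converted" is a hypothetical outcome field
visitors = [
    {"cohort": "test", "converted": 1},
    {"cohort": "test", "converted": 0},
    {"cohort": "test", "converted": 1},
    {"cohort": "test", "converted": 0},
    {"cohort": "control", "converted": 0},
    {"cohort": "control", "converted": 1},
    {"cohort": "control", "converted": 0},
    {"cohort": "control", "converted": 0},
]

# overall cohort sizes and number of conversions per cohort
overall = Counter(v["cohort"] for v in visitors)
conversions = Counter(v["cohort"] for v in visitors if v["converted"])

# conversion rate per cohort, in percent
rates_pct = {c: 100 * conversions[c] / overall[c] for c in overall}
print(overall, conversions, rates_pct)
```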

## Get Data

### Fetch Latest Version of Best Deployment Candidate Model from Model Registry

Get best deployment candidate model from model registry

In [10]:
#|echo: true
df_candidate_mlflow_models = modh.get_all_deployment_candidate_models()

### Get Inference Data with Audience Cohorts, Associated with Best Model

Load inference data, with the audience groups and cohorts shown as separate columns. This data should contain the outcome for all first-time visitors to the store during the inference period. Filter this data to keep only visitors who were placed in the test or control cohort, and exclude all others. This is done below using random outcomes (for demonstration purposes).

In [11]:
#|echo: true
df_infer_audience_cohorts = (
    sc.get_inference_data_with_cohorts(df_candidate_mlflow_models, "audience_cohorts")
    .pipe(sc.get_outcome_labels)
    .pipe(sc.get_cohorts, "cohort")
)

::: {.callout-note title="Notes"}

1. `sc.get_outcome_labels()` retrieves the campaign outcomes, which is the number of conversions in both the control and test cohorts. Here, for demonstration purposes, random values are used for the outcomes for both cohorts.
:::

::: {.content-hidden}
(Optional, Sanity check) Verify that equally sized test and control cohorts are found in each audience group in the inference data
:::

In [12]:
assert (
    df_infer_audience_cohorts.groupby(["maudience", "cohort"], as_index=False)[
        "fullvisitorid"
    ]
    .count()
    .rename(columns={"fullvisitorid": "num_visitors"})
    .groupby("maudience")["num_visitors"]
    .diff(1)
    .dropna()
    == 0
).all()

## Compare Difference in KPI Between Cohorts

Check if the difference between conversions across the two cohorts is statistically significant

In [13]:
#|echo: true
overall_sizes, conversion_sizes = sc.get_overall_and_converted_cohort_sizes(
    df_infer_audience_cohorts, verbose=False
)
df_sig_checks = sc.check_significance_using_chisq(
    overall_sizes, conversion_sizes, ci_levels
).pipe(
    ash.set_datatypes,
    {
        "check": pd.StringDtype(),
        "p_value": pd.Float32Dtype(),
        "ci_level": pd.Int8Dtype(),
        "control_size": pd.Int16Dtype(),
        "test_size": pd.Int16Dtype(),
        "control_conversions": pd.Int16Dtype(),
        "test_conversions": pd.Int16Dtype(),
        "control_conversion_rate": pd.Float32Dtype(),
        "test_conversion_rate": pd.Float32Dtype(),
    },
)
df_sig_checks

Set all specified datatypes.


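|   | check | p_value | ci_level | control_size | test_size | control_conversions | test_conversions | control_conversion_rate | test_conversion_rate |
|---|-------|---------|----------|--------------|-----------|---------------------|------------------|-------------------------|----------------------|
| 0 | not statistically significant | 0.268837 | 94 | 6249 | 6249 | 325 | 353 | 5.200832 | 5.648904 |
| 1 | statistically significant | 0.268837 | 70 | 6249 | 6249 | 325 | 353 | 5.200832 | 5.648904 |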


::: {.callout-note title="Notes"}

1. The chi-squared test indicates if the difference between the conversion rate in the test and control cohort is statistically significant at a particular confidence level.
:::

::: {.callout-tip title="Observations"}

1. The top row shows the test of significance for the maximum specified confidence level (in this case 94%).
2. The bottom row shows the maximum confidence level (70%) at which the difference in conversion rates is significant.
:::
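
The relationship between the reported p-value and the confidence levels can be sketched as follows. The decision rule used here (the difference is significant at confidence level `c` when the p-value is below `1 - c`) is the conventional one and is assumed rather than taken from the helper's source. The grid of levels from 20% to 95% in steps of 5% is rebuilt in plain Python for illustration.

```python
p_value = 0.268837  # p-value reported for the chi-squared check

# confidence levels from 0.20 to 0.95 in steps of 0.05 (rounded to avoid float drift)
ci_levels = [round(0.20 + 0.05 * i, 2) for i in range(16)]

# assumed decision rule: significant at confidence level c when p < 1 - c
significant_levels = [c for c in ci_levels if p_value < 1 - c]
max_significant = max(significant_levels)
print(f"highest confidence level with a significant difference: {max_significant:.0%}")
```

Under this rule the highest level in the grid at which the difference is significant is 70%, which agrees with the `ci_level` shown in the bottom row.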

## Export to Disk and ML Experiment Tracking

Get the best MLFlow run ID

In [14]:
#|echo: true
best_run_id = df_candidate_mlflow_models.squeeze()["run_id"]

::: {.content-hidden}
Show summary of `DataFrame` with check of statistical significance
:::

In [15]:
#|output: false
ut.summarize_df(df_sig_checks)

|   | column | dtype | missing |
|---|--------|-------|---------|
| 0 | check | string[python] | 0 |
| 1 | p_value | Float32 | 0 |
| 2 | ci_level | Int8 | 0 |
| 3 | control_size | Int16 | 0 |
| 4 | test_size | Int16 | 0 |
| 5 | control_conversions | Int16 | 0 |
| 6 | test_conversions | Int16 | 0 |
| 7 | control_conversion_rate | Float32 | 0 |
| 8 | test_conversion_rate | Float32 | 0 |


Export to disk and log exported file as MLFlow artifact

In [16]:
#| echo: true
#| output: false
ut.export_and_track(
    os.path.join(
        processed_data_dir,
        f"campaign_analysis__run_"
        f"{best_run_id}__"
        f"infer_month_{infer_month}__"
        f"{datetime.now().strftime('%Y%m%d_%H%M%S')}.parquet.gzip",
    ),
    df_sig_checks,
    f"campaign outcome analysis for inference during {infer_month}",
    best_run_id,
)

Exported campaign outcome analysis for inference during March to file campaign_analysis__run_0415bf0429b747faa0255ba6656c4342__infer_month_March__20230612_215830.parquet.gzip
Logged campaign outcome analysis for inference during March as artifact in file campaign_analysis__run_0415bf0429b747faa0255ba6656c4342__infer_month_March__20230612_215830.parquet.gzip


## Next Step

The next step will clean up all project resources related to MLFlow.