## A/B Test Forecast

This forecast is built for SaaS products with free trial periods. It predicts whether an A/B test will show a statistically significant difference in conversion rate between its groups.
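The notebook does not say which significance test it uses. As one possible illustration, a two-proportion z-test compares the conversion rates of two groups. The function name `two_proportion_z_test`, the 0.05 threshold, and the counts below are all assumptions made for this sketch, not values from the test data.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, for illustration only
z, p_value = two_proportion_z_test(conv_a=250, n_a=1000, conv_b=300, n_b=1000)
significant = p_value < 0.05  # assumed significance level
```

With more than two groups, each variant would typically be compared against the control separately, and a correction for multiple comparisons may be appropriate.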

The forecast first loads the historical data from the A/B test and analyzes how each group's conversion rate is trending. Once conversion rates have been estimated from the historical data, those assumed rates are applied to the future entries of each group in the A/B test.

**Users can select different inputs for future test data, including:**

1. Number of days the A/B test will continue to run
2. Number of entries added to the A/B test each day
3. Adjustments in conversion rate performance (+/- X%)
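To make the three inputs concrete, here is a minimal sketch of how they might combine for a single group. The function `project_group` and its parameter names are hypothetical, and the sketch assumes the +/- X% adjustment is relative to the historical rate (so `0.10` means a 10% lift), which the text above does not specify.

```python
def project_group(hist_entries, hist_conversions, days_ahead,
                  entries_per_day, rate_adjustment=0.0):
    """Project total entries, conversions, and conversion rate for one group."""
    base_rate = hist_conversions / hist_entries
    # Assumption: the adjustment is relative, e.g. 0.10 lifts 25% to 27.5%
    future_rate = base_rate * (1 + rate_adjustment)
    future_entries = days_ahead * entries_per_day
    future_conversions = future_entries * future_rate
    total_entries = hist_entries + future_entries
    total_conversions = hist_conversions + future_conversions
    return total_entries, total_conversions, total_conversions / total_entries

# Hypothetical inputs, for illustration only
total_entries, total_conversions, total_rate = project_group(
    hist_entries=1000, hist_conversions=250,
    days_ahead=14, entries_per_day=100, rate_adjustment=0.10,
)
```

This simplification applies a single final rate to all future entries. It ignores the fact that recent trial entrants need time to convert, which is what the per-day conversion columns later in the notebook are meant to capture.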

**The forecast also makes the following assumptions:**

1. There can be up to 4 different groups in the A/B test
2. The forecast accepts up to 6 weeks of historical data
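These two limits could be checked before forecasting. The sketch below is one possible guard, not part of the original notebook: `check_assumptions` is a hypothetical helper, and measuring history as the span of `TRIAL_START_DATE` values is an assumption.

```python
import pandas as pd

MAX_GROUPS = 4         # limit stated in the assumptions above
MAX_HISTORY_WEEKS = 6  # limit stated in the assumptions above

def check_assumptions(df):
    """Raise ValueError if the data exceeds the forecast's stated limits."""
    n_groups = df['GROUP_NAME'].nunique()
    if n_groups > MAX_GROUPS:
        raise ValueError(f'{n_groups} groups found; at most {MAX_GROUPS} are supported')
    # Assumption: history length is the span between first and last trial start
    span_days = (df['TRIAL_START_DATE'].max() - df['TRIAL_START_DATE'].min()).days
    if span_days > MAX_HISTORY_WEEKS * 7:
        raise ValueError(f'{span_days} days of history; at most {MAX_HISTORY_WEEKS * 7} are supported')
    return n_groups, span_days

# Toy data for illustration only
sample = pd.DataFrame({
    'GROUP_NAME': ['4947a_Control', '4947b_Limit10', '4947a_Control'],
    'TRIAL_START_DATE': pd.to_datetime(['2013-07-29', '2013-08-18', '2013-09-02']),
})
n_groups, span_days = check_assumptions(sample)
```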

In [1]:
# Import packages
import pandas as pd
import numpy as np


In [2]:
# Upload historical data from test
raw = pd.read_csv('/Users/Matt/Desktop/Programming/Python/AB_TestForecast/test_data.csv', na_values=' ')


In [5]:
# Convert date fields from object to date format
raw['TRIAL_START_DATE'] = pd.to_datetime(raw['TRIAL_START_DATE'])
raw['CONVERT_DATE'] = pd.to_datetime(raw['CONVERT_DATE'])


In [6]:
# Preview data
raw.info()
raw.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25483 entries, 0 to 25482
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   UID               25483 non-null  int64         
 1   GROUP_NAME        25483 non-null  object        
 2   TRIAL_START_DATE  25483 non-null  datetime64[ns]
 3   CONVERT_DATE      6297 non-null   datetime64[ns]
dtypes: datetime64[ns](2), int64(1), object(1)
memory usage: 796.5+ KB


|   | UID | GROUP_NAME | TRIAL_START_DATE | CONVERT_DATE |
|---|---|---|---|---|
| 0 | 1114728035448 | 4947a_Control | 2013-09-02 | 2013-09-05 |
| 1 | 1114567510056 | 4947b_Limit10 | 2013-08-18 | 2013-08-20 |
| 2 | 1114497359375 | 4947a_Control | 2013-08-14 | 2013-09-01 |
| 3 | 1114329428592 | 4947a_Control | 2013-07-29 | 2013-07-31 |
| 4 | 1114539386764 | 4947a_Control | 2013-08-16 | NaT |


In [8]:
# Create variables for the first and last trial start dates of the A/B test
test_start = raw.TRIAL_START_DATE.min()
test_latest = raw.TRIAL_START_DATE.max()


In [None]:
# Calculate days between the test start and each entrant's trial start date
# (raw['TRIAL_START_DATE'] - test_start) results in a timedelta, so use dt.days to convert to a number of days
raw['trialDiff'] = (raw['TRIAL_START_DATE'] - test_start).dt.days


In [None]:

# Preview data with the new trialDiff column
raw.info()
raw.head()


In [None]:
# Calculate days between each entrant's trial start date and conversion date
# (NaN for entrants who have not converted); used to build the day 1 - 60 conversion columns
raw['convertDiff'] = (raw['CONVERT_DATE'] - raw['TRIAL_START_DATE']).dt.days
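As a small illustration of this step, the sketch below applies the same date subtraction to three rows taken from the data preview above (rows 0, 2, and 4). The `toy` DataFrame is a stand-in for `raw`, built only for this example.

```python
import pandas as pd

# Rows 0, 2, and 4 from the preview; the last entrant has not converted
toy = pd.DataFrame({
    'TRIAL_START_DATE': pd.to_datetime(['2013-09-02', '2013-08-14', '2013-08-16']),
    'CONVERT_DATE': pd.to_datetime(['2013-09-05', '2013-09-01', None]),
})

# Subtracting datetimes gives timedeltas; .dt.days extracts whole days
toy['convertDiff'] = (toy['CONVERT_DATE'] - toy['TRIAL_START_DATE']).dt.days

# A missing CONVERT_DATE (NaT) yields NaN, which compares False against any day count
converted_within_60 = toy['convertDiff'] <= 60
```

Because the unconverted row becomes `NaN`, the column is stored as floats rather than integers, and any `<=` comparison on it treats non-converters as not yet converted.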


In [None]:
# Create function to add a 0/1 flag column for converting within each of days 1 - 60
def create_conversion_columns():
    for i in range(1, 61):
        # NaN convertDiff (never converted) compares False, so the flag is 0
        raw['convert' + str(i)] = (raw['convertDiff'] <= i).astype(int)


In [None]:
# run function to add conversion columns
create_conversion_columns()


In [None]:
raw.head(100)
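One plausible next step, sketched here under assumptions, is to average the conversion flags per group to get the share of entrants who converted within each number of days. The `toy` data, the group labels `A` and `B`, and the chosen days are hypothetical and only show the shape of the calculation.

```python
import pandas as pd

# Toy stand-in for raw: days to convert, NaN for entrants who never converted
toy = pd.DataFrame({
    'GROUP_NAME': ['A', 'A', 'B', 'B'],
    'convertDiff': [3, None, 1, 18],
})

# Build 0/1 flags for a few day thresholds (the notebook uses days 1 - 60)
days = [1, 3, 30]
for day in days:
    toy[f'convert{day}'] = (toy['convertDiff'] <= day).astype(int)

# The mean of a 0/1 flag is the share of entrants converted within that many days
rates = toy.groupby('GROUP_NAME')[[f'convert{d}' for d in days]].mean()
```

Note that entrants who started their trial fewer than N days before the latest data have not yet had N full days to convert, so rates for larger N would be understated unless those rows are excluded.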