# 3. Success indicators

- **3.1. Test group:**
  - **3.1.1. Completion rate:** The proportion of users who reach the final ‘confirm’ step.
  - **3.1.2. Time spent on each step:** The average duration users spend on each step.
  - **3.1.3. Error rates:** Steps where users go back to a previous step.

- **3.2. Control group:**
  - **3.2.1. Completion rate**
  - **3.2.2. Time spent on each step**
  - **3.2.3. Error rates**

- **3.3. Redesign outcome:** Given the three KPIs, how does the new design’s performance compare to the old one?

In [1]:
import pandas as pd
import numpy as np
from IPython.display import display, HTML
import scipy.stats as st
from scipy.stats import norm
from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import ttest_ind

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
df_client_profiles = pd.read_csv("/Users/milenko/My Drive (1307mile@gmail.com)/bootcamp/w5/w5w6_project2/data/final/df_client_profiles_final.csv")
df_digital_footprint = pd.read_csv("/Users/milenko/My Drive (1307mile@gmail.com)/bootcamp/w5/w5w6_project2/data/final/df_digital_footprint_final.csv")
df_experiment_roster = pd.read_csv("/Users/milenko/My Drive (1307mile@gmail.com)/bootcamp/w5/w5w6_project2/data/raw/df_final_experiment_clients.txt")

# Dataset exploration

In [3]:
df_digital_footprint.describe(include='all')

Unnamed: 0,client_id,visitor_id,visit_id,process_step,date_time
count,755405.0,755405,755405,755405,755405
unique,,130236,158095,5,629363
top,,722943003_3441581446,875138661_34710212496_881092,start,2017-05-02 10:07:41
freq,,104,104,243945,24
mean,5013632.0,,,,
std,2879152.0,,,,
min,169.0,,,,
25%,2519882.0,,,,
50%,5033441.0,,,,
75%,7496063.0,,,,


In [4]:
df_digital_footprint.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 755405 entries, 0 to 755404
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   client_id     755405 non-null  int64 
 1   visitor_id    755405 non-null  object
 2   visit_id      755405 non-null  object
 3   process_step  755405 non-null  object
 4   date_time     755405 non-null  object
dtypes: int64(1), object(4)
memory usage: 28.8+ MB


In [5]:
# Cast 'date_time' as datetime type.
df_digital_footprint['date_time'] = pd.to_datetime(df_digital_footprint['date_time'])

# Merge

These are three merge attempts that did not work for me:
1. (df_digital_footprint, df_client_profiles, how=left), to preserve all the process_steps for all users.
- 744641 rows, but 427518 process_steps have an unknown group.
2. (df_digital_footprint, df_experiment_roster, how=left)
- 755405 rows, but 434096 process_steps have an unknown group.
3. (df_digital_footprint, df_experiment_roster, inner)
- 317123 rows, but 128522 nulls.
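
One way to see where the unknown-group rows come from is the `indicator=True` option of `merge`, which labels every row as `both`, `left_only`, or `right_only`. The sketch below uses tiny made-up frames (`footprint`, `profiles`) as stand-ins for the project data; only the column names are borrowed from the notebook.

```python
import pandas as pd

# Hypothetical toy frames standing in for df_digital_footprint and df_client_profiles
footprint = pd.DataFrame({
    "client_id": [1, 1, 2, 3],
    "process_step": ["start", "step_1", "start", "start"],
})
profiles = pd.DataFrame({
    "client_id": [1, 2],
    "experiment_group": ["Test", "Control"],
})

# indicator=True adds a '_merge' column showing which side each row came from
merged = footprint.merge(profiles, on="client_id", how="left", indicator=True)

# Rows flagged 'left_only' have no matching profile, hence no known group
unmatched = merged[merged["_merge"] == "left_only"]
print(merged["_merge"].value_counts())
print(unmatched["client_id"].unique())
```

Counting the `left_only` rows per `client_id` shows whether the gaps come from a few clients or from many.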

In [6]:
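# Left-join df_digital_footprint with df_client_profiles on 'client_id'.
# how='left' keeps every footprint row; clients without a profile get NaN in the profile columns.
# An inner join would instead keep only the rows with a known client_id (317123 rows, 0 nulls).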
df_digital_footprint = pd.merge(df_digital_footprint, df_client_profiles, on='client_id', how='left')
df_digital_footprint

Unnamed: 0,client_id,visitor_id,visit_id,process_step,date_time,client_tenure_in_years,client_age,gender,number_of_accounts,balance,calls_per_year,logons_per_year,experiment_group,tenure_group,age_group,balance_group
0,9988021,580560515_7732621733,781255054_21935453173_531117,step_3,2017-04-17 15:27:07,5.0,79.0,U,2.0,189023.86,2.0,8.0,Test,5-10 years,61-80 years,"100,001 - 200k"
1,9988021,580560515_7732621733,781255054_21935453173_531117,step_2,2017-04-17 15:26:51,5.0,79.0,U,2.0,189023.86,2.0,8.0,Test,5-10 years,61-80 years,"100,001 - 200k"
2,9988021,580560515_7732621733,781255054_21935453173_531117,step_3,2017-04-17 15:19:22,5.0,79.0,U,2.0,189023.86,2.0,8.0,Test,5-10 years,61-80 years,"100,001 - 200k"
3,9988021,580560515_7732621733,781255054_21935453173_531117,step_2,2017-04-17 15:19:13,5.0,79.0,U,2.0,189023.86,2.0,8.0,Test,5-10 years,61-80 years,"100,001 - 200k"
4,9988021,580560515_7732621733,781255054_21935453173_531117,step_3,2017-04-17 15:18:04,5.0,79.0,U,2.0,189023.86,2.0,8.0,Test,5-10 years,61-80 years,"100,001 - 200k"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
755400,9668240,388766751_9038881013,922267647_3096648104_968866,start,2017-05-24 18:46:10,,,,,,,,,,,
755401,9668240,388766751_9038881013,922267647_3096648104_968866,start,2017-05-24 18:45:29,,,,,,,,,,,
755402,9668240,388766751_9038881013,922267647_3096648104_968866,step_1,2017-05-24 18:44:51,,,,,,,,,,,
755403,9668240,388766751_9038881013,922267647_3096648104_968866,start,2017-05-24 18:44:34,,,,,,,,,,,


In [7]:
# Count duplicated rows and null values, then drop the duplicated rows.
display(f"'digital_footprint': duplicated rows: {df_digital_footprint.duplicated().sum()}")
display(f"'digital_footprint': null values: {df_digital_footprint.isna().sum().sum()}")

df_digital_footprint = df_digital_footprint.drop_duplicates()
display(f"'digital_footprint': duplicated rows: {df_digital_footprint.duplicated().sum()}")

"'digital_footprint': duplicated rows: 10764"

"'digital_footprint': null values: 4776310"

"'digital_footprint': duplicated rows: 0"

In [8]:
# Sort df_digital_footprint by 'date_time' column in ascending order
df_digital_footprint = df_digital_footprint.sort_values(by='date_time')

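# Fill missing values from the preceding row (forward fill), then back-fill any gaps left at the top.
# Caution: after sorting by 'date_time', the preceding row can belong to a different client,
# so filled profile values (including 'experiment_group') are not client-specific.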
df_digital_footprint = df_digital_footprint.ffill()
df_digital_footprint = df_digital_footprint.bfill()
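
To illustrate what the fill does to rows without a profile, here is a small sketch on made-up data. It contrasts forward fill after a time sort with simply dropping rows whose group is unknown. The frame `events` and its values are hypothetical, and dropping rows is only one possible alternative, not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical toy data: client 3 has no profile, so its group is missing
events = pd.DataFrame({
    "client_id": [1, 3, 2, 3],
    "date_time": pd.to_datetime([
        "2017-03-15 00:01", "2017-03-15 00:02",
        "2017-03-15 00:03", "2017-03-15 00:04",
    ]),
    "experiment_group": ["Test", None, "Control", None],
})

# Forward fill after a time sort copies the previous row's group,
# even though that row belongs to a different client
filled = events.sort_values("date_time").ffill()

# Alternative: keep only rows whose group is actually known
known = events.dropna(subset=["experiment_group"])

print(filled[["client_id", "experiment_group"]])
print(known[["client_id", "experiment_group"]])
```

In the filled frame, client 3 ends up in both groups depending on which client happened to act just before.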


In [9]:
df_digital_footprint

Unnamed: 0,client_id,visitor_id,visit_id,process_step,date_time,client_tenure_in_years,client_age,gender,number_of_accounts,balance,calls_per_year,logons_per_year,experiment_group,tenure_group,age_group,balance_group
286566,9088444,242404224_96732670250,423038079_46067236368_400417,step_3,2017-03-15 00:03:03,28.0,67.0,M,3.0,352322.35,4.0,10.0,Control,20-30 years,61-80 years,"300,001 - 400k"
35293,7179755,167765295_97487764427,264484508_5982901710_928530,start,2017-03-15 00:19:28,28.0,67.0,M,3.0,352322.35,4.0,10.0,Control,20-30 years,61-80 years,"300,001 - 400k"
35292,7179755,167765295_97487764427,264484508_5982901710_928530,step_1,2017-03-15 00:20:50,28.0,67.0,M,3.0,352322.35,4.0,10.0,Control,20-30 years,61-80 years,"300,001 - 400k"
35291,7179755,167765295_97487764427,264484508_5982901710_928530,step_2,2017-03-15 00:22:52,28.0,67.0,M,3.0,352322.35,4.0,10.0,Control,20-30 years,61-80 years,"300,001 - 400k"
35290,7179755,167765295_97487764427,264484508_5982901710_928530,step_3,2017-03-15 00:23:47,28.0,67.0,M,3.0,352322.35,4.0,10.0,Control,20-30 years,61-80 years,"300,001 - 400k"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
519273,4159912,169591677_18042085072,358114812_31313949217_790854,start,2017-06-20 23:58:40,13.0,55.0,F,2.0,174412.75,8.0,14.0,Control,10-20 years,41-60 years,"100,001 - 200k"
519272,4159912,169591677_18042085072,358114812_31313949217_790854,step_1,2017-06-20 23:59:00,13.0,55.0,F,2.0,174412.75,8.0,14.0,Control,10-20 years,41-60 years,"100,001 - 200k"
519271,4159912,169591677_18042085072,358114812_31313949217_790854,step_2,2017-06-20 23:59:23,13.0,55.0,F,2.0,174412.75,8.0,14.0,Control,10-20 years,41-60 years,"100,001 - 200k"
527151,5640210,854970757_75895258102,959123368_90077952691_695747,start,2017-06-20 23:59:53,13.0,55.0,F,2.0,174412.75,8.0,14.0,Control,10-20 years,41-60 years,"100,001 - 200k"


## 3.1.1. completion rate

In [10]:
# Filter for the Test group (copied so that columns can be added later without warnings).
# Note: the rate below is the share of logged events that are 'confirm', not the share of users.
df_test_group = df_digital_footprint[df_digital_footprint['experiment_group'] == 'Test'].copy()

completion_rate_test = (df_test_group['process_step'] == 'confirm').sum() / len(df_test_group)
print("The proportion of Test group reaching the final 'confirm' step is: ", f"{completion_rate_test:.2%}")

The proportion of Test group reaching the final 'confirm' step is:  14.19%


In [11]:
# How does this compare to the other steps?
steps = df_digital_footprint['process_step'].unique()

for step in steps:
    proportion = (df_test_group['process_step'] == step).sum() / len(df_test_group)
    print(f"{step}: {proportion:.2%}")

step_3: 14.68%
start: 31.53%
step_1: 22.00%
step_2: 17.60%
confirm: 14.19%


## 3.1.2. time spent on each step

In [12]:
# Initialize a dictionary to store mean duration strings for the Test group
test_durations = {}

# Define pairs of steps to calculate the transition duration between
step_pairs = [
    ('start', 'step_1'),
    ('step_1', 'step_2'),
    ('step_2', 'step_3'),
    ('step_3', 'confirm'),
]

# Loop over each pair of steps for the Test group
for start_step, end_step in step_pairs:
    # Filter for relevant steps
    relevant_steps = df_test_group[df_test_group['process_step'].isin([start_step, end_step])]
    
    # Sort by visit_id and date_time to ensure chronological order
    relevant_steps_sorted = relevant_steps.sort_values(by=['visit_id', 'date_time'])
    
    # Calculate the time difference between steps for each visit_id
    relevant_steps_sorted['time_diff'] = relevant_steps_sorted.groupby('visit_id')['date_time'].diff()
    
    # Select end_step rows to use the calculated time differences as durations from start_step to end_step
    end_step_durations = relevant_steps_sorted[relevant_steps_sorted['process_step'] == end_step]
    
    # Calculate the mean duration from start_step to end_step
    test_mean_duration = end_step_durations['time_diff'].mean()
    
    # Format the mean duration string to include days, hours, minutes, and seconds
    test_mean_duration_str = f"{test_mean_duration.days}d {test_mean_duration.components.hours}h {test_mean_duration.components.minutes}m {test_mean_duration.components.seconds}s"
    
    # Store the result in the dictionary keyed by the step pair
    test_durations[(start_step, end_step)] = test_mean_duration_str

## 3.1.3. error rates

In [13]:
# Sort by client_id, visit_id and date_time to ensure chronological order within each session
df_test_group = df_test_group.sort_values(by=['client_id', 'visit_id', 'date_time'])

# Assign step orders
step_order = {'start': 0,
               'step_1': 1,
               'step_2': 2,
               'step_3': 3,
               'confirm': 4}


# Detect backward movements
df_test_group['step_order'] = df_test_group['process_step'].map(step_order)

# Calculate the difference in step order to identify backward movements
df_test_group['step_diff'] = df_test_group.groupby(['client_id', 'visit_id'])['step_order'].diff()

# A negative step_diff indicates a backward movement
df_test_group['is_backward'] = df_test_group['step_diff'] < 0

# Proportion of sessions with at least one backward movement
error_sessions_test = df_test_group[df_test_group['is_backward']].groupby(['client_id', 'visit_id']).ngroups
total_sessions_test = df_test_group.groupby(['client_id', 'visit_id']).ngroups
test_error_rate = error_sessions_test / total_sessions_test

print(f"Proportion of Test group sessions with errors: {test_error_rate:.2%}")

# Count of backward movements, grouped by the step the user landed on
errors_by_step = df_test_group[df_test_group['is_backward']]['process_step'].value_counts()
errors_by_step

Proportion of Test group sessions with errors: 20.56%


process_step
start     19848
step_1     7162
step_2     4084
step_3       58
Name: count, dtype: int64
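
A small worked example may help when reading the counts above. It applies the same `map`, `diff`, and `< 0` logic to one made-up visit. Note that the flagged row is the step the user returned to, which is why 'confirm' can never appear in the counts.

```python
import pandas as pd

step_order = {"start": 0, "step_1": 1, "step_2": 2, "step_3": 3, "confirm": 4}

# Hypothetical single visit: the user returns from step_2 to step_1 once
visit = pd.DataFrame({
    "visit_id": ["v1"] * 5,
    "process_step": ["start", "step_1", "step_2", "step_1", "step_2"],
})

# Same logic as the cell above: map steps to ranks, diff within the visit, flag negatives
visit["step_order"] = visit["process_step"].map(step_order)
visit["step_diff"] = visit.groupby("visit_id")["step_order"].diff()
visit["is_backward"] = visit["step_diff"] < 0

print(visit)
```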

# 3.2. Control group: success indicators

## 3.2.1. completion rate

In [14]:
# Filter for the Control group (copied so that columns can be added later without warnings).
# Note: the rate below is the share of logged events that are 'confirm', not the share of users.
df_control_group = df_digital_footprint[df_digital_footprint['experiment_group'] == 'Control'].copy()

completion_rate_control = (df_control_group['process_step'] == 'confirm').sum() / len(df_control_group)
print("The proportion of Control group reaching the final 'confirm' step is: ", f"{completion_rate_control:.2%}")

The proportion of Control group reaching the final 'confirm' step is:  13.32%


In [15]:
# How does this compare to the other steps?
steps = df_digital_footprint['process_step'].unique()

for step in steps:
    proportion = (df_control_group['process_step'] == step).sum() / len(df_control_group)
    print(f"{step}: {proportion:.2%}")

step_3: 15.31%
start: 31.58%
step_1: 21.72%
step_2: 18.06%
confirm: 13.32%


## 3.2.2. time spent on each step

In [16]:
# Initialize a dictionary to store mean duration strings for the Control group
control_durations = {}

# Define pairs of steps to calculate the transition duration between, as before
step_pairs = [
    ('start', 'step_1'),
    ('step_1', 'step_2'),
    ('step_2', 'step_3'),
    ('step_3', 'confirm'),
]

# Loop over each pair of steps for the Control group
for start_step, end_step in step_pairs:
    # Filter for relevant steps
    relevant_steps = df_control_group[df_control_group['process_step'].isin([start_step, end_step])]
    
    # Sort by visit_id and date_time to ensure chronological order
    relevant_steps_sorted = relevant_steps.sort_values(by=['visit_id', 'date_time'])
    
    # Calculate the time difference between steps for each visit_id
    relevant_steps_sorted['time_diff'] = relevant_steps_sorted.groupby('visit_id')['date_time'].diff()
    
    # Select end_step rows to use the calculated time differences as durations from start_step to end_step
    end_step_durations = relevant_steps_sorted[relevant_steps_sorted['process_step'] == end_step]
    
    # Calculate the mean duration from start_step to end_step
    control_mean_duration = end_step_durations['time_diff'].mean()
    
    # Format the mean duration string to include days, hours, minutes, and seconds
    control_mean_duration_str = f"{control_mean_duration.days}d {control_mean_duration.components.hours}h {control_mean_duration.components.minutes}m {control_mean_duration.components.seconds}s"
    
    # Store the result in the dictionary keyed by the step pair
    control_durations[(start_step, end_step)] = control_mean_duration_str

## 3.2.3. error rates

In [17]:
# Sort by client_id, visit_id and date_time to ensure chronological order within each session
df_control_group = df_control_group.sort_values(by=['client_id', 'visit_id', 'date_time'])

# Assign step orders
step_order = {'start': 0,
               'step_1': 1,
               'step_2': 2,
               'step_3': 3,
               'confirm': 4}


# Detect backward movements
df_control_group['step_order'] = df_control_group['process_step'].map(step_order)

# Calculate the difference in step order to identify backward movements
df_control_group['step_diff'] = df_control_group.groupby(['client_id', 'visit_id'])['step_order'].diff()

# A negative step_diff indicates a backward movement
df_control_group['is_backward'] = df_control_group['step_diff'] < 0

# Proportion of Control group sessions with at least one backward movement
error_sessions_control = df_control_group[df_control_group['is_backward']].groupby(['client_id', 'visit_id']).ngroups
total_sessions_control = df_control_group.groupby(['client_id', 'visit_id']).ngroups
control_error_rate = error_sessions_control / total_sessions_control

print(f"Proportion of Control group sessions with errors: {control_error_rate:.2%}")

# Count of backward movements, grouped by the step the user landed on
errors_by_step = df_control_group[df_control_group['is_backward']]['process_step'].value_counts()
errors_by_step

Proportion of Control group sessions with errors: 18.39%


process_step
start     15287
step_1     6567
step_2     4397
step_3      121
Name: count, dtype: int64

# 3.3. Redesign outcome

In [18]:
# Test group has higher completion rate
print("The proportion of Test group reaching the final 'confirm' step: ", f"{completion_rate_test:.2%}")
print("The proportion of Control group reaching the 'confirm' step is: ", f"{completion_rate_control:.2%}\n")

# Average time between steps is similar: Test is faster on two transitions, slower on one, tied on one
for start_step, end_step in step_pairs:
    test_str = test_durations.get((start_step, end_step), "No data")
    control_str = control_durations.get((start_step, end_step), "No data")
    print(f"Transition from '{start_step}' to '{end_step}':")
    print(f"  Test group: {test_str}")
    print(f"  Control g.: {control_str}\n")

# Control group has lower proportion of sessions with errors.
print(f"Proportion of Test group sessions with errors: {test_error_rate:.2%}")
print(f"Proportion of Control g. sessions with errors: {control_error_rate:.2%}")

The proportion of Test group reaching the final 'confirm' step:  14.19%
The proportion of Control group reaching the 'confirm' step is:  13.32%

Transition from 'start' to 'step_1':
  Test group: 0d 0h 0m 52s
  Control g.: 0d 0h 0m 53s

Transition from 'step_1' to 'step_2':
  Test group: 0d 0h 1m 1s
  Control g.: 0d 0h 0m 56s

Transition from 'step_2' to 'step_3':
  Test group: 0d 0h 1m 44s
  Control g.: 0d 0h 1m 44s

Transition from 'step_3' to 'confirm':
  Test group: 0d 0h 2m 13s
  Control g.: 0d 0h 2m 16s

Proportion of Test group sessions with errors: 20.56%
Proportion of Control g. sessions with errors: 18.39%


# 4. Hypotheses testing

## 4.1. completion rate
The Test group had a higher completion rate than the Control group.\
H0: The completion rates of the Test and Control groups are equal.\
H1: The completion rates of the two groups differ.
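
For reference, the statistic behind a two-proportion z-test can be written out by hand. The sketch below uses the pooled-variance form, which to my understanding is what `proportions_ztest` computes by default. The function name `two_proportion_ztest` and the counts are made up for illustration.

```python
from math import sqrt, erfc

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions, using the pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Made-up counts, not the notebook's data
z, p = two_proportion_ztest(x1=60, n1=100, x2=50, n2=100)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Because the standard error shrinks with the sample size, very large samples make even small differences in rates statistically significant.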

In [19]:
# Calculate completion rates and number of observations
x1, x2 = (df_test_group['process_step'] == 'confirm').sum(), (df_control_group['process_step'] == 'confirm').sum()
n1, n2 = len(df_test_group), len(df_control_group)

# Part 1: Two-sided test for completion rate comparison
stat, pval = proportions_ztest([x1, x2], [n1, n2], alternative='two-sided')
print("Part 1 - Completion Rate Comparison")
print(f"Z-statistic: {stat:.2f}, P-value: {pval:.4f}")

# Interpretation for Part 1
alpha = 0.05
if pval < alpha:
    print("Reject H0: Significant difference in completion rates between the Test and Control groups.")
else:
    print("Fail to reject H0: No significant difference in completion rates.")

# Part 2: Direct comparison against Control + 5 percentage points (not a separate statistical test)
# Adjusted Control completion rate: the observed rate plus 0.05
control_completion_rate_adjusted = (x2 + 0.05 * n2) / n2

# Comparing Test completion rate directly to adjusted Control rate
print("\nPart 2 - Completion Rate with Cost-Effectiveness Threshold")
completion_rate_test = x1 / n1
if completion_rate_test > control_completion_rate_adjusted and pval < alpha:
    print(f"Reject H0: Test group's completion rate exceeds Control's by >5%, indicating cost-effectiveness.")
else:
    print("Fail to reject H0: Test group's completion rate does not exceed Control's by >5%, indicating lack of cost-effectiveness.")

Part 1 - Completion Rate Comparison
Z-statistic: 10.87, P-value: 0.0000
Reject H0: Significant difference in completion rates between the Test and Control groups.

Part 2 - Completion Rate with Cost-Effectiveness Threshold
Fail to reject H0: Test group's completion rate does not exceed Control's by >5%, indicating lack of cost-effectiveness.
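
Part 2 compares the observed rates directly against the threshold. If a formal test of the threshold is wanted, one option is a one-sided z-test with the margin built into the null hypothesis. The sketch below is such a variant, assuming an unpooled standard error. The function name `ztest_with_margin` and the counts are hypothetical.

```python
from math import sqrt, erfc

def ztest_with_margin(x1, n1, x2, n2, margin):
    """One-sided z-test of H0: p1 - p2 <= margin against H1: p1 - p2 > margin."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error, since H0 no longer says the proportions are equal
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - margin) / se
    # Upper-tail p-value of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2))
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value

# Made-up counts: a 2-point lift tested against a 5-point threshold
z, p = ztest_with_margin(x1=520, n1=1000, x2=500, n2=1000, margin=0.05)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

A large p-value here means the data give no evidence that the lift exceeds the margin, which matches the reading of Part 2 above.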


## 4.2. error rate
The Test group had a higher error rate (20.56%) than the Control group (18.39%).\
H0: The error rates of the Test and Control groups are equal.\
H1: The error rates of the two groups differ.

In [20]:
# Calculate error rates and number of observations
x1, x2 = error_sessions_test, error_sessions_control
n1, n2 = total_sessions_test, total_sessions_control

# Part 1: Two-sided test for error rate comparison
stat, pval = proportions_ztest([x1, x2], [n1, n2], alternative='two-sided')
print("Error Rate Comparison Between Test and Control Groups")
print(f"Z-statistic: {stat:.2f}, P-value: {pval:.4f}")

# Interpretation
alpha = 0.05
if pval < alpha:
    print("Reject H0: Significant difference in error rates between the Test and Control groups.")
else:
    print("Fail to reject H0: No significant difference in error rates.")

Error Rate Comparison Between Test and Control Groups
Z-statistic: 12.25, P-value: 0.0000
Reject H0: Significant difference in error rates between the Test and Control groups.


## 4.3. t-test of various means between the groups

In [21]:
def compare_groups(df, column_name, group_col='experiment_group', test_label='Test', control_label='Control'):
    # Data filtering
    test_group = df[df[group_col] == test_label][column_name]
    control_group = df[df[group_col] == control_label][column_name]
    
    # Calculate means
    test_mean = test_group.mean()
    control_mean = control_group.mean()
    
    # Perform t-test
    stat, p = ttest_ind(test_group, control_group)
    
    # Print results
    print(f"\nComparison based on {column_name}:")
    print(f"The average {column_name} of the Test users: {test_mean:.3f}")
    print(f"The average {column_name} of the Control users: {control_mean:.3f}")
    print(f"{column_name.capitalize()} comparison between groups - T-statistic: {stat:.3f}, P-value: {p:.3f}")
    
    # Interpretation
    if p < 0.05:
        print(f"With the P-value {p:.3f}, we reject the H0.")
        print(f"The average {column_name} of Test group users is significantly different than that of Control group.\n")
    else:
        print(f"With the P-value {p:.3f}, we fail to reject the H0.")
        print(f"The average {column_name} of Test group users is not significantly different than that of Control group.\n")

# Example usage
compare_groups(df_client_profiles, 'client_age')
compare_groups(df_client_profiles, 'client_tenure_in_years')
compare_groups(df_client_profiles, 'number_of_accounts')
compare_groups(df_client_profiles, 'balance')
compare_groups(df_client_profiles, 'calls_per_year')
compare_groups(df_client_profiles, 'logons_per_year')


Comparison based on client_age:
The average client_age of the Test users: 46.885
The average client_age of the Control users: 47.257
Client_age comparison between groups - T-statistic: -2.683, P-value: 0.007
With the P-value 0.007, we reject the H0.
The average client_age of Test group users is significantly different than that of Control group.


Comparison based on client_tenure_in_years:
The average client_tenure_in_years of the Test users: 11.983
The average client_tenure_in_years of the Control users: 12.088
Client_tenure_in_years comparison between groups - T-statistic: -1.715, P-value: 0.086
With the P-value 0.086, we fail to reject the H0.
The average client_tenure_in_years of Test group users is not significantly different than that of Control group.


Comparison based on number_of_accounts:
The average number_of_accounts of the Test users: 2.250
The average number_of_accounts of the Control users: 2.260
Number_of_accounts comparison between groups - T-statistic: -2.102, P-va
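
`ttest_ind` assumes equal variances by default. When the two groups may differ in spread, or a column contains missing values, Welch's variant with `equal_var=False` and `nan_policy='omit'` is a common alternative. The sketch below uses made-up samples and is not a rerun of the comparisons above.

```python
import numpy as np
from scipy.stats import ttest_ind

# Made-up samples with unequal spread and one missing value
test_values = np.array([46.0, 52.0, 39.0, 61.0, np.nan, 44.0])
control_values = np.array([47.0, 48.0, 46.5, 47.5, 48.5, 47.0])

# equal_var=False requests Welch's t-test; nan_policy='omit' skips NaNs
stat, p = ttest_ind(test_values, control_values, equal_var=False, nan_policy="omit")
print(f"Welch t = {stat:.3f}, p = {p:.3f}")
```

With samples as large as the client profiles, a statistically significant result such as the age difference (46.885 vs 47.257) can still be small in practical terms.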

# 5. Experiment evaluation

In [22]:
# The groups differ in size by about 4% (relative to their mean size, counted in event rows), a modest imbalance.
# The experiment duration of 97 days seems sufficient.

print(f"Test group size: {len(df_test_group)}")
print(f"Control group size: {len(df_control_group)}")
balance = abs(len(df_test_group) - len(df_control_group)) / ((len(df_test_group) + len(df_control_group)) / 2)
print(f"Balance between groups: {balance:.2f}\n")

experiment_start = df_digital_footprint['date_time'].min()
experiment_end = df_digital_footprint['date_time'].max()
experiment_duration = experiment_end - experiment_start

print(f"Experiment duration: {experiment_duration.days} days")

Test group size: 380371
Control group size: 364270
Balance between groups: 0.04

Experiment duration: 97 days
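
The balance figure above is descriptive. To check whether a split deviates from the intended allocation by more than chance, a chi-square goodness-of-fit test on the number of clients per group is one option. The sketch below assumes an intended 50/50 split, which the notebook does not state. The function name `sample_ratio_check` and the counts are made up.

```python
from math import sqrt, erfc

def sample_ratio_check(n_test, n_control, expected_test_share=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for the group split."""
    total = n_test + n_control
    exp_test = total * expected_test_share
    exp_control = total - exp_test
    chi2 = (n_test - exp_test) ** 2 / exp_test + (n_control - exp_control) ** 2 / exp_control
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Made-up client counts, not taken from the notebook
chi2, p = sample_ratio_check(n_test=5100, n_control=4900)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The inputs should be counts of unique clients rather than event rows, since clients are the unit that was assigned to a group.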
