# Evaluation of Subscription Renewal Reminder Feature with A/B Testing

## Overview

The goal of this project is to use A/B testing to determine if a subscription renewal reminder email at the end of a trial period is beneficial to the business. Like most subscription services, this one automatically charges users the renewal fee if they do not cancel before the end of their trial period.

#### Data Used
- experiment_data.csv

This dataset contains data from an experiment on users of a subscription service who had a 7-day trial. In one variant (the "YES" variant), users were sent a renewal reminder email 24 hours before the trial expiration; in the other (the "NO" variant), users did not receive a reminder email. Users were assigned randomly to one of the variants when they signed up for the 7-day trial.

#### Approach
I focused on two KPIs for this exercise: **user acquisition rate** and **expected revenue per account**. I looked at both because a higher trial sign up rate doesn't necessarily result in more revenue. My goal is to help the business increase its profit, while also taking into account user experience and ethics.

#### Results 
In the end, I recommended the implementation of the subscription renewal reminder feature for users of all financial statuses. Even though the feature only led to an increase in expected revenue from users with fair and good financial statuses, it might not be worth it to sacrifice the consistency in user experience in this specific case, considering the possible backlash over ethical issues as well.

## Libraries

In [10]:
import pandas as pd
import numpy as np
from statsmodels.stats import weightstats as stests

## Data Load

ab_data = pd.read_csv("experiment_data.csv")

For my analysis, I will be considering the group without the reminder email to be the control group, since the reminder feature can be considered as an enhancement that we want to test.

In [15]:
# breaking up the data into control vs. test for hypothesis testing
control = ab_data[ab_data['exp_name']=='NO']
test = ab_data[ab_data['exp_name']=='YES']

In [16]:
# sanity check: making sure that % is roughly 50%
control_prop = control.shape[0]/ab_data.shape[0]
test_prop = test.shape[0]/ab_data.shape[0]
print('% of samples in control: ' + str(control_prop))
print('% of samples in test: ' + str(test_prop))

% of samples in control: 0.4967359119290746
% of samples in test: 0.5032640880709254
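
Whether a split like this is within chance variation of the intended 50/50 assignment can be checked with a chi-square goodness-of-fit test on the group sizes. The sketch below is an illustrative addition, not part of the original analysis: it assumes the intended allocation was exactly 50/50, takes the group sizes (171,281 and 173,532) from the `total` column reported in the next section, and uses a hypothetical helper name `srm_chi_square`.

```python
import math

def srm_chi_square(n_control, n_test):
    """Chi-square goodness-of-fit test (1 degree of freedom) against a 50/50 split.

    Hypothetical helper for illustration; returns (chi2 statistic, p-value).
    """
    expected = (n_control + n_test) / 2
    chi2 = ((n_control - expected) ** 2 + (n_test - expected) ** 2) / expected
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Group sizes as reported in the summary statistics (NO = control, YES = test).
chi2, p_value = srm_chi_square(n_control=171281, n_test=173532)
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.2g}")
```

A small p-value here would suggest that the assignment deviates from 50/50 by more than chance alone, which would be worth raising with whoever owns the experiment setup before relying on the results.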


## Feature Evaluation

### User Acquisition Metric

#### Initial Hypothesis
The first question I want to explore is: did the reminder feature encourage more people to sign up for a free trial?

My hypothesis is that the feature would increase the trial sign up rate, as it offers users peace of mind: with a reminder email, they are less likely to forget to actively make a decision at the end of the trial period.

In [17]:
# quick look at summary stats
ab_summary = ab_data.pivot_table(values='enter_cc', index='exp_name', aggfunc=np.sum)
ab_summary['total'] = ab_data.pivot_table(values='enter_cc', index='exp_name', aggfunc=lambda x: len(x))
ab_summary['rate'] = ab_data.pivot_table(values='enter_cc', index='exp_name')
ab_summary

| exp_name | enter_cc | total | rate |
|---|---|---|---|
| NO | 23824 | 171281 | 0.139093 |
| YES | 26243 | 173532 | 0.151229 |


The pivot table above suggests that the reminder feature slightly increased the trial sign up rate (from about 13.9% to about 15.1%). Let's see if the effect differs based on a user's financial status.

In [18]:
# quick look at summary stats
ab_summary = ab_data.pivot_table(values='enter_cc', index=['exp_name','fin_status'], aggfunc=np.sum)
ab_summary['total'] = ab_data.pivot_table(values='enter_cc', index=['exp_name','fin_status'], aggfunc=lambda x: len(x))
ab_summary['rate'] = ab_data.pivot_table(values='enter_cc', index=['exp_name','fin_status'])
ab_summary

| exp_name | fin_status | enter_cc | total | rate |
|---|---|---|---|---|
| NO | Fair | 9609 | 71405 | 0.13457 |
| NO | Good | 9133 | 49197 | 0.185641 |
| NO | Poor | 5082 | 50679 | 0.100278 |
| YES | Fair | 10191 | 71870 | 0.141798 |
| YES | Good | 11053 | 50640 | 0.218266 |
| YES | Poor | 4999 | 51022 | 0.097977 |


Interestingly, the trial sign up rate for users with a poor financial status seems to have decreased. This is surprising to me, as I would think that the reminder would have the biggest positive impact on users with a poor financial status.

I will perform hypothesis testing to see if the following are statistically significant:
- the overall improvement in trial sign up rate
- the decrease in trial sign up rate for users with a poor financial status

#### Probability Distribution
I assume that the distribution is binomial for both groups because the data is a series of Bernoulli trials, where each trial results in either a trial sign up or no sign up. 

X ~ Bernoulli(p), where p is the probability of signing up for a trial
<br> E(X) = p
<br> Var(X) = p(1-p)

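Since we have more than 170k observations in each group, we can use the Central Limit Theorem to aid our analysis. The sample mean of X (the observed sign up rate, written X_bar below) can then be assumed to be approximately normally distributed with the following properties.

X_bar ~ Normal(p, sqrt(p * (1-p))/sqrt(n)), where p is defined above, n is the sample size, and the second parameter is the standard deviation of X_bar (its standard error)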
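
As a rough illustration of the approximation above, the standard error of each group's observed sign up rate can be computed directly from the counts in the overall summary table (`enter_cc` sum and group total). The helper name `rate_and_standard_error` is my own and only for illustration.

```python
import math

def rate_and_standard_error(successes, n):
    """Return the observed rate p and its standard error sqrt(p * (1 - p) / n)."""
    p = successes / n
    return p, math.sqrt(p * (1 - p) / n)

# Counts taken from the overall summary table: (variant, sign ups, users).
for name, successes, n in [("NO", 23824, 171281), ("YES", 26243, 173532)]:
    p, se = rate_and_standard_error(successes, n)
    print(f"{name}: rate = {p:.6f}, standard error = {se:.6f}")
```

With samples this large, the standard errors are under a tenth of a percentage point, which is why even a difference of about one percentage point between the groups can be detected.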

#### Defining Null and Alternative Hypotheses

Now, we are ready to define the null and alternative hypotheses.

To test overall improvement in trial sign up rate:
- Null: There would be either no change or a decrease in the trial sign up rate due to the reminder (i.e. the true sign up rate of the test group minus that of the control group is less than or equal to 0)
- Alternative: There would be an increase in trial sign up rate due to the reminder (i.e. the true sign up rate of the test group minus that of the control group is larger than 0)

To test the decrease in trial sign up rate for users with a poor financial status:
- Null: There would be either no change or an increase in the trial sign up rate for poor users due to the reminder (i.e. the true sign up rate of the test group minus that of the control group is more than or equal to 0)
- Alternative: There would be a decrease in trial sign up rate for poor users due to the reminder (i.e. the true sign up rate of the test group minus that of the control group is less than 0)

A basic property of variance is that the variance of the sum or difference of two independent random variables is the sum of their variances. The difference between the two observed sign up rates therefore has a variance equal to the sum of the two groups' sampling variances, and under the null hypothesis of no difference both groups share a common rate. This is important as it allows us to use a pooled standard error in the Z-test below.
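
To make the pooled standard error concrete, here is a minimal sketch of a pooled two-proportion Z-test computed by hand from the counts in the summary tables. It is my own illustration rather than the exact computation `statsmodels` performs (which pools the two sample variances), so the results should agree closely but not necessarily to the last digit. The function name `pooled_two_proportion_ztest` is hypothetical.

```python
import math

def pooled_two_proportion_ztest(x_test, n_test, x_control, n_control):
    """Return (z, p_larger, p_smaller) for the difference test rate - control rate.

    p_larger is the one-sided p-value for H1: test rate > control rate,
    p_smaller is the one-sided p-value for H1: test rate < control rate.
    """
    p_test = x_test / n_test
    p_control = x_control / n_control
    # Under the null of no difference, both groups share one pooled rate.
    p_pool = (x_test + x_control) / (n_test + n_control)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / se
    p_larger = 0.5 * math.erfc(z / math.sqrt(2))    # P(Z >= z)
    p_smaller = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    return z, p_larger, p_smaller

# Overall comparison (YES vs. NO), counts from the overall summary table.
z_all, p_larger_all, _ = pooled_two_proportion_ztest(26243, 173532, 23824, 171281)
# Poor segment only, counts from the summary table by financial status.
z_poor, _, p_smaller_poor = pooled_two_proportion_ztest(4999, 51022, 5082, 50679)

print(f"overall: z = {z_all:.2f}, one-sided p (larger) = {p_larger_all:.3g}")
print(f"poor:    z = {z_poor:.2f}, one-sided p (smaller) = {p_smaller_poor:.3g}")
```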

Since each alternative hypothesis is directional, I will use a one-tailed two-sample Z-test (`alternative='larger'` or `alternative='smaller'`). By default, the function pools the variance, i.e. it assumes the standard deviation of the two samples to be the same.

Details of function: https://www.statsmodels.org/stable/generated/statsmodels.stats.weightstats.ztest.html

In [19]:
# one-tailed two-sample Z-test: is the sign up rate larger in test than in control?
zstat, pval = stests.ztest(x1=test['enter_cc'],
                           x2=control['enter_cc'],
                           value=0,
                           alternative='larger')
print("for testing the overall improvement in trial sign up rate, the p-value is " + str(float(pval)))

# same test restricted to users with a poor financial status, checking for a decrease
zstat, pval = stests.ztest(x1=test[test['fin_status']=='Poor']['enter_cc'],
                           x2=control[control['fin_status']=='Poor']['enter_cc'],
                           value=0,
                           alternative='smaller')
print("for testing the decrease in trial sign up rate for users with a poor financial status, the p-value is " + str(float(pval)))

for testing the overall improvement in trial sign up rate, the p-value is 2.374481637218152e-24
for testing the decrease in trial sign up rate for users with a poor financial status, the p-value is 0.1097751200649486


Since the p-value is extremely small when testing the overall improvement in trial sign up rate, I will reject the null hypothesis. 

Since the p-value is not small (>0.05) when testing the decrease in trial sign up rate for users with a poor financial status, I fail to reject the null hypothesis: the observed decrease for poor users is not statistically significant.

This suggests that the improvement in trial sign up rate with the email reminder feature is statistically significant overall.
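
Statistical significance alone does not say how large the lift is. As a supplementary sketch (not part of the original analysis), a 95% confidence interval for the difference in sign up rates can be derived from the same counts, using the unpooled standard error and the normal approximation. The helper name `diff_in_rates_ci` is an assumption for illustration.

```python
import math

def diff_in_rates_ci(x_test, n_test, x_control, n_control, z_crit=1.959964):
    """Return (difference, lower, upper) for test rate - control rate.

    Uses the normal approximation with an unpooled standard error;
    z_crit = 1.959964 corresponds to a two-sided 95% interval.
    """
    p_test = x_test / n_test
    p_control = x_control / n_control
    diff = p_test - p_control
    se = math.sqrt(p_test * (1 - p_test) / n_test
                   + p_control * (1 - p_control) / n_control)
    return diff, diff - z_crit * se, diff + z_crit * se

# Counts from the overall summary table (YES = test, NO = control).
diff, lower, upper = diff_in_rates_ci(26243, 173532, 23824, 171281)
print(f"difference = {diff:.4%}, 95% CI = [{lower:.4%}, {upper:.4%}]")
```

An interval that stays above zero is consistent with the one-tailed test result, and its width gives a sense of how precisely the lift is estimated.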

### Expected Revenue Per Account

Even though the feature seems to have slightly improved the trial sign up rate, it wouldn't necessarily result in more revenue. If there are more new trial users due to the email reminder feature, but only a small proportion remains as paid users after the trial period, this might result in a loss for the company. 

Hence, it is important to see if there is any change to expected revenue per account. I will start by first understanding more about the 'balance' column.

#### Definition of 'balance'

'balance' is defined as the following: 
- How much money the user has paid in cents
- If the user has any balance, this means that they converted to being a paying user
- In other words, if the balance is 0, the user either canceled their trial before the end of the trial period or was refunded afterwards

However, it is unclear to me if the balance value is life-to-date, or if it refers to the amount paid by a user immediately after a trial period. Normally, I would ask a colleague or the data owner. Given that this is a take-home challenge, I will do some analysis to help me make a reasonable assumption.

In [20]:
ab_data['yr'] = pd.DatetimeIndex(ab_data['created_at']).year
ab_data['month'] = pd.DatetimeIndex(ab_data['created_at']).month

In [21]:
ab_data['period'] = ab_data['yr'].astype(str) + '/' + ab_data['month'].astype(str)
ab_data.head()

| | id | created_at | enter_cc | balance | source_group | fin_status | total | refunded | exp_name | yr | month | period |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 679701 | 2019-08-05 23:43:29 UTC | 0 | 0 | SEO | Fair | | | YES | 2019 | 8 | 2019/8 |
| 1 | 1922889 | 2019-07-05 22:46:03 UTC | 0 | 0 | SEO | Fair | | | YES | 2019 | 7 | 2019/7 |
| 2 | 3248979 | 2019-08-20 18:09:00 UTC | 0 | 0 | SEO | Fair | | | YES | 2019 | 8 | 2019/8 |
| 3 | 703675 | 2019-07-29 07:23:41 UTC | 0 | 0 | SEO | Fair | | | YES | 2019 | 7 | 2019/7 |
| 4 | 2172213 | 2019-09-15 09:26:03 UTC | 0 | 0 | SEO | Fair | | | YES | 2019 | 9 | 2019/9 |


In [22]:
# making sure that people who didn't start a trial are not direct-to-paid users
ab_data[ab_data['enter_cc']==0]['balance'].value_counts()

0    294746
Name: balance, dtype: int64

In [23]:
ab_data.groupby(['period','balance']).count()['id']

period   balance
2019/10  0           3402
         16200          5
         18000         40
         23400         16
         26000         84
         32400          6
         36000         94
         46800         44
         52000        180
2019/6   0          58264
         16200         51
         18000        718
         23400        196
         26000       1318
         32400        128
         36000       1832
         46800        648
         52000       3408
2019/7   0          80481
         16200         67
         18000       1032
         23400        277
         26000       1869
         32400        238
         36000       2540
         46800        912
         52000       4780
2019/8   0          80955
         16200         82
         18000        963
         23400        261
         26000       1841
         32400        180
         36000       2604
         46800        886
         52000       4862
2019/9   0          78110
         16200       

I am going to assume that balance represents the amount paid by a user immediately after a trial period, for the following reasons:
- The set of distinct balance values is the same for every account creation period; it does not grow for older accounts.
- If balance were a life-to-date (LTD) value, then there should be higher amounts in earlier account creation periods, assuming that the data "snapshot" was taken on the same day.
- The max value for each period is $520 (52000 cents), which is 2 months of regular subscription fee. It is highly unlikely that every user decides to stop using the service after exactly 2 months. It is more likely that there is an option to pre-pay for 2 months of service at the end of a trial, even though it doesn't make sense to do so since there is no discount.
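
The first and third points can be checked directly against the counts above. The sketch below hard-codes the non-zero balance values listed for each fully shown period in the output (2019/9 is omitted because its output is cut off) and confirms that every period has the same set of values and the same maximum. The variable names are my own.

```python
# Non-zero balance values (in cents) per account creation period,
# copied from the groupby output above; 2019/9 is left out because it is truncated.
balances_by_period = {
    "2019/6":  {16200, 18000, 23400, 26000, 32400, 36000, 46800, 52000},
    "2019/7":  {16200, 18000, 23400, 26000, 32400, 36000, 46800, 52000},
    "2019/8":  {16200, 18000, 23400, 26000, 32400, 36000, 46800, 52000},
    "2019/10": {16200, 18000, 23400, 26000, 32400, 36000, 46800, 52000},
}

# If balance were life-to-date, older periods should show more (and larger) values.
distinct_sets = {frozenset(values) for values in balances_by_period.values()}
same_values_every_period = len(distinct_sets) == 1
max_by_period = {period: max(values) for period, values in balances_by_period.items()}

print("same set of values in every period:", same_values_every_period)
print("max balance per period:", max_by_period)
```

On the full dataset, the same check could be run without hard-coding anything, for example with `ab_data.groupby('period')['balance'].agg(['nunique', 'max'])`.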

#### Initial Hypothesis

In [24]:
# quick look at summary stats
ab_summary = ab_data.pivot_table(values='balance', index='exp_name', aggfunc=np.sum)
ab_summary['total'] = ab_data.pivot_table(values='balance', index='exp_name', aggfunc=lambda x: len(x))
ab_summary['per_account'] = ab_summary['balance'] / ab_summary['total']
ab_summary

| exp_name | balance | total | per_account |
|---|---|---|---|
| NO | 793591600 | 171281 | 4633.272809 |
| YES | 942051400 | 173532 | 5428.689809 |


The pivot table above suggests that the reminder feature increased the expected revenue per account overall (from about 4633 to about 5429 cents). Again, let's see if this changes for users of different financial statuses.

In [25]:
# quick look at summary stats
ab_summary = ab_data.pivot_table(values='balance', index=['exp_name','fin_status'], aggfunc=np.sum)
ab_summary['total'] = ab_data.pivot_table(values='balance', index=['exp_name','fin_status'], aggfunc=lambda x: len(x))
ab_summary['per_account'] = ab_summary['balance'] / ab_summary['total']
ab_summary

| exp_name | fin_status | balance | total | per_account |
|---|---|---|---|---|
| NO | Fair | 342928800 | 71405 | 4802.588054 |
| NO | Good | 327605800 | 49197 | 6659.060512 |
| NO | Poor | 123057000 | 50679 | 2428.165512 |
| YES | Fair | 369643800 | 71870 | 5143.228051 |
| YES | Good | 457436200 | 50640 | 9033.100316 |
| YES | Poor | 114971400 | 51022 | 2253.369135 |


It seems like the expected revenue per account has gone down for poor users, which is not surprising given their financial standing. 

I will perform hypothesis testing to see if the following are statistically significant:
- the overall improvement in expected revenue per account
- the decrease in expected revenue per account for users with a poor financial status

#### Probability Distribution

Again, given the large sample size, I can safely assume that the sample mean of balance is approximately normally distributed, even though individual balances are not.

#### Defining Null and Alternative Hypotheses

To test overall improvement in expected revenue per account:
- Null: There would be either no change or a decrease in the expected revenue per account due to the reminder (i.e. the true mean balance of the test group minus that of the control group is less than or equal to 0)
- Alternative: There would be an increase in the expected revenue per account due to the reminder (i.e. the true mean balance of the test group minus that of the control group is larger than 0)

To test the decrease in expected revenue per account for users with a poor financial status:
- Null: There would be either no change or an increase in the expected revenue per account for poor users due to the reminder (i.e. the true mean balance of the test group minus that of the control group is more than or equal to 0)
- Alternative: There would be a decrease in the expected revenue per account for poor users due to the reminder (i.e. the true mean balance of the test group minus that of the control group is less than 0)

In [26]:
# one-tailed two-sample Z-test: is the mean balance larger in test than in control?
zstat, pval = stests.ztest(x1=test['balance'],
                           x2=control['balance'],
                           value=0,
                           alternative='larger')
print("for testing the overall improvement in expected revenue per account, the p-value is " + str(float(pval)))

# same test restricted to users with a poor financial status, checking for a decrease
zstat, pval = stests.ztest(x1=test[test['fin_status']=='Poor']['balance'],
                           x2=control[control['fin_status']=='Poor']['balance'],
                           value=0,
                           alternative='smaller')
print("for testing the decrease in expected revenue per account for users with a poor financial status, the p-value is " + str(float(pval)))

for testing the overall improvement in expected revenue per account, the p-value is 1.9112358907209333e-63
for testing the decrease in expected revenue per account for users with a poor financial status, the p-value is 0.000470664315266027


Since both p-values are small (<0.05), I will reject both null hypotheses. This suggests that the improvement in expected revenue per account with the email reminder feature is statistically significant overall; the summary by financial status above indicates that the gain comes from users with fair or good financial statuses. For poor users, there is a statistically significant decrease in expected revenue per account.

## Recommendation

For users with fair and good financial statuses, it is highly likely that the trial ending reminder would lead to an overall improvement in trial sign up rate and expected revenue per account. For poor users, I observed a statistically significant decrease in expected revenue per account with the trial ending reminder, without a big difference in trial sign up rate. 

It could be tempting to implement the feature only for users with fair and good financial statuses. However, it might not be worth it to sacrifice the consistency in user experience in this specific case, considering the possible backlash over ethical issues as well.

#### Therefore, based only on the analysis I have conducted in this notebook, I would recommend implementing the feature for all users.

## Future Enhancements

#### Improvement to 'Expected Revenue Per Account' Analysis

In my analysis, when looking at expected revenue per account, both of the following groups have a balance of 0:
- users who did not sign up for a trial
- users who signed up for a trial and then churned before converting to paid

One way to improve this analysis is to change the balance of the second group to a negative number, since the company actually loses money on these users by paying for their trials. For instance, for a student user who churns at the end of a trial, the balance could be -4500 since the student fee is $45 per week.
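
A minimal sketch of this adjustment is shown below on a few made-up rows. The `TRIAL_COST_CENTS` value and the `adjusted_balance` column name are placeholders I introduce for illustration; the real cost of serving a trial would have to come from the business and may differ by plan.

```python
import numpy as np
import pandas as pd

# Placeholder cost in cents, taken from the example above; not a confirmed figure.
TRIAL_COST_CENTS = 4500

# Made-up rows for illustration only (not taken from experiment_data.csv).
toy = pd.DataFrame({
    "enter_cc": [0, 1, 1],
    "balance":  [0, 0, 26000],
})

# Trial users who never paid get a negative balance; everyone else keeps theirs.
churned_after_trial = (toy["enter_cc"] == 1) & (toy["balance"] == 0)
toy["adjusted_balance"] = np.where(churned_after_trial, -TRIAL_COST_CENTS, toy["balance"])
print(toy)
```

The expected revenue per account and the Z-test could then be recomputed on `adjusted_balance` instead of `balance`.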

#### Trial-To-Paid Conversion Rate Analysis

Instead of or in addition to looking at the expected revenue per account, I could also look at the trial to paid conversion rate. This analysis focuses specifically on users who have decided to start a trial. It is important to analyze the user retention rate at the end of a trial period, since more trial sign ups may not be beneficial if users are not converting to paid plans afterwards. 
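
One way this could be computed is sketched below on a few made-up rows, assuming (as defined earlier) that `enter_cc == 1` marks a trial sign up and a positive `balance` marks a paying user. The `converted` column is a name I introduce for illustration.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from experiment_data.csv).
toy = pd.DataFrame({
    "exp_name": ["NO", "NO", "NO", "YES", "YES", "YES"],
    "enter_cc": [1, 1, 0, 1, 1, 0],
    "balance":  [26000, 0, 0, 52000, 18000, 0],
})

# Keep only users who started a trial, then flag those who ended up paying.
trial_users = toy[toy["enter_cc"] == 1].copy()
trial_users["converted"] = (trial_users["balance"] > 0).astype(int)

# Paid users, trial users, and trial-to-paid conversion rate per variant.
conversion = trial_users.groupby("exp_name")["converted"].agg(["sum", "count", "mean"])
print(conversion)
```

The `converted` column could then be compared between variants with the same `stests.ztest` call used for the other metrics.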

#### User Segmentation

It would also be interesting to see if any of the metrics changes based on the source group. Since users from different source groups have different characteristics, I might find more valuable insights by analyzing based on user groups.

## References

I found the following articles very helpful when conducting this analysis:
- [The Math Behind A/B Testing with Example Python Code](https://towardsdatascience.com/the-math-behind-a-b-testing-with-example-code-part-1-of-2-7be752e1d06f)
- [How to analyze A/B testing result with Python?](https://towardsdatascience.com/how-to-analyze-a-b-testing-result-with-python-600eea37530d)
- [Hypothesis Testing in Machine Learning using Python](https://towardsdatascience.com/hypothesis-testing-in-machine-learning-using-python-a0dc89e169ce)