# Regression in Payments A/B Testing

This case study is taken from the [Python Causality Handbook GitHub repository](https://github.com/matheusfacure/python-causality-handbook/tree/master).

The task is to determine the impact of sending an email that invites people to negotiate their debt. The outcome of interest is the amount of payments made by late customers.

After following steps 1-4 from the previous notebook on A/B tests, the data science team randomly selects 5,000 customers from the pool of late customers to conduct a randomized test.

Each customer is randomly assigned either to receive the email or to the control group. The goal is to estimate how much additional money the email generates.

**Data dictionary:**

- payments: amount of payments made by the customer
- email: whether the customer received the email (1) or not (0)
- opened: whether the customer opened the email (1) or not (0)
- agreement: whether the customer contacted the collections department to negotiate the debt (1) or not (0), after receiving the email
- credit_limit: the customer's credit line before the customer became late
- risk_score: estimated risk score of the customer prior to receiving the email

In [1]:
import pandas as pd
import statsmodels.formula.api as smf
import scipy.stats as stats

## Check the data

In [2]:
# Read data
df = pd.read_csv('data/collections_email.csv')
df.head()

,payments,email,opened,agreement,credit_limit,risk_score
0,740,1,1.0,0.0,2348.49526,0.666752
1,580,1,1.0,1.0,334.111969,0.207395
2,600,1,1.0,1.0,1360.660722,0.550479
3,770,0,0.0,0.0,1531.828576,0.560488
4,660,0,0.0,0.0,979.855647,0.45514


In [3]:
# Descriptive Statistics
df.describe()

,payments,email,opened,agreement,credit_limit,risk_score
count,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0
mean,669.672,0.4908,0.2734,0.1608,1194.845188,0.480812
std,103.970065,0.499965,0.445749,0.367383,480.978996,0.100376
min,330.0,0.0,0.0,0.0,193.695573,0.131784
25%,600.0,0.0,0.0,0.0,843.049867,0.414027
50%,670.0,0.0,0.0,0.0,1127.640297,0.486389
75%,730.0,1.0,1.0,0.0,1469.096523,0.552727
max,1140.0,1.0,1.0,1.0,3882.178408,0.773459


## Regression for A/B Testing

What is the effect of sending an email to late customers on the amount of payments made?

$$
\text{Payments} = \beta_0 + \beta_1 \times \text{Email} + \epsilon
$$
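
With a binary regressor, the OLS slope $\beta_1$ equals the difference in mean payments between the email and control groups, and $\beta_0$ equals the control-group mean. The sketch below checks this on a tiny made-up sample; the numbers are hypothetical and are not taken from collections_email.csv.

```python
from statistics import mean

# Hypothetical toy data (NOT from collections_email.csv)
email = [0, 0, 0, 1, 1, 1]
payments = [650.0, 700.0, 660.0, 640.0, 720.0, 680.0]


def ols_slope_intercept(x, y):
    """Closed-form simple OLS: slope = cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    slope = cov_xy / var_x
    return slope, y_bar - slope * x_bar


beta_1, beta_0 = ols_slope_intercept(email, payments)

treated_mean = mean(p for e, p in zip(email, payments) if e == 1)
control_mean = mean(p for e, p in zip(email, payments) if e == 0)

print(f"OLS slope:           {beta_1:.2f}")
print(f"Difference in means: {treated_mean - control_mean:.2f}")
print(f"OLS intercept:       {beta_0:.2f} (control mean: {control_mean:.2f})")
```

This is why a regression of payments on email alone can stand in for a two-sample comparison of means in an A/B test.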

In [4]:
model_base = ('payments ~ email')
base = smf.ols(model_base, data=df)
results_base = base.fit(cov_type='HC1')
results_base.summary()

Dep. Variable:,payments,R-squared:,0.0
Model:,OLS,Adj. R-squared:,-0.0
Method:,Least Squares,F-statistic:,0.04453
Date:,"Wed, 31 Jul 2024",Prob (F-statistic):,0.833
Time:,18:13:43,Log-Likelihood:,-30315.0
No. Observations:,5000,AIC:,60630.0
Df Residuals:,4998,BIC:,60650.0
Df Model:,1,,
Covariance Type:,HC1,,

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,669.9764,2.097,319.515,0.000,665.867,674.086
email,-0.6203,2.940,-0.211,0.833,-6.382,5.141

Omnibus:,75.007,Durbin-Watson:,1.991
Prob(Omnibus):,0.0,Jarque-Bera (JB):,80.378
Skew:,0.277,Prob(JB):,3.52e-18
Kurtosis:,3.279,Cond. No.,2.6


In [5]:
base_est = results_base.params['email']
base_se = results_base.bse['email']
base_ci = results_base.conf_int().loc['email']
print(f"The effect of the treatment is: {base_est:.2f}")
print(f"Standard error: {base_se:.4f}")
print(f"Confidence interval: [{base_ci[0]:.2f}, {base_ci[1]:.2f}]")

The effect of the treatment is: -0.62
Standard error: 2.9395
Confidence interval: [-6.38, 5.14]


What can you conclude? Does sending an email increase the amount of payments made by late customers? Is this difference statistically significant?

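- The point estimate is -0.62: customers who received the email paid about \$0.62 less on average. However, the difference is not statistically significant (p = 0.833), and the 95% confidence interval [-6.38, 5.14] includes zero, so this model cannot tell the email's effect apart from zero.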
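
The test statistic, p-value, and confidence interval for the email coefficient can be recomputed by hand from the reported estimate and robust standard error. The sketch below assumes the normal approximation, which is consistent with the summary table reporting z and P>|z| rather than t; the two input numbers are copied from the output above.

```python
from statistics import NormalDist

# Values copied from the regression output for the email coefficient
estimate = -0.6203
std_err = 2.9395

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

z = estimate / std_err
p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
z_crit = std_normal.inv_cdf(0.975)          # about 1.96 for a 95% interval
ci_low = estimate - z_crit * std_err
ci_high = estimate + z_crit * std_err

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

The results should reproduce the z, P>|z|, and interval columns of the summary table up to rounding.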

## Comparing the groups

Are the groups similar in terms of credit limit and risk score?

In [6]:
df['high_risk'] = df['risk_score'] > 0.5
df['high_credit'] = df['credit_limit'] > 1200
payment_grouped = df.groupby(['high_risk', 'high_credit', 'email']).agg({'payments': 'mean'}).unstack()
payment_grouped['diff'] = payment_grouped['payments'][1] - payment_grouped['payments'][0]
payment_grouped.round(0).astype(int)

high_risk,high_credit,payments (email=0),payments (email=1),diff
False,False,612,621,9
False,True,679,694,15
True,False,656,660,4
True,True,739,734,-5


In [7]:
# Compare the average payments between treatment and control groups for low-risk customers
t_results = stats.ttest_ind(df.loc[(df.high_risk == False) & (df.email == 0), 'payments'],
                            df.loc[(df.high_risk == False) & (df.email == 1), 'payments'])
print(f"t-statistic: {t_results.statistic:.4f} with p-value: {t_results.pvalue:.4f}")

t-statistic: -3.2049 with p-value: 0.0014


In [8]:
# Compare the average payments between treatment and control groups for low-credit customers
t_results = stats.ttest_ind(df.loc[(df.high_credit == False) & (df.email == 0), 'payments'],
                            df.loc[(df.high_credit == False) & (df.email == 1), 'payments'])
print(f"t-statistic: {t_results.statistic:.4f} with p-value: {t_results.pvalue:.4f}")

t-statistic: -2.9775 with p-value: 0.0029


In [9]:
# Balance check: regress treatment assignment on the high-risk indicator
model_base = ('email ~ high_risk')
base = smf.ols(model_base, data=df)
results_base = base.fit(cov_type='HC1')
results_base.summary().tables[1]

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,0.5061,0.009,53.308,0.000,0.488,0.525
high_risk[T.True],-0.0344,0.014,-2.422,0.015,-0.062,-0.007


In [10]:
# Balance check: regress treatment assignment on the high-credit indicator
model_base = ('email ~ high_credit')
base = smf.ols(model_base, data=df)
results_base = base.fit(cov_type='HC1')
results_base.summary().tables[1]

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,0.5047,0.009,53.286,0.000,0.486,0.523
high_credit[T.True],-0.0313,0.014,-2.202,0.028,-0.059,-0.003


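The two regressions of email on the group indicators show small but statistically significant imbalances: high-risk customers were about 3.4 percentage points less likely to receive the email (p = 0.015), and high-credit customers about 3.1 percentage points less likely (p = 0.028). Because assignment was random, these differences arose by chance, but they are a reason to adjust for both covariates.

If a variable is a good predictor of the outcome, it explains a large share of the outcome's variance.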

If risk score and credit limit are strong predictors of payment behavior, controlling for them can help isolate the effect of the email on payments. 

Therefore, if we compare customers with similar risk and credit limits, the variation in payment amounts should be reduced. 

In other words, if risk and credit limit accurately predict payment levels, customers with similar risk and credit limits will have more consistent payment behaviors, resulting in less variability.
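
The variance-reduction argument can be illustrated with a small simulation. This is a simplified sketch, not the regression used in this notebook: the data-generating process, the coefficients, and the helper diff_in_means are all assumptions made for illustration, and the covariate is removed by residualizing payments on credit limit before comparing group means.

```python
import random
from statistics import fmean, variance

random.seed(0)
n = 2000

# Hypothetical data-generating process, chosen only for illustration:
# credit limit strongly predicts payments, and the true email effect is 5.
credit = [random.gauss(1200, 480) for _ in range(n)]
email = [random.randint(0, 1) for _ in range(n)]
payments = [500 + 0.15 * c + 5 * e + random.gauss(0, 70)
            for c, e in zip(credit, email)]


def diff_in_means(y, d):
    """Return the treated-minus-control mean difference and its standard error."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    estimate = fmean(treated) - fmean(control)
    std_err = (variance(treated) / len(treated)
               + variance(control) / len(control)) ** 0.5
    return estimate, std_err


# Remove the part of payments that credit limit predicts (simple OLS fit).
c_bar, p_bar = fmean(credit), fmean(payments)
slope = (sum((c - c_bar) * (p - p_bar) for c, p in zip(credit, payments))
         / sum((c - c_bar) ** 2 for c in credit))
residual = [p - (p_bar + slope * (c - c_bar)) for c, p in zip(credit, payments)]

raw_est, raw_se = diff_in_means(payments, email)
adj_est, adj_se = diff_in_means(residual, email)

print(f"Unadjusted: estimate {raw_est:.2f}, standard error {raw_se:.2f}")
print(f"Adjusted:   estimate {adj_est:.2f}, standard error {adj_se:.2f}")
```

In this setup the adjusted comparison has a smaller standard error because the residuals no longer carry the payment variation that credit limit explains.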

In [11]:
model_add = ('payments ~ email + risk_score + credit_limit')
add = smf.ols(model_add, data=df)
results_add = add.fit(cov_type='HC1')
results_add.summary().tables[1]

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,490.8653,10.196,48.141,0.000,470.881,510.850
email,4.4304,2.126,2.084,0.037,0.263,8.598
risk_score,-8.0516,40.488,-0.199,0.842,-87.406,71.303
credit_limit,0.1511,0.008,17.792,0.000,0.134,0.168


In [12]:
add_est = results_add.params['email']
add_se = results_add.bse['email']
add_ci = results_add.conf_int().loc['email']
print(f"The effect of the treatment is: {add_est:.2f}")
print(f"Standard error: {add_se:.4f}")
print(f"Confidence interval: [{add_ci[0]:.2f}, {add_ci[1]:.2f}]")

The effect of the treatment is: 4.43
Standard error: 2.1264
Confidence interval: [0.26, 8.60]


In [13]:
df['risk_dmn'] = df['risk_score'] - df['risk_score'].mean()
df['credit_dmn'] = df['credit_limit'] - df['credit_limit'].mean()
model_int = ('payments ~ email * (risk_dmn + credit_dmn)')
interaction = smf.ols(model_int, data=df)  # renamed to avoid shadowing the built-in int
results_int = interaction.fit(cov_type='HC1')
results_int.summary().tables[1]

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,667.5222,1.487,448.838,0.000,664.607,670.437
email,4.4361,2.127,2.085,0.037,0.267,8.606
risk_dmn,8.4719,56.463,0.150,0.881,-102.193,119.137
credit_dmn,0.1474,0.012,12.369,0.000,0.124,0.171
email:risk_dmn,-35.7283,80.615,-0.443,0.658,-193.730,122.273
email:credit_dmn,0.0080,0.017,0.475,0.635,-0.025,0.041


In [14]:
int_est = results_int.params['email']
int_se = results_int.bse['email']
int_ci = results_int.conf_int().loc['email']
print(f"The effect of the treatment is: {int_est:.2f}")
print(f"Standard error: {int_se:.4f}")
print(f"Confidence interval: [{int_ci[0]:.2f}, {int_ci[1]:.2f}]")

The effect of the treatment is: 4.44
Standard error: 2.1273
Confidence interval: [0.27, 8.61]
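
In the notation of the baseline equation, the interaction model fitted above can be written as follows. Here $\tilde{R}$ and $\tilde{C}$ stand for the demeaned risk score and credit limit (risk_dmn and credit_dmn); these symbols and the coefficient numbering are introduced only for this illustration.

$$
\text{Payments} = \beta_0 + \beta_1 \times \text{Email} + \beta_2 \tilde{R} + \beta_3 \tilde{C} + \beta_4 \times \text{Email} \times \tilde{R} + \beta_5 \times \text{Email} \times \tilde{C} + \epsilon
$$

Under this specification, the email effect for a customer with covariates $(\tilde{R}, \tilde{C})$ is

$$
\beta_1 + \beta_4 \tilde{R} + \beta_5 \tilde{C}
$$

Because demeaned covariates average to zero over the sample, the sample average of this expression is $\beta_1$, so the email coefficient can be read as the effect averaged over the sample's covariates. Here it is 4.44, close to the additive model's 4.43, and neither interaction term is statistically significant.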


What happens when we add opened and agreement to the model?

In [15]:
model_base2 = ('payments ~ email + risk_score + credit_limit + opened + agreement')
base2 = smf.ols(model_base2, data=df)
results_base2 = base2.fit(cov_type='HC1')
results_base2.summary().tables[1]

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,488.4416,10.173,48.011,0.000,468.502,508.381
email,-1.6095,2.708,-0.594,0.552,-6.917,3.698
risk_score,-2.0929,40.340,-0.052,0.959,-81.158,76.973
credit_limit,0.1507,0.008,17.799,0.000,0.134,0.167
opened,3.9808,3.974,1.002,0.316,-3.808,11.769
agreement,11.7093,4.210,2.781,0.005,3.458,19.961
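
One plausible reading of this result is that opened and agreement are measured after the email is sent, so they may lie on the path from email to payments, and conditioning on them can absorb part of the email's effect. The sketch below illustrates that mechanism with simulated data. Everything in it is an assumption made for illustration: the probabilities, the payment equation, and the helper diff_in_means are hypothetical, and it uses a stratified comparison of means rather than the regression above.

```python
import random
from statistics import fmean

random.seed(42)
n = 40_000

rows = []
for _ in range(n):
    email = random.randint(0, 1)
    # Assumed: the email raises the chance of an agreement from 10% to 30%.
    agreement = int(random.random() < (0.30 if email else 0.10))
    # Assumed: payments rise only through the agreement, never directly.
    payments = 650 + 50 * agreement + random.gauss(0, 50)
    rows.append((email, agreement, payments))


def diff_in_means(data):
    """Treated-minus-control difference in mean payments."""
    treated = [p for e, _, p in data if e == 1]
    control = [p for e, _, p in data if e == 0]
    return fmean(treated) - fmean(control)


# Total effect of the email, ignoring the post-treatment variable.
total_effect = diff_in_means(rows)

# Effect after conditioning on agreement: compare groups within each
# agreement stratum, then average the differences weighted by stratum size.
within_effect = 0.0
for level in (0, 1):
    stratum = [r for r in rows if r[1] == level]
    within_effect += len(stratum) / n * diff_in_means(stratum)

print(f"Total effect of email:           {total_effect:.2f}")
print(f"Effect holding agreement fixed:  {within_effect:.2f}")
```

In this simulation the total effect is clearly positive, while the comparison that holds agreement fixed is close to zero, because the whole effect flows through the agreement.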
