## Prescriptive Models - A/B Testing


#### Load data and import packages

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('charitable_giving.csv')

In [3]:
df.head()

Unnamed: 0,donation_amount,donation_dummy,control,treatment,match_ratio,ratio1,ratio2,ratio3,red_state_dummy,months_since_last_donation,highest_previous_donation,prior_donations_num
0,0.0,0.0,0.0,1.0,1.0,1,0.0,0.0,1.0,19.0,500.0,32.0
1,0.0,0.0,1.0,0.0,0.0,0,0.0,0.0,1.0,29.0,300.0,22.0
2,0.0,0.0,1.0,0.0,0.0,0,0.0,0.0,1.0,3.0,500.0,22.0
3,0.0,0.0,0.0,1.0,3.0,0,0.0,1.0,0.0,4.0,250.0,29.0
4,0.0,0.0,0.0,1.0,2.0,0,1.0,0.0,0.0,8.0,50.0,17.0


## Part 1. Table 1
***

In [4]:
# Remove null values for the columns we are using in the regression
df1=df.dropna(axis=0,subset=['months_since_last_donation','treatment'])

In [5]:
m1 = sm.OLS(df1[['months_since_last_donation']],sm.add_constant((df1['treatment']))).fit()

In [6]:
m1.summary()

Dep. Variable:,months_since_last_donation,R-squared:,0.0
Model:,OLS,Adj. R-squared:,-0.0
Method:,Least Squares,F-statistic:,0.01428
Date:,"Thu, 02 Feb 2023",Prob (F-statistic):,0.905
Time:,15:52:50,Log-Likelihood:,-195850.0
No. Observations:,50082,AIC:,391700.0
Df Residuals:,50080,BIC:,391700.0
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,12.9981,0.094,138.979,0.000,12.815,13.181
treatment,0.0137,0.115,0.119,0.905,-0.211,0.238

Omnibus:,8031.352,Durbin-Watson:,1.714
Prob(Omnibus):,0.0,Jarque-Bera (JB):,12471.135
Skew:,1.163,Prob(JB):,0.0
Kurtosis:,3.751,Cond. No.,3.23


#### Part 1, Question 1

In [7]:
# Average months since last donation for treatment:
m1.predict(pd.DataFrame([[1,1]]))[0]

13.011828117982022

In [8]:
# Average months since last donation for control:
m1.predict(pd.DataFrame([[1,0]]))[0]

12.99814226643523

The values align with those in Table 1 of the paper: the means for treatment and control are computed above. The difference between the two groups in months since last donation is very small (0.0137 months) and not statistically significant (p = 0.905).
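Why `m1.predict` returns group means can be checked on made-up data. The sketch below is not the notebook's code: `ols_single_regressor` is a hypothetical helper, and the `treatment` and `months` values are simulated, not read from `charitable_giving.csv`. It illustrates that with a single 0/1 regressor the OLS intercept equals the control mean, and the intercept plus the slope equals the treatment mean.

```python
import random
from statistics import mean


def ols_single_regressor(x, y):
    """Closed-form OLS for y = intercept + slope * x (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Simulated data (an assumption for illustration only).
random.seed(0)
treatment = [1 if random.random() < 2 / 3 else 0 for _ in range(3000)]
months = [random.expovariate(1 / 13) for _ in treatment]

intercept, slope = ols_single_regressor(treatment, months)

control_mean = mean([m for m, t in zip(months, treatment) if t == 0])
treated_mean = mean([m for m, t in zip(months, treatment) if t == 1])

print(intercept, control_mean)          # intercept vs. control mean
print(intercept + slope, treated_mean)  # intercept + slope vs. treatment mean
```

The two printed pairs agree up to floating-point error, which is why predicting at `treatment = 0` and `treatment = 1` reproduces the two group means.
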

#### Part 1, Question 2

The p-value is 0.905, so the difference is not statistically significant at the 5% level (95% confidence). This is consistent with random assignment to treatment: months since last donation, a characteristic fixed before the experiment, is unrelated to the treatment dummy. A non-significant difference cannot prove randomization on its own, but it gives no evidence against it.
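As a rough check on where the p-value comes from, it can be recovered from the reported t-statistic. The sketch below assumes a standard normal approximation to the t distribution, which is reasonable here only because the residual degrees of freedom are very large (50080); `two_sided_p_normal` is a hypothetical helper name, not a statsmodels function.

```python
import math


def two_sided_p_normal(t_stat):
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


t_treatment = 0.119  # t-statistic of the treatment coefficient in m1.summary()
p_value = two_sided_p_normal(t_treatment)
print(round(p_value, 3))  # about 0.905
```
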

#### Part 1, Question 3
This part of the paper, right at the start, shows that the treatment and control groups are balanced on pre-treatment characteristics, as expected under random assignment: treatment status is uncorrelated with the other variables. This supports the claim that later comparisons between the groups are not biased by pre-existing differences, so the results of the A/B tests run on this data can be interpreted as effects of the treatment.

## Part 2. Response Rate Regressions
***

In [9]:
df2=df.dropna(axis=0,subset=['donation_dummy','treatment'])

In [10]:
m2 = sm.OLS(df2[['donation_dummy']],sm.add_constant((df2['treatment']))).fit()

In [11]:
m2.summary()

Dep. Variable:,donation_dummy,R-squared:,0.0
Model:,OLS,Adj. R-squared:,0.0
Method:,Least Squares,F-statistic:,9.618
Date:,"Thu, 02 Feb 2023",Prob (F-statistic):,0.00193
Time:,15:52:50,Log-Likelihood:,26630.0
No. Observations:,50083,AIC:,-53260.0
Df Residuals:,50081,BIC:,-53240.0
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0179,0.001,16.225,0.000,0.016,0.020
treatment,0.0042,0.001,3.101,0.002,0.002,0.007

Omnibus:,59814.28,Durbin-Watson:,1.997
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4317152.727
Skew:,6.74,Prob(JB):,0.0
Kurtosis:,46.44,Cond. No.,3.23


#### Part 2, Question 1
The intercept (0.0179) is the mean response rate of the control group, and the treatment coefficient (0.0042) is the additional response rate when the treatment is present. The treatment coefficient is statistically significant (p = 0.002), so we can conclude that the treatment affects whether or not a donation is made.\
The averages computed below also match the first row of Table 2a.

In [12]:
# Average of donation dummy - Treatment
m2.predict(pd.DataFrame([[1,1]]))[0]

0.02203856749311288

In [13]:
# Average of donation dummy - Control
m2.predict(pd.DataFrame([[1,0]]))[0]

0.01785821298016435
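To put the two averages above in perspective, the short calculation below converts them into an absolute and a relative lift. It uses only the two predicted response rates printed above; the variable names are illustrative.

```python
treatment_rate = 0.02203856749311288  # m2 prediction for the treatment group
control_rate = 0.01785821298016435    # m2 prediction for the control group

absolute_lift = treatment_rate - control_rate   # matches the treatment coefficient
relative_lift = absolute_lift / control_rate    # lift as a share of the control rate

print(f"absolute lift: {absolute_lift:.4f}")
print(f"relative lift: {relative_lift:.1%}")
```

The absolute lift of about 0.42 percentage points looks small, but relative to a control response rate under 2% it is an increase of roughly 23%.
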

#### Part 2, Question 2

In [14]:
# Remove null values for the columns used in this regression (outcome and ratio dummies)
df3=df.dropna(axis=0,subset=['donation_dummy','ratio1','ratio2','ratio3'])

In [15]:
m3 = sm.OLS(df3[['donation_dummy']],sm.add_constant(df3[['ratio1','ratio2','ratio3']])).fit()

In [16]:
m3.summary()

Dep. Variable:,donation_dummy,R-squared:,0.0
Model:,OLS,Adj. R-squared:,0.0
Method:,Least Squares,F-statistic:,3.665
Date:,"Thu, 02 Feb 2023",Prob (F-statistic):,0.0118
Time:,15:52:50,Log-Likelihood:,26630.0
No. Observations:,50083,AIC:,-53250.0
Df Residuals:,50079,BIC:,-53220.0
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0179,0.001,16.225,0.000,0.016,0.020
ratio1,0.0029,0.002,1.661,0.097,-0.001,0.006
ratio2,0.0048,0.002,2.744,0.006,0.001,0.008
ratio3,0.0049,0.002,2.802,0.005,0.001,0.008

Omnibus:,59812.754,Durbin-Watson:,1.997
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4316693.217
Skew:,6.74,Prob(JB):,0.0
Kurtosis:,46.438,Cond. No.,4.26


The constant can be interpreted as the average response rate of the control group, i.e. when no match is offered.
The other three coefficients are the changes in the average response rate, relative to control, under each match ratio. The 2:1 and 3:1 coefficients are significant at the 1% level (p = 0.006 and p = 0.005), while the 1:1 coefficient is significant only at the 10% level (p = 0.097).

#### Part 2, Question 3

In [17]:
# Mean of match ratio 1:1
m3.predict(pd.DataFrame([[1,1,0,0]]))[0]

0.020749124225276225

In [18]:
# Mean of match ratio 2:1
m3.predict(pd.DataFrame([[1,0,1,0]]))[0]

0.02263337524699132

In [19]:
# Mean of match ratio 3:1
m3.predict(pd.DataFrame([[1,0,0,1]]))[0]

0.02273339922724417

#### Part 2, Question 4

The results of the regression as well as Table 2a show that the size of the match ratio doesn't make a very big difference. The response rates for the 1:1, 2:1 and 3:1 groups (about 2.07%, 2.26% and 2.27%) are close to one another, so higher ratios do not appear to raise the response rate much beyond the 1:1 match. Each ratio coefficient is significant at the 10% level or better, but those p-values compare each ratio with the control group, not the ratios with one another. Offering a match at all, on the other hand, makes a statistically significant difference relative to offering none.
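The step-by-step gains can be read off the predicted means with a few subtractions. The sketch below uses the three predictions printed in Part 2, Question 3 and the rounded constant (0.0179) from the `m3` summary as the control rate; the dictionary and variable names are illustrative.

```python
response_rate = {
    "control": 0.0179,               # rounded constant from m3.summary()
    "1:1": 0.020749124225276225,     # m3 prediction, match ratio 1:1
    "2:1": 0.02263337524699132,      # m3 prediction, match ratio 2:1
    "3:1": 0.02273339922724417,      # m3 prediction, match ratio 3:1
}

gain_match_vs_control = response_rate["1:1"] - response_rate["control"]
gain_2_vs_1 = response_rate["2:1"] - response_rate["1:1"]
gain_3_vs_2 = response_rate["3:1"] - response_rate["2:1"]

# Report each step in percentage points.
for label, gain in [("control -> 1:1", gain_match_vs_control),
                    ("1:1 -> 2:1", gain_2_vs_1),
                    ("2:1 -> 3:1", gain_3_vs_2)]:
    print(f"{label}: {100 * gain:.3f} percentage points")
```

Most of the gain comes from offering any match at all; moving from 2:1 to 3:1 adds almost nothing. These are point estimates only, and this calculation says nothing about whether the steps between ratios are statistically significant.
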

## Part 3. Response Rates in Red/Blue States
***

In [20]:
df4=df.dropna(axis=0,subset=['donation_dummy','treatment','red_state_dummy'])

In [21]:
df4a=df4[df4.red_state_dummy==1]
df4b=df4[df4.red_state_dummy==0]

In [22]:
m4a=sm.OLS(df4a[['donation_dummy']],sm.add_constant(df4a[['treatment']])).fit()

In [23]:
m4a.summary()

Dep. Variable:,donation_dummy,R-squared:,0.001
Model:,OLS,Adj. R-squared:,0.001
Method:,Least Squares,F-statistic:,17.24
Date:,"Thu, 02 Feb 2023",Prob (F-statistic):,3.31e-05
Time:,15:52:51,Log-Likelihood:,10839.0
No. Observations:,20242,AIC:,-21670.0
Df Residuals:,20240,BIC:,-21660.0
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0146,0.002,8.398,0.000,0.011,0.018
treatment,0.0088,0.002,4.152,0.000,0.005,0.013

Omnibus:,24251.343,Durbin-Watson:,2.002
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1766349.071
Skew:,6.759,Prob(JB):,0.0
Kurtosis:,46.721,Cond. No.,3.25


In [24]:
m4b=sm.OLS(df4b[['donation_dummy']],sm.add_constant(df4b[['treatment']])).fit()

In [25]:
m4b.summary()

Dep. Variable:,donation_dummy,R-squared:,0.0
Model:,OLS,Adj. R-squared:,-0.0
Method:,Least Squares,F-statistic:,0.3567
Date:,"Thu, 02 Feb 2023",Prob (F-statistic):,0.55
Time:,15:52:51,Log-Likelihood:,15783.0
No. Observations:,29806,AIC:,-31560.0
Df Residuals:,29804,BIC:,-31550.0
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0200,0.001,14.085,0.000,0.017,0.023
treatment,0.0010,0.002,0.597,0.550,-0.002,0.004

Omnibus:,35568.6,Durbin-Watson:,1.996
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2547856.644
Skew:,6.727,Prob(JB):,0.0
Kurtosis:,46.25,Cond. No.,3.21


#### Part 3, Question 1
The treatment coefficient in the red-state regression (0.0088) is much larger than in the blue-state regression (0.0010). The red-state coefficient is statistically significant (p < 0.001), while the blue-state coefficient is not (p = 0.550). We can infer from this that the treatment has a much greater effect in the red states and no detectable effect in the blue states.

#### Part 3, Question 2
The treatment has a statistically significant effect in the red states but not in the blue states, as the coefficients and their p-values show. The difference in treatment effects need not be caused by a state being red or blue as such; it may instead reflect the demographics of these states and how their residents respond to this kind of treatment.
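Two separate regressions do not by themselves test whether the red-state and blue-state effects differ. A rough back-of-the-envelope check is sketched below. It rests on several assumptions: the two subsamples are treated as independent (they contain different states), standard errors are backed out from the rounded coefficients and t-statistics in the summaries, and a normal approximation is used. Because the inputs are rounded, the result is only indicative; a pooled regression with a treatment-by-red-state interaction would be the more careful approach.

```python
import math

# Rounded values copied from the m4a (red) and m4b (blue) summaries.
coef_red, t_red = 0.0088, 4.152
coef_blue, t_blue = 0.0010, 0.597

# Back out approximate standard errors (se = coef / t); rounding makes these rough.
se_red = coef_red / t_red
se_blue = coef_blue / t_blue

# Difference in treatment effects and its standard error under independence.
diff = coef_red - coef_blue
se_diff = math.sqrt(se_red ** 2 + se_blue ** 2)

z = diff / se_diff
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation

print(f"difference = {diff:.4f}, z = {z:.2f}, p = {p_value:.4f}")
```

Under these assumptions the gap between the two treatment effects is itself statistically significant at conventional levels, which is stronger evidence than observing that one coefficient is significant and the other is not.
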

## Part 4. Response Rates and Donation Amount
***

In [26]:
df5=df.dropna(axis=0,subset=['donation_amount','treatment'])

In [27]:
m5=sm.OLS(df5[['donation_amount']],sm.add_constant(df5[['treatment']])).fit()

In [28]:
m5.summary()

Dep. Variable:,donation_amount,R-squared:,0.0
Model:,OLS,Adj. R-squared:,0.0
Method:,Least Squares,F-statistic:,3.461
Date:,"Thu, 02 Feb 2023",Prob (F-statistic):,0.0628
Time:,15:52:51,Log-Likelihood:,-179460.0
No. Observations:,50083,AIC:,358900.0
Df Residuals:,50081,BIC:,358900.0
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.8133,0.067,12.063,0.000,0.681,0.945
treatment,0.1536,0.083,1.861,0.063,-0.008,0.315

Omnibus:,96861.113,Durbin-Watson:,1.987
Prob(Omnibus):,0.0,Jarque-Bera (JB):,240735713.63
Skew:,15.297,Prob(JB):,0.0
Kurtosis:,341.269,Cond. No.,3.23


#### Part 4, Question 1
The constant shows the average donation amount in the control group, counting non-donors as zero, and the coefficient shows the added effect of the treatment. The coefficient is significant at the 10% level (p = 0.063) but not at the 5% level. This raises the suspicion that the treatment may affect whether or not someone donates, but not the amount they choose to donate. We investigate further in the next part, where we run the regression only on observations with the donation dummy equal to 1.


In [29]:
df6=df5[df5.donation_dummy==1]

In [30]:
m6=sm.OLS(df6[['donation_amount']],sm.add_constant(df6[['treatment']])).fit()

In [31]:
m6.summary()

Dep. Variable:,donation_amount,R-squared:,0.0
Model:,OLS,Adj. R-squared:,-0.001
Method:,Least Squares,F-statistic:,0.3374
Date:,"Thu, 02 Feb 2023",Prob (F-statistic):,0.561
Time:,15:52:51,Log-Likelihood:,-5326.8
No. Observations:,1034,AIC:,10660.0
Df Residuals:,1032,BIC:,10670.0
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,45.5403,2.423,18.792,0.000,40.785,50.296
treatment,-1.6684,2.872,-0.581,0.561,-7.305,3.968

Omnibus:,587.258,Durbin-Watson:,1.838
Prob(Omnibus):,0.0,Jarque-Bera (JB):,5623.279
Skew:,2.464,Prob(JB):,0.0
Kurtosis:,13.307,Cond. No.,3.49


#### Part 4, Question 2
The constant shows the average donation amount among control-group donors, and the coefficient shows the added effect of the treatment, estimated only among the recipients who made a donation. The suspicion in the previous part holds up: the treatment doesn't have a detectable effect on the amount donated, only on whether a donation is made. The coefficient in the above regression is small relative to the average gift (-1.67 against 45.54), negative, and not statistically significant (p = 0.561).
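The three regressions fit together through a simple identity: the average donation over everyone equals the response rate times the average donation among donors, since non-donors contribute zero. The sketch below checks this with numbers already reported in this notebook (the `m2` predictions and the rounded `m5` and `m6` coefficients); the dictionary names are illustrative.

```python
# Response rates: m2 predictions from Part 2.
response_rate = {
    "control": 0.01785821298016435,
    "treatment": 0.02203856749311288,
}
# Average donation among donors: m6 constant, and constant plus coefficient.
conditional_mean = {
    "control": 45.5403,
    "treatment": 45.5403 - 1.6684,
}
# Average donation over everyone: m5 constant, and constant plus coefficient.
unconditional_mean = {
    "control": 0.8133,
    "treatment": 0.8133 + 0.1536,
}

implied = {}
for group in ("control", "treatment"):
    # P(donate) * E[amount | donate] should reproduce E[amount].
    implied[group] = response_rate[group] * conditional_mean[group]
    print(group, round(implied[group], 4), round(unconditional_mean[group], 4))
```

The implied and reported averages agree to within rounding, which shows that the higher unconditional average in the treatment group comes from more people donating, not from larger gifts per donor.
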