In [43]:
import pandas as pd
import numpy as np
import math
from scipy.stats import ttest_ind
import statsmodels.api as sm

# Exercise 1: Game Fun: Customer Acquisition through Digital Advertising

In [3]:
gf = pd.read_excel('GameFun.xlsx')
gf.head()

Unnamed: 0,id,test,purchase,site,impressions,income,gender,gamer
0,1956,0,0,site1,0,100,1,0
1,45821,1,0,site1,20,70,1,0
2,59690,1,0,site1,22,100,1,0
3,18851,0,0,site1,13,90,1,0
4,60647,1,0,site1,12,60,1,0


1.a compare the averages of the income, gender and gamer variables in the test and control groups. You should also report the % difference in the averages. Compute its statistical significance. [2 pts]

In [4]:
test_avg = gf.loc[gf['test'] == 1, ['income', 'gender', 'gamer']].mean()
control_avg = gf.loc[gf['test'] == 0, ['income', 'gender', 'gamer']].mean()
diff_pct = (test_avg - control_avg) / control_avg * 100
print(diff_pct)

income   -0.412890
gender   -0.095049
gamer    -0.081720
dtype: float64


In [5]:
# Perform t-test
income_t, income_p = ttest_ind(gf.loc[gf['test'] == 1, 'income'], gf.loc[gf['test'] == 0, 'income'])
gender_t, gender_p = ttest_ind(gf.loc[gf['test'] == 1, 'gender'], gf.loc[gf['test'] == 0, 'gender'])
gamer_t, gamer_p = ttest_ind(gf.loc[gf['test'] == 1, 'gamer'], gf.loc[gf['test'] == 0, 'gamer'])

In [6]:
# Print results
print('Income')
print('Test group average:', test_avg['income'])
print('Control group average:', control_avg['income'])
print('% difference:', diff_pct['income'], '%')
print('p-value:', income_p)
print('')
print('Gender')
print('Test group average:', test_avg['gender'])
print('Control group average:', control_avg['gender'])
print('% difference:', diff_pct['gender'], '%')
print('p-value:', gender_p)
print('')
print('Gamer')
print('Test group average:', test_avg['gamer'])
print('Control group average:', control_avg['gamer'])
print('% difference:', diff_pct['gamer'], '%')
print('p-value:', gamer_p)

Income
Test group average: 54.93823644583674
Control group average: 55.166011541356525
% difference: -0.4128902727524984 %
p-value: 0.1283580345995143

Gender
Test group average: 0.647289167348973
Control group average: 0.6479049928911934
% difference: -0.0950487415558184 %
p-value: 0.906033323148871

Gamer
Test group average: 0.6013313872770638
Control group average: 0.6018231997992808
% difference: -0.08172043257571342 %
p-value: 0.9267036713286598


Based on the results, there is no significant different between different groups.

b. Briefly comment on what these metrics tell you about probabilistic equivalence for this experiment. [2 pts]

Based on the results, for all the statistical tests, under 5% significance level, all the p-values of t-tests are over 0.05, which mean that we do not have enough evidence to reject the null hypothesis to say that there is significnat difference in the averages of income, gender and gammer in the test and control groups.
Also, for all the percent difference in the averages, the percent differences are all very small, which are all close to 0, which can also show the probabilistic equivalence for this experiment.

c. If you had run this type of analysis BEFORE executing an experiment and found a large difference between test and control groups, what you should do? [5 pts]

If we had run this type of analysis before executing an experiment and found a large difference between the test and control groups, we should first investigate the cause of the difference and ensure that it is not due to any confounding variables that may have biased the results. To adjust the experiment design to control for these confounding variables, we could use random assignment to ensure that the test and control groups are similar in terms of demographic characteristics and other relevant factors. Alternatively, we could also use matching techniques or stratification to balance the groups on these variables.

It is important to note that if the difference is due to chance or random variation, we should not exclude any participants or adjust the experiment design, as doing so may introduce bias into the analysis. In this case, we should simply report the results as they are and interpret them cautiously.

d. (Open/Ended Question) If you had millions of consumers, your “classic” statistical significance tests would not work (this is because the number of samples is used to compute those classic statistical tests). Do some research online and propose what significance test would you do in case you had “big data”? [5 pts]

To conduct statistical tests in big data, we can try some resampling techniques such as the bootstrap or permutation test.

The bootstrap involves randomly sampling with replacement from the original dataset to generate multiple "bootstrap" samples, and then computing the statistic of interest (e.g. mean, difference in means) for each of these samples. The distribution of the statistic across these bootstrap samples can then be used to estimate the sampling distribution and compute confidence intervals or p-values.

Similarly, the permutation test involves randomly shuffling the labels (e.g. test vs control) of the dataset to generate multiple permutations, and then computing the statistic of interest for each permutation. The distribution of the statistic across these permutations can be used to compute p-values.

2. Evaluate the average purchase rates in the test and control for the following groups. For each comparison, report the average purchase rate for the test, average purchase rate for the control and the absolute difference (not the % difference) between the test and control.

a. Comparison 1: All customers [2 pts]

In [7]:
test_c = gf[gf['test'] == 1]['purchase'].mean()
cont_c = gf[gf['test'] == 0]['purchase'].mean()
diff_c = abs(test_c - cont_c)
print('Comparison 1: All customers')
print(f'Test Group: 1\nControl Group: 0\n')
print(f'Average Purchase Rate (Test): {test_c:.3f}')
print(f'Average Purchase Rate (Control): {cont_c:.3f}')
print(f'Absolute Difference: {diff_c:.3f}\n')

Comparison 1: All customers
Test Group: 1
Control Group: 0

Average Purchase Rate (Test): 0.077
Average Purchase Rate (Control): 0.036
Absolute Difference: 0.041



b. Comparison 2: Male vs Female customers [2 pts]

In [8]:
# Male
test_m = gf[(gf['test'] == 1) & (gf['gender'] == 1)]['purchase'].mean()
cont_m = gf[(gf['test'] == 0) & (gf['gender'] == 1)]['purchase'].mean()
diff_m = abs(test_c - cont_c)
print('Comparison 2.1: Male customers')
print(f'Test Group: 1\nControl Group: 0\n')
print(f'Average Purchase Rate (Test): {test_m:.3f}')
print(f'Average Purchase Rate (Control): {cont_m:.3f}')
print(f'Absolute Difference: {diff_m:.3f}\n')

Comparison 2.1: Male customers
Test Group: 1
Control Group: 0

Average Purchase Rate (Test): 0.075
Average Purchase Rate (Control): 0.037
Absolute Difference: 0.041



In [9]:
# Female
test_f = gf[(gf['test'] == 1) & (gf['gender'] == 0)]['purchase'].mean()
cont_f = gf[(gf['test'] == 0) & (gf['gender'] == 0)]['purchase'].mean()
diff_f = abs(test_f - cont_f)
print('Comparison 2.2: Female customers')
print(f'Test Group: 1\nControl Group: 0\n')
print(f'Average Purchase Rate (Test): {test_f:.3f}')
print(f'Average Purchase Rate (Control): {cont_f:.3f}')
print(f'Absolute Difference: {diff_f:.3f}\n')

Comparison 2.2: Female customers
Test Group: 1
Control Group: 0

Average Purchase Rate (Test): 0.081
Average Purchase Rate (Control): 0.034
Absolute Difference: 0.047



c. Comparison 3: Gamers vs Non-Gamers Customers [2 pts]

In [10]:
# Gamer
test_g = gf[(gf['test'] == 1) & (gf['gamer'] == 1)]['purchase'].mean()
cont_g = gf[(gf['test'] == 0) & (gf['gamer'] == 1)]['purchase'].mean()
diff_g = abs(test_g - cont_g)
print('Comparison 3.1: Gamers')
print(f'Test Group: 1\nControl Group: 0\n')
print(f'Average Purchase Rate (Test): {test_g:.3f}')
print(f'Average Purchase Rate (Control): {cont_g:.3f}')
print(f'Absolute Difference: {diff_g:.3f}\n')

Comparison 3.1: Gamers
Test Group: 1
Control Group: 0

Average Purchase Rate (Test): 0.104
Average Purchase Rate (Control): 0.035
Absolute Difference: 0.069



In [11]:
# Non-Gamer
test_n = gf[(gf['test'] == 1) & (gf['gamer'] == 0)]['purchase'].mean()
cont_n = gf[(gf['test'] == 0) & (gf['gamer'] == 0)]['purchase'].mean()
diff_n = abs(test_n - cont_n)
print('Comparison 3.2: Non-Gamers')
print(f'Test Group: 1\nControl Group: 0\n')
print(f'Average Purchase Rate (Test): {test_n:.3f}')
print(f'Average Purchase Rate (Control): {cont_n:.3f}')
print(f'Absolute Difference: {diff_n:.3f}\n')

Comparison 3.2: Non-Gamers
Test Group: 1
Control Group: 0

Average Purchase Rate (Test): 0.035
Average Purchase Rate (Control): 0.037
Absolute Difference: 0.002



d. Comparison 4: Female Gamers vs Male Gamers [2 pts]

In [12]:
# Female Gamers
test_fg = gf[(gf['test'] == 1) & (gf['gamer'] == 1) & (gf['gender'] == 0)]['purchase'].mean()
cont_fg = gf[(gf['test'] == 0) & (gf['gamer'] == 1) & (gf['gender'] == 0)]['purchase'].mean()
diff_fg = abs(test_fg - cont_fg)
print('Comparison 4.1: Female Gamers')
print(f'Test Group: 1\nControl Group: 0\n')
print(f'Average Purchase Rate (Test): {test_fg:.3f}')
print(f'Average Purchase Rate (Control): {cont_fg:.3f}')
print(f'Absolute Difference: {diff_fg:.3f}\n')

Comparison 4.1: Female Gamers
Test Group: 1
Control Group: 0

Average Purchase Rate (Test): 0.110
Average Purchase Rate (Control): 0.032
Absolute Difference: 0.078



In [13]:
# Male Gamers
test_mg = gf[(gf['test'] == 1) & (gf['gamer'] == 1) & (gf['gender'] == 1)]['purchase'].mean()
cont_mg = gf[(gf['test'] == 0) & (gf['gamer'] == 1) & (gf['gender'] == 1)]['purchase'].mean()
diff_mg = abs(test_mg - cont_mg)
print('Comparison 4.2: Male Gamers')
print(f'Test Group: 1\nControl Group: 0\n')
print(f'Average Purchase Rate (Test): {test_mg:.3f}')
print(f'Average Purchase Rate (Control): {cont_mg:.3f}')
print(f'Absolute Difference: {diff_mg:.3f}\n')

Comparison 4.2: Male Gamers
Test Group: 1
Control Group: 0

Average Purchase Rate (Test): 0.101
Average Purchase Rate (Control): 0.037
Absolute Difference: 0.064



3. Assess the expected revenue in the test vs. control for the following comparisons:

a. Comparison 1: All customers [4 pts]

In [14]:
# Based on historical data, a new customer subscription brings a revenue of $37.5 on average.
rev_rate = 37.5
# For all customers:
n_test = len(gf[gf['test'] == 1])
rev_test = n_test * test_c * rev_rate
n_cont = len(gf[gf['test'] == 0])
rev_cont = n_cont * cont_c * rev_rate
diff_c_rev = abs(rev_test - rev_cont)
print('Comparison on expected revenue: All Customers')
print(f'Expected Revenue (Test): {rev_test:.2f}')
print(f'Expected Revenue (Control): {rev_cont:.2f}')
print(f'Absolute Difference: {diff_c_rev:.2f}\n')

Comparison on expected revenue: All Customers
Expected Revenue (Test): 80925.00
Expected Revenue (Control): 16237.50
Absolute Difference: 64687.50



b. Comparison 4: Female Gamers vs Male Gamers [4 pts]

In [15]:
# For female gamers:
f_test = len(gf[(gf['test'] == 1) & (gf['gamer'] == 1) & (gf['gender'] == 0)])
rev_test_fg = f_test * test_fg * rev_rate
f_cont = len(gf[(gf['test'] == 0) & (gf['gamer'] == 1) & (gf['gender'] == 0)])
rev_cont_fg = f_cont * cont_fg * rev_rate
diff_fg_rev = abs(rev_test_fg - rev_cont_fg)
print('Comparison on expected revenue: Female Gamers')
print(f'Expected Revenue (Test): {rev_test_fg:.2f}')
print(f'Expected Revenue (Control): {rev_cont_fg:.2f}')
print(f'Absolute Difference: {diff_fg_rev:.2f}\n')

Comparison on expected revenue: Female Gamers
Expected Revenue (Test): 24750.00
Expected Revenue (Control): 3037.50
Absolute Difference: 21712.50



In [16]:
# For male gamers:
m_test = len(gf[(gf['test'] == 1) & (gf['gamer'] == 1) & (gf['gender'] == 1)])
rev_test_mg = m_test * test_mg * rev_rate
m_cont = len(gf[(gf['test'] == 0) & (gf['gamer'] == 1) & (gf['gender'] == 1)])
rev_cont_mg = m_cont * cont_mg * rev_rate
diff_mg_rev = abs(rev_test_mg - rev_cont_mg)
print('Comparison on expected revenue: Male Gamers')
print(f'Expected Revenue (Test): {rev_test_mg:.2f}')
print(f'Expected Revenue (Control): {rev_cont_mg:.2f}')
print(f'Absolute Difference: {diff_mg_rev:.2f}\n')

Comparison on expected revenue: Male Gamers
Expected Revenue (Test): 41437.50
Expected Revenue (Control): 6525.00
Absolute Difference: 34912.50



4. Based on your previous answers, provide a brief recommendation to your management team summarizing the expected financial outcome for Game-Fun.
a. Should Game-Fun run this promotion again in the future? If no, explain why. If yes, should Game-Fun offer it to all customers or a targeted segment. [10 pts]

Based on my analysis, Game-Fun should run this promotion again in the future. However, Game-Fun should not offer it to all customers, instead, it should focus on customers who are gamers, especially female gamers.

First, I check the validity of the experiment by conducting statistical tests to compare people in test group and control group. The result shows that the assignment of groups is reasonable and the result of this experiment can be reliable.

Then, let's check the results of the experiment. Generally, for all customers, by advertising, the Average Purchase Rate for test group is 0.041 higher than control group, which results in 64687.50 increase in the expected revenue. This increase shows that it makes sense to do some advertising for Game-Fun in the future since the revenue is greatly increased.

Let's take a further look at the comparisons. For non-gamers, the average purchase rate changes very subtly, which means that this group of people may not be elastic to our advertisements, and they can not contribute a lot to the revenue increase. Therefore, the target customers should be gamers.

For gamers, by advertising, the female gamers see an increase of percentage of 7.8%, which is higher than males, with an increase of 6.4%. As a result, for gamers, female gamers may be more sensitive to the advertisements.

To conclude, according to all of the analysis, since it is a valid experiment, Game-Fun can offer promotions to gamers, especially female gamers.

# Exercise 2: Non-Compliance in Randomized Experiments

In [17]:
sd = pd.read_csv('sommer_deger.csv')
sd.head()

Unnamed: 0,instrument,treatment,outcome
0,0,0,0
1,0,0,0
2,0,0,0
3,0,0,0
4,0,0,0


1. The first data scientist advised that one should compare the survival rate of babies whose mothers were offered Vitamin A shots to the survival rate of babies whose mothers were not offered a Vitamin A shot.

a. What percent of babies whose mothers were offered Vitamin A shots for their babies died? [3 pts]

In [18]:
instru_1 = len(sd[sd['instrument'] == 1])
instru_1_d = len(sd[(sd['instrument'] == 1) & (sd['outcome'] == 1)])
per_1a = round((instru_1_d/instru_1) * 100,4)
print(f'The percent of babies whose mothers were offered Vitamin A shots for their babies died is: {per_1a}%')

The percent of babies whose mothers were offered Vitamin A shots for their babies died is: 0.3804%


b. What percent of babies whose mothers were not offered Vitamin A shots for their babies died? [3 pts]

In [19]:
instru_0 = len(sd[sd['instrument'] == 0])
instru_0_d = len(sd[(sd['instrument'] == 0) & (sd['outcome'] == 1)])
per_1b = round((instru_0_d/instru_0) * 100,4)
print(f'The percent of babies whose mothers were not offered Vitamin A shots for their babies died is: {per_1b}%')

The percent of babies whose mothers were not offered Vitamin A shots for their babies died is: 0.6386%


c. What is the difference in mortality, and under what assumptions is the difference between these two percentages a valid estimate of the causal impact of receiving vitamin A shots on survival? [4 pts]

In [20]:
diff_1c = round(abs(per_1a - per_1b),4)
print(f'The difference in mortality is: {diff_1c}%')

The difference in mortality is: 0.2582%


The assumption is that all mothers who were offered Vitmamin A gave their children the shots. For mothers who have received Vitamin A shots, may be some of them failed to use them, which may make the results invalid because of self-selection problem. 
Moreover, randomization, no interference, no hidden confounders and large sample size are also the assumptions.

2. The second data scientist advised that one should compare the survival rates of babies who received Vitamin A shots to babies who did not receive Vitamin A shots.

a. What percent of babies who received Vitamin A shots died? [3 pts]

In [21]:
treat_1 = len(sd[sd['treatment'] == 1])
treat_1_d = len(sd[(sd['treatment'] == 1) & (sd['outcome'] == 1)])
per_2a = round((treat_1_d/treat_1) * 100,4)
print(f'The percent of babies who received Vitamin A shots died is: {per_2a}%')

The percent of babies who received Vitamin A shots died is: 0.124%


b. What percent of babies who did not receive Vitamin A shots died? [3 pts]

In [22]:
treat_0 = len(sd[sd['treatment'] == 0])
treat_0_d = len(sd[(sd['treatment'] == 0) & (sd['outcome'] == 1)])
per_2b = round((treat_0_d/treat_0) * 100,4)
print(f'The percent of babies who did not receive Vitamin A shots died is: {per_2b}%')

The percent of babies who did not receive Vitamin A shots died is: 0.771%


c. What is the difference in mortality, and under what assumptions is the difference
between these two percentages a valid estimate of the causal impact of receiving vitamin A shots on survival? [4 pts]

In [23]:
diff_2c = round(abs(per_2a - per_2b),4)
print(f'The difference in mortality is: {diff_2c}%')

The difference in mortality is: 0.647%


The assumption is that there is no confounding variables. The shots that all children who received should be from the same source.
Moreover, randomization, no interference, no hidden confounders and large sample size are also the assumptions.

3. The third data scientist advised that one should consider only babies whose mothers were offered Vitamin A shots, and compare babies who received shots to babies who did not receive shots.

a. What percent of babies who received Vitamin A shots died? [3 pts]

In [24]:
it_1 = len(sd[(sd['treatment'] == 1) & (sd['instrument'] == 1)])
it_1_d = len(sd[(sd['treatment'] == 1) & (sd['instrument'] == 1) & (sd['outcome'] == 1)])
per_3a = round((it_1_d/it_1) * 100,4)
print(f'The percent of babies who received Vitamin A shots and whose mothers were offered the shots died is: {per_3a}%')

The percent of babies who received Vitamin A shots and whose mothers were offered the shots died is: 0.124%


b. What percent of babies whose mothers were offered Vitamin A shots, but the
mothers did not accept them, died? [3 pts]

In [25]:
it_0 = len(sd[(sd['treatment'] == 0) & (sd['instrument'] == 1)])
it_0_d = len(sd[(sd['treatment'] == 0) & (sd['instrument'] == 1) & (sd['outcome'] == 1)])
per_3b = round((it_0_d/it_0) * 100,4)
print(f'The percent of babies who did not receive Vitamin A shots and whose mothers were offered the shots died is: {per_3b}%')

The percent of babies who did not receive Vitamin A shots and whose mothers were offered the shots died is: 1.4055%


c. What is the difference in mortality, and under what assumptions is the difference
between these two percentages a valid estimate of the causal impact of receiving
vitamin A shots on survival? [4 pts]

In [26]:
diff_3c = round(abs(per_3a - per_3b),4)
print(f'The difference in mortality is: {diff_3c}%')

The difference in mortality is: 1.2815%


The assumption is that there is only one way for children to get vitamin a shots (the shots given to children should be from their mothers). 
Since all children get shots in this case is from their mother, the source of the shots should be stable and all of their mothers should give their children shots.
Moreover, randomization, no interference, no hidden confounders and large sample size are also the assumptions.

4. The fourth data scientist suggested the following Wald estimator for the effect of
Vitamin A shots on mortality:
(% 𝑜𝑓 𝑏𝑎𝑏𝑖𝑒𝑠 𝑜𝑓𝑓𝑒𝑟𝑒𝑑 𝑠h𝑜𝑡 𝑡h𝑎𝑡 𝑑𝑖𝑒𝑑 − % 𝑜𝑓 𝑏𝑎𝑏𝑖𝑒𝑠 𝑛𝑜𝑡 𝑜𝑓𝑓𝑒𝑟𝑒𝑑 𝑠h𝑜𝑡𝑠 𝑡h𝑎𝑡 𝑑𝑖𝑒𝑑) / (% 𝑜𝑓 𝑏𝑎𝑏𝑖𝑒𝑠 𝑤h𝑜 𝑤𝑒𝑟𝑒 𝑜𝑓𝑓𝑒𝑟𝑒𝑑 𝑎 𝑠h𝑜𝑡 𝑎𝑛𝑑 𝑟𝑒𝑐𝑒𝑖𝑣𝑒𝑑 𝑖𝑡)

a. Compute the above Wald estimate for the given dataset. [2 pts]

In [65]:
n = len(sd[sd['instrument'] == 1])
it_n = len(sd[(sd['instrument'] == 1) & (sd['treatment'] == 1)])
per_4a = round((it_n / n) * 100 , 4)
wald = round((per_1a - per_1b) / per_4a , 4)
print(f'The Wald estimate for this dataset is: {wald}')

The Wald estimate for this dataset is: -0.0032


b. Under what assumptions is this estimate a valid estimate of the causal impact of
vitamin A shots on survival? [4 pts]

The assumption is that there are no other confounding variables apart from the vitamin a given to mothers. In this case, only the vitamin a to mothers variable is controlled. As a result, to make this experiment more reliable, the shots given to children can not have access to other ways of getting Vitamin A shots apart from the Vitamin A shots from their mothers.
Moreover, randomization, no interference, no hidden confounders and large sample size are also the assumptions.

c. What is the standard error for the intent-to-treat estimate recommended by the
first data scientist? What is the standard error for the Wald estimate recommended by the fourth data scientist? [5 pts]

In [58]:
# SE for the intent-to-treat estimate
ins = sd[['instrument']]
treat = sd[['treatment']]
outcome = sd[['outcome']]
model_4_1 = sm.OLS(outcome, sm.add_constant(ins)).fit()
print(model_4_1.summary())
model_4_1.bse

                            OLS Regression Results                            
Dep. Variable:                outcome   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     7.830
Date:                Thu, 13 Apr 2023   Prob (F-statistic):            0.00514
Time:                        23:08:41   Log-Likelihood:                 29040.
No. Observations:               23682   AIC:                        -5.808e+04
Df Residuals:                   23680   BIC:                        -5.806e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0064      0.001      9.683      0.0

const         0.000660
instrument    0.000923
dtype: float64

In [61]:
print(f'The standard error for the intent-to-treat estimate recommended by the first data scientist is: 0.000923')

The standard error for the intent-to-treat estimate recommended by the first data scientist is: 0.000923


In [59]:
model = sm.OLS(treat, sm.add_constant(ins)).fit()
predicted_treat = model.predict(sm.add_constant(ins))

In [60]:
model_4_2 = sm.OLS(outcome, sm.add_constant(predicted_treat)).fit()
print(model_4_2.summary())
model_4_2.bse

                            OLS Regression Results                            
Dep. Variable:                outcome   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     7.830
Date:                Thu, 13 Apr 2023   Prob (F-statistic):            0.00514
Time:                        23:08:54   Log-Likelihood:                 29040.
No. Observations:               23682   AIC:                        -5.808e+04
Df Residuals:                   23680   BIC:                        -5.806e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0064      0.001      9.683      0.0

const    0.000660
0        0.001154
dtype: float64

In [62]:
# SE for the Wald estimate
print(f'The standard error for the Wald estimate recommended by the fourth data scientist is: 0.001154')

The standard error for the Wald estimate recommended by the fourth data scientist is: 0.001154


i. Which one is larger and why? [4 pts]

Based on the result, the SE of the Wald estimate is larger. 
It is because the Wald estimate takes into account the one-sided non-compliance, while the intent-to-treat estimate does not. Non-compliance can introduce bias in the analysis, leading to incorrect estimates of the treatment effect and its SE.
Since there might be one-sided non-compliance in this case, the SE of the Wald estimate is larger than that of the intent-to-treat estimate because it takes into account the additional uncertainty introduced by the non-compliance.

ii. Why might these standard errors be biased? What information would you
ideally want to have to address this bias? [5 pts]

These standard errors might be biased because they assume that the randomization was perfect and that there were no other confounding factors that could have affected the outcome. However, in real-world scenarios, there may be unobserved or unmeasured confounders that can lead to bias in the estimates.

To address this bias, we would ideally want to have information on all possible confounding factors that could have affected the outcome. This could include variables such as maternal health status, socio-economic status, and access to healthcare. We would also want to have information on the reasons for non-compliance, as this could provide insight into whether there were any underlying factors that could have influenced the outcome. Additionally, we would want to have data on the compliance rates within each treatment group, as this could help us to better estimate the correlation between treatment assignment and compliance. Finally, we would want to use appropriate statistical methods to adjust for any confounding factors and estimate the causal effect of the treatment more accurately.

# Exercise 3: Causal Inference in Observational Studies

After reading this paper, the biggest point that resonated with me is that 'These methods(propensity score methods) are intended to eliminate bias, but are not intended to increase precision.' Before I read this paper, when it comes to propensity score methods, all I can think about is that it is a method for us to eliminate bias so that the precision can be increased, but the author mentioned that the elimination of bias and the increase in precision are actually separated two things. Precision refers to the accuracy and consistency of the estimated effect size. It is affected by the sample size and the variance of the outcome variable. For propensity score methods, it can only eliminate bias when the assignment mechanism is truly unconfounded given the observed covariates X, but still, only randomization can elinimate bias due to all covariates. The author recommended blocking and matching on paticular covariates as techniques of increasing precision. Blocking refers to the process of stratifying the sample into subgroups based on a particular covariate or set of covariates that are strongly associated with the treatment assignment and the outcome. By doing this, the treatment and control groups within each stratum will be more similar, which reduces the variance of the outcome variable and increases precision. Matching involves selecting controls for each treated subject that have similar propensity scores. This creates pairs of treated and control subjects that are similar in terms of observed covariates, which again reduces the variance of the outcome variable and increases precision.

In summary, propensity score methods can eliminate bias due to confounding variables, but precision can only be increased by using other techniques such as blocking and matching. However, it is important to keep in mind that the precision of the estimate is also affected by other factors such as the sample size and the variance of the outcome variable.