---
title: "QTM 385 - Experimental Methods"
subtitle: Assignment 03
---

# Instructions

This assignment covers the last two lectures of the course. As usual, it consists of 10 questions, each worth one point. You can answer the questions in any format you prefer, but I recommend using Jupyter Notebooks and converting the answers to PDF or HTML, as they are easier to read on Canvas. Please write at least one or two paragraphs for each written question.

If you have any questions about the assignment, feel free to email me at <danilo.freire@emory.edu>.

Good luck!

# Questions 

1. Compare and contrast Type I and Type II errors. In causal inference experiments, why might a researcher be more concerned with one type of error over the other?

Type I error (false positive) occurs when a researcher incorrectly rejects a true null hypothesis, concluding that an effect exists when it does not. Type II error (false negative) happens when a researcher fails to reject a false null hypothesis, missing a real effect. In simpler terms, Type I error leads to a false claim of causation, while Type II error results in overlooking a genuine causal relationship.

In causal inference experiments, which error matters more depends on the study's context. A researcher studying the effectiveness of a new medical treatment may prioritize minimizing Type I errors to avoid approving an ineffective or harmful drug. Conversely, in social policy experiments, a researcher might be more concerned with Type II errors, as failing to detect a real effect could mean missing an opportunity to implement beneficial policies. The two errors trade off against each other: for a fixed sample size, lowering the significance threshold reduces the Type I error rate but also reduces statistical power, which raises the Type II error rate. Reducing both at once requires a larger sample or a more precise design.
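
The trade-off can be illustrated with a small simulation. This is only a sketch under assumed values: the function names, the sample size of 50 per arm, the true effect of 1, and the large-sample z-test are illustrative choices, not part of the assignment.

```python
import random
from statistics import NormalDist, mean, variance


def reject_null(true_effect, n_per_arm, alpha, rng):
    """Simulate one two-arm experiment and apply a two-sided z-test."""
    control = [rng.gauss(10, 2) for _ in range(n_per_arm)]
    treated = [rng.gauss(10 + true_effect, 2) for _ in range(n_per_arm)]
    diff = mean(treated) - mean(control)
    se = (variance(treated) / n_per_arm + variance(control) / n_per_arm) ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return p_value < alpha


def rejection_rate(true_effect, n_per_arm=50, alpha=0.05, sims=2000, seed=385):
    """Share of simulated experiments in which the null is rejected."""
    rng = random.Random(seed)
    rejections = sum(reject_null(true_effect, n_per_arm, alpha, rng) for _ in range(sims))
    return rejections / sims


# When the null is true, every rejection is a Type I error
type_i_rate = rejection_rate(true_effect=0.0)
# When the null is false, every non-rejection is a Type II error
power = rejection_rate(true_effect=1.0)
type_ii_rate = 1 - power

print(f"Type I rate: {type_i_rate:.3f}")
print(f"Power: {power:.3f}, Type II rate: {type_ii_rate:.3f}")
```

Re-running `rejection_rate` with a smaller `alpha` should lower the first number and raise the Type II rate, while a larger `n_per_arm` raises power without changing the Type I rate much.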


2. Explain the concept of randomisation inference and outline its advantages over traditional parametric tests, especially in the context of testing the sharp null hypothesis.

Randomization inference is a nonparametric statistical approach that assesses treatment effects by comparing the observed test statistic (for example, the difference in means) with the distribution of that statistic under the null hypothesis, obtained by repeatedly re-drawing the treatment assignment according to the original randomization procedure. This method relies on the random assignment of treatments rather than assumptions about the underlying data distribution, making it particularly useful for small samples and non-normal data. It is often used to test the sharp null hypothesis, which states that the treatment has no effect for every unit in the study. Under that hypothesis each unit's outcome would be the same under any assignment, which is what makes the reshuffling valid.

Compared to traditional parametric tests, randomization inference has several advantages. It does not require assumptions about normality or homoskedasticity, making it more robust to violations of these conditions. Additionally, its p-values are exact when every possible assignment is enumerated (and can be approximated to any desired accuracy by simulating many assignments), rather than relying on asymptotic approximations, which increases reliability in small samples. Because the reference distribution comes from the known assignment procedure itself, the test is valid by design under the sharp null and can accommodate complex designs such as blocked or clustered assignment, making it a powerful tool for experimental analysis.
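
The procedure described above can be sketched in a few lines of Python. The helper names (`diff_in_means`, `ri_p_value`) and the toy data are hypothetical, and the p-value is a Monte Carlo approximation based on 2,000 reshuffles rather than a full enumeration of all assignments.

```python
import random
from statistics import mean


def diff_in_means(outcomes, assignment):
    """Difference between the treated and control group means."""
    treated = [y for y, z in zip(outcomes, assignment) if z == 1]
    control = [y for y, z in zip(outcomes, assignment) if z == 0]
    return mean(treated) - mean(control)


def ri_p_value(outcomes, assignment, sims=2000, seed=385):
    """Two-sided randomization-inference p-value under the sharp null of no effect."""
    rng = random.Random(seed)
    observed = diff_in_means(outcomes, assignment)
    shuffled = list(assignment)
    extreme = 0
    for _ in range(sims):
        # Shuffling keeps the group sizes fixed, mimicking complete random assignment
        rng.shuffle(shuffled)
        # Under the sharp null, outcomes do not change when the assignment changes
        if abs(diff_in_means(outcomes, shuffled)) >= abs(observed):
            extreme += 1
    return observed, extreme / sims


# Toy experiment: 20 units, 10 treated, assumed true effect of 5
data_rng = random.Random(1)
assignment = [1] * 10 + [0] * 10
data_rng.shuffle(assignment)
outcomes = [data_rng.gauss(10, 2) + 5 * z for z in assignment]

observed, p_value = ri_p_value(outcomes, assignment)
print(f"Observed difference in means: {observed:.2f}, RI p-value: {p_value:.4f}")
```

The p-value is simply the share of reshuffled assignments that produce a difference at least as large in absolute value as the observed one.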

3. Compare Neyman’s hypothesis testing framework with Fisher’s sharp null hypothesis approach. What are the main advantages and disadvantages of each method in experimental settings?

Neyman’s hypothesis testing framework compares two hypotheses, the null hypothesis ($H_0$) and an alternative hypothesis ($H_A$), using a pre-specified significance level to control the Type I error rate. It emphasizes repeated sampling properties, statistical power, and confidence intervals, making it well-suited for decision-making in experimental settings. Its null is typically the weak null that the average treatment effect (ATE) is zero, so it speaks to average rather than individual-level effects. However, its tests and confidence intervals usually rely on large-sample approximations to the sampling distribution, which can be inaccurate in small samples.

Fisher’s sharp null hypothesis approach, in contrast, tests whether the treatment has absolutely no effect on any individual unit in an experiment. It relies on randomization inference, generating a distribution of possible outcomes under the null by reshuffling treatment assignments. This method provides exact p-values without parametric assumptions but does not easily extend to estimating confidence intervals or alternative hypotheses. While Fisher’s method is more robust in small samples, Neyman’s framework is generally more useful for making broader inferences about population-level effects.
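
As a rough illustration of the Neyman side of this comparison, the sketch below computes a difference in means, the usual conservative standard error, and a normal-approximation confidence interval. The function name `neyman_estimate` and the six outcomes per arm are made up for the example.

```python
from statistics import NormalDist, mean, variance


def neyman_estimate(treated, control, level=0.95):
    """Difference in means with a conservative SE and a large-sample CI."""
    ate_hat = mean(treated) - mean(control)
    # Sum of the two sample variances, each divided by its group size
    se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ate_hat, se, (ate_hat - z * se, ate_hat + z * se)


# Hypothetical outcomes for a tiny experiment
treated = [14, 16, 15, 17, 13, 15]
control = [10, 11, 9, 12, 10, 8]

ate_hat, se, ci = neyman_estimate(treated, control)
print(f"ATE estimate: {ate_hat:.2f}, SE: {se:.3f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

With only six units per arm the normal approximation is doubtful, which is exactly the situation where a randomization test of the sharp null is the safer choice.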

4. Critically evaluate the use of p-values in hypothesis testing. What alternatives are suggested (or implied) in the lectures, and what are the potential benefits of these alternatives?

P-values are widely used in hypothesis testing to measure the strength of evidence against a null hypothesis. However, they are often misinterpreted: many researchers mistakenly believe a p-value represents the probability that the null hypothesis is true, when it is the probability of observing a result at least as extreme as the one obtained if the null hypothesis were true. Additionally, p-values are sensitive to sample size, meaning that with enough data, even trivial effects can appear statistically significant. Reliance on an arbitrary significance threshold (e.g., 0.05) also encourages "p-hacking," where researchers manipulate analyses to achieve significant results, undermining the reliability of findings.

Alternatives and complements to p-values include Bayesian inference, which incorporates prior knowledge and provides posterior probabilities for hypotheses, and confidence intervals, which offer a range of plausible values for an effect size rather than a binary decision. Additionally, randomization inference still produces a p-value but avoids distributional assumptions by deriving it from the assignment procedure. These approaches emphasize effect sizes, uncertainty, and robustness over dichotomous accept/reject conclusions, leading to more nuanced and informative statistical inferences.
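
The sample-size point can be checked with a short calculation. The sketch assumes a fixed, substantively trivial difference of 0.05 interviews, an outcome standard deviation of 2, and a two-sided z-test; all three are illustrative assumptions, as are the function names.

```python
from statistics import NormalDist


def two_sided_p(effect, sd, n_per_arm):
    """p-value of a two-sided z-test for a given observed difference in means."""
    se = sd * (2 / n_per_arm) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(effect / se)))


def ci95(effect, sd, n_per_arm):
    """95% confidence interval for the same difference in means."""
    se = sd * (2 / n_per_arm) ** 0.5
    z = NormalDist().inv_cdf(0.975)
    return effect - z * se, effect + z * se


for n in (100, 10_000, 1_000_000):
    low, high = ci95(0.05, 2, n)
    print(f"n per arm = {n:>9,}: p = {two_sided_p(0.05, 2, n):.4f}, "
          f"95% CI = ({low:.3f}, {high:.3f})")
```

The same tiny difference goes from clearly non-significant to overwhelmingly significant as the sample grows, while the confidence interval keeps showing that the effect itself is small.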

5. The code below simulates a dataset. Modify the code so that it adds a new variable called `treat` with 500 treated individuals and 500 control individuals (complete random assignment). Also include a binary covariate called `gender` (0 = male, 1 = female; with equal probability) and update the outcome (`interviews`) by adding 2 points if the individual is female.

```r
## Set seed for reproducibility
set.seed(385)

# Load packages
# install.packages("fabricatr")
# install.packages("randomizr") # if you haven't installed them yet
library(fabricatr)
library(randomizr)

## Simulate data
data <- fabricate(
  N = 1000,
  treat = complete_ra(N = 1000, m = 500),  # Assign exactly 500 treated, 500 control
  gender = rbinom(1000, 1, 0.5),  # Binary gender variable (0 = male, 1 = female)
  interviews = round(rnorm(1000, mean = 10, sd = 2) + 5 * treat + 2 * gender, digits = 0)  # Adjust interviews
)

head(data)
```

6. Using the dataset created in the previous question, estimate the average treatment effect on the outcome `interviews` using the `lm_robust()` function from the `estimatr` package. Interpret the results.

```r
# Load necessary package
# install.packages("estimatr") # Uncomment if not installed
library(estimatr)

# Estimate the treatment effect using lm_robust
ate_model <- lm_robust(interviews ~ treat, data = data)

# Display results
summary(ate_model)
```

7. Using the same dataset, estimate the average treatment effect of the treatment on the outcome `interviews` using randomisation inference. Interpret the results.

```r
# Load necessary package
# install.packages("ri2") # Uncomment if not installed
library(ri2)

# Declare the randomization procedure (complete assignment: 500 of 1000 treated)
declaration <- declare_ra(N = 1000, m = 500)

# Run randomization inference under the sharp null of no effect for any unit
ri_results <- conduct_ri(
  formula = interviews ~ treat,
  declaration = declaration,
  assignment = "treat",   # name of the treatment column in `data`
  sharp_hypothesis = 0,
  data = data
)

# Display results
summary(ri_results)
```

8. Explain how including covariates in an experimental regression model can increase the precision of the treatment effect estimate. Under what conditions might this adjustment lead to biased estimates?

Including covariates in an experimental regression model increases the precision of the treatment effect estimate by reducing unexplained variance in the outcome variable. By controlling for variables that influence the outcome but are unrelated to treatment assignment, the model can isolate the treatment effect more effectively, leading to smaller standard errors and narrower confidence intervals. This is particularly useful in cases where there is residual heterogeneity in outcomes that can be accounted for by observed covariates, improving statistical power.

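However, adjusting for covariates can lead to biased estimates if the covariates are post-treatment variables, that is, measured after assignment and potentially affected by the treatment. Conditioning on such a variable can block part of the treatment effect or introduce collider bias when the variable is influenced by both the treatment and other causes of the outcome. Covariate adjustment also cannot repair a broken design: if randomization failed, or if there is non-compliance or differential attrition, controlling for observed covariates does not remove imbalance in unobserved ones. Finally, choosing covariates after seeing which specification gives the most favorable result can bias inference. Therefore, while covariate adjustment enhances precision, the covariates should be measured before treatment and, ideally, specified in advance.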
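
The precision gain from a pre-treatment covariate can be seen in a simulation. To stay within the Python standard library, the sketch below uses post-stratification on a binary covariate as a stand-in for regression adjustment; the function names, the sample size of 200, and the effect sizes are assumptions borrowed loosely from Question 5.

```python
import random
from statistics import mean, variance


def diff_in_means(rows):
    """Difference in mean outcome between treated and control rows of (y, z)."""
    return mean(y for y, z in rows if z == 1) - mean(y for y, z in rows if z == 0)


def simulate_once(rng, n=200, effect=5.0, gender_effect=2.0):
    """One experiment: return the unadjusted and the gender-adjusted estimate."""
    assignment = [1] * (n // 2) + [0] * (n // 2)
    rng.shuffle(assignment)
    gender = [rng.randint(0, 1) for _ in range(n)]  # pre-treatment covariate
    y = [rng.gauss(10, 2) + effect * z + gender_effect * g
         for z, g in zip(assignment, gender)]

    unadjusted = diff_in_means(list(zip(y, assignment)))

    # Post-stratification: estimate the effect within each gender group,
    # then average the two estimates using the group shares as weights
    adjusted = 0.0
    for g in (0, 1):
        stratum = [(v, z) for v, z, gg in zip(y, assignment, gender) if gg == g]
        adjusted += len(stratum) / n * diff_in_means(stratum)
    return unadjusted, adjusted


rng = random.Random(385)
results = [simulate_once(rng) for _ in range(1000)]
unadjusted = [u for u, _ in results]
adjusted = [a for _, a in results]

print(f"Mean estimate, unadjusted: {mean(unadjusted):.3f}, adjusted: {mean(adjusted):.3f}")
print(f"Sampling variance, unadjusted: {variance(unadjusted):.4f}, "
      f"adjusted: {variance(adjusted):.4f}")
```

Both estimators should centre on the assumed effect of 5, but the adjusted one varies less across repeated experiments because it removes the chance imbalance in gender between arms.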

9. Simulate a dataset with heterogeneous treatment effects (e.g., the treatment effect is larger for individuals with higher education). Estimate the treatment effect for different subgroups using an interaction term.

```r
# Set seed for reproducibility
set.seed(385)

# Load necessary packages
# install.packages("fabricatr")
# install.packages("estimatr")
library(fabricatr)
library(estimatr)

# Simulate data
data <- fabricate(
  N = 1000,
  treat = complete_ra(N = 1000, m = 500),  # Randomly assign exactly 500 to treatment
  education = rbinom(1000, 1, 0.5),  # Binary education (0 = low, 1 = high)
  interviews = round(rnorm(1000, mean = 10, sd = 2) + 
                     4 * treat + 3 * treat * education, digits = 0)  # Heterogeneous treatment effect
)

# Estimate treatment effects using an interaction term
model <- lm_robust(interviews ~ treat * education, data = data)

# Display results
summary(model)
```
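
One way to read the interaction model above: with a binary treatment and a binary moderator, the coefficient on `treat` equals the difference in means among low-education units, and the interaction coefficient equals the gap between the two subgroup effects. The sketch below reproduces that logic with subgroup means in plain Python. It draws its own data, so the numbers will not match the R output, and the helper name `subgroup_effect` is hypothetical.

```python
import random
from statistics import mean

rng = random.Random(385)
n = 1000

# Complete random assignment: exactly 500 treated and 500 control
treat = [1] * (n // 2) + [0] * (n // 2)
rng.shuffle(treat)
education = [rng.randint(0, 1) for _ in range(n)]  # 0 = low, 1 = high

# Same data-generating process as the R code: effect of 4, plus 3 more if highly educated
interviews = [round(rng.gauss(10, 2) + 4 * z + 3 * z * e)
              for z, e in zip(treat, education)]


def subgroup_effect(level):
    """Difference in mean interviews (treated minus control) within one education level."""
    treated = [y for y, z, e in zip(interviews, treat, education) if e == level and z == 1]
    control = [y for y, z, e in zip(interviews, treat, education) if e == level and z == 0]
    return mean(treated) - mean(control)


effect_low = subgroup_effect(0)          # corresponds to the `treat` coefficient
effect_high = subgroup_effect(1)         # corresponds to `treat` + `treat:education`
interaction = effect_high - effect_low   # corresponds to `treat:education`

print(f"Effect, low education:  {effect_low:.2f}")
print(f"Effect, high education: {effect_high:.2f}")
print(f"Difference (interaction): {interaction:.2f}")
```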

10. Why is the publication of null results important in experimental research? What are the main challenges in publishing null results, and how can the scientific community address these challenges?

Publishing null results is crucial in experimental research because it helps prevent publication bias and ensures a more complete and accurate understanding of the phenomenon being studied. When only studies with statistically significant findings are published, the scientific literature becomes skewed, leading to false-positive results appearing more common than they actually are. Null results also help refine theories, guide future research, and prevent wasteful duplication of studies by informing researchers that certain interventions or treatments may not be effective.

However, several challenges hinder the publication of null results. Journals often have a bias toward significant findings, making it harder for researchers to publish studies that fail to reject the null hypothesis. Additionally, researchers themselves may contribute to the file-drawer problem, leaving null results unpublished because they are perceived as uninteresting or career-damaging. To address these challenges, the scientific community can encourage pre-registration of studies, promote results-blind peer review, and establish journals dedicated to publishing null findings. These steps would help create a more transparent and reliable body of scientific knowledge.
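
The distortion created by publishing only significant results can be illustrated with a simulation. Everything here is an assumption made for the sketch: a true effect of 0.2 standard deviations, 50 units per arm, 2,000 independent studies, and a rule that only studies with p < 0.05 get published.

```python
import random
from statistics import NormalDist, mean, variance


def run_study(rng, true_effect=0.2, n_per_arm=50):
    """Simulate one small experiment; return its estimate and two-sided p-value."""
    control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
    treated = [rng.gauss(true_effect, 1) for _ in range(n_per_arm)]
    diff = mean(treated) - mean(control)
    se = (variance(treated) / n_per_arm + variance(control) / n_per_arm) ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, p_value


rng = random.Random(385)
studies = [run_study(rng) for _ in range(2000)]

all_estimates = [d for d, _ in studies]
published = [d for d, p in studies if p < 0.05]  # the "file drawer" hides the rest

print(f"Mean estimate across all studies:     {mean(all_estimates):.3f}")
print(f"Mean estimate across published ones:  {mean(published):.3f}")
print(f"Share of studies published:           {len(published) / len(studies):.2f}")
```

Averaging over all studies should recover something close to the assumed effect of 0.2, while averaging only the significant ones overstates it, which is why access to null results matters for anyone summarising a literature.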

### I was having trouble running the R code in VS, so the Python code is below. I also attached an R file with the correct R code and outputs.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Set seed for reproducibility
np.random.seed(385)

# Simulate dataset with treatment and gender
N = 1000
# Note: this is simple random assignment (independent draws with probability 0.5),
# so the groups are only approximately 500/500, unlike complete_ra() in the R code
treat = np.random.choice([0, 1], size=N, p=[0.5, 0.5])
gender = np.random.choice([0, 1], size=N, p=[0.5, 0.5])  # Binary gender (0 = male, 1 = female)

# Generate outcome variable 'interviews'
interviews = np.round(np.random.normal(loc=10, scale=2, size=N) + 5 * treat + 2 * gender, decimals=0)

# Create DataFrame
data = pd.DataFrame({'treat': treat, 'gender': gender, 'interviews': interviews})

# Estimate Average Treatment Effect (ATE) using OLS regression
model_ate = smf.ols("interviews ~ treat", data=data).fit()
print(model_ate.summary())

# Simulate dataset with heterogeneous treatment effects (education)
education = np.random.choice([0, 1], size=N, p=[0.5, 0.5])  # Binary education (0 = low, 1 = high)
interviews_het = np.round(np.random.normal(loc=10, scale=2, size=N) + 4 * treat + 3 * treat * education, decimals=0)

# Update DataFrame with new education variable and adjusted interviews
data['education'] = education
data['interviews_het'] = interviews_het

# Estimate treatment effects using an interaction term (treat * education)
model_het = smf.ols("interviews_het ~ treat * education", data=data).fit()
print(model_het.summary())
```

                            OLS Regression Results                            
Dep. Variable:             interviews   R-squared:                       0.581
Model:                            OLS   Adj. R-squared:                  0.580
Method:                 Least Squares   F-statistic:                     1382.
Date:                Wed, 12 Feb 2025   Prob (F-statistic):          1.36e-190
Time:                        20:06:59   Log-Likelihood:                -2194.0
No. Observations:                1000   AIC:                             4392.
Df Residuals:                     998   BIC:                             4402.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     10.7495      0.098    110.067      0.0