# Experiment Designs 

<a target="_blank" href="https://colab.research.google.com/github/ignaciomsarmiento/Urban_Slides/tree/main/Lecture14/Notebook_Power.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


## Experiment Design:  Power Analysis

- We want to separate signal from noise.

- Two errors that can be made


<div >
<img src = "figs/matrix_2.png" />
</div>


- Type I error rate ($\alpha$) = probability of rejecting the null hypothesis when the true effect $=$ 0.

- Power (1 - Type II error rate) = probability of rejecting the null hypothesis when the true effect $\ne$ 0.


- In other words, power is the ability to detect an effect given that it exists.

- Power analysis is something we do **before** we run a study.

  - Helps you figure out the sample you need to detect a given effect size.
    
  - Or helps you figure out a minimal detectable difference given a set sample size.
    
  - May help you decide whether to run a study.


### Approaches to power calculation


- Analytical calculations of power
- Simulation


## Analytical calculations of power

- Formula:
  \begin{align*}
  \text{Power} &= \Phi\left(\frac{|\tau| \sqrt{N}}{2\sigma}- \Phi^{-1}(1- \frac{\alpha}{2})\right)
  \end{align*}

- Components:
  - $\Phi$: the standard normal CDF, which is monotonically increasing
  - $\tau$: the effect size
  - $N$: the total sample size (two equal groups of $N/2$)
  - $\sigma$: the standard deviation of the outcome
  - $\alpha$: the significance level (typically 0.05)
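
As a minimal sketch, the formula can be written as a short R function. The name `power_analytic` and the example inputs are illustrative assumptions, not part of the lecture code.

```r
# Analytical power for a two-arm comparison with equal group sizes.
# tau: true effect, N: total sample size, sigma: outcome SD, alpha: test level.
power_analytic <- function(tau, N, sigma = 1, alpha = 0.05) {
  pnorm(abs(tau) * sqrt(N) / (2 * sigma) - qnorm(1 - alpha / 2))
}

# Illustrative inputs (assumed): an effect of one SD and 40 units in total
power_example <- power_analytic(tau = 1, N = 40, sigma = 1)

# For a fixed effect, power rises with the sample size
power_by_n <- power_analytic(tau = 0.5, N = c(20, 40, 80, 160))

print(round(power_example, 3))
print(round(power_by_n, 3))
```

At $\tau = 0$ the function returns $\alpha/2$ rather than $\alpha$, because the formula only counts rejections in the tail that matches the sign of the effect.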

### Power Formula Derivation

**Power = Probability of detecting a true effect**

Under the alternative hypothesis $H_1$, we want to know: "What's the probability our test statistic exceeds the critical value?"



#### Step-by-Step Derivation

**Setup:** Two-sample test comparing treatment vs. control (equal groups of size N/2 each)

1. **Under $H_0$** (no effect):
   $$\text{Test statistic: } T = \frac{\bar{Y}_{\text{treat}} - \bar{Y}_{\text{control}}}{\text{SE}} \sim N(0,1)$$

2. **Under $H_1$** (true effect = $\tau$):
   $$T \sim N\left(\frac{\tau}{\text{SE}}, 1\right)$$
   
   where $\text{SE} = \sqrt{\frac{2\sigma^2}{N/2}} = \frac{2\sigma}{\sqrt{N}}$

3. **Critical value** (two-sided test at level $\alpha$):
   $$c = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$$

---

#### Computing Power

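4. **Power** = P(Reject $H_0$ | $H_1$ is true). Taking $\tau > 0$ and ignoring the negligible chance of rejecting in the lower tail:  
   $$\approx P\left(T > c \mid T \sim N\left(\frac{\tau}{\text{SE}}, 1\right)\right)$$

5. **Standardize** the distribution and use the symmetry of the normal, $P(Z > a) = \Phi(-a)$:
   $$= P\left(Z > c - \frac{\tau}{\text{SE}}\right) = \Phi\left(\frac{\tau}{\text{SE}} - c\right) \text{ where } Z \sim N(0,1)$$

6. **Substitute** $\text{SE} = \frac{2\sigma}{\sqrt{N}}$ and $c = \Phi^{-1}\left(1-\frac{\alpha}{2}\right)$. The same argument with the lower tail covers $\tau < 0$, so the effect enters through $|\tau|$:
   $$\text{Power} = \Phi\left(\frac{|\tau|\sqrt{N}}{2\sigma} - \Phi^{-1}\left(1-\frac{\alpha}{2}\right)\right)$$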
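
Solving the power formula for $N$ or for $|\tau|$ gives the two quantities a power analysis usually reports: the required sample size and the minimal detectable effect. The sketch below does that inversion. The function names `n_required` and `mde` are hypothetical, and the 0.8 power target is just the conventional default.

```r
# Total sample size needed to detect an effect tau with the target power
n_required <- function(tau, sigma = 1, power = 0.8, alpha = 0.05) {
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  (2 * sigma * z_sum / abs(tau))^2
}

# Minimal detectable effect for a given total sample size N
mde <- function(N, sigma = 1, power = 0.8, alpha = 0.05) {
  2 * sigma * (qnorm(1 - alpha / 2) + qnorm(power)) / sqrt(N)
}

# Illustrative inputs (assumed): an effect of 0.5 SD, or a fixed sample of 40 units
n_star <- n_required(tau = 0.5)
mde_40 <- mde(N = 40)

print(ceiling(n_star)) # round up to a whole number of units
print(round(mde_40, 3))
```

Both functions inherit the assumptions of the derivation: equal group sizes, a known common $\sigma$, and a normal test statistic.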



#### Limitations to analytical power calculations

- Only derived for some test statistics (e.g. differences of means)

- Makes specific assumptions about the data-generating process

- Incompatible with more complex designs




## Simulation-based power calculation

- Create dataset and simulate research design.

- Assumptions are necessary for simulation studies, but you make your own.

- For the DeclareDesign approach, see <https://declaredesign.org/>

### Steps


  - Model
  
  - Inquiry
  
  - Data Strategy
     
  - Answer Strategy


#### Practical simulation checklist

1. Encode plausible effect sizes, variances, and correlations that reflect the institutional context (e.g., block-level randomization, intra-cluster correlation).
2. Simulate the full assignment and measurement process so diagnostics include any imbalance, attrition, or transformation you expect in the field.
3. Summarize the simulated estimates with the same estimator you plan to report; power is the share of simulations that reject $H_0$ at your target $\alpha$.
4. Iterate by scaling sample sizes or tightening measurement (adding covariates, stratifying) until the simulated power reaches the desired threshold (often 0.8).
5. Archive the code and assumptions with the pre-analysis plan so reviewers can audit exactly how design choices were justified.
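
The core of the checklist can be sketched in a few lines of base R, without any package. The function `simulate_once`, the effect size, and the number of simulations are assumptions chosen for illustration; a real design would encode the blocking, attrition, and estimator you actually expect.

```r
set.seed(123) # fixed seed so the sketch is reproducible

# One simulated experiment: returns TRUE when H0 is rejected at level alpha.
# N is the total sample size and must be even.
simulate_once <- function(N = 40, tau = 1, sigma = 1, alpha = 0.05) {
  U <- rnorm(N, sd = sigma)               # model: unobserved noise
  Z <- sample(rep(c(0, 1), each = N / 2)) # data strategy: complete randomization
  Y <- tau * Z + U                        # revealed outcome
  fit <- summary(lm(Y ~ Z))               # answer strategy: OLS
  fit$coefficients["Z", "Pr(>|t|)"] < alpha
}

n_sims <- 1000
sim_power <- mean(replicate(n_sims, simulate_once()))
print(sim_power)
```

Simulated power is itself an estimate, so it carries Monte Carlo error that shrinks as `n_sims` grows.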

In [None]:
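# Install pacman only if it is missing, then use it to load DeclareDesign
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
library("pacman")

p_load("DeclareDesign")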

## Simple Design 

### Model

- Models are theoretical abstractions we use to make sense of the world and organize our understanding of it.
- Models describe the units, conditions, and outcomes that define inquiries. 


<div >
    <img src = "figs/figure-6-1.svg" />
</div>

- To assess many properties of a research design we often need to make the leap from nonparametric models to parametric structural causal models. 
- We need to enumerate beliefs about 
    - effect sizes,
    - specific functional forms, 
    - etc.
    
Since any particular choice for these parameters could be close or far from the truth, we will typically consider a range of plausible values for each model parameter.

One possible parametric model is given by the following:

$$
Y = 1 \times Z + U
$$ 



In [None]:
# Model: 40 units, standard normal noise U, and a constant treatment effect of 1
design <- declare_model(
  N = 40,
  U = rnorm(N, sd = 1),
  potential_outcomes(Y ~ 1 * Z + U)
) + NULL

In [None]:
head(draw_data(design),10)

### Inquiry

- An inquiry is a question we ask of the world, and in the same way, of our models of the world. 
- If we stipulate a reference model, then our inquiry is a summary of the model. 
- Suppose in some reference model that $Z$ affects $Y$. Inquiries might be: 
    - Descriptive: what is the average level of $Y$ when $Z=1$ , under the model? 
    - Causal: what is the average treatment effect of $Z$ on $Y$ ? 
    - etc.
    
- Here we define our **estimand**:
    - Estimand: Parameter in the population which is to be estimated in a statistical analysis
    - Estimator: A rule for calculating an estimate of a given quantity based on observed data. Function of the observations, i.e., how observations are put together
    - Estimation: The process of finding an estimate, or approximation.


In [None]:
design <- declare_model(
  N = 40,
  U = rnorm(N, sd = 1),
  potential_outcomes(Y ~ 1 * Z + U)
) +
  # The inquiry is a single number, so average the unit-level effects
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))

### Data strategy

- Depending on the design, the data strategy could include decisions about any or all of the following: 
    - sampling: the procedure for selecting which units will be measured 
    - treatment assignment: the procedure for allocating treatments to sampled units
    - measurement: the procedure for turning information about the sampled units into data.
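
To make the assignment and measurement steps concrete, here is a base R sketch of what they do to the model $Y = 1 \times Z + U$. It is a hand-rolled stand-in for illustration, not the DeclareDesign implementation, and the object names are assumptions.

```r
set.seed(1)
N <- 40
U <- rnorm(N)

# Model: both potential outcomes exist for every unit
Y_Z_0 <- U     # outcome if untreated
Y_Z_1 <- 1 + U # outcome if treated

# Assignment: complete randomization puts exactly N / 2 units in treatment
Z <- sample(rep(c(0, 1), each = N / 2))

# Measurement: only the potential outcome matching the assignment is observed
Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)

dat <- data.frame(U, Y_Z_0, Y_Z_1, Z, Y)
print(head(dat))
```

The analyst only ever sees `Z` and `Y`; the potential outcome columns are available here because the data are simulated.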
 


In [None]:
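design <- declare_model(
  N = 40,
  U = rnorm(N, sd = 1),
  potential_outcomes(Y ~ 1 * Z + U)
) +
  # The inquiry is a single number, so average the unit-level effects
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  # Complete random assignment: exactly half of the units are treated
  declare_assignment(Z = complete_ra(N, prob = 0.5)) +
  # Reveal the potential outcome that matches each unit's assignment
  declare_measurement(Y = reveal_outcomes(Y ~ Z))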

head(draw_data(design),10)

### Answer Strategy

- The answer strategy is what we use to summarize the data produced by the data strategy.
- Just like the inquiry summarizes a part of the model, the answer strategy summarizes a part of the data.
- Answer strategies are functions that take in data and return answers, e.g. `lm`.

#### Estimand vs. estimator vs. estimate

- **Estimand (ATE)**: the population quantity implied by the model. In notation,
  \begin{equation*}\text{ATE} = \mathbb{E}[Y(1)-Y(0)] = \frac{1}{N} \sum_{i=1}^{N} \left( Y_i(1) - Y_i(0) \right).\end{equation*}

  In DeclareDesign we encode this with `declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))`.
- **Estimator (OLS coefficient on $Z$)**: a mapping from observed data to a number. With the specification

  \begin{equation*} Y_i = \beta_0 + \beta_1 Z_i + \varepsilon_i, \end{equation*}
  
  the estimator for the ATE is the closed-form OLS coefficient
  
  \begin{equation*} \hat{\beta}_1 = \frac{\sum_i (Z_i-\bar{Z})(Y_i-\bar{Y})}{\sum_i (Z_i-\bar{Z})^2}, \end{equation*}

  which is implemented through `declare_estimator(Y ~ Z, .method = lm)`.
- **Estimate**: the realized value of the estimator in one dataset. In the regression output, the `estimate` column reports, for example, `1.13` as the estimated ATE, accompanied by its standard error, test statistic, and confidence interval.

Together: the estimand is the theoretical ATE, the estimator is the linear model rule that targets it, and the estimate is the numeric result produced by applying that rule to simulated or observed data.
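
A quick base R check, offered as an illustrative sketch with assumed inputs, confirms that the closed-form expression for $\hat{\beta}_1$ matches what `lm` returns. With a binary treatment both also equal the difference in group means.

```r
set.seed(42)
N <- 40
Z <- sample(rep(c(0, 1), each = N / 2)) # complete random assignment
Y <- 1 * Z + rnorm(N)                   # true ATE of 1 plus noise

# Estimator written out by hand
beta_closed <- sum((Z - mean(Z)) * (Y - mean(Y))) / sum((Z - mean(Z))^2)

# Same estimator through lm, and as a difference in means
beta_lm <- unname(coef(lm(Y ~ Z))["Z"])
diff_means <- mean(Y[Z == 1]) - mean(Y[Z == 0])

print(c(closed_form = beta_closed, lm = beta_lm, diff_in_means = diff_means))
```

The three numbers agree with each other but differ from the estimand of 1, which is the gap between an estimate and the quantity it targets.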


In [None]:
design<- declare_model(
            N=40,
            U=rnorm(N,sd=1),
            potential_outcomes(Y~ 1*Z +U )) +
            declare_inquiry(ATE=mean(Y_Z_1-Y_Z_0)) + 
            declare_assignment(Z=complete_ra(N,prob=0.5)) +
            declare_measurement(Y=reveal_outcomes(Y~Z))+
            declare_estimator(Y~Z,.method=lm)

In [None]:
draw_estimates(design)

## Diagnosis

- Once a design is declared in code, diagnosing it is usually the easy part. 
- `diagnose_design` handles almost everything.

In [None]:
diagnose_design(design,sims=30)
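
To see what a diagnosis summarizes, the sketch below computes four common diagnosands (bias, RMSE, power, coverage) by hand in base R. It mirrors the simple design above under assumed settings and is not how `diagnose_design` is implemented internally.

```r
set.seed(2024)
true_ate <- 1

# One simulated study: the estimate, its p-value, and whether the 95% CI covers the truth
one_run <- function(N = 40) {
  Z <- sample(rep(c(0, 1), each = N / 2))
  Y <- true_ate * Z + rnorm(N)
  fit <- lm(Y ~ Z)
  ci <- confint(fit)["Z", ]
  c(
    estimate = unname(coef(fit)["Z"]),
    p_value = summary(fit)$coefficients["Z", "Pr(>|t|)"],
    covered = as.numeric(ci[1] <= true_ate && true_ate <= ci[2])
  )
}

runs <- as.data.frame(t(replicate(500, one_run())))

diagnosands <- c(
  bias = mean(runs$estimate - true_ate),
  rmse = sqrt(mean((runs$estimate - true_ate)^2)),
  power = mean(runs$p_value < 0.05),
  coverage = mean(runs$covered)
)
print(round(diagnosands, 3))
```

With only 30 simulations, as in the cell above, these summaries are noisy; raising the number of simulations tightens them at the cost of run time.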

### Redesign

- More often, you'll vary a design over one or more parameters with `redesign` to assess how its properties change.
- Any quantity that you define in the global environment and use in a declaration step can become a parameter and then be altered via `redesign`.

In [None]:
# Define default values first
N <- 100  # or whatever default you want
ates <- 1

# Now create your design using those variables
design4 <- declare_model(
            N=N,
            U=rnorm(N,sd=1),
            ates=ates,  # now references the variable ates
            potential_outcomes(Y~ ates*Z +U )) + 
        declare_inquiry(ATE=mean(Y_Z_1-Y_Z_0)) +
        declare_assignment(Z=complete_ra(N,prob=0.5)) + 
        declare_measurement(Y=reveal_outcomes(Y~Z)) +
        declare_estimator(Y~Z,term="Z", .method=lm)




In [None]:
multiple_design <- redesign(design4, ates=c(0.5,1,1.5), N=c(20,40,50))
diagnose_design(multiple_design, sims=30)
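
One way to sanity-check the simulated grid is to compare it with the analytical benchmark for the same combinations of sample size and effect. This is a sketch that assumes $\sigma = 1$ and $\alpha = 0.05$, matching the declared model; the column name `power_analytic` is an illustrative choice.

```r
# Same grid of sample sizes and effects as in the redesign call
grid <- expand.grid(N = c(20, 40, 50), ates = c(0.5, 1, 1.5))

# Analytical power with sigma = 1 and a two-sided test at the 5% level
grid$power_analytic <- pnorm(abs(grid$ates) * sqrt(grid$N) / 2 - qnorm(0.975))

print(grid)
```

Large gaps between these values and the simulated power usually point to too few simulations rather than to a problem with the design.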

## Simple discrimination design

- Consider White, Nathan, and Faller (2015), who measure discrimination against Latinos by election officials by testing whether officials are equally likely to respond to emailed requests for information from Latino and White voters.

- Discriminators are defined by their behavior: they would respond to the White voter but not to the Latino voter. 

- We imagine three types of election officials: those who would always respond to the request (regardless of the emailer’s ethnicity), those who would never respond to the request (again regardless of the emailer’s ethnicity), and officials who discriminate against Latinos. 

| Type                      | $Y_i(Z_i = \text{White})$ | $Y_i(Z_i = \text{Latino})$ |
|---------------------------|---------------------------|----------------------------|
| Always-responder          | 1            | 1             |
| Anti-Latino discriminator | 1            | 0             |
| Never-responder           | 0            | 0             |

The inquiry here is descriptive: *the fraction of the sample that discriminates*, that is, the mean of an indicator for being an anti-Latino discriminator:
$$
\mathbb{E}[\mathbb{1}(\textrm{Type}_i = \textrm{Anti-Latino}~\textrm{discriminator})]
$$
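
The table of types can be encoded directly. In the sketch below the type shares are made up for illustration and are not taken from the study. It shows that, when only these three types exist, the share of discriminators equals the average effect of a White (versus Latino) sender on response.

```r
set.seed(7)
N <- 1000

# Hypothetical type shares (illustrative only)
type <- sample(
  c("always", "discriminator", "never"),
  size = N, replace = TRUE, prob = c(0.6, 0.1, 0.3)
)

# Potential outcomes implied by the table of types
Y_white <- as.numeric(type %in% c("always", "discriminator"))
Y_latino <- as.numeric(type == "always")

share_discriminators <- mean(type == "discriminator")
ate_white_vs_latino <- mean(Y_white - Y_latino)

print(c(share = share_discriminators, ate = ate_white_vs_latino))
```

The equality would break if a fourth type existed that responds to Latino but not to White voters, since its effects would offset those of the discriminators.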

## Simple design consistent with [Christensen et al. (2021)](https://www.nber.org/system/files/working_papers/w29516/w29516.pdf)

<div >
    <img src = "figs/christensen1.png" style="width:600px;height:400px;"/>
</div>

In [None]:
# Panel of 500 listings observed on 2 days (listings crossed with days)
panels <- fabricate(
  listings = add_level(N = 500, listing_fe = runif(N, -.1, .1)),
  days = add_level(N = 2, day_shock = runif(N, -.05, .05), nest = FALSE),
  obs = cross_levels(
    by = join_using(listings, days),
    U = rnorm(N, 0, .1),
    epsilon = listing_fe + day_shock + U
  )
)

library(tidyverse) # provides %>% and arrange()
panels %>% arrange(listings)

<div >
    <img src = "figs/christensen2.png" style="height:100px;"/>
</div>

In [None]:
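# Clamp the response probability to [0, 1]: epsilon can occasionally push it above 1,
# and rbinom() returns NA for an invalid probability
design0 <- declare_model(
  panels,
  potential_outcomes(Y ~ rbinom(n = N, size = 1, prob = pmin(pmax(0.6 - 0.05 * Z + epsilon, 0), 1)))
) + NULL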

head(draw_data(design0) %>% arrange(listings),20)

In [None]:
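design0 <- declare_model(
  panels,
  # Clamp the probability to [0, 1] so rbinom() never receives an invalid value
  potential_outcomes(Y ~ rbinom(n = N, size = 1, prob = pmin(pmax(0.6 - 0.05 * Z + epsilon, 0), 1)))
) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  # Randomize within listing: one of the two days is treated
  declare_assignment(Z = block_ra(blocks = listings)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, term = "Z", .method = lm, label = "OLS") +
  declare_estimator(Y ~ Z + factor(listings) + factor(days), term = "Z", .method = lm, label = "FE") +
  # The logit coefficient is on the log-odds scale, so it is not directly comparable to the ATE
  declare_estimator(Y ~ Z, term = "Z", .method = glm, family = "binomial", label = "Logit")
diagnose_design(design0, sims = 30)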
#head(draw_data(design0) %>% arrange(listings),20)

In [None]:
p_load("margins") # for margins
p_load("broom") # for tidy

tidy_margins <- function(x) {
  tidy(margins(x, data = x$data), conf.int = TRUE)
}
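
The reason for converting the logit fit to marginal effects is that its raw coefficient is on the log-odds scale, while the ATE is a difference in probabilities. The base R sketch below, with assumed inputs and without the `margins` package, computes the average marginal effect by hand and compares it with the OLS coefficient.

```r
set.seed(99)
N <- 1000
Z <- sample(rep(c(0, 1), each = N / 2))
Y <- rbinom(N, size = 1, prob = 0.6 - 0.05 * Z) # true effect of -0.05 on the probability scale

logit_fit <- glm(Y ~ Z, family = binomial)
logit_coef <- unname(coef(logit_fit)["Z"]) # log-odds scale

# Average marginal effect: predicted probability at Z = 1 minus at Z = 0
p1 <- predict(logit_fit, newdata = data.frame(Z = 1), type = "response")
p0 <- predict(logit_fit, newdata = data.frame(Z = 0), type = "response")
ame <- unname(p1 - p0) # probability scale

ols_coef <- unname(coef(lm(Y ~ Z))["Z"])

print(round(c(logit_coef = logit_coef, ame = ame, ols = ols_coef), 4))
```

With a single binary regressor the marginal effect and the OLS coefficient coincide; with covariates they can differ, which is when a helper such as `tidy_margins` earns its keep.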

In [None]:
design1 <- declare_model(
  panels,
  # Clamp the probability to [0, 1] so rbinom() never receives an invalid value
  potential_outcomes(Y ~ rbinom(n = N, size = 1, prob = pmin(pmax(0.6 - 0.05 * Z + epsilon, 0), 1)))
) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = block_ra(blocks = listings)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z + factor(listings) + factor(days), term = "Z", .method = lm, label = "FE") +
  declare_estimator(Y ~ Z, term = "Z", .method = lm, label = "OLS") +
  # Report the logit as an average marginal effect so it is on the same scale as the ATE
  declare_estimator(Y ~ Z, term = "Z", .method = glm, family = "binomial", label = "Logit", .summary = tidy_margins)



diagnose_design(design1,sims=30)

## Simple design consistent with [Christensen et al. (2022)](https://direct.mit.edu/rest/article-abstract/104/4/807/97712/Housing-Discrimination-and-the-Toxics-Exposure-Gap?redirectedFrom=fulltext)

<div >
    <img src = "figs/fig4_ba.png" style="height:500px;"/>
</div>

In [None]:
panels <- fabricate(
  listings = add_level(N = 500, listing_fe = runif(N, -.1, .1), more_than_mile = rbinom(N,size=1,prob=0.3)),
  days = add_level(N = 2, day_shock = runif(N, -.05, .05), nest = FALSE),
  obs = cross_levels(
    by = join_using(listings, days),
    U =rnorm(N, 0, .01),
    epsilon = listing_fe + day_shock + U
  )
)


In [None]:
design0<-declare_model(panels,
                       potential_outcomes(Y ~ rbinom(n = N, size = 1, prob = 0.4-0.1* Z -0.08*Z*more_than_mile+epsilon)))+
  declare_inquiry(CATE_X1 =mean(Y_Z_1[more_than_mile == 1] - Y_Z_0[ more_than_mile== 1])  ,
                  CATE_X0 = mean(Y_Z_1[more_than_mile == 0] - Y_Z_0[more_than_mile == 0]),
                  diff_in_CATEs = CATE_X1- CATE_X0) +
  declare_assignment(Z=block_ra(blocks=listings)) + 
  declare_measurement(Y=reveal_outcomes(Y~Z))+
  declare_estimator(Y~Z + more_than_mile + Z * more_than_mile,term="Z", .method=lm, label="within_mile", inquiry="CATE_X0") +
  declare_estimator(Y ~ Z + more_than_mile + Z * more_than_mile, 
                    .method=lm,
                    term = "Z:more_than_mile", 
                    inquiry = "diff_in_CATEs")


diagnose_design(design0,sims=30)
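
The mapping between regression terms and inquiries in this design can be checked by hand. In the base R sketch below, `X` is a stand-in for `more_than_mile` and all inputs are assumptions. In the interacted model, the coefficient on `Z` is the difference in means when `X = 0`, and the interaction coefficient is the difference between the two subgroup differences.

```r
set.seed(11)
N <- 2000
X <- rbinom(N, size = 1, prob = 0.3) # stand-in for more_than_mile
Z <- rbinom(N, size = 1, prob = 0.5) # simple random assignment for this sketch
Y <- rbinom(N, size = 1, prob = 0.4 - 0.1 * Z - 0.08 * Z * X)

fit <- lm(Y ~ Z * X)

# Subgroup differences in means
dim_x0 <- mean(Y[Z == 1 & X == 0]) - mean(Y[Z == 0 & X == 0])
dim_x1 <- mean(Y[Z == 1 & X == 1]) - mean(Y[Z == 0 & X == 1])

print(round(c(
  coef_Z = unname(coef(fit)["Z"]), dim_x0 = dim_x0,
  coef_ZX = unname(coef(fit)["Z:X"]), diff_in_dims = dim_x1 - dim_x0
), 4))
```

Interaction estimates are typically much noisier than main effects, so the power for `diff_in_CATEs` is the diagnosand to watch in this design.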

## Key takeaways for the DeclareDesign workflow

- DeclareDesign forces us to make each component explicit: model, inquiry, data strategy, and answer strategy.
- Analytical formulas are fast, but simulations let us accommodate spillovers, heterogeneous treatment effects, and clustered assignments common in urban economics projects.
- Diagnosing designs early reveals whether we need more units, better measures, or different estimators before committing resources to data collection.
- Reproducing the diagnosis (set seeds, save scripts) keeps the dialogue between analysts and decision-makers focused on evidence rather than guesswork.