# Experiment Designs 

## Experiment Design: Power Analysis

- We want to separate signal from noise.

- Two errors that can be made


<div >
<img src = "figures/matrix_2.png" />
</div>


- Type I error rate = probability of rejecting the null hypothesis when the true effect $=$ 0.

- Power (1 - Type II error rate) = probability of rejecting the null hypothesis when the true effect $\ne$ 0.


- In other words, power is the ability to detect an effect given that it exists.

- Power analysis is something we do **before** we run a study.

  
    - Helps you figure out the sample size you need to detect a given effect size.
    
    - Or helps you figure out a minimal detectable difference given a set sample size.
    
    - May help you decide whether to run a study.


### Approaches to power calculation


- Analytical calculations of power
- Simulation


## Analytical calculations of power

- Formula:
  \begin{align*}
  \text{Power} &= \Phi\left(\frac{|\tau| \sqrt{N}}{2\sigma}- \Phi^{-1}(1- \frac{\alpha}{2})\right)
  \end{align*}

- Components:
  - $\Phi$: the standard normal CDF, which is monotonically increasing (so power rises with $|\tau|$ and $N$ and falls with $\sigma$)
  - $\tau$: the effect size
  - $N$: the sample size
  - $\sigma$: the standard deviation of the outcome
  - $\alpha$: the significance level (typically 0.05)
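
The formula above can be evaluated directly in base R with `pnorm` ($\Phi$) and `qnorm` ($\Phi^{-1}$). The sketch below is only an illustration: the function names `power_analytic`, `mde_analytic`, and `n_required` are hypothetical helpers, and the last two simply invert the same formula to get a minimal detectable effect or a required sample size for a target power.

```r
# Analytical power for a two-arm difference in means (formula above).
# All function names here are hypothetical helpers, not part of any package.
power_analytic <- function(tau, N, sigma, alpha = 0.05) {
  pnorm(abs(tau) * sqrt(N) / (2 * sigma) - qnorm(1 - alpha / 2))
}

# Invert the formula for |tau|: smallest effect detectable with the target power.
mde_analytic <- function(N, sigma, power = 0.8, alpha = 0.05) {
  2 * sigma * (qnorm(1 - alpha / 2) + qnorm(power)) / sqrt(N)
}

# Invert the formula for N: sample size needed to reach the target power.
n_required <- function(tau, sigma, power = 0.8, alpha = 0.05) {
  ceiling((2 * sigma * (qnorm(1 - alpha / 2) + qnorm(power)) / abs(tau))^2)
}

# Example values (assumed for illustration): effect 1, sd 1, 40 units.
power_40 <- power_analytic(tau = 1, N = 40, sigma = 1)
mde_40   <- mde_analytic(N = 40, sigma = 1)
n_80     <- n_required(tau = 1, sigma = 1)

print(c(power = power_40, mde = mde_40, n = n_80))
```

With an effect of 1, a standard deviation of 1, and 40 units, the formula gives power of roughly 0.89.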


### Limitations to analytical power calculations

- Only derived for some test statistics (differences of means)

- Makes specific assumptions about the data-generating process

- Incompatible with more complex designs




## Simulation-based power calculation

- Create a dataset and simulate the research design many times.

- Simulation still requires assumptions, but you choose them yourself instead of inheriting them from a formula.

- For the DeclareDesign approach, see <https://declaredesign.org/>
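
Before turning to DeclareDesign, the idea can be sketched in base R alone. This is a minimal illustration under assumed values (40 units, a true effect of 1, standard normal noise, 500 simulations); `simulate_once` is a hypothetical helper. Power is the share of simulated studies in which the null is rejected, and running the same loop with a true effect of 0 gives the Type I error rate.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# One simulated study: assign treatment, generate outcomes, test the effect.
simulate_once <- function(N = 40, tau = 1, sigma = 1) {
  Z <- sample(rep(c(0, 1), length.out = N))  # half treated, random order
  U <- rnorm(N, sd = sigma)                  # unobserved noise
  Y <- tau * Z + U                           # observed outcome
  fit <- lm(Y ~ Z)
  summary(fit)$coefficients["Z", "Pr(>|t|)"] # p-value on the treatment term
}

n_sims <- 500

# Power: share of rejections when the true effect is 1.
p_effect  <- replicate(n_sims, simulate_once(tau = 1))
power_sim <- mean(p_effect < 0.05)

# Type I error rate: share of rejections when the true effect is 0.
p_null    <- replicate(n_sims, simulate_once(tau = 0))
type1_sim <- mean(p_null < 0.05)

print(c(power = power_sim, type_I = type1_sim))
```

Because the estimate comes from a finite number of simulations, it carries simulation error; more simulations tighten it.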

### Steps


  - Model
  
  - Inquiry
  
  - Data Strategy
     
  - Answer Strategy


In [1]:
require("DeclareDesign")

Loading required package: DeclareDesign

Loading required package: randomizr

Loading required package: fabricatr

Loading required package: estimatr



## Simple Design 

### Model

- Models are theoretical abstractions we use to make sense of the world and organize our understanding of it.
- Models describe the units, conditions, and outcomes that define inquiries. 


<div >
    <img src = "figures/figure-6-1.svg" />
</div>

- To assess many properties of a research design we often need to make the leap from nonparametric models to parametric structural causal models. 
- We need to enumerate beliefs about 
    - effect sizes,
    - specific functional forms, 
    - etc.
    
Since any particular choice for these parameters could be close or far from the truth, we will typically consider a range of plausible values for each model parameter.

One possible parametric model is given by the following:

$$
Y = 1 \times Z + U
$$ 



In [5]:
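# Model: 40 units, standard normal noise U, and a constant treatment effect of 1
design <- declare_model(
  N = 40,
  U = rnorm(N, sd = 1),
  potential_outcomes(Y ~ 1 * Z + U)
) + NULL  # adding NULL turns the single step into a design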

In [6]:
head(draw_data(design),10)

|    | ID      | U           | Y_Z_0       | Y_Z_1      |
|----|---------|-------------|-------------|------------|
|    | `<chr>` | `<dbl>`     | `<dbl>`     | `<dbl>`    |
| 1  | 1       | 1.05472849  | 1.05472849  | 2.0547285  |
| 2  | 2       | 0.62283787  | 0.62283787  | 1.6228379  |
| 3  | 3       | 0.81430292  | 0.81430292  | 1.8143029  |
| 4  | 4       | -0.01589005 | -0.01589005 | 0.98411    |
| 5  | 5       | -1.15118771 | -1.15118771 | -0.1511877 |
| 6  | 6       | -0.47855506 | -0.47855506 | 0.5214449  |
| 7  | 7       | -0.23671762 | -0.23671762 | 0.7632824  |
| 8  | 8       | 1.18447559  | 1.18447559  | 2.1844756  |
| 9  | 9       | -0.70920306 | -0.70920306 | 0.2907969  |
| 10 | 10      | 0.8654024   | 0.8654024   | 1.8654024  |


### Inquiry

- An inquiry is a question we ask of the world, and in the same way, of our models of the world. 
- If we stipulate a reference model, then our inquiry is a summary of the model. 
- Suppose in some reference model that $Z$ affects $Y$. Inquiries might be: 
    - Descriptive: what is the average level of $Y$ when $Z=1$ , under the model? 
    - Causal: what is the average treatment effect of $Z$ on $Y$ ? 
    - etc.
    
- Here we define our **estimand**:
    - Estimand: the parameter in the population that is to be estimated in a statistical analysis.
    - Estimator: a rule for calculating an estimate of a given quantity from observed data; a function of the observations, i.e., how the observations are put together.
    - Estimation: the process of finding an estimate, or approximation.
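
The distinction can be made concrete with a small base R sketch. It assumes the same toy model used in these notes (40 units, a constant effect of 1); the names `estimand`, `estimator`, and `estimate` are illustrative only. The estimand is computed from both potential outcomes, which only a simulation can see, whereas the estimator uses only the observed data.

```r
set.seed(1)  # arbitrary seed

N <- 40
U <- rnorm(N)

# Potential outcomes under control and treatment (constant effect of 1).
Y_Z_0 <- U
Y_Z_1 <- 1 + U

# Estimand: the average treatment effect, defined on the potential outcomes.
estimand <- mean(Y_Z_1 - Y_Z_0)

# Data we would actually observe: one potential outcome per unit.
Z <- sample(rep(c(0, 1), each = N / 2))
Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)

# Estimator: a rule that maps observed data to a number (difference in means).
estimator <- function(y, z) mean(y[z == 1]) - mean(y[z == 0])

# Estimate: the number the estimator returns for this particular dataset.
estimate <- estimator(Y, Z)

print(c(estimand = estimand, estimate = estimate))
```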


In [7]:
design <- declare_model(
            N = 40,
            U = rnorm(N, sd = 1),
            potential_outcomes(Y ~ 1 * Z + U)) +
            # The ATE is the mean of the unit-level effects, so it is a single number
            declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))

### Data strategy

- Depending on the design, the data strategy could include decisions about any or all of the following: 
    - sampling: the procedure for selecting which units will be measured
    - treatment assignment: the procedure for allocating treatments to sampled units
    - measurement: the procedure for turning information about the sampled units into data
 


In [8]:
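design <- declare_model(
            N = 40,
            U = rnorm(N, sd = 1),
            potential_outcomes(Y ~ 1 * Z + U)) +
            declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
            # Assign exactly half of the units to treatment
            declare_assignment(Z = complete_ra(N, prob = 0.5)) +
            # Reveal the potential outcome that matches each unit's assignment
            declare_measurement(Y = reveal_outcomes(Y ~ Z))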

head(draw_data(design),10)

|    | ID      | U           | Y_Z_0       | Y_Z_1      | Z       | Y          |
|----|---------|-------------|-------------|------------|---------|------------|
|    | `<chr>` | `<dbl>`     | `<dbl>`     | `<dbl>`    | `<int>` | `<dbl>`    |
| 1  | 1       | 1.8592217   | 1.8592217   | 2.8592217  | 0       | 1.8592217  |
| 2  | 2       | -0.5074531  | -0.5074531  | 0.4925469  | 1       | 0.4925469  |
| 3  | 3       | 1.77076982  | 1.77076982  | 2.7707698  | 0       | 1.7707698  |
| 4  | 4       | 2.24330975  | 2.24330975  | 3.2433098  | 1       | 3.2433098  |
| 5  | 5       | 0.06768167  | 0.06768167  | 1.0676817  | 1       | 1.0676817  |
| 6  | 6       | 0.86720465  | 0.86720465  | 1.8672046  | 0       | 0.8672046  |
| 7  | 7       | 0.54163411  | 0.54163411  | 1.5416341  | 0       | 0.5416341  |
| 8  | 8       | -1.30752463 | -1.30752463 | -0.3075246 | 1       | -0.3075246 |
| 9  | 9       | 0.39134289  | 0.39134289  | 1.3913429  | 1       | 1.3913429  |
| 10 | 10      | -0.27979642 | -0.27979642 | 0.7202036  | 0       | -0.2797964 |


### Answer Strategy

- The answer strategy is what we use to summarize the data produced by the data strategy.
- Just like the inquiry summarizes a part of the model, the answer strategy summarizes a part of the data. 
- Answer strategies are functions that take in data and return answers, e.g. `lm`

In [13]:
design<- declare_model(
            N=40,
            U=rnorm(N,sd=1),
            potential_outcomes(Y~ 1*Z +U )) +
            declare_inquiry(ATE=mean(Y_Z_1-Y_Z_0)) + 
            declare_assignment(Z=complete_ra(N,prob=0.5)) +
            declare_measurement(Y=reveal_outcomes(Y~Z))+
            declare_estimator(Y~Z,.method=lm)

In [14]:
draw_estimates(design)

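| estimator | term    | estimate  | std.error | statistic | p.value    | conf.low  | conf.high |
|-----------|---------|-----------|-----------|-----------|------------|-----------|-----------|
| `<chr>`   | `<chr>` | `<dbl>`   | `<dbl>`   | `<dbl>`   | `<dbl>`    | `<dbl>`   | `<dbl>`   |
| estimator | Z       | 0.7622985 | 0.3094354 | 2.463514  | 0.01839941 | 0.1358793 | 1.388718  |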


## Diagnosis

- Once a design is declared in code, diagnosing it is usually the easy part. 
- `diagnose_design` handles almost everything.

In [15]:
diagnose_design(design,sims=30)


Research design diagnosis based on 30 simulations. Diagnosis completed in 1 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

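| Design | Inquiry | Estimator | Term | N Sims | Mean Estimand | Mean Estimate | Bias   | SD Estimate | RMSE   | Power  | Coverage |
|--------|---------|-----------|------|--------|---------------|---------------|--------|-------------|--------|--------|----------|
| design | ATE     | estimator | Z    | 30     | 1.00          | 1.03          | 0.03   | 0.38        | 0.38   | 0.90   | 0.90     |
|        |         |           |      |        | (0.00)        | (0.06)        | (0.06) | (0.04)      | (0.04) | (0.06) | (0.05)   |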

### Redesign

- More often, you’ll vary designs over a parameter with redesign to assess your experimental design. 
- Any quantity that you define in the global environment and use in a declaration step becomes a parameter that `redesign` can then alter.

In [21]:


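# Default parameter values; they must exist before the design is declared,
# and redesign() overrides them afterwards
sample_size <- 40
ates <- 1

design4 <- declare_model(
            N = sample_size,
            U = rnorm(N, sd = 1),
            potential_outcomes(Y ~ ates * Z + U)) +
        declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
        declare_assignment(Z = complete_ra(N, prob = 0.5)) +
        declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
        declare_estimator(Y ~ Z, term = "Z", .method = lm)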

In [22]:
# Cross three effect sizes with three sample sizes (nine designs in total)
multiple_designs <- redesign(design4,
                             ates = c(0.5, 1, 1.5),
                             sample_size = c(20, 40, 50))

diagnose_design(multiple_designs, sims = 30)


Research design diagnosis based on 30 simulations. Diagnosis completed in 4 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

   Design ates sample_size Inquiry Estimator Term N Sims Mean Estimand
 design_1  0.5          20     ATE estimator    Z     30          0.50
                                                                (0.00)
 design_2    1          20     ATE estimator    Z     30          1.00
                                                                (0.00)
 design_3  1.5          20     ATE estimator    Z     30          1.50
                                                                (0.00)
 design_4  0.5          40     ATE estimator    Z     30          0.50
                                                                (0.00)
 design_5    1          40     ATE estimator    Z     30          1.00
                                                                (0.00)
 design_6  1.5          40     ATE estimator   

## Simple discrimination design

- Consider White, Nathan, and Faller (2015), which seeks to measure discrimination against Latinos by election officials through assessing whether election officials respond to emailed requests for information from Latino or White voters.

- Discriminators are defined by their behavior: they would respond to the White voter but not to the Latino voter. 

- We imagine three types of election officials: those who would always respond to the request (regardless of the emailer’s ethnicity), those who would never respond to the request (again regardless of the emailer’s ethnicity), and officials who discriminate against Latinos. 

| Type                      | Yi(Zi=White) | Yi(Zi=Latino) |
|---------------------------|--------------|---------------|
| Always-responder          | 1            | 1             |
| Anti-Latino discriminator | 1            | 0             |
| Never-responder           | 0            | 0             |

The inquiry here is descriptive, *the fraction of the sample that discriminates*:
$$
\mathbb{E}\left[\mathbb{1}(\textrm{Type}_i = \textrm{Anti-Latino discriminator})\right]
$$
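
A short base R sketch shows why this descriptive inquiry can be studied with an experiment. It assumes the three types in the table and made-up type shares (70% always-responders, 10% discriminators, 20% never-responders, chosen only for illustration). Under these types, the share of discriminators equals the average difference between the two potential outcomes.

```r
set.seed(7)  # arbitrary seed

n_officials <- 1000

# Hypothetical type shares, assumed only for illustration.
type <- sample(
  c("always", "discriminator", "never"),
  size = n_officials, replace = TRUE,
  prob = c(0.7, 0.1, 0.2)
)

# Potential outcomes implied by the type table.
Y_white  <- as.integer(type %in% c("always", "discriminator"))
Y_latino <- as.integer(type == "always")

# Descriptive inquiry: share of officials who discriminate.
share_discriminators <- mean(type == "discriminator")

# Causal quantity: average effect of a White (vs. Latino) sender on responding.
ate_white_vs_latino <- mean(Y_white - Y_latino)

print(c(share = share_discriminators, ate = ate_white_vs_latino))
```

The equality relies on there being no officials who respond only to Latino voters; with such a fourth type the average difference would understate the share of anti-Latino discriminators.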

## Simple design consistent with [Christensen et al. (2021)](https://www.nber.org/system/files/working_papers/w29516/w29516.pdf)

<div >
    <img src = "figures/christensen1.png" style="width:600px;height:400px;"/>
</div>

In [27]:
# Panel of 500 listings observed on 2 days, with listing and day effects
panels <- fabricate(
  listings = add_level(N = 500, listing_fe = runif(N, -.1, .1)),
  days = add_level(N = 2, day_shock = runif(N, -.05, .05), nest = FALSE),
  obs = cross_levels(
    by = join_using(listings, days),
    U = rnorm(N, 0, .1),
    epsilon = listing_fe + day_shock + U
  )
)

require(tidyverse)
panels  %>% arrange(listings)

Loading required package: tidyverse


| listings | listing_fe   | days    | day_shock    | obs     | U             | epsilon      |
|----------|--------------|---------|--------------|---------|---------------|--------------|
| `<chr>`  | `<dbl>`      | `<chr>` | `<dbl>`      | `<chr>` | `<dbl>`       | `<dbl>`      |
| 001      | 0.014948195  | 1       | 0.009095622  | 0001    | -0.1246741526 | -0.100630336 |
| 001      | 0.014948195  | 2       | -0.006322454 | 0501    | 0.0987677936  | 0.107393535  |
| 002      | -0.019737996 | 1       | 0.009095622  | 0002    | 0.0129913898  | 0.002349016  |
| 002      | -0.019737996 | 2       | -0.006322454 | 0502    | 0.0234452738  | -0.002615176 |
| 003      | -0.025902949 | 1       | 0.009095622  | 0003    | -0.0704187723 | -0.087226099 |
| 003      | -0.025902949 | 2       | -0.006322454 | 0503    | 0.0240060442  | -0.008219359 |
| 004      | -0.063815766 | 1       | 0.009095622  | 0004    | 0.1118403129  | 0.057120169  |
| 004      | -0.063815766 | 2       | -0.006322454 | 0504    | 0.0412480287  | -0.028890191 |
| 005      | -0.064356577 | 1       | 0.009095622  | 0005    | -0.0811283660 | -0.136389321 |
| 005      | -0.064356577 | 2       | -0.006322454 | 0505    | -0.0508824191 | -0.121561450 |


<div >
    <img src = "figures/christensen2.png" style="height:100px;"/>
</div>

In [29]:
design0<-declare_model(panels,
                       potential_outcomes(Y ~ rbinom(n = N, size = 1, prob = 0.6-0.05* Z+epsilon)))+ NULL

head(draw_data(design0) %>% arrange(listings),20)

|    | listings | listing_fe  | days    | day_shock    | obs     | U           | epsilon      | Y_Z_0   | Y_Z_1   |
|----|----------|-------------|---------|--------------|---------|-------------|--------------|---------|---------|
|    | `<chr>`  | `<dbl>`     | `<chr>` | `<dbl>`      | `<chr>` | `<dbl>`     | `<dbl>`      | `<int>` | `<int>` |
| 1  | 1        | 0.01494819  | 1       | 0.009095622  | 1       | -0.12467415 | -0.100630336 | 0       | 0       |
| 2  | 1        | 0.01494819  | 2       | -0.006322454 | 501     | 0.09876779  | 0.107393535  | 0       | 1       |
| 3  | 2        | -0.019738   | 1       | 0.009095622  | 2       | 0.01299139  | 0.002349016  | 1       | 1       |
| 4  | 2        | -0.019738   | 2       | -0.006322454 | 502     | 0.02344527  | -0.002615176 | 1       | 0       |
| 5  | 3        | -0.02590295 | 1       | 0.009095622  | 3       | -0.07041877 | -0.087226099 | 1       | 1       |
| 6  | 3        | -0.02590295 | 2       | -0.006322454 | 503     | 0.02400604  | -0.008219359 | 0       | 0       |
| 7  | 4        | -0.06381577 | 1       | 0.009095622  | 4       | 0.11184031  | 0.057120169  | 1       | 1       |
| 8  | 4        | -0.06381577 | 2       | -0.006322454 | 504     | 0.04124803  | -0.028890191 | 1       | 1       |
| 9  | 5        | -0.06435658 | 1       | 0.009095622  | 5       | -0.08112837 | -0.136389321 | 1       | 0       |
| 10 | 5        | -0.06435658 | 2       | -0.006322454 | 505     | -0.05088242 | -0.12156145  | 0       | 1       |


In [34]:
design0 <- declare_model(
    panels,
    potential_outcomes(Y ~ rbinom(n = N, size = 1, prob = 0.6 - 0.05 * Z + epsilon))
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  # Randomize within listing: one treated and one control day per listing
  declare_assignment(Z = block_ra(blocks = listings)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, term = "Z", .method = lm, label = "OLS") +
  declare_estimator(Y ~ Z + factor(listings) + factor(days), term = "Z", .method = lm, label = "FE") +
  # The raw logit coefficient is on the log-odds scale, not the probability scale of the ATE
  declare_estimator(Y ~ Z, term = "Z", .method = glm, family = "binomial", label = "Logit")
diagnose_design(design0, sims = 30)
#head(draw_data(design0) %>% arrange(listings),20)


Research design diagnosis based on 30 simulations. Diagnosis completed in 9 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

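| Design  | Inquiry | Estimator | Term | N Sims | Mean Estimand | Mean Estimate | Bias   | SD Estimate | RMSE   | Power  | Coverage |
|---------|---------|-----------|------|--------|---------------|---------------|--------|-------------|--------|--------|----------|
| design0 | ATE     | FE        | Z    | 30     | -0.05         | -0.05         | -0.00  | 0.03        | 0.02   | 0.33   | 1.00     |
|         |         |           |      |        | (0.00)        | (0.01)        | (0.00) | (0.00)      | (0.00) | (0.09) | (0.00)   |
| design0 | ATE     | Logit     | Z    | 30     | -0.05         | -0.21         | -0.16  | 0.13        | 0.20   | 0.37   | 0.80     |
|         |         |           |      |        | (0.00)        | (0.02)        | (0.02) | (0.02)      | (0.02) | (0.09) | (0.07)   |
| design0 | ATE     | OLS       | Z    | 30     | -0.05         | -0.05         | -0.00  | 0.03        | 0.02   | 0.37   | 1.00     |
|         |         |           |      |        | (0.00)        | (0.01)        | (0.00) | (0.00)      | (0.00) | (0.09) | (0.00)   |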
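
The apparent bias of the Logit estimator comes from a mismatch of scales: the ATE is a difference in probabilities, while the raw logit coefficient is a difference in log-odds. The base R sketch below illustrates this on simulated data with assumed response probabilities of 0.60 and 0.55 (mirroring the design, but without listing or day effects). Converting the logit fit to a difference in predicted probabilities recovers the OLS coefficient.

```r
set.seed(42)  # arbitrary seed

n <- 1000
Z <- rep(c(0, 1), each = n / 2)
Y <- rbinom(n, size = 1, prob = 0.6 - 0.05 * Z)  # assumed probabilities

logit_fit <- glm(Y ~ Z, family = binomial)
ols_fit   <- lm(Y ~ Z)

# Raw logit coefficient: a difference in log-odds.
log_odds_coef <- unname(coef(logit_fit)["Z"])

# Average marginal effect: difference in predicted probabilities.
p1  <- predict(logit_fit, newdata = data.frame(Z = 1), type = "response")
p0  <- predict(logit_fit, newdata = data.frame(Z = 0), type = "response")
ame <- unname(p1 - p0)

# With a single binary regressor, this matches the OLS difference in means.
ols_coef <- unname(coef(ols_fit)["Z"])

print(c(log_odds = log_odds_coef, ame = ame, ols = ols_coef))
```

With true probabilities of 0.60 and 0.55, the log-odds difference is about -0.20, close to the mean Logit estimate of -0.21 reported in the diagnosis.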

In [35]:
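require("margins") # for margins (average marginal effects)
require(broom) # for tidy

# Summarize a fitted glm by its average marginal effects instead of its raw
# log-odds coefficients, returned as a tidy data frame
tidy_margins <- function(x) {
  tidy(margins(x, data = x$data), conf.int = TRUE)
}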

Loading required package: margins

Loading required package: broom



In [36]:
design1<-declare_model(panels,
                       potential_outcomes(Y ~ rbinom(n = N, size = 1, prob = 0.6-0.05* Z+epsilon)))+
  declare_inquiry(ATE=mean(Y_Z_1-Y_Z_0)) +
  declare_assignment(Z=block_ra(blocks=listings)) + 
  declare_measurement(Y=reveal_outcomes(Y~Z))+
  declare_estimator(Y~Z+factor(listings)+factor(days),term="Z", .method=lm, label="FE") +
  declare_estimator(Y~Z,term="Z", .method=lm, label="OLS") +
  declare_estimator(Y~Z,term="Z", .method=glm, family="binomial", label="Logit", .summary = tidy_margins)



diagnose_design(design1,sims=30)


Research design diagnosis based on 30 simulations. Diagnosis completed in 9 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

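| Design  | Inquiry | Estimator | Term | N Sims | Mean Estimand | Mean Estimate | Bias   | SD Estimate | RMSE   | Power  | Coverage |
|---------|---------|-----------|------|--------|---------------|---------------|--------|-------------|--------|--------|----------|
| design1 | ATE     | FE        | Z    | 30     | -0.05         | -0.05         | -0.00  | 0.03        | 0.03   | 0.47   | 1.00     |
|         |         |           |      |        | (0.00)        | (0.01)        | (0.00) | (0.00)      | (0.00) | (0.11) | (0.00)   |
| design1 | ATE     | Logit     | Z    | 30     | -0.05         | -0.05         | -0.00  | 0.03        | 0.03   | 0.43   | 1.00     |
|         |         |           |      |        | (0.00)        | (0.01)        | (0.00) | (0.00)      | (0.00) | (0.10) | (0.00)   |
| design1 | ATE     | OLS       | Z    | 30     | -0.05         | -0.05         | -0.00  | 0.03        | 0.03   | 0.43   | 1.00     |
|         |         |           |      |        | (0.00)        | (0.01)        | (0.00) | (0.00)      | (0.00) | (0.10) | (0.00)   |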

## Simple design consistent with [Christensen et al. (2022)](https://direct.mit.edu/rest/article-abstract/104/4/807/97712/Housing-Discrimination-and-the-Toxics-Exposure-Gap?redirectedFrom=fulltext)

<div >
    <img src = "figures/fig4_ba.png" style="height:500px;"/>
</div>

In [39]:
panels <- fabricate(
  listings = add_level(N = 500, listing_fe = runif(N, -.1, .1), more_than_mile = rbinom(N,size=1,prob=0.3)),
  days = add_level(N = 2, day_shock = runif(N, -.05, .05), nest = FALSE),
  obs = cross_levels(
    by = join_using(listings, days),
    U =rnorm(N, 0, .01),
    epsilon = listing_fe + day_shock + U
  )
)


In [41]:
design0<-declare_model(panels,
                       potential_outcomes(Y ~ rbinom(n = N, size = 1, prob = 0.4-0.1* Z -0.08*Z*more_than_mile+epsilon)))+
  declare_inquiry(CATE_X1 =mean(Y_Z_1[more_than_mile == 1] - Y_Z_0[ more_than_mile== 1])  ,
                  CATE_X0 = mean(Y_Z_1[more_than_mile == 0] - Y_Z_0[more_than_mile == 0]),
                  diff_in_CATEs = CATE_X1- CATE_X0) +
  declare_assignment(Z=block_ra(blocks=listings)) + 
  declare_measurement(Y=reveal_outcomes(Y~Z))+
  declare_estimator(Y~Z + more_than_mile + Z * more_than_mile,term="Z", .method=lm, label="within_mile", inquiry="CATE_X0") +
  declare_estimator(Y ~ Z + more_than_mile + Z * more_than_mile, 
                    .method=lm,
                    term = "Z:more_than_mile", 
                    inquiry = "diff_in_CATEs")


diagnose_design(design0,sims=30)


Research design diagnosis based on 30 simulations. Diagnosis completed in 1 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

  Design       Inquiry   Estimator             Term N Sims Mean Estimand
 design0       CATE_X0 within_mile                Z     30         -0.09
                                                                  (0.00)
 design0       CATE_X1        <NA>             <NA>     30         -0.17
                                                                  (0.01)
 design0 diff_in_CATEs   estimator Z:more_than_mile     30         -0.08
                                                                  (0.01)
 Mean Estimate   Bias SD Estimate   RMSE  Power Coverage
         -0.09   0.00        0.04   0.03   0.73     1.00
        (0.01) (0.00)      (0.00) (0.00) (0.09)   (0.00)
            NA     NA          NA     NA     NA       NA
            NA     NA          NA     NA     NA       NA
         -0.09  -0.01        0.0