---
title: "Lab 6: External Validity"
subtitle: "**Due:** Monday, February 26, 11:59 PM"
author:
  - "**Name:** Your name here"
  - "**Mac ID:** The first half of your Mac email address"
format:
  pdf:
    documentclass: article
    fontsize: 12pt
    urlcolor: blue
    highlight-style: nord
    number-sections: true
    geometry:
      - left=1in
      - right=1in
      - top=1in
      - bottom=1in
header-includes:
  - \usepackage{setspace}
  - \doublespacing
  - \usepackage{float}
  - \floatplacement{figure}{t}
  - \floatplacement{table}{t}
  - \usepackage{flafter}
  - \usepackage{ragged2e}
  - \usepackage{booktabs}
  - \usepackage{amsmath}
  - \usepackage{url}
---
```{r setup, include=FALSE}
# Global options for the knitting behavior of all subsequent code chunks
knitr::opts_chunk$set(echo = TRUE)
# Packages
library(tidyverse)
library(DeclareDesign)
# Add extra packages here if needed
```
# External Validity
In the readings for this week, [Coppock et al. (2018)](https://doi.org/10.1073/pnas.1808083115) mention how the correspondence in effects between representative and convenience samples depends on the distribution of individual treatment effects.
The following design simulates a model with **heterogeneous treatment effects**, and compares the result of survey experiments conducted with a representative and convenience sample.
```{r}
# Parameters
N = 1000     # population size
n = 100      # sample size
effect = 0.5

# Model
model = declare_model(
  N = N,
  U = rnorm(N),
  X = runif(N), # observed covariate
  potential_outcomes(
    # interaction represents heterogeneity
    Y ~ Z * effect * X + U
  )
)
```
We are also specifying `X` as an observed covariate that moderates the treatment effect, something like digital literacy. It's generated by random draws from a uniform distribution between 0 and 1 (hence the `runif` function). If `X` is 1, the unit experiences the full effect. If it is 0, the effect disappears. The numbers in between scale the treatment effect accordingly. This is a way to simulate heterogeneous treatment effects.
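To see the heterogeneity concretely, here is a quick base-R sketch (outside of DeclareDesign, using the same `effect` value and `runif` draws as the model above): each unit's individual effect is `effect * X`, so effects range from 0 up to `effect`, and average out to roughly `effect / 2`.

```r
# Illustrative sketch: individual treatment effects implied by the model.
set.seed(123)
effect = 0.5
X = runif(1000)  # same covariate distribution as in the model
ite = effect * X # each unit's individual treatment effect

range(ite) # varies between roughly 0 and 0.5
mean(ite)  # averages out near effect / 2 = 0.25
```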
The inquiry is standard fare:
```{r}
# Inquiry
inquiry = declare_inquiry(
  ATE = mean(Y_Z_1 - Y_Z_0)
)
```
Then we have to compare two data strategies: a survey experiment with a random sample and one with a convenience sample. At this point our research design branches into two paths, since each data strategy will also have its own analogous answer strategy. They are essentially two different designs, but we can recycle some components.
This is how it looks for the representative sample:
```{r}
# Data strategy
r_sampling = declare_sampling(S = complete_rs(N, n = n))
assignment = declare_assignment(Z = complete_ra(N))
measurement = declare_measurement(Y = reveal_outcomes(Y ~ Z))

# Answer strategy
estimator = declare_estimator(
  Y ~ Z,
  inquiry = "ATE"
)
```
Then put everything together:
```{r}
r_design = model + inquiry +
  r_sampling + assignment +
  measurement + estimator
```
To create our convenience sample, we need a custom sampling function in which units with higher `X` are more likely to be drawn.
```{r}
convenience_sampling = function(data) {
  # Draw n units, with sampling probability proportional to X
  id = sample(data$ID, size = n, prob = data$X)
  data$S = ifelse(data$ID %in% id, 1, 0)
  data[data$S == 1, ]
}
```
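A quick check on a toy data frame (purely illustrative, run outside of DeclareDesign) confirms what the handler does: it returns exactly `n` rows, all flagged with `S = 1`, drawn with probability proportional to `X`.

```r
# Illustrative sketch: apply the handler to a mock data frame with the
# two columns it expects (ID and X). n is the sample size from above.
n = 100
toy = data.frame(ID = 1:1000, X = runif(1000))

sampled = convenience_sampling(toy)
nrow(sampled)       # exactly n rows are drawn
all(sampled$S == 1) # every retained unit is flagged as sampled
```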
Then we pass this custom function to `declare_sampling`.
```{r}
c_sampling = declare_sampling(handler = convenience_sampling)
```
And now we can create a separate design for our convenience sample:
```{r}
c_design = model + inquiry +
  c_sampling + assignment +
  measurement + estimator
```
Then we can diagnose each design:
```{r}
# remember to replace with student number
set.seed(123)
r_diag = diagnose_design(r_design)
c_diag = diagnose_design(c_design)
```
And we can use the following code to fetch the bias and RMSE of each design.
```{r}
diagnosands = rbind(
  r_diag$diagnosands_df %>%
    select(design, bias, rmse),
  c_diag$diagnosands_df %>%
    select(design, bias, rmse)
)
diagnosands
```
::: {.callout-note}
## **Task 1**
Which design is better in terms of bias and RMSE? What explains this?
:::
::: {.callout-note}
## **Task 2**
What happens to the bias and RMSE of both designs as the sample size `n` *increases* but the population `N` remains constant? What happens when the sample size decreases? What explains this?
_**Hint:**_ *Show two results, one with a larger sample and one with a smaller sample.*
:::
::: {.callout-note}
## **Task 3**
What happens to the bias and RMSE when the population and sample sizes are the same? What explains this?
_**Hint:**_ *It may be faster to compute this by choosing a number in between the original population and sample sizes rather than making them both equal to 1,000.*
:::
# Answers
## Task 1
Work on your answer here.
## Task 2
Work on your answer here.
## Task 3
Work on your answer here.