-
Notifications
You must be signed in to change notification settings - Fork 0
/
presidential_plinko.Rmd
executable file
·437 lines (347 loc) · 15.2 KB
/
presidential_plinko.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
---
title: "Presidential Plinko"
author: "Matthew Kay"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
df_print: kable
vignette: >
%\VignetteIndexEntry{Presidential Plinko}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
<style type="text/css">
.kable-table table {
margin-left: 0;
}
img {
border: none;
}
</style>
```{r chunk_options, include=FALSE}
if (capabilities("cairo") && Sys.info()[['sysname']] != "Darwin") {
knitr::opts_chunk$set(
dev.args = list(png = list(type = "cairo"))
)
}
```
The following code constructs a Plinko board for a quantile dotplot of a
predictive distribution for the 2020 US Presidential election outcome.
See [presidential-plinko.com](https://presidential-plinko.com) and the
[mjskay/election-galton-board](https://github.com/mjskay/election-galton-board)
repository for more information.
## Setup
The following packages are required for this example:
```{r setup, message=FALSE, warning=FALSE}
library(plinko)
library(dplyr)
library(ggplot2)
library(ggdist)
library(ggforce)
library(distributional)
theme_set(theme_ggdist())
```
```{r hidden_options, include=FALSE}
# Something about vignette buiding order means multicore isn't working on CRAN on windows
# (getting errors about the package not having been installed) so on CRAN we'll use 1
# core (i.e. no parallel execution). CRAN doesn't like using more than 2 cores anyway
# so it's not a big loss.
.old_options = options(cores = 1)
```
## Desired target sample
Say we want to display FiveThirtyEight's probabilistic prediction of Biden's chance
of winning the 2020 US Presidential election. The following dataset is FiveThirtyEight's
prediction as of Oct 16, 2020:
```{r pres_pred_2020_plot, fig.width = 6, fig.height = 4.5}
data(pres_pred_2020, package = "plinko")
pres_pred_2020 %>%
ggplot(aes(x = total_ev, y = evprob_chal)) +
geom_col(fill = "gray75") +
geom_vline(xintercept = 269) +
coord_cartesian(xlim = c(0, 538)) +
labs(
x = "Electoral votes for Biden",
y = "Predicted probability",
title = "Predictive distribution for Biden's electoral votes",
subtitle = "FiveThirtyEight's model of the 2020 US election as of Oct 16, 2020"
)
```
This particular dataset is already summarized as a histogram: the `evprob_chal`
column is the probability associated with each x value (`total_ev`) in the
above chart. While we could operate directly on this summarized data using
weighted statistics (weighted means, quantiles, etc)---and this is what I did
for [the implementation of presidential-plinko.com](https://github.com/mjskay/election-galton-board)---for
the purposes of this example, we will convert the summarized data into a sample
first. This type of data is more commonly encountered and a bit easier to work with.
To convert the data into a sample of (say) 5000, we can just use `sample()` and
provide the probabilities as weights:
```{r ev_sample}
set.seed(1234) # for reproducibility
ev_sample = sample(pres_pred_2020$total_ev, 5000, replace = TRUE, prob = pres_pred_2020$evprob_chal)
```
We can verify the data looks essential the same (down to sampling error):
```{r ev_sample_plot, fig.width = 6, fig.height = 4.5}
tibble(ev_sample) %>%
ggplot(aes(x = ev_sample)) +
geom_histogram(binwidth = 1, fill = "gray75") +
geom_vline(xintercept = 269) +
coord_cartesian(xlim = c(0, 538)) +
labs(
x = "Electoral votes for Biden",
y = "Frequency",
title = "Predictive distribution for Biden's electoral votes (sampled)",
subtitle = "FiveThirtyEight's model of the 2020 US election as of Oct 16, 2020"
)
```
## Constructing a basic board
The `plinko_board()` function can be called directly on a numeric vector; however,
this usage typically requires you to already know exactly the board parameters
you need (number of bins, bin width, etc). Thus, it is usally easier to call
`plinko_board()` on a [distributional](https://pkg.mitchelloharawild.com/distributional/)
object, which is an object that represents some probability distribution.
Given a distributional object and either a desired bin width (`bin_width`) **or**
number of bins (`n_bin`), `plinko_board()` will do its best to automatically figure
out the necessary board parameters to accomodate some number of balls from that
distribution (`n_ball`, which by default is `50`, but we will use just `20` for this example).
We can construct a distributional object from sample data using `distributional::dist_sample()`
and pass it to `plinko_board()`:
```{r board_1}
ev_dist = dist_sample(list(ev_sample))
board = plinko_board(ev_dist, bin_width = 35, n_ball = 20)
board
```
## Plotting the board
### Using `autoplot()`
The Plinko board will automatically find random paths the balls could have taken
to end up in their final locations. We can see these paths by plotting the board
with the `autoplot()` function, which returns a `ggplot()` object:
```{r fig.height = 4.75, fig.width = 1.5}
board %>%
autoplot()
```
By default the plot overlays the Binomial distribution implied by the Plinko board so
you can judge the quality of the approximation. We can also turn this off:
```{r fig.height = 4.75, fig.width = 1.5}
board %>%
autoplot(show_dist = FALSE)
```
You can also see the board without paths by passing `show_paths = FALSE`, and you
can plot specific frames of the animation by passing the `frame` parameter:
```{r fig.height = 4.75, fig.width = 1.5}
autoplot(board, show_paths = FALSE, frame = 9)
```
### Plotting manually
We could also create a plot like the ones above manually by using the `slot_edges()`, `pins()`, `paths()`, and `balls()`
unctions, which return data frames containing the locations of all of the elements of the board:
```{r board_plot, fig.height = 4.75, fig.width = 1.5}
ggplot() +
geom_segment(aes(x = x, y = 0, xend = x, yend = height), data = slot_edges(board), color = "gray75", size = 1) +
geom_point(aes(x, y), data = pins(board), shape = 19, color = "#e41a1c", size = 1) +
geom_path(aes(x = x, y = y, group = ball_id), data = paths(board), alpha = 1/4, size = 1, color = "gray50") +
geom_circle(aes(x0 = x, y0 = y, r = width/2), data = balls(board), fill = "#1f78b4", color = NA) +
coord_fixed(expand = FALSE, clip = "off") +
ylab(NULL) +
scale_y_continuous(breaks = NULL) +
theme(
axis.line.y = element_blank(),
axis.line.x = element_line(color = "gray75", size = 1)
)
```
However, if you wish to customize the plot, it is probably easier to use the
`modify_layer()` and `add_layers()` functions described later in this document, as
these also impact the frames used when an animated plot is rendered.
## Forcing a slot edge to fall on 269
Before we get to animating this board, we also want to address an additional
constraint not handled automatically above: we will want to be able to color the
balls red if they fall below `269` (Trump wins) and blue if they fall above `269`
(Biden wins). To do that, we need a slot edge to fall on `269`.
Let's start by seeing
how well our current approximation does by turning off the display of ball paths
and adding on a vertical line at `269`. `autoplot()` returns a `ggplot` object
so this is straightforward:
```{r board_1_edge_plot, fig.height = 4.75, fig.width = 1.5}
autoplot(board, show_paths = FALSE) +
geom_vline(xintercept = 269)
```
The slot edge closest to 269 is a little too far away for my taste. We can
see how close it is in electoral votes:
```{r board_1_269_dist}
min(abs(board$slot_edges - 269))
```
A simple approach to fixing this problem might be to take a range of bin sizes we
are happy with (say 35 to 45 electoral votes) and find a bin width in that range
that minimizes the above number:
```{r find_bin_width}
closest_edge_to_269 = Vectorize(function(bin_width) {
board = plinko_board(ev_dist, bin_width = bin_width)
min(abs(board$slot_edges - 269))
})
bin_width = (35:45)[which.min(closest_edge_to_269(35:45))]
bin_width
```
Let's see how the board looks with `r bin_width` bins:
```{r board_2, fig.height = 4.75, fig.width = 1.5}
board = plinko_board(ev_dist, bin_width = bin_width, n_ball = 20)
autoplot(board, show_paths = FALSE) +
geom_vline(xintercept = 269)
```
That looks pretty good! If we want to get *really* persnickety about it, we
are still off by a little over 1 electoral vote:
```{r closest_edge_at_bin_width}
closest_edge_to_269(bin_width)
```
We could fix that by shifting the center of the board slightly to compensate. To
do that we need to know the signed difference between the slot edge closest to
269 and 269:
```{r slot_edge_shift}
slot_edge_shift = 269 - board$slot_edges[which.min(abs(board$slot_edges - 269))]
slot_edge_shift
```
Then we can use this to shift the board center slightly
```{r board_3, fig.height = 4.75, fig.width = 1.5}
shifted_center = board$center + slot_edge_shift
board = plinko_board(ev_dist, bin_width = bin_width, n_ball = 20, center = shifted_center)
autoplot(board, show_paths = FALSE) +
geom_vline(xintercept = 269)
```
## Customizing board appearance
Now that our board has the parameters we want---a distribution that matches
our target distribution and a slot edge falling on 269---we can customize the
rest of the appearance of the board.
### Adjusting limits
First, we can adjust the limits of the board using the `limits` parameter.
For this use case we should show the distribution compared to the full
range of possible electoral votes:
```{r board_limits, fig.height = 4.5, fig.width = 1.5}
board = plinko_board(ev_dist, bin_width = bin_width, n_ball = 20, center = shifted_center, limits = c(0, 538))
autoplot(board)
```
### Customizing ggplot layers
We can also customize
existing layers in the `ggplot` objects generated for
each frame using the `modify_layer()` function, which takes a layer name
followed by parameters you would normally pass to a ggplot layer/geom.
Aesthetic mappings you provide are merged with the existing aesthetics in the
layer, and new arguments you provide override existing ones.
For example, say you want to make the ball fill color depend on its x position, and
change the outline color to be black:
```{r fig.height = 4.5, fig.width = 2.5}
board %>%
modify_layer("balls", aes(fill = x), color = "black") %>%
autoplot(show_dist = FALSE)
```
You can use `modify_layer()` to adjust the following layers:
- "slot_edges": A `geom_segment()` that draws the edges of the slots
- "pins": A `geom_point()` that draws the pins
- "paths": A `geom_path()` that draws ball paths
- "balls": A `geom_circle()` that draws the balls
- "dist": A `geom_step()` that draws the reference binomial distribution
When plotting a single frame, you can also add additional ggplot objects after
the call to `autoplot()`, as with any ggplot object:
```{r fig.height = 4.5, fig.width = 1.5}
board %>%
autoplot(show_dist = FALSE, show_paths = FALSE) +
geom_vline(xintercept = 269, color = "black", alpha = 0.15, size = 1) +
annotate("label", x = 269, y = 1500, label = "269", hjust = 0.5, color = "gray50")
```
However, layers added in this way are not saved into the `board` object, and so
**will not be displayed when the board is animated**.
To add ggplot objects to the `board` object so that they are displayed when the
board is animated, you must use the `add_layers()` function *before* calling
`autoplot()` or `animate()`. Here is the same example using `add_layers()`:
```{r fig.height = 4.5, fig.width = 1.5}
board %>%
add_layers(
geom_vline(xintercept = 269, color = "black", alpha = 0.15, size = 1),
annotate("label", x = 269, y = 1500, label = "269", hjust = 0.5, color = "gray50")
) %>%
autoplot(show_dist = FALSE, show_paths = FALSE, show_target_dist = FALSE)
```
As static plots, both look identical. However, when we get to animating (next),
we'll have to use the `add_layers()` approach.
## Animating the board
By default, no tweening is done between balls on the board:
```{r animate_no_tween, fig.height = 4.5, fig.width = 2}
board %>%
# width is determined automatically based on height
animate(fps = 7.5, height = 450)
```
Most likely you will want some tweening between ball states. I find that using 4 times the
base number of frames with a `"bounce-out"` easing function (the default) makes for a smooth
animation that feels more physically accurate. You can add tweening using the
`tween_balls()` function. You'll also want to increase the `fps` to account for
the additional frames, and we will add a 3-second (90-frame) pause at the end
of the animation using `end_pause`:
```{r animate_tween}
board %>%
tween_balls(frame_mult = 4, ease = "bounce-out") %>%
animate(fps = 30, height = 450, end_pause = 90)
```
Given the increased number of frames due to tweening,
you may want to increase speed by rendering frames in parallel using the `cores` argument to
`animate.plinko_board()`. The `cores` argument defaults to `getOptions("cores", 1)`,
so if you want to set the number of cores to the total number available for all
subsequent calls to `animate.plinko_board()`, you can also use:
```r
options(cores = parallel::detectCores())
```
For testing purposes (e.g. faster rendering), it can also be useful to filter the frames in the final animation.
You can do this using `filter_frames()`, which takes filtering conditions in the same format
as `filter()` and applies them to the data frame returned by `frames(board)`. The `"ball_id"` column
can be combined with the `"stopped"` column (which is `TRUE` after the ball hits the
bottom of a slot and stops moving) to show just one ball dropping, for example:
```{r animate_one_ball}
board %>%
filter_frames(ball_id == 1, !stopped) %>%
tween_balls(frame_mult = 4, ease = "bounce-out") %>%
animate(fps = 30, height = 450)
```
## A full, annotated board
Finally, we can combine everything together to adjust the existing geometries
with `modify_layer()`, add annotations with `add_layers()`, add tweening
with `tween_balls()`, then animate:
```{r fig.height = 5.5, fig.width = 2}
Biden_color = "#0571b0"
Trump_color = "#ca0020"
annotation_height = 2000
board %>%
modify_layer("pins", color = "gray50") %>%
# the "balls" layer uses the frame(board) data frame, which has a "region"
# column giving the part of the board the ball is in ("start", "pin", or "slot")
# we can use this to change the ball color when it falls into the slot
modify_layer(
"balls",
aes(fill = ifelse(region == "slot", ifelse(x <= 269, "Trump", "Biden"), "none")),
color = NA
) %>%
add_layers(
geom_vline(xintercept = 269, color = "black", alpha = 0.15, size = 1),
annotate("text",
x = 290, y = 0.92 * annotation_height,
label = "Biden wins", hjust = 0, color = Biden_color,
),
annotate("text",
x = 250, y = 0.92 * annotation_height,
label = "Trump wins", hjust = 1, color = Trump_color
),
annotate("label",
x = 269, y = 0.97 * annotation_height,
label = "269", hjust = 0.5, color = "gray50",
fontface = "bold"
),
expand_limits(y = annotation_height),
scale_fill_manual(
limits = c("none", "Biden", "Trump"),
values = c("gray45", Biden_color, Trump_color),
guide = FALSE
),
theme(axis.title.x = element_text(hjust = 0, size = 10, color = "gray25")),
xlab("Electoral votes for Biden")
) %>%
tween_balls(frame_mult = 4, ease = "bounce-out") %>%
animate(fps = 30, height = 550, end_pause = 90)
```
```{r reset_options, include=FALSE}
options(.old_options)
```