-
Notifications
You must be signed in to change notification settings - Fork 16
/
week3-3.Rmd
248 lines (206 loc) · 9.99 KB
/
week3-3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
---
title: Linked Brushing
layout: post
output:
md_document:
preserve_yaml: true
---
_More examples defining brush queries using Shiny and `ggplot2`._
[Code](https://github.com/krisrs1128/stat679_code/blob/main/notes/week3-3.Rmd), [Recording](https://mediaspace.wisc.edu/media/Week%203%20-%203%3A%20Linked%20Brushing/1_jdtr6xg6)
```{r, echo = FALSE}
library(knitr)
opts_knit$set(base_dir = "/", base.url = "/")
opts_chunk$set(
warning = FALSE,
message = FALSE,
fig.path = "stat679_notes/assets/week3-3/",
cache = TRUE,
fig.height = 5,
retine = 4
)
```
1. These notes provide more realistic examples of linked brushing. Though the
visual design problems they address are more complex, they follow the same
recipe described earlier,
* A `reactiveVal` is defined to track the currently selected samples.
* An `observeEvent` is used to update the `reactiveVal` every time a plot is
brushed.
* Downstream `render` contexts update plot and data table outputs whenever
the `reactiveVal` is changed.
1. The first example implements linked brushing on the movie ratings dataset
presented earlier. Before we used a slider to select movies within a
user-specified time range. Our graphical alternative is to allow selections over
a histogram of movie release dates within the dataset. Specifically, we will
create an interactive version of the histogram below,
```{r}
library(tidyverse)
library(lubridate)
movies <- read_csv("https://raw.githubusercontent.com/krisrs1128/stat479_s22/main/_posts/2022-02-10-week04-03/apps/data/movies.csv") %>%
mutate(
date = as_date(Release_Date, format = "%b %d %Y"),
year = year(date),
Major_Genre = fct_explicit_na(Major_Genre)
)
movies %>%
count(year) %>%
ggplot(aes(year, n)) +
geom_bar(stat = "identity", width = 1) +
scale_y_continuous(expand = c(0, 0))
```
and when a subset of years of has been brushed, we will highlight the
corresponding movies in the same kind of scatterplot used in the earlier,
slider-based implementation.
```{r}
ggplot(movies) +
geom_point(aes(Rotten_Tomatoes_Rating, IMDB_Rating))
```
1. Viewed more abstractly, we are going to use a brush to link the histogram
and scatterplot views. We will be able to evaluate the change in a
visualization (the scatterplot) after "conditioning" on a subset defined by a
complementary view (the histogram). This is analogous to the penguins dataset
example -- only the form of the base plots has changed.
1. The main logic needed to link these views is given in the block below. The
histogram `plotOutput` in the UI is given a brush which will be used to select
years^[Note that we restrict brush motion to the $x$-direction. This is because
the $x$ direction alone encodes year information, which we want to select.]. We
use the `selected` reactive value to store a list of `TRUE/FALSE`'s indicating
which movie falls into the currently brushed time range. Each time the brushed
range is changed, the `output$scatterplot` and `output$table` outputs are
regenerated, highlighting those movies that appear in the `selected()` list.
```{r, eval = FALSE}
ui <- fluidPage(
fluidRow(
column(6, plotOutput("histogram", brush = brushOpts("plot_brush", direction = "x"))),
column(6, plotOutput("scatterplot"))
),
dataTableOutput("table")
)
server <- function(input, output) {
selected <- reactiveVal(rep(TRUE, nrow(movies)))
observeEvent(
input$plot_brush,
selected(reset_selection(movies, input$plot_brush))
)
output$histogram <- renderPlot(histogram(movies))
output$scatterplot <- renderPlot(scatterplot(movies, selected()))
output$table <- renderDataTable(data_table(movies, selected()))
}
```
1. We haven't included the full code for `histogram`, `scatterplot`, and
`data_table`, since they in and of themselves don't require any logic for
interactivity. You can try out the full code
[here](https://github.com/krisrs1128/stat679_code/blob/main/examples/week3/3-3/3-3-1/app.R)
and tinker with the interface below.
<iframe src="https://data-viz.it.wisc.edu/content/1014c0e9-cf88-4b8d-93a6-52fd77669c23/" allowfullscreen="" data-external="1" height=700 width=600></iframe>
1. A natural extension of the previous app is to allow brushing on both the
histogram and the scatterplot. Brushing over the scatterplot would show the
years during which the selected movies were released -- this can be used to find
out if very poorly or highly rated movies are associated with specific time
ranges, for example.
1. The updated application is below. The main differences are that,
* The scatterplot `plotOutput` now includes a brush.
* We are passing in the reactive value of the `selected()` movies into the
histogram as well.
```{r, eval = FALSE}
ui <- fluidPage(
fluidRow(
column(6, plotOutput("histogram", brush = brushOpts("plot_brush", direction = "x"))),
column(6, plotOutput("scatterplot", brush = "plot_brush"))
),
dataTableOutput("table")
)
server <- function(input, output) {
selected <- reactiveVal(rep(TRUE, nrow(movies)))
observeEvent(
input$plot_brush,
selected(reset_selection(movies, input$plot_brush))
)
output$histogram <- renderPlot(histogram(movies, selected()))
output$scatterplot <- renderPlot(scatterplot(movies, selected()))
output$table <- renderDataTable(data_table(movies, selected()))
}
shinyApp(ui, server)
```
1. For the scatterplot, we simply reduced the transparency for the movies that
weren't selected. We cannot do this for the histogram, though, because the
movies are not directly represented in this plot, only their counts over time.
Instead, our idea will be to draw two overlapping histograms. A static one in
the background will represent the year distribution before any selection. A
changing one in the foreground will be redrawn whenever the selected movies are
changed. For example, the code below overlays two `geom_bar` layers, with one
corresponding only to the first 500 movies in the dataset.
```{r}
sub_counts <- movies[1:500, ] %>%
count(year)
movies %>%
count(year) %>%
ggplot(aes(year, n)) +
geom_bar(stat = "identity", fill = "#d3d3d3", width = 1) +
geom_bar(data = sub_counts, stat = "identity", width = 1) +
scale_y_continuous(expand = c(0, 0))
```
1. Combining these ideas leads to the app
[here](https://github.com/krisrs1128/stat679_code/blob/main/examples/week3/3-3/3-3-2/app.R)
and included below. Try brushing on both the scatterplot and the histogram. The
especially interesting thing about this approach is that, without introducing
any new screen elements, we've widened the class of questions of that can be
answered. In a sense, we've increased the information density of the display --
we can present more information without having to introduce any peripheral UI
components or graphical marks.
<iframe src="https://data-viz.it.wisc.edu/content/fae55716-8e52-4f6e-ae29-4db14c063002/" allowfullscreen="" data-external="1" height=700 width=600></iframe>
1. In our last problem, we would like to use a dataset of flight delays to
understand what characteristics of the flights make some more / less likely to
be delayed. The basic difficulty is that there are many potentially relevant
variables, and they might interact in ways that are not obvious in advance.
```{r}
library(nycflights13)
head(flights)
```
1. Our solution strategy will be to dynamically link complementary histograms.
By brushing the histogram of delays time, we'll be able to see the conditional
distributions for other variables of interest. In principle, we could do this
for every variable in the dataset, but for the example, we'll focus on just the
scheduled departure time and flight distance.
1. The UI in this case creates three separate histograms, each of which
introduces a brush. We will plan on brushing one histogram at a time, which is
then used to update overlays on each.
```{r, eval = FALSE}
ui <- fluidPage(
fluidRow(
column(
6,
plotOutput("h1", brush = brushOpts("plot_brush", direction = "x"), height = 200),
plotOutput("h2", brush = brushOpts("plot_brush", direction = "x"), height = 200),
plotOutput("h3", brush = brushOpts("plot_brush", direction = "x"), height = 200)
),
column(6, dataTableOutput("table"))
),
)
```
1. The logic for drawing the overlays is encapsulated by the functions below.
The `bar_plot` function draws two bar plots over one another, one referring to a
global `counts` object of unchanging histogram bar heights. The second refers to
the bar heights for the continually updated overlays. Notice that we use
`.data[[v]]` to use variable names encoded in strings. The `plot_overlay`
function provides the histogram bar heights for variable `v` after brushing over
the flights in `selected_`.
```{r}
bar_plot <- function(sub_flights, v) {
ggplot(counts[[v]], aes(.data[[v]], n)) +
geom_bar(fill = "#d3d3d3", stat = "identity") +
geom_bar(data = sub_flights, stat = "identity")
}
plot_overlay <- function(selected_, v) {
flights %>%
filter(selected_) %>%
count(.data[[v]]) %>%
bar_plot(v)
}
```
1. Code for the full application is linked
[here](https://github.com/krisrs1128/stat679_code/blob/main/examples/week3/3-3/3-3-3/app.R).
Thanks to shiny's `reactiveVal` and `brushedPoints` definitions, implementing
interactivity only requires about 20 lines (starting from `ui <- ...` to the
end). The rest of the code is used to draw new static plots depending on the
current selection.
<iframe src="https://data-viz.it.wisc.edu/content/02da9288-2810-4ce8-b92c-75177bb4cdc9/" allowfullscreen="" data-external="1" height=700 width=600></iframe>