---
title: "There And Back Again"
subtitle: "Differences When Transforming"
author: "R.W. Oldford and Zehao Xu"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
geometry: margin=.75in
urlcolor: blue
graphics: yes
vignette: >
  %\VignetteIndexEntry{There And Back Again}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
```{r setup, include=FALSE, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      warning = FALSE,
                      message = FALSE,
                      fig.align = "center",
                      fig.width = 6,
                      fig.height = 5,
                      out.width = "40%",
                      collapse = TRUE,
                      comment = "#>",
                      tidy.opts = list(width.cutoff = 65),
                      tidy = FALSE)
library(knitr)
set.seed(12314159)
imageDirectory <- "./l_ggplot"
dataDirectory <- "./l_ggplot"
library(grid, quietly = TRUE)
library(gridExtra, quietly = TRUE)
library(ggplot2, quietly = TRUE)
library(loon, quietly = TRUE)
library(loon.ggplot, quietly = TRUE)
```

# Some data

Consider the following artificially generated dataset:
```{r fake data}
data <- data.frame(A = c(19, 19, 25, 62, 34,
                         58, 62, 40, 24, 60,
                         70, 40, 40, 34, 26),
                   B = c(68, 63, 63, 13, 55,
                         78, 14, 14, NA, 28,
                         NA, 55, 57, 40, 78))
```

There are `r nrow(data)` observations in the dataset; variable `A` is complete (has no missing values), whereas variable `B` is missing `r sum(is.na(data$B))` observations.
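
Counting the missing values in every column at once is a quick guard against describing a variable that is not actually in the data. A minimal base-R sketch, which re-creates the example data so that it stands alone:

```r
# Re-create the example data (same values as in the chunk above)
data <- data.frame(A = c(19, 19, 25, 62, 34, 58, 62, 40, 24, 60,
                         70, 40, 40, 34, 26),
                   B = c(68, 63, 63, 13, 55, 78, 14, 14, NA, 28,
                         NA, 55, 57, 40, 78))

# Number of missing values in each column of the data frame
na_per_column <- colSums(is.na(data))
na_per_column
```

This reports no missing values for `A` and two for `B`.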

# `ggplot` to `loon` transformation

## the `ggplot` and its states

Begin with a `ggplot` defined using the interactive grammar extension of `loon.ggplot` (here, the `linking()` layer):
```{r ggplot}
ggp <- ggplot(data,
              mapping = aes(x = A, y = B)) +
  ggtitle("Some title") +
  geom_point(color = "grey", size = 5) +
  linking(linkingGroup = "my plots")
plot(ggp)
```

The information in the `ggplot` that corresponds to `loon` states can be obtained using `ggplot_build()`:
```{r ggp states}
# get the ggplot data corresponding to loon "states"
ggp_states <- ggplot_build(ggp)$data[[1]]
ggp_states
```
Note that some of the `y` values are `NA`, representing missing values.

## the `loon` plot and its states

Now construct the `loon` plot from this `ggplot`:
```{r loon plot}
lp <- loon.ggplot(ggp)
plot(lp)
```
The `loon` plot looks slightly different:

- the title is in a different position,
- the relative sizes of the margins and drawing area differ,
- the axis tick marks and labels differ,
- the font sizes and styles differ, and
- the points are a slightly darker grey in the `loon` plot.

Plots in `loon` have more constrained layouts than those in `ggplot2`, since the focus is primarily on interactive rather than publication-quality graphics.
Some differences (e.g., colour) also arise because `loon` uses `Tcl` for its drawing primitives.

Plot states in `loon` are accessed using `[]` and the name of the state.
Values are assigned in the same way.
The names of the interactively changeable states in `loon` are given by `names()`:
```{r loon plot states}
names(lp)
# and accessed with [] as in
lp["title"]
```
(See also `?l_info_states`.)

Some, like `x`, `y`, and `color`, are *n-dimensional states*, whose values correspond to individual observations. These may also be changed using the `[] <-` notation.

Because `loon` plots exclude observations that are missing values in any of their n-dimensional states, the `loon` plot will have fewer observations than the corresponding `ggplot`, even though both plots may display the same number of points. The actual number of observations in the `loon` plot is
```{r loon n}
lp["n"]
```
which is less than the number of observations recorded in the `ggplot` structure (namely `nrow(ggp_states) =` `r nrow(ggp_states)`).
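
The value of `lp["n"]` can be anticipated from the data alone. A minimal base-R sketch, assuming (as here) that `x` and `y`, i.e. columns `A` and `B`, are the only n-dimensional states that can be missing:

```r
A <- c(19, 19, 25, 62, 34, 58, 62, 40, 24, 60, 70, 40, 40, 34, 26)
B <- c(68, 63, 63, 13, 55, 78, 14, 14, NA, 28, NA, 55, 57, 40, 78)

# Rows recorded in the ggplot structure: every row of the data
n_ggplot <- length(A)
# Rows loon can use: those with both x and y present
n_loon <- sum(complete.cases(A, B))
c(ggplot = n_ggplot, loon = n_loon)
```

Here this gives 15 rows for the `ggplot` structure and 13 observations for the `loon` plot.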

## changing from ggplot to loon

Differences between the `ggplot` structure and the `loon` plot structure (and how these affect the visualizations) are worth exploring in a little more detail.
A better understanding will be helpful when transforming back and forth between the two.

### shape to glyph

In `R` the point symbol plotted is associated with a numerical code `pch`. In `loon` the point symbol is a glyph, identified by a string.
```{r glyphs}
ggp_states$shape
lp["glyph"]
```
As with other plot "states", there is not always a `glyph` in `loon` that matches a `pch` in `R`. Values of `pch` with no counterpart in `loon` will map to the default `glyph` in `loon`.
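
The idea of such a translation, including the fallback for unmatched codes, can be sketched as a lookup table. Both the function `pch_to_glyph_sketch()` and its partial table are hypothetical illustrations, not `loon.ggplot`'s actual conversion; the glyph names are `loon` primitive glyphs, and the default `"ccircle"` is an assumption:

```r
# Hypothetical, partial pch -> loon glyph table (illustration only;
# NOT loon.ggplot's actual conversion table)
pch_to_glyph_sketch <- function(pch, default = "ccircle") {
  lookup <- c("19" = "circle",   "1" = "ocircle",   "21" = "ccircle",
              "15" = "square",   "0" = "osquare",   "22" = "csquare",
              "17" = "triangle", "2" = "otriangle", "24" = "ctriangle",
              "18" = "diamond",  "5" = "odiamond",  "23" = "cdiamond")
  glyph <- unname(lookup[as.character(pch)])
  # pch codes with no counterpart fall back to the (assumed) default glyph
  glyph[is.na(glyph)] <- default
  glyph
}

pch_to_glyph_sketch(c(19, 1, 8))
```

Here `pch = 8` (an asterisk) has no entry, so it receives the default glyph.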

### point size

Size is another plot state that differs between `ggplot` and `loon`:
```{r sizes}
ggp_states$size
lp["size"]
```
The point `size` in `loon` identifies the approximate **area** of the point symbol, so when transforming from `ggplot` to `loon`, the `size` parameter of the `ggplot` is transformed to an integer value in `loon` that tries to match the area of the point in `ggplot`.
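
Because area grows with the square of a linear size, doubling a `ggplot` point size should roughly quadruple the corresponding `loon` size. The sketch below treats the `ggplot` size as a diameter-like measure; `gg_size_to_area_sketch()` and its constant `k` are hypothetical and are **not** the scaling actually used by `loon.ggplot`:

```r
# Hypothetical conversion: loon size ~ area, ggplot size ~ diameter.
# The constant k is made up; it is NOT loon.ggplot's actual scaling.
gg_size_to_area_sketch <- function(size, k = 1) {
  # loon sizes are integers, so round the area
  as.integer(round(k * size^2))
}

gg_size_to_area_sketch(c(1, 2, 5))
```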

### colours

Comparing the plots, the points are a slightly different shade of `"grey"` in one plot than in the other.
This is because the two plots store different values as their colour state:
```{r colours}
ggp_states$colour
lp["color"]
```
First, there is a slight difference between what `R` regards as `"grey"` and what `Tcl` (used by `loon`) regards as `"grey"`.
Several other **named** colours in `R` also differ slightly from those in `Tcl` and hence in `loon`. `R` colours whose name **is the same** as that in `Tcl` **but whose colour is different** are **only** the following:
```{r tkcolors, echo = FALSE, out.width="60%"}
tohex <- function(x) {
  sapply(x, function(xi) {
    crgb <- as.vector(col2rgb(xi))
    rgb(crgb[1], crgb[2], crgb[3], maxColorValue = 255)
  })
}
df <- data.frame(
  R_col = tohex(colors()),
  Tcl_col = hex12tohex6(l_hexcolor(colors())),
  row.names = colors(),
  stringsAsFactors = FALSE
)
df_diff <- df[df$R_col != df$Tcl_col, ]
if (requireNamespace("grid", quietly = TRUE)) {
  grid::grid.newpage()
  grid::pushViewport(grid::plotViewport())
  x_col <- grid::unit(0, "npc")
  x_R <- grid::unit(6, "lines")
  x_Tcl <- grid::unit(10, "lines")
  grid::grid.text('color', x = x_col, y = grid::unit(1, "npc"),
                  just = 'left', gp = grid::gpar(fontface = 'bold'))
  grid::grid.text('R', x = x_R, y = grid::unit(1, "npc"), just = 'center',
                  gp = grid::gpar(fontface = 'bold'))
  grid::grid.text('Tcl', x = x_Tcl, y = grid::unit(1, "npc"), just = 'center',
                  gp = grid::gpar(fontface = 'bold'))
  for (i in 1:nrow(df_diff)) {
    y <- grid::unit(1, "npc") - grid::unit(i * 1.2, "lines")
    grid::grid.text(rownames(df_diff)[i], x = x_col, y = y, just = 'left')
    grid::grid.rect(x = x_R, y = y, width = grid::unit(3, "line"),
                    height = grid::unit(1, "line"),
                    gp = grid::gpar(fill = df_diff[i, 1]))
    grid::grid.rect(x = x_Tcl, y = y, width = grid::unit(3, "line"),
                    height = grid::unit(1, "line"),
                    gp = grid::gpar(fill = df_diff[i, 2]))
  }
}
```
(See `?tkcolors` in `loon` for more information.)

Second, `loon` plot colour values are strings of **twelve** hexadecimal digits, as in `Tk` (e.g., see `?tkcolors`); in contrast, `ggplot` colour values are either strings naming `R` colours (see `?grDevices::colors` in `R`) or strings of **six** hexadecimal digits (two for each of the red, green, and blue components).
Twelve-digit hex colours are converted to six hex digits using the `loon` function `hex12tohex6()`. To convert the named `R` colours to six hex digits, the following function can be used.
```{r tohex show, eval = FALSE}
tohex <- function(x) {
  sapply(x, function(xi) {
    crgb <- as.vector(col2rgb(xi))
    rgb(crgb[1], crgb[2], crgb[3], maxColorValue = 255)
  })
}
```
So, comparing the two `"grey"` colours, the `ggplot` has the `R` hex colour `tohex("grey") =` `r tohex("grey")` and the `loon` plot has `Tcl` hex colour (converted to hex 6) `hex12tohex6(lp["color"][1]) =` `r hex12tohex6(lp["color"][1])`, which are clearly different RGB values.
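
Both conversions can be sketched in base R. A `Tk` colour has four hex digits per channel, and an 8-bit value `v` is stored as `v * 257` (e.g. `80` becomes `8080`), so keeping the first two digits of each channel recovers the six-digit form. `hex12_to_hex6_sketch()` and `tohex_vec()` are hypothetical illustrations; in practice `loon`'s `hex12tohex6()` should be used:

```r
# Hypothetical re-implementation, for illustration only.
# "#RRRRGGGGBBBB" -> "#RRGGBB": keep the two leading digits of each channel
hex12_to_hex6_sketch <- function(x) {
  paste0("#", substr(x, 2, 3), substr(x, 6, 7), substr(x, 10, 11))
}

# Vectorised alternative to tohex(): col2rgb() returns a 3 x n matrix,
# and rgb() accepts an n x 3 matrix of red, green, blue columns
tohex_vec <- function(x) rgb(t(col2rgb(x)), maxColorValue = 255)

hex12_to_hex6_sketch("#808080808080")
tohex_vec(c("grey", "red"))
```

These return `"#808080"` and `c("#BEBEBE", "#FF0000")` respectively.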

## effect of missing data

The `loon` plot does not include the observations with missing data:
```{r missing data}
nrow(ggp_states) == lp["n"]
# Compare
ggp_states$y
lp["y"]
```
Note that, apart from the missing values (`NA`s), the **order** of the points is identical.
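
The same order can be reproduced without `loon`: dropping the missing entries of `B` (the `y` variable) with a logical index keeps the remaining values in their original order. A base-R sketch, assuming `y` is the only variable with missing values, as in this example:

```r
B <- c(68, 63, 63, 13, 55, 78, 14, 14, NA, 28, NA, 55, 57, 40, 78)

# Logical indexing removes the NAs but preserves the order of the rest
y_loon <- B[!is.na(B)]
y_loon
```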

### on linking keys

Default linking keys in `loon` are `"0"` to `"n-1"`, where `n` is the number of rows in the original data set.
Here `nrow(data) =` `r nrow(data)`. Note that this `n` is not the number of points actually plotted, because `loon` drops the observations with missing data, unlike `ggplot`.
The linking keys reflect this missingness:
```{r missing linking keys}
lp["linkingKey"]
```
which is missing `"8"` and `"10"`. Note also that `lp["n"] =` `r lp["n"]` is the number of points plotted in the `loon` plot.
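
Which keys survive can be predicted from the data. A base-R sketch, assuming the default keys `"0"`, ..., `"n-1"` are assigned to all rows before the incomplete rows are dropped:

```r
B <- c(68, 63, 63, 13, 55, 78, 14, 14, NA, 28, NA, 55, 57, 40, 78)

# Default keys for every row of the original data (integer arithmetic
# avoids "1e+05"-style strings for large n)
all_keys <- as.character(seq_along(B) - 1L)
keep <- !is.na(B)

plotted_keys <- all_keys[keep]    # keys of the points loon plots
dropped_keys <- all_keys[!keep]   # keys lost with the missing rows
dropped_keys
```

The dropped keys are `"8"` and `"10"`, matching `lp["linkingKey"]` above.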

Some care needs to be taken when dealing with linking, especially when there is missing data.

### for more information

See the vignette [Linking](https://great-northern-diver.github.io/loon.ggplot/articles/linking.html) and the `loon` vignette [Logical Queries](https://great-northern-diver.github.io/loon/articles/logicalQueries.html) for more information.

# `loon` to `ggplot` transformation

A `ggplot` created from the `loon` plot will not have access to the original information; it has only what is stored in the `loon` plot.
```{r ggp from lp}
ggp_lp_1 <- loon.ggplot(lp)
ggp_lp_1
```
Note that, although the result is a `ggplot`, the title is now centred. This is because the objective of the transformation is to create a `ggplot` that looks as much like the original `loon` plot as possible.

## the `ggplot` states

To reflect the look of the `loon` plot, the states of this `ggplot` are **different** from those of the original `ggplot` `ggp`.
```{r ggp_lp_1 states}
ggp_lp_1_states <- ggplot_build(ggp_lp_1)$data[[1]]
ggp_lp_1_states
lp_ggp_lp_1 <- loon.ggplot(ggp_lp_1)
```

## After interactive changes

Perform some interactive changes (here done programmatically), including selecting some points:
```{r change loon plot}
# select the points with x > 50 and y > 13
selection <- lp["x"] > 50 & lp["y"] > 13
lp["selected"] <- selection
# colour the points with x == 34 red
colorMeRed <- lp["x"] == 34
lp["color"][colorMeRed] <- "red"
```
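
Since these changes are defined by conditions on `x` and `y`, the affected observations can also be identified by their linking keys. This base-R sketch recomputes them from the original data, assuming the default keys and that `loon` drops exactly the incomplete rows:

```r
A <- c(19, 19, 25, 62, 34, 58, 62, 40, 24, 60, 70, 40, 40, 34, 26)
B <- c(68, 63, 63, 13, 55, 78, 14, 14, NA, 28, NA, 55, 57, 40, 78)

keep <- complete.cases(A, B)
keys <- as.character(which(keep) - 1L)   # linking keys kept by loon
x <- A[keep]
y <- B[keep]

selected_keys <- keys[x > 50 & y > 13]   # the selection made above
red_keys <- keys[x == 34]                # the points coloured red above
selected_keys
red_keys
```

This identifies keys `"5"`, `"6"`, and `"9"` as selected, and `"4"` and `"13"` as coloured red.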

### linking keys after `loon.ggplot()`

Linking problems can arise whenever a `ggplot` constructed from a `loon` plot is then made interactive again.
Suppose a `loon` plot, like `lp`, is turned into a `ggplot` (typically, after some interactive changes) via `loon.ggplot()`, and the resulting new `ggplot` is then itself turned into another interactive plot.
The new interactive plot will not necessarily share the same linking information as the original.
This is because the second interactive plot will have the default values of `linkingGroup`, `linkingKey`, and the linked display states; these values are lost in the transfer from the first interactive plot to the `ggplot`. That is,

- the *linking group* is not automatically carried over to the second interactive plot,
- the *linked states* are not automatically carried over to the second interactive plot, and
- the default *linking keys* in the second interactive plot are `"0"`, ..., `"n-1"`, where `n` is the number of observations that were displayed in the first plot.

So, provided the first interactive plot has the default linking keys (and was built from complete data, i.e., no data was missing at creation), the linking keys will match.
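
This condition can be checked on the first plot's keys (e.g., on `lp["linkingKey"]`): the new plot's keys can only agree with them if they are already the defaults. The helper `keys_are_default()` is hypothetical, not part of `loon` or `loon.ggplot`, and it checks a necessary condition only, since it ignores any reordering of the observations:

```r
# Hypothetical helper: TRUE when keys are exactly "0", ..., "n-1",
# i.e. the defaults a newly created interactive plot would assign
keys_are_default <- function(keys) {
  identical(as.character(keys), as.character(seq_along(keys) - 1L))
}

keys_are_default(c("0", "1", "2", "3"))
keys_are_default(c("0", "1", "2", "4"))   # a gap, e.g. from a dropped NA row
```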

An **important special case** is when some observations were **selected** in the first interactive plot at the time the `ggplot` is created. In this case, the **linking keys will typically not match.**
For example,
```{r}
# Get a ggplot from the loon plot
ggp_lp <- loon.ggplot(lp)
```
The interactive `lp` and the new `ggplot` `ggp_lp` appear as follows.
```{r loon to ggplot, echo = FALSE, fig.width = 8, fig.height = 4.5, out.width = "80%", warning = FALSE}
# need grid for the text grobs
library(grid)
grid.arrange(plot(lp, draw = FALSE), # the interactive plot
             ggp_lp,                 # the ggplot from lp
             textGrob("lp"),
             textGrob("ggp_lp"),
             ncol = 2,
             nrow = 2,
             widths = c(0.45, 0.55),
             heights = c(0.5, 0.1))
```
The left plot, `lp`, is an interactive `loon` plot, and the magenta points are selected.
The right plot, `ggp_lp`, is a static `ggplot`, which has no `selected` state; the magenta points are simply points having that colour (and appear as such in the `ggplot` legend).
In an interactive plot, selected points are visually emphasized in two ways:

- their colour is changed to the highlight colour (here magenta), and
- the corresponding part of the display (here the points) is drawn **on top** of the rest of the display.

When this is transferred to a `ggplot`, the selected points appear with the highlight colour **and** the **data order is changed** in the `ggplot` so that they appear on top of all other points in the display. If, instead, the argument `selectedOnTop = FALSE` is given in the `loon.ggplot()` call, then the order of the points will not be changed in the `ggplot`. This is **strongly recommended** whenever the `ggplot` will later be turned into an interactive plot, since it makes it easier to match the linking with the original interactive plot.
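
The effect of the reordering on default keys can be seen in a toy example of five observations. The ordering rule below (unselected rows first, selected rows last so that they are drawn on top) is an assumption inferred from the description above, and the keys and selection are made up:

```r
keys     <- c("0", "1", "2", "3", "4")
selected <- c(FALSE, TRUE, FALSE, TRUE, FALSE)

# Assumed "selected on top" rule: unselected rows first, then selected
# rows, so that ggplot draws the selected points last (on top)
ord <- c(which(!selected), which(selected))
reordered_keys <- keys[ord]

# A plot rebuilt from the reordered data assigns default keys by position
default_keys <- as.character(seq_along(ord) - 1L)

data.frame(true_key = reordered_keys, default_key = default_keys)
sum(reordered_keys == default_keys)   # rows whose default key is still right
```

Only the first row keeps its correct key. In this toy setting the keys that stay with their observations are `keys[ord]`, i.e. the original keys permuted in the same way as the data.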

To see the effect of this (and of missing values) on linking, suppose an interactive plot is created from the `ggplot` `ggp_lp`:
```{r}
# The loon plot from the resulting ggplot
lp_ggp_lp <- loon.ggplot(ggp_lp)
```
Compare the linking keys of this plot with the original:
```{r}
# The original loon plot has linking keys
lp["linkingKey"]
# And the loon plot from the derived ggplot
lp_ggp_lp["linkingKey"]
```
Clearly, these are not the same. Considering the keys alone, some observations will be matched correctly (`"0"` to `"7"`), some incorrectly (`"9"`, `"11"`, and `"12"`), and some not at all (`"8"`, `"13"`, and `"14"`).
The difference is that `lp` was created from data with missing values, corresponding to the missing linking keys `"8"` and `"10"`. This information was not available in the `ggplot` `ggp_lp`, so the new `loon` plot was created with the default keys `"0"` to `"12"`. The result is that some points will be wrongly linked between `lp` and `lp_ggp_lp`.
The selected points in `lp` (highlighted magenta) will cause reordering in `ggp_lp` and hence in `lp_ggp_lp`. This too will cause problems in linking and in other states of the new `lp_ggp_lp`.
For example, the two point orders can be seen in the respective values of their `"x"`:
```{r}
# The original point order
lp["x"]
# The new plot's order
lp_ggp_lp["x"]
```
This causes problems in matching the correct observations.
Other states also change in the transition from `loon` plot to `ggplot` to `loon` plot.
In particular, the `"color"` and `"selected"` states will not match.
```{r}
# Original selected
lp["selected"]
# the new plot has nothing selected
lp_ggp_lp["selected"]
```
This is for two reasons. First, the selected points of `lp` became points with the highlight colour in `ggp_lp`, so the selection in `lp` could not be transferred to `lp_ggp_lp`; likewise, the colours in `lp_ggp_lp` will not match those of `lp`. Second, the new plot has not (yet) joined the same linking group as the original.
```{r}
# Original colours
lp["color"]
# Colours in the new plot
lp_ggp_lp["color"]
```
When observations are also **highlighted** (selected) in the original `loon` plot, as in `lp`, a little more care needs to be taken with respect to the linking keys. The problem is that, in constructing the `ggplot`, the data has to be reordered to ensure that the selected (highlighted) points appear on top of the other points in the `ggplot`.
Care needs to be taken to manage the linking keys when moving from the static `ggplot` to the interactive `loon` plot. This information can be added when the interactive plot is created, as arguments to `loon.ggplot()`, as follows.
```{r}
# Add the linking information when creating the interactive plot
lp_ggp_l1_lk <- loon.ggplot(ggp_lp,
                            linkingKey = lp["linkingKey"],
                            linkingGroup = "NA example")
# Now compare
lp_ggp_l1_lk["linkingKey"]
# to the original loon plot
lp["linkingKey"]
```
Alternatively, the grammar could have been used as in
```{r}
# Add the linking information when creating the interactive plot
lp_ggp_l1_ggk <- loon.ggplot(ggp_lp +
                               linking(linkingGroup = "NA example",
                                       linkingKey = lp["linkingKey"]))
# Again compare
lp_ggp_l1_ggk["linkingKey"]
# to the original loon plot
lp["linkingKey"]
```
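
Once both plots carry the correct keys, states can be matched observation by observation through the keys rather than through positions. This is a conceptual base-R sketch of key-based matching with `match()`, an illustration of the idea rather than `loon`'s internal linking code; the keys and states are made up:

```r
# Toy states of an "original" plot, indexed by linking key
orig_keys     <- c("0", "1", "2", "3", "5")
orig_selected <- c(FALSE, TRUE, FALSE, FALSE, TRUE)

# Keys of a "new" plot, in a different order
new_keys <- c("5", "0", "2", "1", "3")

# Find each new key among the original keys and copy its state
# (a key with no match would give NA)
new_selected <- orig_selected[match(new_keys, orig_keys)]
new_selected
```

The states follow their keys, so the reordering in the new plot does no harm.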

Of course, all this depends on the user knowing where to find the correct linking information.

Comparing the three different interactive plots shows how information can be lost, as well as how it can be maintained, when translating from an interactive plot to a static `ggplot` and back to an interactive `loon` plot.