forked from stephaniehicks/jhustatcomputing2022
-
Notifications
You must be signed in to change notification settings - Fork 9
/
index.qmd
271 lines (184 loc) · 9.58 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
---
title: "11 - Plotting Systems"
author:
- name: Leonardo Collado Torres
url: http://lcolladotor.github.io/
affiliations:
- id: libd
name: Lieber Institute for Brain Development
url: https://libd.org/
- id: jhsph
name: Johns Hopkins Bloomberg School of Public Health Department of Biostatistics
url: https://publichealth.jhu.edu/departments/biostatistics
description: "Overview of three plotting systems in R"
categories: [module 3, week 3, R, programming, ggplot2, data viz]
---
*This lecture, as the rest of the course, is adapted from the version [Stephanie C. Hicks](https://www.stephaniehicks.com/) designed and maintained in 2021 and 2022. Check the recent changes to this file through the `r paste0("[GitHub history](https://github.com/lcolladotor/jhustatcomputing/commits/main/", basename(dirname(getwd())), "/", basename(getwd()), "/index.qmd)")`.*
> The data may not contain the answer. And, if you torture the data long enough, it will tell you anything. ---*John W. Tukey*
# Pre-lecture materials
### Read ahead
::: callout-note
## Read ahead
**Before class, you can prepare by reading the following materials:**
1. <https://r4ds.had.co.nz/data-visualisation>
2. Paul Murrell (2011). *R Graphics*, CRC Press.
3. Hadley Wickham (2009). *ggplot2*, Springer.
4. Deepayan Sarkar (2008). *Lattice: Multivariate Data Visualization with R*, Springer.
:::
I also highlighted these two free online books:
1. *ggplot2: Elegant Graphics for Data Analysis (3e)* <https://ggplot2-book.org/>
2. *R Graphics Cookbook, 2nd edition* <https://r-graphics.org/>
### Acknowledgements
Material for this lecture was borrowed and adopted from
- <https://rdpeng.github.io/Biostat776/lecture-plotting-systems>
# Learning objectives
::: callout-note
# Learning objectives
**At the end of this lesson you will:**
- Be able to identify and describe the three plotting systems in R
:::
# Plotting Systems
There are **three different plotting systems in R** and they each have different characteristics and modes of operation.
::: callout-tip
### Important
The three systems are
1. The base plotting system
2. The lattice system
3. The ggplot2 system
**This course will focus primarily on the ggplot2 plotting system**. The other two systems are presented for context.
:::
## The Base Plotting System
The **base plotting system** is the original plotting system for R. The basic model is sometimes **referred to as the "artist's palette" model**.
The idea is you start with blank canvas and build up from there.
In more R-specific terms, you **typically start with `plot()` function** (or similar plot creating function) to *initiate* a plot and then *annotate* the plot with various annotation functions (`text`, `lines`, `points`, `axis`)
The base plotting system is **often the most convenient plotting system** to use because it mirrors how we sometimes think of building plots and analyzing data.
If we do not have a completely well-formed idea of how we want to look at some data, often we will start by "throwing some data on the page" and then slowly add more information to it as our thought process evolves.
::: callout-tip
### Example
We might look at a simple scatterplot and then decide to add a linear regression line or a smoother to it to highlight the trends.
```{r}
#| fig-width: 5
#| fig-height: 5
#| fig-cap: "Scatterplot with loess curve"
data(airquality)
with(airquality, {
plot(Temp, Ozone)
lines(loess.smooth(Temp, Ozone))
})
```
:::
In the code above:
- The `plot()` function creates the initial plot and draws the points (circles) on the canvas.
- The `lines` function is used to annotate or add to the plot (in this case it adds a loess smoother to the scatterplot).
Next, we use the `plot()` function to draw the points on the scatterplot and then use the `main` argument to add a main title to the plot.
```{r}
#| fig-width: 5
#| fig-height: 5
#| fig-cap: "Scatterplot with loess curve"
data(airquality)
with(airquality, {
plot(Temp, Ozone, main = "my plot")
lines(loess.smooth(Temp, Ozone))
})
```
::: callout-tip
### Note
One downside with constructing base plots is that you **cannot go backwards once the plot has started**.
It is possible that you could start down the road of constructing a plot and realize later (when it is too late) that you do not have enough room to add a y-axis label or something like that
:::
If you have specific plot in mind, there is then a need to **plan in advance** to make sure, for example, that you have set your margins to be the right size to fit all of the annotations that you may want to include.
While the base plotting system is nice in that it gives you the flexibility to specify these kinds of details to painstaking accuracy, **sometimes it would be nice if the system could just figure it out for you**.
::: callout-tip
### Note
Another downside of the base plotting system is that it is **difficult to describe or translate a plot to others because there is no clear graphical language or grammar** that can be used to communicate what you have done.
The only real way to describe what you have done in a base plot is to just list the series of commands/functions that you have executed, which is not a particularly compact way of communicating things.
This is one problem that the `ggplot2` package attempts to address.
:::
::: callout-tip
### Example
Another typical base plot is constructed with the following code.
```{r}
#| fig-width: 5
#| fig-height: 5
#| fig-cap: "Base plot with title"
data(cars)
## Create the plot / draw canvas
with(cars, plot(speed, dist))
## Add annotation
title("Speed vs. Stopping distance")
```
:::
We will go into more detail on what these functions do in later lessons.
## The Lattice System
The **lattice plotting system** is implemented in the `lattice` R package which comes with every installation of R (although it is not loaded by default).
To **use the lattice plotting functions**, you must first load the `lattice` package with the `library` function.
```{r}
library(lattice)
```
With the lattice system, **plots are created with a single function call**, such as `xyplot()` or `bwplot()`.
There is **no real distinction between functions that create or initiate plots** and **functions that annotate plots** because it all happens at once.
Lattice plots tend to be **most useful for conditioning types of plots**, i.e. looking at how `y` changes with `x` across levels of `z`.
- e.g. these types of plots are useful for looking at multi-dimensional data and often allow you to squeeze a lot of information into a single window or page.
Another aspect of lattice that makes it different from base plotting is that **things like margins and spacing are set automatically**.
This is possible because entire plot is specified at once via a single function call, so all of the available information needed to figure out the spacing and margins is already there.
::: callout-tip
### Example
Here is a lattice plot that looks at the relationship between life expectancy and income and how that relationship varies by region in the United States.
```{r}
#| fig-width: 8
#| fig-height: 4
#| fig-cap: "Lattice plot"
state <- data.frame(state.x77, region = state.region)
xyplot(Life.Exp ~ Income | region, data = state, layout = c(4, 1))
```
:::
You can see that the entire plot was generated by the call to `xyplot()` and all of the data for the plot were stored in the `state` data frame.
The **plot itself contains four panels**---one for each region---and **within each panel is a scatterplot** of life expectancy and income.
The notion of *panels* comes up a lot with lattice plots because you typically have many panels in a lattice plot (each panel typically represents a *condition*, like "region").
::: callout-tip
### Note
Downsides with the lattice system
- It can sometimes be very **awkward to specify an entire plot** in a single function call (you end up with functions with many many arguments).
- **Annotation in panels in plots is not especially intuitive** and can be difficult to explain. In particular, the use of custom panel functions and subscripts can be difficult to wield and requires intense preparation.
- Once a plot is created, **you cannot "add" to the plot** (but of course you can just make it again with modifications).
:::
## The ggplot2 System
The **ggplot2 plotting system** attempts to split the difference between base and lattice in a number of ways.
::: callout-tip
### Note
Taking cues from lattice, the ggplot2 system automatically deals with spacings, text, titles but also allows you to annotate by "adding" to a plot.
:::
The ggplot2 system is implemented in the `ggplot2` package (part of the `tidyverse` package), which is available from CRAN (it does not come with R).
You can install it from CRAN via
```{r}
#| eval: false
install.packages("ggplot2")
```
and then load it into R via the `library()` function.
```{r}
library(ggplot2)
```
Superficially, the `ggplot2` functions are similar to `lattice`, but the system is generally easier and more intuitive to use.
The defaults used in `ggplot2` make many choices for you, but you can still customize plots to your heart's desire.
::: callout-tip
### Example
A typical plot with the `ggplot2` package looks as follows.
```{r}
#| message: false
#| fig-width: 6
#| fig-height: 5
#| fig-cap: "ggplot2 plot"
library(tidyverse)
data(mpg)
mpg %>%
ggplot(aes(displ, hwy)) +
geom_point()
```
:::
There are additional functions in `ggplot2` that allow you to make arbitrarily sophisticated plots.
We will discuss more about this in the next lecture.
# R session information
```{r}
options(width = 120)
sessioninfo::session_info()
```