hadley · hadley · Jul 12, 2019 · Jul 8, 2019 · Jul 12, 2019 · Jul 12, 2019
diff --git a/_bookdown.yml b/_bookdown.yml
@@ -7,26 +7,26 @@ rmd_files:
 - "preface-3e.Rmd"
 - "preface-2e.Rmd"
 
-- "introduction.rmd"
-- "getting-started.rmd"
+- "introduction.Rmd"
+- "getting-started.Rmd"
 - "faq.Rmd"
 
-- "toolbox.rmd"
+- "toolbox.Rmd"
 - "individual-geoms.Rmd"
 - "collective-geoms.Rmd"
 - "statistical-summaries.Rmd"
 - "space-time.Rmd"
 - "annotations.Rmd"
 - "arranging-plots.Rmd"
 
-- "mastery.rmd"
-- "layers.rmd"
-- "scales.rmd"
+- "mastery.Rmd"
+- "layers.Rmd"
+- "scales.Rmd"
 - "coord.Rmd"
 - "facet.Rmd"
-- "themes.rmd"
+- "themes.Rmd"
 
 - "extending.Rmd"
-- "programming.rmd"
+- "programming.Rmd"
 
-- "references.rmd"
+- "references.Rmd"
diff --git a/diagrams/diamond-dimensions.png b/diagrams/diamond-dimensions.png
diff --git a/diagrams/mastery-schema.png b/diagrams/mastery-schema.png
diff --git a/diagrams/position-facets.png b/diagrams/position-facets.png
diff --git a/diagrams/scale-guides.png b/diagrams/scale-guides.png
diff --git a/diagrams/vector-raster.png b/diagrams/vector-raster.png
diff --git a/facet.Rmd b/facet.Rmd
@@ -15,14 +15,11 @@ There are three types of facetting:
 * `facet_grid()`: produces a 2d grid of panels defined by variables which 
   form the rows and columns.
 
-The differences between `facet_wrap()` and `facet_grid()` are illustrated in Figure \ref{fig:facet-sketch}.
-
-\begin{figure}[htbp]
-  \centering
-    \includegraphics[width=0.75\linewidth]{diagrams/position-facets}
-  \caption{A sketch illustrating the difference between the two facetting systems. \texttt{facet\_grid()} (left) is fundamentally 2d, being made up of two independent components. \texttt{facet\_wrap()} (right) is 1d, but wrapped into 2d to save space.}
-  \label{fig:facet-sketch}
-\end{figure}
+The differences between `facet_wrap()` and `facet_grid()` are illustrated in Figure \@ref(fig:facet-sketch).
+
+```{r facet-sketch, echo = FALSE, out.width = "75%", fig.cap="A sketch illustrating the difference between the two facetting systems. `facet_grid()` (left) is fundamentally 2d, being made up of two independent components. `facet_wrap()` (right) is 1d, but wrapped into 2d to save space."}
+knitr::include_graphics("diagrams/position-facets.png", dpi = 300, auto_pdf = TRUE)
+```
 
 Faceted plots have the capability to fill up a lot of space, so for this chapter we will use a subset of the mpg dataset that has a manageable number of levels: three cylinders (4, 6, 8), two types of drive train (4 and f), and six classes. 
 

diff --git a/getting-started.rmd → getting-started.Rmd b/getting-started.rmd → getting-started.Rmd
@@ -488,7 +488,7 @@ ggplot(mpg, aes(drv, hwy)) +
   ylim(NA, 30)
 ```
 
-Changing the axes limits sets values outside the range to `NA`. You can suppress the associated warning with `na.rm = TRUE`.
+Changing the axis limits sets values outside the range to `NA`. You can suppress the associated warning with `na.rm = TRUE`, but be careful. If your plot calculates summary statistics (e.g., sample mean), this conversion to `NA` occurs *before* the summary statistics are computed, so the summaries describe only the values that fall inside the limits.
 
 ## Output {#output}
 

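The caveat added to `getting-started.Rmd` in the hunk above can be sketched as follows. This is a minimal illustration and not part of the commit: the data frame `df` and its values are made up, and `stat_summary(fun = mean)` assumes ggplot2 3.3.0 or later (earlier versions spell the argument `fun.y`).

```r
library(ggplot2)

# Hypothetical toy data: one group containing a single large value.
df <- data.frame(g = "a", y = c(1, 2, 3, 100))

p <- ggplot(df, aes(g, y)) +
  stat_summary(fun = mean, geom = "point")

# Scale limits: 100 is converted to NA and dropped before the mean is computed.
mean_scale <- suppressWarnings(layer_data(p + ylim(NA, 30))$y)

# Coordinate limits: the mean uses all the data; the view is only zoomed.
mean_coord <- layer_data(p + coord_cartesian(ylim = c(0, 30)))$y

mean_scale  # mean of 1, 2, 3
mean_coord  # mean of 1, 2, 3, 100
```

If the summary should reflect every observation, zooming with `coord_cartesian()` is probably the safer choice than setting scale limits.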
diff --git a/introduction.rmd → introduction.Rmd b/introduction.rmd → introduction.Rmd
diff --git a/layers.rmd → layers.Rmd b/layers.rmd → layers.Rmd
@@ -199,13 +199,13 @@ ggplot(mpg) +
   geom_point(aes(displ, hwy, colour = class))
 ```
 
-Within each layer, you can add, override, or remove mappings:
+Within each layer, you can add, override, or remove mappings. For example, if you have a plot using the `mpg` data that has `aes(displ, hwy)` as the starting point, the table below illustrates all three operations:
 
-|Operation |Layer aesthetics    |Result                       |
-|:---------|:-------------------|:----------------------------|
-|Add       |`aes(colour = cyl)` |`aes(mpg, wt, colour = cyl)` |
-|Override  |`aes(y = disp)`     |`aes(mpg, disp)`             |
-|Remove    |`aes(y = NULL)`     |`aes(mpg)`                   |
+|Operation |Layer aesthetics    |Result                          |
+|:---------|:-------------------|:-------------------------------|
+|Add       |`aes(colour = cyl)` |`aes(displ, hwy, colour = cyl)` |
+|Override  |`aes(y = cty)`      |`aes(displ, cty)`               |
+|Remove    |`aes(y = NULL)`     |`aes(displ)`                    |
 
 If you only have one layer in the plot, the way you specify aesthetics doesn't make any difference. However, the distinction is important when you start adding additional layers. These two plots are both valid and interesting, but focus on quite different aspects of the data:
 
@@ -450,7 +450,7 @@ ggplot(mpg, aes(trans, cty)) +
 
 I think it's best to use the second form because it makes it more clear that you're displaying a summary, not the raw data.
 
-### Generated variables
+### Generated variables {#generated-variables}
 
 Internally, a stat takes a data frame as input and returns a data frame as output, and so a stat can add new variables to the original dataset.  It is possible to map aesthetics to these new variables.  For example, `stat_bin`, the statistic used to make histograms, produces the following variables: \index{Stats!creating new variables} \indexf{stat\_bin}
 

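The three operations in the revised `layers.Rmd` table can be tried directly. The sketch below is an illustration only; the object names (`base`, `p_add`, `p_override`, `p_remove`) are hypothetical, and the choice of `geom_histogram()` for the "remove" case is an assumption made so that a layer with only an `x` mapping has something sensible to draw.

```r
library(ggplot2)

# Plot-level defaults, as in the table: aes(displ, hwy).
base <- ggplot(mpg, aes(displ, hwy))

# Add: the layer gains a colour mapping on top of x and y.
p_add <- base + geom_point(aes(colour = cyl))

# Override: the layer replaces the default y mapping with cty.
p_override <- base + geom_point(aes(y = cty))

# Remove: the layer drops y, leaving aes(displ), which suits a histogram.
p_remove <- base + geom_histogram(aes(y = NULL), bins = 30)

# layer_data() returns the data a layer actually draws,
# so it shows which mappings were in effect.
all(layer_data(p_override)$y == mpg$cty)
"colour" %in% names(layer_data(p_add))
"count" %in% names(layer_data(p_remove))
```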
diff --git a/mastery.rmd → mastery.Rmd b/mastery.rmd → mastery.Rmd
@@ -80,14 +80,11 @@ The values in the previous table have no meaning to the computer. We need to con
 
 In this example, we have three aesthetics that need to be scaled: horizontal position (`x`), vertical position (`y`) and `colour`. Scaling position is easy in this example because we are using the default linear scales. We need only a linear mapping from the range of the data to $[0, 1]$. We use $[0, 1]$ instead of exact pixels because the drawing system that ggplot2 uses, **grid**, takes care of that final conversion for us. A final step determines how the two positions (x and y) are combined to form the final location on the plot. This is done by the coordinate system, or **coord**. In most cases this will be Cartesian coordinates, but it might be polar coordinates, or a spherical projection used for a map.
 
-The process for mapping the colour is a little more complicated, as we have a non-numeric result: colours. However, colours can be thought of as having three components, corresponding to the three types of colour-detecting cells in the human eye. These three cell types give rise to a three-dimensional colour space. Scaling then involves mapping the data values to points in this space. There are many ways to do this, but here since `cyl` is a categorical variable we map values to evenly spaced hues on the colour wheel, as shown in Figure \ref{fig:colour-wheel}. A different mapping is used when the variable is continuous. \index{Colour!wheel}
+The process for mapping the colour is a little more complicated, as we have a non-numeric result: colours. However, colours can be thought of as having three components, corresponding to the three types of colour-detecting cells in the human eye. These three cell types give rise to a three-dimensional colour space. Scaling then involves mapping the data values to points in this space. There are many ways to do this, but here since `cyl` is a categorical variable we map values to evenly spaced hues on the colour wheel, as shown in Figure \@ref(fig:colour-wheel). A different mapping is used when the variable is continuous. \index{Colour!wheel}
 
-\begin{figure}[htbp]
-  \centering
-    \includegraphics[width=2in]{diagrams/colour-wheel}
-  \caption{A colour wheel illustrating the choice of five equally spaced colours. This is the default scale for discrete variables.}
-  \label{fig:colour-wheel}
-\end{figure}
+```{r colour-wheel, echo = FALSE, out.width = "50%", fig.cap="A colour wheel illustrating the choice of five equally spaced colours. This is the default scale for discrete variables."}
+knitr::include_graphics("diagrams/colour-wheel.png", dpi = 300)
+```
 
 The result of these conversions is below. As well as aesthetics that have been mapped to variable, we also include aesthetics that are constant. We need these so that the aesthetics for each point are completely specified and R can draw the plot. The points will be filled circles (shape 19 in R) with a 1-mm diameter:
 
@@ -146,14 +143,13 @@ As well as adding an additional step to summarise the data, we also need some ex
   local operation: the variables in each dataset are mapped to their aesthetic 
   values, producing a new dataset that can then be rendered by the geoms.
 
-Figure \ref{fig:schematic} illustrates the complete process schematically.
+Figure \@ref(fig:schematic) illustrates the complete process schematically.
+
+
+```{r schematic, echo = FALSE, out.width = "75%", fig.cap="Schematic description of the plot generation process. Each square represents a layer, and this schematic represents a plot with three layers and three panels. All steps work by transforming individual data frames except for training scales, which doesn't affect the data frame and operates across all datasets simultaneously."}
+knitr::include_graphics("diagrams/mastery-schema.png", dpi = 300, auto_pdf = TRUE)
+```
 
-\begin{figure}[htbp]
-  \centering
-  \includegraphics[width=3.7in]{diagrams/mastery-schema}
-  \caption{Schematic description of the plot generation process. Each square represents a layer, and this schematic represents a plot with three layers and three panels. All steps work by transforming individual data frames except for training scales, which doesn't affect the data frame and operates across all datasets simultaneously.}
-  \label{fig:schematic}
-\end{figure}
 
 ## Components of the layered grammar {#components}
 

diff --git a/programming.rmd → programming.Rmd b/programming.rmd → programming.Rmd
diff --git a/references.rmd → references.Rmd b/references.rmd → references.Rmd
diff --git a/scales.rmd → scales.Rmd b/scales.rmd → scales.Rmd
@@ -98,14 +98,12 @@ You've probably already figured out the naming scheme for scales, but to be conc
 
 The component of a scale that you're most likely to want to modify is the __guide__, the axis or legend associated with the scale. Guides allow you to read observations from the plot and map them back to their original values. In ggplot2, guides are produced automatically based on the layers in your plot. This is very different to base R graphics, where you are responsible for drawing the legends by hand. In ggplot2, you don't directly control the legend; instead you set up the data so that there's a clear mapping between data and aesthetics, and a legend is generated for you automatically. This can be frustrating when you first start using ggplot2, but once you get the hang of it, you'll find that it saves you time, and there is little you cannot do. If you're struggling to get the legend you want, it's likely that your data is in the wrong form. 
 
-You might find it surprising that axes and legends are the same type of thing, but while they look very different there are many natural correspondences between the two, as shown in table below and in Figure \ref{fig:guides}. \index{Guides} \index{Legend} \index{Axis}
+You might find it surprising that axes and legends are the same type of thing, but while they look very different there are many natural correspondences between the two, as shown in the table below and in Figure \@ref(fig:guides). \index{Guides} \index{Legend} \index{Axis}
+
+```{r guides, echo = FALSE, out.width = "100%", fig.cap="Axis and legend components."}
+knitr::include_graphics("diagrams/scale-guides.png", dpi = 300, auto_pdf = TRUE)
+```
 
-\begin{figure}[htbp]
-  \centering
-  \includegraphics[width=\linewidth]{diagrams/scale-guides.pdf}
-  \caption{Axis and legend components}
-  \label{fig:guides}
-\end{figure}
 
 | Axis              | Legend        | Argument name
 |-------------------|---------------|-----------------
@@ -230,7 +228,7 @@ ggplot(df, aes(x, y)) +
 mb <- as.numeric(1:10 %o% 10 ^ (0:4))
 ggplot(df, aes(x, y)) + 
   geom_point() + 
-  scale_x_log10(minor_breaks = log10(mb))
+  scale_x_log10(minor_breaks = mb)
 ```
 
 Note the use of `%o%` to quickly generate the multiplication table, and that the minor breaks must be supplied on the transformed scale. \index{Log!ticks}
@@ -692,14 +690,12 @@ At the physical level, colour is produced by a mixture of wavelengths of light.
 
 Hues are not perceived as being ordered: e.g. green does not seem "larger" than red. The perception of chroma and luminance are ordered.
 
-The combination of these three components does not produce a simple geometric shape. Figure \ref{fig:hcl} attempts to show the 3d shape of the space. Each slice is a constant luminance (brightness) with hue mapped to angle and chroma to radius.  You can see the centre of each slice is grey and the colours get more intense as they get closer to the edge.
+The combination of these three components does not produce a simple geometric shape. Figure \@ref(fig:hcl) attempts to show the 3d shape of the space. Each slice is a constant luminance (brightness) with hue mapped to angle and chroma to radius.  You can see the centre of each slice is grey and the colours get more intense as they get closer to the edge.
 
-\begin{figure}[htbp]
-  \centering
-    \includegraphics[width=\linewidth]{diagrams/hcl-space}
-  \caption{The shape of the HCL colour space.  Hue is mapped to angle, chroma to radius and each slice shows a different luminance.  The HCL space is a pretty odd shape, but you can see that colours near the centre of each slice are grey, and as you move towards the edges they become more intense.  Slices for luminance 0 and 100 are omitted because they would, respectively, be a single black point and a single white point.}
-  \label{fig:hcl}
-\end{figure}
+
+```{r hcl, echo = FALSE, out.width = "100%", fig.cap="The shape of the HCL colour space.  Hue is mapped to angle, chroma to radius and each slice shows a different luminance.  The HCL space is a pretty odd shape, but you can see that colours near the centre of each slice are grey, and as you move towards the edges they become more intense.  Slices for luminance 0 and 100 are omitted because they would, respectively, be a single black point and a single white point."}
+knitr::include_graphics("diagrams/hcl-space.png", dpi = 300)
+```
 
 An additional complication is that many people (~10% of men) do not possess the normal complement of colour receptors and so can distinguish fewer colours than usual. \index{Colour!blindness} In brief, it's best to avoid red-green contrasts, and to check your plots with systems that simulate colour blindness. Visicheck is one online solution. Another alternative is the **dichromat** package [@dichromat] which provides tools for simulating colour blindness, and a set of colour schemes known to work well for colour-blind people. You can also help people with colour blindness in the same way that you can help people with black-and-white printers: by providing redundant mappings to other aesthetics like size, line type or shape.
 
@@ -897,9 +893,9 @@ area + scale_fill_brewer(palette = "Pastel1")
 
 ### The manual discrete scale {#scale-manual}
 
-The discrete scales, `scale_linetype()`, `scale_shape()`, and `scale_size_discrete()` basically have no options. These scales are just a list of valid values that are mapped to the unique discrete values. \index{Shape} \index{Line type} \index{Size} \indexf{scale\_shape\_manual} \indexf{scale\_colour\_manual} \indexf{scale\_linetype\_manual}
+The discrete scales --- for example `scale_linetype()`, `scale_shape()`, and `scale_colour_discrete()` --- basically have no options. These scales are just a list of valid values that are mapped to the unique discrete values. \index{Shape} \index{Line type} \index{Size} \indexf{scale\_shape\_manual} \indexf{scale\_colour\_manual} \indexf{scale\_linetype\_manual}
 
-If you want to customise these scales, you need to create your own new scale with the manual scale: `scale_shape_manual()`, `scale_linetype_manual()`, `scale_colour_manual()`. The manual scale has one important argument, `values`, where you specify the values that the scale should produce. If this vector is named, it will match the values of the output to the values of the input; otherwise it will match in order of the levels of the discrete variable. You will need some knowledge of the valid aesthetic values, which are described in `vignette("ggplot2-specs")`. 
+If you want to customise these scales, you need to create your own new scale with the "manual" version of each: `scale_linetype_manual()`, `scale_shape_manual()`, `scale_colour_manual()`, etc. The manual scale has one important argument, `values`, where you specify the values that the scale should produce. If this vector is named, it will match the values of the output to the values of the input; otherwise it will match in order of the levels of the discrete variable. You will need some knowledge of the valid aesthetic values, which are described in `vignette("ggplot2-specs")`. 
 
 The following code demonstrates the use of `scale_colour_manual()`: 
 

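The `%o%` idiom used in the `scales.Rmd` hunk to build `mb` can be inspected on its own with base R. This is a small sketch for illustration; the intermediate name `tbl` is hypothetical. The resulting values are in data units, which is how the revised line passes them to `minor_breaks`.

```r
# Outer product of 1:10 with powers of ten: a 10 x 5 multiplication table.
tbl <- 1:10 %o% 10^(0:4)
dim(tbl)

# Flatten column by column into a plain numeric vector of candidate breaks.
mb <- as.numeric(tbl)
length(mb)
range(mb)

# Values such as 10 and 100 appear twice (10 * 1 and 1 * 10).
anyDuplicated(mb) > 0
```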
diff --git a/space-time.Rmd b/space-time.Rmd
@@ -7,16 +7,21 @@ columns(1, 2 / 3)
 
 ## Surface plots {#surface}
 
-ggplot2 does not support true 3d surfaces. However, it does support many common tools for representing 3d surfaces in 2d: contours, coloured tiles and bubble plots. These all work similarly, differing only in the aesthetic used for the third dimension. \index{Surface plots} \index{Contour plot} \indexf{geom\_contour} \index{3d}
+ggplot2 does not support true 3d surfaces. However, it does support many common tools for representing 3d surfaces in 2d: contours, coloured tiles and bubble plots. These all work similarly, differing only in the aesthetic used for the third dimension. Here is an example of a contour plot: \index{Surface plots} \index{Contour plot} \indexf{geom\_contour} \index{3d}
 
 ```{r}
 ggplot(faithfuld, aes(eruptions, waiting)) + 
   geom_contour(aes(z = density, colour = ..level..))
+```
+
+The reference to the `..level..` variable in this code may seem confusing, because there is no variable called `..level..` in the `faithfuld` data. In this context the `..` notation refers to a variable computed internally (see Section \@ref(generated-variables)). To display the same density as a heat map, you can use `geom_raster()`:
 
+```{r}
 ggplot(faithfuld, aes(eruptions, waiting)) + 
   geom_raster(aes(fill = density))
 ```
 
+
 ```{r}
 # Bubble plots work better with fewer observations
 small <- faithfuld[seq(1, nrow(faithfuld), by = 10), ]

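One way to see where `..level..` comes from, as described in the new `space-time.Rmd` paragraph, is to compare the input data with the data frame the layer computes. This is a sketch under some assumptions: it uses `layer_data()` to pull out the computed data, recent ggplot2 versions need the isoband package installed for contours, and newer releases prefer the spelling `after_stat(level)` and may emit a deprecation warning for the `..` form (suppressed here).

```r
library(ggplot2)

p <- ggplot(faithfuld, aes(eruptions, waiting)) +
  geom_contour(aes(z = density, colour = ..level..))

# `level` is not a column of the input data ...
"level" %in% names(faithfuld)

# ... but it is a column of the data frame returned by the contour stat.
computed <- suppressWarnings(layer_data(p))
"level" %in% names(computed)
```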
diff --git a/statistical-summaries.Rmd b/statistical-summaries.Rmd
@@ -96,14 +96,11 @@ To demonstrate tools for large datasets, we'll use the built in `diamonds` datas
 diamonds
 ```
 
-The data contains the four C's of diamond quality: carat, cut, colour and clarity; and five physical measurements: depth, table, x, y and z, as described in Figure \ref{fig:diamond-dim}. \index{Data!diamonds@\texttt{diamonds}}
-
-\begin{figure}[htbp]
-  \centering
-    \includegraphics[width=0.8\linewidth]{diagrams/diamond-dimensions}
-  \caption{How the variables x, y, z, table and depth are measured.}
-  \label{fig:diamond-dim}
-\end{figure}
+The data contains the four C's of diamond quality: carat, cut, colour and clarity; and five physical measurements: depth, table, x, y and z, as described in Figure \@ref(fig:diamond-dim). \index{Data!diamonds@\texttt{diamonds}}
+
+```{r diamond-dim, echo = FALSE, out.width = "100%", fig.cap="How the variables x, y, z, table and depth are measured."}
+knitr::include_graphics("diagrams/diamond-dimensions.png", dpi = 300)
+```
 
 The dataset has not been well cleaned, so as well as demonstrating interesting facts about diamonds, it also shows some data quality problems. 
 

diff --git a/themes.rmd → themes.Rmd b/themes.rmd → themes.Rmd
@@ -471,14 +471,13 @@ When saving a plot to use in another program, you have two basic choices of  out
 * Raster graphics are stored as an array of pixel colours and have a fixed 
   optimal viewing size. The most useful raster graphic format is png.
 
-Figure \ref{fig:vector-raster} illustrates the basic differences in these formats for a circle. A good description is available at <http://tinyurl.com/rstrvctr>. 
-
-\begin{figure}[htbp]
-  \centering
-    \includegraphics[width= 0.5\linewidth]{diagrams/vector-raster}
-  \caption{The schematic difference between raster (left) and vector (right) graphics. }
-  \label{fig:vector-raster}
-\end{figure}
+Figure \@ref(fig:vector-raster) illustrates the basic differences in these formats for a circle. A good description is available at <http://tinyurl.com/rstrvctr>. 
+
+
+```{r vector-raster, echo = FALSE, out.width = "100%", fig.cap="The schematic difference between raster (left) and vector (right) graphics."}
+knitr::include_graphics("diagrams/vector-raster.png", dpi = 300, auto_pdf = TRUE)
+```
+
 
 Unless there is a compelling reason not to, use vector graphics: they look better in more places. There are two main reasons to use raster graphics:
 

diff --git a/toolbox.rmd → toolbox.Rmd b/toolbox.rmd → toolbox.Rmd