diff --git a/.travis.yml b/.travis.yml index 596b2b25..a0cca96d 100644 --- a/.travis.yml +++ b/.travis.yml @@ -3,10 +3,20 @@ cache: packages: true directories: ["book/tex", "_figures", "_cache"] +repos: + CRAN: https://cloud.r-project.org + ropensci: http://packages.ropensci.org + addons: apt: + sources: + - sourceline: 'ppa:ubuntugis/ppa' packages: - optipng + - libudunits2-dev + - libgdal-dev + - libgeos-dev + - libproj-dev before_script: - tlmgr install index diff --git a/DESCRIPTION b/DESCRIPTION index 998f0ba2..c6ce5331 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -8,6 +8,7 @@ URL: https://github.com/hadley/ggplot2-book Imports: babynames, broom, + conflicted, devtools, directlabels, dplyr, @@ -19,14 +20,18 @@ Imports: Lahman, lubridate, magrittr, + mapproj, maps, nlme, plyr, readr, rmarkdown, rvest, + sf, + sp, tidyr, USAboundaries, + USAboundariesData, wesanderson, xtable SystemRequirements: pandoc (>= 1.12.3) - http://johnmacfarlane.net/pandoc diff --git a/book/ggplot2-book.tex b/book/ggplot2-book.tex index 8a683c57..045b72de 100644 --- a/book/ggplot2-book.tex +++ b/book/ggplot2-book.tex @@ -29,6 +29,9 @@ \newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}} \newcommand{\ErrorTok} [1]{\textcolor[rgb]{1.00,0.00,0.00}{{#1}}} \newcommand{\NormalTok} [1]{{#1}} + +\newcommand{\OperatorTok} [1]{{#1}} +\newcommand{\ControlFlowTok} [1]{{#1}} % \usepackage{longtable} \usepackage{booktabs} diff --git a/book/render-tex.R b/book/render-tex.R index 2dcf03c3..1e89812e 100644 --- a/book/render-tex.R +++ b/book/render-tex.R @@ -1,14 +1,29 @@ library("methods") # avoids weird broom error library("rmarkdown") +tex_chapter <- function (chapter = NULL, latex_engine = c("xelatex", "pdflatex", + "lualatex"), code_width = 65) { + options(digits = 3) + set.seed(1014) + latex_engine <- match.arg(latex_engine) + rmarkdown::output_format(rmarkdown::knitr_options("html", chapter), + rmarkdown::pandoc_options(to = "latex", + from = "markdown_style", + ext = ".tex", + 
args = c("--top-level-division=chapter", + rmarkdown::pandoc_latex_engine_args(latex_engine)) + ), + clean_supporting = FALSE) +} + path <- commandArgs(trailingOnly = TRUE) # command line args should contain just one chapter name if (length(path) == 0) { message("No input supplied") } else { - base <- oldbookdown::tex_chapter() + base <- tex_chapter() base$knitr$opts_knit$width <- 67 base$pandoc$from <- "markdown" rmarkdown::render(path, base, output_dir = "book/tex", envir = globalenv(), quiet = TRUE) -} +} \ No newline at end of file diff --git a/book/tex/data-manip.tex b/book/tex/data-manip.tex deleted file mode 100644 index 15b40a05..00000000 --- a/book/tex/data-manip.tex +++ /dev/null @@ -1,949 +0,0 @@ -\chapter{Data transformation}\label{cha:dplyr} - -\section{Introduction}\label{introduction} - -Tidy data is important, but it's not the end of the road. Often you -won't have quite the right variables, or your data might need a little -aggregation before you visualise it. This chapter will show you how to -solve these problems (and more!) with the \textbf{dplyr} package. -\index{Data!manipulating} \index{dplyr} -\index{Grammar!of data manipulation} - -The goal of dplyr is to provide verbs (functions) that help you solve -the most common 95\% of data manipulation problems. dplyr is similar to -ggplot2, but instead of providing a grammar of graphics, it provides a -grammar of data manipulation. Like ggplot2, dplyr helps you not just by -giving you functions, but it also helps you think about data -manipulation. In particular, dplyr helps by constraining you: instead of -struggling to think about which of the thousands of functions that might -help, you can just pick from a handful that are design to be very likely -to be helpful. 
In this chapter you'll learn four of the most important -dplyr verbs: - -\begin{itemize} -\tightlist -\item - \texttt{filter()} -\item - \texttt{mutate()} -\item - \texttt{group\_by()} \& \texttt{summarise()} -\end{itemize} - -These verbs are easy to learn because they all work the same way: they -take a data frame as the first argument, and return a modified data -frame. The other arguments control the details of the transformation, -and are always interpreted in the context of the data frame so you can -refer to variables directly. I'll also explain each in the same way: -I'll show you a motivating example using the \texttt{diamonds} data, -give you more details about how the function works, and finish up with -some exercises for you to practice your skills with. - -You'll also learn how to create data transformation pipelines using -\texttt{\%\textgreater{}\%}. \texttt{\%\textgreater{}\%} plays a similar -role to \texttt{+} in ggplot2: it allows you to solve complex problems -by combining small pieces that are easily understood in isolation. - -This chapter only scratches the surface of dplyr's capabilities but it -should be enough to help you with visualisation problems. You can learn -more by using the resources discussed at the end of the chapter. - -\section{Filter observations}\label{filter-observations} - -It's common to only want to explore one part of a dataset. A great data -analysis strategy is to start with just one observation unit (one -person, one city, etc), and understand how it works before attempting to -generalise the conclusion to others. This is a great technique if you -ever feel overwhelmed by an analysis: zoom down to a small subset, -master it, and then zoom back out, to apply your conclusions to the full -dataset. \indexf{filter} - -Filtering is also useful for extracting outliers. 
Generally, you don't -want to just throw outliers away, as they're often highly revealing, but -it's useful to think about partitioning the data into the common and the -unusual. You summarise the common to look at the broad trends; you -examine the outliers individually to see if you can figure out what's -going on. - -For example, look at this plot that shows how the x and y dimensions of -the diamonds are related: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/diamonds-x-y-1} -\end{figure} - -There are around 50,000 points in this dataset: most of them lie along -the diagonal, but there are a handful of outliers. One clear set of -incorrect values are those diamonds with zero dimensions. We can use -\texttt{filter()} to pull them out: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{filter}\NormalTok{(diamonds, x ==}\StringTok{ }\DecValTok{0} \NormalTok{|}\StringTok{ }\NormalTok{y ==}\StringTok{ }\DecValTok{0}\NormalTok{)} -\CommentTok{#> Source: local data frame [8 x 10]} -\CommentTok{#> } -\CommentTok{#> carat cut color clarity depth table price x y} -\CommentTok{#> (dbl) (fctr) (fctr) (fctr) (dbl) (dbl) (int) (dbl) (dbl)} -\CommentTok{#> 1 1.07 Ideal F SI2 61.6 56 4954 0 6.62} -\CommentTok{#> 2 1.00 Very Good H VS2 63.3 53 5139 0 0.00} -\CommentTok{#> 3 1.14 Fair G VS1 57.5 67 6381 0 0.00} -\CommentTok{#> 4 1.56 Ideal G VS2 62.2 54 12800 0 0.00} -\CommentTok{#> 5 1.20 Premium D VVS1 62.1 59 15686 0 0.00} -\CommentTok{#> 6 2.25 Premium H SI2 62.8 59 18034 0 0.00} -\CommentTok{#> .. ... ... ... ... ... ... ... ... 
...} -\CommentTok{#> Variables not shown: z (dbl)} -\end{Highlighting} -\end{Shaded} - -This is equivalent to the base R code -\texttt{diamonds{[}diamonds\$x\ ==\ 0\ \textbar{}\ diamonds\$y\ ==\ 0,\ {]}}, -but is more concise because \texttt{filter()} knows to look for the bare -\texttt{x} in the data frame. - -(If you've used \texttt{subset()} before, you'll notice that it has very -similar behaviour. The biggest difference is that \texttt{subset()} can -select both observations and variables, where in dplyr, -\texttt{filter()} works exclusively with observations and -\texttt{select()} with variables. There are some other subtle -differences, but the main advantage to using \texttt{filter()} is that -it behaves identically to the other dplyr verbs and it tends to be a bit -faster than \texttt{subset()}.) - -In a real analysis, you'd look at the outliers in more detail to see if -you can find the root cause of the data quality problem. In this case, -we're just going to throw them out and focus on what remains. To save -some typing, we may provide multiple arguments to \texttt{filter()} -which combines them. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{diamonds_ok <-}\StringTok{ }\KeywordTok{filter}\NormalTok{(diamonds, x >}\StringTok{ }\DecValTok{0}\NormalTok{, y >}\StringTok{ }\DecValTok{0}\NormalTok{, y <}\StringTok{ }\DecValTok{20}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(diamonds_ok, }\KeywordTok{aes}\NormalTok{(x, y)) +} -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{() +} -\StringTok{ }\KeywordTok{geom_abline}\NormalTok{(}\DataTypeTok{slope =} \DecValTok{1}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"white"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{1}\NormalTok{, }\DataTypeTok{alpha =} \FloatTok{0.5}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/diamonds-ok-1} -\end{figure} - -This plot is now more informative - we can see a very strong -relationship between \texttt{x} and \texttt{y}. I've added the reference -line to make it clear that for most diamonds, \texttt{x} and \texttt{y} -are very similar. However, this plot still has problems: - -\begin{itemize} -\item - The plot is mostly empty, because most of the data lies along the - diagonal. -\item - There are some clear bivariate outliers, but it's hard to select them - with a simple filter. -\end{itemize} - -We'll solve both of these problem in the next section by adding a new -variable that's a transformation of x and y. But before we continue on -to that, let's talk more about the details of \texttt{filter()}. - -\subsection{Useful tools}\label{useful-tools} - -The first argument to \texttt{filter()} is a data frame. The second and -subsequent arguments must be logical vectors: \texttt{filter()} selects -every row where all the logical expressions are \texttt{TRUE}. The -logical vectors must always be the same length as the data frame: if -not, you'll get an error. 
Typically you create the logical vector with -the comparison operators: - -\begin{itemize} -\tightlist -\item - \texttt{x\ ==\ y}: x and y are equal. -\item - \texttt{x\ !=\ y}: x and y are not equal. -\item - \texttt{x\ \%in\%\ c("a",\ "b",\ "c")}: \texttt{x} is one of the - values in the right hand side. -\item - \texttt{x\ \textgreater{}\ y}, \texttt{x\ \textgreater{}=\ y}, - \texttt{x\ \textless{}\ y}, \texttt{x\ \textless{}=\ y}: greater than, - greater than or equal to, less than, less than or equal to. -\end{itemize} - -And combine them with logical operators: - -\begin{itemize} -\tightlist -\item - \texttt{!x} (pronounced ``not x''), flips \texttt{TRUE} and - \texttt{FALSE} so it keeps all the values where \texttt{x} is - \texttt{FALSE}. -\item - \texttt{x\ \&\ y}: \texttt{TRUE} if both \texttt{x} and \texttt{y} are - \texttt{TRUE}. -\item - \texttt{x\ \textbar{}\ y}: \texttt{TRUE} if either \texttt{x} or - \texttt{y} (or both) are \texttt{TRUE}. -\item - \texttt{xor(x,\ y)}: \texttt{TRUE} if either \texttt{x} or \texttt{y} - are \texttt{TRUE}, but not both (exclusive or). 
-\end{itemize} - -Most real queries involve some combination of both: - -\begin{itemize} -\tightlist -\item - Price less than \$500: \texttt{price\ \textless{}\ 500} -\item - Size between 1 and 2 carats: - \texttt{carat\ \textgreater{}=\ 1\ \&\ carat\ \textless{}\ 2} -\item - Cut is ideal or premium: - \texttt{cut\ ==\ "Premium"\ \textbar{}\ cut\ ==\ "Ideal"}, or - \texttt{cut\ \%in\%\ c("Premium",\ "Ideal")} (note that R is case - sensitive) -\item - Worst colour, cut and clarity: - \texttt{cut\ ==\ "Fair"\ \&\ color\ ==\ "J"\ \&\ clarity\ ==\ "SI2"} -\end{itemize} - -You can also use functions in the filtering expression: - -\begin{itemize} -\tightlist -\item - Size is between 1 and 2 carats: \texttt{floor(carat)\ ==\ 1} -\item - An average dimension greater than 3: - \texttt{(x\ +\ y\ +\ z)\ /\ 3\ \textgreater{}\ 3} -\end{itemize} - -This is useful for simple expressions, but as things get more -complicated it's better to create a new variable first so you can check -that you've done the computation correctly before doing the subsetting. -You'll learn how to do that in the next section. - -The rules for \texttt{NA} are a bit trickier, so I'll explain them next. - -\subsection{Missing values}\label{missing-values} - -\texttt{NA}, R's missing value indicator, can be frustrating to work -with. R's underlying philosophy is to force you to recognise that you -have missing values, and make a deliberate choice to deal with them: -missing values never silently go missing. This is a pain because you -almost always want to just get rid of them, but it's a good principle to -force you to think about the correct option. \indexc{NA} -\index{Missing values} - -The most important thing to understand about missing values is that they -are infectious: with few exceptions, the result of any operation that -includes a missing value will be a missing value. 
This happens because -\texttt{NA} represents an unknown value, and there are few operations -that turn an unknown value into a known value. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{x <-}\StringTok{ }\KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\OtherTok{NA}\NormalTok{, }\DecValTok{2}\NormalTok{)} -\NormalTok{x ==}\StringTok{ }\DecValTok{1} -\CommentTok{#> [1] TRUE NA FALSE} -\NormalTok{x >}\StringTok{ }\DecValTok{2} -\CommentTok{#> [1] FALSE NA FALSE} -\NormalTok{x +}\StringTok{ }\DecValTok{10} -\CommentTok{#> [1] 11 NA 12} -\end{Highlighting} -\end{Shaded} - -When you first learn R, you might be tempted to find missing values -using \texttt{==}: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{x ==}\StringTok{ }\OtherTok{NA} -\CommentTok{#> [1] NA NA NA} -\NormalTok{x !=}\StringTok{ }\OtherTok{NA} -\CommentTok{#> [1] NA NA NA} -\end{Highlighting} -\end{Shaded} - -But that doesn't work! A little thought reveals why: there's no reason -why two unknown values should be the same. Instead, use -\texttt{is.na(X)} to determine if a value is missing: \indexf{is.na} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{is.na}\NormalTok{(x)} -\CommentTok{#> [1] FALSE TRUE FALSE} -\end{Highlighting} -\end{Shaded} - -\texttt{filter()} only includes observations where all arguments are -\texttt{TRUE}, so \texttt{NA} values are automatically dropped. If you -want to include missing values, be explicit: -\texttt{x\ \textgreater{}\ 10\ \textbar{}\ is.na(x)}. In other parts of -R, you'll sometimes need to convert missing values into \texttt{FALSE}. -You can do that with \texttt{x\ \textgreater{}\ 10\ \&\ !is.na(x)} - -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Practice your filtering skills by: - - \begin{itemize} - \tightlist - \item - Finding all the diamonds with equal x and y dimensions. - \item - A depth between 55 and 70. - \item - A carat smaller than the median carat. 
- \item - Cost more than \$10,000 per carat - \item - Are of good or better quality - \end{itemize} -\item - Fill in the question marks in this table: - - \begin{longtable}[c]{@{}llll@{}} - \toprule - Expression & \texttt{TRUE} & \texttt{FALSE} & - \texttt{NA}\tabularnewline - \midrule - \endhead - \texttt{x} & x &\tabularnewline - ? & & x\tabularnewline - \texttt{is.na(x)} & & & x\tabularnewline - \texttt{!is.na(x)} & ? & ? & ?\tabularnewline - ? & x & & x\tabularnewline - ? & & x & x\tabularnewline - \bottomrule - \end{longtable} -\item - Repeat the analysis of outlying values with \texttt{z}. Compared to - \texttt{x} and \texttt{y}, how would you characterise the relationship - of \texttt{x} and \texttt{z}, or \texttt{y} and \texttt{z}? -\item - Install the \textbf{ggplot2movies} package and look at the movies that - have a missing budget. How are they different from the movies with a - budget? (Hint: try a frequency polygon plus - \texttt{colour\ =\ is.na(budget)}.) -\item - What is \texttt{NA\ \&\ FALSE} and \texttt{NA\ \textbar{}\ TRUE}? Why? - Why doesn't \texttt{NA\ *\ 0} equal zero? What number times zero does - not equal 0? What do you expect \texttt{NA\ \^{}\ 0} to equal? Why? -\end{enumerate} - -\section{Create new variables}\label{mutate} - -To better explore the relationship between \texttt{x} and \texttt{y}, -it's useful to ``rotate'' the plot so that the data is flat, not -diagonal. We can do that by creating two new variables: one that -represents the difference between \texttt{x} and \texttt{y} (which in -this context represents the symmetry of the diamond) and one that -represents its size (the length of the diagonal). \indexf{mutate} -\index{Data!creating new variables} - -To create new variables use \texttt{mutate()}. Like \texttt{filter()} it -takes a data frame as its first argument and returns a data frame. Its -second and subsequent arguments are named expressions that generate new -variables. 
Like \texttt{filter()} you can refer to variables just by -their name, you don't need to also include the name of the dataset. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{diamonds_ok2 <-}\StringTok{ }\KeywordTok{mutate}\NormalTok{(diamonds_ok,} - \DataTypeTok{sym =} \NormalTok{x -}\StringTok{ }\NormalTok{y,} - \DataTypeTok{size =} \KeywordTok{sqrt}\NormalTok{(x ^}\StringTok{ }\DecValTok{2} \NormalTok{+}\StringTok{ }\NormalTok{y ^}\StringTok{ }\DecValTok{2}\NormalTok{)} -\NormalTok{)} -\NormalTok{diamonds_ok2} -\CommentTok{#> Source: local data frame [53,930 x 12]} -\CommentTok{#> } -\CommentTok{#> carat cut color clarity depth table price x y} -\CommentTok{#> (dbl) (fctr) (fctr) (fctr) (dbl) (dbl) (int) (dbl) (dbl)} -\CommentTok{#> 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98} -\CommentTok{#> 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84} -\CommentTok{#> 3 0.23 Good E VS1 56.9 65 327 4.05 4.07} -\CommentTok{#> 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23} -\CommentTok{#> 5 0.31 Good J SI2 63.3 58 335 4.34 4.35} -\CommentTok{#> 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96} -\CommentTok{#> .. ... ... ... ... ... ... ... ... ...} -\CommentTok{#> Variables not shown: z (dbl), sym (dbl), size (dbl)} - -\KeywordTok{ggplot}\NormalTok{(diamonds_ok2, }\KeywordTok{aes}\NormalTok{(size, sym)) +}\StringTok{ } -\StringTok{ }\KeywordTok{stat_bin2d}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/mutate1-1} -\end{figure} - -This plot has two advantages: we can more easily see the pattern -followed by most diamonds, and we can easily select outliers. Here, it -doesn't seem important whether the outliers are positive (i.e. -\texttt{x} is bigger than \texttt{y}) or negative (i.e. \texttt{y} is -bigger \texttt{x}). So we can use the absolute value of the symmetry -variable to pull out the outliers. Based on the plot, and a little -experimentation, I came up with a threshold of 0.20. 
We'll check out the -results with a histogram. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds_ok2, }\KeywordTok{aes}\NormalTok{(}\KeywordTok{abs}\NormalTok{(sym))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\DataTypeTok{binwidth =} \FloatTok{0.10}\NormalTok{)} - -\NormalTok{diamonds_ok3 <-}\StringTok{ }\KeywordTok{filter}\NormalTok{(diamonds_ok2, }\KeywordTok{abs}\NormalTok{(sym) <}\StringTok{ }\FloatTok{0.20}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(diamonds_ok3, }\KeywordTok{aes}\NormalTok{(}\KeywordTok{abs}\NormalTok{(sym))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\DataTypeTok{binwidth =} \FloatTok{0.01}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/data-manip/sym-hist-1}% - \includegraphics[width=0.5\linewidth]{_figures/data-manip/sym-hist-2} -\end{figure} - -That's an interesting histogram! While most diamonds are close to being -symmetric there are very few that are perfectly symmetric (i.e. -\texttt{x\ ==\ y}). - -\subsection{Useful tools}\label{useful-tools-1} - -Typically, transformations will be suggested by your domain knowledge. -However, there are a few transformations that are useful in a -surprisingly wide range of circumstances. - -\begin{itemize} -\item - Log-transformations are often useful. They turn multiplicative - relationships into additive relationships; they compress data that - varies over orders of magnitude; they convert power relationships to - linear relationship. See examples at - \url{http://stats.stackexchange.com/questions/27951} -\item - Relative difference: If you're interested in the relative difference - between two variables, use \texttt{log(x\ /\ y)}. It's better than - \texttt{x\ /\ y} because it's symmetric: if x \textless{} y, - \texttt{x\ /\ y} takes values {[}0, 1), but if x \textgreater{} y, - \texttt{x\ /\ y} takes values (1, Inf). 
See Törnqvist, Vartia, and - Vartia (1985) for more details. \indexf{log} -\item - Sometimes integrating or differentiating might make the data more - interpretable: if you have distance and time, would speed or - acceleration be more useful? (or vice versa). (Note that integration - makes data more smooth; differentiation makes it less smooth.) -\item - Partition a number into magnitude and direction with \texttt{abs(x)} - and \texttt{sign(x)}. -\end{itemize} - -There are also a few useful ways to transform pairs of variables: - -\begin{itemize} -\item - Partitioning into overall size and difference is often useful, as seen - above. -\item - If you see a strong trend, use a model to partition it into pattern - and residuals is often useful. You'll learn more about that in the - next chapter. -\item - Sometimes it's useful to change positions to polar coordinates (or - vice versa): distance (\texttt{sqrt(x\^{}2\ +\ y\^{}2)}) and angle - (\texttt{atan2(y,\ x)}). -\end{itemize} - -\subsection{Exercises}\label{exercises-1} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Practice your variable creation skills by creating the following new - variables: - - \begin{itemize} - \tightlist - \item - The approximate volume of the diamond (using x, y, and z). - \item - The approximate density of the diamond. - \item - The price per carat. - \item - Log transformation of carat and price. - \end{itemize} -\item - How can you improve the data density of - \texttt{ggplot(diamonds,\ aes(x,\ z))\ +\ stat\_bin2d()}. What - transformation makes it easier to extract outliers? -\item - The depth variable is just the width of the diamond (average of - \texttt{x} and \texttt{y}) divided by its height (\texttt{z}) - multiplied by 100 and round to the nearest integer. Compute the depth - yourself and compare it to the existing depth variable. Summarise your - findings with a plot. -\item - Compare the distribution of symmetry for diamonds with \(x > y\) vs. - \(x < y\). 
-\end{enumerate} - -\section{Group-wise summaries}\label{sec:summarise} - -Many insightful visualisations require that you reduce the full dataset -down to a meaningful summary. ggplot2 provides a number of geoms that -will do summaries for you. But it's often useful to do summaries by -hand: that gives you more flexibility and you can use the summaries for -other purposes. \indexf{group\_by} \indexf{summarise} - -dplyr does summaries in two steps: - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\tightlist -\item - Define the grouping variables with \texttt{group\_by()}. -\item - Describe how to summarise each group with a single row with - \texttt{summarise()} -\end{enumerate} - -For example, to look at the average price per clarity, we first group by -clarity, then summarise: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{by_clarity <-}\StringTok{ }\KeywordTok{group_by}\NormalTok{(diamonds, clarity)} -\NormalTok{sum_clarity <-}\StringTok{ }\KeywordTok{summarise}\NormalTok{(by_clarity, }\DataTypeTok{price =} \KeywordTok{mean}\NormalTok{(price))} -\NormalTok{sum_clarity} -\CommentTok{#> Source: local data frame [8 x 2]} -\CommentTok{#> } -\CommentTok{#> clarity price} -\CommentTok{#> (fctr) (dbl)} -\CommentTok{#> 1 I1 3924} -\CommentTok{#> 2 SI2 5063} -\CommentTok{#> 3 SI1 3996} -\CommentTok{#> 4 VS2 3925} -\CommentTok{#> 5 VS1 3839} -\CommentTok{#> 6 VVS2 3284} -\CommentTok{#> .. ... 
...} - -\KeywordTok{ggplot}\NormalTok{(sum_clarity, }\KeywordTok{aes}\NormalTok{(clarity, price)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \DecValTok{1}\NormalTok{), }\DataTypeTok{colour =} \StringTok{"grey80"}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{size =} \DecValTok{2}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/price-by-clarity-1} -\end{figure} - -You might be surprised by this pattern: why do diamonds with better -clarity have lower prices? We'll see why this is the case and what to do -about it in \protect\hyperlink{sub:trend}{removing trend}. - -Supply additional variables to \texttt{group\_by()} to create groups -based on more than one variable. The next example shows how we can -compute (by hand) a frequency polygon that shows how cut and depth -interact. The special summary function \texttt{n()} counts the number of -observations in each group. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{cut_depth <-}\StringTok{ }\KeywordTok{summarise}\NormalTok{(}\KeywordTok{group_by}\NormalTok{(diamonds, cut, depth), }\DataTypeTok{n =} \KeywordTok{n}\NormalTok{())} -\NormalTok{cut_depth <-}\StringTok{ }\KeywordTok{filter}\NormalTok{(cut_depth, depth >}\StringTok{ }\DecValTok{55}\NormalTok{, depth <}\StringTok{ }\DecValTok{70}\NormalTok{)} -\NormalTok{cut_depth} -\CommentTok{#> Source: local data frame [455 x 3]} -\CommentTok{#> Groups: cut [5]} -\CommentTok{#> } -\CommentTok{#> cut depth n} -\CommentTok{#> (fctr) (dbl) (int)} -\CommentTok{#> 1 Fair 55.1 3} -\CommentTok{#> 2 Fair 55.2 6} -\CommentTok{#> 3 Fair 55.3 5} -\CommentTok{#> 4 Fair 55.4 2} -\CommentTok{#> 5 Fair 55.5 3} -\CommentTok{#> 6 Fair 55.6 4} -\CommentTok{#> .. ... ... 
...} - -\KeywordTok{ggplot}\NormalTok{(cut_depth, }\KeywordTok{aes}\NormalTok{(depth, n, }\DataTypeTok{colour =} \NormalTok{cut)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/freqpoly-by-hand-1} -\end{figure} - -We can use a grouped \texttt{mutate()} to convert counts to proportions, -so it's easier to compare across the cuts. \texttt{summarise()} strips -one level of grouping off, so \texttt{cut\_depth} will be grouped by -cut. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{cut_depth <-}\StringTok{ }\KeywordTok{mutate}\NormalTok{(cut_depth, }\DataTypeTok{prop =} \NormalTok{n /}\StringTok{ }\KeywordTok{sum}\NormalTok{(n))} -\KeywordTok{ggplot}\NormalTok{(cut_depth, }\KeywordTok{aes}\NormalTok{(depth, prop, }\DataTypeTok{colour =} \NormalTok{cut)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/freqpoly-scaled-1} -\end{figure} - -\subsection{Useful tools}\label{useful-tools-2} - -\texttt{summarise()} needs to be used with functions that take a vector -of \(n\) values and always return a single value. Those functions -include: - -\begin{itemize} -\tightlist -\item - Counts: \texttt{n()}, \texttt{n\_distinct(x)}. -\item - Middle: \texttt{mean(x)}, \texttt{median(x)}. -\item - Spread: \texttt{sd(x)}, \texttt{mad(x)}, \texttt{IQR(x)}. -\item - Extremes: \texttt{quartile(x)}, \texttt{min(x)}, \texttt{max(x)}. -\item - Positions: \texttt{first(x)}, \texttt{last(x)}, \texttt{nth(x,\ 2)}. -\end{itemize} - -Another extremely useful technique is to use \texttt{sum()} or -\texttt{mean()} with a logical vector. When a logical vector is treated -as numeric, \texttt{TRUE} becomes 1 and \texttt{FALSE} becomes 0. 
This -means that \texttt{sum()} tells you the number of \texttt{TRUE}s, and -\texttt{mean()} tells you the proportion of \texttt{TRUE}s. For example, -the following code counts the number of diamonds with carat greater than -or equal to 4, and the proportion of diamonds that cost less than -\$1000. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{summarise}\NormalTok{(diamonds, } - \DataTypeTok{n_big =} \KeywordTok{sum}\NormalTok{(carat >=}\StringTok{ }\DecValTok{4}\NormalTok{), } - \DataTypeTok{prop_cheap =} \KeywordTok{mean}\NormalTok{(price <}\StringTok{ }\DecValTok{1000}\NormalTok{)} -\NormalTok{)} -\CommentTok{#> Source: local data frame [1 x 2]} -\CommentTok{#> } -\CommentTok{#> n_big prop_cheap} -\CommentTok{#> (int) (dbl)} -\CommentTok{#> 1 6 0.269} -\end{Highlighting} -\end{Shaded} - -Most summary functions have a \texttt{na.rm} argument: -\texttt{na.rm\ =\ TRUE} tells the summary function to remove any missing -values prior to summiarisation. This is a convenient shortcut: rather -than removing the missing values then summarising, you can do it in one -step. - -\subsection{Statistical -considerations}\label{statistical-considerations} - -When summarising with the mean or median, it's always a good idea to -include a count and a measure of spread. This helps you calibrate your -assessments - if you don't include them you're likely to think that the -data is less variable than it really is, and potentially draw -unwarranted conclusions. - -The following example extends our previous summary of the average price -by clarity to also include the number of observations in each group, and -the upper and lower quartiles. It suggests the mean might be a bad -summary for this data - the distributions of price are so highly skewed -that the mean is higher than the upper quartile for some of the groups! 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{by_clarity <-}\StringTok{ }\NormalTok{diamonds %>%} -\StringTok{ }\KeywordTok{group_by}\NormalTok{(clarity) %>%} -\StringTok{ }\KeywordTok{summarise}\NormalTok{(} - \DataTypeTok{n =} \KeywordTok{n}\NormalTok{(), } - \DataTypeTok{mean =} \KeywordTok{mean}\NormalTok{(price), } - \DataTypeTok{lq =} \KeywordTok{quantile}\NormalTok{(price, }\FloatTok{0.25}\NormalTok{), } - \DataTypeTok{uq =} \KeywordTok{quantile}\NormalTok{(price, }\FloatTok{0.75}\NormalTok{)} - \NormalTok{)} -\NormalTok{by_clarity} -\CommentTok{#> Source: local data frame [8 x 5]} -\CommentTok{#> } -\CommentTok{#> clarity n mean lq uq} -\CommentTok{#> (fctr) (int) (dbl) (dbl) (dbl)} -\CommentTok{#> 1 I1 741 3924 2080 5161} -\CommentTok{#> 2 SI2 9194 5063 2264 5777} -\CommentTok{#> 3 SI1 13065 3996 1089 5250} -\CommentTok{#> 4 VS2 12258 3925 900 6024} -\CommentTok{#> 5 VS1 8171 3839 876 6023} -\CommentTok{#> 6 VVS2 5066 3284 794 3638} -\CommentTok{#> .. ... ... ... ... ...} -\KeywordTok{ggplot}\NormalTok{(by_clarity, }\KeywordTok{aes}\NormalTok{(clarity, mean)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_linerange}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{ymin =} \NormalTok{lq, }\DataTypeTok{ymax =} \NormalTok{uq)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \DecValTok{1}\NormalTok{), }\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{size =} \NormalTok{n))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/unnamed-chunk-2-1} -\end{figure} - -Another example of this comes from baseball. Let's take the MLB batting -data from the Lahman package and calculate the batting average: the -number of hits divided by the number of at bats. Who's the best batter -according to this metric? 
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{data}\NormalTok{(Batting, }\DataTypeTok{package =} \StringTok{"Lahman"}\NormalTok{)} -\NormalTok{batters <-}\StringTok{ }\KeywordTok{filter}\NormalTok{(Batting, AB >}\StringTok{ }\DecValTok{0}\NormalTok{)} -\NormalTok{per_player <-}\StringTok{ }\KeywordTok{group_by}\NormalTok{(batters, playerID)} -\NormalTok{ba <-}\StringTok{ }\KeywordTok{summarise}\NormalTok{(per_player, } - \DataTypeTok{ba =} \KeywordTok{sum}\NormalTok{(H, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) /}\StringTok{ }\KeywordTok{sum}\NormalTok{(AB, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{)} -\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(ba, }\KeywordTok{aes}\NormalTok{(ba)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\DataTypeTok{binwidth =} \FloatTok{0.01}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/unnamed-chunk-3-1} -\end{figure} - -Wow, there are a lot of players who can hit the ball every single time! -Would you want them on your fantasy baseball team? 
Let's double check -they're really that good by also showing the total number of -at bats: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{ba <-}\StringTok{ }\KeywordTok{summarise}\NormalTok{(per_player, } - \DataTypeTok{ba =} \KeywordTok{sum}\NormalTok{(H, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) /}\StringTok{ }\KeywordTok{sum}\NormalTok{(AB, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{),} - \DataTypeTok{ab =} \KeywordTok{sum}\NormalTok{(AB, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{)} -\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(ba, }\KeywordTok{aes}\NormalTok{(ab, ba)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{(}\DataTypeTok{bins =} \DecValTok{100}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/unnamed-chunk-4-1} -\end{figure} - -The highest batting averages occur for the players with the smallest -number of at bats - it's not hard to hit the ball every time if you've -only had two pitches. We can make the pattern a little clearer by -getting rid of the players with fewer than 10 at bats. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(}\KeywordTok{filter}\NormalTok{(ba, ab >=}\StringTok{ }\DecValTok{10}\NormalTok{), }\KeywordTok{aes}\NormalTok{(ab, ba)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/data-manip/unnamed-chunk-5-1} -\end{figure} - -You'll often see a similar pattern whenever you plot number of -observations vs.~an average. Be aware! 
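To see that extreme averages at small counts can come from sample size alone, here's a small simulation in base R. It's only a sketch: I assume every simulated player has the same true batting average of 0.25 (a number made up for illustration), and the names \texttt{true\_ba}, \texttt{sim} and \texttt{spread} are mine, not part of the Lahman data.

```r
# Simulate players who all share the same hitting ability
# but differ in how many at bats they've had.
set.seed(1014)
true_ba <- 0.25                               # assumed ability, identical for everyone
ab <- rep(c(2, 10, 100, 1000), each = 500)    # 500 simulated players per group
hits <- rbinom(length(ab), size = ab, prob = true_ba)
sim <- data.frame(ab = ab, ba = hits / ab)

# Standard deviation of the observed average within each at-bat group
spread <- tapply(sim$ba, sim$ab, sd)
spread
```

Nobody in this simulation is more skilled than anybody else, yet the observed averages vary far more in the two-at-bat group than in the 1000-at-bat group. That's why it pays to show the count alongside any average.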
- -\subsection{Exercises}\label{exercises-2} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - For each year in the \texttt{ggplot2movies::movies} data determine the - percent of movies with missing budgets. Visualise the result. -\item - How does the average length of a movie change over time? Display your - answer with a plot, including some display of uncertainty. -\item - For each combination of diamond quality (e.g.~cut, colour and - clarity), count the number of diamonds, the average price and the - average size. Visualise the results. -\item - Compute a histogram of carat by ``hand'' using a binwidth of 0.1. - Display the results with \texttt{geom\_bar(stat\ =\ "identity")}. - (Hint: you might need to create a new variable first). -\item - In the baseball example, the batting average seems to increase as the - number of at bats increases. Why? -\end{enumerate} - -\section{Transformation pipelines}\label{transformation-pipelines} - -Most real analyses require you to string together multiple -\texttt{mutate()}s, \texttt{filter()}s, \texttt{group\_by()}s , and -\texttt{summarise()}s. 
For example, above, we created a frequency -polygon by hand with a combination of all four verbs: \indexc{\%>\%} - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# By using intermediate values} -\NormalTok{cut_depth <-}\StringTok{ }\KeywordTok{group_by}\NormalTok{(diamonds, cut, depth)} -\NormalTok{cut_depth <-}\StringTok{ }\KeywordTok{summarise}\NormalTok{(cut_depth, }\DataTypeTok{n =} \KeywordTok{n}\NormalTok{())} -\NormalTok{cut_depth <-}\StringTok{ }\KeywordTok{filter}\NormalTok{(cut_depth, depth >}\StringTok{ }\DecValTok{55}\NormalTok{, depth <}\StringTok{ }\DecValTok{70}\NormalTok{)} -\NormalTok{cut_depth <-}\StringTok{ }\KeywordTok{mutate}\NormalTok{(cut_depth, }\DataTypeTok{prop =} \NormalTok{n /}\StringTok{ }\KeywordTok{sum}\NormalTok{(n))} -\end{Highlighting} -\end{Shaded} - -This sequence of operations is a bit painful because we repeated the -name of the data frame many times. An alternative is just to do it with -one sequence of function calls: - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# By "composing" functions} -\KeywordTok{mutate}\NormalTok{(} - \KeywordTok{filter}\NormalTok{(} - \KeywordTok{summarise}\NormalTok{(} - \KeywordTok{group_by}\NormalTok{(} - \NormalTok{diamonds, } - \NormalTok{cut, } - \NormalTok{depth} - \NormalTok{), } - \DataTypeTok{n =} \KeywordTok{n}\NormalTok{()} - \NormalTok{), } - \NormalTok{depth >}\StringTok{ }\DecValTok{55}\NormalTok{, } - \NormalTok{depth <}\StringTok{ }\DecValTok{70} - \NormalTok{), } - \DataTypeTok{prop =} \NormalTok{n /}\StringTok{ }\KeywordTok{sum}\NormalTok{(n)} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -But this is also hard to read because the sequence of operations is -inside out, and the arguments to each function can be quite far apart. -dplyr provides an alternative approach with the \textbf{pipe}, -\texttt{\%\textgreater{}\%}. 
With the pipe, we can write the above -sequence of operations as: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{cut_depth <-}\StringTok{ }\NormalTok{diamonds %>%}\StringTok{ } -\StringTok{ }\KeywordTok{group_by}\NormalTok{(cut, depth) %>%}\StringTok{ } -\StringTok{ }\KeywordTok{summarise}\NormalTok{(}\DataTypeTok{n =} \KeywordTok{n}\NormalTok{()) %>%}\StringTok{ } -\StringTok{ }\KeywordTok{filter}\NormalTok{(depth >}\StringTok{ }\DecValTok{55}\NormalTok{, depth <}\StringTok{ }\DecValTok{70}\NormalTok{) %>%}\StringTok{ } -\StringTok{ }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{prop =} \NormalTok{n /}\StringTok{ }\KeywordTok{sum}\NormalTok{(n))} -\end{Highlighting} -\end{Shaded} - -This makes it easier to understand what's going on as we can read it -almost like an English sentence: first group, then summarise, then -filter, then mutate. In fact, the best way to pronounce -\texttt{\%\textgreater{}\%} when reading a sequence of code is as -``then''. \texttt{\%\textgreater{}\%} comes from the magrittr package, -by Stefan Milton Bache. It provides a number of other tools that dplyr -doesn't expose by default, so I highly recommend that you check out the -\href{https://github.com/smbache/magrittr}{magrittr website}. -\index{magrittr} - -\texttt{\%\textgreater{}\%} works by taking the thing on the left hand -side (LHS) and supplying it as the first argument to the function on the -right hand side (RHS). 
Each of these pairs of calls is equivalent: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{f}\NormalTok{(x, y)} -\CommentTok{# is the same as} -\NormalTok{x %>%}\StringTok{ }\KeywordTok{f}\NormalTok{(y)} - -\KeywordTok{g}\NormalTok{(}\KeywordTok{f}\NormalTok{(x, y), z)} -\CommentTok{# is the same as} -\NormalTok{x %>%}\StringTok{ }\KeywordTok{f}\NormalTok{(y) %>%}\StringTok{ }\KeywordTok{g}\NormalTok{(z)} -\end{Highlighting} -\end{Shaded} - -\subsection{Exercises}\label{exercises-3} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Translate each of the examples in this chapter to use the pipe. -\item - What does the following pipe do? - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{library}\NormalTok{(magrittr)} -\NormalTok{x <-}\StringTok{ }\KeywordTok{runif}\NormalTok{(}\DecValTok{100}\NormalTok{)} -\NormalTok{x %>%} -\StringTok{ }\KeywordTok{subtract}\NormalTok{(}\KeywordTok{mean}\NormalTok{(.)) %>%} -\StringTok{ }\KeywordTok{raise_to_power}\NormalTok{(}\DecValTok{2}\NormalTok{) %>%} -\StringTok{ }\KeywordTok{mean}\NormalTok{() %>%} -\StringTok{ }\KeywordTok{sqrt}\NormalTok{()} -\end{Highlighting} -\end{Shaded} -\item - Which player in the \texttt{Batting} dataset has had the most - consistently good performance over the course of their career? -\end{enumerate} - -\section{Learning more}\label{learning-more} - -dplyr provides a number of other verbs that are less useful for -visualisation, but important to know about in general: - -\begin{itemize} -\item - \texttt{arrange()} orders observations according to variable(s). This - is most useful when you're looking at the data from the console. It - can also be useful for visualisations if you want to control which - points are plotted on top. -\item - \texttt{select()} picks variables based on their names. Useful when - you have many variables and want to focus on just a few for analysis. -\item - \texttt{rename()} allows you to change the name of variables. 
-\item - Grouped mutates and filters are also useful, but more advanced. See - \texttt{vignette("window-functions",\ package\ =\ "dplyr")} for more - details. -\item - There are a number of verbs designed to work with two tables of data - at a time. These include SQL joins (like the base \texttt{merge()} - function) and set operations. Learn more about them in - \texttt{vignette("two-table",\ package\ =\ "dplyr")}. -\item - dplyr can work directly with data stored in a database - you use the - same R code as you do for local data and dplyr generates SQL to send - to the database. See - \texttt{vignette("databases",\ package\ =\ "dplyr")} for the details. -\end{itemize} - -Finally, RStudio provides a handy dplyr cheatsheet that will help jog -your memory when you're wondering which function to use. Get it from -\url{http://rstudio.com/cheatsheets}. - -\section*{References}\label{references} -\addcontentsline{toc}{section}{References} - -\hypertarget{refs}{} -\hypertarget{ref-tornqvist:1985}{} -Törnqvist, Leo, Pentti Vartia, and Yrjö O Vartia. 1985. ``How Should -Relative Changes Be Measured?'' \emph{The American Statistician} 39 (1): -43--46. diff --git a/book/tex/ggplot.tex b/book/tex/ggplot.tex deleted file mode 100644 index 063a6a8d..00000000 --- a/book/tex/ggplot.tex +++ /dev/null @@ -1,1012 +0,0 @@ -\chapter{Getting started with ggplot2}\label{cha:getting-started} - -\section{Introduction}\label{introduction} - -The goal of this chapter is to teach you how to produce useful graphics -with ggplot2 as quickly as possible. You'll learn the basics of -\texttt{ggplot()} along with some useful ``recipes'' to make the most -important plots. \texttt{ggplot()} allows you to make complex plots with -just a few lines of code because it's based on a rich underlying theory, -the grammar of graphics. Here we'll skip the theory and focus on the -practice, and in later chapters you'll learn how to use the full -expressive power of the grammar. 
- -In this chapter you'll learn: - -\begin{itemize} -\item - About the \texttt{mpg} dataset included with ggplot2, - \protect\hyperlink{sec:fuel-economy-data}{mpg}. -\item - The three key components of every plot: data, aesthetics and geoms, - \protect\hyperlink{sec:basic-use}{key components}. -\item - How to add additional variables to a plot with aesthetics, - \protect\hyperlink{aesthetics}{aesthetics}. -\item - How to display additional categorical variables in a plot using small - multiples created by facetting, - \protect\hyperlink{sec:qplot-facetting}{facetting}. -\item - A variety of different geoms that you can use to create different - types of plots, \protect\hyperlink{sec:plot-geoms}{geoms}. -\item - How to modify the axes, \protect\hyperlink{sec:axes}{axes}. -\item - Things you can do with a plot object other than display it, like save - it to disk, \protect\hyperlink{sec:output}{output}. -\item - \texttt{qplot()}, a handy shortcut for when you just want to quickly - bang out a simple plot without thinking about the grammar at all, - \protect\hyperlink{qplot}{qplot}. -\end{itemize} - -\hypertarget{sec:fuel-economy-data}{\section{Fuel economy -data}\label{sec:fuel-economy-data}} - -In this chapter, we'll mostly use one data set that's bundled with -ggplot2: \texttt{mpg}. It includes information about the fuel economy of -popular car models in 1999 and 2008, collected by the US Environmental -Protection Agency, \url{http://fueleconomy.gov}. 
You can access the data -by loading ggplot2: \index{Data!mpg@\texttt{mpg}} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{library}\NormalTok{(ggplot2)} -\NormalTok{mpg} -\CommentTok{#> Source: local data frame [234 x 11]} -\CommentTok{#> } -\CommentTok{#> manufacturer model displ year cyl trans drv cty} -\CommentTok{#> (chr) (chr) (dbl) (int) (int) (chr) (chr) (int)} -\CommentTok{#> 1 audi a4 1.8 1999 4 auto(l5) f 18} -\CommentTok{#> 2 audi a4 1.8 1999 4 manual(m5) f 21} -\CommentTok{#> 3 audi a4 2.0 2008 4 manual(m6) f 20} -\CommentTok{#> 4 audi a4 2.0 2008 4 auto(av) f 21} -\CommentTok{#> 5 audi a4 2.8 1999 6 auto(l5) f 16} -\CommentTok{#> 6 audi a4 2.8 1999 6 manual(m5) f 18} -\CommentTok{#> .. ... ... ... ... ... ... ... ...} -\CommentTok{#> Variables not shown: hwy (int), fl (chr), class (chr)} -\end{Highlighting} -\end{Shaded} - -The variables are mostly self-explanatory: - -\begin{itemize} -\item - \texttt{cty} and \texttt{hwy} record miles per gallon (mpg) for city - and highway driving. -\item - \texttt{displ} is the engine displacement in litres. -\item - \texttt{drv} is the drivetrain: front wheel (f), rear wheel (r) or - four wheel (4). -\item - \texttt{model} is the model of car. There are 38 models, selected - because they had a new edition every year between 1999 and 2008. -\item - \texttt{class} (not shown), is a categorical variable describing the - ``type'' of car: two seater, SUV, compact, etc. -\end{itemize} - -This dataset suggests many interesting questions. How are engine size -and fuel economy related? Do certain manufacturers care more about fuel -economy than others? Has fuel economy improved in the last ten years? We -will try to answer some of these questions, and in the process learn how -to create some basic plots with ggplot2. - -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - List five functions that you could use to get more information about - the \texttt{mpg} dataset. 
-\item - How can you find out what other datasets are included with ggplot2? -\item - Apart from the US, most countries use fuel consumption (fuel consumed - over a fixed distance) rather than fuel economy (distance travelled with - a fixed amount of fuel). How could you convert \texttt{cty} and - \texttt{hwy} into the European standard of l/100km? -\item - Which manufacturer has the most models in this dataset? Which - model has the most variations? Does your answer change if you remove - the redundant specification of drive train (e.g. ``pathfinder 4wd'', - ``a4 quattro'') from the model name? -\end{enumerate} - -\hypertarget{sec:basic-use}{\section{Key -components}\label{sec:basic-use}} - -Every ggplot2 plot has three key components: - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - \textbf{data}, -\item - A set of \textbf{aesthetic mappings} between variables in the data and - visual properties, and -\item - At least one layer which describes how to render each observation. - Layers are usually created with a \textbf{geom} function. -\end{enumerate} - -Here's a simple example: \index{Scatterplot} \indexf{ggplot} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =} \NormalTok{displ, }\DataTypeTok{y =} \NormalTok{hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/ggplot/qscatter-1} -\end{figure} - -This produces a scatterplot defined by: - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\tightlist -\item - Data: \texttt{mpg}. -\item - Aesthetic mapping: engine size mapped to x position, fuel economy to y - position. -\item - Layer: points. -\end{enumerate} - -Pay attention to the structure of this function call: data and aesthetic -mappings are supplied in \texttt{ggplot()}, then layers are added on -with \texttt{+}. 
This is an important pattern, and as you learn more -about ggplot2 you'll construct increasingly sophisticated plots by -adding on more types of components. - -Almost every plot maps a variable to \texttt{x} and \texttt{y}, so -naming these aesthetics is tedious. To save typing, the first two unnamed arguments -to \texttt{aes()} will be mapped to \texttt{x} and \texttt{y}. This -means that the following code is identical to the example above: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -I'll stick to that style throughout the book, so don't forget that the -first two arguments to \texttt{aes()} are \texttt{x} and \texttt{y}. -Note that I've put each command on a new line. I recommend doing this in -your own code, so it's easy to scan a plot specification and see exactly -what's there. In this chapter, I'll sometimes use just one line per -plot, because it makes it easier to see the differences between plot -variations. - -The plot shows a strong correlation: as the engine size gets bigger, the -fuel economy gets worse. There are also some interesting outliers: some -cars with large engines get higher fuel economy than average. What sort -of cars do you think they are? - -\subsection{Exercises}\label{exercises-1} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - How would you describe the relationship between \texttt{cty} and - \texttt{hwy}? Do you have any concerns about drawing conclusions from - that plot? -\item - What does - \texttt{ggplot(mpg,\ aes(model,\ manufacturer))\ +\ geom\_point()} - show? Is it useful? How could you modify the data to make it more - informative? -\item - Describe the data, aesthetic mappings and layers used for each of the - following plots. You'll need to guess a little because you haven't - seen all the datasets and functions yet, but use your common sense! 
- See if you can predict what the plot will look like before running the - code. - - \begin{enumerate} - \def\labelenumii{\arabic{enumii}.} - \tightlist - \item - \texttt{ggplot(mpg,\ aes(cty,\ hwy))\ +\ geom\_point()} - \item - \texttt{ggplot(diamonds,\ aes(carat,\ price))\ +\ geom\_point()} - \item - \texttt{ggplot(economics,\ aes(date,\ unemploy))\ +\ geom\_line()} - \item - \texttt{ggplot(mpg,\ aes(cty))\ +\ geom\_histogram()} - \end{enumerate} -\end{enumerate} - -\hypertarget{aesthetics}{\section{Colour, size, shape and other -aesthetic attributes}\label{aesthetics}} - -To add additional variables to a plot, we can use other aesthetics like -colour, shape, and size (NB: while I use British spelling throughout -this book, ggplot2 also accepts American spellings). These work in the -same way as the \texttt{x} and \texttt{y} aesthetics, and are added into -the call to \texttt{aes()}: \index{Aesthetics} \indexf{aes} - -\begin{itemize} -\tightlist -\item - \texttt{aes(displ,\ hwy,\ colour\ =\ class)} -\item - \texttt{aes(displ,\ hwy,\ shape\ =\ drv)} -\item - \texttt{aes(displ,\ hwy,\ size\ =\ cyl)} -\end{itemize} - -ggplot2 takes care of the details of converting data (e.g., `f', `r', -`4') into aesthetics (e.g., `red', `yellow', `green') with a -\textbf{scale}. There is one scale for each aesthetic mapping in a plot. -The scale is also responsible for creating a guide, an axis or legend, -that allows you to read the plot, converting aesthetic values back into -data values. For now, we'll stick with the default scales provided by -ggplot2. You'll learn how to override them in -\protect\hyperlink{cha:scales}{the scales chapter}. 
- -To learn more about those outlying points in the previous -scatterplot, we could map the class variable to colour: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, cty, }\DataTypeTok{colour =} \NormalTok{class)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/ggplot/qplot-aesthetics-1} -\end{figure} - -This gives each point a colour corresponding to its class. The -legend allows us to read data values from the colour, showing us that -the group of cars with unusually high fuel economy for their engine size -are two seaters: cars with big engines, but lightweight bodies. - -If you want to set an aesthetic to a fixed value, without scaling it, do -so in the individual layer outside of \texttt{aes()}. Compare the -following two plots: \index{Aesthetics!setting} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"blue"}\NormalTok{))} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"blue"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-4-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-4-2} -\end{figure} - -In the first plot, the value ``blue'' is scaled to a pinkish colour, and -a legend is added. In the second plot, the points are given the R colour -blue. This is an important technique and you'll learn more about it in -\protect\hyperlink{sub:setting-mapping}{setting vs.~mapping}. 
See -\texttt{vignette("ggplot2-specs")} for the values needed for colour and -other aesthetics. - -Different types of aesthetic attributes work better with different types -of variables. For example, colour and shape work well with categorical -variables, while size works well for continuous variables. The amount of -data also makes a difference: if there is a lot of data it can be hard -to distinguish different groups. An alternative solution is to use -facetting, as described next. - -When using aesthetics in a plot, less is usually more. It's difficult to -see the simultaneous relationships among colour and shape and size, so -exercise restraint when using aesthetics. Instead of trying to make one -very complex plot that shows everything at once, see if you can create a -series of simple plots that tell a story, leading the reader from -ignorance to knowledge. - -\subsection{Exercises}\label{exercises-2} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Experiment with the colour, shape and size aesthetics. What happens - when you map them to continuous values? What about categorical values? - What happens when you use more than one aesthetic in a plot? -\item - What happens if you map a continuous variable to shape? Why? What - happens if you map \texttt{trans} to shape? Why? -\item - How is drive train related to fuel economy? How is drive train related - to engine size and class? -\end{enumerate} - -\hypertarget{sec:qplot-facetting}{\section{Facetting}\label{sec:qplot-facetting}} - -Another technique for displaying additional categorical variables on a -plot is facetting. Facetting creates tables of graphics by splitting the -data into subsets and displaying the same graph for each subset. You'll -learn more about facetting in -\protect\hyperlink{sec:facetting}{Facetting}, but it's such a useful -technique that you need to know it right away. \index{Facetting} - -There are two types of facetting: grid and wrapped. 
Wrapped is the most -useful, so we'll discuss it here, and you can learn about grid facetting -later. To facet a plot you simply add a facetting specification with -\texttt{facet\_wrap()}, which takes the name of a variable preceded by -\texttt{\textasciitilde{}}. \indexf{facet\_wrap} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~class)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/ggplot/facet-1} -\end{figure} - -You might wonder when to use facetting and when to use aesthetics. -You'll learn more about the relative advantages and disadvantages of -each in \protect\hyperlink{sub:group-vs-facet}{grouping vs.~facetting}. - -\subsection{Exercises}\label{exercises-3} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - What happens if you try to facet by a continuous variable like - \texttt{hwy}? What about \texttt{cyl}? What's the key difference? -\item - Use facetting to explore the 3-way relationship between fuel economy, - engine size, and number of cylinders. How does facetting by number of - cylinders change your assessment of the relationship between engine - size and fuel economy? -\item - Read the documentation for \texttt{facet\_wrap()}. What arguments can - you use to control how many rows and columns appear in the output? -\item - What does the \texttt{scales} argument to \texttt{facet\_wrap()} do? - When might you use it? -\end{enumerate} - -\hypertarget{sec:plot-geoms}{\section{Plot geoms}\label{sec:plot-geoms}} - -You might guess that by replacing \texttt{geom\_point()} with a -different geom function, you'd get a different type of plot. That's a -great guess! In the following sections, you'll learn about some of the -other important geoms provided in ggplot2. 
This isn't an exhaustive -list, but should cover the most commonly used plot types. You'll learn -more in \protect\hyperlink{cha:toolbox}{the toolbox}. - -\begin{itemize} -\item - \texttt{geom\_smooth()} fits a smoother to the data and displays the - smooth and its standard error. -\item - \texttt{geom\_boxplot()} produces a box-and-whisker plot to summarise - the distribution of a set of points. -\item - \texttt{geom\_histogram()} and \texttt{geom\_freqpoly()} show the - distribution of continuous variables. -\item - \texttt{geom\_bar()} shows the distribution of categorical variables. -\item - \texttt{geom\_path()} and \texttt{geom\_line()} draw lines between the - data points. A line plot is constrained to produce lines that travel - from left to right, while paths can go in any direction. Lines are - typically used to explore how things change over time. -\end{itemize} - -\subsection{Adding a smoother to a plot}\label{sub:smooth} - -If you have a scatterplot with a lot of noise, it can be hard to see the -dominant pattern. In this case it's useful to add a smoothed line to the -plot with \texttt{geom\_smooth()}: \index{Smoothing} -\indexf{geom\_smooth} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/ggplot/qplot-smooth-1} -\end{figure} - -This overlays the scatterplot with a smooth curve, including an -assessment of uncertainty in the form of point-wise confidence intervals -shown in grey. If you're not interested in the confidence interval, turn -it off with \texttt{geom\_smooth(se\ =\ FALSE)}. 
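Here's a minimal sketch of the two variants side by side, assuming ggplot2 is installed. The object names \texttt{scatter}, \texttt{with\_ci} and \texttt{without\_ci} are mine, chosen only for illustration.

```r
library(ggplot2)

# Build the scatterplot once so both variants share the same data and mappings
scatter <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Default: smooth curve plus the grey point-wise confidence band
with_ci <- scatter + geom_smooth()

# Same smooth, with the confidence band turned off
without_ci <- scatter + geom_smooth(se = FALSE)

with_ci
without_ci
```

Saving the common part of the plot in a variable makes it easy to compare variations that differ in only one layer.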
- -An important argument to \texttt{geom\_smooth()} is the \texttt{method}, -which allows you to choose which type of model is used to fit the smooth -curve: - -\begin{itemize} -\item - \texttt{method\ =\ "loess"}, the default for small n, uses a smooth - local regression (as described in \texttt{?loess}). The wiggliness of - the line is controlled by the \texttt{span} parameter, which ranges - from 0 (exceedingly wiggly) to 1 (not so wiggly). - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{span =} \FloatTok{0.2}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{span =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/smooth-loess-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/smooth-loess-2} - \end{figure} - - Loess does not work well for large datasets (it's \(O(n^2)\) in - memory), so an alternative smoothing algorithm is used when \(n\) is - greater than 1,000. -\item - \texttt{method\ =\ "gam"} fits a generalised additive model provided - by the \textbf{mgcv} package. You need to first load mgcv, then use a - formula like \texttt{formula\ =\ y\ \textasciitilde{}\ s(x)} or - \texttt{y\ \textasciitilde{}\ s(x,\ bs\ =\ "cs")} (for large data). - This is what ggplot2 uses when there are more than 1,000 points. 
- \index{mgcv} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{library}\NormalTok{(mgcv)} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \StringTok{"gam"}\NormalTok{, }\DataTypeTok{formula =} \NormalTok{y ~}\StringTok{ }\KeywordTok{s}\NormalTok{(x))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/smooth-gam-1} - \end{figure} -\item - \texttt{method\ =\ "lm"} fits a linear model, giving the line of best - fit. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/smooth-lm-1} - \end{figure} -\item - \texttt{method\ =\ "rlm"} works like \texttt{lm()}, but uses a robust - fitting algorithm so that outliers don't affect the fit as much. It's - part of the \textbf{MASS} package, so remember to load that first. - \index{MASS} -\end{itemize} - -\subsection{Boxplots and jittered points}\label{sub:boxplot} - -When a set of data includes a categorical variable and one or more -continuous variables, you will probably be interested to know how the -values of the continuous variables vary with the levels of the -categorical variable. Say we're interested in seeing how fuel economy -varies with the type of drive train. 
We might start with a scatterplot like this: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-5-1} -\end{figure} - -Because there are few unique values of both \texttt{drv} and \texttt{hwy}, there is a -lot of overplotting. Many points are plotted in the same location, and -it's difficult to see the distribution. There are three useful -techniques that help alleviate the problem: - -\begin{itemize} -\item - Jittering, \texttt{geom\_jitter()}, adds a little random noise to the - data which can help avoid overplotting. \index{Jittering} - \indexf{geom\_jitter} -\item - Boxplots, \texttt{geom\_boxplot()}, summarise the shape of the - distribution with a handful of summary statistics. \index{Boxplot} - \indexf{geom\_boxplot} -\item - Violin plots, \texttt{geom\_violin()}, show a compact representation - of the ``density'' of the distribution, highlighting the areas where - more points are found. 
\index{Violin plot} \indexf{geom\_violin} -\end{itemize} - -These are illustrated below: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, hwy)) +}\StringTok{ }\KeywordTok{geom_jitter}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, hwy)) +}\StringTok{ }\KeywordTok{geom_boxplot}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, hwy)) +}\StringTok{ }\KeywordTok{geom_violin}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/ggplot/jitter-boxplot-1}% - \includegraphics[width=0.333\linewidth]{_figures/ggplot/jitter-boxplot-2}% - \includegraphics[width=0.333\linewidth]{_figures/ggplot/jitter-boxplot-3} -\end{figure} - -Each method has its strengths and weaknesses. Boxplots summarise the -bulk of the distribution with only five numbers, while jittered plots -show every point but only work with relatively small datasets. Violin -plots give the richest display, but rely on the calculation of a density -estimate, which can be hard to interpret. - -For jittered points, \texttt{geom\_jitter()} offers the same control -over aesthetics as \texttt{geom\_point()}: \texttt{size}, -\texttt{colour}, and \texttt{shape}. For \texttt{geom\_boxplot()} and -\texttt{geom\_violin()}, you can control the outline \texttt{colour} or -the internal \texttt{fill} colour. - -\subsection{Histograms and frequency polygons}\label{sub:distribution} - -Histograms and frequency polygons show the distribution of a single -numeric variable. They provide more information about the distribution -of a single group than boxplots do, at the expense of needing more -space. 
\index{Histogram} \indexf{geom\_histogram} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(hwy)) +}\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{()} -\CommentTok{#> `stat_bin()` using `bins = 30`. Pick better value with} -\CommentTok{#> `binwidth`.} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(hwy)) +}\StringTok{ }\KeywordTok{geom_freqpoly}\NormalTok{()} -\CommentTok{#> `stat_bin()` using `bins = 30`. Pick better value with} -\CommentTok{#> `binwidth`.} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/dist-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/dist-2} -\end{figure} - -Both histograms and frequency polygons work in the same way: they bin -the data, then count the number of observations in each bin. The only -difference is the display: histograms use bars and frequency polygons -use lines. - -You can control the width of the bins with the \texttt{binwidth} -argument (if you don't want evenly spaced bins you can use the -\texttt{breaks} argument). It is \textbf{very important} to experiment -with the bin width. The default just splits your data into 30 bins, -which is unlikely to be the best choice. You should always try many bin -widths, and you may find you need multiple bin widths to tell the full -story of your data. 
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_freqpoly}\NormalTok{(}\DataTypeTok{binwidth =} \FloatTok{2.5}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_freqpoly}\NormalTok{(}\DataTypeTok{binwidth =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-6-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-6-2} -\end{figure} - -An alternative to the frequency polygon is the density plot, -\texttt{geom\_density()}. I'm not a fan of density plots because they -are harder to interpret since the underlying computations are more -complex. They also make assumptions that are not true for all data, -namely that the underlying distribution is continuous, unbounded, and -smooth. - -To compare the distributions of different subgroups, you can map a -categorical variable to either fill (for \texttt{geom\_histogram()}) or -colour (for \texttt{geom\_freqpoly()}). It's easier to compare -distributions using the frequency polygon because the underlying -perceptual task is easier. You can also use facetting: this makes -comparisons a little harder, but it's easier to see the distribution of -each group. 
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, }\DataTypeTok{colour =} \NormalTok{drv)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_freqpoly}\NormalTok{(}\DataTypeTok{binwidth =} \FloatTok{0.5}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, }\DataTypeTok{fill =} \NormalTok{drv)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\DataTypeTok{binwidth =} \FloatTok{0.5}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~drv, }\DataTypeTok{ncol =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/dist-fill-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/dist-fill-2} -\end{figure} - -\subsection{Bar charts}\label{sub:bar} - -The discrete analogue of the histogram is the bar chart, -\texttt{geom\_bar()}. It's easy to use: \index{Barchart} -\indexf{geom\_bar} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(manufacturer)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/ggplot/dist-bar-1} -\end{figure} - -(You'll learn how to fix the labels in -\protect\hyperlink{sub:theme-axis}{axis labels}). - -Bar charts can be confusing because there are two rather different plots -that are both commonly called bar charts. The above form expects you to -have unsummarised data, and each observation contributes one unit to the -height of each bar. The other form of bar chart is used for -presummarised data. 
For example, you might have three drugs with their -average effect: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{drugs <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(} - \DataTypeTok{drug =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"c"}\NormalTok{),} - \DataTypeTok{effect =} \KeywordTok{c}\NormalTok{(}\FloatTok{4.2}\NormalTok{, }\FloatTok{9.7}\NormalTok{, }\FloatTok{6.1}\NormalTok{)} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -To display this sort of data, you need to tell \texttt{geom\_bar()} to -not run the default stat which bins and counts the data. However, I -think it's even better to use \texttt{geom\_point()} because points take -up less space than bars, and don't require that the y axis includes 0. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(drugs, }\KeywordTok{aes}\NormalTok{(drug, effect)) +}\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{stat =} \StringTok{"identity"}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(drugs, }\KeywordTok{aes}\NormalTok{(drug, effect)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-8-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-8-2} -\end{figure} - -\subsection{Time series with line and path plots}\label{sub:line} - -Line and path plots are typically used for time series data. Line plots -join the points from left to right, while path plots join them in the -order that they appear in the dataset (in other words, a line plot is a -path plot of the data sorted by x value). Line plots usually have time -on the x-axis, showing how a single variable has changed over time. Path -plots show how two variables have simultaneously changed over time, with -time encoded in the way that observations are connected. 
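-
-As a minimal sketch of the relationship between the two geoms, assume a
-small made-up data frame \texttt{df} (it is not one of the book's
-datasets) whose rows are deliberately not sorted by \texttt{x}. The
-first and third plots should then look the same, while the second joins
-the points in row order:
-
-\begin{Shaded}
-\begin{Highlighting}[]
-\KeywordTok{library}\NormalTok{(ggplot2)}
-
-\CommentTok{# Hypothetical toy data, deliberately not sorted by x}
-\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \KeywordTok{c}\NormalTok{(}\DecValTok{3}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{2}\NormalTok{), }\DataTypeTok{y =} \KeywordTok{c}\NormalTok{(}\DecValTok{9}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{4}\NormalTok{))}
-
-\CommentTok{# Line plot: points are joined from left to right}
-\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_line}\NormalTok{()}
-\CommentTok{# Path plot: points are joined in row order}
-\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_path}\NormalTok{()}
-\CommentTok{# Path plot of the rows sorted by x: matches the line plot}
-\KeywordTok{ggplot}\NormalTok{(df[}\KeywordTok{order}\NormalTok{(df$x), ], }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_path}\NormalTok{()}
-\end{Highlighting}
-\end{Shaded}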
- -Because the year variable in the \texttt{mpg} dataset only has two -values, we'll show some time series plots using the \texttt{economics} -dataset, which contains economic data on the US measured over the last -40 years. The figure below shows two plots of unemployment over time, -both produced using \texttt{geom\_line()}. The first shows the -unemployment rate while the second shows the median number of weeks -unemployed. We can already see some differences in these two variables, -particularly in the last peak, where the unemployment percentage is -lower than it was in the preceding peaks, but the length of unemployment -is high. \indexf{geom\_line} \indexf{geom\_path} -\index{Data!economics@\texttt{economics}} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(economics, }\KeywordTok{aes}\NormalTok{(date, unemploy /}\StringTok{ }\NormalTok{pop)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(economics, }\KeywordTok{aes}\NormalTok{(date, uempmed)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/line-employment-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/line-employment-2} -\end{figure} - -To examine this relationship in greater detail, we would like to draw -both time series on the same plot. We could draw a scatterplot of -unemployment rate vs.~length of unemployment, but then we could no -longer see the evolution over time. The solution is to join points -adjacent in time with line segments, forming a \emph{path} plot. - -Below we plot unemployment rate vs.~length of unemployment and join the -individual observations with a path. Because of the many line crossings, -the direction in which time flows isn't easy to see in the first plot. -In the second plot, we colour the points to make it easier to see the -direction of time. 
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(economics, }\KeywordTok{aes}\NormalTok{(unemploy /}\StringTok{ }\NormalTok{pop, uempmed)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_path}\NormalTok{() +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} - -\NormalTok{year <-}\StringTok{ }\NormalTok{function(x) }\KeywordTok{as.POSIXlt}\NormalTok{(x)$year +}\StringTok{ }\DecValTok{1900} -\KeywordTok{ggplot}\NormalTok{(economics, }\KeywordTok{aes}\NormalTok{(unemploy /}\StringTok{ }\NormalTok{pop, uempmed)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_path}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \KeywordTok{year}\NormalTok{(date)))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/path-employ-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/path-employ-2} -\end{figure} - -We can see that unemployment rate and length of unemployment are highly -correlated, but in recent years the length of unemployment has been -increasing relative to the unemployment rate. - -With longitudinal data, you often want to display multiple time series -on each plot, each series representing one individual. To do this you -need to map the \texttt{group} aesthetic to a variable encoding the -group membership of each observation. This is explained in more depth in -\protect\hyperlink{sec:grouping}{grouping}. -\index{Longitudinal data|see{Data, longitudinal}} -\index{Data!longitudinal} - -\subsection{Exercises}\label{exercises-4} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - What's the problem with the plot created by - \texttt{ggplot(mpg,\ aes(cty,\ hwy))\ +\ geom\_point()}? Which of the - geoms described above is most effective at remedying the problem? 
-\item - One challenge with - \texttt{ggplot(mpg,\ aes(class,\ hwy))\ +\ geom\_boxplot()} is that - the ordering of \texttt{class} is alphabetical, which is not terribly - useful. How could you change the factor levels to be more informative? - - Rather than reordering the factor by hand, you can do it automatically - based on the data: - \texttt{ggplot(mpg,\ aes(reorder(class,\ hwy),\ hwy))\ +\ geom\_boxplot()}. - What does \texttt{reorder()} do? Read the documentation. -\item - Explore the distribution of the carat variable in the - \texttt{diamonds} dataset. What binwidth reveals the most interesting - patterns? -\item - Explore the distribution of the price variable in the - \texttt{diamonds} data. How does the distribution vary by cut? -\item - You now know (at least) three ways to compare the distributions of - subgroups: \texttt{geom\_violin()}, \texttt{geom\_freqpoly()} and the - colour aesthetic, or \texttt{geom\_histogram()} and facetting. What - are the strengths and weaknesses of each approach? What other - approaches could you try? -\item - Read the documentation for \texttt{geom\_bar()}. What does the - \texttt{weight} aesthetic do? -\item - Using the techniques already discussed in this chapter, come up with - three ways to visualise a 2d categorical distribution. Try them out by - visualising the distribution of \texttt{model} and - \texttt{manufacturer}, \texttt{trans} and \texttt{class}, and - \texttt{cyl} and \texttt{trans}. -\end{enumerate} - -\hypertarget{sec:axes}{\section{Modifying the axes}\label{sec:axes}} - -You'll learn the full range of options available in -\protect\hyperlink{cha:scales}{scales}, but two families of useful -helpers let you make the most common modifications. 
\texttt{xlab()} and -\texttt{ylab()} modify the x- and y-axis labels: \indexf{xlab} -\indexf{ylab} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(cty, hwy)) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{alpha =} \DecValTok{1} \NormalTok{/}\StringTok{ }\DecValTok{3}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(cty, hwy)) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{alpha =} \DecValTok{1} \NormalTok{/}\StringTok{ }\DecValTok{3}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\StringTok{"city driving (mpg)"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\StringTok{"highway driving (mpg)"}\NormalTok{)} - -\CommentTok{# Remove the axis labels with NULL} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(cty, hwy)) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{alpha =} \DecValTok{1} \NormalTok{/}\StringTok{ }\DecValTok{3}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/ggplot/unnamed-chunk-9-1}% - \includegraphics[width=0.333\linewidth]{_figures/ggplot/unnamed-chunk-9-2}% - \includegraphics[width=0.333\linewidth]{_figures/ggplot/unnamed-chunk-9-3} -\end{figure} - -\texttt{xlim()} and \texttt{ylim()} modify the limits of axes: -\indexf{xlim} \indexf{ylim} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, hwy)) +} -\StringTok{ }\KeywordTok{geom_jitter}\NormalTok{(}\DataTypeTok{width =} \FloatTok{0.25}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, hwy)) +} -\StringTok{ 
}\KeywordTok{geom_jitter}\NormalTok{(}\DataTypeTok{width =} \FloatTok{0.25}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\StringTok{"f"}\NormalTok{, }\StringTok{"r"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylim}\NormalTok{(}\DecValTok{20}\NormalTok{, }\DecValTok{30}\NormalTok{)} -\CommentTok{#> Warning: Removed 138 rows containing missing values (geom_point).} - -\CommentTok{# For continuous scales, use NA to set only one limit} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, hwy)) +} -\StringTok{ }\KeywordTok{geom_jitter}\NormalTok{(}\DataTypeTok{width =} \FloatTok{0.25}\NormalTok{, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylim}\NormalTok{(}\OtherTok{NA}\NormalTok{, }\DecValTok{30}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/ggplot/unnamed-chunk-10-1}% - \includegraphics[width=0.333\linewidth]{_figures/ggplot/unnamed-chunk-10-2}% - \includegraphics[width=0.333\linewidth]{_figures/ggplot/unnamed-chunk-10-3} -\end{figure} - -Changing the axes limits sets values outside the range to \texttt{NA}. -You can suppress the associated warning with \texttt{na.rm\ =\ TRUE}. - -\hypertarget{sec:output}{\section{Output}\label{sec:output}} - -Most of the time you create a plot object and immediately plot it, but -you can also save a plot to a variable and manipulate it: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \KeywordTok{factor}\NormalTok{(cyl))) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -Once you have a plot object, there are a few things you can do with it: - -\begin{itemize} -\item - Render it on screen with \texttt{print()}. 
This happens automatically - when running interactively, but inside a loop or function, you'll need - to \texttt{print()} it yourself. \indexf{print} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{print}\NormalTok{(p)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/ggplot/unnamed-chunk-11-1} - \end{figure} -\item - Save it to disk with \texttt{ggsave()}, described in - \protect\hyperlink{sec:saving}{saving your output}. - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Save png to disk} -\KeywordTok{ggsave}\NormalTok{(}\StringTok{"plot.png"}\NormalTok{, }\DataTypeTok{width =} \DecValTok{5}\NormalTok{, }\DataTypeTok{height =} \DecValTok{5}\NormalTok{)} -\end{Highlighting} -\end{Shaded} -\item - Briefly describe its structure with \texttt{summary()}. - \indexf{summary} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{summary}\NormalTok{(p)} -\CommentTok{#> data: manufacturer, model, displ, year, cyl, trans, drv,} -\CommentTok{#> cty, hwy, fl, class [234x11]} -\CommentTok{#> mapping: x = displ, y = hwy, colour = factor(cyl)} -\CommentTok{#> faceting: facet_null() } -\CommentTok{#> -----------------------------------} -\CommentTok{#> geom_point: na.rm = FALSE} -\CommentTok{#> stat_identity: na.rm = FALSE} -\CommentTok{#> position_identity} -\end{Highlighting} -\end{Shaded} -\item - Save a cached copy of it to disk, with \texttt{saveRDS()}. This saves - a complete copy of the plot object, so you can easily re-create it - with \texttt{readRDS()}. \indexf{saveRDS} \indexf{readRDS} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{saveRDS}\NormalTok{(p, }\StringTok{"plot.rds"}\NormalTok{)} -\NormalTok{q <-}\StringTok{ }\KeywordTok{readRDS}\NormalTok{(}\StringTok{"plot.rds"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} -\end{itemize} - -You'll learn more about how to manipulate these objects in -\protect\hyperlink{cha:programming}{programming with ggplot2}. 
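-
-The point about \texttt{print()} inside a loop can be sketched as
-follows. This is only an illustration, assuming you want one
-scatterplot per value of \texttt{drv}; the loop variable \texttt{d} is
-made up for the example. Without the explicit \texttt{print()} call the
-loop would run but display nothing:
-
-\begin{Shaded}
-\begin{Highlighting}[]
-\KeywordTok{library}\NormalTok{(ggplot2)}
-
-\CommentTok{# Draw one plot per drive train; print() is needed inside the loop}
-\NormalTok{for (d in }\KeywordTok{unique}\NormalTok{(mpg$drv))}
-\StringTok{  }\KeywordTok{print}\NormalTok{(}\KeywordTok{ggplot}\NormalTok{(mpg[mpg$drv ==}\StringTok{ }\NormalTok{d, ], }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{())}
-\end{Highlighting}
-\end{Shaded}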
- -\hypertarget{qplot}{\section{Quick plots}\label{qplot}} - -In some cases, you will want to create a quick plot with a minimum of -typing. In these cases you may prefer to use \texttt{qplot()} over -\texttt{ggplot()}. \texttt{qplot()} lets you define a plot in a single -call, picking a geom by default if you don't supply one. To use it, -provide a set of aesthetics and a data set: \indexf{qplot} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{qplot}\NormalTok{(displ, hwy, }\DataTypeTok{data =} \NormalTok{mpg)} -\KeywordTok{qplot}\NormalTok{(displ, }\DataTypeTok{data =} \NormalTok{mpg)} -\CommentTok{#> `stat_bin()` using `bins = 30`. Pick better value with} -\CommentTok{#> `binwidth`.} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-15-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-15-2} -\end{figure} - -Unless otherwise specified, \texttt{qplot()} tries to pick a sensible -geometry and statistic based on the arguments provided. For example, if -you give \texttt{qplot()} \texttt{x} and \texttt{y} variables, it'll -create a scatterplot. If you just give it an \texttt{x}, it'll create a -histogram or bar chart depending on the type of variable. - -\texttt{qplot()} assumes that all variables should be scaled by default. 
-If you want to set an aesthetic to a constant, you need to use -\texttt{I()}: \indexf{I} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{qplot}\NormalTok{(displ, hwy, }\DataTypeTok{data =} \NormalTok{mpg, }\DataTypeTok{colour =} \StringTok{"blue"}\NormalTok{)} -\KeywordTok{qplot}\NormalTok{(displ, hwy, }\DataTypeTok{data =} \NormalTok{mpg, }\DataTypeTok{colour =} \KeywordTok{I}\NormalTok{(}\StringTok{"blue"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-16-1}% - \includegraphics[width=0.5\linewidth]{_figures/ggplot/unnamed-chunk-16-2} -\end{figure} - -If you're used to \texttt{plot()} you may find \texttt{qplot()} to be a -useful crutch to get up and running quickly. However, while it's -possible to use \texttt{qplot()} to access all of the customizability of -ggplot2, I don't recommend it. If you find yourself making a more -complex graph, e.g.~using different aesthetics in different layers or -manually setting visual properties, use \texttt{ggplot()}, not -\texttt{qplot()}. 
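-
-As a rough guide to making that switch, the two quick plots above can
-be written with \texttt{ggplot()} along the following lines. Treat this
-as a sketch rather than an exact translation: it assumes that
-\texttt{qplot()} picks \texttt{geom\_point()} when given \texttt{x} and
-\texttt{y}, and \texttt{geom\_histogram()} when given only a continuous
-\texttt{x}, as described earlier in this section.
-
-\begin{Shaded}
-\begin{Highlighting}[]
-\KeywordTok{library}\NormalTok{(ggplot2)}
-
-\CommentTok{# Roughly qplot(displ, hwy, data = mpg, colour = I("blue"))}
-\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ }
-\StringTok{  }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"blue"}\NormalTok{)}
-
-\CommentTok{# Roughly qplot(displ, data = mpg)}
-\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ)) +}\StringTok{ }
-\StringTok{  }\KeywordTok{geom_histogram}\NormalTok{()}
-\end{Highlighting}
-\end{Shaded}
-
-Setting \texttt{colour} outside \texttt{aes()} plays the same role as
-wrapping the value in \texttt{I()}: the colour is set to a constant
-rather than mapped through a scale.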
diff --git a/book/tex/ggplot2-book.tex b/book/tex/ggplot2-book.tex deleted file mode 100644 index 8a683c57..00000000 --- a/book/tex/ggplot2-book.tex +++ /dev/null @@ -1,110 +0,0 @@ -\documentclass[graybox,envcountchap,sectrefs]{svmono} - -\usepackage[scaled=0.92,varqu]{inconsolata} - -\usepackage{float} -\usepackage{index} -% index functions separately -\newindex{code}{adx}{and}{R code index} -\newcommand{\indexf}[1]{\index[code]{#1@\texttt{#1()}}} -\newcommand{\indexc}[1]{\index[code]{#1@\texttt{#1}}} - -% Taken from pandoc x.md -o test.tex --standalone -\usepackage{color} -\usepackage{fancyvrb} -\newcommand{\VerbBar}{|} -\newcommand{\VERB}{\Verb[commandchars=\\\{\}]} -\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}} -\newenvironment{Shaded}{}{} -\newcommand{\KeywordTok} [1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}} -\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}} -\newcommand{\DecValTok} [1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}} -\newcommand{\BaseNTok} [1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}} -\newcommand{\FloatTok} [1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}} -\newcommand{\CharTok} [1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}} -\newcommand{\StringTok} [1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}} -\newcommand{\CommentTok} [1]{\textcolor[rgb]{0.38,0.63,0.69}{{#1}}} -\newcommand{\OtherTok} [1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}} -\newcommand{\AlertTok} [1]{\textcolor[rgb]{1.00,0.00,0.00}{{#1}}} -\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}} -\newcommand{\ErrorTok} [1]{\textcolor[rgb]{1.00,0.00,0.00}{{#1}}} -\newcommand{\NormalTok} [1]{{#1}} -% -\usepackage{longtable} -\usepackage{booktabs} -\usepackage{graphicx} -\DeclareGraphicsExtensions{.pdf,.png} -\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} - -\usepackage[hyphens]{url} -\usepackage{hyperref} - -% Place links in parens -\renewcommand{\href}[2]{#2 (\url{#1})} -% Use auto ref for internal links 
-\let\oldhyperlink=\hyperlink -\renewcommand{\hyperlink}[2]{\autoref{#1}} -\def\chapterautorefname{Chapter} -\def\sectionautorefname{Section} -\def\subsectionautorefname{Section} -\def\subsubsectionautorefname{Section} - -\setlength{\emergencystretch}{3em} % prevent overfull lines -\vbadness=10000 % suppress underfull \vbox -\hbadness=10000 % suppress underfull \hbox -\hfuzz=10pt - -\makeindex -\title{ggplot2} -\subtitle{Elegant Graphics for Data Analysis} -\author{Hadley Wickham} - -\begin{document} - -\frontmatter -\maketitle - -\begin{dedication} -To my parents, Alison \& Brian Wickham. Without them, and their unconditional -love and support, none of this would have been possible. -\end{dedication} - -\include{preface} - -\tableofcontents - -\mainmatter - -\part{Getting started} - -\include{introduction} -\include{ggplot} -\include{toolbox} - -\part{The Grammar} - -\include{mastery} -\include{layers} -\include{scales} -\include{position} -\include{themes} - -\part{Data analysis} - -\include{tidy-data} -\include{data-manip} -\include{modelling} -\include{programming} - -\backmatter - -\let\hyperlink=\oldhyperlink % Restore old hyperlink behaviour -\cleardoublepage -\markboth{Index}{Index} -\addcontentsline{toc}{chapter}{Index} -\printindex - -\addcontentsline{toc}{chapter}{Code index} -\printindex[code] - -\end{document} diff --git a/book/tex/introduction.tex b/book/tex/introduction.tex deleted file mode 100644 index 771af44a..00000000 --- a/book/tex/introduction.tex +++ /dev/null @@ -1,436 +0,0 @@ -\chapter{Introduction}\label{cha:introduction} - -\section{Welcome to ggplot2}\label{welcome-to-ggplot2} - -ggplot2 is an R package for producing statistical, or data, graphics, -but it is unlike most other graphics packages because it has a deep -underlying grammar. This grammar, based on the Grammar of Graphics -(Wilkinson 2005), is made up of a set of independent components that can -be composed in many different ways. 
This makes ggplot2 very powerful -because you are not limited to a set of pre-specified graphics, but you -can create new graphics that are precisely tailored for your problem. -This may sound overwhelming, but because there is a simple set of core -principles and very few special cases, ggplot2 is also easy to learn -(although it may take a little time to forget your preconceptions from -other graphics tools). - -Practically, ggplot2 provides beautiful, hassle-free plots that take -care of fiddly details like drawing legends. The plots can be built up -iteratively and edited later. A carefully chosen set of defaults means -that most of the time you can produce a publication-quality graphic in -seconds, but if you do have special formatting requirements, a -comprehensive theming system makes it easy to do what you want. Instead -of spending time making your graph look pretty, you can focus on -creating a graph that best reveals the messages in your data. - -ggplot2 is designed to work iteratively. You can start with a layer -showing the raw data then add layers of annotations and statistical -summaries. It allows you to produce graphics using the same structured -thinking that you use to design an analysis, reducing the distance -between a plot in your head and one on the page. It is especially -helpful for students who have not yet developed the structured approach -to analysis used by experts. - -Learning the grammar not only will help you create graphics that you -know about now, but will also help you to think about new graphics that -would be even better. Without the grammar, there is no underlying -theory, so most graphics packages are just a big collection of special -cases. For example, in base R, if you design a new graphic, it's -composed of raw plot elements like points and lines, and it's hard to -design new components that combine with existing plots. 
In ggplot2, the -expressions used to create a new graphic are composed of higher-level -elements like representations of the raw data and statistical -transformations, and can easily be combined with new datasets and other -plots. - -This book provides a hands-on introduction to ggplot2 with lots of -example code and graphics. It also explains the grammar on which ggplot2 -is based. Like other formal systems, ggplot2 is useful even when you -don't understand the underlying model. However, the more you learn about -it, the more effectively you'll be able to use ggplot2. This book -assumes some basic familiarity with R, to the level described in the -first chapter of Dalgaard's \emph{Introductory Statistics with R}. - -This book will introduce you to ggplot2 as a novice, unfamiliar with the -grammar; teach you the basics so that you can re-create plots you are -already familiar with; show you how to use the grammar to create new -types of graphics; and eventually turn you into an expert who can build -new components to extend the grammar. - -\section{What is the grammar of -graphics?}\label{what-is-the-grammar-of-graphics} - -Wilkinson (2005) created the grammar of graphics to describe the deep -features that underlie all statistical graphics. The grammar of graphics -is an answer to a question: what is a statistical graphic? The layered -grammar of graphics (Wickham 2009) builds on Wilkinson's grammar, -focussing on the primacy of layers and adapting it for embedding within -R. In brief, the grammar tells us that a statistical graphic is a -mapping from data to aesthetic attributes (colour, shape, size) of -geometric objects (points, lines, bars). The plot may also contain -statistical transformations of the data and is drawn on a specific -coordinate system. Facetting can be used to generate the same plot for -different subsets of the dataset. It is the combination of these -independent components that makes up a graphic. 
- -As the book progresses, the formal grammar will be explained in -increasing detail. The first description of the components follows -below. It introduces some of the terminology that will be used -throughout the book and outlines the basic responsibilities of each -component. Don't worry if it doesn't all make sense right away: you will -have many more opportunities to learn about the pieces and how they fit -together. - -All plots are composed of: - -\begin{itemize} -\item - \textbf{Data} that you want to visualise and a set of aesthetic - \textbf{mapping}s describing how variables in the data are mapped to - aesthetic attributes that you can perceive. -\item - \textbf{Layers} made up of geometric elements and statistical - transformations. Geometric objects, \textbf{geom}s for short, represent - what you actually see on the plot: points, lines, polygons, etc. - Statistical transformations, \textbf{stat}s for short, summarise data - in many useful ways. For example, binning and counting observations to - create a histogram, or summarising a 2d relationship with a linear - model. -\item - The \textbf{scale}s map values in the data space to values in an - aesthetic space, whether it be colour, or size, or shape. Scales draw - a legend or axes, which provide an inverse mapping to make it possible - to read the original data values from the plot. -\item - A coordinate system, \textbf{coord} for short, describes how data - coordinates are mapped to the plane of the graphic. It also provides - axes and gridlines to make it possible to read the graph. We normally - use a Cartesian coordinate system, but a number of others are - available, including polar coordinates and map projections. -\item - A \textbf{facet}ing specification describes how to break up the data - into subsets and how to display those subsets as small multiples. This - is also known as conditioning or latticing/trellising. 
-\item - A \textbf{theme} which controls the finer points of display, like the - font size and background colour. While the defaults in ggplot2 have - been chosen with care, you may need to consult other references to - create an attractive plot. A good starting place is Tufte's early - works (Tufte 1990; Tufte 1997; Tufte 2001). -\end{itemize} - -It is also important to talk about what the grammar doesn't do: - -\begin{itemize} -\item - It doesn't suggest what graphics you should use to answer the - questions you are interested in. While this book endeavours to promote - a sensible process for producing plots of data, the focus of the book - is on how to produce the plots you want, not knowing what plots to - produce. For more advice on this topic, you may want to consult - Robbins (2013), Cleveland (1993), Chambers et al. (1983), and J. W. - Tukey (1977). -\item - It does not describe interactivity: the grammar of graphics describes - only static graphics and there is essentially no benefit to displaying - them on a computer screen as opposed to a piece of paper. ggplot2 can - only create static graphics, so for dynamic and interactive graphics - you will have to look elsewhere (perhaps at ggvis, described below). - Cook and Swayne (2007) provides an excellent introduction to the - interactive graphics package GGobi. GGobi can be connected to R with - the rggobi package (Wickham et al. 2008). -\end{itemize} - -\section{How does ggplot2 fit in with other R -graphics?}\label{how-does-ggplot2-fit-in-with-other-r-graphics} - -There are a number of other graphics systems available in R: base -graphics, grid graphics and trellis/lattice graphics. How does ggplot2 -differ from them? - -\begin{itemize} -\item - Base graphics were written by Ross Ihaka based on experience - implementing the S graphics driver and partly looking at Chambers et - al. (1983). 
Base graphics has a pen on paper model: you can only draw - on top of the plot, you cannot modify or delete existing content. - There is no (user accessible) representation of the graphics, apart - from their appearance on the screen. Base graphics includes both tools - for drawing primitives and entire plots. Base graphics functions are - generally fast, but have limited scope. If you've created a single - scatterplot, or histogram, or a set of boxplots in the past, you've - probably used base graphics. \index{Base graphics} -\item - The development of ``grid'' graphics, a much richer system of - graphical primitives, started in 2000. Grid is developed by Paul - Murrell, growing out of his PhD work (Murrell 1998). Grid grobs - (graphical objects) can be represented independently of the plot and - modified later. A system of viewports (each containing its own - coordinate system) makes it easier to lay out complex graphics. Grid - provides drawing primitives, but no tools for producing statistical - graphics. \index{grid} -\item - The lattice package, developed by Deepayan Sarkar, uses grid graphics - to implement the trellis graphics system of Cleveland (1993) and is a - considerable improvement over base graphics. You can easily produce - conditioned plots and some plotting details (e.g., legends) are taken - care of automatically. However, lattice graphics lacks a formal model, - which can make it hard to extend. Lattice graphics are explained in - depth in Sarkar (2008). \index{lattice} -\item - ggplot2, started in 2005, is an attempt to take the good things about - base and lattice graphics and improve on them with a strong underlying - model which supports the production of any kind of statistical - graphic, based on the principles outlined above. The solid underlying - model of ggplot2 makes it easy to describe a wide range of graphics - with a compact syntax, and independent components make extension easy. 
- Like lattice, ggplot2 uses grid to draw the graphics, which means you - can exercise much low-level control over the appearance of the plot. -\item - Work on ggvis, the successor to ggplot2, started in 2014. It takes the - foundational ideas of ggplot2 but extends them to the web and - interactive graphics. The syntax is similar, but it's been re-designed - from scratch to take advantage of what I've learned in the 10 years - since creating ggplot2. The most exciting thing about ggvis is that - it's interactive and dynamic, so plots automatically re-draw - themselves when the underlying data or plot specification changes. - However, ggvis is a work in progress and can currently create only a - fraction of the plots that ggplot2 can. Stay tuned for updates! - \index{ggvis} -\item - htmlwidgets, \url{http://www.htmlwidgets.org}, provides a common - framework for accessing web visualisation tools from R. Packages built - on top of htmlwidgets include leaflet - (\url{https://rstudio.github.io/leaflet/}, maps), dygraphs - (\url{http://rstudio.github.io/dygraphs/}, time series) and networkD3 - (\url{http://christophergandrud.github.io/networkD3/}, networks). - htmlwidgets is to ggvis what the many specialised graphic packages are - to ggplot2: it provides graphics honed for specific purposes. - \index{htmlwidgets} -\end{itemize} - -Many other R packages, such as vcd (Meyer, Zeileis, and Hornik 2006), -plotrix (Lemon et al. 2008) and gplots (Warnes 2007), implement -specialist graphics, but no others provide a framework for producing -statistical graphics. A comprehensive list of all graphical tools -available in other packages can be found in the graphics task view at -\url{http://cran.r-project.org/web/views/Graphics.html}. - -\section{About this book}\label{about-this-book} - -The first chapter, \protect\hyperlink{cha:getting-started}{getting -started with ggplot2}, describes how to quickly get started using -ggplot2 to make useful graphics.
This chapter introduces several -important ggplot2 concepts: geoms, aesthetic mappings and facetting. -\protect\hyperlink{cha:toolbox}{Toolbox} dives into more details, giving -you a toolbox designed to solve a wide range of problems. - -\protect\hyperlink{cha:mastery}{Mastery} describes the layered grammar -of graphics which underlies ggplot2. The theory is illustrated in -\protect\hyperlink{cha:layers}{layers} which demonstrates how to add -additional layers to your plot, exercising full control over the geoms -and stats used within them. - -Understanding how scales work is crucial for fine-tuning the perceptual -properties of your plot. Customising scales gives fine control over the -exact appearance of the plot and helps to support the story that you are -telling. \protect\hyperlink{cha:scales}{Scales} will show you what -scales are available, how to adjust their parameters, and how to control -the appearance of axes and legends. - -Coordinate systems and facetting control the position of elements of the -plot. These are described in \protect\hyperlink{cha:position}{position}. -Facetting is a very powerful graphical tool as it allows you to rapidly -compare different subsets of your data. Different coordinate systems are -less commonly needed, but are very important for certain types of data. - -To polish your plots for publication, you will need to learn about the -tools described in \protect\hyperlink{cha:polishing}{polishing}. There -you will learn about how to control the theming system of ggplot2 and -how to save plots to disk. - -The book concludes with four chapters that show how to use ggplot2 as -part of a data analysis pipeline. ggplot2 works best when your data is -tidy, so \protect\hyperlink{cha:data}{Tidying} discusses what that means -and how to make your messy data tidy. \protect\hyperlink{cha:dplyr}{Data -transformation} teaches you how to use the dplyr package to perform the -most common data manipulation operations. 
-\protect\hyperlink{cha:modelling}{Modelling} shows how to integrate -visualisation and modelling in two useful ways. Duplicated code is a big -inhibitor of flexibility and reduces your ability to respond to changes -in requirements. \protect\hyperlink{cha:programming}{Programming with -ggplot2} covers useful techniques for reducing duplication in your code. - -\section{Installation}\label{sec:installation} - -\index{Installation} - -To use ggplot2, you must first install it. Make sure you have a recent -version of R (at least version 3.2.0) from \url{http://r-project.org} -and then run the following code to download and install ggplot2: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{install.packages}\NormalTok{(}\StringTok{"ggplot2"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\section{Other resources}\label{sec:other-resources} - -This book teaches you the elements of ggplot2's grammar and how they fit -together, but it does not document every function in complete detail. -You will need additional documentation as your use of ggplot2 becomes -more complex and varied. - -The best resource for specific details of ggplot2 functions and their -arguments will always be the built-in documentation. This is accessible -online, \url{http://docs.ggplot2.org/}, and from within R using the -usual help syntax. The advantage of the online documentation is that you -can see all the example plots and navigate between topics more easily. - -If you use ggplot2 regularly, it's a good idea to sign up for the -ggplot2 mailing list, \url{http://groups.google.com/group/ggplot2}. The -list has relatively low traffic and is very friendly to new users. -Another useful resource is stackoverflow, -\url{http://stackoverflow.com}. There is an active ggplot2 community on -stackoverflow, and many common questions have already been asked and -answered. In either place, you're much more likely to get help if you -create a minimal reproducible example. 
The -\href{https://github.com/jennybc/reprex}{reprex} package by Jenny Bryan -provides a convenient way to do this, and also includes advice on -creating a good example. The more information you provide, the easier it -is for the community to help you. - -The number of functions in ggplot2 can be overwhelming, but RStudio -provides some great cheatsheets to jog your memory at -\url{http://www.rstudio.com/resources/cheatsheets/}. - -Finally, the complete source code for the book is available online at -\url{https://github.com/hadley/ggplot2-book}. This contains the complete -text for the book, as well as all the code and data needed to recreate -all the plots. - -\section{Colophon}\label{colophon} - -This book was written in \href{http://rmarkdown.rstudio.com/}{R -Markdown} inside \href{http://www.rstudio.com/ide/}{RStudio}. -\href{http://yihui.name/knitr/}{knitr} and -\href{http://johnmacfarlane.net/pandoc/}{pandoc} converted the raw -Rmarkdown to html and pdf. The complete source is available from -\href{https://github.com/hadley/ggplot2-book}{github}.
This version of -the book was built with: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{devtools::}\KeywordTok{session_info}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"ggplot2"}\NormalTok{, }\StringTok{"dplyr"}\NormalTok{, }\StringTok{"broom"}\NormalTok{))} -\CommentTok{#> Session info ------------------------------------------------------} -\CommentTok{#> setting value } -\CommentTok{#> version R version 3.2.3 (2015-12-10)} -\CommentTok{#> system x86_64, darwin13.4.0 } -\CommentTok{#> ui X11 } -\CommentTok{#> language (EN) } -\CommentTok{#> collate en_US.UTF-8 } -\CommentTok{#> tz America/Chicago } -\CommentTok{#> date 2016-02-27} -\CommentTok{#> Packages ----------------------------------------------------------} -\CommentTok{#> package * version date source } -\CommentTok{#> assertthat 0.1 2013-12-06 CRAN (R 3.2.0)} -\CommentTok{#> BH 1.58.0-1 2015-05-21 CRAN (R 3.2.0)} -\CommentTok{#> broom 0.4.0 2015-11-30 CRAN (R 3.2.2)} -\CommentTok{#> colorspace 1.2-6 2015-03-11 CRAN (R 3.2.0)} -\CommentTok{#> DBI 0.3.1 2014-09-24 CRAN (R 3.2.0)} -\CommentTok{#> dichromat 2.0-0 2013-01-24 CRAN (R 3.2.0)} -\CommentTok{#> digest 0.6.9 2016-01-08 CRAN (R 3.2.3)} -\CommentTok{#> dplyr * 0.4.3 2015-09-01 CRAN (R 3.2.0)} -\CommentTok{#> ggplot2 * 2.1.0 2016-02-26 local } -\CommentTok{#> gtable 0.2.0 2016-02-26 CRAN (R 3.2.3)} -\CommentTok{#> labeling 0.3 2014-08-23 CRAN (R 3.2.0)} -\CommentTok{#> lattice 0.20-33 2015-07-14 CRAN (R 3.2.3)} -\CommentTok{#> lazyeval 0.1.10 2015-01-02 CRAN (R 3.2.0)} -\CommentTok{#> magrittr 1.5 2014-11-22 CRAN (R 3.2.0)} -\CommentTok{#> MASS 7.3-45 2015-11-10 CRAN (R 3.2.3)} -\CommentTok{#> mnormt 1.5-3 2015-05-25 CRAN (R 3.2.0)} -\CommentTok{#> munsell 0.4.2 2013-07-11 CRAN (R 3.2.0)} -\CommentTok{#> nlme 3.1-122 2015-08-19 CRAN (R 3.2.3)} -\CommentTok{#> plyr 1.8.3 2015-06-12 CRAN (R 3.2.0)} -\CommentTok{#> psych 1.5.8 2015-08-30 CRAN (R 3.2.0)} -\CommentTok{#> R6 2.1.2 2016-01-26 CRAN (R 3.2.3)} -\CommentTok{#> RColorBrewer 1.1-2 
2014-12-07 CRAN (R 3.2.0)} -\CommentTok{#> Rcpp 0.12.3 2016-01-10 CRAN (R 3.2.3)} -\CommentTok{#> reshape2 1.4.1 2014-12-06 CRAN (R 3.2.0)} -\CommentTok{#> scales 0.4.0 2016-02-26 CRAN (R 3.2.3)} -\CommentTok{#> stringi 1.0-1 2015-10-22 CRAN (R 3.2.0)} -\CommentTok{#> stringr 1.0.0 2015-04-30 CRAN (R 3.2.0)} -\CommentTok{#> tidyr * 0.4.1 2016-02-05 CRAN (R 3.2.3)} -\KeywordTok{getOption}\NormalTok{(}\StringTok{"width"}\NormalTok{)} -\CommentTok{#> [1] 67} -\end{Highlighting} -\end{Shaded} - -\section*{References}\label{references} -\addcontentsline{toc}{section}{References} - -\hypertarget{refs}{} -\hypertarget{ref-chambers:1983}{} -Chambers, John, William Cleveland, Beat Kleiner, and Paul Tukey. 1983. -\emph{Graphical Methods for Data Analysis}. Wadsworth. - -\hypertarget{ref-cleveland:1993}{} -Cleveland, William. 1993. \emph{Visualizing Data}. Hobart Press. - -\hypertarget{ref-cook:2007}{} -Cook, Dianne, and Deborah F. Swayne. 2007. \emph{Interactive and Dynamic -Graphics for Data Analysis: With Examples Using R and GGobi}. Springer. - -\hypertarget{ref-plotrix}{} -Lemon, Jim, Ben Bolker, Sander Oom, Eduardo Klein, Barry Rowlingson, -Hadley Wickham, Anupam Tyagi, et al. 2008. \emph{Plotrix: Various -Plotting Functions}. - -\hypertarget{ref-meyer:2006}{} -Meyer, David, Achim Zeileis, and Kurt Hornik. 2006. ``The Strucplot -Framework: Visualizing Multi-Way Contingency Tables with Vcd.'' -\emph{Journal of Statistical Software} 17 (3): 1--48. -\url{http://www.jstatsoft.org/v17/i03/}. - -\hypertarget{ref-murrell:1998}{} -Murrell, Paul. 1998. ``Investigations in Graphical Statistics.'' -PhD thesis, The University of Auckland. - -\hypertarget{ref-robbins:2004}{} -Robbins, Naomi. 2013. \emph{Creating More Effective Graphs}. Chart -House. - -\hypertarget{ref-sarkar:2008}{} -Sarkar, Deepayan. 2008. \emph{Lattice: Multivariate Data Visualization -with R}. Springer. - -\hypertarget{ref-tufte:1990}{} -Tufte, Edward R. 1990. \emph{Envisioning Information}. Graphics Press. 
- -\hypertarget{ref-tufte:1997}{} ----------. 1997. \emph{Visual Explanations}. Graphics Press. - -\hypertarget{ref-tufte:2001}{} ----------. 2001. \emph{The Visual Display of Quantitative Information}. -Second. Graphics Press. - -\hypertarget{ref-tukey:1977}{} -Tukey, John W. 1977. \emph{Exploratory Data Analysis}. Addison--Wesley. - -\hypertarget{ref-gplots}{} -Warnes, Gregory. 2007. \emph{Gplots: Various R Programming Tools for -Plotting Data}. - -\hypertarget{ref-wickham:2007d}{} -Wickham, Hadley. 2009. ``A Layered Grammar of Graphics.'' \emph{Journal -of Computational and Graphical Statistics}. - -\hypertarget{ref-wickham:2008b}{} -Wickham, Hadley, Michael Lawrence, Duncan Temple Lang, and Deborah F -Swayne. 2008. ``An Introduction to Rggobi.'' \emph{R-News} 8 (2): 3--7. -\url{http://CRAN.R-project.org/doc/Rnews/Rnews_2008-2.pdf}. - -\hypertarget{ref-wilkinson:2006}{} -Wilkinson, Leland. 2005. \emph{The Grammar of Graphics}. 2nd ed. -Statistics and Computing. Springer. diff --git a/book/tex/layers.tex b/book/tex/layers.tex deleted file mode 100644 index b7d85951..00000000 --- a/book/tex/layers.tex +++ /dev/null @@ -1,1070 +0,0 @@ -\chapter{Build a plot layer by layer}\label{cha:layers} - -\section{Introduction}\label{introduction} - -One of the key ideas behind ggplot2 is that it allows you to easily -iterate, building up a complex plot a layer at a time. Each layer can -come from a different dataset and have a different aesthetic mapping, -making it possible to create sophisticated plots that display data from -multiple sources. - -You've already created layers with functions like \texttt{geom\_point()} -and \texttt{geom\_histogram()}. In this chapter, you'll dive into the -details of a layer, and how you can control all five components: data, -the aesthetic mappings, the geom, stat, and position adjustments. The -goal here is to give you the tools to build sophisticated plots tailored -to the problem at hand. 
- -\section{Building a plot}\label{building-a-plot} - -So far, whenever we've created a plot with \texttt{ggplot()}, we've -immediately added on a layer with a geom function. But it's important to -realise that there really are two distinct steps. First we create a plot -with default dataset and aesthetic mappings: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy))} -\NormalTok{p} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/layers/layer1-1} -\end{figure} - -There's nothing to see yet, so we need to add a layer: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/layers/unnamed-chunk-1-1} -\end{figure} - -\texttt{geom\_point()} is a shortcut. Behind the scenes it calls the -\texttt{layer()} function to create a new layer: \indexf{layer} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p +}\StringTok{ }\KeywordTok{layer}\NormalTok{(} - \DataTypeTok{mapping =} \OtherTok{NULL}\NormalTok{, } - \DataTypeTok{data =} \OtherTok{NULL}\NormalTok{,} - \DataTypeTok{geom =} \StringTok{"point"}\NormalTok{, } - \DataTypeTok{stat =} \StringTok{"identity"}\NormalTok{,} - \DataTypeTok{position =} \StringTok{"identity"} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -This call fully specifies the five components to the layer: -\index{Layers!components} - -\begin{itemize} -\item - \textbf{mapping}: A set of aesthetic mappings, specified using the - \texttt{aes()} function and combined with the plot defaults as - described in \protect\hyperlink{sec:aes}{aesthetic mappings}. If - \texttt{NULL}, uses the default mapping set in \texttt{ggplot()}. -\item - \textbf{data}: A dataset which overrides the default plot dataset. 
It - is usually omitted (set to \texttt{NULL}), in which case the layer - will use the default data specified in \texttt{ggplot()}. The - requirements for data are explained in more detail in - \protect\hyperlink{sec:data}{data}. -\item - \textbf{geom}: The name of the geometric object to use to draw each - observation. Geoms are discussed in more detail in - \protect\hyperlink{sec:data}{geom}, and - \protect\hyperlink{cha:toolbox}{the toolbox} explores their use in - more depth. - - Geoms can have additional arguments. All geoms take aesthetics as - parameters. If you supply an aesthetic (e.g.~colour) as a parameter, - it will not be scaled, allowing you to control the appearance of the - plot, as described in \protect\hyperlink{sub:setting-mapping}{setting - vs.~mapping}. You can pass params in \texttt{...} (in which case stat - and geom parameters are automatically teased apart), or in a list - passed to \texttt{geom\_params}. -\item - \textbf{stat}: The name of the statistical transformation to use. A - statistical transformation performs some useful statistical summary, - and is key to histograms and smoothers. To keep the data as is, use - the ``identity'' stat. Learn more in - \protect\hyperlink{sec:stat}{statistical transformations}. - - You only need to set one of stat and geom: every geom has a default - stat, and every stat a default geom. - - Most stats take additional parameters to specify the details of - the statistical transformation. You can supply params either in - \texttt{...} (in which case stat and geom parameters are automatically - teased apart), or in a list called \texttt{stat\_params}. -\item - \textbf{position}: The method used to adjust overlapping objects, like - jittering, stacking or dodging. More details in - \protect\hyperlink{sec:position}{position}. -\end{itemize} - -It's useful to understand the \texttt{layer()} function so you have a -better mental model of the layer object.
But you'll rarely use the full -\texttt{layer()} call because it's so verbose. Instead, you'll use the -shortcut \texttt{geom\_} functions: -\texttt{geom\_point(mapping,\ data,\ ...)} is exactly equivalent to -\texttt{layer(mapping,\ data,\ geom\ =\ "point",\ ...)}. - -\hypertarget{sec:data}{\section{Data}\label{sec:data}} - -Every layer must have some data associated with it, and that data must -be in a tidy data frame. You'll learn about tidy data in -\protect\hyperlink{cha:data}{tidy data}, but for now, all you need to -know is that a tidy data frame has variables in the columns and -observations in the rows. This is a strong restriction, but there are -good reasons for it: \index{Data} \indexf{data.frame} - -\begin{itemize} -\item - Your data is very important, so it's best to be explicit about it. -\item - A single data frame is also easier to save than a multitude of - vectors, which means it's easier to reproduce your results or send - your data to someone else. -\item - It enforces a clean separation of concerns: ggplot2 turns data frames - into visualisations. Other packages can make data frames in the right - format (learn more about that in - \protect\hyperlink{sub:modelvis}{model visualisation}). -\end{itemize} - -The data on each layer doesn't need to be the same, and it's often -useful to combine multiple datasets in a single plot. To illustrate that -idea I'm going to generate two new datasets related to the mpg dataset. -First I'll fit a loess model and generate predictions from it. 
(This is -what \texttt{geom\_smooth()} does behind the scenes) - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mod <-}\StringTok{ }\KeywordTok{loess}\NormalTok{(hwy ~}\StringTok{ }\NormalTok{displ, }\DataTypeTok{data =} \NormalTok{mpg)} -\NormalTok{grid <-}\StringTok{ }\KeywordTok{data_frame}\NormalTok{(}\DataTypeTok{displ =} \KeywordTok{seq}\NormalTok{(}\KeywordTok{min}\NormalTok{(mpg$displ), }\KeywordTok{max}\NormalTok{(mpg$displ), }\DataTypeTok{length =} \DecValTok{50}\NormalTok{))} -\NormalTok{grid$hwy <-}\StringTok{ }\KeywordTok{predict}\NormalTok{(mod, }\DataTypeTok{newdata =} \NormalTok{grid)} - -\NormalTok{grid} -\CommentTok{#> Source: local data frame [50 x 2]} -\CommentTok{#> } -\CommentTok{#> displ hwy} -\CommentTok{#> (dbl) (dbl)} -\CommentTok{#> 1 1.60 33.1} -\CommentTok{#> 2 1.71 32.2} -\CommentTok{#> 3 1.82 31.3} -\CommentTok{#> 4 1.93 30.4} -\CommentTok{#> 5 2.04 29.6} -\CommentTok{#> 6 2.15 28.8} -\CommentTok{#> .. ... ...} -\end{Highlighting} -\end{Shaded} - -Next, I'll isolate observations that are particularly far away from -their predicted values: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{std_resid <-}\StringTok{ }\KeywordTok{resid}\NormalTok{(mod) /}\StringTok{ }\NormalTok{mod$s} -\NormalTok{outlier <-}\StringTok{ }\KeywordTok{filter}\NormalTok{(mpg, }\KeywordTok{abs}\NormalTok{(std_resid) >}\StringTok{ }\DecValTok{2}\NormalTok{)} -\NormalTok{outlier} -\CommentTok{#> Source: local data frame [6 x 11]} -\CommentTok{#> } -\CommentTok{#> manufacturer model displ year cyl trans drv cty} -\CommentTok{#> (chr) (chr) (dbl) (int) (int) (chr) (chr) (int)} -\CommentTok{#> 1 chevrolet corvette 5.7 1999 8 manual(m6) r 16} -\CommentTok{#> 2 pontiac grand prix 3.8 2008 6 auto(l4) f 18} -\CommentTok{#> 3 pontiac grand prix 5.3 2008 8 auto(s4) f 16} -\CommentTok{#> 4 volkswagen jetta 1.9 1999 4 manual(m5) f 33} -\CommentTok{#> 5 volkswagen new beetle 1.9 1999 4 manual(m5) f 35} -\CommentTok{#> 6 volkswagen new beetle 1.9 1999 4 auto(l4) f 
29} -\CommentTok{#> Variables not shown: hwy (int), fl (chr), class (chr)} -\end{Highlighting} -\end{Shaded} - -I've generated these datasets because it's common to enhance the display -of raw data with a statistical summary and some annotations. With these -new datasets, I can improve our initial scatterplot by overlaying a -smoothed line, and labelling the outlying points: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\DataTypeTok{data =} \NormalTok{grid, }\DataTypeTok{colour =} \StringTok{"blue"}\NormalTok{, }\DataTypeTok{size =} \FloatTok{1.5}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\DataTypeTok{data =} \NormalTok{outlier, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{model))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/layers/unnamed-chunk-2-1} -\end{figure} - -(The labels aren't particularly easy to read, but you can fix that with -some manual tweaking.) - -Note that you need the explicit \texttt{data\ =} in the layers, but not -in the call to \texttt{ggplot()}. That's because the argument order is -different. This is a little inconsistent, but it reduces typing for the -common case where you specify the data once in \texttt{ggplot()} and -modify aesthetics in each layer. - -In this example, every layer uses a different dataset. 
We could define -the same plot in another way, omitting the default dataset, and -specifying a dataset for each layer: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(}\DataTypeTok{mapping =} \KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{data =} \NormalTok{mpg) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\DataTypeTok{data =} \NormalTok{grid) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\DataTypeTok{data =} \NormalTok{outlier, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{model))} -\end{Highlighting} -\end{Shaded} - -I don't particularly like this style in this example because it makes it -less clear what the primary dataset is (and because of the way that the -arguments to \texttt{ggplot()} are ordered, it actually requires more -keystrokes). However, you may prefer it in cases where there isn't a -clear primary dataset, or where the aesthetics also vary from layer to -layer. - -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - The first two arguments to ggplot are \texttt{data} and - \texttt{mapping}. The first two arguments to all layer functions are - \texttt{mapping} and \texttt{data}. Why does the order of the - arguments differ? (Hint: think about what you set most commonly.) -\item - The following code uses dplyr to generate some summary statistics - about each class of car (you'll learn how it works in - \protect\hyperlink{cha:dplyr}{data transformation}). 
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{library}\NormalTok{(dplyr)} -\NormalTok{class <-}\StringTok{ }\NormalTok{mpg %>%}\StringTok{ } -\StringTok{ }\KeywordTok{group_by}\NormalTok{(class) %>%}\StringTok{ } -\StringTok{ }\KeywordTok{summarise}\NormalTok{(}\DataTypeTok{n =} \KeywordTok{n}\NormalTok{(), }\DataTypeTok{hwy =} \KeywordTok{mean}\NormalTok{(hwy))} -\end{Highlighting} -\end{Shaded} - - Use the data to recreate this plot: - - \begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/layers/unnamed-chunk-5-1} - \end{figure} -\end{enumerate} - -\hypertarget{sec:aes}{\section{Aesthetic mappings}\label{sec:aes}} - -The aesthetic mappings, defined with \texttt{aes()}, describe how -variables are mapped to visual properties or \textbf{aesthetics}. -\texttt{aes()} takes a sequence of aesthetic-variable pairs like this: -\index{Aesthetics!mapping} \indexf{aes} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =} \NormalTok{displ, }\DataTypeTok{y =} \NormalTok{hwy, }\DataTypeTok{colour =} \NormalTok{class)} -\end{Highlighting} -\end{Shaded} - -(If you're American, you can use \emph{color}, and behind the scenes -ggplot2 will correct your spelling ;) - -Here we map x-position to \texttt{displ}, y-position to \texttt{hwy}, -and colour to \texttt{class}. The names for the first two arguments can -be omitted, in which case they correspond to the x and y variables. That -makes this specification equivalent to the one above: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \NormalTok{class)} -\end{Highlighting} -\end{Shaded} - -While you can do data manipulation in \texttt{aes()}, e.g. -\texttt{aes(log(carat),\ log(price))}, it's best to only do simple -calculations. 
It's better to move complex transformations out of the -\texttt{aes()} call and into an explicit \texttt{dplyr::mutate()} call, -as you'll learn about in \protect\hyperlink{mutate}{mutate}. This makes -it easier to check your work and it's often faster because you need only -do the transformation once, not every time the plot is drawn. - -Never refer to a variable with \texttt{\$} (e.g., -\texttt{diamonds\$carat}) in \texttt{aes()}. This breaks containment, so -that the plot no longer contains everything it needs, and causes -problems if ggplot2 changes the order of the rows, as it does when -facetting. \indexc{\$} - -\subsection{Specifying the aesthetics in the plot vs.~in the -layers}\label{sub:plots-and-layers} - -Aesthetic mappings can be supplied in the initial \texttt{ggplot()} -call, in individual layers, or in some combination of both. All of these -calls create the same plot specification: -\index{Aesthetics!plot vs. layer} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \NormalTok{class)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{class))} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{hwy, }\DataTypeTok{colour =} \NormalTok{class))} -\KeywordTok{ggplot}\NormalTok{(mpg) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \NormalTok{class))} -\end{Highlighting} -\end{Shaded} - -Within each layer, you can add, override, or remove mappings: - -\begin{longtable}[c]{@{}lll@{}} -\toprule -Operation & Layer 
aesthetics & Result\tabularnewline -\midrule -\endhead -Add & \texttt{aes(colour\ =\ cyl)} & -\texttt{aes(mpg,\ wt,\ colour\ =\ cyl)}\tabularnewline -Override & \texttt{aes(y\ =\ disp)} & -\texttt{aes(mpg,\ disp)}\tabularnewline -Remove & \texttt{aes(y\ =\ NULL)} & \texttt{aes(mpg)}\tabularnewline -\bottomrule -\end{longtable} - -If you only have one layer in the plot, the way you specify aesthetics -doesn't make any difference. However, the distinction is important when -you start adding additional layers. These two plots are both valid and -interesting, but focus on quite different aspects of the data: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \NormalTok{class)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{, }\DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{) +} -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{class)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{, }\DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/layers/unnamed-chunk-7-1}% - \includegraphics[width=0.5\linewidth]{_figures/layers/unnamed-chunk-7-2} -\end{figure} - -Generally, you want to set up the mappings to illuminate the structure -underlying the graphic and minimise typing. 
It may take some time before -the best approach is immediately obvious, so if you've iterated your way -to a complex graphic, it may be worthwhile to rewrite it to make the -structure more clear. - -\hypertarget{sub:setting-mapping}{\subsection{Setting -vs.~mapping}\label{sub:setting-mapping}} - -Instead of mapping an aesthetic property to a variable, you can set it -to a \emph{single} value by specifying it in the layer parameters. We -\textbf{map} an aesthetic to a variable (e.g., -\texttt{aes(colour\ =\ cut)}) or \textbf{set} it to a constant (e.g., -\texttt{colour\ =\ "red"}). If you want the appearance to be governed by a -variable, put the specification inside \texttt{aes()}; if you want -to override the default size or colour, put the value outside of -\texttt{aes()}. \index{Aesthetics!setting} - -The following plots are created with similar code, but have rather -different outputs. The second plot \textbf{maps} (not sets) the colour -to the value `darkblue'. This effectively creates a new variable -containing only the value `darkblue' and then scales it with a colour -scale. Because this value is discrete, the default colour scale uses -evenly spaced colours on the colour wheel, and since there is only one -value this colour is pinkish.
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(cty, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"darkblue"}\NormalTok{) } - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(cty, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"darkblue"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/layers/layer15-1}% - \includegraphics[width=0.5\linewidth]{_figures/layers/layer15-2} -\end{figure} - -A third approach is to map the value, but override the default scale: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(cty, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"darkblue"}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_colour_identity}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.5\linewidth]{_figures/layers/unnamed-chunk-8-1} -\end{figure} - -This is most useful if you always have a column that already contains -colours. You'll learn more about that in -\protect\hyperlink{sub:scale-identity}{the identity scale}. - -It's sometimes useful to map aesthetics to constants. 
For example, if -you want to display multiple layers with varying parameters, you can -``name'' each layer: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +} -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"loess"}\NormalTok{), }\DataTypeTok{method =} \StringTok{"loess"}\NormalTok{, }\DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"lm"}\NormalTok{), }\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{, }\DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{) +} -\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"Method"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/layers/unnamed-chunk-9-1} -\end{figure} - -\subsection{Exercises}\label{exercises-1} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Simplify the following plot specifications: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(mpg$disp, mpg$hwy))} - -\KeywordTok{ggplot}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{mapping =} \KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{hwy, }\DataTypeTok{x =} \NormalTok{cty), }\DataTypeTok{data =} \NormalTok{mpg) +} -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{data =} \NormalTok{mpg, }\DataTypeTok{mapping =} \KeywordTok{aes}\NormalTok{(cty, hwy))} - -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(carat, price)) +}\StringTok{ } -\StringTok{ 
}\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\KeywordTok{log}\NormalTok{(brainwt), }\KeywordTok{log}\NormalTok{(bodywt)), }\DataTypeTok{data =} \NormalTok{msleep)} -\end{Highlighting} -\end{Shaded} -\item - What does the following code do? Does it work? Does it make sense? - Why/why not? - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(class, cty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_boxplot}\NormalTok{(}\KeywordTok{aes}\NormalTok{(trans, hwy))} -\end{Highlighting} -\end{Shaded} -\item - What happens if you try to use a continuous variable on the x axis in - one layer, and a categorical variable in another layer? What happens - if you do it in the opposite order? -\end{enumerate} - -\section{Geoms}\label{sec:geom} - -Geometric objects, or \textbf{geoms} for short, perform the actual -rendering of the layer, controlling the type of plot that you create. -For example, using a point geom will create a scatterplot, while using a -line geom will create a line plot. - -\begin{itemize} -\tightlist -\item - Graphical primitives: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_blank()}: display nothing. Most useful for adjusting - axes limits using data. - \item - \texttt{geom\_point()}: points. - \item - \texttt{geom\_path()}: paths. - \item - \texttt{geom\_ribbon()}: ribbons, a path with vertical thickness. - \item - \texttt{geom\_segment()}: a line segment, specified by start and end - position. - \item - \texttt{geom\_rect()}: rectangles. - \item - \texttt{geom\_polygon()}: filled polygons. - \item - \texttt{geom\_text()}: text. - \end{itemize} -\item - One variable: - - \begin{itemize} - \tightlist - \item - Discrete: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_bar()}: display distribution of discrete variable. 
- \end{itemize} - \item - Continuous: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_histogram()}: bin and count continuous variable, - display with bars. - \item - \texttt{geom\_density()}: smoothed density estimate. - \item - \texttt{geom\_dotplot()}: stack individual points into a dot plot. - \item - \texttt{geom\_freqpoly()}: bin and count continuous variable, - display with lines. - \end{itemize} - \end{itemize} -\item - Two variables: - - \begin{itemize} - \tightlist - \item - Both continuous: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_point()}: scatterplot. - \item - \texttt{geom\_quantile()}: smoothed quantile regression. - \item - \texttt{geom\_rug()}: marginal rug plots. - \item - \texttt{geom\_smooth()}: smoothed line of best fit. - \item - \texttt{geom\_text()}: text labels. - \end{itemize} - \item - Show distribution: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_bin2d()}: bin into rectangles and count. - \item - \texttt{geom\_density2d()}: smoothed 2d density estimate. - \item - \texttt{geom\_hex()}: bin into hexagons and count. - \end{itemize} - \item - At least one discrete: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_count()}: count the number of points at distinct - locations. - \item - \texttt{geom\_jitter()}: randomly jitter overlapping points. - \end{itemize} - \item - One continuous, one discrete: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_bar(stat\ =\ "identity")}: a bar chart of - precomputed summaries. - \item - \texttt{geom\_boxplot()}: boxplots. - \item - \texttt{geom\_violin()}: show density of values in each group. - \end{itemize} - \item - One time, one continuous: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_area()}: area plot. - \item - \texttt{geom\_line()}: line plot. - \item - \texttt{geom\_step()}: step plot. - \end{itemize} - \item - Display uncertainty: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_crossbar()}: vertical bar with center. 
- \item - \texttt{geom\_errorbar()}: error bars. - \item - \texttt{geom\_linerange()}: vertical line. - \item - \texttt{geom\_pointrange()}: vertical line with center. - \end{itemize} - \item - Spatial: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_map()}: fast version of \texttt{geom\_polygon()} for - map data. - \end{itemize} - \end{itemize} -\item - Three variables: - - \begin{itemize} - \tightlist - \item - \texttt{geom\_contour()}: contours. - \item - \texttt{geom\_tile()}: tile the plane with rectangles. - \item - \texttt{geom\_raster()}: fast version of \texttt{geom\_tile()} for - equal sized tiles. - \end{itemize} -\end{itemize} - -Each geom has a set of aesthetics that it understands, some of which -\emph{must} be provided. For example, the point geom requires x and y -position, and understands colour, size and shape aesthetics. A bar -requires height (\texttt{ymax}), and understands width, border colour -and fill colour. Each geom lists its aesthetics in the documentation. - -Some geoms differ primarily in the way that they are parameterised. For -example, you can draw a square in three ways: -\index{Geoms!parameterisation} - -\begin{itemize} -\item - By giving \texttt{geom\_tile()} the location (\texttt{x} and - \texttt{y}) and dimensions (\texttt{width} and \texttt{height}). - \indexf{geom\_tile} -\item - By giving \texttt{geom\_rect()} top (\texttt{ymax}), bottom - (\texttt{ymin}), left (\texttt{xmin}) and right (\texttt{xmax}) - positions. \indexf{geom\_rect} -\item - By giving \texttt{geom\_polygon()} a four row data frame with the - \texttt{x} and \texttt{y} positions of each corner. -\end{itemize} - -Other related geoms are: - -\begin{itemize} -\tightlist -\item - \texttt{geom\_segment()} and \texttt{geom\_line()} -\item - \texttt{geom\_area()} and \texttt{geom\_ribbon()}. -\end{itemize} - -If alternative parameterisations are available, picking the right one -for your data will usually make it much easier to draw the plot you -want. 
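-
-The three ways of drawing a square listed above can be sketched as
-follows. This is only an illustration: the data frames \texttt{tile},
-\texttt{rect} and \texttt{poly} are hypothetical, made up so that each
-call should draw the same square with corners at (0, 0) and (2, 2).
-
-\begin{Shaded}
-\begin{Highlighting}[]
-\NormalTok{library(ggplot2)}
-
-\CommentTok{# geom_tile(): centre plus dimensions (hypothetical data)}
-\NormalTok{tile <- data.frame(x = 1, y = 1, width = 2, height = 2)}
-\NormalTok{ggplot(tile, aes(x, y, width = width, height = height)) +}
-\NormalTok{  geom_tile()}
-
-\CommentTok{# geom_rect(): positions of the four edges}
-\NormalTok{rect <- data.frame(xmin = 0, xmax = 2, ymin = 0, ymax = 2)}
-\NormalTok{ggplot(rect, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)) +}
-\NormalTok{  geom_rect()}
-
-\CommentTok{# geom_polygon(): one row per corner, in drawing order}
-\NormalTok{poly <- data.frame(x = c(0, 2, 2, 0), y = c(0, 0, 2, 2))}
-\NormalTok{ggplot(poly, aes(x, y)) +}
-\NormalTok{  geom_polygon()}
-\end{Highlighting}
-\end{Shaded}
-
-Which form is most convenient depends on how your data is stored: centres
-and sizes suit \texttt{geom\_tile()}, edges suit \texttt{geom\_rect()},
-and arbitrary outlines suit \texttt{geom\_polygon()}.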
- -\subsection{Exercises}\label{exercises-2} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Download and print out the ggplot2 cheatsheet from - \url{http://www.rstudio.com/resources/cheatsheets/} so you have a - handy visual reference for all the geoms. -\item - Look at the documentation for the graphical primitive geoms. Which - aesthetics do they use? How can you summarise them in a compact form? -\item - What's the best way to master an unfamiliar geom? List three resources - to help you get started. -\item - For each of the plots below, identify the geom used to draw it. - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/layers/unnamed-chunk-12-1}% - \includegraphics[width=0.5\linewidth]{_figures/layers/unnamed-chunk-12-2} - \end{figure} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/layers/unnamed-chunk-13-1}% - \includegraphics[width=0.5\linewidth]{_figures/layers/unnamed-chunk-13-2} - \end{figure} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/layers/unnamed-chunk-14-1}% - \includegraphics[width=0.5\linewidth]{_figures/layers/unnamed-chunk-14-2} - \end{figure} -\item - For each of the following problems, suggest a useful geom: - - \begin{itemize} - \tightlist - \item - Display how a variable has changed over time. - \item - Show the detailed distribution of a single variable. - \item - Focus attention on the overall trend in a large dataset. - \item - Draw a map. - \item - Label outlying points. - \end{itemize} -\end{enumerate} - -\hypertarget{sec:stat}{\section{Stats}\label{sec:stat}} - -A statistical transformation, or \textbf{stat}, transforms the data, -typically by summarising it in some manner. For example, a useful stat -is the smoother, which calculates the smoothed mean of y, conditional on -x. 
You've already used many of ggplot2's stats because they're used -behind the scenes to generate many important geoms: - -\begin{itemize} -\tightlist -\item - \texttt{stat\_bin()}: \texttt{geom\_bar()}, \texttt{geom\_freqpoly()}, - \texttt{geom\_histogram()} -\item - \texttt{stat\_bin2d()}: \texttt{geom\_bin2d()} -\item - \texttt{stat\_bindot()}: \texttt{geom\_dotplot()} -\item - \texttt{stat\_binhex()}: \texttt{geom\_hex()} -\item - \texttt{stat\_boxplot()}: \texttt{geom\_boxplot()} -\item - \texttt{stat\_contour()}: \texttt{geom\_contour()} -\item - \texttt{stat\_quantile()}: \texttt{geom\_quantile()} -\item - \texttt{stat\_smooth()}: \texttt{geom\_smooth()} -\item - \texttt{stat\_sum()}: \texttt{geom\_count()} -\end{itemize} - -You'll rarely call these functions directly, but they are useful to know -about because their documentation often provides more detail about the -corresponding statistical transformation. - -Other stats can't be created with a \texttt{geom\_} function: - -\begin{itemize} -\tightlist -\item - \texttt{stat\_ecdf()}: compute an empirical cumulative distribution - plot. -\item - \texttt{stat\_function()}: compute y values from a function of x - values. -\item - \texttt{stat\_summary()}: summarise y values at distinct x values. -\item - \texttt{stat\_summary2d()}, \texttt{stat\_summary\_hex()}: summarise - binned values. -\item - \texttt{stat\_qq()}: perform calculations for a quantile-quantile - plot. -\item - \texttt{stat\_spoke()}: convert angle and radius to position. -\item - \texttt{stat\_unique()}: remove duplicated rows. -\end{itemize} - -There are two ways to use these functions. 
You can either add a -\texttt{stat\_()} function and override the default geom, or add a -\texttt{geom\_()} function and override the default stat: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(trans, cty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{stat_summary}\NormalTok{(}\DataTypeTok{geom =} \StringTok{"point"}\NormalTok{, }\DataTypeTok{fun.y =} \StringTok{"mean"}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{4}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(trans, cty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{stat =} \StringTok{"summary"}\NormalTok{, }\DataTypeTok{fun.y =} \StringTok{"mean"}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{4}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/layers/unnamed-chunk-15-1} -\end{figure} - -I think it's best to use the second form because it makes it more clear -that you're displaying a summary, not the raw data. - -\subsection{Generated variables}\label{generated-variables} - -Internally, a stat takes a data frame as input and returns a data frame -as output, and so a stat can add new variables to the original dataset. -It is possible to map aesthetics to these new variables. 
For example, -\texttt{stat\_bin}, the statistic used to make histograms, produces the -following variables: \index{Stats!creating new variables} -\indexf{stat\_bin} - -\begin{itemize} -\tightlist -\item - \texttt{count}, the number of observations in each bin -\item - \texttt{density}, the density of observations in each bin (percentage - of total / bar width) -\item - \texttt{x}, the centre of the bin -\end{itemize} - -These generated variables can be used instead of the variables present -in the original dataset. For example, the default histogram geom assigns -the height of the bars to the number of observations (\texttt{count}), -but if you'd prefer a more traditional histogram, you can use the -density (\texttt{density}). To refer to a generated variable like -density, ``\texttt{..}'' must surround the name. This prevents confusion -in case the original dataset includes a variable with the same name as a -generated variable, and it makes it clear to any later reader of the -code that this variable was generated by a stat. Each statistic lists -the variables that it creates in its documentation. 
\indexc{..} Compare -the y-axes on these two plots: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(price)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\DataTypeTok{binwidth =} \DecValTok{500}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(price)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{..density..), }\DataTypeTok{binwidth =} \DecValTok{500}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/layers/hist-1}% - \includegraphics[width=0.5\linewidth]{_figures/layers/hist-2} -\end{figure} - -This technique is particularly useful when you want to compare the -distribution of multiple groups that have very different sizes. For -example, it's hard to compare the distribution of \texttt{price} within -\texttt{cut} because some groups are quite small. 
It's easier to compare -if we standardise each group to take up the same area: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(price, }\DataTypeTok{colour =} \NormalTok{cut)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_freqpoly}\NormalTok{(}\DataTypeTok{binwidth =} \DecValTok{500}\NormalTok{) +} -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(price, }\DataTypeTok{colour =} \NormalTok{cut)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_freqpoly}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{..density..), }\DataTypeTok{binwidth =} \DecValTok{500}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/layers/freqpoly-1}% - \includegraphics[width=0.5\linewidth]{_figures/layers/freqpoly-2} -\end{figure} - -The result of this plot is rather surprising: low quality diamonds seem -to be more expensive on average. We'll come back to this result in -\protect\hyperlink{sub:trend}{removing trend}. - -\subsection{Exercises}\label{exercises-3} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - The code below creates a similar dataset to \texttt{stat\_smooth()}. - Use the appropriate geoms to mimic the default \texttt{geom\_smooth()} - display. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mod <-}\StringTok{ }\KeywordTok{loess}\NormalTok{(hwy ~}\StringTok{ }\NormalTok{displ, }\DataTypeTok{data =} \NormalTok{mpg)} -\NormalTok{smoothed <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{displ =} \KeywordTok{seq}\NormalTok{(}\FloatTok{1.6}\NormalTok{, }\DecValTok{7}\NormalTok{, }\DataTypeTok{length =} \DecValTok{50}\NormalTok{))} -\NormalTok{pred <-}\StringTok{ }\KeywordTok{predict}\NormalTok{(mod, }\DataTypeTok{newdata =} \NormalTok{smoothed, }\DataTypeTok{se =} \OtherTok{TRUE}\NormalTok{) } -\NormalTok{smoothed$hwy <-}\StringTok{ }\NormalTok{pred$fit} -\NormalTok{smoothed$hwy_lwr <-}\StringTok{ }\NormalTok{pred$fit -}\StringTok{ }\FloatTok{1.96} \NormalTok{*}\StringTok{ }\NormalTok{pred$se.fit} -\NormalTok{smoothed$hwy_upr <-}\StringTok{ }\NormalTok{pred$fit +}\StringTok{ }\FloatTok{1.96} \NormalTok{*}\StringTok{ }\NormalTok{pred$se.fit} -\end{Highlighting} -\end{Shaded} -\item - What stats were used to create the following plots? - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/layers/unnamed-chunk-17-1}% - \includegraphics[width=0.333\linewidth]{_figures/layers/unnamed-chunk-17-2}% - \includegraphics[width=0.333\linewidth]{_figures/layers/unnamed-chunk-17-3} - \end{figure} -\item - Read the help for \texttt{stat\_sum()} then use \texttt{geom\_count()} - to create a plot that shows the proportion of cars that have each - combination of \texttt{drv} and \texttt{trans}. -\end{enumerate} - -\hypertarget{sec:position}{\section{Position -adjustments}\label{sec:position}} - -\index{Position adjustments} - -Position adjustments apply minor tweaks to the position of elements -within a layer. 
Three adjustments apply primarily to bars: - -\index{Dodging} \index{Side-by-side|see{Dodging}} -\indexf{position\_dodge} \index{Stacking} \indexf{position\_stack} -\indexf{position\_fill} - -\begin{itemize} -\tightlist -\item - \texttt{position\_stack()}: stack overlapping bars (or areas) on top - of each other. -\item - \texttt{position\_fill()}: stack overlapping bars, scaling so the top - is always at 1. -\item - \texttt{position\_dodge()}: place overlapping bars (or boxplots) - side-by-side. -\end{itemize} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{dplot <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(color, }\DataTypeTok{fill =} \NormalTok{cut)) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\CommentTok{# position stack is the default for bars, so `geom_bar()` } -\CommentTok{# is equivalent to `geom_bar(position = "stack")`.} -\NormalTok{dplot +}\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} -\NormalTok{dplot +}\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{position =} \StringTok{"fill"}\NormalTok{)} -\NormalTok{dplot +}\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{position =} \StringTok{"dodge"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/layers/position-bar-1}% - \includegraphics[width=0.333\linewidth]{_figures/layers/position-bar-2}% - \includegraphics[width=0.333\linewidth]{_figures/layers/position-bar-3} -\end{figure} - -There's also a position adjustment that does nothing: -\texttt{position\_identity()}. 
The identity position adjustment isn't -useful for bars, because each bar obscures the bars behind, but there -are many geoms that don't need adjusting, like lines: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{dplot +}\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{position =} \StringTok{"identity"}\NormalTok{, }\DataTypeTok{alpha =} \DecValTok{1} \NormalTok{/}\StringTok{ }\DecValTok{2}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(color, }\DataTypeTok{colour =} \NormalTok{cut)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{cut), }\DataTypeTok{stat =} \StringTok{"count"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/layers/position-identity-1}% - \includegraphics[width=0.5\linewidth]{_figures/layers/position-identity-2} -\end{figure} - -There are three position adjustments that are primarily useful for -points: - -\begin{itemize} -\tightlist -\item - \texttt{position\_nudge()}: move points by a fixed offset. -\item - \texttt{position\_jitter()}: add a little random noise to every - position. -\item - \texttt{position\_jitterdodge()}: dodge points within groups, then add - a little random noise. -\end{itemize} - -\indexf{position\_nudge} \indexf{position\_jitter} -\indexf{position\_jitterdodge} - -Note that the way you pass parameters to position adjustments differs to -stats and geoms. 
Instead of including additional arguments in -\texttt{...}, you construct a position adjustment object, supplying -additional arguments in the call: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{position =} \StringTok{"jitter"}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{position =} \KeywordTok{position_jitter}\NormalTok{(}\DataTypeTok{width =} \FloatTok{0.05}\NormalTok{, }\DataTypeTok{height =} \FloatTok{0.5}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/layers/position-point-1}% - \includegraphics[width=0.5\linewidth]{_figures/layers/position-point-2} -\end{figure} - -This is rather verbose, so \texttt{geom\_jitter()} provides a convenient -shortcut: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_jitter}\NormalTok{(}\DataTypeTok{width =} \FloatTok{0.05}\NormalTok{, }\DataTypeTok{height =} \FloatTok{0.5}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -Continuous data typically doesn't overlap exactly, and when it does -(because of high data density) minor adjustments, like jittering, are -often insufficient to fix the problem. For this reason, position -adjustments are generally most useful for discrete data. - -\subsection{Exercises}\label{exercises-4} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - When might you use \texttt{position\_nudge()}? Read the documentation. -\item - Many position adjustments can only be used with a few geoms. For - example, you can't stack boxplots or error bars. Why not? What - properties must a geom possess in order to be stackable? 
What - properties must it possess to be dodgeable? -\item - Why might you use \texttt{geom\_jitter()} instead of - \texttt{geom\_count()}? What are the advantages and disadvantages of - each technique? -\item - When might you use a stacked area plot? What are the advantages and - disadvantages compared to a line plot? -\end{enumerate} diff --git a/book/tex/mastery.tex b/book/tex/mastery.tex deleted file mode 100644 index 1216b65b..00000000 --- a/book/tex/mastery.tex +++ /dev/null @@ -1,502 +0,0 @@ -\chapter{Mastering the grammar}\label{cha:mastery} - -\section{Introduction}\label{introduction} - -In order to unlock the full power of ggplot2, you'll need to master the -underlying grammar. By understanding the grammar, and how its components -fit together, you can create a wider range of visualizations, combine -multiple sources of data, and customise to your heart's content. - -This chapter describes the theoretical basis of ggplot2: the layered -grammar of graphics. The layered grammar is based on Wilkinson's grammar -of graphics (Wilkinson 2005), but adds a number of enhancements that -help it to be more expressive and fit seamlessly into the R environment. -The differences between the layered grammar and Wilkinson's grammar are -described fully in Wickham (2008). In this chapter you will learn a -little bit about each component of the grammar and how they all fit -together. The next chapters discuss the components in more detail, and -provide more examples of how you can use them in practice. -\index{Grammar!theory} - -The grammar makes it easier for you to iteratively update a plot, -changing a single feature at a time. The grammar is also useful because -it suggests the high-level aspects of a plot that \emph{can} be changed, -giving you a framework to think about graphics, and hopefully shortening -the distance from mind to paper. It also encourages the use of graphics -customised to a particular problem, rather than relying on specific -chart types. 
- -This chapter begins by describing in detail the process of drawing a -simple plot. \protect\hyperlink{sec:simple-plot}{Building a scatterplot} -starts with a simple scatterplot, then -\protect\hyperlink{sec:complex-plot}{Adding complexity} makes it more -complex by adding a smooth line and facetting. While working through -these examples you will be introduced to all six components of the -grammar, which are then defined more precisely in -\protect\hyperlink{sec:components}{Components of the layered grammar}. - -\hypertarget{sec:simple-plot}{\section{Building a -scatterplot}\label{sec:simple-plot}} - -How are engine size and fuel economy related? We might create a -scatterplot of engine displacement and highway mpg with points coloured -by number of cylinders: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \KeywordTok{factor}\NormalTok{(cyl))) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/mastery/unnamed-chunk-1-1} -\end{figure} - -You can create plots like this easily, but what is going on underneath -the surface? How does ggplot2 draw this plot? -\index{Scatterplot!principles of} - -\subsection{Mapping aesthetics to -data}\label{mapping-aesthetics-to-data} - -What precisely is a scatterplot? You have seen many before and have -probably even drawn some by hand. A scatterplot represents each -observation as a point, positioned according to the value of two -variables. As well as a horizontal and vertical position, each point -also has a size, a colour and a shape. These attributes are called -\textbf{aesthetics}, and are the properties that can be perceived on the -graphic. Each aesthetic can be mapped to a variable, or set to a -constant value. 
In the previous graphic, \texttt{displ} is mapped to -horizontal position, \texttt{hwy} to vertical position and \texttt{cyl} -to colour. Size and shape are not mapped to variables, but remain at -their (constant) default values. \index{Aesthetics!mapping} - -Once we have these mappings we can create a new dataset that records -this information: - -\begin{longtable}[c]{@{}rrr@{}} -\toprule -x & y & colour\tabularnewline -\midrule -\endhead -1.8 & 29 & 4\tabularnewline -1.8 & 29 & 4\tabularnewline -2.0 & 31 & 4\tabularnewline -2.0 & 30 & 4\tabularnewline -2.8 & 26 & 6\tabularnewline -2.8 & 26 & 6\tabularnewline -3.1 & 27 & 6\tabularnewline -1.8 & 26 & 4\tabularnewline -\bottomrule -\end{longtable} - -This new dataset is a result of applying the aesthetic mappings to the -original data. We can create many different types of plots using this -data. The scatterplot uses points, but were we instead to draw lines we -would get a line plot. If we used bars, we'd get a bar plot. Neither of -those examples makes sense for this data, but we could still draw them -(I've omitted the legends to save space): - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \KeywordTok{factor}\NormalTok{(cyl))) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \KeywordTok{factor}\NormalTok{(cyl))) +} -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{stat =} \StringTok{"identity"}\NormalTok{, }\DataTypeTok{position =} \StringTok{"identity"}\NormalTok{, }\DataTypeTok{fill =} \OtherTok{NA}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - 
-\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/mastery/other-geoms-1}% - \includegraphics[width=0.5\linewidth]{_figures/mastery/other-geoms-2} -\end{figure} - -In ggplot, we can produce many plots that don't make sense, yet are -grammatically valid. This is no different than English, where we can -create senseless but grammatical sentences like the angry rock barked -like a comma. - -Points, lines and bars are all examples of geometric objects, or -\textbf{geoms}. Geoms determine the ``type'' of the plot. Plots that use -a single geom are often given a special name: - -\begin{longtable}[c]{@{}lll@{}} -\toprule -Named plot & Geom & Other features\tabularnewline -\midrule -\endhead -scatterplot & point &\tabularnewline -bubblechart & point & size mapped to a variable\tabularnewline -barchart & bar &\tabularnewline -box-and-whisker plot & boxplot &\tabularnewline -line chart & line &\tabularnewline -\bottomrule -\end{longtable} - -More complex plots with combinations of multiple geoms don't have a -special name, and we have to describe them by hand. For example, this -plot overlays a per group regression line on top of a scatterplot: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \KeywordTok{factor}\NormalTok{(cyl))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/mastery/complex-plot-1} -\end{figure} - -What would you call this plot? Once you've mastered the grammar, you'll -find that many of the plots that you produce are uniquely tailored to -your problems and will no longer have special names. 
\index{Named plots} - -\subsection{Scaling}\label{scaling} - -The values in the previous table have no meaning to the computer. We -need to convert them from data units (e.g., litres, miles per gallon and -number of cylinders) to graphical units (e.g., pixels and colours) that -the computer can display. This conversion process is called -\textbf{scaling} and is performed by scales. Although these values are now -meaningful to the computer, they may not be meaningful to us: colours -are represented by a six-character hexadecimal string, sizes by a number -and shapes by an integer. These aesthetic specifications that are -meaningful to R are described in \texttt{vignette("ggplot2-specs")}. -\index{Scales!introduction} - -In this example, we have three aesthetics that need to be scaled: -horizontal position (\texttt{x}), vertical position (\texttt{y}) and -\texttt{colour}. Scaling position is easy in this example because we are -using the default linear scales. We need only a linear mapping from the -range of the data to \([0, 1]\). We use \([0, 1]\) instead of exact -pixels because the drawing system that ggplot2 uses, \textbf{grid}, -takes care of that final conversion for us. A final step determines how -the two positions (x and y) are combined to form the final location on -the plot. This is done by the coordinate system, or \textbf{coord}. In -most cases this will be Cartesian coordinates, but it might be polar -coordinates, or a spherical projection used for a map. - -The process for mapping the colour is a little more complicated, as we -have a non-numeric result: colours. However, colours can be thought of -as having three components, corresponding to the three types of -colour-detecting cells in the human eye. These three cell types give -rise to a three-dimensional colour space. Scaling then involves mapping -the data values to points in this space.
There are many ways to do this, -but here since \texttt{cyl} is a categorical variable we map values to -evenly spaced hues on the colour wheel, as shown in Figure -\ref{fig:colour-wheel}. A different mapping is used when the variable is -continuous. \index{Colour!wheel} - -\begin{figure}[htbp] - \centering - \includegraphics[width=2in]{diagrams/colour-wheel} - \caption{A colour wheel illustrating the choice of five equally spaced colours. This is the default scale for discrete variables.} - \label{fig:colour-wheel} -\end{figure} - -The result of these conversions is below. As well as aesthetics that -have been mapped to variables, we also include aesthetics that are -constant. We need these so that the aesthetics for each point are -completely specified and R can draw the plot. The points will be filled -circles (shape 19 in R) with a 1-mm diameter: - -\begin{longtable}[c]{@{}lllll@{}} -\toprule -x & y & colour & size & shape\tabularnewline -\midrule -\endhead -0.037 & 0.531 & \#F8766D & 1 & 19\tabularnewline -0.037 & 0.531 & \#F8766D & 1 & 19\tabularnewline -0.074 & 0.594 & \#F8766D & 1 & 19\tabularnewline -0.074 & 0.562 & \#F8766D & 1 & 19\tabularnewline -0.222 & 0.438 & \#00BFC4 & 1 & 19\tabularnewline -0.222 & 0.438 & \#00BFC4 & 1 & 19\tabularnewline -0.278 & 0.469 & \#00BFC4 & 1 & 19\tabularnewline -0.037 & 0.438 & \#F8766D & 1 & 19\tabularnewline -\bottomrule -\end{longtable} - -Finally, we need to render this data to create the graphical objects -that are displayed on the screen. To create a complete plot we need to -combine graphical objects from three sources: the \emph{data}, -represented by the point geom; the \emph{scales and coordinate system}, -which generate axes and legends so that we can read values from the -graph; and \emph{plot annotations}, such as the background and plot -title.
- -\hypertarget{sec:complex-plot}{\section{Adding -complexity}\label{sec:complex-plot}} - -With a simple example under our belts, let's now turn to look at this -slightly more complicated example: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +} -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~year)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/mastery/complex-1} -\end{figure} - -This plot adds three new components to the mix: facets, multiple layers -and statistics. The facets and layers expand the data structure -described above: each facet panel in each layer has its own dataset. You -can think of this as a 3d array: the panels of the facets form a 2d -grid, and the layers extend upwards in the 3rd dimension. In this case -the data in the layers is the same, but in general we can plot different -datasets on different layers. - -The smooth layer is different to the point layer because it doesn't -display the raw data, but instead displays a statistical transformation -of the data. Specifically, the smooth layer fits a smooth line through -the middle of the data. This requires an additional step in the process -described above: after mapping the data to aesthetics, the data is -passed to a statistical transformation, or \textbf{stat}, which -manipulates the data in some useful way. In this example, the stat fits -the data to a loess smoother, and then returns predictions from evenly -spaced points within the range of the data. Other useful stats include 1 -and 2d binning, group means, quantile regression and contouring. - -As well as adding an additional step to summarise the data, we also need -some extra steps when we get to the scales. 
This is because we now have -multiple datasets (for the different facets and layers) and we need to -make sure that the scales are the same across all of them. Scaling -actually occurs in three parts: transforming, training and mapping. We -haven't mentioned transformation before, but you have probably seen it -before in log-log plots. In a log-log plot, the data values are not -linearly mapped to position on the plot, but are first log-transformed. - -\begin{itemize} -\item - Scale transformation occurs before statistical transformation so that - statistics are computed on the scale-transformed data. This ensures - that a plot of \(\log(x)\) vs. \(\log(y)\) on linear scales looks the - same as \(x\) vs. \(y\) on log scales. There are many different - transformations that can be used, including taking square roots, - logarithms and reciprocals. See - \protect\hyperlink{sub:scale-position}{continuous scales} for more - details. -\item - After the statistics are computed, each scale is trained on every - dataset from all the layers and facets. The training operation - combines the ranges of the individual datasets to get the range of the - complete data. Without this step, scales could only make sense locally - and we wouldn't be able to overlay different layers because their - positions wouldn't line up. Sometimes we do want to vary position - scales across facets (but never across layers), and this is described - more fully in \protect\hyperlink{sub:controlling-scales}{controlling - scales}. -\item - Finally the scales map the data values into aesthetic values. This is - a local operation: the variables in each dataset are mapped to their - aesthetic values, producing a new dataset that can then be rendered by - the geoms. -\end{itemize} - -Figure \ref{fig:schematic} illustrates the complete process -schematically. 
- -\begin{figure}[htbp] - \centering - \includegraphics[width=4in]{diagrams/mastery-schema} - \caption{Schematic description of the plot generation process. Each square represents a layer, and this schematic represents a plot with three layers and three panels. All steps work by transforming individual data frames except for training scales, which doesn't affect the data frame and operates across all datasets simultaneously.} - \label{fig:schematic} -\end{figure} - -\hypertarget{sec:components}{\section{Components of the layered -grammar}\label{sec:components}} - -In the examples above, we have seen some of the components that make up -a plot: data and aesthetic mappings, geometric objects (geoms), -statistical transformations (stats), scales, and facetting. We have also -touched on the coordinate system. One thing we didn't mention is the -position adjustment, which deals with overlapping graphic objects. -Together, the data, mappings, stat, geom and position adjustment form a -\textbf{layer}. A plot may have multiple layers, as in the example where -we overlaid a smoothed line on a scatterplot. All together, the layered -grammar defines a plot as the combination of: \index{Grammar!components} - -\begin{itemize} -\item - A default dataset and set of mappings from variables to aesthetics. -\item - One or more layers, each composed of a geometric object, a statistical - transformation, a position adjustment, and optionally, a dataset and - aesthetic mappings. -\item - One scale for each aesthetic mapping. -\item - A coordinate system. -\item - The facetting specification. -\end{itemize} - -The following sections describe each of the higher-level components more -precisely, and point you to the parts of the book where they are -documented. - -\subsection{Layers}\label{layers} - -\textbf{Layers} are responsible for creating the objects that we -perceive on the plot. 
A layer is composed of five parts: - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\tightlist -\item - Data -\item - Aesthetic mappings. -\item - A statistical transformation (stat). -\item - A geometric object (geom). -\item - A position adjustment. -\end{enumerate} - -The properties of a layer are described in -\protect\hyperlink{cha:layers}{layers} and their uses for data -visualisation in \protect\hyperlink{cha:toolbox}{toolbox}. - -\subsection{Scales}\label{sub:scales} - -A \textbf{scale} controls the mapping from data to aesthetic attributes, -and we need a scale for every aesthetic used on a plot. Each scale -operates across all the data in the plot, ensuring a consistent mapping -from data to aesthetics. Some examples are shown in Figure -\ref{fig:scale-legends}. - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/mastery/scale-legends-1} - \caption{Examples of legends from four different scales. From left to right: continuous variable mapped to size, and to colour, discrete variable mapped to shape, and to colour. The ordering of scales seems upside-down, but this matches the labelling of the $y$-axis: small values occur at the bottom.} - \label{fig:scale-legends} -\end{figure} - -A scale is a function and its inverse, along with a set of parameters. -For example, the colour gradient scale maps a segment of the real line -to a path through a colour space. The parameters of the function define -whether the path is linear or curved, which colour space to use (e.g., -LUV or RGB), and the colours at the start and end. - -The inverse function is used to draw a guide so that you can read values -from the graph. Guides are either axes (for position scales) or legends -(for everything else). Most mappings have a unique inverse (i.e., the -mapping function is one-to-one), but many do not. A unique inverse makes -it possible to recover the original data, but this is not always -desirable if we want to focus attention on a single aspect. 
- -For more details, see \protect\hyperlink{cha:scales}{scales chapter}. - -\subsection{Coordinate system}\label{sub:coordinate-systems} - -A coordinate system, or \textbf{coord} for short, maps the position of -objects onto the plane of the plot. Position is often specified by two -coordinates \((x, y)\), but potentially could be three or more (although -this is not implemented in ggplot2). The Cartesian coordinate system is -the most common coordinate system for two dimensions, while polar -coordinates and various map projections are used less frequently. - -Coordinate systems affect all position variables simultaneously and -differ from scales in that they also change the appearance of the -geometric objects. For example, in polar coordinates, bar geoms look -like segments of a circle. Additionally, scaling is performed before -statistical transformation, while coordinate transformations occur -afterward. The consequences of this are shown in -\protect\hyperlink{sub:coord-non-linear}{coordinate transformations}. - -Coordinate systems control how the axes and grid lines are drawn. Figure -\ref{fig:coord} illustrates three different types of coordinate systems. -Very little advice is available for drawing these for non-Cartesian -coordinate systems, so a lot of work needs to be done to produce -polished output. See \protect\hyperlink{sec:coord}{coordinate systems} -for more details. - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/mastery/coord-1}% - \includegraphics[width=0.333\linewidth]{_figures/mastery/coord-2}% - \includegraphics[width=0.333\linewidth]{_figures/mastery/coord-3} - \caption{Examples of axes and grid lines for three coordinate systems: Cartesian, semi-log and polar. 
The polar coordinate system illustrates the difficulties associated with non-Cartesian coordinates: it is hard to draw the axes well.} - \label{fig:coord} -\end{figure} - -\subsection{Facetting}\label{sub:intro-facetting} - -There is also another thing that turns out to be sufficiently useful -that we should include it in our general framework: facetting, a general -case of conditioned or trellised plots. This makes it easy to create -small multiples, each showing a different subset of the whole dataset. -This is a powerful tool when investigating whether patterns hold across -all conditions. The facetting specification describes which variables -should be used to split up the data, and whether position scales should -be free or constrained. Facetting is described in -\protect\hyperlink{cha:position}{position}. - -\section{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - One of the best ways to get a handle on how the grammar works is to - apply it to the analysis of existing graphics. For each of the - graphics listed below, write down the components of the graphic. Don't - worry if you don't know what the corresponding functions in ggplot2 - are called (or if they even exist!), instead focussing on recording - the key elements of a plot so you could communicate it to someone - else. - - \begin{enumerate} - \def\labelenumii{\arabic{enumii}.} - \item - ``Napoleon's march'' by Charles John Minard: - \url{http://www.datavis.ca/gallery/re-minard.php} - \item - ``Where the Heat and the Thunder Hit Their Shots'', by Jeremy White, - Joe Ward, and Matthew Ericson at The New York Times. - \url{http://nyti.ms/1duzTvY} - \item - ``London Cycle Hire Journeys'', by James Cheshire. 
- \url{http://bit.ly/1S2cyRy} - \item - The Pew Research Center's favorite data visualizations of 2014: - \url{http://pewrsr.ch/1KZSSN6} - \item - ``The Tony's Have Never Been so Dominated by Women'', by Joanna Kao - at FiveThirtyEight: \url{http://53eig.ht/1cJRCyG}. - \item - ``In Climbing Income Ladder, Location Matters'' by Mike Bostock, - Shan Carter, Amanda Cox, Matthew Ericson, Josh Keller, Alicia - Parlapiano, Kevin Quealy and Josh Williams at the New York Times: - \url{http://nyti.ms/1S2dJQT} - \item - ``Dissecting a Trailer: The Parts of the Film That Make the Cut'', - by Shan Carter, Amanda Cox, and Mike Bostock at the New York Times: - \url{http://nyti.ms/1KTJQOE} - \end{enumerate} -\end{enumerate} - -\section*{References}\label{references} -\addcontentsline{toc}{section}{References} - -\hypertarget{refs}{} -\hypertarget{ref-wickham:2008}{} -Wickham, Hadley. 2008. ``Practical Tools for Exploring Data and -Models.'' PhD thesis, Iowa State University. -\url{http://had.co.nz/thesis}. - -\hypertarget{ref-wilkinson:2006}{} -Wilkinson, Leland. 2005. \emph{The Grammar of Graphics}. 2nd ed. -Statistics and Computing. Springer. diff --git a/book/tex/modelling.tex b/book/tex/modelling.tex deleted file mode 100644 index 9e0a2b1c..00000000 --- a/book/tex/modelling.tex +++ /dev/null @@ -1,867 +0,0 @@ -\chapter{Modelling for visualisation}\label{cha:modelling} - -\section{Introduction}\label{introduction} - -Modelling is an essential tool for visualisation. There are two -particularly strong connections between modelling and visualisation that -I want to explore in this chapter: \index{Modelling} - -\begin{itemize} -\item - Using models as a tool to remove obvious patterns in your plots. This - is useful because strong patterns mask subtler effects. Often the - strongest effects are already known and expected, and removing them - allows you to see surprises more easily. -\item - Other times you have a lot of data, too much to show on a handful of - plots.
Models can be a powerful tool for summarising data so that you - get a higher level view. -\end{itemize} - -In this chapter, I'm going to focus on the use of linear models to -achieve these goals. Linear models are a basic, but powerful, tool of -statistics, and I recommend that everyone serious about visualisation -learns at least the basics of how to use them. To this end, I highly -recommend two books by Julian J. Faraway: - -\begin{itemize} -\tightlist -\item - Linear Models with R \url{http://amzn.com/1439887330} -\item - Extending the Linear Model with R \url{http://amzn.com/158488424X} -\end{itemize} - -These books cover some of the theory of linear models, but are pragmatic -and focussed on how to actually use linear models (and their extensions) -in R. \index{Linear models} - -There are many other modelling tools, which I don't have the space to -show. If you understand how linear models can help improve your -visualisations, you should be able to translate the basic idea to other -families of models. This chapter just scratches the surface of what you -can do. But hopefully it reinforces how visualisation can combine with -modelling to help you build a powerful data analysis toolbox. For more -ideas, check out Wickham, Cook, and Hofmann (2015). - -This chapter only scratches the surface of the intersection between -visualisation and modelling. In my opinion, mastering the combination of -visualisations and models is key to being an effective data scientist. -Unfortunately most books (like this one!) only focus on either -visualisation or modelling, but not both. There's a lot of interesting -work to be done. - -\section{Removing trend}\label{sub:trend} - -So far our analysis of the diamonds data has been plagued by the -powerful relationship between size and price. It makes it very difficult -to see the impact of cut, colour and clarity because higher quality -diamonds tend to be smaller, and hence cheaper. This challenge is often -called confounding.
We can use a linear model to remove the effect of -size on price. Instead of looking at the raw price, we can look at the -relative price: how valuable is this diamond relative to the average -diamond of the same size. \index{Removing trend} - -To get started, we'll focus on diamonds of size two carats or less (96\% -of the dataset). This avoids some incidental problems that you can -explore in the exercises if you're interested. We'll also create two new -variables: log price and log carat. These variables are useful because -they produce a plot with a strong linear trend. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{diamonds2 <-}\StringTok{ }\NormalTok{diamonds %>%}\StringTok{ } -\StringTok{ }\KeywordTok{filter}\NormalTok{(carat <=}\StringTok{ }\DecValTok{2}\NormalTok{) %>%} -\StringTok{ }\KeywordTok{mutate}\NormalTok{(} - \DataTypeTok{lcarat =} \KeywordTok{log2}\NormalTok{(carat),} - \DataTypeTok{lprice =} \KeywordTok{log2}\NormalTok{(price)} - \NormalTok{)} -\NormalTok{diamonds2} -\CommentTok{#> Source: local data frame [52,051 x 12]} -\CommentTok{#> } -\CommentTok{#> carat cut color clarity depth table price x y} -\CommentTok{#> (dbl) (fctr) (fctr) (fctr) (dbl) (dbl) (int) (dbl) (dbl)} -\CommentTok{#> 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98} -\CommentTok{#> 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84} -\CommentTok{#> 3 0.23 Good E VS1 56.9 65 327 4.05 4.07} -\CommentTok{#> 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23} -\CommentTok{#> 5 0.31 Good J SI2 63.3 58 335 4.34 4.35} -\CommentTok{#> 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96} -\CommentTok{#> .. ... ... ... ... ... ... ... ... 
...} -\CommentTok{#> Variables not shown: z (dbl), lcarat (dbl), lprice (dbl)} - -\KeywordTok{ggplot}\NormalTok{(diamonds2, }\KeywordTok{aes}\NormalTok{(lcarat, lprice)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{, }\DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{, }\DataTypeTok{size =} \DecValTok{2}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"yellow"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/modelling/unnamed-chunk-1-1} -\end{figure} - -In the graphic we used \texttt{geom\_smooth()} to overlay the line of -best fit to the data. We can replicate this outside of ggplot2 by -fitting a linear model with \texttt{lm()}. This allows us to find out -the slope and intercept of the line: \indexf{lm} \indexf{coef} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mod <-}\StringTok{ }\KeywordTok{lm}\NormalTok{(lprice ~}\StringTok{ }\NormalTok{lcarat, }\DataTypeTok{data =} \NormalTok{diamonds2)} -\KeywordTok{coef}\NormalTok{(}\KeywordTok{summary}\NormalTok{(mod))} -\CommentTok{#> Estimate Std. Error t value Pr(>|t|)} -\CommentTok{#> (Intercept) 12.2 0.00211 5789 0} -\CommentTok{#> lcarat 1.7 0.00208 816 0} -\end{Highlighting} -\end{Shaded} - -If you're familiar with linear models, you might want to interpret those -coefficients: \(\log_2(price) = 12.2 + 1.7 \cdot \log_2(carat)\), which -implies \(price = 4900 \cdot carat ^ {1.7}\). Interpreting those -coefficients certainly is useful, but even if you don't understand them, -the model can still be useful. We can use it to subtract the trend away -by looking at the residuals: the price of each diamond minus its -predicted price, based on weight alone. Geometrically, the residuals are -the vertical distance between each point and the line of best fit. 
They -tell us the price relative to the ``average'' diamond of that size. -\indexf{resid} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{diamonds2 <-}\StringTok{ }\NormalTok{diamonds2 %>%}\StringTok{ }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{rel_price =} \KeywordTok{resid}\NormalTok{(mod))} -\KeywordTok{ggplot}\NormalTok{(diamonds2, }\KeywordTok{aes}\NormalTok{(carat, rel_price)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/modelling/unnamed-chunk-3-1} -\end{figure} - -A relative price of zero means that the diamond was at the average -price; positive means that it's more expensive than expected (based on -its size), and negative means that it's cheaper than expected. - -Interpreting the values precisely is a little tricky here because we've -log-transformed price. The residuals give the absolute difference -(\(x - expected\)), but here we have -\(\log_2(price) - \log_2(expected price)\), or equivalently -\(\log_2(price / expected price)\). If we ``back-transform'' to the -original scale by applying the opposite transformation (\(2 ^ x\)) we -get \(price / expected price\). This makes the values more -interpretable, at the cost of the nice symmetry property of the logged -values (i.e.~both relatively cheaper and relatively more expensive -diamonds have the same range). 
We can make a little table to help -interpret the values: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{xgrid <-}\StringTok{ }\KeywordTok{seq}\NormalTok{(-}\DecValTok{2}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DataTypeTok{by =} \DecValTok{1}\NormalTok{/}\DecValTok{3}\NormalTok{)} -\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{logx =} \NormalTok{xgrid, }\DataTypeTok{x =} \KeywordTok{round}\NormalTok{(}\DecValTok{2} \NormalTok{^}\StringTok{ }\NormalTok{xgrid, }\DecValTok{2}\NormalTok{))} -\CommentTok{#> logx x} -\CommentTok{#> 1 -2.000 0.25} -\CommentTok{#> 2 -1.667 0.31} -\CommentTok{#> 3 -1.333 0.40} -\CommentTok{#> 4 -1.000 0.50} -\CommentTok{#> 5 -0.667 0.63} -\CommentTok{#> 6 -0.333 0.79} -\CommentTok{#> 7 0.000 1.00} -\CommentTok{#> 8 0.333 1.26} -\CommentTok{#> 9 0.667 1.59} -\CommentTok{#> 10 1.000 2.00} -\end{Highlighting} -\end{Shaded} - -This table illustrates why we used \texttt{log2()} rather than -\texttt{log()}: a change of 1 unit on the logged scale corresponds to -a doubling on the original scale. For example, a \texttt{rel\_price} of --1 means that it's half of the expected price; a relative price of 1 -means that it's twice the expected price. \index{Log!transform} - -Let's use both price and relative price to see how colour and cut affect -the value of a diamond.
We'll compute the average price and average -relative price for each combination of colour and cut: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{color_cut <-}\StringTok{ }\NormalTok{diamonds2 %>%}\StringTok{ } -\StringTok{ }\KeywordTok{group_by}\NormalTok{(color, cut) %>%} -\StringTok{ }\KeywordTok{summarise}\NormalTok{(} - \DataTypeTok{price =} \KeywordTok{mean}\NormalTok{(price), } - \DataTypeTok{rel_price =} \KeywordTok{mean}\NormalTok{(rel_price)} - \NormalTok{)} -\NormalTok{color_cut} -\CommentTok{#> Source: local data frame [35 x 4]} -\CommentTok{#> Groups: color [?]} -\CommentTok{#> } -\CommentTok{#> color cut price rel_price} -\CommentTok{#> (fctr) (fctr) (dbl) (dbl)} -\CommentTok{#> 1 D Fair 3939 -0.0755} -\CommentTok{#> 2 D Good 3309 -0.0472} -\CommentTok{#> 3 D Very Good 3368 0.1038} -\CommentTok{#> 4 D Premium 3513 0.1093} -\CommentTok{#> 5 D Ideal 2595 0.2173} -\CommentTok{#> 6 E Fair 3516 -0.1720} -\CommentTok{#> .. ... ... ... ...} -\end{Highlighting} -\end{Shaded} - -If we look at price, it's hard to see how the quality of the diamond -affects the price. The lowest quality diamonds (fair cut with colour J) -have the highest average value! This is because those diamonds also tend -to be larger: size and quality are confounded. 
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(color_cut, }\KeywordTok{aes}\NormalTok{(color, price)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{cut), }\DataTypeTok{colour =} \StringTok{"grey80"}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{cut))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/modelling/unnamed-chunk-6-1} -\end{figure} - -If however, we plot the relative price, you see the pattern that you -expect: as the quality of the diamonds decreases, the relative price -decreases. The worst quality diamond is 0.61x (\(2 ^ {-0.7}\)) the price -of an ``average'' diamond. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(color_cut, }\KeywordTok{aes}\NormalTok{(color, rel_price)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{cut), }\DataTypeTok{colour =} \StringTok{"grey80"}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{cut))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/modelling/unnamed-chunk-7-1} -\end{figure} - -This technique can be employed in a wide range of situations. Wherever -you can explicitly model a strong pattern that you see in a plot, it's -worthwhile to use a model to remove that strong pattern so that you can -see what interesting trends remain. - -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - What happens if you repeat the above analysis with all diamonds? (Not - just all diamonds with two or fewer carats). 
What does the strange - geometry of \texttt{log(carat)} vs relative price represent? What does - the diagonal line without any points represent? -\item - I made an unsupported assertion that lower-quality diamonds tend to be - larger. Support my claim with a plot. -\item - Can you create a plot that simultaneously shows the effect of colour, - cut, and clarity on relative price? If there's too much information to - show on one plot, think about how you might create a sequence of plots - to convey the same message. -\item - How do depth and table relate to the relative price? -\end{enumerate} - -\section{Texas housing data}\label{texas-housing-data} - -We'll continue to explore the connection between modelling and -visualisation with the \texttt{txhousing} dataset: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{txhousing} -\CommentTok{#> Source: local data frame [8,034 x 9]} -\CommentTok{#> } -\CommentTok{#> city year month sales volume median listings inventory} -\CommentTok{#> (chr) (int) (int) (dbl) (dbl) (dbl) (dbl) (dbl)} -\CommentTok{#> 1 Abilene 2000 1 72 5380000 71400 701 6.3} -\CommentTok{#> 2 Abilene 2000 2 98 6505000 58700 746 6.6} -\CommentTok{#> 3 Abilene 2000 3 130 9285000 58100 784 6.8} -\CommentTok{#> 4 Abilene 2000 4 98 9730000 68600 785 6.9} -\CommentTok{#> 5 Abilene 2000 5 141 10590000 67300 794 6.8} -\CommentTok{#> 6 Abilene 2000 6 156 13910000 66900 780 6.6} -\CommentTok{#> .. ... ... ... ... ... ... ... ...} -\CommentTok{#> Variables not shown: date (dbl)} -\end{Highlighting} -\end{Shaded} - -This data was collected by the Real Estate Center at Texas A\&M -University, \url{http://recenter.tamu.edu/Data/hs/}. The data contains -information about 46 Texas cities, recording the number of house sales -(\texttt{sales}), the total volume of sales (\texttt{volume}), the -\texttt{average} and \texttt{median} sale prices, the number of houses -listed for sale (\texttt{listings}) and the number of months inventory -(\texttt{inventory}). 
Data is recorded monthly from Jan 2000 to Apr -2015, 187 entries for each city. -\index{Data!txhousing@\texttt{txhousing}} - -We're going to explore how sales have varied over time for each city as -it shows some interesting trends and poses some interesting challenges. -Let's start with an overview: a time series of sales for each city: -\index{Data!longitudinal} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(txhousing, }\KeywordTok{aes}\NormalTok{(date, sales)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{city), }\DataTypeTok{alpha =} \DecValTok{1}\NormalTok{/}\DecValTok{2}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/modelling/unnamed-chunk-9-1} -\end{figure} - -Two factors make it hard to see the long-term trend in this plot: - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - The range of sales varies over multiple orders of magnitude. The - biggest city, Houston, averages over \textasciitilde{}4000 sales per - month; the smallest city, San Marcos, only averages - \textasciitilde{}20 sales per month. -\item - There is a strong seasonal trend: sales are much higher in the summer - than in the winter. 
-\end{enumerate} - -We can fix the first problem by plotting log sales: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(txhousing, }\KeywordTok{aes}\NormalTok{(date, }\KeywordTok{log}\NormalTok{(sales))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{city), }\DataTypeTok{alpha =} \DecValTok{1}\NormalTok{/}\DecValTok{2}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/modelling/unnamed-chunk-10-1} -\end{figure} - -We can fix the second problem using the same technique we used for -removing the trend in the diamonds data: we'll fit a linear model and -look at the residuals. This time we'll use a categorical predictor to -remove the month effect. First we check that the technique works by -applying it to a single city. It's always a good idea to start simple so -that if something goes wrong you can more easily pinpoint the problem. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{abilene <-}\StringTok{ }\NormalTok{txhousing %>%}\StringTok{ }\KeywordTok{filter}\NormalTok{(city ==}\StringTok{ "Abilene"}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(abilene, }\KeywordTok{aes}\NormalTok{(date, }\KeywordTok{log}\NormalTok{(sales))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} - -\NormalTok{mod <-}\StringTok{ }\KeywordTok{lm}\NormalTok{(}\KeywordTok{log}\NormalTok{(sales) ~}\StringTok{ }\KeywordTok{factor}\NormalTok{(month), }\DataTypeTok{data =} \NormalTok{abilene)} -\NormalTok{abilene$rel_sales <-}\StringTok{ }\KeywordTok{resid}\NormalTok{(mod)} -\KeywordTok{ggplot}\NormalTok{(abilene, }\KeywordTok{aes}\NormalTok{(date, rel_sales)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/modelling/unnamed-chunk-11-1}% - \includegraphics[width=0.5\linewidth]{_figures/modelling/unnamed-chunk-11-2} -\end{figure} - -We can apply this transformation to every city with \texttt{group\_by()} -and \texttt{mutate()}. Note the use of \texttt{na.action\ =\ na.exclude} -argument to \texttt{lm()}. Counterintuitively this ensures that missing -values in the input are matched with missing values in the output -predictions and residuals. Without this argument, missing values are -just dropped, and the residuals don't line up with the inputs. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{deseas <-}\StringTok{ }\NormalTok{function(x, month) \{} - \KeywordTok{resid}\NormalTok{(}\KeywordTok{lm}\NormalTok{(x ~}\StringTok{ }\KeywordTok{factor}\NormalTok{(month), }\DataTypeTok{na.action =} \NormalTok{na.exclude))} -\NormalTok{\}} - -\NormalTok{txhousing <-}\StringTok{ }\NormalTok{txhousing %>%}\StringTok{ } -\StringTok{ }\KeywordTok{group_by}\NormalTok{(city) %>%}\StringTok{ } -\StringTok{ }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{rel_sales =} \KeywordTok{deseas}\NormalTok{(}\KeywordTok{log}\NormalTok{(sales), month))} -\end{Highlighting} -\end{Shaded} - -With this data in hand, we can re-plot the data. Now that we have -log-transformed the data and removed the strong seasonal effects we can -see there is a strong common pattern: a consistent increase from -2000-2007, a drop until 2010 (with quite some noise), and then a gradual -rebound. To make that more clear, I included a summary line that shows -the mean relative sales across all cities. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(txhousing, }\KeywordTok{aes}\NormalTok{(date, rel_sales)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{city), }\DataTypeTok{alpha =} \DecValTok{1}\NormalTok{/}\DecValTok{5}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\DataTypeTok{stat =} \StringTok{"summary"}\NormalTok{, }\DataTypeTok{fun.y =} \StringTok{"mean"}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/modelling/unnamed-chunk-13-1} -\end{figure} - -(Note that removing the seasonal effect also removed the intercept - we -see the trend for each city relative to its average number of sales.) 
- -\subsection{Exercises}\label{exercises-1} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - The final plot shows a lot of short-term noise in the overall trend. - How could you smooth this further to focus on long-term changes? -\item - If you look closely (e.g. \texttt{+\ xlim(2008,\ 2012)}) at the - long-term trend you'll notice a weird pattern in 2009-2011. It looks - like there was a big dip in 2010. Is this dip ``real''? (i.e.~can you - spot it in the original data) -\item - What other variables in the TX housing data show strong seasonal - effects? Does this technique help to remove them? -\item - Not all the cities in this data set have complete time series. Use - your dplyr skills to figure out how much data each city is missing. - Display the results with a visualisation. -\item - Replicate the computation that \texttt{stat\_summary()} did with dplyr - so you can plot the data ``by hand''. -\end{enumerate} - -\section{Visualising models}\label{sub:modelvis} - -The previous examples used the linear model just as a tool for removing -trend: we fit the model and immediately threw it away. We didn't care -about the model itself, just what it could do for us. But the models -themselves contain useful information and if we keep them around, there -are many new problems that we can solve: - -\begin{itemize} -\item - We might be interested in cities where the model didn't fit well: a - poorly fitting model suggests that there isn't much of a seasonal - pattern, which contradicts our implicit hypothesis that all cities - share a similar pattern. -\item - The coefficients themselves might be interesting. In this case, - looking at the coefficients will show us how the seasonal pattern - varies between cities. -\item - We may want to dive into the details of the model itself, and see - exactly what it says about each observation. For this data, it might - help us find suspicious data points that might reflect data entry - errors. 
-\end{itemize} - -To take advantage of this data, we need to store the models. We can do -this using a new dplyr verb: \texttt{do()}. It allows us to store the -result of arbitrary computation in a column. Here we'll use it to store -that linear model: \indexf{do} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{models <-}\StringTok{ }\NormalTok{txhousing %>%}\StringTok{ } -\StringTok{ }\KeywordTok{group_by}\NormalTok{(city) %>%} -\StringTok{ }\KeywordTok{do}\NormalTok{(}\DataTypeTok{mod =} \KeywordTok{lm}\NormalTok{(} - \KeywordTok{log2}\NormalTok{(sales) ~}\StringTok{ }\KeywordTok{factor}\NormalTok{(month), } - \DataTypeTok{data =} \NormalTok{., } - \DataTypeTok{na.action =} \NormalTok{na.exclude} - \NormalTok{))} -\NormalTok{models} -\CommentTok{#> Source: local data frame [46 x 2]} -\CommentTok{#> Groups: } -\CommentTok{#> } -\CommentTok{#> city mod} -\CommentTok{#> (chr) (chr)} -\CommentTok{#> 1 Abilene } -\CommentTok{#> 2 Amarillo } -\CommentTok{#> 3 Arlington } -\CommentTok{#> 4 Austin } -\CommentTok{#> 5 Bay Area } -\CommentTok{#> 6 Beaumont } -\CommentTok{#> .. ... ...} -\end{Highlighting} -\end{Shaded} - -There are two important things to note in this code: - -\begin{itemize} -\item - \texttt{do()} creates a new column called \texttt{mod.} This is a - special type of column: instead of containing an atomic vector (a - logical, integer, numeric, or character) like usual, it's a list. - Lists are R's most flexible data structure and can hold anything, - including linear models. -\item - \texttt{.} is a special pronoun used by \texttt{do()}. It refers to - the ``current'' data frame. In this example, \texttt{do()} fits the - model 46 times (once for each city), each time replacing \texttt{.} - with the data for one city. \indexc{.} -\end{itemize} - -If you're an experienced modeller, you might wonder why I didn't fit one -model to all cities simultaneously. That's a great next step, but it's -often useful to start off simple. 
Once we have a model that works for -each city individually, you can figure out how to generalise it to fit -all cities simultaneously. - -To visualise these models, we'll turn them into tidy data frames. We'll -do that with the \textbf{broom} package by David Robinson. \index{broom} -\index{Tidy models} \index{Model data} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{library}\NormalTok{(broom)} -\end{Highlighting} -\end{Shaded} - -Broom provides three key verbs, each corresponding to one of the -challenges outlined above: - -\begin{itemize} -\item - \texttt{glance()} extracts \textbf{model}-level summaries with one row - of data for each model. It contains summary statistics like the - \(R^2\) and degrees of freedom. -\item - \texttt{tidy()} extracts \textbf{coefficient}-level summaries with one - row of data for each coefficient in each model. It contains - information about individual coefficients like their estimate and - standard error. -\item - \texttt{augment()} extracts \textbf{observation}-level summaries with - one row of data for each observation in each model. It includes - variables like the residual and influence metrics useful for - diagnosing outliers. -\end{itemize} - -We'll learn more about each of these functions in the following three -sections. 
- -\section{Model-level summaries}\label{model-level-summaries} - -We'll begin by looking at how well the model fit to each city with -\texttt{glance()}: \indexf{glance} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{model_sum <-}\StringTok{ }\NormalTok{models %>%}\StringTok{ }\KeywordTok{glance}\NormalTok{(mod)} -\NormalTok{model_sum} -\CommentTok{#> Source: local data frame [46 x 12]} -\CommentTok{#> Groups: city [46]} -\CommentTok{#> } -\CommentTok{#> city r.squared adj.r.squared sigma statistic p.value} -\CommentTok{#> (chr) (dbl) (dbl) (dbl) (dbl) (dbl)} -\CommentTok{#> 1 Abilene 0.530 0.500 0.282 17.9 1.50e-23} -\CommentTok{#> 2 Amarillo 0.449 0.415 0.302 13.0 7.41e-18} -\CommentTok{#> 3 Arlington 0.513 0.483 0.267 16.8 2.75e-22} -\CommentTok{#> 4 Austin 0.487 0.455 0.310 15.1 2.04e-20} -\CommentTok{#> 5 Bay Area 0.555 0.527 0.265 19.9 1.45e-25} -\CommentTok{#> 6 Beaumont 0.430 0.395 0.275 12.0 1.18e-16} -\CommentTok{#> .. ... ... ... ... ... ...} -\CommentTok{#> df} -\CommentTok{#> (int)} -\CommentTok{#> 1 12} -\CommentTok{#> 2 12} -\CommentTok{#> 3 12} -\CommentTok{#> 4 12} -\CommentTok{#> 5 12} -\CommentTok{#> 6 12} -\CommentTok{#> .. ...} -\CommentTok{#> Variables not shown: logLik (dbl), AIC (dbl), BIC (dbl), deviance} -\CommentTok{#> (dbl), df.residual (int)} -\end{Highlighting} -\end{Shaded} - -This creates a variable with one row for each city, and variables that -either summarise complexity (e.g. \texttt{df}) or fit (e.g. -\texttt{r.squared}, \texttt{p.value}, \texttt{AIC}). Since all the -models we fit have the same complexity (12 terms: one for each month), -we'll focus on the model fit summaries. \(R^2\) is a reasonable place to -start because it's well known. 
We can use a dot plot to see the -variation across cities: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(model_sum, }\KeywordTok{aes}\NormalTok{(r.squared, }\KeywordTok{reorder}\NormalTok{(city, r.squared))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/modelling/unnamed-chunk-17-1} -\end{figure} - -It's hard to picture exactly what those values of \(R^2\) mean, so it's -helpful to pick out a few exemplars. The following code extracts and -plots out the three cities with the highest and lowest \(R^2\): - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{top3 <-}\StringTok{ }\KeywordTok{c}\NormalTok{(}\StringTok{"Bryan-College Station"}\NormalTok{, }\StringTok{"Lubbock"}\NormalTok{, }\StringTok{"NE Tarrant County"}\NormalTok{)} -\NormalTok{bottom3 <-}\StringTok{ }\KeywordTok{c}\NormalTok{(}\StringTok{"McAllen"}\NormalTok{, }\StringTok{"Brownsville"}\NormalTok{, }\StringTok{"Harlingen"}\NormalTok{)} -\NormalTok{extreme <-}\StringTok{ }\NormalTok{txhousing %>%}\StringTok{ }\KeywordTok{ungroup}\NormalTok{() %>%} -\StringTok{ }\KeywordTok{filter}\NormalTok{(city %in%}\StringTok{ }\KeywordTok{c}\NormalTok{(top3, bottom3), !}\KeywordTok{is.na}\NormalTok{(sales)) %>%} -\StringTok{ }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{city =} \KeywordTok{factor}\NormalTok{(city, }\KeywordTok{c}\NormalTok{(top3, bottom3)))} - -\KeywordTok{ggplot}\NormalTok{(extreme, }\KeywordTok{aes}\NormalTok{(month, }\KeywordTok{log}\NormalTok{(sales))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{year)) +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~city)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/modelling/unnamed-chunk-18-1} -\end{figure} - -The cities 
with low \(R^2\) have weaker seasonal patterns and more -variation between years. The data for Harlingen seems particularly -noisy. - -\subsection{Exercises}\label{exercises-2} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Do your conclusions change if you use a different measurement of model - fit like AIC or deviance? Why/why not? -\item - One possible hypothesis that explains why McAllen, Harlingen and - Brownsville have lower \(R^2\) is that they're smaller towns so there - are fewer sales and more noise. Confirm or refute this hypothesis. -\item - McAllen, Harlingen and Brownsville seem to have much more year-to-year - variation than Bryan-College Station, Lubbock, and NE Tarrant County. - How does the model change if you also include a linear trend for year? - (i.e. \texttt{log(sales)\ \textasciitilde{}\ factor(month)\ +\ year}). -\item - Create a faceted plot that shows the seasonal patterns for all - cities.\\ - Order the facets by the \(R^2\) for the city. -\end{enumerate} - -\section{Coefficient-level summaries}\label{coefficient-level-summaries} - -The model fit summaries suggest that there are some important -differences in seasonality between the different cities. 
Let's dive into -those differences by using \texttt{tidy()} to extract detail about each -individual coefficient: \indexf{tidy} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{coefs <-}\StringTok{ }\NormalTok{models %>%}\StringTok{ }\KeywordTok{tidy}\NormalTok{(mod)} -\NormalTok{coefs} -\CommentTok{#> Source: local data frame [552 x 6]} -\CommentTok{#> Groups: city [46]} -\CommentTok{#> } -\CommentTok{#> city term estimate std.error statistic p.value} -\CommentTok{#> (chr) (chr) (dbl) (dbl) (dbl) (dbl)} -\CommentTok{#> 1 Abilene (Intercept) 6.542 0.0704 92.88 7.90e-151} -\CommentTok{#> 2 Abilene factor(month)2 0.354 0.0996 3.55 4.91e-04} -\CommentTok{#> 3 Abilene factor(month)3 0.675 0.0996 6.77 1.83e-10} -\CommentTok{#> 4 Abilene factor(month)4 0.749 0.0996 7.52 2.76e-12} -\CommentTok{#> 5 Abilene factor(month)5 0.916 0.0996 9.20 1.06e-16} -\CommentTok{#> 6 Abilene factor(month)6 1.002 0.0996 10.06 4.37e-19} -\CommentTok{#> .. ... ... ... ... ... ...} -\end{Highlighting} -\end{Shaded} - -We're more interested in the month effect, so we'll do a little extra -tidying to only look at the month coefficients, and then to extract the -month value into a numeric variable: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{months <-}\StringTok{ }\NormalTok{coefs %>%} -\StringTok{ }\KeywordTok{filter}\NormalTok{(}\KeywordTok{grepl}\NormalTok{(}\StringTok{"factor"}\NormalTok{, term)) %>%} -\StringTok{ }\NormalTok{tidyr::}\KeywordTok{extract}\NormalTok{(term, }\StringTok{"month"}\NormalTok{, }\StringTok{"(}\CharTok{\textbackslash{}\textbackslash{}}\StringTok{d+)"}\NormalTok{, }\DataTypeTok{convert =} \OtherTok{TRUE}\NormalTok{)} -\NormalTok{months} -\CommentTok{#> Source: local data frame [506 x 6]} -\CommentTok{#> Groups: city [46]} -\CommentTok{#> } -\CommentTok{#> city month estimate std.error statistic p.value} -\CommentTok{#> (chr) (int) (dbl) (dbl) (dbl) (dbl)} -\CommentTok{#> 1 Abilene 2 0.354 0.0996 3.55 4.91e-04} -\CommentTok{#> 2 Abilene 3 0.675 0.0996 
6.77 1.83e-10} -\CommentTok{#> 3 Abilene 4 0.749 0.0996 7.52 2.76e-12} -\CommentTok{#> 4 Abilene 5 0.916 0.0996 9.20 1.06e-16} -\CommentTok{#> 5 Abilene 6 1.002 0.0996 10.06 4.37e-19} -\CommentTok{#> 6 Abilene 7 0.954 0.0996 9.58 9.81e-18} -\CommentTok{#> .. ... ... ... ... ... ...} -\end{Highlighting} -\end{Shaded} - -This is a common pattern. You need to use your data tidying skills at -many points in an analysis. Once you have the correct tidy dataset, -creating the plot is usually easy. Here we'll put month on the x-axis, -estimate on the y-axis, and draw one line for each city. I've -back-transformed to make the coefficients more interpretable: these are -now ratios of sales compared to January. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(months, }\KeywordTok{aes}\NormalTok{(month, }\DecValTok{2} \NormalTok{^}\StringTok{ }\NormalTok{estimate)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{city))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/modelling/unnamed-chunk-21-1} -\end{figure} - -The pattern seems similar across the cities. The main difference is the -strength of the seasonal effect. 
Let's pull that out and plot it: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{coef_sum <-}\StringTok{ }\NormalTok{months %>%} -\StringTok{ }\KeywordTok{group_by}\NormalTok{(city) %>%} -\StringTok{ }\KeywordTok{summarise}\NormalTok{(}\DataTypeTok{max =} \KeywordTok{max}\NormalTok{(estimate))} -\KeywordTok{ggplot}\NormalTok{(coef_sum, }\KeywordTok{aes}\NormalTok{(}\DecValTok{2} \NormalTok{^}\StringTok{ }\NormalTok{max, }\KeywordTok{reorder}\NormalTok{(city, max))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/modelling/unnamed-chunk-22-1} -\end{figure} - -The cities with the strongest seasonal effect are College Station and -San Marcos (both college towns) and Galveston and South Padre Island -(beach cities). It makes sense that these cities would have very strong -seasonal effects. - -\subsection{Exercises}\label{exercises-3} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Pull out the three cities with highest and lowest seasonal effect. - Plot their coefficients. -\item - How does strength of seasonal effect relate to the \(R^2\) for the - model? Answer with a plot. -\item - You should be extra cautious when your results agree with your prior - beliefs. How can you confirm or refute my hypothesis about the causes - of strong seasonal patterns? -\item - Group the diamonds data by cut, clarity and colour. Fit a linear model - \texttt{log(price)\ \textasciitilde{}\ log(carat)}. What does the - intercept tell you? What does the slope tell you? How do the slope and - intercept vary across the groups? Answer with a plot. 
-\end{enumerate} - -\section{Observation data}\label{observation-data} - -Observation-level data, which include residual diagnostics, is most -useful in the traditional model fitting scenario, because it can helps -you find ``high-leverage'' points, point that have a big influence on -the final model. It's also useful in conjunction with visualisation, -particularly because it provides an alternative way to access the -residuals. - -Extracting observation-level data is the job of the \texttt{augment()} -function. This adds one row for each observation. It includes the -variables used in the original model, the residuals, and a number of -common influence statistics (see \texttt{?augment.lm} for more details): -\indexf{augment} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{obs_sum <-}\StringTok{ }\NormalTok{models %>%}\StringTok{ }\KeywordTok{augment}\NormalTok{(mod)} -\NormalTok{obs_sum} -\CommentTok{#> Source: local data frame [8,034 x 10]} -\CommentTok{#> Groups: city [46]} -\CommentTok{#> } -\CommentTok{#> city log2.sales. factor.month. .fitted .se.fit .resid .hat} -\CommentTok{#> (chr) (dbl) (fctr) (dbl) (dbl) (dbl) (dbl)} -\CommentTok{#> 1 Abilene 6.17 1 6.54 0.0704 -0.372 0.0625} -\CommentTok{#> 2 Abilene 6.61 2 6.90 0.0704 -0.281 0.0625} -\CommentTok{#> 3 Abilene 7.02 3 7.22 0.0704 -0.194 0.0625} -\CommentTok{#> 4 Abilene 6.61 4 7.29 0.0704 -0.676 0.0625} -\CommentTok{#> 5 Abilene 7.14 5 7.46 0.0704 -0.319 0.0625} -\CommentTok{#> 6 Abilene 7.29 6 7.54 0.0704 -0.259 0.0625} -\CommentTok{#> .. ... ... ... ... ... ... ...} -\CommentTok{#> Variables not shown: .sigma (dbl), .cooksd (dbl), .std.resid (dbl)} -\end{Highlighting} -\end{Shaded} - -For example, it might be interesting to look at the distribution of -standardised residuals. (These are residuals standardised to have a -variance of one in each model, making them more comparable). 
We're -looking for unusual values that might need deeper exploration: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(obs_sum, }\KeywordTok{aes}\NormalTok{(.std.resid)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\DataTypeTok{binwidth =} \FloatTok{0.1}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(obs_sum, }\KeywordTok{aes}\NormalTok{(}\KeywordTok{abs}\NormalTok{(.std.resid))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\DataTypeTok{binwidth =} \FloatTok{0.1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/modelling/unnamed-chunk-25-1}% - \includegraphics[width=0.5\linewidth]{_figures/modelling/unnamed-chunk-25-2} -\end{figure} - -A threshold of 2 seems like a reasonable threshold to explore -individually: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{obs_sum %>%}\StringTok{ } -\StringTok{ }\KeywordTok{filter}\NormalTok{(}\KeywordTok{abs}\NormalTok{(.std.resid) >}\StringTok{ }\DecValTok{2}\NormalTok{) %>%} -\StringTok{ }\KeywordTok{group_by}\NormalTok{(city) %>%} -\StringTok{ }\KeywordTok{summarise}\NormalTok{(}\DataTypeTok{n =} \KeywordTok{n}\NormalTok{(), }\DataTypeTok{avg =} \KeywordTok{mean}\NormalTok{(}\KeywordTok{abs}\NormalTok{(.std.resid))) %>%} -\StringTok{ }\KeywordTok{arrange}\NormalTok{(}\KeywordTok{desc}\NormalTok{(n))} -\CommentTok{#> Source: local data frame [43 x 3]} -\CommentTok{#> } -\CommentTok{#> city n avg} -\CommentTok{#> (chr) (int) (dbl)} -\CommentTok{#> 1 Texarkana 12 2.43} -\CommentTok{#> 2 Harlingen 11 2.73} -\CommentTok{#> 3 Waco 11 2.96} -\CommentTok{#> 4 Victoria 10 2.49} -\CommentTok{#> 5 Brazoria County 9 2.31} -\CommentTok{#> 6 Brownsville 9 2.48} -\CommentTok{#> .. ... ... ...} -\end{Highlighting} -\end{Shaded} - -In a real analysis, you'd want to look into these cities in more detail. 
- -\subsection{Exercises}\label{exercises-4} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - A common diagnotic plot is fitted values (\texttt{.fitted}) - vs.~residuals (\texttt{.resid}). Do you see any patterns? What if you - include the city or month on the same plot? -\item - Create a time series of log(sales) for each city. Highlight points - that have a standardised residual of greater than 2. -\end{enumerate} - -\section*{References}\label{references} -\addcontentsline{toc}{section}{References} - -\hypertarget{refs}{} -\hypertarget{ref-model-vis-paper}{} -Wickham, Hadley, Dianne Cook, and Heike Hofmann. 2015. ``Visualizing -Statistical Models: Removing the Blindfold.'' \emph{Statistical Analysis -and Data Mining: The ASA Data Science Journal} 8 (4): 203--25. diff --git a/book/tex/position.tex b/book/tex/position.tex deleted file mode 100644 index b85a3ac5..00000000 --- a/book/tex/position.tex +++ /dev/null @@ -1,958 +0,0 @@ -\chapter{Positioning}\label{cha:position} - -\section{Introduction}\label{introduction} - -This chapter discusses position, particularly how facets are laid out on -a page, and how coordinate systems within a panel work. There are four -components that control position. You have already learned about two of -them that work within a facet: \index{Positioning} - -\begin{itemize} -\item - \textbf{Position adjustments} adjust the position of overlapping - objects within a layer. These are most useful for bar and other - interval geoms, but can be useful in other situations - (\protect\hyperlink{sec:position}{link to section}). -\item - \textbf{Position scales} control how the values in the data are mapped - to positions on the plot (\protect\hyperlink{sub:scale-position}{link - to section}). -\end{itemize} - -This chapter will describe the other two components and show you how all -four pieces fit together: - -\begin{itemize} -\item - \textbf{Facetting} is a mechanism for automatically laying out - multiple plots on a page. 
It splits the data into subsets, and then - plots each subset in a different panel. Such plots are often called - small multiples or trellis graphics - (\protect\hyperlink{sec:facetting}{link to section}). -\item - \textbf{Coordinate systems} control how the two independent position - scales are combined to create a 2d coordinate system. The most common - coordinate system is Cartesian, but other coordinate systems can be - useful in special circumstances (\protect\hyperlink{sec:coord}{link to - section}). -\end{itemize} - -\hypertarget{sec:facetting}{\section{Facetting}\label{sec:facetting}} - -You first encountered facetting in -\protect\hyperlink{sec:qplot-facetting}{getting started}. Facetting -generates small multiples each showing a different subset of the data. -Small multiples are a powerful tool for exploratory data analysis: you -can rapidly compare patterns in different parts of the data and see -whether they are the same or different. This section will discuss how -you can fine-tune facets, particularly the way in which they interact -with position scales. \index{Facetting} \index{Positioning!facetting} - -There are three types of facetting: - -\begin{itemize} -\item - \texttt{facet\_null()}: a single plot, the default. - \indexf{facet\_null} -\item - \texttt{facet\_wrap()}: ``wraps'' a 1d ribbon of panels into 2d. -\item - \texttt{facet\_grid()}: produces a 2d grid of panels defined by - variables which form the rows and columns. -\end{itemize} - -The differences between \texttt{facet\_wrap()} and -\texttt{facet\_grid()} are illustrated in Figure \ref{fig:facet-sketch}. - -\begin{figure}[htbp] - \centering - \includegraphics[width=0.75\linewidth]{diagrams/position-facets} - \caption{A sketch illustrating the difference between the two facetting systems. \texttt{facet\_grid()} (left) is fundamentally 2d, being made up of two independent components. 
\texttt{facet\_wrap()} (right) is 1d, but wrapped into 2d to save space.} - \label{fig:facet-sketch} -\end{figure} - -Faceted plots have the capability to fill up a lot of space, so for this -chapter we will use a subset of the mpg dataset that has a manageable -number of levels: three cylinders (4, 6, 8), two types of drive train (4 -and f), and six classes. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mpg2 <-}\StringTok{ }\KeywordTok{subset}\NormalTok{(mpg, cyl !=}\StringTok{ }\DecValTok{5} \NormalTok{&}\StringTok{ }\NormalTok{drv %in%}\StringTok{ }\KeywordTok{c}\NormalTok{(}\StringTok{"4"}\NormalTok{, }\StringTok{"f"}\NormalTok{) &}\StringTok{ }\NormalTok{class !=}\StringTok{ "2seater"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\subsection{Facet wrap}\label{sub:facet-wrap} - -\texttt{facet\_wrap()} makes a long ribbon of panels (generated by any -number of variables) and wraps it into 2d. This is useful if you have a -single variable with many levels and want to arrange the plots in a more -space efficient manner. \index{Facetting!wrapped} \indexf{facet\_wrap} -\indexc{\textasciitilde} - -You can control how the ribbon is wrapped into a grid with -\texttt{ncol}, \texttt{nrow}, \texttt{as.table} and \texttt{dir}. -\texttt{ncol} and \texttt{nrow} control how many columns and rows (you -only need to set one). \texttt{as.table} controls whether the facets are -laid out like a table (\texttt{TRUE}), with highest values at the -bottom-right, or a plot (\texttt{FALSE}), with the highest values at the -top-right. \texttt{dir} controls the direction of wrap: -\textbf{h}orizontal or \textbf{v}ertical. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(mpg2, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_blank}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} - -\NormalTok{base +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~class, }\DataTypeTok{ncol =} \DecValTok{3}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~class, }\DataTypeTok{ncol =} \DecValTok{3}\NormalTok{, }\DataTypeTok{as.table =} \OtherTok{FALSE}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/position/unnamed-chunk-1-1}% - \includegraphics[width=0.5\linewidth]{_figures/position/unnamed-chunk-1-2} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~class, }\DataTypeTok{nrow =} \DecValTok{3}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~class, }\DataTypeTok{nrow =} \DecValTok{3}\NormalTok{, }\DataTypeTok{dir =} \StringTok{"v"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/position/unnamed-chunk-2-1}% - \includegraphics[width=0.5\linewidth]{_figures/position/unnamed-chunk-2-2} -\end{figure} - -\subsection{Facet grid}\label{facet-grid} - -\texttt{facet\_grid()} lays out plots in a 2d grid, as defined by a -formula: \index{Facetting!grid} \indexf{facet\_grid} - -\begin{itemize} -\item - \texttt{.\ \textasciitilde{}\ a} spreads the values of \texttt{a} - across the columns. This direction\\ - facilitates comparisons of y position, because the vertical scales - are aligned. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{facet_grid}\NormalTok{(. 
~}\StringTok{ }\NormalTok{cyl)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/position/grid-v-1} - \end{figure} -\item - \texttt{b\ \textasciitilde{}\ .} spreads the values of \texttt{b} down - the rows. This direction facilitates comparison of x position because - the horizontal scales are aligned. This makes it particularly useful - for comparing distributions. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{facet_grid}\NormalTok{(drv ~}\StringTok{ }\NormalTok{.)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \centering - \includegraphics[width=0.3\linewidth]{_figures/position/mpg2-h-1} - \end{figure} -\item - \texttt{a\ \textasciitilde{}\ b} spreads \texttt{a} across columns and - \texttt{b} down rows. You'll usually want to put the variable with the - greatest number of levels in the columns, to take advantage of the - aspect ratio of your screen. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{facet_grid}\NormalTok{(drv ~}\StringTok{ }\NormalTok{cyl)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/position/grid-vh-1} - \end{figure} -\end{itemize} - -You can use multiple variables in the rows or columns, by ``adding'' -them together, e.g. \texttt{a\ +\ b\ \textasciitilde{}\ c\ +\ d}. -Variables appearing together on the rows or columns are nested in the -sense that only combinations that appear in the data will appear in the -plot. Variables that are specified on rows and columns will be crossed: -all combinations will be shown, including those that didn't appear in -the original dataset: this may result in empty panels. 
- -\subsection{Controlling scales}\label{sub:controlling-scales} - -For both \texttt{facet\_wrap()} and \texttt{facet\_grid()} you can -control whether the position scales are the same in all panels (fixed) -or allowed to vary between panels (free) with the \texttt{scales} -parameter: \index{Facetting!interaction with scales} -\index{Scales!interaction with facetting} -\index{Facetting!controlling scales} - -\begin{itemize} -\tightlist -\item - \texttt{scales\ =\ "fixed"}: x and y scales are fixed across all - panels. -\item - \texttt{scales\ =\ "free\_x"}: the x scale is free, and the y scale is - fixed. -\item - \texttt{scales\ =\ "free\_y"}: the y scale is free, and the x scale is - fixed. -\item - \texttt{scales\ =\ "free"}: x and y scales vary across panels. -\end{itemize} - -\texttt{facet\_grid()} imposes an additional constraint on the scales: -all panels in a column must have the same x scale, and all panels in a -row must have the same y scale. This is because each column shares an x -axis, and each row shares a y axis. - -Fixed scales make it easier to see patterns across panels; free scales -make it easier to see patterns within panels. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(mpg2, }\KeywordTok{aes}\NormalTok{(cty, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_abline}\NormalTok{() +} -\StringTok{ }\KeywordTok{geom_jitter}\NormalTok{(}\DataTypeTok{width =} \FloatTok{0.1}\NormalTok{, }\DataTypeTok{height =} \FloatTok{0.1}\NormalTok{) } -\NormalTok{p +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~cyl)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/position/fixed-vs-free-1}% -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~cyl, }\DataTypeTok{scales =} \StringTok{"free"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/position/fixed-vs-free-2} -\end{figure} - -Free scales are also useful when we want to display multiple time series -that were measured on different scales. To do this, we first need to -change from `wide' to `long' data, stacking the separate variables into -a single column. An example of this is shown below with the long form of -the \texttt{economics} data, and the topic is discussed in more detail -in \protect\hyperlink{sec:spread-gather}{converting data from wide to -long}. \index{Data!economics\_long@\texttt{economics\_long}} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{economics_long} -\CommentTok{#> Source: local data frame [2,870 x 4]} -\CommentTok{#> Groups: variable [5]} -\CommentTok{#> } -\CommentTok{#> date variable value value01} -\CommentTok{#> (date) (fctr) (dbl) (dbl)} -\CommentTok{#> 1 1967-07-01 pce 507 0.000000} -\CommentTok{#> 2 1967-08-01 pce 510 0.000266} -\CommentTok{#> 3 1967-09-01 pce 516 0.000764} -\CommentTok{#> 4 1967-10-01 pce 513 0.000472} -\CommentTok{#> 5 1967-11-01 pce 518 0.000918} -\CommentTok{#> 6 1967-12-01 pce 526 0.001579} -\CommentTok{#> .. ... 
... ... ...} -\KeywordTok{ggplot}\NormalTok{(economics_long, }\KeywordTok{aes}\NormalTok{(date, value)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~variable, }\DataTypeTok{scales =} \StringTok{"free_y"}\NormalTok{, }\DataTypeTok{ncol =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/position/time-1} -\end{figure} - -\texttt{facet\_grid()} has an additional parameter called -\texttt{space}, which takes the same values as \texttt{scales}. When -space is ``free'', each column (or row) will have width (or height) -proportional to the range of the scale for that column (or row). This -makes the scaling equal across the whole plot: 1 cm on each panel maps -to the same range of data. (This is somewhat analogous to the `sliced' -axis limits of lattice.) For example, if panel a had range 2 and panel b -had range 4, one-third of the space would be given to a, and two-thirds -to b. This is most useful for categorical scales, where we can assign -space proportionally based on the number of levels in each facet, as -illustrated below. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mpg2$model <-}\StringTok{ }\KeywordTok{reorder}\NormalTok{(mpg2$model, mpg2$cty)} -\NormalTok{mpg2$manufacturer <-}\StringTok{ }\KeywordTok{reorder}\NormalTok{(mpg2$manufacturer, -mpg2$cty)} -\KeywordTok{ggplot}\NormalTok{(mpg2, }\KeywordTok{aes}\NormalTok{(cty, model)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_grid}\NormalTok{(manufacturer ~}\StringTok{ }\NormalTok{., }\DataTypeTok{scales =} \StringTok{"free"}\NormalTok{, }\DataTypeTok{space =} \StringTok{"free"}\NormalTok{) +} -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{strip.text.y =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{angle =} \DecValTok{0}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/position/discrete-free-1} -\end{figure} - -\subsection{Missing facetting -variables}\label{sub:missing-facetting-columns} - -If you are using facetting on a plot with multiple datasets, what -happens when one of those datasets is missing the facetting variables? -This situation commonly arises when you are adding contextual -information that should be the same in all panels. For example, imagine -you have a spatial display of disease faceted by gender. What happens -when you add a map layer that does not contain the gender variable? Here -ggplot will do what you expect: it will display the map in every facet: -missing facetting variables are treated like they have all values. -\index{Facetting!missing data} - -Here's a simple example. Note how the single red point from \texttt{df2} -appears in both panels. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df1 <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{gender =} \KeywordTok{c}\NormalTok{(}\StringTok{"f"}\NormalTok{, }\StringTok{"f"}\NormalTok{, }\StringTok{"m"}\NormalTok{))} -\NormalTok{df2 <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{2}\NormalTok{, }\DataTypeTok{y =} \DecValTok{2}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(df1, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{data =} \NormalTok{df2, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{2}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~gender)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/position/unnamed-chunk-3-1} -\end{figure} - -This technique is particularly useful when you add annotations to make -it easier to compare between facets, as shown in the next section. - -\subsection{Grouping vs.~facetting}\label{sub:group-vs-facet} - -Facetting is an alternative to using aesthetics (like colour, shape or -size) to differentiate groups. Both techniques have strengths and -weaknesses, based around the relative positions of the subsets. -\index{Facetting!vs. grouping} \index{Grouping!vs. facetting} With -facetting, each group is quite far apart in its own panel, and there is -no overlap between the groups. This is good if the groups overlap a lot, -but it does make small differences harder to see. When using aesthetics -to differentiate groups, the groups are close together and may overlap, -but small differences are easier to see. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(} - \DataTypeTok{x =} \KeywordTok{rnorm}\NormalTok{(}\DecValTok{120}\NormalTok{, }\KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{2}\NormalTok{, }\DecValTok{4}\NormalTok{)),} - \DataTypeTok{y =} \KeywordTok{rnorm}\NormalTok{(}\DecValTok{120}\NormalTok{, }\KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{2}\NormalTok{, }\DecValTok{1}\NormalTok{)),} - \DataTypeTok{z =} \NormalTok{letters[}\DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{]} -\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/position/unnamed-chunk-4-1} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~z)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/position/unnamed-chunk-5-1} -\end{figure} - -Comparisons between facets often benefit from some thoughtful -annotation. For example, in this case we could show the mean of each -group in every panel. You'll learn how to write summary code like this -in \protect\hyperlink{cha:dplyr}{dplyr}. Note that we need two ``z'' -variables: one for the facets and one for the colours. 
-\index{Facetting!adding annotations} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df_sum <-}\StringTok{ }\NormalTok{df %>%}\StringTok{ } -\StringTok{ }\KeywordTok{group_by}\NormalTok{(z) %>%}\StringTok{ } -\StringTok{ }\KeywordTok{summarise}\NormalTok{(}\DataTypeTok{x =} \KeywordTok{mean}\NormalTok{(x), }\DataTypeTok{y =} \KeywordTok{mean}\NormalTok{(y)) %>%} -\StringTok{ }\KeywordTok{rename}\NormalTok{(}\DataTypeTok{z2 =} \NormalTok{z)} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{data =} \NormalTok{df_sum, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z2), }\DataTypeTok{size =} \DecValTok{4}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~z)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/position/unnamed-chunk-6-1} -\end{figure} - -Another useful technique is to put all the data in the background of -each panel: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df2 <-}\StringTok{ }\NormalTok{dplyr::}\KeywordTok{select}\NormalTok{(df, -z)} - -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{data =} \NormalTok{df2, }\DataTypeTok{colour =} \StringTok{"grey70"}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z)) +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~z)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/position/unnamed-chunk-7-1} -\end{figure} - -\subsection{Continuous variables}\label{sub:continuous-variables} - -To facet continuous variables, you must first discretise them. 
ggplot2 -provides three helper functions to do so: -\index{Facetting!by continuous variables} - -\begin{itemize} -\item - Divide the data into \texttt{n} bins each of the same length: - \texttt{cut\_interval(x,\ n)} \indexf{cut\_interval} -\item - Divide the data into bins of width \texttt{width}: - \texttt{cut\_width(x,\ width)}. \indexf{cut\_width} -\item - Divide the data into n bins each containing (approximately) the same - number of points: \texttt{cut\_number(x,\ n\ =\ 10)}. - \indexf{cut\_number} -\end{itemize} - -They are illustrated below: - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Bins of width 1} -\NormalTok{mpg2$disp_w <-}\StringTok{ }\KeywordTok{cut_width}\NormalTok{(mpg2$displ, }\DecValTok{1}\NormalTok{)} -\CommentTok{# Six bins of equal length} -\NormalTok{mpg2$disp_i <-}\StringTok{ }\KeywordTok{cut_interval}\NormalTok{(mpg2$displ, }\DecValTok{6}\NormalTok{)} -\CommentTok{# Six bins containing equal numbers of points} -\NormalTok{mpg2$disp_n <-}\StringTok{ }\KeywordTok{cut_number}\NormalTok{(mpg2$displ, }\DecValTok{6}\NormalTok{)} - -\NormalTok{plot <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(mpg2, }\KeywordTok{aes}\NormalTok{(cty, hwy)) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +} -\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \OtherTok{NULL}\NormalTok{, }\DataTypeTok{y =} \OtherTok{NULL}\NormalTok{)} -\NormalTok{plot +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~disp_w, }\DataTypeTok{nrow =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/position/discretising-1}% -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{plot +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~disp_i, }\DataTypeTok{nrow =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/position/discretising-2}% -\end{figure} - -\begin{Shaded} 
-\begin{Highlighting}[] -\NormalTok{plot +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~disp_n, }\DataTypeTok{nrow =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/position/discretising-3} -\end{figure} - -Note that the facetting formula does not evaluate functions, so you must -first create a new variable containing the discretised data. - -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Diamonds: display the distribution of price conditional on cut and - carat. Try facetting by cut and grouping by carat. Try facetting by - carat and grouping by cut. Which do you prefer? -\item - Diamonds: compare the relationship between price and carat for each - colour. What makes it hard to compare the groups? Is grouping better - or facetting? If you use facetting, what annotation might you add to - make it easier to see the differences between panels? -\item - Why is \texttt{facet\_wrap()} generally more useful than - \texttt{facet\_grid()}? -\item - Recreate the following plot. It facets \texttt{mpg2} by class, - overlaying a smooth curve fit to the full dataset. - - \begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/position/unnamed-chunk-8-1} - \end{figure} -\end{enumerate} - -\hypertarget{sec:coord}{\section{Coordinate systems}\label{sec:coord}} - -Coordinate systems have two main jobs: \index{Coordinate systems} - -\begin{itemize} -\item - Combine the two position aesthetics to produce a 2d position on the - plot. The position aesthetics are called \texttt{x} and \texttt{y}, - but they might be better called position 1 and 2 because their meaning - depends on the coordinate system used. For example, with the polar - coordinate system they become angle and radius (or radius and angle), - and with maps they become latitude and longitude. 
-\item - In coordination with the faceter, coordinate systems draw axes and - panel backgrounds. While the scales control the values that appear on - the axes, and how they map from data to position, it is the coordinate - system which actually draws them. This is because their appearance - depends on the coordinate system: an angle axis looks quite different - than an x axis. -\end{itemize} - -There are two types of coordinate system. Linear coordinate systems -preserve the shape of geoms: - -\begin{itemize} -\item - \texttt{coord\_cartesian()}: the default Cartesian coordinate system, - where the 2d position of an element is given by the combination of the - x and y positions. -\item - \texttt{coord\_flip()}: Cartesian coordinate system with x and y axes - flipped. -\item - \texttt{coord\_fixed()}: Cartesian coordinate system with a fixed - aspect ratio. -\end{itemize} - -On the other hand, non-linear coordinate systems can change the shapes: -a straight line may no longer be straight. The closest distance between -two points may no longer be a straight line. - -\begin{itemize} -\item - \texttt{coord\_map()}/\texttt{coord\_quickmap()}: Map projections. -\item - \texttt{coord\_polar()}: Polar coordinates. -\item - \texttt{coord\_trans()}: Apply arbitrary transformations to x and y - positions, after the data has been processed by the stat. -\end{itemize} - -Each coordinate system is described in more detail below. - -\section{Linear coordinate systems}\label{sub:cartesian} - -There are three linear coordinate systems: \texttt{coord\_cartesian()}, -\texttt{coord\_flip()}, \texttt{coord\_fixed()}. -\index{Coordinate systems!Cartesian} \indexf{coord\_cartesian} - -\subsection{\texorpdfstring{Zooming into a plot with -\texttt{coord\_cartesian()}}{Zooming into a plot with coord\_cartesian()}}\label{zooming-into-a-plot-with-coordux5fcartesian} - -\texttt{coord\_cartesian()} has arguments \texttt{xlim} and -\texttt{ylim}. 
If you think back to the scales chapter, you might wonder -why we need these. Doesn't the limits argument of the scales already -allow us to control what appears on the plot? The key difference is how -the limits work: when setting scale limits, any data outside the limits -is thrown away; but when setting coordinate system limits we still use -all the data, but we only display a small region of the plot. Setting -coordinate system limits is like looking at the plot under a magnifying -glass. \index{Zooming} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{()} - -\CommentTok{# Full dataset} -\NormalTok{base} -\CommentTok{# Scaling to 5--7 throws away data outside that range} -\NormalTok{base +}\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\DataTypeTok{limits =} \KeywordTok{c}\NormalTok{(}\DecValTok{5}\NormalTok{, }\DecValTok{7}\NormalTok{))} -\CommentTok{#> Warning: Removed 196 rows containing non-finite values} -\CommentTok{#> (stat_smooth).} -\CommentTok{#> Warning: Removed 196 rows containing missing values (geom_point).} -\CommentTok{# Zooming to 5--7 keeps all the data but only shows some of it} -\NormalTok{base +}\StringTok{ }\KeywordTok{coord_cartesian}\NormalTok{(}\DataTypeTok{xlim =} \KeywordTok{c}\NormalTok{(}\DecValTok{5}\NormalTok{, }\DecValTok{7}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/position/limits-smooth-1}% - \includegraphics[width=0.333\linewidth]{_figures/position/limits-smooth-2}% - \includegraphics[width=0.333\linewidth]{_figures/position/limits-smooth-3} -\end{figure} - -\subsection{\texorpdfstring{Flipping the axes with -\texttt{coord\_flip()}}{Flipping the axes with 
coord\_flip()}}\label{flipping-the-axes-with-coordux5fflip} - -\label{sub:coord-flip} - -Most statistics and geoms assume you are interested in y values -conditional on x values (e.g., smooth, summary, boxplot, line): in most -statistical models, the x values are assumed to be measured without -error. If you are interested in x conditional on y (or you just want to -rotate the plot 90 degrees), you can use \texttt{coord\_flip()} to -exchange the x and y axes. Compare this with just exchanging the -variables mapped to x and y: \index{Rotating} -\index{Coordinate systems!flipped} \indexf{coord\_flip} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, cty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{()} -\CommentTok{# Exchanging cty and displ rotates the plot 90 degrees, but the smooth } -\CommentTok{# is fit to the rotated data.} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(cty, displ)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{()} -\CommentTok{# coord_flip() fits the smooth to the original data, and then rotates } -\CommentTok{# the output} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, cty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_flip}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/position/coord-flip-1}% - \includegraphics[width=0.333\linewidth]{_figures/position/coord-flip-2}% - \includegraphics[width=0.333\linewidth]{_figures/position/coord-flip-3} -\end{figure} - -\subsection{\texorpdfstring{Equal scales with -\texttt{coord\_fixed()}}{Equal scales with 
coord\_fixed()}}\label{equal-scales-with-coordux5ffixed} - -\texttt{coord\_fixed()} fixes the ratio of length on the x and y axes. -The default \texttt{ratio} ensures that the x and y axes have equal -scales: i.e., 1 cm along the x axis represents the same range of data as -1 cm along the y axis. The aspect ratio will also be set to ensure that -the mapping is maintained regardless of the shape of the output device. -See the documentation of \texttt{coord\_fixed()} for more details. -\index{Aspect ratio} \index{Coordinate systems!equal} -\indexf{coord\_equal} - -\section{Non-linear coordinate systems}\label{sub:coord-non-linear} - -Unlike linear coordinates, non-linear coordinates can change the shape -of geoms. For example, in polar coordinates a rectangle becomes an arc; -in a map projection, the shortest path between two points is not -necessarily a straight line. The code below shows how a line and a -rectangle are rendered in a few different coordinate systems. -\index{Transformation!coordinate system} -\index{Coordinate systems!non-linear} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{rect <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{50}\NormalTok{, }\DataTypeTok{y =} \DecValTok{50}\NormalTok{)} -\NormalTok{line <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{200}\NormalTok{), }\DataTypeTok{y =} \KeywordTok{c}\NormalTok{(}\DecValTok{100}\NormalTok{, }\DecValTok{1}\NormalTok{))} -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(}\DataTypeTok{mapping =} \KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_tile}\NormalTok{(}\DataTypeTok{data =} \NormalTok{rect, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{width =} \DecValTok{50}\NormalTok{, }\DataTypeTok{height =} \DecValTok{50}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\DataTypeTok{data =} \NormalTok{line) 
+}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} -\NormalTok{base} -\NormalTok{base +}\StringTok{ }\KeywordTok{coord_polar}\NormalTok{(}\StringTok{"x"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{coord_polar}\NormalTok{(}\StringTok{"y"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/position/coord-trans-ex-1}% - \includegraphics[width=0.333\linewidth]{_figures/position/coord-trans-ex-2}% - \includegraphics[width=0.333\linewidth]{_figures/position/coord-trans-ex-3} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{coord_flip}\NormalTok{()} -\NormalTok{base +}\StringTok{ }\KeywordTok{coord_trans}\NormalTok{(}\DataTypeTok{y =} \StringTok{"log10"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{coord_fixed}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/position/coord-trans-ex-2-1}% - \includegraphics[width=0.333\linewidth]{_figures/position/coord-trans-ex-2-2}% - \includegraphics[width=0.333\linewidth]{_figures/position/coord-trans-ex-2-3} -\end{figure} - -The transformation takes place in two steps. Firstly, the -parameterisation of each geom is changed to be purely location-based, -rather than location- and dimension-based. For example, a bar can be -represented as an x position (a location), a height and a width (two -dimensions). Interpreting height and width in a non-Cartesian coordinate -system is hard because a rectangle may no longer have constant height -and width, so we convert to a purely location-based representation, a -polygon defined by the four corners. This effectively converts all geoms -to a combination of points, lines and polygons. 
-\index{Geoms!parameterisation} \index{Coordinate systems!transformation} - -Once all geoms have a location-based representation, the next step is to -transform each location into the new coordinate system. It is easy to -transform points, because a point is still a point no matter what -coordinate system you are in. Lines and polygons are harder, because a -straight line may no longer be straight in the new coordinate system. To -make the problem tractable we assume that all coordinate transformations -are smooth, in the sense that all very short lines will still be very -short straight lines in the new coordinate system. With this assumption -in hand, we can transform lines and polygons by breaking them up into -many small line segments and transforming each segment. This process is -called munching and is illustrated below: \index{Munching} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - We start with a line parameterised by its two endpoints: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{r =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{1}\NormalTok{), }\DataTypeTok{theta =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{3} \NormalTok{/}\StringTok{ }\DecValTok{2} \NormalTok{*}\StringTok{ }\NormalTok{pi))} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(r, theta)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{size =} \DecValTok{2}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \centering - \includegraphics[width=0.4\linewidth]{_figures/position/unnamed-chunk-9-1} - \end{figure} -\item - We break it into multiple line segments, each with two endpoints. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{interp <-}\StringTok{ }\NormalTok{function(rng, n) \{} - \KeywordTok{seq}\NormalTok{(rng[}\DecValTok{1}\NormalTok{], rng[}\DecValTok{2}\NormalTok{], }\DataTypeTok{length =} \NormalTok{n)} -\NormalTok{\}} -\NormalTok{munched <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(} - \DataTypeTok{r =} \KeywordTok{interp}\NormalTok{(df$r, }\DecValTok{15}\NormalTok{),} - \DataTypeTok{theta =} \KeywordTok{interp}\NormalTok{(df$theta, }\DecValTok{15}\NormalTok{)} -\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(munched, }\KeywordTok{aes}\NormalTok{(r, theta)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{size =} \DecValTok{2}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \centering - \includegraphics[width=0.4\linewidth]{_figures/position/unnamed-chunk-10-1} - \end{figure} -\item - We transform the locations of each piece: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{transformed <-}\StringTok{ }\KeywordTok{transform}\NormalTok{(munched,} - \DataTypeTok{x =} \NormalTok{r *}\StringTok{ }\KeywordTok{sin}\NormalTok{(theta),} - \DataTypeTok{y =} \NormalTok{r *}\StringTok{ }\KeywordTok{cos}\NormalTok{(theta)} -\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(transformed, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_path}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{size =} \DecValTok{2}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_fixed}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \centering - \includegraphics[width=0.4\linewidth]{_figures/position/unnamed-chunk-11-1} - \end{figure} -\end{enumerate} - -Internally ggplot2 uses many more segments so that the result 
looks -smooth. - -\subsection{\texorpdfstring{Transformations with -\texttt{coord\_trans()}}{Transformations with coord\_trans()}}\label{transformations-with-coordux5ftrans} - -Like limits, we can also transform the data in two places: at the scale -level or at the coordinate system level. \texttt{coord\_trans()} has -arguments \texttt{x} and \texttt{y} which should be strings naming the -transformer or transformer objects (see -\protect\hyperlink{sub:scale-position}{continuous position scales}). -Transforming at the scale level occurs before statistics are computed -and does not change the shape of the geom. Transforming at the -coordinate system level occurs after the statistics have been computed, -and does affect the shape of the geom. Using both together allows us to -model the data on a transformed scale and then backtransform it for -interpretation: a common pattern in analysis. -\index{Transformation!coordinate system} -\index{Coordinate systems!transformed} \indexf{coord\_trans} - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Linear model on original scale is poor fit} -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(carat, price)) +}\StringTok{ } -\StringTok{ }\KeywordTok{stat_bin2d}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\NormalTok{base} - -\CommentTok{# Better fit on log scale, but harder to interpret} -\NormalTok{base +} -\StringTok{ }\KeywordTok{scale_x_log10}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_log10}\NormalTok{()} - -\CommentTok{# Fit on log scale, then backtransform to 
original.} -\CommentTok{# Highlights lack of expensive diamonds with large carats} -\NormalTok{pow10 <-}\StringTok{ }\NormalTok{scales::}\KeywordTok{exp_trans}\NormalTok{(}\DecValTok{10}\NormalTok{)} -\NormalTok{base +} -\StringTok{ }\KeywordTok{scale_x_log10}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_log10}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_trans}\NormalTok{(}\DataTypeTok{x =} \NormalTok{pow10, }\DataTypeTok{y =} \NormalTok{pow10)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/position/backtrans-1}% - \includegraphics[width=0.333\linewidth]{_figures/position/backtrans-2}% - \includegraphics[width=0.333\linewidth]{_figures/position/backtrans-3} -\end{figure} - -\subsection{\texorpdfstring{Polar coordinates with -\texttt{coord\_polar()}}{Polar coordinates with coord\_polar()}}\label{polar-coordinates-with-coordux5fpolar} - -Using polar coordinates gives rise to pie charts and wind roses (from -bar geoms), and radar charts (from line geoms). Polar coordinates are -often used for circular data, particularly time or direction, but the -perceptual properties are not good because the angle is harder to -perceive for small radii than it is for large radii. The \texttt{theta} -argument determines which position variable is mapped to angle (by -default, x) and which to radius. - -The code below shows how we can turn a bar into a pie chart or a -bullseye chart by changing the coordinate system. The documentation -includes other examples. 
\index{Polar coordinates} -\index{Coordinate systems!polar} \indexf{coord\_polar} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(mtcars, }\KeywordTok{aes}\NormalTok{(}\KeywordTok{factor}\NormalTok{(}\DecValTok{1}\NormalTok{), }\DataTypeTok{fill =} \KeywordTok{factor}\NormalTok{(cyl))) +} -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{width =} \DecValTok{1}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_discrete}\NormalTok{(}\OtherTok{NULL}\NormalTok{, }\DataTypeTok{expand =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{0}\NormalTok{)) +} -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\OtherTok{NULL}\NormalTok{, }\DataTypeTok{expand =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{0}\NormalTok{))} - -\CommentTok{# Stacked barchart} -\NormalTok{base} - -\CommentTok{# Pie chart} -\NormalTok{base +}\StringTok{ }\KeywordTok{coord_polar}\NormalTok{(}\DataTypeTok{theta =} \StringTok{"y"}\NormalTok{)} - -\CommentTok{# The bullseye chart} -\NormalTok{base +}\StringTok{ }\KeywordTok{coord_polar}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/position/polar-1}% - \includegraphics[width=0.333\linewidth]{_figures/position/polar-2}% - \includegraphics[width=0.333\linewidth]{_figures/position/polar-3} -\end{figure} - -\subsection{\texorpdfstring{Map projections with -\texttt{coord\_map()}}{Map projections with coord\_map()}}\label{map-projections-with-coordux5fmap} - -Maps are intrinsically displays of spherical data. Simply plotting raw -longitudes and latitudes is misleading, so we must \emph{project} the -data. 
There are two ways to do this with ggplot2: -\index{Maps!projections} \index{Coordinate systems!map projections} -\indexf{coord\_map} \indexf{coord\_quickmap} \index{mapproj} - -\begin{itemize} -\item - \texttt{coord\_quickmap()} is a quick and dirty approximation that - sets the aspect ratio to ensure that 1m of latitude and 1m of - longitude are the same distance in the middle of the plot. This is a - reasonable place to start for smaller regions, and is very fast. - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Prepare a map of NZ} -\NormalTok{nzmap <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(}\KeywordTok{map_data}\NormalTok{(}\StringTok{"nz"}\NormalTok{), }\KeywordTok{aes}\NormalTok{(long, lat, }\DataTypeTok{group =} \NormalTok{group)) +} -\StringTok{ }\KeywordTok{geom_polygon}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"white"}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"black"}\NormalTok{) +} -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} - -\CommentTok{# Plot it in cartesian coordinates} -\NormalTok{nzmap} -\CommentTok{# With the aspect ratio approximation} -\NormalTok{nzmap +}\StringTok{ }\KeywordTok{coord_quickmap}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \centering - \includegraphics[width=0.375\linewidth]{_figures/position/map-nz-1}% - \includegraphics[width=0.375\linewidth]{_figures/position/map-nz-2} - \end{figure} -\item - \texttt{coord\_map()} uses the \textbf{mapproj} package, - \url{https://cran.r-project.org/package=mapproj}, to do a formal map - projection. It takes the same arguments as - \texttt{mapproj::mapproject()} for controlling the projection. It is - much slower than \texttt{coord\_quickmap()} because it must munch the - data and transform each piece. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{world <-}\StringTok{ }\KeywordTok{map_data}\NormalTok{(}\StringTok{"world"}\NormalTok{)} -\NormalTok{worldmap <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(world, }\KeywordTok{aes}\NormalTok{(long, lat, }\DataTypeTok{group =} \NormalTok{group)) +} -\StringTok{ }\KeywordTok{geom_path}\NormalTok{() +} -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\OtherTok{NULL}\NormalTok{, }\DataTypeTok{breaks =} \NormalTok{(-}\DecValTok{2}\NormalTok{:}\DecValTok{3}\NormalTok{) *}\StringTok{ }\DecValTok{30}\NormalTok{, }\DataTypeTok{labels =} \OtherTok{NULL}\NormalTok{) +} -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\OtherTok{NULL}\NormalTok{, }\DataTypeTok{breaks =} \NormalTok{(-}\DecValTok{4}\NormalTok{:}\DecValTok{4}\NormalTok{) *}\StringTok{ }\DecValTok{45}\NormalTok{, }\DataTypeTok{labels =} \OtherTok{NULL}\NormalTok{)} - -\NormalTok{worldmap +}\StringTok{ }\KeywordTok{coord_map}\NormalTok{()} -\CommentTok{# Some crazier projections} -\NormalTok{worldmap +}\StringTok{ }\KeywordTok{coord_map}\NormalTok{(}\StringTok{"ortho"}\NormalTok{)} -\NormalTok{worldmap +}\StringTok{ }\KeywordTok{coord_map}\NormalTok{(}\StringTok{"stereographic"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/position/map-world-1}% - \includegraphics[width=0.333\linewidth]{_figures/position/map-world-2}% - \includegraphics[width=0.333\linewidth]{_figures/position/map-world-3} - \end{figure} -\end{itemize} diff --git a/book/tex/preface.tex b/book/tex/preface.tex deleted file mode 100644 index 69923201..00000000 --- a/book/tex/preface.tex +++ /dev/null @@ -1,134 +0,0 @@ -\preface - -Welcome to the second edition of ``ggplot2: elegant graphics for data -analysis''. 
I'm so excited to have an updated book that shows off all -the latest and greatest ggplot2 features, as well as the great things -that have been happening in R and in the ggplot2 community over the last five -years. The ggplot2 community is vibrant: the ggplot2 mailing list has -over 7,000 members and there is a very active Stack Overflow community, -with nearly 10,000 questions tagged with ggplot2. While most of my -development effort is no longer going into ggplot2 (more on that below), -there's never been a better time to learn it and use it. - -I am tremendously grateful for the success of ggplot2. It's one of the -most commonly downloaded R packages (over a million downloads in the -last year!) and has influenced the design of graphics packages for other -languages. Personally, ggplot2 has brought me many exciting opportunities -to travel the world and meet interesting people. I love hearing how -people are using R and ggplot2 to understand the data that they care -about. - -A big thanks for this edition goes to Carson Sievert, who helped me -modernise the code, including converting the sources to R Markdown. He -also updated many of the examples and helped me proofread the book. - -\section*{Major changes} - -I've spent a lot of effort ensuring that this edition is a true upgrade -over the first. As well as updating the code everywhere to make sure -it's fully compatible with the latest version of ggplot2, I have: - -\begin{itemize} -\item - Shown much more code in the book, so it's easier to use as a - reference. Overall the book has a more ``knitr''-ish sensibility: - there are fewer floating figures and tables, and more inline code. - This makes the layout a little less pretty but keeps related items - closer together. -\item - Published the complete source online at - \url{https://github.com/hadley/ggplot2-book}. -\item - Switched from \texttt{qplot()} to \texttt{ggplot()} in the - introduction, \protect\hyperlink{cha:getting-started}{intro}. 
Feedback - indicated that \texttt{qplot()} was a crutch: it makes simple plots a - little easier, but it doesn't help with mastering the grammar. -\item - Added practice exercises throughout the book so you can practice new - techniques immediately after learning about them. -\item - Added pointers to the rich ecosystem of packages that have built up - around ggplot2. You'll now see a number of other packages highlighted - in the book, and get pointers to other packages I think are - particularly useful. -\item - Overhauled the toolbox chapter, - \protect\hyperlink{cha:toolbox}{toolbox}, to cover all the new geoms. - I've added a completely new section on text labels, - \protect\hyperlink{sec:labelling}{labels}, since it's important and - not covered in detail elsewhere. The mapping section, - \protect\hyperlink{sec:maps}{maps}, has been considerably expanded to - talk more about the different types of map data, and where you might - find them. -\item - Completely rewritten the scales chapter, - \protect\hyperlink{cha:scales}{scales}, to focus on the most important - tasks. It also discusses the new features that give finer control over - legend appearance, \protect\hyperlink{sec:legends}{legends}, and shows - off some of the new scales added to ggplot2, - \protect\hyperlink{sec:scale-details}{scales}. -\item - Split the data analysis chapter into three pieces: data tidying (with - tidyr), \protect\hyperlink{cha:data}{tidyr}; data manipulation (with - dplyr), \protect\hyperlink{cha:dplyr}{dplyr}; and model visualisation - (with broom), \protect\hyperlink{cha:modelling}{models}. I discuss the - latest iteration of my data manipulation tools, and introduce the - fantastic broom package by David Robinson. -\end{itemize} - -The book is accompanied by a new version of ggplot2: version 2.0.0. This -includes a number of minor tweaks and improvements, and considerable -improvements to the documentation. 
Coming back to ggplot2 development -after a considerable pause has helped me to see many problems that -previously escaped notice. ggplot2 2.0.0 (finally!) contains an official -extension mechanism so that others can contribute new ggplot2 components -in their own packages. This is documented in a new vignette, -\texttt{vignette("extending-ggplot2")}. - -\section*{The future} - -ggplot2 is now stable, and is unlikely to change much in the future. -There will be bug fixes and there may be new geoms, but there will be no -large changes to how ggplot2 works. The next iteration of ggplot2 is -ggvis. ggvis is significantly more ambitious because it aims to provide -a grammar of \emph{interactive} graphics. ggvis is still young, and -lacks many of the features of ggplot2 (most notably it currently lacks -facetting and has no way to make static graphics), but over the coming -years the goal is to make ggvis better than ggplot2. - -The syntax of ggvis is a little different to ggplot2. You won't be able -to trivially convert your ggplot2 plots to ggvis, but we think the cost -is worth it: the new syntax is considerably more consistent, and will be -easier for newcomers to learn. If you've mastered ggplot2, you'll find -your skills transfer very well to ggvis and after struggling with the -syntax for a while, it will start to feel quite natural. The important -skills you learn when mastering ggplot2 are not the programmatic details -of describing a plot in code, but the much harder challenge of thinking -about how to turn data into effective visualisations. - -\section*{Acknowledgements} - -Many people have contributed to this book with high-level structural -insights, spelling and grammar corrections and bug reports. I'd -particularly like to thank William E. J. Doane, Alexander Forrence, -Devin Pastoor, David Robinson, and Guangchuang Yu, for their detailed -technical reviews of the book. - -Many others have contributed over the (now quite long!) lifetime of -ggplot2. 
I would like to thank: Leland Wilkinson, for discussions and -comments that cemented my understanding of the grammar; Gabor -Grothendieck, for early helpful comments; Heike Hofmann and Di Cook, for -being great advisors and supporting the development of ggplot2 during my -PhD; Charlotte Wickham; the students of stat480 and stat503 at ISU, for -trying it out when it was very young; Debby Swayne, for masses of -helpful feedback and advice; Bob Muenchen, Reinhold Kliegl, Philipp -Pagel, Richard Stahlhut, Baptiste Auguie, Jean-Olivier Irisson, Thierry -Onkelinx and the many others who have read draft versions of the book -and given me feedback; and last, but not least, the members of R-help -and the ggplot2 mailing list, for providing the many interesting and -challenging graphics problems that have helped motivate this book. - -\vspace{\baselineskip}\begin{flushright}\noindent -{\it Hadley Wickham}\\ -September 2015\\ -\end{flushright} diff --git a/book/tex/programming.tex b/book/tex/programming.tex deleted file mode 100644 index 09123a32..00000000 --- a/book/tex/programming.tex +++ /dev/null @@ -1,647 +0,0 @@ -\chapter{Programming with ggplot2}\label{cha:programming} - -\section{Introduction}\label{introduction} - -A major requirement of a good data analysis is flexibility. If your data -changes, or you discover something that makes you rethink your basic -assumptions, you need to be able to easily change many plots at once. -The main inhibitor of flexibility is code duplication. If you have the -same plotting statement repeated over and over again, you'll have to -make the same change in many different places. Often just the thought of -making all those changes is exhausting! This chapter will help you -overcome that problem by showing you how to program with ggplot2. -\index{Programming} - -To make your code more flexible, you need to reduce duplicated code by -writing functions. 
When you notice you're doing the same thing over and -over again, think about how you might generalise it and turn it into a -function. If you're not that familiar with how functions work in R, you -might want to brush up your knowledge at -\url{http://adv-r.had.co.nz/Functions.html}. - -In this chapter I'll show how to write functions that create: - -\begin{itemize} -\tightlist -\item - A single ggplot2 component. -\item - Multiple ggplot2 components. -\item - A complete plot. -\end{itemize} - -And then I'll finish off with a brief illustration of how you can apply -functional programming techniques to ggplot2 objects. - -You might also find the -\href{https://github.com/wilkelab/cowplot}{cowplot} and -\href{https://github.com/jrnold/ggthemes}{ggthemes} packages helpful. As -well as providing reusable components that help you directly, you can -also read the source code of the packages to figure out how they work. - -\section{Single components}\label{single-components} - -Each component of a ggplot plot is an object. Most of the time you -create the component and immediately add it to a plot, but you don't -have to. 
Instead, you can save any component to a variable (giving it a -name), and then add it to multiple plots: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{bestfit <-}\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(} - \DataTypeTok{method =} \StringTok{"lm"}\NormalTok{, } - \DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{, } - \DataTypeTok{colour =} \KeywordTok{alpha}\NormalTok{(}\StringTok{"steelblue"}\NormalTok{, }\FloatTok{0.5}\NormalTok{), } - \DataTypeTok{size =} \DecValTok{2} -\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(cty, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\NormalTok{bestfit} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\NormalTok{bestfit} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.375\linewidth]{_figures/programming/layer9-1}% - \includegraphics[width=0.375\linewidth]{_figures/programming/layer9-2} -\end{figure} - -That's a great way to reduce simple types of duplication (it's much -better than copying-and-pasting!), but requires that the component be -exactly the same each time. If you need more flexibility, you can wrap -these reusable snippets in a function. For example, we could extend our -\texttt{bestfit} object to a more general function for adding lines of -best fit to a plot. The following code creates a \texttt{geom\_lm()} -with three parameters: the model \texttt{formula}, the line -\texttt{colour} and the line \texttt{size}: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{geom_lm <-}\StringTok{ }\NormalTok{function(}\DataTypeTok{formula =} \NormalTok{y ~}\StringTok{ }\NormalTok{x, }\DataTypeTok{colour =} \KeywordTok{alpha}\NormalTok{(}\StringTok{"steelblue"}\NormalTok{, }\FloatTok{0.5}\NormalTok{), } - \DataTypeTok{size =} \DecValTok{2}\NormalTok{, ...) 
\{} - \KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{formula =} \NormalTok{formula, }\DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{, }\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{, }\DataTypeTok{colour =} \NormalTok{colour,} - \DataTypeTok{size =} \NormalTok{size, ...)} -\NormalTok{\}} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, }\DecValTok{1} \NormalTok{/}\StringTok{ }\NormalTok{hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_lm}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, }\DecValTok{1} \NormalTok{/}\StringTok{ }\NormalTok{hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_lm}\NormalTok{(y ~}\StringTok{ }\KeywordTok{poly}\NormalTok{(x, }\DecValTok{2}\NormalTok{), }\DataTypeTok{size =} \DecValTok{1}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.375\linewidth]{_figures/programming/geom-lm-1}% - \includegraphics[width=0.375\linewidth]{_figures/programming/geom-lm-2} -\end{figure} - -Pay close attention to the use of ``\texttt{...}''. When included in the -function definition ``\texttt{...}'' allows a function to accept -arbitrary additional arguments. Inside the function, you can then use -``\texttt{...}'' to pass those arguments on to another function. Here we -pass ``\texttt{...}'' onto \texttt{geom\_smooth()} so the user can still -modify all the other arguments we haven't explicitly overridden. When -you write your own component functions, it's a good idea to always use -``\texttt{...}'' in this way. \indexc{...} - -Finally, note that you can only \emph{add} components to a plot; you -can't modify or remove existing objects. 
- -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Create an object that represents a pink histogram with 100 bins. -\item - Create an object that represents a fill scale with the Blues - ColorBrewer palette. -\item - Read the source code for \texttt{theme\_grey()}. What are its - arguments? How does it work? -\item - Create \texttt{scale\_colour\_wesanderson()}. It should have a - parameter to pick the palette from the wesanderson package, and create - either a continuous or discrete scale. -\end{enumerate} - -\section{Multiple components}\label{multiple-components} - -It's not always possible to achieve your goals with a single component. -Fortunately, ggplot2 has a convenient way of adding multiple components -to a plot in one step with a list. The following function adds two -layers: one to show the mean, and one to show its standard error: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{geom_mean <-}\StringTok{ }\NormalTok{function() \{} - \KeywordTok{list}\NormalTok{(} - \KeywordTok{stat_summary}\NormalTok{(}\DataTypeTok{fun.y =} \StringTok{"mean"}\NormalTok{, }\DataTypeTok{geom =} \StringTok{"bar"}\NormalTok{, }\DataTypeTok{fill =} \StringTok{"grey70"}\NormalTok{),} - \KeywordTok{stat_summary}\NormalTok{(}\DataTypeTok{fun.data =} \StringTok{"mean_cl_normal"}\NormalTok{, }\DataTypeTok{geom =} \StringTok{"errorbar"}\NormalTok{, }\DataTypeTok{width =} \FloatTok{0.4}\NormalTok{)} - \NormalTok{)} -\NormalTok{\}} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(class, cty)) +}\StringTok{ }\KeywordTok{geom_mean}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, cty)) +}\StringTok{ }\KeywordTok{geom_mean}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.375\linewidth]{_figures/programming/geom-mean-1-1}% - \includegraphics[width=0.375\linewidth]{_figures/programming/geom-mean-1-2} -\end{figure} 
- -If the list contains any \texttt{NULL} elements, they're ignored. This -makes it easy to conditionally add components: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{geom_mean <-}\StringTok{ }\NormalTok{function(}\DataTypeTok{se =} \OtherTok{TRUE}\NormalTok{) \{} - \KeywordTok{list}\NormalTok{(} - \KeywordTok{stat_summary}\NormalTok{(}\DataTypeTok{fun.y =} \StringTok{"mean"}\NormalTok{, }\DataTypeTok{geom =} \StringTok{"bar"}\NormalTok{, }\DataTypeTok{fill =} \StringTok{"grey70"}\NormalTok{),} - \NormalTok{if (se) } - \KeywordTok{stat_summary}\NormalTok{(}\DataTypeTok{fun.data =} \StringTok{"mean_cl_normal"}\NormalTok{, }\DataTypeTok{geom =} \StringTok{"errorbar"}\NormalTok{, }\DataTypeTok{width =} \FloatTok{0.4}\NormalTok{)} - \NormalTok{)} -\NormalTok{\}} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, cty)) +}\StringTok{ }\KeywordTok{geom_mean}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, cty)) +}\StringTok{ }\KeywordTok{geom_mean}\NormalTok{(}\DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.375\linewidth]{_figures/programming/geom-mean-2-1}% - \includegraphics[width=0.375\linewidth]{_figures/programming/geom-mean-2-2} -\end{figure} - -\subsection{Plot components}\label{plot-components} - -You're not just limited to adding layers in this way. You can also -include any of the following object types in the list: - -\begin{itemize} -\item - A data.frame, which will override the default dataset associated with - the plot. (If you add a data frame by itself, you'll need to use - \texttt{\%+\%}, but this is not necessary if the data frame is in a - list.) -\item - An \texttt{aes()} object, which will be combined with the existing - default aesthetic mapping. -\item - Scales, which override existing scales, with a warning if they've - already been set by the user. 
-\item - Coordinate systems and facetting specification, which override the - existing settings. -\item - Theme components, which override the specified components. -\end{itemize} - -\subsection{Annotation}\label{annotation} - -It's often useful to add standard annotations to a plot. In this case, -your function will also set the data in the layer function, rather than -inheriting it from the plot. There are two other options that you should -set when you do this. These ensure that the layer is self-contained: -\index{Annotation!functions} - -\begin{itemize} -\item - \texttt{inherit.aes\ =\ FALSE} prevents the layer from inheriting - aesthetics from the parent plot. This ensures your annotation works - regardless of what else is on the plot. \indexc{inherit.aes} -\item - \texttt{show.legend\ =\ FALSE} ensures that your annotation won't - appear in the legend. \indexc{show.legend} -\end{itemize} - -One example of this technique is the \texttt{borders()} function built -into ggplot2. It's designed to add map borders from one of the datasets -in the maps package: \indexf{borders} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{borders <-}\StringTok{ }\NormalTok{function(}\DataTypeTok{database =} \StringTok{"world"}\NormalTok{, }\DataTypeTok{regions =} \StringTok{"."}\NormalTok{, }\DataTypeTok{fill =} \OtherTok{NA}\NormalTok{, } - \DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{, ...) 
\{} - \NormalTok{df <-}\StringTok{ }\KeywordTok{map_data}\NormalTok{(database, regions)} - \KeywordTok{geom_polygon}\NormalTok{(} - \KeywordTok{aes_}\NormalTok{(~long, ~lat, }\DataTypeTok{group =} \NormalTok{~group), } - \DataTypeTok{data =} \NormalTok{df, }\DataTypeTok{fill =} \NormalTok{fill, }\DataTypeTok{colour =} \NormalTok{colour, ..., } - \DataTypeTok{inherit.aes =} \OtherTok{FALSE}\NormalTok{, }\DataTypeTok{show.legend =} \OtherTok{FALSE} - \NormalTok{)} -\NormalTok{\}} -\end{Highlighting} -\end{Shaded} - -\subsection{Additional arguments}\label{additional-arguments} - -If you want to pass additional arguments to the components in your -function, \texttt{...} is no good: there's no way to direct different -arguments to different components. Instead, you'll need to think about -how you want your function to work, balancing the benefits of having one -function that does it all vs.~the cost of having a complex function -that's harder to understand. \indexc{...} - -To get you started, here's one approach using \texttt{modifyList()} and -\texttt{do.call()}: \indexf{modifyList} \indexf{do.call} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{geom_mean <-}\StringTok{ }\NormalTok{function(..., }\DataTypeTok{bar.params =} \KeywordTok{list}\NormalTok{(), }\DataTypeTok{errorbar.params =} \KeywordTok{list}\NormalTok{()) \{} - \NormalTok{params <-}\StringTok{ }\KeywordTok{list}\NormalTok{(...)} - \NormalTok{bar.params <-}\StringTok{ }\KeywordTok{modifyList}\NormalTok{(params, bar.params)} - \NormalTok{errorbar.params <-}\StringTok{ }\KeywordTok{modifyList}\NormalTok{(params, errorbar.params)} - - \NormalTok{bar <-}\StringTok{ }\KeywordTok{do.call}\NormalTok{(}\StringTok{"stat_summary"}\NormalTok{, }\KeywordTok{modifyList}\NormalTok{(} - \KeywordTok{list}\NormalTok{(}\DataTypeTok{fun.y =} \StringTok{"mean"}\NormalTok{, }\DataTypeTok{geom =} \StringTok{"bar"}\NormalTok{, }\DataTypeTok{fill =} \StringTok{"grey70"}\NormalTok{),} - \NormalTok{bar.params)} - 
\NormalTok{)} - \NormalTok{errorbar <-}\StringTok{ }\KeywordTok{do.call}\NormalTok{(}\StringTok{"stat_summary"}\NormalTok{, }\KeywordTok{modifyList}\NormalTok{(} - \KeywordTok{list}\NormalTok{(}\DataTypeTok{fun.data =} \StringTok{"mean_cl_normal"}\NormalTok{, }\DataTypeTok{geom =} \StringTok{"errorbar"}\NormalTok{, }\DataTypeTok{width =} \FloatTok{0.4}\NormalTok{),} - \NormalTok{errorbar.params)} - \NormalTok{)} - - \KeywordTok{list}\NormalTok{(bar, errorbar)} -\NormalTok{\}} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(class, cty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_mean}\NormalTok{(} - \DataTypeTok{colour =} \StringTok{"steelblue"}\NormalTok{,} - \DataTypeTok{errorbar.params =} \KeywordTok{list}\NormalTok{(}\DataTypeTok{width =} \FloatTok{0.5}\NormalTok{, }\DataTypeTok{size =} \DecValTok{1}\NormalTok{)} - \NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(class, cty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_mean}\NormalTok{(} - \DataTypeTok{bar.params =} \KeywordTok{list}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"steelblue"}\NormalTok{),} - \DataTypeTok{errorbar.params =} \KeywordTok{list}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"blue"}\NormalTok{)} - \NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.375\linewidth]{_figures/programming/unnamed-chunk-3-1}% - \includegraphics[width=0.375\linewidth]{_figures/programming/unnamed-chunk-3-2} -\end{figure} - -If you need more complex behaviour, it might be easier to create a -custom geom or stat. You can learn about that in the extending ggplot2 -vignette included with the package. Read it by running -\texttt{vignette("extending-ggplot2")}. - -\subsection{Exercises}\label{exercises-1} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - To make the best use of space, many examples in this book hide the - axes labels and legend. 
I've just copied-and-pasted the same code into - multiple places, but it would make more sense to create a reusable - function. What would that function look like? -\item - Extend the \texttt{borders()} function to also add - \texttt{coord\_quickmap()} to the plot. -\item - Look through your own code. What combinations of geoms or scales do - you use all the time? How could you extract the pattern into a - reusable function? -\end{enumerate} - -\section{Plot functions}\label{sec:functions} - -Creating small reusable components is most in line with the ggplot2 -spirit: you can recombine them flexibly to create whatever plot you -want. But sometimes you're creating the same plot over and over again, -and you don't need that flexibility. Instead of creating components, you -might want to write a function that takes data and parameters and -returns a complete plot. \index{Plot functions} - -For example, you could wrap up the complete code needed to make a -piechart: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{piechart <-}\StringTok{ }\NormalTok{function(data, mapping) \{} - \KeywordTok{ggplot}\NormalTok{(data, mapping) +} -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{width =} \DecValTok{1}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_polar}\NormalTok{(}\DataTypeTok{theta =} \StringTok{"y"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} -\NormalTok{\}} -\KeywordTok{piechart}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(}\KeywordTok{factor}\NormalTok{(}\DecValTok{1}\NormalTok{), }\DataTypeTok{fill =} \NormalTok{class))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.5\linewidth]{_figures/programming/unnamed-chunk-4-1} -\end{figure} - -This is much less flexible than the component based approach, but -equally, it's much more concise. 
Note that I was careful to return the -plot object, rather than printing it. That makes it possible to add on -other ggplot2 components. - -You can take a similar approach to drawing parallel coordinates plots -(PCPs). PCPs require a transformation of the data, so I recommend -writing two functions: one that does the transformation and one that -generates the plot. Keeping these two pieces separate makes life much -easier if you later want to reuse the same transformation for a -different visualisation. \index{Parallel coordinate plots} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{pcp_data <-}\StringTok{ }\NormalTok{function(df) \{} - \NormalTok{is_numeric <-}\StringTok{ }\KeywordTok{vapply}\NormalTok{(df, is.numeric, }\KeywordTok{logical}\NormalTok{(}\DecValTok{1}\NormalTok{))} - - \CommentTok{# Rescale numeric columns} - \NormalTok{rescale01 <-}\StringTok{ }\NormalTok{function(x) \{} - \NormalTok{rng <-}\StringTok{ }\KeywordTok{range}\NormalTok{(x, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{)} - \NormalTok{(x -}\StringTok{ }\NormalTok{rng[}\DecValTok{1}\NormalTok{]) /}\StringTok{ }\NormalTok{(rng[}\DecValTok{2}\NormalTok{] -}\StringTok{ }\NormalTok{rng[}\DecValTok{1}\NormalTok{])} - \NormalTok{\}} - \NormalTok{df[is_numeric] <-}\StringTok{ }\KeywordTok{lapply}\NormalTok{(df[is_numeric], rescale01)} - - \CommentTok{# Add row identifier} - \NormalTok{df$.row <-}\StringTok{ }\KeywordTok{rownames}\NormalTok{(df)} - - \CommentTok{# Treat numerics as value (aka measure) variables} - \CommentTok{# gather_ is the standard-evaluation version of gather, and} - \CommentTok{# is usually easier to program with.} - \NormalTok{tidyr::}\KeywordTok{gather_}\NormalTok{(df, }\StringTok{"variable"}\NormalTok{, }\StringTok{"value"}\NormalTok{, }\KeywordTok{names}\NormalTok{(df)[is_numeric])} -\NormalTok{\}} -\NormalTok{pcp <-}\StringTok{ }\NormalTok{function(df, ...) 
\{} - \NormalTok{df <-}\StringTok{ }\KeywordTok{pcp_data}\NormalTok{(df)} - \KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(variable, value, }\DataTypeTok{group =} \NormalTok{.row)) +}\StringTok{ }\KeywordTok{geom_line}\NormalTok{(...)} -\NormalTok{\}} -\KeywordTok{pcp}\NormalTok{(mpg)} -\KeywordTok{pcp}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{drv))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/programming/pcp_data-1}% - \includegraphics[width=0.5\linewidth]{_figures/programming/pcp_data-2} -\end{figure} - -A complete exploration of this idea is \texttt{qplot()}, which provides -a fairly deep wrapper around the most common \texttt{ggplot()} options. -I recommend studying the source code if you want to see how far these -basic techniques can take you. \indexf{qplot} - -\subsection{Indirectly referring to -variables}\label{indirectly-referring-to-variables} - -The \texttt{piechart()} function above is a little unappealing because -it requires the user to know the exact \texttt{aes()} specification that -generates a pie chart. It would be more convenient if the user could -simply specify the name of the variable to plot. To do that you'll need -to learn a bit more about how \texttt{aes()} works. - -\texttt{aes()} uses non-standard evaluation: rather than looking at the -values of its arguments, it looks at their expressions. This makes it -difficult to work with programmatically as there's no way to store the -name of a variable in an object and then refer to it later: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{x_var <-}\StringTok{ "displ"} -\KeywordTok{aes}\NormalTok{(x_var)} -\CommentTok{#> * x -> x_var} -\end{Highlighting} -\end{Shaded} - -Instead we need to use \texttt{aes\_()}, which uses regular evaluation. 
-There are two basic ways to create a mapping with \texttt{aes\_()}: -\indexf{aes\_} - -\begin{itemize} -\item - Using a \emph{quoted call}, created by \texttt{quote()}, - \texttt{substitute()}, \texttt{as.name()}, or \texttt{parse()}. - \indexf{quote} \indexf{substitute} \indexf{parse} \indexf{as.name} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{aes_}\NormalTok{(}\KeywordTok{quote}\NormalTok{(displ))} -\CommentTok{#> * x -> displ} -\KeywordTok{aes_}\NormalTok{(}\KeywordTok{as.name}\NormalTok{(x_var))} -\CommentTok{#> * x -> displ} -\KeywordTok{aes_}\NormalTok{(}\KeywordTok{parse}\NormalTok{(}\DataTypeTok{text =} \NormalTok{x_var)[[}\DecValTok{1}\NormalTok{]])} -\CommentTok{#> * x -> displ} - -\NormalTok{f <-}\StringTok{ }\NormalTok{function(x_var) \{} - \KeywordTok{aes_}\NormalTok{(}\KeywordTok{substitute}\NormalTok{(x_var))} -\NormalTok{\}} -\KeywordTok{f}\NormalTok{(displ)} -\CommentTok{#> * x -> displ} -\end{Highlighting} -\end{Shaded} - - The difference between \texttt{as.name()} and \texttt{parse()} is - subtle. If \texttt{x\_var} is ``a + b'', \texttt{as.name()} will turn - it into a variable called \texttt{`a\ +\ b`}, \texttt{parse()} will - turn it into the function call \texttt{a\ +\ b}. (If this is - confusing, \url{http://adv-r.had.co.nz/Expressions.html} might help). -\item - Using a formula, created with \texttt{\textasciitilde{}}. - \indexc{\textasciitilde} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{aes_}\NormalTok{(~displ)} -\CommentTok{#> * x -> displ} -\end{Highlighting} -\end{Shaded} -\end{itemize} - -\texttt{aes\_()} gives us three options for how a user can supply -variables: as a string, as a formula, or as a bare expression. These -three options are illustrated below - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{piechart1 <-}\StringTok{ }\NormalTok{function(data, var, ...) 
\{} - \KeywordTok{piechart}\NormalTok{(data, }\KeywordTok{aes_}\NormalTok{(~}\KeywordTok{factor}\NormalTok{(}\DecValTok{1}\NormalTok{), }\DataTypeTok{fill =} \KeywordTok{as.name}\NormalTok{(var)))} -\NormalTok{\}} -\KeywordTok{piechart1}\NormalTok{(mpg, }\StringTok{"class"}\NormalTok{) +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} - -\NormalTok{piechart2 <-}\StringTok{ }\NormalTok{function(data, var, ...) \{} - \KeywordTok{piechart}\NormalTok{(data, }\KeywordTok{aes_}\NormalTok{(~}\KeywordTok{factor}\NormalTok{(}\DecValTok{1}\NormalTok{), }\DataTypeTok{fill =} \NormalTok{var))} -\NormalTok{\}} -\KeywordTok{piechart2}\NormalTok{(mpg, ~class) +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} - -\NormalTok{piechart3 <-}\StringTok{ }\NormalTok{function(data, var, ...) \{} - \KeywordTok{piechart}\NormalTok{(data, }\KeywordTok{aes_}\NormalTok{(~}\KeywordTok{factor}\NormalTok{(}\DecValTok{1}\NormalTok{), }\DataTypeTok{fill =} \KeywordTok{substitute}\NormalTok{(var)))} -\NormalTok{\}} -\KeywordTok{piechart3}\NormalTok{(mpg, class) +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/programming/unnamed-chunk-8-1}% - \includegraphics[width=0.333\linewidth]{_figures/programming/unnamed-chunk-8-2}% - \includegraphics[width=0.333\linewidth]{_figures/programming/unnamed-chunk-8-3} -\end{figure} - -There's another advantage to \texttt{aes\_()} over \texttt{aes()} if -you're writing ggplot2 plots inside a package: using -\texttt{aes\_(\textasciitilde{}x,\ \textasciitilde{}y)} instead of -\texttt{aes(x,\ y)} avoids the global variables NOTE in -\texttt{R\ CMD\ check}. 
\index{Global variables} - -\subsection{The plot environment}\label{the-plot-environment} - -As you create more sophisticated plotting functions, you'll need to -understand a bit more about ggplot2's scoping rules. ggplot2 was written -well before I understood the full intricacies of non-standard -evaluation, so it has a rather simple scoping system. If a variable is -not found in the \texttt{data}, it is looked for in \emph{the} plot -environment. There is only one environment for a plot (not one for each -layer), and it is the environment in which \texttt{ggplot()} is called -from (i.e.~the \texttt{parent.frame()}). \index{Environments} -\indexf{parent.frame} - -This means that the following function won't work because \texttt{n} is -not stored in an environment accessible when the expressions in -\texttt{aes()} are evaluated. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{f <-}\StringTok{ }\NormalTok{function() \{} - \NormalTok{n <-}\StringTok{ }\DecValTok{10} - \KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(x /}\StringTok{ }\NormalTok{n)) } -\NormalTok{\}} -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{f}\NormalTok{()} -\CommentTok{#> Error in x/n: non-numeric argument to binary operator} -\end{Highlighting} -\end{Shaded} - -Note that this is only a problem with the \texttt{mapping} argument. All -other arguments are evaluated immediately so their values (not a -reference to a name) are stored in the plot object. 
This means the -following function will work: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{f <-}\StringTok{ }\NormalTok{function() \{} - \NormalTok{colour <-}\StringTok{ "blue"} - \KeywordTok{geom_line}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{colour) } -\NormalTok{\}} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{f}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -If you need to use a different environment for the plot, you can specify -it with the \texttt{environment} argument to \texttt{ggplot()}. You'll -need to do this if you're creating a plot function that takes user -provided data. See \texttt{qplot()} for an example. - -\subsection{Exercises}\label{exercises-2} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Create a \texttt{distribution()} function specially designed for - visualising continuous distributions. Allow the user to supply a - dataset and the name of a variable to visualise. Let them choose - between histograms, frequency polygons, and density plots. What other - arguments might you want to include? -\item - What additional arguments should \texttt{pcp()} take? What are the - downsides of how \texttt{...} is used in the current code? -\item - Advanced: why doesn't this code work? How can you fix it? 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{f <-}\StringTok{ }\NormalTok{function() \{} - \NormalTok{levs <-}\StringTok{ }\KeywordTok{c}\NormalTok{(}\StringTok{"2seater"}\NormalTok{, }\StringTok{"compact"}\NormalTok{, }\StringTok{"midsize"}\NormalTok{, }\StringTok{"minivan"}\NormalTok{, }\StringTok{"pickup"}\NormalTok{, } - \StringTok{"subcompact"}\NormalTok{, }\StringTok{"suv"}\NormalTok{)} - \KeywordTok{piechart3}\NormalTok{(mpg, }\KeywordTok{factor}\NormalTok{(class, }\DataTypeTok{levels =} \NormalTok{levs))} -\NormalTok{\}} -\KeywordTok{f}\NormalTok{()} -\CommentTok{#> Error in factor(class, levels = levs): object 'levs' not found} -\end{Highlighting} -\end{Shaded} -\end{enumerate} - -\section{Functional programming}\label{functional-programming} - -Since ggplot2 objects are just regular R objects, you can put them in a -list. This means you can apply all of R's great functional programming -tools. For example, if you wanted to add different geoms to the same -base plot, you could put them in a list and use \texttt{lapply()}. 
-\index{Functional programming} \indexf{lapply} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{geoms <-}\StringTok{ }\KeywordTok{list}\NormalTok{(} - \KeywordTok{geom_point}\NormalTok{(),} - \KeywordTok{geom_boxplot}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \KeywordTok{cut_width}\NormalTok{(displ, }\DecValTok{1}\NormalTok{))),} - \KeywordTok{list}\NormalTok{(}\KeywordTok{geom_point}\NormalTok{(), }\KeywordTok{geom_smooth}\NormalTok{())} -\NormalTok{)} - -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy))} -\KeywordTok{lapply}\NormalTok{(geoms, function(g) p +}\StringTok{ }\NormalTok{g)} -\CommentTok{#> [[1]]} -\CommentTok{#> } -\CommentTok{#> [[2]]} -\CommentTok{#> } -\CommentTok{#> [[3]]} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/programming/unnamed-chunk-12-1}% - \includegraphics[width=0.333\linewidth]{_figures/programming/unnamed-chunk-12-2}% - \includegraphics[width=0.333\linewidth]{_figures/programming/unnamed-chunk-12-3} -\end{figure} - -If you're not familiar with functional programming, read through -\url{http://adv-r.had.co.nz/Functional-programming.html} and think about -how you might apply the techniques to your duplicated plotting code. - -\subsection{Exercises}\label{exercises-3} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - How could you add a \texttt{geom\_point()} layer to each element of - the following list? 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{plots <-}\StringTok{ }\KeywordTok{list}\NormalTok{(} - \KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)),} - \KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(carat, price)),} - \KeywordTok{ggplot}\NormalTok{(faithfuld, }\KeywordTok{aes}\NormalTok{(waiting, eruptions, }\DataTypeTok{size =} \NormalTok{density))} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} -\item - What does the following function do? What's a better name for it? - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mystery <-}\StringTok{ }\NormalTok{function(...) \{} - \KeywordTok{Reduce}\NormalTok{(}\StringTok{`}\DataTypeTok{+}\StringTok{`}\NormalTok{, }\KeywordTok{list}\NormalTok{(...), }\DataTypeTok{accumulate =} \OtherTok{TRUE}\NormalTok{)} -\NormalTok{\}} - -\KeywordTok{mystery}\NormalTok{(} - \KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(), } - \KeywordTok{geom_smooth}\NormalTok{(), } - \KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{), } - \KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} -\end{enumerate} - -\hypertarget{refs}{} diff --git a/book/tex/scales.tex b/book/tex/scales.tex deleted file mode 100644 index 8a0903ef..00000000 --- a/book/tex/scales.tex +++ /dev/null @@ -1,1704 +0,0 @@ -\chapter{Scales, axes and legends}\label{cha:scales} - -\section{Introduction}\label{introduction} - -Scales control the mapping from data to aesthetics. They take your data -and turn it into something that you can see, like size, colour, position -or shape. Scales also provide the tools that let you read the plot: the -axes and legends. Formally, each scale is a function from a region in -data space (the domain of the scale) to a region in aesthetic space (the -range of the scale). 
The axis or legend is the inverse function: it -allows you to convert visual properties back to data. \index{Scales} - -You can generate many plots without knowing how scales work, but -understanding scales and learning how to manipulate them will give you -much more control. The basics of working with scales is described in -\protect\hyperlink{sec:scale-usage}{scale usage}. -\protect\hyperlink{sec:guides}{Guides} discusses the common parameters -that control the axes and legends. Legends are particularly complicated -so have an additional set of options as described in -\protect\hyperlink{sec:legends}{legends}. -\protect\hyperlink{sec:limits}{Limits} shows how to use limits to both -zoom into interesting parts of a plot, and to ensure that multiple plots -have matching legends and axes. -\protect\hyperlink{sec:scale-details}{Scale details} gives an overview -of the different types of scales available in ggplot2, which can be -roughly divided into four categories: continuous position scales, colour -scales, manual scales and identity scales. - -\hypertarget{sec:scale-usage}{\section{Modifying -scales}\label{sec:scale-usage}} - -A scale is required for every aesthetic used on the plot. 
When you -write: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{class))} -\end{Highlighting} -\end{Shaded} - -What actually happens is this: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{class)) +} -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_colour_discrete}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -Default scales are named according to the aesthetic and the variable -type: \texttt{scale\_y\_continuous()}, -\texttt{scale\_colour\_discrete()}, etc. - -It would be tedious to manually add a scale every time you used a new -aesthetic, so ggplot2 does it for you. But if you want to override the -defaults, you'll need to add the scale yourself, like this: -\index{Scales!defaults} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{class)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\StringTok{"A really awesome x axis label"}\NormalTok{) +} -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\StringTok{"An amazingly great y axis label"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -The use of \texttt{+} to ``add'' scales to a plot is a little -misleading. When you \texttt{+} a scale, you're not actually adding it -to the plot, but overriding the existing scale. 
This means that the -following two specifications are equivalent: \indexc{+} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\StringTok{"Label 1"}\NormalTok{) +} -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\StringTok{"Label 2"}\NormalTok{)} -\CommentTok{#> Scale for 'x' is already present. Adding another scale for 'x',} -\CommentTok{#> which will replace the existing scale.} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\StringTok{"Label 2"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -Note the message: if you see this in your own code, you need to -reorganise your code specification to only add a single scale. - -You can also use a different scale altogether: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{class)) +} -\StringTok{ }\KeywordTok{scale_x_sqrt}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_colour_brewer}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -You've probably already figured out the naming scheme for scales, but to -be concrete, it's made up of three pieces separated by ``\_``: -\index{Scales!naming scheme} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\tightlist -\item - \texttt{scale} -\item - The name of the aesthetic (e.g., \texttt{colour}, \texttt{shape} or - \texttt{x}) -\item - The name of the scale (e.g., \texttt{continuous}, \texttt{discrete}, - \texttt{brewer}). 
-\end{enumerate} - -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - What happens if you pair a discrete variable to a continuous scale? - What happens if you pair a continuous variable to a discrete scale? -\item - Simplify the following plot specifications to make them easier to - understand. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\StringTok{"Highway mpg"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{() +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{hwy))} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{displ, }\DataTypeTok{x =} \NormalTok{class)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\StringTok{"Displacement (l)"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_discrete}\NormalTok{(}\StringTok{"Car type"}\NormalTok{) +} -\StringTok{ }\KeywordTok{scale_x_discrete}\NormalTok{(}\StringTok{"Type of car"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_colour_discrete}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{drv)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_colour_discrete}\NormalTok{(}\StringTok{"Drive}\CharTok{\textbackslash{}n}\StringTok{train"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} -\end{enumerate} - -\hypertarget{sec:guides}{\section{Guides: legends and -axes}\label{sec:guides}} - -The component of a scale that you're most likely to want to modify is -the \textbf{guide}, the axis or legend associated with the scale. Guides -allow you to read observations from the plot and map them back to their -original values. 
In ggplot2, guides are produced automatically based on -the layers in your plot. This is very different to base R graphics, -where you are responsible for drawing the legends by hand. In ggplot2, -you don't directly control the legend; instead you set up the data so -that there's a clear mapping between data and aesthetics, and a legend -is generated for you automatically. This can be frustrating when you -first start using ggplot2, but once you get the hang of it, you'll find -that it saves you time, and there is little you cannot do. If you're -struggling to get the legend you want, it's likely that your data is in -the wrong form. Read \protect\hyperlink{cha:data}{tidying} to find out -the right form. - -You might find it surprising that axes and legends are the same type of -thing, but while they look very different there are many natural -correspondences between the two, as shown in table below and in Figure -\ref{fig:guides}. \index{Guides} \index{Legend} \index{Axis} - -\begin{figure}[htbp] - \centering - \includegraphics[width=\linewidth]{diagrams/scale-guides.pdf} - \caption{Axis and legend components} - \label{fig:guides} -\end{figure} - -\begin{longtable}[c]{@{}lll@{}} -\toprule -Axis & Legend & Argument name\tabularnewline -\midrule -\endhead -Label & Title & \texttt{name}\tabularnewline -Ticks \& grid line & Key & \texttt{breaks}\tabularnewline -Tick label & Key label & \texttt{labels}\tabularnewline -\bottomrule -\end{longtable} - -The following sections covers each of the \texttt{name}, \texttt{breaks} -and \texttt{labels} arguments in more detail. - -\subsection{Scale title}\label{scale-title} - -The first argument to the scale function, \texttt{name}, is the -axes/legend title. 
You can supply text strings (using -\texttt{\textbackslash{}n} for line breaks) or mathematical expressions -in \texttt{quote()} (as described in \texttt{?plotmath}): -\index{Axis!title} \index{Legend!title} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{2}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{, }\DataTypeTok{z =} \StringTok{"a"}\NormalTok{)} -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\NormalTok{p +}\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\StringTok{"X axis"}\NormalTok{)} -\NormalTok{p +}\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\KeywordTok{quote}\NormalTok{(a +}\StringTok{ }\NormalTok{mathematical ^}\StringTok{ }\NormalTok{expression))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/guide-names-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/guide-names-2} -\end{figure} - -Because tweaking these labels is such a common task, there are three -helpers that save you some typing: \texttt{xlab()}, \texttt{ylab()} and -\texttt{labs()}: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z))} -\NormalTok{p +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\StringTok{"X axis"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\StringTok{"Y axis"}\NormalTok{)} -\NormalTok{p +}\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \StringTok{"X axis"}\NormalTok{, }\DataTypeTok{y =} \StringTok{"Y axis"}\NormalTok{, }\DataTypeTok{colour =} 
\StringTok{"Colour}\CharTok{\textbackslash{}n}\StringTok{legend"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/guide-names-helper-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/guide-names-helper-2} -\end{figure} - -There are two ways to remove the axis label. Setting it to \texttt{""} -omits the label, but still allocates space; \texttt{NULL} removes the -label and its space. Look closely at the left and bottom borders of the -following two plots. I've drawn a grey rectangle around the plot to make -it easier to see the difference. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{))} -\NormalTok{p +}\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \StringTok{""}\NormalTok{, }\DataTypeTok{y =} \StringTok{""}\NormalTok{)} -\NormalTok{p +}\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \OtherTok{NULL}\NormalTok{, }\DataTypeTok{y =} \OtherTok{NULL}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/guide-names-remove-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/guide-names-remove-2} -\end{figure} - -\subsection{Breaks and labels}\label{breaks-and-labels} - -The \texttt{breaks} argument controls which values appear as tick marks -on axes and keys on legends. Each break has an associated label, -controlled by the \texttt{labels} argument. If you set \texttt{labels}, -you must also set \texttt{breaks}; otherwise, if data changes, the -breaks will no longer align with the labels. 
\index{Axis!ticks} -\index{Axis!breaks} \index{Axis!labels} \index{Legend!keys} - -The following code shows some basic examples for both axes and legends. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{3}\NormalTok{, }\DecValTok{5}\NormalTok{) *}\StringTok{ }\DecValTok{1000}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{)} -\NormalTok{axs <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \OtherTok{NULL}\NormalTok{, }\DataTypeTok{y =} \OtherTok{NULL}\NormalTok{)} -\NormalTok{axs} -\NormalTok{axs +}\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\DataTypeTok{breaks =} \KeywordTok{c}\NormalTok{(}\DecValTok{2000}\NormalTok{, }\DecValTok{4000}\NormalTok{))} -\NormalTok{axs +}\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\DataTypeTok{breaks =} \KeywordTok{c}\NormalTok{(}\DecValTok{2000}\NormalTok{, }\DecValTok{4000}\NormalTok{), }\DataTypeTok{labels =} \KeywordTok{c}\NormalTok{(}\StringTok{"2k"}\NormalTok{, }\StringTok{"4k"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/breaks-labels-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/breaks-labels-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/breaks-labels-3} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{leg <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(y, x, }\DataTypeTok{fill =} \NormalTok{x)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_tile}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \OtherTok{NULL}\NormalTok{, }\DataTypeTok{y =} \OtherTok{NULL}\NormalTok{)} 
-\NormalTok{leg} -\NormalTok{leg +}\StringTok{ }\KeywordTok{scale_fill_continuous}\NormalTok{(}\DataTypeTok{breaks =} \KeywordTok{c}\NormalTok{(}\DecValTok{2000}\NormalTok{, }\DecValTok{4000}\NormalTok{))} -\NormalTok{leg +}\StringTok{ }\KeywordTok{scale_fill_continuous}\NormalTok{(}\DataTypeTok{breaks =} \KeywordTok{c}\NormalTok{(}\DecValTok{2000}\NormalTok{, }\DecValTok{4000}\NormalTok{), }\DataTypeTok{labels =} \KeywordTok{c}\NormalTok{(}\StringTok{"2k"}\NormalTok{, }\StringTok{"4k"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-5-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-5-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-5-3} -\end{figure} - -If you want to relabel the breaks in a categorical scale, you can use a -named labels vector: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df2 <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"c"}\NormalTok{))} -\KeywordTok{ggplot}\NormalTok{(df2, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(df2, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_discrete}\NormalTok{(}\DataTypeTok{labels =} \KeywordTok{c}\NormalTok{(}\DataTypeTok{a =} \StringTok{"apple"}\NormalTok{, }\DataTypeTok{b =} \StringTok{"banana"}\NormalTok{, }\DataTypeTok{c =} \StringTok{"carrot"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-6-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-6-2} -\end{figure} - 
-To suppress breaks (and for axes, grid lines) or labels, set them to -\texttt{NULL}: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{axs +}\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\DataTypeTok{breaks =} \OtherTok{NULL}\NormalTok{)} -\NormalTok{axs +}\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\DataTypeTok{labels =} \OtherTok{NULL}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/axs-breaks-hide-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/axs-breaks-hide-2} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{leg +}\StringTok{ }\KeywordTok{scale_fill_continuous}\NormalTok{(}\DataTypeTok{breaks =} \OtherTok{NULL}\NormalTok{)} -\NormalTok{leg +}\StringTok{ }\KeywordTok{scale_fill_continuous}\NormalTok{(}\DataTypeTok{labels =} \OtherTok{NULL}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/leg-breaks-hide-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/leg-breaks-hide-2} -\end{figure} - -Additionally, you can supply a function to \texttt{breaks} or -\texttt{labels}. The \texttt{breaks} function should have one argument, -the limits (a numeric vector of length two), and should return a numeric -vector of breaks. The \texttt{labels} function should accept a numeric -vector of breaks and return a character vector of labels (the same -length as the input). The scales package provides a number of useful -labelling functions: - -\begin{itemize} -\item - \texttt{scales::comma\_format()} adds commas to make it easier to read - large numbers. -\item - \texttt{scales::unit\_format(unit,\ scale)} adds a unit suffix, - optionally scaling. -\item - \texttt{scales::dollar\_format(prefix,\ suffix)} displays currency - values, rounding to two decimal places and adding a prefix or suffix. 
-\item - \texttt{scales::wrap\_format()} wraps long labels into multiple lines. -\end{itemize} - -See the documentation of the scales package for more details. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{axs +}\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\DataTypeTok{labels =} \NormalTok{scales::}\KeywordTok{percent_format}\NormalTok{())} -\NormalTok{axs +}\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\DataTypeTok{labels =} \NormalTok{scales::}\KeywordTok{dollar_format}\NormalTok{(}\StringTok{"$"}\NormalTok{))} -\NormalTok{leg +}\StringTok{ }\KeywordTok{scale_fill_continuous}\NormalTok{(}\DataTypeTok{labels =} \NormalTok{scales::}\KeywordTok{unit_format}\NormalTok{(}\StringTok{"k"}\NormalTok{, }\FloatTok{1e-3}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/breaks-functions-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/breaks-functions-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/breaks-functions-3} -\end{figure} - -You can adjust the minor breaks (the faint grid lines that appear -between the major grid lines) by supplying a numeric vector of positions -to the \texttt{minor\_breaks} argument. 
This is particularly useful for -log scales: \index{Minor breaks} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \KeywordTok{c}\NormalTok{(}\DecValTok{2}\NormalTok{, }\DecValTok{3}\NormalTok{, }\DecValTok{5}\NormalTok{, }\DecValTok{10}\NormalTok{, }\DecValTok{200}\NormalTok{, }\DecValTok{3000}\NormalTok{), }\DataTypeTok{y =} \DecValTok{1}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_log10}\NormalTok{()} - -\NormalTok{mb <-}\StringTok{ }\KeywordTok{as.numeric}\NormalTok{(}\DecValTok{1}\NormalTok{:}\DecValTok{10} \NormalTok{%o%}\StringTok{ }\DecValTok{10} \NormalTok{^}\StringTok{ }\NormalTok{(}\DecValTok{0}\NormalTok{:}\DecValTok{4}\NormalTok{))} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_log10}\NormalTok{(}\DataTypeTok{minor_breaks =} \KeywordTok{log10}\NormalTok{(mb))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-7-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-7-2} -\end{figure} - -Note the use of \texttt{\%o\%} to quickly generate the multiplication -table, and that the minor breaks must be supplied on the transformed -scale. \index{Log!ticks} - -\subsection{Exercises}\label{exercises-1} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Recreate the following graphic: - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-8-1} - \end{figure} - - Adjust the y axis label so that the parentheses are the right size. -\item - List the three different types of object you can supply to the - \texttt{breaks} argument. 
How do \texttt{breaks} and \texttt{labels} - differ? -\item - Recreate the following plot: - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-9-1} - \end{figure} -\item - What label function allows you to create mathematical expressions? - What label function converts 1 to 1st, 2 to 2nd, and so on? -\item - What are the three most important arguments that apply to both axes - and legends? What do they do? Compare and contrast their operation for - axes vs.~legends. -\end{enumerate} - -\hypertarget{sec:legends}{\section{Legends}\label{sec:legends}} - -While the most important parameters are shared between axes and legends, -there are some extra options that only apply to legends. Legends are -more complicated than axes because: \index{Legend} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - A legend can display multiple aesthetics (e.g.~colour and shape), from - multiple layers, and the symbol displayed in a legend varies based on - the geom used in the layer. -\item - Axes always appear in the same place. Legends can appear in different - places, so you need some global way of controlling them. -\item - Legends have considerably more details that can be tweaked: should - they be displayed vertically or horizontally? How many columns? How - big should the keys be? -\end{enumerate} - -The following sections describe the options that control these -interactions. - -\hypertarget{sub-layers-legends}{\subsection{Layers and -legends}\label{sub-layers-legends}} - -A legend may need to draw symbols from multiple layers. For example, if -you've mapped colour to both points and lines, the keys will show both -points and lines. If you've mapped fill colour, you get a rectangle. 
-Note the way the legend varies in the plots below: - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-geom-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-geom-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-geom-3} -\end{figure} - -By default, a layer will only appear if the corresponding aesthetic is -mapped to a variable with \texttt{aes()}. You can override whether or -not a layer appears in the legend with \texttt{show.legend}: -\texttt{FALSE} to prevent a layer from ever appearing in the legend; -\texttt{TRUE} forces it to appear when it otherwise wouldn't. Using -\texttt{TRUE} can be useful in conjunction with the following trick to -make points stand out: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(y, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{size =} \DecValTok{4}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"grey20"}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z), }\DataTypeTok{size =} \DecValTok{2}\NormalTok{) } -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(y, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{size =} \DecValTok{4}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"grey20"}\NormalTok{, }\DataTypeTok{show.legend =} \OtherTok{TRUE}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z), }\DataTypeTok{size =} \DecValTok{2}\NormalTok{) } -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-10-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-10-2} -\end{figure} - -Sometimes you want the geoms in the legend to display differently to the -geoms in the plot. 
This is particularly useful when you've used -transparency or size to deal with moderate overplotting and also used -colour in the plot. You can do this using the \texttt{override.aes} -parameter of \texttt{guide\_legend()}, which you'll learn more about -shortly. \indexf{override.aes} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{norm <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \KeywordTok{rnorm}\NormalTok{(}\DecValTok{1000}\NormalTok{), }\DataTypeTok{y =} \KeywordTok{rnorm}\NormalTok{(}\DecValTok{1000}\NormalTok{))} -\NormalTok{norm$z <-}\StringTok{ }\KeywordTok{cut}\NormalTok{(norm$x, }\DecValTok{3}\NormalTok{, }\DataTypeTok{labels =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"c"}\NormalTok{))} -\KeywordTok{ggplot}\NormalTok{(norm, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z), }\DataTypeTok{alpha =} \FloatTok{0.1}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(norm, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z), }\DataTypeTok{alpha =} \FloatTok{0.1}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{guides}\NormalTok{(}\DataTypeTok{colour =} \KeywordTok{guide_legend}\NormalTok{(}\DataTypeTok{override.aes =} \KeywordTok{list}\NormalTok{(}\DataTypeTok{alpha =} \DecValTok{1}\NormalTok{)))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-11-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-11-2} -\end{figure} - -ggplot2 tries to use the fewest number of legends to accurately convey -the aesthetics used in the plot. It does this by combining legends where -the same variable is mapped to different aesthetics. 
The figure below -shows how this works for points: if both colour and shape are mapped to -the same variable, then only a single legend is necessary. -\index{Legend!merging} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z))} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{shape =} \NormalTok{z))} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{shape =} \NormalTok{z, }\DataTypeTok{colour =} \NormalTok{z))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-merge-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-merge-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-merge-3} -\end{figure} - -In order for legends to be merged, they must have the same -\texttt{name}. So if you change the name of one of the scales, you'll -need to change it for all of them. - -\subsection{Legend layout}\label{sub:legend-layout} - -A number of settings that affect the overall display of the legends are -controlled through the theme system. You'll learn more about that in -\protect\hyperlink{sec:themes}{themes}, but for now, all you need to -know is that you modify theme settings with the \texttt{theme()} -function. \index{Themes!legend} - -The position and justification of legends are controlled by the theme -setting \texttt{legend.position}, which takes values ``right'', -``left'', ``top'', ``bottom'', or ``none'' (no legend). 
-\index{Legend!layout} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{z =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"c"}\NormalTok{))} -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z), }\DataTypeTok{size =} \DecValTok{3}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} - -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"right"}\NormalTok{) }\CommentTok{# the default } -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"bottom"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-position-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-position-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-position-3} -\end{figure} - -Switching between left/right and top/bottom modifies how the keys in -each legend are laid out (horizontally or vertically), and how multiple -legends are stacked (horizontally or vertically). If needed, you can -adjust those options independently: - -\begin{itemize} -\item - \texttt{legend.direction}: layout of items in legends (``horizontal'' - or ``vertical'').
-\item - \texttt{legend.box}: arrangement of multiple legends (``horizontal'' - or ``vertical''). -\item - \texttt{legend.box.just}: justification of each legend within the - overall bounding box, when there are multiple legends (``top'', - ``bottom'', ``left'', or ``right''). -\end{itemize} - -Alternatively, if there's a lot of blank space in your plot you might -want to place the legend inside the plot. You can do this by setting -\texttt{legend.position} to a numeric vector of length two. The numbers -represent a relative location in the panel area: \texttt{c(0,\ 1)} is -the top-left corner and \texttt{c(1,\ 0)} is the bottom-right corner. -You control which corner of the legend the \texttt{legend.position} -refers to with \texttt{legend.justification}, which is specified in a -similar way. Unfortunately positioning the legend exactly where you want -it requires a lot of trial and error. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z), }\DataTypeTok{size =} \DecValTok{3}\NormalTok{)} - -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{1}\NormalTok{), }\DataTypeTok{legend.justification =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{1}\NormalTok{))} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \KeywordTok{c}\NormalTok{(}\FloatTok{0.5}\NormalTok{, }\FloatTok{0.5}\NormalTok{), }\DataTypeTok{legend.justification =} \KeywordTok{c}\NormalTok{(}\FloatTok{0.5}\NormalTok{, }\FloatTok{0.5}\NormalTok{))} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, 
}\DecValTok{0}\NormalTok{), }\DataTypeTok{legend.justification =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{0}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-position-man-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-position-man-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-position-man-3} -\end{figure} - -There's also a margin around the legends, which you can suppress with -\texttt{legend.margin\ =\ unit(0,\ "mm")}. - -\subsection{Guide functions}\label{guide-functions} - -The guide functions, \texttt{guide\_colourbar()} and -\texttt{guide\_legend()}, offer additional control over the fine details -of the legend. Legend guides can be used for any aesthetic (discrete or -continuous) while the colour bar guide can only be used with continuous -colour scales. - -You can override the default guide using the \texttt{guide} argument of -the corresponding scale function, or more conveniently, the -\texttt{guides()} helper function. \texttt{guides()} works like -\texttt{labs()}: you can override the default guide associated with each -aesthetic. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{z =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{)} -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_raster}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{z))} -\NormalTok{base } -\NormalTok{base +}\StringTok{ }\KeywordTok{scale_fill_continuous}\NormalTok{(}\DataTypeTok{guide =} \KeywordTok{guide_legend}\NormalTok{())} -\NormalTok{base +}\StringTok{ }\KeywordTok{guides}\NormalTok{(}\DataTypeTok{fill =} \KeywordTok{guide_legend}\NormalTok{())} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-12-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-12-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-12-3} -\end{figure} - -Both functions have numerous examples in their documentation help pages -that illustrate all of their arguments. Most of the arguments to the -guide function control the fine level details of the text colour, size, -font etc. You'll learn about those in the themes chapter. Here I'll -focus on the most important arguments. - -\subsubsection{\texorpdfstring{\texttt{guide\_legend()}}{guide\_legend()}}\label{guideux5flegend} - -The legend guide displays individual keys in a table. The most useful -options are: \index{Legend!guide} - -\begin{itemize} -\item - \texttt{nrow} or \texttt{ncol} which specify the dimensions of the - table. \texttt{byrow} controls how the table is filled: \texttt{FALSE} - fills it by column (the default), \texttt{TRUE} fills it by row. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{4}\NormalTok{, }\DataTypeTok{z =} \NormalTok{letters[}\DecValTok{1}\NormalTok{:}\DecValTok{4}\NormalTok{])} -\CommentTok{# Base plot} -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_raster}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{z))} -\NormalTok{p} -\NormalTok{p +}\StringTok{ }\KeywordTok{guides}\NormalTok{(}\DataTypeTok{fill =} \KeywordTok{guide_legend}\NormalTok{(}\DataTypeTok{ncol =} \DecValTok{2}\NormalTok{))} -\NormalTok{p +}\StringTok{ }\KeywordTok{guides}\NormalTok{(}\DataTypeTok{fill =} \KeywordTok{guide_legend}\NormalTok{(}\DataTypeTok{ncol =} \DecValTok{2}\NormalTok{, }\DataTypeTok{byrow =} \OtherTok{TRUE}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-rows-cols-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-rows-cols-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/legend-rows-cols-3} - \end{figure} -\item - \texttt{reverse} reverses the order of the keys. 
This is particularly - useful when you have stacked bars because the default stacking and - legend orders are different: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(}\DecValTok{1}\NormalTok{, y)) +}\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{stat =} \StringTok{"identity"}\NormalTok{, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{z))} -\NormalTok{p} -\NormalTok{p +}\StringTok{ }\KeywordTok{guides}\NormalTok{(}\DataTypeTok{fill =} \KeywordTok{guide_legend}\NormalTok{(}\DataTypeTok{reverse =} \OtherTok{TRUE}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-13-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-13-2} - \end{figure} -\item - \texttt{override.aes}: override some of the aesthetic settings derived - from each layer. This is useful if you want to make the elements in - the legend more visually prominent. See discussion in - \protect\hyperlink{sub-layers-legends}{layers and legends}. -\item - \texttt{keywidth} and \texttt{keyheight} (along with - \texttt{default.unit}) allow you to specify the size of the keys. - These are grid units, e.g. \texttt{unit(1,\ "cm")}. -\end{itemize} - -\subsubsection{\texorpdfstring{\texttt{guide\_colourbar}}{guide\_colourbar}}\label{guideux5fcolourbar} - -The colour bar guide is designed for continuous ranges of colors---as -its name implies, it outputs a rectangle over which the color gradient -varies. The most important arguments are: \index{Legend!colour bar} -\index{Colour bar} - -\begin{itemize} -\item - \texttt{barwidth} and \texttt{barheight} (along with - \texttt{default.unit}) allow you to specify the size of the bar. These - are grid units, e.g. \texttt{unit(1,\ "cm")}. -\item - \texttt{nbin} controls the number of slices. 
You may want to increase - this from the default value of 20 if you draw a very long bar. -\item - \texttt{reverse} flips the colour bar to put the lowest values at the - top. -\end{itemize} - -These options are illustrated below: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{4}\NormalTok{, }\DataTypeTok{z =} \DecValTok{4}\NormalTok{:}\DecValTok{1}\NormalTok{)} -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_tile}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{z))} - -\NormalTok{p} -\NormalTok{p +}\StringTok{ }\KeywordTok{guides}\NormalTok{(}\DataTypeTok{fill =} \KeywordTok{guide_colorbar}\NormalTok{(}\DataTypeTok{reverse =} \OtherTok{TRUE}\NormalTok{))} -\NormalTok{p +}\StringTok{ }\KeywordTok{guides}\NormalTok{(}\DataTypeTok{fill =} \KeywordTok{guide_colorbar}\NormalTok{(}\DataTypeTok{barheight =} \KeywordTok{unit}\NormalTok{(}\DecValTok{4}\NormalTok{, }\StringTok{"cm"}\NormalTok{)))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-14-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-14-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-14-3} -\end{figure} - -\subsection{Exercises}\label{exercises-2} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - How do you make legends appear to the left of the plot? -\item - What's gone wrong with this plot? How could you fix it? 
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{drv, }\DataTypeTok{shape =} \NormalTok{drv)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_colour_discrete}\NormalTok{(}\StringTok{"Drive train"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/scales/unnamed-chunk-15-1} - \end{figure} -\item - Can you recreate the code for this plot? - - \begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/scales/unnamed-chunk-16-1} - \end{figure} -\end{enumerate} - -\hypertarget{sec:limits}{\section{Limits}\label{sec:limits}} - -The limits, or domain, of a scale are usually derived from the range of -the data. \index{Axis!limits} \index{Scales!limits} There are two -reasons you might want to specify limits rather than relying on the -data: - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - You want to make limits smaller than the range of the data to focus on - an interesting area of the plot. -\item - You want to make the limits larger than the range of the data because - you want multiple plots to match up. -\end{enumerate} - -It's most natural to think about the limits of position scales: they map -directly to the ranges of the axes. But limits also apply to scales that -have legends, like colour, size, and shape. This is particularly -important to realise if you want your colours to match up across -multiple plots in your paper. - -You can modify the limits using the \texttt{limits} parameter of the -scale: - -\begin{itemize} -\item - For continuous scales, this should be a numeric vector of length two. - If you only want to set the upper or lower limit, you can set the - other value to \texttt{NA}. 
-\item - For discrete scales, this is a character vector which enumerates all - possible values. -\end{itemize} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{)} -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{() } - -\NormalTok{base} -\NormalTok{base +}\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\DataTypeTok{limits =} \KeywordTok{c}\NormalTok{(}\FloatTok{1.5}\NormalTok{, }\FloatTok{2.5}\NormalTok{))} -\CommentTok{#> Warning: Removed 2 rows containing missing values (geom_point).} -\NormalTok{base +}\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\DataTypeTok{limits =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{4}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-17-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-17-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-17-3} -\end{figure} - -Because modifying the limits is such a common task, ggplot2 provides -some helpers to make this even easier: \texttt{xlim()}, \texttt{ylim()} -and \texttt{lims()}. These functions inspect their input and then create -the appropriate scale, as follows: \indexf{xlim} \indexf{ylim} - -\begin{itemize} -\tightlist -\item - \texttt{xlim(10,\ 20)}: a continuous scale from 10 to 20 -\item - \texttt{ylim(20,\ 10)}: a reversed continuous scale from 20 to 10 -\item - \texttt{xlim("a",\ "b",\ "c")}: a discrete scale -\item - \texttt{xlim(as.Date(c("2008-05-01",\ "2008-08-01")))}: a date scale - from May 1 to August 1 2008.
-\end{itemize} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{4}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{4}\NormalTok{, }\DecValTok{0}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{lims}\NormalTok{(}\DataTypeTok{x =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{4}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-18-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-18-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-18-3} -\end{figure} - -If you have eagle eyes, you'll have noticed that the range of the axes -actually extends a little bit past the limits that you've specified. -This ensures that the data does not overlap the axes. To eliminate this -space, set \texttt{expand\ =\ c(0,\ 0)}. 
This is useful in conjunction -with \texttt{geom\_raster()}: \index{Axis!expansion} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(faithfuld, }\KeywordTok{aes}\NormalTok{(waiting, eruptions)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_raster}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{density)) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(faithfuld, }\KeywordTok{aes}\NormalTok{(waiting, eruptions)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_raster}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{density)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\DataTypeTok{expand =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{,}\DecValTok{0}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\DataTypeTok{expand =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{,}\DecValTok{0}\NormalTok{)) +} -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.33\linewidth]{_figures/scales/unnamed-chunk-19-1}% - \includegraphics[width=0.33\linewidth]{_figures/scales/unnamed-chunk-19-2} -\end{figure} - -By default, any data outside the limits is converted to \texttt{NA}. -This means that setting the limits is not the same as visually zooming -in to a region of the plot. To do that, you need to use the -\texttt{xlim} and \texttt{ylim} arguments to -\texttt{coord\_cartesian()}, described in -\protect\hyperlink{sub:cartesian}{cartesian coordinate systems}. This -performs purely visual zooming and does not affect the underlying data. -\index{Zooming} You can change how out-of-limits values are handled with the \texttt{oob} (out of -bounds) argument to the scale.
The default is \texttt{scales::censor()} -which replaces any value outside the limits with \texttt{NA}. Another -option is \texttt{scales::squish()} which squishes all values into the -range: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{5}\NormalTok{)} -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, }\DecValTok{1}\NormalTok{)) +}\StringTok{ }\KeywordTok{geom_tile}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{x), }\DataTypeTok{colour =} \StringTok{"white"}\NormalTok{)} -\NormalTok{p} -\NormalTok{p +}\StringTok{ }\KeywordTok{scale_fill_gradient}\NormalTok{(}\DataTypeTok{limits =} \KeywordTok{c}\NormalTok{(}\DecValTok{2}\NormalTok{, }\DecValTok{4}\NormalTok{))} -\NormalTok{p +}\StringTok{ }\KeywordTok{scale_fill_gradient}\NormalTok{(}\DataTypeTok{limits =} \KeywordTok{c}\NormalTok{(}\DecValTok{2}\NormalTok{, }\DecValTok{4}\NormalTok{), }\DataTypeTok{oob =} \NormalTok{scales::squish)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.33\linewidth]{_figures/scales/unnamed-chunk-20-1}% - \includegraphics[width=0.33\linewidth]{_figures/scales/unnamed-chunk-20-2}% - \includegraphics[width=0.33\linewidth]{_figures/scales/unnamed-chunk-20-3} -\end{figure} - -\subsection{Exercises}\label{exercises-3} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - The following code creates two plots of the mpg dataset. Modify the - code so that the legend and axes match, without using facetting! 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{fwd <-}\StringTok{ }\KeywordTok{subset}\NormalTok{(mpg, drv ==}\StringTok{ "f"}\NormalTok{)} -\NormalTok{rwd <-}\StringTok{ }\KeywordTok{subset}\NormalTok{(mpg, drv ==}\StringTok{ "r"}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(fwd, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \NormalTok{class)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(rwd, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \NormalTok{class)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-21-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-21-2} - \end{figure} -\item - What does \texttt{expand\_limits()} do and how does it work? Read the - source code. -\item - What happens if you add two \texttt{xlim()} calls to the same plot? - Why? -\item - What does \texttt{scale\_x\_continuous(limits\ =\ c(NA,\ NA))} do? -\end{enumerate} - -\hypertarget{sec:scale-details}{\section{Scales -toolbox}\label{sec:scale-details}} - -As well as tweaking the options of the default scales, you can also -override them completely with new scales. Scales can be divided roughly -into four families: - -\begin{itemize} -\item - Continuous position scales, used to map integer, numeric, and date/time - data to x and y position. -\item - Colour scales, used to map continuous and discrete data to colours. -\item - Manual scales, used to map discrete variables to your choice of size, - line type, shape or colour. -\item - The identity scale, paradoxically used to plot variables - \emph{without} scaling them. This is useful if your data is already a - vector of colour names. -\end{itemize} - -The following sections describe each family in more detail.
- -\subsection{Continuous position scales}\label{sub:scale-position} - -Every plot has two position scales, x and y. \index{Scales!position} -\index{Positioning!scales} The most common continuous position scales -are \texttt{scale\_x\_continuous()} and \texttt{scale\_y\_continuous()}, -which linearly map data to the x and y axis. \index{Scales!position} -\index{Transformation!scales} \indexf{scale\_x\_continuous} The most -interesting variations are produced using transformations. Every -continuous scale takes a \texttt{trans} argument, allowing the use of a -variety of transformations: - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Convert from fuel economy to fuel consumption} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\DataTypeTok{trans =} \StringTok{"reciprocal"}\NormalTok{)} - -\CommentTok{# Log transform x and y axes} -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(price, carat)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\DataTypeTok{trans =} \StringTok{"log10"}\NormalTok{) +} -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\DataTypeTok{trans =} \StringTok{"log10"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-22-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/unnamed-chunk-22-2} -\end{figure} - -The transformation is carried out by a ``transformer'', which describes -the transformation, its inverse, and how to draw the labels. 
The -following table lists the most common variants: - -\begin{longtable}[c]{@{}lll@{}} -\toprule -Name & Function \(f(x)\) & Inverse \(f^{-1}(y)\)\tabularnewline -\midrule -\endhead -asn & \(\tanh^{-1}(x)\) & \(\tanh(y)\)\tabularnewline -exp & \(e ^ x\) & \(\log(y)\)\tabularnewline -identity & \(x\) & \(y\)\tabularnewline -log & \(\log(x)\) & \(e ^ y\)\tabularnewline -log10 & \(\log_{10}(x)\) & \(10 ^ y\)\tabularnewline -log2 & \(\log_2(x)\) & \(2 ^ y\)\tabularnewline -logit & \(\log(\frac{x}{1 - x})\) & -\(\frac{1}{1 + e^{-y}}\)\tabularnewline -pow10 & \(10^x\) & \(\log_{10}(y)\)\tabularnewline -probit & \(\Phi^{-1}(x)\) & \(\Phi(y)\)\tabularnewline -reciprocal & \(x^{-1}\) & \(y^{-1}\)\tabularnewline -reverse & \(-x\) & \(-y\)\tabularnewline -sqrt & \(x^{1/2}\) & \(y ^ 2\)\tabularnewline -\bottomrule -\end{longtable} - -There are shortcuts for the most common: \texttt{scale\_x\_log10()}, -\texttt{scale\_x\_sqrt()} and \texttt{scale\_x\_reverse()} (and -similarly for \texttt{y}). \index{Log!scale} \indexf{scale\_x\_log10} - -Of course, you can also perform the transformation yourself. For -example, instead of using \texttt{scale\_x\_log10()}, you could plot -\texttt{log10(x)}. The appearance of the geom will be the same, but the -tick labels will be different. If you use a transformed scale, the axes -will be labelled in the original data space; if you transform the data, -the axes will be labelled in the transformed space. - -In either case, the transformation occurs before any statistical -summaries. To transform \emph{after} statistical computation, use -\texttt{coord\_trans()}. See \protect\hyperlink{sub:cartesian}{cartesian -coordinate systems} for more details. - -Date and date/time data are continuous variables with special labels. -ggplot2 works with \texttt{Date} (for dates) and \texttt{POSIXct} (for -date/times) classes: if your dates are in a different format you will -need to convert them with \texttt{as.Date()} or \texttt{as.POSIXct()}.
-\index{Date/times} \index{Data!date/time} \index{Time} -\index{Scales!date/time} \indexf{scale\_x\_datetime} -\texttt{scale\_x\_date()} and \texttt{scale\_x\_datetime()} work -similarly to \texttt{scale\_x\_continuous()} but have special -\texttt{date\_breaks} and \texttt{date\_labels} arguments that work in -date-friendly units: - -\begin{itemize} -\item - \texttt{date\_breaks} and \texttt{date\_minor\_breaks} allow you to - position breaks by date units (years, months, weeks, days, hours, - minutes, and seconds). For example, - \texttt{date\_breaks\ =\ "2\ weeks"} will place a major tick mark - every two weeks. -\item - \texttt{date\_labels} controls the display of the labels using the - same formatting strings as in \texttt{strptime()} and - \texttt{format()}: - - \begin{longtable}[c]{@{}ll@{}} - \toprule - String & Meaning\tabularnewline - \midrule - \endhead - \texttt{\%S} & second (00-59)\tabularnewline - \texttt{\%M} & minute (00-59)\tabularnewline - \texttt{\%l} & hour, in 12-hour clock (1-12)\tabularnewline - \texttt{\%I} & hour, in 12-hour clock (01-12)\tabularnewline - \texttt{\%p} & am/pm\tabularnewline - \texttt{\%H} & hour, in 24-hour clock (00-23)\tabularnewline - \texttt{\%a} & day of week, abbreviated (Mon-Sun)\tabularnewline - \texttt{\%A} & day of week, full (Monday-Sunday)\tabularnewline - \texttt{\%e} & day of month (1-31)\tabularnewline - \texttt{\%d} & day of month (01-31)\tabularnewline - \texttt{\%m} & month, numeric (01-12)\tabularnewline - \texttt{\%b} & month, abbreviated (Jan-Dec)\tabularnewline - \texttt{\%B} & month, full (January-December)\tabularnewline - \texttt{\%y} & year, without century (00-99)\tabularnewline - \texttt{\%Y} & year, with century (0000-9999)\tabularnewline - \bottomrule - \end{longtable} - - For example, if you wanted to display dates like 14/10/1979, you would - use the string \texttt{"\%d/\%m/\%Y"}. -\end{itemize} - -The code below illustrates some of these parameters.
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(economics, }\KeywordTok{aes}\NormalTok{(date, psavert)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) +} -\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \OtherTok{NULL}\NormalTok{, }\DataTypeTok{y =} \OtherTok{NULL}\NormalTok{)} - -\NormalTok{base }\CommentTok{# Default breaks and labels} -\NormalTok{base +}\StringTok{ }\KeywordTok{scale_x_date}\NormalTok{(}\DataTypeTok{date_labels =} \StringTok{"%y"}\NormalTok{, }\DataTypeTok{date_breaks =} \StringTok{"5 years"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/date-scale-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/date-scale-2} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{scale_x_date}\NormalTok{(} - \DataTypeTok{limits =} \KeywordTok{as.Date}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"2004-01-01"}\NormalTok{, }\StringTok{"2005-01-01"}\NormalTok{)),} - \DataTypeTok{date_labels =} \StringTok{"%b %y"}\NormalTok{,} - \DataTypeTok{date_minor_breaks =} \StringTok{"1 month"} -\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{scale_x_date}\NormalTok{(} - \DataTypeTok{limits =} \KeywordTok{as.Date}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"2004-01-01"}\NormalTok{, }\StringTok{"2004-06-01"}\NormalTok{)),} - \DataTypeTok{date_labels =} \StringTok{"%m/%d"}\NormalTok{,} - \DataTypeTok{date_minor_breaks =} \StringTok{"2 weeks"} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/date-scale-2-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/date-scale-2-2} -\end{figure} - -\subsection{Colour}\label{sub:scale-colour} - -After position, the most commonly used aesthetic is colour. 
There are -quite a few different ways of mapping values to colours in ggplot2: four -different gradient-based methods for continuous values, and four methods -for mapping discrete values. But before we look at the details of the -different methods, it's useful to learn a little bit of colour theory. -Colour theory is complex because the underlying biology of the eye and -brain is complex, and this introduction will only touch on some of the -more important issues. An excellent and more detailed exposition is -available online at \url{http://tinyurl.com/clrdtls}. \index{Colour} -\index{Scales!colour} - -At the physical level, colour is produced by a mixture of wavelengths of -light. To characterise a colour completely, we need to know the complete -mixture of wavelengths. Fortunately for us, the human eye has only three -different colour receptors, and so we can summarise the perception of -any colour with just three numbers. You may be familiar with the RGB -encoding of colour space, which defines a colour by the intensities of -red, green and blue light needed to produce it. One problem with this -space is that it is not perceptually uniform: two colours that are -one unit apart may look similar or very different depending on where -they are in the colour space. This makes it difficult to create a -mapping from a continuous variable to a set of colours. There have been -many attempts to come up with colour spaces that are more perceptually -uniform. We'll use a modern attempt called the HCL colour space, which -has three components of \textbf{h}ue, \textbf{c}hroma and -\textbf{l}uminance: \index{Colour!spaces} - -\begin{itemize} -\item - Hue is a number between 0 and 360 (an angle) which gives the - ``colour'' of the colour: like blue, red, orange, etc. -\item - Chroma is the purity of a colour. A chroma of 0 is grey, and the - maximum value of chroma varies with luminance. -\item - Luminance is the lightness of the colour.
A luminance of 0 produces - black, and a luminance of 100 produces white. -\end{itemize} - -Hues are not perceived as being ordered: e.g.~green does not seem -``larger'' than red. Chroma and luminance, by contrast, are perceived as ordered. - -The combination of these three components does not produce a simple -geometric shape. Figure \ref{fig:hcl} attempts to show the 3d shape of -the space. Each slice is a constant luminance (brightness) with hue -mapped to angle and chroma to radius. You can see the centre of each -slice is grey and the colours get more intense as they get closer to the -edge. - -\begin{figure}[htbp] - \centering - \includegraphics[width=\linewidth]{diagrams/hcl-space} - \caption{The shape of the HCL colour space. Hue is mapped to angle, chroma to radius and each slice shows a different luminance. The HCL space is a pretty odd shape, but you can see that colours near the centre of each slice are grey, and as you move towards the edges they become more intense. Slices for luminance 0 and 100 are omitted because they would, respectively, be a single black point and a single white point.} - \label{fig:hcl} -\end{figure} - -An additional complication is that many people (\textasciitilde{}10\% of -men) do not possess the normal complement of colour receptors and so can -distinguish fewer colours than usual. \index{Colour!blindness} In brief, -it's best to avoid red-green contrasts, and to check your plots with -systems that simulate colour blindness. Vischeck is one online -solution. Another alternative is the \textbf{dichromat} package (Lumley -2007) which provides tools for simulating colour blindness, and a set of -colour schemes known to work well for colour-blind people. You can also -help people with colour blindness in the same way that you can help -people with black-and-white printers: by providing redundant mappings to -other aesthetics like size, line type or shape.
- -\subsubsection{Continuous}\label{ssub:colour-continuous} - -Colour gradients are often used to show the height of a 2d surface. In -the following example we'll use the surface of a 2d density estimate of -the \texttt{faithful} dataset (Azzalini and Bowman 1990), which records -the waiting time between eruptions and the duration of each eruption for the Old -Faithful geyser in Yellowstone Park. I hide the legends and set -\texttt{expand} to 0, to focus on the appearance of the data. -\index{Colour!gradients} \index{Scales!colour} Remember: I'm -illustrating these scales with filled tiles, but you can also use them -with coloured lines and points. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{erupt <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(faithfuld, }\KeywordTok{aes}\NormalTok{(waiting, eruptions, }\DataTypeTok{fill =} \NormalTok{density)) +} -\StringTok{ }\KeywordTok{geom_raster}\NormalTok{() +} -\StringTok{ }\KeywordTok{scale_x_continuous}\NormalTok{(}\OtherTok{NULL}\NormalTok{, }\DataTypeTok{expand =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{0}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_continuous}\NormalTok{(}\OtherTok{NULL}\NormalTok{, }\DataTypeTok{expand =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{0}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -There are four continuous colour scales: - -\begin{itemize} -\item - \texttt{scale\_colour\_gradient()} and - \texttt{scale\_fill\_gradient()}: a two-colour gradient, low-high - (light blue-dark blue). This is the default scale for continuous - colour, and is the same as \texttt{scale\_colour\_continuous()}. - Arguments \texttt{low} and \texttt{high} control the colours at either - end of the gradient.
\indexf{scale\_colour\_gradient} - \indexf{scale\_fill\_gradient} - - Generally, for continuous colour scales you want to keep hue constant, - and vary chroma and luminance. The munsell colour system is useful for - this as it provides an easy way of specifying colours based on their - hue, chroma and luminance. Use \texttt{munsell::hue\_slice("5Y")} to - see the valid chroma and luminance values for a given hue. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{erupt} - -\NormalTok{erupt +}\StringTok{ }\KeywordTok{scale_fill_gradient}\NormalTok{(}\DataTypeTok{low =} \StringTok{"white"}\NormalTok{, }\DataTypeTok{high =} \StringTok{"black"}\NormalTok{)} - -\NormalTok{erupt +}\StringTok{ }\KeywordTok{scale_fill_gradient}\NormalTok{(} - \DataTypeTok{low =} \NormalTok{munsell::}\KeywordTok{mnsl}\NormalTok{(}\StringTok{"5G 9/2"}\NormalTok{), } - \DataTypeTok{high =} \NormalTok{munsell::}\KeywordTok{mnsl}\NormalTok{(}\StringTok{"5G 6/8"}\NormalTok{)} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-24-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-24-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-24-3} - \end{figure} -\item - \texttt{scale\_colour\_gradient2()} and - \texttt{scale\_fill\_gradient2()}: a three-colour gradient, - low-med-high (red-white-blue). As well as \texttt{low} and - \texttt{high} colours, these scales also have a \texttt{mid} colour - for the colour of the midpoint. The midpoint defaults to 0, but can be - set to any value with the \texttt{midpoint} argument. - \indexf{scale\_colour\_gradient2} \indexf{scale\_fill\_gradient2} - - It's artificial to use this colour scale with this dataset, but we can - force it by using the median of the density as the midpoint. 
Note that - the blues are much more intense than the reds (which you only see as a - very pale pink). - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mid <-}\StringTok{ }\KeywordTok{median}\NormalTok{(faithfuld$density)} -\NormalTok{erupt +}\StringTok{ }\KeywordTok{scale_fill_gradient2}\NormalTok{(}\DataTypeTok{midpoint =} \NormalTok{mid) } -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-25-1} - \end{figure} -\item - \texttt{scale\_colour\_gradientn()} and - \texttt{scale\_fill\_gradientn()}: a custom n-colour gradient. This is - useful if you have colours that are meaningful for your data (e.g., - black body colours or standard terrain colours), or you'd like to use - a palette produced by another package. The following code includes - palettes generated from routines in the \textbf{colorspace} package. - Zeileis, Hornik, and Murrell (2008) describe the philosophy behind - these palettes and provide a good introduction to some of the - complexities of creating good colour scales.
\index{Colour!palettes} - \indexf{scale\_colour\_gradientn} \indexf{scale\_fill\_gradientn} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{erupt +}\StringTok{ }\KeywordTok{scale_fill_gradientn}\NormalTok{(}\DataTypeTok{colours =} \KeywordTok{terrain.colors}\NormalTok{(}\DecValTok{7}\NormalTok{))} -\NormalTok{erupt +}\StringTok{ }\KeywordTok{scale_fill_gradientn}\NormalTok{(}\DataTypeTok{colours =} \NormalTok{colorspace::}\KeywordTok{heat_hcl}\NormalTok{(}\DecValTok{7}\NormalTok{))} -\NormalTok{erupt +}\StringTok{ }\KeywordTok{scale_fill_gradientn}\NormalTok{(}\DataTypeTok{colours =} \NormalTok{colorspace::}\KeywordTok{diverge_hcl}\NormalTok{(}\DecValTok{7}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/colorspace-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/colorspace-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/colorspace-3} - \end{figure} - - By default, \texttt{colours} will be evenly spaced along the range of - the data. To make them unevenly spaced, use the \texttt{values} - argument, which should be a vector of values between 0 and 1. -\item - \texttt{scale\_colour\_distiller()} and - \texttt{scale\_fill\_distiller()} apply the ColorBrewer colour scales - to continuous data.
You use them the same way as - \texttt{scale\_fill\_brewer()}, described below: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{erupt +}\StringTok{ }\KeywordTok{scale_fill_distiller}\NormalTok{()} -\NormalTok{erupt +}\StringTok{ }\KeywordTok{scale_fill_distiller}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"RdPu"}\NormalTok{)} -\NormalTok{erupt +}\StringTok{ }\KeywordTok{scale_fill_distiller}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"YlOrBr"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-26-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-26-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-26-3} - \end{figure} -\end{itemize} - -All continuous colour scales have an \texttt{na.value} parameter that -controls what colour is used for missing values (including values -outside the range of the scale limits). By default it is set to grey, -which will stand out when you use a colourful scale. If you use a black -and white scale, you might want to set it to something else to make it -more obvious.
\indexc{na.value} \index{Missing values!changing colour} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{5}\NormalTok{, }\DataTypeTok{z =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{3}\NormalTok{, }\DecValTok{2}\NormalTok{, }\OtherTok{NA}\NormalTok{, }\DecValTok{5}\NormalTok{))} -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_tile}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{z), }\DataTypeTok{size =} \DecValTok{5}\NormalTok{)} -\NormalTok{p} -\CommentTok{# Make missing colours invisible} -\NormalTok{p +}\StringTok{ }\KeywordTok{scale_fill_gradient}\NormalTok{(}\DataTypeTok{na.value =} \OtherTok{NA}\NormalTok{)} -\CommentTok{# Customise on a black and white scale} -\NormalTok{p +}\StringTok{ }\KeywordTok{scale_fill_gradient}\NormalTok{(}\DataTypeTok{low =} \StringTok{"black"}\NormalTok{, }\DataTypeTok{high =} \StringTok{"white"}\NormalTok{, }\DataTypeTok{na.value =} \StringTok{"red"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-27-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-27-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-27-3} -\end{figure} - -\subsubsection{Discrete}\label{ssub:colour-discrete} - -There are four colour scales for discrete data. 
We illustrate them with -a bar chart that maps both position and fill to the same variable: -\index{Colour!discrete scales} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"c"}\NormalTok{, }\StringTok{"d"}\NormalTok{), }\DataTypeTok{y =} \KeywordTok{c}\NormalTok{(}\DecValTok{3}\NormalTok{, }\DecValTok{4}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{2}\NormalTok{))} -\NormalTok{bars <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y, }\DataTypeTok{fill =} \NormalTok{x)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{stat =} \StringTok{"identity"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \OtherTok{NULL}\NormalTok{, }\DataTypeTok{y =} \OtherTok{NULL}\NormalTok{) +} -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{itemize} -\item - The default colour scheme, \texttt{scale\_colour\_hue()}, picks evenly - spaced hues around the HCL colour wheel. This works well for up to - about eight colours, but after that it becomes hard to tell the - different colours apart.
You can control the default chroma and - luminance, and the range of hues, with the \texttt{h}, \texttt{c} and - \texttt{l} arguments: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{bars} -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_hue}\NormalTok{(}\DataTypeTok{c =} \DecValTok{40}\NormalTok{)} -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_hue}\NormalTok{(}\DataTypeTok{h =} \KeywordTok{c}\NormalTok{(}\DecValTok{180}\NormalTok{, }\DecValTok{300}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-29-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-29-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-29-3} - \end{figure} - - One disadvantage of the default colour scheme is that because the - colours all have the same luminance and chroma, when you print them in - black and white, they all appear as an identical shade of grey. - \indexf{scale\_colour\_hue} -\item - \texttt{scale\_colour\_brewer()} uses handpicked ``ColorBrewer'' - colours, \url{http://colorbrewer2.org/}. These colours have been - designed to work well in a wide variety of situations, although the - focus is on maps and so the colours tend to work better when displayed - in large areas. For categorical data, the palettes most of interest - are `Set1' and `Dark2' for points and `Set2', `Pastel1', `Pastel2' and - `Accent' for areas. Use \texttt{RColorBrewer::display.brewer.all()} to - list all palettes. 
\index{Colour!Brewer} - \indexf{scale\_colour\_brewer} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_brewer}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"Set1"}\NormalTok{)} -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_brewer}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"Set2"}\NormalTok{)} -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_brewer}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"Accent"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-30-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-30-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-30-3} - \end{figure} -\item - \texttt{scale\_colour\_grey()} maps discrete data to grays, from light - to dark. \indexf{scale\_colour\_grey} \index{Colour!greys} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_grey}\NormalTok{()} -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_grey}\NormalTok{(}\DataTypeTok{start =} \FloatTok{0.5}\NormalTok{, }\DataTypeTok{end =} \DecValTok{1}\NormalTok{)} -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_grey}\NormalTok{(}\DataTypeTok{start =} \DecValTok{0}\NormalTok{, }\DataTypeTok{end =} \FloatTok{0.5}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-31-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-31-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-31-3} - \end{figure} -\item - \texttt{scale\_colour\_manual()} is useful if you have your own - discrete colour palette. The following examples show colour palettes - inspired by Wes Anderson movies, as provided by the wesanderson - package, \url{https://github.com/karthik/wesanderson}. 
These are not - designed for perceptual uniformity, but are fun! - \indexf{scale\_colour\_manual} \index{wesanderson} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{library}\NormalTok{(wesanderson)} -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_manual}\NormalTok{(}\DataTypeTok{values =} \KeywordTok{wes_palette}\NormalTok{(}\StringTok{"GrandBudapest"}\NormalTok{))} -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_manual}\NormalTok{(}\DataTypeTok{values =} \KeywordTok{wes_palette}\NormalTok{(}\StringTok{"Zissou"}\NormalTok{))} -\NormalTok{bars +}\StringTok{ }\KeywordTok{scale_fill_manual}\NormalTok{(}\DataTypeTok{values =} \KeywordTok{wes_palette}\NormalTok{(}\StringTok{"Rushmore"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-32-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-32-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-32-3} - \end{figure} -\end{itemize} - -Note that one set of colours is not uniformly good for all purposes: -bright colours work well for points, but are overwhelming on bars. 
-Subtle colours work well for bars, but are hard to see on points: - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Bright colours work best with points} -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3} \NormalTok{+}\StringTok{ }\KeywordTok{runif}\NormalTok{(}\DecValTok{30}\NormalTok{), }\DataTypeTok{y =} \KeywordTok{runif}\NormalTok{(}\DecValTok{30}\NormalTok{), }\DataTypeTok{z =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"c"}\NormalTok{))} -\NormalTok{point <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{z)) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{) +} -\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \OtherTok{NULL}\NormalTok{, }\DataTypeTok{y =} \OtherTok{NULL}\NormalTok{)} -\NormalTok{point +}\StringTok{ }\KeywordTok{scale_colour_brewer}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"Set1"}\NormalTok{)} -\NormalTok{point +}\StringTok{ }\KeywordTok{scale_colour_brewer}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"Set2"}\NormalTok{) } -\NormalTok{point +}\StringTok{ }\KeywordTok{scale_colour_brewer}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"Pastel1"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/brewer-pal-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/brewer-pal-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/brewer-pal-3} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Subtler colours work better with areas} -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} 
\DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \DecValTok{3}\NormalTok{:}\DecValTok{1}\NormalTok{, }\DataTypeTok{z =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"c"}\NormalTok{))} -\NormalTok{area <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{z), }\DataTypeTok{stat =} \StringTok{"identity"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{) +} -\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \OtherTok{NULL}\NormalTok{, }\DataTypeTok{y =} \OtherTok{NULL}\NormalTok{)} -\NormalTok{area +}\StringTok{ }\KeywordTok{scale_fill_brewer}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"Set1"}\NormalTok{)} -\NormalTok{area +}\StringTok{ }\KeywordTok{scale_fill_brewer}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"Set2"}\NormalTok{)} -\NormalTok{area +}\StringTok{ }\KeywordTok{scale_fill_brewer}\NormalTok{(}\DataTypeTok{palette =} \StringTok{"Pastel1"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-33-1}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-33-2}% - \includegraphics[width=0.333\linewidth]{_figures/scales/unnamed-chunk-33-3} -\end{figure} - -\subsection{The manual discrete scale}\label{sub:scale-manual} - -The discrete scales, \texttt{scale\_linetype()}, -\texttt{scale\_shape()}, and \texttt{scale\_size\_discrete()} basically -have no options. These scales are just a list of valid values that are -mapped to the unique discrete values. 
\index{Shape} \index{Line type} -\index{Size} \indexf{scale\_shape\_manual} -\indexf{scale\_colour\_manual} \indexf{scale\_linetype\_manual} - -If you want to customise these scales, you need to create your own new -scale with the manual scale: \texttt{scale\_shape\_manual()}, -\texttt{scale\_linetype\_manual()}, \texttt{scale\_colour\_manual()}. -The manual scale has one important argument, \texttt{values}, where you -specify the values that the scale should produce. If this vector is -named, it will match the values of the output to the values of the -input; otherwise it will match in order of the levels of the discrete -variable. You will need some knowledge of the valid aesthetic values, -which are described in \texttt{vignette("ggplot2-specs")}. - -The following code demonstrates the use of -\texttt{scale\_colour\_manual()}: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{plot <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(msleep, }\KeywordTok{aes}\NormalTok{(brainwt, bodywt)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_x_log10}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_y_log10}\NormalTok{()} -\NormalTok{plot +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{vore)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_colour_manual}\NormalTok{(} - \DataTypeTok{values =} \KeywordTok{c}\NormalTok{(}\StringTok{"red"}\NormalTok{, }\StringTok{"orange"}\NormalTok{, }\StringTok{"green"}\NormalTok{, }\StringTok{"blue"}\NormalTok{), } - \DataTypeTok{na.value =} \StringTok{"grey50"} - \NormalTok{)} -\CommentTok{#> Warning: Removed 27 rows containing missing values (geom_point).} - -\NormalTok{colours <-}\StringTok{ }\KeywordTok{c}\NormalTok{(} - \DataTypeTok{carni =} \StringTok{"red"}\NormalTok{, } - \DataTypeTok{insecti =} \StringTok{"orange"}\NormalTok{, } - \DataTypeTok{herbi =} \StringTok{"green"}\NormalTok{, } - \DataTypeTok{omni =} \StringTok{"blue"} 
-\NormalTok{)} -\NormalTok{plot +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{vore)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_colour_manual}\NormalTok{(}\DataTypeTok{values =} \NormalTok{colours)} -\CommentTok{#> Warning: Removed 27 rows containing missing values (geom_point).} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/scales/scale-manual-1}% - \includegraphics[width=0.5\linewidth]{_figures/scales/scale-manual-2} -\end{figure} - -The following example shows a creative use of -\texttt{scale\_colour\_manual()} to display multiple variables on the -same plot and show a useful legend. In most other plotting systems, -you'd colour the lines and then add a legend: \index{Data!longitudinal} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{huron <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{year =} \DecValTok{1875}\NormalTok{:}\DecValTok{1972}\NormalTok{, }\DataTypeTok{level =} \KeywordTok{as.numeric}\NormalTok{(LakeHuron))} -\KeywordTok{ggplot}\NormalTok{(huron, }\KeywordTok{aes}\NormalTok{(year)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{level +}\StringTok{ }\DecValTok{5}\NormalTok{), }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{level -}\StringTok{ }\DecValTok{5}\NormalTok{), }\DataTypeTok{colour =} \StringTok{"blue"}\NormalTok{) } -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.8\linewidth]{_figures/scales/huron-1} -\end{figure} - -That doesn't work in ggplot because there's no way to add a legend -manually. 
Instead, give the lines informative labels: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(huron, }\KeywordTok{aes}\NormalTok{(year)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{level +}\StringTok{ }\DecValTok{5}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"above"}\NormalTok{)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{level -}\StringTok{ }\DecValTok{5}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"below"}\NormalTok{)) } -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.8\linewidth]{_figures/scales/huron2-1} -\end{figure} - -And then tell the scale how to map labels to colours: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(huron, }\KeywordTok{aes}\NormalTok{(year)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{level +}\StringTok{ }\DecValTok{5}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"above"}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =} \NormalTok{level -}\StringTok{ }\DecValTok{5}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"below"}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_colour_manual}\NormalTok{(}\StringTok{"Direction"}\NormalTok{, } - \DataTypeTok{values =} \KeywordTok{c}\NormalTok{(}\StringTok{"above"} \NormalTok{=}\StringTok{ "red"}\NormalTok{, }\StringTok{"below"} \NormalTok{=}\StringTok{ "blue"}\NormalTok{)} - \NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.8\linewidth]{_figures/scales/huron3-1} -\end{figure} - -See \protect\hyperlink{sec:spread-gather}{multiple time series} for -another approach. 
- -\subsection{The identity scale}\label{sub:scale-identity} - -The identity scale is used when your data is already scaled, when the -data and aesthetic spaces are the same. The code below shows an example -where the identity scale is useful. \texttt{luv\_colours} contains the -locations of all R's built-in colours in the LUV colour space (the space -that HCL is based on). A legend is unnecessary, because the point colour -represents itself: the data and aesthetic spaces are the same. -\index{Scales!identity} \indexf{scale\_identity} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{head}\NormalTok{(luv_colours)} -\CommentTok{#> L u v col} -\CommentTok{#> 1 9342 -3.37e-12 0 white} -\CommentTok{#> 2 9101 -4.75e+02 -635 aliceblue} -\CommentTok{#> 3 8810 1.01e+03 1668 antiquewhite} -\CommentTok{#> 4 8935 1.07e+03 1675 antiquewhite1} -\CommentTok{#> 5 8452 1.01e+03 1610 antiquewhite2} -\CommentTok{#> 6 7498 9.03e+02 1402 antiquewhite3} - -\KeywordTok{ggplot}\NormalTok{(luv_colours, }\KeywordTok{aes}\NormalTok{(u, v)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{col), }\DataTypeTok{size =} \DecValTok{3}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_color_identity}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_equal}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/scales/scale-identity-1} -\end{figure} - -\subsection{Exercises}\label{exercises-4} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Compare and contrast the four continuous colour scales with the four - discrete scales. -\item - Explore the distribution of the built-in \texttt{colors()} using the - \texttt{luv\_colours} dataset. 
-\end{enumerate} - -\section*{References}\label{references} -\addcontentsline{toc}{section}{References} - -\hypertarget{refs}{} -\hypertarget{ref-azzalini:1990}{} -Azzalini, A., and A. W. Bowman. 1990. ``A Look at Some Data on the Old -Faithful Geyser.'' \emph{Applied Statistics} 39: 357--65. - -\hypertarget{ref-dichromat}{} -Lumley, Thomas. 2007. \emph{Dichromat: Color Schemes for Dichromats}. - -\hypertarget{ref-zeileis:2008}{} -Zeileis, Achim, Kurt Hornik, and Paul Murrell. 2008. ``Escaping RGBland: -Selecting Colors for Statistical Graphics.'' \emph{Computational -Statistics \& Data Analysis}. -\url{http://statmath.wu-wien.ac.at/~zeileis/papers/Zeileis+Hornik+Murrell-2008.pdf}. diff --git a/book/tex/spbasic.bst b/book/tex/spbasic.bst deleted file mode 100755 index 4b879b2a..00000000 --- a/book/tex/spbasic.bst +++ /dev/null @@ -1,1659 +0,0 @@ -%% -%% This is file `spbasic.bst', -%% generated with the docstrip utility. -%% -%% The original source files were: -%% -%% merlin.mbs (with options: `ay,nat,seq-lab,vonx,nm-rvx,ed-rev,jnrlst,dt-beg,yr-par,yrp-x,yrpp-xsp,note-yr,jxper,jttl-rm,thtit-a,pgsep-c,num-xser,ser-vol,jnm-x,btit-rm,bt-rm,pre-pub,doi,edparxc,blk-tit,in-col,fin-bare,pp,ed,abr,mth-bare,ord,jabr,xand,eprint,url,url-blk,em-x,nfss,') -%% ---------------------------------------- -%% -%%********************************************************************************%% -%% %% -%% For Springer medical, life sciences, chemistry, geology, engineering and %% -%% computer science publications. %% -%% For use with the natbib package (see below). Default is author-year citations. %% -%% When citations are numbered, please use \usepackage[numbers]{natbib}. %% -%% A lack of punctuation is the key feature. 
Springer-Verlag 2004/10/15 %% -%% Report bugs and improvements to: Joylene Vette-Guillaume or Frank Holzwarth %% -%% %% -%%********************************************************************************%% -%% -%% Copyright 1994-2004 Patrick W Daly - % =============================================================== - % IMPORTANT NOTICE: - % This bibliographic style (bst) file has been generated from one or - % more master bibliographic style (mbs) files, listed above. - % - % This generated file can be redistributed and/or modified under the terms - % of the LaTeX Project Public License Distributed from CTAN - % archives in directory macros/latex/base/lppl.txt; either - % version 1 of the License, or any later version. - % =============================================================== - % Name and version information of the main mbs file: - % \ProvidesFile{merlin.mbs}[2004/02/09 4.13 (PWD, AO, DPC)] - % For use with BibTeX version 0.99a or later - %------------------------------------------------------------------- - % This bibliography style file is intended for texts in ENGLISH - % This is an author-year citation style bibliography. As such, it is - % non-standard LaTeX, and requires a special package file to function properly. - % Such a package is natbib.sty by Patrick W. Daly - % The form of the \bibitem entries is - % \bibitem[Jones et al.(1990)]{key}... - % \bibitem[Jones et al.(1990)Jones, Baker, and Smith]{key}... - % The essential feature is that the label (the part in brackets) consists - % of the author names, as they should appear in the citation, with the year - % in parentheses following. There must be no space before the opening - % parenthesis! - % With natbib v5.3, a full list of authors may also follow the year. - % In natbib.sty, it is possible to define the type of enclosures that is - % really wanted (brackets or parentheses), but in either case, there must - % be parentheses in the label. 
- % The \cite command functions as follows: - % \citet{key} ==>> Jones et al. (1990) - % \citet*{key} ==>> Jones, Baker, and Smith (1990) - % \citep{key} ==>> (Jones et al., 1990) - % \citep*{key} ==>> (Jones, Baker, and Smith, 1990) - % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2) - % \citep[e.g.][]{key} ==>> (e.g. Jones et al., 1990) - % \citep[e.g.][p. 32]{key} ==>> (e.g. Jones et al., p. 32) - % \citeauthor{key} ==>> Jones et al. - % \citeauthor*{key} ==>> Jones, Baker, and Smith - % \citeyear{key} ==>> 1990 - %--------------------------------------------------------------------- - -ENTRY - { address - archive - author - booktitle - chapter - doi - edition - editor - eid - eprint - howpublished - institution - journal - key - month - note - number - organization - pages - publisher - school - series - title - type - url - volume - year - } - {} - { label extra.label sort.label short.list } -INTEGERS { output.state before.all mid.sentence after.sentence after.block } -FUNCTION {init.state.consts} -{ #0 'before.all := - #1 'mid.sentence := - #2 'after.sentence := - #3 'after.block := -} -STRINGS { s t} -FUNCTION {output.nonnull} -{ 's := - output.state mid.sentence = - { ", " * write$ } - { output.state after.block = - { add.period$ write$ - newline$ - "\newblock " write$ - } - { output.state before.all = - 'write$ - { add.period$ " " * write$ } - if$ - } - if$ - mid.sentence 'output.state := - } - if$ - s -} -FUNCTION {output} -{ duplicate$ empty$ - 'pop$ - 'output.nonnull - if$ -} -FUNCTION {output.check} -{ 't := - duplicate$ empty$ - { pop$ "empty " t * " in " * cite$ * warning$ } - 'output.nonnull - if$ -} -FUNCTION {fin.entry} -{ duplicate$ empty$ - 'pop$ - 'write$ - if$ - newline$ -} - -FUNCTION {new.block} -{ output.state before.all = - 'skip$ - { after.block 'output.state := } - if$ -} -FUNCTION {new.sentence} -{ output.state after.block = - 'skip$ - { output.state before.all = - 'skip$ - { after.sentence 'output.state := } - if$ - } - if$ -} 
-FUNCTION {add.blank} -{ " " * before.all 'output.state := -} - -FUNCTION {no.blank.or.punct} -{ "\hspace{0pt}" * before.all 'output.state := -} - -FUNCTION {date.block} -{ - add.blank -} - -FUNCTION {not} -{ { #0 } - { #1 } - if$ -} -FUNCTION {and} -{ 'skip$ - { pop$ #0 } - if$ -} -FUNCTION {or} -{ { pop$ #1 } - 'skip$ - if$ -} -STRINGS {z} -FUNCTION {remove.dots} -{ 'z := - "" - { z empty$ not } - { z #1 #1 substring$ - z #2 global.max$ substring$ 'z := - duplicate$ "." = 'pop$ - { * } - if$ - } - while$ -} -FUNCTION {new.block.checkb} -{ empty$ - swap$ empty$ - and - 'skip$ - 'new.block - if$ -} -FUNCTION {field.or.null} -{ duplicate$ empty$ - { pop$ "" } - 'skip$ - if$ -} -FUNCTION {emphasize} -{ skip$ } -FUNCTION {tie.or.space.prefix} -{ duplicate$ text.length$ #3 < - { "~" } - { " " } - if$ - swap$ -} - -FUNCTION {capitalize} -{ "u" change.case$ "t" change.case$ } - -FUNCTION {space.word} -{ " " swap$ * " " * } - % Here are the language-specific definitions for explicit words. - % Each function has a name bbl.xxx where xxx is the English word. - % The language selected here is ENGLISH -FUNCTION {bbl.and} -{ "and"} - -FUNCTION {bbl.etal} -{ "et~al" } - -FUNCTION {bbl.editors} -{ "eds" } - -FUNCTION {bbl.editor} -{ "ed" } - -FUNCTION {bbl.edby} -{ "edited by" } - -FUNCTION {bbl.edition} -{ "edn" } - -FUNCTION {bbl.volume} -{ "vol" } - -FUNCTION {bbl.of} -{ "of" } - -FUNCTION {bbl.number} -{ "no." } - -FUNCTION {bbl.nr} -{ "no." } - -FUNCTION {bbl.in} -{ "in" } - -FUNCTION {bbl.pages} -{ "pp" } - -FUNCTION {bbl.page} -{ "p" } - -FUNCTION {bbl.chapter} -{ "chap" } - -FUNCTION {bbl.techrep} -{ "Tech. Rep." 
} - -FUNCTION {bbl.mthesis} -{ "Master's thesis" } - -FUNCTION {bbl.phdthesis} -{ "PhD thesis" } - -FUNCTION {bbl.first} -{ "1st" } - -FUNCTION {bbl.second} -{ "2nd" } - -FUNCTION {bbl.third} -{ "3rd" } - -FUNCTION {bbl.fourth} -{ "4th" } - -FUNCTION {bbl.fifth} -{ "5th" } - -FUNCTION {bbl.st} -{ "st" } - -FUNCTION {bbl.nd} -{ "nd" } - -FUNCTION {bbl.rd} -{ "rd" } - -FUNCTION {bbl.th} -{ "th" } - -MACRO {jan} {"Jan."} - -MACRO {feb} {"Feb."} - -MACRO {mar} {"Mar."} - -MACRO {apr} {"Apr."} - -MACRO {may} {"May"} - -MACRO {jun} {"Jun."} - -MACRO {jul} {"Jul."} - -MACRO {aug} {"Aug."} - -MACRO {sep} {"Sep."} - -MACRO {oct} {"Oct."} - -MACRO {nov} {"Nov."} - -MACRO {dec} {"Dec."} - -FUNCTION {eng.ord} -{ duplicate$ "1" swap$ * - #-2 #1 substring$ "1" = - { bbl.th * } - { duplicate$ #-1 #1 substring$ - duplicate$ "1" = - { pop$ bbl.st * } - { duplicate$ "2" = - { pop$ bbl.nd * } - { "3" = - { bbl.rd * } - { bbl.th * } - if$ - } - if$ - } - if$ - } - if$ -} - -MACRO {acmcs} {"ACM Comput Surv"} - -MACRO {acta} {"Acta Inf"} - -MACRO {cacm} {"Commun ACM"} - -MACRO {ibmjrd} {"IBM~J~Res Dev"} - -MACRO {ibmsj} {"IBM Syst~J"} - -MACRO {ieeese} {"IEEE Trans Softw Eng"} - -MACRO {ieeetc} {"IEEE Trans Comput"} - -MACRO {ieeetcad} - {"IEEE Trans Comput Aid Des"} - -MACRO {ipl} {"Inf Process Lett"} - -MACRO {jacm} {"J~ACM"} - -MACRO {jcss} {"J~Comput Syst Sci"} - -MACRO {scp} {"Sci Comput Program"} - -MACRO {sicomp} {"SIAM J~Comput"} - -MACRO {tocs} {"ACM Trans Comput Syst"} - -MACRO {tods} {"ACM Trans Database Syst"} - -MACRO {tog} {"ACM Trans Graphic"} - -MACRO {toms} {"ACM Trans Math Softw"} - -MACRO {toois} {"ACM Trans Office Inf Syst"} - -MACRO {toplas} {"ACM Trans Program Lang Syst"} - -MACRO {tcs} {"Theor Comput Sci"} - -FUNCTION {bibinfo.check} -{ swap$ - duplicate$ missing$ - { - pop$ pop$ - "" - } - { duplicate$ empty$ - { - swap$ pop$ - } - { swap$ - pop$ - } - if$ - } - if$ -} -FUNCTION {bibinfo.warn} -{ swap$ - duplicate$ missing$ - { - swap$ "missing " swap$ * " in " * 
cite$ * warning$ pop$ - "" - } - { duplicate$ empty$ - { - swap$ "empty " swap$ * " in " * cite$ * warning$ - } - { swap$ - pop$ - } - if$ - } - if$ -} -FUNCTION {format.eprint} -{ eprint duplicate$ empty$ - 'skip$ - { "\eprint" - archive empty$ - 'skip$ - { "[" * archive * "]" * } - if$ - "{" * swap$ * "}" * - } - if$ -} -FUNCTION {format.url} -{ url empty$ - { "" } - { "\urlprefix\url{" url * "}" * } - if$ -} - -STRINGS { bibinfo} -INTEGERS { nameptr namesleft numnames } - -FUNCTION {format.names} -{ 'bibinfo := - duplicate$ empty$ 'skip$ { - 's := - "" 't := - #1 'nameptr := - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { s nameptr - "{vv~}{ll}{ f{}}{ jj}" - format.name$ - remove.dots - bibinfo bibinfo.check - 't := - nameptr #1 > - { - namesleft #1 > - { ", " * t * } - { - "," * - s nameptr "{ll}" format.name$ duplicate$ "others" = - { 't := } - { pop$ } - if$ - t "others" = - { - " " * bbl.etal * - } - { " " * t * } - if$ - } - if$ - } - 't - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ - } if$ -} -FUNCTION {format.names.ed} -{ - format.names -} -FUNCTION {format.key} -{ empty$ - { key field.or.null } - { "" } - if$ -} - -FUNCTION {format.authors} -{ author "author" format.names -} -FUNCTION {get.bbl.editor} -{ editor num.names$ #1 > 'bbl.editors 'bbl.editor if$ } - -FUNCTION {format.editors} -{ editor "editor" format.names duplicate$ empty$ 'skip$ - { - " " * - get.bbl.editor - "(" swap$ * ")" * - * - } - if$ -} -FUNCTION {format.doi} -{ doi "doi" bibinfo.check - duplicate$ empty$ 'skip$ - { - "\doi{" swap$ * "}" * - } - if$ -} -FUNCTION {format.note} -{ - note empty$ - { "" } - { note #1 #1 substring$ - duplicate$ "{" = - 'skip$ - { output.state mid.sentence = - { "l" } - { "u" } - if$ - change.case$ - } - if$ - note #2 global.max$ substring$ * "note" bibinfo.check - } - if$ -} - -FUNCTION {format.title} -{ title - duplicate$ empty$ 'skip$ - { "t" change.case$ } - if$ - "title" bibinfo.check -} 
-FUNCTION {format.full.names} -{'s := - "" 't := - #1 'nameptr := - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { s nameptr - "{vv~}{ll}" format.name$ - 't := - nameptr #1 > - { - namesleft #1 > - { ", " * t * } - { - s nameptr "{ll}" format.name$ duplicate$ "others" = - { 't := } - { pop$ } - if$ - t "others" = - { - " " * bbl.etal * - } - { - numnames #2 > - { "," * } - 'skip$ - if$ - bbl.and - space.word * t * - } - if$ - } - if$ - } - 't - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {author.editor.key.full} -{ author empty$ - { editor empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { editor format.full.names } - if$ - } - { author format.full.names } - if$ -} - -FUNCTION {author.key.full} -{ author empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { author format.full.names } - if$ -} - -FUNCTION {editor.key.full} -{ editor empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { editor format.full.names } - if$ -} - -FUNCTION {make.full.names} -{ type$ "book" = - type$ "inbook" = - or - 'author.editor.key.full - { type$ "proceedings" = - 'editor.key.full - 'author.key.full - if$ - } - if$ -} - -FUNCTION {output.bibitem} -{ newline$ - "\bibitem[{" write$ - label write$ - ")" make.full.names duplicate$ short.list = - { pop$ } - { * } - if$ - "}]{" * write$ - cite$ write$ - "}" write$ - newline$ - "" - before.all 'output.state := -} - -FUNCTION {add.period} -{ duplicate$ empty$ - 'skip$ - { "." 
* add.blank } - if$ -} - -FUNCTION {if.digit} -{ duplicate$ "0" = - swap$ duplicate$ "1" = - swap$ duplicate$ "2" = - swap$ duplicate$ "3" = - swap$ duplicate$ "4" = - swap$ duplicate$ "5" = - swap$ duplicate$ "6" = - swap$ duplicate$ "7" = - swap$ duplicate$ "8" = - swap$ "9" = or or or or or or or or or -} -FUNCTION {n.separate} -{ 't := - "" - #0 'numnames := - { t empty$ not } - { t #-1 #1 substring$ if.digit - { numnames #1 + 'numnames := } - { #0 'numnames := } - if$ - t #-1 #1 substring$ swap$ * - t #-2 global.max$ substring$ 't := - numnames #5 = - { duplicate$ #1 #2 substring$ swap$ - #3 global.max$ substring$ - "," swap$ * * - } - 'skip$ - if$ - } - while$ -} -FUNCTION {n.dashify} -{ - n.separate - 't := - "" - { t empty$ not } - { t #1 #1 substring$ "-" = - { t #1 #2 substring$ "--" = not - { "--" * - t #2 global.max$ substring$ 't := - } - { { t #1 #1 substring$ "-" = } - { "-" * - t #2 global.max$ substring$ 't := - } - while$ - } - if$ - } - { t #1 #1 substring$ * - t #2 global.max$ substring$ 't := - } - if$ - } - while$ -} - -FUNCTION {word.in} -{ bbl.in capitalize - ":" * - " " * } - -FUNCTION {format.date} -{ year "year" bibinfo.check duplicate$ empty$ - { - "empty year in " cite$ * "; set to ????" * warning$ - pop$ "????" 
- } - 'skip$ - if$ - extra.label * - before.all 'output.state := - " (" swap$ * ")" * -} -FUNCTION {format.btitle} -{ title "title" bibinfo.check - duplicate$ empty$ 'skip$ - { - } - if$ -} -FUNCTION {either.or.check} -{ empty$ - 'pop$ - { "can't use both " swap$ * " fields in " * cite$ * warning$ } - if$ -} -FUNCTION {format.bvolume} -{ volume empty$ - { "" } - { bbl.volume volume tie.or.space.prefix - "volume" bibinfo.check * * - series "series" bibinfo.check - duplicate$ empty$ 'pop$ - { emphasize ", " * swap$ * } - if$ - "volume and number" number either.or.check - } - if$ -} -FUNCTION {format.number.series} -{ volume empty$ - { number empty$ - { series field.or.null } - { series empty$ - { number "number" bibinfo.check } - { output.state mid.sentence = - { bbl.number } - { bbl.number capitalize } - if$ - number tie.or.space.prefix "number" bibinfo.check * * - bbl.in space.word * - series "series" bibinfo.check * - } - if$ - } - if$ - } - { "" } - if$ -} -FUNCTION {is.num} -{ chr.to.int$ - duplicate$ "0" chr.to.int$ < not - swap$ "9" chr.to.int$ > not and -} - -FUNCTION {extract.num} -{ duplicate$ 't := - "" 's := - { t empty$ not } - { t #1 #1 substring$ - t #2 global.max$ substring$ 't := - duplicate$ is.num - { s swap$ * 's := } - { pop$ "" 't := } - if$ - } - while$ - s empty$ - 'skip$ - { pop$ s } - if$ -} - -FUNCTION {convert.edition} -{ extract.num "l" change.case$ 's := - s "first" = s "1" = or - { bbl.first 't := } - { s "second" = s "2" = or - { bbl.second 't := } - { s "third" = s "3" = or - { bbl.third 't := } - { s "fourth" = s "4" = or - { bbl.fourth 't := } - { s "fifth" = s "5" = or - { bbl.fifth 't := } - { s #1 #1 substring$ is.num - { s eng.ord 't := } - { edition 't := } - if$ - } - if$ - } - if$ - } - if$ - } - if$ - } - if$ - t -} - -FUNCTION {format.edition} -{ edition duplicate$ empty$ 'skip$ - { - convert.edition - output.state mid.sentence = - { "l" } - { "t" } - if$ change.case$ - "edition" bibinfo.check - " " * bbl.edition * - } - 
if$ -} -INTEGERS { multiresult } -FUNCTION {multi.page.check} -{ 't := - #0 'multiresult := - { multiresult not - t empty$ not - and - } - { t #1 #1 substring$ - duplicate$ "-" = - swap$ duplicate$ "," = - swap$ "+" = - or or - { #1 'multiresult := } - { t #2 global.max$ substring$ 't := } - if$ - } - while$ - multiresult -} -FUNCTION {format.pages} -{ pages duplicate$ empty$ 'skip$ - { duplicate$ multi.page.check - { - bbl.pages swap$ - n.dashify - } - { - bbl.page swap$ - } - if$ - tie.or.space.prefix - "pages" bibinfo.check - * * - } - if$ -} -FUNCTION {format.journal.pages} -{ pages duplicate$ empty$ 'pop$ - { swap$ duplicate$ empty$ - { pop$ pop$ format.pages } - { - ":" * - swap$ - n.dashify - "pages" bibinfo.check - * - } - if$ - } - if$ -} -FUNCTION {format.journal.eid} -{ eid "eid" bibinfo.check - duplicate$ empty$ 'pop$ - { swap$ duplicate$ empty$ 'skip$ - { - ":" * - } - if$ - swap$ * - } - if$ -} -FUNCTION {format.vol.num.pages} -{ volume field.or.null - duplicate$ empty$ 'skip$ - { - "volume" bibinfo.check - } - if$ - number "number" bibinfo.check duplicate$ empty$ 'skip$ - { - swap$ duplicate$ empty$ - { "there's a number but no volume in " cite$ * warning$ } - 'skip$ - if$ - swap$ - "(" swap$ * ")" * - } - if$ * - eid empty$ - { format.journal.pages } - { format.journal.eid } - if$ -} - -FUNCTION {format.chapter.pages} -{ chapter empty$ - 'format.pages - { type empty$ - { bbl.chapter } - { type "l" change.case$ - "type" bibinfo.check - } - if$ - chapter tie.or.space.prefix - "chapter" bibinfo.check - * * - pages empty$ - 'skip$ - { ", " * format.pages * } - if$ - } - if$ -} - -FUNCTION {format.booktitle} -{ - booktitle "booktitle" bibinfo.check -} -FUNCTION {format.in.ed.booktitle} -{ format.booktitle duplicate$ empty$ 'skip$ - { - editor "editor" format.names.ed duplicate$ empty$ 'pop$ - { - " " * - get.bbl.editor - "(" swap$ * ") " * - * swap$ - * } - if$ - word.in swap$ * - } - if$ -} -FUNCTION {format.thesis.type} -{ type duplicate$ empty$ - 
'pop$ - { swap$ pop$ - "t" change.case$ "type" bibinfo.check - } - if$ -} -FUNCTION {format.tr.number} -{ number "number" bibinfo.check - type duplicate$ empty$ - { pop$ bbl.techrep } - 'skip$ - if$ - "type" bibinfo.check - swap$ duplicate$ empty$ - { pop$ "t" change.case$ } - { tie.or.space.prefix * * } - if$ -} -FUNCTION {format.article.crossref} -{ - word.in - " \cite{" * crossref * "}" * -} -FUNCTION {format.book.crossref} -{ volume duplicate$ empty$ - { "empty volume in " cite$ * "'s crossref of " * crossref * warning$ - pop$ word.in - } - { bbl.volume - capitalize - swap$ tie.or.space.prefix "volume" bibinfo.check * * bbl.of space.word * - } - if$ - " \cite{" * crossref * "}" * -} -FUNCTION {format.incoll.inproc.crossref} -{ - word.in - " \cite{" * crossref * "}" * -} -FUNCTION {format.org.or.pub} -{ 't := - "" - address empty$ t empty$ and - 'skip$ - { - t empty$ - { address "address" bibinfo.check * - } - { t * - address empty$ - 'skip$ - { ", " * address "address" bibinfo.check * } - if$ - } - if$ - } - if$ -} -FUNCTION {format.publisher.address} -{ publisher "publisher" bibinfo.warn format.org.or.pub -} - -FUNCTION {format.organization.address} -{ organization "organization" bibinfo.check format.org.or.pub -} - -FUNCTION {article} -{ output.bibitem - format.authors "author" output.check - author format.key output - format.date "year" output.check - date.block - format.title "title" output.check - new.sentence - crossref missing$ - { - journal - remove.dots - "journal" bibinfo.check - "journal" output.check - add.blank - format.vol.num.pages output - } - { format.article.crossref output.nonnull - format.pages output - } - if$ - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} -FUNCTION {book} -{ output.bibitem - author empty$ - { format.editors "author and editor" output.check - editor format.key output - add.blank - } - { format.authors output.nonnull - crossref missing$ - { "author and editor" editor 
either.or.check } - 'skip$ - if$ - } - if$ - format.date "year" output.check - date.block - format.btitle "title" output.check - crossref missing$ - { format.bvolume output - format.edition output - new.sentence - format.number.series output - format.publisher.address output - } - { - new.sentence - format.book.crossref output.nonnull - } - if$ - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} -FUNCTION {booklet} -{ output.bibitem - format.authors output - author format.key output - format.date "year" output.check - date.block - format.title "title" output.check - new.sentence - howpublished "howpublished" bibinfo.check output - address "address" bibinfo.check output - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} - -FUNCTION {inbook} -{ output.bibitem - author empty$ - { format.editors "author and editor" output.check - editor format.key output - } - { format.authors output.nonnull - crossref missing$ - { "author and editor" editor either.or.check } - 'skip$ - if$ - } - if$ - format.date "year" output.check - date.block - format.btitle "title" output.check - crossref missing$ - { - format.bvolume output - format.edition output - format.publisher.address output - format.chapter.pages "chapter and pages" output.check - new.sentence - format.number.series output - } - { - format.chapter.pages "chapter and pages" output.check - new.sentence - format.book.crossref output.nonnull - } - if$ - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} - -FUNCTION {incollection} -{ output.bibitem - format.authors "author" output.check - author format.key output - format.date "year" output.check - date.block - format.title "title" output.check - new.sentence - crossref missing$ - { format.in.ed.booktitle "booktitle" output.check - format.bvolume output - format.edition output - format.number.series output - format.publisher.address output 
- format.chapter.pages output - } - { format.incoll.inproc.crossref output.nonnull - format.chapter.pages output - } - if$ - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} -FUNCTION {inproceedings} -{ output.bibitem - format.authors "author" output.check - author format.key output - format.date "year" output.check - date.block - format.title "title" output.check - new.sentence - crossref missing$ - { format.in.ed.booktitle "booktitle" output.check - publisher empty$ - { format.organization.address output } - { organization "organization" bibinfo.check output - format.publisher.address output - } - if$ - format.bvolume output - format.number.series output - format.pages output - } - { format.incoll.inproc.crossref output.nonnull - format.pages output - } - if$ - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} -FUNCTION {conference} { inproceedings } -FUNCTION {manual} -{ output.bibitem - format.authors output - author format.key output - format.date "year" output.check - date.block - format.btitle "title" output.check - new.sentence - organization "organization" bibinfo.check output - address "address" bibinfo.check output - format.edition output - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} - -FUNCTION {mastersthesis} -{ output.bibitem - format.authors "author" output.check - author format.key output - format.date "year" output.check - date.block - format.title - "title" output.check - new.sentence - bbl.mthesis format.thesis.type output.nonnull - school "school" bibinfo.warn output - address "address" bibinfo.check output - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} - -FUNCTION {misc} -{ output.bibitem - format.authors output - author format.key output - format.date "year" output.check - date.block - format.title output - new.sentence - howpublished 
"howpublished" bibinfo.check output - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} -FUNCTION {phdthesis} -{ output.bibitem - format.authors "author" output.check - author format.key output - format.date "year" output.check - date.block - format.title - "title" output.check - new.sentence - bbl.phdthesis format.thesis.type output.nonnull - school "school" bibinfo.warn output - address "address" bibinfo.check output - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} - -FUNCTION {proceedings} -{ output.bibitem - format.editors output - editor format.key output - format.date "year" output.check - date.block - format.btitle "title" output.check - format.bvolume output - format.number.series output - publisher empty$ - { format.organization.address output } - { organization "organization" bibinfo.check output - format.publisher.address output - } - if$ - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} - -FUNCTION {techreport} -{ output.bibitem - format.authors "author" output.check - author format.key output - format.date "year" output.check - date.block - format.title - "title" output.check - new.sentence - format.tr.number output.nonnull - institution "institution" bibinfo.warn output - address "address" bibinfo.check output - format.doi output - format.url output - format.note output - format.eprint output - fin.entry -} - -FUNCTION {unpublished} -{ output.bibitem - format.authors "author" output.check - author format.key output - format.date "year" output.check - date.block - format.title "title" output.check - format.doi output - format.url output - format.note "note" output.check - format.eprint output - fin.entry -} - -FUNCTION {default.type} { misc } -READ -FUNCTION {sortify} -{ purify$ - "l" change.case$ -} -INTEGERS { len } -FUNCTION {chop.word} -{ 's := - 'len := - s #1 len substring$ = - { s len #1 + global.max$ 
substring$ } - 's - if$ -} -FUNCTION {format.lab.names} -{ 's := - "" 't := - s #1 "{vv~}{ll}" format.name$ - s num.names$ duplicate$ - #2 > - { pop$ - " " * bbl.etal * - } - { #2 < - 'skip$ - { s #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" = - { - " " * bbl.etal * - } - { bbl.and space.word * s #2 "{vv~}{ll}" format.name$ - * } - if$ - } - if$ - } - if$ -} - -FUNCTION {author.key.label} -{ author empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {author.editor.key.label} -{ author empty$ - { editor empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { editor format.lab.names } - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {editor.key.label} -{ editor empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { editor format.lab.names } - if$ -} - -FUNCTION {calc.short.authors} -{ type$ "book" = - type$ "inbook" = - or - 'author.editor.key.label - { type$ "proceedings" = - 'editor.key.label - 'author.key.label - if$ - } - if$ - 'short.list := -} - -FUNCTION {calc.label} -{ calc.short.authors - short.list - "(" - * - year duplicate$ empty$ - { pop$ "????" 
} - 'skip$ - if$ - * - 'label := -} - -FUNCTION {sort.format.names} -{ 's := - #1 'nameptr := - "" - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { s nameptr - "{ll{ }}{ f{ }}{ jj{ }}" - format.name$ 't := - nameptr #1 > - { - " " * - namesleft #1 = t "others" = and - { "zzzzz" * } - { numnames #2 > nameptr #2 = and - { "zz" * year field.or.null * " " * } - 'skip$ - if$ - t sortify * - } - if$ - } - { t sortify * } - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {sort.format.title} -{ 't := - "A " #2 - "An " #3 - "The " #4 t chop.word - chop.word - chop.word - sortify - #1 global.max$ substring$ -} -FUNCTION {author.sort} -{ author empty$ - { key empty$ - { "to sort, need author or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { author sort.format.names } - if$ -} -FUNCTION {author.editor.sort} -{ author empty$ - { editor empty$ - { key empty$ - { "to sort, need author, editor, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { editor sort.format.names } - if$ - } - { author sort.format.names } - if$ -} -FUNCTION {editor.sort} -{ editor empty$ - { key empty$ - { "to sort, need editor or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { editor sort.format.names } - if$ -} -FUNCTION {presort} -{ calc.label - label sortify - " " - * - type$ "book" = - type$ "inbook" = - or - 'author.editor.sort - { type$ "proceedings" = - 'editor.sort - 'author.sort - if$ - } - if$ - #1 entry.max$ substring$ - 'sort.label := - sort.label - * - " " - * - title field.or.null - sort.format.title - * - #1 entry.max$ substring$ - 'sort.key$ := -} - -ITERATE {presort} -SORT -STRINGS { last.label next.extra } -INTEGERS { last.extra.num number.label } -FUNCTION {initialize.extra.label.stuff} -{ #0 int.to.chr$ 'last.label := - "" 'next.extra := - #0 'last.extra.num := - #0 'number.label := -} -FUNCTION {forward.pass} -{ last.label label = - { last.extra.num #1 + 
'last.extra.num := - last.extra.num int.to.chr$ 'extra.label := - } - { "a" chr.to.int$ 'last.extra.num := - "" 'extra.label := - label 'last.label := - } - if$ - number.label #1 + 'number.label := -} -FUNCTION {reverse.pass} -{ next.extra "b" = - { "a" 'extra.label := } - 'skip$ - if$ - extra.label 'next.extra := - extra.label - duplicate$ empty$ - 'skip$ - { "{\natexlab{" swap$ * "}}" * } - if$ - 'extra.label := - label extra.label * 'label := -} -EXECUTE {initialize.extra.label.stuff} -ITERATE {forward.pass} -REVERSE {reverse.pass} -FUNCTION {bib.sort.order} -{ sort.label - " " - * - year field.or.null sortify - * - " " - * - title field.or.null - sort.format.title - * - #1 entry.max$ substring$ - 'sort.key$ := -} -ITERATE {bib.sort.order} -SORT -FUNCTION {begin.bib} -{ preamble$ empty$ - 'skip$ - { preamble$ write$ newline$ } - if$ - "\begin{thebibliography}{" number.label int.to.str$ * "}" * - write$ newline$ - "\providecommand{\natexlab}[1]{#1}" - write$ newline$ - "\providecommand{\url}[1]{{#1}}" - write$ newline$ - "\providecommand{\urlprefix}{URL }" - write$ newline$ - "\expandafter\ifx\csname urlstyle\endcsname\relax" - write$ newline$ - " \providecommand{\doi}[1]{DOI~\discretionary{}{}{}#1}\else" - write$ newline$ - " \providecommand{\doi}{DOI~\discretionary{}{}{}\begingroup \urlstyle{rm}\Url}\fi" - write$ newline$ - "\providecommand{\eprint}[2][]{\url{#2}}" - write$ newline$ -} -EXECUTE {begin.bib} -EXECUTE {init.state.consts} -ITERATE {call.type$} -FUNCTION {end.bib} -{ newline$ - "\end{thebibliography}" write$ newline$ -} -EXECUTE {end.bib} -%% End of customized bst file -%% -%% End of file `spbasic.bst'. 
- diff --git a/book/tex/svind.ist b/book/tex/svind.ist deleted file mode 100755 index 11bf3666..00000000 --- a/book/tex/svind.ist +++ /dev/null @@ -1,7 +0,0 @@ -headings_flag 1 -heading_prefix "{\\bf " -heading_suffix "}\\nopagebreak%\n \\indexspace\\nopagebreak%" -delim_0 "\\idxquad " -delim_1 "\\idxquad " -delim_2 "\\idxquad " -delim_n ",\\," diff --git a/book/tex/svmono.cls b/book/tex/svmono.cls deleted file mode 100755 index fed1d9cb..00000000 --- a/book/tex/svmono.cls +++ /dev/null @@ -1,1943 +0,0 @@ -% SVMONO DOCUMENT CLASS -- version 5.5 (17-Dec-09) -% Springer Verlag global LaTeX2e support for monographs -%% -%% -%% \CharacterTable -%% {Upper-case \A\B\C\D\E\F\G\H\I\J\K\L\M\N\O\P\Q\R\S\T\U\V\W\X\Y\Z -%% Lower-case \a\b\c\d\e\f\g\h\i\j\k\l\m\n\o\p\q\r\s\t\u\v\w\x\y\z -%% Digits \0\1\2\3\4\5\6\7\8\9 -%% Exclamation \! Double quote \" Hash (number) \# -%% Dollar \$ Percent \% Ampersand \& -%% Acute accent \' Left paren \( Right paren \) -%% Asterisk \* Plus \+ Comma \, -%% Minus \- Point \. Solidus \/ -%% Colon \: Semicolon \; Less than \< -%% Equals \= Greater than \> Question mark \? 
-%% Commercial at \@ Left bracket \[ Backslash \\ -%% Right bracket \] Circumflex \^ Underscore \_ -%% Grave accent \` Left brace \{ Vertical bar \| -%% Right brace \} Tilde \~} -%% -\NeedsTeXFormat{LaTeX2e}[1995/12/01] -\ProvidesClass{svmono}[2009/12/17 v5.5 -^^JSpringer Verlag global LaTeX document class for monographs] -% -% Options -% citations -\DeclareOption{natbib}{\ExecuteOptions{oribibl}% -\AtEndOfClass{% Loading package 'NATBIB' -\RequirePackage{natbib} -% Changing some parameters of NATBIB -\setlength{\bibhang}{\parindent} -%\setlength{\bibsep}{0mm} -\let\bibfont=\small -\def\@biblabel#1{#1.} -\newcommand{\etal}{\textit{et al}.} -%\bibpunct[,]{(}{)}{;}{a}{}{,}}} -}} -% Springer environment -\let\if@spthms\iftrue -\DeclareOption{nospthms}{\let\if@spthms\iffalse} -% -\let\envankh\@empty % no anchor for "theorems" -% -\let\if@envcntreset\iffalse % environment counter is not reset -\let\if@envcntresetsect=\iffalse % reset each section -\DeclareOption{envcountresetchap}{\let\if@envcntreset\iftrue} -\DeclareOption{envcountresetsect}{\let\if@envcntreset\iftrue -\let\if@envcntresetsect=\iftrue} -% -\let\if@envcntsame\iffalse % NOT all environments work like "Theorem", - % each using its own counter -\DeclareOption{envcountsame}{\let\if@envcntsame\iftrue} -% -\let\if@envcntshowhiercnt=\iffalse % do not show hierarchy counter at all -% -% enhance theorem counter -\DeclareOption{envcountchap}{\def\envankh{chapter}% show \thechapter along with theorem number -\let\if@envcntshowhiercnt=\iftrue -\ExecuteOptions{envcountreset}} -% -\DeclareOption{envcountsect}{\def\envankh{section}% show \thesection along with theorem number -\let\if@envcntshowhiercnt=\iftrue -\ExecuteOptions{envcountreset}} -% -% languages -\let\switcht@@therlang\relax -\let\svlanginfo\relax -\def\ds@deutsch{\def\switcht@@therlang{\switcht@deutsch}% -\gdef\svlanginfo{\typeout{Man spricht deutsch.}\global\let\svlanginfo\relax}} -\def\ds@francais{\def\switcht@@therlang{\switcht@francais}% 
-\gdef\svlanginfo{\typeout{On parle francais.}\global\let\svlanginfo\relax}} -% -\AtBeginDocument{\@ifundefined{url}{\def\url#1{#1}}{}% -\@ifpackageloaded{babel}{% -\@ifundefined{extrasamerican}{}{\addto\extrasamerican{\switcht@albion}}% -\@ifundefined{extrasaustralian}{}{\addto\extrasaustralian{\switcht@albion}}% -\@ifundefined{extrasbritish}{}{\addto\extrasbritish{\switcht@albion}}% -\@ifundefined{extrascanadian}{}{\addto\extrascanadian{\switcht@albion}}% -\@ifundefined{extrasenglish}{}{\addto\extrasenglish{\switcht@albion}}% -\@ifundefined{extrasnewzealand}{}{\addto\extrasnewzealand{\switcht@albion}}% -\@ifundefined{extrasUKenglish}{}{\addto\extrasUKenglish{\switcht@albion}}% -\@ifundefined{extrasUSenglish}{}{\addto\extrasUSenglish{\switcht@albion}}% -\@ifundefined{captionsfrench}{}{\addto\captionsfrench{\switcht@francais}}% -\@ifundefined{extrasgerman}{}{\addto\extrasgerman{\switcht@deutsch}}% -\@ifundefined{extrasngerman}{}{\addto\extrasngerman{\switcht@deutsch}}% -}{\switcht@@therlang}% -} -% numbering style of floats, equations -\newif\if@numart \@numartfalse -\DeclareOption{numart}{\@numarttrue} -% numbering of headings -\let\if@chapnum=\iftrue -\def\nixchapnum{\let\if@chapnum\iffalse} -\def\numstyle{0} -\DeclareOption{nosecnum}{\def\numstyle{1}}% -\DeclareOption{nochapnum}{\def\numstyle{2}}% -\DeclareOption{nonum}{\def\numstyle{3}}% -\def\set@numbering{\ifcase\numstyle \if@numart\else\num@book\fi %default -\or % 1-case - no \section-numbers -\setcounter{secnumdepth}{0}\if@numart\else\num@book\fi -\or % 2-case -\if@numart\else\num@spezart\fi -% chapter not numbered, but \sections are -\def\thesection{\@arabic\c@section}% -\nixchapnum -\or % 3-case -% neither chapter nor sections numbered + "numart" -\nixchapnum -\setcounter{secnumdepth}{0}% -\else\fi} -\AtEndOfClass{\set@numbering} -% style for vectors -\DeclareOption{vecphys}{\def\vec@style{phys}} -\DeclareOption{vecarrow}{\def\vec@style{arrow}} -% running heads -\let\if@runhead\iftrue 
-\DeclareOption{norunningheads}{\let\if@runhead\iffalse} -% referee option -\let\if@referee\iffalse -\def\makereferee{\def\baselinestretch{2}\selectfont -\newbox\refereebox -\setbox\refereebox=\vbox to\z@{\vskip0.5cm% - \hbox to\textwidth{\normalsize\tt\hrulefill\lower0.5ex - \hbox{\kern5\p@ referee's copy\kern5\p@}\hrulefill}\vss}% -\def\@oddfoot{\copy\refereebox}\let\@evenfoot=\@oddfoot} -\DeclareOption{referee}{\let\if@referee\iftrue -\AtBeginDocument{\makereferee\small\normalsize}} -% modification of thebibliography -\let\if@openbib\iffalse -\DeclareOption{openbib}{\let\if@openbib\iftrue} -% LaTeX standard, sectionwise references -\DeclareOption{oribibl}{\let\oribibl=Y} -\DeclareOption{sectrefs}{\let\secbibl=Y} -% -% footinfo option (provides an informatory line on every page) -\def\SpringerMacroPackageNameA{svmono.cls} -% \thetime, \thedate and \timstamp are macros to include -% time, date (or both) of the TeX run in the document -\def\maketimestamp{\count255=\time -\divide\count255 by 60\relax -\edef\thetime{\the\count255:}% -\multiply\count255 by-60\relax -\advance\count255 by\time -\edef\thetime{\thetime\ifnum\count255<10 0\fi\the\count255} -\edef\thedate{\number\day-\ifcase\month\or Jan\or Feb\or Mar\or - Apr\or May\or Jun\or Jul\or Aug\or Sep\or Oct\or - Nov\or Dec\fi-\number\year} -\def\timstamp{\hbox to\hsize{\tt\hfil\thedate\hfil\thetime\hfil}}} -\maketimestamp -% -% \footinfo generates a info footline on every page containing -% pagenumber, jobname, macroname, and timestamp -\DeclareOption{footinfo}{\AtBeginDocument{\maketimestamp - \def\ps@empty{\let\@mkboth\@gobbletwo - \let\@oddhead\@empty\let\@evenhead\@empty}% - \def\@oddfoot{\scriptsize\tt Page:\,\thepage\space\hfil - job:\,\jobname\space\hfil - macro:\,\SpringerMacroPackageNameA\space\hfil - date/time:\,\thedate/\thetime}% - \let\@evenfoot=\@oddfoot}} -% -% start new chapter on any page -\newif\if@openright \@openrighttrue -\DeclareOption{openany}{\@openrightfalse} -% -% no size changing 
allowed -\DeclareOption{11pt}{\OptionNotUsed} -\DeclareOption{12pt}{\OptionNotUsed} -% options for the article class -\def\@rticle@options{10pt,twoside} -% fleqn -\DeclareOption{fleqn}{\def\@rticle@options{10pt,twoside,fleqn}% -\AtEndOfClass{\let\leftlegendglue\relax}% -\AtBeginDocument{\mathindent\parindent}} -% hanging sectioning titles -\let\if@sechang\iftrue -\DeclareOption{nosechang}{\let\if@sechang\iffalse} -% hanging sectioning titles -\def\ClassInfoNoLine#1#2{% - \ClassInfo{#1}{#2\@gobble}% -} -% -\DeclareOption{graybox}{% -\AtEndOfClass{% Loading color package -\RequirePackage{color}% -% defining values of gray -\definecolor{shadecolor}{gray}{.85}% -\definecolor{tintedcolor}{gray}{.80}% -\RequirePackage{framed}% -% -\newenvironment{tinted}{% - \def\FrameCommand{\colorbox{tintedcolor}}% - \MakeFramed {\FrameRestore}}% - {\endMakeFramed}% -% -\renewenvironment{svgraybox}% - {\fboxsep=12pt\relax - \begin{shaded}% - \list{}{\leftmargin=12pt\rightmargin=2\leftmargin\leftmargin=\z@\topsep=\z@\relax}% - \expandafter\item\parindent=\svparindent - \hskip-\listparindent}% - {\endlist\end{shaded}}% -% -\renewenvironment{svtintedbox}% - {\fboxsep=12pt\relax - \begin{tinted}% - \list{}{\leftmargin=12pt\rightmargin=2\leftmargin\leftmargin=\z@\topsep=\z@\relax}% - \expandafter\item\parindent=\svparindent - \relax}% - {\endlist\end{tinted}}% -% -}} -% -\let\SVMonoOpt\@empty -\DeclareOption*{\InputIfFileExists{sv\CurrentOption.clo}{% -\global\let\SVMonoOpt\CurrentOption}{% -\ClassWarning{Springer-SVMono}{Specified option or subpackage -"\CurrentOption" \MessageBreak not found -passing it to article class \MessageBreak --}\PassOptionsToClass{\CurrentOption}{article}% -}} -\ProcessOptions\relax -\ifx\SVMonoOpt\@empty\relax -\ClassInfoNoLine{Springer-SVMono}{extra/valid Springer sub-package -\MessageBreak not found in option list - using "global" style}{} -\fi -\LoadClass[\@rticle@options]{article} -\raggedbottom - -% various sizes and settings for monographs - 
-\setlength{\textwidth}{117mm} -%\setlength{\textheight}{12pt}\multiply\textheight by 45\relax -\setlength{\textheight}{191mm} -\setlength{\topmargin}{0cm} -\setlength\oddsidemargin {63\p@} -\setlength\evensidemargin {63\p@} -\setlength\marginparwidth{90\p@} -\setlength\headsep {12\p@} - -\newdimen\svparindent -\setlength{\svparindent}{12\p@} -\parindent\svparindent -\setlength{\parskip}{\z@ \@plus \p@} -\setlength{\hfuzz}{2\p@} -\setlength{\arraycolsep}{1.5\p@} - -\frenchspacing - -\tolerance=500 - -\predisplaypenalty=0 -\clubpenalty=10000 -\widowpenalty=10000 - -\setlength\footnotesep{7.7\p@} - -\newdimen\betweenumberspace % dimension for space between -\betweenumberspace=5\p@ % number and text of titles -\newdimen\headlineindent % dimension for space of -\headlineindent=2.5cc % number and gap of running heads - -% fonts, sizes, and the like -\renewcommand\normalsize{% - \@setfontsize\normalsize\@xpt\@xiipt - \abovedisplayskip 10\p@ % \@plus2\p@ \@minus5\p@ - \abovedisplayshortskip \z@ % \@plus3\p@ - \belowdisplayshortskip 6\p@ %\@plus3\p@ \@minus3\p@ - \belowdisplayskip \abovedisplayskip - \let\@listi\@listI} -\normalsize -\renewcommand\small{% - \@setfontsize\small{8.5}{10}% - \abovedisplayskip 8.5\p@ % \@plus3\p@ \@minus4\p@ - \abovedisplayshortskip \z@ %\@plus2\p@ - \belowdisplayshortskip 4\p@ %\@plus2\p@ \@minus2\p@ - \def\@listi{\leftmargin\leftmargini - \parsep \z@ \@plus\p@ \@minus\p@ - \topsep 6\p@ \@plus2\p@ \@minus4\p@ - \itemsep\z@}% - \belowdisplayskip \abovedisplayskip -} -% -\let\footnotesize=\small -% -\renewcommand\Large{\@setfontsize\large{14}{16}} -\newcommand\LArge{\@setfontsize\Large{16}{18}} -\renewcommand\LARGE{\@setfontsize\LARGE{18}{20}} -% -\newenvironment{petit}{\par\addvspace{6\p@}\small}{\par\addvspace{6\p@}} -% - -% modification of automatic positioning of floating objects -\setlength\@fptop{\z@ } -\setlength\@fpsep{12\p@ } -\setlength\@fpbot{\z@ \@plus 1fil } -\def\textfraction{.01} -\def\floatpagefraction{.8} 
-\setlength{\intextsep}{20\p@ \@plus 2\p@ \@minus 2\p@} -\setlength\textfloatsep{24\p@ \@plus 2\p@ \@minus 4\p@} -\setcounter{topnumber}{4} -\def\topfraction{.9} -\setcounter{bottomnumber}{2} -\def\bottomfraction{.7} -\setcounter{totalnumber}{6} -% -% size and style of headings -\newcommand{\partnumsize}{\LArge} -\newcommand{\partnumstyle}{\bfseries\boldmath} -\newcommand{\partsize}{\LARGE} -\newcommand{\partstyle}{\bfseries\boldmath} -\newcommand{\chapnumsize}{\Large} -\newcommand{\chapnumstyle}{\bfseries\boldmath} -\newcommand{\chapsize}{\LArge} -\newcommand{\chapstyle}{\bfseries\boldmath} -\newcommand{\chapauthsize}{\normalsize} -\newcommand{\chapauthstyle}{\bfseries\boldmath} -\newcommand{\mottosize}{\small} -\newcommand{\mottostyle}{\itshape\unboldmath\raggedright} -\newcommand{\secsize}{\large} -\newcommand{\secstyle}{\bfseries\boldmath} -\newcommand{\subsecsize}{\large} -\newcommand{\subsecstyle}{\bfseries\itshape\boldmath} -\newcommand{\subsubsecstyle}{\bfseries\boldmath} -% -\def\cleardoublepage{\clearpage\if@twoside \ifodd\c@page\else - \hbox{}\newpage\if@twocolumn\hbox{}\newpage\fi\fi\fi} - -\newcommand{\clearemptydoublepage}{% - \clearpage{\pagestyle{empty}\cleardoublepage}} -\newcommand{\startnewpage}{\if@openright\clearemptydoublepage\else\clearpage\fi} - -% redefinition of \part -\renewcommand\part{\clearemptydoublepage - \thispagestyle{empty} - \if@twocolumn - \onecolumn - \@tempswatrue - \else - \@tempswafalse - \fi - \@ifundefined{thispagecropped}{}{\thispagecropped} - \secdef\@part\@spart} - -\def\@part[#1]#2{\ifnum \c@secnumdepth >-2\relax - \refstepcounter{part} - \addcontentsline{toc}{part}{\partname\ - \thepart\thechapterend\hspace{\betweenumberspace}% - #1}\else - \addcontentsline{toc}{part}{#1}\fi - \markboth{}{} - {\raggedleft - \hyphenpenalty \@M - \interlinepenalty\@M - \ifnum \c@secnumdepth >-2\relax - \normalfont\partnumsize\partnumstyle %\vrule height 34pt width 0pt depth 0pt% - \partname\ \thepart %\llap{\smash{\lower 5pt\hbox 
to\textwidth{\hrulefill}}} - \par - \vskip 2\p@ \fi - \partsize\partstyle #2\par}\@endpart} -% -% \@endpart finishes the part page -% -\def\@endpart{\vfil\newpage - \if@twoside - \hbox{} - \thispagestyle{empty} - \newpage - \fi - \if@tempswa - \twocolumn - \fi} -% -\def\@spart#1{{\raggedleft - \normalfont\partsize\partstyle - #1\par}\@endpart} -% -\newenvironment{partbacktext}{\def\@endpart{\vfil\newpage}} -{\thispagestyle{empty} \newpage} -% -% (re)define sectioning -\setcounter{secnumdepth}{3} - -\def\seccounterend{} -\def\seccountergap{\hskip\betweenumberspace} -\def\@seccntformat#1{\csname the#1\endcsname\seccounterend\seccountergap\ignorespaces} -% -\let\firstmark=\botmark -% -\@ifundefined{thechapterend}{\def\thechapterend{}}{} -% -\if@sechang - \def\sec@hangfrom#1{\setbox\@tempboxa\hbox{#1}% - \hangindent\wd\@tempboxa\noindent\box\@tempboxa} -\else - \def\sec@hangfrom#1{\setbox\@tempboxa\hbox{#1}% - \hangindent\z@\noindent\box\@tempboxa} -\fi - -%\def\chap@hangfrom#1{\noindent\vrule height 34pt width 0pt depth 0pt -%\rlap{\smash{\lower 5pt\hbox to\textwidth{\hrulefill}}}\hbox{#1} -%\vskip10pt} -%\def\schap@hangfrom{\chap@hangfrom{}} - -\newcounter{chapter} -% -\@addtoreset{section}{chapter} -\@addtoreset{footnote}{chapter} - -\newif\if@mainmatter \@mainmattertrue -\newcommand\frontmatter{\startnewpage - \@mainmatterfalse\pagenumbering{roman} - \setcounter{page}{5}} -% -\newcommand\mainmatter{\clearemptydoublepage - \@mainmattertrue\pagenumbering{arabic}} -% -\newcommand\backmatter{\clearemptydoublepage\@mainmatterfalse} - -\def\@chapapp{\chaptername} - -\newdimen\mottowidth -\newcommand\motto[2][77mm]{% -\setlength{\mottowidth}{#1}% -\gdef\m@ttotext{#2}} -% -\newcommand{\processmotto}{\@ifundefined{m@ttotext}{}{% - \setbox0=\hbox{\vbox{\hyphenpenalty=50 - \begin{flushright} - \begin{minipage}{\mottowidth} - \vrule\@width\z@\@height21\p@\@depth\z@ - \normalfont\mottosize\mottostyle\m@ttotext - \end{minipage} - \end{flushright}}}% - \@tempdima=\pagetotal - 
\advance\@tempdima by\ht0 - \ifdim\@tempdima<157\p@ - \multiply\@tempdima by-1 - \advance\@tempdima by157\p@ - \vskip\@tempdima - \fi - \box0\par - \global\let\m@ttotext=\undefined}} - -\newcommand{\chapsubtitle}[1]{% -\gdef\ch@psubtitle{#1}} -% -\newcommand{\processchapsubtit}{\@ifundefined{ch@psubtitle}{}{% - {\normalfont\chapnumsize\chapnumstyle - \vskip 14\p@ - \ch@psubtitle - \par} - \global\let\ch@psubtitle=\undefined}} - -\newcommand{\chapauthor}[1]{% -\gdef\ch@pauthor{#1}} -% -\newcommand{\processchapauthor}{\@ifundefined{ch@pauthor}{}{% - {\normalfont\chapauthsize\chapauthstyle - \vskip 20\p@ - \ch@pauthor - \par} - \global\let\ch@pauthor=\undefined}} - -\newcommand\chapter{\startnewpage - \@ifundefined{thispagecropped}{}{\thispagecropped} - \thispagestyle{bchap}% - \if@chapnum\else - \begingroup - \let\@elt\@stpelt - \csname cl@chapter\endcsname - \endgroup - \fi - \global\@topnum\z@ - \@afterindentfalse - \secdef\@chapter\@schapter} - -\def\@chapter[#1]#2{\if@chapnum % war mal \ifnum \c@secnumdepth >\m@ne - \refstepcounter{chapter}% - \if@mainmatter - \typeout{\@chapapp\space\thechapter.}% - \addcontentsline{toc}{chapter}{\protect - \numberline{\thechapter\thechapterend}#1}% - \else - \addcontentsline{toc}{chapter}{#1}% - \fi - \else - \addcontentsline{toc}{chapter}{#1}% - \fi - \chaptermark{#1}% - \addtocontents{lof}{\protect\addvspace{10\p@}}% - \addtocontents{lot}{\protect\addvspace{10\p@}}% - \if@twocolumn - \@topnewpage[\@makechapterhead{#2}]% - \else - \@makechapterhead{#2}% - \@afterheading - \fi} - -\def\@schapter#1{\if@twocolumn - \@topnewpage[\@makeschapterhead{#1}]% - \else - \@makeschapterhead{#1}% - \@afterheading - \fi} - -%%changes position and layout of numbered chapter headings -\def\@makechapterhead#1{{\parindent\z@\raggedright\normalfont - \hyphenpenalty \@M - \interlinepenalty\@M - \if@chapnum - \chapnumsize\chapnumstyle - \@chapapp\ \thechapter\thechapterend\par - \vskip 2\p@ - \fi - \chapsize\chapstyle - \ignorespaces#1\par\nobreak 
- \processchapsubtit - \processchapauthor - \processmotto - \ifdim\pagetotal>167\p@ - \vskip 11\p@ - \else - \@tempdima=167\p@\advance\@tempdima by-\pagetotal - \vskip\@tempdima - \fi}} - -%%changes position and layout of unnumbered chapter headings -\def\@makeschapterhead#1{{\parindent \z@ \raggedright\normalfont - \hyphenpenalty \@M - \interlinepenalty\@M - \chapsize\chapstyle - \ignorespaces#1\par\nobreak - \processmotto - \ifdim\pagetotal>167\p@ - \vskip 11\p@ - \else - \@tempdima=168\p@\advance\@tempdima by-\pagetotal - \vskip\@tempdima - \fi}} -% -% dedication environment -\newenvironment{dedication} -{\clearemptydoublepage -\thispagestyle{empty} -\vspace*{13\baselineskip} -\large\itshape -\let\\\@centercr\@rightskip\@flushglue \rightskip\@rightskip -\leftskip4cm\parindent\z@\relax -\everypar{\parindent=\svparindent\let\everypar\empty}}{\clearpage} -% -% predefined unnumbered headings -\newcommand{\preface}[1][\prefacename]{\chapter*{#1}\markboth{#1}{#1}} -\newcommand{\foreword}[1][\forewordname]{\chapter*{#1}\markboth{#1}{#1}} -\newcommand{\extrachap}[1]{\chapter*{#1}\markboth{#1}{#1}} -% same with TOC entry -\newcommand{\Extrachap}[1]{\chapter*{#1}\markboth{#1}{#1}% -\addcontentsline{toc}{chapter}{#1}} - -% measures and setting of sections -\renewcommand\section{\@startsection{section}{1}{\z@}% - {-30\p@}% \p@lus -4\p@ \@minus -4\p@}% - {16\p@}% \p@lus 4\p@ \@minus 4\p@}% - {\normalfont\secsize\secstyle - \rightskip=\z@ \@plus 8em\pretolerance=10000 }} -\renewcommand\subsection{\@startsection{subsection}{2}{\z@}% - {-30\p@}% \p@lus -4\p@ \@minus -4\p@}% - {16\p@}% \p@lus 4\p@ \@minus 4\p@}% - {\normalfont\subsecsize\subsecstyle - \rightskip=\z@ \@plus 8em\pretolerance=10000 }} -\renewcommand\subsubsection{\@startsection{subsubsection}{3}{\z@}% - {-24\p@}% \p@lus -4\p@ \@minus -4\p@}% - {12\p@}% \p@lus 4\p@ \@minus 4\p@}% - {\normalfont\normalsize\subsubsecstyle - \rightskip=\z@ \@plus 8em\pretolerance=10000 }} 
-\renewcommand\paragraph{\@startsection{paragraph}{4}{\z@}% - {-24\p@}% \p@lus -4\p@ \@minus -4\p@}% - {12\p@}% \p@lus 4\p@ \@minus 4\p@}% - {\normalfont\normalsize\upshape - \rightskip=\z@ \@plus 8em\pretolerance=10000 }} -\renewcommand\subparagraph{\@startsection{paragraph}{4}{\z@}% - {-18\p@}% \p@lus -4\p@ \@minus -4\p@}% - {6\p@}% \p@lus 4\p@ \@minus 4\p@}% - {\normalfont\normalsize\itshape - \rightskip=\z@ \@plus 8em\pretolerance=10000 }} -\newcommand\runinhead{\@startsection{paragraph}{4}{\z@}% - {-6\p@}% \p@lus -4\p@ \@minus -4\p@}% - {-6\p@}% - {\normalfont\normalsize\bfseries\boldmath - \rightskip=\z@ \@plus 8em\pretolerance=10000 }} -\newcommand\subruninhead{\@startsection{paragraph}{4}{\z@}% - {-6\p@}% \p@lus -4\p@ \@minus -4\p@}% - {-6\p@}% - {\normalfont\normalsize\itshape - \rightskip=\z@ \@plus 8em\pretolerance=10000 }} - -% Appendix -\renewcommand\appendix{\par - \stepcounter{chapter} - \setcounter{chapter}{0} - \stepcounter{section} - \setcounter{section}{0} - \setcounter{equation}{0} - \setcounter{figure}{0} - \setcounter{table}{0} - \setcounter{footnote}{0} - \def\@chapapp{\appendixname}% - \renewcommand\thechapter{\@Alph\c@chapter}} - -\def\runinsep{} -\def\aftertext{\unskip\runinsep} -% -\def\thesection{\thechapter.\arabic{section}} -\def\thesubsection{\thesection.\arabic{subsection}} -\def\thesubsubsection{\thesubsection.\arabic{subsubsection}} -\def\theparagraph{\thesubsubsection.\arabic{paragraph}} -\def\thesubparagraph{\theparagraph.\arabic{subparagraph}} -\def\chaptermark#1{} -% -\def\@ssect#1#2#3#4#5{% - \@tempskipa #3\relax - \ifdim \@tempskipa>\z@ - \begingroup - #4{% - \@hangfrom{\hskip #1}% - \raggedright - \hyphenpenalty \@M - \interlinepenalty \@M #5\@@par}% - \endgroup - \else - \def\@svsechd{#4{\hskip #1\relax #5}}% - \fi - \@xsect{#3}} -% -\def\@sect#1#2#3#4#5#6[#7]#8{% - \ifnum #2>\c@secnumdepth - \let\@svsec\@empty - \else - \refstepcounter{#1}% - \protected@edef\@svsec{\@seccntformat{#1}\relax}% - \fi - \@tempskipa #5\relax - 
\ifdim \@tempskipa>\z@ - \begingroup #6\relax - \sec@hangfrom{\hskip #3\relax\@svsec}% - {\raggedright - \hyphenpenalty \@M - \interlinepenalty \@M #8\@@par}% - \endgroup - \csname #1mark\endcsname{#7\seccounterend}% - \addcontentsline{toc}{#1}{\ifnum #2>\c@secnumdepth - \else - \protect\numberline{\csname the#1\endcsname\seccounterend}% - \fi - #7}% - \else - \def\@svsechd{% - #6\hskip #3\relax - \@svsec #8\aftertext\ignorespaces - \csname #1mark\endcsname{#7}% - \addcontentsline{toc}{#1}{% - \ifnum #2>\c@secnumdepth \else - \protect\numberline{\csname the#1\endcsname\seccounterend}% - \fi - #7}}% - \fi - \@xsect{#5}} - -% figures and tables are processed in small print -\def \@floatboxreset {% - \reset@font - \small - \@setnobreak - \@setminipage -} -\def\fps@figure{htbp} -\def\fps@table{htbp} -% -% Frame for paste-in figures or tables -\def\mpicplace#1#2{% #1 =width #2 =height -\vbox{\hbox to #1{\vrule\@width \fboxrule \@height #2\hfill}}} -% -\newenvironment{svgraybox}% - {\ClassWarning{Springer-SVMono}{Environment "svgraybox" not available,\MessageBreak - switching over to "quotation" environment;\MessageBreak - specify documentclass option "graybox",\MessageBreak - see SVMono documentation -}% - \par\addvspace{6pt} - \list{}{\listparindent12\p@% - \leftmargin=12\p@% - \itemindent \listparindent - \rightmargin \leftmargin - \parsep \z@ \@plus\p@}% - \expandafter\item\parindent=\svparindent - \relax\hskip-\listparindent}% - {\endlist}% -% -\newenvironment{svtintedbox}% - {\ClassWarning{Springer-SVMono}{Environment "svtintedbox" not available,\MessageBreak - switching over to "quotation" environment;\MessageBreak - specify documentclass option "graybox",\MessageBreak - see SVMono documentation -}% - \par\addvspace{6pt} - \list{}{\listparindent12\p@% - \leftmargin=12\p@% - \itemindent \listparindent - \rightmargin \leftmargin - \parsep \z@ \@plus\p@}% - \expandafter\item\parindent=\svparindent - \relax\hskip-\listparindent}% - {\endlist}% -% 
-\renewenvironment{quotation} - {\par\addvspace{6pt} - \list{}{\listparindent12\p@% - \leftmargin=12\p@% - \itemindent \listparindent - \rightmargin \leftmargin - \parsep \z@ \@plus\p@% - \small}% - \item\relax\hskip-\listparindent} - {\endlist} -% -\renewenvironment{quote} - {\par\addvspace{6pt} - \list{}{\leftmargin=12\p@% - \rightmargin\leftmargin - \parsep=3\p@ - \small}% - \item\relax} - {\endlist} - -% labels of enumerate -\renewcommand\labelenumii{\theenumii.} -\renewcommand\theenumii{\@alph\c@enumii} - -% labels of itemize -\renewcommand\labelitemi{\textbullet} -\renewcommand\labelitemii{\textendash} -\let\labelitemiii=\labelitemiv - -% labels of description -\renewcommand*\descriptionlabel[1]{\hspace\labelsep #1\hfil} - -% fixed indentation for standard itemize-environment -\newdimen\svitemindent \setlength{\svitemindent}{\parindent} - - -% make indentations changeable - -\def\setitemindent#1{\settowidth{\labelwidth}{#1}% - \let\setit@m=Y% - \leftmargini\labelwidth - \advance\leftmargini\labelsep - \def\@listi{\leftmargin\leftmargini - \labelwidth\leftmargini\advance\labelwidth by -\labelsep - \parsep=\parskip - \topsep=\medskipamount - \itemsep=\parskip \advance\itemsep by -\parsep}} -\def\setitemitemindent#1{\settowidth{\labelwidth}{#1}% - \let\setit@m=Y% - \leftmarginii\labelwidth - \advance\leftmarginii\labelsep -\def\@listii{\leftmargin\leftmarginii - \labelwidth\leftmarginii\advance\labelwidth by -\labelsep - \parsep=\parskip - \topsep=6\p@ - \itemsep=\parskip \advance\itemsep by -\parsep}} -% -% adjusted environment "description" -% if an optional parameter (at the first two levels of lists) -% is present, its width is considered to be the widest mark -% throughout the current list. -\def\description{\@ifnextchar[{\@describe}{\list{}{\labelwidth\z@ -\labelsep=12pt\relax %!!!!!!!!! -\leftmargini=12pt\relax %!!!!!!!!! -\leftmargin=12pt\relax %!!!!!!!!! 
- \itemindent-\leftmargin \let\makelabel\descriptionlabel}}} -% -\def\describelabel#1{#1\hfil} -\def\@describe[#1]{\labelsep=12pt\relax -\relax\ifnum\@listdepth=0 -\setitemindent{#1}\else\ifnum\@listdepth=1 -\setitemitemindent{#1}\fi\fi -\list{--}{\let\makelabel\describelabel}} -% -\def\itemize{% - \ifnum \@itemdepth >\thr@@\@toodeep\else - \advance\@itemdepth\@ne - \ifx\setit@m\undefined - \ifnum \@itemdepth=1 \leftmargini=\svitemindent - \labelwidth\leftmargini\advance\labelwidth-\labelsep - \leftmarginii=\leftmargini \leftmarginiii=\leftmargini - \fi - \fi - \edef\@itemitem{labelitem\romannumeral\the\@itemdepth}% - \expandafter\list - \csname\@itemitem\endcsname - {\def\makelabel##1{\rlap{##1}\hss}}% - \fi} -% -\def\enumerate{% - \ifnum \@enumdepth >\thr@@\@toodeep\else - \advance\@enumdepth\@ne - \ifx\setit@m\undefined - \ifnum \@enumdepth=1 \leftmargini=\svitemindent - \labelwidth\leftmargini\advance\labelwidth-\labelsep - \leftmarginii=\leftmargini \leftmarginiii=\leftmargini - \fi - \fi - \edef\@enumctr{enum\romannumeral\the\@enumdepth}% - \expandafter - \list - \csname label\@enumctr\endcsname - {\usecounter\@enumctr\def\makelabel##1{\hss\llap{##1}}}% - \fi} -% -\newdimen\verbatimindent \verbatimindent\parindent -\def\verbatim{\advance\@totalleftmargin by\verbatimindent -\@verbatim \frenchspacing\@vobeyspaces \@xverbatim} - -% -% special signs and characters -\newcommand{\D}{\mathrm{d}} -\newcommand{\E}{\mathrm{e}} -\let\eul=\E -\newcommand{\I}{{\rm i}} -\let\imag=\I -% -% the definition of uppercase Greek characters -% Springer likes them as italics to depict variables -\DeclareMathSymbol{\Gamma}{\mathalpha}{letters}{"00} -\DeclareMathSymbol{\Delta}{\mathalpha}{letters}{"01} -\DeclareMathSymbol{\Theta}{\mathalpha}{letters}{"02} -\DeclareMathSymbol{\Lambda}{\mathalpha}{letters}{"03} -\DeclareMathSymbol{\Xi}{\mathalpha}{letters}{"04} -\DeclareMathSymbol{\Pi}{\mathalpha}{letters}{"05} -\DeclareMathSymbol{\Sigma}{\mathalpha}{letters}{"06} 
-\DeclareMathSymbol{\Upsilon}{\mathalpha}{letters}{"07} -\DeclareMathSymbol{\Phi}{\mathalpha}{letters}{"08} -\DeclareMathSymbol{\Psi}{\mathalpha}{letters}{"09} -\DeclareMathSymbol{\Omega}{\mathalpha}{letters}{"0A} -% the upright forms are defined here as \var -\DeclareMathSymbol{\varGamma}{\mathalpha}{operators}{"00} -\DeclareMathSymbol{\varDelta}{\mathalpha}{operators}{"01} -\DeclareMathSymbol{\varTheta}{\mathalpha}{operators}{"02} -\DeclareMathSymbol{\varLambda}{\mathalpha}{operators}{"03} -\DeclareMathSymbol{\varXi}{\mathalpha}{operators}{"04} -\DeclareMathSymbol{\varPi}{\mathalpha}{operators}{"05} -\DeclareMathSymbol{\varSigma}{\mathalpha}{operators}{"06} -\DeclareMathSymbol{\varUpsilon}{\mathalpha}{operators}{"07} -\DeclareMathSymbol{\varPhi}{\mathalpha}{operators}{"08} -\DeclareMathSymbol{\varPsi}{\mathalpha}{operators}{"09} -\DeclareMathSymbol{\varOmega}{\mathalpha}{operators}{"0A} -% Upright Lower Case Greek letters without using a new MathAlphabet -\newcommand{\greeksym}[1]{\usefont{U}{psy}{m}{n}#1} -\newcommand{\greeksymbold}[1]{{\usefont{U}{psy}{b}{n}#1}} -\newcommand{\allmodesymb}[2]{\relax\ifmmode{\mathchoice -{\mbox{\fontsize{\tf@size}{\tf@size}#1{#2}}} -{\mbox{\fontsize{\tf@size}{\tf@size}#1{#2}}} -{\mbox{\fontsize{\sf@size}{\sf@size}#1{#2}}} -{\mbox{\fontsize{\ssf@size}{\ssf@size}#1{#2}}}} -\else -\mbox{#1{#2}}\fi} -% Definition of lower case Greek letters -\newcommand{\ualpha}{\allmodesymb{\greeksym}{a}} -\newcommand{\ubeta}{\allmodesymb{\greeksym}{b}} -\newcommand{\uchi}{\allmodesymb{\greeksym}{c}} -\newcommand{\udelta}{\allmodesymb{\greeksym}{d}} -\newcommand{\ugamma}{\allmodesymb{\greeksym}{g}} -\newcommand{\umu}{\allmodesymb{\greeksym}{m}} -\newcommand{\unu}{\allmodesymb{\greeksym}{n}} -\newcommand{\upi}{\allmodesymb{\greeksym}{p}} -\newcommand{\utau}{\allmodesymb{\greeksym}{t}} -% redefines the \vec accent to a bold character - if desired -\def\fig@type{arrow}% temporarily abused -\ifx\vec@style\fig@type\else -\@ifundefined{vec@style}{% - 
\def\vec#1{\ensuremath{\mathchoice - {\mbox{\boldmath$\displaystyle\mathbf{#1}$}} - {\mbox{\boldmath$\textstyle\mathbf{#1}$}} - {\mbox{\boldmath$\scriptstyle\mathbf{#1}$}} - {\mbox{\boldmath$\scriptscriptstyle\mathbf{#1}$}}}}% -} -{\def\vec#1{\ensuremath{\mathchoice - {\mbox{\boldmath$\displaystyle#1$}} - {\mbox{\boldmath$\textstyle#1$}} - {\mbox{\boldmath$\scriptstyle#1$}} - {\mbox{\boldmath$\scriptscriptstyle#1$}}}}% -} -\fi -% tensor -\def\tens#1{\relax\ifmmode\mathsf{#1}\else\textsf{#1}\fi} - -% end of proof symbol -\newcommand\qedsymbol{\hbox{\rlap{$\sqcap$}$\sqcup$}} -\newcommand\qed{\relax\ifmmode\else\unskip\quad\fi\qedsymbol} -\newcommand\smartqed{\renewcommand\qed{\relax\ifmmode\qedsymbol\else - {\unskip\nobreak\hfil\penalty50\hskip1em\null\nobreak\hfil\qedsymbol - \parfillskip=\z@\finalhyphendemerits=0\endgraf}\fi}} -% -\def\num@book{% -\renewcommand\thesection{\thechapter.\@arabic\c@section}% -\renewcommand\thesubsection{\thesection.\@arabic\c@subsection}% -\renewcommand\theequation{\thechapter.\@arabic\c@equation}% -\renewcommand\thefigure{\thechapter.\@arabic\c@figure}% -\renewcommand\thetable{\thechapter.\@arabic\c@table}% -\@addtoreset{section}{chapter}% -\@addtoreset{figure}{chapter}% -\@addtoreset{table}{chapter}% -\@addtoreset{equation}{chapter}} -% -\def\num@spezart{% -\renewcommand\thesection{\@arabic\c@section}% -\renewcommand\thesubsection{\thesection.\@arabic\c@subsection}% -\renewcommand\theequation{\@arabic\c@equation}% -\def\thesubequation{\@arabic\c@equation\alph{subequation}}% -\renewcommand\thefigure{\@arabic\c@figure}% -\renewcommand\thetable{\@arabic\c@table}% -\@addtoreset{section}{chapter}% -\@addtoreset{figure}{chapter}% -\@addtoreset{table}{chapter}% -\@addtoreset{equation}{chapter}} -% -% Ragged bottom for the actual page -\def\thisbottomragged{\def\@textbottom{\vskip\z@ \@plus.0001fil -\global\let\@textbottom\relax}} - -% This is texte.tex -% it defines various texts and their translations -% called up with documentstyle 
options -\def\switcht@albion{% -\def\abstractname{Abstract}% -\def\ackname{Acknowledgements}% -\def\andname{and}% -\def\bibname{References}% -\def\lastandname{, and}% -\def\appendixname{Appendix}% -\def\chaptername{Chapter}% -\def\claimname{Claim}% -\def\conjecturename{Conjecture}% -\def\contentsname{Contents}% -\def\corollaryname{Corollary}% -\def\definitionname{Definition}% -\def\emailname{e-mail}% -\def\examplename{Example}% -\def\exercisename{Exercise}% -\def\figurename{Fig.}% -\def\forewordname{Foreword}% -\def\keywordname{{\bf Key words:}}% -\def\indexname{Index}% -\def\lemmaname{Lemma}% -\def\contriblistname{List of Contributors}% -\def\listfigurename{List of Figures}% -\def\listtablename{List of Tables}% -\def\mailname{{\it Correspondence to\/}:}% -\def\noteaddname{Note added in proof}% -\def\notename{Note}% -\def\partname{Part}% -\def\prefacename{Preface}% -\def\problemname{Problem}% -\def\proofname{Proof}% -\def\propertyname{Property}% -\def\propositionname{Proposition}% -\def\questionname{Question}% -\def\refname{References}% -\def\remarkname{Remark}% -\def\seename{see}% -\def\solutionname{Solution}% -\def\subclassname{{\it Subject Classifications\/}:}% -\def\tablename{Table}% -\def\theoremname{Theorem}} -\switcht@albion -% Names of theorem like environments are already defined -% but must be translated if another language is chosen -% -% French section -\def\switcht@francais{\svlanginfo - \def\abstractname{R\'esum\'e}% - \def\ackname{Remerciements}% - \def\andname{et}% - \def\lastandname{ et}% - \def\appendixname{Appendice}% - \def\bibname{Bibliographie}% - \def\chaptername{Chapitre}% - \def\claimname{Pr\'etention}% - \def\conjecturename{Hypoth\`ese}% - \def\contentsname{Table des mati\`eres}% - \def\corollaryname{Corollaire}% - \def\definitionname{D\'efinition}% - \def\emailname{e-mail}% - \def\examplename{Exemple}% - \def\exercisename{Exercice}% - \def\figurename{Fig.}% - \def\forewordname{Avant-propos}% - \def\keywordname{{\bf Mots-cl\'e:}}% - 
\def\indexname{Index}% - \def\lemmaname{Lemme}% - \def\contriblistname{Liste des contributeurs}% - \def\listfigurename{Liste des figures}% - \def\listtablename{Liste des tables}% - \def\mailname{{\it Correspondence to\/}:}% - \def\noteaddname{Note ajout\'ee \`a l'\'epreuve}% - \def\notename{Remarque}% - \def\partname{Partie}% - \def\prefacename{Pr\'eface}% - \def\problemname{Probl\`eme}% - \def\proofname{Preuve}% - \def\propertyname{Caract\'eristique}% -%\def\propositionname{Proposition}% - \def\questionname{Question}% - \def\refname{Litt\'erature}% - \def\remarkname{Remarque}% - \def\seename{voir}% - \def\solutionname{Solution}% - \def\subclassname{{\it Subject Classifications\/}:}% - \def\tablename{Tableau}% - \def\theoremname{Th\'eor\`eme}% -} -% -% German section -\def\switcht@deutsch{\svlanginfo - \def\abstractname{Zusammenfassung}% - \def\ackname{Danksagung}% - \def\andname{und}% - \def\lastandname{ und}% - \def\appendixname{Anhang}% - \def\bibname{Literaturverzeichnis}% - \def\chaptername{Kapitel}% - \def\claimname{Behauptung}% - \def\conjecturename{Hypothese}% - \def\contentsname{Inhaltsverzeichnis}% - \def\corollaryname{Korollar}% -%\def\definitionname{Definition}% - \def\emailname{E-mail}% - \def\examplename{Beispiel}% - \def\exercisename{\"Ubung}% - \def\figurename{Abb.}% - \def\forewordname{Geleitwort}% - \def\keywordname{{\bf Schl\"usselw\"orter:}}% - \def\indexname{Sachverzeichnis}% -%\def\lemmaname{Lemma}% - \def\contriblistname{Mitarbeiter}% - \def\listfigurename{Abbildungsverzeichnis}% - \def\listtablename{Tabellenverzeichnis}% - \def\mailname{{\it Correspondence to\/}:}% - \def\noteaddname{Nachtrag}% - \def\notename{Anmerkung}% - \def\partname{Teil}% - \def\prefacename{Vorwort}% -%\def\problemname{Problem}% - \def\proofname{Beweis}% - \def\propertyname{Eigenschaft}% -%\def\propositionname{Proposition}% - \def\questionname{Frage}% - \def\refname{Literaturverzeichnis}% - \def\remarkname{Anmerkung}% - \def\seename{siehe}% - 
\def\solutionname{L\"osung}% - \def\subclassname{{\it Subject Classifications\/}:}% - \def\tablename{Tabelle}% -%\def\theoremname{Theorem}% -} - -\def\getsto{\mathrel{\mathchoice {\vcenter{\offinterlineskip -\halign{\hfil -$\displaystyle##$\hfil\cr\gets\cr\to\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\textstyle##$\hfil\cr\gets -\cr\to\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\scriptstyle##$\hfil\cr\gets -\cr\to\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\scriptscriptstyle##$\hfil\cr -\gets\cr\to\cr}}}}} -\def\lid{\mathrel{\mathchoice {\vcenter{\offinterlineskip\halign{\hfil -$\displaystyle##$\hfil\cr<\cr\noalign{\vskip1.2\p@}=\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\textstyle##$\hfil\cr<\cr -\noalign{\vskip1.2\p@}=\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\scriptstyle##$\hfil\cr<\cr -\noalign{\vskip\p@}=\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\scriptscriptstyle##$\hfil\cr -<\cr -\noalign{\vskip0.9\p@}=\cr}}}}} -\def\gid{\mathrel{\mathchoice {\vcenter{\offinterlineskip\halign{\hfil -$\displaystyle##$\hfil\cr>\cr\noalign{\vskip1.2\p@}=\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\textstyle##$\hfil\cr>\cr -\noalign{\vskip1.2\p@}=\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\scriptstyle##$\hfil\cr>\cr -\noalign{\vskip\p@}=\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\scriptscriptstyle##$\hfil\cr ->\cr -\noalign{\vskip0.9\p@}=\cr}}}}} -\def\grole{\mathrel{\mathchoice {\vcenter{\offinterlineskip -\halign{\hfil -$\displaystyle##$\hfil\cr>\cr\noalign{\vskip-\p@}<\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\textstyle##$\hfil\cr ->\cr\noalign{\vskip-\p@}<\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\scriptstyle##$\hfil\cr ->\cr\noalign{\vskip-0.8\p@}<\cr}}} -{\vcenter{\offinterlineskip\halign{\hfil$\scriptscriptstyle##$\hfil\cr ->\cr\noalign{\vskip-0.3\p@}<\cr}}}}} -\def\bbbr{{\rm I\!R}} %reelle Zahlen -\def\bbbm{{\rm I\!M}} -\def\bbbn{{\rm I\!N}} %natuerliche Zahlen -\def\bbbf{{\rm I\!F}} -\def\bbbh{{\rm I\!H}} 
-\def\bbbk{{\rm I\!K}} -\def\bbbp{{\rm I\!P}} -\def\bbbone{{\mathchoice {\rm 1\mskip-4mu l} {\rm 1\mskip-4mu l} -{\rm 1\mskip-4.5mu l} {\rm 1\mskip-5mu l}}} -\def\bbbc{{\mathchoice {\setbox0=\hbox{$\displaystyle\rm C$}\hbox{\hbox -to\z@{\kern0.4\wd0\vrule\@height0.9\ht0\hss}\box0}} -{\setbox0=\hbox{$\textstyle\rm C$}\hbox{\hbox -to\z@{\kern0.4\wd0\vrule\@height0.9\ht0\hss}\box0}} -{\setbox0=\hbox{$\scriptstyle\rm C$}\hbox{\hbox -to\z@{\kern0.4\wd0\vrule\@height0.9\ht0\hss}\box0}} -{\setbox0=\hbox{$\scriptscriptstyle\rm C$}\hbox{\hbox -to\z@{\kern0.4\wd0\vrule\@height0.9\ht0\hss}\box0}}}} -\def\bbbq{{\mathchoice {\setbox0=\hbox{$\displaystyle\rm -Q$}\hbox{\raise -0.15\ht0\hbox to\z@{\kern0.4\wd0\vrule\@height0.8\ht0\hss}\box0}} -{\setbox0=\hbox{$\textstyle\rm Q$}\hbox{\raise -0.15\ht0\hbox to\z@{\kern0.4\wd0\vrule\@height0.8\ht0\hss}\box0}} -{\setbox0=\hbox{$\scriptstyle\rm Q$}\hbox{\raise -0.15\ht0\hbox to\z@{\kern0.4\wd0\vrule\@height0.7\ht0\hss}\box0}} -{\setbox0=\hbox{$\scriptscriptstyle\rm Q$}\hbox{\raise -0.15\ht0\hbox to\z@{\kern0.4\wd0\vrule\@height0.7\ht0\hss}\box0}}}} -\def\bbbt{{\mathchoice {\setbox0=\hbox{$\displaystyle\rm -T$}\hbox{\hbox to\z@{\kern0.3\wd0\vrule\@height0.9\ht0\hss}\box0}} -{\setbox0=\hbox{$\textstyle\rm T$}\hbox{\hbox -to\z@{\kern0.3\wd0\vrule\@height0.9\ht0\hss}\box0}} -{\setbox0=\hbox{$\scriptstyle\rm T$}\hbox{\hbox -to\z@{\kern0.3\wd0\vrule\@height0.9\ht0\hss}\box0}} -{\setbox0=\hbox{$\scriptscriptstyle\rm T$}\hbox{\hbox -to\z@{\kern0.3\wd0\vrule\@height0.9\ht0\hss}\box0}}}} -\def\bbbs{{\mathchoice -{\setbox0=\hbox{$\displaystyle \rm S$}\hbox{\raise0.5\ht0\hbox -to\z@{\kern0.35\wd0\vrule\@height0.45\ht0\hss}\hbox -to\z@{\kern0.55\wd0\vrule\@height0.5\ht0\hss}\box0}} -{\setbox0=\hbox{$\textstyle \rm S$}\hbox{\raise0.5\ht0\hbox -to\z@{\kern0.35\wd0\vrule\@height0.45\ht0\hss}\hbox -to\z@{\kern0.55\wd0\vrule\@height0.5\ht0\hss}\box0}} -{\setbox0=\hbox{$\scriptstyle \rm S$}\hbox{\raise0.5\ht0\hbox 
-to\z@{\kern0.35\wd0\vrule\@height0.45\ht0\hss}\raise0.05\ht0\hbox -to\z@{\kern0.5\wd0\vrule\@height0.45\ht0\hss}\box0}} -{\setbox0=\hbox{$\scriptscriptstyle\rm S$}\hbox{\raise0.5\ht0\hbox -to\z@{\kern0.4\wd0\vrule\@height0.45\ht0\hss}\raise0.05\ht0\hbox -to\z@{\kern0.55\wd0\vrule\@height0.45\ht0\hss}\box0}}}} -\def\bbbz{{\mathchoice {\hbox{$\textstyle\sf Z\kern-0.4em Z$}} -{\hbox{$\textstyle\sf Z\kern-0.4em Z$}} -{\hbox{$\scriptstyle\sf Z\kern-0.3em Z$}} -{\hbox{$\scriptscriptstyle\sf Z\kern-0.2em Z$}}}} - -\let\ts\, - -\setlength\arrayrulewidth{.5\p@} -\def\svhline{% - \noalign{\ifnum0=`}\fi\hrule \@height2\arrayrulewidth \futurelet - \reserved@a\@xhline} - -\setlength \labelsep {5\p@} -\setlength\leftmargini {17\p@} -\setlength\leftmargin {\leftmargini} -\setlength\leftmarginii {\leftmargini} -\setlength\leftmarginiii {\leftmargini} -\setlength\leftmarginiv {\leftmargini} -\setlength\labelwidth {\leftmargini} -\addtolength\labelwidth{-\labelsep} - -\def\@listI{\leftmargin\leftmargini - \parsep=\parskip - \topsep=\medskipamount - \itemsep=\parskip \advance\itemsep by -\parsep} -\let\@listi\@listI -\@listi - -\def\@listii{\leftmargin\leftmarginii - \labelwidth\leftmarginii - \advance\labelwidth by -\labelsep - \parsep=\parskip - \topsep=6\p@ - \itemsep=\parskip - \advance\itemsep by -\parsep} - -\def\@listiii{\leftmargin\leftmarginiii - \labelwidth\leftmarginiii\advance\labelwidth by -\labelsep - \parsep=\parskip - \topsep=\z@ - \itemsep=\parskip - \advance\itemsep by -\parsep - \partopsep=\topsep} - -\setlength\arraycolsep{1.5\p@} -\setlength\tabcolsep{1.5\p@} - -\def\tableofcontents{\@restonecolfalse\if@twocolumn\@restonecoltrue\onecolumn - \fi\chapter*{\contentsname \@mkboth{{\contentsname}}{{\contentsname}}} - \@starttoc{toc}\if@restonecol\twocolumn\fi} - -\setcounter{tocdepth}{2} - -\def\l@part#1#2{\addpenalty{\@secpenalty}% - \addvspace{1em \@plus\p@}% - \begingroup - \parindent \z@ - \rightskip \z@ \@plus 5em -% \hrule\vskip5\p@ - \bfseries\boldmath - 
\leavevmode - #1\par -% \vskip5\p@ -% \hrule - \vskip\p@ - \nobreak - \addvspace{1em \@plus\p@}% - \endgroup} - -\def\@dotsep{2} - -\def\addnumcontentsmark#1#2#3{% -\addtocontents{#1}{\protect\contentsline{#2}{\protect\numberline - {\thechapter}#3}{\thepage}}} -\def\addcontentsmark#1#2#3{% -\addtocontents{#1}{\protect\contentsline{#2}{#3}{\thepage}}} -\def\addcontentsmarkwop#1#2#3{% -\addtocontents{#1}{\protect\contentsline{#2}{#3}{0}}} - -\def\@adcmk[#1]{\ifcase #1 \or -\def\@gtempa{\addnumcontentsmark}% - \or \def\@gtempa{\addcontentsmark}% - \or \def\@gtempa{\addcontentsmarkwop}% - \fi\@gtempa{toc}{chapter}} -\def\addtocmark{\@ifnextchar[{\@adcmk}{\@adcmk[3]}} - -\def\l@chapter#1#2{\par\addpenalty{-\@highpenalty} - \addvspace{1.0em \@plus \p@} - \@tempdima=\if@chapnum\tocchpnum\else\z@\fi - \begingroup - \parindent \z@ \rightskip \@tocrmarg - \advance\rightskip by \z@ \@plus 2cm - \parfillskip -\rightskip \pretolerance=10000 - \leavevmode \advance\leftskip\@tempdima \hskip -\leftskip - {\bfseries\boldmath#1}\ifx0#2\hfil\null - \else - \nobreak - \leaders\hbox{$\m@th \mkern \@dotsep mu\hbox{.}\mkern - \@dotsep mu$}\hfill - \nobreak\hbox to\@pnumwidth{\hfil #2}% - \fi\par - \penalty\@highpenalty \endgroup} - -\newdimen\tocchpnum -\newdimen\tocsecnum -\newdimen\tocsectotal -\newdimen\tocsubsecnum -\newdimen\tocsubsectotal -\newdimen\tocsubsubsecnum -\newdimen\tocsubsubsectotal -\newdimen\tocparanum -\newdimen\tocparatotal -\newdimen\tocsubparanum -\tocchpnum=20\p@ % chapter {\bf 88.} \@plus 5.3\p@ -\tocsecnum=22.5\p@ % section 88.8. 
plus 4.722\p@ -\tocsubsecnum=30.5\p@ % subsection 88.8.8 plus 4.944\p@ -\tocsubsubsecnum=38\p@ % subsubsection 88.8.8.8 plus 4.666\p@ -\tocparanum=45\p@ % paragraph 88.8.8.8.8 plus 3.888\p@ -\tocsubparanum=53\p@ % subparagraph 88.8.8.8.8.8 plus 4.11\p@ -\def\calctocindent{% -\tocsectotal=\tocchpnum -\advance\tocsectotal by\tocsecnum -\tocsubsectotal=\tocsectotal -\advance\tocsubsectotal by\tocsubsecnum -\tocsubsubsectotal=\tocsubsectotal -\advance\tocsubsubsectotal by\tocsubsubsecnum -\tocparatotal=\tocsubsubsectotal -\advance\tocparatotal by\tocparanum} -\calctocindent - -\def\@dottedtocline#1#2#3#4#5{% - \ifnum #1>\c@tocdepth \else - \vskip \z@ \@plus.2\p@ - {\leftskip #2\rightskip \@tocrmarg \advance\rightskip by \z@ \@plus 2cm - \parfillskip -\rightskip \pretolerance=10000 - \parindent #2\relax\@afterindenttrue - \interlinepenalty\@M - \leavevmode - \ifnum #1>\c@secnumdepth \@tempdima\z@ \else \@tempdima #3\fi -% \@tempdima #3\relax - \advance\leftskip \@tempdima \null\nobreak\hskip -\leftskip - {#4}\nobreak - \leaders\hbox{$\m@th - \mkern \@dotsep mu\hbox{.}\mkern \@dotsep - mu$}\hfill - \nobreak - \hb@xt@\@pnumwidth{\hfil\normalfont \normalcolor #5}% - \par}% - \fi} -% -\def\l@section{\@dottedtocline{1}{\tocchpnum}{\tocsecnum}} -\def\l@subsection{\@dottedtocline{2}{\tocsectotal}{\tocsubsecnum}} -\def\l@subsubsection{\@dottedtocline{3}{\tocsubsectotal}{\tocsubsubsecnum}} -\def\l@paragraph{\@dottedtocline{4}{\tocsubsubsectotal}{\tocparanum}} -\def\l@subparagraph{\@dottedtocline{5}{\tocparatotal}{\tocsubparanum}} - -\renewcommand\listoffigures{% - \chapter*{\listfigurename - \@mkboth{\listfigurename}{\listfigurename}}% - \@starttoc{lof}% - } - -\renewcommand\listoftables{% - \chapter*{\listtablename - \@mkboth{\listtablename}{\listtablename}}% - \@starttoc{lot}% - } - -\newenvironment{thecontriblist} - {\par - \addvspace{\bigskipamount} - \parindent\z@ - \rightskip\z@ \@plus 40\p@ - \def\iand{\\[\medskipamount]\let\and=\nand}% - 
\def\nand{\ifhmode\unskip\nobreak\fi\ $\cdot$ }% - \let\and=\nand - \def\at{\\\let\and=\iand}% - } - {\par - \addvspace{\bigskipamount}} - -\renewcommand\footnoterule{% - \kern-3\p@ - \hrule\@width 36mm - \kern2.6\p@} - -\newdimen\foot@parindent -\foot@parindent 10.83\p@ - -\footnotesep 9\p@ - -\AtBeginDocument{% -\renewcommand\@makefntext[1]{% - \parindent 12\p@ - \noindent - \mbox{\@makefnmark} #1}} -\if@spthms -% -% Definition of the "\spnewtheorem" command. -% -% Usage: -% -% \spnewtheorem{env_nam}{caption}[within]{cap_font}{body_font} -% or \spnewtheorem{env_nam}[numbered_like]{caption}{cap_font}{body_font} -% or \spnewtheorem*{env_nam}{caption}{cap_font}{body_font} -% -% New is "cap_font" and "body_font". It stands for -% fontdefinition of the caption and the text itself. -% -% "\spnewtheorem*" gives a theorem without number. -% -% A defined spnewthoerem environment is used as described -% by Lamport. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\def\@thmcountersep{.} -\def\@thmcounterend{.} -\newcommand\nocaption{\noexpand\@gobble} -\newdimen\spthmsep \spthmsep=3pt - -\def\spnewtheorem{\@ifstar{\@sthm}{\@Sthm}} - -% definition of \spnewtheorem with number - -\def\@spnthm#1#2{% - \@ifnextchar[{\@spxnthm{#1}{#2}}{\@spynthm{#1}{#2}}} -\def\@Sthm#1{\@ifnextchar[{\@spothm{#1}}{\@spnthm{#1}}} - -\def\@spxnthm#1#2[#3]#4#5{\expandafter\@ifdefinable\csname #1\endcsname - {\@definecounter{#1}\@addtoreset{#1}{#3}% - \expandafter\xdef\csname the#1\endcsname{\expandafter\noexpand - \csname the#3\endcsname \noexpand\@thmcountersep \@thmcounter{#1}}% - \expandafter\xdef\csname #1name\endcsname{#2}% - \global\@namedef{#1}{\@spthm{#1}{\csname #1name\endcsname}{#4}{#5}}% - \global\@namedef{end#1}{\@endtheorem}}} - -\def\@spynthm#1#2#3#4{\expandafter\@ifdefinable\csname #1\endcsname - {\@definecounter{#1}% - \expandafter\xdef\csname the#1\endcsname{\@thmcounter{#1}}% - \expandafter\xdef\csname #1name\endcsname{#2}% - \global\@namedef{#1}{\@spthm{#1}{\csname 
#1name\endcsname}{#3}{#4}}% - \global\@namedef{end#1}{\@endtheorem}}} - -\def\@spothm#1[#2]#3#4#5{% - \@ifundefined{c@#2}{\@latexerr{No theorem environment `#2' defined}\@eha}% - {\expandafter\@ifdefinable\csname #1\endcsname - {\global\@namedef{the#1}{\@nameuse{the#2}}% - \expandafter\xdef\csname #1name\endcsname{#3}% - \global\@namedef{#1}{\@spthm{#2}{\csname #1name\endcsname}{#4}{#5}}% - \global\@namedef{end#1}{\@endtheorem}}}} - -\def\@spthm#1#2#3#4{\topsep 7\p@ \@plus2\p@ \@minus4\p@ -\labelsep=\spthmsep\refstepcounter{#1}% -\@ifnextchar[{\@spythm{#1}{#2}{#3}{#4}}{\@spxthm{#1}{#2}{#3}{#4}}} - -\def\@spxthm#1#2#3#4{\@spbegintheorem{#2}{\csname the#1\endcsname}{#3}{#4}% - \ignorespaces} - -\def\@spythm#1#2#3#4[#5]{\@spopargbegintheorem{#2}{\csname - the#1\endcsname}{#5}{#3}{#4}\ignorespaces} - -\def\normalthmheadings{\def\@spbegintheorem##1##2##3##4{\trivlist - \item[\hskip\labelsep{##3##1\ ##2\@thmcounterend}]##4} -\def\@spopargbegintheorem##1##2##3##4##5{\trivlist - \item[\hskip\labelsep{##4##1\ ##2}]{##4(##3)\@thmcounterend\ }##5}} -\normalthmheadings - -\def\reversethmheadings{\def\@spbegintheorem##1##2##3##4{\trivlist - \item[\hskip\labelsep{##3##2\ ##1\@thmcounterend}]##4} -\def\@spopargbegintheorem##1##2##3##4##5{\trivlist - \item[\hskip\labelsep{##4##2\ ##1}]{##4(##3)\@thmcounterend\ }##5}} - -% definition of \spnewtheorem* without number - -\def\@sthm#1#2{\@Ynthm{#1}{#2}} - -\def\@Ynthm#1#2#3#4{\expandafter\@ifdefinable\csname #1\endcsname - {\global\@namedef{#1}{\@Thm{\csname #1name\endcsname}{#3}{#4}}% - \expandafter\xdef\csname #1name\endcsname{#2}% - \global\@namedef{end#1}{\@endtheorem}}} - -\def\@Thm#1#2#3{\topsep 7\p@ \@plus2\p@ \@minus4\p@ -\@ifnextchar[{\@Ythm{#1}{#2}{#3}}{\@Xthm{#1}{#2}{#3}}} - -\def\@Xthm#1#2#3{\@Begintheorem{#1}{#2}{#3}\ignorespaces} - -\def\@Ythm#1#2#3[#4]{\@Opargbegintheorem{#1} - {#4}{#2}{#3}\ignorespaces} - -\def\@Begintheorem#1#2#3{#3\trivlist - \item[\hskip\labelsep{#2#1\@thmcounterend}]} - 
-\def\@Opargbegintheorem#1#2#3#4{#4\trivlist - \item[\hskip\labelsep{#3#1}]{#3(#2)\@thmcounterend\ }} - -% initialize theorem environment - -\if@envcntshowhiercnt % show hierarchy counter - \def\@thmcountersep{.} - \spnewtheorem{theorem}{Theorem}[\envankh]{\bfseries}{\itshape} - \@addtoreset{theorem}{chapter} -\else % theorem counter only - \spnewtheorem{theorem}{Theorem}{\bfseries}{\itshape} - \if@envcntreset - \@addtoreset{theorem}{chapter} - \if@envcntresetsect - \@addtoreset{theorem}{section} - \fi - \fi -\fi - -%definition of divers theorem environments -\spnewtheorem*{claim}{Claim}{\itshape}{\rmfamily} -\spnewtheorem*{proof}{Proof}{\itshape}{\rmfamily} -% -\if@envcntsame % all environments like "Theorem" - using its counter - \def\spn@wtheorem#1#2#3#4{\@spothm{#1}[theorem]{#2}{#3}{#4}} -\else % all environments with their own counter - \if@envcntshowhiercnt % show hierarchy counter - \def\spn@wtheorem#1#2#3#4{\@spxnthm{#1}{#2}[\envankh]{#3}{#4}} - \else % environment counter only - \if@envcntreset % environment counter is reset each section - \if@envcntresetsect - \def\spn@wtheorem#1#2#3#4{\@spynthm{#1}{#2}{#3}{#4} - \@addtoreset{#1}{chapter}\@addtoreset{#1}{section}} - \else - \def\spn@wtheorem#1#2#3#4{\@spynthm{#1}{#2}{#3}{#4} - \@addtoreset{#1}{chapter}} - \fi - \else - \let\spn@wtheorem=\@spynthm - \fi - \fi -\fi -% -\let\spdefaulttheorem=\spn@wtheorem -% -\spn@wtheorem{case}{Case}{\itshape}{\rmfamily} -\spn@wtheorem{conjecture}{Conjecture}{\itshape}{\rmfamily} -\spn@wtheorem{corollary}{Corollary}{\bfseries}{\itshape} -\spn@wtheorem{definition}{Definition}{\bfseries}{\rmfamily} -\spn@wtheorem{example}{Example}{\itshape}{\rmfamily} -\spn@wtheorem{exercise}{Exercise}{\bfseries}{\rmfamily} -\spn@wtheorem{lemma}{Lemma}{\bfseries}{\itshape} -\spn@wtheorem{note}{Note}{\itshape}{\rmfamily} -\spn@wtheorem{problem}{Problem}{\bfseries}{\rmfamily} -\spn@wtheorem{property}{Property}{\itshape}{\rmfamily} -\spn@wtheorem{proposition}{Proposition}{\bfseries}{\itshape} 
-\spn@wtheorem{question}{Question}{\itshape}{\rmfamily} -\spn@wtheorem{solution}{Solution}{\bfseries}{\rmfamily} -\spn@wtheorem{remark}{Remark}{\itshape}{\rmfamily} -% -\newenvironment{theopargself} - {\def\@spopargbegintheorem##1##2##3##4##5{\trivlist - \item[\hskip\labelsep{##4##1\ ##2}]{##4##3\@thmcounterend\ }##5} - \def\@Opargbegintheorem##1##2##3##4{##4\trivlist - \item[\hskip\labelsep{##3##1}]{##3##2\@thmcounterend\ }}}{} -\newenvironment{theopargself*} - {\def\@spopargbegintheorem##1##2##3##4##5{\trivlist - \item[\hskip\labelsep{##4##1\ ##2}]{\hspace*{-\labelsep}##4##3\@thmcounterend}##5} - \def\@Opargbegintheorem##1##2##3##4{##4\trivlist - \item[\hskip\labelsep{##3##1}]{\hspace*{-\labelsep}##3##2\@thmcounterend}}}{} -% -\spn@wtheorem{prob}{\nocaption}{\bfseries}{\rmfamily} -\newcommand{\probref}[1]{\textbf{\ref{#1}} } -\newenvironment{sol}{\par\addvspace{6pt}\noindent\probref}{\par\addvspace{6pt}} -% -\fi - -\def\@takefromreset#1#2{% - \def\@tempa{#1}% - \let\@tempd\@elt - \def\@elt##1{% - \def\@tempb{##1}% - \ifx\@tempa\@tempb\else - \@addtoreset{##1}{#2}% - \fi}% - \expandafter\expandafter\let\expandafter\@tempc\csname cl@#2\endcsname - \expandafter\def\csname cl@#2\endcsname{}% - \@tempc - \let\@elt\@tempd} - -% redefininition of the captions for "figure" and "table" environments -% -\@ifundefined{floatlegendstyle}{\def\floatlegendstyle{\bfseries}}{} -\def\floatcounterend{\enspace} -\def\capstrut{\vrule\@width\z@\@height\topskip} -\@ifundefined{captionstyle}{\def\captionstyle{\normalfont\small}}{} -\@ifundefined{instindent}{\newdimen\instindent}{} - -\long\def\@caption#1[#2]#3{\par\addcontentsline{\csname - ext@#1\endcsname}{#1}{\protect\numberline{\csname - the#1\endcsname}{\ignorespaces #2}}\begingroup - \@parboxrestore\if@minipage\@setminipage\fi - \@makecaption{\csname fnum@#1\endcsname}{\ignorespaces #3}\par - \endgroup} - -\def\twocaptionwidth#1#2{\def\first@capwidth{#1}\def\second@capwidth{#2}} -% Default: .46\textwidth 
-\twocaptionwidth{.46\textwidth}{.46\textwidth} - -\def\leftcaption{\refstepcounter\@captype\@dblarg% - {\@leftcaption\@captype}} - -\def\rightcaption{\refstepcounter\@captype\@dblarg% - {\@rightcaption\@captype}} - -\long\def\@leftcaption#1[#2]#3{\addcontentsline{\csname - ext@#1\endcsname}{#1}{\protect\numberline{\csname - the#1\endcsname}{\ignorespaces #2}}\begingroup - \@parboxrestore - \vskip\figcapgap - \@maketwocaptions{\csname fnum@#1\endcsname}{\ignorespaces #3}% - {\first@capwidth}\ignorespaces\hspace{.073\textwidth}\hfill% - \endgroup} - -\long\def\@rightcaption#1[#2]#3{\addcontentsline{\csname - ext@#1\endcsname}{#1}{\protect\numberline{\csname - the#1\endcsname}{\ignorespaces #2}}\begingroup - \@parboxrestore - \@maketwocaptions{\csname fnum@#1\endcsname}{\ignorespaces #3}% - {\second@capwidth}\par - \endgroup} - -\long\def\@maketwocaptions#1#2#3{% - \parbox[t]{#3}{{\floatlegendstyle #1\floatcounterend}#2}} - -\def\fig@pos{l} -\newcommand{\leftfigure}[2][\fig@pos]{\makebox[.4635\textwidth][#1]{#2}} -\let\rightfigure\leftfigure - -\newdimen\figgap\figgap=0.5cm % hgap between figure and sidecaption -% -\long\def\@makesidecaption#1#2{\@tempdimb=3.6cm - \setbox0=\vbox{\hsize=\@tempdimb - \captionstyle{\floatlegendstyle - #1\floatcounterend}#2}% - \ifdim\instindent<\z@ - \ifdim\ht0>-\instindent - \advance\instindent by\ht0 - \typeout{^^JClass-Warning: Legend of \string\sidecaption\space for - \@captype\space\csname the\@captype\endcsname - ^^Jis \the\instindent\space taller than the corresponding float - - ^^Jyou'd better switch the environment. 
}% - \instindent\z@ - \fi - \else - \ifdim\ht0<\instindent - \advance\instindent by-\ht0 - \advance\instindent by-\dp0\relax - \advance\instindent by\topskip - \advance\instindent by-11\p@ - \else - \advance\instindent by-\ht0 - \instindent=-\instindent - \typeout{^^JClass-Warning: Legend of \string\sidecaption\space for - \@captype\space\csname the\@captype\endcsname - ^^Jis \the\instindent\space taller than the corresponding float - - ^^Jyou'd better switch the environment. }% - \instindent\z@ - \fi - \fi - \parbox[b]{\@tempdimb}{\captionstyle{\floatlegendstyle - #1\floatcounterend}#2% - \ifdim\instindent>\z@ \\ - \vrule\@width\z@\@height\instindent - \@depth\z@ - \fi}} -\def\sidecaption{\@ifnextchar[\sidec@ption{\sidec@ption[b]}} -% -\newbox\bildb@x -% -\def\sidec@ption[#1]#2\caption{% -\setbox\bildb@x=\hbox{\ignorespaces#2\unskip}% -\if@twocolumn - \ifdim\hsize<\textwidth\else - \ifdim\wd\bildb@x<\columnwidth - \typeout{Double column float fits into single column - - ^^Jyou'd better switch the environment. 
}% - \fi - \fi -\fi - \instindent=\ht\bildb@x - \advance\instindent by\dp\bildb@x -\if t#1 -\else - \instindent=-\instindent -\fi -\@tempdimb=\hsize -\advance\@tempdimb by-\figgap -\advance\@tempdimb by-\wd\bildb@x -\ifdim\@tempdimb<3.6cm - \ClassWarning{SVMono}{\string\sidecaption: No sufficient room for the legend; - ^^Jusing normal \string\caption}% - \unhbox\bildb@x - \let\@capcommand=\@caption -\else -% \ifdim\@tempdimb<4.5cm -% \ClassWarning{SVMono}{\string\sidecaption: Room for the legend very narrow; -% ^^Jusing \string\raggedright}% - \toks@\expandafter{\captionstyle\sloppy - \rightskip=\z@\@plus6mm\relax}% - \def\captionstyle{\the\toks@}% -% \fi - \let\@capcommand=\@sidecaption -% \leavevmode -% \unhbox\bildb@x -% \hfill -\fi -\refstepcounter\@captype -\@dblarg{\@capcommand\@captype}} -\long\def\@sidecaption#1[#2]#3{\addcontentsline{\csname - ext@#1\endcsname}{#1}{\protect\numberline{\csname - the#1\endcsname}{\ignorespaces #2}}\begingroup - \@parboxrestore - \@makesidecaption{\csname fnum@#1\endcsname}{\ignorespaces #3}% - \hfill - \unhbox\bildb@x - \par - \endgroup} -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\def\fig@type{figure} - -\def\leftlegendglue{\relax} -\newdimen\figcapgap\figcapgap=5\p@ % vgap between figure and caption -\newdimen\tabcapgap\tabcapgap=3\p@ % vgap between caption and table - -\long\def\@makecaption#1#2{% - \captionstyle - \ifx\@captype\fig@type - \vskip\figcapgap - \fi - \setbox\@tempboxa\hbox{{\floatlegendstyle #1\floatcounterend}% - \capstrut #2}% - \ifdim \wd\@tempboxa >\hsize - {\floatlegendstyle #1\floatcounterend}\capstrut #2\par - \else - \hbox to\hsize{\leftlegendglue\unhbox\@tempboxa\hfil}% - \fi - \ifx\@captype\fig@type\else - \vskip\tabcapgap - \fi} - -\newcounter{merk} - -\def\endfigure{\resetsubfig\end@float} - -\@namedef{endfigure*}{\resetsubfig\end@dblfloat} - -\def\resetsubfig{\global\let\last@subfig=\undefined} - -\def\r@setsubfig{\xdef\last@subfig{\number\value{figure}}% -\setcounter{figure}{\value{merk}}% 
-\setcounter{merk}{0}} - -\def\subfigures{\refstepcounter{figure}% - \@tempcnta=\value{merk}% - \setcounter{merk}{\value{figure}}% - \setcounter{figure}{\the\@tempcnta}% - \def\thefigure{\if@numart\else\thechapter.\fi - \@arabic\c@merk\alph{figure}}% - \let\resetsubfig=\r@setsubfig} - -\def\samenumber{\addtocounter{\@captype}{-1}% -\@ifundefined{last@subfig}{}{\setcounter{merk}{\last@subfig}}} - -% redefinition of the "bibliography" environment -% -\def\biblstarthook#1{\gdef\biblst@rthook{#1}} -% -\AtBeginDocument{% -\ifx\secbibl\undefined - \def\bibsection{\chapter*{\refname}\markboth{\refname}{\refname}% - \addcontentsline{toc}{chapter}{\refname}% - \csname biblst@rthook\endcsname\par} -\else - \def\bibsection{\section*{\refname}\markright{\refname}% - \addcontentsline{toc}{section}{\refname}% - \csname biblst@rthook\endcsname\par} -\fi} -\ifx\oribibl\undefined % Springer way of life - \renewenvironment{thebibliography}[1]{\bibsection - \global\let\biblst@rthook=\undefined - \def\@biblabel##1{##1.} - \small - \list{\@biblabel{\@arabic\c@enumiv}}% - {\settowidth\labelwidth{\@biblabel{#1}}% - \leftmargin\labelwidth - \advance\leftmargin\labelsep - \if@openbib - \advance\leftmargin\bibindent - \itemindent -\bibindent - \listparindent \itemindent - \parsep \z@ - \fi - \usecounter{enumiv}% - \let\p@enumiv\@empty - \renewcommand\theenumiv{\@arabic\c@enumiv}}% - \if@openbib - \renewcommand\newblock{\par}% - \else - \renewcommand\newblock{\hskip .11em \@plus.33em \@minus.07em}% - \fi - \sloppy\clubpenalty4000\widowpenalty4000% - \sfcode`\.=\@m} - {\def\@noitemerr - {\@latex@warning{Empty `thebibliography' environment}}% - \endlist} - \def\@lbibitem[#1]#2{\item[{[#1]}\hfill]\if@filesw - {\let\protect\noexpand\immediate - \write\@auxout{\string\bibcite{#2}{#1}}}\fi\ignorespaces} -\else % original bibliography is required - \let\bibname=\refname - \renewenvironment{thebibliography}[1] - {\chapter*{\bibname - \@mkboth{\bibname}{\bibname}}% - 
\list{\@biblabel{\@arabic\c@enumiv}}% - {\settowidth\labelwidth{\@biblabel{#1}}% - \leftmargin\labelwidth - \advance\leftmargin\labelsep - \@openbib@code - \usecounter{enumiv}% - \let\p@enumiv\@empty - \renewcommand\theenumiv{\@arabic\c@enumiv}}% - \sloppy - \clubpenalty4000 - \@clubpenalty \clubpenalty - \widowpenalty4000% - \sfcode`\.\@m} - {\def\@noitemerr - {\@latex@warning{Empty `thebibliography' environment}}% - \endlist} -\fi - -\let\if@threecolind\iffalse -\def\threecolindex{\let\if@threecolind\iftrue} -\def\indexstarthook#1{\gdef\indexst@rthook{#1}} -\renewenvironment{theindex} - {\if@twocolumn - \@restonecolfalse - \else - \@restonecoltrue - \fi - \columnseprule \z@ - \columnsep 1cc - \@nobreaktrue - \if@threecolind - \begin{multicols}{3}[\chapter*{\indexname}% - \else - \begin{multicols}{2}[\chapter*{\indexname}% - \fi - {\csname indexst@rthook\endcsname}]% - \global\let\indexst@rthook=\undefined - \markboth{\indexname}{\indexname}% - \addcontentsline{toc}{chapter}{\indexname}% - \parindent\z@ - \rightskip\z@ \@plus 40\p@ - \parskip\z@ \@plus .3\p@\relax - \flushbottom - \let\item\@idxitem - \def\,{\relax\ifmmode\mskip\thinmuskip - \else\hskip0.2em\ignorespaces\fi}% - \normalfont\small} - {\end{multicols} - \global\let\if@threecolind\iffalse - \if@restonecol\onecolumn\else\clearpage\fi} - -\def\idxquad{\hskip 10\p@}% space that divides entry from number - -\def\@idxitem{\par\setbox0=\hbox{--\,--\,--\enspace}% - \hangindent\wd0\relax} - -\def\subitem{\par\noindent\setbox0=\hbox{--\enspace}% second order - \kern\wd0\setbox0=\hbox{--\,--\,--\enspace}% - \hangindent\wd0\relax}% indexentry - -\def\subsubitem{\par\noindent\setbox0=\hbox{--\,--\enspace}% third order - \kern\wd0\setbox0=\hbox{--\,--\,--\enspace}% - \hangindent\wd0\relax}% indexentry - -\def\indexspace{\par \vskip 10\p@ \@plus5\p@ \@minus3\p@\relax} - -\def\subtitle#1{\gdef\@subtitle{#1}} -\def\@subtitle{} - -\def\maketitle{\par - \begingroup - \def\thefootnote{\fnsymbol{footnote}}% - 
\def\@makefnmark{\hbox - to\z@{$\m@th^{\@thefnmark}$\hss}}% - \if@twocolumn - \twocolumn[\@maketitle]% - \else \newpage - \global\@topnum\z@ % Prevents figures from going at top of page. - \@maketitle \fi\thispagestyle{empty}\@thanks - \par\penalty -\@M - \endgroup - \setcounter{footnote}{0}% - \let\maketitle\relax - \let\@maketitle\relax - \gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax} - -\def\@maketitle{\newpage - \null - \vskip 2em % Vertical space above title. -\begingroup - \def\and{\unskip, } - \parindent=\z@ - \pretolerance=10000 - \rightskip=\z@ \@plus 3cm - {\LARGE % each author set in \LARGE - \lineskip .5em - \@author - \par}% - \vskip 2cm % Vertical space after author. - {\Huge \@title \par}% % Title set in \Huge size. - \vskip 1cm % Vertical space after title. - \if!\@subtitle!\else - {\LARGE\ignorespaces\@subtitle \par} - \vskip 1cm % Vertical space after subtitle. - \fi - \if!\@date!\else - {\large \@date}% % Date set in \large size. - \par - \vskip 1.5em % Vertical space after date. 
- \fi - \vfill - {\Large Springer\par} -%\vskip 5\p@ -%\large -% Berlin\enspace Heidelberg\enspace New\kern0.1em York\\ -% Hong\thinspace Kong\enspace London\\ -% Milan\enspace Paris\enspace Tokyo\par -\endgroup} - -% Useful environments -\newenvironment{acknowledgement}{\par\addvspace{17\p@}\small\rm -\trivlist\item[\hskip\labelsep{\bfseries\ackname}]} -{\endtrivlist\addvspace{6\p@}} -% -\newenvironment{noteadd}{\par\addvspace{17\p@}\small\rm -\trivlist\item[\hskip\labelsep{\it\noteaddname}]} -{\endtrivlist\addvspace{6\p@}} -% -\DeclareRobustCommand\abstract{\@ifstar\@abstgobl\@abstract} -\def\@abstract#1{\noindent\textbf{\abstractname} #1\par -%\@afterindentfalse -%\@afterheading -} -\def\@abstgobl#1{\par -%\@afterindentfalse -%\@afterheading -} -% -\newcommand{\keywords}[1]{\par\addvspace\baselineskip -\noindent\keywordname\enspace\ignorespaces#1} -% -% define the running headings of a twoside text -\def\runheadsize{\small} -\def\runheadstyle{\rmfamily\upshape} -\def\customizhead{\hspace{\headlineindent}} - -\def\ps@bchap{%\let\@mkboth\@gobbletwo - \let\@oddhead\@empty\let\@evenhead\@empty - \def\@oddfoot{\reset@font\small\hfil\thepage}% - \let\@evenfoot\@oddfoot} - -\def\ps@headings{\let\@mkboth\markboth - \let\@oddfoot\@empty\let\@evenfoot\@empty - \def\@evenhead{\runheadsize\runheadstyle\rlap{\thepage}\hfil - \leftmark} - \def\@oddhead{\runheadsize\runheadstyle\rightmark\hfil - \llap{\thepage}} - \def\chaptermark##1{\markboth{{\if@chapnum %\ifnum\c@secnumdepth>\m@ne - \thechapter\thechapterend\hskip\betweenumberspace\fi ##1}}{{\if@chapnum %\ifnum\c@secnumdepth>\m@ne - \thechapter\thechapterend\hskip\betweenumberspace\fi ##1}}}%!!! 
- \def\sectionmark##1{\markright{{\ifnum\c@secnumdepth>\z@ - \thesection\seccounterend\hskip\betweenumberspace\fi ##1}}}} - -\def\ps@myheadings{\let\@mkboth\@gobbletwo - \let\@oddfoot\@empty\let\@evenfoot\@empty - \def\@evenhead{\runheadsize\runheadstyle\rlap{\thepage}\hfil - \leftmark} - \def\@oddhead{\runheadsize\runheadstyle\rightmark\hfil - \llap{\thepage}} - \let\chaptermark\@gobble - \let\sectionmark\@gobble - \let\subsectionmark\@gobble} - - -\ps@headings - -\endinput -%end of file svmono.cls diff --git a/book/tex/themes.tex b/book/tex/themes.tex deleted file mode 100644 index 26a89b17..00000000 --- a/book/tex/themes.tex +++ /dev/null @@ -1,1136 +0,0 @@ -\chapter{Themes}\label{cha:polishing} - -\section{Introduction}\label{introduction} - -In this chapter you will learn how to use the ggplot2 theme system, -which allows you to exercise fine control over the non-data elements of -your plot. The theme system does not affect how the data is rendered by -geoms, or how it is transformed by scales. Themes don't change the -perceptual properties of the plot, but they do help you make the plot -aesthetically pleasing or match an existing style guide. Themes give you -control over things like fonts, ticks, panel strips, and backgrounds. -\index{Themes} - -This separation of control into data and non-data parts is quite -different from base and lattice graphics. In base and lattice graphics, -most functions take a large number of arguments that specify both data -and non-data appearance, which makes the functions complicated and -harder to learn. ggplot2 takes a different approach: when creating the -plot you determine how the data is displayed, then \emph{after} it has -been created you can edit every detail of the rendering, using the -theming system. - -The theming system is composed of four main components: - -\begin{itemize} -\item - Theme \textbf{elements} specify the non-data elements that you can - control. 
For example, the \texttt{plot.title} element controls the - appearance of the plot title; \texttt{axis.ticks.x}, the ticks on the - x axis; \texttt{legend.key.height}, the height of the keys in the - legend. -\item - Each element is associated with an \textbf{element function}, which - describes the visual properties of the element. For example, - \texttt{element\_text()} sets the font size, colour and face of text - elements like \texttt{plot.title}. -\item - The \texttt{theme()} function which allows you to override the default - theme elements by calling element functions, like - \texttt{theme(plot.title\ =\ element\_text(colour\ =\ "red"))}. -\item - Complete \textbf{themes}, like \texttt{theme\_grey()} set all of the - theme elements to values designed to work together harmoniously. -\end{itemize} - -For example, imagine you've made the following plot of your data. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(cty, hwy, }\DataTypeTok{color =} \KeywordTok{factor}\NormalTok{(cyl))) +} -\StringTok{ }\KeywordTok{geom_jitter}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_abline}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{2}\NormalTok{)} -\NormalTok{base} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/themes/motivation-1-1} -\end{figure} - -It's served its purpose for you: you've learned that \texttt{cty} and -\texttt{hwy} are highly correlated, both are tightly coupled with -\texttt{cyl}, and that \texttt{hwy} is always greater than \texttt{cty} -(and the difference increases as \texttt{cty} increases). Now you want -to share the plot with others, perhaps by publishing it in a paper. That -requires some changes. 
First, you need to make sure the plot can stand -alone by: - -\begin{itemize} -\tightlist -\item - Improving the axes and legend labels. -\item - Adding a title for the plot. -\item - Tweaking the colour scale. -\end{itemize} - -Fortunately you know how to do that already because you've read -\protect\hyperlink{cha:scales}{the scales chapter}: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{labelled <-}\StringTok{ }\NormalTok{base +} -\StringTok{ }\KeywordTok{labs}\NormalTok{(} - \DataTypeTok{x =} \StringTok{"City mileage/gallon"}\NormalTok{,} - \DataTypeTok{y =} \StringTok{"Highway mileage/gallon"}\NormalTok{,} - \DataTypeTok{colour =} \StringTok{"Cylinders"}\NormalTok{,} - \DataTypeTok{title =} \StringTok{"Highway and city mileage are highly correlated"} - \NormalTok{) +} -\StringTok{ }\KeywordTok{scale_colour_brewer}\NormalTok{(}\DataTypeTok{type =} \StringTok{"seq"}\NormalTok{, }\DataTypeTok{palette =} \StringTok{"Spectral"}\NormalTok{)} -\NormalTok{labelled} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/themes/motivation-2-1} -\end{figure} - -Next, you need to make sure the plot matches the style guidelines of -your journal: - -\begin{itemize} -\tightlist -\item - The background should be white, not pale grey. -\item - The legend should be placed inside the plot if there's room. -\item - Major gridlines should be a pale grey and minor gridlines should be - removed. -\item - The plot title should be 12pt bold text. 
-\end{itemize} - -In this chapter, you'll learn how to use the theming system to make -those changes, as shown below: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{styled <-}\StringTok{ }\NormalTok{labelled +} -\StringTok{ }\KeywordTok{theme_bw}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(} - \DataTypeTok{plot.title =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{face =} \StringTok{"bold"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{12}\NormalTok{),} - \DataTypeTok{legend.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"white"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{4}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"white"}\NormalTok{),} - \DataTypeTok{legend.justification =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{1}\NormalTok{),} - \DataTypeTok{legend.position =} \KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{1}\NormalTok{),} - \DataTypeTok{axis.ticks =} \KeywordTok{element_line}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey70"}\NormalTok{, }\DataTypeTok{size =} \FloatTok{0.2}\NormalTok{),} - \DataTypeTok{panel.grid.major =} \KeywordTok{element_line}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey70"}\NormalTok{, }\DataTypeTok{size =} \FloatTok{0.2}\NormalTok{),} - \DataTypeTok{panel.grid.minor =} \KeywordTok{element_blank}\NormalTok{()} - \NormalTok{)} -\NormalTok{styled} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.75\linewidth]{_figures/themes/motivation-3-1} -\end{figure} - -Finally, the journal wants the figure as a 600 dpi TIFF file. You'll -learn the fine details of \texttt{ggsave()} in -\protect\hyperlink{sec:saving}{saving your output}. - -\section{Complete themes}\label{sec:themes} - -ggplot2 comes with a number of built in themes. The most important is -\texttt{theme\_grey()}, the signature ggplot2 theme with a light grey -background and white gridlines. 
The theme is designed to put the data -forward while supporting comparisons, following the advice of (Tufte -2006; Brewer 1994; Carr 2002; Carr 1994; Carr and Sun 1999). We can -still see the gridlines to aid in the judgement of position (Cleveland -1993), but they have little visual impact and we can easily `tune' them -out. The grey background gives the plot a similar typographic colour to -the text, ensuring that the graphics fit in with the flow of a document -without jumping out with a bright white background. Finally, the grey -background creates a continuous field of colour which ensures that the -plot is perceived as a single visual entity. \index{Themes!built-in} -\indexf{theme\_grey} - -There are seven other themes built in to ggplot2 1.1.0: - -\begin{itemize} -\item - \texttt{theme\_bw()}: a variation on \texttt{theme\_grey()} that uses - a white background and thin grey grid lines. \indexf{theme\_bw} -\item - \texttt{theme\_linedraw()}: A theme with only black lines of various - widths on white backgrounds, reminiscent of a line drawing. - \indexf{theme\_linedraw} -\item - \texttt{theme\_light()}: similar to \texttt{theme\_linedraw()} but - with light grey lines and axes, to direct more attention towards the - data. \indexf{theme\_light} -\item - \texttt{theme\_dark()}: the dark cousin of \texttt{theme\_light()}, - with similar line sizes but a dark background. Useful to make thin - coloured lines pop out. \indexf{theme\_dark} -\item - \texttt{theme\_minimal()}: A minimalistic theme with no background - annotations. \indexf{theme\_minimal} -\item - \texttt{theme\_classic()}: A classic-looking theme, with x and y axis - lines and no gridlines. \indexf{theme\_classic} -\item - \texttt{theme\_void()}: A completely empty theme. 
\indexf{theme\_void} -\end{itemize} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{)} -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_grey}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_grey()"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_bw}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_bw()"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_linedraw}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_linedraw()"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/built-in-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/built-in-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/built-in-3} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_light}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_light()"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_dark}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_dark()"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_minimal}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_minimal()"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/unnamed-chunk-1-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/unnamed-chunk-1-2}% - 
\includegraphics[width=0.333\linewidth]{_figures/themes/unnamed-chunk-1-3} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_classic}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_classic()"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_void}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_void()"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/unnamed-chunk-2-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/unnamed-chunk-2-2} -\end{figure} - -All themes have a \texttt{base\_size} parameter which controls the base -font size. The base font size is the size that the axis titles use: the -plot title is usually bigger (1.2x), and the tick and strip labels are -smaller (0.8x). If you want to control these sizes separately, you'll -need to modify the individual elements as described below. - -As well as applying themes a plot at a time, you can change the default -theme with \texttt{theme\_set()}. For example, if you really hate the -default grey background, run \texttt{theme\_set(theme\_bw())} to use a -white background for all plots. \indexf{theme\_set} - -You're not limited to the themes built-in to ggplot2. Other packages, -like ggthemes by Jeffrey Arnold, add even more. 
Here's a few of my -favourites from ggthemes: \index{ggtheme} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{library}\NormalTok{(ggthemes)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_tufte}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_tufte()"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_solarized}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_solarized()"}\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme_excel}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"theme_excel()"}\NormalTok{) }\CommentTok{# ;)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/ggtheme-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/ggtheme-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/ggtheme-3} -\end{figure} - -The complete themes are a great place to start but don't give you a lot -of control. To modify individual elements, you need to use -\texttt{theme()} to override the default setting for an element with an -element function. - -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Try out all the themes in ggthemes. Which do you like the best? -\item - What aspects of the default theme do you like? What don't you like?\\ - What would you change? -\item - Look at the plots in your favourite scientific journal. What theme do - they most resemble? What are the main differences? -\end{enumerate} - -\section{Modifying theme components}\label{modifying-theme-components} - -To modify an individual theme component you use code like -\texttt{plot\ +\ theme(element.name\ =\ element\_function())}. In this -section you'll learn about the basic element functions, and then in the -next section, you'll see all the elements that you can modify. 
-\indexf{theme} - -There are four basic types of built-in element functions: text, lines, -rectangles, and blank. Each element function has a set of parameters -that control the appearance: - -\begin{itemize} -\item - \texttt{element\_text()} draws labels and headings. You can control - the font \texttt{family}, \texttt{face}, \texttt{colour}, - \texttt{size} (in points), \texttt{hjust}, \texttt{vjust}, - \texttt{angle} (in degrees) and \texttt{lineheight} (as a ratio of - \texttt{fontsize}). More details on the parameters can be found in - \texttt{vignette("ggplot2-specs")}. Setting the font face is - particularly challenging. \index{Themes!labels} \indexf{element\_text} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base_t <-}\StringTok{ }\NormalTok{base +}\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{title =} \StringTok{"This is a ggplot"}\NormalTok{) +}\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} -\NormalTok{base_t +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.title =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{size =} \DecValTok{16}\NormalTok{))} -\NormalTok{base_t +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.title =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{face =} \StringTok{"bold"}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{))} -\NormalTok{base_t +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.title =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{hjust =} \DecValTok{1}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/element_text-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_text-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_text-3} - \end{figure} - - You can control the margins around the text with the \texttt{margin} - 
argument and \texttt{margin()} function. \texttt{margin()} has four - arguments: the amount of space (in points) to add to the top, right, - bottom and left sides of the text. Any elements not specified default - to 0. - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# The margins here look asymmetric because there are also plot margins} -\NormalTok{base_t +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.title =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{margin =} \KeywordTok{margin}\NormalTok{()))} -\NormalTok{base_t +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.title =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{margin =} \KeywordTok{margin}\NormalTok{(}\DataTypeTok{t =} \DecValTok{10}\NormalTok{, }\DataTypeTok{b =} \DecValTok{10}\NormalTok{)))} -\NormalTok{base_t +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.title.y =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{margin =} \KeywordTok{margin}\NormalTok{(}\DataTypeTok{r =} \DecValTok{10}\NormalTok{)))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/element_text-margin-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_text-margin-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_text-margin-3} - \end{figure} -\item - \texttt{element\_line()} draws lines parameterised by \texttt{colour}, - \texttt{size} and \texttt{linetype}: \indexf{element\_line} - \index{Themes!lines} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{panel.grid.major =} \KeywordTok{element_line}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"black"}\NormalTok{))} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{panel.grid.major =} \KeywordTok{element_line}\NormalTok{(}\DataTypeTok{size =} \DecValTok{2}\NormalTok{))} -\NormalTok{base +}\StringTok{ 
}\KeywordTok{theme}\NormalTok{(}\DataTypeTok{panel.grid.major =} \KeywordTok{element_line}\NormalTok{(}\DataTypeTok{linetype =} \StringTok{"dotted"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/element_line-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_line-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_line-3} - \end{figure} -\item - \texttt{element\_rect()} draws rectangles, mostly used for - backgrounds, parameterised by \texttt{fill} colour and border - \texttt{colour}, \texttt{size} and \texttt{linetype}.\\ - \index{Background} \index{Themes!background} \indexf{theme\_rect} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"grey80"}\NormalTok{, }\DataTypeTok{colour =} \OtherTok{NA}\NormalTok{))} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{2}\NormalTok{))} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{panel.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"linen"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/element_rect-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_rect-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_rect-3} - \end{figure} -\item - \texttt{element\_blank()} draws nothing. Use this if you don't want - anything drawn, and no space allocated for that element. The following - example uses \texttt{element\_blank()} to progressively suppress the - appearance of elements we're not interested in. 
Notice how the plot - automatically reclaims the space previously used by these elements: if - you don't want this to happen (perhaps because they need to line up - with other plots on the page), use - \texttt{colour\ =\ NA,\ fill\ =\ NA} to create invisible elements that - still take up space. \indexf{element\_blank} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base} -\KeywordTok{last_plot}\NormalTok{() +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{panel.grid.minor =} \KeywordTok{element_blank}\NormalTok{())} -\KeywordTok{last_plot}\NormalTok{() +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{panel.grid.major =} \KeywordTok{element_blank}\NormalTok{())} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/element_blank-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_blank-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_blank-3} - \end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{last_plot}\NormalTok{() +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{panel.background =} \KeywordTok{element_blank}\NormalTok{())} -\KeywordTok{last_plot}\NormalTok{() +}\StringTok{ }\KeywordTok{theme}\NormalTok{(} - \DataTypeTok{axis.title.x =} \KeywordTok{element_blank}\NormalTok{(), } - \DataTypeTok{axis.title.y =} \KeywordTok{element_blank}\NormalTok{()} -\NormalTok{)} -\KeywordTok{last_plot}\NormalTok{() +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.line =} \KeywordTok{element_line}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/element_blank-2-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_blank-2-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/element_blank-2-3} - \end{figure} -\item - A few other settings take grid units. 
Create them with - \texttt{unit(1,\ "cm")} or \texttt{unit(0.25,\ "in")}. -\end{itemize} - -To modify theme elements for all future plots, use -\texttt{theme\_update()}. It returns the previous theme settings, so you -can easily restore the original parameters once you're done. -\index{Themes!updating} \indexf{theme\_set} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{old_theme <-}\StringTok{ }\KeywordTok{theme_update}\NormalTok{(} - \DataTypeTok{plot.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"lightblue3"}\NormalTok{, }\DataTypeTok{colour =} \OtherTok{NA}\NormalTok{),} - \DataTypeTok{panel.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"lightblue"}\NormalTok{, }\DataTypeTok{colour =} \OtherTok{NA}\NormalTok{),} - \DataTypeTok{axis.text =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"linen"}\NormalTok{),} - \DataTypeTok{axis.title =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"linen"}\NormalTok{)} -\NormalTok{)} -\NormalTok{base} -\KeywordTok{theme_set}\NormalTok{(old_theme)} -\NormalTok{base} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.333\linewidth]{_figures/themes/theme-update-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/theme-update-2} -\end{figure} - -\section{Theme elements}\label{sec:theme-elements} - -There are around 40 unique elements that control the appearance of the -plot. They can be roughly grouped into five categories: plot, axis, -legend, panel and facet. The following sections describe each in turn. 
-\index{Themes!elements} - -\subsection{Plot elements}\label{plot-elements} - -\index{Themes!plot} - -Some elements affect the plot as a whole: - -\begin{longtable}[c]{@{}lll@{}} -\toprule -Element & Setter & Description\tabularnewline -\midrule -\endhead -plot.background & \texttt{element\_rect()} & plot -background\tabularnewline -plot.title & \texttt{element\_text()} & plot title\tabularnewline -plot.margin & \texttt{margin()} & margins around plot\tabularnewline -\bottomrule -\end{longtable} - -\texttt{plot.background} draws a rectangle that underlies everything -else on the plot. By default, ggplot2 uses a white background which -ensures that the plot is usable wherever it might end up (e.g.~even if -you save as a png and put on a slide with a black background). When -exporting plots to use in other systems, you might want to make the -background transparent with \texttt{fill\ =\ NA}. Similarly, if you're -embedding a plot in a system that already has margins you might want to -eliminate the built-in margins. Note that a small margin is still -necessary if you want to draw a border around the plot. 
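As a minimal sketch of the export case described above (an illustration, not one of the book's figures), the following makes the plot background transparent and removes the built-in margins. The toy data and the names `df`, `base` and `embedded` are placeholders chosen for this example.

```r
library(ggplot2)

# Toy data, assumed here only for illustration
df <- data.frame(x = 1:3, y = 1:3)
base <- ggplot(df, aes(x, y)) + geom_point()

# fill = NA makes the plot background transparent; colour = NA drops its border.
# margin() takes top, right, bottom, left (in points by default).
embedded <- base + theme(
  plot.background = element_rect(fill = NA, colour = NA),
  plot.margin = margin(0, 0, 0, 0)
)
embedded
```

Whether the transparency survives export also depends on the graphics device and file format you save to, so check the saved file against a dark background before relying on it.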
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{2}\NormalTok{))} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(} - \DataTypeTok{plot.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{2}\NormalTok{),} - \DataTypeTok{plot.margin =} \KeywordTok{margin}\NormalTok{(}\DecValTok{2}\NormalTok{, }\DecValTok{2}\NormalTok{, }\DecValTok{2}\NormalTok{, }\DecValTok{2}\NormalTok{)} -\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"lightblue"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/plot-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/plot-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/plot-3} -\end{figure} - -\subsection{Axis elements}\label{sub:theme-axis} - -\index{Themes!axis} \index{Axis!styling} - -The axis elements control the appearance of the axes: - -\begin{longtable}[c]{@{}lll@{}} -\toprule -\begin{minipage}[b]{0.27\columnwidth}\raggedright\strut -Element -\strut\end{minipage} & -\begin{minipage}[b]{0.25\columnwidth}\raggedright\strut -Setter -\strut\end{minipage} & -\begin{minipage}[b]{0.35\columnwidth}\raggedright\strut -Description -\strut\end{minipage}\tabularnewline -\midrule -\endhead -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -axis.line -\strut\end{minipage} & -\begin{minipage}[t]{0.25\columnwidth}\raggedright\strut -\texttt{element\_line()} -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -line parallel to axis (hidden in default themes) 
-\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -axis.text -\strut\end{minipage} & -\begin{minipage}[t]{0.25\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -tick labels -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -axis.text.x -\strut\end{minipage} & -\begin{minipage}[t]{0.25\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -x-axis tick labels -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -axis.text.y -\strut\end{minipage} & -\begin{minipage}[t]{0.25\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -y-axis tick labels -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -axis.title -\strut\end{minipage} & -\begin{minipage}[t]{0.25\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -axis titles -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -axis.title.x -\strut\end{minipage} & -\begin{minipage}[t]{0.25\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -x-axis title -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -axis.title.y -\strut\end{minipage} & -\begin{minipage}[t]{0.25\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -y-axis title -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -axis.ticks -\strut\end{minipage} & 
-\begin{minipage}[t]{0.25\columnwidth}\raggedright\strut -\texttt{element\_line()} -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -axis tick marks -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -axis.ticks.length -\strut\end{minipage} & -\begin{minipage}[t]{0.25\columnwidth}\raggedright\strut -\texttt{unit()} -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -length of tick marks -\strut\end{minipage}\tabularnewline -\bottomrule -\end{longtable} - -Note that \texttt{axis.text} (and \texttt{axis.title}) comes in three -forms: \texttt{axis.text}, \texttt{axis.text.x}, and -\texttt{axis.text.y}. Use the first form if you want to modify the -properties of both axes at once: any properties that you don't -explicitly set in \texttt{axis.text.x} and \texttt{axis.text.y} will be -inherited from \texttt{axis.text}. - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{)} -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} - -\CommentTok{# Accentuate the axes} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.line =} \KeywordTok{element_line}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{1}\NormalTok{))} -\CommentTok{# Style both x and y axis labels} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.text =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{color =} \StringTok{"blue"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{12}\NormalTok{))} -\CommentTok{# Useful for long labels} -\NormalTok{base +}\StringTok{ 
}\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.text.x =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{angle =} \NormalTok{-}\DecValTok{90}\NormalTok{, }\DataTypeTok{vjust =} \FloatTok{0.5}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/axis-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/axis-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/axis-3} -\end{figure} - -The most common adjustment is to rotate the x-axis labels to avoid long -overlapping labels. If you do this, note negative angles tend to look -best and you should set \texttt{hjust\ =\ 0} and \texttt{vjust\ =\ 1}: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(} - \DataTypeTok{x =} \KeywordTok{c}\NormalTok{(}\StringTok{"label"}\NormalTok{, }\StringTok{"a long label"}\NormalTok{, }\StringTok{"an even longer label"}\NormalTok{), } - \DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{3} -\NormalTok{)} -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\NormalTok{base} -\NormalTok{base +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.text.x =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{angle =} \NormalTok{-}\DecValTok{30}\NormalTok{, }\DataTypeTok{vjust =} \DecValTok{1}\NormalTok{, }\DataTypeTok{hjust =} \DecValTok{0}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/themes/axis-labels-1}% - \includegraphics[width=0.5\linewidth]{_figures/themes/axis-labels-2} -\end{figure} - -\subsection{Legend elements}\label{legend-elements} - 
-\index{Themes!legend} \index{Legend!styling} - -The legend elements control the appearance of all legends. You can also -modify the appearance of individual legends by modifying the same -elements in \texttt{guide\_legend()} or \texttt{guide\_colourbar()}. - -\begin{longtable}[c]{@{}lll@{}} -\toprule -\begin{minipage}[b]{0.27\columnwidth}\raggedright\strut -Element -\strut\end{minipage} & -\begin{minipage}[b]{0.35\columnwidth}\raggedright\strut -Setter -\strut\end{minipage} & -\begin{minipage}[b]{0.58\columnwidth}\raggedright\strut -Description -\strut\end{minipage}\tabularnewline -\midrule -\endhead -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.background -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -\texttt{element\_rect()} -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -legend background -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.key -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -\texttt{element\_rect()} -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -background of legend keys -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.key.size -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -\texttt{unit()} -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -legend key size -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.key.height -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -\texttt{unit()} -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -legend key height -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.key.width -\strut\end{minipage} & 
-\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -\texttt{unit()} -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -legend key width -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.margin -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -\texttt{unit()} -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -legend margin -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.text -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -legend labels -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.text.align -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -0--1 -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -legend label alignment (0 = left, 1 = right) -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.title -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -legend name -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -legend.title.align -\strut\end{minipage} & -\begin{minipage}[t]{0.35\columnwidth}\raggedright\strut -0--1 -\strut\end{minipage} & -\begin{minipage}[t]{0.58\columnwidth}\raggedright\strut -legend name alignment (0 = left, 1 = right) -\strut\end{minipage}\tabularnewline -\bottomrule -\end{longtable} - -These options are illustrated below: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} 
\DecValTok{1}\NormalTok{:}\DecValTok{4}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{4}\NormalTok{, }\DataTypeTok{z =} \KeywordTok{rep}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{), }\DataTypeTok{each =} \DecValTok{2}\NormalTok{))} -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y, }\DataTypeTok{colour =} \NormalTok{z)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} - -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(} - \DataTypeTok{legend.background =} \KeywordTok{element_rect}\NormalTok{(} - \DataTypeTok{fill =} \StringTok{"lemonchiffon"}\NormalTok{, } - \DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{, } - \DataTypeTok{size =} \DecValTok{1} - \NormalTok{)} -\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(} - \DataTypeTok{legend.key =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{color =} \StringTok{"grey50"}\NormalTok{),} - \DataTypeTok{legend.key.width =} \KeywordTok{unit}\NormalTok{(}\FloatTok{0.9}\NormalTok{, }\StringTok{"cm"}\NormalTok{),} - \DataTypeTok{legend.key.height =} \KeywordTok{unit}\NormalTok{(}\FloatTok{0.75}\NormalTok{, }\StringTok{"cm"}\NormalTok{)} -\NormalTok{)} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(} - \DataTypeTok{legend.text =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{size =} \DecValTok{15}\NormalTok{),} - \DataTypeTok{legend.title =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{size =} \DecValTok{15}\NormalTok{, }\DataTypeTok{face =} \StringTok{"bold"}\NormalTok{)} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/legend-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/legend-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/legend-3} -\end{figure} - -There are four other properties that control how legends are laid 
out in -the context of the plot (\texttt{legend.position}, -\texttt{legend.direction}, \texttt{legend.justification}, -\texttt{legend.box}). They are described in -\protect\hyperlink{sub:legend-layout}{legend layout}. - -\subsection{Panel elements}\label{panel-elements} - -\index{Themes!panel} \index{Aspect ratio} - -Panel elements control the appearance of the plotting panels: - -\begin{longtable}[c]{@{}lll@{}} -\toprule -Element & Setter & Description\tabularnewline -\midrule -\endhead -panel.background & \texttt{element\_rect()} & panel background (under -data)\tabularnewline -panel.border & \texttt{element\_rect()} & panel border (over -data)\tabularnewline -panel.grid.major & \texttt{element\_line()} & major grid -lines\tabularnewline -panel.grid.major.x & \texttt{element\_line()} & vertical major grid -lines\tabularnewline -panel.grid.major.y & \texttt{element\_line()} & horizontal major grid -lines\tabularnewline -panel.grid.minor & \texttt{element\_line()} & minor grid -lines\tabularnewline -panel.grid.minor.x & \texttt{element\_line()} & vertical minor grid -lines\tabularnewline -panel.grid.minor.y & \texttt{element\_line()} & horizontal minor grid -lines\tabularnewline -aspect.ratio & numeric & plot aspect ratio\tabularnewline -\bottomrule -\end{longtable} - -The main difference between \texttt{panel.background} and -\texttt{panel.border} is that the background is drawn underneath the -data, and the border is drawn on top of it. For that reason, you'll -always need to assign \texttt{fill\ =\ NA} when overriding -\texttt{panel.border}. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\CommentTok{# Modify background} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{panel.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"lightblue"}\NormalTok{))} - -\CommentTok{# Tweak major grid lines} -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(} - \DataTypeTok{panel.grid.major =} \KeywordTok{element_line}\NormalTok{(}\DataTypeTok{color =} \StringTok{"gray60"}\NormalTok{, }\DataTypeTok{size =} \FloatTok{0.8}\NormalTok{)} -\NormalTok{)} -\CommentTok{# Just in one direction } -\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(} - \DataTypeTok{panel.grid.major.x =} \KeywordTok{element_line}\NormalTok{(}\DataTypeTok{color =} \StringTok{"gray60"}\NormalTok{, }\DataTypeTok{size =} \FloatTok{0.8}\NormalTok{)} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/panel-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/panel-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/panel-3} -\end{figure} - -Note that aspect ratio controls the aspect ratio of the \emph{panel}, -not the overall plot: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base2 <-}\StringTok{ }\NormalTok{base +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{))} -\CommentTok{# Wide screen} -\NormalTok{base2 +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{aspect.ratio =} \DecValTok{9} \NormalTok{/}\StringTok{ }\DecValTok{16}\NormalTok{)} -\CommentTok{# Long and skinny} -\NormalTok{base2 +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{aspect.ratio =} \DecValTok{2} 
\NormalTok{/}\StringTok{ }\DecValTok{1}\NormalTok{)} -\CommentTok{# Square} -\NormalTok{base2 +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{aspect.ratio =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/aspect-ratio-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/aspect-ratio-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/aspect-ratio-3} -\end{figure} - -\subsection{Facetting elements}\label{facetting-elements} - -\index{Themes!facets} \index{Facetting!styling} - -The following theme elements are associated with faceted ggplots: - -\begin{longtable}[c]{@{}lll@{}} -\toprule -\begin{minipage}[b]{0.27\columnwidth}\raggedright\strut -Element -\strut\end{minipage} & -\begin{minipage}[b]{0.24\columnwidth}\raggedright\strut -Setter -\strut\end{minipage} & -\begin{minipage}[b]{0.47\columnwidth}\raggedright\strut -Description -\strut\end{minipage}\tabularnewline -\midrule -\endhead -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -strip.background -\strut\end{minipage} & -\begin{minipage}[t]{0.24\columnwidth}\raggedright\strut -\texttt{element\_rect()} -\strut\end{minipage} & -\begin{minipage}[t]{0.47\columnwidth}\raggedright\strut -background of panel strips -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -strip.text -\strut\end{minipage} & -\begin{minipage}[t]{0.24\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.47\columnwidth}\raggedright\strut -strip text -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -strip.text.x -\strut\end{minipage} & -\begin{minipage}[t]{0.24\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.47\columnwidth}\raggedright\strut -horizontal strip text -\strut\end{minipage}\tabularnewline 
-\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -strip.text.y -\strut\end{minipage} & -\begin{minipage}[t]{0.24\columnwidth}\raggedright\strut -\texttt{element\_text()} -\strut\end{minipage} & -\begin{minipage}[t]{0.47\columnwidth}\raggedright\strut -vertical strip text -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -panel.margin -\strut\end{minipage} & -\begin{minipage}[t]{0.24\columnwidth}\raggedright\strut -\texttt{unit()} -\strut\end{minipage} & -\begin{minipage}[t]{0.47\columnwidth}\raggedright\strut -margin between facets -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -panel.margin.x -\strut\end{minipage} & -\begin{minipage}[t]{0.24\columnwidth}\raggedright\strut -\texttt{unit()} -\strut\end{minipage} & -\begin{minipage}[t]{0.47\columnwidth}\raggedright\strut -margin between facets (vertical) -\strut\end{minipage}\tabularnewline -\begin{minipage}[t]{0.27\columnwidth}\raggedright\strut -panel.margin.y -\strut\end{minipage} & -\begin{minipage}[t]{0.24\columnwidth}\raggedright\strut -\texttt{unit()} -\strut\end{minipage} & -\begin{minipage}[t]{0.47\columnwidth}\raggedright\strut -margin between facets (horizontal) -\strut\end{minipage}\tabularnewline -\bottomrule -\end{longtable} - -Element \texttt{strip.text.x} affects both \texttt{facet\_wrap()} and -\texttt{facet\_grid()}; \texttt{strip.text.y} only affects -\texttt{facet\_grid()}. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{4}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{4}\NormalTok{, }\DataTypeTok{z =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"b"}\NormalTok{))} -\NormalTok{base_f <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~z)} - -\NormalTok{base_f} -\NormalTok{base_f +}\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{panel.margin =} \KeywordTok{unit}\NormalTok{(}\FloatTok{0.5}\NormalTok{, }\StringTok{"in"}\NormalTok{))} -\NormalTok{base_f +}\StringTok{ }\KeywordTok{theme}\NormalTok{(} - \DataTypeTok{strip.background =} \KeywordTok{element_rect}\NormalTok{(}\DataTypeTok{fill =} \StringTok{"grey20"}\NormalTok{, }\DataTypeTok{color =} \StringTok{"grey80"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{1}\NormalTok{),} - \DataTypeTok{strip.text =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"white"}\NormalTok{)} -\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/themes/facetting-1}% - \includegraphics[width=0.333\linewidth]{_figures/themes/facetting-2}% - \includegraphics[width=0.333\linewidth]{_figures/themes/facetting-3} -\end{figure} - -\subsection{Exercises}\label{exercises-1} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Create the ugliest plot possible! (Contributed by Andrew D. Steen, - University of Tennessee - Knoxville) -\item - \texttt{theme\_dark()} makes the inside of the plot dark, but not the - outside. Change the plot background to black, and then update the text - settings so you can still read the labels. 
-\item - Make an elegant theme that uses ``linen'' as the background colour and - a serif font for the text. -\item - Systematically explore the effects of \texttt{hjust} when you have a - multiline title. Why doesn't \texttt{vjust} do anything? -\end{enumerate} - -\hypertarget{sec:saving}{\section{Saving your output}\label{sec:saving}} - -When saving a plot to use in another program, you have two basic choices -of output, raster or vector: \index{Exporting} \index{Saving output} - -\begin{itemize} -\item - Vector graphics describe a plot as a sequence of operations: draw a line - from \((x_1, y_1)\) to \((x_2, y_2)\), draw a circle at \((x_3, y_3)\) - with radius \(r\). This means that they are effectively `infinitely' - zoomable; there is no loss of detail. The most useful vector graphic - formats are pdf and svg. -\item - Raster graphics are stored as an array of pixel colours and have a - fixed optimal viewing size. The most useful raster graphic format is - png. -\end{itemize} - -Figure \ref{fig:vector-raster} illustrates the basic differences in -these formats for a circle. A good description is available at -\url{http://tinyurl.com/rstrvctr}. - -\begin{figure}[htbp] - \centering - \includegraphics[width= 0.5\linewidth]{diagrams/vector-raster} - \caption{The schematic difference between raster (left) and vector (right) graphics. } - \label{fig:vector-raster} -\end{figure} - -Unless there is a compelling reason not to, use vector graphics: they -look better in more places. There are two main reasons to use raster -graphics: - -\begin{itemize} -\item - You have a plot (e.g.~a scatterplot) with thousands of graphical - objects (i.e.~points). A vector version will be large and slow to - render. -\item - You want to embed the graphic in MS Office. MS has poor support for - vector graphics (except for their own DrawingXML format which is not - currently easy to make from R), so raster graphics are easier. 
-\end{itemize} - -There are two ways to save output from ggplot2. You can use the standard -R approach where you open a graphics device, generate the plot, then -close the device: \indexf{pdf} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{pdf}\NormalTok{(}\StringTok{"output.pdf"}\NormalTok{, }\DataTypeTok{width =} \DecValTok{6}\NormalTok{, }\DataTypeTok{height =} \DecValTok{6}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, cty)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\KeywordTok{dev.off}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -This works for all packages, but is verbose. ggplot2 provides a -convenient shorthand with \texttt{ggsave()}: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, cty)) +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\KeywordTok{ggsave}\NormalTok{(}\StringTok{"output.pdf"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\texttt{ggsave()} is optimised for interactive use: you can use it after -you've drawn a plot. It has the following important arguments: -\indexf{ggsave} - -\begin{itemize} -\item - The first argument, \texttt{filename}, specifies the path where the image - should be saved. The file extension will be used to automatically - select the correct graphics device. \texttt{ggsave()} can produce - \texttt{.eps}, \texttt{.pdf}, \texttt{.svg}, \texttt{.wmf}, - \texttt{.png}, \texttt{.jpg}, \texttt{.bmp}, and \texttt{.tiff}. -\item - \texttt{width} and \texttt{height} control the output size, specified - in inches. If left blank, they'll use the size of the on-screen - graphics device. -\item - For raster graphics (i.e. \texttt{.png}, \texttt{.jpg}), the - \texttt{dpi} argument controls the resolution of the plot. It defaults - to 300, which is appropriate for most printers, but you may want to - use 600 for particularly high-resolution output, or 96 for on-screen - (e.g., web) display. 
-\end{itemize} - -See \texttt{?ggsave} for more details. - -\section*{References}\label{references} -\addcontentsline{toc}{section}{References} - -\hypertarget{refs}{} -\hypertarget{ref-brewer:1994}{} -Brewer, Cynthia A. 1994. ``Color Use Guidelines for Mapping and -Visualization.'' In \emph{Visualization in Modern Cartography}, edited -by A.M. MacEachren and D.R.F. Taylor, 123--47. Elsevier Science. - -\hypertarget{ref-carr:1994}{} -Carr, Dan. 1994. ``Using Gray in Plots.'' \emph{ASA Statistical -Computing and Graphics Newsletter} 2 (5): 11--14. -\url{http://www.galaxy.gmu.edu/~dcarr/lib/v5n2.pdf}. - -\hypertarget{ref-carr:2002}{} ----------. 2002. ``Graphical Displays.'' In \emph{Encyclopedia of -Environmetrics}, edited by Abdel H. El-Shaarawi and Walter W. Piegorsch, -2:933--60. John Wiley \& Sons. -\url{http://www.galaxy.gmu.edu/~dcarr/lib/EnvironmentalGraphics.pdf}. - -\hypertarget{ref-carr:1999}{} -Carr, Dan, and Ru Sun. 1999. ``Using Layering and Perceptual Grouping in -Statistical Graphics.'' \emph{ASA Statistical Computing and Graphics -Newsletter} 10 (1): 25--31. - -\hypertarget{ref-cleveland:1993a}{} -Cleveland, William. 1993. ``A Model for Studying Display Methods of -Statistical Graphics.'' \emph{Journal of Computational and Graphical -Statistics} 2: 323--64. \url{http://stat.bell-labs.com/doc/93.4.ps}. - -\hypertarget{ref-tufte:2006}{} -Tufte, Edward R. 2006. \emph{Beautiful Evidence}. Graphics Press. diff --git a/book/tex/tidy-data.tex b/book/tex/tidy-data.tex deleted file mode 100644 index b991ceee..00000000 --- a/book/tex/tidy-data.tex +++ /dev/null @@ -1,698 +0,0 @@ -\chapter{Data analysis}\label{cha:data} - -\section{Introduction}\label{introduction} - -So far, every example in this book has started with a nice dataset -that's easy to plot. That's great for learning (because you don't want -to struggle with data handling while you're learning visualisation), but -in real life, datasets hardly ever come in exactly the right structure. 
-To use ggplot2 in practice, you'll need to learn some data wrangling -skills. Indeed, in my experience, visualisation is often the easiest -part of the data analysis process: once you have the right data, in the -right format, aggregated in the right way, the right visualisation is -often obvious. - -The goal of this part of the book is to show you how to integrate -ggplot2 with other tools needed for a complete data analysis: - -\begin{itemize} -\item - In this chapter, you'll learn the principles of tidy data (Wickham - 2014), which help you organise your data in a way that makes it easy - to visualise with ggplot2, manipulate with dplyr and model with the - many modelling packages. The principles of tidy data are supported by - the \textbf{tidyr} package, which helps you tidy messy datasets. -\item - Most visualisations require some data transformation whether it's - creating a new variable from existing variables, or performing simple - aggregations so you can see the forest for the trees. - \protect\hyperlink{cha:dplyr}{dplyr} will show you how to do this with - the \textbf{dplyr} package. -\item - If you're using R, you're almost certainly using it for its fantastic - modelling capabilities. While there's an R package for almost every - type of model that you can think of, the results of these models can - be hard to visualise. In \protect\hyperlink{cha:modelling}{modelling}, - you'll learn about the \textbf{broom} package, by David Robinson, to - convert models into tidy datasets so you can easily visualise them - with ggplot2. -\end{itemize} - -Tidy data is the foundation for data manipulation and visualising -models. In the following sections, you'll learn the definition of tidy -data, and the tools you need to make messy data tidy. The chapter -concludes with two case studies that show how to apply the tools in -sequence to work with real(istic) data. 
- -\section{Tidy data}\label{sec:tidy-data} - -The principle behind tidy data is simple: storing your data in a -consistent way makes it easier to work with it. Tidy data is a mapping -between the statistical structure of a data frame (variables and -observations) and the physical structure (columns and rows). Tidy data -follows two main principles: \index{Tidy data} -\index{Data!best form for ggplot2} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\tightlist -\item - Variables go in columns. -\item - Observations go in rows. -\end{enumerate} - -Tidy data is particularly important for ggplot2 because the job of -ggplot2 is to map variables to visual properties: if your data isn't -tidy, you'll have a hard time visualising it. - -Sometimes you'll find a dataset that you have no idea how to plot. -That's normally because it's not tidy. For example, take this data frame -that contains monthly employment data for the United States: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{ec2} -\CommentTok{#> Source: local data frame [12 x 11]} -\CommentTok{#> } -\CommentTok{#> month 2006 2007 2008 2009 2010 2011 2012 2013 2014} -\CommentTok{#> (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)} -\CommentTok{#> 1 1 8.6 8.3 9.0 10.7 20.0 21.6 21.0 16.2 15.9} -\CommentTok{#> 2 2 9.1 8.5 8.7 11.7 19.9 21.1 19.8 17.5 16.2} -\CommentTok{#> 3 3 8.7 9.1 8.7 12.3 20.4 21.5 19.2 17.7 15.9} -\CommentTok{#> 4 4 8.4 8.6 9.4 13.1 22.1 20.9 19.1 17.1 15.6} -\CommentTok{#> 5 5 8.5 8.2 7.9 14.2 22.3 21.6 19.9 17.0 14.5} -\CommentTok{#> 6 6 7.3 7.7 9.0 17.2 25.2 22.3 20.1 16.6 13.2} -\CommentTok{#> .. ... ... ... ... ... ... ... ... ... ...} -\CommentTok{#> Variables not shown: 2015 (dbl)} -\end{Highlighting} -\end{Shaded} - -(If it looks familiar, it's because it's derived from the -\texttt{economics} dataset that we used earlier in the book.) - -Imagine you want to plot a time series showing how unemployment has -changed over the last 10 years. 
Can you picture the ggplot2 command -you'd need to do it? What if you wanted to focus on the seasonal -component of unemployment by putting months on the x-axis and drawing -one line for each year? It's difficult to see how to create those plots -because the data is not tidy. There are three variables, month, year and -unemployment rate, but each variable is stored in a different way: - -\begin{itemize} -\tightlist -\item - \texttt{month} is stored in a column. -\item - \texttt{year} is spread across the column names. -\item - \texttt{rate} is the value of each cell. -\end{itemize} - -To make it possible to plot this data we first need to tidy it. There -are two important pairs of tools: - -\begin{itemize} -\tightlist -\item - Spread \& gather. -\item - Separate \& unite. -\end{itemize} - -\section{Spread and gather}\label{sec:spread-gather} - -Take a look at the two tables below: - -\begin{longtable}[c]{@{}llr@{}} -\toprule -x & y & z\tabularnewline -\midrule -\endhead -a & A & 1\tabularnewline -b & D & 5\tabularnewline -c & A & 4\tabularnewline -c & B & 10\tabularnewline -d & C & 9\tabularnewline -\bottomrule -\end{longtable} - -\begin{longtable}[c]{@{}lrrrr@{}} -\toprule -x & A & B & C & D\tabularnewline -\midrule -\endhead -a & 1 & NA & NA & NA\tabularnewline -b & NA & NA & NA & 5\tabularnewline -c & 4 & 10 & NA & NA\tabularnewline -d & NA & NA & 9 & NA\tabularnewline -\bottomrule -\end{longtable} - -If you study them for a little while, you'll notice that they contain -the same data in different forms. I call the first form \textbf{indexed} -data, because you look up a value using an index (the values of the -\texttt{x} and \texttt{y} variables). I call the second form -\textbf{Cartesian} data, because you find a value by looking at the -intersection of a row and a column. We can't tell if these datasets are -tidy or not. Either form could be tidy depending on what the values -``A'', ``B'', ``C'', ``D'' mean. 
- -(Also note the missing values: missing values that are explicit in one -form may be implicit in the other. An \texttt{NA} is the presence of an -absence; but sometimes a missing value is the absence of a presence.) - -Tidying your data will often require translating Cartesian → indexed -forms, called \textbf{gathering}, and less commonly, indexed → -Cartesian, called \textbf{spreading}. The tidyr package provides the -\texttt{spread()} and \texttt{gather()} functions to perform these -operations, as described below. - -(You can imagine generalising these ideas to higher dimensions. However, -data is almost always stored in 2d (rows \& columns), so these -generalisations are fun to think about, but not that practical. I -explore the idea more in Wickham (2007).) - -\subsection{Gather}\label{gather} - -\texttt{gather()} has four main arguments: \indexf{gather} - -\begin{itemize} -\item - \texttt{data}: the dataset to translate. -\item - \texttt{key} \& \texttt{value}: the key is the name of the variable - that will be created from the column names, and the value is the name - of the variable that will be created from the cell values. -\item - \texttt{...}: which variables to gather. You can specify individually, - \texttt{A,\ B,\ C,\ D}, or as a range \texttt{A:D}. Alternatively, you - can specify which columns are \emph{not} to be gathered with - \texttt{-}: \texttt{-E,\ -F}. -\end{itemize} - -To tidy the economics dataset shown above, you first need to identify -the variables: \texttt{year}, \texttt{month} and \texttt{rate}. -\texttt{month} is already in a column, but \texttt{year} and -\texttt{rate} are in Cartesian form, and we want them in indexed form, -so we need to use \texttt{gather()}. 
In this example, the key is -\texttt{year}, the value is \texttt{unemp} and we want to select columns -from \texttt{2006} to \texttt{2015}: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{gather}\NormalTok{(ec2, }\DataTypeTok{key =} \NormalTok{year, }\DataTypeTok{value =} \NormalTok{unemp, }\StringTok{`}\DataTypeTok{2006}\StringTok{`}\NormalTok{:}\StringTok{`}\DataTypeTok{2015}\StringTok{`}\NormalTok{)} -\CommentTok{#> Source: local data frame [120 x 3]} -\CommentTok{#> } -\CommentTok{#> month year unemp} -\CommentTok{#> (dbl) (chr) (dbl)} -\CommentTok{#> 1 1 2006 8.6} -\CommentTok{#> 2 2 2006 9.1} -\CommentTok{#> 3 3 2006 8.7} -\CommentTok{#> 4 4 2006 8.4} -\CommentTok{#> 5 5 2006 8.5} -\CommentTok{#> 6 6 2006 7.3} -\CommentTok{#> .. ... ... ...} -\end{Highlighting} -\end{Shaded} - -Note that the columns have names that are not standard variable names in -R (they don't start with a letter). This means that we need to surround -them in backticks, i.e. \texttt{`2006`} to refer to them. - -Alternatively, we could gather all columns except \texttt{month}: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{gather}\NormalTok{(ec2, }\DataTypeTok{key =} \NormalTok{year, }\DataTypeTok{value =} \NormalTok{unemp, -month)} -\CommentTok{#> Source: local data frame [120 x 3]} -\CommentTok{#> } -\CommentTok{#> month year unemp} -\CommentTok{#> (dbl) (chr) (dbl)} -\CommentTok{#> 1 1 2006 8.6} -\CommentTok{#> 2 2 2006 9.1} -\CommentTok{#> 3 3 2006 8.7} -\CommentTok{#> 4 4 2006 8.4} -\CommentTok{#> 5 5 2006 8.5} -\CommentTok{#> 6 6 2006 7.3} -\CommentTok{#> .. ... ... 
...} -\end{Highlighting} -\end{Shaded} - -To be most useful, we can provide two extra arguments: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{economics_2 <-}\StringTok{ }\KeywordTok{gather}\NormalTok{(ec2, year, rate, }\StringTok{`}\DataTypeTok{2006}\StringTok{`}\NormalTok{:}\StringTok{`}\DataTypeTok{2015}\StringTok{`}\NormalTok{, } - \DataTypeTok{convert =} \OtherTok{TRUE}\NormalTok{, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{)} -\NormalTok{economics_2} -\CommentTok{#> Source: local data frame [112 x 3]} -\CommentTok{#> } -\CommentTok{#> month year rate} -\CommentTok{#> (dbl) (int) (dbl)} -\CommentTok{#> 1 1 2006 8.6} -\CommentTok{#> 2 2 2006 9.1} -\CommentTok{#> 3 3 2006 8.7} -\CommentTok{#> 4 4 2006 8.4} -\CommentTok{#> 5 5 2006 8.5} -\CommentTok{#> 6 6 2006 7.3} -\CommentTok{#> .. ... ... ...} -\end{Highlighting} -\end{Shaded} - -We use \texttt{convert\ =\ TRUE} to automatically convert the years from -character strings to numbers, and \texttt{na.rm\ =\ TRUE} to remove the -months with no data. (In some sense the data isn't actually missing -because it represents dates that haven't occurred yet.) - -When the data is in this form, it's easy to visualise in many different -ways. 
For example, we can choose to emphasise either long term trend or -seasonal variations: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(economics_2, }\KeywordTok{aes}\NormalTok{(year +}\StringTok{ }\NormalTok{(month -}\StringTok{ }\DecValTok{1}\NormalTok{) /}\StringTok{ }\DecValTok{12}\NormalTok{, rate)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} - -\KeywordTok{ggplot}\NormalTok{(economics_2, }\KeywordTok{aes}\NormalTok{(month, rate, }\DataTypeTok{group =} \NormalTok{year)) +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{year), }\DataTypeTok{size =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/tidy_data/ec2-plots-1}% - \includegraphics[width=0.5\linewidth]{_figures/tidy_data/ec2-plots-2} -\end{figure} - -\subsection{Spread}\label{spread} - -\texttt{spread()} is the opposite of \texttt{gather()}. You use it when -you have a pair of columns that are in indexed form, instead of -Cartesian form. For example, the following example dataset contains -three variables (\texttt{day}, \texttt{rain} and \texttt{temp}), but -\texttt{rain} and \texttt{temp} are stored in indexed form. 
-\indexf{spread} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{weather <-}\StringTok{ }\NormalTok{dplyr::}\KeywordTok{data_frame}\NormalTok{(} - \DataTypeTok{day =} \KeywordTok{rep}\NormalTok{(}\DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DecValTok{2}\NormalTok{),} - \DataTypeTok{obs =} \KeywordTok{rep}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"temp"}\NormalTok{, }\StringTok{"rain"}\NormalTok{), }\DataTypeTok{each =} \DecValTok{3}\NormalTok{),} - \DataTypeTok{val =} \KeywordTok{c}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\DecValTok{23}\NormalTok{, }\DecValTok{22}\NormalTok{, }\DecValTok{20}\NormalTok{), }\KeywordTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{0}\NormalTok{, }\DecValTok{5}\NormalTok{))} -\NormalTok{)} -\NormalTok{weather} -\CommentTok{#> Source: local data frame [6 x 3]} -\CommentTok{#> } -\CommentTok{#> day obs val} -\CommentTok{#> (int) (chr) (dbl)} -\CommentTok{#> 1 1 temp 23} -\CommentTok{#> 2 2 temp 22} -\CommentTok{#> 3 3 temp 20} -\CommentTok{#> 4 1 rain 0} -\CommentTok{#> 5 2 rain 0} -\CommentTok{#> 6 3 rain 5} -\end{Highlighting} -\end{Shaded} - -Spread allows us to turn this messy indexed form into a tidy Cartesian -form. It shares many of the arguments with \texttt{gather()}. You'll -need to supply the \texttt{data} to translate, as well as the name of -the \texttt{key} column which gives the variable names, and the -\texttt{value} column which contains the cell values. 
Here the key is -\texttt{obs} and the value is \texttt{val}: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{spread}\NormalTok{(weather, }\DataTypeTok{key =} \NormalTok{obs, }\DataTypeTok{value =} \NormalTok{val)} -\CommentTok{#> Source: local data frame [3 x 3]} -\CommentTok{#> } -\CommentTok{#> day rain temp} -\CommentTok{#> (int) (dbl) (dbl)} -\CommentTok{#> 1 1 0 23} -\CommentTok{#> 2 2 0 22} -\CommentTok{#> 3 3 5 20} -\end{Highlighting} -\end{Shaded} - -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - How can you translate each of the initial example datasets into the - other form? -\item - How can you convert back and forth between the \texttt{economics} and - \texttt{economics\_long} datasets built into ggplot2? -\item - Install the EDAWR package from \url{https://github.com/rstudio/EDAWR}. - Tidy the \texttt{storms}, \texttt{population} and \texttt{tb} - datasets. -\end{enumerate} - -\section{Separate and unite}\label{sec:separate-unite} - -Spread and gather help when the variables are in the wrong place in the -dataset. Separate and unite help when multiple variables are crammed -into one column, or spread across multiple columns. \indexf{separate} -\indexf{unite} - -For example, the following dataset stores some information about the -response to a medical treatment. 
There are three variables (time, -treatment and value), but time and treatment are jammed in one variable -together: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{trt <-}\StringTok{ }\NormalTok{dplyr::}\KeywordTok{data_frame}\NormalTok{(} - \DataTypeTok{var =} \KeywordTok{paste0}\NormalTok{(}\KeywordTok{rep}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"beg"}\NormalTok{, }\StringTok{"end"}\NormalTok{), }\DataTypeTok{each =} \DecValTok{3}\NormalTok{), }\StringTok{"_"}\NormalTok{, }\KeywordTok{rep}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"c"}\NormalTok{))),} - \DataTypeTok{val =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{4}\NormalTok{, }\DecValTok{2}\NormalTok{, }\DecValTok{10}\NormalTok{, }\DecValTok{5}\NormalTok{, }\DecValTok{11}\NormalTok{)} -\NormalTok{)} -\NormalTok{trt} -\CommentTok{#> Source: local data frame [6 x 2]} -\CommentTok{#> } -\CommentTok{#> var val} -\CommentTok{#> (chr) (dbl)} -\CommentTok{#> 1 beg_a 1} -\CommentTok{#> 2 beg_b 4} -\CommentTok{#> 3 beg_c 2} -\CommentTok{#> 4 end_a 10} -\CommentTok{#> 5 end_b 5} -\CommentTok{#> 6 end_c 11} -\end{Highlighting} -\end{Shaded} - -The \texttt{separate()} function makes it easy to tease apart multiple -variables stored in one column. It takes four arguments: - -\begin{itemize} -\item - \texttt{data}: the data frame to modify. -\item - \texttt{col}: the name of the variable to split into pieces. -\item - \texttt{into}: a character vector giving the names of the new - variables. -\item - \texttt{sep}: a description of how to split the variable apart. This - can either be a regular expression, e.g. \texttt{\_} to split by - underscores, or \texttt{{[}\^{}a-z{]}} to split by any non-letter, or - an integer giving a position. 
-\end{itemize} - -In this case, we want to split by the \texttt{\_} character: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{separate}\NormalTok{(trt, var, }\KeywordTok{c}\NormalTok{(}\StringTok{"time"}\NormalTok{, }\StringTok{"treatment"}\NormalTok{), }\StringTok{"_"}\NormalTok{)} -\CommentTok{#> Source: local data frame [6 x 3]} -\CommentTok{#> } -\CommentTok{#> time treatment val} -\CommentTok{#> (chr) (chr) (dbl)} -\CommentTok{#> 1 beg a 1} -\CommentTok{#> 2 beg b 4} -\CommentTok{#> 3 beg c 2} -\CommentTok{#> 4 end a 10} -\CommentTok{#> 5 end b 5} -\CommentTok{#> 6 end c 11} -\end{Highlighting} -\end{Shaded} - -(If the variables are combined in a more complex form, have a look at -\texttt{extract()}. Alternatively, you might need to create columns -individually yourself using other calculations. A useful tool for this -is \texttt{mutate()} which you'll learn about in the next chapter.) - -\texttt{unite()} is the inverse of \texttt{separate()} - it joins -together multiple columns into one column. This is less common, but it's -useful to know about as the inverse of \texttt{separate()}. - -\subsection{Exercises}\label{exercises-1} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Install the EDAWR package from \url{https://github.com/rstudio/EDAWR}. - Tidy the \texttt{who} dataset. -\item - Work through the demos included in the tidyr package - (\texttt{demo(package\ =\ "tidyr")}) -\end{enumerate} - -\section{Case studies}\label{sec:tidy-case-study} - -For most real datasets, you'll need to use more than one tidying verb. -There may be multiple ways to get there, but as long as each step makes -the data tidier, you'll eventually get to the tidy dataset. That said, -you typically apply the functions in the same order: \texttt{gather()}, -\texttt{separate()} and \texttt{spread()} (although you might not use -all three).
- -\subsection{Blood pressure}\label{blood-pressure} - -The first step when tidying a new dataset is always to identify the -variables. Take the following simulated medical data. There are six -variables in this dataset: name, age, start date, week, systolic \& -diastolic blood pressure. Can you see how they're stored? - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Adapted from example by Barry Rowlingson, } -\CommentTok{# http://barryrowlingson.github.io/hadleyverse/} -\NormalTok{bpd <-}\StringTok{ }\NormalTok{readr::}\KeywordTok{read_table}\NormalTok{(} -\StringTok{"name age start week1 week2 week3} -\StringTok{Anne 35 2014-03-27 100/80 100/75 120/90} -\StringTok{ Ben 41 2014-03-09 110/65 100/65 135/70} -\StringTok{Carl 33 2014-04-02 125/80 } -\StringTok{"}\NormalTok{, }\DataTypeTok{na =} \StringTok{""}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -The first step is to convert from Cartesian to indexed form: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{bpd_1 <-}\StringTok{ }\KeywordTok{gather}\NormalTok{(bpd, week, bp, week1:week3)} -\NormalTok{bpd_1} -\CommentTok{#> Source: local data frame [9 x 5]} -\CommentTok{#> } -\CommentTok{#> name age start week bp} -\CommentTok{#> (chr) (int) (date) (chr) (chr)} -\CommentTok{#> 1 Anne 35 2014-03-27 week1 100/80} -\CommentTok{#> 2 Ben 41 2014-03-09 week1 110/65} -\CommentTok{#> 3 Carl 33 2014-04-02 week1 125/80} -\CommentTok{#> 4 Anne 35 2014-03-27 week2 100/75} -\CommentTok{#> 5 Ben 41 2014-03-09 week2 100/65} -\CommentTok{#> 6 Carl 33 2014-04-02 week2 NA} -\CommentTok{#> .. ... ... ... ... ...} -\end{Highlighting} -\end{Shaded} - -This is tidier, but we have two variables combined together in the -\texttt{bp} variable. This is a common way of writing down the blood -pressure, but analysis is easier if we break it into two variables.
-That's the job of separate: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{bpd_2 <-}\StringTok{ }\KeywordTok{separate}\NormalTok{(bpd_1, bp, }\KeywordTok{c}\NormalTok{(}\StringTok{"sys"}\NormalTok{, }\StringTok{"dia"}\NormalTok{), }\StringTok{"/"}\NormalTok{)} -\NormalTok{bpd_2} -\CommentTok{#> Source: local data frame [9 x 6]} -\CommentTok{#> } -\CommentTok{#> name age start week sys dia} -\CommentTok{#> (chr) (int) (date) (chr) (chr) (chr)} -\CommentTok{#> 1 Anne 35 2014-03-27 week1 100 80} -\CommentTok{#> 2 Ben 41 2014-03-09 week1 110 65} -\CommentTok{#> 3 Carl 33 2014-04-02 week1 125 80} -\CommentTok{#> 4 Anne 35 2014-03-27 week2 100 75} -\CommentTok{#> 5 Ben 41 2014-03-09 week2 100 65} -\CommentTok{#> 6 Carl 33 2014-04-02 week2 NA NA} -\CommentTok{#> .. ... ... ... ... ... ...} -\end{Highlighting} -\end{Shaded} - -This dataset is now tidy, but we could do a little more to make it -easier to use. The following code uses \texttt{extract()} to pull the -week number out into its own variable (using regular expressions is -beyond the scope of the book, but -\texttt{\textbackslash{}\textbackslash{}d} stands for any digit). I also -use \texttt{arrange()} (which you'll learn about in the next chapter) to -order the rows to keep the records for each person together. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{bpd_3 <-}\StringTok{ }\KeywordTok{extract}\NormalTok{(bpd_2, week, }\StringTok{"week"}\NormalTok{, }\StringTok{"(}\CharTok{\textbackslash{}\textbackslash{}}\StringTok{d)"}\NormalTok{, }\DataTypeTok{convert =} \OtherTok{TRUE}\NormalTok{)} -\NormalTok{bpd_4 <-}\StringTok{ }\NormalTok{dplyr::}\KeywordTok{arrange}\NormalTok{(bpd_3, name, week)} -\NormalTok{bpd_4} -\CommentTok{#> Source: local data frame [9 x 6]} -\CommentTok{#> } -\CommentTok{#> name age start week sys dia} -\CommentTok{#> (chr) (int) (date) (int) (chr) (chr)} -\CommentTok{#> 1 Anne 35 2014-03-27 1 100 80} -\CommentTok{#> 2 Anne 35 2014-03-27 2 100 75} -\CommentTok{#> 3 Anne 35 2014-03-27 3 120 90} -\CommentTok{#> 4 Ben 41 2014-03-09 1 110 65} -\CommentTok{#> 5 Ben 41 2014-03-09 2 100 65} -\CommentTok{#> 6 Ben 41 2014-03-09 3 135 70} -\CommentTok{#> .. ... ... ... ... ... ...} -\end{Highlighting} -\end{Shaded} - -You might notice that there's some repetition in this dataset: if you -know the name, then you also know the age and start date. This reflects -a third condition of tidiness that I don't discuss here: each data frame -should contain one and only one data set. Here there are really two -datasets: information about each person that doesn't change over time, -and their weekly blood pressure measurements. You can learn more about -this sort of messiness in the resources mentioned at the end of the -chapter. - -\subsection{Test scores}\label{test-scores} - -Imagine you're interested in the effect of an intervention on test -scores. You've collected the following data. What are the variables?
- -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Adapted from http://stackoverflow.com/questions/29775461} -\NormalTok{scores <-}\StringTok{ }\NormalTok{dplyr::}\KeywordTok{data_frame}\NormalTok{(} - \DataTypeTok{person =} \KeywordTok{rep}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"Greg"}\NormalTok{, }\StringTok{"Sally"}\NormalTok{, }\StringTok{"Sue"}\NormalTok{), }\DataTypeTok{each =} \DecValTok{2}\NormalTok{),} - \DataTypeTok{time =} \KeywordTok{rep}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"pre"}\NormalTok{, }\StringTok{"post"}\NormalTok{), }\DecValTok{3}\NormalTok{),} - \DataTypeTok{test1 =} \KeywordTok{round}\NormalTok{(}\KeywordTok{rnorm}\NormalTok{(}\DecValTok{6}\NormalTok{, }\DataTypeTok{mean =} \DecValTok{80}\NormalTok{, }\DataTypeTok{sd =} \DecValTok{4}\NormalTok{), }\DecValTok{0}\NormalTok{),} - \DataTypeTok{test2 =} \KeywordTok{round}\NormalTok{(}\KeywordTok{jitter}\NormalTok{(test1, }\DecValTok{15}\NormalTok{), }\DecValTok{0}\NormalTok{)} -\NormalTok{)} -\NormalTok{scores} -\CommentTok{#> Source: local data frame [6 x 4]} -\CommentTok{#> } -\CommentTok{#> person time test1 test2} -\CommentTok{#> (chr) (chr) (dbl) (dbl)} -\CommentTok{#> 1 Greg pre 84 83} -\CommentTok{#> 2 Greg post 76 75} -\CommentTok{#> 3 Sally pre 80 78} -\CommentTok{#> 4 Sally post 78 77} -\CommentTok{#> 5 Sue pre 83 80} -\CommentTok{#> 6 Sue post 76 75} -\end{Highlighting} -\end{Shaded} - -I think the variables are person, test, pre-test score and post-test -score. 
As usual, we start by converting columns in Cartesian form -(\texttt{test1} and \texttt{test2}) to indexed form (\texttt{test} and -\texttt{score}): - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{scores_1 <-}\StringTok{ }\KeywordTok{gather}\NormalTok{(scores, test, score, test1:test2)} -\NormalTok{scores_1} -\CommentTok{#> Source: local data frame [12 x 4]} -\CommentTok{#> } -\CommentTok{#> person time test score} -\CommentTok{#> (chr) (chr) (chr) (dbl)} -\CommentTok{#> 1 Greg pre test1 84} -\CommentTok{#> 2 Greg post test1 76} -\CommentTok{#> 3 Sally pre test1 80} -\CommentTok{#> 4 Sally post test1 78} -\CommentTok{#> 5 Sue pre test1 83} -\CommentTok{#> 6 Sue post test1 76} -\CommentTok{#> .. ... ... ... ...} -\end{Highlighting} -\end{Shaded} - -Now we need to do the opposite: \texttt{pre} and \texttt{post} should be -variables, not values, so we need to spread \texttt{time} and -\texttt{score}: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{scores_2 <-}\StringTok{ }\KeywordTok{spread}\NormalTok{(scores_1, time, score)} -\NormalTok{scores_2} -\CommentTok{#> Source: local data frame [6 x 4]} -\CommentTok{#> } -\CommentTok{#> person test post pre} -\CommentTok{#> (chr) (chr) (dbl) (dbl)} -\CommentTok{#> 1 Greg test1 76 84} -\CommentTok{#> 2 Greg test2 75 83} -\CommentTok{#> 3 Sally test1 78 80} -\CommentTok{#> 4 Sally test2 77 78} -\CommentTok{#> 5 Sue test1 76 83} -\CommentTok{#> 6 Sue test2 75 80} -\end{Highlighting} -\end{Shaded} - -A good indication that we have made a tidy dataset is that it's now easy -to calculate the statistic of interest: the difference between pre- and -post-intervention scores: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{scores_3 <-}\StringTok{ }\KeywordTok{mutate}\NormalTok{(scores_2, }\DataTypeTok{diff =} \NormalTok{post -}\StringTok{ }\NormalTok{pre)} -\NormalTok{scores_3} -\CommentTok{#> Source: local data frame [6 x 5]} -\CommentTok{#> } -\CommentTok{#> person test post pre diff} -\CommentTok{#> (chr) (chr) 
(dbl) (dbl) (dbl)} -\CommentTok{#> 1 Greg test1 76 84 -8} -\CommentTok{#> 2 Greg test2 75 83 -8} -\CommentTok{#> 3 Sally test1 78 80 -2} -\CommentTok{#> 4 Sally test2 77 78 -1} -\CommentTok{#> 5 Sue test1 76 83 -7} -\CommentTok{#> 6 Sue test2 75 80 -5} -\end{Highlighting} -\end{Shaded} - -And it's similarly easy to plot: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(scores_3, }\KeywordTok{aes}\NormalTok{(person, diff, }\DataTypeTok{color =} \NormalTok{test)) +} -\StringTok{ }\KeywordTok{geom_hline}\NormalTok{(}\DataTypeTok{size =} \DecValTok{2}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"white"}\NormalTok{, }\DataTypeTok{yintercept =} \DecValTok{0}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +} -\StringTok{ }\KeywordTok{geom_path}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{person), }\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{, } - \DataTypeTok{arrow =} \KeywordTok{arrow}\NormalTok{(}\DataTypeTok{length =} \KeywordTok{unit}\NormalTok{(}\FloatTok{0.25}\NormalTok{, }\StringTok{"cm"}\NormalTok{)))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/tidy_data/scores4-1} -\end{figure} - -(Again, you'll learn about \texttt{mutate()} in the next chapter.) - -\section{Learning more}\label{learning-more} - -Data tidying is a big topic and this chapter only scratches the surface. -I recommend the following references which go into considerably more -depth on this topic: - -\begin{itemize} -\item - The tidyr documentation. I've described the most important arguments, - but most functions have other arguments that help deal with less - common situations. If you're struggling, make sure to read the - documentation to see if there's an argument that might help you. -\item - ``\href{http://www.jstatsoft.org/v59/i10/}{Tidy data}'', an article in - the \emph{Journal of Statistical Software}. 
It describes the ideas of - tidy data in more depth and shows other types of messy data. - Unfortunately the paper was written before tidyr existed, so to see - how to use tidyr instead of reshape2, consult the - \href{http://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html}{tidyr - vignette}. -\item - The \href{http://rstudio.com/cheatsheets}{data wrangling cheatsheet} - by RStudio, includes the most common tidyr verbs in a form designed to - jog your memory when you're stuck. -\end{itemize} - -\section*{References}\label{references} -\addcontentsline{toc}{section}{References} - -\hypertarget{refs}{} -\hypertarget{ref-wickham:2007b}{} -Wickham, Hadley. 2007. ``Reshaping Data with the Reshape Package.'' -\emph{Journal of Statistical Software} 21 (12). -\url{http://www.jstatsoft.org/v21/i12/paper}. - -\hypertarget{ref-tidy-data}{} ----------. 2014. ``Tidy Data.'' \emph{The Journal of Statistical -Software} 59. \url{http://www.jstatsoft.org/v59/i10/}. diff --git a/book/tex/toolbox.tex b/book/tex/toolbox.tex deleted file mode 100644 index 2ec3896d..00000000 --- a/book/tex/toolbox.tex +++ /dev/null @@ -1,2008 +0,0 @@ -\chapter{Toolbox}\label{cha:toolbox} - -\section{Introduction}\label{introduction} - -The layered structure of ggplot2 encourages you to design and construct -graphics in a structured manner. You've learned the basics in the -previous chapter, and in this chapter you'll get a more comprehensive -task-based introduction. The goal here is not to exhaustively explore -every option of every geom, but instead to show the most important tools -for a given task. For more information about individual geoms, along -with many more examples illustrating their use, see the documentation. - -It is useful to think about the purpose of each layer before it is -added. In general, there are three purposes for a layer: -\index{Layers!strategy} - -\begin{itemize} -\item - To display the \textbf{data}. 
We plot the raw data for many reasons, - relying on our skills at pattern detection to spot gross structure, - local structure, and outliers. This layer appears on virtually every - graphic. In the earliest stages of data exploration, it is often the - only layer. -\item - To display a statistical \textbf{summary} of the data. As we develop - and explore models of the data, it is useful to display model - predictions in the context of the data. Showing the data helps us - improve the model, and showing the model helps reveal subtleties of - the data that we might otherwise miss. Summaries are usually drawn on - top of the data. -\item - To add additional \textbf{metadata}: context, annotations, and - references. A metadata layer displays background context, annotations - that help to give meaning to the raw data, or fixed references that - aid comparisons across panels. Metadata can be useful in the - background and foreground. - - A map is often used as a background layer with spatial data. - Background metadata should be rendered so that it doesn't interfere - with your perception of the data, so is usually displayed underneath - the data and formatted so that it is minimally perceptible. That is, - if you concentrate on it, you can see it with ease, but it doesn't - jump out at you when you are casually browsing the plot. - - Other metadata is used to highlight important features of the data. If - you have added explanatory labels to a couple of inflection points or - outliers, then you want to render them so that they pop out at the - viewer. In that case, you want this to be the very last layer drawn. -\end{itemize} - -This chapter is broken up into the following sections, each of which -deals with a particular graphical challenge. This is not an exhaustive -or exclusive categorisation, and there are many other possible ways to -break up graphics into different categories. Each geom can be used for -many different purposes, especially if you are creative. 
However, this -breakdown should cover many common tasks and help you learn about some -of the possibilities. - -\begin{itemize} -\item - Basic plot types that produce common, `named' graphics like - scatterplots and line charts, \protect\hyperlink{sec:basics}{link to - section}. -\item - Displaying text, \protect\hyperlink{sec:labelling}{link to section}. -\item - Adding arbitrary additional annotations, - \protect\hyperlink{sec:annotations}{annotations}. -\item - Working with collective geoms, like lines and polygons, that each - display multiple rows of data, - \protect\hyperlink{sec:grouping}{working with groups}. -\item - Surface plots to display 3d surfaces in 2d, - \protect\hyperlink{sec:surface}{link to section}. -\item - Drawing maps, \protect\hyperlink{sec:maps}{link to section}. -\item - Revealing uncertainty and error, with various 1d and 2d intervals, - \protect\hyperlink{sec:uncertainty}{link to section}. -\item - Weighted data, \protect\hyperlink{sec:weighting}{link to section}. -\end{itemize} - -In \protect\hyperlink{sec:diamonds}{diamonds}, you'll learn about the -diamonds dataset. The final three sections use this data to discuss -techniques for visualising larger datasets: - -\begin{itemize} -\item - Displaying distributions, continuous and discrete, 1d and 2d, joint - and conditional, \protect\hyperlink{sec:distributions}{link to - section}. -\item - Dealing with overplotting in scatterplots, a challenge with large - datasets,\\ - \protect\hyperlink{sec:overplotting}{link to section}. -\item - Displaying statistical summaries instead of the raw data, - \protect\hyperlink{sec:summary}{link to section}. -\end{itemize} - -The chapter concludes in \protect\hyperlink{sec:elsewhere}{other -packages} with some pointers to other useful packages built on top of -ggplot2. - -\hypertarget{sec:basics}{\section{Basic plot types}\label{sec:basics}} - -These geoms are the fundamental building blocks of ggplot2.
They are -useful in their own right, but are also used to construct more complex -geoms. Most of these geoms are associated with a named plot: when that -geom is used by itself in a plot, that plot has a special name. - -Each of these geoms is two dimensional and requires both \texttt{x} and -\texttt{y} aesthetics. All of them understand \texttt{colour} (or -\texttt{color}) and \texttt{size} aesthetics, and the filled geoms (bar, -tile and polygon) also understand \texttt{fill}. - -\begin{itemize} -\item - \texttt{geom\_area()} draws an \textbf{area plot}, which is a line - plot filled to the y-axis (filled lines). Multiple groups will be - stacked on top of each other. \index{Area plot} \indexf{geom\_area} -\item - \texttt{geom\_bar(stat\ =\ "identity")} makes a \textbf{bar chart}. We - need \texttt{stat\ =\ "identity"} because the default stat - automatically counts values (so is essentially a 1d geom, see - \protect\hyperlink{sec:distributions}{distributions}). The identity - stat leaves the data unchanged. Multiple bars in the same location - will be stacked on top of one another.\index{Barchart} - \indexf{geom\_bar} -\item - \texttt{geom\_line()} makes a \textbf{line plot}. The \texttt{group} - aesthetic determines which observations are connected; see - \protect\hyperlink{sec:grouping}{grouping} for more detail. - \texttt{geom\_line()} connects points from left to right; - \texttt{geom\_path()} is similar but connects points in the order they - appear in the data. Both \texttt{geom\_line()} and - \texttt{geom\_path()} also understand the aesthetic \texttt{linetype}, - which maps a categorical variable to solid, dotted and dashed lines. - \index{Line plot} \indexf{geom\_line} \indexf{geom\_path} -\item - \texttt{geom\_point()} produces a \textbf{scatterplot}. - \texttt{geom\_point()} also understands the \texttt{shape} aesthetic. - \indexf{geom\_point} -\item - \texttt{geom\_polygon()} draws polygons, which are filled paths.
Each - vertex of the polygon requires a separate row in the data. It is often - useful to merge a data frame of polygon coordinates with the data just - prior to plotting. \protect\hyperlink{sec:maps}{Drawing maps} - illustrates this concept in more detail for map data. - \indexf{geom\_polygon} -\item - \texttt{geom\_rect()}, \texttt{geom\_tile()} and - \texttt{geom\_raster()} draw rectangles. \texttt{geom\_rect()} is - parameterised by the four corners of the rectangle, \texttt{xmin}, - \texttt{ymin}, \texttt{xmax} and \texttt{ymax}. \texttt{geom\_tile()} - is exactly the same, but parameterised by the center of the rect and - its size, \texttt{x}, \texttt{y}, \texttt{width} and \texttt{height}. - \texttt{geom\_raster()} is a fast special case of - \texttt{geom\_tile()} used when all the tiles are the same size. - \index{Image plot} \index{Level plot} \indexf{geom\_tile}. - \indexf{geom\_rect} \indexf{geom\_raster} -\end{itemize} - -Each geom is shown in the code below. Observe the different axis ranges -for the bar, area and tile plots: these geoms take up space outside the -range of the data, and so push the axes out. 
- -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(} - \DataTypeTok{x =} \KeywordTok{c}\NormalTok{(}\DecValTok{3}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{5}\NormalTok{), } - \DataTypeTok{y =} \KeywordTok{c}\NormalTok{(}\DecValTok{2}\NormalTok{, }\DecValTok{4}\NormalTok{, }\DecValTok{6}\NormalTok{), } - \DataTypeTok{label =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{,}\StringTok{"b"}\NormalTok{,}\StringTok{"c"}\NormalTok{)} -\NormalTok{)} -\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y, }\DataTypeTok{label =} \NormalTok{label)) +}\StringTok{ } -\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \OtherTok{NULL}\NormalTok{, }\DataTypeTok{y =} \OtherTok{NULL}\NormalTok{) +}\StringTok{ }\CommentTok{# Hide axis label} -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{plot.title =} \KeywordTok{element_text}\NormalTok{(}\DataTypeTok{size =} \DecValTok{12}\NormalTok{)) }\CommentTok{# Shrink plot title} -\NormalTok{p +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"point"}\NormalTok{)} -\NormalTok{p +}\StringTok{ }\KeywordTok{geom_text}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"text"}\NormalTok{)} -\NormalTok{p +}\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{stat =} \StringTok{"identity"}\NormalTok{) +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"bar"}\NormalTok{)} -\NormalTok{p +}\StringTok{ }\KeywordTok{geom_tile}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"raster"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.25\linewidth]{_figures/toolbox/geom-basic-1}% - \includegraphics[width=0.25\linewidth]{_figures/toolbox/geom-basic-2}% - \includegraphics[width=0.25\linewidth]{_figures/toolbox/geom-basic-3}% - 
\includegraphics[width=0.25\linewidth]{_figures/toolbox/geom-basic-4} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{p +}\StringTok{ }\KeywordTok{geom_line}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"line"}\NormalTok{)} -\NormalTok{p +}\StringTok{ }\KeywordTok{geom_area}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"area"}\NormalTok{)} -\NormalTok{p +}\StringTok{ }\KeywordTok{geom_path}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"path"}\NormalTok{)} -\NormalTok{p +}\StringTok{ }\KeywordTok{geom_polygon}\NormalTok{() +}\StringTok{ }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"polygon"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.25\linewidth]{_figures/toolbox/unnamed-chunk-2-1}% - \includegraphics[width=0.25\linewidth]{_figures/toolbox/unnamed-chunk-2-2}% - \includegraphics[width=0.25\linewidth]{_figures/toolbox/unnamed-chunk-2-3}% - \includegraphics[width=0.25\linewidth]{_figures/toolbox/unnamed-chunk-2-4} -\end{figure} - -\subsection{Exercises}\label{exercises} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - What geoms would you use to draw each of the following named plots? - - \begin{enumerate} - \def\labelenumii{\arabic{enumii}.} - \tightlist - \item - Scatterplot - \item - Line chart - \item - Histogram - \item - Bar chart - \item - Pie chart - \end{enumerate} -\item - What's the difference between \texttt{geom\_path()} and - \texttt{geom\_polygon()}? What's the difference between - \texttt{geom\_path()} and \texttt{geom\_line()}? -\item - What low-level geoms are used to draw \texttt{geom\_smooth()}? What - about \texttt{geom\_boxplot()} and \texttt{geom\_violin()}? -\end{enumerate} - -\hypertarget{sec:labelling}{\section{Labels}\label{sec:labelling}} - -\index{Labels} \index{Text} \indexf{geom\_text} - -Adding text to a plot can be quite tricky. 
ggplot2 doesn't have all the -answers, but does provide some tools to make your life a little easier. -The main tool is \texttt{geom\_text()}, which adds \texttt{label}s at -the specified \texttt{x} and \texttt{y} positions. - -\texttt{geom\_text()} has the most aesthetics of any geom, because there -are so many ways to control the appearance of a text: - -\begin{itemize} -\item - \texttt{family} gives the name of a font. There are only three fonts - that are guaranteed to work everywhere: ``sans'' (the default), - ``serif'', or ``mono'': - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{, }\DataTypeTok{y =} \DecValTok{3}\NormalTok{:}\DecValTok{1}\NormalTok{, }\DataTypeTok{family =} \KeywordTok{c}\NormalTok{(}\StringTok{"sans"}\NormalTok{, }\StringTok{"serif"}\NormalTok{, }\StringTok{"mono"}\NormalTok{))} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{family, }\DataTypeTok{family =} \NormalTok{family))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/text-family-1} - \end{figure} - - It's trickier to include a system font on a plot because text drawing - is done differently by each graphics device (GD). There are five GDs - in common use (\texttt{png()}, \texttt{pdf()}, on screen devices for - Windows, Mac and Linux), so to have a font work everywhere you need to - configure five devices in five different ways. Two packages simplify - the quandary a bit: - - \begin{itemize} - \item - showtext, \url{https://github.com/yixuan/showtext}, by Yixuan Qiu, - makes GD-independent plots by rendering all text as polygons. - \item - extrafont, \url{https://github.com/wch/extrafont}, by Winston Chang, - converts fonts to a standard format that all devices can use. 
- \end{itemize} - - Both approaches have pros and cons, so you will need to try both of - them and see which works best for your needs. \index{Font!family} -\item - \texttt{fontface} specifies the face: ``plain'' (the default), - ``bold'' or ``italic''. \index{Font!face} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{, }\DataTypeTok{y =} \DecValTok{3}\NormalTok{:}\DecValTok{1}\NormalTok{, }\DataTypeTok{face =} \KeywordTok{c}\NormalTok{(}\StringTok{"plain"}\NormalTok{, }\StringTok{"bold"}\NormalTok{, }\StringTok{"italic"}\NormalTok{))} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{face, }\DataTypeTok{fontface =} \NormalTok{face))} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/text-face-1} - \end{figure} -\item - You can adjust the alignment of the text with the \texttt{hjust} - (``left'', ``center'', ``right'', ``inward'', ``outward'') and - \texttt{vjust} (``bottom'', ``middle'', ``top'', ``inward'', - ``outward'') aesthetics. The default alignment is centered.
One of the - most useful alignments is ``inward'': it aligns text towards the - middle of the plot: \index{Font!justification} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(} - \DataTypeTok{x =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{2}\NormalTok{, }\DecValTok{2}\NormalTok{, }\FloatTok{1.5}\NormalTok{),} - \DataTypeTok{y =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{2}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{2}\NormalTok{, }\FloatTok{1.5}\NormalTok{),} - \DataTypeTok{text =} \KeywordTok{c}\NormalTok{(} - \StringTok{"bottom-left"}\NormalTok{, }\StringTok{"bottom-right"}\NormalTok{, } - \StringTok{"top-left"}\NormalTok{, }\StringTok{"top-right"}\NormalTok{, }\StringTok{"center"} - \NormalTok{)} -\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +} -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{text))} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +} -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{text), }\DataTypeTok{vjust =} \StringTok{"inward"}\NormalTok{, }\DataTypeTok{hjust =} \StringTok{"inward"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/text-justification-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/text-justification-2} - \end{figure} -\item - \texttt{size} controls the font size. Unlike most tools, ggplot2 uses - mm, rather than the usual points (pts). This makes it consistent with - other size units in ggplot2. (There are 72.27 pts in an inch, so to - convert from points to mm, just multiply by 25.4 / 72.27). - \index{Font!size} -\item - \texttt{angle} specifies the rotation of the text in degrees.
-\end{itemize} - -You can map data values to these aesthetics, but use restraint: it is -hard to perceive the relationship between variables mapped to these -aesthetics. \texttt{geom\_text()} also has three parameters. Unlike the -aesthetics, these only take single values, so they must be the same for -all labels: - -\begin{itemize} -\item - Often you want to label existing points on the plot. You don't want - the text to overlap with the points (or bars etc), so it's useful to - offset the text a little. The \texttt{nudge\_x} and \texttt{nudge\_y} - parameters allow you to nudge the text a little horizontally or - vertically: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{trt =} \KeywordTok{c}\NormalTok{(}\StringTok{"a"}\NormalTok{, }\StringTok{"b"}\NormalTok{, }\StringTok{"c"}\NormalTok{), }\DataTypeTok{resp =} \KeywordTok{c}\NormalTok{(}\FloatTok{1.2}\NormalTok{, }\FloatTok{3.4}\NormalTok{, }\FloatTok{2.5}\NormalTok{))} -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(resp, trt)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \KeywordTok{paste0}\NormalTok{(}\StringTok{"("}\NormalTok{, resp, }\StringTok{")"}\NormalTok{)), }\DataTypeTok{nudge_y =} \NormalTok{-}\FloatTok{0.25}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{1}\NormalTok{, }\FloatTok{3.6}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/text-nudge-1} - \end{figure} - - (Note that I manually tweaked the x-axis limits to make sure all the - text fit on the plot.) -\item - If \texttt{check\_overlap\ =\ TRUE}, overlapping labels will be - automatically removed.
The algorithm is simple: labels are plotted in - the order they appear in the data frame; if a label would overlap with - an existing point, it's omitted. This is not incredibly useful, but - can be handy. \indexc{check\_overlap} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{model)) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{8}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{model), }\DataTypeTok{check_overlap =} \OtherTok{TRUE}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{8}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/text-overlap-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/text-overlap-2} - \end{figure} -\end{itemize} - -A variation on \texttt{geom\_text()} is \texttt{geom\_label()}: it draws -a rounded rectangle behind the text. 
This makes it useful for adding -labels to plots with busy backgrounds: \indexf{geom\_label} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{label <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(} - \DataTypeTok{waiting =} \KeywordTok{c}\NormalTok{(}\DecValTok{55}\NormalTok{, }\DecValTok{80}\NormalTok{), } - \DataTypeTok{eruptions =} \KeywordTok{c}\NormalTok{(}\DecValTok{2}\NormalTok{, }\FloatTok{4.3}\NormalTok{), } - \DataTypeTok{label =} \KeywordTok{c}\NormalTok{(}\StringTok{"peak one"}\NormalTok{, }\StringTok{"peak two"}\NormalTok{)} -\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(faithfuld, }\KeywordTok{aes}\NormalTok{(waiting, eruptions)) +} -\StringTok{ }\KeywordTok{geom_tile}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{density)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_label}\NormalTok{(}\DataTypeTok{data =} \NormalTok{label, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{label))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/toolbox/label-1} -\end{figure} - -Labelling data well poses some challenges: - -\begin{itemize} -\item - Text does not affect the limits of the plot. Unfortunately there's no - way to make this work since a label has an absolute size (e.g.~3 cm), - regardless of the size of the plot. This means that the limits of a - plot would need to be different depending on the size of the plot --- - there's just no way to make that happen with ggplot2. Instead, you'll - need to tweak \texttt{xlim()} and \texttt{ylim()} based on your data - and plot size. -\item - If you want to label many points, it is difficult to avoid overlaps. - \texttt{check\_overlap\ =\ TRUE} is useful, but offers little control - over which labels are removed. There are a number of techniques - available for base graphics, like \texttt{maptools::pointLabel()}, but - they're not trivial to port to the grid graphics used by ggplot2. 
If - all else fails, you may need to manually label points in a drawing - tool. -\end{itemize} - -Text labels can also serve as an alternative to a legend. This usually -makes the plot easier to read because it puts the labels closer to the -data. The \href{https://github.com/tdhock/directlabels}{directlabels} -package, by Toby Dylan Hocking, provides a number of tools to make this -easier: \index{directlabels} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \NormalTok{class)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, hwy, }\DataTypeTok{colour =} \NormalTok{class)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{show.legend =} \OtherTok{FALSE}\NormalTok{) +} -\StringTok{ }\NormalTok{directlabels::}\KeywordTok{geom_dl}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{class), }\DataTypeTok{method =} \StringTok{"smart.grid"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-3-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-3-2} -\end{figure} - -Directlabels provides a number of position methods. \texttt{smart.grid} -is a reasonable place to start for scatterplots, but there are other -methods that are more useful for frequency polygons and line plots. See -the directlabels website, -\url{http://directlabels.r-forge.r-project.org}, for other techniques. - -\hypertarget{sec:annotations}{\section{Annotations}\label{sec:annotations}} - -Annotations add metadata to your plot. 
But metadata is just data, so you -can use: \index{Annotation} \index{Metadata} - -\begin{itemize} -\item - \texttt{geom\_text()} to add text descriptions or to label points. Most - plots will not benefit from adding text to every single observation on - the plot, but labelling outliers and other important points is very - useful. \index{Labels} \indexf{geom\_text} -\item - \texttt{geom\_rect()} to highlight interesting rectangular regions of - the plot. \texttt{geom\_rect()} has aesthetics \texttt{xmin}, - \texttt{xmax}, \texttt{ymin} and \texttt{ymax}. \indexf{geom\_rect} -\item - \texttt{geom\_line()}, \texttt{geom\_path()} and - \texttt{geom\_segment()} to add lines. All these geoms have an - \texttt{arrow} parameter, which allows you to place an arrowhead on - the line. Create arrowheads with \texttt{arrow()}, which has arguments - \texttt{angle}, \texttt{length}, \texttt{ends} and \texttt{type}. - \indexf{geom\_line} -\item - \texttt{geom\_vline()}, \texttt{geom\_hline()} and - \texttt{geom\_abline()} allow you to add reference lines (sometimes - called rules), that span the full range of the plot. - \indexf{geom\_vline} \indexf{geom\_hline} \indexf{geom\_abline} -\end{itemize} - -Typically, you can either put annotations in the foreground (using -\texttt{alpha} if needed so you can still see the data), or in the -background. With the default background, a thick white line makes a -useful reference: it's easy to see but it doesn't jump out at you. - -To show off the basic idea, we'll draw a time series of unemployment: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(economics, }\KeywordTok{aes}\NormalTok{(date, unemploy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/toolbox/umep-1} -\end{figure} - -We can annotate this plot with which president was in power at the time.
-There is little new in this code - it's a straightforward manipulation -of existing geoms. There is one special thing to note: the use of -\texttt{-Inf} and \texttt{Inf} as positions. These refer to the top and -bottom (or left and right) limits of the plot. \indexc{Inf} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{presidential <-}\StringTok{ }\KeywordTok{subset}\NormalTok{(presidential, start >}\StringTok{ }\NormalTok{economics$date[}\DecValTok{1}\NormalTok{])} - -\KeywordTok{ggplot}\NormalTok{(economics) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_rect}\NormalTok{(} - \KeywordTok{aes}\NormalTok{(}\DataTypeTok{xmin =} \NormalTok{start, }\DataTypeTok{xmax =} \NormalTok{end, }\DataTypeTok{fill =} \NormalTok{party), } - \DataTypeTok{ymin =} \NormalTok{-}\OtherTok{Inf}\NormalTok{, }\DataTypeTok{ymax =} \OtherTok{Inf}\NormalTok{, }\DataTypeTok{alpha =} \FloatTok{0.2}\NormalTok{, } - \DataTypeTok{data =} \NormalTok{presidential} - \NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_vline}\NormalTok{(} - \KeywordTok{aes}\NormalTok{(}\DataTypeTok{xintercept =} \KeywordTok{as.numeric}\NormalTok{(start)), } - \DataTypeTok{data =} \NormalTok{presidential,} - \DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{, }\DataTypeTok{alpha =} \FloatTok{0.5} - \NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(} - \KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =} \NormalTok{start, }\DataTypeTok{y =} \DecValTok{2500}\NormalTok{, }\DataTypeTok{label =} \NormalTok{name), } - \DataTypeTok{data =} \NormalTok{presidential, } - \DataTypeTok{size =} \DecValTok{3}\NormalTok{, }\DataTypeTok{vjust =} \DecValTok{0}\NormalTok{, }\DataTypeTok{hjust =} \DecValTok{0}\NormalTok{, }\DataTypeTok{nudge_x =} \DecValTok{50} - \NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(date, unemploy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_fill_manual}\NormalTok{(}\DataTypeTok{values =} 
\KeywordTok{c}\NormalTok{(}\StringTok{"blue"}\NormalTok{, }\StringTok{"red"}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/toolbox/unemp-pres-1} -\end{figure} - -You can use the same technique to add a single annotation to a plot, but -it's a bit fiddly because you have to create a one row data frame: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{yrng <-}\StringTok{ }\KeywordTok{range}\NormalTok{(economics$unemploy)} -\NormalTok{xrng <-}\StringTok{ }\KeywordTok{range}\NormalTok{(economics$date)} -\NormalTok{caption <-}\StringTok{ }\KeywordTok{paste}\NormalTok{(}\KeywordTok{strwrap}\NormalTok{(}\StringTok{"Unemployment rates in the US have } -\StringTok{ varied a lot over the years"}\NormalTok{, }\DecValTok{40}\NormalTok{), }\DataTypeTok{collapse =} \StringTok{"}\CharTok{\textbackslash{}n}\StringTok{"}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(economics, }\KeywordTok{aes}\NormalTok{(date, unemploy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_text}\NormalTok{(} - \KeywordTok{aes}\NormalTok{(x, y, }\DataTypeTok{label =} \NormalTok{caption), } - \DataTypeTok{data =} \KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \NormalTok{xrng[}\DecValTok{1}\NormalTok{], }\DataTypeTok{y =} \NormalTok{yrng[}\DecValTok{2}\NormalTok{], }\DataTypeTok{caption =} \NormalTok{caption), } - \DataTypeTok{hjust =} \DecValTok{0}\NormalTok{, }\DataTypeTok{vjust =} \DecValTok{1}\NormalTok{, }\DataTypeTok{size =} \DecValTok{4} - \NormalTok{)} -\end{Highlighting} -\end{Shaded} - -It's easier to use the \texttt{annotate()} helper function which creates -the data frame for you: \indexf{annotate} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(economics, }\KeywordTok{aes}\NormalTok{(date, unemploy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{() +}\StringTok{ } -\StringTok{ 
}\KeywordTok{annotate}\NormalTok{(}\StringTok{"text"}\NormalTok{, }\DataTypeTok{x =} \NormalTok{xrng[}\DecValTok{1}\NormalTok{], }\DataTypeTok{y =} \NormalTok{yrng[}\DecValTok{2}\NormalTok{], }\DataTypeTok{label =} \NormalTok{caption,} - \DataTypeTok{hjust =} \DecValTok{0}\NormalTok{, }\DataTypeTok{vjust =} \DecValTok{1}\NormalTok{, }\DataTypeTok{size =} \DecValTok{4} - \NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/toolbox/unnamed-chunk-5-1} -\end{figure} - -Annotations, particularly reference lines, are also useful when -comparing groups across facets. In the following plot, it's much easier -to see the subtle differences if we add a reference line. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(}\KeywordTok{log10}\NormalTok{(carat), }\KeywordTok{log10}\NormalTok{(price))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~cut, }\DataTypeTok{nrow =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/toolbox/unnamed-chunk-6-1}% -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] - -\NormalTok{mod_coef <-}\StringTok{ }\KeywordTok{coef}\NormalTok{(}\KeywordTok{lm}\NormalTok{(}\KeywordTok{log10}\NormalTok{(price) ~}\StringTok{ }\KeywordTok{log10}\NormalTok{(carat), }\DataTypeTok{data =} \NormalTok{diamonds))} -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(}\KeywordTok{log10}\NormalTok{(carat), }\KeywordTok{log10}\NormalTok{(price))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_abline}\NormalTok{(}\DataTypeTok{intercept =} \NormalTok{mod_coef[}\DecValTok{1}\NormalTok{], }\DataTypeTok{slope =} \NormalTok{mod_coef[}\DecValTok{2}\NormalTok{], } - \DataTypeTok{colour =} 
\StringTok{"white"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{1}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(~cut, }\DataTypeTok{nrow =} \DecValTok{1}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=1\linewidth]{_figures/toolbox/unnamed-chunk-6-2} -\end{figure} - -\hypertarget{sec:grouping}{\section{Collective -geoms}\label{sec:grouping}} - -Geoms can be roughly divided into individual and collective geoms. An -\textbf{individual} geom draws a distinct graphical object for each -observation (row). For example, the point geom draws one point per row. -A \textbf{collective} geom displays multiple observations with one -geometric object. This may be a result of a statistical summary, like a -boxplot, or may be fundamental to the display of the geom, like a -polygon. Lines and paths fall somewhere in between: each line is -composed of a set of straight segments, but each segment represents two -points. How do we control the assignment of observations to graphical -elements? This is the job of the \texttt{group} aesthetic. -\index{Grouping} \indexc{group} \index{Geoms!collective} - -By default, the \texttt{group} aesthetic is mapped to the interaction of -all discrete variables in the plot. This often partitions the data -correctly, but when it does not, or when no discrete variable is used in -a plot, you'll need to explicitly define the grouping structure by -mapping group to a variable that has a different value for each group. - -There are three common cases where the default is not enough, and we -will consider each one below. In the following examples, we will use a -simple longitudinal dataset, \texttt{Oxboys}, from the nlme package. It -records the heights (\texttt{height}) and centered ages (\texttt{age}) -of 26 boys (\texttt{Subject}), measured on nine occasions -(\texttt{Occasion}). \texttt{Subject} and \texttt{Occasion} are stored -as ordered factors.
\index{nlme} \index{Data!Oxboys@\texttt{Oxboys}} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{data}\NormalTok{(Oxboys, }\DataTypeTok{package =} \StringTok{"nlme"}\NormalTok{)} -\KeywordTok{head}\NormalTok{(Oxboys)} -\CommentTok{#> Subject age height Occasion} -\CommentTok{#> 1 1 -1.0000 140 1} -\CommentTok{#> 2 1 -0.7479 143 2} -\CommentTok{#> 3 1 -0.4630 145 3} -\CommentTok{#> 4 1 -0.1643 147 4} -\CommentTok{#> 5 1 -0.0027 148 5} -\CommentTok{#> 6 1 0.2466 150 6} -\end{Highlighting} -\end{Shaded} - -\subsection{Multiple groups, one -aesthetic}\label{multiple-groups-one-aesthetic} - -In many situations, you want to separate your data into groups, but -render them in the same way. In other words, you want to be able to -distinguish individual subjects, but not identify them. This is common -in longitudinal studies with many subjects, where the plots are often -descriptively called spaghetti plots. For example, the following plot -shows the growth trajectory for each boy (each \texttt{Subject}): -\index{Data!longitudinal} \indexf{geom\_line} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(Oxboys, }\KeywordTok{aes}\NormalTok{(age, height, }\DataTypeTok{group =} \NormalTok{Subject)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.6\linewidth]{_figures/toolbox/oxboys-line-1} -\end{figure} - -If you incorrectly specify the grouping variable, you'll get a -characteristic sawtooth appearance: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(Oxboys, }\KeywordTok{aes}\NormalTok{(age, height)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - 
\includegraphics[width=0.6\linewidth]{_figures/toolbox/oxboys-line-bad-1} -\end{figure} - -If a group isn't defined by a single variable, but instead by a -combination of multiple variables, use \texttt{interaction()} to combine -them, e.g. -\texttt{aes(group\ =\ interaction(school\_id,\ student\_id))}. -\indexf{interaction} - -\subsection{Different groups on different -layers}\label{different-groups-on-different-layers} - -Sometimes we want to plot summaries that use different levels of -aggregation: one layer might display individuals, while another displays -an overall summary. Building on the previous example, suppose we want to -add a single smooth line, showing the overall trend for \emph{all} boys. -If we use the same grouping in both layers, we get one smooth per boy: -\indexf{geom\_smooth} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(Oxboys, }\KeywordTok{aes}\NormalTok{(age, height, }\DataTypeTok{group =} \NormalTok{Subject)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{, }\DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.6\linewidth]{_figures/toolbox/layer18-1} -\end{figure} - -This is not what we wanted; we have inadvertently added a smoothed line -for each boy. Grouping controls both the display of the geoms, and the -operation of the stats: one statistical transformation is run for each -group. - -Instead of setting the grouping aesthetic in \texttt{ggplot()}, where it -will apply to all layers, we set it in \texttt{geom\_line()} so it -applies only to the lines. 
There are no discrete variables in the plot -so the default grouping variable will be a constant and we get one -smooth: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(Oxboys, }\KeywordTok{aes}\NormalTok{(age, height)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{Subject)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \StringTok{"lm"}\NormalTok{, }\DataTypeTok{size =} \DecValTok{2}\NormalTok{, }\DataTypeTok{se =} \OtherTok{FALSE}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.6\linewidth]{_figures/toolbox/layer19-1} -\end{figure} - -\subsection{Overriding the default -grouping}\label{overriding-the-default-grouping} - -Some plots have a discrete x scale, but you still want to draw lines -connecting \emph{across} groups. This is the strategy used in -interaction plots, profile plots, and parallel coordinate plots, among -others. For example, imagine we've drawn boxplots of height at each -measurement occasion: \indexf{geom\_boxplot} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(Oxboys, }\KeywordTok{aes}\NormalTok{(Occasion, height)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_boxplot}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.6\linewidth]{_figures/toolbox/oxbox-1} -\end{figure} - -There is one discrete variable in this plot, \texttt{Occasion}, so we -get one boxplot for each unique x value. Now we want to overlay lines -that connect each individual boy.
Simply adding \texttt{geom\_line()} -does not work: the lines are drawn within each occasion, not across -each subject: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(Oxboys, }\KeywordTok{aes}\NormalTok{(Occasion, height)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_boxplot}\NormalTok{() +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\DataTypeTok{colour =} \StringTok{"#3366FF"}\NormalTok{, }\DataTypeTok{alpha =} \FloatTok{0.5}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.6\linewidth]{_figures/toolbox/oxbox-line-bad-1} -\end{figure} - -To get the plot we want, we need to override the grouping to say we want -one line per boy: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(Oxboys, }\KeywordTok{aes}\NormalTok{(Occasion, height)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_boxplot}\NormalTok{() +} -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{Subject), }\DataTypeTok{colour =} \StringTok{"#3366FF"}\NormalTok{, }\DataTypeTok{alpha =} \FloatTok{0.5}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.6\linewidth]{_figures/toolbox/oxbox-line-1} -\end{figure} - -\subsection{Matching aesthetics to graphic objects}\label{sub:matching} - -A final important issue with collective geoms is how the aesthetics of -the individual observations are mapped to the aesthetics of the complete -entity. What happens when different aesthetics are mapped to a single -geometric element? \index{Aesthetics!matching to geoms} - -Lines and paths operate on an off-by-one principle: there is one more -observation than line segment, and so the aesthetic for the first -observation is used for the first segment, the second observation for -the second segment and so on.
This means that the aesthetic for the last -observation is not used: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{colour =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\DecValTok{3}\NormalTok{,}\DecValTok{5}\NormalTok{))} - -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y, }\DataTypeTok{colour =} \KeywordTok{factor}\NormalTok{(colour))) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \DecValTok{1}\NormalTok{), }\DataTypeTok{size =} \DecValTok{2}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{size =} \DecValTok{5}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y, }\DataTypeTok{colour =} \NormalTok{colour)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \DecValTok{1}\NormalTok{), }\DataTypeTok{size =} \DecValTok{2}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{size =} \DecValTok{5}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-7-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-7-2} -\end{figure} - -You could imagine a more complicated system where segments smoothly -blend from one aesthetic to another. This would work for continuous -variables like size or colour, but not for discrete variables, and is -not used in ggplot2. 
If this is the behaviour you want, you can perform -the linear interpolation yourself: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{xgrid <-}\StringTok{ }\KeywordTok{with}\NormalTok{(df, }\KeywordTok{seq}\NormalTok{(}\KeywordTok{min}\NormalTok{(x), }\KeywordTok{max}\NormalTok{(x), }\DataTypeTok{length =} \DecValTok{50}\NormalTok{))} -\NormalTok{interp <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(} - \DataTypeTok{x =} \NormalTok{xgrid,} - \DataTypeTok{y =} \KeywordTok{approx}\NormalTok{(df$x, df$y, }\DataTypeTok{xout =} \NormalTok{xgrid)$y,} - \DataTypeTok{colour =} \KeywordTok{approx}\NormalTok{(df$x, df$colour, }\DataTypeTok{xout =} \NormalTok{xgrid)$y } -\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(interp, }\KeywordTok{aes}\NormalTok{(x, y, }\DataTypeTok{colour =} \NormalTok{colour)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{(}\DataTypeTok{size =} \DecValTok{2}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{data =} \NormalTok{df, }\DataTypeTok{size =} \DecValTok{5}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \centering - \includegraphics[width=0.65\linewidth]{_figures/toolbox/matching-lines2-1} -\end{figure} - -An additional limitation for paths and lines is that line type must be -constant over each individual line. In R there is no way to draw a line -which has varying line type. \indexf{geom\_line} \indexf{geom\_path} - -For all other collective geoms, like polygons, the aesthetics from the -individual components are only used if they are all the same, otherwise -the default value is used. It's particularly clear why this makes sense -for fill: how would you colour a polygon that had a different fill -colour for each point on its border? 
\indexf{geom\_polygon} - -These issues are most relevant when mapping aesthetics to continuous -variables, because, as described above, when you introduce a mapping to -a discrete variable, it will by default split apart collective geoms -into smaller pieces. This works particularly well for bar and area -plots, because stacking the individual pieces produces the same shape as -the original ungrouped data: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(class)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(class, }\DataTypeTok{fill =} \NormalTok{drv)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/bar-split-disc-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/bar-split-disc-2} -\end{figure} - -If you try to map fill to a continuous variable in the same way, it -doesn't work. The default grouping will only be based on \texttt{class}, -so each bar will be given multiple colours. Since a bar can only display -one colour, it will use the default grey. 
To show multiple colours, we -need multiple bars for each \texttt{class}, which we can get by -overriding the grouping: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(class, }\DataTypeTok{fill =} \NormalTok{hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(class, }\DataTypeTok{fill =} \NormalTok{hwy, }\DataTypeTok{group =} \NormalTok{hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/bar-split-cont-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/bar-split-cont-2} -\end{figure} - -The bars will be stacked in the order defined by the grouping variable. -If you need fine control, you'll need to create a factor with levels -ordered as needed. - -\subsection{Exercises}\label{exercises-1} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - Draw a boxplot of \texttt{hwy} for each value of \texttt{cyl}, without - turning \texttt{cyl} into a factor. What extra aesthetic do you need - to set? -\item - Modify the following plot so that you get one boxplot per integer - value of \texttt{displ}. - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(displ, cty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_boxplot}\NormalTok{()} -\end{Highlighting} -\end{Shaded} -\item - When illustrating the difference between mapping continuous and - discrete colours to a line, the discrete example needed - \texttt{aes(group\ =\ 1)}. Why? What happens if that is omitted? - What's the difference between \texttt{aes(group\ =\ 1)} and - \texttt{aes(group\ =\ 2)}? Why? -\item - How many bars are in each of the following plots?
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} - -\KeywordTok{ggplot}\NormalTok{(mpg, }\KeywordTok{aes}\NormalTok{(drv, }\DataTypeTok{fill =} \NormalTok{hwy, }\DataTypeTok{group =} \NormalTok{hwy)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} - -\KeywordTok{library}\NormalTok{(dplyr) } -\NormalTok{mpg2 <-}\StringTok{ }\NormalTok{mpg %>%}\StringTok{ }\KeywordTok{arrange}\NormalTok{(hwy) %>%}\StringTok{ }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{id =} \KeywordTok{seq_along}\NormalTok{(hwy)) } -\KeywordTok{ggplot}\NormalTok{(mpg2, }\KeywordTok{aes}\NormalTok{(drv, }\DataTypeTok{fill =} \NormalTok{hwy, }\DataTypeTok{group =} \NormalTok{id)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - - (Hint: try adding an outline around each bar with - \texttt{colour\ =\ "white"}) -\item - Install the babynames package. It contains data about the popularity - of babynames in the US. Run the following code and fix the resulting - graph. Why does this graph make me unhappy? - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{library}\NormalTok{(babynames)} -\NormalTok{hadley <-}\StringTok{ }\NormalTok{dplyr::}\KeywordTok{filter}\NormalTok{(babynames, name ==}\StringTok{ "Hadley"}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(hadley, }\KeywordTok{aes}\NormalTok{(year, n)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_line}\NormalTok{()} -\end{Highlighting} -\end{Shaded} -\end{enumerate} - -\hypertarget{sec:surface}{\section{Surface plots}\label{sec:surface}} - -ggplot2 does not support true 3d surfaces. However, it does support many -common tools for representing 3d surfaces in 2d: contours, coloured -tiles and bubble plots. These all work similarly, differing only in the -aesthetic used for the third dimension. 
\index{Surface plots} -\index{Contour plot} \indexf{geom\_contour} \index{3d} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(faithfuld, }\KeywordTok{aes}\NormalTok{(eruptions, waiting)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_contour}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{z =} \NormalTok{density, }\DataTypeTok{colour =} \NormalTok{..level..))} - -\KeywordTok{ggplot}\NormalTok{(faithfuld, }\KeywordTok{aes}\NormalTok{(eruptions, waiting)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_raster}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{density))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-11-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-11-2} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Bubble plots work better with fewer observations} -\NormalTok{small <-}\StringTok{ }\NormalTok{faithfuld[}\KeywordTok{seq}\NormalTok{(}\DecValTok{1}\NormalTok{, }\KeywordTok{nrow}\NormalTok{(faithfuld), }\DataTypeTok{by =} \DecValTok{10}\NormalTok{), ]} -\KeywordTok{ggplot}\NormalTok{(small, }\KeywordTok{aes}\NormalTok{(eruptions, waiting)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{size =} \NormalTok{density), }\DataTypeTok{alpha =} \DecValTok{1}\NormalTok{/}\DecValTok{3}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_size_area}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-12-1} -\end{figure} - -For interactive 3d plots, including true 3d surfaces, see RGL, -\url{http://rgl.neoscientists.org/about.shtml}. 
- -\hypertarget{sec:maps}{\section{Drawing maps}\label{sec:maps}} - -\index{Maps!geoms} \index{Data!spatial} - -There are four types of map data you might want to visualise: vector -boundaries, point metadata, area metadata, and raster images. Typically, -assembling these datasets is the most challenging part of drawing maps. -Unfortunately ggplot2 can't help you with that part of the analysis, but -I'll provide some hints about other R packages that you might want to -look at. - -I'll illustrate each of the four types of map data with some maps of -Michigan. - -\subsection{Vector boundaries}\label{vector-boundaries} - -Vector boundaries are defined by a data frame with one row for each -``corner'' of a geographical region like a country, state, or county. It -requires four variables: - -\begin{itemize} -\tightlist -\item - \texttt{lat} and \texttt{long}, giving the location of a point. -\item - \texttt{group}, a unique identifier for each contiguous region. -\item - \texttt{id}, the name of the region. -\end{itemize} - -Separate \texttt{group} and \texttt{id} variables are necessary because -sometimes a geographical unit isn't a contiguous polygon. For example, -Hawaii is composed of multiple islands that can't be drawn using a -single polygon. - -The following code extracts that data from the maps package -using \texttt{ggplot2::map\_data()}. The maps package isn't particularly -accurate or up-to-date, but it's readily available from CRAN, so it's a reasonable place -to start.
\indexf{map\_data} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mi_counties <-}\StringTok{ }\KeywordTok{map_data}\NormalTok{(}\StringTok{"county"}\NormalTok{, }\StringTok{"michigan"}\NormalTok{) %>%}\StringTok{ } -\StringTok{ }\KeywordTok{select}\NormalTok{(}\DataTypeTok{lon =} \NormalTok{long, lat, group, }\DataTypeTok{id =} \NormalTok{subregion)} -\KeywordTok{head}\NormalTok{(mi_counties)} -\CommentTok{#> lon lat group id} -\CommentTok{#> 1 -83.9 44.9 1 alcona} -\CommentTok{#> 2 -83.4 44.9 1 alcona} -\CommentTok{#> 3 -83.4 44.9 1 alcona} -\CommentTok{#> 4 -83.3 44.8 1 alcona} -\CommentTok{#> 5 -83.3 44.8 1 alcona} -\CommentTok{#> 6 -83.3 44.8 1 alcona} -\end{Highlighting} -\end{Shaded} - -You can visualise vector boundary data with \texttt{geom\_polygon()}: -\indexf{geom\_polygon} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mi_counties, }\KeywordTok{aes}\NormalTok{(lon, lat)) +} -\StringTok{ }\KeywordTok{geom_polygon}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{group)) +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_quickmap}\NormalTok{()} - -\KeywordTok{ggplot}\NormalTok{(mi_counties, }\KeywordTok{aes}\NormalTok{(lon, lat)) +} -\StringTok{ }\KeywordTok{geom_polygon}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{group), }\DataTypeTok{fill =} \OtherTok{NA}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_quickmap}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-14-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-14-2} -\end{figure} - -Note the use of \texttt{coord\_quickmap()}: it's a quick and dirty -adjustment that ensures that the aspect ratio of the plot is set -correctly. 
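-
-As a rough sketch of what that adjustment does (an approximation assumed
-here for illustration, not a description of ggplot2's exact computation):
-at latitude $\phi$, one degree of longitude covers about $\cos\phi$ times
-the ground distance of one degree of latitude. To draw both at a common
-ground scale, one unit of $y$ must therefore be drawn longer than one unit
-of $x$ by a factor of roughly
-
-\[
-  \mathrm{aspect} \approx \frac{1}{\cos\phi}.
-\]
-
-Taking $\phi \approx 44.5^\circ$ as a representative latitude for Michigan
-(an assumed round figure) gives $1 / \cos(44.5^\circ) \approx 1.4$, which is
-why the state looks stretched horizontally when longitude and latitude are
-drawn at the same scale.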
- -Other useful sources of vector boundary data are: - -\begin{itemize} -\item - The USAboundaries package, - \url{https://github.com/ropensci/USAboundaries}, which contains state, - county and zip code data for the US. As well as current boundaries, it - also has state and county boundaries going back to the 1600s. -\item - The tigris package, \url{https://github.com/walkerke/tigris}, makes it - easy to access the US Census TIGER/Line shapefiles. It contains state, - county, zip code, and census tract boundaries, as well as many other - useful datasets. -\item - The rnaturalearth package bundles up the free, high-quality data from - \url{http://naturalearthdata.com/}. It contains country borders, and - borders for the top-level region within each country (e.g. states in - the USA, regions in France, counties in the UK). -\item - The osmar package, \url{https://cran.r-project.org/package=osmar}, - wraps up the OpenStreetMap API so you can access a wide range of - vector data including individual streets and buildings. -\item - You may have your own shape files (\texttt{.shp}). You can load them - into R with \texttt{maptools::readShapeSpatial()}. -\end{itemize} - -These sources all generate spatial data frames defined by the sp -package.
You can convert them into a data frame with \texttt{fortify()}: - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{library}\NormalTok{(USAboundaries)} -\NormalTok{c18 <-}\StringTok{ }\KeywordTok{us_boundaries}\NormalTok{(}\KeywordTok{as.Date}\NormalTok{(}\StringTok{"1820-01-01"}\NormalTok{))} -\NormalTok{c18df <-}\StringTok{ }\KeywordTok{fortify}\NormalTok{(c18)} -\CommentTok{#> Regions defined for each Polygons} -\KeywordTok{head}\NormalTok{(c18df)} -\CommentTok{#> long lat order hole piece id group} -\CommentTok{#> 1 -87.6 35 1 FALSE 1 4 4.1} -\CommentTok{#> 2 -87.6 35 2 FALSE 1 4 4.1} -\CommentTok{#> 3 -87.6 35 3 FALSE 1 4 4.1} -\CommentTok{#> 4 -87.6 35 4 FALSE 1 4 4.1} -\CommentTok{#> 5 -87.5 35 5 FALSE 1 4 4.1} -\CommentTok{#> 6 -87.3 35 6 FALSE 1 4 4.1} - -\KeywordTok{ggplot}\NormalTok{(c18df, }\KeywordTok{aes}\NormalTok{(long, lat)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_polygon}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{group), }\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{, }\DataTypeTok{fill =} \OtherTok{NA}\NormalTok{) +} -\StringTok{ }\KeywordTok{coord_quickmap}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-15-1} -\end{figure} - -\subsection{Point metadata}\label{point-metadata} - -Point metadata connects locations (defined by lat and lon) with other -variables. 
For example, the code below extracts the biggest cities in MI -(as of 2006): - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mi_cities <-}\StringTok{ }\NormalTok{maps::us.cities %>%}\StringTok{ } -\StringTok{ }\KeywordTok{tbl_df}\NormalTok{() %>%} -\StringTok{ }\KeywordTok{filter}\NormalTok{(country.etc ==}\StringTok{ "MI"}\NormalTok{) %>%} -\StringTok{ }\KeywordTok{select}\NormalTok{(-country.etc, }\DataTypeTok{lon =} \NormalTok{long) %>%} -\StringTok{ }\KeywordTok{arrange}\NormalTok{(}\KeywordTok{desc}\NormalTok{(pop))} -\NormalTok{mi_cities} -\CommentTok{#> Source: local data frame [36 x 5]} -\CommentTok{#> } -\CommentTok{#> name pop lat lon capital} -\CommentTok{#> (chr) (int) (dbl) (dbl) (int)} -\CommentTok{#> 1 Detroit MI 871789 42.4 -83.1 0} -\CommentTok{#> 2 Grand Rapids MI 193006 43.0 -85.7 0} -\CommentTok{#> 3 Warren MI 132537 42.5 -83.0 0} -\CommentTok{#> 4 Sterling Heights MI 127027 42.6 -83.0 0} -\CommentTok{#> 5 Lansing MI 117236 42.7 -84.5 2} -\CommentTok{#> 6 Flint MI 115691 43.0 -83.7 0} -\CommentTok{#> .. ... ... ... ... ...} -\end{Highlighting} -\end{Shaded} - -We could show this data with a scatterplot, but it's not terribly useful -without a reference. You almost always combine point metadata with -another layer to make it interpretable. 
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(mi_cities, }\KeywordTok{aes}\NormalTok{(lon, lat)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{size =} \NormalTok{pop)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_size_area}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_quickmap}\NormalTok{()} - -\KeywordTok{ggplot}\NormalTok{(mi_cities, }\KeywordTok{aes}\NormalTok{(lon, lat)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_polygon}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \NormalTok{group), mi_counties, }\DataTypeTok{fill =} \OtherTok{NA}\NormalTok{, }\DataTypeTok{colour =} \StringTok{"grey50"}\NormalTok{) +} -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{size =} \NormalTok{pop), }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_size_area}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_quickmap}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-17-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-17-2} -\end{figure} - -\subsection{Raster images}\label{raster-images} - -Instead of displaying context with vector boundaries, you might want to -draw a traditional map underneath. This is called a raster image. The -easiest way to get a raster map of a given area is to use the ggmap -package, which allows you to get data from a variety of online mapping -sources including OpenStreetMap and Google Maps. Downloading the raster -data is often time-consuming, so it's a good idea to cache it in an rds -file.
\index{ggmap} \index{Raster data} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{if (}\KeywordTok{file.exists}\NormalTok{(}\StringTok{"mi_raster.rds"}\NormalTok{)) \{} - \NormalTok{mi_raster <-}\StringTok{ }\KeywordTok{readRDS}\NormalTok{(}\StringTok{"mi_raster.rds"}\NormalTok{)} -\NormalTok{\} else \{} - \NormalTok{bbox <-}\StringTok{ }\KeywordTok{c}\NormalTok{(} - \KeywordTok{min}\NormalTok{(mi_counties$lon), }\KeywordTok{min}\NormalTok{(mi_counties$lat), } - \KeywordTok{max}\NormalTok{(mi_counties$lon), }\KeywordTok{max}\NormalTok{(mi_counties$lat)} - \NormalTok{)} - \NormalTok{mi_raster <-}\StringTok{ }\NormalTok{ggmap::}\KeywordTok{get_openstreetmap}\NormalTok{(bbox, }\DataTypeTok{scale =} \DecValTok{8735660}\NormalTok{)} - \KeywordTok{saveRDS}\NormalTok{(mi_raster, }\StringTok{"mi_raster.rds"}\NormalTok{)} -\NormalTok{\}} -\end{Highlighting} -\end{Shaded} - -(Finding the appropriate \texttt{scale} required a lot of manual -tweaking.) - -You can then plot it with: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{ggmap::}\KeywordTok{ggmap}\NormalTok{(mi_raster)} - -\NormalTok{ggmap::}\KeywordTok{ggmap}\NormalTok{(mi_raster) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{size =} \NormalTok{pop), mi_cities, }\DataTypeTok{colour =} \StringTok{"red"}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_size_area}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -If you have raster data from the raster package, you can convert it to -the form needed by ggplot2 with the following code: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{as.data.frame}\NormalTok{(raster::}\KeywordTok{rasterToPoints}\NormalTok{(x))} -\KeywordTok{names}\NormalTok{(df) <-}\StringTok{ }\KeywordTok{c}\NormalTok{(}\StringTok{"lon"}\NormalTok{, }\StringTok{"lat"}\NormalTok{, }\StringTok{"x"}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(lon, 
lat)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_raster}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{x))} -\end{Highlighting} -\end{Shaded} - -\subsection{Area metadata}\label{area-metadata} - -Sometimes metadata is associated not with a point, but with an area. For -example, we can create \texttt{mi\_census} which provides census -information about each county in MI: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{mi_census <-}\StringTok{ }\NormalTok{midwest %>%} -\StringTok{ }\KeywordTok{tbl_df}\NormalTok{() %>%} -\StringTok{ }\KeywordTok{filter}\NormalTok{(state ==}\StringTok{ "MI"}\NormalTok{) %>%}\StringTok{ } -\StringTok{ }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{county =} \KeywordTok{tolower}\NormalTok{(county)) %>%} -\StringTok{ }\KeywordTok{select}\NormalTok{(county, area, poptotal, percwhite, percblack)} -\NormalTok{mi_census} -\CommentTok{#> Source: local data frame [83 x 5]} -\CommentTok{#> } -\CommentTok{#> county area poptotal percwhite percblack} -\CommentTok{#> (chr) (dbl) (int) (dbl) (dbl)} -\CommentTok{#> 1 alcona 0.041 10145 98.8 0.266} -\CommentTok{#> 2 alger 0.051 8972 93.9 2.374} -\CommentTok{#> 3 allegan 0.049 90509 95.9 1.600} -\CommentTok{#> 4 alpena 0.034 30605 99.2 0.114} -\CommentTok{#> 5 antrim 0.031 18185 98.4 0.126} -\CommentTok{#> 6 arenac 0.021 14931 98.4 0.067} -\CommentTok{#> .. ... ... ... ... ...} -\end{Highlighting} -\end{Shaded} - -We can't map this data directly because it has no spatial component. -Instead, we must first join it to the vector boundaries data. This is -not particularly space efficient, but it makes it easy to see exactly -what data is being plotted. Here I use \texttt{dplyr::left\_join()} to -combine the two datasets and create a choropleth map. 
\index{Choropleth} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{census_counties <-}\StringTok{ }\KeywordTok{left_join}\NormalTok{(mi_census, mi_counties, }\DataTypeTok{by =} \KeywordTok{c}\NormalTok{(}\StringTok{"county"} \NormalTok{=}\StringTok{ "id"}\NormalTok{))} -\NormalTok{census_counties} -\CommentTok{#> Source: local data frame [1,472 x 8]} -\CommentTok{#> } -\CommentTok{#> county area poptotal percwhite percblack lon lat group} -\CommentTok{#> (chr) (dbl) (int) (dbl) (dbl) (dbl) (dbl) (dbl)} -\CommentTok{#> 1 alcona 0.041 10145 98.8 0.266 -83.9 44.9 1} -\CommentTok{#> 2 alcona 0.041 10145 98.8 0.266 -83.4 44.9 1} -\CommentTok{#> 3 alcona 0.041 10145 98.8 0.266 -83.4 44.9 1} -\CommentTok{#> 4 alcona 0.041 10145 98.8 0.266 -83.3 44.8 1} -\CommentTok{#> 5 alcona 0.041 10145 98.8 0.266 -83.3 44.8 1} -\CommentTok{#> 6 alcona 0.041 10145 98.8 0.266 -83.3 44.8 1} -\CommentTok{#> .. ... ... ... ... ... ... ... ...} - -\KeywordTok{ggplot}\NormalTok{(census_counties, }\KeywordTok{aes}\NormalTok{(lon, lat, }\DataTypeTok{group =} \NormalTok{county)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_polygon}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{poptotal)) +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_quickmap}\NormalTok{()} - -\KeywordTok{ggplot}\NormalTok{(census_counties, }\KeywordTok{aes}\NormalTok{(lon, lat, }\DataTypeTok{group =} \NormalTok{county)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_polygon}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{percwhite)) +}\StringTok{ } -\StringTok{ }\KeywordTok{coord_quickmap}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-22-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-22-2} -\end{figure} - -\hypertarget{sec:uncertainty}{\section{Revealing -uncertainty}\label{sec:uncertainty}} - -If you have information about the uncertainty present 
in your data, -whether it be from a model or from distributional assumptions, it's a -good idea to display it. There are four basic families of geoms that can -be used for this job, depending on whether the x values are discrete or -continuous, and whether or not you want to display the middle of the -interval, or just the extent: - -\begin{itemize} -\tightlist -\item - Discrete x, range: \texttt{geom\_errorbar()}, - \texttt{geom\_linerange()} -\item - Discrete x, range \& center: \texttt{geom\_crossbar()}, - \texttt{geom\_pointrange()} -\item - Continuous x, range: \texttt{geom\_ribbon()} -\item - Continuous x, range \& center: - \texttt{geom\_smooth(stat\ =\ "identity")} -\end{itemize} - -These geoms assume that you are interested in the distribution of y -conditional on x and use the aesthetics \texttt{ymin} and \texttt{ymax} -to determine the range of the y values. If you want the opposite, see -\protect\hyperlink{sub:coord-flip}{coord\_flip}. \index{Error bars} -\indexf{geom\_ribbon} \indexf{geom\_smooth} \indexf{geom\_errorbar} -\indexf{geom\_linerange} \indexf{geom\_crossbar} -\indexf{geom\_pointrange} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{y <-}\StringTok{ }\KeywordTok{c}\NormalTok{(}\DecValTok{18}\NormalTok{, }\DecValTok{11}\NormalTok{, }\DecValTok{16}\NormalTok{)} -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DataTypeTok{y =} \NormalTok{y, }\DataTypeTok{se =} \KeywordTok{c}\NormalTok{(}\FloatTok{1.2}\NormalTok{, }\FloatTok{0.5}\NormalTok{, }\FloatTok{1.0}\NormalTok{))} - -\NormalTok{base <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y, }\DataTypeTok{ymin =} \NormalTok{y -}\StringTok{ }\NormalTok{se, }\DataTypeTok{ymax =} \NormalTok{y +}\StringTok{ }\NormalTok{se))} -\NormalTok{base +}\StringTok{ }\KeywordTok{geom_crossbar}\NormalTok{()} -\NormalTok{base +}\StringTok{ }\KeywordTok{geom_pointrange}\NormalTok{()} 
-\NormalTok{base +}\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{stat =} \StringTok{"identity"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/toolbox/unnamed-chunk-23-1}% - \includegraphics[width=0.333\linewidth]{_figures/toolbox/unnamed-chunk-23-2}% - \includegraphics[width=0.333\linewidth]{_figures/toolbox/unnamed-chunk-23-3} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{base +}\StringTok{ }\KeywordTok{geom_errorbar}\NormalTok{()} -\NormalTok{base +}\StringTok{ }\KeywordTok{geom_linerange}\NormalTok{()} -\NormalTok{base +}\StringTok{ }\KeywordTok{geom_ribbon}\NormalTok{()} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/toolbox/unnamed-chunk-24-1}% - \includegraphics[width=0.333\linewidth]{_figures/toolbox/unnamed-chunk-24-2}% - \includegraphics[width=0.333\linewidth]{_figures/toolbox/unnamed-chunk-24-3} -\end{figure} - -Because there are so many different ways to calculate standard errors, -the calculation is up to you. \index{Standard errors} For very simple -cases, ggplot2 provides some tools in the form of summary functions -described below, otherwise you will have to do it yourself. -\protect\hyperlink{cha:modelling}{The modelling chapter} contains more -advice on extracting confidence intervals from more sophisticated -models. - -\hypertarget{sec:weighting}{\section{Weighted -data}\label{sec:weighting}} - -When you have aggregated data where each row in the dataset represents -multiple observations, you need some way to take into account the -weighting variable. We will use some data collected on Midwest states in -the 2000 US census in the built-in \texttt{midwest} data frame. The data -consists mainly of percentages (e.g., percent white, percent below -poverty line, percent with college degree) and some information for each -county (area, total population, population density). 
\index{Weighting} - -There are a few different things we might want to weight by: - -\begin{itemize} -\tightlist -\item - Nothing, to look at numbers of counties. -\item - Total population, to work with absolute numbers. -\item - Area, to investigate geographic effects. (This isn't useful for - \texttt{midwest}, but would be if we had variables like percentage of - farmland.) -\end{itemize} - -The choice of a weighting variable profoundly affects what we are -looking at in the plot and the conclusions that we will draw. There are -two aesthetic attributes that can be used to adjust for weights. -Firstly, for simple geoms like lines and points, use the size aesthetic: - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Unweighted} -\KeywordTok{ggplot}\NormalTok{(midwest, }\KeywordTok{aes}\NormalTok{(percwhite, percbelowpoverty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} - -\CommentTok{# Weight by population} -\KeywordTok{ggplot}\NormalTok{(midwest, }\KeywordTok{aes}\NormalTok{(percwhite, percbelowpoverty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{size =} \NormalTok{poptotal /}\StringTok{ }\FloatTok{1e6}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{scale_size_area}\NormalTok{(}\StringTok{"Population}\CharTok{\textbackslash{}n}\StringTok{(millions)"}\NormalTok{, }\DataTypeTok{breaks =} \KeywordTok{c}\NormalTok{(}\FloatTok{0.5}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{2}\NormalTok{, }\DecValTok{4}\NormalTok{))} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/miss-basic-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/miss-basic-2} -\end{figure} - -For more complicated grobs which involve some statistical -transformation, we specify weights with the \texttt{weight} aesthetic. -These weights will be passed on to the statistical summary function. 
-Weights are supported for every case where it makes sense: smoothers, -quantile regressions, boxplots, histograms, and density plots. You can't -see this weighting variable directly, and it doesn't produce a legend, -but it will change the results of the statistical summary. The following -code shows how weighting by total population affects the relationship -between percent white and percent below the poverty line. - -\begin{Shaded} -\begin{Highlighting}[] -\CommentTok{# Unweighted} -\KeywordTok{ggplot}\NormalTok{(midwest, }\KeywordTok{aes}\NormalTok{(percwhite, percbelowpoverty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{() +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\DataTypeTok{method =} \NormalTok{lm, }\DataTypeTok{size =} \DecValTok{1}\NormalTok{)} - -\CommentTok{# Weighted by population} -\KeywordTok{ggplot}\NormalTok{(midwest, }\KeywordTok{aes}\NormalTok{(percwhite, percbelowpoverty)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{size =} \NormalTok{poptotal /}\StringTok{ }\FloatTok{1e6}\NormalTok{)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_smooth}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{weight =} \NormalTok{poptotal), }\DataTypeTok{method =} \NormalTok{lm, }\DataTypeTok{size =} \DecValTok{1}\NormalTok{) +} -\StringTok{ }\KeywordTok{scale_size_area}\NormalTok{(}\DataTypeTok{guide =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/weight-lm-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/weight-lm-2} -\end{figure} - -When we weight a histogram or density plot by total population, we -change from looking at the distribution of the number of counties, to -the distribution of the number of people.
The following code shows the -difference this makes for a histogram of the percentage below the -poverty line: \index{Histogram!weighted} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(midwest, }\KeywordTok{aes}\NormalTok{(percbelowpoverty)) +} -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\DataTypeTok{binwidth =} \DecValTok{1}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\StringTok{"Counties"}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(midwest, }\KeywordTok{aes}\NormalTok{(percbelowpoverty)) +} -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{weight =} \NormalTok{poptotal), }\DataTypeTok{binwidth =} \DecValTok{1}\NormalTok{) +} -\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\StringTok{"Population (1000s)"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/weight-hist-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/weight-hist-2} -\end{figure} - -\hypertarget{sec:diamonds}{\section{Diamonds data}\label{sec:diamonds}} - -To demonstrate tools for large datasets, we'll use the built in -\texttt{diamonds} dataset, which consists of price and quality -information for \textasciitilde{}54,000 diamonds: - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{diamonds} -\CommentTok{#> Source: local data frame [53,940 x 10]} -\CommentTok{#> } -\CommentTok{#> carat cut color clarity depth table price x y} -\CommentTok{#> (dbl) (fctr) (fctr) (fctr) (dbl) (dbl) (int) (dbl) (dbl)} -\CommentTok{#> 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98} -\CommentTok{#> 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84} -\CommentTok{#> 3 0.23 Good E VS1 56.9 65 327 4.05 4.07} -\CommentTok{#> 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23} -\CommentTok{#> 5 0.31 Good J SI2 63.3 58 335 4.34 4.35} -\CommentTok{#> 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96} -\CommentTok{#> .. ... ... ... ... ... ... ... ... 
...} -\CommentTok{#> Variables not shown: z (dbl)} -\end{Highlighting} -\end{Shaded} - -The data contains the four C's of diamond quality: carat, cut, colour -and clarity; and five physical measurements: depth, table, x, y and z, -as described in Figure \ref{fig:diamond-dim}. -\index{Data!diamonds@\texttt{diamonds}} - -\begin{figure}[htbp] - \centering - \includegraphics[width=0.8\linewidth]{diagrams/diamond-dimensions} - \caption{How the variables x, y, z, table and depth are measured.} - \label{fig:diamond-dim} -\end{figure} - -The dataset has not been well cleaned, so as well as demonstrating -interesting facts about diamonds, it also shows some data quality -problems. - -\hypertarget{sec:distributions}{\section{Displaying -distributions}\label{sec:distributions}} - -There are a number of geoms that can be used to display distributions, -depending on the dimensionality of the distribution, whether it is -continuous or discrete, and whether you are interested in the -conditional or joint distribution. \index{Distributions} - -For 1d continuous distributions the most important geom is the -histogram, \texttt{geom\_histogram()}: \indexf{geom\_histogram} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(depth)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{()} -\CommentTok{#> `stat_bin()` using `bins = 30`. 
Pick better value with} -\CommentTok{#> `binwidth`.} -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(depth)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\DataTypeTok{binwidth =} \FloatTok{0.1}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{55}\NormalTok{, }\DecValTok{70}\NormalTok{)} -\CommentTok{#> Warning: Removed 45 rows containing non-finite values (stat_bin).} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/geom-1d-con-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/geom-1d-con-2} -\end{figure} - -It is important to experiment with binning to find a revealing view. You -can change the \texttt{binwidth}, specify the number of \texttt{bins}, -or specify the exact location of the \texttt{breaks}. Never rely on the -default parameters to get a revealing view of the distribution. Zooming -in on the x axis, \texttt{xlim(55,\ 70)}, and selecting a smaller bin -width, \texttt{binwidth\ =\ 0.1}, reveals far more detail. -\index{Histogram!choosing bins} - -When publishing figures, don't forget to include information about -important parameters (like bin width) in the caption. - -If you want to compare the distribution between groups, you have a few -options: - -\begin{itemize} -\tightlist -\item - Show small multiples of the histogram, - \texttt{facet\_wrap(\textasciitilde{}\ var)}. -\item - Use colour and a frequency polygon, \texttt{geom\_freqpoly()} . - \index{Frequency polygon} \indexf{geom\_freqpoly} -\item - Use a ``conditional density plot'', - \texttt{geom\_histogram(position\ =\ "fill")}. - \index{Conditional density plot} -\end{itemize} - -The frequency polygon and conditional density plots are shown below. The -conditional density plot uses \texttt{position\_fill()} to stack each -bin, scaling it to the same height. 
This plot is perceptually -challenging because you need to compare bar heights, not positions, but -you can see the strongest patterns. \indexf{position\_fill} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(depth)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_freqpoly}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{cut), }\DataTypeTok{binwidth =} \FloatTok{0.1}\NormalTok{, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) +} -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{58}\NormalTok{, }\DecValTok{68}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(depth)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_histogram}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{fill =} \NormalTok{cut), }\DataTypeTok{binwidth =} \FloatTok{0.1}\NormalTok{, }\DataTypeTok{position =} \StringTok{"fill"}\NormalTok{,} - \DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) +} -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{58}\NormalTok{, }\DecValTok{68}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/compare-dist-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/compare-dist-2} -\end{figure} - -(I've suppressed the legends to focus on the display of the data.) - -Both the histogram and frequency polygon geom use the same underlying -statistical transformation: \texttt{stat\ =\ "bin"}. This statistic -produces two output variables: \texttt{count} and \texttt{density}. By -default, count is mapped to y-position, because it's most interpretable. 
-The density is the count divided by the product of the total count and the -bin width, and is useful when you want to compare the shape of the -distributions, not the overall size. \indexf{stat\_bin} - -An alternative to a bin-based visualisation is a density estimate. -\texttt{geom\_density()} places a little normal distribution at each -data point and sums up all the curves. It has desirable theoretical -properties, but is more difficult to relate back to the data. Use a -density plot when you know that the underlying density is smooth, -continuous and unbounded. You can use the \texttt{adjust} parameter to -make the density more or less smooth. \index{Density plot} -\indexf{geom\_density} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(depth)) +} -\StringTok{ }\KeywordTok{geom_density}\NormalTok{(}\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{58}\NormalTok{, }\DecValTok{68}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(depth, }\DataTypeTok{fill =} \NormalTok{cut, }\DataTypeTok{colour =} \NormalTok{cut)) +} -\StringTok{ }\KeywordTok{geom_density}\NormalTok{(}\DataTypeTok{alpha =} \FloatTok{0.2}\NormalTok{, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{58}\NormalTok{, }\DecValTok{68}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position =} \StringTok{"none"}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/geom-density-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/geom-density-2} -\end{figure} - -Note that the area of each density estimate is standardised to
one so -that you lose information about the relative size of each group. - -The histogram, frequency polygon and density display a detailed view of -the distribution. However, sometimes you want to compare many -distributions, and it's useful to have alternative options that -sacrifice quality for quantity. Here are three options: - -\begin{itemize} -\item - \texttt{geom\_boxplot()}: the box-and-whisker plot shows five summary - statistics along with individual ``outliers''. It displays far less - information than a histogram, but also takes up much less space. - \index{Boxplot} \indexf{geom\_boxplot} - - You can use boxplot with both categorical and continuous x. For - continuous x, you'll also need to set the group aesthetic to define - how the x variable is broken up into bins. A useful helper function is - \texttt{cut\_width()}: \indexf{cut\_width} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(clarity, depth)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_boxplot}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(carat, depth)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_boxplot}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \KeywordTok{cut_width}\NormalTok{(carat, }\FloatTok{0.1}\NormalTok{))) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\OtherTok{NA}\NormalTok{, }\FloatTok{2.05}\NormalTok{)} -\CommentTok{#> Warning: Removed 997 rows containing non-finite values} -\CommentTok{#> (stat_boxplot).} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/geom-boxplot-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/geom-boxplot-2} - \end{figure} -\item - \texttt{geom\_violin()}: the violin plot is a compact version of the - density plot. 
The underlying computation is the same, but the results - are displayed in a similar fashion to the boxplot: - \indexf{geom\_violion} \index{Violin plot} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(clarity, depth)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_violin}\NormalTok{()} -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(carat, depth)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_violin}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{group =} \KeywordTok{cut_width}\NormalTok{(carat, }\FloatTok{0.1}\NormalTok{))) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\OtherTok{NA}\NormalTok{, }\FloatTok{2.05}\NormalTok{)} -\CommentTok{#> Warning: Removed 997 rows containing non-finite values} -\CommentTok{#> (stat_ydensity).} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-26-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-26-2} - \end{figure} -\item - \texttt{geom\_dotplot()}: draws one point for each observation, - carefully adjusted in space to avoid overlaps and show the - distribution. It is useful for smaller datasets. - \indexf{geom\_dotplot} \index{Dot plot} -\end{itemize} - -\subsection{Exercises}\label{exercises-2} - -\begin{enumerate} -\def\labelenumi{\arabic{enumi}.} -\item - What binwidth tells you the most interesting story about the - distribution of \texttt{carat}? -\item - Draw a histogram of \texttt{price}. What interesting patterns do you - see? -\item - How does the distribution of \texttt{price} vary with - \texttt{clarity}? -\item - Overlay a frequency polygon and density plot of \texttt{depth}. What - computed variable do you need to map to \texttt{y} to make the two - plots comparable? (You can either modify \texttt{geom\_freqpoly()} or - \texttt{geom\_density()}.) 
-\end{enumerate} - -\hypertarget{sec:overplotting}{\section{Dealing with -overplotting}\label{sec:overplotting}} - -The scatterplot is a very important tool for assessing the relationship -between two continuous variables. However, when the data is large, -points will be often plotted on top of each other, obscuring the true -relationship. In extreme cases, you will only be able to see the extent -of the data, and any conclusions drawn from the graphic will be suspect. -This problem is called \textbf{overplotting}. \index{Overplotting} - -There are a number of ways to deal with it depending on the size of the -data and severity of the overplotting. The first set of techniques -involves tweaking aesthetic properties. These tend to be most effective -for smaller datasets: - -\begin{itemize} -\item - Very small amounts of overplotting can sometimes be alleviated by - making the points smaller, or using hollow glyphs. The following code - shows some options for 2000 points sampled from a bivariate normal - distribution. 
\indexf{geom\_point} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{df <-}\StringTok{ }\KeywordTok{data.frame}\NormalTok{(}\DataTypeTok{x =} \KeywordTok{rnorm}\NormalTok{(}\DecValTok{2000}\NormalTok{), }\DataTypeTok{y =} \KeywordTok{rnorm}\NormalTok{(}\DecValTok{2000}\NormalTok{))} -\NormalTok{norm <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(df, }\KeywordTok{aes}\NormalTok{(x, y)) +}\StringTok{ }\KeywordTok{xlab}\NormalTok{(}\OtherTok{NULL}\NormalTok{) +}\StringTok{ }\KeywordTok{ylab}\NormalTok{(}\OtherTok{NULL}\NormalTok{)} -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()} -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{shape =} \DecValTok{1}\NormalTok{) }\CommentTok{# Hollow circles} -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{shape =} \StringTok{"."}\NormalTok{) }\CommentTok{# Pixel sized} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/toolbox/overp-glyph-1}% - \includegraphics[width=0.333\linewidth]{_figures/toolbox/overp-glyph-2}% - \includegraphics[width=0.333\linewidth]{_figures/toolbox/overp-glyph-3} - \end{figure} -\item - For larger datasets with more overplotting, you can use alpha blending - (transparency) to make the points transparent. If you specify - \texttt{alpha} as a ratio, the denominator gives the number of points - that must be overplotted to give a solid colour. Values smaller than - \textasciitilde{}\(1/500\) are rounded down to zero, giving completely - transparent points. 
\indexc{alpha} \index{Transparency} - \index{Colour!transparency} \index{Alpha blending} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{alpha =} \DecValTok{1} \NormalTok{/}\StringTok{ }\DecValTok{3}\NormalTok{)} -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{alpha =} \DecValTok{1} \NormalTok{/}\StringTok{ }\DecValTok{5}\NormalTok{)} -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\DataTypeTok{alpha =} \DecValTok{1} \NormalTok{/}\StringTok{ }\DecValTok{10}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.333\linewidth]{_figures/toolbox/overp-alpha-1}% - \includegraphics[width=0.333\linewidth]{_figures/toolbox/overp-alpha-2}% - \includegraphics[width=0.333\linewidth]{_figures/toolbox/overp-alpha-3} - \end{figure} -\item - If there is some discreteness in the data, you can randomly jitter the - points to alleviate some overlaps with \texttt{geom\_jitter()}. This - can be particularly useful in conjunction with transparency. By - default, the amount of jitter added is 40\% of the resolution of the - data, which leaves a small gap between adjacent regions. You can - override the default with \texttt{width} and \texttt{height} - arguments. -\end{itemize} - -Alternatively, we can think of overplotting as a 2d density estimation -problem, which gives rise to two more approaches: - -\begin{itemize} -\item - Bin the points and count the number in each bin, then visualise that - count (the 2d generalisation of the histogram), - \texttt{geom\_bin2d()}. Breaking the plot into many small squares can - produce distracting visual artefacts. (D. B. Carr et al. 1987) - suggests using hexagons instead, and this is implemented in - \texttt{geom\_hex()}, using the \textbf{hexbin} package (D. Carr, - Lewin-Koh, and Mächler 2014). 
\index{hexbin} - - The code below compares square and hexagonal bins, using parameters - \texttt{bins} and \texttt{binwidth} to control the number and size of - the bins. \index{Histogram!2d} \indexf{geom\_hexagon} - \indexf{geom\_hex} \indexf{geom\_bin2d} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{()} -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{(}\DataTypeTok{bins =} \DecValTok{10}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/overp-bin-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/overp-bin-2} - \end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_hex}\NormalTok{()} -\NormalTok{norm +}\StringTok{ }\KeywordTok{geom_hex}\NormalTok{(}\DataTypeTok{bins =} \DecValTok{10}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - - \begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/overp-bin-hex-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/overp-bin-hex-2} - \end{figure} -\item - Estimate the 2d density with \texttt{stat\_density2d()}, and then - display using one of the techniques for showing 3d surfaces in - \protect\hyperlink{sec:surface}{surfaces}. -\item - If you are interested in the conditional distribution of y given x, - then the techniques of \protect\hyperlink{sub:distribution}{displaying - distributions} will also be useful. -\end{itemize} - -Another approach to dealing with overplotting is to add data summaries -to help guide the eye to the true shape of the pattern within the data. -For example, you could add a smooth line showing the centre of the data -with \texttt{geom\_smooth()} or use one of the summaries below. 
- -\hypertarget{sec:summary}{\section{Statistical -summaries}\label{sec:summary}} - -\indexf{stat\_summary\_bin} \indexf{stat\_summary\_2d} -\index{Stats!summary} - -\texttt{geom\_histogram()} and \texttt{geom\_bin2d()} use a familiar -geom, \texttt{geom\_bar()} and \texttt{geom\_raster()}, combined with a -new statistical transformation, \texttt{stat\_bin()} and -\texttt{stat\_bin2d()}. \texttt{stat\_bin()} and \texttt{stat\_bin2d()} -combine the data into bins and count the number of observations in each -bin. But what if we want a summary other than count? So far, we've just -used the default statistical transformation associated with each geom. -Now we're going to explore how to use \texttt{stat\_summary\_bin()} to -\texttt{stat\_summary\_2d()} to compute different summaries. - -Let's start with a couple of examples with the diamonds data. The first -example in each pair shows how we can count the number of diamonds in -each bin; the second shows how we can compute the average price. 
- -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(color)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{()} - -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(color, price)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bar}\NormalTok{(}\DataTypeTok{stat =} \StringTok{"summary_bin"}\NormalTok{, }\DataTypeTok{fun.y =} \NormalTok{mean)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-27-1}% - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-27-2} -\end{figure} - -\begin{Shaded} -\begin{Highlighting}[] -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(table, depth)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_bin2d}\NormalTok{(}\DataTypeTok{binwidth =} \DecValTok{1}\NormalTok{, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{50}\NormalTok{, }\DecValTok{70}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylim}\NormalTok{(}\DecValTok{50}\NormalTok{, }\DecValTok{70}\NormalTok{)} - -\KeywordTok{ggplot}\NormalTok{(diamonds, }\KeywordTok{aes}\NormalTok{(table, depth, }\DataTypeTok{z =} \NormalTok{price)) +}\StringTok{ } -\StringTok{ }\KeywordTok{geom_raster}\NormalTok{(}\DataTypeTok{binwidth =} \DecValTok{1}\NormalTok{, }\DataTypeTok{stat =} \StringTok{"summary_2d"}\NormalTok{, }\DataTypeTok{fun =} \NormalTok{mean, } - \DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{xlim}\NormalTok{(}\DecValTok{50}\NormalTok{, }\DecValTok{70}\NormalTok{) +}\StringTok{ } -\StringTok{ }\KeywordTok{ylim}\NormalTok{(}\DecValTok{50}\NormalTok{, }\DecValTok{70}\NormalTok{)} -\end{Highlighting} -\end{Shaded} - -\begin{figure}[H] - \includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-28-1}% - 
\includegraphics[width=0.5\linewidth]{_figures/toolbox/unnamed-chunk-28-2} -\end{figure} - -To get more help on the arguments associated with the two -transformations, look at the help for \texttt{stat\_summary\_bin()} and -\texttt{stat\_summary\_2d()}. You can control the size of the bins and -the summary functions. \texttt{stat\_summary\_bin()} can produce -\texttt{y}, \texttt{ymin} and \texttt{ymax} aesthetics, also making it -useful for displaying measures of spread. See the docs for more details. -You'll learn more about how geoms and stats interact in -\protect\hyperlink{sec:stat}{stats}. - -These summary functions are quite constrained but are often useful for a -quick first pass at a problem. If you find them restraining, you'll need -to do the summaries yourself. See -\protect\hyperlink{sec:summarise}{group-wise summaries} for more -details. - -\hypertarget{sec:elsewhere}{\section{Add-on -packages}\label{sec:elsewhere}} - -If the built-in tools in ggplot2 don't do what you need, you might want -to use a special purpose tool built into one of the packages built on -top of ggplot2. Some of the packages that I was familiar with when the -book was published include: - -\begin{itemize} -\item - animInt, \url{https://github.com/tdhock/animint}, lets you make you - ggplot2 graphics interactive, adding querying, filtering and linking. -\item - GGally, \url{https://github.com/ggobi/ggally}, provides a very - flexible scatterplot matrix, amongst other tools. -\item - ggbio, \url{http://www.tengfei.name/ggbio/}, provides a number of - specialised geoms for genomic data. -\item - ggdendro, \url{https://github.com/andrie/ggdendro}, turns data from - tree methods in to data frames that can easily be displayed with - ggplot2. -\item - ggfortify, \url{https://github.com/sinhrks/ggfortify}, provides - fortify and autoplot methods to handle objects from some popular R - packages. 
-\item - ggenealogy, \url{https://cran.r-project.org/package=ggenealogy}, helps - explore and visualise genealogy data. -\item - ggmcmc, \url{http://xavier-fim.net/packages/ggmcmc/}, provides a set - of flexible tools for visualising the samples generated by MCMC - methods. -\item - ggparallel, \url{https://cran.r-project.org/package=ggparallel}: - easily draw parallel coordinates plots, and the closely related - hammock and common angle plots. -\item - ggtern, \url{http://www.ggtern.com}, lets you use ggplot2 to draw - ternary diagrams, used when you have three variables that always sum - to one. -\item - ggtree, \url{https://github.com/GuangchuangYu/ggtree}, provides tools - to view and annotate phylogenetic tree with different types of - meta-data. -\item - granovaGG, \url{https://github.com/briandk/granovaGG}, provides tools - to visualise ANOVA results. -\item - plotluck, \url{https://github.com/stefan-schroedl/plotluck}: the - ggplot2 version of Google's ``I'm feeling lucky''. It automatically - creates plots for one, two or three variables. -\end{itemize} - -A great place to track new extensions is -\url{http://www.ggplot2-exts.org}, by Daniel Emaasit. - -\section*{References}\label{references} -\addcontentsline{toc}{section}{References} - -\hypertarget{refs}{} -\hypertarget{ref-carr:1987}{} -Carr, D. B., R. J. Littlefield, W. L. Nicholson, and J. S. Littlefield. -1987. ``Scatterplot Matrix Techniques for Large N.'' \emph{Journal of -the American Statistical Association} 82 (398): 424--36. - -\hypertarget{ref-hexbin}{} -Carr, Dan, Nicholas Lewin-Koh, and Martin Mächler. 2014. \emph{Hexbin: -Hexagonal Binning Routines}. 
diff --git a/common.R b/common.R
index 06c1fe0e..08585d04 100644
--- a/common.R
+++ b/common.R
@@ -1,6 +1,9 @@
 library(ggplot2)
 library(dplyr)
+conflicted::conflict_prefer("filter", "dplyr")
 library(tidyr)
+conflicted::conflict_prefer("extract", "tidyr")
+
 options(digits = 3, dplyr.print_min = 6, dplyr.print_max = 6)
 
 # suppress startup message
@@ -103,7 +106,7 @@ include_graphics <- function(x, options) {
 
   paste0(" \\includegraphics",
     opts_str,
-    "{", knitr:::sans_ext(x), "}",
+    "{", tools::file_path_sans_ext(x), "}",
     if (options$fig.cur != options$fig.num) "%",
     "\n"
   )
diff --git a/position.rmd b/position.rmd
index 5c09962a..16f3a5da 100644
--- a/position.rmd
+++ b/position.rmd
@@ -443,7 +443,7 @@ Internally ggplot2 uses many more segments so that the result looks smooth.
 Like limits, we can also transform the data in two places: at the scale level or at the coordinate system level. `coord_trans()` has arguments `x` and `y` which should be strings naming the transformer or transformer objects (see [continous position scales](#sub:scale-position)). Transforming at the scale level occurs before statistics are computed and does not change the shape of the geom. Transforming at the coordinate system level occurs after the statistics have been computed, and does affect the shape of the geom. Using both together allows us to model the data on a transformed scale and then backtransform it for interpretation: a common pattern in analysis. \index{Transformation!coordinate system} \index{Coordinate systems!transformed} \indexf{coord\_trans}
 
 `r columns(3, 1)`
-```{r backtrans}
+```{r backtrans, warning=FALSE}
 # Linear model on original scale is poor fit
 base <- ggplot(diamonds, aes(carat, price)) +
   stat_bin2d() +
diff --git a/scales.rmd b/scales.rmd
index e97ca17f..36a8fbba 100644
--- a/scales.rmd
+++ b/scales.rmd
@@ -219,8 +219,8 @@ See the documentation of the scales package for more details.
 `r columns(3)`
 ```{r breaks-functions}
 axs + scale_y_continuous(labels = scales::percent_format())
-axs + scale_y_continuous(labels = scales::dollar_format("$"))
-leg + scale_fill_continuous(labels = scales::unit_format("k", 1e-3))
+axs + scale_y_continuous(labels = scales::dollar_format(prefix = "$"))
+leg + scale_fill_continuous(labels = scales::unit_format(unit = "k", scale = 1e-3))
 ```
 
 You can adjust the minor breaks (the faint grid lines that appear between the major grid lines) by supplying a numeric vector of positions to the `minor_breaks` argument. This is particularly useful for log scales: \index{Minor breaks}
@@ -247,7 +247,7 @@ Note the use of `%o%` to quickly generate the multiplication table, and that the
 ```{r, echo = FALSE}
 ggplot(mpg, aes(displ, hwy)) +
   geom_point() +
-  scale_x_continuous("Displacement", labels = scales::unit_format("L")) +
+  scale_x_continuous("Displacement", labels = scales::unit_format(suffix = "L")) +
   scale_y_continuous(quote(paste("Highway ", (frac(miles, gallon)))))
 ```
 
@@ -868,9 +868,9 @@ bars <- ggplot(df, aes(x, y, fill = x)) +
 
 ```{r}
 library(wesanderson)
- bars + scale_fill_manual(values = wes_palette("GrandBudapest"))
- bars + scale_fill_manual(values = wes_palette("Zissou"))
- bars + scale_fill_manual(values = wes_palette("Rushmore"))
+ bars + scale_fill_manual(values = wes_palette("GrandBudapest1"))
+ bars + scale_fill_manual(values = wes_palette("Zissou1"))
+ bars + scale_fill_manual(values = wes_palette("Rushmore1"))
 ```
 
 Note that one set of colours is not uniformly good for all purposes: bright colours work well for points, but are overwhelming on bars. Subtle colours work well for bars, but are hard to see on points:
diff --git a/toolbox.rmd b/toolbox.rmd
index 9620b553..c6ddbc8c 100644
--- a/toolbox.rmd
+++ b/toolbox.rmd
@@ -682,7 +682,9 @@ These sources all generate spatial data frames defined by the sp package. You ca
 
 ```{r}
 library(USAboundaries)
+library(sf)
 c18 <- us_boundaries(as.Date("1820-01-01"))
+c18 <- as(c18, "Spatial")
 c18df <- fortify(c18)
 head(c18df)