# ggplot2

R can produce high-quality graphics using the ggplot2 package. The "gg" in the name comes from Leland Wilkinson's book *The Grammar of Graphics*, a classic in the field.

We'll explore scatter plots in detail to give you a feel for how ggplot2 works and then touch briefly on some other graphing and plotting methods.

In [None]:
library(ggplot2)  # unlike require(), library() stops with an error if the package is not installed

The learning curve for ggplot2 can be a little steeper than for some other plotting packages, but once you've mastered the fundamentals you'll appreciate the power of ggplot2. Two key points to keep in mind:

+ ggplot2 works on data frames
+ plots can be built up by adding successive layers
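
As a minimal sketch of the first point, data held in separate vectors has to be gathered into a data frame before ggplot2 can plot it. The vectors and column names below are made up for illustration.

```r
library(ggplot2)

# Hypothetical measurements stored as plain vectors
temperature <- c(12, 15, 19, 23, 27)
sales <- c(110, 135, 180, 240, 310)

# ggplot() expects a data frame, so combine the vectors first
df <- data.frame(temperature = temperature, sales = sales)

# Columns are then referred to by name inside aes()
p <- ggplot(df, aes(x = temperature, y = sales)) + geom_point()
```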

Let's revisit the mtcars data set and the relationship between horsepower (hp) and fuel efficiency (mpg).

In [None]:
head(mtcars,3)

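We'll start by making a simple scatter plot, but first we'll use the options function to set the plots to a convenient size. The repr.plot.width and repr.plot.height options are read by the Jupyter R kernel and are given in inches.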

In [None]:
options(repr.plot.width=4, repr.plot.height=3)

In [None]:
g <- ggplot(mtcars, aes(x=hp, y=mpg)) + geom_point()
plot(g)

A few comments on the previous cell:

+ Note that we saved the plot as 'g' and then displayed it using plot. We can go back later and add additional layers to g
+ The data frame is the first argument to ggplot and we used an aesthetic mapping (aes) so that the hp and mpg columns serve as the x and y data
+ geom_point() adds the scatter plot layer to the graph
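
To illustrate the point about adding layers later, here is a small sketch that adds a linear trend line to the saved plot. The choice of geom_smooth with method = "lm" is just one example of an extra layer; nothing else in this section depends on it.

```r
library(ggplot2)

g <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()

# Adding a layer returns a new plot object; g itself is unchanged
g_trend <- g + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

length(g$layers)        # number of layers in the original plot
length(g_trend$layers)  # one more layer after adding the trend line
```

Because g is left untouched, the same base plot can be reused as the starting point for several variations.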

Starting with our previously saved plot, we can use the labs function to add a title, subtitle, x-axis label and y-axis label.

In [None]:
g <- g + labs(title="HP vs. MPG", subtitle="Exploring relationship between power and efficiency", 
              x="Horsepower", y="Miles per gallon")
plot(g)
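
The repr.plot.width and repr.plot.height options set earlier only affect plots shown inline in the notebook. Outside a notebook, one way to control the figure size is to pass width and height to ggsave when writing the plot to a file. A small sketch, where the temporary PDF path is only a placeholder:

```r
library(ggplot2)

g <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "HP vs. MPG", x = "Horsepower", y = "Miles per gallon")

# Placeholder output path; substitute a real file name as needed
out_file <- tempfile(fileext = ".pdf")

# Width and height are in inches by default
ggsave(out_file, plot = g, width = 4, height = 3)
```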

## A brief digression - getting math formatting into figures

Unlike Python's matplotlib, ggplot2 doesn't have a convenient way of formatting math equations and special characters using LaTeX syntax. Instead, labels can be given as R expressions, which are rendered using R's built-in plotmath notation (see ?plotmath). This is more limited than LaTeX, as demonstrated below for the x- and y-axis labels.

In [None]:
g <- g + labs(title="HP vs. MPG", subtitle="Exploring relationship between power and efficiency", 
              x=expression(Omega + lambda^2), y=expression(alpha - beta))
plot(g)
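
In practice, math notation is mostly needed for units and symbols in axis labels. As a sketch using plotmath syntax, ~ inserts a space, ^ gives a superscript, and %.% draws a centered dot. The label wording here is illustrative and not part of the example above.

```r
library(ggplot2)

# plotmath: ~ adds a space, ^ makes a superscript, %.% is a centered dot
x_label <- expression(Power ~ (hp))
y_label <- expression(Fuel ~ economy ~ (mi %.% gal^-1))

g_math <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(x = x_label, y = y_label)

plot(g_math)
```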

## A second brief digression - writing R statements over multiple lines

R has a clever way of handling code that runs over several lines. If a line forms a complete expression, with no operator left waiting for an operand and every open parenthesis matched by a closing parenthesis, R evaluates it as is. Otherwise, R continues reading on the next line. We can therefore write multiline expressions by ending a line with an operator such as + or by leaving a parenthesis open. We demonstrate this below, but note that we've already been taking advantage of this feature.

In [None]:
g <- ggplot(mtcars, aes(x=hp, y=mpg)) +
  geom_point() +
  labs(title="HP vs. MPG",
       subtitle="Exploring relationship between power and efficiency",
       x="Horsepower",
       y="Miles per gallon")

plot(g)
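
The placement of the operator matters. A minimal sketch with plain numbers shows what happens when the + is moved to the start of the next line (the variable names are only for illustration):

```r
# Operator at the end of the line: R keeps reading onto the next line
total_a <- 1 +
  2

# Operator at the start of the next line: the first line is already a
# complete expression, so the second line is evaluated separately
total_b <- 1
+ 2

total_a  # 3
total_b  # 1
```

The same rule applies to ggplot2 code: a line that begins with `+ geom_point()` is evaluated on its own and typically fails with an error instead of adding a layer to the plot.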

## Changing the marker colors, shapes, sizes and alphas

The marker properties can be specified in the geom_point function. Shapes are traditionally identified by number rather than by name (17, used below, is a filled triangle); a list of shapes is available at http://www.cookbook-r.com/Graphs/Shapes_and_line_types/. Recent versions of ggplot2 (3.0.0 and later) also accept shape names such as "triangle".

In [None]:
g <- ggplot(mtcars, aes(x=hp, y=mpg)) + 
geom_point(col="blue", size=3, shape=17, alpha=0.5)

plot(g)
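
As an alternative to looking up the shape numbers online, the following sketch draws a reference chart of shapes 0 to 25 directly in ggplot2. The grid layout and the column names code, col and row are arbitrary choices made for this example.

```r
library(ggplot2)

# One row per shape code, arranged on a 7-column grid for display
shapes <- data.frame(
  code = 0:25,
  col = (0:25) %% 7,
  row = -((0:25) %/% 7)
)

shape_chart <- ggplot(shapes, aes(x = col, y = row)) +
  geom_point(aes(shape = code), size = 4) +                 # draw each shape
  geom_text(aes(label = code), nudge_y = -0.35, size = 3) + # label it with its number
  scale_shape_identity() +                                  # use the code values as-is
  theme_void()

plot(shape_chart)
```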

The geom_point function also lets you vary the marker properties based on the value of another column. Below, we color the markers according to the number of cylinders (cyl) and scale their size by weight (wt). We wrap cyl in factor() so that it is treated as categorical rather than numeric data.

In [None]:
g <- ggplot(mtcars, aes(x=hp, y=mpg)) + 
geom_point(aes(size = wt, color = factor(cyl)), alpha=0.8)

plot(g)
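
To see why the conversion to a factor matters, the sketch below builds the plot both ways. With the numeric column, ggplot2 uses a continuous color gradient; with the factor, each cylinder count gets its own distinct color. The names g_numeric and g_factor are only for illustration.

```r
library(ggplot2)

class(mtcars$cyl)            # cyl is stored as a number
levels(factor(mtcars$cyl))   # the distinct categories after conversion

# Numeric mapping: continuous color gradient
g_numeric <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(aes(color = cyl))

# Factor mapping: one distinct color per cylinder count
g_factor <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(aes(color = factor(cyl)))
```

Calling plot on each object shows the difference in the legend.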

## Changing the axis limits

Axis limits are set using the coord_cartesian function.

In [None]:
g <- ggplot(mtcars, aes(x=hp, y=mpg)) + 
geom_point(aes(size = wt, color = factor(cyl)), alpha=0.8) +
coord_cartesian(xlim=c(0,400), ylim=c(0, 35))

plot(g)
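
ggplot2 also offers xlim() and ylim(), which behave differently: they set scale limits, so observations outside the range are dropped before any statistics are computed, whereas coord_cartesian only zooms the view. A sketch of the difference, using a fitted trend line (an illustrative addition) to make the effect visible:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Zoom only: the line is still fitted to all 32 cars
zoomed <- base + coord_cartesian(xlim = c(50, 200))

# Scale limits: cars outside 50-200 hp are removed before fitting,
# so the fitted line can differ (ggplot2 warns about removed rows)
trimmed <- base + xlim(50, 200)

# Number of cars that xlim(50, 200) would drop
sum(mtcars$hp < 50 | mtcars$hp > 200)
```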

## Changing the color palettes

The RColorBrewer package contains a collection of color palettes. More information on these palettes can be found at https://moderndata.plot.ly/create-colorful-graphs-in-r-with-rcolorbrewer-and-plotly/.

To set the color palette, use the scale_colour_brewer function.

In [None]:
library(RColorBrewer)

g <- ggplot(mtcars, aes(x=hp, y=mpg)) + 
geom_point(aes(size = wt, color = factor(cyl)), alpha=0.8) +
coord_cartesian(xlim=c(0,400), ylim=c(0, 35)) +
scale_colour_brewer(palette = "Dark2")

plot(g)
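
To see which colors a palette contains, or to assign them to levels explicitly, the brewer.pal function from RColorBrewer returns the hex codes. In the sketch below, the legend title "Cylinders" and the explicit level-to-color assignment are illustrative choices.

```r
library(ggplot2)
library(RColorBrewer)

# Hex codes of the first three colors in the "Dark2" palette
dark2 <- brewer.pal(3, "Dark2")

# Name each color after the cylinder level it should represent
names(dark2) <- c("4", "6", "8")

g <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(aes(size = wt, color = factor(cyl)), alpha = 0.8) +
  scale_colour_manual(values = dark2, name = "Cylinders")
```

Naming the vector ties each color to a specific level, so the assignment does not change if a level happens to be missing from the data.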