# 1 Introduction

In this chapter we'll get you into the right frame of mind for developing meaningful visualizations with R. Because visualizations are a communication tool, you need to think about your audience first. You'll also be introduced to the basics of ggplot2: the seven grammatical elements (layers) and aesthetic mappings.
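As a rough sketch of how the seven grammatical elements fit together, the snippet below stacks one of each onto a single mtcars plot. The specific choices (faceting by am, a linear stat_smooth(), the y-axis limits, theme_minimal()) are illustrative assumptions and are not taken from the course exercises.

```r
library(ggplot2)

# One plot that touches each of the seven grammatical elements
p_layers <- ggplot(mtcars, aes(x = wt, y = mpg)) +   # 1. data, 2. aesthetics
  geom_point() +                                     # 3. geometries
  stat_smooth(method = "lm", se = FALSE) +           # 4. statistics (illustrative choice)
  facet_wrap(vars(am)) +                             # 5. facets (illustrative choice)
  coord_cartesian(ylim = c(10, 35)) +                # 6. coordinates (illustrative limits)
  theme_minimal()                                    # 7. themes (illustrative choice)

# Typing the variable name draws the plot
p_layers
```

Only the first three elements (data, aesthetics, geometries) are needed to draw something; the rest refine an existing plot.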

# Explore and explain

In this video we made the distinction between plots for exploring and plots for explaining data. Which of the following is typically NOT true of exploratory plots?

# Select one answer

( ) Meant for a specialist audience.

( ) Data-heavy.

(x) Pretty.

( ) Rough first drafts.

( ) Part of our data science toolkit as graphical data analysis.

# Drawing your first plot

To get a first feel for ggplot2, let's run some basic ggplot2 commands. The mtcars dataset contains information on 32 cars from the 1974 Motor Trend US magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.

# Instructions:

- Load the ggplot2 package using library().
- Use str() to explore the structure of the mtcars dataset.
- Hit submit. This will execute the example code on the right. See if you can understand what ggplot does with the data.

In [None]:
# Load the ggplot2 package
library(ggplot2)

# Explore the mtcars data frame with str()
str(mtcars)

# Execute the following command
ggplot(mtcars, aes(cyl, mpg)) +
  geom_point()

# Data column types affect plot types

The plot from the previous exercise wasn't really satisfying. Although cyl (the number of cylinders) is categorical, you probably noticed that it is stored as numeric in mtcars. This is misleading because the continuous axis in the plot doesn't match the actual data type. You'll have to explicitly tell ggplot2 that cyl is a categorical variable.
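The mismatch can be checked directly in the console. The sketch below uses only base R; the variable name cyl_factor is a hypothetical helper introduced for illustration.

```r
# cyl is stored as a plain numeric column in mtcars
class(mtcars$cyl)

# It only ever takes a handful of distinct values
sort(unique(mtcars$cyl))

# Converting to a factor turns those values into discrete levels
cyl_factor <- factor(mtcars$cyl)  # hypothetical helper name
levels(cyl_factor)
```

Wrapping cyl in factor() inside aes() performs the same conversion on the fly, without modifying mtcars itself.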

# Instructions:

- Change the ggplot() command by wrapping factor() around cyl.
- Hit submit and see if the resulting plot is better this time.

In [None]:
# Load the ggplot2 package
library(ggplot2)

# Change the command below so that cyl is treated as factor
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point()

# Mapping data columns to aesthetics

Let's dive a little deeper into the three main topics in this course: the data, aesthetics, and geom layers. We'll get to making pretty plots in the last chapter with the themes layer.

We'll continue working on the 32 cars in the mtcars data frame.

Consider how the examples and concepts we discuss throughout these courses apply to your own datasets!

# Instructions:

- Add a color aesthetic mapped to the displacement of the car engine: inside aes(), add a color argument equal to disp.

In [None]:
# Edit to add a color aesthetic mapped to disp
ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
  geom_point()

- This time, map disp to the size aesthetic.

In [None]:
# Change the color aesthetic to a size aesthetic
ggplot(mtcars, aes(x = wt, y = mpg, size = disp)) +
  geom_point()

# Understanding variables

In the previous exercise you saw that disp can be mapped onto a color gradient or onto a continuous size scale.

Another argument of aes() is the shape of the points. There is a finite number of shapes that ggplot() can automatically assign to the points. However, suppose you try this command in the console:

ggplot(mtcars, aes(wt, mpg, shape = disp)) +
  geom_point()

It gives an error. What does this mean?

# Select one answer

( ) shape is not a defined argument.

(x) shape only makes sense with categorical data, and disp is continuous.

( ) shape only makes sense with continuous data, and disp is categorical.

( ) shape is not a variable in your dataset.

( ) shape has to be defined as a function.
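To make the contrast concrete, here is a minimal sketch that maps shape to a categorical variable instead. Using factor(cyl) is an assumption chosen for illustration; the names p_shape, bad_plot, and outcome are hypothetical. The second half shows that the failing plot object can be created, and that the error only appears once ggplot2 tries to build it.

```r
library(ggplot2)

# shape works when it is mapped to a categorical variable
p_shape <- ggplot(mtcars, aes(wt, mpg, shape = factor(cyl))) +
  geom_point()
p_shape

# Creating the plot object with a continuous variable does not fail...
bad_plot <- ggplot(mtcars, aes(wt, mpg, shape = disp)) +
  geom_point()

# ...the error is raised when the plot is built (which printing does)
outcome <- tryCatch(
  {
    ggplot_build(bad_plot)
    "no error"
  },
  error = function(e) "error"
)
outcome
```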

# Adding geometries

The diamonds dataset contains details of 1,000 diamonds (the full version bundled with ggplot2 has 53,940 rows). Among the variables included are carat (a measurement of the diamond's weight) and price.

You'll use two common geom layer functions:

- geom_point() adds points (as in a scatter plot).
- geom_smooth() adds a smooth trend curve.

As you saw previously, these are added using the + operator.

ggplot(data, aes(x, y)) +
  geom_*()

Where * is the specific geometry needed.

# Instructions:

- Explore the diamonds data frame with the str() function.

In [None]:
# Explore the diamonds data frame with str()
str(diamonds)

- Edit the plot code to add a point geom. Use the + operator to add geom_point() to the ggplot() command.

In [None]:
# Add geom_point() with +
ggplot(diamonds, aes(carat, price)) +
  geom_point()

- Add a smooth geom to the plot. Use the + operator to add geom_smooth().

In [None]:
# Add geom_smooth() with +
ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  geom_smooth()

# Changing one geom or every geom

If you have multiple geoms, then mapping an aesthetic to a data variable inside the call to ggplot() will change all the geoms. It is also possible to make changes to individual geoms by passing arguments to the geom_*() functions.

geom_point() has an alpha argument that controls the opacity of the points. A value of 1 (the default) means that the points are totally opaque; a value of 0 means the points are totally transparent (and therefore invisible). Values in between make the points partially transparent.

The plot you drew last time is provided in the script.

# Instructions:

- Edit the plot code to map the color aesthetic to the clarity data variable.

In [None]:
# Map the color aesthetic to clarity
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point() +
  geom_smooth()

- Make the points translucent by setting the alpha argument to 0.4.

In [None]:
# Make the points 40% opaque
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point(alpha = 0.4) +
  geom_smooth()
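The difference between changing every geom and changing one geom can be sketched by moving the color mapping. In the version below, p_global maps color in ggplot(), so the smooth layer inherits it; p_local maps color only inside geom_point(). The names p_global, p_local, and the group counters are hypothetical, and method = "lm" is an assumption used only to keep the sketch fast (the exercise uses the default smoother).

```r
library(ggplot2)

# Color mapped in ggplot(): inherited by the points and by the smooth
p_global <- ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm")  # "lm" is an assumption, chosen for speed

# Color mapped only in geom_point(): the smooth ignores clarity
p_local <- ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(color = clarity), alpha = 0.4) +
  geom_smooth(method = "lm")

# Count how many groups the smooth layer (layer 2) was fitted for:
# one per clarity level in p_global, a single group in p_local
n_groups_global <- length(unique(layer_data(p_global, 2)$group))
n_groups_local <- length(unique(layer_data(p_local, 2)$group))
```

With the global mapping you get one trend line per clarity level; with the local mapping you get a single trend line over colored points.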

# Saving plots as variables

Plots can be saved as variables, which can be added to later on using the + operator. This is really useful if you want to make multiple related plots from a common base.

# Instructions:

- Using the diamonds dataset, plot the price (y-axis) versus the carat (x-axis), assigning to plt_price_vs_carat.
- Using geom_point(), add a point layer to plt_price_vs_carat.

In [None]:
# Draw a ggplot
plt_price_vs_carat <- ggplot(
  # Use the diamonds dataset
  diamonds,
  # For the aesthetics, map x to carat and y to price
  aes(carat, price)
)

# Add a point layer to plt_price_vs_carat
plt_price_vs_carat + geom_point()

- Add an alpha argument to the point layer to make the points 20% opaque, assigning to plt_price_vs_carat_transparent.
- Type the plot's variable name (plt_price_vs_carat_transparent) to display it.

In [None]:
# From previous step
plt_price_vs_carat <- ggplot(diamonds, aes(carat, price))

# Edit this to make points 20% opaque: plt_price_vs_carat_transparent
plt_price_vs_carat_transparent <- plt_price_vs_carat + geom_point(alpha = 0.2)

# See the plot
plt_price_vs_carat_transparent

- Inside geom_point(), call aes() and map color to clarity, assigning to plt_price_vs_carat_by_clarity.
- Type the plot's variable name (plt_price_vs_carat_by_clarity) to display it.

In [None]:
# From previous step
plt_price_vs_carat <- ggplot(diamonds, aes(carat, price))

# Edit this to map color to clarity,
# and assign the updated plot to a new object
plt_price_vs_carat_by_clarity <- plt_price_vs_carat + geom_point(aes(color = clarity))

# See the plot
plt_price_vs_carat_by_clarity