# L01-E2-Scatterplots
## Exercise Instructions

* Complete all cells as instructed, replacing any ??? with the appropriate code

* Execute Jupyter **Kernel** > **Restart & Run All** and ensure that all code blocks run without error


# Scatterplots

We have been looking at vehicle fuel economy using a scatterplot of miles per gallon and the size of the vehicle’s engine as measured in piston displacement. Scatterplots are useful to show the relationship between two variables. It is good practice to explore all the data variables available in the dataset. There are other variables that are related to the size of the engine. 

We’ll look at these variables to see if they reveal any new information. Using the same datasets, mtcars and mpg, we will deepen our understanding of the functions you previously used, namely library(), glimpse(), ggplot() and geom_point(). We will also introduce comments, a key feature for making your code more readable to other people, including your future self.


## R Features
* library()
* class()
* glimpse()
* ggplot()
* geom_point()

## Datasets
* mtcars
* mpg

## Comments
R code comments start with a \# (pound sign). They can be on their own line or at the end of a line of code. All text after the \# is ignored by R. Comments serve two purposes:
1. Communicate something via text to other humans...or to your future self
2. Keep a line of code from executing, often called 'commented out' code

Feel free to add your own comments in any of the exercises. Comments don't affect how the code runs, and writing them is good practice.
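
A minimal sketch of both uses is shown here. The variable name `engine_size` is made up for illustration and is not part of the exercise.

```r
# Purpose 1: a comment on its own line explains the code to a human reader
engine_size <- 2.5  # a comment can also follow code on the same line

# Purpose 2: the next line is 'commented out', so R skips it
# engine_size <- 100

print(engine_size)
```

Because the second assignment is commented out, `engine_size` keeps the value 2.5.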

## Jupyter Code Blocks
Jupyter code blocks are like the one shown below. They simply group the code so you can run all those lines at once either by clicking the run button on the toolbar or by holding down the control key and pressing Enter.

To the left of the code block is In []:. Inside the square brackets is the status of that block:
* blank - it hasn't been run before
* \* - it is currently running
* (number) - it has finished running. The number is just an execution counter and goes up by one every time any code block is run in the notebook.

In [None]:
# Run this block (click the run button on
# the toolbar or use the Ctrl+Enter keyboard shortcut)

# Does anything happen?

# Run it a few more times. You will notice that in the upper left of this block
# In [n]: there is a number for the 'n'. That number increases each
# time you run the block. 
# Try it again to see.

Did you notice the In [ ]: above changing?

## Error messages
You will deliberately run a line of code that produces an error. 

Run the code block to see the error, then comment out that line of code by placing a \# at the beginning of the line.

In [None]:
# The line of code below is broken
# Run it to see the error message
# Pretty terse message. 
# Comment out this line,
# We'll fix it in the next code block
library( ??? )

It produces an error something like:

` Error in parse(text = x, srcfile = src): <text>:6:12: unexpected ')'  
5: # We'll fix it in the next code block 
6: library(???) 
              ^
Traceback: `



## Updating ??? (question mark blanks)
Throughout this course you will get starter code to guide you. This starter code needs to be edited to get to the solution. To guide you, some of the solution code has been replaced with ??? . Delete the ??? and type the appropriate code in its place.

The length of the ??? doesn't reflect the size of the replacement code. The replacement could be anything from a single character to a longer expression.
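
As a small illustration, here is what starter code and a completed version might look like. The function `sqrt()` and the value 16 are chosen only as an example and are not part of this exercise.

```r
# Starter code as it would be given (kept as a comment here so this block runs):
# sqrt( ??? )

# Completed version: the ??? is deleted and a value is typed in its place
root <- sqrt(16)
print(root)
```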



## Library
Load the tidyverse library in the code block below. Note that tidyverse doesn't need to be enclosed in quotes. Some functions require their arguments to be in quotes while others don't. library() actually works both ways, but the convention is to leave the quotes off unless they are required. That is more of a human readability style than anything else.
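
The two styles can be compared side by side. This sketch uses the `stats` package only because it ships with every R installation; it stands in for any installed package.

```r
# 'stats' ships with R; it stands in here for any installed package
library(stats)    # unquoted: the conventional style
library("stats")  # quoted: also accepted by library()

# Confirm the package is attached to the search path
stats_attached <- "package:stats" %in% search()
print(stats_attached)
```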

In [None]:
# Load the tidyverse library
# Do this by editing the line below replacing the 
# ??? placeholder with the word tidyverse
# After you fix it, run the code block

library( ??? )  # Update this and run

## Class
The class() function reports the class (the kind or type) of an object. Different classes organize the data differently, and functions are designed to work with certain classes.
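
To get a feel for what class() returns, here is a short sketch that calls it on a few simple values. The values and variable names are illustrative only; `iris` is a data frame that ships with R.

```r
# class() on a few simple values (illustrative, not exercise data)
class_of_number <- class(3.14)     # "numeric"
class_of_text   <- class("hello")  # "character"
class_of_flag   <- class(TRUE)     # "logical"
class_of_iris   <- class(iris)     # "data.frame"

print(c(class_of_number, class_of_text, class_of_flag, class_of_iris))
```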

In [None]:
# Let's check out the class of the data
class(mtcars) 

Notice in the output above that the class is 'data.frame', with no tibble enhancements.

## Glimpse
glimpse() lets you take a quick look at the data in a data frame. It reports the number of rows and columns, the column names, the data types, and the first few values of each column.
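
For comparison, the same facts can be collected one at a time with base R functions. This is only a rough sketch of what glimpse() summarizes in a single call, not how glimpse() is implemented. It uses `iris`, a data frame that ships with R, purely for illustration.

```r
# iris ships with R; it is used here only for illustration
n_rows    <- nrow(iris)           # number of rows
n_cols    <- ncol(iris)           # number of columns
col_names <- names(iris)          # column names
col_types <- sapply(iris, class)  # class of each column

print(c(rows = n_rows, columns = n_cols))
print(col_types)
```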

In [None]:
# Use glimpse to display the variable names and sample data
# for mtcars dataset
glimpse( ??? )  # Update the ??? and run cell

## ggplot
ggplot() is the primary function for plotting. It stores the basic information that can be inherited by the layers and geometries. geom_point() is a geometry layer that plots points on the canvas.
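
The idea of inheritance can be sketched as follows. Here the mapping is placed in ggplot() and geom_point() is left empty, so the layer picks up the x and y variables from ggplot(). The example loads ggplot2 directly (it is the plotting package that tidyverse loads) and uses the built-in `iris` data rather than the exercise datasets.

```r
library(ggplot2)  # the plotting package that tidyverse loads

# The mapping set in ggplot() is inherited by the geom_point() layer
p <- ggplot(data = iris, mapping = aes(x = Petal.Length, y = Petal.Width)) +
  geom_point()

print(p)  # draw the scatterplot
```

The exercise cells in this notebook put the mapping inside geom_point() instead. For a plot with a single layer, both placements draw the same chart.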

In [None]:
# Create a scatterplot of mtcars 
# Miles per gallon vs. engine displacement
# mpg on the y-axis and disp on the x-axis
ggplot(data = mtcars) + 
   geom_point(mapping = aes(x = ??? , y = ??? )) # Update ??? then run cell

In [None]:
# Create a scatterplot of mtcars 
# Miles per gallon vs engine horsepower
# mpg on the y-axis and hp on the x-axis
ggplot(data = mtcars) + 
   geom_point(mapping = aes(x = ???, y = ???)) 

In [None]:
# Create a scatterplot of mtcars 
# Miles per gallon vs engine cylinders
# mpg on the y-axis and cyl on the x-axis
ggplot(data = mtcars) + 
   geom_point(mapping = aes(x = ??? , y = ??? ))

Let's switch from the mtcars dataset to the mpg dataset. Also, a Jupyter feature I find helpful is clicking in the margin to the left of an output chart to collapse the output area to a smaller size with a scrollbar. Click it again to expand it to full size. Give it a try on the chart above: see if you can click in the white area to the left of the chart to collapse it.

## mpg dataset

In [None]:
# Let's look at the class for
# the mpg dataset
class(mpg)

Notice above that mpg has three classes: data.frame plus two tibble classes (tbl_df and tbl) that layer tibble behavior on top of a regular data frame.
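
The same three classes appear whenever a plain data frame is converted to a tibble. The sketch below assumes the tibble package (part of the tidyverse) is installed and uses the built-in `iris` data for illustration.

```r
library(tibble)  # part of the tidyverse

# Convert a plain data frame into a tibble and inspect its classes
iris_tbl    <- as_tibble(iris)
tbl_classes <- class(iris_tbl)
print(tbl_classes)

# A tibble is still a data frame, so data frame functions accept it
still_df <- inherits(iris_tbl, "data.frame")
print(still_df)
```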

In [None]:
# Let's take a peek at the data
# Do you remember the function we were using?
# Hint: glimpse()
??? (mpg)

In [None]:
# Create a scatterplot of mpg 
# Highway miles per gallon vs. engine displacement
# hwy on the y-axis and displ on the x-axis
ggplot(data = mpg) + 
   geom_point(mapping = aes(x = ???, y = ???))

In [None]:
# Create a scatterplot of mpg 
# Highway miles per gallon vs engine cylinders
# hwy on the y-axis and cyl on the x-axis
ggplot(data = mpg) + 
   geom_point(mapping = aes(x = ??? , y = ??? ))

## Code Recap
Let's put all the lines of code together to see the bigger picture of what we did.

You can do this in your own notebook too if you like. Select the last code block to highlight it. On the Jupyter toolbar, click the + to create a new block. Use the dropdown in the toolbar to change the block type to 'Markdown' or 'Code', depending on whether you want a text block like this one or a code block like the one below. If necessary, use the up and down arrows on the toolbar to move blocks around. The scissors toolbar button cuts the selected block, which removes it from the notebook.

In [None]:
# Load libraries
library( ??? )

# Explore mtcars data
# Hint: glimpse()
??? (mtcars)

# Plot mtcars: mpg vs disp, hp, cyl
ggplot(data = mtcars) + 
   geom_point(mapping = aes(x = ??? , y = mpg)) 

ggplot(data = mtcars) + 
   geom_point(mapping = aes(x = ??? , y = mpg)) 

ggplot(data = mtcars) + 
   geom_point(mapping = aes(x = ??? , y = mpg))

# Explore mpg
??? (mpg)

# Plot mpg: hwy vs displ, cyl
ggplot(data = mpg) + 
   geom_point(mapping = aes(x = ??? , y = hwy)) 

ggplot(data = mpg) + 
   geom_point(mapping = aes(x = ??? , y = hwy))

# Congratulations!
Wonderful progress! You can load a library and you know how to look into a dataset to see what's available. You have also learned how to create simple scatterplots controlling what variables are plotted on the x- and y-axes. 