# Module 1 Practice: Anscombe's Quartet

In this practice, we will recreate the visualization of Anscombe's quartet using ggplot2. Below are the summary statistics and the graphics we saw in the lab notebook.


<img src="../images/AnscombeStats.png">

<img src="../images/AnscombeGraph.png">

We will use the **`ggplot2`** library, which provides functions for building plots and graphics layer by layer, and the **`gridExtra`** library to arrange several plots in one figure.

In [None]:
library(ggplot2)
library(gridExtra)

# First, load the built-in anscombe data set into the workspace and display it (as in the lab notebook)

# YOUR CODE HERE #

In [None]:
# display summary statistics here 
# YOUR CODE HERE #

In [None]:
# compute the correlation between each x column and its paired y column (x1 with y1, x2 with y2, ...)
sapply(1:4, function(x) cor(anscombe[, x], anscombe[, x+4]))

In [None]:
# compute the variance of each y column (y1 to y4)
sapply(5:8, function(x) var(anscombe[, x]))
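The two cells above each compute one statistic at a time. As a sketch (not the lab notebook's own code), the statistics from the table at the top, including the means and the variance of each x, can be gathered into a single table with one row per pair. `pair_col()` is a hypothetical helper introduced only for this example.

```r
# anscombe ships with base R (package "datasets"), so no loading is needed.
# Hypothetical helper: return column x<i> or y<i> of anscombe.
pair_col <- function(name, i) anscombe[[paste0(name, i)]]

pairs <- 1:4
anscombe_stats <- data.frame(
  pair   = pairs,
  mean_x = sapply(pairs, function(i) mean(pair_col("x", i))),
  var_x  = sapply(pairs, function(i) var(pair_col("x", i))),
  mean_y = sapply(pairs, function(i) mean(pair_col("y", i))),
  var_y  = sapply(pairs, function(i) var(pair_col("y", i))),
  cor_xy = sapply(pairs, function(i) cor(pair_col("x", i), pair_col("y", i)))
)

# All columns are numeric, so the whole data frame can be rounded for display
round(anscombe_stats, 3)
```

Every row comes out nearly identical (mean of x = 9, variance of x = 11, mean of y about 7.50, correlation about 0.816), even though the scatterplots look very different.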

In [None]:
# linear regression (first pair)
lm(y1 ~ x1, data = anscombe)

In [None]:
# linear regression (second pair)
lm(y2 ~ x2, data = anscombe)

### Now it's your turn: find the linear regression coefficients for the remaining two pairs, (x3, y3) and (x4, y4):

In [None]:
# linear regression (next two pairs)

# YOUR CODE HERE #

**Let's create a scatterplot of y1 against x1 using `geom_point()`.** ggplot2 builds graphs in a modular way: first we specify the data and the aesthetic mappings (which column goes on which axis), then add a geometric layer that sets the plot type, and then any further components we want, such as a fitted line, axis limits, or labels. Each component is added to the plot with the **+** operator, as below.
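To see that each `+` adds one component and returns a new plot object, here is a small sketch that builds the first plot step by step and uses ggplot2's `layer_data()` to inspect the data computed for each layer. The intermediate names (`p_base`, `p_points`, `p_fit`) are illustrative only.

```r
library(ggplot2)

# Step 1: data + aesthetic mappings only; there are no layers yet
p_base <- ggplot(anscombe, aes(x = x1, y = y1))

# Step 2: add a geometric layer; `+` returns a new object, p_base is unchanged
p_points <- p_base + geom_point()

# Step 3: add a statistical layer (least-squares line) and a title
p_fit <- p_points +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "dataset 1")

# layer_data(plot, i) returns the data ggplot2 computed for layer i
points_df <- layer_data(p_fit, 1)  # the 11 (x1, y1) observations
line_df   <- layer_data(p_fit, 2)  # points along the fitted line
p_fit
```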


In [None]:
# Map x1 and y1 to the x and y axes; geom_point() draws a scatterplot.

p1 <- ggplot(anscombe, aes(x=x1, y=y1)) + geom_point() 

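# add a least-squares regression line (geom_smooth also draws a confidence band by default)
# and expand the axes so that they include x = 4 and y = 4
p1 <- p1 + geom_smooth(method = "lm", formula = y ~ x) + expand_limits(x = 4, y = 4)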
p1

# Now do the same for x2, y2; se = FALSE removes the confidence band
p2 <- ggplot(anscombe, aes(x=x2, y=y2)) + geom_point() 
p2 <- p2 + geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + expand_limits(x = 4, y = 4)
p2

### Now it's your turn: create the plots for (x3, y3) and (x4, y4) in a similar way as above:

In [None]:
p3 <- <--YOUR CODE HERE-->

p3 <- p3 + <--YOUR CODE HERE-->
p3

p4 <- <--YOUR CODE HERE-->

p4 <- p4 + <--YOUR CODE HERE-->
p4

In [None]:
# Now arrange the four plots in a 2 x 2 grid with grid.arrange()
grid.arrange(p1, p2, p3, p4)
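As an alternative to building four separate plots and arranging them with `grid.arrange()`, ggplot2 can draw the grid itself with `facet_wrap()` once the data is in long format (one row per observation, plus a column naming the data set). This is a different technique from the one used in this practice; the column names `dataset`, `x`, and `y` are chosen here for illustration.

```r
library(ggplot2)

# Reshape the wide anscombe data (x1..x4, y1..y4) into long format:
# one row per observation, with a `dataset` column identifying the pair.
long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    dataset = paste("dataset", i),
    x = anscombe[[paste0("x", i)]],
    y = anscombe[[paste0("y", i)]]
  )
}))

# One plot with one panel per data set; the regression line is fitted per panel
p_facet <- ggplot(long, aes(x = x, y = y)) +
  geom_point(color = "darkorange", size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "cornflowerblue") +
  facet_wrap(~ dataset, ncol = 2) +
  theme_bw()
p_facet
```

The trade-off: facets share axis scales by default, which makes the four data sets easy to compare, while `grid.arrange()` lets each plot be customized independently.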

**`ggplot2` offers many ways to customize a plot. Let's make our plots look as close as possible to the original figure above.** 

In [None]:
# Another way to draw the scatterplot (aesthetics set inside geom_point), styled to match the original figure.

# change color and size of the marker
pp1 <- ggplot(anscombe) + geom_point(aes(x1, y1), color = "darkorange", size = 3) 

# add black and white theme
pp1 <- pp1 + theme_bw()

# set the tick-mark positions (breaks) on both axes
pp1 <- pp1 + scale_x_continuous(breaks = seq(0, 18, 2)) + scale_y_continuous(breaks = seq(0, 12, 2)) 

# draw the fixed line y = 3 + 0.5x with geom_abline() instead of fitting one with geom_smooth() as above
pp1 <- pp1 + geom_abline(intercept = 3, slope = 0.5, color = "cornflowerblue") 

# expand the axes so that they cover at least x from 4 to 18 and y from 4 to 12
pp1 <- pp1 + expand_limits(x = c(4,18), y = c(4,12)) 

# and add a title 
pp1 <- pp1 + labs(title = "dataset 1")
pp1
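The `geom_abline(intercept = 3, slope = 0.5)` call above hard-codes the line rather than fitting it. A quick check, sketched below with base R's `lm()`, suggests why those values work for every panel: fitting `y ~ x` separately for each pair gives an intercept of about 3.00 and a slope of about 0.50 in all four cases.

```r
# Fit y_i ~ x_i for each pair i = 1..4 and collect the coefficients
coefs <- t(sapply(1:4, function(i) {
  df <- data.frame(x = anscombe[[paste0("x", i)]], y = anscombe[[paste0("y", i)]])
  coef(lm(y ~ x, data = df))  # named vector: (Intercept), x
}))
colnames(coefs) <- c("intercept", "slope")
rownames(coefs) <- paste("dataset", 1:4)
round(coefs, 2)
```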

### Again, it's your turn: create the rest of the plots: 

In [None]:
pp2 <- ggplot(anscombe) + <--YOUR CODE HERE-->
pp2 <- pp2 + theme_bw() 
pp2 <- pp2 + scale_x_continuous(breaks = seq(0, 18, 2)) + scale_y_continuous(breaks = seq(0, 12, 2)) 
pp2 <- pp2 + geom_abline(intercept = 3, slope = 0.5, color = "cornflowerblue") 
pp2 <- pp2 + expand_limits(x = c(4,18), y = c(4,12)) 
pp2 <- pp2 + <--YOUR CODE HERE-->

In [None]:
pp3 <- <--YOUR CODE HERE-->

pp4 <- <--YOUR CODE HERE-->

grid.arrange(pp1, pp2, pp3, pp4)

**In the coming labs and practices, we will learn how to build and customize different types of visualizations (scatterplots, bar charts, histograms, etc.) using ggplot2.**