# Data Visualization in R

In this example, we plot the factors that affect the compressive strength of concrete.

We start by loading the package needed for this visualization.

In [None]:
library("ggplot2")

print("Libraries loaded!")

Next, we load data and look at the first few rows.

In [None]:
concrete <- read.csv(file = "data/Concrete_Data.csv")
head(concrete)

Our first plot shows compressive strength against the amount of cement. Adding the ggplot2 layer `geom_smooth` with `method = "lm"` automatically draws the line fitted by linear regression.

In [None]:
ggplot(data = concrete, mapping = aes(x = Cement, y = Compressive_Strength)) +
  geom_point() +
  geom_smooth(method = "lm")
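To make it clearer what `geom_smooth(method = "lm")` is drawing, here is a minimal sketch that fits the same model with `lm()` and overlays the resulting line. The `toy` data frame and its values are made up for illustration and are not the concrete data; the point is only that the smoother and the `lm()` coefficients should describe the same line.

```r
library(ggplot2)

# Synthetic stand-in data (hypothetical values, not the concrete data set)
set.seed(1)
toy <- data.frame(x = 1:50)
toy$y <- 2 + 0.5 * toy$x + rnorm(50)

# Fit the straight-line model y ~ x, which is what method = "lm" fits here
fit <- lm(y ~ x, data = toy)
print(coef(fit))

# Draw the smoother, then a dashed red line built from the lm() coefficients;
# the two lines should coincide
p <- ggplot(data = toy, mapping = aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_abline(intercept = coef(fit)[["(Intercept)"]],
              slope = coef(fit)[["x"]],
              linetype = "dashed", colour = "red")
print(p)
```

The shaded band around the smoother is the confidence interval for the fitted line; passing `se = FALSE` to `geom_smooth` hides it.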

We should update our axis labels using the `xlab` and `ylab` functions.

In [None]:
ggplot(data = concrete, mapping = aes(x = Cement, y = Compressive_Strength)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ylab("Strength (MPa)") +
  xlab("Cement (kg/m3)")
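As an aside, ggplot2 also provides `labs()`, which sets several labels in one call. The sketch below uses a small made-up data frame (`toy`, hypothetical values) rather than the concrete data, and should produce the same axis labels as separate `xlab` and `ylab` calls would.

```r
library(ggplot2)

# Small made-up data frame, used only to demonstrate labelling
toy <- data.frame(Cement = c(150, 250, 350, 450),
                  Compressive_Strength = c(20, 30, 42, 55))

# labs() sets both axis labels (and, optionally, a title) in a single call
p <- ggplot(data = toy, mapping = aes(x = Cement, y = Compressive_Strength)) +
  geom_point() +
  labs(x = "Cement (kg/m3)", y = "Strength (MPa)",
       title = "Strength vs. cement (toy data)")
print(p)
```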

Next, we will modify our code to plot Age on the x-axis instead of Cement. Update the code below accordingly. _Hint_: you'll need to change the name of the variable passed to `x` on the first line, as well as the axis label on the last line.

In [None]:
ggplot(data = concrete, mapping = aes(x = Cement, y = Compressive_Strength)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ylab("Strength (MPa)") +
  xlab("Cement (kg/m3)")

With `method = "lm"`, `geom_smooth` adds a linear regression line and its confidence interval. This does not look like a linear relationship (the compressive strength doesn't appear to increase continuously with age). Try a polynomial relationship by adding `formula = y ~ poly(x, degree = 2)` to the `geom_smooth` call (immediately following the method specification).
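Before applying this to the concrete data, it may help to see the `formula` argument at work on a toy example. The sketch below uses synthetic, deliberately curved data (the `toy` data frame is hypothetical) and compares a straight-line fit with a degree-2 polynomial fit using plain `lm()`, then passes the same formula to `geom_smooth`.

```r
library(ggplot2)

# Synthetic curved data (hypothetical; stands in for a non-linear relationship)
set.seed(42)
toy <- data.frame(x = seq(1, 100, length.out = 80))
toy$y <- 10 * log(toy$x) + rnorm(80, sd = 2)

# Compare a straight line with a degree-2 polynomial using lm()
linear_fit <- lm(y ~ x, data = toy)
poly_fit <- lm(y ~ poly(x, degree = 2), data = toy)
r2 <- c(linear = summary(linear_fit)$r.squared,
        poly2 = summary(poly_fit)$r.squared)
print(r2)

# The same formula is passed to geom_smooth right after the method argument
p <- ggplot(data = toy, mapping = aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, degree = 2))
print(p)
```

Because the straight-line model is nested inside the polynomial one, the polynomial's R-squared can never be lower; a higher degree always fits the observed points at least as well, so the plot is a better guide than R-squared alone when choosing the degree.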

In [None]:
# Paste your code from above here, and update

Maybe a little better, but a fourth-order polynomial works better still. Change the order specification, replacing the 2 with 4 in the `degree` argument of the `poly` function.

In [None]:
# Paste your code from the previous code block and update

To finish off this plot, we want to:

1. Change the axis labels to include units in parentheses (days for age and MPa for compressive strength)
2. Write the plot to a png file

In [None]:
# Replace "Cement" with "Age" in the line below
age_plot <- ggplot(data = concrete, mapping = aes(x = Cement, y = Compressive_Strength)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ylab("Strength (MPa)") +
  xlab("Cement (kg/m3)") # <---- update this axis label, too
print(age_plot) # Prints to screen
ggsave(filename = "age-strength-R.png", plot = age_plot)
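When `width` and `height` are not given, `ggsave` uses the size of the current graphics device, so the saved image can differ between sessions. A minimal sketch of fixing the output size explicitly follows; the toy data, the file name, and the chosen dimensions are all illustrative assumptions, and the file is written to a temporary directory so nothing in the project is overwritten.

```r
library(ggplot2)

# Toy plot, used only to demonstrate saving
toy <- data.frame(x = 1:10, y = (1:10)^2)
p <- ggplot(data = toy, mapping = aes(x = x, y = y)) +
  geom_point()

# Write a 6 x 4 inch PNG at 300 dpi to a temporary location
out_file <- file.path(tempdir(), "toy-plot.png")
ggsave(filename = out_file, plot = p,
       width = 6, height = 4, units = "in", dpi = 300)

# Confirm that the file was written
print(file.exists(out_file))
```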