# Data Visualization in R

In this example, we plot the COVID case rate against the vaccination rate for U.S. states and territories.

We start by loading the `ggplot2` package, which provides the plotting functions used in this example.

In [None]:
library("ggplot2")

print("Libraries loaded!")

Next, we load the data and look at the first few rows.

In [None]:
covid <- read.csv(file = "data/COVID_Data.csv")
head(covid)

Our first plot shows the case rate per 100,000 people against the percentage of people who have received at least one dose. Adding a `geom_smooth` layer with `method = "lm"` fits a linear regression and draws the fitted line automatically.

In [None]:
ggplot(data = covid, mapping = aes(x = One_Dose, y = Case_Rate)) +
  geom_point() +
  geom_smooth(method = "lm")
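`geom_smooth(method = "lm")` fits an ordinary least-squares model of `y ~ x`, so you can fit the same model directly with `lm()` to see the intercept and slope behind the line. The sketch below is illustrative only: it uses R's built-in `mtcars` data as a stand-in so it runs without the COVID file. With the notebook's data, the equivalent call would presumably be `lm(Case_Rate ~ One_Dose, data = covid)`.

```r
# Stand-in data: R's built-in mtcars (fuel economy vs. car weight), used only
# so this sketch runs without data/COVID_Data.csv
fit_linear <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope of the straight line geom_smooth(method = "lm") would draw
print(coef(fit_linear))

# Fitted values with 95% confidence limits at a few x values; the shaded band
# drawn by geom_smooth() shows this kind of interval along the whole line
new_x <- data.frame(wt = c(2, 3, 4))
band <- predict(fit_linear, newdata = new_x, interval = "confidence", level = 0.95)
print(band)
```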

Next, we give the axes readable labels with the `xlab` and `ylab` functions.

In [None]:
ggplot(data = covid, mapping = aes(x = One_Dose, y = Case_Rate)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ylab("Cases (per 100K)") +
  xlab("Percentage with at least one dose")

Next, we change the variable plotted on the x-axis from `One_Dose` to `Dose_Rate`. Update the code below accordingly. _Hint_: you'll need to change the variable passed to `x` on the first line, as well as the label in `xlab()` on the last line.

In [None]:
ggplot(data = covid, mapping = aes(x = One_Dose, y = Case_Rate)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ylab("Cases (per 100K)") +
  xlab("Percentage with at least one dose")

By default, `geom_smooth` draws the regression line together with a shaded 95% confidence band around it. This does not look like a linear relationship: the case rate does not appear to change at a constant rate as the vaccination rate increases. Try a polynomial relationship instead by adding `formula = y ~ poly(x, degree = 2)` to the `geom_smooth` call, immediately after `method = "lm"`.

In [None]:
# Paste your code from above here, and update
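Inside `geom_smooth`, the `formula` refers to the aesthetics `y` and `x` rather than to column names, so with the mapping used here `y ~ poly(x, degree = 2)` fits `Case_Rate` as a quadratic function of whatever variable is mapped to `x`. The sketch below fits the same kind of model directly with `lm()` on the built-in `mtcars` data (a stand-in chosen only so the example runs on its own) and compares R² with the straight-line fit. Keep in mind that R² can only stay the same or rise when terms are added, so a higher value alone does not prove the quadratic is the better model.

```r
# Same stand-in data as before (built-in mtcars), not the COVID data
fit_linear    <- lm(mpg ~ wt, data = mtcars)
fit_quadratic <- lm(mpg ~ poly(wt, degree = 2), data = mtcars)

# Proportion of variance in mpg explained by each fit
r2 <- c(linear    = summary(fit_linear)$r.squared,
        quadratic = summary(fit_quadratic)$r.squared)
print(r2)
```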

That may fit a little better, but a third-degree (cubic) polynomial appears to fit better still. Change the polynomial degree by replacing the 2 with 3 in the `degree` argument of `poly()`.

In [None]:
# Paste your code from the previous code block and update
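Because higher-degree polynomials always follow the points at least as closely, judging the curves by eye can overstate the benefit of adding terms. One common way to weigh fit against complexity, not part of the original lesson, is to compare the Akaike information criterion (AIC) of the candidate models, where lower is better. The sketch below does this for degrees 1 to 3, again on the built-in `mtcars` data as a stand-in; with the notebook's data you would substitute `Case_Rate` and the x variable.

```r
# Fit polynomial models of degree 1, 2 and 3 to the stand-in mtcars data
fits <- lapply(1:3, function(d) lm(mpg ~ poly(wt, degree = d), data = mtcars))

# AIC for each model: lower values indicate a better fit/complexity trade-off
aic_by_degree <- sapply(fits, AIC)
names(aic_by_degree) <- paste0("degree_", 1:3)
print(aic_by_degree)

# Degree with the lowest AIC
print(names(which.min(aic_by_degree)))
```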

To finish off this plot, we want to:

1. Update the axis labels to be readable and include units in parentheses
2. Write the plot to a png file

In [None]:
# Replace "One_Dose" with "Dose_Rate" in the line below
vaccination_plot <- ggplot(data = covid, mapping = aes(x = One_Dose, y = Case_Rate)) +
  geom_point() +
  geom_smooth(method = "lm") + # <---- add the polynomial formula from above
  ylab("Cases (per 100K)") +
  xlab("Percentage with at least one dose") # <---- update this axis label, too
print(vaccination_plot) # Prints to screen
ggsave(filename = "vaccination-cases.png", plot = vaccination_plot) # Saves to the working directory
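By default, `ggsave()` takes its size from the current graphics device, so the saved image can vary between sessions. Passing `width`, `height`, `units` and `dpi` explicitly makes the output reproducible. The sketch below shows this, plus `labs()` as an alternative to separate `xlab()`/`ylab()` calls; it uses the built-in `mtcars` data and a hypothetical file name in a temporary directory, chosen only so the example runs on its own.

```r
library(ggplot2)

# Stand-in plot on the built-in mtcars data (not the COVID data);
# labs() sets both axis labels in one call, with units in parentheses
example_plot <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, degree = 3)) +
  labs(x = "Weight (1000 lbs)", y = "Fuel economy (miles per gallon)")

# Hypothetical output path in a temporary directory
out_file <- file.path(tempdir(), "example-plot.png")

# Fixed size and resolution: 6 x 4 inches at 300 dots per inch
ggsave(filename = out_file, plot = example_plot,
       width = 6, height = 4, units = "in", dpi = 300)
```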