# BIOS512 Assignment #3

Import the 📈Tidyverse into your `R` session

In [None]:
library('tidyverse')
library('ggrepel')

In this assignment we'll be using data from the [TidyTuesday Project](https://github.com/rfordatascience/tidytuesday). Specifically, we'll investigate vaccination rates at US schools. A `CSV` file of the data is located at:  

[https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-25/measles.csv](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-25/measles.csv)

Get the "raw" data URL and import it into your `R` session using `read_csv`. Remember to capture the data under a variable name of your choosing!

In [None]:
vaccination.rates = read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/refs/heads/master/data/2020/2020-02-25/measles.csv')

In [None]:
vaccination.rates |> head()

The `mmr` column holds 🤒 measles, mumps, rubella 🤒 vaccination rates for students in each respective school. **If the `mmr` value is not available for a school, the `mmr` value is set to -1 in this data set.** 

The target `mmr` vaccination rate as [recommended by the CDC](https://www.cdc.gov/mmwr/preview/mmwrhtml/mm5632a3.htm) for schools is 95%. 

**Calculate the fraction of schools per state that have vaccination rates greater than or equal to the CDC target of 95%. Capture the output as a table called `df_vacc_rates`.**

You'll need to use `filter`, `group_by`, and `summarize`. 

I.e.
1. Filter out schools that don't report `mmr` vaccination rate (keep schools where `mmr >= 0`).
1. Group the data by `state`.
1. Summarize the fraction of schools with vaccination rates at or above 95%.

💡Remember `n()` stands for the number of records in a group. Also, `sum(mmr >= 95, na.rm=TRUE)` will count the number of values greater than or equal to 95.💡
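The three steps above can be sketched on a small made-up table before running them on the real data. The `toy_schools` table and its column values are hypothetical and only illustrate the pattern; the sketch also shows that, once missing values are filtered out, `sum(condition) / n()` gives the same result as `mean(condition)`.

```r
library(dplyr)

# Hypothetical toy data: -1 marks a missing mmr value, as in the real data set
toy_schools <- tibble(
  state = c("A", "A", "A", "B", "B", "B"),
  mmr   = c(96, 90, -1, 99, 95, 97)
)

toy_rates <- toy_schools |>
  filter(mmr >= 0) |>                  # step 1: drop schools with no reported rate
  group_by(state) |>                   # step 2: one group per state
  summarize(                           # step 3: fraction of schools at or above 95
    frac_by_count = sum(mmr >= 95, na.rm = TRUE) / n(),
    frac_by_mean  = mean(mmr >= 95, na.rm = TRUE)
  )

toy_rates
```

Both columns should agree row by row, so either form can be used in `summarize`.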

In [7]:
# create the df_vacc_rates here
df_vacc_rates = vaccination.rates |>
    filter(mmr >= 0) |>
    group_by(state) |>
    summarize(vaccination.rate = (sum(mmr >= 95, na.rm = TRUE)/n()))
    
df_vacc_rates |> head()

state,vaccination.rate
<chr>,<dbl>
Arizona,0.506404782
Arkansas,0.003527337
California,0.888506151
Colorado,0.623092236
Connecticut,0.811544992
Illinois,0.896825397


Which state (of those that report `mmr`) has the smallest fraction of schools above the CDC target vaccination rate of 95%?  

Arkansas
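Rather than scanning a printed table by eye, an answer like this can be checked in code. The following is a minimal sketch using `slice_min` from `dplyr` on a hypothetical `toy_rates` table; the state labels and rates are made up, and on the real data the same call would be applied to `df_vacc_rates`.

```r
library(dplyr)

# Hypothetical per-state fractions, standing in for df_vacc_rates
toy_rates <- tibble(
  state            = c("A", "B", "C"),
  vaccination.rate = c(0.51, 0.004, 0.89)
)

# Keep the row with the smallest fraction of schools at or above the target
lowest <- toy_rates |>
  slice_min(vaccination.rate, n = 1)

lowest
```

`arrange(vaccination.rate)` followed by `head(1)` would be an equivalent alternative.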

**Make an ECDF plot of the school vaccination rates in North Carolina.** Use the `overall` column which reports the "overall" vaccination rate at each school.

❗️Remember, you can calculate the `y` value for ECDF charts using `mutate`, and `cume_dist`.
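To see what `cume_dist` returns, here is a small sketch on made-up rates (the `toy` table is hypothetical). For each row, `cume_dist` gives the fraction of values less than or equal to that row's value, which is the ECDF evaluated at that point; the comparison against base R's `ecdf` is included only as a sanity check.

```r
library(dplyr)

# Hypothetical overall vaccination rates for four schools
toy <- tibble(overall = c(80, 90, 90, 100))

toy_ecdf <- toy |>
  mutate(cumulative_dist = cume_dist(overall))

toy_ecdf

# Sanity check against base R's empirical CDF
base_ecdf <- ecdf(toy$overall)(toy$overall)
all.equal(toy_ecdf$cumulative_dist, base_ecdf)
```

Tied values (the two schools at 90) receive the same cumulative fraction, so they collapse into a single step when plotted.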

In [None]:
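vaccination.rates |>
    # keep only North Carolina schools that report an overall rate (negative values are treated as missing)
    filter(state == 'North Carolina', overall >= 0) |>
    mutate(cumulative_dist = cume_dist(overall)) |>
    ggplot(aes(x = overall, y = cumulative_dist)) + 
        # 'hv' holds each ECDF value until the next observed rate is reached
        geom_step(direction = 'hv') + 
        labs(y = 'Fraction of Schools', x = 'Vaccination Rate (%)') + 
        # vertical line marking the CDC target of 95%
        geom_vline(xintercept = 95, color = 'darkblue') 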
        

(My solution is below if you want to peek 👀. Uncomment the code, change the cell to `markdown`, and execute it. I used `geom_text_repel` from [ggrepel](https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html) for the annotation.)

![](https://github.com/chuckpr/BIOS512/blob/main/assignments/2023/measles-chart.png?raw=true)

Is the median vaccination rate for NC schools above the CDC recommended target?

In [10]:
vaccination.rates |>
    filter(overall >= 0) |>
    summarize(median.vaccination.rate = median(overall, na.rm = TRUE))
    

median.vaccination.rate
<dbl>
95


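The median computed above is 95%, which equals the CDC's recommended target but does not exceed it. Note that this pipeline does not filter on `state`, so the value covers every school that reports `overall`, not only NC schools; adding `state == 'North Carolina'` to the `filter` call is needed to answer the question for NC specifically.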
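Since the question asks about NC schools specifically, a state-restricted median might be sketched as follows. The `toy` table is hypothetical and its numbers are made up, so the result says nothing about the real NC median; it only shows the shape of the pipeline and a direct comparison against the 95% target.

```r
library(dplyr)

# Hypothetical data: -1 is treated as a missing overall rate
toy <- tibble(
  state   = c("North Carolina", "North Carolina", "North Carolina", "Other", "Other"),
  overall = c(90, 96, -1, 99, 98)
)

nc_median <- toy |>
  filter(state == "North Carolina", overall >= 0) |>
  summarize(
    median_overall = median(overall),
    above_target   = median_overall > 95   # strictly above the CDC target?
  )

nc_median
```

Returning the comparison as its own column makes the yes/no answer explicit instead of leaving it to be read off the printed median.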