# BIOS512 Assignment #3

Import the 📈Tidyverse into your `R` session.

```r
library('tidyverse')
```

In this assignment we'll be using data from the [TidyTuesday Project](https://github.com/rfordatascience/tidytuesday). Specifically, we'll investigate vaccination rates at US schools. A `CSV` file of the data is located at:  

[https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-25/measles.csv](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-25/measles.csv)

Get the "raw" data URL and import it into your `R` session using `read_csv`. Remember to capture the data under a variable name of your choosing!

```r
# Read the raw CSV; col_types = cols() lets readr guess column types without printing the spec
Vaccine = read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-25/measles.csv',
                   col_types = cols())

Vaccine %>% head(5)
```

| index | state | year | name | type | city | county | district | enroll | mmr | overall | xrel | xmed | xper | lat | lng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<lgl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<lgl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | Arizona | 2018-19 | A J Mitchell Elementary | Public | Nogales | Santa Cruz | | 51 | 100 | -1 | | | | 31.34782 | -110.938 |
| 2 | Arizona | 2018-19 | Academy Del Sol | Charter | Tucson | Pima | | 22 | 100 | -1 | | | | 32.22192 | -110.8961 |
| 3 | Arizona | 2018-19 | Academy Del Sol - Hope | Charter | Tucson | Pima | | 85 | 100 | -1 | | | | 32.13049 | -111.117 |
| 4 | Arizona | 2018-19 | Academy Of Mathematics And Science South | Charter | Phoenix | Maricopa | | 60 | 100 | -1 | | | | 33.48545 | -112.1306 |
| 5 | Arizona | 2018-19 | Acclaim Academy | Charter | Phoenix | Maricopa | | 43 | 100 | -1 | | 2.33 | 2.33 | 33.49562 | -112.2247 |


The `mmr` column holds 🤒 measles, mumps, and rubella 🤒 vaccination rates for the students at each school. **If the `mmr` value is not available for a school, the `mmr` value is set to -1 in this data set.** 
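Because -1 is a placeholder rather than a real rate, it has to be handled before any summary. A minimal sketch on made-up numbers (the `toy` tibble is hypothetical, not taken from the measles data) shows how the placeholder distorts a mean, and two common ways to deal with it: filtering it out, as this assignment does, or recoding it to `NA` with dplyr's `na_if()`.

```r
library(dplyr)

# Hypothetical toy data mimicking the -1 "not available" placeholder in mmr
toy = tibble(school = c("A", "B", "C"), mmr = c(98, -1, 91))

# Leaving the placeholder in distorts summaries: (98 - 1 + 91) / 3 is about 62.7
mean(toy$mmr)

# Option 1: drop unreported schools, as this assignment does
reported = toy %>% filter(mmr >= 0)

# Option 2: keep the rows but recode -1 to NA, then skip NA in summaries
recoded = toy %>% mutate(mmr = na_if(mmr, -1))
mean(recoded$mmr, na.rm = TRUE)   # (98 + 91) / 2 = 94.5
```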

As [recommended by the CDC](https://www.cdc.gov/mmwr/preview/mmwrhtml/mm5632a3.htm), the target `mmr` vaccination rate for schools is 95%. 

**Calculate the fraction of schools per state that have vaccination rates at or above the CDC target of 95%. Capture the output as a table called `df_vacc_rates`.**

You'll need to use `filter`, `group_by`, and `summarize`. That is:

1. Filter out schools that don't report an `mmr` vaccination rate (keep schools where `mmr >= 0`).
1. Group the data by `state`.
1. Summarize the fraction of schools with vaccination rates at or above 95%.

💡Remember, `n()` gives the number of records in a group. Also, `sum(mmr >= 95, na.rm=TRUE)` counts the number of values greater than or equal to 95.💡
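The three steps can be tried on a tiny data set before running them on `Vaccine`. This sketch assumes two hypothetical states, `X` and `Y`, with made-up rates. It also shows that, when there are no missing values, `sum(cond) / n()` equals `mean(cond)`, because `TRUE` counts as 1 and `FALSE` as 0.

```r
library(dplyr)

# Hypothetical mmr values for two made-up states (-1 means "not reported")
toy = tibble(
    state = c("X", "X", "X", "X", "Y", "Y"),
    mmr   = c(99, 95, 80, -1, 97, 90)
)

toy_rates = toy %>%
    filter(mmr >= 0) %>%                            # step 1: drop unreported schools
    group_by(state) %>%                             # step 2: one group per state
    summarize(                                      # step 3: fraction at or above 95
        num_95plus = sum(mmr >= 95, na.rm = TRUE),  # schools at or above 95%
        total      = n(),                           # schools that report mmr
        frac_sum_n = num_95plus / total,            # the hint's sum()/n() form
        frac_mean  = mean(mmr >= 95, na.rm = TRUE)  # same quantity via mean() of a logical
    )
toy_rates
```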

```r
# Fraction of reporting schools per state with mmr at or above the CDC 95% target
df_vacc_rates = Vaccine %>%
    filter(mmr >= 0) %>%                         # drop schools that don't report mmr (-1)
    group_by(state) %>%
    summarize(num_95plus    = sum(mmr >= 95, na.rm = TRUE),
              Total_Records = n(),
              vacc_rates    = num_95plus / Total_Records)
```



Which state (of those that report `mmr`) has the smallest fraction of schools at or above the CDC target vaccination rate of 95%?  

```r
df_vacc_rates %>%
    arrange(vacc_rates) %>%  # ascending is the default, so the smallest fraction comes first
    head(1)                  # keep only the first row
```

| state | num_95plus | Total_Records | vacc_rates |
|---|---|---|---|
| `<chr>` | `<int>` | `<int>` | `<dbl>` |
| Arkansas | 2 | 567 | 0.003527337 |
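As an aside, dplyr (version 1.0.0 and later) also provides `slice_min()`, which picks the row(s) with the smallest value in one step. Here is a small sketch on hypothetical fractions (`toy_fracs` is made up). Note that `slice_min()` keeps ties by default, whereas `head(1)` always returns exactly one row.

```r
library(dplyr)

# Hypothetical per-state fractions standing in for df_vacc_rates
toy_fracs = tibble(state = c("P", "Q", "R"), vacc_rates = c(0.80, 0.05, 0.60))

# arrange() + head(1), as in the cell above
lowest_a = toy_fracs %>% arrange(vacc_rates) %>% head(1)

# slice_min() in one step; with_ties = TRUE by default, so tied minima are all kept
lowest_b = toy_fracs %>% slice_min(vacc_rates, n = 1)
```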


Make an ECDF plot of the school vaccination rates in North Carolina. Use the `overall` column, which reports the "overall" vaccination rate at each school.

❗️Remember, you can calculate the `y` value for ECDF charts using `row_number`, `mutate`, and `arrange`.
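To see what the `row_number()` recipe computes, here is a sketch on five made-up rates (the `toy` values are hypothetical), compared against base R's `stats::ecdf()`. Since rows are numbered 1 to n, `rn / max(rn)` and `rn / n()` are the same. The two approaches agree except on tied values: `row_number()` gives each tied row its own step, while `stats::ecdf()` assigns every tied value the cumulative fraction of the last row in the tie.

```r
library(dplyr)

# Hypothetical overall rates for five schools (note the tie at 97)
toy = tibble(overall = c(92, 97, 88, 99, 97))

toy_ecdf = toy %>%
    arrange(overall) %>%                        # sort rates in ascending order
    mutate(rn = row_number(),                   # position 1..n after sorting
           fraction_of_schools = rn / n())      # share of schools up to this row

# Base R's stats::ecdf() returns a step function we can evaluate at each rate
Fn = stats::ecdf(toy$overall)
cbind(toy_ecdf$fraction_of_schools, Fn(toy_ecdf$overall))
```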

```r
Vaccine %>% head(3)
```

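| index | state | year | name | type | city | county | district | enroll | mmr | overall | xrel | xmed | xper | lat | lng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<lgl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<lgl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | Arizona | 2018-19 | A J Mitchell Elementary | Public | Nogales | Santa Cruz | | 51 | 100 | -1 | | | | 31.34782 | -110.938 |
| 2 | Arizona | 2018-19 | Academy Del Sol | Charter | Tucson | Pima | | 22 | 100 | -1 | | | | 32.22192 | -110.8961 |
| 3 | Arizona | 2018-19 | Academy Del Sol - Hope | Charter | Tucson | Pima | | 85 | 100 | -1 | | | | 32.13049 | -111.117 |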


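```r
# Empirical CDF of the overall vaccination rate for North Carolina schools
ecdf = Vaccine %>%
    filter(overall >= 0 & state == 'North Carolina') %>%  # drop -1 placeholder values
    arrange(overall) %>%
    mutate(rn = row_number(),
           fraction_of_schools = rn / max(rn)) %>%         # share of schools at or below each rate
    select(state, name, overall, fraction_of_schools, rn)
```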

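```r
p = ggplot(ecdf, aes(x = overall, y = fraction_of_schools))

p = p + geom_point() + geom_step()

# overall is numeric, so use a continuous scale; scale_x_discrete() is meant for categorical data
p = p + scale_x_continuous(breaks = c(0, 25, 50, 75, 100))

p = p + labs(x = 'Overall vaccination rate (%)', y = 'Fraction of schools')

p
```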
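ggplot2 can also compute the ECDF itself with `stat_ecdf()`, which skips the `arrange()`/`row_number()` step. Here is a sketch with a hypothetical `toy` data frame standing in for the North Carolina rows; the axis breaks mirror the ones used in the cell above. By default (`pad = TRUE`), `stat_ecdf()` also extends the step line out to the plot edges.

```r
library(ggplot2)

# Hypothetical rates; in the assignment these would be the filtered NC rows
toy = data.frame(overall = c(92, 97, 88, 99, 97))

# stat_ecdf() computes the cumulative fraction of schools at or below each rate
p2 = ggplot(toy, aes(x = overall)) +
    stat_ecdf(geom = "step") +
    scale_x_continuous(limits = c(0, 100), breaks = c(0, 25, 50, 75, 100)) +
    labs(x = "Overall vaccination rate (%)", y = "Fraction of schools")
p2
```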

(My solution is below if you want to peek 👀. Uncomment the code, change the cell to `markdown`, and execute it.)

![](https://github.com/chuckpr/BIOS512/blob/main/assignments/measles-chart.png?raw=true)

Is the median vaccination rate for NC schools above the CDC recommended target?
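One way to check numerically: the median sits roughly where the ECDF crosses 0.5, and `median()` returns it directly. Here is a sketch on hypothetical values; to answer the question, swap in the North Carolina rows stored in `ecdf`. The comparison uses `>= 95` to match the counting rule used for `mmr` earlier.

```r
library(dplyr)

# Hypothetical overall rates; in the assignment, use the North Carolina rows instead
toy = tibble(overall = c(92, 97, 88, 99, 97))

toy_median = toy %>%
    summarize(median_overall = median(overall),         # middle value of the sorted rates
              meets_target   = median(overall) >= 95)   # compare against the CDC 95% target
toy_median
```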