## 1. The Discovery of Handwashing
<p>In the mid-1800s, Dr. Ignaz Semmelweis was an obstetrician at Vienna General Hospital. Maternal death from puerperal (childbed) fever was common at the time, but he was particularly concerned that the death rate in his own clinic (Clinic 1) was much higher than in the hospital's other maternity clinic (Clinic 2). <em>So what was the difference between these two clinics?</em> Doctors and midwives worked in Clinic 1, while only midwives worked in Clinic 2. This led Dr. Semmelweis to hypothesize that doctors were carrying deadly "cadaverous particles" from their autopsies to their patients in Clinic 1.</p>
<p>In 1847, Dr. Semmelweis instituted a policy requiring doctors to wash their hands with a chlorine solution between performing autopsies and seeing patients. As the plot below shows, the maternal mortality rate dropped sharply. Sadly, germ theory (the idea that microorganisms cause disease) was not yet widely accepted, so most doctors rejected his hypothesis.</p>
<p><img src="https://assets.datacamp.com/production/project_1187/img/semmelweis_plot.png" alt="Line plot of maternal mortality rate in Clinic 1 at Vienna General Hospital" width="600px"></p>
<p>The two datasets you will use are from Dr. Semmelweis's original 1859 publication<sup>1</sup>. Here are the details:</p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
<div style="font-size:20px"><b>datasets/clinic_data.csv</b></div>
This contains yearly clinic-level data on births and maternal deaths in each of the two maternity clinics at Vienna General Hospital.
<ul>
<li><b><code>year</code>:</b> each year from 1833 to 1858</li>
<li><b><code>births</code>:</b> total number of births in the clinic</li>
<li><b><code>deaths</code>:</b> number of maternal deaths in the clinic</li>
<li><b><code>clinic</code>:</b> clinic (either <code>clinic_1</code> or <code>clinic_2</code>). Doctors and midwives worked in Clinic 1, while only midwives worked in Clinic 2.</li>
</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
<div style="font-size:20px"><b>datasets/hospital_data.csv</b></div>
This contains yearly hospital-level data on births and maternal deaths. 
<ul>
<li><b><code>year</code>:</b> each year from 1784 to 1848</li>
<li><b><code>births</code>:</b> total number of births at the hospital</li>
<li><b><code>deaths</code>:</b> number of maternal deaths at the hospital</li>
<li><b><code>hospital</code>:</b> hospital (either <code>Vienna</code> or <code>Dublin</code>). At the Vienna General Hospital where Dr. Semmelweis worked, doctors began performing pathological autopsies in 1823. At the Dublin Rotunda Hospital, doctors did not perform pathological autopsies at all.</li>
</ul>
</div>
<p><small><sup>1</sup><a href="http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/the%20etiology,%20concept%20and%20prophylaxis%20of%20childbed%20fever.pdf">Ignaz Semmelweis: The etiology, concept, and prophylaxis of childbed fever.</a></small></p>

In [14]:
# The three questions to answer are the following:
# 1. What were the death rates for each year in both datasets?
# 2. In each clinic, what was the average death rate for the years before handwashing was introduced in 1847?
# 3. What were the average death rates in the Vienna General Hospital both before and after pathological autopsies were introduced in 1823?

In [15]:
# Load the dplyr and ggplot2 packages
library(dplyr)
library(ggplot2)

In [16]:
# Read the clinic_data.csv and hospital_data.csv datasets and store them in the clinic_data and hospital_data dataframes, respectively
clinic_data <- read.csv("datasets/clinic_data.csv")
hospital_data <- read.csv("datasets/hospital_data.csv")

In [17]:
# Print the clinic_data dataframe
clinic_data

year,births,deaths,clinic
<int>,<int>,<int>,<chr>
1833,3737,197,clinic_1
1834,2657,205,clinic_1
1835,2573,143,clinic_1
1836,2677,200,clinic_1
1837,2765,251,clinic_1
1838,2987,91,clinic_1
1839,2781,151,clinic_1
1840,2889,267,clinic_1
1841,3036,237,clinic_1
1842,3287,518,clinic_1


In [18]:
# Print the hospital_data dataframe
hospital_data

year,births,deaths,hospital
<int>,<int>,<int>,<chr>
1784,1261,11,Dublin
1785,1292,8,Dublin
1786,1351,8,Dublin
1787,1347,10,Dublin
1788,1469,23,Dublin
1789,1435,25,Dublin
1790,1546,12,Dublin
1791,1602,25,Dublin
1792,1631,10,Dublin
1793,1747,19,Dublin


In [19]:
# Add a death_rate column (deaths divided by births) to both the clinic_data and hospital_data dataframes.
# Each row already holds a single clinic-year (or hospital-year), so the division is row-wise and no group_by() is needed.
clinic_data <- clinic_data %>%
    mutate(death_rate = deaths / births)

hospital_data <- hospital_data %>%
    mutate(death_rate = deaths / births)

In [20]:
# Print the newly edited clinic_data dataframe
clinic_data

# The first half of the answer to Question 1 is given by the death_rate column in the clinic_data dataframe

year,births,deaths,clinic,death_rate
<int>,<int>,<int>,<chr>,<dbl>
1833,3737,197,clinic_1,0.052716082
1834,2657,205,clinic_1,0.077154686
1835,2573,143,clinic_1,0.055577147
1836,2677,200,clinic_1,0.074710497
1837,2765,251,clinic_1,0.090777577
1838,2987,91,clinic_1,0.03046535
1839,2781,151,clinic_1,0.054297015
1840,2889,267,clinic_1,0.092419522
1841,3036,237,clinic_1,0.078063241
1842,3287,518,clinic_1,0.157590508
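ggplot2 is loaded at the start of the notebook but not used. A minimal sketch of a line plot in the spirit of the one in the introduction: yearly death rates drawn as one line per clinic, with a dashed marker at 1847, when handwashing began. `plot_death_rates()` and `clinic_1_sample` are hypothetical names introduced here; the runnable example uses five Clinic 1 rows copied from the printout above, and a comment shows the call for the full `clinic_data`.

```r
library(ggplot2)

# Hypothetical helper: one line per group of yearly death rates, plus a dashed
# vertical line marking the year a policy changed (e.g. 1847 for handwashing).
plot_death_rates <- function(df, group_col, change_year) {
  ggplot(df, aes(x = year, y = death_rate, color = .data[[group_col]])) +
    geom_line() +
    geom_point() +
    geom_vline(xintercept = change_year, linetype = "dashed") +
    labs(x = "Year", y = "Maternal death rate (deaths / births)", color = NULL)
}

# Clinic 1 rows for 1838-1842, copied from the clinic_data printout above
clinic_1_sample <- data.frame(
  year   = 1838:1842,
  births = c(2987, 2781, 2889, 3036, 3287),
  deaths = c(91, 151, 267, 237, 518),
  clinic = "clinic_1"
)
clinic_1_sample$death_rate <- clinic_1_sample$deaths / clinic_1_sample$births

p <- plot_death_rates(clinic_1_sample, "clinic", change_year = 1847)
p

# With the notebook's full data:
# plot_death_rates(clinic_data, "clinic", change_year = 1847)
```

Passing the grouping column by name (via the `.data` pronoun) lets the same helper plot `hospital_data` by `hospital` with a marker at 1823.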


In [21]:
# Print the newly edited hospital_data dataframe
hospital_data

# The second half of the answer to Question 1 is given by the death_rate column in the hospital_data dataframe

year,births,deaths,hospital,death_rate
<int>,<int>,<int>,<chr>,<dbl>
1784,1261,11,Dublin,0.008723236
1785,1292,8,Dublin,0.006191950
1786,1351,8,Dublin,0.005921540
1787,1347,10,Dublin,0.007423905
1788,1469,23,Dublin,0.015656909
1789,1435,25,Dublin,0.017421603
1790,1546,12,Dublin,0.007761966
1791,1602,25,Dublin,0.015605493
1792,1631,10,Dublin,0.006131208
1793,1747,19,Dublin,0.010875787
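The `death_rate` column is simply deaths divided by births for each row. As a spot check, the 1788 Dublin row works out as:

$$
\text{death\_rate}_{1788} = \frac{\text{deaths}}{\text{births}} = \frac{23}{1469} \approx 0.0157
$$

which matches the 0.015656909 printed above, or roughly 16 maternal deaths per 1,000 births.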


In [22]:
# Keep only the years before handwashing was introduced in 1847, group by clinic, average the yearly death rates per clinic, and store the result in the rate_by_clinic_pre_handwashing dataframe
rate_by_clinic_pre_handwashing <- clinic_data %>%
    filter(year < 1847) %>%
    group_by(clinic) %>%
    summarize(avg_rate = mean(death_rate))

`summarise()` ungrouping output (override with `.groups` argument)



In [23]:
# Print the rate_by_clinic_pre_handwashing dataframe
rate_by_clinic_pre_handwashing

# The answer to Question 2 is given by the avg_rate column of the rate_by_clinic_pre_handwashing dataframe

clinic,avg_rate
<chr>,<dbl>
clinic_1,0.07993925
clinic_2,0.04787381
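`avg_rate` is an unweighted mean of the yearly rates, so every year counts equally no matter how many births it had. A pooled rate, total deaths divided by total births, instead weights each year by its births, and the two usually differ a little. A minimal dplyr sketch comparing them; `clinic_sample` and `rate_comparison` are hypothetical names, and the data are only the ten Clinic 1 rows printed earlier, so the numbers will not match `avg_rate` above.

```r
library(dplyr)

# The ten Clinic 1 rows (1833-1842) shown in the clinic_data printout above
clinic_sample <- data.frame(
  year   = 1833:1842,
  births = c(3737, 2657, 2573, 2677, 2765, 2987, 2781, 2889, 3036, 3287),
  deaths = c(197, 205, 143, 200, 251, 91, 151, 267, 237, 518),
  clinic = "clinic_1"
)

rate_comparison <- clinic_sample %>%
  mutate(death_rate = deaths / births) %>%
  group_by(clinic) %>%
  summarize(
    mean_of_yearly_rates = mean(death_rate),          # each year weighted equally
    pooled_rate          = sum(deaths) / sum(births), # each birth weighted equally
    .groups = "drop"
  )

rate_comparison
```

On these ten rows the pooled rate is about 0.0769 and the mean of the yearly rates about 0.0764. Neither is wrong; they answer slightly different questions, so it is worth stating which one an "average death rate" refers to.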


In [24]:
# Add an autopsies_introduced column that is TRUE for years from 1823 onward and FALSE otherwise, group by that column, average the yearly death rates before and after autopsies were introduced, and store the result in the rate_by_autopsies_introduced dataframe

rate_by_autopsies_introduced <- hospital_data %>%
    mutate(autopsies_introduced = year >= 1823) %>%
    group_by(autopsies_introduced) %>%
    summarize(avg_rate = mean(death_rate))

`summarise()` ungrouping output (override with `.groups` argument)



In [25]:
# Print the rate_by_autopsies_introduced dataframe
rate_by_autopsies_introduced

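# The answer to Question 3 is given by the avg_rate column of the rate_by_autopsies_introduced dataframe.
# Note that hospital_data was not filtered by hospital, so these averages pool the Vienna and Dublin rows rather than describing Vienna alone.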

autopsies_introduced,avg_rate
<lgl>,<dbl>
False,0.01125877
True,0.03660768
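Question 3 asks about Vienna General Hospital specifically, while `hospital_data` also holds Dublin rows. One way to keep the two hospitals apart is to add `hospital` to the grouping. A minimal dplyr sketch, where `avg_rate_by_hospital()` and `dublin_sample` are hypothetical names; only Dublin rows appear in the printout above, so the runnable example uses those ten rows, and the Vienna figures would come from applying the helper to the full `hospital_data`.

```r
library(dplyr)

# Hypothetical helper: average yearly death rate per hospital, split at a cutoff year
avg_rate_by_hospital <- function(df, cutoff_year) {
  df %>%
    mutate(autopsies_introduced = year >= cutoff_year) %>%
    group_by(hospital, autopsies_introduced) %>%
    summarize(avg_rate = mean(deaths / births), .groups = "drop")
}

# Dublin rows 1784-1793, copied from the hospital_data printout above
dublin_sample <- data.frame(
  year     = 1784:1793,
  births   = c(1261, 1292, 1351, 1347, 1469, 1435, 1546, 1602, 1631, 1747),
  deaths   = c(11, 8, 8, 10, 23, 25, 12, 25, 10, 19),
  hospital = "Dublin"
)

avg_rate_by_hospital(dublin_sample, cutoff_year = 1823)

# With the notebook's full data, keep only Vienna to answer Question 3 directly:
# avg_rate_by_hospital(hospital_data, 1823) %>% filter(hospital == "Vienna")
```

Grouping by hospital and the cutoff flag together means each before/after average describes a single hospital, and the Dublin rows (where no pathological autopsies were performed) then serve as a comparison instead of diluting the Vienna figures.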
