## 1. The Discovery of Handwashing
<p>In the mid-1800s, Dr. Ignaz Semmelweis was an obstetrician at Vienna General Hospital. At the time, maternal death due to puerperal fever was common, but he was particularly concerned that the death rate in his clinic (Clinic 1) was much higher than the death rate in another clinic at Vienna General Hospital (Clinic 2). <em>So what was the difference between these two clinics?</em> Doctors and midwives worked in Clinic 1, while only midwives worked in Clinic 2. This led Dr. Semmelweis to hypothesize that doctors carried deadly "cadaverous particles" from their autopsies to their patients in Clinic 1.</p>
<p>In 1847, Dr. Semmelweis instituted a policy requiring doctors to wash their hands with a chlorine solution between performing autopsies and seeing patients. The maternal mortality rate decreased drastically, as seen in the plot below. Sadly, germ theory (the idea that there are particles that cause disease) was not widely accepted at the time, so his hypothesis was rejected by most doctors.</p>
<p><img src="https://assets.datacamp.com/production/project_1187/img/semmelweis_plot.png" alt="Line plot of maternal mortality rate in Clinic 1 at Vienna General Hospital" width="600px"></p>
<p>The two datasets you will use are from Dr. Semmelweis's original 1861 publication<sup>1</sup>. Here are the details:</p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
<div style="font-size:20px"><b>datasets/clinic_data.csv</b></div>
This contains yearly clinic-level data on births and maternal deaths in each of the two maternity clinics at Vienna General Hospital.
<ul>
<li><b><code>year</code>:</b> each year from 1833 to 1858</li>
<li><b><code>births</code>:</b> total number of births in the clinic</li>
<li><b><code>deaths</code>:</b> number of maternal deaths in the clinic</li>
<li><b><code>clinic</code>:</b> clinic (either <code>clinic_1</code> or <code>clinic_2</code>). Doctors and midwives worked in Clinic 1, while only midwives worked in Clinic 2.</li>
</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
<div style="font-size:20px"><b>datasets/hospital_data.csv</b></div>
This contains yearly hospital-level data on births and maternal deaths. 
<ul>
<li><b><code>year</code>:</b> each year from 1784 to 1848</li>
<li><b><code>births</code>:</b> total number of births at the hospital</li>
<li><b><code>deaths</code>:</b> number of maternal deaths at the hospital</li>
<li><b><code>hospital</code>:</b> hospital (either <code>Vienna</code> or <code>Dublin</code>). At the Vienna General Hospital where Dr. Semmelweis worked, doctors began performing pathological autopsies in 1823. At the Dublin Rotunda Hospital, doctors did not perform pathological autopsies at all.</li>
</ul>
</div>
<p><small><sup>1</sup><a href="http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/the%20etiology,%20concept%20and%20prophylaxis%20of%20childbed%20fever.pdf">Ignaz Semmelweis: The etiology, concept, and prophylaxis of childbed fever.</a></small></p>

Your questions are as follows:

1.  What were the death rates for each year in both datasets? Add a `death_rate` column to both `clinic_data` and `hospital_data`. This can be calculated by dividing `deaths` by `births`.

2.  In each clinic, what was the average death rate for the years before handwashing was introduced in 1847? Save your answer as a data frame called `rate_by_clinic_pre_handwashing`, containing two rows (one for each clinic) and two columns called `clinic` and `avg_rate`.

3.  What were the average death rates in the Vienna General Hospital both before and after pathological autopsies were introduced in 1823? Save your answer as a data frame called `rate_by_autopsies_introduced`, containing two columns called `autopsies_introduced` (which takes TRUE/FALSE) and `avg_rate`.


In [4]:
# Load the data
import pandas as pd

clinic_data = pd.read_csv('datasets/clinic_data.csv')
hospital_data = pd.read_csv('datasets/hospital_data.csv')

# Preview the first rows of both data frames
clinic_data.head(), hospital_data.head()


(   year  births  deaths    clinic
 0  1833    3737     197  clinic_1
 1  1834    2657     205  clinic_1
 2  1835    2573     143  clinic_1
 3  1836    2677     200  clinic_1
 4  1837    2765     251  clinic_1,
    year  births  deaths hospital
 0  1784    1261      11   Dublin
 1  1785    1292       8   Dublin
 2  1786    1351       8   Dublin
 3  1787    1347      10   Dublin
 4  1788    1469      23   Dublin)

In [7]:
# Compute the death rate as a percentage (question 1)
# `procenat_smrtnosti` ("mortality percentage") holds deaths / births * 100

clinic_data['procenat_smrtnosti'] = clinic_data['deaths'] / clinic_data['births'] * 100
hospital_data['procenat_smrtnosti'] = hospital_data['deaths'] / hospital_data['births'] * 100
clinic_data.head(), hospital_data.head()

(   year  births  deaths    clinic  procenat_smrtnosti
 0  1833    3737     197  clinic_1            5.271608
 1  1834    2657     205  clinic_1            7.715469
 2  1835    2573     143  clinic_1            5.557715
 3  1836    2677     200  clinic_1            7.471050
 4  1837    2765     251  clinic_1            9.077758,
    year  births  deaths hospital  procenat_smrtnosti
 0  1784    1261      11   Dublin            0.872324
 1  1785    1292       8   Dublin            0.619195
 2  1786    1351       8   Dublin            0.592154
 3  1787    1347      10   Dublin            0.742390
 4  1788    1469      23   Dublin            1.565691)
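
Question 1 asks for a column named `death_rate` holding deaths divided by births, that is, a fraction rather than a percentage. A minimal sketch of that variant follows. It uses the first three `clinic_data` rows printed above in place of the CSV file, and the `add_death_rate` helper name is hypothetical.

```python
import pandas as pd


def add_death_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a death_rate column (deaths / births)."""
    return df.assign(death_rate=df["deaths"] / df["births"])


# First rows of clinic_data as printed above; the notebook itself reads the CSV.
clinic_sample = pd.DataFrame(
    {
        "year": [1833, 1834, 1835],
        "births": [3737, 2657, 2573],
        "deaths": [197, 205, 143],
        "clinic": ["clinic_1"] * 3,
    }
)

clinic_sample = add_death_rate(clinic_sample)
print(clinic_sample)
```

The same helper could be applied to `hospital_data`, since both data frames share the `births` and `deaths` columns.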

In [19]:
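# Compute the average mortality for each clinic (question 2)

# Keep only the years before handwashing was introduced in 1847
rate_by_clinic_pre_handwashing = clinic_data[clinic_data['year'] < 1847]
# Note: this averages the raw `deaths` counts per clinic, not a deaths / births rate
a = rate_by_clinic_pre_handwashing.groupby('clinic')['deaths'].mean()
a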

clinic
clinic_1    249.571429
clinic_2    101.571429
Name: deaths, dtype: float64
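
The figures above are mean yearly death counts. Question 2 asks instead for the average death rate per clinic, stored in a data frame with the columns `clinic` and `avg_rate`. The following is a sketch of one way to get that shape, assuming `death_rate` is defined as deaths divided by births. The `toy` data frame is made up for illustration and does not come from the Semmelweis dataset.

```python
import pandas as pd

# Made-up illustrative data; the real analysis would use clinic_data.
toy = pd.DataFrame(
    {
        "year": [1845, 1846, 1847, 1845, 1846, 1847],
        "births": [100, 200, 100, 100, 100, 100],
        "deaths": [10, 10, 1, 2, 4, 1],
        "clinic": ["clinic_1"] * 3 + ["clinic_2"] * 3,
    }
)
toy["death_rate"] = toy["deaths"] / toy["births"]

# Keep the years before 1847, then average the yearly rates within each clinic.
rate_by_clinic_pre_handwashing = (
    toy[toy["year"] < 1847]
    .groupby("clinic")["death_rate"]
    .mean()
    .rename("avg_rate")
    .reset_index()
)
print(rate_by_clinic_pre_handwashing)
```

This takes an unweighted mean of the yearly rates, which is how the question is worded. Dividing total deaths by total births over the same years would weight busy years more heavily and can give a different number.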

In [41]:
# Compute the average mortality in the Vienna General Hospital (question 3)
lista = hospital_data.groupby('hospital')[['year', 'deaths']]
lista = lista.get_group('Vienna')
# Note: both means are taken over raw `deaths` counts, and the strict
# comparisons leave the year 1823 itself out of both groups
vienna_pre = lista[lista.year < 1823]['deaths'].mean()
vienna_posle = lista[lista.year > 1823]['deaths'].mean()  # "posle" = after


# Assemble the before/after averages into a two-row data frame
rate_by_autopsies_introduced = pd.DataFrame(columns=['autopsies_introduced', 'avg_rates'])
rate_by_autopsies_introduced['autopsies_introduced'] = [False, True]
rate_by_autopsies_introduced['avg_rates'] = [vienna_pre, vienna_posle]

rate_by_autopsies_introduced



   autopsies_introduced  avg_rates
0                 False      23.00
1                  True     263.08
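
The values above are average death counts. Question 3 asks for average death rates in a column named `avg_rate`. The following is a sketch of a grouped version, under two assumptions: `death_rate` is deaths divided by births, and 1823 itself counts as a year with autopsies (`year >= 1823`). The `toy` data frame is made up for illustration.

```python
import pandas as pd

# Made-up illustrative data; the real analysis would use hospital_data.
toy = pd.DataFrame(
    {
        "year": [1821, 1822, 1823, 1824, 1822, 1824],
        "births": [100, 100, 100, 100, 100, 100],
        "deaths": [1, 3, 6, 10, 50, 50],
        "hospital": ["Vienna"] * 4 + ["Dublin"] * 2,
    }
)

# Restrict to Vienna, then derive the rate and the before/after flag.
vienna = toy[toy["hospital"] == "Vienna"].copy()
vienna["death_rate"] = vienna["deaths"] / vienna["births"]
# Assumption: 1823 is treated as the first year with autopsies.
vienna["autopsies_introduced"] = vienna["year"] >= 1823

# Average the yearly rates within each period.
rate_by_autopsies_introduced = (
    vienna.groupby("autopsies_introduced")["death_rate"]
    .mean()
    .rename("avg_rate")
    .reset_index()
)
print(rate_by_autopsies_introduced)
```

Grouping on a boolean column keeps every Vienna year in exactly one of the two periods, so no year is dropped at the boundary.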
