# Dr. Semmelweis Handwashing Survey Data

In 1847 the Hungarian physician Ignaz Semmelweis made a breakthrough discovery: handwashing. Contaminated hands were a major cause of childbed fever, and by enforcing handwashing at his hospital he saved hundreds of lives.

This data is based on the research of Dr. Semmelweis on Handwashing.

The data comes as two tables of births and deaths: one is yearly and broken down by clinic, the other is monthly.

Let's explore the data.

In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, HTML
pd.options.display.max_columns = None
pd.options.display.max_rows = None

CSS = """
          .output {
              flex-direction: row;
          } 
          """
HTML('<style>{}</style>'.format(CSS))

In [None]:
monthly_deaths = pd.read_csv("../input/monthly_deaths.csv")
yearly_deaths = pd.read_csv("../input/yearly_deaths_by_clinic.csv")

First, we check the shape and column types of the data.

In [None]:
print("monthly deaths shape: ",monthly_deaths.shape)
print("yearly deaths shape: ", yearly_deaths.shape)

This is a very small dataset.

In [None]:
print("Monthly deaths column types: \n",monthly_deaths.dtypes)
display(monthly_deaths.head())

In [None]:
print("Yearly deaths column types: \n", yearly_deaths.dtypes)
display(yearly_deaths.head())

In [None]:
print(len(monthly_deaths.date.unique()) == monthly_deaths.date.count())

In the monthly deaths table the first column, date, is a unique key for the table, but its column type is object.

So we convert it to datetime.

In [None]:
monthly_deaths['date'] = pd.to_datetime(monthly_deaths['date'])
display(monthly_deaths.dtypes)
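The uniqueness check and the datetime conversion above can be sketched on a tiny stand-in table. The frame `toy` and its values are made up for illustration and are not taken from the dataset; `Series.is_unique` is used here as a shorter alternative to comparing `len(unique())` with `count()`.

```python
import pandas as pd

# Toy stand-in for monthly_deaths; all values are made up for illustration.
toy = pd.DataFrame({
    "date": ["1841-01-01", "1841-02-01", "1841-03-01"],
    "births": [100, 120, 110],
    "deaths": [10, 6, 11],
})

# True when no date appears twice, i.e. the column can serve as a key.
date_is_key = toy["date"].is_unique

# The column starts out as plain strings; convert it to real timestamps.
toy["date"] = pd.to_datetime(toy["date"])
is_datetime = pd.api.types.is_datetime64_any_dtype(toy["date"])

print(date_is_key, is_datetime)
```

Once the column holds timestamps, comparisons such as `toy["date"] > pd.Timestamp("1841-01-15")` work as date comparisons rather than string comparisons.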

In [None]:
display(monthly_deaths.describe())
display(yearly_deaths.describe())

In [None]:
display(yearly_deaths.clinic.unique())

In [None]:
display(monthly_deaths.isnull().any())
display(yearly_deaths.isnull().any())

So we don't have any null values in our tables.

The yearly data covers two clinics: 'clinic 1' and 'clinic 2'.

According to the articles I found, the first of the two clinics that Dr. Semmelweis studied was staffed entirely by male doctors and medical students, while the second was staffed only by midwives.

The clinic with doctors had a higher death rate than the clinic with midwives.

Now that we know our data is clean, we can start exploring the dataset.

In [None]:
plt.plot(monthly_deaths.date, monthly_deaths.births, color='g', label="births")
plt.plot(monthly_deaths.date, monthly_deaths.deaths, color='r', label="deaths")
plt.xlabel("Date")
plt.ylabel("people count")
plt.legend(loc="upper left")
plt.suptitle("Fig. 1: Total number of births and deaths monthly")
plt.show()

Next, we plot the number of deaths as a percentage of the number of births in each month.

In [None]:
monthly_deaths['percent'] = monthly_deaths.deaths *100 / monthly_deaths.births
plt.plot(monthly_deaths.date, monthly_deaths.percent, color='black', label="deaths percentage on births")
plt.ylabel("percentage")
plt.xlabel("Date")
plt.ylim(0,100)
plt.yticks(range(0, 101, 10))
plt.grid(linestyle='dotted', linewidth=1)
plt.legend(loc="upper right")
plt.suptitle("Fig. 2: Death %age on births monthly")
plt.show()

The rate fluctuates a lot from month to month, and we can see that the maximum value is almost 30%.

Next, we explore the yearly data for each clinic.
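Reading the peak off a plot is imprecise, so the month with the highest rate can also be looked up directly. The following is a minimal sketch on made-up numbers (the frame `toy` is hypothetical, not the real table); the same `idxmax` lookup would apply to `monthly_deaths` once the `percent` column exists.

```python
import pandas as pd

# Toy stand-in for monthly_deaths; all values are made up for illustration.
toy = pd.DataFrame({
    "date": pd.to_datetime(["1841-01-01", "1841-02-01", "1841-03-01"]),
    "births": [200, 250, 100],
    "deaths": [20, 10, 30],
})

# Deaths as a percentage of births, computed the same way as in the notebook.
toy["percent"] = toy["deaths"] * 100 / toy["births"]

# idxmax returns the index label of the largest value; loc fetches that row.
peak = toy.loc[toy["percent"].idxmax()]

print(peak["date"].date(), peak["percent"])
```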

In [None]:
yearly_deaths['percent'] = yearly_deaths.deaths *100/ yearly_deaths.births
clinic1 = yearly_deaths[yearly_deaths['clinic']=='clinic 1']
clinic2 = yearly_deaths[yearly_deaths['clinic']=='clinic 2']
print("total records in clinic 1: ", len(clinic1))
print("total records in clinic 2: ", len(clinic2))

In [None]:
display(clinic1.describe())
display(clinic2.describe())

We can see in the outputs above that both clinics have the same number of records, covering the years 1841 to 1846.

So Dr. Semmelweis compared data from the two clinics over the same time frame, with clinic 1 having a higher death rate than clinic 2.

In [None]:
plt.plot(clinic1.year, clinic1.births, label="Clinic 1")
plt.plot(clinic2.year, clinic2.births, label="Clinic 2")
plt.legend(loc="best")
plt.ylabel("number of births")
plt.xlabel("Year")
plt.suptitle("Fig. 3: Total number of births yearly in both clinics")
plt.show()

In [None]:
plt.plot(clinic1.year, clinic1.deaths, label="Clinic 1")
plt.plot(clinic2.year, clinic2.deaths, label="Clinic 2")
plt.legend(loc="best")
plt.ylabel("number of deaths")
plt.xlabel("Year")
plt.suptitle("Fig. 4: Total number of deaths yearly in both clinics")
plt.show()

In [None]:
plt.plot(clinic1.year, clinic1.percent, label="Clinic 1")
plt.plot(clinic2.year, clinic2.percent, label="Clinic 2")
plt.legend(loc="best")
plt.ylabel("percent of deaths on births")
plt.xlabel("Year")
plt.suptitle("Fig. 5: Death %age on births in both clinics")
plt.show()

In [None]:
avg_c1 = clinic1.deaths.sum() *100/ clinic1.births.sum()
print("average rate of death in clinic1 :", avg_c1)

In [None]:
avg_c2 = clinic2.deaths.sum() *100/ clinic2.births.sum()
print("average rate of death in clinic2 :", avg_c2)
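The two averages above are pooled rates: total deaths divided by total births. That is not the same as averaging the yearly percentages, which gives every year equal weight regardless of how many births it had. A small sketch on made-up numbers (the frame `toy` is hypothetical) shows the difference and computes both clinics in one `groupby`:

```python
import pandas as pd

# Toy stand-in for yearly_deaths; all values are made up for illustration.
toy = pd.DataFrame({
    "year": [1841, 1842, 1841, 1842],
    "clinic": ["clinic 1", "clinic 1", "clinic 2", "clinic 2"],
    "births": [1000, 3000, 2000, 2000],
    "deaths": [100, 150, 40, 60],
})

# Pooled rate: sum deaths and births per clinic first, then divide.
totals = toy.groupby("clinic")[["births", "deaths"]].sum()
pooled = totals["deaths"] * 100 / totals["births"]

# Mean of yearly percentages: divide first, then average per clinic.
toy["percent"] = toy["deaths"] * 100 / toy["births"]
mean_of_years = toy.groupby("clinic")["percent"].mean()

print(pooled)
print(mean_of_years)
```

In this toy data the two measures differ for clinic 1 because its two years have very different numbers of births, and agree for clinic 2 because its years are the same size.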

To figure out why more women died in clinic 1, Dr. Semmelweis tried many experiments. He had women in clinic 1 give birth on their sides, as in clinic 2. He asked the priest not to ring his bell when walking among the patients in clinic 1. But nothing changed.

Then a pathologist pricked his finger during an autopsy on a patient who had died of childbed fever, and died himself. Semmelweis observed that the symptoms of the pathologist and the patient were the same. The big difference between the two clinics was that the doctors also performed autopsies.

So he hypothesized that doctors and students who performed autopsies carried "cadaverous particles" from the corpses, and that when they delivered babies these particles got inside the women's bodies, making them sick and killing them.

In [None]:
# Split the monthly data at the end of 1847: December 1847 is the last month in the first group.
total_1847 = monthly_deaths[monthly_deaths.date <= pd.to_datetime('1847-12-01')]
total_1848 = monthly_deaths[monthly_deaths.date > pd.to_datetime('1847-12-01')]
print("average death rate through 1847 : ", (total_1847.deaths.sum()*100/ total_1847.births.sum()))
print("average death rate after 1847 : ", (total_1848.deaths.sum()*100/ total_1848.births.sum()))

This was when he ordered everyone to wash their hands with a chlorine solution (chlorine was chosen to kill the smell). As we can see in Fig. 2, from 1847 onwards the death rate decreased significantly, from almost 10% to 2%.
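The before/after comparison depends on where the data is split, so it can help to make the cutoff an explicit parameter. The helper `death_rate_split` below is a hypothetical name introduced for this sketch, and the frame `toy` holds made-up values; the cutoff shown matches the one in the cell above, but any date could be passed to test how sensitive the result is.

```python
import pandas as pd

def death_rate_split(df, cutoff):
    """Return pooled death rates (%) for rows up to and after `cutoff`."""
    cutoff = pd.Timestamp(cutoff)
    before = df[df["date"] <= cutoff]
    after = df[df["date"] > cutoff]

    def rate(part):
        # Pooled rate: total deaths over total births in this part.
        return part["deaths"].sum() * 100 / part["births"].sum()

    return rate(before), rate(after)

# Toy stand-in for monthly_deaths; all values are made up for illustration.
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "1847-10-01", "1847-11-01", "1847-12-01", "1848-01-01", "1848-02-01",
    ]),
    "births": [200, 200, 100, 250, 250],
    "deaths": [30, 30, 15, 10, 10],
})

before_rate, after_rate = death_rate_split(toy, "1847-12-01")
print(before_rate, after_rate)
```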

References:

[1] https://www.npr.org/sections/health-shots/2015/01/12/375663920/the-doctor-who-championed-hand-washing-and-saved-women-s-lives

[2] https://en.wikipedia.org/wiki/Ignaz_Semmelweis