# Finding Interesting Patterns in Data

Finding the origins of cholera's spread was a contentious issue in the 1800s. Before even seeing the problem in the data, the people of that time could see cholera **all around them**. As friends and relatives grew gravely ill, it became urgent to discover **why** this was happening in hopes of putting a stop to it. 

People, including the local media, had different ideas about what could be causing cholera. For instance, take a look at the following political cartoon of the time: 
<br>

<table><tr>
    <td> <img src="imgs/king_cholera.png" alt="Drawing" style="width: 600px;"/> </td>
</tr></table>

<br>

In [None]:
# Import necessary libraries

import pandas as pd
import matplotlib.pyplot as plt

Nineteenth-century London was divided into districts, much like Chicago is divided into neighborhoods. These districts were grouped into regions, just as Chicago's neighborhoods are (South Side, North Side, West Side, Far South Side, etc.). There were five regions in nineteenth-century London: North, South, East, West, and Central.

<br>

<img src="https://i2.wp.com/londontopia.net/wp-content/uploads/2014/08/london-county.jpg" width=600>

To begin, let's look at some data about a particularly bad cholera outbreak in London in 1849. Demographic data and data about the outbreak are contained in the `outbreak_of_1849.csv` file in the `data` folder.

In [None]:
outbreak = pd.read_csv("data/outbreak_of_1849.csv")
outbreak.head()

We will use mortality rate as our **outcome variable**. It is calculated as the number of deaths per 1,000 residents:

$$\text{mortality rate} = \frac{\text{deaths}}{\text{population}} \times 1000$$

Make a new column of data called `deaths_per_1000` that includes the mortality rate of each district.
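
As a starting point, here is a minimal sketch of the calculation on a tiny made-up table. The column names `deaths` and `population` are assumptions taken from the formula, so check `outbreak.head()` for the actual names in the file. The numbers below are invented for illustration and are not values from the 1849 data.

```python
import pandas as pd

# Made-up toy rows for illustration only; NOT values from outbreak_of_1849.csv.
# The column names `deaths` and `population` are assumptions.
toy = pd.DataFrame({
    "district": ["A", "B", "C"],
    "deaths": [50, 120, 30],
    "population": [10000, 20000, 15000],
})

# Mortality rate: deaths divided by population, scaled to deaths per 1,000 residents
toy["deaths_per_1000"] = toy["deaths"] / toy["population"] * 1000
toy
```

The same one-line assignment should work on the `outbreak` DataFrame once the column names are adjusted to match the file.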

While there should be only one outcome variable, there can be multiple **explanatory variables**.

Our job now is to explore explanatory variables that could potentially explain differences in mortality rate among the different districts.

### Is there a relationship between mortality and where people live?

Use the `groupby()` method to average the mortality rates for each `region` of London.
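
One way this could look is sketched below on a small made-up table. It assumes the `deaths_per_1000` column described earlier already exists alongside `region`. The values are invented for illustration only.

```python
import pandas as pd

# Made-up toy rows for illustration only; NOT values from outbreak_of_1849.csv.
toy = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "deaths_per_1000": [2.0, 4.0, 9.0, 11.0],
})

# Group the districts by region, then average the mortality rate within each group
region_means = toy.groupby("region")["deaths_per_1000"].mean()
region_means
```

Selecting the single column before calling `.mean()` keeps the result focused on the outcome variable instead of averaging every numeric column.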

## Using Scatter Plots to Find Relationships

Scatter plots are especially useful for studying the relationship between an outcome variable and an explanatory variable.

**Make a scatter plot showing the relationship between a potential explanatory variable and the mortality rate.**
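
The sketch below shows one possible shape for such a plot, using a made-up table. The column name `explanatory_variable` is a hypothetical placeholder, so swap in a real column from `outbreak`. By convention, the explanatory variable goes on the x-axis and the outcome variable on the y-axis.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up toy rows for illustration only; NOT values from outbreak_of_1849.csv.
# `explanatory_variable` is a hypothetical placeholder column name.
toy = pd.DataFrame({
    "explanatory_variable": [1, 2, 3, 4, 5],
    "deaths_per_1000": [12.0, 9.5, 7.0, 4.5, 3.0],
})

fig, ax = plt.subplots()

# Explanatory variable on the x-axis, outcome variable on the y-axis
ax.scatter(toy["explanatory_variable"], toy["deaths_per_1000"])
ax.set_xlabel("explanatory_variable")
ax.set_ylabel("deaths_per_1000")
ax.set_title("Mortality rate vs. explanatory variable (toy data)")
```

In a notebook the figure typically renders below the cell automatically. Labeling both axes makes it much easier to compare several plots side by side later.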

### Continue Exploring

Generate additional scatter plots for other explanatory variables, or use the `.groupby()` method to compare groups.

**Challenge Yourself: Develop new explanatory variables from the data you have.**
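
For example, a new variable can be built by combining two existing columns. The sketch below derives a population density from hypothetical `population` and `area` columns. Both names and all values are assumptions made for illustration, and the real file may not contain an area column at all.

```python
import pandas as pd

# Made-up toy rows for illustration only; NOT values from outbreak_of_1849.csv.
# `population` and `area` are hypothetical column names.
toy = pd.DataFrame({
    "district": ["A", "B", "C"],
    "population": [10000, 20000, 15000],
    "area": [100, 400, 150],
})

# Derived explanatory variable: people per unit of area
toy["population_density"] = toy["population"] / toy["area"]
toy
```

A derived column like this can then be used in a scatter plot or a `.groupby()` comparison exactly like any column that came with the data.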

## 📓 Reflection 📓

Did you notice any interesting or unexpected patterns in the data? What do you think they suggest, if anything?