In [1]:
%run ./resources/library.py
style_notebook()

# Notebook 2: Calculating the Mean Center Point for Mortality Locations

## Unpickle Dataframe Files

In [2]:
import pandas as pd

deaths_df = pd.read_pickle('outputs/deaths_df.pickle')
pumps_df = pd.read_pickle('outputs/pumps_df.pickle')

## Add the Mean Center Point to Triangulate Observations

### Three Equations

Assuming those who got sick and died drew their water from a pump within walking distance, let's add the mean center of all mortality locations, weighted by the number of deaths at each location. The formula for the weighted mean center is as follows (written in `Markdown` using [`LaTeX`](http://data-blog.udacity.com/posts/2016/10/latex-primer/)):

$\begin{align} \small Equation-1 && \normalsize\bar x _{weighted} = \normalsize\frac{\Sigma (x_{1..n} w_{1..n})}{\Sigma (w_{1..n})}\end{align}$

$\begin{align} \small Equation-2 && \normalsize \bar y _{weighted} = \normalsize\frac{\Sigma (y_{1..n} w_{1..n})}{\Sigma (w_{1..n})}\end{align}$

$\begin{align} \small Equation-3 && \normalsize {mean\ center} = \normalsize(\bar x _{weighted}, \bar y _{weighted}) \end{align}$
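
To make the three equations concrete, here is a minimal sketch in plain Python that applies them to only the first three rows of the deaths table shown later in this notebook. The variable names (`points`, `x_bar`, `y_bar`, `sample_mean_center`) are illustrative and are not used elsewhere in the notebook.

```python
# First three rows of the deaths table: (LON as x, LAT as y, DEATHS as w)
points = [
    (-0.137930, 51.513418, 3),
    (-0.137883, 51.513361, 2),
    (-0.137853, 51.513317, 1),
]

# Denominator shared by Equation 1 and Equation 2: the sum of the weights
total_weight = sum(w for _, _, w in points)

# Equation 1: weighted mean of x (longitude)
x_bar = sum(x * w for x, _, w in points) / total_weight

# Equation 2: weighted mean of y (latitude)
y_bar = sum(y * w for _, y, w in points) / total_weight

# Equation 3: the mean center as an (x, y) pair
sample_mean_center = (x_bar, y_bar)
print(sample_mean_center)
```

Because the first row carries three deaths, the result is pulled toward that location rather than sitting at the plain average of the three points.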

### Transforming Equations to Python Code

Substituting longitude values for x, latitude values for y, and deaths for w, we create two new columns in the `deaths_df` dataframe: `product_LON` (the numerator of Equation 1) and `product_LAT` (the numerator of Equation 2). Let's display the new dataframe.

In [3]:
# Add mean center marker weighted by deaths
deaths_df['product_LAT'] = deaths_df['LAT'] * deaths_df['DEATHS']
deaths_df['product_LON'] = deaths_df['LON'] * deaths_df['DEATHS']

# Let's copy this dataframe to a new one which we can save (pickle).
# .copy() creates an independent dataframe; plain assignment would only alias deaths_df.
mean_center_df = deaths_df.copy()

# Let's display it. Type the command to display mean_center_df below.
mean_center_df

Unnamed: 0,FID,DEATHS,LON,LAT,product_LAT,product_LON
0,0,3,-0.137930,51.513418,154.540254,-0.413790
1,1,2,-0.137883,51.513361,103.026722,-0.275766
2,2,1,-0.137853,51.513317,51.513317,-0.137853
3,3,1,-0.137812,51.513262,51.513262,-0.137812
4,4,4,-0.137767,51.513204,206.052816,-0.551068
5,5,2,-0.137537,51.513184,103.026368,-0.275074
6,6,2,-0.138200,51.513359,103.026718,-0.276400
7,7,2,-0.138045,51.513328,103.026656,-0.276090
8,8,3,-0.138276,51.513323,154.539969,-0.414828
9,9,2,-0.138223,51.513427,103.026854,-0.276446
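
In pandas, plain assignment (`new_df = old_df`) binds a second name to the same dataframe, whereas `.copy()` creates an independent one. The sketch below shows the difference on a toy dataframe. The names `toy_df`, `alias_df`, and `copy_df` and the values are hypothetical and for illustration only.

```python
import pandas as pd

# Toy dataframe: column names mirror this notebook, values are illustrative only
toy_df = pd.DataFrame({'DEATHS': [3, 2], 'LAT': [51.5, 51.6]})

alias_df = toy_df          # second name for the same object
copy_df = toy_df.copy()    # independent dataframe

# Add a column to the original after both names were created
toy_df['product_LAT'] = toy_df['LAT'] * toy_df['DEATHS']

print('product_LAT' in alias_df.columns)  # the alias sees the new column
print('product_LAT' in copy_df.columns)   # the copy does not
```

If the saved dataframe is meant to stay unchanged while `deaths_df` keeps evolving, `.copy()` is the safer choice.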


Let's pickle this dataframe.

In [4]:
mean_center_df.to_pickle("outputs/mean_center_df.pickle")

Let's obtain the mean center coordinates by calculating `mean_LON` and `mean_LAT` and then combining them. We will use the <font color='red'>`sum()`</font> function of the `numpy` package.

In [5]:
import numpy as np

Let's calculate Equation 1.

$\begin{align} \small Equation-1 && \normalsize\bar x _{weighted} = \normalsize\frac{\Sigma (x_{1..n} w_{1..n})}{\Sigma (w_{1..n})}\end{align}$

In [6]:
# This corresponds to the x bar, weighted, in the mean center formula
# Equation 1
mean_LON = np.sum(deaths_df['product_LON'])/np.sum(deaths_df['DEATHS'])

Let's calculate Equation 2.

$\begin{align} \small Equation-2 && \normalsize \bar y _{weighted} = \normalsize\frac{\Sigma (y_{1..n} w_{1..n})}{\Sigma (w_{1..n})}\end{align}$

In [7]:
# This corresponds to y bar, weighted, in the mean center formula
# Equation 2
mean_LAT = np.sum(deaths_df['product_LAT'])/np.sum(deaths_df['DEATHS'])

Let's put together Equation 3 and display the mean center point.

$\begin{align} \small Equation-3 && \normalsize {mean\ center} = \normalsize(\bar x _{weighted}, \bar y _{weighted}) \end{align}$

In [8]:
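# Let's put these two together as coordinates
# Equation 3
# Note: Equation 3 is written as (x, y) = (LON, LAT), but Folium expects
# (LAT, LON), so latitude goes first here.
mean_center_POINT = (mean_LAT, mean_LON)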

mean_center_POINT

(51.51339831083845, -0.1364029734151329)
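
As a cross-check, `numpy.average` accepts a `weights` argument and should give the same result as the sum-of-products form used above. The sketch below is self-contained and uses only the ten rows displayed earlier, so its values will differ from the full-dataset mean center. The array names (`lon`, `lat`, `deaths`) and result names are illustrative.

```python
import numpy as np

# The ten rows displayed earlier in this notebook
deaths = np.array([3, 2, 1, 1, 4, 2, 2, 2, 3, 2])
lon = np.array([-0.137930, -0.137883, -0.137853, -0.137812, -0.137767,
                -0.137537, -0.138200, -0.138045, -0.138276, -0.138223])
lat = np.array([51.513418, 51.513361, 51.513317, 51.513262, 51.513204,
                51.513184, 51.513359, 51.513328, 51.513323, 51.513427])

# Manual form: sum of products divided by sum of weights (Equations 1 and 2)
manual_lon = np.sum(lon * deaths) / np.sum(deaths)
manual_lat = np.sum(lat * deaths) / np.sum(deaths)

# The same calculation in one call per coordinate
avg_lon = np.average(lon, weights=deaths)
avg_lat = np.average(lat, weights=deaths)

print(np.isclose(manual_lon, avg_lon), np.isclose(manual_lat, avg_lat))
```

With the full dataframe, the equivalent call would be `np.average(deaths_df['LON'], weights=deaths_df['DEATHS'])`, which avoids creating the intermediate product columns.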

### Recreate the Notebook 1 map, `map1`

We can copy-paste all the code from Notebook 1.

In [9]:
import pandas as pd
import folium

deaths_df = pd.read_csv('resources/cholera_deaths.csv')
pumps_df = pd.read_csv('resources/johnsnow_pumps.csv')

SOHO_COORDINATES = (51.513578, -0.136722)

map1 = folium.Map(location=SOHO_COORDINATES, zoom_start=17)

folium.TileLayer('stamentoner').add_to(map1)

locationlist = deaths_df[["LAT","LON"]].values.tolist()
radiuslist = deaths_df[["DEATHS"]].values.tolist()

for i in range(0, len(locationlist)):
    popup = folium.Popup('Location: '+'('+str(locationlist[i][0])+\
                         ', '+str(locationlist[i][1])+')'+\
                         '<br/>'+\
                        'Deaths: '+ str(radiuslist[i][0]))
    # radiuslist holds one-element lists, so pass the scalar death count as the radius
    folium.RegularPolygonMarker(locationlist[i], \
                                fill_color="red", \
                                number_of_sides=12, \
                                popup=popup, \
                                radius=radiuslist[i][0]).add_to(map1)

for each in pumps_df.iterrows():
    popup = folium.Popup('Pump ID: '+str(each[1]['FID'].astype(int))+\
                         '<br/>'+\
                         'Location: '+'('+str(each[1]['LAT'])+\
                         ', '+str(each[1]['LON'])+')')
    #add each water pump to map1
    folium.RegularPolygonMarker([each[1]['LAT'],each[1]['LON']], \
                                fill_color='blue', \
                                number_of_sides=4, \
                                popup=popup, \
                                radius=10).add_to(map1)

map1

Let's plug the `mean_center_POINT` value into `map1` as a Folium `RegularPolygonMarker` to see where the mean center of the case locations, weighted by the number of deaths at each location, falls on the map.

In [10]:
folium.RegularPolygonMarker(mean_center_POINT, \
                        fill_color="yellowgreen", \
                        number_of_sides=12, \
                        popup=folium.Popup('Mean Center Point: '+\
                                           str(mean_center_POINT)), \
                        radius=10).add_to(map1)
map1
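
To put a number on what the map shows, we can measure how far the mean center lies from another point. The sketch below uses the haversine formula, which is an assumption of this example and not part of the original notebook, to compute the distance in metres between the mean center and `SOHO_COORDINATES`. The function name `haversine_m` and the constant `EARTH_RADIUS_M` are hypothetical helpers.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)

def haversine_m(point_a, point_b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Values taken from this notebook, both in (lat, lon) order
mean_center_point = (51.51339831083845, -0.1364029734151329)
soho_coordinates = (51.513578, -0.136722)

distance_m = haversine_m(mean_center_point, soho_coordinates)
print(round(distance_m, 1))
```

The same function could be applied to the `LAT` and `LON` of each row in `pumps_df` to see which pump lies closest to the mean center.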

### Applying Amplified Cognition

Let's change the tile layer to `cartodbpositron` for amplified cognition.

In [11]:
# Change the theme to "cartodbpositron" below
folium.TileLayer('cartodbpositron').add_to(map1)

# Display map1
map1

## Congratulations!

You have:
1. Recreated the famous John Snow Cholera map within a Jupyter notebook
2. Added Mean Center analysis to triangulate observations of pump and mortality locations on the map

## References


### Weighted Mean Center

1. https://glenbambrick.com/tag/weighted-mean-center/
2. https://docs.scipy.org/doc/numpy/index.html
3. http://data-blog.udacity.com/posts/2016/10/latex-primer/

*For case study suggestions for improvement, please contact Herman Tolentino, Jan MacGregor, James Tobias or Zhanar Haimovich.*