# Mapping and merging

In this notebook, we build a shapefile and merge it with disease data. We used old maps to create a shapefile of the old wards of New York City, which can be found in the project's Data folder.

Let's load the libraries and see how it goes:

In [1]:
import geopandas as gpd
import mplleaflet
import pandas as pd
import os

from geopandas import GeoDataFrame
from IPython.display import HTML

%matplotlib inline

In [2]:
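workdir = os.getcwd()
try:
    # Move into the project's Data folder, where the shapefile and the Stata file are read from
    os.chdir(os.path.join(workdir, 'Data'))
except FileNotFoundError:
    # No Data subfolder here, so we are presumably inside it already
    print("already there...")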

In [3]:
Wards = gpd.read_file('NYCWards1900.shp')

Just a check to make sure everything is working...

In [5]:
Wards.plot()
mplleaflet.show()

In [7]:
Wards.head()

| | cartodb_id | descriptio | geometry | name |
|---|---|---|---|---|
| 0 | 60 | Tenth Ward, Manhattan | POLYGON ((-73.99347782 40.72163218, -73.986697... | MN Ward 10 |
| 1 | 3 | First Ward, Brooklyn | POLYGON ((-73.99060249328613 40.68911547188819... | BK Ward 1 |
| 2 | 6 | Fifth Ward, Brooklyn | POLYGON ((-73.9851737 40.6949075, -73.98435831... | BK Ward 5 |
| 3 | 22 | Twenty-First Ward, Brooklyn | POLYGON ((-73.95704268999999 40.69908851, -73.... | BK Ward 21 |
| 4 | 9 | Tenth Ward, Brooklyn | POLYGON ((-73.99150372 40.67500749, -73.996481... | BK Ward 10 |


The best key for merging appears to be the "name" code in the Wards data, so we will have to munge both sources until they line up. Here goes, after first reading in the disease data:

In [8]:
Data = pd.read_stata('190313mod.dta')

In [9]:
set(Data['borough'])

{'BR', 'BX', 'MA', 'QU', 'RI'}

In [10]:
splitnames =  [name.split() for name in Wards['name']]

In [11]:
Wards['borough'] = [split[0] for split in splitnames]
Wards['number']  = [split[2] for split in splitnames]
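
The same split can also be done in one step with pandas string methods. The sketch below assumes every name follows the `<borough> Ward <number>` pattern seen in the preview above; the toy `wards` frame is invented for illustration and stands in for the real `Wards` table.

```python
import pandas as pd

# Toy stand-in for the Wards table; only the 'name' column matters here.
wards = pd.DataFrame({'name': ['MN Ward 10', 'BK Ward 1', 'BK Ward 21']})

# Capture the borough code and the ward number with one regular expression.
parts = wards['name'].str.extract(r'^(?P<borough>\w+)\s+Ward\s+(?P<number>\d+)$')

wards['borough'] = parts['borough']
# Names that do not fit the pattern become missing values; the nullable
# Int64 dtype keeps them visible instead of raising an error.
wards['number'] = pd.to_numeric(parts['number']).astype('Int64')

print(wards)
```

Storing the number as an integer makes it directly comparable with a numeric ward number in the disease data, whereas the list-comprehension version keeps it as a string.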

Now that the boroughs have been split out of the tag that we gave them, we can change the tags so that they correspond with those in our data.

In [12]:
# Assign through a single .loc call (row mask, column) so the change lands on Wards itself
Wards.loc[Wards['borough'] == 'MN', 'borough'] = 'MA'
Wards.loc[Wards['borough'] == 'QN', 'borough'] = 'QU'
Wards.loc[Wards['borough'] == 'BK', 'borough'] = 'BR'
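
A more compact way to express this recoding is a single mapping passed to `Series.replace`. This is only a sketch on a toy frame: the `recode` dictionary is a name introduced here, and the `'BX'` row is included to show what happens to codes that have no entry.

```python
import pandas as pd

# Toy stand-in for the 'borough' column of Wards.
wards = pd.DataFrame({'borough': ['MN', 'BK', 'QN', 'BX']})

# Shapefile code -> code used in the disease data (the three recodes above).
recode = {'MN': 'MA', 'QN': 'QU', 'BK': 'BR'}

# replace() leaves values without an entry (here 'BX') unchanged.
wards['borough'] = wards['borough'].replace(recode)

print(wards['borough'].tolist())
```

Because the result is assigned back to the column in one statement, no chained indexing is involved.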


In [14]:
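# Rows for Manhattan's Twelfth Ward entries from 1912 onward
Slice = Data[Data['wards'].str.contains("Twelfth")]
Slice = Slice[Slice['year'] > 1911]
Slice = Slice[Slice['borough'] == 'MA'].copy()  # explicit copy, so later edits do not touch Data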

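In these years the data report Manhattan's Twelfth Ward in four parts (TwelfthN, TwelfthC, TwelfthE and TwelfthW), while the shapefile has a single Twelfth Ward. There must be a better way to do this, but we will make the combined row manually: relabel the TwelfthN rows as Twelfth and overwrite their values with the four-part totals.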

In [15]:
Slice.loc[Slice['wards'] == 'TwelfthN', 'wards'] = 'Twelfth'
Slice.loc[Slice['wards'] == 'Twelfth', 'populationbycensusof1900'] = (205130+332692+103532+165294)
Slice.loc[Slice['wards'] == 'Twelfth', 'numberofpersonstotheacre'] = 86.599998

In [16]:
# Row masks for the combined Twelfth Ward in each year
is_1912 = (Slice['wards'] == 'Twelfth') & (Slice['year'] == 1912)
is_1913 = (Slice['wards'] == 'Twelfth') & (Slice['year'] == 1913)

# Each value is the sum over the four parts of the ward
Slice.loc[is_1912, 'typhoidfever'] = 13+23+18+6
Slice.loc[is_1913, 'typhoidfever'] = 8+17+17+12
Slice.loc[is_1912, 'malarialfever'] = 0
Slice.loc[is_1913, 'malarialfever'] = 2
Slice.loc[is_1912, 'measles'] = 54+23+9+6
Slice.loc[is_1913, 'measles'] = 55+15+4+4
Slice.loc[is_1912, 'scarletfever'] = 30+18+6+10
Slice.loc[is_1913, 'scarletfever'] = 11+32+5+4
Slice.loc[is_1912, 'whoopingcough'] = 22+15+6+4
Slice.loc[is_1913, 'whoopingcough'] = 36+23+6+6
Slice.loc[is_1912, 'diphteria'] = 61+33+21+16
Slice.loc[is_1913, 'diphteria'] = 92+53+29+23
Slice.loc[is_1912, 'pulmonarytuberculosisphithisis'] = 316+387+161+150
Slice.loc[is_1913, 'pulmonarytuberculosisphithisis'] = 334+502+234+153
Slice.loc[is_1912, 'cerebrospinalmeningitis'] = 16+4+4+5
Slice.loc[is_1913, 'cerebrospinalmeningitis'] = 9+12+3+1
Slice.loc[is_1912, 'pneumonia'] = 274+281+153+116
Slice.loc[is_1913, 'pneumonia'] = 222+315+166+113
Slice.loc[is_1912, 'bronchopneumonia'] = 333+191+79+54
Slice.loc[is_1913, 'bronchopneumonia'] = 359+209+87+45
Slice.loc[is_1912, 'diarrhealdiseases'] = 293+138+69+50
Slice.loc[is_1913, 'diarrhealdiseases'] = 258+147+62+44
Slice.loc[is_1912, 'allcauses'] = 3421+3728+2307+1637
Slice.loc[is_1913, 'allcauses'] = 3100+4225+2468+1571
Slice.loc[is_1912, 'deathsofchildrenunder5years'] = 1454+810+347+285
Slice.loc[is_1913, 'deathsofchildrenunder5years'] = 1356+940+367+275


In [17]:
Slice = Slice.loc[Slice['wards'] == 'Twelfth']

In [22]:
# Drop the four Twelfth Ward parts and the 'Total' rows
Data = Data.loc[~Data['wards'].isin(['TwelfthN', 'TwelfthC', 'TwelfthE', 'TwelfthW', 'Total'])]

In [19]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
Data = pd.concat([Data, Slice])
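
The manual totals could probably be replaced by a `groupby` that sums the four parts for each year. The following is a minimal sketch on a toy frame, not the notebook's actual data: the yearly typhoid totals match the sums typed in above, but how the individual figures are assigned to sub-wards here is arbitrary, and the `Eleventh` rows are invented.

```python
import pandas as pd

# Toy stand-in for Data: four sub-ward rows per year plus one untouched ward.
data = pd.DataFrame({
    'borough': ['MA'] * 10,
    'wards': ['TwelfthN', 'TwelfthC', 'TwelfthE', 'TwelfthW', 'Eleventh'] * 2,
    'year': [1912] * 5 + [1913] * 5,
    'typhoidfever': [13, 23, 18, 6, 4, 8, 17, 17, 12, 3],
})

parts = ['TwelfthN', 'TwelfthC', 'TwelfthE', 'TwelfthW']
is_part = data['wards'].isin(parts) & (data['borough'] == 'MA')

# Sum the count column over the four parts within each borough and year.
combined = (
    data[is_part]
    .groupby(['borough', 'year'], as_index=False)[['typhoidfever']]
    .sum()
)
combined['wards'] = 'Twelfth'

# Drop the parts and append the combined rows.
data = pd.concat([data[~is_part], combined], ignore_index=True)

print(data)
```

With the real data, the list of columns to sum would hold every count column. Rates such as `numberofpersonstotheacre` are not additive and would still have to be set separately.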

# At this point...

The data are now clean, and all that remains is to convert the ward names into ward numbers.

In [27]:
# Keys have to match the strings in Data['wards'] exactly, so spellings are left as they are
translator = {
    'First': 1, 'Second': 2, 'Third': 3, 'Fourth': 4, 'Fifth': 5, 'Sixth': 6, 'Seventh': 7, 'Eighth': 8,
    'Ninth': 9, 'Tenth': 10, 'Eleventh': 11, 'Twelfth': 12, 'Thirteenth': 13, 'Fourteenth': 14,
    'Fifteenth': 15, 'Sixteenth': 16, 'Seventeenth': 17, 'Eighteenth': 18, 'Nineteenth': 19,
    'Twentieth': 20, 'Twenty-first': 21, 'Twenty-second': 22, 'Twenty-third': 23, 'Twenty-Third': 23,
    'Twenty-fourth': 24, 'Twenty-fifth': 25, 'Twenty-sixth': 26, 'Twenty-seventh': 27,
    'Twenty-eigth': 28, 'Twenty-ninth': 29, 'Thirtieth': 30, 'Thirty-first': 31, 'Thirty-second': 32,
}

In [28]:
Data['wardno']= Data['wards'].map(translator)
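
Any ward label that is missing from `translator` silently becomes `NaN` under `map`, so it is worth listing those labels before merging. The sketch below shows that check and one possible way to join on borough and ward number. It runs on toy frames with invented values; the names `unmapped`, `matched` and `merged` are introduced here, and the real `Wards` table also carries a geometry column.

```python
import pandas as pd

# Toy stand-ins for Data and Wards; the disease counts are invented.
data = pd.DataFrame({
    'borough': ['MA', 'MA', 'BR'],
    'wards': ['Tenth', 'Twelfth', 'Frist'],   # 'Frist' is a deliberate typo
    'typhoidfever': [5, 60, 2],
})
wards = pd.DataFrame({
    'borough': ['MA', 'MA', 'BR'],
    'number': ['10', '12', '1'],              # strings, as produced by split()
})
translator = {'First': 1, 'Tenth': 10, 'Twelfth': 12}

data['wardno'] = data['wards'].map(translator)

# Labels missing from the dictionary map to NaN; list them before merging.
unmapped = sorted(data.loc[data['wardno'].isna(), 'wards'].unique())
print(unmapped)

# Align the key types, then join on borough and ward number.
wards['wardno'] = wards['number'].astype(int)
matched = data.dropna(subset=['wardno']).astype({'wardno': int})
merged = wards.merge(matched, on=['borough', 'wardno'], how='left')

print(merged)
```

Keeping the ward table on the left of a left join means every ward stays in the result, with missing values wherever no disease record matched.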