# Police Homicide Demographics



This notebook walks through the process of connecting census-block-level demographic data to the Guardian's The Counted database. The Counted is a Guardian project that attempts to count the number of people killed by police in the United States.

We'll start by downloading the Guardian database to our local folder and unzipping it.

In [4]:
import os
import urllib.request # We'll need this to download the file...
import zipfile # And this to unzip it.

os.makedirs("counted", exist_ok=True) # The download fails if the target folder doesn't exist yet.
urllib.request.urlretrieve("https://interactive.guim.co.uk/2015/the-counted/thecounted-data.zip", "counted/thecounted-data.zip")
with zipfile.ZipFile("counted/thecounted-data.zip", 'r') as zip_ref: # The archive is closed automatically.
    zip_ref.extractall(path = 'counted')
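Before loading anything, it can help to confirm that the archive really contains the CSV we expect. The snippet below is a minimal sketch of that check; `csv_members` is a hypothetical helper name and is not part of the original notebook.

```python
import zipfile

def csv_members(zip_path):
    """Return the names of all .csv files inside a zip archive."""
    with zipfile.ZipFile(zip_path, 'r') as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith('.csv')]
```

Calling `csv_members('counted/thecounted-data.zip')` after the download should list the file that the next cell reads.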

Now that we've got the data, let's load it into a pandas DataFrame and take a look.

In [8]:
import pandas as pd

counted = pd.read_csv('counted/the-counted.csv') #Guardian's The Counted Database

counted.head()

Unnamed: 0,uid,name,age,gender,raceethnicity,month,day,year,streetaddress,city,state,classification,lawenforcementagency,armed
0,2,Matthew Ajibade,22,Male,Black,January,1,2015,1050 Carl Griffin Dr,Savannah,GA,Death in custody,Chatham County Sheriff's Office,No
1,4,Lewis Lembke,47,Male,White,January,2,2015,4505 SW Masters Loop,Aloha,OR,Gunshot,Washington County Sheriff's Office,Firearm
2,7,Tim Elliott,53,Male,Asian/Pacific Islander,January,2,2015,600 E Island Lake Dr,Shelton,WA,Gunshot,Mason County Sheriff's Office,Firearm
3,5,Michael Kocher Jr,19,Male,White,January,3,2015,2600 Kaumualii Hwy,Kaumakani,HI,Struck by vehicle,Kauai Police Department,No
4,6,John Quintero,23,Male,Hispanic/Latino,January,3,2015,500 North Oliver Ave,Wichita,KS,Gunshot,Wichita Police Department,No


Great. We've got the core data, and it looks like everything imported just fine. To look up demographic data later we'll need geographic coordinates, so the first step is to convert each address to a latitude and longitude. Luckily, the Python library 'pygeocoder' will do that for us.

In [11]:
from pygeocoder import Geocoder

counted['fulladdress'] = counted['streetaddress'] + ', ' + counted['city'] + ' ' + counted['state']
# We could have done this inside the geocoding function, but we may as well concatenate the address fields now.
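One caveat with plain string concatenation: pandas stores a missing value as NaN, and adding a string to NaN gives NaN, so any row with a missing street, city, or state ends up with no full address at all. The sketch below shows one way to build the address from whatever parts are present. `build_full_address` is a hypothetical helper, not something the original notebook defines.

```python
def build_full_address(street, city, state):
    """Join the address parts that are present, skipping missing ones."""
    def clean(value):
        # Missing values arrive as NaN (a float) or None, never as str.
        return value.strip() if isinstance(value, str) else ''

    street, city, state = clean(street), clean(city), clean(state)
    locality = ' '.join(part for part in (city, state) if part)
    return ', '.join(part for part in (street, locality) if part)


print(build_full_address('1050 Carl Griffin Dr', 'Savannah', 'GA'))
print(build_full_address(float('nan'), 'Savannah', 'GA'))
```

It could be applied row by row, for example with `counted.apply(lambda row: build_full_address(row['streetaddress'], row['city'], row['state']), axis=1)`. A city-and-state-only address will geocode to the city rather than the incident location, so those rows still deserve a second look.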

You'll need to register with Google to get a Google Maps API key before you can convert the addresses. For now, we'll just use a placeholder key; replace it with your own before running the cells below.

In [19]:
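apikey = '12345' # Placeholder; replace with your own Google Maps API key

def geo(address):
    """Geocode one address and return ([latitude, longitude], status)."""
    geocoder = Geocoder(api_key = apikey)
    try:
        # Call geocode on the instance so that the API key is actually used.
        location = geocoder.geocode(address)
        coords = [location.latitude, location.longitude]
        if location.valid_address:
            return coords, 'success'
        else:
            # Coordinates came back, but pygeocoder did not flag the address as valid.
            return coords, 'invalid address'
    except Exception:
        # Missing addresses and failed lookups end up here.
        return [0.0, 0.0], 'failure'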
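Because every lookup is a network call, it can be worth reusing answers for addresses that appear more than once. The following is a minimal sketch of a caching wrapper. `make_cached_geo` and `pause_seconds` are hypothetical names introduced here, and the wrapper only assumes that the wrapped function returns a `(coords, status)` pair like `geo` does.

```python
import time

def make_cached_geo(geo_fn, pause_seconds=0.0):
    """Wrap a geocoding function so repeated addresses reuse earlier answers."""
    cache = {}

    def cached_geo(address):
        if address in cache:
            return cache[address]
        result = geo_fn(address)
        _coords, status = result
        if status != 'failure':
            # Keep only usable answers; a failure may be transient.
            cache[address] = result
        time.sleep(pause_seconds)  # optional pause after each real lookup
        return result

    return cached_geo
```

With this in place, `counted['fulladdress'].apply(make_cached_geo(geo))` would behave like applying `geo` directly, except that duplicate addresses trigger only one request.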

Now that we've created a function to geocode our addresses, we can use the pandas 'apply' method to run it on every row of our data.

In [16]:
output = counted['fulladdress'].apply(geo)

Some entries are missing addresses, some have incomplete addresses (e.g. 'Main St.' or 'Jesse St and Mateo St'), and some fail for other reasons. At some point we'll need to clean these up, but for now the vast majority work. Also note that while some intersection-based addresses come back as 'invalid address', Google still returns coordinates for most of them, and a cursory check in Google Maps shows that they're usually spot on.
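To put a number on "the vast majority", the statuses can be tallied directly from the geocoding results. This is a small sketch using made-up sample results in the same `(coords, status)` shape that `geo` returns; `tally_statuses` is a hypothetical helper name.

```python
from collections import Counter

def tally_statuses(results):
    """Count how many geocoding results fall under each status label."""
    return Counter(status for _coords, status in results)


# Illustrative sample only; the real input would be the `output` Series.
sample_results = [
    ([32.066691, -81.167881], 'success'),
    ([21.9332907, -159.6418879], 'invalid address'),
    ([0.0, 0.0], 'failure'),
    ([45.4864514, -122.8912564], 'success'),
]
print(tally_statuses(sample_results))
```

Since iterating over a pandas Series yields its values, `tally_statuses(output)` should work on the real results as well.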

Let's add the coordinates and the results to our dataframe.

In [17]:
counted['coordinates'] = [x[0] for x in output]
counted['result'] = [x[1] for x in output]

Let's check out the data again.

In [18]:
counted.head()

Unnamed: 0,uid,name,age,gender,raceethnicity,month,day,year,streetaddress,city,state,classification,lawenforcementagency,armed,fulladdress,coordinates,result
0,2,Matthew Ajibade,22,Male,Black,January,1,2015,1050 Carl Griffin Dr,Savannah,GA,Death in custody,Chatham County Sheriff's Office,No,"1050 Carl Griffin Dr, Savannah GA","[32.066691, -81.167881]",success
1,4,Lewis Lembke,47,Male,White,January,2,2015,4505 SW Masters Loop,Aloha,OR,Gunshot,Washington County Sheriff's Office,Firearm,"4505 SW Masters Loop, Aloha OR","[45.4864514, -122.8912564]",success
2,7,Tim Elliott,53,Male,Asian/Pacific Islander,January,2,2015,600 E Island Lake Dr,Shelton,WA,Gunshot,Mason County Sheriff's Office,Firearm,"600 E Island Lake Dr, Shelton WA","[47.2465339, -123.119497]",success
3,5,Michael Kocher Jr,19,Male,White,January,3,2015,2600 Kaumualii Hwy,Kaumakani,HI,Struck by vehicle,Kauai Police Department,No,"2600 Kaumualii Hwy, Kaumakani HI","[21.9332907, -159.6418879]",invalid address
4,6,John Quintero,23,Male,Hispanic/Latino,January,3,2015,500 North Oliver Ave,Wichita,KS,Gunshot,Wichita Police Department,No,"500 North Oliver Ave, Wichita KS","[37.6938192, -97.2805298]",success


Since everything worked, let's save this as a .csv. Grabbing the coordinates takes a while, so it's a good idea to save the output now so that we can pick up from here later if need be. Note that the 'CSVs' folder has to exist before this cell runs.

In [22]:
counted.to_csv('CSVs/police_homicide_gps.csv')
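One thing to watch when picking up from this file later: the 'coordinates' column holds Python lists, and a CSV can only store them as text such as "[32.066691, -81.167881]". After reloading, each value is a string, so comparisons against `[0.0, 0.0]` and indexing like `coords[0]` no longer behave as they do in memory. The sketch below turns the text back into a list; `parse_coords` is a hypothetical helper name.

```python
import ast

def parse_coords(text):
    """Turn the text '[lat, lon]' back into a list of two floats."""
    coords = ast.literal_eval(text)  # safely evaluates the list literal
    return [float(coords[0]), float(coords[1])]


print(parse_coords('[32.066691, -81.167881]'))
```

It can be passed to pandas when reloading, for example `pd.read_csv('CSVs/police_homicide_gps.csv', converters={'coordinates': parse_coords})`.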

Now that we have our coordinates, we'll want to translate them to census block numbers. Luckily, the FCC has an API for this, so we can use the Python 'requests' module to fetch the data. We'll also use BeautifulSoup to parse the XML it returns. There are lighter-weight XML parsers available for Python, but I'm used to BeautifulSoup, so that's what I've used here.

In [27]:
import requests
from bs4 import BeautifulSoup

def fccblockfetch(coords): #Grab block numbers from the FCC
    if coords == [0.0, 0.0]: #Geocoding failed for this row, so skip the request
        return '000000000000000'
    url = 'http://data.fcc.gov/api/block/2010/find'
    params = {'latitude': str(coords[0]), 'longitude': str(coords[1]), 'censusYear': 2010, 'showall': 'false'}
    response = requests.get(url, params = params) #Get XML
    #Name the parser explicitly; html.parser lowercases tag and attribute names,
    #which the lookup below relies on.
    parsed = BeautifulSoup(response.content, 'html.parser') #Parse XML
    try:
        return parsed.block['fips']
    except (TypeError, KeyError): #No block element, or no FIPS attribute on it
        return '000000000000000'
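For comparison, here is a minimal sketch of the same extraction step using the standard library's `xml.etree.ElementTree` in place of BeautifulSoup. The sample response is an assumption about the payload's shape (a `Block` element carrying a `FIPS` attribute, possibly inside a namespace), inferred from the lookup in `fccblockfetch`; the real response may differ. `extract_block_fips` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the namespace and attribute casing are assumptions.
SAMPLE_XML = (
    '<Response xmlns="http://data.fcc.gov/api" status="OK">'
    '<Block FIPS="130510105013018"/>'
    '</Response>'
)

MISSING_FIPS = '000000000000000'  # same sentinel the notebook uses


def extract_block_fips(xml_text):
    """Return the block FIPS code from a response body, or the sentinel."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return MISSING_FIPS
    for element in root.iter():
        # Drop any '{namespace}' prefix and compare case-insensitively.
        tag = element.tag.rsplit('}', 1)[-1].lower()
        if tag == 'block':
            for name, value in element.attrib.items():
                if name.lower() == 'fips':
                    return value
    return MISSING_FIPS


print(extract_block_fips(SAMPLE_XML))
```

Matching tag and attribute names case-insensitively keeps the sketch tolerant of the casing details that are assumed rather than confirmed here.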

Now we've got a function that will grab the data, so we can again use 'apply' to apply it to all of our rows.

In [28]:
counted['blockFIPS'] = counted['coordinates'].apply(fccblockfetch)

Let's check our data out again.

In [29]:
counted.head()

Unnamed: 0,uid,name,age,gender,raceethnicity,month,day,year,streetaddress,city,state,classification,lawenforcementagency,armed,fulladdress,coordinates,result,blockFIPS
0,2,Matthew Ajibade,22,Male,Black,January,1,2015,1050 Carl Griffin Dr,Savannah,GA,Death in custody,Chatham County Sheriff's Office,No,"1050 Carl Griffin Dr, Savannah GA","[32.066691, -81.167881]",success,130510105013018
1,4,Lewis Lembke,47,Male,White,January,2,2015,4505 SW Masters Loop,Aloha,OR,Gunshot,Washington County Sheriff's Office,Firearm,"4505 SW Masters Loop, Aloha OR","[45.4864514, -122.8912564]",success,410670317033016
2,7,Tim Elliott,53,Male,Asian/Pacific Islander,January,2,2015,600 E Island Lake Dr,Shelton,WA,Gunshot,Mason County Sheriff's Office,Firearm,"600 E Island Lake Dr, Shelton WA","[47.2465339, -123.119497]",success,530459606002032
3,5,Michael Kocher Jr,19,Male,White,January,3,2015,2600 Kaumualii Hwy,Kaumakani,HI,Struck by vehicle,Kauai Police Department,No,"2600 Kaumualii Hwy, Kaumakani HI","[21.9332907, -159.6418879]",invalid address,150070408001141
4,6,John Quintero,23,Male,Hispanic/Latino,January,3,2015,500 North Oliver Ave,Wichita,KS,Gunshot,Wichita Police Department,No,"500 North Oliver Ave, Wichita KS","[37.6938192, -97.2805298]",success,201730010004007


Once again, looks good.  Let's save again before moving on and getting our racial demographics.

In [31]:
counted.to_csv('CSVs/police_homicide_gps_blockFIPS.csv')
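The 15-digit block codes are not opaque: they concatenate the state (2 digits), county (3), census tract (6), and block (4) codes. In the first row, for instance, 13 is Georgia and 051 is Chatham County. The sketch below splits a code into those parts, which is handy when demographic tables are keyed by tract or county instead of by block. `split_block_fips` and `BlockFIPS` are hypothetical names introduced for illustration.

```python
from collections import namedtuple

BlockFIPS = namedtuple('BlockFIPS', ['state', 'county', 'tract', 'block'])


def split_block_fips(fips):
    """Split a 15-digit census block code into its four standard parts."""
    # Restore leading zeros that are lost if the code was read as a number.
    fips = str(fips).zfill(15)
    if len(fips) != 15 or not fips.isdigit():
        raise ValueError('expected a 15-digit block FIPS code, got %r' % fips)
    return BlockFIPS(state=fips[0:2], county=fips[2:5],
                     tract=fips[5:11], block=fips[11:15])


print(split_block_fips('130510105013018'))
```

The `zfill` step matters when the saved CSV is reloaded, because pandas may read the column as integers and drop the leading zero of states with codes below 10. Passing `dtype={'blockFIPS': str}` to `pd.read_csv` avoids that.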