# Introduction

The goal of this project is to determine the best neighborhood in Pittsburgh. We treated crime as the most important factor, since neighborhood safety is the top priority, and used it as our evaluation metric.

# Metrics

We split the crimes into a handful of types and weighted each type by how "bad" we considered it to be (any crime not listed gets a weight of 1). Here are the crime types and their weights:

* Theft 4
* Burglary 4
* Simple Assault 2
* Aggravated Assault 4
* Homicide 10
* Robbery 4
* Kidnapping 8

We then added up the weighted crimes for each neighborhood and divided by that neighborhood's population to get a per-capita measure of how "bad" crime is there. This score is ultimately how we decided the best neighborhood: the lower the score, the better.
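
The scoring rule can be written compactly. The symbols here are our own shorthand and do not appear in the code: $w_c$ is the weight of crime type $c$ (1 for anything not listed), $n_{c,h}$ is the number of arrests of type $c$ in neighborhood $h$, and $P_h$ is that neighborhood's population.

$$
S_h = \frac{\sum_{c} w_c \, n_{c,h}}{P_h}
$$

As a worked example, using the Allegheny West counts that appear in the tables later in this notebook (10 thefts, 6 burglaries, 11 simple assaults, 3 aggravated assaults, 63 other, population 462):

$$
S = \frac{4 \cdot 10 + 4 \cdot 6 + 2 \cdot 11 + 4 \cdot 3 + 1 \cdot 63}{462} = \frac{161}{462} \approx 0.348
$$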

# Our Process

First, we import pandas and geopandas and load our two datasets: Arrest Data and Census Data.

In [7]:
import pandas as pd
import geopandas
arrest_data = pd.read_csv("arrest-data.csv")
census_data = pd.read_excel("census-data.xlsx")

We should sample our datasets to see what we're working with:

In [8]:
arrest_data.sample(3)

Unnamed: 0,PK,CCR,AGE,GENDER,RACE,ARRESTTIME,ARRESTLOCATION,OFFENSES,INCIDENTLOCATION,INCIDENTNEIGHBORHOOD,INCIDENTZONE,INCIDENTTRACT,COUNCIL_DISTRICT,PUBLIC_WORKS_DIVISION,X,Y
23215,2003975,18147537,59.0,M,B,2018-08-02T02:18:00,"7200 Block Kelly ST Pittsburgh, PA 15208",2706 Terroristic Threats. / 5503 Disorderly Co...,"7200 Block Kelly ST Pittsburgh, PA 15208",Homewood South,5,1303.0,9.0,2.0,-79.896996,40.455519
39146,2024423,20026583,30.0,F,B,2020-02-07T21:15:00,"N Homewood AV & Formosa WY Pittsburgh, PA 15208",4303 General Lighting Requirements. / 13(a)(31...,"N Homewood AV & Formosa WY Pittsburgh, PA 15208",Homewood South,5,1303.0,9.0,2.0,-79.89724,40.455077
39784,2025313,20043822,34.0,F,W,2020-03-03T10:46:00,"900 Block Freeport RD Pittsburgh, PA 15238",3929 Retail Theft.,"900 Block Freeport RD Pittsburgh, PA 15238",East Hills,5,1306.0,9.0,2.0,-79.892353,40.486119


In [9]:
census_data.sample(3)

Unnamed: 0,Neighborhood,Sector #,Pop. 1940,Pop. 1950,Pop. 1960,Pop. 1970,Pop. 1980,Pop. 1990,Pop. 2000,Pop. 2010,...,% Other (2010),% White (2010),% 2+ Races (2010),% Hispanic (of any race) (2010),% Pop. Age < 5 (2010),% Pop. Age 5-19 (2010),% Pop. Age 20-34 (2010),% Pop. Age 35-59 (2010),% Pop. Age 60-74 (2010),% Pop. Age > 75 (2010)
31,Fairywood,4,1324,4491,3819,3240,3008,2951,1099,1002,...,0.020958,0.628743,0.0489,0.048,0.1225,0.2375,0.2025,0.25,0.1875,0.0
78,Stanton Heights,13,4610,6024,8249,7679,6223,5085,4842,4601,...,0.006086,0.558792,0.0259,0.014,0.0646,0.1409,0.1819,0.3951,0.1353,0.0822
25,East Allegheny,3,12971,11763,8763,5953,4420,3088,2635,2136,...,0.016854,0.626873,0.0445,0.04,0.0433,0.0957,0.2338,0.4355,0.1344,0.0573


Now we define a function that takes a list of crime types and a dictionary. For each crime type, it counts the arrests per neighborhood whose offense text mentions that type and stores those counts in the dictionary. Arrests that match none of the listed types are counted under "Other".

In [24]:
def addCrimes(crimeTypes, crimeList):
    # Relies on the globals arrest_data and offenses (offenses is defined in the next cell)
    # Tracks which rows matched at least one listed crime type; starts out all False
    otherMask = pd.Series(False, index=offenses.index)
    
    for crime in crimeTypes:
        # na=False treats rows with a missing OFFENSES value as non-matching
        mask = offenses.str.contains(crime, na=False)
        a = arrest_data[mask].groupby("INCIDENTNEIGHBORHOOD")["OFFENSES"].count()
        crimeList[crime] = a
        otherMask = mask | otherMask # Sets any rows we used to true
    
    # All rows we DIDN'T use are added as "Other" (note the ~)
    a = arrest_data[~otherMask].groupby("INCIDENTNEIGHBORHOOD")["OFFENSES"].count()
    crimeList["Other"] = a
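
To make the matching behavior concrete, here is a minimal sketch of the same substring approach on a tiny made-up table. The neighborhood names "A" and "B", the offense strings, and the variable names `toy`, `used`, and `result` are all illustrative assumptions, and `regex=False` is our own addition to force plain substring matching. It suggests two things worth keeping in mind: an arrest that lists several offenses is counted once under every type it mentions, and rows with a missing offense never reach any count.

```python
import pandas as pd

# Hypothetical toy data; these rows are NOT from the arrest dataset
toy = pd.DataFrame({
    "INCIDENTNEIGHBORHOOD": ["A", "A", "B", "B"],
    "OFFENSES": [
        "3929 Retail Theft.",
        "2701 Simple Assault. / 3921 Theft by Unlawful Taking.",
        "5503 Disorderly Conduct.",
        None,  # missing offense text
    ],
})

used = pd.Series(False, index=toy.index)  # rows matched by any listed type
counts = {}
for crime in ["Theft", "Simple Assault"]:
    # Plain substring match; missing values count as "no match"
    mask = toy["OFFENSES"].str.contains(crime, na=False, regex=False)
    counts[crime] = toy[mask].groupby("INCIDENTNEIGHBORHOOD")["OFFENSES"].count()
    used |= mask

# Everything that matched nothing is "Other"; count() skips the missing value
counts["Other"] = toy[~used].groupby("INCIDENTNEIGHBORHOOD")["OFFENSES"].count()

result = pd.DataFrame(counts)
print(result)
```

In this toy table, the second arrest in neighborhood "A" contributes to both "Theft" and "Simple Assault", so the per-type counts can add up to more than the number of arrests.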

We can then create our dictionary and add the crimes to it:

In [25]:
offenses = arrest_data["OFFENSES"]

# Group crimes by neighborhood into a dictionary
crimeList = {}
addCrimes(["Theft", "Burglary", "Simple Assault", "Aggravated Assault", "Homicide", "Robbery", "Kidnapping"], crimeList)

Let's make a DataFrame out of our dictionary:

In [26]:
# Put all crime types into one DataFrame
crimeInstances = pd.DataFrame(crimeList)
crimeInstances

INCIDENTNEIGHBORHOOD,Theft,Burglary,Simple Assault,Aggravated Assault,Homicide,Robbery,Kidnapping,Other
Allegheny Center,36.0,7.0,70.0,62.0,1.0,23.0,4.0,690
Allegheny West,10.0,6.0,11.0,3.0,,,,63
Allentown,36.0,16.0,169.0,37.0,1.0,16.0,1.0,442
Arlington,18.0,7.0,65.0,25.0,,10.0,,100
Arlington Heights,3.0,,36.0,29.0,,7.0,,43
...,...,...,...,...,...,...,...,...
Upper Lawrenceville,19.0,11.0,43.0,6.0,,4.0,1.0,77
West End,8.0,4.0,21.0,7.0,2.0,6.0,4.0,156
West Oakland,24.0,11.0,59.0,17.0,2.0,9.0,2.0,126
Westwood,33.0,3.0,50.0,11.0,,10.0,,68


Next, we get the population of each neighborhood and merge it into the same DataFrame. The most recent population column in the census dataset is from 2010, so that's the one we'll use.

In [27]:
# merge crime types with population
cd = census_data.set_index("Neighborhood")["Pop. 2010"]
crimeInstances = crimeInstances.merge(cd, how='outer', left_index=True, right_index=True)
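# Display with missing values shown as 0 (the result is not assigned, so crimeInstances itself is unchanged)
crimeInstances.fillna(0)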

Unnamed: 0,Theft,Burglary,Simple Assault,Aggravated Assault,Homicide,Robbery,Kidnapping,Other,Pop. 2010
Allegheny Center,36.0,7.0,70.0,62.0,1.0,23.0,4.0,690.0,933.0
Allegheny West,10.0,6.0,11.0,3.0,0.0,0.0,0.0,63.0,462.0
Allentown,36.0,16.0,169.0,37.0,1.0,16.0,1.0,442.0,2500.0
Arlington,18.0,7.0,65.0,25.0,0.0,10.0,0.0,100.0,1869.0
Arlington Heights,3.0,0.0,36.0,29.0,0.0,7.0,0.0,43.0,244.0
...,...,...,...,...,...,...,...,...,...
Upper Lawrenceville,19.0,11.0,43.0,6.0,0.0,4.0,1.0,77.0,2669.0
West End,8.0,4.0,21.0,7.0,2.0,6.0,4.0,156.0,254.0
West Oakland,24.0,11.0,59.0,17.0,2.0,9.0,2.0,126.0,2604.0
Westwood,33.0,3.0,50.0,11.0,0.0,10.0,0.0,68.0,3066.0
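
Because this is an outer merge on neighborhood names, any name that is spelled differently in the two datasets ends up as its own row with missing values on one side. A minimal sketch of how such rows could be spotted with the `indicator` option of `merge` follows. The names "Alpha", "Beta", and "Gamma" and the variables `crimes`, `pop`, `merged`, and `unmatched` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for crimeInstances and the census population column
crimes = pd.DataFrame({"Theft": [3, 5]}, index=["Alpha", "Beta"])
pop = pd.Series([100, 200], index=["Beta", "Gamma"], name="Pop. 2010")

# indicator=True adds a "_merge" column: "left_only", "right_only", or "both"
merged = crimes.merge(pop, how="outer", left_index=True, right_index=True, indicator=True)

# Rows present in only one of the two inputs
unmatched = merged[merged["_merge"] != "both"]
print(unmatched)
```

Here "Alpha" has crime counts but no population, and "Gamma" has a population but no crime counts. Rows like these would be worth reviewing before dividing by population.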


Now we can multiply the instances of each crime by our specified weights:

In [28]:
# Work on a copy so the raw counts in crimeInstances are not overwritten
weighted = crimeInstances.copy()
weighted["Theft"] = crimeInstances["Theft"]*4
weighted["Burglary"] = crimeInstances["Burglary"]*4
weighted["Simple Assault"] = crimeInstances["Simple Assault"]*2
weighted["Aggravated Assault"] = crimeInstances["Aggravated Assault"]*4
weighted["Homicide"] = crimeInstances["Homicide"]*10
weighted["Robbery"] = crimeInstances["Robbery"]*4
weighted["Kidnapping"] = crimeInstances["Kidnapping"]*8

weighted = weighted.fillna(0)
weighted

Unnamed: 0,Theft,Burglary,Simple Assault,Aggravated Assault,Homicide,Robbery,Kidnapping,Other,Pop. 2010
Allegheny Center,144.0,28.0,140.0,248.0,10.0,92.0,32.0,690.0,933.0
Allegheny West,40.0,24.0,22.0,12.0,0.0,0.0,0.0,63.0,462.0
Allentown,144.0,64.0,338.0,148.0,10.0,64.0,8.0,442.0,2500.0
Arlington,72.0,28.0,130.0,100.0,0.0,40.0,0.0,100.0,1869.0
Arlington Heights,12.0,0.0,72.0,116.0,0.0,28.0,0.0,43.0,244.0
...,...,...,...,...,...,...,...,...,...
Upper Lawrenceville,76.0,44.0,86.0,24.0,0.0,16.0,8.0,77.0,2669.0
West End,32.0,16.0,42.0,28.0,20.0,24.0,32.0,156.0,254.0
West Oakland,96.0,44.0,118.0,68.0,20.0,36.0,16.0,126.0,2604.0
Westwood,132.0,12.0,100.0,44.0,0.0,40.0,0.0,68.0,3066.0
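
The same weighting could also be driven by a single dictionary, which keeps the weights in one place. The sketch below is one possible alternative, not the code used in this notebook. The names `weights`, `counts`, `weight_row`, and `weighted_demo` are our own, and the single row of counts is the Allegheny West row from the output above.

```python
import pandas as pd

# Weights from the Metrics section
weights = {
    "Theft": 4, "Burglary": 4, "Simple Assault": 2, "Aggravated Assault": 4,
    "Homicide": 10, "Robbery": 4, "Kidnapping": 8,
}

# Allegheny West counts as shown earlier (NaN means no arrests of that type)
nan = float("nan")
counts = pd.DataFrame(
    {"Theft": [10.0], "Burglary": [6.0], "Simple Assault": [11.0],
     "Aggravated Assault": [3.0], "Homicide": [nan], "Robbery": [nan],
     "Kidnapping": [nan], "Other": [63.0]},
    index=["Allegheny West"],
)

# Any column without an explicit weight (here "Other") gets weight 1
weight_row = pd.Series(weights).reindex(counts.columns, fill_value=1)

# Multiplying a DataFrame by a Series aligns the Series index with the columns
weighted_demo = counts.fillna(0) * weight_row
print(weighted_demo)
```

The result matches the Allegheny West row in the weighted table above. Because unlisted columns default to a weight of 1, a population column passed through this step would be left unchanged.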


Since we need a single number for each neighborhood, we'll sum the weighted crimes (including "Other", which has a weight of 1), then divide that total by the population:

In [29]:
# Sum every weighted crime column ("Other" keeps its implicit weight of 1)
crimeColumns = ["Theft", "Burglary", "Simple Assault", "Aggravated Assault", "Homicide", "Robbery", "Kidnapping", "Other"]
weighted["Total"] = weighted[crimeColumns].sum(axis=1)
weighted["Total/Pop"] = weighted["Total"] / weighted["Pop. 2010"]
weighted

Unnamed: 0,Theft,Burglary,Simple Assault,Aggravated Assault,Homicide,Robbery,Kidnapping,Other,Pop. 2010,Total,Total/Pop
Allegheny Center,144.0,28.0,140.0,248.0,10.0,92.0,32.0,690.0,933.0,1384.0,1.483387
Allegheny West,40.0,24.0,22.0,12.0,0.0,0.0,0.0,63.0,462.0,161.0,0.348485
Allentown,144.0,64.0,338.0,148.0,10.0,64.0,8.0,442.0,2500.0,1218.0,0.487200
Arlington,72.0,28.0,130.0,100.0,0.0,40.0,0.0,100.0,1869.0,470.0,0.251471
Arlington Heights,12.0,0.0,72.0,116.0,0.0,28.0,0.0,43.0,244.0,271.0,1.110656
...,...,...,...,...,...,...,...,...,...,...,...
Upper Lawrenceville,76.0,44.0,86.0,24.0,0.0,16.0,8.0,77.0,2669.0,331.0,0.124016
West End,32.0,16.0,42.0,28.0,20.0,24.0,32.0,156.0,254.0,350.0,1.377953
West Oakland,96.0,44.0,118.0,68.0,20.0,36.0,16.0,126.0,2604.0,524.0,0.201229
Westwood,132.0,12.0,100.0,44.0,0.0,40.0,0.0,68.0,3066.0,396.0,0.129159
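
Once every neighborhood has a score, ranking is a matter of sorting, with one caveat: a population of 0 (for example, a neighborhood missing from the census data and filled with 0) makes the division meaningless. A minimal sketch follows, assuming we simply drop such rows. It uses four of the rows shown above plus one made-up row, "Hypothetical Area", to show the filter, and the names `sample`, `valid`, and `ranked` are our own.

```python
import pandas as pd

# Totals and populations copied from the table above, plus one invented zero-population row
sample = pd.DataFrame(
    {
        "Total": [161.0, 470.0, 331.0, 396.0, 50.0],
        "Pop. 2010": [462.0, 1869.0, 2669.0, 3066.0, 0.0],
    },
    index=["Allegheny West", "Arlington", "Upper Lawrenceville", "Westwood", "Hypothetical Area"],
)

# Keep only rows with a usable population, then compute the per-capita score
valid = sample[sample["Pop. 2010"] > 0].copy()
valid["Total/Pop"] = valid["Total"] / valid["Pop. 2010"]

# Lowest weighted crime per resident first
ranked = valid.sort_values("Total/Pop")
print(ranked)
```

Among these four rows only, Upper Lawrenceville has the lowest score. This says nothing about the full ranking, which depends on all neighborhoods.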
