## Dataset Ideas
**https://data.wprdc.org/dataset/arrest-data** Arrest data. Maybe group by type of crime and weight each type differently?

**https://data.wprdc.org/dataset/pgh/resource/b7156251-6036-4b68-ad2a-95566c84343e** Neighborhood Population Data

https://data.wprdc.org/dataset/playgrounds

https://data.wprdc.org/dataset/smart-trash-containers

```python
import pandas as pd
import numpy as np
```

```python
arrest_data = pd.read_csv("arrest-data.csv")
census_data = pd.read_excel("census-data.xlsx")
```
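The two files come from different sources, so their neighborhood names may not line up exactly (the arrest sample below includes "Mt. Oliver Boro" with zone `OSC` and no council district). An outer merge keeps such names but leaves their other columns empty. A minimal sketch for listing names that appear in only one source before merging; the helper `unmatched_neighborhoods` and the toy frames are hypothetical, and only the column names `INCIDENTNEIGHBORHOOD` and `Neighborhood` come from this notebook.

```python
import pandas as pd

def unmatched_neighborhoods(arrests, census,
                            arrest_col="INCIDENTNEIGHBORHOOD",
                            census_col="Neighborhood"):
    """Return (names only in arrests, names only in census) as sorted lists."""
    arrest_names = set(arrests[arrest_col].dropna())
    census_names = set(census[census_col].dropna())
    return sorted(arrest_names - census_names), sorted(census_names - arrest_names)

# Toy stand-ins for arrest-data.csv and census-data.xlsx (illustrative values only)
arrests = pd.DataFrame({"INCIDENTNEIGHBORHOOD": ["Bluff", "Mt. Oliver Boro", "Fineview", None]})
census = pd.DataFrame({"Neighborhood": ["Bluff", "Fineview", "Westwood"]})

only_arrests, only_census = unmatched_neighborhoods(arrests, census)
print(only_arrests)  # ['Mt. Oliver Boro']
print(only_census)   # ['Westwood']
```

On the real data this would be `unmatched_neighborhoods(arrest_data, census_data)`; any spelling differences it reveals could be fixed with a rename before the merge.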

```python
arrest_data.sample(15)
```

| | PK | CCR | AGE | GENDER | RACE | ARRESTTIME | ARRESTLOCATION | OFFENSES | INCIDENTLOCATION | INCIDENTNEIGHBORHOOD | INCIDENTZONE | INCIDENTTRACT | COUNCIL_DISTRICT | PUBLIC_WORKS_DIVISION | X | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 31052 | 2013960 | 19065229 | 38.0 | M | B | 2019-05-20T17:20:00 | 900 Block Second AV Pittsburgh, PA 15219 | 1543 Driving While Operating Privilege is Susp... | E Carson ST & Hot Metal ST Pittsburgh, PA 15203 | South Side Flats | 3 | 1609.0 | 3.0 | 3.0 | -79.964531 | 40.425424 |
| 45155 | 2032814 | 20157463 | 35.0 | M | B | 2020-11-24T15:25:00 | 800 Block Sherwood AV Pittsburgh, PA 15204 | 2701(a)(3) Simple Assault - Attempts by Physic... | 3100 Block Cordell PL Pittsburgh, PA 15203 | Arlington Heights | 3 | 1604.0 | 3.0 | 3.0 | -79.963453 | 40.417717 |
| 37143 | 2021768 | 19241536 | 52.0 | F | B | 2019-11-26T22:00:00 | 300 Block Cedar AV Pittsburgh, PA 15212 | 13(a)(32) Paraphernalia - Use or Possession / ... | 300 Block Cedar AV Pittsburgh, PA 15212 | East Allegheny | 1 | 2304.0 | 1.0 | 1.0 | -80.000908 | 40.450894 |
| 27224 | 2009090 | 18218332 | 48.0 | M | W | 2019-01-07T17:15:00 | 900 Block 2nd AV Pittsburgh, PA 15219 | 9015 Failure To Appear/Arrest on Attachment Order | 1800 Block Belleau DR Pittsburgh, PA 15212 | Fineview | 1 | 2509.0 | 6.0 | 1.0 | -80.006808 | 40.461905 |
| 12684 | 1990531 | 17170568 | 36.0 | M | B | 2017-09-05T23:54:00 | E Carson ST & S 13th ST Pittsburgh, PA 15203 | 5505 Public Drunkenness / 5503(a)(1) DISORDERL... | E Carson ST & S 13th ST Pittsburgh, PA 15203 | South Side Flats | 3 | 1702.0 | 3.0 | 3.0 | -79.985327 | 40.428775 |
| 46398 | 2034673 | 20191823 | 23.0 | F | U | 2021-02-10T17:55:00 | 600 Block First AV Pittsburgh, PA 15219 | 4304(a)(1) Enhanced Endangering Welfare of Chi... | 2000 Block 5th AV Pittsburgh, PA 15219 | Bluff | 2 | 103.0 | 6.0 | 3.0 | -79.977533 | 40.438011 |
| 21136 | 2001318 | 18105887 | 33.0 | F | W | 2018-06-04T14:50:00 | 1900 Block Pioneer AV Pittsburgh, PA 15226 | 3714 Careless Driving / 3802(a)(1) DUI - Gener... | 1900 Block Pioneer AV Pittsburgh, PA 15226 | Brookline | 6 | 1917.0 | 4.0 | 5.0 | -80.016059 | 40.404037 |
| 20764 | 2000877 | 18100267 | 65.0 | M | B | 2018-05-28T01:56:00 | 7200 Block Felicia WY Pittsburgh, PA 15208 | 13(a)(16) Possession of Controlled Substance /... | 7200 Block Felicia WY Pittsburgh, PA 15208 | Homewood South | 5 | 1303.0 | 9.0 | 2.0 | -79.896447 | 40.456933 |
| 27159 | 2009024 | 19000433 | 34.0 | M | W | 2019-01-01T13:33:00 | 1600 Block Arlington AV Pittsburgh, PA 15210 | 13(a)(16) Possession of Controlled Substance /... | 1600 Block Arlington AV Pittsburgh, PA 15210 | Mt. Oliver Boro | OSC | 4810.0 | | | -79.988162 | 40.418496 |
| 23942 | 2004919 | 18155658 | 28.0 | M | B | 2018-09-18T07:45:00 | 900 Block Second AV Pittsburgh, PA 15219 | 2705 Recklessy Endangering Another Person. / 3... | Fleury WY | Homewood South | 5 | 1303.0 | | | 0.0 | 0.0 |
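The `OFFENSES` column can list several charges for one arrest, separated by " / " (e.g. "5505 Public Drunkenness / 5503(a)(1) DISORDERL..." above). A `str.contains` mask therefore counts arrests, not charges, and one arrest can land in several crime types. If charge-level counts are wanted instead, a minimal sketch is to split on " / " and explode one row per charge. The separator is an assumption inferred only from the sample rows, and the toy records and helper `count_matches` are hypothetical.

```python
import pandas as pd

# Toy rows mimicking the OFFENSES format in the sample (illustrative, not real records)
arrests = pd.DataFrame({
    "INCIDENTNEIGHBORHOOD": ["South Side Flats", "South Side Flats", "Bluff"],
    "OFFENSES": [
        "5505 Public Drunkenness / 5503(a)(1) Disorderly Conduct",
        "3921(a) Theft by Unlawful Taking / 3929 Retail Theft",
        "3921(a) Theft by Unlawful Taking / 3502 Burglary",
    ],
})

def count_matches(df, text_col, crime):
    """Count rows per neighborhood whose text_col contains `crime` (plain substring)."""
    mask = df[text_col].str.contains(crime, na=False, regex=False)
    return df[mask].groupby("INCIDENTNEIGHBORHOOD").size()

# Arrest level: each arrest counts once per crime type, however many charges match
arrest_level = count_matches(arrests, "OFFENSES", "Theft")

# Charge level: split on " / " (assumed separator) and give each charge its own row
charges = arrests.assign(CHARGE=arrests["OFFENSES"].str.split(" / ")).explode("CHARGE")
charge_level = count_matches(charges, "CHARGE", "Theft")

print(arrest_level.tolist())  # [1, 1]  (Bluff, South Side Flats)
print(charge_level.tolist())  # [1, 2]
```

Which level is right depends on the question: arrest level matches what `addCrimes` below computes, while charge level weights an arrest with two theft charges twice.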


```python
# CRIME TYPES AND WEIGHTS
# Theft               4
# Burglary            4
# Simple Assault      2
# Aggravated Assault  4
# Homicide           10
# Robbery             4
# Kidnapping          8

# Idea: Get total offenses by neighborhood
#       Count each crime type by neighborhood
#       Multiply each crime-type count by (weight - 1): every crime is already counted once
#       in the total, so adding these products to the total gives the weighted total
#       Divide the weighted total by the population and scale it (e.g. weighted crimes per 10,000 residents)
#       Graph total offenses, offenses by crime type, crimes per capita, and weighted crimes per capita by neighborhood

# For each crime in crimeTypes, count the arrests per neighborhood whose OFFENSES text
# contains that crime, and store the resulting Series in the dict crimeList.
# Arrests matching none of the crime types are counted under "Other".
def addCrimes(crimeTypes, crimeList, data):
    offenses = data["OFFENSES"]
    usedMask = pd.Series(False, index=data.index)  # rows matched by any crime type so far

    for crime in crimeTypes:
        mask = offenses.str.contains(crime, na=False)
        crimeList[crime] = data[mask].groupby("INCIDENTNEIGHBORHOOD")["OFFENSES"].count()
        usedMask |= mask  # mark these rows as used

    # All rows we DIDN'T use are counted as "Other" (note the ~)
    crimeList["Other"] = data[~usedMask].groupby("INCIDENTNEIGHBORHOOD")["OFFENSES"].count()


# Count crimes by neighborhood into a dictionary of Series
crimeList = {}
addCrimes(["Theft", "Burglary", "Simple Assault", "Aggravated Assault", "Homicide", "Robbery", "Kidnapping"],
          crimeList, arrest_data)

# Put all crime types into one DataFrame (one column per crime type)
crimeInstances = pd.DataFrame(crimeList)

# Merge crime counts with 2010 population; the outer join keeps neighborhoods found in only one source
cd = census_data.set_index("Neighborhood")["Pop. 2010"]
crimeInstances = crimeInstances.merge(cd, how="outer", left_index=True, right_index=True)

# fillna returns a new DataFrame, so assign it back (neighborhoods with no matches get 0)
crimeInstances = crimeInstances.fillna(0)
crimeInstances
```

| | Theft | Burglary | Simple Assault | Aggravated Assault | Homicide | Robbery | Kidnapping | Other | Pop. 2010 |
|---|---|---|---|---|---|---|---|---|---|
| Allegheny Center | 36.0 | 7.0 | 70.0 | 62.0 | 1.0 | 23.0 | 4.0 | 690.0 | 933.0 |
| Allegheny West | 10.0 | 6.0 | 11.0 | 3.0 | 0.0 | 0.0 | 0.0 | 63.0 | 462.0 |
| Allentown | 36.0 | 16.0 | 169.0 | 37.0 | 1.0 | 16.0 | 1.0 | 442.0 | 2500.0 |
| Arlington | 18.0 | 7.0 | 65.0 | 25.0 | 0.0 | 10.0 | 0.0 | 100.0 | 1869.0 |
| Arlington Heights | 3.0 | 0.0 | 36.0 | 29.0 | 0.0 | 7.0 | 0.0 | 43.0 | 244.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Upper Lawrenceville | 19.0 | 11.0 | 43.0 | 6.0 | 0.0 | 4.0 | 1.0 | 77.0 | 2669.0 |
| West End | 8.0 | 4.0 | 21.0 | 7.0 | 2.0 | 6.0 | 4.0 | 156.0 | 254.0 |
| West Oakland | 24.0 | 11.0 | 59.0 | 17.0 | 2.0 | 9.0 | 2.0 | 126.0 | 2604.0 |
| Westwood | 33.0 | 3.0 | 50.0 | 11.0 | 0.0 | 10.0 | 0.0 | 68.0 | 3066.0 |
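The weighting plan in the comments can be sketched as follows: each crime type is already counted once in the neighborhood total, so adding `(weight - 1) * count` per type yields the weighted total, which is then divided by population and scaled to a rate per 10,000 residents. This is a minimal sketch under stated assumptions: the weights are the ones listed in the notebook, unlisted crimes implicitly weigh 1, zero or missing population yields NaN instead of infinity, and the helper `weighted_per_10k` and the toy neighborhoods "A" and "B" are hypothetical.

```python
import numpy as np
import pandas as pd

# Weights from the notebook's "CRIME TYPES AND WEIGHTS" comment; unlisted crimes weigh 1
WEIGHTS = {"Theft": 4, "Burglary": 4, "Simple Assault": 2, "Aggravated Assault": 4,
           "Homicide": 10, "Robbery": 4, "Kidnapping": 8}

def weighted_per_10k(counts, total, population, weights=WEIGHTS, per=10_000):
    """Weighted crimes per `per` residents, following the (weight - 1) plan.

    counts: DataFrame with one column per crime type (rows = neighborhoods)
    total: Series of total arrests per neighborhood (each crime already counted once)
    population: Series of residents per neighborhood
    """
    extra = sum((w - 1) * counts[crime] for crime, w in weights.items())
    weighted_total = total + extra
    # Zero population would divide by zero; treat it as missing so the rate is NaN
    return weighted_total * per / population.replace(0, np.nan)

# Toy numbers for two made-up neighborhoods (not real data)
idx = ["A", "B"]
counts = pd.DataFrame({c: [0, 0] for c in WEIGHTS}, index=idx)
counts.loc["A", "Theft"] = 2     # 2 thefts add 2 * (4 - 1) = 6
counts.loc["B", "Homicide"] = 1  # 1 homicide adds 1 * (10 - 1) = 9
total = pd.Series([10, 5], index=idx)
population = pd.Series([2000, 0], index=idx)

rates = weighted_per_10k(counts, total, population)
print(rates.tolist())  # [80.0, nan]
```

On the notebook's data this might be called as `weighted_per_10k(crimeInstances[list(WEIGHTS)], arrest_data.groupby("INCIDENTNEIGHBORHOOD").size(), crimeInstances["Pop. 2010"])`, though that call is untested against the real files. Because `addCrimes` can count one arrest under several crime types, an arrest for both theft and burglary would receive both extras; whether that is desirable is a modeling choice.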
