## Dataset Ideas
**https://data.wprdc.org/dataset/arrest-data** Arrest Data. Idea: group arrests by type of crime and weight each type differently.

**https://data.wprdc.org/dataset/pgh/resource/b7156251-6036-4b68-ad2a-95566c84343e** Neighborhood Population Data

**https://data.wprdc.org/dataset/playgrounds** Playgrounds

**https://data.wprdc.org/dataset/smart-trash-containers** Smart Trash Containers

In [1]:
import pandas as pd
import numpy as np

In [2]:
arrest_data = pd.read_csv("arrest-data.csv")
census_data = pd.read_excel("census-data.xlsx")

In [123]:
arrest_data

Unnamed: 0,PK,CCR,AGE,GENDER,RACE,ARRESTTIME,ARRESTLOCATION,OFFENSES,INCIDENTLOCATION,INCIDENTNEIGHBORHOOD,INCIDENTZONE,INCIDENTTRACT,COUNCIL_DISTRICT,PUBLIC_WORKS_DIVISION,X,Y
0,1975272,16158872,42.0,F,B,2016-08-24T12:20:00,"4700 Block Centre AV Pittsburgh, PA 15213",3929 Retail Theft.,"4700 Block Centre AV Pittsburgh, PA 15213",Bloomfield,5,804.0,8.0,2.0,-79.949277,40.452551
1,1974456,16144120,31.0,M,W,2016-08-03T14:55:00,"4200 Block Steubenville PKE Pittsburgh, PA 15205",13(a)(16) Possession of Controlled Substance,"4200 Block Steubenville PKE Pittsburgh, PA 15205",Outside City,OSC,5599.0,,,-80.088018,40.440136
2,1974466,16144165,63.0,F,B,2016-08-03T16:45:00,"900 Block Freeport RD Fox Chapel, PA 15238",3929 Retail Theft.,"900 Block Freeport RD Fox Chapel, PA 15238",Westwood,5,2811.0,9.0,2.0,-79.891803,40.486625
3,1974550,16145257,25.0,F,W,2016-08-05T02:36:00,"Foreland ST & Cedar AV Pittsburgh, PA 15212",5503 Disorderly Conduct. / 5505 Public Drunken...,"Foreland ST & Cedar AV Pittsburgh, PA 15212",East Allegheny,1,2304.0,1.0,1.0,-80.001939,40.454080
4,1974596,16145962,25.0,M,B,2016-08-06T02:00:00,"900 Block Woodlow ST Pittsburgh, PA 15205",2702 Aggravated Assault. / 2705 Recklessy Enda...,"900 Block Woodlow ST Pittsburgh, PA 15205",Crafton Heights,5,2814.0,2.0,5.0,-80.052204,40.445900
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
47272,2035968,21044101,21.0,F,B,2021-03-16T03:29:00,"600 Block Commonwealth PL Pittsburgh, PA 15222",2701 Simple Assault. / 13(a)(31) Marijuana: Po...,"600 Block Commonwealth PL Pittsburgh, PA 15222",Central Business District,2,201.0,6.0,6.0,-80.006409,40.441741
47273,2035969,21044206,33.0,M,W,2021-03-16T08:55:00,"70 Block Wyoming ST Pittsburgh, PA 15211",13(a)(16) Possession of Controlled Substance,"70 Block Wyoming ST Pittsburgh, PA 15211",Mount Washington,3,1914.0,2.0,5.0,-80.006580,40.428386
47274,2035970,21044238,49.0,M,B,2021-03-16T10:21:00,"3300 Block Milwaukee ST Pittsburgh, PA 15219",9093 Indirect Criminal Contempt,"3300 Block Milwaukee ST Pittsburgh, PA 15219",Upper Hill,2,506.0,6.0,3.0,-79.964359,40.452303
47275,2035971,21044859,21.0,M,B,2021-03-17T09:35:00,"500 Block Rosedale ST Pittsburgh, PA 15221",2701 Simple Assault.,"500 Block Rosedale ST Pittsburgh, PA 15221",Homewood South,5,1304.0,9.0,2.0,-79.887261,40.449271


In [125]:
# CRIME TYPES AND WEIGHTS
# Theft 1.25
# Simple Assault 1.5
# Aggravated Assault 1.75
# Rape 2
# Possession 0.75
# Robbery 1.5
# Trespass 0.75
# Disorderly Conduct 1.25

# Idea:
#   1. Get total offenses by neighborhood
#   2. Get the number of each type of crime by neighborhood
#   3. Multiply each crime type's count by (weight - 1); every arrest already counts once
#      in the total, so only the extra weight needs to be added
#   4. Add these weighted crime type values to the total offenses by neighborhood
#   5. Divide this number by the population and scale by a constant
#      (maybe weighted crimes per 10k residents)
#   6. Graph total offenses by neighborhood, different crimes by neighborhood,
#      crimes per capita, and weighted crimes per capita

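# For each crime in 'crimeTypes', builds a Series holding the number of arrests
# per neighborhood whose OFFENSES text mentions that crime.
# Each Series is stored in the dictionary 'crimeList' under the crime's name.
# Matching is by substring, so "Theft" also counts "Retail Theft", and an arrest
# listing several offenses is counted once for every crime type it matches.
def addCrimes(crimeTypes, crimeList):
    for crime in crimeTypes:
        # regex=False: treat the crime name as literal text, not as a pattern
        mask = offenses.str.contains(crime, na=False, regex=False)
        crimeList[crime] = arrest_data[mask].groupby("INCIDENTNEIGHBORHOOD")["OFFENSES"].count()
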

offenses = arrest_data["OFFENSES"]

# Group crimes by neighborhood into a dictionary
crimeList = {}
addCrimes(["Theft", "Simple Assault", "Aggravated Assault", "Possession", "Robbery", "Trespass", "Disorderly Conduct"], crimeList)


# Put all crime types into one DataFrame; neighborhoods with no arrests of a type get 0
crimeInstances = pd.DataFrame(crimeList).fillna(0)

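# Merge the crime counts with each neighborhood's 2010 population.
# how='outer' keeps neighborhoods that appear in only one of the two tables (missing values become NaN).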
cd = census_data.set_index("Neighborhood")["Pop. 2010"]
crimeInstances = crimeInstances.merge(cd, how='outer', left_index=True, right_index=True)
crimeInstances

Unnamed: 0,Theft,Simple Assault,Aggravated Assault,Possession,Robbery,Trespass,Disorderly Conduct,Pop. 2010
Allegheny Center,36.0,70.0,62.0,411.0,23.0,22.0,40.0,933.0
Allegheny West,10.0,11.0,3.0,23.0,0.0,2.0,5.0,462.0
Allentown,36.0,169.0,37.0,229.0,16.0,15.0,17.0,2500.0
Arlington,18.0,65.0,25.0,39.0,10.0,8.0,2.0,1869.0
Arlington Heights,3.0,36.0,29.0,19.0,7.0,0.0,5.0,244.0
...,...,...,...,...,...,...,...,...
Upper Lawrenceville,19.0,43.0,6.0,29.0,4.0,3.0,5.0,2669.0
West End,8.0,21.0,7.0,61.0,6.0,0.0,4.0,254.0
West Oakland,24.0,59.0,17.0,47.0,9.0,7.0,1.0,2604.0
Westwood,33.0,50.0,11.0,20.0,10.0,1.0,0.0,3066.0
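
The weighting idea outlined in the comments of cell `In [125]` can be tried on a small scale as follows. This is a minimal sketch, not the notebook's actual implementation: the function name `weighted_crime_rate`, its parameters, the `per=10_000` scale, and the two-neighborhood toy data are all assumptions made for illustration. Rape is left out of the weights because the notebook never counts it in `crimeInstances`.

```python
import pandas as pd

# Weights copied from the "CRIME TYPES AND WEIGHTS" comment.
# Rape is omitted because crimeInstances has no column for it.
WEIGHTS = {
    "Theft": 1.25,
    "Simple Assault": 1.5,
    "Aggravated Assault": 1.75,
    "Possession": 0.75,
    "Robbery": 1.5,
    "Trespass": 0.75,
    "Disorderly Conduct": 1.25,
}


def weighted_crime_rate(crime_counts, total_offenses, population, per=10_000):
    """Hypothetical helper: weighted offenses per `per` residents, by neighborhood."""
    weights = pd.Series(WEIGHTS)
    # Every arrest already counts once in total_offenses, so only the
    # extra weight (weight - 1) is added for each crime type.
    adjustment = (
        crime_counts[weights.index]
        .mul(weights - 1, axis="columns")
        .sum(axis="columns")
    )
    weighted_total = total_offenses + adjustment
    return weighted_total * per / population


# Made-up toy data for two hypothetical neighborhoods, "A" and "B".
crime_counts = pd.DataFrame(
    {"Theft": [4, 0], "Simple Assault": [2, 1]},
    index=["A", "B"],
).reindex(columns=list(WEIGHTS), fill_value=0)
total_offenses = pd.Series({"A": 10, "B": 2})
population = pd.Series({"A": 1000, "B": 500})

rates = weighted_crime_rate(crime_counts, total_offenses, population)
# A: (10 + 4*0.25 + 2*0.5) * 10000 / 1000 = 120
# B: (2 + 1*0.5) * 10000 / 500 = 50
print(rates)
```

With the real data, `crime_counts` would be the crime-type columns of `crimeInstances` and `population` its `Pop. 2010` column. `total_offenses` would be a per-neighborhood arrest count, which the notebook has not computed yet. Neighborhoods with a missing population come out as NaN.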
