# How Education Affects Crime in Baltimore, MD
### <i> Daniel Choo & Adrianne Santinor </i>

<p>For this project, we set out to see whether there is any significant relationship between the public libraries in Baltimore and the victim-based crimes that occur in each branch's police district. We used the following datasets from <a href="https://data.baltimorecity.gov/">OpenBaltimore</a>:
    <ul>
        <li><strong>Libraries</strong>: this data set shows the location of Baltimore City public libraries listed on the Enoch Pratt Library website.</li>
        <li><strong>BPD Part 1 Victim Based Crime Data</strong>: this data set shows all victim-based crimes reported by the Baltimore Police Department from 2014 to April 20<sup>th</sup>, 2019.</li>
    </ul>
</p>

In [None]:
import numpy as np
import pandas as pd

lib = pd.read_csv('../Libraries.csv')
crime = pd.read_csv('../BPD_Part_1_Victim_Based_Crime_Data.csv')
lib.head()

In [3]:
crime.head()

Unnamed: 0,CrimeDate,CrimeTime,CrimeCode,Location,Description,Inside/Outside,Weapon,Post,District,Neighborhood,Longitude,Latitude,Location 1,Premise,crimeCaseNumber,Total Incidents
0,04/20/2019,9:50:00 PM,6D,MILTON AV & E BALTIMORE ST,LARCENY FROM AUTO,,,222.0,SOUTHEASTERN,Patterson Place,-76.58188,39.292,,,,1
1,04/20/2019,9:23:00 AM,4E,2100 CRIMEA RD,COMMON ASSAULT,I,,821.0,SOUTHWESTERN,Wakefield,-76.70306,39.31018,,ROW/TOWNHOUSE-OCC,,1
2,04/20/2019,9:20:00 AM,6D,2700 LIGHTHOUSE PT E,LARCENY FROM AUTO,I,,232.0,SOUTHEASTERN,Canton,-76.578,39.278,,OTHER - INSIDE,,1
3,04/20/2019,9:00:00 AM,4C,3500 W BELVEDERE AVE,AGG. ASSAULT,O,OTHER,614.0,NORTHWESTERN,Central Park Heights,-76.67907,39.3469,,STREET,,1
4,04/20/2019,9:00:00 AM,4C,3500 W BELVEDERE AVE,AGG. ASSAULT,O,OTHER,614.0,NORTHWESTERN,Central Park Heights,-76.67907,39.3469,,STREET,,1


The only attributes we're interested in right now are the following:
- Description (crime)
- District (crime)
- name (lib)
- policeDistrict (lib)

So we drop every other column from each dataset. That includes `Total Incidents`, which is 1 in every row shown above, so each row already stands for a single incident.
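
An alternative to listing every column to drop is to list only the columns to keep. The snippet below is a minimal sketch of that idea on a toy table; `toy_crime`, `keep`, and `toy_crime_small` are illustrative names, and the toy rows only mimic the shape of the real data.

```python
import pandas as pd

# Toy stand-in for the crime table (assumption: the real file has many more columns).
toy_crime = pd.DataFrame({
    "CrimeDate": ["04/20/2019", "04/20/2019"],
    "Description": ["LARCENY FROM AUTO", "COMMON ASSAULT"],
    "District": ["SOUTHEASTERN", "SOUTHWESTERN"],
    "Weapon": [None, None],
})

# Keep only the columns of interest; anything not named here is left out.
keep = ["Description", "District"]
toy_crime_small = toy_crime.loc[:, keep].copy()

print(toy_crime_small.columns.tolist())
```

Selecting by name this way keeps working if the source file later gains extra columns, whereas a drop list would let the new columns through.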

In [4]:
crime = crime.drop(['CrimeDate', 'CrimeTime', 'CrimeCode', 'Location 1', 'Inside/Outside', 'Weapon', 'Post', 'Neighborhood', 'Longitude', 'Latitude', 'Location', 'Premise', 'crimeCaseNumber', 'Total Incidents'], axis=1)
lib = lib.drop(['zipCode', 'neighborhood', 'councilDistrict', 'Location 1', '2010 Census Neighborhoods', '2010 Census Wards Precincts', 'Zip Codes'], axis=1)

In [None]:
lib.head()

Unnamed: 0,name,policeDistrict
0,Central,CENTRAL
1,Brooklyn,SOUTHERN
2,Canton,SOUTHEASTERN
3,Cherry Hill,SOUTHERN
4,Clifton,EASTERN


In [None]:
crime.head()

Unnamed: 0,Description,District
0,LARCENY FROM AUTO,SOUTHEASTERN
1,COMMON ASSAULT,SOUTHWESTERN
2,LARCENY FROM AUTO,SOUTHEASTERN
3,AGG. ASSAULT,NORTHWESTERN
4,AGG. ASSAULT,NORTHWESTERN


In [None]:
# determining whether there are any missing values in any columns and cleaning if necessary
missingCrime = crime['District'].isna().any()
missingLib = lib['policeDistrict'].isna().any()

if missingCrime:
    print("Missing values in Victim Based Crime dataset.\nRemoving those instances now.")
    crime = crime.dropna(subset=['District'])
    
if missingLib:
    print("Missing values in Libraries dataset.\nRemoving those instances now.")
    lib = lib.dropna(subset=['policeDistrict'])

Missing values in Victim Based Crime dataset.
Removing those instances now.
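
Before dropping rows, it can help to see how many values are missing in each column and how many rows the cleanup removes. This is a minimal sketch on a toy table; the rows and the names `toy`, `missing_per_column`, and `rows_dropped` are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Toy table with two missing districts (illustrative values only).
toy = pd.DataFrame({
    "Description": ["ROBBERY", "BURGLARY", "LARCENY", "COMMON ASSAULT"],
    "District": ["CENTRAL", None, "EASTERN", None],
})

# Count missing values per column.
missing_per_column = toy.isna().sum()

# Drop rows without a district and record how many were removed.
rows_before = len(toy)
toy_clean = toy.dropna(subset=["District"])
rows_dropped = rows_before - len(toy_clean)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```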


In [None]:
# adding a column to the crime dataset with the number of libraries in that district
# first, build a dictionary with districts as the keys and library counts as the values
dictLib = lib['policeDistrict'].value_counts().to_dict()

# look up the library count for the district each crime was committed in;
# districts that do not appear in dictLib get a count of 0
libCount = crime['District'].map(dictLib).fillna(0).astype(int)

# adding those counts to the crime dataset
crime['Library Count'] = libCount

In [None]:
crime.head()

Unnamed: 0,Description,District,Library Count
0,LARCENY FROM AUTO,SOUTHEASTERN,4
1,COMMON ASSAULT,SOUTHWESTERN,2
2,LARCENY FROM AUTO,SOUTHEASTERN,4
3,AGG. ASSAULT,NORTHWESTERN,2
4,AGG. ASSAULT,NORTHWESTERN,2
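
Because `Library Count` repeats the same value on every crime row in a district, a comparison between libraries and crime is easier to read at the district level. The following is a rough sketch on toy rows, assuming one row per crime as in the table above; `toy`, `summary`, and `r` are illustrative names, and the Pearson correlation is only one possible summary, not necessarily the analysis this project settles on.

```python
import pandas as pd

# Toy rows shaped like the crime table above (one row per crime).
toy = pd.DataFrame({
    "District": ["SOUTHEASTERN", "SOUTHEASTERN", "SOUTHEASTERN",
                 "SOUTHWESTERN", "NORTHWESTERN", "NORTHWESTERN"],
    "Library Count": [4, 4, 4, 2, 2, 2],
})

# One row per district: number of crimes and number of libraries.
grouped = toy.groupby("District")["Library Count"]
summary = pd.DataFrame({
    "crimes": grouped.size(),      # rows per district
    "libraries": grouped.first(),  # the count is constant within a district
})

# Pearson correlation between crimes and libraries across districts.
r = summary["crimes"].corr(summary["libraries"])

print(summary)
print(r)
```

With only a handful of districts, a coefficient like `r` is a starting point for exploration rather than evidence of a relationship.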


In [None]:
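# bar chart of the number of crimes per district
# (calling plot.bar on the full frame would draw one bar per crime row)
ax = crime['District'].value_counts().plot.bar(rot=45)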

In [None]:
crime['Description'].value_counts()

In [None]:
crime['District'].value_counts()
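
The library lookup only works when the district labels in the two datasets match exactly, so it may be worth comparing the label sets. The snippet below is a hedged sketch on toy data; `normalize` is a hypothetical helper, and the stray-case label is invented to show the kind of mismatch one might look for.

```python
import pandas as pd

# Toy label columns (illustrative; "Southern " mimics a formatting mismatch).
toy_crime = pd.DataFrame({"District": ["CENTRAL", "SOUTHERN", "Southern ", "EASTERN"]})
toy_lib = pd.DataFrame({"policeDistrict": ["CENTRAL", "SOUTHERN"]})

def normalize(labels: pd.Series) -> pd.Series:
    """Strip surrounding whitespace and upper-case the labels."""
    return labels.str.strip().str.upper()

crime_districts = set(normalize(toy_crime["District"]))
lib_districts = set(normalize(toy_lib["policeDistrict"]))

# Districts that appear in the crime data but have no library entry.
unmatched = sorted(crime_districts - lib_districts)
print(unmatched)
```

An unmatched label is not necessarily an error: a district may simply have no library, in which case a count of 0 is the right answer.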