# Baltimore City Crime Analysis

Baltimore Crime Data: https://data.baltimorecity.gov/datasets/baltimore::part-1-crime-data-/explore               
Baltimore shape file: https://data.imap.maryland.gov/datasets/maryland::maryland-baltimore-city-neighborhoods/about

   Growing up in Maryland, I saw crime in Baltimore dominate most local news cycles. As a data scientist, I want to explore trends in crime occurrence within the city using the incident-based reporting data kept by the city of Baltimore. This notebook details a few ways of bringing the incident-based data to life using the folium package, building out interactive marker cluster maps and heat maps.
    
   The crime data source is a frequently updated database of all crime incidents that occur within the city, with attributes for time of day, crime type and location.
    
   This analysis explores ways of visualizing this data, culminating in an implementation of a choropleth map that animates through time. A tool like this could ideally be used to identify problem areas for various types of crime and to understand the geographical layout of different crime types across the city.


In [1]:
import pandas as pd
import numpy as np
import folium
import datetime as dt
import plotly.express as px

First, we read in the data, exploring some of the features. The observations represent specific crime incidents within Baltimore.

In [2]:
crime = pd.read_csv('D:/Part_1_Crime_Data_.csv')
crime.head()


Unnamed: 0,X,Y,RowID,CrimeDateTime,CrimeCode,Location,Description,Inside_Outside,Weapon,Post,...,Ethnicity,District,Neighborhood,Latitude,Longitude,GeoLocation,Premise,VRIName,Total_Incidents,Shape
0,-76.5719,39.2949,1,2022/06/25 18:00:00+00,4E,3200 E FAYETTE ST,COMMON ASSAULT,,PERSONAL_WEAPONS,224,...,NOT_HISPANIC_OR_LATINO,SOUTHEAST,ELLWOOD PARK/MONUMENT,39.2949,-76.5719,"(39.2949,-76.5719)",,,1,
1,-76.6824,39.3477,2,2022/06/25 20:38:00+00,4E,5300 CUTHBERT AVE,COMMON ASSAULT,,PERSONAL_WEAPONS,633,...,,NORTHWEST,ARLINGTON,39.3477,-76.6824,"(39.3477,-76.6824)",,Northwestern,1,
2,-76.5981,39.2931,3,2022/06/25 14:19:00+00,6F,1400 E FAYETTE ST,LARCENY,,,212,...,,SOUTHEAST,DUNBAR-BROADWAY,39.2931,-76.5981,"(39.2931,-76.5981)",,,1,
3,-76.5939,39.2903,4,2022/06/25 02:00:00+00,6C,100 S BROADWAY,LARCENY,,,212,...,,SOUTHEAST,WASHINGTON HILL,39.2903,-76.5939,"(39.2903,-76.5939)",,,1,
4,-76.6188,39.3033,5,2022/06/25 03:50:00+00,5B,1000 PARK AVE,BURGLARY,,,134,...,,CENTRAL,MID-TOWN BELVEDERE,39.3033,-76.6188,"(39.3033,-76.6188)",,,1,


After some basic data manipulation to format the dates and subset to the necessary columns, we discover that the data set is incomplete for 2010 and prior years: each of those years has at most a few dozen records, versus tens of thousands for each later year. Consequently, we remove those incomplete years from the data, along with the partial year 2022. Some more data manipulation is performed to properly format the data for the visualizations.

In [3]:
crime_baltimore = crime.copy()
crime_baltimore = crime_baltimore[['CrimeDateTime','Description','Neighborhood','Latitude','Longitude']].dropna()
crime_baltimore = crime_baltimore.rename(columns = {'CrimeDateTime': 'DATE', 'Description': 'TYPE', 'Neighborhood': 'NEIGHBORHOOD','Latitude': 'LAT','Longitude':'LON'})
crime_baltimore['YEAR'] = crime_baltimore['DATE'].str[:4].astype(int)
crime_baltimore = crime_baltimore[crime_baltimore['YEAR'] > 2000]
crime_baltimore['YEAR'].value_counts() # reporting system looks to have changed in 2010

2017    52183
2011    48631
2016    48582
2018    48482
2013    48017
2015    47976
2012    47819
2019    46497
2014    45129
2021    37208
2020    36194
2022    17535
2010       27
2009       15
2008       12
2007       12
2001       10
2002        6
2004        5
2003        4
2006        3
2005        3
Name: YEAR, dtype: int64
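The cutoff years in the next cell were chosen by eye from the counts above. As a rough sketch of how the same choice could be made programmatically, the snippet below keeps only years with at least half as many incidents as the busiest year. The `complete_years` helper and the 0.5 threshold are assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd

# Yearly incident counts copied from the value_counts() output above.
year_counts = pd.Series({
    2017: 52183, 2011: 48631, 2016: 48582, 2018: 48482, 2013: 48017,
    2015: 47976, 2012: 47819, 2019: 46497, 2014: 45129, 2021: 37208,
    2020: 36194, 2022: 17535, 2010: 27, 2009: 15, 2008: 12, 2007: 12,
    2001: 10, 2002: 6, 2004: 5, 2003: 4, 2006: 3, 2005: 3,
})

def complete_years(counts, min_fraction=0.5):
    """Return the years whose count is at least min_fraction of the busiest year.

    Hypothetical helper: the threshold is a heuristic, not a rule from the data source.
    """
    threshold = min_fraction * counts.max()
    return sorted(int(year) for year in counts[counts >= threshold].index)

kept = complete_years(year_counts)
print(kept)  # [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
```

With this threshold the result matches the manual filter used below (2011 through 2021), but a different data extract might need a different fraction.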

In [4]:
# Keep only the complete years (2011 through 2021).
crime_baltimore = crime_baltimore[(crime_baltimore['YEAR'] > 2010) & (crime_baltimore['YEAR'] < 2022)].copy()
# Parse the date part of the timestamp (first 10 characters), sort chronologically, then reduce DATE to the year.
crime_baltimore['DATE'] = crime_baltimore['DATE'].str[:10].astype('datetime64[ns]')
crime_baltimore = crime_baltimore.sort_values('DATE').reset_index(drop=True)
crime_baltimore['DATE'] = crime_baltimore['DATE'].dt.year
# Count incidents per year, neighborhood, and crime type; the 'index' column only serves as something to count.
crime_baltimore = crime_baltimore[['DATE','TYPE','NEIGHBORHOOD']].reset_index()
crime_baltimore = crime_baltimore.groupby(['DATE','NEIGHBORHOOD', 'TYPE'])['index'].count().reset_index()
# Reshape to one row per (year, neighborhood) with one column per crime type; missing combinations become 0.
crime_baltimore = crime_baltimore.pivot(index = ['DATE', 'NEIGHBORHOOD'], columns = 'TYPE', values  ='index').reset_index().fillna(0)

crime_baltimore.head()

TYPE,DATE,NEIGHBORHOOD,AGG. ASSAULT,ARSON,AUTO THEFT,BURGLARY,COMMON ASSAULT,HOMICIDE,LARCENY,LARCENY FROM AUTO,RAPE,ROBBERY - CARJACKING,ROBBERY - COMMERCIAL,ROBBERY - RESIDENCE,ROBBERY - STREET,SHOOTING
0,2011,ABELL,7.0,0.0,9.0,19.0,13.0,0.0,28.0,5.0,1.0,0.0,3.0,1.0,6.0,0.0
1,2011,ALLENDALE,29.0,0.0,34.0,45.0,35.0,0.0,30.0,29.0,1.0,0.0,1.0,2.0,11.0,0.0
2,2011,ARCADIA,10.0,0.0,5.0,22.0,19.0,0.0,11.0,6.0,0.0,0.0,0.0,1.0,0.0,0.0
3,2011,ARLINGTON,29.0,2.0,9.0,38.0,46.0,0.0,27.0,6.0,2.0,0.0,3.0,4.0,9.0,0.0
4,2011,ARMISTEAD GARDENS,17.0,2.0,22.0,57.0,58.0,0.0,51.0,14.0,2.0,0.0,2.0,2.0,7.0,0.0
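The count-then-pivot step above can also be written more compactly. The following is a minimal sketch on a tiny, made-up `incidents` frame (the rows are hypothetical, not taken from the Baltimore data) that uses `groupby(...).size()` and `unstack` to get the same wide layout. One difference from the cell above is that the counts stay integers rather than becoming floats.

```python
import pandas as pd

# Hypothetical miniature version of the cleaned incident table (one row per incident).
incidents = pd.DataFrame({
    'DATE': [2011, 2011, 2011, 2012],
    'NEIGHBORHOOD': ['ABELL', 'ABELL', 'ARCADIA', 'ABELL'],
    'TYPE': ['LARCENY', 'LARCENY', 'BURGLARY', 'BURGLARY'],
})

counts = (
    incidents
    .groupby(['DATE', 'NEIGHBORHOOD', 'TYPE'])
    .size()                          # number of incidents per (year, neighborhood, type)
    .unstack('TYPE', fill_value=0)   # one column per crime type, 0 where none occurred
    .reset_index()
)
print(counts)
```

As in the original cell, a year and neighborhood pair with no incidents of any type does not appear as a row at all, which matters later when every neighborhood is expected in every animation frame.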


In [5]:
import json
with open('Maryland_Baltimore_City_Neighborhoods.geojson') as f:
    geojson = json.load(f)
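The choropleth below joins the crime table to the shapes through the `NBRDESC` property of each GeoJSON feature. My understanding is that Plotly matches these values exactly and silently skips locations it cannot match, so it can be worth checking the names first. The sketch below uses a hypothetical helper, `unmatched_neighborhoods`, and a made-up two-feature GeoJSON; it does not reflect the actual contents of the Baltimore shape file.

```python
def unmatched_neighborhoods(names, geojson, key='NBRDESC'):
    """Return the names that have no GeoJSON feature with an identical property value."""
    known = {feature['properties'][key] for feature in geojson['features']}
    return sorted(set(names) - known)

# Hypothetical miniature FeatureCollection; geometry is omitted for brevity.
sample_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'NBRDESC': 'ABELL'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'NBRDESC': 'Arcadia'}, 'geometry': None},
    ],
}

missing = unmatched_neighborhoods(['ABELL', 'ARCADIA', 'ALLENDALE'], sample_geojson)
print(missing)  # ['ALLENDALE', 'ARCADIA']
```

In the notebook, the same check could be run as `unmatched_neighborhoods(crime_baltimore['NEIGHBORHOOD'].unique(), geojson)`. Names that differ only in case or spacing, like `ARCADIA` above, show up as unmatched and would need normalizing on one side.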

In [6]:
# Use a separate name so the raw `crime` DataFrame from In [2] is not overwritten.
crime_type = 'larceny'
crime_type = crime_type.upper()
data = crime_baltimore[['DATE','NEIGHBORHOOD',crime_type]]

# The figure code is left commented out: once the figure is created,
# the notebook file becomes too large to push to GitHub.

# fig = px.choropleth_mapbox(
#     data, geojson=geojson,
#     locations='NEIGHBORHOOD',
#     featureidkey='properties.NBRDESC',
#     color=crime_type,
#     range_color=(0, 100),
#     color_continuous_scale="Viridis",
#     mapbox_style="carto-positron",
#     zoom=11,
#     center={"lat": 39.3, "lon": -76.6},
#     animation_frame='DATE'
# )
# fig.show()
# fig.write_html('larceny_year_counts_choropleth.html')
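Since the rendered figure makes the file too large, one option worth trying is to shrink the GeoJSON before handing it to Plotly by rounding coordinates to fewer decimal places. The sketch below is an assumption-laden illustration: `round_coordinates` and `slim_geojson` are hypothetical helpers, the sample polygon is made up, and whether this saves enough to get under GitHub's limit has not been tested here.

```python
import json

def round_coordinates(node, ndigits=5):
    """Recursively round every number in a nested GeoJSON coordinate list."""
    if isinstance(node, (int, float)):
        return round(node, ndigits)
    return [round_coordinates(child, ndigits) for child in node]

def slim_geojson(geojson, ndigits=5):
    """Return a copy of a FeatureCollection with rounded geometry coordinates.

    Only 'type', 'properties', and 'geometry' are kept on each feature.
    """
    slim = {'type': geojson['type'], 'features': []}
    for feature in geojson['features']:
        geometry = feature['geometry']
        slim['features'].append({
            'type': 'Feature',
            'properties': feature['properties'],
            'geometry': {
                'type': geometry['type'],
                'coordinates': round_coordinates(geometry['coordinates'], ndigits),
            },
        })
    return slim

# Hypothetical single-polygon example with more precision than a city map needs.
sample = {
    'type': 'FeatureCollection',
    'features': [{
        'type': 'Feature',
        'properties': {'NBRDESC': 'ABELL'},
        'geometry': {
            'type': 'Polygon',
            'coordinates': [[
                [-76.612345678912, 39.298765432198],
                [-76.611345678912, 39.298765432198],
                [-76.611345678912, 39.299765432198],
                [-76.612345678912, 39.298765432198],
            ]],
        },
    }],
}

slim = slim_geojson(sample)
print(len(json.dumps(sample)), len(json.dumps(slim)))
```

Five decimal places of a degree is roughly one meter, which should be ample for neighborhood outlines. Passing `slim` instead of `geojson` to the figure call would then embed the smaller geometry.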