In [1]:
import requests
import pandas as pd
import folium
from folium.plugins import MarkerCluster

In [2]:
arrests = pd.read_csv(r"C:\Users\jonat\Downloads\Arrests_Data.csv")

In [3]:
# cleaning data by dropping unnecessary columns, as well as rows with missing values
arrests1 = arrests.dropna(subset=['Latitude', 'Longitude', 'Age'])
arrests1 = arrests1.drop(['X', 'Y'], axis=1)

# creating random sample of 500 from original dataset
arrests_smpl = arrests1.sample(500)
arrests_smpl

Unnamed: 0,RowID,ArrestNumber,Age,Gender,Race,ArrestDateTime,ArrestLocation,IncidentOffence,IncidentLocation,Charge,ChargeDescription,District,Post,Neighborhood,Latitude,Longitude,GeoLocation,Shape
252923,252924,22040125.0,21.0,M,B,2022/04/07 02:00:00+00,400 MULBERRY ST,Unknown Offense,400 MULBERRY ST,1 1119,CDS VIOLATION,Central,114,Downtown,39.2938,-76.6220,"(39.2938,-76.622)",
122476,122477,10044932.0,28.0,M,B,2010/08/17 08:50:00+00,1200 CENTRAL AV,Unknown Offense,1200 CENTRAL AV,,Unknown Charge,Eastern,314,Oliver,39.3043,-76.6004,"(39.3043,-76.6004)",
124579,124580,10044628.0,36.0,F,B,2010/08/15 19:00:00+00,600 FRANKLIN ST,79OTHER,600 FRANKLIN ST,2 2220,TRESPASSING,Central,125,Seton Hill,39.2948,-76.6250,"(39.2948,-76.625)",
123834,123835,17126895.0,19.0,M,B,2017/08/15 13:58:00+00,5100 PARK HEIGHTS AVE,4ECOMMON ASSAULT,5100 PARK HEIGHTS AVE,1 1415,CDS VIOLATION,Northwest,614,Central Park Heights,39.3483,-76.6748,"(39.3483,-76.6748)",
231625,231626,14067390.0,39.0,F,B,2014/04/28 03:20:00+00,4700 VALLEY VIEW AVE,5ABURG. RES. (FORCE),4700 VALLEY VIEW AVE,1 1415,COMMON ASSAULT,Northeast,443,Frankford,39.3307,-76.5510,"(39.3307,-76.551)",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45039,45040,12583037.0,48.0,M,B,2012/11/05 18:45:00+00,1600 NORMAL ST,87NARCOTICS,1600 NORMAL ST,4 3550,CDS POSS.,Eastern,312,Darley Park,39.3163,-76.5958,"(39.3163,-76.5958)",
280364,280365,18042089.0,41.0,F,B,2018/03/12 23:00:00+00,1600 CLIFTVIEW AVE,4BAGG. ASSLT.- CUT,1600 CLIFTVIEW AVE,1 1415,AGGRAVATED ASSAULT,Eastern,312,Darley Park,39.3153,-76.5947,"(39.3153,-76.5947)",
139189,139190,17116822.0,18.0,M,W,2017/07/31 03:00:00+00,3400 FOSTER AVE,5DBURG. OTH. (FORCE),3400 FOSTER AVE,1 1130,1ST DEGREE B&E,Southeast,214,Canton,39.2844,-76.5687,"(39.2844,-76.5687)",
234281,234282,10022022.0,20.0,M,B,2010/04/26 17:03:00+00,2100 GUILFORD AV,Unknown Offense,2100 GUILFORD AV,,Unknown Charge,Northern,514,Barclay,39.3138,-76.6127,"(39.3138,-76.6127)",


In [4]:
map_osm = folium.Map(location=[39.29, -76.61], zoom_start=13)
map_osm

In [5]:
# creating location cluster for markers
loc_cluster = MarkerCluster().add_to(map_osm)

arrests_smpl['Age'] = arrests_smpl['Age'].astype(int)

# each marker gets a symbol corresponding to gender and color corresponding to race
# blue color corresponds to an unknown/undocumented race for that record
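for idx, row in arrests_smpl.iterrows():
    if row['Gender'] == 'M':
        gender_icon = 'mars'
    elif row['Gender'] == 'F':
        gender_icon = 'venus'
    else:
        # neutral fallback so a missing gender never reuses the previous row's icon
        gender_icon = 'question'
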
    if row['Race'] == 'W':
        race_color = 'green'
    elif row['Race'] == 'B':
        race_color = 'black'
    elif row['Race'] == 'A':
        race_color = 'orange'
    elif row['Race'] == 'I':
        race_color = 'purple'
    elif row['Race'] == 'H':
        race_color = 'red'
    else:
        race_color = 'blue'
        
    # creating each marker and placing it on map
    # popups render as HTML, so '<br>' gives line breaks; str() guards against missing text
    popup_text = ('Age: ' + str(row['Age'])
                  + '<br>Charge Description: ' + str(row['ChargeDescription'])
                  + '<br>Location: ' + str(row['Neighborhood']))
    folium.Marker(location=[row['Latitude'], row['Longitude']],
                  popup=popup_text,
                  icon=folium.Icon(color=race_color, icon=gender_icon, prefix='fa')
                 ).add_to(loc_cluster)

In [6]:
map_osm

The dataset I chose is the Baltimore City arrest data provided by the Baltimore City Police Department through its open data portal. The data was last updated on January 7, 2022. It contains about 353,300 records, and each record includes the age, gender, and race of the arrestee and the charge they were arrested for. The district and neighborhood of the arrest are also included, as well as its geographical coordinates (latitude and longitude).

The interactive map above shows a random sample of 500 arrests from the original dataset, since mapping 350K+ arrest records would make the map unusable and defeat its purpose. I used each record's gender and race to differentiate the data: the marker's color corresponds to race and its symbol to gender. Markers are clustered by location, so a cluster roughly corresponds to a district or a neighborhood depending on the zoom level. Clicking a marker opens a popup showing the age, charge description, and neighborhood of that arrest record.
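
The color and symbol assignment described above could also be written as two lookup tables instead of if/elif chains. This is only a sketch: the helper name `marker_style` and the fallback icon `'question'` are assumptions made for illustration and are not part of the notebook, while the color and icon codes are copied from the marker loop.

```python
# Lookup tables mirroring the race/gender branches in the marker loop.
RACE_COLORS = {'W': 'green', 'B': 'black', 'A': 'orange', 'I': 'purple', 'H': 'red'}
GENDER_ICONS = {'M': 'mars', 'F': 'venus'}


def marker_style(race, gender, default_color='blue', default_icon='question'):
    """Return (color, icon) for one record.

    Hypothetical helper: unknown or missing codes fall back to the defaults,
    so every record always gets a well-defined color and icon.
    """
    color = RACE_COLORS.get(race, default_color)
    icon = GENDER_ICONS.get(gender, default_icon)
    return color, icon


print(marker_style('B', 'M'))   # known race and gender codes
print(marker_style('U', None))  # unknown race, missing gender
```

Inside the loop, the returned pair could be passed to `folium.Icon(color=..., icon=..., prefix='fa')` in the same way as `race_color` and `gender_icon`.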

Analysis of the sample data suggests that, throughout Baltimore City, the demographic that accounts for most of the arrests is black males, usually between 20 and 30 years of age. The next most prominent demographic is white males in the same age range. Interestingly, the sample also contains several white female arrestees, even though white females account for only about 2% of our total dataset. The arrests in our data are dominated by black males, but since this is not an in-depth analysis, it does not tell us why. One contributing factor may be that African Americans are the largest demographic group in Baltimore City.
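
Demographic shares like the ones discussed above can be checked with a cross-tabulation rather than by reading the map. The sketch below uses a tiny made-up frame (`toy`) so that it runs on its own; its rows are illustrative and are not taken from the arrest data. Only the column names `Race`, `Gender`, and `Age` match the dataset.

```python
import pandas as pd

# Toy stand-in for the sample; these six rows are invented for illustration only.
toy = pd.DataFrame({
    'Race':   ['B', 'B', 'W', 'B', 'W', 'B'],
    'Gender': ['M', 'M', 'M', 'F', 'F', 'M'],
    'Age':    [22, 27, 31, 40, 25, 19],
})

# Fraction of all records falling in each (Race, Gender) group; the cells sum to 1.
shares = pd.crosstab(toy['Race'], toy['Gender'], normalize='all')

# Median age within each (Race, Gender) group.
median_age = toy.groupby(['Race', 'Gender'])['Age'].median()

print(shares)
print(median_age)
```

Running the same two calls on `arrests_smpl` and on `arrests1` would allow the sample's shares to be compared with those of the full cleaned dataset.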