# An Analysis of the Impact of COVID-19 on Crime in College Park, MD

### By Moses Kang, Andrew Xu, and Murat Ablimit

## Introduction

The COVID-19 pandemic has truly served as a transformative historical event in the 21st century. The pandemic has led to the dramatic loss of human life around the world. Additionally, many aspects of life and human behavior have been forced to change due to the restrictions and limitations COVID-19 has brought forth. The food service industry, retail, educational institutions, and healthcare have all been challenged and threatened enourmously. Periodic lockdowns and enforced quarantines have resulted in loss of income, economic assests, and employment in general for millions. When livelihoods become endangered to the extent like they have in this pandemic, immense feelings of desperation and hopelessness become frequent. For this reason, it becomes worthwhile to ask if this pandemic and all of its challenges have impacted criminal activitiy.    

In this project, we will attempt to present and uncover a correlation between the pandemic and criminal activity in the city of College Park, Maryland. We will do so by analyzing crime data between two time periods. The first time period being the entire year of 2019 to March 2020 (pre-pandemic), and the second time period being April of 2020 onwards (pandemic).     

## Data Collection

To collect the neccessary data for this project, we will scrape the University of Maryland, College Park Police Department website: http://www.umpd.umd.edu/stats/incident_logs.cfm. Because the website organizes the data by year and month, we will use python's requests module to retrieve the HTML that corresponds to the page for each month in a year. Afterwards, we will use the BeautifulSoup library to to find the html for the table that contains the data for each month. We will then concatenate each month's data to make a dataframe for each year. Because we are comparing crime between pre-pandemic and during the pandemic, we will only use the data from 2019 to 2020. Yearly data is then stored in a dictionary, with the year (2019 or 2020) acting as the key and the corresponding dataframes for each year serving as the values.  



In [1]:
import requests
import pandas as pd

from bs4 import BeautifulSoup

import sys
!{sys.executable} -m pip install lxml html5lib



In [2]:
years = ['2019','2020']

#Dictionary that will hold the data
data_raw = {}

#Loops for each year
for year in years:
    
    #This will hold each yearly data set
    df = pd.DataFrame()
    
    #Loops for each month
    for m in range(1,13):
        page = requests.get('http://www.umpd.umd.edu/stats/incident_logs.cfm?year='+year+"&month="+str(m))
        content = BeautifulSoup(page.content)
        table = content.find('table')
        t = pd.read_html(str(table))
        d=pd.concat(t)
        df = pd.concat([df,d])
        
    #adds the yearly data to the main dictionary
    data_raw[year] = df
    
data_raw['2019'].head()

Unnamed: 0,UMPD CASENUMBER,OCCURRED DATE TIMELOCATION,REPORT DATE TIME,TYPE,DISPOSITION,Unnamed: 5
0,2019-00000001,01/01/19 00:01,01/01/19 00:01,Fireworks Complaint,Arrest,
1,2019-00000001,4300 block of Knox Rd,4300 block of Knox Rd,4300 block of Knox Rd,4300 block of Knox Rd,4300 block of Knox Rd
2,2019-00000009,01/01/19 01:20,01/01/19 01:20,DWI/DUI,Arrest,
3,2019-00000009,8300 block of Baltimore Ave,8300 block of Baltimore Ave,8300 block of Baltimore Ave,8300 block of Baltimore Ave,8300 block of Baltimore Ave
4,2019-00000011,01/01/19 01:28,01/01/19 01:28,DWI/DUI,Arrest,


## Data Processing

As can be seen above, data scraping does not give ideally formatted or "clean" data, due to how the website and html stores data. The main problem is that each case is listed twice in the dataframe, with the location of the case being listed 5 times in the 2nd entry and there being a column named "Unnamed: 5". Additonally, the way we have divided our data in terms of year does not really make sense either. We have two dataframes for crime data for 2019 and 2020, since we want to be comparing data pre-pandemic and during the pandemic, it makes more sense if we change these two dataframes so that they reflect the actual periods of time that constitute life before the pandemic and life during the pandemic. For the purposes of this project, we are going to consider 2019 to before March 2020 as pre-pandemic and March 2020 onwards as the pandemic. In this section we will tidy up the data, reformat it for our convenience, and deal with all NaN or missing values.

In [3]:
data_cleaned = {}

for year in years:
    i = 0
    df_new = pd.DataFrame(columns = ['CaseNumber', 'OccuredDateTime','ReportDateTime', 'Type', 'Disposition', 'Location'])

    while i < len(data_raw[year]):
    
        df_new = df_new.append({'CaseNumber' : data_raw[year].iloc[i]['UMPD CASENUMBER'],
                                'OccuredDateTime' : data_raw[year].iloc[i]['OCCURRED DATE TIMELOCATION'],
                                'ReportDateTime' : data_raw[year].iloc[i]['REPORT DATE TIME'],
                                'Type' : data_raw[year].iloc[i]['TYPE'],
                                'Disposition' : data_raw[year].iloc[i]['DISPOSITION'],
                                'Location' : data_raw[year].iloc[i+1]['OCCURRED DATE TIMELOCATION']}, ignore_index=True)
        i+=2
    data_cleaned[year] = df_new
    

data_cleaned['2019'].head()

Unnamed: 0,CaseNumber,OccuredDateTime,ReportDateTime,Type,Disposition,Location
0,2019-00000001,01/01/19 00:01,01/01/19 00:01,Fireworks Complaint,Arrest,4300 block of Knox Rd
1,2019-00000009,01/01/19 01:20,01/01/19 01:20,DWI/DUI,Arrest,8300 block of Baltimore Ave
2,2019-00000011,01/01/19 01:28,01/01/19 01:28,DWI/DUI,Arrest,7400 block of Baltimore Ave
3,2019-00000203,12/28/18 17:00,01/02/19 11:34,Vandalism,CBE,7500 block of Calvert Service Ln
4,2019-00000312,01/02/19 23:04,01/02/19 23:04,CDS Violation,Arrest,Metzerott Rd


In summary, instead of modifying each dataframe, we opted to create a new dataframe for each year, as this approach was simpler to code. Essentially, we have removed every other row and replaced the unnamed column with the location of the incident, as well as renaming each column header.

After this data cleaning process, we now need to split this clean data into the appropriate time periods, pre-pandemic and pandemic, for our analysis. The below code will accomplish this.

As described before, pre-pandemic will constitute 2019 to March 2020 and the pandemic will constitute March 2020 onwards. Therefore, we need to split the 2020 dataframe in the data_cleaned dictionary at the first case that occured in March. We then need to append all the cases that occured in 2020 but before March 2020 to the 2019 crime data dataframe (represented as data_cleaned['2019']). 

In [4]:
data_to_analyze = {} #store the final processed data into this dictionary

for index, row in data_cleaned['2020'].iterrows():
    if row['OccuredDateTime'].startswith('03'): #finds the first case that occured in March of 2020
        pre_pandemic_2020 = data_cleaned['2020'].iloc[:index,:] #dataframe that has cases before March 2020
        pandemic_2020 = data_cleaned['2020'].iloc[index + 1:,:] #dataframe that has cases from March 2020 onwards
        break
        

data_to_analyze['pre_pandemic'] = data_cleaned['2019'].append(pre_pandemic_2020) #crime data for 2019 to March 2020
data_to_analyze['pandemic'] = pandemic_2020 #crime data for March 2020 onwards
data_to_analyze['pandemic']

Unnamed: 0,CaseNumber,OccuredDateTime,ReportDateTime,Type,Disposition,Location
261,2020-00015628,03/01/20 01:53,03/01/20 01:53,Injured/Sick Person,CBE,3900 block of Campus Dr
262,2020-00015635,03/01/20 02:20,03/01/20 02:20,Disorderly Conduct,Arrest,7300 block of Baltimore Ave
263,2020-00015636,03/01/20 02:25,03/01/20 02:28,Disorderly Conduct,Arrest,4400 block of Knox Rd
264,2020-00015637,03/01/20 02:30,03/01/20 02:30,Suspicious Activity,Arrest,7500 block of Baltimore Ave
265,2020-00015661,03/01/20 04:11,03/01/20 04:11,Assist Other Agency / Check on the Welfare,CBE,8300 block of Baltimore Ave
...,...,...,...,...,...,...
905,2020-00089196,12/13/20 00:02,12/13/20 00:02,CDS Violation,Arrest,Baltimore Ave
906,2020-00089210,12/13/20 03:49,12/13/20 03:49,Trespassing,CBE,Stadium Dr / Farm Dr
907,2020-00089949,12/11/20 21:00,12/15/20 17:47,Theft from Auto,Active/Pending,8300 block of Boteler Ln
908,2020-00089984,12/15/20 21:40,12/15/20 21:40,Trespassing,CBE,


## Exploratory Analysis and Data Visualization

## Analysis & Hypothesis Testing

In this section, we'll be looking at the data from the crime rate in College Park, Maryland in the years of 2019 and 2020. We'll first look for the top 5 violations of each year and then compare them to see if there's any difference between pre-COVID and post-COVID with violations.

In [5]:
## Copy created to not mess with inital DF
data19 = data_cleaned['2019'].copy()
## Have to change to datetime format for pandas to split by month
data19["ReportDateTime"] = pd.to_datetime(data19["ReportDateTime"]) 

## Grouping by the type of violation and getting the count of each violation in 2019
jan19 = pd.DataFrame(df.groupby(data19["Type"], as_index = False).size()).rename(columns = {'Type':'Violation', 'size':'Count'}).sort_values(by = ['Count'], ascending = False)
jan19[:7]

Unnamed: 0,Violation,Count
8,DWI/DUI,353
21,Injured/Sick Person,313
6,CDS Violation,218
33,Theft,216
42,Warrant/Summons Service,66
12,Dept Property Damage/Loss,62
39,Vandalism,58


In [6]:
## Copy created to not mess with inital DF
data20 = data_cleaned["2020"].copy()
## Have to change to datetime format for pandas to split by month
data20["ReportDateTime"] = pd.to_datetime(data20["ReportDateTime"]) 

## Grouping by the type of violation and getting the count of each violation in 2020
jan20 = pd.DataFrame(df.groupby(data20["Type"], as_index = False).size()).rename(columns = {'Type':'Violation', 'size':'Count'}).sort_values(by = ['Count'], ascending = False)
jan20[:7]

Unnamed: 0,Violation,Count
26,Injured/Sick Person,395
14,DWI/DUI,272
37,Theft,238
12,CDS Violation,164
28,Other Incident,79
38,Theft from Auto,76
42,Vandalism,67


Initially comparing the data in 2019 and 2020, we see that DWI/DUI, Injured/Sick Person, CDS Violation, Theft, and Vandalism are the highest amounts of violations that overlap in 2019 and 2020. It's seen that Injured/Sick Persons, Theft, and Vandalism has increased from 2019 to 2020 while the number of DWI/DUI, CDS Violations has decreased from 2019 to 2020.

We'll continue with comparing these 5 values on a month to month basis. For example, we'll be comparing DWI/DUI, Injured/Sick Person, CDS Violation, Theft, and Vandalism from Janurary of 2019 to Janurary of 2020. Note that for the first 2 months, Janurary and Feburary, we don't expect to see a rise since the effects of the Pandemic haven't hit the United States at this time. We'll be treating it more as a base case to see if there's any inital differences between 2019 and 2020. After, starting with the comparison between March 2019 and March 2020 is when we'll test for our hypothesis since March 2020 is around the time the pandemic started to affect the United States. 

Note that during this time, we do know that the number of students, staff, and faculty decreased on campus after students were sent home after March 12, 2020 and stay-at-home (virtual) learning began. This does decrease the number of people commuting to and residing within College Park (at least students living in dorms have decreased).