# Muckrock Requests Data Analysis #

Included below is the code used to clean and analyze the Muckrock request data, along with a preview of the analysis results. Full results are exported to .csv files and moved to the "analysis results" folder to keep the notebook uncluttered.

The .to_csv lines have been commented out because a new .csv file doesn't need to be generated every time. If changes have been made, uncomment the relevant line and rerun the notebook to produce a new .csv file.

In [25]:
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import timedelta

#prevents warnings from being printed
import warnings
warnings.filterwarnings('ignore')

In [26]:
requests = pd.read_csv('requests.csv')
requests.head()

|  | User | Title | Status | URL | Jurisdiction | Jurisdiction ID | Jurisdiction Level | Jurisdiction State | Agency | Agency ID | ... | Tracking Number | Embargo | Days since submitted | Days since updated | Projects | Tags | Price | Date Submitted | Date Due | Date Done |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PNWPals | 02.10.18 IAP and OPAAR | No Responsive Documents | https://www.muckrock.com/foi/seattle-69/021018... | Seattle | 69 | Local | Washington | Seattle Police Department | 227 | ... | P024303-022618 | False | 1303.0 | 1240 |  |  | 0.0 | 2018-02-23 05:00:00+00:00 | 2018-03-02 | 2018-04-27 00:00:00+00:00 |
| 1 | Woods | 0215 Memphis - Kristen Smith | No Responsive Documents | https://www.muckrock.com/foi/memphis-319/0215-... | Memphis | 319 | Local | Tennessee | University Of Memphis - Memphis | 15225 | ... |  | False | 214.0 | 169 |  |  | 0.0 | 2021-02-16 16:41:44.333874+00:00 | 2021-02-25 | 2021-04-02 17:57:40.391587+00:00 |
| 2 | null_name | 02/29/16 - SLCPD Abdi Mohamed Protest Action P... | No Responsive Documents | https://www.muckrock.com/foi/salt-lake-city-35... | Salt Lake City | 359 | Local | Utah | Salt Lake City Police Department | 4223 | ... |  | False | 2010.0 | 1986 | A Protest Project |  | 0.0 | 2016-03-18 04:00:00+00:00 | 2016-04-01 | 2016-04-11 00:00:00+00:00 |
| 3 | MichelleMalkin | #04-5812 public records request | Completed | https://www.muckrock.com/foi/midwest-city-2705... | Midwest City | 27056 | Local | Oklahoma | Midwest City Police Department | 12077 | ... |  | False | 957.0 | 927 |  |  | 0.0 | 2019-02-04 18:58:22.499172+00:00 |  | 2019-03-06 16:04:14.465141+00:00 |
| 4 | EmmaBest | 100-18762 Harry Hay | Awaiting Response | https://www.muckrock.com/foi/united-states-of-... | United States of America | 10 | Federal | United States of America | Federal Bureau of Investigation | 10 | ... |  | False | 1115.0 | 934 | Freedom of LGBTQIA+ Information |  | 0.0 | 2018-08-30 12:05:11.174398+00:00 | 2018-09-28 |  |


### How many requests do law enforcement agencies receive vs. non-law enforcement agencies? ###

Law enforcement requests were separated by checking whether the agency name contains "police", "sheriff", or "public safety" (case-insensitive). While this doesn't guarantee that every law enforcement agency is included, it should capture most of them.
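To see how the keyword rule behaves, the sketch below applies the same case-insensitive pattern to a few made-up agency names (hypothetical, not taken from requests.csv). It illustrates the limitation noted above: an agency such as a state patrol has none of the keywords and would land in the non-law-enforcement group. The `na=False` argument is an assumption added here so that a missing agency name counts as non-law-enforcement instead of raising an error during filtering.

```python
import pandas as pd

# Hypothetical agency names, for illustration only
agencies = pd.Series([
    "Seattle Police Department",
    "King County Sheriff's Office",
    "Department of Public Safety",
    "Washington State Patrol",
    "Department of Education",
    None,  # a missing agency name
])

# Same keywords as the notebook; case=False makes the match case-insensitive,
# na=False sends missing names to the non-law-enforcement group
is_police = agencies.str.contains("police|sheriff|public safety", case=False, na=False)
print(is_police.tolist())
# [True, True, True, False, False, False]
```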

In [28]:
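# split into police and non-police requests (case-insensitive match on the agency name;
# missing agency names count as non-police). .copy() avoids SettingWithCopyWarning when
# new columns are added to these subsets later on.
is_police = requests['Agency'].str.contains("police|sheriff|public safety", case = False, na = False)
police_reqs = requests[is_police].copy()
non_police_reqs = requests[~is_police].copy()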

In [29]:
# check that the two subsets together contain every request
len(police_reqs) + len(non_police_reqs) == len(requests)

True

In [48]:
print("Number of requests received by law enforcement agencies: ", len(police_reqs),
      "\nNumber of requests received by non-law enforcement agencies: ", len(non_police_reqs))

Number of requests received by law enforcement agencies:  21385 
Number of requests received by non-law enforcement agencies:  51117
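For scale, the two printed counts can be turned into a share of all requests. A quick check using only the totals above shows that law enforcement agencies received roughly 29.5% of the requests:

```python
police, non_police = 21385, 51117
total = police + non_police
share_police = police / total
print(total, round(share_police, 3))
# 72502 0.295
```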



### How many requests did each law enforcement agency receive? ###

Included below is a small preview of the analysis results, sorted by number of requests received in descending order. The full table is exported to a .csv file.
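The counting step boils down to `value_counts`, which already sorts from most to least frequent. The sketch below runs it on a handful of hypothetical rows and names both output columns explicitly with `rename_axis` and `reset_index(name=...)`, so it does not rely on the default name of the count column (pandas 2.0 changed it to `"count"`).

```python
import pandas as pd

# Hypothetical agency column, for illustration only
agency = pd.Series(["Chicago Police Department", "Boston Police Department",
                    "Chicago Police Department", "Seattle Police Department",
                    "Chicago Police Department", "Boston Police Department"],
                   name="Agency")

# value_counts sorts by count, largest first; rename_axis/reset_index(name=...)
# set the column names regardless of the pandas version
by_agency = (agency.value_counts()
             .rename_axis("Agency")
             .reset_index(name="Requests Received"))
print(by_agency)
```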

In [49]:
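# count requests per agency (sorted descending) and name both columns explicitly,
# which works whether value_counts names its output 0 (pandas < 2.0) or 'count' (pandas >= 2.0)
by_agency = police_reqs['Agency'].value_counts().rename_axis('Agency').reset_index(name = 'Requests Received')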
by_agency.head()

#by_agency.to_csv('numRequests_byAgency.csv', index = False)

|  | Agency | Requests Received |
|---|---|---|
| 0 | New York City Police Department | 556 |
| 1 | Chicago Police Department | 508 |
| 2 | Boston Police Department | 369 |
| 3 | Seattle Police Department | 326 |
| 4 | Massachusetts State Police | 241 |


### How many requests were received by law enforcement agencies in each jurisdiction level? ###

In [51]:
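# count requests per jurisdiction level; naming the columns explicitly avoids depending on
# the default column names produced by value_counts/reset_index, which differ across pandas versions
by_jurlevel = police_reqs["Jurisdiction Level"].value_counts().rename_axis('Jurisdiction Level').reset_index(name = 'Requests Received')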
by_jurlevel

#by_jurlevel.to_csv('numRequests_byJurisdiction.csv', index = False)

|  | Jurisdiction Level | Requests Received |
|---|---|---|
| 0 | Local | 19063 |
| 1 | State | 2279 |
| 2 | Federal | 43 |
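The counts above can also be read as shares of all law enforcement requests; they sum to the 21,385 law enforcement requests reported earlier. A small check using only the printed numbers (on the full data, `police_reqs['Jurisdiction Level'].value_counts(normalize=True)` should give the same shares directly):

```python
import pandas as pd

# Counts copied from the table above
counts = pd.Series({"Local": 19063, "State": 2279, "Federal": 43}, name="Requests Received")

# Divide by the total to get each level's share, rounded to 3 decimals
shares = (counts / counts.sum()).round(3)
print(shares)
```

Roughly 89% of law enforcement requests went to local agencies.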


### How quickly on average do law enforcement agencies respond to requests versus non-law enforcement agencies? ###

Some entries in the dataset never had their "Date Done" filled in, and others contain dates that were entered incorrectly. The errors include:
- The request was completed on the same day it was submitted, but instead of recording the next day as the "Date Done", the submission date itself was entered, producing a small negative time difference.
- The date was otherwise entered incorrectly.
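To see where the small negative differences come from, here is a hypothetical request submitted at 05:00 UTC and marked done at midnight on the same date (values made up, mirroring the timestamp style in requests.csv). Python normalises the five-hour negative gap to "-1 day plus 19 hours" (pandas displays it as `-1 days +19:00:00`), which is why `.days` reads -1 for these entries.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timestamps in the same style as the dataset
submitted = datetime(2018, 2, 23, 5, 0, tzinfo=timezone.utc)
done = datetime(2018, 2, 23, 0, 0, tzinfo=timezone.utc)  # same date, midnight

diff = done - submitted
print(diff)        # -1 day, 19:00:00
print(diff.days)   # -1

# Shifting by one day yields a non-negative duration
print(diff + timedelta(days=1))  # 19:00:00
```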

As a simple fix, the ```replaceNegatives``` function shifts these small negative values (a "Days Till Completion" from -1 day up to 0) forward by one day so that they become non-negative.
When calculating the mean, entries without a "Date Done" are excluded, as are entries with a difference of less than -1 day, which are treated as incorrectly entered.
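The whole cleaning rule (drop missing durations, drop anything below -1 day, shift the remaining -1-day values forward by one day) can also be written without `apply`. Below is a minimal vectorized sketch, assuming a timedelta Series like "Days Till Completion"; the four toy values are hypothetical.

```python
import pandas as pd

durations = pd.Series([
    pd.Timedelta(days=30, hours=4),  # normal completion
    pd.Timedelta(hours=-5),          # same-date entry, shown as -1 days +19:00:00
    pd.Timedelta(days=-40),          # date entered incorrectly
    pd.NaT,                          # never completed
])

# Drop requests with no Date Done and differences below -1 day
kept = durations[durations.notna() & (durations >= pd.Timedelta(days=-1))]

# Where .dt.days is -1, replace the value with the value shifted forward one day
cleaned = kept.where(kept.dt.days != -1, kept + pd.Timedelta(days=1))
print(cleaned)
```

`where` keeps the values for which the condition holds and takes the replacement elsewhere, so only the -1-day bucket changes.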

In [53]:
def replaceNegatives(x):
    """
        x: input, a timedelta object
        output: x shifted forward by one day if x.days == -1, otherwise x unchanged
        
        The completion date for some entries was entered as the submission date instead of the next date, which
        results in a negative difference between the Date Done and the Date Submitted. Adding one day changes the
        time difference from "-1 days and XX hours" to "0 days and XX hours".
    """

    # x.days == -1 covers every difference from exactly -1 day up to (but not including) 0
    if x.days == -1:
        return x + timedelta(days = 1)
    return x

In [59]:
#convert dates to a datetime object for easy manipulation
police_reqs['Date Submitted'] = pd.to_datetime(police_reqs['Date Submitted'])
police_reqs['Date Due'] = pd.to_datetime(police_reqs['Date Due'])
police_reqs['Date Done'] = pd.to_datetime(police_reqs['Date Done'])

non_police_reqs['Date Submitted'] = pd.to_datetime(non_police_reqs['Date Submitted'])
non_police_reqs['Date Due'] = pd.to_datetime(non_police_reqs['Date Due'])
non_police_reqs['Date Done'] = pd.to_datetime(non_police_reqs['Date Done'])

In [55]:
#find time it took for agency to complete request
police_reqs['Days Till Completion'] = police_reqs['Date Done'] - police_reqs['Date Submitted']
non_police_reqs['Days Till Completion'] = non_police_reqs['Date Done'] - non_police_reqs['Date Submitted']

In [60]:
# remove null values: entries where "Date Done" was never entered
police_completed = police_reqs[~pd.isnull(police_reqs['Days Till Completion'])]
non_police_completed = non_police_reqs[~pd.isnull(non_police_reqs['Days Till Completion'])]

# exclude entries where the time difference is less than -1 day (treated as incorrectly entered dates)
police_completed = police_completed[police_completed['Days Till Completion'] >= timedelta(days = -1)]
non_police_completed = non_police_completed[non_police_completed['Days Till Completion'] >= timedelta(days = -1)]

#apply function to adjust for -1 day differences
police_completed['Days Till Completion'] = police_completed['Days Till Completion'].apply(replaceNegatives)
non_police_completed['Days Till Completion'] = non_police_completed['Days Till Completion'].apply(replaceNegatives)

In [57]:
#calculate mean
non_police_time = np.mean(non_police_completed['Days Till Completion'])
police_time = np.mean(police_completed['Days Till Completion'])

#calculate proportion of removed entries
prop_nonresponse_police = 1 - len(police_completed)/len(police_reqs)
prop_nonresponse_nonpolice = 1 - len(non_police_completed)/len(non_police_reqs)

In [58]:
print("Law enforcement response time to requests: ", police_time.round('1min') ,"\nNon-Law Enforcement response time: ", 
      non_police_time.round('1min'))
print("\nMeans calculated with nonresponse and incorrect entries omitted. \nProportion of omitted entries for police records: ", 
      round(prop_nonresponse_police, 4), "\nProportion of omitted non-police records: ", round(prop_nonresponse_nonpolice, 4))

Law enforcement response time to requests:  96 days 15:02:00 
Non-Law Enforcement response time:  151 days 16:29:00

Means calculated with nonresponse and incorrect entries omitted. 
Proportion of omitted entries for police records:  0.3301 
Proportion of omitted non-police records:  0.3326
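A mean of durations can be pulled up by a handful of very long requests. As a hedged complement (not part of the analysis above), the sketch compares mean and median on hypothetical durations; on the real data, `police_completed['Days Till Completion'].median()` would be the counterpart.

```python
import pandas as pd

# Hypothetical completion times in days: most requests close quickly, one drags on
durations = pd.Series(pd.to_timedelta([10, 12, 15, 20, 400], unit="D"))

print("mean:  ", durations.mean())    # 91 days 09:36:00
print("median:", durations.median())  # 15 days 00:00:00
```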


### How quickly on average does each individual law enforcement agency respond to requests? ###

In [43]:
avg_time = police_completed.groupby('Agency').agg({'Days Till Completion': pd.Series.mean}).reset_index().rename(
    columns = {'Days Till Completion': 'Avg Days Till Completion'})
#avg_time.to_csv('Avg Time of Response by Agency.csv', index = False)

#first 5 records, check csv for full list
avg_time.head()

|  | Agency | Avg Days Till Completion |
|---|---|---|
| 0 | Abbevile Police Department | 31 days 20:00:00 |
| 1 | Abbeville Police Department | 97 days 20:37:43.811949 |
| 2 | Aberdeen Police Department | 30 days 19:03:00.422427250 |
| 3 | Abilene Texas Police Department | 252 days 20:49:24.204696500 |
| 4 | Abington Police Department | 19 days 07:08:21.447689111 |
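Timedelta averages such as `252 days 20:49:24.204696500` are hard to sort or chart once exported to .csv. One option (a sketch, not what the notebook currently does) is to add a numeric column in fractional days via `.dt.total_seconds()`; the agency names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical per-agency averages in the same shape as avg_time
avg_time = pd.DataFrame({
    "Agency": ["Agency A", "Agency B"],
    "Avg Days Till Completion": pd.to_timedelta(["31 days 20:00:00", "19 days 06:00:00"]),
})

# Convert each Timedelta to a float number of days (86,400 seconds per day)
avg_time["Avg Days (numeric)"] = (
    avg_time["Avg Days Till Completion"].dt.total_seconds() / 86400
).round(2)
print(avg_time)
```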


### Proportion of Requests under Each Status ###

A column for the total number of requests has been added. The remaining values give the proportion of each agency's requests under each status, rounded to 4 decimal places. Rows are sorted alphabetically by agency name.

Some agencies received very few requests, so their proportions can be misleading when the table is sorted by any single status: an agency with a single completed request shows a completion rate of 1.0. Including only agencies above a minimum number of requests would give a better idea of which agencies complete a large proportion of their requests, but the choice of cutoff would be arbitrary and needs more discussion.
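A sketch of such a cutoff, using a hypothetical threshold `MIN_REQUESTS` (the value 5 is arbitrary, which is exactly the open question above) on a toy table shaped like `police_status`:

```python
import pandas as pd

# Toy status table shaped like police_status (proportions plus a total column)
status = pd.DataFrame(
    {"Completed": [1.0, 0.4, 0.2], "Rejected": [0.0, 0.2, 0.1],
     "Total Number of Requests": [1, 5, 10]},
    index=pd.Index(["Agency A", "Agency B", "Agency C"], name="Agency"),
)

MIN_REQUESTS = 5  # hypothetical cutoff; choosing it is a judgment call

# Keep agencies with at least MIN_REQUESTS requests, then rank by completion rate
reliable = status[status["Total Number of Requests"] >= MIN_REQUESTS]
ranked = reliable.sort_values("Completed", ascending=False)
print(ranked)
```

Agency A, with one request and a 1.0 completion rate, drops out instead of topping the ranking.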

In [72]:
police_status = police_reqs.pivot_table(columns = 'Status', index = 'Agency', aggfunc = 'size', fill_value = 0)
police_status['Total Number of Requests'] = police_status.sum(axis=1)
for col in police_status.columns:
    if col != "Total Number of Requests":
        police_status[col] = police_status[col]/police_status['Total Number of Requests']
police_status = police_status.round(4)
#police_status.to_csv('Law Enforcement - Proportion of Requests under Each Status.csv')  # keep the index: it holds the agency names

police_status.head()

| Agency | Awaiting Acknowledgement | Awaiting Appeal | Awaiting Response | Completed | Fix Required | In Litigation | No Responsive Documents | Partially Completed | Payment Required | Processing | Rejected | Withdrawn | Total Number of Requests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abbevile Police Department | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| Abbeville Police Department | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| Aberdeen Police Department | 0.2 | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5 |
| Abilene Texas Police Department | 0.0 | 0.0 | 0.0 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4 |
| Abington Police Department | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.6 | 0.0 | 0.1 | 0.0 | 0.1 | 0.0 | 10 |
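The same kind of table can be built in one step with `pd.crosstab`, whose `normalize="index"` option divides each row by its total. A sketch on hypothetical request-level rows (the agency names are placeholders):

```python
import pandas as pd

# Hypothetical request-level data, one row per request
reqs = pd.DataFrame({
    "Agency": ["Agency A", "Agency A", "Agency A", "Agency A", "Agency B"],
    "Status": ["Completed", "Completed", "Rejected", "Withdrawn", "Completed"],
})

# Row proportions: each agency's status shares sum to 1
status = pd.crosstab(reqs["Agency"], reqs["Status"], normalize="index").round(4)

# Request counts per agency, aligned on the agency index and added as the last column
status["Total Number of Requests"] = reqs["Agency"].value_counts()
print(status)
```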


In [73]:
non_police_status = non_police_reqs.pivot_table(columns = 'Status', index = 'Agency', aggfunc = 'size', fill_value = 0)
non_police_status['Total Number of Requests'] = non_police_status.sum(axis=1)
for col in non_police_status.columns:
    if col != "Total Number of Requests":
        non_police_status[col] = non_police_status[col]/non_police_status['Total Number of Requests']
non_police_status = non_police_status.round(4)
#non_police_status.to_csv('Non-Law Enforcement - Proportion of Requests under Each Status.csv')  # keep the index: it holds the agency names

non_police_status.head()

| Agency | Awaiting Acknowledgement | Awaiting Appeal | Awaiting Response | Completed | Fix Required | In Litigation | No Responsive Documents | Partially Completed | Payment Required | Processing | Rejected | Withdrawn | Total Number of Requests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 101st Airborne Division, U.S. Army | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3 |
| 10th Judicial District Drug and Violent Crime Task Force | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2 |
| 162nd Wing, Arizona Air National Guard | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 193d Special Operations Wing | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 2 |
| 1st Marine Division | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
