In [5]:
import requests
import pandas as pd
import dill
import matplotlib.pyplot as plt

from requests.exceptions import RequestException

api = 'sqVBzLYS6LjUvcsegM3scIPEFIJbGhhWQaIOgN3o'

In [6]:
# Loads data from FBI API

crime_df_agency = pd.read_pickle('crime_df_agency.pkl')
USAandCO_df = pd.read_pickle('USAandCO_df.pkl')
# with open('mvtdata.pkd', 'rb') as f:
#     mvtdict = dill.load(f)
    
# with open('burgdata.pkd', 'rb') as f:
#     burgdict = dill.load(f)

#crime_df_agency.to_pickle('crime_df_agency.pkl')

In [12]:
# Columns included in crime_df_agency, assembled in this notebook
crime_df_agency.columns

Index(['agency', 'year', 'mvt stolen value total', 'mvt offense count',
       'mvt cleared', 'county_name', 'agency_type_name',
       'burglary value total', 'burglary offense count', 'robbery value total',
       'robbery offense count', 'larceny value total', 'larceny offense count',
       'homicide cleared', 'homicide actual', 'aggravated-assault cleared',
       'aggravated-assault actual', 'arson cleared', 'arson actual',
       'Curfew and Loitering Law Violations', 'Disorderly Conduct',
       'Driving Under the Influence', 'Drug Abuse Violations - Grand Total',
       'Drunkenness', 'Embezzlement', 'Forgery and Counterfeiting', 'Fraud',
       'Gambling - Total', 'Human Trafficking - Commercial Sex Acts',
       'Human Trafficking - Involuntary Servitude', 'Larceny - Theft',
       'Liquor Laws', 'Manslaughter by Negligence',
       'Offenses Against the Family and Children',
       'Prostitution and Commercialized Vice', 'Rape', 'Simple Assault',
       'Stolen Property: Bu

# Methodology / Notes

This notebook ingests and processes data from the FBI Crime Data API. <br>
Documentation: https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/docApi <br> <br>
The end data processing for this project is performed on data grouped by county. However, the dataframe <b>'crime_df_agency.pkl'</b>, in which the data is still grouped by agency, was saved to disk to maintain information related to how crime was reported within Colorado.  The data was processed to enable easy conversion to county grouping when needed. No further data processing or feature engineering is performed in this notebook. <br> 

The section 'Linked Offenses' was added to double check if offenses that were revealed to be correlated are frequently part of the same crime.

Information was obtained from the 'Agency', 'Arrest', 'Expanded Property', and 'Summarized' server variables. Each batch of API requests takes about 5-10 minutes with the current grouping. The requests for 'Expanded Property' are needlessly separated by offense-- if necessary to run again, I would recommend altering the code to run more similarly to the 'Arrests' section. <br>

<b>General outline:</b>
1) Information on reporting agencies within Colorado were downloaded using the url format: <br>
'https://api.usa.gov/crime/fbi/cde/agency/byStateAbbr/CO?' <br>
2) Sixteen agencies have their jurisdiction listed in multiple counties. Most of these are small towns, but this also included cities Aurora and Westminster. Both of these cities were in the top 10 worst cities for automobile theft within the entire USA for 2022, so it seemed important to account for them in a more detailed fashion than dividing all values for a city evenly between its counties.  Accordingly, counts were distributed among the different counties a city occupies based on the relative populations reported by the 2020 US Census (taken from Wikipedia). When this info was unavailable, as it was for a few smaller cities, a qualitative comparison of city / county borders was used. The distributions were encoded into a dictionary for later use.
3) Motor vehicle theft (mvt) information was downloaded from both the 'Expanded Property' and 'Summarized' databases. The former contains information on the value that was stolen for each year, while the latter contains information on how many cases were cleared in a specific year.
4) Burglary, robbery, and larceny data was only obtained from the 'Expanded Property' database, therefore, clearance information is currently not present in the crime dataframe, though values stolen are.
5) Homicide, aggravated assault, and arson offenses / amounts cleared were obtained from the 'Summarized' database.
6) Information on arrests for various crimes was obtained from the 'Arrests' database. Additional information can be obtained by querying specific offenses, but this was only done for 'drug sales', as I think drug use patterns are potentially more relevant than e.g. gambling offenses. Trying to obtain more detailed info on 'drug possession' returned a broken link, though it's listed as a valid query.
7) A new dataframe was created from a merge of the various API queries and the agency info. New rows were created for multi-county agencies, with the crime data distributed according to the previously decided appropriate values.  The old, multi-county rows were dropped.

<b>Notes</b>
- Counts for homicide and motor vehicle thefts were checked at various stages with external sources, and matched well. Along with errors in code, there was also the possibility that my ignoring agencies without a specified county (e.g. State Police) would lead to a miscount.  This did not appear to be the case.
- Reporting consistency seems good. There were 4 agencies with no county: Southern Ute Tribal, Colorado Bureau of Investigation,  State Patrol, and Ute Mountain Tribal, leaving 243 agencies included in the dataframe. For 22 years of data, 5,346 rows should be expected. Prior to distributing by county, the dataframe has 4,898 rows, indicating the dataset is 92% complete regarding MVT offense counts. Qualitatively, while I was analyzing the data, it seemed any lack of reporting was from small agencies where they possibly had no offenses to report.
- It does seem as though most NaN or null values are just cases where the value is 0, and the agency didn't bother reporting it. However, just as a check: there are 28,031 cells with null values out of 225,308 total, or about 12%. 

# FBI Crime Data API

Base URL: api.usa.gov/crime/fbi/cde/ <br>
API Key:  sqVBzLYS6LjUvcsegM3scIPEFIJbGhhWQaIOgN3o <br>
Denver ORI: CODPD0000

### Linked Offenses

In [7]:
mvt_linked = requests.get('https://api.usa.gov/crime/fbi/cde/nibrs/state/CO/motor-vehicle-theft/offense/linkedoffense?',
                       params = {'API_KEY': api})
linked_offenses = pd.DataFrame(mvt_linked.json()['data'])

In [11]:
CO_df = USAandCO_df[USAandCO_df['state_abbr'] == 'CO']
CO_df.columns

Index(['state_abbr', 'year', 'population', 'violent_crime', 'homicide',
       'robbery', 'aggravated_assault', 'property_crime', 'burglary',
       'larceny', 'motor_vehicle_theft', 'arson'],
      dtype='object')

In [9]:
CO_df2 = crime_df_agency.groupby('year').sum()
CO_df2.columns

  CO_df2 = crime_df_agency.groupby('year').sum()


Index(['mvt stolen value total', 'mvt offense count', 'mvt cleared',
       'burglary value total', 'burglary offense count', 'robbery value total',
       'robbery offense count', 'larceny value total', 'larceny offense count',
       'homicide cleared', 'homicide actual', 'aggravated-assault cleared',
       'aggravated-assault actual', 'arson cleared', 'arson actual',
       'Curfew and Loitering Law Violations', 'Disorderly Conduct',
       'Driving Under the Influence', 'Drug Abuse Violations - Grand Total',
       'Drunkenness', 'Embezzlement', 'Forgery and Counterfeiting', 'Fraud',
       'Gambling - Total', 'Human Trafficking - Commercial Sex Acts',
       'Human Trafficking - Involuntary Servitude', 'Larceny - Theft',
       'Liquor Laws', 'Manslaughter by Negligence',
       'Offenses Against the Family and Children',
       'Prostitution and Commercialized Vice', 'Rape', 'Simple Assault',
       'Stolen Property: Buying, Receiving, Possessing', 'Suspicion',
       'Vagrancy'

In [11]:
linked_offenses.columns

Index(['data_year', 'Identity Theft', 'Fondling', 'Bribery',
       'Stolen Property Offenses', 'Pocket-Picking',
       'Murder and Nonnegligent Manslaughter', 'Welfare Fraud',
       'Extortion/Blackmail', 'Theft from Coin-operated Machine or Device',
       'Statutory Rape', 'Intimidation', 'Gambling Equipment Violation',
       'Shoplifting', 'Robbery', 'Incest', 'Animal Cruelty',
       'Destruction/Damage/Vandalism of Property',
       'Operating/Promoting/Assisting Gambling', 'Hacking/Computer Invasion',
       'Sports Tampering', 'Kidnapping/Abduction', 'Motor Vehicle Theft',
       'Impersonation', 'Weapon Law Violations', 'Negligent Manslaughter',
       'Wire Fraud', 'Theft From Building',
       'Theft of Motor Vehicle Parts or Accessories',
       'False Pretenses/Swindle/Confidence game', 'Drug Equipment Violations',
       'Sexual Assault with an Object', 'Counterfeiting/Forgery',
       'Purse-snatching', 'Prostitution',
       'Credit Card/Automated Teller Machine Frau

In [15]:
linked_offenses[linked_offenses['data_year'] == 2018]

Unnamed: 0,data_year,Identity Theft,Fondling,Bribery,Stolen Property Offenses,Pocket-Picking,Murder and Nonnegligent Manslaughter,Welfare Fraud,Extortion/Blackmail,Theft from Coin-operated Machine or Device,...,Theft from Motor Vehicle,Drug/Narcotic Violations,Aggravated Assault,Simple Assault,Assisting or Promoting Prostitution,Burglary/Breaking & Entering,Arson,All Other Larceny,Sodomy,Embezzlement
24,2018,126,2,2,59,1,2,0,5,14,...,884,330,110,86,1,541,13,495,0,3


In [5]:
# Calculate percent of offenses related to mvt

offense1 = 'robbery' # offense name in USAandCO_df
offense2 = 'Robbery' # name of same offense in linked_offenses
CO_df = USAandCO_df.loc[USAandCO_df['state_abbr'] == 'CO'].copy()
CO_df[offense2 + ' linked'] = linked_offenses[linked_offenses['data_year'] >= 2000][offense2]
CO_df['percent of mvt linked to ' + offense2] = linked_offenses[offense2] / CO_df['motor_vehicle_theft'] *100
CO_df['percent of ' + offense2 +' linked to mvt'] = linked_offenses[offense2] / CO_df[offense1] *100
CO_df[['year', 'percent of mvt linked to ' + offense2, 'percent of ' + offense2 +' linked to mvt']]

Unnamed: 0,year,percent of mvt linked to Robbery,percent of Robbery linked to mvt
0,2000,0.0,0.0
1,2001,0.004763,0.028129
2,2002,0.0,0.0
3,2003,0.008766,0.053505
4,2004,0.012498,0.080235
5,2005,0.007663,0.050659
6,2006,0.024206,0.130753
7,2007,0.036023,0.17336
8,2008,0.022474,0.088132
9,2009,0.048077,0.177989


In [47]:
CO_df[['percent of mvt linked to ' + offense2, 'percent of ' + offense2 +' linked to mvt']].mean()

percent of mvt linked to Robbery    0.069324
percent of Robbery linked to mvt    0.296049
dtype: float64

In [None]:
offense1 = 'larceny' # offense name in USAandCO_df
offense2 = 'Robbery' # name of same offense in linked_offenses
CO_df = USAandCO_df.loc[USAandCO_df['state_abbr'] == 'CO'].copy()
CO_df[offense2 + ' linked'] = linked_offenses[linked_offenses['data_year'] >= 2000][offense2]
CO_df['percent of mvt linked to ' + offense2] = linked_offenses[offense2] / CO_df['motor_vehicle_theft'] *100
CO_df['percent of ' + offense2 +' linked to mvt'] = linked_offenses[offense2] / CO_df[offense1] *100
CO_df[['year', 'percent of mvt linked to ' + offense2, 'percent of ' + offense2 +' linked to mvt']]

### National and state-wide info

In [5]:
national = requests.get('https://api.usa.gov/crime/fbi/cde/estimate/national?from=2000&to=2025',
                       params = {'API_KEY': api})

state = requests.get('https://api.usa.gov/crime/fbi/cde/estimate/state/CO?from=2000&to=2025',
                       params = {'API_KEY': api})

In [6]:
national_df = pd.DataFrame(national.json()).drop(['state_id', 'rape_legacy', 'rape_revised'], axis = 1)
national_df['state_abbr'] = 'USA'

state_df = pd.DataFrame(state.json()).drop(['state_id', 'rape_legacy', 'rape_revised'], axis = 1)

In [8]:
USAandCO_df.columns

Index(['state_abbr', 'year', 'population', 'violent_crime', 'homicide',
       'robbery', 'aggravated_assault', 'property_crime', 'burglary',
       'larceny', 'motor_vehicle_theft', 'arson'],
      dtype='object')

In [11]:
USAandCO_df = pd.concat([national_df, state_df])
cols2num = ['year', 'population', 'violent_crime', 'homicide',
       'robbery', 'aggravated_assault', 'property_crime', 'burglary',
       'larceny', 'motor_vehicle_theft', 'arson']
USAandCO_df[cols2num] = USAandCO_df[cols2num].apply(pd.to_numeric)

#USAandCO_df.to_pickle('USAandCO_df.pkl')

In [13]:
USAandCO_df[USAandCO_df['year'] > 2018]

Unnamed: 0,state_abbr,year,population,violent_crime,homicide,robbery,aggravated_assault,property_crime,burglary,larceny,motor_vehicle_theft,arson
19,USA,2019,328329953,1250393,16669,268483,822017,6995235,1118096,5152267,724872,35919
20,USA,2020,329484123,1313105,21570,243600,921505,6452038,1035314,4606324,810400,43602
19,CO,2019,5758486,22149,229,3716,14117,150564,20266,108575,21723,899
20,CO,2020,5807719,24570,294,3964,16660,164582,23246,110884,30452,1330


### Obtaining agency information

In [14]:
# sample agency url: 'https://api.usa.gov/crime/fbi/cde/agency/byStateAbbr/CO?API_KEY=sqVBzLYS6LjUvcsegM3scIPEFIJbGhhWQaIOgN3o'
agencies = requests.get('https://api.usa.gov/crime/fbi/cde/agency/byStateAbbr/CO?',
                       params = {'API_KEY': api})

agency_df = pd.DataFrame(agencies.json())
agency_df['county_name'] = agency_df['county_name'].str.split(', ')

In [366]:
agency_df

Unnamed: 0,ori,agency_name,state_name,state_abbr,division_name,region_name,region_desc,county_name,agency_type_name,nibrs,nibrs_start_date,latitude,longitude
0,CO0010000,Adams County Sheriff's Office,Colorado,CO,Mountain,West,Region IV,[ADAMS],County,True,1997-01-01T00:00:00.000Z,39.874325,-104.331872
1,CO0010100,Aurora Police Department,Colorado,CO,Mountain,West,Region IV,"[ADAMS, ARAPAHOE, DOUGLAS]",City,True,1985-01-01T00:00:00.000Z,39.70955,-104.81493
2,CO0010200,Brighton Police Department,Colorado,CO,Mountain,West,Region IV,"[ADAMS, WELD]",City,True,2004-09-01T00:00:00.000Z,39.874325,-104.331872
3,CO0010300,Commerce City Police Department,Colorado,CO,Mountain,West,Region IV,[ADAMS],City,True,1997-01-01T00:00:00.000Z,39.874325,-104.331872
4,CO0010400,Thornton Police Department,Colorado,CO,Mountain,West,Region IV,[ADAMS],City,True,2012-01-01T00:00:00.000Z,39.868996,-104.984215
...,...,...,...,...,...,...,...,...,...,...,...,...,...
242,CO0640100,Broomfield Police Department,Colorado,CO,Mountain,West,Region IV,[BROOMFIELD],City,True,2001-01-01T00:00:00.000Z,39.921543,-105.07059
243,COCBI0000,Colorado Bureau of Investigation,Colorado,CO,Mountain,West,Region IV,[NOT SPECIFIED],Other State Agency,True,2008-01-01T00:00:00.000Z,39.727325,-105.10969
244,COCSP0000,State Patrol,Colorado,CO,Mountain,West,Region IV,[NOT SPECIFIED],State Police,True,2006-01-01T00:00:00.000Z,39.72749,-105.10969
245,CODI01100,Ute Mountain Tribal,Colorado,CO,Mountain,West,Region IV,[NOT SPECIFIED],Tribal,True,2021-01-01T00:00:00.000Z,37.338025,-108.595786


In [428]:
multicounty = agency_df[['agency_name','county_name']][agency_df['county_name'].apply(lambda x: len(x) > 1)].values.tolist()
multicounty = [x[0] for x in multicounty]

In [429]:
multicounty

['Aurora Police Department',
 'Brighton Police Department',
 'Westminster Police Department',
 'Northglenn Police Department',
 'Littleton Police Department',
 'Bow Mar Police Department',
 'Longmont Police Department',
 'Basalt Police Department',
 'Green Mountain Falls Police Department',
 'Arvada Police Department',
 'Timnath Police Department',
 'Center Police Department',
 'Windsor Police Department',
 'Erie Police Department',
 'Johnstown Police Department',
 'Lochbuie Police Department']

### Deciding how to distribute values for agencies with multiple counties

Populations taken from Wikipedia citing the 2020 census. <br>
Helpful link: https://www.google.com/maps/d/u/0/viewer?mid=15C5-O3Pt28PZ8axPFxRTP-ANhon-w6N7&hl=en_US&ll=40.18824668844791%2C-104.5735925866669&z=9 <br>

Departments in more than one county: <br><br>
1 	Aurora Police Department 	[ADAMS, ARAPAHOE, DOUGLAS] <br>
336,035 residing in Arapahoe County, 47,720 residing in Adams County, and 2,506 residing in Douglas County. 386,261 total. <br>
87.0% in Arapahoe, 12.4% in Adams, 0.6% in Douglas

2 	Brighton Police Department 	[ADAMS, WELD] <br>
39,718 residing in Adams County and 365 residing in Weld County

3 	Westminster Police Department 	[ADAMS, JEFFERSON] <br>
71,240 residing in Adams County and 45,077 residing in Jefferson County. 116,317 total
61.2% in Adams, 38.8% in Jefferson


4 	Northglenn Police Department 	[ADAMS, WELD] <br>
As of the 2020 census the city's population was 38,131.  Pretty much all in Adams County.

5 	Littleton Police Department 	[ARAPAHOE, DOUGLAS, JEFFERSON] <br>
Mostly in Arapahoe County

6 	Bow Mar Police Department 	[ARAPAHOE, JEFFERSON] <br>
The town population was 853 at the 2020 United States Census with 587 residing in Arapahoe County and 266 residing in Jefferson County. 68.8% in Arapahoe, 31.2% in Jefferson


7 	Longmont Police Department 	[BOULDER, WELD] <br>
Looks like most of the city is in Boulder county based on map

8 	Basalt Police Department 	[EAGLE, PITKIN] <br>
The town population was 3,984 at the 2020 United States Census with 2,917 residing in Eagle County and 1,067 residing in Pitkin County. 73.2% in Eagle, 26.8% in Pitkin

9 	Green Mountain Falls Police Department 	[EL PASO, TELLER] <br>
The town population was 646 at the 2020 United States Census with 622 residents in El Paso County and 24 residents in Teller County.

10 	Arvada Police Department 	[ADAMS, JEFFERSON] <br>
The city population was 124,402 at the 2020 United States Census, with 121,510 residing in Jefferson County and 2,892 residing in Adams County.

11 	Timnath Police Department 	[LARIMER, WELD] <br>
Pretty much all in Larimer.  Wikipedia doesn't even list Weld County

12 	Center Police Department 	[RIO GRANDE, SAGUACHE] <br>
The town's population was 1,929 at the 2020 United States Census with 1,885 residing in Saguache County and 44 residing in Rio Grande County.

13 	Windsor Police Department 	[LARIMER, WELD] <br>
Mostly in Weld County

14 	Erie Police Department 	[BOULDER, WELD] <br>
At the 2020 census, 17,387 (58%) Erie residents lived in Weld County and 12,651 (42%) lived in Boulder County.

15 	Johnstown Police Department 	[LARIMER, WELD] <br>
Mostly in Weld County

16 	Lochbuie Police Department 	[ADAMS, WELD] <br>
Mostly in Weld County


In [29]:
county_dict = {
    'Aurora Police Department': {'ADAMS': 0.124, 'ARAPAHOE': 0.87, 'DOUGLAS': 0.006},
    'Brighton Police Department': {'ADAMS': 1},
    'Westminster Police Department': {'ADAMS': 0.612, 'JEFFERSON': 0.388},
    'Northglenn Police Department': {'ADAMS': 1},
    'Littleton Police Department': {'ARAPAHOE': 1},
    'Bow Mar Police Department': {'ARAPAHOE': 0.688, 'JEFFERSON': 0.312},
    'Longmont Police Department': {'BOULDER': 1},
    'Basalt Police Department': {'EAGLE': 0.732, 'PITKIN':0.268},
    'Green Mountain Falls Police Department': {'EL PASO': 1},
    'Arvada Police Department': {'JEFFERSON': 1},
    'Timnath Police Department': {'LARIMER': 1},
    'Center Police Department': {'SAGUACHE': 1},
    'Windsor Police Department': {'WELD': 1},
    'Erie Police Department': {'WELD': 0.58, 'BOULDER': 0.42},
    'Johnstown Police Department': {'WELD': 1},
    'Lochbuie Police Department': {'WELD': 1}
}

In [90]:
# State Patrol (CO) ORI is COCSP0000
# However, it doesn't seem to process many crimes... random years I checked showed ~10s of MVTs vs. Denver PD's 1000s

# Colorado Bureau of Investigation (COCBI0000) also showed low amounts of MVTs (only 1 in 20 years)
agency_df[agency_df['agency_type_name'] == 'State Police']

Unnamed: 0,ori,agency_name,state_name,state_abbr,division_name,region_name,region_desc,county_name,agency_type_name,nibrs,nibrs_start_date,latitude,longitude
244,COCSP0000,State Patrol,Colorado,CO,Mountain,West,Region IV,NOT SPECIFIED,State Police,True,2006-01-01T00:00:00.000Z,39.72749,-105.10969


### Accessing MVT offense totals by department

Example requests url for MVT by agency, obtained from first format listed under "Expanded Property Crime":
https://api.usa.gov/crime/fbi/cde/supplemental/agency/CO0010000/motor-vehicle-theft/offense/byYearRange?from=2015&to=2015&API_KEY=iiHnOKfno2Mgkt5AynpvPpUQTEyxE77jo1RU8PIv <br><br>

Offenses under "Expanded Property Crime": 'burglary', 'robbery', 'motor-vehicle-theft', 'larceny'

In [154]:
ORI_dict = dict(zip(agency_df['agency_name'],agency_df['ori']))

In [180]:
ORI_dict['Denver Police Department']

'CODPD0000'

In [155]:
# Offense types: 'motor-vehicle-theft', 
def agency_offenses(ori, offense, startyear = 2000, endyear = 2023):
    agency_mvt = requests.get('https://api.usa.gov/crime/fbi/cde/supplemental/agency/' + ori +'/' + offense +'/offense/byYearRange?',
                       params = {'API_KEY': api,
                                'from': startyear,
                                'to': endyear},
                       timeout = 10)
    return agency_mvt.json()['data']

#### MVT offenses

In [31]:
# Collect mvt data.  
# Error occurs for 'Division of Gaming Criminal Enforcement and Investigations Section'.  I don't think it matters
mvtdict = {}
ORI_dict_error = {}
for key in ORI_dict:
    try:
        data = agency_offenses(ORI_dict[key], 'motor-vehicle-theft')
        mvtdict[key] = data
        
    except RequestException as e:
        print(f"Error occurred while making the API request: {str(e)}")
        ORI_dict_error[key] = ORI_dict[key]

Error occurred while making the API request: Expecting value: line 1 column 1 (char 0)


In [8]:
# Process mvt data
rows = []
for key in list(mvtdict.keys()):
    for item in mvtdict[key]:
        mvtrows = {'agency' : key,
                  'year' : item['data_year'],
                  'mvt stolen value total' : item['Stolen Value Total'],
                  'mvt offense count' : item['Offense Count']}
        rows.append(mvtrows)
        
mvt_df = pd.DataFrame(rows)
mvt_df['mvt offense count'] = pd.to_numeric(mvt_df['mvt offense count'], errors = 'coerce')
mvt_df['mvt stolen value total'] = pd.to_numeric(mvt_df['mvt stolen value total'], errors = 'coerce')

In [216]:
# MVT with clearance rate
mvtclearance = []
ORI_dict_error = {}
for key in ORI_dict:
    ori = ORI_dict[key]
    url = 'https://api.usa.gov/crime/fbi/cde/summarized/agency/' + ori + '/motor-vehicle-theft?from=2000&to=2023&API_KEY=sqVBzLYS6LjUvcsegM3scIPEFIJbGhhWQaIOgN3o'
    try:
        data = requests.get(url).json()
        for entry in data:
            mvtclearance.append({'agency': key,
                                 'year': entry['data_year'],
                                 'mvt cleared': entry['cleared']})
        
    except RequestException as e:
        print(f"Error occurred while making the API request: {str(e)}")
        ORI_dict_error[key] = ORI_dict[key]
        
# with open('clearancedata.pkd', 'wb') as f:
#     dill.dump(mvtclearance, f)

In [225]:
clearance_df = pd.DataFrame(mvtclearance)

In [231]:
mvt_df = mvt_df.merge(clearance_df, on = ['agency', 'year'], how = 'left')

#### Burglary offenses

In [157]:
# Collect burglary data. Error also occured for 'Nunn' department

burgdict = {}
ORI_dict_error = {}
for key in ORI_dict:
    try:
        data = agency_offenses(ORI_dict[key], 'burglary')
        burgdict[key] = data
        
    except RequestException as e:
        print(f"Error occurred while making the API request: {str(e)}")
        ORI_dict_error[key] = ORI_dict[key]

Error occurred while making the API request: Expecting value: line 1 column 1 (char 0)
Error occurred while making the API request: Expecting value: line 1 column 1 (char 0)


In [167]:
# with open('burgdata.pkd', 'wb') as f:
#     dill.dump(burgdict, f)

In [170]:
rows = []
for key in list(burgdict.keys()):
    for item in burgdict[key]:
        burgrows = {'agency' : key,
                  'year' : item['data_year'],
                  'burglary value total' : item['Stolen Value Total'],
                  'burglary offense count' : item['Offense Count']}
        rows.append(burgrows)
        
burg_df = pd.DataFrame(rows)
burg_df['burglary value total'] = pd.to_numeric(burg_df['burglary value total'], errors = 'coerce')
burg_df['burglary offense count'] = pd.to_numeric(burg_df['burglary offense count'], errors = 'coerce')

#### Robbery Offenses

In [172]:
# Returned error for 'Nunn' again. Looks like it's a very, very small town
robdict = {}
ORI_dict_error = {}
for key in ORI_dict:
    try:
        data = agency_offenses(ORI_dict[key], 'robbery')
        robdict[key] = data
        
    except RequestException as e:
        print(f"Error occurred while making the API request: {str(e)}")
        ORI_dict_error[key] = ORI_dict[key]
        
# with open('robdata.pkd', 'wb') as f:
#     dill.dump(robdict, f)

Error occurred while making the API request: Expecting value: line 1 column 1 (char 0)
Error occurred while making the API request: Expecting value: line 1 column 1 (char 0)


In [176]:
rows = []
for key in list(robdict.keys()):
    for item in robdict[key]:
        robrows = {'agency' : key,
                  'year' : item['data_year'],
                  'robbery value total' : item['Stolen Value Total'],
                  'robbery offense count' : item['Offense Count']}
        rows.append(robrows)
        
rob_df = pd.DataFrame(rows)
rob_df['robbery value total'] = pd.to_numeric(rob_df['robbery value total'], errors = 'coerce')
rob_df['robbery offense count'] = pd.to_numeric(rob_df['robbery offense count'], errors = 'coerce')

#### Larceny Offenses

In [177]:
# Returned error for 'Nunn' again. Looks like it's a very, very small town
larcdict = {}
ORI_dict_error = {}
for key in ORI_dict:
    try:
        data = agency_offenses(ORI_dict[key], 'larceny')
        larcdict[key] = data
        
    except RequestException as e:
        print(f"Error occurred while making the API request: {str(e)}")
        ORI_dict_error[key] = ORI_dict[key]
        
# with open('larcdata.pkd', 'wb') as f:
#     dill.dump(larcdict, f)

Error occurred while making the API request: Expecting value: line 1 column 1 (char 0)
Error occurred while making the API request: Expecting value: line 1 column 1 (char 0)


In [178]:
rows = []
for key in list(larcdict.keys()):
    for item in larcdict[key]:
        larcrows = {'agency' : key,
                  'year' : item['data_year'],
                  'larceny value total' : item['Stolen Value Total'],
                  'larceny offense count' : item['Offense Count']}
        rows.append(larcrows)
        
larc_df = pd.DataFrame(rows)
larc_df['larceny value total'] = pd.to_numeric(larc_df['larceny value total'], errors = 'coerce')
larc_df['larceny offense count'] = pd.to_numeric(larc_df['larceny offense count'], errors = 'coerce')

#### Offenses in 'Summarized' section
Includes homicide, aggravated assault, and arson <br>
Info on total counts and amount cleared

In [246]:
summarizedinfo = []
offenses = ['homicide', 'aggravated-assault', 'arson']
ORI_dict_error = {}
for key in ORI_dict:
    ori = ORI_dict[key]
    for offense in offenses:
        url = 'https://api.usa.gov/crime/fbi/cde/summarized/agency/' + ori + '/' + offense + '?from=2000&to=2023&API_KEY=sqVBzLYS6LjUvcsegM3scIPEFIJbGhhWQaIOgN3o'
        try:
            data = requests.get(url).json()
            for entry in data:
                summarizedinfo.append({'agency': key,
                                     'year': entry['data_year'],
                                     offense + ' cleared': entry['cleared'],
                                     offense + ' actual': entry['actual']})

        except RequestException as e:
            print(f"Error occurred while making the API request: {str(e)}")
            ORI_dict_error[key] = ORI_dict[key]
            
# with open('summarizeddata.pkd', 'wb') as f:
#      dill.dump(summarizedinfo, f)

In [255]:
summarized_df = pd.DataFrame(summarizedinfo)
summarized_df = summarized_df.groupby(['agency', 'year'], as_index=False).sum()

  summarized_df.groupby('year').sum()


Unnamed: 0_level_0,homicide cleared,homicide actual,aggravated-assault cleared,aggravated-assault actual,arson cleared,arson actual
year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2000,69.0,126.0,5436.0,8863.0,141.0,740.0
2001,100.0,144.0,5223.0,8864.0,242.0,1545.0
2002,92.0,170.0,5411.0,9251.0,304.0,1443.0
2003,101.0,170.0,5607.0,9115.0,289.0,1471.0
2004,121.0,197.0,6584.0,10839.0,226.0,1158.0
...,...,...,...,...,...,...
2017,160.0,220.0,7521.0,12622.0,336.0,1050.0
2018,169.0,216.0,8748.0,14390.0,275.0,992.0
2019,159.0,227.0,8550.0,14010.0,264.0,898.0
2020,195.0,293.0,9267.0,16450.0,361.0,1327.0


#### Arrests

In [263]:
arrestskeys = [
    "data_year",
    "Curfew and Loitering Law Violations",
    "Disorderly Conduct",
    "Driving Under the Influence",
    "Drug Abuse Violations - Grand Total",
    "Drunkenness",
    "Embezzlement",
    "Forgery and Counterfeiting",
    "Fraud",
    "Gambling - Total",
    "Human Trafficking - Commercial Sex Acts",
    "Human Trafficking - Involuntary Servitude",
    "Larceny - Theft",
    "Liquor Laws",
    "Manslaughter by Negligence",
    "Offenses Against the Family and Children",
    "Prostitution and Commercialized Vice",
    "Rape",
    "Simple Assault",
    "Stolen Property: Buying, Receiving, Possessing",
    "Suspicion",
    "Vagrancy",
    "Vandalism",
    "Weapons: Carrying, Possessing, Etc.",
    "Sex Offenses (Except Rape, and Prostitution and Commercialized Vice)"
    "Marijuana",
    "Opium or Cocaine or Their Derivatives",
    "Other - Dangerous Nonnarcotic Drugs",
    "Synthetic Narcotics",
    "Drug Sale/Manufacturing - Subtotal"
]

arrestinfo = []
offenses = ['all', 'drug_sales']
ORI_dict_error = {}
for key in ORI_dict:
    ori = ORI_dict[key]
    for offense in offenses:
        url = 'https://api.usa.gov/crime/fbi/cde/arrest/agency/' + ori + '/' + offense + '?from=2000&to=2025&API_KEY=sqVBzLYS6LjUvcsegM3scIPEFIJbGhhWQaIOgN3o'
        try:
            data = requests.get(url).json()['data']
            for entry in data:
                newentry = {'agency': key, 'year': entry['data_year']}
                for arresttype in arrestskeys:
                    newentry[arresttype] = entry.get(arresttype)
                arrestinfo.append(newentry)

        except RequestException as e:
            print(f"Error occurred while making the API request: {str(e)}")
            ORI_dict_error[key] = ORI_dict[key]
            
# with open('arrestdata.pkd', 'wb') as f:
#      dill.dump(arrestinfo, f)

In [276]:
arrest_df = pd.DataFrame(arrestinfo).groupby(['agency', 'year'], as_index=False).sum().drop('data_year', axis = 1)

  arrest_df = pd.DataFrame(arrestinfo).groupby(['agency', 'year'], as_index=False).sum().drop('data_year', axis = 1)


### Joining crime data to agency info, clean dataframe

In [399]:
# merge mvt df with agency df for county information
big_df = mvt_df.merge(agency_df, left_on = 'agency', right_on = 'agency_name')
big_df = big_df.drop(columns = ['state_name', 'state_abbr', 'division_name', 'region_name', 'region_desc', 'ori',
                               'nibrs', 'nibrs_start_date', 'latitude', 'longitude', 'agency_name'])

In [400]:
# Merge results of different API requests
big_df = big_df.merge(burg_df, how = 'left')
big_df = big_df.merge(rob_df, how = 'left')
big_df = big_df.merge(larc_df, how = 'left')
big_df = big_df.merge(summarized_df, how = 'left')
big_df = big_df.merge(arrest_df, how = 'left')

In [401]:
## Distribute values for agencies with multiple counties
multicounty_df = big_df[big_df['county_name'].apply(lambda x: len(x) > 1)]
safecolumns = ['year', 'agency', 'county_name', 'agency_type_name']
new_rows = []
for index, row in multicounty_df.iterrows():
    agency = row['agency']
    county_percentages = county_dict.get(agency, {})
    for county, percentage in county_percentages.items():
        new_row = {}
        for column in multicounty_df.columns:
            if column not in safecolumns and not np.isnan(row[column]):
                new_row[column] = round(row[column] * percentage)
            else:
                new_row[column] = row[column]
            new_row['county_name'] = [county]
        new_rows.append(new_row)

# Create a new DataFrame from the new rows and merge with original
new_df = pd.DataFrame(new_rows)
big_df = pd.concat([big_df, new_df], ignore_index=True)

In [402]:
# Delete original multi-county rows, pop county names out of list
big_df = big_df[big_df['county_name'].str.len() < 2]
big_df['county_name'] = big_df['county_name'].apply(lambda x: x[0])
big_df = big_df[big_df['county_name'] != 'NOT SPECIFIED']

In [433]:
crime_df_agency = big_df.reset_index(drop = True)

In [435]:
#crime_df_agency.to_pickle('crime_df_agency.pkl')

In [405]:
## Display top counties for auto theft
pd.set_option('display.max_rows', 10)
crime_df_agency.groupby(['county_name'], as_index = False).sum() \
    .nlargest(10, 'mvt offense count')

  crime_df_agency.groupby(['county_name'], as_index = False).sum() \


Unnamed: 0,county_name,year,mvt stolen value total,mvt offense count,mvt cleared,burglary value total,burglary offense count,robbery value total,robbery offense count,larceny value total,...,Simple Assault,"Stolen Property: Buying, Receiving, Possessing",Suspicion,Vagrancy,Vandalism,"Weapons: Carrying, Possessing, Etc.",Opium or Cocaine or Their Derivatives,Other - Dangerous Nonnarcotic Drugs,Synthetic Narcotics,Drug Sale/Manufacturing - Subtotal
16,DENVER,88462,913413243,122601,14098.0,295312094.0,115347.0,36562085.0,25577.0,370442027.0,...,75273.0,2090.0,0.0,4275.0,19468.0,12164.0,7192.0,1947.0,142.0,10966.0
2,ARAPAHOE,512638,403571896,60753,4613.0,198830319.0,66053.0,21669511.0,13370.0,278315655.0,...,47502.0,1963.0,24.0,1029.0,14605.0,6911.0,2073.0,852.0,63.0,4403.0
0,ADAMS,390037,405224948,58361,5909.0,116761875.0,51354.0,11409325.0,6508.0,335521418.0,...,33553.0,3452.0,24.0,402.0,12868.0,5181.0,890.0,1267.0,221.0,3431.0
20,EL PASO,407991,372589847,46723,7723.0,216716805.0,84380.0,15314783.0,10964.0,299013763.0,...,44461.0,1543.0,5.0,42.0,18489.0,6921.0,1839.0,2036.0,105.0,5239.0
30,JEFFERSON,548755,317540970,43786,4690.0,107911718.0,50589.0,10201959.0,6274.0,207580533.0,...,33012.0,3353.0,16.0,217.0,14722.0,3561.0,341.0,720.0,410.0,2377.0
51,PUEBLO,122598,105319450,15651,1645.0,64918706.0,35721.0,4770579.0,4016.0,77779750.0,...,14451.0,179.0,0.0,92.0,3318.0,1186.0,550.0,240.0,43.0,958.0
62,WELD,876777,96267502,10010,2253.0,46107217.0,20015.0,2098369.0,1516.0,69737983.0,...,18901.0,804.0,3.0,54.0,5663.0,1636.0,397.0,740.0,171.0,1732.0
6,BOULDER,347818,84924091,9739,1947.0,54743718.0,23786.0,2374199.0,1668.0,122738289.0,...,16925.0,501.0,3.0,56.0,6042.0,1254.0,657.0,516.0,60.0,1790.0
35,LARIMER,245341,66904077,8731,1918.0,36855454.0,22742.0,2766497.0,1362.0,101014043.0,...,18391.0,1019.0,0.0,3340.0,8233.0,2446.0,346.0,1395.0,157.0,2486.0
39,MESA,243177,46758038,6438,1355.0,27644232.0,15554.0,905658.0,926.0,61193065.0,...,11911.0,314.0,7.0,34.0,4257.0,1625.0,402.0,955.0,251.0,2121.0


In [407]:
crime_df = crime_df_agency.groupby(['county_name', 'year'], as_index = False).sum(numeric_only = True)

In [9]:
crime_df

NameError: name 'crime_df' is not defined