## Aggregate site of crime APIs


  - https://rapidapi.com/collection/crime

## White Collar Crime Definition

Two sources I used to pin down the definition of white-collar crime and what constitutes it:
  - https://ucr.fbi.gov/nibrs/nibrs_wcc.pdf
  - https://www.fbi.gov/investigate/white-collar-crime
  
## Project Presentation

Link to our project presentation
  - https://docs.google.com/presentation/d/1vRDEG75763FbuR4u8t4vXNta9lzOx1x-flDJ1Tv1bbM/edit?usp=sharing
  
## FBI Crime Data Explorer GitHub pages

GitHub page for the FBI Crime Data frontend
  - https://github.com/fbi-cde/crime-data-frontend/blob/master/README.md
  
GitHub page for the FBI Crime Data API
  - https://github.com/fbi-cde/crime-data-api

#### FBI Crime Data API support notes
  - https://github.com/fbi-cde/crime-data-api/blob/master/api_support_notes.md
    
## Articles
  - https://techjury.net/blog/white-collar-crime-statistics/#gref
  
  - https://www.ojp.gov/pdffiles1/bjs/grants/248667.pdf
  
PowerPoint on using Python, Postgres, and R to analyze NIBRS data
  - http://washstat.org/presentations/20190923/Thomas_Ian.pdf
  
## Definition

ORI = Originating Agency Identifier


## Census information

The Census API
  - https://www.census.gov/data/developers/guidance/api-user-guide.Example_API_Queries.html
  - https://www.census.gov/content/dam/Census/data/developers/api-user-guide/api-guide.pdf
  - https://api.census.gov/data/2019/acs/acs1/examples.html
  
The Census package
  - https://pypi.org/project/census/
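The `census` package wraps plain HTTP queries against the Census API. As a rough sketch of what such a query looks like, the helper below assembles a URL for the 2019 ACS 5-year dataset. The function name `build_census_url` and its parameters are hypothetical, introduced only for illustration; the `get` / `for` / `in` / `key` query parameters follow the examples in the guides linked above.

```python
from urllib.parse import quote, urlencode

# Endpoint for the 2019 ACS 5-year dataset, following the Census API examples.
ACS5_2019_URL = "https://api.census.gov/data/2019/acs/acs5"


def build_census_url(variables, for_clause, in_clause=None, key=None):
    """Hypothetical helper: assemble a Census API query URL."""
    params = {"get": ",".join(variables), "for": for_clause}
    if in_clause:
        params["in"] = in_clause
    if key:
        params["key"] = key
    # Keep ':', '*' and ',' readable; spaces are encoded as %20.
    query = urlencode(params, safe=":*,", quote_via=quote)
    return f"{ACS5_2019_URL}?{query}"


url = build_census_url(
    ["NAME", "B01003_001E"],
    for_clause="county subdivision:*",
    in_clause="state:06 county:059",
)
print(url)
```

Pasting the printed URL into a browser (with a key appended if required) is a quick way to check variable codes before wiring them into the notebook.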

In [1]:
# Installing the census library
#pip install census

In [1]:
# Dependencies
import numpy as np
import json
import pandas as pd
import matplotlib.pyplot as plt
import requests
import time

from config import crimekey
from config import censuskey
from pprint import pprint
from census import Census

In [2]:
# API Setup
base_url = "https://api.usa.gov/crime/fbi/sapi/"
summary_state_url = "api/summarized/state/"
agency_by_state_url = "api/agencies/byStateAbbr/"
c = Census(censuskey, year=2019)

In [3]:
# Theft-related offense types to query (the full offense list is kept below for reference)
#offense = ["aggravated-assault", "burglary", "larceny", "motor-vehicle-theft", "homicide", "rape", "robbery", "arson", "violent-crime", "property-crime"]
offense_theft = ["burglary", "larceny", "motor-vehicle-theft", "robbery", "property-crime"]

# Empty list to store fbi crime url by theft offenses
fbi_crime_url_list = []

# For-loop to get fbi crime url by theft offenses for page = 0 only
for crime in offense_theft:
    fbi_crime_url = (f"{base_url}{summary_state_url}CA/{crime}/2019/2019?page=0&API_KEY={crimekey}")
    fbi_crime_url_list.append(fbi_crime_url)
print("Finished FBI Crime url")

Finished FBI Crime url
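The URL pattern used in the loop above can also be wrapped in a small function, which makes it easier to vary the state, year, or page later. This is only a sketch: `summary_url` is a hypothetical helper name, and `"YOUR_KEY"` is a placeholder rather than a working API key.

```python
BASE_URL = "https://api.usa.gov/crime/fbi/sapi/"
SUMMARY_STATE_PATH = "api/summarized/state/"


def summary_url(state, offense, year, page, api_key):
    """Hypothetical helper: summarized-offense URL for one state, year and page."""
    return (f"{BASE_URL}{SUMMARY_STATE_PATH}{state}/{offense}/"
            f"{year}/{year}?page={page}&API_KEY={api_key}")


offenses = ["burglary", "larceny", "motor-vehicle-theft", "robbery", "property-crime"]

# "YOUR_KEY" is a placeholder; substitute the key stored in config.py.
urls = {offense: summary_url("CA", offense, 2019, 0, "YOUR_KEY") for offense in offenses}
print(urls["burglary"])
```

Keying the URLs by offense name (instead of a positional list) avoids having to remember that index 0 means burglary.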


In [4]:
# Getting the url for ORI per state (mainly CA)
agency_list_url = f"{base_url}{agency_by_state_url}CA?API_KEY={crimekey}"
agency_list = requests.get(agency_list_url).json()

# Empty list to store ORI agency for CA
ca_agency_list = []

# For-loop to keep only the CA agencies located in ORANGE County
for agency in agency_list["results"]:
    if agency["county_name"] == "ORANGE":
        #pprint(agency)
        ca_agency_list.append(agency)
print("Finished CA Agency List appending")

Finished CA Agency List appending


In [5]:
ca_agency_list

[{'ori': 'CA0300000',
  'agency_name': "Orange County Sheriff's Office",
  'agency_type_name': 'County',
  'state_name': 'California',
  'state_abbr': 'CA',
  'division_name': 'Pacific',
  'region_name': 'West',
  'region_desc': 'Region IV',
  'county_name': 'ORANGE',
  'nibrs': False,
  'latitude': 33.675687,
  'longitude': -117.777207,
  'nibrs_start_date': None},
 {'ori': 'CA0300100',
  'agency_name': 'Anaheim Police Department',
  'agency_type_name': 'City',
  'state_name': 'California',
  'state_abbr': 'CA',
  'division_name': 'Pacific',
  'region_name': 'West',
  'region_desc': 'Region IV',
  'county_name': 'ORANGE',
  'nibrs': False,
  'latitude': 33.675687,
  'longitude': -117.777207,
  'nibrs_start_date': None},
 {'ori': 'CA0300200',
  'agency_name': 'Brea Police Department',
  'agency_type_name': 'City',
  'state_name': 'California',
  'state_abbr': 'CA',
  'division_name': 'Pacific',
  'region_name': 'West',
  'region_desc': 'Region IV',
  'county_name': 'ORANGE',
  'nibrs':
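Since the crime endpoints identify agencies only by ORI, it can help to turn the agency list into an ORI-to-name lookup. The sketch below uses two records abridged from the output above plus one made-up record from another county to exercise the filter; `ori_lookup` is a hypothetical helper name.

```python
# Two records abridged from the agency output (only the fields used here).
agencies = [
    {"ori": "CA0300000", "agency_name": "Orange County Sheriff's Office", "county_name": "ORANGE"},
    {"ori": "CA0300100", "agency_name": "Anaheim Police Department", "county_name": "ORANGE"},
    # Hypothetical record from another county, added only to test the filter.
    {"ori": "XX0000000", "agency_name": "Example Police Department", "county_name": "EXAMPLE"},
]


def ori_lookup(records, county):
    """Map ORI -> agency name for agencies in one county (case-insensitive)."""
    wanted = county.upper()
    return {r["ori"]: r["agency_name"] for r in records
            if (r.get("county_name") or "").upper() == wanted}


orange = ori_lookup(agencies, "Orange")
print(orange)
```

The `or ""` guard is there on the assumption that some agencies may have a missing county name; whether the API actually returns such records is not confirmed here.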

In [7]:
start_time = time.time()    # start timer

# Test request: pull the first page of burglary results to inspect the response shape
getdatasets = requests.get(fbi_crime_url_list[0])
datasets_json = getdatasets.json()
#pprint(datasets_json)

end_time = time.time()    # end timer
time_diff = end_time - start_time    # time difference

# Prints runtime
print(f"start time: {start_time}; end time: {end_time}; time diff: {time_diff}")

datasets_json

start time: 1643626935.7508721; end time: 1643626939.9802518; time diff: 4.229379653930664


{'results': [{'ori': 'CA0010000',
   'data_year': 2019,
   'offense': 'burglary',
   'state_abbr': 'CA',
   'cleared': 42,
   'actual': 354,
   'data_range': None},
  {'ori': 'CA0010100',
   'data_year': 2019,
   'offense': 'burglary',
   'state_abbr': 'CA',
   'cleared': 23,
   'actual': 218,
   'data_range': None},
  {'ori': 'CA0010200',
   'data_year': 2019,
   'offense': 'burglary',
   'state_abbr': 'CA',
   'cleared': 6,
   'actual': 105,
   'data_range': None},
  {'ori': 'CA0010300',
   'data_year': 2019,
   'offense': 'burglary',
   'state_abbr': 'CA',
   'cleared': 119,
   'actual': 771,
   'data_range': None},
  {'ori': 'CA0010400',
   'data_year': 2019,
   'offense': 'burglary',
   'state_abbr': 'CA',
   'cleared': 10,
   'actual': 95,
   'data_range': None},
  {'ori': 'CA0010500',
   'data_year': 2019,
   'offense': 'burglary',
   'state_abbr': 'CA',
   'cleared': 48,
   'actual': 547,
   'data_range': None},
  {'ori': 'CA0010600',
   'data_year': 2019,
   'offense': 'burgla
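Each record carries a `cleared` and an `actual` count, so one simple derived measure is the share of offenses cleared. The sketch below assumes `actual` means reported offenses and `cleared` means offenses cleared, which is an interpretation of the field names rather than something confirmed in these notes. The sample values are copied from the first three records in the output; `clearance_rate` is a hypothetical helper.

```python
# First three burglary records from the response, abridged to the fields used.
records = [
    {"ori": "CA0010000", "cleared": 42, "actual": 354},
    {"ori": "CA0010100", "cleared": 23, "actual": 218},
    {"ori": "CA0010200", "cleared": 6, "actual": 105},
]


def clearance_rate(record):
    """Percent of 'actual' offenses marked 'cleared'; None if it cannot be computed."""
    actual = record.get("actual")
    cleared = record.get("cleared")
    if not actual or cleared is None:
        return None  # avoids dividing by zero or by a missing count
    return 100 * cleared / actual


for r in records:
    rate = clearance_rate(r)
    print(r["ori"], None if rate is None else round(rate, 1))
```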

In [8]:
# Pull burglary counts for each state, page by page (currently CA only)
#stateAbbs = ['AL', 'AK', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS',
#            'KY', 'LA', 'MA', 'ME', 'MD', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 
#            'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY']
stateAbbs = ["CA"]
statedict = {}
record = 1

start_time = time.time()    # start timer

# loop through all states and pull data for each
for item in stateAbbs:
    state_url = f"{base_url}{summary_state_url}{item}/burglary/2019/2019"
    try:
        # First request is only used to read the total number of pages
        stateresponse = requests.get(state_url, params={"API_KEY": crimekey})
        pages = stateresponse.json()['pagination']['pages']
    except (requests.RequestException, ValueError, KeyError):
        print(f"Page(s) not found for {item}. Skipping...")
        continue
    actuals = 0
    for j in range(pages):
        try:
            # Get one page of results as JSON
            stateresponse = requests.get(state_url, params={"page": j, "API_KEY": crimekey})
            statejson = stateresponse.json()
            # Print the record for the single agency of interest (ORI CA0302600)
            for result in statejson['results']:
                if result["ori"] == "CA0302600":
                    pprint(result)
                #actuals = actuals + result['actual']
                #statedict[item] = [actuals]
            print(f"Processing record {record} | {item}")
            record += 1
        # Exception if this page can't be fetched or parsed
        except (requests.RequestException, ValueError, KeyError):
            print(f"Data not found for {item}. Skipping...")

print("----------Job complete!----------")

end_time = time.time()    # end timer
time_diff = end_time - start_time    # time difference

# Prints runtime
print(f"start time: {start_time}; end time: {end_time}; time diff: {time_diff}")

Processing record 1 | CA
Processing record 2 | CA
Processing record 3 | CA
Processing record 4 | CA
Processing record 5 | CA
Processing record 6 | CA
Processing record 7 | CA
Processing record 8 | CA
Processing record 9 | CA
Processing record 10 | CA
Processing record 11 | CA
Processing record 12 | CA
Processing record 13 | CA
Processing record 14 | CA
Processing record 15 | CA
Processing record 16 | CA
Processing record 17 | CA
Processing record 18 | CA
{'ori': 'CA0302600', 'data_year': 2019, 'offense': 'burglary', 'state_abbr': 'CA', 'cleared': 86, 'actual': 574, 'data_range': None}
Processing record 19 | CA
Processing record 20 | CA
Processing record 21 | CA
Processing record 22 | CA
Processing record 23 | CA
Processing record 24 | CA
Processing record 25 | CA
Processing record 26 | CA
Processing record 27 | CA
Processing record 28 | CA
Processing record 29 | CA
Processing record 30 | CA
Processing record 31 | CA
Processing record 32 | CA
Processing record 33 | CA
Processing record 
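The page-by-page loop above can be separated from the HTTP call, which makes the paging logic testable without hitting the API. The sketch below assumes the response shape seen in this notebook (`results` plus `pagination.pages`, with pages numbered from 0). `iter_results` and `fake_fetch` are hypothetical names, and the fake pages stand in for `requests.get(...).json()`.

```python
def iter_results(fetch_page):
    """Yield every record across all pages.

    fetch_page(page) must return a dict shaped like the API response:
    {"results": [...], "pagination": {"pages": N}}.
    """
    first = fetch_page(0)
    yield from first.get("results", [])
    pages = first.get("pagination", {}).get("pages", 1)
    # Page 0 was already consumed above, so start from page 1.
    for page in range(1, pages):
        yield from fetch_page(page).get("results", [])


# Offline stand-in for the API; 'actual' values are taken from the outputs above.
FAKE_PAGES = [
    {"results": [{"ori": "CA0010000", "actual": 354}], "pagination": {"pages": 2}},
    {"results": [{"ori": "CA0302600", "actual": 574}], "pagination": {"pages": 2}},
]


def fake_fetch(page):
    return FAKE_PAGES[page]


match = [r for r in iter_results(fake_fetch) if r["ori"] == "CA0302600"]
print(match)
```

Reusing the first response instead of requesting page 0 a second time saves one request per state, which matters given the multi-second response times recorded earlier.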

In [7]:
# Convert the dictionary into a dataframe
statedf = pd.DataFrame.from_dict(statedict, orient='index')
statedf

Unnamed: 0,0
CA,151664
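A state total like the one in the table comes from summing `actual` over every agency record. The sketch below shows that accumulation on three sample records from the burglary output, with an optional ORI filter for restricting the sum to selected agencies. `total_actual` is a hypothetical helper, and treating a missing `actual` as zero is an assumption.

```python
# Sample burglary records (values copied from the response shown earlier).
records = [
    {"ori": "CA0010000", "actual": 354},
    {"ori": "CA0010100", "actual": 218},
    {"ori": "CA0010200", "actual": 105},
]


def total_actual(records, oris=None):
    """Sum 'actual' over records, optionally limited to a set of ORIs."""
    total = 0
    for r in records:
        if oris is not None and r["ori"] not in oris:
            continue
        total += r.get("actual") or 0  # assumption: missing counts are treated as 0
    return total


print(total_actual(records))
print(total_actual(records, oris={"CA0010000", "CA0010200"}))
```

Passing the set of Orange County ORIs as `oris` would give a county-level total that lines up with the Census rows for county 059.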


In [76]:
# Run Census search to retrieve data for every county subdivision in California (state FIPS 06)
# Note the addition of "B23025_005E" for unemployment count
# c.acs5.tables() is really helpful for looking up variable codes such as "B19013_001E"

#census_data = c.acs5.get(("NAME", "B19013_001E", "B01003_001E", "B01002_001E", "B19301_001E", "B17001_002E", "B23025_005E"), {'for': 'state:*'})
#census_data = c.acs5.get(("NAME", "B19013_001E", "B01003_001E", "B01002_001E", "B19301_001E", "B17001_002E", "B23025_005E"), {'for': 'county:*', 'in': 'state:*'})
census_data = c.acs5.get(("NAME", "B19013_001E", "B01003_001E", "B01002_001E", "B19301_001E", "B17001_002E", "B23025_005E"), {'for': 'county subdivision:*', 'in': 'state:06 county:*'})

#census_data = c.acs5.get(("NAME", "B19013_001E", "B01003_001E", "B01002_001E",
#                          "B19301_001E",
#                          "B17001_002E"), {'for': 'zip code tabulation area:*'})

start_time = time.time()    # start timer

# Convert to DataFrame
census_pd = pd.DataFrame(census_data)

# Column Renaming
#census_pd = census_pd.rename(columns={"B01003_001E": "Population",
#                                      "B01002_001E": "Median Age",
#                                      "B19013_001E": "Household Income",
#                                      "B19301_001E": "Per Capita Income",
#                                      "B17001_002E": "Poverty Count",
#                                      "B23025_005E": "Unemployment Count",
#                                      "NAME": "Name", "state": "State"})

census_pd = census_pd.rename(columns={"B01003_001E": "Population",
                                      "B01002_001E": "Median Age",
                                      "B19013_001E": "Household Income",
                                      "B19301_001E": "Per Capita Income",
                                      "B17001_002E": "Poverty Count",
                                      "B23025_005E": "Unemployment Count",
                                      "NAME": "Name",
                                      "state": "State",
                                      "county": "County"})

# Add in Poverty Rate (Poverty Count / Population)
census_pd["Poverty Rate"] = 100 * \
    census_pd["Poverty Count"].astype(
        int) / census_pd["Population"].astype(int)

# Add in Unemployment Rate (Unemployment Count / Population)
census_pd["Unemployment Rate"] = 100 * \
    census_pd["Unemployment Count"].astype(
        int) / census_pd["Population"].astype(int)

# Final DataFrame
census_pd = census_pd[["State", "Name", "County", "Population", "Median Age", "Household Income",
                       "Per Capita Income", "Poverty Count", "Poverty Rate", "Unemployment Rate"]]

end_time = time.time()    # end timer
time_diff = end_time - start_time    # time difference

# Prints runtime
print(f"start time: {start_time}; end time: {end_time}; time diff: {time_diff}")


#census_pd

#census_pd["State"].value_counts()

#ca = census_pd.loc[(census_pd["State"] == "06"), :]
#ca

ca = census_pd.loc[(census_pd["County"] == "059"), :]
ca

#census_pd["County Subdiv"].value_counts()

start time: 1643634601.012873; end time: 1643634601.018647; time diff: 0.005774021148681641


Unnamed: 0,State,Name,County,Population,Median Age,Household Income,Per Capita Income,Poverty Count,Poverty Rate,Unemployment Rate
96,6,"Irvine-Lake Forest CCD, Orange County, California",59,270166.0,37.3,110315.0,50042.0,21756.0,8.052827,2.225299
97,6,"Mission Viejo CCD, Orange County, California",59,237286.0,40.7,125713.0,55702.0,11126.0,4.688856,2.059961
247,6,"Anaheim-Santa Ana-Garden Grove CCD, Orange Cou...",59,1701602.0,35.7,80009.0,31836.0,212933.0,12.513678,2.610893
248,6,"Central Coast CCD, Orange County, California",59,258293.0,36.3,98410.0,59553.0,34228.0,13.251617,2.273387
249,6,"South Coast CCD, Orange County, California",59,320302.0,45.8,100160.0,59282.0,21650.0,6.759246,2.151407
345,6,"Silverado CCD, Orange County, California",59,2650.0,49.9,109744.0,51442.0,112.0,4.226415,2.075472
362,6,"North Coast CCD, Orange County, California",59,377745.0,43.4,85726.0,42630.0,38243.0,10.124025,2.390766
