# Collecting California police calls for service data

This notebook downloads tens of millions of police dispatch records from several California cities. Where possible, current data are downloaded from live sources while past years (typically 2015-2019) have been previously downloaded and concatenated. Dates and times are processed here, too, and then the concatenated file for each department is output in CSV format for analysis. [Questions](mailto:matt.stiles@latimes.com)?

In [6]:
import json
import glob
import io
import os
import pandas as pd
import numpy as np
import altair as alt
import altair_latimes as lat
pd.options.display.max_columns = 50
pd.options.display.max_rows = 50

## Los Angeles

#### URL codes for City of LA open data portal

In [1]:
codes = ['r4ka-x5je', 'nayp-w2tw', 'ryvm-a59m', 'xwgr-xw5q', 'tss8-455b', \
         'mgue-vbsx', 'urhh-yf63', 'i7pm-cnmm', '4tmc-7r6g', 'iy4q-t9vr']

In [3]:
for c in codes:
    !wget 'https://data.lacity.org/api/views/{c}/rows.csv?accessType=DOWNLOAD' \
    -P /Users/mhustiles/data/LAPD/

--2021-10-06 05:47:33--  https://data.lacity.org/api/views/r4ka-x5je/rows.csv?accessType=DOWNLOAD
Resolving data.lacity.org (data.lacity.org)... 52.206.68.26, 52.206.140.205, 52.206.140.199
Connecting to data.lacity.org (data.lacity.org)|52.206.68.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘/Users/mhustiles/data/LAPD/rows.csv?accessType=DOWNLOAD’

rows.csv?accessType     [                <=> ] 136.63M  3.55MB/s    in 40s     

2021-10-06 05:48:14 (3.41 MB/s) - ‘/Users/mhustiles/data/LAPD/rows.csv?accessType=DOWNLOAD’ saved [143268193]

--2021-10-06 05:48:14--  https://data.lacity.org/api/views/nayp-w2tw/rows.csv?accessType=DOWNLOAD
Resolving data.lacity.org (data.lacity.org)... 52.206.140.205, 52.206.140.199, 52.206.68.26
Connecting to data.lacity.org (data.lacity.org)|52.206.140.205|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘/Users/mhustiles/data/LAPD/r

In [4]:
names = ['Incident Number','Reporting District','Area Occurred','Dispatch Date','Dispatch Time','Call Type Code','Call Type Description']
dtypes = { 'Incident Number':str, 'Area Occurred': str}

#### Read the most recent year of calls

In [5]:
la_current = pd.read_csv('https://data.lacity.org/api/views/84iq-i2r6/rows.csv?accessType=DOWNLOAD',\
                         header=0,\
                         names=names,\
                         dtype=dtypes,
                        parse_dates=True)

NameError: name 'pd' is not defined

In [5]:
la_current.columns = la_current.columns.str.strip().str.lower().str.replace(' ','_')\
    .str.replace('(', '').str.replace(')', '').str.replace('/','_')

  


#### Read the files for past years

In [6]:
path = '/Users/mhustiles/data/LAPD/'
files = glob.glob(os.path.join(path, "*.csv"))

In [7]:
file_df = (pd.read_csv(f, encoding = "ISO-8859-1", low_memory=False)\
           .assign(year=os.path.basename(f)) for f in files)

In [8]:
la_past = pd.concat(file_df, ignore_index=True)

#### Concatenate everything

In [9]:
df = pd.concat([la_current, la_past]).drop(['year'], axis=1)

#### Clean up dates

In [10]:
df['date'] = df['dispatch_date'].str.replace(' 12:00:00 AM','', regex=False)

In [11]:
df.head()

Unnamed: 0,incident_number,reporting_district,area_occurred,dispatch_date,dispatch_time,call_type_code,call_type_description,date
0,PD20032400001494,Van Nuys,906.0,03/24/2020 12:00:00 AM,09:34:26,006,CODE 6,03/24/2020
1,PD20032800004517,Central,182.0,03/28/2020 12:00:00 AM,21:19:40,242DS,DOM VIOL SUSP,03/28/2020
2,PD20032500004416,Outside,,03/25/2020 12:00:00 AM,18:18:34,006,CODE 6,03/25/2020
3,PD20032500003685,Outside,,03/25/2020 12:00:00 AM,16:14:42,006,CODE 6,03/25/2020
4,PD20032300004227,Hollywood,668.0,03/23/2020 12:00:00 AM,18:22:28,9212,TRESPASS SUSP,03/23/2020


In [12]:
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
df['time'] = pd.to_datetime(df['dispatch_time'], format='%H:%M:%S')

In [13]:
df['date'] = df['date'].dt.date
df['time'] = df['time'].dt.time

In [14]:
df = df.drop(['dispatch_time', 'dispatch_date'], axis=1).reset_index()

#### Export full data frame

In [15]:
df.to_csv('/Users/mhustiles/data/data/LA/calls/la/los_angeles.csv', index=None)

In [16]:
df.head()

Unnamed: 0,index,incident_number,reporting_district,area_occurred,call_type_code,call_type_description,date,time
0,0,PD20032400001494,Van Nuys,906.0,006,CODE 6,2020-03-24,09:34:26
1,1,PD20032800004517,Central,182.0,242DS,DOM VIOL SUSP,2020-03-28,21:19:40
2,2,PD20032500004416,Outside,,006,CODE 6,2020-03-25,18:18:34
3,3,PD20032500003685,Outside,,006,CODE 6,2020-03-25,16:14:42
4,4,PD20032300004227,Hollywood,668.0,9212,TRESPASS SUSP,2020-03-23,18:22:28


---

## San Diego

In [17]:
# https://data.sandiego.gov/datasets/police-calls-for-service/

#### Get the most recent year of calls

In [18]:
sd_current = pd.read_csv('http://seshat.datasd.org/pd/pd_calls_for_service_2020_datasd.csv', low_memory=False)

#### Get the current and past years and concatenate them

In [19]:
sd_past = pd.read_csv('/Users/mhustiles/data/data/LA/calls/san-diego/san_diego_2015_2019.csv', low_memory=False)

In [20]:
sd_df = pd.concat([sd_current,sd_past])

#### Clean up headers and dates

In [21]:
sd_df.columns = sd_df.columns.str.strip().str.lower().str.replace(' ','_')\
    .str.replace('(', '').str.replace(')', '').str.replace('/','_')

  


In [22]:
sd_df['date_time'] = pd.to_datetime(sd_df['date_time'], errors='coerce', format='%Y-%m-%d %H:%M:%S')

In [23]:
sd_df['date'] = sd_df['date_time'].dt.date
sd_df['time'] = sd_df['date_time'].dt.time

In [24]:
sd_df.head()

Unnamed: 0,incident_num,date_time,day_of_week,address_number_primary,address_dir_primary,address_road_primary,address_sfx_primary,address_dir_intersecting,address_road_intersecting,address_sfx_intersecting,call_type,disposition,beat,priority,date,time
0,E20010000001,2020-01-01 00:00:09,4,400,,06TH,AVE,,,,11-8,A,523,0,2020-01-01,00:00:09
1,E20010000002,2020-01-01 00:00:20,4,5000,,UNIVERSITY,AVE,,,,FD,K,826,2,2020-01-01,00:00:20
2,E20010000003,2020-01-01 00:00:21,4,800,,SAWTELLE,AVE,,,,AU1,W,434,1,2020-01-01,00:00:21
3,E20010000004,2020-01-01 00:00:32,4,5000,,UNIVERSITY,AVE,,,,FD,K,826,2,2020-01-01,00:00:32
4,E20010000005,2020-01-01 00:00:42,4,5200,,CLAIREMONT MESA,BLV,,,,415V,K,111,1,2020-01-01,00:00:42


#### Export full data frame

In [25]:
sd_df.to_csv('/Users/mhustiles/data/data/LA/calls/san-diego/san_diego.csv', index=None)

In [26]:
len(sd_df)

3504430

---

## San Jose

In [27]:
# https://data.sanjoseca.gov/dataset/police-calls-for-service

#### Get the most recent year

In [28]:
sj_current = pd.read_csv('https://data.sanjoseca.gov/dataset/c5929f1b-7dbe-445e-83ed-35cca0d3ca8b/resource/aa926acb-63e0-425b-abea-613d293b5b46/download/policecalls2020.csv',\
                        low_memory=False)

#### Get past years and concatenate them

In [29]:
sj_past = pd.read_csv('/Users/mhustiles/data/data/LA/calls/san-jose/san_jose_2015_2019.csv', low_memory=False)

In [30]:
sj_df = pd.concat([sj_current,sj_past])

#### Clean up headers and dates

In [31]:
sj_df.columns = sj_df.columns.str.strip().str.lower().str.replace(' ','_')\
    .str.replace('(', '').str.replace(')', '').str.replace('/','_')

  


In [32]:
sj_df = sj_df[sj_df['offense_date'] != 'OFFENSE_DATE']

In [33]:
sj_df['date'] = pd.to_datetime(sj_df['offense_date'])
sj_df['time'] = pd.to_datetime(sj_df['offense_time'], errors='coerce', format='%H:%M:%S')

In [34]:
sj_df['date'] = sj_df['date'].dt.date
sj_df['time'] = sj_df['time'].dt.time

In [35]:
sj_df.head()

Unnamed: 0,cdts,eid,start_date,call_number,priority,report_date,offense_date,offense_time,calltype_code,call_type,final_dispo_code,final_dispo,common_place_name,address,city,state,date,time
0,20200101002621PS,7981569,1/1/2020 12:00:00 AM,P200010004,2,1/1/2020 12:00:00 AM,1/1/2020 12:00:00 AM,00:03:08,1057,FIREARMS DISCHARGED,N,No report required; dispatch record only,,[900]-[1000] FURLONG DR,San Jose,CA,2020-01-01,00:03:08
1,20200101021606PS,7981674,1/1/2020 12:00:00 AM,P200010092,4,1/1/2020 12:00:00 AM,1/1/2020 12:00:00 AM,01:03:05,415,DISTURBANCE,CAN,Canceled,,[0]-[100] PALM VALLEY BL,San Jose,CA,2020-01-01,01:03:05
2,20200101021613PS,7981725,1/1/2020 12:00:00 AM,P200010134,5,1/1/2020 12:00:00 AM,1/1/2020 12:00:00 AM,01:50:32,647F,DRUNK IN PUBLIC,N,No report required; dispatch record only,,S 1ST ST & E SAN SALVADOR ST,San Jose,CA,2020-01-01,01:50:32
3,20200101021628PS,7981722,1/1/2020 12:00:00 AM,P200010131,2,1/1/2020 12:00:00 AM,1/1/2020 12:00:00 AM,01:49:04,415A,"DISTURBANCE, FIGHT",N,No report required; dispatch record only,,NERDY AV & CAS DR,San Jose,CA,2020-01-01,01:49:04
4,20200101021642PS,7981640,1/1/2020 12:00:00 AM,P200010064,2,1/1/2020 12:00:00 AM,1/1/2020 12:00:00 AM,00:43:04,911UNK,UNK TYPE 911 CALL,N,No report required; dispatch record only,,[0]-[100] MERIDIAN AV,San Jose,CA,2020-01-01,00:43:04


In [36]:
len(sj_df)

1879739

#### Export full data frame

In [37]:
sj_df.to_csv('/Users/mhustiles/data/data/LA/calls/san-jose/san_jose.csv', index=None)

---

## San Francisco

In [38]:
# https://data.sfgov.org/Public-Safety/Police-Department-Calls-for-Service/hz9m-tj6z

#### Get the most recent year

In [39]:
df_sf = pd.read_csv('/Users/mhustiles/data/data/LA/calls/san-francisco/Police_Department_Calls_for_Service.csv')

In [40]:
df_sf.columns = df_sf.columns.str.strip().str.lower().str.replace(' ','_')\
    .str.replace('(', '').str.replace(')', '').str.replace('/','_')

  


#### Clean up headers and dates

In [41]:
df_sf['date_time'] = pd.to_datetime(df_sf['call_date_time'], errors='coerce', format='%m/%d/%Y %H:%M:%S %p')

In [42]:
df_sf['date'] = df_sf['date_time'].dt.date
df_sf['time'] = df_sf['date_time'].dt.time

In [43]:
df_sf.head()

Unnamed: 0,crime_id,original_crime_type_name,report_date,call_date,offense_date,call_time,call_date_time,disposition,address,city,state,agency_id,address_type,common_location,date_time,date,time
0,193043877,Vandalism,10/31/2019,10/31/2019,10/31/2019,20:46,10/31/2019 08:46:00 PM,UTL,300 Block Of Toland St,San Francisco,CA,1,Premise Address,,2019-10-31 08:46:00,2019-10-31,08:46:00
1,190020633,Trespasser,01/02/2019,01/02/2019,01/02/2019,07:58,01/02/2019 07:58:00 AM,GOA,300 Block Of Industrial St,San Francisco,CA,1,Premise Address,,2019-01-02 07:58:00,2019-01-02,07:58:00
2,190214007,Passing Call,01/21/2019,01/21/2019,01/21/2019,23:07,01/21/2019 11:07:00 PM,HAN,Larkin St/golden Gate Av,San Francisco,CA,1,Intersection,,2019-01-21 11:07:00,2019-01-21,11:07:00
3,190271719,22500e,01/27/2019,01/27/2019,01/27/2019,12:48,01/27/2019 12:48:00 PM,GOA,3400 Block Of Divisadero St,San Francisco,CA,1,Premise Address,,2019-01-27 12:48:00,2019-01-27,12:48:00
4,190390109,Hot,02/08/2019,02/08/2019,02/08/2019,00:55,02/08/2019 12:55:00 AM,NOM,700 Block Of Corbett Av,San Francisco,CA,1,Premise Address,,2019-02-08 12:55:00,2019-02-08,12:55:00


In [44]:
len(df_sf)

3446073

In [45]:
df_sf.to_csv('/Users/mhustiles/data/data/LA/calls/san-francisco/san_francisco.csv', index=None)

---

## Sacramento

In [46]:
# http://data.cityofsacramento.org/datasets/9efe7653009b448f8d177c1da0cc068f_0

#### Get the most recent year

In [47]:
df_current = pd.read_csv('https://opendata.arcgis.com/datasets/9efe7653009b448f8d177c1da0cc068f_0.csv', low_memory=False)

In [48]:
df_current = df_current.rename(columns={'OBJECTID':'FID', 'Occurence_Date':'Occurence_DateTime', 'Received_Date':'Received_DateTime',
       'Dispatch_Date':'Dispatch_DateTime', 'Enroute_Date':'Enroute_DateTime', 'At_Scene_Date':'At_Scene_DateTime',
       'Clear_Date':'Clear_DateTime'})

#### Get the past years and concatenate them all

In [49]:
df_past = pd.read_csv('/Users/mhustiles/data/data/LA/calls/sacramento/sacramento_2015_2019.csv', low_memory=False)

In [50]:
df_sac = pd.concat([df_current,df_past])

#### Clean up headers and dates

In [51]:
df_sac.columns = df_sac.columns.str.strip().str.lower().str.replace(' ','_')\
    .str.replace('(', '').str.replace(')', '').str.replace('/','_')

  


In [52]:
df_sac['occurence_datetime'] = df_sac['occurence_datetime'].str.replace('+00', '', regex=False)

In [53]:
df_sac['occurence_datetime'] = pd.to_datetime(df_sac['occurence_datetime'], errors='coerce', format='%Y/%m/%d %H:%M:%S')

In [54]:
df_sac['date'] = df_sac['occurence_datetime'].dt.date
df_sac['time'] = df_sac['occurence_datetime'].dt.time

In [55]:
df_sac.head()

Unnamed: 0,x,y,fid,record_id,call_type,description,reporting_officer,unit_id,report_created,location,police_district,beat,grid,x_coordinate,y_coordinate,day_of_week,occurence_datetime,received_datetime,dispatch_datetime,enroute_datetime,at_scene_datetime,clear_datetime,date,time
0,-121.505698,38.596674,1,7799679,AU,ALL UNITS BROADCAST,,,N,JIBBOOM ST / RICHARDS BLVD,3,3A,721,6702937,1979502,Wed,2020-01-01 22:40:45,2020/01/01 22:40:45+00,1970/01/01 00:00:00+00,1970/01/01 00:00:00+00,1970/01/01 00:00:00+00,2020/01/01 22:52:52+00,2020-01-01,22:40:45
1,-121.52461,38.631335,2,7799603,952,INCOMPLETE CALL FOR POLICE,529.0,1B16,N,3419 LOGGERHEAD WAY,1,1A,344,6697466,1992096,Wed,2020-01-01 21:24:21,2020/01/01 21:24:21+00,2020/01/01 21:39:35+00,2020/01/01 21:39:41+00,1970/01/01 00:00:00+00,2020/01/01 21:40:04+00,2020-01-01,21:24:21
2,-121.430754,38.616285,3,7799929,996,FOUND PROPERTY,283.0,RT31,N,2700 ACADEMY WAY,2,2B,556,6724311,1986769,Wed,2020-01-02 02:47:07,2020/01/02 02:47:07+00,2020/01/02 02:47:07+00,2020/01/02 02:47:07+00,2020/01/02 02:47:07+00,2020/01/02 02:48:43+00,2020-01-02,02:47:07
3,-121.505159,38.655728,4,7800047,ALMSEC,ALARM-SECURE NO EVID OF CRIME,904.0,1C19,N,1912 DEL PASO RD,1,1A,306,6702974,2001009,Wed,2020-01-02 00:47:12,2020/01/02 00:47:12+00,2020/01/02 03:24:19+00,2020/01/02 03:24:19+00,2020/01/02 05:12:24+00,2020/01/02 05:17:31+00,2020-01-02,00:47:12
4,-121.431055,38.638155,5,7807163,503RPT,STOLEN VEHICLE-REPORT,6231.0,,N,1544 HARRIS AVE,2,2A,505,6724175,1994733,Wed,2020-01-01 10:56:25,2020/01/01 10:56:25+00,1970/01/01 00:00:00+00,1970/01/01 00:00:00+00,1970/01/01 00:00:00+00,2020/01/01 20:29:09+00,2020-01-01,10:56:25


In [56]:
len(df_sac)

2019367

#### Export

In [57]:
df_sac.to_csv('/Users/mhustiles/data/data/LA/calls/sacramento/sacramento.csv', index=None)

---

## Santa Monica

In [58]:
# https://data.smgov.net/Public-Safety/Police-Calls-for-Service/ia9m-wspt
sm_df = pd.read_csv('/Users/mhustiles/data/data/LA/calls/santa_monica/Police_Calls_for_Service.csv', low_memory=False)

In [59]:
sm_df.columns = sm_df.columns.str.strip().str.lower().str.replace(' ','_')\
    .str.replace('(', '').str.replace(')', '').str.replace('/','_')

  


#### Clean up headers and dates

In [60]:
sm_df['received_time'] = pd.to_datetime(sm_df['received_time'], format='%m/%d/%Y %H:%M:%S %p')

In [61]:
sm_df['date'] = sm_df['received_time'].dt.date
sm_df['time'] = sm_df['received_time'].dt.time

In [62]:
sm_df.head()

Unnamed: 0,incident_number,call_type,incident_date,location,beat,reporting_district,received_time,cleared_time,disposition,latitude,longitude,map_point,census_block_2000_geoid,census_tract_2000_geoid,census_block_2010_geoid,census_tract_2010_geoid,date,time
0,0000000+0,Disturbance at a Business,06/15/2008,300BLK WILSHIRE BLVD,A003,03A1,2008-06-15 03:24:24,06/15/2008 04:25:57 PM,Arrest,34.018684,-118.498718,"(34.018683505093, -118.49871764276)",60377020000000.0,6037702000.0,,,2008-06-15,03:24:24
1,000000000,Theft Suspect in Custody,08/05/2008,300BLK COLORADO AVE,A003,12F1,2008-08-05 08:11:40,08/06/2008 02:51:09 AM,Arrest,34.013351,-118.492491,"(34.013351216062, -118.49249072852)",60377020000000.0,6037702000.0,,,2008-08-05,08:11:40
2,0000000D0,Battery Now,05/03/2007,1400BLK OLYMPIC BLVD,0006,006B,2007-05-03 02:00:36,05/03/2007 05:01:14 PM,Other,34.019723,-118.480514,"(34.019723295419, -118.480514236158)",60377020000000.0,6037702000.0,,,2007-05-03,02:00:36
3,060000001,Party Complaint,01/01/2006,1000BLK 20TH ST,0005,005D,2006-01-01 12:02:03,01/01/2006 12:41:38 AM,Advisal,34.033427,-118.485564,"(34.033426730853, -118.48556375277)",60377020000000.0,6037702000.0,,,2006-01-01,12:02:03
4,060000003,Municipal Code Violation,01/01/2006,800BLK PACIFIC ST,0007,007A,2006-01-01 12:04:34,01/01/2006 12:30:47 AM,Checks Okay,34.010268,-118.480374,"(34.010267744622, -118.480374045543)",60377020000000.0,6037702000.0,,,2006-01-01,12:04:34


In [63]:
len(sm_df)

1696503

#### Export

In [64]:
sm_df.to_csv('/Users/mhustiles/data/data/LA/calls/santa_monica/santa_monica.csv', index=None)

---

## How many records do we have?

In [65]:
len(sm_df) + len(df_sac) + len(df_sf) + len(sj_df) + len(sd_df) + len(la_df)

NameError: name 'la_df' is not defined