## CHP incident data

The purpose of this notebook is to download and filter CHP incident data.

The output is a file named "chp_incidents_for_postgis.csv" that can be loaded to PostGIS using the "load_chp_incidents.sql" script.

The local directory structure used for this notebook is below.  You may need to modify the data loading function calls if your directory structure is different.

```
    |-- data
         |-- waze_data.csv
         |-- notebooks
                |-- CHP_Incident_Data.ipynb
         |-- opendata
                |-- all_text_chp_incidents_month_2017_02.txt
                |-- all_text_chp_incidents_month_2017_03.txt
                |-- all_text_chp_incidents_month_2017_04.txt
                |-- all_text_chp_incidents_month_2017_05.txt
                |-- all_text_chp_incidents_month_2017_06.txt
```

### Load CHP incident data

In [1]:
import pandas as pd

In [2]:
chp_2017_02 = pd.read_csv('../opendata/all_text_chp_incidents_month_2017_02.txt', header=None)
chp_2017_03 = pd.read_csv('../opendata/all_text_chp_incidents_month_2017_03.txt', header=None)
chp_2017_04 = pd.read_csv('../opendata/all_text_chp_incidents_month_2017_04.txt', header=None)
chp_2017_05 = pd.read_csv('../opendata/all_text_chp_incidents_month_2017_05.txt', header=None)
chp_2017_06 = pd.read_csv('../opendata/all_text_chp_incidents_month_2017_06.txt', header=None)

chp_incident_df = pd.concat([chp_2017_02, chp_2017_03, chp_2017_04, chp_2017_05, chp_2017_06])

chp_inc_headers = ['incident_id','cc_code','incident_number','timestamp',
                   'description','location','area','zoom_map','tb_xy',
                   'latitude','longitude','district','county_fips_id',
                   'city_fips_id','freeway_number','freeway_direction',
                   'state_postmile','absolute_postmile','severity',
                   'duration']

chp_incident_df.columns = chp_inc_headers

In [3]:
chp_incident_df.shape

(215872, 20)

In [4]:
chp_incident_df.head()

|   | incident_id | cc_code | incident_number | timestamp | description | location | area | zoom_map | tb_xy | latitude | longitude | district | county_fips_id | city_fips_id | freeway_number | freeway_direction | state_postmile | absolute_postmile | severity | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16985679 | SAHB | 170201 | 02/01/2017 00:00:00 | 1125-Traffic Hazard | Us50 E / Cameron Park Dr Onr | Placerville |  |  | 38.659374 | -120.965157 | 3.0 | 17.0 |  | 50 | E | 6.802 | 35.6 |  | 4.0 |
| 1 | 16985685 | INHB | 170201 | 02/01/2017 00:01:00 | 1182-Trfc Collision-No Inj | I5 S / El Toro Rd | Capistrano |  |  | 33.614552 | -117.707706 | 12.0 | 59.0 | 39220.0 | 5 | S | 18.705 | 90.9 |  | 40.0 |
| 2 | 16985691 | GGHB | 170201 | 02/01/2017 00:08:00 | 1182-Trfc Collision-No Inj | I580 E / Lakeshore Ave Onr | Oakland |  |  | 37.808592 | -122.242804 | 4.0 | 1.0 | 53000.0 | 580 | E | 43.115 | 59.2 |  | 15.0 |
| 3 | 16985692 | LAHB | 170201 | 02/01/2017 00:12:00 | 1125-Traffic Hazard | I210 W / E Foothill Blvd Pas | Altadena |  |  | 34.149919 | -118.088107 | 7.0 | 37.0 | 56000.0 | 210 | W | R28.92 | 28.9 |  | 21.0 |
| 4 | 16985693 | LAHB | 170201 | 02/01/2017 00:11:00 | 1179-Trfc Collision-1141 Enrt | Us101 N / Universal Studios Blvd | Central LA |  |  | 34.133306 | -118.352405 | 7.0 | 37.0 | 44000.0 | 101 | N | 9.652 | 11.0 |  | 57.0 |

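The month-by-month loading in cell 2 could also be written as a loop over whatever files are present in the `opendata` directory. The following is a minimal sketch, not the notebook's own code: `load_chp_months` and `CHP_INC_HEADERS` are hypothetical names, and the sketch assumes every monthly file follows the `all_text_chp_incidents_month_*.txt` naming shown in the directory listing. Unlike cell 2, it passes `ignore_index=True` so the combined frame gets one continuous index.

```python
import glob
import os

import pandas as pd

# Column names copied from chp_inc_headers in cell 2.
CHP_INC_HEADERS = ['incident_id', 'cc_code', 'incident_number', 'timestamp',
                   'description', 'location', 'area', 'zoom_map', 'tb_xy',
                   'latitude', 'longitude', 'district', 'county_fips_id',
                   'city_fips_id', 'freeway_number', 'freeway_direction',
                   'state_postmile', 'absolute_postmile', 'severity',
                   'duration']


def load_chp_months(opendata_dir):
    """Read every monthly CHP text file in opendata_dir into one DataFrame."""
    pattern = os.path.join(opendata_dir, 'all_text_chp_incidents_month_*.txt')
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise IOError('no CHP incident files match {}'.format(pattern))
    # the files have no header row, so the column names are supplied here
    frames = [pd.read_csv(p, header=None, names=CHP_INC_HEADERS) for p in paths]
    # ignore_index=True renumbers rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)
```

With the directory layout above, the call from the notebook would be `chp_incident_df = load_chp_months('../opendata')`.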

### Filter CHP incident data to appropriate lon/lat range

In [5]:
import numpy as np
import psycopg2 as pg

In [6]:
# replace database inputs as appropriate
conn_str = "host={} dbname={} user={} password={}".format('localhost', 'waze', 'postgres', 'password')
conn = pg.connect(conn_str)

In [7]:
linestrings = pd.read_sql('select ST_AsText(geom) from congestion', con=conn)
ls = linestrings.values

In [8]:
# extract lon/lat from linestring and flatten list
lonlats = [l[0].split('(')[1][:-1].split(',') for l in ls]
flat_lonlats = [item for sublist in lonlats for item in sublist]

In [9]:
# get lons and lats
lons = [float(l.split()[0]) for l in flat_lonlats]
lats = [float(l.split()[1]) for l in flat_lonlats]

In [10]:
maxlon = np.max(lons)
minlon = np.min(lons)
maxlat = np.max(lats)
minlat = np.min(lats)

print('max lon: {}'.format(maxlon))
print('min lon: {}'.format(minlon))
print('max lat: {}'.format(maxlat))
print('min lat: {}'.format(minlat))

print('\nSanity check:  The latitude of San Diego, CA, USA is 32.715736, and the longitude is -117.161087')

max lon: -116.831579
min lon: -117.281509
max lat: 33.145562
min lat: 32.535057

Sanity check:  The latitude of San Diego, CA, USA is 32.715736, and the longitude is -117.161087

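The string splitting in cells 8 to 10 relies on each geometry being a single `LINESTRING(...)`. A slightly more tolerant way to get the same bounding box is to pull every `lon lat` pair out of the WKT text with a regular expression. This is only a sketch: `wkt_bounds` is a hypothetical helper, and it assumes plain 2D coordinates written as decimals (no Z or M values, no exponent notation).

```python
import re

# One "lon lat" pair: two decimal numbers separated by whitespace.
_PAIR_RE = re.compile(r'(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)')


def wkt_bounds(wkt_strings):
    """Return (minlon, minlat, maxlon, maxlat) over all coordinate pairs."""
    lons, lats = [], []
    for wkt in wkt_strings:
        for lon, lat in _PAIR_RE.findall(wkt):
            lons.append(float(lon))
            lats.append(float(lat))
    if not lons:
        raise ValueError('no coordinates found in the WKT input')
    return min(lons), min(lats), max(lons), max(lats)
```

In the notebook this could be called as `minlon, minlat, maxlon, maxlat = wkt_bounds(linestrings.iloc[:, 0])`. PostGIS can also compute an extent inside the database (for example with the `ST_Extent` aggregate), which avoids transferring every geometry as text.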

In [11]:
print('pre filter shape: {}'.format(chp_incident_df.shape))

# keep only incidents inside the bounding box of the congestion linestrings
chp_inc_lonlat = chp_incident_df.loc[(chp_incident_df['latitude']>=minlat) &
                                     (chp_incident_df['latitude']<=maxlat) &
                                     (chp_incident_df['longitude']>=minlon) &
                                     (chp_incident_df['longitude']<=maxlon), :
                                    ]

print('post filter shape: {}'.format(chp_inc_lonlat.shape))

pre filter shape: (215872, 20)
post filter shape: (17076, 20)

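The bounding-box filter in cell 11 can be expressed more compactly with `Series.between`, which includes both endpoints by default and treats missing coordinates as outside the box. The sketch below is one possible formulation; `filter_to_bbox` is a hypothetical name, and it assumes the frame has the `latitude` and `longitude` columns named in cell 2.

```python
import pandas as pd


def filter_to_bbox(df, minlon, minlat, maxlon, maxlat):
    """Keep rows whose latitude/longitude fall inside the box (edges included)."""
    in_box = (df['latitude'].between(minlat, maxlat) &
              df['longitude'].between(minlon, maxlon))
    # .copy() returns an independent frame, so assigning to its columns
    # later does not raise SettingWithCopyWarning
    return df.loc[in_box].copy()
```

For example, `chp_inc_lonlat = filter_to_bbox(chp_incident_df, minlon, minlat, maxlon, maxlat)` should select the same rows as the explicit comparisons.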

### Filter CHP incident data to appropriate date range

In [12]:
df = pd.read_csv("../waze_data.csv")
mindate = df.waze_timestamp.min()
maxdate = df.waze_timestamp.max()

print('min waze_timestamp:')
print(mindate)

print('\nmax waze_timestamp:')
print(maxdate)

min waze_timestamp:
2017-02-08 16:12:54

max waze_timestamp:
2017-06-12 15:16:17


In [13]:
print('pre filter shape: {}'.format(chp_inc_lonlat.shape))

# work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
chp_inc_lonlat = chp_inc_lonlat.copy()
chp_inc_lonlat['timestamp'] = pd.to_datetime(chp_inc_lonlat['timestamp'])
chp_inc_lonlatdates = chp_inc_lonlat.loc[(chp_inc_lonlat['timestamp']>=mindate) &
                                         (chp_inc_lonlat['timestamp']<=maxdate),:]

print('post filter shape: {}'.format(chp_inc_lonlatdates.shape))

pre filter shape: (17076, 20)
post filter shape: (13929, 20)

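The date filter in cell 13 compares a datetime column against the Waze bounds, which are still strings read from the CSV. A hedged alternative is to parse both sides explicitly and filter on a copy of the frame. `filter_to_date_range` is a hypothetical helper, and the sketch assumes the CHP timestamps are month-first strings such as `02/01/2017 00:00:00`, as in the sample rows above.

```python
import pandas as pd


def filter_to_date_range(df, start, end, column='timestamp'):
    """Keep rows whose parsed timestamp lies in [start, end], edges included."""
    out = df.copy()  # leave the caller's frame untouched
    out[column] = pd.to_datetime(out[column])
    # parse the bounds too, so the comparison is datetime against datetime
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    return out.loc[out[column].between(start, end)]
```

In the notebook this would be called as `chp_inc_lonlatdates = filter_to_date_range(chp_inc_lonlat, mindate, maxdate)`.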

### Write resulting CHP incidents data to csv

In [14]:
chp_inc_lonlatdates.to_csv('../chp_incidents_for_postgis.csv', index=False)