# Traffic Incident Reports in San Francisco

Visualize traffic incident reports in San Francisco.

Data sources:

* [Police Department Incident Reports in San Francisco](https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-2018-to-Present/wg3w-h783/data)
    - csv: https://data.sfgov.org/resource/wg3w-h783.csv
* [Traffic Signals in San Francisco](https://data.sfgov.org/Transportation/Traffic-Signals/ybh5-27n2)
    - csv: https://data.sfgov.org/resource/c8ue-f4py.csv
* [San Francisco Congestion Roads](https://cartovl.carto.com/dataset/sfcta_congestion_roads)
    - hosted in CARTO

In [1]:
from cartoframes.auth import set_default_credentials, Credentials
from cartoframes.viz import Map, Layer, Legend, Source
import pandas

> If you have a CARTO account, you can set your credentials in the following cell. This allows you to upload the dataset and share the final visualization through your account.

In [2]:
# username = ''  # <-- insert your username here
# api_key = ''  # <-- insert your API key here

# credentials = Credentials(username, api_key)
# set_default_credentials(credentials)

## Load incident reports

Using pandas, we can read an external data source, which is converted to a dataframe. Let's see which columns we have:

In [3]:
incident_reports_df = pandas.read_csv('https://data.sfgov.org/resource/wg3w-h783.csv')
incident_reports_df.head()

Unnamed: 0,incident_datetime,incident_date,incident_time,incident_year,incident_day_of_week,report_datetime,row_id,incident_id,incident_number,cad_number,...,point,:@computed_region_6qbp_sg9q,:@computed_region_qgnn_b9vv,:@computed_region_26cr_cadq,:@computed_region_ajp5_b2md,:@computed_region_nqbw_i6c3,:@computed_region_2dwj_jsy4,:@computed_region_h4ep_8xdi,:@computed_region_y6ts_4iup,:@computed_region_jg9y_a9du
0,2019-08-15T11:41:00.000,2019-08-15T00:00:00.000,11:41,2019,Thursday,2019-10-01T14:06:00.000,85424006374,854240,196208089,,...,,,,,,,,,,
1,2019-09-17T22:00:00.000,2019-09-17T00:00:00.000,22:00,2019,Tuesday,2019-10-02T22:01:00.000,85426606374,854266,196208205,,...,,,,,,,,,,
2,2019-10-04T14:25:00.000,2019-10-04T00:00:00.000,14:25,2019,Friday,2019-10-04T16:13:00.000,85442603474,854426,190746203,192772728.0,...,POINT (-122.51129492624534 37.77507596005672),8.0,8.0,4.0,29.0,,,,,
3,2019-10-03T19:30:00.000,2019-10-03T00:00:00.000,19:30,2019,Thursday,2019-10-03T23:25:00.000,85419706244,854197,190744514,192764437.0,...,POINT (-122.42746205880601 37.76877049785351),28.0,3.0,5.0,5.0,5.0,,,,
4,2019-10-04T16:53:00.000,2019-10-04T00:00:00.000,16:53,2019,Friday,2019-10-04T16:53:00.000,85446351040,854463,190746532,192772932.0,...,POINT (-122.5030864538133 37.781176766186576),6.0,8.0,4.0,29.0,,,,,


In [4]:
incident_reports_df.columns

Index(['incident_datetime', 'incident_date', 'incident_time', 'incident_year',
       'incident_day_of_week', 'report_datetime', 'row_id', 'incident_id',
       'incident_number', 'cad_number', 'report_type_code',
       'report_type_description', 'filed_online', 'incident_code',
       'incident_category', 'incident_subcategory', 'incident_description',
       'resolution', 'intersection', 'cnn', 'police_district',
       'analysis_neighborhood', 'supervisor_district', 'latitude', 'longitude',
       'point', ':@computed_region_6qbp_sg9q', ':@computed_region_qgnn_b9vv',
       ':@computed_region_26cr_cadq', ':@computed_region_ajp5_b2md',
       ':@computed_region_nqbw_i6c3', ':@computed_region_2dwj_jsy4',
       ':@computed_region_h4ep_8xdi', ':@computed_region_y6ts_4iup',
       ':@computed_region_jg9y_a9du'],
      dtype='object')

Some of the `latitude` and `longitude` values are `NaN`, so in the next step we drop those rows. After that, we pass the dataframe to a `Layer` to visualize the data:

In [5]:
# Drop rows where either coordinate is missing
incident_reports_df = incident_reports_df.dropna(subset=['longitude', 'latitude'])

Layer(incident_reports_df)
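
Rows with missing coordinates can be removed either by comparing a column with itself (`NaN` is never equal to itself) or with `DataFrame.dropna`. A minimal sketch on a made-up dataframe (`toy_df` and its values are illustrative, not taken from the dataset) suggests that both approaches keep the same rows:

```python
import numpy as np
import pandas as pd

# Toy data: illustrative values only, not taken from the SF dataset
toy_df = pd.DataFrame({
    'incident_id': [1, 2, 3, 4],
    'longitude': [-122.51, np.nan, -122.42, -122.50],
    'latitude': [37.77, 37.76, np.nan, 37.78],
})

# Approach 1: NaN != NaN, so self-comparison is False for missing values
by_comparison = toy_df[toy_df.longitude == toy_df.longitude]
by_comparison = by_comparison[by_comparison.latitude == by_comparison.latitude]

# Approach 2: drop rows where either coordinate is missing
by_dropna = toy_df.dropna(subset=['longitude', 'latitude'])

print(by_dropna.incident_id.tolist())
```

The `dropna` form states the intent more directly, which is the main reason to prefer it.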

Now we are going to use a **helper method** to color the points by category. The category is 'Day of Week' (`incident_day_of_week`):

In [6]:
from cartoframes.viz.helpers import color_category_layer

color_category_layer(incident_reports_df, 'incident_day_of_week', 'Day of Week', top=7)

As we can see in the legend, the days are sorted by frequency, which means that there are fewer incidents on Thursdays and more on Tuesdays. Since our purpose is not to visualize the frequency, and we want the legend to list the days in order from Monday to Sunday, we can pass the categories to the helper in the desired order:

In [7]:
from cartoframes.viz.helpers import color_category_layer


color_category_layer(incident_reports_df, 'incident_day_of_week', 'Day of Week', cat=[
    'Monday',
    'Tuesday',
    'Wednesday',
    'Thursday',
    'Friday',
    'Saturday',
    'Sunday'
])
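
As a rough illustration of the difference between frequency order and calendar order, the following sketch uses pandas on made-up values: `value_counts` sorts by frequency, while `reindex` with an explicit list restores the Monday-to-Sunday order. The names `days` and `week_order` are introduced here for illustration only.

```python
import pandas as pd

# Toy data: illustrative values only
days = pd.Series(['Tuesday', 'Monday', 'Tuesday', 'Sunday', 'Tuesday', 'Monday'])

# value_counts sorts by frequency (most frequent first)
by_frequency = days.value_counts()

week_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
              'Friday', 'Saturday', 'Sunday']

# reindex imposes the calendar order; days with no incidents get a count of 0
by_calendar = by_frequency.reindex(week_order, fill_value=0)

print(by_calendar)
```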

Next, we look for traffic-related values in `incident_category` and use them to visualize those incidents:

In [8]:
incident_reports_df.incident_category.unique()

array(['Robbery', 'Larceny Theft', 'Non-Criminal', 'Other Miscellaneous',
       'Disorderly Conduct', 'Motor Vehicle Theft', 'Burglary',
       'Offences Against The Family And Children', 'Missing Person',
       'Malicious Mischief', 'Suspicious Occ', 'Lost Property', 'Assault',
       'Forgery And Counterfeiting', 'Miscellaneous Investigation',
       'Fraud', 'Other', 'Stolen Property', 'Recovered Vehicle', 'Arson',
       'Other Offenses', 'Vandalism', 'Embezzlement', 'Suicide',
       'Traffic Collision', 'Fire Report', 'Vehicle Misplaced',
       'Weapons Offense', 'Courtesy Report'], dtype=object)

In [9]:
from cartoframes.viz.helpers import size_category_layer

size_category_layer(
    incident_reports_df,
    'incident_category',
    'Traffic Incidents',
    cat=['Traffic Collision', 'Traffic Violation Arrest'])
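
The rows loaded here may not contain every requested category. A minimal sketch, using the category names printed by `unique()` above, checks which of the requested values are present in this sample (`sample_categories`, `requested`, `present` and `missing` are names introduced for illustration):

```python
# Categories copied from the `unique()` output above
sample_categories = [
    'Robbery', 'Larceny Theft', 'Non-Criminal', 'Other Miscellaneous',
    'Disorderly Conduct', 'Motor Vehicle Theft', 'Burglary',
    'Offences Against The Family And Children', 'Missing Person',
    'Malicious Mischief', 'Suspicious Occ', 'Lost Property', 'Assault',
    'Forgery And Counterfeiting', 'Miscellaneous Investigation',
    'Fraud', 'Other', 'Stolen Property', 'Recovered Vehicle', 'Arson',
    'Other Offenses', 'Vandalism', 'Embezzlement', 'Suicide',
    'Traffic Collision', 'Fire Report', 'Vehicle Misplaced',
    'Weapons Offense', 'Courtesy Report',
]

requested = ['Traffic Collision', 'Traffic Violation Arrest']

# Split the requested categories into those found in the sample and those not
present = [c for c in requested if c in sample_categories]
missing = [c for c in requested if c not in sample_categories]

print('present:', present)
print('missing:', missing)
```

In this sample only 'Traffic Collision' appears; 'Traffic Violation Arrest' may show up in a larger extract of the dataset.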

CARTO hosts a dataset we can use for the next step, named `sfcta_congestion_roads`. Because it lives in the public `cartovl` account, we create a `Source` with its own `Credentials`. If you have a CARTO account, you can import the dataset into it to keep everything together; in that case there is no need to create a separate source for it.

Once the data source is created, we combine two helper methods. The first uses the `Source` with the roads data from CARTO, and the second uses the traffic incident reports.

In [10]:
from cartoframes.viz.helpers import color_continuous_layer

sfcta_congestion_roads_source = Source(
    'sfcta_congestion_roads',
    Credentials(
        base_url='https://cartovl.carto.com',
        api_key='default_public'
    )
)

Map([
    color_continuous_layer(sfcta_congestion_roads_source, 'auto_speed', 'Recorded vehicle speeds'),
    size_category_layer(
        incident_reports_df,
        'incident_category',
        'Traffic Incidents',
        cat=['Traffic Collision', 'Traffic Violation Arrest'])
])

Next, we add information about traffic signals, loaded from a different source:

In [11]:
traffic_signals_df = pandas.read_csv('https://data.sfgov.org/resource/c8ue-f4py.csv')
traffic_signals_df.head()

Unnamed: 0,objectid,cnn,code,cnn_1,street1,street2,street3,street4,detection,sup_dist,...,percent_po,point,point_address,point_city,point_state,point_zip,:@computed_region_6qbp_sg9q,:@computed_region_qgnn_b9vv,:@computed_region_26cr_cadq,:@computed_region_ajp5_b2md
0,16590,7210000,CALTRANS,7210000,ALEMANY,CUT THROUGH,,,,9,...,0,POINT (-122.40791773568166 37.73772950544402),,,,,83.0,2.0,2,25.0
1,15731,24440000,Fix,24440000,GOLDEN GATE,LARKIN,,,,6,...,0,POINT (-122.41714561229823 37.78144749458223),,,,,21.0,5.0,10,36.0
2,16839,32891000,CALTRANS,32891000,36TH AVE,SLOAT,,,,47,...,0,POINT (-122.49334544424045 37.734008530048456),,,,,40.0,10.0,8,35.0
3,16805,20872000,Beacon,20872000,JOHN F SHELLEY WEST,MANSELL,,,,10,...,0,POINT (-122.41851237031268 37.71876609066602),,,,,73.0,9.0,9,19.0
4,16804,33338000,Beacon,33338000,BRAZIL,MANSELL,,,,10,...,0,POINT (-122.42247298832734 37.71788737925598),,,,,73.0,9.0,2,19.0


In [12]:
traffic_signals_df.columns

Index(['objectid', 'cnn', 'code', 'cnn_1', 'street1', 'street2', 'street3',
       'street4', 'detection', 'sup_dist', 'veh_actuat', 'aps', 'ped_signal',
       'ped_actuat', 'tbc', 'preempt_pr', 'd_ate2070', 'project_ne',
       'project_ol', 'upgraded', 'yr_of_cont', 'last_upgra', 'new_signal',
       'mod_projec', 'full_upgra', 'beacon_fla', 'funding', 'rlcam',
       'startyear', 'caltrans_r', 'caltrans', 'percent_c', 'sf', 'percent_sf',
       'percent_po', 'point', 'point_address', 'point_city', 'point_state',
       'point_zip', ':@computed_region_6qbp_sg9q',
       ':@computed_region_qgnn_b9vv', ':@computed_region_26cr_cadq',
       ':@computed_region_ajp5_b2md'],
      dtype='object')

In [13]:
traffic_signals_df.code.unique()

array(['CALTRANS', 'Fix', 'Beacon', 'Actuated', 'FUTURE', 'Preempt',
       'Actuated&Preempt', 'FLASHER', 'BEACON', 'FIX', 'SPEED RADAR SIGN',
       'RLC', 'RADAR SPEED SIGN', 'BEACON/SOLAR', 'Caltrans HAWK',
       'Actuated&RLC', 'Speed Radar', 'DALY CITY', 'LANE CONTROL',
       'Preempt&RLC', 'MESSAGE SIGN', 'MASTER', 'Future',
       'LIGHTED CROSSWALK'], dtype=object)

Since there are no `latitude` and `longitude` columns, we can use the `point` column, which holds WKT strings, to create a [GeoDataFrame](https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html).

In [14]:
import geopandas
from shapely import wkt

traffic_signals_df['point'] = traffic_signals_df['point'].apply(wkt.loads)
traffic_signals_df = traffic_signals_df.rename(columns={'point': 'geometry'}).set_geometry('geometry')
trafic_signals_gdf = geopandas.GeoDataFrame(traffic_signals_df, geometry='geometry')
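
For a quick check of what the `point` strings contain, the coordinates can also be pulled out of the WKT text with the standard library. This is a minimal sketch, assuming every value has the form `POINT (lon lat)`; the helper `parse_point` is hypothetical and not part of shapely or geopandas:

```python
import re

# Matches 'POINT (<lon> <lat>)' with optional sign and decimals
POINT_RE = re.compile(r'POINT \((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)')


def parse_point(wkt_text):
    """Return (longitude, latitude) from a 'POINT (lon lat)' string, or None."""
    match = POINT_RE.fullmatch(wkt_text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Sample value taken from the first row of the traffic signals table
lon, lat = parse_point('POINT (-122.40791773568166 37.73772950544402)')
print(lon, lat)
```

For anything beyond plain points, `shapely.wkt.loads` as used above is the more robust choice.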

In [15]:
Map(Layer(trafic_signals_gdf))

In [16]:
from cartoframes.viz.helpers import color_category_layer


signal_gdf = trafic_signals_gdf[trafic_signals_gdf['code'].isin(['RADAR SPEED SIGN', 'FLASHER', 'LIGHTED CROSSWALK'])]

color_category_layer(signal_gdf, 'code', palette='bold', title='Radar')

All together:

In [17]:
Map([
    color_continuous_layer(
        sfcta_congestion_roads_source, 'auto_speed', 'Recorded vehicle speeds'),
    size_category_layer(
        incident_reports_df,
        'incident_category',
        'Traffic Incidents',
        cat=['Traffic Collision', 'Traffic Violation Arrest']),
    color_category_layer(signal_gdf, 'code', palette='bold', title='Radar', opacity='0.5')
])