# Extracting Data from HereMaps (here.com)

_Author: Bala Krishnamoorthy_

### Overview

[HereMaps](https://www.here.com/) is a location and mapping services company whose product is similar to Google Maps or Waze. Among other services, its API provides real-time information on road closures within a specified region. This notebook explains the steps taken to collect and clean that data for use in our project.

Throughout this notebook, you will find my comments in _markdown_ and `#comment` format.

### Contents 
- [Data Gathering](#Data-Gathering)
- [Data Cleaning](#Data-Cleaning)
- [Outputs](#Outputs)
- [Future Updates](#Future-Updates)

### Data Gathering

In [9]:
# Import Libraries

import json, requests
import time, datetime
import pandas as pd

# Viewing configs for notebook
%matplotlib inline
%config InlineBackend.print_figure_kwargs={'facecolor' : 'w'}
# Code below allows multiple console outputs to be generated without print statements
from IPython.core.interactiveshell import InteractiveShell 
InteractiveShell.ast_node_interactivity = "all"

_The import statement below loads my API credentials, which are stored locally and are **not** publicly available. To use the API request below, please create your own access credentials on [developer.here.com](https://www.here.com/), save them as `app_id` and `app_code` in a local `config` module, and execute the code below._

In [10]:
import config

Specify Bounding Box Coordinates (latitude, longitude) for region of interest:

In [11]:
# Bounding Box Coordinates - test case for larger area

city = 'Houston, TX'
# Using Twitter's assumption for the bounding box around Houston. Since we are pulling data
# from both Twitter and HereMaps, we use the same bounding box for each.
top_left = '30.1546646,-95.823268'
bottom_right = '29.522325,-95.069705'
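Before sending a request, it can help to confirm that the two corner strings really describe a top-left and a bottom-right corner. The sketch below is not part of the original notebook: `parse_corner` and `is_valid_bbox` are hypothetical helper names, and the check assumes the box does not cross the antimeridian.

```python
def parse_corner(corner):
    """Turn a 'lat,lon' string into a (lat, lon) tuple of floats."""
    lat, lon = (float(part) for part in corner.split(','))
    return lat, lon


def is_valid_bbox(top_left, bottom_right):
    """Check that top_left lies north-west of bottom_right (hypothetical helper)."""
    top, left = parse_corner(top_left)
    bottom, right = parse_corner(bottom_right)
    lat_ok = all(-90 <= lat <= 90 for lat in (top, bottom))
    lon_ok = all(-180 <= lon <= 180 for lon in (left, right))
    return lat_ok and lon_ok and top > bottom and left < right


# Corner strings used in this notebook
print(is_valid_bbox('30.1546646,-95.823268', '29.522325,-95.069705'))  # True
```

Swapping the two corners makes the check return `False`, which is a cheap way to catch a transposed bounding box before calling the API.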

In [12]:
url = 'https://traffic.api.here.com/traffic/6.3/\
incidents.json?app_id=' + config.app_id + '&app_code=' + config.app_code + '&bbox=' + \
top_left + ';' + bottom_right + '&criticality=critical'
url;

# Criticality = "critical" indicates that only road closures are requested from the 
# HereMaps API.

AttributeError: module 'config' has no attribute 'app_id'
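The same request URL can also be assembled from a dictionary of query parameters rather than by string concatenation, which makes each parameter easier to read and change. This is only a sketch using the standard library: `build_incidents_url` is a hypothetical helper, and `'MY_APP_ID'` / `'MY_APP_CODE'` are placeholders, not real credentials.

```python
from urllib.parse import urlencode

# Endpoint taken from the cell above
BASE_URL = 'https://traffic.api.here.com/traffic/6.3/incidents.json'


def build_incidents_url(app_id, app_code, top_left, bottom_right, criticality='critical'):
    """Build the incidents URL from named query parameters (hypothetical helper)."""
    params = {
        'app_id': app_id,
        'app_code': app_code,
        'bbox': top_left + ';' + bottom_right,
        'criticality': criticality,
    }
    # safe=',;' keeps the bbox separators readable instead of percent-encoding them
    return BASE_URL + '?' + urlencode(params, safe=',;')


print(build_incidents_url('MY_APP_ID', 'MY_APP_CODE',
                          '30.1546646,-95.823268', '29.522325,-95.069705'))
```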

In [30]:
# Request for data from HereMaps 
res = requests.get(url)
date_requested = time.strftime('%Y-%m-%d')
time_requested = time.strftime('%H:%M') # 24-hour format
date_requested; time_requested 

'2019-01-17'

'11:54'
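The cell above calls `time.strftime` twice, so the date and the time are read at two slightly different instants. A minimal sketch of taking both strings from one shared timestamp follows; `request_stamp` is a hypothetical helper name, and the fixed datetime is used only to make the example reproducible.

```python
import datetime


def request_stamp(now=None):
    """Return (date, time) strings derived from a single instant."""
    now = now or datetime.datetime.now()
    return now.strftime('%Y-%m-%d'), now.strftime('%H:%M')  # 24-hour format


# Fixed timestamp matching the output shown above
print(request_stamp(datetime.datetime(2019, 1, 17, 11, 54)))  # ('2019-01-17', '11:54')
```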

In [31]:
# Check to ensure the API is not returning errors.
res.status_code

200

In [32]:
hmaps_json = res.json()
hmaps_json;

### Data Cleaning

In the steps below, I dig through the raw JSON returned by the API to isolate the columns of interest.

In [33]:
# Inspect json file

# hmaps_json.keys()
# hmaps_json['TRAFFIC_ITEMS'].keys()
# hmaps_json['diagnostic'].keys()
hmaps_json['TRAFFIC_ITEMS']['TRAFFIC_ITEM'];

In [34]:
incident_dict = hmaps_json['TRAFFIC_ITEMS']['TRAFFIC_ITEM']
incident_dict[0].keys()

dict_keys(['TRAFFIC_ITEM_ID', 'ORIGINAL_TRAFFIC_ITEM_ID', 'TRAFFIC_ITEM_STATUS_SHORT_DESC', 'TRAFFIC_ITEM_TYPE_DESC', 'START_TIME', 'END_TIME', 'ENTRY_TIME', 'CRITICALITY', 'VERIFIED', 'ABBREVIATION', 'COMMENTS', 'RDS-TMC_LOCATIONS', 'LOCATION', 'TRAFFIC_ITEM_DETAIL', 'TRAFFIC_ITEM_DESCRIPTION', 'mid', 'PRODUCT'])
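Because the response is deeply nested and some keys are missing for some incidents, a small lookup helper can make this exploration safer. The sketch below is an illustration only: `dig` is a hypothetical helper, and `sample` is a hand-trimmed stand-in for the real response, not actual API output.

```python
def dig(data, *keys, default=None):
    """Follow keys (or list indexes) into nested data; return default if a step is missing."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hand-trimmed stand-in for hmaps_json
sample = {'TRAFFIC_ITEMS': {'TRAFFIC_ITEM': [
    {'CRITICALITY': {'ID': '0', 'DESCRIPTION': 'critical'}}
]}}

print(dig(sample, 'TRAFFIC_ITEMS', 'TRAFFIC_ITEM', 0, 'CRITICALITY', 'DESCRIPTION'))  # critical
print(dig(sample, 'TRAFFIC_ITEMS', 'TRAFFIC_ITEM', 0, 'LOCATION', 'GEOLOC', default={}))  # {}
```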

In [38]:
# Convert json to dataframe for ease of manipulation
incident_df = pd.DataFrame(incident_dict)
incident_df.head()

| | ABBREVIATION | COMMENTS | CRITICALITY | END_TIME | ENTRY_TIME | LOCATION | ORIGINAL_TRAFFIC_ITEM_ID | PRODUCT | RDS-TMC_LOCATIONS | START_TIME | TRAFFIC_ITEM_DESCRIPTION | TRAFFIC_ITEM_DETAIL | TRAFFIC_ITEM_ID | TRAFFIC_ITEM_STATUS_SHORT_DESC | TRAFFIC_ITEM_TYPE_DESC | VERIFIED | mid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'SHORT_DESC': 'ACC', 'DESCRIPTION': ''} | use alternate route | {'ID': '0', 'DESCRIPTION': 'critical'} | 01/17/2019 20:12:10 | 01/17/2019 19:44:58 | {'DEFINED': {'ORIGIN': {'ROADWAY': {'DESCRIPTI... | 1615189557195804916 | basic | {'RDS-TMC': [{'ORIGIN': {'EBU_COUNTRY_CODE': '... | 01/17/2019 19:44:58 | [{'value': 'Closed at Beltway 8/E Sam Houston ... | {'ROAD_CLOSED': True, 'INCIDENT': {'RESPONSE_V... | 3186740462135138429 | ACTIVE | ACCIDENT | True | NAVTEQ/r_NAVTEQ/11463647_TIC/NAVTEQ_original\|1... |
| 1 | {'SHORT_DESC': 'CONST', 'DESCRIPTION': 'constr... | use alternate route | {'ID': '0', 'DESCRIPTION': 'critical'} | 01/18/2019 00:02:45 | 01/17/2019 19:03:41 | {'DEFINED': {'ORIGIN': {'ROADWAY': {'DESCRIPTI... | 980680438302553311 | basic | {'RDS-TMC': [{'ORIGIN': {'EBU_COUNTRY_CODE': '... | 01/17/2019 19:03:41 | [{'value': 'Ramp closed to Scott St/Exit 45 -... | {'ROAD_CLOSED': True, 'EVENT': {'EVENT_ITEM_CA... | 3146664517070910659 | ACTIVE | CONSTRUCTION | True | NAVTEQ/r_NAVTEQ/11463409_TIC/NAVTEQ\|1547751779544 |
| 2 | {'SHORT_DESC': 'CONST', 'DESCRIPTION': 'constr... | use alternate route | {'ID': '0', 'DESCRIPTION': 'critical'} | 01/18/2019 00:19:40 | 01/17/2019 19:20:14 | {'INTERSECTION': {'ORIGIN': {'ID': '', 'STREET... | 747299825456782738 | basic | | 01/17/2019 19:20:14 | [{'value': 'Closed at Huffmeister Rd - Closed ... | {'ROAD_CLOSED': True, 'EVENT': {'EVENT_ITEM_CA... | 2115865119138124808 | ACTIVE | CONSTRUCTION | True | NAVTEQ/r_NAVTEQ/11421724_TIC/NAVTEQ\|1547752797906 |
| 3 | {'SHORT_DESC': 'ACC', 'DESCRIPTION': ''} | use alternate route | {'ID': '0', 'DESCRIPTION': 'critical'} | 01/17/2019 20:12:10 | 01/17/2019 19:44:58 | {'DEFINED': {'ORIGIN': {'ROADWAY': {'DESCRIPTI... | 4470494351845372479 | basic | {'RDS-TMC': [{'ORIGIN': {'EBU_COUNTRY_CODE': '... | 01/17/2019 19:44:58 | [{'value': 'Closed at Beltway 8/E Sam Houston ... | {'ROAD_CLOSED': True, 'INCIDENT': {'RESPONSE_V... | 676703443217250929 | ACTIVE | ACCIDENT | True | NAVTEQ/r_NAVTEQ/11463647_TIC/NAVTEQ_reverse\|15... |
| 4 | {'SHORT_DESC': 'CONST', 'DESCRIPTION': 'constr... | | {'ID': '0', 'DESCRIPTION': 'critical'} | 02/27/2019 01:09:50 | 01/17/2019 19:20:14 | {'DEFINED': {'ORIGIN': {'ROADWAY': {'DESCRIPTI... | 3286988449189628489 | basic | {'RDS-TMC': [{'ORIGIN': {'EBU_COUNTRY_CODE': '... | 01/12/2019 23:10:07 | [{'value': 'Closed between I-45 and Woodridge ... | {'ROAD_CLOSED': True, 'EVENT': {'EVENT_ITEM_CA... | 3913279025427253470 | ACTIVE | CONSTRUCTION | True | a7bb77b4-8449-4f2f-a47e-1826f28f208b\|154775281... |


In [39]:
# Number of total incidents in specified bounding box
num_incidents = incident_df.shape[0]
num_incidents

9

In [47]:
# Remove unnecessary columns and reform dataframe
incident_df_2 = incident_df.drop(axis=1, columns=['ABBREVIATION', 'ORIGINAL_TRAFFIC_ITEM_ID', 'PRODUCT', 
                                  'TRAFFIC_ITEM_ID', 'mid'])
incident_df_2.rename(str.lower, axis=1, inplace=True)
incident_df_2.head()

| | comments | criticality | end_time | entry_time | location | rds-tmc_locations | start_time | traffic_item_description | traffic_item_detail | traffic_item_status_short_desc | traffic_item_type_desc | verified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | use alternate route | {'ID': '0', 'DESCRIPTION': 'critical'} | 01/17/2019 20:12:10 | 01/17/2019 19:44:58 | {'DEFINED': {'ORIGIN': {'ROADWAY': {'DESCRIPTI... | {'RDS-TMC': [{'ORIGIN': {'EBU_COUNTRY_CODE': '... | 01/17/2019 19:44:58 | [{'value': 'Closed at Beltway 8/E Sam Houston ... | {'ROAD_CLOSED': True, 'INCIDENT': {'RESPONSE_V... | ACTIVE | ACCIDENT | True |
| 1 | use alternate route | {'ID': '0', 'DESCRIPTION': 'critical'} | 01/18/2019 00:02:45 | 01/17/2019 19:03:41 | {'DEFINED': {'ORIGIN': {'ROADWAY': {'DESCRIPTI... | {'RDS-TMC': [{'ORIGIN': {'EBU_COUNTRY_CODE': '... | 01/17/2019 19:03:41 | [{'value': 'Ramp closed to Scott St/Exit 45 -... | {'ROAD_CLOSED': True, 'EVENT': {'EVENT_ITEM_CA... | ACTIVE | CONSTRUCTION | True |
| 2 | use alternate route | {'ID': '0', 'DESCRIPTION': 'critical'} | 01/18/2019 00:19:40 | 01/17/2019 19:20:14 | {'INTERSECTION': {'ORIGIN': {'ID': '', 'STREET... | | 01/17/2019 19:20:14 | [{'value': 'Closed at Huffmeister Rd - Closed ... | {'ROAD_CLOSED': True, 'EVENT': {'EVENT_ITEM_CA... | ACTIVE | CONSTRUCTION | True |
| 3 | use alternate route | {'ID': '0', 'DESCRIPTION': 'critical'} | 01/17/2019 20:12:10 | 01/17/2019 19:44:58 | {'DEFINED': {'ORIGIN': {'ROADWAY': {'DESCRIPTI... | {'RDS-TMC': [{'ORIGIN': {'EBU_COUNTRY_CODE': '... | 01/17/2019 19:44:58 | [{'value': 'Closed at Beltway 8/E Sam Houston ... | {'ROAD_CLOSED': True, 'INCIDENT': {'RESPONSE_V... | ACTIVE | ACCIDENT | True |
| 4 | | {'ID': '0', 'DESCRIPTION': 'critical'} | 02/27/2019 01:09:50 | 01/17/2019 19:20:14 | {'DEFINED': {'ORIGIN': {'ROADWAY': {'DESCRIPTI... | {'RDS-TMC': [{'ORIGIN': {'EBU_COUNTRY_CODE': '... | 01/12/2019 23:10:07 | [{'value': 'Closed between I-45 and Woodridge ... | {'ROAD_CLOSED': True, 'EVENT': {'EVENT_ITEM_CA... | ACTIVE | CONSTRUCTION | True |


In [48]:
incident_df_2.columns

Index(['comments', 'criticality', 'end_time', 'entry_time', 'location',
       'rds-tmc_locations', 'start_time', 'traffic_item_description',
       'traffic_item_detail', 'traffic_item_status_short_desc',
       'traffic_item_type_desc', 'verified'],
      dtype='object')

In [49]:
# Create a dataframe for all location information
loc_all = pd.DataFrame(list(incident_df_2['location']))
loc_all.rename(str.lower, axis=1, inplace=True)
loc_all.head()

| | defined | geoloc | intersection | length | navtech | political_boundary | tpegopenlrbase64 |
|---|---|---|---|---|---|---|---|
| 0 | {'ORIGIN': {'ROADWAY': {'DESCRIPTION': [{'valu... | {'ORIGIN': {'LATITUDE': 29.6307, 'LONGITUDE': ... | | 0.20939 | {'EDGE': {'EDGE_ID': ['18019588', '837187655',... | | CCkBEAAlJLxTDxUSGwAJBQQDAr0ACgUEA4JRAP6m//AACQ... |
| 1 | {'ORIGIN': {'ROADWAY': {'DESCRIPTION': [{'valu... | {'ORIGIN': {'LATITUDE': 29.7369, 'LONGITUDE': ... | | 0.05927 | {'EDGE': {'EDGE_ID': ['122928090']}, 'VERSION_... | | CCgBEAAkI7wyTRUlcAAJBQQDBssACgQDA18A/6EAFwAJBQ... |
| 2 | | {'ORIGIN': {'LATITUDE': 29.92321, 'LONGITUDE':... | {'ORIGIN': {'ID': '', 'STREET1': {'ADDRESS1': ... | 0.07148 | {'EDGE': {'EDGE_ID': ['122936232', '1053328748... | {'METRO_AREA': {'value': '', 'ID': 5}, 'COUNTY... | CCgBEAAkI7v/PhVHWwAJBQQDAzcACgQDA3MAAA0ARwAJBQ... |
| 3 | {'ORIGIN': {'ROADWAY': {'DESCRIPTION': [{'valu... | {'ORIGIN': {'LATITUDE': 29.630513, 'LONGITUDE'... | | 0.14354 | {'EDGE': {'EDGE_ID': ['758719041', '758719040'... | | CCkBEAAlJLxSbhUSEwAJBQQDAkYACgUEA4FoAADt//YACQ... |
| 4 | {'ORIGIN': {'ROADWAY': {'DESCRIPTION': [{'valu... | {'ORIGIN': {'LATITUDE': 29.69674, 'LONGITUDE':... | | 0.35126 | {'EDGE': {'EDGE_ID': ['17881597', '17881601', ... | | CCkBEAAlJLw8kBUeIAAJBQQBA7QACgUEAYQ0AP3V/68ACQ... |


In [50]:
# Fill NaN Values with empty dictionaries so that dataframes can continue to be created
loc_all = loc_all.applymap(lambda x: {} if pd.isnull(x) else x)
loc_all.head()

| | defined | geoloc | intersection | length | navtech | political_boundary | tpegopenlrbase64 |
|---|---|---|---|---|---|---|---|
| 0 | {'ORIGIN': {'ROADWAY': {'DESCRIPTION': [{'valu... | {'ORIGIN': {'LATITUDE': 29.6307, 'LONGITUDE': ... | {} | 0.20939 | {'EDGE': {'EDGE_ID': ['18019588', '837187655',... | {} | CCkBEAAlJLxTDxUSGwAJBQQDAr0ACgUEA4JRAP6m//AACQ... |
| 1 | {'ORIGIN': {'ROADWAY': {'DESCRIPTION': [{'valu... | {'ORIGIN': {'LATITUDE': 29.7369, 'LONGITUDE': ... | {} | 0.05927 | {'EDGE': {'EDGE_ID': ['122928090']}, 'VERSION_... | {} | CCgBEAAkI7wyTRUlcAAJBQQDBssACgQDA18A/6EAFwAJBQ... |
| 2 | {} | {'ORIGIN': {'LATITUDE': 29.92321, 'LONGITUDE':... | {'ORIGIN': {'ID': '', 'STREET1': {'ADDRESS1': ... | 0.07148 | {'EDGE': {'EDGE_ID': ['122936232', '1053328748... | {'METRO_AREA': {'value': '', 'ID': 5}, 'COUNTY... | CCgBEAAkI7v/PhVHWwAJBQQDAzcACgQDA3MAAA0ARwAJBQ... |
| 3 | {'ORIGIN': {'ROADWAY': {'DESCRIPTION': [{'valu... | {'ORIGIN': {'LATITUDE': 29.630513, 'LONGITUDE'... | {} | 0.14354 | {'EDGE': {'EDGE_ID': ['758719041', '758719040'... | {} | CCkBEAAlJLxSbhUSEwAJBQQDAkYACgUEA4FoAADt//YACQ... |
| 4 | {'ORIGIN': {'ROADWAY': {'DESCRIPTION': [{'valu... | {'ORIGIN': {'LATITUDE': 29.69674, 'LONGITUDE':... | {} | 0.35126 | {'EDGE': {'EDGE_ID': ['17881597', '17881601', ... | {} | CCkBEAAlJLw8kBUeIAAJBQQBA7QACgUEAYQ0AP3V/68ACQ... |


In [51]:
# Create a dataframe for only geographical labels (could be used in the final dataframe)
loc_region = pd.DataFrame(list(loc_all['political_boundary']))
loc_region.rename(str.lower, axis=1, inplace=True)
loc_region.head()

| | county | metro_area |
|---|---|---|
| 0 | | |
| 1 | | |
| 2 | Harris | {'value': '', 'ID': 5} |
| 3 | | |
| 4 | | |


In [52]:
# Create a dataframe for only intersections. NaNs indicate no value was provided by HereMaps.
loc_intersec = pd.DataFrame(list(loc_all['intersection']))
loc_intersec.rename(str.lower, axis=1, inplace=True)
loc_intersec.head()

| | origin | to |
|---|---|---|
| 0 | | |
| 1 | | |
| 2 | {'ID': '', 'STREET1': {'ADDRESS1': 'Highway 29... | {'ID': '', 'STREET1': {'ADDRESS1': 'Highway 29... |
| 3 | | |
| 4 | | |


In [53]:
# Create a dataframe for the start/end points of each incident. This is important! These 
# geocoordinates will eventually feed into our mapping tool.
loc_coord = pd.DataFrame(list(loc_all['geoloc']))
loc_coord.rename(str.lower, axis=1, inplace=True)
loc_coord.head()

| | geometry | origin | to |
|---|---|---|---|
| 0 | | {'LATITUDE': 29.6307, 'LONGITUDE': -95.16875} | [{'LATITUDE': 29.630614, 'LONGITUDE': -95.1718... |
| 1 | | {'LATITUDE': 29.7369, 'LONGITUDE': -95.34869} | [{'LATITUDE': 29.73713, 'LONGITUDE': -95.34964}] |
| 2 | {'SHAPES': {'SHP': [{'value': '29.92321,-95.62... | {'LATITUDE': 29.92321, 'LONGITUDE': -95.62916} | [{'LATITUDE': 29.92392, 'LONGITUDE': -95.62903}] |
| 3 | | {'LATITUDE': 29.630513, 'LONGITUDE': -95.171997} | [{'LATITUDE': 29.63044, 'LONGITUDE': -95.16984}] |
| 4 | | {'LATITUDE': 29.69674, 'LONGITUDE': -95.29231} | [{'LATITUDE': 29.69593, 'LONGITUDE': -95.29787}] |
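To make the shape of these geolocation records concrete, here is a small plain-Python sketch that pulls the start and end points out of one record. `endpoints` is a hypothetical helper; it assumes the raw keys are `ORIGIN` and `TO` (the upper-case forms of the column names above) and that the first entry of the `TO` list is the end point. The sample values are copied from row 1.

```python
def endpoints(geoloc):
    """Return ((start_lat, start_lon), (end_lat, end_lon)); None where a point is missing."""
    if not isinstance(geoloc, dict):
        return None, None
    origin = geoloc.get('ORIGIN')
    to = geoloc.get('TO')
    start = (origin['LATITUDE'], origin['LONGITUDE']) if origin else None
    # 'TO' is a list of points; only the first one is used here
    end = (to[0]['LATITUDE'], to[0]['LONGITUDE']) if to else None
    return start, end


geoloc = {'ORIGIN': {'LATITUDE': 29.7369, 'LONGITUDE': -95.34869},
          'TO': [{'LATITUDE': 29.73713, 'LONGITUDE': -95.34964}]}
print(endpoints(geoloc))  # ((29.7369, -95.34869), (29.73713, -95.34964))
print(endpoints({}))      # (None, None)
```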


### Outputs

In [66]:
# Instantiate an empty dataframe that captures only the columns of interest
columns = ['city', 'source', 'criticality', 'intersection', 'start_lat_long', 
           'end_lat_long', 'start_time', 'end_time', 'entry_time', 'bbox_top_left', 
           'bbox_bottom_right', 'date_requested', 'time_requested']
output_df = pd.DataFrame(index=incident_df_2.index, columns=columns)
output_df.fillna('', inplace=True)
output_df.loc[:, 'city'] = city

In [67]:
output_df.head()

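| | city | source | criticality | intersection | start_lat_long | end_lat_long | start_time | end_time | entry_time | bbox_top_left | bbox_bottom_right | date_requested | time_requested |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Houston, TX | | | | | | | | | | | | |
| 1 | Houston, TX | | | | | | | | | | | | |
| 2 | Houston, TX | | | | | | | | | | | | |
| 3 | Houston, TX | | | | | | | | | | | | |
| 4 | Houston, TX | | | | | | | | | | | | |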


**Populate the Outputs dataframe using data from HereMaps:**

In [68]:
for index in incident_df_2.index:
    # Incident-agnostic parameters.
    # .at sets a single cell and avoids chained assignment (df[col][row] = ...),
    # which can end up writing to a temporary copy.
    output_df.at[index, 'bbox_top_left'] = str(tuple(top_left.split(',')))
    output_df.at[index, 'bbox_bottom_right'] = str(tuple(bottom_right.split(',')))
    output_df.at[index, 'date_requested'] = date_requested
    output_df.at[index, 'time_requested'] = time_requested
    output_df.at[index, 'source'] = 'Here Maps API'
    # Incident-specific parameters; '-' marks a value HereMaps did not provide
    criticality = incident_df_2.at[index, 'criticality']
    if isinstance(criticality, dict):
        output_df.at[index, 'criticality'] = criticality['DESCRIPTION']
    else:
        output_df.at[index, 'criticality'] = '-'
    origin = loc_coord.at[index, 'origin']
    if isinstance(origin, dict):
        output_df.at[index, 'start_lat_long'] = (origin['LATITUDE'], origin['LONGITUDE'])
    else:
        output_df.at[index, 'start_lat_long'] = '-'
    # 'to' holds a list of points; the first entry is used as the end point
    to = loc_coord.at[index, 'to']
    if isinstance(to, list) and to:
        output_df.at[index, 'end_lat_long'] = (to[0]['LATITUDE'], to[0]['LONGITUDE'])
    else:
        output_df.at[index, 'end_lat_long'] = '-'
    output_df.at[index, 'intersection'] = loc_all.at[index, 'intersection']
    output_df.at[index, 'start_time'] = incident_df_2.at[index, 'start_time']
    output_df.at[index, 'end_time'] = incident_df_2.at[index, 'end_time']
    output_df.at[index, 'entry_time'] = incident_df_2.at[index, 'entry_time']

In [72]:
# Final Output DataFrame
output_df.head()

| | city | source | criticality | intersection | start_lat_long | end_lat_long | start_time | end_time | entry_time | bbox_top_left | bbox_bottom_right | date_requested | time_requested |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Houston, TX | Here Maps API | critical | {} | (29.6307, -95.16875) | (29.630614, -95.171866) | 01/17/2019 19:44:58 | 01/17/2019 20:12:10 | 01/17/2019 19:44:58 | ('30.1546646', '-95.823268') | ('29.522325', '-95.069705') | 2019-01-17 | 11:54 |
| 1 | Houston, TX | Here Maps API | critical | {} | (29.7369, -95.34869) | (29.73713, -95.34964) | 01/17/2019 19:03:41 | 01/18/2019 00:02:45 | 01/17/2019 19:03:41 | ('30.1546646', '-95.823268') | ('29.522325', '-95.069705') | 2019-01-17 | 11:54 |
| 2 | Houston, TX | Here Maps API | critical | {'ORIGIN': {'ID': '', 'STREET1': {'ADDRESS1': ... | (29.92321, -95.62916) | (29.92392, -95.62903) | 01/17/2019 19:20:14 | 01/18/2019 00:19:40 | 01/17/2019 19:20:14 | ('30.1546646', '-95.823268') | ('29.522325', '-95.069705') | 2019-01-17 | 11:54 |
| 3 | Houston, TX | Here Maps API | critical | {} | (29.630513, -95.171997) | (29.63044, -95.16984) | 01/17/2019 19:44:58 | 01/17/2019 20:12:10 | 01/17/2019 19:44:58 | ('30.1546646', '-95.823268') | ('29.522325', '-95.069705') | 2019-01-17 | 11:54 |
| 4 | Houston, TX | Here Maps API | critical | {} | (29.69674, -95.29231) | (29.69593, -95.29787) | 01/12/2019 23:10:07 | 02/27/2019 01:09:50 | 01/17/2019 19:20:14 | ('30.1546646', '-95.823268') | ('29.522325', '-95.069705') | 2019-01-17 | 11:54 |
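The `start_time`, `end_time`, and `entry_time` columns are still plain strings. As a rough sketch of how they could be used downstream, the snippet below parses two of them and computes how long a closure is expected to last. `closure_duration` is a hypothetical helper; the month/day/year format is inferred from values such as `02/27/2019` in the output above, and no time zone is assumed.

```python
from datetime import datetime

# Format inferred from the sample output, e.g. '01/17/2019 19:44:58'
TIME_FORMAT = '%m/%d/%Y %H:%M:%S'


def closure_duration(start_time, end_time):
    """Return end_time - start_time as a timedelta."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    return end - start


# Values taken from row 0 of the output table
print(closure_duration('01/17/2019 19:44:58', '01/17/2019 20:12:10'))  # 0:27:12
```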


In [71]:
# Write Output DataFrame to csv file
output_df.to_csv('../data/2-interim/here_maps_output_' + \
                 city + '_' + time.strftime('%Y-%m-%d-%I%p') + '.csv', index=False)
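Because `city` contains a comma and a space, the resulting file name does too. One possible way to build a tidier name is sketched below. `output_filename` is a hypothetical helper, and the 24-hour `%H%M` stamp is a deliberate departure from the `%I%p` format used above, so treat it as a suggestion rather than the notebook's convention.

```python
import datetime
import re


def output_filename(city, when, prefix='here_maps_output_'):
    """Build a file name with no spaces or commas and an explicit .csv extension."""
    # Collapse every run of non-alphanumeric characters into a single underscore
    slug = re.sub(r'[^A-Za-z0-9]+', '_', city).strip('_')
    return prefix + slug + '_' + when.strftime('%Y-%m-%d-%H%M') + '.csv'


print(output_filename('Houston, TX', datetime.datetime(2019, 1, 17, 11, 54)))
# here_maps_output_Houston_TX_2019-01-17-1154.csv
```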

### Future Updates

- Additional columns:
    - Type of incident: planned or unplanned (e.g. blocked by a tree, power failure)
- Add a check to ensure that the start and end coordinates of each incident differ when rounded to 4 decimal places (they should coincide only in rare cases).
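The coordinate check in the last item could look something like the sketch below. It has not been added to the pipeline: `same_to_4dp` is a hypothetical helper, the first pair of points comes from row 1 of the output, and the second pair is made up to show a match.

```python
def same_to_4dp(start, end, ndigits=4):
    """True if both latitude and longitude agree after rounding to ndigits places."""
    return all(round(a, ndigits) == round(b, ndigits) for a, b in zip(start, end))


# Row 1 of the output: the points differ in the fourth decimal place
print(same_to_4dp((29.7369, -95.34869), (29.73713, -95.34964)))      # False
# Made-up pair that only differs beyond the fourth decimal place
print(same_to_4dp((29.63051, -95.17199), (29.630513, -95.171997)))   # True
```

Rows for which the helper returns `True` could then be flagged for manual review rather than dropped.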