In [188]:
import pandas as pd
import numpy as np

import requests
import arventoapi

from statistics import median, mean
import re
from time import sleep

import folium

from sqlalchemy import create_engine
from sqlalchemy.types import Integer, Text, String, DateTime, Float, JSON, TIMESTAMP

In [18]:
mapbox_access_token = 'pk.eyJ1IjoicnByaWxpYW4iLCJhIjoiY2tiaHF0OW1rMDd4YjJ0bnp3aWo2cmhveiJ9.HnkKEamPoKRtm3Bp1WxeRg'

In [189]:
engine = create_engine('postgres+psycopg2://jcds:pwdk2020@127.0.0.1:5432/gpstrajectory')

Load Dataset from previous step

In [190]:
df_trip1 = pd.read_sql('trip_train_before_mm_1', engine)

## Spatial Feature Extraction: Map Matching

GPS Receiver works by listening to Satellites signal and then try to calculate its position by measuring time interval between signals. This is often called triangulation

But due to the complexity of the earth shape (which is round obviously, i don't want to argue about this) but not exactly perfect round. There could be some error in recording position. There is another factor where the gps receiver can lose its accuracy whenever it is located near tall building or covered areas

<img src="dashboard/static/gps-noise.png" alt="gps-noise" width="50%"/>

In order to correctly asses the spatial data, we need to perform **Map Matching**

A key component of Newson and Krumm’s approach are so called Hidden-Markov-Models which are based on Markov Chains. Markov Chains are popular as many random processes can be modeled with them. Given a set of states and a function that outputs the probability of transitioning from one state to another state it yields a mathematical model of the process. A typical question a Markov-Chain helps answer is: Given state A, how probable is state B after 2 steps?

[source](https://blog.mapbox.com/matching-gps-traces-to-a-map-73730197d0e2)

In the illustration below, by comparing Line coordinates of the road, the algorithm will snap P1 point closer to the road. However P3 is ambiguous whether it's following path B or C. But it can be traced with next trajectory point. If it's not available then P3 and P2 can be considered as noise (or outlier)

<img src="dashboard/static/map-matching.png" alt="gps-noise" width="50%"/>

And then, the next step would be route reconstruction to fix sparse trajectory due to recording interval by matching points to road segment. This is important to fix actual distance counting

<img src="dashboard/static/Route Reconstruction.png" alt="route-reconstruction" width="50%"/>

[MapBox](https://www.mapbox.com/) is the location data platform for mobile and web applications. They provide building blocks to add location features like maps, search, and navigation into any developer project. It is based on OpenStreetMap data and by it's nature of embracing Open Source, MapBox provide MapMatching API for free (with call limit)

Other notable projects: 

- OSRM
- Valhalla (by mapzen, it has been discontinued)
- Graphopper
- Google Maps
- Here


## Map Matching Example

**BLUE** line is original gps points received from hardware

**RED** line is simplified gps trajectory after map-matching

** zoom in for better details

After map-matching, with help of routing services. We can also extract data such as intersections count, how many turn left or right, how many trip segments are highway and not highway, etc.

However map information in Indonesia for OpenStreetMap (which is what MapBox use as base map) is not as much compared to other countries in North America and Europe. Perhaps if more use case presented in our country, we can gather much more spatial data like type of road surfaces, lane counts, traffic

In [184]:
df_test

Unnamed: 0,device_id,license_plate,driver,vehicle_group,departure_time,arrival_time,distance,interval,origin_region,destination_region,...,n_steps,n_left_turns,n_right_turns,n_u_turns,n_go_straight,matched_trajectory,repeat_mapmatch,n_tunnels,matched_distance,mapbox_est_duration
0,1021252,B9230SDC,SUHARYA,DC Ciputat,2020-04-01 09:16:24+07:00,2020-04-01 11:52:37+07:00,26.9,9373.0,DC Ciputat,DC Ciputat,...,49,18,12,0,4,"[[106.775084, -6.268343], [106.775367, -6.2714...",0,0,27.2884,4719.4


In [185]:
m = folium.Map(location=[-6.267437514154206, 106.79388999938965], zoom_start=14)

original_trajectory = [ [float(t.split(',')[1]),float(t.split(',')[0])] for t in df_test.iloc[0]['trajectory_arr']]
matched_trajectory = [[t[1],t[0]] for t in df_test.iloc[0]['matched_trajectory']]

folium.PolyLine(locations=original_trajectory, color="blue", weight=3, opacity=.7, tooltip='original trajectory').add_to(m)
folium.PolyLine(locations=matched_trajectory, color="red", weight=3, opacity=.7, tooltip='matched trajectory').add_to(m)

for i, each in zip(range(len(matched_trajectory)), matched_trajectory): 
    if i == 0:
        folium.Marker(each, icon=folium.Icon(color='red'), tooltip='origin').add_to(m)
    elif i == len(matched_trajectory) - 1:
        folium.Marker(each, icon=folium.Icon(color='green'), tooltip='destination').add_to(m)
    else:
        folium.Marker(each, icon=folium.Icon(color='blue')).add_to(m)

m

We also reduce the number of points by map-matching.

In [159]:
len(df_test.iloc[0]['trajectory_arr'])

392

In [177]:
len(df_test.iloc[0]['matched_trajectory'])

93

### Map Matching using MapBox MapMatching API

In [241]:
re_right = re.compile(r"\bright\b", flags=re.I)
re_left = re.compile(r"\bleft\b", flags=re.I)
re_uturn = re.compile(r"\buturn\b", flags=re.I)
re_straight = re.compile(r"\bstraight\b", flags=re.I)

def mapbox_request(coordinates, timestamps, row_idx, use_timestamp=False ):
    data_ = {
             'coordinates': ';'.join(coordinates),
             'geometries': 'geojson',
             'steps': 'true',
             'tidy': 'true',
             'annotations': 'duration,distance,speed',
             'waypoints': '{};{}'.format(0, len(coordinates) - 1),
             'waypoint_names': 'origin;destination'
         }
    
    if (use_timestamp):
        data_['timestamps'] = ';'.join([str(int(t)) for t in timestamps])
    
    
    # continue the loop if request error. 
    try:
        res = requests.post('https://api.mapbox.com/matching/v5/mapbox/driving-traffic/?access_token={}'.format(mapbox_access_token),
                 data = data_)
    except requests.exceptions.RequestException as e:
        return {
            'error': True
        }
    
    # in case parsing json failed or invalid server response format
    try:
        direction_res = res.json()
    except:
        return {
            'error': True
        }
    
    
    n_intersections = 0
    n_tolls = 0
    n_bridges = 0
    n_tunnels = 0
    n_motorways = 0
    
    n_steps = 0
    n_left_turns = 0
    n_right_turns = 0
    n_u_turns = 0
    n_go_straight = 0
    matched_distance = 0
    est_duration = 0
    
    geometry = []
    
    if 'matchings' in direction_res:
    
        for match in direction_res['matchings']:

            if 'distance' in match:
                matched_distance += match['distance']
            
            if 'geometry' in match:
                geometry = match['geometry']['coordinates']
                
            if 'duration' in match:
                est_duration += match['duration']
            
            for leg in match['legs']:
                n_steps += len(leg['steps'])

                for step in leg['steps']:
                    if 'intersections' in step:

                        for intersection in step['intersections']:
                            n_intersections +=1
                            if 'classes' in intersection:
                                if 'toll' in intersection['classes']:
                                    n_tolls += 1
                                if 'bridge' in intersection['classes']:
                                    n_bridges += 1
                                if 'tunnel' in intersection['classes']:
                                    n_tunnels += 1
                                if 'motorway' in intersection['classes']:
                                    n_motorways += 1
                    
                    if 'modifier' in step['maneuver']:
                        if re.match(re_uturn, step['maneuver']['modifier']) is not None:
                            n_u_turns += 1
                        if re.match(re_left, step['maneuver']['modifier']) is not None:
                            n_left_turns += 1
                        if re.match(re_right, step['maneuver']['modifier']) is not None:
                            n_right_turns += 1
                        if re.match(re_straight, step['maneuver']['modifier']) is not None:
                            n_go_straight += 1
                    
                    
        return {
            'error': False,
            'n_intersections': n_intersections,
            'n_tolls': n_tolls,
            'n_bridges': n_bridges,
            'n_tunnels': n_tunnels,
            'n_motorways': n_motorways,
            'n_steps': n_steps,
            'n_left_turns': n_left_turns,
            'n_right_turns': n_right_turns,
            'n_u_turns': n_u_turns,
            'n_go_straight': n_go_straight,
            'matched_distance': matched_distance,
            'mapbox_est_duration': est_duration,
            'geometry': geometry
        }
    
    else:
        print(row_idx, direction_res)
        return {
            'error': True
        }

    

In [242]:
def map_match(x):
    

    
    if (len(x['trajectory_arr']) > 100):
        # split array to multiple sub array max 100 item each
        # free mapbox account only limit 100 trajectory at one request     
        n = 100
        trajectory_arr_match = [x['trajectory_arr'][i * n:(i + 1) * n] for i in range((len(x['trajectory_arr']) + n - 1) // n )]
        timestamps_match = [x['timestamps'][i * n:(i + 1) * n] for i in range((len(x['timestamps']) + n - 1) // n )]
    else:
        trajectory_arr_match = [x['trajectory_arr']]
        timestamps_match = [x['timestamps']]
        
    
    n_intersections = 0
    n_tolls = 0
    n_bridges = 0
    n_tunnels = 0
    n_motorways = 0
    
    n_steps = 0
    n_left_turns = 0
    n_right_turns = 0
    n_u_turns = 0
    n_go_straight = 0
    matched_distance = 0
    est_duration = 0
    
    geometry = []
    
    error = False
    
    for trajs, times in zip(trajectory_arr_match,timestamps_match):
        if (len(trajs) > 1): 
            match_ = mapbox_request(trajs, times, x.name)

            if (not match_['error']):
                
                # aggregate multiple calls into one data
                n_intersections += match_['n_intersections']
                n_tolls += match_['n_tolls']
                n_bridges += match_['n_bridges']
                n_tunnels += match_['n_tunnels']
                n_motorways += match_['n_motorways']
                n_steps += match_['n_steps']
                n_left_turns += match_['n_left_turns']
                n_right_turns += match_['n_right_turns']
                n_u_turns += match_['n_u_turns']
                n_go_straight += match_['n_go_straight']
                geometry += match_['geometry']
                matched_distance += match_['matched_distance']
                est_duration += match_['mapbox_est_duration']
            else:
                error = True
                
#             sleep(0.2)
         
    x['n_intersections'] = n_intersections
    x['n_tolls'] = n_tolls
    x['n_bridges'] = n_bridges
    x['n_tunnels'] = n_tunnels
    x['n_motorways'] = n_motorways
    
    x['n_steps'] = n_steps
    x['n_left_turns'] = n_left_turns
    x['n_right_turns'] = n_right_turns
    x['n_u_turns'] = n_u_turns
    x['n_go_straight'] = n_go_straight
    x['matched_distance'] = matched_distance / 1000 # in km
    x['mapbox_est_duration'] = est_duration
    
    
    x['matched_trajectory'] = geometry
    
    x['repeat_mapmatch'] = 1 if error else 0
    
    print('finished processing row {}'.format(x.name))
    
    return x
        

In [253]:
for i, row in df_trip1[df_trip1['repeat_mapmatch'] == 1].iterrows():
    ser = map_match(row)
    df_trip1.at[i, 'n_intersections'] = ser['n_intersections']
    df_trip1.at[i, 'n_tolls'] = ser['n_tolls']
    df_trip1.at[i, 'n_bridges'] = ser['n_bridges']
    df_trip1.at[i, 'n_tunnels'] = ser['n_tunnels']
    df_trip1.at[i, 'n_motorways'] = ser['n_motorways']
    df_trip1.at[i, 'n_steps'] = ser['n_steps']
    df_trip1.at[i, 'n_left_turns'] = ser['n_left_turns']
    df_trip1.at[i, 'n_right_turns'] = ser['n_right_turns']
    df_trip1.at[i, 'n_u_turns'] = ser['n_u_turns']
    df_trip1.at[i, 'n_go_straight'] = ser['n_go_straight']
    df_trip1.at[i, 'matched_distance'] = ser['matched_distance']
    df_trip1.at[i, 'mapbox_est_duration'] = ser['mapbox_est_duration']
    df_trip1.at[i, 'matched_trajectory'] = ser['matched_trajectory']
    df_trip1.at[i, 'repeat_mapmatch'] = ser['repeat_mapmatch']

3431 {'message': 'Subsequent coordinates may at most be 5000 meters away from each other', 'code': 'InvalidInput'}
finished processing row 3431
5703 {'message': 'Subsequent coordinates may at most be 5000 meters away from each other', 'code': 'InvalidInput'}
finished processing row 5703
5811 {'message': 'Subsequent coordinates may at most be 5000 meters away from each other', 'code': 'InvalidInput'}
finished processing row 5811
8053 {'message': 'Subsequent coordinates may at most be 5000 meters away from each other', 'code': 'InvalidInput'}
finished processing row 8053
8401 {'message': 'Subsequent coordinates may at most be 5000 meters away from each other', 'code': 'InvalidInput'}
finished processing row 8401
8417 {'message': 'Subsequent coordinates may at most be 5000 meters away from each other', 'code': 'InvalidInput'}
finished processing row 8417
8423 {'message': 'Subsequent coordinates may at most be 5000 meters away from each other', 'code': 'InvalidInput'}
finished processing r

Some row has been marked invalid due to anomaly in data trajectories. I decided to remove them from the train data

In [264]:
df_trip1.loc[[3431,5703,5811,8053,8401,8417,8423]]

Unnamed: 0,index,device_id,license_plate,driver,vehicle_group,departure_time,arrival_time,distance,interval,origin_region,...,n_motorways,n_steps,n_left_turns,n_right_turns,n_u_turns,n_go_straight,matched_distance,mapbox_est_duration,matched_trajectory,repeat_mapmatch
3431,3392,1021280,B9186SCE,YOHANSA,DC Ciputat,2020-04-03 01:10:39+00:00,2020-04-03 04:14:08+00:00,33.3,11009.0,DC Ciputat,...,0.0,20.0,5.0,5.0,10.0,0.0,9.8405,1795.1,"[[106.779222, -6.298688], [106.776177, -6.3003...",1
5703,5668,1021357,B9294EI,SIMSON PAKPAHAN,DC Kawasan,2020-04-01 02:28:29+00:00,2020-04-01 05:04:21+00:00,24.98,9352.0,DC Kawasan,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[],1
5811,5782,1021363,B9858SDB,HERMAN SAPUTRA,DC Kawasan,2020-04-20 02:21:32+00:00,2020-04-20 04:32:37+00:00,13.52,7865.0,DC Kawasan,...,0.0,11.0,1.0,6.0,1.0,0.0,9.2327,1130.6,"[[106.910667, -6.189393], [106.906397, -6.1892...",1
8053,8036,1021480,D8255DD,DADAN M RAMDAN,DC Bandung,2020-04-22 02:53:14+00:00,2020-04-22 05:07:01+00:00,24.14,8027.0,DC Bandung,...,0.0,26.0,11.0,7.0,1.0,1.0,12.3951,2199.4,"[[107.663087, -6.941463], [107.663681, -6.9414...",1
8401,8386,792168,B9922SDB,HARI PERMANA,DC Kawasan,2020-04-02 03:53:10+00:00,2020-04-02 07:14:43+00:00,44.8,12093.0,DC Kawasan,...,4.0,9.0,1.0,1.0,0.0,0.0,10.0269,924.7,"[[106.890105, -6.244167], [106.892919, -6.2437...",1
8417,8399,792168,B9922SDB,HARI PERMANA,DC Kawasan,2020-04-14 03:30:23+00:00,2020-04-14 08:51:04+00:00,88.49,19241.0,DC Kawasan,...,24.0,73.0,26.0,16.0,1.0,2.0,45.6967,5376.4,"[[106.877739, -6.24084], [106.877277, -6.24276...",1
8423,8411,792168,B9922SDB,HARI PERMANA,DC Kawasan,2020-04-23 03:04:52+00:00,2020-04-23 03:39:21+00:00,22.49,2069.0,KWS_MM INDOMARET JAKAMULYA (T3ZR),...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[],1


In [265]:
df_failed = df_trip1.loc[[3431,5703,5811,8053,8401,8417,8423]].copy()

In [266]:
df_failed

Unnamed: 0,index,device_id,license_plate,driver,vehicle_group,departure_time,arrival_time,distance,interval,origin_region,...,n_motorways,n_steps,n_left_turns,n_right_turns,n_u_turns,n_go_straight,matched_distance,mapbox_est_duration,matched_trajectory,repeat_mapmatch
3431,3392,1021280,B9186SCE,YOHANSA,DC Ciputat,2020-04-03 01:10:39+00:00,2020-04-03 04:14:08+00:00,33.3,11009.0,DC Ciputat,...,0.0,20.0,5.0,5.0,10.0,0.0,9.8405,1795.1,"[[106.779222, -6.298688], [106.776177, -6.3003...",1
5703,5668,1021357,B9294EI,SIMSON PAKPAHAN,DC Kawasan,2020-04-01 02:28:29+00:00,2020-04-01 05:04:21+00:00,24.98,9352.0,DC Kawasan,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[],1
5811,5782,1021363,B9858SDB,HERMAN SAPUTRA,DC Kawasan,2020-04-20 02:21:32+00:00,2020-04-20 04:32:37+00:00,13.52,7865.0,DC Kawasan,...,0.0,11.0,1.0,6.0,1.0,0.0,9.2327,1130.6,"[[106.910667, -6.189393], [106.906397, -6.1892...",1
8053,8036,1021480,D8255DD,DADAN M RAMDAN,DC Bandung,2020-04-22 02:53:14+00:00,2020-04-22 05:07:01+00:00,24.14,8027.0,DC Bandung,...,0.0,26.0,11.0,7.0,1.0,1.0,12.3951,2199.4,"[[107.663087, -6.941463], [107.663681, -6.9414...",1
8401,8386,792168,B9922SDB,HARI PERMANA,DC Kawasan,2020-04-02 03:53:10+00:00,2020-04-02 07:14:43+00:00,44.8,12093.0,DC Kawasan,...,4.0,9.0,1.0,1.0,0.0,0.0,10.0269,924.7,"[[106.890105, -6.244167], [106.892919, -6.2437...",1
8417,8399,792168,B9922SDB,HARI PERMANA,DC Kawasan,2020-04-14 03:30:23+00:00,2020-04-14 08:51:04+00:00,88.49,19241.0,DC Kawasan,...,24.0,73.0,26.0,16.0,1.0,2.0,45.6967,5376.4,"[[106.877739, -6.24084], [106.877277, -6.24276...",1
8423,8411,792168,B9922SDB,HARI PERMANA,DC Kawasan,2020-04-23 03:04:52+00:00,2020-04-23 03:39:21+00:00,22.49,2069.0,KWS_MM INDOMARET JAKAMULYA (T3ZR),...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[],1


In [267]:
df_trip1.drop([3431,5703,5811,8053,8401,8417,8423], axis='index', inplace=True)

Unnamed: 0,index,device_id,license_plate,driver,vehicle_group,departure_time,arrival_time,distance,interval,origin_region,destination_region,departure_hour,trip_time_cat,trip_type,average_speed,max_speed,trajectory_arr,timestamps,trajectory_size
0,0,1019939,B9338SDB,ASEP BACHTIAR,DC Cikarang,2020-04-01 03:19:41+00:00,2020-04-01 06:55:25+00:00,74.410,12944.0,DC Cikarang,DC Cikarang,10,morning,round-trip,35.357881,78.0,"[107.151215,-6.363901, 107.151634,-6.363297, 1...","[1585736386.0, 1585736396.0, 1585736407.0, 158...",774
1,1,1019939,B9338SDB,ASEP BACHTIAR,DC Cikarang,2020-04-02 01:48:12+00:00,2020-04-02 03:58:57+00:00,49.020,7845.0,DC Cikarang,DC Cikarang,9,morning,round-trip,36.252446,73.0,"[107.151268,-6.363881, 107.151733,-6.363121, 1...","[1585817299.0, 1585817310.0, 1585817320.0, 158...",511
2,2,1019939,B9338SDB,ASEP BACHTIAR,DC Cikarang,2020-04-02 04:23:46+00:00,2020-04-02 07:09:46+00:00,54.490,9960.0,DC Cikarang,DC Cikarang,11,noon,round-trip,33.583471,74.0,"[107.151344,-6.363778, 107.151772,-6.363103, 1...","[1585826633.0, 1585826643.0, 1585826652.0, 158...",605
3,3,1019939,B9338SDB,ASEP BACHTIAR,DC Cikarang,2020-04-03 03:03:34+00:00,2020-04-03 07:07:19+00:00,77.420,14625.0,DC Cikarang,DC Cikarang,10,morning,round-trip,31.054230,73.0,"[107.151115,-6.364189, 107.151421,-6.363726, 1...","[1585908214.0, 1585908223.0, 1585908233.0, 158...",922
4,4,1019939,B9338SDB,ASEP BACHTIAR,DC Cikarang,2020-04-06 01:30:51+00:00,2020-04-06 03:22:00+00:00,42.300,6669.0,DC Cikarang,DC Cikarang,9,morning,round-trip,39.343980,82.0,"[107.151237,-6.363938, 107.151596,-6.363309, 1...","[1586161856.0, 1586161866.0, 1586161876.0, 158...",407
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8552,8544,792173,B9909SDB,YULIANTO,DC Kawasan,2020-04-29 05:18:16+00:00,2020-04-29 05:19:21+00:00,0.453,65.0,KWS_MM SAT-J506-ARABICA,KWS_MM IDM-TCXA-PONDOK KOPI RAYA 179,12,noon,point-to-point,25.666667,30.0,"[106.942955,-6.224756, 106.943359,-6.225307, 1...","[1588162703.0, 1588162713.0, 1588162723.0, 158...",6
8553,8545,792173,B9909SDB,YULIANTO,DC Kawasan,2020-04-29 05:25:09+00:00,2020-04-29 05:33:48+00:00,5.316,519.0,KWS_MM IDM-TCXA-PONDOK KOPI RAYA 179,KWS_MM SAT-J527-KALIMALANG 2,12,noon,point-to-point,39.060000,62.0,"[106.943855,-6.227672, 106.943741,-6.226854, 1...","[1588163117.0, 1588163127.0, 1588163137.0, 158...",50
8554,8546,792173,B9909SDB,YULIANTO,DC Kawasan,2020-04-29 05:42:34+00:00,2020-04-29 05:55:47+00:00,5.721,793.0,KWS_MM SAT-J527-KALIMALANG 2,KWS_MM IDM-T8QC-RADEN INTEN 2 5BC,13,noon,point-to-point,25.475610,53.0,"[106.92382,-6.247655, 106.923882,-6.247692, 10...","[1588164160.0, 1588164162.0, 1588164171.0, 158...",82
8555,8547,792173,B9909SDB,YULIANTO,DC Kawasan,2020-04-29 06:00:21+00:00,2020-04-29 06:02:31+00:00,1.635,130.0,KWS_MM IDM-T8QC-RADEN INTEN 2 5BC,KWS_MM IDM-TJOP-JENDRAL R.S SUKANTO,13,noon,point-to-point,42.285714,61.0,"[106.923424,-6.222606, 106.923439,-6.223897, 1...","[1588165228.0, 1588165238.0, 1588165248.0, 158...",14


In [252]:
df_trip1['repeat_mapmatch'].value_counts()

0    8550
1       7
Name: repeat_mapmatch, dtype: int64

### Add hour of the day and day of week

A formal definition for travel time reliability is: the consistency or dependability in travel times, as measured from day-to-day and/or across different times of the day.

[source](https://ops.fhwa.dot.gov/publications/tt_reliability/TTR_Report.htm)

In [280]:
df_trip1['day_of_week'] = df_trip1.apply(lambda x: x['departure_time'].dayofweek, axis=1)

I do have departure_hour which is in float. In Jakarta, if you missed by 15 mins for example, the travel time can be much differ. But for visualization purpose, i will also have integer type hour (smaller category)

In [281]:
df_trip1['hour_of_day'] = df_trip1.apply(lambda x: x['departure_time'].hour, axis=1)

### Save to pgsql

In [282]:
from sqlalchemy.types import Integer, Text, String, DateTime, Float,JSON, TIMESTAMP
df_trip1.to_sql("trip_train_after_mm_2",
           engine,
           if_exists='replace',
           index=True,
           chunksize=50,
           dtype={
                'device_id': String,
                'license_plate': String,
                'driver': String,
                'vehicle_group': String,
                'departure_time': TIMESTAMP(timezone=True),
                'arrival_time': TIMESTAMP(timezone=True),
                'distance': Float,
                'interval': Float,
                'origin_region': String,
                'destination_region': String,
                'departure_hour': Float,
                'day_of_week': Integer,
                'hour_of_day': Integer,
                'trip_time_cat': String,
                'trip_type': String,
                'average_speed': Float,
                'max_speed': Float,
                'trajectory_arr': JSON,
                'timestamps': JSON,
                'trajectory_size': Integer,
                'n_intersections': Integer,
                'n_tolls': Integer,
                'n_motorways': Integer,
                'n_bridges': Integer,
                'n_tunnels': Integer,
                'n_steps': Integer,
                'n_left_turns': Integer,
                'n_right_turns': Integer,
                'n_u_turns': Integer,
                'n_go_straight': Integer,
                'matched_distance': Float,
                'mapbox_est_duration': Float,
                'matched_trajectory': JSON
           })
