# Travel Time Decomposition Methods Using Rawnav Data

## Introduction

This notebook explores a number of options for decomposing vehicle travel time based on October 2019 WMATA data. Key objectives for a new decomposition method may include: 

- Differentiate between signal delay and queue delay within overall traffic delay
- Apply the decomposition method to an evaluation segment with a number of bus stops or queue jump intersections
- Apply decomposition methods without needing to incorporate data on the locations of stops or intersections
- Support evaluation of a variety of transit priority improvements, including queue jumps, bus-only lanes, transit signal priority, as well as other evaluation purposes.

During this exploratory phase, some of these objectives may be contradictory.

....\[Placeholder for conclusions from notebook\]


## Environment Setup


In [1]:
import os, sys, glob, pandas as pd, geopandas as gpd
import pyarrow as pa, pyarrow.parquet as pq
from itertools import product

# Local paths to the project repo and the processed data; adjust for your machine
path_repo = 'C:/Users/WylieTimmerman/Documents/projects_local/WMATA_AVL_datamart'
sys.path.append(path_repo)
path_data = 'C:/Users/WylieTimmerman/Documents/projects_local/wmata_avl_local/data/02-processed'
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# Suppress warnings unless the user requested them on the command line
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")
%load_ext autoreload
%autoreload 2

# EPSG code for NAD83 / Maryland State Plane (US feet)
wmata_crs = 2248

import wmatarawnav as wr

## Data Load

This notebook explores decomposition methods using Metrobus Routes 70 and 79 on Georgia Avenue. Both will be examined in the southbound direction during the AM Peak of a particular Tuesday in October 2019. Using these two routes lets us compare decomposition results on a shared corridor where one route makes local stops and the other operates in a skip-stop fashion.

The rawnav data is reloaded from pre-processed Parquet files from the Queue Jump Effectiveness study.

In [2]:
analysis_routes =  ['70','79']
analysis_days = ['Tuesday']

In [17]:
rawnav_raw = (
    wr.read_cleaned_rawnav(
        analysis_routes_= analysis_routes,
        analysis_days_ = analysis_days,
        path = os.path.join(
            path_data,
            "rawnav_data_2019.parquet"
        )
    )
)

We also reload a segment summary table. This table identifies which records met certain quality requirements for use in the Queue Jump Effectiveness Study. While the analyses below are not segment-specific, we'll filter down to the 'good' records here to save ourselves some data cleaning effort.

In [12]:
combo = (
    pd.DataFrame(
        list(
            product(
                analysis_routes,
                analysis_days,
                ['georgia_columbia_stub']
            )
        ),
        columns = ['route','wday','seg_name_id']
    )
)


combo_zip = zip(combo.route, combo.wday, combo.seg_name_id)


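# One AND-group of predicates per (route, day, segment) combination;
# pyarrow ORs the groups together when reading
filter_parquet = [
    [('route', '=', route), ('wday', '=', day), ('seg_name_id', '=', seg_name_id)]
    for route, day, seg_name_id in combo_zip
]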


In [13]:
segment_summary = (
        pq.read_table(
            source = os.path.join(path_data,"segment_summary_2019.parquet"),
            filters = filter_parquet,
            use_pandas_metadata = True
        )
        .to_pandas()
    )
segment_summary_fil = (
        segment_summary
        .query('~(flag_too_far_any\
                  | flag_wrong_order_any\
                  | flag_too_long_odom\
                  | flag_secs_total_mismatch\
                  | flag_odom_total_mismatch)'
        )
    )

In [18]:
run_keys = ['filename', 'index_run_start']  # together these identify one run
rawnav_raw_fil = rawnav_raw[
    rawnav_raw[run_keys].agg(tuple, 1)
    .isin(segment_summary_fil[run_keys].agg(tuple, 1))
]


In [20]:
rawnav_fil = wr.calc_rolling_vals(rawnav_raw_fil)

In [21]:
rawnav_fil.head()

Unnamed: 0,index_loc,lat,long,heading,door_state,veh_state,odom_ft,sec_past_st,sat_cnt,stop_window,blank,lat_raw,long_raw,row_before_apc,route_pattern,pattern,index_run_start,index_run_end,filename,start_date_time,route,wday,odom_ft_next,sec_past_st_next,odom_ft_next3,sec_past_st_next3,secs_marg,odom_ft_marg,fps_next,fps_next3
1475891,8940.0,38.993232,-77.029978,150.0,C,S,0.0,0.0,8.0,,3.0,38.993232,-77.029978,0.0,7901,1,8940.0,10862.0,rawnav02626191023.txt,2019-10-22 17:50:39,79,Tuesday,0.0,1.0,0.0,217.0,1.0,0.0,0.0,0.0
1475892,8941.0,38.993232,-77.029978,150.0,C,S,0.0,1.0,8.0,E00,1.0,38.993232,-77.029978,0.0,7901,1,8940.0,10862.0,rawnav02626191023.txt,2019-10-22 17:50:39,79,Tuesday,0.0,202.0,0.0,221.0,201.0,0.0,0.0,0.0
1475893,8942.0,38.993232,-77.029978,150.0,O,S,0.0,202.0,8.0,,1.0,38.993232,-77.029978,1.0,7901,1,8940.0,10862.0,rawnav02626191023.txt,2019-10-22 17:50:39,79,Tuesday,0.0,217.0,0.0,224.0,15.0,0.0,0.0,0.0
1475894,8944.0,38.993232,-77.029978,150.0,C,S,0.0,217.0,8.0,,2.0,38.993232,-77.029978,0.0,7901,1,8940.0,10862.0,rawnav02626191023.txt,2019-10-22 17:50:39,79,Tuesday,0.0,221.0,10.0,306.0,4.0,0.0,0.0,0.11236
1475895,8945.0,38.993232,-77.029978,150.0,O,S,0.0,221.0,8.0,,2.0,38.993232,-77.029978,1.0,7901,1,8940.0,10862.0,rawnav02626191023.txt,2019-10-22 17:50:39,79,Tuesday,0.0,224.0,28.0,308.0,3.0,0.0,0.0,0.321839


When we focus on details, we'll pull out a few particular trip instances. The table below lists southbound trip instances from Routes 70 and 79, all of which started between 7:00 AM and 7:40 AM on Tuesday, October 1st, 2019. Each instance is uniquely identified by its filename and the index at which the run began. In the future, we'll want to replace these identifiers with something more meaningful and more space-efficient.

In [14]:
use_runs = (
    wr.tribble(
        ['route', 'filename','index_run_start'],
        # Route 70 trip instances
        "70", "rawnav06468191002.txt", 1009,
        "70", "rawnav06438191002.txt", 3331,
        "70", "rawnav05443191002.txt", 1025,
        "70", "rawnav02807191002.txt", 1477,
        "70", "rawnav02811191002.txt", 2152,
        "70", "rawnav06463191002.txt", 271,
        # Route 79 trip instances
        "79", "rawnav06468191002.txt", 1009,
        "79", "rawnav06438191002.txt", 3331,
        "79", "rawnav02807191002.txt", 1477,
        "79", "rawnav02811191002.txt", 2152,
        "79", "rawnav06463191002.txt", 271
    )
)

use_runs


Unnamed: 0,route,filename,index_run_start
0,70,rawnav06468191002.txt,1009
1,70,rawnav06438191002.txt,3331
2,70,rawnav05443191002.txt,1025
3,70,rawnav02807191002.txt,1477
4,70,rawnav02811191002.txt,2152
5,70,rawnav06463191002.txt,271
6,79,rawnav06468191002.txt,1009
7,79,rawnav06438191002.txt,3331
8,79,rawnav02807191002.txt,1477
9,79,rawnav02811191002.txt,2152
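
Because each trip instance is identified by its filename and run start index, one way to pull the selected runs out of a larger table is an inner merge on those two columns. The sketch below uses small made-up tables (`rawnav_demo` and `use_runs_demo` are hypothetical stand-ins, not the notebook's data) to illustrate the idea.

```python
import pandas as pd

# Toy stand-ins for the rawnav table and the selected runs (illustrative values only)
rawnav_demo = pd.DataFrame({
    "filename": ["a.txt", "a.txt", "b.txt", "b.txt", "c.txt"],
    "index_run_start": [10, 10, 20, 30, 40],
    "odom_ft": [0.0, 12.0, 0.0, 5.0, 0.0],
})
use_runs_demo = pd.DataFrame({
    "filename": ["a.txt", "b.txt"],
    "index_run_start": [10, 30],
})

# An inner merge keeps only the rows whose (filename, index_run_start)
# pair appears in the list of selected runs
rawnav_selected = rawnav_demo.merge(
    use_runs_demo, on=["filename", "index_run_start"], how="inner"
)
```

Merging on both columns matters because a single rawnav file can contain several runs; the filename alone would not isolate one trip instance.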


## Decomposition Methods

### Background on Existing Decomposition Method

The method used in the 2020 Queue Jump Effectiveness Study can be used to decompose travel times around a single queue jump intersection within a pre-defined evaluation segment. That process takes place as follows:

1. Within the stop area of the evaluation segment (+/- 150 ft from the stop), vehicle movements are first decomposed into:

    - a deceleration phase
    - stopping time before doors open
    - dwell time to serve passengers
    - stopping time with doors closed (including repeated door openings and closings after the first instance at a stop)
    - an acceleration phase

    Vehicles that do not serve passengers within the stop area are handled separately.

2. From the acceleration and deceleration phases and overall travel time through the segment, freeflow travel times and freeflow acceleration and deceleration times are calculated.
3. By removing the segment-level values identified in Step 2, running time can be decomposed into:

    - Tff: Free-flow travel time (calculated at segment level)
    - Tstop1: Time spent serving passengers (varies by trip instance)
    - Tstop2: Time spent decelerating to and accelerating from the stop (calculated at segment level) plus any additional time the vehicle spent stopped immediately before opening its doors
    - Ttraffic: The remainder of the run time after subtracting the components above. Notably, this value does not distinguish between signal delay (i.e., time spent stopped at a light) and queue delay (i.e., additional delay due to interactions with other vehicles or pedestrians).
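
The arithmetic in Step 3 can be sketched as follows. This is a minimal illustration rather than the study's implementation: all column names, the two segment-level constants, and the trip values are hypothetical, and only trips that served the stop are included.

```python
import pandas as pd

# Hypothetical per-trip inputs in seconds (not values from the study)
trips = pd.DataFrame({
    "trip_id": ["t1", "t2"],
    "t_segment": [95.0, 140.0],       # total run time through the segment
    "t_stop1": [20.0, 35.0],          # time spent serving passengers
    "t_wait_pre_door": [3.0, 10.0],   # stopped time just before doors open
})

# Segment-level constants, assumed to have been estimated already in Step 2
T_FF = 45.0           # free-flow travel time through the segment
T_ACCEL_DECEL = 12.0  # time spent decelerating to and accelerating from the stop

# Tstop2 combines the segment-level accel/decel time with each trip's
# stopped time immediately before the doors opened
trips["t_stop2"] = T_ACCEL_DECEL + trips["t_wait_pre_door"]

# Ttraffic is whatever run time remains after the other components are removed
trips["t_traffic"] = trips["t_segment"] - T_FF - trips["t_stop1"] - trips["t_stop2"]
```

Because Ttraffic is a residual, any error in the segment-level estimates flows directly into it, and signal delay and queue delay remain lumped together.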

\[insert quick demonstration of previous decomposition\]

This decomposition approach had several challenges:

- 

An alternative decomposition 

### Metro Transit Decomposition Method

#### Description

Metro Transit's paper **Diagnosing Obstacles to Speed and Reliability with High-resolution AVL Data: Bus Time Budgets** uses the following decomposition:

- In Motion Time
    - <5 mph
    - \>=5 mph
- Stop Zone Time
    - Passenger 
    - Non-Passenger
        - Schedule Adherence Dwell
        - Freeflow
        - Delay
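
As a rough illustration of the top-level categories above, the sketch below buckets per-record time into stop zone time and the two in-motion speed bands. The column names (`secs`, `fps`, `in_stop_zone`) and values are hypothetical, and the rule that stop zone time takes precedence over the speed bands is an assumption rather than something taken from the paper. The further split of stop zone time into passenger and non-passenger components is not attempted here.

```python
import numpy as np
import pandas as pd

MPH_TO_FPS = 5280 / 3600    # 1 mph expressed in feet per second
SLOW_FPS = 5 * MPH_TO_FPS   # the 5 mph threshold, in feet per second

# Hypothetical records: seconds elapsed, speed over the interval, stop-zone flag
pings = pd.DataFrame({
    "secs": [2.0, 3.0, 4.0, 1.0, 5.0],
    "fps": [30.0, 4.0, 0.0, 10.0, 0.0],
    "in_stop_zone": [False, False, True, False, True],
})

# Conditions are evaluated in order, so stop-zone records are labeled first
# and only the remaining records are split by speed (an assumed precedence)
conditions = [pings["in_stop_zone"], pings["fps"] < SLOW_FPS]
choices = ["stop_zone", "in_motion_lt_5mph"]
pings["category"] = np.select(conditions, choices, default="in_motion_ge_5mph")

# Total seconds in each category: a simple "time budget" for these records
budget = pings.groupby("category")["secs"].sum()
```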

This approach is intended to help determine which transit priority improvements will be most effective given the challenges a route or segment faces:

![](./image/metrotransit-evaluation.PNG)

This meets the following objectives for our decomposition:


In [15]:
table_mt = (
    wr.tribble(
        ['objective','Metro_Transit'],
        "Separate signal delay and queue delay", "No",
        "Apply to large segment", "Yes",
        "Can avoid incorporating external data", "Yes",
        "Can evaluate many transit priority improvements", "Yes"
    )
)

table_mt

Unnamed: 0,objective,Metro_Transit
0,Separate signal delay and queue delay,No
1,Apply to large segment,Yes
2,Can avoid incorporating external data,Yes
3,Can evaluate many transit priority improvements,Yes


Like the Queue Jump Effectiveness Study approach, this method requires establishing a free-flow travel time through a stop area in order to identify delay. This means calculating a free-flow time across all trips in a stop area; the free-flow time is typically found among trips that operate in the early morning or late at night. Because some limited-stop and express routes operate only during peak periods, it may be difficult to determine what truly "free-flow" conditions are for these routes. As a result, delay estimates may be biased downward for these routes because "free-flow" travel times are high even among their fastest trips.
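
The downward bias can be seen in a small made-up example. The sketch below assumes the free-flow time is simply the fastest observed trip on each route (a low percentile would be a more robust choice); the routes, periods, and travel times are all hypothetical.

```python
import pandas as pd

# Hypothetical stop-area travel times in seconds (not observed data)
times = pd.DataFrame({
    "route": ["local"] * 6 + ["express"] * 4,
    "period": ["early", "early", "peak", "peak", "peak", "late",
               "peak", "peak", "peak", "peak"],
    "secs": [20.0, 22.0, 40.0, 45.0, 38.0, 21.0,
             36.0, 42.0, 39.0, 44.0],
})

# Assumption: the fastest observed trip on each route stands in for free-flow
free_flow = times.groupby("route")["secs"].min()

# Delay is each trip's travel time in excess of its route's free-flow time
times["delay"] = times["secs"] - times["route"].map(free_flow)

# Average peak-period delay by route
peak_delay = times[times["period"] == "peak"].groupby("route")["delay"].mean()
```

In this toy data both routes have similar peak travel times, but the express route has no off-peak trips, so its "free-flow" time is itself a congested time and its estimated peak delay comes out much smaller than the local route's.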

This approach also does not explicitly distinguish signal delay from queue delay, though the authors suggest that the locations of signals could be associated with the results to make this determination.

#### Application to WMATA data