In [74]:
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Import
For this to work, the `01SEP2017.csv` and `01NOV2017.csv` files need to be placed in the `data/raw` folder. The notebook reads them from `../data/raw`, relative to its own location.

In [26]:
SEP_CSV = Path('../data/raw') / '01SEP2017.csv'
NOV_CSV = Path('../data/raw') / '01NOV2017.csv'

raw_sep_df = pd.read_csv(SEP_CSV)
raw_nov_df = pd.read_csv(NOV_CSV)

# Data

These CSV files are one-day snapshots of the `init_veh_stoph` and `trimet_stop_event` tables. One is from September 1st, 2017 and the other is from November 1st, 2017. These days fall before and after the bus lane change that occurred in October 2017, which makes them a good starting point to see what we can do.

The data is pre-filtered based on the following query parameters:
* Routes 4, 10 and 14
* Service stops only (stop type 0 or 5) ... I'm already thinking I should have included stop type 4/6 which are drive thrus
* Stops 3637, 3641, 3633, 2642, 7856
    * 3637 - SE 11th and SE Madison [Just before the start of bus lane - actually begins at 10th]
    * 3641 - SE 7th and SE Madison 
    * 3633 - SE Grand and SE Madison [End of bus lane]
    * 2642 - Hawthorne Bridge, Westbound [Likely outside of this analysis]
    * 7856 - SE 7th and SE Clay [4 (or 2) bus only]
* Weekday service
* Between 6:00 am (21600) and 12:00 pm (43200)
    * The bus lanes run from 6:00 am to 9:00 am in September, but were extended to 10:00 am in October. 
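The time values in the filter above appear to be seconds past midnight (6:00 am is 21600, 12:00 pm is 43200). A small helper makes them easier to read when eyeballing the tables. This is only a sketch: `seconds_to_clock` is a hypothetical name, and it assumes every time column in the CSVs uses the same seconds-past-midnight convention.

```python
def seconds_to_clock(seconds_since_midnight: int) -> str:
    """Format seconds past midnight as HH:MM:SS (hypothetical helper)."""
    hours, remainder = divmod(int(seconds_since_midnight), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# The window boundaries quoted in this notebook.
for value in (21600, 32400, 36000, 43200):
    print(value, seconds_to_clock(value))
```

Using `divmod` rather than `datetime` keeps the helper working even if a service-day time runs past 24 hours, should the data contain such values.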

# Goal

So the goal here is to see how long the buses take to traverse the bus lane from 9:00 am (32400) until 10:00 am (36000) in September, and see how that compares to that same time window in November.

To begin with, let's measure from the time the buses arrive at stop 3637 to the time they arrive at stop 3633. That means lines 10 and 14 only. And the 10 doesn't actually come that often, so let's just look at the 14.

We have some ridership data available here too, so we can also convey the number of people affected by the time it takes to travel this area.
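One possible way to fold ridership in, once per-trip elapsed times exist, is to weight each trip's elapsed time by its load so that fuller buses count for more. The sketch below is an assumption about how that might be done, not something this notebook does yet: `passenger_weighted_mean` is a hypothetical name, and using `ESTIMATED_LOAD` at the first stop as the weight is a simplification. The numbers in the example call are made up for illustration.

```python
import numpy as np


def passenger_weighted_mean(elapsed_seconds, loads):
    """Average elapsed time, weighting each trip by its passenger load.

    Hypothetical helper: `loads` is assumed to be ESTIMATED_LOAD at the
    first stop of the segment, one value per trip.
    """
    elapsed = np.asarray(elapsed_seconds, dtype=float)
    weights = np.asarray(loads, dtype=float)
    if weights.sum() == 0:
        # No riders recorded, so the weighted average is undefined.
        return float("nan")
    return float(np.average(elapsed, weights=weights))


# Illustrative values only: a 60 s trip with 10 riders, a 120 s trip with 30.
print(passenger_weighted_mean([60, 120], [10, 30]))
```

A trip with a load of 0 simply drops out of the weighted average, which may or may not be what we want for empty buses.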

## Columns
For now, let's limit the columns to the following:
* SERVICE_DATE   
* VEHICLE_NUMBER
* TRAIN
* ROUTE_NUMBER
* LEAVE_TIME
* STOP_TIME
* ARRIVE_TIME
* LOCATION_ID
* ESTIMATED_LOAD

In [47]:
# Columns to keep for this first pass.
cols = ['SERVICE_DATE', 'VEHICLE_NUMBER', 'TRAIN', 'ROUTE_NUMBER', 'LEAVE_TIME', 'STOP_TIME', 'ARRIVE_TIME', 'LOCATION_ID', 'ESTIMATED_LOAD']
# Route 14 only, STOP_TIME between 9:00 am and 10:00 am (inclusive),
# at the two stops that bracket the bus lane.
sep_df = raw_sep_df[cols][(raw_sep_df['ROUTE_NUMBER'] == 14) & 
                          (raw_sep_df['STOP_TIME'].between(32400, 36000)) &
                          (raw_sep_df['LOCATION_ID'].isin([3637, 3633]))].copy()
nov_df = raw_nov_df[cols][(raw_nov_df['ROUTE_NUMBER'] == 14) & 
                          (raw_nov_df['STOP_TIME'].between(32400, 36000)) &
                          (raw_nov_df['LOCATION_ID'].isin([3637, 3633]))].copy()

In [48]:
sep_df

Unnamed: 0,SERVICE_DATE,VEHICLE_NUMBER,TRAIN,ROUTE_NUMBER,LEAVE_TIME,STOP_TIME,ARRIVE_TIME,LOCATION_ID,ESTIMATED_LOAD
114,01SEP2017:00:00:00,3164,1401,14,32728,32700,32709,3637,17
115,01SEP2017:00:00:00,3164,1401,14,32838,32801,32788,3633,14
142,01SEP2017:00:00:00,2504,1434,14,35869,35880,35840,3637,16
143,01SEP2017:00:00:00,2504,1434,14,35989,35981,35930,3633,13
170,01SEP2017:00:00:00,2530,1437,14,33278,33180,33218,3637,26
172,01SEP2017:00:00:00,2530,1437,14,33401,33281,33344,3633,25
183,01SEP2017:00:00:00,2536,1438,14,34117,34080,34088,3637,27
184,01SEP2017:00:00:00,2536,1438,14,34241,34181,34180,3633,24
190,01SEP2017:00:00:00,3631,1439,14,35030,34980,34936,3637,19
191,01SEP2017:00:00:00,3631,1439,14,35149,35081,35089,3633,12


In [50]:
nov_df

Unnamed: 0,SERVICE_DATE,VEHICLE_NUMBER,TRAIN,ROUTE_NUMBER,LEAVE_TIME,STOP_TIME,ARRIVE_TIME,LOCATION_ID,ESTIMATED_LOAD
115,01NOV2017:00:00:00,3128,1401,14,32734,32700,32707,3637,19
116,01NOV2017:00:00:00,3128,1401,14,32836,32801,32784,3633,18
142,01NOV2017:00:00:00,2266,1434,14,35891,35880,35853,3637,19
144,01NOV2017:00:00:00,2266,1434,14,36055,35981,35992,3633,18
172,01NOV2017:00:00:00,2509,1437,14,33211,33180,33190,3637,0
173,01NOV2017:00:00:00,2509,1437,14,33327,33281,33275,3633,0
179,01NOV2017:00:00:00,2502,1438,14,34125,34080,34083,3637,41
181,01NOV2017:00:00:00,2502,1438,14,34304,34181,34239,3633,43
189,01NOV2017:00:00:00,2536,1439,14,35100,34980,35067,3637,28
190,01NOV2017:00:00:00,2536,1439,14,35217,35081,35155,3633,24


In [70]:
grouped = sep_df.groupby('TRAIN')
print('SEPTEMBER [pre-lane change]')
sep_elapses = []
for name, group in grouped:
    start_time = group[group['LOCATION_ID'] == 3637]['ARRIVE_TIME'].values[0]
    end_time = group[group['LOCATION_ID'] == 3633]['ARRIVE_TIME'].values[0]
    elapsed_time = end_time - start_time
    sep_elapses.append(elapsed_time)
    print(f"Train: {name}\n\tStart: {start_time}\n\tEnd: {end_time}\n\tElapsed Time: {elapsed_time}")

grouped = nov_df.groupby('TRAIN')
print('\n\nNOVEMBER [post-lane change]')
nov_elapses = []
for name, group in grouped:
    start_time = group[group['LOCATION_ID'] == 3637]['ARRIVE_TIME'].values[0]
    end_time = group[group['LOCATION_ID'] == 3633]['ARRIVE_TIME'].values[0]
    elapsed_time = end_time - start_time
    nov_elapses.append(elapsed_time)
    print(f"Train: {name}\n\tStart: {start_time}\n\tEnd: {end_time}\n\tElapsed Time: {elapsed_time}")

SEPTEMBER [pre-lane change]
Train: 1401
	Start: 32709
	End: 32788
	Elapsed Time: 79
Train: 1434
	Start: 35840
	End: 35930
	Elapsed Time: 90
Train: 1437
	Start: 33218
	End: 33344
	Elapsed Time: 126
Train: 1438
	Start: 34088
	End: 34180
	Elapsed Time: 92
Train: 1439
	Start: 34936
	End: 35089
	Elapsed Time: 153


NOVEMBER [post-lane change]
Train: 1401
	Start: 32707
	End: 32784
	Elapsed Time: 77
Train: 1434
	Start: 35853
	End: 35992
	Elapsed Time: 139
Train: 1437
	Start: 33190
	End: 33275
	Elapsed Time: 85
Train: 1438
	Start: 34083
	End: 34239
	Elapsed Time: 156
Train: 1439
	Start: 35067
	End: 35155
	Elapsed Time: 88
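
The two loops above do the same thing for each month, so the calculation could be pulled into one function. Here is a minimal sketch using a pivot instead of an explicit loop. `elapsed_by_train` is a hypothetical helper, and the demo frame just copies two September trips from the table above so the block runs on its own.

```python
import pandas as pd


def elapsed_by_train(df, start_stop=3637, end_stop=3633):
    """ARRIVE_TIME at end_stop minus ARRIVE_TIME at start_stop, per TRAIN.

    Hypothetical helper. aggfunc='first' mirrors the `.values[0]` used in
    the loops; trains missing either stop are dropped rather than raising.
    """
    wide = df.pivot_table(index='TRAIN', columns='LOCATION_ID',
                          values='ARRIVE_TIME', aggfunc='first')
    return (wide[end_stop] - wide[start_stop]).dropna()


# Two trips copied from the September table above.
demo = pd.DataFrame({
    'TRAIN': [1401, 1401, 1434, 1434],
    'LOCATION_ID': [3637, 3633, 3637, 3633],
    'ARRIVE_TIME': [32709, 32788, 35840, 35930],
})
print(elapsed_by_train(demo))
```

With the real frames this would be called as `elapsed_by_train(sep_df)` and `elapsed_by_train(nov_df)`. Note that it still raises a `KeyError` if one of the two stops is absent from the frame entirely.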


In [76]:
print(sep_elapses)
print(nov_elapses)
print(f"Average September elapsed time: {np.array(sep_elapses).mean()}")
print(f"Average November elapsed time: {np.array(nov_elapses).mean()}")

[79, 90, 126, 92, 153]
[77, 139, 85, 156, 88]
Average September elapsed time: 108.0
Average November elapsed time: 109.0


Cool. So the buses got 1 second slower on average when comparing 1 day / 5 trips on route 14. Gonna need more data.
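
To put that 1 second in context, it helps to look at how much the individual trips vary. A quick sketch using the two lists printed above, with the sample standard deviation (`ddof=1`) as a rough measure of spread. With only five trips per day this is a sanity check, not a statistical test.

```python
import numpy as np

# Elapsed times (seconds) copied from the output above.
sep_elapses = [79, 90, 126, 92, 153]
nov_elapses = [77, 139, 85, 156, 88]

# Sample standard deviation of each day's trips.
sep_std = np.std(sep_elapses, ddof=1)
nov_std = np.std(nov_elapses, ddof=1)
diff = np.mean(nov_elapses) - np.mean(sep_elapses)

print(f"September spread: {sep_std:.1f} s")
print(f"November spread: {nov_std:.1f} s")
print(f"Difference in means: {diff:.1f} s")
```

The trip-to-trip spread is on the order of 30 seconds on both days, so a 1 second difference in the averages is well inside the noise.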