In [1]:
import pandas as pd
import glob

## Import AVL data

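Search for all CSV files and merge them. The glob module, a list comprehension, and pandas' concat function provide an elegant way to do just that. Note that the pattern below only matches CSV files exactly two directory levels down, because ** behaves like * unless recursive=True is passed to glob.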

In [2]:
csvs = glob.glob("**/**/*.csv")

In [3]:
df = pd.concat([pd.read_csv(csv) for csv in csvs], ignore_index=True)
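If the CSV files were nested at varying depths, a recursive search might be needed instead. The following is only a sketch under that assumption: the helper name `load_csvs` and the temporary directory layout are hypothetical and are not part of the original data set.

```python
import glob
import os
import tempfile

import pandas as pd


def load_csvs(root):
    """Read every CSV under `root` (at any depth) into one DataFrame."""
    # recursive=True makes "**" match zero or more directory levels;
    # without it, "**" behaves like a single "*".
    pattern = os.path.join(root, "**", "*.csv")
    paths = sorted(glob.glob(pattern, recursive=True))
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Demonstration with made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "2015", "route_101")
    os.makedirs(nested)
    pd.DataFrame({"route": [101, 101]}).to_csv(
        os.path.join(nested, "a.csv"), index=False)
    pd.DataFrame({"route": [1]}).to_csv(
        os.path.join(root, "b.csv"), index=False)
    merged = load_csvs(root)

print(merged)
```

Sorting the paths keeps the row order reproducible between runs, since glob does not guarantee any particular order.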

In [4]:
df.route.unique()

array([101,  89,  71,  73,   1], dtype=int64)

Datetime conversion

In [5]:
df.actstoptime = pd.to_datetime(df.actstoptime)

## Define corridors

In [6]:
df.route.unique()

array([101,  89,  71,  73,   1], dtype=int64)

In [7]:
df['corridor'] = ""
df.loc[df.route.isin([71, 73]), 'corridor'] = "Belmont and Mt. Auburn Streets"
df.loc[df.route == 1, 'corridor'] = "South Mass. Ave"
df.loc[df.route.isin([89, 101]), 'corridor'] = "Broadway, Somerville"
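The same assignment could also be written as a single lookup, which keeps the route-to-corridor pairs in one place. This is a sketch on a toy frame: the name `CORRIDORS` and the unmapped route 999 are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Route-to-corridor lookup, copied from the assignments above.
CORRIDORS = {
    71: "Belmont and Mt. Auburn Streets",
    73: "Belmont and Mt. Auburn Streets",
    1: "South Mass. Ave",
    89: "Broadway, Somerville",
    101: "Broadway, Somerville",
}

# Toy stand-in for the AVL frame; 999 is a made-up route with no corridor.
demo = pd.DataFrame({"route": [101, 71, 1, 999]})

# Unmapped routes become NaN after map(), so fill them with the same
# empty-string default used above.
demo["corridor"] = demo["route"].map(CORRIDORS).fillna("")

print(demo)
```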

## Get travel times

In [8]:
df.head()

Unnamed: 0,tripdate,year,route,trip,stopid,actstoptime,dir,seasonal_period,implemented,corridor
0,2015-07-01,2015,101,41302800,5303,2015-07-01 07:56:33,0,1.0,0,"Broadway, Somerville"
1,2015-07-01,2015,101,41302800,2705,2015-07-01 07:59:12,0,1.0,0,"Broadway, Somerville"
2,2015-07-01,2015,101,41302800,2709,2015-07-01 08:03:31,0,1.0,0,"Broadway, Somerville"
3,2015-07-01,2015,101,41302550,5303,2015-07-01 08:21:15,0,1.0,0,"Broadway, Somerville"
4,2015-07-01,2015,101,41302550,2705,2015-07-01 08:24:23,0,1.0,0,"Broadway, Somerville"


### Data Quality Issues

I would like to use total trip time to compare the performance of each corridor before and after the implementation of the bus priority measures.

However, let us first check whether all similar trips serve all stops.

In [9]:
df[df.trip == 66763012]

Unnamed: 0,tripdate,year,route,trip,stopid,actstoptime,dir,seasonal_period,implemented,corridor
83528,2019-11-21,2019,101,66763012,2722,2019-11-21 12:49:22,1,2.0,1,"Broadway, Somerville"
83529,2019-11-21,2019,101,66763012,2725,2019-11-21 12:50:43,1,2.0,1,"Broadway, Somerville"
83530,2019-11-21,2019,101,66763012,2729,2019-11-21 12:52:25,1,2.0,1,"Broadway, Somerville"


The trip above serves three stops, which is consistent with the data reference Excel file for an outbound route 101 trip.

In [10]:
df[df.trip == 41372164]

Unnamed: 0,tripdate,year,route,trip,stopid,actstoptime,dir,seasonal_period,implemented,corridor
43190,2015-07-07,2015,101,41372164,2729,2015-07-07 07:09:48,1,1.0,0,"Broadway, Somerville"


A different picture here. For this trip, which serves the same route in the same direction, the AVL system reported only one stop served. Perhaps it skipped the other stops located within the corridor.

Regardless, we need to exclude such "trips": with a single timestamp, their start and end coincide, so they would report a total travel time of zero.

It is unclear what to do with trips that report two stops, however. Those two stops could be any of three combinations (1-2, 2-3, 1-3). Each provides a different piece of information, and only the third one spans the full corridor, which makes it the ideal case for calculating trip travel time.

In [11]:
df[df.trip == 41331065]

Unnamed: 0,tripdate,year,route,trip,stopid,actstoptime,dir,seasonal_period,implemented,corridor
43187,2015-07-03,2015,101,41331065,2722,2015-07-03 07:05:19,1,1.0,0,"Broadway, Somerville"
43188,2015-07-03,2015,101,41331065,2729,2015-07-03 07:10:24,1,1.0,0,"Broadway, Somerville"


This corresponds to the third case, where the first and last stops are reported but not the middle one. I will operate under the assumption that trips reporting two stops are usable for the travel time calculation.
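One way to test that assumption would be to flag, for each trip, whether both end stops of the corridor were reported. The sketch below uses toy rows modeled on the three trips inspected above. The constants `FIRST_STOP` and `LAST_STOP` are assumptions inferred from the complete outbound route 101 trip, and they would differ by route and direction.

```python
import pandas as pd

# Toy rows modeled on the three trips inspected above.
demo = pd.DataFrame({
    "trip":   [66763012, 66763012, 66763012, 41372164, 41331065, 41331065],
    "stopid": [2722, 2725, 2729, 2729, 2722, 2729],
})

# Assumed end stops for outbound route 101 within the corridor.
FIRST_STOP, LAST_STOP = 2722, 2729

# Mark rows that report either end stop, then summarize per trip.
demo["is_first"] = demo["stopid"].eq(FIRST_STOP)
demo["is_last"] = demo["stopid"].eq(LAST_STOP)
summary = demo.groupby("trip").agg(
    stop_count=("stopid", "count"),
    has_first=("is_first", "any"),
    has_last=("is_last", "any"),
)

# A trip spans the corridor only if both end stops were reported.
summary["spans_corridor"] = summary["has_first"] & summary["has_last"]

print(summary)
```

Filtering on `spans_corridor` rather than on the stop count alone would keep only the 1-3 combination among two-stop trips, at the cost of discarding the partial ones.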

## Group rows and get travel times

In [12]:
cols = ['tripdate', 
        'trip', 
        'route', 
        'dir', 
        'seasonal_period', 
        'implemented', 
        'corridor']

trips = df.groupby(cols)['actstoptime'].agg([('trip_start', 'min'),
                                             ('trip_end', 'max'),
                                             ('stop_count', 'count')]) # how many stops are reported

Exclude trips with only 1 reported stop

In [13]:
trips = trips[trips.stop_count > 1]

In [14]:
trips.head(20)

tripdate,trip,route,dir,seasonal_period,implemented,corridor,trip_start,trip_end,stop_count
2015-07-01,41302550,101,0,1.0,0,"Broadway, Somerville",2015-07-01 08:21:15,2015-07-01 08:24:23,2
2015-07-01,41302601,101,0,1.0,0,"Broadway, Somerville",2015-07-01 09:10:42,2015-07-01 09:12:19,2
2015-07-01,41302800,101,0,1.0,0,"Broadway, Somerville",2015-07-01 07:56:33,2015-07-01 08:03:31,3
2015-07-01,41303064,101,1,1.0,0,"Broadway, Somerville",2015-07-01 07:05:43,2015-07-01 07:10:23,2
2015-07-02,41318220,101,0,1.0,0,"Broadway, Somerville",2015-07-02 07:52:09,2015-07-02 07:57:25,2
2015-07-03,41331065,101,1,1.0,0,"Broadway, Somerville",2015-07-03 07:05:19,2015-07-03 07:10:24,2
2015-07-07,41372134,101,0,1.0,0,"Broadway, Somerville",2015-07-07 07:58:09,2015-07-07 08:03:55,3
2015-07-08,41385332,101,0,1.0,0,"Broadway, Somerville",2015-07-08 08:22:38,2015-07-08 08:28:29,3
2015-07-08,41385824,101,1,1.0,0,"Broadway, Somerville",2015-07-08 07:07:28,2015-07-08 07:13:24,2
2015-07-08,41386348,101,0,1.0,0,"Broadway, Somerville",2015-07-08 07:57:27,2015-07-08 07:58:56,2


In [15]:
trips['travel_time'] = (trips.trip_end - trips.trip_start)

In [16]:
trips.travel_time = trips.travel_time.dt.total_seconds()

In [17]:
trips.head()

tripdate,trip,route,dir,seasonal_period,implemented,corridor,trip_start,trip_end,stop_count,travel_time
2015-07-01,41302550,101,0,1.0,0,"Broadway, Somerville",2015-07-01 08:21:15,2015-07-01 08:24:23,2,188.0
2015-07-01,41302601,101,0,1.0,0,"Broadway, Somerville",2015-07-01 09:10:42,2015-07-01 09:12:19,2,97.0
2015-07-01,41302800,101,0,1.0,0,"Broadway, Somerville",2015-07-01 07:56:33,2015-07-01 08:03:31,3,418.0
2015-07-01,41303064,101,1,1.0,0,"Broadway, Somerville",2015-07-01 07:05:43,2015-07-01 07:10:23,2,280.0
2015-07-02,41318220,101,0,1.0,0,"Broadway, Somerville",2015-07-02 07:52:09,2015-07-02 07:57:25,2,316.0


## Time Periods

Define the periods as listed in the assignment: AM Peak (7:30 to 9:30), Midday (12:00 to 14:00), and PM Peak (16:30 to 18:30), each two hours long. Trips that start outside these periods are labeled "Other".

In [18]:
trips['time'] = trips.trip_start.dt.time

In [19]:
trips['period'] = "Other"

In [20]:
mask = (trips.time >= pd.to_datetime('7:30:00').time()) &\
       (trips.time <= pd.to_datetime('9:30:00').time())
trips.loc[mask, 'period'] = "AM Peak"

In [21]:
mask = (trips.time >= pd.to_datetime('12:00:00').time()) &\
       (trips.time <= pd.to_datetime('14:00:00').time())
trips.loc[mask, 'period'] = "Midday"

In [22]:
mask = (trips.time >= pd.to_datetime('16:30:00').time()) &\
       (trips.time <= pd.to_datetime('18:30:00').time())
trips.loc[mask, 'period'] = "PM Peak"
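The three masks above repeat the same pattern, so they could be collapsed into one lookup. The following is a sketch rather than the notebook's method: `PERIODS` and `classify_period` are hypothetical names, and the toy timestamps are taken from rows shown earlier.

```python
import datetime as dt

import pandas as pd

# Period boundaries copied from the masks above (both ends inclusive).
PERIODS = [
    ("AM Peak", dt.time(7, 30), dt.time(9, 30)),
    ("Midday", dt.time(12, 0), dt.time(14, 0)),
    ("PM Peak", dt.time(16, 30), dt.time(18, 30)),
]


def classify_period(t):
    """Return the period label for a datetime.time value, or "Other"."""
    for label, start, end in PERIODS:
        if start <= t <= end:
            return label
    return "Other"


# Toy stand-in for the trips frame.
demo = pd.DataFrame({"trip_start": pd.to_datetime([
    "2015-07-01 08:21:15",
    "2015-07-01 07:05:43",
    "2019-11-21 12:49:22",
])})
demo["period"] = demo["trip_start"].dt.time.map(classify_period)

print(demo)
```

Keeping the boundaries in one list makes it harder for a label and its time range to drift apart when the periods are edited.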

In [23]:
trips.head()

tripdate,trip,route,dir,seasonal_period,implemented,corridor,trip_start,trip_end,stop_count,travel_time,time,period
2015-07-01,41302550,101,0,1.0,0,"Broadway, Somerville",2015-07-01 08:21:15,2015-07-01 08:24:23,2,188.0,08:21:15,AM Peak
2015-07-01,41302601,101,0,1.0,0,"Broadway, Somerville",2015-07-01 09:10:42,2015-07-01 09:12:19,2,97.0,09:10:42,AM Peak
2015-07-01,41302800,101,0,1.0,0,"Broadway, Somerville",2015-07-01 07:56:33,2015-07-01 08:03:31,3,418.0,07:56:33,AM Peak
2015-07-01,41303064,101,1,1.0,0,"Broadway, Somerville",2015-07-01 07:05:43,2015-07-01 07:10:23,2,280.0,07:05:43,Other
2015-07-02,41318220,101,0,1.0,0,"Broadway, Somerville",2015-07-02 07:52:09,2015-07-02 07:57:25,2,316.0,07:52:09,AM Peak


In [24]:
trips.to_csv("trips.csv")

## Tableau Dashboard

%%HTML <div class='tableauPlaceholder' id='viz1576323250112' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Bu&#47;BusPriority&#47;Dashboard&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='BusPriority&#47;Dashboard' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Bu&#47;BusPriority&#47;Dashboard&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1576323250112');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

### Variability by seasonal period

Within the same corridor, no significant differences between routes were observed, even when comparing trends across seasonal periods or times of day.
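A comparison like this could also be tabulated in pandas alongside the dashboard. The sketch below runs on made-up travel times, so its numbers say nothing about the real corridors. With the actual `trips` frame, the grouping columns live in the index, so a `reset_index()` call would presumably be needed first.

```python
import pandas as pd

# Made-up travel times in seconds, for illustration only.
demo = pd.DataFrame({
    "corridor": ["Broadway, Somerville"] * 6,
    "route": [89, 89, 89, 101, 101, 101],
    "seasonal_period": [1.0, 1.0, 2.0, 1.0, 1.0, 2.0],
    "travel_time": [300.0, 320.0, 310.0, 290.0, 330.0, 305.0],
})

# Trip count, median, and spread of travel time for each route,
# grouped within corridor and seasonal period.
summary = (
    demo.groupby(["corridor", "seasonal_period", "route"])["travel_time"]
        .agg(["count", "median", "std"])
)

print(summary)
```

Reporting the count next to the median helps show when a group is too small for its spread to be meaningful.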