## Edit GTFS as a System
Margaret Atkinson 10/25/22

Overall Goal: Conflate GTFS to TransCAD links

    Issue: Too many route variations
        First Sub-Issue Goal: Get the number of trips per time period for each route variation

Notes:

Although this 2018 Recap GTFS feed has already been cleaned by the R script itsleeds_gleangtfs.R and imported into the link layer in TransCAD, I am starting from the cleaned GTFS (before import into the link layer) so that all of this data can be connected as one system rather than handled as disparate parts.

In [1]:
import matplotlib
matplotlib.use('agg')  # allows notebook to be tested in Travis

import numpy as np
import pandas as pd
import cartopy.crs as ccrs
import cartopy
import matplotlib.pyplot as plt
import pandana as pdna
import time

import urbanaccess as ua
from urbanaccess.config import settings
from urbanaccess.gtfsfeeds import feeds
from urbanaccess import gtfsfeeds
from urbanaccess.gtfs.gtfsfeeds_dataframe import gtfsfeeds_dfs
from urbanaccess.network import ua_network, load_network

%matplotlib inline

Example of GTFS in UrbanAccess here:

https://github.com/UDST/urbanaccess/blob/dev/demo/simple_example.ipynb

In [2]:
# required bbox including all of Massachusetts and RI as well as parts of NH, CT, NY
bbox = (-73.7207, 41.1198, -69.7876, 43.1161)
# path to the downloaded and cleaned gtfs - mbta recap file for fall 2018
#   this could also be a folder of gtfs folders (pre merge of multiple gtfs)
#path_to_gtfs = r"J:\Shared drives\TMD_TSA\Programs\MID\Networks\Research_Development\Transit_Networks\gtfs_to_transcad\mbta2018_its_clean"
path_to_gtfs = r"J:\Shared drives\TMD_TSA\Programs\MID\Networks\Research_Development\Transit_Networks\gtfs_to_transcad\mbta2018_clean_trips_filter"

In [3]:
loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(gtfsfeed_path= path_to_gtfs,
                                           validation=True,
                                           verbose=True,
                                           bbox=bbox,
                                           remove_stops_outsidebbox=False,
                                           append_definitions=True)

Checking GTFS text file header whitespace... Reading files using encoding: utf-8 set in configuration.
GTFS text file header whitespace check completed. Took 1.21 seconds
--------------------------------
Processing GTFS feed: mbta2018_clean_trips_filter
The unique agency id: mbta was generated using the name of the agency in the agency.txt file.
Unique agency id operation complete. Took 0.05 seconds
Unique GTFS feed id operation complete. Took 0.01 seconds
No GTFS feed stops were found to be outside the bounding box coordinates
mbta2018_clean_trips_filter GTFS feed stops: coordinates are in northwest hemisphere. Latitude = North (90); Longitude = West (-90).
Appended route type to stops
Appended route type to stop_times
--------------------------------
Added descriptive definitions to stops, routes, stop_times, and trips tables
Successfully converted ['departure_time'] to seconds past midnight and appended new columns to stop_times. Took 3.56 seconds
1 GTFS feed file(s) successfully re

### Calculating Route Variation Trip Counts
The goal now is to count trips by route, time period, direction, and shape_id, and separately by route, time period, direction, and route_pattern_id. These counts make it possible to filter out rarely used route patterns, which helps identify the conflation issues that are relevant to the model.

Time of Day Periods:
- AM Peak: 6:30 AM to 9:30 AM
- MD (midday): 9:30 AM to 3:00 PM
- PM Peak: 3:00 PM to 7:00 PM
- NT (night): 7:00 PM to 6:30 AM
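
The period boundaries above can also be expressed in seconds past midnight, which avoids encoding times as decimals. The following is a minimal sketch, not the notebook's method: the function names `gtfs_time_to_seconds` and `classify_tod` are hypothetical, and it assumes that GTFS times past 24:00:00 (trips running after midnight of the service day) belong in NT.

```python
AM_START = 6 * 3600 + 30 * 60   # 06:30
MD_START = 9 * 3600 + 30 * 60   # 09:30
PM_START = 15 * 3600            # 15:00
NT_START = 19 * 3600            # 19:00


def gtfs_time_to_seconds(hhmmss: str) -> int:
    """Convert a GTFS 'HH:MM:SS' string to seconds past midnight (hours may exceed 23)."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def classify_tod(hhmmss: str) -> str:
    """Return 'AM', 'MD', 'PM', or 'NT' for a GTFS time string."""
    secs = gtfs_time_to_seconds(hhmmss)
    if AM_START <= secs < MD_START:
        return "AM"
    if MD_START <= secs < PM_START:
        return "MD"
    if PM_START <= secs < NT_START:
        return "PM"
    # Everything else is night: before 06:30, or 19:00 and later (including 24:00:00+).
    return "NT"
```

Because each period is a half-open interval, a time that falls exactly on a boundary (for example 09:30:00) belongs to the period that starts there.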

In [4]:
# Encode arrival_time 'HH:MM:SS' as a decimal-like number HH.MM (e.g. 06:30 -> 6.30)
# so that period boundaries can be compared numerically. Seconds are ignored.
st = gtfsfeeds_dfs.stop_times
time_parts = st['arrival_time'].astype('str').str.split(':')
hours = time_parts.str[0].astype('int64')
minutes = time_parts.str[1].astype('int64') / 100
st['time_integer'] = (hours + minutes).astype('float')

# Assign each stop_time to a time-of-day period. NT covers everything from 19:00 on
# (including GTFS times past 24:00:00) and everything before 06:30.
t = st['time_integer']
st['tod'] = np.select(
    [(t >= 6.3) & (t < 9.3), (t >= 9.3) & (t < 15), (t >= 15) & (t < 19), (t >= 19) | (t < 6.3)],
    ["AM", "MD", "PM", "NT"],
    default="0")  # "0" flags rows that matched no period

In [5]:
gtfsfeeds_dfs.stop_times.query('tod=="0"')

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,timepoint,checkpoint_id,unique_agency_id,unique_feed_id,route_type,pickup_type_desc,drop_off_type_desc,timepoint_desc,departure_time_sec,time_integer,tod


In [6]:
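# Label each trip with one period: the alphabetically smallest 'tod' among its stop_times.
# Note: as strings AM < MD < NT < PM, so a trip running from PM into NT is labeled NT.
trip_tod = gtfsfeeds_dfs.stop_times.groupby(by=['trip_id'])['tod'].min().reset_index()[['trip_id','tod']]
trips = gtfsfeeds_dfs.trips.merge(trip_tod, on='trip_id')#.query('stop_sequence == 1')[['trip_id','tod']], on='trip_id')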
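
Taking `min()` of the `tod` strings picks the alphabetically smallest label, which is not always the period in which a trip starts. The sketch below is one possible alternative, not what the notebook does: it takes the period from each trip's first stop (lowest `stop_sequence`). It uses toy data with hypothetical trip_ids `t1` and `t2`.

```python
import pandas as pd

# Toy stop_times (hypothetical values): trip "t2" starts in PM and ends in NT.
stop_times = pd.DataFrame({
    "trip_id":       ["t1", "t1", "t2", "t2"],
    "stop_sequence": [1, 10, 1, 10],
    "tod":           ["AM", "MD", "PM", "NT"],
})

# Keep the row with the lowest stop_sequence for each trip, i.e. the trip's first stop.
first_stop_idx = stop_times.groupby("trip_id")["stop_sequence"].idxmin()
trip_tod_first = stop_times.loc[first_stop_idx, ["trip_id", "tod"]].reset_index(drop=True)

# For comparison: the alphabetical minimum of 'tod' per trip.
trip_tod_min = stop_times.groupby("trip_id")["tod"].min().reset_index()

print(trip_tod_first)
print(trip_tod_min)
```

In this toy example both approaches label `t1` as AM, but `t2` is labeled PM by its first stop and NT by the alphabetical minimum.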

In [7]:
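# Count trips per route, direction, route pattern, service, and time period.
# The count is taken over 'shape_id', so the 'shape_id' column in the result holds the number of trips (with a non-null shape_id).
trips_count = trips.groupby(by=['route_id', 'direction_id', 'route_pattern_id', 'service_id', 'tod'])['shape_id'].count().reset_index()
trips_count.to_csv(r"C:\Users\matkinson.AD\Downloads\route_pattern_breakdown.csv")
trips_count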

| | route_id | direction_id | route_pattern_id | service_id | tod | shape_id |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1-_-0 | BUS42018-hbc48fr1-Weekday-02 | AM | 22 |
| 1 | 1 | 0 | 1-_-0 | BUS42018-hbc48fr1-Weekday-02 | MD | 24 |
| 2 | 1 | 0 | 1-_-0 | BUS42018-hbc48fr1-Weekday-02 | NT | 43 |
| 3 | 1 | 0 | 1-_-0 | BUS42018-hbc48fr1-Weekday-02 | PM | 25 |
| 4 | 1 | 0 | 1-_-0 | BUS42018-hbc48wk1-Weekday-02 | AM | 22 |
| ... | ... | ... | ... | ... | ... | ... |
| 2631 | Red | 1 | Red-3-1 | RTL42018-hms48011-Weekday-01 | AM | 30 |
| 2632 | Red | 1 | Red-3-1 | RTL42018-hms48011-Weekday-01 | MD | 27 |
| 2633 | Red | 1 | Red-3-1 | RTL42018-hms48011-Weekday-01 | NT | 33 |
| 2634 | Red | 1 | Red-3-1 | RTL42018-hms48011-Weekday-01 | PM | 23 |
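
With counts like these, rarely used route patterns can be screened out by their share of trips within each route, direction, and time period. The following is a minimal sketch under stated assumptions: the column name `n_trips`, the pattern `1-1-0`, the trip counts, and the 5% threshold `MIN_SHARE` are all illustrative, and the counts are assumed to be already summed over the service_ids of interest.

```python
import pandas as pd

# Toy counts (illustrative values, not taken from the feed).
counts = pd.DataFrame({
    "route_id":         ["1", "1", "1", "1"],
    "direction_id":     [0, 0, 0, 0],
    "route_pattern_id": ["1-_-0", "1-1-0", "1-_-0", "1-1-0"],
    "tod":              ["AM", "AM", "MD", "MD"],
    "n_trips":          [22, 1, 24, 2],
})

MIN_SHARE = 0.05  # hypothetical cutoff: keep patterns with at least 5% of trips

# Total trips for each route/direction/period, broadcast back to every row.
group_total = counts.groupby(["route_id", "direction_id", "tod"])["n_trips"].transform("sum")
counts["share"] = counts["n_trips"] / group_total

# Keep only the pattern/period rows that meet the cutoff.
common = counts[counts["share"] >= MIN_SHARE].reset_index(drop=True)
print(common)
```

A share-based cutoff adapts to routes with very different service levels; an absolute minimum number of trips would be a simpler alternative.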


Notes:

The following two service_ids serve identical routes with identical numbers of trips:
- BUS42018-hbc48fr1-Weekday-02
- BUS42018-hbc48wk1-Weekday-02
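
One way to check whether two service_ids serve the same patterns with the same trip counts is to build a per-service profile and compare rows. This is a sketch on toy data, not the check used for the observation above: the service_ids `svc_a`, `svc_b`, `svc_c`, the patterns `p1`, `p2`, the column name `n_trips`, and the helper `same_profile` are hypothetical.

```python
import pandas as pd

# Toy per-service counts (illustrative values).
counts = pd.DataFrame({
    "service_id":       ["svc_a", "svc_a", "svc_b", "svc_b", "svc_c"],
    "route_pattern_id": ["p1", "p2", "p1", "p2", "p1"],
    "tod":              ["AM", "AM", "AM", "AM", "AM"],
    "n_trips":          [22, 5, 22, 5, 10],
})

# One row per service_id, one column per (pattern, period); missing combinations become 0.
profile = counts.pivot_table(index="service_id",
                             columns=["route_pattern_id", "tod"],
                             values="n_trips", aggfunc="sum", fill_value=0)


def same_profile(a: str, b: str) -> bool:
    """True if both service_ids have identical trip counts for every pattern and period."""
    return bool((profile.loc[a] == profile.loc[b]).all())


print(same_profile("svc_a", "svc_b"))
print(same_profile("svc_a", "svc_c"))
```

Matching profiles only show that the counts agree; confirming that the underlying trips have the same stop_times would need a separate comparison.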

Their trips overlap in time of day; however, hbc48wk1 seems to cover Monday through Thursday and hbc48fr1 Friday, which may allow more flexibility around holidays. This is the main complication for trips that have different IDs but the same pattern and stop_times (i.e. the same trip under a different service_id) once specific dates are collapsed into generic days (e.g. "Tuesdays" instead of a specific date).

Both of these schedules are defined as running Monday through Friday and are then taken out of service on individual dates in the optional calendar_dates.txt, where exception_type 2 means service removed and 1 means service added. See https://multigtfs.readthedocs.io/en/latest/gtfs.html for documentation of this schema.

20180918 (September 18, 2018) seems to be a "regular" date in calendar_dates.txt, meaning that ONLY BUS42018-hbc48fr1-Weekday-02 is turned off for that day.
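
The interaction between calendar.txt and calendar_dates.txt can be sketched as a small lookup that returns the service_ids running on a given date. This is an illustration on toy data, not the feed's actual contents: the service_ids `svc_wk` and `svc_fr`, the date ranges, and the helper `active_services` are hypothetical, and dates are assumed to be stored as `YYYYMMDD` strings (a real feed may load them as integers).

```python
import pandas as pd

# Toy calendar.txt rows: both services are defined as Monday through Friday.
calendar = pd.DataFrame({
    "service_id": ["svc_wk", "svc_fr"],
    "monday": [1, 1], "tuesday": [1, 1], "wednesday": [1, 1],
    "thursday": [1, 1], "friday": [1, 1], "saturday": [0, 0], "sunday": [0, 0],
    "start_date": ["20180903", "20180903"],
    "end_date":   ["20181221", "20181221"],
})

# Toy calendar_dates.txt rows: exception_type 1 = service added, 2 = service removed.
calendar_dates = pd.DataFrame({
    "service_id":     ["svc_fr", "svc_wk"],
    "date":           ["20180918", "20180921"],
    "exception_type": [2, 2],
})


def active_services(date: str) -> set:
    """Return the service_ids running on a YYYYMMDD date."""
    weekday = pd.to_datetime(date, format="%Y%m%d").day_name().lower()  # e.g. "tuesday"
    # YYYYMMDD strings sort chronologically, so string comparison is sufficient here.
    in_range = (calendar["start_date"] <= date) & (calendar["end_date"] >= date)
    active = set(calendar.loc[in_range & (calendar[weekday] == 1), "service_id"])
    exceptions = calendar_dates[calendar_dates["date"] == date]
    active |= set(exceptions.loc[exceptions["exception_type"] == 1, "service_id"])
    active -= set(exceptions.loc[exceptions["exception_type"] == 2, "service_id"])
    return active


print(active_services("20180918"))  # a Tuesday in the toy data
print(active_services("20180921"))  # a Friday in the toy data
```

Selecting a single representative date this way yields one set of service_ids per day, so trips that are duplicated across service_ids are not counted twice.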

In [8]:
bline_trips = gtfsfeeds_dfs.trips.query('route_id == "Green-B"')['trip_id']
gtfsfeeds_dfs.trips.query('route_id == "Green-B"')

Unnamed: 0,route_id,service_id,trip_id,trip_headsign,trip_short_name,direction_id,block_id,shape_id,wheelchair_accessible,trip_route_type,route_pattern_id,bikes_allowed,unique_agency_id,unique_feed_id,bikes_allowed_desc,wheelchair_accessible_desc
19607,Green-B,LRV42018-hlb48011-Weekday-01,38091538,Boston College,,0,B813_-3,813_0003,1,,Green-B-3-0,0,mbta,mbta2018_clean_trips_filter_1,,can accommodate at least one rider in a wheelc...
19608,Green-B,LRV42018-hlb48011-Weekday-01,38091539,Boston College,,0,B813_-23,813_0003,1,,Green-B-3-0,0,mbta,mbta2018_clean_trips_filter_1,,can accommodate at least one rider in a wheelc...
19609,Green-B,LRV42018-hlb48011-Weekday-01,38091540,Boston College,,0,B813_-24,813_0003,1,,Green-B-3-0,0,mbta,mbta2018_clean_trips_filter_1,,can accommodate at least one rider in a wheelc...
19610,Green-B,LRV42018-hlb48011-Weekday-01,38091611,Boston College,,0,B813_-12,813_0003,1,,Green-B-3-0,0,mbta,mbta2018_clean_trips_filter_1,,can accommodate at least one rider in a wheelc...
19611,Green-B,LRV42018-hlb48011-Weekday-01,38091612,Boston College,,0,B813_-12,813_0003,1,,Green-B-3-0,0,mbta,mbta2018_clean_trips_filter_1,,can accommodate at least one rider in a wheelc...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19934,Green-B,LRV42018-hlb48011-Weekday-01,38092227,Park Street,,1,B813_-16,813_0004,1,,Green-B-3-1,0,mbta,mbta2018_clean_trips_filter_1,,can accommodate at least one rider in a wheelc...
19935,Green-B,LRV42018-hlb48011-Weekday-01,38092228,Park Street,,1,B813_-17,813_0004,1,,Green-B-3-1,0,mbta,mbta2018_clean_trips_filter_1,,can accommodate at least one rider in a wheelc...
19936,Green-B,LRV42018-hlb48011-Weekday-01,38092229,Park Street,,1,B813_-18,813_0004,1,,Green-B-3-1,0,mbta,mbta2018_clean_trips_filter_1,,can accommodate at least one rider in a wheelc...
19937,Green-B,LRV42018-hlb48011-Weekday-01,38092230,Park Street,,1,B813_-21,813_0004,1,,Green-B-3-1,0,mbta,mbta2018_clean_trips_filter_1,,can accommodate at least one rider in a wheelc...


In [14]:
bline_stops = gtfsfeeds_dfs.stop_times.query('trip_id in @bline_trips')['stop_id']
gtfsfeeds_dfs.stop_times.query('trip_id in @bline_trips')

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,timepoint,checkpoint_id,unique_agency_id,unique_feed_id,route_type,pickup_type_desc,drop_off_type_desc,timepoint_desc,departure_time_sec,time_integer,tod
43999,38091432,05:01:00,05:01:00,70106,1,,0,1,0,,mbta,mbta2018_clean_trips_filter_1,0.0,Regularly Scheduled,Not available,Approximate times,18060,5.01,NT
44000,38091432,05:03:00,05:03:00,70110,10,,0,0,0,,mbta,mbta2018_clean_trips_filter_1,0.0,Regularly Scheduled,Regularly Scheduled,Approximate times,18180,5.03,NT
44001,38091432,05:04:00,05:04:00,70112,20,,0,0,0,,mbta,mbta2018_clean_trips_filter_1,0.0,Regularly Scheduled,Regularly Scheduled,Approximate times,18240,5.04,NT
44002,38091432,05:06:00,05:06:00,70114,30,,0,0,0,,mbta,mbta2018_clean_trips_filter_1,0.0,Regularly Scheduled,Regularly Scheduled,Approximate times,18360,5.06,NT
44003,38091432,05:07:00,05:07:00,70116,40,,0,0,0,,mbta,mbta2018_clean_trips_filter_1,0.0,Regularly Scheduled,Regularly Scheduled,Approximate times,18420,5.07,NT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60198,38092255,14:53:00,14:53:00,70117,270,,0,0,0,,mbta,mbta2018_clean_trips_filter_1,0.0,Regularly Scheduled,Regularly Scheduled,Approximate times,53580,14.53,MD
60199,38092255,14:55:00,14:55:00,70115,280,,0,0,0,,mbta,mbta2018_clean_trips_filter_1,0.0,Regularly Scheduled,Regularly Scheduled,Approximate times,53700,14.55,MD
60200,38092255,14:56:00,14:56:00,70113,290,,0,0,0,,mbta,mbta2018_clean_trips_filter_1,0.0,Regularly Scheduled,Regularly Scheduled,Approximate times,53760,14.56,MD
60201,38092255,14:57:00,14:57:00,70111,300,,0,0,0,,mbta,mbta2018_clean_trips_filter_1,0.0,Regularly Scheduled,Regularly Scheduled,Approximate times,53820,14.57,MD


In [15]:
gtfsfeeds_dfs.stops.query('stop_id in @bline_stops')

Unnamed: 0,stop_id,stop_code,stop_name,stop_desc,platform_code,platform_name,stop_lat,stop_lon,zone_id,stop_address,stop_url,level_id,location_type,parent_station,wheelchair_boarding,unique_agency_id,unique_feed_id,route_type,location_type_desc,wheelchair_boarding_desc
7778,70126,70126.0,Allston Street,Allston Street - Green Line - Park Street & North,,Park Street & North,42.348649,-71.137881,,,https://www.mbta.com/stops/place-alsgr,level_median,0,place-alsgr,2,mbta,mbta2018_clean_trips_filter_1,0.0,stop,Wheelchair boarding is not possible at this stop
7779,70127,70127.0,Allston Street,Allston Street - Green Line - (B) Boston College,,Boston College,42.348546,-71.137362,,,https://www.mbta.com/stops/place-alsgr,level_median,0,place-alsgr,2,mbta,mbta2018_clean_trips_filter_1,0.0,stop,Wheelchair boarding is not possible at this stop
7793,70156,70156.0,Arlington,Arlington - Green Line - Park Street & North,,Park Street & North,42.351902,-71.070893,,,https://www.mbta.com/stops/place-armnl,level_-2_platform,0,place-armnl,1,mbta,mbta2018_clean_trips_filter_1,0.0,stop,At least some vehicles at this stop can be boa...
7794,70157,70157.0,Arlington,Arlington - Green Line - Copley & West,,Copley & West,42.351902,-71.070893,,,https://www.mbta.com/stops/place-armnl,level_-2_platform,0,place-armnl,1,mbta,mbta2018_clean_trips_filter_1,0.0,stop,At least some vehicles at this stop can be boa...
7804,70136,70136.0,Babcock Street,Babcock Street - Green Line - Park Street & North,,Park Street & North,42.351776,-71.12153,,,https://www.mbta.com/stops/place-babck,level_median,0,place-babck,2,mbta,mbta2018_clean_trips_filter_1,0.0,stop,Wheelchair boarding is not possible at this stop
7805,70137,70137.0,Babcock Street,Babcock Street - Green Line - (B) Boston College,,Boston College,42.351903,-71.122042,,,https://www.mbta.com/stops/place-babck,level_median,0,place-babck,2,mbta,mbta2018_clean_trips_filter_1,0.0,stop,Wheelchair boarding is not possible at this stop
7828,70148,70148.0,Blandford Street,Blandford Street - Green Line - Park Street & ...,,Park Street & North,42.349165,-71.099821,,,https://www.mbta.com/stops/place-bland,level_median,0,place-bland,2,mbta,mbta2018_clean_trips_filter_1,0.0,stop,Wheelchair boarding is not possible at this stop
7829,70149,70149.0,Blandford Street,Blandford Street - Green Line - (B) Boston Col...,,Boston College,42.349276,-71.100213,,,https://www.mbta.com/stops/place-bland,level_median,0,place-bland,2,mbta,mbta2018_clean_trips_filter_1,0.0,stop,Wheelchair boarding is not possible at this stop
7845,70158,70158.0,Boylston,Boylston - Green Line - Park Street & North,,Park Street & North,42.352531,-71.064682,,,https://www.mbta.com/stops/place-boyls,level_-1_platform,0,place-boyls,2,mbta,mbta2018_clean_trips_filter_1,0.0,stop,Wheelchair boarding is not possible at this stop
7846,70159,70159.0,Boylston,Boylston - Green Line - Copley & West,,Copley & West,42.353214,-71.064545,,,https://www.mbta.com/stops/place-boyls,level_-1_platform,0,place-boyls,2,mbta,mbta2018_clean_trips_filter_1,0.0,stop,Wheelchair boarding is not possible at this stop


In [17]:
gtfsfeeds_dfs.stops.query('stop_id in @bline_stops').to_csv(
    r"J:\Shared drives\TMD_TSA\Programs\MID\Networks\Network_Release_Process\2022_FirstRelease\BLine_Green_Stops_2018GTFS.csv")