# Consolidating Route Patterns in GTFS

Run this notebook after the R script that filters the trips, schedules, etc. by a chosen date.

The goal of this consolidation step is to do the following:
- Assign the same route_pattern_id to all trips that share a headsign, route, and direction. The route_pattern_id that represents the group is the one with the most trips per day.
- If a route pattern has fewer than 3 trips in each TOD period, assign it the route_pattern_id with the most daily trips in its route/direction group.

As a consequence of this consolidation, several connected parts must also be updated: shape_id (on the trips table), the stop sequence and stop times per trip (on the stop_times table), and, if the trip was part of a route pattern or route/headsign/direction (RHD) group that had < 3 trips in each time period, service_id, trip_headsign, and block_id (on the trips table).

Notes:
- The MBTA service day runs from 3:00 to 26:59. For TOD assignment, any trip whose midpoint is not between 6:30 and 19:00 is counted as NT, so the NT period covers both the late evening (19:00 to the end of the service day) and the early morning (3:00 - 6:30).


Needs/Steps:
- Number of trips per time period per route_pattern_id
    - Midpoint time of each trip
    - Each trip classified by TOD (based on midpoint)
    - Sum of trips per TOD by route_pattern_id
- route_pattern_id with most daily trips per Route, Direction, Headsign
    - Sum all tod trips per route_pattern_id
    - grab just the max per Route, Direction, Headsign (but keep route_pattern_id)
- consolidate route patterns by Route, Direction, Headsign
    - if route_pattern_id has less than 3 trips in each of the 4 TODs, replace with max trips route_pattern_id
- once consolidated, update trips attributes to match new route pattern
- update stop sequence and stop times for updated trips
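
The first two steps above can be sketched on a toy table. This is only an illustration under assumed data: the `toy_trips` frame and its ids are made up, and the trips are taken to be already classified by TOD.

```python
import pandas as pd

# Hypothetical toy data: one row per trip, already classified by TOD.
toy_trips = pd.DataFrame({
    'trip_id': ['t1', 't2', 't3', 't4', 't5', 't6'],
    'route_pattern_id': ['1-0-0', '1-0-0', '1-0-0', '1-0-0', '1-1-0', '1-1-0'],
    'tod': ['AM', 'AM', 'AM', 'MD', 'AM', 'NT'],
})

periods = ['AM', 'MD', 'PM', 'NT']

# Sum of trips per TOD by route_pattern_id, one column per period.
tod_counts = (
    toy_trips.groupby(['route_pattern_id', 'tod'])['trip_id']
    .nunique()
    .unstack(fill_value=0)
    .reindex(columns=periods, fill_value=0)  # keep periods that have no trips
)

# Sum all TOD trips per route_pattern_id to get the daily total.
tod_counts['daily_trips'] = tod_counts[periods].sum(axis=1)
print(tod_counts)
```

The `reindex` call keeps a column for every period even when no trip falls in it, which makes later per-period comparisons safer.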

*** This script takes about 30 minutes to run with the 2018 MBTA Fall Recap file. 3 functions take about 10 minutes.

In [None]:
import time
import datetime
import copy

import matplotlib
matplotlib.use('agg')  # allows notebook to be tested in Travis

import numpy as np
import pandas as pd
import cartopy.crs as ccrs
import cartopy
import matplotlib.pyplot as plt
import pandana as pdna

import urbanaccess as ua
from urbanaccess.config import settings
from urbanaccess.gtfsfeeds import feeds
from urbanaccess import gtfsfeeds
from urbanaccess.gtfs.gtfsfeeds_dataframe import gtfsfeeds_dfs
from urbanaccess.network import ua_network, load_network

from tqdm import tqdm

%matplotlib inline

In [None]:
# required bbox including all of Massachusetts and RI as well as parts of NH, CT, NY
bbox = (-73.7207, 41.1198, -69.7876, 43.1161)
# path to the downloaded and cleaned gtfs - mbta recap file for fall 2018
#   this could also be a folder of gtfs folders (pre merge of multiple gtfs)

path_to_gtfs = r"J:\Shared drives\TMD_TSA\Model\networks\Transit\gtfs\bnrd\1_gtfs_r"
out_path = r"J:\Shared drives\TMD_TSA\Model\networks\Transit\gtfs\bnrd\2_gtfs_py"

In [None]:
loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(gtfsfeed_path= path_to_gtfs,
                                           validation=True,
                                           verbose=True,
                                           bbox=bbox,
                                           remove_stops_outsidebbox=False,
                                           append_definitions=True)

### Functions Part 1 : Admin

1. Separate out Bus Routes
2. Get start/stop times per trip
3. Assign trip to TOD based on stop/start times

In [None]:
# separate out the bus routes and non-bus routes if multiple mode types
def separate_bus(route_table,trip_table,x):
    ''' filter out trips by mode to prevent from being consolidated

    Parameters
    -----------
    route_table : df
        gtfs routes.txt file in dataframe format
    trip_table : df
        gtfs trips.txt file in dataframe format
    x : int
        mode number to keep

    Returns
    --------
    non_bus_trips : df
        df of trips to not consolidate
    bus_trips : df
        df of trips to consolidate
    '''
    non_bus_routes = list(route_table.query('route_type != @x').route_id.unique())
    non_bus_trips = trip_table.query('route_id in @non_bus_routes')
    bus_trips = trip_table.query('route_id not in @non_bus_routes')

    if len(non_bus_trips) == 0:
        print("All routes in this GTFS are bus routes.")
    if len(bus_trips) == 0:
        print("There are no bus routes in this GTFS.")
    return(non_bus_trips, bus_trips)

In [None]:
def get_start_stop_times(stop_times):
    '''for every trip, grab the start time and stop time of the trip

    Parameters
    -----------
    stop_times : df
        gtfs stop_times.txt in df format

    Returns
    --------
    start_stop : df
        df with start and stop times per trip
        (columns: trip_id, arrival_time_start, arrival_time_end)

    '''
    per_trip = []
    for trip_id in stop_times['trip_id'].unique():
        # filter to the trip once, then pick its first and last stop
        trip_stops = stop_times.query('trip_id == @trip_id')
        first_row = trip_stops.query('stop_sequence == stop_sequence.min()')[['trip_id','arrival_time']]
        last_row = trip_stops.query('stop_sequence == stop_sequence.max()')[['trip_id','arrival_time']]
        per_trip.append(first_row.merge(last_row, how='left', on='trip_id', suffixes=('_start','_end')))
    # concatenate once at the end rather than growing a frame inside the loop
    start_stop = pd.concat(per_trip)
    return(start_stop)
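
The per-trip loop in get_start_stop_times() filters the whole stop_times table once per trip, which is likely a large share of the run time. A possible alternative, sketched here under the assumption that every trip has an arrival_time on its first and last stop, sorts once and keeps the first and last row per trip. The function name and toy data are hypothetical.

```python
import pandas as pd

def get_start_stop_times_vectorized(stop_times):
    """Hypothetical single-pass alternative to get_start_stop_times()."""
    ordered = stop_times.sort_values(['trip_id', 'stop_sequence'])
    cols = ['trip_id', 'arrival_time']
    # after sorting, the first/last row of each trip is its first/last stop
    first_rows = ordered.drop_duplicates('trip_id', keep='first')[cols]
    last_rows = ordered.drop_duplicates('trip_id', keep='last')[cols]
    return first_rows.merge(last_rows, how='left', on='trip_id',
                            suffixes=('_start', '_end'))

# Toy stop_times with rows deliberately out of order.
toy_stop_times = pd.DataFrame({
    'trip_id': ['a', 'a', 'a', 'b', 'b'],
    'stop_sequence': [2, 1, 3, 1, 2],
    'arrival_time': ['08:05:00', '08:00:00', '08:12:00', '25:10:00', '25:30:00'],
})
start_stop_fast = get_start_stop_times_vectorized(toy_stop_times)
print(start_stop_fast)
```

The output has the same three columns as the loop version, so it could be passed to assign_tod() unchanged.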


In [None]:
def assign_tod(start_stop):
    '''calculate midpoint of trip, use midpoint to assign TOD
    
    Parameters
    -----------
    start_stop : df
        df with start and stop times per trip

    Returns
    --------
    start_stop : df
        df with start time, stop time, midpoint time, and TOD per trip
        (the input df is modified in place and returned)

    '''
    
    start_stop['at_end_dec'] = (
        (
            (start_stop['arrival_time_end'].str.split(":").str[0]).astype('int32')
            +
            ((start_stop['arrival_time_end'].str.split(":").str[1]
            ).astype('int32')/60)))
    start_stop['at_start_dec'] = (
        (
            (start_stop['arrival_time_start'].str.split(":").str[0]).astype('int32')
            +
            ((start_stop['arrival_time_start'].str.split(":").str[1]
            ).astype('int32')/60)))
    
    start_stop['midpoint'] = start_stop['at_start_dec'] + ((start_stop['at_end_dec']-start_stop['at_start_dec'])/2)
    start_stop['tod'] = np.where(start_stop['midpoint'].between(6.50,9.50),'AM', np.where(
        start_stop['midpoint'].between(9.50,15.00), 'MD', np.where(
            start_stop['midpoint'].between(15.00,19.00),'PM', 'NT' 
        )
            ) 
        )
    
    return start_stop
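
A small worked example of the midpoint rule may help when checking edge cases. It is a sketch on made-up trips: `to_decimal_hours` is a hypothetical helper, and `np.select` stands in for the nested `np.where` calls (the first matching condition wins, so a midpoint of exactly 9.50 is AM).

```python
import numpy as np
import pandas as pd

def to_decimal_hours(hhmmss):
    """Convert GTFS 'HH:MM:SS' strings (hours may exceed 24) to decimal hours, ignoring seconds."""
    parts = hhmmss.str.split(':')
    return parts.str[0].astype('int32') + parts.str[1].astype('int32') / 60

# Hypothetical trips: before the AM period, inside it, on the AM/MD boundary, after midnight.
toy = pd.DataFrame({
    'trip_id': ['early', 'am', 'edge', 'late'],
    'arrival_time_start': ['05:50:00', '07:00:00', '09:00:00', '25:10:00'],
    'arrival_time_end': ['06:30:00', '08:00:00', '10:00:00', '25:50:00'],
})
start = to_decimal_hours(toy['arrival_time_start'])
end = to_decimal_hours(toy['arrival_time_end'])
toy['midpoint'] = start + (end - start) / 2

# Same inclusive bounds as assign_tod(); anything unmatched falls through to NT.
conditions = [
    toy['midpoint'].between(6.5, 9.5),
    toy['midpoint'].between(9.5, 15.0),
    toy['midpoint'].between(15.0, 19.0),
]
toy['tod'] = np.select(conditions, ['AM', 'MD', 'PM'], default='NT')
print(toy[['trip_id', 'midpoint', 'tod']])
```

Times past 24:00 need no special handling here because their decimal hour is above 19.00 and lands in NT.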


In [None]:
def assign_trips_tod(trips, start_stop_tod):
    """
    add TOD column from assign_tod() output to trips table
        accommodates gtfs without route_pattern_ids or trip_headsigns for buses (e.g. coming out of remix)

    Parameters
    -----------
    trips : df
        gtfs trips.txt in df form
    start_stop_tod : df
        trips with tod by midpoint
            (output of assign_tod)

    Returns
    -------
    tod_trips : df
        trips with a tod column
    count_trips : df
        number of trips per route_id, direction_id, and tod combination
        - purpose is to be a useful summary file and to join back to imported gtfs

    """
    if len(trips.groupby(by=['route_id','direction_id']).agg({'shape_id' : 'nunique'}).query('shape_id > 1')) > 0:
        print("Please run the functions that follow - you likely need to consolidate your route patterns")
    else: 
        print("Skip to Clean Up & Export!")

    # add TOD to trip table
    tod_trips = trips.merge(start_stop_tod[['trip_id','tod']], how='left', on='trip_id')
    count_trips = tod_trips.groupby(by=['route_id','direction_id','tod']).count()
    return(tod_trips, count_trips)

### Functions Part 2 : Consolidate by RHD and RD
1. Determine the route_pattern_id with the max number of daily trips
    - When two route_pattern_ids tie for the max, one is chosen arbitrarily.
    - This section could be expanded to break ties by the number of peak-period trips, but for now this works.
2. Update the route_pattern_id field in the trips table based on maximum daily trips per route id/trip headsign/direction id combo
    - e.g. consolidate by RHD by creating a 1:1 rpid to RHD mapping
3. Update route/headsign/direction combos where each TOD period has fewer than three trips per route_pattern_id
    - take the route_pattern_id with the max number of daily trips per route/direction
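
As an illustration of steps 1 and 2, the sketch below picks one route_pattern_id per route/headsign/direction on toy data. The tie-break shown (the lexicographically smallest id) is an assumption made only so the result is repeatable, and the ids and headsigns are invented.

```python
import pandas as pd

# Hypothetical trips: headsign "A" has two patterns (3 vs 2 trips), "B" has a 1-1 tie.
toy_trips = pd.DataFrame({
    'trip_id': ['t1', 't2', 't3', 't4', 't5', 't6', 't7'],
    'route_id': ['1'] * 7,
    'trip_headsign': ['A'] * 5 + ['B'] * 2,
    'direction_id': [0] * 5 + [1] * 2,
    'route_pattern_id': ['1-0-0', '1-0-0', '1-0-0', '1-_-0', '1-_-0', '1-1-0', '1-1-1'],
})
keys = ['route_id', 'trip_headsign', 'direction_id']

# Step 1: daily trips per R/H/D and route_pattern_id.
daily = (
    toy_trips.groupby(keys + ['route_pattern_id'])['trip_id']
    .nunique()
    .rename('daily_trips')
    .reset_index()
)

# Busiest pattern first within each R/H/D; ties resolved by the smallest id (assumed rule).
winners = (
    daily.sort_values(keys + ['daily_trips', 'route_pattern_id'],
                      ascending=[True, True, True, False, True])
    .drop_duplicates(subset=keys)
    .rename(columns={'route_pattern_id': 'route_pattern_id_new'})
)

# Step 2: every trip in the R/H/D takes the winning route_pattern_id.
consolidated = toy_trips.merge(winners[keys + ['route_pattern_id_new']], how='left', on=keys)
print(consolidated[['trip_id', 'route_pattern_id', 'route_pattern_id_new']])
```

Sorting before `drop_duplicates` makes the choice among tied patterns deterministic across runs, which can simplify comparing outputs between feeds.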

In [None]:
# FOR R/H/D, select RPID with most trips
def max_daily_trips(tod_trips, start_stop_tod):
    """ select the route pattern id with the most daily trips per route id, trip headsign, and direction id combination

    Parameters
    -----------
    tod_trips : df
        gtfs trips.txt in df form with a tod column
            (output of assign_trips_tod)
    start_stop_tod : df
        trips with tod by midpoint
            (output of assign_tod; not used in this function)

    Returns
    --------
    max_rpid : df
        df with route id/trip headsign/direction id and rpid with max trips

    """
    
        # get the number of trips per R/H/D & rpid
    day_rpid = tod_trips.groupby(by=['route_id','trip_headsign','direction_id','route_pattern_id']).agg({'trip_id':'nunique'})
    day_rpid = day_rpid.rename(columns = {'trip_id':'daily_trips'}).reset_index()

        # for each R/H/D, select just the route_pattern_id with the most daily trips
    max_rpid = day_rpid.groupby(by=['route_id','trip_headsign','direction_id']).apply(lambda g: g[g['daily_trips'] == g['daily_trips'].max()])[['route_pattern_id','daily_trips']].reset_index()
    max_rpid = max_rpid[['route_id','trip_headsign','direction_id','route_pattern_id']].rename(columns = {'route_pattern_id':'route_pattern_id_new'})

    # because there are several cases where the max is held by two different route_pattern_ids, choose one arbitrarily
    max_rpid = max_rpid.drop_duplicates(subset=['route_id','trip_headsign','direction_id'])
    return(max_rpid)

In [None]:
# update rpid to rpid with max trips
def update_rpid_max(trips, max_trips_rpid, base_trip_columns):
    """ update the route pattern id field in the trips table based on maximum daily trips per route id/trip headsign/direction id combo

    Parameters
    ----------
    trips : df
        gtfs trips.txt in df form
    max_trips_rpid : df
        output df from max_daily_trips()
        route_id/trip_headsign/direction_id -> rpid
    base_trip_columns : idx
        columns of trips 

    Returns
    -------
    trips_update_rpid : df
        gtfs trips.txt in df form with updated route pattern id
    """
    # join trip table with the RPID with most trips (join is on R/H/D, RPID is an attribute of the table)
    trips_update_rpid = trips.merge(max_trips_rpid, how='left', on=['route_id','trip_headsign','direction_id'])

    # update the route_pattern_ids based on R/H/D max
        # this may be the same as original RPID as this should cover all R/H/D combos (therefore all trips)
    trips_update_rpid['route_pattern_id'] = trips_update_rpid['route_pattern_id_new']
    trips_update_rpid = trips_update_rpid[base_trip_columns]
    
    return(trips_update_rpid)

In [None]:
def update_RHD_under_3(trips_update_rpid, start_stop_tod):
    """ select route patterns with less than 3 trips in every TOD period

    Parameters
    -----------
    trips_update_rpid : df
        gtfs trips.txt with updated route_pattern_id (max daily trips) in df form
    start_stop_tod : df
        df with each trip, its assigned TOD period, and start/stop times

    Returns
    --------
    needs_update : df
        route id/trip headsign/direction id with under 3 trips in each TOD period
    update_table : df
        route id/trip headsign/direction id with under 3 trips in each TOD period, plus the rpid with max daily trips per route_id/direction_id
    """
    # add tod back to trips table
    tod_trips = trips_update_rpid.merge(start_stop_tod[['trip_id','tod']], how='left', on='trip_id')

    # for each TOD and R/H/D count trips per TOD, get number of unique RPIDs, and unique ServiceIDs
    tod_rpid = tod_trips.groupby(by=['route_id','trip_headsign','direction_id','tod']).agg({'tod':'count','route_pattern_id':'nunique','service_id':'nunique'})
    tod_rpid = tod_rpid.rename(columns = {'tod':'trips_per_tod'}).reset_index()

    # select just the trips that need to be updated
    tod_rpid_4update = tod_rpid[['route_id','trip_headsign','direction_id','tod','trips_per_tod']]
        # for R/H/D, get trips_per_tod separated into columns by TOD
    tod_rpid_pivot = tod_rpid_4update.pivot_table(index = ['route_id','trip_headsign','direction_id'], columns = ['tod'])

        # get just the trips where all 4 periods have less than 3 trips
    needs_update = tod_rpid_pivot['trips_per_tod'].reset_index().fillna(0).query('AM < 3 & MD < 3 & PM < 3 & NT < 3')

    # get the number of trips per R/D & rpid
    day_rpid = tod_trips.groupby(by=['route_id','direction_id','route_pattern_id']).agg({'trip_id':'nunique'})
    day_rpid = day_rpid.rename(columns = {'trip_id':'daily_trips'}).reset_index()

        # for each R/D, select just the route_pattern_id with the most daily trips
    max_rpid = day_rpid.groupby(by=['route_id','direction_id']).apply(lambda g: g[g['daily_trips'] == g['daily_trips'].max()])[['route_pattern_id','daily_trips']].reset_index()
    max_rpid = max_rpid[['route_id','direction_id','route_pattern_id']].rename(columns = {'route_pattern_id':'route_pattern_id_new'})

    # because there are several cases where the max is held by two different route_pattern_ids, choose one arbitrarily
    max_rpid = max_rpid.drop_duplicates(subset=['route_id','direction_id'])

    # update the route_pattern_ids for just R/H/D combos that have < 3 trips in each of the 4 TOD periods
    update_table = needs_update.merge(max_rpid, how='left', on=['route_id','direction_id'])

    update_table = update_table.drop_duplicates(subset=['route_id','direction_id','trip_headsign'])

    return(needs_update, update_table)
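
One caveat with the query string `'AM < 3 & MD < 3 & PM < 3 & NT < 3'`: it names all four TOD columns, so it would fail on a feed where one period has no trips at all and the pivot lacks that column. A hedged sketch of a more defensive filter, on invented data, reindexes the pivot to the four periods first:

```python
import pandas as pd

# Hypothetical trips: headsign "A" has 3 AM trips, headsign "B" has a single AM trip.
# No trip falls in PM or NT, so a plain pivot would lack those columns.
toy = pd.DataFrame({
    'route_id': ['1'] * 5,
    'trip_headsign': ['A', 'A', 'A', 'A', 'B'],
    'direction_id': [0] * 5,
    'tod': ['AM', 'AM', 'AM', 'MD', 'AM'],
    'trip_id': ['t1', 't2', 't3', 't4', 't5'],
})
keys = ['route_id', 'trip_headsign', 'direction_id']
periods = ['AM', 'MD', 'PM', 'NT']

per_tod = (
    toy.groupby(keys + ['tod'])['trip_id']
    .count()
    .unstack('tod', fill_value=0)
    .reindex(columns=periods, fill_value=0)  # guarantee all four period columns
)

# Keep only R/H/D combos with fewer than 3 trips in every period.
needs_update_toy = per_tod[(per_tod < 3).all(axis=1)].reset_index()
print(needs_update_toy)
```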

### Functions Part 3: Update Trip Attributes Based on Updated RPID

In [None]:
def update_attributes_new_rpid(trips_updated, update_rpid, orig_trips):
    """ update the trip attributes based on the new route pattern id

    Parameters
    -----------
    trips_updated : df
        output from update_rpid_max()
        gtfs trips.txt table with updated route pattern ids 
    
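    update_rpid : df
        output from update_RHD_under_3()
        route id/trip headsign/direction id with < 3 trips per TOD period, assigned new route pattern id based on max daily trips per route id/direction id

    orig_trips : df
        gtfs trips.txt table before consolidation; supplies the output columns and the *_old route_pattern_id/trip_headsign values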

    Returns
    -------
    trips_update5 : df
        gtfs trip table with route pattern ids consolidated by:
            1. route id/trip headsign/direction id (RHD)
            2. if RHD has < 3 trips in each TOD period:
                - route_pattern_id with max daily trips within route_id/direction_id
            and updated attributes based on the new route pattern ids
    """
    trips_updated = trips_updated.merge(
        update_rpid[['route_id','direction_id','trip_headsign','route_pattern_id_new']], 
        how='left',on=['route_id','direction_id','trip_headsign'])

    trips_updated['route_pattern_id'] = np.where(
        ((trips_updated['route_pattern_id_new'].isna())), 
            trips_updated['route_pattern_id'], 
            trips_updated['route_pattern_id_new']
            )
    
    # update headsign, block_id, and service_id for all trips with updated route_pattern_id
    rpid_th = trips_updated.groupby(by=['route_pattern_id','trip_headsign','block_id','shape_id','service_id']).agg({'trip_id':'count'}).reset_index()
    
    rpid_th_max = rpid_th.groupby(by=['route_pattern_id']).apply(lambda g: g[g['trip_id'] == g['trip_id'].max()]).reset_index(drop=True)
    
    rpid_th_max = rpid_th_max.drop_duplicates(subset='route_pattern_id')
    
    trips_update4 = trips_updated.merge(
        rpid_th_max[['route_pattern_id','trip_headsign','block_id','shape_id','service_id']], 
        how='left',on=['route_pattern_id'], suffixes=(None, '_to_rplce'))

    trips_update4['trip_headsign'] = trips_update4['trip_headsign_to_rplce']
    trips_update4['block_id'] = trips_update4['block_id_to_rplce']
    trips_update4['service_id'] = trips_update4['service_id_to_rplce']

    trips_update5 = trips_update4[orig_trips.columns].merge(
        orig_trips[['trip_id','route_pattern_id','trip_headsign']], 
        how='left', 
        on='trip_id', 
        suffixes=(None,'_old'))

    return(trips_update5)

### Functions Part 4: Stop Sequences & Stop Times
Create Generic Stop Times & Stop Sequence per Route Pattern

- Can't just change route_pattern_id as TransCAD does not use this field to combine trips into routes. There is no effect on the import.
- Working theory is that to consolidate routes one must use the stop_times.txt table as it defines the stop sequence for every trip. Theoretically, this is being used to consolidate trips into routes based on whether they have the same stop sequence.

Plan:
- Explore if stop times differ depending on TOD or if only realtime GTFS takes into account traffic.
    - For every route_pattern_id, get the average trip length (in minutes).
- For every route_pattern_id, get the average number of minutes between each pair of stops in the stop sequence.
- Replace the stop times and sequence for trips that had their route_pattern_id updated with the generic stop sequence and times per route_pattern_id created in the previous step. 
    - Keep the start time and work off of that.
    - Arrival time will equal departure time given that the difference is usually less than a minute. Will assume difference in time can be included in the minutes to next arrival time for aggregate modeling purposes.
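
The "keep the start time and work off of that" idea can be sketched as follows. It is a minimal illustration, assuming a template trip whose departure_time_sec values are known; `rebuild_times` is a hypothetical helper, not one of the notebook's functions.

```python
import pandas as pd

# Hypothetical template trip with departures at 08:00:00, 08:05:00, and 08:15:00.
template = pd.DataFrame({
    'stop_sequence': [1, 2, 3],
    'stop_id': ['s1', 's2', 's3'],
    'departure_time_sec': [28800, 29100, 29700],
})

# Seconds from the previous stop; the first stop has no predecessor, so use 0.
template['time_between_stops'] = template['departure_time_sec'].diff().fillna(0)

def rebuild_times(template, new_start_sec):
    """Shift the template so its first stop departs at new_start_sec."""
    rebuilt = template.copy()
    rebuilt['departure_time_sec'] = new_start_sec + rebuilt['time_between_stops'].cumsum()
    return rebuilt

# A trip that keeps its own 09:00:00 start but borrows the template's spacing.
rebuilt = rebuild_times(template, 32400)
print(rebuilt[['stop_id', 'departure_time_sec']])
```

Because arrival time is set equal to departure time, one column of seconds is enough to regenerate both fields.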

In [None]:
def generic_stop_times_sequence(stop_times, trips):
    ''' select the most popular stop sequence per route pattern
    ****** THIS EDITS STOP_TIMES in place (adds a first_stop column to the input even though it is not returned)

    Parameters
    -----------
    stop_times : df
        gtfs stop_times.txt in df form
    trips : df
        gtfs trips.txt in df form

    Returns
    --------
    rpid_stop_dict : dict
        dictionary with keys = route pattern ids, values = list of trip_ids with the most popular stop sequence for that rpid
    '''
    # flag the first stop in every trip
    stop_times['first_stop'] =  0
    stop_times.loc[stop_times.groupby('trip_id').stop_sequence.idxmin(),'first_stop'] = 1

    rpid_stop_dict = {}
    # for each route_pattern_id calculate the most common stop pattern
    for rpid in trips['route_pattern_id'].unique(): # 489
        patterns = []
        rpid_trips = trips.query('route_pattern_id == @rpid')['trip_id'].to_list()
        rpid_stops = stop_times.query('@rpid_trips in trip_id').sort_values('stop_sequence')
        
        # for every trip, get stop pattern and save into a list of lists
        for tid in rpid_stops['trip_id'].unique():  # one entry per trip, not one per stop_times row
            patterns.append([tid,':'.join(rpid_stops.query('trip_id == @tid')['stop_id'].to_list())])
        
        # make data frame of every trip and its pattern of stops
        df = pd.DataFrame(patterns, columns = ['trip_id','stop_pattern'])
        # get most common stop_pattern
        df_grby = df.groupby('stop_pattern').count().reset_index()
        max_stop_pattern = df_grby.query('trip_id == trip_id.max()')['stop_pattern'].to_list()
        # get all the trip_ids that represent the route_pattern_id's most popular stop pattern
        rep_trips = df.query('stop_pattern == @max_stop_pattern[0]')['trip_id'].to_list()

        rpid_stop_dict[rpid] = rep_trips

    return(rpid_stop_dict)
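
For a single route pattern, the "most popular stop sequence" selection could also be written without the inner loop. The sketch below is one possible approach on invented data: it builds one stop-pattern string per trip, so each trip is counted once regardless of how many stops it has.

```python
import pandas as pd

# Hypothetical stop_times for one route pattern: t1 and t2 share a pattern, t3 skips s2.
toy_stop_times = pd.DataFrame({
    'trip_id': ['t1'] * 3 + ['t2'] * 3 + ['t3'] * 2,
    'stop_sequence': [1, 2, 3, 1, 2, 3, 1, 2],
    'stop_id': ['s1', 's2', 's3', 's1', 's2', 's3', 's1', 's3'],
})

# One row per trip: its stop_ids joined in stop_sequence order.
patterns = (
    toy_stop_times.sort_values(['trip_id', 'stop_sequence'])
    .groupby('trip_id')['stop_id']
    .agg(':'.join)
    .rename('stop_pattern')
    .reset_index()
)

# Pattern shared by the most trips, and the trips that follow it.
most_common = patterns['stop_pattern'].value_counts().idxmax()
rep_trips = patterns.loc[patterns['stop_pattern'] == most_common, 'trip_id'].to_list()
print(most_common, rep_trips)
```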

In [None]:
def update_to_majority_stop_pattern(trips, rpid_stop_dict, stop_times):
    '''for trips without the most popular stop sequence pattern within its route pattern, update to the most popular stop sequence pattern 

    Parameters
    ----------
    trips : df
        gtfs trips.txt in df form
    rpid_stop_dict : dict
        output of generic_stop_times_sequence()
        dictionary with keys = route pattern ids, values = list of trip_ids with the most popular stop sequence for that rpid
    stop_times : df
        gtfs stop_times.txt in df form

    Returns
    -------
    trips_clean : df
        gtfs trips.txt in df form with updated shape_id based on new stop sequence pattern (majority within rpid)
    trip_stop_replace : dict
        dictionary, key = trip_id, values = trip_id of trip with closest start time to key within the stop sequence majority for the shared rpid

    '''
    # for every row where the route_pattern_id has been updated & doesn't have the majority stop pattern
    trip_stop_replace = {}
    trip_shape_replace = {}
    for idx, row in trips.iterrows():
        # identify trip_id
        tid = row['trip_id']
        # get the route_pattern_id for the trip
        rpid = row['route_pattern_id']

        if (tid not in rpid_stop_dict[rpid]): # if trip_id is not in the majority stop pattern
            # get all of the trips associated with that route_pattern_id (and stop sequence pattern)
            all_trips = rpid_stop_dict[rpid]

            # get the start time (for first stop) for trip_id
            start_time = stop_times.query('(trip_id == @tid) & (first_stop == 1)')
            # get the start time (for first stop) for all trips that share same route_pattern_id
            all_start_times = stop_times.query('(trip_id in @all_trips) & (first_stop == 1)')

            # difference in start time between the selected trip and each trip in the majority stop pattern
            # (compare scalars; subtracting a DataFrame from a Series aligns on labels instead of values)
            tid_start = start_time['departure_time_sec'].iloc[0]
            close = {
                x: abs(tid_start - t)
                for x, t in zip(all_start_times['trip_id'], all_start_times['departure_time_sec'])
            }
                
            # get trip with the minimum difference in start time    
            min_t = min(close, key=close.get)
            if tid != min_t:
                trip_stop_replace[tid] = min_t
                trip_shape_replace[tid] = trips.query('trip_id == @min_t')['shape_id'].values[0]
        else:
            trip_shape_replace[tid] = trips.query('trip_id == @tid')['shape_id'].values[0]
    
    trip_shape_replace_tab = pd.DataFrame.from_dict(
        trip_shape_replace, orient='index').reset_index().rename(
            columns = {'index':'trip_id', 0:'shape_id_update'})
    
    trips['shape_id_new'] = trips['trip_id'].apply(lambda x: trip_shape_replace[x])

    trips['shape_id'] = trips['shape_id_new']

    # keep only relevant columns
    trips_clean = trips.drop(columns=['route_pattern_id_old','trip_headsign_old','shape_id_new'])

    return(trips_clean, trip_stop_replace)
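
The nearest-start-time lookup at the core of this function reduces to an absolute difference and an `idxmin`. A minimal sketch, assuming the start times are available as a Series of seconds indexed by trip_id (the ids, values, and `closest_trip` helper are hypothetical):

```python
import pandas as pd

# Hypothetical first-stop departures (seconds) for trips with the majority stop pattern.
majority_starts = pd.Series({'m1': 21600, 'm2': 25200, 'm3': 28800})

def closest_trip(start_sec, candidate_starts):
    """Return the trip_id whose start time is nearest to start_sec."""
    return (candidate_starts - start_sec).abs().idxmin()

# A minority-pattern trip starting at 26000 s is closest to m2 (800 s apart).
match = closest_trip(26000, majority_starts)
print(match)
```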

In [None]:
def update_stop_times_new_pattern(stop_times, trip_stop_replace):
    ''' for trips with an updated stop sequence pattern, update the arrival and departure times associated with each stop

    Parameters
    -----------
    stop_times : df
        gtfs stop_times.txt in df form
        (must include departure_time_sec and the first_stop column added by generic_stop_times_sequence())
    trip_stop_replace : dict
        dictionary, key = trip_id, values = trip_id of trip with closest start time to key within the stop sequence majority for the shared rpid

    Returns
    -------
    stop_times : df
        gtfs stop_times.txt in df form with updated stops times per trip 

    '''
    stop_times = stop_times.sort_values('stop_sequence')
    # for every trip (in dictionary with value = trip sharing route_pattern_id with closest start time)
    for tid in trip_stop_replace.values():
    # update value of time_between_stops for every stop in the selected trip
        # time between stops calculated by subtracting the prior departure_time_sec from current (why stop_sequence order is important)
        stop_times.loc[stop_times.loc[:,'trip_id'] == str(tid), 'time_between_stops'] = stop_times.loc[stop_times.loc[:,'trip_id'] == str(tid), 'departure_time_sec'].diff()

    for trip in trip_stop_replace.keys():
        start_time = stop_times.query('(trip_id == @trip) & (first_stop == 1)')['departure_time_sec']
        # drop old stop times
        stop_times = stop_times.drop(
            stop_times.loc[stop_times['trip_id']==trip].index)
        # grab new stop times
        new_trip = trip_stop_replace[trip]
        nst = stop_times.query('trip_id == @new_trip').copy()  # copy so the edits below do not touch stop_times
        nst['trip_id'] = trip

        # replace the start time, then calculate the stop times by the departure_time_sec difference
        nst.loc[nst.loc[:,'first_stop']==1,'departure_time_sec'] = int(start_time.iloc[0])
        nst.loc[nst.loc[:,'first_stop']==1,'time_between_stops'] = int(start_time.iloc[0])
        nst['departure_time_sec'] = nst['time_between_stops'].cumsum()

        # recalc arrival/dep times
        nst['arrival_time'] = pd.to_datetime(nst['departure_time_sec'],unit='s').astype('str').str[11:19]
        nst['departure_time'] = nst['arrival_time']

        #keep only relevant columns
        nst = nst[stop_times.columns]

        stop_times = pd.concat([stop_times,nst])
    # keep only relevant columns and sort
    stop_times = stop_times.sort_values(by=['trip_id','stop_sequence'])
    return(stop_times)
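
One thing to watch when recalculating arrival_time: `pd.to_datetime(..., unit='s')` rolls over at 24 hours, so a departure at 91800 seconds comes out as 01:30:00 rather than the 25:30:00 that GTFS uses for trips past midnight on the same service day. A possible formatter that keeps hours above 24 is sketched below; `seconds_to_gtfs_time` is a hypothetical helper and assumes non-missing values.

```python
import pandas as pd

def seconds_to_gtfs_time(total_seconds):
    """Format seconds since the start of the service day as HH:MM:SS, allowing hours >= 24."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

late_departure = 91800  # 25.5 hours after the start of the service day

# Datetime conversion wraps into the next calendar day.
wrapped = str(pd.to_datetime(late_departure, unit='s'))[11:19]
# Integer arithmetic keeps the service-day hour.
kept = seconds_to_gtfs_time(late_departure)
print(wrapped, kept)
```

Applied to a column, this could be used as `nst['departure_time_sec'].apply(seconds_to_gtfs_time)`, provided the column has no missing values.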

## Run Functions!

#### 1. Run Functions to...
- separate bus and non-bus trip
- get start and stop times for trips
- assign each trip a tod based on midpoint

In [None]:
non_bus_trips, bus_trips = separate_bus(gtfsfeeds_dfs.routes, gtfsfeeds_dfs.trips,3)

In [None]:
start_stop = get_start_stop_times(gtfsfeeds_dfs.stop_times)

In [None]:
start_stop_tod = assign_tod(start_stop)

In [None]:
trips, count_tod = assign_trips_tod(bus_trips, start_stop_tod)

#### 2. Run Functions to...
- get max daily trips per RHD/rpid
- update trips rpid with rpid with max daily trips (per RHD)
- if RHD has < 3 trips in each TOD period, update trips rpid with rpid with max daily trips (per Route_ID/Direction_ID) 
- update trip attributes based on updated rpid

In [None]:
max_trips_rpid = max_daily_trips(trips, start_stop_tod)

In [None]:
trips_update1 = update_rpid_max(bus_trips, max_trips_rpid, gtfsfeeds_dfs.trips.columns)

In [None]:
trips_need_update, updated_rpid = update_RHD_under_3(trips_update1, start_stop_tod)

In [None]:
trips_fully_updated = update_attributes_new_rpid(trips_update1, updated_rpid, bus_trips)

#### 3. Run Functions to...
- select the most popular stop sequence per RPID
- replace stop sequence for trips with minority stop sequence (with most popular sequence per RPID)
    - also update shape in trips table based on new stop sequence
- update the stop times associated with the new stop sequence patterns

In [None]:
rpid_stop_dict = generic_stop_times_sequence(gtfsfeeds_dfs.stop_times, trips_fully_updated)

In [None]:
trips, trip_stop_replace = update_to_majority_stop_pattern(trips_fully_updated, rpid_stop_dict, gtfsfeeds_dfs.stop_times)

In [None]:
stop_times = update_stop_times_new_pattern(gtfsfeeds_dfs.stop_times, trip_stop_replace)
gtfsfeeds_dfs.stop_times = stop_times.drop(columns=['first_stop'])

#### Clean Up & Export

In [None]:
# merge back the filtered out trips
gtfsfeeds_dfs.trips = pd.concat([trips, non_bus_trips])

In [None]:
gtfsfeeds_dfs.stop_times.to_csv(out_path+r"/stop_times.txt", index=False)
gtfsfeeds_dfs.trips.to_csv(out_path+r"/trips.txt", index=False)