Busiest day of the service is not exhaustive #71

praneethd7 · 2022-03-01T20:55:42Z

partridge version:
'1.1.1'
Python version:
'3.10.2'
Operating System:
'Windows 10'

Description

Describe what you were trying to get done.

I was trying to read the Portland TriMet data (data here) using the partridge library and find the busiest day of service (especially for buses).

Tell us what happened, what went wrong, and what you expected to happen.

The list of service_ids for the busiest day (the output of read_busiest_date) was missing several service_ids that were actually in service on that day.
I expected read_busiest_date to return every service_id operating on that day, but it seems to include only the service_ids listed in calendar_dates.txt. The GTFS documentation describes calendar_dates.txt as follows: "Exceptions for the services defined in the calendar.txt. If calendar.txt is omitted, then calendar_dates.txt is required and must contain all dates of service."
Since the Portland TriMet data has both calendar.txt and calendar_dates.txt, I believe read_busiest_date considers only the exception service_ids and ignores the regular ones.
Perhaps the reason is that the Portland GTFS feed lists the regular service_ids in calendar.txt with 0 for every day, Monday through Sunday. However, this might lead to incorrect results from read_busiest_date, since the function excludes these regular service_ids with all-zero days. An example is shown below.

What I Did

Example:

import partridge as ptg
ptg.read_busiest_date('gtfs_Portland_2022_feb1.zip')

Output: (datetime.date(2022, 1, 24), frozenset({'A.613', 'D.613', 'Q.613', 'W.613'}))

The busiest day reported here is 24 January 2022, which is a Monday.
The following service_ids are missing from the output: [B.613,C.613,F.613,E.613,U.613,S.613]
The missing service_ids include both Light Rail and Bus route_types. For example, B.613 (Light Rail) includes the route 'MAX Red Line', which operates Monday-Friday and on weekends. This is perhaps the busiest line, as it serves Portland Airport. Also, U.613 (Bus) covers 48 routes. All the routes in this service_id can be seen [here](http://gtfs.transitq.com/TriMet_20220201_20220201/serviceids/U.613).

invisiblefunnel · 2022-03-02T00:55:00Z

The following service_ids are missing in the output : [B.613,C.613,F.613,E.613,U.613,S.613]

Thanks for sharing your findings @praneethd7. I took a look at the linked GTFS file and it matches the results from partridge.

% curl -L -o trimet.zip https://transitfeeds.com/p/trimet/43/20220201/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 18.1M  100 18.1M    0     0  16.6M      0  0:00:01  0:00:01 --:--:-- 31.9M
% unzip trimet.zip calendar.txt calendar_dates.txt 
Archive:  trimet.zip
  inflating: calendar.txt            
  inflating: calendar_dates.txt  
% cat calendar_dates.txt | grep 20220124
A.613,20220124,1
D.613,20220124,1
W.613,20220124,1
Q.613,20220124,1
% cat calendar.txt 
service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
B.613,0,0,0,0,0,0,0,20220123,20220205
B.614,0,0,0,0,0,0,0,20220206,20220514
C.613,0,0,0,0,0,0,0,20220123,20220205
C.614,0,0,0,0,0,0,0,20220206,20220514
A.613,0,0,0,0,0,0,0,20220123,20220205
A.614,0,0,0,0,0,0,0,20220206,20220514
F.613,0,0,0,0,0,0,0,20220123,20220205
F.614,0,0,0,0,0,0,0,20220206,20220514
E.613,0,0,0,0,0,0,0,20220123,20220205
E.614,0,0,0,0,0,0,0,20220206,20220514
D.613,0,0,0,0,0,0,0,20220123,20220205
D.614,0,0,0,0,0,0,0,20220206,20220514
W.613,0,0,0,0,0,0,0,20220123,20220205
W.614,0,0,0,0,0,0,0,20220206,20220514
Q.613,0,0,0,0,0,0,0,20220123,20220205
Q.614,0,0,0,0,0,0,0,20220206,20220514
U.613,0,0,0,0,0,0,0,20220123,20220205
U.614,0,0,0,0,0,0,0,20220206,20220514
S.613,0,0,0,0,0,0,0,20220123,20220205
S.614,0,0,0,0,0,0,0,20220206,20220514

The missing service_ids include both Light Rail and Bus route_types. For example, B.613 (Light Rail) includes the route 'MAX Red Line', which operates Monday-Friday and on weekends. This is perhaps the busiest line, as it serves Portland Airport. Also, U.613 (Bus) covers 48 routes. All the routes in this service_id can be seen [here](http://gtfs.transitq.com/TriMet_20220201_20220201/serviceids/U.613).

Take a look at http://gtfs.transitq.com/TriMet_20220201_20220201/serviceids/A.613 and this gist showing that on 20220124 trips for the MAX Red Line are covered by service_id A.613.
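To make the interaction between calendar.txt and calendar_dates.txt concrete, here is a minimal sketch of how the active service_ids for a single date can be resolved under the GTFS rules: weekday flags apply only inside the start_date to end_date range, a calendar_dates.txt row with exception_type 1 adds a service on that date, and exception_type 2 removes it. The helper name `active_service_ids` and the in-memory layout are assumptions for illustration, not partridge's implementation; the rows are a subset of the excerpts above.

```python
import datetime

# Subset of the calendar.txt rows shown above:
# service_id -> (Mon..Sun flags, start_date, end_date)
ALL_ZERO = (0, 0, 0, 0, 0, 0, 0)
START, END = datetime.date(2022, 1, 23), datetime.date(2022, 2, 5)
calendar = {
    "A.613": (ALL_ZERO, START, END),
    "B.613": (ALL_ZERO, START, END),
    "D.613": (ALL_ZERO, START, END),
    "Q.613": (ALL_ZERO, START, END),
    "U.613": (ALL_ZERO, START, END),
    "W.613": (ALL_ZERO, START, END),
}

# calendar_dates.txt rows for 20220124 shown above:
# (service_id, date, exception_type)
MONDAY = datetime.date(2022, 1, 24)
calendar_dates = [
    ("A.613", MONDAY, 1),
    ("D.613", MONDAY, 1),
    ("W.613", MONDAY, 1),
    ("Q.613", MONDAY, 1),
]


def active_service_ids(day, calendar, calendar_dates):
    """Return the service_ids active on `day` (hypothetical helper)."""
    active = set()
    # Regular service: the weekday flag must be 1 and the day must be in range.
    for service_id, (flags, start, end) in calendar.items():
        if start <= day <= end and flags[day.weekday()] == 1:
            active.add(service_id)
    # Exceptions: 1 adds service on that date, 2 removes it.
    for service_id, exception_date, exception_type in calendar_dates:
        if exception_date != day:
            continue
        if exception_type == 1:
            active.add(service_id)
        elif exception_type == 2:
            active.discard(service_id)
    return frozenset(active)


result = active_service_ids(MONDAY, calendar, calendar_dates)
print(sorted(result))
```

Because every calendar.txt row in this feed has all-zero weekday flags, only the calendar_dates.txt additions contribute, which is why B.613 and U.613 do not appear for 20220124.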

praneethd7 · 2022-03-02T19:41:50Z

Thank you @invisiblefunnel for the quick response and the gist. I had the incorrect notion that service_ids in calendar.txt must be operational on all days between the start_date and end_date. I was also under the impression that no two service_ids share routes. After your response, I realized that although the service_ids [B.613,C.613,F.613,E.613,U.613,S.613] are missing on 24 January 2022, the routes in these service_ids are covered by ['A.613', 'D.613', 'Q.613', 'W.613'] (reported by partridge). However, I am still missing how the busiest day is actually reported. Is it the day with the maximum number of service_id in the output of _service_ids_by_date()?

invisiblefunnel · 2022-03-02T22:53:53Z

However, I am still missing how the busiest day is actually reported. Is it the day with the maximum number of service_id in the output of _service_ids_by_date()?

Partridge uses the number of trips to approximate busyness. The earliest date is returned if multiple dates have the same number of trips.

partridge/partridge/readers.py

Lines 57 to 60 in df3167e

def read_busiest_date(path: str) -> Tuple[datetime.date, FrozenSet[str]]:
    """Find the earliest date with the most trips"""
    feed = load_raw_feed(path)
    return _busiest_date(feed)

partridge/partridge/readers.py

Lines 117 to 128 in df3167e

def _busiest_date(feed: Feed) -> Tuple[datetime.date, FrozenSet[str]]:
    service_ids_by_date = _service_ids_by_date(feed)
    trip_counts_by_date = _trip_counts_by_date(feed)

    def max_by(kv: Tuple[datetime.date, int]) -> Tuple[int, int]:
        date, count = kv
        return count, -date.toordinal()

    date, _ = max(trip_counts_by_date.items(), key=max_by)
    service_ids = service_ids_by_date[date]
    return date, service_ids
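The tie-break in `max_by` can be checked in isolation. The sketch below reuses the same key function on made-up trip counts (the numbers are hypothetical, not taken from the TriMet feed): negating the ordinal means that, among dates with equal counts, the earliest date compares as the largest key.

```python
import datetime

# Hypothetical trip counts; two dates tie for the maximum.
trip_counts_by_date = {
    datetime.date(2022, 1, 25): 500,
    datetime.date(2022, 1, 24): 500,
    datetime.date(2022, 1, 29): 300,
}


def max_by(kv):
    date, count = kv
    # Higher count wins first; for equal counts, the smaller ordinal
    # (earlier date) gives the larger negated value and therefore wins.
    return count, -date.toordinal()


busiest, _ = max(trip_counts_by_date.items(), key=max_by)
print(busiest)
```

Here the 24th and 25th tie on count, and the 24th is selected because it is earlier.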

partridge/partridge/readers.py

Lines 222 to 229 in df3167e

def _trip_counts_by_date(feed: Feed) -> Dict[datetime.date, int]:
    results: DefaultDict[datetime.date, int] = defaultdict(int)
    trips = feed.trips
    for service_ids, dates in _dates_by_service_ids(feed).items():
        trip_count = trips[trips.service_id.isin(service_ids)].shape[0]
        for date in dates:
            results[date] += trip_count
    return dict(results)
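The same counting idea can be sketched without pandas. This assumes, as the loop above suggests, that `_dates_by_service_ids` maps each combination of service_ids to the dates on which that combination is active. The service_ids, dates, and trip list below are hypothetical stand-ins, not values from the TriMet feed.

```python
import datetime
from collections import Counter, defaultdict

# Hypothetical service_id column of trips.txt: one entry per trip.
trip_service_ids = ["svc_a"] * 3 + ["svc_b"] * 2 + ["svc_c"] * 4

# Hypothetical mapping: set of service_ids -> dates on which that set is active.
dates_by_service_ids = {
    frozenset({"svc_a", "svc_b"}): {
        datetime.date(2022, 1, 24),
        datetime.date(2022, 1, 25),
    },
    frozenset({"svc_c"}): {datetime.date(2022, 1, 29)},
}

# Number of trips belonging to each service_id.
trips_per_service = Counter(trip_service_ids)

results = defaultdict(int)
for service_ids, dates in dates_by_service_ids.items():
    # Trips whose service_id is in this set (the role of the isin filter above).
    trip_count = sum(trips_per_service[s] for s in service_ids)
    for date in dates:
        results[date] += trip_count

trip_counts = dict(results)
print(trip_counts)
```

A date's count therefore depends on how many trips its active service_ids carry, not on how many service_ids are active that day.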

praneethd7 · 2022-03-03T01:21:55Z

That makes sense! Thank you so much @invisiblefunnel!

praneethd7 closed this as completed Mar 3, 2022
