Busiest day of the service is not exhaustive #71

Closed
praneethd7 opened this issue Mar 1, 2022 · 4 comments

Comments

@praneethd7

  • partridge version:
    '1.1.1'
  • Python version:
    '3.10.2'
  • Operating System:
    'Windows 10'

Description

Describe what you were trying to get done.

  • I was trying to read the Portland TriMet data (data here) using the partridge library and find the busiest day of service (especially for buses).

Tell us what happened, what went wrong, and what you expected to happen.

  • The list of service_ids for the busiest day (the output of read_busiest_date) was missing several service_ids that were actually in service on that day.
  • I expected read_busiest_date to return every service_id operating on that day, but it seems to include only the service_ids listed in calendar_dates.txt. The GTFS documentation describes calendar_dates.txt as follows: "Exceptions for the services defined in the calendar.txt. If calendar.txt is omitted, then calendar_dates.txt is required and must contain all dates of service."
  • Since the Portland TriMet data has both calendar.txt and calendar_dates.txt, I believe read_busiest_date is considering only the service_ids with exceptions and ignoring the regular service_ids.
  • Perhaps the reason is that the Portland GTFS feed lists the regular service_ids in calendar.txt with 0 for every day from Monday to Sunday. However, this might lead to incorrect results from read_busiest_date, since the function excludes these regular service_ids with all 0s. An example of this can be seen below.

What I Did

Example:

import partridge as ptg
ptg.read_busiest_date('gtfs_Portland_2022_feb1.zip')

Output: (datetime.date(2022, 1, 24), frozenset({'A.613', 'D.613', 'Q.613', 'W.613'}))

  • The busiest day reported here is January 24, 2022, which is a Monday.
  • The following service_ids are missing from the output: [B.613,C.613,F.613,E.613,U.613,S.613]
  • The missing service_ids include both Light Rail and Bus route_types. For example, B.613 (Light Rail) includes the route 'MAX Red Line', which operates Monday-Friday and on weekends. This is perhaps the busiest line, as it connects to the Portland Airport. Also, U.613 (Bus) includes 48 routes. All the routes in this service_id are listed at http://gtfs.transitq.com/TriMet_20220201_20220201/serviceids/U.613
@invisiblefunnel
Contributor

The following service_ids are missing in the output : [B.613,C.613,F.613,E.613,U.613,S.613]

Thanks for sharing your findings @praneethd7. I took a look at the linked GTFS file and it matches the results from partridge.

% curl -L -o trimet.zip https://transitfeeds.com/p/trimet/43/20220201/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 18.1M  100 18.1M    0     0  16.6M      0  0:00:01  0:00:01 --:--:-- 31.9M
% unzip trimet.zip calendar.txt calendar_dates.txt 
Archive:  trimet.zip
  inflating: calendar.txt            
  inflating: calendar_dates.txt  
% cat calendar_dates.txt | grep 20220124
A.613,20220124,1
D.613,20220124,1
W.613,20220124,1
Q.613,20220124,1
% cat calendar.txt 
service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
B.613,0,0,0,0,0,0,0,20220123,20220205
B.614,0,0,0,0,0,0,0,20220206,20220514
C.613,0,0,0,0,0,0,0,20220123,20220205
C.614,0,0,0,0,0,0,0,20220206,20220514
A.613,0,0,0,0,0,0,0,20220123,20220205
A.614,0,0,0,0,0,0,0,20220206,20220514
F.613,0,0,0,0,0,0,0,20220123,20220205
F.614,0,0,0,0,0,0,0,20220206,20220514
E.613,0,0,0,0,0,0,0,20220123,20220205
E.614,0,0,0,0,0,0,0,20220206,20220514
D.613,0,0,0,0,0,0,0,20220123,20220205
D.614,0,0,0,0,0,0,0,20220206,20220514
W.613,0,0,0,0,0,0,0,20220123,20220205
W.614,0,0,0,0,0,0,0,20220206,20220514
Q.613,0,0,0,0,0,0,0,20220123,20220205
Q.614,0,0,0,0,0,0,0,20220206,20220514
U.613,0,0,0,0,0,0,0,20220123,20220205
U.614,0,0,0,0,0,0,0,20220206,20220514
S.613,0,0,0,0,0,0,0,20220123,20220205
S.614,0,0,0,0,0,0,0,20220206,20220514
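
A minimal sketch of how the two files above combine under the GTFS rules: a service is active on a date if calendar.txt flags that weekday within the start/end range, and calendar_dates.txt then adds (exception_type 1) or removes (exception_type 2) services for specific dates. The rows below are copied from the output above (a subset of the .613 services); the helper names `parse` and `active_service_ids` are hypothetical, and this illustrates the spec's rule rather than partridge's actual implementation.

```python
import datetime

# calendar.txt rows from the output above (subset):
# service_id -> (weekday flags Monday..Sunday, start_date, end_date)
CALENDAR = {
    "A.613": ((0, 0, 0, 0, 0, 0, 0), "20220123", "20220205"),
    "B.613": ((0, 0, 0, 0, 0, 0, 0), "20220123", "20220205"),
    "D.613": ((0, 0, 0, 0, 0, 0, 0), "20220123", "20220205"),
    "W.613": ((0, 0, 0, 0, 0, 0, 0), "20220123", "20220205"),
    "Q.613": ((0, 0, 0, 0, 0, 0, 0), "20220123", "20220205"),
    "U.613": ((0, 0, 0, 0, 0, 0, 0), "20220123", "20220205"),
}

# calendar_dates.txt rows for 20220124 from the grep output above:
# (service_id, date, exception_type)
CALENDAR_DATES = [
    ("A.613", "20220124", 1),
    ("D.613", "20220124", 1),
    ("W.613", "20220124", 1),
    ("Q.613", "20220124", 1),
]


def parse(yyyymmdd):
    """Parse a GTFS date string such as '20220124'."""
    return datetime.datetime.strptime(yyyymmdd, "%Y%m%d").date()


def active_service_ids(day):
    """Return the service_ids active on `day` (hypothetical helper)."""
    active = set()
    # Regular service: the weekday flag must be 1 within the date range.
    for service_id, (flags, start, end) in CALENDAR.items():
        if parse(start) <= day <= parse(end) and flags[day.weekday()] == 1:
            active.add(service_id)
    # Exceptions: 1 adds service on that date, 2 removes it.
    for service_id, date, exception_type in CALENDAR_DATES:
        if parse(date) != day:
            continue
        if exception_type == 1:
            active.add(service_id)
        elif exception_type == 2:
            active.discard(service_id)
    return active


print(sorted(active_service_ids(datetime.date(2022, 1, 24))))
```

Because every weekday flag in calendar.txt is 0, only the services added through calendar_dates.txt are active on 20220124, which is the same set that read_busiest_date reported.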

The missing service_ids include both Light Rail and Bus route_types. For example, B.613 (Light Rail) includes the route 'MAX Red Line', which operates Monday-Friday and on weekends. This is perhaps the busiest line, as it connects to the Portland Airport. Also, U.613 (Bus) includes 48 routes. All the routes in this service_id are listed at http://gtfs.transitq.com/TriMet_20220201_20220201/serviceids/U.613

Take a look at http://gtfs.transitq.com/TriMet_20220201_20220201/serviceids/A.613 and this gist showing that on 20220124 trips for the MAX Red Line are covered by service_id A.613.

@praneethd7
Author

Thank you @invisiblefunnel for the quick response and the gist. I had the incorrect notion that service_ids in calendar.txt must be operational on all days between the start_date and end_date. I was also under the impression that two service_ids never share routes. After your response, I realized that although the service_ids [B.613,C.613,F.613,E.613,U.613,S.613] are not active on January 24, 2022, the routes in these service_ids are covered by ['A.613', 'D.613', 'Q.613', 'W.613'] (as reported by partridge). However, I still don't understand how the busiest day is actually determined. Is it the day with the maximum number of service_ids in the output of _service_ids_by_date()?

@invisiblefunnel
Contributor

However, I am still missing how the busiest day is actually reported. Is it the day with the maximum number of service_id in the output of _service_ids_by_date()?

Partridge uses the number of trips to approximate busyness. The earliest date is returned if multiple dates have the same number of trips.

def read_busiest_date(path: str) -> Tuple[datetime.date, FrozenSet[str]]:
    """Find the earliest date with the most trips"""
    feed = load_raw_feed(path)
    return _busiest_date(feed)

def _busiest_date(feed: Feed) -> Tuple[datetime.date, FrozenSet[str]]:
    service_ids_by_date = _service_ids_by_date(feed)
    trip_counts_by_date = _trip_counts_by_date(feed)

    def max_by(kv: Tuple[datetime.date, int]) -> Tuple[int, int]:
        date, count = kv
        return count, -date.toordinal()

    date, _ = max(trip_counts_by_date.items(), key=max_by)
    service_ids = service_ids_by_date[date]
    return date, service_ids

def _trip_counts_by_date(feed: Feed) -> Dict[datetime.date, int]:
    results: DefaultDict[datetime.date, int] = defaultdict(int)
    trips = feed.trips
    for service_ids, dates in _dates_by_service_ids(feed).items():
        trip_count = trips[trips.service_id.isin(service_ids)].shape[0]
        for date in dates:
            results[date] += trip_count
    return dict(results)
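
To see the tie-breaking in isolation, here is a small self-contained sketch of the same selection rule: rank dates by trip count, and among equal counts prefer the earliest date by negating its ordinal. The service names, trip counts, and dates are made-up toy data, and `trip_counts_by_date` and `busiest_date` are hypothetical stand-ins, not partridge's functions.

```python
import datetime

# Toy data (hypothetical): number of trips per service_id.
TRIPS_PER_SERVICE = {"A": 100, "B": 40, "C": 60}

# Toy data (hypothetical): which service_ids run on which date.
SERVICE_IDS_BY_DATE = {
    datetime.date(2022, 1, 24): frozenset({"A", "B"}),  # 140 trips
    datetime.date(2022, 1, 25): frozenset({"A", "B"}),  # 140 trips (tie)
    datetime.date(2022, 1, 29): frozenset({"C"}),       # 60 trips
}


def trip_counts_by_date(service_ids_by_date, trips_per_service):
    """Sum the trips of every service_id active on each date."""
    return {
        date: sum(trips_per_service[sid] for sid in service_ids)
        for date, service_ids in service_ids_by_date.items()
    }


def busiest_date(service_ids_by_date, trips_per_service):
    """Return the earliest date with the most trips and its service_ids."""
    counts = trip_counts_by_date(service_ids_by_date, trips_per_service)

    def max_by(kv):
        date, count = kv
        # Higher count wins; for equal counts the smaller ordinal
        # (earlier date) wins because it is negated.
        return count, -date.toordinal()

    date, _ = max(counts.items(), key=max_by)
    return date, service_ids_by_date[date]


print(busiest_date(SERVICE_IDS_BY_DATE, TRIPS_PER_SERVICE))
```

In this toy data January 24 and January 25 both have 140 trips, and the earlier date is returned. Note that the date with the most service_ids is not necessarily the date with the most trips, which is why the count of service_ids is not used for ranking.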

@praneethd7
Author

That makes sense! Thank you so much @invisiblefunnel!
