GTFS-Occupancies #240
Adds the files of MobilityData's GTFS-MeanOccupancies proposal to the GTFS Static reference.
+ add precision in mean_occupancy_status
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
This is again so over-engineered that I wonder which producer stated that they could actually make it this way. What was wrong with keeping this simple? We are talking about a prediction. Something simple like trip_id,date,stop_sequence,stop_id,occupancy_status. |
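To make the flat layout suggested above concrete, here is a minimal sketch of how a consumer might read it. Only the column order comes from the comment; the sample rows, the `SimpleOccupancy` name and the `YYYYMMDD` date format are assumptions for illustration and not part of any proposal.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from typing import List


@dataclass(frozen=True)
class SimpleOccupancy:
    """One row of the hypothetical flat file (name chosen for this sketch)."""
    trip_id: str
    service_date: date
    stop_sequence: int
    stop_id: str
    occupancy_status: int


def parse_simple_occupancies(text: str) -> List[SimpleOccupancy]:
    """Parse rows of trip_id,date,stop_sequence,stop_id,occupancy_status."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append(
            SimpleOccupancy(
                trip_id=record["trip_id"],
                # Assumption: dates follow the GTFS YYYYMMDD convention.
                service_date=datetime.strptime(record["date"], "%Y%m%d").date(),
                stop_sequence=int(record["stop_sequence"]),
                stop_id=record["stop_id"],
                occupancy_status=int(record["occupancy_status"]),
            )
        )
    return rows


# Made-up sample data, for illustration only.
SAMPLE = """trip_id,date,stop_sequence,stop_id,occupancy_status
T1,20200901,1,S1,1
T1,20200901,2,S2,3
"""
```

A layout like this repeats one row per trip, date and stop, which is the trade-off the rest of the thread weighs against weekly patterns.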
Maybe my concerns are unfounded but I'm a little worried about this as a concept. So far everything we have in GTFS is either planned or an actual real-time status. This adds another category of historical-based averages. Of course, historical performance does not equal future performance, particularly as conditions change in the real world (whether that be possible pandemics, changing seasons, special events, cyclical school calendars). Is there any guidance on how far back the mean should be calculated? Should it be updated using data from the past week, the past month, all time, the same trip last year, etc.? At what point is an average of many instances not representative of any particular future real-world instance? I'm mostly concerned that if we get into standardizing a format to allow predictions of something based on past performance but not real-time information and don't standardize the calculation of that data, it will all be unreliable because quality and meaning will vary from one producer to another. For what it's worth, I had the same concerns about #233 but was planning to wait and let the discussion play out (if it did) before commenting. Sharing here because it's actually a PR. |
From September 2020 onward we have predictions for at least the next 3-7 days available as predicted data for The Netherlands. So a journey planner might steer commuters to a different trip under COVID-19 "flattening the curve" concerns. I think 3 days is the least usable, and we are likely to use this in our GTFS-RT integration. So the quality will be different from operator to operator. And I do hope they will take into account their tap-in-tap-out stuff, including a calendar of nearby events. |
Hi @skinkie
A simple approach like the one you mentioned is possible with the current PR. The main difference being a requirement to provide a
I imagine you are referencing the per-day and per-coach features of this proposal, which I admittedly presume are demanding features in terms of primary input to produce data for this spec. The feasibility of gathering such primary data is a question that MobilityData would love some insight on from producers/agencies! I believe Cal-ITP (@e-lo) are undertaking a related market sounding that could be useful here. Similarly, the point raised by @stevenmwhite is an important one worth discussing with producers:
It would be good to know what producers' historical trend data requirements and calculations look like (as @skinkie has done 👍), and the limits/guidelines the community could establish around producing a spec like this. |
@googlebot I consent. |
LA Metro and Transit just launched this as a rider-facing feature. @gcamp - did you guys introduce a GTFS-like extension with Metro or are you getting data from them in a more proprietary format? |
@stevenmwhite For our data exchange with LA Metro it's a proprietary format, but we do use an internal GTFS extension for our own use. That format was shared with MobilityData and heavily influenced the current proposition. |
I may complicate the discussion here slightly, but at Transport for NSW we have just developed occupancy predictions for trains based off past performance, that also updates in realtime. For example a late running vehicle is likely to accumulate more passengers. Where would this kind of occupancy belong? It’s not a real-time actual like our weight bearing sensors that we disseminate using the vehpos feed, but we currently don’t have these sensors on all trains so are looking to fill in the gaps on the network. Possible example:
This would then be overridden by vehpos information where it's available. |
@Dave-TfNSW I like how you make it explicit that this is a prediction, but wouldn't it be better if we had something like an occupancy_source enumeration, such as "historic, counter, ..." |
At CitySwift we are predicting occupancy for a number of the bus companies that we work for (e.g. https://when2travel.co/ ) |
The feedback on this PR and through other channels has been very informative to the drafting process. Some updates:
I invite everyone to review and compare Option A and Option B so we can work towards a final proposal. As always, you can comment and make suggestions directly in the proposal document, or you can continue to express your needs and concerns in this thread. Thanks! |
@scmcca I'd like to echo the call by @skinkie for information about the source data used to make the predictions. We expect a huge difference in prediction quality between models based on:
To a first approximation, these types of predictions are as different as GTFS schedules and GTFS-Realtime TripUpdates! But the current proposal appears to treat them as identical. I don't think we should expect producers to specify all model inputs via enum: Is it safe to assume that if an agency's VehiclePositions feed includes realtime occupancy, then OccupancyPredictions uses realtime occupancy to make predictions? If so, we don't have to do anything. If not, we should add a boolean like occupancy_is_realtime. |
@Dave-TfNSW I think the proposal to add occupancy predictions in realtime using historical data is a valuable, but different use case from describing usual or expected occupancies at the scheduled @caywood @skinkie On having information on the source data: this is important information but hard to standardize across all methodologies. I think a limited number of enums could be of value (such as 0: Historical average, 1: Modeled predictions) for consumers to rapidly know what they are dealing with. Beyond that, more in-depth clarification would seem better communicated across other side channels between parties. I've added Option C to the working proposal document and have updated this PR. Option C takes the form of Option B, initially proposed by @skinkie, but aims to express forecasted dates more efficiently for both frequent updates, and longer-term (i.e., seasonal) updates. It accomplishes this by adding fields for every day of the week similar to Also added as per conversations above, is a What do we think about the proposed weekly pattern method vs the individual date method, and the description of the data source? I think we are close to reaching something final here! Thanks. |
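As a rough illustration of the weekly pattern method in Option C, the sketch below checks whether one occupancy row covers a given service date. The weekday and `start_date`/`end_date` field names come from the proposal; the assumption that a value of `1` in a weekday column means "applies on that day", the `YYYYMMDD` date format and the sample row are hypothetical.

```python
from datetime import date, datetime
from typing import Dict

# Ordered to match date.weekday(), where Monday is 0.
WEEKDAY_FIELDS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def parse_gtfs_date(value: str) -> date:
    """Parse a YYYYMMDD string (assumed date format)."""
    return datetime.strptime(value, "%Y%m%d").date()


def row_applies_on(row: Dict[str, str], service_date: date) -> bool:
    """Return True if the occupancy row covers the given service date."""
    start = parse_gtfs_date(row["start_date"])
    end = parse_gtfs_date(row["end_date"])
    if not start <= service_date <= end:
        return False
    # Assumption: "1" in a weekday column means the row applies on that weekday.
    return row.get(WEEKDAY_FIELDS[service_date.weekday()]) == "1"


# Hypothetical row: weekdays only, valid for September 2020.
WEEKDAY_ROW = {
    "monday": "1", "tuesday": "1", "wednesday": "1", "thursday": "1",
    "friday": "1", "saturday": "0", "sunday": "0",
    "start_date": "20200901", "end_date": "20200930",
}
```

The individual date method falls out of the same check: a row whose `start_date` and `end_date` are equal covers exactly one day, provided the matching weekday flag is set.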
We'll probably need to cover off handling frequency-based trips defined by |
PR update to Option C as appears in the GTFS-Occupancies proposal working document, replacing Option A.
Minor editorial corrections to syntax of some text.
Coming to this discussion late, but I was wondering why adding occupancy_status directly to stop_times was not considered? That way you do not have to replicate the whole service/calendar mechanism in occupancies, and it would be far easier for existing consumers to implement. You also don't run into any of the frequencies issues mentioned above. Secondly, regarding the occupancy_percentage discussions, it seems to me that if it is being added here it should probably also be added alongside departure_trip_occupancy in TripUpdates. In other words, there should be some synergy between these two (VehiclePosition, on the other hand, which has both occupancy_status and occupancy_percentage, is different, as occupancy is measurable there and an exact percentage makes sense). |
Because that assumes that the occupancy is equal for every trip regardless of the day. |
And when it's not, there is always the option of defining new services. In general, would you not expect that if trips are using the same timings across multiple days, occupancies would also generally align (within the scope of how approximate occupancy estimations are likely to be)? |
If you maintain a feed with any significant size and provide real-time data for it, you will like normalisation at some point. And for weekday trips, it is considered pretty stable in The Netherlands. |
This reverts commit b23784c.
| ----- | ----- | ----- | ----- |
| `trip_id` | ID referencing `stop_times.trip_id` | **Required** | Identifies a `trip_id` for which an occupancy level is described. |
| `stop_sequence` | ID referencing `stop_times.stop_sequence` | Optional | Identifies a `stop_sequence` along `occupancies.trip_id` for which an occupancy level is described.<br><br>Defined values in `occupancies.stop_sequence` will apply to subsequent `stop_times.stop_sequence` that are not defined in `occupancies.stop_sequence` for the same `trip_id`. |
| `occupancy_status` | Enum | **Required** | Indicates the state of in-vehicle occupancy. This field refers to the GTFS Realtime [`OccupancyStatus`](http://gtfs.org/reference/realtime/v2/#enum-occupancystatus) enums. Valid options are:<br><br> `0` - **Empty**. The vehicle is considered empty by most measures, and has few or no passengers onboard, but is still accepting passengers. <br> `1` - **Many seats available**. The vehicle has a large percentage of seats available. What percentage of free seats out of the total seats available is to be considered large enough to fall into this category is determined at the discretion of the producer. <br> `2` - **Few seats available**. The vehicle has a small percentage of seats available. What percentage of free seats out of the total seats available is to be considered small enough to fall into this category is determined at the discretion of the producer. <br> `3` - **Standing room only**. The vehicle can currently accommodate only standing passengers. <br> `4` - **Crushed standing room only**. The vehicle can currently accommodate only standing passengers and has limited space for them. <br> `5` - **Full**. The vehicle is considered full by most measures, but may still be allowing passengers to board. <br> `6` - **Not accepting passengers**. The vehicle can not accept passengers. |
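The carry-forward rule described for `stop_sequence` can be read as "use the most recent defined value at or before this stop". A minimal sketch of that reading follows; the function name is hypothetical, and returning `None` for stops before the first defined value is an assumption, since the row above does not say what applies there.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def expand_occupancies(
    defined: Dict[int, int], stop_sequences: List[int]
) -> Dict[int, Optional[int]]:
    """Map every stop_sequence of a trip to an occupancy_status.

    `defined` holds the stop_sequence -> occupancy_status pairs listed in
    occupancies.txt for one trip; `stop_sequences` holds all stop_sequence
    values of that trip in stop_times.txt.
    """
    defined_sequences = sorted(defined)
    resolved: Dict[int, Optional[int]] = {}
    for sequence in sorted(stop_sequences):
        # Index of the last defined stop_sequence that is <= this one.
        index = bisect_right(defined_sequences, sequence) - 1
        # Assumption: stops before the first defined value get no occupancy.
        resolved[sequence] = defined[defined_sequences[index]] if index >= 0 else None
    return resolved
```

For example, defining values only at stop sequences 1 and 4 of a five-stop trip yields the first value for stops 1 to 3 and the second for stops 4 and 5.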
For those who wish to provide objective historical values without the subjective categorizations, could we make this conditionally required with an option to use occupancy_percentage instead?
| `occupancy_status` | Enum | **Required** | Indicates the state of in-vehicle occupancy. This field refers to the GTFS Realtime [`OccupancyStatus`](http://gtfs.org/reference/realtime/v2/#enum-occupancystatus) enums. Valid options are:<br><br> `0` - **Empty**. The vehicle is considered empty by most measures, and has few or no passengers onboard, but is still accepting passengers. <br> `1` - **Many seats available**. The vehicle has a large percentage of seats available. What percentage of free seats out of the total seats available is to be considered large enough to fall into this category is determined at the discretion of the producer. <br> `2` - **Few seats available**. The vehicle has a small percentage of seats available. What percentage of free seats out of the total seats available is to be considered small enough to fall into this category is determined at the discretion of the producer. <br> `3` - **Standing room only**. The vehicle can currently accommodate only standing passengers. <br> `4` - **Crushed standing room only**. The vehicle can currently accommodate only standing passengers and has limited space for them. <br> `5` - **Full**. The vehicle is considered full by most measures, but may still be allowing passengers to board. <br> `6` - **Not accepting passengers**. The vehicle can not accept passengers. | | |
| `occupancy_status` | Enum | **Conditionally Required** | Indicates the state of in-vehicle occupancy. This field refers to the GTFS Realtime [`OccupancyStatus`](http://gtfs.org/reference/realtime/v2/#enum-occupancystatus) enums. Valid options are:<br><br> `0` - **Empty**. The vehicle is considered empty by most measures, and has few or no passengers onboard, but is still accepting passengers. <br> `1` - **Many seats available**. The vehicle has a large percentage of seats available. What percentage of free seats out of the total seats available is to be considered large enough to fall into this category is determined at the discretion of the producer. <br> `2` - **Few seats available**. The vehicle has a small percentage of seats available. What percentage of free seats out of the total seats available is to be considered small enough to fall into this category is determined at the discretion of the producer. <br> `3` - **Standing room only**. The vehicle can currently accommodate only standing passengers. <br> `4` - **Crushed standing room only**. The vehicle can currently accommodate only standing passengers and has limited space for them. <br> `5` - **Full**. The vehicle is considered full by most measures, but may still be allowing passengers to board. <br> `6` - **Not accepting passengers**. The vehicle can not accept passengers. <br><br>Conditionally Required:<br>- **Required** if `occupancies.occupancy_percentage` is empty.<br>- Optional otherwise. |
| `occupancy_percentage` | Float | **Conditionally Required** | Indicates the typical occupancy percentage. The value 100 should represent the total maximum occupancy the vehicle was designed for, including both seating and standing capacity, and that current operating regulations allow. The value may exceed 100 if there are more passengers than the vehicle was designed for. The precision of `occupancy_percentage` should be low enough that a single person boarding or alighting cannot be tracked, for privacy reasons. <br><br>Conditionally Required:<br>- **Required** if `occupancies.occupancy_status` is empty.<br>- Optional otherwise. |
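A validator for the suggested "one or the other" rule might look like the sketch below. The 0 to 6 range and the allowance for percentages above 100 come from the rows above; the function name, the error wording and the rejection of negative percentages are assumptions made for this example.

```python
from typing import Dict, List

VALID_STATUSES = {"0", "1", "2", "3", "4", "5", "6"}


def validate_occupancy_row(row: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one occupancies.txt row."""
    errors: List[str] = []
    status = (row.get("occupancy_status") or "").strip()
    percentage = (row.get("occupancy_percentage") or "").strip()

    # Conditionally required: at least one of the two fields must be set.
    if not status and not percentage:
        errors.append("one of occupancy_status or occupancy_percentage is required")

    if status and status not in VALID_STATUSES:
        errors.append(f"occupancy_status must be 0-6, got {status!r}")

    if percentage:
        try:
            value = float(percentage)
        except ValueError:
            errors.append(f"occupancy_percentage is not a number: {percentage!r}")
        else:
            # Values above 100 are allowed; negative values are rejected
            # here as an assumption of this sketch.
            if value < 0:
                errors.append("occupancy_percentage must not be negative")
    return errors
```

A row may also carry both fields, in which case each is checked independently.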
`occupancies.txt` is contingent on getting either `occupancy_status` or `occupancy_percentage` officially adopted in GTFS Realtime for consistency. Whichever method of describing occupancies (possibly both) that becomes official will be represented here.
Even if one is officially adopted, I'd still request that both options be available in the static version. (Or, at the very least, we may choose to release both with one being treated as an experimental field.)
@@ -406,6 +406,7 @@ Methods for occupancy forecasting used to populated `occupancies.txt` are not st | |||
| `sunday` | Enum | **Required** | Functions in the same way as `occupancies.monday` except applies to Sundays. |
| `start_date` | Date | **Required** | Start date of the date interval that the occupancy level is valid.<br><br>To define single dates, `start_date` and `end_date` may be the same. |
| `end_date` | Date | **Required** | End date of the date interval that the occupancy level is valid.<br><br>To define single dates, `start_date` and `end_date` may be the same. |
| `start_time` | Time | **Conditionally Required** | First stop departure time for a given vehicle on a trip using `frequencies.txt`.<br><br>Must be some multiple (including zero) of `frequencies.headway_secs` plus `frequencies.start_time` for the corresponding time period.<br><br>Conditionally Required:<br>- **Required** for trips using `frequencies.txt`.<br>- **Forbidden** otherwise. |
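The alignment rule for `start_time` can be checked with simple arithmetic, as in the sketch below. The function names are hypothetical, and treating `frequencies.end_time` as an exclusive upper bound is an assumption of this example rather than something stated in the row above.

```python
def parse_gtfs_time(value: str) -> int:
    """Convert HH:MM:SS (hours may exceed 24) to seconds after midnight."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def is_valid_start_time(
    start_time: str, freq_start: str, freq_end: str, headway_secs: int
) -> bool:
    """Check that start_time is freq_start plus a whole number of headways."""
    start = parse_gtfs_time(start_time)
    offset = start - parse_gtfs_time(freq_start)
    # Zero is an allowed multiple, so the first departure is always valid.
    if offset < 0 or offset % headway_secs != 0:
        return False
    # Assumption: the frequency period ends just before frequencies.end_time.
    return start < parse_gtfs_time(freq_end)
```

With a 10-minute headway starting at 08:00:00, a `start_time` of 08:20:00 passes and 08:25:00 does not.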
This will get very complex. But do we want to do something like start_time end_time so for frequency based trips a producer could do occupancy time bands? Self-note to my future self: I just shot myself in the foot.
Maybe I am missing something, but isn't `start_time` enough to identify individual "vehicle journeys" on a frequency-based trip, and then define different occupancies per `stop_sequence`, per vehicle journey? This seems to accomplish "occupancy time bands".
@scmcca in frequency based operation only exact times will have some useful meaning for start_time.
This could be what I am missing. At the risk of sounding misinformed, why would it not work for `exact_times=0`? Can't `frequencies.start_time` be used to estimate vehicle journeys?
I think that part is only possible heuristically, hence clamp to the nearest start_time. If that route is suggested that must be documented.
I agree with @skinkie that specifying a start and end time to define a window is a more natural solution for exact_times=0 trips.
For exact_times=1, you have specific trip instances that are defined in static GTFS where you can say "A bus is scheduled to arrive in 5 minutes", so start_time unambiguously maps to a single trip instance.
For exact_times=0 trips, static GTFS is just defining an expected frequency (not trip instances at specific times), so (without real-time data) you can only say "Vehicles are expected to arrive every 15 minutes" during this time window. The exact time that these trip instances are expected to occur drifts throughout the day depending on the previous trip instances, so defining just a start_time may get ambiguous. We would need defined heuristics in the spec of how specific cases would be handled.
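To show what "clamp to the nearest start_time" could mean for exact_times=0 trips, here is one possible heuristic. It is not defined anywhere in the proposal: the function name, the round-half-up tie rule and the exclusive end of the window are all assumptions, which is exactly the kind of detail the spec would have to document.

```python
def clamp_to_nominal_start(
    departure_secs: int,
    window_start_secs: int,
    window_end_secs: int,
    headway_secs: int,
) -> int:
    """Snap a departure time to the nearest nominal start in a frequency window.

    Nominal starts are window_start + k * headway for k = 0, 1, 2, ...,
    as long as they fall before window_end (assumed exclusive).
    """
    # Index of the last nominal start inside the window.
    last_index = max((window_end_secs - window_start_secs - 1) // headway_secs, 0)
    offset = departure_secs - window_start_secs
    # Integer round-half-up of offset / headway (assumed tie rule).
    index = (2 * offset + headway_secs) // (2 * headway_secs)
    # Departures outside the window clamp to its first or last nominal start.
    index = min(max(index, 0), last_index)
    return window_start_secs + index * headway_secs
```

For a 15-minute headway between 08:00 and 09:00, a departure at 08:08 maps to the 08:15 nominal start, while one at 08:58 clamps back to 08:45 because 09:00 lies outside the window. The alternative raised in the thread, explicit start and end times per occupancy band, would avoid the need for such a rule.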
Co-authored-by: Jeff Kessler <31969870+jeffkessler-keolis@users.noreply.github.com>
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
@googlebot I fixed it. |
@dbabramov @scmcca merged by accident? Did I miss a vote? |
@gcamp our benevolent dictators for life wanted to have this feature. And you see the current github workflow allows that. I don't mind, now we can exchange this info ;) |
Can't we do a vote if this would need to be reverted ;-) |
@skinkie While I like your enthusiasm, we still have to resolve officializing an occupancy indicator in GTFS Realtime, as well as how to handle frequency-based static occupancies. Indeed, the merging of this PR was a mistake. It has been reverted via a force-push. Unfortunately, a PR cannot be unmerged. A new one will be opened in place. |
Due to GitHub workflow and CLA issues, the PR had to be redone from a new branch. Please continue the discussion at #290. Thanks! |
Proposal
Given the recent momentum behind describing vehicle crowdedness in GTFS, we (MobilityData IO) are proposing GTFS-Occupancies (fka GTFS-OccupancyPredictions, fka GTFS-MeanOccupancies) as a static method for describing usual or predicted occupancies at the `trip` or `stop_times` level.
As occupancies can currently be described in GTFS Realtime using `occupancy_status` and `occupancy_percentage`, GTFS-Occupancies aims to complement the availability of crowdedness information by providing static predictions for future trips based on historical trends, which can help riders plan trips based on their crowdedness preferences and comfort.
The most up-to-date proposal is contained in this PR at the "Files changed" tab. Don't hesitate to voice your needs or concerns that can improve this proposal.
Thanks!