Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify definition of "cardinality" #19

Closed
wants to merge 1 commit into from
Closed

Conversation

barbeau
Copy link
Collaborator

@barbeau barbeau commented Feb 11, 2016

"Cardinality" as currently defined in the spec documentation is Protocol Buffer cardinality, and not semantic cardinality. This patch clarifies that definition in the field title. See https://groups.google.com/d/msg/gtfs-realtime/wm3W7QIEZ9Y/DLyWKkknJyoJ for related past discussions.

This proposal is announced on GTFS-realtime list at https://groups.google.com/forum/#!topic/gtfs-realtime/JU5I0HRlzpE.

"Cardinality" as currently defined in the spec documentation is Protocol Buffer cardinality, and not semantic cardinality.  This patch clarifies that definition in the field title.  See https://groups.google.com/d/msg/gtfs-realtime/wm3W7QIEZ9Y/DLyWKkknJyoJ.
@bboissin
Copy link
Contributor

It's a bit long, maybe Field Cardinality instead? (still not a fan)

@barbeau
Copy link
Collaborator Author

barbeau commented Feb 11, 2016

@bboissin I originally had "Protobuf Cardinality" but then changed to the full text. I can change back to this if it's preferred. Or maybe "PB Cardinality", defining "PB" above? Also open to other ideas. I think it's important to make clear that this is an implementation-specific cardinality, so I'd prefer to call out protobufs specifically.

@magdalar
Copy link
Contributor

Honestly, I think we should just replace that column's contents with the semantic cardinality instead (which we also need to document in the .proto anywhere it's unclear).

@barbeau
Copy link
Collaborator Author

barbeau commented Feb 15, 2016

@magdalar I agree - this PR was a baby-step in that direction, but I'm fine with directly tackling semantic cardinality if you'd prefer. I think there will be some debate, though, on semantic cardinality (for example #20 (comment)).

Do you want me to go ahead and propose the semantic cardinality in this PR?

@magdalar
Copy link
Contributor

@barbeau I think that would be the much more useful discussion, even if it might end up being a bit hairier to tackle :)

@barbeau
Copy link
Collaborator Author

barbeau commented Feb 19, 2016

@magdalar Two questions on semantic cardinality, using #20 (comment) and stop_sequence as an example:

  1. What should the allowed values for semantic cardinality be? I'm thinking Required, Conditionally Required, and Optional.
  2. If the language in the spec is should instead of must, is the semantic cardinality Conditionally Required?

So, would stop_sequence be Conditionally Required? It's required if a trip is a loop and the same stop_id appears more than once in the trip, and (IMHO) if the vendor is capable of providing that information (stop_sequence is only should instead of must in the spec because some vendors aren't capable of providing the info - but that doesn't mean it should be optional for everyone). My concern with listing stop_sequence as Optional in the case of should is that many implementers will just scan down the column and skip it if it's not shown as required in some manner.

So, my opinion for how the spec language should be mapped to semantic cardinality:

  • must = Required
  • should = Conditionally Required (required if the provider has the capability, and based on any other special case)
  • may = Optional

@magdalar
Copy link
Contributor

@barbeau Apologies for the delay -- somehow I lost track of this!

  1. Yes, those seem good.
  2. Hmm, I think "should" refers to "optional" cases, and "conditionally required" might be "if X, Y must..."

The other potential difference however is supporting backwards compatibility. Given at least the current state of affairs, it's difficult to add new requirements and enforce them. I'd like to change that status quo, however, because it ties our hands. My comment from pull #20 was largely addressing the backwards-compatible nature, rather than the desired intent.

I could also quite easily be convinced to your own proposed terminology. :)

@barbeau
Copy link
Collaborator Author

barbeau commented Feb 29, 2016

Hmm, I think "should" refers to "optional" cases, and "conditionally required" might be "if X, Y must..."

This was my original thought, and in a vacuum it makes sense, but the problem is that it doesn't seem to hold up in the wild. stop_sequence with repeated stops in trip is a good example - we are now showing this as should in the spec, and if we map this to *optional", then producers really don't need to include it. There are clear cases where this ends up producing data is impossible for consumers to handle correctly (e.g., when there is only a single prediction in the trip), and it puts a large burden on the consumer to try and deal with this bad data. Our work-around in OneBusAway to deal with this one data problem and avoid showing wrong information to riders is over 1700 lines of code, and we're still not covering all potential cases where something can go wrong (e.g., trips with a loop in the middle, although we don't believe these will occur with the current feed we're working with).

However, I'm also not sure I like conditionally required being mapped to should in the spec, as that could force us to switch a lot of fields that should be required to conditionally required depending on implementation limitations. And we're trying to get away from implementation-specific (protobuf) cardinality with this proposal. I'd like to see the semantic cardinality reflect what should logically be contained in a feed.

A stronger stance to change the status quo could be to start using gtfs_realtime_version attribute - we can bump this to v2.0 and start enforcing the semantic cardinality we determine here in validators, etc. That would allow us to label certain fields as required and conditionally required in the spec and say that they fail in validators to demonstrate what should be contained in a feed, but they could still be consumed if the consumer chooses to ignore or work around certain failures. Given this design, stop_sequence with repeated stops in trip would then be listed as conditionally required, and validators would fail a producer if stop_sequence wasn't included for these stops/trips.

One question with this design is whether we change should to must in the language for the spec for fields like stop_sequence with repeated stops in trip. We could keep it as should to acknowledge that there are certain implementation-specific limitations here, or we could change to must to match the intent of the semantic cardinality.

@magdalar
Copy link
Contributor

magdalar commented Mar 3, 2016

Alright, I'm convinced by your terminology choices :)

On Mon, Feb 29, 2016 at 4:27 PM, Sean Barbeau notifications@github.com
wrote:

Hmm, I think "should" refers to "optional" cases, and "conditionally
required" might be "if X, Y must..."

This was my original thought, and in a vacuum it makes sense, but the
problem is that it doesn't seem to hold up in the wild. stop_sequence
with repeated stops in trip is a good example - we are now showing this as
should in the spec, and if we map this to *optional", then producers
really don't need to include it. There are clear cases where this ends up
producing data is impossible for consumers to handle correctly (e.g., when
there is only a single prediction in the trip), and it puts a large burden
on the consumer to try and deal with this bad data. Our work-around in
OneBusAway
OneBusAway/onebusaway-application-modules#166
to deal with this one data problem and avoid showing wrong information to
riders is over 1700 lines of code, and we're still not covering all
potential cases where something can go wrong (e.g., trips with a loop in
the middle, although we don't believe these will occur with the current f
eed we'r e working with).

However, I'm also not sure I like conditionally required being mapped
to should in the spec, as that could force us to switch a lot of fields
that should be required to conditionally required depending on
implementation limitations. And we're trying to get away from
implementation-specific (protobuf) cardinality with this proposal. I'd like
to see the semantic cardinality reflect what should logically be contained
in a feed.

A stronger stance to change the status quo could be to start using
gtfs_realtime_version attribute - we can bump this to v2.0 and start
enforcing the semantic cardinality we determine here in validators, etc.
That would allow us to label certain fields as required and conditionally
required
in the spec and say that they fail in validators to demonstrate
what should be contained in a feed, but they could still be consumed if the
consumer chooses to ignore or work around certain failures. Given this
design, stop_sequence with repeated stops in trip would then be listed as conditionally
required
, and validators would fail a producer if stop_sequence wasn't
included for these stops/trips.

One question with this design is whether we change should to must in
the language for the spec for fields like stop_sequence with repeated
stops in trip. We could keep it as should to acknowledge that there are
certain implementation-specific limitations here, or we could change to
must to match the intent of the semantic cardinality.


Reply to this email directly or view it on GitHub
#19 (comment).

@barbeau
Copy link
Collaborator Author

barbeau commented Mar 8, 2016

@magdalar Are you ok then with bumping gtfs_realtime_version attribute and starting to fail feeds for v2.0 that don't include required/conditionally required fields?

@magdalar
Copy link
Contributor

@barbeau I don't think that's a call we can make directly, but I think it's worth proposing alongside the rest of the proposal. I would recommend proceeding as if it were the case though, as a specification without strong wording or guarantees is not particularly useful in practice :)

@magdalar
Copy link
Contributor

(So, yes.)

@egorich239
Copy link
Contributor

What is the status of the proposal?

@barbeau
Copy link
Collaborator Author

barbeau commented Aug 22, 2016

@egorich239 I still intend to tackle semantic cardinality in this proposal, although I've had to focus on a few other things over the last few months. I plan to be able to spend more time on this in the next few weeks.

@barbeau
Copy link
Collaborator Author

barbeau commented Feb 3, 2017

Given the age and focus of this PR, I'm going to close it out and open a new PR focused primarily on semantic cardinality. This will make the discussion easier to follow for those new to the topic.

@barbeau barbeau closed this Feb 3, 2017
barbeau added a commit to barbeau/transit that referenced this pull request Jun 28, 2017
* As originally discussed in google#19 and https://groups.google.com/forum/#!msg/gtfs-realtime/wm3W7QIEZ9Y/DLyWKkknJyoJ, the current GTFS-realtime spec documentation describes field Protocol Buffer cardinality, not semantic cardinality.  This has created confusion for consumers and producers where fields have been omitted based on them being labeled as "optional" in the GTFS-realtime spec, even if they were required under certain logical transit conditions (e.g., stop_sequence must be provided if a trip contains a loop).
* This patch changes the "Cardinality" documentation to define the semantic cardinality for each data element as "required", "conditionally required", and "optional".  It also bumps the gtfs_realtime_version in the .proto file to 2.0 so validators can strictly enforce semantic cardinality based on the gtfs_realtime_version.
RachM pushed a commit that referenced this pull request Aug 9, 2017
* Define semantic cardinality for GTFS-realtime fields

* As originally discussed in #19 and https://groups.google.com/forum/#!msg/gtfs-realtime/wm3W7QIEZ9Y/DLyWKkknJyoJ, the current GTFS-realtime spec documentation describes field Protocol Buffer cardinality, not semantic cardinality.  This has created confusion for consumers and producers where fields have been omitted based on them being labeled as "optional" in the GTFS-realtime spec, even if they were required under certain logical transit conditions (e.g., stop_sequence must be provided if a trip contains a loop).
* This patch changes the "Cardinality" documentation to define the semantic cardinality for each data element as "required", "conditionally required", and "optional".  It also bumps the gtfs_realtime_version in the .proto file to 2.0 so validators can strictly enforce semantic cardinality based on the gtfs_realtime_version.

* Make trip.stop_time_update conditionally required based on trip.schedule_relationship

Addresses #64 (review)

* Fix conditional states for stop_time_update.arrival/departure

...based on stop_time_update.schedule_relationship.  See https://github.com/google/transit/blob/master/gtfs-realtime/spec/en/reference.md#enum-schedulerelationship

* Separate "Required" and "Cardinality" into different columns

* A new "Required" field holds the information as to whether a particular field is Required, Optional, or Conditionally required
* The existing "Cardinality" field is changed to a semantic version of cardinality instead of using Protobuf cardinality

* Change FeedMessage.entity to "Conditionally required"

* This field can't be "Required", because there are legitimate scenarios where a feed may not have any real-time information about the system (e.g., if it's 3am, and the agency doesn't offer any service at 3am).  So, change to "Conditionally required" with description.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants