Standardize lookback periods #749

Closed
jean-populus opened this issue Feb 25, 2022 · 21 comments
Labels
Policy Specific to the Policy API
Milestone

Comments

@jean-populus
Collaborator

jean-populus commented Feb 25, 2022

Is your feature request related to a problem? Please describe.

For the historical data we get through MDS Provider /status_changes we only get updates from a vehicle when the vehicle state changes. That means we might not “hear” from a vehicle for several days if it just stays parked in one location. After a certain time - aka the lookback period - if we don’t “hear” from a vehicle then we assume it’s been lost and remove it from the counts.

Different cities have different ideas on how long this lookback period should be.
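To make the lookback logic above concrete, here is a minimal Python sketch. The set of in-PROW states, the vehicle IDs, and the helper name `count_prow_with_lookback` are assumptions for illustration only; they are not taken from the spec.

```python
from datetime import datetime, timedelta

# Assumed set of in-PROW states; check the MDS version you implement.
PROW_STATES = {"available", "non_operational", "reserved", "on_trip"}


def count_prow_with_lookback(last_events, now, lookback):
    """Count vehicles whose last reported state is in PROW and whose
    last event is no older than the lookback period."""
    return sum(
        1
        for state, timestamp in last_events.values()
        if state in PROW_STATES and now - timestamp <= lookback
    )


now = datetime(2022, 2, 25, 12, 0)
# Hypothetical last event per vehicle: (state, timestamp of last status change).
last_events = {
    "veh-a": ("available", now - timedelta(hours=3)),
    "veh-b": ("available", now - timedelta(days=10)),  # silent for 10 days
    "veh-c": ("removed", now - timedelta(hours=1)),
}

print(count_prow_with_lookback(last_events, now, timedelta(days=3)))
print(count_prow_with_lookback(last_events, now, timedelta(days=14)))
```

The same data yields a different count depending only on the lookback chosen, which is the inconsistency this issue is about.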

Describe the solution you'd like

It would be helpful if there was a shared understanding and recommendation on a standard lookback period. Currently each city/provider/3rd party is a separate conversation and unique lookback period.

For example, we could say if a vehicle ID is not in /vehicles then it should be considered removed until it reappears.

We could also consider how operators are tracking their vehicle "heartbeat" (#750) and integrate that into /vehicles, making the "heartbeat" the metric for the lookback period.

Is this a breaking change

  • I'm not sure

Impacted Spec

  • policy

Describe alternatives you've considered

n/a

Additional context

see 2022.02.17 working group discussion

@schnuerle schnuerle added the Policy Specific to the Policy API label Feb 28, 2022
@schnuerle schnuerle added this to the 2.0.0 milestone Feb 28, 2022
@ezmckinn
Contributor

ezmckinn commented Mar 2, 2022

I can confirm that this issue also causes confusion from the operator side, and would support capturing this in the spec in some way.

An initial question is, where would a heartbeat be represented? Is that in /status_changes, with an event_type of status_check? Or would we just track a heartbeat internally, and show our scooters in the /vehicles endpoint each time we hear from them internally? Personally I prefer an explicit representation of the heartbeat in /status_changes, but curious as to what others think.

A second question is, how often do cities need to hear from scooters? One heartbeat per day? We'll have to strike a balance between reporting enough status changes to meet cities' use cases and not cluttering the /status_changes feed.

A third question: are heartbeats state dependent? If a scooter is removed, do we need to report its heartbeat? If the primary use case here is checking up on vehicles in the PROW to verify that they are still functional, I'd suggest that status changes need only be reported for scooters in the PROW.

@jean-populus
Collaborator Author

Just noting here that even if we get a heartbeat, say every 10 minutes, we'll still need a lookback period in case the vehicle skips a check-in due to comms issues and returns at the next one. But it could be in the range of hours rather than the days that are currently being implemented.

@wellorder
Contributor

The original hope for the unknown state was that it would remove the need for lookback periods. I will try to explain, but it will be kind of lengthy.

Some context

The most common way to use MDS data is to answer the question "how many vehicles are in PROW right now?". The way everyone thinks to do this is to say "what MDS state are all the vehicles in my market in now, just count the ones in PROW states according to the spec". In 0.3 there was a problem: one could only move a vehicle into a non-PROW state via a pickup or decommissioning. But providers and cities tended to agree that after some period of time with no new events it did not make sense to count certain vehicles as in PROW, since they had likely gone missing. They had not been picked up, and perhaps not yet decommissioned (this tends to be done on a conservative basis, and cities don't like to see vehicles "come back" from being decommissioned, which does happen occasionally with "missing" vehicles), so moving them into removed seemed to not fit this situation. (elsewhere was introduced for a similar reason; the transitions to removed did not make sense for vehicles moving out of a city during a ride.)

Many cities and aggregators decided to use the notion of a "lookback", but this has the unfortunate implication that just because a vehicle is in a PROW state does not mean it will count as in PROW, which raises the question of what a "PROW state" even means if it doesn't mean "counts as in PROW". There were also complications because different cities and aggregators use different lookback windows. All this combines to mean that the exact same MDS data could be counted by different cities in different ways, which is pretty much the opposite of what you want to have happen when working with a standard.

How does unknown help?

In MDS 1.0 we introduced an unknown state to try and address this issue. There is a transition to unknown called missing. If your vehicle goes missing, then you send a missing transition for it to unknown. Now we can say that everything in, say, available should count as in PROW, because if it had been missing the provider should have said so. "PROW state" truly means "counts as in PROW", in every city. Or that was the idea.

The problem

Unfortunately, unknown currently has a similar problem with lookbacks: different cities are allowed to count vehicles in unknown states differently. This is at least explicitly called out in the spec (unknown is considered a "maybe PROW" state), as opposed to lookbacks which are mentioned nowhere. (Maybe we'll change that, I'll get to it!) So you can't say "the same MDS data will be counted the same way by all cities" still. But at least the difference is contained: you know they can only differ in how they count unknown vehicles.
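A minimal sketch of what "the difference is contained" means in practice: with unknown as the only "maybe PROW" state, any two cities' counts for the same data must fall between a lower and an upper bound. The state set and vehicle IDs below are assumptions for illustration.

```python
from collections import Counter

# Assumed set of in-PROW states; check the MDS version you implement.
PROW_STATES = {"available", "non_operational", "reserved", "on_trip"}


def prow_count_bounds(current_states):
    """Return (lower, upper) PROW counts: the lower bound excludes
    vehicles in `unknown`, the upper bound includes all of them."""
    counts = Counter(current_states.values())
    lower = sum(counts[state] for state in PROW_STATES)
    upper = lower + counts["unknown"]
    return lower, upper


# Hypothetical current state per vehicle.
current_states = {
    "veh-a": "available",
    "veh-b": "on_trip",
    "veh-c": "unknown",
    "veh-d": "removed",
    "veh-e": "unknown",
}

print(prow_count_bounds(current_states))
```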

There is a new complication as well: different providers might have different triggers for moving something to unknown. Maybe one waits 7 days before reporting missing, maybe one waits 14 days. There's also the fact that one can move to unknown with a lost_comms transition. Maybe one provider reports that after 5 minutes without comms, maybe another waits 15 minutes. Maybe a city is fine not counting "missing" vehicles but just because there's a cell outage they don't want to not count the vehicle. So there is still some ambiguity.

Aren't these kind of the same?

Let's say we standardize "how long can a vehicle go without an event before moving to unknown via missing", and we then decide that we have enough standardization, so we stop counting unknown as PROW. This is what we would like at Bird. Isn't this functionally the same as standardizing "how long can a vehicle go without an event before we stop counting it in PROW"? It's functionally the same for the question of PROW counts, which admittedly, is the most common question. But it's not the only question people ask about MDS data. There are things unknown helps with that lookbacks don't:

  • it really can't be overstated how natural "sum up the PROW states to count PROW" is. Technically adding a WHERE updated_at > foo to all of your data queries is not that hard, but if you forget it in one place, all of your reports are off and you get to have fun debugging that.
  • unknown also serves a purpose for providers. Sometimes we lose track of a vehicle, during that time it moves, and then some auditor finds the vehicle in a different location than we last reported it. They suspect the worst! But if the last update was to an unknown state then they know how this happened.
  • similarly, if we lose comms with a vehicle and it moves during that time but then we re-establish comms, without an unknown state to move through there is no approved transition just to update the vehicle's location in most Provider or Agency data (though at least we could update it in the /vehicles response).
  • you might worry that providers would abuse this ability to move things to unknown if that is all it takes to remove things from PROW. But cities can still track unknown for other purposes. Bird's position is that unknown vehicles should not count as in PROW, but cities should use "% of fleet in unknown state" as a measure of operational quality. There will always be some vehicles going missing (unfortunately), and every city has its cell service dead spots or phone company hiccups, but if one provider is reporting much more of this sort of thing than another, that indicates something about how they operate. This gives providers an incentive to keep that number low even though it does not have direct implications for operating against a cap.
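As a rough sketch of the "% of fleet in unknown state" measure mentioned in the last point: the definition of "fleet" used here (every vehicle not in removed) is an assumption, and a city might reasonably choose a different denominator.

```python
def unknown_share(current_states):
    """Fraction of the fleet currently in the `unknown` state.

    Assumption: the fleet is every vehicle whose state is not `removed`.
    """
    fleet = [state for state in current_states.values() if state != "removed"]
    if not fleet:
        return 0.0
    return sum(1 for state in fleet if state == "unknown") / len(fleet)


# Hypothetical current state per vehicle.
current_states = {
    "veh-a": "available",
    "veh-b": "unknown",
    "veh-c": "on_trip",
    "veh-d": "removed",
    "veh-e": "unknown",
}

print(unknown_share(current_states))
```

Comparing this value across providers in the same city, over the same period, is one way to read it as an operational quality signal rather than a cap input.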

Setting those things aside, the programmer in me would insist that with a lookback we never refer to any state as "in PROW" because it would never be true, since no state would be inherently "in PROW" in that world! We would want to change the language to instead talk about "how to calculate PROW counts based on MDS data". I don't think this matches people's intuition about MDS states, and I think our data model should try to match that intuition.

Final recommendation

Rather than introduce lookbacks officially, we could instead standardize how transitions to unknown states are used, and not count unknown in PROW. This should clear up confusion and make PROW count calculation more straightforward.

@jiffyclub
Contributor

@wellorder, to clarify, are you proposing that all vehicles in the unknown state not count as in the PROW, or only those that enter unknown via the missing event type?

I've seen instances where vehicles have a lot of transitions into and out of unknown on relatively short time scales via the comms event types and I would think those should continue to count as in the PROW.

@schnuerle
Member

We will be discussing this at tomorrow's working group meeting.

@wellorder
Contributor

I think it's reasonable to want to treat lost_comms transitions differently than missing; as you say, they operate on different timeframes. That said, the easy thing to do is to treat all vehicles in the unknown state the same way, and we think "not in PROW" is better. Since lost_comms is shorter this shouldn't have much impact on overall counts. Still, if it came down to "depends on the transition" I like that a little better than "depends on when it happened". There's already a little bit of this sort of thing when looking at removed, since there is a difference between things that are decommissioned versus just having been picked up by the operator even though those end up in the same state.

For MDS 2.0 we could "split" unknown into missing and out_of_communication (or some such thing) if we want to make the distinction clear at the state level rather than the state/transition level.

@jiffyclub
Contributor

Whether a vehicle state is considered in or out of the PROW matters a lot for something like determining how long a vehicle has been parked in one place, which is a common regulation we help enforce at Populus. A vehicle that enters unknown on lost_comms and then goes back to available on comms_restored some time later in the same place has clearly not moved and we would consider that a continuous in-PROW parking event. But an unknown/missing transition would terminate the parking event under this model where it's used to communicate that a vehicle is permanently lost. So I don't think it's possible to entirely say unknown is out of- or in-PROW.
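A simplified sketch of the parking-duration logic described above, not Populus's actual implementation. It assumes only `available` counts as parked, uses the event type names `comms_lost`, `comms_restored`, `missing`, and `located`, and leaves out the location comparison that a real check would need.

```python
from datetime import datetime


def parked_since(events):
    """Return the start of the current continuous parking event, or None.

    `events` is a chronological list of (timestamp, state, event_type).
    A comms_lost / comms_restored round trip keeps the parking event going;
    any other transition away from `available` ends it.
    """
    start = None
    for timestamp, state, event_type in events:
        if state == "available":
            if event_type == "comms_restored" and start is not None:
                continue  # same parking event, comms came back
            start = timestamp
        elif state == "unknown" and event_type == "comms_lost":
            continue  # temporarily silent, still treated as parked
        else:
            start = None  # e.g. unknown via missing, on_trip, removed
    return start


t0 = datetime(2022, 3, 1, 8, 0)
t1 = datetime(2022, 3, 1, 9, 0)
t2 = datetime(2022, 3, 1, 9, 20)

comms_blip = [
    (t0, "available", "trip_end"),
    (t1, "unknown", "comms_lost"),
    (t2, "available", "comms_restored"),
]
went_missing = [
    (t0, "available", "trip_end"),
    (t1, "unknown", "missing"),
    (t2, "available", "located"),
]

print(parked_since(comms_blip))    # parking event still starts at t0
print(parked_since(went_missing))  # parking event restarts at t2
```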

@mrsimpson

Isn't PROW a geographical information while "unknown" is an operational information?
After the discussion in the WG meeting, I do feel those are two different dimensions which are being mixed up.

The answer to "is a vehicle in unknown state in PROW?" can only be "we don't know – if we knew, it would not be in unknown state".
Therefore, I'd favor being very explicit about unknown / lost_sight being a state which is out-of-control for all sides => there needs to be a validation with the physical world to transition it to a defined state which allows for an in-/out-of-PROW-decision.
Therefore, this state would justify both a separate section on the "in-/out-of-PROW count report" as well as allow for a dedicated rule within a policy.

@schnuerle
Member

@wellorder could you make a PR showing how you think the use of unknown should be clarified in the spec? Then the WG could review and provide ideas.

@jean-populus
Collaborator Author

@wellorder alternatively if you could propose here how you think we could standardize transitions to unknown states then the task force can draft a PR based on the info you provide. Let us know also if you prefer to do this over a call.

@wellorder
Contributor

Sorry for the delay; there has been some team reshuffling and I no longer work as closely with our MDS implementation. Please bear with me if I use the wrong terms here and there.

I would say the main goal with unknown is to avoid the use of lookback periods as applied to PROW states, so vehicles in PROW states are always counted as being in PROW. Lookbacks make this not true, which to me makes the entire notion of a PROW state misleading. Saying "available is in PROW" and also saying "this available vehicle does not count against your PROW caps, because it's been available for a few days" is confusing. "available is PROW except when it's not"?

Having unknown be a "maybe PROW" state might still be somewhat confusing, but at least the question to be answered is clear: when should unknown count in PROW and when should it not? I suppose one could get very fancy about this. Laying out a few possibilities:

  1. unknown vehicles never count in PROW.
  2. unknown vehicles always count in PROW.
  3. We apply a multiplier to the number of unknown vehicles and add that to our PROW count, to indicate uncertainty about if those vehicles are actually in PROW. If we decide it's 50/50 that an unknown vehicle is in PROW, then we'd multiply the unknown count by .5, for example.
  4. We have a notion of lookback that only applies to the unknown state (which after all is "maybe PROW" so we don't run into the above objection we had doing so with available); vehicles in unknown count in PROW for the first X hours but after that do not count as in PROW.
  5. unknown vehicles that arrived there from a lost_comms transition count as in PROW, but those that arrived there from a missing transition don't count as in PROW. The difficulty here is that then we would need to allow some way for transitions from unknown to itself, to allow a lost_comms transition to be replaced with a missing transition.
  6. In MDS 2.0, create two separate states, say missing and non-communicative, and say non-communicative is in PROW but missing is not. This gets around the self-transition issue from 5, but has to wait for a new MDS version.
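To compare the first five possibilities side by side, here is a minimal sketch that counts the same data under each one. The policy names, the record layout, and the default multiplier and lookback are assumptions for illustration; the event type is written `comms_lost` here, while parts of this thread call it lost_comms.

```python
from datetime import datetime, timedelta

# Assumed set of in-PROW states; check the MDS version you implement.
PROW_STATES = {"available", "non_operational", "reserved", "on_trip"}
POLICIES = {"never", "always", "multiplier", "unknown_lookback", "by_transition"}


def count_prow(vehicles, now, policy, multiplier=0.5,
               lookback=timedelta(hours=24)):
    """Count PROW vehicles, treating `unknown` according to `policy`."""
    if policy not in POLICIES:
        raise ValueError(f"unsupported policy: {policy}")
    total = 0.0
    for vehicle in vehicles:
        if vehicle["state"] in PROW_STATES:
            total += 1
        elif vehicle["state"] == "unknown":
            if policy == "always":                      # option 2
                total += 1
            elif policy == "multiplier":                # option 3
                total += multiplier
            elif policy == "unknown_lookback":          # option 4
                if now - vehicle["timestamp"] <= lookback:
                    total += 1
            elif policy == "by_transition":             # option 5
                if vehicle["event_type"] == "comms_lost":
                    total += 1
            # policy == "never" (option 1) adds nothing
    return total


now = datetime(2022, 6, 1, 12, 0)
# Hypothetical latest event per vehicle.
vehicles = [
    {"state": "available", "event_type": "trip_end",
     "timestamp": now - timedelta(hours=1)},
    {"state": "unknown", "event_type": "comms_lost",
     "timestamp": now - timedelta(hours=2)},
    {"state": "unknown", "event_type": "missing",
     "timestamp": now - timedelta(days=3)},
]

for policy in sorted(POLICIES):
    print(policy, count_prow(vehicles, now, policy))
```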

At Bird currently we just do 1 or 2, depending on the city or aggregator we are working with. This raises another issue (I think brought up by @mrsimpson back when we discussed this in May): to what degree do we want how unknown works left up to the cities? Obviously the more control we allow cities the more different setups providers will have to accommodate. "Always in PROW" sometimes and "Never in PROW" sometimes is basically one switch to flip, which is not so bad. "Every city dictates their own lookback" is more work, and "Every city dictates how long until you can declare something missing" is even more (we currently base that on internal measures so we'd have to create a "translation" layer between what we consider missing and what the city does). I honestly think this is the real question that cities want answered about the unknown state. Not "how should it count", but "when should providers be able to use it in the first place?"

To sum up:

  • lookbacks are weird (to me at least) because they mean vehicles in PROW states don't contribute to our PROW count.
  • from our perspective as a provider, understanding our PROW count is very important for compliance, so that's our main focus. "When does unknown count as in PROW" is therefore the main thing we want to clarify.
  • for a city's perspective, I get the impression that the main thing they want to clarify is "When are providers allowed to use the transitions to unknown", and the question of whether or not to count it in PROW is secondary.

Pretty much any of the possibilities I laid out above are workable for answering "When does unknown count as in PROW", though picking one or two would be best in terms of understanding and supporting the requirement. But I'll be honest, I don't really have a proposal for "When are providers allowed to use the transitions to unknown"; I know what Bird uses and why, but it's based on operational and reporting considerations internally that we are just trying to communicate to cities so they have an accurate picture of our operations, and to me having that be provider-driven seems correct.

We've always held that it's fair for a city to evaluate us on how much time our vehicles spend in unknown, so we have an incentive to not inflate that number, but I don't know that a city should get to tell us how long to wait before we think we've lost communications with a scooter. (Maybe I could see it for missing?) I don't see any reason why that number should be the same across providers, as we each handle telemetry and cell tower communication and all of that differently. I guess my proposal for answering the "when are providers allowed..." question would be instead to say something like "ask your provider how they use these transitions, and feel free to use 'time in unknown' as a metric for evaluating their performance".

@wellorder
Contributor

One other thing worth mentioning: lookbacks and unknown interact poorly. We have run into the following situation:

  • we used to send our vehicles to unknown via a missing transition after 7 days
  • one of the aggregators had a lookback window of 7 days, so if they received no events for a vehicle for 7 days they would remove the vehicle from PROW. So we agreed on how long to wait before considering a vehicle missing.
  • receiving an unknown event counted as "hearing from the vehicle"
  • so if we sent data to say a vehicle had gone missing, the aggregator treated it as not missing, and if we didn't send data to say a vehicle had gone missing, the aggregator treated it as missing. So despite agreeing on the timeframe, our vehicles were being counted in PROW after this timeframe if we actually provided the city timely data, whereas if we did not our vehicles would be removed from PROW counts.

This is perverse. It rewards providers that send less data about their operations and penalizes those that send more. We ultimately just stopped sending missing transitions after 7 days, making our operations more opaque but more in line with aggregator expectations. So if we do decide to use lookbacks, we need to carve out unknown some way, because this is silly.
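A small sketch of the interaction described above, assuming the aggregator counted unknown as in PROW and applied a 7-day lookback to every state. The function name and the `unknown_counts` switch are hypothetical.

```python
from datetime import timedelta

# Assumed set of in-PROW states; check the MDS version you implement.
PROW_STATES = {"available", "non_operational", "reserved", "on_trip"}


def counted_in_prow(state, last_event_age, lookback, unknown_counts=True):
    """Aggregator-style rule: a vehicle counts if its state counts as
    PROW and its last event falls inside the lookback window."""
    state_counts = state in PROW_STATES or (unknown_counts and state == "unknown")
    return state_counts and last_event_age <= lookback


lookback = timedelta(days=7)

# Day 8 after the last real contact with two identical lost vehicles.
# Provider A reported a missing transition on day 7 (event is 1 day old).
reported = counted_in_prow("unknown", timedelta(days=1), lookback)
# Provider B stayed silent (last event, available, is 8 days old).
silent = counted_in_prow("available", timedelta(days=8), lookback)

print(reported, silent)  # the provider that reported is the one still counted

# Carving unknown out of the PROW count removes the asymmetry.
print(counted_in_prow("unknown", timedelta(days=1), lookback,
                      unknown_counts=False))
```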

@jiffyclub
Contributor

I agree that in an ideal world we wouldn't need lookback periods, but the practical case is that without them we'd still be counting vehicles where the last we heard was that it was available in 2018.

I think it's been a mistake to have a state that is "maybe PROW" and in fact @jean-populus floated the 6th idea from above to me last week: that we should deprecate the unknown state and replace it with out_of_communications and missing states, the former being in-PROW and the latter being out. That leaves the question of "when is the operator justified in transitioning from out_of_communications to missing?", but at least we'd be having that discussion in the context of states that clearly mean one thing or the other.

@wellorder
Contributor

As a provider, we're very incentivized to remove dead vehicles from our feeds, since they take up room in our caps. I'm sure you can find some exceptions (bugs and edge cases abound, unfortunately), but we have entire processes in place to try to make sure that, for example, once we write off a vehicle as a lost cause, to send a decommissioned transition to removed for it. If we fail to do that and it clogs up our PROW count, then that seems like a problem for us more than anyone else. We wanted a missing transition to indicate something more "intermediate" (e.g. we haven't seen a vehicle in 7 days, at which point it's unlikely to be exactly where we last reported it, and somewhat but not entirely unlikely to be found again), since cities don't like when things come back from being decommissioned.

We also want lost_comms mostly so if a city is auditing us and finds a vehicle somewhere other than where we last reported it, we can truthfully say "that's because we haven't heard from it, it must have been moved during the period of lost communication" and have the data trail to back that up.

The use of the two transitions is different, so I could definitely get behind turning it into 2 states. I think the question for "when is the operator justified in transitioning to missing?" doesn't depend on the state being transitioned from (in theory we could send stuff to missing from available, right?) but I agree that it's the important question.

In such a world, "send the vehicle to missing" would take the place of a lookback, wouldn't it? I guess if we always treat missing as outside of PROW we don't run into the "perverse" situation above, so I start to get less opposed to lookbacks, but I still fall back to "lookbacks mean PROW states don't always count in PROW" and this seems unfortunate. I'm not saying I couldn't make my peace with it, but I'm not convinced yet.

@jiffyclub
Contributor

In such a world, "send the vehicle to missing" would take the place of a lookback, wouldn't it?

I think it helps in that with these states operators have a sanctioned way of saying "we dunno what happened to this vehicle and assume it's gone" in a way that we all (hopefully) agree on and that will remove vehicles from in-PROW counts. I expect lookback periods to remain necessary for retiring vehicles that never get a removed or missing state transition (because, that's going to happen), but hopefully fewer vehicles will fall into that.

@jean-populus
Collaborator Author

Based on the conversations so far I'd like to propose the following changes to unknown

  • convert the unknown state to two new states: non_contactable and missing
    • We want to differentiate between temporary and more permanent states of unknown, and to avoid having a state that’s “maybe” PROW. Every vehicle state should clearly map to in- or out-of-PROW.
    • A temporary state is when there’s no heartbeat (non_contactable) but it’s unclear whether the vehicle is really missing yet or it’s just a technical glitch.
    • A more permanent state is when there’s no heartbeat for a while (24 hours) and the vehicle is presumed missing. Eventually, if the vehicle is not located, it transitions to a removed state.
    • All other PROW/not-PROW counts are based on vehicle states, so making these explicit states, instead of deciding PROW/not-PROW on transitions just for this instance, is easier and more intuitive.
  • new state non_contactable
    • In PROW: yes
    • Description: Provider has temporarily lost contact with the vehicle and its disposition is unknown.
    • Transition in: comms_lost, unspecified
    • Transition out: comms_restored, unspecified
  • new state missing
    • In PROW: no
    • Description: Provider has lost contact with the vehicle for over 24 hours and its disposition is unknown.
    • Transition in: not_located, unspecified
    • Transition out: located, unspecified
    • Note: missing event type is dropped
  • remove state unknown

New State Machine Diagram

[Image: State Machine Diagram for MDS 2.0.0 - micromobility]

workflow

  1. vehicle goes to non_contactable, comms_lost from any PROW state
    1. vehicle returns to any PROW state (other than non_contactable) with transition comms_restored when heartbeat is again detected
    2. vehicle is still counted as in PROW in non_contactable state
  2. vehicle goes to missing, not_located after 24 hours with no contact
    1. vehicle returns to PROW state with transition located when vehicle is found
    2. vehicle goes to removed after operator decides vehicle cannot be found and is no longer part of fleet
    3. vehicle is counted as out-of-PROW in missing state
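The workflow above can be sketched as a small function that derives the reported state from how long a vehicle has been silent. The 24-hour threshold comes from the proposal; the 1-hour threshold for non_contactable, the function name, and the list of pre-existing PROW states are assumptions for illustration.

```python
from datetime import timedelta

# Proposed PROW mapping for the two new states, plus a few existing ones
# (the existing entries are assumptions; check the spec version in use).
IN_PROW = {
    "available": True,
    "non_operational": True,
    "reserved": True,
    "on_trip": True,
    "non_contactable": True,
    "missing": False,
    "removed": False,
}


def next_state(last_state, silence,
               comms_timeout=timedelta(hours=1),      # assumed threshold
               missing_timeout=timedelta(hours=24)):  # from the proposal
    """Return (state, event_type) for a vehicle silent for `silence`.

    event_type is None when no transition is due.
    """
    if not IN_PROW.get(last_state, False) or last_state == "non_contactable":
        # Out-of-PROW vehicles stay put; non_contactable can only escalate.
        if last_state == "non_contactable" and silence >= missing_timeout:
            return "missing", "not_located"
        return last_state, None
    if silence >= missing_timeout:
        return "missing", "not_located"
    if silence >= comms_timeout:
        return "non_contactable", "comms_lost"
    return last_state, None


print(next_state("available", timedelta(minutes=30)))
print(next_state("available", timedelta(hours=2)))
print(next_state("non_contactable", timedelta(hours=25)))
print(next_state("removed", timedelta(hours=25)))
```

Because every state maps to exactly one value in `IN_PROW`, a PROW count is a plain sum over current states with no per-transition special cases.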

The assumption is that providers will implement this new schema and send the new vehicle states and transitions. If implemented rigorously - ie if providers update the vehicle state based on last heard 'heartbeat' as recommended - then we don't necessarily need a lookback period or separate 'heartbeat' data for most use cases (this can be up for discussion!). It could be assumed that any vehicles with no updates are still in the last reported state even if it's been several days or weeks.

@marie-x
Collaborator

marie-x commented Oct 4, 2022

I would prefer out_of_comms to non_contactable but otherwise these changes make sense to me.

@jean-populus
Collaborator Author

During the working group meeting today there was general consensus to move this forward. The next step is to see if we can define standard timing for the transitions.

  1. How long should providers wait after the last heartbeat before transitioning the vehicle state to non_contactable? How does 1 hour land with this group?
  2. How long should providers wait after the last heartbeat before transitioning the vehicle state to missing? It seems like anywhere from 24 hours to 7 days is used. Should we be conservative and use 7 days?

@mrsimpson

I can see two options for negotiating the lookback used

  • The provider explicitly denotes which lookback has been used. This could be done as meta-data in the transition to missing or via a separate API on provider.
  • The agency regulates which lookback period is granted prior to considering a vehicle out of PROW. This could be communicated as a separate API on agency or as meta-data on the policy.
  • Both documentations could very well be combined – there does not have to be a need to regulate this on agency side (but probably a high interest).

I'd prefer the effective lookback period as part of the event (so that providers can change this), I don't have a bias for the regulations-part.
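A minimal sketch of how a consumer might combine the two options, assuming a hypothetical `lookback_hours` field on the event and a hypothetical policy-level default. Neither field exists in the spec; both names are invented here for illustration.

```python
from datetime import timedelta


def effective_lookback(event, policy_default):
    """Prefer the lookback the provider declared on the event (hypothetical
    `lookback_hours` field); otherwise fall back to the agency's default."""
    hours = event.get("lookback_hours")
    if hours is None:
        return policy_default
    return timedelta(hours=hours)


policy_default = timedelta(days=7)  # hypothetical agency-side default

declared = {"event_type": "not_located", "lookback_hours": 24}
undeclared = {"event_type": "not_located"}

print(effective_lookback(declared, policy_default))
print(effective_lookback(undeclared, policy_default))
```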

In any case, I'd prefer an explicit communication over a "most probably acceptable standard" since the requirements / understandings across the globe may vary.

Does that make sense to you?

jean-populus added a commit to populus-ai/mobility-data-specification that referenced this issue Dec 15, 2022
Replaced 'unknown' with 'non-contactable' and 'missing' in the Vehicles States table to be clearer about the transition from Yes PROW -> No PROW as discussed in openmobilityfoundation#749.

Changed explanatory paragraph on 'unknown' to instead describe use of 'non-contactable' and 'missing'.

Open Question: should MDS recommend a specific time limit for the transition from 'non-contactable' to 'missing' if the city does not have an SLA around this?
jean-populus added a commit to populus-ai/mobility-data-specification that referenced this issue Dec 15, 2022
Changed 'missing' to 'not_located' since there's a new vehicle state for 'missing'. Clarified 'located' to match. See openmobilityfoundation#749 (comment)
@schnuerle
Member

I believe this issue has been resolved completely with PR #814. Please comment if you think this is correct or incorrect.

@jean-populus
Collaborator Author

jean-populus commented Jan 9, 2023 via email


7 participants