Proposal: Gap Notifications #423

kixelated · 2024-03-28T18:06:35Z

Feel free to s/SUBSCRIBE/FETCH/g in your head.

Problem

The current draft is intentionally vague, but it allows objects to be dropped in response to congestion. However, this ambiguity will cause issues for a few use-cases.

Consider a subscriber that requests a range:

-> SUBSCRIBE start=4 end=7
<- SUBSCRIBE_OK
<- OBJECT group=7
<- OBJECT group=6
<- OBJECT group=4
<- SUBSCRIBE_DONE

At this moment, group=5 has been promised as part of the SUBSCRIBE but has not arrived yet. This is perfectly normal as objects may be sent over multiple streams which arrive out of order. A reliable live (#419) and VOD player will block playback as they anticipate group=5 to arrive any moment now and they do not want to skip content.

However, if group=5 was dropped for any reason, potentially upstream, then playback will be stalled indefinitely. The only recourse is to set an arbitrary timeout and eventually assume loss, but this will cause additional buffering especially for reliable live.
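To make the stall concrete, here is a minimal Python sketch of the bookkeeping a player would need. It assumes sequential group IDs and an inclusive `start`/`end` range, and the name `pending_groups` is hypothetical rather than part of any MoqTransport API.

```python
def pending_groups(start: int, end: int, received: set) -> list:
    """Groups in [start, end] that have not arrived yet."""
    return [g for g in range(start, end + 1) if g not in received]


# The exchange above: groups 7, 6 and 4 arrived before SUBSCRIBE_DONE.
print(pending_groups(4, 7, {7, 6, 4}))  # [5]
```

Without a drop notification, the player cannot tell whether group 5 is merely late or gone for good.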

Furthermore, a relay that issues this request upstream will mark group=5 in its cache as pending. If a FETCH comes in for start=2 end=5, then the relay could FETCH start=2 end=3 upstream, as it can serve group=4 immediately and group=5 once it arrives at any moment.

However, if group=5 was dropped due to a transient problem (ex. congestion), then the relay would be unaware that it could refetch group=5. Indicating that a dropped object can be retried would be very useful for patching up VODs, as the relay could refill these known holes after network recovery.
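The relay-side reasoning can be sketched as follows. This is only an illustration under assumptions: the `GroupState` enum, the `upstream_ranges` helper and the dictionary-based cache are all hypothetical, and ranges are treated as inclusive group IDs.

```python
from enum import Enum


class GroupState(Enum):
    CACHED = "cached"                        # object(s) already in the cache
    PENDING = "pending"                      # promised by an upstream request, not yet arrived
    DROPPED_RETRYABLE = "dropped_retryable"  # upstream reported a retryable drop


def upstream_ranges(start, end, cache):
    """Return inclusive (start, end) ranges the relay still has to FETCH upstream."""
    ranges = []
    run_start = None
    for group in range(start, end + 1):
        state = cache.get(group)
        needs_fetch = state is None or state is GroupState.DROPPED_RETRYABLE
        if needs_fetch and run_start is None:
            run_start = group
        elif not needs_fetch and run_start is not None:
            ranges.append((run_start, group - 1))
            run_start = None
    if run_start is not None:
        ranges.append((run_start, end))
    return ranges


cache = {4: GroupState.CACHED, 5: GroupState.PENDING,
         6: GroupState.CACHED, 7: GroupState.CACHED}
print(upstream_ranges(2, 5, cache))  # [(2, 3)]

# Once the relay learns that group 5 was dropped but is retryable:
cache[5] = GroupState.DROPPED_RETRYABLE
print(upstream_ranges(2, 5, cache))  # [(2, 3), (5, 5)]
```

The second call only produces the extra range because the drop was signaled; a silently dropped group would stay `PENDING` forever.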

Proposal

1. `SUBSCRIBE_OK` may contain a subset of the range requested in `SUBSCRIBE`.

The publisher commits to sending either an object or a drop notification within the specified range.

SUBSCRIBE_OK {
    Subscribe ID (i),
    Start Group (i),
    Start Object (i),
    End Group (i),
    End Object (i)
}

(some encoding finagling required)

Otherwise, a relay that receives a SUBSCRIBE start=0 end=10000 would be expected to send an object or dropped message for every value within the requested range, even if they left cache long ago. For example, a DVR player requesting content from 45s ago, but everything older than 30s was already dropped from the cache.

This modified SUBSCRIBE_OK message tightens the bounds to prevent this.
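As a rough sketch of that tightening, a relay could clamp the requested start to the oldest group it still holds. The function name and the choice to return `None` for an empty range are assumptions; the proposal does not say what happens when nothing in the range is available, and object-level bounds are ignored here.

```python
from typing import Optional, Tuple


def subscribe_ok_range(req_start: int, req_end: int,
                       oldest_cached: int) -> Optional[Tuple[int, int]]:
    """Narrow a requested group range to what the relay can still commit to."""
    start = max(req_start, oldest_cached)
    if start > req_end:
        return None  # nothing left to promise (handling is an open question)
    return (start, req_end)


# SUBSCRIBE start=0 end=10000, but groups older than 9700 have left the cache.
print(subscribe_ok_range(0, 10000, oldest_cached=9700))  # (9700, 10000)
```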

2. Add a `STREAM_DROPPED` that indicates drops within the `SUBSCRIBE_OK` range.

The publisher indicates that it cannot deliver the entire range as originally indicated, and one or more objects were dropped.

STREAM_DROPPED Message {
  Header (STREAM_HEADER_TRACK | STREAM_HEADER_GROUP | STREAM_HEADER_OBJECT),
  Error Code (i),
  Retryable (t)
}

The publisher sends this message when:

It drops objects by resetting the tail of a stream.
It drops objects by not creating a stream.
It receives a STREAM_DROPPED from upstream (relay only).

This will be a relatively rare message except during congestion, with the frequency depending on the stream mode. The entire header is sent for simplicity although some fields may be useless (ex. priority).

This message would be sent on a data stream (unidirectional) to avoid unnecessary head-of-line blocking on the control stream (bidirectional). This message is reliable, so it should not be sent on the same stream with potentially unreliable OBJECT messages.

There is no drop notification for datagrams, as the publisher does not know if they were dropped.
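On the receiving side, the two parts of the proposal combine into a reader that never has to guess. The sketch below is a simplification under assumptions: it works at group granularity with sequential group IDs, ignores the error code and retryable flag, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReliableReader:
    """Releases groups in order; a drop notification unblocks the reader like an object does."""
    next_group: int
    end_group: int
    arrived: dict = field(default_factory=dict)  # group -> payload
    dropped: set = field(default_factory=set)    # groups reported via STREAM_DROPPED

    def on_object(self, group, payload):
        self.arrived[group] = payload

    def on_dropped(self, group):
        self.dropped.add(group)

    def drain(self):
        """Return payloads playable now, stopping at the first unresolved group."""
        out = []
        while self.next_group <= self.end_group:
            group = self.next_group
            if group in self.arrived:
                out.append(self.arrived.pop(group))
            elif group not in self.dropped:
                break  # neither arrived nor dropped: keep waiting
            self.next_group += 1
        return out


reader = ReliableReader(next_group=4, end_group=7)
for group in (7, 6, 4):
    reader.on_object(group, f"g{group}")
print(reader.drain())  # ['g4']  (blocked on group 5)
reader.on_dropped(5)
print(reader.drain())  # ['g6', 'g7']
```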


kixelated · 2024-03-28T19:15:15Z

I don't think this requires sequential group_ids, although it would be really nice. That way the MoqTransport library could then expose a reliable read function since it knows about the existence of every group/object at the MoqTransport layer.

Otherwise the MoqTransport library has to expose these dropped notifications and let the application determine the existence of each group based on some pre-negotiated scheme (ew). For example, if the group_id is used to encode the PTS, then any reliable live or VOD would require a fixed group duration or some other way of signaling group existence (ex. timeline).

suhasHere · 2024-03-28T23:34:36Z

+1 on being on the data path. My thinking around this is to use a tombstone object or group message. This will keep my relay and cache code agnostic to a dropped group or object, since gap info is only needed for players to decide whether to wait or not.

So my preference would be to keep this on the data plane and not use a new message type, but use the existing object definition or header definition to mark something as missing.

I can propose something if it helps

afrind · 2024-03-28T23:49:00Z

> I can propose something if it helps

I'm curious to see what you have in mind and how it is different from this proposal.

wilaw · 2024-03-29T07:57:05Z

> -> SUBSCRIBE start=4 end=7
> <- SUBSCRIBE_OK
> <- OBJECT group=7
> <- OBJECT group=6
> <- OBJECT group=4
> <- SUBSCRIBE_DONE
>
> At this moment, group=5 has been promised as part of the SUBSCRIBE but has not arrived yet.

But herein lies the problem. Since group numbers are non-sequential, as far as the relay is concerned there is no promise of group 5, as it may never have existed in the first place. 4,6,7 is a perfectly acceptable sequence of groups. The only entities that know that 5 should exist are the original publisher and the final subscriber reading the catalog.

Gaps can exist for two reasons: the original publisher did not create the object for some reason, or a relay had the object but then dropped it when forwarding under congestion:

If the publisher does not create the object, it can signal this via either a control message (as proposed in this thread) or by creating a "placeholder object". This is an object which contains no payload, and whose header contains information about the error code and whether the object is retryable or not. The advantage of using the data plane to transmit this information is that this placeholder object (and the information about the drop) can be cached by every relay, whereas control messages are not cached and are not persistent.
If a relay drops an object as a congestion response, it should explicitly signal this via a control message. This allows the receiver to avoid stalling while it waits for the object; based on its application knowledge, it may re-request the object or else move on.

kixelated · 2024-03-29T14:31:59Z

> But herein lies the problem. Since group numbers are non-sequential, as far as the relay is concerned there is no promise of group 5, as it may never have existed in the first place. 4,6,7 is a perfectly acceptable sequence of groups. The only entities that know that 5 should exist are the original publisher and the final subscriber reading the catalog.

I originally had a third proposal: "sequential group_ids". It would make the protocol significantly better but I don't think it's strictly required.

The application definitely needs to know if each group exists. It could do this via sequential groups or some other mechanism like a timeline. For example, suppose a simple scheme where there is a group every 2000 units for whatever reason. The player would wait for an object or dropped notification for 0, 2000, 4000, 6000, etc.

Suppose the relay gets a SUBSCRIBE for start=456 end=7000 or something dumb. It would send that upstream and forward any received objects or dropped messages. If it receives group 4000 but decides to drop it, then it would add its own dropped message. When it receives a SUBSCRIBE_DONE end=7000, then it forwards that too.
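The application-side bookkeeping for that scheme could look like the sketch below. The helper name `expected_groups` and the inclusive treatment of `end` are assumptions made for illustration; only the 2000-unit spacing comes from the example.

```python
def expected_groups(start: int, end: int, stride: int = 2000) -> list:
    """Group IDs the player should see an object or a drop for, given a fixed stride."""
    first = -(-start // stride) * stride  # round start up to the next multiple of stride
    return list(range(first, end + 1, stride))


# SUBSCRIBE start=456 end=7000 with a group every 2000 units.
print(expected_groups(456, 7000))  # [2000, 4000, 6000]
```

The relay never runs this logic; it simply forwards whatever objects and drop notifications fall inside the range.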

The relay doesn't actually care if group 1234 ever arrives; only the application knows or cares. But there's an argument for sequential groups so the relay can deterministically clean up any subscription state without waiting for an unsubscribe, because it doesn't know if there's still more data pending.

fluffy · 2024-04-03T13:45:06Z

I'm really confused by the "dropped due to congestion" part of this. How does that happen to a reliable stream? It seems like what we have in the spec for those cases is to close the subscription. I do understand drops because the TTL was exceeded, but I'm assuming that is not the case we are worried about for VOD. I would think that for applications that need to wait for all the objects, the publisher will be designed not to have gaps in the sequence. I can see reasons to have some way of representing tombstones for objects in the data stream, but for the case here, have the producer send things sequentially. If the relay network does not deliver the data to the client, it seems the client is in a better position to detect that than anything else.

wilaw · 2024-04-03T14:39:21Z

> I'm really confused by the "dropped due to congestion" part of this. How does that happen to a reliable stream? It seems like what we have in the spec for those cases is to close the subscription. I do understand drops because the TTL was exceeded, but I'm assuming that is not the case we are worried about for VOD.

I don't think dropping should ever happen for VOD. In my understanding, any talk of dropping should only occur for real-time clients which cannot afford to wait for old content and would prefer to get new content if it is available. For dropping to occur, 3 conditions must be met:

1. The original publisher must embed TTL and/or priority information in the objects, to enable stateless relays to take a drop action without having to understand the content
2. The client must signal in its SUBSCRIBE (or FETCH) that it wants the dropping behavior (only the client knows if it is playing the content at the real-time edge or further behind)
3. There must be congestion in the link between client and relay, such that a queue of outbound objects has built up.
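Read as a predicate, the three conditions could be sketched like this. Everything here is illustrative: `ObjectHints`, `drop_allowed` and the "any queued object counts as congestion" threshold are assumptions, not anything defined by the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectHints:
    """Drop hints embedded by the original publisher (either may be absent)."""
    ttl_ms: Optional[int] = None
    priority: Optional[int] = None


def drop_allowed(hints: ObjectHints, subscriber_opted_in: bool,
                 queued_objects: int) -> bool:
    """True only when all three conditions listed above hold."""
    publisher_hinted = hints.ttl_ms is not None or hints.priority is not None
    congested = queued_objects > 0  # assumed threshold: any outbound backlog
    return publisher_hinted and subscriber_opted_in and congested


print(drop_allowed(ObjectHints(ttl_ms=500), True, 3))   # True
print(drop_allowed(ObjectHints(ttl_ms=500), False, 3))  # False: e.g. a VOD client
```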

suhasHere · 2024-04-06T23:34:23Z

> The original publisher must embed TTL and/or priority information in the objects, to enable stateless relays to take a drop action without having to understand the content

+1

suhasHere · 2024-04-06T23:35:29Z

> 2. The client must signal in its SUBSCRIBE (or FETCH) that it wants the dropping behavior (only the client knows if it is playing the content at the real-time edge or further behind)

This is not totally true. If the publisher client is a realtime client, then it doesn't matter if the receiving client wants other behavior.

This PR adds a status field to the object that allows a relay to indicate an object or group was lost or dropped. It also allows a producer to indicate an object or group will not be produced, including the cases for end of group and end of track. The PR probably needs a bit more text on what producers and relays do, but I think this is close enough to get some initial discussion on this solution. Fixes #318 Fixes #427 Fixes most of #423 Closes #334 Closes #426

kixelated mentioned this issue Apr 3, 2024

END_OF_GROUP message #426

Closed

suhasHere mentioned this issue Apr 6, 2024

Group IDs and gaps #427

Closed

fluffy mentioned this issue Apr 10, 2024

Add object status used to indicate lost or nonexistent objects #429

Merged

ianswett mentioned this issue Apr 17, 2024

Add OBJECTS_DROPPED on the control stream #434

Closed

ianswett added the "Object Model" label (Relating to the properties of Tracks, Groups and Object) on Apr 27, 2024
