Proposal: Gap Notifications #423

Open
kixelated opened this issue Mar 28, 2024 · 9 comments
Labels
Object Model Relating to the properties of Tracks, Groups and Object

Comments

@kixelated
Collaborator

kixelated commented Mar 28, 2024

Feel free to s/SUBSCRIBE/FETCH/g in your head.

Problem

The current draft is intentionally vague, but it allows objects to be dropped in response to congestion. However, this ambiguity will cause issues for a few use-cases.

Consider a subscriber that requests a range:

-> SUBSCRIBE start=4 end=7
<- SUBSCRIBE_OK
<- OBJECT group=7
<- OBJECT group=6
<- OBJECT group=4
<- SUBSCRIBE_DONE

At this moment, group=5 has been promised as part of the SUBSCRIBE but has not arrived yet. This is perfectly normal as objects may be sent over multiple streams which arrive out of order. A reliable live (#419) and VOD player will block playback as they anticipate group=5 to arrive any moment now and they do not want to skip content.

However, if group=5 was dropped for any reason, potentially upstream, then playback will be stalled indefinitely. The only recourse is to set an arbitrary timeout and eventually assume loss, but this will cause additional buffering especially for reliable live.
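The stall described above can be sketched as a small check on the subscriber side. This is only an illustration: it assumes group IDs are sequential and the range is inclusive, and the names `missing_groups` and `can_finish` are hypothetical, not part of any MoqTransport API.

```python
from typing import List, Set


def missing_groups(start: int, end: int, accounted_for: Set[int]) -> List[int]:
    """Groups in the inclusive range [start, end] with no object and no drop notice."""
    return [g for g in range(start, end + 1) if g not in accounted_for]


def can_finish(start: int, end: int, received: Set[int], dropped: Set[int]) -> bool:
    """A reliable player may stop waiting once every group is received or known dropped."""
    return not missing_groups(start, end, received | dropped)


# SUBSCRIBE start=4 end=7, with groups 7, 6 and 4 delivered so far.
received = {7, 6, 4}
print(missing_groups(4, 7, received))             # [5]
print(can_finish(4, 7, received, dropped=set()))  # False: still waiting on group 5
print(can_finish(4, 7, received, dropped={5}))    # True: the drop was signalled
```

Without a drop signal, the only way out of the `False` case is the arbitrary timeout mentioned above.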

Furthermore, a relay that issues this request upstream will mark group=5 in its cache as pending. If a FETCH comes in for start=2 end=5, then the relay could FETCH start=2 end=3 upstream, as it can serve group=4 immediately and group=5 once it arrives at any moment.

However, if group=5 was dropped due to a transient problem (ex. congestion), then the relay would be unaware that it could refetch group=5. Indicating that a dropped object can be retried would be very useful for patching up VODs, as the relay could refill these known holes after network recovery.
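A rough sketch of the relay-side bookkeeping this implies, assuming sequential group IDs and a per-group cache state. The `GroupState` values and `groups_to_fetch_upstream` are hypothetical names used for illustration only.

```python
from enum import Enum
from typing import Dict, List


class GroupState(Enum):
    CACHED = "cached"                        # object data is available locally
    PENDING = "pending"                      # promised by an upstream SUBSCRIBE, not here yet
    DROPPED_RETRYABLE = "dropped_retryable"  # transient drop, e.g. congestion
    DROPPED_PERMANENT = "dropped_permanent"  # will never be available


def groups_to_fetch_upstream(start: int, end: int,
                             cache: Dict[int, GroupState]) -> List[int]:
    """Groups in [start, end] the relay still has to request upstream."""
    needed = []
    for group in range(start, end + 1):
        state = cache.get(group)
        # Unknown groups and retryable holes are worth (re)fetching.
        if state is None or state is GroupState.DROPPED_RETRYABLE:
            needed.append(group)
    return needed


cache = {4: GroupState.CACHED, 5: GroupState.PENDING}
print(groups_to_fetch_upstream(2, 5, cache))  # [2, 3]

cache[5] = GroupState.DROPPED_RETRYABLE       # drop notification arrived
print(groups_to_fetch_upstream(2, 5, cache))  # [2, 3, 5]
```

The point of the retryable flag is the second call: once the hole is known and marked as transient, the relay can refill it instead of waiting forever.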

Proposal

1. SUBSCRIBE_OK may contain a subset of the range requested in SUBSCRIBE.

The publisher commits to sending either an object or a drop notification within the specified range.

SUBSCRIBE_OK {
    Subscribe ID (i),
    Start Group (i),
    Start Object (i),
    End Group (i),
    End Object (i)
}

(some encoding finagling required)

Otherwise, a relay that receives a SUBSCRIBE start=0 end=10000 would be expected to send an object or dropped message for every value within the requested range, even if they left the cache long ago. For example, a DVR player might request content from 45s ago when everything older than 30s has already been dropped from the cache.

This modified SUBSCRIBE_OK message tightens the bounds to prevent this.
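As a minimal sketch of the tightening, assuming the relay only tracks the oldest group it can still serve and ignoring object-level bounds, the range returned in SUBSCRIBE_OK could be computed like this. `tighten_range` and `oldest_available` are hypothetical names.

```python
from typing import Optional, Tuple


def tighten_range(req_start: int, req_end: int,
                  oldest_available: int) -> Optional[Tuple[int, int]]:
    """Return the (start, end) group range the publisher can commit to.

    Returns None when nothing in the requested range can be served.
    """
    start = max(req_start, oldest_available)
    if start > req_end:
        return None
    return (start, req_end)


# SUBSCRIBE start=0 end=10000, but everything before group 9000 has left the cache.
print(tighten_range(0, 10000, oldest_available=9000))  # (9000, 10000)
```

The subscriber then only expects an object or a drop notification for groups inside the returned range.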

2. Add a STREAM_DROPPED message that indicates drops within the SUBSCRIBE_OK range.

The publisher indicates that it cannot deliver the entire range as originally indicated, and one or more objects were dropped.

STREAM_DROPPED Message {
  Header (STREAM_HEADER_TRACK | STREAM_HEADER_GROUP | STREAM_HEADER_OBJECT),
  Error Code (i),
  Retryable (t)
}

The publisher sends this message when:

  • It drops objects by resetting the tail of a stream.
  • It drops objects by not creating a stream.
  • It receives a STREAM_DROPPED from upstream (relay only).

This will be a relatively rare message except during congestion, with the frequency depending on the stream mode. The entire header is sent for simplicity although some fields may be useless (ex. priority).

This message would be sent on a data stream (unidirectional) to avoid unnecessary head-of-line blocking on the control stream (bidirectional). This message is reliable, so it should not be sent on the same stream with potentially unreliable OBJECT messages.

There is no drop notification for datagrams, as the publisher does not know if they were dropped.
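For a feel of the wire cost, here is a sketch that encodes only the trailing fields of the proposed message. It assumes `(i)` means a QUIC variable-length integer (RFC 9000, Section 16) and that `Retryable (t)` is a single byte holding 0 or 1; the latter is a guess, since the proposal does not define `(t)`. The stream header is omitted and `encode_stream_dropped_fields` is a hypothetical helper.

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer: the top two bits of the first byte give the length."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    if value < 2**6:
        return value.to_bytes(1, "big")
    if value < 2**14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 2**30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    if value < 2**62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")
    raise ValueError("varint is limited to 62 bits")


def encode_stream_dropped_fields(error_code: int, retryable: bool) -> bytes:
    """Error Code (i) followed by Retryable, assumed here to be one byte (0 or 1)."""
    return encode_varint(error_code) + bytes([1 if retryable else 0])


print(encode_stream_dropped_fields(37, True).hex())  # 2501
```

Under these assumptions a drop notification costs a few bytes beyond the header, which is small next to the objects it replaces.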

@kixelated
Collaborator Author

kixelated commented Mar 28, 2024

I don't think this requires sequential group_ids, although they would be really nice. With them, the MoqTransport library could expose a reliable read function, since it would know about the existence of every group/object at the MoqTransport layer.

Otherwise the MoqTransport library has to expose these dropped notifications and let the application determine the existence of each group based on some pre-negotiated scheme (ew). For example, if the group_id is used to encode the PTS, then any reliable live or VOD would require a fixed group duration or some other way of signaling group existence (ex. timeline).
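To make the "reliable read" idea concrete, here is a minimal sketch of what a library could expose if group IDs were sequential. `ReliableReader` and its methods are hypothetical and do not come from any existing MoqTransport implementation.

```python
from typing import Dict, List, Optional, Tuple


class ReliableReader:
    """Releases groups in order, skipping groups that were reported as dropped."""

    def __init__(self, start_group: int) -> None:
        self.next_group = start_group
        # group_id -> payload, or None when the group was reported dropped
        self.buffered: Dict[int, Optional[bytes]] = {}

    def on_group(self, group_id: int, payload: bytes) -> List[Tuple[int, bytes]]:
        self.buffered[group_id] = payload
        return self._drain()

    def on_dropped(self, group_id: int) -> List[Tuple[int, bytes]]:
        self.buffered[group_id] = None
        return self._drain()

    def _drain(self) -> List[Tuple[int, bytes]]:
        ready = []
        # Advance only while the next sequential group is accounted for.
        while self.next_group in self.buffered:
            payload = self.buffered.pop(self.next_group)
            if payload is not None:
                ready.append((self.next_group, payload))
            self.next_group += 1
        return ready


reader = ReliableReader(start_group=4)
print(reader.on_group(7, b"g7"))  # []
print(reader.on_group(6, b"g6"))  # []
print(reader.on_group(4, b"g4"))  # [(4, b'g4')]
print(reader.on_dropped(5))       # [(6, b'g6'), (7, b'g7')]
```

Without sequential IDs the `while` loop has no way to know which ID comes next, which is why the application would have to supply that knowledge itself.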

@suhasHere
Collaborator

+1 on being on the data path. My thinking around this is to use a tombstone object or group message. This will keep my relay and cache code agnostic to a dropped group or object, since gap info is only needed for players to decide whether or not to wait.

So my preference is to keep this on the data plane and not use a new message type, but use the existing object definition or header definition to mark something as missing.

i can propose something if it helps

@afrind
Collaborator

afrind commented Mar 28, 2024

i can propose something if it helps

I'm curious to see what you have in mind and how it is different from this proposal.

@wilaw
Contributor

wilaw commented Mar 29, 2024

-> SUBSCRIBE start=4 end=7
<- SUBSCRIBE_OK
<- OBJECT group=7
<- OBJECT group=6
<- OBJECT group=4
<- SUBSCRIBE_DONE

At this moment, group=5 has been promised as part of the SUBSCRIBE but has not arrived yet.

But herein lies the problem. Since group numbers are non-sequential, as far as the relay is concerned there is no promise of group 5, as it may never have existed in the first place. 4,6,7 is a perfectly acceptable sequence of groups. The only entities that know that 5 should exist are the original publisher and the final subscriber reading the catalog.

Gaps can exist for two reasons: the original publisher did not create the object for some reason, or a relay had the object but then dropped it when forwarding under congestion:

  • If the publisher does not create the object, it can signal this via either a control message (as proposed in this thread) or by creating a "placeholder object". This is an object which contains no payload, and in its header contains information about the error code and whether the object is retryable or not. The advantage of using the data plane to transmit this information is that this placeholder object (and the information about the drop) can be cached by every relay, whereas control messages are not cached and are not persistent.
  • If a relay drops an object due to congestion response, it should explicitly signal this via a control message. This allows the receiver to not stall out waiting for the object and based on its application knowledge, it may re-request the object, or else move on.
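A small sketch of why a placeholder object caches naturally: it is stored under the same key as a real object, so the relay cache needs no special case. The `CachedObject` fields below (`error_code`, `retryable`) are hypothetical and only mirror the description above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CachedObject:
    group_id: int
    object_id: int
    payload: bytes = b""
    error_code: Optional[int] = None  # set only on placeholders (hypothetical field)
    retryable: bool = False           # meaningful only on placeholders

    @property
    def is_placeholder(self) -> bool:
        return self.error_code is not None


cache: Dict[Tuple[int, int], CachedObject] = {}


def store(obj: CachedObject) -> None:
    """Real objects and placeholders share one code path and one cache key."""
    cache[(obj.group_id, obj.object_id)] = obj


store(CachedObject(group_id=4, object_id=0, payload=b"frame"))
store(CachedObject(group_id=5, object_id=0, error_code=1, retryable=True))

print(cache[(4, 0)].is_placeholder)  # False
print(cache[(5, 0)].is_placeholder)  # True
```

A later subscriber asking for group 5 gets the placeholder from the cache and learns about the gap without any control message being replayed.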

@kixelated
Collaborator Author

But herein lies the problem. Since group numbers are non-sequential, as far as the relay is concerned there is no promise of group 5, as it may never have existed in the first place. 4,6,7 is a perfectly acceptable sequence of groups. The only entities that know that 5 should exist are the original publisher and the final subscriber reading the catalog.

I originally had a third proposal: "sequential group_ids". It would make the protocol significantly better but I don't think it's strictly required.

The application definitely needs to know if each group exists. It could do this via sequential groups or some other mechanism like a timeline. For example, suppose a simple scheme where there is a group every 2000 units for whatever reason. The player would wait for an object or dropped notification for 0, 2000, 4000, 6000, etc.

Suppose the relay gets a SUBSCRIBE for start=456 end=7000 or something dumb. It would send that upstream and forward any received objects or dropped messages. If it receives group 4000 but decides to drop it, then it would add its own dropped message. When it receives a SUBSCRIBE_DONE end=7000, then it forwards that too.

The relay doesn't actually care if group 1234 ever arrives; only the application knows or cares. But there's an argument for sequential groups so the relay can deterministically clean up any subscription state without waiting for an unsubscribe, because it doesn't know if there's still more data pending.
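The "group every 2000 units" scheme can be sketched from the player's point of view. The function names are hypothetical, and the step of 2000 is taken from the example above; only the application knows this spacing, not the relay.

```python
from typing import List, Set


def expected_groups(start: int, end: int, step: int = 2000) -> List[int]:
    """Group IDs the application expects in [start, end] under a fixed spacing."""
    first = -(-start // step) * step  # round start up to the next multiple of step
    return list(range(first, end + 1, step))


def still_waiting(expected: List[int], received: Set[int], dropped: Set[int]) -> List[int]:
    """Expected groups with neither an object nor a dropped notification yet."""
    return [g for g in expected if g not in received and g not in dropped]


expected = expected_groups(456, 7000)
print(expected)                                              # [2000, 4000, 6000]
print(still_waiting(expected, received={2000, 6000}, dropped=set()))   # [4000]
print(still_waiting(expected, received={2000, 6000}, dropped={4000}))  # []
```

The relay never calls anything like `expected_groups`; it just forwards objects and dropped messages, which matches the division of knowledge described above.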

@fluffy
Contributor

fluffy commented Apr 3, 2024

I'm really confused on the "dropped due to congestion" part of this. How does that happen to a reliable stream? It seems like what we have in the spec for those cases is to close the subscription. I do understand dropping because the TTL was exceeded, but I'm assuming that is not the case we are worried about for VOD. I would think that for applications that need to wait for all the objects, the publisher will be designed not to have gaps in the sequence. I can see reasons to have some way of representing tombstones for objects in the data stream, but for the case here, have the producer send things sequentially. If the relay network does not deliver the data to the client, it seems the client is in a better position to detect that than anything else.

@wilaw
Contributor

wilaw commented Apr 3, 2024

I'm really confused on the "dropped due to congestion" part of this. How does that happen to a reliable stream? It seems like what we have in the spec for those cases is to close the subscription. I do understand dropping because the TTL was exceeded, but I'm assuming that is not the case we are worried about for VOD.

I don't think dropping should ever happen for VOD. In my understanding, any talk of dropping should only occur for real-time clients which cannot afford to wait for old content and would prefer to get new content if it is available. For dropping to occur, 3 conditions must be met:

  1. The original publisher must embed TTL and/or priority information in the objects, to enable stateless relays to take a drop action without having to understand the content
  2. The client must signal in its SUBSCRIBE (or FETCH) that it wants the dropping behavior (only the client knows if it is playing the content at the real-time edge or further behind)
  3. There must be congestion in the link between client and relay, such that a queue of outbound objects has built up.
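The three conditions can be read as a single predicate on the relay. This is a sketch under assumptions: `QueuedObject`, `may_drop`, and the "non-empty queue means congestion" test are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueuedObject:
    ttl_ms: Optional[int] = None    # set by the original publisher, if at all
    priority: Optional[int] = None  # set by the original publisher, if at all


def may_drop(obj: QueuedObject, subscriber_allows_drop: bool, queued_objects: int) -> bool:
    """True only when all three conditions listed above hold."""
    has_publisher_hint = obj.ttl_ms is not None or obj.priority is not None  # condition 1
    congested = queued_objects > 0                                           # condition 3
    return has_publisher_hint and subscriber_allows_drop and congested       # condition 2 in the middle


print(may_drop(QueuedObject(ttl_ms=500), subscriber_allows_drop=True, queued_objects=12))   # True
print(may_drop(QueuedObject(ttl_ms=500), subscriber_allows_drop=False, queued_objects=12))  # False
print(may_drop(QueuedObject(), subscriber_allows_drop=True, queued_objects=12))             # False
```

A VOD client would simply never set the opt-in, so the predicate stays false for it regardless of congestion.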

@suhasHere
Collaborator

  1. The original publisher must embed TTL and/or priority information in the objects, to enable stateless relays to take a drop action without having to understand the content

+1

@suhasHere
Collaborator

2. The client must signal in its SUBSCRIBE (or FETCH) that it wants the dropping behavior (only the client knows if it is playing the content at the real-time edge or further behind)

This is not totally true. If the publisher client is a realtime client, then it doesn't matter if the receiving client wants other behavior.

@ianswett ianswett added the Object Model Relating to the properties of Tracks, Groups and Object label Apr 27, 2024
ianswett added a commit that referenced this issue Apr 28, 2024
This PR adds a status field to the object that allows a relay to
indicate an object or group was lost or dropped.

It also allows a producer to indicate an object or group will not be
produced, including the cases for end of group and end of track.

The PR probably needs a bit more text on what producers and relays
do but I think this is close enough to get some initial discussion on
this solution.

Fixes #318
Fixes #427
Fixes most of #423 

Closes #334
Closes #426

6 participants