
presentationTimeOffset added twice for segment-relative cue times #595

Closed · sanbornhilland opened this issue Nov 17, 2016 · 9 comments

Labels: status: archived (Archived and locked; will not be updated), type: bug (Something isn't working correctly)

@sanbornhilland
Contributor

sanbornhilland commented Nov 17, 2016

I'm copying a conversation here from #480 because I believe it requires its own issue.

We have content that contains the following:

<SegmentTemplate timescale="10000000" presentationTimeOffset="1796832000000000" media="$RepresentationID$/Segment-$Time$.vtt">
  <SegmentTimeline>
    <S t="1796832000000000" d="20000000" r="1049" />
  </SegmentTimeline>
</SegmentTemplate>

What would be the expected result? Video and audio times are calculated correctly, but text cues are all generated starting at -1796832000000000. I'm looking at the DASH spec, but I'm still unsure how the startTimes are supposed to be calculated.

@sandersaares commented:

Specification-wise, media samples that exist in ISOBMFF containers exist on a separate timeline from the period, with presentationTimeOffset being the alignment factor. In other words, the period 00:00:00 is mapped to 179683200 seconds in the media sample timeline. So to display a piece of text at the start of the period, it would need to have the timestamp of 179683200 seconds, which is 49912:00:00.

However, media samples that exist in plain text (sidecar) files are assumed to have a timeline aligned with the period (see DASH-IF IOP 6.4.5)!
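As a quick sanity check on the numbers in that mapping (plain arithmetic, not Shaka Player code; variable names are mine):

```javascript
// timescale = 10,000,000 ticks per second; PTO in the same ticks.
const timescale = 10000000;
const presentationTimeOffset = 1796832000000000;

// Convert the offset into seconds, then hours, to recover the
// 179683200 s / 49912:00:00 figures quoted above.
const offsetSeconds = presentationTimeOffset / timescale;
const offsetHours = offsetSeconds / 3600;

console.log(offsetSeconds, offsetHours); // 179683200 49912
```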

@sandersaares Thank you for pointing me in the right direction; I had not read that.

In that section (DASH-IF IOP 6.4.5) it says:

Such external files are assumed to have a timeline aligned with the Period, so that TTML time 00:00:00.000 corresponds to the start of the Period. The presentation time offset is expected to be not presented, and if present, expected to be ignored by the DASH client.

The same applies to side-loaded WebVTT files. In that case, the @MimeType is text/vtt. If segmented subtitles are needed, such as for live sources, ISOBMFF-packaged TTML or WebVTT segments are expected to be used to provide proper timing.

According to this, it looks like Shaka should be ignoring the presentationTimeOffset if it's present. And if I'm reading it correctly, this is actually independent of the issue of relative timestamps.

I think Shaka should be ignoring the presentationTimeOffset in the manifest above.

sanbornhilland pushed a commit to sanbornhilland/shaka-player that referenced this issue Nov 18, 2016
@baconz
Contributor

baconz commented Nov 20, 2016

I recognize that this is what the spec says, and we should honor it, but it is a huge pain! It means the packager needs to be aware of ad breaks, blackouts, and key rotation so that it can set period-relative timestamps on text tracks. This feels like unnecessary complexity!

Does anybody else agree that this is an undue burden? I will create an issue with the IOP...

@sandersaares
Contributor

sandersaares commented Dec 5, 2016

From the IOP viewpoint, text should be in ISOBMFF in order to achieve interoperable playback. This also eliminates the need to care about the period timeline.

Quoting IOP section 6.4.2:

This specification does not specify interoperable playback of these “sidecar” subtitle files in combination with a DASH audio visual presentation.

@baconz
Contributor

baconz commented Dec 5, 2016

@sandersaares, so then the IOP really shouldn't have a stance on what the timeline of sidecar files should be, right? That contradicts @sanbornhnewyyz's quote above.

@sandersaares
Contributor

Well, that gets somewhat into philosophical areas. Should DASH-IF care about interoperability even when people use obsolete/nonrecommended mechanisms? If yes, implementations will be more unified but it might also lend some unwanted credibility to the approach in question. If no, it is likely that implementations will diverge and people will be unhappy with DASH as a technology.

Both approaches have upsides and downsides. The former has been selected as the favored one in this instance. I think, on balance, it is the better of the two.

@baconz
Contributor

baconz commented Dec 6, 2016

Sure, I'm all for providing guidance on divergent implementations. With that said, I maintain that it creates an undue burden on the packager to require that plaintext sidecar files provide period-relative timestamps. @sandersaares do you think the IF would consider revising this requirement?

@sandersaares
Contributor

I consider it unlikely but you can certainly file an issue on the IOP tracker and try to convince people to change it: https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues

@joeyparrish
Member

Okay, sorry for joining the conversation so late.

@sanbornhnewyyz, in your example, there's one critical thing missing from my point of view:

<SegmentTemplate timescale="10000000" presentationTimeOffset="1796832000000000" media="$RepresentationID$/Segment-$Time$.vtt">
  <SegmentTimeline>
    <S t="1796832000000000" d="20000000" r="1049" />
  </SegmentTimeline>
</SegmentTemplate>

What does this VTT file look like at t="1796832000000000"? If the manifest is correct, the timestamps in that VTT file should look something like this:

WEBVTT

49912:00:00.000 --> 49912:00:05.000
Hello, world!

Is that the case in your example? If not, I think the content is broken, not the player.

The point of the manifest is to describe the content. A web app should not have to parse an MP4 file to know the timestamp of a segment. It should know from the manifest what the start time and duration of a segment are.

To summarize what's already been said, media segments are on a different timeline from the presentation itself. So presentationTimeOffset allows the manifest to describe to the player how they should be aligned. For example, SegmentTimeline would use t=X if the segment's own PTS was X. If that segment needed to appear at time Y in the period, presentationTimeOffset would be X-Y. In this way, the manifest describes both what is in the media segment and what the player should do with it.
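That alignment rule can be sketched as a small helper (the function name and seconds-based units are my assumptions, not actual Shaka Player code):

```javascript
// Map a media-internal timestamp onto the presentation timeline:
// strip the presentation time offset, then add the period's start time.
function mediaToPresentationTime(mediaTime, presentationTimeOffset, periodStart) {
  return mediaTime - presentationTimeOffset + periodStart;
}

// A segment whose internal PTS is 123 s that should appear 3 s into
// a period starting at t = 0 needs a presentationTimeOffset of 120 s:
mediaToPresentationTime(123, 120, 0); // → 3
```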

In MediaSource, we set timestampOffset to presentationTimeOffset + period.start. We don't use MediaSource for text, so the equivalent thing for text in Shaka Player is to offset the cue's timestamps by presentationTimeOffset + period.start. This is exactly how MediaSource applies timestampOffset to the media segments you feed it.

So, imagine you have a VTT file that says this:

WEBVTT

00:03.837 --> 00:07.297
Captain's log, stardate 41636.9.

And a manifest that describes the segment like this:

<SegmentTemplate timescale="1" media="$RepresentationID$/Segment-$Time$.vtt">
  <SegmentTimeline>
    <S t="123456" d="10" />
  </SegmentTimeline>
</SegmentTemplate>

Then the manifest does not accurately describe the content. I'm not using presentationTimeOffset in this example because it doesn't matter. The media segment (VTT file) has internal timestamps just like an MP4 would, but the manifest describes that segment as having a completely different timestamp.

The manifest should always describe the media accurately. If the VTT file and manifest don't agree, one or the other needs to be adjusted. Again, this has nothing to do with presentationTimeOffset. The SegmentTimeline needs to reflect the contents of that segment - always.
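A hypothetical consistency check illustrates that rule: the segment start declared in the manifest (t / timescale) should agree with the first cue timestamp inside the VTT segment. All names here are invented for illustration.

```javascript
// Parse a WebVTT timestamp ("HH:MM:SS.mmm" or "MM:SS.mmm") into seconds.
function parseVttTimestamp(ts) {
  const parts = ts.split(':').map(Number);
  const [h, m, s] = parts.length === 3 ? parts : [0, parts[0], parts[1]];
  return h * 3600 + m * 60 + s;
}

// Compare the manifest's declared segment start to the first cue time.
function manifestMatchesSegment(t, timescale, firstCueTimestamp) {
  return Math.abs(t / timescale - parseVttTimestamp(firstCueTimestamp)) < 1;
}

// The mismatched example above: the manifest says t=123456 (timescale 1),
// but the cue starts at 00:03.837.
manifestMatchesSegment(123456, 1, '00:03.837'); // → false
```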

@joeyparrish joeyparrish self-assigned this Dec 14, 2016
@joeyparrish joeyparrish added type: question A question from the community status: waiting on response Waiting on a response from the reporter(s) of the issue labels Dec 14, 2016
@sanbornhilland
Contributor Author

@joeyparrish I see what you are saying but the segmented VTT files we are receiving have segment relative timestamps. So the example you provide:

WEBVTT

49912:00:00.000 --> 49912:00:05.000
Hello, world!

Would actually look like this:

WEBVTT

0:00:00.000 --> 0:00:05.000
Hello, world!

Hence applying any offset will cause a problem IF the assumption is that text tracks always align with the beginning of the period.

I question this:

Again, this has nothing to do with presentationTimeOffset. The SegmentTimeline needs to reflect the contents of that segment - always.

Because this seems to suggest otherwise:

Such external files are assumed to have a timeline aligned with the Period, so that TTML time 00:00:00.000 corresponds to the start of the Period. The presentation time offset is expected to be not presented, and if present, expected to be ignored by the DASH client.

At the end of the day, I don't disagree that there is a problem with the manifest we are receiving. Our proposal to fix this problem was to have the text track offset removed from the manifest. But depending on how that DASH-IF IOP is interpreted it would seem that ignoring presentation time offsets for plaintext tracks might be the way to go.

@joeyparrish joeyparrish removed the status: waiting on response Waiting on a response from the reporter(s) of the issue label Dec 22, 2016
@joeyparrish
Member

Okay, I see the problem now.

We have the media timeline, the presentation time offset, the timestamps in the segment reference object, the period start, and the presentation timeline.

Say the timestamp in a media segment is 10, as parsed from the SIDX or the manifest. We then apply the presentation time offset to that and store it in the segment reference object. The segment reference time is relative to the period start. Then, to get to the presentation timeline, we add the period start to the segment reference time.

The timestamps passed to MediaSourceEngine.appendBuffer are in the presentation timeline, meaning they have already had presentation time offset applied. This timestamp is what ends up passed to the VTT parser.

Unconditionally, we add the offset to the cue times. Then, if useRelativeCueTimestamps is set, we add the segment timestamp to the cue times. So if useRelativeCueTimestamps is set, we have accounted for the offset twice!

The solution is to either add the offset (for period-relative timestamps) or the segment time (for segment-relative timestamps), but never both.

Make sense?
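The fix described above can be sketched as follows (a hypothetical helper, not the actual patch): apply exactly one of the two adjustments to each cue, never both.

```javascript
// Adjust a cue's time onto the presentation timeline.
function adjustCueTime(cueTime, segmentStart, offset, useRelativeCueTimestamps) {
  if (useRelativeCueTimestamps) {
    // Segment-relative cues: segmentStart already has the presentation
    // time offset (and period start) factored in, so add only that.
    return cueTime + segmentStart;
  }
  // Period-relative cues: add only the offset.
  return cueTime + offset;
}

adjustCueTime(5, 100, 30, true);  // → 105 (segment-relative)
adjustCueTime(5, 100, 30, false); // → 35  (period-relative)
```

The buggy behavior was the equivalent of adding both terms, which counted the presentation time offset twice for segment-relative cues.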

@joeyparrish joeyparrish added type: bug Something isn't working correctly and removed type: question A question from the community labels Jan 6, 2017
@joeyparrish joeyparrish changed the title Ignore presentationTimeOffset for text tracks presentationTimeOffset added twice for segment-relative cue times Jan 6, 2017
@joeyparrish joeyparrish added this to the v2.1.0 milestone Jan 6, 2017
joeyparrish added a commit that referenced this issue Jan 28, 2017
When using segment-relative timestamps in VTT, the presentation
offset has already been factored into the segment time.  Therefore,
it should not be added again in the VTT parser.

Closes #595
Closes #599

Change-Id: I9d062af7a17859f6f3374ecf20369b361f3eac7b
@shaka-project shaka-project locked and limited conversation to collaborators Mar 22, 2018
@shaka-bot shaka-bot added the status: archived Archived and locked; will not be updated label Apr 15, 2021