Segmented VTT - absolute or relative timestamps? #480

Closed
sanbornhilland opened this issue Aug 9, 2016 · 21 comments
Labels
status: archived Archived and locked; will not be updated type: enhancement New feature or request

@sanbornhilland
Contributor

WebVTT segments are not syncing properly with live streams because they are not accounting for the segmentStartTime. The first cue in each segment starts at 00:00:00.000 so the segmentStartTime needs to be used to offset each segment properly.
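A minimal sketch of the offset described here, assuming cue times inside each segment restart at zero and that all times are in seconds. The names `offsetCues`, `startTime`, and `endTime` are illustrative and are not taken from Shaka Player's API.

```javascript
// Hypothetical helper: shift segment-relative cue times onto the
// presentation timeline by adding the segment's start time (seconds).
function offsetCues(cues, segmentStartTime) {
  return cues.map((cue) => ({
    ...cue,
    startTime: cue.startTime + segmentStartTime,
    endTime: cue.endTime + segmentStartTime,
  }));
}

// A cue that starts at 0 inside a segment beginning at 120 s
// (the segment start value is made up for illustration).
const shifted = offsetCues(
    [{startTime: 0, endTime: 0.698, text: 'SO WE HAVE A CHANGE OF THE'}],
    120);
```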

@joeyparrish
Member

I would not expect the first cue in each segment to start at 0. What packager are you using?

@sanbornhilland
Contributor Author

#481 has more discussion. It seems unclear whether VTT timestamps are supposed to be absolute or relative.

@joeyparrish joeyparrish added the type: question A question from the community label Aug 9, 2016
@joeyparrish joeyparrish changed the title Live VTT not accounting for segmentStartTime Segmented VTT - absolute or relative timestamps? Aug 9, 2016
@joeyparrish
Member

We treat them as absolute because that's how timestamps work in all other kinds of media segments.

@joeyparrish
Member

Are you able to share any details on the encoder vendor or packager software you're using?

@sanbornhilland
Contributor Author

I'm working on getting you details and a test stream I can share.

@sanbornhilland
Contributor Author

@joeyparrish I emailed you a test stream.

@sanbornhilland
Contributor Author

Is there any update on this? The stream I emailed @joeyparrish will expire on Sept. 7, so if there is an opportunity to discuss this issue before then, it would be useful.

@joeyparrish
Member

Sorry for the delay in my response.

I ran into a small, unrelated issue with your manifest that caused us to ignore the WebVTT text:

  <AdaptationSet mimeType="text/vtt" ...>
    <Representation codecs="vtt" ... />
  </AdaptationSet>

Our parser is registered for text/vtt, but not text/vtt; codecs="vtt". This is easy to fix by registering the additional string. I'll take care of this shortly.
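The lookup problem can be sketched with a small registry keyed by the full MIME string. This is an illustration only: `parsersByMime`, `registerParser`, and `fullMimeType` are hypothetical names in a self-contained example, not Shaka Player's actual registration API.

```javascript
// Hypothetical parser registry keyed by the full MIME string,
// including any codecs parameter.
const parsersByMime = new Map();

function registerParser(mimeType, parser) {
  parsersByMime.set(mimeType, parser);
}

// Build the lookup key from the manifest's mimeType and codecs attributes.
function fullMimeType(mimeType, codecs) {
  return codecs ? `${mimeType}; codecs="${codecs}"` : mimeType;
}

// Placeholder parser for illustration; a real one would return cues.
const vttParser = (data) => [];

// Registering only the first string misses streams that declare codecs.
registerParser('text/vtt', vttParser);
registerParser('text/vtt; codecs="vtt"', vttParser);

const found = parsersByMime.get(fullMimeType('text/vtt', 'vtt'));
```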

As for the timestamps, we treat everything generically. Since timestamps in video segments are relative to the period, so should text timestamps be.

The only exception to this so far has been WebVTT embedded in MP4. The atom containing the cue does not actually contain a timestamp. The format specifies that the cue time should be the segment time. We treat this differently than the others because this is part of the spec.

Now, looking at your text content, I see this:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:2214643072,LOCAL:00:00:00.000

00:00:00.000 --> 00:00:00.698 position:10%,start align:left line:19%
<c.darkgray>SO WE HAVE A CHANGE OF THE</c>

I researched X-TIMESTAMP-MAP, and it seems to be an HLS extension and not part of the WebVTT spec. This page from Brightcove states:

Adding the X-TIMESTAMP-MAP header may cause inconsistencies on other platforms, such as Windows Mobile applications. Make sure to test across all platforms after adding the offset header. The addition of this header tag is an HLS spec requirement, and not a WebVTT spec parameter.

I think it makes more sense to add support for X-TIMESTAMP-MAP than to add a configuration option to decide if timestamps are relative or not. Thoughts?
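For reference, reading the header shown above could look roughly like the following. This is a sketch under assumptions: the function names are invented for illustration, only the `MPEGTS` and `LOCAL` fields seen in the sample are handled, and it does not reflect how Shaka Player parses WebVTT.

```javascript
// Parse a WebVTT-style timestamp ([hh:]mm:ss.mmm) into seconds.
function parseVttTime(text) {
  const match = /^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(text);
  if (!match) return null;
  const [, h = '0', m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Parse an X-TIMESTAMP-MAP header line.
// Returns null when the line is not such a header or is malformed.
function parseTimestampMap(line) {
  const prefix = 'X-TIMESTAMP-MAP=';
  if (!line.startsWith(prefix)) return null;
  const fields = {};
  for (const part of line.slice(prefix.length).split(',')) {
    const idx = part.indexOf(':');  // split on the first colon only
    if (idx < 0) return null;
    fields[part.slice(0, idx)] = part.slice(idx + 1);
  }
  if (!('MPEGTS' in fields) || !('LOCAL' in fields)) return null;
  const mpegts = Number(fields.MPEGTS);
  const local = parseVttTime(fields.LOCAL);
  if (!Number.isFinite(mpegts) || local === null) return null;
  return {mpegts, local};  // mpegts in ticks, local in seconds
}

const timestampMap = parseTimestampMap(
    'X-TIMESTAMP-MAP=MPEGTS:2214643072,LOCAL:00:00:00.000');
```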

@joeyparrish joeyparrish added type: enhancement New feature or request and removed type: question A question from the community labels Sep 1, 2016
@joeyparrish
Member

Actually, no, that won't work after all. Looking more closely at the numbers, the maps in the WebVTT files don't match up to the DASH presentation timeline at all.

For example, the segment at time 144323839, timescale 1000 contains a map that says it should be offset to 2560783072. Comparing numbers across segments, I see that 1k in the DASH timescale seems to equal 90k in the MPEG2 timescale used by whatever generated the VTT.
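The mismatch can be checked with simple arithmetic. The sketch below assumes the map value is in the 90 kHz MPEG-2 clock, as inferred above, and the variable names are illustrative.

```javascript
// Convert a tick count to seconds for a given timescale (ticks per second).
function ticksToSeconds(ticks, timescale) {
  return ticks / timescale;
}

// DASH segment time, timescale 1000.
const dashSeconds = ticksToSeconds(144323839, 1000);
// X-TIMESTAMP-MAP value from the same segment, assumed 90 kHz.
const mpegtsSeconds = ticksToSeconds(2560783072, 90000);

// The two clocks are many hours apart, so the map cannot be applied
// directly to the DASH presentation timeline.
const mismatchSeconds = dashSeconds - mpegtsSeconds;
```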

I'm increasingly convinced now that these timestamps are something the encoder needs to fix. They are taking something meant for HLS and just serving it as DASH, which doesn't make sense. Have you reached out to them for support?

@joeyparrish joeyparrish added type: question A question from the community and removed type: enhancement New feature or request labels Sep 1, 2016
shaka-bot pushed a commit that referenced this issue Sep 1, 2016
Related to issue #480

Change-Id: I0ef6d479e496ba45e6c4f984e8f7dc5e218c5175
@sanbornhilland
Contributor Author

We've talked to the stream provider and received the following response:

  1. The unexpected "X-TIMESTAMP-MAP" attribute in webvtt files will be removed. This will also address the timescale issue since the timescales are tied to those tags.

  2. Other players, e.g. ExoPlayer, work with the relative timestamps, so changing the stream would break playback elsewhere. Their suggestion is to put the fix from #481 ("Account for segmentStartTime in vtt segments to handle live") behind a configuration flag to maintain backwards compatibility with other players.

@sandersaares
Contributor

sandersaares commented Sep 9, 2016

I note that DASH-IF IOP v3.3, section 6.4.5 forbids the use of plaintext TTML/WebVTT text in multiple segments, as quoted below.

Only one file for the full period is permitted, practically limiting this use case to non-live content.

Such external files are assumed do have a timeline aligned with the Period, so that TTML time 00:00:00.000 corresponds to the start of the Period. [...]

The same applies to side-loaded WebVTT files. In that case, the @MimeType is text/vtt. If segmented subtitles are needed, such as for live sources, ISOBMFF-packaged TTML or WebVTT segments are expected be used to provide proper timing.

Therefore the use of such content is dubious at best. You should be using ISOBMFF encapsulation for text streams.

@joeyparrish joeyparrish self-assigned this Sep 12, 2016
@joeyparrish
Member

Thanks, Sander. It appears that we only have one demo asset that violates this part of IOP v3.3, and it's one we created for testing: http://storage.googleapis.com/shaka-demo-assets/tos-pto-webvtt/dash.mpd

shakaAssets.testAssets.filter(function(a) {
  return a.features.includes(shakaAssets.Feature.SEGMENTED_TEXT) &&
      !a.features.includes(shakaAssets.Feature.EMBEDDED_TEXT);
});

Since that's our own home-made test asset, we can change it to align with whatever we determine is the best practice for this non-IOP-compliant situation.

Does anyone have examples of public test content that features segmented text not embedded in ISO-BMFF? I'd like to do a survey of what's out there before we make any changes.

If you can provide a test stream, please do. If you can't, please just state whether your segments' cue times are relative to the period or to the segment. Also, please state what encoder/packager you use.

@joeyparrish
Member

@baconz, I see you on the ExoPlayer thread. Can you weigh in on this?

@baconz
Contributor

baconz commented Sep 12, 2016

We built our VTT packager to conform with Shaka's period-relative timestamps. We can change them since it seems like everybody is switching to segment-relative timestamps.

@joeyparrish joeyparrish added type: enhancement New feature or request and removed type: question A question from the community labels Oct 3, 2016
@joeyparrish
Member

Okay. There were literally zero responses to my attempted survey on the mailing list. We will change to segment-relative timestamps in v2.1.0. PRs are welcome, or we'll get to it ourselves, eventually.

@joeyparrish joeyparrish removed their assignment Oct 3, 2016
@baconz
Contributor

baconz commented Oct 3, 2016

@joeyparrish Any chance of adding a legacy flag to soften the transition for us? I can try to put up the PR, but probably won't get to it this week.

@joeyparrish
Member

Sure, that could work. If we introduce a setting for this, we could even put it into v2.0.x (default to current behavior, warn about impending deprecation when used). Then in v2.1.0 we could just remove the setting.
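One way such a transitional setting could behave is sketched below. The option name `useRelativeCueTimestamps` is borrowed from the commits referenced later in this thread; the function `resolveCueTime`, its signature, and the warning text are assumptions made for illustration, not the merged implementation.

```javascript
// Hypothetical resolver for a cue time under the transitional setting.
// All times are in seconds; `warn` is injectable so callers can capture it.
function resolveCueTime(cueTime, segmentStart, useRelativeCueTimestamps,
                        warn = console.warn) {
  if (useRelativeCueTimestamps) {
    // New behavior: cue times count from the start of the segment.
    return segmentStart + cueTime;
  }
  // Legacy behavior: cue times are already relative to the period.
  warn('Period-relative WebVTT cue timestamps are deprecated.');
  return cueTime;
}

const warnings = [];
const collect = (msg) => warnings.push(msg);
const legacy = resolveCueTime(125, 120, false, collect);
const relative = resolveCueTime(5, 120, true, collect);
```

Defaulting the flag to the legacy path keeps existing streams working, while the warning tells integrators that the default will change.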

@joeyparrish joeyparrish added this to the v2.1.0 milestone Oct 4, 2016
@sanbornhilland
Contributor Author

This looks good. I will try to get a PR in for your review.

sanbornhilland pushed a commit to sanbornhilland/shaka-player that referenced this issue Oct 6, 2016
ismena pushed a commit that referenced this issue Oct 24, 2016
* Add config option for using segment relative timestamps for VTT

Fix for #480

* Make useRelativeCueTimestamps a non-nullable param

* Update tests for the new useRelativeCueTimestamps param

* Move period relative timestamp deprecation warning to vtt parser

* Log warning only if using absolute timestamps in text cue

* Fix vtt text parser test
joeyparrish pushed a commit that referenced this issue Oct 25, 2016
* Add config option for using segment relative timestamps for VTT

Fix for #480

* Make useRelativeCueTimestamps a non-nullable param

* Update tests for the new useRelativeCueTimestamps param

* Move period relative timestamp deprecation warning to vtt parser

* Log warning only if using absolute timestamps in text cue

* Fix vtt text parser test
@sanbornhilland
Contributor Author

Related to this issue, if we have the following in the mpd:

<SegmentTemplate timescale="10000000" presentationTimeOffset="1796832000000000" media="$RepresentationID$/Segment-$Time$.vtt">
  <SegmentTimeline>
    <S t="1796832000000000" d="20000000" r="1049" />
  </SegmentTimeline>
</SegmentTemplate>

what would be the expected result? Video and audio times are calculated correctly but text cues are all generated starting at -1796832000000000. I'm looking at the DASH spec but I'm still unsure how the startTimes are supposed to be calculated.
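To make the numbers concrete, here is a rough sketch of how the `SegmentTimeline` above expands into period-relative start times when `presentationTimeOffset` is subtracted. The function name and object shape are illustrative assumptions; the values come from the manifest snippet.

```javascript
// Expand one <S t d r> entry into period-relative start times (seconds).
// This models media whose timestamps live on the media timeline, where
// presentationTimeOffset corresponds to the start of the period.
function segmentStarts({t, d, r = 0}, timescale, presentationTimeOffset) {
  const starts = [];
  for (let i = 0; i <= r; i++) {  // r repeats means r + 1 segments
    starts.push((t + i * d - presentationTimeOffset) / timescale);
  }
  return starts;
}

const starts = segmentStarts(
    {t: 1796832000000000, d: 20000000, r: 1049},
    10000000, 1796832000000000);

// A text cue stamped 0 that has the same offset subtracted lands far in
// the negative, consistent with the negative start times described above.
const misplacedCueSeconds = (0 - 1796832000000000) / 10000000;
```

The values stay below `Number.MAX_SAFE_INTEGER`, so plain numbers are sufficient here; larger timescales might call for `BigInt`.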

@sandersaares
Contributor

sandersaares commented Nov 9, 2016

Specification-wise, media samples that exist in ISOBMFF containers exist on a separate timeline from the period, with presentationTimeOffset being the alignment factor. In other words, the period 00:00:00 is mapped to 179683200 seconds in the media sample timeline. So to display a piece of text at the start of the period, it would need to have the timestamp of 179683200 seconds, which is 49912:00:00.

However, media samples that exist in plain text (sidecar) files are assumed to have a timeline aligned with the period (see DASH-IF IOP 6.4.5)!
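The two cases can be sketched side by side. The helper names `toPeriodTime` and `formatHms` are hypothetical, and times are in seconds; the sidecar branch follows the DASH-IF IOP 6.4.5 reading given above.

```javascript
// Map a sample timestamp (seconds) to period time.
// isSidecar: true for plain-text files such as text/vtt, false for ISOBMFF.
function toPeriodTime(timestamp, presentationTimeOffset, isSidecar) {
  // Sidecar timelines are assumed aligned with the period, so the
  // offset is ignored for them.
  return isSidecar ? timestamp : timestamp - presentationTimeOffset;
}

// Format whole seconds as h:mm:ss, allowing hours beyond 24.
function formatHms(totalSeconds) {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  const pad = (n) => String(n).padStart(2, '0');
  return `${h}:${pad(m)}:${pad(s)}`;
}

const ptoSeconds = 1796832000000000 / 10000000;  // 179683200
const ptoLabel = formatHms(ptoSeconds);          // "49912:00:00"
const isobmffStart = toPeriodTime(ptoSeconds, ptoSeconds, false);  // 0
const sidecarStart = toPeriodTime(0, ptoSeconds, true);            // 0
```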

@sanbornhilland
Contributor Author

@sandersaares Thank you for pointing me in the right direction, I had not read that.

In that section (DASH-IF IOP 6.4.5) it says:

Such external files are assumed do have a timeline aligned with the Period, so that TTML time 00:00:00.000 corresponds to the start of the Period. The presentation time offset is expected to be not presented, and if present, expected to be ignored by the DASH client.

The same applies to side-loaded WebVTT files. In that case, the @MimeType is text/vtt. If segmented subtitles are needed, such as for live sources, ISOBMFF-packaged TTML or WebVTT segments are expected be used to provide proper timing.

According to this, it looks like Shaka should be ignoring the presentationTimeOffset if it's present. And if I am reading it correctly, this is actually independent of the issue of relative timestamps.

@joeyparrish does this look correct to you? I can open a separate issue for this and likely provide a PR.

shaka-bot pushed a commit that referenced this issue Apr 5, 2017
The text parsers were all stateless. This caused problems with MP4
VTT as the timescale is needed later on for other boxes. This changes
parsers to carry state.

How time is referenced with the text parsers is not clear and has
caused confusion. In v2.0.1, we introduced the useRelativeCueTimestamps
option to control the behavior of our WebVTT parser. We decided in #480
(comment) that we would remove this option in v2.1.0. All WebVTT
timestamps in v2.1.0 will be relative to the segment time. This change
creates a new time context interface that will be used to help limit
the confusion around how time is communicated.

Closes #726

Change-Id: I67409608c35d2d5abb8b8b25529859cb37f8f0a8
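
The idea in this commit message, a parser that keeps state and receives an explicit time context, might look something like the sketch below. The class, method, and field names (`SketchVttParser`, `parseInit`, `parseMedia`, `periodStart`, `segmentStart`, `segmentEnd`) are assumptions chosen for illustration and should not be read as the interface the commit introduced.

```javascript
// Hypothetical stateful text parser with an explicit time context.
class SketchVttParser {
  constructor() {
    this.timescale = null;  // state remembered between calls
  }

  // Init data (for example an MP4 init segment) supplies the timescale once.
  parseInit(init) {
    this.timescale = init.timescale;
  }

  // `time` is the time context; all of its values are in seconds.
  // Cue times are treated as relative to the start of the segment.
  parseMedia(rawCues, time) {
    if (this.timescale === null) throw new Error('parseInit was not called');
    return rawCues.map((c) => ({
      start: time.segmentStart + c.startTicks / this.timescale,
      end: time.segmentStart + c.endTicks / this.timescale,
      text: c.text,
    }));
  }
}

const parser = new SketchVttParser();
parser.parseInit({timescale: 1000});
const cues = parser.parseMedia(
    [{startTicks: 0, endTicks: 698, text: 'SO WE HAVE A CHANGE OF THE'}],
    {periodStart: 0, segmentStart: 120, segmentEnd: 122});
```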
@shaka-project shaka-project locked and limited conversation to collaborators Mar 22, 2018
@shaka-bot shaka-bot added the status: archived Archived and locked; will not be updated label Apr 15, 2021

5 participants