Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subsample offset and cue timestamp parsing for WebVTT fails when PTS wraps. #7462

Closed
stevemayhew opened this issue Jun 3, 2020 · 12 comments
Closed
Assignees
Labels

Comments

@stevemayhew
Copy link
Contributor

stevemayhew commented Jun 3, 2020

Issue description

ExoPlayer does not handle PTS wrap that is not propagated to the Cue timestamps, that is the timestamps continue in time past the max representable as a PTS value. This is allowed by the Pantos spec as:

 When synchronizing WebVTT with PES timestamps, clients SHOULD account
   for cases where the 33-bit PES timestamps have wrapped and the WebVTT
   cue times have not.  When the PES timestamp wraps, the WebVTT segment
   SHOULD have a X-TIMESTAMP-MAP header that maps the current WebVTT
   time to the new (low valued) PES timestamp.

Note this commit to fix Issue #2928 does not fix this issue, as all the cue timestamps have values that are monotonically increasing after the PTS wrap, not just the first cue timestamp.

Reproduction steps

The demo app reproduces this, I've sent an S3 bucket with a snap shot of a live stream that has run long enough for PTS to wrap.

Link to test content

I will email the link to dev.exoplayer@gmail.com.

Pull Request

I've submitted a request that fixes the issue, we of course needed this quickly for a customer. We are more then happy to add more unit tests around this change and help regress other bugs that may or may not be related or affected by this [Bug 609].

[REQUIRED] A full bug report captured from the device

There is nothing relevant in the logs. The captions simply are queue to the sample queue with values so out of scope that playback quickly goes to hell.

[REQUIRED] Version of ExoPlayer being used

2.11.4

[REQUIRED] Device(s) and version(s) of Android being used

Reproduced on an Arris STB on Android Pie

@andrewlewis
Copy link
Collaborator

@icbaker @AquilesCanta Please could you take a look?

@AquilesCanta
Copy link
Contributor

I thought ExoPlayer already supported this case, actually. I can take a quick look.

@icbaker
Copy link
Collaborator

icbaker commented Jun 4, 2020

@AquilesCanta and I have taken a look at this. I'm not convinced the source media is valid/correct in this case. There's 2 (quite different) questions:

  1. Should WebVTT headers be allowed on the same line as the WEBVTT magic string?
  2. Should the LOCAL value of X-TIMESTAMP-MAP wrap when the MPEGTS value does?

WebVTT header parsing

The WebVTT file parsing spec reads the WEBVTT string and then skips content until it finds the first LINE FEED:

  • Step 6 reads WEBVTT and step 7 skips (it says 'collect' but it doesn't use the result, so it's essentially skipping).
  • Step 9 then skips the LINE FEED that was found and Step 11 parses headers.

So it seems that WebVTT headers like X-TIMESTAMP-MAP need to be on their own line directly underneath WEBVTT with no blank lines in between.

The value of LOCAL in X-TIMESTAMP-MAP

The pantos spec paragraph you've quoted suggests that MPEGTS should be expected to wrap but LOCAL doesn't. This makes sense to me, since it's giving the player enough info to map a potentially ambiguous wrapped 90kHZ timestamp back to an absolute offset that corresponds directly to the WebVTT timestamps. However in the media you provided, it seems that both LOCAL and MPEGTS have wrapped. In fact, the X-TIMESTAMP-MAP in the provided media has only a 622µs difference between the LOCAL and MPEGTS timestamps (which provides very little info beyond the default of LOCAL=0,MPEGTS=0).

The file you provided looked something like this (I've made up these numbers, but they're in the same ballpark):

WEBVTT	X-TIMESTAMP-MAP=LOCAL:1:11:16.443,MPEGTS:384879885

250:32:46.582 --> 250:32:48.305
  Cue text one

250:32:50.433 --> 250:32:51.276
  Cue text two
1:11:16.443  = 4276443000µs
384879885pts = 4276443167µs (i.e. 167µs difference)

I'd expect to see something more like:

WEBVTT
X-TIMESTAMP-MAP=LOCAL:250:32:45.824,MPEGTS:384879885

250:32:46.582 --> 250:32:48.305
  Cue text one

250:32:50.433 --> 250:32:51.276
  Cue text two

When I hackily change WebvttExtractor locally to add a constant offset of to the LOCAL time (to effectively make your file look more like my proposed file), then everything else works. There's no weird effects with the progress bar or freezing or general playback problems. In this example I'd use an offset of 250:32:46.582 - 1:11:16.443 = 897690139000, with your file I used the equivalent offset. This is obviously a hopeless hack because it assumes the first cue in the file appears at the timestamp used in X-TIMESTAMP-MAP, but the point was to get the cue offsets in roughly the right ballpark.

@stevemayhew
Copy link
Contributor Author

Thanks @icbaker !

The Pantos spec is unclear how the TIMESTAMP-MAP should be placed in the file (of course, spec is vague on many things). The w3c spec states pretty clearly the WEBVTT string can be followed by space or tab (https://www.w3.org/TR/webvtt1/#file-structure. It is this that the origin vendor claims its ok to do it this way. My last checkin of the WebvttParserUtil.isWebvttHeaderLine() handles either case and passes all the test cases.

As for the timestamps, I'm working on a test case for the change that detects PTS wrap. Probably a good idea to add the same change check to the LOCAL timestamp as well. Reading the paragraph again, it is possible that the LOCAL value could be monotonic increasing as well. If you look at my code and the test case, you can see it will arrive at the same values no matter which timestamps they do or do not allow to wrap.

I'm finishing up a test case for the WebvttExtractor now, I'll check that in to this branch too (that code was not covered by any test case)

Please do review my code, I'll add some notes, and let us know if you have other streams you want tested.

@icbaker
Copy link
Collaborator

icbaker commented Jun 5, 2020

The w3c spec states pretty clearly the WEBVTT string can be followed by space or tab (https://www.w3.org/TR/webvtt1/#file-structure.

I agree WEBVTT can be followed by space/tab and then anything else until the end of the line. But I think 6.1 says to then skip that data (and start parsing headers on the next line). However that 4.1 wording you've linked doesn't seem to allow headers at all (requires a blank line immediately after WEBVTT + other-things). This seems to contradict 6.1 - I've asked a question to the WebVTT folks: w3c/webvtt#485

We'll see what they say.

As for the timestamps, I'm working on a test case for the change that detects PTS wrap. Probably a good idea to add the same change check to the LOCAL timestamp as well. Reading the paragraph again, it is possible that the LOCAL value could be monotonic increasing as well. If you look at my code and the test case, you can see it will arrive at the same values no matter which timestamps they do or do not allow to wrap.

OK - I'm getting a better understanding now. I don't think this is about the LOCAL timestamp wrapping necessarily, whether it wraps or not is irrelevant. What we're doing wrong in the current code is not correctly respecting 33-bit wrapping when applying the X-TIMESTAMP-MAP calculation.

I did some maths on the sample file you provided to try and explain the reasoning of the existing code. The first PTS timestamp I found in the MPEG stream was 494878907.

I calculated things like this:

// 1. Adjust the first cue timestamp to be relative to the LOCAL offset:
266:39:10.679 - 01:31:50.766 = 265:7:19.913 = 954439913000 micros
// 2. Convert this offset to PTS. This is how far ahead in PTS the cue is from the mapping time.
954439913000 micros = 246250 PTS (wrapped) 
// 3. Add back on the MPEGTS time to get the PTS time the first cue should be displayed at
246250 + 495968996 = 496215246
// 4. Compare the first cue PTS time to the PTS time of the first video sample.
//    we expect the first subtitle to be shown at about 14s into the clip.
496215246 - 494878907 = 1336339 pts = 14848211 micros = 00:00:14.848  

The existing code does it like this:

// 1. Convert the LOCAL to micros
01:31:50.766 = 5510766000  micros 
// 2. Convert MPEGTS to micros
495968996 pts = 5510766622 micros 
// 3. Convert the first cue timestamp to micros
266:39:10.679 = 959950679000  micros
// 4. Find the offset of the first cue from the LOCAL mapping time, in micros
959950679000 - 5510766000 = 954439913000 
// 5. Add back on the MPEGTS mapping time, in micros
954439913000 + 5510766622 = 959950679622
// 6. Convert this offset to PTS (without wrapping!)
959950679622  micros = 8639556116 pts = 266:39:10.680
// 7. We then feed this PTS into (the stateful) `timestampAdjuster.adjustTsTimestamp()`.
//    This is where everything goes wrong, we think we've received a timestamp that's
//    ~10 days ahead of the current playback position, so things get very confused.

The two algorithms are equivalent apart from step 6, I just found it easier to reason in PTS while the code decides to do most of its logic in microseconds.

If we fix step 6 to wrap:

// 6. Convert this offset to PTS (*with* wrapping)
959950679622 micros = 496215246 pts
// And then compare with the PTS from the video stream and see we
// get the same answer as my algorithm above:
496215246 - 494878907 = 1336339 pts = 00:00:14:848 

If TimestampAdjuster.usToPts() wraps to 33-bits (which tbh, you might reasonably expect it to) then the existing code works (by making step 6 wrap). I've tested with this change locally & sent it for internal review.

I think your PR makes some incorrect assumptions - I've added some comments. I also don't think it should be needed once we've pushed the wrapping change I've described.

icbaker added a commit that referenced this issue Jun 5, 2020
@icbaker
Copy link
Collaborator

icbaker commented Jun 5, 2020

I've tested with this change locally & sent it for internal review.

This is now pushed to dev-v2. Can you try it out and see what you think? (you'll need to keep your header parsing changes locally)

@icbaker
Copy link
Collaborator

icbaker commented Jun 5, 2020

The Pantos spec is unclear how the TIMESTAMP-MAP should be placed in the file (of course, spec is vague on many things). The w3c spec states pretty clearly the WEBVTT string can be followed by space or tab (https://www.w3.org/TR/webvtt1/#file-structure. It is this that the origin vendor claims its ok to do it this way. My last checkin of the WebvttParserUtil.isWebvttHeaderLine() handles either case and passes all the test cases.

I did some digging through the WebVTT history and found that 'WebVTT header' was clearly defined when the HLS spec was written, and they were defined as appearing on their own lines directly underneath WEBVTT. Full details in w3c/webvtt#485 (comment).

So the HLS spec is arguably relying on an old version of the WebVTT spec, but it seems pretty clear the intention is to use this definition of 'WebVTT header'.

@stevemayhew
Copy link
Contributor Author

So the HLS spec is arguably relying on an old version of the WebVTT spec, but it seems pretty clear the intention is to use this definition of 'WebVTT header'.

Yeah thanks for this Ian, and all your thoughtful analysis here.. It's great.

As a player vendor, there is no real benefit to being pedantic to the spec (unless you want to use it to display a warning log when you see things that are non-compliant). The player's job is to make every reasonable attempt to gather the actual intent from input sources that are less then perfectly compliant. So, we should accept the tab or the \n termination.

@stevemayhew
Copy link
Contributor Author

@ojw28 the Bright Cove folks obviously experienced this as well: https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html

Note the big warning:

The X-TIMESTAMP-MAP header must appear on line 2, directly after the WEBVTT line.

Next push I'll take out commit 5e7d22a and commit 2108c6a from the pull branch. We'll keep these ourselves locally until the origin vendor agrees or convinces us all this is a spec bug.

@stevemayhew
Copy link
Contributor Author

I did some maths on the sample file you provided to try and explain the reasoning of the existing code. The first PTS timestamp I found in the MPEG stream was 494878907.

I calculated things like this:

// 1. Adjust the first cue timestamp to be relative to the LOCAL offset:
266:39:10.679 - 01:31:50.766 = 265:7:19.913 = 954439913000 micros
// 2. Convert this offset to PTS. This is how far ahead in PTS the cue is from the mapping time.
954439913000 micros = 246250 PTS (wrapped) 
// 3. Add back on the MPEGTS time to get the PTS time the first cue should be displayed at
246250 + 495968996 = 496215246
// 4. Compare the first cue PTS time to the PTS time of the first video sample.
//    we expect the first subtitle to be shown at about 14s into the clip.
496215246 - 494878907 = 1336339 pts = 14848211 micros = 00:00:14.848  

Ian, thanks so much. I'll look at the sample and put in the expected PTS for all of the cues in at least the first 5 or 6 cues in the example, we can make a test case for it then (it would need to be on the render side, as the final calculation with offset is done there.

icbaker added a commit that referenced this issue Jun 8, 2020
This also gives some general test coverage for HLS' WebvttExtractor

Issue: #7462
PiperOrigin-RevId: 315257252
@icbaker
Copy link
Collaborator

icbaker commented Jun 10, 2020

I'm going to close this since I believe we've resolved the timestamp problem, and we're not going to change the WebVTT parsing.

I'll leave the PR open while you're looking into testing.

@icbaker icbaker closed this as completed Jun 10, 2020
@stevemayhew
Copy link
Contributor Author

stevemayhew commented Jun 11, 2020 via email

@google google locked and limited conversation to collaborators Aug 10, 2020
@google google deleted a comment from stevemayhew Sep 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants