Teletext Subtitle Duration Issue #1841

Closed
ndjamena opened this Issue Dec 28, 2016 · 22 comments


@ndjamena

After remuxing a particular 1 h 20 min TV recording from a .ts file the length of the resulting text subtitle becomes 18 h 16 min.

I've remuxed it twice and the same thing is happening.

If I load the mkv in MKVInfo GUI and look at the last cluster, the final frame in the file is from track number 3 (the text subtitles), appears at about 1h 20 mins and lasts for about 2 seconds.

Extracting the tags using MKVExtract shows the subtitle duration to be 18 hours and MKVInfo shows the file to be 18 hours long.

Looking at the cues there is a cluster with a timecode of about 9 hours and a duration of about 9 hours somewhere near the end of the file. I found that cluster and yes it's there. Every cluster after that point contains text subtitle frames and the first of those frames begins 6 seconds into playback.

Basically, somehow MKVMerge has given the first subtitle frame a timecode of 9 hours and a duration of 9 hours, moved it to the end of the file then positioned all the other subtitle frames after it in the file, even though their timecodes indicate they should be displayed at their correct times.

@mbunkus
Owner
mbunkus commented Dec 28, 2016

Well, if you want me to look into it, I need the source TS file.

@ndjamena

http://www.mediafire.com/file/ntdgk9294t6gt1o/disk34.ts

I split the source file into 10mb segments and muxed each one to see what I could see... that's the 34th file I checked.

@mbunkus
Owner
mbunkus commented Dec 28, 2016

Thanks. Do I understand you correctly that everything was OK with files 01–33?

@ndjamena

They didn't have excessive lengths, though most of them had a bitrate of 0 kbps for the subtitles... I didn't scrutinise them closely; I was only looking for evidence for this thread.

@mbunkus
Owner
mbunkus commented Dec 28, 2016

Alright. Can you please also upload disk33.ts and one of the files where subtitle entries are present but where the duration is OK? I'd like to compare the timestamps surrounding the problematic area. Thanks.

@ndjamena

http://www.mediafire.com/file/n0xelnx0uvx329d/disk33.ts

MPC-HC can play the subtitles in the 34 MKV fine despite the fact that they're all grouped in clusters at the end of the file.

33 has 10 subtitles of its own, all of which are distributed through the file properly.

@mbunkus
Owner
mbunkus commented Dec 28, 2016

Thanks. I'll look into it some more over the next couple of days.

@ndjamena

Apparently remuxing file 1 causes the 18 hour duration as well. I'm not sure how I missed that.

@mbunkus
Owner
mbunkus commented Jan 5, 2017

The problem with your files is that the subtitle timestamps in the TS files are way off. In disk33.ts the audio and video timestamps start around 00:11:08.418444444, whereas the subtitle packets that directly follow have a PTS of 09:19:25.912088888. Therefore how mkvmerge handles them is not strictly a bug; the source material is simply that bad. You can observe VLC having problems with playback of disk33.ts, too; it doesn't show the subtitles even if I turn them on.

I will implement a workaround for such situations, but as with any workaround I'll have to be careful not to break the processing of valid files.

@mbunkus mbunkus added type:enhancement and removed type:bug labels Jan 5, 2017
@ndjamena
ndjamena commented Jan 5, 2017

I have a problem with that explanation.

166
00:11:11,764 --> 00:11:13,084
Hello there!

167
00:11:14,244 --> 00:11:15,604
Ah!

168
00:11:16,804 --> 00:11:19,004
Hello! Ah...

169
00:11:20,164 --> 00:11:21,684
Hello?

170
00:11:21,804 --> 00:11:22,884
Hello!

171
00:11:22,924 --> 00:11:23,544
What are you doing?

172
00:11:24,564 --> 00:11:25,256
I was setting a trap!

173
00:11:26,164 --> 00:11:27,084
A trap?

174
00:11:27,244 --> 00:11:28,152
On your roof!

175
00:11:28,364 --> 00:11:30,444
What happened?
I tried it out!

176
00:11:28,576 --> 00:11:32,284
How?
Accidentally!

177
00:11:32,284 --> 00:11:33,364
(SNEEZES)

178
00:11:33,444 --> 00:11:35,684
Bless you.
Thanks.

179
00:11:35,684 --> 00:11:37,124
What's your name?
Grant.

180
00:11:35,072 --> 00:11:38,244
Hello, Grant.

181
00:11:36,192 --> 00:11:39,964
What floor is this?
60.

182
00:11:40,884 --> 00:11:42,604
Ah!

183
00:11:43,404 --> 00:11:44,684
(GROANS)

184
00:11:44,764 --> 00:11:46,044
Would it be alright if I came in?

185
00:11:46,044 --> 00:11:47,404
I'll have to ask my mom.

186
00:11:48,724 --> 00:11:50,164
Ow!
Mom, wake up!

187
00:11:50,164 --> 00:11:51,884
(COUGHS)

188
00:11:55,524 --> 00:11:56,524
(GROANS)

File 33

1
00:00:04,916 --> 00:00:06,236
Hello there!

2
00:00:07,396 --> 00:00:08,756
Ah!

3
00:00:09,956 --> 00:00:12,156
Hello! Ah...

4
00:00:13,316 --> 00:00:14,836
Hello?

5
00:00:14,956 --> 00:00:16,036
Hello!

6
00:00:16,076 --> 00:00:16,696
What are you doing?

7
00:00:17,716 --> 00:00:18,408
I was setting a trap!

8
00:00:19,316 --> 00:00:20,236
A trap?

9
00:00:20,396 --> 00:00:21,304
On your roof!

10
00:00:21,516 --> 00:00:23,596
What happened?
I tried it out!

File 34

1
09:08:18,741 --> 18:16:34,935
(SNEEZES)

2
00:00:02,628 --> 00:00:04,868
Bless you.
Thanks.

3
00:00:04,868 --> 00:00:06,308
What's your name?
Grant.

4
00:00:04,256 --> 00:00:07,428
Hello, Grant.

5
00:00:05,376 --> 00:00:09,148
What floor is this?
60.

6
00:00:10,068 --> 00:00:11,788
Ah!

7
00:00:12,588 --> 00:00:13,868
(GROANS)

8
00:00:13,948 --> 00:00:15,228
Would it be alright if I came in?

9
00:00:15,228 --> 00:00:16,588
I'll have to ask my mom.

10
00:00:17,908 --> 00:00:19,348
Ow!
Mom, wake up!

11
00:00:19,348 --> 00:00:21,068
(COUGHS)

12
00:00:24,708 --> 00:00:25,708
(GROANS)

Unless I'm missing something Line 1 in file 34 is line 177 in the full file.

All that happened to file 34 was that it was split from the original as simple bytes.

Line 176 "How? Accidentally!" in the original is missing from the combined 33/34... I have no idea how to read the headers in the ts files, so I can only speculate... but are you sure this hasn't got something to do with the ts files being split (seeing as the original was split from the broadcast stream)? How are you viewing the timecodes?
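The claim that cue 1 of file 34 is cue 177 of the full recording can be checked against the listings above. The following is a minimal sketch: the helper name `srt_ms` is made up for illustration, and the cue pairs are copied by hand from the two listings. It computes the offset between matching cues in the full file and in file 34.

```python
import re


def srt_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as '00:11:32,284' to milliseconds."""
    h, m, s, ms = map(int, re.fullmatch(r"(\d+):(\d+):(\d+),(\d+)", stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


# (start time in the full file, start time in file 34) for the same cue text
pairs = [
    ("00:11:32,284", "09:08:18,741"),  # (SNEEZES): cue 177 vs cue 1
    ("00:11:33,444", "00:00:02,628"),  # Bless you.: cue 178 vs cue 2
    ("00:11:35,684", "00:00:04,868"),  # What's your name?: cue 179 vs cue 3
    ("00:11:55,524", "00:00:24,708"),  # (GROANS): cue 188 vs cue 12
]

offsets = [srt_ms(full) - srt_ms(segment) for full, segment in pairs]
print(offsets)
```

Every pair except the first yields the same offset (690816 ms, i.e. 11 min 30.816 s), which is consistent with only the first cue of file 34 having a bad start time.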

@mbunkus
Owner
mbunkus commented Jan 5, 2017

You can look at the TS timestamps with MPEG-2 Transport Stream packet analyser. Restrict it to jump to "payload start indicator = 1" and "PID = 130" for the subtitle stream, "PID = 48" for the video stream.

@mbunkus
Owner
mbunkus commented Jan 5, 2017

Oh, the "PTS" and "DTS" values use a 90 kHz clock. This means that you have to multiply those values by 1000000000 / 90000 in order to get the timestamp in nanoseconds. Or to put it in seconds: timestamp_in_seconds = PTS / 90000, and timestamp_in_nanoseconds = PTS * 100000 / 9.
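As a worked example of the 90 kHz conversion, here is a small sketch. One tick lasts 1/90000 of a second, so nanoseconds are ticks times 1,000,000,000 divided by 90,000. The function names are illustrative, and the two tick counts used below were back-computed from the timestamps quoted earlier in this thread, not read from the files.

```python
PTS_CLOCK_HZ = 90_000          # PTS/DTS values tick at 90 kHz
NS_PER_SECOND = 1_000_000_000


def pts_to_ns(pts: int) -> int:
    """Convert a 90 kHz PTS/DTS tick count to whole nanoseconds."""
    return pts * NS_PER_SECOND // PTS_CLOCK_HZ


def format_ns(ns: int) -> str:
    """Format a nanosecond count as HH:MM:SS.nnnnnnnnn."""
    seconds, frac = divmod(ns, NS_PER_SECOND)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{frac:09d}"


# Assumed tick counts (back-computed, see above)
print(format_ns(pts_to_ns(60_157_660)))     # audio/video start in disk33.ts
print(format_ns(pts_to_ns(3_020_932_088)))  # first subtitle PTS in disk33.ts
```

The repeating digits in 00:11:08.418444444 come from the division by 9 that is left after cancelling the common factors of 1,000,000,000 and 90,000.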

@ndjamena
ndjamena commented Jan 5, 2017

Oh, so the problem is simply that the subtitle timecodes don't match the video... so the problem occurs when one or the other is encountered in the file first?

If a subtitle timecode is found first, it is set to zero and subtracted from the video timecodes, which makes them negative; that isn't allowed, so the video timecodes are set to zero too.

If a video timecode is found first, the video timecodes are set to zero; when a subtitle timecode is encountered, the video timecode is subtracted from it, leaving it still miles into the future.

Something like that?

@mbunkus
Owner
mbunkus commented Jan 5, 2017

Yeah, something along those lines.
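The guess above can be turned into a toy model. This is only an illustration of the two cases as described in the thread, not mkvmerge's actual code; the function name and the sample values (whole seconds, rounded from the timestamps quoted earlier) are assumptions.

```python
def normalize(packets):
    """Shift timestamps so the first packet seen starts at 0; clamp negatives to 0.

    `packets` is a list of (kind, timestamp_in_seconds) tuples in file order.
    """
    if not packets:
        return []
    offset = packets[0][1]
    return [(kind, max(0, ts - offset)) for kind, ts in packets]


# Video seen first: the subtitle stays hours in the future.
video_first = [("video", 668), ("subtitle", 33565), ("video", 669)]
print(normalize(video_first))

# Subtitle seen first: the video timestamps would go negative and are clamped.
subtitle_first = [("subtitle", 33565), ("video", 668), ("video", 669)]
print(normalize(subtitle_first))
```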

@mbunkus mbunkus added a commit that referenced this issue Jan 6, 2017
@mbunkus MPEG TS: workaround for subtitle timestamps differing from audio/video timestamps widely

There are MPEG TS files where subtitle packets are multiplexed with
audio and video packets properly, meaning that packets that are supposed
to be shown/played together are also stored next to each other. However,
the timestamps in the PES streams have huge differences. For example,
the first timestamps for audio and video packets are around 00:11:08.418
whereas the timestamps for corresponding subtitle packets start at
09:19:25.912.

This workaround attempts to detect such situations. In that case
mkvmerge will discard subtitle timestamps and use the most recent audio
or video timestamp instead.

Implements #1841.
17fd1fd
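The approach described in this commit message can be sketched roughly as follows. This is not the code from the commit: the function name, the packet representation, the threshold parameter `max_gap_ms` and the sample values are all assumptions made for illustration (the first video and subtitle values are the commit message's example timestamps in milliseconds, the rest are invented).

```python
def clamp_subtitle_timestamps(packets, max_gap_ms):
    """Replace subtitle timestamps that are far from the most recent
    audio/video timestamp with that audio/video timestamp.

    `packets` is a list of (kind, timestamp_ms) tuples in file order.
    """
    last_av = None
    result = []
    for kind, ts in packets:
        if kind in ("audio", "video"):
            last_av = ts
        elif last_av is not None and abs(ts - last_av) > max_gap_ms:
            ts = last_av  # discard the subtitle timestamp entirely
        result.append((kind, ts))
    return result


packets = [
    ("video", 668_418),        # 00:11:08.418
    ("subtitle", 33_565_912),  # 09:19:25.912
    ("subtitle", 33_565_932),
    ("video", 668_458),
    ("subtitle", 33_565_952),
]
clamped = clamp_subtitle_timestamps(packets, max_gap_ms=60_000)
print(clamped)
```

In this sketch, two subtitle packets that arrive between the same pair of video packets end up with identical timestamps, because each one is replaced independently.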
@mbunkus mbunkus closed this Jan 6, 2017
@ndjamena
ndjamena commented Jan 7, 2017

Um... OK, so the problem's in the stream, but with the fix applied there's some overlap of the timecodes. A lot of the subtitles seem to begin at pretty much exactly the same time.

I don't know what the timecodes actually say or how Teletext data is organised, but wouldn't it be best to sync the subtitle timecodes to the audio/video only when they're grossly out of sync, and then afterwards sync each teletext packet to the previous packet unless that would put them grossly out of sync again?

Is that what you did? Would that be too much? Or is this new problem caused by bad timecodes in the stream as well?

I have one subtitle that starts before the previous one, so I'm guessing the way teletext data is stored isn't that straightforward.

@mbunkus
Owner
mbunkus commented Jan 7, 2017

Hmm, I'm using a different method that worked fine with the two samples you had uploaded. I'll look into changing it.

@mbunkus mbunkus reopened this Jan 7, 2017
@ndjamena
ndjamena commented Jan 7, 2017

I can't make heads or tails of the output of the program you linked me to.

TS.packet,TS.PID,PES.PTS,PES.DTS
37,2321,7833342444,0
38,2322,2204308237,0
60,2320,7833502601,7833499001
90,2322,2204310037,0
102,2320,7833520601,7833502601
146,2322,2204311837,0
201,2322,2204313637,0
253,2322,2204315437,0
260,2320,7833513401,7833506201
300,2322,2204317237,0
318,2320,7833509801,0
348,2320,7833517001,7833513401
365,2322,2204319037,0
378,2320,7833535001,7833517001
418,2322,2204320837,0
471,2322,2204322637,0
525,2322,2204324437,0
577,2322,2204326237,0

As far as I can tell 2320 is video, 2321 is audio and 2322 is the subtitles. I only recorded this just now and the subtitle PTS bears no resemblance to the audio/video PTSs.

Am I looking at the wrong thing or is this actually normal?
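To put numbers on that dump, here is a short sketch that reads the first three rows and converts the first PTS of each PID from 90 kHz ticks to a clock time. The function names are illustrative, and the PID roles (2320 video, 2321 audio, 2322 subtitles) are the assumption stated in the comment above.

```python
import csv
import io

DUMP = """\
TS.packet,TS.PID,PES.PTS,PES.DTS
37,2321,7833342444,0
38,2322,2204308237,0
60,2320,7833502601,7833499001
"""


def first_pts_by_pid(text):
    """Return {PID: first PTS seen for that PID} from the CSV dump."""
    first = {}
    for row in csv.DictReader(io.StringIO(text)):
        first.setdefault(int(row["TS.PID"]), int(row["PES.PTS"]))
    return first


def ticks_to_hms(ticks):
    """Convert 90 kHz ticks to HH:MM:SS, dropping the fractional second."""
    seconds = ticks // 90_000
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


first = first_pts_by_pid(DUMP)
for pid, pts in sorted(first.items()):
    print(pid, ticks_to_hms(pts))
print("audio minus subtitle:", ticks_to_hms(first[2321] - first[2322]))
```

With these rows the audio and video streams start around 24:10:37 while the subtitle stream starts around 06:48:12, a gap of roughly 17 hours 22 minutes.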

I'm pretty sure MKVMerge ripped file 33 acceptably beforehand, it's just in file 34 it set the first subtitle's timecodes to start at nine hours with a nine hour duration, while as far as I can tell the other timecodes were actually close to fine.

@mbunkus
Owner
mbunkus commented Jan 7, 2017

> Am I looking at the wrong thing or is this actually normal?

You're looking at the right thing, and your conclusion that the subtitle PTS bears no resemblance to the audio/video PTS is correct. This is not how it's supposed to be, but unfortunately such issues are pretty common in MPEG TS files. They really shouldn't be, but they are, and therefore workarounds have to be implemented.

@mbunkus mbunkus added a commit that referenced this issue Jan 7, 2017
@mbunkus MPEG TS: only adjust bad subtitle timestamps once per detected difference

The prior implementation checked each subtitle timestamp against the
last audio or video timestamp seen. If it differed too much, the
subtitle timestamp was set to that last timestamp.

The problem with such an approach is its coarseness. It also disregards
the relative difference between subtitle timestamps.

The new implementation works differently. The first time a big
difference between subtitle and audio/video timestamps is detected, the
difference will be stored. This difference will then be applied to all
subtitle timestamps. This keeps the relative difference between subtitle
timestamps intact.

Addresses #1841.
094f2ee
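The idea in this commit message, storing the difference once and applying it to every subtitle timestamp, might look roughly like the sketch below. It is a simplification under stated assumptions and not the committed code: the function name, the `max_gap_ms` threshold and the sample values are invented for illustration, and the sketch detects a difference only once per run.

```python
def shift_subtitle_timestamps(packets, max_gap_ms):
    """Detect one large subtitle/AV timestamp difference, remember it,
    and subtract it from every subtitle timestamp from then on.

    `packets` is a list of (kind, timestamp_ms) tuples in file order.
    """
    last_av = None
    offset = None
    result = []
    for kind, ts in packets:
        if kind in ("audio", "video"):
            last_av = ts
        else:
            if offset is None and last_av is not None and abs(ts - last_av) > max_gap_ms:
                offset = ts - last_av  # stored the first time a big gap is seen
            if offset is not None:
                ts -= offset
        result.append((kind, ts))
    return result


packets = [
    ("video", 668_418),        # 00:11:08.418
    ("subtitle", 33_565_912),  # 09:19:25.912
    ("subtitle", 33_565_932),
    ("video", 668_458),
    ("subtitle", 33_565_952),
]
shifted = shift_subtitle_timestamps(packets, max_gap_ms=60_000)
print(shifted)
```

Because one constant is subtracted from all subtitle timestamps, the 20 ms spacing between the sample subtitle packets is preserved.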
@mbunkus
Owner
mbunkus commented Jan 7, 2017

New pre-builds are up.

@ndjamena
ndjamena commented Jan 7, 2017

That is MUCH better. It even got rid of the slight overlaps between subtitles and the random subtitles that just blinked on and off almost instantaneously. Other than a slight delay and the lack of colours it's pretty much perfect...

...any chance you'll add colour support any time soon? I notice they use it to denote different speakers without bothering to switch to a new line.

@mbunkus
Owner
mbunkus commented Jan 7, 2017

Thanks for testing & the feedback.

No plans for color support, no.

@mbunkus mbunkus closed this Jan 7, 2017
@ndjamena
ndjamena commented Jan 7, 2017

OK, done then.

Happy.
