Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal SSA/ASS Subtitle Support #2281

Merged
merged 18 commits into from Jul 10, 2017
Merged

Minimal SSA/ASS Subtitle Support #2281

merged 18 commits into from Jul 10, 2017

Conversation

jcable
Copy link
Contributor

@jcable jcable commented Jan 2, 2017

Hi, please find for consideration this pull request which adds SSA/AAS subtitle decoding.

The Script Info section is not decoded. The Styles and Events sections are fully decoded.

\N tags in the text are converted to newlines and all other text tags are removed. The full rich text and the full style information is kept in the Cue object and is therefore available for enhanced rendering but this is not implemented.

Unit tests are included.

Thanks to ojw28 for essential advice in getting timestamp handling correct.

* Created by cablej01 on 27/12/2016.
*/

public class Style {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we typically support styling of subtitles is by attaching Span objects directly onto the Cue text, which needs to be a CharSequence (you can use SpannableStringBuilder directly) rather than just a String. Take a look at what we do for webvtt to get the idea (see in particular WebvttCueParser).

I think you should use the same approach for SSA. In particular, there are built in Span objects for font, background color, foreground color, size adjustment, bold, italic, underline, strike-out etc, which the existing subtitle rendering components will render automatically without you having to do anything.

For anything that's not supported by existing Span objects I guess it is sensible to extend Cue to add them (assuming they apply to the whole Cue). If they might apply to only part of the Cue, a custom Span would be appropriate. This stuff is only useful if you think someone's going to build a UI component that actually uses this information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I just wanted a branch point where everything was captured and titles would render without style. Happy to change the approach to fit the architecture.

@ojw28
Copy link
Contributor

ojw28 commented Jan 5, 2017

This will be much easier to review and merge as a sequence of smaller pull requests. Could we start with just the extractor change (and corresponding tests / test assets)? That looks pretty easy to get merged as a first step.

I posted a comment about the decoder part that probably requires some work. It's fine to send the decoder pieces a single pull request, although it might be simpler if you send two; one adding basic support for plain text SSA, and the second adding in all the styling stuff. Up to you! Thanks.

private static String defaultStyleFormat = "Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding";
private String[] dialogueFormat;
private String[] styleFormat;
private Map<String,Style> styles = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect the decoder to be completely stateless (i.e. styles would be a local variable in decode). This approach will definitely be fine for a side-loaded caption file. I'm not sure what happens in the MKV case. Does the header from which this data is parsed get prefixed onto every sample as it's output, and if not already is that possible to do? If so, this approach will be fine for that case too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was better to move the header from the mkv decoder to the SSADecoder just once

If you prefer the header state stored in the mkv decoder and passed every time that is possible but much less efficient as there are typically many styles defined and re-parsing them for every cue seems very wasteful.

If it were possible to pass something other than a byte[] to the decoder it would be fine, but the subtitle decoder isn't called directly by the container decoder so I have to recreate an octet stream to pass through the demuxer. Passing the styles every time probably increases the effective bit rate between the container decoder and the subtitle decoder by a factor of 10 or 20.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment; you should probably put the header into the Format init data instead, if that works. This will require some plumbing of the format data from TextRenderer to the decoder.

@@ -1453,6 +1502,10 @@ public void initializeOutput(ExtractorOutput output, int trackId) throws ParserE
case CODEC_ID_SUBRIP:
mimeType = MimeTypes.APPLICATION_SUBRIP;
break;
case CODEC_ID_ASS:
mimeType = MimeTypes.TEXT_SSA;
ssaHeader = codecPrivate;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the header that you don't want to prefix onto each sample? If so, put it into the format object (by doing: initializationData = codecPrivate) and don't prefix it onto any sample at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I can get the header into the format object and I can see it is there in the decoder factory createDecoder call in onStreamChanged in the TextRenderer.

But the decode method of the decoder is called from the decoder thread, not from the text.

If I plumb the format data into the decoder from the TextRenderer doesn't that make the decoder stateful again? The subtitle objects are created in the decoder thread and i can't see any way of passing Format data into each call to the decoder that would make it stateless.

The styles need to be accessible in the decoder at decode time if I'm going to create the spans at the time the cues are created.

private static String defaultStyleFormat = "Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding";
private String[] dialogueFormat;
private String[] styleFormat;
private Map<String,Style> styles = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment; you should probably put the header into the Format init data instead, if that works. This will require some plumbing of the format data from TextRenderer to the decoder.

…coder create method.

This doesn't make a stateless decoder but it is a step in the right direction.
Note that subtitles are not rendering. Is there something wrong with the timecodes?
@jcable
Copy link
Contributor Author

jcable commented Jan 9, 2017

Had a go. I think this is heading in the direction you suggest. I need a bit more guidance.

@jcable
Copy link
Contributor Author

jcable commented Jan 10, 2017

Having slept on it I realise that if I put a <span class="style name"> into the text the decoder doesn't need to keep the styles. Is that the sort of thing you have in mind?

Should separate lines of text in a subtitle go into separate cues?

@jcable
Copy link
Contributor Author

jcable commented Jan 13, 2017

I've noticed a difference in the timestamps passed to Subtitle.getNextEventTimeIndex between release-v2 and dev-v2. In release-v2 it works the way you explained and it fine when Cues are timed from 0 within a Subtitle (within a block?). In dev-v2 the timestamps passed look more absolute from the beginning of the stream. Here are some log outputs from getNextEventTimeIndex:

release-v2:

I/SSASubtitle: time 27254
I/SSASubtitle: time 14284
I/SSASubtitle: time 393533
I/SSASubtitle: time 285
I/SSASubtitle: time 2029
I/SSASubtitle: time 35277
I/SSASubtitle: time 9651
I/SSASubtitle: time 22227
I/SSASubtitle: time 2122
I/SSASubtitle: time 3891
I/SSASubtitle: time 9881
I/SSASubtitle: time 9933

dev-v2;

I/SSASubtitle: time 4236599
I/SSASubtitle: time 9610250
I/SSASubtitle: time 33021021
I/SSASubtitle: time 108873334
I/SSASubtitle: time 119647937
I/SSASubtitle: time 121220273
I/SSASubtitle: time 126186667
I/SSASubtitle: time 127568776
I/SSASubtitle: time 129281769
I/SSASubtitle: time 130566168
I/SSASubtitle: time 133366078
I/SSASubtitle: time 135076803

@ojw28
Copy link
Contributor

ojw28 commented Jan 13, 2017

Having slept on it I realise that if I put a into the text the decoder doesn't need to keep the styles. Is that the sort of thing you have in mind?

I'm not sure what this means; I think there might be some words missing from your sentence :)?

Should separate lines of text in a subtitle go into separate cues?

Not unless they're positioned differently. If it's just a line break then it should be a single cue that contains a line break.

@jcable
Copy link
Contributor Author

jcable commented Jan 22, 2017

Finally worked out what I had done to make my comment meaningless - unfamiliarity with markdown. It should now make sense.

…er but this doesn't seem to be how webvtt does it so not being used.
@jcable
Copy link
Contributor Author

jcable commented Jan 22, 2017

I had a look at how webvtt does things and as far as I can tell the style definitions are embedded in the text before the first cue. This seems very similar to the SSA/ASS approach so I don't understand what should be changed next.

import java.io.UnsupportedEncodingException;
import java.util.List;

import static android.R.attr.initialKeyguardLayout;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these imports for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something Android Studio did when i wasn't looking ...

changes removed

if (format.sampleMimeType.equals(MimeTypes.APPLICATION_CEA608)
if (format.sampleMimeType.equals(MimeTypes.TEXT_SSA)) {
byte[] header = format.initializationData.get(1);
String dlgfmt = new String(format.initializationData.get(0), "UTF-8");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid abbreviations. I don't understand what "dlgfmt" means, and I doubt most other people will either ;).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That name wasn't adding any value. I've removed it.

@ojw28
Copy link
Contributor

ojw28 commented Feb 20, 2017

This change is more likely to be merged if it's split into two pieces. One that adds basic support without styling in an obviously correct way, and a second that adds styling. I still don't think the styling stuff in this change is in line with what we do elsewhere, and I think it'll make things much easier if we can merge the non-controversial bit and discuss that separately.

@jcable
Copy link
Contributor Author

jcable commented Feb 21, 2017

That's fine. Can you suggest the rough shape of the changes you would like to get to the style-less version and I'll make those and then we can take stock.

@ojw28 ojw28 merged commit 402c985 into google:dev-v2 Jul 10, 2017
@ojw28
Copy link
Contributor

ojw28 commented Jul 10, 2017

Sorry for the long silence on this. I sync'd this on top of latest dev-v2, cleaned up some pieces, removed the styling and merged. So what's in dev-v2 now can serve as the "taking stock" point! Thanks.

@google google locked and limited conversation to collaborators Nov 8, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants