Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASS Subtitle Fixes #400

Closed
wants to merge 6 commits into from

Conversation

softworkz
Copy link
Collaborator

SubtitleProviderSsaAss: Add line break
SubtitleProviderSsaAss: Calc font size
SubtitleProviderSsaAss: Print offending tag in case of parsing errors
SubtitleProviderSsaAss: Prevent parsing errors due to ambiguous tags
SubtitleProvider: Fix for color strings like H&H2A4F5D&
SubtitleProviderSsaAss: Shadow is float

Fixes #397

@lukasf
Copy link
Member

lukasf commented Jan 10, 2024

Well done, thanks @softworkz!

Only we need to be very careful when changing the font sizing. We had different algorithms already, and some turned out to have weird effects on some files. We'd need to do lots of testing with different files, to make sure that sizing is consistent and in line with how most other players render the subs.

@softworkz
Copy link
Collaborator Author

Well done, thanks @softworkz!

Only we need to be very careful when changing the font sizing. We had different algorithms already, and some turned out to have weird effects on some files. We'd need to do lots of testing with different files, to make sure that sizing is consistent and in line with how most other players render the subs.

The problem was that after fixing the named style parsing (shadow), in many cases it started to really pick up the those styles (which it didn't before) and the result was that fonts were too big. Otherwise I wouldn't have even bothered touching it.

A little more background:

You can rely on the PlayResX and PlayResY values to be present in all ASS headers (all ass versions) as those are mandatory and crucial. These are frequently differing from the resolution of the video, because (naturally) subtitles are usually created just once and then used with different versions of a video.
What complicates this even more, is that the aspect ratio of the PlayRes might not match the aspect ratio of the video, and - even worse - the video might be anamorphic so that we must not only take the ratio between videoWidth and videoHeight but also the display aspect ratio (DAR) in case of anamorphic videos.
The correct procedure would be to calculate the display dimensions of the video and fit a rectangle into it with the aspect ratio of the PlayRes dimensions.
Then we can calculate the ratio between PlayResX/fittedDisplayRectWidth and PlayResY/fittedDisplayRectHeight. From these two values, we need to take the smaller one and that would be our scale factor for fonts and all other dimensions. Depending on how it fits (height or width equal) there's either a vertical or a horizontal offset remaining which needs to be added when positioninig.

Since that is too much for a series of small patches, I only used the heights to determine a scale factor for the font sizes. It's not perfect, but on the side of the most frequent cases at least (where the height factor is determining the scale).

So far so good, but there's one more caveat: There may be differences regarding the meaning of a font size of n pixels between Windows and ASS. And this can even differ by font. Getting that part right is really tough. Everything else can be calculated..

@softworkz
Copy link
Collaborator Author

Another note: The style parsing is not quite correct, because neither order nor count of fields is guaranteed, i.e. there can be more or less than 23 columns. You need to read the line with the field names first and then parse the values in each line accordingly.
What we are seeing and assuming to be a standard is just how Aegisub - the most popular ASS authoring application - is formatting the styles.
From a practical perspective - there's at least one diffference that actually exists out in the wild. I don't remember exactly which one it was.

@lukasf
Copy link
Member

lukasf commented Jan 10, 2024

Another note: The style parsing is not quite correct, because neither order nor count of fields is guaranteed, i.e. there can be more or less than 23 columns.

There is an official standard document for the SSA/ASS styles, for V4 and V4+. The version number defines which fields there are and their order. It is nowhere said that the order may be changed.

Doing dynamic parsing would totally blow up complexity here, just to catch some very few broken files, produced by broken non-standard compliant tools. I don't believe it's worth the hassle. And it's not like they won't work, it's just that their styles are not applied.

@softworkz
Copy link
Collaborator Author

There are many bits which are missing - many of them cannot even be replicated (as discussed recently), but what's important and doable is collission detection, which means that when two or more cues are visible at the same time (and not absolutely positioned), they need to be stacked in order not to overlap like in this example:

image

Correct it looks like this:

image

Another issue: There's a discrepancy in color. It's slight but noticeable and in case you wonder how I can say that the second is right and the first one is wrong, it's because I've watched that video a thousand times with different ASS implementations and in several subtitle authoring tools and it's always like in the second image.
Yet I haven't looked into it, as to whether it's an issue with parsing of the colors or caused by Windows.Media.

@softworkz
Copy link
Collaborator Author

What's further unfortunate (but probably nothing we can do about) is that Windows.Media uses a square-shaped pen for drawing the outlines which is causing ugly artifacts:

Windows.Media

image

Correct

image

(I'm not even talking about the edge blur and the shadow 😆 )

@lukasf
Copy link
Member

lukasf commented Jan 10, 2024

Huh, usually MediaPlayer does the stacking, I have seen it happening quite often. Actually, I have never seen subtitles overlap like that. Are you sure they are not positioned absolutely? They seem to be on slightly different Y positions. Or maybe this is one more WinUI thing?

I also noticed the slight color deviation when I started on SSA/ASS styles support. Not sure where it comes from. But the difference was low enough for me to not bother 😄 If you want to dig into this, you could enter the parsed RGB value into a graphics tool, to see how the color is supposed to look like.

And yeah, the outline rendering is a mess. Better not look too closely ^^

@softworkz
Copy link
Collaborator Author

There is an official standard document for the SSA/ASS styles, for V4 and V4+. The version number defines which fields there are and their order. It is nowhere said that the order may be changed.

Oh yes it does say that very clearly:

image

@softworkz
Copy link
Collaborator Author

Regarding ASS specidfication, I had some chat with the core developers of libass, because it is not complete in several means, and they said that the full spec they consider to be valid is the combination of this document together with the source code of AegiSub... 😉

@lukasf
Copy link
Member

lukasf commented Jan 10, 2024

I read it more in the sense of "you can do it dynamically to support new standards without having to rework". But there has never been a new standard after 4+. And 4+ defines exactly which new fields are added at which exact position (the ones in red).

And yes, Aegisub is the de-facto standard. It shows how the files are supposed to be. I'd see any other files as broken.

@lukasf
Copy link
Member

lukasf commented Jan 10, 2024

I also read somewhere that Aegisub is considered the more precise definition of the standard. The document just tries to make it more readable.

@softworkz
Copy link
Collaborator Author

Huh, usually MediaPlayer does the stacking, I have seen it happening quite often. Actually, I have never seen subtitles overlap like that. Are you sure they are not positioned absolutely? They seem to be on slightly different Y positions.

The different Y is probably due to being Asian letters. There's no absolute positioning. Just different styles:

image

@softworkz
Copy link
Collaborator Author

The text lines are probably better to look at:

Dialogue: 0,0:00:43.35,0:00:53.61,OP Romaji,,0000,0000,0000,,{\fad(250,250)\be1}{\kf38}zu{\kf31}t{\kf85}to {\kf174}shi{\kf54}ma{\kf48}t{\kf162}te {\kf91}o{\kf322}ku
Dialogue: 0,0:00:43.35,0:00:53.61,OP Kanji,,0000,0000,0000,,{\fad(250,250)\be1}{\kf38}ず{\kf31}っ{\kf85}と{\kf174}し{\kf54}ま{\kf48}っ{\kf162}て{\kf91}お{\kf322}く
Dialogue: 0,0:00:43.35,0:00:53.61,OP English,,0000,0000,0000,,{\fad(250,250)\be1}I'll keep it to myself forever.

@softworkz
Copy link
Collaborator Author

softworkz commented Jan 10, 2024

I read it more in the sense of "you can do it dynamically to support new standards without having to rework". But there has never been a new standard after 4+. And 4+ defines exactly which new fields are added at which exact position (the ones in red).

And yes, Aegisub is the de-facto standard. It shows how the files are supposed to be. I'd see any other files as broken.

They are working on 5.0, but who knows whether and when it will happen 😆

"you can do it dynamically to support new standards without having to rework"

Which means that current implementation should not rely on a fixed order and count so that they won't get broken by later versions of the spec.

But as said, there's one case where existing ASS files differ and that made me change the whole parsing in the ffmpeg implementation I did to use the field names for parsing.

@softworkz
Copy link
Collaborator Author

softworkz commented Jan 10, 2024

I don't know how the WindowsMedia layouting is working - could the overlap happen because both styles have margins (the same MarginV) and the auto-stacking does perhaps not work when you use the margins to set the region?

@lukasf
Copy link
Member

lukasf commented Jan 10, 2024

I don't know how the WindowsMedia layouting is working - could the overlap happen because both styles have margins (the same MarginV) and the auto-stacking does perhaps not work when you use the margins to set the region?

That could be the case.

They are working on 5.0, but who knows whether and when it will happen 😆

If they ever release a new standard, I'd support it like I did for V4 and V4+. Sure, dynamic parsing would be the better implementation. But it is way more complicated, and for me it's not worth all the effort, just to support a few broken files.

If it is very important for you, and you want to implement this once again, feel free to add it. As I said, I'd consider it the better implementation, it's just that I won't do it 😄.

@softworkz
Copy link
Collaborator Author

If it is very important for you, and you want to implement this once again, feel free to add it. As I said, I'd consider it the better implementation, it's just that I won't do it 😄.

Agreed and I understand that very well 😆

BTW, I just looked up what they consider to be the size of a font when it's set to 100% and in the source it is stated that a font size of 100% corresponds to 5% of the video's height...

@softworkz
Copy link
Collaborator Author

it's just that I won't do it 😄.

Damn - and I just wanted to suggest that I could send you a bunch of videos with ASS subs for testing, LOL...

@softworkz
Copy link
Collaborator Author

OMG - do you know how they're drawing the outlines? They are drawing the text 8 times!!! (N, NE, E, SE, S, SW, W, NW)

@brabebhin
Copy link
Collaborator

brabebhin commented Jan 10, 2024

That's how age of empires 2 text used to work.
I can test your ASS files if you want to proceed with the implementation.

@softworkz
Copy link
Collaborator Author

softworkz commented Jan 11, 2024

That's how age of empires 2 text used to work.
I can test your ASS files if you want to proceed with the implementation.

Thanks. I will check back on how much time we want to invest here. The collission issue is obviously more important than the style parsing.

@softworkz
Copy link
Collaborator Author

OMG - do you know how they're drawing the outlines? They are drawing the text 8 times!!! (N, NE, E, SE, S, SW, W, NW)

And they even to it wrong:

    // Upper left
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, -offset, -offset));

    // Upper center
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, 0, -offset));

    // Upper right
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, offset, -offset));

    // Left
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, -offset, 0));

    // Right
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, offset, 0));

    // Lower left
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, -offset, offset));

    // Lower center
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, 0, offset));

    // Lower right
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, offset, offset));

Do you see what I mean?

@softworkz
Copy link
Collaborator Author

I have submitted a PR to improve the outline rendering: at https://github.com/microsoft/microsoft-ui-xaml/ PR nr 9224

@softworkz
Copy link
Collaborator Author

That's how age of empires 2 text used to work. I can test your ASS files if you want to proceed with the implementation.

It would be great when you could test a bit if you find the time. I have created that draft PR which includes all the fixes I have submitted: #405

Now I can of course send you some of my files for testing if you wish, but as I know that these are working (except animations), it would be much more interesting to have it tested with more and different files..

@lukasf
Copy link
Member

lukasf commented Jan 14, 2024

Wow you are really serious about this @softworkz. Good job. I never had the ambition to fully support ASS. For me, it was enough to have some of the basic styling working. But you are really taking things to a different level. I will try to do reviews on the partial PRs. But we should do testing on the combined PR, and merge that when we think it is all good.

The bad thing is that we are limited by the poor rendering done by the framework controls. Nice catch with the broken outline rendering. I's no surprise that it looks so horrible. Not sure how libass does it, but I'd guess it's more a combination of a single render plus a clever combination of effects to get the nice outline. But of course, increasing the render count (combined with knowing how a circle works 😄) is a major improvement to the MS solution.

Are you actually able to compile WinUI 3 from source? Or how have you done the comparison screens in the PR?

Copy link
Member

@lukasf lukasf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to test if the font scaling is good for various files. But the implementation looks good.

@brabebhin
Copy link
Collaborator

From the files I've had in my usual testing battery this looks fine. Good work! I will try to create some brand new subtitles and mix them around to see how it goes.

@brabebhin
Copy link
Collaborator

brabebhin commented Jan 14, 2024

@lukasf I've read somewhere that libass uses direct2D/directtext for this outline thing. It's how age of empires 2 Definitive Edition does it 😂

@softworkz
Copy link
Collaborator Author

Wow you are really serious about this @softworkz. Good job. I never had the ambition to fully support ASS. For me, it was enough to have some of the basic styling working. But you are really taking things to a different level

I've been shaken back and forth because I know our users and how they would have reacted, so we had discussed internally for how we want to proceed. My initial suggestion was to disable ASS subttitles and use your implementation only for all other text subtitles for which it is doing very well.
The downside of this is that users prefer to be able to DirectPlay (no server transcoding/transmuxing), so they wouldn't like missing ASS support either. Many of those are watching Anime videos with ASS subs, which often have animations included and using advanced ASS features and almost always custom fonts.
With the current state it would have been forseeable that I'll regularly get user reports on the table with certain files (ASS subs) not being shown properly and I want to avoid by all means that this would become a regular area of work for me in the future 😉,
So we decided to make this (ASS rendering) optional and call it "experimental".
Nonetheless, I invested those 3 days to make it as good as possible under the given constraints by WindowsMedia.

The big plus which made me stick to your implementation was that you had already implemented support for custom (embedded) fonts - which is always the first thing about which Anime fans are complaining. 👍

@softworkz
Copy link
Collaborator Author

Are you actually able to compile WinUI 3 from source? Or how have you done the comparison screens in the PR?

No I can't compile - I didn't even retry with the 1.5 branch. Instead I have created a UWP project which replicates their current method and made sure that it gives identical results to their current implementation. Then I implemented my proposed solution, did the comparisons and finally translated it back to C++ code.
If you are interested, I can send you that project. It's very simple.

@softworkz
Copy link
Collaborator Author

@lukasf I've read somewhere that libass uses direct2D/directtext for this outline thing. It's how age of empires 2 Definitive Edition does it 😂

They use DirectWrite on Windows, but I don't know how they do the outlines.

@lukasf
Copy link
Member

lukasf commented Jan 14, 2024

They use DirectWrite on Windows, but I don't know how they do the outlines.

Based on the output, I am pretty sure they use some kind of gaussian blur on the text for the outline, plus maybe some additional effects for fine tuning the look. See how it fades out smoothly to the edges. You cannot get such a smooth fade-out by just rendering the text multiple times. Same is probably done for the shadow, just with some offset and different color/alpha.

At least, I used blur effect to create text shadow in an app, very long ago. It had kind of a similar look.

@softworkz
Copy link
Collaborator Author

They use DirectWrite on Windows, but I don't know how they do the outlines.

Based on the output, I am pretty sure they use some kind of gaussian blur on the text for the outline, plus maybe some additional effects for fine tuning the look. See how it fades out smoothly to the edges. You cannot get such a smooth fade-out by just rendering the text multiple times. Same is probably done for the shadow, just with some offset and different color/alpha.

At least, I used blur effect to create text shadow in an app, very long ago. It had kind of a similar look.

I made an extreme test in Aegisub with a real large outline:

image

This reveals that they are converting the font to a path and drawing it using a circle-shaped pen.

@softworkz
Copy link
Collaborator Author

For the shadow, they are taking the alpha mask of the rendered text (including outline) and painting it with an offset and with some opacity:

image

The edge blur is simply a gaussian blur applied to the alpha mask:

image

@softworkz
Copy link
Collaborator Author

softworkz commented Jan 14, 2024

The edge blur is simply a gaussian blur applied to the alpha mask:

Oh, no - not quite:

image

It's a bit more sophisticated. It's a blur which uses the sorrounding as input but output is only inside the actual region, so it cannot grow larger than it is.

(the image is done with a high blur value but no opacity applied to any color)

@lukasf
Copy link
Member

lukasf commented Jan 14, 2024

Interesting, thanks for the analysis.

@lukasf
Copy link
Member

lukasf commented Jan 14, 2024

Yeah I forgot that outline and blur are separate parameters in ASS. It's just that UWP/WinUI does not support blur.

@brabebhin brabebhin closed this Feb 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

ASS Parsing Errors
3 participants