
add ASS header field "transfer function" (to define HDR vs SDR etc) #297

Open
madshi opened this issue Jan 24, 2018 · 192 comments

@madshi

madshi commented Jan 24, 2018

Hey there,

UHD HDR Anime content is starting to show up. Now fansubbers might start doing custom subtitles for HDR content. If we don't mark such subtitles as being made for HDR, video renderers will be in trouble, because they won't know for sure whether the subtitles were made for SDR or HDR. Both are possible: users might simply reuse an SDR ASS file for the HDR video, or they could create a new subtitle file specifically for HDR.

So I'd like to suggest that we discuss a potential new ASS header field, e.g. something like this:

"YCbCr Transfer Function" = 709 (or "Gamma"?) | 2084 (or "PQ"?) | HLG | ...

I'm not sure if HLG even needs an extra entry; probably not, because it's supposed to be compatible with SDR.
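As a concrete sketch (the field name and value spellings here are only illustrations of the proposal, not a finalized spec; "TV.2020" is a hypothetical extension of the existing "TV.601"/"TV.709" matrix values), the Script Info section might look like:

```ini
[Script Info]
; Existing field, hypothetically extended with a BT.2020 value:
YCbCr Matrix: TV.2020
; Proposed new field from this issue (value names not finalized):
YCbCr Transfer Function: PQ
```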

Now a video renderer could run into the following five situations:

  1. Video is HDR, output to TV is HDR, subtitle is HDR. This one is easy.
  2. Video is HDR, output to TV is SDR, subtitle is HDR. Subtitle should be blended onto the video before tone mapping.
  3. Video is HDR, output to TV is SDR, subtitle is SDR. Subtitle should be blended onto the video after tone mapping.
  4. Video is SDR, subtitle is SDR. This one is easy.
  5. Video is SDR, subtitle is HDR. Subtitle needs to be tone mapped.
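The five situations above can be sketched as dispatch logic in the video renderer (a toy illustration only; the function and strategy names are made up, and cases 1/2 collapse to one rule because the output mode only changes whether tone mapping follows, not where blending happens):

```python
def blend_strategy(video_hdr: bool, subs_hdr: bool) -> str:
    """Decide where subtitles should be blended, per the five cases above."""
    if video_hdr and subs_hdr:
        # Cases 1 & 2: HDR subs belong on the HDR video, i.e. before any
        # tone mapping that may happen for an SDR display.
        return "blend_before_tone_mapping"
    if video_hdr and not subs_hdr:
        # Case 3: SDR subs go on after the HDR video is tone-mapped to SDR.
        return "blend_after_tone_mapping"
    if subs_hdr:
        # Case 5: HDR subs on an SDR video must themselves be tone mapped.
        return "tone_map_subtitles"
    # Case 4: SDR video with SDR subs, nothing special to do.
    return "blend_directly"
```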

What do you think?

Of course, for this all to work properly, we'd probably need to add support for it in Aegisub, libass, (xy-)VSFilter and XySubFilter, plus in all (good) video renderers. I think the subtitle renderers (libass, VSFilter etc) probably only need to expose the header information to the video renderer. The video renderer should really be responsible for doing the dirty work. So it shouldn't be a lot of work for the subtitle renderers.

I could do the work for XySubFilter, maybe (xy-)VSFilter. You guys would have to cover libass. Hopefully somebody could be found to address Aegisub. Unfortunately the Aegisub forum seems to be down. Not sure if there's still any active Aegisub dev?

P.S.: Oh, and while we're at it, the "YCbCr matrix" header field needs to officially get support for BT.2020, of course, and maybe also for DCI-P3, for completeness' sake. Does libass expose the "YCbCr matrix" field already? If not, it should, so the video renderer can apply color correction if necessary.

@ghost

ghost commented Jan 24, 2018

Insanity. Subtitles are sRGB, convert them in the renderer.

@madshi
Author

madshi commented Jan 24, 2018

Fansubbers need perfect color matching for some of their effects to work properly. There's no standard for HDR to SDR conversion. So after tone mapping, every video renderer and CE device will end up with different RGB colors. So how can we possibly achieve a perfect color match for HDR content with HDR subtitles, if you consider subtitles to be "simple" sRGB?

@ghost

ghost commented Jan 24, 2018

Fuck them and their insane shit? Simple as that. You could also pick a sane approach like increasing color precision instead of furthering a complicated shitawful hack, but why bother.

@madshi
Author

madshi commented Jan 24, 2018

I'm never opposed to increasing color precision, but how would it help in this specific case? As I said, everybody does HDR to SDR conversion differently. It's not an issue of precision, but of using different tone mapping curves. Even if you use the same curve, if the tone mapping luminance and gamut targets are different, you already end up with totally different sRGB pixels after tone mapping, regardless of how high a precision you're using.

To be honest, although I respect your opinion, I don't fully understand it. The whole purpose of ASS subtitles is to do crazy subtitle effects. If you don't feel the love for that, why not stick to SRT instead? Since this project is named "libass" and not "libsrt", I was hoping that the libass devs would share the love for crazy effects, and share my motivation to try to get things rendered exactly as the (positively) crazy ASS subtitle author intended.

@astiob
Member

astiob commented Jan 24, 2018

Hi @madshi, nice to see you here!

/cc @tgoyne regarding Aegisub.

I’ll start with the easy part:

Does libass expose the "YCbCr matrix" field already? If not, it should do that, so the video renderer can apply color correction, if necessary.

It does, and mpv does apply colour correction based on this field as far as I know. Correct me if I’m wrong, but I imagine it does this both in vf_sub (which blends subtitles onto the video stream à la standalone VSFilter) and in video output modules (comparable to XySubFilter), at least vo_opengl. (What about others?)

libass does not yet support parsing BT.2020 though. I forget whether xy-VSFilter and XySubFilter support this, but in any case I don’t think Aegisub supports writing this. Of course, Aegisub isn’t the only way to create subtitles, but all in all it seems very unlikely that any subtitles exist that use BT.2020 at the moment.

Now, thinking about the present is all well and good, but we should also look into the future. Hence this request and discussion!

I think I get the general gist of what you’re proposing, although I must admit I haven’t actually thought through the conversion chains you’re listing. By the way, could you very briefly describe how HDR video differs from SDR video? I seem to be behind the curve on this. I’m familiar with gamma/piecewise-gamma transfer functions like sRGB/scRGB-nl, BT.709/BT.1361 and 12-bit BT.2020 and with RGB gamuts, but it sounds like this is different from both.

Basically, your idea is that a subtitle file designed for a video in one colour space could be reused with a video in another colour space that looks the same and produce the same result. Sounds good. I think you are also making an underlying assumption that the “RGB” values in the subtitle file use the exact same colour space as the original video.

Is there really no way to make this more uniform and future-proof? For example, as @wm4 suggested, declare that the ASS RGB always means sRGB. Essentially, assume that there is always a header saying “YCbCr Transfer Function = sRGB”. (Actually, shouldn’t that say RGB transfer function?)

So how can we possibly achieve a perfect color match for HDR content with HDR subtitles, if you consider subtitles to be "simple" sRGB?

Is the problem that HDR content may contain more colours and there is no way to match them using sRGB subtitles? That does seem troublesome. :-( It would probably be good indeed to provide a way around this, but I would still love to minimize the effect this has on ASS.

There's no standard for HDR to SDR conversion. So after tone mapping, every video renderer and CE device will end up with different RGB colors.

Does this mean that even with your proposed header the output will be different in each renderer? Or does the header specify exactly how the tone mapping must go? But if the header says “this ASS is SDR” and the video is HDR, don’t we have the same problem again?

Fuck them and their insane shit? Simple as that.

What exactly is insane about matching colours? How does fucking them help? It would be very nice if you spent a bit of effort to understand the problem, think about potential solutions and actually argue your point rather than just spew insults.

You could also pick a sane approach like increasing color precision

Please elaborate on this. It sounds like you have a solution in mind.

For one thing, how do you propose to do this in ASS?

For another, as I understand the term HDR, it expands the colour range rather than or in addition to increasing colour depth. So in terms of 8-bit sRGB, you might need a (260, 270, −4) colour, for example. Do keep in mind that there are actual displays with gamuts wider than sRGB that can display such colours.

@astiob
Member

astiob commented Jan 24, 2018

As a further note, this is actually “sane” if I understand it correctly, unlike the YCbCr Matrix header (which essentially redefines “RGB”). Many RGB-based formats allow specifying which kind of RGB they’re using, and the proposed header would do the same in ASS. This makes sense because RGB by itself means nothing. Of course, if we could mandate a single kind of RGB and have it be enough, then adding this extra configurability would make the format more complex, possibly needlessly so. However, so far it seems that a single kind of RGB can’t be enough, unless we also somehow allow R/G/B values to be outside of the 0–255 range.

For what it’s worth, CSS has traditionally mandated sRGB, but as far as I understand, work is ongoing right now to add support for profile-annotated RGB values.

@madshi
Author

madshi commented Jan 24, 2018

@astiob, thanks for the nice welcome!

Let me try to explain the color matching problem. Probably most of what I'll explain you already know, but it doesn't hurt to recap the situation. So:

If we try to achieve subtitle color perfection, ideally the subtitles should have always been YCbCr instead of RGB, because YCbCr is the format almost all videos are encoded in. Unfortunately, ASS was designed to specify RGB colors, for some reason, which makes our life quite a bit more complicated than it could/should have been.

Generally, if all video players did YCbCr -> RGB conversion in exactly the same way, it wouldn't even be a problem. In that situation the ASS RGB colors would always match the video after YCbCr -> RGB conversion. However, some video players might use different decoding matrices for the video (e.g. BT.601 instead of BT.709), or might decode to different levels (16-235 vs 0-255) etc. What is worse, VSFilter originally always used BT.601, even for BT.709 videos, which means there are many subtitles out there whose RGB colors match the BT.709 video only if you use BT.601 for the video's YCbCr -> RGB conversion! Very ugly, indeed.

Because of this problem the "YCbCr matrix" field was added to the ASS header, which allows interested parties (e.g. the video renderer) to understand which matrix the subtitle was created for; and if that happens to mismatch the video's actual encoded matrix, extra measures can be applied (e.g. by the video renderer) to correct for the mismatch.

It's all a mess, but that doesn't need to stop us from trying to achieve the best possible results, right?

Now what's different about HDR? First of all, there are various competing HDR formats, unfortunately. However, UHD Blu-Ray only supports one of those, so that might just be the most important.

The HDR transfer function used by UHD Blu-Ray is specified in SMPTE ST 2084. It's often called "PQ" (for Perceptual Quantizer) and it's VERY different from the "gamma" transfer function typically used for SDR content. The "gamma" transfer function is a relative transfer function: it doesn't specify which Y (Luma) value maps to which Nits value, only the relative Nits distance between neighboring Y values. In contrast, PQ is an absolute transfer function. A Y value of 1.0 (in PC levels) specifies that the output pixel should have a luminance of 10,000 Nits. Basically, every Y value has a specific Nits value assigned to it.
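For reference, the ST 2084 EOTF described above can be sketched in a few lines (the constants are the published ones from the spec; a normalized signal of 1.0 decodes to 10,000 Nits):

```python
def pq_eotf(signal: float) -> float:
    """SMPTE ST 2084 (PQ) EOTF: normalized signal in [0, 1] -> Nits."""
    m1 = 2610 / 16384        # 0.1593017578125
    m2 = 2523 / 4096 * 128   # 78.84375
    c1 = 3424 / 4096         # 0.8359375
    c2 = 2413 / 4096 * 32    # 18.8515625
    c3 = 2392 / 4096 * 32    # 18.6875
    e = signal ** (1 / m2)
    # The numerator is clamped so signals below the black point map to 0 Nits.
    return 10000 * (max(e - c1, 0.0) / (c2 - c3 * e)) ** (1 / m1)
```

For example, `pq_eotf(1.0)` is 10,000 Nits and a signal around 0.508 lands near the ~100 Nit SDR reference white mentioned later in this thread.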

Because the transfer functions are so very different, it's extremely important to know what we're dealing with. If a video renderer applies the wrong transfer function, the resulting video will look terribly wrong. The same applies to subtitles: If subtitles were made for SDR but then are blended onto the HDR pixels in PQ, the end result would be terrible: The subtitles would be MUCH too bright. Like sunlight bright. Or if subtitles were made for HDR but then are blended onto the video after HDR -> SDR conversion, the subtitles would be much too dim.

UHD Blu-Ray also uses BT.2020, which is a lot larger than BT.709. So that's also something we can't ignore if we want to achieve color matching.

The big problem with HDR playback is that for once the spec was created ahead of the technology! Which I find a very good approach. Basically no consumer display today can do 10,000 Nits, but the HDR spec already supports it! As a result, HDR content needs to be compressed into what the actual display can do. If the display can only do 400 Nits, then the 10,000 Nits content needs to be compressed into 400 Nits. This is done by compressing the pixels from e.g. 0-200 Nits very slightly or not at all, while all pixels above a certain threshold get compressed a LOT. But because there's no standard for the whole HDR -> SDR conversion, everybody does it differently. So if you agree with wm4 and want to consider ASS subtitles to always be simple sRGB, then you would simply blend them onto the video after HDR -> SDR conversion. However, since the HDR -> SDR conversion is not specified anywhere, and everybody does it differently, you'd never get correct color matching that way.
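The "leave the low end alone, compress the highlights" idea can be illustrated with a toy tone-mapping curve (purely illustrative; as the comment says, every renderer picks its own curve, and real curves use smooth roll-offs rather than this linear knee):

```python
def toy_tone_map(nits: float, knee: float = 200.0, display_peak: float = 400.0,
                 source_peak: float = 10000.0) -> float:
    """Map HDR luminance (Nits) into a display's peak, keeping lows intact."""
    if nits <= knee:
        return nits  # pass the 0..knee range through untouched
    # Squeeze everything above the knee into the remaining headroom.
    t = (nits - knee) / (source_peak - knee)
    return knee + t * (display_peak - knee)
```

Two renderers with different `knee`/`display_peak` choices produce different SDR pixels from the same HDR input, which is exactly why color matching breaks after this step.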

As I said in the beginning: The ideal solution would have been to create the ASS spec from the start to be YCbCr, matching the encoded video before any conversion. But since we can't change ASS now to YCbCr, we have to workaround the issue.

So my suggestion is to extend the ASS header information ever so slightly, just to collect a little bit more information. It's then up to each video renderer / media player to either ignore or make use of that added header information. The added information won't break anything, so why not add it and give interested devs the chance to try to achieve perfection?

@astiob
Member

astiob commented Jan 24, 2018

Could you explain in a bit more detail how a renderer would go about rendering video and subtitles with mismatching HDR configuration? It seems that this is impossible if there is an extra step that converts back from HDR to (sort-of-)SDR for display that is not standardized.

So [should] ASS subtitles […] be simple sRGB, then you would simply blend them on the video after HDR -> SDR conversion. However, since the HDR -> SDR conversion is not specified anywhere, and everybody does it differently, you'd never get correct color matching that way.

So if we add this header and then a renderer gets a file that contains HDR video and ASS subtitles explicitly marked sRGB, what will it do?

@astiob
Member

astiob commented Jan 24, 2018

Shouldn’t the proper chain of rendering sRGB subtitles with HDR video be “convert subtitles to the same HDR as used by the video exactly per the HDR spec; then blend them onto the video; then convert the resulting HDR for display in any way you like”? Is there an actual problem with this—e. g. that there are video colours that are impossible to match in sRGB should the subtitles want this—or is this an acceptable solution?

@madshi
Author

madshi commented Jan 24, 2018

Just to clarify: If you say "sRGB subtitles", do you mean subtitles that the subtitle author originally created for an SDR video? Or do you also mean subtitles that were created directly for the HDR video?

@astiob
Member

astiob commented Jan 24, 2018

Er… Isn’t sRGB an absolute term? RGB using the sRGB/BT.709 gamut compressed with the sRGB transfer function. I’m guessing you’re calling this SDR, but I’m not sure.

Edit: Although I don’t suppose sRGB defines absolute nits, does it? So not quite that absolute. Is this a problem?

@madshi
Author

madshi commented Jan 24, 2018

Ah yes, you're right. Ok, let me think through a couple possible scenarios that come to my mind:

Content options:
A) BT.2020 HDR UHD Blu-Ray.
B) BT.709 SDR Blu-Ray.
C) BT.601 SDR DVD.

Subtitle options:
S) BT.2020 PQ subtitle.
T) BT.709 Gamma subtitle.
U) BT.601 Gamma subtitle.
V) "None" Gamma subtitle.
W) Gamma subtitle without "YCbCr matrix" header field.

And for HDR playback, the media player could either:
H) Output HDR (PQ) untouched to the display.
I) Convert to SDR (Gamma) and send that to the display.

That makes up to 30 possible combinations on paper (fewer in practice, since the H/I choice only applies to HDR content). Ouch. Let's think about a couple of "interesting" combinations:

  1. A+S+H: The only mismatch here is video YCbCr and subtitle RGB. Since the subtitle YCbCr matrix matches that of the video, we could either convert the subtitles to YCbCr and blend them onto the YCbCr video, or we could convert the video to RGB and then blend the RGB subtitles. Easy.

  2. A+S+I: The subtitles are for HDR/PQ, so the logical solution would be to blend them onto the video before converting the video to SDR/Gamma. However, it would also be possible to convert both the video and the subtitles to SDR/Gamma, using the same conversion curves, and then do the blending.

  3. A+T+H: No hope of color matching, so we can only do our best to approximate. Meaning we would convert the sRGB subtitles to BT.2020 HDR and then blend them onto the video. We need to adjust the subtitle gamut/saturation (BT.709 vs BT.2020), otherwise the subtitles would be oversaturated: a BT.709 RGB 255 value is full BT.709 saturation, and for BT.2020 the value should be much lower than 255 to make the pixel look "the same". An alternative approach would be to convert the video to SDR, then blend the subtitles, then convert the video back to HDR, but that seems somewhat crazy.

  4. A+T+I: We could use the same approach as 3), but an easier solution would be to first convert the video to SDR, since we need to do that, anyway, and then blend the untouched subtitles afterwards.

  5. B+U (probably the same as B+W): This is a very common situation. In order to achieve proper color matching, we can either convert the subtitles from BT.601 to BT.709, or alternatively we could convert the video to BT.601, then blend the subtitles, then convert the video back to BT.709. Ugly, but achieving perfect color match is possible.

Makes sense?
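Case 5 (a BT.601-tagged subtitle on a BT.709 video) amounts to pushing the subtitle RGB through the "wrong" matrix. A minimal sketch of the matrix math (normalized [0, 1] RGB, full range, no level shifts or clamping; the helper names are made up for illustration):

```python
def rgb_to_ycbcr(r, g, b, kr, kb):
    """Forward matrix from luma coefficients (BT.601: kr=0.299, kb=0.114)."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr, kr, kb):
    """Inverse matrix; using different kr/kb here models a renderer mismatch."""
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return r, g, b

BT601 = (0.299, 0.114)
BT709 = (0.2126, 0.0722)

# "Correcting" a BT.601-authored subtitle colour for a BT.709 pipeline:
# encode with BT.601 coefficients, then decode with BT.709 ones.
```

Encoding pure red with BT.601 and decoding with BT.709 visibly shifts the colour, while greys (Cb = Cr = 0) survive any matrix unchanged, which matches how matrix mismatches mostly affect saturated colours.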

@astiob
Member

astiob commented Jan 24, 2018

  1. Yes.

  2. This raises a good question, actually: I assume the transfer curve is nonlinear; so blending in HDR and blending in SDR will produce different results, right? We would need to decide which is the right/better option. If we assume A+S+H does blending in HDR, then A+S+I should too.

  3. Is it even possible to convert sRGB to BT.2020 HDR? If the HDR defines exact luminance but sRGB doesn’t, then you’d have to make a wild guess about the brightness. YCbCr matrix and RGB gamut (primaries) conversion is well-defined, which I think is what you’re talking about with respect to BT.709 vs BT.2020; the only part that worries me is the absolute luminance.

  4. Right, this is what you suggested before.

  5. Yes (and indeed the same as B+W). This is what already happens at the moment. Usually the subtitles are converted to BT.601 and blended onto the YCbCr video or the subtitles are converted to BT.601 and back to RGB using the BT.709 matrix and then blended onto the video after it is already in RGB (but I’m sure you know this from dealing with XySubFilter), but all options produce the same result modulo quantization errors etc.

If HDR really can’t be converted to/from SDR in a well-defined, standard way, then what do we stand to gain from adding this header at all? I may be wrong, but I think currently renderers blend ASS subtitles onto video in the video’s native gamma-compressed RGB colour space. If we just want HDR subtitles to work on HDR video, then we don’t need to change anything: they will be blended onto the video in HDR.

(…Wait. When you say blend in HDR, do you mean the HDR-compressed RGB or the linear RGB? The compressed one, right?)

If, as it sounds, we can’t make SDR+HDR combinations work anyway, then is it even worth trying? How better of a result can a renderer achieve by guessing at the brightness than by misrendering SDR as HDR? Would it even be better, or would it be better to make it so obviously wrong that nobody releases such a file in the first place? (Although someone might try to combine an external subtitle file with a different video file, but that’s prone to error anyway.)

@madshi
Author

madshi commented Jan 24, 2018

Good points.

You're right that due to the difference between relative and absolute transfer functions, we can't ever get mixed HDR/SDR combinations perfect. However, it's not that hard to produce reasonable results. For example, peak white is usually considered to be around 100 Nits, so if we wanted to convert SDR subtitles to HDR, we could aim for around 100 Nits. It wouldn't produce perfect results, but it should be more than acceptable. Of course, color/luminance matched subtitles wouldn't be perfectly matched anymore.
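Mapping "~100 Nit SDR peak white" into a PQ signal uses the ST 2084 inverse EOTF; a sketch with the spec constants (the ~100 Nit anchor is the heuristic from the comment above, not anything the PQ spec mandates):

```python
def pq_inverse_eotf(nits: float) -> float:
    """Luminance in Nits -> normalized SMPTE ST 2084 (PQ) signal in [0, 1]."""
    m1 = 2610 / 16384
    m2 = 2523 / 4096 * 128
    c1 = 3424 / 4096
    c2 = 2413 / 4096 * 32
    c3 = 2392 / 4096 * 32
    y = (nits / 10000) ** m1
    return ((c1 + c2 * y) / (1 + c3 * y)) ** m2

# An SDR reference white of ~100 Nits lands at roughly half the PQ signal
# range (about 0.508), far below the 1.0 that naive blending would use.
```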

If HDR really can’t be converted to/from SDR in a well-defined, standard way, then what do we stand to gain from adding this header at all? I may be wrong, but I think currently renderers blend ASS subtitles onto video in the video’s native gamma-compressed RGB colour space. If we just want HDR subtitles to work on HDR video, then we don’t need to change anything: they will be blended onto the video in HDR.

The exact way video renderers blend subtitles is not standardized, each video renderer can decide for itself. There is an argument for blending subtitles late in the processing chain, because if video upscaling is involved, rendering the subtitles in the output resolution will result in a nicer anti-aliasing quality. Anyway, I don't think we need to define what a video renderer should do exactly. I'd just like to make it possible for a subtitle file to specify which exact video format (matrix + transfer) it was made for, so I could try my best to render the subtitles as near to the subtitle author's intention as I possibly can.

E.g. consider the following situation: A 1080p Blu-Ray is released. A fansubber spends many many hours on making a perfect subtitle for this release. Later a 4K HDR Blu-Ray of the same movie is released. Now a user might want to simply reuse the same subtitle file. Ok, colors might not match perfectly, but wouldn't it be nice if it still worked "ok"? If the video renderer would naively blend the subtitles on the native transfer-function-composed RGB color space, then in this situation the subtitles would be as bright as staring into the sun, because a white SDR pixel (which should be around 100 Nits) becomes 10,000 Nits bright if rendered naively in HDR.

If we add a header field to the subtitle file, then the video renderer can detect an HDR/SDR mismatch and can correct for it. Not perfectly, maybe, but reasonably. So why not allow that?

do you mean the HDR-compressed RGB or the linear RGB

Both are possible, but I think it's usually done in compressed RGB.

@madshi
Author

madshi commented Jan 24, 2018

P.S: Just to clarify: I believe most pixels in a subtitle are either totally opaque or totally transparent. For both of these it doesn't matter if we blend in linear RGB or compressed RGB. It should only matter for half-transparent pixels, which are probably only the borders of each letter. There will be slight differences there in linear vs compressed RGB, but I don't think any user would notice, so I think we can safely ignore that.
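The difference described here only shows up at intermediate alpha; a sketch using the standard sRGB transfer curves from IEC 61966-2-1 (single-channel values in [0, 1]; function names are made up for illustration):

```python
def srgb_to_linear(c: float) -> float:
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c: float) -> float:
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

def blend_compressed(fg: float, bg: float, alpha: float) -> float:
    """Blend directly on gamma-compressed values (what renderers do today)."""
    return alpha * fg + (1 - alpha) * bg

def blend_linear(fg: float, bg: float, alpha: float) -> float:
    """Decode to linear light, blend, re-encode."""
    mixed = alpha * srgb_to_linear(fg) + (1 - alpha) * srgb_to_linear(bg)
    return linear_to_srgb(mixed)
```

At alpha 0 or 1 both functions agree exactly; at alpha 0.5, white over black gives 0.5 compressed but about 0.735 via linear light, which is the edge-pixel difference being discussed.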

@rcombs
Member

rcombs commented Jan 24, 2018

Keep in mind, btw, that 1080p HDR exists, as does 4K SDR, and BT2020 with Gamma (though I'm not aware of any real content with BT709 and PQ). So we will want to make sure to specify colorspace separately from transfer function.

wouldn't it be nice if it still worked "ok"?

For the record, the only HDR release of anime I'm aware of (Your Name), apart from being a poor upscale from 1080, was manually color-corrected from BT709-space RGB gamma to BT2020 PQ per-scene, so even with perfect headers, tone matching, etc., this wouldn't match. But I get your point in the general case.

@astiob
Member

astiob commented Jan 24, 2018

I believe most pixels in a subtitle are either totally opaque or totally transparent. For both of these it doesn't matter if we blend in linear RGB or compressed RGB. […] There will be slight differences there in linear vs compressed RGB, but I don't think any user would notice […]

For text, true, good point. (Although in some circumstances the difference is noticeable—but ASS is doing it the wrong way anyway, blending in compressed RGB. I remember seeing discussions about Linux desktop environments rendering text in compressed RGB and how this was making some letters noticeably thicker than others. But this probably doesn’t happen often in ASS, because it tends to have larger font sizes.)

But ASS also contains explicit vector drawings, especially in the crazy awesome subtitle effects that we’re dealing with here. And a good fraction of those is semitransparent, and blending them in linear RGB would be rather different from compressed. There are also animated fades, and they are often synchronized to compressed-RGB fades in the video. There is also blur; I don’t know how bad/different linear-RGB blur would look in general, but to complicate matters further, occasionally complex structures are built out of many blurred pieces, which I imagine would fall apart if rendered in linear RGB.

The exact way video renderers blend subtitles is not standardized, each video renderer can decide for itself. There is an argument for blending subtitles late in the processing chain, because if video upscaling is involved, rendering the subtitles in the output resolution will result in a nicer anti-aliasing quality.

Note that this doesn’t affect the RGB colour space unless the renderer performs gamut correction or HDR conversion. This hasn’t been a big problem until now because gamut correction happens rarely [in anime] and usually isn’t terribly visible, but with HDR on the horizon it seems this will change. Now is as good a time as any to define that subtitles should have the same colours as if they were blended at any particular point in the rendering chain.

(By the way, rendering/blending subtitles after upscaling is only obviously good for simple text: people make some crazy effects—and even some not so crazy, e. g. gradients or karaoke—pixel-by-pixel or pixel-row-by-pixel-row, which breaks after upscaling by a fractional factor because the pixel[ row]s start overlapping. But ASS currently provides no reliable way for a renderer to distinguish between those and normal text, so renderers simply choose to optimize for one or the other and display all subtitles after or before scaling.)

I'd just like to make it possible for a subtitle file to specify which exact video format (matrix + transfer) it was made for, so I could try my best to render the subtitles as near to the subtitle author's intention as I possibly can.

This seems like a very good intention. But if you don’t mind, first I’d just like to get us on the same page regarding the “…which video format subtitles are made for…” part. This assumes that subtitles are made differently for different video formats; but why? Is the reason the fact that HDR has absolute luminance and hence we need at least one option with absolute luminance in ASS? Is this true? Is there anything else?

@madshi
Author

madshi commented Jan 24, 2018

@rcombs, fully agreed!

@astiob, fair enough. I've never even thought about blending in linear vs compressed RGB making a noticeable difference. I'm not completely sure what we could do about that, though? I think 99% of all renderers out there blend in compressed RGB, so probably we should consider that the rule?

This assumes that subtitles are made differently for different video formats; but why?

If we want perfect color matching for an SDR movie, then we need to author the subtitle for that exact video encoding. If we want perfect color matching for an HDR movie, then there's no way to achieve that with an SDR subtitle, because (as explained before) the HDR -> SDR conversion is not standardized. So if we want to support perfect subtitle color matching for HDR movies, then there's no other way to achieve it technically than to create an HDR subtitle which is then blended onto the HDR video before modifying the transfer function or gamut in any way.

@MrSmile
Member

MrSmile commented Jan 27, 2018

I think 99% of all renderers out there blend in compressed RGB, so probably we should consider that the rule?

Personally I'm strongly against that. Nonlinear blending is physically incorrect, ugly and should never be used. It's especially noticeable in blur and alpha blending. For example, correct linear blur (left) vs. broken as usual (right):
[image: Blur and Gamma comparison]
Incorrect blur in sRGB space has visible dark halos. A video explanation of that effect can be found here.

While we cannot do anything about legacy broken-by-design standards (i.e. we can't switch \blur to a gamma-correct one), any new specification (HDR/ASSv5) should aim to fix that bug.
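The dark-halo effect is easy to reproduce numerically: averaging neighbouring white and black pixels on compressed sRGB values yields a result that decodes to much darker light than the true average. A toy sketch (3-tap box blur over single-channel values; the sRGB curve helpers are the standard IEC 61966-2-1 formulas, the rest is illustrative):

```python
def srgb_to_linear(c):
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

def box_blur(pixels, gamma_correct):
    """3-tap box blur, either naively on sRGB values or via linear light."""
    if gamma_correct:
        pixels = [srgb_to_linear(p) for p in pixels]
    out = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return [linear_to_srgb(p) for p in out] if gamma_correct else out
```

On a black-to-white edge, the naive blur dips to ~0.33 at the transition while the linear-light blur stays near ~0.61, so the naive version looks visibly darker at the edge (the halo).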

@madshi
Author

madshi commented Jan 27, 2018

@MrSmile,

I'd be fine with "requiring" linear blending in a future ASS spec version, or in a whole new subtitle format. But I don't think tying linear blending to a tiny new HDR information header field (as discussed here) makes a lot of sense. Just my personal opinion, of course.

But just to make sure I understand your last sentence correctly: Are you saying that subtitles based on the current ASS spec need to be blended in non-linear light to produce the desired look?

@MrSmile
Member

MrSmile commented Jan 27, 2018

But I don't think tying linear blending to a tiny new HDR information header field (as discussed here) makes a lot of sense.

As far as I understand, GPUs simply don't have sRGB-like HDR texture/framebuffer formats. Not that there is a need for that: usual floating-point formats are both linear and have improved resolution in the dark. So in the case of HDR, linear blending is the default.

Are you saying that subtitles based on the current ASS spec need to be blended in non-linear light to produce the desired look?

Isn't that the way VSFilter is doing it? So long as VSFilter = ASS spec, we have to incorrectly blend in sRGB.

@astiob
Member

astiob commented Jan 27, 2018

I think 99% of all renderers out there blend in compressed RGB, so probably we should consider that the rule?

Yeah, I think so.

But just to make sure I understand your last sentence correctly: Are you saying that subtitles based on the current ASS spec need to be blended in non-linear light to produce the desired look?

Yep.

Personally I'm strongly against that. Nonlinear blending is physically incorrect, ugly and should never be used. […] Incorrect blur in sRGB space have visible dark halos. […] While we cannot do anything about legacy broken by design standards (i. e. switch \blur to gamma-correct one), any new specification (HDR/ASSv5) should aim to fix that bug.

You’re quite right, but here we’re talking about ASS, not a new specification.

As far as I understand GPUs simply don't have sRGB-like HDR texture/framebuffer formats. Not as there is a need for that: usual floating-point formats are both linear and have improved resolution in the dark. So in case of HDR linear blending is the default.

I’m not sure how this is relevant to ASS. Besides, you can stuff anything you like in a texture and lie to the GPU about the linearity of it if all you want it to do is blend two things together.

because (as explained before) the HDR -> SDR conversion is not standardized

Only because HDR has absolute luminance and SDR doesn’t, right? Sorry I’m repeating this over and over, as our further actions probably don’t depend on the answer, but I’m just trying to understand this myself.

Other than that, I think you’ve reaffirmed the impression I got earlier.

Fundamentally, we need a way (a) to fix the compression curve shape and (b) to fix the absolute luminance scale.

For the curve shape, we could declare that all ASS subtitles henceforth use the sRGB curve. Although it now occurs to me that this would actually break all existing subtitles unless media players treat video as sRGB. @wm4, I didn’t realize this initially when you proposed that ASS is/should be sRGB, but isn’t this a problem? Existing subtitles are made to match video when rendered in the same nonlinear RGB space as the video itself, and almost no video is sRGB. (Really, it’s even worse, as mostly it’s not clear at all what transfer curve any particular video is supposed to use for decoding. @madshi, please correct me if I’m wrong, but even when the video is explicitly tagged with e. g. the BT.709 transfer curve, it only really means that this curve was used by the camera, whereas the expected viewing conditions might differ—if the tag is even true in the first place.)

Anyway, at least theoretically we could declare that all ASS subtitles use the sRGB curve and hope to get away with wreaking not too much havoc.

It seems implausible to declare that all ASS has a fixed luminance scale though. And even if we do, it sounds like combining 8-bit sRGB with any fixed luminance scale is going to be grossly insufficient for allowing colour matching with HDR video, as either the black-to-white luminance range will be too low (hey, that’s why HDR exists) or, if we extend the range to cover all or even most of HDR, the 8-bit precision will be too shallow to allow even remotely precise colour control (and existing subtitle colours will be significantly shifted as well).

So to sum up, it seems to me that we do physically need at least one new setting in ASS for switching to HDR, and there’s no way around it.

It seems like it might be enough to have a simple Boolean setting switching between a specific, constant flavour of HDR and SDR. This would make HDR subtitles have perfectly well-defined colour spaces and colours (we could even ignore the YCbCr Matrix header for them), but it would not solve the problem that SDR currently has that the transfer curve and RGB gamut are unspecified. Maybe we don’t want to solve this, either, and prefer the simplicity of a single Boolean flag while accepting that the status quo will remain for SDR.

@astiob
Copy link
Member

astiob commented Jan 27, 2018

But I don't think tying linear blending to a tiny new HDR information header field (as discussed here) makes a lot of sense.

I guess it’s possible.

HDR is (from the sounds of it) so different from traditional gamma functions that blending in HDR-compressed coordinates feels almost like a third kind of blending in addition to gamma and linear. Like, I’m sort of used to how gamma-blending looks and how linear blending looks and I know what I can expect to see if I blend two things together with a certain alpha in one or the other. I have no idea what the result will look like in HDR-compressed coordinates. So maybe it’s not quite wise to simply blend in HDR coordinates because it’ll surprise people, but I don’t know.

The colour values themselves will also be so different from usual that people will have to adjust anyway. And they’ll have to adjust to the new blending as well. Since blending will look different anyway, we may as well choose the new blending to be linear and have people adjust to that.

From a different perspective though, complex effects in ASS are most often done to imitate special effects present in the source video. And I don’t know about HDR, but I reckon SDR source videos tend to have effects such as fades rendered in gamma-compressed colour. So it’s simply easier for the subbers if ASS uses gamma-compressed colour. If the same is true for HDR about HDR-compressed colour, then it would be easier as well to use HDR-compressed blending in HDR ASS. …But what if the video uses a different HDR space from the subtitles?

@madshi
Copy link
Author

madshi commented Jan 28, 2018

@MrSmile, the transfer function doesn't have much (if any) effect on the texture formats the video renderer uses. And actually, if I want to output HDR to the display, I'm outputting 10bit integer, not floating point. It's also theoretically possible to output linear light floating point to the GPU, but that would require the GPU driver to convert this back to 10bit/12bit non-linear integer behind my back, because that's the only format HDMI supports. And my experience with relying on GPU drivers to do conversions "correctly" is not very good. So I much prefer doing everything myself, sending integer textures to the GPU for output.

@astiob, one curve being relative and the other absolute is certainly a pretty dramatic difference. But there's more: An HDR encoding has pixels in it which are much brighter than what any SDR display can handle. So converting an HDR encoding to SDR is automatically lossy. You can either clip all the HDR luminance values to SDR. Or you can try to use some kind of compression curve. In any case, you have to actually modify the pixel luminance values in linear light in a non-linear fashion. Clipping looks really ugly, so any decent HDR -> SDR conversion routine applies some sort of compression curve. But the exact compression curve is not standardized. E.g. some displays modify every pixel's luminance, compressing dark image areas only very slightly and compressing highlights a lot. Other displays don't modify dark pixels at all, but start compressing only somewhere in middle gray/luminance. So the key factor is that an HDR -> SDR conversion actually modifies the pixel values in a non-trivial non-linear way that is lossy and can't be reverted or predicted.

As a result, if you try to use an SDR subtitle track with an HDR video, you're in trouble. If you blend in HDR compressed RGB, it will look totally wrong. If you blend after having converted the video from HDR to SDR, it will still not be perfect because everybody does HDR -> SDR conversion differently, so the result is unpredictable. Which means, technically the only way to achieve color matching with HDR videos is to use a subtitle track which is custom tailored to the HDR video and is blended onto the HDR video before it's converted to SDR.

Hope that makes sense?

About SDR transfer functions: IIRC 99.9% of all SDR videos use BT.709/BT.601 as the transfer function (the transfer function is the same for BT.709 and BT.601, and also the same for PAL). sRGB is not usually used for normal SDR videos. So we should probably consider all current ASS subtitles to be BT.709. It's still debatable if you should invert BT.709/BT.601 to get to linear light or if you should use a simple power curve, because the linear segment in the BT.709 transfer function is meant to suppress camera noise, so if you invert that, you undo the purpose of the BT.709 transfer function. But as I said, this is debatable. In any case, VSFilter, libass and most video renderers simply blend the ASS subtitles onto the SDR video in non-linear light, so currently ASS subtitles simply share the transfer function of the SDR video, whatever it is.
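The two candidate linearizations can be sketched as follows (a hedged illustration; the helper names are mine, not from any spec text):

```python
# A sketch of the two candidate SDR linearizations discussed above:
# inverting the BT.709/BT.601 camera OETF (with its linear toe segment),
# versus assuming a plain display power curve. Helper names are mine.

def bt709_oetf_inverse(v):
    # BT.709 OETF: V = 4.5*L for L < 0.018, else V = 1.099*L^0.45 - 0.099
    return v / 4.5 if v < 0.081 else ((v + 0.099) / 1.099) ** (1 / 0.45)

def pure_gamma_inverse(v, gamma=2.4):
    # display-referred alternative: a simple power curve
    return v ** gamma

# Near black the two disagree by more than an order of magnitude, which
# is exactly where the "undo the camera's noise suppression" debate
# comes from:
print(bt709_oetf_inverse(0.05), pure_gamma_inverse(0.05))
```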

True, a simple HDR=true/false header flag would suffice, if we consider SMPTE ST 2084 the only important HDR format. However, there are several competing HDR formats, so I wonder if it wouldn't be "wiser" to use a transfer function field. We could limit it to "SDR" and "PQ" (or "2084") for the time being, but that would give us the chance to extend it later if need be without having to invent another totally new field.

I'd suggest to use "SDR" for current ASS subtitles, instead of specifying sRGB or BT.709 for the transfer function, to clarify that current ASS subtitles are simply meant to be blended onto the SDR video as encoded, without any funny transfer function modifications.
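Purely as an illustration: if such a field existed (the name "Transfer Function" and the values "SDR" / "PQ" / "HLG" are hypothetical here, not part of the ASS spec), a renderer might read the [Script Info] section roughly like this:

```python
# Hypothetical sketch only -- the "Transfer Function" field name and its
# values mirror the proposal in this thread and are NOT part of the ASS
# spec. Shows how a renderer might read the [Script Info] section.

def parse_transfer_function(script_info_lines):
    for line in script_info_lines:
        key, _, value = line.partition(":")
        if key.strip().lower() == "transfer function":
            return value.strip().upper()
    return "SDR"  # absence of the field means a legacy SDR script

print(parse_transfer_function(["Title: Example", "Transfer Function: PQ"]))  # PQ
print(parse_transfer_function(["Title: Example"]))                           # SDR
```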

Thoughts?

P.S: About blending: I'm fine with whatever you guys think is best.

@astiob
Copy link
Member

astiob commented Jan 28, 2018

All seems clear and fair to me. Thanks for the explanations!

Out of interest, is the HDR-to-SDR conversion usually dynamic, varying with the actual dynamic range within the frame? E. g. if all absolute luminances actually used in a video are supported by the display, it doesn’t need to do any compression, right? But it can’t know in advance if this is the case, so does it adapt to what’s currently on screen and change the compression curve from frame to frame?

But the exact compression curve is not standardized.

I was wondering why (or if) it was impossible to actually standardize this. It seems that the only physical obstacle is that screens have different luminance. I guess the conversion process theoretically could even be standardized, but it would still be parametrized by the screen’s physical properties, so it would not make an abstract purely mathematical/software conversion possible.

It's still debatable if you should invert BT.709/BT.601 to get to linear light or if you should use a simple power curve, because the linear segment in the BT.709 transfer function is meant to surpress camera noise, so if you invert that, you undo the purpose of the BT.709 transfer function.

I remember reading a forum thread a while back that you might or might not have participated in where people were discussing this from the perspective that the decoding curve should match what the reference monitors are configured for at the production studio, but different studios configure their monitors differently. That time, it seemed to me that a pure 2.35 gamma was deemed the most common for Western films (based on accounts of people professionally calibrating said monitors), although unfortunately, nothing was said about anime.

I’ve also read (maybe in the same thread, but I think elsewhere as well) that there is usually a nontrivial expected “end-to-end” gamma curve—i. e. the display/projector output should be darker than the camera input, because films are viewed in dark rooms on bounded screens whereas they are shot in conditions where bright light is everywhere around, and background light affects colour perception.

True, a simple HDR=true/false header flag would suffice, if we consider SMPTE ST. 2084 the only important HDR format. However, there are several competing HDR formats

Wouldn’t it be possible to convert from one to another in a well-defined way? So technically we could just have one fixed format in ASS and ask the renderer to convert as appropriate if necessary. But indeed it may be wiser to allow choosing the exact format, to avoid problems when the video HDR has an even higher dynamic range than ASS and the fixed ASS-HDR would yet again be insufficient to cover all video colours.

I'd suggest to use "SDR" for current ASS subtitles, instead of specifying sRGB or BT.709 for the transfer function, to clarify that current ASS subtitles are simply meant to be blended onto the SDR video as encoded, without any funny transfer function modifications.

Sounds good to me.

By the way, are the different HDR formats significantly different? Like, there’s not a terrible lot of difference between the sRGB, 2.2, 2.35 and even BT.709 curves, and if you mistakenly apply the sRGB curve to BT.709 subtitles, the colours will look somewhat wrong but not bright as sun or such; no worse than if you blend the BT.709 subtitles on HDR-converted-to-SDR video. I’m not sure this would be useful, but could we say that subtitles marked as HDR always use the same HDR space as the video does much like SDR subtitles use the same SDR space as the video does?

Regarding blending, it’s not clear how HDR-compressed blending should work if both the video and the subtitles are HDR but different sorts. Or if the video is SDR.

@madshi
Copy link
Author

madshi commented Jan 28, 2018

is the HDR-to-SDR conversion usually dynamic, varying with the
actual dynamic range within the frame? E. g. if all absolute luminances
actually used in a video are supported by the display, it doesn’t
need to do any compression, right?

Haha, if you're not careful, you'll become an HDR expert in no time... :)

The current UHD Blu-Ray HDR spec is usually named HDR10 (for HDR 10bit), and it has static metadata, which means that you get information about the max luminance in the whole movie, but not per scene, or even frame.

Then there's Dolby Vision, which uses exactly the same transfer function, but is 12bit+ instead of 10bit and supports specifying different max luminance values per scene or even frame.

And then there's HDR10+, which was just published at CES in January, and which is HDR10 with dynamic metadata.

Currently most TVs don't adjust to the luminance of each scene/frame. But both madVR and mpv have the ability to do so (by measuring the luminance of each frame on the fly). And with HDR10+, soon some TVs will probably do that, too - but only with UHD Blu-Ray discs that are mastered with HDR10+ support.

I was wondering why (or if) it was impossible to actually standardize this.

It should be possible, but nobody did it. IIRC the CE companies wanted to be able to choose their own algos as a means to separate from the competition. Anyway, there are some recommendations available on how it could be done, but practically, everybody does it differently.

And of course you're right that the target brightness of each display plays a crucial role, too, so even when using the same curve, if the target brightness is different, so will be the curve parameters.

I remember reading a forum thread a while back that you might or might not have participated in

Yes, I remember that thread, too. Not sure if I participated or just read it silently.

By the way, are the different HDR formats significantly different?

Good question. I know that HDR10, Dolby Vision and HDR10+ all use exactly the same transfer function, so if we don't call it "HDR10", but if we call it e.g. "PQ" or "2084" then we have those 3 covered.
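For reference, the ST 2084 (PQ) EOTF that those three formats share can be sketched as follows (constants from the published curve; a simplified illustration, not production code):

```python
# Simplified sketch of the SMPTE ST 2084 (PQ) EOTF shared by HDR10,
# Dolby Vision and HDR10+: code value in [0, 1] -> absolute luminance
# in cd/m^2 (nits). Constants are from the published curve.

M1 = 2610 / 16384          # ~0.1593
M2 = 2523 / 4096 * 128     # ~78.84
C1 = 3424 / 4096           # ~0.8359
C2 = 2413 / 4096 * 32      # ~18.85
C3 = 2392 / 4096 * 32      # ~18.69

def pq_eotf(v):
    p = v ** (1 / M2)
    return 10000 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

# PQ is an absolute curve: full code maps to 10000 nits regardless of display.
print(round(pq_eotf(1.0)))  # 10000
```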

However, there's HLG which I haven't fully studied yet. I think (but I'm not 100% sure) that HLG can be displayed as is on an SDR display, so the transfer function is somewhat similar to normal BT.709. But if a TV actually knows the content is HLG, then by doing some math it can reconstruct some HDR information again somehow, but as I said, I didn't study this one yet. I think SDR subtitles might work for HLG, but I don't know for sure. It might not be a bad idea if we had an option to tag subtitles for HLG, just to be safe, but we might not need it.
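For completeness, here is a sketch of the HLG OETF (constants from the published BT.2100 curve); the square-root segment in the lower range is what makes HLG look roughly acceptable on an SDR display, as suspected above:

```python
import math

# Sketch of the HLG OETF from BT.2100 (constants from the published
# curve). The square-root lower segment roughly resembles a conventional
# SDR gamma, which is why HLG is somewhat SDR-compatible.

A = 0.17883277
B = 1 - 4 * A       # 0.28466892
C = 0.55991073

def hlg_oetf(e):
    # scene-linear light e in [0, 1] -> non-linear signal in [0, 1]
    return math.sqrt(3 * e) if e <= 1 / 12 else A * math.log(12 * e - B) + C

print(round(hlg_oetf(1 / 12), 3))  # 0.5  (the two segments meet here)
```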

Then I've read a while ago that there are at least 2 more HDR formats which I know absolutely nothing about. There's a good chance they're completely different once more.

About blending: Although the transfer functions for SDR and HDR are very different, I think both to some extent approximate the human brightness perception (but the HDR TF does it much better than the SDR one). Which means there's a good chance that blending in non-linear light might look quite similar with both SDR and HDR. But I don't know for sure.

@ghost
Copy link

ghost commented Jan 29, 2018

Man this is such a load of bullshit. I really want to see how you're going to use ASS to perfectly match some video background in HDR and 4K. Apart from looking shit nobody will actually try this because there are no applications for this and won't be. If you really manage to achieve this with ASS, it will probably be shit slow and inefficient because 4K will have a lot of detail that is hard and inefficient to reproduce with ASS. Besides, even for "traditional" video the uses are declining because we're not getting any reports about stuff like this anymore. You're trying to come up with a solution for a problem that doesn't exist. And you're doing the same bullshit that caused the current idiotic colormatrix header, which I was against when it was introduced, and which I'm against even more now. I'll block any attempts to add anything like this as far as I can.

@madshi
Copy link
Author

madshi commented Jan 29, 2018

This is mostly for Anime, where there often isn't a lot of "detail" or "texture" that needs to be reproduced.

Making this work is actually very easy: Subtitles made for HDR should be blended onto the video before modifying the transfer curve, before tone mapping and before gamut mapping. If done like that, there should be no problem getting perfect color and luminance matching, and there's no reason for it to be slow, either, because the processing isn't any more complicated than it is for SDR. The key is doing the blending at the right moment. There's nothing more to it.
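The ordering described above can be sketched on a single luminance value; all helper names here are hypothetical, and the tone-mapping "curve" is a deliberately crude clip standing in for a vendor-specific algorithm:

```python
# Single-pixel (luminance-only) sketch of the blending order described
# above. All helper names are hypothetical; the tone-mapping step is a
# crude clip standing in for a vendor-specific compression curve.

def tone_map(nits, target_nits=100.0):
    return min(nits, target_nits)  # real renderers use a compression curve

def blend(bg, fg, alpha):
    return alpha * fg + (1 - alpha) * bg

def render_pixel(video_nits, sub_nits, sub_alpha, sub_is_hdr, output_is_hdr):
    frame = video_nits
    if sub_is_hdr:
        frame = blend(frame, sub_nits, sub_alpha)  # colour-match in HDR domain
    if not output_is_hdr:
        frame = tone_map(frame)                    # lossy, renderer-specific
    if not sub_is_hdr:
        frame = blend(frame, sub_nits, sub_alpha)  # SDR subs after tone mapping
    return frame

# A half-transparent 80-nit sub over a 1000-nit highlight, SDR output:
print(render_pixel(1000.0, 80.0, 0.5, sub_is_hdr=False, output_is_hdr=False))  # 90.0
print(render_pixel(1000.0, 80.0, 0.5, sub_is_hdr=True, output_is_hdr=False))   # 100.0
```

The two results differ because the HDR-authored sub is composited before the unpredictable tone-mapping step, while the SDR-authored sub is composited after it.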

I'll block any attempts to add anything like this as far as I can.

How are decisions made for libass? Dictatorship? Or do contributors vote? Or how does it work?

@astiob
Copy link
Member

astiob commented Jan 29, 2018

Note that wm4 is also the lead developer of mpv, so if he really hates this, then mpv might never get support for it even if we add it to libass. Of course, libass is also used by other players, but I reckon the only one with significant market share is VLC, and I have no idea how much VLC cares about colour accuracy at all. Then again, if they want to support HDR video, then they’ll have to care at least a bit.

Question for @wm4: does/will mpv support HDR video? How do you (plan to) handle simple ASS with HDR video? When subtitles are handled by the video output module, you (can) probably blend them in screen-native gamma-compressed RGB, which is easy enough. What about vf_sub? If you just blend the sort-of-sRGB colours of the subtitles onto the HDR colours of the video, as far as I understand madshi’s explanations, you’ll end up with “bright as sun” subtitles.

I really want to see how you're going to use ASS to perfectly match some video background in HDR and 4K. Apart from looking shit nobody will actually try this because there are no applications for this and won't be.

I fail to see how or why this should be any different from matching colours to SDR video as is done now. And the whole point of matching colours is to make it not look like shit. I’m not sure why you need a special program to pick a colour or why you’re so sure that nobody will ever make one. If HDR video takes off (I have no idea), then I imagine people will want and make tools to work with it.

@madshi, would 8-bit be enough for precise colour matching in HDR? If not, then it might indeed look like shit as wm4 said and end up unused. I do agree that it isn’t terribly wise to add a feature that will remain unused.

Besides, even for "traditional" video the uses are declining because we're not getting any reports about stuff like this anymore.

Are you saying that we are getting fewer bug reports about signs in libass, but not because we’ve fixed a lot of stuff but because nobody uses them any more? :-)

It might be interesting to download the last n subtitle files off Nyaa and count how many use colour-matched signs, and compare this to n files from some years ago.

And you're doing the same bullshit that caused the current idiotic colormatrix header

Once again, no. The colormatrix header is indeed idiotic if you will, because it ties RGB to a YUV matrix. This, on the other hand, simply specifies a previously unspecified parameter of RGB and is exactly like adding a colour profile to a PNG image or a colour space tag to an H.264 video.

Also, what caused the colormatrix header was the fact that VSFilter rendered ASS colours in a haphazard manner, converting them to YUV using an underspecified conversion. This is precisely what we are trying to avoid when HDR comes along. If we do nothing at all now, the rendering of ASS on HDR video will likely end up looking very different in different players and player configurations, and all of the different looks will be ridiculously wrong.

At the very least, even if we agree that nobody will ever want to make HDR subtitles and ASS should stay SDR forever, we should make this agreement explicit and define guidelines (if not an exact procedure) for how ASS should be rendered on HDR video. (Maybe SDR as well while we’re at it, even if just to codify the existing practice.)

@madshi
Copy link
Author

madshi commented Jan 29, 2018

Of course it would be nice to have more than 8bit precision, but I don't think we can introduce that without breaking compatibility, or can we? If we stick to 8bit with proper rounding, the max error we can get is 0.5 steps in 8bit, which is not nice, but I think it's going to be barely visible, if at all. Usually having more than 8bit is most important for smooth gradients. In HDR one 8bit step will be more visible than in SDR, but I still don't think it will be a real problem. Of course I haven't tested it, so this is only me guessing...

But even if we can't achieve perfection, there's still the issue that subtitles created for SDR will look totally incorrect when blended onto untouched HDR video. And subtitles created for HDR will look totally incorrect when blended onto untouched SDR video. So for that reason alone I think we either have to add a transfer function info field, or alternatively completely disallow HDR subtitles. But even if we wanted to - how can we effectively disallow HDR subtitles? If media player / renderers currently simply blend subtitles onto untouched video, then if a user creates a subtitle for an HDR video, it will automatically become an "HDR subtitle", and there's not really any way we could possibly stop that. So isn't it much better to properly flag HDR subtitles, instead of giving up and accepting chaos?

If we add an information field which mpv ends up ignoring, then mpv users are no worse off than if we don't do anything. But players / renderers which care still have a chance to provide a better experience. So IMHO mpv shouldn't dictate our decision. Besides, wm4 didn't like the "YCbCr Matrix" field, either, but in the end mpv now supports it, I think? So if HDR subtitles ever become a reality, I believe mpv will have no other choice than to support it, too. If HDR subtitles won't ever become mainstream, then there's no big harm adding a new header information field, which in the worst case would then be simply unused.

Just my 2 cents, of course.

@clsid2
Copy link

clsid2 commented Mar 12, 2024

If the subtitle is to be rendered in the video’s colour space, then the video renderer needs no such details and there are no mismatches. Or at least fewer, if a transfer function kind is specified as SDR/PQ/HLG. Seems simpler, no?

By explicitly setting all required information in the script, there is no need for the libass library user to feed it with information extracted from the video decoder and be burdened with all those extra implementation details and coding. The subtitle authoring tool is capable of doing that for the user. And since we dictate that the script should match the video, there should in practice be no mismatches, unless the script is badly tagged or combined with a different video.

And if the required information is just a transfer function, then that is the only thing that should be added.

If we’re designing for simplicity, if we’re designing for the 99% of content, then we can and should assume that subtitles always use the same colorimetry as the video as long as the definition range (standard or high) matches. The fewer knobs, the better. The fewer ways the colours can end up wrong because some parameter differs or is interpreted differently by multiple parts of the rendering graph, the better.

That is exactly what we have been saying all along, but I get a feeling that people are constantly misunderstanding it.

@astiob
Copy link
Member

astiob commented Mar 12, 2024

I may indeed be misunderstanding something.

By explicitly setting all required information in the script, there is no need for the libass library user to feed it with information extracted from the video decoder and be burdened with all those extra implementation details and coding.

But libass doesn’t need to know video colorimetry in any case. Quite the opposite; if the script is tagged, then libass must add an API through which the video player can query this information. The less the script is tagged, the less information there is to speak of.

libass currently doesn’t perform any colour correction, and you seem to concur that this should continue to be the video renderer’s job.

That is exactly what we have been saying all along, but I get a feeling that people are constantly misunderstanding it.

But right now you seem to be saying the opposite: that the script should contain explicit tagging, a full copy of the video’s colorimetry tags. This is an additional burden and a door for additional mismatches. You may expect the tags to match most of the time, but you still need to define what happens when they don’t.


You can’t pick one [gamut] unambiguously.

To be fair, with enough cynicism, we could just try to assume the values typical for anime and ignore other possibilities, because that’s probably almost all content with heavily styled ASS typesetting. But even then, ambiguities remain: TV.601 subtitles could be designed for BT.709 video, TV.709 subtitles could be for an old DTV recording, and the white point could be anything (although IINM no renderer represented in this thread actually ever uses D93 during playback, so perhaps we can ignore this part).

@madshi
Copy link
Author

madshi commented Mar 12, 2024

@astiob,

do you think we will ever run into video files (I'm only talking about video files here, not about subtitles) where the video decoding matrix is either BT.601 or BT.709 and where the gamut is BT.2020 at the same time? Or where the video decoding matrix is BT.2020, but the gamut is either BT.601 or BT.709 at the same time?

Depending on the answer to that question, I may be in favor of or against adding a gamut header field.

@clsid2
Copy link

clsid2 commented Mar 12, 2024

But right now you seem to be saying the opposite: that the script should contain explicit tagging, a full copy of the video’s colorimetry tags.

No I am not saying that. I am saying that the script should only contain whatever information it needs to render the subtitle without any knowledge of the video. If that required information is only the transfer function, then that is the only explicit tagging that needs to be added to the script.
No color correction is needed in libass for new scripts. Adjusting an SDR sub for HDR video is the video renderer's job.

@astiob
Copy link
Member

astiob commented Mar 13, 2024

do you think we will ever run into video files (I'm only talking about video files here, not about subtitles) where the video decoding matrix is either BT.601 or BT.709 and where the gamut is BT.2020 at the same time? Or where the video decoding matrix is BT.2020, but the gamut is either BT.601 or BT.709 at the same time?

I… hope not, although you never know. At the very least, I’m not aware of any legitimate situation where such a combination is used or where I’d want to use it myself (although my knowledge is likely out of date).

Are you perhaps thinking that the various 601 and 709 gamuts are close enough that you could pick an arbitrary one but the 2020 one far enough that you want it separated? I guess I could get behind that, as long as this gamut choice is restricted to SDR-on-HDR situations and doesn’t force SDR-on-SDR into video/subtitle gamut mismatches.

I am saying that the script should only contain whatever information it needs to render the subtitle without any knowledge of the video.

Oh, OK. But what does “render” mean in this context?

  • Is it “render to some unspecified flavour of RGB”? Then no information is necessary at all.
  • Is it “render to precise light specification that’s just enough to display stable colours across any hardware physically capable of such colours”? Then the full lot of transfer/primaries information is required.
  • Is it “render to linear RGB with unknown primaries”? Then I think only the transfer function is required (and the matrix only for legacy back-conversions, so really only for 601-on-709 files; in particular, it isn’t necessary to support a BT.2020 value for the matrix header and None can be used instead).
  • Is it “render to some intermediate format that’s just defined enough to be able to combine with the video”? Then I think that’s what madshi has been getting at, but this does inherently link to the video stream.

By the way, which exact part of HDR10+ (and others if there are others) is dynamic? I see above that it uses the PQ transfer function, so the dynamic part is… the primaries? Or some tone-mapping metadata? Is there even such a thing? If there is, then fully-specified video-independent rendering of subtitles would also need that.

@clsid2
Copy link

clsid2 commented Mar 13, 2024

Option 1. So no info needed for libass itself.
Transfer function included as info to forward to the video renderer, and as an indication that the sub follows the NEW guidelines.
Matrix set to NONE, unless madshi needs to know it. But if a generic SDR sub is combined with HDR video, it is probably safe to assume it is 709.

https://en.wikipedia.org/wiki/HDR10%2B
The tonemapping metadata is dynamic, with global defaults for backwards compatibility.

@kasper93
Copy link
Contributor

kasper93 commented Mar 13, 2024

But I don’t see what’s the good part in this. The basic mode intentionally simplifies implementation by discarding some parameter mismatch cases instead of handling them. Given that the full mode and its implementation exists anyway (and always has existed), basic literally has no reason to be even offered other than spite.

Again, I didn't say I like this decision. I can recognize it for what it was, and at the time it was the right assessment that basic mode is enough to produce good-looking subtitles and will deter people from using more "fancy" combinations. Which was wm4's way of expressing his personal view on the situation, like he often did.

In practice you don't need anything more than basic mode; in fact, with the current libass version it would be awkward to use anything else, given the recommendation to use the NONE value anyway, which is the go-to nowadays.

I agree that this covers 99% of content. And that, indeed, is why no one has complained. However, this will change if people start actively using TV.2020 as proposed in this thread.

Exactly, "will" change. And after this change it can be reevaluated, but don't apply current or future state of things retroactively to 2010 decisions. As we stand now, we are still uncertain if this header will even be used/extended, so don't blame wm4 on that front. I don't even know why we argue about this historic choice, it was correct for almost all if not all content, no need to dive further into that.

My point here is: SDR subtitles that care about exact colours are made for SDR video, but a lot (or all) of SDR video doesn’t have a defined transfer (linearization) function.

In short, bt.1886 is a transfer function that attempts to fix this situation and was designed to match older content. Of course it is not perfect; you cannot go back in time and make everyone target it directly. But it was the industry's way of standardizing the SDR transfer function, and everything should be using it, because it is the best guess we have right now.
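As a reference for the curve being discussed, the BT.1886 EOTF is parametrized by the display's white and black luminance (a simplified sketch; variable names are mine):

```python
# Simplified sketch of the BT.1886 reference EOTF (variable names are
# mine): gamma 2.4 with coefficients derived from the display's white
# luminance Lw and black luminance Lb; code value v in [0, 1] -> nits.

def bt1886_eotf(v, lw=100.0, lb=0.0):
    gamma = 2.4
    a = (lw ** (1 / gamma) - lb ** (1 / gamma)) ** gamma
    b = lb ** (1 / gamma) / (lw ** (1 / gamma) - lb ** (1 / gamma))
    return a * max(v + b, 0.0) ** gamma

print(round(bt1886_eotf(1.0), 6))  # 100.0 at full white on a 100-nit display
```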

With that out of the way, I agree what you are saying. I won't quote more parts of it, but let me add some commentary.

We have two main seemingly contradicting objectives.
A) Make subtitles seamlessly blend with the video track. This means color values of ASS subtitles are abstract and cannot be used on it is own, they are just some values when applied correctly to the one-and-only correct video "should" produce correct output.
B) Make subtitles seamlessly blend with any video track. This means that the color values of ASS subtitles are well defined and can be adapted to the target surface along with the video.

A is what we have now: we expect the RGB values of ASS subtitles to be applied to the RGB video. The implicit assumption is that the video YUV->RGB conversion done by the player will be the same as was done during mastering of the subtitles. We don't care about the transfer function because blending is done in gamma light, so whatever it is, the RGB values of the ASS should be the same. Or, in the case of older renderers, we blend in YUV, but that is just converting RGB->YUV, which for SDR is easy and produces the same results.
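The "blending in gamma light" step amounts to a plain alpha-over on the gamma-encoded channel values, with no transfer function involved; a minimal sketch (note that real ASS stores *inverted* alpha, 0x00 = opaque, which is glossed over here for simplicity):

```python
def blend_gamma(sub_rgb, sub_alpha, video_rgb):
    # Plain alpha-over blend on gamma-encoded 8-bit channel values.
    # No linearization happens anywhere, which is why SDR blending
    # gives the same result whatever the (possibly undefined)
    # transfer function of the video is.
    return round(sub_rgb * sub_alpha + video_rgb * (1 - sub_alpha))
```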

The above approach is the reason why YCbCr Matrix is called a disgusting hack: it is only needed for backward compatibility, because VSFilter (or its older brother DirectVobSub) would always use bt.601, so in fact the RGB values are NOT valid for the given video file. This header in fact does not specify the video colorspace, but the subtitle colorspace, which is then converted to the video colorspace. In the modern nominal case it shouldn't be used, because the expectation is that the RGB values are already correct and we don't care about obsolete versions of VSFilter. And the objective never was to make subtitles "portable", even though it gave this possibility by tagging the subtitle track.

B is an extension of subtitle metadata to define what the RGB values actually mean. We already have YCbCr Matrix, but we also need a transfer function. In the SDR world we didn't care, because everything is bt.1886, so even if the ASS were to be used with another SDR file it would just work. With HDR we have to do a conversion.

Now the concern is that extending the ASS header with real metadata will introduce possible mismatches and remove its abstract nature. And that's true: ASS subtitles would no longer be strictly tied to a video track. In practice, though, I don't see this as a big problem, simply because for SDR we would tag it with the SDR (bt.1886) transfer function, which should be exactly the same as what any media player / video renderer infers for the video track. Of course we can avoid naming it "bt.1886" and just go with SDR, which would basically mean "some SDR transfer". Again, in practice, when converting to HDR it would probably mean bt.1886 internally anyway. In SDR mode we wouldn't use the transfer at all, which of course makes it impossible to use those subtitles between different SDR files; but if we don't want to be specific, all SDR would just be SDR. I would be specific, because why not, but this means that if the user overrides the video's transfer during playback, they would also have to do it for the subtitle track, assuming of course it was matching to begin with.

Having said all that, the question is: what are the real issues with tagging ASS files? I don't think that in SDR mode it will be the big deal it is made out to be. Of course this means breaking with the always-video-colorspace mindset, because in HDR most subtitles will probably be done in SDR, except in some extreme cases.

I don't know what I wanted to say anymore...

By the way, which exact part of HDR10+ (and others if there are others) is dynamic? I see above that it uses the PQ transfer function, so the dynamic part is… the primaries? Or some tone-mapping metadata? Is there even such a thing? If there is, then fully-specified video-independent rendering of subtitles would also need that.

In short (and simplified), dynamic metadata can specify minimum and maximum luminance per scene, which greatly helps with tonemapping. Say you have 0–1000 nits static metadata for the full movie; compressing all scenes with a constant factor is not good. Imagine you have a dark scene with a peak of, say, 120 nits: if you compress it against the 1000-nit peak value, you lose a lot of range that could be used to better represent this scene. In practice, if you have a 300-nit screen, you can display such a scene in full, without any tonemapping, because it is within the display's range.
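The numbers above can be put into a deliberately crude sketch (linear scaling only, not a real tonemapping curve):

```python
def tonemap_factor(metadata_peak_nits, display_peak_nits):
    # A scene whose tagged peak is already within display range needs
    # no compression at all; otherwise scale linearly against the peak.
    if metadata_peak_nits <= display_peak_nits:
        return 1.0
    return display_peak_nits / metadata_peak_nits

# 300-nit display, dark scene with a true 120-nit peak:
static = tonemap_factor(1000, 300)   # static 1000-nit tag: crushed to 0.3x
dynamic = tonemap_factor(120, 300)   # per-scene 120-nit tag: untouched, 1.0x
```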

I initially mentioned full HDR metadata too, but since it wouldn't be perfect anyway, we don't have to go crazy and can just clip to the current video frame luminance: 203 nits in SDR and "something" in HDR.

@madshi
Author

madshi commented Mar 13, 2024

@astiob,

Are you perhaps thinking that the various 601 and 709 gamuts
are close enough that you could pick an arbitrary one but the
2020 one far enough that you want it separated?

Yes. I think we need to be aware of that we have 2 different goals we want to achieve:

  1. We want subtitles to be absolutely perfectly color matched if they're played with the video encoding they were made for.

  2. If subtitles are matched with a video they were not made for, we no longer strive for perfection (knowing it's probably not going to happen no matter what we try), so we make do with something that looks ok.

In case of 1), all we need to do is for the video renderer to use the same YCbCr -> RGB matrix that Aegisub did, and we're done. There's no need to perform any gamut conversion here, because Aegisub never does any gamut conversion either. If we did try to do gamut conversion, it would actually destroy color matching.

In case of 2), we've already given up on achieving perfect color matching, so there's no need for gamut conversion here either when talking about BT.601 <-> BT.709, because they're really not far from each other. However, the situation is different with BT.2020, because it's far more saturated than either BT.601 or BT.709. So as soon as either the subtitle or the video is BT.2020 but the other is not, we probably have to perform gamut conversion, because otherwise the subtitles will look either strongly oversaturated or strongly undersaturated.
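For illustration, the standard primaries conversion from BT.709 to BT.2020 (matrix from ITU-R BT.2087) operates on *linear* RGB; the way a pure 709 red maps to well-inside-the-gamut BT.2020 coordinates is exactly the saturation mismatch described above:

```python
# ITU-R BT.2087 matrix: linear BT.709 RGB -> linear BT.2020 RGB.
M_709_TO_2020 = [
    [0.6274, 0.3293, 0.0433],
    [0.0691, 0.9195, 0.0114],
    [0.0164, 0.0880, 0.8956],
]

def rgb709_to_rgb2020(r, g, b):
    return tuple(m[0] * r + m[1] * g + m[2] * b for m in M_709_TO_2020)
```

Skipping this step when the mismatch involves BT.2020 means e.g. a 709 red (1, 0, 0) gets displayed as a much more saturated 2020 red, hence the oversaturation. Note it also requires linearizing first, which is one more reason the transfer function matters here.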

So to sum up, I don't think gamut conversion is ever needed - except when there's a mismatch in gamut between the video the subtitle was made for and the video the subtitle is later used with, and that mismatch involves BT.2020. Now, one could argue that we need a gamut ASS header field to catch this situation. But I was thinking that if BT.2020 videos always use both the BT.2020 matrix and the BT.2020 gamut, then we could get along without adding a gamut header field, because we could detect the mismatch by just looking at the YCbCr Matrix field.

However, if there is a (reasonable) chance that we may see video files that mix BT.2020 gamut with BT.601/709 matrix, or vice versa, then we may have no choice but to add a gamut header field. Though, that would only ever be used for 2) above.

Thoughts?

@kasper93
Contributor

However, if there is a (reasonable) chance that we may see video files that mix BT.2020 gamut with BT.601/709 matrix, or vice versa, then we may have no choice but to add a gamut header field. Though, that would only ever be used for 2) above.

I personally don't think it is needed.

@TheOneric
Member

B) Make subtitles seamlessly blend with any video track.

I’m under the impression the consensus so far was that this is impossible (even if we’d create a completely new sub format) and that at most “it will be off but not eye-gougingly bad when paired with a different video” is achievable

In SDR world we didn't care [about transfer functions], because everything is bt.1886, so even if the ASS were to be used in another SDR file it would just work.

mpv-player/mpv#13381 shows it already matters for SDR

Also, I have more to say on the color-compat=basic matter and I fail to see how it does or ever did make any sense, but let us defer this to some future mpv issue/PR so as not to derail this discussion too much

@kasper93
Contributor

I’m under the impression the consensus so far was that this is impossible (even if we’d create a completely new sub format) and that at most “it will be off but not eye-gougingly bad when paired with a different video” is achievable

Is this a bad thing? In some scenarios masking colors may not match perfectly, but everything else will be close enough to look good to the user, which in itself would be an improvement.

mpv-player/mpv#13381 shows it already matters for SDR

I was speaking in more generic terms. If we were to blend just after RGB conversion, in gamma light, the transfer function wouldn't matter. The issue above is caused more by how the libplacebo pipeline works, but it is possible to avoid this dependency. Either way, like I said, I'm in favor of defining things instead of leaving them undefined, so that issues like that do not happen regardless of renderer implementation details.

Also, I have more to say on the color-compat=basic matter and fail to see how it does or ever made any sense, but let us defer this to some future mpv issue/PR to not derail this discussion too much

Yeah, we venture off-topic too much. But I don't see any real use case currently where mangled subtitle colors would need to be fixed by this header except bt.601. In all other cases there is no reason for the RGB values to need conversion.

@TheOneric
Member

TheOneric commented Mar 13, 2024

Not being eye-gougingly terrible on a different video is not a bad thing. It’s just that you earlier contrasted “A) seamlessly blend with the bundled video” with this “B)” as conflicting objectives and correctly noted that A) requires fully matching the video's properties.

The thing is A) is not a nice-to-have goal, but a hard requirement for typesetting and therefore the ASS format.

The question then is: can we add headers which fully preserve A) but at least somewhat improve on B), while

  1. again, never compromising A
  2. not introducing new ways of insane colour mangling if the header value is poorly chosen
  3. being simple enough for both authors and players to actually be set/implemented correctly

To me this seemingly implies there must be an easy way to gauge whether it is the originally-bundled video and, if so, directly short-circuit to “matching video everywhere” (instead of, say, specifying all static and dynamic metadata of the video in sub headers, which would break both 2. and 3.). Importantly, this detection must not have false negatives, but a small number of false positives is imho acceptable.

The question then is: what value would be good to gauge this, and if it differs, what value(s) need to be provided to avoid “eye-gougingly bad”?
Another question is how rigorously the behaviour should be specified if the header + video data indicate it is not the original video. (Being too rigorous risks introducing sometimes-unfavourable behaviour, needing too much metadata (breaking 3.) or getting too complex for only little gain.)

@clsid2

clsid2 commented Mar 13, 2024

You basically have three main scenarios:

  1. You deliberately author a sub to match a specific video.
  2. It is a generic (generated) subtitle script with basic dialogue. The creator (human/machine) did not match it to a specific video; no color matching was done with parts of the video, and accurate colors are not important. We could standardize that colors used in such a generic script should be interpreted as matching SDR BT709 video, i.e. basically what a generic old script would look like on HD-resolution video.
  3. Old script.

New tag can indicate that 1 or 2 applies.
For 2/3, signal to the video renderer that the sub is SDR.

ColorMatch=Video/Generic

@astiob
Member

astiob commented Mar 13, 2024

Of course we can avoid naming it "bt.1886" and just go with SDR, which would basically mean "some SDR transfer". Again, in practice, when converting to HDR it would probably mean bt.1886 internally anyway. In SDR mode we wouldn't use the transfer at all, which of course makes it impossible to use those subtitles between different SDR files; but if we don't want to be specific, all SDR would just be SDR.

Just to confirm: this is what I (we) were proposing earlier: just “any SDR”, explicitly allowing each player/renderer/user to pick their own preference and tweak it as they see fit.

I would be specific, because why not, but this means that if the user overrides the video's transfer during playback, they would also have to do it for the subtitle track, assuming of course it was matching to begin with.

That’s what worries me about specifying a very explicit transfer function for SDR subtitles: SDR-on-SDR must match the video’s transfer function, which (IMO, even after the advent of BT.1886) may differ from whatever default we choose now due to user overrides, renderer capabilities or even future video standardization work. So if we declare that SDR subtitles always use BT.1886 and nothing else, this implicitly seems to break SDR-on-SDR matching. We could, of course, word the declaration very carefully and make it abundantly explicit that it applies only to SDR-on-HDR, but that only makes the spec more complicated.

BTW, if we do mandate a specific transfer function for SDR-on-HDR, do we want sRGB or BT.1886, after all? I do believe they’re different.
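They are indeed different; a quick sketch comparing the two (normalized 0–1 values, zero black level assumed for BT.1886) shows the disagreement at mid-signal:

```python
def srgb_eotf(v):
    # IEC 61966-2-1 sRGB EOTF: linear toe plus a 2.4-power segment.
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def bt1886_zero_black_eotf(v):
    # BT.1886 with zero black level reduces to a pure 2.4 gamma.
    return v ** 2.4
```

At mid-signal (v = 0.5) the two differ by more than 10% in linear light (roughly 0.214 vs 0.189), so the choice is not cosmetic.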

And my impression from madshi’s comments is that we want to pair it with RGB primaries from either BT.709 (matching sRGB) or BT.2020 (different from sRGB).

Side note: technically, AVC/HEVC/H.273 do allow tagging several EOTFs, all of which differ from BT.1886. But I’ve never heard of this being used, of course.

@TheOneric
Member

ColorMatch=Video/Generic

Such a boolean toggle would allow simple dialogue-only subs to opt out of HDR (and colour mangling), but doesn’t help with the “make reused subs not eye-gougingly bad on different video” goal. Any HDR sub would set Video, and once remuxed with a different HDR encode it would be displayed severely mangled

@TheOneric
Member

TheOneric commented Mar 13, 2024

The question then is: can we add headers which fully preserve A) but at least somewhat improve on B), while

Extending on this, there seem to have been four goals brought up so far, motivated by or overlapping with HDR

  1. Make HDR typesetting possible at all
    ⇒ solution already clear, everything needs to match video properties, i.e. (behave like) blending on the raw frame
  2. Make it possible to provide a not-ideal but also non-terrible, best-effort fallback for remuxes, without negatively affecting properly matched types (from the pre-previous post)
    ⇒ likely means we need an indicator to gauge whether this is the original video; if yes, behave like 1., if no, use added metadata to provide this best-effort fallback. Must fulfil:
    a) again, never compromising A
    b) not introducing new ways of insane colour mangling if the header value is poorly chosen
    c) being simple enough for both authors and players to actually be set/implemented correctly
  3. To get good HDR types we need more than 8bit per colour channel
    problem: VSFilter truncates/wraps around larger values, which we intentionally emulate. Allowing larger values instead probably needs to be tied to some header or v4++ too, and it very likely requires an API break
  4. Dialogue/Typesetting split; currently debated whether dialogue should always be SDR and/or (un)affected by tonemapping

@madshi
Author

madshi commented Mar 13, 2024

Good summary, TheOneric.

Why do you think we need more than 8bit, though? Do you really think we can't achieve perfect color matching using 8bit? Usually, the only problem with low bit depth is banding in color gradients; I doubt that would prevent "perfect" (to the human eye) color matching.

@cubicibo
Contributor

cubicibo commented Mar 13, 2024

Typesetting must blend perfectly into the video. Generally 8-bit will be enough, but strong gradients, glows, and complex textures will require >8-bit precision to integrate smoothly with the footage. Fansubbers do not want the viewer to be able to distinguish good typesetting from the actual video. So 10 or 12 bits will be needed in the future, and the base space will have to offer the same coverage as the output space, if a common base space is chosen.

@madshi
Author

madshi commented Mar 13, 2024

I think we should first find a test case that clearly shows that 8bit is not enough before we make breaking changes to the way ASS works. Right now aren't we just speculating that 8bit might not be enough? I'd say it probably is enough, but it's hard to be 100% sure.

Now don't get me wrong: I'm a big fan of using as high a bit depth as possible, and I'm pretty angry with the UHD Blu-Ray creators for using only 10bit. However, adding support for more than 8bit to ASS would be a very big change and would likely break compatibility with all current software. So we should be extra sure that it's really needed.

@clsid2

clsid2 commented Mar 13, 2024

It is already pretty clear that the hidden agenda is to try to push huge unnecessary format changes requiring all software to be rewritten...

@cubicibo
Contributor

No. We need to find out whether basic changes are good enough or an absolute waste of time for everyone. If all typesetting is hideous in HDR, it is all pointless and a new format will be needed.

I would advise starting with a custom mpv branch where ASS is blended in video space+range (like mpv at the moment?), and a test command-line option to specify all parameters that we theorize can be useful to achieve:

  • Correct blending for matched assets.
  • Acceptable blending for mismatched assets.
    Furthermore, any unspecified parameter would mean "SDR ASS/UI white".

Typesetters can then handcraft samples that can be tested across various official exports of the same show (HDR+DV, SDR).
From there we can see what's useful, useless, and a waste of time to implement.

The process has to be iterative with all parties involved; if the study's outcome is "not possible to have good typesetting with HDR", then we need a new ASS version altogether, not a band-aid.

@clsid2

clsid2 commented Mar 13, 2024

That mpv test branch is a good suggestion.

If a sub is tagged as ColorMatch=video, then you can of course not reuse that sub verbatim with a different release. In that situation the header should be changed to some other value that can assist in producing a best-effort decent output. Or have a second header with some basic info about the original video.

ColorMatch=video/generic
and in case of video match
OriginalVideoInfo=2020.pc.pq (can even be separate header values)

But these are all just variations of what has been previously suggested. The actual real life test could help determine which information is actually essential to have.

@rcombs
Member

rcombs commented Mar 13, 2024

Yeah, my expectation is that we'll have a few problems with applying a simple band-aid to HDR:

  • 8-bit won't be sufficient for PQ (effectively reduces the SDR range to ~7-bit, and 8-bit is already pretty borderline on banding in that range)
  • PQ-compressed blending will lead to odd tones in blurs and alpha gradients
  • Specifying colors directly in PQ (as opposed to linear or gamma space) will be unintuitive and confusing for typesetters

But those are just my guesses, and it's worth validating them before making bigger decisions.
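The first bullet is easy to quantify with the ST 2084 (PQ) inverse EOTF (constants from SMPTE ST 2084): the 0–203 nit SDR reference range occupies only the bottom ~58% of the PQ signal, i.e. about 148 of 255 8-bit codes, roughly 7.2 bits:

```python
# SMPTE ST 2084 (PQ) inverse EOTF: absolute luminance in nits -> [0, 1] signal.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_inv_eotf(nits):
    y = (nits / 10000.0) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2

sdr_fraction = pq_inv_eotf(203)        # ~0.58 of the full signal range
sdr_codes = round(sdr_fraction * 255)  # ~148 of 255 8-bit codes
```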

My biggest longer-term concern around adding 16-bit output is that it would roughly double our memory and bandwidth requirements, which perhaps demands an optimization like "only output 16-bit images when they involve color values in the HDR range; otherwise, output 8-bit masks and expect them to be treated as BT1886 0-100".
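To make the doubling concrete, assuming the current one-byte-per-pixel alpha masks and a hypothetical single full-frame layer at 4K (real signs are usually smaller, but stack many layers):

```python
w, h = 3840, 2160           # one full-frame layer at 4K
mask_8bit = w * h           # ~8.3 MB per layer today
mask_16bit = w * h * 2      # ~16.6 MB per layer with 16-bit masks
```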

Re: making larger API changes that require player rework, we wouldn't make such a change lightly; most of the changes being discussed here are things we've been discussing for years; e.g.:

  • Splitting dialogue from typesetting would allow for better performance when rendering lower-resolution video on higher-resolution displays (e.g. 1080p typesetting on a 4K TV) without compromising dialogue sharpness, and would also result in better matching of the author's intended look
  • Supporting new output image formats would be required to allow for gradient tags, embedded textures, and other non-flat-with-alpha-mask format features
  • Blending in linear space would give more intuitive and better-looking behavior on edges and blurs

If we're going to introduce new format features that require players to use new API surface, I'd expect that we'll also take steps to ensure that callers using the existing API will continue to see reasonable results; e.g. getting 16-bit output would require an opt-in, and HDR values would be clipped to the SDR range for existing callers.

@petzku

petzku commented Mar 14, 2024

My biggest longer-term concern around adding 16-bit output is that it would roughly double our memory and bandwidth requirements, which perhaps demands an optimization like "only output 16-bit images when they involve color values in the HDR range; otherwise, output 8-bit masks and expect them to be treated as BT1886 0-100".

To add to this: bandwidth is already often a major issue in complex typesetting, as "more layers" is most typesetters' proverbial hammer. It's not at all uncommon for a large sign to require several megapixels of total bitmap size, even at 1080p (especially when diagonals are concerned). There are other, mostly unrelated, optimizations libass could implement to help with this, but in either case, simply doubling bitmap sizes across the board should not be done lightly.

kasper93 added a commit to kasper93/mpv that referenced this issue Mar 16, 2024
Upstream ASS specification says that all subtitles should be rendered
with color primaries and transfer matching their associated video. But
as expected after further discussion the decision has been made to
fallback to SDR mode in case of HDR video.

See-Also: https://github.com/libass/libass/blob/649a7c2e1fc6f4188ea1a89968560715800b883d/libass/ass_types.h#L233-L237
See-Also: libass/libass#297
See-Also: mpv-player#13381
Fixes: mpv-player#13673
kasper93 added a commit to mpv-player/mpv that referenced this issue Mar 18, 2024
Upstream ASS specification says that all subtitles should be rendered
with color primaries and transfer matching their associated video. But
as expected after further discussion the decision has been made to
fallback to SDR mode in case of HDR video.

See-Also: https://github.com/libass/libass/blob/649a7c2e1fc6f4188ea1a89968560715800b883d/libass/ass_types.h#L233-L237
See-Also: libass/libass#297
See-Also: #13381
Fixes: #13673
TheOneric added a commit to TheOneric/libass that referenced this issue Apr 3, 2024
mpv (temporarily) enabling colourspace-matching for HDR videos,
showed by now a notable amount of subs exist expecting to
remain SDR when placed on HDR video.

After further discussion in libass#297 we thus decided to revise
the default for subs on HDR to forego exact colour-matches, see:
libass#297 (comment)