add ASS header field "transfer function" (to define HDR vs SDR etc) #297
Insanity. Subtitles are sRGB, convert them in the renderer. |
fansubbers need perfect color matching for some of their effects to work properly. There's no standard for HDR to SDR conversion. So after tone mapping, every video renderer and CE device will end up with different RGB colors. So how can we possibly achieve a perfect color match for HDR content with HDR subtitles, if you consider subtitles to be "simple" sRGB? |
Fuck them and their insane shit? Simple as that. You could also pick a sane approach like increasing color precision instead of furthering a complicated shitawful hack, but why bother. |
I'm never opposed to increasing color precision, but how would it help in this specific case? As I said, everybody does HDR to SDR conversion differently. It's not an issue of precision, but of using different tone mapping curves. Even if you use the same curve, if the tone mapping luminance and gamut targets are different, you already end up with totally different sRGB pixels after tone mapping, regardless of how high a precision you're using. To be honest, although I respect your opinion, I don't fully understand it. The whole purpose of ASS subtitles is to do crazy subtitle effects. If you don't feel the love for that, why not stick to SRT instead? Since this project is named "libass" and not "libsrt", I was hoping that the libass devs would share the love for crazy effects, and share my motivation to try to get things rendered exactly as the (positively) crazy ASS subtitle author intended. |
Hi @madshi, nice to see you here! /cc @tgoyne regarding Aegisub. I’ll start with the easy part:
It does, and mpv does apply colour correction based on this field as far as I know (correct me if I’m wrong). libass does not yet support parsing BT.2020, though. I forget whether xy-VSFilter and XySubFilter support this, but in any case I don’t think Aegisub supports writing it. Of course, Aegisub isn’t the only way to create subtitles, but all in all it seems very unlikely that any subtitles exist that use BT.2020 at the moment. Now, thinking about the present is all well and good, but we should also look into the future. Hence this request and discussion! I think I get the general gist of what you’re proposing, although I must admit I haven’t actually thought through the conversion chains you’re listing. By the way, could you very briefly describe how HDR video differs from SDR video? I seem to be behind the curve on this. I’m familiar with gamma/piecewise-gamma transfer functions like sRGB/scRGB-nl, BT.709/BT.1361 and 12-bit BT.2020 and with RGB gamuts, but it sounds like this is different from both. Basically, your idea is that a subtitle file designed for a video in one colour space could be reused with a video in another colour space that looks the same and produce the same result. Sounds good. I think you are also making an underlying assumption that the “RGB” values in the subtitle file use the exact same colour space as the original video. Is there really no way to make this more uniform and future-proof? For example, as @wm4 suggested, declare that the ASS RGB always means sRGB. Essentially, assume that there is always a header saying “YCbCr Transfer Function = sRGB”. (Actually, shouldn’t that say RGB transfer function?)
Is the problem that HDR content may contain more colours and there is no way to match them using sRGB subtitles? That does seem troublesome. :-( It would probably be good indeed to provide a way around this, but I would still love to minimize the effect this has on ASS.
Does this mean that even with your proposed header the output will be different in each renderer? Or does the header specify exactly how the tone mapping must go? But if the header says “this ASS is SDR” and the video is HDR, don’t we have the same problem again?
What exactly is insane about matching colours? How does fucking them help? It would be very nice if you spent a bit of effort to understand the problem, think about potential solutions and actually argue your point rather than just spew insults.
Please elaborate on this. It sounds like you have a solution in mind. For one thing, how do you propose to do this in ASS? For another, as I understand the term HDR, it expands the colour range rather than or in addition to increasing colour depth. So in terms of 8-bit sRGB, you might need a (260, 270, −4) colour, for example. Do keep in mind that there are actual displays with gamuts wider than sRGB that can display such colours. |
As a further note, this is actually “sane” if I understand it correctly, unlike the YCbCr Matrix header (which essentially redefines “RGB”). Many RGB-based formats allow specifying which kind of RGB they’re using, and the proposed header would do the same in ASS. This makes sense because RGB by itself means nothing. Of course, if we could mandate a single kind of RGB and have it be enough, then adding this extra configurability would make the format more complex, possibly needlessly so. However, so far it seems that a single kind of RGB can’t be enough, unless we also somehow allow R/G/B values to be outside of the 0–255 range. For what it’s worth, CSS has traditionally mandated sRGB, but as far as I understand, work is ongoing right now to add support for profile-annotated RGB values. |
@astiob, thanks for the nice welcome! Let me try to explain the color matching problem. Probably most of what I'll explain you already know, but it doesn't hurt to recap the situation, I guess.

So: If we try to achieve subtitle color perfection, ideally the subtitles should always have been YCbCr instead of RGB, because YCbCr is the format almost all videos are encoded in. Unfortunately, ASS was designed to specify RGB colors, for some reason, which makes our life quite a bit more complicated than it could/should have been. Generally, if all video players did YCbCr -> RGB conversion in exactly the same way, it wouldn't even be a problem. In that situation the ASS RGB colors would always match the video after YCbCr -> RGB conversion. However, some video players might use different decoding matrices for the video (e.g. BT.601 instead of BT.709), or might decode to different levels (16-235 vs 0-255) etc. What is worse, VSFilter originally always used BT.601, even for BT.709 videos, which means there are many subtitles out there whose RGB colors match the BT.709 video only if you use BT.601 for the video's YCbCr -> RGB conversion! Very ugly, indeed. Because of this problem the "YCbCr Matrix" field was added to the ASS header, which allows interested parties (e.g. the video renderer) to understand which matrix the subtitle was created for; if that happens to mismatch the video's actual encoded matrix, extra measures can be applied (e.g. by the video renderer) to correct for the mismatch. It's all a mess, but that doesn't need to stop us from trying to achieve the best possible results, right?

Now what's different about HDR? First of all, there are various competing HDR formats, unfortunately. However, UHD Blu-Ray only supports one of those, so that might just be the most important one. The HDR transfer function used by UHD Blu-Ray is specified in SMPTE ST 2084.
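To make the matrix mismatch concrete, here is a small illustrative sketch (mine, not from the thread): decoding the same full-range YCbCr triple with BT.601 versus BT.709 luma coefficients yields different RGB values, which is exactly the mismatch the "YCbCr Matrix" header exists to flag.

```python
def ycbcr_to_rgb(y, cb, cr, kr, kb):
    """Full-range YCbCr -> RGB using luma coefficients kr, kb."""
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return (r, g, b)

# BT.601 uses kr=0.299, kb=0.114; BT.709 uses kr=0.2126, kb=0.0722.
pixel = (0.5, 0.1, 0.2)  # an arbitrary YCbCr sample
rgb_601 = ycbcr_to_rgb(*pixel, kr=0.299, kb=0.114)
rgb_709 = ycbcr_to_rgb(*pixel, kr=0.2126, kb=0.0722)
# The two results differ, so subtitle RGB authored against one matrix
# cannot match video decoded with the other.
```

(Real decoding also involves limited-range scaling, which this sketch omits for brevity.)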
It's often called "PQ" (for Perceptual Quantizer) and it's VERY different from the "gamma" transfer function that is typically used for SDR content. The "gamma" transfer function is a relative transfer function: it doesn't specify which Y (luma) value maps to which nits value, it just specifies the relative nits distance between neighboring Y values. In contrast, PQ is an absolute transfer function. A Y value of 1.0 (in PC levels) specifies that the output pixel should have a luminance of 10,000 nits. Basically, every Y value has a specific nits value assigned to it. Because the transfer functions are so very different, it's extremely important to know which one we're dealing with. If a video renderer applies the wrong transfer function, the resulting video will look terribly wrong. The same applies to subtitles: if subtitles were made for SDR but are then blended onto the HDR pixels in PQ, the end result would be terrible: the subtitles would be MUCH too bright. Like sunlight bright. Or if subtitles were made for HDR but are then blended onto the video after HDR -> SDR conversion, the subtitles would be much too dim.

UHD Blu-Ray also uses BT.2020, which is a lot larger than BT.709. So that's also something we can't ignore if we want to achieve color matching.

The big problem with HDR playback is that for once the spec was created ahead of technology! Which I find a very good approach. Basically no consumer display today can do 10,000 nits, but the HDR spec already supports it! As a result, HDR content needs to be compressed into what the actual display can do. If the display can only do 400 nits, then the 10,000-nit content needs to be compressed into 400 nits. This is done by compressing e.g. the pixels from 0-200 nits very slightly or not at all, while all pixels above a certain threshold get compressed a LOT. But because there's no standard for the whole HDR -> SDR conversion, everybody does it differently.
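The PQ curve described above can be written down compactly. The following is a sketch of the ST 2084 EOTF; the constants are the published ones, but the transcription is my own, so treat it as illustrative:

```python
# SMPTE ST 2084 (PQ) EOTF: normalized code value E' in [0, 1] ->
# absolute display luminance in nits, with 1.0 mapping to 10,000 nits.
M1 = 2610 / 16384        # ~0.1593
M2 = 2523 / 4096 * 128   # ~78.84
C1 = 3424 / 4096         # ~0.8359
C2 = 2413 / 4096 * 32    # ~18.85
C3 = 2392 / 4096 * 32    # ~18.69

def pq_eotf(code: float) -> float:
    """PQ code value (0..1) -> absolute luminance in nits."""
    e = code ** (1.0 / M2)
    return 10000.0 * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1.0 / M1)

# pq_eotf(0.0) -> 0 nits; pq_eotf(1.0) -> 10,000 nits, far beyond any
# SDR display's peak -- hence the "sunlight bright" subtitle problem.
```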
So if you agree with wm4 and want to consider ASS subtitles to always be simple sRGB, then you would simply blend them onto the video after HDR -> SDR conversion. However, since the HDR -> SDR conversion is not specified anywhere, and everybody does it differently, you'd never get correct color matching that way. As I said in the beginning: the ideal solution would have been to create the ASS spec from the start to be YCbCr, matching the encoded video before any conversion. But since we can't change ASS to YCbCr now, we have to work around the issue. So my suggestion is to extend the ASS header information ever so slightly, just to collect a little bit more information. It's then up to each video renderer / media player to either ignore or make use of that added header information. The added information won't break anything, so why not add it and give interested devs the chance to try to achieve perfection? |
Could you explain in a bit more detail how a renderer would go about rendering video and subtitles with mismatching HDR configuration? It seems that this is impossible if there is an extra step that converts back from HDR to (sort-of-)SDR for display that is not standardized.
So if we add this header and then a renderer gets a file that contains HDR video and ASS subtitles explicitly marked sRGB, what will it do? |
Shouldn’t the proper chain of rendering sRGB subtitles with HDR video be “convert subtitles to the same HDR as used by the video exactly per the HDR spec; then blend them onto the video; then convert the resulting HDR for display in any way you like”? Is there an actual problem with this—e. g. that there are video colours that are impossible to match in sRGB should the subtitles want this—or is this an acceptable solution? |
Just to clarify: If you say "sRGB subtitles", do you mean subtitles that the subtitle author originally created for an SDR video? Or do you also mean subtitles that were created directly for the HDR video? |
Er… Isn’t sRGB an absolute term? RGB using the sRGB/BT.709 gamut compressed with the sRGB transfer function. I’m guessing you’re calling this SDR, but I’m not sure. Edit: Although I don’t suppose sRGB defines absolute nits, does it? So not quite that absolute. Is this a problem? |
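For concreteness, a sketch of the standard piecewise sRGB transfer function being referred to here. Note that it defines only *relative* luminance, which is the point of the “not quite that absolute” remark:

```python
def srgb_eotf(v: float) -> float:
    """sRGB-encoded value (0..1) -> relative linear light (0..1).

    Relative: the result is a fraction of the display's white point;
    the curve itself does not pin down how many nits that white is.
    """
    if v <= 0.04045:
        return v / 12.92                  # linear segment near black
    return ((v + 0.055) / 1.055) ** 2.4   # power segment
```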
Ah yes, you're right. Ok, let me think through a couple possible scenarios that come to my mind: Content options: Subtitle options: And for HDR playback, the media player could either: That makes 30 possible combinations. Ouch. Let's think about a couple "interesting" combinations:
Makes sense? |
If HDR really can’t be converted to/from SDR in a well-defined, standard way, then what do we stand to gain from adding this header at all? I may be wrong, but I think currently renderers blend ASS subtitles onto video in the video’s native gamma-compressed RGB colour space. If we just want HDR subtitles to work on HDR video, then we don’t need to change anything: they will be blended onto the video in HDR. (…Wait. When you say blend in HDR, do you mean the HDR-compressed RGB or the linear RGB? The compressed one, right?) If, as it sounds, we can’t make SDR+HDR combinations work anyway, then is it even worth trying? How better of a result can a renderer achieve by guessing at the brightness than by misrendering SDR as HDR? Would it even be better, or would it be better to make it so obviously wrong that nobody releases such a file in the first place? (Although someone might try to combine an external subtitle file with a different video file, but that’s prone to error anyway.) |
Good points. You're right that due to the difference between relative and absolute transfer functions, we can't ever get mixed HDR/SDR combinations perfect. However, it's not that hard to produce reasonable results. For example, peak white is usually considered to be around 100 nits, so if we wanted to convert SDR subtitles to HDR, we could aim for around 100 nits. It wouldn't produce perfect results, but it should be more than acceptable. Of course, color/luminance-matched subtitles wouldn't be perfectly matched anymore.
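A sketch of what "aim for around 100 nits" could look like in practice, assuming the HDR target is PQ (ST 2084): run the desired luminance through the PQ inverse EOTF to find the code value at which SDR white should be blended. This is my own illustration, not a quote of any renderer's actual method.

```python
# Published ST 2084 constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_inverse_eotf(nits: float) -> float:
    """Absolute luminance in nits -> PQ code value (0..1)."""
    y = (nits / 10000.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2

# Naively reusing the SDR code value 1.0 would ask for 10,000 nits;
# re-targeting SDR white to ~100 nits lands at roughly half the PQ
# code range instead.
sdr_white_in_pq = pq_inverse_eotf(100.0)
```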
The exact way video renderers blend subtitles is not standardized; each video renderer can decide for itself. There is an argument for blending subtitles late in the processing chain, because if video upscaling is involved, rendering the subtitles at the output resolution will result in nicer anti-aliasing quality. Anyway, I don't think we need to define exactly what a video renderer should do. I'd just like to make it possible for a subtitle file to specify which exact video format (matrix + transfer) it was made for, so I could try my best to render the subtitles as near to the subtitle author's intention as I possibly can.

E.g. consider the following situation: A 1080p Blu-Ray is released. A fansubber spends many, many hours on making a perfect subtitle for this release. Later a 4K HDR Blu-Ray of the same movie is released. Now a user might want to simply reuse the same subtitle file. Ok, colors might not match perfectly, but wouldn't it be nice if it still worked "ok"? If the video renderer naively blended the subtitles in the native transfer-function-compressed RGB color space, then in this situation the subtitles would be as bright as staring into the sun, because a white SDR pixel (which should be around 100 nits) becomes 10,000 nits bright if rendered naively in HDR. If we add a header field to the subtitle file, then the video renderer can detect an HDR/SDR mismatch and correct for it. Not perfectly, maybe, but reasonably. So why not allow that?
Both are possible, but I think it's usually done in compressed RGB. |
P.S: Just to clarify: I believe most pixels in a subtitle are either totally opaque or totally transparent. For both of these it doesn't matter if we blend in linear RGB or compressed RGB. It should only matter for half-transparent pixels, which are probably only the borders of each letter. There will be slight differences there in linear vs compressed RGB, but I don't think any user would notice, so I think we can safely ignore that. |
Keep in mind, btw, that 1080p HDR exists, as does 4K SDR, and BT2020 with Gamma (though I'm not aware of any real content with BT709 and PQ). So we will want to make sure to specify colorspace separately from transfer function.
For the record, the only HDR release of anime I'm aware of (Your Name), apart from being a poor upscale from 1080, was manually color-corrected from BT709-space RGB gamma to BT2020 PQ per-scene, so even with perfect headers, tone matching, etc., this wouldn't match. But I get your point in the general case. |
For text, true, good point. (Although in some circumstances the difference is noticeable—but ASS is doing it the wrong way anyway, blending in compressed RGB. I remember seeing discussions about Linux desktop environments rendering text in compressed RGB and how this was making some letters noticeably thicker than others. But this probably doesn’t happen often in ASS, because it tends to have larger font sizes.) But ASS also contains explicit vector drawings, especially in the crazy awesome subtitle effects that we’re dealing with here. And a good fraction of those is semitransparent, and blending them in linear RGB would be rather different from compressed. There are also animated fades, and they are often synchronized to compressed-RGB fades in the video. There is also blur; I don’t know how bad/different linear-RGB blur would look in general, but to complicate matters further, occasionally complex structures are built out of many blurred pieces, which I imagine would fall apart if rendered in linear RGB.
Note that this doesn’t affect the RGB colour space unless the renderer performs gamut correction or HDR conversion. This hasn’t been a big problem until now because gamut correction happens rarely [in anime] and usually isn’t terribly visible, but with HDR on the horizon it seems this will change. Now is as good a time as any to define that subtitles should have the same colours as if they were blended at any particular point in the rendering chain. (By the way, rendering/blending subtitles after upscaling is only obviously good for simple text: people make some crazy effects—and even some not so crazy, e. g. gradients or karaoke—pixel-by-pixel or pixel-row-by-pixel-row, which breaks after upscaling by a fractional factor because the pixel[ row]s start overlapping. But ASS currently provides no reliable way for a renderer to distinguish between those and normal text, so renderers simply choose to optimize for one or the other and display all subtitles after or before scaling.)
This seems like a very good intention. But if you don’t mind, first I’d just like to get us on the same page regarding the “…which video format subtitles are made for…” part. This assumes that subtitles are made differently for different video formats; but why? Is the reason the fact that HDR has absolute luminance and hence we need at least one option with absolute luminance in ASS? Is this true? Is there anything else? |
@rcombs, fully agreed! @astiob, fair enough. I've never even thought about blending in linear vs compressed RGB making a noticeable difference. I'm not completely sure what we could do about that, though? I think 99% of all renderers out there blend in compressed RGB, so probably we should consider that the rule?
If we want perfect color matching for an SDR movie, then we need to author the subtitle for that exact video encoding. If we want perfect color matching for an HDR movie, then there's no way to achieve that with an SDR subtitle, because (as explained before) the HDR -> SDR conversion is not standardized. So if we want to support perfect subtitle color matching for HDR movies, then there's no other way to achieve it technically than to create an HDR subtitle which is then blended onto the HDR video before modifying the transfer function or gamut in any way. |
Personally I'm strongly against that. Nonlinear blending is physically incorrect, ugly and should never be used. It's especially noticeable in blur and alpha blending. For example, correct linear blur (left) vs. broken as usual (right): While we cannot do anything about legacy broken by design standards (i. e. switch \blur to gamma-correct one), any new specification (HDR/ASSv5) should aim to fix that bug. |
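A toy numeric illustration of the difference being shown (my own sketch, assuming a pure 2.2 power curve rather than any particular standard): blending 50% white over black directly on gamma-encoded values lands noticeably darker than blending in linear light and re-encoding.

```python
GAMMA = 2.2  # simplifying assumption; real curves are piecewise

def blend_gamma(a: float, b: float, alpha: float) -> float:
    """Blend gamma-encoded values directly (the legacy ASS behaviour)."""
    return alpha * a + (1.0 - alpha) * b

def blend_linear(a: float, b: float, alpha: float) -> float:
    """Linearize, blend, re-encode (the physically correct way)."""
    lin = alpha * a ** GAMMA + (1.0 - alpha) * b ** GAMMA
    return lin ** (1.0 / GAMMA)

# 50% white over black:
dark = blend_gamma(1.0, 0.0, 0.5)    # 0.5
light = blend_linear(1.0, 0.0, 0.5)  # ~0.73 -- visibly brighter
```

The same gap is what makes gamma-space blur and semitransparent edges look too dark.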
I'd be fine with "requiring" linear blending in a future ASS spec version, or in a whole new subtitle format. But I don't think tying linear blending to a tiny new HDR information header field (as discussed here) makes a lot of sense. Just my personal opinion, of course. But just to make sure I understand your last sentence correctly: Are you saying that subtitles based on the current ASS spec need to be blended in non-linear light to produce the desired look? |
As far as I understand, GPUs simply don't have sRGB-like HDR texture/framebuffer formats. Not that there is a need for that: usual floating-point formats are both linear and have improved resolution in the dark. So in the case of HDR, linear blending is the default.
Isn't that the way VSFilter does it? So long as VSFilter = ASS spec, we have to incorrectly blend in sRGB.
Yeah, I think so.
Yep.
You’re quite right, but here we’re talking about ASS, not a new specification.
I’m not sure how this is relevant to ASS. Besides, you can stuff anything you like in a texture and lie to the GPU about the linearity of it if all you want it to do is blend two things together.
Only because HDR has absolute luminance and SDR doesn’t, right? Sorry I’m repeating this over and over, as our further actions probably don’t depend on the answer, but I’m just trying to understand this myself.

Other than that, I think you’ve reaffirmed the impression I got earlier. Fundamentally, we need a way (a) to fix the compression curve shape and (b) to fix the absolute luminance scale.

For the curve shape, we could declare that all ASS subtitles henceforth use the sRGB curve. Although it now occurs to me that this would actually break all existing subtitles unless media players treat video as sRGB. @wm4, I didn’t realize this initially when you proposed that ASS is/should be sRGB, but isn’t this a problem? Existing subtitles are made to match video when rendered in the same nonlinear RGB space as the video itself, and almost no video is sRGB. (Really, it’s even worse, as mostly it’s not clear at all what transfer curve any particular video is supposed to use for decoding. @madshi, please correct me if I’m wrong, but even when the video is explicitly tagged with e. g. the BT.709 transfer curve, it only really means that this curve was used by the camera, whereas the expected viewing conditions might differ—if the tag is even true in the first place.)

Anyway, at least theoretically we could declare that all ASS subtitles use the sRGB curve and hope to get away with wreaking not too much havoc. It seems implausible to declare that all ASS has a fixed luminance scale though. And even if we do, it sounds like combining 8-bit sRGB with any fixed luminance scale is going to be grossly insufficient for allowing colour matching with HDR video, as either the black-to-white luminance range will be too low (hey, that’s why HDR exists) or, if we extend the range to cover all or even most of HDR, the 8-bit precision will be too shallow to allow even remotely precise colour control (and existing subtitle colours will be significantly shifted as well).
So to sum up, it seems to me that we do physically need at least one new setting in ASS for switching to HDR, and there’s no way around it. It seems like it might be enough to have a simple Boolean setting switching between a specific, constant flavour of HDR and SDR. This would make HDR subtitles have perfectly well-defined colour spaces and colours (we could even ignore the YCbCr Matrix header for them), but it would not solve the problem that SDR currently has that the transfer curve and RGB gamut are unspecified. Maybe we don’t want to solve this, either, and prefer the simplicity of a single Boolean flag while accepting that the status quo will remain for SDR. |
I guess it’s possible. HDR is (from the sounds of it) so different from traditional gamma functions that blending in HDR-compressed coordinates feels almost like a third kind of blending in addition to gamma and linear. Like, I’m sort of used to how gamma-blending looks and how linear blending looks and I know what I can expect to see if I blend two things together with a certain alpha in one or the other. I have no idea what the result will look like in HDR-compressed coordinates. So maybe it’s not quite wise to simply blend in HDR coordinates because it’ll surprise people, but I don’t know. The colour values themselves will also be so different from usual that people will have to adjust anyway. And they’ll have to adjust to the new blending as well. Since blending will look different anyway, we may as well choose the new blending to be linear and have people adjust to that. From a different perspective though, complex effects in ASS are most often done to imitate special effects present in the source video. And I don’t know about HDR, but I reckon SDR source videos tend to have effects such as fades rendered in gamma-compressed colour. So it’s simply easier for the subbers if ASS uses gamma-compressed colour. If the same is true for HDR about HDR-compressed colour, then it would be easier as well to use HDR-compressed blending in HDR ASS. …But what if the video uses a different HDR space from the subtitles? |
@MrSmile, the transfer function doesn't have much (if any) effect on the texture formats the video renderer uses. And actually, if I want to output HDR to the display, I'm outputting 10-bit integer, not floating point. It's also theoretically possible to output linear-light floating point to the GPU, but that would require the GPU driver to convert this back to 10-bit/12-bit non-linear integer behind my back, because that's the only format HDMI supports. And my experience with relying on GPU drivers to do conversions "correctly" is not very good. So I much prefer doing everything myself, sending integer textures to the GPU for output.

@astiob, one curve being relative and the other absolute is certainly a pretty dramatic difference. But there's more: an HDR encoding has pixels in it which are much brighter than what any SDR display can handle. So converting an HDR encoding to SDR is automatically lossy. You can either clip all the HDR luminance values to SDR, or you can try to use some kind of compression curve. In any case, you have to actually modify the pixel luminance values in linear light in a non-linear fashion. Clipping looks really ugly, so any decent HDR -> SDR conversion routine applies some sort of compression curve. But the exact compression curve is not standardized. E.g. some displays modify every pixel's luminance, compressing dark image areas only very slightly and compressing highlights a lot. Other displays don't modify dark pixels at all, but start compressing only somewhere in middle gray/luminance. So the key factor is that an HDR -> SDR conversion actually modifies the pixel values in a non-trivial, non-linear way that is lossy and can't be reverted or predicted. As a result, if you try to use an SDR subtitle track with an HDR video, you're in trouble. If you blend in HDR compressed RGB, it will look totally wrong.
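To make the "everybody does it differently" point concrete, here is one hypothetical knee-style compression curve of the kind described. This is purely illustrative; no renderer or TV is claimed to use exactly this:

```python
def tone_map(nits: float, knee: float = 200.0, peak: float = 400.0) -> float:
    """Compress absolute HDR luminance into a `peak`-nit display.

    Dark pixels pass through unchanged; everything above `knee` is
    rolled off asymptotically toward `peak`. Another vendor might put
    the knee elsewhere, or compress the whole range slightly -- which
    is why the output of HDR -> SDR conversion is unpredictable.
    """
    if nits <= knee:
        return nits
    span = peak - knee
    excess = nits - knee
    return knee + span * excess / (excess + span)

# tone_map(100) == 100, but tone_map(10000) stays just under 400 nits.
```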
If you blend after having converted the video from HDR to SDR, it will still not be perfect, because everybody does HDR -> SDR conversion differently, so the result is unpredictable. Which means, technically the only way to achieve color matching with HDR videos is to use a subtitle track which is custom-tailored to the HDR video and is blended onto the HDR video before it's converted to SDR. Hope that makes sense?

About SDR transfer functions: IIRC 99.9% of all SDR videos use BT.709/BT.601 as the transfer function (the transfer function is the same for BT.709 and BT.601, and also the same for PAL). sRGB is not usually used for normal SDR videos. So we should probably consider all current ASS subtitles to be BT.709. It's still debatable whether you should invert BT.709/BT.601 to get to linear light or use a simple power curve, because the linear segment in the BT.709 transfer function is meant to suppress camera noise, so if you invert that, you undo the purpose of the BT.709 transfer function. But as I said, this is debatable. In any case, VSFilter, libass and most video renderers simply blend the ASS subtitles onto the SDR video in non-linear light, so currently ASS subtitles simply share the transfer function of the SDR video, whatever it is.

True, a simple HDR=true/false header flag would suffice if we consider SMPTE ST 2084 the only important HDR format. However, there are several competing HDR formats, so I wonder if it wouldn't be "wiser" to use a transfer function field. We could limit it to "SDR" and "PQ" (or "2084") for the time being, but that would give us the chance to extend it later if need be, without having to invent another totally new field. I'd suggest using "SDR" for current ASS subtitles, instead of specifying sRGB or BT.709 for the transfer function, to clarify that current ASS subtitles are simply meant to be blended onto the SDR video as encoded, without any funny transfer function modifications. Thoughts?
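For reference, a sketch of the BT.709 OETF being discussed. The linear segment near black is the part whose inversion is "debatable", since it exists to suppress camera noise rather than to model display gamma:

```python
def bt709_oetf(lin: float) -> float:
    """Scene-linear light (0..1) -> BT.709-encoded value (0..1)."""
    if lin < 0.018:
        return 4.5 * lin                # linear noise-suppression segment
    return 1.099 * lin ** 0.45 - 0.099  # 0.45 power segment
```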
P.S: About blending: I'm fine with whatever you guys think is best. |
All seems clear and fair to me. Thanks for the explanations! Out of interest, is the HDR-to-SDR conversion usually dynamic, varying with the actual dynamic range within the frame? E. g. if all absolute luminances actually used in a video are supported by the display, it doesn’t need to do any compression, right? But it can’t know in advance if this is the case, so does it adapt to what’s currently on screen and change the compression curve from frame to frame?
I was wondering why (or if) it was impossible to actually standardize this. It seems that the only physical obstacle is that screens have different luminance. I guess the conversion process theoretically could even be standardized, but it would still be parametrized by the screen’s physical properties, so it would not make an abstract purely mathematical/software conversion possible.
I remember reading a forum thread a while back that you might or might not have participated in where people were discussing this from the perspective that the decoding curve should match what the reference monitors are configured for at the production studio, but different studios configure their monitors differently. That time, it seemed to me that a pure 2.35 gamma was deemed the most common for Western films (based on accounts of people professionally calibrating said monitors), although unfortunately, nothing was said about anime. I’ve also read (maybe in the same thread, but I think elsewhere as well) that there is usually a nontrivial expected “end-to-end” gamma curve—i. e. the display/projector output should be darker than the camera input, because films are viewed in dark rooms on bounded screens whereas they are shot in conditions where bright light is everywhere around, and background light affects colour perception.
Wouldn’t it be possible to convert from one to another in a well-defined way? So technically we could just have one fixed format in ASS and ask the renderer to convert as appropriate if necessary. But indeed it may be wiser to allow choosing the exact format, to avoid problems when the video HDR has an even higher dynamic range than ASS and the fixed ASS-HDR would yet again be insufficient to cover all video colours.
Sounds good to me. By the way, are the different HDR formats significantly different? Like, there’s not a terrible lot of difference between the sRGB, 2.2, 2.35 and even BT.709 curves, and if you mistakenly apply the sRGB curve to BT.709 subtitles, the colours will look somewhat wrong but not bright as sun or such; no worse than if you blend the BT.709 subtitles on HDR-converted-to-SDR video. I’m not sure this would be useful, but could we say that subtitles marked as HDR always use the same HDR space as the video does much like SDR subtitles use the same SDR space as the video does? Regarding blending, it’s not clear how HDR-compressed blending should work if both the video and the subtitles are HDR but different sorts. Or if the video is SDR. |
Haha, if you're not careful, you'll become an HDR expert in no time... :) The current UHD Blu-Ray HDR spec is usually named HDR10 (for HDR 10bit), and it has static metadata, which means that you get information about the max luminance in the whole movie, but not per scene, or even per frame. Then there's Dolby Vision, which uses exactly the same transfer function, but is 12bit+ instead of 10bit and supports specifying different max luminance values per scene or even per frame. And then there's HDR10+, which was just announced at CES in January; it's HDR10 with dynamic metadata. Currently most TVs don't adjust to the luminance of each scene/frame. But both madVR and mpv have the ability to do so (by measuring the luminance of each frame on the fly). And with HDR10+, soon some TVs will probably do that, too - but only with UHD Blu-Ray discs that are mastered with HDR10+ support.
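For reference, the "PQ" curve that HDR10, Dolby Vision and HDR10+ all share can be sketched in a few lines. The constants come from the published SMPTE ST 2084 spec; the helper names are my own:

```python
# SMPTE ST 2084 "PQ" transfer function, shared by HDR10 / Dolby Vision /
# HDR10+; only the container bit depth and metadata differ per format.
M1 = 2610 / 16384          # 0.1593017578125
M2 = 2523 / 4096 * 128     # 78.84375
C1 = 3424 / 4096           # 0.8359375
C2 = 2413 / 4096 * 32      # 18.8515625
C3 = 2392 / 4096 * 32      # 18.6875

def pq_eotf(signal: float) -> float:
    """Map a normalized PQ signal [0,1] to absolute luminance in cd/m2 (nits)."""
    p = signal ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

def pq_inverse_eotf(nits: float) -> float:
    """Map absolute luminance in nits back to a normalized PQ signal [0,1]."""
    y = (nits / 10000.0) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2
```

Note how `pq_eotf(1.0)` is 10000 nits while SDR reference white (100 nits) already sits at roughly half the PQ code range — which is why mixing up PQ and gamma content looks so drastically wrong.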
It should be possible, but nobody did it. IIRC the CE companies wanted to be able to choose their own algos as a means to separate from the competition. Anyway, there are some recommendations available on how it could be done, but practically, everybody does it differently. And of course you're right that the target brightness of each display plays a crucial role, too, so even when using the same curve, if the target brightness is different, so will be the curve parameters.
Yes, I remember that thread, too. Not sure if I participated or just read it silently.
Good question. I know that HDR10, Dolby Vision and HDR10+ all use exactly the same transfer function, so if we don't call it "HDR10" but e.g. "PQ" or "2084", then we have those 3 covered. However, there's HLG, which I haven't fully studied yet. I think (but I'm not 100% sure) that HLG can be displayed as-is on an SDR display, so the transfer function is somewhat similar to normal BT.709. But if a TV actually knows the content is HLG, then by doing some math it can somehow reconstruct some HDR information, but as I said, I haven't studied this one yet. I think SDR subtitles might work for HLG, but I don't know for sure. It might not be a bad idea to have an option to tag subtitles for HLG, just to be safe, but we might not need it. Then I've read a while ago that there are at least 2 more HDR formats which I know absolutely nothing about. There's a good chance they're completely different once more. About blending: Although the transfer functions for SDR and HDR are very different, I think both to some extent approximate human brightness perception (but the HDR TF does it much better than the SDR one). Which means there's a good chance that blending in non-linear light might look quite similar with both SDR and HDR. But I don't know for sure. |
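For anyone curious, the "somewhat SDR-compatible" shape of HLG is easy to see in code. This is a sketch of the ARIB STD-B67 / BT.2100 HLG OETF from the published constants (function name is my own): the lower half of the signal range is a simple square-root (gamma-like) curve, and only the upper half switches to a logarithmic segment for highlights.

```python
import math

A = 0.17883277
B = 0.28466892        # equals 1 - 4*A
C = 0.55991073        # equals 0.5 - A*ln(4*A)

def hlg_oetf(e: float) -> float:
    """Map normalized scene linear light [0,1] to an HLG signal [0,1]."""
    if e <= 1 / 12:
        return math.sqrt(3 * e)   # gamma-like segment, roughly SDR-shaped
    return A * math.log(12 * e - B) + C   # log segment for highlights
```

The constants are chosen so the two segments join continuously at e = 1/12 (signal 0.5) and the curve hits exactly 1.0 at full scene light.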
Man this is such a load of bullshit. I really want to see how you're going to use ASS to perfectly match some video background in HDR and 4K. Apart from looking shit nobody will actually try this because there are no applications for this and won't be. If you really manage to achieve this with ASS, it will probably be shit slow and inefficient because 4K will have a lot of detail that is hard and inefficient to reproduce with ASS. Besides, even for "traditional" video the uses are declining because we're not getting any reports about stuff like this anymore. You're trying to come up with a solution for a problem that doesn't exist. And you're doing the same bullshit that caused the current idiotic colormatrix header, which I was against when it was introduced, and which I'm against even more now. I'll block any attempts to add anything like this as far as I can. |
This is mostly for Anime, where there often isn't a lot of "detail" or "texture" that needs to be reproduced. Making this work is actually very easy: Subtitles made for HDR should be blended onto the video before modifying the transfer curve, before tone mapping and before gamut mapping. If done like that, there should be no problem getting perfect color and luminance matching, and there's no reason for it to be slow, either, because the processing isn't any more complicated than it is for SDR. The key is doing the blending at the right moment. There's nothing more to it.
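A toy model of the ordering argument (my own illustration, not any renderer's actual pipeline): if a subtitle pixel was authored to exactly match the HDR video pixel behind it, any tone curve applied *after* blending maps both through the same function, so they stay identical; tone mapping the video first and blending SDR values afterwards breaks the match whenever the renderer's curve differs from the author's.

```python
def tonemap(nits: float, peak: float = 1000.0) -> float:
    """Some arbitrary tone curve (Reinhard-style); the exact shape is irrelevant."""
    return nits / (nits + peak * 0.1)

def blend(bg: float, fg: float, alpha: float) -> float:
    return (1 - alpha) * bg + alpha * fg

def blend_then_tonemap(video: float, sub: float, alpha: float) -> float:
    # the order being advocated above: composite first, tone map second
    return tonemap(blend(video, sub, alpha))

video_pixel = 400.0   # nits, from the HDR video
sub_pixel = 400.0     # authored to match the video exactly
out = blend_then_tonemap(video_pixel, sub_pixel, alpha=1.0)
assert out == tonemap(video_pixel)   # perfect match survives any tone curve
```

The key property is that the match holds for *every* choice of `tonemap`, which is exactly why the blending moment, not the curve, is what matters.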
How are decisions made for libass? Dictatorship? Or do contributors vote? Or how does it work? |
Note that wm4 is also the lead developer of mpv, so if he really hates this, then mpv might never get support for it even if we add it to libass. Of course, libass is also used by other players, but I reckon the only one with significant market share is VLC, and I have no idea how much VLC cares about colour accuracy at all. Then again, if they want to support HDR video, then they’ll have to care at least a bit. Question for @wm4: does/will mpv support HDR video? How do you (plan to) handle simple ASS with HDR video? When subtitles are handled by the video output module, you (can) probably blend them in screen-native gamma-compressed RGB, which is easy enough. What about
I fail to see how or why this should be any different from matching colours to SDR video as is done now. And the whole point of matching colours is to make it not look like shit. I’m not sure why you need a special program to pick a colour or why you’re so sure that nobody will ever make one. If HDR video takes off (I have no idea), then I imagine people will want and make tools to work with it. @madshi, would 8-bit be enough for precise colour matching in HDR? If not, then it might indeed look like shit as wm4 said and end up unused. I do agree that it isn’t terribly wise to add a feature that will remain unused.
Are you saying that we are getting fewer bug reports about signs in libass, but not because we’ve fixed a lot of stuff but because nobody uses them any more? :-) It might be interesting to download the last n subtitle files off Nyaa and count how many use colour-matched signs, and compare this to n files from some years ago.
Once again, no. The colormatrix header is indeed idiotic if you will, because it ties RGB to a YUV matrix. This, on the other hand, simply specifies a previously unspecified parameter of RGB and is exactly like adding a colour profile to a PNG image or a colour space tag to an H.264 video. Also, what caused the colormatrix header was the fact that VSFilter rendered ASS colours in a haphazard manner, converting them to YUV using an underspecified conversion. This is precisely what we are trying to avoid when HDR comes along. If we do nothing at all now, the rendering of ASS on HDR video will likely end up looking very different in different players and player configurations, and all of the different looks will be ridiculously wrong. At the very least, even if we agree that nobody will ever want to make HDR subtitles and ASS should stay SDR forever, we should make this agreement explicit and define guidelines (if not an exact procedure) for how ASS should be rendered on HDR video. (Maybe SDR as well while we’re at it, even if just to codify the existing practice.) |
Of course it would be nice to have more than 8bit precision, but I don't think we can introduce that without breaking compatibility, or can we? If we stick to 8bit with proper rounding, the max error we can get is 0.5 steps in 8bit, which is not nice, but I think it's going to be barely visible, if at all. Usually having more than 8bit is most important for smooth gradients. In HDR one 8bit step will be more visible than in SDR, but I still don't think it will be a real problem. Of course I haven't tested it, so this is only me guessing...

But even if we can't achieve perfection, there's still the issue that subtitles created for SDR will look totally incorrect when blended onto untouched HDR video. And subtitles created for HDR will look totally incorrect when blended onto untouched SDR video. So for that reason alone I think we either have to add a transfer function info field, or alternatively completely disallow HDR subtitles. But even if we wanted to - how can we effectively disallow HDR subtitles? If media players / renderers currently simply blend subtitles onto untouched video, then if a user creates a subtitle for an HDR video, it will automatically become an "HDR subtitle", and there's not really any way we could possibly stop that. So isn't it much better to properly flag HDR subtitles, instead of giving up and accepting chaos?

If we add an information field which mpv ends up ignoring, then mpv users are no worse off than if we don't do anything. But players / renderers which care still have a chance to provide a better experience. So IMHO mpv shouldn't dictate our decision. Besides, wm4 didn't like the "YCbCr Matrix" field, either, but in the end mpv now supports it, I think? So if HDR subtitles ever become a reality, I believe mpv will have no other choice than to support them, too. If HDR subtitles never become mainstream, then there's no big harm in adding a new header information field, which in the worst case would then simply be unused.
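A quick back-of-the-envelope check of the 0.5-step claim (my own sketch): with round-to-nearest, quantizing a normalized value to 8 bits is off by at most 0.5/255 ≈ 0.2% of full range.

```python
# Worst-case quantization error of 8-bit rounding, in normalized [0,1] units.
def quantize8(x: float) -> int:
    return round(x * 255)

max_err = max(abs(quantize8(i / 1000) / 255 - i / 1000) for i in range(1001))
assert max_err <= 0.5 / 255 + 1e-12   # never worse than half a code step
```

Whether half a code step is visible of course depends on where on the (much steeper) HDR curve it lands, which is exactly the open question above.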
Just my 2 cents, of course. |
By explicitly setting all required information in the script, there is no need for the libass library user to feed it with information extracted from the video decoder and be burdened with all those extra implementation details and coding. The subtitle authoring tool is capable of doing that for the user. And since we dictate that the script should match the video, there should in practice be no mismatches, unless the script is badly tagged or combined with a different video. And if the required information is just a transfer function, then that is the only thing that should be added.
That is exactly what we have been saying all along, but I get a feeling that people are constantly misunderstanding it. |
I may indeed be misunderstanding something.
But libass doesn’t need to know video colorimetry in any case. Quite the opposite; if the script is tagged, then libass must add an API through which the video player can query this information. The less the script is tagged, the less information there is to speak of. libass currently doesn’t perform any colour correction, and you seem to concur that this should continue to be the video renderer’s job.
But right now you seem to be saying the opposite: that the script should contain explicit tagging, a full copy of the video’s colorimetry tags. This is an additional burden and a door for additional mismatches. You may expect the tags to match most of the time, but you still need to define what happens when they don’t.
To be fair, with enough cynicism, we could just try to assume the values typical for anime and ignore other possibilities, because that’s probably almost all content with heavily styled ASS typesetting. But even then, ambiguities remain: |
do you think we will ever run into video files (I'm only talking about video files here, not about subtitles) where the video decoding matrix is either BT.601 or BT.709 and where the gamut is BT.2020 at the same time? Or where the video decoding matrix is BT.2020, but the gamut is either BT.601 or BT.709 at the same time? Depending on the answer to that question, I may be in favor of or against adding a gamut header field. |
No, I am not saying that. I am saying that the script should only contain whatever information it needs to render the subtitles without any knowledge about the video. If that required information is only the transfer function, then that is the only explicit tagging that needs to be added to the script. |
I… hope not, although you never know. At the very least, I’m not aware of any legitimate situation where such a combination is used or where I’d want to use it myself (although my knowledge is likely out of date). Are you perhaps thinking that the various 601 and 709 gamuts are close enough that you could pick an arbitrary one but the 2020 one far enough that you want it separated? I guess I could get behind that, as long as this gamut choice is restricted to SDR-on-HDR situations and doesn’t force SDR-on-SDR into video/subtitle gamut mismatches.
Oh, OK. But what does “render” mean in this context?
By the way, which exact part of HDR10+ (and others if there are others) is dynamic? I see above that it uses the PQ transfer function, so the dynamic part is… the primaries? Or some tone-mapping metadata? Is there even such a thing? If there is, then fully-specified video-independent rendering of subtitles would also need that. |
Option 1. So no info needed for libass itself. https://en.wikipedia.org/wiki/HDR10%2B |
Again, I didn't say I like this decision. I can recognize it for what it was, and at the time it was the right assessment. In practice you don't need anything more than
Exactly, "will" change. And after this change it can be reevaluated, but don't apply current or future state of things retroactively to 2010 decisions. As we stand now, we are still uncertain if this header will even be used/extended, so don't blame wm4 on that front. I don't even know why we argue about this historic choice, it was correct for almost all if not all content, no need to dive further into that.
In short, bt.1886 is a transfer function that attempts to fix this situation and was designed to match older content. Of course it is not perfect; you cannot go back in time and make everyone target it directly. But it was the industry's way of standardizing the SDR transfer function, and everything should be using it, because it is the best guess we have right now.

With that out of the way, I agree with what you are saying. I won't quote more parts of it, but let me add some commentary. We have two main, seemingly contradicting objectives. A is what we have now: we expect RGB values of ASS subtitles to be applied to the RGB video. The implicit assumption is that the YUV->RGB conversion done by the player will be the same as the one done during mastering of the subtitles. We don't care about the transfer function because blending is done in gamma light, so whatever it is, the RGB values of ASS should be the same. Or, in the case of older renderers, we blend in YUV, but that is just converting RGB->YUV, which for SDR is easy and produces the same results.

The above approach is the reason why B is an extension of subtitle metadata to define what RGB values actually mean; we already have

Now the concern is that extending the ASS header with real metadata will introduce possible mismatches and remove its abstract nature. And that's true: ASS subtitles would no longer be strictly tied to the video track. In practice, though, I don't see this as a big problem, simply because for SDR we would tag it with the SDR (bt.1886) transfer function, which should be exactly the same as what is inferred for the video track in any media player / video renderer. Of course, we can avoid naming it "bt.1886" and just go with "SDR", which would basically mean "some SDR transfer". Again, in practice, when converting to HDR it would probably mean bt.1886 internally anyway. In SDR mode we wouldn't use the transfer function at all. That of course makes it impossible to share subtitles between different SDR files, but if we don't want to be specific, all SDR would just be SDR.
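For concreteness, the bt.1886 reference EOTF being referenced is simple enough to sketch (parameter names are my own). It is parameterized by the display's white and black luminance (Lw, Lb); with an ideal zero black it reduces to a pure 2.4 power curve.

```python
# ITU-R BT.1886 reference EOTF (Annex 1 form): L = a * max(V + b, 0) ** 2.4
def bt1886_eotf(v: float, lw: float = 100.0, lb: float = 0.0) -> float:
    """Map a normalized signal V in [0,1] to luminance in cd/m2."""
    g = 2.4
    a = (lw ** (1 / g) - lb ** (1 / g)) ** g       # user gain (white scaling)
    b = lb ** (1 / g) / (lw ** (1 / g) - lb ** (1 / g))  # black lift
    return a * max(v + b, 0.0) ** g
```

With lb = 0 this is exactly `lw * v ** 2.4`, which is the "pure power curve" people usually mean when they say bt.1886; real displays with nonzero black get a slightly lifted, flatter toe.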
I would be specific, because why not, but this means that if the user overrides the video's transfer function during playback, they would also have to do it for the subtitle track, assuming of course the two were matching to begin with. Having said all that, the question is: what are the real issues of tagging ASS files? I don't think that in SDR mode it will be the big deal it is made out to be. Of course, this means breaking with the "always the video's colorspace" mindset, because with HDR, most subtitles will probably be done in SDR, except in some extreme cases. I don't know what I wanted to say anymore...
In short (and simplified), dynamic metadata can specify minimum and maximum luminance per scene, which greatly helps with tone mapping. Say you have 0-1000 nits static metadata for the full movie, but compressing all scenes with a constant factor is not good. Imagine you have a dark scene with a peak of, say, 120 nits; if you compress it with a 1000-nit peak value, you lose a lot of range that could be used to better represent that scene. In practice, if you have a 300-nit screen, you can display such a scene in full, without any tone mapping, because it is within the display's range. I initially mentioned full HDR metadata too, but since it wouldn't be perfect anyway, we don't have to go crazy and can just clip to the current video frame's luminance: 203 nits in SDR and "something" in HDR. |
Yes. I think we need to be aware that we have 2 different goals we want to achieve: 1) perfect color matching when the subtitle is used with the exact video it was created for, and 2) "good enough" rendering when the subtitle is later used with a different video than it was created for.
In case of 1), all we need to do is for the video renderer to use the same YCbCr -> RGB matrix that Aegisub did, and we're already done. There's no need to perform any gamut conversion here, because Aegisub never does any gamut conversion, either. If we actually did try to do gamut conversion, it would actually destroy color matching.

In case of 2), we've already given up on achieving perfect color matching, so there's no need here for gamut conversion, either, when talking about BT.601 <-> BT.709, because they're really not far away from each other. However, the situation is different when talking about BT.2020 because it's far more saturated than either BT.601 or BT.709. So as soon as either subtitle or video is BT.2020, but the other is not, we probably have to perform gamut conversion, because otherwise the subtitles will look either strongly oversaturated or strongly undersaturated.

So to sum up, I don't think gamut conversion is ever needed - except for when there's a mismatch in gamut between the video the subtitle was made for vs the video the subtitle is later used with and when that mismatch involves BT.2020.

Ok, now one could argue that we need a gamut ASS header field to catch this situation. But I was thinking that if BT.2020 videos always use both BT.2020 matrix and BT.2020 gamut, then we could get along without adding a gamut header field, because we could detect the mismatch by just looking at the YCbCrMatrix field. However, if there is a (reasonable) chance that we may see video files that mix BT.2020 gamut with BT.601/709 matrix, or vice versa, then we may have no choice but to add a gamut header field. Though, that would only ever be used for 2) above.

Thoughts? |
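To put numbers on the matrix-mismatch point, here is a small illustration of my own (full-range conversions for simplicity, standard BT.601 vs BT.709 luma coefficients): the same YCbCr triple decodes to visibly different RGB depending on which matrix the player assumes.

```python
# Generic KR/KB-parameterized YCbCr <-> RGB, full range, normalized [0,1].
def rgb_to_ycbcr(r, g, b, kr, kb):
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr, kr, kb):
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return r, g, b

BT601 = (0.299, 0.114)      # (KR, KB)
BT709 = (0.2126, 0.0722)

# Encode a saturated red with BT.601 (as e.g. classic VSFilter assumed)...
ycc = rgb_to_ycbcr(1.0, 0.0, 0.0, *BT601)
# ...and decode it with BT.709, as a mismatched player would: red overshoots
# past 1.0 and blue goes slightly negative.
r, g, b = ycbcr_to_rgb(*ycc, *BT709)
```

Decoding with the *same* matrix round-trips exactly, which is all that goal 1) needs — no gamut math involved.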
I personally don't think it is needed. |
I’m under the impression the consensus so far was that this is impossible (even if we created a completely new sub format) and that at most “it will be off, but not eye-gougingly bad, when paired with a different video” is achievable
mpv-player/mpv#13381 shows it already matters for SDR Also, I have more to say on the |
Is this a bad thing? In some scenarios masking colors may not match perfectly, but anything else will be close enough to look good for the user, which in itself would be improvement.
I was speaking in more generic terms. If we were to blend just after RGB conversion in gamma light, the transfer function doesn't matter. The issue above is caused more by how the libplacebo pipeline works. But it is possible to avoid this dependency. Either way, like I said, I'm in favor of defining things instead of leaving them undefined, so that issues like that do not happen regardless of possible renderer implementation details.
Yeah, we venture off-topic too much. But I don't see any real use case currently where mangled subtitle colors would need to be fixed by this header except bt.601. In all other cases there is no reason for RGB values to need conversion. |
Not being eye-gougingly terrible on a different video is not a bad thing. It’s just that you earlier were contrasting “A) seamlessly blend with the bundled video” with this “B)” as conflicting objectives, and correctly noted that A) requires fully matching the video properties. The thing is, A) is not a nice-to-have goal, but a hard requirement for typesetting and therefore for the ASS format. The question then is: can we add headers which fully preserve A), but at least somewhat improve on B), while
This to me seemingly implies there must be an easy way to gauge if it is the originally-bundled video and if so directly short-circuit to “matching video everywhere” (instead of, say, specifying all static and dynamic metadata of the video in sub headers, which would break both 2. and 3.). Importantly, this detection must not have false negatives, but a small amount of false positives is imho acceptable. The question then is: what value would be good to gauge this, and if different, what value(s) need to be provided to avoid “eye-gougingly bad”? |
You basically have three main scenarios:
New tag can indicate that 1 or 2 applies.
Just to confirm, this is what I (we) were proposing earlier: just “any SDR”, and explicitly allowing each player/renderer/user to pick their own preference and tweak it as they see fit.
That’s what worries me about specifying a very explicit transfer function for SDR subtitles: SDR-on-SDR must match the video’s transfer function, which (IMO even after the advent of BT.1886) may differ from whatever default we choose now due to user overrides, renderer capabilities or even future video standardization work, so if we declare that SDR subtitles always use BT.1886 and nothing else, this implicitly seems to break SDR-on-SDR matching. We could, of course, word the declaration very carefully and make it abundantly explicit that it applies only to SDR-on-HDR, but that makes the spec only more complicated. BTW, if we do mandate a specific transfer function for SDR-on-HDR, do we want sRGB or BT.1886, after all? I do believe they’re different. And my impression from madshi’s comments is that we want to pair it with RGB primaries from either BT.709 (matching sRGB) or BT.2020 (different from sRGB). Side note: technically, AVC/HEVC/H.273 do allow tagging several EOTFs, all of which differ from BT.1886. But I’ve never heard of this being used, of course. |
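A quick numeric comparison of the two candidate curves (my own sketch): the sRGB EOTF (linear toe plus a 2.4-power segment with offset) and the BT.1886 pure power 2.4 (ideal zero-black display) disagree noticeably in the midtones, so the choice does matter.

```python
def srgb_eotf(v: float) -> float:
    """IEC 61966-2-1 sRGB EOTF, normalized [0,1] in and out."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

def bt1886_zero_black(v: float) -> float:
    """BT.1886 with an ideal zero-black display: a pure 2.4 power curve."""
    return v ** 2.4

# At mid-gray the two curves already differ by more than 10% relative:
mid_srgb = srgb_eotf(0.5)          # about 0.214
mid_1886 = bt1886_zero_black(0.5)  # about 0.189
```

Small in absolute terms, but for "perfect" color matching of a subtitle against video, a >10% relative luminance error in midtones is well above the visibility threshold.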
Such a boolean toggle would allow simple dialogue-only subs to opt out of HDR (and colour mangling), but doesn’t help with the “make reused subs not eye-gougingly bad on different video” goal. Any HDR sub would set |
Extending on this, there seem to have been four goals brought up so far, motivated by or overlapping with HDR
Good summary, TheOneric. Why do you think we need more than 8bit, though? Do you really think we can't achieve perfect color matching using 8bit? Usually, the only problem of low bitdepth is banding in color gradients. I doubt that it would prevent "perfect" (to the human eye) color matching? |
Typesetting must blend perfectly into the video. Generally 8-bit will be enough, but strong gradients, glows, and complex textures will require >8-bit precision to blend smoothly into the footage. Fansubbers do not want the viewer to be able to distinguish good typesetting from the actual video. So 10 or 12-bit will be needed in the future, and the base space will have to offer the same coverage as the output space, if a common base space is chosen. |
I think we should first find a test case that clearly shows that 8bit is not enough before we make breaking changes to the way ASS works. Right now aren't we just speculating that 8bit might not be enough? I'd say it probably is enough, but it's hard to be 100% sure. Now don't get me wrong: I'm a big fan of using as high a bitdepth as possible, and I'm pretty angry with the UHD Blu-Ray creators for using only 10bit. However, adding support for more than 8bit to ASS would probably be a very big change and would break compatibility with probably all current software. So we should really be extra sure that it's really needed. |
It is already pretty clear that the hidden agenda is to try to push huge unnecessary format changes requiring all software to be rewritten... |
No. We need to find out whether basic changes are good enough or an absolute waste of time for everyone. If all typesetting is hideous in HDR, it is all pointless and a new format will be needed. I would advise starting with a custom mpv branch where ASS is blended in video space+range (like mpv at the moment?), and a test command line option to specify all parameters that we theorize can be useful to achieve:
Typesetters can then handcraft samples that can then be tested across various official exports of the same show (HDR+DV, SDR). Process has to be iterative with all parties involved, if the study outcome is "not possible to have good typesetting with HDR", then we need a new ASS version altogether, not a bandaid. |
That mpv test branch is a good suggestion. If a sub is tagged as
But these are all just variations of what has been previously suggested. The actual real life test could help determine which information is actually essential to have. |
Yeah, my expectation is that we'll have a few problems with applying a simple band-aid to HDR:
But those are just my guesses, and it's worth validating them before making bigger decisions. My biggest longer-term concern around adding 16-bit output is that it would roughly double our memory and bandwidth requirements, which perhaps demands an optimization like "only output 16-bit images when they involve color values in the HDR range; otherwise, output 8-bit masks and expect them to be treated as BT.1886 0-100". Re: making larger API changes that require player rework, we wouldn't make such a change lightly; most of the changes being discussed here are things we've been discussing for years; e.g.:
If we're going to introduce new format features that require players to use new API surface, I'd expect that we'll also take steps to ensure that callers using the existing API will continue to see reasonable results; e.g. getting 16-bit output would require an opt-in, and HDR values would be clipped to the SDR range for existing callers. |
To add to this: bandwidth is already often a major issue in complex typesetting, as "more layers" is most typesetters' proverbial hammer. It's not at all uncommon for a large sign to require several megapixels of total bitmap size, even at 1080p (especially when diagonals are concerned). There are other, mostly unrelated, optimizations libass could implement to help with this, but in either case, simply doubling bitmap sizes across the board should not be done lightly. |
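The arithmetic behind the doubling concern, for a single full-frame UHD RGBA overlay (my own back-of-the-envelope numbers, not measurements from libass):

```python
# Bitmap footprint of one full-frame overlay at various bit depths.
def overlay_bytes(width: int, height: int, channels: int, bits: int) -> int:
    return width * height * channels * bits // 8

uhd_8bit = overlay_bytes(3840, 2160, 4, 8)    # ~33 MB per layer
uhd_16bit = overlay_bytes(3840, 2160, 4, 16)  # ~66 MB per layer
assert uhd_16bit == 2 * uhd_8bit
```

Since complex signs routinely stack many layers per frame, the doubling applies to the whole stack, which is why a conditional "16-bit only when HDR values are present" scheme looks attractive.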
Upstream ASS specification says that all subtitles should be rendered with color primaries and transfer matching their associated video. But as expected after further discussion the decision has been made to fallback to SDR mode in case of HDR video. See-Also: https://github.com/libass/libass/blob/649a7c2e1fc6f4188ea1a89968560715800b883d/libass/ass_types.h#L233-L237 See-Also: libass/libass#297 See-Also: mpv-player#13381 Fixes: mpv-player#13673
mpv (temporarily) enabling colourspace-matching for HDR videos, showed by now a noteable amount of subs exist expecting to remain SDR when placed on HDR video. After further discussion in libass#297 we thus decided to revise the default for subs on HDR to forego exact colour-matches, see: libass#297 (comment)
Hey there,
UHD HDR Anime content is starting to show up. Now fansubbers might start doing custom subtitles for HDR content. If we don't mark such subtitles as being made for HDR, video renderers will be in trouble, because they won't know for sure whether the subtitles were made for SDR or HDR. Both are possible, because users might just reuse an SDR ASS file for the HDR video, or they could create a new subtitle file for HDR.
So I'd like to suggest that we discuss a potential new ASS header field, e.g. something like this:
"YCbCr Transfer Function" = 709 (or "Gamma"?) | 2084 (or "PQ"?) | HLG | ...
I'm not sure if HLG even needs an extra entry, probably not, because it's supposed to be compatible with SDR.
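For illustration, a tagged script could look like this. Note that both the field name and the values are only a proposal in this thread, not part of any spec, and "TV.2020" assumes the BT.2020 matrix value mentioned in the P.S. below gets added as well:

```ini
[Script Info]
; Hypothetical example of the proposed header field:
Title: Some HDR-typeset release
YCbCr Matrix: TV.2020
YCbCr Transfer Function: PQ
```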
Now a video renderer could run into the 5 following situations:
What do you think?
Of course for this all to work properly, we'd probably need to add support for it in Aegisub, libass, (xy-)VSFilter and XySubFilter, plus in all (good) video renderers. I think the subtitle renderers (libass, VSFilter etc) probably only need to expose the header information to the video renderer. The video renderer should really be responsible for doing the dirty work. So it shouldn't be a lot of work for the subtitle renderers.
I could do the work for XySubFilter, maybe (xy-)VSFilter. You guys would have to cover libass. Hopefully somebody could be found to address Aegisub. Unfortunately the Aegisub forum seems to be down. Not sure if there's still any active Aegisub dev?
P.S.: Oh, and while we're at it, the "YCbCr Matrix" header field needs to officially get support for BT.2020, of course, maybe also for DCI-P3, for completeness' sake. Does libass expose the "YCbCr Matrix" field already? If not, it should do that, so the video renderer can apply color correction, if necessary.