Ensure minimum contrast ratio as in xterm.js #8420
kovidgoyal merged 10 commits into kovidgoyal:master
Conversation
|
Avoid control flow in shaders; it's quite costly. If you read the kitty shaders you will see I go to great lengths to avoid control flow. I can see you have basically copied the xterm.js code, which is fine for a first stab at it, but it can be improved in a couple of ways:
Finally please add docs to the option in options/definition.py |
|
On Sun, Mar 09, 2025 at 04:41:23PM -0700, arne314 wrote:
> Thank you for the feedback. Though, it seems to me (not very familiar with branchless programming) that some kind of iterative approach will be necessary to converge towards an L (as in HSL) value that fulfills the required luminance.
Not sure I understand why. Say you have a color (r, g, b) and you want
to increase the luminance to a value L. Isn't it just a matter of
converting the color into HSL, say (h, s, l), then adding (L - l) to its
luminance and converting it back to RGB?
I didn't read the xterm.js algorithm in detail, so perhaps I am missing
something.
> Maybe this is a crazy approach, but we could have a lookup table (maybe a 3D texture to provide efficient interpolation, generated in Python) to map from a given H, S and target luminance to the required L. I think the resolution (especially on the S axis) could be kept reasonably small.
Using lookup tables is possible, they are already used for gamma
correction, see srgb_lut. However, in this case I doubt the table would
fit in a uniform. GPUs are fairly limited in how much data is allowed in
a uniform.
> I have created a simple plot to illustrate my idea where the red line marks the L required for 60% luminance.

Assuming this plot was generated using a closed form function, why can't
we just implement that function in GLSL to calculate the required
luminance as a function of S?
|
|
From a quick look at the xterm.js algorithm, it is trying to change the contrast ratio, which is just a function of the luminance of the two colors. But it does so by adjusting the RGB colors. Rather than doing that, implement the algorithm in HSL space. If you adjust the luminance in that space, it should be easy to derive a closed form function to do that given the under and over luminances. Once adjusted, convert from HSL space back to RGB. |
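The "function of the luminance of the two colors" can be made concrete with the WCAG 2 definitions of relative luminance and contrast ratio. The following is a minimal Python sketch (not shader code, and not taken from this PR); the helper name `target_luminances` is an assumption introduced for illustration. It shows that, given the background luminance and a desired ratio, the luminance the text must reach has a closed form.

```python
def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer function for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(r: float, g: float, b: float) -> float:
    """WCAG 2 relative luminance of an sRGB color with channels in [0, 1]."""
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))


def contrast_ratio(lum_a: float, lum_b: float) -> float:
    """WCAG 2 contrast ratio between two relative luminances."""
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def target_luminances(bg_lum: float, ratio: float) -> tuple[float, float]:
    """Hypothetical helper: solve the contrast formula for the text luminance.

    Returns (lighter, darker): the luminance needed when the text is made
    lighter than the background, and when it is made darker. A result
    outside [0, 1] means that direction cannot reach the ratio.
    """
    lighter = ratio * (bg_lum + 0.05) - 0.05
    darker = (bg_lum + 0.05) / ratio - 0.05
    return lighter, darker


if __name__ == "__main__":
    bg = relative_luminance(0.0, 0.0, 0.0)  # black background
    print(target_luminances(bg, 4.5))
```

Solving for the luminance is the easy part; mapping that luminance back to a color with the same hue is where the choice of color space matters.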
|
The problem is that there is a difference between the lightness as in the HSL model and the perceived one (what we call luminance). For example, at the same lightness we perceive the color yellow to have a higher luminance than blue, which is why in the plot the luminance graphs (upper ones) don't scale linearly despite the linear lightness on the y-axis. The plot was created in the most straightforward way, I looped over all L and H values and on the L value that first met the target luminance (60% in this case) I inserted the red pixel. The luminance is computed from an RGB value (which I needed for the lower graphs anyway) using the same formula as in the shader code (which is default for sRGB space). Regarding the uniform limit, I wonder if using textures which are stored in a different memory is still fast enough as we would literally just need one lookup per cell. |
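The gap between HSL lightness and luminance described above can be checked numerically. This is a minimal Python sketch using the standard `colorsys` module and the WCAG 2 relative luminance formula (assumed here to correspond to the sRGB formula mentioned in the comment). The function `min_lightness_for` is a hypothetical name that mirrors the brute-force search used for the plot; it is not code from the PR.

```python
import colorsys


def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer function for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(r: float, g: float, b: float) -> float:
    """WCAG 2 relative luminance of an sRGB color."""
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))


def min_lightness_for(hue: float, sat: float, target_lum: float,
                      steps: int = 256) -> float:
    """Smallest HSL lightness (on a grid) whose luminance meets the target."""
    for i in range(steps + 1):
        lightness = i / steps
        # Note the argument order: colorsys uses HLS, not HSL.
        r, g, b = colorsys.hls_to_rgb(hue, lightness, sat)
        if relative_luminance(r, g, b) >= target_lum:
            return lightness
    return 1.0


if __name__ == "__main__":
    yellow = colorsys.hls_to_rgb(1 / 6, 0.5, 1.0)
    blue = colorsys.hls_to_rgb(2 / 3, 0.5, 1.0)
    # Same HSL lightness (0.5), very different luminance.
    print(relative_luminance(*yellow), relative_luminance(*blue))
    # Lightness needed for 60% luminance differs per hue.
    print(min_lightness_for(1 / 6, 1.0, 0.6), min_lightness_for(2 / 3, 1.0, 0.6))
```

At lightness 0.5 and full saturation, yellow comes out at roughly 0.93 luminance and blue at roughly 0.07, so adding a fixed offset to the HSL lightness cannot hit a given luminance for every hue.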
|
On Mon, Mar 10, 2025 at 01:40:03AM -0700, arne314 wrote:
> The problem is that there is a difference between the lightness as in the HSL model and the perceived one (what we call luminance). For example, at the same lightness we perceive the color yellow to have a higher luminance than blue, which is why in the plot the luminance graphs (upper ones) don't scale linearly despite the linear lightness on the y-axis.
So work in a perceptual color space instead of HSL, maybe HSLuv? As long as there is
a closed form mapping from RGB to the colorspace, it will work without
needing either control flow or lookups.
> The plot was created in the most straightforward way, I looped over all L and H values and on the L value that first met the target luminance (60% in this case) I inserted the red pixel. The luminance is computed from an RGB value (which I needed for the lower graphs anyway) using the same formula as in the shader code (which is default for sRGB space).
> Regarding the uniform limit, I wonder if using textures which are stored in a different memory is still fast enough as we would literally just need one lookup per cell.
It's currently one lookup per pixel, not one per cell, as it's
run in the fragment shader, not the vertex shader. But, I suppose it
could be moved to the vertex shader. Maybe start by doing that in a
separate PR, just move the existing code into the vertex shader.
That is a performance win even without any other changes.
|
|
Here is GLSL code for HSLuv: I haven't reviewed that code to see how performant it is, however. |
|
Quickly glancing over the code, it has some simple if statements and ? operators that should be easy to replace with branchless versions. No loops. |
|
Regarding the |
|
GLSL has isnan() and isinf() functions; you can use them to normalize values in combination with step(). I don't think you have to worry about NaN; IIRC that mostly happens when multiplying zero by infinity. Inf is more likely to happen, since there are plenty of divisions with variable denominators. I suggest making a function named safe_divide(num, den, sentinel) or similar that divides and returns sentinel when isinf() is true, and then replacing all potentially unsafe divisions in the hsluv shader with that. |
|
In fact, come to think of it, you can implement safe_divide without isinf() by mixing on denominator == 0. Since GPUs don't trap on divide by zero, this should actually be more performant. |
But wouldn't that |
|
Yes, that's true, will need to be more subtle than that. |
|
Depending on what you need the division to be in the zero case simply doing num / max(0.0000001, denom) might be acceptable. |
|
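Or something like this:

```glsl
float safe_divide(float numerator, float denominator) {
    float epsilon = 1e-10;
    // Keep the denominator strictly positive so the division never hits zero.
    float safe_denominator = abs(denominator) + epsilon;
    float result = numerator / safe_denominator;
    // Restore the sign; sign(0.0) is 0.0, so a zero denominator yields 0.0.
    result *= sign(denominator);
    return result;
}
```
|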
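To compare the two division guards suggested above, here is a minimal Python sketch of both (Python rather than GLSL so it can be run directly; the name `divide_clamped` and the `sign` helper emulating the GLSL built-in are assumptions for illustration). Python uses double precision, so the exact values differ from what a 32-bit shader would produce, but the qualitative behavior is the same.

```python
def sign(x: float) -> float:
    """Emulates GLSL sign(): -1.0, 0.0 or 1.0."""
    return float((x > 0) - (x < 0))


def divide_clamped(num: float, den: float, eps: float = 1e-7) -> float:
    """The num / max(eps, den) variant: only sensible when den is never negative."""
    return num / max(eps, den)


def safe_divide(num: float, den: float, eps: float = 1e-10) -> float:
    """Port of the abs() + epsilon variant: keeps the sign, returns 0 for den == 0."""
    return num / (abs(den) + eps) * sign(den)


if __name__ == "__main__":
    for den in (2.0, 0.0, -3.0):
        print(den, divide_clamped(6.0, den), safe_divide(6.0, den))
```

The clamped variant returns a very large value for a zero denominator and silently treats negative denominators as `eps`, while the `abs()` variant preserves the sign and collapses to zero at a zero denominator. Which of these is acceptable depends on what each call site in the shader expects in the degenerate case.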
|
There are still a couple of branches left in the fragment shader, and I don't think mix() works with a bool for the third argument. Also, thinking about it, rather than using negative numbers, let's use a unit suffix, so the setting can be text_fg_override_threshold 50%. When no unit is supplied it defaults to % for backwards compatibility. In the future, if we want to add another algorithm, it can be easily accommodated by adding another unit. |
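A unit-suffix option of this kind could be parsed roughly as follows. This is a hedged Python sketch, not kitty's actual option parser: the function name `parse_threshold`, the `KNOWN_UNITS` tuple, and in particular the `ratio` unit are hypothetical, introduced only to show how a second algorithm could be selected by another unit while a bare number keeps meaning `%`.

```python
# Hypothetical unit names; only '%' is mentioned in the discussion.
KNOWN_UNITS = ('%', 'ratio')


def parse_threshold(text: str) -> tuple[float, str]:
    """Split an option value into (number, unit), defaulting the unit to '%'."""
    text = text.strip()
    for unit in KNOWN_UNITS:
        if text.endswith(unit):
            # Strip the suffix and any whitespace between number and unit.
            return float(text[:-len(unit)].strip()), unit
    # No recognized suffix: fall back to '%' for backwards compatibility.
    return float(text), '%'


if __name__ == "__main__":
    for value in ('50%', '50', '4.5 ratio'):
        print(value, '->', parse_threshold(value))
```

Returning the unit alongside the number lets the caller dispatch on the algorithm explicitly instead of encoding it in the sign of the value.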
|
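I do like the config solution a lot better. Unless built-in functions introduce branching, there is only one `if` statement left. Are you sure we should always perform the hsluv computations within it, as they contain quite a few expensive `pow` operations and usually only kick in for a few pixels at a time in rare scenarios anyway? |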
|
On Fri, Mar 14, 2025 at 04:16:40AM -0700, arne314 wrote:
> I do like the config solution a lot better. Unless built-in functions introduce branching, there is only one `if` statement left. Are you sure we should always perform the hsluv computations within it, as they contain quite a few expensive `pow` operations and usually only kick in for a few pixels at a time in rare scenarios anyway?
While it varies with GPU hardware quite a bit, branches on GPUs are much
more expensive than on CPUs, because of the higher parallelism.
We should anyway move the whole thing into the vertex shader if
possible, that will be a huge perf win.
|
|
Honestly, I don't know enough about shaders to take on that kind of refactoring. Also, how would that work with emojis which have more than one color? Is there a (straightforward) way to just ignore them in the contrast computations or a way to still process every pixel of them in the fragment shader while normal characters are processed in the vertex shader only? |
|
On Fri, Mar 14, 2025 at 02:09:36PM -0700, Arne wrote:
> Honestly, I don't know enough about shaders to take on that kind of refactoring. Also, how would that work with emojis which have more than one color? Is there a (straightforward) way to just ignore them in the contrast computations or a way to still process every pixel of them in the fragment shader while normal characters are processed in the vertex shader only?
Don't worry about that, I will look at it after this is merged.
Basically the tradeoff is increased perf for no color correction on
emoji. There is already no color correction on images either. And I don't
think this feature is that important for emoji, so it might be a
worthwhile tradeoff.
Just remove the last branch and I will merge.
|
|
And just some general background on GPU branching. GPUs generally run the same code (machine instructions) on blocks of vertices/fragments. Older GPUs used larger blocks; more modern ones have more shader execution units and so use smaller blocks, which reduces the cost of branching. However, even modern ones have blocks of ~10 fragments. Here we are talking about a "dynamic" branch, i.e. a branch that depends on per-fragment data. This makes it impossible to execute the same code on a whole block, giving us a significant slowdown per branch. It's more complicated than this, though: GPUs will actually perform both sides of the branch in parallel and mix the results anyway, so the actual cost is a halving of the available number of shader execution units. What that translates into in terms of actual performance is hard to say; it's very hardware dependent. Indeed, many compilers will actually elide branches by doing the mixing themselves in the generated machine code. I prefer to be explicit about it, so we are not relying on compiler behavior. |
I do very much agree with that. |
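The "compute both sides and mix" idea can be illustrated outside a shader. The following Python sketch emulates the GLSL built-ins `step()` and `mix()` for scalars and compares a branchy selection with a branchless one. The function names `pick_branchy` and `pick_branchless` and the scalar "colors" are assumptions for illustration; this is not the code that was merged.

```python
def step(edge: float, x: float) -> float:
    """Emulates GLSL step(): 0.0 if x < edge, else 1.0."""
    return float(x >= edge)


def mix(a: float, b: float, t: float) -> float:
    """Emulates GLSL mix(): linear blend a * (1 - t) + b * t."""
    return a * (1.0 - t) + b * t


def pick_branchy(original: float, adjusted: float,
                 contrast: float, threshold: float) -> float:
    """Dynamic branch: which side runs depends on per-fragment data."""
    if contrast < threshold:
        return adjusted
    return original


def pick_branchless(original: float, adjusted: float,
                    contrast: float, threshold: float) -> float:
    """Both values are always available; a 0/1 weight selects one."""
    needs_fix = 1.0 - step(threshold, contrast)  # 1.0 when contrast < threshold
    return mix(original, adjusted, needs_fix)


if __name__ == "__main__":
    for contrast in (1.2, 4.5, 7.0):
        print(contrast,
              pick_branchy(0.3, 0.8, contrast, 4.5),
              pick_branchless(0.3, 0.8, contrast, 4.5))
```

The branchless form always pays for computing `adjusted`, which is the trade-off raised earlier about the expensive `pow` operations; in exchange, every fragment in a block executes the same instructions.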
|
Cool, I am travelling for a few days; will merge this after I return. |
|
Merged, and moved code into vertex shader. |


This PR adds an alternative algorithm (same as in xterm.js) to ensure readability in low contrast scenarios, as discussed in #8417.
The algorithm will be used when a negative value is provided to the text_fg_override_threshold config option. For example, to meet WCAG 2 level AA, a value of -4.5 could be provided.
I am very much open to changes and happy to provide documentation as well.