
Description of algorithm? #35

Closed
makew0rld opened this issue Jan 4, 2021 · 10 comments

@makew0rld

It'd be nice if the README or another file had an in-depth description of the algorithm, so it could be implemented in other programming languages.

@leeoniya (Owner) commented Jan 4, 2021

it's actually pretty simple, but has some deficiencies for which there are manual knobs that can be tweaked, such as minHueCols, boxSize, boxPxls. the current results with small palette sizes are also not great. i've been playing on and off with other strategies, such as using a linear color space (HSLuv) as well as better bucketing by lightness, hue and saturation to remove the knobs. this has me hesitating to officially describe the algorithm, but the general gist is:

  1. divide the image into a grid with cell size of boxPxls, the default is 64x64.
  2. for every cell, take any colors that occur >= boxPxls times and add them to an initial palette, keeping track of the raw occurrence count.
  3. sort the initial palette by occurrence counts (highest to lowest).
  4. walk the palette and remove any subsequent colors that have a color distance below a certain threshold.
  5. if the resulting palette ends up bigger than the one we're looking for, increase the threshold and repeat. if the palette is smaller than the one we're looking for add back the colors that were removed in the previous pass and have the greatest color distance and/or count.

that gives you the color palette. then we just walk the image and find the closest color in our palette to the current pixel value and assign it. that's not terribly in depth, but there's not much more to it than this. it works surprisingly well, but it's not terribly fast, and sometimes needs manual tweaking of parameters.
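
A minimal JavaScript sketch of steps 1-3 and the final remapping pass follows; function names, defaults, and the flat RGBA pixel layout are illustrative, not the actual RgbQuant.js API (the pruning of steps 4-5 is sketched further down the thread):

```js
// sketch of steps 1-3: per-cell histogram -> thresholded, count-sorted candidates.
// pixels is a flat RGBA array (e.g. from canvas getImageData).
function collectCandidates(pixels, width, height, boxSize = 64, boxPxls = 2) {
  const counts = new Map(); // packed rgb int -> global occurrence count
  for (let by = 0; by < height; by += boxSize) {
    for (let bx = 0; bx < width; bx += boxSize) {
      const cell = new Map(); // per-cell histogram
      for (let y = by; y < Math.min(by + boxSize, height); y++) {
        for (let x = bx; x < Math.min(bx + boxSize, width); x++) {
          const i = (y * width + x) * 4;
          const rgb = (pixels[i] << 16) | (pixels[i + 1] << 8) | pixels[i + 2];
          cell.set(rgb, (cell.get(rgb) || 0) + 1);
        }
      }
      // keep only colors seen at least boxPxls times within this cell
      for (const [rgb, n] of cell)
        if (n >= boxPxls) counts.set(rgb, (counts.get(rgb) || 0) + n);
    }
  }
  // sort candidates by global count, highest first; steps 4-5 then prune
  // this list down to the target palette size
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).map((e) => e[0]);
}

// final pass: map a pixel to its nearest palette entry under some
// color distance function dist(rgbA, rgbB)
function nearest(rgb, palette, dist) {
  let best = palette[0], bestD = Infinity;
  for (const p of palette) {
    const d = dist(rgb, p);
    if (d < bestD) { bestD = d; best = p; }
  }
  return best;
}
```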

leeoniya closed this as completed Jan 8, 2021
@makew0rld (Author) commented Jan 8, 2021

Thanks for explaining! This would be nice on the README.

take any colors that occur >= boxPxls times

If boxPxls is 64x64, does that mean the color must occur 4096 times, filling the box? Or just 64 times?

remove any subsequent colors that have a color distance below a certain threshold.

How are you calculating color distance, using Euclidean? And what are you comparing it to, like if I have a color in the palette, what's the other color I'm using to calculate distance?

such as using a linear color space (HSLuv)

If you're not already doing it, you should use the linear RGB color space internally, converting to and from sRGB for input and output images. That will improve your color distance algorithms because any numerical distances are now linear. See here and here for more info.

Although as you probably know, Euclidean + linear RGB isn't really the best way because it doesn't map to human color perception at all. Some research led me to this page, which describes a color distance algorithm that operates on RGB and is quick. The color space of RGB isn't mentioned, but I think it is sRGB, as it doesn't mention conversion and none of the StackOverflow responses that use it do either.

Some other ways would be Euclidean + CIELAB or using CIEDE2000. I suspect the latter will be too slow, but I have no idea about the former.
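
For reference, a sketch of the standard sRGB-to-linear conversion and a plain Euclidean distance computed in linear RGB (not code from RgbQuant.js, just the conversion being suggested here):

```js
// sRGB channel value (0..255) -> linear-light value (0..1),
// using the standard sRGB transfer function
function srgbToLinear(c) {
  const s = c / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Euclidean distance between two [r, g, b] colors after linearization
function linearDist(a, b) {
  let sum = 0;
  for (let i = 0; i < 3; i++) {
    const d = srgbToLinear(a[i]) - srgbToLinear(b[i]);
    sum += d * d;
  }
  return Math.sqrt(sum);
}
```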

@leeoniya (Owner) commented Jan 8, 2021

If boxPxls is 64x64, does that means the color must occur 4096 times, filling the box? Or just 64 times?

the color needs to occur >= 2 times inside a 64x64 square.

How are you calculating color distance, using Euclidean?

yes. it's Rec. 709 component-multiplied sRGB: https://github.com/leeoniya/RgbQuant.js/blob/master/src/rgbquant.js#L703-L729
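
That works out to something like the following sketch, assuming the Rec. 709 luma coefficients are used as per-channel weights (see the linked lines for the actual implementation):

```js
// Rec. 709 luma coefficients used to weight each channel's contribution
const Pr = 0.2126, Pg = 0.7152, Pb = 0.0722;

// component-weighted Euclidean distance over [r, g, b] values in 0..255,
// normalized to 0..1 (Pr + Pg + Pb == 1, so dividing by 255 normalizes)
function distEuclidean(rgb1, rgb2) {
  const rd = rgb1[0] - rgb2[0];
  const gd = rgb1[1] - rgb2[1];
  const bd = rgb1[2] - rgb2[2];
  return Math.sqrt(Pr * rd * rd + Pg * gd * gd + Pb * bd * bd) / 255;
}
```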

And what are you comparing it to, like if I have a color in the palette, what's the other color I'm using to calculate distance?

each remaining color in the palette. (it's an iterative palette reduction loop).

Although as you probably know, Euclidean + linear RGB isn't really the best way because it doesn't map to human color perception at all.

"at all" is an over-exaggeration, but CIEDE2000 is very expensive. HSLuv (linear perceptual) + eucleadian or MSE is probably the best route here (similar to euclidean + CIELAB), as it is based on CIELab with additional adjustments for uniformity: https://www.hsluv.org/comparison/

if you have any color distance improvements (gamma correction, etc), that don't destroy performance, then i'll definitely review a PR.

i have a hand-ported HSLuv implementation that is massively faster than the official JS version of HSLuv. if you're interested in working on this, i can publish it.

@makew0rld (Author)

the color needs to occur >= 2 times inside a 64x64 square.

Thanks. Was saying "colors that occur >= boxPxls times" a mistake then? Just trying to make sure I'm not misunderstanding anything.

Thanks for linking to HSLuv, I haven't heard of that before and it seems very useful.

i'll definitely review a PR.

Sorry, I don't know JS. I'm working on some Go libraries that deal with color and I came across this (thanks to the ditherit.com README), and so I was wondering how it worked, because I might want to re-implement it.

i have a hand-ported HSLuv implementation that is massively faster than the official JS version of HSLuv. if you're interested in working on this, i can publish it.

Definitely publish it! The world will benefit. I might not be able to use it directly, but I might be able to port the performance improvements to my Go libraries.

@leeoniya (Owner) commented Jan 8, 2021

Was saying "colors that occur >= boxPxls times" a mistake then?

no. have you seen https://github.com/leeoniya/RgbQuant.js#usage

but I might be able to port the performance improvements to my Go libraries.

the port doesn't do anything fancy, it just avoids allocating objects during the conversion process (which the Haxe->JS compiler doesn't do). you should just port it directly from the C impl, it's pretty easy: https://github.com/hsluv/hsluv-c

@makew0rld (Author)

Oh I see, thanks. I think you said boxPxls instead of boxSize somewhere else:

divide the image into a grid with cell size of boxPxls, the default is 64x64.

the port doesn't do anything fancy, it just avoids allocating objects during the conversion process

Ah ok, thanks. There's already a Go implementation so I'll probably just use that.

@leeoniya (Owner) commented Jan 8, 2021

oh, nice. it looks like a native impl (not a haxe transpile): https://github.com/hsluv/hsluv-go

@leeoniya (Owner) commented Jan 8, 2021

i have an experiment locally that tries to do an octree-type strategy after HSLuv conversion with 6x6x6 buckets (lightness x saturation x hue). then each bucket is reduced using the above strategy, and then combined and reduced again. this prevents having to compare literally all colors against all other colors during the palette reduction passes. this has shown some success, but i haven't had time to pursue it further. this strategy also removes the grid-subdivisions and minimum color threshold requirements, while guaranteeing that all hues are represented. there needs to be additional assessment based on bucket statistics to tune the pruning aggressiveness, but i haven't had time to make further progress in a few months.

if you end up trying this strategy with any success, i'd be interested to see it!
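
A rough sketch of that bucketing idea (hsluvFromRgb is a placeholder for whatever HSLuv conversion is used; it's assumed to return [h 0..360, s 0..100, l 0..100]):

```js
// group colors into 6x6x6 buckets by lightness, saturation and hue
// after converting to HSLuv; each bucket is then reduced separately,
// and the survivors merged and reduced once more
function bucketColors(colors, hsluvFromRgb) {
  const buckets = new Map();
  for (const rgb of colors) {
    const [h, s, l] = hsluvFromRgb(rgb);
    const li = Math.min(5, Math.floor(l / (100 / 6)));
    const si = Math.min(5, Math.floor(s / (100 / 6)));
    const hi = Math.min(5, Math.floor(h / 60));
    const key = li * 36 + si * 6 + hi;
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(rgb);
  }
  return buckets;
}
```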

@makew0rld (Author)

I'll look into it and see.

That reminds me, I still have a question.

And what are you comparing it to, like if I have a color in the palette, what's the other color I'm using to calculate distance?

each remaining color in the palette. (it's an iterative palette reduction loop).

I'm confused what this means. How do you end up with one number in the end? Are you taking the distance between the chosen color and all the others and averaging that?

@leeoniya (Owner) commented Jan 8, 2021

take an initial palette (sorted by global color count):

red, green, redish, blue, pink, yellow, orange

loop 1 with a pruning color distance of < 0.1:

red (keep)
if dist(red, green) < 0.1 {remove green}
if dist(red, redish) < 0.1 {remove redish} // redish removed
...

loop 2:
green (keep)
if dist(green, blue) < 0.1 {remove blue}
...

the whole time you keep track of the resulting palette size. if at the end of this loop you have a palette that's still too big, then you increase the pruning distance slightly and repeat. if you over-reduced, then restore any previously removed colors with the highest counts and/or greatest distance.
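
In code, one pruning pass might look like this sketch (dist() stands in for whatever color distance is in use; 0.1 is the example threshold from above):

```js
// one pruning pass over a count-sorted palette: keep a color only if it is
// farther than thresh from every color already kept
function prunePass(sortedPalette, dist, thresh = 0.1) {
  const kept = [];
  const removed = [];
  for (const color of sortedPalette) {
    if (kept.some((k) => dist(k, color) < thresh)) removed.push(color);
    else kept.push(color);
  }
  // removed entries can be restored (highest counts / greatest distance first)
  // if the outer loop over-reduces
  return { kept, removed };
}
```

The outer loop would call this repeatedly with a slightly larger thresh each time until kept.length reaches the target palette size.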
