
The format is still inefficient for images having 64 colors or less #194

Closed
verdy-p opened this issue Apr 9, 2022 · 22 comments

Comments

@verdy-p

verdy-p commented Apr 9, 2022

This format is still inefficient for images with 64 colors or fewer: all colors should fit in the 64-entry internal palette, but the hash causes collisions, so in practice fewer entries survive (they keep getting overwritten) and QOI_OP_RGB chunks are emitted repeatedly.

So instead, for such images, a new "colorspace" value should be defined to allow a "palettized" format.
I suggest: colorspace = 64 + (number of colors in the palette) - 1 (the number of channels may be 3 or 4, but it does not matter, except for the remark below about the format of the palette).

  • When this is used, it is followed by the array of RGBA or sRGBA colors in the palette, each coded as 4 bytes (scaled R,G,B + linear A, or linear R,G,B,A); the array's size is given by the "colorspace" value (for RGB images, just set alpha=255 in the palette). If the number of channels is taken into account, we might allow palette entries to be coded as 3 bytes for 3-channel images, dropping the implicit alpha=255 byte, but that saves only 1 to 64 bytes on the whole file. With channels=1 we could store each "color" as a single grayscale byte, but we should also be able to specify the dominant RGB color used for mid-range brightness (value 128), allowing the grayscale to be "tinted" on rendering; the renderer could then reconstruct an RGB palette for a scale still going from black to white. With channels=2, representing a 2D colorscale, two dominant colors would be given as two RGB values for the mid-range brightness=128 value, each pixel would encode its two channel values on that scale, and the renderer could still rebuild a true RGB palette from the positions of the 2D colorscale actually used by pixels in the image. Two-channel images are typical of diagrams and charts in statistics and business presentations: a neutral/grayish tint for most content plus a saturated tint for emphasizing notable elements, or two distinct saturated tints (such as brownish-yellow and cyanish-blue at mid-range) for coloring data along two independent axes.
  • Then we need a new, separate set of chunks, because individual pixels now carry only an index value from 0 to 63 (at most), and there are separate cases depending on the number of colors in the palette:
    ** 1 color: the image is a flat monochromatic image; we don't need any other chunks, since the number of pixels is known from the header's width and height fields. The palette contains a single entry (4 bytes).
    ** 2 colors: the image is bicolor, so the pixel array is reduced to 1 bit per pixel for the color index. The colorspace is 65 and the palette contains 2 entries (8 bytes). We need a new, efficient set of chunks for compressing the pixels (see the packing sketch after this list):
    *** Ideally this should work on groups of 4 pixels represented as 4 bits (before compression), each bit giving a color index of 0 or 1 in the palette; run-length encoding is possible here, but it operates on groups of 4 successive pixels (if the image size is not a multiple of 4, it is encoded as if the missing pixels had the implied color index 0; the decoder will ignore/discard them). Chunks themselves are introduced by a leading byte carrying 1 bit per chunk (8 chunks are grouped, ordered left to right in that byte) indicating the chunk type, followed by as many bytes as needed for the chunk values.
    *** chunk type 0: the chunk value contains a pattern of 8 bits (1 bit per color index in the bicolor palette); when used, this pattern is also added to a cyclic cache of patterns, which can hold up to 129 patterns.
    *** chunk type 1: the chunk value contains 1 byte whose most significant bit says whether it is either:
    **** RLE-encoded: MSB=1, followed by the (number of repetitions from 1 to 128) minus 1, encoded in 7 bits, or
    **** a reference to a relative distance in the cyclic cache of patterns: MSB=0; this cannot be the most recently seen pattern (the one added to the cache when the chunk type was 0), so the relative distance goes from 1 to 128 only and fits in the 7 remaining bits of the chunk value.
    ** 3 or 4 colors: each pixel is representable by a 2-bit color index in the palette; this requires a new set of chunk types (TBD)
    ** 5 to 8 colors: each pixel is representable by a 3-bit color index in the palette; this requires a new set of chunk types (TBD)
    ** 9 to 16 colors: each pixel is representable by a 4-bit color index in the palette; this requires a new set of chunk types (TBD)
    ** 17 to 32 colors: each pixel is representable by a 5-bit color index in the palette; this requires a new set of chunk types (TBD)
    ** 33 to 64 colors: each pixel is representable by a 6-bit color index in the palette; this requires a new set of chunk types (TBD)
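
As a rough illustration of the bicolor case above (a sketch only, with a made-up helper name; it covers just the palette lookup and the 1-bit index packing, not the RLE/pattern-cache chunks):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch for the proposed bicolor case (colorspace = 65):
 * map each RGBA pixel to a 1-bit palette index and pack 8 indices per
 * output byte, most-significant bit first. The RLE / pattern-cache
 * chunks described above would then run over these packed bytes. */
size_t pack_bicolor(const uint32_t *pixels, size_t count,
                    const uint32_t palette[2], uint8_t *out) {
    size_t out_len = 0;
    uint8_t byte = 0;
    int bits = 0;
    for (size_t i = 0; i < count; i++) {
        /* index 1 if the pixel matches palette[1], otherwise index 0 */
        uint8_t idx = (pixels[i] == palette[1]) ? 1 : 0;
        byte = (uint8_t)((byte << 1) | idx);
        if (++bits == 8) {
            out[out_len++] = byte;
            byte = 0;
            bits = 0;
        }
    }
    if (bits > 0) {                 /* pad the last byte with index 0 */
        out[out_len++] = (uint8_t)(byte << (8 - bits));
    }
    return out_len;
}
```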

For more than 64 colors, the existing QOI format with colorspace=0 (sRGB or sRGB+A) or 1 (linear RGB or linear RGB+A), without any encoded palette, is still usable (ideally for colorful icons, or as a lossless format for photos) as a good substitute for PNG.

The QOI format is still not usable as a format for lossless HDR photos (with more than 8 bits per color plane, typically 10 or 12 bits, usually with a non-linear value for best rendering, or using 16-bit floating point). This would require new colorspace values (values 2 to 63 and 128 to 255 are still unused in this proposal) in the header, and a new set of chunk types as well.

Otherwise, JPEG or JPEG 2000 are used when we accept sacrificing fidelity, or existing "RAW" photographic formats or TIFF (with no compression at all; such photos require a lot of storage). For HDR video this often does not work without excessive memory or I/O bandwidth, a lot of battery power on cameras or smartphones, and costly high-bandwidth memory cards. It is less of a problem for stationary, DC-powered computers with plenty of RAM and fast SSD arrays, which can also perform more complex compression and encoding if needed to save RAM and I/O bandwidth for reliable persistent storage, or which can stream the live video over a fast network for remote processing on effectively unlimited computing and storage resources.

As well, there are colorspaces that use more than the 3 components (R, G, B) of additive color synthesis. Typically there exist at least:

  • CMYK for printing and books (with an added black component, used with subtractive color synthesis)
  • RGBW (with added white) or RGBY (with added yellow) for new monitors (notably those made to render HDR photos and videos, or scientific and medical images); Samsung is developing this for new flat panels

All colorspaces can have an alpha channel; but the alpha channel rarely needs 8 bits, and it can often be reduced to just 1 bit per pixel (i.e. plain color or fully transparent). QOI for now only supports an 8-bit alpha channel (channels=4) or none (channels=3). Maybe we should have a specific bit indicating that the alpha channel exists but is 1-bit encoded (this could reuse the compression scheme described above for bicolor images, except that the alpha channel itself does not need a leading palette).

@sectokia

sectokia commented Apr 10, 2022

About Palettes
The problem with palettes is that they cannot be done without violating the 'core' (to me, anyway) concept of QOI, which is that the encoder and decoder work in one pass, without any pixel pre-knowledge and without an iteration loop for any pixel.

Unless the palette is sent to the encoder as pre-knowledge, there is no way for the encoder to know the 64 colors to use in the palette (other than looping over the entire pixel stream once to determine it, then again to encode it).

Not only that, but the encoder would have to loop over the palette, for every single pixel it encounters, to figure out its palette value, which would absolutely decimate the encoder's speed (the average pixel would need 32 comparisons to the palette before it found the matching entry).

So while I agree your ideas would work for getting the information into the file, they do not work in terms of encoding fast / in a single pass.

About 10/12bit:
The big reason QOI works as well as it does is the INDEX/DIFF/LUMA types. In particular, DIFF/LUMA handle subtle changes/noise fluctuations. At 10/12 bits the magnitude of these changes is 4x and 16x higher. This means DIFF/LUMA need extra bits if there is any hope of achieving similar compression ratios.

This means that for 10/12 bits, DIFF would have to be 12/18 bits instead of 6, and LUMA would have to be 19/25 bits instead of 13. And then you have to somehow make these fit nicely into byte alignments. Even if you do that, you don't achieve as good a compression ratio (for example, DIFF is no longer 24 bits to 8 bits, but 30/36 bits to probably 16/24 bits).
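
For reference, this is roughly how the existing 8-bit DIFF/LUMA decision looks (a sketch following the published QOI spec; the helper name is made up). The small signed fields here are exactly what would have to grow at 10/12 bits:

```c
#include <stdint.h>

#define QOI_OP_DIFF 0x40  /* 01xxxxxx */
#define QOI_OP_LUMA 0x80  /* 10xxxxxx */

/* Emits a QOI_OP_DIFF (1 byte) or QOI_OP_LUMA (2 bytes) chunk if the
 * per-channel differences to the previous pixel fit; returns the number
 * of bytes written, or 0 if neither op applies. */
static int try_diff_luma(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t pr, uint8_t pg, uint8_t pb,
                         uint8_t *out) {
    signed char vr = (signed char)(r - pr);
    signed char vg = (signed char)(g - pg);
    signed char vb = (signed char)(b - pb);
    signed char vg_r = vr - vg;
    signed char vg_b = vb - vg;

    /* DIFF: three 2-bit fields, each holding a diff in -2..1 (bias 2) */
    if (vr > -3 && vr < 2 && vg > -3 && vg < 2 && vb > -3 && vb < 2) {
        out[0] = QOI_OP_DIFF | (vr + 2) << 4 | (vg + 2) << 2 | (vb + 2);
        return 1;
    }
    /* LUMA: 6-bit green diff (-32..31), plus two 4-bit diffs (-8..7)
     * relative to the green diff */
    if (vg_r > -9 && vg_r < 8 && vg > -33 && vg < 32 && vg_b > -9 && vg_b < 8) {
        out[0] = QOI_OP_LUMA | (vg + 32);
        out[1] = (vg_r + 8) << 4 | (vg_b + 8);
        return 2;
    }
    return 0;
}
```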

@oscardssmith

This isn't quite right. An encoder could use a hash table to count the number of colors, so it would only take roughly 1 comparison per pixel (and this hash table can be very small: 512 colors' worth of storage should be enough to have very few hash collisions, since you can bail out if you find more than 256 colors).
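
Something like this (a sketch with made-up names: a 512-slot linear-probing table that bails out past 256 colors):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 512   /* power of two, so we can mask instead of mod */
#define MAX_COLORS 256   /* bail out once we exceed this many colors */

/* Counts distinct RGBA values with roughly one probe per pixel on
 * average; returns the count, or -1 if more than MAX_COLORS were seen
 * (in which case the palettized path would be abandoned). */
static int count_colors(const uint32_t *pixels, size_t count) {
    uint32_t table[TABLE_SIZE];
    uint8_t used[TABLE_SIZE];
    memset(used, 0, sizeof(used));
    int colors = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t px = pixels[i];
        /* cheap multiplicative hash; any reasonable mix works here */
        uint32_t h = (px * 2654435761u) & (TABLE_SIZE - 1);
        while (used[h] && table[h] != px) {
            h = (h + 1) & (TABLE_SIZE - 1);   /* linear probing */
        }
        if (!used[h]) {
            used[h] = 1;
            table[h] = px;
            if (++colors > MAX_COLORS) {
                return -1;
            }
        }
    }
    return colors;
}
```

With 512 slots and a cap of 256 distinct colors, the table stays at most half full, so probe chains stay short.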

@verdy-p
Author

verdy-p commented Apr 10, 2022

About the palette: that's not true: there is no need for two passes for the encoding step itself. The image can be encoded in one pass and stored, and the palette then appended on the fly; during the encoding, this palette can simply remain in memory.

And the palette lookup will still be very fast, because it is a very small table (64 colors at most; it fits entirely in the processor's L1 data cache).

What is true is that the encoder would first need to count the colors in the image in a first pass, to determine which format to use. But we can imagine that the encoder starts encoding for a palette as long as it has not encountered more than 64 colors. If it then finds a 65th color, it can either restart the encoding, or continue from that pixel using the chunks for 8-bit channels (note that the initial color would then not be plain black but the color of the last pixel encoded by the first encoder before the 65th color was met).

@oscardssmith

I'm also not sure how 10/12-bit images would work, but it is worth noting that they are typically used in HDR color spaces. The upshot is that you probably get an extra bit or two of commonality beyond what a basic analysis would suggest. That said, to get efficient encoding you would probably want to add something like a rolling-average tag that extrapolates from the last few (maybe 8) pixels; I'm guessing that this would be more efficient than only looking at the previous pixel.
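
For example, a rolling average over the last 8 pixels of one channel could be tracked like this (a sketch only; not part of QOI, and the names are made up):

```c
#include <stdint.h>

#define WINDOW 8   /* predict from the last 8 values, as suggested above */

typedef struct {
    uint8_t  history[WINDOW];  /* last WINDOW values of one channel */
    uint16_t sum;              /* running sum of the window */
    int      pos;              /* next slot to overwrite */
} predictor_t;                 /* zero-initialize before first use */

/* Predicts the next value of a channel as the average of the last
 * WINDOW values, then folds the actual value into the window. The
 * encoder would then store (actual - predicted) in a small signed
 * field, just like DIFF/LUMA store differences to the previous pixel. */
static uint8_t predict_and_update(predictor_t *p, uint8_t actual) {
    uint8_t predicted = (uint8_t)(p->sum / WINDOW);
    p->sum -= p->history[p->pos];
    p->sum += actual;
    p->history[p->pos] = actual;
    p->pos = (p->pos + 1) % WINDOW;
    return predicted;
}
```

Whether this actually beats the simple previous-pixel diff would of course have to be benchmarked.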

@verdy-p
Author

verdy-p commented Apr 10, 2022

Note also that the claimed processing speed is theoretical and rests on a false assumption: in reality, for very large images, the speed is not bounded by the number of CPU cycles per pixel, but by memory access time (or worse, by the I/O access time for the encoder's input and output).

So any gain in output size will IMPROVE speed significantly, by avoiding memory swaps to disk or reducing I/O for the output. That is why long video streams always need a good compression ratio: otherwise you cannot support very high resolutions, large colorspaces, or high frame rates without exceeding the I/O bandwidth, using a lot of energy, or needing a lot of RAM to hold the output. You would have to sacrifice frame rate, resolution, or colorspace, and that can be far more lossy than using a "lossy" compressor with a good compression ratio such as MPEG, or even JPEG for battery-powered mobile cameras with slow processors and little RAM. These cameras save CPU and battery by using hardware accelerators for the DCT transform and Huffman coder, and they also need good compression (or must sacrifice resolution) because of the limited I/O bandwidth to their SD card for video, whereas this matters less for a single snapshot that can use the highest resolution and colorspace. Writing to typical SD cards or small flash drives sustains only about 7-30 megabytes per second; if you don't compress a video and want to keep it raw/lossless, you cannot achieve high resolution and high frame rate.

And for small or average images, the very few extra cycles for the color table lookup during encoding or decoding (not demonstrated at all, since the lookup table is very small and fits in cache) will be insignificant.

@sectokia

This isn't quite right. An encoder could use a hash table to count the number of colors, so it would only take roughly 1 comparison per pixel (and this hash table can be very small: 512 colors' worth of storage should be enough to have very few hash collisions, since you can bail out if you find more than 256 colors).

If you have a hash table of 512 slots, then once just 64 colors have been added, a new color already has about a 12% chance of a hash collision (64/512 ≈ 12.5%).

@sectokia

sectokia commented Apr 10, 2022

And the palette lookup will still be very fast, because it is a very small table (64 colors at most; it fits entirely in the processor's L1 data cache).

If this were true, QOI would be using a lookup table of the past 64 values instead of a 64-entry hash table. It doesn't really matter what the L1 cache is when you are talking about a 30-fold increase in the number of instructions.

@oscardssmith

@sectokia that's not a problem: hash tables use something called probing that lets you deal with this. Specifically, if you assume a good hash function, 88% of colors will be successfully found in 1 comparison, and only about 1% of them will need more than 2 comparisons (and none will need more than 3). Also, the reason QOI doesn't use the past 64 values is that you couldn't decode that quickly. The key difference here is that QOI's table has a very limited size because you have to store the index. For something that only affects encoding, using a bigger table is trivial.

@Brian151

I'm also not sure how 10/12-bit images would work, but it is worth noting that they are typically used in HDR color spaces. The upshot is that you probably get an extra bit or two of commonality beyond what a basic analysis would suggest. That said, to get efficient encoding you would probably want to add something like a rolling-average tag that extrapolates from the last few (maybe 8) pixels; I'm guessing that this would be more efficient than only looking at the previous pixel.

Does HDR use true 10-bit, or does it cheat and align to 16 bits? Dealing with anything that doesn't line up at least with whole bytes usually causes quite a hassle. I'm unfortunately familiar with it, and it's very hard to wrap one's mind around how to code it, versus intuitively understanding "well, these bits are my value". 12-bit would be relatively painless: you could break it into three 4-bit passes. You could break 10-bit down into five 2-bit passes. Ideally, you just want to read/write the largest possible chunk at a time, and THAT is where this gets complicated/painful.

@verdy-p
Author

verdy-p commented Apr 10, 2022

Reading/writing the largest chunk at a time requires memory and more CPU cycles to refill the cache. If we want speed, we need a limit on the size of data chunks, and we should avoid successive passes that each span the whole image. Instead these passes should be chained on each given chunk (e.g. chunks of at most 64K pixels, possibly less if there is a hardware limit on caches, so that we don't pay extra cycles). L1 data caches in modern CPUs range from about 4 KiB to 64 KiB; this is the most critical and smallest cache, and a good estimate of the maximum number of pixels to keep in the same chunk for each pass should take it into account. We can reserve at most 32 KiB for caching the image data kept for the next pass; with HDR raw images using at most 32 bits per pixel for all channels, this means chunks should not exceed 8 Ki-pixels, which is still a reasonable limit for keeping a good compression ratio without using too much memory or exhausting the CPU L1 cache too fast (even if there is an L2 cache, which is fast but still a bit slower). The cache effect shows up if you need more passes, but successive passes use far fewer cycles than the first pass, where the cache is empty (it does not contain any prior image data), so successive passes have a very small impact on theoretical performance. The real advantage is gained in the reduced output bandwidth offered by better compression: since we still get good compression even with multiple passes, we instantly save a lot of time for large images, for images with many channels, and for high color depths such as HDR, as well as a lot of I/O and delays to store the result or transmit it over a network. The massive number of wait cycles we gain on the output far exceeds the few cycles we lose by running several passes over the same data chunk.
As QOIF is all about performance (while preserving fidelity through lossless compression), it is still worth noting that better handling of images with few colors (64 or fewer) deserves consideration, because QOIF as it stands generates too many useless data bits, and it still cannot handle high color depths correctly.
Note that I have not proposed variable-width compression like Huffman, which uses bit patterns requiring a lot of bit shuffling and suffers from other problems such as keeping data aligned.

Likewise, Huffman (or arithmetic coding, a more advanced variant using fractional rather than integer bit lengths) requires a first pass over the image to gather statistics; but these statistics could also be computed per chunk, or updated "on the fly" until some limit is reached and they are frozen for the rest of the image. Updating statistics on the fly is complex to program, whereas it is much simpler to optimize when handling data by chunks (and, if needed, the next chunk can inherit statistics computed for the previous chunk so it does not start with zero knowledge). The same can be said of dictionary-based compression (Lempel-Ziv-Welch or similar), which is also a much better extension of the RLE compression that is all QOIF uses (with a poor compression level) on top of a limited diff-based compression (which is where it gets most of its compression, except for diagrams/schemas with few colors, where RLE applies more often but still poorly, because it wastes a lot of bits on images with few colors).
In summary, QOIF is a good start and may be interesting for a small subset of images, but it is not general enough and has too many caveats that will occur frequently in practice. This means it cannot (for now) be described as a "replacement" for PNG. It even fares worse than other lossless formats like TIFF.
And I'm not convinced that it will soon replace the existing "RAW" image formats used by cameras in photography, or in professional video editors running on fast, costly hardware and using a lot of energy. For image creators/designers/artists, it is safer to work on raw image formats (and they don't need very high I/O bandwidth when they just load and save images and work on a small visible portion of the image to apply manual, progressive modifications; if they try some computed effect on the image, they don't need to see the result instantly, or to be able to roll back or change effect settings instantly when applying them to the whole large image, at the risk of the effect being very lossy).
But a lossless compression with low impact on performance and a visible effect on total file size will always be beneficial. It has to be designed, however, to be general enough to apply to various types of images with different needs in terms of pixel resolution and color depth (natural photography, graphic arts, schemas/diagrams, icon sets, facsimiles of documents...). And a lossless compression should also be able to handle different color schemes (in different colorspaces): linear RGB or sRGB is not the best for everything even if it is general (and QOIF currently limits them to only 8 bits per channel: often wasteful, lossy for HDR, and almost always wasteful if we add an alpha channel coded in only 8 bits).

@verdy-p
Author

verdy-p commented Apr 10, 2022

About 40 years ago I wrote an image coder in 8086 assembly language (running on 80286 hardware). It coded grayscale images from cameras and did not use any other hardware. It was meant to capture "live" video from remote cameras connected to a network (at the time over a 64 kbit/s data channel on ISDN). The algorithm even allowed a simple way to detect movement and send alarms (the network connection was also secured and monitored remotely), and it could store video locally and replay it, even from before the alarm was sent and the captured images were viewed. It was successfully used in railway stations, restricted areas of airports, and logistics centers, and it was vastly better than every other existing system, which could only take a single snapshot and could not replay it.

At that time, JPEG did not even exist, there was no hardware acceleration in CPUs or graphics cards, and there were still no dedicated camera chips.

There was a proposal for what would ultimately become JPEG. But I used other techniques to compress the image; notably I successfully created my own format, based on 2D geometric analysis: an improvement over RLE and differential coding, where the image was scanned in BOTH vertical and horizontal directions at the same time.

The concept was simply to generate a series of rectangles (of variable width and height), initially aligned from the top of the image, then continuing the scan to fill the "holes" left behind these rectangles, starting with the holes nearest the top.
It was very successful and did not even require JPEG or anything similar; in fact it compressed better than JPEG alone, and faster (which was important because it had to run on small hardware and use very little energy so as not to exhaust the battery too soon: the goal was to build a remote surveillance camera).

Later I improved the computing speed, first by implementing Huffman compression (and later arithmetic coding, but the gain was very modest while the code was much more complex and hard to debug), and the DCT transform (similar to what would later be used in JPEG). Almost all of it was done in assembly language (tests with the C compilers of the time showed that they did not optimize sufficiently, or that the encoder used too much energy, and we were concerned about battery life: even though the remote cameras were DC-powered, they had to survive a DC power failure for at least several hours or days while keeping the system alive, and we did not want to require very large batteries, as the system had to fit in a box of about one kilogram so that it could be installed anywhere, in rather inaccessible places, not necessarily on the floor, on a mast, possibly under a light roof).

I had also already applied the differential transform to captured video, compressing it in fixed groups of frames (using only a few base frames, which could easily be used to locate a timecode, for example one base frame every second, all other frames being differential). This also gave a dramatic reduction in output size and a huge reduction in CPU cycles and energy used, which allowed gains in resolution and, finally, the addition of some hue coloring (as this was used for remote surveillance, high color fidelity was not a real goal, but we had to find ways to handle the noise, notably from the temperature of the CCD sensors, which depended largely on the refresh rate; it was easier to manage that noise by improving the resolution and using better sensors that consumed less energy, and having some color was secondary even if it could help eliminate some noise and better detect shapes and movement).

That system worked for years, until Chinese manufacturers developed the first cameras embedding hardware chips for compressing photos (JPEG-based) and later video. Now you can find cheap USB- or Wi-Fi-connected cameras everywhere in shops, at very low cost, that run on just a small battery and that you can place anywhere.

But I'm convinced that lossless compression is not dead, and my old idea of 2D geometric analysis can still outperform the basic ideas based on RLE+DIFF applied in only one dimension.

@phoboslab
Owner

There's some good ideas here and I enjoy this discussion (and the stories!). I just want to be clear about my intentions before anyone puts more work into "extensions" of the format:

For better or worse, I consider the QOI format to be finalized. I don't particularly like the idea of a versioned format or an extension. It was a conscious decision to not have a version number in the header. My goal was to have a simple format that always works. A QOI file is a QOI file is a QOI file.

Any improvements and extensions of QOI should be rolled into a format with a different name.

@Brian151

@verdy-p
What I meant is reading/writing the largest possible chunk of a bitfield of bits. That's not something that tends to be pretty or efficient: since computer memory is designed to access 8 bits minimum, it becomes challenging to deal with any values that don't align at 8-bit boundaries. 12-bit byte-aligns at 24 and 48 bits; 10-bit byte-aligns at 40 and 80 bits. You can cheat with 12-bit by reading or writing 32 to 64 bits from/to the buffer at a time (writing can be more complex if there's other data in the way, whereas reading can simply ignore it). You could read/write 128 bits for 10-bit, sure. But really, you'd be better off writing/stealing a general-purpose variable bitfield solution that might be useful for other projects down the line. I wrote the decoding half for one of my projects, boy was that fun! Still got to write the encoding half... [plus all the other crap this project needs]
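
For what it's worth, a general-purpose (if slow, bit-at-a-time) MSB-first bit reader can be as small as this sketch in C; a faster version would fetch whole bytes or words at a time:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    const uint8_t *buf;    /* byte buffer being read */
    size_t         len;    /* buffer length in bytes */
    size_t         bitpos; /* current position in bits from the start */
} bitreader_t;

/* Reads the next `nbits` bits (1..32), MSB-first, crossing byte
 * boundaries as needed; returns them right-aligned. 10- or 12-bit
 * samples fall out of the same code path as 8-bit ones. Reads past
 * the end of the buffer yield zero bits. */
static uint32_t read_bits(bitreader_t *br, int nbits) {
    uint32_t value = 0;
    for (int i = 0; i < nbits; i++) {
        size_t byte = br->bitpos >> 3;
        int    bit  = 7 - (int)(br->bitpos & 7);
        uint32_t b  = (byte < br->len) ? ((br->buf[byte] >> bit) & 1u) : 0;
        value = (value << 1) | b;
        br->bitpos++;
    }
    return value;
}
```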

@phoboslab
sound reasoning tbh
alternatively, there are formats that allow extensions to just be added to the file directly
PNG and TIFF both work that way, RIFF was purpose-built to do this
and, it also gets messy!

i'm curious/anxious what the W3C would do to QOI if they actually consider adopting it, honestly...

@verdy-p
Author

verdy-p commented Apr 15, 2022

So you think this image format is final. To really finalize it, you must still specify that it is not a general-purpose format and that it MUST NOT be used for images that use colorspaces other than RGB or RGBA, or whose R/G/B/A channels use more than 8 bits per channel, or for any image that uses 64 colors or fewer: all these images must use other formats.

And so you need to conclude that this is absolutely NOT a replacement for PNG, which supports all of these (including support for ICC color profiles, HDR images, and grayscale images). Even the old basic PBM formats used in Linux/Unix and X11, or the BMP, ICO and CUR formats used in Windows, will always be better (notably for icons or icon sets, where PNG is still a universal format). Note that PNG performance largely depends on the compression options used to create the file: you are not required to use compression in PNG. Likewise, TIFF is still perfect for lossless captures (it also supports metadata for copyright/licence/authors, and high bit depths for color channels, up to 64 bits per channel; compression is also optional if you want encoding or decoding performance, depending on the I/O bandwidth of the storage, network, or memory, for which a light compression may cost some CPU or GPU cycles but bring a large gain in I/O bandwidth, meaning much less frequent waiting on I/O completion).

@ratchetfreak

And being so general, with a bunch of options, is what makes those image formats a pain to decode and handle.

QOI was meant from the start to be simple, straightforward, and fast to decode. Some sacrifices were made for that purpose, such as dropping support for >8-bit color depth, and accepting poor compression of partially transparent gradients.

@verdy-p
Author

verdy-p commented Apr 16, 2022

TIFF is very fast to decode: since you can choose the sub-format, you can perfectly well use one of the options that compresses/decompresses very fast and is also lossless. With those options fixed, you don't need a lot of code.

Anyway, once you start adding support for QOIF plus support for the other formats that are also needed (notably PNG, GIF, JPEG or ICO), you end up with a large library, while in fact you could just select one universal format to support everything and benefit from a wide choice of image sizes and qualities and a large choice of image editors or generators. Almost all of them support the common PNG, GIF, JPEG and TIFF formats (and many also support PBM and Windows BMP... not to be confused with WBMP for monochromatic images, standardized by the people who wrote the JPEG standard for ISO, and used mostly for archiving scanned documents on mobile readers/tablets for e-books, for improving the quality and speed of transmission on modern faxes and scanners, for reducing the size of scans sent by email, or as an encoding option inside PDFs; the ICO/CUR format is quite specific to Windows editors; the remaining common format for scanned documents and books is DjVu).

If you really need high decoding speed in applications (for example textures used for 3D graphics in games), none of these formats is suitable: you'd better use the formats supported natively by graphics accelerators or their drivers, and definitely not QOIF, which would use too much memory or bus bandwidth for textures. Given that display drivers are already extremely complex pieces of code, you don't need to add yet another layer for images and textures with custom code written for the CPU that won't benefit from GPU acceleration and from the huge optimization efforts constantly made in graphics drivers to improve their performance: these drivers already recognize the most common formats like JPEG and PNG, and also accept uncompressed raw images (frame buffers, also used by accelerated video codecs); they tend to prefer image formats that are universal in terms of quality (resolution and color accuracy).

@ratchetfreak

QOI has been finalized without any option for a version bump. So there is no chance that your proposal will get into the format

I'm glad you got the wake-up call that image compression isn't all that complicated, but QOI isn't the place for you to bikeshed about it.

However nobody is stopping you from making your own image format that does support all you want it to support, you can even base it on the QOI spec (being released as public domain).

@shuckster

@verdy-p Check out this blog post by the author: https://phoboslab.org/log/2021/11/qoi-fast-lossless-image-compression

Especially the "Why?" section.

I think it's very revealing about the intent of the format, despite it going through some changes since the post was written.

From what I have understood myself, QOI is for devs, not users, and serves as an example of how to achieve PNG-like compression in around 300 lines of code that can be grokked by practically anyone.

This is quite a different goal to being all things to all devices and users, but there's no reason why it couldn't serve as a basis for future formats that achieve that.

@verdy-p
Author

verdy-p commented Apr 16, 2022

Then, if you think about developers, this kind of research on a specialized type of image is too limited to be usable. Extensive research has been done for a long time (remember my post above: I wrote my own image coder about 25 years ago, when all formats were proprietary, including GIF, still covered by patents, and JPEG did not even exist). My goal was already efficiency, because we needed low energy consumption, CPUs at that time were much slower than today, and memory was expensive and not as fast as it is today. Let's not even speak of network bandwidth: 64 kbps connections were still expensive on ISDN, and modems on analog POTS lines capped at much slower speeds, about 10 kbps at best. Cameras were also still all analog, used a lot of energy, and needed A/D converters and software-only solutions; their resolution was also limited by the old PCI buses, RAM, and old PATA disks. Maximizing I/O throughput already required at least some compression if we wanted to stay lossless for images that were already limited in resolution/colors/framerate, at the bare minimum to be usable for the intended purpose, i.e. remote surveillance; it was not acceptable to lose more detail, and we needed ways to capture images faster and to control the remote cameras (zoom, focal length, exposure time, sensitivity) faster, so as to mitigate the defects already limiting the captured "raw" images.
Image formats are always a tradeoff between multiple limiting factors, and "performance" cannot be measured by raw CPU speed alone, but also by how many other external resources are needed (layered caches, memory, buses, storage, networking, and power usage). These tradeoffs also need to evolve over time (JPEG, for example, is the result of an old tradeoff that may no longer be pertinent today: that's why we need open formats; PNG is also a step, just like MPEG, or WebM and other modern codecs used today). And let's not forget that we now have better hardware, notably vectorized instruction sets in CPUs, GPUs, and many dedicated chips in cameras, and we are no longer limited at all in image editing applications: supporting more formats is not a problem, and we have lots of good general-purpose libraries with excellent and fast implementations, largely integrated and supported by OSes and many devices. All those modern codecs outperform what is done in QOIF, even with its limited goal, and the claims of "performance" are in fact wrong, largely biased by specific measurements for very limited usages.

@shuckster

Are you arguing that QOI is useless because it will advance no further in this repo?

@phoboslab
Owner

So you think this image format is final. To really finalize it, you must still specify that it is not a general purpose format and that it MUST not be used for images that use colorspaces that are not RGB or RGBA, or whose R/G/B/A channels use more than 8 bits per channel (...)

Specifying a thing by saying what it is not, is not a good way to specify anything.

MUST not be used (...) or for any image that uses 64 colors or less: all these images must use other formats.

You can use QOI to encode images with 64 or fewer colors. Could there be a more optimal encoding for very specific kinds of images: certainly.

(...) this kind of research on a specialized type of image is too limited to be usable.

The solution is then not to use it.

(...) all those modern codecs are outperforming what is done in QOIF, even with its limited goal,

But can they be implemented in 300 LOC? :^)

and claims of "performance" are in fact wrong, largely biased by specific measurements for very limited usages).

How so? The QOI benchmark is very explicit in what it is.

All in all you seem to dislike QOI. My question then is: what are you doing here?

@Brian151

@phoboslab
it helps to specify both tbh
i have some stuff, myself, and it's as much about what it is as what it is not
don't wish to mis-lead people

i can actually think of some formats expressly designed for this. looking way back at retro formats is actually a good start: those had to be insanely efficient, and they were technically hardware-compatible in their day. modern hardware, ofc, stripped us of any ability to natively decode them. a real pity. software codecs for them aren't too bad, at least...

yep!
seems like many noteworthy projects decided it was worth its time
on that note, i'll be filing an issue and/or PR someday, myself...
as much as i like PNG, i don't like its complexity

they cannot. and i don't believe they're truly open, either. a part of being open is being open to suggestion [within reason]. you and i both seem to be aware how that goes with these large committees/consortiums, i've read your blog not only about this format, but your HTML5 endeavors.

can't entirely tell what the complaint is
indeed, why?

now, my question to you is, why don't you lock this issue, already?
seems it's run its course and is now just becoming a place for everyone to scream their heads off instead of making any positive contributions to anything.
