The final(?) specification #48

phoboslab · 2021-11-29T14:39:32Z

I want to apologize.

I may have been too quick with announcing the file format to be finished. I'm frankly overwhelmed with the attention this is getting. With all the implementations already out there, I thought it was a good idea to finalize the specification ASAP. I'm no longer sure if that was the right decision.

QOI is probably good enough the way it is now, but I'm wondering if there are things that could be done better — without sacrificing the simplicity or performance of this format.

One of these things is the fact that QOI_RUN_16 was determined to be pretty useless, and QOI could be become even simpler by just removing it. Maybe there's more easy wins with a different hash function or distributing some bits differently? I don't know.

At the risk of annoying everyone: how do you all feel about giving QOI a bit more time to mature?

To be clear, the things I'd be willing to discuss here are fairly limited:

I don't want more features (higher bit depth, custom headers, more meta info...).
I'm generally against any ideas that make the format more complex (e.g. mode switches, transforms into YUV colorspace...)
I don't want to make decoding of the chunks to be dependent on some information in the header (e.g. different behaviours for 3 or 4 channels)

What I'm looking for specifically is:

Changes that make the format simpler
Changes that would yield better performance when en-/decoding
Changes that improve compression without making QOI more complex

Should we set a deadline in 2-3 weeks to produce the really-final (pinky promise) specification? Or should we just leave it as it is?

Again, I'm very sorry for the confusing messaging!

Edit: Thanks for your feedback. Let's produce the final spec till 2021.12.20.

The text was updated successfully, but these errors were encountered:

ikskuh · 2021-11-29T15:02:33Z

Things where i see potential to improvement:

Improve the "hash" function by incorporating all bits. Currently, 0x00, 0x40, 0x80, 0xC0, 0xFF all map to index 0 as the hash ignores the 2 most significant bits
Removal of QOI_RUN_16 seems like a good idea after reading the issue, which means that QOI_RUN_8 gets one additional bit for encoding 1..64
Making a 1 run encode with QOI_INDEX and increasing QOI_RUN_8 to 2..65 is probably not so bad either
I would remove the color space field in the header, but i have no strong feelings here

Apart from that, i'm already super-happy with the format, as it is very cpu-friendly in terms of branching (my impl has 0.9% branch misses which surprised me) and it looks like people already made a QOI video stream to the oculus quest 1 with 50 FPS, so it seems to be usable for such heavy-load use cases as well

And i agree that the format doesn't need to get any new features. It's really good in what it has already

oscardssmith · 2021-11-29T15:32:49Z

IMO, QOI version 1 is a really good first version. I think it makes sense to wait for a month or so before committing to a second version, since we want to make sure there is time to find good ideas.

nigeltao/qoi2-bikeshed#14 has some really good analysis of opcode frequency which will be very useful for optimizing opcodes. While I understand why you don't want different 3 vs 4 channel behavior, I think we should look carefully at it. Separating them can give a pretty easy 10% increase in compression ratio because you simply have shorter opcodes, so you can fit more data in. This comes at a minor complexity cost, but I think it is somewhat offset by the fact that it simplifies the QOI_COLOR opcode. The 4 bits for which of RGBA to store is a lot of complexity that only gets used 0.7% of the time (other than for removing the alpha channel)

laurelkeys · 2021-11-29T15:39:01Z

I would remove the color space field in the header, but i have no strong feelings here

I think it's really nice to be able to differentiate between linear encoded channels vs sRGB encoded RGB channels + linear alpha channel (as this is the most common case for "images in the wild").

However, I'd like to point out that having the default 0x00 case mean all channels are "gamma compressed" -- including alpha -- is quite error-prone. For instance, see floooh/qoiview#3 (and the linked PR adding QOI support to tev for more information on this).

I'm not sure what's the best way to handle this. IMO, 0x00 should mean QOI_SRGB_LINEAR_ALPHA, but maybe it's easier/simpler to "swap" the current meaning of the colorspace bits such that a 0-bit indicates linear and a 1-bit indicates sRGB (?)

This would lead to 0x00 == QOI_LINEAR, 0x01 == QOI_SRGB and 0x0f == QOI_SRGB_LINEAR_ALPHA, signaling that the choice of sRGB was intentional (i.e. the qoi_desc struct wasn't simply zero-initialized).

oscardssmith · 2021-11-29T15:57:46Z

In terms of a better hash function, I think the best approach would be to just interpret the RGBA value as a 32 bit uint, and perform an xor and multiply by 2 32 bit prime numbers. This should give much better randomization at very low cost.

While we're improving the index, it also might make sense to try adding a 16 bit instruction that is a combination of index with delta (QOI_INDEX_DELTA?). You could use 4 bits for the tag, 6 bits to store the index, and have 6 bits left to store the lower 2 bits of RGB values. This could be encoded and decoded relatively easily by having a second table where values are hashed by their upper 6 bits, which would allow O(1) lookup at the cost of a little memory.

I think this instruction would be really valuable for things like photographs and dithered images where there is a little noise that will lower the number of exact matches. Currently for these types of images, we use a 32 bit QOI_COLOR tag roughly 14% of the time, and QOI_DIFF_24 roughly 12% of the time. For images without alpha, this tag would reduce those sizes by 50% and 33% when used, which would be a really big advantage.

ikskuh · 2021-11-29T16:49:24Z

While we're improving the index, it also might make sense to try adding a 16 bit instruction that is a combination of index with delta (QOI_INDEX_DELTA?). You could use 4 bits for the tag, 6 bits to store the index, and have 6 bits left to store the lower 2 bits of RGB values. This could be encoded and decoded relatively easily by having a second table where values are hashed by their upper 6 bits, which would allow O(1) lookup at the cost of a little memory.

This sounds very complex to me and even if it might improve the compression rate, the compression speed will heavily suffer from it, as you now have to search the index array

In terms of a better hash function, I think the best approach would be to just interpret the RGBA value as a 32 bit uint, and perform an xor and multiply by 2 32 bit prime numbers. This should give much better randomization at very low cost.

True, i think we could go with something easier though, as multiplication is still costly on lower-end hardware. Maybe an adopepted Pearson hashing for 6 bit with a LUT of 64 byte, but the LUT makes my performance drop sense tingle. I think doing some xor-shifting while not discarding some bits might already be sufficient

oscardssmith · 2021-11-29T17:12:02Z

you now have to search the index array

No you don't. In compression the code is roughly

unsigned int pxtrunc =  px.v & 0xfcfcfcfc;
index_pos = QOI_COLOR_HASH(pxtrunc) % 64;
if (index_trunc[index_pos].v == pxtrunc) {
    unsigned int diff = (px.v - pxtrunc);
    unsigned char low_bits = diff | 0x3 | ((diff | 0x300)>>6) | ((diff | 0x30000)>>12)
    bytes[p++] = QOI_INDEX_DELTA | (index_pos>>2)
    bytes[p++] = (index_pos<<4) | low_bits
}
else {
    index_trunc[index_pos] = pxtrunc;
}

The key here is that you store a second index where the values are hashed based on their upper 6 bits rather than the whole value. This keeps the lookup at O(1) at the expense of some extra memory.

phoboslab · 2021-11-29T18:33:15Z

Ok, I guess we still have some things to iron out. Let's set the deadline for the final spec to 2021.12.20.

Thanks for your patience <3

andrewmd5 · 2021-11-29T18:42:50Z

Was there a technical reason the spec chose to adopt big endian over little?

oscardssmith · 2021-11-29T19:00:17Z

I think separate channels is likely a really bad idea. If you have separate channels, it becomes impossible to compress while keeping byte alignment which is really bad for performance. Delata coding also does increase the size by 1 bit since you go from 0:255 to -255:255, which again is probably really bad. Tiling might be a good idea.

phoboslab · 2021-11-29T19:00:30Z

Was there a technical reason the spec chose to adopt big endian over little?

No. Big endian seemed like the right thing to do for a file format. I guess I'll have to revisit #36 - but there's tradeoffs either way.

separate channels
tiling
use delta

All interesting ideas, but the result would be a different file format. Outside of the scope of what I'm willing to change here.

QOI_INDEX_DELTA (...) this tag would reduce those sizes by 50% and 33%

How did you arrive at these numbers? I did a quick an dirty test and got a ~3% smaller file size for the kodak images.

oscardssmith · 2021-11-29T19:05:05Z

for QOI_INDEX_DELTA, I just meant that when it's applicable, it replaces a 3 or 4 byte value with a 2 byte value, not that that would be the effect over the whole image. For the test, did you use the current r^g^b^a or one of the more complicated hashes described above?

oscardssmith · 2021-11-29T19:07:26Z

It does expand the file though. See my above comment.

gamblevore · 2021-11-29T19:11:01Z

I think separate channels is likely a really bad idea. If you have separate channels, it becomes impossible to compress while keeping byte alignment which is really bad for performance. Delata coding also does increase the size by 1 bit since you go from 0:255 to -255:255, which again is probably really bad. Tiling might be a good idea.

No... thats not how it works. In theory, yes, but in practice no. I guess it wraps around. You just consider that from 255 to 0 you add 1. Or from 0 to 255 you subtract 1. So its like a loop of 0-255 values. Imagine them wrapped around a tube.

phoboslab · 2021-11-29T19:18:22Z

For the test, did you use the current r^g^b^a or one of the more complicated hashes described above?

Since the lower 2bits are omitted for the trunc_index I did:
trunc_index_pos = (px.rgba.r >> 2) ^ (px.rgba.g >> 2) ^ (px.rgba.b >> 2) ^ (px.rgba.a >> 2);

oscardssmith · 2021-11-29T19:28:43Z

Can you try a version with trunc_index_pos = 0x1071b495*(px.v & 0x92458355)+0xcb533df9;? I think it will give much better mixing. (Similarly, testing index_pos = 0x92458355*px.v+0xcb533df9; would be interesting). The magic constants are randomly chosen 32 bit primes.

gamblevore · 2021-11-30T09:14:25Z

I think separate channels is likely a really bad idea. If you have separate channels, it becomes impossible to compress while keeping byte alignment which is really bad for performance.

How?

normally this: RGBA,RGBA
now this: RR, GG, BB, AA

8 bytes in both approaches. Where is the loss of byte alignment?

ikskuh · 2021-11-30T09:39:16Z

normally this: RGBA,RGBA
now this: RR, GG, BB, AA

This means you have to sweep the output memory four times instead of one time, hurting your cache locality a lot. One of the reasons why QOI is so fast is that it will touch every memory cell in both input and output exactly once. If we change this, we will get huge performance losses which aren't recoverable by any "win" in the compression rate

gamblevore · 2021-11-30T13:03:55Z

Decompress rate is more important usually. Compress rate should still be fast enough. An operation as "channel splitting" could work at near memcpy speeds.


// restrict means the pointers dont overlap, so the compiler is free to do more opts
void channel_split(uint* restrict Px, int n, u8* restrict R, u8* restrict G, u8* restrict B, u8* restrict A ) {
    while (n-->0) {
        uint P = *Px++;
        *R++ = P;
        *G++ = P>>8;
        *B++ = P>>16;
        *A++ = P>>24;
    }
}

A good compiler schedules RGBA writes, so they all happen in the same cycle. Also would unroll the loop.

You won't lose much more speed than an memcpy would have lost you.

slembcke · 2021-11-30T19:17:44Z

Letting it mature sounds like a good idea I guess. Though on the other hand, even if you had 3 different revisions of the format it would still be relatively simple. ;)

Maybe the best way to go about it is to encourage people to use it as a simple intermediate format for now and NOT a long term storage format? Undoubtedly there will be subtle issues that will change your mind about how qoi should work, or even what it is. You've obviously found a nice local maxima in the design space, give it a little time to climb all the way to the top. :)

I don't personally care about compatibility because I'm just using it as an intermediate format. Qoi's defining trait that made me say "why not?" was it's tiny, hackable implementation. It was one liner make rule to drop it into my asset pipeline, and a nearly 1:1 replacement for stb_image with some (mild) benefits. Why not indeed. Being easily hackable, I even pushed the alpha pre-multiplication into the encoder and don't have to care whether or not it's considered "standard" or not. (Though I am curious if there's a technical reason for that beyond "keep it simple", which is a good reason)

mgmalheiros · 2021-11-30T19:47:15Z

As a humble suggestion to better define the scope of this format, perhaps it could be restated as:

"a simple and fast run-length encoding of byte groups"

Thus, if a group is actually a RGB triplet, then it is outside of the format scope to specify the colorspace (that's semantic info which should be somewhere else). Likewise, if the groups are RGBA quads, the actual alpha format is up to the application using it.

Following my naïve interpretation, there are only three info fields needed in the header besides the magic: width, height and channels.

I believe these are useful as a convenience (when holding images, you got the resolution and channels) but actually required when pre-allocating the buffer where decoding will happen. And the number of channels is needed as the "byte group" size for hashing/encoding/decoding.

Therefore, covering the 1 and 2-channel cases could be both very simple to support and useful, as the format would be able to hold grayscale images, indexed pixelart (actual palette info is outside the scope), tilemap info and 2-channel textures (common in certain Physically-Based Rendering pipelines).

Just my two cents.

phoboslab · 2021-11-30T21:05:33Z

In the experimental branch I have removed QOI_RUN_16 and added a new operation, based on the idea that changes in luma would affect all three RGB channels in the same direction: QOI_GDIFF_16 (needs a better name).

 - QOI_GDIFF_16 ------------------------------------
|         Byte[0]         |         Byte[1]         |
|  7  6  5  4  3  2  1  0 |  7  6  5  4  3  2  1  0 |
|-------------+--------+-------------------+--------|
|  1  1  0  1 |  dr-dg |    green diff     |  db-dg |

4-bit tag b1101
3-bit   red channel difference minus green channel difference -4..3
6-bit green channel difference from the previous pixel -32..31
3-bit  blue channel difference minus green channel difference -4..3

The green channel is used to indicate the general direction of change and gets
a few more bits. dr and db base their diffs off of the green channel diff. E.g.
  dr = (last_px.r - cur_px.r) - (last_px.g - cur_px.g)

Encoded sizes (avg) kb

	master	experimental
wallpaper	10640	10170
kodak	771	700
misc	400	407
textures	184	179
screenshots	2582	2491

Screenshots and misc suffer a bit from the removal of QOI_RUN_16. Sadly, misc also doesn't gain much from this new op. It seems to be best for photos or "natural" images.

I have also aligned the chunk prefixes so that they are either 2-bit or 4-bit. No more 3-bit codes. This could probably lead to some performance improvement for the decoder.

#define QOI_INDEX     0x00 // 00xxxxxx
#define QOI_RUN       0x40 // 01xxxxxx
#define QOI_DIFF_8    0x80 // 10xxxxxx
#define QOI_DIFF_16   0xc0 // 1100xxxx
#define QOI_GDIFF_16  0xd0 // 1101xxxx
#define QOI_DIFF_24   0xe0 // 1110xxxx
#define QOI_COLOR     0xf0 // 1111xxxx

Note that I re-aranged the if statements in the encoding function to make it easier to experiment, but it makes encoding a bit slower.

I think that's an easy win!? The new operation is very easy to implement with just two more subtractions.

A rather unsuccessful experiment was to encode the "acceleration" of change for each channel instead of the "velocity" (diff) of change. It helps, but not much. QOI_GDIFF yielded far better results.

We also really need a better test suite of images. As noted elsewhere, the only images with an alpha channel are the few ones in misc/. But I also believe that dice.png and fish.png are rather "artificial" examples. I would assume that most textures used in games (and also most use-cases for PNG on the web) have sharper edges for alpha, rather than this semi transparent gradient presented in those two files.

Anyway, I probably won't have time to work much on this in the next few days, but I'm hyped to get back to it :)

oscardssmith · 2021-11-30T21:17:12Z

If I made a PR to experimental to update the QOI_RUN_8 to an implementation more like #41 where repeated RUN_8 instructions are used more efficiently, would you be willing to merge it? Doing so should be a notable speedup for decompression, and give better file size.

nigeltao · 2021-12-01T00:23:03Z

In the experimental branch I have removed QOI_RUN_16 and added a new operation, based on the idea that changes in luma would affect all three RGB channels in the same direction: QOI_GDIFF_16 (needs a better name).

If we're throwing experiments out there, see also nigeltao/qoi2-bikeshed#20

toddjonker · 2021-12-01T02:18:24Z

This sounds very complex to me and even if it might improve the compression rate, the compression speed will heavily suffer from it, as you now have to search the index array

It sounds attractive to have a simple-to-understand and fast-to-decode format that allows for more complexity (and innovation) on the encoder side. If a clever encoder wants to spend a lot of wall-clock time to get great compression, that doesn't harm the simplicity of the standard format.

amber8706 · 2021-12-01T06:30:43Z

Hi! I like the idea of this format. I would like to see an optional section for metadata as part of the format. In general terms, it would be great to store additional information about the image into format. I consider the metadata section as optional in specification. The "general" metadata properties could be the following: date and time of image creation, location, author, general description, etc. In addition to the "general" metadata, it should be possible to add any other metadata in the "key=value" format.
If the metadata section is too large, it should be possible to compress it (for example, using the LZMA method or PPM etc.)
I repeat: the metadata section is optional. The operation of the QOI format should not depend on it.

notnullnotvoid · 2021-12-01T07:53:26Z

If you need real-world alpha textures for the benchmark suite, I have a whole game's worth of high-res painted alpha sprites you could pick through and freely use. The slowness of the PNG format was a huge impediment in our asset pipeline, so having a fast format that performs reasonably well on sprites with transparency is something I'm quite personally invested in. Let me know and I can send a big zip of the relevant images.

rmn20 · 2021-12-01T08:02:07Z

I think that abandoning 3-bit indexes can be a bad move, as it reduces deltas size in DIFF_16, which is one of the most used opcodes at the moment. (Maybe GDIFF_16 can work even better with 544 DIFF_16?). Another way might be to remove DIFF_16 at all and enlarge the GDIFF_16 deltas to 464, 473 or 373 this way.

wbd73 · 2021-12-18T07:30:52Z

@phoboslab
It looks like this simpler hash function

#define QOI_COLOR_HASH(C) (C.rgba.r + C.rgba.g*3 + C.rgba.b*5 + C.rgba.a)

performs almost as good when I try.
In case where multiplication is costly, it could be calculated like

#define QOI_COLOR_HASH(C) (((C.rgba.g + C.rgba.b) << 2) + C.rgba.r - C.rgba.g + C.rgba.b + C.rgba.a)

but also when working with a lookup table it is faster since you only have to lookup two values instead of four.

wbd73 · 2021-12-18T17:00:31Z

At the moment the implementation mentions

The 8-bit tags have precedence over the 2-bit tags. A decoder must check for the presence of an 8-bit tag first.

Since the format will not change when finalized, would you consider to change this requirement into

The 8-bit tags have precedence over the 2-bit QOI_OP_RUN tag. A decoder must check for the presence of an 8-bit tag before checking the QOI_OP_RUN tag.

I can't see a problem with checking the other 2-bit tags before the 8-bit ones.
It would give some more freedom when writing a decoder.
If you optimize for speed, starting with the most common tags might help.

phoboslab · 2021-12-18T17:19:18Z

I don't understand. The only tag that can collide with QOI_OP_RGB and QOI_OP_RGBA is the QOI_OP_RUN, so you're already free to do that.

nigeltao · 2021-12-19T00:20:47Z

Copy/pasting from nigeltao/qoi2-bikeshed#28 "Demo 20: a Proof of Concept"

Starting from commit aefa0f7 (2021-12-17) from the phoboslab/master branch, qoi-op-a adds an QOI_OP_A (like the old QOI_OP_COLOR(A)), replacing "QOI_OP_RUN with run length 62".

qoi-demo20 builds on that by:

using a FIFO color cache (instead of a hashed color cache), like qoi-fifo2e (Hash function nigeltao/qoi2-bikeshed#19 (comment)), except that we get all of the encoding speed back by re-introducing hashing just for the encoder. Different encoder implementations can bikeshed on the particular hash function as much as they want, but hashing is kept out of the file format specification (simpler) and out of the decoder implementation (faster). No SIMD required.
the color cache's initial state was tweaked to include opaque black (consistent with the px_prev initial value, a la Inital state of index array is undocumented #75 (comment)). "QOI_OP_RUN with run length 1" is now redundant (there's always an equivalent QOI_INDEX op), and while it's not in qoi-demo20, we could re-assign the opcode (or bump run lengths by 1).
various optimizations like qoi-demo2e (Recent changes in experimental branch #71 (comment)), an optimized version (with no file format changes) of phoboslab/master commit 2ee2169.

The images test suite is the 'official' one from #69 and images/icon_512 within that is highlighted for decoder/encoder speed gains on top of phoboslab/master. The images-lance test suite comes from #69 (comment) and qoi-demo20 shows a 1.13x compression ratio gain (qoi-demo21 shows 1.22x) on top of phoboslab/master.

        decode ms   encode ms   decode mpps   encode mpps   size kb    rate

## Total for images/icon_512
qoi-demo2e:   0.9         1.4        300.26        183.00        85    8.4%
qoi-aefa0f7:  1.6         1.6        165.18        167.23        85    8.4%
qoi-op-a:     1.4         1.2        180.92        213.11        79    7.8%
qoi-demo20:   0.8         1.2        322.69        227.86        79    7.8%
qoi-demo21:   0.9         1.3        290.54        197.44        77    7.6%

# Grand total for images
qoi-demo2e:   3.8         5.3        121.20         87.21       463   25.6%
qoi-aefa0f7:  4.8         5.6         97.28         83.08       463   25.6%
qoi-op-a:     4.7         4.8         98.25         96.80       462   25.5%
qoi-demo20:   3.6         4.9        127.71         95.65       459   25.4%
qoi-demo21:   3.8         4.9        123.34         95.26       458   25.3%

# Grand total for images-lance
qoi-demo2e:  26.9        30.8        151.25        131.97      2109   13.3%
qoi-aefa0f7: 35.4        32.4        114.61        125.29      2109   13.3%
qoi-op-a:    35.8        29.1        113.38        139.46      1862   11.7%
qoi-demo20:  27.1        28.2        150.12        143.97      1859   11.7%
qoi-demo21:  27.7        31.4        146.73        129.50      1722   10.9%

qoi-demo20.h.txt

Edit: added qoi-demo21 per comment below.
qoi-demo21.h.txt

notnullnotvoid · 2021-12-19T04:22:25Z

That's brilliant! I see absolutely no reason not to adopt the fifo index queue now.

However, I would strongly advocate for using QOI_OP_DIFF2 and QOI_OP_DIFF4 as suggested here instead of (or in addition to?) QOI_OP_A, since they get consistently better compression.

Here is a summary comparison of the different proposals for handling alpha:

## Total for images/icon_512
        decode ms   encode ms   decode mpps   encode mpps   size kb   vs rgba   vs raw   vs disk
qoi           0.8         1.1        315.22        243.74        85      8.4%     8.4%     1.67x
op_a          0.8         1.1        322.02        240.00        79      7.8%     7.8%     1.56x
qoi4          0.8         1.1        322.12        243.41        80      7.8%     7.8%     1.57x
qoi24         0.9         1.1        304.68        238.29        78      7.6%     7.6%     1.53x

# Grand total for images
        decode ms   encode ms   decode mpps   encode mpps   size kb   vs rgba   vs raw   vs disk
qoi           2.6         3.1        180.24        149.26       463     25.6%    28.2%     1.18x
op_a          2.5         3.1        185.55        149.51       462     25.5%    28.1%     1.18x
qoi4          2.6         3.1        178.13        148.26       462     25.5%    28.1%     1.18x
qoi24         2.7         3.2        172.95        146.52       462     25.5%    28.1%     1.18x

# Grand total for images-lance
        decode ms   encode ms   decode mpps   encode mpps   size kb   vs rgba   vs raw   vs disk
qoi          16.8        20.8        242.42        195.31      2109     13.3%    13.6%     1.49x
op_a         17.5        21.3        232.59        190.84      1862     11.7%    12.0%     1.31x
qoi4         17.2        21.4        236.48        189.84      1857     11.7%    11.9%     1.31x
qoi24        18.2        21.9        222.68        185.19      1786     11.3%    11.5%     1.26x

qoi is the current implementation, op_a uses the proposed QOI_OP_A, qoi4 uses only the proposed QOI_OP_DIFF4, and qoi24 uses both QOI_OP_DIFF2 and QOI_OP_DIFF4.

wbd73 · 2021-12-19T05:38:53Z

@phoboslab

I don't understand. The only tag that can collide with QOI_OP_RGB and QOI_OP_RGBA is the QOI_OP_RUN, so you're already free to do that.

Probably just a misunderstanding. I was under the impression any decoder had to follow the direction mentioned in the original source code to first check for all 8 bit tags. If it's okay to deviate from that it's fine.

nigeltao · 2021-12-19T09:52:05Z

However, I would strongly advocate for using QOI_OP_DIFF2 and QOI_OP_DIFF4

I have updated my previous post to do that (called qoi-demo21).

phoboslab · 2021-12-19T10:00:09Z

I'm sorry, but I don't like any of these tradeoffs. I believe QOI is at a very sweet spot right now. The compression gains shown don't justify the added complexity.

Pending any last minute emergencies, the current spec will be the final one.

chocolate42 · 2021-12-19T10:27:51Z

Nigeltao's FIFO reduces decode complexity, increases compression, and barely increases encode complexity.

Wulf0x67E7 · 2021-12-19T10:34:16Z

@phoboslab What complexity? QOI_OP_A, QOI_OP_DIFF2, and QOI_OP_DIFF4 are all extremely simple and straightforward ops, and the implementation-cached FIFO index actually removes complexity from the spec.

On top of that, they increase the compression rate of the type of images qoi struggles the most with, those with complex alpha, by almost 20% ((1-1722/2109)*100), which is most definitely significant!

wbd73 · 2021-12-19T10:37:32Z

Nigeltao's FIFO reduces decode complexity, increases compression, and barely increases encode complexity.

That was also the one I hoped would make it.
I thought it was a great idea to use a FIFO queue with a hash for encoding only.
On the other hand I also understand you don't want to make significant changes a day before finalizing a file format.

HappySeaFox · 2021-12-19T10:45:57Z

Why not to move the deadline then? 😀 Imho it's totally ok. Of course it's up to @phoboslab

phoboslab · 2021-12-19T10:50:49Z

FIFO doubles the compression time and puts more burden on any implementation trying to get that time back. I like the fact that even a "naive" implementation of QOI is still a good one.

I will not have a discussion about more OPs, sorry.

Feels like we're back at last month, where I piss of a bunch of people with very cool ideas. But this time I have a bit more confidence to say: I'm happy with the current state of QOI and all the things we learned should be rolled into a new format.

chocolate42 · 2021-12-19T10:56:53Z

The FIFO we are discussing does not double the compression time, you are misunderstanding what the recent development as of demo20 does.

phoboslab · 2021-12-19T11:17:17Z

The "naive" implementation did double compression time.

But I have to admit, not having to specify any hash function is enticing... I will do some testing!

chocolate42 · 2021-12-19T11:23:08Z

Here's my simple proposal refined with demo20's FIFO:

# Grand total for qoi_benchmark_suite/images
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         6.558      77.415         70.78          6.00       398   24.2%
stbi:           6.498      57.981         71.43          8.01       561   34.2%
qoi-master3a:   2.140       2.619        216.94        177.20       463   28.2%
qoi-propA-fifo: 2.075       2.671        223.75        173.81       446   27.2%

By reworking existing ops slightly and adding a single op (a 3 byte RGB encoding that often means we avoid a 4 byte QOI_OP_RGB encoding), compression is significantly improved, decoding is slightly quicker, and encode is slightly slower.

#define QOI_OP_RGB3   0x00 /* 00xxxxxx */
#define QOI_OP_DIFF   0x40 /* 01xxxxxx */
#define QOI_OP_LUMA   0x80 /* 10xxxxxx */
#define QOI_OP_INDEX  0xc0 /* 110xxxxx */
#define QOI_OP_RUN    0xe0 /* 111xxxxx */
#define QOI_OP_RGB    0xfe /* 11111110 */
#define QOI_OP_RGBA   0xff /* 11111111 */

#define QOI_COLOR_HASH_SIZE 1024
#define QOI_COLOR_HASH(C) ((C.rgba.r*3 + C.rgba.g*5 + C.rgba.b*7 + C.rgba.a*11) & 1023)

For more details: #91

notnullnotvoid · 2021-12-19T11:35:27Z

@phoboslab

An exhaustive search of the FIFO does slow down compression, but you don't need to do an exhaustive search. You can gain all that speed back by using a hash, much like what the current version does. But with the FIFO, that extra bit of complexity is optional instead of mandatory, and the decoder never sees it. I would agree that the FIFO index queue removes complexity overall.

As for the new opcodes, while they do add complexity, they are all pretty straightforward and don't change anything about how the format already works. And if adding all three seems like too much, there are other options. While personally I'd love to have them all, diff4 + diff2, only diff4, or even only op_a (though I do think diff4 is better if you're looking for just one) are all reasonable and significantly better than having nothing at all for varying alpha. The inability to compress the alpha channel has been a major blind spot of the format imho, and it's a shame to not address that in any way when there are plenty of good options available. Speaking personally, the ability of the format to handle varying alpha is one of the biggest factors in my ability to make use of it.

It's disappointing to hear that you aren't willing to consider changes and would suggest forking the format instead. I don't want to deal with a fractured format, so that doesn't sound like an appealing option to me.

However you slice it, the current proposal is simpler than the first version of the format that you presented a month ago, which you rightly considered very simple at the time. If the format was simple enough back then, there's no question that it's simple enough with these changes too.

phoboslab · 2021-12-19T14:25:34Z

More FIFO discussion here: #94

@notnullnotvoid yeah, 20% improvement sounds nice, but does it really matter in practice if your images are highly compressed to 13.6% or to 11.5%? If it does, you should consider a different image format.

Also, I'm not proposing to "fork" QOI. I'm trying hard to establish only true standard for QOI here. An improved format should have a different name (QGI or QOI2, whatever) and should implement all the things that QOI does not: allowing restarts for parallel processing, block based encoding, maybe a color transform to YUV and then some more OPs.

notnullnotvoid · 2021-12-19T15:52:49Z

@notnullnotvoid yeah, 20% improvement sounds nice, but does it really matter in practice?

Yes, it does. If such a simple change led to such huge gains only for opaque images, you'd adopt it without hesitation. Why you have so much disdain for people working with transparent images, I genuinely do not understand. I am reaching out to you in the first place from a fairly generous position of compromise. I was careful to make sure that the fix I proposed was extremely simple, did not change any of what already works well in the format, and could be implemented without noticeably impacting ratio or throughput for opaque images. The format privileges opaque images over transparent ones to a pretty extreme degree, and to be honest, after these changes it still does - just less so, enough to make it more bearable. The fact that it's even possible to get such huge wins by devoting less than 1% of the opcode space to one or two multi-byte encodings is evidence enough of the severity of the problem. All I am asking you to do is make a small addition to alleviate the most glaring weak point in the format, while you still have the opportunity to do so. Seriously, I am trying to help you out here. Please let me.

phoboslab · 2021-12-19T18:54:24Z

I'm not sure if I have to say this, but I have nothing against images with an alpha channel or people using such images.

I'm just considering the tradeoffs here and for QOI I'm very heavily leaning towards simplicity. We already have too many complicated image formats out there. There were countless other proposals that I rejected on those grounds. So please don't take it personal.

My suggestion to look into another image format was an honest one. If you have a lot of semi-transparent images, QOI may not be the best choice. As you said yourself, even with those proposed fixes QOI still performs badly when dealing with complicated alpha channels.

DanielMagen · 2021-12-19T19:15:48Z

@phoboslab I believe that what made this format gain so much traction in the first place is its simplicity and speed because people do look for such things. there would also always be room for improvement.

as I see it, there are 3 choices

sacrifice more of the simplicity and speed of qoi to get better compression.
I don't think you would ever beat png with a simple approach. I believe someone can conjure-up/find some corner cases that qoi does badly on and png developers took into account. those would probably not be common in practice, but still.
sacrifice some of qoi current compression to get more simplicity. personally, I'm biased in favor of the original xor hash, it gave such good performances compared to its simplicity. I also think that qoi can inspire more simple-but-actually-really-good formats, and sacrificing simplicity reduces the chances for that. but maybe I'm just a bit too "romantic" about this.
keep qoi as it is now. your'e about 16-20% worse than png but so so much faster. its a really cool format as it is.
I personally would use qoi if it gets the right amount of support. sacrificing a bit of disk space to get more speed is a trade I'm willing to make.

phoboslab · 2021-12-19T19:25:39Z

@DanielMagen as you can imagine, I agree that QOI got the attention because of its simplicity. I mean, FLIF is awesome in terms of compression, but as far as I can tell there's exactly one implementation of it.

In regards to 2.: I guess the proposal of just using the color hash as a FIFO (#94) would kindle your romance?

DanielMagen · 2021-12-19T19:35:42Z

naivly, it looks to me like you're trying to squeeze the last bits from the cache table and sacrificing simplicity a lot for it. but I'll let other people choose if it's a valid direction.

phoboslab · 2021-12-19T19:48:11Z

Sorry, I'm not sure how to read your comment. You mean the FIFO sacrifices simplicity?

but I'll let other people choose if it's a valid direction.

That'll be me, I guess. And I've been staring it this whole thing for too long...

DanielMagen · 2021-12-19T20:03:28Z

sorry I'll make myself more clear.
using the cache as a hash table gives very good results. changing it to fifo would not be simpler to implement and would improve compression by only a tiny bit which I think is negligible.

the "highest hit to simplicity" I see is that to give a good (fast) implementation of qoi with the fifo approach, one would have to use the hash trick you used. so even if the spec is simplified, in practice, this would add more code to any implementation with negligible effects to compression. this would also add more burden to someone trying to implement qoi in the form of needing to read/come-up(if the hash trick is not specified) with the hash/fifo idea and choose a good hash.

swarnimarun · 2022-04-07T22:32:02Z

Just to be clear, the spec is now locked/fixed/unchanging(atleast anything major) right?

I wanted to pitch an implementation to be merged in production software. Just wanted to know where exactly things stand. :)

oof dammit, saw the other issue about 30 seconds later.
#95

Yeah anyone else who finds themselves here, atleast at the moment things seem fixed. So ciao.

phoboslab mentioned this issue Nov 29, 2021

The QOI File Format Specification #37

Closed

phoboslab pinned this issue Nov 29, 2021

nigeltao mentioned this issue Nov 29, 2021

Hash function nigeltao/qoi2-bikeshed#19

Open

phoboslab mentioned this issue Dec 19, 2021

Color index array as FIFO #94

Closed

phoboslab unpinned this issue Dec 19, 2021

nigeltao mentioned this issue Dec 20, 2021

Demo 20: a Proof of Concept nigeltao/qoi2-bikeshed#28

Open

phoboslab closed this as completed Aug 5, 2022

The final(?) specification #48

The final(?) specification #48

Comments

phoboslab commented Nov 29, 2021 • edited Loading

ikskuh commented Nov 29, 2021

oscardssmith commented Nov 29, 2021

laurelkeys commented Nov 29, 2021

oscardssmith commented Nov 29, 2021

ikskuh commented Nov 29, 2021

oscardssmith commented Nov 29, 2021 • edited Loading

phoboslab commented Nov 29, 2021

andrewmd5 commented Nov 29, 2021 • edited Loading

oscardssmith commented Nov 29, 2021

phoboslab commented Nov 29, 2021

oscardssmith commented Nov 29, 2021

oscardssmith commented Nov 29, 2021

gamblevore commented Nov 29, 2021 • edited Loading

phoboslab commented Nov 29, 2021

oscardssmith commented Nov 29, 2021

gamblevore commented Nov 30, 2021

ikskuh commented Nov 30, 2021

gamblevore commented Nov 30, 2021 • edited Loading

slembcke commented Nov 30, 2021

mgmalheiros commented Nov 30, 2021

phoboslab commented Nov 30, 2021 • edited Loading

oscardssmith commented Nov 30, 2021

nigeltao commented Dec 1, 2021

toddjonker commented Dec 1, 2021

amber8706 commented Dec 1, 2021

notnullnotvoid commented Dec 1, 2021

rmn20 commented Dec 1, 2021 • edited Loading

wbd73 commented Dec 18, 2021

wbd73 commented Dec 18, 2021 • edited Loading

phoboslab commented Dec 18, 2021

nigeltao commented Dec 19, 2021 • edited Loading

notnullnotvoid commented Dec 19, 2021

wbd73 commented Dec 19, 2021

nigeltao commented Dec 19, 2021

phoboslab commented Dec 19, 2021

chocolate42 commented Dec 19, 2021

Wulf0x67E7 commented Dec 19, 2021

wbd73 commented Dec 19, 2021

HappySeaFox commented Dec 19, 2021

phoboslab commented Dec 19, 2021

chocolate42 commented Dec 19, 2021

phoboslab commented Dec 19, 2021

chocolate42 commented Dec 19, 2021

notnullnotvoid commented Dec 19, 2021 • edited Loading

phoboslab commented Dec 19, 2021

notnullnotvoid commented Dec 19, 2021

phoboslab commented Dec 19, 2021

DanielMagen commented Dec 19, 2021 • edited Loading

phoboslab commented Dec 19, 2021

DanielMagen commented Dec 19, 2021 • edited Loading

phoboslab commented Dec 19, 2021

DanielMagen commented Dec 19, 2021

swarnimarun commented Apr 7, 2022 • edited Loading

phoboslab commented Nov 29, 2021 •

edited

Loading

oscardssmith commented Nov 29, 2021 •

edited

Loading

andrewmd5 commented Nov 29, 2021 •

edited

Loading

gamblevore commented Nov 29, 2021 •

edited

Loading

gamblevore commented Nov 30, 2021 •

edited

Loading

phoboslab commented Nov 30, 2021 •

edited

Loading

rmn20 commented Dec 1, 2021 •

edited

Loading

wbd73 commented Dec 18, 2021 •

edited

Loading

nigeltao commented Dec 19, 2021 •

edited

Loading

notnullnotvoid commented Dec 19, 2021 •

edited

Loading

DanielMagen commented Dec 19, 2021 •

edited

Loading

DanielMagen commented Dec 19, 2021 •

edited

Loading

swarnimarun commented Apr 7, 2022 •

edited

Loading