Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The final(?) specification #48

Closed
phoboslab opened this issue Nov 29, 2021 · 152 comments
Closed

The final(?) specification #48

phoboslab opened this issue Nov 29, 2021 · 152 comments

Comments

@phoboslab
Copy link
Owner

phoboslab commented Nov 29, 2021

I want to apologize.

I may have been too quick with announcing the file format to be finished. I'm frankly overwhelmed with the attention this is getting. With all the implementations already out there, I thought it was a good idea to finalize the specification ASAP. I'm no longer sure if that was the right decision.

QOI is probably good enough the way it is now, but I'm wondering if there are things that could be done better — without sacrificing the simplicity or performance of this format.

One of these things is the fact that QOI_RUN_16 was determined to be pretty useless, and QOI could be become even simpler by just removing it. Maybe there's more easy wins with a different hash function or distributing some bits differently? I don't know.

At the risk of annoying everyone: how do you all feel about giving QOI a bit more time to mature?

To be clear, the things I'd be willing to discuss here are fairly limited:

  • I don't want more features (higher bit depth, custom headers, more meta info...).
  • I'm generally against any ideas that make the format more complex (e.g. mode switches, transforms into YUV colorspace...)
  • I don't want to make decoding of the chunks to be dependent on some information in the header (e.g. different behaviours for 3 or 4 channels)

What I'm looking for specifically is:

  • Changes that make the format simpler
  • Changes that would yield better performance when en-/decoding
  • Changes that improve compression without making QOI more complex

Should we set a deadline in 2-3 weeks to produce the really-final (pinky promise) specification? Or should we just leave it as it is?

Again, I'm very sorry for the confusing messaging!

Edit: Thanks for your feedback. Let's produce the final spec till 2021.12.20.

@ikskuh
Copy link

ikskuh commented Nov 29, 2021

Things where i see potential to improvement:

  • Improve the "hash" function by incorporating all bits. Currently, 0x00, 0x40, 0x80, 0xC0, 0xFF all map to index 0 as the hash ignores the 2 most significant bits
  • Removal of QOI_RUN_16 seems like a good idea after reading the issue, which means that QOI_RUN_8 gets one additional bit for encoding 1..64
  • Making a 1 run encode with QOI_INDEX and increasing QOI_RUN_8 to 2..65 is probably not so bad either
  • I would remove the color space field in the header, but i have no strong feelings here

Apart from that, i'm already super-happy with the format, as it is very cpu-friendly in terms of branching (my impl has 0.9% branch misses which surprised me) and it looks like people already made a QOI video stream to the oculus quest 1 with 50 FPS, so it seems to be usable for such heavy-load use cases as well

And i agree that the format doesn't need to get any new features. It's really good in what it has already

@oscardssmith
Copy link

IMO, QOI version 1 is a really good first version. I think it makes sense to wait for a month or so before committing to a second version, since we want to make sure there is time to find good ideas.

nigeltao/qoi2-bikeshed#14 has some really good analysis of opcode frequency which will be very useful for optimizing opcodes. While I understand why you don't want different 3 vs 4 channel behavior, I think we should look carefully at it. Separating them can give a pretty easy 10% increase in compression ratio because you simply have shorter opcodes, so you can fit more data in. This comes at a minor complexity cost, but I think it is somewhat offset by the fact that it simplifies the QOI_COLOR opcode. The 4 bits for which of RGBA to store is a lot of complexity that only gets used 0.7% of the time (other than for removing the alpha channel)

@laurelkeys
Copy link

  • I would remove the color space field in the header, but i have no strong feelings here

I think it's really nice to be able to differentiate between linear encoded channels vs sRGB encoded RGB channels + linear alpha channel (as this is the most common case for "images in the wild").

However, I'd like to point out that having the default 0x00 case mean all channels are "gamma compressed" -- including alpha -- is quite error-prone. For instance, see floooh/qoiview#3 (and the linked PR adding QOI support to tev for more information on this).

I'm not sure what's the best way to handle this. IMO, 0x00 should mean QOI_SRGB_LINEAR_ALPHA, but maybe it's easier/simpler to "swap" the current meaning of the colorspace bits such that a 0-bit indicates linear and a 1-bit indicates sRGB (?)

This would lead to 0x00 == QOI_LINEAR, 0x01 == QOI_SRGB and 0x0f == QOI_SRGB_LINEAR_ALPHA, signaling that the choice of sRGB was intentional (i.e. the qoi_desc struct wasn't simply zero-initialized).

@oscardssmith
Copy link

In terms of a better hash function, I think the best approach would be to just interpret the RGBA value as a 32 bit uint, and perform an xor and multiply by 2 32 bit prime numbers. This should give much better randomization at very low cost.

While we're improving the index, it also might make sense to try adding a 16 bit instruction that is a combination of index with delta (QOI_INDEX_DELTA?). You could use 4 bits for the tag, 6 bits to store the index, and have 6 bits left to store the lower 2 bits of RGB values. This could be encoded and decoded relatively easily by having a second table where values are hashed by their upper 6 bits, which would allow O(1) lookup at the cost of a little memory.

I think this instruction would be really valuable for things like photographs and dithered images where there is a little noise that will lower the number of exact matches. Currently for these types of images, we use a 32 bit QOI_COLOR tag roughly 14% of the time, and QOI_DIFF_24 roughly 12% of the time. For images without alpha, this tag would reduce those sizes by 50% and 33% when used, which would be a really big advantage.

@ikskuh
Copy link

ikskuh commented Nov 29, 2021

While we're improving the index, it also might make sense to try adding a 16 bit instruction that is a combination of index with delta (QOI_INDEX_DELTA?). You could use 4 bits for the tag, 6 bits to store the index, and have 6 bits left to store the lower 2 bits of RGB values. This could be encoded and decoded relatively easily by having a second table where values are hashed by their upper 6 bits, which would allow O(1) lookup at the cost of a little memory.

This sounds very complex to me and even if it might improve the compression rate, the compression speed will heavily suffer from it, as you now have to search the index array

In terms of a better hash function, I think the best approach would be to just interpret the RGBA value as a 32 bit uint, and perform an xor and multiply by 2 32 bit prime numbers. This should give much better randomization at very low cost.

True, i think we could go with something easier though, as multiplication is still costly on lower-end hardware. Maybe an adopepted Pearson hashing for 6 bit with a LUT of 64 byte, but the LUT makes my performance drop sense tingle. I think doing some xor-shifting while not discarding some bits might already be sufficient

@oscardssmith
Copy link

oscardssmith commented Nov 29, 2021

you now have to search the index array

No you don't. In compression the code is roughly

unsigned int pxtrunc =  px.v & 0xfcfcfcfc;
index_pos = QOI_COLOR_HASH(pxtrunc) % 64;
if (index_trunc[index_pos].v == pxtrunc) {
    unsigned int diff = (px.v - pxtrunc);
    unsigned char low_bits = diff | 0x3 | ((diff | 0x300)>>6) | ((diff | 0x30000)>>12)
    bytes[p++] = QOI_INDEX_DELTA | (index_pos>>2)
    bytes[p++] = (index_pos<<4) | low_bits
}
else {
    index_trunc[index_pos] = pxtrunc;
}

The key here is that you store a second index where the values are hashed based on their upper 6 bits rather than the whole value. This keeps the lookup at O(1) at the expense of some extra memory.

@phoboslab
Copy link
Owner Author

Ok, I guess we still have some things to iron out. Let's set the deadline for the final spec to 2021.12.20.

Thanks for your patience <3

@andrewmd5
Copy link

andrewmd5 commented Nov 29, 2021

Was there a technical reason the spec chose to adopt big endian over little?

@oscardssmith
Copy link

I think separate channels is likely a really bad idea. If you have separate channels, it becomes impossible to compress while keeping byte alignment which is really bad for performance. Delata coding also does increase the size by 1 bit since you go from 0:255 to -255:255, which again is probably really bad. Tiling might be a good idea.

@phoboslab
Copy link
Owner Author

Was there a technical reason the spec chose to adopt big endian over little?

No. Big endian seemed like the right thing to do for a file format. I guess I'll have to revisit #36 - but there's tradeoffs either way.

separate channels
tiling
use delta

All interesting ideas, but the result would be a different file format. Outside of the scope of what I'm willing to change here.

QOI_INDEX_DELTA (...) this tag would reduce those sizes by 50% and 33%

How did you arrive at these numbers? I did a quick an dirty test and got a ~3% smaller file size for the kodak images.

@oscardssmith
Copy link

for QOI_INDEX_DELTA, I just meant that when it's applicable, it replaces a 3 or 4 byte value with a 2 byte value, not that that would be the effect over the whole image. For the test, did you use the current r^g^b^a or one of the more complicated hashes described above?

@oscardssmith
Copy link

It does expand the file though. See my above comment.

@gamblevore
Copy link

gamblevore commented Nov 29, 2021

I think separate channels is likely a really bad idea. If you have separate channels, it becomes impossible to compress while keeping byte alignment which is really bad for performance. Delata coding also does increase the size by 1 bit since you go from 0:255 to -255:255, which again is probably really bad. Tiling might be a good idea.

No... thats not how it works. In theory, yes, but in practice no. I guess it wraps around. You just consider that from 255 to 0 you add 1. Or from 0 to 255 you subtract 1. So its like a loop of 0-255 values. Imagine them wrapped around a tube.

@phoboslab
Copy link
Owner Author

For the test, did you use the current r^g^b^a or one of the more complicated hashes described above?

Since the lower 2bits are omitted for the trunc_index I did:
trunc_index_pos = (px.rgba.r >> 2) ^ (px.rgba.g >> 2) ^ (px.rgba.b >> 2) ^ (px.rgba.a >> 2);

@oscardssmith
Copy link

Can you try a version with trunc_index_pos = 0x1071b495*(px.v & 0x92458355)+0xcb533df9;? I think it will give much better mixing. (Similarly, testing index_pos = 0x92458355*px.v+0xcb533df9; would be interesting). The magic constants are randomly chosen 32 bit primes.

@gamblevore
Copy link

I think separate channels is likely a really bad idea. If you have separate channels, it becomes impossible to compress while keeping byte alignment which is really bad for performance.

How?

normally this: RGBA,RGBA
now this: RR, GG, BB, AA

8 bytes in both approaches. Where is the loss of byte alignment?

@ikskuh
Copy link

ikskuh commented Nov 30, 2021

normally this: RGBA,RGBA
now this: RR, GG, BB, AA

This means you have to sweep the output memory four times instead of one time, hurting your cache locality a lot. One of the reasons why QOI is so fast is that it will touch every memory cell in both input and output exactly once. If we change this, we will get huge performance losses which aren't recoverable by any "win" in the compression rate

@gamblevore
Copy link

gamblevore commented Nov 30, 2021

Decompress rate is more important usually. Compress rate should still be fast enough. An operation as "channel splitting" could work at near memcpy speeds.


// restrict means the pointers dont overlap, so the compiler is free to do more opts
void channel_split(uint* restrict Px, int n, u8* restrict R, u8* restrict G, u8* restrict B, u8* restrict A ) {
    while (n-->0) {
        uint P = *Px++;
        *R++ = P;
        *G++ = P>>8;
        *B++ = P>>16;
        *A++ = P>>24;
    }
}

A good compiler schedules RGBA writes, so they all happen in the same cycle. Also would unroll the loop.

You won't lose much more speed than an memcpy would have lost you.

@slembcke
Copy link

Letting it mature sounds like a good idea I guess. Though on the other hand, even if you had 3 different revisions of the format it would still be relatively simple. ;)

Maybe the best way to go about it is to encourage people to use it as a simple intermediate format for now and NOT a long term storage format? Undoubtedly there will be subtle issues that will change your mind about how qoi should work, or even what it is. You've obviously found a nice local maxima in the design space, give it a little time to climb all the way to the top. :)

I don't personally care about compatibility because I'm just using it as an intermediate format. Qoi's defining trait that made me say "why not?" was it's tiny, hackable implementation. It was one liner make rule to drop it into my asset pipeline, and a nearly 1:1 replacement for stb_image with some (mild) benefits. Why not indeed. Being easily hackable, I even pushed the alpha pre-multiplication into the encoder and don't have to care whether or not it's considered "standard" or not. (Though I am curious if there's a technical reason for that beyond "keep it simple", which is a good reason)

@mgmalheiros
Copy link

As a humble suggestion to better define the scope of this format, perhaps it could be restated as:

"a simple and fast run-length encoding of byte groups"

Thus, if a group is actually a RGB triplet, then it is outside of the format scope to specify the colorspace (that's semantic info which should be somewhere else). Likewise, if the groups are RGBA quads, the actual alpha format is up to the application using it.

Following my naïve interpretation, there are only three info fields needed in the header besides the magic: width, height and channels.

I believe these are useful as a convenience (when holding images, you got the resolution and channels) but actually required when pre-allocating the buffer where decoding will happen. And the number of channels is needed as the "byte group" size for hashing/encoding/decoding.

Therefore, covering the 1 and 2-channel cases could be both very simple to support and useful, as the format would be able to hold grayscale images, indexed pixelart (actual palette info is outside the scope), tilemap info and 2-channel textures (common in certain Physically-Based Rendering pipelines).

Just my two cents.

@phoboslab
Copy link
Owner Author

phoboslab commented Nov 30, 2021

In the experimental branch I have removed QOI_RUN_16 and added a new operation, based on the idea that changes in luma would affect all three RGB channels in the same direction: QOI_GDIFF_16 (needs a better name).

 - QOI_GDIFF_16 ------------------------------------
|         Byte[0]         |         Byte[1]         |
|  7  6  5  4  3  2  1  0 |  7  6  5  4  3  2  1  0 |
|-------------+--------+-------------------+--------|
|  1  1  0  1 |  dr-dg |    green diff     |  db-dg |

4-bit tag b1101
3-bit   red channel difference minus green channel difference -4..3
6-bit green channel difference from the previous pixel -32..31
3-bit  blue channel difference minus green channel difference -4..3

The green channel is used to indicate the general direction of change and gets
a few more bits. dr and db base their diffs off of the green channel diff. E.g.
  dr = (last_px.r - cur_px.r) - (last_px.g - cur_px.g)

Encoded sizes (avg) kb

master experimental
wallpaper 10640 10170
kodak 771 700
misc 400 407
textures 184 179
screenshots 2582 2491

Screenshots and misc suffer a bit from the removal of QOI_RUN_16. Sadly, misc also doesn't gain much from this new op. It seems to be best for photos or "natural" images.

I have also aligned the chunk prefixes so that they are either 2-bit or 4-bit. No more 3-bit codes. This could probably lead to some performance improvement for the decoder.

#define QOI_INDEX     0x00 // 00xxxxxx
#define QOI_RUN       0x40 // 01xxxxxx
#define QOI_DIFF_8    0x80 // 10xxxxxx
#define QOI_DIFF_16   0xc0 // 1100xxxx
#define QOI_GDIFF_16  0xd0 // 1101xxxx
#define QOI_DIFF_24   0xe0 // 1110xxxx
#define QOI_COLOR     0xf0 // 1111xxxx

Note that I re-aranged the if statements in the encoding function to make it easier to experiment, but it makes encoding a bit slower.

I think that's an easy win!? The new operation is very easy to implement with just two more subtractions.

A rather unsuccessful experiment was to encode the "acceleration" of change for each channel instead of the "velocity" (diff) of change. It helps, but not much. QOI_GDIFF yielded far better results.

We also really need a better test suite of images. As noted elsewhere, the only images with an alpha channel are the few ones in misc/. But I also believe that dice.png and fish.png are rather "artificial" examples. I would assume that most textures used in games (and also most use-cases for PNG on the web) have sharper edges for alpha, rather than this semi transparent gradient presented in those two files.

Anyway, I probably won't have time to work much on this in the next few days, but I'm hyped to get back to it :)

@oscardssmith
Copy link

If I made a PR to experimental to update the QOI_RUN_8 to an implementation more like #41 where repeated RUN_8 instructions are used more efficiently, would you be willing to merge it? Doing so should be a notable speedup for decompression, and give better file size.

@nigeltao
Copy link

nigeltao commented Dec 1, 2021

In the experimental branch I have removed QOI_RUN_16 and added a new operation, based on the idea that changes in luma would affect all three RGB channels in the same direction: QOI_GDIFF_16 (needs a better name).

If we're throwing experiments out there, see also nigeltao/qoi2-bikeshed#20

@toddjonker
Copy link

This sounds very complex to me and even if it might improve the compression rate, the compression speed will heavily suffer from it, as you now have to search the index array

It sounds attractive to have a simple-to-understand and fast-to-decode format that allows for more complexity (and innovation) on the encoder side. If a clever encoder wants to spend a lot of wall-clock time to get great compression, that doesn't harm the simplicity of the standard format.

@amber8706
Copy link

Hi! I like the idea of this format. I would like to see an optional section for metadata as part of the format. In general terms, it would be great to store additional information about the image into format. I consider the metadata section as optional in specification. The "general" metadata properties could be the following: date and time of image creation, location, author, general description, etc. In addition to the "general" metadata, it should be possible to add any other metadata in the "key=value" format.
If the metadata section is too large, it should be possible to compress it (for example, using the LZMA method or PPM etc.)
I repeat: the metadata section is optional. The operation of the QOI format should not depend on it.

@notnullnotvoid
Copy link

If you need real-world alpha textures for the benchmark suite, I have a whole game's worth of high-res painted alpha sprites you could pick through and freely use. The slowness of the PNG format was a huge impediment in our asset pipeline, so having a fast format that performs reasonably well on sprites with transparency is something I'm quite personally invested in. Let me know and I can send a big zip of the relevant images.

@rmn20
Copy link

rmn20 commented Dec 1, 2021

I think that abandoning 3-bit indexes can be a bad move, as it reduces deltas size in DIFF_16, which is one of the most used opcodes at the moment. (Maybe GDIFF_16 can work even better with 544 DIFF_16?). Another way might be to remove DIFF_16 at all and enlarge the GDIFF_16 deltas to 464, 473 or 373 this way.

@wbd73
Copy link

wbd73 commented Dec 18, 2021

@phoboslab
It looks like this simpler hash function

#define QOI_COLOR_HASH(C) (C.rgba.r + C.rgba.g*3 + C.rgba.b*5 + C.rgba.a)

performs almost as good when I try.
In case where multiplication is costly, it could be calculated like

#define QOI_COLOR_HASH(C) (((C.rgba.g + C.rgba.b) << 2) + C.rgba.r - C.rgba.g + C.rgba.b + C.rgba.a)

but also when working with a lookup table it is faster since you only have to lookup two values instead of four.

@wbd73
Copy link

wbd73 commented Dec 18, 2021

At the moment the implementation mentions

The 8-bit tags have precedence over the 2-bit tags. A decoder must check for the presence of an 8-bit tag first.

Since the format will not change when finalized, would you consider to change this requirement into

The 8-bit tags have precedence over the 2-bit QOI_OP_RUN tag. A decoder must check for the presence of an 8-bit tag before checking the QOI_OP_RUN tag.

I can't see a problem with checking the other 2-bit tags before the 8-bit ones.
It would give some more freedom when writing a decoder.
If you optimize for speed, starting with the most common tags might help.

@phoboslab
Copy link
Owner Author

I don't understand. The only tag that can collide with QOI_OP_RGB and QOI_OP_RGBA is the QOI_OP_RUN, so you're already free to do that.

@nigeltao
Copy link

nigeltao commented Dec 19, 2021

Copy/pasting from nigeltao/qoi2-bikeshed#28 "Demo 20: a Proof of Concept"


Starting from commit aefa0f7 (2021-12-17) from the phoboslab/master branch, qoi-op-a adds an QOI_OP_A (like the old QOI_OP_COLOR(A)), replacing "QOI_OP_RUN with run length 62".

qoi-demo20 builds on that by:

  • using a FIFO color cache (instead of a hashed color cache), like qoi-fifo2e (Hash function nigeltao/qoi2-bikeshed#19 (comment)), except that we get all of the encoding speed back by re-introducing hashing just for the encoder. Different encoder implementations can bikeshed on the particular hash function as much as they want, but hashing is kept out of the file format specification (simpler) and out of the decoder implementation (faster). No SIMD required.
  • the color cache's initial state was tweaked to include opaque black (consistent with the px_prev initial value, a la Inital state of index array is undocumented #75 (comment)). "QOI_OP_RUN with run length 1" is now redundant (there's always an equivalent QOI_INDEX op), and while it's not in qoi-demo20, we could re-assign the opcode (or bump run lengths by 1).
  • various optimizations like qoi-demo2e (Recent changes in experimental branch #71 (comment)), an optimized version (with no file format changes) of phoboslab/master commit 2ee2169.

The images test suite is the 'official' one from #69 and images/icon_512 within that is highlighted for decoder/encoder speed gains on top of phoboslab/master. The images-lance test suite comes from #69 (comment) and qoi-demo20 shows a 1.13x compression ratio gain (qoi-demo21 shows 1.22x) on top of phoboslab/master.

        decode ms   encode ms   decode mpps   encode mpps   size kb    rate

## Total for images/icon_512
qoi-demo2e:   0.9         1.4        300.26        183.00        85    8.4%
qoi-aefa0f7:  1.6         1.6        165.18        167.23        85    8.4%
qoi-op-a:     1.4         1.2        180.92        213.11        79    7.8%
qoi-demo20:   0.8         1.2        322.69        227.86        79    7.8%
qoi-demo21:   0.9         1.3        290.54        197.44        77    7.6%

# Grand total for images
qoi-demo2e:   3.8         5.3        121.20         87.21       463   25.6%
qoi-aefa0f7:  4.8         5.6         97.28         83.08       463   25.6%
qoi-op-a:     4.7         4.8         98.25         96.80       462   25.5%
qoi-demo20:   3.6         4.9        127.71         95.65       459   25.4%
qoi-demo21:   3.8         4.9        123.34         95.26       458   25.3%

# Grand total for images-lance
qoi-demo2e:  26.9        30.8        151.25        131.97      2109   13.3%
qoi-aefa0f7: 35.4        32.4        114.61        125.29      2109   13.3%
qoi-op-a:    35.8        29.1        113.38        139.46      1862   11.7%
qoi-demo20:  27.1        28.2        150.12        143.97      1859   11.7%
qoi-demo21:  27.7        31.4        146.73        129.50      1722   10.9%

qoi-demo20.h.txt

Edit: added qoi-demo21 per comment below.
qoi-demo21.h.txt

@notnullnotvoid
Copy link

That's brilliant! I see absolutely no reason not to adopt the fifo index queue now.

However, I would strongly advocate for using QOI_OP_DIFF2 and QOI_OP_DIFF4 as suggested here instead of (or in addition to?) QOI_OP_A, since they get consistently better compression.

Here is a summary comparison of the different proposals for handling alpha:

## Total for images/icon_512
        decode ms   encode ms   decode mpps   encode mpps   size kb   vs rgba   vs raw   vs disk
qoi           0.8         1.1        315.22        243.74        85      8.4%     8.4%     1.67x
op_a          0.8         1.1        322.02        240.00        79      7.8%     7.8%     1.56x
qoi4          0.8         1.1        322.12        243.41        80      7.8%     7.8%     1.57x
qoi24         0.9         1.1        304.68        238.29        78      7.6%     7.6%     1.53x

# Grand total for images
        decode ms   encode ms   decode mpps   encode mpps   size kb   vs rgba   vs raw   vs disk
qoi           2.6         3.1        180.24        149.26       463     25.6%    28.2%     1.18x
op_a          2.5         3.1        185.55        149.51       462     25.5%    28.1%     1.18x
qoi4          2.6         3.1        178.13        148.26       462     25.5%    28.1%     1.18x
qoi24         2.7         3.2        172.95        146.52       462     25.5%    28.1%     1.18x

# Grand total for images-lance
        decode ms   encode ms   decode mpps   encode mpps   size kb   vs rgba   vs raw   vs disk
qoi          16.8        20.8        242.42        195.31      2109     13.3%    13.6%     1.49x
op_a         17.5        21.3        232.59        190.84      1862     11.7%    12.0%     1.31x
qoi4         17.2        21.4        236.48        189.84      1857     11.7%    11.9%     1.31x
qoi24        18.2        21.9        222.68        185.19      1786     11.3%    11.5%     1.26x

qoi is the current implementation, op_a uses the proposed QOI_OP_A, qoi4 uses only the proposed QOI_OP_DIFF4, and qoi24 uses both QOI_OP_DIFF2 and QOI_OP_DIFF4.

@wbd73
Copy link

wbd73 commented Dec 19, 2021

@phoboslab

I don't understand. The only tag that can collide with QOI_OP_RGB and QOI_OP_RGBA is the QOI_OP_RUN, so you're already free to do that.

Probably just a misunderstanding. I was under the impression any decoder had to follow the direction mentioned in the original source code to first check for all 8 bit tags. If it's okay to deviate from that it's fine.

@nigeltao
Copy link

However, I would strongly advocate for using QOI_OP_DIFF2 and QOI_OP_DIFF4

I have updated my previous post to do that (called qoi-demo21).

@phoboslab
Copy link
Owner Author

I'm sorry, but I don't like any of these tradeoffs. I believe QOI is at a very sweet spot right now. The compression gains shown don't justify the added complexity.

Pending any last minute emergencies, the current spec will be the final one.

@chocolate42
Copy link

Nigeltao's FIFO reduces decode complexity, increases compression, and barely increases encode complexity.

@Wulf0x67E7
Copy link

@phoboslab What complexity? QOI_OP_A, QOI_OP_DIFF2, and QOI_OP_DIFF4 are all extremely simple and straightforward ops, and the implementation-cached FIFO index actually removes complexity from the spec.

On top of that, they increase the compression rate of the type of images qoi struggles the most with, those with complex alpha, by almost 20% ((1-1722/2109)*100), which is most definitely significant!

@wbd73
Copy link

wbd73 commented Dec 19, 2021

Nigeltao's FIFO reduces decode complexity, increases compression, and barely increases encode complexity.

That was also the one I hoped would make it.
I thought it was a great idea to use a FIFO queue with a hash for encoding only.
On the other hand I also understand you don't want to make significant changes a day before finalizing a file format.

@HappySeaFox
Copy link
Contributor

Why not to move the deadline then? 😀 Imho it's totally ok. Of course it's up to @phoboslab

@phoboslab
Copy link
Owner Author

FIFO doubles the compression time and puts more burden on any implementation trying to get that time back. I like the fact that even a "naive" implementation of QOI is still a good one.

I will not have a discussion about more OPs, sorry.

Feels like we're back at last month, where I piss of a bunch of people with very cool ideas. But this time I have a bit more confidence to say: I'm happy with the current state of QOI and all the things we learned should be rolled into a new format.

@chocolate42
Copy link

The FIFO we are discussing does not double the compression time, you are misunderstanding what the recent development as of demo20 does.

@phoboslab
Copy link
Owner Author

The "naive" implementation did double compression time.

But I have to admit, not having to specify any hash function is enticing... I will do some testing!

@chocolate42
Copy link

Here's my simple proposal refined with demo20's FIFO:

# Grand total for qoi_benchmark_suite/images
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         6.558      77.415         70.78          6.00       398   24.2%
stbi:           6.498      57.981         71.43          8.01       561   34.2%
qoi-master3a:   2.140       2.619        216.94        177.20       463   28.2%
qoi-propA-fifo: 2.075       2.671        223.75        173.81       446   27.2%

By reworking existing ops slightly and adding a single op (a 3 byte RGB encoding that often means we avoid a 4 byte QOI_OP_RGB encoding), compression is significantly improved, decoding is slightly quicker, and encode is slightly slower.

#define QOI_OP_RGB3   0x00 /* 00xxxxxx */
#define QOI_OP_DIFF   0x40 /* 01xxxxxx */
#define QOI_OP_LUMA   0x80 /* 10xxxxxx */
#define QOI_OP_INDEX  0xc0 /* 110xxxxx */
#define QOI_OP_RUN    0xe0 /* 111xxxxx */
#define QOI_OP_RGB    0xfe /* 11111110 */
#define QOI_OP_RGBA   0xff /* 11111111 */

#define QOI_COLOR_HASH_SIZE 1024
#define QOI_COLOR_HASH(C) ((C.rgba.r*3 + C.rgba.g*5 + C.rgba.b*7 + C.rgba.a*11) & 1023)

For more details: #91

@notnullnotvoid
Copy link

notnullnotvoid commented Dec 19, 2021

@phoboslab

An exhaustive search of the FIFO does slow down compression, but you don't need to do an exhaustive search. You can gain all that speed back by using a hash, much like what the current version does. But with the FIFO, that extra bit of complexity is optional instead of mandatory, and the decoder never sees it. I would agree that the FIFO index queue removes complexity overall.

As for the new opcodes, while they do add complexity, they are all pretty straightforward and don't change anything about how the format already works. And if adding all three seems like too much, there are other options. While personally I'd love to have them all, diff4 + diff2, only diff4, or even only op_a (though I do think diff4 is better if you're looking for just one) are all reasonable and significantly better than having nothing at all for varying alpha. The inability to compress the alpha channel has been a major blind spot of the format imho, and it's a shame to not address that in any way when there are plenty of good options available. Speaking personally, the ability of the format to handle varying alpha is one of the biggest factors in my ability to make use of it.

It's disappointing to hear that you aren't willing to consider changes and would suggest forking the format instead. I don't want to deal with a fractured format, so that doesn't sound like an appealing option to me.

However you slice it, the current proposal is simpler than the first version of the format that you presented a month ago, which you rightly considered very simple at the time. If the format was simple enough back then, there's no question that it's simple enough with these changes too.

@phoboslab
Copy link
Owner Author

More FIFO discussion here: #94

@notnullnotvoid yeah, 20% improvement sounds nice, but does it really matter in practice if your images are highly compressed to 13.6% or to 11.5%? If it does, you should consider a different image format.

Also, I'm not proposing to "fork" QOI. I'm trying hard to establish only true standard for QOI here. An improved format should have a different name (QGI or QOI2, whatever) and should implement all the things that QOI does not: allowing restarts for parallel processing, block based encoding, maybe a color transform to YUV and then some more OPs.

@notnullnotvoid
Copy link

@notnullnotvoid yeah, 20% improvement sounds nice, but does it really matter in practice?

Yes, it does. If such a simple change led to such huge gains only for opaque images, you'd adopt it without hesitation. Why you have so much disdain for people working with transparent images, I genuinely do not understand. I am reaching out to you in the first place from a fairly generous position of compromise. I was careful to make sure that the fix I proposed was extremely simple, did not change any of what already works well in the format, and could be implemented without noticeably impacting ratio or throughput for opaque images. The format privileges opaque images over transparent ones to a pretty extreme degree, and to be honest, after these changes it still does - just less so, enough to make it more bearable. The fact that it's even possible to get such huge wins by devoting less than 1% of the opcode space to one or two multi-byte encodings is evidence enough of the severity of the problem. All I am asking you to do is make a small addition to alleviate the most glaring weak point in the format, while you still have the opportunity to do so. Seriously, I am trying to help you out here. Please let me.

@phoboslab
Copy link
Owner Author

I'm not sure if I have to say this, but I have nothing against images with an alpha channel or people using such images.

I'm just considering the tradeoffs here and for QOI I'm very heavily leaning towards simplicity. We already have too many complicated image formats out there. There were countless other proposals that I rejected on those grounds. So please don't take it personal.

My suggestion to look into another image format was an honest one. If you have a lot of semi-transparent images, QOI may not be the best choice. As you said yourself, even with those proposed fixes QOI still performs badly when dealing with complicated alpha channels.

@DanielMagen
Copy link

DanielMagen commented Dec 19, 2021

@phoboslab I believe that what made this format gain so much traction in the first place is its simplicity and speed because people do look for such things. there would also always be room for improvement.

as I see it, there are 3 choices

  1. sacrifice more of the simplicity and speed of qoi to get better compression.
    I don't think you would ever beat png with a simple approach. I believe someone can conjure-up/find some corner cases that qoi does badly on and png developers took into account. those would probably not be common in practice, but still.

  2. sacrifice some of qoi current compression to get more simplicity. personally, I'm biased in favor of the original xor hash, it gave such good performances compared to its simplicity. I also think that qoi can inspire more simple-but-actually-really-good formats, and sacrificing simplicity reduces the chances for that. but maybe I'm just a bit too "romantic" about this.

  3. keep qoi as it is now. your'e about 16-20% worse than png but so so much faster. its a really cool format as it is.
    I personally would use qoi if it gets the right amount of support. sacrificing a bit of disk space to get more speed is a trade I'm willing to make.

@phoboslab
Copy link
Owner Author

@DanielMagen as you can imagine, I agree that QOI got the attention because of its simplicity. I mean, FLIF is awesome in terms of compression, but as far as I can tell there's exactly one implementation of it.

In regards to 2.: I guess the proposal of just using the color hash as a FIFO (#94) would kindle your romance?

@DanielMagen
Copy link

DanielMagen commented Dec 19, 2021

naivly, it looks to me like you're trying to squeeze the last bits from the cache table and sacrificing simplicity a lot for it. but I'll let other people choose if it's a valid direction.

@phoboslab
Copy link
Owner Author

Sorry, I'm not sure how to read your comment. You mean the FIFO sacrifices simplicity?

but I'll let other people choose if it's a valid direction.

That'll be me, I guess. And I've been staring it this whole thing for too long...

@DanielMagen
Copy link

sorry I'll make myself more clear.
using the cache as a hash table gives very good results. changing it to fifo would not be simpler to implement and would improve compression by only a tiny bit which I think is negligible.

the "highest hit to simplicity" I see is that to give a good (fast) implementation of qoi with the fifo approach, one would have to use the hash trick you used. so even if the spec is simplified, in practice, this would add more code to any implementation with negligible effects to compression. this would also add more burden to someone trying to implement qoi in the form of needing to read/come-up(if the hash trick is not specified) with the hash/fifo idea and choose a good hash.

@swarnimarun
Copy link

swarnimarun commented Apr 7, 2022

Just to be clear, the spec is now locked/fixed/unchanging(atleast anything major) right?

I wanted to pitch an implementation to be merged in production software. Just wanted to know where exactly things stand. :)


oof dammit, saw the other issue about 30 seconds later.
#95

Yeah anyone else who finds themselves here, atleast at the moment things seem fixed. So ciao.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests