Make reference tests simpler to use by RunDevelopment · Pull Request #2802 · image-rs/image

RunDevelopment · 2026-02-27T17:08:49Z

Fixes #2796

I changed the reference image test runner to a bless-based workflow. I also got rid of CRC and the custom test harness.

Changes:

Removed CRC from reference images, because it makes blessing more complex.
All test images in tests/images are now decoded and compared to their references in tests/references.
1. A test image failing to decode or not having a reference is now an error.
2. Animated test images also compare all their frames to reference.
3. Users can use the BLESS=1 env var to automatically create and update references.
Added missing reference images.

This directly led me to discovering 2 issues:

TIFF is unable to decode hpredict_cmyk.tiff (temporarily lives in tests/bad/tiff/TODO hpredict_cmyk.tiff). Other programs can open this.
WebP animation frames are decoded incorrectly when transparency is involved. See tests\reference\webp\extended_images\anim.webp.04.png. This is an animation of a ball bouncing around, but we decode each frame with all previous frames stacked below. My guess is that some buffer isn't cleared when it should be, and we just draw opaque pixels on top of the previous frame.

fintelia · 2026-02-27T18:39:19Z

Could we break this up into smaller pieces?

There are parts that I like, but tons of different things are changing at once. For instance, this completely rewrites the reference images test and regresses the output to just be:

     Running `/home/runner/work/image/image/target/debug/deps/reference_images-522906eb2bfa8780`

running 2 tests
test bad_images ... ok
test check_references ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 10.53s

RunDevelopment · 2026-02-27T23:32:21Z

Could we break this up into smaller pieces?

The only parts to break off that I can think of are the bad images and non-image files. Did you mean that?

this completely rewrites the reference images test

Well, yes. The reference image test works very differently now, and the previous implementation didn't have much that was reusable (because it was one huge function).

regresses the output to just be

Please explain how this is a regression. While true that all images passing will be a single line, that's hardly a problem, no? What's most important is that errors are explained well to enable the developer to fix them. If any subset of images fail the test, a simple overview will be generated. For example, say the reference for a.gif is different and the reference for b.qoi is missing. Then the output will be:

     Running tests\reference_images.rs (target\debug\deps\reference_images-f2796d1ae4fc4b02.exe)

running 1 test

thread 'check_references' (12332) panicked at tests\reference_images.rs:140:9:
Errors in references:
❌ images\gif\a.gif
     Reference dimension mismatch: image is (256, 256), reference is (12, 12) 
❌ images\qoi\b.qoi
     Missing reference image
stack backtrace:
   0: std::panicking::panic_handler
             at /rustc/254b59607d4417e9dffbc307138ae5c86280fe4c/library\std\src\panicking.rs:689
   1: core::panicking::panic_fmt
             at /rustc/254b59607d4417e9dffbc307138ae5c86280fe4c/library\core\src\panicking.rs:80
   2: reference_images::check_references
             at .\tests\reference_images.rs:140
   3: reference_images::check_references::closure$0
             at .\tests\reference_images.rs:46
   4: core::ops::function::FnOnce::call_once<reference_images::check_references::closure_env$0,tuple$<> >
             at C:\Users\micha\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\ops\function.rs:250
   5: core::ops::function::FnOnce::call_once
             at /rustc/254b59607d4417e9dffbc307138ae5c86280fe4c/library\core\src\ops\function.rs:250
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test check_references ... FAILED

failures:
    check_references

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1 filtered out; finished in 9.82s

197g · 2026-02-27T23:46:11Z

Each image having its own test failure status was the main motivation for changing it from the version-0.25 variant in the first place. It's valuable to know which test is failing if any is failing, makes it easy to identify any particular feature they might have especially for regression tests and suites that are constructed to cover them.

RunDevelopment · 2026-02-28T00:17:37Z

It's valuable to know which test is failing if any is failing, [...]

Don't you get that same information in the output that is generated by this PR? I don't understand what advantage making each file one test has.

fintelia · 2026-02-28T01:31:37Z

If the test passes then you don't get any output detailed output. That includes if the test passes because your image was skipped or because you placed it in the wrong directory and it didn't get detected in the first place.

And there's also the aspect that you're reverting a feature that we just added a few weeks ago.

RunDevelopment · 2026-02-28T09:39:34Z

Sure. Then let's go back to libtest-mimic.

Could you please also answer how you want the PR to be split up?
Edit: I made 2 PR that split off some parts of this. This PR is blocked until they are merged.

RunDevelopment · 2026-02-28T13:11:54Z

Regarding QOI: The BE bug that causes CI to fail has been fixed around a year ago, but they haven't made a new release in over 3 years. So that's fun. Suggestions on how to deal with this? Removing the one (and only) QOI test image seems like the easiest solution.

RunDevelopment · 2026-02-28T23:35:50Z

CI finally passes 🎉 I fixed it by ignoring QOI specifically on BE arches in reference tests. Hacky, but it works.

With that, this PR is technically ready to merge (after review of course), but I'd like to talk about 2 topics before:

tests/output - do we still need it? Right now, reference tests will still write every decoded image to tests/output. This was useful for when you had to manually copy files around, but now you can just bless and references will be updated automatically.
Unused references. I haven't implemented that yet with libtest-mimic. I'd like to do it in a follow-up PR to not block this one.

mstoeckl · 2026-03-01T00:26:09Z

I think including the hash of the uncompressed data in the filename is useful at minimum when testing PNG and TIFF (float) formats, because they are the reference formats for integer and float pixel types. It ensures that decoder bugs (like incorrectly byte-swapping 16 bit data on big endian systems) that affect both the image and its reference will be caught. I can't say if it's the best way to do it, but it works.

(Arguably reference images aren't needed at all to detect issues, and the hash suffices; but they do help with debugging. Edit: I should also note the current hash only applies to the raw decoded bytes and can miss dimension or format interpretation issues.)

Looking over the test images, I see:

In general the reference folder has gotten larger. Part of this is driven by the large (1MB) png/iptc.png, which did not previously have a reference. Some of the new reference images (like tests/reference/webp/extended_images/advertises_rgba_but_frames_are_rgb.webp.05.png) could can be compressed a bit tighter (using e.g. oxipng).
Now the tests produce two files like output/exr/cropping - uncropped original.exr.tiff, which is 1920x1920 pixels and uses 60MB of disk space.

RunDevelopment · 2026-03-01T01:01:17Z

It ensures that decoder bugs (like incorrectly byte-swapping 16 bit data on big endian systems) that affect both the image and its reference will be caught

IMO, it's fine for the reference test to assume that reference images are decoded correctly.

Plus, say there was a byte-swapping (or similar) bug in PNG/TIFF. For that bug to not be caught by the reference test, it would also need to be present in all other image formats that use PNG/TIFF for reference images.

In general the reference folder has gotten larger. Part of this is driven by the large (1MB) png/iptc.png, which did not previously have a reference. Some of the new reference images (like tests/reference/webp/extended_images/advertises_rgba_but_frames_are_rgb.webp.05.png) could can be compressed a bit tighter (using e.g. oxipng).

Frankly, png/iptc.png seems like a mistake to start with. Idk why this was added.

As for better compression: I'm open to it if it's a simple change. But I'm not going to integrate an external command line tool. The whole point of this PR is that it should be easy to add test images and update references.

Now the tests produce two files like output/exr/cropping - uncropped original.exr.tiff, which is 1920x1920 pixels and uses 60MB of disk space.

This will change when #2785 enables compression for TIFF by default. Case in point, I actually copied the compression settings from #2785 to generate the TIFF reference images for this PR. See that reference/exr/cropping - uncropped original.exr.tiff is 100KB and not 60 MB.

fintelia · 2026-03-01T01:11:59Z

We shouldn't be generating the reference images using the same decoders we're testing. They should be produced using some other independent decoder, and then compressed via oxipng or similar to avoid taking up more space than necessary in the repository.

The only exceptions are JPEG and AVIF(?) where the bit-exact output isn't fully defined. In those cases we should still compare against an independent reference implementation, but the checked in contents should be produced by our decoders (after running them through oxipng)

RunDevelopment · 2026-03-01T12:51:53Z

We shouldn't be generating the reference images using the same decoders we're testing. They should be produced using some other independent decoder [...]

Is this a "they should ideally be produced by an independent decoder" or is this a "they must be produced by an independent decoder"?

If you meant the latter (=a hard requirement), then I seriously question why you even reviewed this PR and asked me to make changes to begin with. A bless-based workflow necessarily means that our decoders' output is used as reference.

If you meant the former, then why even bring it up? Judging by @mstoeckl's comment in #2796 and my own experience, people already copied what was in tests/output. This PR just automates this process, so nothing changes. Plus, it's not like using this decoders' output compromises correctness. Personally, for any test image, I compare the generated output to how the image editing programs I use open that test image. This makes it easier to see and understand systematic error. Diffing is easy with these tools too.

and then compressed via oxipng or similar to avoid taking up more space than necessary in the repository.

I doubt many contributors ever did that. I didn't. Up until the comment by @mstoeckl, I haven't even heard of this tool. If this is supposed to be a requirement, then document it.

I also want to say something more general: If you want tests to be written, then you have to make it easy to add them. If you have a million requirements for adding a single image to test, then your software simply isn't going to be tested well.

fintelia · 2026-03-02T00:08:25Z

Sorry for the confusion.

The reference tests (like many areas in image-rs) exists in the current form mostly out of inertia. Someone designed a part of the code 5 or 10 years ago and it hasn't gotten very much attention since. Honestly, at this point I don't really have a fully thought out opinion on what reference tests should look like.

So far, using oxipng has not been a requirement but we have preferred small test images over large ones. If I notice a PR that adds a big test file, I'll generally request a small one instead. But since we're now bringing up the topic of adding new test images, it makes sense to talk through whether we should have clearer guidance.

On the topic of "do we want people to write more tests", IMO there's actually some nuance:

Git repositories don't handle large files very well. Any image file we add increases the repository download time for anyone cloning it, regardless of whether they want to run the tests, or even whether a subsequent commit later deletes it.
- There's various options to get around this like Git LFS or creating a second repository for holding test images. I don't think anyone's done much investigation into these though.
- At the moment our test images take ~14MB
Many of our more complicated codecs live in separate repositories. Which probably means most of the test images for those formats should live directly in their repositories so regressions are caught faster? Any images in this repository would then just serve as a quick check that nothing broke in the binding layer.
Incorrect reference images produced by a buggy encoder or decoder could actually make it harder to find issues.

RunDevelopment · 2026-03-02T06:54:31Z

I see. While I don't fully agree with the hypothetical in your last point, everything else makes sense to me.

To address the git size issues, we could not commit every reference image. In image-dds, my test runner uses both reference images and reference hashes (all auto generated). Test images are always compared against their reference hash, but only against a reference image if it exists (kind of). This allows me to pick and choose which reference images I want to commit by git-ignoring the files I don't. So the DDS decoder has 16 MB of reference images, but I only commit 2 MB. This approach is especially useful when the reference PNG is many times larger than its corresponding test image.

(Since reference hashes are generated directly from the decoded image data, this also address concerns about buggy encoders used to generate reference images.)

Regarding how hashes are stored: I would keep them in one file. Maybe tests/reference/hashes.toml. The file format doesn't matter. Could be YAML, TOML, JSON, or whatever. It just needs to store a mapping from (relative) test image path to hash. Suggestions for the file format would be welcome, since we need a good reader and writer for it.

Then the workflow for adding new test images would be as follows:

Add the test image to tests/images/<format>.
Run BLESS=1 cargo test to create/update reference hashes and images.
Verify that the generated reference image is correct.
If the reference image is very large (or a reviewer requests it), add it to .gitignore.

This would allow us to impose a rule like "All reference images >32 KB must be git-ignored unless you have a good reason." (Not enforced in code, just documented as part of the contributing docs.)

While unnecessary, I would advocate for committing at least some reference images. Makes it easier to review PRs without having to run their code locally. I also want to avoid a situation where a new contributor checks out the code base, runs tests, tests fail due to some issue, and all they see is an error saying "mismatching hashes" without any further info.

(The hashes.toml file could later be extended to store more information than just an image data hash. It would also be useful to store other information the decoder returns, such as width, height, color type, orientation, animation loops, metadata, and so on. This would allow us to more comprehensively test our decoders.)

Which probably means most of the test images for those formats should live directly in their repositories so regressions are caught faster? Any images in this repository would then just serve as a quick check that nothing broke in the binding layer.

Fully agree with that. If we have more test images for a particular format than the upstream crate implementing that format, then something is seriously wrong upstream.

However, even our binding layers are typically at least a hundred lines of code each, often with lots of branching code paths to handle different options within a format. E.g. TIFF has 1 or more branches for every supported color format. So even our binding layers require quite a few test images to test adequately.

RunDevelopment · 2026-03-02T13:07:23Z

Regarding oxipng: I thought it was a pure CLI tool, but it can be a library too. So integrating it might not even be difficult.

I also ran it on all reference images (with default compression level o2) to see how it performs, and the results are pretty good. Around 30% saved on average good. I did not think that it would be able to save this much.

Size change by file

(Small files <1 KB are ignored, because they don't matter for the git size problem.)

File	Size (KB)	Opt Size (KB)	Change
bmp\images\Core_1_Bit.bmp.png	1.4	0.5	-68%
bmp\images\Core_4_Bit.bmp.png	2.2	0.9	-59%
bmp\images\Core_8_Bit.bmp.png	2.7	1.7	-38%
bmp\images\lenient\badpalettesize.bmp.png	15.6	2.7	-83%
bmp\images\lenient\badplanes.bmp.png	4.5	0.5	-88%
bmp\images\lenient\pal8oversizepal.bmp.png	15.6	2.7	-83%
bmp\images\lenient\rgb16-880.bmp.png	2.1	0.9	-59%
bmp\images\lenient\rletopdown.bmp.png	15.6	2.7	-83%
bmp\images\pal2.bmp.png	2.9	0.9	-68%
bmp\images\pal2color.bmp.png	3.1	0.9	-70%
bmp\images\pal4rle.bmp.png	4.1	1.3	-69%
bmp\images\pal4rlecut.bmp.png	3.2	1.0	-68%
bmp\images\pal4rletrns.bmp.png	3.9	1.2	-69%
bmp\images\pal8badindex.bmp.png	2.8	1.5	-47%
bmp\images\pal8os2-hs.bmp.png	3.5	2.7	-23%
bmp\images\pal8os2-sz.bmp.png	3.5	2.7	-23%
bmp\images\pal8os2sp.bmp.png	3.5	2.7	-23%
bmp\images\pal8rle.bmp.png	5.7	2.7	-53%
bmp\images\pal8v4.bmp.png	5.7	2.7	-53%
bmp\images\pal8v5.bmp.png	5.7	2.7	-53%
bmp\images\rgb16-231.bmp.png	5.1	2.5	-51%
bmp\images\rgb16-565.bmp.png	1.8	1.2	-33%
bmp\images\rgb16.bmp.png	1.7	1.1	-37%
bmp\images\rgb24.bmp.png	1.5	1.0	-33%
bmp\images\rgb24prof.bmp.png	2.6	1.0	-63%
bmp\images\rgb24prof2.bmp.png	2.6	1.0	-63%
bmp\images\rgb24rle24.bmp.png	3.5	2.7	-23%
bmp\images\rgb32-111110.bmp.png	1.5	1.0	-33%
bmp\images\rgb32.bmp.png	1.5	1.0	-33%
bmp\images\rgb32bf.bmp.png	1.5	1.0	-33%
bmp\images\rgba16-1924.bmp.png	5.8	2.7	-54%
bmp\images\rgba32-61754.bmp.png	2.2	1.5	-32%
bmp\images\rgba32.bmp.png	1.7	1.2	-30%
bmp\images\rgba32abf.bmp.png	3.0	1.2	-61%
bmp\images\V4_24_Bit.bmp.png	72.6	42.8	-41%
bmp\images\V5_24_Bit.bmp.png	72.6	42.8	-41%
farbfeld\transparency\tbbn0g04.ff.png	2.9	0.4	-86%
farbfeld\transparency\tbbn3p08.ff.png	3.7	1.5	-60%
farbfeld\transparency\tbgn3p08.ff.png	3.7	1.5	-60%
farbfeld\transparency\tbrn2c08.ff.png	3.7	1.7	-54%
farbfeld\transparency\tbwn3p08.ff.png	3.7	1.5	-60%
farbfeld\transparency\tbyn3p08.ff.png	3.7	1.5	-60%
farbfeld\transparency\tp0n0g08.ff.png	3.7	0.7	-81%
farbfeld\transparency\tp0n2c08.ff.png	3.9	1.5	-60%
farbfeld\transparency\tp0n3p08.ff.png	3.8	1.3	-65%
farbfeld\transparency\tp1n3p08.ff.png	3.7	1.5	-60%
gif\anim\large-gif-anim-combine.gif.01.png	22.0	0.5	-98%
gif\anim\large-gif-anim-combine.gif.02.png	22.0	1.2	-95%
gif\anim\large-gif-anim-combine.gif.png	22.0	0.5	-98%
gif\anim\large-gif-anim-full-frame-replace.gif.01.png	22.0	0.2	-99%
gif\anim\large-gif-anim-full-frame-replace.gif.02.png	22.0	0.2	-99%
gif\anim\large-gif-anim-full-frame-replace.gif.png	22.0	0.2	-99%
ico\images\bmp_v5_with_icc.ico.png	6.0	2.8	-53%
ico\images\multiple_entries_with_different_bit_depth.ico.png	5.6	4.2	-24%
ico\images\two-entry-order-test.ico.png	1.6	0.8	-49%
jpg\iptc.jpg.png	189.7	130.4	-31%
jpg\portrait_2.jpg.png	41.0	33.5	-18%
jpg\progressive\3.jpg.png	348.0	348.0	0%
jpg\progressive\cat.jpg.png	90.3	90.3	0%
jpg\progressive\test.jpg.png	1.4	1.4	0%
png\16bpc\basn2c16.png.png	2.0	0.2	-88%
png\16bpc\basn6a16.png.png	3.7	2.3	-39%
png\apng\ball.png.01.png	5.7	4.6	-20%
png\apng\ball.png.02.png	5.5	4.4	-19%
png\apng\ball.png.03.png	5.2	4.2	-20%
png\apng\ball.png.04.png	4.7	3.9	-17%
png\apng\ball.png.05.png	4.3	3.6	-15%
png\apng\ball.png.06.png	3.9	3.3	-16%
png\apng\ball.png.07.png	3.5	3.0	-14%
png\apng\ball.png.08.png	3.3	2.8	-15%
png\apng\ball.png.09.png	3.4	2.9	-15%
png\apng\ball.png.10.png	3.7	2.9	-20%
png\apng\ball.png.11.png	3.5	2.9	-17%
png\apng\ball.png.12.png	3.4	2.8	-17%
png\apng\ball.png.13.png	3.5	2.9	-17%
png\apng\ball.png.14.png	3.6	3.0	-18%
png\apng\ball.png.15.png	3.7	3.0	-19%
png\apng\ball.png.16.png	3.7	3.0	-18%
png\apng\ball.png.17.png	3.8	3.1	-19%
png\apng\ball.png.18.png	4.0	3.4	-16%
png\apng\ball.png.19.png	4.3	3.6	-17%
png\apng\ball.png.20.png	5.0	4.1	-18%
png\apng\ball.png.png	5.4	4.5	-15%
png\bugfixes\debug_triangle_corners_widescreen.png.png	76.5	44.1	-42%
png\iptc.png.png	1020.3	904.0	-11%
png\transparency\tbbn3p08.png.png	1.8	1.5	-18%
png\transparency\tbgn3p08.png.png	1.8	1.5	-18%
png\transparency\tbrn2c08.png.png	1.8	1.7	-3%
png\transparency\tbwn3p08.png.png	1.8	1.5	-18%
png\transparency\tbyn3p08.png.png	1.8	1.5	-18%
png\transparency\tp0n2c08.png.png	1.7	1.5	-9%
png\transparency\tp0n3p08.png.png	1.7	1.3	-22%
png\transparency\tp1n3p08.png.png	1.8	1.5	-18%
png\transparency\tp1n3p08_xmp.png.png	2.0	1.5	-26%
tga\encoding\black_white.tga.png	15.1	0.4	-97%
tga\testsuite\bottom_left.tga.png	3.0	2.7	-11%
tga\testsuite\bottom_right.tga.png	3.0	2.6	-11%
tga\testsuite\top_left.tga.png	2.8	2.5	-11%
tga\testsuite\top_right.tga.png	2.9	2.6	-11%
tiff\testsuite\fax4.tiff.png	5.9	2.8	-52%
tiff\testsuite\hpredict.tiff.png	1.5	1.1	-24%
tiff\testsuite\hpredict_packbits.tiff.png	1.5	1.2	-22%
tiff\testsuite\l1.tiff.png	1.8	1.8	0%
tiff\testsuite\l1_xmp.tiff.png	3.2	1.5	-55%
tiff\testsuite\mandrill.tiff.png	638.4	626.3	-2%
tiff\testsuite\planar.tiff.png	282.5	268.4	-5%
tiff\testsuite\rgb-3c-16b.tiff.png	127.5	126.3	-1%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.01.png	57.7	39.9	-31%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.02.png	57.1	37.1	-35%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.03.png	53.4	35.9	-33%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.04.png	57.1	36.7	-36%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.05.png	61.7	37.8	-39%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.06.png	53.5	35.0	-35%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.07.png	54.2	36.3	-33%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.08.png	55.8	37.5	-33%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.09.png	51.9	36.1	-30%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.10.png	51.7	35.3	-32%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.11.png	58.2	38.0	-35%
webp\extended_images\advertises_rgba_but_frames_are_rgb.webp.png	57.7	39.9	-31%
webp\extended_images\anim.webp.01.png	9.1	6.3	-31%
webp\extended_images\anim.webp.02.png	16.8	11.6	-31%
webp\extended_images\anim.webp.03.png	23.8	15.5	-35%
webp\extended_images\anim.webp.04.png	28.8	18.7	-35%
webp\extended_images\anim.webp.05.png	31.5	19.9	-37%
webp\extended_images\anim.webp.06.png	34.5	21.3	-38%
webp\extended_images\anim.webp.png	9.1	6.3	-31%
webp\extended_images\lossy_alpha.webp.png	8.4	7.9	-6%
webp\lossless_images\2-color.webp.png	5.1	0.4	-93%
webp\lossless_images\multi-color.webp.png	200.1	184.1	-8%
webp\lossless_images\simple.webp.png	93.0	41.4	-55%
webp\lossless_images\simple_xmp.webp.png	92.9	41.4	-55%
webp\lossy_images\simple-gray.webp.png	13.4	3.7	-72%
webp\lossy_images\simple-rgb.webp.png	13.8	12.3	-11%
webp\lossy_images\simple-rgb.webp.png	13.8	12.3	-10.8%

RunDevelopment · 2026-03-02T13:53:22Z

I added oxipng. Since the test runner only saves reference images that change, it's really easy to run oxipng after a reference image has been created/updated without applying it to existing references/outputs. I went with optimization level 4, because it doesn't seem to be much slower than the default and not any worse than max (=6) while being a good deal faster.

The problem might be a dependency of oxipng. It depends on libdeflater, which needs a C compiler. Not sure if this is going to be a problem for us, since oxipng is a dev dependency.

cargo tree

├── oxipng v10.1.0
│   ├── bitvec v1.0.1
│   │   ├── funty v2.0.0
│   │   ├── radium v0.7.0
│   │   ├── tap v1.0.1
│   │   └── wyz v0.5.1
│   │       └── tap v1.0.1
│   ├── crossbeam-channel v0.5.15
│   │   └── crossbeam-utils v0.8.21
│   ├── indexmap v2.13.0
│   │   ├── equivalent v1.0.2
│   │   ├── hashbrown v0.16.1
│   │   └── rayon v1.11.0 (*)
│   ├── libdeflater v1.25.2
│   │   └── libdeflate-sys v1.25.2
│   │       [build-dependencies]
│   │       └── cc v1.2.56
│   │           ├── find-msvc-tools v0.1.9
│   │           └── shlex v1.3.0

197g · 2026-03-02T19:51:02Z

Around 30% saved on average good. I did not think that it would be able to save this much.

The size savings would be for git pull. I'm afraid that updating the images at this point produces more overhead by them appearing twice in the history than the gains are worth for the more occasional --depth= checkout (during CI). We'd want to use it for all new tests though apparently.

RunDevelopment · 2026-03-02T21:33:45Z

Yeah, of course. I'm not suggesting updating existing reference images. (I only used them as a dataset to evaluate how good oxipng is.) That's why I only update images this PR adds. AFAIK, Squash & Merge should ensure that the first (larger) files I commit in this PR never end up in the git history of this repo (only my fork).

RunDevelopment · 2026-03-03T12:33:46Z

Right. Good old crc doesn't do the same thing on BE, because bytes have a different layout there. I'd like some feedback on the approach before I fix that though.

197g · 2026-03-04T02:01:17Z

tests/reference_images.rs

-            let test_crc_actual = {
-                let mut hasher = Crc32::new();
-                match test_img {
-                    DynamicImage::ImageLuma8(_)
-                    | DynamicImage::ImageLumaA8(_)
-                    | DynamicImage::ImageRgb8(_)
-                    | DynamicImage::ImageRgba8(_) => hasher.update(test_img.as_bytes()),
-                    DynamicImage::ImageLuma16(_)
-                    | DynamicImage::ImageLumaA16(_)
-                    | DynamicImage::ImageRgb16(_)
-                    | DynamicImage::ImageRgba16(_) => {
-                        for v in test_img.as_bytes().chunks(2) {
-                            hasher.update(&u16::from_ne_bytes(v.try_into().unwrap()).to_le_bytes());
-                        }
-                    }
-                    DynamicImage::ImageRgb32F(_) | DynamicImage::ImageRgba32F(_) => {
-                        for v in test_img.as_bytes().chunks(4) {
-                            hasher.update(&f32::from_ne_bytes(v.try_into().unwrap()).to_le_bytes());
-                        }
-                    }
-                    _ => panic!("Unsupported image format"),
-                }
-                hasher.finalize()
-            };


The way to use hashing is to normalize the pixel matrix byte data to big-endian like CRC did here previously in pre-processing. clone should be cheap enough for tests. (Aand should probably happen in a dedicated method outside edit_image_hash because any change we make to the layout of images may need us to further normalize).

Ah, sorry. When I said "feedback on the approach" I meant the approach to testing in general. Fixing CRC is trivial. Thanks though.

fintelia · 2026-03-04T06:21:47Z

I don't see why these changes require a complete rewrite of the reference images test. The volume of changes makes it much harder to review.

RunDevelopment · 2026-03-04T13:14:32Z

Firstly, some criticism. This isn't actionable feedback. What do you want me to do in response to this? Explain, make changes, cut back features? We have a communication delay of about 1/2 day due to time zones. Anytime I have to ask questions like this, I have to wait another day or do something hoping I guessed right. Please be clearer about what you want me to do.

Secondly, I don't see why you think that the changes I made could be implemented in the previous reference test without replacing almost everything. This PR flips the direction images are processed to ensure all images in tests/images are tested, adds a blessed mode to easily add/change test images, and now even adds a new snapshot format to allow reference images to be git-ignored. Please explain how you would implement that in the previous reference test with minimal changes.

And lastly, the amount of code changes in this PR is comparable to #2742. The only reason so many files were renamed is that I removed the CRC from their name, since it's unnecessary now. I understand that large diffs are annoying to review, but it's not like I'm making it large for no reason. If you have suggestions for how the diff could be shrunk, please make them.

I'm making this a draft until I figure out what approach to snapshot testing to take. See also here. Feedback welcome.

fintelia · 2026-03-06T07:11:30Z

So my specific feedback is to break this down into smaller PRs so that each one is a small diff against the current state. Large diffs aren't just more annoying to review, they take exponentially longer to review and have a much higher chance of stalling out without merging anything at all. They also ironically make communication delay worse: it is relatively fast to glance at a small change and give a line or two of feedback or immediately click approve, while it is much rarer to have a big chunk of time (and motivation!) to analyze a large PR or write a long comment. (To give you a sense, I saw your last comment only a few hours after you posted it but it has taken me until now to sit down and write a full reply...)

Taken together, all the changes you listed do necessarily result in a large delta. But each change individually seems much smaller and since the high level code structure isn't changing I'd expect that at least some pieces wouldn't need to change.

I'd expect a PR to remove the CRCs from filenames to be a handful of source line changes (mostly deletions) and then a ton of renamed images, with no added/removed files. I'd expect a PR to flip the processing order to only take a line or two for switching which directory was recursively walked, a few more for fixing up the filename logic, and then maybe a bunch of added reference images. The new snapshot idea probably takes more code, but it should nearly all additions rather than edits.

RunDevelopment · 2026-03-06T12:14:37Z

Sure, that's a plan. Flipping processing order is unlikely to be just a few lines changed, but we'll see.

RunDevelopment added 3 commits February 27, 2026 17:48

Make reference tests simpler to use

a4bf4c3

Fix partial feature set

798b90f

Fix compiler error

ccb031e

This was referenced Feb 28, 2026

Move non-image test assets into own folder #2805

Merged

Move bad image files into own folder #2806

Merged

RunDevelopment added 4 commits February 28, 2026 11:22

Move bad images test out

34c5946

Back to libtest-mimic

96da6db

Simplified test

84862a6

Add simple diffing for mismatching references

154cf0a

RunDevelopment added 4 commits February 28, 2026 17:52

Remove mismatch diff. It's not worth the complexity

589086c

Exclude QOI on BE arch from reference test

11dc328

Better test names

5bbcab8

Merge branch 'main' into new-ref-tests

70783c8

RunDevelopment force-pushed the new-ref-tests branch from e3d17e5 to 70783c8 Compare February 28, 2026 23:17

Integrate oxipng

82b3c9b

Update all new reference images to use oxipng

1246ddd

RunDevelopment added 2 commits March 3, 2026 01:24

Fixed minor mistakes

c7ca197

Add reference hashes and git-ignore large reference PNG

76d1a74

197g reviewed Mar 4, 2026

View reviewed changes

RunDevelopment marked this pull request as draft March 4, 2026 11:39

RunDevelopment closed this Mar 6, 2026

This was referenced Mar 6, 2026

Remove CRC from reference images #2828

Merged

Flip processing order of reference test #2829

Open

Conversation

RunDevelopment commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fintelia commented Feb 27, 2026

Uh oh!

RunDevelopment commented Feb 27, 2026

Uh oh!

197g commented Feb 27, 2026

Uh oh!

RunDevelopment commented Feb 28, 2026

Uh oh!

fintelia commented Feb 28, 2026

Uh oh!

RunDevelopment commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RunDevelopment commented Feb 28, 2026

Uh oh!

RunDevelopment commented Feb 28, 2026

Uh oh!

mstoeckl commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RunDevelopment commented Mar 1, 2026

Uh oh!

fintelia commented Mar 1, 2026

Uh oh!

RunDevelopment commented Mar 1, 2026

Uh oh!

fintelia commented Mar 2, 2026

Uh oh!

RunDevelopment commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RunDevelopment commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RunDevelopment commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

197g commented Mar 2, 2026

Uh oh!

RunDevelopment commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RunDevelopment commented Mar 3, 2026

Uh oh!

197g Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

RunDevelopment Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

fintelia commented Mar 4, 2026

Uh oh!

RunDevelopment commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fintelia commented Mar 6, 2026

Uh oh!

RunDevelopment commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

RunDevelopment commented Feb 27, 2026 •

edited

Loading

RunDevelopment commented Feb 28, 2026 •

edited

Loading

mstoeckl commented Mar 1, 2026 •

edited

Loading

RunDevelopment commented Mar 2, 2026 •

edited

Loading

RunDevelopment commented Mar 2, 2026 •

edited

Loading

RunDevelopment commented Mar 2, 2026 •

edited

Loading

RunDevelopment commented Mar 2, 2026 •

edited

Loading

RunDevelopment commented Mar 4, 2026 •

edited

Loading