Faster u32 parsing #428

anforowicz · 2023-11-15T22:51:46Z

This series of 5 commits refactors how the code parses u32 values. The cumulative impact on the runtime of the decode/generated-png:noncompressed-8x8.png benchmark looks as follows:

[-7.1475% -6.9281% -6.6885%] (p = 0.00 < 0.05)
[-7.7189% -7.5262% -7.3241%] (p = 0.00 < 0.05)
[-7.4631% -7.1972% -6.9319%] (p = 0.00 < 0.05)

I expect that this PR will have minimal impact on bigger images, so I didn't even try running the other benchmarks (I assume that any changes observed in the other benchmarks would be just statistical noise).

I think that this PR turned out quite nicely from code simplification and readability perspective. Even if there was no performance impact, I think it might be desirable to land it from this perspective.

Various notes:

I wondered whether it might be desirable to add some additional test coverage for partially read u32 values.
- FWIW this path seems to be covered by decoder::tests::no_data_dup_on_finish - only one test... :-(
- Once BufRead changes land, maybe we should tweak the render_images test to work with BufReader::with_capacity(1, <original input>) which should test dripping the input byte-by-byte.
- I did try to run all the tests after temporarily tweaking BufReader::with_capacity calls in src/decoder/mod.rs so that a buffer of only 1 byte was used.
I wondered if U32ValueKind::Type after an IDAT chunk really needs to go through 2 state transitions (with a detour via Decoded::ImageDataFlushed during which self.state is set to the old/unchanged value). This is probably in the unnecessary-microoptimizations territory, but I was hoping to simplify the code a bit more. Still... I don't see how to do this cleanly.

This way removes a separate `State::Signature` variant and handles the PNG signature via two `State::U32` states. This is desirable because: * This helps to reuse the performance gains for `State::U32` (i.e. avoid accumulating the signature byte-by-byte if the input buffer already has all its bytes) * Avoiding separate code also has additional benefits: - Less pressure on the instructions cache - Ability to reuse branch prediction for `U32`-related code The modified code already has reasonable test coverage via http://www.schaik.com/pngsuite/#corrupted

anforowicz · 2023-11-27T16:27:44Z

@fintelia, can you PTAL?

fintelia · 2023-11-28T01:07:58Z

👍 This change looks good to me, but I haven't yet had a chance to do a detailed read through. Hopefully should get to that soon

anforowicz mentioned this pull request Nov 16, 2023

Cap buffer sizes via ZlibStream::set_max_total_output. #429

Merged

anforowicz added 2 commits November 27, 2023 15:33

Introduce State::new_u32 helper function

ea793b4

Consolidate U32... enum variants into a single one.

2074bf0

anforowicz mentioned this pull request Nov 27, 2023

Option to ignore ancillary chunks with invalid CRC #430

Merged

anforowicz force-pushed the faster-u32-parsing branch from b9eb5fa to 92a7e4f Compare November 27, 2023 15:47

anforowicz added 3 commits November 27, 2023 15:51

Extract a separate parse_u32 method

7552a63

Parse more than 1 byte of U32 if more bytes are available in the input

b294881

anforowicz force-pushed the faster-u32-parsing branch from 92a7e4f to d51188f Compare November 27, 2023 15:54

anforowicz mentioned this pull request Nov 28, 2023

Performance: Require BufRead instead of just Read for inputs. #427

Draft

fintelia merged commit dbe2cc6 into image-rs:master Dec 16, 2023
19 checks passed

anforowicz deleted the faster-u32-parsing branch December 17, 2023 17:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster u32 parsing #428

Faster u32 parsing #428

anforowicz commented Nov 15, 2023

anforowicz commented Nov 27, 2023

fintelia commented Nov 28, 2023

Faster u32 parsing #428

Faster u32 parsing #428

Conversation

anforowicz commented Nov 15, 2023

anforowicz commented Nov 27, 2023

fintelia commented Nov 28, 2023