
Initial implementation of InflaterSource and DeflaterSink #1427

Merged: 4 commits into master from jwilson.0208.deflater_sink_inflater_source on Feb 13, 2024

Conversation

swankjesse (Member)

No description provided.

swankjesse (Member Author):

There’s a decent amount of complexity in the two new loops, writeBytesFromSource() and readBytesToTarget().

They’re both almost the same, except:

  • writeBytesFromSource() counts how many bytes it consumes. It doesn’t care how many are produced!
  • readBytesToTarget() counts how many bytes it produces. It doesn’t care how many are consumed!

Critically, both the inflater and the deflater are tested on ratios from 0.001% through 700%. That makes me confident we’re handling cases where a small input produces a big output, and vice versa.

I originally tried to do this as one big function and that sucked. Doing it as two functions is better, though the helper functions are themselves a bit awkward.
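
For illustration, here’s a minimal standalone sketch of that consumed-vs-produced asymmetry using plain java.util.zip classes. The function names echo the discussion above, but the bodies are hypothetical sketches, not the PR’s actual loops:

```kotlin
import java.util.zip.Deflater
import java.util.zip.Inflater

// Hypothetical "write"-side loop: track only the bytes *consumed* from the input.
// (Illustrative sketch, not the PR's writeBytesFromSource().)
fun feedInput(deflater: Deflater, input: ByteArray, scratch: ByteArray): Long {
  val bytesReadBefore = deflater.bytesRead
  deflater.setInput(input, 0, input.size)
  while (!deflater.needsInput()) {
    // Drain output so the deflater keeps accepting input; how much it
    // produces is the other loop's concern.
    deflater.deflate(scratch, 0, scratch.size)
  }
  return deflater.bytesRead - bytesReadBefore // Bytes consumed from `input`.
}

// Hypothetical "read"-side loop: track only the bytes *produced* into the output.
// (Illustrative sketch, not the PR's readBytesToTarget().)
fun drainOutput(inflater: Inflater, output: ByteArray): Int {
  var produced = 0
  while (produced < output.size && !inflater.finished()) {
    val n = inflater.inflate(output, produced, output.size - produced)
    if (n == 0) break // Needs more input (or a preset dictionary); stop for now.
    produced += n
  }
  return produced // Bytes written into `output`; consumed input isn't tracked here.
}
```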

I’m tempted to follow up by replacing the DataProcessor’s ByteArray property with a Segment. I’d need to rework some other stuff, but it might be a net reduction in code, which I’d like. The only real gotcha there is that the segment’s pos+limit are potentially different from next_out+avail_out in cases where we don’t want to fill up the entire segment.
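
To make that gotcha concrete, here’s a hypothetical illustration. The pos/limit fields mirror Okio’s internal Segment, but these classes, SEGMENT_SIZE, and the maxBytesToProduce cap are stand-ins for the sketch, not the actual DataProcessor:

```kotlin
// Stand-in types for illustration only; not Okio's Segment or the DataProcessor.
const val SEGMENT_SIZE = 8192

class FakeSegment(
  val data: ByteArray = ByteArray(SEGMENT_SIZE),
  var pos: Int = 0,   // First readable byte.
  var limit: Int = 0, // First writable byte.
)

class FakeOutputWindow(var nextOutOffset: Int, var availOut: Int)

// The segment's writable range is [limit, SEGMENT_SIZE), but avail_out may
// need to be smaller than that when we don't want to fill the whole segment.
fun pointOutputAt(segment: FakeSegment, maxBytesToProduce: Int): FakeOutputWindow {
  return FakeOutputWindow(
    nextOutOffset = segment.limit,
    availOut = minOf(SEGMENT_SIZE - segment.limit, maxBytesToProduce),
  )
}
```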

Next steps: I don’t think this is particularly useful as-is, and I promised a release for Feb 9. I’d like to cut that release beforehand and keep working on this into next week.

Comment on lines +97 to +98
// If we've produced a full segment, emit it. This blocks writing to the target.
target.emitCompleteSegments()
Member:

Better to do this inside the loop or outside? I would have thought outside since this is called once per write on the source.

swankjesse (Member Author):

Good question! I think inside is correct specifically for inflate + deflate.

Delivering data one segment at a time strikes a good balance:

  • It limits how much memory we hold. If you have a bunch of these on a web server, you don’t have to worry about giant allocations when data decompresses tremendously!
  • It reduces latency. We can deliver data to the user (or network, or whatever) as soon as it’s ready.
  • We still get some benefit from batch processing. Expensive operations like syscalls happen once every 8 KiB, which amortizes their cost.

We’d get maximum efficiency (fewer syscalls) if we delivered more segments at a time. But I think the real optimization there is to do a once-per-decade review of Segment.SIZE and see if 8 KiB makes as much sense in 2024 as it did in 2014.
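
As a rough sketch of the loop shape being discussed (not the PR’s actual code; it uses java.util.zip.Deflater plus Okio’s public BufferedSink API purely for illustration):

```kotlin
import java.util.zip.Deflater
import okio.BufferedSink

// Hypothetical deflate loop. Calling emitCompleteSegments() inside the loop
// forwards each completed segment downstream as soon as it exists, instead of
// letting the whole compressed result accumulate in memory first.
fun deflateAll(input: ByteArray, deflater: Deflater, target: BufferedSink) {
  deflater.setInput(input)
  deflater.finish()
  val scratch = ByteArray(8192)
  while (!deflater.finished()) {
    val produced = deflater.deflate(scratch, 0, scratch.size)
    target.buffer.write(scratch, 0, produced)
    // Inside the loop: push any complete segments to the underlying sink now.
    target.emitCompleteSegments()
  }
}
```

Moving emitCompleteSegments() outside the loop would batch everything into a single downstream write, at the cost of buffering the entire compressed output in memory first.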

swankjesse (Member Author):

(Last time I considered dynamic segment sizes I concluded it was a bad fit. Pooling gets weirder, for one.)

Member:

Yeah, I guess I was thinking more about whether it makes sense for a single Sink.write call to result in 500 Sink.write calls beneath this, or whether it should be a single call with a multi-segment buffer. I suppose in practice the compression ratio isn’t going to be astronomical, so it doesn’t really matter.

swankjesse merged commit 18dfed0 into master on Feb 13, 2024. 11 checks passed.
swankjesse deleted the jwilson.0208.deflater_sink_inflater_source branch on February 13, 2024 at 04:08.