Initial implementation of InflaterSource and DeflaterSink #1427
Conversation
There's a decent amount of complexity in the two new loops. They're both almost the same, except:

Critically, both the inflater and deflater are tested on compression ratios from 0.001% through 700%. That makes me confident that we're handling cases where a small input produces a big output, and vice versa.

I originally tried to do this as one big function and that sucked. Doing it as two functions is better, though the helper functions are themselves a bit awkward. I'm tempted to follow up by replacing the DataProcessor's

Next steps - I don't think this is particularly useful as-is, and I promised a release for Feb 9. I'd like to cut that release beforehand and keep working on this into next week.
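For context, here's a minimal sketch of the loop shape the two functions share, assuming a java.util.zip.Deflater and okio's public Buffer/BufferedSink API. It's illustrative only: the function name and structure are hypothetical, and the PR's actual code may work on okio's internal segments directly.

```kotlin
import java.util.zip.Deflater
import okio.Buffer
import okio.BufferedSink

// Hypothetical sketch, not the PR's code: read a chunk of input, run it
// through the deflater, and hand off each completed segment as we go.
fun deflate(deflater: Deflater, source: Buffer, target: BufferedSink) {
  val input = ByteArray(8192)
  val output = ByteArray(8192)
  while (!source.exhausted()) {
    // Feed the deflater one chunk of input.
    val inputCount = source.read(input, 0, input.size)
    deflater.setInput(input, 0, inputCount)

    // Drain everything the deflater can produce for that input.
    while (!deflater.needsInput()) {
      val outputCount = deflater.deflate(output, 0, output.size)
      if (outputCount > 0) {
        target.buffer.write(output, 0, outputCount)
        // Deliver full 8 KiB segments downstream immediately.
        target.emitCompleteSegments()
      }
    }
  }
}
```

The inflate loop would have the same shape, with Inflater.inflate in place of Deflater.deflate.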
// If we've produced a full segment, emit it. This blocks writing to the target.
target.emitCompleteSegments()
Better to do this inside the loop or outside? I would have thought outside, since this is called once per write on the source.
Good question! I think inside is correct specifically for inflate + deflate.
Delivering data one segment at a time is pretty balanced:
- It limits how much memory we hold. If you have a bunch of these on a web server, you don't have to worry about giant allocations when data decompresses tremendously!
- It reduces latency. We can deliver data to the user (or network, or whatever) as soon as it's ready.
- We still get some benefits from batch processing. We do expensive operations like syscalls once every 8 KiB to amortize the cost.
We’d get maximum efficiency (fewer syscalls) if we delivered more segments at a time. But I think the real optimization there is to do a once-per-decade review of Segment.SIZE and see if 8 KiB makes as much sense in 2024 as it did in 2014.
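To make the one-segment-at-a-time behavior concrete, here's a hedged illustration using okio's public API; the byte counts are arbitrary examples, not anything from the PR:

```kotlin
import okio.blackholeSink
import okio.buffer

fun main() {
  val sink = blackholeSink().buffer()

  // Write into the buffer directly so nothing is forwarded automatically.
  sink.buffer.write(ByteArray(20_000)) // two full 8 KiB segments + a 3,616-byte tail

  // Forwards only the complete segments to the underlying sink...
  sink.emitCompleteSegments()

  // ...keeping the incomplete tail buffered for later.
  println(sink.buffer.size) // 3616
}
```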
(Last time I considered dynamic segment sizes I concluded it was a bad fit. Pooling gets weirder, for one.)
Yeah, I guess I was thinking more about whether it makes sense for a single Sink.write call to result in 500 Sink.write calls beneath this, or whether it should be a single call with a multi-segment buffer. I suppose in practice the compression ratio isn't going to be astronomical, so it doesn't really matter.
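For illustration, that outside-the-loop alternative would look roughly like this (hypothetical, reusing the sketch names from above): accumulate all output in the buffer, then make one multi-segment write downstream.

```kotlin
import java.util.zip.Deflater
import okio.Buffer
import okio.BufferedSink

// Hypothetical batched variant: fewer downstream writes, but the whole
// output is held in memory until the end.
fun deflateBatched(deflater: Deflater, source: Buffer, target: BufferedSink) {
  val input = ByteArray(8192)
  val output = ByteArray(8192)
  while (!source.exhausted()) {
    val inputCount = source.read(input, 0, input.size)
    deflater.setInput(input, 0, inputCount)
    while (!deflater.needsInput()) {
      val outputCount = deflater.deflate(output, 0, output.size)
      if (outputCount > 0) target.buffer.write(output, 0, outputCount)
    }
  }
  // A single downstream write covering every buffered segment.
  target.emit()
}
```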