
Initial implementation of InflaterSource and DeflaterSink #1427

Merged: 4 commits into master from jwilson.0208.deflater_sink_inflater_source on Feb 13, 2024

Conversation

swankjesse (Member)

No description provided.

swankjesse (Member Author):

There’s a decent amount of complexity in the two new loops, writeBytesFromSource() and readBytesToTarget().

They’re both almost the same, except:

  • writeBytesFromSource() counts how many bytes it consumes. It doesn’t care how many are produced!
  • readBytesToTarget() counts how many bytes it produces. It doesn’t care how many are consumed!

Critically, both the inflater and the deflater are tested on ratios from 0.001% through 700%. That makes me confident we’re handling cases where a small input produces a big output, and vice versa.

I originally tried to do this as one big function and that sucked. Doing it as two functions is better, though the helper functions are themselves a bit awkward.
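
For illustration, here’s a minimal standalone sketch of that consumed-vs-produced asymmetry using plain java.util.zip classes. The function names echo the discussion above, but the bodies are hypothetical sketches, not the PR’s actual loops:

```kotlin
import java.util.zip.Deflater
import java.util.zip.Inflater

// Hypothetical "write"-side loop: track only the bytes *consumed* from the input.
// (Illustrative sketch, not the PR's writeBytesFromSource().)
fun feedInput(deflater: Deflater, input: ByteArray, scratch: ByteArray): Long {
  val bytesReadBefore = deflater.bytesRead
  deflater.setInput(input, 0, input.size)
  while (!deflater.needsInput()) {
    // Drain output so the deflater keeps accepting input; how much it
    // produces is the other loop's concern.
    deflater.deflate(scratch, 0, scratch.size)
  }
  return deflater.bytesRead - bytesReadBefore // Bytes consumed from `input`.
}

// Hypothetical "read"-side loop: track only the bytes *produced* into the output.
// (Illustrative sketch, not the PR's readBytesToTarget().)
fun drainOutput(inflater: Inflater, output: ByteArray): Int {
  var produced = 0
  while (produced < output.size && !inflater.finished()) {
    val n = inflater.inflate(output, produced, output.size - produced)
    if (n == 0) break // Needs more input (or a preset dictionary); stop for now.
    produced += n
  }
  return produced // Bytes written into `output`; consumed input isn't tracked here.
}
```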

I’m tempted to follow up by replacing the DataProcessor’s ByteArray property with a Segment. I’d need to rework some other stuff, but it might be a net reduction in code, which I’d like. The only real gotcha there is that the segment’s pos+limit are potentially different from next_out+avail_out in cases where we don’t want to fill up the entire segment.
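
To make that gotcha concrete, here’s a hypothetical illustration. The pos/limit fields mirror Okio’s internal Segment, but these classes, SEGMENT_SIZE, and the maxBytesToProduce cap are stand-ins for the sketch, not the actual DataProcessor:

```kotlin
// Stand-in types for illustration only; not Okio's Segment or the DataProcessor.
const val SEGMENT_SIZE = 8192

class FakeSegment(
  val data: ByteArray = ByteArray(SEGMENT_SIZE),
  var pos: Int = 0,   // First readable byte.
  var limit: Int = 0, // First writable byte.
)

class FakeOutputWindow(var nextOutOffset: Int, var availOut: Int)

// The segment's writable range is [limit, SEGMENT_SIZE), but avail_out may
// need to be smaller than that when we don't want to fill the whole segment.
fun pointOutputAt(segment: FakeSegment, maxBytesToProduce: Int): FakeOutputWindow {
  return FakeOutputWindow(
    nextOutOffset = segment.limit,
    availOut = minOf(SEGMENT_SIZE - segment.limit, maxBytesToProduce),
  )
}
```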

Next steps: I don’t think this is particularly useful as-is, and I promised a release for Feb 9. I’d like to cut that release beforehand and keep working on this into next week.

Comment on lines +97 to +98
// If we've produced a full segment, emit it. This blocks writing to the target.
target.emitCompleteSegments()
Member:

Better to do this inside the loop or outside? I would have thought outside since this is called once per write on the source.

swankjesse (Member Author):

Good question! I think inside is correct specifically for inflate + deflate.

Delivering data one segment at a time strikes a good balance:

  • It limits how much memory we hold. If you have a bunch of these on a web server, you don’t have to worry about giant allocations when data decompresses tremendously!
  • It reduces latency. We can deliver data to the user (or network, or whatever) as soon as it’s ready.
  • We still get some benefit from batch processing. Expensive operations like syscalls happen once every 8 KiB, which amortizes their cost.

We’d get maximum efficiency (fewer syscalls) if we delivered more segments at a time. But I think the real optimization there is to do a once-per-decade review of Segment.SIZE and see if 8 KiB makes as much sense in 2024 as it did in 2014.
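
As a rough sketch of the loop shape being discussed (not the PR’s actual code; it uses java.util.zip.Deflater plus Okio’s public BufferedSink API purely for illustration):

```kotlin
import java.util.zip.Deflater
import okio.BufferedSink

// Hypothetical deflate loop. Calling emitCompleteSegments() inside the loop
// forwards each completed segment downstream as soon as it exists, instead of
// letting the whole compressed result accumulate in memory first.
fun deflateAll(input: ByteArray, deflater: Deflater, target: BufferedSink) {
  deflater.setInput(input)
  deflater.finish()
  val scratch = ByteArray(8192)
  while (!deflater.finished()) {
    val produced = deflater.deflate(scratch, 0, scratch.size)
    target.buffer.write(scratch, 0, produced)
    // Inside the loop: push any complete segments to the underlying sink now.
    target.emitCompleteSegments()
  }
}
```

Moving emitCompleteSegments() outside the loop would batch everything into a single downstream write, at the cost of buffering the entire compressed output in memory first.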

swankjesse (Member Author):

(Last time I considered dynamic segment sizes I concluded it was a bad fit. Pooling gets weirder, for one.)

Member:

Yeah, I guess I was thinking more about whether it makes sense for a single Sink.write call to result in 500 Sink.write calls beneath this, or whether it should be a single call with a multi-segment buffer. I suppose in practice the compression ratio isn’t going to be astronomical, so it doesn’t really matter.

swankjesse merged commit 18dfed0 into master on Feb 13, 2024. 11 checks passed.
swankjesse deleted the jwilson.0208.deflater_sink_inflater_source branch on February 13, 2024 at 04:08.