
Add stateless dictionary support #216

Merged: 5 commits into master from stateless-dictionary-support on Feb 4, 2020

Conversation

@klauspost (Owner)

Enable up to 8K dictionary for stateless compression.

@nhooyr commented Feb 1, 2020

Unfortunately I can't use this in WebSockets due to the 8 KB limit, since I need to support at least the default 32 KB rolling window. I suppose I could fall back. 🤔

@klauspost (Owner Author)

Why? 8K should work just as well as 32K.

@nhooyr commented Feb 1, 2020

Why? 8K should work just as well as 32K.

Not all clients support configurable window sizes. E.g. every other Go WebSocket library does not, since the standard library does not allow configuring it.

@nhooyr commented Feb 1, 2020

It's possible I'm confused, but as far as I understand the window size must be the same on both client and server, so the truncation that would be performed here wouldn't work.

@klauspost (Owner Author) commented Feb 1, 2020

I think you may be misunderstanding what the dictionary is for. It is purely there for efficiency: the compressor can have any history size, and the dictionary just gives better compression.

You can use the current stateless implementation, and it will work fine with any client. It will just compress slightly worse.
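For example, a minimal sketch of per-message stateless compression with a dictionary, using StatelessDeflate from this repository's flate package (the history contents here are illustrative, and the dictionary is capped at 8 KB to stay within the stateless limit; the output is an ordinary deflate stream any decoder with a 32 KB window can read):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/flate"
)

func main() {
	const maxDict = 8 << 10 // 8 KB stateless dictionary limit

	// Illustrative sliding-window history kept by the connection.
	history := bytes.Repeat([]byte("previous messages "), 400)
	if len(history) > maxDict {
		history = history[len(history)-maxDict:]
	}
	msg := []byte("hello hello hello hello")

	var out bytes.Buffer
	// eof=true finishes the deflate stream for this single message.
	if err := flate.StatelessDeflate(&out, msg, true, history); err != nil {
		panic(err)
	}
	fmt.Println("compressed size:", out.Len())
}
```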

@nhooyr commented Feb 1, 2020

https://godoc.org/compress/flate#NewWriterDict

The compressed data written to w can only be decompressed by a Reader initialized with the same dictionary.

Why do the docs state that if the dictionary is purely for efficiency?

@klauspost (Owner Author) commented Feb 1, 2020

That is if it is a pre-shared dictionary, meaning something that is initialized before the stream starts.

The decoder must have the dictionary, since the encoder may reference it. But the encoder does not have to reference the dictionary at all, or only parts of it, which is why an 8K dictionary is fine even if the decoder buffers up to 32K.

For blocks, the entire 32K will be referenced because the dictionary is a sliding window from the current decompression point.

You are just limiting how far back the encoder can reference data. The decoder must be able to handle references 32K back, but from the point of the compressor it is just a matter of efficiency how far back to reference.

@klauspost (Owner Author)

In terms of the documentation, what it means is that the decoder must have the dictionary to decompress the data correctly.
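To illustrate the pre-shared dictionary case the compress/flate docs describe (a small sketch with made-up data), the reader must be initialized with the same dictionary because the writer may emit back-references into it:

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

func main() {
	// Dictionary agreed on by both sides before any data is exchanged.
	dict := []byte("the quick brown fox jumps over the lazy dog")
	payload := []byte("the quick brown fox says hello")

	var compressed bytes.Buffer
	zw, err := flate.NewWriterDict(&compressed, flate.BestSpeed, dict)
	if err != nil {
		panic(err)
	}
	zw.Write(payload)
	zw.Close()

	// Decompressing requires priming the reader with the same dictionary,
	// since the compressed stream may reference bytes from it.
	zr := flate.NewReaderDict(bytes.NewReader(compressed.Bytes()), dict)
	out, _ := io.ReadAll(zr)
	zr.Close()
	fmt.Printf("%s\n", out)
}
```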

@nhooyr commented Feb 1, 2020

Ahh makes perfect sense. Will test in my lib this weekend.

@klauspost klauspost merged commit b7ccab8 into master Feb 4, 2020
@klauspost klauspost deleted the stateless-dictionary-support branch February 4, 2020 17:44
@klauspost (Owner Author)

@nhooyr Have you had a chance to test it?

@nhooyr commented Feb 13, 2020

Thanks for reminding me; I finally finished the rewrite with compression.

Here's the commit in which I add your library.

coder/websocket@aff9d13

All my tests pass, including the entire autobahn test suite with race detection enabled, so it definitely works well. I can't comment on performance yet as I haven't written benchmarks. As soon as I do, I'll let you know.

@nhooyr commented Feb 13, 2020

The one thing I was curious about is why it doesn't return the number of bytes written. I need this information to accurately implement io.Writer: if an error occurs, it's possible that not all of p was written.

@nhooyr commented Feb 13, 2020

Updated commit: coder/websocket@7dbe93c

@klauspost (Owner Author)

@nhooyr Cool. I will take a peek :)

why it doesn't return the number of bytes written.

It doesn't really make much sense to keep track of that, since there isn't a 1:1 mapping between input and output. x bytes in will map to y bytes out, and a partial write will result in garbage data.

Most writers consider a failed write unrecoverable, including the stdlib, which persists the error.

If you want to keep track of it, you can buffer the output so writes cannot fail. In general, though, I find that trying to recover from a broken write is unreliable at best, but I'll leave that to you.
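A minimal sketch of that buffering approach, assuming a hypothetical writeFrame helper that sends a complete frame to the peer (not part of either library):

```go
package main

import (
	"bytes"

	"github.com/klauspost/compress/flate"
)

// compressAndSend is a hypothetical helper. Deflate output is staged in a
// bytes.Buffer, which cannot fail, so the only fallible write is the final
// one to the peer, and the caller can report a clean n for io.Writer.
func compressAndSend(p, dict []byte, writeFrame func([]byte) error) (int, error) {
	var buf bytes.Buffer
	if err := flate.StatelessDeflate(&buf, p, false, dict); err != nil {
		return 0, err
	}
	// On error, report that none of p was written rather than a partial count.
	if err := writeFrame(buf.Bytes()); err != nil {
		return 0, err
	}
	return len(p), nil
}
```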

@nhooyr commented Feb 13, 2020

Fair. I noticed the stdlib writer just returns n = 0 if any error occurs.

@nhooyr commented Feb 13, 2020

Why don't we just encapsulate that into StatelessDeflate and have it return n, so anyone using it doesn't have to think about it, and to match stdlib flate's behaviour?

@klauspost (Owner Author)

I'm not sure what problem you are trying to solve, but there is also NewStatelessWriter if you want regular Write calls.
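For reference, a small sketch of NewStatelessWriter: it wraps a destination in an io.WriteCloser so callers get the usual (n, err) contract, with each Write compressed independently and no state kept between calls:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/flate"
)

func main() {
	var out bytes.Buffer
	w := flate.NewStatelessWriter(&out)
	// Write returns the usual (n, err) pair of an io.Writer.
	n, err := w.Write([]byte("hello hello hello hello hello"))
	if err != nil {
		panic(err)
	}
	w.Close()
	fmt.Printf("wrote %d bytes, compressed to %d bytes\n", n, out.Len())
}
```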

@nhooyr commented Feb 14, 2020

See #222

@nhooyr commented Feb 15, 2020

My benchmarks indicate your library is 2x faster than the stdlib, which is sweet. For non-stateless compression, it seems to allocate 2x as much, though. See #107 (comment)

As for the stateless compression, I'm seeing 19301 B allocated per op in my benchmark for echoing a 512-byte random message with an 8 KB dictionary. With no dictionary, it's about 100 B per op. I'm not sure how much memory it's using internally between calls; there doesn't seem to be an easy way to measure since it's using a pool to keep memory usage down.
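A sketch of how such per-op numbers can be gathered directly against StatelessDeflate (package name, message, and dictionary contents are made up; this is not the benchmark from nhooyr.io/websocket):

```go
package stateless_test

import (
	"bytes"
	"strings"
	"testing"

	"github.com/klauspost/compress/flate"
)

func BenchmarkStatelessDict(b *testing.B) {
	msg := []byte(strings.Repeat("1234", 128))            // 512 B, compressible
	dict := bytes.Repeat([]byte("0123456789abcdef"), 512) // 8 KB dictionary
	var out bytes.Buffer
	b.ReportAllocs()
	b.SetBytes(int64(len(msg)))
	for i := 0; i < b.N; i++ {
		out.Reset()
		if err := flate.StatelessDeflate(&out, msg, true, dict); err != nil {
			b.Fatal(err)
		}
	}
}
```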

@nhooyr commented Feb 15, 2020

Disabled the bitWriterPool and I'm seeing 36931 B/op, which seems really good, maybe too good to be true. Is there any other pool I need to disable?

@klauspost (Owner Author)

@nhooyr The actual allocations depend on whether the content is expected to be compressed.

You are probably hitting this:

if dst.n == 0 {
or maybe the check right below.

This is an early bailout for content that cannot be compressed, in which case no Huffman tables are generated and the content is just written uncompressed.
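A quick way to see the bailout in action (a sketch; the inputs are illustrative): repeated text compresses well, while random bytes should take the uncompressed path and come out roughly the same size or slightly larger.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
	"strings"

	"github.com/klauspost/compress/flate"
)

func main() {
	compressible := []byte(strings.Repeat("1234", 128)) // 512 B of repeated text
	incompressible := make([]byte, 512)
	rand.Read(incompressible) // 512 B of random data

	for _, in := range [][]byte{compressible, incompressible} {
		var out bytes.Buffer
		if err := flate.StatelessDeflate(&out, in, true, nil); err != nil {
			panic(err)
		}
		fmt.Printf("%d -> %d bytes\n", len(in), out.Len())
	}
}
```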

@nhooyr commented Feb 16, 2020

So I changed my benchmark to use []byte(strings.Repeat("1234", 128)) and I'm seeing the same behaviour. Added printlns to confirm that neither check is being hit.

@nhooyr commented Feb 16, 2020

Although with no dictionary it now allocates 15 KB/op.

@nhooyr commented Feb 16, 2020

Nvm, ignore the last comment; it was due to the bit writer pool being disabled.

$ go test -bench=BenchmarkConn -run='^$' -memprofile /tmp/mem.prof -memprofilerate 1
goos: darwin
goarch: amd64
pkg: nhooyr.io/websocket
BenchmarkConn/disabledCompress-8         	   51957	     22579 ns/op	  22.68 MB/s	      32 B/op	       2 allocs/op
BenchmarkConn/compress-8                 	   11718	    102324 ns/op	   5.00 MB/s	   19378 B/op	       4 allocs/op
BenchmarkConn/compressNoContext-8        	   16272	     75338 ns/op	   6.80 MB/s	     106 B/op	       4 allocs/op
PASS
ok  	nhooyr.io/websocket	5.835s

On https://github.com/nhooyr/websocket/tree/compress and with the bit writer pool enabled in klauspost/compress.

@nhooyr commented Feb 16, 2020

It's also definitely compressing; I'm seeing 16 total bytes written (including the WebSocket header) for the 512-byte message.

@klauspost (Owner Author)

@nhooyr Did you ever doubt that :D

@nhooyr commented Feb 16, 2020

Honestly, I did; the amount of memory being allocated is impressively low.

@klauspost (Owner Author)

It can use the stack since it is stateless, and other tricks since it is only really useful for smaller blocks.

It is (of course) quite inefficient, but something is better than nothing and the speed is decent. Most important, of course, is that it doesn't keep memory between calls.
