Multithreading support #55

Closed
AlexAkulov opened this issue Sep 16, 2019 · 9 comments

Comments

@AlexAkulov

I noticed that this library uses only one CPU core. That may be a problem when processing large amounts of data. Do you have any ideas about how to add multithreading support?

@pierrec
Owner

pierrec commented Sep 17, 2019

We probably need to add new types, one for writing (compression) and one for reading (decompression), that handle blocks in parallel. I don't have any free time at the moment to tackle this, but if you do, feel free to send PRs!

@klauspost
Contributor

klauspost commented Oct 7, 2019

Here are some of my experiences:

It could be part of the Reader/Writer. The compression side probably has the most potential. Decompression should already be really fast on a single thread and would likely just waste a lot of CPU cycles on synchronization.

For S2 compression I made a pretty simple setup which scales fine (a rough sketch follows at the end of this comment):

  • When a new writer is created or Reset is called, a goroutine is spun up. This goroutine writes the compressed data to the supplied writer.
  • Create an "output" queue, a chan chan result, on the Writer.
  • Incoming data is collected until there is a full block or Flush/Close is called.

The reason for chan chan is that we want to preserve order. When a block is queued, a chan result is added to the output queue. When the data is ready to be written, it is sent on that queued channel. The capacity of the chan chan result is also what limits concurrency.

When enough data is collected:

  • Create an output channel: output = make(chan result) - result is just an alias for []byte here.
  • Add output to the chan chan result queue on the Writer.
  • Spin up a goroutine that compresses the block. The Writer returns.

When the data for the block has been compressed, it is sent on output so the output goroutine can write it.

A mutex-protected error state is kept in the Writer.

For cases where the user doesn't want concurrency, the Writer falls back to synchronous compression and doesn't use goroutines.

While getting concurrent compression up and running is pretty easy, the tricky parts are error handling, proper flushing, and releasing resources.

Some users may also expect that writes to the underlying output only happen while the Writer itself is being called. While this can be done, it will of course be slightly less efficient than async writes.
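
Below is a minimal sketch of the setup described above, assuming the chan-chan-of-result queue and a mutex-protected error. This is illustrative only, not the actual S2 or lz4 code; names such as result, writeLoop, blockSize and the compress placeholder are made up, and the synchronous fallback is omitted.

```go
// Minimal sketch of an order-preserving concurrent block writer.
// Hypothetical code, not the S2 or lz4 implementation.
package concwriter

import (
	"io"
	"sync"
)

const blockSize = 4 << 20 // illustrative block size

type result []byte // a compressed block, ready to be written

type Writer struct {
	w     io.Writer
	buf   []byte           // pending uncompressed data
	queue chan chan result // preserves block order; its capacity limits concurrency
	done  chan struct{}    // closed when the write loop exits

	mu  sync.Mutex
	err error
}

func NewWriter(w io.Writer, concurrency int) *Writer {
	z := &Writer{
		w:     w,
		queue: make(chan chan result, concurrency),
		done:  make(chan struct{}),
	}
	go z.writeLoop()
	return z
}

// writeLoop receives per-block channels in queue order, so blocks are written
// in order no matter which one finishes compressing first.
func (z *Writer) writeLoop() {
	defer close(z.done)
	for ch := range z.queue {
		block := <-ch
		if block == nil { // sentinel (see Flush), nothing to write
			continue
		}
		if _, err := z.w.Write(block); err != nil {
			z.setErr(err)
		}
	}
}

// compressBlock reserves an ordered slot in the queue, then compresses the
// block in its own goroutine and delivers the result on that slot.
func (z *Writer) compressBlock(src []byte) {
	ch := make(chan result, 1)
	z.queue <- ch // blocks once 'concurrency' blocks are already in flight
	go func() {
		ch <- compress(src) // placeholder for the real block compressor
	}()
}

func (z *Writer) Write(p []byte) (int, error) {
	z.buf = append(z.buf, p...)
	for len(z.buf) >= blockSize {
		z.compressBlock(z.buf[:blockSize])
		z.buf = append([]byte(nil), z.buf[blockSize:]...) // new backing array
	}
	z.mu.Lock()
	defer z.mu.Unlock()
	return len(p), z.err
}

func (z *Writer) Close() error {
	if len(z.buf) > 0 {
		z.compressBlock(z.buf)
		z.buf = nil
	}
	close(z.queue) // the write loop drains the remaining blocks, then exits
	<-z.done
	z.mu.Lock()
	defer z.mu.Unlock()
	return z.err
}

func (z *Writer) setErr(err error) {
	z.mu.Lock()
	if z.err == nil {
		z.err = err
	}
	z.mu.Unlock()
}

// compress stands in for the real LZ4/S2 block compression.
func compress(src []byte) result { return append(result(nil), src...) }
```

The queue capacity doubles as the concurrency limit: once that many blocks are in flight, compressBlock blocks until the write loop catches up.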

@pierrec
Owner

pierrec commented Oct 11, 2019

@klauspost thank you for your insights, it looks like an elegant way to deal with concurrent compression.
I will probably experiment with your technique and see if it helps for lz4.

@pierrec
Owner

pierrec commented Oct 27, 2019

First version of the concurrent writer added to branches master and v3.
No release yet as things may move a bit.
If you have time, please try it out and let me know.

@klauspost
Contributor

klauspost commented Oct 27, 2019

@pierrec 10gb.tar and rawstudio-mint14.tar crash:

panic: runtime error: slice bounds out of range [:4194316] with capacity 4194304

goroutine 2450 [running]:
github.com/pierrec/lz4.compressBlock(0xc000d04000, 0x400000, 0x800000, 0xc015e68000, 0x400000, 0x400000, 0xc0067e3f08, 0x10000, 0x10000, 0x0)
        e:/gopath/src/github.com/pierrec/lz4/block.go:164 +0x847
github.com/pierrec/lz4.(*Writer).compressBlock.func2(0xc000d04000, 0x400000, 0x800000, 0xc015e68000, 0x400000, 0x400000, 0xc00666a420, 0xc015a68000, 0x400000, 0x800000, ...)
        e:/gopath/src/github.com/pierrec/lz4/writer.go:289 +0x2ce
created by github.com/pierrec/lz4.(*Writer).compressBlock
        e:/gopath/src/github.com/pierrec/lz4/writer.go:282 +0x49d

http://mattmahoney.net/dc/10gb.html and https://files.klauspost.com/compress/rawstudio-mint14.7z

Speed-wise it is looking fine compared to S2:

Using test files here: https://github.com/klauspost/compress/tree/master/s2#performance

JSON: Worse compression, but slightly faster than fastest mode. 🆗
github-ranks-backup.bin (serialized binary data): Same speed as S2 and better compression. 👍
consensus.db (Binary DB): Slower than S2 and worse compression. 👎
gobstream: Slightly faster than S2, slightly worse compression. 🆗
enwik9 (XML): Perfectly in between the two S2 modes, both in terms of compression and speed. 🆗
2GB random data: Same speed/compression. 🆗
silesia: Same speed as S2 "better" mode, but worse compression. Small 👎

I've added numbers at the bottom here:

https://docs.google.com/spreadsheets/d/1nuNE2nPfuINCZJRMt6wFWhKpToF95I47XjSsc-1rbPQ/edit?usp=sharing

@klauspost
Contributor

klauspost commented Oct 27, 2019

@pierrec Looks nice.

One thing I noticed is that Flush() doesn't seem to keep its promise any more. AFAICT it only queues the current block; it doesn't wait for it to be written.

Reset should probably also clear z.err.
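
For illustration, here is a hedged sketch of how Flush could wait for queued blocks to reach the underlying writer, and how Reset could clear the sticky error, expressed against the hypothetical Writer sketched earlier in this thread (not the lz4 API):

```go
// Flush queues any buffered data and then waits until the write loop has
// written everything queued so far. Because the queue is FIFO, sending on the
// unbuffered sentinel channel only completes once writeLoop has reached it,
// i.e. after all earlier blocks have been written.
func (z *Writer) Flush() error {
	if len(z.buf) > 0 {
		z.compressBlock(z.buf)
		z.buf = nil
	}
	sentinel := make(chan result) // unbuffered on purpose
	z.queue <- sentinel
	sentinel <- nil // unblocks only when writeLoop has caught up

	z.mu.Lock()
	defer z.mu.Unlock()
	return z.err
}

// Reset reattaches the Writer to a new destination and clears the sticky
// error state, as suggested above.
func (z *Writer) Reset(w io.Writer) {
	_ = z.Close() // drain and stop the previous write loop, ignore its error
	z.mu.Lock()
	z.w, z.buf, z.err = w, nil, nil
	z.mu.Unlock()
	z.queue = make(chan chan result, cap(z.queue))
	z.done = make(chan struct{})
	go z.writeLoop()
}
```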

Another minor thing: this attempt at BCE (bounds-check elimination) doesn't do anything positive. For the following line, dst[di-2], dst[di-1], the compiler still inserts a check in case the index is negative:

go build -gcflags="-d=ssa/check_bce/debug=1" confirms this:

.\block.go:169:10: Found IsInBounds
.\block.go:170:6: Found IsInBounds
.\block.go:170:17: Found IsInBounds

The best I could get was to simply remove the line, which brings it down from 3 to 2 bounds checks.
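
For illustration, a hedged, hypothetical reconstruction of the shape of code under discussion (not the actual block.go), showing why such a hint can fail to pay off when the index expression can go negative:

```go
// Hypothetical sketch, not the actual lz4 block.go. A leading "hint" read is a
// common BCE idiom, but it is itself bounds-checked, and it cannot prove that
// di-2 is non-negative (di-1 >= 0 only implies di-2 >= -1), so a check on the
// first store can remain. Inspect the result with:
//   go build -gcflags="-d=ssa/check_bce/debug=1" .
package bce

func putUint16(dst []byte, di int, v uint16) {
	_ = dst[di-1] // attempted BCE hint
	dst[di-2] = byte(v)
	dst[di-1] = byte(v >> 8)
}
```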

@pierrec
Owner

pierrec commented Nov 2, 2019

@klauspost thanks a lot for this analysis!
I have addressed the issues you mentioned. I'm running into some strange problems when running the tests with concurrency enabled; still investigating.

@klauspost
Contributor

Looks very good! The crashes are fixed.

10gb.tar - Slightly slower than S2, slightly better compression 🆗
rawstudio-mint14.tar - Slightly slower than S2, worse compression. 👎

So it is definitely "competitive". The margins to S2 are pretty slim in most cases and I think the only difference really comes down to the encoding chosen for each format.

I included LZ4 as a very competitive format for Go compression in a talk I just gave.

@pierrec
Owner

pierrec commented Nov 15, 2019

Thanks. It has not been released yet because there are still some data races, so bear with me while I fix them. Once that is done (hopefully this weekend) I will issue a release.
