Multithreading support #55

Closed
AlexAkulov opened this issue Sep 16, 2019 · 9 comments

Comments

@AlexAkulov

I noticed that this library uses only one CPU core. That may be a problem when processing large amounts of data. Do you have any ideas about how to add multithreading support?

@pierrec
Owner

pierrec commented Sep 17, 2019

We probably need to add new types, one for writing (compression) and one for reading (decompression), that handle blocks in parallel. I don't have any free time at the moment to tackle this, but if you do, feel free to send PRs!

@klauspost
Contributor

klauspost commented Oct 7, 2019

Here are some of my experiences:

It could be part of the Reader/Writer. The compression side probably has the most potential. Decompression should already be really fast on a single thread and would likely just waste a lot of CPU cycles on synchronization.

For S2 compression I made a pretty simple setup which scales fine (a rough sketch follows at the end of this comment):

  • When a new writer is created or Reset is called, a goroutine is spun up. This goroutine writes the compressed data to the supplied writer.
  • Create an "output" queue, a chan chan result, on the Writer.
  • Incoming data is collected until there is a full block or Flush/Close is called.

The reason for chan chan is that we want to preserve order. When a block is queued, a chan result is added to the output queue. When the data is ready to be written, it is sent on that queued channel. The capacity of the chan chan result is also what limits concurrency.

When enough data is collected:

  • Create an output channel: output = make(chan result) - result is just an alias for []byte here.
  • Add output to the chan chan result queue on the Writer.
  • Spin up a goroutine that compresses the block. The Writer returns.

When the data for the block has been compressed, it is sent on output so the output goroutine can write it.

A mutex-protected error state is kept in the Writer.

For cases where the user doesn't want concurrency, the Writer falls back to synchronous compression and doesn't use goroutines.

While getting concurrent compression up and running is pretty easy, the tricky parts are error handling, proper flushing, and releasing resources.

Some users may also expect that writes to the underlying output only happen while the Writer itself is being called. While this can be done, it will of course be slightly less efficient than async writes.
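
Below is a minimal sketch of the setup described above, assuming the chan-chan-of-result queue and a mutex-protected error. This is illustrative only, not the actual S2 or lz4 code; names such as result, writeLoop, blockSize and the compress placeholder are made up, and the synchronous fallback is omitted.

```go
// Minimal sketch of an order-preserving concurrent block writer.
// Hypothetical code, not the S2 or lz4 implementation.
package concwriter

import (
	"io"
	"sync"
)

const blockSize = 4 << 20 // illustrative block size

type result []byte // a compressed block, ready to be written

type Writer struct {
	w     io.Writer
	buf   []byte           // pending uncompressed data
	queue chan chan result // preserves block order; its capacity limits concurrency
	done  chan struct{}    // closed when the write loop exits

	mu  sync.Mutex
	err error
}

func NewWriter(w io.Writer, concurrency int) *Writer {
	z := &Writer{
		w:     w,
		queue: make(chan chan result, concurrency),
		done:  make(chan struct{}),
	}
	go z.writeLoop()
	return z
}

// writeLoop receives per-block channels in queue order, so blocks are written
// in order no matter which one finishes compressing first.
func (z *Writer) writeLoop() {
	defer close(z.done)
	for ch := range z.queue {
		block := <-ch
		if block == nil { // sentinel (see Flush), nothing to write
			continue
		}
		if _, err := z.w.Write(block); err != nil {
			z.setErr(err)
		}
	}
}

// compressBlock reserves an ordered slot in the queue, then compresses the
// block in its own goroutine and delivers the result on that slot.
func (z *Writer) compressBlock(src []byte) {
	ch := make(chan result, 1)
	z.queue <- ch // blocks once 'concurrency' blocks are already in flight
	go func() {
		ch <- compress(src) // placeholder for the real block compressor
	}()
}

func (z *Writer) Write(p []byte) (int, error) {
	z.buf = append(z.buf, p...)
	for len(z.buf) >= blockSize {
		z.compressBlock(z.buf[:blockSize])
		z.buf = append([]byte(nil), z.buf[blockSize:]...) // new backing array
	}
	z.mu.Lock()
	defer z.mu.Unlock()
	return len(p), z.err
}

func (z *Writer) Close() error {
	if len(z.buf) > 0 {
		z.compressBlock(z.buf)
		z.buf = nil
	}
	close(z.queue) // the write loop drains the remaining blocks, then exits
	<-z.done
	z.mu.Lock()
	defer z.mu.Unlock()
	return z.err
}

func (z *Writer) setErr(err error) {
	z.mu.Lock()
	if z.err == nil {
		z.err = err
	}
	z.mu.Unlock()
}

// compress stands in for the real LZ4/S2 block compression.
func compress(src []byte) result { return append(result(nil), src...) }
```

The queue capacity doubles as the concurrency limit: once that many blocks are in flight, compressBlock blocks until the write loop catches up.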

@pierrec
Owner

pierrec commented Oct 11, 2019

@klauspost thank you for your insights, it looks like an elegant way to deal with concurrent compression.
I will probably experiment with your technique and see if it helps for lz4.

@pierrec
Owner

pierrec commented Oct 27, 2019

First version of the concurrent writer added to branches master and v3.
No release yet as things may move a bit.
If you have time, please try it out and let me know.

@klauspost
Contributor

klauspost commented Oct 27, 2019

@pierrec 10gb.tar and rawstudio-mint14.tar crash:

panic: runtime error: slice bounds out of range [:4194316] with capacity 4194304

goroutine 2450 [running]:
github.com/pierrec/lz4.compressBlock(0xc000d04000, 0x400000, 0x800000, 0xc015e68000, 0x400000, 0x400000, 0xc0067e3f08, 0x10000, 0x10000, 0x0)
        e:/gopath/src/github.com/pierrec/lz4/block.go:164 +0x847
github.com/pierrec/lz4.(*Writer).compressBlock.func2(0xc000d04000, 0x400000, 0x800000, 0xc015e68000, 0x400000, 0x400000, 0xc00666a420, 0xc015a68000, 0x400000, 0x800000, ...)
        e:/gopath/src/github.com/pierrec/lz4/writer.go:289 +0x2ce
created by github.com/pierrec/lz4.(*Writer).compressBlock
        e:/gopath/src/github.com/pierrec/lz4/writer.go:282 +0x49d

http://mattmahoney.net/dc/10gb.html and https://files.klauspost.com/compress/rawstudio-mint14.7z

Speed-wise it is looking fine compared to S2:

Using test files here: https://github.com/klauspost/compress/tree/master/s2#performance

JSON: Worse compression, but slightly faster than fastest mode. 🆗
github-ranks-backup.bin (serialized binary data): Same speed as S2 and better compression. 👍
consensus.db (Binary DB): Slower than S2 and worse compression. 👎
gobstream: Slightly faster than S2, slightly worse compression. 🆗
enwik9 (XML): Perfectly in between the two S2 modes, both in terms of compression and speed. 🆗
2GB random data: Same speed/compression. 🆗
silesia: Same speed as S2 "better" mode, but worse compression. Small 👎

I've added numbers at the bottom here:

https://docs.google.com/spreadsheets/d/1nuNE2nPfuINCZJRMt6wFWhKpToF95I47XjSsc-1rbPQ/edit?usp=sharing

@klauspost
Contributor

klauspost commented Oct 27, 2019

@pierrec Looks nice.

One thing I noticed is that Flush() doesn't seem to keep its promise any more. AFAICT it only queues the current block; it doesn't wait for it to be written.

Reset should probably also clear z.err.
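
For illustration, here is a hedged sketch of how Flush could wait for queued blocks to reach the underlying writer, and how Reset could clear the sticky error, expressed against the hypothetical Writer sketched earlier in this thread (not the lz4 API):

```go
// Flush queues any buffered data and then waits until the write loop has
// written everything queued so far. Because the queue is FIFO, sending on the
// unbuffered sentinel channel only completes once writeLoop has reached it,
// i.e. after all earlier blocks have been written.
func (z *Writer) Flush() error {
	if len(z.buf) > 0 {
		z.compressBlock(z.buf)
		z.buf = nil
	}
	sentinel := make(chan result) // unbuffered on purpose
	z.queue <- sentinel
	sentinel <- nil // unblocks only when writeLoop has caught up

	z.mu.Lock()
	defer z.mu.Unlock()
	return z.err
}

// Reset reattaches the Writer to a new destination and clears the sticky
// error state, as suggested above.
func (z *Writer) Reset(w io.Writer) {
	_ = z.Close() // drain and stop the previous write loop, ignore its error
	z.mu.Lock()
	z.w, z.buf, z.err = w, nil, nil
	z.mu.Unlock()
	z.queue = make(chan chan result, cap(z.queue))
	z.done = make(chan struct{})
	go z.writeLoop()
}
```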

Another minor thing: this attempt at BCE (bounds-check elimination) doesn't do anything positive. For the following line, dst[di-2], dst[di-1], the compiler still inserts a check in case the index is negative:

go build -gcflags="-d=ssa/check_bce/debug=1" confirms this:

.\block.go:169:10: Found IsInBounds
.\block.go:170:6: Found IsInBounds
.\block.go:170:17: Found IsInBounds

The best I could get was to simply remove the line, which brings it down from 3 to 2 bounds checks.
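
For illustration, a hedged, hypothetical reconstruction of the shape of code under discussion (not the actual block.go), showing why such a hint can fail to pay off when the index expression can go negative:

```go
// Hypothetical sketch, not the actual lz4 block.go. A leading "hint" read is a
// common BCE idiom, but it is itself bounds-checked, and it cannot prove that
// di-2 is non-negative (di-1 >= 0 only implies di-2 >= -1), so a check on the
// first store can remain. Inspect the result with:
//   go build -gcflags="-d=ssa/check_bce/debug=1" .
package bce

func putUint16(dst []byte, di int, v uint16) {
	_ = dst[di-1] // attempted BCE hint
	dst[di-2] = byte(v)
	dst[di-1] = byte(v >> 8)
}
```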

@pierrec
Owner

pierrec commented Nov 2, 2019

@klauspost thanks a lot for this analysis!
I have addressed the issues you mentioned. I'm running into some strange problems when running the tests with concurrency enabled; still investigating.

@klauspost
Contributor

Looks very good! The crashes are fixed.

10gb.tar - Slightly slower than S2, slightly better compression 🆗
rawstudio-mint14.tar - Slightly slower than S2, worse compression. 👎

So it is definitely "competitive". The margins to S2 are pretty slim in most cases and I think the only difference really comes down to the encoding chosen for each format.

I included LZ4 as a very competitive format for Go compression in a talk I just gave.

@pierrec
Owner

pierrec commented Nov 15, 2019

Thanks. It has not been released yet because there are still some data races, so bear with me while I fix them. Once that is done (hopefully this weekend) I will issue a release.
