Add functions for faster preserving/restoring of streams #188
Conversation
Thanks for this PR Rob. Note that a few automated tests fail with this PR (see results above).

Now regarding the PR itself. The hash table inside the stream structure is 16 KB. Even though your proposed code is fairly simple, it's still more complex for the CPU than a straight memcpy of the whole table.

If the above hypothesis is right, that means your method will be faster with small inputs and small dictionaries, while the straight memcpy will win with larger ones.

So I believe the next stage is to make some tests.
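The crossover Yann hypothesizes could be checked with a throwaway micro-benchmark along these lines (the names, sizes, and slot-list representation are illustrative, not from the PR; a real test would wrap both copies in a timing loop):

```c
#include <assert.h>
#include <string.h>

#define TABLE_BYTES (16 * 1024)  /* size of LZ4's internal hash table */

/* Straight copy: fixed ~16 KiB of work, regardless of dictionary size. */
static void copy_full(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, TABLE_BYTES);
}

/* Selective copy: work proportional to the populated slots.
   Assumes dst is already zeroed. */
static void copy_sparse(unsigned char *dst, const unsigned char *src,
                        const int *used, int n_used)
{
    int i;
    for (i = 0; i < n_used; i++)
        dst[used[i]] = src[used[i]];
}
```

With only a handful of populated slots, the sparse loop touches far less memory than the full memcpy; as the dictionary fills more of the table, the per-slot bookkeeping should eventually cost more than the straight copy.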
oh btw, last detail @rkd-msw,
Yann, thanks! Taking your points one-by-one:
I can easily believe that when compressing more data with larger dictionaries, the benefit would be much smaller. I have some tests I used during development to demonstrate the benefits in this scenario; let me dig those out, clean them up and attach them.
There is no specific style guide; I just follow the compilers' error notifications. For Visual, the issue is that it doesn't like variable declarations in the middle of a block.

Another point: if your patch improves speed for a combination of small dictionary + small blocks, it is important to document it (typically using inline comments) and provide good hints about what sizes it concerns (and what gain is expected). The risk here is confusion for 3rd-party users: they may assume that they must use these new functions, while being in fact in a fairly different case (large dictionary or larger blocks), for which the patch would hurt performance.
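For illustration, this is the kind of reshuffle Visual's C89/C90 mode forces: all declarations must precede the statements of a block (the function below is a made-up example, not code from the PR):

```c
#include <assert.h>

static int sum_first(const int *table, int n)
{
    int total;   /* C89/C90: declare everything at the top of the block... */
    int i;
    total = 0;   /* ...then run the statements */
    for (i = 0; i < n; i++)
        total += table[i];
    return total;
}
```

Writing `int total = 0;` after the `for` loop's setup, or declaring `i` inside the `for` header, compiles fine in C99 but triggers errors under Visual's older C mode.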
We'll proceed differently for this use case. But it's a significant change, so it's unclear when we will have time to do it.
We have a use-case that requires us to create lots of new LZ4 streams, and we found that loading the dictionary for each of these was taking a non-trivial amount of CPU (about 3% of CPU for the whole application).
We investigated whether we could just load a dictionary once and preserve the resulting LZ4_stream_t structure. Just doing a memcpy was actually slower (because of the overhead of copying the 16KiB hash table), but we've implemented a scheme where we create a separate structure recording which bits of the hash table we should copy, and can then just copy those entries when cloning the structure.
This is about 33% faster than using LZ4_loadDict each time.
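Sketched in C, the scheme described above might look roughly like the following (the type and function names are hypothetical stand-ins, not the PR's actual API, and the real LZ4_stream_t layout is an internal detail of the library):

```c
#include <assert.h>
#include <string.h>

#define HASH_ENTRIES 4096            /* 16 KiB table of 4-byte entries */

typedef struct {
    unsigned table[HASH_ENTRIES];    /* stand-in for LZ4_stream_t's hash table */
} Stream;

typedef struct {
    int count;                       /* number of slots the dictionary populated */
    int slot[HASH_ENTRIES];          /* indices of those slots */
} CopyPlan;

/* One-time pass after loading the dictionary: record which slots are in use. */
static void plan_copy(const Stream *loaded, CopyPlan *plan)
{
    int i;
    plan->count = 0;
    for (i = 0; i < HASH_ENTRIES; i++)
        if (loaded->table[i] != 0)
            plan->slot[plan->count++] = i;
}

/* Per-stream clone: zero the table (writes only, cheaper than a full
   read+write memcpy), then fill in just the recorded slots. */
static void fast_restore(Stream *dst, const Stream *src, const CopyPlan *plan)
{
    int i;
    memset(dst->table, 0, sizeof(dst->table));
    for (i = 0; i < plan->count; i++)
        dst->table[plan->slot[i]] = src->table[plan->slot[i]];
}
```

The design assumption is that a small dictionary populates only a small fraction of the 4096 slots, so paying once for `plan_copy` makes every subsequent clone much cheaper than re-running LZ4_loadDict or memcpy'ing the whole structure.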
We're happy for this to be under the 2-clause BSD license (from https://github.com/Cyan4973/lz4/blob/d008c87151abf8c36a9f98d28461bf6f3dfdc6ae/lib/LICENSE).