memset() rather than reduceIndex() #1658

Merged: 3 commits, Jul 1, 2019
Conversation

Cyan4973 (Contributor) commented Jun 21, 2019

I recently received a user performance report stating that reduceIndex() can require a non-negligible delay to perform its job, especially when applied to a large context.

reduceIndex() is required in two scenarios:

  • when compressing a large data stream, typically > 4 GB
  • when reusing a context for many smaller data streams while constantly keeping the same parameters, which makes every compression use continue mode.

For the first case, there's nothing new.

For the second case, the issue is that reduceIndex() is a measurable load compared to a single compression job. The user reported delays in the ~80 ms range. I could reproduce similar delays on my laptop with llvm, in the 20-30 ms range, when using large tables (~30 MB). The user's results are exacerbated by the use of Visual Studio, which is worse at auto-vectorization (reduceIndex() performance is tied to auto-vectorization).

The thing is, in such cases, we do not actually need a reduceIndex() operation: a simpler memset() would work just as well. And of course, the big difference is that memset() is actually fast (I measured up to 10x faster), and performance differences between compilers are unlikely to be "large".
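
As an illustration of the gap, here is a minimal, self-contained sketch (not the actual zstd internals; the table size and the rescale logic are simplified stand-ins): a reduceIndex()-style pass has to read, adjust, and write back every table entry, while a reset is a single bulk memset().

    #include <stdint.h>
    #include <string.h>   /* memset */

    #define TABLE_SIZE (1 << 22)          /* ~4M entries of U32, ~16 MB: a "large table" */
    static uint32_t table[TABLE_SIZE];    /* stand-in for a hash/chain table of match indices */

    /* reduceIndex()-style rescale: every entry is visited and adjusted.
     * Throughput depends heavily on how well the compiler auto-vectorizes this loop. */
    static void rescaleTable(uint32_t reducerValue)
    {
        size_t i;
        for (i = 0; i < TABLE_SIZE; i++)
            table[i] = (table[i] < reducerValue) ? 0 : table[i] - reducerValue;
    }

    /* memset()-style reset: one bulk write, fast and largely compiler-independent. */
    static void resetTable(void)
    {
        memset(table, 0, sizeof(table));
    }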

This patch tries to favor memset() over reduceIndex() by testing currentIndex when starting a new compression (detected by triggering ZSTD_resetCCtx_internal()). It will disallow continue mode when currentIndex is considered "too close" to the limit.
The limit was already defined in zstd_compress_internal.h as ZSTD_CURRENT_MAX, and "too close" has been arbitrarily defined as < ZSTD_INDEXOVERFLOW_MARGIN (== 16 MB).
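
For reference, a sketch of that check under simplifying assumptions (the real function takes a ZSTD_window_t, and the real value of ZSTD_CURRENT_MAX comes from zstd_compress_internal.h; the struct, field name, and constant value below are placeholders):

    #include <stdint.h>

    #define MB (1U << 20)
    #define ZSTD_INDEXOVERFLOW_MARGIN (16 * MB)
    #define CURRENT_MAX_PLACEHOLDER (3U << 29)   /* placeholder for ZSTD_CURRENT_MAX */

    /* Simplified stand-in for ZSTD_window_t: only the current index matters here. */
    typedef struct { uint32_t currentIndex; } window_sketch_t;

    /* Continue mode stays allowed only while the index is not "too close" to the limit;
     * otherwise a fresh start (memset()) is forced, before reduceIndex() would ever be needed. */
    static int index_valid_for_continue(window_sketch_t w)
    {
        return w.currentIndex < (CURRENT_MAX_PLACEHOLDER - ZSTD_INDEXOVERFLOW_MARGIN);
    }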

Commit messages (excerpts):

  • like assert() but cannot be disabled.
    Proper separation of user contract errors (CONTROL()) and invariant verification (assert()).
  • …imit
    by disabling continue mode when index is close to limit.

A reviewer (Contributor) commented on this new code:

    * memset() will be triggered before reduceIndex().
    */
    #define ZSTD_INDEXOVERFLOW_MARGIN (16 MB)
    static int ZSTD_index_valid_for_continue(ZSTD_window_t w)

nit: ZSTD_indexValidForContinue() is more in line with the naming convention.

Also: minor speed optimization:
shortcut to ZSTD_reset_matchState() rather than the full reset process.
It still needs to be completed with ZSTD_continueCCtx() for proper initialization.

Also: changed the position of the LDM hash tables in the context,
so that the "regular" hash tables can be at a predictable position,
hence allowing the shortcut to ZSTD_reset_matchState() without complex conditions.
Cyan4973 (Contributor, Author) commented Jun 24, 2019

This last update implements an optimization:
rather than going through the full update process,
it shortcuts to ZSTD_reset_matchState():
since all parameters are the same, there is no need to re-arrange table sizes and positions.
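
Roughly, the shortcut corresponds to the following conceptual sketch (stubs only, not the actual zstd control flow; the step names merely echo ZSTD_reset_matchState() and ZSTD_continueCCtx() from the discussion):

    #include <stdio.h>

    typedef struct { int paramsChanged; } cctx_sketch;   /* placeholder context */

    static void full_reset(cctx_sketch* c)       { (void)c; puts("full reset: recompute table sizes and positions"); }
    static void reset_matchState(cctx_sketch* c) { (void)c; puts("shortcut: memset() the tables in place"); }
    static void continueCCtx(cctx_sketch* c)     { (void)c; puts("complete initialization, ZSTD_continueCCtx()-like step"); }

    static void resetForNewCompression(cctx_sketch* c)
    {
        if (c->paramsChanged) {
            full_reset(c);            /* sizes/positions may change: full process needed */
        } else {
            reset_matchState(c);      /* same parameters: tables keep their size and position */
            continueCCtx(c);          /* still needs completion for proper initialization */
        }
    }

    int main(void)
    {
        cctx_sketch c = { 0 };        /* same parameters as the previous compression */
        resetForNewCompression(&c);
        return 0;
    }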

It seems to work fine so far, but I'm not against a second look:
since I did not write ZSTD_reset_matchState(),
I could be missing some implied condition in the rest of the code.

In particular, I would like to be completely sure that it works well in combination with LDM.
So far, it seems fine, all tests pass, and I couldn't identify a failure scenario.
The new code skips the LDM memset(), but proceeds to ZSTD_continueCCtx() as usual, which does apply window_clear to the LDM tables. So I suspect it works, but the LDM tables will get a rescale() instead of a memset().

@terrelln likely has a better understanding of this part.

terrelln (Contributor) commented Jun 24, 2019

It should be fine. LDM manages its own window and indices, so this change shouldn't impact it.

I'll take a closer look in a bit, just to be sure.

terrelln (Contributor) left a comment:

Sorry this fell off my radar, LGTM!
