New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement decay-based unused dirty page purging. #325
Comments
|
A year ago I did some design work for this feature as part of my Applicative conference talk, and this week I wrote some simple simulation code to see how the decay algorithms behaved. Specifically I simulated various malloc/free/noop traces and tracked the moving maximum I went back to the drawing board to find a way to implement true sigmoidal decay curves. I have an algorithm for this now, and simulations indicate it behaves as desired. The basic idea is to keep a trailing record of how many unused dirty pages were created at each of the past N time slices, and apply a sigmoidal decay curve independently to each time slice. For example, if we want to assure that newly generated unused dirty pages persist no longer than 5 seconds, we record how many are generated during the current time slice, and then during each subsequent time slice for the next 5 seconds, we multiply the sigmoidal decay curve factor (which decays from 1.0 down to 0.0) with the original number of unused dirty pages. If the current number of unused dirty pages exceeds the limit set by the decay curve, we purge enough pages to drop below the limit. Each time we make a purging decision, we must compute the sum of per time slice limits on unused dirty pages. The number of required computations is linear in the number of time slices required to cover the decay time, e.g. 100 for a 5 second decay and 50 ms time slices. Through a combination of precomputed tables, fixed point integer math, and the magic of strength reduction, we can optimize this into a series of efficient integer operations that should have inconsequential performance impact in the context of how infrequently we make purging decisions (per arena, maximum once per time slice). Regarding the sigmoidal decay curve, we want a curve that actually hits 1.0 and 0.0, rather than merely asymptotically approaching them. The smoothstep family of curves fits this need precisely. I simulated with all three (smoothstep, smootherstep, and smootheststep), and although any of the three would suffice, I'm conceptually leaning toward smootherstep because it makes sense to me that we would want the first and second derivatives (akin to velocity and acceleration) to start at zero. In practice it might not matter because of quantization effects: we will purge the LRU unused dirty page run as a whole, and we only purge once the entirety of that run is above the decay curve (i.e. we level-trigger purging at post-purge dirty page count rather than pre-purge dirty page count). |
Summary: Enabling this functionality based on the results from a variety of performance tests. Please see jemalloc/jemalloc#325 to learn more about the functionality this change enables. Supported ways to enable this functionality are either using `malloc_conf` or `MALLOC_CONF` environment variable. Global variable takes precedence over the environment variable. To remove enabling this option, just revert these changes or delete the variable declaration. Test Plan: - LinkBench - SqlBench - SysBench Reviewers: jtolmer Reviewed By: jtolmer Subscribers: webscalesql-eng Differential Revision: https://reviews.facebook.net/D55119
Use jemalloc's decay-based purging instead of the default ratio-based purging.
Decay-based purging was a major optimization added with jemalloc 4.1.0, but it
won't be the default until at least 5.0. I've seen some pretty nice performance
speedups from using decay-based purging, so make it our default.
For binary trees (a benchmark that really measures allocator speed) this
results in a ~10% speedup on the shootout-like machine.
From the 4.1.0 release notes:
Implement decay-based unused dirty page purging, a major optimization with
mallctl API impact. This is an alternative to the existing ratio-based
unused dirty page purging, and is intended to eventually become the sole
purging mechanism.
See jemalloc/jemalloc#325 for more info on
decay-based purging
Enable jemalloc's decay-based purging
Use jemalloc's decay-based purging instead of the default ratio-based purging.
Decay-based purging was a major optimization added with jemalloc 4.1.0, but it
won't be the default until at least 5.0. I've seen some pretty nice performance
speedups from using decay-based purging, so make it our default.
For binary trees (a benchmark that really measures allocator speed) this
results in a ~10% speedup on the shootout-like machine.
From the 4.1.0 release notes:
Implement decay-based unused dirty page purging, a major optimization with
mallctl API impact. This is an alternative to the existing ratio-based
unused dirty page purging, and is intended to eventually become the sole
purging mechanism.
See jemalloc/jemalloc#325 for more info on
decay-based purging
Summary: Enabling this functionality based on the results from a variety of performance tests. Please see jemalloc/jemalloc#325 to learn more about the functionality this change enables. Supported ways to enable this functionality are either using `malloc_conf` or `MALLOC_CONF` environment variable. Global variable takes precedence over the environment variable. To remove enabling this option, just revert these changes or delete the variable declaration. Test Plan: - LinkBench - SqlBench - SysBench Reviewers: jtolmer Reviewed By: jtolmer Subscribers: webscalesql-eng Differential Revision: https://reviews.facebook.net/D55119
Summary: Enabling this functionality based on the results from a variety of performance tests. Please see jemalloc/jemalloc#325 to learn more about the functionality this change enables. Supported ways to enable this functionality are either using `malloc_conf` or `MALLOC_CONF` environment variable. Global variable takes precedence over the environment variable. To remove enabling this option, just revert these changes or delete the variable declaration. Test Plan: - LinkBench - SqlBench - SysBench Reviewers: jtolmer Reviewed By: jtolmer Subscribers: webscalesql-eng Differential Revision: https://reviews.facebook.net/D55119
The current ratio-based mechanism for unused dirty page purging is effective at controlling accumulation of unused dirty pages, except that its conservatism can cause extreme performance degradation if the application's memory usage constantly fluctuates more than e.g. 1/9 with the default
lg_dirty_mult:3setting. In fact, we would prefer a much more stringent ratio, perhapslg_dirty_mult:6, but an unacceptable fraction of applications' performance would suffer with such a setting.Part of the problem is that our internal clock can only measure passage of time in terms of allocation event ticks. Introduce a wall clock and track how long unused dirty pages have existed, so that unused dirty page count decays toward zero over time.
The text was updated successfully, but these errors were encountered: