Direct allocation attribution in profiling #1777

Closed
wants to merge 5 commits into from

yinan1048576
Contributor

This change is not complete yet, but I think it is far enough along to ask for some early feedback.

The context is in #1751. To enable correct inference, we need to record both the number of sampled bytes and the number of unsampled bytes for each stack trace. The number of sampled bytes is equivalent to curobjs, so we only need to record the number of unsampled bytes, which I denote as surplus: the number of bytes beyond what was needed to trigger sampling.

surplus is relayed in two routes:

  • thread_alloc_event() -> prof_malloc_sample_object(), via a new thread local field prof_sample_event_surplus, which is needed because the sampling logic is currently separate from the thread event logic;
  • prof_malloc_sample_object() -> prof_free_sampled_object(), via a new field e_prof_surplus on edata, which is needed because free() needs to roll back the allocation's surplus from the stack trace total.
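
To make the second route concrete, here is a minimal sketch of recording the surplus at sample time and rolling it back at free time. The types `trace_counts_t` and `extent_meta_t` and both function names are hypothetical stand-ins, not the jemalloc definitions; only the idea of storing the surplus on the extent (as `e_prof_surplus` does) and subtracting it on free comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-stack-trace counters. */
typedef struct {
    uint64_t curobjs;    /* live sampled objects */
    uint64_t cursurplus; /* surplus bytes of live sampled objects */
} trace_counts_t;

/* Hypothetical stand-in for extent metadata (edata). */
typedef struct {
    trace_counts_t *owner; /* stack trace that sampled this allocation */
    size_t prof_surplus;   /* plays the role of e_prof_surplus */
} extent_meta_t;

/* Sample time: add the surplus to the trace and remember it on the extent. */
static void
sample_object(trace_counts_t *counts, extent_meta_t *meta, size_t surplus) {
    counts->curobjs++;
    counts->cursurplus += surplus;
    meta->owner = counts;
    meta->prof_surplus = surplus;
}

/* Free time: roll the remembered surplus back out of the trace total. */
static void
free_sampled_object(extent_meta_t *meta) {
    trace_counts_t *counts = meta->owner;
    assert(counts != NULL && counts->curobjs > 0);
    assert(counts->cursurplus >= meta->prof_surplus);
    counts->curobjs--;
    counts->cursurplus -= meta->prof_surplus;
    meta->owner = NULL;
    meta->prof_surplus = 0;
}
```

Keeping the value on the extent means free() does not have to recompute anything; it only needs the pointer being freed to find the amount to subtract.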

My remaining work involves the following:

  • Add unit tests.
  • Add an option to actually use the additional information, i.e. the surplus. (Computing the surplus is fairly cheap, so I don't think the computation itself needs to be guarded by an option.)
    • Print the raw surplus value, as well as some other convenient estimate(s), in the profiling dump.
    • Compute leak checking differently.

@yinan1048576
Contributor Author

Rebased and rewrote. The main change in the rewrite is that I no longer create the new thread-local field, thanks to #1779.

tsd_thread_allocated_last_event_get(tsd);
size_t sample_wait = tsd_prof_sample_event_wait_get(tsd);
if (accumbytes < sample_wait) {
    /* Don't bother to set surplus - it will never be read. */
Contributor Author

This may trigger some warning. Let's see...

Contributor Author

As expected. Adding an initializer.

@yinan1048576
Contributor Author

Fix compiler warnings, as well as one other issue revealed by the FreeBSD unit tests: te_prof_sample_event_lookahead_surplus() should be short-circuited in edge cases, e.g. non-nominal TSD or reentrancy.

@yinan1048576
Contributor Author

Rebase on top of #1787 to get rid of the warning.

@@ -101,11 +101,11 @@ arena_prof_tctx_reset_sampled(tsd_t *tsd, const void *ptr) {
 }

 JEMALLOC_ALWAYS_INLINE void
-arena_prof_info_set(edata_t *edata, prof_tctx_t *tctx) {
+arena_prof_info_set(edata_t *edata, prof_tctx_t *tctx, uint64_t surplus) {
Member

Surplus should be a size_t, no?

* purpose, because a valid surplus value is strictly less than
* usize.
*/
*surplus = usize;
Member

If I might suggest: I'm not sure I would spot that *surplus == usize is invalid while debugging. An obviously garbage constant (e.g. 0x99999999 or 0xdeadbeef), on the other hand, I would.

Contributor Author

Makes sense. I'll use SIZE_MAX instead. (Strictly speaking, 0x99999999 and 0xdeadbeef can be valid surplus values.)
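
For readers following the thread, here is a rough sketch of why SIZE_MAX works as a sentinel. The function `sample_lookahead_surplus()` and its parameters are hypothetical and only loosely modeled on the snippet under review; it assumes the surplus is the byte count left over once the sample wait is reached, which makes any valid surplus strictly less than usize and therefore never equal to SIZE_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel for "no sample": a valid surplus is always < usize <= SIZE_MAX. */
#define SURPLUS_INVALID SIZE_MAX

/*
 * Hypothetical lookahead: would allocating usize bytes trigger a sample?
 * bytes_since_event is what the thread has allocated since the last event;
 * it must still be below sample_wait, or a sample would already have fired.
 */
static bool
sample_lookahead_surplus(uint64_t bytes_since_event, uint64_t sample_wait,
    size_t usize, size_t *surplus) {
    assert(bytes_since_event < sample_wait);
    uint64_t accumbytes = bytes_since_event + usize;
    if (accumbytes < sample_wait) {
        /* Not sampled: store an easy-to-spot invalid value. */
        *surplus = SURPLUS_INVALID;
        return false;
    }
    /* Sampled: surplus is whatever lies beyond the trigger point. */
    *surplus = (size_t)(accumbytes - sample_wait);
    assert(*surplus < usize);
    return true;
}
```

A caller that only reads `*surplus` when the function returns true never sees the sentinel, but a stray read shows up immediately in a debugger.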

@yinan1048576
Contributor Author

Revise according to feedback.

@yinan1048576
Contributor Author

Rebased, and added another field, accumsurplus. It is analogous to cursurplus, but records the cumulative surplus, and is computed whenever opt_prof_accum is on.
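
The difference between the two fields can be sketched as follows. The struct and function names are hypothetical, and the `prof_accum` flag simply stands in for opt_prof_accum; the only behavior taken from the comment above is that the cumulative counter is maintained when that option is on and, unlike the current counter, is not reduced on free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair of per-stack-trace surplus counters. */
typedef struct {
    uint64_t cursurplus;   /* surplus of currently live sampled objects */
    uint64_t accumsurplus; /* surplus of all sampled objects ever seen */
} surplus_counts_t;

/* On a sampled allocation, both counters grow (accum only if enabled). */
static void
surplus_on_sample(surplus_counts_t *c, size_t surplus, bool prof_accum) {
    c->cursurplus += surplus;
    if (prof_accum) {
        c->accumsurplus += surplus;
    }
}

/* On free, only the current counter is rolled back. */
static void
surplus_on_free(surplus_counts_t *c, size_t surplus) {
    assert(c->cursurplus >= surplus);
    c->cursurplus -= surplus;
}
```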

My next step is to define an opt-in run-time option and, when it is turned on, to include the surplus values in the dump content.

@yinan1048576 changed the title from "Direct allocation attribution in profiling" to "[In progress] Direct allocation attribution in profiling" on Apr 6, 2020
@yinan1048576
Contributor Author

Rebase.

@yinan1048576
Contributor Author

Rebased, and added:

  • Unit tests.
  • An option to print total surplus rather than total bytes in prof dumps. Users don't need the total bytes and can derive whatever inference they need from
    • the number of sampled objects,
    • total surplus, and
    • lg_prof_sample.
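
The thread does not spell out which estimator a user would build from these three quantities. One plausible point estimate, offered purely as an assumption, treats each sampled object as standing for 2^lg_prof_sample bytes on average and adds the surplus on top, since the surplus bytes are known exactly. The helper name `estimate_bytes()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical point estimate of bytes attributed to one stack trace:
 *   nsampled * 2^lg_prof_sample + total_surplus
 * This is an illustrative reading of the dump fields, not a formula
 * confirmed by the pull request.
 */
static uint64_t
estimate_bytes(uint64_t nsampled, uint64_t total_surplus,
    unsigned lg_prof_sample) {
    assert(lg_prof_sample < 64);
    return (nsampled << lg_prof_sample) + total_surplus;
}
```

For example, with lg_prof_sample set to 19 (a 512 KiB average sample interval), 3 sampled objects and 1000 bytes of surplus would give 3 * 524288 + 1000 bytes under this assumption.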

@yinan1048576 changed the title from "[In progress] Direct allocation attribution in profiling" to "Direct allocation attribution in profiling" on Jun 29, 2020
@yinan1048576
Contributor Author

Closing. #1897 serves to correct the bias. Will reopen if there's any need for inference, in addition to a single point estimate.

2 participants