Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[proposal] Allow "precompiled" perf-trampolines to largely mitigate the cost of enabling perf-trampolines #109587

Closed
czardoz opened this issue Sep 19, 2023 · 1 comment
Labels
type-feature A feature request or enhancement

Comments

@czardoz
Copy link
Contributor

czardoz commented Sep 19, 2023

Feature or enhancement

Proposal:

The perf trampoline feature introduced in #96143 is incredibly useful to gain visibility into the cost of individual Python functions.

Currently, there are a few aspects of the trampoline that make it costly to enable on live server-side applications:

  1. Each call to a new function results in a disk IO operation (to write to the /tmp/perf-<pid>.map file)
  2. For a forked multiprocess model:
    a. We disable and re-enable the trampoline on fork, which means the same work must be done in child processes.
    b. We use more memory, because we do not take advantage of copy-on-write.

On a fairly large Python server (Instagram), we have observed a 1.5% overall CPU regression, mainly stemming from the repeated file writes.

In order to address these, and make it possible to have perf-trampoline running in an always-enabled fashion, we could allow extension modules to initialize trampolines eagerly, after the application is "warmed up". (The definition of warmed up is specific to the application).

Essentially, this would involve introducing a two C-API functions:

//  Creates a new trampoline by doing essentially what `py_trampoline_evaluator` currently does.
//  1. Call `compile_trampoline()`
//  2. Register it by calling `trampoline_api.write_state()`
//  3. Call `_PyCode_SetExtra`
int PyUnstable_PerfTrampoline_CompileCode(PyCodeObject *co);
// This flag will be used by _PyPerfTrampoline_AfterFork_Child to 
// decide whether to re-initialize trampolines in child processes. If enabled,
// we would copy the the parent process perf-map file, else we would use 
// the current behavior. 
static Py_ssize_t persist_after_fork = 0;

int PyUnstable_PerfTrampoline_PersistAfterFork (int enable) {
    persist_after_fork = enable;
}

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@czardoz czardoz added the type-feature A feature request or enhancement label Sep 19, 2023
@carljm
Copy link
Member

carljm commented Sep 20, 2023

cc @pablogsal @gpshead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants