You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The perf trampoline feature introduced in #96143 is incredibly useful to gain visibility into the cost of individual Python functions.
Currently, there are a few aspects of the trampoline that make it costly to enable on live server-side applications:
Each call to a new function results in a disk IO operation (to write to the /tmp/perf-<pid>.map file)
For a forked multiprocess model:
a. We disable and re-enable the trampoline on fork, which means the same work must be done in child processes.
b. We use more memory, because we do not take advantage of copy-on-write.
On a fairly large Python server (Instagram), we have observed a 1.5% overall CPU regression, mainly stemming from the repeated file writes.
In order to address these, and make it possible to have perf-trampoline running in an always-enabled fashion, we could allow extension modules to initialize trampolines eagerly, after the application is "warmed up". (The definition of warmed up is specific to the application).
Essentially, this would involve introducing a two C-API functions:
// Creates a new trampoline by doing essentially what `py_trampoline_evaluator` currently does.
// 1. Call `compile_trampoline()`
// 2. Register it by calling `trampoline_api.write_state()`
// 3. Call `_PyCode_SetExtra`
int PyUnstable_PerfTrampoline_CompileCode(PyCodeObject *co);
// This flag will be used by _PyPerfTrampoline_AfterFork_Child to
// decide whether to re-initialize trampolines in child processes. If enabled,
// we would copy the the parent process perf-map file, else we would use
// the current behavior.
static Py_ssize_t persist_after_fork = 0;
int PyUnstable_PerfTrampoline_PersistAfterFork (int enable) {
persist_after_fork = enable;
}
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Feature or enhancement
Proposal:
The perf trampoline feature introduced in #96143 is incredibly useful to gain visibility into the cost of individual Python functions.
Currently, there are a few aspects of the trampoline that make it costly to enable on live server-side applications:
/tmp/perf-<pid>.map
file)a. We disable and re-enable the trampoline on fork, which means the same work must be done in child processes.
b. We use more memory, because we do not take advantage of copy-on-write.
On a fairly large Python server (Instagram), we have observed a 1.5% overall CPU regression, mainly stemming from the repeated file writes.
In order to address these, and make it possible to have perf-trampoline running in an always-enabled fashion, we could allow extension modules to initialize trampolines eagerly, after the application is "warmed up". (The definition of warmed up is specific to the application).
Essentially, this would involve introducing a two C-API functions:
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response
Linked PRs
The text was updated successfully, but these errors were encountered: