Drastically speed up precommand hook by caching active version to skip redundant activate calls#523
Conversation
b01b088 to
bb9a634
Compare
There was a problem hiding this comment.
Good for a start!
The "limitations" you've outlined are exactly why we haven't done something like this yet! Their complexity has intimidated all the prospective contributors before you! And a stale cache would be a bug -- so we'd have to revert any fundamentally flawed solution in order to fix it.
I think we actually can check for on-disk changes with absolute minimum overhead -- by caching mtimes of the active .pyenv-version file, and of any directories without a .pyenv-version file leading up to it. Then all the disk activity we'll have to do at a prompt is up to a few stat calls.
- As you can see, this makes it unnecessary to actually read the files each time
- Moreover, if the current version is set by a higher-priority mechanism (
shell -> local -> global) -- we don't have to check for changes in the lower-priority mechanisms at all! statis not a builtin. It is possible to do with thetest -ntbuiltin -- but then we'd have to create a marker file (which will be per-shell-session) and somehow maintain it. We can create our own builtin if process spawns are really as big of a deal as you make it look...
|
P.S. since this is a highly requested change, you are eligible for a payout (non-taxable) from donated money upon completion if you're interested! |
|
Thanks for the thorough review. The detail on mtime caching and the priority chain was exactly what I needed to get this right. Revised commit addresses all three points:
Cache variables simplified from five to three ( Shared sourced file in pyenv proper left for a follow-up, per your suggestion. |
|
I'm currently busy with other RL stuff so I may be replying with a delay. Sorry about that. |
No problem, also busy with some RL work, should have response within 24h. |
7ca917c to
1266527
Compare
There was a problem hiding this comment.
- Indeed, 2ms is not 40us, forks (or rather, spawns, we don't yet have figures on subshell overhead) do show to matter. But that's still way fast enough to be unnoticeable (and worlds apart from hundreds of ms as it is now). We can always change it back to marker files if we have a solution for them, or to custom builtins if such extreme performance here turns out to really be that critical.
- I also looked through whether we actually need to refactor this into Pyenv, given that this PR seems to be dragging out and someone may lose interest if this happens for too long. Given how much damage this slowdown deals to our product, judging by user commentary, I would not want that.
- The
$PYENV_VERSIONlogic andpyenv locallogic are unlikely to change in the foreseeable future - For
pyenv globallogic, file location or name might change -- but that's also highly unlikely - The curveball is
pyenv version-namehooks. If they are present, they may make the above copied logic inconsistent with the subcommand.- AFAICS, in that case, we have to either abandon caching entirely, or delegate to
pyenv version-nameproper to see if the active version has changed for parts that we copy from it. - Hook entry points are likely to change unpredictably in the future
- AFAICS, in that case, we have to either abandon caching entirely, or delegate to
- So, refactoring does seem necessary -- but not immediately and critically so.
- So we can split the refactoring part into a separate PR. But the payout will then be split, too.
- The
Mtime-based caching: stat the directory tree once per prompt (~2ms) and skip full activation when nothing has changed. - Compare PYENV_VERSION, VIRTUAL_ENV, PWD; shortcut when PYENV_VERSION matches - Walk PWD to / for .python-version (stop at first; include broken symlinks) - Path list built once on miss, stored as array (spaces in paths) - Global version file skipped when local version active - version-name hooks checked at init; if present, caching disabled - Covers bash, zsh, and fish
1266527 to
8546b11
Compare
|
Addressed all feedback, including hooks awareness. Path list: Built once on cache miss and reused on subsequent hits. Walk stops at first Env vars: Single check at the top of the function. The PYENV_VERSION-set shortcut and stat check are nested inside it. stat -L: Format detection updated to Hooks:
8/8 tests pass. Force-pushed as |
native-api
left a comment
There was a problem hiding this comment.
I've addressed the remaining issue. Could you double-check me?
|
If you're interested in the payout, please register at https://opencollective.com/, configure the payment method in your profile and provide your handle name -- then we can initiate it. To assess the appropriate amount, I looked at GitPay, heeding the issue's complexity and criticality. I hope it won't disappoint! |
My Open Collective handle is JLodw. Payout method is set up. Appreciate it! |
Checked. Your two commits on the branch look correct: |
activate calls
Problem
_pyenv_virtualenv_hookcallspyenv sh-activate --quieton every prompt, spawning ~10 subprocesses through the pyenv dispatcher regardless of whether anything has changed. On macOS this adds ~200ms of latency per keystroke+Enter (#259, #490, #338).Approach
Make the hook detect "nothing changed" and short-circuit before forking, rather than skipping the check entirely (the approach that #456 took, which broke
pyenv local).The hook now caches five values using shell builtins (zero forks):
$PWDcd)$PYENV_VERSIONpyenv shellchanges$PWD/.python-versioncontentpyenv localchanges$PYENV_ROOT/versioncontentpyenv globalchanges$VIRTUAL_ENVWhen all five match their cached values, the hook returns immediately. When any value changes, the full
pyenv sh-activatepath runs and the cache is refreshed.Limitation
.python-versionfiles in parent directories are not tracked. pyenv walks up the directory tree to find.python-version; this cache only reads the file in$PWD. If a parent directory's file changes while the user is in a subdirectory, the nextcdtriggers a full recheck. Walking up directories from the hook would replicate pyenv's version resolution logic.Measurement
Test plan
Acknowledgments
This approach builds on prior work:
$PWDonly; this one extends the cache to five keys to coverpyenv local,pyenv global,pyenv shell, and manual venv changes.$PWDand$PYENV_VERSIONas the key signals.