Summary
_cached_discover_vscode_logs calls _scan_child_ids(candidate) — which does one os.scandir + one DirEntry.stat() per child directory — before checking whether the cached root_id is still valid. Every call, even a perfect cache hit, pays the full O(n_children) scan cost.
File & Function
src/copilot_usage/vscode_parser.py · _cached_discover_vscode_logs
What Makes It Slow
```python
root_id: tuple[int, int] = (st.st_mtime_ns, st.st_size)
child_ids = _scan_child_ids(candidate)  # ← always runs: 1 scandir + n stat() calls
cached = _VSCODE_DISCOVERY_CACHE.get(candidate)
if (
    cached is not None
    and cached.root_id == root_id
    and cached.child_ids == child_ids
):
    result.extend(cached.log_paths)
    continue
```
_scan_child_ids does os.scandir(root) and calls entry.stat(follow_symlinks=False) for every immediate child directory before the cache is even consulted. VS Code creates one dated subdirectory per session (20240115/, 20240116/, …). A long-time VS Code user accumulates hundreds of these. At steady state — when nothing has changed — all those stat syscalls are wasted.
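On Linux (ext4/btrfs/overlayfs) and macOS (APFS), creating or removing a subdirectory always updates the parent directory's mtime (POSIX requires mkdir() and rmdir() to mark the parent's modification time for update). root_id, which captures (st_mtime_ns, st_size), is therefore sufficient to detect immediate children being added or removed on these platforms, and the child_ids level is belt-and-suspenders that costs an O(n_children) scandir on every call. The argument covers only the root's immediate children, though: creating an entry inside an existing child directory updates that child's mtime, not the root's. Skipping the child scan is therefore safe only if _GLOB_PATTERN cannot match anything that appears later inside an already-existing child directory.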
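Both halves of this argument can be checked on a given machine with a short script. It reuses the (st_mtime_ns, st_size) identity from the issue; the helper names (root_id, demo) and the window1 subdirectory are illustrative only and do not come from vscode_parser. Timestamps are pinned to the epoch first so the check is not fooled by coarse filesystem timestamp granularity. On ext4, btrfs, tmpfs and APFS all three values should come out True, which shows that a new immediate child moves the root's mtime while a new entry inside an existing child does not.

```python
import os
import tempfile
from pathlib import Path


def root_id(path: Path) -> tuple[int, int]:
    """Same identity the issue caches for a root: (st_mtime_ns, st_size)."""
    st = path.stat()
    return (st.st_mtime_ns, st.st_size)


def demo() -> dict[str, bool]:
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        child = root / "20240115"
        child.mkdir()

        # Pin both mtimes to the epoch so any later update is unambiguous,
        # even where timestamps only advance every few milliseconds.
        os.utime(root, ns=(0, 0))
        os.utime(child, ns=(0, 0))
        before = root_id(root)

        # 1) New entry inside an existing child: only the child's mtime moves.
        (child / "window1").mkdir()
        root_unchanged = root_id(root) == before
        child_changed = child.stat().st_mtime_ns != 0

        # 2) New immediate child of the root: the root's mtime moves.
        (root / "20240116").mkdir()
        root_changed = root_id(root) != before

        return {
            "root_unchanged_by_grandchild": root_unchanged,
            "child_changed_by_grandchild": child_changed,
            "root_changed_by_new_child": root_changed,
        }


print(demo())
```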
Concrete Fix
Reorder: check the cached root_id before calling _scan_child_ids. If the root's identity is unchanged, trust the cache and skip the child scan entirely.
```python
root_id: tuple[int, int] = (st.st_mtime_ns, st.st_size)
cached = _VSCODE_DISCOVERY_CACHE.get(candidate)
if cached is not None and cached.root_id == root_id:
    # Root directory identity unchanged: no immediate subdirectories added/removed
    # (parent mtime always updates on Linux/macOS).
    result.extend(cached.log_paths)
    continue
# Cache miss or root changed: scan children and update the cache.
child_ids = _scan_child_ids(candidate)
found = sorted(candidate.glob(_GLOB_PATTERN))
_VSCODE_DISCOVERY_CACHE[candidate] = _VSCodeDiscoveryCache(
    root_id=root_id, child_ids=child_ids, log_paths=tuple(found)
)
result.extend(found)
```
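If retaining the child_ids check is considered important for correctness on unusual filesystems (NFS, some FUSE mounts), it can be kept as a secondary comparison that runs only when root_id matches. Be aware that this variant recovers little of the saving: a cache hit is exactly the case where root_id matches, so the scandir still runs on every steady-state call, and a miss still needs a child scan to repopulate the cache.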
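The reordered lookup can be exercised end to end with a small self-contained model. This is a sketch under stated assumptions, not the vscode_parser code: _GLOB_PATTERN = "*/*.log", DiscoveryCacheEntry, _CACHE, scan_child_ids, discover, the skip-on-OSError behaviour and the SCAN_CALLS counter are all hypothetical stand-ins chosen so the saving is observable.

```python
import os
from dataclasses import dataclass
from pathlib import Path

# Hypothetical stand-ins; the real pattern and cache types are not shown in the issue.
_GLOB_PATTERN = "*/*.log"
ChildIds = tuple[tuple[str, int, int], ...]


@dataclass(frozen=True)
class DiscoveryCacheEntry:
    root_id: tuple[int, int]
    child_ids: ChildIds
    log_paths: tuple[Path, ...]


_CACHE: dict[Path, DiscoveryCacheEntry] = {}
SCAN_CALLS: list[Path] = []  # instrumentation: one entry per child scan


def scan_child_ids(root: Path) -> ChildIds:
    """One scandir of root plus one lstat per immediate child directory."""
    SCAN_CALLS.append(root)
    ids: list[tuple[str, int, int]] = []
    with os.scandir(root) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                ids.append((entry.name, st.st_mtime_ns, st.st_size))
    return tuple(sorted(ids))  # scandir order is arbitrary


def discover(candidates: list[Path]) -> list[Path]:
    result: list[Path] = []
    for candidate in candidates:
        try:
            st = candidate.stat()
        except OSError:
            continue  # assumption: missing roots are simply skipped
        root_id = (st.st_mtime_ns, st.st_size)
        cached = _CACHE.get(candidate)
        if cached is not None and cached.root_id == root_id:
            # Fast path: one stat() of the root, no scandir, no glob.
            result.extend(cached.log_paths)
            continue
        # Miss or root changed: pay for the child scan and the glob once.
        child_ids = scan_child_ids(candidate)
        found = sorted(candidate.glob(_GLOB_PATTERN))
        _CACHE[candidate] = DiscoveryCacheEntry(root_id, child_ids, tuple(found))
        result.extend(found)
    return result
```

Because child_ids is still recorded on every miss, the stricter secondary comparison discussed above can be reintroduced later without changing the shape of the cache entry.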
Expected Improvement
For a VS Code installation with 200 dated log directories (~1 year of daily use), the steady-state cost for that root drops from ~201 stat() calls (the root plus one per child) and 1 scandir to a single stat() of the root per _cached_discover_vscode_logs call. On a local SSD each stat() costs ~1–5 µs, so the saving is roughly 0.2–1 ms per call; on a networked filesystem each stat() can cost milliseconds.
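The arithmetic behind these figures fits in a tiny calculator. The 1–5 µs per-stat range is the issue's own estimate, not a measurement, and the scandir itself (which issues one or more getdents calls on Linux) is deliberately left out, so the real saving is slightly larger than what this prints. The function names are illustrative.

```python
def stat_calls_per_warm_call(n_children: int, skip_child_scan: bool) -> int:
    """stat()-family calls for one cache-hit call on a single root directory."""
    if skip_child_scan:
        return 1  # only the root
    return 1 + n_children  # the root plus one DirEntry.stat() per child


def saved_us_per_call(n_children: int, us_per_stat: float) -> float:
    """Microseconds saved per warm call, ignoring the scandir itself."""
    before = stat_calls_per_warm_call(n_children, skip_child_scan=False)
    after = stat_calls_per_warm_call(n_children, skip_child_scan=True)
    return (before - after) * us_per_stat


for us in (1.0, 5.0):
    print(f"{us:.0f} us/stat -> {saved_us_per_call(200, us):.0f} us saved per call")
```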
Testing Requirement
Monkeypatch _scan_child_ids and count invocations. After one warm call that populates _VSCODE_DISCOVERY_CACHE, a second call with an unchanged root must not invoke _scan_child_ids:
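```python
from pathlib import Path

from copilot_usage import vscode_parser
from copilot_usage.vscode_parser import _cached_discover_vscode_logs


def test_cached_discover_skips_child_scan_on_root_id_hit(tmp_path, monkeypatch):
    # create a realistic log dir tree and warm the cache with one call
    ...

    # Install the spy only after the warm-up call has populated the cache.
    scan_calls: list[Path] = []
    original = vscode_parser._scan_child_ids

    def spy(root: Path) -> vscode_parser._ChildIds:
        scan_calls.append(root)  # record every child scan
        return original(root)

    # _scan_child_ids is looked up as a module global at call time,
    # so patching the module attribute is enough to intercept it.
    monkeypatch.setattr(vscode_parser, "_scan_child_ids", spy)
    _cached_discover_vscode_logs(tmp_path)
    assert scan_calls == [], "child scan must be skipped on root_id cache hit"
```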
Generated by Performance Analysis