[aw][perf] vscode_parser: _cached_discover_vscode_logs calls _scan_child_ids before the cache-hit check, causing O(n_childre [Content truncated due to length] #897

@microsasa

Summary

_cached_discover_vscode_logs calls _scan_child_ids(candidate) before checking whether the cached root_id is still valid. That scan performs one os.scandir on the root plus one DirEntry.stat() for each child directory, so every call, even a perfect cache hit, pays the full O(n_children) scan cost.

File & Function

src/copilot_usage/vscode_parser.py · _cached_discover_vscode_logs

What Makes It Slow

root_id: tuple[int, int] = (st.st_mtime_ns, st.st_size)
child_ids = _scan_child_ids(candidate)          # ← always runs: 1 scandir + n stat() calls
cached = _VSCODE_DISCOVERY_CACHE.get(candidate)
if (
    cached is not None
    and cached.root_id == root_id
    and cached.child_ids == child_ids
):
    result.extend(cached.log_paths)
    continue

_scan_child_ids does os.scandir(root) and calls entry.stat(follow_symlinks=False) for every immediate child directory before the cache is even consulted. VS Code creates one dated subdirectory per session (20240115/, 20240116/, …). A long-time VS Code user accumulates hundreds of these. At steady state — when nothing has changed — all those stat syscalls are wasted.
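To make that cost concrete, here is a minimal sketch of what such a scan might look like. The issue does not show the body of _scan_child_ids, so the function below, its return shape, and the directory names are assumptions for illustration only. On Unix, DirEntry.stat() always requires a system call, which is where the per-child cost comes from.

```python
import os
import tempfile
from pathlib import Path


def scan_child_ids(root: Path) -> tuple[tuple[str, int, int], ...]:
    """Hypothetical stand-in for _scan_child_ids: (name, mtime_ns, size) per child dir."""
    ids = []
    with os.scandir(root) as entries:  # one scandir on the root
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # One stat per child directory; this is the O(n_children) part.
                st = entry.stat(follow_symlinks=False)
                ids.append((entry.name, st.st_mtime_ns, st.st_size))
    return tuple(sorted(ids))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for day in range(1, 201):
        (root / f"session-{day:03d}").mkdir()  # made-up directory names
    child_ids = scan_child_ids(root)
    print(len(child_ids))  # 200
```

With 200 child directories, each scan issues 200 per-child stat calls regardless of whether the result ends up being needed.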

On Linux (ext4/btrfs/overlayfs) and macOS (APFS), creating or removing a subdirectory always updates the parent directory's mtime. root_id, which captures (st_mtime_ns, st_size), is therefore sufficient to detect added or removed child directories on these platforms; the child_ids level is a belt-and-suspenders check that costs one scandir plus O(n_children) stat() calls on every call.
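The mtime behaviour can be checked with a small standalone script. This is a sketch under assumptions: the directory and file names are made up, and the root's mtime is first reset to a fixed old value with os.utime so the comparison does not depend on timestamp granularity. It also checks a second case, a file created inside an existing child directory, because that change is recorded on the child rather than on the root.

```python
import os
import tempfile
from pathlib import Path

OLD_NS = 1_000_000_000 * 1_000_000_000  # fixed timestamp in 2001, in nanoseconds


def root_mtime_after(root: Path, action) -> int:
    """Reset root's mtime to OLD_NS, run action, and return root's mtime afterwards."""
    os.utime(root, ns=(OLD_NS, OLD_NS))
    action()
    return root.stat().st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    child = root / "20240115"
    child.mkdir()

    # Adding a sibling directory modifies the root's own entries.
    after_mkdir = root_mtime_after(root, (root / "20240116").mkdir)
    # Adding a file inside an existing child modifies only the child's entries.
    after_file = root_mtime_after(root, (child / "session.log").touch)

    root_changed_by_mkdir = after_mkdir != OLD_NS
    root_changed_by_nested_file = after_file != OLD_NS
    print(root_changed_by_mkdir, root_changed_by_nested_file)
```

Whether the second case matters here depends on what _GLOB_PATTERN matches and on how VS Code writes into existing session directories, neither of which the issue shows. If new log files can appear inside an existing dated directory, that is the change child_ids would catch and a root_id-only check would not.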

Concrete Fix

Reorder: check the cached root_id before calling _scan_child_ids. If the root's identity is unchanged, trust the cache and skip the child scan entirely.

root_id: tuple[int, int] = (st.st_mtime_ns, st.st_size)
cached = _VSCODE_DISCOVERY_CACHE.get(candidate)
if cached is not None and cached.root_id == root_id:
    # Root directory identity unchanged — no subdirectories added/removed
    # (parent mtime always updates on Linux/macOS).
    result.extend(cached.log_paths)
    continue
# Cache miss or root changed — scan children and update cache.
child_ids = _scan_child_ids(candidate)
found = sorted(candidate.glob(_GLOB_PATTERN))
_VSCODE_DISCOVERY_CACHE[candidate] = _VSCodeDiscoveryCache(
    root_id=root_id, child_ids=child_ids, log_paths=tuple(found)
)
result.extend(found)

If retaining the child_ids check is considered important for correctness on unusual filesystems (NFS, some FUSE mounts), it can be kept as a secondary check that runs only when root_id matches. Note that this variant still pays the child scan on every cache hit, and a rebuild after a root_id change needs fresh child_ids anyway, so it keeps the extra safety but gives up the steady-state saving.
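The root_id-first ordering can be exercised end to end with a self-contained sketch. Everything here is a stand-in: DiscoveryEntry, CACHE, scan_child_ids, discover, and the "*/*.log" pattern are hypothetical names that mirror the snippets above, not the module's actual code. A counter records how often the child scan runs.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path

ChildIds = tuple[tuple[str, int, int], ...]


@dataclass(frozen=True)
class DiscoveryEntry:  # hypothetical mirror of _VSCodeDiscoveryCache
    root_id: tuple[int, int]
    child_ids: ChildIds
    log_paths: tuple[Path, ...]


CACHE: dict[Path, DiscoveryEntry] = {}
scan_count = 0  # number of child scans performed so far


def scan_child_ids(root: Path) -> ChildIds:
    """Hypothetical child scan: one scandir plus one stat per child directory."""
    global scan_count
    scan_count += 1
    ids = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                ids.append((entry.name, st.st_mtime_ns, st.st_size))
    return tuple(sorted(ids))


def discover(root: Path, pattern: str = "*/*.log") -> list[Path]:
    st = root.stat()
    root_id = (st.st_mtime_ns, st.st_size)
    cached = CACHE.get(root)
    if cached is not None and cached.root_id == root_id:
        return list(cached.log_paths)  # cache hit: child scan skipped
    # Cache miss or root changed: scan children and rebuild the entry.
    child_ids = scan_child_ids(root)
    found = sorted(root.glob(pattern))
    CACHE[root] = DiscoveryEntry(root_id, child_ids, tuple(found))
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("20240115", "20240116"):
        (root / name).mkdir()
        (root / name / "session.log").touch()
    # Pin the root's mtime to an old value so the later mkdir is guaranteed
    # to change it, independent of filesystem timestamp granularity.
    os.utime(root, ns=(10**18, 10**18))

    first = discover(root)   # cold call: scans children
    second = discover(root)  # warm call: root_id matches, no scan
    scans_after_warm_call = scan_count

    (root / "20240117").mkdir()  # updates the root's mtime
    (root / "20240117" / "session.log").touch()
    third = discover(root)   # root_id changed: rescan and rebuild
```

In this sketch the warm call returns the cached paths after a single stat() on the root, and the scan only runs again once a new session directory changes the root's mtime.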

Expected Improvement

For a VS Code installation with 200 dated log directories (~1 year of daily use), steady-state cost drops from ~201 stat() + 1 scandir to 1 stat() per _cached_discover_vscode_logs call. On a local SSD each stat() costs ~1–5 µs; on a networked filesystem each can cost milliseconds.

Testing Requirement

Monkeypatch _scan_child_ids and count invocations. After one warm call that populates _VSCODE_DISCOVERY_CACHE, a second call with an unchanged root must not invoke _scan_child_ids:

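from pathlib import Path

# Import path assumed from src/copilot_usage/vscode_parser.py.
from copilot_usage import vscode_parser
from copilot_usage.vscode_parser import _cached_discover_vscode_logs


def test_cached_discover_skips_child_scan_on_root_id_hit(tmp_path, monkeypatch):
    # create a realistic log dir tree and warm the cache with one call
    ...
    # Wrap the real scanner so every invocation is recorded.
    scan_calls: list[Path] = []
    original = vscode_parser._scan_child_ids

    def spy(root: Path) -> vscode_parser._ChildIds:
        scan_calls.append(root)
        return original(root)

    monkeypatch.setattr(vscode_parser, "_scan_child_ids", spy)
    _cached_discover_vscode_logs(tmp_path)
    assert scan_calls == [], "child scan must be skipped on root_id cache hit"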

Generated by Performance Analysis · ● 3.9M ·

Labels: aw (Created by agentic workflow), aw-dispatched (Issue has been dispatched to implementer), perf (Performance improvement)
