Summary
_update_vscode_summary (src/copilot_usage/vscode_parser.py, lines 492–518) is the core aggregation loop called for every VS Code log file that is new or changed. It is O(n_requests) and is the dominant cost when a log file is first parsed or re-parsed after a change. The current code accesses six acc.* attributes and multiple req.* attributes on every iteration via LOAD_ATTR bytecodes, instead of binding them to locals (LOAD_FAST) once before the loop.
What makes it slow
_SummaryAccumulator uses @dataclass(slots=True). Each LOAD_ATTR on a slotted dataclass goes through CPython's slot-descriptor protocol — faster than a plain __dict__ lookup, but still ~3–5× slower than LOAD_FAST on a local variable.
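The cost difference is ultimately about how many times the slot descriptor is exercised. A minimal sketch below counts attribute reads directly — `CountingAcc`, `via_attr`, and `via_local` are names invented for illustration, and the counting wrapper measures read counts at the Python level rather than raw bytecode-dispatch cost:

```python
class CountingAcc:
    """Slotted stand-in that counts every read of .total."""
    __slots__ = ("total", "reads")

    def __init__(self) -> None:
        object.__setattr__(self, "total", 0)
        object.__setattr__(self, "reads", 0)

    def __getattribute__(self, name):
        if name == "total":  # tally each attribute read of .total
            object.__setattr__(
                self, "reads", object.__getattribute__(self, "reads") + 1
            )
        return object.__getattribute__(self, name)


def via_attr(acc, n):
    for _ in range(n):
        acc.total += 1        # attribute read on every iteration


def via_local(acc, n):
    total = acc.total         # single hoisted read before the loop
    for _ in range(n):
        total += 1            # LOAD_FAST/STORE_FAST only
    acc.total = total         # single write-back after the loop


a, b = CountingAcc(), CountingAcc()
via_attr(a, 1000)
via_local(b, 1000)
reads_attr, reads_local = a.reads, b.reads  # reading .reads is not counted
print(reads_attr, reads_local)  # 1000 1
```

Both variants leave `total` at 1000; the hoisted version simply goes through the descriptor once instead of a thousand times, and the ~3–5× per-read dispatch gap applies to each avoided read.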
Per request, the loop currently issues these LOAD_ATTR reads against acc:
| attribute | reads per iteration |
| --- | --- |
| acc.requests_by_model | 1 |
| acc.duration_by_model | 1 |
| acc.requests_by_category | 1 |
| acc.requests_by_date | 1 |
| acc.first_timestamp | 1 |
| acc.last_timestamp | 1 |
| **total** | **6** |
For a log file with 5 000 requests, that is 30 000 LOAD_ATTR operations that could be LOAD_FAST. There are additional LOAD_ATTR reads on req (req.model twice, req.duration_ms twice, req.category once, req.timestamp up to twice) that can be hoisted to locals as well.
Concrete fix
Bind the four defaultdict fields to locals once before the loop, and hoist repeated req.* accesses inside the iteration body:
```python
def _update_vscode_summary(
    acc: _SummaryAccumulator, requests: Sequence[VSCodeRequest]
) -> None:
    rbm = acc.requests_by_model
    dbm = acc.duration_by_model
    rbc = acc.requests_by_category
    rbd = acc.requests_by_date
    last_date_key: str = ""
    last_date_val: date | None = None
    for req in requests:
        acc.total_requests += 1
        dur = req.duration_ms
        acc.total_duration_ms += dur
        model = req.model
        rbm[model] += 1
        dbm[model] += dur
        rbc[req.category] += 1
        ts = req.timestamp
        ts_date = ts.date()
        if last_date_val is None or ts_date != last_date_val:
            last_date_key = ts_date.isoformat()
            last_date_val = ts_date
        rbd[last_date_key] += 1
        first = acc.first_timestamp
        if first is None or ts < first:
            acc.first_timestamp = ts
        last_ts = acc.last_timestamp
        if last_ts is None or ts > last_ts:
            acc.last_timestamp = ts
```
ts_date.isoformat() replaces req.timestamp.strftime("%Y-%m-%d") — both produce identical "YYYY-MM-DD" strings but date.isoformat() operates on the already-computed date object without re-parsing the format string.
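As a quick sanity check on the equivalence claim, both formatting paths produce the same "YYYY-MM-DD" string (the timestamp below is arbitrary):

```python
from datetime import datetime

ts = datetime(2024, 3, 7, 14, 30, 0)
old = ts.strftime("%Y-%m-%d")   # goes through the strftime format machinery
new = ts.date().isoformat()     # direct formatting of the date object
print(old, new)  # 2024-03-07 2024-03-07
```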
Expected improvement
For a log file with 5 000 requests: ~30 000 LOAD_ATTR → LOAD_FAST conversions on acc, plus ~10 000 on req. At typical CPython 3.12 bytecode costs the gain is roughly 1–2 ms per aggregation of a large log file. The gain scales linearly with request count — large VS Code installations accumulate thousands of requests per daily log file.
This only affects the cache-miss path (first parse or after a file change). Subsequent calls return the cached _vscode_summary_cache or _PER_FILE_SUMMARY_CACHE without entering this loop.
Testing requirement
Add a test to tests/copilot_usage/test_vscode_parser.py following the project's deterministic perf-test convention (no wall-clock timing):
- Build a synthetic requests list of 10 000+ VSCodeRequest objects spanning multiple models, categories, and dates.
- Call _update_vscode_summary and assert the result is bit-for-bit identical to a reference (total requests, per-model counts/durations, per-date counts, timestamp bounds) — guards against regression while providing coverage of the optimized hot path.
Correctness-equivalence is sufficient; wall-clock assertions are avoided to prevent flaky CI, matching the pattern in TestVscodeSummaryCacheSkipsReaggregation.
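A sketch of the shape such a test could take — `FakeRequest` and `reference_summary` are stand-ins invented here, and the real test would instead import VSCodeRequest, _SummaryAccumulator, and _update_vscode_summary from the module and compare field by field:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FakeRequest:
    """Minimal stand-in for VSCodeRequest with the fields the loop reads."""
    model: str
    category: str
    duration_ms: int
    timestamp: datetime


def reference_summary(requests):
    """Naive, obviously-correct aggregation used as the oracle."""
    dur_by_model: dict[str, int] = defaultdict(int)
    for r in requests:
        dur_by_model[r.model] += r.duration_ms
    return {
        "total": len(requests),
        "by_model": dict(Counter(r.model for r in requests)),
        "dur_by_model": dict(dur_by_model),
        "by_category": dict(Counter(r.category for r in requests)),
        "by_date": dict(Counter(r.timestamp.date().isoformat() for r in requests)),
        "first": min(r.timestamp for r in requests),
        "last": max(r.timestamp for r in requests),
    }


base = datetime(2024, 1, 1, 8, 0, 0)
# One request per minute spans about seven distinct dates.
requests = [
    FakeRequest(
        model=f"model-{i % 3}",
        category=f"cat-{i % 4}",
        duration_ms=10 + i % 7,
        timestamp=base + timedelta(minutes=i),
    )
    for i in range(10_000)
]
expected = reference_summary(requests)
# In the real test: run _update_vscode_summary(acc, requests) on a fresh
# accumulator and assert every acc field equals the matching `expected` entry.
print(expected["total"])  # 10000
```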
Generated by Performance Analysis