feat(topology): recover codebase statistics per sub-package off the single scc walk #121
Merged
…ingle scc walk (#78)

On an undeclared fan-out, `codebase-stats` ran once over the whole tree, so each sub-package's totals were merged into one unlabelled record. Recover them per assessment root by slicing the first-party keep-set to each root's subtree (`scc_keep_for_root`) and re-aggregating the ONE cached `scc --by-file` walk: reuse, never re-walk, so no extra scc cost. Records are namespaced via `SLUG_NS` (`api/codebase-stats`) and the console table is labelled per package.

`scc_keep_for_root "."` returns the full keep-set path unchanged, and the loop runs once with `SLUG_NS` empty, so a single package / declared workspace is byte-identical to before (verified end-to-end: all parsed records unchanged).

This is #78 Phase 2b increment 2, first slice. The engine-routed `complexity` and `duplication` arms still run whole-tree. Recovering them needs the engine (eslint / lizard / jscpd) re-routed per sub-package, tracked as a follow-up.

Refs #78
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012oHR4g8pH7Ui242SRycFzw
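The slicing step described in this commit can be illustrated with a minimal sketch. It is not the PR's `scc_keep_for_root`: the function name `slice_keep_set`, the assumption that the keep-set is a newline-delimited file of TARGET-relative paths, and the `a/b` → `a_b` filename flattening are all hypothetical, chosen only to show the identity case and the prefix slice.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for scc_keep_for_root: prints the path of a keep-set
# restricted to ROOT's subtree. ASSUMPTION: the keep-set is a newline-delimited
# file of TARGET-relative paths (not confirmed by the PR).
slice_keep_set() {
  local keep_file=$1 root=$2
  root=${root#./}   # normalise a leading "./"
  root=${root%/}    # and a trailing "/"
  if [ -z "$root" ] || [ "$root" = "." ]; then
    # Identity case: hand back the original path, no copy, so a
    # single-package run reads exactly the same input as before.
    printf '%s\n' "$keep_file"
    return 0
  fi
  # Flatten nested roots (a/b -> a_b) so the slice gets a flat filename.
  local out
  out="${keep_file}.$(printf '%s' "$root" | tr / _)"
  # Keep only entries under ROOT/. index() is a literal prefix test, and the
  # trailing slash stops "api" from matching "api-docs/...".
  awk -v p="$root/" 'index($0, p) == 1' "$keep_file" > "$out"
  printf '%s\n' "$out"
}
```

Because the function prints a path rather than the contents, a caller can pass the result to whatever re-aggregates the cached walk, and the `.` case needs no special handling at the call site.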
maudlin added a commit that referenced this pull request Jun 25, 2026
…ine routing (#78) (#123)

The last whole-tree measurement arm. On an undeclared fan-out, complexity ran once over the whole tree, routed by whole-tree engine selection. Recover it per assessment root with the FULL engine ladder re-routed per child (`route_complexity_child`, a scoped mirror of the whole-tree routing): ESLint on the JS/TS slice using the child's OWN flat config + local bin, lizard on the non-JS slice (inventory sliced to the subtree), scc fallback (keep-set sliced). Findings are namespaced (`backend/complexity`) and labelled per package.

The single git-hotspots CSV is accumulated across packages: truncated once before the loop, appended per arm, always in TARGET-relative (namespaced) paths (scc/lizard/eslint findings re-prefixed; the standalone-lizard CSV's file column namespaced via awk), so the churn × complexity join stays whole-tree. If nothing is measured the (empty) CSV is dropped, matching the pre-#78 absent state.

Single package / declared workspace → one iteration at "." reusing the whole-tree routing, no cd, `SLUG_NS` empty, `$PWD == $TARGET` → byte-identical to before, CSV included (verified end-to-end on the lizard, scc and merged arms + full parsed-set diff). A fan-out routes each child to its own engine and accumulates a namespaced CSV.

With stats (#121), duplication (#122) and now complexity, every measurement arm recovers per sub-package, so #78 is complete.

Closes #78
Claude-Session: https://claude.ai/code/session_012oHR4g8pH7Ui242SRycFzw
Co-authored-by: Mark Ridley <210189+maudlin@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
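The CSV accumulation described in this commit (truncate once, append per package, namespace the file column via awk, drop the file if empty) might look roughly like the sketch below. The helper names `namespace_csv` and `accumulate_hotspots`, the `namespace:path` argument format, and the CSV layout (headerless, unquoted, file path in column 1) are assumptions made for illustration, not the PR's actual code.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: prefix the file column with a namespace such as
# "backend". ASSUMPTION: column 1 of a headerless, unquoted CSV is the path.
namespace_csv() {
  local ns=$1
  if [ -z "$ns" ]; then
    cat   # empty namespace (single package): rows pass through untouched
  else
    awk -F, -v OFS=, -v ns="$ns" \
      '{ sub(/^\.\//, "", $1); $1 = ns "/" $1; print }'
  fi
}

# Hypothetical accumulator: truncate once, append one batch per package,
# and remove the file if nothing was measured.
accumulate_hotspots() {
  local out=$1; shift
  : > "$out"                   # truncate once, before the loop
  local pair ns csv
  for pair; do                 # each remaining argument is "namespace:child.csv"
    ns=${pair%%:*}
    csv=${pair#*:}
    [ -s "$csv" ] || continue  # skip packages with no findings
    namespace_csv "$ns" < "$csv" >> "$out"
  done
  [ -s "$out" ] || rm -f "$out"  # nothing measured: leave no CSV behind
}
```

Passing an empty namespace through `cat` rather than awk is one way to keep the single-package output byte-identical, since awk would otherwise rebuild each row.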
What
On an undeclared fan-out, `codebase-stats` ran once over the whole tree, so each sub-package's totals were merged into one unlabelled record. This recovers them per assessment root: slice the first-party keep-set to each root's subtree (`scc_keep_for_root`) and re-aggregate the one cached `scc --by-file` walk. The walk is reused, never re-run, so there is no extra scc cost. Records are namespaced via `SLUG_NS` (`api/codebase-stats`); the console table is labelled per package (📦 api/).

This is #78 Phase 2b increment 2, first slice: the scc-measured arm. The engine-routed `complexity` and `duplication` arms still run whole-tree; recovering them needs the engine (eslint / lizard / jscpd) re-routed per sub-package, which is a meatier, separable change, tracked as a follow-up (see issue comment).

Why this is safe
`scc_keep_for_root "."` returns the full keep-set path unchanged, and the loop runs once with `SLUG_NS` empty → a single package / declared workspace is byte-identical to before (the acceptance gate). Verified end-to-end: every parsed record on a single-package fixture is identical between `main` and this branch (modulo the absolute `CHECKUP_OUT_DIR` path embedded in one message string).

Verification
- `test/scc-inventory.test.sh`: added `scc_keep_for_root` unit cases (slice correctness, `.` identity, `./`-prefix normalisation, nested-root filename flattening, breakdown round-trip): 39 passed.
- `undeclared-fan-out` → `api/codebase-stats` + `web/codebase-stats`, labelled.

Refs #78
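The end-to-end comparison mentioned under "Why this is safe" (records identical between `main` and this branch, modulo the absolute `CHECKUP_OUT_DIR` path) could be reproduced along these lines. This is only a sketch: the helper names, the idea that parsed records are dumped to plain text files, and the `<OUT_DIR>` placeholder are assumptions, not part of the PR.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print a record dump with every literal occurrence of
# the run's absolute output directory replaced by a fixed placeholder.
normalise_records() {
  local file=$1 out_dir=$2
  awk -v d="$out_dir" '{
    line = $0; res = ""
    # index()/substr() give a literal replace, so regex metacharacters in
    # the path (".", "+") are not misread as patterns.
    while (length(d) > 0 && (i = index(line, d)) > 0) {
      res = res substr(line, 1, i - 1) "<OUT_DIR>"
      line = substr(line, i + length(d))
    }
    print res line
  }' "$file"
}

# Succeeds only if the two dumps match once each run's output directory
# has been masked.
records_identical() {
  [ "$(normalise_records "$1" "$2")" = "$(normalise_records "$3" "$4")" ]
}
```

A call such as `records_identical main.records "$MAIN_OUT" branch.records "$BRANCH_OUT"` (file and variable names hypothetical) would then exit non-zero on any difference other than the output path.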
🤖 Generated with Claude Code