Fix DevProfile Pct sums and surface hidden-bucket counts #6
Two related changes in the DevProfile Scope/Extensions render:
1. Denominator switch: cs.FilesTouched → len(devFiles[email])
   FilesTouched counts every path the dev appeared on, including pure renames (commits where add==0 && del==0 on that file). The Scope and Extensions numerators, however, are built from fe.devLines, which only tracks files where the dev wrote lines. The mismatch silently deflated all Pcts in repos with reorgs and left sums below 100% with no visible cause.
   Switching to the authored-file count as the denominator makes the two sides consistent, aligns with the Herfindahl specialization index which already used the authored population, and leaves truncation as the ONLY reason a Pct sum can drop below 100.
   Validated on 5726 real profiles (pi-hole, praat, WordPress, kubernetes): zero profiles with <5 buckets sum to anything other than 100% after the fix.
2. ScopeHidden / ExtensionsHidden counters + "+N more" rendering
   A reader seeing "Scope: foo (28%), bar (25%), ... (+6 more)" now understands the 85% sum comes from 6 hidden buckets; previously the 85% read as a bug. Surfaced in CLI (inline suffix), HTML main report profile card (inline italic span), and HTML dedicated profile (line below the bar legend). Silent when Hidden == 0.
Tests: regression on the denominator change (FilesTouched=5 but authored=4, sum must be 100 not 80); explicit truncation-sum invariant (6 buckets → 5 visible, sum<100); Hidden counters == 0 in the common case; Hidden counters == exact drop count when truncated.
METRICS.md: both Scope and Extensions rows updated with the new denominator wording; Extensions caveat #2 (pure-rename gap) removed since it no longer applies.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1f65be6a2
Scope and Extensions both rendered as horizontal bars stacked
vertically in the dedicated profile page. Sharing the same
categorical palette (blue/green/purple/orange/red) invited false
cross-chart correlation — a reader's eye auto-pairs same-index
segments ("the blue dir must go with the blue ext") even though the
two axes are independent.
Swap Extensions to a teal monochromatic progression
(#0e4c5b → #5dbdb7). Two effects: (1) no hue collides with Scope's
palette, so same-color confusion is eliminated; (2) monochrome signals
"ordered distribution" instead of "distinct categories", which matches
what the data actually represents (top-5 slice of one ranking).
Palette stops at #5dbdb7 so the lightest shade still holds adequate
contrast with the white overlay labels on slices above 8% (smaller slices
skip the label anyway per the existing `gt $e.Pct 8.0` guard).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
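A monochromatic progression between the two named endpoints can be sketched as follows. The commit message only states the endpoints (#0e4c5b and #5dbdb7); the five-stop count and the linear RGB interpolation below are assumptions for illustration, and `rgb`, `lerp`, and `ramp` are hypothetical names, so the intermediate shades are not necessarily the ones the commit ships.

```go
package main

import "fmt"

// rgb is a plain 8-bit colour triple (hypothetical helper type).
type rgb struct{ r, g, b uint8 }

// lerp interpolates one channel from a to b at position t in [0, 1],
// rounding to the nearest integer.
func lerp(a, b uint8, t float64) uint8 {
	return uint8(float64(a) + (float64(b)-float64(a))*t + 0.5)
}

// ramp returns n hex colours evenly spaced from `from` to `to`.
func ramp(from, to rgb, n int) []string {
	out := make([]string, n)
	for i := range out {
		t := 0.0
		if n > 1 {
			t = float64(i) / float64(n-1)
		}
		out[i] = fmt.Sprintf("#%02x%02x%02x",
			lerp(from.r, to.r, t), lerp(from.g, to.g, t), lerp(from.b, to.b, t))
	}
	return out
}

func main() {
	dark := rgb{0x0e, 0x4c, 0x5b}  // darkest stop named in the commit
	light := rgb{0x5d, 0xbd, 0xb7} // lightest stop named in the commit
	for i, c := range ramp(dark, light, 5) {
		fmt.Printf("slice %d: %s\n", i+1, c)
	}
}
```

Because every stop shares one hue family and only lightness varies, the ramp reads as a ranking rather than as five unrelated categories, which is the effect the commit message describes.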
`fe.devLines[email] += cf.Additions + cf.Deletions` ran on every commit_file row, including pure renames where the numstat carries 0/0. That `+= 0` created a zero-valued map entry which then survived as a "dev touched this file" signal into every downstream consumer: BusFactor, FileHotspots.UniqueDevs, DeveloperNetwork pairs, ChurnRisk bus factor, and (the reported symptom) DevProfile.Scope / Extensions Pct math. A dev with one real .go edit and one pure .md rename was reporting .go 50% + .md 50% even though they never authored a line of .md.
Fix the write site: only touch fe.devLines when lines > 0. devCommits still increments unconditionally so the distinct "appeared on this file" signal remains available for any caller that wants it (none do today). devLines now cleanly means "lines this dev contributed", which is the semantic the METRICS.md Scope/Extensions doc already advertised but the code didn't enforce.
Integration test uses streamLoad on a synthetic JSONL with an M commit and an R100 rename: Alice's profile surfaces only the .go bucket at 100%, and the renamed .md file reports 0 unique devs from FileHotspots. Existing tests pass unchanged because no test fixture relied on the zero-entry behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
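The write-site guard described above might look roughly like this. It is a sketch under stated assumptions, not the repository's code: the `commitFile` and `fileEntry` shapes, `newFileEntry`, and the `record` method are hypothetical, and only the field names `devLines`, `devCommits`, `Additions`, and `Deletions` come from the commit message.

```go
package main

import "fmt"

// commitFile mirrors one commit_file row (hypothetical shape).
type commitFile struct {
	Path      string
	Additions int
	Deletions int
}

// fileEntry holds per-file, per-developer counters (hypothetical shape).
type fileEntry struct {
	devLines   map[string]int // lines this dev contributed to the file
	devCommits map[string]int // commits in which this dev appeared on the file
}

func newFileEntry() *fileEntry {
	return &fileEntry{devLines: map[string]int{}, devCommits: map[string]int{}}
}

// record applies one commit_file row. devCommits always increments, while
// devLines is written only when the row carries lines, so a pure rename
// (0 additions, 0 deletions) never creates a zero-valued devLines entry.
func (fe *fileEntry) record(email string, cf commitFile) {
	fe.devCommits[email]++
	if lines := cf.Additions + cf.Deletions; lines > 0 {
		fe.devLines[email] += lines
	}
}

func main() {
	goFile, mdFile := newFileEntry(), newFileEntry()
	goFile.record("alice@example.com", commitFile{"main.go", 10, 2})  // real edit
	mdFile.record("alice@example.com", commitFile{"README.md", 0, 0}) // pure rename

	fmt.Println(len(goFile.devLines), len(mdFile.devLines),
		mdFile.devCommits["alice@example.com"])
}
```

Any consumer that counts developers via `len(fe.devLines)` then sees no author on the renamed file, while the "appeared on this file" signal stays recoverable from `devCommits`.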
@codex review
Codex Review: Didn't find any major issues. What shall we delve into next?