Fix DevProfile Pct sums and surface hidden-bucket counts #6

Merged
lex0c merged 3 commits into main from fix/profile-pct-denominator
Apr 20, 2026
lex0c (Owner) commented Apr 20, 2026

Two related changes in the DevProfile Scope/Extensions render:

  1. Denominator switch: cs.FilesTouched → len(devFiles[email])

    FilesTouched counts every path the dev appeared on, including pure
    renames (commits where add==0 && del==0 on that file). The Scope
    and Extensions numerators, however, are built from fe.devLines,
    which only tracks files where the dev wrote lines. The mismatch
    silently deflated all Pcts in repos with reorgs and left sums
    below 100% with no visible cause.

    Switching to the authored-file count as the denominator makes the
    two sides consistent, aligns with the Herfindahl specialization
    index which already used the authored population, and leaves
    truncation as the ONLY reason a Pct sum can drop below 100.
    Validated on 5726 real profiles (pi-hole, praat, WordPress,
    kubernetes): zero profiles with <5 buckets sum to anything other
    than 100% after the fix.

  2. ScopeHidden / ExtensionsHidden counters + "+N more" rendering

    A reader seeing "Scope: foo (28%), bar (25%), ... (+6 more)" now understands the 85% sum comes from 6 hidden buckets; previously the 85% read as a bug. Surfaced in CLI (inline suffix), HTML main report profile card (inline italic span), and HTML dedicated profile (line below the bar legend). Silent when Hidden == 0.
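
    The truncation and "+N more" suffix described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `bucket` type, `renderBuckets`, and `maxVisible` are hypothetical names, and the sketch assumes each bucket's Pct is its authored-file count divided by the dev's total authored files.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// bucket is a hypothetical stand-in for one Scope or Extensions entry.
type bucket struct {
	Name  string
	Files int // files in this bucket where the dev wrote lines
}

// maxVisible mirrors the top-5 cut mentioned in the PR (assumed constant name).
const maxVisible = 5

// renderBuckets sorts buckets by file count (descending), keeps at most
// maxVisible of them, and returns the rendered line plus the number of
// buckets that were dropped. authored is the Pct denominator.
func renderBuckets(label string, buckets []bucket, authored int) (string, int) {
	sorted := append([]bucket(nil), buckets...) // do not mutate the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Files > sorted[j].Files })

	hidden := 0
	if len(sorted) > maxVisible {
		hidden = len(sorted) - maxVisible
		sorted = sorted[:maxVisible]
	}

	parts := make([]string, 0, len(sorted))
	for _, b := range sorted {
		pct := 0.0
		if authored > 0 {
			pct = 100 * float64(b.Files) / float64(authored)
		}
		parts = append(parts, fmt.Sprintf("%s (%.0f%%)", b.Name, pct))
	}

	line := label + ": " + strings.Join(parts, ", ")
	if hidden > 0 {
		// Stay silent when nothing was hidden.
		line += fmt.Sprintf(" (+%d more)", hidden)
	}
	return line, hidden
}

func main() {
	buckets := []bucket{
		{"api", 4}, {"cmd", 2}, {"docs", 1}, {"internal", 1}, {"pkg", 1}, {"web", 1},
	}
	line, hidden := renderBuckets("Scope", buckets, 10)
	fmt.Println(line)
	fmt.Println("hidden:", hidden)
}
```

    With six buckets and a cap of five, the visible Pcts sum to 90% and the suffix accounts for the missing 10%.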

Tests: regression on the denominator change (FilesTouched=5 but authored=4, sum must be 100 not 80); explicit truncation-sum invariant (6 buckets → 5 visible, sum<100); Hidden counters == 0 in the common case; Hidden counters == exact drop count when truncated.
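
The arithmetic behind the regression case (FilesTouched=5, authored=4) can be reproduced in isolation. The sketch below assumes a hypothetical dev with 3 authored .go files, 1 authored .md file, and 1 pure rename; `pctSum` is an illustrative helper, not a function from this repo.

```go
package main

import "fmt"

// pctSum returns the sum of per-bucket percentages, given the number of
// authored files in each bucket and the denominator in use.
func pctSum(bucketFiles []int, denom int) float64 {
	if denom == 0 {
		return 0
	}
	sum := 0.0
	for _, n := range bucketFiles {
		sum += 100 * float64(n) / float64(denom)
	}
	return sum
}

func main() {
	// Numerators only ever count files where the dev wrote lines.
	bucketFiles := []int{3, 1} // e.g. 3 .go files, 1 .md file (assumed split)

	filesTouched := 5 // includes one pure rename (add==0 && del==0)
	authored := 4     // files where the dev actually wrote lines

	fmt.Printf("old denominator: %.0f%%\n", pctSum(bucketFiles, filesTouched)) // old denominator: 80%
	fmt.Printf("new denominator: %.0f%%\n", pctSum(bucketFiles, authored))     // new denominator: 100%
}
```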

METRICS.md: both Scope and Extensions rows updated with the new denominator wording; Extensions caveat #2 (pure-rename gap) removed since it no longer applies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lex0c (Owner, Author) commented Apr 20, 2026

@codex review

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1f65be6a2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/stats/stats.go
lex0c and others added 2 commits April 19, 2026 23:32
Scope and Extensions both rendered as horizontal bars stacked
vertically in the dedicated profile page. Sharing the same
categorical palette (blue/green/purple/orange/red) invited false
cross-chart correlation — a reader's eye auto-pairs same-index
segments ("the blue dir must go with the blue ext") even though the
two axes are independent.

Swap Extensions to a teal monochromatic progression
(#0e4c5b → #5dbdb7). Two effects: (1) no hue collides with Scope's
palette, so same-color confusion is eliminated; (2) monochrome signals
"ordered distribution" instead of "distinct categories", which matches
what the data actually represents (top-5 slice of one ranking).

Palette stops at #5dbdb7 so the lightest shade still holds adequate
contrast with the white overlay labels on slices ≥8% (smaller slices
skip the label anyway per the existing `gt $e.Pct 8.0` guard).
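
One way to derive a five-step progression between the two endpoints is plain linear interpolation in RGB. This is only a sketch: the commit message gives the endpoints (#0e4c5b and #5dbdb7) but not the intermediate stops or how they were chosen, so the interpolation method and the `parseHex`, `lerp`, and `ramp` helpers below are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

type rgb struct{ R, G, B uint8 }

// parseHex parses a "#rrggbb" color string.
func parseHex(s string) (rgb, error) {
	if len(s) != 7 || s[0] != '#' {
		return rgb{}, fmt.Errorf("want #rrggbb, got %q", s)
	}
	var ch [3]uint8
	for i := range ch {
		v, err := strconv.ParseUint(s[1+2*i:3+2*i], 16, 8)
		if err != nil {
			return rgb{}, err
		}
		ch[i] = uint8(v)
	}
	return rgb{ch[0], ch[1], ch[2]}, nil
}

// lerp linearly interpolates one channel; t runs from 0 to 1.
func lerp(a, b uint8, t float64) uint8 {
	return uint8(math.Round(float64(a) + t*(float64(b)-float64(a))))
}

// ramp returns n evenly spaced colors from `from` to `to`, inclusive.
func ramp(from, to rgb, n int) []string {
	if n <= 0 {
		return nil
	}
	out := make([]string, n)
	for i := range out {
		t := 0.0
		if n > 1 {
			t = float64(i) / float64(n-1)
		}
		out[i] = fmt.Sprintf("#%02x%02x%02x",
			lerp(from.R, to.R, t), lerp(from.G, to.G, t), lerp(from.B, to.B, t))
	}
	return out
}

func main() {
	dark, err := parseHex("#0e4c5b")
	if err != nil {
		panic(err)
	}
	light, err := parseHex("#5dbdb7")
	if err != nil {
		panic(err)
	}
	// Darkest shade first, so the largest slice gets the strongest color.
	for _, c := range ramp(dark, light, 5) {
		fmt.Println(c)
	}
}
```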

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fe.devLines[email] += cf.Additions + cf.Deletions ran on every
commit_file row, including pure renames where the numstat carries
0/0. That +=0 created a zero-valued map entry which then survived as
a "dev touched this file" signal into every downstream consumer:
BusFactor, FileHotspots.UniqueDevs, DeveloperNetwork pairs, ChurnRisk
bus factor, and — the reported symptom — DevProfile.Scope /
Extensions Pct math. A dev with one real .go edit and one pure .md
rename was reporting .go 50% + .md 50% even though they never
authored a line of .md.

Fix the write site: only touch fe.devLines when lines > 0. devCommits
still increments unconditionally so the distinct "appeared on this
file" signal remains available for any caller that wants it (none do
today). devLines now cleanly means "lines this dev contributed",
which is the semantic the METRICS.md Scope/Extensions doc already
advertised but the code didn't enforce.
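
The guard at the write site amounts to something like the sketch below. The `fileEntry` and `commitFile` types here are simplified, hypothetical stand-ins (only the `devLines` and `devCommits` names and the `Additions`/`Deletions` fields come from the message above), and `record` is an illustrative method name.

```go
package main

import "fmt"

// commitFile is a simplified stand-in for one commit_file row.
type commitFile struct {
	Path      string
	Additions int
	Deletions int
}

// fileEntry is a simplified stand-in for the per-file accumulator.
type fileEntry struct {
	devLines   map[string]int // lines each dev contributed to this file
	devCommits map[string]int // commits in which each dev appeared on this file
}

func newFileEntry() *fileEntry {
	return &fileEntry{devLines: map[string]int{}, devCommits: map[string]int{}}
}

// record applies one commit_file row for the given dev.
func (fe *fileEntry) record(email string, cf commitFile) {
	// The "appeared on this file" signal is kept unconditionally.
	fe.devCommits[email]++

	// Only create or update a devLines entry when lines were written.
	// A pure rename carries 0/0 in numstat, and `+= 0` would otherwise
	// leave a zero-valued key that downstream code reads as authorship.
	if lines := cf.Additions + cf.Deletions; lines > 0 {
		fe.devLines[email] += lines
	}
}

func main() {
	goFile, mdFile := newFileEntry(), newFileEntry()
	goFile.record("alice@example.com", commitFile{"main.go", 10, 2})
	mdFile.record("alice@example.com", commitFile{"README.md", 0, 0}) // pure rename

	fmt.Println("authors of main.go:", len(goFile.devLines))
	fmt.Println("authors of README.md:", len(mdFile.devLines))
	fmt.Println("commits on README.md:", mdFile.devCommits["alice@example.com"])
}
```

In this sketch the renamed file ends up with no `devLines` key for Alice, so any consumer that counts keys no longer sees her as an author of it.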

Integration test uses streamLoad on a synthetic JSONL with an M
commit and an R100 rename: Alice's profile surfaces only the .go
bucket at 100%, and the renamed .md file reports 0 unique devs from
FileHotspots. Existing tests pass unchanged because no test fixture
relied on the zero-entry behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lex0c (Owner, Author) commented Apr 20, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. What shall we delve into next?

@lex0c lex0c merged commit 39a1a5f into main Apr 20, 2026
1 of 2 checks passed
@lex0c lex0c deleted the fix/profile-pct-denominator branch April 20, 2026 02:41