feat(ci): add compare workflow for PR code quality diff #9
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Quality: PR Comparison
13 files compared (5 added, 8 modified)
Chart "Code Health After PR" (score 0 to 100): Readability 100, Complexity 17, Structure 69, Duplication 57, Naming 90, Magic Numbers 81
File changes: no changes, all metrics stable. No metric data available.
Aggregate Metrics
🟡 Code Health: B+ (78/100)
Chart "Code Health Scores" (score 0 to 100): Readability 100, Complexity 32, Structure 84, Duplication 72, Naming 94, Magic Numbers 90
🟢 Readability: A (100/100)
Codebase averages: flesch_adapted=103.06, fog_adapted=2.31, avg_tokens_per_line=3.93, avg_line_length=34.30
Worst Offenders
🔴 Complexity: D- (32/100)
Codebase averages: difficulty=36.52, effort=297806.01, volume=4860.41, estimated_bugs=1.62
Worst Offenders
🟡 Structure: B+ (84/100)
Codebase averages: branching_density=0.14, mean_depth=4.14, avg_function_lines=11.96, max_depth=9.67, max_function_lines=26.20, variance=5.99, avg_param_count=1.12, max_param_count=2.02
Worst Offenders
🟡 Duplication: B (72/100)
Codebase averages: redundancy=0.56, bigram_repetition_rate=0.20, trigram_repetition_rate=0.10
Worst Offenders
🟢 Naming: A (94/100)
Codebase averages: entropy=1.28, mean=6.33, variance=14.48, avg_sub_words_per_id=1.18
Worst Offenders
🟢 Magic Numbers: A- (90/100)
Codebase averages: density=0.02
Worst Offenders
feat(structure): add branching density and function metrics

Extends the structure health category with five new metrics beyond indentation depth:
- branching_density: control-flow keywords / non-blank lines, a language-agnostic proxy for cyclomatic complexity
- avg_function_lines / max_function_lines: estimated by detecting function keywords at the start of a line; catches god-function smells
- avg_param_count / max_param_count: comma count in function signatures; high counts signal missing abstractions

Covers Python, Ruby, JavaScript, Elixir, C#, Java, C++, Go, Rust, PHP, Swift, Shell, and Kotlin via keyword patterns and an access-modifier regex for C#/Java.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
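A minimal Python sketch of the three heuristics described above, for illustration only. The keyword sets and the access-modifier regex are small hypothetical subsets, not the PR's actual lists, and a function is assumed to run from its start line to the next function start (or end of file), blank lines included; the real implementation may bound functions differently.

```python
import re

# Hypothetical, abbreviated keyword sets; the PR's real lists cover 13 languages.
BRANCHING_KEYWORDS = {"if", "elif", "else", "for", "while", "case", "when", "catch", "rescue"}
FUNC_KEYWORDS = {"def", "function", "func", "fn", "fun"}
# C#/Java-style methods: an access modifier, then "(" before any "=" or ";".
ACCESS_MODIFIER_FN = re.compile(r"^\s*(?:public|private|protected|internal)\b[^=;]*\(")
WORD = re.compile(r"[A-Za-z_]\w*")


def branching_density(source: str) -> float:
    """Control-flow keyword occurrences divided by non-blank lines."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    hits = sum(w in BRANCHING_KEYWORDS for ln in lines for w in WORD.findall(ln))
    return hits / len(lines)


def is_function_start(line: str) -> bool:
    """True if the line starts with a function keyword or an access-modified signature."""
    parts = line.split(maxsplit=1)
    return bool(parts) and (parts[0] in FUNC_KEYWORDS or bool(ACCESS_MODIFIER_FN.match(line)))


def param_count(signature: str) -> int:
    """Commas inside the first (...) of a signature, plus one if it is non-empty."""
    match = re.search(r"\(([^)]*)\)", signature)
    if not match or not match.group(1).strip():
        return 0
    return match.group(1).count(",") + 1


def function_metrics(source: str) -> dict:
    """Average/maximum function length and parameter count.

    Assumption: a function spans from its start line to the next start (or EOF).
    """
    lines = source.splitlines()
    starts = [i for i, ln in enumerate(lines) if is_function_start(ln)]
    if not starts:
        return {"avg_function_lines": 0.0, "max_function_lines": 0,
                "avg_param_count": 0.0, "max_param_count": 0}
    bounds = starts + [len(lines)]
    lengths = [bounds[k + 1] - bounds[k] for k in range(len(starts))]
    params = [param_count(lines[i]) for i in starts]
    return {
        "avg_function_lines": sum(lengths) / len(lengths),
        "max_function_lines": max(lengths),
        "avg_param_count": sum(params) / len(params),
        "max_param_count": max(params),
    }
```

Because detection only looks at the first token of a line (or the access-modifier pattern), it stays language-agnostic at the cost of precision: nested or one-line functions are not distinguished.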
chore(keywords): expand keyword coverage to 13 languages

Updates the keyword blocklist in Pipeline and the operator keyword set in Halstead to cover Python, Ruby, JavaScript, Elixir, C#, Java, C++, Go, Rust, PHP, Swift, Shell, and Kotlin.

Previously the list was Python-focused with a mix of Rust/Go words that leaked into identifier counts for the five primary target languages. Keywords are now organised by language with clear comments, and language-specific words (synchronized, defer, suspend, willSet, etc.) no longer pollute identifier or operand analysis.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
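Why keyword coverage matters for Halstead analysis can be sketched in Python: keywords count as operators, and anything else that looks like an identifier or literal counts as an operand, so a missing keyword (say Go's `defer`) silently inflates operand and identifier counts. The sketch uses the classic Halstead formulas (V = N·log2 n, D = (n1/2)·(N2/n2), E = D·V, B = V/3000); the tokenizer and the per-language keyword subsets are hypothetical simplifications, not the module's actual code.

```python
import math
import re

# Hypothetical per-language keyword subsets; the PR's real lists are far larger.
KEYWORDS_BY_LANGUAGE = {
    "python": {"def", "return", "if", "else", "for", "in"},
    "java": {"public", "static", "synchronized", "return", "if", "else"},
    "go": {"func", "defer", "return", "if", "for"},
    "kotlin": {"fun", "suspend", "val", "return"},
    "swift": {"func", "var", "willSet", "return"},
}
KEYWORDS = set().union(*KEYWORDS_BY_LANGUAGE.values())

# Words, numbers, or any single non-space punctuation character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|[^\sA-Za-z0-9_]")


def is_word(tok: str) -> bool:
    return tok[0].isalpha() or tok[0] == "_"


def identifiers(source: str) -> list:
    """Identifier tokens with keywords filtered out (the blocklist's job)."""
    return [t for t in TOKEN.findall(source) if is_word(t) and t not in KEYWORDS]


def halstead(source: str) -> dict:
    operators, operands = [], []
    for tok in TOKEN.findall(source):
        if tok in KEYWORDS or not (is_word(tok) or tok[0].isdigit()):
            operators.append(tok)  # keywords and punctuation
        else:
            operands.append(tok)  # identifiers and numeric literals
    n1, n2 = len(set(operators)), len(set(operands))
    N1, N2 = len(operators), len(operands)
    vocabulary, length = n1 + n2, N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary > 1 else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
    return {
        "volume": volume,
        "difficulty": difficulty,
        "effort": difficulty * volume,
        "estimated_bugs": volume / 3000,
    }
```

Removing `defer` from the Go subset would make `identifiers("defer close(f)")` return `["defer", "close", "f"]`, which is the kind of leak the commit describes.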
test(metrics): add auto-generated keyword coverage tests

Tests for Branching and FunctionMetrics are generated directly from the modules' public keyword lists, so adding a keyword automatically produces a new test case on the next run.
- branching_keywords/0 exposes the MapSet for test generation
- func_keywords/0 and access_modifiers/0 expose the function-detection keyword list and the C#/Java access-modifier list

54 tests total: 31 per-keyword branching tests, 10 func-keyword tests, 4 access-modifier tests, and 9 hand-written behaviour tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
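The generate-tests-from-the-keyword-list idea translates to Python's `unittest` roughly as follows. `branching_keywords()` and `count_branches()` are hypothetical stand-ins for the Elixir accessors named in the commit; the point is only that each keyword yields its own named test, so extending the set extends the suite.

```python
import unittest


# Hypothetical stand-in for a public accessor such as branching_keywords/0.
def branching_keywords() -> frozenset:
    return frozenset({"if", "else", "while", "case", "rescue"})


def count_branches(line: str) -> int:
    """Naive counter: whitespace-separated words that are branching keywords."""
    return sum(word in branching_keywords() for word in line.split())


class BranchingKeywordTests(unittest.TestCase):
    """Populated below: one test method per keyword."""


def _make_keyword_test(keyword: str):
    # The factory binds `keyword` now, avoiding the late-binding closure pitfall in loops.
    def test(self):
        self.assertEqual(count_branches(f"{keyword} x"), 1)
    return test


for _kw in sorted(branching_keywords()):
    setattr(BranchingKeywordTests, f"test_keyword_{_kw}", _make_keyword_test(_kw))
```

Adding `"when"` to the set would produce a `test_keyword_when` method on the next run without any change to the test class.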
chore(structure): tighten structure category thresholds

Based on observed codebase averages, the A thresholds were too generous: most metrics scored A despite averages that indicate room for improvement.

Changes (old A threshold → new A threshold):
- branching_density: 0.10 → 0.08
- mean_depth: 4 → 3.5
- avg_function_lines: 10 → 8
- max_depth: 12 → 8
- max_function_lines: 25 → 20
- variance: 10 → 7

B/C/D thresholds were tightened proportionally. avg_param_count and max_param_count are unchanged, since their codebase averages are already well within A.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
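One possible reading of "B/C/D thresholds tightened proportionally" is that every band is scaled by the same ratio as the A band (new A / old A). The Python sketch below assumes that reading, treats lower values as better, and uses hypothetical old B/C/D bands; only the A values come from the commit. The project's grader also reports graded scores such as B+ (84/100) rather than the bare letters returned here.

```python
# Old -> new A thresholds from the commit message (lower values assumed better).
A_THRESHOLDS = {
    "branching_density": (0.10, 0.08),
    "mean_depth": (4, 3.5),
    "avg_function_lines": (10, 8),
    "max_depth": (12, 8),
    "max_function_lines": (25, 20),
    "variance": (10, 7),
}


def tighten(old_bands: list, new_a: float) -> list:
    """Scale every band (A, B, C, D) by the same ratio as the A band."""
    old_a = old_bands[0]
    return [band * new_a / old_a for band in old_bands]


def grade(value: float, bands: list) -> str:
    """Letter for a lower-is-better metric; anything above the D band is F."""
    for letter, limit in zip("ABCD", bands):
        if value <= limit:
            return letter
    return "F"


# Hypothetical old A/B/C/D bands for avg_function_lines; only A = 10 is from the PR.
old_bands = [10, 15, 20, 30]
new_bands = tighten(old_bands, A_THRESHOLDS["avg_function_lines"][1])
```

Under this reading, a file averaging 9 lines per function drops from A (old bands) to B (new bands [8.0, 12.0, 16.0, 24.0]).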
* feat(ci): add compare workflow for PR code quality diff
* refactor(grader): use pattern matching in score_metric/2
* fix(formatter): guard against nil files in format_markdown/2
* feat(compare): add github format with mermaid chart and progress bars
* feat(structure): add branching density and function metrics
* chore(keywords): expand keyword coverage to 13 languages
* test(metrics): add auto-generated keyword coverage tests
* feat(compare): add 🟢/🔴 direction emoji to aggregate metric deltas
* chore(structure): tighten structure category thresholds

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- Adds `.github/workflows/compare.yml` to run `codeqa compare` on every PR
- Resolves the base commit with `git merge-base HEAD <base.sha>` and compares it against PR HEAD

Test plan