fix(julia): verify compare-output targets and improve author recovery#736

Merged
mstykow merged 5 commits into main from verify/julia-parser
Apr 19, 2026

mstykow commented Apr 19, 2026

Summary

  • Verify scorecard item 43 against compare-outputs on JuliaLang/Pkg.jl, JuliaLang/julia, and JuliaPlots/Plots.jl, then record the final Julia benchmark snapshots in docs/BENCHMARKS.md and mark the scorecard row verified.
  • Support legacy singular Julia author metadata in Project.toml parsing and broaden author detection for Julia-flavored README/TOML patterns such as Created by, Original author, package-qualified Primary ... author, and inline author rosters with (@handle) suffixes.
  • Tighten the follow-on author heuristics so code-shaped TOML snippets and generator banners stay quiet, and normalize original-author rosters into distinct people when the clean component detections exist.
  • Saved compare artifacts under .provenant/compare-runs/20260419T082612Z-Pkg.jl-15780, .provenant/compare-runs/20260419T082612Z-julia-15784, and .provenant/compare-runs/20260419T082007Z-Plots.jl-7256.

Issues

  • Covers: parser verification scorecard item 43 (Julia)

Scope and exclusions

  • Included: targeted Julia parser metadata support, shared author-detection improvements needed by the Julia compare targets, refreshed compare-output snapshots, benchmark rows, the scorecard status update, and the minimal copyright-golden follow-up required by the shared detector changes.
  • Explicit exclusions: no broad scanner refactors outside the Julia-driven author heuristics, and no target-specific benchmark-only tuning.

Intentional differences from Python

  • Julia remains a Provenant-only parser family with no Python ScanCode parser reference, so verification here is based on common-profile repository comparisons and file-level triage rather than parser-to-parser parity.

Follow-up work

  • Created or intentionally deferred: none.

Expected-output fixture changes

  • Files changed: testdata/copyright-golden/copyrights/misco4/linux-copyrights/drivers/char/dtlk.c.yml, testdata/copyright-golden/copyrights/misco4/linux-copyrights/drivers/crypto/nx/nx-842.c.yml
  • Why the new expected output is correct: the final detector keeps Original author: detections as distinct people instead of merged tails, so dtlk.c now preserves clean Chris Pallotta plus Jim Van Zandt, and nx-842.c preserves separate Robert Jennings and Seth Jennings. The surrounding heuristics were tightened so unrelated code-shaped authors = ... snippets and generator banners stay quiet, and the shared suites now pass locally with cargo test --features golden-tests copyright::golden_test::tests::test_golden_authors plus cargo test --features golden-tests copyright::golden_test::tests::test_golden_copyrights.
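
The "distinct people instead of merged tails" rule can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the detector's actual code: the function name `collapse_merged_authors` is hypothetical, and the merged input strings are invented to show the behavior rather than copied from the fixtures.

```python
def collapse_merged_authors(detections: list[str]) -> list[str]:
    """Drop a detection that is only a concatenation of cleaner ones.

    A candidate is treated as a stale merged string when at least two
    other detections occur inside it and nothing is left over except
    separators (commas or the word "and"). Otherwise it is kept, so a
    merged string survives when the clean components were not detected.
    """
    kept: list[str] = []
    for candidate in detections:
        components = [
            d for d in detections if d != candidate and d in candidate
        ]
        remainder = candidate
        for part in components:
            remainder = remainder.replace(part, " ")
        tokens = remainder.replace(",", " ").split()
        leftover = [t for t in tokens if t.lower() != "and"]
        is_merged = len(components) >= 2 and not leftover
        if not is_merged and candidate not in kept:
            kept.append(candidate)
    return kept


# Hypothetical merged detection alongside the clean components.
print(collapse_merged_authors(
    ["Robert Jennings Seth Jennings", "Robert Jennings", "Seth Jennings"]
))
# With no clean components available, the merged string is left alone.
print(collapse_merged_authors(["Robert Jennings Seth Jennings"]))
```

Collapsing only when the clean components exist keeps the rule conservative: it never invents a split that the detector did not already find on its own.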

mstykow and others added 5 commits April 19, 2026 10:33
Handle legacy Julia Project.toml author fields so package parties stay populated on real-world repos like Plots and RecipesBase, with focused parser coverage to keep the compatibility path durable.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Recognize Julia-flavored author metadata patterns such as singular TOML author assignments, package-specific primary/original author labels, and inline rosters so common-profile author detection stays aligned on real repository manifests and README ownership lines.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Capture the final Julia compare-output evidence in the scorecard and benchmark reference so the verified row points at reproducible repository snapshots, timings, and end-state advantages instead of intermediate local notes.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Restore the intended boundaries of the new Julia-driven author extraction so generic code-shaped authors = ... snippets and generator banners stay quiet, while preserving the richer original-author detections now reflected in the owning copyright goldens.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Preserve the Julia-driven author heuristics while collapsing stale merged original-author strings into distinct people only where the clean component detections exist, and update the owning Linux copyright goldens to the cleaner end state.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@mstykow mstykow merged commit 8b86b71 into main Apr 19, 2026
14 checks passed
@mstykow mstykow deleted the verify/julia-parser branch April 19, 2026 09:42