fix(julia): verify compare-output targets and improve author recovery by mstykow · Pull Request #736 · mstykow/provenant

mstykow · 2026-04-19T08:35:51Z

Summary

Verify scorecard item 43 against compare-output runs on JuliaLang/Pkg.jl, JuliaLang/julia, and JuliaPlots/Plots.jl, then record the final Julia benchmark snapshots in docs/BENCHMARKS.md and mark the scorecard row as verified.
Support the legacy singular Julia author field in Project.toml parsing, and broaden author detection for Julia-flavored README and TOML patterns such as "Created by", "Original author", package-qualified "Primary ... author" labels, and inline author rosters with (@handle) suffixes.
Tighten the follow-on author heuristics so code-shaped TOML snippets and generator banners are not reported as authors, and normalize merged original-author rosters into distinct people when clean component detections exist.
Saved compare artifacts under .provenant/compare-runs/20260419T082612Z-Pkg.jl-15780, .provenant/compare-runs/20260419T082612Z-julia-15784, and .provenant/compare-runs/20260419T082007Z-Plots.jl-7256.

Issues

Covers: parser verification scorecard item 43 (Julia)

Scope and exclusions

Included: targeted Julia parser metadata support, shared author-detection improvements needed by the Julia compare targets, refreshed compare-output snapshots, benchmark rows, the scorecard status update, and the minimal copyright-golden follow-up required by the shared detector changes.
Explicit exclusions: no broad scanner refactors outside the Julia-driven author heuristics, and no target-specific benchmark-only tuning.

Intentional differences from Python

Julia remains a Provenant-only parser family with no Python ScanCode parser reference, so verification here is based on common-profile repository comparisons and file-level triage rather than parser-to-parser parity.

Follow-up work

Created or intentionally deferred: none.

Expected-output fixture changes

Files changed: testdata/copyright-golden/copyrights/misco4/linux-copyrights/drivers/char/dtlk.c.yml, testdata/copyright-golden/copyrights/misco4/linux-copyrights/drivers/crypto/nx/nx-842.c.yml
Why the new expected output is correct: the final detector keeps "Original author:" detections as distinct people instead of merged tails, so dtlk.c now yields clean, separate Chris Pallotta and Jim Van Zandt entries, and nx-842.c yields separate Robert Jennings and Seth Jennings entries. The surrounding heuristics were tightened so that unrelated code-shaped authors = ... snippets and generator banners stay quiet, and the shared suites now pass locally with cargo test --features golden-tests copyright::golden_test::tests::test_golden_authors and cargo test --features golden-tests copyright::golden_test::tests::test_golden_copyrights.
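The collapse rule described above can be sketched as a post-pass over one file's author detections. This is a hedged illustration, not Provenant's implementation: it assumes detections arrive as plain strings and that a merged entry counts as stale only when two or more clean component names plus glue ("and", commas, "&") cover it completely. The function name split_merged_authors is hypothetical, and the names in the test are placeholders rather than the real golden contents.

```rust
/// Hypothetical post-pass: drop a merged author string when it is fully
/// covered by two or more clean, separately detected names plus glue words.
/// Merged strings without clean components are kept unchanged.
fn split_merged_authors(detections: &[String]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for det in detections {
        // Clean components: other detections that appear inside this one.
        let mut parts: Vec<&String> = detections
            .iter()
            .filter(|other| *other != det && det.contains(other.as_str()))
            .collect();
        // Remove longer names first so a short name cannot split a long one.
        parts.sort_by_key(|p| std::cmp::Reverse(p.len()));
        let mut rest = det.clone();
        for p in &parts {
            rest = rest.replacen(p.as_str(), "", 1);
        }
        // Whatever is left must be separators only (commas, "&", "and").
        let only_glue = rest
            .split(|c: char| c == ',' || c == '&' || c.is_whitespace())
            .all(|w| w.is_empty() || w.eq_ignore_ascii_case("and"));
        let is_merged = parts.len() >= 2 && only_glue;
        if !is_merged && !out.contains(det) {
            out.push(det.clone());
        }
    }
    out
}
```

The "only where clean components exist" condition is what keeps the rule conservative: a merged string with just one matching component, or with unexplained leftover words, is left alone.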

Handle legacy Julia Project.toml author fields so package parties stay populated on real-world repos like Plots and RecipesBase, with focused parser coverage to keep the compatibility path durable.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
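A minimal sketch of that compatibility path, assuming the legacy form is a top-level author = "..." string alongside the current authors = [...] array. The names project_toml_authors and quoted_strings are hypothetical, not Provenant's API, and the line-based scan only stands in for a real TOML parser: it ignores multi-line arrays, single-quoted literals, and escaped quotes.

```rust
/// Hypothetical helper: collect author strings from a Julia Project.toml,
/// accepting both `authors = [...]` and the legacy singular `author = "..."`.
fn project_toml_authors(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        // Authors live in the top-level table; stop at the first [section].
        if line.starts_with('[') {
            break;
        }
        let Some((key, value)) = line.split_once('=') else { continue };
        let key = key.trim();
        if key == "authors" || key == "author" {
            out.extend(quoted_strings(value));
        }
    }
    out
}

/// Extract every double-quoted literal from a value such as
/// `"Jane Doe <jane@example.org>"` or `["Jane Doe", "John Roe"]`.
fn quoted_strings(value: &str) -> Vec<String> {
    value
        .split('"')
        .enumerate()
        .filter(|(i, _)| i % 2 == 1) // odd segments sit inside quotes
        .map(|(_, s)| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}
```

Treating both keys through the same literal extractor means downstream party construction does not need to know which form a given manifest used.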

Recognize Julia-flavored author metadata patterns such as singular TOML author assignments, package-specific primary/original author labels, and inline rosters so common-profile author detection stays aligned on real repository manifests and README ownership lines.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
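The label and inline-roster handling can be sketched as follows. This illustration rests on assumptions: the label list, the comma and " and " roster separators, and the helper name authors_from_line are all hypothetical. The package-qualified "Primary ... author" form is left out because its exact shape is not spelled out here.

```rust
/// Hypothetical extraction for README ownership lines such as
/// "Created by Jane Doe (@jdoe) and John Roe (@jroe)."
fn authors_from_line(line: &str) -> Vec<String> {
    const LABELS: [&str; 3] = ["created by", "original author:", "original authors:"];
    // ASCII lowercasing keeps byte offsets identical to `line`.
    let lower = line.to_ascii_lowercase();
    let Some(rest) = LABELS
        .iter()
        .find_map(|label| lower.find(*label).map(|i| &line[i + label.len()..]))
    else {
        return Vec::new();
    };
    rest.split(',')
        .flat_map(|chunk| chunk.split(" and "))
        .map(|name| {
            // Trim whitespace and a sentence-ending period first.
            let name = name.trim().trim_end_matches('.').trim();
            // Then drop a trailing GitHub-style "(@handle)" suffix.
            match name.rfind("(@") {
                Some(i) if name.ends_with(')') => name[..i].trim(),
                _ => name,
            }
        })
        .filter(|n| !n.is_empty())
        .map(str::to_string)
        .collect()
}
```

Stripping the handle rather than keeping it means each roster entry reduces to a plain person name that can be compared against other detections.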

Capture the final Julia compare-output evidence in the scorecard and benchmark reference so the verified row points at reproducible repository snapshots, timings, and end-state advantages instead of intermediate local notes.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

Restore the intended boundaries of the new Julia-driven author extraction so generic code-shaped snippets and generator banners stay quiet, while preserving the richer original-author detections now reflected in the owning copyright goldens.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
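One way to keep code-shaped snippets and generator banners quiet is to accept an author assignment only when its value is a plain string literal or an array of them, and to skip lines carrying common generator markers. The sketch below assumes that rule. The helper names and the marker list are hypothetical, not taken from Provenant, and names containing commas inside an array are conservatively rejected.

```rust
/// True when `s` is a single double-quoted literal without Julia
/// interpolation ("$(name)" is code, not a person).
fn is_quoted_literal(s: &str) -> bool {
    let s = s.trim();
    s.len() >= 2
        && s.starts_with('"')
        && s.ends_with('"')
        && !s[1..s.len() - 1].contains('"')
        && !s.contains('$')
}

/// Hypothetical guard for the right-hand side of `author = ...` or
/// `authors = ...`: only a literal or an array of literals passes.
fn looks_like_literal_author_value(rhs: &str) -> bool {
    let rhs = rhs.trim();
    match rhs.strip_prefix('[').and_then(|r| r.strip_suffix(']')) {
        Some(inner) => {
            let items: Vec<&str> = inner
                .split(',')
                .map(str::trim)
                .filter(|s| !s.is_empty())
                .collect();
            !items.is_empty() && items.iter().all(|item| is_quoted_literal(item))
        }
        None => is_quoted_literal(rhs),
    }
}

/// Hypothetical banner filter; the marker list is an assumption.
fn is_generator_banner(line: &str) -> bool {
    let lower = line.to_lowercase();
    ["generated by", "auto-generated", "autogenerated", "do not edit"]
        .iter()
        .any(|marker| lower.contains(*marker))
}
```

Rejecting anything that is not a literal errs toward silence, which matches the goal of keeping unrelated code quiet at the cost of occasionally missing an unusual but genuine author value.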

Preserve the Julia-driven author heuristics while collapsing stale merged original-author strings into distinct people only where the clean component detections exist, and update the owning Linux copyright goldens to the cleaner end state.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

mstykow and others added 5 commits April 19, 2026 10:33

mstykow merged commit 8b86b71 into main Apr 19, 2026
14 checks passed

mstykow deleted the verify/julia-parser branch April 19, 2026 09:42
