fix(copyright): recover eigen benchmark author parity#816

Merged
mstykow merged 10 commits into main from verify/eigen-benchmark
Apr 29, 2026

Owner

@mstykow mstykow commented Apr 29, 2026

Summary

  • keep comment-led Copyright (c) ... by ... lines on the shared holder-extraction path, stop treating Xerox Corporation as junk holder text, and tighten follow-up holder recovery so prose-only copyright notice of ... spans do not leak holders while real names like 42North Inc. are preserved
  • rerun compare-outputs for ValveSoftware/eigen @ e9c43151265207fd3366bba21cddd61141ff402c with the maintained common profile and the same shared ScanCode cache identity
  • refresh the eigen benchmark checkpoint and chart to the latest end-state snapshot (eigen-43257), which keeps 0 package deltas, 0 dependency deltas, and 42 vs 42 top-level license detections, then sync the small set of copyright-golden expectations whose cleaner holder/copyright output is now intentionally preserved
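The holder behavior described in the first bullet can be illustrated with a minimal sketch. This is not the code in this PR: `derive_holder` and `is_year_token` are hypothetical names, and the rules (strip a comment leader, an optional `(c)` marker, year tokens, and an optional `by`; refuse prose that begins `copyright notice`) are simplified assumptions about the intended outcome.

```rust
/// Hypothetical helper: true for tokens such as "2008", "2008-2010" or "2008,".
fn is_year_token(tok: &str) -> bool {
    let tok = tok.trim_end_matches(',');
    tok.split('-')
        .all(|part| part.len() == 4 && part.chars().all(|c| c.is_ascii_digit()))
}

/// Hypothetical sketch of deriving a holder from a simple copyright line.
/// Assumed rules only; the real detector is considerably more involved.
fn derive_holder(line: &str) -> Option<String> {
    // Drop a leading comment marker so comment-led lines take the same path.
    let text = line.trim_start_matches(|c: char| c.is_whitespace() || matches!(c, '/' | '*' | '#'));
    if !text.to_ascii_lowercase().starts_with("copyright") {
        return None;
    }
    let mut rest = text["copyright".len()..].trim_start();

    // Prose such as "copyright notice of ..." is not a statement with a holder.
    if rest.to_ascii_lowercase().starts_with("notice") {
        return None;
    }

    // Optional "(c)" marker, matched case-insensitively.
    if rest.get(..3).map_or(false, |m| m.eq_ignore_ascii_case("(c)")) {
        rest = rest[3..].trim_start();
    }

    let tokens: Vec<&str> = rest.split_whitespace().collect();
    let mut i = 0;
    // Skip year tokens; "42North" is not a year, so leading digits survive.
    while i < tokens.len() && is_year_token(tokens[i]) {
        i += 1;
    }
    // Strip an optional "by" prefix in front of the holder.
    if i < tokens.len() && tokens[i].eq_ignore_ascii_case("by") {
        i += 1;
    }

    let joined = tokens[i..].join(" ");
    let holder = joined.trim_end_matches(|c: char| c == ',' || c == ';');
    if holder.is_empty() {
        None
    } else {
        Some(holder.to_string())
    }
}
```

Under these assumptions, `derive_holder("// Copyright (c) 2008-2010 by Xerox Corporation")` yields `Xerox Corporation`, `Copyright 2019 42North Inc.` keeps its digit-led name, and `copyright notice of Xerox Corporation` yields no holder.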

Issues

  • Covers: benchmark verification for ValveSoftware/eigen
  • Closes:

Scope and exclusions

  • Included:
    • src/copyright/detector/pattern_extract/extraction/groups.rs
    • src/copyright/detector/postprocess_transforms/year_repairs.rs
    • src/copyright/detector/postprocess_transforms_test.rs
    • src/copyright/detector/tests_copyright_holder_pipeline.rs
    • src/copyright/detector/tests_false_positives.rs
    • src/copyright/detector/token_utils/filters.rs
    • src/copyright/refiner/holders_junk_patterns.rs
    • src/copyright/refiner/tests.rs
    • docs/BENCHMARKS.md
    • docs/benchmarks/scan-duration-vs-files.svg
    • targeted copyright golden expectations under testdata/copyright-golden/copyrights/
  • Explicit exclusions:
    • no attempt to match ScanCode's weak Distributed holder overcapture in bench/eig33.cpp

Intentional differences from Python

  • No intentional Python-compatibility divergence was introduced. The remaining bench/eig33.cpp holder delta is a weak license-prose overcapture (Distributed) that Provenant intentionally does not model as a holder.

Follow-up work

  • Created or intentionally deferred:
    • none for this PR beyond any future review of unrelated legacy acknowledgment-shaped author/copyright differences surfaced by the eigen compare run

Expected-output fixture changes

  • Files changed:
    • testdata/copyright-golden/copyrights/misco2/regexhq/regexhq-115.txt.yml
    • testdata/copyright-golden/copyrights/misco2/regexhq/regexhq-204.txt.yml
    • testdata/copyright-golden/copyrights/libopenthreads12-libopenthreads.copyright.yml
    • testdata/copyright-golden/copyrights/wxWindows_Library_.0_variant.yml
  • Why the new expected output is correct:
    • markdown-link and ellipsis-tailed holder/copyright cleanup now preserves the cleaner short form instead of stale dangling ( or ... suffixes
    • follow-up false-positive fixes keep prose-only copyright notice of ... text from leaking holders and preserve real derived names like 42North Inc.
    • validated with:
      • cargo test test_refine_holder_keeps_xerox_corporation
      • cargo test test_derive_holder_from_simple_copyright_string_strips_by_prefix
      • cargo test test_derive_holder_from_simple_copyright_string_keeps_leading_digits
      • cargo test test_comment_led_by_keyword_holder_backfilled
      • cargo test test_copyright_notice_of_prose_does_not_emit_xerox_holder
      • cargo test --features golden-tests --test copyright_golden test_fixture_misco3_not_real_copyrights -- --nocapture
      • cargo test --features golden-tests --test copyright_golden test_golden_copyrights -- --nocapture
      • cargo test --features golden-tests --test copyright_golden test_golden_holders -- --nocapture
      • cargo run --manifest-path xtask/Cargo.toml --bin compare-outputs -- --repo-url https://github.com/ValveSoftware/eigen.git --repo-ref e9c43151265207fd3366bba21cddd61141ff402c --profile common
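The suffix cleanup behind the fixture changes above (dropping a stale dangling `(` or `...`) might look roughly like the sketch below. `trim_dangling_suffix` is a hypothetical name, and the input shapes are assumed rather than taken from the golden fixtures.

```rust
/// Hypothetical sketch: remove a dangling "(" or ellipsis left at the end of a
/// holder or copyright string, for example after a markdown link was cut off.
/// A single trailing "." (as in "Inc.") is deliberately left alone.
fn trim_dangling_suffix(holder: &str) -> &str {
    let mut s = holder.trim_end();
    loop {
        let before = s;
        // ASCII ellipsis.
        if let Some(t) = s.strip_suffix("...") {
            s = t.trim_end();
        }
        // Unicode ellipsis.
        if let Some(t) = s.strip_suffix('…') {
            s = t.trim_end();
        }
        // Opening parenthesis with nothing after it.
        if let Some(t) = s.strip_suffix('(') {
            s = t.trim_end();
        }
        // Stop once a full pass changes nothing.
        if s == before {
            break;
        }
    }
    s
}
```

The loop repeats until nothing changes so that mixed tails such as `( ...` are removed in one call, while ordinary names pass through unchanged.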

mstykow added 4 commits April 29, 2026 12:47
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
@mstykow mstykow force-pushed the verify/eigen-benchmark branch from 84dd53a to c743074 April 29, 2026 11:10
mstykow and others added 3 commits April 29, 2026 14:30
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
@mstykow mstykow enabled auto-merge (rebase) April 29, 2026 12:33
mstykow and others added 3 commits April 29, 2026 15:04
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
@mstykow mstykow merged commit 5132db1 into main Apr 29, 2026
15 checks passed
@mstykow mstykow deleted the verify/eigen-benchmark branch April 29, 2026 13:13