fix(copyright): recover malformed copyright year ranges #815

Merged
mstykow merged 2 commits into main from fix/copyright-malformed-year-range on Apr 29, 2026

@mstykow (Owner) commented Apr 29, 2026

Summary

  • treat malformed copyright year ranges such as 20010-2011, as well as existing oversized-tail ranges such as 2010-20224, as year-like during detector gating instead of dropping the detection
  • route the detector and tree-walk year checks through the shared helper so that holder extraction still works when a malformed range is tokenized as a cardinal rather than a year
  • add unit and end-to-end regression tests for malformed first-year ranges, without changing how normal email-bearing copyright statements are handled
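The gating described in these bullets can be sketched in Rust as follows. The sketch assumes a lexer that tags each token as a year, a cardinal, or something else. Every name in it (`Kind`, `Token`, `is_year_shaped`, `is_year_like`, `has_year`) is hypothetical. The "four or five digits starting with 19 or 20" rule is also an illustrative assumption, not Provenant's actual heuristic.

```rust
/// Hypothetical token kinds; the real lexer's tags are not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Year,
    Cardinal,
    Other,
}

#[derive(Debug)]
struct Token<'a> {
    text: &'a str,
    kind: Kind,
}

/// Assumed heuristic: four or five ASCII digits starting with "19" or "20",
/// so one-digit typos like "20010" or "20224" still look like years.
fn is_year_shaped(part: &str) -> bool {
    (part.len() == 4 || part.len() == 5)
        && part.bytes().all(|b| b.is_ascii_digit())
        && (part.starts_with("19") || part.starts_with("20"))
}

/// A strict year: exactly four digits with the same prefix rule.
fn is_strict_year(part: &str) -> bool {
    part.len() == 4 && is_year_shaped(part)
}

/// Hypothetical shared helper: a single token must be a strict year, while a
/// range may have one malformed side as long as the other side is strict.
fn is_year_like(text: &str) -> bool {
    match text.split_once('-') {
        Some((start, end)) => {
            is_year_shaped(start)
                && is_year_shaped(end)
                && (is_strict_year(start) || is_strict_year(end))
        }
        None => is_strict_year(text),
    }
}

/// Gate that both the detector and the tree walk would call in this sketch:
/// a candidate has a year if any token is lexed as a Year, or is a Cardinal
/// whose text is still year-like (e.g. "20010-2011").
fn has_year(tokens: &[Token]) -> bool {
    tokens.iter().any(|t| match t.kind {
        Kind::Year => true,
        Kind::Cardinal => is_year_like(t.text),
        Kind::Other => false,
    })
}

fn main() {
    // "20010-2011" is lexed as a cardinal but still passes the year gate.
    let tokens = [
        Token { text: "Copyright", kind: Kind::Other },
        Token { text: "20010-2011", kind: Kind::Cardinal },
        Token { text: "Acme", kind: Kind::Other },
    ];
    println!("{}", has_year(&tokens)); // prints "true"
}
```

In this sketch a bare five-digit number is never year-like, and a range needs at least one strict four-digit side. As a result, only ranges with a single obvious typo are recovered. That cutoff is a design choice of the sketch, not something this PR specifies.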

Scope and exclusions

  • Included: year-like recognition in the copyright detector's token-utils, detector and tree-walk year gating, and targeted regression coverage
  • Explicit exclusions: lexer year grammar changes, golden fixture updates, and broader normalization of arbitrary long numeric ranges

Intentional differences from Python

  • Provenant now preserves malformed copyright ranges with obvious year-shape typos in more cases instead of requiring the first range token to be a strict four-digit year before extraction can proceed.

mstykow added 2 commits April 29, 2026 12:02
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
@mstykow mstykow enabled auto-merge (rebase) April 29, 2026 10:16
@mstykow mstykow merged commit eafcd79 into main Apr 29, 2026
15 checks passed
@mstykow mstykow deleted the fix/copyright-malformed-year-range branch April 29, 2026 10:19