
fix(copyright): stop directive absorption and support single-file compare targets #700

Merged
mstykow merged 2 commits into main from fix/copyright-directive-compare-target on Apr 15, 2026

mstykow (Owner) commented Apr 15, 2026

Summary

  • Stop copyright detection from absorbing trailing lowercase directive comment lines such as @lint-ignore-every, while preserving the full "Confidential and proprietary" notice text
  • Add focused detector/refiner regression tests, plus an anonymized copyright golden fixture covering the reported case and the @History: author follow-up case
  • Fix xtask compare-outputs for single-file --target-path runs by staging file inputs consistently, so that ScanCode and Provenant compare the same logical path
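
To illustrate the first bullet, here is a minimal sketch of treating a lowercase directive line as a boundary when grouping candidate lines. It is not Provenant's actual detector code: the function names `is_lowercase_directive_line` and `group_candidate`, the set of comment prefixes, and the rule for what counts as a directive name are all assumptions made for illustration.

```rust
/// Returns true when a comment line looks like a lowercase tool directive
/// such as `@lint-ignore-every`.
/// Hypothetical heuristic for illustration, not Provenant's implementation.
fn is_lowercase_directive_line(line: &str) -> bool {
    // Drop surrounding whitespace and common comment prefixes (assumed set).
    let body = line
        .trim()
        .trim_start_matches(|c: char| matches!(c, '/' | '*' | '#' | ';'))
        .trim_start();

    // A directive starts with `@`; anything else is ordinary text.
    let Some(rest) = body.strip_prefix('@') else {
        return false;
    };

    // The directive name is the first whitespace-delimited token after `@`.
    let name = rest.split_whitespace().next().unwrap_or("");
    !name.is_empty()
        && name.starts_with(|c: char| c.is_ascii_lowercase())
        && name
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-' || c == '_')
}

/// Collects the lines of one copyright candidate, stopping at the first
/// directive line so it is never appended to the notice text.
fn group_candidate<'a>(lines: &[&'a str]) -> Vec<&'a str> {
    lines
        .iter()
        .copied()
        .take_while(|&line| !is_lowercase_directive_line(line))
        .collect()
}
```

With a made-up two-line input, a notice line ending in "Confidential and proprietary." followed by a `// @lint-ignore-every ...` line, `group_candidate` would keep only the first line in full, which is the behavior the bullet describes.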

Issues

  • Covers: reported copyright false positive on trailing directive lines and the follow-up single-file compare-outputs failure
  • Closes:

Scope and exclusions

  • Included: copyright candidate grouping and same-line suffix handling, regression tests/golden coverage, and the xtask single-file compare workflow
  • Explicit exclusions: broader heuristic tightening for other theoretical multiline marker classes beyond the reproduced cases

Intentional differences from Python

  • For the reproduced two-line fixture, Provenant now drops the trailing directive line but preserves "Confidential and proprietary"; current ScanCode also avoids appending the directive, but truncates the notice earlier, at "Confidential"

Follow-up work

  • Created or intentionally deferred: broader investigation of other weak multiline marker classes (for example URL-follow-up lines or author-like prose) was deferred because this branch closes the reproduced bugs without new failing fixtures

Expected-output fixture changes

  • Files changed: testdata/copyright-golden/copyrights/copytest/anonymized_lint_directive_not_absorbed.txt and .yml
  • Why the new expected output is correct: the trailing @lint-ignore-every line is scanner metadata rather than copyright text, while the full first-line notice "Confidential and proprietary" should remain part of the extracted copyright

mstykow and others added 2 commits April 15, 2026 16:56
Preserve same-line confidentiality wording while treating lowercase scanner directives as non-copyright boundaries, and add focused regression coverage plus a new golden fixture for the reported case.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

Stage file targets as a synthetic input path so ScanCode and Provenant compare the same logical file, and invoke Provenant from the staged parent directory instead of treating a file path as cwd.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
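
The staging approach in the second commit can be sketched roughly as follows. This is an assumption-laden illustration rather than the actual xtask code: `stage_file_target` and `scan_cwd` are hypothetical names, and the real staging layout may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copies a single-file target into `staging_root` and returns the staged path,
/// so both scanners can be pointed at the same logical file.
/// Hypothetical helper; the real xtask naming and layout may differ.
fn stage_file_target(target: &Path, staging_root: &Path) -> io::Result<PathBuf> {
    let file_name = target.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target has no file name")
    })?;
    fs::create_dir_all(staging_root)?;
    let staged = staging_root.join(file_name);
    fs::copy(target, &staged)?;
    Ok(staged)
}

/// Picks the working directory for a scanner run: the input itself when it is
/// a directory, otherwise its parent, so a file path is never used as cwd.
fn scan_cwd(input: &Path) -> PathBuf {
    if input.is_dir() {
        return input.to_path_buf();
    }
    input
        .parent()
        // A bare file name has an empty parent; fall back to the current dir.
        .filter(|p| !p.as_os_str().is_empty())
        .map(Path::to_path_buf)
        .unwrap_or_else(|| PathBuf::from("."))
}
```

The resulting directory could then be passed to `std::process::Command::current_dir` when spawning the scanner, with the staged file name as the scan input.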
@mstykow mstykow enabled auto-merge (rebase) April 15, 2026 15:03
@mstykow mstykow merged commit e89fc1c into main Apr 15, 2026
14 checks passed
@mstykow mstykow deleted the fix/copyright-directive-compare-target branch April 15, 2026 15:14