fix(license): treat bare GPL1 as a clue #771

Merged
mstykow merged 1 commit into main from fix/gpl1-bare-word-clue on Apr 23, 2026
Conversation

mstykow (Owner) commented Apr 23, 2026

Summary

  • demote gpl1_bare_word_only.RULE to clue-only evidence so bare GPL1 tokens remain visible without surfacing as hard GPL-1.0 detections
  • keep the change isolated to the GPL1 bare-word rule family only; no adjacent GPL phrase rules are adjusted here
  • add one behavior-only regression test showing GPL1 is surfaced as a clue rather than a detected license expression
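The clue-versus-detection split described above can be sketched roughly as follows. This is a minimal illustration, not Provenant's actual code: `RuleKind`, `LicenseMatch`, `ScanOutput`, and `split_matches` are hypothetical names, and the `gpl-1.0` expression string is an assumption about how the match would be labeled.

```rust
/// Hypothetical rule kinds; only the clue / non-clue distinction matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuleKind {
    Text,
    Reference,
    Clue,
}

/// Hypothetical record for one rule hit in a scanned file.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LicenseMatch {
    rule_id: String,
    license_expression: String,
    kind: RuleKind,
}

/// Hypothetical scan output: asserted detections are kept apart from clues.
#[derive(Debug, Default, PartialEq, Eq)]
struct ScanOutput {
    detections: Vec<LicenseMatch>,
    clues: Vec<LicenseMatch>,
}

/// Route clue-only matches to `clues` so they stay inspectable
/// without contributing to the detected license expression.
fn split_matches(matches: Vec<LicenseMatch>) -> ScanOutput {
    let mut out = ScanOutput::default();
    for m in matches {
        if m.kind == RuleKind::Clue {
            out.clues.push(m);
        } else {
            out.detections.push(m);
        }
    }
    out
}
```

Under these assumptions, a match produced by a clue-kind `gpl1_bare_word_only.RULE` lands in `clues` and leaves `detections` empty, which is the behavior the regression test is described as checking.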

Issues

Scope and exclusions

  • Included:
    • new overlay for gpl1_bare_word_only.RULE
    • regenerated embedded license index artifact
    • one behavior-only engine test for GPL1 clue output
  • Explicit exclusions:
    • no changes to gpl_194.RULE or gpl-1.0-plus_200.RULE
    • no new configuration-coupled tests asserting overlay presence or rule-kind internals
    • no changes to existing golden fixtures or broader GPL false-positive policy
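As a rough mental model of the overlay listed under "Included", one can picture a per-rule override merged onto the upstream rule metadata. The sketch below rests on assumptions: `RuleMeta`, `RuleOverlay`, and `apply_overlay` are hypothetical names, and the real overlay file format and merge logic are not shown in this PR. The `is_license_clue` field name is borrowed from the upstream ScanCode rule flag.

```rust
/// Hypothetical upstream rule metadata; field names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RuleMeta {
    id: String,
    license_expression: String,
    is_license_clue: bool,
}

/// Hypothetical overlay: `None` leaves the upstream value untouched.
#[derive(Debug, Default)]
struct RuleOverlay {
    is_license_clue: Option<bool>,
}

/// Return a copy of `base` with any overlay fields applied on top.
fn apply_overlay(base: &RuleMeta, overlay: &RuleOverlay) -> RuleMeta {
    RuleMeta {
        is_license_clue: overlay.is_license_clue.unwrap_or(base.is_license_clue),
        ..base.clone()
    }
}
```

In this model, demoting one rule means shipping an overlay with `is_license_clue: Some(true)` for that rule id only. Rules without an overlay, such as the excluded `gpl_194.RULE` and `gpl-1.0-plus_200.RULE`, keep their upstream metadata unchanged.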

Intentional differences from Python

  • This PR intentionally treats bare GPL1 the same way Provenant already treats bare GPL: as weak shorthand that should remain inspectable as a clue, but not be asserted as a hard license detection. That is narrower than a general GPL phrase policy and is specifically motivated by the upstream false-positive history in #2585, #1914/#1963, and #3932/#4106.

Follow-up work

  • Created or intentionally deferred:
    • deferred any changes to the "released under the GPL" rule (gpl_194.RULE) and the "under the GPL" rule (gpl-1.0-plus_200.RULE) until there is similarly direct upstream false-positive evidence for those exact rules

Expected-output fixture changes

  • Files changed: resources/license_detection/overlay/rules/gpl1_bare_word_only.RULE, resources/license_detection/license_index.zst
  • Why the new expected output is correct:
    • bare GPL1 is weak shorthand analogous to the already-demoted bare GPL family, and current upstream issue history shows repeated false positives from code tokens and labels rather than reliable substantive license notices

Demote the bare GPL1 shorthand rule to clue-only evidence so versionless GPL1 tokens remain inspectable without surfacing as hard GPL detections. The change is intentionally isolated to gpl1_bare_word_only.RULE, mirroring the existing bare-GPL treatment without widening policy to neighboring GPL phrase rules.

This follows upstream ScanCode false-positive history around bare GPL1 tokens in code and labels: aboutcode-org/scancode-toolkit#2585, kernel-symbol false positives in #1914 fixed by #1963, and later GPL1/GPL2/GPL3 label noise in #3932 fixed by #4106. Those threads consistently point to GPL1 as weak shorthand that is too fragile for asserted license output.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
mstykow enabled auto-merge (rebase) April 23, 2026 10:49
mstykow merged commit fcc7f6c into main Apr 23, 2026
15 checks passed
mstykow deleted the fix/gpl1-bare-word-clue branch April 23, 2026 11:04