fix(license): keep weak GPL shorthand as clues #753
Merged
Keep bare versionless GPL shorthand visible as clue-only evidence instead of letting the false-positive path silently drop it. This follows the upstream ScanCode maintainer direction that weak GPL markers such as bare GPL or similar shorthand should not become hard GPL detections, while still remaining inspectable evidence when present in surrounding text. The rationale matches the upstream false-positive discussions in aboutcode-org/scancode-toolkit#4005 and fix PR #4009, plus the broader weak-GPL triage in #2403 and #2793 (fixed in #2799): maintainers consistently move toward tighter evidence thresholds and false-positive suppression or downgrade rather than asserting GPL from fragile shorthand. This commit aligns Provenant with that sentiment by surfacing gpl_bare_word_only.RULE and gpl-1.0-plus_351.RULE as clues instead of detections. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Keep clue matches as standalone groups even when they sort before adjacent non-clue matches on the same line. Without this, weak GPL clues could merge back into neighboring reference detections and reappear as hard GPL results despite the intended downgrade. This follows the same upstream direction captured in aboutcode-org/scancode-toolkit#4005, #4009, #2403, and #2793/#2799: weak GPL evidence should not be promoted into asserted GPL detections. The grouping fix is the mechanical part that preserves that maintainer-aligned policy once the rules are downgraded to clues. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
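The grouping rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not Provenant's actual code: the `LicenseMatch` struct, the `group_matches` function, the `lines_threshold` parameter, and the `example_*.RULE` names used in testing are all hypothetical. The only idea taken from the commit message is that a clue match always forms its own group, even when it sorts first on a line shared with a non-clue match.

```rust
/// Hypothetical, simplified match record (not Provenant's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct LicenseMatch {
    pub rule: String,
    pub start_line: usize,
    pub end_line: usize,
    pub is_clue: bool,
}

/// Group matches that are already sorted by position.
/// Non-clue matches merge when they start within `lines_threshold`
/// lines of the previous match; clue matches never merge with anything.
pub fn group_matches(
    matches: &[LicenseMatch],
    lines_threshold: usize,
) -> Vec<Vec<LicenseMatch>> {
    let mut groups: Vec<Vec<LicenseMatch>> = Vec::new();
    let mut current: Vec<LicenseMatch> = Vec::new();

    for m in matches {
        if m.is_clue {
            // Flush whatever was being built, then emit the clue alone.
            if !current.is_empty() {
                groups.push(std::mem::take(&mut current));
            }
            groups.push(vec![m.clone()]);
            continue;
        }

        // `current` only ever holds non-clue matches, so a clue that
        // sorted first can never absorb this match.
        let adjacent = current
            .last()
            .map_or(false, |prev| m.start_line <= prev.end_line + lines_threshold);
        if !current.is_empty() && !adjacent {
            groups.push(std::mem::take(&mut current));
        }
        current.push(m.clone());
    }

    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

Because the clue is emitted immediately and never stored in `current`, the ordering of a clue relative to its same-line neighbors does not affect which group the neighbors land in.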
Add the common-profile Vulkan-ValidationLayers compare run to the benchmark table and regenerate the benchmark chart stats. The row captures both the speedup and the substantive outcome: weak Graphics Pipeline Library acronym hits stay visible as clues instead of becoming hard GPL detections, while AndroidManifest package visibility and Khronos documentation cleanup remain better than ScanCode. The rationale references the same upstream ScanCode sentiment discussed in aboutcode-org/scancode-toolkit#4005, #4009, #2403, and #2793/#2799: weak GPL shorthand should be downgraded or filtered rather than promoted into asserted GPL detections. The benchmark wording now reflects that clue-only behavior instead of describing it as simple rejection. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Keep clue-only weak GPL matches out of the golden expression lists so the golden suite continues to track substantive license detections rather than raw clue noise. This fixes the GPL-3 fixture regression in CI without changing the public scanner behavior: weak GPL shorthand still surfaces as a clue, but it no longer pollutes raw golden expression lists. This follow-up stays aligned with the same upstream ScanCode direction discussed in aboutcode-org/scancode-toolkit#4005, #4009, #2403, and #2793/#2799: weak GPL shorthand should be downgraded, not promoted into asserted license results. The golden helper now reflects that distinction by excluding clue-only matches from the expression list it compares. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Update the GPL external golden fixtures whose raw expression lists previously encoded weak GPL or free-unknown clue noise that no longer counts as a substantive expression after the clue-only weak-GPL policy. These are Rust-owned golden expectations, so syncing them to current actuals is the correct way to preserve the new public behavior while keeping the golden suite honest. This remains aligned with aboutcode-org/scancode-toolkit#4005, #4009, #2403, and #2793/#2799: weak GPL shorthand is intentionally downgraded instead of asserted as a hard license result. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
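As a rough illustration of the golden-helper change, the sketch below builds the expression list that a golden comparison would see while skipping clue-only matches. The `DetectedMatch` struct and the `golden_expressions` function are hypothetical names chosen for this example and are not taken from the Provenant codebase; the real helper may be structured differently.

```rust
/// Hypothetical, simplified view of one scanner match.
#[derive(Debug, Clone)]
pub struct DetectedMatch {
    pub license_expression: String,
    pub is_clue: bool,
}

/// Collect the expressions a golden test would compare.
/// Clue-only matches stay in the scanner output but are left out here,
/// so weak evidence cannot change the golden expression list.
pub fn golden_expressions(matches: &[DetectedMatch]) -> Vec<String> {
    matches
        .iter()
        .filter(|m| !m.is_clue)
        .map(|m| m.license_expression.clone())
        .collect()
}
```

The filter runs only when the comparison list is built, which is consistent with the commit's statement that public scanner behavior is unchanged.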
Update the Freeware app_exec golden to current Rust actuals now that clue-only weak GPL matches are no longer counted as substantive license expressions in the golden helper. This is a Rust-owned golden sync, not a scanner regression fix. The result stays consistent with the same upstream ScanCode direction in aboutcode-org/scancode-toolkit#4005, #4009, #2403, and #2793/#2799: weak GPL shorthand should not survive as asserted license output. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Update the IJG license golden to current Rust actuals after clue-only matches stopped counting as substantive golden expressions. This keeps the Rust-owned golden expectation aligned with the scanner's intended public output. The change remains consistent with the same upstream ScanCode direction in aboutcode-org/scancode-toolkit#4005, #4009, #2403, and #2793/#2799: weak shorthand or clue-only evidence should not be treated as asserted license output. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the Creative Commons fossology goldens to current Rust actuals after clue-only and free-unknown noise stopped counting as substantive license expressions. These fixtures are Rust-owned expectations, so this preserves the intended public behavior rather than broadening detection again. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the fossology fixtures whose old expectations still counted free-unknown or unknown-license-reference clue noise as substantive expressions. The current Rust behavior intentionally keeps those weak signals out of the golden license expression list. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the public-domain-related fossology expectations to current Rust actuals after clue-only and duplicate public-domain fragments stopped surfacing as substantive golden expressions. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the mixed-license fossology fixtures whose old expectations still depended on weak GPL, warranty, or proprietary-reference fragments that no longer count as substantive golden expressions under current Rust behavior. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the remaining SLIC external goldens after clue-only unknown-reference and public-domain fragments stopped contributing to raw golden license expressions. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the remaining fossology license-reference goldens to current Rust actuals now that weak proprietary, free-unknown, warranty, and public-domain clue fragments no longer count as substantive raw license expressions. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the remaining lic1 goldens whose previous expectations still counted weak free-unknown, extra GPL, or public-domain fragments as substantive expressions. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the mixed-expression lic2 goldens whose old expectations still counted weak GPL, proprietary, and public-domain fragments as substantive raw expressions. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the apache-heavy lic2 golden variants after their old expectations kept a weak free-unknown raw expression that no longer survives as a substantive result. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the remaining lic2 golden expectations where public-domain or unknown-reference fragments no longer count as substantive raw license expressions. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the public-domain lic4 family after the old expectations kept public-domain fragments that no longer survive as substantive raw expressions in these fixtures. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the remaining lic4 fixtures where one weak GPL or proprietary fragment no longer survives as a substantive raw license expression. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Sync the unknown-suite goldens after clue-like warranty, free-unknown, and unknown-reference fragments stopped contributing to substantive raw license expressions. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Summary
- `GPL` and `the GPL` visible as `license_clues` instead of letting them disappear or merge back into hard GPL detections
- `gpl-1.0-plus_351.RULE`, fix clue grouping/post-processing so downgraded weak GPL evidence stays clue-only, and update focused regressions accordingly
- `license_expressions` so the golden suite continues to track substantive license detections rather than raw clue noise
- `KhronosGroup/Vulkan-ValidationLayers @ d72c5f52886913598d4064fe8d03bf8ac471e215` common-profile compare run in `docs/BENCHMARKS.md` and regenerate the benchmark chart/stats
Issues
Scope and exclusions
- `gpl_bare_word_only.RULE` and `gpl-1.0-plus_351.RULE` as clue-only weak GPL evidence, while preserving `is_required_phrase: yes` on `gpl-1.0-plus_351.RULE`
- `license_expressions` goldens like `fossology-tests/GPL/gpl-3.0_1.xml`
- `docs/benchmarks/scan-duration-vs-files.svg`
- `input/...` path prefixes or ScanCode's legacy rule URL host formatting
Intentional differences from Python
- `GPL-1.0-or-later` detection.
Follow-up work
- `from_file` path normalization and canonical rule URL host differences) untouched because the current Provenant output is cleaner and the mismatches are not semantic regressions.
- `provenant/compare-runs/20260421T153750Z-Vulkan-ValidationLayers-34866/`
- `cargo run --manifest-path xtask/Cargo.toml --bin update-license-golden -- --list-mismatches --show-diff --filter gpl-3.0_1.xml --sync-actual`
Expected-output fixture changes
- `docs/BENCHMARKS.md`, `docs/benchmarks/scan-duration-vs-files.svg`, `resources/license_detection/license_index.zst`
- `license_clues` instead of hard GPL detections while preserving the real GPL notice control path
- `license_expressions` goldens, which is why the previous `gpl-3.0_1.xml` CI failure now resolves without changing the public weak-GPL behavior