Skip to content

Conversation

@isasmendiagus
Copy link
Contributor

@isasmendiagus isasmendiagus commented Nov 17, 2025

New Features

  • Added --license-sources (-ls) option to filter copyleft inspection; supports both "-ls src1 src2" and repeated "-ls src1 -ls src2" syntaxes.

Changed

  • Copyleft dataset switched to OSADL authoritative source, expanding coverage (adds missing "-or-later" variants).
  • Copyleft checks now default to component-level licenses to reduce file-level noise.

Documentation

  • Added OSADL dataset attribution and license notice to the README.

Bug Fixes

  • Fixed terminal cursor disappearing after aborting a scan (Ctrl+C).

Summary by CodeRabbit

  • New Features

    • Added --license-sources (-ls) CLI option to copyleft inspection for filtering specific license sources (component_declared, license_file, file_header, file_spdx_tag, scancode).
    • Expanded copyleft license database from 21 to 32 licenses with support for -or-later variants, using OSADL authoritative data.
    • Narrowed copyleft inspection to component-level licenses by default, reducing noise from file-level detections.
  • Bug Fixes

    • Fixed terminal cursor disappearing after aborting scan with Ctrl+C.
  • Documentation

    • Added dataset license notice for OSADL copyleft checklist (CC-BY-4.0).
  • Chores

    • Version bumped to 1.41.0.

@coderabbitai
Copy link

coderabbitai bot commented Nov 17, 2025

Walkthrough

This pull request adds OSADL-based copyleft license detection with source filtering, introduces a new CLI option --license-sources for filtering component-level vs. file-level license sources, refactors spinner/progress bar handling to use context managers, adds licensing metadata for the OSADL dataset, and includes comprehensive test coverage for the new functionality.

Changes

Cohort / File(s) Summary
Version & Constants
src/scanoss/__init__.py, src/scanoss/constants.py
Version bumped from 1.40.1 to 1.41.0; new constants VALID_LICENSE_SOURCES and DEFAULT_COPYLEFT_LICENSE_SOURCES added for license source filtering.
CLI & Copyleft Integration
src/scanoss/cli.py, src/scanoss/inspection/policy_check/scanoss/copyleft.py
Added --license-sources (-ls) CLI option with validation; extended Copyleft constructor and inspect_copyleft handler to accept and forward license_sources parameter.
License Processing & Filtering
src/scanoss/inspection/utils/license_utils.py, src/scanoss/inspection/utils/scan_result_processor.py
Refactored license evaluation to use OSADL data via new Osadl class; replaced static copyleft set with dynamic filtering based on include/exclude/explicit rules and license sources; added _select_licenses method to filter by source.
OSADL Data Integration
src/scanoss/osadl.py, src/scanoss/data/osadl-copyleft.json
New Osadl class provides load-on-demand access to OSADL copyleft metadata; embedded JSON dataset with comprehensive copyleft license mappings and metadata (title, license, attribution, timestamp).
Spinner/Progress Bar Refactoring
src/scanoss/filecount.py, src/scanoss/scanner.py, src/scanoss/scanners/folder_hasher.py, src/scanoss/scanners/scanner_hfh.py, src/scanoss/threadedscanning.py
Replaced explicit spinner/progress bar lifecycle management with context managers (nullcontext, with blocks); added __del__ destructor and atexit cleanup to ThreadedScanning; minor error message corrections.
Documentation & Licensing
CHANGELOG.md, LICENSE, README.md
Added CHANGELOG entries for new features; extended LICENSE with CC-BY-4.0 attribution for OSADL dataset; added "Dataset License Notice" section to README documenting OSADL copyleft data licensing.
Testing
tests/test_osadl.py, tests/test_policy_inspect.py
New test module test_osadl.py with comprehensive coverage of Osadl class initialization, copyleft checks, case-insensitivity, and shared class-level data; extended test_policy_inspect.py with license source filtering scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant Copyleft
    participant ScanResultProcessor
    participant LicenseUtil
    participant Osadl

    CLI->>Copyleft: __init__(license_sources=['component_declared', 'license_file'])
    Copyleft->>ScanResultProcessor: __init__(..., license_sources)
    ScanResultProcessor->>LicenseUtil: init(include, exclude, explicit)
    LicenseUtil->>Osadl: __init__()
    Osadl->>Osadl: _load_copyleft_data() [on first use]
    
    Note over LicenseUtil: Per license in component
    ScanResultProcessor->>ScanResultProcessor: _select_licenses(licenses_data)
    ScanResultProcessor->>ScanResultProcessor: filter by license_sources
    ScanResultProcessor->>LicenseUtil: is_copyleft(spdx_id)
    
    alt explicit_licenses provided
        LicenseUtil->>LicenseUtil: check explicit_licenses
    else include/exclude rules
        LicenseUtil->>LicenseUtil: check include_licenses, exclude_licenses
    else default to OSADL
        LicenseUtil->>Osadl: is_copyleft(spdx_id)
        Osadl->>Osadl: normalize ID & lookup in _shared_copyleft_data
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • License evaluation refactoring (license_utils.py): Significant behavioral change from static default set to dynamic OSADL-driven filtering with three filter collections; requires careful validation of include/exclude/explicit logic and OSADL integration.
  • Scan result processing (scan_result_processor.py): New _select_licenses method replaces priority-based ordering; requires verification of filtering correctness across all license source types.
  • OSADL data accessor (osadl.py): New class with shared class-level data and load-on-demand mechanism; verify error handling, normalization logic, and "Yes"/"Yes (restricted)" copyleft determination.
  • Spinner refactoring across five files: Repetitive context manager pattern changes reduce per-file complexity, but cross-module consistency and nullcontext behavior should be validated.
  • Potential edge cases: License source filtering with None values, interaction of include/exclude/explicit with license_sources, thread safety of shared OSADL data.

Possibly related PRs

Suggested labels

enhancement, feature

Suggested reviewers

  • eeisegn
  • matiasdaloia

Poem

🐰 Hops through OSADL's copyleft halls,
License sources filter license calls,
Spinners spin now safely managed,
No more dangling threads left stranded,
Attribution's heart beats true!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title 'Releases/v1.41.0' is generic and does not describe the main changes (new --license-sources option, OSADL dataset migration, cursor fix, etc.). It only indicates a version bump. Use a descriptive title summarizing the main change, such as 'Add --license-sources option for copyleft inspection filtering' or 'Switch to OSADL copyleft dataset with license source filtering'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch releases/v1.41.0

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

SCANOSS SCAN Completed 🚀

  • Detected components: 1
  • Undeclared components: 0
  • Declared components: 1
  • Detected files: 86
  • Detected files undeclared: 0
  • Detected files declared: 86
  • Licenses detected: 1
  • Licenses detected with copyleft: 0
  • Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

@isasmendiagus isasmendiagus deleted the releases/v1.41.0 branch November 17, 2025 14:20
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/scanoss/scanner.py (1)

939-941: Typo in warning text (“uncounted” → “encountered”).

-                                'Warning: Some errors uncounted while scanning. Results might be incomplete.'
+                                'Warning: Some errors encountered while scanning. Results might be incomplete.'
🧹 Nitpick comments (11)
src/scanoss/threadedscanning.py (1)

25-25: Progress bar cleanup is robust; consider centralizing atexit registration.

The combination of atexit.register(self.complete_bar) and __del__ calling complete_bar() should reliably finish the bar and restore the cursor even on abrupt termination, which addresses the disappearing-cursor issue.

One thing to keep in mind is that registering self.complete_bar in __init__ means:

  • Each ThreadedScanning instance adds an atexit handler.
  • Those handlers hold strong references to the instances, so they will live until process exit.

If you expect many ThreadedScanning instances over a long-lived process, you might consider a class-level/centralized atexit handler (or a weakref-based registry) instead of per-instance registration. For typical CLI usage with a single instance, this is fine as-is.

Also applies to: 81-82, 107-112

src/scanoss/scanners/scanner_hfh.py (1)

113-125: Helper _execute_grpc_scan keeps scan error handling localized.

Pulling the gRPC call into _execute_grpc_scan with a try/except that logs to stderr and resets scan_results to None makes the failure mode explicit and keeps the main scan() flow simple.

If you ever need deeper diagnostics, consider including more context (e.g., traceback.format_exc() when self.base.debug is enabled) so failures in folder_hash_scan are easier to triage, but it's not required for correctness.

src/scanoss/scanners/folder_hasher.py (1)

160-191: Context-managed progress bar correctly ensures cleanup during hashing.

Wrapping the file hashing loop in with Bar('Hashing files...', max=len(filtered_files)) as bar: and calling bar.next() once per file (even on exceptions) is a solid improvement:

  • Guarantees Bar.finish() runs on normal completion and on exceptions (including Ctrl+C), which helps avoid the hidden-cursor issue.
  • Keeps the tree-building and CRC64 logic unchanged in structure and behavior.

If you expect frequent empty directories, you could optionally guard with an early return when not filtered_files to skip constructing a zero-length bar, but the current implementation is functionally fine.

CHANGELOG.md (1)

16-22: Minor wording polish for clarity.

Tighten the “Changed” bullet to avoid the compound adjective hiccup.

-**Switched to OSADL authoritative copyleft license data**
+**Switched to OSADL copyleft license data (authoritative)**
LICENSE (1)

25-31: Add a direct CC‑BY‑4.0 link for the embedded dataset.

Helps downstream consumers comply and audit.

 The OSADL data file contains its own license header with full attribution
 information.
+For the OSADL dataset license terms, see: https://creativecommons.org/licenses/by/4.0/
src/scanoss/constants.py (1)

21-22: Prefer immutable tuples for public constants.

Prevents accidental mutation and clarifies intent.

-VALID_LICENSE_SOURCES = ['component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode']
-DEFAULT_COPYLEFT_LICENSE_SOURCES = ['component_declared', 'license_file']
+VALID_LICENSE_SOURCES = ('component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode')
+DEFAULT_COPYLEFT_LICENSE_SOURCES = ('component_declared', 'license_file')

Additionally, consider typing:

# at top of file
from typing import Final, Tuple

VALID_LICENSE_SOURCES: Final[Tuple[str, ...]] = (
    'component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode'
)
DEFAULT_COPYLEFT_LICENSE_SOURCES: Final[Tuple[str, ...]] = (
    'component_declared', 'license_file'
)
src/scanoss/filecount.py (2)

152-161: Address Ruff PLC0206 and simplify CSV dict build.

Use .items() and avoid a temp list.

-        if file_types:
-            csv_dict = []
-            for k in file_types:
-                d = file_types[k]
-                csv_dict.append({'extension': k, 'count': d[0], 'size(MB)': f'{d[1] / (1 << 20):,.2f}'})
-            fields = ['extension', 'count', 'size(MB)']
+        if file_types:
+            rows = (
+                {'extension': ext, 'count': cnt_sz[0], 'size(MB)': f'{cnt_sz[1] / (1 << 20):,.2f}'}
+                for ext, cnt_sz in file_types.items()
+            )
+            fields = ['extension', 'count', 'size(MB)']
             file = sys.stdout
             if self.scan_output:
                 file = open(self.scan_output, 'w')
             writer = csv.DictWriter(file, fieldnames=fields)
             writer.writeheader()  # writing headers (field names)
-            writer.writerows(csv_dict)  # writing data rows
+            writer.writerows(rows)  # writing data rows

100-149: Consider a small extraction to reduce branches/statements (Ruff PLR0912/PLR0915).

Move per-file accounting into a helper (e.g., _accumulate(path, file_types)) and use setdefault:

-                        fc = file_types.get(f_suffix)
-                        if not fc:
-                            fc = [1, f_size]
-                        else:
-                            fc[0] = fc[0] + 1
-                            fc[1] = fc[1] + f_size
-                        file_types[f_suffix] = fc
+                        fc = file_types.setdefault(f_suffix, [0, 0])
+                        fc[0] += 1
+                        fc[1] += f_size
src/scanoss/inspection/policy_check/scanoss/copyleft.py (1)

56-69: Copyleft now always uses filtering mode; no way to opt out via API

The license_sources or DEFAULT_COPYLEFT_LICENSE_SOURCES default plus passing self.license_sources into ScanResultProcessor means Copyleft will always operate in filtering mode (using either the CLI‑provided list or the default list), and never fall back to the priority mode implemented in _select_licenses.

That seems aligned with “default to component‑level licenses” from the PR description, but it also means there is currently no way (from this API) to request the old priority behaviour for copyleft checks.

If you foresee a need to restore priority mode, you might want an explicit sentinel (e.g. license_sources=None meaning “priority mode”) instead of unconditionally substituting the default here.

Also applies to: 92-101

src/scanoss/inspection/utils/scan_result_processor.py (1)

65-81: License source selection logic looks consistent with design

The new license_sources plumbing and _select_licenses() behaviour are coherent:

  • When license_sources is truthy, you filter to those sources while still including unknown/None, which avoids silently dropping legacy/uncategorised data.
  • When license_sources is falsy, the original priority semantics are preserved via the ordered source preference, with a sensible fallback to “all licenses”.

Given that Copyleft now always passes a non‑empty list, this class will typically operate in filtering mode for that flow, which matches the PR’s intent.

If you want to tighten things later, you could:

  • Type‑annotate license_sources as list[str] | None, and
  • Validate against VALID_LICENSE_SOURCES at the boundaries.

But functionally this looks good.

Also applies to: 167-189, 316-361

src/scanoss/inspection/utils/license_utils.py (1)

25-27: OSADL-backed copyleft logic and filters look correct

The new LicenseUtil wiring is clean:

  • The include/exclude/explicit parsing is reset-safe and case-insensitive.
  • is_copyleft() applies the precedence described in the docstring and only falls through to OSADL when no filters match.
  • Deferring to Osadl for the default copyleft decision aligns with the new authoritative dataset.

You might eventually consider exposing include/exclude/explicit as sets of SPDX IDs directly on the public API (rather than only via comma-separated strings) for programmatic callers, but that’s purely a convenience; the current implementation is solid.

Also applies to: 34-36, 40-46, 47-75, 76-107

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4060c9f and 6db2eac.

📒 Files selected for processing (18)
  • CHANGELOG.md (1 hunks)
  • LICENSE (1 hunks)
  • README.md (1 hunks)
  • src/scanoss/__init__.py (1 hunks)
  • src/scanoss/cli.py (4 hunks)
  • src/scanoss/constants.py (1 hunks)
  • src/scanoss/data/osadl-copyleft.json (1 hunks)
  • src/scanoss/filecount.py (2 hunks)
  • src/scanoss/inspection/policy_check/scanoss/copyleft.py (4 hunks)
  • src/scanoss/inspection/utils/license_utils.py (1 hunks)
  • src/scanoss/inspection/utils/scan_result_processor.py (3 hunks)
  • src/scanoss/osadl.py (1 hunks)
  • src/scanoss/scanner.py (5 hunks)
  • src/scanoss/scanners/folder_hasher.py (1 hunks)
  • src/scanoss/scanners/scanner_hfh.py (2 hunks)
  • src/scanoss/threadedscanning.py (3 hunks)
  • tests/test_osadl.py (1 hunks)
  • tests/test_policy_inspect.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
src/scanoss/scanners/scanner_hfh.py (2)
src/scanoss/scanossgrpc.py (1)
  • folder_hash_scan (429-447)
src/scanoss/scanossbase.py (1)
  • print_stderr (45-49)
tests/test_policy_inspect.py (2)
src/scanoss/inspection/policy_check/scanoss/copyleft.py (2)
  • Copyleft (50-240)
  • run (220-240)
src/scanoss/inspection/policy_check/policy_check.py (2)
  • run (87-102)
  • PolicyStatus (33-44)
tests/test_osadl.py (2)
src/scanoss/osadl.py (2)
  • Osadl (31-130)
  • is_copyleft (108-130)
src/scanoss/inspection/utils/license_utils.py (1)
  • is_copyleft (76-107)
src/scanoss/scanner.py (3)
src/scanoss/file_filters.py (2)
  • get_filtered_files_from_folder (311-352)
  • get_filtered_files_from_files (354-404)
src/scanoss/threadedscanning.py (3)
  • stop_scanning (154-158)
  • queue_add (141-149)
  • run (168-191)
src/scanoss/scanners/scanner_hfh.py (1)
  • scan (126-152)
src/scanoss/inspection/utils/scan_result_processor.py (1)
src/scanoss/inspection/utils/license_utils.py (2)
  • LicenseUtil (30-116)
  • init (47-74)
src/scanoss/filecount.py (2)
src/scanoss/scanossbase.py (2)
  • print_msg (51-56)
  • print_trace (65-70)
src/scanoss/cli.py (1)
  • file_count (1310-1341)
src/scanoss/scanners/folder_hasher.py (2)
src/scanoss/scanossbase.py (1)
  • print_debug (58-63)
src/scanoss/utils/crc64.py (2)
  • CRC64 (29-96)
  • get_hash_buff (82-96)
src/scanoss/osadl.py (1)
src/scanoss/inspection/utils/license_utils.py (1)
  • is_copyleft (76-107)
src/scanoss/inspection/utils/license_utils.py (2)
src/scanoss/osadl.py (3)
  • Osadl (31-130)
  • print_debug (62-67)
  • is_copyleft (108-130)
src/scanoss/scanossbase.py (2)
  • ScanossBase (28-107)
  • print_debug (58-63)
🪛 GitHub Actions: Lint
src/scanoss/filecount.py

[error] 100-100: Ruff: PLR0912 Too many branches (15 > 12) in count_files method.


[error] 100-100: Ruff: PLR0915 Too many statements (54 > 50) in count_files method.


[error] 152-152: Ruff: PLC0206 Extracting value from dictionary without calling .items() in filecount.py.

🪛 LanguageTool
CHANGELOG.md

[uncategorized] ~17-~17: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... - Copyleft detection now uses [OSADL (Open Source Automation Development Lab)](https://ww...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

README.md

[uncategorized] ~140-~140: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...vecommons.org/licenses/by/4.0/) by the [Open Source Automation Development Lab (OSADL) eG](...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~142-~142: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...g/). Attribution: A project by the Open Source Automation Development Lab (OSADL) eG. ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (7)
src/scanoss/__init__.py (1)

25-25: Version bump looks consistent with release.

__version__ = '1.41.0' matches the release branch and PR title; no further changes needed here.

README.md (1)

139-142: Dataset license notice and attribution look clear and sufficient.

The new section cleanly distinguishes the MIT license for the application from the CC-BY-4.0 license of the OSADL copyleft dataset, and provides proper attribution and source URL. No changes needed from a docs/UX standpoint.

tests/test_policy_inspect.py (2)

395-577: Coverage additions for license_sources look solid.

Good assertions across defaults, single/multiple sources, and output formats.


31-31: Incorrect review comment—proposed fix targets the wrong lines.

The review correctly identifies inconsistent imports in test_policy_inspect.py, but the fix direction is reversed. Line 31 (from scanoss.constants...) uses the correct import root; setup.cfg confirms the installed package name is scanoss (not src.scanoss). Lines 32–39 are the inconsistent ones and should be changed from src.scanoss.* to scanoss.* to match all other test files and the package configuration.

Likely an incorrect or invalid review comment.

src/scanoss/cli.py (1)

704-714: CLI wiring for -ls is correct and fully supported.

The action='extend' parameter is available in Python 3.8+, confirming your code works on Python 3.9 and later. No changes needed.

src/scanoss/data/osadl-copyleft.json (1)

1-133: Packaging verification confirmed—data file will be distributed correctly.

The OSADL dataset is properly configured: setup.cfg specifies [options.package_data] with * = data/*.txt, data/*.json, which matches osadl-copyleft.json. Combined with include_package_data = True, both wheel and source distributions will include the file. MANIFEST.in is not needed when package_data is explicitly configured.

src/scanoss/scanner.py (1)

785-789: Rewritten review comment is not needed — original review is based on incorrect assumptions.

The web search confirms that progress.bar.Bar and progress.spinner.Spinner already implement the context-manager protocol (__enter__ and __exit__ methods). The original code in src/scanoss/scanner.py (lines 785-789) is correct as written:

  • Bar can be used directly with with statements without additional wrapping
  • nullcontext() without arguments correctly returns None on entry, allowing the conditional check if bar: to work as intended
  • The original conditional pattern Bar(...) if condition else nullcontext() is valid and requires no modification

Likely an incorrect or invalid review comment.

Comment on lines +114 to +117
spinner_ctx = Spinner('Searching ') if (not self.quiet and self.isatty) else nullcontext()

with spinner_ctx as spinner:
file_types = {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Spinner context misuse; wrap instance and finish.

Same fix as in scanner.py.

-        spinner_ctx = Spinner('Searching ') if (not self.quiet and self.isatty) else nullcontext()
+        spinner_ctx = nullcontext(Spinner('Searching ')) if (not self.quiet and self.isatty) else nullcontext()
@@
-                        if spinner:
-                            spinner.next()
+                        if spinner:
+                            spinner.next()
+            if spinner:
+                spinner.finish()

Also applies to: 146-148

Comment on lines +69 to +107
def _load_copyleft_data(self) -> bool:
"""
Load the embedded OSADL copyleft JSON file into class-level shared data.
Data is loaded only once and shared across all instances.
:return: True if successful, False otherwise
"""
if Osadl._data_loaded:
return True

# OSADL copyleft license checklist from: https://www.osadl.org/Checklists
# Data source: https://www.osadl.org/fileadmin/checklists/copyleft.json
# License: CC-BY-4.0 (Creative Commons Attribution 4.0 International)
# Copyright: (C) 2017 - 2024 Open Source Automation Development Lab (OSADL) eG
try:
f_name = importlib_resources.files(__name__) / 'data/osadl-copyleft.json'
with importlib_resources.as_file(f_name) as f:
with open(f, 'r', encoding='utf-8') as file:
data = json.load(file)
except Exception as e:
self.print_stderr(f'ERROR: Problem loading OSADL copyleft data: {e}')
return False

# Process copyleft data
copyleft = data.get('copyleft', {})
if not copyleft:
self.print_stderr('ERROR: No copyleft data found in OSADL JSON')
return False

# Store in class-level shared dictionary
for lic_id, status in copyleft.items():
# Normalize license ID (lowercase) for consistent lookup
lic_id_lc = lic_id.lower()
Osadl._shared_copyleft_data[lic_id_lc] = status

Osadl._data_loaded = True
self.print_debug(f'Loaded {len(Osadl._shared_copyleft_data)} OSADL copyleft entries')
return True

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

OSADL load failure silently degrades to “no copyleft” for all licenses

Right now, if _load_copyleft_data() fails (e.g. missing or unreadable osadl-copyleft.json), it logs an error and returns False, but __init__ ignores that return value. Because _shared_copyleft_data remains empty, is_copyleft() will simply return False for every license.

For a copyleft policy check this is risky: a packaging or runtime issue would silently under‑report copyleft instead of failing fast.

Consider one of the following:

  • Raise an exception in __init__ when _load_copyleft_data() returns False, so callers can surface a hard failure; or
  • Track a separate “load_failed” flag and have is_copyleft() emit a stronger error or choose a safer default (e.g. treat everything as copyleft, or require explicit filters) rather than silently returning False.

This would make failures in the authoritative dataset much more visible to users.

Also applies to: 108-130

🤖 Prompt for AI Agents
In src/scanoss/osadl.py around lines 69-107 (and also applies to 108-130),
_load_copyleft_data() currently returns False on load error but __init__ ignores
that, causing silent under-reporting of copyleft; change the initializer to
fail-fast: call _load_copyleft_data() in __init__, and if it returns False raise
a clear RuntimeError (or custom exception) with the underlying error message so
callers see the failure; ensure any necessary cleanup or state (e.g.
_data_loaded remains False and _shared_copyleft_data empty) is preserved before
raising; alternatively, if you prefer non-exception behavior, implement a
_load_failed flag set by _load_copyleft_data() on error and update is_copyleft()
to check this flag and raise an error or apply a safe default—pick one approach
and implement it consistently (update docstrings and error messages
accordingly).

Comment on lines +367 to +370
spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()

with spinner_ctx as spinner:
save_wfps_for_print = not self.no_wfp_file or not self.threaded_scan
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Spinner is not a context manager; wrap the instance and ensure cleanup.

Using Spinner() directly in a with can raise at runtime if it lacks __enter__/__exit__. Also, when nullcontext() is used, no finish() is called, leaving the cursor state messy.

Apply this pattern:

  • Wrap the real spinner with nullcontext(Spinner(...)).
  • Explicitly finish() before exiting the block.
-        spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()
+        spinner_ctx = nullcontext(Spinner('Fingerprinting ')) if (not self.quiet and self.isatty) else nullcontext()

         with spinner_ctx as spinner:
@@
-            if self.threaded_scan and scan_block != '':
-                self.threaded_scan.queue_add(scan_block)  # Make sure all files have been submitted
+            if self.threaded_scan and scan_block != '':
+                self.threaded_scan.queue_add(scan_block)  # Make sure all files have been submitted
+            if spinner:
+                spinner.finish()

Also applies to: 423-425

🤖 Prompt for AI Agents
In src/scanoss/scanner.py around lines 367-370, the code uses Spinner(...)
directly in a with-statement which fails if Spinner isn't a context manager and
also leaves cursor state un-restored when nullcontext() is used; change creation
to wrap the real Spinner inside nullcontext(Spinner(...)) so the with always
yields a spinner object (e.g. nullcontext(Spinner(...)) when using a spinner,
nullcontext(Spinner(...)) when quiet is false/true accordingly), then inside the
with-block ensure you call spinner.finish() (or equivalent cleanup) before
exiting the block; apply the same change to the similar block at lines 423-425.

Comment on lines +637 to +640
spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()

with spinner_ctx as spinner:
save_wfps_for_print = not self.no_wfp_file or not self.threaded_scan
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Same spinner issue in scan_files; wrap and call finish().

Mirror the fix in scan_files().

-        spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()
+        spinner_ctx = nullcontext(Spinner('Fingerprinting ')) if (not self.quiet and self.isatty) else nullcontext()
@@
-            if self.threaded_scan and scan_block != '':
-                self.threaded_scan.queue_add(scan_block)  # Make sure all files have been submitted
+            if self.threaded_scan and scan_block != '':
+                self.threaded_scan.queue_add(scan_block)  # Make sure all files have been submitted
+            if spinner:
+                spinner.finish()

Also applies to: 693-694

🤖 Prompt for AI Agents
In src/scanoss/scanner.py around lines 637-640 (and similarly 693-694), the
Spinner is being used via a context but not guaranteed to have its finish()
method called; change the usage so the spinner is assigned from the context
(e.g., spinner_ctx = Spinner(...) if (not self.quiet and self.isatty) else
nullcontext()), enter the context to get spinner, and ensure after the work
completes you explicitly call spinner.finish() when spinner is an actual Spinner
instance (use try/finally or check type) so the spinner always gets finished;
apply the same fix to the block at lines 693-694.

Comment on lines +1060 to +1063
spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()

with spinner_ctx as spinner:
to_fingerprint_files = file_filters.get_filtered_files_from_folder(scan_dir)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Spinner context in wfp_folder: wrap and finish.

Same fix applies here.

-        spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()
+        spinner_ctx = nullcontext(Spinner('Fingerprinting ')) if (not self.quiet and self.isatty) else nullcontext()
@@
-            for file in to_fingerprint_files:
+            for file in to_fingerprint_files:
                 if spinner:
                     spinner.next()
@@
-                wfps += self.winnowing.wfp_for_file(str(abs_path), file)
+                wfps += self.winnowing.wfp_for_file(str(abs_path), file)
+            if spinner:
+                spinner.finish()

Also applies to: 1065-1069

🤖 Prompt for AI Agents
In src/scanoss/scanner.py around lines 1060-1063 (and similarly for 1065-1069),
the spinner context is created conditionally but not guaranteed to be properly
finished; replace the conditional one-liner with logic that, when a Spinner() is
instantiated, explicitly starts it before the block and ensures finish() is
called in a finally block (or use a proper context manager that calls finish on
exit); if using nullcontext for the non-TTY case keep it, but for the real
Spinner ensure it is entered/started and finished so the spinner is wrapped and
cleaned up correctly.

Comment on lines +140 to 151
spinner_ctx = Spinner('Scanning folder...')

with spinner_ctx as spinner:
grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,))
grpc_thread.start()

def spin():
while not stop_spinner:
while grpc_thread.is_alive():
spinner.next()
time.sleep(0.1)

spinner_thread = threading.Thread(target=spin)
spinner_thread.start()

try:
response = self.client.folder_hash_scan(hfh_request, self.use_grpc)
if response:
self.scan_results = response
finally:
stop_spinner = True
spinner_thread.join()
spinner.finish()
grpc_thread.join()

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Spinner + gRPC thread pattern improves UX and cleanup.

Running folder_hash_scan in a background thread while driving a Spinner in the foreground and using a with Spinner(...) as spinner: block ensures:

  • The spinner is advanced responsively while the gRPC call runs.
  • The spinner is properly finalized (and the cursor restored) even if an exception interrupts the loop, addressing the Ctrl+C cursor issue.
  • grpc_thread.join() guarantees scan_results is fully populated (or set to None by the helper) before returning.

Please just double-check that the progress.spinner.Spinner version you depend on supports context manager usage in this way, to avoid runtime surprises if the package is upgraded or replaced.


🌐 Web query:

Verify whether the `Spinner` class from the `progress` Python package supports context manager usage (`with Spinner(...) as spinner:`) in the version used by this project.

💡 Result:

I need the specific version of the progress package that your project uses (or the project's requirements file / pyproject.toml / Pipfile). Which version or repo/file should I check?


I need to find the version of the progress package used in the project first. Let me search the codebase for dependency specifications.


---

🏁 Script executed:

```shell
#!/bin/bash
# Find dependency specification files
fd -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "Pipfile" -o -name "poetry.lock" \)

Length of output: 170


Let me correct that and search for dependency files with a simpler approach:


---

🏁 Script executed:

```shell
#!/bin/bash
# Find dependency specification files
find . -maxdepth 2 \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" \)

Length of output: 127


Now let me extract the progress package version from the requirements files:


---

🏁 Script executed:

```shell
#!/bin/bash
# Search for progress package in main requirements files
echo "=== requirements.txt ===" && cat requirements.txt && \
echo -e "\n=== pyproject.toml ===" && cat pyproject.toml

Length of output: 834


Perfect! The project has progress as a dependency without a version pin (meaning any version). Now let me search for information about whether the Spinner class supports context manager usage:

Let me search more specifically for the progress.spinner.Spinner class and its context manager support:

Let me search more specifically for the progress package's Spinner implementation:

Let me check the actual source code structure of the progress package to verify if Spinner inherits the context manager capability:

Perfect! I found the source code. Spinner inherits from Infinite, not from Bar. Now let me verify if the Infinite class supports context manager usage:

Based on my verification, I have found a significant concern with the review comment:

Spinner does NOT appear to support context manager usage.

The source code shows that Spinner inherits from Infinite, not from Bar. While the progress package documentation explicitly demonstrates Bar as a context manager with with Bar(...) as bar: syntax, Spinner is only documented with traditional loop usage: spinner = Spinner('Loading ') in a while loop. No official documentation or examples show Spinner being used as a context manager.

The code in the review attempts to use with Spinner(...) as spinner:, which is not supported by the progress.spinner.Spinner class and will raise an AttributeError at runtime.


Fix: Use the traditional loop pattern or wrap in explicit context manager.

Replace:

spinner_ctx = Spinner('Scanning folder...')
with spinner_ctx as spinner:
    grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,))
    grpc_thread.start()
    while grpc_thread.is_alive():
        spinner.next()
        time.sleep(0.1)
    grpc_thread.join()

With:

spinner = Spinner('Scanning folder...')
try:
    grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,))
    grpc_thread.start()
    while grpc_thread.is_alive():
        spinner.next()
        time.sleep(0.1)
    grpc_thread.join()
finally:
    spinner.finish()
🤖 Prompt for AI Agents
In src/scanoss/scanners/scanner_hfh.py around lines 140 to 151, the code uses
Spinner as a context manager but progress.spinner.Spinner does not support the
context-manager protocol; replace the with-block by instantiating Spinner
directly, start the grpc thread, run the existing loop calling spinner.next()
while the thread is alive, and ensure in a finally block you call
spinner.finish() (and join the thread if not already) so the spinner is always
cleaned up.

Comment on lines +83 to +99
def test_multiple_instances_share_data(self):
"""Test that multiple instances share the same class-level data"""
osadl1 = Osadl()
osadl2 = Osadl()

# Both instances should see data loaded by first instance
result1 = osadl1.is_copyleft('GPL-2.0-only')
self.assertTrue(result1)
self.assertTrue(Osadl._data_loaded)

# Second instance uses the same class-level shared data
result2 = osadl2.is_copyleft('MIT')
self.assertFalse(result2)

# Verify both instances reference the same class-level data
self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix tautological assertion in test_multiple_instances_share_data

The last assertion:

self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data)

is always true and doesn’t verify that multiple instances share the same data. To test sharing more directly, consider:

self.assertIs(osadl1._shared_copyleft_data, osadl2._shared_copyleft_data)

or equivalently comparing id(...) between the two instances.

🤖 Prompt for AI Agents
In tests/test_osadl.py around lines 83 to 99, the final assertion is
tautological (assertIs(Osadl._shared_copyleft_data,
Osadl._shared_copyleft_data)) and therefore doesn't verify instance sharing;
replace it with an assertion that compares the two instances' references (for
example assertIs(osadl1._shared_copyleft_data, osadl2._shared_copyleft_data) or
assertEqual(id(osadl1._shared_copyleft_data), id(osadl2._shared_copyleft_data)))
so the test actually verifies that both instances reference the same class-level
data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants