Releases/v1.41.0 #169

isasmendiagus · 2025-11-17T14:14:50Z

New Features

Added --license-sources (-ls) option to filter copyleft inspection; supports both "-ls src1 src2" and repeated "-ls src1 -ls src2" syntaxes.

Changed

Copyleft dataset switched to OSADL authoritative source, expanding coverage (adds missing "-or-later" variants).
Copyleft checks now default to component-level licenses to reduce file-level noise.

Documentation

Added OSADL dataset attribution and license notice to the README.

Bug Fixes

Fixed terminal cursor disappearing after aborting a scan (Ctrl+C).

Summary by CodeRabbit

New Features
- Added --license-sources (-ls) CLI option to copyleft inspection for filtering specific license sources (component_declared, license_file, file_header, file_spdx_tag, scancode).
- Expanded copyleft license database from 21 to 32 licenses with support for -or-later variants, using OSADL authoritative data.
- Narrowed copyleft inspection to component-level licenses by default, reducing noise from file-level detections.
Bug Fixes
- Fixed terminal cursor disappearing after aborting scan with Ctrl+C.
Documentation
- Added dataset license notice for OSADL copyleft checklist (CC-BY-4.0).
Chores
- Version bumped to 1.41.0.

…inspection

coderabbitai · 2025-11-17T14:15:03Z

Walkthrough

This pull request adds OSADL-based copyleft license detection with source filtering, introduces a new CLI option --license-sources for filtering component-level vs. file-level license sources, refactors spinner/progress bar handling to use context managers, adds licensing metadata for the OSADL dataset, and includes comprehensive test coverage for the new functionality.

Changes

Cohort / File(s)	Summary
Version & Constants `src/scanoss/__init__.py`, `src/scanoss/constants.py`	Version bumped from 1.40.1 to 1.41.0; new constants `VALID_LICENSE_SOURCES` and `DEFAULT_COPYLEFT_LICENSE_SOURCES` added for license source filtering.
CLI & Copyleft Integration `src/scanoss/cli.py`, `src/scanoss/inspection/policy_check/scanoss/copyleft.py`	Added `--license-sources` (`-ls`) CLI option with validation; extended `Copyleft` constructor and `inspect_copyleft` handler to accept and forward `license_sources` parameter.
License Processing & Filtering `src/scanoss/inspection/utils/license_utils.py`, `src/scanoss/inspection/utils/scan_result_processor.py`	Refactored license evaluation to use OSADL data via new `Osadl` class; replaced static copyleft set with dynamic filtering based on include/exclude/explicit rules and license sources; added `_select_licenses` method to filter by source.
OSADL Data Integration `src/scanoss/osadl.py`, `src/scanoss/data/osadl-copyleft.json`	New `Osadl` class provides load-on-demand access to OSADL copyleft metadata; embedded JSON dataset with comprehensive copyleft license mappings and metadata (title, license, attribution, timestamp).
Spinner/Progress Bar Refactoring `src/scanoss/filecount.py`, `src/scanoss/scanner.py`, `src/scanoss/scanners/folder_hasher.py`, `src/scanoss/scanners/scanner_hfh.py`, `src/scanoss/threadedscanning.py`	Replaced explicit spinner/progress bar lifecycle management with context managers (`nullcontext`, `with` blocks); added `__del__` destructor and `atexit` cleanup to `ThreadedScanning`; minor error message corrections.
Documentation & Licensing `CHANGELOG.md`, `LICENSE`, `README.md`	Added CHANGELOG entries for new features; extended LICENSE with CC-BY-4.0 attribution for OSADL dataset; added "Dataset License Notice" section to README documenting OSADL copyleft data licensing.
Testing `tests/test_osadl.py`, `tests/test_policy_inspect.py`	New test module `test_osadl.py` with comprehensive coverage of `Osadl` class initialization, copyleft checks, case-insensitivity, and shared class-level data; extended `test_policy_inspect.py` with license source filtering scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant Copyleft
    participant ScanResultProcessor
    participant LicenseUtil
    participant Osadl

    CLI->>Copyleft: __init__(license_sources=['component_declared', 'license_file'])
    Copyleft->>ScanResultProcessor: __init__(..., license_sources)
    ScanResultProcessor->>LicenseUtil: init(include, exclude, explicit)
    LicenseUtil->>Osadl: __init__()
    Osadl->>Osadl: _load_copyleft_data() [on first use]
    
    Note over LicenseUtil: Per license in component
    ScanResultProcessor->>ScanResultProcessor: _select_licenses(licenses_data)
    ScanResultProcessor->>ScanResultProcessor: filter by license_sources
    ScanResultProcessor->>LicenseUtil: is_copyleft(spdx_id)
    
    alt explicit_licenses provided
        LicenseUtil->>LicenseUtil: check explicit_licenses
    else include/exclude rules
        LicenseUtil->>LicenseUtil: check include_licenses, exclude_licenses
    else default to OSADL
        LicenseUtil->>Osadl: is_copyleft(spdx_id)
        Osadl->>Osadl: normalize ID & lookup in _shared_copyleft_data
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

License evaluation refactoring (license_utils.py): Significant behavioral change from static default set to dynamic OSADL-driven filtering with three filter collections; requires careful validation of include/exclude/explicit logic and OSADL integration.
Scan result processing (scan_result_processor.py): New _select_licenses method replaces priority-based ordering; requires verification of filtering correctness across all license source types.
OSADL data accessor (osadl.py): New class with shared class-level data and load-on-demand mechanism; verify error handling, normalization logic, and "Yes"/"Yes (restricted)" copyleft determination.
Spinner refactoring across five files: Repetitive context manager pattern changes reduce per-file complexity, but cross-module consistency and nullcontext behavior should be validated.
Potential edge cases: License source filtering with None values, interaction of include/exclude/explicit with license_sources, thread safety of shared OSADL data.

Possibly related PRs

Chore/refactor inspect module #161: Introduces refactoring of Copyleft and ScanResultProcessor classes that this PR extends with license_sources and OSADL data handling.
SP-3649-update-copyleft-inspection-rules #165: Implements identical license source filtering feature (--license-sources CLI option, VALID/DEFAULT constants, and license_sources parameter wiring).
SP-3561_conversion-issues #156: Adds license-source filtering and restriction logic to copyleft inspection comparable to this PR's approach.

Suggested labels

enhancement, feature

Suggested reviewers

eeisegn
matiasdaloia

Poem

🐰 Hops through OSADL's copyleft halls,
License sources filter license calls,
Spinners spin now safely managed,
No more dangling threads left stranded,
Attribution's heart beats true! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title 'Releases/v1.41.0' is generic and does not describe the main changes (new --license-sources option, OSADL dataset migration, cursor fix, etc.). It only indicates a version bump.	Use a descriptive title summarizing the main change, such as 'Add --license-sources option for copyleft inspection filtering' or 'Switch to OSADL copyleft dataset with license source filtering'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch releases/v1.41.0

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-11-17T14:15:22Z

SCANOSS SCAN Completed 🚀

Detected components: 1
Undeclared components: 0
Declared components: 1
Detected files: 86
Detected files undeclared: 0
Detected files declared: 86
Licenses detected: 1
Licenses detected with copyleft: 0
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/scanoss/scanner.py (1)

939-941: Typo in warning text (“uncounted” → “encountered”).

-                                'Warning: Some errors uncounted while scanning. Results might be incomplete.'
+                                'Warning: Some errors encountered while scanning. Results might be incomplete.'

🧹 Nitpick comments (11)

src/scanoss/threadedscanning.py (1)

25-25: Progress bar cleanup is robust; consider centralizing atexit registration.

The combination of atexit.register(self.complete_bar) and __del__ calling complete_bar() should reliably finish the bar and restore the cursor even on abrupt termination, which addresses the disappearing-cursor issue.

One thing to keep in mind is that registering self.complete_bar in __init__ means:

Each ThreadedScanning instance adds an atexit handler.

Those handlers hold strong references to the instances, so they will live until process exit.

If you expect many ThreadedScanning instances over a long-lived process, you might consider a class-level/centralized atexit handler (or a weakref-based registry) instead of per-instance registration. For typical CLI usage with a single instance, this is fine as-is.

Also applies to: 81-82, 107-112

src/scanoss/scanners/scanner_hfh.py (1)

113-125: Helper _execute_grpc_scan keeps scan error handling localized.

Pulling the gRPC call into _execute_grpc_scan with a try/except that logs to stderr and resets scan_results to None makes the failure mode explicit and keeps the main scan() flow simple.

If you ever need deeper diagnostics, consider including more context (e.g., traceback.format_exc() when self.base.debug is enabled) so failures in folder_hash_scan are easier to triage, but it's not required for correctness.

src/scanoss/scanners/folder_hasher.py (1)

160-191: Context-managed progress bar correctly ensures cleanup during hashing.

Wrapping the file hashing loop in with Bar('Hashing files...', max=len(filtered_files)) as bar: and calling bar.next() once per file (even on exceptions) is a solid improvement:

Guarantees Bar.finish() runs on normal completion and on exceptions (including Ctrl+C), which helps avoid the hidden-cursor issue.

Keeps the tree-building and CRC64 logic unchanged in structure and behavior.

If you expect frequent empty directories, you could optionally guard with an early return when not filtered_files to skip constructing a zero-length bar, but the current implementation is functionally fine.
CHANGELOG.md (1)
16-22: Minor wording polish for clarity.

Tighten the “Changed” bullet to avoid the compound adjective hiccup.
-**Switched to OSADL authoritative copyleft license data**
+**Switched to OSADL copyleft license data (authoritative)**
LICENSE (1)
25-31: Add a direct CC‑BY‑4.0 link for the embedded dataset.

Helps downstream consumers comply and audit.
 The OSADL data file contains its own license header with full attribution
 information.
+For the OSADL dataset license terms, see: https://creativecommons.org/licenses/by/4.0/
src/scanoss/constants.py (1)
21-22: Prefer immutable tuples for public constants.

Prevents accidental mutation and clarifies intent.
-VALID_LICENSE_SOURCES = ['component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode']
-DEFAULT_COPYLEFT_LICENSE_SOURCES = ['component_declared', 'license_file']
+VALID_LICENSE_SOURCES = ('component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode')
+DEFAULT_COPYLEFT_LICENSE_SOURCES = ('component_declared', 'license_file')
Additionally, consider typing:
# at top of file
from typing import Final, Tuple

VALID_LICENSE_SOURCES: Final[Tuple[str, ...]] = (
    'component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode'
)
DEFAULT_COPYLEFT_LICENSE_SOURCES: Final[Tuple[str, ...]] = (
    'component_declared', 'license_file'
)
src/scanoss/filecount.py (2)
152-161: Address Ruff PLC0206 and simplify CSV dict build.

Use .items() and avoid a temp list.
-        if file_types:
-            csv_dict = []
-            for k in file_types:
-                d = file_types[k]
-                csv_dict.append({'extension': k, 'count': d[0], 'size(MB)': f'{d[1] / (1 << 20):,.2f}'})
-            fields = ['extension', 'count', 'size(MB)']
+        if file_types:
+            rows = (
+                {'extension': ext, 'count': cnt_sz[0], 'size(MB)': f'{cnt_sz[1] / (1 << 20):,.2f}'}
+                for ext, cnt_sz in file_types.items()
+            )
+            fields = ['extension', 'count', 'size(MB)']
             file = sys.stdout
             if self.scan_output:
                 file = open(self.scan_output, 'w')
             writer = csv.DictWriter(file, fieldnames=fields)
             writer.writeheader()  # writing headers (field names)
-            writer.writerows(csv_dict)  # writing data rows
+            writer.writerows(rows)  # writing data rows
100-149: Consider a small extraction to reduce branches/statements (Ruff PLR0912/PLR0915).

Move per-file accounting into a helper (e.g., _accumulate(path, file_types)) and use setdefault:
-                        fc = file_types.get(f_suffix)
-                        if not fc:
-                            fc = [1, f_size]
-                        else:
-                            fc[0] = fc[0] + 1
-                            fc[1] = fc[1] + f_size
-                        file_types[f_suffix] = fc
+                        fc = file_types.setdefault(f_suffix, [0, 0])
+                        fc[0] += 1
+                        fc[1] += f_size
src/scanoss/inspection/policy_check/scanoss/copyleft.py (1)

56-69: Copyleft now always uses filtering mode; no way to opt out via API

The license_sources or DEFAULT_COPYLEFT_LICENSE_SOURCES default plus passing self.license_sources into ScanResultProcessor means Copyleft will always operate in filtering mode (using either the CLI‑provided list or the default list), and never fall back to the priority mode implemented in _select_licenses.

That seems aligned with “default to component‑level licenses” from the PR description, but it also means there is currently no way (from this API) to request the old priority behaviour for copyleft checks.

If you foresee a need to restore priority mode, you might want an explicit sentinel (e.g. license_sources=None meaning “priority mode”) instead of unconditionally substituting the default here.

Also applies to: 92-101

src/scanoss/inspection/utils/scan_result_processor.py (1)

65-81: License source selection logic looks consistent with design

The new license_sources plumbing and _select_licenses() behaviour are coherent:

When license_sources is truthy, you filter to those sources while still including unknown/None, which avoids silently dropping legacy/uncategorised data.

When license_sources is falsy, the original priority semantics are preserved via the ordered source preference, with a sensible fallback to “all licenses”.

Given that Copyleft now always passes a non‑empty list, this class will typically operate in filtering mode for that flow, which matches the PR’s intent.

If you want to tighten things later, you could:

Type‑annotate license_sources as list[str] | None, and

Validate against VALID_LICENSE_SOURCES at the boundaries.

But functionally this looks good.

Also applies to: 167-189, 316-361

src/scanoss/inspection/utils/license_utils.py (1)

25-27: OSADL-backed copyleft logic and filters look correct

The new LicenseUtil wiring is clean:

The include/exclude/explicit parsing is reset-safe and case-insensitive.

is_copyleft() applies the precedence described in the docstring and only falls through to OSADL when no filters match.

Deferring to Osadl for the default copyleft decision aligns with the new authoritative dataset.

You might eventually consider exposing include/exclude/explicit as sets of SPDX IDs directly on the public API (rather than only via comma-separated strings) for programmatic callers, but that’s purely a convenience; the current implementation is solid.

Also applies to: 34-36, 40-46, 47-75, 76-107

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4060c9f and 6db2eac.

📒 Files selected for processing (18)

CHANGELOG.md (1 hunks)
LICENSE (1 hunks)
README.md (1 hunks)
src/scanoss/__init__.py (1 hunks)
src/scanoss/cli.py (4 hunks)
src/scanoss/constants.py (1 hunks)
src/scanoss/data/osadl-copyleft.json (1 hunks)
src/scanoss/filecount.py (2 hunks)
src/scanoss/inspection/policy_check/scanoss/copyleft.py (4 hunks)
src/scanoss/inspection/utils/license_utils.py (1 hunks)
src/scanoss/inspection/utils/scan_result_processor.py (3 hunks)
src/scanoss/osadl.py (1 hunks)
src/scanoss/scanner.py (5 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)
src/scanoss/scanners/scanner_hfh.py (2 hunks)
src/scanoss/threadedscanning.py (3 hunks)
tests/test_osadl.py (1 hunks)
tests/test_policy_inspect.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (9)

src/scanoss/scanners/scanner_hfh.py (2)

src/scanoss/scanossgrpc.py (1)

folder_hash_scan (429-447)

src/scanoss/scanossbase.py (1)

print_stderr (45-49)

tests/test_policy_inspect.py (2)

src/scanoss/inspection/policy_check/scanoss/copyleft.py (2)

Copyleft (50-240)

run (220-240)

src/scanoss/inspection/policy_check/policy_check.py (2)

run (87-102)

PolicyStatus (33-44)

tests/test_osadl.py (2)

src/scanoss/osadl.py (2)

Osadl (31-130)

is_copyleft (108-130)

src/scanoss/inspection/utils/license_utils.py (1)

is_copyleft (76-107)

src/scanoss/scanner.py (3)

src/scanoss/file_filters.py (2)

get_filtered_files_from_folder (311-352)

get_filtered_files_from_files (354-404)

src/scanoss/threadedscanning.py (3)

stop_scanning (154-158)

queue_add (141-149)

run (168-191)

src/scanoss/scanners/scanner_hfh.py (1)

scan (126-152)

src/scanoss/inspection/utils/scan_result_processor.py (1)

src/scanoss/inspection/utils/license_utils.py (2)

LicenseUtil (30-116)

init (47-74)

src/scanoss/filecount.py (2)

src/scanoss/scanossbase.py (2)

print_msg (51-56)

print_trace (65-70)

src/scanoss/cli.py (1)

file_count (1310-1341)

src/scanoss/scanners/folder_hasher.py (2)

src/scanoss/scanossbase.py (1)

print_debug (58-63)

src/scanoss/utils/crc64.py (2)

CRC64 (29-96)

get_hash_buff (82-96)

src/scanoss/osadl.py (1)

src/scanoss/inspection/utils/license_utils.py (1)

is_copyleft (76-107)

src/scanoss/inspection/utils/license_utils.py (2)

src/scanoss/osadl.py (3)

Osadl (31-130)

print_debug (62-67)

is_copyleft (108-130)

src/scanoss/scanossbase.py (2)

ScanossBase (28-107)

print_debug (58-63)

🪛 GitHub Actions: Lint

src/scanoss/filecount.py

[error] 100-100: Ruff: PLR0912 Too many branches (15 > 12) in count_files method.

[error] 100-100: Ruff: PLR0915 Too many statements (54 > 50) in count_files method.

[error] 152-152: Ruff: PLC0206 Extracting value from dictionary without calling .items() in filecount.py.

🪛 LanguageTool

CHANGELOG.md

[uncategorized] ~17-~17: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... - Copyleft detection now uses [OSADL (Open Source Automation Development Lab)](https://ww...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

README.md

[uncategorized] ~140-~140: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...vecommons.org/licenses/by/4.0/) by the [Open Source Automation Development Lab (OSADL) eG](...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

[uncategorized] ~142-~142: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...g/). Attribution: A project by the Open Source Automation Development Lab (OSADL) eG. ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (7)

src/scanoss/__init__.py (1)

25-25: Version bump looks consistent with release.

__version__ = '1.41.0' matches the release branch and PR title; no further changes needed here.

README.md (1)

139-142: Dataset license notice and attribution look clear and sufficient.

The new section cleanly distinguishes the MIT license for the application from the CC-BY-4.0 license of the OSADL copyleft dataset, and provides proper attribution and source URL. No changes needed from a docs/UX standpoint.

tests/test_policy_inspect.py (2)

395-577: Coverage additions for license_sources look solid.

Good assertions across defaults, single/multiple sources, and output formats.

31-31: Incorrect review comment—proposed fix targets the wrong lines.

The review correctly identifies inconsistent imports in test_policy_inspect.py, but the fix direction is reversed. Line 31 (from scanoss.constants...) uses the correct import root; setup.cfg confirms the installed package name is scanoss (not src.scanoss). Lines 32–39 are the inconsistent ones and should be changed from src.scanoss.* to scanoss.* to match all other test files and the package configuration.

Likely an incorrect or invalid review comment.

src/scanoss/cli.py (1)

704-714: CLI wiring for -ls is correct and fully supported.

The action='extend' parameter is available in Python 3.8+, confirming your code works on Python 3.9 and later. No changes needed.

src/scanoss/data/osadl-copyleft.json (1)

1-133: Packaging verification confirmed—data file will be distributed correctly.

The OSADL dataset is properly configured: setup.cfg specifies [options.package_data] with * = data/*.txt, data/*.json, which matches osadl-copyleft.json. Combined with include_package_data = True, both wheel and source distributions will include the file. MANIFEST.in is not needed when package_data is explicitly configured.

src/scanoss/scanner.py (1)

785-789: Rewritten review comment is not needed — original review is based on incorrect assumptions.

The web search confirms that progress.bar.Bar and progress.spinner.Spinner already implement the context-manager protocol (__enter__ and __exit__ methods). The original code in src/scanoss/scanner.py (lines 785-789) is correct as written:

Bar can be used directly with with statements without additional wrapping

nullcontext() without arguments correctly returns None on entry, allowing the conditional check if bar: to work as intended

The original conditional pattern Bar(...) if condition else nullcontext() is valid and requires no modification

Likely an incorrect or invalid review comment.

coderabbitai · 2025-11-17T14:28:21Z

src/scanoss/filecount.py

+        spinner_ctx = Spinner('Searching ') if (not self.quiet and self.isatty) else nullcontext()
+
+        with spinner_ctx as spinner:
+            file_types = {}


⚠️ Potential issue | 🔴 Critical

Spinner context misuse; wrap instance and finish.

Same fix as in scanner.py.

- spinner_ctx = Spinner('Searching ') if (not self.quiet and self.isatty) else nullcontext() + spinner_ctx = nullcontext(Spinner('Searching ')) if (not self.quiet and self.isatty) else nullcontext() @@ - if spinner: - spinner.next() + if spinner: + spinner.next() + if spinner: + spinner.finish()

Also applies to: 146-148

coderabbitai · 2025-11-17T14:28:21Z

src/scanoss/osadl.py

+    def _load_copyleft_data(self) -> bool:
+        """
+        Load the embedded OSADL copyleft JSON file into class-level shared data.
+        Data is loaded only once and shared across all instances.
+
+        :return: True if successful, False otherwise
+        """
+        if Osadl._data_loaded:
+            return True
+
+        # OSADL copyleft license checklist from: https://www.osadl.org/Checklists
+        # Data source: https://www.osadl.org/fileadmin/checklists/copyleft.json
+        # License: CC-BY-4.0 (Creative Commons Attribution 4.0 International)
+        # Copyright: (C) 2017 - 2024 Open Source Automation Development Lab (OSADL) eG
+        try:
+            f_name = importlib_resources.files(__name__) / 'data/osadl-copyleft.json'
+            with importlib_resources.as_file(f_name) as f:
+                with open(f, 'r', encoding='utf-8') as file:
+                    data = json.load(file)
+        except Exception as e:
+            self.print_stderr(f'ERROR: Problem loading OSADL copyleft data: {e}')
+            return False
+
+        # Process copyleft data
+        copyleft = data.get('copyleft', {})
+        if not copyleft:
+            self.print_stderr('ERROR: No copyleft data found in OSADL JSON')
+            return False
+
+        # Store in class-level shared dictionary
+        for lic_id, status in copyleft.items():
+            # Normalize license ID (lowercase) for consistent lookup
+            lic_id_lc = lic_id.lower()
+            Osadl._shared_copyleft_data[lic_id_lc] = status
+
+        Osadl._data_loaded = True
+        self.print_debug(f'Loaded {len(Osadl._shared_copyleft_data)} OSADL copyleft entries')
+        return True
+


⚠️ Potential issue | 🟠 Major

OSADL load failure silently degrades to “no copyleft” for all licenses

Right now, if _load_copyleft_data() fails (e.g. missing or unreadable osadl-copyleft.json), it logs an error and returns False, but __init__ ignores that return value. Because _shared_copyleft_data remains empty, is_copyleft() will simply return False for every license.

For a copyleft policy check this is risky: a packaging or runtime issue would silently under‑report copyleft instead of failing fast.

Consider one of the following:

Raise an exception in __init__ when _load_copyleft_data() returns False, so callers can surface a hard failure; or

Track a separate “load_failed” flag and have is_copyleft() emit a stronger error or choose a safer default (e.g. treat everything as copyleft, or require explicit filters) rather than silently returning False.

This would make failures in the authoritative dataset much more visible to users.

Also applies to: 108-130

🤖 Prompt for AI Agents

In src/scanoss/osadl.py around lines 69-107 (and also applies to 108-130), _load_copyleft_data() currently returns False on load error but __init__ ignores that, causing silent under-reporting of copyleft; change the initializer to fail-fast: call _load_copyleft_data() in __init__, and if it returns False raise a clear RuntimeError (or custom exception) with the underlying error message so callers see the failure; ensure any necessary cleanup or state (e.g. _data_loaded remains False and _shared_copyleft_data empty) is preserved before raising; alternatively, if you prefer non-exception behavior, implement a _load_failed flag set by _load_copyleft_data() on error and update is_copyleft() to check this flag and raise an error or apply a safe default—pick one approach and implement it consistently (update docstrings and error messages accordingly).

coderabbitai · 2025-11-17T14:28:21Z

src/scanoss/scanner.py

+        spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()
+
+        with spinner_ctx as spinner:
+            save_wfps_for_print = not self.no_wfp_file or not self.threaded_scan


⚠️ Potential issue | 🔴 Critical

Spinner is not a context manager; wrap the instance and ensure cleanup.

Using Spinner() directly in a with can raise at runtime if it lacks __enter__/__exit__. Also, when nullcontext() is used, no finish() is called, leaving the cursor state messy.

Apply this pattern:

Wrap the real spinner with nullcontext(Spinner(...)).

Explicitly finish() before exiting the block.

- spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext() + spinner_ctx = nullcontext(Spinner('Fingerprinting ')) if (not self.quiet and self.isatty) else nullcontext() with spinner_ctx as spinner: @@ - if self.threaded_scan and scan_block != '': - self.threaded_scan.queue_add(scan_block) # Make sure all files have been submitted + if self.threaded_scan and scan_block != '': + self.threaded_scan.queue_add(scan_block) # Make sure all files have been submitted + if spinner: + spinner.finish()

Also applies to: 423-425

🤖 Prompt for AI Agents

In src/scanoss/scanner.py around lines 367-370, the code uses Spinner(...) directly in a with-statement which fails if Spinner isn't a context manager and also leaves cursor state un-restored when nullcontext() is used; change creation to wrap the real Spinner inside nullcontext(Spinner(...)) so the with always yields a spinner object (e.g. nullcontext(Spinner(...)) when using a spinner, nullcontext(Spinner(...)) when quiet is false/true accordingly), then inside the with-block ensure you call spinner.finish() (or equivalent cleanup) before exiting the block; apply the same change to the similar block at lines 423-425.

coderabbitai · 2025-11-17T14:28:21Z

src/scanoss/scanner.py

+        spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()
+
+        with spinner_ctx as spinner:
+            save_wfps_for_print = not self.no_wfp_file or not self.threaded_scan


⚠️ Potential issue | 🔴 Critical

Same spinner issue in scan_files; wrap and call finish().

Mirror the fix in scan_files().

- spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext() + spinner_ctx = nullcontext(Spinner('Fingerprinting ')) if (not self.quiet and self.isatty) else nullcontext() @@ - if self.threaded_scan and scan_block != '': - self.threaded_scan.queue_add(scan_block) # Make sure all files have been submitted + if self.threaded_scan and scan_block != '': + self.threaded_scan.queue_add(scan_block) # Make sure all files have been submitted + if spinner: + spinner.finish()

Also applies to: 693-694

🤖 Prompt for AI Agents

In src/scanoss/scanner.py around lines 637-640 (and similarly 693-694), the Spinner is being used via a context but not guaranteed to have its finish() method called; change the usage so the spinner is assigned from the context (e.g., spinner_ctx = Spinner(...) if (not self.quiet and self.isatty) else nullcontext()), enter the context to get spinner, and ensure after the work completes you explicitly call spinner.finish() when spinner is an actual Spinner instance (use try/finally or check type) so the spinner always gets finished; apply the same fix to the block at lines 693-694.

coderabbitai · 2025-11-17T14:28:22Z

src/scanoss/scanner.py

+        spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()
+
+        with spinner_ctx as spinner:
+            to_fingerprint_files = file_filters.get_filtered_files_from_folder(scan_dir)


⚠️ Potential issue | 🔴 Critical

Spinner context in wfp_folder: wrap and finish.

Same fix applies here.

- spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext() + spinner_ctx = nullcontext(Spinner('Fingerprinting ')) if (not self.quiet and self.isatty) else nullcontext() @@ - for file in to_fingerprint_files: + for file in to_fingerprint_files: if spinner: spinner.next() @@ - wfps += self.winnowing.wfp_for_file(str(abs_path), file) + wfps += self.winnowing.wfp_for_file(str(abs_path), file) + if spinner: + spinner.finish()

Also applies to: 1065-1069

🤖 Prompt for AI Agents

In src/scanoss/scanner.py around lines 1060-1063 (and similarly for 1065-1069), the spinner context is created conditionally but not guaranteed to be properly finished; replace the conditional one-liner with logic that, when a Spinner() is instantiated, explicitly starts it before the block and ensures finish() is called in a finally block (or use a proper context manager that calls finish on exit); if using nullcontext for the non-TTY case keep it, but for the real Spinner ensure it is entered/started and finished so the spinner is wrapped and cleaned up correctly.

coderabbitai · 2025-11-17T14:28:22Z

src/scanoss/scanners/scanner_hfh.py

+        spinner_ctx = Spinner('Scanning folder...')
+
+        with spinner_ctx as spinner:
+            grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,))
+            grpc_thread.start()

-        def spin():
-            while not stop_spinner:
+            while grpc_thread.is_alive():
                spinner.next()
                time.sleep(0.1)

-        spinner_thread = threading.Thread(target=spin)
-        spinner_thread.start()
-
-        try:
-            response = self.client.folder_hash_scan(hfh_request, self.use_grpc)
-            if response:
-                self.scan_results = response
-        finally:
-            stop_spinner = True
-            spinner_thread.join()
-            spinner.finish()
+            grpc_thread.join()



⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Spinner + gRPC thread pattern improves UX and cleanup.

Running folder_hash_scan in a background thread while driving a Spinner in the foreground and using a with Spinner(...) as spinner: block ensures:

The spinner is advanced responsively while the gRPC call runs.

The spinner is properly finalized (and the cursor restored) even if an exception interrupts the loop, addressing the Ctrl+C cursor issue.

grpc_thread.join() guarantees scan_results is fully populated (or set to None by the helper) before returning.

Please just double-check that the progress.spinner.Spinner version you depend on supports context manager usage in this way, to avoid runtime surprises if the package is upgraded or replaced.

🌐 Web query:

Verify whether the `Spinner` class from the `progress` Python package supports context manager usage (`with Spinner(...) as spinner:`) in the version used by this project.

💡 Result:

I need the specific version of the progress package that your project uses (or the project's requirements file / pyproject.toml / Pipfile). Which version or repo/file should I check?

I need to find the version of the progress package used in the project first. Let me search the codebase for dependency specifications.

--- 🏁 Script executed: ```shell #!/bin/bash # Find dependency specification files fd -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "Pipfile" -o -name "poetry.lock" \)

Length of output: 170

Let me correct that and search for dependency files with a simpler approach:

--- 🏁 Script executed: ```shell #!/bin/bash # Find dependency specification files find . -maxdepth 2 \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" \)

Length of output: 127

Now let me extract the progress package version from the requirements files:

--- 🏁 Script executed: ```shell #!/bin/bash # Search for progress package in main requirements files echo "=== requirements.txt ===" && cat requirements.txt && \ echo -e "\n=== pyproject.toml ===" && cat pyproject.toml

Length of output: 834

Perfect! The project has progress as a dependency without a version pin (meaning any version). Now let me search for information about whether the Spinner class supports context manager usage:

Let me search more specifically for the progress.spinner.Spinner class and its context manager support:

Let me search more specifically for the progress package's Spinner implementation:

Let me check the actual source code structure of the progress package to verify if Spinner inherits the context manager capability:

Perfect! I found the source code. Spinner inherits from Infinite, not from Bar. Now let me verify if the Infinite class supports context manager usage:

Based on my verification, I have found a significant concern with the review comment:

Spinner does NOT appear to support context manager usage.

The source code shows that Spinner inherits from Infinite, not from Bar. While the progress package documentation explicitly demonstrates Bar as a context manager with with Bar(...) as bar: syntax, Spinner is only documented with traditional loop usage: spinner = Spinner('Loading ') in a while loop. No official documentation or examples show Spinner being used as a context manager.

The code in the review attempts to use with Spinner(...) as spinner:, which is not supported by the progress.spinner.Spinner class and will raise an AttributeError at runtime.

Fix: Use the traditional loop pattern or wrap in explicit context manager.

Replace:

spinner_ctx = Spinner('Scanning folder...') with spinner_ctx as spinner: grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,)) grpc_thread.start() while grpc_thread.is_alive(): spinner.next() time.sleep(0.1) grpc_thread.join()

With:

spinner = Spinner('Scanning folder...') try: grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,)) grpc_thread.start() while grpc_thread.is_alive(): spinner.next() time.sleep(0.1) grpc_thread.join() finally: spinner.finish()

🤖 Prompt for AI Agents

In src/scanoss/scanners/scanner_hfh.py around lines 140 to 151, the code uses Spinner as a context manager but progress.spinner.Spinner does not support the context-manager protocol; replace the with-block by instantiating Spinner directly, start the grpc thread, run the existing loop calling spinner.next() while the thread is alive, and ensure in a finally block you call spinner.finish() (and join the thread if not already) so the spinner is always cleaned up.

coderabbitai · 2025-11-17T14:28:22Z

tests/test_osadl.py

+    def test_multiple_instances_share_data(self):
+        """Test that multiple instances share the same class-level data"""
+        osadl1 = Osadl()
+        osadl2 = Osadl()
+
+        # Both instances should see data loaded by first instance
+        result1 = osadl1.is_copyleft('GPL-2.0-only')
+        self.assertTrue(result1)
+        self.assertTrue(Osadl._data_loaded)
+
+        # Second instance uses the same class-level shared data
+        result2 = osadl2.is_copyleft('MIT')
+        self.assertFalse(result2)
+
+        # Verify both instances reference the same class-level data
+        self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data)
+


⚠️ Potential issue | 🟡 Minor

Fix tautological assertion in test_multiple_instances_share_data

The last assertion:

self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data)

is always true and doesn’t verify that multiple instances share the same data. To test sharing more directly, consider:

self.assertIs(osadl1._shared_copyleft_data, osadl2._shared_copyleft_data)

or equivalently comparing id(...) between the two instances.

🤖 Prompt for AI Agents

In tests/test_osadl.py around lines 83 to 99, the final assertion is tautological (assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data)) and therefore doesn't verify instance sharing; replace it with an assertion that compares the two instances' references (for example assertIs(osadl1._shared_copyleft_data, osadl2._shared_copyleft_data) or assertEqual(id(osadl1._shared_copyleft_data), id(osadl2._shared_copyleft_data))) so the test actually verifies that both instances reference the same class-level data.

isasmendiagus added 5 commits November 14, 2025 17:42

fix(cli): terminal cursor disappears after aborting scan with ctrl+c

0a561dc

feat(policies): add support for license source filtering in copyleft …

6242fdb

…inspection

feat(licenses): integrate OSADL copyleft data

63ae916

chore(linter): improve comment formatting

ba806b6

chore: bump version to 1.41.0

6db2eac

isasmendiagus closed this Nov 17, 2025

isasmendiagus deleted the releases/v1.41.0 branch November 17, 2025 14:20

coderabbitai bot reviewed Nov 17, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Releases/v1.41.0 #169

Releases/v1.41.0 #169

Uh oh!

isasmendiagus commented Nov 17, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 17, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 17, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Nov 17, 2025

Uh oh!

coderabbitai bot Nov 17, 2025

Uh oh!

coderabbitai bot Nov 17, 2025

Uh oh!

coderabbitai bot Nov 17, 2025

Uh oh!

coderabbitai bot Nov 17, 2025

Uh oh!

coderabbitai bot Nov 17, 2025

Uh oh!

coderabbitai bot Nov 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Releases/v1.41.0 #169

Releases/v1.41.0 #169

Uh oh!

Conversation

isasmendiagus commented Nov 17, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

New Features

Changed

Documentation

Bug Fixes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Nov 17, 2025

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

isasmendiagus commented Nov 17, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 17, 2025 •

edited

Loading