Skip to content

Conversation

@isasmendiagus
Copy link
Contributor

@isasmendiagus isasmendiagus commented Nov 17, 2025

Summary by CodeRabbit

  • New Features

    • Added --license-sources option to filter copyleft inspection by source types (component_declared, license_file, file_header, file_spdx_tag, scancode)
    • Upgraded copyleft database to OSADL authoritative data, expanding coverage from 21 to 32 licenses with support for or-later variants
    • Changed copyleft inspection to default to component-level sources only, reducing noise
  • Bug Fixes

    • Fixed terminal cursor disappearing after aborting scan with Ctrl+C
  • Documentation

    • Added dataset attribution (CC-BY-4.0) to README

@coderabbitai
Copy link

coderabbitai bot commented Nov 17, 2025

Walkthrough

This PR introduces configurable license source filtering for copyleft inspection, integrates OSADL dataset-backed license checking, updates to version 1.41.0, refactors spinner lifecycle management across multiple scanning paths using context managers, and adds comprehensive test coverage.

Changes

Cohort / File(s) Change Summary
Documentation & Metadata
CHANGELOG.md, README.md
Added Unreleased section documenting new --license-sources option, OSADL dataset integration, changed copyleft defaults to component-level only, and Ctrl+C cursor fix. Added Dataset License Notice with CC-BY-4.0 attribution for OSADL data.
Version Bump
src/scanoss/__init__.py
Incremented __version__ from '1.40.1' to '1.41.0'.
CLI Command Wiring
src/scanoss/cli.py
Added new -ls/--license-sources option to inspect copyleft command. Imported and threaded DEFAULT_COPYLEFT_LICENSE_SOURCES and VALID_LICENSE_SOURCES through the inspection path. Updated Copyleft constructor call to pass license_sources parameter.
Constants
src/scanoss/constants.py
Introduced VALID_LICENSE_SOURCES list (component_declared, license_file, file_header, file_spdx_tag, scancode) and DEFAULT_COPYLEFT_LICENSE_SOURCES list (component_declared, license_file).
OSADL Dataset & Class
src/scanoss/data/osadl-copyleft.json, src/scanoss/osadl.py
Added new OSADL copyleft dataset JSON file with 32+ license entries (including -or-later variants). Created new Osadl class with class-level caching, is_copyleft() method, and automatic data loading from embedded resource.
License Utilities Refactor
src/scanoss/inspection/utils/license_utils.py
Replaced hardcoded copyleft list with OSADL-backed filtering. Introduced explicit/include/exclude license filters as instance attributes. Reworked is_copyleft() logic to honor explicit list first, then include, then exclude, else delegate to OSADL. Removed BASE_OSADL_URL and get_osadl_url().
Copyleft Inspection
src/scanoss/inspection/policy_check/scanoss/copyleft.py
Added license_sources parameter to Copyleft.__init__, stored as instance attribute with default fallback. Updated docstrings and passed license_sources to ScanResultProcessor.
Result Processing
src/scanoss/inspection/utils/scan_result_processor.py
Added optional license_sources parameter to constructor. Introduced _select_licenses() method supporting filtering mode (when sources specified) and priority mode (original behavior). Replaces per-license-order processing with source-aware selection.
Spinner Context Management
src/scanoss/filecount.py, src/scanoss/scanner.py, src/scanoss/scanners/folder_hasher.py, src/scanoss/scanners/scanner_hfh.py, src/scanoss/threadedscanning.py
Refactored spinner/progress-bar lifecycle across scanning methods to use context managers (with nullcontext fallback). Added _execute_grpc_scan() helper in ScannerHFH for threaded gRPC execution. Added __del__ and atexit registration in ThreadedScanning for cleanup on exit.
Test Coverage
tests/test_osadl.py, tests/test_policy_inspect.py
Added new test module for Osadl class with initialization, copyleft checks, case-insensitivity, and class-level caching validation. Extended test_policy_inspect.py with extensive license source filtering scenarios (default, individual sources, multiple sources, output formats).

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI
    participant Copyleft
    participant ScanResultProcessor
    participant LicenseUtil
    participant Osadl

    User->>CLI: scanoss inspect copyleft --license-sources component_declared
    CLI->>Copyleft: __init__(license_sources=['component_declared'])
    Copyleft->>ScanResultProcessor: __init__(license_sources=['component_declared'])
    
    loop For each scan result
        ScanResultProcessor->>LicenseUtil: Process component licenses
        LicenseUtil->>Osadl: is_copyleft(spdx_id)
        Osadl-->>LicenseUtil: bool (from cached OSADL data)
        LicenseUtil->>ScanResultProcessor: copyleft status
        
        alt license_sources filtering mode
            ScanResultProcessor->>ScanResultProcessor: _select_licenses(filters by source)
        else priority mode (no filtering)
            ScanResultProcessor->>ScanResultProcessor: _select_licenses(default priority order)
        end
        
        ScanResultProcessor-->>ScanResultProcessor: append filtered licenses
    end
    
    Copyleft-->>CLI: inspection results
    CLI-->>User: formatted output
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • License filtering logic: The new _select_licenses() method in ScanResultProcessor introduces dual-mode behavior (filtering vs. priority)—verify correctness of both code paths and edge cases (empty sources, None, etc.).
  • OSADL data integration: Confirm OSADL dataset loading from embedded resource, license ID normalization to lowercase, and handling of missing data.
  • Spinner context refactoring: Multiple files use new nullcontext pattern for lifecycle management; verify proper initialization, cleanup on success/error, and no resource leaks or unintended UI behavior changes.
  • API changes: Constructor signature changes to Copyleft and ScanResultProcessor with new optional license_sources parameter—ensure backward compatibility and correct parameter threading through all call sites.
  • Test coverage in test_policy_inspect.py: Extensive test additions; verify all license source scenarios are properly tested and assertions are correct.

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • eeisegn

Poem

🐰 Hop, hop, sources bloom!
OSADL data finds its room,
Copyleft now picks and chooses,
While spinners dance in context's muses,
License checks spin true and bright!

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'release/1.41.0' is vague and generic, indicating only the version number without describing the actual changes or features introduced in this release. Consider using a more descriptive title that summarizes key changes, such as 'Add copyleft license source filtering and update to OSADL dataset' to make the PR purpose clear to reviewers.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 87.04% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch release/1.41.0

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example:

"Create a concise high-level summary as a bullet-point list. Then include a Markdown table showing lines added and removed by each contributing author."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

SCANOSS SCAN Completed 🚀

  • Detected components: 1
  • Undeclared components: 0
  • Declared components: 1
  • Detected files: 87
  • Detected files undeclared: 0
  • Detected files declared: 87
  • Licenses detected: 1
  • Licenses detected with copyleft: 0
  • Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (9)
src/scanoss/filecount.py (1)

109-109: Minor: Unnecessary change from f-string to regular string.

The change from f'ERROR: Please specify a folder to scan' to a regular string literal is cosmetic since no variables are interpolated. While not harmful, it's an optional refactor.

src/scanoss/scanners/scanner_hfh.py (1)

113-124: Consider logging the exception details for debugging.

The exception handler at line 123 prints an error message but doesn't include the full traceback. For debugging gRPC issues, trace-level logging with full exception details would be helpful.

     def _execute_grpc_scan(self, hfh_request: Dict) -> None:
         """
         Execute folder hash scan.
 
         Args:
             hfh_request: Request dictionary for the gRPC call
         """
         try:
             self.scan_results = self.client.folder_hash_scan(hfh_request, self.use_grpc)
         except Exception as e:
             self.base.print_stderr(f'Error during folder hash scan: {e}')
+            self.base.print_trace(f'Full traceback: {e}', exc_info=True)
             self.scan_results = None
src/scanoss/scanner.py (1)

29-31: Spinner/nullcontext refactor is sound; consider a small helper to reduce duplication

Using Spinner('Fingerprinting ') only when not self.quiet and self.isatty, with nullcontext() otherwise, plus with spinner_ctx as spinner: and if spinner: spinner.next() keeps behavior correct and fixes cleanup on exceptions/CTRL+C. The same pattern now appears in scan_folder, scan_files, and wfp_folder, so a private helper like _spinner_ctx(label: str) could DRY this up if you decide to touch this again.

Also applies to: 367-425, 624-707, 1058-1078

src/scanoss/inspection/policy_check/scanoss/copyleft.py (1)

29-30: License source wiring looks correct; empty-list behavior is worth confirming

Importing DEFAULT_COPYLEFT_LICENSE_SOURCES, storing self.license_sources = license_sources or DEFAULT_COPYLEFT_LICENSE_SOURCES, and passing that into ScanResultProcessor lines up cleanly with _select_licenses’ filtering mode, so the new default (“component_declared, license_fileplusunknown) and -ls ...` overrides should work as intended.

Note that using license_sources or DEFAULT_COPYLEFT_LICENSE_SOURCES means an explicit empty list will also fall back to the default set. If you ever need to distinguish “no filtering, use priority mode” from “use default sources”, you’ll want a sentinel (e.g., None) instead of relying on truthiness. If treating [] as “use defaults” is intentional, this is fine as-is.

Also applies to: 68-69, 92-101

src/scanoss/cli.py (2)

704-713: Clarify/verify argparse extend usage for --license-sources

The option wiring is logically sound (multiple -ls values, validated against VALID_LICENSE_SOURCES, default handled in downstream logic via None). Two things to verify/consider:

  • action='extend' is only available in Python 3.8+. If this project still supports 3.7, this will break argument parsing; otherwise it’s fine but should match the documented supported versions.
  • Right now the default behavior (no -ls) is implemented in business logic rather than via default=.... That’s OK, but the help string is the only place where the default list appears; if the constant changes, help stays correct, but tests/docs should cover the implied default behavior.

If 3.7 support is still required, switching to action='append' plus flattening later would avoid the compatibility concern.


1756-1769: Defensive access to license_sources when calling Copyleft

Passing license_sources=args.license_sources cleanly threads the CLI option through to Copyleft. However, inspect_copyleft can be called programmatically with a custom Namespace that may not define license_sources, which would raise AttributeError.

To make this more robust without affecting CLI behavior, consider:

-            license_sources=args.license_sources,  # License sources to check (list)
+            license_sources=getattr(args, "license_sources", None),  # License sources to check (list)

Also consider documenting license_sources in the function docstring alongside include/exclude/explicit.

src/scanoss/inspection/utils/scan_result_processor.py (2)

65-82: Constructor change is reasonable; consider normalizing license_sources

Adding license_sources to ScanResultProcessor.__init__ and storing it on the instance is fine and backwards compatible when callers pass None.

You might want to normalize it once here (e.g., convert to a set or ensure it’s None or a non‑empty list) so _select_licenses doesn’t have to worry about unexpected types or accidental [] vs None differences:

-        self.license_sources = license_sources
+        self.license_sources = list(license_sources) if license_sources else None

This keeps the “priority mode when not provided” behavior while guarding against non‑iterables being passed in.


316-362: License selection logic: semantics are sound, but tighten edge‑case handling

The two‑mode behavior is clear and useful:

  • Filtering mode when license_sources is provided, including 'unknown' and None.
  • Priority mode otherwise, preferring component_declared → license_file → file_header → scancode, with a full fallback list.

A couple of refinements to consider:

  1. Empty license_sources vs None
    Currently any truthy value triggers filtering mode; an explicit empty list (e.g., license_sources=[] from non‑CLI callers) falls back to priority mode. That’s slightly surprising. If you want “explicitly no sources” to mean “no licenses”, you could handle it explicitly:

    if self.license_sources is not None:
        if not self.license_sources:
            return []
        ...
  2. Minor readability tweak
    The filter comprehension can be slightly clearer:

  •    if self.license_sources:
    
  •        sources_to_include = set(self.license_sources) | {'unknown'}
    
  •        return [lic for lic in licenses_data
    
  •                if lic.get('source') in sources_to_include or lic.get('source') is None]
    
  •    if self.license_sources is not None:
    
  •        sources_to_include = set(self.license_sources) | {'unknown'}
    
  •        return [
    
  •            lic for lic in licenses_data
    
  •            if lic.get('source') in sources_to_include or lic.get('source') is None
    
  •        ]
    
    
    

Functionally you’re in good shape; these are just to make behavior around edge cases more explicit.

src/scanoss/osadl.py (1)

108-131: is_copyleft behavior is sensible; consider explicit “data not loaded” guard

The lookup and normalization logic is straightforward:

  • Empty/None SPDX IDs return False.
  • Unknown IDs log a debug message and return False.
  • Status values are treated as copyleft when they start with "Yes" (covers "Yes" and "Yes (restricted)").

One optional robustness improvement:

-        status = Osadl._shared_copyleft_data.get(spdx_id_lc)
+        if not Osadl._data_loaded:
+            self.print_debug('OSADL copyleft data not loaded; treating as non-copyleft')
+            return False
+
+        status = Osadl._shared_copyleft_data.get(spdx_id_lc)

This makes it explicit when results are falling back due to a load failure rather than a missing SPDX ID.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4060c9f and 6db2eac.

📒 Files selected for processing (18)
  • CHANGELOG.md (1 hunks)
  • LICENSE (1 hunks)
  • README.md (1 hunks)
  • src/scanoss/__init__.py (1 hunks)
  • src/scanoss/cli.py (4 hunks)
  • src/scanoss/constants.py (1 hunks)
  • src/scanoss/data/osadl-copyleft.json (1 hunks)
  • src/scanoss/filecount.py (2 hunks)
  • src/scanoss/inspection/policy_check/scanoss/copyleft.py (4 hunks)
  • src/scanoss/inspection/utils/license_utils.py (1 hunks)
  • src/scanoss/inspection/utils/scan_result_processor.py (3 hunks)
  • src/scanoss/osadl.py (1 hunks)
  • src/scanoss/scanner.py (5 hunks)
  • src/scanoss/scanners/folder_hasher.py (1 hunks)
  • src/scanoss/scanners/scanner_hfh.py (2 hunks)
  • src/scanoss/threadedscanning.py (3 hunks)
  • tests/test_osadl.py (1 hunks)
  • tests/test_policy_inspect.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
src/scanoss/scanner.py (3)
src/scanoss/file_filters.py (2)
  • get_filtered_files_from_folder (311-352)
  • get_filtered_files_from_files (354-404)
src/scanoss/threadedscanning.py (3)
  • stop_scanning (154-158)
  • queue_add (141-149)
  • run (168-191)
src/scanoss/scanners/scanner_hfh.py (1)
  • scan (126-152)
src/scanoss/inspection/policy_check/scanoss/copyleft.py (1)
src/scanoss/inspection/utils/scan_result_processor.py (1)
  • ScanResultProcessor (53-412)
src/scanoss/scanners/scanner_hfh.py (2)
src/scanoss/scanossgrpc.py (1)
  • folder_hash_scan (429-447)
src/scanoss/scanossbase.py (1)
  • print_stderr (45-49)
src/scanoss/filecount.py (2)
src/scanoss/scanossbase.py (2)
  • print_msg (51-56)
  • print_trace (65-70)
src/scanoss/cli.py (1)
  • file_count (1310-1341)
src/scanoss/scanners/folder_hasher.py (2)
src/scanoss/scanossbase.py (1)
  • print_debug (58-63)
src/scanoss/utils/crc64.py (2)
  • CRC64 (29-96)
  • get_hash_buff (82-96)
src/scanoss/inspection/utils/scan_result_processor.py (1)
src/scanoss/inspection/utils/license_utils.py (2)
  • LicenseUtil (30-116)
  • init (47-74)
tests/test_policy_inspect.py (2)
src/scanoss/inspection/policy_check/scanoss/copyleft.py (2)
  • Copyleft (50-240)
  • run (220-240)
src/scanoss/inspection/policy_check/policy_check.py (2)
  • run (87-102)
  • PolicyStatus (33-44)
src/scanoss/osadl.py (1)
src/scanoss/inspection/utils/license_utils.py (1)
  • is_copyleft (76-107)
src/scanoss/inspection/utils/license_utils.py (2)
src/scanoss/osadl.py (3)
  • Osadl (31-130)
  • print_debug (62-67)
  • is_copyleft (108-130)
src/scanoss/scanossbase.py (2)
  • ScanossBase (28-107)
  • print_debug (58-63)
tests/test_osadl.py (2)
src/scanoss/osadl.py (2)
  • Osadl (31-130)
  • is_copyleft (108-130)
src/scanoss/inspection/utils/license_utils.py (1)
  • is_copyleft (76-107)
🪛 GitHub Actions: Lint
src/scanoss/filecount.py

[error] 100-100: PLR0912 Too many branches (15 > 12).


[error] 100-100: PLR0915 Too many statements (54 > 50).


[error] 152-152: PLC0206 Extracting value from dictionary without calling .items().

🪛 LanguageTool
README.md

[uncategorized] ~140-~140: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...vecommons.org/licenses/by/4.0/) by the [Open Source Automation Development Lab (OSADL) eG](...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~142-~142: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...g/). Attribution: A project by the Open Source Automation Development Lab (OSADL) eG. ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

CHANGELOG.md

[uncategorized] ~17-~17: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... - Copyleft detection now uses [OSADL (Open Source Automation Development Lab)](https://ww...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (15)
src/scanoss/__init__.py (1)

25-25: Version bump is appropriate for the release scope.

The minor version bump from 1.40.1 to 1.41.0 aligns with the feature additions and improvements introduced in this PR (new CLI option, OSADL data, progress bar refactors, etc.).

src/scanoss/filecount.py (1)

114-148: LGTM! Clean refactoring to context-managed spinner.

The refactoring properly uses nullcontext() when no spinner is needed and ensures cleanup across all paths. The conditional if spinner: check before calling spinner.next() is appropriate.

README.md (1)

139-142: LGTM! Proper attribution for OSADL dataset.

The license notice correctly attributes the OSADL copyleft license dataset with appropriate licensing information (CC-BY-4.0) and source link.

LICENSE (1)

21-31: LGTM! Proper dual-license attribution.

The addition clearly separates the MIT-licensed code from the CC-BY-4.0 licensed OSADL dataset, with appropriate references to the data file location.

src/scanoss/data/osadl-copyleft.json (1)

1-133: LGTM! Well-structured OSADL copyleft dataset.

The JSON file includes proper metadata with license attribution (CC-BY-4.0), copyright notice, disclaimer, and timestamp. The copyleft mapping covers 121 license identifiers with appropriate status values.

src/scanoss/constants.py (1)

20-22: LGTM! Well-defined license source constants.

The constants appropriately define valid license sources and sensible defaults for copyleft inspection (focusing on component-declared and license file sources).

tests/test_policy_inspect.py (2)

31-31: LGTM! Proper import of new constants for testing.


393-645: Excellent comprehensive test coverage for license source filtering!

The test suite thoroughly covers:

  • Default behavior and explicit None handling
  • Individual license sources (component_declared, license_file, file_header, scancode)
  • Multiple source combinations
  • All valid sources
  • Integration with markdown output
  • Interaction with include/exclude filters
  • No-copyleft scenarios

The tests validate both the filtering logic and integration with other features.

src/scanoss/scanners/folder_hasher.py (1)

160-191: Context-managed progress bar looks correct

Wrapping the hashing loop in with Bar(...) as bar: keeps behavior the same while ensuring the bar is cleaned up on exceptions/KeyboardInterrupt, and bar.next() is still called once per file (even on errors). No issues spotted in the refactor.

src/scanoss/scanner.py (1)

785-852: Bar/nullcontext handling in scan_wfp_file is correct

bar_ctx = Bar('Scanning', max=file_count) if (not self.quiet and self.isatty) else nullcontext() plus with bar_ctx as bar: and if bar: bar.next(...) gives you a progress bar only in interactive, non-quiet runs and guarantees teardown on errors. The batching/scan logic around it is unchanged, so behavior should remain identical aside from the more robust bar lifecycle.

CHANGELOG.md (1)

10-28: Unreleased notes accurately describe the new copyleft behavior

The --license-sources option, OSADL switch, expanded copyleft coverage, new defaults, and the Ctrl+C cursor fix are clearly explained and aligned with the implementation. No changes needed here.

src/scanoss/inspection/utils/license_utils.py (1)

25-27: OSADL-based copyleft logic and filter precedence look correct

Initializing Osadl once per LicenseUtil, resetting the include/exclude/explicit sets in init(), and applying precedence in is_copyleft as:

  1. explicit-only if provided,
  2. include overrides (force copyleft),
  3. exclude overrides (force non-copyleft),
  4. otherwise defer to OSADL,

matches the intended behavior and plays nicely with repeated init() calls. This is a clear improvement over the previous hard-coded list approach.

Also applies to: 34-36, 40-47, 47-76, 76-107

src/scanoss/cli.py (1)

56-69: New copyleft constants import looks consistent

Importing DEFAULT_COPYLEFT_LICENSE_SOURCES and VALID_LICENSE_SOURCES here is appropriate and keeps all inspect‑related defaults centralized in constants.py. No issues from this change alone.

src/scanoss/inspection/utils/scan_result_processor.py (1)

161-190: Use of _select_licenses in _append_license_to_component is correct

The new call to _select_licenses(new_component['licenses']) is placed after the early‑return when licenses is missing, so it won’t see None and won’t regress the existing warning path. Looping over selected_licenses preserves the previous counting behavior while allowing the new filtering/priority semantics.

No functional issues spotted here.

src/scanoss/osadl.py (1)

31-54: Overall Osadl design and shared caching look good

The Osadl class cleanly encapsulates OSADL copyleft data, and the class‑level _shared_copyleft_data/_data_loaded pattern avoids repeated file I/O across instances. Debug printing via print_debug and print_stderr is consistent with the rest of the codebase.

No structural issues here.

@github-actions
Copy link

SCANOSS SCAN Completed 🚀

  • Detected components: 1
  • Undeclared components: 0
  • Declared components: 1
  • Detected files: 86
  • Detected files undeclared: 0
  • Detected files declared: 86
  • Licenses detected: 1
  • Licenses detected with copyleft: 0
  • Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

@isasmendiagus isasmendiagus merged commit a156202 into main Nov 17, 2025
4 of 6 checks passed
@isasmendiagus isasmendiagus deleted the release/1.41.0 branch November 17, 2025 16:06
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
tests/test_osadl.py (1)

83-99: Fix the vacuous assertion to verify shared data.

Line 98 contains self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data) which always passes. To properly test that multiple instances share the same class-level data, use:

-        # Verify both instances reference the same class-level data
-        self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data)
+        # Verify both instances reference the same class-level shared data
+        self.assertIs(osadl1._shared_copyleft_data, osadl2._shared_copyleft_data)
src/scanoss/osadl.py (1)

73-73: Critical: Fix resource loading path.

Using __name__ (which resolves to "scanoss.osadl") will look for the data file at scanoss/osadl/data/osadl-copyleft.json, but the actual file is at scanoss/data/osadl-copyleft.json. This will cause data loading to fail silently.

Use __package__ instead to anchor at the package level:

-            f_name = importlib_resources.files(__name__) / 'data/osadl-copyleft.json'
+            f_name = importlib_resources.files(__package__) / 'data/osadl-copyleft.json'
🧹 Nitpick comments (1)
src/scanoss/scanner.py (1)

450-450: Consider fixing typos in error messages.

While reviewing, I noticed two typos in error messages (pre-existing, but worth fixing):

  • Line 450: "encounted" → "encountered"
  • Line 940: "uncounted" → "encountered"

Apply this diff:

-                self.print_stderr('Warning: Some errors encounted while scanning. Results might be incomplete.')
+                self.print_stderr('Warning: Some errors encountered while scanning. Results might be incomplete.')

And:

-                            'Warning: Some errors uncounted while scanning. Results might be incomplete.'
+                            'Warning: Some errors encountered while scanning. Results might be incomplete.'

Also applies to: 940-940

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6db2eac and a42511a.

📒 Files selected for processing (9)
  • CHANGELOG.md (2 hunks)
  • README.md (1 hunks)
  • src/scanoss/__init__.py (1 hunks)
  • src/scanoss/data/osadl-copyleft.json (1 hunks)
  • src/scanoss/inspection/utils/license_utils.py (1 hunks)
  • src/scanoss/osadl.py (1 hunks)
  • src/scanoss/scanner.py (5 hunks)
  • tests/test_osadl.py (1 hunks)
  • tests/test_policy_inspect.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/scanoss/data/osadl-copyleft.json
🧰 Additional context used
🧬 Code graph analysis (5)
src/scanoss/osadl.py (2)
src/scanoss/scanossbase.py (1)
  • ScanossBase (28-107)
src/scanoss/inspection/utils/license_utils.py (1)
  • is_copyleft (76-108)
tests/test_policy_inspect.py (2)
src/scanoss/inspection/policy_check/scanoss/copyleft.py (2)
  • Copyleft (50-240)
  • run (220-240)
src/scanoss/inspection/policy_check/policy_check.py (2)
  • run (87-102)
  • PolicyStatus (33-44)
src/scanoss/inspection/utils/license_utils.py (2)
src/scanoss/osadl.py (2)
  • Osadl (33-120)
  • is_copyleft (97-120)
src/scanoss/scanossbase.py (2)
  • ScanossBase (28-107)
  • print_debug (58-63)
src/scanoss/scanner.py (3)
src/scanoss/file_filters.py (2)
  • get_filtered_files_from_folder (311-352)
  • get_filtered_files_from_files (354-404)
src/scanoss/threadedscanning.py (3)
  • stop_scanning (154-158)
  • queue_add (141-149)
  • run (168-191)
src/scanoss/scanossapi.py (1)
  • scan (134-244)
tests/test_osadl.py (1)
src/scanoss/osadl.py (2)
  • Osadl (33-120)
  • is_copyleft (97-120)
🪛 GitHub Actions: Lint
src/scanoss/osadl.py

[error] 26-26: F401 'sys' imported but unused. Remove unused import.

🪛 LanguageTool
README.md

[uncategorized] ~140-~140: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...vecommons.org/licenses/by/4.0/) by the [Open Source Automation Development Lab (OSADL) eG](...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~142-~142: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...g/). Attribution: A project by the Open Source Automation Development Lab (OSADL) eG. ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

CHANGELOG.md

[uncategorized] ~19-~19: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... - Copyleft detection now uses [OSADL (Open Source Automation Development Lab)](https://ww...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (11)
README.md (1)

139-142: LGTM! Dataset attribution properly documented.

The license notice clearly documents the CC-BY-4.0 licensed OSADL data and provides proper attribution as required. The static analysis hint about hyphenating "Open Source" is a false positive—this is the official name of the organization.

src/scanoss/__init__.py (1)

25-25: LGTM! Version correctly updated.

Version bump to 1.41.0 is consistent with the release branch and CHANGELOG.md.

tests/test_policy_inspect.py (2)

31-31: LGTM! Correct imports for license source filtering.

The imports of DEFAULT_COPYLEFT_LICENSE_SOURCES and VALID_LICENSE_SOURCES align with the new license source filtering feature.


393-646: Excellent test coverage for license source filtering.

The test suite comprehensively covers:

  • Default behavior with DEFAULT_COPYLEFT_LICENSE_SOURCES
  • Explicit None handling
  • Individual source filtering (component_declared, license_file, file_header)
  • Multiple source combinations
  • All valid sources
  • Integration with output formats (json, md, jira_md)
  • Interaction with include/exclude filters
  • Empty result scenarios

The tests properly validate both the filtering behavior and the expected component counts for each scenario.

CHANGELOG.md (1)

11-31: LGTM! Comprehensive changelog documentation.

The 1.41.0 release notes clearly document:

  • New --license-sources option with usage details
  • Migration to OSADL authoritative data with expanded coverage (21→32 licenses)
  • Default behavior change to reduce noise from file-level detections
  • Bug fix for terminal cursor disappearing

The documentation is clear and will help users understand the changes and migration path.

src/scanoss/scanner.py (1)

29-29: Excellent refactoring of spinner lifecycle management.

The migration to context-managed spinners and progress bars using with statements and nullcontext() as a fallback ensures proper cleanup regardless of how the code exits (normal completion, exception, or Ctrl+C). This directly addresses the terminal cursor issue mentioned in the CHANGELOG.

The pattern is consistently applied across all scanning paths:

  • scan_folder (lines 367-424)
  • scan_files (lines 637-694)
  • scan_wfp_file (lines 785-851)
  • wfp_folder (lines 1060-1069)

This is a solid improvement in resource management.

Also applies to: 367-424, 637-694, 785-851, 1060-1069

src/scanoss/inspection/utils/license_utils.py (3)

25-45: Well-designed integration of OSADL data.

The refactored LicenseUtil class cleanly integrates OSADL authoritative data through the new Osadl instance while maintaining backward compatibility through filter sets. The class-level design allows for efficient data sharing across instances.


47-74: Clear and well-documented filter initialization.

The init() method properly implements the filter precedence:

  1. Explicit licenses (when provided) ignore OSADL data entirely
  2. Include/exclude filters overlay on top of OSADL data
  3. All license IDs are normalized to lowercase for consistent comparison

The early return when explicit licenses are provided is a good optimization.


76-108: Improved copyleft determination logic with clear precedence.

The refactored is_copyleft() method implements a clear and well-documented precedence order:

  1. Explicit filter (if set) → uses only those licenses
  2. Include filter → forces copyleft=True
  3. Exclude filter → forces copyleft=False
  4. No filters → delegates to OSADL authoritative data

This design provides flexibility while maintaining deterministic behavior, and the comprehensive docstring clearly explains the logic flow.

src/scanoss/osadl.py (2)

33-55: LGTM! Well-designed singleton-like data loading.

The class-level caching approach (_shared_copyleft_data and _data_loaded) ensures the JSON file is loaded only once and shared across all instances, which is efficient and appropriate for read-only reference data.


97-120: LGTM! Robust copyleft determination logic.

The is_copyleft() method properly:

  • Handles None/empty inputs
  • Normalizes license IDs to lowercase for case-insensitive comparison
  • Treats both "Yes" and "Yes (restricted)" as copyleft (case-insensitive)
  • Provides debug feedback for missing data

The implementation aligns well with OSADL's data structure.

"""

import json
import sys
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused sys import.

The sys module is imported but never used in this file. This is flagged by the linter and should be removed.

Apply this diff:

 import json
-import sys
 
 import importlib_resources
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import sys
import json
import importlib_resources
🧰 Tools
🪛 GitHub Actions: Lint

[error] 26-26: F401 'sys' imported but unused. Remove unused import.

🤖 Prompt for AI Agents
In src/scanoss/osadl.py around line 26, the file imports the unused module
`sys`; remove the `import sys` line to satisfy the linter and eliminate the
unused import.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants