-
Notifications
You must be signed in to change notification settings - Fork 24
Releases/v1.41.0 #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Releases/v1.41.0 #169
Conversation
WalkthroughThis pull request adds OSADL-based copyleft license detection with source filtering, introduces a new CLI option Changes
Sequence Diagram(s)sequenceDiagram
participant CLI
participant Copyleft
participant ScanResultProcessor
participant LicenseUtil
participant Osadl
CLI->>Copyleft: __init__(license_sources=['component_declared', 'license_file'])
Copyleft->>ScanResultProcessor: __init__(..., license_sources)
ScanResultProcessor->>LicenseUtil: init(include, exclude, explicit)
LicenseUtil->>Osadl: __init__()
Osadl->>Osadl: _load_copyleft_data() [on first use]
Note over LicenseUtil: Per license in component
ScanResultProcessor->>ScanResultProcessor: _select_licenses(licenses_data)
ScanResultProcessor->>ScanResultProcessor: filter by license_sources
ScanResultProcessor->>LicenseUtil: is_copyleft(spdx_id)
alt explicit_licenses provided
LicenseUtil->>LicenseUtil: check explicit_licenses
else include/exclude rules
LicenseUtil->>LicenseUtil: check include_licenses, exclude_licenses
else default to OSADL
LicenseUtil->>Osadl: is_copyleft(spdx_id)
Osadl->>Osadl: normalize ID & lookup in _shared_copyleft_data
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/scanoss/scanner.py (1)
939-941: Typo in warning text (“uncounted” → “encountered”).- 'Warning: Some errors uncounted while scanning. Results might be incomplete.' + 'Warning: Some errors encountered while scanning. Results might be incomplete.'
🧹 Nitpick comments (11)
src/scanoss/threadedscanning.py (1)
25-25: Progress bar cleanup is robust; consider centralizingatexitregistration.The combination of
atexit.register(self.complete_bar)and__del__callingcomplete_bar()should reliably finish the bar and restore the cursor even on abrupt termination, which addresses the disappearing-cursor issue.One thing to keep in mind is that registering
self.complete_barin__init__means:
- Each
ThreadedScanninginstance adds anatexithandler.- Those handlers hold strong references to the instances, so they will live until process exit.
If you expect many
ThreadedScanninginstances over a long-lived process, you might consider a class-level/centralizedatexithandler (or a weakref-based registry) instead of per-instance registration. For typical CLI usage with a single instance, this is fine as-is.Also applies to: 81-82, 107-112
src/scanoss/scanners/scanner_hfh.py (1)
113-125: Helper_execute_grpc_scankeeps scan error handling localized.Pulling the gRPC call into
_execute_grpc_scanwith a try/except that logs tostderrand resetsscan_resultstoNonemakes the failure mode explicit and keeps the mainscan()flow simple.If you ever need deeper diagnostics, consider including more context (e.g.,
traceback.format_exc()whenself.base.debugis enabled) so failures infolder_hash_scanare easier to triage, but it's not required for correctness.src/scanoss/scanners/folder_hasher.py (1)
160-191: Context-managed progress bar correctly ensures cleanup during hashing.Wrapping the file hashing loop in
with Bar('Hashing files...', max=len(filtered_files)) as bar:and callingbar.next()once per file (even on exceptions) is a solid improvement:
- Guarantees
Bar.finish()runs on normal completion and on exceptions (including Ctrl+C), which helps avoid the hidden-cursor issue.- Keeps the tree-building and CRC64 logic unchanged in structure and behavior.
If you expect frequent empty directories, you could optionally guard with an early return when
not filtered_filesto skip constructing a zero-length bar, but the current implementation is functionally fine.CHANGELOG.md (1)
16-22: Minor wording polish for clarity.Tighten the “Changed” bullet to avoid the compound adjective hiccup.
-**Switched to OSADL authoritative copyleft license data** +**Switched to OSADL copyleft license data (authoritative)**LICENSE (1)
25-31: Add a direct CC‑BY‑4.0 link for the embedded dataset.Helps downstream consumers comply and audit.
The OSADL data file contains its own license header with full attribution information. +For the OSADL dataset license terms, see: https://creativecommons.org/licenses/by/4.0/src/scanoss/constants.py (1)
21-22: Prefer immutable tuples for public constants.Prevents accidental mutation and clarifies intent.
-VALID_LICENSE_SOURCES = ['component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode'] -DEFAULT_COPYLEFT_LICENSE_SOURCES = ['component_declared', 'license_file'] +VALID_LICENSE_SOURCES = ('component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode') +DEFAULT_COPYLEFT_LICENSE_SOURCES = ('component_declared', 'license_file')Additionally, consider typing:
# at top of file from typing import Final, Tuple VALID_LICENSE_SOURCES: Final[Tuple[str, ...]] = ( 'component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode' ) DEFAULT_COPYLEFT_LICENSE_SOURCES: Final[Tuple[str, ...]] = ( 'component_declared', 'license_file' )src/scanoss/filecount.py (2)
152-161: Address Ruff PLC0206 and simplify CSV dict build.Use
.items()and avoid a temp list.- if file_types: - csv_dict = [] - for k in file_types: - d = file_types[k] - csv_dict.append({'extension': k, 'count': d[0], 'size(MB)': f'{d[1] / (1 << 20):,.2f}'}) - fields = ['extension', 'count', 'size(MB)'] + if file_types: + rows = ( + {'extension': ext, 'count': cnt_sz[0], 'size(MB)': f'{cnt_sz[1] / (1 << 20):,.2f}'} + for ext, cnt_sz in file_types.items() + ) + fields = ['extension', 'count', 'size(MB)'] file = sys.stdout if self.scan_output: file = open(self.scan_output, 'w') writer = csv.DictWriter(file, fieldnames=fields) writer.writeheader() # writing headers (field names) - writer.writerows(csv_dict) # writing data rows + writer.writerows(rows) # writing data rows
100-149: Consider a small extraction to reduce branches/statements (Ruff PLR0912/PLR0915).Move per-file accounting into a helper (e.g.,
_accumulate(path, file_types)) and usesetdefault:- fc = file_types.get(f_suffix) - if not fc: - fc = [1, f_size] - else: - fc[0] = fc[0] + 1 - fc[1] = fc[1] + f_size - file_types[f_suffix] = fc + fc = file_types.setdefault(f_suffix, [0, 0]) + fc[0] += 1 + fc[1] += f_sizesrc/scanoss/inspection/policy_check/scanoss/copyleft.py (1)
56-69: Copyleft now always uses filtering mode; no way to opt out via APIThe
license_sources or DEFAULT_COPYLEFT_LICENSE_SOURCESdefault plus passingself.license_sourcesintoScanResultProcessormeans Copyleft will always operate in filtering mode (using either the CLI‑provided list or the default list), and never fall back to the priority mode implemented in_select_licenses.That seems aligned with “default to component‑level licenses” from the PR description, but it also means there is currently no way (from this API) to request the old priority behaviour for copyleft checks.
If you foresee a need to restore priority mode, you might want an explicit sentinel (e.g.
license_sources=Nonemeaning “priority mode”) instead of unconditionally substituting the default here.Also applies to: 92-101
src/scanoss/inspection/utils/scan_result_processor.py (1)
65-81: License source selection logic looks consistent with designThe new
license_sourcesplumbing and_select_licenses()behaviour are coherent:
- When
license_sourcesis truthy, you filter to those sources while still includingunknown/None, which avoids silently dropping legacy/uncategorised data.- When
license_sourcesis falsy, the original priority semantics are preserved via the ordered source preference, with a sensible fallback to “all licenses”.Given that Copyleft now always passes a non‑empty list, this class will typically operate in filtering mode for that flow, which matches the PR’s intent.
If you want to tighten things later, you could:
- Type‑annotate
license_sourcesaslist[str] | None, and- Validate against
VALID_LICENSE_SOURCESat the boundaries.But functionally this looks good.
Also applies to: 167-189, 316-361
src/scanoss/inspection/utils/license_utils.py (1)
25-27: OSADL-backed copyleft logic and filters look correctThe new
LicenseUtilwiring is clean:
- The include/exclude/explicit parsing is reset-safe and case-insensitive.
is_copyleft()applies the precedence described in the docstring and only falls through to OSADL when no filters match.- Deferring to
Osadlfor the default copyleft decision aligns with the new authoritative dataset.You might eventually consider exposing
include/exclude/explicitas sets of SPDX IDs directly on the public API (rather than only via comma-separated strings) for programmatic callers, but that’s purely a convenience; the current implementation is solid.Also applies to: 34-36, 40-46, 47-75, 76-107
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
CHANGELOG.md(1 hunks)LICENSE(1 hunks)README.md(1 hunks)src/scanoss/__init__.py(1 hunks)src/scanoss/cli.py(4 hunks)src/scanoss/constants.py(1 hunks)src/scanoss/data/osadl-copyleft.json(1 hunks)src/scanoss/filecount.py(2 hunks)src/scanoss/inspection/policy_check/scanoss/copyleft.py(4 hunks)src/scanoss/inspection/utils/license_utils.py(1 hunks)src/scanoss/inspection/utils/scan_result_processor.py(3 hunks)src/scanoss/osadl.py(1 hunks)src/scanoss/scanner.py(5 hunks)src/scanoss/scanners/folder_hasher.py(1 hunks)src/scanoss/scanners/scanner_hfh.py(2 hunks)src/scanoss/threadedscanning.py(3 hunks)tests/test_osadl.py(1 hunks)tests/test_policy_inspect.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
src/scanoss/scanners/scanner_hfh.py (2)
src/scanoss/scanossgrpc.py (1)
folder_hash_scan(429-447)src/scanoss/scanossbase.py (1)
print_stderr(45-49)
tests/test_policy_inspect.py (2)
src/scanoss/inspection/policy_check/scanoss/copyleft.py (2)
Copyleft(50-240)run(220-240)src/scanoss/inspection/policy_check/policy_check.py (2)
run(87-102)PolicyStatus(33-44)
tests/test_osadl.py (2)
src/scanoss/osadl.py (2)
Osadl(31-130)is_copyleft(108-130)src/scanoss/inspection/utils/license_utils.py (1)
is_copyleft(76-107)
src/scanoss/scanner.py (3)
src/scanoss/file_filters.py (2)
get_filtered_files_from_folder(311-352)get_filtered_files_from_files(354-404)src/scanoss/threadedscanning.py (3)
stop_scanning(154-158)queue_add(141-149)run(168-191)src/scanoss/scanners/scanner_hfh.py (1)
scan(126-152)
src/scanoss/inspection/utils/scan_result_processor.py (1)
src/scanoss/inspection/utils/license_utils.py (2)
LicenseUtil(30-116)init(47-74)
src/scanoss/filecount.py (2)
src/scanoss/scanossbase.py (2)
print_msg(51-56)print_trace(65-70)src/scanoss/cli.py (1)
file_count(1310-1341)
src/scanoss/scanners/folder_hasher.py (2)
src/scanoss/scanossbase.py (1)
print_debug(58-63)src/scanoss/utils/crc64.py (2)
CRC64(29-96)get_hash_buff(82-96)
src/scanoss/osadl.py (1)
src/scanoss/inspection/utils/license_utils.py (1)
is_copyleft(76-107)
src/scanoss/inspection/utils/license_utils.py (2)
src/scanoss/osadl.py (3)
Osadl(31-130)print_debug(62-67)is_copyleft(108-130)src/scanoss/scanossbase.py (2)
ScanossBase(28-107)print_debug(58-63)
🪛 GitHub Actions: Lint
src/scanoss/filecount.py
[error] 100-100: Ruff: PLR0912 Too many branches (15 > 12) in count_files method.
[error] 100-100: Ruff: PLR0915 Too many statements (54 > 50) in count_files method.
[error] 152-152: Ruff: PLC0206 Extracting value from dictionary without calling .items() in filecount.py.
🪛 LanguageTool
CHANGELOG.md
[uncategorized] ~17-~17: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... - Copyleft detection now uses [OSADL (Open Source Automation Development Lab)](https://ww...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
README.md
[uncategorized] ~140-~140: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...vecommons.org/licenses/by/4.0/) by the [Open Source Automation Development Lab (OSADL) eG](...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~142-~142: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...g/). Attribution: A project by the Open Source Automation Development Lab (OSADL) eG. ...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (7)
src/scanoss/__init__.py (1)
25-25: Version bump looks consistent with release.
__version__ = '1.41.0'matches the release branch and PR title; no further changes needed here.README.md (1)
139-142: Dataset license notice and attribution look clear and sufficient.The new section cleanly distinguishes the MIT license for the application from the CC-BY-4.0 license of the OSADL copyleft dataset, and provides proper attribution and source URL. No changes needed from a docs/UX standpoint.
tests/test_policy_inspect.py (2)
395-577: Coverage additions for license_sources look solid.Good assertions across defaults, single/multiple sources, and output formats.
31-31: Incorrect review comment—proposed fix targets the wrong lines.The review correctly identifies inconsistent imports in test_policy_inspect.py, but the fix direction is reversed. Line 31 (
from scanoss.constants...) uses the correct import root; setup.cfg confirms the installed package name isscanoss(notsrc.scanoss). Lines 32–39 are the inconsistent ones and should be changed fromsrc.scanoss.*toscanoss.*to match all other test files and the package configuration.Likely an incorrect or invalid review comment.
src/scanoss/cli.py (1)
704-714: CLI wiring for -ls is correct and fully supported.The
action='extend'parameter is available in Python 3.8+, confirming your code works on Python 3.9 and later. No changes needed.src/scanoss/data/osadl-copyleft.json (1)
1-133: Packaging verification confirmed—data file will be distributed correctly.The OSADL dataset is properly configured:
setup.cfgspecifies[options.package_data]with* = data/*.txt, data/*.json, which matchesosadl-copyleft.json. Combined withinclude_package_data = True, both wheel and source distributions will include the file. MANIFEST.in is not needed whenpackage_datais explicitly configured.src/scanoss/scanner.py (1)
785-789: Rewritten review comment is not needed — original review is based on incorrect assumptions.The web search confirms that
progress.bar.Barandprogress.spinner.Spinneralready implement the context-manager protocol (__enter__and__exit__methods). The original code insrc/scanoss/scanner.py(lines 785-789) is correct as written:
Barcan be used directly withwithstatements without additional wrappingnullcontext()without arguments correctly returnsNoneon entry, allowing the conditional checkif bar:to work as intended- The original conditional pattern
Bar(...) if condition else nullcontext()is valid and requires no modificationLikely an incorrect or invalid review comment.
| spinner_ctx = Spinner('Searching ') if (not self.quiet and self.isatty) else nullcontext() | ||
|
|
||
| with spinner_ctx as spinner: | ||
| file_types = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spinner context misuse; wrap instance and finish.
Same fix as in scanner.py.
- spinner_ctx = Spinner('Searching ') if (not self.quiet and self.isatty) else nullcontext()
+ spinner_ctx = nullcontext(Spinner('Searching ')) if (not self.quiet and self.isatty) else nullcontext()
@@
- if spinner:
- spinner.next()
+ if spinner:
+ spinner.next()
+ if spinner:
+ spinner.finish()Also applies to: 146-148
| def _load_copyleft_data(self) -> bool: | ||
| """ | ||
| Load the embedded OSADL copyleft JSON file into class-level shared data. | ||
| Data is loaded only once and shared across all instances. | ||
| :return: True if successful, False otherwise | ||
| """ | ||
| if Osadl._data_loaded: | ||
| return True | ||
|
|
||
| # OSADL copyleft license checklist from: https://www.osadl.org/Checklists | ||
| # Data source: https://www.osadl.org/fileadmin/checklists/copyleft.json | ||
| # License: CC-BY-4.0 (Creative Commons Attribution 4.0 International) | ||
| # Copyright: (C) 2017 - 2024 Open Source Automation Development Lab (OSADL) eG | ||
| try: | ||
| f_name = importlib_resources.files(__name__) / 'data/osadl-copyleft.json' | ||
| with importlib_resources.as_file(f_name) as f: | ||
| with open(f, 'r', encoding='utf-8') as file: | ||
| data = json.load(file) | ||
| except Exception as e: | ||
| self.print_stderr(f'ERROR: Problem loading OSADL copyleft data: {e}') | ||
| return False | ||
|
|
||
| # Process copyleft data | ||
| copyleft = data.get('copyleft', {}) | ||
| if not copyleft: | ||
| self.print_stderr('ERROR: No copyleft data found in OSADL JSON') | ||
| return False | ||
|
|
||
| # Store in class-level shared dictionary | ||
| for lic_id, status in copyleft.items(): | ||
| # Normalize license ID (lowercase) for consistent lookup | ||
| lic_id_lc = lic_id.lower() | ||
| Osadl._shared_copyleft_data[lic_id_lc] = status | ||
|
|
||
| Osadl._data_loaded = True | ||
| self.print_debug(f'Loaded {len(Osadl._shared_copyleft_data)} OSADL copyleft entries') | ||
| return True | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OSADL load failure silently degrades to “no copyleft” for all licenses
Right now, if _load_copyleft_data() fails (e.g. missing or unreadable osadl-copyleft.json), it logs an error and returns False, but __init__ ignores that return value. Because _shared_copyleft_data remains empty, is_copyleft() will simply return False for every license.
For a copyleft policy check this is risky: a packaging or runtime issue would silently under‑report copyleft instead of failing fast.
Consider one of the following:
- Raise an exception in
__init__when_load_copyleft_data()returnsFalse, so callers can surface a hard failure; or - Track a separate “load_failed” flag and have
is_copyleft()emit a stronger error or choose a safer default (e.g. treat everything as copyleft, or require explicit filters) rather than silently returning False.
This would make failures in the authoritative dataset much more visible to users.
Also applies to: 108-130
🤖 Prompt for AI Agents
In src/scanoss/osadl.py around lines 69-107 (and also applies to 108-130),
_load_copyleft_data() currently returns False on load error but __init__ ignores
that, causing silent under-reporting of copyleft; change the initializer to
fail-fast: call _load_copyleft_data() in __init__, and if it returns False raise
a clear RuntimeError (or custom exception) with the underlying error message so
callers see the failure; ensure any necessary cleanup or state (e.g.
_data_loaded remains False and _shared_copyleft_data empty) is preserved before
raising; alternatively, if you prefer non-exception behavior, implement a
_load_failed flag set by _load_copyleft_data() on error and update is_copyleft()
to check this flag and raise an error or apply a safe default—pick one approach
and implement it consistently (update docstrings and error messages
accordingly).
| spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext() | ||
|
|
||
| with spinner_ctx as spinner: | ||
| save_wfps_for_print = not self.no_wfp_file or not self.threaded_scan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spinner is not a context manager; wrap the instance and ensure cleanup.
Using Spinner() directly in a with can raise at runtime if it lacks __enter__/__exit__. Also, when nullcontext() is used, no finish() is called, leaving the cursor state messy.
Apply this pattern:
- Wrap the real spinner with
nullcontext(Spinner(...)). - Explicitly
finish()before exiting the block.
- spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()
+ spinner_ctx = nullcontext(Spinner('Fingerprinting ')) if (not self.quiet and self.isatty) else nullcontext()
with spinner_ctx as spinner:
@@
- if self.threaded_scan and scan_block != '':
- self.threaded_scan.queue_add(scan_block) # Make sure all files have been submitted
+ if self.threaded_scan and scan_block != '':
+ self.threaded_scan.queue_add(scan_block) # Make sure all files have been submitted
+ if spinner:
+ spinner.finish()Also applies to: 423-425
🤖 Prompt for AI Agents
In src/scanoss/scanner.py around lines 367-370, the code uses Spinner(...)
directly in a with-statement which fails if Spinner isn't a context manager and
also leaves cursor state un-restored when nullcontext() is used; change creation
to wrap the real Spinner inside nullcontext(Spinner(...)) so the with always
yields a spinner object (e.g. nullcontext(Spinner(...)) when using a spinner,
nullcontext(Spinner(...)) when quiet is false/true accordingly), then inside the
with-block ensure you call spinner.finish() (or equivalent cleanup) before
exiting the block; apply the same change to the similar block at lines 423-425.
| spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext() | ||
|
|
||
| with spinner_ctx as spinner: | ||
| save_wfps_for_print = not self.no_wfp_file or not self.threaded_scan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same spinner issue in scan_files; wrap and call finish().
Mirror the fix in scan_files().
- spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()
+ spinner_ctx = nullcontext(Spinner('Fingerprinting ')) if (not self.quiet and self.isatty) else nullcontext()
@@
- if self.threaded_scan and scan_block != '':
- self.threaded_scan.queue_add(scan_block) # Make sure all files have been submitted
+ if self.threaded_scan and scan_block != '':
+ self.threaded_scan.queue_add(scan_block) # Make sure all files have been submitted
+ if spinner:
+ spinner.finish()Also applies to: 693-694
🤖 Prompt for AI Agents
In src/scanoss/scanner.py around lines 637-640 (and similarly 693-694), the
Spinner is being used via a context but not guaranteed to have its finish()
method called; change the usage so the spinner is assigned from the context
(e.g., spinner_ctx = Spinner(...) if (not self.quiet and self.isatty) else
nullcontext()), enter the context to get spinner, and ensure after the work
completes you explicitly call spinner.finish() when spinner is an actual Spinner
instance (use try/finally or check type) so the spinner always gets finished;
apply the same fix to the block at lines 693-694.
| spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext() | ||
|
|
||
| with spinner_ctx as spinner: | ||
| to_fingerprint_files = file_filters.get_filtered_files_from_folder(scan_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spinner context in wfp_folder: wrap and finish.
Same fix applies here.
- spinner_ctx = Spinner('Fingerprinting ') if (not self.quiet and self.isatty) else nullcontext()
+ spinner_ctx = nullcontext(Spinner('Fingerprinting ')) if (not self.quiet and self.isatty) else nullcontext()
@@
- for file in to_fingerprint_files:
+ for file in to_fingerprint_files:
if spinner:
spinner.next()
@@
- wfps += self.winnowing.wfp_for_file(str(abs_path), file)
+ wfps += self.winnowing.wfp_for_file(str(abs_path), file)
+ if spinner:
+ spinner.finish()Also applies to: 1065-1069
🤖 Prompt for AI Agents
In src/scanoss/scanner.py around lines 1060-1063 (and similarly for 1065-1069),
the spinner context is created conditionally but not guaranteed to be properly
finished; replace the conditional one-liner with logic that, when a Spinner() is
instantiated, explicitly starts it before the block and ensures finish() is
called in a finally block (or use a proper context manager that calls finish on
exit); if using nullcontext for the non-TTY case keep it, but for the real
Spinner ensure it is entered/started and finished so the spinner is wrapped and
cleaned up correctly.
| spinner_ctx = Spinner('Scanning folder...') | ||
|
|
||
| with spinner_ctx as spinner: | ||
| grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,)) | ||
| grpc_thread.start() | ||
|
|
||
| def spin(): | ||
| while not stop_spinner: | ||
| while grpc_thread.is_alive(): | ||
| spinner.next() | ||
| time.sleep(0.1) | ||
|
|
||
| spinner_thread = threading.Thread(target=spin) | ||
| spinner_thread.start() | ||
|
|
||
| try: | ||
| response = self.client.folder_hash_scan(hfh_request, self.use_grpc) | ||
| if response: | ||
| self.scan_results = response | ||
| finally: | ||
| stop_spinner = True | ||
| spinner_thread.join() | ||
| spinner.finish() | ||
| grpc_thread.join() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Spinner + gRPC thread pattern improves UX and cleanup.
Running folder_hash_scan in a background thread while driving a Spinner in the foreground and using a with Spinner(...) as spinner: block ensures:
- The spinner is advanced responsively while the gRPC call runs.
- The spinner is properly finalized (and the cursor restored) even if an exception interrupts the loop, addressing the Ctrl+C cursor issue.
grpc_thread.join()guaranteesscan_resultsis fully populated (or set toNoneby the helper) before returning.
Please just double-check that the progress.spinner.Spinner version you depend on supports context manager usage in this way, to avoid runtime surprises if the package is upgraded or replaced.
🌐 Web query:
Verify whether the `Spinner` class from the `progress` Python package supports context manager usage (`with Spinner(...) as spinner:`) in the version used by this project.
💡 Result:
I need the specific version of the progress package that your project uses (or the project's requirements file / pyproject.toml / Pipfile). Which version or repo/file should I check?
I need to find the version of the progress package used in the project first. Let me search the codebase for dependency specifications.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find dependency specification files
fd -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "Pipfile" -o -name "poetry.lock" \)
Length of output: 170
Let me correct that and search for dependency files with a simpler approach:
---
🏁 Script executed:
```shell
#!/bin/bash
# Find dependency specification files
find . -maxdepth 2 \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" \)
Length of output: 127
Now let me extract the progress package version from the requirements files:
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for progress package in main requirements files
echo "=== requirements.txt ===" && cat requirements.txt && \
echo -e "\n=== pyproject.toml ===" && cat pyproject.toml
Length of output: 834
Perfect! The project has progress as a dependency without a version pin (meaning any version). Now let me search for information about whether the Spinner class supports context manager usage:
Let me search more specifically for the progress.spinner.Spinner class and its context manager support:
Let me search more specifically for the progress package's Spinner implementation:
Let me check the actual source code structure of the progress package to verify if Spinner inherits the context manager capability:
Perfect! I found the source code. Spinner inherits from Infinite, not from Bar. Now let me verify if the Infinite class supports context manager usage:
Based on my verification, I have found a significant concern with the review comment:
Spinner does NOT appear to support context manager usage.
The source code shows that Spinner inherits from Infinite, not from Bar. While the progress package documentation explicitly demonstrates Bar as a context manager with with Bar(...) as bar: syntax, Spinner is only documented with traditional loop usage: spinner = Spinner('Loading ') in a while loop. No official documentation or examples show Spinner being used as a context manager.
The code in the review attempts to use with Spinner(...) as spinner:, which is not supported by the progress.spinner.Spinner class and will raise an AttributeError at runtime.
Fix: Use the traditional loop pattern or wrap in explicit context manager.
Replace:
spinner_ctx = Spinner('Scanning folder...')
with spinner_ctx as spinner:
grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,))
grpc_thread.start()
while grpc_thread.is_alive():
spinner.next()
time.sleep(0.1)
grpc_thread.join()With:
spinner = Spinner('Scanning folder...')
try:
grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,))
grpc_thread.start()
while grpc_thread.is_alive():
spinner.next()
time.sleep(0.1)
grpc_thread.join()
finally:
spinner.finish()🤖 Prompt for AI Agents
In src/scanoss/scanners/scanner_hfh.py around lines 140 to 151, the code uses
Spinner as a context manager but progress.spinner.Spinner does not support the
context-manager protocol; replace the with-block by instantiating Spinner
directly, start the grpc thread, run the existing loop calling spinner.next()
while the thread is alive, and ensure in a finally block you call
spinner.finish() (and join the thread if not already) so the spinner is always
cleaned up.
| def test_multiple_instances_share_data(self): | ||
| """Test that multiple instances share the same class-level data""" | ||
| osadl1 = Osadl() | ||
| osadl2 = Osadl() | ||
|
|
||
| # Both instances should see data loaded by first instance | ||
| result1 = osadl1.is_copyleft('GPL-2.0-only') | ||
| self.assertTrue(result1) | ||
| self.assertTrue(Osadl._data_loaded) | ||
|
|
||
| # Second instance uses the same class-level shared data | ||
| result2 = osadl2.is_copyleft('MIT') | ||
| self.assertFalse(result2) | ||
|
|
||
| # Verify both instances reference the same class-level data | ||
| self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix tautological assertion in test_multiple_instances_share_data
The last assertion:
self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data)is always true and doesn’t verify that multiple instances share the same data. To test sharing more directly, consider:
self.assertIs(osadl1._shared_copyleft_data, osadl2._shared_copyleft_data)or equivalently comparing id(...) between the two instances.
🤖 Prompt for AI Agents
In tests/test_osadl.py around lines 83 to 99, the final assertion is
tautological (assertIs(Osadl._shared_copyleft_data,
Osadl._shared_copyleft_data)) and therefore doesn't verify instance sharing;
replace it with an assertion that compares the two instances' references (for
example assertIs(osadl1._shared_copyleft_data, osadl2._shared_copyleft_data) or
assertEqual(id(osadl1._shared_copyleft_data), id(osadl2._shared_copyleft_data)))
so the test actually verifies that both instances reference the same class-level
data.
New Features
Changed
Documentation
Bug Fixes
Summary by CodeRabbit
New Features
--license-sources(-ls) CLI option to copyleft inspection for filtering specific license sources (component_declared, license_file, file_header, file_spdx_tag, scancode).-or-latervariants, using OSADL authoritative data.Bug Fixes
Documentation
Chores