Conversation

@Alex-1089
Contributor

@Alex-1089 Alex-1089 commented Oct 21, 2025

Summary by CodeRabbit

  • New Features

    • Added source-based filtering in CycloneDX conversion to improve license accuracy
  • Bug Fixes

    • Fixed missing dependencies during SPDX conversion to ensure complete dependency output
    • Improved SPDX license handling to include licenses with unspecified source while still excluding invalid sources
  • Chores

    • Released version 1.37.1 (documentation updated)

@coderabbitai

coderabbitai bot commented Oct 21, 2025

Walkthrough

Updated package to v1.37.1; added license-source filtering for CycloneDX parsing, relaxed SPDX license-source acceptance for unspecified sources, and adjusted CycloneDX produce_from_str to unpack produce_from_json's tuple.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Version Update: CHANGELOG.md, src/scanoss/__init__.py | Bumped package version from 1.37.0 to 1.37.1 and added changelog entry referencing source-filtering and SPDX fixes. |
| CycloneDX parsing & produce: src/scanoss/cyclonedx.py | In parse(), filter license entries to keep only those whose source is one of component_declared, license_file, or file_header. In produce_from_str(), unpack the (success, data) tuple returned by produce_from_json() and return the success flag. |
| SPDXLite license filtering: src/scanoss/spdxlite.py | Adjusted license filtering to allow licenses with source == None or empty string to pass, while still excluding non-empty sources not in the allowed set; deduplicates license names as before. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller
    participant CycloneDX
    participant Producer as produce_from_json

    Caller->>CycloneDX: call parse(xml/json)
    CycloneDX->>CycloneDX: extract licenses
    note right of CycloneDX #E6F4EA: keep only licenses with\nsource in {component_declared, license_file, file_header}
    CycloneDX-->>Caller: parsed component data

    Caller->>CycloneDX: produce_from_str(input)
    CycloneDX->>Producer: produce_from_json(parsed)
    Producer-->>CycloneDX: (success, data)
    note right of CycloneDX #FFF7E6: unpack tuple and\nreturn success flag
    CycloneDX-->>Caller: success
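The second half of the diagram can be sketched in Python roughly as follows. This is an illustration only, not the code in src/scanoss/cyclonedx.py: the class name `CycloneDxSketch` and both method bodies are assumptions. Only the method names and the `(success, data)` tuple shape come from the walkthrough above.

```python
import json
from typing import Any, Dict, Tuple


class CycloneDxSketch:
    """Hypothetical stand-in for the real class; the bodies are illustrative."""

    def produce_from_json(self, data: Any) -> Tuple[bool, Dict[str, Any]]:
        # Assumed behaviour: fail on empty or non-dict input, otherwise wrap the keys.
        if not isinstance(data, dict) or not data:
            return False, {}
        return True, {'components': sorted(data)}

    def produce_from_str(self, json_str: str) -> bool:
        try:
            data = json.loads(json_str)
        except json.JSONDecodeError:
            return False
        # Unpack the (success, data) tuple and return only the flag.
        # A non-empty tuple is always truthy, so returning it unchanged
        # would look like success to any caller that tests the result.
        success, _ = self.produce_from_json(data)
        return success
```

The point of the unpacking is visible in the failure case: `(False, {})` evaluates as truthy, whereas the unpacked flag does not.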

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

bug

Suggested reviewers

  • eeisegn
  • agustingroh
  • ortizjeronimo

Poem

🐰 A tiny hop, a careful tweak,
Licenses filtered, no more mystique.
Version nudged and parsers tuned,
I nibble code while moonlight croons. ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "SP-3561_conversion-issues" is related to the changeset and refers to a real aspect of the changes made in this pull request. The actual modifications address conversion-related issues: source filtering is added to cyclonedx conversion, and dependencies being skipped during spdx conversion are fixed, along with a version bump. While the title is somewhat generic and doesn't specify the exact nature of the fixes (source filtering, dependency handling), it does convey meaningful information about the PR's focus area and is not as vague as generic placeholders like "misc updates" or "stuff." |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6107d33 and 1cd23d7.

📒 Files selected for processing (1)
  • src/scanoss/spdxlite.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (2)
src/scanoss/spdxlite.py (2)

229-231: LGTM! Clear documentation of the filtering behavior.

The docstring accurately describes the new license source filtering logic: unspecified sources (None or empty string) are allowed, while non-empty sources must be in the allowed list.


252-252: LGTM! Correct fix for license source filtering.

The logic now correctly allows licenses with unspecified sources (None or '') to pass through, while filtering out only those with non-empty sources that are not in the allowed list. This resolves the issue flagged in the previous review.
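A minimal sketch of the relaxed SPDX filter described in the two comments above, assuming a standalone helper. The function name `filter_spdx_licenses`, the constant `ALLOWED_SOURCES`, the `scancode` source value in the usage note, and the choice to skip entries without a name are all hypothetical; only the condition itself is taken from the change under review.

```python
from typing import Any, Dict, Iterable, List

ALLOWED_SOURCES = ('component_declared', 'license_file', 'file_header')


def filter_spdx_licenses(licenses: Iterable[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: keep allowed or unspecified sources, drop duplicates."""
    names: List[str] = []
    for license_info in licenses:
        name = license_info.get('name')
        source = license_info.get('source')
        # Unspecified sources (None or '') pass through; any other source
        # must be in the allowed list.
        if source not in (None, '') and source not in ALLOWED_SOURCES:
            continue
        # Assumption: entries without a name are skipped; names are kept once.
        if name and name not in names:
            names.append(name)
    return names
```

With this sketch, an entry such as `{'name': 'Apache-2.0'}` (no source key) is kept, while `{'name': 'GPL-2.0-only', 'source': 'scancode'}` is dropped.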


@github-actions

SCANOSS SCAN Completed 🚀

  • Detected components: 1
  • Undeclared components: 0
  • Declared components: 1
  • Detected files: 83
  • Detected files undeclared: 0
  • Detected files declared: 83
  • Licenses detected: 1
  • Licenses detected with copyleft: 0
  • Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

@Alex-1089 Alex-1089 requested a review from eeisegn October 21, 2025 11:31
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
CHANGELOG.md (1)

698-698: Optional: add an Unreleased comparison link for completeness.

Add a compare link for the Unreleased section to follow the existing pattern.

 [1.37.0]: https://github.com/scanoss/scanoss.py/compare/v1.36.0...v1.37.0
 [1.37.1]: https://github.com/scanoss/scanoss.py/compare/v1.37.0...v1.37.1
+[Unreleased]: https://github.com/scanoss/scanoss.py/compare/v1.37.1...HEAD
src/scanoss/spdxlite.py (1)

228-239: Align docstring with implemented policy (include file_header, clarify None/empty).

Update the comment to reflect accepted sources and the fact that None/'' are permitted (i.e., only non-empty, non-allowed sources are filtered).

-            This method filters license information to include only licenses from trusted sources
-            ('component_declared' or 'license_file') and removes any duplicate license names.
+            This method filters license information to include only licenses from trusted sources
+            ('component_declared', 'license_file', 'file_header'). Licenses with an unspecified
+            source (None or '') are allowed. Non-empty, non-allowed sources are excluded. It also
+            removes any duplicate license names.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff6b8f7 and 6107d33.

📒 Files selected for processing (4)
  • CHANGELOG.md (2 hunks)
  • src/scanoss/__init__.py (1 hunks)
  • src/scanoss/cyclonedx.py (2 hunks)
  • src/scanoss/spdxlite.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/cyclonedx.py (2)
src/scanoss/spdxlite.py (1)
  • produce_from_json (276-293)
src/scanoss/csvoutput.py (1)
  • produce_from_json (182-222)
🔇 Additional comments (3)
CHANGELOG.md (1)

12-16: Changelog entry reads well and matches the code changes.

Version/date and notes are consistent. Nothing else to change here.

src/scanoss/__init__.py (1)

25-25: Version bump matches CHANGELOG.

All good.

src/scanoss/cyclonedx.py (1)

302-304: No changes required; return-type change is correct.

Verification confirms all callers of produce_from_str either expect a bool return or ignore the return value. No caller attempts tuple unpacking or dict access. The change is safe and introduces no breaking changes.

Comment on lines +155 to 160
name = lic.get('name')
source = lic.get('source')
if source not in ('component_declared', 'license_file', 'file_header'):
    continue
fdl.append({'id': name})
fd['licenses'] = fdl

⚠️ Potential issue | 🟠 Major

License source filtering for file matches looks good; mirror this for dependencies.

You correctly keep only ('component_declared', 'license_file', 'file_header') here. The dependency path above (Lines 91–103) still accepts all sources and should apply the same filter for consistency.

Suggested change for the dependency block:

-                        licenses = deps.get('licenses')
-                        fdl = []
-                        if licenses:
-                            dc = []
-                            for lic in licenses:
-                                name = lic.get('name')
-                                if name not in dc:  # Only save the license name once
-                                    fdl.append({'id': name})
-                                    dc.append(name)
+                        licenses = deps.get('licenses')
+                        fdl = []
+                        if licenses:
+                            seen = set()
+                            for lic in licenses:
+                                name = lic.get('name')
+                                source = lic.get('source')
+                                if source not in ('component_declared', 'license_file', 'file_header'):
+                                    continue
+                                if name and name not in seen:
+                                    fdl.append({'id': name})
+                                    seen.add(name)
                         fd['licenses'] = fdl

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/scanoss/cyclonedx.py around lines 91 to 103, the dependency license
handling currently accepts all license sources while the file-level code (lines
155–160) restricts to ('component_declared', 'license_file', 'file_header');
update the dependency block to apply the same filter by checking
lic.get('source') and only appending licenses whose source is one of those three
values so dependency licenses mirror the file-level filtering logic.
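
One way to keep the file-level and dependency paths from drifting apart again is to route both through a single helper. The sketch below assumes such a helper; `trusted_license_ids` and `TRUSTED_SOURCES` are hypothetical names that do not appear in the PR, and the `scancode` value in the usage note is only an example of a non-allowed source.

```python
from typing import Any, Dict, List, Optional

TRUSTED_SOURCES = ('component_declared', 'license_file', 'file_header')


def trusted_license_ids(licenses: Optional[List[Dict[str, Any]]]) -> List[Dict[str, str]]:
    """Hypothetical helper: strict source filter plus de-duplication by name."""
    seen = set()
    result: List[Dict[str, str]] = []
    for lic in licenses or []:
        name = lic.get('name')
        # Strict policy: a missing or empty source is rejected as well.
        if lic.get('source') not in TRUSTED_SOURCES:
            continue
        if name and name not in seen:
            result.append({'id': name})
            seen.add(name)
    return result
```

Unlike the SPDX path, this strict variant also drops entries with no source at all, which matches the CycloneDX condition quoted in this comment.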

@github-actions

SCANOSS SCAN Completed 🚀

  • Detected components: 1
  • Undeclared components: 0
  • Declared components: 1
  • Detected files: 83
  • Detected files undeclared: 0
  • Detected files declared: 83
  • Licenses detected: 1
  • Licenses detected with copyleft: 0
  • Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

@eeisegn eeisegn requested a review from matiasdaloia October 21, 2025 12:13
@Alex-1089 Alex-1089 merged commit 099ede5 into main Oct 21, 2025
6 checks passed
@Alex-1089 Alex-1089 deleted the bug/SP-3561_conversion-issues branch October 21, 2025 12:48
 name = license_info.get('name')
 source = license_info.get('source')
-if source not in ("component_declared", "license_file", "file_header"):
+if source not in (None, '') and source not in ("component_declared", "license_file", "file_header"):
Contributor

@matiasdaloia matiasdaloia Oct 21, 2025

This could be simplified to

if source not in (None, '', "component_declared", "license_file", "file_header"):
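
The two forms should be equivalent, since "not in A and not in B" is the same as "not in A + B" for tuple membership. A quick check, as a sketch: the function names and the sample values (including `scancode` and `FILE_HEADER`) are made up for illustration.

```python
ALLOWED = ('component_declared', 'license_file', 'file_header')


def rejected_original(source):
    # Condition as merged in this PR.
    return source not in (None, '') and source not in ALLOWED


def rejected_simplified(source):
    # Condition as proposed in this comment.
    return source not in (None, '', 'component_declared', 'license_file', 'file_header')


samples = [None, '', 'component_declared', 'license_file', 'file_header',
           'scancode', 'FILE_HEADER', 0]
# Collect every sample on which the two predicates disagree.
mismatches = [s for s in samples if rejected_original(s) != rejected_simplified(s)]
```

For these samples `mismatches` comes out empty, so the simplification does not change which licenses are skipped.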

4 participants