Skip to content

fix: prevent PURL double-encoding in ecosyste.ms API calls#131

Merged
vpetersson merged 4 commits intomasterfrom
purl-fixes
Jan 19, 2026
Merged

fix: prevent PURL double-encoding in ecosyste.ms API calls#131
vpetersson merged 4 commits intomasterfrom
purl-fixes

Conversation

@vpetersson
Copy link
Contributor

The ecosyste.ms API was failing to find scoped npm packages because PURLs with %40-encoded namespaces (e.g., pkg:npm/%40scope/name@1.0.0) were being double-encoded when passed as query parameters (%40 → %2540).

Changes:

  • Add purl_to_string() helper that outputs @ instead of %40 for API calls
  • Use purl_to_string() in ecosystems.py to prevent double-encoding
  • Simplify normalize_purl() to only fix actual bugs (%40%40, @@)
  • Preserve canonical %40 encoding in SBOM output per PURL spec
  • Add explicit packageurl-python>=0.17.6 dependency
  • Update tests to verify correct behavior

The PURL spec requires @ in namespaces to be encoded as %40 in canonical form. Our SBOM output correctly uses %40. For API calls, we now use decoded properties (purl.namespace, purl.name) or purl_to_string() to ensure proper single-encoding by the HTTP client.

The ecosyste.ms API was failing to find scoped npm packages because
PURLs with %40-encoded namespaces (e.g., pkg:npm/%40scope/name@1.0.0)
were being double-encoded when passed as query parameters (%40 → %2540).

Changes:
- Add purl_to_string() helper that outputs @ instead of %40 for API calls
- Use purl_to_string() in ecosystems.py to prevent double-encoding
- Simplify normalize_purl() to only fix actual bugs (%40%40, @@)
- Preserve canonical %40 encoding in SBOM output per PURL spec
- Add explicit packageurl-python>=0.17.6 dependency
- Update tests to verify correct behavior

The PURL spec requires @ in namespaces to be encoded as %40 in canonical
form. Our SBOM output correctly uses %40. For API calls, we now use
decoded properties (purl.namespace, purl.name) or purl_to_string() to
ensure proper single-encoding by the HTTP client.
Copilot AI review requested due to automatic review settings January 19, 2026 12:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a double-encoding bug in the ecosyste.ms API integration where scoped npm packages with @ symbols in their namespaces (e.g., @scope/package) were failing to be found. The issue occurred because PURLs with %40-encoded namespaces were being double-encoded to %2540 when used as query parameters.

Changes:

  • Added purl_to_string() helper function that outputs @ instead of %40 for API calls to prevent double-encoding
  • Updated ecosystems.py to use purl_to_string() for API requests
  • Simplified normalize_purl() to only fix actual encoding bugs (%40%40 and @@)
  • Added comprehensive test coverage for PURL normalization and encoding behavior

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
sbomify_action/_enrichment/utils.py Added purl_to_string() helper function to convert PURLs with literal @ for API calls
sbomify_action/_enrichment/sources/ecosystems.py Updated to use purl_to_string() to prevent double-encoding in API calls
sbomify_action/serialization.py Simplified normalize_purl() logic and added _normalize_purls_in_json() helper
sbomify_action/cli/main.py Added PURL normalization to final SBOM output using _normalize_purls_in_json()
pyproject.toml Added explicit packageurl-python>=0.17.6 dependency
tests/test_serialization.py Added comprehensive tests for PURL normalization and encoding behavior
tests/test_augmentation_module.py Updated test to accept both @ and %40 forms of scoped package PURLs

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Changes based on Copilot review comments:

- Rename _normalize_purls_in_json to _fix_purl_encoding_bugs_in_json
  for clearer intent (fixes encoding bugs, not general normalization)

- Improve purl_to_string() docstring with guidance on when to use
  str(purl) vs purl_to_string() for different contexts

- Extract duplicate regex patterns as module-level compiled constants:
  _DOUBLE_ENCODED_AT_PATTERN and _DOUBLE_AT_PATTERN

- Update tests to validate JSON structure by parsing results and
  accessing values through proper JSON paths instead of string matching
Changes based on Copilot review comments:
- Rename _normalize_purls_in_json to _fix_purl_encoding_bugs_in_json
- Improve purl_to_string() docstring with usage guidance
- Extract duplicate regex patterns as module-level constants
- Update tests to validate JSON structure after normalization

Fix CLE Lifecycle Integration Test failure:
- Add NON_SPDX_LICENSES blocklist for licenses that license-expression
  library incorrectly accepts (e.g., Artistic-dist, ClArtistic)
- These are now correctly sanitized from license.id to license.name
@vpetersson vpetersson requested a review from Copilot January 19, 2026 13:34
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Changes based on Copilot review comments (round 1):
- Rename _normalize_purls_in_json to _fix_purl_encoding_bugs_in_json
- Improve purl_to_string() docstring with usage guidance
- Extract duplicate regex patterns as module-level constants
- Update tests to validate JSON structure after normalization

Changes based on Copilot review comments (round 2):
- Add doseq=True to urlencode() for proper list/tuple qualifier handling
- Use more specific PURL regex pattern to avoid unintended replacements
- Move import json to top of test file, remove redundant imports

Fix CLE Lifecycle Integration Test failure:
- Add NON_SPDX_LICENSES blocklist for licenses that license-expression
  library incorrectly accepts (Artistic-dist, ClArtistic)
- These are now correctly sanitized from license.id to license.name
@vpetersson vpetersson requested a review from Copilot January 19, 2026 13:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@vpetersson vpetersson merged commit aa03587 into master Jan 19, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant