fix: prevent PURL double-encoding in ecosyste.ms API calls#131
fix: prevent PURL double-encoding in ecosyste.ms API calls#131vpetersson merged 4 commits intomasterfrom
Conversation
The ecosyste.ms API was failing to find scoped npm packages because PURLs with %40-encoded namespaces (e.g., pkg:npm/%40scope/name@1.0.0) were being double-encoded when passed as query parameters (%40 → %2540). Changes: - Add purl_to_string() helper that outputs @ instead of %40 for API calls - Use purl_to_string() in ecosystems.py to prevent double-encoding - Simplify normalize_purl() to only fix actual bugs (%40%40, @@) - Preserve canonical %40 encoding in SBOM output per PURL spec - Add explicit packageurl-python>=0.17.6 dependency - Update tests to verify correct behavior The PURL spec requires @ in namespaces to be encoded as %40 in canonical form. Our SBOM output correctly uses %40. For API calls, we now use decoded properties (purl.namespace, purl.name) or purl_to_string() to ensure proper single-encoding by the HTTP client.
There was a problem hiding this comment.
Pull request overview
This PR fixes a double-encoding bug in the ecosyste.ms API integration where scoped npm packages with @ symbols in their namespaces (e.g., @scope/package) were failing to be found. The issue occurred because PURLs with %40-encoded namespaces were being double-encoded to %2540 when used as query parameters.
Changes:
- Added
purl_to_string()helper function that outputs@instead of%40for API calls to prevent double-encoding - Updated ecosystems.py to use
purl_to_string()for API requests - Simplified
normalize_purl()to only fix actual encoding bugs (%40%40and@@) - Added comprehensive test coverage for PURL normalization and encoding behavior
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| sbomify_action/_enrichment/utils.py | Added purl_to_string() helper function to convert PURLs with literal @ for API calls |
| sbomify_action/_enrichment/sources/ecosystems.py | Updated to use purl_to_string() to prevent double-encoding in API calls |
| sbomify_action/serialization.py | Simplified normalize_purl() logic and added _normalize_purls_in_json() helper |
| sbomify_action/cli/main.py | Added PURL normalization to final SBOM output using _normalize_purls_in_json() |
| pyproject.toml | Added explicit packageurl-python>=0.17.6 dependency |
| tests/test_serialization.py | Added comprehensive tests for PURL normalization and encoding behavior |
| tests/test_augmentation_module.py | Updated test to accept both @ and %40 forms of scoped package PURLs |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Changes based on Copilot review comments: - Rename _normalize_purls_in_json to _fix_purl_encoding_bugs_in_json for clearer intent (fixes encoding bugs, not general normalization) - Improve purl_to_string() docstring with guidance on when to use str(purl) vs purl_to_string() for different contexts - Extract duplicate regex patterns as module-level compiled constants: _DOUBLE_ENCODED_AT_PATTERN and _DOUBLE_AT_PATTERN - Update tests to validate JSON structure by parsing results and accessing values through proper JSON paths instead of string matching
Changes based on Copilot review comments: - Rename _normalize_purls_in_json to _fix_purl_encoding_bugs_in_json - Improve purl_to_string() docstring with usage guidance - Extract duplicate regex patterns as module-level constants - Update tests to validate JSON structure after normalization Fix CLE Lifecycle Integration Test failure: - Add NON_SPDX_LICENSES blocklist for licenses that license-expression library incorrectly accepts (e.g., Artistic-dist, ClArtistic) - These are now correctly sanitized from license.id to license.name
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Changes based on Copilot review comments (round 1): - Rename _normalize_purls_in_json to _fix_purl_encoding_bugs_in_json - Improve purl_to_string() docstring with usage guidance - Extract duplicate regex patterns as module-level constants - Update tests to validate JSON structure after normalization Changes based on Copilot review comments (round 2): - Add doseq=True to urlencode() for proper list/tuple qualifier handling - Use more specific PURL regex pattern to avoid unintended replacements - Move import json to top of test file, remove redundant imports Fix CLE Lifecycle Integration Test failure: - Add NON_SPDX_LICENSES blocklist for licenses that license-expression library incorrectly accepts (Artistic-dist, ClArtistic) - These are now correctly sanitized from license.id to license.name
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The ecosyste.ms API was failing to find scoped npm packages because PURLs with %40-encoded namespaces (e.g., pkg:npm/%40scope/name@1.0.0) were being double-encoded when passed as query parameters (%40 → %2540).
Changes:
The PURL spec requires @ in namespaces to be encoded as %40 in canonical form. Our SBOM output correctly uses %40. For API calls, we now use decoded properties (purl.namespace, purl.name) or purl_to_string() to ensure proper single-encoding by the HTTP client.