feat(enrichment): add dedicated Lifecycle plugin for CLE data #123

Merged
vpetersson merged 14 commits into master from cle
Jan 18, 2026

@vpetersson

Separate lifecycle (CLE) data from License DB into a dedicated enrichment source following the plugin architecture.

Changes:

  • Add lifecycle_data.py with lifecycle data for distros and packages
  • Add LifecycleSource plugin (priority 5) with intelligent lookup:
    • Package-specific lifecycle (Python, PHP, Go, Rust, Django, Rails, Laravel, React, Vue) takes precedence
    • Distro lifecycle (Alpine, Ubuntu, Rocky, etc.) as fallback
  • Remove CLE fields from LicenseDBSource (now license-only)
  • Move DISTRO_LIFECYCLE from license_db_generator to lifecycle_data
  • Register LifecycleSource in create_default_registry()
  • Update README with Lifecycle Enrichment documentation

The Lifecycle and License DB plugins now have clear responsibilities:

  • License DB: license, description, supplier, homepage
  • Lifecycle: CLE dates (release, end-of-support, end-of-life)

Both sources are invoked independently and results are merged.
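The lookup order and merge step described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the table keys, the placeholder dates, and the helper names `lookup_lifecycle` and `merge_results` are assumptions, and "earlier source wins on conflict" is one possible merge rule.

```python
# Placeholder tables: keys and dates are illustrative, not the PR's real data.
PACKAGE_LIFECYCLE = {("example-runtime", "1.0"): {"end_of_life": "2000-01-01"}}
DISTRO_LIFECYCLE = {("example-distro", "1"): {"end_of_life": "2001-01-01"}}


def lookup_lifecycle(name, version, distro=None):
    """Return the package-specific entry if present, else the distro entry."""
    entry = PACKAGE_LIFECYCLE.get((name, version))
    if entry is not None:
        return entry
    if distro is not None:
        return DISTRO_LIFECYCLE.get(distro)
    return None


def merge_results(*results):
    """Merge per-source dicts; earlier sources win on conflicting keys (assumed rule)."""
    merged = {}
    for result in results:
        for key, value in (result or {}).items():
            merged.setdefault(key, value)
    return merged


# Each source is queried on its own, then the results are combined.
license_fields = {"license": "MIT", "homepage": "https://example.org"}
lifecycle_fields = lookup_lifecycle("example-runtime", "1.0")
combined = merge_results(license_fields, lifecycle_fields)
```

Because the two sources contribute disjoint fields (license metadata versus CLE dates), the merge rule only matters if a future source overlaps with one of them.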

Copilot AI review requested due to automatic review settings January 18, 2026 09:52

Copilot AI left a comment

Pull request overview

This PR separates lifecycle (CLE) data from the License DB into a dedicated enrichment plugin following the established plugin architecture. The Lifecycle source now provides Common Lifecycle Enumeration dates for both Linux distributions and language runtimes/frameworks, with intelligent lookup prioritizing package-specific lifecycle data over distro-level fallbacks.

Changes:

  • Introduced a dedicated LifecycleSource plugin (priority 5) for CLE data enrichment
  • Moved DISTRO_LIFECYCLE from license_db_generator to a new lifecycle_data.py module
  • Added comprehensive PACKAGE_LIFECYCLE data for language runtimes (Python, PHP, Go, Rust) and frameworks (Django, Rails, Laravel, React, Vue)

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| sbomify_action/_enrichment/lifecycle_data.py | New module containing DISTRO_LIFECYCLE and PACKAGE_LIFECYCLE data with helper functions |
| sbomify_action/_enrichment/sources/lifecycle.py | New LifecycleSource plugin implementing intelligent CLE lookup with package/distro fallback |
| sbomify_action/_enrichment/sources/license_db.py | Removed CLE fields, now focuses solely on license data |
| sbomify_action/_enrichment/enricher.py | Registered LifecycleSource and added cache clearing |
| tests/test_lifecycle_enrichment.py | Comprehensive test coverage for lifecycle functionality |
| README.md | Updated documentation with new Lifecycle Enrichment section |

@vpetersson vpetersson requested a review from Copilot January 18, 2026 09:55

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated no new comments.


… packages

Remove misleading distro-level lifecycle inference from PURLs. The PURL
namespace (e.g., pkg:deb/ubuntu/curl) doesn't indicate the actual OS
version, making distro lifecycle assignment unreliable for arbitrary
packages.

Now:
- OS components (CycloneDX type: operating-system) get CLE via name/version
- Only explicitly tracked runtimes/frameworks get CLE via PURL patterns
- Arbitrary OS packages (curl, nginx, etc.) correctly return no lifecycle

Also:
- Add Debian lifecycle data (10, 11, 12, 13)
- Add name mappings: alma→almalinux, amazon→amazonlinux
- Fix Amazon publisher to "Amazon Web Services, Inc. (AWS)"
- Handle complex version strings (e.g., "2023.10.20260105 (Amazon Linux)")
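A rough sketch of how OS-component matching with name aliases and messy version strings could work. Only the two alias mappings and the `operating-system` component type come from the commit message; the function names, the table shape, and the "try progressively shorter version prefixes" strategy are assumptions made for illustration.

```python
import re
from typing import List, Optional

# Name mappings listed in the commit message above.
NAME_ALIASES = {"alma": "almalinux", "amazon": "amazonlinux"}


def normalize_distro_name(name: str) -> str:
    """Lower-case the name and apply known aliases."""
    key = name.strip().lower()
    return NAME_ALIASES.get(key, key)


def version_candidates(version: str) -> List[str]:
    """Extract the leading dotted number and yield it from longest to shortest prefix."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", version)
    if not match:
        return []
    parts = match.group(1).split(".")
    return [".".join(parts[:n]) for n in range(len(parts), 0, -1)]


def lookup_os_lifecycle(component: dict, table: dict) -> Optional[dict]:
    """Only CycloneDX components of type operating-system are considered."""
    if component.get("type") != "operating-system":
        return None
    name = normalize_distro_name(component.get("name", ""))
    for candidate in version_candidates(component.get("version", "")):
        entry = table.get((name, candidate))
        if entry is not None:
            return entry
    return None
```

With a hypothetical table keyed by `("amazonlinux", "2023")`, a component named `amazon` with version `2023.10.20260105 (Amazon Linux)` would resolve to that entry, while a `library` component such as curl returns `None` regardless of its PURL namespace.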
@vpetersson vpetersson requested a review from Copilot January 18, 2026 10:41

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated no new comments.


Copilot AI review requested due to automatic review settings January 18, 2026 11:19

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated no new comments.


Copilot AI review requested due to automatic review settings January 18, 2026 12:39

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated no new comments.


Copilot AI review requested due to automatic review settings January 18, 2026 12:46

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated no new comments.


- Add percentage progress logging to all generators showing
  "Processed X/Y (Z%) - N valid licenses..." for better visibility

- Parallelize Ubuntu/Debian .deb downloads using ThreadPoolExecutor
  (default 20 workers, configurable via SBOMIFY_LICENSE_DB_WORKERS)
  This provides ~10-20x speedup for these slow generators

- Fix Debian package index fetching to fall back from .gz to .xz
  format (bookworm-updates only provides .xz)

- Collect all packages upfront before processing to enable
  accurate progress tracking and percentage calculation
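The combination of upfront collection, a thread pool, and percentage logging might look roughly like this. The environment variable name, the default of 20 workers, and the log message format are taken from the commit message; `worker_count`, `process_all`, and the `extract_license` callback are hypothetical names used only for this sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def worker_count(default: int = 20) -> int:
    """Read the worker count from SBOMIFY_LICENSE_DB_WORKERS, falling back to the default."""
    raw = os.environ.get("SBOMIFY_LICENSE_DB_WORKERS", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def process_all(packages, extract_license, log=print, every=100):
    """Process a pre-collected package list in parallel and log percentage progress."""
    total = len(packages)  # known upfront, so a percentage can be computed
    results = {}
    valid = 0
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        futures = {pool.submit(extract_license, pkg): pkg for pkg in packages}
        for done, future in enumerate(as_completed(futures), start=1):
            pkg = futures[future]
            try:
                license_id = future.result()
            except Exception:
                license_id = None  # a failed package should not abort the run
            if license_id:
                results[pkg] = license_id
                valid += 1
            if done % every == 0 or done == total:
                log(f"Processed {done}/{total} ({done * 100 // total}%) - {valid} valid licenses...")
    return results
```

Threads suit this workload because the time is dominated by network downloads rather than CPU work; `extract_license` is expected to return a license string or `None`.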
@vpetersson vpetersson requested a review from Copilot January 18, 2026 13:34

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated no new comments.


…akes a long time) and fixes up documentation
…g .deb packages

Previously, generating Ubuntu/Debian license databases required downloading
and extracting entire .deb packages to get copyright files. Large packages
(linux-image, chromium) could be gigabytes when extracted, causing CI runners
to run out of disk space.

Now we fetch copyright files directly from the distro changelog servers:
- Ubuntu: changelogs.ubuntu.com
- Debian: metadata.ftp-master.debian.org

This uses no disk space for the vast majority of packages. The generator falls
back to .deb extraction (with targeted tar -O extraction) only if the HTTP fetch fails.

Also reduced default parallel workers from 20 to 5 to limit disk usage
if the fallback path is triggered.
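The HTTP-first strategy with a .deb fallback can be sketched as below. Both callables are injected and hypothetical: `fetch_http` stands in for a request against the changelog servers named above, and `extract_from_deb` for the targeted extraction path. The actual URL layout and function names in the generator are not shown in this PR.

```python
from typing import Callable, Optional

Fetcher = Callable[[str, str], Optional[str]]


def get_copyright(
    package: str,
    version: str,
    fetch_http: Fetcher,
    extract_from_deb: Fetcher,
) -> Optional[str]:
    """Try the cheap HTTP path first; download the .deb only if that fails."""
    try:
        text = fetch_http(package, version)
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, as are timeouts.
        text = None
    if text:
        return text
    return extract_from_deb(package, version)
```

Keeping the two paths behind plain callables makes the fallback order easy to test without network access or disk usage.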
@vpetersson vpetersson merged commit 1167e3f into master Jan 18, 2026
37 checks passed