Add RHEL support to osv-processor#43277

Open
mostlikelee wants to merge 3 commits into main from tlee/43183-osv-processor-rhel

Conversation

@mostlikelee
Contributor

@mostlikelee mostlikelee commented Apr 8, 2026

Resolves #43183

Summary

Extends the osv-processor CI tool to generate RHEL OSV artifacts from Red Hat's GCS-published vulnerability data. Adds a --platform rhel flag that processes Red Hat:enterprise_linux ecosystem entries, collapses repository/variant suffixes (appstream, baseos, server, workstation, etc.) to major version, and deduplicates CVE+package pairs across ecosystems.
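
The suffix-collapsing step can be sketched roughly as below. This is a minimal illustration, not the PR's implementation: the helper name `majorFromEcosystem` is hypothetical, and the ecosystem shape `Red Hat:enterprise_linux:<version>::<variant>` is an assumption about the feed.

```go
package main

import (
	"fmt"
	"strings"
)

// rhelPrefix is the ecosystem prefix named in the PR summary.
const rhelPrefix = "Red Hat:enterprise_linux"

// majorFromEcosystem is a hypothetical helper. It assumes ecosystem strings
// look like "Red Hat:enterprise_linux:9::appstream" and returns the major
// version ("9"), ignoring any minor version and repository/variant suffix.
func majorFromEcosystem(ecosystem string) (string, bool) {
	rest, ok := strings.CutPrefix(ecosystem, rhelPrefix+":")
	if !ok {
		return "", false
	}
	// Keep only the first colon-separated field, e.g. "9" or "8.6".
	version, _, _ := strings.Cut(rest, ":")
	// Drop a minor version if one is present.
	major, _, _ := strings.Cut(version, ".")
	if major == "" {
		return "", false
	}
	// Reject anything that is not purely numeric.
	for _, r := range major {
		if r < '0' || r > '9' {
			return "", false
		}
	}
	return major, true
}

func main() {
	for _, eco := range []string{
		"Red Hat:enterprise_linux:9::appstream",
		"Red Hat:enterprise_linux:8::baseos",
		"Red Hat:enterprise_linux:7::server",
		"Ubuntu:22.04:LTS",
	} {
		major, ok := majorFromEcosystem(eco)
		fmt.Printf("%q -> %q (%v)\n", eco, major, ok)
	}
}
```

Under these assumptions, every repository or variant of the same major release maps to one key, which is what lets a single artifact per major version be produced.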

How to test locally

Download the Red Hat OSV feed from GCS and run the processor:

# Download (~23MB)
curl -sL "https://storage.googleapis.com/osv-vulnerabilities/Red%20Hat/all.zip" \
  -o /tmp/rhel-osv.zip
unzip -q /tmp/rhel-osv.zip -d /tmp/rhel-osv

# Run processor
go run ./cmd/osv-processor \
  --platform rhel \
  --input /tmp/rhel-osv \
  --output /tmp/rhel-osv-artifacts \
  --versions "7,8,9,10"

# Inspect output
ls -lh /tmp/rhel-osv-artifacts/

Expected output (19,084 advisories, ~4 seconds):

RHEL 7:  4,041 packages, 4,610 CVEs  (~335KB)
RHEL 8:  6,764 packages, 6,868 CVEs  (~1.1MB)
RHEL 9:  4,415 packages, 5,922 CVEs  (~1.3MB)
RHEL 10: 1,851 packages, 966 CVEs    (~252KB)

Related

Summary by CodeRabbit

  • New Features

    • Added RHEL platform support with a new --platform flag to select between Ubuntu and RHEL processing modes
    • RHEL mode includes version-specific filtering and CVE extraction with deduplication capabilities
  • Tests

    • Added comprehensive test coverage for RHEL processing, including version parsing, CVE extraction logic, and artifact generation validation

Extends the osv-processor CI tool to generate RHEL OSV artifacts from
Red Hat's GCS data.

- Add --platform flag (ubuntu default, also accepts rhel)
- Add extractRHELVersion() for parsing Red Hat ecosystem strings
- Collapse repository/variant suffixes (appstream, baseos, server,
  workstation, etc.) to major version
- Deduplicate CVE+package pairs across ecosystems
- Output: osv-rhel-{VERSION}-{YYYY-MM-DD}.json.gz
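
A rough sketch of how an artifact following the `osv-rhel-{VERSION}-{YYYY-MM-DD}.json.gz` naming could be produced. The `artifact` struct is a simplified stand-in for the PR's `RHELArtifactData`: the JSON tag names are assumptions, `Vulnerabilities` maps a package to a plain list of CVE IDs instead of full records, and `artifactName`, `encodeArtifact` and `exampleDay` are hypothetical names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"time"
)

// artifact is a simplified stand-in for RHELArtifactData.
// The JSON tag names are assumptions, not taken from the PR.
type artifact struct {
	SchemaVersion   string              `json:"schema_version"`
	RHELVersion     string              `json:"rhel_version"`
	Vulnerabilities map[string][]string `json:"vulnerabilities"`
}

// exampleDay is a fixed date so the example is deterministic.
var exampleDay = time.Date(2026, time.April, 8, 0, 0, 0, 0, time.UTC)

// artifactName builds "osv-rhel-{VERSION}-{YYYY-MM-DD}.json.gz".
func artifactName(version string, day time.Time) string {
	return fmt.Sprintf("osv-rhel-%s-%s.json.gz", version, day.Format("2006-01-02"))
}

// encodeArtifact returns the gzip-compressed JSON encoding of a.
func encodeArtifact(a artifact) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := json.NewEncoder(zw).Encode(a); err != nil {
		return nil, err
	}
	// Close writes the gzip footer; skipping it leaves a truncated stream.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := encodeArtifact(artifact{
		SchemaVersion:   "1.0",
		RHELVersion:     "9",
		Vulnerabilities: map[string][]string{"example-pkg": {"CVE-0000-0001"}}, // made-up data
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d bytes\n", artifactName("9", exampleDay), len(data))
}
```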
Contributor

Copilot AI left a comment


Pull request overview

Extends the cmd/osv-processor tool to support generating RHEL OSV artifacts from Red Hat’s OSV data feed by adding a --platform rhel mode with RHEL ecosystem parsing, CVE extraction, and cross-ecosystem de-duplication.

Changes:

  • Added --platform flag and config plumbing to choose between Ubuntu and RHEL processing paths.
  • Implemented RHEL artifact generation (runRHEL) including enterprise_linux version extraction and CVE list extraction per advisory.
  • Added unit tests covering RHEL ecosystem parsing, CVE extraction, artifact generation, de-duplication, and version filtering.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
cmd/osv-processor/main.go | Adds platform selection and new RHEL processing pipeline + artifact format.
cmd/osv-processor/main_test.go | Adds test coverage for new RHEL parsing/extraction and artifact generation behavior.

@codecov

codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 61.07383% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.85%. Comparing base (61c89ae) to head (13ad378).
⚠️ Report is 38 commits behind head on main.

Files with missing lines | Patch % | Lines
cmd/osv-processor/main.go | 61.07% | 42 Missing and 16 partials ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main   #43277    +/-   ##
========================================
  Coverage   66.85%   66.85%            
========================================
  Files        2587     2588     +1     
  Lines      207405   207707   +302     
  Branches     9170     9170            
========================================
+ Hits       138653   138861   +208     
- Misses      56143    56198    +55     
- Partials    12609    12648    +39     
Flag | Coverage Δ
backend | 68.64% <61.07%> (+<0.01%) ⬆️


@mostlikelee mostlikelee marked this pull request as ready for review April 8, 2026 19:38
@mostlikelee mostlikelee requested a review from a team as a code owner April 8, 2026 19:38

@claude claude bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

@dantecatalfamo
Member

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Apr 10, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Contributor

coderabbitai bot commented Apr 10, 2026

Walkthrough

This PR extends the OSV processor to support RHEL (Red Hat Enterprise Linux) feeds alongside the existing Ubuntu functionality. A new --platform flag allows routing between the existing run() function for Ubuntu and a new runRHEL() function for RHEL. The RHEL implementation introduces a dedicated processing pipeline that parses .json OSV files, extracts CVEs from multiple fields, filters by RHEL version, deduplicates vulnerabilities across ecosystems, and outputs compressed artifact files. CLI flag defaults have been updated to reflect platform-specific behavior, with delta generation limited to Ubuntu and RHEL input defaulting to /tmp/rhel-osv. Tests validate ecosystem parsing, CVE extraction, artifact generation, and deduplication logic.
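
The flag-based routing described in the walkthrough could look roughly like this. It is a sketch under stated assumptions, not the PR's code: `config` and `parseArgs` are hypothetical names, only two flags are modeled, and `main` prints which function it would call instead of invoking `run()` or `runRHEL()`. The `ubuntu` default and the `/tmp/rhel-osv` input default come from the PR text.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the parsed CLI options; the field names are illustrative.
type config struct {
	platform string
	input    string
}

// parseArgs uses a private FlagSet so it can be exercised without os.Args.
func parseArgs(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("osv-processor", flag.ContinueOnError)
	fs.StringVar(&cfg.platform, "platform", "ubuntu", "platform to process: ubuntu or rhel")
	fs.StringVar(&cfg.input, "input", "", "input directory (default depends on platform)")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	switch cfg.platform {
	case "ubuntu":
		// Ubuntu keeps whatever defaults the existing tool applies.
	case "rhel":
		if cfg.input == "" {
			cfg.input = "/tmp/rhel-osv" // RHEL input default stated in the walkthrough
		}
	default:
		return cfg, fmt.Errorf("unsupported platform %q", cfg.platform)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	switch cfg.platform {
	case "rhel":
		fmt.Println("would call runRHEL with input", cfg.input)
	default:
		fmt.Println("would call run (Ubuntu)")
	}
}
```

Rejecting unknown platform values up front keeps a typo such as `--platform rhle` from silently falling through to the Ubuntu path.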

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title accurately summarizes the main change: adding RHEL support to the osv-processor tool.
Description check | ✅ Passed | The PR description clearly explains the changes and includes the related issue, testing instructions, and expected outputs. However, the description template checklist is not completed.
Linked Issues check | ✅ Passed | The PR implements RHEL OSV feed generation following the Ubuntu pattern, as required by issue #43183.
Out of Scope Changes check | ✅ Passed | All changes are focused on adding RHEL support to osv-processor (new platform flag, RHEL data structures, and related tests), with no unrelated modifications.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/osv-processor/main.go`:
- Around line 660-698: The dedupe uses the raw key (seen[rhelVer][vulnKey{pkg:
packageName, cve: cveID}]) before calling transformVuln, so normalized package
names/CVEs or modified vuln data get dropped; move the de-duplication check to
after calling transformVuln and computing vulnToUse and the transformed pkg
(i.e., perform seen lookup/insert using vulnKey{pkg: pkg, cve: vulnToUse.CVE}),
and when a duplicate is detected compare the existing recorded ProcessedVuln in
artifacts[rhelVer].Vulnerabilities[pkg] to vulnToUse and handle conflicts
explicitly (log or merge) instead of silently skipping to ensure differing
Fixed/Versions are not lost; update references to seen, vulnKey, transformVuln,
vulnToUse, artifacts and RHELArtifactData.Vulnerabilities accordingly.
- Around line 734-757: Function extractCVEIDs currently returns CVEs only from
Upstream or, as a fallback, from Related/ID; change it to build a union across
all supported fields (osv.Upstream, osv.Related, and osv.ID) so no CVEs are
dropped. Iterate all three sources, add any string starting with "CVE-" to the
result while deduplicating (use a map/set keyed by the CVE string) before
returning the slice. Keep the function name extractCVEIDs and ensure the
returned order is stable (e.g., append in the order Upstream, Related, then ID
if not already present).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a44c7e67-93f9-48eb-9516-62a80bec3b0f

📥 Commits

Reviewing files that changed from the base of the PR and between 3202402 and 13ad378.

📒 Files selected for processing (2)
  • cmd/osv-processor/main.go
  • cmd/osv-processor/main_test.go

Comment on lines +660 to +698
for _, cveID := range cveIDs {
	// Deduplicate: same CVE+package can appear in baseos, appstream, crb
	if seen[rhelVer] == nil {
		seen[rhelVer] = make(map[vulnKey]struct{})
	}
	key := vulnKey{pkg: packageName, cve: cveID}
	if _, exists := seen[rhelVer][key]; exists {
		continue
	}
	seen[rhelVer][key] = struct{}{}

	vuln := ProcessedVuln{
		CVE:        cveID,
		Published:  osvData.Published,
		Modified:   osvData.Modified,
		Introduced: introduced,
		Fixed:      fixed,
		Versions:   affected.Versions,
	}

	packages, modifiedVuln := transformVuln(packageName, cveID, &vuln)
	if packages == nil {
		continue
	}
	vulnToUse := &vuln
	if modifiedVuln != nil {
		vulnToUse = modifiedVuln
	}

	for _, pkg := range packages {
		if _, exists := artifacts[rhelVer]; !exists {
			artifacts[rhelVer] = &RHELArtifactData{
				SchemaVersion:   "1.0",
				RHELVersion:     rhelVer,
				Vulnerabilities: make(map[string][]ProcessedVuln),
			}
		}
		artifacts[rhelVer].Vulnerabilities[pkg] = append(artifacts[rhelVer].Vulnerabilities[pkg], *vulnToUse)
	}
Contributor


⚠️ Potential issue | 🟠 Major

The dedupe key doesn't match the record you actually emit.

seen is populated from the raw packageName/cveID pair before transformVuln runs, but the artifact is written with the transformed package/CVE. That means normalization happens after the duplicate check, and a later entry with different Fixed/Versions data is dropped purely because it was encountered second. Deduplicate on the emitted pkg + vulnToUse.CVE instead, and make conflicts explicit instead of silently picking one.

Suggested direction
-				key := vulnKey{pkg: packageName, cve: cveID}
-				if _, exists := seen[rhelVer][key]; exists {
-					continue
-				}
-				seen[rhelVer][key] = struct{}{}
-
 				vuln := ProcessedVuln{
 					CVE:        cveID,
 					Published:  osvData.Published,
 					Modified:   osvData.Modified,
 					Introduced: introduced,
 					Fixed:      fixed,
 					Versions:   affected.Versions,
 				}
@@
 				vulnToUse := &vuln
 				if modifiedVuln != nil {
 					vulnToUse = modifiedVuln
 				}
 
 				for _, pkg := range packages {
+					key := vulnKey{pkg: pkg, cve: vulnToUse.CVE}
+					if _, exists := seen[rhelVer][key]; exists {
+						continue
+					}
+					seen[rhelVer][key] = struct{}{}
+
 					if _, exists := artifacts[rhelVer]; !exists {
 						artifacts[rhelVer] = &RHELArtifactData{
 							SchemaVersion:   "1.0",
 							RHELVersion:     rhelVer,
 							Vulnerabilities: make(map[string][]ProcessedVuln),
📝 Committable suggestion


Suggested change
for _, cveID := range cveIDs {
	// Deduplicate: same CVE+package can appear in baseos, appstream, crb
	if seen[rhelVer] == nil {
		seen[rhelVer] = make(map[vulnKey]struct{})
	}
	vuln := ProcessedVuln{
		CVE:        cveID,
		Published:  osvData.Published,
		Modified:   osvData.Modified,
		Introduced: introduced,
		Fixed:      fixed,
		Versions:   affected.Versions,
	}
	packages, modifiedVuln := transformVuln(packageName, cveID, &vuln)
	if packages == nil {
		continue
	}
	vulnToUse := &vuln
	if modifiedVuln != nil {
		vulnToUse = modifiedVuln
	}
	for _, pkg := range packages {
		key := vulnKey{pkg: pkg, cve: vulnToUse.CVE}
		if _, exists := seen[rhelVer][key]; exists {
			continue
		}
		seen[rhelVer][key] = struct{}{}
		if _, exists := artifacts[rhelVer]; !exists {
			artifacts[rhelVer] = &RHELArtifactData{
				SchemaVersion:   "1.0",
				RHELVersion:     rhelVer,
				Vulnerabilities: make(map[string][]ProcessedVuln),
			}
		}
		artifacts[rhelVer].Vulnerabilities[pkg] = append(artifacts[rhelVer].Vulnerabilities[pkg], *vulnToUse)
	}
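
The review also asks for conflicts to be made explicit rather than silently skipped. One possible shape for that, as a self-contained sketch: `record` and `deduper` are hypothetical simplifications (the real `ProcessedVuln` carries more fields), only `Fixed` is compared, the per-RHEL-version map is omitted (one `deduper` per version is assumed), and the CVE IDs and versions are made up.

```go
package main

import "fmt"

// vulnKey and record are simplified stand-ins for the PR's vulnKey and
// ProcessedVuln types.
type vulnKey struct{ pkg, cve string }

type record struct {
	CVE   string
	Fixed string
}

// deduper tracks emitted package+CVE pairs and remembers conflicting repeats.
type deduper struct {
	seen      map[vulnKey]record
	conflicts []string
}

func newDeduper() *deduper {
	return &deduper{seen: make(map[vulnKey]record)}
}

// add reports whether r should be emitted for pkg. An identical repeat is
// dropped quietly; a repeat with a different Fixed version is also dropped,
// but is recorded in conflicts so the caller can log or merge it.
func (d *deduper) add(pkg string, r record) bool {
	key := vulnKey{pkg: pkg, cve: r.CVE}
	prev, exists := d.seen[key]
	if !exists {
		d.seen[key] = r
		return true
	}
	if prev.Fixed != r.Fixed {
		d.conflicts = append(d.conflicts,
			fmt.Sprintf("%s %s: kept fixed=%q, dropped fixed=%q", pkg, r.CVE, prev.Fixed, r.Fixed))
	}
	return false
}

func main() {
	d := newDeduper()
	fmt.Println(d.add("example-pkg", record{CVE: "CVE-0000-0001", Fixed: "1.0-1"})) // first sighting
	fmt.Println(d.add("example-pkg", record{CVE: "CVE-0000-0001", Fixed: "1.0-1"})) // identical repeat
	fmt.Println(d.add("example-pkg", record{CVE: "CVE-0000-0001", Fixed: "1.0-2"})) // conflicting repeat
	for _, c := range d.conflicts {
		fmt.Println("conflict:", c)
	}
}
```

Whether a conflict should be logged, merged, or resolved by preferring the higher fixed version is a policy choice this sketch leaves open.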

Comment on lines +734 to +757
// extractCVEIDs returns all CVE IDs from an OSV entry.
// RHEL advisories list CVEs in the "upstream" field (same as Ubuntu).
func extractCVEIDs(osv *OSVData) []string {
	var cves []string
	for _, upstream := range osv.Upstream {
		if strings.HasPrefix(upstream, "CVE-") {
			cves = append(cves, upstream)
		}
	}
	// Fallback: check Related field
	if len(cves) == 0 {
		for _, related := range osv.Related {
			if strings.HasPrefix(related, "CVE-") {
				cves = append(cves, related)
			}
		}
	}
	// Fallback: check ID itself
	if len(cves) == 0 {
		if strings.HasPrefix(osv.ID, "CVE-") {
			cves = append(cves, osv.ID)
		}
	}
	return cves
Contributor


⚠️ Potential issue | 🟠 Major

extractCVEIDs is dropping valid CVEs from mixed-field advisories.

The helper says it returns all CVE IDs, but Related and ID are only consulted when Upstream produced none. An advisory with Upstream=["CVE-1"] and Related=["CVE-2"] will emit only CVE-1, so the generated RHEL artifact is incomplete. Build a union across all supported fields and dedupe as you append.

Suggested fix
 func extractCVEIDs(osv *OSVData) []string {
 	var cves []string
+	seen := make(map[string]struct{})
+	add := func(id string) {
+		if !strings.HasPrefix(id, "CVE-") {
+			return
+		}
+		if _, ok := seen[id]; ok {
+			return
+		}
+		seen[id] = struct{}{}
+		cves = append(cves, id)
+	}
+
 	for _, upstream := range osv.Upstream {
-		if strings.HasPrefix(upstream, "CVE-") {
-			cves = append(cves, upstream)
-		}
+		add(upstream)
 	}
-	// Fallback: check Related field
-	if len(cves) == 0 {
-		for _, related := range osv.Related {
-			if strings.HasPrefix(related, "CVE-") {
-				cves = append(cves, related)
-			}
-		}
+
+	for _, related := range osv.Related {
+		add(related)
 	}
-	// Fallback: check ID itself
-	if len(cves) == 0 {
-		if strings.HasPrefix(osv.ID, "CVE-") {
-			cves = append(cves, osv.ID)
-		}
-	}
+
+	add(osv.ID)
 	return cves
 }
📝 Committable suggestion


Suggested change
// extractCVEIDs returns all CVE IDs from an OSV entry.
// RHEL advisories list CVEs in the "upstream" field (same as Ubuntu).
func extractCVEIDs(osv *OSVData) []string {
	var cves []string
	seen := make(map[string]struct{})
	add := func(id string) {
		if !strings.HasPrefix(id, "CVE-") {
			return
		}
		if _, ok := seen[id]; ok {
			return
		}
		seen[id] = struct{}{}
		cves = append(cves, id)
	}
	for _, upstream := range osv.Upstream {
		add(upstream)
	}
	for _, related := range osv.Related {
		add(related)
	}
	add(osv.ID)
	return cves
}
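
To check the mixed-field case from the review (Upstream=["CVE-1"], Related=["CVE-2"]) in isolation, the union logic can be exercised without the `OSVData` type. The sketch below uses a hypothetical variadic helper, `collectCVEs`, that takes the raw string slices directly; it illustrates the same union-and-dedupe idea and is not a drop-in replacement for `extractCVEIDs`.

```go
package main

import (
	"fmt"
	"strings"
)

// collectCVEs is a hypothetical variadic variant of the union approach: it
// returns every "CVE-"-prefixed ID across all sources, de-duplicated, in
// first-seen order.
func collectCVEs(sources ...[]string) []string {
	var cves []string
	seen := make(map[string]struct{})
	for _, src := range sources {
		for _, id := range src {
			if !strings.HasPrefix(id, "CVE-") {
				continue
			}
			if _, dup := seen[id]; dup {
				continue
			}
			seen[id] = struct{}{}
			cves = append(cves, id)
		}
	}
	return cves
}

func main() {
	upstream := []string{"CVE-1"}
	related := []string{"CVE-2", "CVE-1"}
	id := "RHSA-EXAMPLE" // placeholder advisory ID, not a CVE

	// Sources are passed in priority order: Upstream, Related, then ID.
	fmt.Println(collectCVEs(upstream, related, []string{id})) // prints [CVE-1 CVE-2]
}
```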

Development

Successfully merging this pull request may close these issues.

Generate OSV feeds

3 participants