fix: populate product version in CPE output #2509

Open
SaInekK wants to merge 13 commits into projectdiscovery:dev from SaInekK:fix-cpe-version-enrichment

SaInekK commented Jun 3, 2026

Proposed changes

Fixes #2476

Previously, httpx always emitted * for the version field of CPE 2.3 strings (cpe:2.3:a:vendor:product:*:...), even when the version was known. This PR fills that field using the version that wappalyzer already extracts during technology detection.

What changed:

  • Version enrichment — detected technology versions (wappalyzer's Name:version entries) are parsed into a lookup map and matched against CPE product names (case-insensitive). When a match is found, the CPE's version field is populated, e.g. cpe:2.3:a:vercel:next.js:14.2.3:*:.... Inputs are never mutated; when no version is known the CPE keeps its * (no regression).
  • The actual #2476 root cause: bare httpx -cpe did not enable technology detection, so wappalyzer never ran and versions could never be populated. All tech-detect triggers (-tech-detect, JSON/CSV output, asset-upload, and now -cpe) are now routed through a single techDetectRequired predicate, so -cpe alone turns detection on.
  • Hardening:
    • Previously, -cpe alone left the wappalyzer client nil while tech detection was enabled, which caused a nil-pointer panic. The init gate now uses the same techDetectRequired predicate, so the invariant "tech-detect enabled ⇒ wappalyzer initialized" holds at every call site.
    • setCPEVersion validates that a CPE 2.3 string has exactly 13 fields. It skips enrichment when the version contains reserved or structural characters (:, *, ?) instead of emitting a malformed value.
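The following is a minimal sketch of the setCPEVersion guard described above, for illustration only. The signature (cpe, version string) (string, bool) and the choice to fill only a "*" version field are assumptions, not the PR's actual code. The sketch also assumes the CPE contains no escaped colons (\:), which a plain split would miscount.

```go
package main

import (
	"fmt"
	"strings"
)

// cpe23FieldCount is the number of colon-separated components in a CPE 2.3
// formatted string: "cpe", "2.3", part, vendor, product, version, update,
// edition, language, sw_edition, target_sw, target_hw, other.
const cpe23FieldCount = 13

// cpe23VersionIndex is the zero-based position of the version component.
const cpe23VersionIndex = 5

// setCPEVersion (hypothetical sketch) returns the enriched CPE and true, or
// the unchanged CPE and false when enrichment is skipped.
// Assumption: the input contains no escaped colons.
func setCPEVersion(cpe, version string) (string, bool) {
	// Reject versions with structural (":") or wildcard ("*", "?") characters.
	if version == "" || strings.ContainsAny(version, ":*?") {
		return cpe, false
	}
	fields := strings.Split(cpe, ":")
	if len(fields) != cpe23FieldCount || fields[0] != "cpe" || fields[1] != "2.3" {
		return cpe, false
	}
	// Assumption: only an unknown ("*") version is filled; a concrete one is kept.
	if fields[cpe23VersionIndex] != "*" {
		return cpe, false
	}
	fields[cpe23VersionIndex] = version
	return strings.Join(fields, ":"), true
}

func main() {
	out, ok := setCPEVersion("cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*", "14.2.3")
	fmt.Println(out, ok) // cpe:2.3:a:vercel:next.js:14.2.3:*:*:*:*:*:*:* true

	_, ok = setCPEVersion("cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*", "14.*")
	fmt.Println(ok) // false: "*" is reserved
}
```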

Proof

Before: cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*
After: cpe:2.3:a:vercel:next.js:14.2.3:*:*:*:*:*:*:*

  • New unit tests in runner/cpe_test.go cover the version-string helpers, the techDetectRequired predicate (including the -cpe-alone case from #2476), the 13-field/reserved-char guards, and input immutability.
  • New functional testcase: scanme.sh {{binary}} -cpe -silent.
  • Verified locally against the full CI pipeline: golangci-lint (0 issues), go vet ./..., go build ./..., go build -race ./..., go test ./..., integration tests (21 passed), functional tests (24 passed), and an end-to-end run of httpx -cpe -silent against a live target (no panic).

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • Improvements
    • CPE output now includes detected product version info; tech-detection activation is centralized for consistent behavior.
  • Tests
    • Added unit tests for CPE version handling and enrichment; added a new functional test case.
  • Documentation
    • Clarified the CPE flag help text to indicate version-enhanced output.

coderabbitai Bot commented Jun 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds CPE version enrichment: centralizes tech-detect decision, sanitizes and safely injects product versions into CPE 2.3 entries, exposes EnrichCPEVersions, integrates enrichment into the runner flow, adds unit tests for the helpers, and adds a functional test case plus a help text update.

Changes

CPE Version Enrichment

  • CPE enrichment functions and unit tests (runner/cpe.go, runner/cpe_test.go): Implements techDetectRequired(), CPE version normalization and safe injection helpers, buildTechVersionMap() to parse "Name:version" entries, and an exported EnrichCPEVersions() that returns a new []CPEInfo with versions injected on case-insensitive matches. Adds tests covering sanitization, decision logic, injection invariants, map building, enrichment behavior, and immutability.
  • Runner integration of CPE enrichment (runner/runner.go, runner/options.go): Centralizes tech-detect checks behind techDetectRequired(...) for Wappalyzer init, scan options, and screenshot-based fingerprinting. Enriches cpeMatches via EnrichCPEVersions(...) before CPEDetect-gated output. Updates the -cpe flag help text to mention the product version.
  • Functional test for CPE detection (cmd/functional-test/testcases.txt): Adds scanme.sh {{binary}} -cpe -silent as a functional test case to exercise CPE detection and enrichment end-to-end.
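The following is a hypothetical sketch of the centralized predicate. The detectOptions struct and its field names stand in for httpx's real runner options, which this sketch does not reproduce. The point is that the wappalyzer init gate and the scan options call the same function, so they cannot disagree about whether detection is on.

```go
package main

import "fmt"

// detectOptions is a hypothetical subset of the runner options; the real
// httpx Options struct and its field names may differ.
type detectOptions struct {
	TechDetect  bool // -tech-detect
	JSONOutput  bool // JSON output
	CSVOutput   bool // CSV output
	AssetUpload bool // asset upload
	CPEDetect   bool // -cpe
}

// techDetectRequired reports whether technology detection (and therefore
// wappalyzer initialization) is needed. Every call site uses this one
// predicate, so "tech-detect enabled implies wappalyzer initialized" holds.
func techDetectRequired(o detectOptions) bool {
	return o.TechDetect || o.JSONOutput || o.CSVOutput || o.AssetUpload || o.CPEDetect
}

func main() {
	// The -cpe-alone case from #2476 now enables detection.
	fmt.Println(techDetectRequired(detectOptions{CPEDetect: true})) // true
	fmt.Println(techDetectRequired(detectOptions{}))                // false
}
```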
sequenceDiagram
  participant Scanner
  participant Wappalyzer
  participant TechMap
  participant EnrichCPEVersions
  participant CPEMatches

  Scanner->>Wappalyzer: detect technologies
  Wappalyzer-->>Scanner: technologies with versions
  Scanner->>TechMap: buildTechVersionMap(technologies)
  TechMap-->>EnrichCPEVersions: name:version map
  Scanner->>CPEMatches: fetch CPE matches
  CPEMatches-->>EnrichCPEVersions: CPEInfo entries with version=*
  EnrichCPEVersions->>EnrichCPEVersions: case-insensitive product lookup
  EnrichCPEVersions->>EnrichCPEVersions: sanitize and inject versions
  EnrichCPEVersions-->>Scanner: enriched CPEInfo with detected versions
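The flow in the diagram can be sketched in Go roughly as follows. CPEInfo here is a hypothetical two-field stand-in. The field-5 check is a simplified form of the 13-field/reserved-character guard from the PR description. Like the original buildTechVersionMap, this version lets the last duplicate win, which is the ordering sensitivity flagged in the review.

```go
package main

import (
	"fmt"
	"strings"
)

// CPEInfo is a hypothetical stand-in for the runner's CPE match type.
type CPEInfo struct {
	Product string
	CPE     string
}

// buildTechVersionMap parses wappalyzer-style "Name:version" entries into a
// lowercase-name -> version map; entries without a version are skipped.
func buildTechVersionMap(technologies []string) map[string]string {
	versions := make(map[string]string, len(technologies))
	for _, tech := range technologies {
		name, version, ok := strings.Cut(tech, ":")
		name = strings.ToLower(strings.TrimSpace(name))
		version = strings.TrimSpace(version)
		if !ok || name == "" || version == "" {
			continue
		}
		versions[name] = version // last duplicate wins
	}
	return versions
}

// enrichCPEVersions returns a new slice and never mutates matches. The version
// is written into field 5 of a 13-field CPE only when that field is "*".
func enrichCPEVersions(matches []CPEInfo, technologies []string) []CPEInfo {
	versions := buildTechVersionMap(technologies)
	out := make([]CPEInfo, len(matches))
	copy(out, matches)
	for i := range out {
		version, ok := versions[strings.ToLower(out[i].Product)]
		if !ok || strings.ContainsAny(version, ":*?") {
			continue
		}
		fields := strings.Split(out[i].CPE, ":")
		if len(fields) == 13 && fields[5] == "*" {
			fields[5] = version
			out[i].CPE = strings.Join(fields, ":")
		}
	}
	return out
}

func main() {
	matches := []CPEInfo{{Product: "next.js", CPE: "cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*"}}
	enriched := enrichCPEVersions(matches, []string{"Next.js:14.2.3", "React"})
	fmt.Println(enriched[0].CPE) // cpe:2.3:a:vercel:next.js:14.2.3:*:*:*:*:*:*:*
	fmt.Println(matches[0].CPE)  // unchanged: cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*
}
```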

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 I hopped through CPE fields with care,
Found versions hiding in tech's snare,
Cleaned and stitched each version in,
No more asterisks where they had been,
Now scanners hum with CVE-ready flair.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 46.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title 'fix: populate product version in CPE output' directly describes the main change: enriching CPE strings with product versions detected by Wappalyzer.
  • Linked Issues check: ✅ Passed. The PR fully addresses issue #2476 by implementing CPE version enrichment using Wappalyzer-detected versions, enabling CVE lookups as requested.
  • Out of Scope Changes check: ✅ Passed. All changes align with the linked issue: CPE enrichment logic, tech-detection routing, validation guards, tests, and a functional test case for the -cpe flag.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
cmd/functional-test/testcases.txt (1)

24-24: ⚡ Quick win

This functional testcase does not assert version enrichment.

cmd/functional-test/main.go:48-63 only compares output counts, not contents. That means this line still passes if both binaries emit one CPE line and the version field stays *, so it won't catch the regression this PR is trying to prevent. Please add a content-aware assertion or a golden-style testcase for the enriched CPE string.
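One possible shape for a content-aware check is sketched below under stated assumptions. enrichedCPEs is a hypothetical helper, not part of cmd/functional-test/main.go, and it scans for CPE 2.3 strings anywhere in a line rather than relying on httpx's exact output layout. The sample line is synthetic, not real scanme.sh output.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// cpe23Pattern finds CPE 2.3 formatted strings anywhere in the output: "cpe:2.3"
// followed by 11 more components, each free of ':', whitespace, ',', '[' and ']'.
var cpe23Pattern = regexp.MustCompile(`cpe:2\.3(?::[^:\s,\[\]]*){11}`)

// enrichedCPEs returns every CPE in output whose version field (index 5) is
// set, i.e. neither "*" nor empty.
func enrichedCPEs(output string) []string {
	var found []string
	for _, cpe := range cpe23Pattern.FindAllString(output, -1) {
		fields := strings.Split(cpe, ":")
		if len(fields) == 13 && fields[5] != "*" && fields[5] != "" {
			found = append(found, cpe)
		}
	}
	return found
}

func main() {
	// Synthetic sample line; the real output format and detected version may differ.
	out := "https://example.test [cpe:2.3:a:vercel:next.js:14.2.3:*:*:*:*:*:*:*]\n"
	fmt.Println(len(enrichedCPEs(out))) // 1
}
```

A live target's detected version can change over time, so a golden value pinned to one version may be brittle. Asserting only that the version field is not * is a weaker but more stable check.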

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/functional-test/testcases.txt` at line 24, The test only verifies output
counts and misses CPE content; update the functional test to assert enriched CPE
strings. Modify cmd/functional-test/main.go (the comparison logic around lines
48-63) to parse the output for the specific CPE line emitted by scanme.sh and
assert the version field is enriched (not "*" or matches an expected version),
or add a golden-style entry in cmd/functional-test/testcases.txt (e.g., include
the expected enriched CPE string next to the invocation) and update main.go to
compare output lines against the golden value; reference the scanme.sh
invocation in testcases.txt and the output-compare routine in main.go when
adding the content-aware assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@runner/cpe.go`:
- Around line 115-120: The helper sanitizeCPEVersion currently lowercases the
detected version which alters semantic identifiers (e.g., 1.0.0-RC1 →
1.0.0-rc1); update sanitizeCPEVersion to preserve the original case and only
trim whitespace and convert internal spaces to underscores (use
strings.TrimSpace and strings.ReplaceAll but remove strings.ToLower), so the
function returns the normalized spacing/underscore form without changing letter
case.
- Around line 150-163: buildTechVersionMap currently overwrites duplicate
product names non-deterministically; update it to detect and handle conflicts
deterministically (e.g., skip ambiguous products). In buildTechVersionMap, track
seen versions per normalized product name (use a temporary map[string]string for
first-seen version and a set/map[string]bool to mark conflicting names), and
when encountering a different version for the same name mark it ambiguous and
remove/avoid adding it to the final versions map; ensure the returned map only
contains names with a single unambiguous version. This fixes nondeterminism when
technologies (from FingerprintWithInfo iteration) contain multiple versions for
the same product.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4838feb2-5058-4cbb-adf9-ae6e379b0177

📥 Commits

Reviewing files that changed from the base of the PR and between 8e7c09a and 588fb01.

📒 Files selected for processing (4)
  • cmd/functional-test/testcases.txt
  • runner/cpe.go
  • runner/cpe_test.go
  • runner/runner.go

SaInekK changed the title from "feat: fill product version in CPE output (#2476)" to "feat: fill product version in CPE output" on Jun 4, 2026
coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
runner/cpe.go (1)

166-168: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep the “returns a copy” contract on early return paths.

When technologies is empty, this returns the original slice, not a copy. Either return a copied slice or update the function doc to avoid contract drift.

Suggested fix
 func EnrichCPEVersions(matches []CPEInfo, technologies []string) []CPEInfo {
 	if len(matches) == 0 || len(technologies) == 0 {
-		return matches
+		return append([]CPEInfo(nil), matches...)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@runner/cpe.go` around lines 166 - 168, The early-return in runner/cpe.go
currently returns the original matches slice when len(matches)==0 ||
len(technologies)==0, violating the “returns a copy” contract; update that
early-return to return a shallow copy of matches instead of the original slice
(e.g., allocate a new slice and copy or append into a nil slice) so callers
always get a distinct slice instance; change the return at the
matches/technologies empty-check to return the copied slice while keeping the
same semantics.
♻️ Duplicate comments (2)
runner/cpe.go (2)

144-157: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve duplicate technology versions deterministically.

buildTechVersionMap currently overwrites duplicates for the same normalized product key, so final enrichment can depend on input ordering. Use an explicit conflict policy (e.g., drop ambiguous products).

Suggested fix (skip ambiguous duplicates)
 func buildTechVersionMap(technologies []string) map[string]string {
 	versions := make(map[string]string, len(technologies))
+	conflicts := make(map[string]struct{})
 	for _, tech := range technologies {
 		parts := strings.SplitN(tech, ":", 2)
 		if len(parts) != 2 {
 			continue
 		}
 		name := strings.ToLower(strings.TrimSpace(parts[0]))
 		version := strings.TrimSpace(parts[1])
 		if name == "" || version == "" {
 			continue
 		}
+		if _, conflicted := conflicts[name]; conflicted {
+			continue
+		}
+		if existing, ok := versions[name]; ok && existing != version {
+			delete(versions, name)
+			conflicts[name] = struct{}{}
+			continue
+		}
 		versions[name] = version
 	}
 	return versions
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@runner/cpe.go` around lines 144 - 157, buildTechVersionMap currently
overwrites entries for the same normalized product key; change it to apply a
deterministic conflict policy that skips ambiguous duplicates: while iterating
technologies, normalize name and version as you already do, but keep a separate
map (or extend versions) to record the first-seen version and a flag for
ambiguity; if you encounter the same name with a different version, remove (or
mark) that name from versions so it is not returned; ensure the logic references
buildTechVersionMap, the iterations over technologies, and the name/version
variables so duplicates are detected and ambiguous products are skipped instead
of being overwritten.
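Below is a runnable version of the suggested fix, showing that two orderings of the same input now produce the same map. The technology names and versions are illustrative only. The separate conflicts set keeps a later entry that repeats the first-seen version from re-adding a product that was already dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTechVersionMap applies the suggested conflict policy: a product seen
// with two different versions is treated as ambiguous and dropped, so the
// result no longer depends on input order.
func buildTechVersionMap(technologies []string) map[string]string {
	versions := make(map[string]string, len(technologies))
	conflicts := make(map[string]struct{})
	for _, tech := range technologies {
		parts := strings.SplitN(tech, ":", 2)
		if len(parts) != 2 {
			continue
		}
		name := strings.ToLower(strings.TrimSpace(parts[0]))
		version := strings.TrimSpace(parts[1])
		if name == "" || version == "" {
			continue
		}
		if _, conflicted := conflicts[name]; conflicted {
			continue // already ambiguous; never re-add
		}
		if existing, ok := versions[name]; ok && existing != version {
			delete(versions, name)
			conflicts[name] = struct{}{}
			continue
		}
		versions[name] = version
	}
	return versions
}

func main() {
	a := buildTechVersionMap([]string{"jQuery:3.6.0", "jQuery:1.12.4", "Nginx:1.25.3"})
	b := buildTechVersionMap([]string{"jQuery:1.12.4", "Nginx:1.25.3", "jQuery:3.6.0"})
	fmt.Println(a, b) // map[nginx:1.25.3] map[nginx:1.25.3]
}
```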

113-118: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve detected version casing in CPE enrichment.

Lowercasing the detected version (RC1 → rc1) changes the value being mapped to CPE/CVE data and can reduce match fidelity.

Suggested fix
 func sanitizeCPEVersion(version string) string {
-	return strings.ToLower(strings.ReplaceAll(strings.TrimSpace(version), " ", "_"))
+	return strings.ReplaceAll(strings.TrimSpace(version), " ", "_")
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@runner/cpe.go` around lines 113 - 118, The sanitizeCPEVersion function
currently lowercases the detected version which alters semantics (e.g., "RC1" ->
"rc1") and can reduce CPE/CVE match fidelity; update sanitizeCPEVersion to
preserve the original casing while still trimming whitespace and replacing
spaces with underscores (i.e., remove the strings.ToLower call) so versions are
normalized for embedding but not case-normalized, and ensure any callers (e.g.,
generateCPE) continue to use sanitizeCPEVersion for version normalization.
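The following compares the old and suggested behavior. oldSanitizeCPEVersion reproduces the lowercasing line from the diff and is included only for contrast.

```go
package main

import (
	"fmt"
	"strings"
)

// oldSanitizeCPEVersion reproduces the current (lowercasing) behavior.
func oldSanitizeCPEVersion(version string) string {
	return strings.ToLower(strings.ReplaceAll(strings.TrimSpace(version), " ", "_"))
}

// sanitizeCPEVersion with the suggested change: trim surrounding whitespace
// and replace inner spaces with underscores, but keep the detected case.
func sanitizeCPEVersion(version string) string {
	return strings.ReplaceAll(strings.TrimSpace(version), " ", "_")
}

func main() {
	fmt.Println(oldSanitizeCPEVersion(" 1.0.0-RC1 ")) // 1.0.0-rc1
	fmt.Println(sanitizeCPEVersion(" 1.0.0-RC1 "))    // 1.0.0-RC1
	fmt.Println(sanitizeCPEVersion("2.4 beta"))       // 2.4_beta
}
```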

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b6ceee74-15ad-42f4-a729-045e3d7765a5

📥 Commits

Reviewing files that changed from the base of the PR and between c4e0819 and 47c471f.

📒 Files selected for processing (1)
  • runner/cpe.go

SaInekK changed the title from "feat: fill product version in CPE output" to "fix: populate product version in CPE output" on Jun 4, 2026

Development

Successfully merging this pull request may close these issues.

HTTPX is not detecting the product version in the CPE (Common Platform Enumeration)

1 participant