fix: populate product version in CPE output by SaInekK · Pull Request #2509 · projectdiscovery/httpx

SaInekK · 2026-06-03T23:56:13Z

Proposed changes

Previously, httpx always emitted * for the version field of CPE 2.3 strings (cpe:2.3:a:vendor:product:*:...), even when the version was known. This PR fills that field using the version that wappalyzer already extracts during technology detection.

What changed:

Version enrichment — detected technology versions (wappalyzer's Name:version entries) are parsed into a lookup map and matched against CPE product names (case-insensitive). When a match is found, the CPE's version field is populated, e.g. cpe:2.3:a:vercel:next.js:14.2.3:*:.... Inputs are never mutated; when no version is known the CPE keeps its * (no regression).
Root cause of #2476 ("HTTPX is not detecting the product version in the CPE (Common Platform Enumeration)"): bare httpx -cpe did not enable technology detection, so wappalyzer never ran and versions could never populate. All tech-detect triggers (-tech-detect, JSON/CSV output, asset-upload, and now -cpe) are routed through a single techDetectRequired predicate, so -cpe alone now turns detection on.
Hardening:
- -cpe alone previously left the wappalyzer client nil while tech-detect was enabled → nil-pointer panic. The init gate now uses the same techDetectRequired predicate, so the invariant tech-detect ⇒ wappalyzer initialized holds across all call sites.
- setCPEVersion validates a CPE 2.3 string has exactly 13 fields and skips enrichment when the version contains reserved/structural characters (:, *, ?) rather than emitting a malformed value.
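
The two guards described for setCPEVersion can be sketched as follows. This is a minimal illustration, not the code in runner/cpe.go: the signature (returning the new string plus a boolean) is an assumption, and the naive split on `:` does not handle escaped colons inside a CPE component.

```go
package main

import (
	"fmt"
	"strings"
)

// cpe23FieldCount is the number of colon-separated fields in a CPE 2.3
// formatted string: "cpe", "2.3", part, vendor, product, version, update,
// edition, language, sw_edition, target_sw, target_hw, other.
const cpe23FieldCount = 13

// setCPEVersion returns cpe with its version field (index 5) set to version.
// The boolean reports whether the field was replaced. Hypothetical signature:
// the real helper may differ.
func setCPEVersion(cpe, version string) (string, bool) {
	version = strings.TrimSpace(version)
	// Skip values that are empty or contain reserved/structural characters.
	if version == "" || strings.ContainsAny(version, ":*?") {
		return cpe, false
	}
	fields := strings.Split(cpe, ":")
	if len(fields) != cpe23FieldCount {
		return cpe, false
	}
	fields[5] = version
	return strings.Join(fields, ":"), true
}

func main() {
	in := "cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*"

	out, ok := setCPEVersion(in, "14.2.3")
	fmt.Println(out, ok)

	// A version with a reserved character leaves the CPE unchanged.
	out, ok = setCPEVersion(in, "14.*")
	fmt.Println(out, ok)
}
```

Returning the original string together with `false` lets the caller keep the `*` wildcard when enrichment is skipped, which matches the "no regression" behaviour described above.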

Proof

Before: cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*
After: cpe:2.3:a:vercel:next.js:14.2.3:*:*:*:*:*:*:*

New unit tests in runner/cpe_test.go cover the version-string helpers, the techDetectRequired predicate (incl. the -cpe-alone case from #2476), the 13-field/reserved-char guards, and input immutability.
New functional testcase: scanme.sh {{binary}} -cpe -silent.
Verified locally against the full CI pipeline: golangci-lint (0 issues), go vet ./..., go build ./..., go build -race ./..., go test ./..., integration tests (21 passed), functional tests (24 passed), and an end-to-end run of httpx -cpe -silent against a live target (no panic).

Checklist

Pull request is created against the dev branch
All checks passed (lint, unit/integration/regression tests etc.) with my changes
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Summary by CodeRabbit

Improvements
- CPE output now includes detected product version info; tech-detection activation is centralized for consistent behavior.
Tests
- Added unit tests for CPE version handling and enrichment; added a new functional test case.
Documentation
- Clarified the CPE flag help text to indicate version-enhanced output.


coderabbitai · 2026-06-03T23:56:28Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


Walkthrough

Adds CPE version enrichment: centralizes tech-detect decision, sanitizes and safely injects product versions into CPE 2.3 entries, exposes EnrichCPEVersions, integrates enrichment into the runner flow, adds unit tests for the helpers, and adds a functional test case plus a help text update.

Changes

CPE Version Enrichment

| Layer / File(s) | Summary |
| --- | --- |
| CPE enrichment functions and unit tests: `runner/cpe.go`, `runner/cpe_test.go` | Implements `techDetectRequired()`, CPE version normalization and safe injection helpers, `buildTechVersionMap()` to parse "Name:version" entries, and exported `EnrichCPEVersions()` which returns a new `[]CPEInfo` with injected versions when matched case-insensitively. Adds tests covering sanitization, decision logic, injection invariants, map building, enrichment behavior, and immutability. |
| Runner integration of CPE enrichment: `runner/runner.go`, `runner/options.go` | Centralizes tech-detect checks behind `techDetectRequired(...)` for Wappalyzer init, scan options, and screenshot-based fingerprinting; enriches `cpeMatches` via `EnrichCPEVersions(...)` before CPEDetect-gated output. Updates `-cpe` flag help text to mention product version. |
| Functional test for CPE detection: `cmd/functional-test/testcases.txt` | Adds `scanme.sh {{binary}} -cpe -silent` as a functional test case to exercise CPE detection and enrichment end-to-end. |
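
A single predicate of the kind described for `techDetectRequired(...)` might look like the sketch below. The `scanOptions` struct and its field names are hypothetical stand-ins for the runner's options; only the list of triggers (tech-detect, JSON/CSV output, asset upload, CPE detection) comes from the PR description.

```go
package main

import "fmt"

// scanOptions is a hypothetical, trimmed-down stand-in for the runner options.
// The real struct and field names in httpx may differ.
type scanOptions struct {
	TechDetect  bool // -tech-detect
	JSONOutput  bool // JSON output requested
	CSVOutput   bool // CSV output requested
	AssetUpload bool // asset upload enabled
	CPEDetect   bool // -cpe
}

// techDetectRequired reports whether any feature needs technology detection.
// Using this one function for both "initialize the detector" and "run the
// detector" keeps the two decisions from drifting apart.
func techDetectRequired(o scanOptions) bool {
	return o.TechDetect || o.JSONOutput || o.CSVOutput || o.AssetUpload || o.CPEDetect
}

func main() {
	fmt.Println(techDetectRequired(scanOptions{CPEDetect: true})) // -cpe alone
	fmt.Println(techDetectRequired(scanOptions{}))                // nothing requested
}
```

Because the init gate and the scan gate would both call the same function, a flag that enables detection cannot leave the detector uninitialized, which is the invariant the hardening note relies on.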

sequenceDiagram
  participant Scanner
  participant Wappalyzer
  participant TechMap
  participant EnrichCPEVersions
  participant CPEMatches

  Scanner->>Wappalyzer: detect technologies
  Wappalyzer-->>Scanner: technologies with versions
  Scanner->>TechMap: buildTechVersionMap(technologies)
  TechMap-->>EnrichCPEVersions: name:version map
  Scanner->>CPEMatches: fetch CPE matches
  CPEMatches-->>EnrichCPEVersions: CPEInfo entries with version=*
  EnrichCPEVersions->>EnrichCPEVersions: case-insensitive product lookup
  EnrichCPEVersions->>EnrichCPEVersions: sanitize and inject versions
  EnrichCPEVersions-->>Scanner: enriched CPEInfo with detected versions
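
The flow in the diagram can be approximated end to end in a few lines. This is a sketch under stated assumptions: the `CPEInfo` fields (`Product`, `CPE`) and the function name `enrich` are hypothetical, duplicates are resolved last-wins for brevity, and the real `EnrichCPEVersions` may differ in all of these.

```go
package main

import (
	"fmt"
	"strings"
)

// CPEInfo is a hypothetical shape used only for this sketch.
type CPEInfo struct {
	Product string
	CPE     string
}

// enrich returns a new slice with versions injected where a technology
// entry of the form "Name:version" matches a product case-insensitively.
// The input slice is never modified.
func enrich(matches []CPEInfo, technologies []string) []CPEInfo {
	versions := make(map[string]string, len(technologies))
	for _, tech := range technologies {
		name, version, found := strings.Cut(tech, ":")
		if !found {
			continue // no version detected for this technology
		}
		name = strings.ToLower(strings.TrimSpace(name))
		version = strings.TrimSpace(version)
		if name != "" && version != "" {
			versions[name] = version // simplification: last entry wins
		}
	}

	out := make([]CPEInfo, len(matches))
	copy(out, matches)
	for i := range out {
		version, ok := versions[strings.ToLower(out[i].Product)]
		if !ok || strings.ContainsAny(version, ":*?") {
			continue
		}
		fields := strings.Split(out[i].CPE, ":")
		if len(fields) != 13 {
			continue // not a well-formed CPE 2.3 string
		}
		fields[5] = version
		out[i].CPE = strings.Join(fields, ":")
	}
	return out
}

func main() {
	matches := []CPEInfo{
		{Product: "next.js", CPE: "cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*"},
		{Product: "nginx", CPE: "cpe:2.3:a:f5:nginx:*:*:*:*:*:*:*:*"},
	}
	for _, m := range enrich(matches, []string{"Next.js:14.2.3", "Nginx"}) {
		fmt.Println(m.CPE)
	}
}
```

In this run the Next.js entry gains its version while the nginx entry keeps `*`, because the technology list carries no version for it.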

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 I hopped through CPE fields with care,
Found versions hiding in tech's snare,
Cleaned and stitched each version in,
No more asterisks where they had been,
Now scanners hum with CVE-ready flair.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 46.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix: populate product version in CPE output' directly describes the main change: enriching CPE strings with product versions detected by Wappalyzer. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue `#2476` by implementing CPE version enrichment using Wappalyzer-detected versions, enabling CVE lookups as requested. |
| Out of Scope Changes check | ✅ Passed | All changes align with the linked issue: CPE enrichment logic, tech-detection routing, validation guards, tests, and a functional test case for `-cpe` flag. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

cmd/functional-test/testcases.txt (1)
24-24: ⚡ Quick win

This functional testcase does not assert version enrichment.

cmd/functional-test/main.go:48-63 only compares output counts, not contents. That means this line still passes if both binaries emit one CPE line and the version field stays *, so it won't catch the regression this PR is trying to prevent. Please add a content-aware assertion or a golden-style testcase for the enriched CPE string.
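
One possible shape for a content-aware assertion is a helper that extracts CPE 2.3 strings from captured output and keeps only those whose version field is populated. This is a sketch, not a patch for cmd/functional-test/main.go: the helper name, the delimiter set in the pattern, and the sample output lines are all assumptions about how the output is formatted.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// cpePattern finds CPE 2.3 strings inside tool output. The characters treated
// as delimiters (whitespace, comma, brackets, quote) are an assumption.
var cpePattern = regexp.MustCompile(`cpe:2\.3:[^\s,\[\]"]+`)

// enrichedCPEs returns every well-formed CPE 2.3 string in output whose
// version field (index 5) is neither empty nor the "*" wildcard.
func enrichedCPEs(output string) []string {
	var enriched []string
	for _, candidate := range cpePattern.FindAllString(output, -1) {
		fields := strings.Split(candidate, ":")
		if len(fields) == 13 && fields[5] != "*" && fields[5] != "" {
			enriched = append(enriched, candidate)
		}
	}
	return enriched
}

func main() {
	// Hypothetical output lines, used only to exercise the helper.
	before := "https://example.com [cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*]"
	after := "https://example.com [cpe:2.3:a:vercel:next.js:14.2.3:*:*:*:*:*:*:*]"

	fmt.Println(len(enrichedCPEs(before)))
	fmt.Println(len(enrichedCPEs(after)))
}
```

A functional test built on a helper like this would fail when the version field regresses to `*`, which a count-only comparison cannot detect. It would still depend on the live target exposing a versioned technology, so a golden value may be more stable.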
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/functional-test/testcases.txt` at line 24, The test only verifies output
counts and misses CPE content; update the functional test to assert enriched CPE
strings. Modify cmd/functional-test/main.go (the comparison logic around lines
48-63) to parse the output for the specific CPE line emitted by scanme.sh and
assert the version field is enriched (not "*" or matches an expected version),
or add a golden-style entry in cmd/functional-test/testcases.txt (e.g., include
the expected enriched CPE string next to the invocation) and update main.go to
compare output lines against the golden value; reference the scanme.sh
invocation in testcases.txt and the output-compare routine in main.go when
adding the content-aware assertion.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@runner/cpe.go`:
- Around line 115-120: The helper sanitizeCPEVersion currently lowercases the
detected version which alters semantic identifiers (e.g., 1.0.0-RC1 →
1.0.0-rc1); update sanitizeCPEVersion to preserve the original case and only
trim whitespace and convert internal spaces to underscores (use
strings.TrimSpace and strings.ReplaceAll but remove strings.ToLower), so the
function returns the normalized spacing/underscore form without changing letter
case.
- Around line 150-163: buildTechVersionMap currently overwrites duplicate
product names non-deterministically; update it to detect and handle conflicts
deterministically (e.g., skip ambiguous products). In buildTechVersionMap, track
seen versions per normalized product name (use a temporary map[string]string for
first-seen version and a set/map[string]bool to mark conflicting names), and
when encountering a different version for the same name mark it ambiguous and
remove/avoid adding it to the final versions map; ensure the returned map only
contains names with a single unambiguous version. This fixes nondeterminism when
technologies (from FingerprintWithInfo iteration) contain multiple versions for
the same product.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4838feb2-5058-4cbb-adf9-ae6e379b0177

📥 Commits

Reviewing files that changed from the base of the PR and between 8e7c09a and 588fb01.

📒 Files selected for processing (4)

cmd/functional-test/testcases.txt
runner/cpe.go
runner/cpe_test.go
runner/runner.go

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

runner/cpe.go (1)

166-168: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep the “returns a copy” contract on early return paths.

When technologies is empty, this returns the original slice, not a copy. Either return a copied slice or update the function doc to avoid contract drift.

Suggested fix

 func EnrichCPEVersions(matches []CPEInfo, technologies []string) []CPEInfo {
 	if len(matches) == 0 || len(technologies) == 0 {
-		return matches
+		return append([]CPEInfo(nil), matches...)
 	}
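
The effect of the suggested `append([]CPEInfo(nil), matches...)` idiom can be checked in isolation. In this sketch `CPEInfo` is a hypothetical two-field struct and `copyMatches` is an illustrative helper name; the point is only that the result has its own backing array.

```go
package main

import "fmt"

// CPEInfo is a hypothetical shape used only for this sketch.
type CPEInfo struct {
	Product string
	CPE     string
}

// copyMatches returns a shallow copy of matches backed by a new array.
func copyMatches(matches []CPEInfo) []CPEInfo {
	return append([]CPEInfo(nil), matches...)
}

func main() {
	in := []CPEInfo{{Product: "next.js", CPE: "cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*"}}

	out := copyMatches(in)
	out[0].Product = "changed"

	fmt.Println(in[0].Product) // the caller's slice is unaffected

	// Appending zero elements to a nil slice yields nil, so an empty
	// input comes back as a nil slice rather than an empty non-nil one.
	fmt.Println(copyMatches([]CPEInfo{}) == nil)
}
```

The copy is shallow: it is sufficient here because the sketch's struct holds only strings, but a struct containing slices, maps, or pointers would still share that data with the input.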

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@runner/cpe.go` around lines 166 - 168, The early-return in runner/cpe.go
currently returns the original matches slice when len(matches)==0 ||
len(technologies)==0, violating the “returns a copy” contract; update that
early-return to return a shallow copy of matches instead of the original slice
(e.g., allocate a new slice and copy or append into a nil slice) so callers
always get a distinct slice instance; change the return at the
matches/technologies empty-check to return the copied slice while keeping the
same semantics.

♻️ Duplicate comments (2)

runner/cpe.go (2)

144-157: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve duplicate technology versions deterministically.

buildTechVersionMap currently overwrites duplicates for the same normalized product key, so final enrichment can depend on input ordering. Use an explicit conflict policy (e.g., drop ambiguous products).

Suggested fix (skip ambiguous duplicates)

 func buildTechVersionMap(technologies []string) map[string]string {
 	versions := make(map[string]string, len(technologies))
+	conflicts := make(map[string]struct{})
 	for _, tech := range technologies {
 		parts := strings.SplitN(tech, ":", 2)
 		if len(parts) != 2 {
 			continue
 		}
 		name := strings.ToLower(strings.TrimSpace(parts[0]))
 		version := strings.TrimSpace(parts[1])
 		if name == "" || version == "" {
 			continue
 		}
+		if _, conflicted := conflicts[name]; conflicted {
+			continue
+		}
+		if existing, ok := versions[name]; ok && existing != version {
+			delete(versions, name)
+			conflicts[name] = struct{}{}
+			continue
+		}
 		versions[name] = version
 	}
 	return versions
 }
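
With the suggested diff applied, the function can be exercised as a standalone program. The function body below follows the suggestion above; the sample input in `main` is made up to show that a product reported with two different versions is dropped and stays dropped, even if a later entry repeats the first version.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTechVersionMap parses "Name:version" entries into a lowercase
// name -> version map and drops any product seen with conflicting versions.
func buildTechVersionMap(technologies []string) map[string]string {
	versions := make(map[string]string, len(technologies))
	conflicts := make(map[string]struct{})
	for _, tech := range technologies {
		parts := strings.SplitN(tech, ":", 2)
		if len(parts) != 2 {
			continue // entry carries no version
		}
		name := strings.ToLower(strings.TrimSpace(parts[0]))
		version := strings.TrimSpace(parts[1])
		if name == "" || version == "" {
			continue
		}
		if _, conflicted := conflicts[name]; conflicted {
			continue // already known to be ambiguous
		}
		if existing, ok := versions[name]; ok && existing != version {
			delete(versions, name)
			conflicts[name] = struct{}{}
			continue
		}
		versions[name] = version
	}
	return versions
}

func main() {
	techs := []string{"Next.js:14.2.3", "React:18.2.0", "react:17.0.2", "Nginx", "React:18.2.0"}
	m := buildTechVersionMap(techs)

	fmt.Println(len(m), m["next.js"])
	_, hasReact := m["react"]
	fmt.Println(hasReact)
}
```

Dropping ambiguous products means the result no longer depends on input order, at the cost of leaving `*` in the CPE for those products.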

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@runner/cpe.go` around lines 144 - 157, buildTechVersionMap currently
overwrites entries for the same normalized product key; change it to apply a
deterministic conflict policy that skips ambiguous duplicates: while iterating
technologies, normalize name and version as you already do, but keep a separate
map (or extend versions) to record the first-seen version and a flag for
ambiguity; if you encounter the same name with a different version, remove (or
mark) that name from versions so it is not returned; ensure the logic references
buildTechVersionMap, the iterations over technologies, and the name/version
variables so duplicates are detected and ambiguous products are skipped instead
of being overwritten.

113-118: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve detected version casing in CPE enrichment.

Lowercasing the detected version (RC1 → rc1) changes the value being mapped to CPE/CVE data and can reduce match fidelity.

Suggested fix

 func sanitizeCPEVersion(version string) string {
-	return strings.ToLower(strings.ReplaceAll(strings.TrimSpace(version), " ", "_"))
+	return strings.ReplaceAll(strings.TrimSpace(version), " ", "_")
 }
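
The difference between the two variants is easiest to see side by side. In this sketch `sanitizeLower` is an illustrative name for the pre-fix behaviour (it is not a function in the PR), and `sanitizeCPEVersion` follows the suggested fix.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeLower mirrors the behaviour before the suggested fix: it trims,
// replaces spaces with underscores, and lowercases. Name is illustrative.
func sanitizeLower(version string) string {
	return strings.ToLower(strings.ReplaceAll(strings.TrimSpace(version), " ", "_"))
}

// sanitizeCPEVersion follows the suggested fix: trim and replace spaces,
// but keep the original letter case.
func sanitizeCPEVersion(version string) string {
	return strings.ReplaceAll(strings.TrimSpace(version), " ", "_")
}

func main() {
	for _, v := range []string{" 1.0.0-RC1 ", "2.0 Beta"} {
		fmt.Printf("%q -> old %q, new %q\n", v, sanitizeLower(v), sanitizeCPEVersion(v))
	}
}
```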

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@runner/cpe.go` around lines 113 - 118, The sanitizeCPEVersion function
currently lowercases the detected version which alters semantics (e.g., "RC1" ->
"rc1") and can reduce CPE/CVE match fidelity; update sanitizeCPEVersion to
preserve the original casing while still trimming whitespace and replacing
spaces with underscores (i.e., remove the strings.ToLower call) so versions are
normalized for embedding but not case-normalized, and ensure any callers (e.g.,
generateCPE) continue to use sanitizeCPEVersion for version normalization.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b6ceee74-15ad-42f4-a729-045e3d7765a5

📥 Commits

Reviewing files that changed from the base of the PR and between c4e0819 and 47c471f.

📒 Files selected for processing (1)

runner/cpe.go

SaInekK added 9 commits June 4, 2026 02:40

feat: add cpe version-string helpers

f239bb2

feat: parse detected tech versions into lookup map

55d2117

feat: enrich cpe matches with detected product versions

a100b6f

feat: inject detected version into cpe output

4a590ff

test: add cpe version enrichment functional testcase

393327e

fix: enable tech-detect for -cpe so versions populate (projectdiscove…

ca3a061

…ry#2476)

fix: skip cpe version enrichment for values with reserved chars

ba87a34

fix: prevent nil wappalyzer panic when -cpe used alone

b071d44

fix: tighten cpe 2.3 field validation and presize tech map

588fb01

coderabbitai Bot reviewed Jun 4, 2026


docs: note product version in -cpe flag description

c4e0819

SaInekK changed the title ~~feat: fill product version in CPE output (#2476)~~ feat: fill product version in CPE output Jun 4, 2026

docs: tighten cpe helper comments to match package style

47c471f

coderabbitai Bot reviewed Jun 4, 2026


SaInekK changed the title ~~feat: fill product version in CPE output~~ fix: populate product version in CPE output Jun 4, 2026

SaInekK added 2 commits June 4, 2026 03:22

fix(cpe): preserve version case and drop ambiguous tech versions

75d0a8f

fix(cpe): return a copy from EnrichCPEVersions early-return path

fc04813

SaInekK wants to merge 13 commits into projectdiscovery:dev from SaInekK:fix-cpe-version-enrichment
