fix: populate product version in CPE output#2509
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds CPE version enrichment: centralizes tech-detect decision, sanitizes and safely injects product versions into CPE 2.3 entries, exposes EnrichCPEVersions, integrates enrichment into the runner flow, adds unit tests for the helpers, and adds a functional test case plus a help text update. ChangesCPE Version Enrichment
sequenceDiagram
participant Scanner
participant Wappalyzer
participant TechMap
participant EnrichCPEVersions
participant CPEMatches
Scanner->>Wappalyzer: detect technologies
Wappalyzer-->>Scanner: technologies with versions
Scanner->>TechMap: buildTechVersionMap(technologies)
TechMap-->>EnrichCPEVersions: name:version map
Scanner->>CPEMatches: fetch CPE matches
CPEMatches-->>EnrichCPEVersions: CPEInfo entries with version=*
EnrichCPEVersions->>EnrichCPEVersions: case-insensitive product lookup
EnrichCPEVersions->>EnrichCPEVersions: sanitize and inject versions
EnrichCPEVersions-->>Scanner: enriched CPEInfo with detected versions
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
cmd/functional-test/testcases.txt (1)
24-24: ⚡ Quick winThis functional testcase does not assert version enrichment.
cmd/functional-test/main.go:48-63only compares output counts, not contents. That means this line still passes if both binaries emit one CPE line and the version field stays*, so it won't catch the regression this PR is trying to prevent. Please add a content-aware assertion or a golden-style testcase for the enriched CPE string.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/functional-test/testcases.txt` at line 24, The test only verifies output counts and misses CPE content; update the functional test to assert enriched CPE strings. Modify cmd/functional-test/main.go (the comparison logic around lines 48-63) to parse the output for the specific CPE line emitted by scanme.sh and assert the version field is enriched (not "*" or matches an expected version), or add a golden-style entry in cmd/functional-test/testcases.txt (e.g., include the expected enriched CPE string next to the invocation) and update main.go to compare output lines against the golden value; reference the scanme.sh invocation in testcases.txt and the output-compare routine in main.go when adding the content-aware assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@runner/cpe.go`:
- Around line 115-120: The helper sanitizeCPEVersion currently lowercases the
detected version which alters semantic identifiers (e.g., 1.0.0-RC1 →
1.0.0-rc1); update sanitizeCPEVersion to preserve the original case and only
trim whitespace and convert internal spaces to underscores (use
strings.TrimSpace and strings.ReplaceAll but remove strings.ToLower), so the
function returns the normalized spacing/underscore form without changing letter
case.
- Around line 150-163: buildTechVersionMap currently overwrites duplicate
product names non-deterministically; update it to detect and handle conflicts
deterministically (e.g., skip ambiguous products). In buildTechVersionMap, track
seen versions per normalized product name (use a temporary map[string]string for
first-seen version and a set/map[string]bool to mark conflicting names), and
when encountering a different version for the same name mark it ambiguous and
remove/avoid adding it to the final versions map; ensure the returned map only
contains names with a single unambiguous version. This fixes nondeterminism when
technologies (from FingerprintWithInfo iteration) contain multiple versions for
the same product.
---
Nitpick comments:
In `@cmd/functional-test/testcases.txt`:
- Line 24: The test only verifies output counts and misses CPE content; update
the functional test to assert enriched CPE strings. Modify
cmd/functional-test/main.go (the comparison logic around lines 48-63) to parse
the output for the specific CPE line emitted by scanme.sh and assert the version
field is enriched (not "*" or matches an expected version), or add a
golden-style entry in cmd/functional-test/testcases.txt (e.g., include the
expected enriched CPE string next to the invocation) and update main.go to
compare output lines against the golden value; reference the scanme.sh
invocation in testcases.txt and the output-compare routine in main.go when
adding the content-aware assertion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4838feb2-5058-4cbb-adf9-ae6e379b0177
📒 Files selected for processing (4)
cmd/functional-test/testcases.txtrunner/cpe.gorunner/cpe_test.gorunner/runner.go
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
runner/cpe.go (1)
166-168:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep the “returns a copy” contract on early return paths.
When
technologiesis empty, this returns the original slice, not a copy. Either return a copied slice or update the function doc to avoid contract drift.Suggested fix
func EnrichCPEVersions(matches []CPEInfo, technologies []string) []CPEInfo { if len(matches) == 0 || len(technologies) == 0 { - return matches + return append([]CPEInfo(nil), matches...) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@runner/cpe.go` around lines 166 - 168, The early-return in runner/cpe.go currently returns the original matches slice when len(matches)==0 || len(technologies)==0, violating the “returns a copy” contract; update that early-return to return a shallow copy of matches instead of the original slice (e.g., allocate a new slice and copy or append into a nil slice) so callers always get a distinct slice instance; change the return at the matches/technologies empty-check to return the copied slice while keeping the same semantics.
♻️ Duplicate comments (2)
runner/cpe.go (2)
144-157:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResolve duplicate technology versions deterministically.
buildTechVersionMapcurrently overwrites duplicates for the same normalized product key, so final enrichment can depend on input ordering. Use an explicit conflict policy (e.g., drop ambiguous products).Suggested fix (skip ambiguous duplicates)
func buildTechVersionMap(technologies []string) map[string]string { versions := make(map[string]string, len(technologies)) + conflicts := make(map[string]struct{}) for _, tech := range technologies { parts := strings.SplitN(tech, ":", 2) if len(parts) != 2 { continue } name := strings.ToLower(strings.TrimSpace(parts[0])) version := strings.TrimSpace(parts[1]) if name == "" || version == "" { continue } + if _, conflicted := conflicts[name]; conflicted { + continue + } + if existing, ok := versions[name]; ok && existing != version { + delete(versions, name) + conflicts[name] = struct{}{} + continue + } versions[name] = version } return versions }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@runner/cpe.go` around lines 144 - 157, buildTechVersionMap currently overwrites entries for the same normalized product key; change it to apply a deterministic conflict policy that skips ambiguous duplicates: while iterating technologies, normalize name and version as you already do, but keep a separate map (or extend versions) to record the first-seen version and a flag for ambiguity; if you encounter the same name with a different version, remove (or mark) that name from versions so it is not returned; ensure the logic references buildTechVersionMap, the iterations over technologies, and the name/version variables so duplicates are detected and ambiguous products are skipped instead of being overwritten.
113-118:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve detected version casing in CPE enrichment.
Lowercasing the detected version (
RC1→rc1) changes the value being mapped to CPE/CVE data and can reduce match fidelity.Suggested fix
func sanitizeCPEVersion(version string) string { - return strings.ToLower(strings.ReplaceAll(strings.TrimSpace(version), " ", "_")) + return strings.ReplaceAll(strings.TrimSpace(version), " ", "_") }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@runner/cpe.go` around lines 113 - 118, The sanitizeCPEVersion function currently lowercases the detected version which alters semantics (e.g., "RC1" -> "rc1") and can reduce CPE/CVE match fidelity; update sanitizeCPEVersion to preserve the original casing while still trimming whitespace and replacing spaces with underscores (i.e., remove the strings.ToLower call) so versions are normalized for embedding but not case-normalized, and ensure any callers (e.g., generateCPE) continue to use sanitizeCPEVersion for version normalization.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@runner/cpe.go`:
- Around line 166-168: The early-return in runner/cpe.go currently returns the
original matches slice when len(matches)==0 || len(technologies)==0, violating
the “returns a copy” contract; update that early-return to return a shallow copy
of matches instead of the original slice (e.g., allocate a new slice and copy or
append into a nil slice) so callers always get a distinct slice instance; change
the return at the matches/technologies empty-check to return the copied slice
while keeping the same semantics.
---
Duplicate comments:
In `@runner/cpe.go`:
- Around line 144-157: buildTechVersionMap currently overwrites entries for the
same normalized product key; change it to apply a deterministic conflict policy
that skips ambiguous duplicates: while iterating technologies, normalize name
and version as you already do, but keep a separate map (or extend versions) to
record the first-seen version and a flag for ambiguity; if you encounter the
same name with a different version, remove (or mark) that name from versions so
it is not returned; ensure the logic references buildTechVersionMap, the
iterations over technologies, and the name/version variables so duplicates are
detected and ambiguous products are skipped instead of being overwritten.
- Around line 113-118: The sanitizeCPEVersion function currently lowercases the
detected version which alters semantics (e.g., "RC1" -> "rc1") and can reduce
CPE/CVE match fidelity; update sanitizeCPEVersion to preserve the original
casing while still trimming whitespace and replacing spaces with underscores
(i.e., remove the strings.ToLower call) so versions are normalized for embedding
but not case-normalized, and ensure any callers (e.g., generateCPE) continue to
use sanitizeCPEVersion for version normalization.
Proposed changes
Fixes #2476
Previously, httpx always emitted
*for the version field of CPE 2.3 strings (cpe:2.3:a:vendor:product:*:...), even when the version was known. This PR fills that field using the version that wappalyzer already extracts during technology detection.What changed:
Name:versionentries) are parsed into a lookup map and matched against CPE product names (case-insensitive). When a match is found, the CPE's version field is populated, e.g.cpe:2.3:a:vercel:next.js:14.2.3:*:.... Inputs are never mutated; when no version is known the CPE keeps its*(no regression).httpx -cpedid not enable technology detection, so wappalyzer never ran and versions could never populate. All tech-detect triggers (-tech-detect, JSON/CSV output, asset-upload, and now-cpe) are routed through a singletechDetectRequiredpredicate so-cpealone now turns detection on.-cpealone previously left the wappalyzer clientnilwhile tech-detect was enabled → nil-pointer panic. The init gate now uses the sametechDetectRequiredpredicate, so the invariant tech-detect ⇒ wappalyzer initialized holds across all call sites.setCPEVersionvalidates a CPE 2.3 string has exactly 13 fields and skips enrichment when the version contains reserved/structural characters (:,*,?) rather than emitting a malformed value.Proof
Before:
cpe:2.3:a:vercel:next.js:*:*:*:*:*:*:*:*After:
cpe:2.3:a:vercel:next.js:14.2.3:*:*:*:*:*:*:*runner/cpe_test.gocover the version-string helpers, thetechDetectRequiredpredicate (incl. the-cpe-alone case from HTTPX is not detecting the product version in the CPE (Common Platform Enumeration) #2476), the 13-field/reserved-char guards, and input immutability.scanme.sh {{binary}} -cpe -silent.golangci-lint(0 issues),go vet ./...,go build ./...,go build -race ./...,go test ./..., integration tests (21 passed), functional tests (24 passed), and an end-to-end run ofhttpx -cpe -silentagainst a live target (no panic).Checklist
Summary by CodeRabbit