@dogancanbakir (Member) commented Jan 6, 2026

Summary

Adds support for passive detection of CPE (Common Platform Enumeration) identifiers and WordPress plugins/themes using the awesome-search-queries database.

CPE Detection (-cpe flag)

  • Matches response title, body, and favicon hash against patterns extracted from Shodan, FOFA, and Google dork queries
  • Extracts product and vendor information
  • Generates CPE 2.3 identifiers (e.g., cpe:2.3:a:vendor:product:*:*:*:*:*:*:*:*)

WordPress Detection (-wp flag)

  • Detects plugins via /wp-content/plugins/[name]/ patterns in response body
  • Detects themes via /wp-content/themes/[name]/ patterns in response body
  • Validates against known plugins/themes list from awesome-search-queries

New CLI Flags (PROBES group)

Flag | Description
-cpe | Display CPE (Common Platform Enumeration) based on awesome-search-queries
-wp, -wordpress | Display WordPress plugins and themes

Both are automatically included in JSON/CSV output.

Testing

# Test CPE detection
echo "https://jira.atlassian.com" | go run . -cpe -silent
# Output: https://jira.atlassian.com [cpe:2.3:a:stagil:stagil_navigation:*:*:*:*:*:*:*:*]

# Test WordPress detection
echo "https://wordpress.org" | go run . -wp -silent
# Output: https://wordpress.org [wp-plugins:gutenberg]

# Test both flags together
echo "https://wordpress.org" | go run . -cpe -wp -silent
# Output: https://wordpress.org [cpe:2.3:a:webp:webp_server_go:*:*:*:*:*:*:*:*] [wp-plugins:gutenberg]

# Test JSON output (CPE and WordPress included automatically)
echo "https://wordpress.org" | go run . -j -silent | jq '{cpe,wordpress}'
# Output:
# {
#   "cpe": [{"product": "webp_server_go", "vendor": "webp", "cpe": "cpe:2.3:a:webp:webp_server_go:*:*:*:*:*:*:*:*"}],
#   "wordpress": {"plugins": ["gutenberg"]}
# }

# Test with tech-detect for comparison
echo "https://wordpress.org" | go run . -td -cpe -wp -silent

Test plan

  • Test CPE detection on various sites (Jenkins, Jira, GitLab)
  • Test WordPress plugin detection
  • Test WordPress theme detection
  • Test JSON output includes CPE and WordPress data
  • Test CSV output includes CPE and WordPress data
  • Verify no false positives with validation against known lists

Closes #1975

Summary by CodeRabbit

Release Notes

  • New Features

    • Added CPE (Common Platform Enumeration) detection to identify and display the software products and vendors discovered during scans; the generated CPE 2.3 identifiers leave the version field as a wildcard.
    • Added WordPress detection to identify and display installed WordPress plugins and themes.
  • Documentation

    • Updated README with new command-line flags for CPE detection (-cpe) and WordPress detection (-wordpress).


Add support for passive detection of CPE (Common Platform Enumeration)
identifiers and WordPress plugins/themes using awesome-search-queries.

CPE Detection (-cpe flag):
- Matches response title, body, and favicon hash against patterns
- Extracts product, vendor, and generates CPE 2.3 identifiers
- Uses patterns from Shodan, FOFA, Google dorks

WordPress Detection (-wp flag):
- Detects plugins via /wp-content/plugins/[name]/ patterns
- Detects themes via /wp-content/themes/[name]/ patterns
- Validates against known plugins/themes list

New CLI flags in PROBES group:
- -cpe: display CPE based on awesome-search-queries
- -wp, -wordpress: display WordPress plugins and themes

Both are automatically included in JSON/CSV output.

Closes #1975
@auto-assign auto-assign bot requested a review from dwisiswant0 January 6, 2026 10:24
@coderabbitai bot commented Jan 6, 2026

Walkthrough

The PR adds passive CPE and WordPress detection capabilities to httpx by integrating the awesome-search-queries library. Two new detector modules are introduced: CPEDetector for identifying products via pattern matching against title, body, and favicon hashes, and WordPressDetector for extracting WordPress plugin and theme names from HTML responses. Both detectors are initialized conditionally in the runner, and their results are attached to the output Result structure via new fields.

Changes

Cohort | File(s) | Summary
Documentation | README.md | Added -cpe and -wordpress flag documentation to the PROBES section; reformatted alignment and expanded flag descriptions.
Dependency Management | go.mod | Added indirect dependency on github.com/projectdiscovery/awesome-search-queries for query and plugin/theme data.
CLI Options | runner/options.go | Introduced CPEDetect and WordPress boolean fields to ScanOptions and Options structs; added corresponding -cpe and -wordpress command-line flags.
CPE Detection | runner/cpe.go | New module implementing CPEDetector with pattern-based matching against title, body, and favicon hashes; includes vendor parsing, CPE string generation, and deduplication utilities.
WordPress Detection | runner/wordpress.go | New module implementing WordPressDetector that loads plugin and theme lists from awesome-search-queries and extracts matches from HTML via regex with deduplication.
Output Structure | runner/types.go | Extended Result struct with CPE (slice of CPEInfo) and WordPress (pointer to WordPressInfo) fields, including JSON/CSV/mapstructure tags.
Runner Integration | runner/runner.go | Added cpeDetector and wpDetector fields to Runner; conditional initialization during runner creation; detector invocations integrated into scan result processing to populate new Result fields.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Two detectors hopping through the code,
CPE and WordPress on the road,
Patterns matched, plugins found with glee,
Results enriched for all to see! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly summarizes the main feature addition: passive CPE and WordPress detection, matching the changeset's primary focus across multiple files (cpe.go, wordpress.go, options.go, runner.go).
Linked Issues check | ✅ Passed | The PR fully addresses issue #1975 objectives: implements passive CPE detection with product/vendor extraction and CPE 2.3 identifier generation [cpe.go], implements WordPress plugin/theme detection with validation [wordpress.go], exposes CLI flags -cpe and -wordpress [options.go], and includes results in structured outputs [runner.go, types.go].
Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #1975 objectives: CPE detection, WordPress detection, CLI flags, and output integration. The go.mod dependency addition (awesome-search-queries) is required for core functionality.
✨ Finishing touches
  • 📝 Generate docstrings


@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @go.mod:
- Line 131: The dependency github.com/projectdiscovery/awesome-search-queries is
marked as indirect in go.mod but is directly imported by runner/cpe.go and
runner/wordpress.go; run `go mod tidy` (or manually remove the `// indirect`
comment and ensure the require line matches
`github.com/projectdiscovery/awesome-search-queries
v0.0.0-20260104120501-961ef30f7193`) so the module is recorded as a direct
dependency, then re-run `go build`/tests to confirm imports in runner/cpe.go and
runner/wordpress.go resolve correctly.

In @runner/cpe.go:
- Around line 106-116: The code in the loop over titlePrefixes redundantly
strips the prefix twice: first with strings.TrimPrefix(query, prefix) then again
with strings.TrimPrefix(..., prefix[:len(prefix)-1]); update the logic in the
block handling titlePrefixes (the loop using titlePrefixes, extractQuotedValue,
and writing into d.titlePatterns via appendUnique) so you only strip the
intended prefix once—either remove the second TrimPrefix call entirely, or
replace the first TrimPrefix with logic that conditionally trims the variant
without its last character when the prefix form includes a trailing quote (e.g.,
handle prefixes like `title='`/`title="`), then normalize to lowercase and
proceed to set d.titlePatterns[pattern] as before.
🧹 Nitpick comments (1)
runner/runner.go (1)

2348-2371: Potential nil pointer dereference in WordPress detection output.

Line 2350 assigns the result of r.wpDetector.Detect(...), which can be nil, and line 2351 then calls wpInfo.HasData() on it. This does not panic today because HasData() checks for a nil receiver, but the pattern is fragile: the call site depends on an implementation detail of HasData() rather than on an explicit check.

🔎 Suggested defensive pattern

For consistency with cpeMatches (which uses a nil-safe slice), consider:

 	var wpInfo *WordPressInfo
 	if r.wpDetector != nil {
 		wpInfo = r.wpDetector.Detect(string(resp.Data))
-		if wpInfo.HasData() && r.options.WordPress {
+		if wpInfo != nil && wpInfo.HasData() && r.options.WordPress {

This makes the nil check explicit and doesn't rely on HasData() implementation details.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc2c7a2 and 81461d3.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • README.md
  • go.mod
  • runner/cpe.go
  • runner/options.go
  • runner/runner.go
  • runner/types.go
  • runner/wordpress.go
🧰 Additional context used
🧬 Code graph analysis (2)
runner/types.go (2)
runner/cpe.go (1)
  • CPEInfo (11-15)
runner/wordpress.go (1)
  • WordPressInfo (12-15)
runner/runner.go (2)
runner/cpe.go (3)
  • CPEDetector (17-21)
  • NewCPEDetector (35-68)
  • CPEInfo (11-15)
runner/wordpress.go (3)
  • WordPressDetector (17-22)
  • NewWordPressDetector (24-59)
  • WordPressInfo (12-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Functional Test (macOS-latest)
  • GitHub Check: Functional Test (windows-latest)
  • GitHub Check: Functional Test (ubuntu-latest)
  • GitHub Check: Analyze (go)
  • GitHub Check: release-test
🔇 Additional comments (20)
runner/options.go (4)

88-89: New fields properly added to ScanOptions.

The CPEDetect and WordPress boolean fields are correctly added to the ScanOptions struct, aligning with the new detection capabilities.


153-154: Clone method correctly updated.

The new fields are properly propagated in the Clone() method, ensuring cloned instances retain the CPE and WordPress detection settings.


263-264: Options struct correctly extended.

The CPEDetect and WordPress fields are properly added to the main Options struct, matching the ScanOptions additions.


396-397: CLI flags correctly defined.

The new -cpe and -wordpress/-wp flags are properly wired with clear descriptions. They are appropriately placed in the PROBES group alongside related detection features like -td (tech-detect).

runner/cpe.go (5)

11-21: Well-structured data types.

The CPEInfo and CPEDetector structs are cleanly designed with appropriate JSON tags and encapsulation of pattern maps for title, body, and favicon matching.


35-68: Constructor properly initializes detector from query data.

The NewCPEDetector function correctly loads and parses queries from the awesome-search-queries library, building pattern maps for each query type. Error handling is appropriate with wrapped errors for context.


70-91: Vendor parsing and CPE generation are correct.

The parseVendor function handles both string and array vendor formats gracefully. The generateCPE function produces valid CPE 2.3 identifiers with proper escaping of spaces to underscores.
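As a rough illustration of the behaviour described above, the sketch below parses a vendor value that may be a string or a list and formats a CPE 2.3 identifier. It is not the code from runner/cpe.go: the function bodies, the `any`-typed parameter, and the lowercasing step are assumptions made for the example. Only the identifier layout and the space-to-underscore replacement come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVendor is a hypothetical helper: it accepts a vendor value that is
// either a plain string or a list, and returns the first non-empty entry.
func parseVendor(v any) string {
	switch t := v.(type) {
	case string:
		return t
	case []string:
		for _, s := range t {
			if s != "" {
				return s
			}
		}
	case []any:
		for _, item := range t {
			if s, ok := item.(string); ok && s != "" {
				return s
			}
		}
	}
	return ""
}

// generateCPE formats a CPE 2.3 string for an application ("a") with every
// attribute after the product left as a wildcard. Lowercasing is an
// assumption of this sketch; replacing spaces with underscores follows the
// review comment.
func generateCPE(vendor, product string) string {
	norm := func(s string) string {
		return strings.ReplaceAll(strings.ToLower(strings.TrimSpace(s)), " ", "_")
	}
	return fmt.Sprintf("cpe:2.3:a:%s:%s:*:*:*:*:*:*:*:*", norm(vendor), norm(product))
}

func main() {
	vendor := parseVendor([]any{"Atlassian"})
	fmt.Println(generateCPE(vendor, "Jira Data Center"))
	// prints cpe:2.3:a:atlassian:jira_data_center:*:*:*:*:*:*:*:*
}
```
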


153-170: Quote extraction logic is correct.

The extractQuotedValue function properly handles quoted strings and truncates at logical OR operators, which is appropriate for parsing search query syntax.
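The PR text does not show the body of extractQuotedValue, so the following is only a guess at its shape: take the text inside the first pair of quotes, and for unquoted input cut at the first OR operator. The `||` spelling of the operator and the handling of unterminated quotes are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// extractQuotedValue is a hypothetical reconstruction, not the PR's code.
// It returns the content of a leading quoted string, or for unquoted input
// the text before the first "||" operator.
func extractQuotedValue(s string) string {
	s = strings.TrimSpace(s)
	if s == "" {
		return ""
	}
	// Quoted form: return the text between the opening quote and its match.
	if q := s[0]; q == '"' || q == '\'' {
		if end := strings.IndexByte(s[1:], q); end >= 0 {
			return s[1 : 1+end]
		}
		s = s[1:] // unterminated quote: continue with the remainder
	}
	// Unquoted form: cut at the first logical OR and trim stray quotes.
	if i := strings.Index(s, "||"); i >= 0 {
		s = s[:i]
	}
	return strings.Trim(strings.TrimSpace(s), `"'`)
}

func main() {
	fmt.Println(extractQuotedValue(`"Dashboard [Jenkins]" || title="Jenkins"`))
	// prints Dashboard [Jenkins]
	fmt.Println(extractQuotedValue(`Jenkins || Hudson`))
	// prints Jenkins
}
```
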


181-225: Detection logic correctly matches patterns and deduplicates results.

The Detect method efficiently checks title, body, and favicon patterns with case-insensitive matching and proper deduplication using a seen map. The approach of using strings.Contains for title/body and exact match for favicon hash is appropriate.
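The matching strategy described here can be sketched as follows. The `detector` type, its field layout, and the choice of the CPE string as the deduplication key are assumptions for illustration; the substring match for title and body, the exact match for the favicon hash, and the seen map follow the review comment.

```go
package main

import (
	"fmt"
	"strings"
)

// CPEInfo mirrors the product/vendor/cpe keys shown in the PR's JSON output.
type CPEInfo struct {
	Product string `json:"product"`
	Vendor  string `json:"vendor"`
	CPE     string `json:"cpe"`
}

// detector is a hypothetical stand-in for CPEDetector. Pattern keys are
// assumed to be stored in lowercase.
type detector struct {
	titlePatterns   map[string][]CPEInfo
	bodyPatterns    map[string][]CPEInfo
	faviconPatterns map[string][]CPEInfo
}

// Detect returns every CPEInfo whose pattern matches, without duplicates.
func (d *detector) Detect(title, body, faviconHash string) []CPEInfo {
	var out []CPEInfo
	seen := make(map[string]struct{})
	add := func(infos []CPEInfo) {
		for _, info := range infos {
			if _, dup := seen[info.CPE]; dup {
				continue
			}
			seen[info.CPE] = struct{}{}
			out = append(out, info)
		}
	}
	// Substring match for title and body, case-insensitive via lowercasing.
	match := func(text string, patterns map[string][]CPEInfo) {
		if text == "" {
			return
		}
		text = strings.ToLower(text)
		for pattern, infos := range patterns {
			if strings.Contains(text, pattern) {
				add(infos)
			}
		}
	}
	match(title, d.titlePatterns)
	match(body, d.bodyPatterns)
	// Favicon hashes are compared exactly via a map lookup.
	if faviconHash != "" {
		add(d.faviconPatterns[faviconHash])
	}
	return out
}

func main() {
	jenkins := CPEInfo{Product: "jenkins", Vendor: "jenkins", CPE: "cpe:2.3:a:jenkins:jenkins:*:*:*:*:*:*:*:*"}
	d := &detector{
		titlePatterns: map[string][]CPEInfo{"dashboard [jenkins]": {jenkins}},
		bodyPatterns:  map[string][]CPEInfo{"jenkins": {jenkins}},
	}
	// Title and body both match, but the result contains the entry once.
	for _, m := range d.Detect("Dashboard [Jenkins]", "<body>Jenkins</body>", "") {
		fmt.Println(m.CPE)
	}
}
```

Because Go map iteration order is unspecified, a detector written this way returns matches in no particular order when several patterns hit.
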

README.md (1)

113-117: Documentation correctly updated for new flags.

The README properly documents the new -cpe and -wordpress/-wp flags with clear descriptions that align with the implementation in runner/options.go.

runner/types.go (1)

105-106: Result struct correctly extended with new detection fields.

The CPE slice and WordPress pointer fields are properly added with consistent tags (json, csv, mapstructure) matching the existing field conventions. Using a pointer for WordPressInfo enables proper omitempty behavior for nil values.
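A small sketch of why the pointer matters for JSON output: with encoding/json, `omitempty` drops a nil pointer and an empty slice, whereas a struct value is never treated as empty by `omitempty`. The struct definitions and tag strings below are assumptions modelled on the JSON shown in the PR description, not a copy of runner/types.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WordPressInfo and CPEInfo are assumed shapes based on the PR's JSON output.
type WordPressInfo struct {
	Plugins []string `json:"plugins,omitempty"`
	Themes  []string `json:"themes,omitempty"`
}

type CPEInfo struct {
	Product string `json:"product"`
	Vendor  string `json:"vendor"`
	CPE     string `json:"cpe"`
}

// Result is a reduced, hypothetical version of the runner's Result struct.
type Result struct {
	URL       string         `json:"url"`
	CPE       []CPEInfo      `json:"cpe,omitempty"`
	WordPress *WordPressInfo `json:"wordpress,omitempty"`
}

// encode marshals a Result and returns the JSON text (empty on error).
func encode(r Result) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// No detections: both keys are omitted.
	fmt.Println(encode(Result{URL: "https://example.com"}))
	// prints {"url":"https://example.com"}

	// One plugin detected: the wordpress key appears, themes stays omitted.
	fmt.Println(encode(Result{
		URL:       "https://example.com",
		WordPress: &WordPressInfo{Plugins: []string{"gutenberg"}},
	}))
	// prints {"url":"https://example.com","wordpress":{"plugins":["gutenberg"]}}
}
```
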

runner/runner.go (4)

84-85: Runner struct correctly extended with detector fields.

The cpeDetector and wpDetector fields are properly added to the Runner struct to hold the initialized detectors.


138-150: Graceful initialization with appropriate error handling.

The detectors are initialized conditionally based on flags or JSON/CSV output requirements. Using warning logs instead of fatal errors on initialization failure is a good practice, allowing the scan to proceed without these optional features.
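The initialization pattern praised here might look roughly like the sketch below: build the detector only when a flag or a structured output mode asks for it, and degrade to a warning if construction fails. The `options` field names, the `initDetector` helper, and the use of the standard log package are all hypothetical; the PR's runner uses its own option fields and logger.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// options holds hypothetical flag values; the field names are illustrative.
type options struct {
	CPEDetect  bool
	JSONOutput bool
	CSVOutput  bool
}

// detector is a placeholder for an optional detector such as CPEDetector.
type detector struct{}

// working and failing are example constructors used to exercise both paths.
func working() (*detector, error) { return &detector{}, nil }
func failing() (*detector, error) { return nil, errors.New("pattern data unavailable") }

// initDetector returns nil when detection is not requested or when the
// constructor fails. A failure is logged as a warning so the scan can go on.
func initDetector(o options, newFn func() (*detector, error)) *detector {
	if !(o.CPEDetect || o.JSONOutput || o.CSVOutput) {
		return nil
	}
	d, err := newFn()
	if err != nil {
		log.Printf("warning: could not create detector: %v", err)
		return nil
	}
	return d
}

func main() {
	fmt.Println(initDetector(options{}, working) != nil)                // false: not requested
	fmt.Println(initDetector(options{JSONOutput: true}, working) != nil) // true
	fmt.Println(initDetector(options{CPEDetect: true}, failing) != nil)  // false: warning logged
}
```

Callers then need a nil check before using the detector, which is the pattern the runner follows with `r.wpDetector != nil`.
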


316-317: Scan options correctly propagate detection flags.

The CPEDetect and WordPress flags are properly set in scanopts, enabling detection when explicitly requested or when structured output (JSON/CSV) is enabled.


2436-2437: Result population is correct.

The CPE and WordPress fields are properly assigned to the result struct, integrating the detection data into the output pipeline.

runner/wordpress.go (5)

12-22: Clean data structure design.

The WordPressInfo and WordPressDetector structs are well-designed. Using map[string]struct{} for known plugins/themes provides O(1) lookups, and the compiled regex patterns are efficient for repeated matching.


24-59: Constructor properly initializes detector with external data.

The NewWordPressDetector function correctly:

  • Compiles regex patterns for plugin/theme path extraction
  • Loads known plugins/themes from the external library
  • Returns errors appropriately on any failure

61-70: List loading helper is correct.

The loadList function properly parses newline-delimited data, trims whitespace, and populates the target map. Returning scanner.Err() is correct for propagating any scanning errors.
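For readers unfamiliar with the idiom, a loader of this kind can be written with bufio.Scanner as sketched below. The signature (a string input and a `map[string]struct{}` target) is an assumption; the real loadList may take different arguments.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// loadList is a hypothetical version of the helper: it reads one name per
// line, trims surrounding whitespace, skips blank lines, and records each
// name in target. It returns any error reported by the scanner.
func loadList(data string, target map[string]struct{}) error {
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		if name := strings.TrimSpace(scanner.Text()); name != "" {
			target[name] = struct{}{}
		}
	}
	return scanner.Err()
}

func main() {
	known := make(map[string]struct{})
	if err := loadList("gutenberg\n  akismet \n\n", known); err != nil {
		fmt.Println("load failed:", err)
		return
	}
	_, ok := known["akismet"]
	fmt.Println(len(known), ok) // prints 2 true
}
```
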


72-114: Detection logic is correct with proper deduplication and validation.

The Detect method:

  • Returns early on empty body
  • Uses regex to extract plugin/theme names from paths
  • Validates against known lists to reduce false positives
  • Deduplicates matches using seen maps
  • Returns nil when no matches are found
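The steps listed above can be sketched as follows. This is an illustrative reconstruction rather than the contents of runner/wordpress.go: the regular expressions, the `newDetector` constructor, and the `extract` helper are assumptions, and the known lists are passed in directly instead of being loaded from awesome-search-queries.

```go
package main

import (
	"fmt"
	"regexp"
)

// WordPressInfo holds the detected plugin and theme names.
type WordPressInfo struct {
	Plugins, Themes []string
}

// WordPressDetector is a simplified, hypothetical version of the detector.
type WordPressDetector struct {
	pluginRe, themeRe         *regexp.Regexp
	knownPlugins, knownThemes map[string]struct{}
}

// newDetector builds a detector from in-memory lists (an assumption; the PR
// loads the lists from an external library).
func newDetector(plugins, themes []string) *WordPressDetector {
	toSet := func(names []string) map[string]struct{} {
		set := make(map[string]struct{}, len(names))
		for _, n := range names {
			set[n] = struct{}{}
		}
		return set
	}
	return &WordPressDetector{
		pluginRe:     regexp.MustCompile(`/wp-content/plugins/([a-zA-Z0-9_-]+)/`),
		themeRe:      regexp.MustCompile(`/wp-content/themes/([a-zA-Z0-9_-]+)/`),
		knownPlugins: toSet(plugins),
		knownThemes:  toSet(themes),
	}
}

// extract returns the unique captured names that appear in the known set.
func extract(re *regexp.Regexp, body string, known map[string]struct{}) []string {
	var out []string
	seen := make(map[string]struct{})
	for _, m := range re.FindAllStringSubmatch(body, -1) {
		name := m[1]
		if _, ok := known[name]; !ok {
			continue // not in the known list: treat as a false positive
		}
		if _, dup := seen[name]; dup {
			continue
		}
		seen[name] = struct{}{}
		out = append(out, name)
	}
	return out
}

// Detect returns nil for an empty body or when nothing is found.
func (d *WordPressDetector) Detect(body string) *WordPressInfo {
	if body == "" {
		return nil
	}
	info := &WordPressInfo{
		Plugins: extract(d.pluginRe, body, d.knownPlugins),
		Themes:  extract(d.themeRe, body, d.knownThemes),
	}
	if len(info.Plugins) == 0 && len(info.Themes) == 0 {
		return nil
	}
	return info
}

func main() {
	d := newDetector([]string{"gutenberg"}, []string{"twentytwenty"})
	body := `<link href="/wp-content/plugins/gutenberg/style.css">` +
		`<script src="/wp-content/plugins/gutenberg/app.js"></script>` +
		`<img src="/wp-content/plugins/not-listed/logo.png">`
	if info := d.Detect(body); info != nil {
		fmt.Println(info.Plugins, info.Themes) // prints [gutenberg] []
	}
}
```
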

116-118: Nil-safe HasData method is a good pattern.

The nil receiver check (w != nil) ensures safe usage even when Detect() returns nil, which is the pattern used in runner/runner.go.
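A minimal sketch of the nil-receiver pattern mentioned here, assuming WordPressInfo has plugin and theme slices: in Go, a method with a pointer receiver can be called on a nil pointer as long as the method itself checks for nil before touching any field.

```go
package main

import "fmt"

// WordPressInfo is an assumed shape with two result slices.
type WordPressInfo struct {
	Plugins, Themes []string
}

// HasData reports whether anything was detected. The w != nil check makes
// the call safe on a nil *WordPressInfo.
func (w *WordPressInfo) HasData() bool {
	return w != nil && (len(w.Plugins) > 0 || len(w.Themes) > 0)
}

func main() {
	var none *WordPressInfo
	fmt.Println(none.HasData()) // prints false, no panic

	found := &WordPressInfo{Plugins: []string{"gutenberg"}}
	fmt.Println(found.HasData()) // prints true
}
```
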

github.com/pierrec/lz4/v4 v4.1.23 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
github.com/projectdiscovery/awesome-search-queries v0.0.0-20260104120501-961ef30f7193 // indirect

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find the runner directory and check what files exist
find . -type f -name "cpe.go" -o -name "wordpress.go" | head -20

Repository: projectdiscovery/httpx

Length of output: 103


🏁 Script executed:

# Also check if go.mod exists and verify line 131
if [ -f go.mod ]; then
  sed -n '130,132p' go.mod
fi

Repository: projectdiscovery/httpx

Length of output: 303


🏁 Script executed:

# Search for imports of awesome-search-queries in the codebase
rg "awesome-search-queries" --type go

Repository: projectdiscovery/httpx

Length of output: 397


Dependency should be marked as direct, not indirect.

The awesome-search-queries dependency is directly imported in runner/cpe.go and runner/wordpress.go, so it should not be marked as // indirect in go.mod. Run go mod tidy to correct this automatically.


Comment on lines +106 to +116
	for _, prefix := range titlePrefixes {
		if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
			pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
			pattern = strings.TrimPrefix(pattern, prefix[:len(prefix)-1])
			if pattern != "" {
				pattern = strings.ToLower(pattern)
				d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
			}
			return
		}
	}

⚠️ Potential issue | 🟡 Minor

Suspicious redundant prefix stripping.

Line 109 attempts to strip prefix[:len(prefix)-1] after the full prefix has already been trimmed on line 108. This appears to be dead code or a logic error: once the full prefix has been removed, stripping a shorter variant of it normally has no effect.

🔎 Suggested fix

If the intent was to handle cases where the prefix includes a trailing quote character (like title=' or title="), line 108 should handle that. Consider removing line 109 if it's redundant:

 		if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
 			pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
-			pattern = strings.TrimPrefix(pattern, prefix[:len(prefix)-1])
 			if pattern != "" {
 				pattern = strings.ToLower(pattern)
 				d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
 			}
 			return
 		}
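To see why the second call is usually a no-op, the sketch below applies both trims to a raw query string. It leaves out extractQuotedValue, which sits between the two calls in the PR, and the `stripTwice` helper and the sample query are made up for the example. strings.TrimPrefix returns its input unchanged when the input does not start with the given prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTwice is a hypothetical helper that mimics the two consecutive trims:
// first the full prefix, then the prefix without its last character.
func stripTwice(query, prefix string) (first, second string) {
	first = strings.TrimPrefix(query, prefix)
	second = strings.TrimPrefix(first, prefix[:len(prefix)-1])
	return first, second
}

func main() {
	first, second := stripTwice(`title="Jenkins"`, `title="`)
	fmt.Println(first)           // prints Jenkins"
	fmt.Println(first == second) // prints true: the second trim changed nothing
}
```

One related detail worth checking in the real code: strings.TrimPrefix is case-sensitive, while the surrounding HasPrefix test compares lowercased strings, so a query written as `Title="..."` would pass the test but keep its prefix after both trims.
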
📝 Committable suggestion

Suggested change

	for _, prefix := range titlePrefixes {
		if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
			pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
			if pattern != "" {
				pattern = strings.ToLower(pattern)
				d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
			}
			return
		}
	}

@Mzack9999 (Member) left a comment

LGTM! That said, I think WordPress fingerprinting would be better suited to a nuclei template.

@dogancanbakir dogancanbakir merged commit 834bbd7 into dev Jan 8, 2026
15 checks passed
@dogancanbakir dogancanbakir deleted the feature/cpe-wordpress-detection branch January 8, 2026 12:56

Development

Successfully merging this pull request may close these issues.

Passive CPE and Wordpress Plugin / Theme Detection

3 participants