CPE matching sometimes incorrect for software without vendor data #41644

@getvictor

Description

Fleet versions

  • Discovered: 4.80.2
  • Reproduced: 4.80.2

💥 Actual behavior

When software has no vendor data (common for the python_packages, deb_packages, homebrew_packages, and npm_packages sources), the CPE matching algorithm can return different vendors across runs for the same software. This happens because multiple CPE entries in the NVD database share the same product name but have different vendors (e.g., debian:duplicity vs. duplicity_project:duplicity), and SQLite does not guarantee row order for a query without an ORDER BY clause.
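To illustrate why the result can change between runs, here is a minimal Go sketch of a matcher that simply takes the first row returned for a product name. The cpeCandidate type and pickFirst function are hypothetical names for illustration, not Fleet's actual code; the sketch only shows that when two rows tie on product name, the chosen vendor depends entirely on row order.

```go
package main

import "fmt"

// cpeCandidate is a hypothetical, simplified view of one CPE dictionary row.
type cpeCandidate struct {
	Vendor  string
	Product string
}

// pickFirst returns the vendor of the first row whose product matches name,
// mimicking a query without ORDER BY that uses whichever row comes back first.
func pickFirst(rows []cpeCandidate, name string) string {
	for _, r := range rows {
		if r.Product == name {
			return r.Vendor
		}
	}
	return ""
}

func main() {
	// The same two rows, returned in two different orders on two runs.
	runA := []cpeCandidate{{"debian", "duplicity"}, {"duplicity_project", "duplicity"}}
	runB := []cpeCandidate{{"duplicity_project", "duplicity"}, {"debian", "duplicity"}}

	fmt.Println(pickFirst(runA, "duplicity")) // debian
	fmt.Println(pickFirst(runB, "duplicity")) // duplicity_project
}
```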

Even with deterministic ordering (ORDER BY), the algorithm has no way to prefer the correct vendor when the software record has no vendor field. For example, requests from python_packages could match either jenkins:requests or python:requests, and the algorithm cannot distinguish between them.
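The difference between "deterministic" and "correct" can be sketched by sorting the tied candidates by vendor, the equivalent of an ORDER BY vendor clause. This is a hypothetical Go illustration (pickSorted is not a Fleet function): the result becomes stable across runs, but because "jenkins" sorts before "python", it stably picks jenkins:requests for a Python package.

```go
package main

import (
	"fmt"
	"sort"
)

// cpeCandidate is a hypothetical, simplified view of one CPE dictionary row.
type cpeCandidate struct {
	Vendor  string
	Product string
}

// pickSorted keeps the rows whose product matches name, orders them by vendor
// (like ORDER BY vendor), and returns the first vendor. The choice is stable,
// but nothing in the ordering reflects which vendor is actually correct.
func pickSorted(rows []cpeCandidate, name string) string {
	var matches []cpeCandidate
	for _, r := range rows {
		if r.Product == name {
			matches = append(matches, r)
		}
	}
	if len(matches) == 0 {
		return ""
	}
	sort.Slice(matches, func(i, j int) bool { return matches[i].Vendor < matches[j].Vendor })
	return matches[0].Vendor
}

func main() {
	rows := []cpeCandidate{{"python", "requests"}, {"jenkins", "requests"}}
	// "jenkins" < "python", so every run picks jenkins:requests.
	fmt.Println(pickSorted(rows, "requests")) // jenkins
}
```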

The source field (e.g., python_packages, npm_packages) maps to a target_sw value (e.g., python, node.js) that is already used in the CPE output string, but is not currently used during candidate ranking. This information could serve as an additional scoring signal to prefer CPE entries whose vendor relates to the software's ecosystem.
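A minimal sketch of the source-to-target_sw lookup, assuming a plain map. Only the two pairs named in this issue are included; the map, its name, and targetSWForSource are hypothetical, and Fleet's real mapping may live elsewhere and cover more sources.

```go
package main

import "fmt"

// sourceToTargetSW is a hypothetical, deliberately partial mapping that holds
// only the two pairs mentioned in this issue.
var sourceToTargetSW = map[string]string{
	"python_packages": "python",
	"npm_packages":    "node.js",
}

// targetSWForSource returns the target_sw for a software source and whether
// the (partial) map knows about that source.
func targetSWForSource(source string) (string, bool) {
	sw, ok := sourceToTargetSW[source]
	return sw, ok
}

func main() {
	// rpm_packages is reported as unknown only because this sketch's map is partial.
	for _, src := range []string{"python_packages", "npm_packages", "rpm_packages"} {
		sw, ok := targetSWForSource(src)
		fmt.Printf("%s -> %q (known: %v)\n", src, sw, ok)
	}
}
```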

🛠️ To fix

Use the software's source-to-target_sw mapping as an additional scoring signal when ranking CPE candidates. For example, if source = "python_packages" maps to target_sw = "python", score CPE entries higher when their vendor contains "python". This would correctly prefer python:requests over jenkins:requests for Python packages.
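The proposed signal could look roughly like the following Go sketch. Everything here is hypothetical: the base score of 1, the ecosystemBonus weight, and the score and bestVendor functions are illustrative and stand in for whatever ranking the real matcher already does. The sketch isolates one idea: a candidate whose vendor contains the target_sw outranks an otherwise equal candidate, with a vendor-name tie-break so the result stays deterministic when no signal applies.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cpeCandidate is a hypothetical, simplified view of one CPE dictionary row.
type cpeCandidate struct {
	Vendor  string
	Product string
}

// ecosystemBonus is an arbitrary, hypothetical weight for the new signal.
const ecosystemBonus = 1

// score gives a matching product a base score of 1, plus a bonus when the
// vendor contains the target_sw derived from the software's source.
func score(c cpeCandidate, name, targetSW string) int {
	if c.Product != name {
		return 0
	}
	s := 1
	if targetSW != "" && strings.Contains(strings.ToLower(c.Vendor), strings.ToLower(targetSW)) {
		s += ecosystemBonus
	}
	return s
}

// bestVendor ranks a copy of rows by score (highest first), breaking ties by
// vendor name, and returns the top vendor, or "" if nothing matches.
func bestVendor(rows []cpeCandidate, name, targetSW string) string {
	ranked := make([]cpeCandidate, len(rows))
	copy(ranked, rows)
	sort.SliceStable(ranked, func(i, j int) bool {
		si, sj := score(ranked[i], name, targetSW), score(ranked[j], name, targetSW)
		if si != sj {
			return si > sj
		}
		return ranked[i].Vendor < ranked[j].Vendor
	})
	if len(ranked) == 0 || score(ranked[0], name, targetSW) == 0 {
		return ""
	}
	return ranked[0].Vendor
}

func main() {
	rows := []cpeCandidate{{"jenkins", "requests"}, {"python", "requests"}}
	fmt.Println(bestVendor(rows, "requests", "python")) // python
	fmt.Println(bestVendor(rows, "requests", ""))       // jenkins (no signal, tie-break only)
}
```

A plain substring check is the simplest reading of "vendor contains python"; note that a literal check for node.js would not match a vendor string written without the dot, so some normalization of target_sw and vendor may be worth considering.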

🧑‍💻 Steps to reproduce

These steps:

  • Have been confirmed to consistently lead to reproduction in multiple Fleet instances.
  1. Ensure hosts report software from python_packages source (e.g., requests, duplicity, protobuf, Markdown).
  2. Run fleet vuln_processing --dev and query CPE assignments for these packages.
  3. Note the vendor in the CPE (e.g., jenkins:requests vs python:requests).
  4. The assigned vendor may be incorrect because the algorithm cannot distinguish between CPE entries with the same product name when no vendor data is available.

🕯️ More info

Only rpm_packages has vendor data, and it is the distro name (e.g., "Fedora Project"), not the upstream vendor.

Related issue: #39899

Metadata

Labels

  • #g-security-compliance - Security & Compliance product group
  • :release - Ready to write code. Scheduled in a release. See "Making changes" in handbook.
  • bug - Something isn't working as documented
  • ~assisting qa - This issue can be QA'd by anyone outside the QA team when capacity allows
  • ~released bug - This bug was found in a stable release.

Type

No type

Projects

Status

✅ Ready for release

Milestone

Relationships

None yet

Development

No branches or pull requests
