CPE matching sometimes incorrect for software without vendor data

**Fleet versions**
  - *Discovered:* 4.80.2
  - *Reproduced:* 4.80.2

<hr/>

### 💥 Actual behavior

When software has no vendor data (common for `python_packages`, `deb_packages`, `homebrew_packages`, and `npm_packages` sources), the CPE matching algorithm can return different vendors across runs for the same software. This happens because multiple CPE entries in the NVD share the same product name but have different vendors (e.g., `debian:duplicity` vs `duplicity_project:duplicity`), and SQLite does not guarantee row order unless the query specifies an `ORDER BY` clause.

Even with deterministic ordering (`ORDER BY`), the algorithm has no way to prefer the correct vendor when the software record has no vendor field. For example, `requests` from `python_packages` could match either `jenkins:requests` or `python:requests`, and the algorithm cannot distinguish between them.
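
To illustrate why ordering alone is not enough, here is a minimal sketch using an in-memory SQLite table. The `cpe` table and its two columns are hypothetical stand-ins, not Fleet's actual schema, and `first_match` is an illustrative helper. Sorting by vendor makes the result stable, but for `requests` the stable answer is still the non-Python vendor.

```python
import sqlite3

# Hypothetical, simplified stand-in for the CPE lookup table.
# Fleet's real schema and query are not shown in this issue and may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpe (vendor TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO cpe (vendor, product) VALUES (?, ?)",
    [
        ("python", "requests"),
        ("jenkins", "requests"),
        ("duplicity_project", "duplicity"),
        ("debian", "duplicity"),
    ],
)


def first_match(product: str) -> str:
    """Return the first vendor for a product, ordered alphabetically by vendor."""
    row = conn.execute(
        "SELECT vendor FROM cpe WHERE product = ? ORDER BY vendor LIMIT 1",
        (product,),
    ).fetchone()
    return row[0]


# Deterministic across runs, but "jenkins" sorts before "python",
# so a Python package would still be assigned the Jenkins vendor.
print(first_match("requests"))
```

The same query without `ORDER BY` would be free to return either row, which matches the run-to-run variation described above.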

The `source` field (e.g., `python_packages`, `npm_packages`) maps to a `target_sw` value (e.g., `python`, `node.js`) that is already used in the CPE output string, but is not currently used during candidate ranking. This information could serve as an additional scoring signal to prefer CPE entries whose vendor relates to the software's ecosystem.

### 🛠️ To fix

Use the software's `source`-to-`target_sw` mapping as an additional scoring signal when ranking CPE candidates. For example, if `source = "python_packages"` maps to `target_sw = "python"`, score CPE entries higher when their vendor contains "python". This would correctly prefer `python:requests` over `jenkins:requests` for Python packages.
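
A rough sketch of the proposed signal, written in Python for brevity rather than as a patch against Fleet's code. The mapping below contains only the two `source`/`target_sw` pairs named in this issue, and `ecosystem_bonus` and `rank_candidates` are hypothetical names introduced for illustration.

```python
# Hypothetical subset of the source -> target_sw mapping; only the pairs
# mentioned in this issue are included.
SOURCE_TO_TARGET_SW = {
    "python_packages": "python",
    "npm_packages": "node.js",
}


def ecosystem_bonus(source: str, vendor: str) -> int:
    """Return 1 if the CPE vendor contains the ecosystem's target_sw, else 0."""
    target_sw = SOURCE_TO_TARGET_SW.get(source)
    if target_sw and target_sw in vendor.lower():
        return 1
    return 0


def rank_candidates(source: str, candidates: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Order (vendor, product) candidates so ecosystem matches come first.

    sorted() is stable, so candidates with equal bonus keep their input order.
    """
    return sorted(
        candidates,
        key=lambda c: ecosystem_bonus(source, c[0]),
        reverse=True,
    )


candidates = [("jenkins", "requests"), ("python", "requests")]
best = rank_candidates("python_packages", candidates)[0]
print(best)  # ('python', 'requests')
```

In practice this bonus would presumably be combined with whatever scores the existing ranking already computes rather than replacing them. Plain substring matching is also crude: a `target_sw` such as `node.js` may not appear verbatim in any vendor string, so a real implementation might need a per-ecosystem list of vendor aliases.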

### 🧑‍💻 Steps to reproduce

These steps:

- [x] Have been confirmed to consistently lead to reproduction in multiple Fleet instances.

1. Ensure hosts report software from `python_packages` source (e.g., `requests`, `duplicity`, `protobuf`, `Markdown`).
2. Run `fleet vuln_processing --dev` and query CPE assignments for these packages.
3. Note the vendor in the CPE (e.g., `jenkins:requests` vs `python:requests`).
4. The assigned vendor may be incorrect because the algorithm cannot distinguish between CPE entries with the same product name when no vendor data is available.

### 🕯️ More info

Only `rpm_packages` has vendor data, and it is the distro name (e.g., "Fedora Project"), not the upstream vendor.

Related issue: https://github.com/fleetdm/fleet/issues/39899


