CPE matching sometimes incorrect for software without vendor data #41644
Open
Labels
- #g-security-compliance: Security & Compliance product group
- :release: Ready to write code. Scheduled in a release. See "Making changes" in handbook.
- bug: Something isn't working as documented
- ~assisting qa: This issue can be QA'd by anyone outside the QA team when capacity allows
- ~released bug: This bug was found in a stable release.
Status
✅ Ready for release
Fleet versions
💥 Actual behavior

When software has no vendor data (common for the `python_packages`, `deb_packages`, `homebrew_packages`, and `npm_packages` sources), the CPE matching algorithm can return different vendors across runs for the same software. This is because multiple CPE entries in the NVD database share the same product name but have different vendors (e.g., `debian:duplicity` vs `duplicity_project:duplicity`), and SQLite does not guarantee row ordering.

Even with deterministic ordering (`ORDER BY`), the algorithm has no way to prefer the correct vendor when the software record has no vendor field. For example, `requests` from `python_packages` could match `jenkins:requests` or `python:requests`, and the algorithm cannot distinguish between them.

The `source` field (e.g., `python_packages`, `npm_packages`) maps to a `target_sw` value (e.g., `python`, `node.js`) that is already used in the CPE output string, but is not currently used during candidate ranking. This information could serve as an additional scoring signal to prefer CPE entries whose vendor relates to the software's ecosystem.

🛠️ To fix
Use the software's `source`-to-`target_sw` mapping as an additional scoring signal when ranking CPE candidates. For example, if `source = "python_packages"` maps to `target_sw = "python"`, score CPE entries higher when their vendor contains "python". This would correctly prefer `python:requests` over `jenkins:requests` for Python packages.

🧑‍💻 Steps to reproduce
These steps:

1. […] `python_packages` source (e.g., `requests`, `duplicity`, `protobuf`, `Markdown`).
2. […] `fleet vuln_processing --dev` and query CPE assignments for these packages.
3. […] `jenkins:requests` vs `python:requests`).

🕯️ More info
Only `rpm_packages` has vendor data, and it is the distro name (e.g., "Fedora Project"), not the upstream vendor.

Related issue: #39899
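
For discussion, the ranking signal proposed under "To fix" could look roughly like the sketch below. It is illustrative Python, not Fleet's actual code: `CPECandidate`, `ecosystem_bonus`, and `pick_candidate` are hypothetical names, the mapping table contains only the two pairs mentioned in this issue, and the alphabetical tie-break is an assumption added so the result does not depend on row order.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical subset of the source -> target_sw mapping; only the two
# pairs named in this issue are listed here.
SOURCE_TO_TARGET_SW = {
    "python_packages": "python",
    "npm_packages": "node.js",
}


@dataclass(frozen=True)
class CPECandidate:
    """A vendor:product pair taken from a CPE entry (hypothetical type)."""
    vendor: str
    product: str


def ecosystem_bonus(candidate: CPECandidate, source: str) -> int:
    """Return 1 if the candidate's vendor contains the target_sw for this source."""
    target_sw = SOURCE_TO_TARGET_SW.get(source)
    if target_sw and target_sw in candidate.vendor.lower():
        return 1
    return 0


def pick_candidate(candidates: Iterable[CPECandidate], source: str) -> Optional[CPECandidate]:
    """Pick the best candidate for software that has no vendor data.

    Candidates with an ecosystem bonus win. Ties fall back to alphabetical
    (vendor, product) order, so the choice is the same regardless of the
    order in which the rows were returned.
    """
    pool = list(candidates)
    if not pool:
        return None
    return min(pool, key=lambda c: (-ecosystem_bonus(c, source), c.vendor, c.product))


if __name__ == "__main__":
    rows = [CPECandidate("jenkins", "requests"), CPECandidate("python", "requests")]
    print(pick_candidate(rows, "python_packages"))
```

With this signal, `requests` from `python_packages` resolves to `python:requests` whichever row comes back first. For sources with no mapping entry, or when no vendor contains the `target_sw` string, the sketch only makes the choice stable; it does not make it correct, so cases like `debian:duplicity` vs `duplicity_project:duplicity` would still need another signal.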