Fleet version:
4.75.0
💥 Actual behavior
Hitting the /api/latest/fleet/os_versions?order_key=hosts_count&order_direction=desc&page=0&per_page=20 endpoint takes ~1 minute and occasionally times out (5XX server error).
🛠️ To fix
TBD
Product designer: _________________________
🧑💻 Steps to reproduce
- Create a DB with an Ubuntu OS version that has >30 kernels with vulnerabilities (in operating_systems)
- Hit the endpoint
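A minimal sketch for timing the request in the second step. The path and query parameters come from this report; the base URL, the token, and the helper names `build_url` and `time_request` are placeholders for illustration, and the Bearer-token header is an assumption about how the target deployment authenticates API calls.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# Path and query parameters copied from the report above.
OS_VERSIONS_PATH = "/api/latest/fleet/os_versions"
QUERY = {"order_key": "hosts_count", "order_direction": "desc", "page": 0, "per_page": 20}


def build_url(base_url: str) -> str:
    """Join a server base URL (placeholder) with the slow endpoint's path and query."""
    return base_url.rstrip("/") + OS_VERSIONS_PATH + "?" + urllib.parse.urlencode(QUERY)


def time_request(base_url: str, api_token: str, timeout_s: float = 120.0) -> tuple[int, float]:
    """Send one GET and return (HTTP status, elapsed seconds).

    The Authorization header format is an assumption; adjust it to the deployment.
    """
    req = urllib.request.Request(
        build_url(base_url),
        headers={"Authorization": f"Bearer {api_token}"},
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            resp.read()  # include body transfer in the measurement
            status = resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4XX/5XX; report the status instead of failing.
        status = err.code
    return status, time.monotonic() - start
```

Calling `time_request("https://fleet.example.com", "<api token>")` against an affected server should show whether the response time is near the ~1 minute reported here, or whether the request ends in a 5XX.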
🕯️ More info (optional)
We return only 20 OS versions by default. The problem is that for each OS version, we look up every kernel vulnerability for every kernel that the OS version uses in the fleet. In theory, a single OS version may use 1,000 kernel versions.
The customer has Ubuntu 24.04.3 LTS with 33 kernels across their fleet (33 rows in operating_systems).
When fetching kernel vulnerabilities, we run the following query, which takes ~10 seconds (on a standalone DB without traffic) and returns ~192,000 rows (5,800 distinct CVEs):
SELECT
    os.id AS os_id,
    sc.cve,
    sc.resolved_in_version,
    MIN(sc.created_at) AS created_at
FROM software_cve sc
JOIN kernel_host_counts khc ON khc.software_id = sc.software_id
JOIN operating_systems os ON os.os_version_id = khc.os_version_id
WHERE os.id IN (
    ?, ?, ...
)
AND khc.hosts_count > 0
GROUP BY os.id, sc.cve, sc.resolved_in_version;
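The row count is consistent with a join fan-out: if all 33 operating_systems rows share one os_version_id, each os.id is paired with every kernel CVE for that OS version, and 33 × 5,800 = 191,400, close to the ~192,000 rows reported. The sketch below reproduces that effect on toy data with SQLite. The table layouts are assumptions limited to the columns named in the query, and the CVE IDs and helper names are made up.

```python
import sqlite3

# Same shape as the query above; the IN list is filled in at call time.
QUERY = """
SELECT os.id AS os_id, sc.cve, sc.resolved_in_version, MIN(sc.created_at) AS created_at
FROM software_cve sc
JOIN kernel_host_counts khc ON khc.software_id = sc.software_id
JOIN operating_systems os ON os.os_version_id = khc.os_version_id
WHERE os.id IN ({placeholders})
  AND khc.hosts_count > 0
GROUP BY os.id, sc.cve, sc.resolved_in_version
"""


def build_demo_db(os_rows: int, kernels: int, cves_per_kernel: int) -> sqlite3.Connection:
    """Create toy tables (assumed minimal schema) where every OS row shares os_version_id = 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE operating_systems (id INTEGER PRIMARY KEY, os_version_id INTEGER);
        CREATE TABLE kernel_host_counts (software_id INTEGER, os_version_id INTEGER, hosts_count INTEGER);
        CREATE TABLE software_cve (software_id INTEGER, cve TEXT, resolved_in_version TEXT, created_at TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO operating_systems (id, os_version_id) VALUES (?, 1)",
        [(i,) for i in range(1, os_rows + 1)],
    )
    for k in range(1, kernels + 1):
        conn.execute("INSERT INTO kernel_host_counts VALUES (?, 1, 5)", (k,))
        # Made-up CVE IDs, distinct per kernel.
        conn.executemany(
            "INSERT INTO software_cve VALUES (?, ?, NULL, '2025-01-01')",
            [(k, f"CVE-2025-{k}{c:04d}") for c in range(cves_per_kernel)],
        )
    return conn


def kernel_vulns(conn: sqlite3.Connection, os_ids: list[int]) -> list[tuple]:
    """Run the kernel vulnerability query for the given operating_systems IDs."""
    placeholders = ", ".join("?" * len(os_ids))
    return conn.execute(QUERY.format(placeholders=placeholders), list(os_ids)).fetchall()


conn = build_demo_db(os_rows=3, kernels=2, cves_per_kernel=4)
rows = kernel_vulns(conn, [1, 2, 3])
distinct_cves = {cve for _, cve, _, _ in rows}
print(len(rows), len(distinct_cves))  # 24 rows for 8 distinct CVEs: 3 OS rows x 8 CVEs
```

In this toy setup the result set repeats the same 8 CVEs once per OS row, which is the same multiplication the customer numbers suggest.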
For each unique CVE above, we get CVE metadata, in batches of 500 CVEs. This also takes a long time.
SELECT
    cve,
    cvss_score,
    epss_probability,
    cisa_known_exploit,
    published,
    description
FROM cve_meta
WHERE cve IN (?, ?, ?, ...)
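To put a number on the batching: at 500 CVEs per batch, the ~5,800 distinct CVEs above need 12 round trips to cve_meta. The sketch below shows the chunking arithmetic and how the IN list would be built per batch. The helper names and the placeholder CVE IDs are illustrative only, not the actual implementation.

```python
from typing import Iterator, Sequence

BATCH_SIZE = 500  # batch size stated in the report


def batches(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def metadata_query(batch_len: int) -> str:
    """Build the cve_meta lookup with one placeholder per CVE in the batch."""
    placeholders = ", ".join("?" * batch_len)
    return (
        "SELECT cve, cvss_score, epss_probability, cisa_known_exploit, published, description "
        "FROM cve_meta WHERE cve IN (" + placeholders + ")"
    )


# Placeholder IDs standing in for the ~5,800 distinct CVEs from the first query.
cves = [f"CVE-0000-{i:04d}" for i in range(5800)]
chunks = list(batches(cves))
print(len(chunks), len(chunks[-1]))  # 12 batches, the last one holding 300 CVEs
```

Each batch is a separate query, so the total cost grows with the number of distinct CVEs rather than with the 20 OS versions on the page.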
Video demo of the fix
https://www.youtube.com/watch?v=4HZlKG0G1B0
QA
- The migration populates a new table based on existing Linux kernel vulnerability data. It may take a while to run (1+ minutes) for large customer deployments. We need to test the migration on a customer DB (like numa) before release.
- The loadtest should include Ubuntu hosts. The osquery perf updates (on main and the 4.76.0 branch) will automatically add Ubuntu patch versions and kernels to more closely represent real environments.
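For testers, a rough illustration of what populating a table from existing kernel vulnerability data can look like. This is a sketch under assumptions, not the actual migration: the summary table name `kernel_cve_by_os_version`, its columns, and the toy data are all hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE kernel_host_counts (software_id INTEGER, os_version_id INTEGER, hosts_count INTEGER);
    CREATE TABLE software_cve (software_id INTEGER, cve TEXT, resolved_in_version TEXT, created_at TEXT);
    -- Hypothetical summary table; the real table name and columns are not given in this report.
    CREATE TABLE kernel_cve_by_os_version (
        os_version_id INTEGER, cve TEXT, resolved_in_version TEXT, created_at TEXT
    );
    CREATE INDEX idx_kernel_cve_os_version ON kernel_cve_by_os_version (os_version_id);
    """
)

# Toy data: kernels 1 and 2 are in use, kernel 3 has no hosts and should be skipped.
conn.executemany(
    "INSERT INTO kernel_host_counts VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 7), (3, 1, 0)],
)
conn.executemany(
    "INSERT INTO software_cve VALUES (?, ?, ?, ?)",
    [
        (1, "CVE-A", None, "2025-01-02"),
        (1, "CVE-B", None, "2025-01-03"),
        (2, "CVE-A", None, "2025-01-01"),  # same CVE on another kernel, seen earlier
        (2, "CVE-C", None, "2025-01-04"),
        (3, "CVE-D", None, "2025-01-05"),
    ],
)

# One-time population step: aggregate once per OS version instead of on every request.
conn.execute(
    """
    INSERT INTO kernel_cve_by_os_version (os_version_id, cve, resolved_in_version, created_at)
    SELECT khc.os_version_id, sc.cve, sc.resolved_in_version, MIN(sc.created_at)
    FROM software_cve sc
    JOIN kernel_host_counts khc ON khc.software_id = sc.software_id
    WHERE khc.hosts_count > 0
    GROUP BY khc.os_version_id, sc.cve, sc.resolved_in_version
    """
)

# The request path then becomes a plain indexed lookup.
summary = conn.execute(
    "SELECT cve, created_at FROM kernel_cve_by_os_version WHERE os_version_id = ? ORDER BY cve",
    (1,),
).fetchall()
print(summary)
```

The population step scans all existing kernel CVE rows once, which is why a migration of this kind can take a minute or more on a large deployment. How the real table is kept up to date afterwards is outside the scope of this sketch.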