Reproduced: 4.86.0 (code paths unchanged since 4.84.0)
Web browser and operating system: N/A (server-side / API)
💥 Actual behavior
Requests of the form:
GET /api/v1/fleet/hosts?device_mapping=true&page=1&per_page=100&query=john.doe%40example.com
cause high load on the MySQL reader on instances with ~10k hosts. Each page load runs an expensive aggregation over the entire host_emails table, even though only 100 rows are returned.
Root cause — in server/datastore/mysql/hosts.go, when device_mapping=true, applyHostFilters adds a LEFT JOIN on a derived table with GROUP BY (hosts.go:1289):
LEFT JOIN (
SELECT host_id,
CONCAT('[', GROUP_CONCAT(JSON_OBJECT('email', email, 'source', ...)), ']') AS device_mapping
FROM host_emails
GROUP BY host_id
) dm ON dm.host_id = h.id
Because this derived table has a GROUP BY, MySQL must materialize it in full — aggregating every row of host_emails for all hosts into a temp table — before the join and the outer LIMIT 100 can be applied. The outer WHERE/LIMIT cannot be pushed into it. The full cost is paid on every page request regardless of result size.
Two factors compound it:
The count query pays the same cost. CountHosts (hosts.go:2048) reuses the same options with DeviceMapping=true, so the derived join is added to the SELECT count(*) statement too, even though device_mapping is never selected for counting. With per_page set (pagination metadata included), each page load runs both the list query and the count query, materializing the full aggregation twice.
The search term triggers a second pass over host_emails. hostSearchLike (server/datastore/mysql/mysql.go:1166) appends, for any non-empty query:
OR EXISTS (SELECT 1 FROM host_emails he WHERE he.host_id = h.id AND he.email LIKE '%john.doe@example.com%')
The leading % wildcard prevents use of idx_host_emails_email, falling back to a per-host lookup on host_id. (Broadened from "only when the term looks like an email" to "always" in PR #40197, "Improve host search by end user email address", shipped in 4.84.0.)
The materialized GROUP_CONCAT derived table is the dominant load source.
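To make the scaling difference concrete, here is a minimal in-memory sketch in Go that counts how many host_emails rows each strategy reads. It is a model only, not Fleet code: the type and function names are hypothetical, the map stands in for an index on host_id, and the figure of 2 emails per host is an assumption chosen for illustration.

```go
package main

import "fmt"

// hostEmail is a simplified, hypothetical stand-in for one row of host_emails.
type hostEmail struct {
	HostID uint
	Email  string
}

// sampleEmails builds perHost rows for each host; host IDs start at 1.
func sampleEmails(hosts, perHost int) []hostEmail {
	rows := make([]hostEmail, 0, hosts*perHost)
	for id := 1; id <= hosts; id++ {
		for n := 0; n < perHost; n++ {
			rows = append(rows, hostEmail{
				HostID: uint(id),
				Email:  fmt.Sprintf("user%d-%d@example.com", id, n),
			})
		}
	}
	return rows
}

// firstPage returns host IDs 1..size, standing in for the outer LIMIT.
func firstPage(size int) []uint {
	ids := make([]uint, size)
	for i := range ids {
		ids[i] = uint(i + 1)
	}
	return ids
}

// buildIndex groups rows by host ID, standing in for an index on host_id.
func buildIndex(rows []hostEmail) map[uint][]string {
	idx := make(map[uint][]string)
	for _, r := range rows {
		idx[r.HostID] = append(idx[r.HostID], r.Email)
	}
	return idx
}

// derivedTableJoin models the GROUP BY derived table: every row is
// aggregated first, and only then are the page's hosts picked out.
// It returns the page's mapping and the number of rows read.
func derivedTableJoin(rows []hostEmail, page []uint) (map[uint][]string, int) {
	all := make(map[uint][]string)
	read := 0
	for _, r := range rows { // cost depends on table size, not page size
		all[r.HostID] = append(all[r.HostID], r.Email)
		read++
	}
	out := make(map[uint][]string, len(page))
	for _, id := range page {
		out[id] = all[id]
	}
	return out, read
}

// correlatedLookup models a per-row lookup: only the rows belonging to
// hosts on the page are read, through the index.
func correlatedLookup(idx map[uint][]string, page []uint) (map[uint][]string, int) {
	out := make(map[uint][]string, len(page))
	read := 0
	for _, id := range page {
		out[id] = idx[id]
		read += len(idx[id])
	}
	return out, read
}

func main() {
	rows := sampleEmails(10000, 2) // assumption: 10k hosts, 2 emails each
	page := firstPage(100)

	_, derived := derivedTableJoin(rows, page)
	_, correlated := correlatedLookup(buildIndex(rows), page)

	// The list query and the count query each pay the derived-table cost.
	fmt.Println("derived table, list + count:", 2*derived)
	fmt.Println("correlated lookup, list only:", correlated)
}
```

Under these assumptions the derived-table model reads 40,000 rows per page load (20,000 for the list query and 20,000 again for the count query), while the per-row lookup reads 200. The real ratio depends on the actual number of emails per host and on the plan MySQL chooses.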
🛠️ Expected behavior
Listing hosts with device_mapping=true and a search query should scale with the size of the returned page (~100 rows), not with the total number of hosts/emails in the instance. It should not place disproportionate load on the MySQL reader.
🧑💻 Steps to reproduce
These steps:
Have been confirmed to consistently lead to reproduction in multiple Fleet instances.
Describe the workflow that led to the error, but have not yet been reproduced in multiple Fleet instances.
Have a Fleet instance with ~10k hosts and populated host_emails (e.g. IdP / end-user email mapping).
Call GET /api/v1/fleet/hosts?device_mapping=true&page=1&per_page=100&query=<some-email> (e.g. from the Hosts page with a search term while device mapping is requested).
Observe elevated MySQL reader load / slow query. EXPLAIN shows materialization of the host_emails GROUP BY derived table.
🕯️ More info (optional)
Proposed fix (two independent, low-risk changes):
Skip the device_mapping join in the count query — set opt.DeviceMapping = false in CountHosts (the column is never counted), mirroring the existing DisableIssues handling.
Replace the derived-table join with a SELECT-list correlated subquery (only when opt.DeviceMapping), so it is evaluated only for the ~100 rows actually returned, each as an indexed lookup on idx_host_emails_host_id_email:
COALESCE((SELECT CONCAT('[', GROUP_CONCAT(JSON_OBJECT('email', email, 'source', ...)), ']')
FROM host_emails he WHERE he.host_id = h.id), 'null') AS device_mapping
This matches the existing pattern already used for host_additional in the same query (hosts.go:1164).
Validate with EXPLAIN ANALYZE on a ~10k-host dataset before/after.
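The two changes above could look roughly like the following Go sketch. It is not the actual code in hosts.go: listOptions, selectColumns, buildListSQL, buildCountSQL, and the h.hostname column are hypothetical names used for illustration, and the "..." inside JSON_OBJECT is kept elided exactly as in this issue, so the generated string is not valid SQL as-is.

```go
package main

import (
	"fmt"
	"strings"
)

// listOptions is a hypothetical, trimmed-down stand-in for the host list options.
type listOptions struct {
	DeviceMapping bool
}

// deviceMappingColumn is the SELECT-list correlated subquery from the proposal.
// The "..." is elided in the issue and left untouched here.
const deviceMappingColumn = `COALESCE((SELECT CONCAT('[', GROUP_CONCAT(JSON_OBJECT('email', email, 'source', ...)), ']')
    FROM host_emails he WHERE he.host_id = h.id), 'null') AS device_mapping`

// selectColumns returns the SELECT list shared by the list and count paths.
// The device mapping is a column, not a join, so it only runs for returned rows.
func selectColumns(opt listOptions) []string {
	cols := []string{"h.id", "h.hostname"} // illustrative columns only
	if opt.DeviceMapping {
		cols = append(cols, deviceMappingColumn)
	}
	return cols
}

// buildListSQL builds the paginated list query.
func buildListSQL(opt listOptions) string {
	return "SELECT " + strings.Join(selectColumns(opt), ",\n  ") +
		"\nFROM hosts h\nLIMIT 100"
}

// buildCountSQL builds the count query. It clears DeviceMapping first, since
// the column is never counted. opt is passed by value, so the caller's
// options are left unchanged.
func buildCountSQL(opt listOptions) string {
	opt.DeviceMapping = false
	return "SELECT count(*) FROM (SELECT " +
		strings.Join(selectColumns(opt), ", ") + " FROM hosts h) AS hc"
}

// mentionsHostEmails reports whether a query touches host_emails at all.
func mentionsHostEmails(sql string) bool {
	return strings.Contains(sql, "host_emails")
}

func main() {
	opt := listOptions{DeviceMapping: true}
	fmt.Println(buildListSQL(opt))
	fmt.Println()
	fmt.Println(buildCountSQL(opt))
}
```

With DeviceMapping set, only the list query references host_emails; the count query never does. Whether MySQL actually evaluates the subquery after the LIMIT still needs to be confirmed with EXPLAIN ANALYZE, as noted above.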