Fleet versions
Web browser and operating system:
💥 Actual behavior
Orbit clients that repeatedly call GET /api/latest/fleet/device/{token}/desktop with expired tokens cause high database load, because an expensive SQL query runs before rate limiting is applied. The DB load monitor shows Average Active Sessions (AAS) climbing from ~1 to 7+ and staying there during the affected period, driven primarily by wait/io/table/sql/handler.
Clients receive 401 errors while the Fleet server sees high resource consumption. Every request triggers the following query in LoadHostByDeviceAuthToken, a large JOIN across multiple tables:
SELECT
h.id,
h.osquery_host_id,
h.created_at,
h.updated_at,
h.detail_updated_at,
h.node_key,
h.hostname,
h.uuid,
h.platform,
h.osquery_version,
h.os_version,
h.build,
h.platform_like,
h.code_name,
h.uptime,
h.memory,
h.cpu_type,
h.cpu_subtype,
h.cpu_brand,
h.cpu_physical_cores,
h.cpu_logical_cores,
h.hardware_vendor,
h.hardware_model,
h.hardware_version,
h.hardware_serial,
h.computer_name,
h.primary_ip_id,
h.distributed_interval,
h.logger_tls_period,
h.config_tls_refresh,
h.primary_ip,
h.primary_mac,
h.label_updated_at,
h.last_enrolled_at,
h.refetch_requested,
h.refetch_critical_queries_until,
h.team_id,
h.policy_updated_at,
h.public_ip,
COALESCE(hd.gigs_disk_space_available, 0) as gigs_disk_space_available,
COALESCE(hd.percent_disk_space_available, 0) as percent_disk_space_available,
COALESCE(hd.gigs_total_disk_space, 0) as gigs_total_disk_space,
hd.encrypted as disk_encryption_enabled,
IF(hdep.host_id AND ISNULL(hdep.deleted_at), true, false) AS dep_assigned_to_fleet,
EXISTS(SELECT 1 FROM host_identity_scep_certificates hisc WHERE hisc.host_id = h.id AND hisc.revoked = 0) as has_host_identity_cert
FROM
host_device_auth hda
INNER JOIN
hosts h ON hda.host_id = h.id
LEFT OUTER JOIN
host_disks hd ON hd.host_id = hda.host_id
LEFT OUTER JOIN
host_mdm hm ON hm.host_id = h.id
LEFT OUTER JOIN
host_dep_assignments hdep ON hdep.host_id = h.id AND hdep.deleted_at IS NULL
WHERE
(hda.token = ? OR hda.previous_token = ?) AND
hda.updated_at >= DATE_SUB(NOW(), INTERVAL ? SECOND)
🛠️ Expected behavior
When Orbit clients send requests with expired device tokens, the server should:
- Fast-fail the authentication without executing expensive multi-table JOIN queries when the token has already expired
- Apply rate limiting BEFORE running the authentication query, so that burst traffic from invalid/expired tokens is throttled early
- Return a lightweight 401/404 response with minimal database impact
🧑💻 Steps to reproduce
These steps:
- Deploy a Fleet instance with many managed hosts (Fleet Desktop enabled)
- Configure Orbit/Fleet Desktop on endpoints (the token rotates every 1 hour, per server/service/devices.go:263)
- Simulate Orbit clients repeatedly calling GET /api/latest/fleet/device/{token}/desktop with expired tokens (e.g., via the desktop-rate-limit tool)
- Observe high DB CPU/IO and increased Average Active Sessions
- Note that 401 errors are returned, but only after the expensive queries have already executed
🕯️ More info (optional)
N/A