
Reduce MySQL reader load on GET /hosts with device_mapping + search query (#47722) #48488

Merged
lucasmrod merged 3 commits into main from lucas/47722-host-device-mapping-reader-load on Jul 1, 2026

Conversation

@lucasmrod lucasmrod commented Jun 30, 2026

Member

Related issue: Resolves #47722

The issue came from a customer running GET /api/v1/fleet/hosts?device_mapping=true&page=1&per_page=100&query=<ADDRESS>%40example.com from a script in a for loop. This change reduces the load that such workflows place on the MySQL reader.
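For reference, the request shape described above can be sketched in Go. This is only an illustration of how such a script might build the URL: the base URL, the sample addresses, and the helper name hostsSearchURL are assumptions, while the path and query parameters are the ones quoted above.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostsSearchURL builds the list-hosts URL for one search term.
// The helper name and baseURL are hypothetical; the path and the
// parameter names come from the request quoted in the PR description.
func hostsSearchURL(baseURL, email string) string {
	q := url.Values{}
	q.Set("device_mapping", "true")
	q.Set("page", "1")
	q.Set("per_page", "100")
	q.Set("query", email) // Encode escapes "@" as %40
	return baseURL + "/api/v1/fleet/hosts?" + q.Encode()
}

func main() {
	// A loop like this issues one list query (plus any count query) per address.
	for _, email := range []string{"alice@example.com", "bob@example.com"} {
		fmt.Println(hostsSearchURL("https://fleet.example.com", email))
	}
}
```

Each iteration maps to one GET /hosts call, so any fixed per-request cost on the database side is multiplied by the number of addresses in the loop.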

Results from my local load test:

EXPLAIN ANALYZE:

┌───────────────────────────────────┬────────────────┬─────────────┬─────────────────────────────────────────────┐
│                                   │ optimizer cost │ actual time │         device_mapping aggregation          │
├───────────────────────────────────┼────────────────┼─────────────┼─────────────────────────────────────────────┤
│ Old (derived-table GROUP BY join) │ ~23,179        │ ~73 ms      │ materialized dm derived table, cost ~7,125  │
├───────────────────────────────────┼────────────────┼─────────────┼─────────────────────────────────────────────┤
│ New (correlated subquery)         │ ~1,260         │ ~25 ms      │ Aggregate … loops=1 (only the returned row) │
└───────────────────────────────────┴────────────────┴─────────────┴─────────────────────────────────────────────┘

Tests with 10k hosts:

┌───────────────────────────────────┬────────────┬───────────────┬───────┐
│              dataset              │ OLD (main) │ NEW (this PR) │ ratio │
├───────────────────────────────────┼────────────┼───────────────┼───────┤
│ 10k hosts × 3 emails (30k rows)   │ 4.6s       │ 1.1s          │ ~4×   │
├───────────────────────────────────┼────────────┼───────────────┼───────┤
│ 10k hosts × 30 emails (300k rows) │ 35.9s      │ 1.2s          │ ~30×  │
└───────────────────────────────────┴────────────┴───────────────┴───────┘
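The ratios can be re-derived from the figures in the two tables above. The sketch below only divides the quoted numbers; the helper name speedup is illustrative.

```go
package main

import "fmt"

// speedup reports how many times smaller the new value is than the old one.
func speedup(oldVal, newVal float64) float64 {
	return oldVal / newVal
}

func main() {
	// EXPLAIN ANALYZE table: optimizer cost and actual time.
	fmt.Printf("optimizer cost: %.1fx\n", speedup(23179, 1260))
	fmt.Printf("actual time:    %.1fx\n", speedup(73, 25))

	// 10k-host load test: seconds on main vs. seconds on this PR.
	fmt.Printf("30k rows:       %.1fx\n", speedup(4.6, 1.1))
	fmt.Printf("300k rows:      %.1fx\n", speedup(35.9, 1.2))
}
```

The new timings stay nearly flat (1.1s vs 1.2s) while the old ones grow with the number of host_emails rows, which is why the ratio rises from about 4× to about 30×.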

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.

  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements), JS inline code is prevented especially for url redirects, and untrusted data interpolated into shell scripts/commands is validated against shell metacharacters.

Testing

What & why

GET /api/v1/fleet/hosts?device_mapping=true&page=1&per_page=100&query=<email> caused high MySQL reader load on instances with ~10k hosts. Each page load ran an expensive aggregation over the entire host_emails table even though only ~100 rows are returned.

Root cause: with device_mapping=true, applyHostFilters added a LEFT JOIN on a derived table with GROUP BY host_id over host_emails. Because of the GROUP BY, MySQL must fully materialize that derived table (aggregating every row for all hosts) before the outer WHERE/LIMIT 100 can be applied, so the full cost is paid on every page request regardless of result size. CountHosts reused the same options, materializing the aggregation a second time per page load.

Fixes (both in server/datastore/mysql/hosts.go):

  1. Replaced the derived-table join with a correlated subquery in the SELECT list (only when opt.DeviceMapping), so it is evaluated only for the rows actually returned, each as an indexed lookup on idx_host_emails_host_id_email. This matches the existing host_additional pattern in the same query.
  2. Set opt.DeviceMapping = false in CountHosts — the column is never selected for counting — mirroring the existing opt.DisableIssues handling.
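The first fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in hosts.go: the listOptions type, the buildListHostsSQL helper, the selected columns, the hostname filter, and the JSON_ARRAYAGG/JSON_OBJECT projection are all hypothetical. Only the host_emails table, its host_id and email columns, and the idea of a correlated subquery in the SELECT list come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// listOptions is a hypothetical stand-in for the real host list options.
type listOptions struct {
	DeviceMapping bool
	PerPage       int
}

// deviceMappingSubquery is an illustrative correlated subquery. Because it
// sits in the SELECT list and filters on he.host_id = h.id, MySQL evaluates
// it once per returned row instead of aggregating all of host_emails.
const deviceMappingSubquery = `(
    SELECT JSON_ARRAYAGG(JSON_OBJECT('email', he.email, 'source', he.source))
    FROM host_emails he
    WHERE he.host_id = h.id
  ) AS device_mapping`

// buildListHostsSQL assembles a simplified list query (assumed shape).
func buildListHostsSQL(opt listOptions) string {
	cols := []string{"h.id", "h.hostname"}
	if opt.DeviceMapping {
		// Added only on request, so other callers pay nothing for it.
		cols = append(cols, deviceMappingSubquery)
	}
	return fmt.Sprintf(
		"SELECT %s FROM hosts h WHERE h.hostname LIKE ? ORDER BY h.id LIMIT %d",
		strings.Join(cols, ", "), opt.PerPage,
	)
}

func main() {
	fmt.Println(buildListHostsSQL(listOptions{DeviceMapping: true, PerPage: 100}))
	fmt.Println(buildListHostsSQL(listOptions{PerPage: 100}))
}
```

With this shape the outer WHERE and LIMIT decide which hosts are returned first, and the aggregation runs only for those rows.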

Notes

  • The composite index idx_host_emails_host_id_email (host_id, email) already exists, so the correlated subquery resolves via an indexed lookup per returned row.
  • TestHosts (full suite) passes, including HostDeviceMapping, CustomHostDeviceMapping, and IDPHostDeviceMapping (the last two verify the custom_* → custom and idp → mdm_idp_accounts source translation still works through the new subquery).
  • Recommend validating with EXPLAIN ANALYZE on a ~10k-host dataset before/after, per the issue. I did not have access to such a dataset.

Summary by CodeRabbit

  • Performance
    • Improved host list responsiveness when using search filters alongside device mapping.
    • Reduced database load during host listing by retrieving device mapping more efficiently per host.
    • Improved host counting speed by avoiding device-mapping evaluation for count queries.

…47722)

When listing hosts with device_mapping=true, the device mapping was
computed via a LEFT JOIN on a derived table with GROUP BY over
host_emails. Because of the GROUP BY, MySQL materialized the full
aggregation across all hosts before applying the outer LIMIT, paying
the full cost on every page load. The count query reused the same
options and paid the same cost a second time per page.

- Compute device_mapping as a correlated subquery in the SELECT list so
  it is evaluated only for the rows actually returned, each as an
  indexed lookup on idx_host_emails_host_id_email. Matches the existing
  host_additional pattern.
- Set opt.DeviceMapping = false in CountHosts since device_mapping is
  never selected when counting, mirroring opt.DisableIssues.
Copilot AI review requested due to automatic review settings June 30, 2026 14:46
@lucasmrod lucasmrod requested a review from a team as a code owner June 30, 2026 14:46


Copilot AI left a comment

Contributor

Warning

  • Copilot's review of this pull request may be incomplete because some of the changed files are excluded by your Copilot content exclusion settings. See Excluding content from Copilot for details.

Pull request overview

This PR reduces MySQL reader load for GET /api/v1/fleet/hosts when device_mapping=true (especially with pagination + search) by avoiding an expensive host_emails aggregation that previously had to be fully materialized regardless of the outer LIMIT.

Changes:

  • Replaces the host_emails derived-table LEFT JOIN ... GROUP BY for device mapping with a correlated subquery in ListHosts, so aggregation is evaluated only for returned rows.
  • Removes the now-unneeded device-mapping join from applyHostFilters.
  • Ensures CountHosts doesn’t carry DeviceMapping options (though device_mapping is no longer part of the count query either way).

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

File | Description
server/datastore/mysql/hosts.go | Switches device_mapping to a correlated subquery, removes the derived join, and adjusts CountHosts options to avoid unnecessary work.
changes/47722-host-device-mapping-reader-load | User-visible change note (diff content excluded by policy; not reviewed).
Files excluded by content exclusion policy (1)
  • changes/47722-host-device-mapping-reader-load


Comment on lines +2098 to +2099
// device_mapping is never selected when counting, so skip its (expensive) subquery.
opt.DeviceMapping = false
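The two lines above rely on the options being a local copy. A minimal sketch of that pattern, assuming the options struct is passed by value; the hostListOptions type and the optionsForCount helper are hypothetical, and the DisableIssues assignment is an assumption about the existing handling the PR says it mirrors:

```go
package main

import "fmt"

// hostListOptions is a hypothetical, trimmed-down options struct.
type hostListOptions struct {
	DeviceMapping bool
	DisableIssues bool
}

// optionsForCount returns the options a count query would use.
// opt is received by value, so the caller's copy is left untouched.
func optionsForCount(opt hostListOptions) hostListOptions {
	opt.DisableIssues = true  // assumed existing handling for counts
	opt.DeviceMapping = false // device_mapping is never selected when counting
	return opt
}

func main() {
	listOpt := hostListOptions{DeviceMapping: true}
	countOpt := optionsForCount(listOpt)
	fmt.Printf("list:  %+v\n", listOpt)
	fmt.Printf("count: %+v\n", countOpt)
}
```

Because only the copy is modified, a list query built from the caller's options still includes device mapping.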
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Contributor

Walkthrough

server/datastore/mysql/hosts.go now computes device_mapping in ListHosts with a correlated subquery over host_emails instead of a derived-table join. applyHostFilters no longer adds the device-mapping join fragment. CountHosts now disables opt.DeviceMapping before building the count query. A changelog entry was added for the change.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly describes the main change: reducing MySQL reader load for host listing with device_mapping and search.
Linked Issues check | ✅ Passed | The code implements the requested fix by using a correlated subquery for device_mapping and disabling it in CountHosts.
Out of Scope Changes check | ✅ Passed | The changes stay focused on the linked performance fix and a changelog entry, with no obvious unrelated additions.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check | ✅ Passed | The PR description includes the related issue, checklist items, testing, and a clear what/why summary that matches the template well.

@codecov

codecov Bot commented Jun 30, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.91%. Comparing base (8a163e6) to head (e990d10).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #48488   +/-   ##
=======================================
  Coverage   67.90%   67.91%           
=======================================
  Files        3678     3677    -1     
  Lines      233675   233633   -42     
  Branches    12412    12426   +14     
=======================================
- Hits       158687   158661   -26     
+ Misses      60725    60712   -13     
+ Partials    14263    14260    -3     
Flag Coverage Δ
backend 69.54% <100.00%> (+<0.01%) ⬆️


@nulmete nulmete left a comment

Member

LGTM 👍
Could you include a before/after comparison if possible? Would feel more confident approving seeing that

@lucasmrod

Member Author

LGTM 👍 Could you include a before/after comparison if possible? Would feel more confident approving seeing that

Definitely. My bad for not including it in the PR description. Please take a look.

@lucasmrod lucasmrod merged commit bec3b0d into main Jul 1, 2026
46 of 49 checks passed
@lucasmrod lucasmrod deleted the lucas/47722-host-device-mapping-reader-load branch July 1, 2026 13:28

Development

Successfully merging this pull request may close these issues.

High MySQL reader load on GET /hosts with device_mapping=true + search query

3 participants