Cherry-pick #45709: Implement roaring bitmaps for historical data collection #45826
Merged
**Related issue:** Resolves #45715

# Details

This PR refactors the way the charts module stores historical data to use the [roaring bitmap](https://github.com/RoaringBitmap/roaring) package instead of saving raw bitmaps. See [this blurb](https://github.com/RoaringBitmap/roaring#how-does-roaring-compares-with-the-alternatives) to learn how roaring compresses data, but TL;DR: for our purposes it is a huge improvement, especially for larger deployments where host ID numbers may be very large. In testing, some data was reduced by 96%.

The majority of the changes in this PR are straight swaps of types from `[]byte` to `*roaring.Bitmap` in vars and function signatures, plus updates to the internals of our bit math helpers to use roaring methods instead of native AND and OR operations. I've tried to comment on all functional changes.

Because the charts have already shipped, there will be data in the wild in the prior "dense" format. The code therefore still handles dense bitmaps on _read_, but will always _write_ roaring bitmaps. The majority of the data will have turned over within 30 days on its own, but I plan on a follow-up PR that will transform open rows when the cron runs, so that we should be guaranteed to turn over completely within 30 days.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information.
- [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements), JS inline code is prevented especially for url redirects, and untrusted data interpolated into shell scripts/commands is validated against shell metacharacters.

## Testing

- [X] Added/updated automated tests
  - Tests updated to accommodate the new format, and existing unchanged tests act as proof against regression
- [X] QA'd all new/changed functionality manually
  - Using a tool that dumps the `host_scd_data` rows data into a JSON file (with the keys being entity_id+data and the values being host IDs on that date), compared the data from the main branch and this branch and confirmed they're identical
  - With a host count of ~9000, some of which have IDs of over 1,000,000, the data storage requirements were:
    * 82,558,976 bytes for dense
    * 2,867,200 bytes for roaring (a 96% decrease)

For unreleased bug fixes in a release candidate, one of:

- [X] Confirmed that the fix is not expected to adversely impact load test results - should hugely improve
- [X] Alerted the release DRI if additional load testing is needed

## Database migrations

- [X] Checked schema for all modified tables for columns that will auto-update timestamps during migration.

## Summary by CodeRabbit

* **New Features**
  * Implemented roaring bitmaps in historical data collection to optimize bitmap handling for chart data aggregation
  * Added encoding support to bitmap storage schema for flexible data representation
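To illustrate why the dense format is costly when host IDs are large, here is a minimal stdlib-only sketch. It does not use the roaring package and is not Fleet's actual code: the helper names (`setBit`, `denseToIDs`, `percentDecrease`) are hypothetical, and the least-significant-bit-first layout is an assumption about how a raw bitmap might be laid out. It shows that a dense bitmap's size tracks the largest ID rather than the number of hosts, how a dense row could be decoded into host IDs on read, and the arithmetic behind the reported reduction.

```go
package main

import "fmt"

// setBit marks id in a dense bitmap, growing the slice so that the bit is
// addressable. Bit order within a byte (LSB first) is an assumption made for
// this sketch, not a statement about Fleet's stored format.
func setBit(bm []byte, id uint32) []byte {
	idx := int(id / 8)
	if idx >= len(bm) {
		grown := make([]byte, idx+1)
		copy(grown, bm)
		bm = grown
	}
	bm[idx] |= 1 << (id % 8)
	return bm
}

// denseToIDs decodes a dense bitmap into the sorted list of IDs whose bits
// are set. A read path that still accepts dense rows could feed these IDs
// into whatever compressed structure it uses going forward.
func denseToIDs(bm []byte) []uint32 {
	var ids []uint32
	for i, b := range bm {
		for bit := uint(0); bit < 8; bit++ {
			if b&(1<<bit) != 0 {
				ids = append(ids, uint32(i*8)+uint32(bit))
			}
		}
	}
	return ids
}

// percentDecrease returns how much smaller after is than before, in percent.
func percentDecrease(before, after int64) float64 {
	return 100 * float64(before-after) / float64(before)
}

func main() {
	// Three hosts, one with a very large ID: the dense bitmap must still
	// span every bit up to that ID.
	var bm []byte
	for _, id := range []uint32{3, 42, 1_000_000} {
		bm = setBit(bm, id)
	}
	fmt.Println("dense bytes for 3 hosts:", len(bm))
	fmt.Println("decoded IDs:", denseToIDs(bm))

	// Storage figures quoted in the QA notes of this PR.
	fmt.Printf("reduction: %.1f%%\n", percentDecrease(82_558_976, 2_867_200))
}
```

With only three hosts set, the dense bitmap still occupies roughly 125 KB because of the single large ID. That is the overhead a compressed representation avoids.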
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## rc-minor-fleet-v4.86.0 #45826 +/- ##
=========================================================
Coverage ? 66.78%
=========================================================
Files ? 2748
Lines ? 219689
Branches ? 10848
=========================================================
Hits ? 146716
Misses ? 59715
Partials ? 13258
ksykulev
approved these changes
May 19, 2026
Cherry-pick of #45709 into the RC branch.