
osquery RocksDB corrupted, causing host to silently transfer to undesired fleet #43294

@ddribeiro

Description

Fleet and Orbit versions:

  • Discovered: 4.83.0, Orbit 1.53.1
  • Reproduced:

Web browser and operating system: macOS 26.4


💥  Actual behavior

A host enrolled in the customer-numa production Fleet server experienced osquery RocksDB corruption. As a result, the host re-enrolled in Fleet using the enroll secret it was originally enrolled with.

In practice, this moved the host out of its current fleet and into the fleet tied to that original enroll secret. The transfer triggered an unintended removal of profiles.

🛠️ To fix

  1. Find and address the cause of the RocksDB corruption so that it does not happen in the first place.
  2. When RocksDB corruption does occur, the host's last fleet assignment in Fleet should be sticky, so that unintended (and unlogged) fleet transfers do not happen.
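A minimal sketch of the "sticky" rule in item 2, assuming the server can match a re-enrolling host to its existing record (for example by hardware UUID). `Host`, `resolve_fleet_on_reenroll`, and the audit message are hypothetical names for illustration only, not Fleet's actual API or data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical, simplified host record. fleet_id None means Unassigned."""
    hardware_uuid: str
    fleet_id: Optional[int]


def resolve_fleet_on_reenroll(
    existing: Optional[Host],
    secret_fleet_id: Optional[int],
    audit_log: List[str],
) -> Optional[int]:
    """Choose the fleet for a host that enrolls with an enroll secret.

    Hypothetical rule: a host with no existing record gets the enroll secret's
    fleet. A host that already exists keeps its last fleet assignment, even when
    the secret points at a different fleet (e.g. after osquery.db was lost).
    """
    if existing is None:
        return secret_fleet_id
    if existing.fleet_id != secret_fleet_id:
        # Record the ignored mismatch so the decision is never silent.
        audit_log.append(
            f"host {existing.hardware_uuid} re-enrolled with the secret for fleet "
            f"{secret_fleet_id}; kept current fleet {existing.fleet_id}"
        )
    return existing.fleet_id


if __name__ == "__main__":
    log: List[str] = []
    # The host was transferred to fleet 7 but re-enrolls with Unassigned's secret.
    host = Host(hardware_uuid="TEST-MAC", fleet_id=7)
    print(resolve_fleet_on_reenroll(host, None, log))  # 7
    print(len(log))  # 1
```

Matching on hardware UUID is itself an assumption here; the point is only that an existing host record, not the enroll secret, decides the fleet, and that any mismatch is recorded instead of applied silently.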

🧑‍💻  Steps to reproduce

These steps:

  • Have been confirmed to consistently lead to reproduction in multiple Fleet instances.
  • Describe the workflow that led to the error, but have not yet been reproduced in multiple Fleet instances.

We have not been able to reproduce the RocksDB corruption that triggered this in the first place, but re-enrollment into the original fleet (rather than the host's current fleet) can be reproduced as follows:

  1. Enroll a test Mac into Unassigned by using Unassigned's enroll secret when generating fleet-osquery.pkg.
  2. Transfer the test Mac to any other fleet on your Fleet server.
  3. Delete the osquery database to simulate RocksDB corruption or failure (sudo rm -rf /opt/orbit/osquery.db).
  4. Restart Orbit to trigger re-enrollment (sudo launchctl kickstart -k system/com.fleetdm.orbit).
  5. Observe that the host re-enrolled into Unassigned, with no activities or logs indicating that a transfer took place.
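The steps above can be modeled with a small in-memory simulation. This is a simplified model of the behavior observed in step 5, not Fleet code: it assumes that deleting `osquery.db` makes osquery enroll again with the secret baked into the package, and that the server then assigns the secret's fleet without recording an activity. `FakeFleetServer`, its methods, and fleet id 7 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeFleetServer:
    """Hypothetical model of the server side, not Fleet's implementation."""
    # enroll secret -> fleet id (None stands for Unassigned)
    secrets: Dict[str, Optional[int]]
    # hardware UUID -> current fleet id
    hosts: Dict[str, Optional[int]] = field(default_factory=dict)
    activities: List[str] = field(default_factory=list)

    def enroll(self, hardware_uuid: str, secret: str) -> None:
        # Models the observed behavior: the secret's fleet wins, no activity logged.
        self.hosts[hardware_uuid] = self.secrets[secret]

    def transfer(self, hardware_uuid: str, fleet_id: Optional[int]) -> None:
        # A manual transfer by an admin is recorded as an activity.
        self.hosts[hardware_uuid] = fleet_id
        self.activities.append(f"transferred {hardware_uuid} to fleet {fleet_id}")


def reproduce() -> FakeFleetServer:
    server = FakeFleetServer(secrets={"unassigned-secret": None})
    # Step 1: the package was built with Unassigned's enroll secret.
    server.enroll("TEST-MAC", "unassigned-secret")
    # Step 2: the host is transferred to another fleet (id 7 is arbitrary).
    server.transfer("TEST-MAC", 7)
    # Steps 3 and 4: osquery.db is deleted and Orbit restarts, so osquery
    # enrolls again with the secret from the package.
    server.enroll("TEST-MAC", "unassigned-secret")
    return server


if __name__ == "__main__":
    s = reproduce()
    print(s.hosts["TEST-MAC"])  # None, i.e. back in Unassigned
    print(s.activities)         # ['transferred TEST-MAC to fleet 7']
```

Only the manual transfer in step 2 shows up in `activities`; the move back to Unassigned in the final `enroll` call leaves no trace, which mirrors what step 5 reports.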

🕯️ More info (optional)

Internal Slack thread w/ engineering (includes link to customer thread with logs): https://fleetdm.slack.com/archives/C019WG4GH0A/p1775494092467729

Metadata

Labels

  • #g-orchestration (Orchestration product group)
  • :release (Ready to write code. Scheduled in a release. See "Making changes" in handbook.)
  • P2 (Urgent: Supported workflow not functioning as intended, newly drafted feature with urgent Fleet need)
  • bug (Something isn't working as documented)
  • customer-numa

Status

🦤 In review
