Add server-side orbit debug logging enablement - currently only configurable as a duration-after-enrollment setting#45367

Merged
JordanMontgomery merged 5 commits into main from JM-43997-orbit-debug-on-enroll on May 14, 2026

Conversation

@JordanMontgomery JordanMontgomery commented May 13, 2026

Related issue: Resolves #43997

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.

  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements), JS inline code is prevented especially for url redirects, and untrusted data interpolated into shell scripts/commands is validated against shell metacharacters.

  • Timeouts are implemented and retries are limited to avoid infinite loops

  • If paths of existing endpoints are modified without backwards compatibility, checked the frontend/CLI for any necessary changes

Testing

Database migrations

  • Checked schema for all modified tables for columns that will auto-update timestamps during migration.
  • Confirmed that updating the timestamps is acceptable, and will not cause unwanted side effects.
  • Ensured the correct collation is explicitly set for character columns (COLLATE utf8mb4_unicode_ci).

fleetd/orbit/Fleet Desktop

  • Verified compatibility with the latest released version of Fleet (see Must rule)
  • Verified that fleetd runs on macOS, Linux and Windows
  • Verified auto-update works from the released version of component to the new version (see tools/tuf/test)

Summary by CodeRabbit

  • New Features

    • Configure Orbit to enable debug logging for a limited window on agent enrollment; enrolled hosts receive debug/verbose behavior while the window is active and it is reflected in agent config.
  • Chores

    • Added database column to record per-host debug-until timestamps and datastore support to extend it safely.
  • Tests

    • Added integration and unit tests covering validation, enrollment stamping, config generation, and runtime debug toggling.

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 74.35897% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.72%. Comparing base (957089f) to head (cab6648).
⚠️ Report is 107 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| server/service/orbit.go | 70.00% | 9 Missing and 9 partials ⚠️ |
| orbit/cmd/orbit/orbit.go | 0.00% | 4 Missing ⚠️ |
| ...tables/20260512143542_AddOrbitDebugUntilToHosts.go | 55.55% | 3 Missing and 1 partial ⚠️ |
| orbit/pkg/update/debug_log_runner.go | 88.23% | 1 Missing and 1 partial ⚠️ |
| orbit/pkg/update/flag_runner.go | 83.33% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #45367      +/-   ##
==========================================
- Coverage   66.81%   66.72%   -0.10%     
==========================================
  Files        2724     2729       +5     
  Lines      219027   218490     -537     
  Branches    10716    10716              
==========================================
- Hits       146342   145779     -563     
+ Misses      59521    59482      -39     
- Partials    13164    13229      +65     
| Flag | Coverage Δ |
| --- | --- |
| backend | 68.57% <74.35%> (-0.11%) ⬇️ |

@JordanMontgomery JordanMontgomery marked this pull request as ready for review May 13, 2026 19:59
@JordanMontgomery JordanMontgomery requested a review from a team as a code owner May 13, 2026 19:59
Copilot AI review requested due to automatic review settings May 13, 2026 19:59
coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cbe9fab2-7325-42b1-83a2-6e72e6e9681e

📥 Commits

Reviewing files that changed from the base of the PR and between 73925a3 and cab6648.

📒 Files selected for processing (2)
  • orbit/pkg/update/debug_log_runner.go
  • orbit/pkg/update/debug_log_runner_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • orbit/pkg/update/debug_log_runner_test.go
  • orbit/pkg/update/debug_log_runner.go

Walkthrough

This PR implements server-driven Orbit debug logging on enrollment. It adds the orbit.debug_logging_on_enroll_duration agent option (seconds, capped), records per-host orbit_debug_until in the DB via a migration and conditional datastore update, stamps hosts at enroll using effective agent options, exposes DebugLogging and merges verbose=true into Orbit Flags when active, and updates Orbit client receivers and flag handling to respect server-driven debug and the startup debug floor. Tests cover validation, migration, datastore, server logic, and Orbit client behavior.
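
As an illustration of the server-side flow described in the walkthrough, here is a minimal Go sketch. It is not the code from this PR: the function names `debugUntilOnEnroll`, `debugActive`, and `mergeVerbose` are hypothetical, and treating a non-positive duration as "disabled" is an assumption. It only shows the idea of stamping a debug-until time at enroll and merging `verbose=true` into the flags while that window is open.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// debugUntilOnEnroll returns the time until which debug logging stays on for a
// host enrolling at enrolledAt. durationSeconds stands in for the
// orbit.debug_logging_on_enroll_duration agent option. Returning nil for a
// non-positive duration is an assumption of this sketch.
func debugUntilOnEnroll(enrolledAt time.Time, durationSeconds int) *time.Time {
	if durationSeconds <= 0 {
		return nil
	}
	until := enrolledAt.Add(time.Duration(durationSeconds) * time.Second)
	return &until
}

// debugActive reports whether the debug window is still open at now.
func debugActive(until *time.Time, now time.Time) bool {
	return until != nil && now.Before(*until)
}

// mergeVerbose sets "verbose": true in a JSON object of flags when active is
// true. Empty or null input is treated as an empty object. When active is
// false the input is returned unchanged.
func mergeVerbose(flags json.RawMessage, active bool) (json.RawMessage, error) {
	if !active {
		return flags, nil
	}
	m := map[string]any{}
	if len(flags) > 0 {
		if err := json.Unmarshal(flags, &m); err != nil {
			return nil, err
		}
	}
	if m == nil { // JSON null unmarshals into a nil map
		m = map[string]any{}
	}
	m["verbose"] = true
	return json.Marshal(m)
}

func main() {
	enrolledAt := time.Date(2026, 5, 13, 12, 0, 0, 0, time.UTC)
	until := debugUntilOnEnroll(enrolledAt, 3600)
	now := enrolledAt.Add(10 * time.Minute)

	flags, err := mergeVerbose(json.RawMessage(`{"logger_tls_period":10}`), debugActive(until, now))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(flags))
}
```

Once `now` passes the stamped time, `debugActive` returns false and the flags are passed through untouched, which is how the window would expire without any further server-side write in this sketch.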

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding server-side orbit debug logging enablement with duration-after-enrollment configuration. |
| Description check | ✅ Passed | The PR description includes a properly completed checklist with all required items checked and confirmed, linking to issue #43997, and demonstrating comprehensive coverage of validation, testing, and database migration checks. |
| Linked Issues check | ✅ Passed | The changeset fully implements issue #43997 requirements: adds orbit.debug_logging_on_enroll_duration agent option with validation, database schema support, server-side enrollment stamping, debug logging controls, and comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope of issue #43997; no unrelated modifications to CLI, REST API, GitOps generation, or other out-of-scope areas were introduced. |

Copilot AI left a comment

Pull request overview

This PR adds a server-driven mechanism to enable Orbit debug logging for a limited time after enrollment, based on an orbit.debug_logging_on_enroll_duration agent option. It introduces a per-host orbit_debug_until stamp, surfaces a debug_logging field in the Orbit config response, and updates Orbit to reconcile osquery flags and its own log level accordingly.

Changes:

  • Add orbit_debug_until to hosts and a datastore method to extend it idempotently.
  • On Orbit enrollment, stamp orbit_debug_until from effective agent options; on config fetch, emit debug_logging and merge verbose=true into command_line_startup_flags while active.
  • Update Orbit to (a) reconcile nil/empty Flags to clear osquery.flags, (b) preserve startup --debug as a floor for osquery flags, and (c) toggle Orbit zerolog level via new DebugLogReceiver, with tests.
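
To make the "startup `--debug` as a floor" idea in item (c) concrete, here is a small Go sketch. It is an assumption-laden illustration, not the PR's `DebugLogReceiver`: the helpers `effectiveDebug` and `levelFor` are hypothetical, and plain strings stand in for zerolog levels so the example has no external dependencies.

```go
package main

import "fmt"

// effectiveDebug decides whether Orbit should log at debug level.
// startedInDebug mirrors Orbit having been launched with --debug, and
// serverDebug mirrors the optional debug_logging field in the Orbit config
// response (nil when the server did not send it).
// The startup flag acts as a floor: the server can turn debug on, but it
// cannot turn off debug that was requested at startup.
func effectiveDebug(startedInDebug bool, serverDebug *bool) bool {
	if startedInDebug {
		return true
	}
	return serverDebug != nil && *serverDebug
}

// levelFor maps the decision to a level name. A real receiver would set the
// logger's level instead of returning a string.
func levelFor(debug bool) string {
	if debug {
		return "debug"
	}
	return "info"
}

func main() {
	on, off := true, false

	fmt.Println(levelFor(effectiveDebug(false, nil)))  // server silent, no startup flag
	fmt.Println(levelFor(effectiveDebug(false, &on)))  // server enables debug
	fmt.Println(levelFor(effectiveDebug(false, &off))) // server window expired
	fmt.Println(levelFor(effectiveDebug(true, &off)))  // startup floor wins
}
```

Keeping the decision in a pure function like this makes the floor behavior easy to unit test separately from the logger itself.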

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| server/service/orbit.go | Stamps orbit_debug_until on enroll; merges debug behavior into Orbit config response. |
| server/service/orbit_test.go | Adds unit tests for stamping and debug flag/flag-merge logic. |
| server/service/integration_core_test.go | Integration test covering validation + stamping behavior on enroll. |
| server/mock/datastore_mock.go | Adds mock support for ExtendHostOrbitDebugUntil. |
| server/fleet/orbit.go | Adds DebugLogging *bool to OrbitConfig payload. |
| server/fleet/hosts.go | Adds OrbitDebugUntil field to Host. |
| server/fleet/datastore.go | Adds ExtendHostOrbitDebugUntil to datastore interface. |
| server/fleet/agent_options.go | Adds orbit.debug_logging_on_enroll_duration option + validation caps. |
| server/fleet/agent_options_test.go | Adds validation test cases for new orbit agent option. |
| server/datastore/mysql/schema.sql | Adds orbit_debug_until column to hosts schema snapshot. |
| server/datastore/mysql/migrations/tables/20260512143542_AddOrbitDebugUntilToHosts.go | Migration to add orbit_debug_until. |
| server/datastore/mysql/migrations/tables/20260512143542_AddOrbitDebugUntilToHosts_test.go | Migration test verifying column behavior (NULL/set/clear). |
| server/datastore/mysql/hosts.go | Loads orbit_debug_until; adds ExtendHostOrbitDebugUntil implementation. |
| server/datastore/mysql/hosts_test.go | Tests idempotent “extend only” semantics for orbit_debug_until. |
| orbit/pkg/update/flag_runner.go | Allows reconciling nil/empty flags; preserves startup debug as a floor. |
| orbit/pkg/update/flag_runner_test.go | Tests nil-flags reconciliation and startup-debug floor behavior. |
| orbit/pkg/update/debug_log_runner.go | New receiver to toggle Orbit zerolog level from server config. |
| orbit/pkg/update/debug_log_runner_test.go | Tests debug log receiver behavior + startup floor. |
| orbit/cmd/orbit/orbit.go | Registers debug log receiver; plumbs StartedInDebug into flag runner. |
| orbit/changes/43997-orbit-debug-logging-on-enroll | Orbit changelog entry for new agent option. |
| changes/43997-orbit-debug-logging-on-enroll | Server changelog entry for new agent option. |
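
The "extend only" semantics mentioned for `orbit_debug_until` can be sketched in Go as follows. This is an in-memory stand-in under stated assumptions, not the MySQL implementation of `ExtendHostOrbitDebugUntil`: the `hostStore` type, its `extend` method, and the `at` helper are hypothetical names, and the rule "write only when the new time is strictly later" is my reading of "extend idempotently".

```go
package main

import (
	"fmt"
	"time"
)

// hostStore is a hypothetical in-memory stand-in for the hosts table, keyed by
// host ID. A nil entry plays the role of a NULL orbit_debug_until column.
type hostStore struct {
	debugUntil map[uint]*time.Time
}

func newHostStore() *hostStore {
	return &hostStore{debugUntil: map[uint]*time.Time{}}
}

// extend stores until for the host only when it moves the window later than
// the current value (or when no value is set). It reports whether a write
// happened. Repeating the same call is a no-op, which makes it idempotent.
func (s *hostStore) extend(hostID uint, until time.Time) bool {
	cur := s.debugUntil[hostID]
	if cur != nil && !until.After(*cur) {
		return false
	}
	s.debugUntil[hostID] = &until
	return true
}

// at is a small helper that builds a UTC time from Unix seconds.
func at(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

func main() {
	s := newHostStore()
	fmt.Println(s.extend(1, at(200))) // first stamp is written
	fmt.Println(s.extend(1, at(100))) // earlier time never shortens the window
	fmt.Println(s.extend(1, at(200))) // same time is a no-op
	fmt.Println(s.extend(1, at(300))) // later time extends the window
}
```

With this rule a re-enrollment can lengthen an open debug window but never cut it short, and concurrent duplicate requests converge on the same stored value.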

Comment thread server/service/orbit.go
Comment thread server/service/orbit_test.go
Comment thread server/service/orbit.go Outdated
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
server/service/integration_core_test.go (1)

17031-17045: 💤 Low value

Consider testing the exact cap boundary (86400).

The test verifies that 86401 is rejected but doesn't confirm that the maximum allowed value (86400 = 1 day) is accepted. Testing the exact boundary would strengthen validation coverage.

🧪 Optional test addition
 	// Reject above cap.
 	var acResp appConfigResponse
 	s.DoJSON("PATCH", "/api/latest/fleet/config", json.RawMessage(`{
 		"agent_options": { "orbit": {"debug_logging_on_enroll_duration": 86401} }
 	}`), http.StatusBadRequest, &acResp)
+
+	// Accept exact cap (86400 = 1 day).
+	s.DoJSON("PATCH", "/api/latest/fleet/config", json.RawMessage(`{
+		"agent_options": { "orbit": {"debug_logging_on_enroll_duration": 86400} }
+	}`), http.StatusOK, &acResp)
 
 	// Reject negative.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/service/integration_core_test.go` around lines 17031 - 17045, Add a
positive boundary test for the Orbit debug logging cap: call s.DoJSON with the
PATCH to "/api/latest/fleet/config" using
agent_options.orbit.debug_logging_on_enroll_duration set to 86400 and assert
success (http.StatusOK) populating appConfigResponse; this complements the
existing negative tests for 86401 and -1 and verifies that the exact allowed
maximum is accepted. Ensure you use the same s.DoJSON helper and
appConfigResponse type as in the surrounding tests so the new assertion
integrates consistently.
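
For reference, the boundary being discussed can be sketched as a standalone Go validator. This is a hedged illustration rather than the validation in `server/fleet/agent_options.go`: the function name `validateDebugOnEnrollDuration` and the constant name are hypothetical, and accepting `0` (read here as "disabled") is an assumption. The cap of 86400 seconds and the rejection of 86401 and negative values come from the review comment above.

```go
package main

import (
	"errors"
	"fmt"
)

// maxDebugOnEnrollSeconds is the cap discussed in the review: 86400 s = 1 day.
const maxDebugOnEnrollSeconds = 86400

// validateDebugOnEnrollDuration checks a candidate value for
// orbit.debug_logging_on_enroll_duration (in seconds).
// Negative values and values above the cap are rejected; the cap itself and
// zero are accepted (zero being valid is an assumption of this sketch).
func validateDebugOnEnrollDuration(seconds int) error {
	if seconds < 0 {
		return errors.New("debug_logging_on_enroll_duration must not be negative")
	}
	if seconds > maxDebugOnEnrollSeconds {
		return fmt.Errorf("debug_logging_on_enroll_duration must be at most %d seconds", maxDebugOnEnrollSeconds)
	}
	return nil
}

func main() {
	for _, v := range []int{-1, 0, 86400, 86401} {
		fmt.Printf("%d -> %v\n", v, validateDebugOnEnrollDuration(v))
	}
}
```

Checking both 86400 and 86401 pins the comparison to `>` rather than `>=`, which is the off-by-one the suggested boundary test is meant to catch.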
server/datastore/mysql/migrations/tables/20260512143542_AddOrbitDebugUntilToHosts_test.go (1)

25-25: ⚡ Quick win

Use a unique host identifier in test queries.

These statements target a single host via hostname, which is not unique on hosts; using node_key or osquery_host_id would keep the test deterministic if fixtures/data shape changes.

As per coding guidelines: “ensure that appropriate filtering criteria are applied… check for missing WHERE clauses or incorrect filtering that could lead to incorrect or non-deterministic results.”

Also applies to: 31-31, 34-34, 40-40, 43-43

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@server/datastore/mysql/migrations/tables/20260512143542_AddOrbitDebugUntilToHosts_test.go`
at line 25, The test uses non-unique hostname filtering in queries like the
SELECT that scans into debugUntil; change these to use a unique host identifier
(e.g., node_key or osquery_host_id) instead of hostname so the test is
deterministic—update the SQL WHERE clause(s) in the queries that call
db.QueryRow(`SELECT orbit_debug_until FROM hosts WHERE hostname = ?`, "host-1")
and the other affected queries (the ones at the other noted locations) to use
WHERE node_key = ? (or osquery_host_id = ?) and pass the fixture's unique
node_key value; ensure any variables/helpers referencing the host (e.g., the
test fixture that provides "host-1") are updated to supply the unique
identifier.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e271dbfd-7d95-4229-b197-20a3eb9dcf70

📥 Commits

Reviewing files that changed from the base of the PR and between b1ecaef and 8d25aeb.

📒 Files selected for processing (21)
  • changes/43997-orbit-debug-logging-on-enroll
  • orbit/changes/43997-orbit-debug-logging-on-enroll
  • orbit/cmd/orbit/orbit.go
  • orbit/pkg/update/debug_log_runner.go
  • orbit/pkg/update/debug_log_runner_test.go
  • orbit/pkg/update/flag_runner.go
  • orbit/pkg/update/flag_runner_test.go
  • server/datastore/mysql/hosts.go
  • server/datastore/mysql/hosts_test.go
  • server/datastore/mysql/migrations/tables/20260512143542_AddOrbitDebugUntilToHosts.go
  • server/datastore/mysql/migrations/tables/20260512143542_AddOrbitDebugUntilToHosts_test.go
  • server/datastore/mysql/schema.sql
  • server/fleet/agent_options.go
  • server/fleet/agent_options_test.go
  • server/fleet/datastore.go
  • server/fleet/hosts.go
  • server/fleet/orbit.go
  • server/mock/datastore_mock.go
  • server/service/integration_core_test.go
  • server/service/orbit.go
  • server/service/orbit_test.go

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@MagnusHJensen MagnusHJensen left a comment

Didn't have the chance to test this on a device, but the code looks good to me. So if you have tested it yourself, then I give my blessing to merge 😆

@JordanMontgomery JordanMontgomery merged commit bee5eda into main May 14, 2026
69 checks passed
@JordanMontgomery JordanMontgomery deleted the JM-43997-orbit-debug-on-enroll branch May 14, 2026 17:32
dnplkndll added a commit to ledoent/fleet that referenced this pull request May 14, 2026
…ploy)

Upstream PR fleetdm#45367 (aggregated commit bee5eda, 2026-05-14) added the
orbit_debug_until column to the hosts table in MySQL schema.sql. The PG
baseline is regenerated from a production pg_dump; that migration hasn't
landed on prod yet (it lands when this very deploy rolls out), so the
baseline lags by one column.

Adding the entry to known_column_drift.txt with a deferred-regen
comment, per the file header's prescribed workflow. Once the next
aggregation runs after this deploy lands, the prod baseline will
include orbit_debug_until and this allowlist entry can be removed.

Without this entry, the validate-pg-compat CI gate fails check_column_drift
on every aggregated build, blocking the build-ledo image publish that
deploys this fix in the first place. Classic chicken-and-egg — break
with a documented intentional drift.
dnplkndll added a commit to ledoent/fleet that referenced this pull request May 23, 2026
Development

Successfully merging this pull request may close these issues.

Enable fleetd debug logging at runtime during setup experience

3 participants