
Cut steady-state log noise from miren server #767

Merged
phinze merged 2 commits into main from phinze/mir-1024-reduce-log-noise-in-steady-state-miren-server-operation on Apr 17, 2026


@phinze (Contributor) commented Apr 17, 2026

After a v0.7.0 upgrade on one of our prod servers, we pulled two hours of `journalctl -u miren` output and ran a frequency analysis. The server produced around 86k lines in that window (roughly 75% DEBUG and 24% INFO), and most of it was content-free steady-state chatter. Every internal RPC logged its own auth success, every proxied HTTP request logged its route match, every sandbox-pool reconcile emitted a three-line trio, every lease got a heartbeat line, and every sandbox no-op got a two-line "I checked, nothing to do" pair. Finding anything interesting in the log stream was genuinely hard.

This PR is the straightforward cut. It removes the lines that are pure noise and deletes the `Processing event` INFO line in the generic controller framework, which was the single loudest prod-visible offender because it fires for every reconcile of every entity. It also takes out a redundant container stats log: that data already flows into the customer metrics pipeline under `metrics/`, so the slog copy was just duplication.

The ticket suggested downgrading some of these to TRACE instead, but the codebase uses stdlib `slog`, which only predefines Debug/Info/Warn/Error. Rather than introduce a custom TRACE level just for this, we deleted them outright; `git log -S` will find anything we want back.

A few offenders were actually metrics wearing a trenchcoat. The OTel SDK is already set up in `pkg/rpc/otel.go`, but the server itself doesn't currently emit any first-party Counter / Histogram / UpDownCounter instruments. Ingress request counts, controller reconcile counts, and sandbox pool gauges all belong there rather than in the log stream. That's its own project, tracked as MIR-1026. The `failed to lock file` chatter from embedded etcd's purge routine (240× in 2h) is library-internal and needs its own intervention, tracked as MIR-1025.

Extrapolating from the ticket's volume numbers, this removes roughly 65k of the 86k lines per 2h (~76%).

Closes MIR-1024
Related: MIR-1025, MIR-1026

After a v0.7.0 upgrade on one of our prod servers, a 2h sample of
`journalctl -u miren` showed ~86k lines — 75% DEBUG, 24% INFO —
with most of the volume being content-free chatter that made real
signal hard to spot.

Walked through the top offenders and deleted them: every RPC auth
success, every HTTP route match, every lease renewal, the sandbox-
pool reconcile trio, the no-op "sandbox already exists" pair, the
deployment loop's "skipping pool for different app" backwards
logging, the generic controller framework's `Processing event` INFO
line (the loudest prod-visible offender), and the redundant
container stats debug (the data already lands in the customer
metrics pipeline under metrics/).

A few lines we spotted look more metric-shaped than log-shaped —
ingress request counts, reconcile counts, pool gauges. We didn't
convert them here because first-party OTel metric instruments
aren't wired up for server telemetry yet. That work is tracked in
MIR-1026, and the etcd fileutil purge chatter (library-internal)
is tracked in MIR-1025.

Closes MIR-1024
@phinze requested a review from a team as a code owner on April 17, 2026 17:47
@coderabbitai (Bot) commented Apr 17, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 6f1d392 and d5eacd6.

📒 Files selected for processing (1)
  • controllers/sandbox/sandbox_frozen_test.go

📝 Walkthrough


This pull request removes debug/info log statements across eight source files (deployment launcher; sandbox metrics, controller, and sandbox; sandbox pool manager; controller reconciliation; RPC server; and HTTP ingress). No control flow, error handling, or functional behavior was changed. Additionally, the expected SHA-256 hash pinned for sandbox.go in a frozen-file test was updated; no other test logic was modified.



The TestSandboxControllerFrozen guard fired because the previous
commit removed two debug log lines from sandbox.go. The audit the
guard asks for is satisfied: saga_controller.go got the same
"sandbox already exists, skipping create" log removed in the same
commit, and the change is log-only with no behavioral impact.
@phinze merged commit 0795463 into main on Apr 17, 2026
12 checks passed
@phinze deleted the phinze/mir-1024-reduce-log-noise-in-steady-state-miren-server-operation branch on April 17, 2026 22:20