Cut steady-state log noise from miren server #767
Merged
phinze merged 2 commits on Apr 17, 2026
After a v0.7.0 upgrade on one of our prod servers, a 2h sample of `journalctl -u miren` showed ~86k lines (75% DEBUG, 24% INFO), with most of the volume being content-free chatter that made real signal hard to spot. Walked through the top offenders and deleted them: every RPC auth success, every HTTP route match, every lease renewal, the sandbox-pool reconcile trio, the no-op "sandbox already exists" pair, the deployment loop's "skipping pool for different app" backwards logging, the generic controller framework's `Processing event` INFO line (the loudest prod-visible offender), and the redundant container stats debug (the data already lands in the customer metrics pipeline under `metrics/`).

A few lines we spotted look more metric-shaped than log-shaped: ingress request counts, reconcile counts, pool gauges. We didn't convert them here because first-party OTel metric instruments aren't wired up for server telemetry yet. That work is tracked in MIR-1026, and the etcd fileutil purge chatter (library-internal) is tracked in MIR-1025.

Closes MIR-1024
Walkthrough

This pull request removes debug/info log statements across eight source files (deployment launcher, sandbox metrics/controller/sandbox, sandbox pool manager, controller reconciliation, RPC server, and HTTP ingress). No control flow, error handling, or functional behavior was changed. Additionally, the frozen SHA-256 expected hash for …
The TestSandboxControllerFrozen guard fired because the previous commit removed two debug log lines from sandbox.go. The audit the guard asks for is satisfied: saga_controller.go got the same "sandbox already exists, skipping create" log removed in the same commit, and the change is log-only with no behavioral impact.
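A frozen-file guard of the kind described above can be sketched as follows. This is only an illustration of the general technique (record a SHA-256 digest of a source file, fail when the file changes without the digest being updated); the helper names `hashSource` and `checkFrozen` and the sample file contents are hypothetical, and the real `TestSandboxControllerFrozen` may be structured differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashSource returns the hex-encoded SHA-256 digest of a file's contents.
// (Hypothetical helper, not taken from the miren codebase.)
func hashSource(src []byte) string {
	sum := sha256.Sum256(src)
	return hex.EncodeToString(sum[:])
}

// checkFrozen compares the current digest of src with the recorded one.
// A mismatch does not mean the change is wrong; it forces the author to
// audit related files and then update the recorded digest deliberately.
func checkFrozen(name string, src []byte, want string) error {
	if got := hashSource(src); got != want {
		return fmt.Errorf("%s changed: got %s, want %s; audit sibling files, then update the frozen hash", name, got, want)
	}
	return nil
}

func main() {
	// Stand-in contents; a real guard would read the file from disk.
	original := []byte("log.Debug(\"sandbox already exists, skipping create\")\ncreate()\n")
	frozen := hashSource(original)

	// Removing the log line changes the bytes, so the guard fires
	// even though behavior is unchanged.
	edited := []byte("create()\n")

	fmt.Println("unchanged file:", checkFrozen("sandbox.go", original, frozen))
	fmt.Println("edited file:   ", checkFrozen("sandbox.go", edited, frozen))
}
```

Because the digest covers every byte, a log-only edit trips the guard just like a behavioral one, which is why the follow-up commit only needed to update the expected hash after the audit.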
evanphx approved these changes on Apr 17, 2026
After a v0.7.0 upgrade on one of our prod servers we pulled two hours of `journalctl -u miren` and ran a frequency analysis. The server was producing around 86k lines in that window (roughly 75% DEBUG and 24% INFO), and most of it was content-free steady-state chatter. Every internal RPC logged its own auth success, every proxied HTTP request logged its route match, every sandbox-pool reconcile emitted a three-line trio, every lease got a heartbeat line, every sandbox no-op got a two-line "I checked, nothing to do" pair. Finding anything interesting in the log stream was genuinely hard.

This PR is the straightforward cut: remove the lines that are pure noise and delete the `Processing event` INFO line in the generic controller framework, which was the single loudest prod-visible offender because it fires for every reconcile of every entity. Also took out a redundant container stats log: that data already flows into the customer metrics pipeline under `metrics/`, so the slog copy was just duplication.

The ticket suggested some of these could be downgraded to TRACE instead, but the codebase uses stdlib `slog` with only Debug/Info/Warn/Error. Rather than introduce a custom TRACE level just for this, we deleted outright; `git log -S` will find anything we want back.

A few offenders were actually metrics wearing a trenchcoat. The OTel SDK is already set up in `pkg/rpc/otel.go`, but we don't currently emit any first-party `Counter`/`Histogram`/`UpDownCounter` instruments from the server itself. Ingress request counts, controller reconcile counts, and sandbox pool gauges all belong there rather than in the log stream. That's its own project, tracked as MIR-1026. The `failed to lock file` chatter from embedded etcd's purge routine (240× in 2h) is library-internal and needs its own intervention, tracked as MIR-1025.

Extrapolating from the ticket's volume numbers, this removes roughly 65k of the 86k lines per 2h (~76%).
Closes MIR-1024
Related: MIR-1025, MIR-1026