MIR-993: Configurable port-wait timeout for slow-cold-init addons#755
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (15)
✅ Files skipped from review due to trivial changes (5)
🚧 Files skipped from review as they are similar to previous changes (6)
📝 WalkthroughWalkthroughAdded a new optional Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
controllers/deployment/launcher.go (1)
846-848: Prefer semantic timeout comparison over raw string equality.Line 846 currently mismatches equivalent duration strings (for example,
60svs1m), which can trigger unnecessary pool replacement.♻️ Suggested refactor
- if spec1.PortWaitTimeout != spec2.PortWaitTimeout { - return fmt.Sprintf("port wait timeout mismatch: %s vs %s", spec1.PortWaitTimeout, spec2.PortWaitTimeout), false - } + resolve := func(raw string) time.Duration { + if strings.TrimSpace(raw) == "" { + return 15 * time.Second + } + d, err := time.ParseDuration(raw) + if err != nil || d <= 0 { + return 15 * time.Second + } + return d + } + t1 := resolve(spec1.PortWaitTimeout) + t2 := resolve(spec2.PortWaitTimeout) + if t1 != t2 { + return fmt.Sprintf("port wait timeout mismatch: %s (%s) vs %s (%s)", + spec1.PortWaitTimeout, t1, spec2.PortWaitTimeout, t2), false + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/deployment/launcher.go` around lines 846 - 848, The current string equality check for PortWaitTimeout on spec1 and spec2 can misreport equivalent durations (e.g. "60s" vs "1m"); change the comparison to parse both spec1.PortWaitTimeout and spec2.PortWaitTimeout with time.ParseDuration, handle parse errors (log/return a clear mismatch/error if parsing fails), and compare the resulting time.Duration values numerically so equivalent durations are treated as equal; reference the symbols spec1.PortWaitTimeout and spec2.PortWaitTimeout and ensure the mismatch message reflects the original strings or the parsed durations for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/compute/schema.yml`:
- Around line 26-32: Update the port_wait_timeout documentation to state that
empty, unparsable, or non-positive durations (<=0, e.g. "0s") are treated the
same by the controller and will fall back to the default of 15s; reference that
values are parsed via time.ParseDuration and give an example of a non-positive
value being ignored so users don't assume "0s" disables the wait. Keep the
existing examples (e.g. "60s") and explicitly note addons needing longer waits
should set a larger positive duration.
---
Nitpick comments:
In `@controllers/deployment/launcher.go`:
- Around line 846-848: The current string equality check for PortWaitTimeout on
spec1 and spec2 can misreport equivalent durations (e.g. "60s" vs "1m"); change
the comparison to parse both spec1.PortWaitTimeout and spec2.PortWaitTimeout
with time.ParseDuration, handle parse errors (log/return a clear mismatch/error
if parsing fails), and compare the resulting time.Duration values numerically so
equivalent durations are treated as equal; reference the symbols
spec1.PortWaitTimeout and spec2.PortWaitTimeout and ensure the mismatch message
reflects the original strings or the parsed durations for clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 056a84f3-60d3-4147-9e40-1c09ea2ed031
📒 Files selected for processing (15)
api/compute/compute_v1alpha/schema.gen.goapi/compute/schema.ymlcontrollers/deployment/launcher.gocontrollers/deployment/specs_match_test.gocontrollers/sandbox/create_saga.gocontrollers/sandbox/create_saga_test.gocontrollers/sandbox/port_wait_timeout_test.gocontrollers/sandbox/sandbox.gocontrollers/sandbox/sandbox_frozen_test.gopkg/addon/framework.gopkg/addon/mysql/dedicated.gopkg/addon/mysql/shared.gopkg/addon/postgresql/dedicated.gopkg/addon/postgresql/shared.gopkg/addon/rabbitmq/dedicated.go
Sandbox creation fails any port-bind health check that doesn't complete within a hardcoded 15s, intentionally marking the sandbox DEAD so the pool crash-backs-off. MySQL 8's first-boot runs --initialize, a temp setup server, then the real server — ~20s on loaded dev hardware — so the first attempt always fails, and crash cooldown delays the retry. Postgres (initdb) and RabbitMQ (Erlang VM + mnesia) have similar cold-init behavior. Add an optional port_wait_timeout to SandboxSpec, threaded through both the legacy controller path and the saga create path, and set sensible budgets on the affected addons: - MySQL: 60s (shared + dedicated) - Postgres: 60s (shared + dedicated) - RabbitMQ: 90s (dedicated) Empty spec falls back to the existing 15s default, so normal apps are unaffected. Invalid/non-positive values also fall back rather than bricking a pool. Fixes MIR-993.
c79f8d2 to
310987c
Compare
Summary
--initialize→ temporary setup server → real server) takes ~20s, exceeding the hardcoded 15s port-bind budget. The pool then marks the sandbox DEAD and crash-backs-off, so the fast (~1s) second boot is delayed too.port_wait_timeouttoSandboxSpec, threaded through both the legacy controller path (controllers/sandbox/sandbox.go) and the saga create path (controllers/sandbox/create_saga.go— via a newPortWaitTimeoutfield onbootContainersOut→waitPortsIn, wired by matchingsaga:"port_wait_timeout"tags). Empty/invalid values fall back to the existing 15s default so normal apps and typos are unaffected.PortWaitTimeoutfield onaddon.CreateSandboxPoolSpec:60s(shared + dedicated)60s(shared + dedicated)90s(dedicated) — Erlang VM + mnesia boot is slower than SQL enginesspecsMatchnow comparesPortWaitTimeoutso changing the budget on an existing pool is detected as a spec drift.Test plan
make lint— 0 issuesTestResolvePortWaitTimeoutcovers empty / valid / zero / negative / garbage / bare-number fallback behaviorTestCreateSandboxSaga_PortWaitTimeoutDefault/_Overrideassert the spec value flows through toWaitForPortTestSpecsMatchCoversAllFieldsandTestSandboxControllerFrozenhashes updated; saga path carries the equivalent changewaiting for ports to be bound ... timeout: 1m0s, first boot comes up cleanly without a DEAD / crash-backoff cyclemake test-blackbox— existing MySQL addon blackbox still green