
Improve DNS and Colima reliability and security #47

Merged
munezaclovis merged 16 commits into main from
feat/dns-colima-reliability-fixes
Mar 25, 2026

@munezaclovis munezaclovis commented Mar 25, 2026

Summary

  • Fix DNS doctor health check: Replace no-op UDP Dial() with a real DNS A query using miekg/dns client to actually verify the server responds
  • Add AAAA/IPv6 DNS response: Return ::1 for AAAA queries to eliminate Safari "Happy Eyeballs" latency (matches Laravel Valet's approach)
  • Flush DNS cache after resolver setup: Append dscacheutil -flushcache and killall -HUP mDNSResponder to the sudo resolver script so changes take effect immediately
  • Add Colima VM auto-recovery: If Start() fails (broken VM after sleep/update), automatically attempt force-stop + delete + restart before giving up
  • Make VM resources configurable: Add vm section to pv.yml with cpu, memory, disk fields (defaults: 2/2/60)
  • Add SIGKILL fallback to pv stop: Escalate to SIGKILL after 5s SIGTERM timeout to prevent zombie supervisor processes
  • Add macOS version pre-flight check: Verify macOS >= 13 (Ventura) before starting Colima with --vm-type vz, returning a clear error instead of a cryptic Colima failure
  • Fix DNS startup race: Use NotifyStartedFunc callback to wait for the UDP socket to bind before logging "DNS server listening" and starting FrankenPHP
  • Migrate launchctl to bootstrap/bootout: Use modern API with automatic fallback to legacy load/unload for older macOS
  • Restrict PID file permissions: Change from 0644 to 0600 (principle of least privilege)
  • Redirect Colima state to ~/.pv/: Set COLIMA_HOME=~/.pv/internal/colima so all VM state, Docker sockets, and Lima data live under the managed tree instead of ~/.colima/

Test plan

  • All existing tests pass (go test ./...)
  • New tests added for AAAA DNS queries, MX empty response, macOS version parsing, ColimaHomeDir, updated ColimaSocketPath
  • go vet ./... clean
  • go build ./... clean
  • Manual: run pv install on a fresh machine, verify DNS resolves .test domains
  • Manual: run pv doctor, verify DNS health check sends a real query
  • Manual: run pv service:add redis, verify Colima state appears under ~/.pv/internal/colima/
  • Manual: configure vm: section in pv.yml, verify Colima respects the values
  • Manual: verify pv stop terminates cleanly (SIGTERM path) and force-kills after timeout (SIGKILL path)

The previous checkDNSResponding() used net.Dial("udp", ...), which
succeeds whether or not a server is listening because UDP is
connectionless. Replace it with an actual DNS A query using the
miekg/dns client, verifying that 127.0.0.1 is returned.
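
To illustrate why a bare UDP dial cannot serve as a health check, here is a minimal standard-library sketch. It is not the PR's code: the helper names (unusedUDPAddr, dialOnly, gotReply) are hypothetical, and the real fix sends a DNS A query via miekg/dns rather than an arbitrary payload.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// unusedUDPAddr returns a loopback address whose UDP port was free a moment
// ago: it binds port 0, records the chosen address, and closes the socket.
func unusedUDPAddr() string {
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer pc.Close()
	return pc.LocalAddr().String()
}

// dialOnly mirrors the old style of check: it only reports whether
// net.Dial returned without error.
func dialOnly(addr string) bool {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// gotReply sends a payload and reports whether any datagram came back before
// the timeout. A real health check would send a DNS query and validate the
// answer instead of accepting any reply.
func gotReply(addr string, payload []byte, timeout time.Duration) bool {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return false
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return false
	}
	if _, err := conn.Write(payload); err != nil {
		return false
	}
	buf := make([]byte, 512)
	_, err = conn.Read(buf)
	return err == nil
}

func main() {
	addr := unusedUDPAddr()
	fmt.Println("dial succeeded:", dialOnly(addr))
	fmt.Println("got a reply:", gotReply(addr, []byte("ping"), 200*time.Millisecond))
}
```

With nothing listening on the port, the dial is expected to succeed while the round trip fails, which is the gap the new check closes.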

Safari uses Happy Eyeballs (RFC 6555) and prefers IPv6. Previously,
AAAA queries returned empty NOERROR, forcing a fallback delay on
every page load. Now return ::1 for AAAA queries alongside 127.0.0.1
for A queries, matching Laravel Valet's approach.
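
The answer selection described here can be sketched without any DNS library. The function name answerFor and the constants are hypothetical; only the record type codes (A = 1, MX = 15, AAAA = 28) are standard values. The actual handler in the PR is built on miekg/dns.

```go
package main

import (
	"fmt"
	"net"
)

// DNS record type codes as assigned in the IANA registry.
const (
	typeA    uint16 = 1
	typeMX   uint16 = 15
	typeAAAA uint16 = 28
)

// answerFor returns the loopback address to put in the answer section for a
// query type, or nil when the reply should be NOERROR with no answers.
func answerFor(qtype uint16) net.IP {
	switch qtype {
	case typeA:
		return net.IPv4(127, 0, 0, 1)
	case typeAAAA:
		return net.IPv6loopback
	default:
		return nil
	}
}

func main() {
	for _, q := range []uint16{typeA, typeAAAA, typeMX} {
		fmt.Printf("qtype %d -> %v\n", q, answerFor(q))
	}
}
```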

After writing /etc/resolver/{tld}, macOS may serve stale cached
negative responses for up to 60 seconds. Append dscacheutil
-flushcache and killall -HUP mDNSResponder to both resolver
setup scripts so DNS changes take effect immediately.

When Start() fails (e.g. broken VM state after macOS sleep or update),
automatically attempt recovery by force-stopping, deleting, and
restarting the VM. Logs a warning so users know recovery was attempted.
This matches DDEV's recommended recovery pattern.
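
A rough sketch of this recovery flow, assuming a hypothetical vm interface and a warn callback in place of the project's Colima wrapper and logger. It also folds cleanup errors into the final error; the wrapping format shown is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// vm is a hypothetical interface standing in for the Colima wrapper.
type vm interface {
	Start() error
	ForceStop() error
	Delete() error
}

// wrap annotates a non-nil error and passes nil through unchanged.
func wrap(step string, err error) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("%s: %w", step, err)
}

// startWithRecovery tries Start once. On failure it force-stops and deletes
// the VM, then retries. Cleanup errors do not abort the retry, but they are
// joined into the final error if the retry fails as well.
func startWithRecovery(v vm, warn func(string)) error {
	firstErr := v.Start()
	if firstErr == nil {
		return nil
	}
	warn("start failed, attempting VM recovery: " + firstErr.Error())
	stopErr := v.ForceStop()
	delErr := v.Delete()
	if retryErr := v.Start(); retryErr != nil {
		return errors.Join(
			wrap("start after recovery", retryErr),
			wrap("force-stop", stopErr),
			wrap("delete", delErr),
		)
	}
	return nil
}

// fakeVM fails its first failStarts calls to Start, then succeeds.
type fakeVM struct {
	failStarts int
	starts     int
	deleted    bool
}

func (f *fakeVM) Start() error {
	f.starts++
	if f.starts <= f.failStarts {
		return errors.New("vm is broken")
	}
	return nil
}
func (f *fakeVM) ForceStop() error { return nil }
func (f *fakeVM) Delete() error    { f.deleted = true; return nil }

func main() {
	v := &fakeVM{failStarts: 1}
	err := startWithRecovery(v, func(msg string) { fmt.Println("warning:", msg) })
	fmt.Println("error:", err, "start attempts:", v.starts)
}
```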

Add 'vm' section under 'defaults' in pv.yml with cpu, memory, and
disk fields. Defaults to 2 CPU, 2 GB memory, 60 GB disk. This lets
users on resource-constrained or powerful machines adjust allocation
without modifying source code.
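
As a sketch of how such defaults might be applied, assuming a VMConfig struct with integer fields (the struct and field names are hypothetical; WithDefaults is the method name mentioned in this PR's tests). Treating negative values as unset is also an assumption.

```go
package main

import "fmt"

// VMConfig holds VM resource settings. Memory and Disk are in GB.
// The type and field names are assumed for illustration.
type VMConfig struct {
	CPU    int
	Memory int
	Disk   int
}

// WithDefaults returns a copy in which every non-positive field is replaced
// by the default stated in the PR: 2 CPUs, 2 GB memory, 60 GB disk.
func (c VMConfig) WithDefaults() VMConfig {
	if c.CPU <= 0 {
		c.CPU = 2
	}
	if c.Memory <= 0 {
		c.Memory = 2
	}
	if c.Disk <= 0 {
		c.Disk = 60
	}
	return c
}

func main() {
	partial := VMConfig{CPU: 4}
	fmt.Printf("%+v\n", partial.WithDefaults())
}
```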

If the supervisor process doesn't exit within 5 seconds of receiving
SIGTERM, escalate to SIGKILL to prevent zombie processes. This
matches the existing FrankenPHP.Stop() pattern which already has
SIGKILL escalation.

Check that macOS >= 13 (Ventura) before starting Colima with
--vm-type vz and --mount-type virtiofs. Returns a clear error
message instead of letting Colima fail with a cryptic error on
older macOS versions.
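
A minimal sketch of such a pre-flight check, assuming the version string comes from something like `sw_vers -productVersion` (for example "13.4.1"). The function names and the error wording are hypothetical. Returning nil with a warning when the version cannot be parsed follows the behavior described in a later commit of this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minMacOSMajor is the first macOS major version accepted by the check.
const minMacOSMajor = 13

// parseMajor extracts the major version from a string such as "13.4.1".
func parseMajor(productVersion string) (int, error) {
	major, _, _ := strings.Cut(strings.TrimSpace(productVersion), ".")
	n, err := strconv.Atoi(major)
	if err != nil {
		return 0, fmt.Errorf("parse macOS version %q: %w", productVersion, err)
	}
	return n, nil
}

// checkVZSupport returns an error on macOS older than 13. If the version
// cannot be parsed it warns and returns nil so that startup is not blocked.
func checkVZSupport(productVersion string, warn func(string)) error {
	major, err := parseMajor(productVersion)
	if err != nil {
		warn("skipping macOS version check: " + err.Error())
		return nil
	}
	if major < minMacOSMajor {
		return fmt.Errorf("macOS %d detected, but --vm-type vz requires macOS %d (Ventura) or newer", major, minMacOSMajor)
	}
	return nil
}

func main() {
	warn := func(msg string) { fmt.Println("warning:", msg) }
	for _, v := range []string{"14.2", "12.7.1", "unknown"} {
		fmt.Printf("%q -> %v\n", v, checkVZSupport(v, warn))
	}
}
```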

Use miekg/dns NotifyStartedFunc callback to signal when the UDP
socket is bound. The supervisor now waits for the ready signal
(with 5s timeout) before printing "DNS server listening" and
starting FrankenPHP, ensuring the log is accurate and DNS is
actually reachable.
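
The same ready-signal pattern can be shown with the standard library alone. This sketch swaps the miekg/dns callback for a channel closed right after net.ListenPacket returns; the function names are hypothetical and the code is not the PR's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// listenAndSignal binds a UDP socket and closes ready once the bind has
// succeeded, playing the role that a "server started" callback would play.
func listenAndSignal(addr string, ready chan<- struct{}) (net.PacketConn, error) {
	pc, err := net.ListenPacket("udp", addr)
	if err != nil {
		return nil, err
	}
	close(ready)
	return pc, nil
}

// waitReady blocks until ready is closed or the timeout elapses.
func waitReady(ready <-chan struct{}, timeout time.Duration) error {
	select {
	case <-ready:
		return nil
	case <-time.After(timeout):
		return errors.New("dns server did not become ready in time")
	}
}

func main() {
	ready := make(chan struct{})
	go func() {
		pc, err := listenAndSignal("127.0.0.1:0", ready)
		if err != nil {
			fmt.Println("bind failed:", err)
			return
		}
		defer pc.Close()
		select {} // a real server would read and answer queries here
	}()

	if err := waitReady(ready, 5*time.Second); err != nil {
		fmt.Println("error:", err)
		return
	}
	// Only now is it accurate to log that the server is listening and to
	// start anything that depends on DNS.
	fmt.Println("DNS server listening")
}
```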

Use the modern launchctl bootstrap/bootout API (recommended since
macOS Yosemite) with automatic fallback to the deprecated
load/unload commands for older systems. This eliminates deprecation
warnings on modern macOS while maintaining backward compatibility.
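
One way to express this fallback is sketched below. The gui/<uid> domain target, the function names, and the choice to fall back on any bootstrap error are assumptions; the PR does not say which domain it uses or how it detects that the modern subcommands are unavailable.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runner executes launchctl with the given arguments.
type runner func(args ...string) error

// errSimulated is used by the dry-run example in main.
var errSimulated = errors.New("simulated failure")

// launchctl is the real runner. It is unused in the dry run below.
func launchctl(args ...string) error {
	return exec.Command("launchctl", args...).Run()
}

// loadAgent tries the modern bootstrap subcommand first and falls back to
// the legacy load subcommand if bootstrap fails for any reason.
func loadAgent(run runner, uid int, plistPath string) error {
	domain := fmt.Sprintf("gui/%d", uid)
	if err := run("bootstrap", domain, plistPath); err == nil {
		return nil
	}
	return run("load", plistPath)
}

// unloadAgent mirrors loadAgent using bootout with an unload fallback.
func unloadAgent(run runner, uid int, plistPath string) error {
	domain := fmt.Sprintf("gui/%d", uid)
	if err := run("bootout", domain, plistPath); err == nil {
		return nil
	}
	return run("unload", plistPath)
}

func main() {
	// Dry-run runner: prints each command and pretends bootstrap is missing.
	dryRun := func(args ...string) error {
		fmt.Println("launchctl", strings.Join(args, " "))
		if args[0] == "bootstrap" {
			return errSimulated
		}
		return nil
	}
	if err := loadAgent(dryRun, os.Getuid(), "/path/to/agent.plist"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Injecting the runner keeps the fallback logic testable on machines without launchctl. A stricter version might inspect the exit status before falling back, since bootstrap can also fail for reasons unrelated to OS support.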

The PID file only needs to be readable by the owning user.
Apply principle of least privilege by removing group/other
read permissions.
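
A small sketch of writing a PID file with owner-only permissions. The helper names are hypothetical. The explicit Chmod is an addition for illustration: os.WriteFile applies the mode only when it creates the file, so a file left behind with 0644 would otherwise keep its old permissions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// pidFileMode allows read and write for the owning user only.
const pidFileMode os.FileMode = 0o600

// writePIDFile writes pid to path and makes sure the file ends up with
// owner-only permissions, even if it already existed with a wider mode.
func writePIDFile(path string, pid int) error {
	data := []byte(strconv.Itoa(pid) + "\n")
	if err := os.WriteFile(path, data, pidFileMode); err != nil {
		return err
	}
	return os.Chmod(path, pidFileMode)
}

// permOf returns the permission bits of the file at path.
func permOf(path string) (os.FileMode, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

// tempPIDPath creates a temporary directory containing a stale PID file
// written with mode 0644, and returns its path and a cleanup function.
func tempPIDPath() (string, func()) {
	dir, err := os.MkdirTemp("", "pv-pid-")
	if err != nil {
		panic(err)
	}
	path := filepath.Join(dir, "pv.pid")
	if err := os.WriteFile(path, []byte("1\n"), 0o644); err != nil {
		panic(err)
	}
	return path, func() { os.RemoveAll(dir) }
}

func main() {
	path, cleanup := tempPIDPath()
	defer cleanup()
	if err := writePIDFile(path, os.Getpid()); err != nil {
		panic(err)
	}
	mode, err := permOf(path)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s has mode %o\n", path, mode)
}
```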

Set COLIMA_HOME=~/.pv/internal/colima in colimaCmd() and the Colima
shim so all VM state, Docker sockets, and Lima data live under the
managed ~/.pv/ tree instead of ~/.colima/. This ensures pv uninstall
cleanly removes all state, and avoids conflicts with user-installed
Colima instances.

- Add ColimaHomeDir() returning ~/.pv/internal/colima
- Update ColimaSocketPath() to use the new location
- Add ColimaHomeDir() to EnsureDirs() (must exist before Colima reads it)
- Update colima shim to export COLIMA_HOME
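
The path handling above might look roughly like the following sketch. The function signatures are assumptions (the PR only names ColimaHomeDir(), ColimaSocketPath(), and colimaCmd()), and the socket layout of <COLIMA_HOME>/<profile>/docker.sock reflects Colima's usual convention rather than anything stated in this PR.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// colimaHomeDir returns the managed Colima state directory under home.
func colimaHomeDir(home string) string {
	return filepath.Join(home, ".pv", "internal", "colima")
}

// colimaSocketPath assumes the layout <COLIMA_HOME>/<profile>/docker.sock.
func colimaSocketPath(home, profile string) string {
	return filepath.Join(colimaHomeDir(home), profile, "docker.sock")
}

// colimaCmd builds a colima command whose environment points COLIMA_HOME at
// the managed directory. The variable is appended last so it overrides any
// value inherited from the parent environment.
func colimaCmd(home string, args ...string) *exec.Cmd {
	cmd := exec.Command("colima", args...)
	cmd.Env = append(os.Environ(), "COLIMA_HOME="+colimaHomeDir(home))
	return cmd
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		panic(err)
	}
	cmd := colimaCmd(home, "status")
	fmt.Println(cmd.Env[len(cmd.Env)-1])
	fmt.Println(colimaSocketPath(home, "default"))
}
```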

The || true was masking failures of the entire && chain including
mkdir and printf. Use a subshell to isolate the best-effort DNS
cache flush so resolver file creation errors still propagate.
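
The difference between the two shell forms can be checked with stand-in commands. In this sketch, buildScript and exitCode are hypothetical helpers, `true` and `false` stand in for the real steps, and the resolver file contents printed in main are assumed. The exact script in the PR may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// buildScript chains required commands with && and appends the best-effort
// commands in a subshell that always exits 0, so only failures of required
// steps propagate.
func buildScript(required, bestEffort []string) string {
	parts := append([]string{}, required...)
	if len(bestEffort) > 0 {
		parts = append(parts, "("+strings.Join(bestEffort, "; ")+" || true)")
	}
	return strings.Join(parts, " && ")
}

// exitCode runs script with sh -c and returns its exit status, or -1 if the
// shell could not be started.
func exitCode(script string) int {
	err := exec.Command("sh", "-c", script).Run()
	if err == nil {
		return 0
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode()
	}
	return -1
}

func main() {
	// Printed only, never executed here.
	fmt.Println(buildScript(
		[]string{"mkdir -p /etc/resolver", `printf 'nameserver 127.0.0.1\n' > /etc/resolver/test`},
		[]string{"dscacheutil -flushcache", "killall -HUP mDNSResponder"},
	))

	// Old form: the trailing || true hides the failing required step.
	fmt.Println("masked exit code:", exitCode("true && false && true || true"))
	// New form: the failure of the required step propagates.
	fmt.Println("isolated exit code:", exitCode(buildScript([]string{"true", "false"}, []string{"true"})))
}
```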

When macOS version cannot be detected or parsed, log a warning
before returning nil so there's a diagnostic trail. Also lowercase
the error string per Go conventions.

Capture forceStop and Delete errors and append them to the final
error message so users can diagnose why VM recovery failed. Also
suppress subprocess output in forceStop to avoid noisy terminal
output during automated recovery.

Check SIGKILL result and confirm the process actually exited before
reporting success. Previously, pv stop could report success while
the process was still running, causing 'address in use' on next start.
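
The stop sequence, including the exit confirmation described here, can be sketched with the signal delivery abstracted behind callbacks. All names are hypothetical; in pv the callbacks would presumably send SIGTERM and SIGKILL to the PID read from the PID file, and the exited channel would be fed by whatever liveness check the project uses.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// stopWithEscalation asks a process to terminate, waits up to grace for it
// to exit, then escalates to kill and waits again. It reports whether it had
// to escalate, and returns an error if the process is still alive at the end.
func stopWithEscalation(term, kill func() error, exited <-chan struct{}, grace time.Duration) (bool, error) {
	if err := term(); err != nil {
		return false, fmt.Errorf("send SIGTERM: %w", err)
	}
	select {
	case <-exited:
		return false, nil
	case <-time.After(grace):
	}

	if err := kill(); err != nil {
		return true, fmt.Errorf("send SIGKILL: %w", err)
	}
	// Do not report success until the exit has actually been observed.
	select {
	case <-exited:
		return true, nil
	case <-time.After(grace):
		return true, errors.New("process still running after SIGKILL")
	}
}

func main() {
	exited := make(chan struct{})
	// Simulate a stubborn process: SIGTERM is ignored, SIGKILL ends it.
	term := func() error { return nil }
	kill := func() error { close(exited); return nil }

	escalated, err := stopWithEscalation(term, kill, exited, 100*time.Millisecond)
	fmt.Println("escalated:", escalated, "err:", err)
}
```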

Add table-driven tests covering: zero-value, partial, negative, valid,
and identity cases for WithDefaults(). Also fix setup wizard to carry
existing VM config through when saving settings, preventing silent
reset of user's CPU/memory/disk configuration.
@munezaclovis munezaclovis merged commit 2648c18 into main Mar 25, 2026
1 check failed
@munezaclovis munezaclovis deleted the feat/dns-colima-reliability-fixes branch March 25, 2026 05:30