20170717 try surf (junk) #1
Closed
rfay wants to merge 26 commits into
89ace32 to
ed6399b
rfay pushed a commit that referenced this pull request on Apr 30, 2018
Problem:

```
The current host header value does not match the configured trusted hosts pattern! Check the pattern defined in $GLOBALS['TYPO3_CONF_VARS']['SYS']['trustedHostsPattern'] and adapt it, if you want to allow the current host header 'website.ddev.local:8080' for your installation.
```
rfay added a commit that referenced this pull request on Apr 1, 2026
…eContainer 30m hangs

## The Issue

- Related to ddev#8265
- ContainerInspect polling still produces 30m hangs on Lima/Colima-VZ

On macOS with Lima (both lima-VZ and colima-VZ), `RunSimpleContainer` calls for trivial commands (cat a file, ls, push traefik config) are timing out at exactly 30 minutes with:

"timed out after 30m0s waiting for container X to stop"

PR ddev#8265 replaced `ContainerWait` (which hung indefinitely) with `ContainerInspect` polling and a 30-minute context deadline. The 30m timeout now fires where previously we saw 4h hangs, but the root cause is not yet resolved.

## How This PR Solves The Issue

Adds `util.Debug` logging around the `ContainerInspect` polling loop to distinguish between two failure modes:

**Mode A**: `ContainerInspect` blocks on the socket proxy (Lima/Colima) and doesn't return until the 30-minute context deadline fires. Each individual call blocks. Symptom: "attempt #1" logged, "returned after" never logged until ~30m later.

**Mode B**: `ContainerInspect` returns quickly but reports `Running=true` for 30 minutes because the Docker daemon on Lima has stale/incorrect container state. Symptom: rapid "returned after Xms" messages all showing Running=true.

With `DDEV_DEBUG=true` the logs will show which mode is occurring, enabling a targeted fix.

## Candidate Fixes (to be applied once root cause is confirmed)

### If Mode A (ContainerInspect blocks):

The fix is a per-call short timeout using goroutines, since Go context cancellation may not unblock a stuck OS-level socket read on Lima's proxy:

```go
const perCallTimeout = 10 * time.Second
for {
    ch := make(chan inspectResult, 1)
    go func() {
        callCtx, cancel := context.WithTimeout(context.Background(), perCallTimeout)
        defer cancel()
        info, err := apiClient.ContainerInspect(callCtx, c.ID, ...)
        ch <- inspectResult{info, err}
    }()
    select {
    case <-waitCtx.Done():
        return timeout error
    case res := <-ch:
        if res.err == nil && !res.info.State.Running {
            break
        }
        // err or still running: fall through to tick
    }
    select {
    case <-waitCtx.Done():
        return timeout error
    case <-tickChan.C:
    }
}
```

Goroutine leak is bounded (max timeout/perCallTimeout per call) and acceptable. If the container exits and ContainerInspect subsequently hangs once, the goroutine for that call leaks but the next call returns quickly and we proceed.

### If Mode B (stale Running=true):

The Docker daemon on Lima isn't getting container exit events. Options:

- Use `docker` CLI via `exec.CommandContext` to check state (fresh socket connection each call)
- Force-kill the container after a shorter threshold (e.g. 60s) if it's still "Running" but was started for a trivial command
- Investigate Lima's Docker daemon event propagation

### Other considerations

- Both failures have been seen on lima-VZ and colima-VZ builds, not on other platforms
- The commands involved are trivial: read a file, list directory contents, push traefik config
- A container running `cat file && exit` should complete in <100ms

## Manual Testing Instructions

1. On a Lima or Colima-VZ Mac: `DDEV_DEBUG=true ddev start` for a project that triggers `GetExistingDBType` or Traefik config push
2. Look for `RunSimpleContainer: ContainerInspect attempt #1` in output
3. Check if "returned after" appears immediately or only after 30m
4. Report which mode is occurring

## Automated Testing Overview

No new tests - this is a diagnostic-only change to gather information for the fix.

## Release/Deployment Notes

Debug-only logging - no behavior change. Logs only appear with `DDEV_DEBUG=true`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
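The Mode A candidate fix in the commit message above can be sketched as a standalone program. This is only an illustration under stated assumptions, not the ddev implementation: `inspectFunc`, `waitForStop`, `simulate`, and `errWaitTimeout` are hypothetical names, and `inspectFunc` stands in for the real `ContainerInspect` call so the sketch runs without Docker. Two deliberate deviations from the pseudocode are worth noting: the loop exits with `return` (a bare `break` inside a `select` leaves only the `select`, not the enclosing `for`), and the first `select` gains a `time.After(perCallTimeout)` case so that a call which ignores its context is abandoned rather than waited on.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// inspectFunc is a hypothetical stand-in for a ContainerInspect call.
// It reports whether the container is still running.
type inspectFunc func(ctx context.Context) (running bool, err error)

type inspectResult struct {
	running bool
	err     error
}

var errWaitTimeout = errors.New("timed out waiting for container to stop")

// waitForStop polls inspect until it reports running=false or waitCtx expires.
// Each call runs in its own goroutine and is abandoned after perCallTimeout.
func waitForStop(waitCtx context.Context, inspect inspectFunc, perCallTimeout, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		// Buffered so an abandoned goroutine can still send and exit.
		ch := make(chan inspectResult, 1)
		go func() {
			callCtx, cancel := context.WithTimeout(context.Background(), perCallTimeout)
			defer cancel()
			running, err := inspect(callCtx)
			ch <- inspectResult{running, err}
		}()

		select {
		case <-waitCtx.Done():
			return errWaitTimeout
		case res := <-ch:
			if res.err == nil && !res.running {
				return nil // container has stopped
			}
			// Error or still running: wait for the next tick.
		case <-time.After(perCallTimeout):
			// Assumption: the call ignored its context; give up on it.
		}

		select {
		case <-waitCtx.Done():
			return errWaitTimeout
		case <-ticker.C:
		}
	}
}

// simulate drives waitForStop with a fake inspect that reports "stopped"
// from call number stopAfter onward. If hangFirst is set, the first call
// sleeps past the per-call timeout and ignores its context.
func simulate(stopAfter int, hangFirst bool) (int, error) {
	var calls int32
	inspect := func(ctx context.Context) (bool, error) {
		n := atomic.AddInt32(&calls, 1)
		if hangFirst && n == 1 {
			time.Sleep(200 * time.Millisecond)
		}
		return int(n) < stopAfter, nil
	}
	waitCtx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	err := waitForStop(waitCtx, inspect, 50*time.Millisecond, 5*time.Millisecond)
	return int(atomic.LoadInt32(&calls)), err
}

func main() {
	calls, err := simulate(3, false)
	fmt.Println("healthy inspect:", calls, "calls, err =", err)
	calls, err = simulate(2, true)
	fmt.Println("first call hangs:", calls, "calls, err =", err)
}
```

In the second simulation the hung call is dropped after the per-call timeout and the next attempt succeeds, which is the bounded-leak behavior the commit message describes. Whether a stuck socket read on Lima's proxy behaves like the simulated sleep is an open question that the diagnostic logging is meant to answer.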
rfay added a commit that referenced this pull request on Apr 11, 2026
…ip ci]

On Windows-native, findWindowsPortProcesses returning empty was treated as "IN USE (unable to identify)" because the isPortFree check was gated behind !nodeps.IsWindows(). This produced false positives for every free port.

Add an explicit isPortFree check on Windows before the "unable to identify" path so that genuinely free ports report Available.

Found during Windows native manual testing (test matrix scenario #1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
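The ordering this commit describes (check whether the port is free before falling back to "unable to identify") can be illustrated with a small sketch. It is not the ddev code: the body of `isPortFree` here is an assumption that simply tries to open a TCP listener on 127.0.0.1, `portStatus` and `occupyPort` are hypothetical helpers, and the `owners` argument stands in for whatever `findWindowsPortProcesses` returns.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// isPortFree reports whether a TCP listener can be opened on 127.0.0.1:port.
// Assumption: the real check may differ; this is the simplest probe.
func isPortFree(port int) bool {
	ln, err := net.Listen("tcp", net.JoinHostPort("127.0.0.1", strconv.Itoa(port)))
	if err != nil {
		return false
	}
	_ = ln.Close()
	return true
}

// portStatus is a hypothetical helper. owners stands in for the process
// lookup result; an empty list no longer implies the port is in use.
func portStatus(port int, owners []string) string {
	if len(owners) > 0 {
		return "IN USE by " + strings.Join(owners, ", ")
	}
	// Explicit free-port check before the "unable to identify" path.
	if isPortFree(port) {
		return "Available"
	}
	return "IN USE (unable to identify)"
}

// occupyPort listens on an ephemeral port and returns it with a release func.
func occupyPort() (int, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, nil, err
	}
	port := ln.Addr().(*net.TCPAddr).Port
	return port, func() { _ = ln.Close() }, nil
}

func main() {
	port, release, err := occupyPort()
	if err != nil {
		fmt.Println("could not open a test listener:", err)
		return
	}
	fmt.Println("while held:", portStatus(port, nil))
	release()
	fmt.Println("after release:", portStatus(port, nil))
}
```

With an empty owner list, the status now depends on the listener probe rather than defaulting to "IN USE", which is the false positive the commit removes.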
The Problem:
The Fix:
The Test:
Automation Overview:
Related Issue Link(s):
Release/Deployment notes: