-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Terminating sync session is stuck forever on Docker Desktop for Mac Edge #223
Comments
Thanks for bringing this to my attention. I don't see anything in the recent Docker for Mac Edge release notes that would explain this, so I'll have to try to reproduce and debug it. I'm traveling at the moment, but I'll try to have a look as soon as possible. Do you know if it also happens on the 2.3.5.0 build of Docker for Mac Edge? |
I have not tried 2.3.5.0 as my goal is a speedy development environment. I am really sorry, I have no time to spare for such an experiment. However, it is so easy to reproduce, so it should not be a problem for you to check it out with 2.3.5.0 in the background while you making some other stuff. |
my first try with mutagen compose after mutagen was removed in Docker For Mac Edge, and I get this same issue - terminating session never ends. Using Docker For Mac Edge 2.3.5.0 |
Having the same issue on Edge 2.3.5.0. I think (part of) the issue relies in the sidecar container. If I run |
I can still repro this on edge 2.3.6.1. |
We were using SIGKILL for transport process termination, but this led to zombie processes in certain cases (e.g. in issue #223). This commit switches to using SIGTERM, which is not only more idiomatic, but also doesn't have this problem. Of course, SIGTERM has some downsides, namely that transport processes can ignore it and may not forward it properly, but that doesn't really happen with our transports (SSH and Docker), and even if it did we would still just block in a wait call until someone kill -9'd the offending process. The only reason we were using SIGKILL to begin with was because it was offered by a convenience method. On Windows, we stick with os.Process.Kill, but this isn't sending a signal, it's using TerminateProcess. Signed-off-by: Jacob Howard <jacob@havoc.io>
Thanks for all of the reports here, they were very helpful. I think I've diagnosed the issue, which is that Docker for Mac's I'm sorry I didn't catch this during earlier testing—I primarily use a standalone It seems like the reason things work when the sidecar container is shut down is that the So there are two solutions: either switch to a forward-able termination signal (e.g. the more appropriate The current use of I've just released v0.12.0-beta2 with this fix (as well as an update to Go 1.15.2). Please let me know if this issue still occurs. |
@havoc-io I can confirm that this is fixed on edge 2.3.6.1 with mutagen v0.12.0-beta2. 🎉 |
We were using SIGKILL for transport process termination, but this led to zombie processes in certain cases (e.g. in issue #223). This commit switches to using SIGTERM, which is not only more idiomatic, but also doesn't have this problem. Of course, SIGTERM has some downsides, namely that transport processes can ignore it and may not forward it properly, but that doesn't really happen with our transports (SSH and Docker), and even if it did we would still just block in a wait call until someone kill -9'd the offending process. The only reason we were using SIGKILL to begin with was because it was offered by a convenience method. On Windows, we stick with os.Process.Kill, but this isn't sending a signal, it's using TerminateProcess. This is a backport from v0.12.x. Signed-off-by: Jacob Howard <jacob@havoc.io>
We've been bitten again by golang/go#23019, so rather than a quick hack, it's time to properly fix agent termination signaling. While the original plan was to use killpg and Windows job objects to manage agent process hierarchies, it quickly became clear when trying to implement that behavior that the APIs needed to accomplish it simply weren't there on POSIX or Windows. Moreover, it became even less clear what the correct signaling and forceful termination mechanisms should be, especially since Mutagen has no insight into transport process hierarchies. Thus, Mutagen now takes a staged approach to transport process termination, first waiting, then closing standard input, then (on POSIX) sending SIGTERM, and finally forcing termination. It relies on transport processes to correctly forward these mechanisms to the Mutagen agent and to manage their own internal process hierarchies accordingly once the Mutagen agent terminates. This commit also takes the opportunity to impose a size limit on the in-memory error buffer used to capture transport errors after a handshake failure. Updates #223 Updates mutagen-io/mutagen-compose#11 Signed-off-by: Jacob Howard <jacob@mutagen.io>
We've already seen several manifestations of golang/go#23019, so this commit refactors the way that agent input, output, and error streams are managed, as well as the way that agent process termination is signaled. Historically (in 6c1a47c), we avoided closing standard input/output pipes because they were blocking, meaning that a close wouldn't preempt a read/write and that the close itself could potentially block. However, this hasn't been the case since Go 1.9, when os.File was switched to use polling I/O (at least for pipes and other pollable files). As such, we can now use closure of standard input as a signal to agent processes (via their intermediate transport processes) that they should terminate. Failing that, we still fall back to signal-based termination, but this standard input closure mechanism is particularly important on Windows, where no "soft" signaling mechanism (like SIGTERM) exists and transport process termination via TerminateProcess often triggers golang/go#23019. This is especially problematic with Docker Desktop, where an intermediate com.docker.cli process is used underneath the primary Docker CLI, and standard input closure is the only reliable termination signaling mechanism. Just to clarify, there are mechanisms like WM_CLOSE and CTRL_CLOSE_EVENT on Windows, which some runtimes (such as Go's) will convert to a SIGTERM, but there's no telling how intermediate transport processes will interpret these messages. They don't necessarily have the same semantics as SIGTERM. And, just in case none of our signaling mechanisms works, we now avoid using os/exec.Cmd's forwarding Goroutines entirely, meaning that its Wait method will close all pipes as soon as the child process is terminated. As part of this refactor, I've also looked at switching to a more systematic approach to manage the process hierarchies that could potentially be generated by transport processes. Things like killpg on POSIX or job objects on Windows could theoretically facilitate such management. However, the fact of the matter is that there's simply no way to reliably enforce termination of such hierarchies and, more importantly, no way for Mutagen to know what termination signaling mechanism would be appropriate for any intermediate processes. Essentially, we just have to rely on transport processes to either correctly forwarded standard input closure (especially on Windows) and/or correctly forward SIGTERM on POSIX. But, if they don't, we will forcibly terminate them and any associated resources in the Mutagen daemon. If their subprocesses linger on afterward, that's a bug in the transport process, not Mutagen. This commit also takes the opportunity to impose a size limit on the in-memory error buffer used to capture transport errors after a handshake failure. Updates #223 Updates mutagen-io/mutagen-compose#11 Signed-off-by: Jacob Howard <jacob@mutagen.io>
We've already seen several manifestations of golang/go#23019, so this commit refactors the way that agent input, output, and error streams are managed, as well as the way that agent process termination is signaled. Historically (in 6c1a47c), we avoided closing standard input/output pipes because they were blocking, meaning that a close wouldn't preempt a read/write and that the close itself could potentially block. However, this hasn't been the case since Go 1.9, when os.File was switched to use polling I/O (at least for pipes and other pollable files). As such, we can now use closure of standard input as a signal to agent processes (via their intermediate transport processes) that they should terminate. Failing that, we still fall back to signal-based termination, but this standard input closure mechanism is particularly important on Windows, where no "soft" signaling mechanism (like SIGTERM) exists and transport process termination via TerminateProcess often triggers golang/go#23019. This is especially problematic with Docker Desktop, where an intermediate com.docker.cli process is used underneath the primary Docker CLI, and standard input closure is the only reliable termination signaling mechanism. Just to clarify, there are mechanisms like WM_CLOSE and CTRL_CLOSE_EVENT on Windows, which some runtimes (such as Go's) will convert to a SIGTERM, but there's no telling how intermediate transport processes will interpret these messages. They don't necessarily have the same semantics as SIGTERM. And, just in case none of our signaling mechanisms works, we now avoid using os/exec.Cmd's forwarding Goroutines entirely, meaning that its Wait method will close all pipes as soon as the child process is terminated. As part of this refactor, I've also looked at switching to a more systematic approach to manage the process hierarchies that could potentially be generated by transport processes. Things like killpg on POSIX or job objects on Windows could theoretically facilitate such management. However, the fact of the matter is that there's simply no way to reliably enforce termination of such hierarchies and, more importantly, no way for Mutagen to know what termination signaling mechanism would be appropriate for any intermediate processes. Essentially, we just have to rely on transport processes to either correctly forwarded standard input closure (especially on Windows) and/or correctly forward SIGTERM on POSIX. But, if they don't, we will forcibly terminate them and any associated resources in the Mutagen daemon. If their subprocesses linger on afterward, that's a bug in the transport process, not Mutagen. This commit also takes the opportunity to impose a size limit on the in-memory error buffer used to capture transport errors after a handshake failure. Updates #223 Updates mutagen-io/mutagen-compose#11 Backport of 8556e07 Signed-off-by: Jacob Howard <jacob@mutagen.io>
We've already seen several manifestations of golang/go#23019, so this commit refactors the way that agent input, output, and error streams are managed, as well as the way that agent process termination is signaled. Historically (in 6c1a47c), we avoided closing standard input/output pipes because they were blocking, meaning that a close wouldn't preempt a read/write and that the close itself could potentially block. However, this hasn't been the case since Go 1.9, when os.File was switched to use polling I/O (at least for pipes and other pollable files). As such, we can now use closure of standard input as a signal to agent processes (via their intermediate transport processes) that they should terminate. Failing that, we still fall back to signal-based termination, but this standard input closure mechanism is particularly important on Windows, where no "soft" signaling mechanism (like SIGTERM) exists and transport process termination via TerminateProcess often triggers golang/go#23019. This is especially problematic with Docker Desktop, where an intermediate com.docker.cli process is used underneath the primary Docker CLI, and standard input closure is the only reliable termination signaling mechanism. Just to clarify, there are mechanisms like WM_CLOSE and CTRL_CLOSE_EVENT on Windows, which some runtimes (such as Go's) will convert to a SIGTERM, but there's no telling how intermediate transport processes will interpret these messages. They don't necessarily have the same semantics as SIGTERM. And, just in case none of our signaling mechanisms works, we now avoid using os/exec.Cmd's forwarding Goroutines entirely, meaning that its Wait method will close all pipes as soon as the child process is terminated. As part of this refactor, I've also looked at switching to a more systematic approach to manage the process hierarchies that could potentially be generated by transport processes. Things like killpg on POSIX or job objects on Windows could theoretically facilitate such management. However, the fact of the matter is that there's simply no way to reliably enforce termination of such hierarchies and, more importantly, no way for Mutagen to know what termination signaling mechanism would be appropriate for any intermediate processes. Essentially, we just have to rely on transport processes to either correctly forwarded standard input closure (especially on Windows) and/or correctly forward SIGTERM on POSIX. But, if they don't, we will forcibly terminate them and any associated resources in the Mutagen daemon. If their subprocesses linger on afterward, that's a bug in the transport process, not Mutagen. This commit also takes the opportunity to impose a size limit on the in-memory error buffer used to capture transport errors after a handshake failure. Updates #223 Updates mutagen-io/mutagen-compose#11 Backport of 8556e07 Signed-off-by: Jacob Howard <jacob@mutagen.io>
We've already seen several manifestations of golang/go#23019, so this commit refactors the way that agent input, output, and error streams are managed, as well as the way that agent process termination is signaled. Historically (in mutagen-io/mutagen@6c1a47c), we avoided closing standard input/output pipes because they were blocking, meaning that a close wouldn't preempt a read/write and that the close itself could potentially block. However, this hasn't been the case since Go 1.9, when os.File was switched to use polling I/O (at least for pipes and other pollable files). As such, we can now use closure of standard input as a signal to agent processes (via their intermediate transport processes) that they should terminate. Failing that, we still fall back to signal-based termination, but this standard input closure mechanism is particularly important on Windows, where no "soft" signaling mechanism (like SIGTERM) exists and transport process termination via TerminateProcess often triggers golang/go#23019. This is especially problematic with Docker Desktop, where an intermediate com.docker.cli process is used underneath the primary Docker CLI, and standard input closure is the only reliable termination signaling mechanism. Just to clarify, there are mechanisms like WM_CLOSE and CTRL_CLOSE_EVENT on Windows, which some runtimes (such as Go's) will convert to a SIGTERM, but there's no telling how intermediate transport processes will interpret these messages. They don't necessarily have the same semantics as SIGTERM. And, just in case none of our signaling mechanisms works, we now avoid using os/exec.Cmd's forwarding Goroutines entirely, meaning that its Wait method will close all pipes as soon as the child process is terminated. As part of this refactor, I've also looked at switching to a more systematic approach to manage the process hierarchies that could potentially be generated by transport processes. Things like killpg on POSIX or job objects on Windows could theoretically facilitate such management. However, the fact of the matter is that there's simply no way to reliably enforce termination of such hierarchies and, more importantly, no way for Mutagen to know what termination signaling mechanism would be appropriate for any intermediate processes. Essentially, we just have to rely on transport processes to either correctly forwarded standard input closure (especially on Windows) and/or correctly forward SIGTERM on POSIX. But, if they don't, we will forcibly terminate them and any associated resources in the Mutagen daemon. If their subprocesses linger on afterward, that's a bug in the transport process, not Mutagen. This commit also takes the opportunity to impose a size limit on the in-memory error buffer used to capture transport errors after a handshake failure. Updates mutagen-io#223 Updates mutagen-io/mutagen-compose#11 Signed-off-by: Jacob Howard <jacob@mutagen.io>
Which version of Mutagen are you using (
mutagen version
)?0.12.0-beta1
Which operating system (platform/version/architecture) are you using?
MacOS 10.14.6 (18G6020)
What is the issue that you're experiencing?
A terminating sync session is stuck forever.
What are the steps to reproduce the issue?
brew install mutagen-io/mutagen/mutagen-beta
cd /tmp
git clone https://github.com/mutagen-io/example-voting-app.git ./voting
cd ./voting
mutagen compose up -d
mutagen compose down
ORmutagen sync terminate vote
What is the expected result of these steps in the absence of the issue?
The compose project is stopped. The sync session is terminated.
What is the actual result?
And this longs for hours.
Is there any other information that might be helpful?
Running:
In a different terminal, the list shows 2 identical sessions (not always 2). The status can be different: "Watching files" or "Error", and it does not matter in the case.
mutagen daemon stop
stops daemon, but leaves .lock file. Somutagen daemon stop && rm ~/.mutagen/daemon/daemon.lock && mutagen daemon start
helps, however the problem persists.The only way to fix a problem:
mutagen daemon stop && rm -rf ~/.mutagen
The text was updated successfully, but these errors were encountered: