mutagen-compose down fails #11
Comments
Bump @xenoscopic - anything you'd recommend I try here to mitigate the issue? I'm happy to share more logs, run a beta build, make a video capture, or hop on screen share. Cheers! |
I apologize that I haven't gotten back to you on this yet (especially after such a detailed report). I'll have a look at this tomorrow morning and let you know if I need anything else, but hopefully it should be pretty easy to suss out. |
@xenoscopic no worries and I appreciate any attention you can give it 👍 |
So, reviewing the issue, it seems like what's blocking shutdown is the pause operation on the Mutagen synchronization session. I've had sporadic reports of session pausing/termination hanging with Docker, but the only way I've been able to reproduce this is by trying to pause/terminate a Mutagen synchronization session targeting a paused Docker container (which is a known issue that will be solved soon with an additional heartbeat mechanism). You didn't mention it, so I assume the Mutagen container is not paused in this case, but it's entirely possible a similar issue is preventing the sync session from shutting down. Are you able to reproduce this hang at least somewhat reliably? If so, it would be really useful if I could get a stack trace of the Mutagen daemon on your system once this hang occurs. Unfortunately this is a little bit complicated and would require doing a custom build of Mutagen, but I could provide instructions if you're willing/able to reproduce. |
Willing and able - seems to happen every time for me |
Awesome! Here are the basic instructions. Some minor tweaking may be necessary depending on your exact system setup, but if something doesn't work, let me know - definitely don't spend hours digging into it 😅. Delve can be a little finicky, but it should work. First, you'll need a Go 1.17 toolchain installed and in your path. Then, the setup is basically the following:
# Install Delve (assuming you have $GOPATH/bin in your path), otherwise install
# instructions are available here:
# https://github.com/go-delve/delve/tree/master/Documentation/installation
go install github.com/go-delve/delve/cmd/dlv@latest
# Create a debuggable Mutagen build
git clone https://github.com/mutagen-io/mutagen.git
cd mutagen
git checkout v0.13.0
go run scripts/build.go --mode=slim # ignore warnings about cgo/macOS
# Stop the existing Mutagen daemon (if running - ignore errors if not)
mutagen daemon stop
# Run the daemon in a debugger
dlv exec ./build/mutagen.exe -- daemon run
(dlv) continue
# In a separate terminal, start up your Compose project, then tear it down to
# the point where it hangs on "Pausing session ..."
# Then, back in the debugger, hit CTRL+C, and then type
(dlv) goroutines -t
# ...and copy this output...
If you can copy that output, either here or to a Gist, I should be able to figure out exactly what's hanging. I may have some follow-up questions, but hopefully it should be all the info I need. Feel free to audit the output for any info that needs to be redacted, but there shouldn't be any user-visible data in it. Thanks in advance! |
@xenoscopic thanks for the clear instructions! Here is the output:
Let me know if you need anything else. Cheers! |
We were previously hesitant to close standard input because the close could block indefinitely if writes were pending, but with the poll-based pipe I/O introduced in Go 1.10 (see golang/go@187957d), this should no longer be an issue, and it's going to be necessary (for now) to work around the issues described in golang/go#23019, which are causing Docker transport shutdown hangs. Updates mutagen-io/mutagen-compose#11 Signed-off-by: Jacob Howard <jacob@mutagen.io>
Excellent, thank you very much! As expected, that pointed right to the issue. It would have been really difficult to figure out without the stack trace, so thanks again! I'm fairly certain it's the problem addressed by this proposal and this proposal. In Mutagen's case, the issue is that (with Docker Desktop) the So as a quick fix, I've added closure of the standard input pipe, which should trickle down to I've got a build testing this change, but it would be really helpful if you could test whether or not it actually fixes the issue for you before I tag a release. Would you be able to replace
Then try to bring the Compose project up/down and see if it still hangs. If it does, can you redo the Delve procedure (with this new |
We were previously hesitant to close standard input pipes because the close operation could block indefinitely if writes were pending. However, with the poll-based pipe I/O introduced in Go 1.10 (see golang/go@187957d), this should no longer be an issue, and it's going to be necessary (for now) to work around the issues described in golang/go#23019, which are causing Docker transport shutdown hangs. Essentially, closing the standard input pipe should (in theory) signal to the com.docker.cli process that it's time to shut down. Updates mutagen-io/mutagen-compose#11 Signed-off-by: Jacob Howard <jacob@mutagen.io>
@xenoscopic I rebuilt and ran as described above using the new |
@darrylkuhn Awesome, thanks for all your help. I've got a bit more testing and validation I want to do for this change tomorrow, but I'll get it tagged into v0.13.1 and get that out the door ASAP. I'll close out this issue once that's shipped. |
Fantastic - appreciate it! |
We've been bitten again by golang/go#23019, so rather than a quick hack, it's time to properly fix agent termination signaling. While the original plan was to use killpg and Windows job objects to manage agent process hierarchies, it quickly became clear when trying to implement that behavior that the APIs needed to accomplish it simply weren't there on POSIX or Windows. Moreover, it became even less clear what the correct signaling and forceful termination mechanisms should be, especially since Mutagen has no insight into transport process hierarchies. Thus, Mutagen now takes a staged approach to transport process termination, first waiting, then closing standard input, then (on POSIX) sending SIGTERM, and finally forcing termination. It relies on transport processes to correctly forward these mechanisms to the Mutagen agent and to manage their own internal process hierarchies accordingly once the Mutagen agent terminates. This commit also takes the opportunity to impose a size limit on the in-memory error buffer used to capture transport errors after a handshake failure. Updates #223 Updates mutagen-io/mutagen-compose#11 Signed-off-by: Jacob Howard <jacob@mutagen.io>
Just an additional update: while the quick hack will work, I've decided to address this more comprehensively in mutagen-io/mutagen@b910ae7. I'll need to do a bit more testing with this tomorrow since it's such a core change, but should be able to ship it by the end of the weekend. |
We've already seen several manifestations of golang/go#23019, so this commit refactors the way that agent input, output, and error streams are managed, as well as the way that agent process termination is signaled. Historically (in 6c1a47c), we avoided closing standard input/output pipes because they were blocking, meaning that a close wouldn't preempt a read/write and that the close itself could potentially block. However, this hasn't been the case since Go 1.9, when os.File was switched to use polling I/O (at least for pipes and other pollable files). As such, we can now use closure of standard input as a signal to agent processes (via their intermediate transport processes) that they should terminate. Failing that, we still fall back to signal-based termination, but this standard input closure mechanism is particularly important on Windows, where no "soft" signaling mechanism (like SIGTERM) exists and transport process termination via TerminateProcess often triggers golang/go#23019. This is especially problematic with Docker Desktop, where an intermediate com.docker.cli process is used underneath the primary Docker CLI, and standard input closure is the only reliable termination signaling mechanism. Just to clarify, there are mechanisms like WM_CLOSE and CTRL_CLOSE_EVENT on Windows, which some runtimes (such as Go's) will convert to a SIGTERM, but there's no telling how intermediate transport processes will interpret these messages. They don't necessarily have the same semantics as SIGTERM. And, just in case none of our signaling mechanisms works, we now avoid using os/exec.Cmd's forwarding Goroutines entirely, meaning that its Wait method will close all pipes as soon as the child process is terminated. As part of this refactor, I've also looked at switching to a more systematic approach to manage the process hierarchies that could potentially be generated by transport processes. 
Things like killpg on POSIX or job objects on Windows could theoretically facilitate such management. However, the fact of the matter is that there's simply no way to reliably enforce termination of such hierarchies and, more importantly, no way for Mutagen to know what termination signaling mechanism would be appropriate for any intermediate processes. Essentially, we just have to rely on transport processes to correctly forward standard input closure (especially on Windows) and/or correctly forward SIGTERM on POSIX. But, if they don't, we will forcibly terminate them and any associated resources in the Mutagen daemon. If their subprocesses linger on afterward, that's a bug in the transport process, not Mutagen. This commit also takes the opportunity to impose a size limit on the in-memory error buffer used to capture transport errors after a handshake failure. Updates #223 Updates mutagen-io/mutagen-compose#11 Signed-off-by: Jacob Howard <jacob@mutagen.io>
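The size limit on the in-memory error buffer mentioned at the end of the commit message can be pictured with a small sketch. The type name `boundedBuffer`, the truncation policy, and the limit used below are all assumptions for illustration; the commit message does not say how Mutagen enforces its limit.

```go
package main

import (
	"bytes"
	"fmt"
)

// boundedBuffer (hypothetical) is an io.Writer that stores at most limit
// bytes and silently drops anything beyond that.
type boundedBuffer struct {
	limit     int
	buf       bytes.Buffer
	truncated bool
}

// Write stores as much of p as still fits. It always reports len(p) so that
// whatever is copying the child's error stream never sees a short write.
func (b *boundedBuffer) Write(p []byte) (int, error) {
	if room := b.limit - b.buf.Len(); room < len(p) {
		if room > 0 {
			b.buf.Write(p[:room])
		}
		b.truncated = true
	} else {
		b.buf.Write(p)
	}
	return len(p), nil
}

func main() {
	b := &boundedBuffer{limit: 8}
	b.Write([]byte("hello "))
	b.Write([]byte("world"))
	fmt.Printf("%q truncated=%v\n", b.buf.String(), b.truncated)
}
```

Capping the buffer this way keeps a misbehaving transport process from growing daemon memory without bound while still preserving the start of its error output.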
We've already seen several manifestations of golang/go#23019, so this commit refactors the way that agent input, output, and error streams are managed, as well as the way that agent process termination is signaled. Historically (in 6c1a47c), we avoided closing standard input/output pipes because they were blocking, meaning that a close wouldn't preempt a read/write and that the close itself could potentially block. However, this hasn't been the case since Go 1.9, when os.File was switched to use polling I/O (at least for pipes and other pollable files). As such, we can now use closure of standard input as a signal to agent processes (via their intermediate transport processes) that they should terminate. Failing that, we still fall back to signal-based termination, but this standard input closure mechanism is particularly important on Windows, where no "soft" signaling mechanism (like SIGTERM) exists and transport process termination via TerminateProcess often triggers golang/go#23019. This is especially problematic with Docker Desktop, where an intermediate com.docker.cli process is used underneath the primary Docker CLI, and standard input closure is the only reliable termination signaling mechanism. Just to clarify, there are mechanisms like WM_CLOSE and CTRL_CLOSE_EVENT on Windows, which some runtimes (such as Go's) will convert to a SIGTERM, but there's no telling how intermediate transport processes will interpret these messages. They don't necessarily have the same semantics as SIGTERM. And, just in case none of our signaling mechanisms works, we now avoid using os/exec.Cmd's forwarding Goroutines entirely, meaning that its Wait method will close all pipes as soon as the child process is terminated. As part of this refactor, I've also looked at switching to a more systematic approach to manage the process hierarchies that could potentially be generated by transport processes. 
Things like killpg on POSIX or job objects on Windows could theoretically facilitate such management. However, the fact of the matter is that there's simply no way to reliably enforce termination of such hierarchies and, more importantly, no way for Mutagen to know what termination signaling mechanism would be appropriate for any intermediate processes. Essentially, we just have to rely on transport processes to correctly forward standard input closure (especially on Windows) and/or correctly forward SIGTERM on POSIX. But, if they don't, we will forcibly terminate them and any associated resources in the Mutagen daemon. If their subprocesses linger on afterward, that's a bug in the transport process, not Mutagen. This commit also takes the opportunity to impose a size limit on the in-memory error buffer used to capture transport errors after a handshake failure. Updates #223 Updates mutagen-io/mutagen-compose#11 Backport of 8556e07 Signed-off-by: Jacob Howard <jacob@mutagen.io>
This should now be fixed in Mutagen/Mutagen Compose v0.13.1 (note that you'll have to update both due to their version matching requirement at the moment). Please let me know if you run into trouble with this newer fix. It's not exactly the same as the code you tried, but it's the same idea. Thanks again for your help. |
Downloaded and installed |
We've already seen several manifestations of golang/go#23019, so this commit refactors the way that agent input, output, and error streams are managed, as well as the way that agent process termination is signaled. Historically (in mutagen-io/mutagen@6c1a47c), we avoided closing standard input/output pipes because they were blocking, meaning that a close wouldn't preempt a read/write and that the close itself could potentially block. However, this hasn't been the case since Go 1.9, when os.File was switched to use polling I/O (at least for pipes and other pollable files). As such, we can now use closure of standard input as a signal to agent processes (via their intermediate transport processes) that they should terminate. Failing that, we still fall back to signal-based termination, but this standard input closure mechanism is particularly important on Windows, where no "soft" signaling mechanism (like SIGTERM) exists and transport process termination via TerminateProcess often triggers golang/go#23019. This is especially problematic with Docker Desktop, where an intermediate com.docker.cli process is used underneath the primary Docker CLI, and standard input closure is the only reliable termination signaling mechanism. Just to clarify, there are mechanisms like WM_CLOSE and CTRL_CLOSE_EVENT on Windows, which some runtimes (such as Go's) will convert to a SIGTERM, but there's no telling how intermediate transport processes will interpret these messages. They don't necessarily have the same semantics as SIGTERM. And, just in case none of our signaling mechanisms works, we now avoid using os/exec.Cmd's forwarding Goroutines entirely, meaning that its Wait method will close all pipes as soon as the child process is terminated. As part of this refactor, I've also looked at switching to a more systematic approach to manage the process hierarchies that could potentially be generated by transport processes. 
Things like killpg on POSIX or job objects on Windows could theoretically facilitate such management. However, the fact of the matter is that there's simply no way to reliably enforce termination of such hierarchies and, more importantly, no way for Mutagen to know what termination signaling mechanism would be appropriate for any intermediate processes. Essentially, we just have to rely on transport processes to correctly forward standard input closure (especially on Windows) and/or correctly forward SIGTERM on POSIX. But, if they don't, we will forcibly terminate them and any associated resources in the Mutagen daemon. If their subprocesses linger on afterward, that's a bug in the transport process, not Mutagen. This commit also takes the opportunity to impose a size limit on the in-memory error buffer used to capture transport errors after a handshake failure. Updates mutagen-io#223 Updates mutagen-io/mutagen-compose#11 Signed-off-by: Jacob Howard <jacob@mutagen.io>
I'm new to mutagen and mutagen compose but have been able to get things up and running (what an improvement to our hot module reloading times as compared to bind mounts!). I am running into one issue though (with mutagen-compose, I think). I bring the orchestration up with
mutagen-compose --project-name vue up
and everything works as expected. However, when I run
mutagen-compose --project-name vue down
the vue container comes down but the mutagen sidecar and sync session just keep running:
So far as I can tell, the SIGTERM is not making it to the container from the mutagen-compose down command. When I issue a docker container stop vue-mutagen-1, the rest of the orchestration comes down (network, etc.) and the sync is stopped and removed.
I am running:
And for reference here is the mutagen compose file:
Happy to provide any other details, logs, etc...