
mutagen-compose down fails #11

Closed
darrylkuhn opened this issue Jan 19, 2022 · 14 comments
@darrylkuhn

I'm new to Mutagen and Mutagen Compose but have been able to get things up and running (what an improvement in our hot module reloading times compared to bind mounts!). I am running into one issue though (with mutagen-compose, I think). I bring the orchestration up with mutagen-compose --project-name vue up and everything works as expected; however, when I run mutagen-compose --project-name vue down, the vue container comes down but the mutagen sidecar and sync session just keep running:

 - Container vue-mutagen-1  Stopping                                                                             336.0s
 - Container vue-vue-1      Removed                                                                               11.6s
 - Mutagen                  Pausing session sync_afv4HMEvIQedDHD9zYR5nCQm7gokr7gazBO0BU8xtz6...                  336.0s

As far as I can tell, the SIGTERM from the mutagen-compose down command is not making it to the container. When I issue docker container stop vue-mutagen-1, the rest of the orchestration comes down (network, etc.) and the sync is stopped and removed.

I am running:

  • Windows 11 Enterprise 21H2 (OS build 22000.434)
  • Mutagen version 0.13.0
  • Compose version v2.2.3
  • Docker version v20.10.7 (backed by WSL 2)

And for reference here is the mutagen compose file:

# This compose file must be run using mutagen-compose - please be sure you have
# it installed (see: https://mutagen.io/documentation/orchestration/compose)
version: '3.4'

services:
  vue:
    image: "node:14-alpine"
    volumes:
      # use a mutagen sync to address performance issues of bind mounts
      - "vuesrc:/web"
      # write node_modules to a named volume to reduce sync io
      - "node_modules:/web/node_modules" 
    working_dir: /web
    # We use the existence of the /web/node_modules/.bin directory as a proxy
    # for whether or not node modules are already installed. The command below
    # tests whether it exists and, if not, runs npm install (which takes a
    # while); in either case it then runs the serve command.
    command:
      - /bin/sh
      - -c
      - |
        test -d /web/node_modules/.bin || npm install
        npm run serve
    ports:
     - "8080:8080"
    environment:
     # Required to support hot module reloading
     - CHOKIDAR_USEPOLLING=true
x-mutagen:
  sync:
    defaults:
      ignore:
        vcs: true
        paths:
          - "/node_modules"
    vuesrc:
      alpha: "."
      beta: "volume://vuesrc"
volumes:
  vuesrc:
  node_modules:

Happy to provide any other details, logs, etc...

@darrylkuhn (Author)

Bump @xenoscopic - anything you'd recommend I try here to mitigate the issue? I'm happy to share more logs, run a beta build, make a video capture, or hop on screen share. Cheers!

@xenoscopic (Member)

I apologize that I haven't gotten back to you on this yet (especially after such a detailed report). I'll have a look at this tomorrow morning and let you know if I need anything else, but hopefully it should be pretty easy to suss out.

@darrylkuhn (Author)

@xenoscopic no worries and I appreciate any attention you can give it 👍

@xenoscopic (Member)

So, reviewing the issue, it seems like what's blocking shutdown is the pause operation on the Mutagen synchronization session. I've had sporadic reports of session pausing/termination hanging with Docker, but the only way I've been able to reproduce this is by trying to pause/terminate a Mutagen synchronization session targeting a paused Docker container (which is a known issue that will be solved soon with an additional heartbeat mechanism).

You didn't mention it, so I assume the Mutagen container is not paused in this case, but it's entirely possible a similar issue is preventing the sync session from shutting down.

Are you able to reproduce this hang at least somewhat reliably? If so, it would be really useful if I could get a stack trace of the Mutagen daemon on your system once this hang occurs. Unfortunately this is a little bit complicated and would require doing a custom build of Mutagen, but I could provide instructions if you're willing/able to reproduce.

@darrylkuhn (Author)

Willing and able; it seems to happen every time for me.

@xenoscopic (Member)

Awesome! Here are the basic instructions. Some minor tweaking may be necessary depending on your exact system setup, but if something doesn't work, let me know - definitely don't spend hours digging into it 😅. Delve can be a little finicky, but it should work.

First, you'll need a Go 1.17 toolchain installed and in your path.

Then, the setup is basically the following:

# Install Delve (assuming you have $GOPATH/bin in your path), otherwise install
# instructions are available here:
# https://github.com/go-delve/delve/tree/master/Documentation/installation
go install github.com/go-delve/delve/cmd/dlv@latest

# Create a debuggable Mutagen build
git clone https://github.com/mutagen-io/mutagen.git
cd mutagen
git checkout v0.13.0
go run scripts/build.go --mode=slim   # ignore warnings about cgo/macOS

# Stop the existing Mutagen daemon (if running - ignore errors if not)
mutagen daemon stop

# Run the daemon in a debugger
dlv exec ./build/mutagen.exe -- daemon run
(dlv) continue

# In a separate terminal, start up your Compose project, then tear it down to
# the point where it hangs on "Pausing session ..."

# Then, back in the debugger, hit CTRL+C, and then type
(dlv) goroutines -t
# ...and copy this output...

If you can copy that output, either here or to a Gist, I should be able to figure out exactly what's hanging. I may have some follow-up questions, but hopefully it should be all the info I need. Feel free to audit the output for any info that needs to be redacted, but there shouldn't be any user-visible data in it.

Thanks in advance!

@darrylkuhn (Author)

@xenoscopic thanks for the clear instructions! Here is the output:

PS C:\Users\DarrylKuhn\source\mutagen> dlv exec .\build\mutagen.exe -- daemon run
Type 'help' for list of commands.
(dlv) continue
2022/02/02 08:40:42 [forwarding|info] Looking for existing sessions
2022/02/02 08:40:42 [forwarding|info] Session manager initialized
2022/02/02 08:40:42 [synchronization|info] Looking for existing sessions
2022/02/02 08:40:42 [synchronization|info] Loading session sync_kUvNxRuXM33krgJxAkd6M1e8OKCewX8W9D85VWXZmtp
2022/02/02 08:40:42 [synchronization.sync_kUvNxRuXM33krgJxAkd6M1e8OKCewX8W9D85VWXZmtp|info] Session loaded
2022/02/02 08:40:42 [synchronization|info] Session manager initialized
2022/02/02 08:41:16 [synchronization.sync_S3d2G5WizvG7p7PZTiyadq6NRPsgNw4qMbRG5adjgD3|info] Connecting to alpha endpoint
2022/02/02 08:41:16 [synchronization.sync_S3d2G5WizvG7p7PZTiyadq6NRPsgNw4qMbRG5adjgD3|info] Connecting to beta endpoint
2022/02/02 08:41:20 [synchronization.sync_S3d2G5WizvG7p7PZTiyadq6NRPsgNw4qMbRG5adjgD3.beta.remote|info] [housekeeping|info] Performing initial housekeeping
2022/02/02 08:41:20 [synchronization.sync_S3d2G5WizvG7p7PZTiyadq6NRPsgNw4qMbRG5adjgD3|info] Starting synchronization loop
2022/02/02 08:41:20 [synchronization.sync_S3d2G5WizvG7p7PZTiyadq6NRPsgNw4qMbRG5adjgD3|info] Session initialized
received SIGINT, stopping process (will not forward signal)
2022/02/02 08:43:23 [synchronization|info] Shutting down
2022/02/02 08:43:23 [synchronization|info] Halting session sync_kUvNxRuXM33krgJxAkd6M1e8OKCewX8W9D85VWXZmtp
2022/02/02 08:43:23 [synchronization|info] Halting session sync_S3d2G5WizvG7p7PZTiyadq6NRPsgNw4qMbRG5adjgD3
Stopped at: 0x7fffe7cc3dc4
=>   1: no source available
(dlv) goroutines -t
  Goroutine 1 - User: c:/program files/go/src/runtime/proc.go:6382 sync.runtime_doSpin (0xc7f01a) (thread 15404)
         0  0x0000000000c84346 in runtime.procyield
             at c:/program files/go/src/runtime/asm_amd64.s:657
         1  0x0000000000c7f01a in sync.runtime_doSpin
             at c:/program files/go/src/runtime/proc.go:6382
         2  0x0000000000c9c225 in sync.(*Mutex).lockSlow
             at c:/program files/go/src/sync/mutex.go:101
         3  0x000000000109fabb in sync.(*Mutex).Lock
             at c:/program files/go/src/sync/mutex.go:81
         4  0x000000000109fabb in github.com/mutagen-io/mutagen/pkg/synchronization.(*controller).halt
             at c:/users/darrylkuhn/source/mutagen/pkg/synchronization/controller.go:563
         5  0x00000000010a5876 in github.com/mutagen-io/mutagen/pkg/synchronization.(*Manager).Shutdown
             at c:/users/darrylkuhn/source/mutagen/pkg/synchronization/manager.go:194
         6  0x00000000010daf86 in github.com/mutagen-io/mutagen/cmd/mutagen/daemon.runMain·dwrap·4
             at c:/users/darrylkuhn/source/mutagen/cmd/mutagen/daemon/run.go:60
         7  0x00000000010dac40 in github.com/mutagen-io/mutagen/cmd/mutagen/daemon.runMain
             at c:/users/darrylkuhn/source/mutagen/cmd/mutagen/daemon/run.go:113
         8  0x0000000000da7a8e in github.com/spf13/cobra.(*Command).execute
             at c:/users/darrylkuhn/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:856
         9  0x0000000000da817c in github.com/spf13/cobra.(*Command).ExecuteC
             at c:/users/darrylkuhn/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:974
        10  0x00000000010f8431 in github.com/spf13/cobra.(*Command).Execute
             at c:/users/darrylkuhn/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:902
        11  0x00000000010f8431 in main.main
             at c:/users/darrylkuhn/source/mutagen/cmd/mutagen/main.go:115
        (truncated)
  Goroutine 2 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [force gc (idle) 456404h17m47.1402569s]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c59471 in runtime.goparkunlock
            at c:/program files/go/src/runtime/proc.go:372
        2  0x0000000000c59471 in runtime.forcegchelper
            at c:/program files/go/src/runtime/proc.go:306
        3  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 3 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC sweep wait]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c43238 in runtime.goparkunlock
            at c:/program files/go/src/runtime/proc.go:372
        2  0x0000000000c43238 in runtime.bgsweep
            at c:/program files/go/src/runtime/mgcsweep.go:182
        3  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 4 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC scavenge wait]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c414a8 in runtime.goparkunlock
            at c:/program files/go/src/runtime/proc.go:372
        2  0x0000000000c414a8 in runtime.bgscavenge
            at c:/program files/go/src/runtime/mgcscavenge.go:314
        3  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 5 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [finalizer wait 456404h17m47.1422572s]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c38c73 in runtime.goparkunlock
            at c:/program files/go/src/runtime/proc.go:372
        2  0x0000000000c38c73 in runtime.runfinq
            at c:/program files/go/src/runtime/mfinal.go:177
        3  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 6 - User: c:/program files/go/src/runtime/sigqueue.go:169 os/signal.signal_recv (0xc7ff18) (thread 2740)
        0  0x00007fffe7cc33e4 in ???
            at ?:-1
        1  0x0000000000c82400 in runtime.systemstack_switch
            at c:/program files/go/src/runtime/asm_amd64.s:350
        2  0x0000000000c52f5d in runtime.stdcall
            at c:/program files/go/src/runtime/os_windows.go:1068
        3  0x0000000000c53105 in runtime.stdcall2
            at :0
        4  0x0000000000c5232f in runtime.semasleep
            at c:/program files/go/src/runtime/os_windows.go:818
        5  0x0000000000c2bb96 in runtime.notetsleep_internal
            at c:/program files/go/src/runtime/lock_sema.go:214
        6  0x0000000000c2becf in runtime.notetsleepg
            at c:/program files/go/src/runtime/lock_sema.go:296
        7  0x0000000000c7ff18 in os/signal.signal_recv
            at c:/program files/go/src/runtime/sigqueue.go:169
        8  0x0000000001044f79 in os/signal.loop
            at c:/program files/go/src/os/signal/signal_unix.go:24
        9  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 11 - User: c:/program files/go/src/runtime/syscall_windows.go:497 syscall.Syscall6 (0xc80dba) (thread 8244)
        0  0x00007fffe7cc3484 in ???
            at ?:-1
        1  0x0000000000c82400 in runtime.systemstack_switch
            at c:/program files/go/src/runtime/asm_amd64.s:350
        2  0x0000000000c2430c in runtime.cgocall
            at c:/program files/go/src/runtime/cgocall.go:166
        3  0x0000000000c80dba in syscall.Syscall6
            at c:/program files/go/src/runtime/syscall_windows.go:497
        4  0x0000000000c86f1a in syscall.Syscall6
            at :0
        5  0x0000000001030f74 in github.com/Microsoft/go-winio.getQueuedCompletionStatus
            at c:/users/darrylkuhn/go/pkg/mod/github.com/!microsoft/go-winio@v0.5.1/zsyscall_windows.go:357
        6  0x000000000102d1aa in github.com/Microsoft/go-winio.ioCompletionProcessor
            at c:/users/darrylkuhn/go/pkg/mod/github.com/!microsoft/go-winio@v0.5.1/file.go:167
        7  0x000000000102cd66 in github.com/Microsoft/go-winio.initIo·dwrap·1
            at c:/users/darrylkuhn/go/pkg/mod/github.com/!microsoft/go-winio@v0.5.1/file.go:73
        8  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 31 - User: c:/program files/go/src/runtime/syscall_windows.go:497 syscall.Syscall6 (0xc80dba) (thread 18068)
         0  0x00007fffe7cc3424 in ???
             at ?:-1
         1  0x0000000000c82400 in runtime.systemstack_switch
             at c:/program files/go/src/runtime/asm_amd64.s:350
         2  0x0000000000c2430c in runtime.cgocall
             at c:/program files/go/src/runtime/cgocall.go:166
         3  0x0000000000c80dba in syscall.Syscall6
             at c:/program files/go/src/runtime/syscall_windows.go:497
         4  0x0000000000c86f1a in syscall.Syscall6
             at :0
         5  0x0000000000cd201e in syscall.ReadFile
             at c:/program files/go/src/syscall/zsyscall_windows.go:1024
         6  0x0000000000ccac2e in syscall.Read
             at c:/program files/go/src/syscall/syscall_windows.go:380
         7  0x0000000000cf4a54 in internal/poll.(*FD).Read
             at c:/program files/go/src/internal/poll/fd_windows.go:427
         8  0x0000000000cfdfde in os.(*File).read
             at c:/program files/go/src/os/file_posix.go:32
         9  0x0000000000cfdfde in os.(*File).Read
             at c:/program files/go/src/os/file.go:119
        10  0x0000000000cc3972 in io.copyBuffer
             at c:/program files/go/src/io/io.go:423
        (truncated)
  Goroutine 50 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC worker (idle) 456404h17m47.2254867s]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c3b725 in runtime.gcBgMarkWorker
            at c:/program files/go/src/runtime/mgc.go:1200
        2  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 51 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC worker (idle)]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c3b725 in runtime.gcBgMarkWorker
            at c:/program files/go/src/runtime/mgc.go:1200
        2  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 52 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC worker (idle) 456404h16m30.2663036s]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c3b725 in runtime.gcBgMarkWorker
            at c:/program files/go/src/runtime/mgc.go:1200
        2  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 66 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC worker (idle)]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c3b725 in runtime.gcBgMarkWorker
            at c:/program files/go/src/runtime/mgc.go:1200
        2  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 67 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC worker (idle)]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c3b725 in runtime.gcBgMarkWorker
            at c:/program files/go/src/runtime/mgc.go:1200
        2  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 82 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC worker (idle)]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c3b725 in runtime.gcBgMarkWorker
            at c:/program files/go/src/runtime/mgc.go:1200
        2  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 98 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC worker (idle)]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c3b725 in runtime.gcBgMarkWorker
            at c:/program files/go/src/runtime/mgc.go:1200
        2  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 114 - User: c:/program files/go/src/runtime/proc.go:367 runtime.gopark (0xc595d6) [GC worker (idle)]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c3b725 in runtime.gcBgMarkWorker
            at c:/program files/go/src/runtime/mgc.go:1200
        2  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 149 - User: c:/users/darrylkuhn/source/mutagen/pkg/process/connection.go:159 github.com/mutagen-io/mutagen/pkg/process.(*Stream).Close (0x103c6fa) [chan receive 456404h16m30.2980813s]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c269cc in runtime.chanrecv
            at c:/program files/go/src/runtime/chan.go:576
        2  0x0000000000c263f8 in runtime.chanrecv1
            at c:/program files/go/src/runtime/chan.go:439
        3  0x000000000103c6fa in github.com/mutagen-io/mutagen/pkg/process.(*Stream).Close
            at c:/users/darrylkuhn/source/mutagen/pkg/process/connection.go:159
        4  0x00000000010d47a2 in github.com/mutagen-io/mutagen/pkg/synchronization/endpoint/remote.(*endpointClient).Shutdown
            at c:/users/darrylkuhn/source/mutagen/pkg/synchronization/endpoint/remote/client.go:452
        5  0x00000000010a0667 in github.com/mutagen-io/mutagen/pkg/synchronization.(*controller).run
            at c:/users/darrylkuhn/source/mutagen/pkg/synchronization/controller.go:781
        6  0x000000000109dc3e in github.com/mutagen-io/mutagen/pkg/synchronization.newSession·dwrap·1
            at c:/users/darrylkuhn/source/mutagen/pkg/synchronization/controller.go:232
        7  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 214 - User: c:/program files/go/src/runtime/sema.go:56 sync.runtime_Semacquire (0xc7f785) [semacquire]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c686b3 in runtime.goparkunlock
            at c:/program files/go/src/runtime/proc.go:372
        2  0x0000000000c686b3 in runtime.semacquire1
            at c:/program files/go/src/runtime/sema.go:144
        3  0x0000000000c7f785 in sync.runtime_Semacquire
            at c:/program files/go/src/runtime/sema.go:56
        4  0x0000000000c9db71 in sync.(*WaitGroup).Wait
            at c:/program files/go/src/sync/waitgroup.go:130
        5  0x00000000010163cc in google.golang.org/grpc.(*Server).serveStreams
            at c:/users/darrylkuhn/go/pkg/mod/google.golang.org/grpc@v1.43.0/server.go:931
        6  0x0000000001015ca6 in google.golang.org/grpc.(*Server).handleRawConn.func1
            at c:/users/darrylkuhn/go/pkg/mod/google.golang.org/grpc@v1.43.0/server.go:847
        7  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 215 - User: c:/users/darrylkuhn/source/mutagen/pkg/synchronization/controller.go:578 github.com/mutagen-io/mutagen/pkg/synchronization.(*controller).halt (0x109fb45) [chan receive 456404h16m30.3312283s]
         0  0x0000000000c595d6 in runtime.gopark
             at c:/program files/go/src/runtime/proc.go:367
         1  0x0000000000c269cc in runtime.chanrecv
             at c:/program files/go/src/runtime/chan.go:576
         2  0x0000000000c263f8 in runtime.chanrecv1
             at c:/program files/go/src/runtime/chan.go:439
         3  0x000000000109fb45 in github.com/mutagen-io/mutagen/pkg/synchronization.(*controller).halt
             at c:/users/darrylkuhn/source/mutagen/pkg/synchronization/controller.go:578
         4  0x00000000010a66bc in github.com/mutagen-io/mutagen/pkg/synchronization.(*Manager).Pause
             at c:/users/darrylkuhn/source/mutagen/pkg/synchronization/manager.go:375
         5  0x00000000010ae969 in github.com/mutagen-io/mutagen/pkg/service/synchronization.(*Server).Pause
             at c:/users/darrylkuhn/source/mutagen/pkg/service/synchronization/server.go:97
         6  0x00000000010b3f50 in github.com/mutagen-io/mutagen/pkg/service/synchronization._Synchronization_Pause_Handler
             at c:/users/darrylkuhn/source/mutagen/pkg/service/synchronization/synchronization_grpc.pb.go:231
         7  0x0000000001018c6f in google.golang.org/grpc.(*Server).processUnaryRPC
             at c:/users/darrylkuhn/go/pkg/mod/google.golang.org/grpc@v1.43.0/server.go:1282
         8  0x000000000101ca4a in google.golang.org/grpc.(*Server).handleStream
             at c:/users/darrylkuhn/go/pkg/mod/google.golang.org/grpc@v1.43.0/server.go:1616
         9  0x0000000001016798 in google.golang.org/grpc.(*Server).serveStreams.func1.2
             at c:/users/darrylkuhn/go/pkg/mod/google.golang.org/grpc@v1.43.0/server.go:921
        10  0x0000000000c84781 in runtime.goexit
             at c:/program files/go/src/runtime/asm_amd64.s:1581
  Goroutine 216 - User: c:/program files/go/src/os/exec/exec.go:515 os/exec.(*Cmd).Wait (0xe49ca8) [chan receive 456404h16m30.3589574s]
        0  0x0000000000c595d6 in runtime.gopark
            at c:/program files/go/src/runtime/proc.go:367
        1  0x0000000000c269cc in runtime.chanrecv
            at c:/program files/go/src/runtime/chan.go:576
        2  0x0000000000c263f8 in runtime.chanrecv1
            at c:/program files/go/src/runtime/chan.go:439
        3  0x0000000000e49ca8 in os/exec.(*Cmd).Wait
            at c:/program files/go/src/os/exec/exec.go:515
        4  0x000000000103c869 in github.com/mutagen-io/mutagen/pkg/process.(*Stream).Close.func1
            at c:/users/darrylkuhn/source/mutagen/pkg/process/connection.go:109
        5  0x0000000000c84781 in runtime.goexit
            at c:/program files/go/src/runtime/asm_amd64.s:1581
[20 goroutines]
(dlv)

Let me know if you need anything else. Cheers!

xenoscopic added a commit to mutagen-io/mutagen that referenced this issue Feb 2, 2022
We were previously hesitant to close standard input because the close
could block indefinitely if writes were pending, but with the poll-based
pipe I/O introduced in Go 1.10 (see golang/go@187957d), this should no
longer be an issue, and it's going to be necessary (for now) to work
around the issues described in golang/go#23019, which are causing Docker
transport shutdown hangs.

Updates mutagen-io/mutagen-compose#11

Signed-off-by: Jacob Howard <jacob@mutagen.io>
@xenoscopic (Member) commented Feb 2, 2022

Excellent, thank you very much!

As expected, that pointed right to the issue. It would have been really difficult to figure out without the stack trace, so thanks again!

I'm fairly certain it's the problem addressed by these two proposals.

In Mutagen's case, the issue is that (with Docker Desktop) the docker command launches an internal com.docker.cli command to actually do its work. When Mutagen wants to terminate its agent processes running via docker exec, it issues SIGTERM on POSIX and uses TerminateProcess on Windows. The problem is that (for reasons I don't fully understand, the code being closed-source) the termination doesn't always propagate to com.docker.cli (and I'm actually not sure that it should via TerminateProcess). However, since com.docker.cli has inherited standard input/output/error from docker, those pipes stay open, and the Wait operation that Mutagen performs here ends up waiting on a Goroutine that's waiting for an EOF from com.docker.cli's standard error output that will never come.
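
To make the failure mode concrete, here is a minimal, self-contained reproduction of the golang/go#23019 pattern. This is an illustrative POSIX sketch, not Mutagen's actual transport code: the /bin/sh child stands in for docker, and the backgrounded sleep stands in for com.docker.cli.

package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// The child immediately backgrounds a long-lived grandchild (which
	// inherits the child's stdout/stderr pipe write ends) and then exits.
	cmd := exec.Command("/bin/sh", "-c", "sleep 300 & exit 0")

	// Because Stdout/Stderr are not *os.File values, os/exec creates OS
	// pipes plus internal copying goroutines, and Wait blocks until those
	// goroutines see EOF.
	var output bytes.Buffer
	cmd.Stdout = &output
	cmd.Stderr = &output

	if err := cmd.Start(); err != nil {
		panic(err)
	}

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	select {
	case err := <-done:
		fmt.Println("Wait returned:", err)
	case <-time.After(2 * time.Second):
		// The child exited almost immediately, but the grandchild still
		// holds the pipe write ends open, so Wait is stuck waiting for EOF.
		fmt.Println("Wait is hanging: the grandchild keeps the pipes open")
	}
}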

So as a quick fix, I've added closure of the standard input pipe, which should trickle down to com.docker.cli and trigger the mutagen-agent process to exit on its own. Mutagen used to do this a really long time ago, but OS pipes used to be blocking in Go, so I abandoned that behavior. However, since Go 1.9, pipes are non-blocking, so this should no longer be an issue, and should be sufficient to signal termination to com.docker.cli. It's worth noting that this is only an issue with the Docker Desktop docker command.
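
Here is a sketch of that quick-fix idea, with cat standing in for a transport process that exits once its standard input reaches EOF (this is simplified and not the actual connection.go change): the transport's standard input becomes an explicit pipe, and closing that pipe delivers an EOF that signals shutdown, letting Wait return.

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// "cat" is a stand-in for a transport process that exits on stdin EOF;
	// for Docker Desktop the real command would be something along the
	// lines of docker exec -i <container> mutagen-agent ...
	cmd := exec.Command("cat")

	stdin, err := cmd.StdinPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// ... normal operation would happen here ...

	// Shutdown: instead of (only) terminating the process, close its
	// standard input. The resulting EOF trickles down to any intermediate
	// process (e.g. com.docker.cli) that inherited the pipe, prompting it
	// to exit, which in turn lets Wait return normally.
	if err := stdin.Close(); err != nil {
		fmt.Println("close stdin:", err)
	}
	fmt.Println("wait:", cmd.Wait())
}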

I've got a build testing this change, but it would be really helpful if you could test whether or not it actually fixes the issue for you before I tag a release.

Would you be able to replace <mutagen>/pkg/process/connection.go (in your v0.13.0 checked-out tree) with this copy and then run:

go run scripts/build.go --mode=slim   # Again, ignore macOS warnings
build/mutagen daemon stop      # Ignore warnings if not running
build/mutagen daemon run

Then try to bring the Compose project up/down and see if it still hangs.

If it does, can you redo the Delve procedure (with this new connection.go code build) and provide the same output?

xenoscopic added a commit to mutagen-io/mutagen that referenced this issue Feb 2, 2022
We were previously hesitant to close standard input pipes because the
close operation could block indefinitely if writes were pending.
However, with the poll-based pipe I/O introduced in Go 1.10 (see
golang/go@187957d), this should no longer be an issue, and it's going to
be necessary (for now) to work around the issues described in
golang/go#23019, which are causing Docker transport shutdown hangs.
Essentially, closing the standard input pipe should (in theory) signal
to the com.docker.cli process that it's time to shut down.

Updates mutagen-io/mutagen-compose#11

Signed-off-by: Jacob Howard <jacob@mutagen.io>
@darrylkuhn (Author)

@xenoscopic I rebuilt and ran as described above using the new connection.go, and I'm happy to report that it worked like a charm.

@xenoscopic (Member)

@darrylkuhn Awesome, thanks for all your help. I've got a bit more testing and validation I want to do for this change tomorrow, but I'll get it tagged into v0.13.1 and get that out the door ASAP. I'll close out this issue once that's shipped.

@darrylkuhn (Author)

Fantastic - appreciate it!

xenoscopic added a commit to mutagen-io/mutagen that referenced this issue Feb 3, 2022
We've been bitten again by golang/go#23019, so rather than a quick hack,
it's time to properly fix agent termination signaling. While the
original plan was to use killpg and Windows job objects to manage agent
process hierarchies, it quickly became clear when trying to implement
that behavior that the APIs needed to accomplish it simply weren't there
on POSIX or Windows. Moreover, it became even less clear what the
correct signaling and forceful termination mechanisms should be,
especially since Mutagen has no insight into transport process
hierarchies. Thus, Mutagen now takes a staged approach to transport
process termination, first waiting, then closing standard input, then
(on POSIX) sending SIGTERM, and finally forcing termination. It relies
on transport processes to correctly forward these mechanisms to the
Mutagen agent and to manage their own internal process hierarchies
accordingly once the Mutagen agent terminates.

This commit also takes the opportunity to impose a size limit on the
in-memory error buffer used to capture transport errors after a
handshake failure.

Updates #223
Updates mutagen-io/mutagen-compose#11

Signed-off-by: Jacob Howard <jacob@mutagen.io>
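
For illustration, the staged escalation described in that commit might be sketched as follows. The helper name terminateStaged and all of the stage timeouts are invented for this example; the real logic lives in Mutagen's agent transport code.

package transport

import (
	"io"
	"os/exec"
	"runtime"
	"syscall"
	"time"
)

// terminateStaged escalates through the termination stages described in the
// commit message: wait, close standard input, SIGTERM (POSIX only), then a
// forced kill. All timeouts are illustrative placeholders.
func terminateStaged(cmd *exec.Cmd, stdin io.Closer) error {
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	waitFor := func(d time.Duration) (bool, error) {
		select {
		case err := <-done:
			return true, err
		case <-time.After(d):
			return false, nil
		}
	}

	// Stage 1: give the process a brief window to exit on its own.
	if ok, err := waitFor(time.Second); ok {
		return err
	}

	// Stage 2: close standard input; EOF is the only reliable "soft"
	// termination signal that also works on Windows.
	stdin.Close()
	if ok, err := waitFor(5 * time.Second); ok {
		return err
	}

	// Stage 3 (POSIX only): send SIGTERM.
	if runtime.GOOS != "windows" {
		cmd.Process.Signal(syscall.SIGTERM)
		if ok, err := waitFor(5 * time.Second); ok {
			return err
		}
	}

	// Stage 4: force termination (SIGKILL on POSIX, TerminateProcess on
	// Windows).
	cmd.Process.Kill()
	return <-done
}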
@xenoscopic (Member)

Just an additional update: while the quick hack will work, I've decided to address this more comprehensively in mutagen-io/mutagen@b910ae7. I'll need to do a bit more testing with this tomorrow since it's such a core change, but should be able to ship it by the end of the weekend.

xenoscopic added a commit to mutagen-io/mutagen that referenced this issue Feb 4, 2022
We've already seen several manifestations of golang/go#23019, so this
commit refactors the way that agent input, output, and error streams are
managed, as well as the way that agent process termination is signaled.

Historically (in 6c1a47c), we avoided closing
standard input/output pipes because they were blocking, meaning that a
close wouldn't preempt a read/write and that the close itself could
potentially block. However, this hasn't been the case since Go 1.9, when
os.File was switched to use polling I/O (at least for pipes and other
pollable files).

As such, we can now use closure of standard input as a signal to agent
processes (via their intermediate transport processes) that they should
terminate. Failing that, we still fall back to signal-based termination,
but this standard input closure mechanism is particularly important on
Windows, where no "soft" signaling mechanism (like SIGTERM) exists and
transport process termination via TerminateProcess often triggers
golang/go#23019. This is especially problematic with Docker Desktop,
where an intermediate com.docker.cli process is used underneath the
primary Docker CLI, and standard input closure is the only reliable
termination signaling mechanism.

Just to clarify, there are mechanisms like WM_CLOSE and CTRL_CLOSE_EVENT
on Windows, which some runtimes (such as Go's) will convert to a
SIGTERM, but there's no telling how intermediate transport processes
will interpret these messages. They don't necessarily have the same
semantics as SIGTERM.

And, just in case none of our signaling mechanisms works, we now avoid
using os/exec.Cmd's forwarding Goroutines entirely, meaning that its
Wait method will close all pipes as soon as the child process is
terminated.

As part of this refactor, I've also looked at switching to a more
systematic approach to manage the process hierarchies that could
potentially be generated by transport processes. Things like killpg on
POSIX or job objects on Windows could theoretically facilitate such
management. However, the fact of the matter is that there's simply no
way to reliably enforce termination of such hierarchies and, more
importantly, no way for Mutagen to know what termination signaling
mechanism would be appropriate for any intermediate processes.
Essentially, we just have to rely on transport processes to either
correctly forward standard input closure (especially on Windows)
and/or correctly forward SIGTERM on POSIX. But, if they don't, we will
forcibly terminate them and any associated resources in the Mutagen
daemon. If their subprocesses linger on afterward, that's a bug in the
transport process, not Mutagen.

This commit also takes the opportunity to impose a size limit on the
in-memory error buffer used to capture transport errors after a
handshake failure.

Updates #223
Updates mutagen-io/mutagen-compose#11

Signed-off-by: Jacob Howard <jacob@mutagen.io>
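
One point in the commit worth unpacking is avoiding os/exec.Cmd's forwarding Goroutines: when the Stdin/Stdout/Stderr fields hold real *os.File values, os/exec passes the descriptors directly to the child and spawns no internal copy loops, so Wait returns as soon as the child exits, regardless of which grandchildren still hold the pipes. A hypothetical sketch of that wiring (not Mutagen's actual stream plumbing):

package transport

import (
	"os"
	"os/exec"
)

// startWithOwnedPipes starts a command with explicit os.Pipe-backed standard
// streams. Because the Cmd fields receive *os.File values, os/exec creates no
// forwarding goroutines, so Wait won't block on pipe EOFs held open by
// orphaned grandchildren. The helper name and error handling are illustrative.
func startWithOwnedPipes(name string, args ...string) (cmd *exec.Cmd, stdin *os.File, output *os.File, err error) {
	stdinR, stdinW, err := os.Pipe()
	if err != nil {
		return nil, nil, nil, err
	}
	outputR, outputW, err := os.Pipe()
	if err != nil {
		stdinR.Close()
		stdinW.Close()
		return nil, nil, nil, err
	}

	cmd = exec.Command(name, args...)
	cmd.Stdin = stdinR   // *os.File: wired directly into the child
	cmd.Stdout = outputW // no internal copy goroutines are created
	cmd.Stderr = outputW

	if err := cmd.Start(); err != nil {
		stdinR.Close()
		stdinW.Close()
		outputR.Close()
		outputW.Close()
		return nil, nil, nil, err
	}

	// The parent must close its copies of the child's pipe ends so that
	// reads on outputR terminate once the child-side writers are gone.
	stdinR.Close()
	outputW.Close()

	return cmd, stdinW, outputR, nil
}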
xenoscopic added a commit to mutagen-io/mutagen that referenced this issue Feb 4, 2022
(Backport of 8556e07; commit message identical to the refactor above.)
xenoscopic added a commit to mutagen-io/mutagen that referenced this issue Feb 4, 2022
(Backport of 8556e07; commit message identical to the refactor above.)
@xenoscopic (Member)

This should now be fixed in Mutagen/Mutagen Compose v0.13.1 (note that you'll have to update both due to their version matching requirement at the moment).

Please let me know if you run into trouble with this newer fix. It's not exactly the same as the code you tried, but it's the same idea.

Thanks again for your help.

@darrylkuhn (Author)

Downloaded and installed v0.13.1 and can confirm the issue remains resolved. Thanks @xenoscopic!

coryb pushed a commit to coryb/mutagen that referenced this issue Feb 23, 2022
(Commit message identical to the refactor above.)