Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix signal handling for subprocess #479

Merged
merged 6 commits into from May 14, 2022

Conversation

marco-m-pix4d
Copy link
Contributor

hello @andreynering, the fix is quite simple :-D, what is complicated and worthwhile to spend some time to understand is the first commit a496a75, that introduces extensive tests to reproduce the behavior.

I realized that to fix this problem, it is not possible to tweak the timeout of mvdan.cc/sh, we must be more drastic and don't pass to it any cancellable context. The problem that I observed is the following (reproduced in the tests):

  • As explained in In case of interrupt signal, Task does not give time to a subprocess to cleanup #458, when hitting CTRL-C, the terminal driver sends a SIGINT to the foreground process group, which is composed of Task and of all the subprocesses that Task has spawned via mvdan.cc/sh.
  • Task receives the signal and forwards it to mvdan.cc/sh, which immediately sends it to the subprocess.
  • The result is that the subprocess receives always 2 signals: one from the process group, and one from Task.
  • Depending on how the program of the subprocess deals with interrupts, this can be unnoticeable or cause immediate termination, which is the case of Terraform.
  • Terraform on reception of the first signals enters a cleanup phase, but if it receives a second signal, it will terminate immediately the cleanup, leaving a broken state.
  • Due to the signal amplification explained above, Terraform always receives two signals when driven from Task...

I suggest to checkout a496a75, run

task install && go test -run '^TestSignalSentToProcessGroup' -count=1 -v=1

and get acquainted with the (expected) failures.

Then, checkout the tip of the branch and run the same tests, to validate that everything is fixed.

There are some things that are still missing: testing what happens on Windows and deciding how far propagate in the Task functions the fact that the context should not even passed. I mean: func RunCommand(ctx context.Context, opts *RunCommandOptions) is now lying, since the context will not be passed to anything.

As a reference, here are the failing tests on Linux:

task install && go test -run '^TestSignalSentToProcessGroup' -count=1 -v=1
task: [install] go install -v -ldflags="-w -s -X main.version=a496a75b05" ./cmd/task
github.com/go-task/task/v3/internal/execext
=== RUN   TestSignalSentToProcessGroup
=== RUN   TestSignalSentToProcessGroup/child_simulates_terraform:_receives_1_sigint_and_does_cleanup
    unix_test.go:174: 
        wanted but not found:
        [sleepit: cleanup done
         task: Failed to run task "default": exit status 3
        ]
    unix_test.go:179: 
        unwanted but found:
        [sleepit: got signal=interrupt count=2
         sleepit: cleanup canceled
         task: Failed to run task "default": exit status 4
        ]
    unix_test.go:183: 
        output:
        [task: [default] /home/mmolteni/opensource/go-task/tmp/sleepit handle -term-after=2 -sleep=10s -cleanup=50ms
         sleepit: ready
         sleepit: PID=3713003 sleep=10s cleanup=50ms
         sleepit: work started
         task: signal received: interrupt
         sleepit: got signal=interrupt count=1
         sleepit: got signal=interrupt count=2
         sleepit: work canceled
         sleepit: cleanup started
         sleepit: cleanup canceled
         task: Failed to run task "default": exit status 4
         ]
=== RUN   TestSignalSentToProcessGroup/child_does_not_handle_sigint:_receives_sigint_and_terminates_immediately
    unix_test.go:174: 
        wanted but not found:
        [task: Failed to run task "default": exit status 130
        ]
    unix_test.go:179: 
        unwanted but found:
        [task: Failed to run task "default": context canceled
        ]
    unix_test.go:183: 
        output:
        [task: [default] /home/mmolteni/opensource/go-task/tmp/sleepit default -sleep=10s
         sleepit: ready
         sleepit: PID=3713018 sleep=10s cleanup=0s
         sleepit: work started
         task: signal received: interrupt
         task: Failed to run task "default": context canceled
         ]
=== RUN   TestSignalSentToProcessGroup/child_intercepts_sigint:_receives_sigint_and_does_cleanup
    unix_test.go:174: 
        wanted but not found:
        [sleepit: cleanup done
         task: Failed to run task "default": exit status 3
        ]
    unix_test.go:179: 
        unwanted but found:
        [sleepit: got signal=interrupt count=2
         task: Failed to run task "default": context canceled
        ]
    unix_test.go:183: 
        output:
        [task: [default] /home/mmolteni/opensource/go-task/tmp/sleepit handle -sleep=10s -cleanup=2s
         sleepit: ready
         sleepit: PID=3713030 sleep=10s cleanup=2s
         sleepit: work started
         sleepit: got signal=interrupt count=1
         task: signal received: interrupt
         sleepit: work canceled
         sleepit: got signal=interrupt count=2
         sleepit: cleanup started
         task: Failed to run task "default": context canceled
         ]
--- FAIL: TestSignalSentToProcessGroup (2.13s)
    --- FAIL: TestSignalSentToProcessGroup/child_simulates_terraform:_receives_1_sigint_and_does_cleanup (0.11s)
    --- FAIL: TestSignalSentToProcessGroup/child_does_not_handle_sigint:_receives_sigint_and_terminates_immediately (0.01s)
    --- FAIL: TestSignalSentToProcessGroup/child_intercepts_sigint:_receives_sigint_and_does_cleanup (2.01s)
FAIL
exit status 1
FAIL    github.com/go-task/task/v3      2.136s

See go-task/task/go-task#458

Helper (sleepit) and test code based on https://github.com/marco-m/timeit
…c/sh

We used to pass to mvdan.cc/sh/interp.Runner a context that was cancelled on
reception of a OS signal. This caused the Runner to terminate the subprocess
abruptly.

The correct behavior instead is for us to completely ignore the signal and let
the subprocess deal with it. If the subprocess doesn't handle the signal, it
will be terminated. If the subprocess does handle the signal, it knows better
than us wether it wants to cleanup and terminate or do something different.

So now we pass an empty context just to make the API of interp.Runner happy

Fixes go-task/task/go-task#458
@marco-m-pix4d
Copy link
Contributor Author

Ah, the failing tests on CI are due to them not finding the task executable? I will look at them later, the instructions in the PR description do work locally.

@marco-m-pix4d
Copy link
Contributor Author

Hello @andreynering, any opinions ? :-)

@marco-m-pix4d
Copy link
Contributor Author

Updated this PR with current master.

On Github actions, the destination path of "go install" ($GOPATH/bin) is not
in $PATH, thus the error.
For the life of me I could not understand how to change the $PATH environment
variable in an Actions workflow, so I encode the full path of the just-built
task executable in the tests, which probably was the right thing to do since
the beginning.
@neerfri
Copy link

neerfri commented Mar 28, 2022

@andreynering is there a reason why this PR is not being merged? Thanks

@marco-m-pix4d
Copy link
Contributor Author

I cannot speak for Andrey, but this PR works fine for Unix systems, while I don't know what to do for Windows. Thus merging this PR might make the situation on Windows worse. It is a tricky situation and requires somebody to take the time to test /think on Windows...

@ghostsquad
Copy link
Collaborator

I can confirm, this is also what mvdan does in gosh.

https://github.com/mvdan/sh/blob/master/cmd/gosh/main.go#L63-L64

I had to double check my understanding of subprocess signal handling and confirm that indeed, mvdan is spawning subprocesses, not threads.

I think this is a valuable bugfix.

@aliculPix4D
Copy link

Updated this PR with current master + additional commit removing redundant newline linter error

@mvgijssel
Copy link

Running into a similar broken state case using Pulumi and go-task when CTRL-C during an up command. Would love to see this merged ❤️

Copy link
Collaborator

@ghostsquad ghostsquad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is well written, with good tests, lots of comments. Given all tests have passed, and the amount of information provided in this PR, associated issue, in the comments and the tests, I think that even if we run into an issue later, it will be easy to follow along and debug.

@ghostsquad ghostsquad merged commit 367c0b3 into go-task:master May 14, 2022
@ghostsquad
Copy link
Collaborator

Master build failed immediately after merging. It's possible this introduced flaky tests.

@andreynering
Copy link
Member

Sorry for not looking into it sooner guys. Since this is a complex problem I wanted to do some tests to try to understand the problem and fix, but I suppose we can trust the intensive tests @marco-m-pix4d did on this 😉.

Saw that @ghostsquad opened #728. I ask the people interested to test the fix in master and comment about any issues there on #728.

@ghostsquad Small ask: add an entry to CHANGELOG.md whenever a relevant enough feature or fix is added to master (title should be "Unreleased" for now, like this), linking related issue and PRs. I always copy those to the release when the time comes.

@ghostsquad
Copy link
Collaborator

I figured that changelog should be updated only during a release.

@andreynering
Copy link
Member

@ghostsquad Adding there sooner is useful so we don't forgot to add on the release. Also, people interested in the development can see the unreleased changes. Other projects do that, like Ruby on Rails.

@ghostsquad
Copy link
Collaborator

ghostsquad commented May 15, 2022

Automated generation of the CHANGELOG.md and Release Notes attached to the Github Release via something like https://github.com/git-chglog/git-chglog can handle the "so we don't forget to add".

Here's a small (incomplete) snippet of how this can be made possible.

Taskfile.yml

  publish:
     cmds:
     - task: github:env:export
       vars:
         KEY: GIT_TAG
         VALUE: '{{.GIT_TAG}}'
     - <something here>

  github:env:export:
    preconditions:
    - '[ "${CI}" == "true" ]'
    vars:
      KEY: '{{ .KEY | default "TEST_KEY" }}'
      VALUE: '{{ .VALUE | default "TEST_VALUE" }}'
    cmds:
    - echo '{{.KEY}}={{.VALUE}}' >> "${GITHUB_ENV}"
    - echo 'This is accessible using {{ printf `${{ env.%s }}` .KEY }}' 1>&2
  changelog:
    cmds:
    - cmd: git-chglog {{.ARGS | default .CLI_ARGS}}

workflow.yml

      - name: Generate Changelog
        run: task changelog -- ${{ env.GIT_TAG }} > ./tmp/CHANGELOG.md
      # TODO: inline this into task...
      - name: Create a GitHub release
        uses: softprops/action-gh-release@v1
        with:
          token: ${{ secrets.APP_GITHUB_TOKEN }}
          body_path: ./tmp/CHANGELOG.md
          tag_name: ${{ env.GIT_TAG }}

@ghostsquad
Copy link
Collaborator

Oh, though we are already using goreleaser, so this should be taken care of.

@andreynering
Copy link
Member

andreynering commented May 15, 2022

Let's move this kind of discussion to the Discord server form now on to void polluting these issues, but...

The fact is that I want and like to handwrite the changelog entries / release notes because those are meant to be the more concise and legible as possible by users, to understand what changed recently. I really value when other projects that I use do the same as well.

Automation uses commit messages, which are often polluted, unreadable, lacks links to documentation when useful, etc.

andreynering added a commit that referenced this pull request May 22, 2022
andreynering added a commit that referenced this pull request Jun 12, 2022
Task will now give time for the processes running to do cleanup work

Ref #458
Ref #479
Fixes #728

Co-authored-by: Marco Molteni <marco.molteni@pix4d.com>
Co-authored-by: aliculPix4D <aleksandar.licul_ext@pix4d.com>
andreynering added a commit that referenced this pull request Jun 12, 2022
Task will now give time for the processes running to do cleanup work

Ref #458
Ref #479
Fixes #728

Co-authored-by: Marco Molteni <marco.molteni@pix4d.com>
Co-authored-by: aliculPix4D <aleksandar.licul_ext@pix4d.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants