Skip to content
This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

spawn the agent in a process group / job #1174

Closed

Conversation

bmc-msft
Copy link
Contributor

When we terminate the agent, any child processes it's created do not get
terminated but should.

By launching in a process group (or job) using the command-group crate,
we can terminate the child and all of it's children automatically.

When we terminate the agent, any child processes it's created do not get
terminated but should.

By launching in a process group (or job) using the command-group crate,
we can terminate the child and all of it's children automatically.
@bmc-msft
Copy link
Contributor Author

Note, on Windows, command-group uses a job object created with winapi::um::jobapi2::CreateJobObjectW with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE set. On kill, the job object is terminated, which should kill all of the children (and grand children).

@bmc-msft bmc-msft marked this pull request as draft August 23, 2021 22:22
@bmc-msft
Copy link
Contributor Author

All of the windows tasks are failing during integration testing. Investigating.

@bmc-msft
Copy link
Contributor Author

This PR pins to the commit hash associated with a PR that addresses handling timeouts to GetQueuedCompletionStatus.

watchexec/command-group#3

Once this PR, or something like it, has been merged & released, we should move to the released version of command-group.

@bmc-msft bmc-msft linked an issue Sep 7, 2021 that may be closed by this pull request
Copy link
Member

@ranweiler ranweiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overall approach seems reasonable, at least as a short-term improvement.

I guess I'd then ask:

  1. Should we being using this for every child process, or at least many others?
  2. I wouldn't think that the temp dir bug in the onefuzz-agent should be impacted by this at all. In particular, if the supervisor is "done", or knows that it is stopping, it should never be possible for a the onefuzz-agent to cause a task failure, period. Furthermore, if a supervisor is running task A, and we tell it to stop because task A should be stopped, this should already be reflected in the server, and I'd expect any subsequent "task A failed" messages should just be dropped as spurious. If that's not happening, there's a deeper problem we should also be fixing.

An aside, in terms of merging: we should split the process group changes out from the unrelated addition of all the extra context() calls.

@stishkin
Copy link
Contributor

Closing.

Will revisit some time later

@stishkin stishkin closed this Feb 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Terminate fuzzers before node teardown
5 participants