Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleanup created docker containers #198

Merged
merged 1 commit into from
Nov 3, 2022

Conversation

mikz
Copy link
Contributor

@mikz mikz commented Nov 2, 2022

Cancelled or timeouted workflow would keep the docker container running. Closes #197

This has two parts:

Part one. The entrypoints.

runs.post: GitHub Action metadata allow running something after the action (regardless of a failure, crash, timeout, ...). https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions#runspost

However, it needs to be a .js file and it can't be configured to pass any arguments.

There already was index.js used as the main entrypoint. The build process of this file uses typescript compiler and ncc to pack all dependencies into one .js file. And ncc has no way of generating multiple files in one go, so the only solution would be to run ncc twice and generate two independent files.

That would be quite unfortunate, wasting time and storage. So I rather came up with a new entrypoint that symlinked from two locations. And this new entrypoint understands how it was executed, so it can run the correct behaviour. This makes it easy to add runs.pre if needed.

This new entrypoint is in index.ts. The original index.ts is now in main.ts.

Part two.
The signals. I've tried:

  • try/catch/finally around the await Docker.run. Catch and finally are not executed when process receives SIGINT. See the discussion in: Better Signal Handling nodejs/node#29480
  • New AbortController and AbortSignal. Great concept, but the action.exec does not support it. So it can't be aborted.
  • Doing cleanup on process.on('exit'). Unfortunately you can't really do async stuff from there, so can't really run the docker rm command to delete the container.
  • Using process.on('SIGINT'). For some reason that wasn't really executing for me. I'd not put my hand in fire for this, but I assume because it was in the signal handler it does something special, or would heed to be scheduled for later with setTimeout(0).

Evaluating all these I came to a conclusion that it is fragile and just relying on a runs.post is much better and safer.

I've not tested this on Windows, so that probably needs to happen. I've tried running locally with act and it worked. Worked also on my hosted linux runner.

@github-actions
Copy link

github-actions bot commented Nov 2, 2022

Cat Gif

@webbertakken
Copy link
Member

Great explanation! Thanks for the elaborate details, it will be very helpful for merging this.

Evaluating all these I came to a conclusion that it is fragile and just relying on a runs.post is much better and safer.

Ok, based on your research I would agree. Let's go for your approach.

Using process.on('SIGINT'). For some reason that wasn't really executing for me. I'd not put my hand in fire for this, but I assume because it was in the signal handler it does something special, or would heed to be scheduled for later with setTimeout(0).

As I understand it, that's probably because an event listener doesn't stop the event from bubbling and you only have a few milliseconds to do any cleanup.

Copy link
Member

@webbertakken webbertakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty solid to me. Awesome work!

src/index.ts Outdated Show resolved Hide resolved
src/model/docker.ts Outdated Show resolved Hide resolved
src/post.ts Outdated Show resolved Hide resolved
dist/main.js Show resolved Hide resolved
dist/post.js Show resolved Hide resolved
src/post.ts Outdated

export async function run() {
try {
const runnerTemporaryPath = process.env.RUNNER_TEMP;
Copy link
Member

@GabLeRoux GabLeRoux Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked myself what's RUNNER_TEMP, it's documented here:
https://docs.github.com/en/actions/learn-github-actions/environment-variables

RUNNER_TEMP The path to a temporary directory on the runner. This directory is emptied at the beginning and end of each job. Note that files will not be removed if the runner's user account does not have permission to delete them. For example, D:\a_temp

Just a note for other readers, nothing to do here 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken this from the already existing code that relies on it too.

Ideally this would be extracted into own function, so it is evaluated the same on all the branches.
But I didn't want to rewrite the whole docker.ts, since I didn't find any unit tests and wanted to minimize the risk of breaking something.

Copy link
Member

@webbertakken webbertakken Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use https://github.com/game-ci/unity-test-runner/blob/main/src/model/action.ts to get tempPath (does not yet exist, but should have)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've actually implemented something that ended up in Input, but it clearly belongs to the Action. Going to move it there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not want to refactor the command generation of docker which should be changed to use this new tempPath.

It also deserves to not be text, but rather just an array, so it can be passed to the action/exec as args and it would correctly do all the escaping of arguments.

Also it would be easy to reuse some common base for the linux and win commands (cidfile, env, ...).

@mikz mikz force-pushed the cleanup-container branch 2 times, most recently from 5b8e100 to b13aac0 Compare November 3, 2022 09:21
@mikz
Copy link
Contributor Author

mikz commented Nov 3, 2022

I've updated it to clear up things brought in the review.

I've also added some type annotations to the new code, so it explicitly handles default values.
IMO that is important in case this ever gets executed outside of Github Action Runner (like unit tests, or trying it locally).

src/model/input.ts Outdated Show resolved Hide resolved
Cancelled or timeouted workflow would keep the docker container running.
Closes game-ci#197

This has two parts:

Part one. The entrypoints.

`runs.post`: GitHub Action metadata allow running something after the
action (regardless of a failure, crash, timeout, ...).
https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions#runspost

However, it needs to be a `.js` file and it can't be configured to pass
any arguments.

There already was `index.js` used as the main entrypoint.
The build process of this file uses typescript compiler and ncc to pack
all dependencies into one .js file. And ncc has no way of generating
multiple files in one go, so the only solution would be to run ncc twice
and generate two independent files.

That would be quite unfortunate, wasting time and storage. So I rather
came up with a new entrypoint that symlinked from two locations.
And this new entrypoint understands how it was executed, so it can run
the correct behaviour. This makes it easy to add `runs.pre` if needed.

This new entrypoint is in `index.ts`. The original `index.ts` is now in
`main.ts`.

Part two.
The signals. I've tried:
* try/catch/finally around the `await Docker.run`. Catch and finally are
  not executed when process receives SIGINT. See the discussion in: nodejs/node#29480
* New AbortController and AbortSignal. Great concept, but the
  action.exec does not support it. So it can't be aborted.
* Doing cleanup on `process.on('exit')`. Unfortunately you can't really
  do async stuff from there, so can't really run the docker rm command
to delete the container.
* Using `process.on('SIGINT')`. For some reason that wasn't really
  executing for me. I'd not put my hand in fire for this, but I assume
because it was in the signal handler it does something special, or would
heed to be scheduled for later with `setTimeout(0)`.

Evaluating all these I came to a conclusion that it is fragile and just
relying on a `runs.post` is much better and safer.
`
@webbertakken
Copy link
Member

LGTM. This can be merged once it's tested.

@mikz
Copy link
Contributor Author

mikz commented Nov 3, 2022

I'm running this on my hosted runner with concurrency limited cancellable workflows and it works well.
Don't have Windows pipeline to try it on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Does not clean up created docker containers after timeout/cancellation
3 participants