Ephemeral Runners? #182

Closed
HenryNguyen5 opened this issue Sep 1, 2020 · 15 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@HenryNguyen5
Contributor

Problem to solve

As a developer interacting with a public repository, I want to be able to have ephemeral instances so that I can safely use self-hosted runners in a public repo.

Intended users

Users of any public repository that uses GitHub Actions and for which the default GitHub-hosted runners do not provide sufficient resources.

Proposal

  • Have a warm pool of idle runners waiting for a job from GitHub (polling an SQS queue or something similar)
  • When an idle runner gets a job, execute that job and delete the runner when it finishes (the lifetime of the runner is the same as that of the GHA job it executes)
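
A minimal sketch of the lifecycle proposed above, written as an in-memory simulation rather than real AWS or GitHub calls. The names `EphemeralRunner` and `dispatch` are hypothetical and only illustrate the idea that a runner leaves the warm pool for good after a single job.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EphemeralRunner:
    """A runner whose lifetime equals the single job it executes (hypothetical model)."""
    name: str
    jobs_run: list = field(default_factory=list)

    def run(self, job: str) -> str:
        self.jobs_run.append(job)
        return f"{self.name} ran {job}"


def dispatch(pool: deque, queue: deque) -> list:
    """Hand each queued job to an idle runner, then discard that runner."""
    log = []
    while queue and pool:
        runner = pool.popleft()  # take an idle runner out of the warm pool
        log.append(runner.run(queue.popleft()))
        # The runner is deliberately not put back: it is deleted after one job.
    return log


pool = deque(EphemeralRunner(f"runner-{i}") for i in range(3))
queue = deque(["build", "test"])
log = dispatch(pool, queue)
print(log)        # two jobs, each handled by a different runner
print(len(pool))  # one idle runner is left in the pool
```

Because no runner ever executes a second job, nothing a job leaves behind can reach a later job, which is the security property this issue asks for.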

What does success look like, and how can we measure that?

  • Jobs are executed quickly since runners are pre-provisioned
  • Security concerns over persistence of data across jobs are addressed since the lifetime of a runner is tied to a single GitHub job.
@npalm
Member

npalm commented Sep 2, 2020

Thanks for creating the suggestion; I really support this idea. Any ideas for the implementation? There are potentially some caveats:

  • Not every event that we receive from GitHub maps 1-to-1 to a job, so that part would need further adjustment.
  • Detecting that a job has finished is not easy, but it is possible; it may require a small agent on the host.
  • With hot instances, the runner only needs to be configured at the moment a job should start; a command sent via an SSM document seems a logical choice.
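
A rough sketch of the last caveat, assuming the AWS-managed `AWS-RunShellScript` document is used to configure the runner on an already running instance. The function only builds the keyword arguments that could be passed to `boto3.client("ssm").send_command(**request)`; it sends nothing. The function name, the install directory, and the example values are assumptions for illustration, and the sketch leaves out starting the runner process and the fact that Run Command executes as root by default, which `config.sh` refuses unless told otherwise.

```python
import shlex


def build_configure_request(instance_id: str, repo_url: str, token: str,
                            runner_dir: str = "/opt/actions-runner") -> dict:
    """Build send_command arguments that configure a runner (hypothetical helper)."""
    configure = " ".join([
        "./config.sh",
        "--unattended",                  # do not prompt for input
        "--url", shlex.quote(repo_url),
        "--token", shlex.quote(token),   # registration token, assumed to be fetched elsewhere
    ])
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {
            # The commands run as one script, so the cd applies to config.sh.
            "commands": [f"cd {shlex.quote(runner_dir)}", configure],
        },
    }


request = build_configure_request(
    "i-0123456789abcdef0",              # placeholder instance id
    "https://github.com/example/repo",  # placeholder repository
    "TOKEN",                            # placeholder token
)
print(request["Parameters"]["commands"])
```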

@npalm npalm added enhancement New feature or request help wanted Extra attention is needed labels Sep 2, 2020
@directionless

For similar reasons to the OP, this is a feature I would also like. A related project, envoyproxy/ci-infra, does this by having the runners in an ASG and detaching them on job start.

@mrmeyers99

mrmeyers99 commented Jun 16, 2021

Upvoting this as well. I don't want a bad job to leave behind stuff that could impact a future job.

@npalm
Member

npalm commented Jun 16, 2021

It would be great if we could support this strategy as well. I really like the idea. I hope to have some time in the coming months to do some experiments!

@mrmeyers99

It looks like there is a --once flag you can pass to the runner on startup, but it's not officially supported yet and apparently has some issues; see actions/runner#510. Maybe the best solution is to wait for this to be officially supported?

@npalm
Member

npalm commented Jul 14, 2021

GitHub has also started testing an event for workflow jobs to support better scaling.

@uilton-oliveira

It looks like support for ephemeral runners was finally merged in actions/runner#660.
It would be nice to have support here as well, for better scale-up/down...

@aidan-mundy

It isn't clear whether or not the new --ephemeral flag is supported for GHES. Regardless, I see this as an essential tool, and would love to see it included here.

@gertjanmaas
Collaborator

Is there anyone who can test the ephemeral flag with GHES?

@aidan-mundy

It has been confirmed in actions/runner#660 that --ephemeral is not yet supported on GHES

@jensenbox
Contributor

What changes are required to implement this? I understand that it is not available for GHES but in other environments this would be amazing. https://docs.github.com/en/actions/hosting-your-own-runners/autoscaling-with-self-hosted-runners

@gertjanmaas
Collaborator

Off the top of my head:

  1. Add the --ephemeral flag to the code that installs the runner on the EC2 instance
  2. Figure out a way to terminate the EC2 instance after the runner is done
  3. Implement new scaling logic for ephemeral runners, because the current logic will probably not work (e.g. the idle configuration)
  4. TEST! 😄

Of course all of this needs to be configurable and backwards compatible for GHES.
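
Steps 1 and 2 could look roughly like the sketch below, which generates a boot script that registers the runner with `--ephemeral`, runs a single job, and then terminates its own instance. This is an illustration under assumptions and not the module's actual install code: the function name and install directory are hypothetical, the metadata lookup assumes IMDSv1 is still allowed, and the AWS CLI call assumes a configured region and an instance role permitted to terminate the instance.

```python
import shlex


def ephemeral_boot_script(repo_url: str, token: str,
                          runner_dir: str = "/opt/actions-runner") -> str:
    """Render a shell script for a single-job runner (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"cd {shlex.quote(runner_dir)}",
        # Step 1: register the runner so that it accepts exactly one job.
        "./config.sh --unattended --ephemeral "
        f"--url {shlex.quote(repo_url)} --token {shlex.quote(token)}",
        # With --ephemeral the runner process exits after its one job.
        "./run.sh",
        # Step 2: no 'set -e' above, so termination runs even if the job failed.
        "INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)",
        'aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"',
    ]
    return "\n".join(lines) + "\n"


print(ephemeral_boot_script("https://github.com/example/repo", "TOKEN"))
```

Step 3, the scaling logic, is not covered here, and a GHES-compatible variant would have to leave out `--ephemeral` while it is unsupported there.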

@mrmeyers99

It would be nice if the idle configuration meant how many runners should be registered and waiting for jobs. Every time one gets a job, the scale-up lambda would register another.
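
One way to read this suggestion, as a small sketch: the idle setting is a target number of registered, waiting runners, and the scale-up step tops the pool back up whenever it falls short. The function name and the `max_total` cap are assumptions for illustration and do not describe the module's current behaviour.

```python
def runners_to_register(target_idle: int, idle_now: int,
                        total_now: int, max_total: int) -> int:
    """Return how many new runners to register to restore the idle target."""
    missing = max(target_idle - idle_now, 0)   # idle runners we are short of
    headroom = max(max_total - total_now, 0)   # room left under the overall cap
    return min(missing, headroom)


# One of three idle runners just picked up a job: register one replacement.
print(runners_to_register(target_idle=3, idle_now=2, total_now=5, max_total=10))  # 1
```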

@uilton-oliveira

actions/runner#660 (comment)

--ephemeral will ship in the next version of GHES (but as I said above it requires server changes to fix the race condition with client side only assignment)

@ScottGuymer
Member

Implementing this is in progress. Closing this in favour of #1372. Will track the implementation of this there.

9 participants