
Horizontal Pod Autoscaler - Busy Runner Scheme #223

Merged (5 commits) on Dec 12, 2020

Conversation

@ZacharyBenamram (Contributor) commented on Dec 4, 2020

Providing an alternative to the current HPA scheme. The TotalNumberOfQueuedAndInProgressWorkflowRuns scheme requires the HPA to be aware of all the repositories currently managed by Actions. This doesn't scale well in larger organizations where multiple teams are enrolling into Actions: each team has to know that the HPA spec must be updated.

The PercentageRunnersBusy scheme is a simple autoscaling scheme that retrieves all action runners from GitHub, finds the subset of runners that exist in the HPA's namespace by comparing them against the Runner resources, and then evaluates the scale-up / scale-down condition.

This scheme also addresses the case where an organization has multiple environments and multiple HPAs: each HPA should be responsible for scaling only the runners in its shared namespace.
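
A minimal sketch (not the PR's actual code) of the evaluation described above, assuming the runner list has already been narrowed to the Runner resources in this autoscaler's namespace; the Runner type and the threshold/factor values are illustrative stand-ins:

package main

import "fmt"

// Runner is a simplified stand-in for a self-hosted runner as reported by GitHub.
type Runner struct {
    Name string
    Busy bool
}

// desiredReplicas evaluates a PercentageRunnersBusy-style condition: if the busy
// fraction crosses scaleUpThreshold, grow by scaleUpFactor; if it falls below
// scaleDownThreshold, shrink by scaleDownFactor; otherwise keep the current count.
func desiredReplicas(runners []Runner, current int, scaleUpThreshold, scaleDownThreshold, scaleUpFactor, scaleDownFactor float64) int {
    numRunners := len(runners)
    if numRunners == 0 {
        return current
    }

    numRunnersBusy := 0
    for _, r := range runners {
        if r.Busy {
            numRunnersBusy++
        }
    }

    fractionBusy := float64(numRunnersBusy) / float64(numRunners)
    switch {
    case fractionBusy >= scaleUpThreshold:
        return int(float64(numRunners)*scaleUpFactor + 0.5) // round to nearest
    case fractionBusy < scaleDownThreshold:
        return int(float64(numRunners) * scaleDownFactor)
    default:
        return current
    }
}

func main() {
    // Three runners in this namespace, all busy: 3/3 >= 0.75, so 3*1.3 rounds up to 4.
    runners := []Runner{{"runner-a", true}, {"runner-b", true}, {"runner-c", true}}
    fmt.Println(desiredReplicas(runners, 3, 0.75, 0.25, 1.3, 0.7)) // 4
}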

[Screenshot: Screen Shot 2020-12-04 at 3.46.23 PM]

@ZacharyBenamram (Contributor, Author)

Cross-linking #213, another active HPA PR.

@ahmad-hamade (Contributor)

Hello @ZacharyBenamram, I have a small question and would appreciate it if you could help me understand this better.

Assume I have 3 runners (running in separate EKS clusters) that are added to a specific GitHub repository. How does the controller know which runner deployment to scale up/down?

I think it is supposed to be based on the runner labels, maybe? But I guess it will always scale all 3 runners (due to a workflow that is pending in the queue).

@ZacharyBenamram (Contributor, Author) commented on Dec 8, 2020

@ahmad-hamade this autoscaling scheme scales the runners that your runner-deployment creates on your behalf, or the runners you've created Runner resources for. If you are running 3 runners in separate EKS clusters, I assume you are also running 3 separate controllers. In that case, each HorizontalRunnerAutoscaler you have defined in each cluster is responsible for scaling only that cluster's runners. That is a benefit over the previous implementation, where a single HorizontalRunnerAutoscaler scales across all runners because it cannot distinguish pods running in separate clusters.
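
To illustrate that point, here is a rough sketch (not the controller's actual code) of how a per-cluster autoscaler could count only the runners it owns, by intersecting the runner names GitHub reports with the Runner resources known to this controller; all names and data below are made up:

package main

import "fmt"

// countManaged counts total/busy runners among only those this controller manages,
// ignoring runners registered by other clusters or namespaces.
func countManaged(busyByName map[string]bool, managed []string) (total, busy int) {
    for _, name := range managed {
        isBusy, registered := busyByName[name]
        if !registered {
            continue // Runner resource exists here but is not (yet) registered with GitHub
        }
        total++
        if isBusy {
            busy++
        }
    }
    return total, busy
}

func main() {
    // GitHub reports four runners across all clusters; this cluster owns two of them.
    busyByName := map[string]bool{
        "cluster-a-runner-1": true,
        "cluster-a-runner-2": false,
        "cluster-b-runner-1": true,
        "cluster-b-runner-2": true,
    }
    total, busy := countManaged(busyByName, []string{"cluster-a-runner-1", "cluster-a-runner-2"})
    fmt.Println(total, busy) // 2 1 - the other cluster's runners never affect this autoscaler
}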

@ahmad-hamade (Contributor)

Thank you @ZacharyBenamram 👍

var desiredReplicas int
fractionBusy := float64(numRunnersBusy) / float64(numRunners)
if fractionBusy >= scaleUpThreshold {
    // Scale up proportionally to the current runner count, rounding to the nearest integer.
    scaleUpReplicas := int(float64(numRunners)*scaleUpFactor + 0.5)
@mumoshu (Collaborator) commented on this line:

Suggested change:
-    scaleUpReplicas := int(float64(numRunners)*scaleUpFactor + 0.5)
+    scaleUpReplicas := int(float64(numRunners)*scaleUpFactor)

Can we remove the magic number 0.5 from this calculation? I think the defaultScaleUpFactor of 1.3 is enough, and you can optionally add that 0.5 to the default value so that it becomes defaultScaleUpFactor = 1.8.
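
For context on the suggestion, a quick illustration (hypothetical values, not from the PR) of what the + 0.5 changes: Go's int conversion truncates toward zero, while adding 0.5 first rounds to the nearest integer, which matters at small runner counts:

package main

import "fmt"

func main() {
    numRunners, scaleUpFactor := 3, 1.3

    // Plain conversion truncates: 3 * 1.3 = 3.9 -> 3, i.e. no scale-up at all.
    fmt.Println(int(float64(numRunners) * scaleUpFactor))

    // With + 0.5 the result rounds to the nearest integer: 3.9 + 0.5 = 4.4 -> 4.
    fmt.Println(int(float64(numRunners)*scaleUpFactor + 0.5))
}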

@mumoshu (Collaborator) left a comment:

@ZacharyBenamram Awesome work! Thank you so much for your contribution ☺️

I had a comment regarding the defaultScaleUpFactor, so it would be great if you could address it in a comment or another pull request.

@mumoshu merged commit 466b307 into actions:master on Dec 12, 2020