Provide ability to configure "max-idle-timeout" for an elastic agent pod #54

Closed
adityasood opened this issue May 24, 2018 · 4 comments

@adityasood
Contributor

There should be an option in the elastic agent profile to control the "max-idle-timeout" for an agent pod. Currently, this defaults to 10 minutes. A configurable timeout would help in two ways:

  1. Multiple idle agents would no longer show up and clutter the UI when running multiple jobs.

  2. Idle agents tend to confuse users, who may assume they are still around to pick up new jobs, whereas the current behavior is one agent per job.

@varshavaradarajan
Contributor

This defaults to the time specified under Agent auto-register timeout. Either we provide a max idle timeout property or, since agents aren't reused as of EA v3, we always terminate the agent after, say, 1 minute. That is, it need not be a user-configurable property, as the agents aren't of any use after the job.

@matthewrj

This is causing us issues: we have to over-provision nodes on our Kubernetes cluster so that new jobs can be scheduled while old agents sit around waiting to time out, taking up valuable CPU and memory allocations.

We use an autoscaling Kubernetes cluster, and our agent pod definition requests 1 CPU and 4g memory. To provide the resources requested for a new agent pod, Kubernetes scales up more nodes, because the existing CPU and memory are allocated to agents waiting out the 10-minute timeout period.
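
To put a rough number on the over-provisioning described above, here is a minimal sketch. It assumes one pod per job, that every finished pod idles for the full timeout, and a hypothetical job rate of 12 jobs per hour (not a figure from our cluster). The names `PodRequests` and `idle_reservation` are made up for illustration, and "4g" is treated as 4 GB. The estimate uses Little's law: average idle pods = job arrival rate × idle time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRequests:
    """Resource requests for one agent pod (hypothetical helper type)."""
    cpu_cores: float
    memory_gb: float


def idle_reservation(jobs_per_hour: float,
                     idle_timeout_minutes: float,
                     requests: PodRequests) -> PodRequests:
    """Average resources held by finished-but-idle agent pods.

    Assumes one pod per job and that each pod stays idle for the
    whole timeout before it is terminated.
    """
    # Little's law: average number of idle pods = arrival rate * idle time.
    avg_idle_pods = jobs_per_hour * idle_timeout_minutes / 60.0
    return PodRequests(
        cpu_cores=avg_idle_pods * requests.cpu_cores,
        memory_gb=avg_idle_pods * requests.memory_gb,
    )


if __name__ == "__main__":
    # 1 CPU and 4g per pod as in our pod definition; 12 jobs/hour is assumed.
    held = idle_reservation(12, 10, PodRequests(cpu_cores=1, memory_gb=4))
    print(held)
```

Under those assumptions, the 10-minute timeout keeps about 2 CPUs and 8 GB reserved on average for pods that will never run another job; with a 1-minute timeout the same formula gives a tenth of that.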

@sheroy
Contributor

sheroy commented Aug 7, 2018

Hi @matthewrj,

We are looking into adding a notification at the end of a job run that will terminate the Kubernetes pod. While we scope that change out, you could look at adjusting the Agent auto-register timeout to a value that works better for your setup. I would look at the average run time of jobs and reduce this timeout to be close to that value.

If your average run time of jobs is around 3-4 minutes, I would set this value to something like 5 minutes. That way the plugin doesn't keep agents around for too much longer after jobs finish.
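
As a rough illustration of that rule of thumb, the sketch below takes a list of recent job run times and suggests a timeout slightly above their average. The function name `suggested_timeout_minutes` and the one-minute headroom are assumptions for the example, not plugin settings or GoCD APIs; the resulting value would still be entered by hand as the Agent auto-register timeout.

```python
import math
from statistics import mean
from typing import Sequence


def suggested_timeout_minutes(job_run_minutes: Sequence[float],
                              headroom_minutes: float = 1.0) -> int:
    """Suggest a timeout: average job run time plus headroom, rounded up.

    The headroom default of one minute is an assumption for this sketch.
    """
    if not job_run_minutes:
        raise ValueError("need at least one job run time")
    return math.ceil(mean(job_run_minutes) + headroom_minutes)


if __name__ == "__main__":
    # Jobs that take around 3-4 minutes, as in the example above.
    print(suggested_timeout_minutes([3, 4, 3.5]))
```

For run times averaging 3.5 minutes, this suggests 5 minutes, which matches the example above.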

The fix for this would be to introduce a notification at the end of a job run, based on which the plugin can terminate the Kubernetes pod.

@chadlwilson
Member

Closing this, as it seems the implementation changed subsequently in EA v3, and elastic agent pods are now single-shot. There is a proposed implementation to re-enable agent reuse with idle semantics at #355, at which point this will be relevant to consider.

@chadlwilson closed this as not planned on Nov 30, 2023