mirrord hangs when agent is not created due to OutOfCpu #579

Closed
shaikustin opened this issue Oct 16, 2022 · 2 comments
Labels
bug Something isn't working user

Comments

@shaikustin

Bug Description

When the agent pod fails to start because of an OutOfcpu error, mirrord keeps trying to create the agent and hangs while waiting for the pod to become ready.
The error reported on the cluster is:

```
Warning  OutOfcpu  6s    kubelet  Node didn't have enough resource: cpu, requested: 100, used: 1858, capacity: 1930
```
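The numbers in that warning are enough to see why the pod is rejected. The sketch below assumes the values are CPU millicores and models the check as "used + requested must not exceed capacity". It is a simplified illustration, not the kubelet's actual admission code, and `fits` is a hypothetical name.

```rust
/// Simplified, hypothetical model of the kubelet's CPU fit check.
/// All values are assumed to be millicores.
fn fits(requested: u32, used: u32, capacity: u32) -> bool {
    used + requested <= capacity
}

fn main() {
    // Values taken from the OutOfcpu warning above.
    let (requested, used, capacity) = (100u32, 1858u32, 1930u32);
    let free = capacity - used; // headroom left on the node
    println!(
        "free = {free}m, requested = {requested}m, fits = {}",
        fits(requested, used, capacity)
    );
}
```

Only 72m are free, so a 100m request cannot be admitted on that node.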

Steps to Reproduce

Run mirrord against a node that is under heavy CPU load (little spare CPU capacity).

Backtrace

No response

Relevant Logs

No response

Your operating system and version

Not relevant

Local process

Not relevant

Local process version

No response

Additional Info

No response

@shaikustin shaikustin added the bug Something isn't working label Oct 16, 2022
@eyalb181 eyalb181 added the user label Oct 16, 2022
@aviramha aviramha changed the title mirrord hands when agent is not created due to OutOfCpu mirrord hangs when agent is not created due to OutOfCpu Oct 17, 2022
@aviramha
Member

aviramha commented Oct 17, 2022

This is interesting. We don't set any resource requests, since we're not an SLA-based service, so we just use "extra" resources (at least, that's the intent).
I assume your cluster defines a default requests policy, so even a pod created without requests gets the default applied to it.
I'm thinking of addressing this issue with two changes:

  1. Add explicit requests to our pod, so we don't get the default of 100 (millicores), which we probably don't fully use (I'd assume we're around 0-20).
  2. Better error handling in this case.

I'll take care of the default requests.
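For the second change (better error handling), one possible approach is to stop polling as soon as the agent pod reaches a terminal failure, instead of waiting for readiness indefinitely. When the kubelet rejects a pod for insufficient CPU, the pod typically ends up in phase `Failed` with reason `OutOfcpu`. The sketch below is a hypothetical, std-only illustration: `PodPhase`, `WaitError` and `wait_for_agent` are made-up names, not mirrord's or kube-rs's types, and the status fetch is abstracted as a closure.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified view of a pod's phase (not the real kube-rs types).
#[derive(Debug, PartialEq)]
enum PodPhase {
    Pending,
    Running,
    Failed { reason: Option<String> },
}

#[derive(Debug, PartialEq)]
enum WaitError {
    /// The pod was rejected, e.g. by the kubelet with reason "OutOfcpu".
    Failed(String),
    /// The pod never became ready before the deadline.
    Timeout,
}

/// Poll `get_phase` until the pod runs, fails, or `timeout` elapses.
fn wait_for_agent<F>(mut get_phase: F, timeout: Duration, interval: Duration) -> Result<(), WaitError>
where
    F: FnMut() -> PodPhase,
{
    let deadline = Instant::now() + timeout;
    loop {
        match get_phase() {
            PodPhase::Running => return Ok(()),
            // Fail fast with the reason instead of hanging.
            PodPhase::Failed { reason } => {
                return Err(WaitError::Failed(reason.unwrap_or_else(|| "unknown".to_string())))
            }
            PodPhase::Pending => {}
        }
        if Instant::now() >= deadline {
            return Err(WaitError::Timeout);
        }
        std::thread::sleep(interval);
    }
}

fn main() {
    // Simulate a pod that is Pending once, then rejected for lack of CPU.
    let mut calls = 0;
    let result = wait_for_agent(
        || {
            calls += 1;
            if calls == 1 {
                PodPhase::Pending
            } else {
                PodPhase::Failed { reason: Some("OutOfcpu".to_string()) }
            }
        },
        Duration::from_secs(5),
        Duration::from_millis(10),
    );
    println!("{result:?}");
}
```

The timeout is a second safety net: even a pod that stays `Pending` forever (for example, unschedulable) produces an error rather than a hang.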

aviramha added a commit to aviramha/mirrord that referenced this issue Oct 17, 2022
bors bot pushed a commit that referenced this issue Oct 17, 2022
Agent pod definition now has `requests` specifications to avoid being defaulted to high values. See [#579](#579).

Co-authored-by: Aviram Hassan <aviramyhassan@gmail.com>
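The effect of setting explicit `requests` can be sketched as follows. One common source of a cluster-wide default is a LimitRange with `defaultRequest`, which fills in a CPU request only for containers that don't specify one. The model below assumes that behavior, reuses the node numbers from the original warning, and uses a hypothetical explicit request of 10m (within the 0-20 estimate above); it is not the value mirrord actually ships, and the function names are made up.

```rust
/// Hypothetical model of a namespace default request (as a LimitRange with
/// `defaultRequest` would apply it): an explicit request wins, otherwise
/// the default is filled in. Values are millicores.
fn effective_cpu_request(explicit_millis: Option<u32>, default_millis: u32) -> u32 {
    explicit_millis.unwrap_or(default_millis)
}

/// Simplified fit check: used + requested must not exceed capacity.
fn fits(requested: u32, used: u32, capacity: u32) -> bool {
    used + requested <= capacity
}

fn main() {
    let (used, capacity, default_millis) = (1858, 1930, 100);
    // Before: no request on the agent pod, so the 100m default applies.
    let before = effective_cpu_request(None, default_millis);
    // After: a hypothetical explicit 10m request is kept as-is.
    let after = effective_cpu_request(Some(10), default_millis);
    println!("before: {before}m fits={}", fits(before, used, capacity));
    println!("after:  {after}m fits={}", fits(after, used, capacity));
}
```

On the same node, the defaulted 100m request is rejected while the small explicit request is admitted.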
@eyalb181
Member

Should be resolved in #599

@bors bors bot closed this as completed in 4650c85 Oct 24, 2022

4 participants