
[batch] Batch charges for private instance creation that fails with exhausted resource errors. #14505

Open
cseed opened this issue Apr 26, 2024 · 0 comments

Comments

@cseed
Collaborator

cseed commented Apr 26, 2024

What happened?

Due to limited GPU availability, private GPU jobs (especially preemptible ones) commonly fail several times with resource-exhausted errors before they obtain a VM. When this happens, Batch still charges for each failed attempt, even though no VM was ever created. For example, batch 8166586, job 1, attempt ZMkGaS (instance ID batch-worker-default-job-private-u4fxc) failed with ZONE_RESOURCE_POOL_EXHAUSTED and was still charged.
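One way to picture the expected behavior is a billing check that assigns zero billable time to any attempt whose instance creation failed with a resource-exhaustion error. The sketch below is purely illustrative and does not reflect Batch's actual billing code: the `Attempt` record, its fields, and `billable_msec` are hypothetical names. The only error code it treats as exhaustion is `ZONE_RESOURCE_POOL_EXHAUSTED`, the one from this report.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of creation errors meaning "no VM was ever obtained".
# Only ZONE_RESOURCE_POOL_EXHAUSTED comes from this report; extend as needed.
RESOURCE_EXHAUSTED_CODES = frozenset({"ZONE_RESOURCE_POOL_EXHAUSTED"})


@dataclass
class Attempt:
    """Hypothetical record of one job attempt (not Batch's real schema)."""
    attempt_id: str
    instance_name: str
    start_time_ms: Optional[int]
    end_time_ms: Optional[int]
    creation_error_code: Optional[str] = None


def billable_msec(attempt: Attempt) -> int:
    """Return the billable duration of an attempt in milliseconds.

    Attempts whose instance creation failed because the zone's resources
    were exhausted never ran on a VM, so they bill zero time.
    """
    if attempt.creation_error_code in RESOURCE_EXHAUSTED_CODES:
        return 0
    # Attempts without both timestamps are treated as unbillable in this sketch.
    if attempt.start_time_ms is None or attempt.end_time_ms is None:
        return 0
    return max(0, attempt.end_time_ms - attempt.start_time_ms)


if __name__ == "__main__":
    # Attempt ID and instance name from the report; timestamps are made up.
    failed = Attempt("ZMkGaS", "batch-worker-default-job-private-u4fxc",
                     1_000, 61_000, "ZONE_RESOURCE_POOL_EXHAUSTED")
    print(billable_msec(failed))
```

Keying the decision on the creation error, not on elapsed time, means an attempt that waited a long time for capacity still bills nothing. An attempt that did get a VM bills its full duration.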

Version

SaaS

Relevant log output

No response

@cseed added the needs-triage and batch labels Apr 26, 2024
@patrick-schultz removed the needs-triage label May 13, 2024