[Kubernetes] Restrict final pod name to ~200 symbols #232

Closed
aivanou opened this issue Oct 11, 2021 · 1 comment
Labels: bug (Something isn't working), kubernetes (kubernetes and volcano schedulers), module: runner (issues related to the torchx.runner and torchx.scheduler modules)

aivanou commented Oct 11, 2021

We need to restrict the length of the pod name produced by the torchx kubernetes-scheduler to the maximum allowed length. If we do not do this, these kinds of jobs will be hard to debug.

@d4l3k added the bug, kubernetes, and module: runner labels Oct 20, 2021
@kiukchung added this to the 0.1.1 release milestone Oct 21, 2021

aivanou commented Nov 2, 2021

It seems this got fixed in the latest Volcano version. Volcano now rejects the request with an error:

Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '28f9da0c-046f-41c5-b26e-a615663b998b', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 02 Nov 2021 19:24:22 GMT', 'Content-Length': '389'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"admission webhook \"validatejob.volcano.sh\" denied the request: create pod with name torchx-cv-trainer-tf7xpsc4ndcmcd-my-torchx-cv-trainer-awesome-worker-0-0 validate failed [name part must be no more than 63 characters]; unable to find job queue: queues.scheduling.volcano.sh \"test\" not found","code":400}
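
For anyone who would rather shorten the name on the client side than hit the webhook error above, here is a minimal sketch. It is not the torchx implementation: `shorten_name` and `MAX_NAME_PART_LEN` are hypothetical names, and it assumes the 63-character limit quoted in the error applies to the whole generated name. It truncates the name and appends a short hash of the full name so that different long names are unlikely to collide.

```python
import hashlib

# Limit quoted in the Volcano webhook error above. Which part of the
# name the webhook checks is an assumption here: the whole name is
# treated as limited.
MAX_NAME_PART_LEN = 63


def shorten_name(name: str, max_len: int = MAX_NAME_PART_LEN) -> str:
    """Hypothetical helper: return `name` cut down to at most `max_len` characters.

    Names that already fit are returned unchanged. Longer names are
    truncated and suffixed with the first 8 hex characters of the SHA-1
    digest of the full name.
    """
    if len(name) <= max_len:
        return name

    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    if max_len <= len(digest) + 1:
        raise ValueError(f"max_len={max_len} is too small to fit a hash suffix")

    # Reserve room for "-" plus the digest, and avoid a trailing "-"
    # on the truncated prefix so the result never contains "--".
    head = name[: max_len - len(digest) - 1].rstrip("-")
    return f"{head}-{digest}"
```

Applied to the pod name from the error (72 characters), this returns a name of at most 63 characters that still starts with `torchx-cv-trainer-`, which keeps the job recognizable when debugging.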

@aivanou closed this as completed Nov 2, 2021