
Validate job spec prior to submission in kubernetes scheduler #1152

@clumsy

Description

Since we use custom resources (Volcano CRDs) rather than typed k8s constructs, this validation does not happen until the job is picked up by the scheduler.

For example, https://kubernetes.io/docs/concepts/workloads/controllers/job clearly states:

When the control plane creates new Pods for a Job, the .metadata.name of the Job is part of the basis for naming those Pods. The name of a Job must be a valid [DNS subdomain](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names) value, but this can produce unexpected results for the Pod hostnames. For best compatibility, the name should follow the more restrictive rules for a [DNS label](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names). Even when the name is a DNS subdomain, the name must be no longer than 63 characters.

Yet we are allowed to submit such a job freely, only to see a silent failure:

Execute plugin when job add failed, err: Service "blablablabla..." is invalid: [metadata.name: Invalid value: "blablablabla...": must be no more than 63 characters, spec.selector: Invalid value: "blablablabla...": must be no more than 63 characters]

Motivation/Background

If we know the resource violates the spec, we should fail fast.

Detailed Proposal

Add a flag to the k8s scheduler, e.g. validate_spec (defaulting to False for now), that would call create_namespaced_custom_object prior to the actual job submission; see the sketch below.
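A minimal sketch of the idea, assuming the kubernetes Python client with server-side dry-run support; the validate_spec flag and the helper name below are illustrative, not existing TorchX API:

```python
# Sketch only: assumes the kubernetes Python client (with dry_run support) and a
# Volcano Job custom resource. validate_job_spec is an illustrative helper, not a
# TorchX API; the k8s scheduler would call it only when validate_spec=True.
from kubernetes import client, config


def validate_job_spec(job_resource: dict, namespace: str = "default") -> None:
    """Ask the API server to validate the resource without persisting it."""
    config.load_kube_config()
    api = client.CustomObjectsApi()
    try:
        api.create_namespaced_custom_object(
            group="batch.volcano.sh",
            version="v1alpha1",
            namespace=namespace,
            plural="jobs",
            body=job_resource,
            dry_run="All",  # server-side validation only; nothing is created
        )
    except client.ApiException as e:
        # Surface the same error the controller would otherwise log silently,
        # but at submission time, so the caller fails fast.
        raise ValueError(f"job spec failed validation: {e.body}") from e
```

Since the flag defaults to False, the default submission path stays unchanged.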

Alternatives

  1. Do manual validation using code in TorchX itself, but that is brittle (see the sketch after this list).
  2. Keep the status quo, which leads to an unpleasantly surprising customer experience.
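For completeness, a sketch of what option 1 could look like: re-implementing the RFC 1123 DNS label checks that the error above trips over. The constants and regex below mirror Kubernetes' own validation rules, and keeping such copies in sync with everything the API server enforces is what makes this brittle:

```python
import re

# RFC 1123 DNS label rules, as enforced by Kubernetes for names like metadata.name:
# at most 63 characters, lowercase alphanumerics and '-', starting and ending with
# an alphanumeric character.
DNS_LABEL_MAX_LEN = 63
DNS_LABEL_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def check_dns_label(name: str) -> None:
    """Raise if `name` is not a valid DNS label (a hand-rolled, partial check)."""
    if len(name) > DNS_LABEL_MAX_LEN:
        raise ValueError(f"{name!r} must be no more than {DNS_LABEL_MAX_LEN} characters")
    if not DNS_LABEL_RE.match(name):
        raise ValueError(
            f"{name!r} must consist of lowercase alphanumeric characters or '-', "
            "and must start and end with an alphanumeric character"
        )
```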

Additional context/links
