fix(trainer): raise ValueError for missing containers instead of Stop…#348
Conversation
…Iteration next() on a bare generator with no default raises StopIteration which Python 3.7+ (PEP 479) promotes to RuntimeError with no pod/container context. Replace with next(..., None) + explicit ValueError in both get_trainjob_initializer_step and get_trainjob_node_step. Add first test coverage for both functions (8 parametrized cases total). Signed-off-by: Ayush Petwal <ayushpetwal.0105@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
There was a problem hiding this comment.
Pull request overview
This PR hardens two utility functions in utils.py to replace bare next(generator) calls — which raise StopIteration (promoted to RuntimeError by PEP 479 in Python 3.7+) — with next(..., None) + explicit ValueError with descriptive messages. It also adds first unit test coverage for both affected functions.
Changes:
get_trainjob_initializer_stepinutils.py: replaced barenext()withnext(..., None)and an explicitValueErrornaming the pod and listing found containers.get_trainjob_node_stepinutils.py: same pattern applied for the node container lookup.- New parametrized test cases in
utils_test.pycovering both success and error paths for both functions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
kubeflow/trainer/backends/kubernetes/utils.py |
Replaces bare next(gen) calls with safe next(..., None) + ValueError, adds Raises: docstring sections |
kubeflow/trainer/backends/kubernetes/utils_test.py |
Adds 8 parametrized test cases (4 each) covering success and error paths for the two updated functions |
|
/ok-to-test |
Fiona-Waters
left a comment
There was a problem hiding this comment.
This is great! Thanks @1Ayush-Petwal
/lgtm
| ( | ||
| c | ||
| for c in pod_spec.containers | ||
| if c.name in {constants.DATASET_INITIALIZER, constants.MODEL_INITIALIZER} | ||
| ), | ||
| None, |
There was a problem hiding this comment.
How would it be possible that JobSet doesn't have such containers if we define label selector here to query the initializer Pods: https://github.com/1Ayush-Petwal/sdk/blob/cbb4ebd7ed7d24d3d4a70a579cc106c387f1efb7/kubeflow/trainer/backends/kubernetes/backend.py#L687-L689
We reserve the container name for the initializers, and validate it in TrainJob webhook: https://github.com/kubeflow/trainer/blob/master/pkg/webhooks/trainingruntime_webhook.go#L89-L101
There was a problem hiding this comment.
You're right. I missed the label selector, this case is effectively unreachable. Would it be ok to add a comment documenting this invariant for future contributors?
There was a problem hiding this comment.
Yes, please close this PR, and leave comment in the issue that these containers should always exist.
|
Closing this PR, the StopIteration case is unreachable. The label selector ensures only initializer pods reach this code path, and the TrainJob webhook guarantees those pods always have the required container. |
What this PR does / why we need it:
next()on a bare generator with no default raisesStopIteration, whichPython 3.7+ (PEP 479) promotes to a
RuntimeErrorwith no pod/containercontext. Hardens two functions in
utils.py:get_trainjob_initializer_step: replaced barenext(gen)withnext(..., None)+ explicitValueErrornaming the pod and listingactual containers foundget_trainjob_node_step: same pattern appliedRaises:docstring section to both functionsWhich issue(s) this PR fixes
General SDK hardening (noticed while exploring the codebase for #164).
Checklist:
-sflag on commit)fix(trainer):)