fix(trainer): validate polling_interval in wait_for_job_status#554
HarshPopat23 wants to merge 6 commits into
Adds a ValueError guard in TrainerClient.wait_for_job_status() to reject non-positive polling_interval values before delegating to the backend. Previously this could cause a CPU busy-loop (polling_interval=0) or a cryptic stdlib error (negative values). Fixes kubeflow#550 Signed-off-by: HarshPopat23 <musichk61@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED
Pull request overview
Adds early input validation to the user-facing TrainerClient.wait_for_job_status() API to reject invalid polling_interval values before delegating to the selected backend, and adds a unit test to cover the new validation behavior.
Changes:
- Added a guard in `TrainerClient.wait_for_job_status()` to raise `ValueError` when `polling_interval <= 0`.
- Added unit tests verifying `ValueError` is raised for `polling_interval` of `0` and `-5`.
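The guard described in the list above could look roughly like the following. This is a minimal sketch, not the code from the PR: `TrainerClientSketch`, `RecordingBackend`, the default of `2`, and the exact error message are all hypothetical names and values chosen for illustration.

```python
class RecordingBackend:
    """Hypothetical backend stub that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def wait_for_job_status(self, **kwargs):
        self.calls.append(kwargs)
        return "ok"


class TrainerClientSketch:
    """Hypothetical stand-in for TrainerClient; only the guard is shown."""

    def __init__(self, backend):
        self.backend = backend

    def wait_for_job_status(self, name, polling_interval=2, **kwargs):
        # Reject non-positive intervals before the backend is touched,
        # and name the offending argument in the message.
        if polling_interval <= 0:
            raise ValueError(
                f"polling_interval must be a positive number, got {polling_interval}"
            )
        return self.backend.wait_for_job_status(
            name=name, polling_interval=polling_interval, **kwargs
        )
```

Because the check runs before delegation, an invalid call never reaches the backend, which is the property the unit tests for `0` and `-5` would want to pin down.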
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| kubeflow/trainer/api/trainer_client.py | Adds polling_interval validation before calling backend wait_for_job_status(). |
| kubeflow/trainer/api/trainer_client_test.py | Adds a unit test ensuring invalid polling_interval values raise ValueError. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: HarshPopat23 <musichk61@gmail.com>
0629885 to 96650d5
…rror messages Signed-off-by: HarshPopat23 <musichk61@gmail.com>
…ed utils Move polling_interval and timeout validation into a shared common_utils.validate_wait_intervals() function, used by both TrainerClient and OptimizerClient. Also renamed test function per review feedback and fixed missing newline at end of test file. Signed-off-by: HarshPopat23 <musichk61@gmail.com>
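For readers following along, a shared helper of this kind might be sketched as below. Only the name `validate_wait_intervals` comes from the commit message; the signature, the rule that `timeout` must also be positive, and the error messages are assumptions made for illustration and may differ from the actual change.

```python
def validate_wait_intervals(polling_interval, timeout):
    """Sketch of a shared validator (assumed signature and rules).

    Raises ValueError with the argument name in the message so callers
    can tell which value was rejected.
    """
    if polling_interval <= 0:
        raise ValueError(
            f"polling_interval must be a positive number, got {polling_interval}"
        )
    # Assumption: a non-positive timeout is treated as invalid as well.
    if timeout <= 0:
        raise ValueError(f"timeout must be a positive number, got {timeout}")
```

Keeping the checks in one function means both clients reject the same inputs with the same messages, instead of each carrying its own copy of the guard.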
Signed-off-by: HarshPopat23 <musichk61@gmail.com>
Signed-off-by: HarshPopat23 <musichk61@gmail.com>
Small naming suggestion: since this helper validates both `polling_interval` and `timeout` specifically for `wait_for_job_status`, would something like `validate_wait_for_job_status_args()` (or `..._params()`) make its purpose a bit clearer?
Hi @Goku2099, thank you for taking the time to review my PR and for the suggestion! I actually used this function name based on a recommendation from @andreyvelich. If they agree that `validate_wait_for_job_status_args()` (or a similar name) is more appropriate, I'd be happy to update it. Thanks again for the helpful feedback!
What this PR does / why we need it:
Adds input validation to `TrainerClient.wait_for_job_status()` to reject non-positive `polling_interval` values before delegating to the backend.

Previously, passing `polling_interval=0` could cause a CPU busy-loop (`time.sleep(0)` in a tight polling loop), and negative values raised a cryptic stdlib `ValueError` from deep inside `time.sleep()` with no context about which argument was invalid.
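Both failure modes can be reproduced with the standard library alone. The snippet below is a small illustration, not code from the SDK; `count_polls` and `sleep_error_message` are hypothetical helpers, and the loop is a simplified stand-in for a backend polling loop.

```python
import time


def count_polls(polling_interval, duration=0.05):
    """Count iterations of a naive polling loop within `duration` seconds."""
    deadline = time.monotonic() + duration
    polls = 0
    while time.monotonic() < deadline:
        polls += 1  # stand-in for one status check
        time.sleep(polling_interval)
    return polls


def sleep_error_message(polling_interval):
    """Return the message time.sleep() raises, or "" if it does not raise."""
    try:
        time.sleep(polling_interval)
    except ValueError as exc:
        return str(exc)
    return ""
```

With an interval of `0` the loop spins far more often than with a small positive interval over the same window, and a negative interval produces a `ValueError` whose message says nothing about `polling_interval`.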
Which issue(s) this PR fixes:
Fixes #550
Checklist:
ValueError: The input values are incorrect.— no change needed)