Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDK]Support Docker image as objective in the tune API #2338

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

akhilsaivenkata
Copy link

What this PR does / why we need it: Supporting Docker image as an objective in the tune API.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2326

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: akhilsaivenkata <akhilammu1@gmail.com>
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@akhilsaivenkata
Copy link
Author

akhilsaivenkata commented May 30, 2024

Hi @andreyvelich , I made the changes to katib_client.py. Do we have any test cases that need to be updated or implemented for this change? I also wanted to test these new changes on my local machine so I ran "make check" & "make test" commands on my local machine and there were no failures. I wonder if you could see and suggest any action items that needs to taken up here.

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this @akhilsaivenkata!
I left a few comments.

parameters: Dict[str, Any],
base_image: str = constants.BASE_IMAGE_TENSORFLOW,
#base_image: str = constants.BASE_IMAGE_TENSORFLOW,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we should keep the base_image, since we use it when user set objective as train function.

@@ -400,12 +407,12 @@ def tune(
trial_template = models.V1beta1TrialTemplate(
primary_container_name=constants.DEFAULT_PRIMARY_CONTAINER_NAME,
retain=retain_trials,
trial_parameters=trial_params,
trial_parameters=trial_params if callable(objective) else [],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trial_parameters still be required even when user sets Docker image.
You can check example here: https://github.com/kubeflow/katib/blob/master/examples/v1beta1/hp-tuning/random.yaml#L31-L36

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure @andreyvelich , I will revert this change and keep the trial_parameters.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I would like to know if we need to write new unit test cases or change existing ones ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akhilsaivenkata Yes, since we merge this PR: #2325, please add unit test for tune function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input_params = {}
experiment_params = []
trial_params = []
base_image = constants.BASE_IMAGE_TENSORFLOW,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @andreyvelich , i did keep the base_image here at this line, I have added it in the if block so this code change got mixed up with all the other lines

@andreyvelich
Copy link
Member

@akhilsaivenkata Please rebase your PR.

trial_spec=trial_spec,
)

# Add parameters to the Katib Experiment.
experiment.spec.parameters = experiment_params
experiment.spec.parameters = experiment_params if callable(objective) else []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think parameters field is also needed since trial_paramaters is still required.

WDTY👀 @akhilsaivenkata @andreyvelich

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right @Electronic-Waste , I have just reverted these two code changes and pushed it now.

@google-oss-prow google-oss-prow bot added size/M and removed size/L labels Jul 16, 2024
Signed-off-by: akhilsaivenkata <akhilammu1@gmail.com>
Copy link
Member

@Electronic-Waste Electronic-Waste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great. Thank you @akhilsaivenkata ! I left a question for you and @andreyvelich .

Comment on lines +406 to +407
command=["bash", "-c"] if callable(objective) else None,
args=[exec_script] if callable(objective) else None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I'm not sure if we can assign None to command and args here when we use Docker image as objective.

As @andreyvelich shows an example for us, we sometimes need to pass command and args to the training container to execute python scripts with some parameters.

Could you explain your idea in details so that I can understand more? WDYT👀 @akhilsaivenkata @andreyvelich

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, initially we can just allow user to set image as objective without command and args.
Similar to how we allow create training job using base_image parameter: https://github.com/kubeflow/training-operator/blob/master/sdk/python/kubeflow/training/api/training_client.py#L327C35-L327C45.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SDK] Support Docker image as objective in the tune API
3 participants