Python SDK for Kubeflow Training Operator #1380

Closed
alembiewski opened this issue Aug 26, 2021 · 21 comments · Fixed by #1420

Comments

@alembiewski
Member

Are there any plans to add Python SDK for the new all-in-one Kubeflow Training Operator?

@alembiewski
Member Author

/kind question

@gaocegege
Member

/cc @kubeflow/wg-training-leads

@gaocegege
Member

I think we should have it, but do not have the bandwidth for it now.

@johnugeorge
Member

Yes.
/help

@google-oss-robot

@johnugeorge:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

> Yes.
> /help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Jeffwan
Member

Jeffwan commented Aug 27, 2021

FYI

Ketan pinged me earlier, and it seems Flyte users want to use it to submit jobs. flyteorg/flyte#1375

@Jeffwan
Member

Jeffwan commented Aug 27, 2021

@alembiewski Do you have any specific requirements? The existing SDK should work out of the box since we have not changed the API yet. If that works for you, we can consider extending it to other frameworks. If not, what level of abstraction do you need?

@alembiewski
Member Author

alembiewski commented Aug 27, 2021

@Jeffwan, I think a unified SDK that supports multiple frameworks could reduce code duplication between the client APIs by introducing a generic API client that supports multiple model types (the PyTorchJob and TFJob APIs look really similar to each other). This approach would streamline the process of adding new frameworks to the SDK, but it could be tricky to implement.

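The current model with a separate SDK per framework works for us as well, although there are a few serious limitations:

- kubernetes client version lock-in makes the SDK incompatible with the other Kubeflow SDKs, such as KFserving 0.5.1, KFP with the upcoming Katib SDK.
- inconsistencies between the API and SDK (e.g.: Update swagger.json schema for TFJobSpec to include RunPolicy #1278)
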
It seems to me that such limitations could be dropped just by regenerating the SDK models with OpenAPI generator as it was done recently for Katib: kubeflow/katib#1572, although maybe there are limitations that I'm not aware of.
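
As a rough illustration of the generic-client idea above, here is a minimal sketch in which one helper builds the manifest for either job kind. The `kubeflow.org/v1` API version and the `tfReplicaSpecs` / `pytorchReplicaSpecs` field names follow the TFJob and PyTorchJob custom resources; `JOB_KINDS` and `build_job` are hypothetical names invented for this example and are not part of any released SDK.

```python
from typing import Any, Dict

# Job kind -> (plural resource name, name of the replica-spec field).
# Hypothetical lookup table for illustration; adding a framework would mean
# adding one entry here instead of writing a new client.
JOB_KINDS = {
    "TFJob": ("tfjobs", "tfReplicaSpecs"),
    "PyTorchJob": ("pytorchjobs", "pytorchReplicaSpecs"),
}


def build_job(kind: str, name: str, namespace: str,
              replica_specs: Dict[str, Any]) -> Dict[str, Any]:
    """Build a training-job manifest for any supported kind.

    Hypothetical helper: it only assembles a dict and does not talk to a cluster.
    """
    if kind not in JOB_KINDS:
        raise ValueError(f"unsupported job kind: {kind}")
    _, replica_field = JOB_KINDS[kind]
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {replica_field: replica_specs},
    }


# The same call shape works for both frameworks.
tf_job = build_job("TFJob", "demo-tf", "default", {"Worker": {"replicas": 2}})
torch_job = build_job("PyTorchJob", "demo-torch", "default", {"Master": {"replicas": 1}})
```

Submitting such a manifest could then go through the Kubernetes Python client's `CustomObjectsApi.create_namespaced_custom_object(group, version, namespace, plural, body)`, using the plural name from the table; that step is left out here so the sketch runs without a cluster.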

@andreyvelich
Member

I agree with @alembiewski, it's better to have a unified SDK for the training operator.
Our users can then just run: pip install kubeflow-training to use it.

> It seems to me that such limitations could be dropped just by regenerating the SDK models with OpenAPI generator as it was done recently for Katib: kubeflow/katib#1572, although maybe there are limitations that I'm not aware of.

Yes, we should regenerate the SDK with OpenAPI to support the latest Kubernetes Python client version.

@kumare3

kumare3 commented Aug 27, 2021

> FYI
>
> Ketan pinged me earlier, and it seems Flyte users want to use it to submit jobs. flyteorg/flyte#1375

@Jeffwan We use a custom Python module to submit code. We just reference the Go API in Flyte.

@Jeffwan
Member

Jeffwan commented Aug 29, 2021

> kubernetes client version lock-in makes the SDK incompatible with the other Kubeflow SDKs, such as KFserving 0.5.1, KFP with the upcoming Katib SDK.

Agreed. It would be hard for all projects to use the same version, but we can try to use the same version as the others.

> inconsistencies between the API and SDK (e.g.: Update swagger.json schema for TFJobSpec to include RunPolicy #1278)

Yeah, I think we should at least regenerate it to reflect the latest changes. I cut #1389, and it introduces some unnecessary files. We will resolve that later.
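
To make the version lock-in point concrete, the sketch below checks whether any single client version can satisfy the pins of several SDKs at once. All version numbers and pins are hypothetical placeholders, not the actual requirements of any Kubeflow SDK, and the helpers only handle plain dotted versions with a handful of operators; real dependency resolution (as done by pip) is considerably more involved.

```python
import operator

# Supported comparison operators for this simplified sketch.
_OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    "<": operator.lt,
    ">": operator.gt,
}


def _parse(version):
    """Turn '12.0.1' into (12, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Return True if `version` meets a single spec such as '>=12.0.0'."""
    # Two-character operators are checked first so '>=' is not read as '>'.
    for symbol in ("==", ">=", "<=", "<", ">"):
        if spec.startswith(symbol):
            return _OPS[symbol](_parse(version), _parse(spec[len(symbol):]))
    raise ValueError(f"unsupported spec: {spec}")


def common_versions(candidates, specs):
    """Return the candidate versions that satisfy every spec."""
    return [v for v in candidates if all(satisfies(v, s) for s in specs)]


# Hypothetical pins: one SDK locks an exact version, another needs a newer one.
candidates = ["10.0.1", "11.0.0", "12.0.0"]
conflict = common_versions(candidates, ["==10.0.1", ">=12.0.0"])
relaxed = common_versions(candidates, [">=10.0.1", ">=11.0.0"])
```

With the exact pin, `conflict` is empty, which is the situation where two SDKs cannot be installed side by side; once the pin is relaxed to a lower bound, `relaxed` contains the versions both SDKs accept.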

@alembiewski
Member Author

alembiewski commented Sep 21, 2021

@Jeffwan, any updates on this? Is there anything I can help with?

@Jeffwan
Member

Jeffwan commented Oct 4, 2021

It was auto-closed. I'll keep it open for the PyPI release update.

@alembiewski
Member Author

alembiewski commented Oct 4, 2021

Thanks, @Jeffwan! I built and uploaded the package to the TestPyPI repository for testing:
https://test.pypi.org/project/kubeflow-training/1.3.0/#description
After it is verified and tested, we can then release it to PyPI. Should we update the release notes mentioning the updated SDK for 1.3.0 after the package is published?

@alembiewski
Member Author

Hey @Jeffwan, is the testing of the SDK still ongoing? Is there an estimate of when the package will be published to PyPI?

@Jeffwan
Member

Jeffwan commented Oct 14, 2021

> Hey @Jeffwan, is the testing of the SDK still ongoing? Is there an estimate of when the package will be published to PyPI?

Hi @alembiewski, sorry, I missed your last message. The testing is done and we are good to go!

> Should we update the release notes mentioning the updated SDK for 1.3.0 after the package is published?

We can address it in README.md. As for the training blog, I'd really like to promote this work in kubeflow/blog#110. WDYT?

@alembiewski
Member Author

Sounds good, thanks @Jeffwan! Looking forward to the SDK being released. Please let me know if any help is needed with publishing it to PyPI; I'm glad to help with that.

@Jeffwan
Member

Jeffwan commented Oct 16, 2021

@alembiewski If you could give a hand, that would be great. Could you upload this package? Please add andreyvelich, jiaxin.shan and @terrytangyuan as maintainers as well.

@terrytangyuan Can you share your account?

@terrytangyuan
Member

Mine is terrytangyuan (same as my GitHub ID).

Please add us as “owner”s once the package is uploaded to PyPI. Thank you!

@alembiewski
Member Author

The package has been uploaded to PyPI: https://pypi.org/project/kubeflow-training/ 🚀 🎉
Added accounts mentioned above as owners, please check your mailbox.

@Jeffwan
Member

Jeffwan commented Oct 17, 2021

Great job, cheers!


8 participants