-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add paddle operator proposal to kubeflow community. #502
Conversation
|
||
## Alternatives Considered | ||
|
||
One option is to add PaddlePaddle support to the existing tf-operator, but the parameters and operations between two frameworks are quite different. Combining them may make the user experience unnecessarily complicated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. What about the implementation of the operator? Could you leverage https://github.com/kubeflow/common's interface that's used for other operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using kube-builder as the skeleton for the paddle-operator, part of the kubeflow common components are not necessary for our project, like job controller etc. We'll see whether we can leverage on kubeflow common for advanced features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reply, we are aware of your concern, technically, we are building our operator with the newest version of kubebuilder which take care of almost all the staff like Informer/Indexer/clienset etc. It leaves out the Reconcile function to be implemented, it also provides CRUD operations directly on resources with context, which means that the kubeflow/common may not necessary in this circumstances.
cc @kubeflow/wg-training-leads @kubeflow/wg-automl-leads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM from me. Please take another look @kubeflow/wg-training-leads @kubeflow/wg-automl-leads as well.
@tizhou86 @kuizhiqing Are you planning to transfer the existing repo https://github.com/PaddleFlow/paddle-operator to Kubeflow org once this is approved? @Bobgy @theadactyl Any questions or concerns on this?
/hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Only nits comment
* Paddle Operator Architecture on Kubernetes, please check out [design-arch](https://github.com/PaddleFlow/paddle-operator/blob/main/docs/design-arch.md) | ||
* Paddle training job instance fault tolerant, please check out [design-fault-tolerant](https://github.com/PaddleFlow/paddle-operator/blob/main/docs/design-fault-tolerant.md) | ||
* Co-scheduling training job to prevent job instances from resource deadlock, please check out [design-coschedule](https://github.com/PaddleFlow/paddle-operator/blob/main/docs/design-coschedule.md) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All link expired
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we've just merged the new refactoring, I'll fix it soon.
We briefly talked about this story in AutoML and training bi-weekly meeting. People are interested in support plan as well. For example, if there's active maintainers for this project. If it want to be part of kubeflow release, etc. |
Yes, Paddle-operator is one of our ecosystem projects that we want to transfer to kubeflow org for maintenance, It's approved by the Paddle team already, we think the open governance will be good for the Paddle ecosystem projects. |
Great! If possible, we can do a presentations on kubeflow weekly meeting, and then we can discuss about the future plan in the meeting. |
@gaocegege: GitHub didn't allow me to request PR reviews from the following users: zw0610. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@gaocegege Last WG-training meeting was CN time friendly and I miss it. Any conclusion? |
We think we can accept it. If we decide to merge into one controller, we can refactor the paddle operator later. WDYT |
Agree. Let's kick off the transfer process. /lgtm /cc @Bobgy @theadactyl @gaocegege @terrytangyuan and I help review the proposal from WG-Training side and it looks good to us. Can you help take a review? |
/cc @Bobgy Could you please help us create the repository for the paddle-operator? |
Hi Bob, please let us know if anything should be done before the transfer process, thanks! |
The original repository is: https://github.com/PaddleFlow/paddle-operator , the new repository I suppose will be : https://github.com/kubeflow/paddle-operator , we've recently added more user guides and documentations for developers. |
/cc @Bobgy for fear that this thread may have escaped from attention. :-) |
Sorry for the delay, @kubeflow/wg-training-leads. I discussed with @kubeflow/project-steering-group and we would suggest merging all training operators into one monorepo. How do feel about that? The rationale: |
I think the proposal also relates to #512 |
@Bobgy Please see our discussion on merging the operators in kubeflow/common#103. Although I think this should be discussed separately and should not block the acceptance of paddle-operator. Release is also worth a separate discussion as it's the release process in general that's complex but not specific to training operators. |
@terrytangyuan regardless of whether operators are merged, what about using a monorepo? Looks like we can wait for kubeflow/common#103 (comment) to see more updates? |
em. Since this is internal process and we don't have any clues, could you help us understand the efforts? It helps WG leads to make deliberate decision at the time we have these kind of repo requests. |
Yeah, we are really looking forward to have paddle operator to integrated into Kubeflow community, some of our end users are building the PaddlePaddle deep learning system leveraging on Kubeflow and paddle operator, we want to make it more straightforward to our users. And with the impact of the PaddlePaddle community(15.4k for the paddle main repo and 50k+for the paddle org projects), we think it will be a win-win solution for both community. |
Sorry for the long delay, let me give an update:
|
Hi all, @kubeflow/project-steering-group agreed to accept this proposal and start a new repo for paddle-operator without waiting for potential merge of Training WG operators. @kubeflow/wg-training-leads may I confirm that you are still willing to accept the proposal too, right? because it's been a while. If everyone agrees, I can start working with @tizhou86 on the process to adopt the paddle-operator repo. |
+1 from me. @kubeflow/wg-training-leads Any concerns? |
+1 |
SGTM |
Great, thanks @Bobgy ! Please let me know if anything is needed from PaddlePaddle team. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Bobgy, gaocegege, Jeffwan, terrytangyuan, tizhou86 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold UPDATE: for adoption progress, refer to #520 |
Add paddle operator proposal to kubeflow community.