Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize controllers code structure #1302

Merged

Conversation

Jeffwan
Copy link
Member

@Jeffwan Jeffwan commented Jul 14, 2021

This PR is part of #1299 kubeflow/common#138

It resolves "3. Simplify controller code structures. Merge pods.go service.go job.go into one file which is the core JobController implementation."

Major changes

  1. Fix crd installation issue. OpenAPI v3.0 validation schema doesn't allow fields other than metadata.name and metadata.generateName if metada is specified. (https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#specifying-a-structural-schema)
  2. Embed common.JobController instead of TFController in TFJobReconciler. The major motivation is
  • TFController implementation still uses low level tfjob clients which should be deprecated
  • We need another migration at the time we cleanup TFController codes.
  1. Reorganize Controllers code structures to make them clean. After the clean up, using pytorch as an example, pytorch_controller.go has all codes of its controller implementation. Language specific setting will be in pytorch.go.

Note: tensorflow is an exception, I didn't clean up codes because we want to keep them for compatibility.


pkg/controller.v1
├── mxnet
│   ├── mxjob_controller.go
│   ├── mxnet.go
│   └── suite_test.go
├── pytorch
│   ├── pytorch.go
│   ├── pytorchjob_controller.go
│   └── suite_test.go
├── tensorflow
│   ├── controller.go
│   ├── controller_test.go
│   ├── informer.go
│   ├── job.go
│   ├── job_test.go
│   ├── pod.go
│   ├── pod_test.go
│   ├── status.go
│   ├── status_test.go
│   ├── suite_test.go
│   ├── tensorflow.go
│   ├── tensorflow_test.go
│   ├── tfjob_controller.go
│   ├── util.go
│   └── util_test.go
└── xgboost
    ├── suite_test.go
    ├── xgboost.go
    └── xgboostjob_controller.go

/cc @kubeflow/wg-training-leads @zw0610

CustomResourceDefinition.apiextensions.k8s.io "tfjobs.kubeflow.org" is invalid:
spec.validation.openAPIV3Schema.properties[metadata]: Forbidden: must not specify anything other than name and generateName,
but metadata is implicitly specified

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
1. Move all ControllerInterface implementations to [framework]_controller.go
2. Clean up job.go pod.go service.go status.go
3. Follow exact same method order here https://github.com/kubeflow/common/blob/9ffa565bc60e08936f7f80cb3f491cf53f256e7f/pkg/apis/common/v1/interface.go

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 14, 2021

/hold

I notice previous change embed TFController which means most of the methods still use tfjob client. We need to make controller consistent and `TFJobReconciler` should override `common.JobController`.

With this change, it would be very easy to make original tf operator and new reconciler work together. We also avoid one more refactor when we will remove original TFController codes

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
@gaocegege
Copy link
Member

Thanks for your contribution! 🎉 👍

Is it ready to review?

@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 15, 2021

Thanks for your contribution! 🎉 👍

Is it ready to review?

@gaocegege It is. I am running some coarse-grained testing and leave /hold just in case I need more changes

@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 20, 2021

/hold cancel

}
// setup FieldIndexer to inform the manager that this controller owns pods and services,
// so that it will automatically call Reconcile on the underlying MXJob when a Pod or Service changes, is deleted, etc.
if err := mgr.GetFieldIndexer().IndexField(context.Background(), &corev1.Pod{}, jobOwnerKey, func(rawObj client.Object) []string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the difference of this from earlier watch?

Copy link
Member Author

@Jeffwan Jeffwan Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnugeorge c, err := controller.New(controllerName, mgr, controller.Options{Reconciler: r}) creates a separate controller and it is still a low level concepts which leverages "sigs.k8s.io/controller-runtime/pkg/controller" in v1. ctrl.Manager facing programing is higher level and use doesn't need handles things like predicate funds.

I think both way work and this is just kubebuilder v2-v3 fashion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use controller.Watch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaocegege do you mean you preferred to use controller directly with some event handlers?

Technically, ctrl.NewControllerManagedBy(mgr).....Complete(r) will create the controller eventually.
https://github.com/kubernetes-sigs/controller-runtime/blob/1e4d87c9f9e15e4a58bb81909dd787f30ede7693/pkg/builder/controller.go#L169

https://github.com/kubernetes-sigs/controller-runtime/blob/1e4d87c9f9e15e4a58bb81909dd787f30ede7693/pkg/builder/controller.go#L190

https://github.com/kubernetes-sigs/controller-runtime/blob/1e4d87c9f9e15e4a58bb81909dd787f30ede7693/pkg/builder/controller.go#L314

I am ok with either way, Kubebuilder v1 does use controller way https://book-v1.book.kubebuilder.io/basics/simple_controller.html . In kubebuilder v2 and v3, this has been changed. I am running more tests if high level can meet all the cases. If not, we can fallback to lower level. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I am just curious. Both way LGTM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaocegege

It's pretty buggy using kubebuilder v3 + common (with many expectation operations). I think it's better to change back to kubebuilder v1 way to manually control the pods events.

@gaocegege
Copy link
Member

LGTM

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm

if h := os.Getenv("HOME"); h != "" {
return h
}
return os.Getenv("USERPROFILE") // windows
}

// TODO (Jeffwan@): Find an elegant way to either use delegatingReader or directly use clientss
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO (Jeffwan@): Find an elegant way to either use delegatingReader or directly use clientss
// TODO (Jeffwan@): Find an elegant way to either use delegatingReader or directly use clients

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I will address all grammar issue in one PR.

@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 23, 2021

/approve

@google-oss-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jeffwan, terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-robot google-oss-robot merged commit b8d5d3c into kubeflow:all-in-one-operator Jul 23, 2021
@Jeffwan Jeffwan deleted the cleanup_controllers_new branch July 23, 2021 00:55
Jeffwan added a commit that referenced this pull request Aug 5, 2021
* Fix crd installation issue

CustomResourceDefinition.apiextensions.k8s.io "tfjobs.kubeflow.org" is invalid:
spec.validation.openAPIV3Schema.properties[metadata]: Forbidden: must not specify anything other than name and generateName,
but metadata is implicitly specified

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>

* Clean up controllers

1. Move all ControllerInterface implementations to [framework]_controller.go
2. Clean up job.go pod.go service.go status.go
3. Follow exact same method order here https://github.com/kubeflow/common/blob/9ffa565bc60e08936f7f80cb3f491cf53f256e7f/pkg/apis/common/v1/interface.go

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>

* Embed common.JobController instead of TFController

I notice previous change embed TFController which means most of the methods still use tfjob client. We need to make controller consistent and `TFJobReconciler` should override `common.JobController`.

With this change, it would be very easy to make original tf operator and new reconciler work together. We also avoid one more refactor when we will remove original TFController codes

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants