
CFE-1037: operator - simplify controllers #4

Merged

Conversation

@alebedev87 (Contributor) commented Apr 24, 2024

This PR removes the dnsnameresolver-crd controller and makes the dnsnameresolver controller managed. The required CRDs are installed as a prerequisite, so we don't need to ensure their presence at runtime; the dnsnameresolver controller can simply be started by controller-runtime's manager.

$ make install
hack/update-generated-crd.sh
/home/alebedev/git/coredns-ocp-dnsnameresolver/operator/bin/controller-gen-v0.14.0 rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/alebedev/git/coredns-ocp-dnsnameresolver/operator/bin/kustomize-v5.3.0 build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/dnsnameresolvers.network.openshift.io created

$ make deploy IMG=quay.io/alebedev/dnsnameresolver-operator:4.24.173
hack/update-generated-crd.sh
/home/alebedev/git/coredns-ocp-dnsnameresolver/operator/bin/controller-gen-v0.14.0 rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
cd config/manager && /home/alebedev/git/coredns-ocp-dnsnameresolver/operator/bin/kustomize-v5.3.0 edit set image controller=quay.io/alebedev/dnsnameresolver-operator:4.24.173
/home/alebedev/git/coredns-ocp-dnsnameresolver/operator/bin/kustomize-v5.3.0 build config/default | kubectl apply -f -
namespace/dnsnameresolver-operator created
serviceaccount/dnsnameresolver-operator-controller-manager created
role.rbac.authorization.k8s.io/dnsnameresolver-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/dnsnameresolver-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/dnsnameresolver-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/dnsnameresolver-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/dnsnameresolver-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/dnsnameresolver-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/dnsnameresolver-operator-proxy-rolebinding created
service/dnsnameresolver-operator-controller-manager-metrics-service created
deployment.apps/dnsnameresolver-operator-controller-manager created

$ oc -n dnsnameresolver-operator get pods
NAME                                                           READY   STATUS    RESTARTS   AGE
dnsnameresolver-operator-controller-manager-5dd7f8cb86-nmx6w   2/2     Running   0          11s
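
For illustration, a minimal controller-runtime sketch of what "managed" means here: the controller is registered through the builder and started by mgr.Start, with the CRD expected to pre-exist. The DNSNameResolver type and its AddToScheme are assumed to come from openshift/api's network/v1alpha1 package; this is a hedged sketch, not the PR's actual wiring.

package operator

import (
	ocpnetworkv1alpha1 "github.com/openshift/api/network/v1alpha1" // assumed location of the DNSNameResolver type
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// run wires the dnsnameresolver controller into controller-runtime's manager.
func run(r reconcile.Reconciler) error {
	scheme := runtime.NewScheme()
	// AddToScheme is assumed to be generated alongside the API type.
	if err := ocpnetworkv1alpha1.AddToScheme(scheme); err != nil {
		return err
	}
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
	if err != nil {
		return err
	}
	// The builder registers the controller with the manager; nothing
	// ensures the CRD at runtime, since it is installed as a prerequisite.
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&ocpnetworkv1alpha1.DNSNameResolver{}).
		Complete(r); err != nil {
		return err
	}
	// mgr.Start runs the shared caches and every managed controller.
	return mgr.Start(ctrl.SetupSignalHandler())
}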

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 13, 2024
Comment on lines 55 to 80
	// Create a new cache for tracking the DNSNameResolver resources in
	// the DNSNameResolverNamespace.
	dnsNameResolverCache, err := cache.New(mgr.GetConfig(), cache.Options{
		Scheme: mgr.GetScheme(),
		DefaultNamespaces: map[string]cache.Config{
			config.DNSNameResolverNamespace: {},
		},
	})
	if err != nil {
		return nil, nil, err
	}

	// Create a new cache to track the EndpointSlices corresponding to the
	// CoreDNS pods.
	corednsEndpointsSliceCache, err := cache.New(mgr.GetConfig(), cache.Options{
		Scheme: mgr.GetScheme(),
		DefaultNamespaces: map[string]cache.Config{
			config.OperandNamespace: {},
		},
		DefaultLabelSelector: labels.SelectorFromSet(labels.Set{
			discoveryv1.LabelServiceName: config.ServiceName,
		}),
	})
	if err != nil {
		return nil, nil, err
	}
Contributor:

I would suggest we still use the independent caches, rather than mgr.GetCache() and mgr.GetClient().

If we don't use dnsNameResolverCache and enable watching of config.DNSNameResolverNamespace by default in the manager's cache, then CDO will always watch the resources in that namespace even if the DNSNameResolver feature gate is not enabled.

Regarding using the client and not the cache for the CoreDNS EndpointSlices, I am a little bit sceptical. The minimum TTL for DNS resolution is 5 seconds, and in that scenario the client would be sending requests to the api-server every 5 seconds, while the CoreDNS pods' IPs may not change that frequently. It is therefore more efficient to use the cache rather than the client.

PLMK your thoughts.
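
To make the access pattern under discussion concrete, a hedged sketch of a reconcile loop that re-reads the CoreDNS EndpointSlices every TTL. The reconciler shape and field names are illustrative, not the repo's code; the point is that the cost of each requeue depends on whether the reader is cache-backed.

package operator

import (
	"context"
	"time"

	discoveryv1 "k8s.io/api/discovery/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// reconciler is illustrative; reader may be a dedicated cache.Cache or a
// client.Client — both satisfy client.Reader.
type reconciler struct {
	reader           client.Reader
	operandNamespace string
	serviceName      string
}

func (r *reconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	epsList := &discoveryv1.EndpointSliceList{}
	// With a cache-backed reader this List is answered from local informers;
	// with an uncached client it is an API-server round trip on every requeue.
	if err := r.reader.List(ctx, epsList,
		client.InNamespace(r.operandNamespace),
		client.MatchingLabels{discoveryv1.LabelServiceName: r.serviceName},
	); err != nil {
		return reconcile.Result{}, err
	}
	// Re-resolve after the minimum DNS TTL; this sets the read frequency.
	return reconcile.Result{RequeueAfter: 5 * time.Second}, nil
}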

@alebedev87 (author):

> If we don't use dnsNameResolverCache and enable watching of config.DNSNameResolverNamespace by default in the manager's cache, then CDO will always watch the resources in that namespace even if the DNSNameResolver feature gate is not enabled.

Why create this controller at all if the feature gate is disabled?

> Regarding using the client and not the cache for the CoreDNS EndpointSlices, I am a little bit sceptical. The minimum TTL for DNS resolution is 5 seconds, and in that scenario the client would be sending requests to the api-server every 5 seconds, while the CoreDNS pods' IPs may not change that frequently. It is therefore more efficient to use the cache rather than the client.

The default controller-runtime client is a split one, and it does use the cache for reads. In fact, it would be more consistent to use it everywhere.

Update: changed the code to use controller-runtime's client everywhere.
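
As background for the split-client point, a minimal sketch (not the PR's code; the DNSNameResolver type location is assumed from openshift/api) of how the manager's default client behaves: reads are served by the shared informer cache, writes go directly to the API server.

package operator

import (
	"context"

	ocpnetworkv1alpha1 "github.com/openshift/api/network/v1alpha1" // assumed import
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// getAndUpdate shows the split behaviour of the manager's default client.
func getAndUpdate(ctx context.Context, mgr manager.Manager, name types.NamespacedName) error {
	cl := mgr.GetClient() // delegating client: cached reads, direct writes
	resolver := &ocpnetworkv1alpha1.DNSNameResolver{}
	// Get is served from the manager's shared informer cache.
	if err := cl.Get(ctx, name, resolver); err != nil {
		return err
	}
	// Writes (here a status update) go straight to the API server.
	return cl.Status().Update(ctx, resolver)
}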

Contributor:

> If we don't use dnsNameResolverCache and enable watching of config.DNSNameResolverNamespace by default in the manager's cache, then CDO will always watch the resources in that namespace even if the DNSNameResolver feature gate is not enabled.

> Why create this controller at all if the feature gate is disabled?

We won't create the controller when the feature gate is disabled; however, as we are using the manager's cache, it will still receive events for the config.DNSNameResolverNamespace resources. If we use the dedicated caches instead, then in CDO the cache is only created when the controller itself is created.

> Regarding using the client and not the cache for the CoreDNS EndpointSlices, I am a little bit sceptical. The minimum TTL for DNS resolution is 5 seconds, and in that scenario the client would be sending requests to the api-server every 5 seconds, while the CoreDNS pods' IPs may not change that frequently. It is therefore more efficient to use the cache rather than the client.

> The default controller-runtime client is a split one, and it does use the cache for reads. In fact, it would be more consistent to use it everywhere.

This still has the issue of the cache receiving events for a controller that was never created.

@alebedev87 should we just stick to removing the dnsnameresolver-crd controller in this PR? We can take up changes to the dnsnameresolver controller in a different PR and discuss the pros and cons there. I also want to have @Miciah's opinion on this matter. This was previously discussed, and Miciah had shared that discussion here (#2 (comment)).

@alebedev87 (author):

> We won't create the controller when the feature gate is disabled; however, as we are using the manager's cache, it will still receive events for the config.DNSNameResolverNamespace resources.

config.DNSNameResolverNamespace is just a namespace: you can skip adding it to the manager's cache default namespaces when the feature gate is disabled. Then you don't create the controller, so no Watch is set up and no events are received.

> should we just stick to removing the dnsnameresolver-crd controller in this PR?

Then how would we resolve the dilemma in C-D-O?

> I also want to have @Miciah's opinion on this matter. This was previously discussed, and Miciah had shared that discussion here (#2 (comment)).

Miciah is on PTO till the end of May; we have to take a decision on our own.
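
For illustration, a hedged sketch of the feature-gate idea from the first point above. The import path of the operator's config package and the dnsNameResolverEnabled flag are assumptions, not the repo's actual code.

package operator

import (
	"k8s.io/client-go/rest"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/manager"

	// Assumed import path for the package defining the namespace constants.
	"github.com/openshift/coredns-ocp-dnsnameresolver/operator/pkg/operator/config"
)

// newManager scopes the manager's cache according to the feature gate.
func newManager(restConfig *rest.Config, dnsNameResolverEnabled bool) (manager.Manager, error) {
	defaultNamespaces := map[string]cache.Config{
		config.OperandNamespace: {},
	}
	if dnsNameResolverEnabled {
		// Watch the DNSNameResolver namespace only when the feature gate is
		// on; otherwise the manager's cache never opens a watch for it, so
		// it receives no events for that namespace.
		defaultNamespaces[config.DNSNameResolverNamespace] = cache.Config{}
	}
	return ctrl.NewManager(restConfig, ctrl.Options{
		Cache: cache.Options{DefaultNamespaces: defaultNamespaces},
	})
}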

Contributor:

> should we just stick to removing the dnsnameresolver-crd controller in this PR?

> Then how would we resolve the dilemma in C-D-O?

That dilemma is specific to the helper controller (the dnsnameresolver-crd controller); it is not related to the different caches used in the dnsnameresolver controller. We can still have multiple caches that are added to the manager only when the controller is created, as can be seen in cluster-ingress-operator: https://github.com/openshift/cluster-ingress-operator/blob/5b4f6283f48046b85f60bf9d75fa5c9222329cd1/pkg/operator/controller/route-metrics/controller.go#L43-L51
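
A hedged sketch of that pattern, modelled on the linked cluster-ingress-operator code with illustrative names: the controller builds its own namespace-scoped cache and registers it with the manager, so the watch exists only if the controller is created.

package operator

import (
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// newScopedCache builds a namespace-scoped cache for a controller and ties it
// to the manager's lifecycle. If the controller is never created, neither is
// the cache, so no watch is ever opened.
func newScopedCache(mgr manager.Manager, namespace string) (cache.Cache, error) {
	scoped, err := cache.New(mgr.GetConfig(), cache.Options{
		Scheme:            mgr.GetScheme(),
		DefaultNamespaces: map[string]cache.Config{namespace: {}},
	})
	if err != nil {
		return nil, err
	}
	// mgr.Add starts the cache with mgr.Start and stops it on shutdown.
	if err := mgr.Add(scoped); err != nil {
		return nil, err
	}
	return scoped, nil
}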

@alebedev87 (author):

I see your point. Yes, we can follow up on the cache logic later. Let's keep this PR focused on the removal of the CRD controller.

@@ -1,6 +1,6 @@
module github.com/openshift/coredns-ocp-dnsnameresolver/operator

-go 1.19
+go 1.21
Contributor:

Can you please update this in the README as well?

@alebedev87 (author):

Done.

// want the controller to start if we need it. The dnsnameresolvercrd
// controller starts it and the caches after it creates the
// DNSNameResolver CRD.
dnsNameResolverController, dnsNameResolverControllerCaches, err :=
Contributor:

As we are removing the caches, can we make the manager's cache watch only the OperandNamespace and DNSNameResolverNamespace namespaces?

Comment on lines 125 to 126
-	if err := r.dnsNameResolverCache.Get(ctx, request.NamespacedName, dnsNameResolverObj); err != nil {
+	if err := r.client.Get(ctx, request.NamespacedName, dnsNameResolverObj); err != nil {
Contributor:

This line should be reverted then.

@alebedev87 (author):

Done.

@@ -40,18 +39,19 @@ type Config struct {

 // reconciler handles the actual DNSNameResolver reconciliation logic in response to events.
 type reconciler struct {
-	dnsNameResolverCache cache.Cache
Contributor:

The cache needs to be added back to the reconciler struct.

@alebedev87 (author):

Done.

- Remove dnsnameresolver-crd controller
- Make dnsnameresolver managed
- CRDs should be installed as prerequisite
@alebedev87 alebedev87 changed the title operator: simplify controllers CFE-1037: operator - simplify controllers May 14, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 14, 2024
@openshift-ci-robot commented May 14, 2024

@alebedev87: This pull request references CFE-1037 which is a valid jira issue.
openshift-ci bot commented May 14, 2024

@alebedev87: all tests passed!

@arkadeepsen (Contributor):

/lgtm
/approve

openshift-ci bot commented May 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arkadeepsen
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 14, 2024
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2024
@arkadeepsen (Contributor):

@snarayan-redhat @melvinjoseph86 @CFields651 adding the labels as the changes in this PR will be imported into openshift/cluster-dns-operator#394

/label docs-approved
/label qe-approved
/label px-approved

@openshift-ci openshift-ci bot added docs-approved Signifies that Docs has signed off on this PR qe-approved Signifies that QE has signed off on this PR labels May 14, 2024
@openshift-ci-robot commented May 14, 2024

@alebedev87: This pull request references CFE-1037 which is a valid jira issue.
@openshift-ci openshift-ci bot added the px-approved Signifies that Product Support has signed off on this PR label May 14, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit c41cdd1 into openshift:main May 14, 2024
7 checks passed