Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: creation of monitor service when operator is cluster-scoped #2601

Merged

Conversation

camilamacedo86
Copy link
Contributor

@camilamacedo86 camilamacedo86 commented Feb 26, 2020

Description of the change:

  • Fix ServiceMonitor creation when the operator is cluster-scoped and the environment variable WATCH_NAMESPACE has not as value the namespace where the operator was deployed.
  • Update mock data ( test-framework ) with the latest version of the main.go.

Motivation for the change:
Closes: #2602
See: #2494 (comment)

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 26, 2020
@camilamacedo86 camilamacedo86 changed the title fix: creation of monitor service when operator is cluster-scoped WIP fix: creation of monitor service when operator is cluster-scoped Feb 26, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 26, 2020
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 26, 2020
@camilamacedo86 camilamacedo86 changed the title WIP fix: creation of monitor service when operator is cluster-scoped fix: creation of monitor service when operator is cluster-scoped Feb 26, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 26, 2020
Copy link
Member

@fabianvf fabianvf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 26, 2020
@camilamacedo86 camilamacedo86 added kind/bug Categorizes issue or PR as related to a bug. metrics labels Feb 27, 2020
Copy link
Member

@estroz estroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this should go into the version migration guide, since it requires users to change a scaffolded file.

CHANGELOG.md Outdated Show resolved Hide resolved
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 27, 2020
@openshift-ci-robot
Copy link

New changes are detected. LGTM label has been removed.

@camilamacedo86
Copy link
Contributor Author

camilamacedo86 commented Feb 27, 2020

Hi @estroz,

Screenshot 2020-02-27 at 19 27 28

Note that in this specific scenario I am planning to do more changes in the metrics scaffold see: #2603 and #2606

So, I'd like to add all in the same section in the migration guide. Could we agree in move forward with the following PR in this case?

The following PR is: https://github.com/operator-framework/operator-sdk/pull/2625/files

estroz
estroz previously requested changes Mar 6, 2020
Copy link
Member

@estroz estroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the correct namespace to addMetrics instead of interpolating it within the function is a better way to handle cluster-scoped operator metrics issues, as @shawn-hurley proposed.

Camila Macedo added 2 commits March 9, 2020 13:40
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 9, 2020
@camilamacedo86 camilamacedo86 dismissed estroz’s stale review March 9, 2020 13:52

Done changes requested by @estroz. Dimissing his review because he is in PTO

@asmacdo asmacdo removed their request for review March 9, 2020 14:11
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 9, 2020
@camilamacedo86
Copy link
Contributor Author

HI @estroz and @shawn-hurley,

Your suggestion here #2601 (comment) and:

Screenshot 2020-03-09 at 19 59 00

Cannot be done

Following the reasons:

1 - The Operator namespace is not required before we call the addMetrics, so, we have not this info before that.
2 - The getOperatorNamespace cannot be called outside of the addMetrics because it will break the project when the operator is running locally. See the addMetrics will be ignored/skipped when we are checking it with operator-sdk run --local for we do not face the issue:

return "", ErrRunLocal
.
3 - Note that ns that has been passed for it before this change is NOT the ns that we need in the metrics.

@camilamacedo86
Copy link
Contributor Author

Hi @estroz and @shawn-hurley,

I am moving forward with this one and I hope that the #2601 (comment) clarifies why we cannot address your suggestion. However, please feel free to reach me out if you think that some following up pr should be done here.

@camilamacedo86 camilamacedo86 merged commit 770a2c5 into operator-framework:master Mar 10, 2020
@camilamacedo86 camilamacedo86 deleted the fix-issue-metrics branch March 10, 2020 09:29
camilamacedo86 added a commit that referenced this pull request Mar 10, 2020
…lded files. #2625

- Add info over the changes required in the default scaffold done in the `main.go` file to address bug fixes and improvements made in metrics. Related to PR's #2606, #2603 and #2601
_, err = metrics.CreateServiceMonitors(cfg, namespace, services)

// Get the namespace the operator is currently deployed in.
operatorNs, err := k8sutil.GetOperatorNamespace()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camilamacedo86 - Shouldn't https://github.com/operator-framework/operator-sdk/pull/2601/files#diff-6438d7f67aed0d3fd6727a1f27133faeR237 be updated to invoke k8sutil.GetWatchNamespace() if the CRD is cluster-scoped and WATCH_NAMESPACE is set to ""? When using cluster-scoped CRDs, the CRs themselves don't live in the namespace returned by k8sutil.OperatorNamespace(), right?

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, the metrics service is created in the same namespace where the operator is deployed. See: service/memcached-operator-metrics

$ kubectl get all -n memcached 
NAME                                      READY   STATUS    RESTARTS   AGE
pod/example-memcached-7c4df9b7b4-lzd6j    1/1     Running   0          64s
pod/example-memcached-7c4df9b7b4-wbtkz    1/1     Running   0          64s
pod/example-memcached-7c4df9b7b4-wt6jb    1/1     Running   0          64s
pod/memcached-operator-56f54d84bf-zrtfv   1/1     Running   0          69s

NAME                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/example-memcached            ClusterIP   10.108.124.47   <none>        11211/TCP           63s
service/memcached-operator-metrics   ClusterIP   10.108.67.82    <none>        8383/TCP,8686/TCP   66s

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/example-memcached    3/3     3            3           65s
deployment.apps/memcached-operator   1/1     1            1           70s

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/example-memcached-7c4df9b7b4    3         3         3       65s
replicaset.apps/memcached-operator-56f54d84bf   1         1         1       70s

I think you are trying to raise here the scenario where the service/CR should be created in the namespace where the CR is applied. Am I right? I am raising an issue to check this scenario and see how better we can address it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See; #2707

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I'm not talking about the creation of the actual Kubernetes Service and ServiceMonitor resources. I'm talking about the way in which the Go code that serves the custom resource metrics tracks which custom resources to report on. After filtering the GVKs, serveCRMetrics restricts which namespaces should be searched for custom resources:

	// Get the namespace the operator is currently deployed in.
	operatorNs, err := k8sutil.GetOperatorNamespace()
	if err != nil {
		return err
	}
	// To generate metrics in other namespaces, add the values below.
	ns := []string{operatorNs}
	// Generate and serve custom resource specific metrics.
	err = kubemetrics.GenerateAndServeCRMetrics(cfg, ns, filteredGVK, metricsHost, operatorMetricsPort)
	if err != nil {
		return err
	}

The call to k8sutil.GetOperatorNamespace(), returns the name of the namespace in which the operator is running, but cluster-scoped resources don't belong to a namespace. I think the call to k8sutil.GetOperatorNamespace() need to be replaced with a call to k8sutil. GetWatchNamespace(). Only then will the namespace against which to filter CRs be correctly set to "" when using cluster-scoped CRDs

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @robbie-demuth,

Note that it was NOT really changed in this PR. Before that, it was using the OperatorNamespace. See; https://github.com/operator-framework/operator-sdk/blob/v0.15.x/internal/scaffold/cmd.go#L212. So, let's discuss your suggestion and check it via issue #2707
c/c @estroz

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. I just commented here because this PR was listed as a fix for #1858 and, without the update to serveCRMetrics, the issue isn't really resolved

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that issue #1858 can be caused by many raisons and we addressed the diff scenarios in more than one pr. If you check the current scaffold you will see:

         // The metrics will be generated from the namespaces which are returned here.
	// NOTE that passing nil or an empty list of namespaces in GenerateAndServeCRMetrics will result in an error.
	ns, err := kubemetrics.GetNamespacesForMetrics(operatorNs)
	if err != nil {
		return err
	}
	// Generate and serve custom resource specific metrics.
	err = kubemetrics.GenerateAndServeCRMetrics(cfg, ns, filteredGVK, metricsHost, operatorMetricsPort)
	if err != nil {
		return err
	}

Then, it has a clear msg over what is expected in this place. NOTE that passing nil or an empty list of namespaces in GenerateAndServeCRMetrics will result in an error. and then, because of this, we cannot use the WATCH_NAMESPACE in this point.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated one of my projects to call k8sutil.GetWatchNamespace() instead of k8sutil.GetOperatorNamespace() and it does work. It's important to note that if WATCH_NAMESPACE is set to "", the list of namespaces passed to GenerateAndServeCRMetrics won't be nil or empty. Instead it'll be equivalent to []string{""}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to create ServiceMonitor when WATCH_NAMESPACE has not the namespace used to deploy the operator
7 participants