Failed to list *unstructured.Unstructured errors in kubemetrics #2577
Comments
Hello @miguelsorianod, thank you!
Hi @miguelsorianod, Note that your error is related to permissions.
It shows that you are missing the permission to list the resource named in the error. Could you please ensure that you have all the permissions that are scaffolded by default, such as here, and let us know whether that solves your problem?
Hi @camilamacedo86, Our operator does not make use of replicasets.
A commit that can be used to see/use the operator code is:
Hi @miguelsorianod, The permissions in the file 3scale/3scale-operator@61d3d73 appear to be correct. However, note that you are creating a Role and a RoleBinding. Are ALL the resources applied in the same namespace as your operator? If not, you need to set them up as cluster-scoped. If yes, ensure that you are indeed applying all the resources (the role, the service account, etc.) in the same namespace used by your operator. Please let us know if it worked.
Hi, I can confirm that we apply/create all the resources (role, service account and rolebinding) in the same namespace used by our operator.
Hi @miguelsorianod, See that the error faced by you is:
So, it means that an RBAC permission is missing, and because of this you hit an error in serveCRMetrics in your code. However, looking at your code, you are NOT using the SDK implementation to generate the metrics for the custom resources: your serveCRMetrics is not the one provided by the SDK, and you have other customizations in place as well. So, as the logs and your code show, the error is NOT related to SDK features or the implementations it provides. I am therefore closing this as sorted out: the error comes from code that is NOT the code the SDK provides to export the metrics, and we could not reproduce it with the SDK implementations. I hope the above helps you solve the problem in your customizations. Please let us know if you think it should be re-opened because something still needs to be addressed here.
Hi @camilamacedo86, I think there's been some confusion. The implementation you linked to belongs to a new commit that we added to try to workaround the issue that we reported.
This commit is the second one in that branch and does not include the workaround customizations that we applied. If you check out that commit (instead of the "changes-operator-sdk-upgrade" branch directly) you will see that those modifications do not exist and that we are using the SDK implementation for serveCRMetrics: Sorry for the confusion.
Hi @miguelsorianod, Note that it still has customizations. See, for example, https://github.com/3scale/3scale-operator/blob/61d3d730556cb19c2e11c7b8180b293d8bf0a54f/cmd/manager/main.go#L134. Also, note your imports. And looking at the open PRs, if you are using https://github.com/3scale/3scale-operator/pull/327/files#diff-9e63f7b0791edcb4a53c944552b88970R199 to test, it is not using the SDK code implementation to create the metrics. Also, we do not really know how your project works; however, we are unable to reproduce your scenario with any project scaffolded with the SDK that uses its implementation without any customization. Could you provide steps that let us reproduce it using the Memcached samples, for example, or by following the Getting Started or Quick Start guides?
Hi, I'm unable to reproduce it with an empty operator-sdk project either. An empty operator project called "exampleproject" has been created in the following way:
Regarding the remaining customizations: those are not related to serveCRMetrics. In any case, to avoid confusion when referring to files, and to rule out any problem related to them, I created a new branch called "remove-maincustomization", available here: https://github.com/3scale/3scale-operator/tree/remove-maincustomization . Diffing locally the main.go file of the project where we have the issue against the main.go file of the new empty operator project (named exampleproject) shows that there are no customizations:
Notice how the only differences detected are due to changes in the operator project name. Then we deploy the operator with:
and looking at the operator pod logs we see all those RBAC-related problems.
Hi @miguelsorianod, Note that this is really more specific to the process of upgrading your project and its code than to SDK features. So, I am not sure how we can help you further. I'd recommend doing the upgrade step by step, so that it is easier to see what may be wrong or missing. Please let us know if you manage to solve it.
Hi @miguelsorianod, All the information provided so far is still valid. However, we could reproduce a similar issue, which may also apply in your case. See #1858 (comment). I thought that scenario could be helpful for you. Also, if you are not using the default metrics provided by the SDK, you can simply leave out the addMetrics implementation for now, until your project has been upgraded successfully, since it can be added later. c/c @varshaprasad96
Hi @camilamacedo86, We have been doing more tests and this is what we have found: The issue we experienced is not related to the operator-sdk upgrade process for our project. From what we have seen, kube-metrics tries to list all the GVKs it receives, whether they are being used or not.
By creating a new empty operator-sdk v0.15.2 project and applying just that modification, we are able to reproduce the problem. If we focus, for example, on the line imagev1.Install, "imagev1" is the import:
And imagev1.Install corresponds to: Which is an addToScheme function. That addToScheme function (https://github.com/openshift/api/blob/c99157bc6492218400b18cde7f66510f9ee56a9a/image/v1/register.go#L16) does:
Notice, for example, dockerpre012.AddToScheme, which comes from another OpenShift package.
Our operator does not make use of the Kinds defined in image.openshift.io/pre012. Consequently, we don't have permissions for those Kinds or that API group in role.yaml, as we don't use them in our operator, nor do we want to give it unnecessary permissions. We just use a subset of the registered Kinds. However, kube-metrics tries to perform actions (List) on all of them. This happens with all the Kinds that are registered automatically when you add them to the Scheme. The observed log errors are due to those Kinds that we don't have permissions for: we don't use them, but they are added to the list of GVKs anyway and kube-metrics lists them all. To avoid having to add permissions in our role.yaml for Kinds or API groups that our operator doesn't use at all, we have tried to apply a workaround that further filters the GVKs before sending them to kube-metrics. This can be seen here:
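The filtering idea described in this comment can be sketched roughly as follows. This is a minimal illustration only: it is neither the 3scale operator's actual workaround nor operator-sdk code. `GVK` and `GV` are local stand-ins for `schema.GroupVersionKind` and `schema.GroupVersion` from k8s.io/apimachinery so that the snippet compiles without dependencies, `filterGVKs` is a hypothetical helper name, and the Kind names in `main` are placeholders.

```go
package main

import "fmt"

// GVK is a local stand-in for schema.GroupVersionKind (assumption:
// only these three fields matter for the filtering step).
type GVK struct {
	Group, Version, Kind string
}

// GV identifies an API group/version pair, e.g. image.openshift.io/v1.
type GV struct {
	Group, Version string
}

// filterGVKs (hypothetical helper) keeps only the GVKs whose
// group/version pair is in the allow list. The allow list should
// mirror what the operator's role.yaml actually grants.
func filterGVKs(all []GVK, allowed map[GV]bool) []GVK {
	kept := make([]GVK, 0, len(all))
	for _, gvk := range all {
		if allowed[GV{Group: gvk.Group, Version: gvk.Version}] {
			kept = append(kept, gvk)
		}
	}
	return kept
}

func main() {
	// Kinds as they might end up in the Scheme after calling a
	// composite Install function. The Kind names are placeholders.
	registered := []GVK{
		{Group: "image.openshift.io", Version: "v1", Kind: "KindUsedByOperator"},
		{Group: "image.openshift.io", Version: "pre012", Kind: "KindNotUsed"},
	}

	// Only the group/versions the operator has RBAC permissions for.
	allowed := map[GV]bool{
		{Group: "image.openshift.io", Version: "v1"}: true,
	}

	// Only the filtered list would be handed to the metrics code.
	for _, gvk := range filterGVKs(registered, allowed) {
		fmt.Printf("%s/%s %s\n", gvk.Group, gvk.Version, gvk.Kind)
	}
}
```

Filtering on the group/version pair rather than on the group alone matters here, because the pre012 Kinds share the image.openshift.io group with the v1 Kinds the operator does use.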
Hi @miguelsorianod, Thank you for your input and reply, and for providing the steps and information that allow us to check it. We will look into this and I will keep you updated. c/c @varshaprasad96
Hi @miguelsorianod, Thank you very much for your input; it was very helpful. Although I understand this is already solved for you, let's add a summary here so that you can check the actions taken as well. The error
Please let us know if there is anything else that you believe should be addressed here. Otherwise, this issue will be closed when PR #2606 gets merged.
Hi,
I'm trying to upgrade our existing operator to be based on operator-sdk v0.15.2 but I'm experiencing problems that seem related to the addition of metrics.
What did you do?
Run an operator.
I've also made sure that the changes implemented in 0.9.x related to metrics https://github.com/operator-framework/operator-sdk/blob/master/doc/migration/version-upgrade-guide.md#v09x and newer changes are incorporated into the cmd/manager/main.go file.
What did you expect to see?
The operator starts without errors
What did you see instead? Under which circumstances?
When starting the operator, lots of errors related to kube-metrics are reported.
For example:
Other surprising errors that we've seen among the logs/executions are:
or:
The errors keep being repeated constantly.
Additionally, the types that the "failed to list" errors refer to appear to be different each time the operator starts.
Some of the error messages are kind of surprising too. For example, this one:
That refers to daemonsets. In our operator code we are not using any DaemonSet.
The role.yaml file is:
To provide some more information that might be relevant, we are deploying OpenShift objects too, and on our addtoscheme_* file in the pkg/apis package we add some OpenShift schemes into the "AddToSchemes" variable:
Do you know what might be happening?
Environment
Kubernetes cluster kind:
OpenShift 4.1.24 installation
Are you writing your operator in ansible, helm, or go?
The operator is written in Go