
Could not install NodeFeatureDiscovery on OKD 4.5 and 4.6 clusters via OperatorHub #144

Closed
rupang790 opened this issue Mar 18, 2021 · 16 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@rupang790

On my OKD cluster, I found the NFD operator on OperatorHub, so I tried to install it from there, but the installation failed.
(Before I found it on OperatorHub, I used to install it manually: I cloned the repository from Git and deployed it with the make command.)
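For reference, the manual install looked roughly like this (a sketch; it assumes the openshift/cluster-nfd-operator repository and that its Makefile exposes a deploy target, which may differ between versions):

# Clone the operator repository and deploy it into the logged-in cluster.
git clone https://github.com/openshift/cluster-nfd-operator.git
cd cluster-nfd-operator
# The exact target name may vary by release (e.g. deploy vs. install).
make deploy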

I tried the installation both ways (install in all namespaces, the default, and in a specific namespace), but the following error occurred:
[screenshot: installation error shown in the web console]

{"level":"info","ts":1616045944.9220765,"logger":"cmd","msg":"Go Version: go1.15.5"}
{"level":"info","ts":1616045944.9221418,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1616045944.9221659,"logger":"cmd","msg":"Version of operator-sdk: v0.4.0+git"}
{"level":"info","ts":1616045944.922823,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1616045946.8118556,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1616045946.8415146,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1616045950.8142676,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1616045950.8169394,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1616045950.817697,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1616045950.818323,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8187845,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8190293,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8192632,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8194685,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8199208,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1616045950.8208203,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8210897,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8213153,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.821549,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8217416,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8219752,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1616045950.8221507,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"nodefeaturediscovery-controller"}
{"level":"info","ts":1616045951.0225158,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"nodefeaturediscovery-controller","worker count":1}

How can I solve this? Does anyone have the same error?
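For anyone debugging the same failure, a minimal sketch of the OLM objects to inspect, assuming the operator was installed into the default openshift-operators namespace:

# The ClusterServiceVersion (CSV) records the install phase and any failure reason.
oc get csv -n openshift-operators
oc describe csv <nfd-csv-name> -n openshift-operators   # <nfd-csv-name> is a placeholder

# The Subscription and InstallPlan show what OLM resolved and tried to apply.
oc get subscription,installplan -n openshift-operators

# Logs of the operator pod itself (the log above came from the operator pod).
oc logs deployment/nfd-operator -n openshift-operators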

@lgc0313

lgc0313 commented Mar 23, 2021

Same error on OCP 4.6.19.

@ArangoGutierrez
Contributor

Hi @lgc0313, I am going to fix this today. It looks like the community operator on OperatorHub is outdated.

@lgc0313

lgc0313 commented Mar 23, 2021

@ArangoGutierrez Thanks. Waiting for the fix to roll out.

@ArangoGutierrez
Contributor

operator-framework/community-operators#3402 is merged; give it a day to roll out. All fixes should be in place now.

@lgc0313

lgc0313 commented Apr 7, 2021

@ArangoGutierrez The same issue is still occurring now.

@ArangoGutierrez
Contributor

Hi @rupang790 @lgc0313, is this still an issue?
If so, would you let me know the version of your cluster and the version of the operator installed from OperatorHub?
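A sketch of commands that surface that information:

oc version               # client and server versions
oc get clusterversion    # cluster version and update status
oc get csv -A            # installed operator versions (CSVs) across namespaces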

@rupang790
Author

Hi @ArangoGutierrez,
On my OKD cluster (4.6.0-0.okd-2021-02-14-205305) it now installs fine through OperatorHub. I used NFD version 4.7 and it works for me. For @lgc0313's sake, I will leave this issue open.

However, if I want to install the Special Resource Operator (SRO), should I delete the NFD operator first?
(I saw that the Special Resource Operator installs NFD itself again.)

Thank you for fixing the issue.

@ArangoGutierrez
Contributor

For SRO, let's ask @dagrayvid.

@dagrayvid
Contributor

dagrayvid commented Apr 19, 2021

Hi @rupang790, we shouldn't need to uninstall NFD before installing SRO. I think it was installing NFD again because SRO's dependency on NFD was out of date and asked for an NFD version older than 4.7, so it installed 4.5 or 4.6 in addition to the already-installed NFD 4.7. This was updated last Friday in community-operators, so it should now work with NFD 4.7 already installed.

Let me know if you have any questions or are still running into this issue!
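A quick way to verify the dependency resolution after installing SRO (a sketch, assuming both operators land in the openshift-operators namespace):

# Both CSVs should appear and reach the Succeeded phase.
oc get csv -n openshift-operators
# The SRO Subscription should no longer pull in a second, older NFD.
oc get subscription -n openshift-operators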

@rupang790
Author

rupang790 commented Apr 20, 2021

@dagrayvid, I tried to install SRO on my cluster, but it seems stuck in the installing status.
I used OperatorHub to install it and confirmed the NFD version is 4.7.
[screenshots: SRO install status and NFD 4.7 shown in OperatorHub]

I can see that it created only a service (no deployment or daemonset for the operator):

Every 1.0s: oc get all -n openshift-operators                                                                                                                                                          okd-bastion01: Tue Apr 20 08:21:27 2021

NAME                                READY   STATUS    RESTARTS   AGE
pod/nfd-master-2xk2r                1/1     Running   0          31h
pod/nfd-master-qzxs2                1/1     Running   0          31h
pod/nfd-master-w5t5c                1/1     Running   0          31h
pod/nfd-operator-576d77d47f-r9qrf   1/1     Running   0          31h
pod/nfd-worker-2hc2f                1/1     Running   0          31h
pod/nfd-worker-4jrs8                1/1     Running   0          31h
pod/nfd-worker-4q8jt                1/1     Running   0          31h
pod/nfd-worker-k5xmz                1/1     Running   0          31h

NAME                                                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/nfd-master                                            ClusterIP   172.30.101.100   <none>        12000/TCP   31h
service/special-resource-controller-manager-metrics-service   ClusterIP   172.30.200.31    <none>        8443/TCP    10m

NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
daemonset.apps/nfd-master    3         3         3       3            3           node-role.kubernetes.io/master=   31h
daemonset.apps/nfd-worker    4         4         4       4            4           <none>                            31h

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nfd-operator   1/1     1            1           31h

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/nfd-operator-576d77d47f   1         1         1       31h

NAME                                                              AGE
vmimportconfig.v2v.kubevirt.io/vmimport-kubevirt-hyperconverged   39d

Are there any logs about the installation of operators? (See the commands sketched at the end of this comment.)
One more thing: can SRO be installed when a GPU device exists on a node?

If I should create a new issue about this on the SRO GitHub, please tell me.
Thank you for the comments.
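Regarding the question about installation logs, a sketch of where OLM records them (openshift-operator-lifecycle-manager is the standard OLM namespace on OpenShift/OKD):

# OLM's components log dependency resolution and install errors.
oc logs deployment/catalog-operator -n openshift-operator-lifecycle-manager
oc logs deployment/olm-operator -n openshift-operator-lifecycle-manager

# The InstallPlan for SRO shows exactly what OLM tried to create.
oc get installplan -n openshift-operators
oc describe installplan <installplan-name> -n openshift-operators   # placeholder name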

@dagrayvid
Contributor

Hi @rupang790, I haven't seen this error before, so I will have to investigate. Please open an issue on the upstream SRO GitHub to track this.

Is this on OKD? Having GPU devices on the node should not cause any problems.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 19, 2021
@paravatha

I am trying to install NFD 4.7 on OCP 4.6.26, but I am getting this error:

[screenshot of the error]

@rupang790 @dagrayvid were you able to get this working?

@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 29, 2021
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this as completed Sep 28, 2021
@openshift-ci
Contributor

openshift-ci bot commented Sep 28, 2021

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
