GET result: Internal Server Error on installation of worker node #174

Closed
james-liubang opened this issue May 11, 2020 · 16 comments
Labels
platform/vsphere, triage/needs-information (Indicates an issue needs more information in order to work on it.)

Comments


james-liubang commented May 11, 2020

  1. Describe the bug
    Regarding installation following the steps of the linked guide:
    on ESXi 6.5 I have already installed the services node, 1 bootstrap node, and 3 master nodes. When I installed the 2 worker nodes, they threw the error below:
    [ 25.447099] ignition[665]: GET result: Internal Server Error
    [ 25.447726] ignition[665]: GET https://api-int.lab.okd.local:22623/config/worker: attempt #179

I can see the console of the system on ESXi 6.5, but the node can't be logged into via SSH.

  2. Version
    https://github.com/openshift/okd/releases/download/4.4.0-0.okd-2020-04-21-163702-beta4/openshift-client-linux-4.4.0-0.okd-2020-04-21-163702-beta4.tar.gz

  3. How reproducible
    100%

  4. Log bundle

bootstrap bundle.log

[core@okd4-bootstrap ~]$ oc adm must-gather
error: Missing or incomplete configuration info. Please login or point to an existing, complete config file:

  1. Via the command-line flag --kubeconfig
  2. Via the KUBECONFIG environment variable
  3. In your home directory as ~/.kube/config

To view or setup config directly use the 'config' command
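For reference, oc reads credentials from the kubeconfig generated by the installer, so it must be exported before must-gather (or any other command) can work, and must-gather itself needs a reachable API. A minimal sketch; the /install_dir path is the one used later in this thread and is an assumption here:

$ export KUBECONFIG=/install_dir/auth/kubeconfig
$ oc adm must-gather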

@vrutkovs added the platform/vsphere and triage/needs-information labels May 11, 2020

giatule commented May 11, 2020

Same issue; I tried so many times. The issue happens with compute-1, while compute-2 is OK.

@andriuss-amd

Is this happening before the master nodes are bootstrapped, or after? (If before, that is a deliberate change in machine-config-operator.)

@james-liubang (Author)

@andriuss-xilinx It is happening after the master nodes are bootstrapped.

Before this, I also saw another problem, 'ignition[483]: Get Error: x509: certificate signed by unknown authority', which looks like this one:
#165
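For reference, the certificate served on the machine-config port can be inspected directly to see which authority signed it; a sketch, reusing the hostname from the report above:

$ openssl s_client -connect api-int.lab.okd.local:22623 -showcerts </dev/null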


eselvam commented May 17, 2020

I also face the same issue while building the worker nodes after the master nodes. They are unable to get the machine config from port 22623 on the bootstrap node. How can this be fixed?

@james-liubang (Author)

Hi team, I reinstalled everything and the issues above did not appear again, but there is a new problem:
[core@okd4-bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service
-- Logs begin at Sun 2020-05-17 06:25:46 UTC. --
May 18 07:28:47 okd4-bootstrap bootkube.sh[849]: E0518 07:28:47.736943 1 reflector.go:153] k8s.io/client-go@v0.17.1/tools/cache/reflector.go:105: Failed to list *v1.Etcd: Get https://api-int.lab.okd.local:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0: EOF
May 18 07:28:48 okd4-bootstrap bootkube.sh[849]: E0518 07:28:48.738187 1 reflector.go:153] k8s.io/client-go@v0.17.1/tools/cache/reflector.go:105: Failed to list *v1.Etcd: Get https://api-int.lab.okd.local:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0: EOF
May 18 07:28:49 okd4-bootstrap bootkube.sh[849]: E0518 07:28:49.739736 1 reflector.go:153] k8s.io/client-go@v0.17.1/tools/cache/reflector.go:105: Failed to list *v1.Etcd: Get https://api-int.lab.okd.local:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0: EOF
May 18 07:28:50 okd4-bootstrap bootkube.sh[849]: E0518 07:28:50.741369 1 reflector.go:153] k8s.io/client-go@v0.17.1/tools/cache/reflector.go:105: Failed to list *v1.Etcd: Get https://api-int.lab.okd.local:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0: EOF
May 18 07:28:51 okd4-bootstrap bootkube.sh[849]: E0518 07:28:51.742795 1 reflector.go:153] k8s.io/client-go@v0.17.1/tools/cache/reflector.go:105: Failed to list *v1.Etcd: Get https://api-int.lab.okd.local:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0: EOF
May 18 07:28:52 okd4-bootstrap bootkube.sh[849]: E0518 07:28:52.744078 1 reflector.go:153] k8s.io/client-go@v0.17.1/tools/cache/reflector.go:105: Failed to list *v1.Etcd: Get https://api-int.lab.okd.local:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0: EOF
May 18 07:28:53 okd4-bootstrap bootkube.sh[849]: E0518 07:28:53.745425 1 reflector.go:153] k8s.io/client-go@v0.17.1/tools/cache/reflector.go:105: Failed to list *v1.Etcd: Get https://api-int.lab.okd.local:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0: EOF
May 18 07:28:54 okd4-bootstrap bootkube.sh[849]: E0518 07:28:54.746999 1 reflector.go:153] k8s.io/client-go@v0.17.1/tools/cache/reflector.go:105: Failed to list *v1.Etcd: Get https://api-int.lab.okd.local:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0: EOF
May 18 07:29:01 okd4-bootstrap bootkube.sh[849]: I0518 07:29:01.236780 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
May 18 07:29:02 okd4-bootstrap bootkube.sh[849]: I0518 07:29:02.263193 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
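For reference, the EOF errors above suggest the internal API endpoint is not consistently reachable; it can be probed directly from the bootstrap node. A sketch: /healthz is served unauthenticated by the kube-apiserver and should return "ok" once the API is up.

$ curl -k https://api-int.lab.okd.local:6443/healthz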

@james-liubang (Author)

[admin@okd4-services ~]$ export KUBECONFIG=/install_dir/auth/kubeconfig
[admin@okd4-services ~]$ oc whoami
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get users.user.openshift.io ~)
[admin@okd4-services ~]$ oc get nodes
NAME                   STATUS     ROLES           AGE   VERSION
okd4-control-plane-1   NotReady   master,worker   21h   v1.17.1
okd4-control-plane-2   NotReady   master,worker   21h   v1.17.1
okd4-control-plane-3   NotReady   master,worker   21h   v1.17.1
[admin@okd4-services ~]$ oc get csr
NAME AGE REQUESTOR CONDITION
csr-2nl5k 55m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-2pfzr 4h26m system:node:okd4-control-plane-2 Approved,Issued
csr-2qp9n 173m system:node:okd4-control-plane-2 Pending
csr-2wpzs 4h26m system:node:okd4-control-plane-3 Approved,Issued
csr-478lr 21h system:node:okd4-control-plane-3 Approved,Issued
csr-49dpp 106m system:node:okd4-control-plane-1 Pending
csr-4zjqf 148m system:node:okd4-control-plane-3 Pending
csr-54f9n 21h system:node:okd4-control-plane-1 Approved,Issued
csr-5lfj5 4h59m system:node:okd4-control-plane-1 Approved,Issued
csr-5sj4x 4h50m system:node:okd4-control-plane-2 Approved,Issued
csr-5wp45 3h3m system:node:okd4-control-plane-1 Pending
csr-5xn9f 169m system:node:okd4-control-plane-2 Pending
csr-64mwr 121m system:node:okd4-control-plane-1 Approved,Issued
csr-662jr 169m system:node:okd4-control-plane-3 Pending
csr-666rv 143m system:node:okd4-control-plane-2 Pending
csr-699h4 70m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-6lnzb 21h system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-769zr 4h58m system:node:okd4-control-plane-3 Pending
csr-7b67c 3h13m system:node:okd4-control-plane-3 Pending
csr-7fssl 148m system:node:okd4-control-plane-2 Pending
csr-7g79b 96m system:node:okd4-control-plane-2 Approved,Issued
csr-7s7vm 4h8m system:node:okd4-control-plane-3 Approved,Issued
csr-7thr7 4h58m system:node:okd4-control-plane-2 Approved,Issued
csr-7z4gr 19m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-8694v 5h32m system:node:okd4-control-plane-2 Approved,Issued
csr-8kmmj 106m system:node:okd4-control-plane-1 Approved,Issued
csr-96bpv 3h13m system:node:okd4-control-plane-2 Pending
csr-97l5f 3h32m system:node:okd4-control-plane-2 Approved,Issued
csr-998rd 39m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-bcbm7 138m system:node:okd4-control-plane-3 Pending
csr-bcwkt 86m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-bjwtr 24m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-bwkbj 8m20s system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-clspl 3h57m system:node:okd4-control-plane-1 Approved,Issued
csr-dwnvf 4h14m system:node:okd4-control-plane-1 Approved,Issued
csr-frbkk 3h18m system:node:okd4-control-plane-1 Pending
csr-fxtl4 3h13m system:node:okd4-control-plane-2 Pending
csr-g84bv 5h34m system:node:okd4-control-plane-1 Approved,Issued
csr-gbdh5 3h19m system:node:okd4-control-plane-3 Approved,Issued
csr-gbh2w 3h34m system:node:okd4-control-plane-1 Approved,Issued
csr-gdmw2 34m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-gqjvb 122m system:node:okd4-control-plane-1 Pending
csr-gsnxz 3h49m system:node:okd4-control-plane-3 Approved,Issued
csr-gvl48 3h34m system:node:okd4-control-plane-3 Approved,Issued
csr-gwk6m 4h52m system:node:okd4-control-plane-2 Approved,Issued
csr-h4zkl 86m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-h5nzx 3h13m system:node:okd4-control-plane-2 Approved,Issued
csr-h689s 4h32m system:node:okd4-control-plane-1 Approved,Issued
csr-hv45c 21h system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-hx6jm 23m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-j6rvj 5h35m system:node:okd4-control-plane-2 Approved,Issued
csr-jd6ht 3h32m system:node:okd4-control-plane-3 Approved,Issued
csr-jttcb 3h32m system:node:okd4-control-plane-2 Approved,Issued
csr-jvr6v 106m system:node:okd4-control-plane-1 Approved,Issued
csr-jzs2g 49m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-k7z8h 3h4m system:node:okd4-control-plane-1 Approved,Issued
csr-kd794 4h14m system:node:okd4-control-plane-1 Approved,Issued
csr-krjjs 5h32m system:node:okd4-control-plane-1 Approved,Issued
csr-l48wp 70m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-l6tc4 106m system:node:okd4-control-plane-3 Pending
csr-ldswc 122m system:node:okd4-control-plane-3 Pending
csr-lg2hn 173m system:node:okd4-control-plane-3 Pending
csr-lmxv7 5h13m system:node:okd4-control-plane-3 Pending
csr-lvnm7 8m57s system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-lwpsl 5h4m system:node:okd4-control-plane-2 Approved,Issued
csr-mghkk 106m system:node:okd4-control-plane-2 Pending
csr-mxjgf 143m system:node:okd4-control-plane-3 Pending
csr-n86n2 3h13m system:node:okd4-control-plane-3 Pending
csr-nfvdn 4h26m system:node:okd4-control-plane-3 Pending
csr-nh6mq 39m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-nlnvv 3h40m system:node:okd4-control-plane-1 Approved,Issued
csr-nvzrd 80m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-nwwh9 138m system:node:okd4-control-plane-3 Pending
csr-pfsxd 142m system:node:okd4-control-plane-1 Pending
csr-pjqtw 3h51m system:node:okd4-control-plane-2 Approved,Issued
csr-pl6sc 142m system:node:okd4-control-plane-2 Pending
csr-pplfx 4h50m system:node:okd4-control-plane-3 Pending
csr-qk7xx 3h50m system:node:okd4-control-plane-1 Pending
csr-qlkc2 139m system:node:okd4-control-plane-3 Pending
csr-rc9z8 5h20m system:node:okd4-control-plane-2 Approved,Issued
csr-rlskf 3h19m system:node:okd4-control-plane-1 Pending
csr-rpnf8 65m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-rsftf 121m system:node:okd4-control-plane-1 Approved,Issued
csr-s2vdf 55m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-sts4c 142m system:node:okd4-control-plane-1 Pending
csr-t25z2 3m53s system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-t2rvl 112m system:node:okd4-control-plane-2 Approved,Issued
csr-t7z6v 122m system:node:okd4-control-plane-3 Approved,Issued
csr-tn822 3h24m system:node:okd4-control-plane-1 Approved,Issued
csr-v45k5 139m system:node:okd4-control-plane-3 Pending
csr-v9nts 21h system:node:okd4-control-plane-2 Approved,Issued
csr-vd72r 3h50m system:node:okd4-control-plane-3 Approved,Issued
csr-vx79j 3h49m system:node:okd4-control-plane-2 Pending
csr-w5z4d 167m system:node:okd4-control-plane-1 Pending
csr-wgp4w 5h20m system:node:okd4-control-plane-1 Approved,Issued
csr-whzsh 4h52m system:node:okd4-control-plane-3 Approved,Issued
csr-wth6q 21h system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-wvzqz 142m system:node:okd4-control-plane-3 Pending
csr-x4hhr 139m system:node:okd4-control-plane-2 Pending
csr-x5nzp 3h49m system:node:okd4-control-plane-2 Approved,Issued
csr-xvnjg 106m system:node:okd4-control-plane-3 Approved,Issued
csr-z9xp5 139m system:node:okd4-control-plane-1 Pending
csr-zbnl6 3h3m system:node:okd4-control-plane-3 Approved,Issued
csr-zggnb 54m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-znths 5h35m system:node:okd4-control-plane-1 Approved,Issued
csr-zrw5k 122m system:node:okd4-control-plane-2 Pending
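For reference, the many Pending node-bootstrapper CSRs above would normally need approval before new nodes can join; a minimal sketch for approving them in bulk, assuming KUBECONFIG is still exported as above (note this approves every listed CSR, so use with care):

[admin@okd4-services ~]$ oc get csr -o name | xargs oc adm certificate approve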

@thomasmeeus

Same issue here. Bootstrapping the master nodes goes fine, but the worker nodes get an "Internal Server Error" when fetching their config from the internal API. 100% reproducible with 4.4.0-0.okd-2020-05-18-142413.

@thomasmeeus

@vrutkovs the log bundle can be downloaded here: https://sector7g.be/log-bundle-20200518213832.tar.gz

FYI: I've deployed similar clusters with the Red Hat OpenShift flavor without issues, so I'm quite experienced with the deployment procedure, but it could be something very stupid I'm doing wrong.

Some additional info:

  • $loadbalancer_ip:6443 and $loadbalancer_ip:22623 are "green" in the load balancer for all master nodes (i.e. listening on the TCP port). The 2 workers stay red. Both workers show the "Internal Server Error" on the VMware console.
  • After a long time (over an hour) of getting the "Internal Server Error", the worker nodes do seem to pull in their config; it takes at least 100+ failed attempts. Even then, the cluster never reaches a healthy state. (The endpoint can also be probed by hand; see the sketch below.)
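For reference, a quick way to watch what the machine-config server returns for the worker config, from any host that resolves the internal API name; a sketch, with the cluster domain as a placeholder:

$ curl -kI https://api-int.<cluster-domain>:22623/config/worker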
$ .tmp/./openshift-install wait-for bootstrap-complete --dir .tmp/config/
INFO Waiting up to 20m0s for the Kubernetes API at https://api.ocp4.cegeka.io:6443...
INFO API v1.17.1 up
INFO Waiting up to 40m0s for bootstrapping to complete...
INFO It is now safe to remove the bootstrap resources
$ oc login https://api.ocp4.cegeka.io:6443
The server uses a certificate signed by an unknown authority.
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): yes

error: couldn't get https://api.ocp4.cegeka.io:6443/.well-known/oauth-authorization-server: unexpected response status 404
$ curl -k -v https://api.ocp4.cegeka.io:6443
* Rebuilt URL to: https://api.ocp4.cegeka.io:6443/
*   Trying 172.29.49.101...
* TCP_NODELAY set
* Connected to api.ocp4.cegeka.io (172.29.49.101) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=api.ocp4.cegeka.io
*  start date: May 18 19:31:58 2020 GMT
*  expire date: Jun 17 19:31:59 2020 GMT
*  issuer: OU=openshift; CN=kube-apiserver-lb-signer
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7febef00b400)
> GET / HTTP/2
> Host: api.ocp4.cegeka.io:6443
> User-Agent: curl/7.54.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 403
< audit-id: 8f1c62b6-edd5-4c53-8bc6-8345cc1f48e3
< content-type: application/json
< x-content-type-options: nosniff
< content-length: 233
< date: Mon, 18 May 2020 20:15:01 GMT
<
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
* Connection #0 to host api.ocp4.cegeka.io left intact
$ export KUBEADMIN=auth/kubeconfig
$ oc get nodes
error: You must be logged in to the server (Unauthorized)
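Note: oc reads the KUBECONFIG environment variable, not KUBEADMIN, so the export above has no effect; that alone can explain the Unauthorized error. The intended command was presumably:

$ export KUBECONFIG=auth/kubeconfig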

@vrutkovs (Member)

INFO It is now safe to remove the bootstrap resources

Is the bootstrap node removed after that message?

@thomasmeeus

Nope, the bootstrap node remains online. I know from experience that this is fine for Red Hat OpenShift.
Should it get removed for OKD? Maybe because it doesn't serve the compute configuration and the load balancer still points to it?

@vrutkovs (Member)

Nope, the bootstrap node remains online.

Why? Bootstrap's MCS won't serve files to any node except masters - openshift/machine-config-operator@c894879
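In practice this means that once the masters are up, the bootstrap entry must be dropped from the load balancer's machine-config backend so workers only reach the masters' MCS. A hypothetical HAProxy snippet; the server names and IPs are placeholders, not taken from this thread:

backend machine-config-server
    mode tcp
    balance roundrobin
    # remove or comment out the bootstrap entry once bootstrapping completes:
    # server okd4-bootstrap 192.168.1.200:22623 check
    server okd4-control-plane-1 192.168.1.201:22623 check
    server okd4-control-plane-2 192.168.1.202:22623 check
    server okd4-control-plane-3 192.168.1.203:22623 check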

@thomasmeeus

Cool. I didn't know it came down to that, but removing the bootstrap node indeed fixed it.
Maybe the log line could be clarified a bit, because right now it gives the impression that removing the bootstrap node isn't mandatory.

Many thanks!
I'm not the owner of this issue, but I guess it's solved now.

@vrutkovs (Member)

No must-gather for more than a week; closing.

Please refrain from "same here" and "+1" comments; file separate issues instead.


eselvam commented May 19, 2020

So, after the master nodes are bootstrapped, do we need to remove the bootstrap node before we bring up the worker nodes?

Kindly clarify.

@rainbowjose

It is normal. You just need to wait until the installation completes. Patience.
