
DEBUG Still waiting for the Kubernetes API: Get https://mydomain.kz:6443/version?timeout=32s: EOF #2615

Closed
Nurlan199206 opened this issue Nov 2, 2019 · 36 comments
Labels
platform/google triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@Nurlan199206

Nurlan199206 commented Nov 2, 2019

I want to build an OpenShift Container Platform cluster on bare metal. I am using GCP Compute Engine VMs for this.

RHEL 7 on VM instances...

I have:
1 bootstrap
3 masters
2 workers
1 LB for API (haproxy)

Version

4.2

$ openshift-install version
openshift-install v4.2.0
built from commit 90ccb37ac1f85ae811c50a29f9bb7e779c5045fb
release image quay.io/openshift-release-dev/ocp-release@sha256:c5337afd85b94c93ec513f21c8545e3f9e36a227f55d41bc1dfb8fcc3f2be129

Platform: bare metal (user-provisioned infrastructure) on GCP Compute Engine VMs

What happened?

DEBUG OpenShift Installer v4.2.0                   
DEBUG Built from commit 90ccb37ac1f85ae811c50a29f9bb7e779c5045fb 
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp.sysadm.kz:6443... 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF


What you expected to happen?

OpenShift can't find the API.

How to reproduce it (as minimally and precisely as possible)?

$ ./openshift-install wait-for bootstrap-complete --log-level debug

Anything else we need to know?

[screenshots: my DNS records]

My LB config:

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /
    monitor-uri /healthz
frontend openshift-api-server
    bind 10.172.0.3:6443
    default_backend openshift-api-server
    mode tcp
    option tcplog
backend openshift-api-server
    balance source
    mode tcp
    server bootstrap 10.132.0.2:6443 check
    server master0 10.166.0.2:6443 check
    server master1 10.164.0.23:6443 check
    server master2 10.166.0.6:6443 check
    
frontend machine-config-server
    bind 10.172.0.3:22623
    default_backend machine-config-server
    mode tcp
    option tcplog
backend machine-config-server
    balance source
    mode tcp
    server bootstrap 10.132.0.2:22623 check
    server master0 10.166.0.2:22623 check
    server master1 10.164.0.23:22623 check
    server master2 10.166.0.6:22623 check
  
frontend ingress-http
    bind 10.172.0.3:80
    default_backend ingress-http
    mode tcp
    option tcplog
backend ingress-http
    balance source
    mode tcp
    server worker0 10.166.0.4:80 check
    server worker1 10.166.0.5:80 check
   
frontend ingress-https
    bind 10.172.0.3:443
    default_backend ingress-https
    mode tcp
    option tcplog
backend ingress-https
    balance source
    mode tcp
    server worker0 10.166.0.4:443 check
    server worker1 10.166.0.5:443 check
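
A quick way to sanity-check a config like this (a sketch, assuming haproxy is installed on the LB host, the file lives at /etc/haproxy/haproxy.cfg, and the stats/healthz listener on :9000 is configured as above):

# validate the configuration syntax without restarting the service
haproxy -c -f /etc/haproxy/haproxy.cfg

# confirm the stats listener's monitor-uri responds on the LB host
curl -s http://10.172.0.3:9000/healthz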

@Nurlan199206
Author

ANY HELP????

@abhinavdahiya
Contributor

Make sure you have the DNS, LB, and connectivity set up correctly based on:
https://docs.openshift.com/container-platform/4.2/installing/installing_bare_metal/installing-bare-metal.html#installation-network-user-infra_installing-bare-metal
https://docs.openshift.com/container-platform/4.2/installing/installing_bare_metal/installing-bare-metal.html#installation-dns-user-infra_installing-bare-metal
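
For reference, a quick way to spot-check the required records from a client machine (a sketch, not taken from the docs verbatim; substitute your own cluster name and base domain, e.g. ocp.sysadm.kz from this report):

# api and api-int must resolve to the load balancer
dig +short api.ocp.sysadm.kz
dig +short api-int.ocp.sysadm.kz

# the ingress wildcard must also resolve to the load balancer
dig +short test.apps.ocp.sysadm.kz

# etcd SRV records for the cluster domain
dig +short SRV _etcd-server-ssl._tcp.ocp.sysadm.kz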

Also, you can capture the failure logs by using

openshift-install gather bootstrap --bootstrap <bootstrap-host-ip> --master <control-plane-host-ip> [--master <control-plane-host-ip> ...]

which will provide us the necessary logs to debug the failure.

@Nurlan199206
Author

Nurlan199206 commented Nov 5, 2019

@abhinavdahiya do I need to buy something from here? https://cloud.redhat.com/openshift/install/metal/user-provisioned — for example, the pull secret?

@abhinavdahiya
Contributor

@abhinavdahiya do I need to buy something from here? https://cloud.redhat.com/openshift/install/metal/user-provisioned — for example, the pull secret?

I'm not sure what you mean by "buy something from here"; you need the pull secret so that you can pull the container images for the Red Hat components.

@redmark-redhat

redmark-redhat commented Nov 15, 2019

I'm seeing the same error here. Any solution?

fatal: [192.168.79.2]: FAILED! => {"changed": true, "cmd": "openshift-install --dir=pwd wait-for bootstrap-complete --log-level debug", "delta": "0:30:00.132730", "end": "2019-11-15 10:11:17.169260", "msg": "non-zero return code", "rc": 1, "start": "2019-11-15 09:41:17.036530", "stderr": "level=debug msg="OpenShift Installer unreleased-master-1805-g425e4ff0037487e32571258640b39f56d5ee5572"\nlevel=debug msg="Built from commit 425e4ff"\nlevel=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ocp-ppc64le-test-099bdc.redhat.com:6443...\"\nlevel=debug msg="Still waiting for the Kubernetes API: Get https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/version?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-apiserver-lb-signer\")"

Also tried wget

wget https://api.ocp-ppc64le-test-099bdc.redhat.com:6443
--2019-11-15 10:27:53-- https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/
Resolving api.ocp-ppc64le-test-099bdc.redhat.com (api.ocp-ppc64le-test-099bdc.redhat.com)... 192.168.122.168
Connecting to api.ocp-ppc64le-test-099bdc.redhat.com (api.ocp-ppc64le-test-099bdc.redhat.com)|192.168.122.168|:6443... connected.
ERROR: The certificate of ‘api.ocp-ppc64le-test-099bdc.redhat.com’ is not trusted.
ERROR: The certificate of ‘api.ocp-ppc64le-test-099bdc.redhat.com’ hasn't got a known issuer.

@abhinavdahiya
Contributor

@redmark-redhat

I'm seeing the same error here, an solution?

It isn't the same error:

DEBUG Still waiting for the Kubernetes API: Get https://api.ocp.sysadm.kz:6443/version?timeout=32s: EOF 

vs yours

Still waiting for the Kubernetes API: Get https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/version?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-apiserver-lb-signer\")
  1. Is this the same platform as above, i.e. GCP?
  2. How are you creating the cluster?

Also, are you using a layer-4 LB? Make sure your LB is not doing the TLS termination.
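
For context, a layer-4 pass-through frontend in haproxy looks like the sketch below (mode tcp forwards the TLS stream untouched to the backends; an http-mode frontend with certificates on the bind line would terminate TLS at the LB, which breaks the cluster's own API certificates):

frontend openshift-api-server
    bind *:6443
    mode tcp          # layer 4: raw TCP pass-through, no TLS termination
    option tcplog
    default_backend openshift-api-server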

@redmark-redhat

No, the platform is RHEL 8 with the OpenShift cluster configured in a KVM environment. We have a set of Ansible playbooks configuring the cluster. This is the command that fails:

- name: wait for bootstrap complete
  tags: config
  shell: openshift-install --dir=`pwd` wait-for bootstrap-complete --log-level debug
  args:
    chdir: "{{ workdir }}"
  retries: 1
  delay: 0

Yesterday the error message was a little different as seen here.

Still waiting for the Kubernetes API: Get https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/version?timeout=32s: EOF\"\nlevel=debug msg=\"Still waiting for the Kubernetes API: Get https://api.ocp-ppc64le-test-099bdc.redhat.com:6443/version?timeout=32s: EOF\"\nlevel=debug

I don't remember making a change to any of the install playbooks. Let me run it again.

@Nurlan199206
Author

@abhinavdahiya

./openshift-install gather bootstrap --bootstrap 10.132.0.2 --master ocp-master01.sysadm.kz
INFO Pulling debug logs from the bootstrap machine 
FATAL failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

@Nurlan199206
Author

But SSH via ssh root@ocp-master01.sysadm.kz works between the bootstrap and master01 nodes.

@Nurlan199206
Author

Nurlan199206 commented Nov 23, 2019

Still getting the endless :6443/version?timeout=32s: EOF. Help! The LB and DNS settings are correct!

@Nurlan199206
Author

Does OpenShift 4.x support only Red Hat CoreOS? Because I'm using RHEL 7 for the cluster.

@ChrystianDuarte

Still endless :6443/version?timeout=32s: EOF HELP!!!! LB,DNS settings correct!!!

I have the same problem
Any ideas?

@jomeier
Contributor

jomeier commented Dec 1, 2019

I had the same problem yesterday. I often create / delete VMs for tests.

Restart the load balancer. In my case that helped.
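
For anyone hitting the same thing, a minimal sketch of that workaround on a systemd host (assuming haproxy is the LB, as in the configs in this thread, and using the API hostname from this issue as an example):

# reload picks up backend state without dropping existing connections;
# fall back to a full restart if a reload is not enough
sudo systemctl reload haproxy || sudo systemctl restart haproxy

# then watch the API endpoint come back through the LB
curl -k https://api.ocp.sysadm.kz:6443/version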

@abhinavdahiya
Contributor

but SSH via ssh root@ocp-master01.sysadm.kz it works between bootstrap and master01 nodes..

Make sure you are using RHCOS for the control plane; that's the only supported OS.
Also, the user used by the installer's gather is core, not root.

If you specified the public SSH key during installation, the machines should already have it.

As for the error: the only way we can help debug is if you provide the log bundle, using openshift-install gather bootstrap --bootstrap <bootstrap-host-ip> --master <control-plane-0-ip> [--master <control-plane-$idx-ip>]

You can run openshift-install gather bootstrap --help for information on how to specify the SSH key; otherwise it tries to use an already-running SSH agent.
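
A minimal sketch of both options (the key path and IPs below are placeholders; the private key must match the sshKey you put in install-config.yaml, and the installer connects as the core user):

# option 1: pass the matching private key explicitly
openshift-install gather bootstrap --key ~/.ssh/id_rsa \
  --bootstrap <bootstrap-host-ip> --master <control-plane-0-ip>

# option 2: load the key into a running ssh-agent and omit --key
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa
openshift-install gather bootstrap --bootstrap <bootstrap-host-ip> --master <control-plane-0-ip>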

@abhinavdahiya abhinavdahiya added the triage/needs-information Indicates an issue needs more information in order to work on it. label Dec 2, 2019
@whls

whls commented Dec 19, 2019

@abhinavdahiya I have the same error:
[root@api ocp4]# ./openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer v4.2.1
DEBUG Built from commit e349157
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.whls.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF

I already collected logs with this command:
[root@api log]# /root/ocp4/openshift-install gather bootstrap --bootstrap bootstrap.ocp4.whls.com --master master0.ocp4.whls.com
INFO Pulling debug logs from the bootstrap machine
INFO Bootstrap gather logs captured here "log-bundle-20191219151525.tar.gz"

Could you please help to debug this problem?
log-bundle-20191219151525.tar.gz

@jomeier
Contributor

jomeier commented Dec 19, 2019 via email

@whls

whls commented Dec 19, 2019

@jomeier Thanks for your reply.
Yes, I have an HAProxy server for the LB.
Here is my HAProxy server configuration:

[root@api ocp4]# cat /etc/haproxy/haproxy.cfg

global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    stats socket /var/lib/haproxy/stats

defaults
mode                    http
log                     global
option                  httplog
option                  dontlognull
option http-server-close

option                  redispatch
retries                 3
timeout http-request    10s
timeout queue           1m
timeout connect         10s
timeout client          1m
timeout server          1m
timeout http-keep-alive 10s
timeout check           10s
maxconn                 3000

listen stats
bind :9000
mode http
stats enable
stats uri /
monitor-uri /healthz


frontend openshift-api-server
bind *:6443
default_backend openshift-api-server
mode tcp
option tcplog

backend openshift-api-server
balance source
mode tcp
server bootstrap 9.98.30.45:6443 check
server master0 9.98.30.46:6443 check
server master1 9.98.30.47:6443 check
server master2 9.98.30.48:6443 check

frontend machine-config-server
bind *:22623
default_backend machine-config-server
mode tcp
option tcplog

backend machine-config-server
balance source
mode tcp
server bootstrap 9.98.30.45:22623 check
server master0 9.98.30.46:22623 check
server master1 9.98.30.47:22623 check
server master2 9.98.30.48:22623 check

frontend ingress-http
bind *:80
default_backend ingress-http
mode tcp
option tcplog

backend ingress-http
balance source
mode tcp
server worker0 9.98.30.54:80 check
server worker1 9.98.30.55:80 check
server worker2 9.98.30.56:80 check

frontend ingress-https
bind *:443
default_backend ingress-https
mode tcp
option tcplog

backend ingress-https
balance source
mode tcp
server worker0 9.98.30.54:443 check
server worker1 9.98.30.55:443 check
server worker2 9.98.30.56:443 check

The HAProxy service is running, and the ports are listening:

[root@api ocp4]# netstat -tunlp |grep 80
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      5294/haproxy
udp        0      0 0.0.0.0:67              0.0.0.0:*                           7780/dnsmasq
[root@api ocp4]# netstat -tunlp |grep 443
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      5294/haproxy
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      5294/haproxy
[root@api ocp4]# netstat -tunlp |grep 22623
tcp        0      0 0.0.0.0:22623           0.0.0.0:*               LISTEN      5294/haproxy
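
Beyond checking that the ports are listening, it can also help to probe the frontends end to end (a sketch using the hostnames from this setup; a connection that resets or returns EOF usually means no backend behind the LB is serving yet):

# Kubernetes API through the LB (cert is not publicly trusted at this stage, hence -k)
curl -kv https://api.ocp4.whls.com:6443/version

# machine-config-server on the bootstrap/masters, also through the LB
curl -kv https://api-int.ocp4.whls.com:22623/healthz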

@whls

whls commented Dec 19, 2019

Here is my DNS configuration

[root@ns1 ignition]# cat /var/named/data/whls.com.zone
$TTL 1W
@       IN      SOA     ns1.whls.com.   root (
                        2019070700      ; serial
                        3H              ; refresh (3 hours)
                        30M             ; retry (30 minutes)
                        2W              ; expiry (2 weeks)
                        1W )            ; minimum (1 week)
        IN      NS      ns1.whls.com.
        IN      MX 10   smtp.whls.com.
;
;
ns1     IN      A       9.98.30.44
smtp    IN      A       9.98.30.44
;
; The api points to the IP of your load balancer
api.ocp4                IN      A       9.98.30.59
api-int.ocp4            IN      A       9.98.30.59
;
; The wildcard also points to the load balancer
*.apps.ocp4             IN      A       9.98.30.59
;
; Create entry for the bootstrap host
bootstrap.ocp4  IN      A       9.98.30.45
;
; Create entries for the master hosts
master0.ocp4            IN      A       9.98.30.46
master1.ocp4            IN      A       9.98.30.47
master2.ocp4            IN      A       9.98.30.48
;
; Create entries for the worker hosts
worker0.ocp4            IN      A       9.98.30.54
worker1.ocp4            IN      A       9.98.30.55
worker2.ocp4            IN      A       9.98.30.56
;
; The ETCd cluster lives on the masters...so point these to the IP of the masters
etcd-0.ocp4     IN      A       9.98.30.46
etcd-1.ocp4     IN      A       9.98.30.47
etcd-2.ocp4     IN      A       9.98.30.48
;
; The SRV records are IMPORTANT....make sure you get these right...note the trailing dot at the end...
_etcd-server-ssl._tcp.ocp4.whls.com     IN      SRV     0 10 2380 etcd-0.ocp4.whls.com.
_etcd-server-ssl._tcp.ocp4.whls.com     IN      SRV     0 10 2380 etcd-1.ocp4.whls.com.
_etcd-server-ssl._tcp.ocp4.whls.com     IN      SRV     0 10 2380 etcd-2.ocp4.whls.com.
;
;EOF


[root@ns1 ignition]# cat /var/named/data/named.whls.zone
$TTL 1W
@       IN      SOA     ns1.whls.com.   root (
                        2019070700      ; serial
                        3H              ; refresh (3 hours)
                        30M             ; retry (30 minutes)
                        2W              ; expiry (2 weeks)
                        1W )            ; minimum (1 week)
        IN      NS      ns1.whls.com.
;
; syntax is "last octet" and the host must have fqdn with trailing dot
46      IN      PTR     master0.ocp4.whls.com.
47      IN      PTR     master1.ocp4.whls.com.
48      IN      PTR     master2.ocp4.whls.com.
;
45      IN      PTR     bootstrap.ocp4.whls.com.
;
59      IN      PTR     api.ocp4.whls.com.
59      IN      PTR     api-int.ocp4.whls.com.
;
54      IN      PTR     worker0.ocp4.whls.com.
55      IN      PTR     worker1.ocp4.whls.com.
56      IN      PTR     worker2.ocp4.whls.com.
;
;EOF
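
A quick way to verify this zone data from a helper node (a sketch, querying the DNS server at 9.98.30.44 defined above; the console hostname is just an example of an *.apps record):

# forward records for the API entries
dig +short @9.98.30.44 api.ocp4.whls.com api-int.ocp4.whls.com

# wildcard apps record
dig +short @9.98.30.44 console-openshift-console.apps.ocp4.whls.com

# etcd SRV records (note the trailing-dot targets in the zone)
dig +short @9.98.30.44 SRV _etcd-server-ssl._tcp.ocp4.whls.com

# reverse record for one of the masters
dig +short @9.98.30.44 -x 9.98.30.46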

@jomeier
Contributor

jomeier commented Dec 19, 2019

Have you restarted HAProxy right after the bootstrap server has finished / after the control plane with the masters was ready?

@abhinavdahiya
Contributor

@abhinavdahiya I Ihave the some error:
[root@api ocp4]# ./openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer v4.2.1
DEBUG Built from commit e349157
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.whls.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF

I already collect logs with command:
[root@api log]# /root/ocp4/openshift-install gather bootstrap --bootstrap bootstrap.ocp4.whls.com --master master0.ocp4.whls.com
INFO Pulling debug logs from the bootstrap machine
INFO Bootstrap gather logs captured here "log-bundle-20191219151525.tar.gz"

Could you please help to debug this problem?
log-bundle-20191219151525.tar.gz

from bootstrap/journals/release-image.service

Dec 19 05:58:46 bootstrap.ocp4.whls.com release-image-download.sh[1602]: Error: error pulling image "quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0": unable to pull quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0: unable to pull image: Error initializing source docker://quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0: pinging docker registry returned: Get https://quay.io/v2/: dial tcp: lookup quay.io on 9.98.30.44:53: server misbehaving

The bootstrap host cannot connect to quay.io to download the release image. That seems to be the cause of the failure.
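
A quick way to confirm this from the bootstrap host itself (a sketch; SSH in as core and test name resolution and registry reachability — the image digest below is the one from the log above):

ssh core@bootstrap.ocp4.whls.com

# then, on the bootstrap host:
# does the configured resolver answer for quay.io?
getent hosts quay.io

# can the registry endpoint be reached at all?
curl -sI https://quay.io/v2/ | head -n 1

# pull test via podman (present on RHCOS)
sudo podman pull quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0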

@whls

whls commented Dec 20, 2019

@abhinavdahiya I Ihave the some error:
[root@api ocp4]# ./openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer v4.2.1
DEBUG Built from commit e349157
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.whls.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF
I already collect logs with command:
[root@api log]# /root/ocp4/openshift-install gather bootstrap --bootstrap bootstrap.ocp4.whls.com --master master0.ocp4.whls.com
INFO Pulling debug logs from the bootstrap machine
INFO Bootstrap gather logs captured here "log-bundle-20191219151525.tar.gz"
Could you please help to debug this problem?
log-bundle-20191219151525.tar.gz

from bootstrap/journals/release-image.service

Dec 19 05:58:46 bootstrap.ocp4.whls.com release-image-download.sh[1602]: Error: error pulling image "quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0": unable to pull quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0: unable to pull image: Error initializing source docker://quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0: pinging docker registry returned: Get https://quay.io/v2/: dial tcp: lookup quay.io on 9.98.30.44:53: server misbehaving

The bootstrap-host cannot connect to quay.io to download the release-image. That seems to be the cause for failure..

Thanks for your help.
Yes, I checked my DNS server; it can't resolve quay.io.
Must all nodes be able to access quay.io, including bootstrap, masters, and workers?

@jomeier
Copy link
Contributor

jomeier commented Dec 20, 2019 via email

@whls

whls commented Dec 20, 2019

@abhinavdahiya @jomeier
Thanks for all your help!
After setting up DNS forwarding to a public resolver, I completed the cluster installation. :)
Another question:
I configured 3 worker nodes for the cluster, but after installation only 2 worker nodes joined. Do only two worker nodes join automatically by default? If you want more worker nodes, do you need to join them to the cluster manually?
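
A minimal sketch of the kind of BIND forwarding block that lets a private named instance resolve public names such as quay.io (the forwarder addresses are examples, not taken from this thread; named-checkconf validates the file and systemctl restart named applies it):

options {
    directory "/var/named";
    recursion yes;
    allow-query { any; };
    forward only;
    forwarders {
        8.8.8.8;      # example public resolver
        1.1.1.1;      # example public resolver
    };
};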

@Nurlan199206
Author

openshift-install gather bootstrap --bootstrap 10.166.0.2 --master 10.132.0.2
INFO Pulling debug logs from the bootstrap machine 
FATAL failed to run remote command: Process exited with status 127 

[screenshot: 2020-01-26 00:58:41]

@abhinavdahiya
Contributor

openshift-install gather bootstrap --bootstrap 10.166.0.2 --master 10.132.0.2
INFO Pulling debug logs from the bootstrap machine 
FATAL failed to run remote command: Process exited with status 127 

[screenshot: 2020-01-26 00:58:41]

What image are you using to boot your bootstrap, control plane, and compute nodes?

@Dennys503

I have the same problem: openshift-install wait-for bootstrap-complete --log-level debug
2020-01-22T17:22:24-06:00" level=debug msg="OpenShift Installer v4.2.13"
level=debug msg="Built from commit 46f909e"
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.openshift.empresa.com:6443..."
level=debug msg="Still waiting for the Kubernetes API: Get https://api.openshift.empresa.com:6443/version?timeout=32s: EOF"
level=debug msg="Still waiting for the Kubernetes API: Get https://api.openshift.empresa.com:6443/version?timeout=32s: EOF"

openshift-install gather bootstrap --bootstrap bootstrap.openshift.empresa.com --master master.openshift.empresa.com
INFO Pulling debug logs from the bootstrap machine
FATAL failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

@Dennys503

@abhinavdahiya I Ihave the some error:
[root@api ocp4]# ./openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer v4.2.1
DEBUG Built from commit e349157
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.whls.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.whls.com:6443/version?timeout=32s: EOF
I already collect logs with command:
[root@api log]# /root/ocp4/openshift-install gather bootstrap --bootstrap bootstrap.ocp4.whls.com --master master0.ocp4.whls.com
INFO Pulling debug logs from the bootstrap machine
INFO Bootstrap gather logs captured here "log-bundle-20191219151525.tar.gz"
Could you please help to debug this problem?
log-bundle-20191219151525.tar.gz

from bootstrap/journals/release-image.service

Dec 19 05:58:46 bootstrap.ocp4.whls.com release-image-download.sh[1602]: Error: error pulling image "quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0": unable to pull quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0: unable to pull image: Error initializing source docker://quay.io/openshift-release-dev/ocp-release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0: pinging docker registry returned: Get https://quay.io/v2/: dial tcp: lookup quay.io on 9.98.30.44:53: server misbehaving

The bootstrap-host cannot connect to quay.io to download the release-image. That seems to be the cause for failure..

Thanks for your help.
Yes, I checked my DNS server. It can't be resolved quay.io.
Must all nodes be able to access quay.io? include bootstrap, master and worker?

How did you test your DNS resolution for quay.io?

@Nurlan199206
Author

Nurlan199206 commented Feb 1, 2020

How do I get past this? I'm stuck on an endless "unable to get REST mapping" error.
log-bundle-20200201134119.tar.gz

Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: [#2652] failed to create some manifests:
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: [#2653] failed to create some manifests:
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: [#2654] failed to create some manifests:
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:21 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:22 localhost bootkube.sh[6878]: [#2655] failed to create some manifests:
Feb 01 18:41:22 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-master-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-master-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
Feb 01 18:41:22 localhost bootkube.sh[6878]: "99_openshift-machineconfig_99-worker-ssh.yaml": unable to get REST mapping for "99_openshift-machineconfig_99-worker-ssh.yaml": no matches for kind "MachineConfig" in version "machineconfiguration.openshift.io/v1"
[screenshots: photo_2020-02-02 00:46:14, 00:46:20, 00:46:25]

@vrutkovs
Member

vrutkovs commented Feb 1, 2020

CVO doesn't have a place to run:

I0201 18:41:21.143569       1 apps.go:115] Deployment cluster-version-operator is not ready. status: (replicas: 1, updated: 1, ready: 0, unavailable: 1, reason: MinimumReplicasUnavailable, message: Deployment does not have minimum availability.)

The log bundle contains only one master, which is not sufficient for the install. You need 3 masters + 2 workers; see https://docs.openshift.com/container-platform/4.2/installing/installing_bare_metal/installing-bare-metal.html#machine-requirements_installing-bare-metal

/close

@openshift-ci-robot
Contributor

@vrutkovs: Closing this issue.

In response to this:

CVO doesn't have a place to run:

I0201 18:41:21.143569       1 apps.go:115] Deployment cluster-version-operator is not ready. status: (replicas: 1, updated: 1, ready: 0, unavailable: 1, reason: MinimumReplicasUnavailable, message: Deployment does not have minimum availability.)

log bundle contains only one master, which is not sufficient for install. You'd need 3 masters + 2 workers, see https://docs.openshift.com/container-platform/4.2/installing/installing_bare_metal/installing-bare-metal.html#machine-requirements_installing-bare-metal

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@milan-dikkumburage

Hi @Nurlan199206, were you able to fix the issue? What steps did you take to resolve it?

I'm getting a similar error:
[screenshot]

[core@okd4-services ~]$ openshift-install gather bootstrap --dir=install_dir/ --bootstrap xxx.xxx.xxx.xxx --master xxx.xxx.xxx.xxx
INFO Pulling debug logs from the bootstrap machine
FATAL failed to run remote command: Process exited with status 127

@josephsadek

@abhinavdahiya @jomeier
Thanks for all your help!
After setup DNS forward to public, I have completed the cluster installation. :)
Another question:
I configuration 3 worker nodes for cluster, but after installation, only 2 worker nodes joined cluster, So whether only two work nodes can join automatically by default, If you want more work nodes, you need to join the cluster manually?

Can you show me how to configure DNS forwarding to a public resolver?

@sheetalp304

@abhinavdahiya @jomeier
Thanks for all your help!
After setup DNS forward to public, I have completed the cluster installation. :)
Another question:
I configuration 3 worker nodes for cluster, but after installation, only 2 worker nodes joined cluster, So whether only two work nodes can join automatically by default, If you want more work nodes, you need to join the cluster manually?

I am facing the same issue; I'm not able to resolve quay.io.
Can you provide the steps to set up DNS forwarding to a public resolver, which worked in your case?

@ablaabiyad

Still endless :6443/version?timeout=32s: EOF HELP!!!! LB,DNS settings correct!!!

I have the same issue on VirtualBox. If you managed to correct this, would you please share a hint?

@ablaabiyad

@ablaabiyad check this:

https://github.com/Nurlan199206/okd4/blob/master/local

https://github.com/Nurlan199206/okd4/blob/master/haproxy.cfg

I still have the same issue using your haproxy config, and I cannot even retrieve logs, even though I can SSH to the bootstrap machine as both root and core.
FATAL failed to create SSH client: failed to use the provided keys for authentication: ssh: handshake failed: ssh: unable to authenticate,
