
OCP v3.9.0-0.53.0 "oc cluster up" never finishes #18747

Closed
adelton opened this Issue Feb 26, 2018 · 22 comments

adelton (Contributor) commented Feb 26, 2018

An attempt to start a cluster with oc cluster up using the OCP oc binary and image never finishes.

Version


oc v3.9.0-0.53.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Steps To Reproduce
  1. Configure 3.9 repo.
  2. yum install /usr/bin/oc
  3. Log in to registry.reg-aws.openshift.com.
  4. docker pull registry.reg-aws.openshift.com/openshift3/ose:v3.9.0-0.53.0
  5. oc cluster up '--image=registry.reg-aws.openshift.com/openshift3/ose'
Current Result
Using nsenter mounter for OpenShift volumes
Using 127.0.0.1 as the server IP
Starting OpenShift using registry.reg-aws.openshift.com/openshift3/ose:v3.9.0-0.53.0 ...
Expected Result

Output similar to what I get with OCP 3.7:

Starting OpenShift using registry.access.redhat.com/openshift3/ose:v3.7.31 ...
OpenShift server started.

The server is accessible via web console at:
    https://127.0.0.1:8443

You are logged in as:
    User:     developer
    Password: <any value>

To login as administrator:
    oc login -u system:admin

and the oc cluster up command finishing and returning me to the terminal.

Additional Information


This happens both on RHEL 7.4 and on RHEL 7.5 nightly.

jwforres (Member) commented Feb 26, 2018

@adelton if you keep the same oc binary but tell it to use a different image version with --version=3.7.0, does that work? It might be that an expected image is missing, so something it's waiting on to start never appears.

jwforres (Member) commented Feb 26, 2018

@openshift/sig-master

Think we need more info before knowing which part of cluster up is causing the problem.

mfojtik (Member) commented Feb 26, 2018

@adelton can you try running this with --loglevel=5 so we can see why it gets stuck? Also, docker ps -a might reveal whether a container crashed with something weird.
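
For reference, the requested diagnostics would look roughly like this (a sketch; the --image value is taken from the original report):

oc cluster up --image=registry.reg-aws.openshift.com/openshift3/ose --loglevel=5
docker ps -a    # look for cluster containers that exited unexpectedly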

adelton (Contributor, Author) commented Feb 27, 2018

Using the same atomic-openshift-clients-3.9.0-0.53.0.git.0.3b81e2d.el7.x86_64 binary with the 3.7 image makes things finish fine:

# docker pull registry.reg-aws.openshift.com/openshift3/ose:v3.7
Trying to pull repository registry.reg-aws.openshift.com/openshift3/ose ... 
v3.7: Pulling from registry.reg-aws.openshift.com/openshift3/ose
9a32f102e677: Already exists 
b8aa42cec17a: Already exists 
5f74dd56d4e1: Pull complete 
2fe007d0ec2d: Pull complete 
Digest: sha256:bf047055665628ee3dae30400c54a0b87229b3d4997c89c8cb29852317fcd95e
Status: Downloaded newer image for registry.reg-aws.openshift.com/openshift3/ose:v3.7
# oc cluster up --image='registry.reg-aws.openshift.com/openshift3/ose' --version=v3.7
Using nsenter mounter for OpenShift volumes
Using 127.0.0.1 as the server IP
Starting OpenShift using registry.reg-aws.openshift.com/openshift3/ose:v3.7 ...
OpenShift server started.

The server is accessible via web console at:
    https://127.0.0.1:8443

You are logged in as:
    User:     developer
    Password: <any value>

To login as administrator:
    oc login -u system:admin

adelton (Contributor, Author) commented Feb 27, 2018

Running with --loglevel=5 ends with an endless stream of

I0227 09:37:52.832039   32310 import.go:41] Creating openshift-infra/openshift-web-console
I0227 09:37:52.835956   32310 up.go:1417] Importing template service broker apiserver from install/templateservicebroker/apiserver-template.yaml
I0227 09:37:52.836663   32310 decoder.go:224] decoding stream as YAML
I0227 09:37:52.847375   32310 import.go:41] Creating openshift-infra/template-service-broker-apiserver
-- Installing web console ... 
I0227 09:37:52.856967   32310 webconsole.go:87] instantiating web console template with parameters map[LOGLEVEL:0 NAMESPACE:openshift-web-console API_SERVER_CONFIG:apiVersion: webconsole.config.openshift.io/v1
clusterInfo:
  consolePublicURL: https://127.0.0.1:8443/console/
  loggingPublicURL: ""
  logoutPublicURL: ""
  masterPublicURL: https://127.0.0.1:8443
  metricsPublicURL: ""
extensions:
  properties: null
  scriptURLs: []
  stylesheetURLs: []
features:
  clusterResourceOverridesEnabled: false
  inactivityTimeoutMinutes: 0
kind: WebConsoleConfiguration
servingInfo:
  bindAddress: 0.0.0.0:8443
  bindNetwork: tcp4
  certFile: /var/serving-cert/tls.crt
  clientCA: ""
  keyFile: /var/serving-cert/tls.key
  maxRequestsInFlight: 0
  namedCertificates: null
  requestTimeoutSeconds: 0
 IMAGE:registry.reg-aws.openshift.com/openshift3/ose-web-console:v3.9.0-0.53.0]
I0227 09:37:53.900484   32310 webconsole.go:96] polling for web console server availability
I0227 09:37:54.900466   32310 webconsole.go:96] polling for web console server availability
I0227 09:37:55.902514   32310 webconsole.go:96] polling for web console server availability
I0227 09:37:56.900533   32310 webconsole.go:96] polling for web console server availability
I0227 09:37:57.900518   32310 webconsole.go:96] polling for web console server availability
I0227 09:37:58.900536   32310 webconsole.go:96] polling for web console server availability
I0227 09:37:59.900494   32310 webconsole.go:96] polling for web console server availability
I0227 09:38:00.900576   32310 webconsole.go:96] polling for web console server availability
I0227 09:38:01.900517   32310 webconsole.go:96] polling for web console server availability
I0227 09:38:02.900540   32310 webconsole.go:96] polling for web console server availability
I0227 09:38:03.900534   32310 webconsole.go:96] polling for web console server availability

It looks like oc cluster up is not able to pull the image from the authenticated registry.

adelton (Contributor, Author) commented Feb 27, 2018

But I get the same result even if I manually do

docker pull registry.reg-aws.openshift.com/openshift3/ose-web-console:v3.9.0-0.53.0

adelton (Contributor, Author) commented Feb 27, 2018

What sort of automated testing do we have for oc cluster up? Is it passing?

jwforres (Member) commented Feb 27, 2018

@spadgett can you take a look re: web console image

mffiedler (Contributor) commented Feb 27, 2018

Output similar to what I get with OCP 3.7:
Starting OpenShift using registry.access.redhat.com/openshift3/ose:v3.7.31 ...
OpenShift server started.

You are using registry.access.redhat.com here, which is not an authenticated registry. In the failing case, you are accessing registry.reg-aws.openshift.com, which is private and authenticated. In QE openshift-ansible installs we use:

oreg_auth_user=(user)
oreg_auth_password=(pw)

in the inventory. I'm not sure how this translates to oc cluster up.
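
For context, a minimal sketch of such an inventory excerpt, assuming the usual [OSEv3:vars] group (only the two credential variables are quoted from above):

[OSEv3:vars]
# credentials for the authenticated registry
oreg_auth_user=<user>
oreg_auth_password=<password>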

spadgett (Member) commented Feb 27, 2018

The console only pulls the image if it's not present, so pulling it manually before calling cluster up should work.

If you are able to run oc login -u system:admin, can you check what's happening in the openshift-web-console namespace to troubleshoot? You might try

$ oc get pods -n openshift-web-console
$ oc get events -n openshift-web-console
$ oc logs deployment/webconsole -n openshift-web-console

In the meantime, I'm setting up an environment to reproduce.

adelton (Contributor, Author) commented Feb 28, 2018

The core issue seems to be that even if I docker login against the authenticated registry beforehand, oc cluster up does not seem to take advantage of it, and it has no options for passing a user / password to it either.

For the web console to start, I had to pull both

docker pull registry.reg-aws.openshift.com/openshift3/ose-web-console:v3.9.0-0.53.0

and

docker pull registry.reg-aws.openshift.com/openshift3/ose-pod:v3.9.0-0.53.0

Then oc cluster up finished fine.

So it looks like more images are needed in 3.9 than in 3.7, where having the single image

registry.reg-aws.openshift.com/openshift3/ose:v3.7

is enough for oc cluster up --image='registry.reg-aws.openshift.com/openshift3/ose' --version=v3.7 to pass.
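
Putting the commands from this thread together, a workaround sketch for 3.9 (assuming a prior docker login against registry.reg-aws.openshift.com; the explicit --version simply matches the tag reported above):

docker pull registry.reg-aws.openshift.com/openshift3/ose:v3.9.0-0.53.0
docker pull registry.reg-aws.openshift.com/openshift3/ose-web-console:v3.9.0-0.53.0
docker pull registry.reg-aws.openshift.com/openshift3/ose-pod:v3.9.0-0.53.0
oc cluster up --image=registry.reg-aws.openshift.com/openshift3/ose --version=v3.9.0-0.53.0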

mfojtik added priority/P2 and removed priority/P1 labels Mar 5, 2018

mareklibra commented Mar 12, 2018

It's not working even without an authenticated registry involved:

# oc cluster up --metrics=true --service-catalog --version=v3.9
Starting OpenShift using openshift/origin:v3.9 ...
Pulling image openshift/origin:v3.9
Pulled 1/4 layers, 26% complete
Pulled 2/4 layers, 55% complete
Pulled 3/4 layers, 83% complete
Pulled 4/4 layers, 100% complete
Extracting
Image pull complete
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v3.9 image ... 
   Pulling image openshift/origin:v3.9
   Pulled 1/4 layers, 26% complete
   Pulled 2/4 layers, 55% complete
   Pulled 3/4 layers, 83% complete
   Pulled 4/4 layers, 100% complete
   Extracting
   Image pull complete
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... 
   WARNING: Binding DNS on port 8053 instead of 53, which may not be resolvable from all clients.
-- Checking type of volume mount ... 
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ... 
   Using 127.0.0.1 as the server IP
-- Checking service catalog version requirements ... OK
-- Starting OpenShift container ... 
   Creating initial OpenShift configuration
   Starting OpenShift using container 'origin'
   Waiting for API server to start listening
   OpenShift server started
-- Adding default OAuthClient redirect URIs ... OK
-- Installing registry ... 
   scc "privileged" added to: ["system:serviceaccount:default:registry"]
-- Installing router ... OK
-- Installing metrics ... OK
-- Importing image streams ... OK
-- Importing templates ... OK
-- Importing internal templates ... OK
-- Installing web console ... FAIL
   Error: failed to start the web console server: timed out waiting for the condition

spadgett (Member) commented Mar 12, 2018

@mareklibra What does oc version say?

mareklibra commented Mar 12, 2018

# oc logs po/webconsole-758f485567-rvb96 -n openshift-web-console
W0312 13:58:25.040132       1 start.go:93] Warning: config.clusterInfo.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, web console start will continue.
W0312 13:58:25.040500       1 start.go:93] Warning: config.clusterInfo.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, web console start will continue.
Error: AssetConfig.webconsole.config.openshift.io "" is invalid: [config.clusterInfo.consolePublicURL: Invalid value: "": must contain a scheme (e.g. https://), config.clusterInfo.consolePublicURL: Invalid value: "": must contain a host, config.clusterInfo.consolePublicURL: Invalid value: "": must have a trailing slash in path, config.clusterInfo.masterPublicURL: Invalid value: "": must contain a scheme (e.g. https://), config.clusterInfo.masterPublicURL: Invalid value: "": must contain a host]
Usage:
  origin-web-console [flags]
...

mareklibra commented Mar 12, 2018

# oc version
oc v3.9.0-alpha.3+78ddc10
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.9.0-alpha.4+6d21b7d-539
kubernetes v1.9.1+a0ce1bc657

spadgett (Member) commented Mar 12, 2018

You'll need a newer oc built from the current release-3.9 branch. There were some incompatible API changes between 3.9.0-alpha.3 and the current v3.9 console image, which is why the console won't start.
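
As a rough sketch, building a matching oc from that branch might look like this (the make target and output path are assumptions; the origin repository's HACKING docs are the authoritative source):

git clone https://github.com/openshift/origin.git
cd origin
git checkout release-3.9
make build WHAT=cmd/oc                     # assumed build invocation
_output/local/bin/linux/amd64/oc version   # assumed output location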

mareklibra commented Mar 12, 2018

This is the latest I can get from https://github.com/openshift/origin/releases.
Can you please point me to a better source?

spadgett (Member) commented Mar 12, 2018

@mareklibra Unfortunately you'll have to build from source until 3.9 is released, or cluster up with --version=v3.9.0-alpha.3, although that is a bit old now.

If you're running Linux, you might be able to use sudo docker cp origin:/usr/bin/oc . to get the right oc.
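
A sketch of that approach, assuming a container named 'origin' is already running from cluster up (the install destination is illustrative):

sudo docker cp origin:/usr/bin/oc .
sudo install -m 0755 ./oc /usr/local/bin/oc   # put the extracted binary on the PATH
oc version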

openshift-bot commented Jun 10, 2018

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented Jul 10, 2018

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

jstaffor added a commit to aerogear/mobile-docs that referenced this issue Jul 30, 2018

Getting started procedure RHEL update (#301)
* Prereqs updated due to bug in Openshift (openshift/origin#18747)

openshift-bot commented Aug 9, 2018

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

pmacko1 commented Oct 30, 2018

Hi, was this eventually fixed in 3.9? We are using oc cluster up for the upstream product Aerogear Mobile Services and have listed this as a blocker for running the product on RHEL 7.4/7.5: https://docs.aerogear.org/aerogear/latest/getting-started.html#prerequisites.
Thanks
