Streamline cluster up output #13636

csrwng · 2017-04-05T13:28:51Z

Outputs previous messages when either an error occurs during startup or
loglevel > 0.

Now only executes container network test if loglevel > 0 to speed up
startup time.
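
A minimal sketch of the "hold messages back unless something fails or loglevel > 0" behavior described above. This is not the code from this pull request: `quietWriter`, `startup`, and the step messages are hypothetical names and placeholders chosen for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// quietWriter is a hypothetical helper (not the type used in this PR).
// It buffers progress messages and only emits them when verbose is true
// (loglevel > 0) or when Fail is called after an error.
type quietWriter struct {
	out     io.Writer
	verbose bool
	buf     bytes.Buffer
}

// Printf writes immediately when verbose, otherwise it buffers the message.
func (w *quietWriter) Printf(format string, args ...interface{}) {
	if w.verbose {
		fmt.Fprintf(w.out, format, args...)
		return
	}
	fmt.Fprintf(&w.buf, format, args...)
}

// Fail flushes everything buffered so far, then reports the error.
func (w *quietWriter) Fail(err error) {
	w.buf.WriteTo(w.out)
	fmt.Fprintf(w.out, "error: %v\n", err)
}

// startup simulates a start sequence and returns everything that was
// printed. failMsg is empty for a successful start. The step messages
// are placeholders, not real 'cluster up' output.
func startup(verbose bool, failMsg string) string {
	var out bytes.Buffer
	w := &quietWriter{out: &out, verbose: verbose}
	w.Printf("-- step one ... OK\n")
	w.Printf("-- step two ... OK\n")
	if failMsg != "" {
		w.Fail(errors.New(failMsg))
		return out.String()
	}
	fmt.Fprintln(&out, "OpenShift server started.")
	return out.String()
}
```

With this shape, `startup(false, "")` returns only the final line, while `startup(false, "boom")` returns the buffered step messages followed by the error, so the detail is still there when it is needed.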

csrwng · 2017-04-05T13:30:47Z

Output when image present:

Starting OpenShift using openshift/origin:v3.6.0-alpha.0 ...
OpenShift server started.

The server is accessible via web console at:
    https://127.0.0.1:8443

You are logged in as:
    User:     developer

To login as administrator:
    oc login -u system:admin

Output when having to pull image:

Starting OpenShift using openshift/origin:v3.6.0-alpha.0 ...
Pulling image openshift/origin:v3.6.0-alpha.0
Pulled 1/3 layers, 38% complete
Pulled 2/3 layers, 86% complete
Pulled 3/3 layers, 100% complete
Extracting
Image pull complete
OpenShift server started.

The server is accessible via web console at:
    https://127.0.0.1:8443

You are logged in as:
    User:     developer

To login as administrator:
    oc login -u system:admin

Old output is still available with --loglevel=1

csrwng · 2017-04-05T13:31:15Z

@bparees @smarterclayton ptal
@jorgemoralespou fyi

bparees · 2017-04-05T13:53:33Z

lgtm but better let @smarterclayton have the final say since he instigated these changes.

jorgemoralespou · 2017-04-05T13:54:18Z

@csrwng I see a difference between a cluster up with and without pulling images, but I don't see a difference for a cluster start where you keep the config for a second boot. In that case, showing information on users is superfluous and might not be correct, as the user is not really logged in. I would skip this info in that case.

Also I would move the layers percentage in the pull to loglevel=1.

csrwng · 2017-04-05T13:56:26Z

@jorgemoralespou the reason the layers percentage is displayed is that if you have a particularly slow connection, you wouldn't see anything happening for a good while and you'd think that the command is just stuck.

Would this be ok for when you're reusing existing config/data?

Starting OpenShift using openshift/origin:v3.6.0-alpha.0 ...
OpenShift server started.

The server is accessible via web console at:
    https://127.0.0.1:8443

jorgemoralespou · 2017-04-05T14:01:27Z

@csrwng yes, it would be great. There was also some input from the CDK team, so adding them here. cc/ @hferentschik @LalatenduMohanty @praveenkumar I can't look up the issue, but you were also interested in this. Please comment. Regarding the log for layers: is there a chance that it could be on a single line? If not, it's fine. Not really a big issue.

csrwng · 2017-04-05T14:12:56Z

@jorgemoralespou updated the display for when you're reusing config/data.
The progress writer for download is a bigger change... we should tackle it in a different pull.

jorgemoralespou · 2017-04-05T14:48:53Z

@csrwng sounds good.

smarterclayton · 2017-04-05T14:58:36Z

I'd prefer not to get clever on progress writer. If you have to wait, as long as we don't go past 5-6 lines it's not really a problem. Clever is bad.

smarterclayton · 2017-04-05T14:59:25Z

Network check tends to be very slow for me - that's another spot where some output is useful.

smarterclayton · 2017-04-05T14:59:46Z

Nm, saw your comment

jorgemoralespou · 2017-04-05T15:18:58Z

@smarterclayton does it mean you approve? You're the only one that matters.

smarterclayton · 2017-04-05T18:41:20Z

How often does the container network test fail? Can we only run it if something else fails first?

csrwng · 2017-04-05T18:45:38Z

@smarterclayton so it likely fails when you first run 'cluster up' on a machine that doesn't have the right firewall rules set. Unfortunately, it's not something that you notice in the initial setup of things. Everything will succeed but then you either won't be able to push to the registry or your dns lookups will fail.

So the issue is that you pay this premium every time you start cluster up when after the first time you run it successfully, you likely won't need to check any more.

csrwng · 2017-04-05T18:48:04Z

Something that would be nice would be to start the test asynchronously and then notify you that things are not right as you try to use openshift. But there's not a single interaction entry point, so that's hard.
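
A rough sketch of the asynchronous idea in Go, under stated assumptions: `startCheck`, `report`, and `checkError` are hypothetical names, and the check itself is passed in as a function. It only shows how the test could run in the background; it does not solve the "no single interaction entry point" problem mentioned above, since something still has to call `report`.

```go
package main

// checkError is a hypothetical error type used only for illustration.
type checkError string

func (e checkError) Error() string { return string(e) }

// startCheck runs check in the background and returns a channel that
// receives its result exactly once. The channel is buffered so the
// goroutine can finish even if nobody ever reads the result.
func startCheck(check func() error) <-chan error {
	result := make(chan error, 1)
	go func() { result <- check() }()
	return result
}

// report waits for the background check to finish and returns a warning
// message, or an empty string if the check passed.
func report(result <-chan error) string {
	if err := <-result; err != nil {
		return "WARNING: container network test failed: " + err.Error()
	}
	return ""
}
```

Startup would call `startCheck` early and continue without waiting; whichever code path the user hits next would call `report` and print the warning if one comes back.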

smarterclayton · 2017-04-06T08:42:17Z

Is there a reason the test takes so long? Putting the test at loglevel 1 means no one ever runs it. I'm just looking for a way to have the value of the test without the large cost.

jorgemoralespou · 2017-04-06T09:13:03Z

I would consider these options:
- have a flag to execute the tests. In case of failure, print a message to run with the flag.
- have a command "oc cluster diagnose" or test/validate that could run these tests and tell you if the host is ready for running oc cluster. I would move here the checks for the insecure registry and the network (and anything else we test).
- always run the test on new clusters, but skip the tests if you have kept the config.

smarterclayton · 2017-04-06T09:18:27Z

I'd like to just make the test faster. Flag or another command won't help new users. New clusters are annoyingly slow with it.

jorgemoralespou · 2017-04-06T09:32:05Z

If the tests can be made faster, I'm up for it. But if improvements cannot be made, I wouldn't trade off the speed of creating a cluster to avoid the annoyance that introducing a new flag or command would have, as users will eventually learn. And in any case, are tests really required on a node for a second time? On my laptop, once I've set up the insecure registry and firewall, the chances I might hit that problem again are very low. The same goes for a non-local environment.

csrwng · 2017-04-06T13:06:51Z

I'm investigating what's making the test slow. In theory, it should not take that long. I am hitting the master api endpoint from a container after the healthz endpoint is returning ok.

https://github.com/openshift/origin/blob/master/pkg/bootstrap/docker/openshift/cnetwork.go#L5-L20

The DNS server would maybe take a little longer to come up, but I wouldn't expect it to take as long as 20sec as I've seen sometimes.

If for whatever reason that test can't be made faster, an alternate test could be done with a pair of containers, one using the pod network and the other one using the host network.

csrwng · 2017-04-06T19:39:44Z

So the container networking test is much faster now that I've fixed a very embarrassing bug (the first part of the test was not working at all and only failing after 40 tries). So now it will run every time no matter what. If the firewall is set up correctly, it doesn't add any/much time to startup.
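
For illustration only, a bounded retry loop of the kind described above might look like the following. This is a sketch, not the code in cnetwork.go: `retry`, `errNotReady`, and the probe function are hypothetical. The point is that a working probe returns on the first success, so the full attempt budget is only spent when something is actually wrong.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical sentinel error used by example probes.
var errNotReady = errors.New("not ready")

// retry calls probe up to attempts times, sleeping delay between tries.
// It returns the number of tries made and nil on the first success, or
// the last error wrapped once the attempts are used up.
func retry(attempts int, delay time.Duration, probe func() error) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = probe(); err == nil {
			return i, nil // stop as soon as the probe succeeds
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return attempts, fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}
```

A probe that never succeeds, as in the bug described, makes every call pay for all attempts plus the delays between them, which is why a broken first step shows up as slowness rather than as an immediate failure.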

csrwng · 2017-04-06T19:40:18Z

[test]

jorgemoralespou · 2017-04-06T19:42:33Z

Could this embarrassing fix be backported to the 1.5 branch?

csrwng · 2017-04-06T19:56:36Z

@jorgemoralespou I'll submit a fix for that branch

csrwng · 2017-04-06T20:13:54Z

@jorgemoralespou actually in 1.5 it's not broken in the same way

smarterclayton · 2017-04-06T20:36:02Z

Changes look good to me beyond that - thanks for digging in on the network bug.

csrwng · 2017-04-07T13:50:41Z

integration test seems to have gotten stuck... retesting

csrwng · 2017-04-07T16:46:01Z

#12007
[test]


openshift-bot · 2017-04-07T18:45:28Z

Evaluated for origin test up to 55dc4ef

openshift-bot · 2017-04-07T20:08:48Z

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/646/) (Base Commit: 44d4f23)

csrwng · 2017-04-07T20:09:45Z

[merge]

openshift-bot · 2017-04-07T21:05:44Z

Evaluated for origin merge up to 55dc4ef

openshift-bot · 2017-04-07T22:29:32Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/282/) (Base Commit: 1ea122b) (Image: devenv-rhel7_6125)

hferentschik · 2017-04-10T14:49:48Z

@csrwng sorry, late to the game. +1 for improving on the output.

What is the best approach to get an oc version containing this change? Any chance you are building and hosting binaries as part of a pull request build? Or do I need to rebuild latest origin master myself?

csrwng · 2017-04-10T15:45:26Z

@hferentschik this will be included in the next release for origin. In the meantime, you can build master locally. If you have a working openshift environment, this is easy to do with a template:
https://github.com/csrwng/build-origin

csrwng force-pushed the clusterup_shorter_display branch from d1ee956 to 0ebd8b7 Compare April 5, 2017 14:11

LalatenduMohanty mentioned this pull request Apr 6, 2017

Print out all user information after stop/start minishift/minishift#694

Closed

csrwng force-pushed the clusterup_shorter_display branch from 0ebd8b7 to 0d63ba6 Compare April 6, 2017 19:27

csrwng force-pushed the clusterup_shorter_display branch from 0d63ba6 to 40fc3ed Compare April 6, 2017 20:10

csrwng force-pushed the clusterup_shorter_display branch from 40fc3ed to ae9c5c4 Compare April 7, 2017 13:50

Streamline cluster up output

55dc4ef

Outputs previous messages when either an error occurs during startup or loglevel > 0. Now only executes container network test if loglevel > 0 to speed up startup time.

csrwng force-pushed the clusterup_shorter_display branch from ae9c5c4 to 55dc4ef Compare April 7, 2017 18:44

openshift-bot merged commit 0d82899 into openshift:master Apr 7, 2017

csrwng deleted the clusterup_shorter_display branch April 10, 2017 15:45
