OpenShift fails to start when Minishift instance has changed IP address #343

Open
cmoulliard opened this Issue Jan 30, 2017 · 35 comments

@cmoulliard

cmoulliard commented Jan 30, 2017

Version of Minishift : 1.0.0.Beta2

This error is not reported the first time you create the VM, only after you have stopped and restarted minishift:

minishift start --openshift-version=v1.4.1 --memory=4000 --vm-driver=virtualbox --iso-url=https://github.com/minishift/minishift-centos-iso/releases/download/v1.0.0-beta.1/minishift-centos.iso --docker-env=[storage-driver=devicemapper]
Starting local OpenShift instance using 'virtualbox' hypervisor...
Provisioning OpenShift via '/Users/chmoulli/.minishift/cache/oc/v1.4.1/oc [cluster up --use-existing-config --host-config-dir /var/lib/minishift/openshift.local.config --host-data-dir /var/lib/minishift/hostdata]'
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.4.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using 192.168.99.100 as the server IP
-- Starting OpenShift container ... FAIL
   Error: Docker run error rc=2
   Details:
     Image: openshift/origin:v1.4.1
     Entrypoint: [/bin/bash]
     Command: [-c for name in 192.168.99.100 minishift; do ls /var/lib/origin/openshift.local.config/node-$name &> /dev/null && echo $name && break; done]
E0130 10:23:41.050352    2473 start.go:246] Error starting 'cluster up':  exit status 1
@coolbrg

Member

coolbrg commented Jan 30, 2017

Hi @cmoulliard ,

Could you check with minishift 1.0.0.Beta3? It supports 1.4.1 by default.

In the meantime, we will try to reproduce it.

@cmoulliard

Author

cmoulliard commented Jan 30, 2017

I suspect the error occurred because the ISO file wasn't completely downloaded and minishift was already trying to ssh into the newly created VM:

Downloading ISO 'https://github.com/minishift/minishift-centos-iso/releases/download/v1.0.0-rc.1/minishift-centos7.iso'
 295.52 MB / 329.00 MB [==================================================================================================================================================================>------------------]  89.82% 13s
E0130 11:01:36.831166    3152 start.go:135] Error starting the VM: read tcp 192.168.1.80:53769->52.216.64.240:443: read: operation timed out. Retrying.
@cmoulliard

Author

cmoulliard commented Jan 30, 2017

I have retested, and now the ISO has been downloaded correctly.

Maybe an additional check should be added to the code to avoid ssh-ing into the VM if the ISO file hasn't been downloaded completely. The question is: how can we figure out whether the ISO file is 100% downloaded?

minishift start --memory=4000 --vm-driver=virtualbox --iso-url=https://github.com/minishift/minishift-centos-iso/releases/download/v1.0.0-rc.1/minishift-centos7.iso --docker-env=[storage-driver=devicemapper]
Starting local OpenShift cluster using 'virtualbox' hypervisor...
Provisioning OpenShift via '/Users/chmoulli/.minishift/cache/oc/v1.4.1/oc [cluster up --use-existing-config --host-config-dir /var/lib/minishift/openshift.local.config --host-data-dir /var/lib/minishift/hostdata]'
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ...
   Deleted existing OpenShift container
-- Checking for openshift/origin:v1.4.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using 192.168.99.102 as the server IP
-- Starting OpenShift container ...
   Starting OpenShift using container 'origin'
   Waiting for API server to start listening
   OpenShift server started
-- Removing temporary directory ... OK
-- Server Information ...
   OpenShift server started.
   The server is accessible via web console at:
       https://192.168.99.102:8443

   To login as administrator:
       oc login -u system:admin
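
One possible answer to the question above (how to tell whether the ISO is fully downloaded) is to compare the file's checksum against a known-good value before using it. The following is only a minimal sketch, not how Minishift does it: the function name `iso_is_complete` is hypothetical, it assumes GNU `sha256sum` is available (on macOS `shasum -a 256` would be the closest equivalent), and it assumes a trusted SHA-256 digest for the ISO can be obtained from somewhere, which this thread does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the file exists and its SHA-256
# digest equals the expected digest, non-zero otherwise.
iso_is_complete() {
  local iso_path="$1" expected_sha256="$2"
  local actual
  # A missing file can never be a complete download.
  [ -f "${iso_path}" ] || return 1
  actual="$(sha256sum "${iso_path}" | cut -d' ' -f1)"
  [ "${actual}" = "${expected_sha256}" ]
}

# Demonstration with stand-in data instead of a real ISO:
# the "expected" digest belongs to the full content, the file is truncated.
demo_iso="$(mktemp)"
printf 'full image' | head -c 4 > "${demo_iso}"
expected="$(printf 'full image' | sha256sum | cut -d' ' -f1)"

if iso_is_complete "${demo_iso}" "${expected}"; then
  echo "ISO complete, safe to boot the VM"
else
  echo "ISO incomplete, download again before starting the VM"
fi
rm -f "${demo_iso}"
```

A checksum catches truncated and corrupted files alike; comparing only the byte count against the size announced by the server would be cheaper but would miss corruption.
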
@coolbrg

Member

coolbrg commented Jan 30, 2017

Maybe an additional check should be added to the code to avoid ssh-ing into the VM if the ISO file hasn't been downloaded completely. The question is: how can we figure out whether the ISO file is 100% downloaded?

What are your thoughts here, @hferentschik?

@hferentschik

Member

hferentschik commented Jan 30, 2017

I suspect that the error was coming from the fact that the ISO file wasn't completely downloaded & minishift was already trying to ssh to the VM created.

The ISO must have been there, otherwise the VM would not be running. By the time OpenShift provisioning occurs, the VM has already started. Also, 'oc cluster up' had started running its checks, and there was even a running Docker daemon. So the ISO must have been there. Why it failed, I am not sure. It might be an OpenShift provisioning issue or maybe a network issue.

Looking at the error message, it seems the OpenShift container itself did not start. @cmoulliard, I guess you don't have this VM around anymore. In this case we would need more context. Given that the VM was up, I would expect that 'minishift ssh' would have worked. In this case one could enter the VM and try to inspect the Docker log files, system log files, etc.

I guess the question is: are we able to reproduce this, and how?

@sobkowiak

sobkowiak commented Mar 3, 2017

I could successfully play with v1.0.0-beta.4 yesterday. I could stop it and start it again. But when I try to start it today, I get the following error:

minishift start --vm-driver virtualbox
Starting local OpenShift cluster using 'virtualbox' hypervisor...
Provisioning OpenShift via '/home/kso/.minishift/cache/oc/v1.4.1/oc [cluster up --use-existing-config --host-config-dir /var/lib/minishift/openshift.local.config --host-data-dir /var/lib/minishift/hostdata --host-volumes-dir /var/lib/minishift/openshift.local.volumes]'
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.4.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ... 
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ... 
   Using 192.168.99.100 as the server IP
-- Starting OpenShift container ... FAIL
   Error: Docker run error rc=2
   Details:
     Image: openshift/origin:v1.4.1
     Entrypoint: [/bin/bash]
     Command: [-c for name in 192.168.99.100 minishift; do ls /var/lib/origin/openshift.local.config/node-$name &> /dev/null && echo $name && break; done]
E0304 00:48:00.016213   19262 start.go:370] Error starting the cluster:  exit status 1
@jorgemoralespou

Contributor

jorgemoralespou commented Mar 4, 2017

This error occurs because the IP of the VM has changed since it was created.
It's a known issue with oc cluster up.

cc/ @hferentschik @csrwng
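
To illustrate the explanation above: the failing `Command:` in the logs loops over the current server IP and `minishift` and looks for a matching `node-<name>` directory in the existing configuration. The sketch below mimics that lookup against a temporary directory. The function name `find_node_name`, the temporary directory layout, and the "old" address 192.168.99.101 are assumptions made purely for illustration; only the loop itself is taken from the log. GNU `ls` exits with status 2 when it cannot access an argument, which would be consistent with the `rc=2` in the log, although that link is an inference.

```shell
#!/usr/bin/env bash
# Mimics the lookup from the failing "Command:" line in the logs:
# prints the first name for which <config_dir>/node-<name> exists.
# When nothing matches, the status of the last failed ls is returned.
find_node_name() {
  local config_dir="$1"
  shift
  local name
  for name in "$@"; do
    ls "${config_dir}/node-${name}" > /dev/null 2>&1 && echo "${name}" && break
  done
}

# Hypothetical setup: the configuration was generated while the VM
# still had the (assumed) address 192.168.99.101.
config_dir="$(mktemp -d)"
mkdir "${config_dir}/node-192.168.99.101"

# Lookup with the address the config was created for: a name is found.
find_node_name "${config_dir}" 192.168.99.101 minishift

# Lookup after DHCP handed out a different address: nothing matches.
find_node_name "${config_dir}" 192.168.99.100 minishift \
  || echo "no node config for the current IP (status $?)"

rm -rf "${config_dir}"
```

Because the node configuration is keyed by the address in use when the cluster was first created, a restart under a new address finds no matching directory and the container start is reported as failed.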

@jorgemoralespou

Contributor

jorgemoralespou commented Mar 4, 2017

This is a WIP PR for the issue: openshift/origin#13112

@sobkowiak

sobkowiak commented Mar 7, 2017

Is it possible to prevent the IP change?

@jorgemoralespou

Contributor

jorgemoralespou commented Mar 8, 2017

@praveenkumar @hferentschik @gbraad this problem should be avoided by the virtualization driver (KVM, VirtualBox, ...). I thought VirtualBox avoided it, but I might be wrong.

@gbraad

Member

gbraad commented Mar 8, 2017

At the moment we cannot really prevent the IP change well, especially for Hyper-V with the external switch, where we are limited by what the remote DHCP server does. We can try to implement our own IP address assignment mechanism, but this could also lead to conflicts if vbox (or another hypervisor) hands out the same address to another machine.

We depend on how DHCP addresses are issued by the hypervisor and networking setup. I am not sure how a static address on the network is handled. Will it be detected, for example via ARP or another lookup table?

@gbraad

Member

gbraad commented Mar 8, 2017

We might have to bump the IP address issue. Each hypervisor might react differently in these situations.

@jorgemoralespou

Contributor

jorgemoralespou commented Mar 8, 2017

@gbraad in that case, you'll probably run into issues until "oc 1.6".

See openshift/origin#13112

@gbraad gbraad changed the title Error starting 'cluster up': exit status 1 OpenShidt fails to start when Minishift instance has changed IP address Mar 23, 2017

@gbraad gbraad changed the title OpenShidt fails to start when Minishift instance has changed IP address OpenShift fails to start when Minishift instance has changed IP address Mar 23, 2017

@LalatenduMohanty

Member

LalatenduMohanty commented Mar 23, 2017

Until the fix in openshift/origin#13112 comes down to Minishift, I suggest we document this issue and the possible workaround, i.e. $ minishift delete followed by $ minishift start. As discussed with @gbraad, Hyper-V is more prone to this issue because it depends on the external network's DHCP server for each start of Minishift.
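
As a rough illustration of how the documented workaround could be paired with a check, the sketch below remembers the VM address after a good start and reports when a later start comes up with a different one. It is a sketch under stated assumptions, not part of Minishift: the state file `~/.minishift-last-ip` and the function names are hypothetical, and it relies only on the existing `minishift ip` command, which needs a running VM.

```shell
#!/usr/bin/env bash
# Hypothetical file that remembers the address of the last good start.
STATE_FILE="${STATE_FILE:-$HOME/.minishift-last-ip}"

# Returns 0 when an address was recorded earlier and differs from the
# current one; returns non-zero when nothing was recorded or it matches.
ip_changed() {
  local recorded="$1" current="$2"
  [ -n "${recorded}" ] && [ "${recorded}" != "${current}" ]
}

# Compares the running VM's address with the recorded one.
check_minishift_ip() {
  local current recorded=""
  current="$(minishift ip)" || return 1
  if [ -f "${STATE_FILE}" ]; then
    recorded="$(cat "${STATE_FILE}")"
  fi
  if ip_changed "${recorded}" "${current}"; then
    echo "VM IP changed from ${recorded} to ${current}."
    echo "Workaround from this thread: minishift delete, then minishift start."
  else
    # First run or unchanged address: remember it for next time.
    echo "${current}" > "${STATE_FILE}"
  fi
}

# Only attempt the check when minishift is installed.
if command -v minishift > /dev/null 2>&1; then
  check_minishift_ip || echo "could not determine the VM IP (is the VM running?)"
fi
```

Keeping the comparison in its own small function makes it easy to exercise without a VM, for example `ip_changed 192.168.99.100 192.168.99.102` succeeds while `ip_changed "" 192.168.99.102` does not.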

@gbraad

Member

gbraad commented Mar 23, 2017

@LalatenduMohanty a delete is a hard removal... would it be possible to do a more selective minishift ssh "rm -rf /var/lib/origin/" instead? That way (pulled) containers are preserved...

@jorgemoralespou

Contributor

jorgemoralespou commented Mar 23, 2017

@LalatenduMohanty removing the VM is a no-go for me when promoting Minishift for Windows using Hyper-V. I second @gbraad: there needs to be an easier workaround.
Pulling down GBs of images every time you destroy the VM is one of the biggest pushbacks people will have towards Minishift, so until there is a solution to save containers on the host, we cannot delete VMs so casually.

Just think about what you're proposing. I am a developer. I'm working on my wired connection in my home office, then I decide to work from my sitting room, I connect to my wifi instead, and I need to pull down some GBs of images again? That is "insane", and excuse me for saying this.

@jorgemoralespou

Contributor

jorgemoralespou commented Mar 23, 2017

@gbraad

At the moment we cannot really prevent the IP change well, especially for Hyper-V with the external switch, where we are limited by what the remote DHCP server does. We can try to implement our own IP address assignment mechanism, but this could also lead to conflicts if vbox (or another hypervisor) hands out the same address to another machine.
We depend on how DHCP addresses are issued by the hypervisor and networking setup. I am not sure how a static address on the network is handled. Will it be detected, for example via ARP or another lookup table?

Is there no way to set a fixed IP, like with Vagrant, or to use an internal network that could be configured via a command line flag? The CDK used to default to 10.2.2.2, and if you were on a network where that IP range was already in use, you could specify another one. Couldn't we follow the same approach and not rely on DHCP?

@gbraad

Member

gbraad commented Mar 24, 2017

@jorgemoralespou
A lot of information in a very dense form. If it is unclear, please let me know...

Are you sure Hyper-V+Vagrant supported a local/fixed network? https://www.vagrantup.com/docs/hyperv/limitations.html =>

A result of this is that networking configurations in the Vagrantfile are
completely ignored with Hyper-V. Vagrant cannot enforce a static IP or
automatically configure a NAT.

I have tried setting local fixed addresses on the internal network, but ran into issues (I eventually found out this was related to a known issue with Microsoft's virtual switch infrastructure when VPN software is installed; this is resolved in Insider builds). Instead, I tried to use just the internal network, but this way you get an IPv6 address, which causes other issues along the way, see #418 (and I believe OpenShift is also not very happy with an IPv6-only network? This is at least what I experienced some time ago. Please correct me if I am wrong.).

You can try using something like a tiny DHCP server on the internal switch, such as http://www.dhcpserver.de/cms/, and then allow NAT for outside access... but if you have Docker for Windows installed, you are screwed, as Windows 10 only allows ONE network to use NAT (and DockerNAT is already using this). And the DockerNAT network relies on IPv6 to operate... :-s. More about the limitations can be found here: https://blogs.technet.microsoft.com/virtualization/2016/05/25/windows-nat-winnat-capabilities-and-limitations/

Over the weekend I will spend some more time on this... but I would suggest you try the local DHCP server approach and configure NAT: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/setup-nat-network. You need to have the Anniversary Update of Windows 10 for this.

@jorgemoralespou

Contributor

jorgemoralespou commented Mar 26, 2017

@sobkowiak

sobkowiak commented Mar 27, 2017

But would not allow us to create a demo at home, go to an event, and show the demo.

I had the same problem last time. I had prepared a demo at home and wanted to check it the evening before the presentation. I had to spend almost the whole night provisioning a new fabric8 (due to the slow internet connection at the conference venue) and leave it running until the presentation. Not too comfortable.

@lpsantil

lpsantil commented May 10, 2017

Issue is also occurring on RHEL 7.3 w/VirtualBox using minishift 1.0.0. NAT causes minishift to use DHCP on the external side which can (and will sometimes) change when a lease expires. Bridge is slightly better as I can then hard code a DHCP reservation on my firewall/router (SmallWall). An option to set an IP would be nice or to specify the networking mode. Vagrant docs are pretty light here[0].

[0] https://www.vagrantup.com/docs/networking/public_network.html

@gbraad

Member

gbraad commented May 18, 2017

@lpsantil Vagrant documentation?

Minishift does not use Vagrant as the provisioning mechanism, but instead uses libmachine. There is no functionality to handle this, and therefore it is better to describe it as a known issue. WDYT?
/cc: @LalatenduMohanty @budhrg @praveenkumar @hferentschik

@praveenkumar

Contributor

praveenkumar commented May 31, 2017

Minishift does not use Vagrant as the provisioning mechanism, but instead uses libmachine.

👍

and therefore it is better to describe it as a known issue, WDYT?

@gbraad I agree; for now it's going to be a known issue for us.

@ahilbig

ahilbig commented Jul 10, 2017

Please have a look at https://github.com/ahilbig/docker-machine-ipconfig . It's a workaround, because assigning a static IP is not yet properly supported in docker-machine. Just install the script and call 'minishift-ipconfig static' after starting the cluster for the first time. Tested on Cygwin.

@gbraad

Member

gbraad commented Sep 9, 2017

Hyper-V solution being worked on: #1316

@gbraad

Member

gbraad commented Oct 7, 2017

General solution to assign a fixed address is being investigated: #1457

@stale

stale bot commented Dec 6, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale label Dec 6, 2017

@thesteve0

thesteve0 commented Dec 6, 2017

@stale stale bot removed the status/stale label Dec 6, 2017

@gbraad

Member

gbraad commented Dec 6, 2017

The issue was only marked as stale, not closed. (Personally I do not like to keep this issue open, as it does not provide a lot of detailed information... and some of it is even out of date.)

However, for Hyper-V we already have a solution, which is included in 2 releases of Minishift. See #1316 and the related documentation:
https://docs.openshift.org/latest/minishift/using/experimental-features.html#hyperv-static-ip

We also have issues that describe the problem in more detail: at the moment, due to the way oc cluster up creates certificates (see #1515), we cannot restart the instance with a changed IP. A workaround has been described, but it means trashing your OpenShift configuration.

Since some of those discussions, I have worked on #1457, which forces the same IP address onto the instance. But this solution is still in progress...
