
vagrant ssh only possible after restart network #391

Closed
sebastian-alfers opened this Issue June 15, 2011 · 115 comments
Sebastian Alfers

Hey,

I cannot log into my VM after "vagrant up".

I have to start it in GUI mode, then restart my network adapter with "sudo /etc/init.d/networking restart".
After this, my VM gets an IPv4 address and my Mac is able to SSH into the VM and do the provisioning.

Any ideas on this?

Same issue as here: http://groups.google.com/group/vagrant-up/browse_frm/thread/e951417f59e74b9c

The box is about 5 days old!

Thank you!
Seb

Mitchell Hashimoto
Owner

Ah, so we tried to fix this in the thread. I'm not entirely sure what the cause of this is, although it has something to do with the setup of the box. I've put a sleep in the bootup process. Please verify you have a pre-up sleep 2 in your /etc/network/interfaces file.

Otherwise, any other hints would be helpful :-\

Benedict Steele

I too am having this problem. I've tried both lucid32 & lucid64, which I downloaded today.

Before running sudo /etc/init.d/networking restart, the /etc/network/interfaces file looks like:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
pre-up sleep 2

After restarting the networking and running vagrant reload, the file looks like:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
pre-up sleep 2
#VAGRANT-BEGIN
# The contents below are automatically generated by Vagrant.
# Please do not modify any of these contents.
auto eth1
iface eth1 inet static
      address 33.33.33.10
      netmask 255.255.255.0
#VAGRANT-END

Any ideas?

Hedgehog

SSH doesn't like two hosts at the same address.
I've seen this with two VMs getting the same address and SSH showing the same behavior (below).

Now it turns out SSH also doesn't like two redirected port connections to the same port.

Symptom:

$ ssh vagrant@127.0.0.1 -p 2222 -i /path/to/private/key/vagrant -vvv
OpenSSH_5.3p1 Debian-3ubuntu7, OpenSSL 0.9.8k 25 Mar 2009
debug1: Reading configuration data /home/hedge/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2222.
debug1: Connection established.
debug3: Not a RSA1 key file /path/to/private/key/vagrant.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
[previous line repeated 24 more times]
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /path/to/private/key/vagrant type 1
debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-2048
debug1: Checking blacklist file /etc/ssh/blacklist.RSA-2048
^C

Now I see two connections to 127.0.0.1:2222

$ lsof -i :2222
ruby    9851 hedge   12u  IPv4 13467394      0t0  TCP localhost:55035->localhost:2222 (ESTABLISHED)
ruby    9851 hedge   13u  IPv4 13469354      0t0  TCP localhost:55098->localhost:2222 (ESTABLISHED)

Confirm that this is vagrant:

$ ps uax|grep 9851
hedge     9851  6.4  0.2 256080 47836 pts/4    Sl+  12:38   0:16 ruby /home/hedge/.rvm/gems/ruby-1.9.2-p180@thinit/bin/vagrant up

Confirm there is only one vm running:

$ ps aux|grep startvm
hedge     9873  4.9  2.6 706800 441432 ?       Sl   12:39   0:29 /usr/lib/virtualbox/VBoxHeadless --comment www --startvm 82cb3255-940b-48f6-b2c7-8ec50ae6500d --vrde config

So it seems the problem is that somewhere in vagrant two connections are being established to port 2222.

Correct?

Jude Venn
judev commented July 05, 2011

Could this be some sort of timing issue with the Linux networking trying to start (or get an IP) before VirtualBox has finished setting up the interface? I must admit that I don't know the internals, so I'm not sure if this is even likely.
When I enable the VirtualBox GUI and log in (while vagrant is still trying to connect via ssh), ifconfig reports no IPv4 address. If I then run sudo dhclient, vagrant successfully connects within a couple of seconds.

Mitchell Hashimoto
Owner

@judev

If this were the case, then switching back VirtualBox versions would fix the issue, which I'm not sure is true (it may be, I don't know). I say this because previous versions of Vagrant worked just fine. This is still an isolated issue, but annoying enough that I'd really like to figure it out; I haven't been able to yet.

Hedgehog

@mitchellh, in my case switching VB back to 4.0.4 seems to have eliminated the issue. VB 4.0.10 was a problem. From memory I upgraded from 4.0.6 because I was hitting some issues. At the time I had 4.0.6 I wasn't using vagrant much.

Anyway, stepping back to VB 4.0.4 is definitely a fix for this issue in my case.
We also can't rule out the Host OS. I say this simply because the packaged OSE versions of VB on lucid seem to be 4.0.4.

Hedgehog

@judev, what happens if you vagrant reload that VM after you have connected to it via ssh?
Are you able to ssh to it again? Run lsof -i :2222 and note the connection details of your established ssh connection. In my case I'd see two established connections to localhost:2222 after the reload, one of them being the connection from before the reload.

Hedgehog

@judev, please add your failing and passing configuration details to this page:
https://github.com/jedi4ever/veewee/wiki/vagrant-(veewee)-+-virtualbox-versions-test-matrix

The page has an example script that makes it easy to test (change the Ruby and gem versions to what you have).
It shouldn't pollute your system if you have rvm installed.

Jude Venn
judev commented July 11, 2011

Sorry for the delay - I've tried each version of VirtualBox from 4.0.4 to 4.0.10; same problem when using the latest lucid32 box, but everything works fine using "ubuntu 11.04 server i386" from http://vagrantbox.es

@hedgehog, when I did sudo dhclient, connected over ssh, then did vagrant reload, I still could not connect until doing another sudo dhclient. The previous connection did not show in lsof.

Thanks for your help; I'm happy to say things are working really well with ubuntu 11.04.

Hedgehog

@judev, do I understand correctly: lsof -i :2222 returned nothing after vagrant reload, then there was one connection after running sudo dhclient?

Or: does lsof -i :2222 show two connections after vagrant reload, which then falls to one connection after sudo dhclient? It might help if you gave the actual commands and their outputs.

mabroor

I get the same issue.. latest version of vagrant, vbox on Win7 x64 using jruby (as mentioned in the docs). Running sudo dhclient in the GUI was able to get my puppet manifest running.
The strange thing is that I had another machine with the exact same setup where I encountered this issue only once in the last week. This machine has this problem constantly...

Hedgehog

@mabroor, could you give the additional command output, in sequence, as requested above?

mabroor

@hedgehog

I tried after a vagrant halt.
The problem returns.. below is the output from netstat while vagrant is waiting for the vbox to boot (it is already booted):

netstat -an
 TCP    0.0.0.0:2222           0.0.0.0:0              LISTENING
 TCP    127.0.0.1:2222         127.0.0.1:54436        TIME_WAIT
 TCP    127.0.0.1:2222         127.0.0.1:54612        FIN_WAIT_2
 TCP    127.0.0.1:2222         127.0.0.1:54618        ESTABLISHED
 TCP    127.0.0.1:2222         127.0.0.1:54624        ESTABLISHED

I then log in to the vbox and run sudo dhclient and it works fine. When vagrant has done its thing, the connections show as ESTABLISHED in netstat. I am using Windows so I can't use native ssh to show verbose output.

Jonas Grimfelt
grimen commented July 27, 2011

Same issue, but sudo /etc/init.d/networking restart didn't solve it for me. I'm trying another box now, let's hope it works.

mabroor

@grimen: try sudo dhclient
Always works for me now.

Hedgehog

@mabroor, is it the case that, according to netstat, there are always two established connections when you cannot connect and only one when you can connect?

Jonas Grimfelt
grimen commented July 29, 2011

@mabroor Do you maybe know the OS X corresponding solution?

mabroor

@grimen the command I mentioned has to be run in the vm. I didn't know the problem existed in OSX, I had the issue on Windows 7 x64.

Jonas Grimfelt
grimen commented July 29, 2011

@mabroor Ouch, yes, of course, then it even makes sense. :) The problem though is that I cannot get into the VM - how did you do that?

mabroor

Set config.vm.boot_mode = :gui in your Vagrantfile to run the VM in GUI mode.

Jonas Grimfelt
grimen commented July 29, 2011

@mabroor Thanks - will try that!

Jonas Grimfelt

I got the GUI now, but none of the proposals in this thread work for me (for "lucid32" and "lucid64", that is - those seem to be flawed, as 'talifun' works). :(

Michael Rolli

My combo shows the same issue: Mac OS X 10.7.1, Vagrant 0.8.5, Virtualbox 4.1.0, lucid64 with correct guest additions

After the first boot vagrant could not connect to the VM. In the VM (GUI) there was no IP address set. I did a sudo dhclient while vagrant was hanging, and vagrant connected instantly once the guest finally had an IP.

Meanwhile I did vagrant reload twice and never had to do a sudo dhclient.

Vasko Markovski

I'm using Mac OS X 10.7.1, Vagrant 0.8.6, VirtualBox 4.1.2, lucid32 with the 4.1.0 guest additions.

I've added the following line to my Vagrant::Config and it boots up and works fine now.
config.vm.provision :shell, :inline => "/etc/init.d/networking restart"

It's not the ideal situation, but it works without needing to go into the GUI.

UPDATE: Okay. I've run this a few times and it doesn't always work, especially, it seems, when I'm connected to the internal network without an internet connection.

Anatoly Mikhailov

This works for me:

1) login with :gui using login/pass: vagrant/vagrant
2) modify the /etc/rc.local file to include the line "sh /etc/init.d/networking restart" just before "exit 0" (see the sketch below)
3) disable :gui
4) vagrant reload
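
For reference, here is a minimal sketch of what the resulting /etc/rc.local might look like - the surrounding contents are a typical Debian/Ubuntu default and may differ on your box; the networking restart line is the only addition:

    #!/bin/sh -e
    #
    # rc.local - executed at the end of each multiuser runlevel.

    # Workaround: kick the network stack so the NAT interface picks up
    # its DHCP address before vagrant tries to ssh in.
    sh /etc/init.d/networking restart

    exit 0
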
Cyril Mougel

Is there no technique that works without hacking in GUI mode?

Vasko Markovski

I've repeated the below process at least 5 times now for all scenarios.

Running vagrant up after I've started the VirtualBox application works every time.

Running vagrant up without starting the VirtualBox application fails every time, with or without the ":gui" option.

From my simple testing it seems to be an issue with running headless.

UPDATE: I've just found this article: http://serverfault.com/questions/91665/virtualbox-headless-server-on-ubuntu-missing-vrdp-options. I've just installed the Extension Pack and I've had no issues since. VRDP was removed from VirtualBox 4.0 and moved into the extension pack. I believe this might also be related to issue #455.

UPDATE: I jumped the gun on this I think. I'm having trouble with lucid32 and lucid64 running without the ":gui" option.

Anatoly Mikhailov

I have had the Extension Pack the whole time I've used VirtualBox, but the network issue is still in place.
It would be great to have a permanent solution for this.

Hedgehog hedgehog referenced this issue from a commit in hedgehog/vagrant October 21, 2011
Hedgehog ssh shared connections. closes GH issue #391, maybe several others too
This should help with the ssh connection refused errors.
It seems that it might also make redundant the new ssh session
caching code, but I really couldn't follow what was trying to be achieved
there.
6bd1b4a
Hedgehog

Can people with this issue confirm that the following pull request fixes this issue for them?

#534

Marcin Kulik

Hey @hedgehog. I've just tried your fork and it didn't solve the issue for me unfortunately :/

Hedgehog

@sickill, thanks. I think the changes are useful in speeding up the ssh connections, but they also exposed what I think is the real cause, and that is Net::SSH. I'm not sure if the problem is with Net::SSH per se, or just how it is used. Still working on a fix....

Hedgehog

By replacing Net::SSH.start(...) I was able to determine that the likely ssh error is Connection timed out during banner exchange, and occurs after the connection is established (note the timeout is set in the ssh cmd):

<snip>
debug2: ssh_connect: needpriv 0
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2206.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 1000 ms remain after connect
debug3: Not a RSA1 key file /home/hedge/.rvm/gems/ruby-1.9.2-p290@thvirt/gems/vagrant-0.8.7/keys/vagrant.
<snip>
debug1: identity file /home/hedge/.rvm/gems/ruby-1.9.2-p290@thvirt/gems/vagrant-0.8.7/keys/vagrant type 1
debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-2048
debug1: Checking blacklist file /etc/ssh/blacklist.RSA-2048
Connection timed out during banner exchange

Can anyone confirm this by running (assuming a blocked VM):

  • In a bash shell, run (setting a 1 sec timeout):

    ssh -p 2206 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -i /home/hedge/.rvm/gems/ruby-1.9.2-p290@thvirt/gems/vagrant-0.8.7/keys/vagrant -o ControlMaster=auto -o ControlPath=~/.ssh/vagrant-multiplex-%r@%h:%p -o ConnectTimeout=1 -vvvv vagrant@127.0.0.1
    

Possibly related Issues:
chromium-os issue 20514
chromium-os issue 21739

Hedgehog hedgehog referenced this issue from a commit in hedgehog/vagrant October 29, 2011
Hedgehog ssh shared connections. closes GH issue #391, #455
    Should fix the ssh connection refused error.
     - Banner connection error handled.
     - Vagrant bails when orphaned Vagrant ssh sessions are around
     - Multiplexing SSH connections
     - Establish that the remote shell session is responsive before proceeding
     - Net::SSH and Net::Scp are removed
     - Use Aruba/ChildProcess to manage sessions (no threading)
     - tested on Ubuntu Lucid +chef-solo (0.10.4)
     - Distribution config variable + others (no parsing ssh output)
    TODO
     - Confirm with other provisioners.
     - Confirm on other distributions.
bc973cb
Hedgehog hedgehog referenced this issue from a commit in hedgehog/vagrant November 01, 2011
Hedgehog ssh shared connections. closes GH issue #391, #455, etc.
        Should fix the ssh connection refused error.
         - Banner connection error handled.
         - Vagrant bails when orphaned Vagrant ssh sessions are around
         - Multiplexing SSH connections
         - Establish that the remote shell session is responsive before proceeding
         - Net::SSH and Net::Scp are removed
         - Use Aruba/ChildProcess to manage sessions (no threading)
         - tested on Ubuntu Lucid +chef-solo (0.10.4)
         - Distribution config variable + others (no parsing ssh output)
        TODO
         - Confirm with other provisioners.
         - Confirm on other distributions.

    Likely addresses issues:

    GH issue #391, GH issue #410, GH issue #424, GH issue #443, GH issue #455, GH issue #493

    Possibly addresses/affects issues:

    GH issue #516, GH issue #353

    Overview

    Essentially between 1% and 2% of reloads pseudo-fail.
    I say pseudo-fail in the sense of current behavior.
    Specifically, running `vagrant reload` after a 'banner exchange exit' will now succeed.

    I've run reload 100 times under 1.9.2 and 1.8.7.  Results are below.
    I've run provision 100 times under 1.9.2 and 1.8.7, with full success.

    One thing to think about in the code review is auto-triggering a reload when
    the banner exchange error occurs.
    Otherwise I think less faulty up and reload behavior will have to wait for bootstrapping
    via a serial console.

    Command

        rm up-reload-1.8.log; vagrant halt th_ci_runner_6; vagrant up th_ci_runner_6 2>&1|tee up-reload-1.8.log; for ((n=0;n<100;n++)); do time vagrant reload th_ci_runner_6 2>&1|tee -a up-reload-1.8.log; done

    Total 101

    success (DEBUG: Exiting) (count: 1.9.2 = 100, 1.8.7 = 99)
    banner exchange failed (count: 1.9.2 = 1, 1.8.7 = 2)
    orphan master control (count: 1.9.2 = 14, 1.8.7 = 5)

    Attempt counts:

    1 (count: 1.9.2 = 155, 1.8.7 = 161)
    2 (count: 1.9.2 = 311, 1.8.7 = 317)
    3 (count: 1.9.2 = 34,  1.8.7 = 17)
    4 (count: 1.9.2 = 168, 1.8.7 = 167)
    5 (count: 1.9.2 = 31,  1.8.7 = 32)
    6 (count: 1.9.2 = 1,   1.8.7 = 96)
    7 (count: 1.9.2 = 0,   1.8.7=)
0a0cb76
Eric Plaster xuru referenced this issue from a commit in xuru/vagrant November 01, 2011
Hedgehog ssh shared connections. closes GH issue #391, #455, etc.
(same commit message as above)
cb1f08c
Hedgehog

Debian/Ubuntu users:

Can you try rebuilding your boxes with this workaround:
jedi4ever/veewee#159

Please report in the veewee issue if this:

  • resolves the issue as far as you can tell (I had a reload loop succeed 101 times)
  • only reduces the severity of the issue

Non-Debian/Ubuntu users: there is likely a similar facility to make these changes before the first reboot in veewee's build process.

Hedgehog hedgehog referenced this issue from a commit in hedgehog/vagrant October 29, 2011
Hedgehog ssh shared connections. closes GH issue #391, #455
(same commit message as above)
3478eac
Mitchell Hashimoto
Owner

I'm going to go ahead and close this issue because while it is a bug with Vagrant, it is really more of a bug with net-ssh and not being robust enough to handle various edge cases of SSH. I don't see any clear way at the moment to fix this bug (which is very frustrating), but if I do I will fix it ASAP.

Mitchell Hashimoto mitchellh closed this January 11, 2012
catditch

How about printing a warning on Vagrant startup when using headless mode?

Kim Burgestrand

Which boxes are people using? I could not get any of the ubuntu boxes past this issue, but I tried an archlinux box (from vagrantbox.es) instead and it works flawlessly (so far!).

Marc Abramowitz

Another possible cause of this issue (which I just ran into): if /Applications is world-writable, then VirtualBox will apparently refuse to start the VM.

Ramon van Alteren

FTR I had similar problems on Mac OSX 10.7 with vagrant 1.0.2 and Virtualbox 4.1.8r75467 and a debian squeeze based box from http://puppetlabs.s3.amazonaws.com/pub/squeeze64.box

It turns out that all the connection issues in my case were directly related to being in a bridged network setup.
The bridged setup creates two interfaces: eth0 on the internal range (10.0.2.2 by convention, I think) and eth1, which gets an IP address from the bridged network.

For reasons unclear to me, in some cases eth1 will come up with a different MAC address, causing the udev rules to rename it to eth2, and all networking scripts will subsequently fail... => broken network => no "up" report to vagrant.

Fixed by deleting /etc/udev/rules.d/70-persistent-net.rules or removing the broken entries from there.

Because of the way udev persistent net rules work, the interface will continue to receive the same name afterwards, since its new MAC address is now recorded in the rules file.
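
A sketch of that cleanup, run inside the guest (this assumes the Debian/Ubuntu rules path quoted above):

    # Remove the stale persistent-net rules; udev will regenerate them
    # with the interface's current MAC address on the next boot.
    sudo rm /etc/udev/rules.d/70-persistent-net.rules
    sudo reboot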

Ramon van Alteren

I added an options hash to the config.vm.network param which pins the MAC address of the bridged adapter; this solves it for me... (an equivalent VirtualBox-level command is sketched below)
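
If you'd rather pin the MAC at the VirtualBox level than through the Vagrantfile, something like this should be equivalent (the VM name and MAC value here are placeholders):

    # Fix the MAC of the second (bridged) adapter so udev sees the same
    # hardware identity on every boot and never renames eth1 to eth2.
    VBoxManage modifyvm "my_vm" --macaddress2 080027DEADBE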

Garth Kidd

I'm now seeing this on every attempt to vagrant up on one of my Macs. There's nothing in /etc/udev/rules.d/70-persistent-net.rules, my only interface is eth0, I have pre-up sleep 2 in /etc/network/interfaces, and adding sh /etc/init.d/networking restart before exit 0 in /etc/rc.local doesn't help. Any ideas?

UPDATE: destroying all other VMs and re-creating them fixed the problem.

Ramon van Alteren

Boot with the GUI, log in to the console, and check whether there is actually a network interface up.
Which one is it, which network setup are you using (hostonly, nat, bridged), and what OS is running on this VM?

Jeanmonod David

My experience on that issue is that it's clearly related to the Internet connection I used:

  • At home (wifi): Freeze on vagrant up
  • At office (wifi): Works great
  • At home (using iPhone as a proxy): Works great
  • And so on...

I'm not good enough at networking to tell what the exact difference is, but it's clearly an issue with which connection I use on my MacBook...

Ramon van Alteren

Wild guess: could it be that your wifi connection uses the same IP range as vagrant does by default, i.e. 10.0.2.0/24?

Ramon van Alteren

I'm seeing these again on an intermittent basis.
The problem in my case is that the primary NIC (eth0) does not receive an IP address from the VirtualBox built-in DHCP server.

It is the VirtualBox NAT engine bug again :(

Dan Hively

I had the same networking issue and then I remembered that I'm the paranoid sort. I have a VPN automatically start on my OS X Lion Macbook Pro. After I disconnected the VPN all worked as it should! BTW I'm using veewee and VirtualBox 4.1.12.

xmartinez

I have been having this issue with a Linux guest (lucid32.box). The
NAT interface sometimes does not get a DHCP assigned address during
boot up. Running sudo dhclient in :gui mode allowed me to connect
to the VM.

After some digging up, I have traced the problem to an incorrect
setting of the VM hardware clock. Adding the following option to
Vagrantfile seems to solve the issue:

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

i.e., vagrant can always connect to the VM after boot-up.

As this issue is closed, I have opened a new one to address the
configuration problem (see #912).

Wes Thompson
uresu commented May 02, 2012

This workaround is not working for me.

Martin

I have had the same issues with lucid32, lucid64 and a self-built ubuntu server instance. Each one failed at the ssh connection.

After trying http://vagrantbox.es/170/ I didn't see the issue anymore. What is the difference between lucid* and Tim Huegdon's base box?

John Sterling
johnste commented May 10, 2012

I'm having the same problem with lucid32+64. GUI workaround with network restart works.

Denis
rhodee commented May 21, 2012

vagrant halt returns an error and vagrant up hangs. I've built a box from scratch and can verify it works. However, when I try to create a new instance of the box I have issues with the above commands.

On vagrant up my console spits out the following:
[default] Clearing any previously set forwarded ports...
[default] Forwarding ports...
[default] -- 22 => 2222 (adapter 1)
[default] Creating shared folders metadata...
[default] Clearing any previously set network interfaces...
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.

When I CTRL-C out of vagrant up and then do vagrant ssh I can enter my box. Even though the command is hanging, I can see that the VM is running from VirtualBox.

When I exit the guest and run vagrant halt, I get:

The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

shutdown -h now

The box can be halted by running vagrant suspend and then vagrant halt - weird.

Running VirtualBox 4.1.14r7740 and vagrant 1.0.2

Thanks for any assistance you can provide.

Marcus Cobden
leth commented May 21, 2012

The veewee tool has a handy set of tests to check you've set everything up right. It's under the validate command.

Denis
rhodee commented May 21, 2012

@leth I can confirm that I am experiencing a similar result from building it from scratch (previous post). While using veewee 0.3.alpha9 to build a VM from ubuntu-12.04-amd64 template, I can't ssh into the box.

I waited less than 5m for the VM to boot.

[default] Failed to connect to VM!
Failed to connect to VM via SSH. Please verify the VM successfully booted
by looking at the VirtualBox GUI.

It is running in VirtualBox.

Romain Champourlier

I've been using this specific configuration in my Vagrantfile for some time and it works perfectly on my MacBook under OS X Lion 10.7.3 with VirtualBox 4.1.14r77440 (from VBoxManage -v), whereas before it was not starting up correctly more than 2 times out of 3.

First be sure there is no conflict between your box's network and any other active virtual machine. I use hostonly networks and ensure different networks are used for each machine I configure:

config.vm.network :hostonly, "10.10.10.2"

This is the trick found above in this thread:

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

Should it still not work, I want to be notified faster, so I reduce the number of SSH retries:

config.ssh.max_tries = 10

Hope it will help!

Denis
rhodee commented May 21, 2012

@rchampourlier thanks for the tip. I added those to my Vagrantfile and still no luck, I updated my post above with the output. I am going to review issue #14

Francesco Levorato

Hi everybody,
as the issue is over one year old, could anybody define/suggest a common test case that can be run in order to pinpoint the issue?
I am experiencing similar problems and I would like to provide help in debugging and/or trying different configurations.

aelmadho

I ran into this issue recently with Fedora 13 only; Fedora 16 does not show this issue. It is network related, since when I log in using the GUI, eth0 is not active.

I have disabled NetworkManager and set NM_CONTROLLED="no" in my ifcfg-eth0 file (see the sketch below). This was a defect according to Red Hat (https://bugzilla.redhat.com/show_bug.cgi?id=597515), which is no longer maintained.
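
For reference, the relevant lines in /etc/sysconfig/network-scripts/ifcfg-eth0 would look something like this (a sketch; the other fields depend on your box):

    DEVICE=eth0
    BOOTPROTO=dhcp
    ONBOOT=yes
    # Keep NetworkManager's hands off this interface:
    NM_CONTROLLED="no"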

So I can agree, this goes back to an issue with bringing up the interfaces, it has nothing to do with SSH, or maybe other edge cases are present...

What distribution are you running? What does "dmesg" show if you log in using the GUI?
And if you log in using the GUI and do /etc/init.d/network restart, what happens?

Bastien

Hi, I did what @mikhailov suggests in #issuecomment-2078383 (restart the networking service in rc.local) and it worked for me

Francesco Levorato

So, what I found is:
I am using https://github.com/erivello/sf2-vagrant and the lucid32 distributed by Vagrant. I am trying the exact same configuration on 2 identical iMacs at my company: same hardware, same OS X version 10.6.8, same VirtualBox version (4.1.16), same vagrant version (1.0.3).
On one of the iMacs the machine boots up just fine; on the other one it lags at the SSH connection.

This makes me think it's something different in the host environment or in the interaction between host and vm.

I also tried a complete reinstall of VirtualBox and deleting the ~/.vagrant.d folder to start fresh, but I still get the error.

EDIT: I retried after a few days and now it's working: perhaps a host reboot fixed the problem? Or this is something random.

Julien Ammous

Just got this one twice already today using the vagrant lucid32 base box.
I also got it on the first boot with this VM, just after the first "vagrant up", with VirtualBox 4.1.10.

zejn
zejn commented July 05, 2012

I've used this crontab entry as a workaround:

@reboot  /sbin/ifdown eth0 ; /bin/sleep 5 ; /sbin/ifup eth0

Hedgehog

@schmurfy, thanks for your work on Goliath, hope the following helps bring you up to speed on the state of play...

Somewhere in this, or related issue threads, is my finding that this is caused by ssh trying to complete its handshake while the OS is not ready for it, e.g. cycles spent in motd. The ssh connection is made, just not completed.

There are several ways to mitigate this: no motd, restarting network services, bringing the iface down then up, etc.

There is no solution, just workarounds, and a Google project (which I can't recall right now - search my nickname+vagrant in their issues) is having this same issue, also in a VM booting context.

Bootstrapping via VB-level commands was investigated by Mitchell and wasn't feasible due to VB issues. Bootstrapping over a serial console was likewise suggested but not completed, for good reasons that escape my memory right now.

HTH

Julien Ammous

@hedgehog with the removal of the router part I am not sure a lot of my code remains in goliath xD

Given that some solutions exist, as proposed here, it would be nice if the base image came with one; I think I will try creating my own base image with one of the proposed fixes, thanks.

Marcin Kulik

+1 for creating base images with any working workaround.

Indra BW
indwic commented July 09, 2012

My CentOS 6.2 32-bit box stalled during vagrant up at "Waiting for VM to boot. This can take a few minutes.". SSH to the box works. This happens when the host is not connected to wifi/internet. As a workaround I disabled the firewall in the guest box and it works; also check the host firewall.

Arthur Koziel

I recently ran into the same problem. I'm using vagrant 1.0.3, VirtualBox 4.1.18 and the standard lucid32 box. This workaround from @xmartinez worked for me:

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

Bastien

It is easier to debug when you can boot the machine with the GUI. I've experienced this issue on a local Mac, where it is easy to boot the GUI, but I also experienced it on a remote Debian server, where I had to install X11 and use X11 forwarding to boot the GUI on the local Mac, debug, and then turn the config back to no GUI.

gfreeau

If you are using the latest ubuntu or variants like mint, there have been changes to how ubuntu handles DNS. Try running

sudo dpkg-reconfigure resolvconf

and use the first option to create a symlink to /etc/resolv.conf. VirtualBox needs this file to set the DNS correctly for NAT. This should be made more obvious.

Doing this fixed the problems for me, I didn't have to set ssh timeouts, restart networking or use --rtcuseutc

destructuring

In headless mode with eth1 as host-only networking, I also get a hanging vagrant waiting for the ssh port forward to connect. I can ssh to eth1 fine, so I think this is a problem with port forwarding or the NAT eth0. It's hard to test because I can't ssh to eth0 directly from OSX.

To fix, a simple "ifdown eth0; ifup eth0". I suspect it's some timing issue around eth0, vboxservice loading, and port mapping.

The ifdown eth0 has this error from dhcp:

DHCPRELEASE on eth0 to 10.0.2.2 port 67
send_packet: Network is unreachable
send_packet: please consult README file regarding broadcast address.

After an ifup, further ifdowns are successful.

You don't even need the ifup/ifdown; a "dhclient eth0" will let vagrant resume.

I've been reloading my vagrant over and over for an hour, each reload takes 90 seconds. No hangs.

I don't use the "pre-up sleep 2" or any of the workarounds in this thread.

In rc.local on Ubuntu, right before the exit 0, I put "dhclient eth0". This won't disturb the network, it'll just kick eth0 in the butt and get it working again. Since it runs last, I hope it avoids whatever it is that is hanging the ifup during network init, because that's what I saw for both eth0 NAT and eth1 host-only interfaces on my guests -- ifup still running, their child processes blocked.

Sergej

I tried restarting the networking service on boot but, for some reason, I can't access the web server.
So I have to restart the networking service twice and then it works.
But I can't stop the VM with "vagrant halt" (I have to run "vagrant suspend" first) and I can't access ssh with "vagrant ssh" (I have to use "ssh vagrant@IP").

Philippe Gerber

Starting the VMs in GUI mode and then executing "sudo dhclient eth0" resumed vagrant for me, too.

Mitchell Hashimoto
Owner

@destructuring Awesome! I'm going to put this into the base boxes I release and hope this solves this issue. :)

Mitchell Hashimoto
Owner

I just uploaded new lucid32/lucid64/precise32/precise64 with @destructuring's changes. Let me know if this is gone!

James Hu

None of these solutions are working for me. The only thing that works is rebuilding the box. I noticed only one other person has commented on @destructuring's solution working for them. Can I get a sanity check?

James Hu

I get

RTNETLINK answers: File exists

when running sudo dhclient eth0.

expunged

I'm not sure this is the proper forum but I've been running into similar problems with hangs on 'vagrant up'

I'm posting here because I'm seeing posts that indicate different behavior in different environments, which is what I ran into, and there seem to be multiple tickets tied to the same core issue. This seemed as good a spot as any :) The solution seems to be outside vagrant.

If you are behind a proxy (likely to be the case at work but not at home) you will need to configure the guest system with your proxy settings. Setting the http_proxy and https_proxy environment variables in /etc/bashrc worked for me; it made them system-wide and available for the ssh access required by vagrant (sketched below). If you do not specify the proxy you will receive the dreaded ifup message and your boot will hang.
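
A sketch of the sort of lines that worked, added to /etc/bashrc in the guest (the proxy host and port are placeholders for your own):

    # Make the corporate proxy visible to every shell, including the
    # non-interactive ones vagrant opens over ssh:
    export http_proxy=http://proxy.example.com:3128
    export https_proxy=http://proxy.example.com:3128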

The caveat here is that if you set this and try to boot while you are not behind the configured proxy you will receive the same message and hang on boot.

Jan Schumann

For me this issue is not closed. I found a workflow to reproduce it. Please read (and edit/comment) https://github.com/mitchellh/vagrant/wiki/%60vagrant-up%60-hangs-at-%22Waiting-for-VM-to-boot.-This-can-take-a-few-minutes%22

Justin

I'm also having the hang at "[default] Waiting for VM to boot. This can take a few minutes." but I've somewhat figured out the cause. The DNS proxying is not working, causing ssh connections to take 10 seconds to be established. This causes the probe to time out. vagrant ssh and other commands seem to have a longer timeout and they run OK.

Some base boxes also boot OK because they do not have UseDNS yes in /etc/ssh/sshd_config and don't run into this problem at all.

For me, restarting networking does not work.. it seems the DNS proxy stuff just doesn't work with the version of vagrant in ubuntu 12.10 (1.0.3) and virtualbox 4.1.18.

Justin

Ah, somewhat figured it out:

my resolv.conf has

nameserver 127.0.1.1

The code in vagrant only checks for 127.0.0.1 when disabling the DNS proxy. That said, I fixed the regex but dns still doesn't work in the VM. It'll work fine if I change the DNS server to 192.168.1.1 or 8.8.8.8, so it's not completely broken, something is just breaking the autoconfiguration.
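
A quick way to check whether your host is affected (a sketch; it just looks for a loopback resolver other than 127.0.0.1, which the old check would miss):

    # Prints a line if resolv.conf points at a 127.x nameserver that is
    # not exactly 127.0.0.1 (e.g. Ubuntu's dnsmasq at 127.0.1.1):
    grep -E '^nameserver 127\.' /etc/resolv.conf | grep -v '127\.0\.0\.1'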

James Hu

I've been having success with /etc/init.d/networking restart in /etc/rc.local.

Justin

I'm not sure why restarting networking works for some people, it doesn't work here.

It looks like HEAD already has the fix for the 127.0.1.1 issue, so that's good..

As for the other issue, looking here: https://bugs.launchpad.net/ubuntu/+source/virtualbox/+bug/1031217, the stated fix is to turn natdnshostresolver1 on, but the code in vagrant that is linked from that bug turns it off. I'm not sure why there is a discrepancy, but this probably has something to do with my problem.

Bastien

I have just retried with a freshly downloaded official lucid32 and on a remote debian and it works fine without doing anything special.

Jos Houtman

For me this issue was in the DNS configuration.
Setting: VBoxManage modifyvm "puppet-playground_1357644642" --natdnshostresolver1 on

fixed this for me.

Sergio Moya

Sometimes GRUB starts in failsafe mode (when the box shuts down uncleanly in Ubuntu, for example) and sets a grub timeout of -1.

Fix:

  • Edit /etc/grub.d/00_header, and find:

    if [ "\${recordfail}" = 1 ]; then
      set timeout=-1
    
  • Change it to...

    if [ "\${recordfail}" = 1 ]; then
      set timeout=10
    
  • And run update-grub (a scripted version of this edit is sketched below)
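
A scripted version of the same edit, assuming your /etc/grub.d/00_header matches the snippet quoted above:

    # Replace the failsafe "wait forever" timeout with 10 seconds,
    # then regenerate the grub config:
    sudo sed -i 's/set timeout=-1/set timeout=10/' /etc/grub.d/00_header
    sudo update-grub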

David Kinzer

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

works for me, but only if I include it before configuring the VM network:
config.vm.network :hostonly, "10.10.10.2"

John Haugeland

So, I've found a completely unrelated cause of these symptoms. I doubt many people are having the stupid problem I was, but, for the record: if your brand new laptop has VT-x turned off, then the inner VM can't start, and the way it manifests from the outside is the SSH server being unwilling to accept connections (because it's not running).

And so you end up with the repeated attempts to hit 2222, all failing.

And you really can't tell the difference, from the outside, against any of these other causes.

The way to test if you've got my problem is just to run the VM directly from the VB manager. If you get a message talking about how you can't boot without VT-X/AMD-V, then, well, ha ha.

Older machines, go into the BIOS and turn it on.

Newer machines, UEFI gets in your way. From Win8, go to the start screen, and type bios. It'll say that no apps match your search, but if you look, one setting does. Hit settings - you'll see "advanced startup options." Go in there, and under the general tab, go to the bottom, where there's a button "restart now" under the heading "advanced startup."

When you hit that, it doesn't actually restart now; it brings up another menu, one item of which allows you to get at your bios. Follow that, and you'll get in.

Then go turn on whatever your BIOS calls your hypervisor hardware. (There's like six names for it, but it's usually VT-X or AMD-V.) Enable, save, and shut down.

On reboot, vagrant will be happy again.

Barnabas Debreczeni

adding

ifdown eth0
sleep 1
ifup eth0
exit 0

to /etc/rc.local solved it. dhclient eth0 solves it too.

A weird thing is that when I built my base box image, doing apt-get install dkms before installing the VirtualBox additions made it work 100% afterwards.

Julien Phalip

I've run into the same frustrating issue while building a CentOS base box. What completely fixed it for me was to add dhclient eth0 to /etc/rc.local as suggested by @keo above. I wonder if this is something that Vagrant itself could help with, by systematically kicking eth0 on startup...

Roman Zenka

I have the same issue with CentOS 6.3.

My suspicion is that the 10.0.2.2 gateway actually EXISTS on our network:

10.0.2.0 * 255.255.255.0 U 0 0 0 eth0
link-local * 255.255.0.0 U 1002 0 0 eth0
default 10.0.2.2 0.0.0.0 UG 0 0 0 eth0

So if my networking is going through some poor random server, no wonder it takes forever for the packets to go through.

I will try to figure out how to set up the networking differently.

Edit: I resolved my issue. I needed to reconfigure the network VirtualBox uses for DHCP.

http://stackoverflow.com/questions/15512073/set-up-dhcp-server-ip-for-vagrant

Added following code:

  config.vm.provider :virtualbox do |vb|
    vb.customize ["modifyvm", :id, "--natnet1", "192.168/16"]
  end

You can check for this issue easily - even before you start Vagrant, ping 10.0.2.2 - if you get a response, you are in trouble.
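
As a concrete pre-flight check (a sketch of the test described above):

    # If this gets a reply, a real host owns 10.0.2.2 on your network
    # and it collides with VirtualBox's default NAT gateway:
    ping -c 3 10.0.2.2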

jamshid

To anybody else trying to make their way through this, if you're trying one of the suggested workarounds:

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

this does not work if your Vagrantfile is version 2 and starts with:

Vagrant.configure("2") do |config|

You'll get errors like:

$ vagrant destroy
/Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/plugins/kernel_v2/config/vm.rb:147:in `provider': wrong number of arguments (0 for 1) (ArgumentError)
    from /Users/jamshid/tmp/70/Vagrantfile:13:in `block in <top (required)>'
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/config/v2/loader.rb:37:in `call'
    ...
from /Applications/Vagrant/bin/../embedded/gems/bin/vagrant:23:in `<main>'

Instead use:

config.vm.provider :virtualbox do |vb|
    vb.customize ["modifyvm", :id, "--rtcuseutc", "on"]
end

Mitchell Hashimoto
Owner

Does anyone have this issue anymore without a CentOS machine? On CentOS the issue is most likely forgetting to remove the udev rules. But is anyone getting this with the precise64 box, some Ubuntu-based box, or some box where they KNOW they cleared the udev rules?

Mike Conigliaro

I found that after running yum groupinstall Desktop and rebooting a CentOS 6.4 guest, the VM could not communicate with the outside world at all. The fix for me was to disable NetworkManager and restart networking.

chkconfig NetworkManager off
service network restart

Hayden

For me, it ended up being that I had to make sure "Cable connected" was checked under the adapter settings in VirtualBox.

[screenshot: VirtualBox network adapter settings with "Cable Connected" checked]

Ace Suares

For a problem with the same symptoms, but a different cause and solution, see #1792

jack dempsey

Just a public +1 and thanks to @jphalip, whose tip fixed things up for me on a CentOS VM.

Justin Searls
searls commented June 25, 2013

I just ran into this error after setting up SSH agent forwarding (OS X host). Removing config.ssh.private_key_path = "~/.ssh/id_rsa" fixed my issue.

Duke Jones

Try playing with:
VBoxManage modifyvm "VM name" --natdnsproxy1 on
and
VBoxManage modifyvm "VM name" --natdnshostresolver1 on

Also, if you don't mind altering your VM after the fact, @jphalip / @keo above suggest adding dhclient eth0 to /etc/rc.local.

Josh Betz joshbetz referenced this issue in Automattic/vip-quickstart November 06, 2013
Open

Networking breaks after sleep #62

Josh Betz joshbetz referenced this issue in Varying-Vagrant-Vagrants/VVV December 17, 2013
Closed

HTTP timeout error in VVV environments #211

rakm

Is there an agreed-upon solution for this? I'm running VirtualBox 4.3.6 and Vagrant 1.4.1 on RHEL 6.2 and am unable to run vagrant ssh. I see the Wiki page, but since I am accessing the host machine through SSH, I don't have access to the VirtualBox GUI.

Shimon Doodkin

I had a problem where vagrant was freezing after restoring from hibernate.
On Windows 7 the problem seems gone after unchecking "Allow the computer to turn off this device to save power" for the wifi card (Change adapter settings in the Network and Sharing Center, right-click an adapter and choose Properties, then the Configure button, then Power Management).

Probably it is something like a 'broken pipe' problem with the network device, because the network device is disconnected before hibernate and reconnected on startup.

Curtis Stewart

Seeing this issue running Vagrant 1.4.3 with Virtualbox 4.3.6r91406 on Ubuntu 12.04. Are there specific host network settings that are required for Vagrant to work correctly?

Curtis Stewart

I'm using test-kitchen and this is the generated Vagrantfile:

Vagrant.configure("2") do |c|
  c.vm.box = "opscode-ubuntu-12.04"
  c.vm.box_url = "https://opscode-vm-bento.s3.amazonaws.com/vagrant/virtualbox/opscode_ubuntu-12.04_chef-provisionerless.box"
  c.vm.hostname = "host-ubuntu-1204.vagrantup.com"
  c.vm.synced_folder ".", "/vagrant", disabled: true
  c.vm.provider :virtualbox do |p|
    p.customize ["modifyvm", :id, "--memory", "512"]
  end
end

Alvaro Miranda

@cstewart87 Worked for me with your Vagrantfile, no issues at all.

Jean Jordaan
jean commented March 10, 2014

I added a public network to my VM. This booted fine and worked great. Then I tried to restart. Subsequently:

19:16 jean@klippie:~/vagrant/geonode$ VAGRANT_LOG=DEBUG vagrant halt
 INFO global: Vagrant version: 1.2.2
[...]
DEBUG subprocess: Waiting for process to exit. Remaining to timeout: 32000
DEBUG subprocess: Exit status: 0
DEBUG virtualbox_4_2:   - [1, "ssh", 2222, 22]
DEBUG ssh: Checking key permissions: /home/jean/.vagrant.d/insecure_private_key
 INFO ssh: Attempting SSH. Retries: 100. Timeout: 30
 INFO ssh: Attempting to connect to SSH...
 INFO ssh:   - Host: 127.0.0.1
 INFO ssh:   - Port: 2222
 INFO ssh:   - Username: vagrant
 INFO ssh:   - Key Path: /home/jean/.vagrant.d/insecure_private_key
DEBUG ssh: == Net-SSH connection debug-level log START ==
DEBUG ssh: D, [2014-03-10T19:16:53.664503 #12855] DEBUG -- net.ssh.transport.session[4caaa54]: establishing connection to 127.0.0.1:2222
D, [2014-03-10T19:16:53.665283 #12855] DEBUG -- net.ssh.transport.session[4caaa54]: connection established
I, [2014-03-10T19:16:53.665407 #12855]  INFO -- net.ssh.transport.server_version[4caa09a]: negotiating protocol version

DEBUG ssh: == Net-SSH connection debug-level log END ==
 INFO retryable: Retryable exception raised: #<Timeout::Error: execution expired>
 INFO ssh: Attempting to connect to SSH...
 INFO ssh:   - Host: 127.0.0.1
 INFO ssh:   - Port: 2222
[...] # repeats endlessly

Alvaro Miranda

@jean I see you are posting in a bug that is closed; perhaps you want to try the mailing list.

I can tell you that I have seen issues when the Vagrantfile has errors in its logic, or the base box had issues.

You can send an email to the mailing list with the Vagrantfile and we can take it from there.

Jean Jordaan
jean commented March 10, 2014

@kikitux thanks for your answer, posting to the list :bow:

Stuart Laverick

@axsuul I am getting this issue with a CentOS box running on Ubuntu 12.04 host. The issue appeared after a kernel update in Ubuntu which caused the DKMS entry for VirtualBox to be corrupted. This may be related or may be coincidence.
I tried several of the fixes here, but only adding /etc/init.d/networking restart to /etc/rc.local has let me get the box up and running again.

stenver

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/797544

This is pretty much the root of the issue.
