Sometimes hangs on "Waiting for VM to boot. This can take a few minutes." #455

Closed
jonleighton opened this Issue Aug 5, 2011 · 95 comments


Hi there,

Sometimes (I mean, fairly often, maybe 30-50% of the time for me) vagrant seems to hang on:

Waiting for VM to boot. This can take a few minutes.

I mean, possibly it would finish eventually, but I have never waited a potentially infinite length of time to find out. It certainly seems to take longer than usual compared to the times when I do manage to boot successfully.

When this happens, the only thing I can do is poweroff the VM through VBoxManage and try again.

Is there any way I can get more output about what it's doing in order to help debug this?

Cheers

Contributor
wsc commented Aug 5, 2011

This has happened to me for a while now and I'm not sure what's up.

pattern commented Aug 6, 2011

I have noticed the same. It feels very "non-deterministic" in that there is a random chance the VM will just keep thrashing at 100% CPU. I found that when I didn't specify a config.vm.network in the Vagrantfile, there was a much lower chance of the VM entering this state. This makes me think it has something to do with the networking/DHCP configuration. For what it's worth, I also have config.ssh.max_tries = 100.

Were either of you using config.vm.network to specify a specific IP? If so, try commenting it out and see if that works.

tomusher commented Aug 6, 2011

Same problem here; seems to have started when I updated my lucid32.box to fix #445.

I'm not setting config.vm.network, but it does seem network-configuration related - to work around it I'm using config.vm.boot_mode = :gui and, when it gets stuck, manually logging in to the machine and running sudo /etc/init.d/networking restart.
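For anyone wanting to try that workaround, here is a minimal Vagrantfile sketch (Vagrant 0.8.x-era syntax; the box name is an assumption, not from the comment above):

    # Minimal sketch of the GUI workaround; "lucid32" is an assumed box name.
    Vagrant::Config.run do |config|
      config.vm.box       = "lucid32"
      config.vm.boot_mode = :gui   # show the VirtualBox console so you can log in
      # config.vm.network "33.33.33.10"  # leave host-only networking off while testing
    end

Once the console appears and the boot hangs, log in and run sudo /etc/init.d/networking restart.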

This happens to me too:

[mathew@thepixeldeveloper]$ vagrant up
[default] VM already created. Booting if its not already running...
[default] Clearing any previously set forwarded ports...
[default] Forwarding ports...
[default] -- ssh: 22 => 2222 (adapter 1)
[default] Cleaning previously set shared folders...
[default] Creating shared folders metadata...
[default] Running any VM customizations...
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.
[default] Failed to connect to VM!
Failed to connect to VM via SSH. Please verify the VM successfully booted
by looking at the VirtualBox GUI.

I can see the machine has booted with the GUI.

I also tried using the vagrant ssh command when the vagrant up command failed. SSH fails with the following error message:

[mathew@thepixeldeveloper]$ vagrant ssh
ssh_exchange_identification: Connection closed by remote host

Rebooting with sudo reboot from the GUI fixes this for me.

Gems

  • vagrant (0.8.6, 0.8.5)
  • virtualbox (0.9.2, 0.9.1, 0.9.0)

VirtualBox

VirtualBox 4.1.2 r73507

Bump. I just posted a question in a group about the same issue.
My vagrant up works only once after a reboot or a reinstall of VirtualBox.

Quick question, are you guys running boxes built with VeeWee?

I am not

No, clean vagrant.
But I think I found a solution - check your networking settings for VirtualBox (on Mac: Cmd+, then Network, Host-only Networks). I deleted a host-only network which happened to be there, and now I can restart my VMs without restarting my Mac.
Folks, if you can verify it, that would be excellent.

I only have one networked device listed (NAT).

@AlexMikhalev You have two networked devices because at some point you enabled the

# Assign this VM to a host only network IP, allowing you to access it
# via the IP.
config.vm.network "33.33.33.10"

option which meant Vagrant created the Host-Only interface.

However, still having just the NAT interface was no improvement for me.

@ThePixelDeveloper I didn't mean a second adapter on the VM, but the global settings for host-only networks in the VirtualBox preferences - basically I removed vboxnet0 completely from my host. In truth, though, it didn't help.

I suspect this is a VirtualBox bug. The networking interface is failing to get an IP address from the DHCP server for whatever reason. Which releases of VirtualBox are we running? I can rule out VirtualBox 4.1.2 r73507 already; I'll work backwards until it's "fixed".

I think it may be related to the issue described here:
http://blog.techprognosis.com/2011/02/28/how-to-enable-dhcp-in-virtualbox-4.html
I have a feeling that the DHCP server for NAT addresses is broken, but I wasn't able to influence it with commands like:
VBoxManage dhcpserver add --netname vboxnet0 --ip 10.0.3.100 --netmask 255.255.255.0 --lowerip 10.0.3.101 --upperip 10.0.3.254 --enable
I know that command is for an internal network, but I suspect the DHCP server for NAT doesn't work or issues incorrect IP addresses.
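As an aside (not part of the original comment), VBoxManage can also show which DHCP servers VirtualBox has registered, which helps confirm whether one exists for a given network at all:

    # List every DHCP server VirtualBox knows about: network name, IP range, enabled state
    VBoxManage list dhcpservers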

I don't think it's broken, because if it were then you wouldn't be able to get an IP address by running sudo dhclient. Let's see ...

  • Is this isolated to Vagrant boxes?
  • Does anyone have problems on the same operating system, but not built with Vagrant/VeeWee?

I have another Ubuntu server I just booted and it doesn't have such problems (it doesn't have the Guest Additions). I installed the Guest Additions and still no problems there. Very, very strange.

I do not use VeeWee.
In my VBox logs, the difference between a successful boot and a failed one is in these lines:
00:00:26.584 NAT: IPv6 not supported
00:00:26.622 NAT: DHCP offered IP address 10.0.2.15
00:00:26.623 NAT: DHCP offered IP address 10.0.2.15
while a hung start finishes at:
00:00:24.642 NAT: IPv6 not supported

I use the lucid32 and lucid64 base boxes and both have the same issue. The issue is not related to vagrant specifically in my case, as I have the same problem starting vagrant-generated boxes from the VirtualBox GUI - sometimes I get an IP (10.0.2.15), sometimes I don't, and then I need to run sudo dhclient and I get the same IP, 10.0.2.15, from the DHCP server.
If I start two VMs - one with lucid32 and the other with lucid64 - they both have the same IP, 10.0.2.15, after I run sudo dhclient.

Update: I downloaded the box from http://opscode-vagrant-boxes.s3.amazonaws.com/ubuntu10.04-gems.box - same behaviour: I can start it the first time with vagrant up successfully and shut it down, but the next vagrant up hangs forever.

This issue is not related to vagrant specifically in my case as I have a same problem trying to start vagrant generated boxes from virtual box GUI

I mean you should try installing and running the operating system without using a vagrant base box.

I have another one that works fine; you should try it too, then we can confirm whether it's to do with Vagrant or not.

Look at this for the explanation of the 10.0.2.15 IP address: each NAT adapter lives on its own isolated internal network, so every NAT guest being offered 10.0.2.15 is expected.


Edit: I'm out of ideas on this one. I've built a system using VeeWee which works as expected, then seemingly fails once it's been compiled into a box and imported into Vagrant. I have no idea what Vagrant does to the image when it's packaged; maybe something to look into.

I fixed this for me, or at least I think I did. Start the troubled machine in GUI mode, log in and execute the following commands as root:

  1. rm /var/lib/dhcp3/* - removes any existing DHCP leases

Then disable the automatic udev rules for network interfaces in Ubuntu:

  2. rm /etc/udev/rules.d/70-persistent-net.rules
  3. mkdir /etc/udev/rules.d/70-persistent-net.rules - a directory in its place keeps udev from regenerating the file

The machine now starts up and has the correct IP address.


Perhaps this has something to do with differing network adapter MAC addresses. The base box would have been built on a VirtualBox instance where the MAC is different to the one that you're using now - just a thought.

@ThePixelDeveloper - tried your solution; it doesn't work for me on lucid32.

rozza commented Sep 1, 2011

Setting gui on and then manually logging in and restarting networking fixed it for me.

niko commented Sep 4, 2011

Had the same issue. I could work around it by booting in GUI mode, logging in and manually restarting the networking.


Any progress on this? It's definitely Vagrant causing trouble here; from my experiments, every other machine I've built with VirtualBox (with the same configuration) doesn't show this problem.

To be more clear, something happens when Vagrant builds the box, not when Vagrant launches the box - booting the box without the help of Vagrant still displays the problem. If someone can point me towards the code where Vagrant does its building, I can take a look.

What version of VirtualBox are you all using?

I haven't experienced the problem recently, and I think VirtualBox may have been upgraded on my system at some point after I filed this bug (I'm on Fedora so I have package management...)

My current VirtualBox version is 4.1.2 r73507. Anyone on the same or later and still experiencing this?

rozza commented Sep 6, 2011

It's happening to me on VirtualBox 4.1.2 r73507.

niko commented Sep 6, 2011

I had the issue with the lucid32 box (http://www.vagrantbox.es/1/). Using the ubuntu 11.04 box (http://www.vagrantbox.es/26/) doesn't show the issue.

sickill commented Sep 9, 2011

Same issue here. (Ubuntu 11.04, VirtualBox 4.1.2, vagrant 0.8.6).

I wanted to try the ubuntu 11.04 box (http://www.vagrantbox.es/26/) but after downloading it I got:

[vagrant] Extracting box...
[vagrant] Verifying box...
[vagrant] Cleaning up downloaded box...
The box file you're attempting to add is invalid. This can be
commonly attributed to typos in the path given to the box add
command. Another common case of this is invalid packaging of the
box itself.

I have the same repeatable issue with Mac OS X Snow Leopard and Ubuntu 10.04 LTS as VirtualBox hosts.
I can reproduce it with various boxes - built with VeeWee or downloaded ready-made.

Same issue here. After starting in GUI mode, logging in and doing sudo /etc/init.d/networking restart, it'll work from the command line again.

This issue is very annoying as it happens on every new box after installing the first one.

I can confirm this is happening on my OS X Lion box as well; the problem occurs with both Lucid64 and Natty64 boxes. I have tried VirtualBox versions from 4.1.0 to 4.1.2 and the problem occurs on virtually every vagrant up command. vagrant is now unusable due to this issue :(

Can we confirm it only happens with Vagrant and NOT with a VirtualBox machine with the same specifications (disk, network, etc.)?

There is a temporary solution until the VirtualBox DHCP/dhclient issue is fixed:

  1. Run the virtual machine with :gui

  2. sudo vi /etc/rc.local and make it:

     #!/bin/sh -e
     sh /etc/init.d/networking restart
     exit 0

  3. sudo halt

Will try this mikhailov, thanks.

Probably this is better:

sudo vi /etc/network/interfaces
pre-up sleep 10
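For context, a sketch of where that line would sit in /etc/network/interfaces (the interface name is an assumption):

    # /etc/network/interfaces - hedged sketch; eth0 is an assumed interface name
    auto eth0
    iface eth0 inet dhcp
        pre-up sleep 10   # wait for the link before the DHCP request goes out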

@mikhailov That doesn't work.

This line is actually included in the VeeWee build scripts: https://github.com/jedi4ever/veewee/blob/master/templates/ubuntu-10.04.3-server-amd64/postinstall.sh#L88

I've used a bigger value and it didn't seem to make a difference.

Wanting to build my own base box while I was having issues with 'vagrant up' I updated to the latest VirtualBox, installed VeeWee and built a new Ubuntu 11.04 box. Since then I haven't had this problem (even with the old boxes).

I did do a gem update after installing VeeWee, and I noticed that net-ssh was updated as part of this - I'm not sure whether that could be related?

@ThePixelDeveloper yes, it seems that doesn't work. So I have to log in with :gui the first time and update /etc/rc.local for every new instance I run, until it's fixed.

Gonzih commented Sep 19, 2011

Any progress? Same issue.

Arch Linux 32
Guest Additions Version: 4.1.0
VirtualBox Version: 4.1.2_OSE
Vagrant version 0.8.7
Ruby 1.9.2
lucid32 box

Contributor
leth commented Sep 30, 2011

I think I've found the problem but I have no idea how to build a new box, so can't test it.

The DHCP client will wait 60 seconds for replies to its request.
If there is no response and there are no old leases to fall back on, it will then wait five minutes before retrying.

Hopefully adding shorter timeouts like timeout 2 and retry 2 to /etc/dhcp3/dhclient.conf will fix the problem.
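For reference, a sketch of what those two lines look like in place (values taken from the comment above; everything else in the file stays as shipped):

    # /etc/dhcp3/dhclient.conf - shorter DHCP timeouts
    timeout 2;   # give up on a DHCP request after 2 seconds
    retry 2;     # retry 2 seconds later instead of waiting five minutes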

Gonzih commented Oct 1, 2011

@leth nope, still the same issue for me with those options in dhclient.conf. I used vagrant package to create the new box.

duffj commented Oct 2, 2011

The sudo dhclient approach works well for me as a temporary fix. I'd love to get this fixed permanently though, because this will be the setup process for potentially hundreds of developers at my company.

More information exists at Stack Overflow.

sickill commented Oct 4, 2011

Just checked on Fedora 15, VirtualBox 4.1.4 (latest), vagrant 0.8.7 - the issue still exists.

I guess it's time to pull out git bisect and start the arduous journey.


sickill commented Oct 4, 2011

+1 for git bisect

Contributor
leth commented Oct 6, 2011

I think I've pinned it down to /etc/udev/rules.d/70-persistent-net.rules
We obviously need to keep it empty, but making it into a directory seems to break things.

I tried making it a non-writable file, but that still broke things.

To fix:

 sudo rmdir /etc/udev/rules.d/70-persistent-net.rules
 sudo touch /etc/udev/rules.d/70-persistent-net.rules

EDIT:

After 3 successful tries, I tried it a fourth, and it's still broken. >_<

Contributor
leth commented Oct 6, 2011

It looks like it might be a virtualbox issue: https://www.virtualbox.org/ticket/4038

I was having the same problem:

karel@rolmops:~/vagrant/c57$ vagrant up
[default] Importing base box 'centos-57'...
[default] Preparing host only network...
[default] Matching MAC address for NAT networking...
[default] Clearing any previously set forwarded ports...
[default] Forwarding ports...
[default] -- ssh: 22 => 2222 (adapter 1)
[default] Creating shared folders metadata...
[default] Running any VM customizations...
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.
[default] Failed to connect to VM!
Failed to connect to VM via SSH. Please verify the VM successfully booted
by looking at the VirtualBox GUI.

My host is ubuntu 11.04 + virtualbox 4.1.4 (vagrant gem 0.8.7). The guest is centos 5.7 + virtualbox 4.1.4.
The Vagrantfile has

config.vm.network="33.33.33.10"

If I add config.ssh.max_tries = 150, everything works.

But a lot of time gets lost (waiting for a DHCP lease which can't be obtained on that interface - it needs to time out).
I could add some configuration to the box which avoids sending DHCP requests - e.g. adding 'dummy' eth1-9 config files disabling those interfaces on the first boot, as sketched below.
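A sketch of what one of those 'dummy' files might look like on a CentOS guest (these are standard network-scripts settings; applying it to eth1 through eth9 is the commenter's idea, not a tested fix):

    # /etc/sysconfig/network-scripts/ifcfg-eth1 - keep eth1 from DHCPing at boot
    DEVICE=eth1
    ONBOOT=no        # don't bring the interface up automatically
    BOOTPROTO=none   # never send DHCP requests on it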

It's the same for me. This bug might take a long time to resolve.
Rebuilding eats up my time every single run, and my connection is too slow to keep downloading box after box.

But as a temporary fix, launching in GUI mode, running sudo dhclient and then vagrant ssh works.

enr commented Oct 17, 2011

I got the same problem, but apparently only with boxes built using VeeWee.
I'm using Ubuntu as the host, VirtualBox 4.1.4; for Vagrant and VeeWee I've tried a lot of combinations.
I don't know the internals of VirtualBox or Vagrant, so I don't know if it's important, but in ~/.VirtualBox/VBoxSVC.log I see some errors:

ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={c28be65f-1a8f-43b4-81f1-eb60cb516e66} aComponent={VirtualBox} aText={Could not find a registered machine named 'oct15'}, preserve=false
ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={c28be65f-1a8f-43b4-81f1-eb60cb516e66} aComponent={VirtualBox} aText={Could not find an open hard disk with location '/home/enrico/VirtualBox VMs/oct15/box-disk1.vmdk'}, preserve=false
ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={c28be65f-1a8f-43b4-81f1-eb60cb516e66} aComponent={VirtualBox} aText={Could not find a registered machine named 'oct15_1318713570'}, preserve=false

in ~/.VirtualBox/VirtualBox.xml:

<MachineEntry uuid="{3cfe8af8-96da-41d3-ac3e-7266d3bb8f49}" src="/home/enrico/VirtualBox VMs/oct15_1318713570/oct15_1318713570.vbox"/>

Doing a ps aux | grep virtual, I see the actual command:

/usr/lib/virtualbox/VBoxHeadless --comment oct15_1318713570 --startvm 3cfe8af8-96da-41d3-ac3e-7266d3bb8f49 --vrde config

Is VirtualBox looking for a machine registered with a different UUID, or is the aIID in the log separate from the UUID in the configuration file and on the command line?

This one is killing me... having the same problem here, but none of the workarounds (dhclient, reboot, restart networking) helps me. Is there a combo of older vbox/vagrant/base box that can get me back to work? Peace, Mike

Ah OK, things seem fixed here after much flailing about. The fix appears to be to use vagrant HEAD. Maybe I had a different problem with the same symptoms? Hope this is helpful and not just a bunch of noise. -Mike

Contributor
leth commented Oct 19, 2011

I can't see any commits since the last release which would fix it. Probably just random luck, I suspect.

duffj commented Oct 19, 2011

The workaround proposed by @karel1980 works for me:

config.ssh.max_tries = 150
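In a full Vagrantfile, that workaround would look something like this (0.8.x-era syntax; the box name and the timeout value are assumptions):

    # Hedged sketch: give the VM far longer to come up before Vagrant gives up
    Vagrant::Config.run do |config|
      config.vm.box        = "lucid32"  # assumed box name
      config.ssh.max_tries = 150        # retry SSH up to 150 times
      config.ssh.timeout   = 30         # seconds per connection attempt
    end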
Contributor

People following this issue might like to report what ssh error they see per the request:

#391 (comment)

@hedgehog hedgehog added a commit to hedgehog/vagrant that referenced this issue Oct 29, 2011
@hedgehog hedgehog ssh shared connections. closes GH issue #391, #455
    Should fix the ssh connection refused error.
     - Banner connection error handled.
     - Vagrant bails when orphaned Vagrant ssh sessions are around
     - Multiplexing SSH connections
     - Establish remote shell session is responsive before proceeding
     - Net::SSH and Net::Scp are removed
     - Use Aruba/ChildProcess to manage sessions (no threading)
     - tested on Ubuntu Lucid +chef-solo (0.10.4)
     - Distribution config variable + others (no parsing ssh output)
    TODO
     - Confirm with other provisioners.
     - Confirm on other distributions.
bc973cb
@hedgehog hedgehog added a commit to hedgehog/vagrant that referenced this issue Nov 1, 2011
@hedgehog hedgehog ssh shared connections. closes GH issue #391, #455, etc.
        Should fix the ssh connection refused error.
         - Banner connection error handled.
         - Vagrant bails when orphaned Vagrant ssh sessions are around
         - Multiplexing SSH connections
         - Establish remote shell session is responsive before proceeding
         - Net::SSH and Net::Scp are removed
         - Use Aruba/ChildProcess to manage sessions (no threading)
         - tested on Ubuntu Lucid +chef-solo (0.10.4)
         - Distribution config variable + others (no parsing ssh output)
        TODO
         - Confirm with other provisioners.
         - Confirm on other distributions.

    Likely addresses issues:

    GH issue #391, GH issue #410, GH issue #424, GH issue #443, GH issue #455, GH issue #493

    Possibly addresses/affects issues:

    GH issue #516, GH issue #353

    Overview

    Essentially between 1% and 2% of reloads pseudo-fail.
    I say pseudo-fail in the sense of the current behavior.
    Specifically, running `vagrant reload` after a 'banner exchange exit' will now succeed.

    I've run reload 100 times under 1.9.2 and 1.8.7.  Results are below.
    I've run provision 100 times under 1.9.2 and 1.8.7, with full success.

    One thing to think about in the code review is auto-triggering a reload when
    the banner exchange error occurs.
    Otherwise I think less faulty up and reload behavior will have to wait for bootstrapping
    via a serial console.

    Command

        rm up-reload-1.8.log; vagrant halt th_ci_runner_6; vagrant up th_ci_runner_6 2>&1|tee up-reload-1.8.log; for ((n=0;n<100;n++)); do time vagrant reload th_ci_runner_6 2>&1|tee -a up-reload-1.8.log; done

    Total 101

    success (DEBUG: Exiting) (count: 1.9.2 = 100, 1.8.7 = 99)
    banner exchange failed (count: 1.9.2 = 1, 1.8.7 = 2)
    orphan master control (count: 1.9.2 = 14, 1.8.7 = 5)

    Attempt counts:

    1 (count: 1.9.2 = 155, 1.8.7 = 161)
    2 (count: 1.9.2 = 311, 1.8.7 = 317)
    3 (count: 1.9.2 = 34,  1.8.7 = 17)
    4 (count: 1.9.2 = 168, 1.8.7 = 167)
    5 (count: 1.9.2 = 31,  1.8.7 = 32)
    6 (count: 1.9.2 = 1,   1.8.7 = 96)
    7 (count: 1.9.2 = 0,   1.8.7=)
0a0cb76
@xuru xuru added a commit to xuru/vagrant that referenced this issue Nov 8, 2011
@hedgehog @xuru hedgehog + xuru ssh shared connections. closes GH issue #391, #455, etc. (same commit message as 0a0cb76 above)
cb1f08c
Contributor

Debian/Ubuntu users:

Can you try rebuilding your boxes with this workaround:
jedi4ever/veewee#159

Please report in the veewee issue if this:

  • resolves the issue as far as you can tell (I had a reload loop succeed 101 times)
  • only reduces the severity of the issue

For non-Debian/Ubuntu users, there is likely a similar facility in veewee's build process to make these changes before the first reboot.

Contributor
leth commented Nov 17, 2011

If anyone has a URL for a box rebuilt with this I'd appreciate it :)

Edit: never mind, I built one myself.

Is rebuilding the box the way to solve the problem, or can other steps be taken?

One way is to rebuild your own box.
Here is a screencast I published on building your own customized box:
http://nepalonrails.tumblr.com/post/13197838780/build-your-own-vagrant-box-ready-to-use-with-chef-solo

Contributor
hedgehog commented Dec 5, 2011

@AlexMikhalev, depending on your distro this might help: #455 (comment)

vadimt commented Dec 15, 2011

I've been having the same issue; I found the slowness to be caused by eth1 initialization hanging when the box starts up. Disabling eth1 in network-scripts/eth1 solves it, but repackaging the box doesn't seem to include this config.

@vadimt: I agree with your analysis involving eth1, but there must be some other factor involved as well.
I'm starting the same box (natty, 64-bit, built with veewee) on the same virtualbox+vagrant versions on two different laptops.
On the first (older) laptop, a timeout occurs during 'vagrant up' (with 1 or 2 network devices).
On the newer laptop, there is no timeout (with 1 or 2 network devices).

vadimt commented Dec 20, 2011

@karel1980: Interesting - I have noticed that when there is no networking interface enabled during the initial 'vagrant up' call, things tend to go smoothly. On your older laptop, does it work if you halt the box and clear the networking interfaces manually through the VirtualBox controls? You might also want to try a longer timeout in the Vagrantfile: config.ssh.timeout = [longtime]. Enabling the GUI helps you see where in the boot process the box is when it hangs.

@hedgehog hedgehog added a commit to hedgehog/vagrant that referenced this issue Dec 22, 2011
@hedgehog hedgehog ssh shared connections. closes GH issue #391, #455 (same commit message as 0a0cb76 above)
3478eac
Owner

Related to #391. I'm trying to consolidate to one issue. Closing.

@mitchellh mitchellh closed this Dec 26, 2011
did commented Jan 16, 2013

Well, I met the same kind of error with Vagrant (1.0.5) / VirtualBox (4.2.6) / CentOS (6.3). None of the patches here worked for me. But I figured it out... :-)
I don't know why iptables was running (by default in CentOS?), but it blocked all the ports of my box, in particular my ssh port.
I just shut down iptables (/etc/init.d/iptables stop) and that immediately unblocked Vagrant.
Hope it helps...
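If iptables does turn out to be the culprit, something like this stops it now and keeps it from coming back at boot (a sketch for CentOS 6; a proper rule allowing port 22 would be the cleaner fix):

    # Stop the firewall now and disable it across reboots (CentOS 6)
    sudo /etc/init.d/iptables stop
    sudo chkconfig iptables off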

haydenk commented Apr 30, 2013

For me, it ended up being that I had to make sure "Cable connected" was checked under the adapter settings in VirtualBox.

[screenshot: VirtualBox network adapter settings with "Cable Connected" checked]
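The same setting can be flipped from the command line (the VM name below is an assumption):

    # Tick "Cable connected" for adapter 1 without opening the GUI
    VBoxManage modifyvm "my_vm" --cableconnected1 on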

cemerick commented Sep 3, 2013

I hit this with vagrant 1.2.7, virtualbox 4.2.16r86992, attempting to use a raring64 image from Canonical (http://cloud-images.ubuntu.com/vagrant/raring/current/). It was "fixed" by switching to the vagrantup-provided precise box @ http://files.vagrantup.com/precise64.box. Note that the problem only manifested itself when attempting to use :private_network; regular NAT was fine.

reinink commented Sep 4, 2013

I just ran into this issue today using the current Raring distro from Ubuntu. I was able to get it working again by using an older version (see below). Evidently there is some kind of networking issue with the new version.

config.vm.box = "raring64"
config.vm.box_url = "https://cloud-images.ubuntu.com/vagrant/raring/20130824/raring-server-cloudimg-amd64-vagrant-disk1.box"

Update: I'm no longer sure that was the problem. It was working for 5 days, and now suddenly I'm getting the same issue again. The issue only seems to happen with Ubuntu 13.04 (Raring).

podollb commented Sep 10, 2013

Ditto @reinink, same here, it seems intermittent.

@reinink I had a failure on my first try with the box you mentioned. @mitchellh Seems this is not closed, and/or is different from the ssh problem in #391?

podollb commented Sep 13, 2013

I started using this base box and haven't had any problems since:
https://github.com/downloads/roderik/VagrantQuantal64Box/quantal64.box

reinink commented Sep 14, 2013

@cemerick Yeah, sorry, as mentioned in my update, I'm having issues on all Raring (Ubuntu 13.04) versions.

@podollb From my testing, this issue is limited to Raring. Quantal (Ubuntu 12.10) is fine, and so is Saucy (Ubuntu 13.10).

@reinink No worries, I was just confirming your "update" re: the raring box not addressing the problem. :-)

Good to know re: Saucy. We'll see how it pans out next month.

podollb commented Sep 15, 2013

I was actually having this problem (intermittently) with all the boxes:
https://cloud-images.ubuntu.com/vagrant/**/*.box

Still having this issue on Windows with 1.3.2. All VMs seem to take longer to boot in comparison with 1.2.7. No luck at all booting a Raring VM.

So booting the VM, letting Vagrant time out, halting the VM and booting again seems to work. It still takes a long time for Vagrant to see the booted VM.

wsouto commented Sep 23, 2013

I think I'm having the same issue; let me describe it here.

I built myself a Raring 64 machine, basically following the steps in postinstall.sh.

After vagrant up, the machine boots; you can even ssh to the machine normally while vagrant keeps waiting for boot, which takes a long, long time, and then vagrant carries on and mounts the shared folder. Bottom line: everything goes fine, but a long time is spent in "Waiting for machine to boot." even though the machine has already booted.

I'm using vagrant 1.3.2 on Windows.
Sorry if it's not related or not the same issue...

I am going to have to add a "me too". On OS X 10.8.5, VirtualBox 4.2.18 r88780, Vagrant 1.3.1, Raring 64bit.

razius commented Sep 26, 2013

+1

VirtualBox 4.2.10_Ubuntu, Vagrant 1.3.3 on Ubuntu 13.04 64bit with any 32bit or 64bit images of Ubuntu 12.04, 12.10 or 13.04 from http://cloud-images.ubuntu.com/vagrant/

Logging in to the machine using :gui mode and restarting the network (sudo /etc/init.d/networking restart) "fixes" it.

ezintz commented Oct 7, 2013

+1 from me to this issue.

I've built a clean new base box with Debian Wheezy following the instructions on this page: http://docs-v1.vagrantup.com/v1/docs/base_boxes.html

Restarting the networking service does not fix the problem for me.

OS: OS X 10.8.5
Vagrant: 1.3.4
VirtualBox: 4.2.18 r88780

nocive commented Oct 7, 2013

+1 from me on this as well.

Running in headless mode causes a boot failure most of the time, with different vagrant boxes.

Host Ubuntu 13.04, Guest Ubuntu 12.04
Virtualbox 4.2.10

@ndrluis ndrluis referenced this issue in fgrehm/ventriloquist Oct 22, 2013
Closed

First boot hanging #14

Is this problem fixed? I'm having the same issue on Win 7.

nocive commented Nov 6, 2013

Definitely not fixed.
I'm still having the same issue even after upgrading to VBox 4.3.

I've basically given up on headless mode until this is fixed.

SunSparc commented Nov 6, 2013

As a fix, for me, I switched to using a Quantal (12.10) image and have not had problems.

I'm having a problem with CentOS 6. This delay during VM boot is really a pain. We use vagrant for managing local dev Linux boxes and it's really counterproductive.

Let's take the problem from a different angle: can anybody explain what exactly Vagrant is trying to do while the "Waiting for VM to boot. This can take a few minutes." message is displayed?
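For what it's worth, conceptually Vagrant is polling the forwarded SSH port until it can connect or its retry budget runs out - roughly like the sketch below (an illustration of the idea, not Vagrant's actual source):

    # Hedged sketch: poll the forwarded SSH port until it answers or we give up
    require "socket"
    require "timeout"

    def wait_for_ssh(host = "127.0.0.1", port = 2222, tries = 100, delay = 2)
      tries.times do
        begin
          Timeout.timeout(5) { TCPSocket.new(host, port).close }
          return true            # port answered; the SSH handshake can proceed
        rescue Timeout::Error, StandardError
          sleep delay            # guest not ready yet; wait and retry
        end
      end
      false                      # caller reports "Failed to connect to VM!"
    end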

I also have this problem - it seemed to happen after a power outage for me so I suspected filesystem corruption but perhaps that's just a coincidence.

Booting via VirtualBox works fine; I can actually ssh into the machine (it has a private_network IP configured) while "vagrant up" continues to say "Waiting for machine to boot".

I'm running VirtualBox 4.3.2 r90405 and vagrant 1.3.5 under OS X 10.9 with the precise64 box from vagrantbox.es.

I actually had this issue with two boxes, but some reboot-over-and-over magic seemed to fix one of them; the other is still stuck.

I tracked my particular issue down to plugins/communicators/ssh/communicator.rb

It looks like the SSH communicator is not able to reach the machine via the host 127.0.0.1, port 2222 that it expects, because manually entering the private_network IP address for the machine allows it to work (vagrant up / vagrant halt).

I don't know how to fix this, but it jibes with the reports of restarting networking helping, and possibly with the boot/reboot dance fix.
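One way to test that theory is to point Vagrant's SSH settings at the private_network address directly (a sketch; the IP and box name are assumptions, while config.ssh.host and config.ssh.port are standard 1.x settings):

    # Hedged sketch: bypass the 127.0.0.1:2222 forward and SSH to the static IP
    Vagrant.configure("2") do |config|
      config.vm.box = "precise64"
      config.vm.network :private_network, ip: "192.168.33.10"  # assumed address
      config.ssh.host = "192.168.33.10"  # talk to the guest directly
      config.ssh.port = 22               # the guest sshd's real port
    end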

UPDATE - this is possibly not a vagrant problem; denyhosts seems to be getting involved.
UPDATE 2 - clearing out the various denyhosts files does indeed seem to have fixed my issue. I guess this would be hard for vagrant to report, since the only feedback from ssh is: "ssh_exchange_identification: Connection closed by remote host"

I have the same issue on Ubuntu 12.04, Vagrant 1.4.3, VirtualBox 4.2.20 r90983, with a freshly installed Debian Wheezy guest.
I upgraded to the current version of VirtualBox today; I had a much older one before.
While in the past I had similar issues with Debian (not Ubuntu), after the latest VirtualBox upgrade the behavior has changed in my case.

Before the upgrade, re-leasing with dhclient solved the problem.

After the upgrade it hangs on the "Waiting for machine to boot." message until it times out, while I am able to log in to the box using: ssh vagrant@127.0.0.1 -p 2222

Am I the only one whose problem disappeared without knowing why? The only thing I can think of is the Vagrant 1.5 release, or possibly switching to the 13.10 box from this repo https://github.com/ffuenf/vagrant-boxes instead of Ubuntu's own official box.

nocive commented Mar 25, 2014

No, @jmagnusson, you're not the only one, same thing happened to me.
For me this stopped being an issue after upgrading to virtualbox 4.3 and vagrant 1.4.3.

Well that's great! I'm surprised this hasn't been reported elsewhere?!

Would be nice to hear more confirmations so we could put all of this behind us (and update this page)

This is still happening to me on OSX 10.9 using Vagrant 1.5.4 and VMware Fusion 6.0.3. I've seen some people pointing to this as the cause https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/797544 but I'm using CentOS 6.5 so there's something else going on.

I've also read through this https://github.com/mitchellh/vagrant/wiki/%60vagrant-up%60-hangs-at-%22Waiting-for-VM-to-boot.-This-can-take-a-few-minutes%22 but since I'm always at work on the same network when this happens I don't think it's a networking issue.

afiune commented May 2, 2014

I would like to help, but I have never seen this error on Unix systems - only on Windows.

Take a look here: WinRb/vagrant-windows#194 - it may help.

This is also happening to me, but only if I have networking enabled:

 config.vm.network :forwarded_port, guest: 3000, host: 3000
facetoe commented Jul 30, 2014

I have the same problem running Ubuntu 13.10.

I have this problem too. It seems like the issue isn't actually 'closed', given how many people are reporting similar issues on mainstream OSes.
