Failed to connect to VM - Occasional #424

rakesh-sankar · 2011-07-22T09:10:23Z

I am hitting this problem at an early stage: when starting up Vagrant, not when trying to connect over SSH. The error only appears occasionally, and I am not sure what is happening or where to look for the cause.

As suggested in other similar issues, I opened the VM in VirtualBox and could see a virtual machine prompting for a login. I hope this is valid.

BTW, when I set the mode to "gui" in the Vagrantfile, it works like a charm. I am curious to know what is happening and where the problem lies.

Thanks.

mitchellh · 2011-07-23T06:17:05Z

Can you try adding config.ssh.max_tries = 100 to your Vagrantfile, running without GUI mode, and seeing whether that fixes it?

This could be a weird timing issue.
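
For reference, a minimal sketch of where that setting would go in a Vagrantfile. It assumes the `Vagrant::Config.run` block syntax of the Vagrant 0.x releases from around the time of this thread; the commented-out `boot_mode` line stands in for the GUI setting mentioned above and is an assumption about the reporter's config, not something quoted from it.

```ruby
# Vagrantfile (sketch; assumes Vagrant 0.x-era configuration syntax)
Vagrant::Config.run do |config|
  # Leave GUI mode off so the headless boot path is the one being tested.
  # config.vm.boot_mode = :gui

  # Allow many more SSH connection attempts before `vagrant up` gives up,
  # in case the guest is simply slow to bring up networking.
  config.ssh.max_tries = 100
end
```

If the failure is a timing issue, a higher retry count should make it rarer; if it persists unchanged, the guest is probably never getting onto the network at all.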

rakesh-sankar · 2011-07-26T09:16:13Z

Hmm.

This is getting weirder. I couldn't get Vagrant up and running with the option you gave me. (Is it something to do with SSH? If so, I don't have a problem connecting to the guest machine over SSH, only with bringing up the environment. I am not sure, though, so correct me if I am wrong.)

zimbatm · 2011-07-26T09:32:49Z

Same issue here. From time to time, vagrant up stays blocked and vagrant ssh doesn't work after that.

Today I booted into the GUI to update the VBox additions and noticed that the eth0 interface didn't have an IP address. After running dhclient, everything was fine again.

Instead of adding a timeout in /etc/network/interfaces, I think the best approach would be to fix the dhclient config. I am now using the following /etc/dhcp3/dhclient.conf config and things work fine for now:

# source: https://calomel.org/dhclient.html
backoff-cutoff 2;
initial-interval 1;
link-timeout 60;
reboot 0;
retry 10;
select-timeout 0;
timeout 30;

interface "eth0" {
  supersede host-name "vagrant";
  supersede domain-name "vagrantup.com";

  request subnet-mask, broadcast-address, routers, domain-name, domain-name-servers, host-name;
  require routers, subnet-mask, domain-name-servers;
}

zimbatm · 2011-07-26T20:15:51Z

Actually, you can just use a blank dhclient.conf; it works fine for me. I realized that when I saw that the previous conf had a syntax error.

hedgehog · 2011-07-31T23:27:23Z

@rakesh-sankar and @zimbatm, could you review issue #391 and add the output from the commands mentioned there?
@rakesh-sankar If you think this issue is a duplicate of #391, could you close this issue?

rakesh-sankar · 2011-08-01T04:23:28Z

@hedgehog I think there is some confusion here. My bad, I should have been clearer. The problem is that I am not able to start Vagrant ("vagrant up"): it stays at the point where it says it is booting up the VBox, and after a while it gives me the above error, "Failed to Connect to VM". BTW, I haven't had any problem connecting with SSH so far, and this issue is erratic.

(I will update the description)

hedgehog · 2011-08-01T04:34:46Z

@rakesh-sankar, you can in fact see that behavior, 'failed to connect
to VM', in the course of vagrant up when there is trouble connecting
via ssh.
Could you check the output of the commands mentioned in #391 when you
see vagrant up stalled? You may well see the ssh symptom
mentioned. Or have you already done that?

πόλλ' οἶδ ἀλώπηξ, ἀλλ' ἐχῖνος ἓν μέγα
[The fox knows many things, but the hedgehog knows one big thing.]
Archilochus, Greek poet (c. 680 BC – c. 645 BC)
http://hedgehogshiatus.com

rakesh-sankar · 2011-08-01T04:56:09Z

@hedgehog let me take a look at it.

zimbatm · 2011-08-01T11:11:46Z

Update: forget what I said about the blank dhclient.conf fixing the issue. I got the problem again with my new lucid64 image today.

zimbatm · 2011-08-01T11:25:54Z

Here is a related VirtualBox issue I found: http://www.virtualbox.org/ticket/4038

… etc. Should fix the ssh connection refused error.

- Banner connection error handled.
- Vagrant bails when orphaned Vagrant ssh sessions are around
- Multiplexing SSH connections
- Establish remote shell session is responsive before proceeding
- Net::SSH and Net::Scp are removed
- Use Aruba/ChildProcess to manage sessions (no threading)
- tested on Ubuntu Lucid +chef-solo (0.10.4)
- Distribution config variable + others (no parsing ssh output)

TODO

- Confirm with other provisioners.
- Confirm on other distributions.

Likely addresses issues: GH issue hashicorp#391, GH issue hashicorp#410, GH issue hashicorp#424, GH issue hashicorp#443, GH issue hashicorp#455, GH issue hashicorp#493

Possibly addresses/affects issues: GH issue hashicorp#516, GH issue hashicorp#353

Overview

Essentially between 1%-2% of reloads pseudo-fail. I say pseudo-fail in the sense of current behavior. Specifically, now running `vagrant reload` after a 'banner exchange exit' will succeed. I've run reload 100 times under 1.9.2 and 1.8.7. Results are below. I've run provision 100 times under 1.9.2 and 1.8.7, with full success.

One thing to think about in the code review is auto-triggering a reload when the banner exchange error occurs. Otherwise I think less faulty up and reloading will have to wait for bootstrapping via a serial console.

Command

rm up-reload-1.8.log; vagrant halt th_ci_runner_6; vagrant up th_ci_runner_6 2>&1|tee up-reload-1.8.log; for ((n=0;n<100;n++)); do time vagrant reload th_ci_runner_6 2>&1|tee -a up-reload-1.8.log; done

Total 101

- success (DEBUG: Exiting) (count: 1.9.2 = 100, 1.8.7 = 99)
- banner exchange failed (count: 1.9.2 = 1, 1.8.7 = 2)
- orphan master control (count: 1.9.2 = 14, 1.8.7 = 5)

Attempt counts:

- 1 (count: 1.9.2 = 155, 1.8.7 = 161)
- 2 (count: 1.9.2 = 311, 1.8.7 = 317)
- 3 (count: 1.9.2 = 34, 1.8.7 = 17)
- 4 (count: 1.9.2 = 168, 1.8.7 = 167)
- 5 (count: 1.9.2 = 31, 1.8.7 = 32)
- 6 (count: 1.9.2 = 1, 1.8.7 = 96)
- 7 (count: 1.9.2 = 0, 1.8.7=)

mitchellh · 2012-01-08T06:19:51Z

This is related to #391. Closing this duplicate.

mitchellh closed this as completed Jul 23, 2011

mitchellh reopened this Jul 23, 2011

hedgehog mentioned this issue Nov 1, 2011

ssh connections. closes GH issue #391, #455, etc. #543

Closed

mitchellh closed this as completed Jan 8, 2012

hashicorp locked and limited conversation to collaborators Apr 17, 2020
