Failed to connect to VM - Occasional #424

Closed
rakesh-sankar opened this issue Jul 22, 2011 · 11 comments

Comments

@rakesh-sankar

I am hitting this problem at an early stage: when starting up Vagrant, not when trying to connect over SSH. The error appears only occasionally, and I am not sure what is happening or where to look for the cause.

As suggested in other similar issues, I opened the VM in VirtualBox and could see the virtual machine prompting for a login, so I assume it booted correctly.

BTW, when I set the boot mode to "gui" in the Vagrantfile, it works like a charm. I am curious to know what is happening and where the problem lies.

Thanks.

@mitchellh
Contributor

Can you try adding config.ssh.max_tries = 100, running without GUI mode, and seeing if that fixes it?

This could be a weird timing issue.
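
For anyone reading along, the idea behind raising max_tries is to keep retrying the connection for longer while the guest finishes booting. Here is a minimal shell sketch of that retry pattern. The `retry` helper and the `RETRY_DELAY` variable are hypothetical names made up for illustration; this is not Vagrant's actual implementation.

```shell
#!/usr/bin/env bash
# retry MAX_TRIES CMD [ARGS...]
# Runs CMD until it succeeds or MAX_TRIES attempts have been made.
# RETRY_DELAY (seconds between attempts) is a hypothetical knob, default 1.
retry() {
  local max_tries=$1
  shift
  local attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    # Stop as soon as the command succeeds
    if "$@"; then
      return 0
    fi
    # Only pause if another attempt will follow
    if [ "$attempt" -lt "$max_tries" ]; then
      sleep "${RETRY_DELAY:-1}"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Something like `retry 100 nc -z 127.0.0.1 2222` would then poll the forwarded SSH port, assuming `nc` is installed and the guest's SSH port is forwarded to 2222 on the host (an assumption; check your own Vagrantfile).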

@mitchellh mitchellh reopened this Jul 23, 2011
@rakesh-sankar
Author

Hmm.

This is getting weirder. I couldn't get Vagrant up and running with the option you gave me. Is this something to do with SSH? If so, I don't have a problem connecting to the guest machine over SSH, only with bringing up the environment. I'm not sure, though, so correct me if I am wrong.

@zimbatm

zimbatm commented Jul 26, 2011

Same issue here. From time to time, vagrant up stays blocked and vagrant ssh doesn't work after that.

Today I booted into the GUI to update the VBox additions and noticed that the eth0 interface didn't have an IP address. After running dhclient, everything was fine again.
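
The manual check described above (look for an address on eth0, run dhclient if there is none) could be scripted roughly as follows. This is only a sketch: the function names `has_ipv4` and `ensure_lease` are hypothetical, and it assumes the guest has the `ip` tool and that it runs as root.

```shell
#!/usr/bin/env bash
# has_ipv4: reads `ip -4 addr show` style output on stdin and succeeds
# if at least one IPv4 ("inet") address line is present.
has_ipv4() {
  grep -Eq '^[[:space:]]*inet [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
}

# ensure_lease IFACE: requests a DHCP lease only when IFACE has no IPv4
# address. Must run as root inside the guest. Defaults to eth0.
ensure_lease() {
  local iface=${1:-eth0}
  if ip -4 addr show dev "$iface" | has_ipv4; then
    echo "$iface already has an IPv4 address"
  else
    echo "$iface has no IPv4 address, running dhclient"
    dhclient "$iface"
  fi
}
```

Calling `ensure_lease eth0` from the VirtualBox console would then reproduce the workaround, though it does not explain why the lease was missing in the first place.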

Instead of adding a timeout in /etc/network/interfaces, I think the best approach would be to fix the dhclient config. I am now using the following /etc/dhcp3/dhclient.conf config and things work fine for now:

# source: https://calomel.org/dhclient.html
backoff-cutoff 2;
initial-interval 1;
link-timeout 60;
reboot 0;
retry 10;
select-timeout 0;
timeout 30;

interface "eth0" {
  supersede host-name "vagrant";
  supersede domain-name "vagrantup.com";

  request subnet-mask, broadcast-address, routers, domain-name, domain-name-servers, host-name;
  require routers, subnet-mask, domain-name-servers;
}

@zimbatm

zimbatm commented Jul 26, 2011

Actually, you can just use a blank dhclient.conf and it works fine for me. I realized that when I saw that the previous conf had a syntax error.

@hedgehog
Contributor

@rakesh-sankar and @zimbatm, could you review issue #391 and add the output from the commands mentioned there?
@rakesh-sankar If you think this issue is a duplicate of #391, could you close this issue?

@rakesh-sankar
Author

@hedgehog I think there is some confusion here. My bad, I should have been clearer. My problem is that I cannot start Vagrant ("vagrant up"): it stays at the point where it says it is booting the VM, and after a while it gives me the above error, "Failed to Connect to VM". BTW, I have not had any problem connecting with SSH so far, and this issue is erratic.

(I will update the description)

@hedgehog
Contributor

hedgehog commented Aug 1, 2011

@rakesh-sankar, you can in fact see that behavior, 'failed to connect
to VM', in the course of vagrant up when it is having trouble connecting
via ssh.
Could you check the output of the commands mentioned in #391 while
vagrant up is stalled? You may well see the ssh symptom
mentioned there. Or have you already done that?


@rakesh-sankar
Author

@hedgehog let me take a look at it.

@zimbatm

zimbatm commented Aug 1, 2011

Update: forget what I said about the blank dhclient.conf fixing the issue. I hit the problem again with my new lucid64 image today.

@zimbatm

zimbatm commented Aug 1, 2011

Here is a related VirtualBox issue I found: http://www.virtualbox.org/ticket/4038

hedgehog added a commit to hedgehog/vagrant that referenced this issue Nov 1, 2011
… etc.

        Should fix the ssh connection refused error.
         - Banner connection error handled.
         - Vagrant bails when orphaned Vagrant ssh sessions are around
         - Multiplexing SSH connections
         - Establish that the remote shell session is responsive before proceeding
         - Net::SSH and Net::Scp are removed
         - Use Aruba/ChildProcess to manage sessions (no threading)
         - tested on Ubuntu Lucid +chef-solo (0.10.4)
         - Distribution config variable + others (no parsing ssh output)
        TODO
         - Confirm with other provisioners.
         - Confirm on other distributions.

    Likely addresses issues:

    GH issue hashicorp#391, GH issue hashicorp#410, GH issue hashicorp#424, GH issue hashicorp#443, GH issue hashicorp#455, GH issue hashicorp#493

    Possibly addresses/affects issues:

    GH issue hashicorp#516, GH issue hashicorp#353

    Overview

    Essentially between 1%-2% of reloads pseudo-fail.
    I say pseudo-fail in the sense of current behavior.
    Specifically, now running `vagrant reload` after a 'banner exchange exit' will succeed.

    I've run reload 100 times under 1.9.2 and 1.8.7.  Results are below.
    I've run provision 100 times under 1.9.2 and 1.8.7, with full success.

    One thing to think about in the code review is auto-triggering a reload when
    the banner exchange error occurs.
    Otherwise I think a less faulty up and reload will have to wait for bootstrapping
    via a serial console.

    Command

        rm up-reload-1.8.log; vagrant halt th_ci_runner_6; vagrant up th_ci_runner_6 2>&1|tee up-reload-1.8.log; for ((n=0;n<100;n++)); do time vagrant reload th_ci_runner_6 2>&1|tee -a up-reload-1.8.log; done

    Total 101

    success (DEBUG: Exiting) (count: 1.9.2 = 100, 1.8.7 = 99)
    banner exchange failed (count: 1.9.2 = 1, 1.8.7 = 2)
    orphan master control (count: 1.9.2 = 14, 1.8.7 = 5)

    Attempt counts:

    1 (count: 1.9.2 = 155, 1.8.7 = 161)
    2 (count: 1.9.2 = 311, 1.8.7 = 317)
    3 (count: 1.9.2 = 34,  1.8.7 = 17)
    4 (count: 1.9.2 = 168, 1.8.7 = 167)
    5 (count: 1.9.2 = 31,  1.8.7 = 32)
    6 (count: 1.9.2 = 1,   1.8.7 = 96)
    7 (count: 1.9.2 = 0,   1.8.7=)
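
Outcome counts like the ones above can be pulled out of the log with a few greps. The sketch below assumes the log contains the literal strings used as labels in the results ('DEBUG: Exiting', 'banner exchange failed', 'orphan master control'); the real log lines may be worded differently, and `tally_log` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# tally_log LOGFILE
# Prints how many lines of LOGFILE contain each outcome marker.
# The marker strings are assumptions taken from the result labels above.
tally_log() {
  local log=$1
  local marker
  for marker in 'DEBUG: Exiting' 'banner exchange failed' 'orphan master control'; do
    # grep -c prints the number of matching lines (0 when none match);
    # -F treats the marker as a fixed string rather than a regex
    printf '%s: %s\n' "$marker" "$(grep -cF -- "$marker" "$log")"
  done
}
```

For the run shown here that would be `tally_log up-reload-1.8.log`. Note that it counts matching lines, so a marker logged more than once per reload would inflate the count.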
xuru pushed a commit to xuru/vagrant that referenced this issue Nov 8, 2011
hedgehog added a commit to hedgehog/vagrant that referenced this issue Dec 22, 2011
@mitchellh
Contributor

This is related to #391. Closing this duplicate.

@hashicorp hashicorp locked and limited conversation to collaborators Apr 17, 2020