Machines fail to come back up cleanly with hostmanager installed #121

Open · rthomas opened this issue Oct 8, 2014 · 6 comments

@rthomas commented Oct 8, 2014

I have the following block in my Vagrantfile to only run hostmanager if it is installed:

  if defined? VagrantPlugins::HostManager
    config.hostmanager.enabled = true
    config.hostmanager.manage_host = true
    config.hostmanager.include_offline = false
    # Custom IP resolver is used to pull the IP out of the private DHCP'd
    # interface eth1 as the hostmanager plugin does not support DHCP'd IPs
    config.hostmanager.ip_resolver = proc do |machine|
      result = ""
      machine.communicate.execute("ifconfig eth1") do |type, data|
        result << data if type == :stdout
      end
      (ip = /inet addr:(\d+\.\d+\.\d+\.\d+)/.match(result)) && ip[1]
    end
  end

When it is installed, provisioning my boxes works fine. However, if I halt them and then do a vagrant up, each box fails with the message below; removing hostmanager allows them to start up cleanly.

I have 5 boxes in this config and each one fails at the hostmanager stage.

==> apt-cache: Clearing any previously set forwarded ports...
==> apt-cache: Clearing any previously set network interfaces...
==> apt-cache: Preparing network interfaces based on configuration...
    apt-cache: Adapter 1: nat
    apt-cache: Adapter 2: hostonly
==> apt-cache: Forwarding ports...
    apt-cache: 22 => 2222 (adapter 1)
==> apt-cache: Running 'pre-boot' VM customizations...
==> apt-cache: Booting VM...
==> apt-cache: Waiting for machine to boot. This may take a few minutes...
    apt-cache: SSH address: 127.0.0.1:2222
    apt-cache: SSH username: vagrant
    apt-cache: SSH auth method: private key
    apt-cache: Warning: Connection timeout. Retrying...
==> apt-cache: Machine booted and ready!
==> apt-cache: Checking for guest additions in VM...
==> apt-cache: Setting hostname...
==> apt-cache: Configuring and enabling network interfaces...
==> apt-cache: Mounting shared folders...
    apt-cache: /vagrant => /Users/ryan/src/conex.io/infra
==> apt-cache: Updating /etc/hosts file on active guest machines...
The provider for this Vagrant-managed machine is reporting that it
is not yet ready for SSH. Depending on your provider this can carry
different meanings. Make sure your machine is created and running and
try again. Additionally, check the output of `vagrant status` to verify
that the machine is in the state that you expect. If you continue to
get this error message, please view the documentation for the provider
you're using.

The box is left in an up-and-running state; it is just annoying to have to run vagrant up five times to bring all of my boxes back up after a halt.

@rthomas (Author) commented Oct 9, 2014

I believe I have found the root cause: the use of active_machines at https://github.com/smdahlen/vagrant-hostmanager/blob/8ec6108143a6cbf9f9ac839ad9124b92a9b9d881/lib/vagrant-hostmanager/action/update_all.rb#L31

From the Vagrant docs, active_machines is:

Returns a list of machines that this environment is currently managing that physically have been created.

An "active" machine is a machine that Vagrant manages that has been created. The machine itself may be in any state such as running, suspended, etc. but if a machine is "active" then it exists.

So this will return the set of machines that have been created, but it will also include those in the poweroff state.
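
For reference, a minimal sketch of what that iteration amounts to (env here stands in for the Vagrant::Environment instance, and the state ids shown are the ones the VirtualBox provider reports):

# active_machines yields [name, provider] pairs for every machine that has
# been created, regardless of whether it is currently running.
env.active_machines.each do |name, provider|
  machine = env.machine(name, provider)
  puts "#{name}: #{machine.state.id}" # e.g. :running or :poweroff
end

Anything that then calls machine.communicate for each of those entries will hit the "not yet ready for SSH" error on the powered-off ones.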

@dincho commented Dec 14, 2014

Same problem here.

@pykler commented Jan 7, 2015

Same here.

@pykler commented Jan 7, 2015

I believe the problem shows up if you have a custom ip_resolver, which most people would, since the default ip_resolver just looks at the ssh_config. With a custom ip_resolver, you just have to catch the exception raised when trying to SSH into a machine that is down. Here is my ip_resolver for reference:

require 'log4r'

$logger = Log4r::Logger.new('vagrantfile')

def read_ip_address(machine)
  command = "LANG=en ifconfig | grep 'inet addr:' | grep -v '127.0.0.1' | cut -d: -f2 | awk '{ print $1 }'"
  result  = ""

  $logger.info "Processing #{ machine.name } ... "

  begin
    # sudo is needed for ifconfig
    machine.communicate.sudo(command) do |type, data|
      result << data if type == :stdout
    end
    $logger.info "Processing #{ machine.name } ... success"
  rescue
    # The machine exists but is not running, so there is no SSH connection;
    # return a placeholder instead of letting the exception propagate.
    result = "# NOT-UP"
    $logger.info "Processing #{ machine.name } ... not running"
  end

  # take the last address reported (the last non-loopback interface)
  result.chomp.split("\n").last
end


Vagrant.configure("2") do |config|
    # ...
    if Vagrant.has_plugin?("HostManager")
        # ...
        config.hostmanager.ip_resolver = proc do |vm, resolving_vm|
          read_ip_address(vm)
        end
    end
end

@rthomas (Author) commented Jan 20, 2015

Thanks @pykler, that worked for me.

@dincho commented Jan 21, 2015

Yeah, wrapping it in begin/rescue works for me too. Thanks @pykler.
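
For completeness, a minimal sketch of that begin/rescue wrap applied to the eth1 resolver from the original report (the "# NOT-UP" placeholder mirrors pykler's workaround and is purely illustrative, not part of the plugin):

config.hostmanager.ip_resolver = proc do |machine|
  result = ""
  begin
    # machine.communicate raises while a guest exists but is powered off,
    # because there is no SSH connection to run the command over.
    machine.communicate.execute("ifconfig eth1") do |type, data|
      result << data if type == :stdout
    end
    (ip = /inet addr:(\d+\.\d+\.\d+\.\d+)/.match(result)) && ip[1]
  rescue
    # Returning a comment string instead of raising keeps vagrant up moving
    # for the remaining machines.
    "# NOT-UP"
  end
end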

openstack-gerrit pushed a commit to openstack/devstack-vagrant that referenced this issue Jun 16, 2015 (mirrored in openstack/openstack as openstack-dev/devstack-vagrant f748559f8f554b8f80c9f74486bc8a2008e5411b):

Fix hostmanager ip resolver exception when guest is not really up

The bug is discussed here:
devopsgroup-io/vagrant-hostmanager#121
so use the suggested workaround in the ip resolver to catch the error.

Change-Id: Ibac510ca9aef57b508036266987b136d95771074