[Acceptance testing] Vagrant snapshot/restore networking issues with some host machines #5467

Open
purbon opened this issue Jun 9, 2016 · 7 comments

@purbon
Contributor

purbon commented Jun 9, 2016

During the development of our internal QA acceptance test framework, which runs on vagrant/virtualbox VMs, we needed vagrant's snapshot/restore feature so that the logstash installation could be reverted to a known state.

The issue is that after a snapshot/restore cycle is performed, the restored VM is no longer reachable from the host machine over ssh. hashicorp/vagrant#391 might be related.

On a test machine similar to our CI environment, with ubuntu (14.04) as host, the error can be reproduced with these few steps:

  • vagrant up vm-name
  • vagrant snapshot push vm-name
  • vagrant snapshot pop vm-name

The target OS for this test was ubuntu, in versions from 14.04 to 16.04, and this is typical output for it: https://gist.github.com/purbon/aaf983e5f5c0d4014a94cfc7fb08c7e9
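The steps above can be wrapped in a small script that also probes ssh after the restore. This is only a sketch: `repro_snapshot_issue` is a hypothetical helper name, and it assumes a Vagrantfile that defines the machine and a vagrant version that ships the built-in `snapshot push`/`pop` commands (1.8.x in this report).

```shell
# Hypothetical helper (not part of vagrant): run the up/push/pop cycle for
# one machine, then check whether it still answers over ssh.
repro_snapshot_issue() {
  local vm="$1"
  vagrant up "$vm" || return 1
  vagrant snapshot push "$vm" || return 1
  # On affected hosts this step may already hang or fail while vagrant
  # waits for the restored machine to boot.
  vagrant snapshot pop "$vm" || return 1
  # Probe ssh with a no-op command.
  if vagrant ssh "$vm" -c "true"; then
    echo "ssh ok after restore: $vm"
  else
    echo "ssh FAILED after restore: $vm"
    return 2
  fi
}

# Usage:
#   repro_snapshot_issue vm-name
```

A return code of 1 means one of the vagrant steps itself failed, and 2 means the cycle completed but the ssh probe did not.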

If we halt the machine with vagrant halt vm-name before the snapshot is requested, the snapshot/restore cycle completes without any issue.

It is also worth noting that when doing the steps above with a debian-8 target OS we do not see this snapshot issue (see https://gist.github.com/purbon/3852d18af6e76b8ff569ab47cf045ac2 for details).

During development, this issue was triggered for one developer with these specifications:

Vagrant 1.8.1
MacOS X 10.11.3
Target OS was ubuntu: 1504.
jruby 1.7.23 (1.9.3p551) 2015-11-24 f496dd5 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26 +jit [darwin-x86_64]

but not for another developer with:

Vagrant 1.8.1
MacOS X 10.10.5
virtualbox 5.0.2
jruby 1.7.20 (1.9.3p551) 2015-05-04 3086e6a on Java HotSpot(TM) 64-Bit Server VM 1.8.0_40-ea-b15 +jit [darwin-x86_64]

the test machine specs are:

Ubuntu 14.04.4 LTS, Trusty Tahr
virtualbox-5.0 5.0.20-10693
Vagrant 1.8.1

For now, it is preferable to avoid snapshot / restore in your acceptance test code. If you do need it, make sure to either:

  • halt the machine before you take the snapshot, or
  • verify that the process runs smoothly for your target OS.
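A minimal sketch of the first option, halting before the snapshot is taken. `snapshot_while_halted` is a hypothetical helper name, and the final `vagrant up` is an assumption that the tests need the machine running again after the snapshot.

```shell
# Hypothetical helper (not part of vagrant): take the snapshot only
# after the machine has been halted, then boot it again.
snapshot_while_halted() {
  local vm="$1"
  vagrant halt "$vm" || return 1
  vagrant snapshot push "$vm" || return 1
  # Assumption: the test run continues against a running machine.
  vagrant up "$vm"
}

# Usage:
#   snapshot_while_halted vm-name
#   ... run the tests that modify the machine ...
#   vagrant snapshot pop vm-name   # revert to the halted-state snapshot
```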

related to #5324

@purbon
Contributor Author

purbon commented Jun 9, 2016

@dliappis as you created the ubuntu VM images we're using, together with the debian ones, is there anything you can see that would help us be sure this issue is not related to the way the ubuntu images are made?

purbon changed the title from "[Acceptance test] Vagrant snapshot/restore networking issues with some host machines" to "[Acceptance testing] Vagrant snapshot/restore networking issues with some host machines" on Jun 9, 2016

@dliappis
Contributor

dliappis commented Jun 9, 2016

@purbon there is nothing special about the way our ubuntu images are created, apart from disabling predictable interface names due to this vagrant bug -- to be fixed in vagrant release 1.8.3

In order to verify, I tried using the "official" Ubuntu maintained vagrant box for Xenial -- but it is currently unavailable [1]

Then I tried the bento/ubuntu-16.04 box -- a popular common box -- and got the same issue.
Note that the bento boxes do not disable predictable network interface names and as a result even vagrant up won't work if you try configuring a private network interface (as vagrant in current versions will try to use the device eth1)

As a next step we should try building our own vagrant from master and see if hashicorp/vagrant#7393 resolves this.

[1]:

ubuntu1604vbox: Downloading: https://atlas.hashicorp.com/ubuntu/boxes/xenial64/versions/20160608.0.0/providers/virtualbox.box
An error occurred while downloading the remote file. The error
message, if any, is reproduced below. Please fix this error and try
again.

@dliappis
Contributor

dliappis commented Jun 9, 2016

Just tried the same with vagrant/master without luck. To be more exact the issue appeared less frequently, but it did happen regardless of the base box used.

I followed this process to install vagrant/master https://github.com/mitchellh/vagrant/wiki/Installing-Vagrant-from-Source

@purbon
Contributor Author

purbon commented Jun 9, 2016

Thanks for your feedback here Dimitrios. I guess this might be enough testing to validate this issue. For now we have a "plausible" workaround, so I guess the next step is to push all this information back to vagrant and see what they think about it.

Are you ok with that?

@dliappis
Contributor

dliappis commented Jun 9, 2016

Actually I waited some time (a long time) and at least the bento/ubuntu-16.04 box did come up a couple of times (that is, with vagrant/master). On other occasions it went into an infinite loop of "Retrying" warnings.

Even if it does work once in a while, I am not sure waiting that long is worth it, as it defeats the purpose of snapshotting.

With regard to a workaround, I am not sure there is one apart from doing a vagrant halt before snapshot push/pop.

We certainly need to share this information on the corresponding vagrant issue, but what exactly is your requirement for vagrant snapshots? What problem are you solving with them? I am asking to see if there is any other strategy that could be employed.

$ $PWD/vagrant/exec/vagrant snapshot pop ubuntu1604vbox
Vagrant appears to be running in a Bundler environment. Your 
existing Gemfile will be used. Vagrant will not auto-load any plugins
installed with `vagrant plugin`. Vagrant will autoload any plugins in
the 'plugins' group in your Gemfile. You can force Vagrant to take over
with VAGRANT_FORCE_BUNDLER.

You appear to be running Vagrant outside of the official installers.
Note that the installers are what ensure that Vagrant has all required
dependencies, and Vagrant assumes that these dependencies exist. By
running outside of the installer environment, Vagrant may not function
properly. To remove this warning, install Vagrant using one of the
official packages from vagrantup.com.

==> ubuntu1604vbox: Forcing shutdown of VM...
==> ubuntu1604vbox: Restoring the snapshot 'push_1465474238_1035'...
==> ubuntu1604vbox: Deleting the snapshot 'push_1465474238_1035'...
==> ubuntu1604vbox: Snapshot deleted!
==> ubuntu1604vbox: Checking if box 'bento/ubuntu-16.04' is up to date...
==> ubuntu1604vbox: Resuming suspended VM...
==> ubuntu1604vbox: Booting VM...
==> ubuntu1604vbox: Waiting for machine to boot. This may take a few minutes...
    ubuntu1604vbox: SSH address: 127.0.0.1:2200
    ubuntu1604vbox: SSH username: vagrant
    ubuntu1604vbox: SSH auth method: private key
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
    ubuntu1604vbox: Warning: Remote connection disconnect. Retrying...
==> ubuntu1604vbox: Machine booted and ready!

@dliappis
Contributor

dliappis commented Jun 9, 2016

@purbon I tried a loop (see below) of 10 push + pop cycles on my Fedora 23 workstation with the stock vagrant 1.8.1 (after sorting out the issue of the missing vagrant snapshot commands [1]).
This worked flawlessly.

$ vagrant up ubuntu1604vbox
$ for i in $( seq 10 ); do vagrant snapshot push ubuntu1604vbox; vagrant snapshot pop ubuntu1604vbox; done

So with Fedora-23 as host I couldn't reproduce the problem.
Have you noticed other host platforms, apart from Ubuntu 14.04 LTS, manifesting this issue?
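To compare hosts more systematically, the same loop could count how many cycles fail instead of relying on reading the output. This is a sketch only: `count_snapshot_failures` is a hypothetical helper name, and it assumes that vagrant exits non-zero when a push or pop does not complete (for example when it gives up waiting for the machine to boot).

```shell
# Hypothetical helper (not part of vagrant): run N push/pop cycles and
# print the number of cycles in which either command failed.
count_snapshot_failures() {
  local vm="$1" rounds="${2:-10}" failures=0 i
  for i in $(seq "$rounds"); do
    # Send vagrant's own output to stderr so stdout carries only the count.
    if ! { vagrant snapshot push "$vm" && vagrant snapshot pop "$vm"; } >&2; then
      failures=$((failures + 1))
    fi
  done
  echo "$failures"
}

# Usage:
#   vagrant up ubuntu1604vbox
#   count_snapshot_failures ubuntu1604vbox 10
```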

[1]: this was because I had the https://github.com/dergachev/vagrant-vbox-snapshot plugin installed, which I have now uninstalled.

@purbon
Contributor Author

purbon commented Jun 9, 2016

In our tests it was happening for @ph in his mac env and on all ubuntu hosts in CI, but not for me on a similar mac.
