fresh vagrant up fails due to machine being locked #8468

Closed
samueljc opened this issue Apr 11, 2017 · 6 comments · Fixed by #8951

Comments

@samueljc

Vagrant version(s)

1.9.1
1.9.3

VirtualBox version

5.1.14r112924

Host operating system

Ubuntu 16.04

Guest operating system

Ubuntu 16.04

Vagrantfile

Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-16.04"
end

Debug output

https://gist.github.com/samueljc/74d5d10e99358831da630cf755a33299

Expected behavior

Vagrant machine comes up.

Actual behavior

Vagrant machine fails to come up and reports that it can't continue because VBoxManage failed due to the machine being locked.

Where exactly it fails varies. I've seen the up fail while clearing forwarded ports, setting forwarded ports, clearing previous network interfaces, setting network interfaces, and during startvm.

Other than startvm, everything else uses modifyvm. Looking at the driver for VirtualBox 5.0, only one of the modifyvm calls is wrapped in retryable. Would it be reasonable/safe to retry such commands when they fail, with a short sleep between attempts?
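
To make the suggestion concrete, here is a minimal sketch of the kind of retry I have in mind. This is not Vagrant's actual Vagrant::Util::Retryable helper; the method name, the tries/sleep values, and the modifyvm arguments are placeholders for illustration only.

# Minimal sketch of retry-with-sleep around a VBoxManage call; the helper
# name, defaults, UUID, and port-forward rule below are placeholders.
def with_retries(tries: 3, sleep_for: 1)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue StandardError
    raise if attempt >= tries
    sleep(sleep_for) # give the other process time to release the VM lock
    retry
  end
end

# Hypothetical usage: retry a flaky modifyvm invocation a few times before
# giving up for good.
with_retries(tries: 3, sleep_for: 2) do
  ok = system("VBoxManage", "modifyvm", "MACHINE-UUID",
              "--natpf1", "ssh,tcp,127.0.0.1,2222,,22")
  raise "modifyvm failed" unless ok
end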

Steps to reproduce

Note: this doesn't happen reliably, but it seems to happen much more frequently when the host machine is under heavy load. We've had the issue occur occasionally in our GitLab CI pipeline, which brings up multiple machines simultaneously; all of the machines in the pipeline are headless, linked clones.

I was able to reproduce it somewhat reliably (1 or 2 failed machines per run) using the provided Vagrantfile and creating 8 machines in parallel.
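
In outline, the reproduction amounts to something like the script below. This is a hedged sketch rather than the exact script I used; the directory layout, machine count, and file names are placeholders.

#!/usr/bin/env ruby
# Sketch of the reproduction: copy the Vagrantfile above into several scratch
# directories and run `vagrant up` in all of them at once, so their VBoxManage
# calls race against each other. Paths and the machine count are placeholders.
require "fileutils"

MACHINES = 8
SRC = File.expand_path("Vagrantfile", __dir__)

pids = (1..MACHINES).map do |i|
  dir = File.expand_path("repro-#{i}", __dir__)
  FileUtils.mkdir_p(dir)
  FileUtils.cp(SRC, File.join(dir, "Vagrantfile"))
  Process.spawn("vagrant", "up", chdir: dir) # launch all the ups in parallel
end

failed = pids.count { |pid| !Process.wait2(pid).last.success? }
puts "#{failed} of #{MACHINES} machines failed to come up"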

References

I've seen other instances of machines getting locked, but those posts are about rescuing them. Rescuing a locked machine doesn't help me, though; I need the machines to come up reliably, and they'll be disposed of after use.

@samueljc
Author

Not sure how much more I can add to satisfy the 'needs-repro' tag as this issue arises from flakiness, but here's the closest thing to a reliable way of reproducing the problem that I've got.

A gist of the script I used to reproduce this issue using the Vagrantfile described above: https://gist.github.com/samueljc/a6c9508e50b2899761086acccbf03984

I also tested it with the following Vagrantfile, to show that the problem exists even when using linked clones and explicitly declaring the resource footprint.

Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-16.04"
  config.vm.provider('virtualbox') do |vb|
    vb.linked_clone = true
    vb.memory = 1024
    vb.cpus = 1
  end
end

@chrisroberts
Member

@samueljc Hi! The needs-repro isn't for you. It's simply a way to let me know I need to reproduce the error locally to identify the root cause and impact of a fix. The Vagrantfiles you provided are perfect. Thanks!

samueljc pushed a commit to samueljc/vagrant that referenced this issue Apr 25, 2017
Issue: hashicorp#8468

A lot of vboxmanage commands are flakey and frequently cause
bringing multiple machines up at once to fail, especially when
the host system is under heavy load. Most commands are also safe
to retry and just result in a no-op, so we can simply add
'retryable' to a lot of existing calls. For the others we need to
do a little bit of cleanup or reevaluate the parameters before
trying again.
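
As an illustration of the "cleanup or reevaluate the parameters" case mentioned in the commit message, the sketch below re-reads the machine's current state on every attempt instead of blindly re-running the same command. This is not the actual patch; the helper name and the parsing are assumptions, though the VBoxManage invocations mirror real usage.

# Hedged sketch, not the actual change: delete a VM's NAT port-forward rules,
# re-reading the current rule list on each retry so a partially-applied
# earlier attempt doesn't make the retried delete fail.
require "open3"

def delete_forward_rules(uuid, tries: 3)
  attempt = 0
  begin
    attempt += 1
    # Re-read the VM's current forwarding rules on every attempt.
    info, = Open3.capture2("VBoxManage", "showvminfo", uuid, "--machinereadable")
    names = info.scan(/^Forwarding\(\d+\)="([^,]+),/).flatten
    names.each do |name|
      ok = system("VBoxManage", "modifyvm", uuid, "--natpf1", "delete", name)
      raise "failed to delete forwarding rule #{name}" unless ok
    end
  rescue RuntimeError
    raise if attempt >= tries
    sleep(1) # brief pause before retrying, in case the VM session is still locked
    retry
  end
end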
@samueljc
Author

Tried my hand at patching this. With the changes applied, I tested it largely the same way as before, but checked the exit status instead of looking for that specific error, and things performed much better.

Admittedly, it didn't run indefinitely: after about a dozen cycles of bringing 8 boxes up at once, one of them failed to come up even after retrying a command 3 times. Still, that's much better than before, when at least one machine would usually fail on the first batch.

@wasosa

wasosa commented May 15, 2017

@chrisroberts: Hi! I'm interested in seeing this issue fixed, so I took a stab at reviewing Samuel's patch (#8525), in case that helps. Thanks @samueljc!

@marek-obuchowicz

Is there any update on this?
I also observed this issue on Vagrant 1.9.2 and, after upgrading, on 1.9.7. It's a CI system where only one job is set up and people don't have access to the machine. On random steps in our pipeline that involve vagrant up, I'm getting:

+ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
    default: Adapter 2: hostonly
==> default: Forwarding ports...
    default: 3306 (guest) => 3306 (host) (adapter 1)
    default: 10004 (guest) => 10004 (host) (adapter 1)
    default: 10005 (guest) => 10005 (host) (adapter 1)
    default: 10007 (guest) => 10007 (host) (adapter 1)
    default: 15672 (guest) => 15672 (host) (adapter 1)
    default: 58080 (guest) => 59080 (host) (adapter 1)
    default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
==> default: Machine booted and ready!
[default] GuestAdditions 5.1.26 running --- OK.
==> default: Checking for guest additions in VM...
==> default: Setting hostname...
==> default: Configuring and enabling network interfaces...
==> default: Exporting NFS shared folders...
==> default: Preparing to edit /etc/exports. Administrator privileges will be required...
● nfs-kernel-server.service - LSB: Kernel NFS server support
   Loaded: loaded (/etc/init.d/nfs-kernel-server)
   Active: active (running) since Tue 2017-08-15 19:12:21 CEST; 8h ago
  Process: 12102 ExecStop=/etc/init.d/nfs-kernel-server stop (code=exited, status=0/SUCCESS)
  Process: 13560 ExecStart=/etc/init.d/nfs-kernel-server start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-kernel-server.service
           └─13585 /usr/sbin/rpc.mountd --manage-gids
==> default: Mounting NFS shared folders...
==> default: Mounting shared folders...
    default: /vagrant => /var/lib/jenkins/jobs/siroop-vm-loki/workspace
==> default: [vagrant-hostmanager:guests] Updating hosts file on active guest virtual machines...
Vagrant can't use the requested machine because it is locked! This
means that another Vagrant process is currently reading or modifying
the machine. Please wait for that Vagrant process to end and try
again. Details about the machine are shown below:

Name: default
Provider: virtualbox

There's no one else accessing anything on this machine, and it breaks on a random basis (some builds are OK; for others the error happens at a random step involving vagrant up). Please let me know if this should be reported as a separate issue or is related to this one.

briancain pushed a commit to briancain/vagrant that referenced this issue Sep 6, 2017
Issue: hashicorp#8468

k-oguma pushed a commit to k-oguma/vagrant that referenced this issue Nov 9, 2017
Issue: hashicorp#8468

@ghost

ghost commented Mar 31, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Mar 31, 2020