fresh vagrant up fails due to machine being locked #8468
Comments
Not sure how much more I can add to satisfy the 'needs-repro' tag, as this issue arises from flakiness, but here's the closest thing I have to a reliable way of reproducing the problem. A gist of the script I used to reproduce this issue using the Vagrantfile described above: https://gist.github.com/samueljc/a6c9508e50b2899761086acccbf03984

I also tested it with the following Vagrantfile to show that the problem exists even while using linked clones and explicitly declaring the resource footprint:

Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-16.04"
  config.vm.provider('virtualbox') do |vb|
    vb.linked_clone = true
    vb.memory = 1024
    vb.cpus = 1
  end
end
@samueljc Hi! The
Issue: hashicorp#8468. A lot of vboxmanage commands are flaky and frequently cause bringing multiple machines up at once to fail, especially when the host system is under heavy load. Most commands are also safe to retry and just result in a no-op, so we can simply add 'retryable' to a lot of existing calls. For the others we need to do a little bit of cleanup or reevaluate the parameters before trying again.
Tried my hand at patching this. Using the changes, I tested it largely the same way as before, but looked for the exit status instead of that specific error, and things performed much better. Admittedly, it didn't run indefinitely: after about a dozen cycles of bringing 8 boxes up at once, 1 of them failed to come up even after retrying a command 3 times. Still, that's much better than before, where at least one of them would usually fail on the first batch.
@chrisroberts: Hi! I'm interested in seeing this issue fixed, so I took a stab at reviewing Samuel's patch (#8525), in case that helps. Thanks @samueljc!
Is there any update on this?
There's no one accessing anything on this machine, and it breaks on a random basis (some builds are OK; for some, the error happens at a random step involving
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Vagrant version(s)
1.9.1
1.9.3
VirtualBox version
5.1.14r112924
Host operating system
Ubuntu 16.04
Guest operating system
Ubuntu 16.04
Vagrantfile
Debug output
https://gist.github.com/samueljc/74d5d10e99358831da630cf755a33299
Expected behavior
Vagrant machine comes up.
Actual behavior
Vagrant machine fails to come up and reports that it can't continue because VBoxManage failed due to the machine being locked.
Where exactly it fails varies: I've seen the up fail while clearing forwarded ports, setting forwarded ports, clearing previous network interfaces, setting network interfaces, and during startvm.
Other than startvm, everything else uses modifyvm. Looking at the driver for VirtualBox 5.0, only one of the modifyvm commands is wrapped in retryable. Would it be reasonable/safe to retry such commands, with a short sleep between attempts, if they fail?
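The retry-with-sleep idea above can be sketched as a small wrapper (a minimal Ruby sketch, not Vagrant's actual Vagrant::Util::Retryable helper; the execute call in the usage comment is purely illustrative):

```ruby
# Re-run a block up to `tries` times, sleeping between attempts.
# Only errors matching `on` trigger a retry; the last failure is re-raised.
def with_retries(tries: 3, sleep_for: 1, on: StandardError)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue on => e
    raise if attempt >= tries
    sleep(sleep_for)
    retry
  end
end

# Hypothetical usage: wrap a flaky VBoxManage invocation.
# with_retries(tries: 3, sleep_for: 1) do
#   execute("modifyvm", uuid, "--natpf1", "delete", name)
# end
```

Since most of the modifyvm calls are idempotent, re-running them after a transient lock error should be a no-op on success, which is what makes this approach safe for those commands.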
Steps to reproduce
Note: this doesn't happen reliably, but it seems to happen much more frequently when the host machine is under heavy load. We've had the issue occur occasionally in our gitlab-ci pipeline, which brings up multiple machines simultaneously; all the machines in the pipeline are headless linked clones.
I was able to reproduce it somewhat reliably (1 or 2 bad executions per attempt) using the provided Vagrantfile and creating 8 machines in parallel.
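The parallel bring-up can be sketched like this (a hypothetical Ruby sketch, not the gist linked above; the machine names and vagrant invocation are illustrative, and the cmd parameter is injectable so the launcher can be exercised without VirtualBox installed):

```ruby
# Launch one `vagrant up <name>` per machine concurrently and return
# the names of machines that failed to come up.
# `cmd` builds the command line for a machine; it defaults to vagrant
# but can be swapped out for testing.
def bring_up_all(names, cmd: ->(name) { ["vagrant", "up", name] })
  threads = names.map do |name|
    # Kernel#system returns true on exit status 0, false otherwise
    Thread.new { [name, system(*cmd.call(name))] }
  end
  threads.map(&:value).reject { |_, ok| ok }.map(&:first)
end

# Example: bring up 8 machines and print any that failed.
# failed = bring_up_all((1..8).map { |i| "machine#{i}" })
# puts failed.empty? ? "all machines up" : "failed: #{failed.join(', ')}"
```

Checking the exit status of each process, rather than grepping for one specific error string, matches the testing approach described in the patch discussion above.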
References
I've seen other instances of machines getting locked, but those posts are about rescuing them. Rescuing a locked machine doesn't help me, though; I need the machines to come up reliably, and they'll be disposed of after use.