Port collision detection broken under VMware Fusion 8.5.1 #7948

Closed
sirn opened this issue Oct 30, 2016 · 111 comments

Comments

@sirn

sirn commented Oct 30, 2016

Vagrant version

$ vagrant -v
Vagrant 1.8.6

$ vagrant plugin list
vagrant-scp (0.5.7)
vagrant-share (1.1.5, system)
vagrant-vmware-fusion (4.0.14)

Host operating system

macOS Sierra 10.12.1

Guest operating system

Ubuntu 12.04 (VMware Fusion 8.5.1)

Vagrantfile

Vagrant.configure("2") do |config|
  config.vm.box = "hashicorp/precise64"
  config.vm.network "forwarded_port", guest: 80, host: 8080
end

Debug output

https://gist.github.com/sirn/b885d89d02ec1b426b91beb35a65d34f

Expected behavior

No port collision.

Actual behavior

Vagrant detected port collision.

Steps to reproduce

  1. Run vagrant up with the provided Vagrantfile. No port collision should occur at this step.
  2. Run vagrant halt to shut down the machine.
  3. Run vagrant up again.

In /Library/Preferences/VMware Fusion/vmnet8/nat.conf, this is the [incomingtcp] section after the first vagrant up run (port 8080 successfully forwarded):

[incomingtcp]

# Use these with care - anyone can enter into your VM through these...
# The format and example are as follows:
#<external port number> = <VM's IP address>:<VM's port number>
#8080 = 172.16.3.128:80
# VAGRANT-BEGIN: /Users/sirn/Desktop/foobar/.vagrant/machines/default/vmware_fusion/cb8daaaa-02d8-4707-9708-e8c5e30d17dd/precise64.vmx
8080 = 192.168.229.128:80
2222 = 192.168.229.128:22
# VAGRANT-END: /Users/sirn/Desktop/foobar/.vagrant/machines/default/vmware_fusion/cb8daaaa-02d8-4707-9708-e8c5e30d17dd/precise64.vmx

Then the [incomingtcp] section after the second vagrant up that resulted in an error:

[incomingtcp]

# Use these with care - anyone can enter into your VM through these...
# The format and example are as follows:
#<external port number> = <VM's IP address>:<VM's port number>
#8080 = 172.16.3.128:80
2222 = 192.168.229.128:22
8080 = 192.168.229.128:80
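
To check which forwards in a nat.conf are still wrapped in Vagrant's markers, a small script along these lines may help. This is a minimal sketch, not the plugin's own logic: the `classify_forwards` name, the `Forward` struct, and the `/path/to/precise64.vmx` path are made up for illustration, and it assumes the entry and marker formats shown in the two listings above.

```ruby
# Classify [incomingtcp] forwards in nat.conf text as Vagrant-managed or not.
# Assumption: entries look like "8080 = 192.168.229.128:80" and managed blocks
# are wrapped in "# VAGRANT-BEGIN: <vmx>" / "# VAGRANT-END: <vmx>" comments.
Forward = Struct.new(:host_port, :guest_ip, :guest_port, :owner)

def classify_forwards(nat_conf_text)
  forwards = []
  in_section = false
  owner = nil # .vmx path from the enclosing VAGRANT-BEGIN marker, or nil

  nat_conf_text.each_line do |raw|
    line = raw.strip
    if line.start_with?("[")
      # A new section header: only [incomingtcp] is inspected here.
      in_section = (line == "[incomingtcp]")
      owner = nil
    elsif !in_section
      next
    elsif (m = line.match(/\A#\s*VAGRANT-BEGIN:\s*(.+)\z/))
      owner = m[1]
    elsif line.match?(/\A#\s*VAGRANT-END:/)
      owner = nil
    elsif (m = line.match(/\A(\d+)\s*=\s*([\d.]+):(\d+)\z/))
      # Commented-out examples start with "#" and never reach this branch.
      forwards << Forward.new(m[1].to_i, m[2], m[3].to_i, owner)
    end
  end
  forwards
end

sample = <<~CONF
  [incomingtcp]
  #8080 = 172.16.3.128:80
  # VAGRANT-BEGIN: /path/to/precise64.vmx
  8080 = 192.168.229.128:80
  # VAGRANT-END: /path/to/precise64.vmx
  2222 = 192.168.229.128:22
CONF

classify_forwards(sample).each do |f|
  status = f.owner ? "managed" : "unmanaged"
  puts "#{f.host_port} -> #{f.guest_ip}:#{f.guest_port} (#{status})"
end
```

Run against the second listing, every entry would come back unmanaged, which matches the missing VAGRANT-BEGIN/VAGRANT-END lines.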

I've tried nuking /Library/Preferences/VMware Fusion/networking* and /Library/Preferences/VMware Fusion/vmnet* and re-running VMware Fusion so that it repopulates the configuration before running Vagrant, but the result is still the same.

I can also reproduce this issue on my other machine, which runs exactly the same setup.

@vdanen

vdanen commented Oct 31, 2016

I'm seeing the exact same issue here as well.

@rphillips

Same here as well

@mungler

mungler commented Nov 2, 2016

Me too. Preventing me from working, currently.

@mungler

mungler commented Nov 2, 2016

I managed to recover from this by downloading VMWare Fusion 8.5.0 and installing it over the top of 8.5.1, then nuking the .vagrant directory from my project. Upon vagrant up, the box reprovisioned and the port-forwarding worked fine.

@mungler

mungler commented Nov 3, 2016

Actually, I had to completely remove VMWare, including all the files mentioned here: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017838

Then I rebooted, installed 8.5.0 cleanly (needed to enter my license code etc again) and finally set Vagrant back up. Things are finally back to a working condition.

@sirn
Author

sirn commented Nov 3, 2016

If anyone has this problem and does not mind countless entries in VMware's nat.conf file, you can add auto_correct: true and keep using the same port as before (instead of the corrected port).

The problem seems to be that either Vagrant or VMware fails to properly clean up the port forwarding entries in the nat.conf file, so stale entries remain there and cause Vagrant to incorrectly detect a port collision. But since the stale entries actually point to the VM we want to connect to, we can use those ports as if they had been forwarded successfully.
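
For reference, the workaround applied to the Vagrantfile from the original post would look roughly like this. It is only a sketch of the workaround described above, not a fix: Vagrant will report a corrected host port, but the stale 8080 entry in nat.conf is what keeps forwarding to the guest.

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "hashicorp/precise64"
  # auto_correct lets Vagrant choose another host port when it believes
  # 8080 is already taken, instead of aborting with a collision error.
  config.vm.network "forwarded_port", guest: 80, host: 8080, auto_correct: true
end
```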

@changx

changx commented Nov 9, 2016

It's not only nat.conf. Look at your vmnet8/../networking file; there are lots of add_nat_portfwd items there too.

@chrisroberts
Member

Just for reference as I'm trying to get a reproduction of this behavior locally: did this occur after an upgrade to either fusion and/or the vagrant-vmware plugin? If so, can you provide what the previous versions were prior to upgrading? Thanks!

@mungler

mungler commented Nov 10, 2016

Hi @chrisroberts - for me, everything was working fine on Fusion 8.5.0 - I upgraded to 8.5.1 and the issues began. Blowing away Fusion entirely (plus all supporting files), then reinstalling from an 8.5.0 download and re-adding my licence key fixed the problem for me, so I definitely think it's a change in 8.5.1 which broke it. For reference, I'm on macOS Sierra 10.12.1, and my vagrant plugin version is 4.0.11

@chrisroberts
Member

Okay, great. Thanks!

@sirn
Author

sirn commented Nov 10, 2016

Yes, same for me. It broke after upgrading VMware Fusion to 8.5.1. I reverted VMware Fusion to 8.5.0 and everything went back to working normally.

@chrisroberts
Member

Still not having any luck reproducing the behavior. Currently using:

  • OS: macOS Sierra
  • Fusion: 8.5.1
  • vagrant 1.8.6 (and 1.8.7)
  • vagrant-vmware-fusion: 4.0.14

This is a fresh VM, so I'm wondering if perhaps there's other content in the section that is causing problems? Do either of your [incomingtcp] sections contain any other entries (from vagrant or otherwise)?

@chrisroberts
Member

If it is easily reproducible, can you start with a fresh nat.conf file, and run up, halt, and up with --debug on the commands and provide gists of each of them? That might be the fastest way to identify the problem.

@sirn
Author

sirn commented Nov 11, 2016

My [incomingtcp] section does not contain any extra entries. (As mentioned in the original post, I did nuke the whole networking configuration and start anew.) I'll try to reproduce it and get back with the debug log. Meanwhile, this is the gist for the second up: https://gist.github.com/sirn/b885d89d02ec1b426b91beb35a65d34f

@chrisroberts
Member

@sirn The debug output from the first up and the halt commands would be really helpful as the final up (which you provided the debug output for) shows the [incomingtcp] section being read and shows no vagrant comment markers. I'm hoping to be able to identify where they are being lost in one of the first two commands run. Thanks!

@zienowicz

Adding a "Me Too" for this issue in case any data from my configuration would be useful. Same environment as everyone else having the problem (Sierra, latest Vagrant and plugin). In my case the affected machines are Laravel Homestead images that forward a bunch of standard web dev ports (22, 80, 443, 3306, and 5432).

Until recently a vagrant reload fixed the problem some of the time, but that trick seems to have stopped working. I worked around it yesterday by specifying forwarded port numbers rather than letting it detect automatically.

@iamlittle

iamlittle commented Nov 12, 2016

Seeing this as well.

  • Fusion: 8.5.1
  • vagrant: 1.8.7
  • vagrant-vmware-fusion: 4.0.14

@chrisroberts
Member

For anyone adding a "me too" on this issue, I need some more information to help determine what is causing the issue. Please see this comment: #7948 (comment)

@sirn
Author

sirn commented Nov 13, 2016

@chrisroberts sorry that it took so long; I had a few things to finish first, so I couldn't nuke my environment.

Debug logs:

Networking logs (hopefully I did not commit any sensitive information here):

https://github.com/sirn/vagrant-vmware-networking-logs/commits/master

Also for a reference, I'm using VMware Fusion 8.5.1 Pro.

@ernestonakamura

I had the same issue with VMware Fusion 8.5.1, but after upgrading to 8.5.2 it started to work again.
Here are my working versions:

  • VMware Fusion: 8.5.2
  • vagrant: 1.8.4
  • vagrant-vmware-fusion: 4.0.9

@zienowicz

Simply upgrading to Fusion 8.5.2 didn't work for me. I also had to remove a bunch of cruft from the associated nat.conf file. There were a number of entries under [incomingtcp] that were not between VAGRANT-BEGIN and VAGRANT-END comment blocks. See this gist for an excerpt of nat.conf: https://gist.github.com/zienowicz/d414408d2e348b5bb7616fd118339e1d

Removing everything under [incomingtcp] and then running vagrant up a couple of times seems to have fixed nat.conf and allowed the vagrant box to start and forward ports properly. I regularly run three or four different boxes on this machine, so I'll exercise them a bit and report back if I run into additional problems.

@zienowicz

Nope. Now I'm getting the dreaded "The VMware 'vmnet' devices are failing to start" message. The box is running, and the ports seem to be forwarding, but the shared folder won't mount. My nat.conf has all those (what I thought were) extraneous entries again, plus the ones clearly added by the Vagrant plugin.

@zienowicz

And now we're back to the port collision error (upon subsequent vagrant reload attempts). Please let me know if there is any additional diagnostic info I can provide.

@chrisroberts
Member

@sirn Thank you for providing all those logs! While I cannot get this behavior to reproduce on its own, I was able to identify an anomaly within your log files that I was not experiencing. On the second up command, this shows up in the logs:

Hostonly virtual adapter on vmnet1 is disabled
DHCP service on vmnet8 is not running
NAT service on vmnet8 is not running
Hostonly virtual adapter on vmnet8 is disabled
DEBUG subprocess: stderr: DEBUG subprocess: Waiting for process to exit. Remaining to timeout: 32000
DEBUG subprocess: Exit status: 1
DEBUG subprocess: stderr:  INFO vmware_driver: NAT config changed but isn't running, so not restarting.

Just after this, the vmnet services are started. To simulate this, I halted, ran vmnet-cli --stop, then ran up again. Inspecting the nat.conf file I see:

[incomingtcp]

# Use these with care - anyone can enter into your VM through these...
# The format and example are as follows:
# <external port number> = <VM's IP address>:<VM's port number>
#8080 = 172.16.3.128:80
2222 = 172.16.72.131:22
8080 = 172.16.72.131:80

When vmnet is started back up, it's re-adding these entries, even though they were just removed. And now that the vagrant markers are gone, vagrant will not remove them.
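
One way to picture why the unmarked entries trip the check is the toy model below. The plugin source is not public, so this is only an illustration under stated assumptions: `colliding_ports` and the `/path/to/precise64.vmx` path are hypothetical, and the rule "a port is free only if it is unclaimed or claimed inside this machine's own marker block" is a guess at the behavior described above.

```ruby
require "set"

# existing: [host_port, owner] pairs read from nat.conf, where owner is the
# .vmx path from a VAGRANT-BEGIN marker, or nil for entries with no marker.
# Assumed rule: only entries owned by this machine are safe to ignore.
def colliding_ports(requested_ports, existing, this_vmx)
  taken = existing.reject { |_port, owner| owner == this_vmx }.map(&:first).to_set
  requested_ports.select { |port| taken.include?(port) }
end

vmx = "/path/to/precise64.vmx" # hypothetical path

# After the first up: entries sit inside this machine's marker block.
marked = [[8080, vmx], [2222, vmx]]
# After halt and a vmnet restart: the same entries, re-added without markers.
unmarked = [[8080, nil], [2222, nil]]

p colliding_ports([8080, 2222], marked, vmx)   # prints []
p colliding_ports([8080, 2222], unmarked, vmx) # prints [8080, 2222]
```

In this model the second case reports both ports as collisions even though the entries still point at the same guest.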

Now that I can reliably reproduce this, I'm working on adding some guards to prevent this behavior.

@zienowicz

I had temporarily given up on this and switched to a different machine so I could get work done. Circling back to this issue, I noticed that /Library/Preferences/VMware Fusion/networking contained what appeared to be extraneous port forwarding entries. I cleared those out, made sure that nat.conf had also been cleared of port forwarding entries, halted the Vagrant boxes (which had been suspended), and rebooted. (I did also upgrade Vagrant from 1.8.6 to 1.8.7 before I started.) I have so far successfully started and suspended my Vagrant boxes several times, so fingers crossed that I'm out of the woods. Maybe the networking file also needs some protection against wayward entries?

@vdanen

vdanen commented Nov 22, 2016

I had cleaned things up before and just now halted my vm, upgraded to vagrant 1.8.7 and vmware fusion pro 8.5.2 and now it's back again. The error looks different than last time though:

vdanen@sif:~/git/otter|master⚡ ⇒  vagrant up
Bringing machine 'otter' up with 'vmware_fusion' provider...
==> otter: Checking if box 'puppetlabs/centos-7.0-64-nocm' is up to date...
==> otter: There was a problem while downloading the metadata for your box
==> otter: to check for updates. This is not an error, since it is usually due
==> otter: to temporary network problems. This is just a warning. The problem
==> otter: encountered was:
==> otter:
==> otter:
==> otter:
==> otter: If you want to check for box updates, verify your network connection
==> otter: is valid and try again.
==> otter: Skipping vmnet device verification, verify_vmnet is set to false.
==> otter: Preparing network adapters...
Vagrant cannot forward the specified ports on this VM, since they
would collide with some other application that is already listening
on these ports. The forwarded port to 5000 is already in use
on the host machine.

To fix this, modify your current project's Vagrantfile to use another
port. Example, where '1234' would be replaced by a unique host port:

  config.vm.network :forwarded_port, guest: 80, host: 1234

Sometimes, Vagrant will attempt to auto-correct this for you. In this
case, Vagrant was unable to. This is usually because the guest machine
is in a state which doesn't allow modifying port forwarding. You could
try 'vagrant reload' (equivalent of running a halt followed by an up)
so vagrant can attempt to auto-correct this upon booting. Be warned
that any unsaved work might be lost.

Debug output is here: https://gist.github.com/vdanen/95c69e4ae6844ad8f391b8ca4dc9d797

@jhogendorn

So there's a new version of VMWare on the horizon. Is there any chance of this bug being resolved any time soon? @chrisroberts @mitchellh can we get an update?

@jdelaune

Thanks @stevenwaskey that solved the issue for me as well. This bug sure is annoying

@colepeters

Thanks so much @stevenwaskey, this fixed the issue for me as well. What a relief.

@jen20
Contributor

jen20 commented Nov 6, 2017

I am still seeing this on High Sierra with VMWare 8.5.2 - it should be reopened. Requiring manual editing of files is not an acceptable workaround, and nor is requiring an upgrade to VMWare Fusion 10.0 (and therefore also a plugin license upgrade).

@chrisroberts
Member

@jen20 Hi James, what version of the VMware plugin do you currently have installed? Do you have the latest version installed (currently 5.0.2)? There is no upgrade required (VMware or plugin license) to use the latest version of the plugin. The plugin license would only need to be upgraded if you are using Fusion 10.

@sigil66

sigil66 commented Nov 6, 2017

I still experience the issue with:
OS X: 10.12.6
VMWare: 8.5.8
vagrant-vmware-fusion: 5.0.2

@jen20
Contributor

jen20 commented Nov 6, 2017

I experience it with the same versions of VMWare and the vagrant-vmware-fusion plugin as @sigil66, but with OSX 10.13.

@chrisroberts
Member

Has the nat.conf file been cleaned of any old forwards? Are these collisions newly generated? I was able to isolate the source of the collision to the vmnet service rewriting the nat.conf file after a service restart. Updates to the service interaction and networking file configuration resolved the collision issues (both the reproduction and the fix were done on Fusion 8/Workstation 12).

If your nat.conf file is clean and you are still getting collisions, would you gist a debug run so I can take a look at the behavior? I've got a fresh macOS 10.13 + Fusion 8 test instance running to see if I can force a reproduction. Some debug output may help me track down a difference to force a collision state.

Thanks!

@jen20
Contributor

jen20 commented Nov 6, 2017

Yes, nat.conf has no other forwards (I've been cleaning it out regularly as a workaround for this). Next time I see it I'll gist debug output, but it's fairly non-deterministic. FWIW it appears vastly more frequently on my "fast" machine than my laptop, reinforcing the idea that it is a race condition. For most work (at least the bits that don't require nested virtualisation that actually works) I've switched to using the Parallels provider as it is open source and I can fix it myself when it breaks.

@jen20
Contributor

jen20 commented Nov 6, 2017

OK here's a new one. A box running the latest versions of everything with Fusion 10 instead of 8.5:

❯ vagrant up
Bringing machine 'compile' up with 'vmware_fusion' provider...
==> compile: Cloning VMware VM: 'FreeBSD-12.0-CURRENT-BHYVE-NODEBUG'. This can take some time...
==> compile: Verifying vmnet devices are healthy...
The VMware "vmnet" devices are failing to start. The most common
reason for this is collisions with existing network services. For
example, if a hostonly network space collides with another hostonly
network (such as with VirtualBox), it will fail to start. Likewise,
if forwarded ports collide with other listening ports, it will
fail to start.

Vagrant does its best to fix these issues, but in some cases it
cannot determine the root cause of these failures.

Please verify you have no other colliding network services running.
As a last resort, restarting your computer often fixes this issue.

The second time (with debug logging) it worked though. I think it's safe to say this isn't fixed.

@sigil66

sigil66 commented Nov 6, 2017

@jen20 same, if I run vagrant up enough times it will work eventually.

@jen20
Contributor

jen20 commented Nov 7, 2017

@chrisroberts I have a new theory here. I've now seen it fail two or three times in a row, then succeed as soon as I enable debug output. I think the additional work of debug logging is sufficient to make the race not happen. I would suggest trying faster hardware to reproduce this.

@jen20
Contributor

jen20 commented Nov 8, 2017

@chrisroberts Here is a complete debug log (with only some directory names redacted in a way that doesn't alter semantics).

https://gist.github.com/jen20/6121900158a5388576c9628150b85db5

This is with Vagrant 2.0.0 rather than 2.0.1 but nothing in the change log suggests that a fix has been applied in the point release.

@jen20
Contributor

jen20 commented Nov 8, 2017

The subsequent vagrant up completed successfully with no other operation carried out, lending further credence to the idea that this is a race condition.

@ice799

ice799 commented Feb 13, 2018

I can confirm that I recently upgraded my vagrant and vagrant plugins and see this issue on:

Vagrant 2.0.2
VMWare Fusion Version 8.5.10 (7527438)
vagrant-vmware-fusion (5.0.4)

running OSX 10.13.3 High Sierra

with a cleared nat.conf (which I have previously cleared somewhat regularly to avoid this issue, prior to this upgrade) containing no conflicting port forwards.

Running vagrant up multiple times eventually succeeds.
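
If retrying by hand gets tedious, the "run it until it works" approach can be scripted. This is a stopgap sketch that only papers over the apparent race; `retry_until_success` and `vagrant_up_with_retries` are hypothetical helper names, and the attempt count and delay are arbitrary.

```ruby
# Call the block until it returns a truthy value or attempts run out.
# Returns the number of the successful attempt, or nil if none succeeded.
def retry_until_success(max_attempts:, delay: 0)
  1.upto(max_attempts) do |attempt|
    return attempt if yield(attempt)
    sleep(delay) if delay.positive? && attempt < max_attempts
  end
  nil
end

# Wrapper for the case in this thread. Kernel#system returns true only
# when the command exits with status 0.
def vagrant_up_with_retries(max_attempts: 5)
  retry_until_success(max_attempts: max_attempts, delay: 2) do |n|
    warn "vagrant up, attempt #{n} of #{max_attempts}"
    system("vagrant", "up")
  end
end
```

Calling `vagrant_up_with_retries` from the project directory returns the attempt number that succeeded, or nil if every attempt failed.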

@sigil66

sigil66 commented Feb 13, 2018

Wondering why this is closed, this is the issue that continues to give ...

@davosian

davosian commented Apr 6, 2018

I am having the same issue using the newer vmware-desktop plugin, version 1.0.2. Clearing out nat.conf manually resolves the issue temporarily.

Vagrant 2.0.3
vagrant-vmware-desktop (1.0.2)
VMWare Fusion Pro 10.1.1

Vagrant was installed through the official installers, and I cleared out ~/.vagrant.d to start fresh.

@chrisroberts
Member

@davosian Would you halt/destroy all VMware VMs and follow these steps:

  • Remove any port forward entries in /Library/Preferences/VMware Fusion/vmnet8/nat.conf
  • Remove any port forward directives in /Library/Preferences/VMware Fusion/networking
  • Stop VMware networking: sudo "/Applications/VMware Fusion/Contents/Library/vmnet-cli" --stop
  • Start VMware networking: sudo "/Applications/VMware Fusion/Contents/Library/vmnet-cli" --start
  • Validate port forward entries/directives do not exist in networking or nat.conf files
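
The two file edits in the steps above could be scripted roughly as follows. This is a sketch under assumptions, not an official tool: the function names are hypothetical, it treats any `<port> = ...` line under [incomingtcp] or [incomingudp] as a forward (the UDP section is an assumption, it is not shown in this thread), and it treats any line starting with add_nat_portfwd in the networking file as a forward directive. It works on strings only and leaves comments, including any Vagrant marker comments, in place.

```ruby
# Drop active port-forward entries from nat.conf text, keeping comments,
# section headers, and every other setting untouched.
def strip_nat_forwards(nat_conf_text)
  in_forward_section = false
  nat_conf_text.each_line.reject do |raw|
    line = raw.strip
    if line.start_with?("[")
      in_forward_section = ["[incomingtcp]", "[incomingudp]"].include?(line)
    end
    in_forward_section && line.match?(/\A\d+\s*=/)
  end.join
end

# Drop add_nat_portfwd directives from the networking file text.
def strip_networking_forwards(networking_text)
  networking_text.each_line.reject do |raw|
    raw.strip.start_with?("add_nat_portfwd")
  end.join
end

sample = <<~CONF
  [incomingtcp]
  #8080 = 172.16.3.128:80
  2222 = 172.16.72.131:22
  8080 = 172.16.72.131:80
CONF

puts strip_nat_forwards(sample)
```

Both files live under /Library/Preferences/VMware Fusion and need root to write, so back them up first and stop VMware networking before saving the cleaned text.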

After this, do you still see collisions occurring?

Thanks!

@davosian

davosian commented Apr 9, 2018

@chrisroberts I followed your instructions and so far I am not having any more collisions. networking did not have any port forwarding set up.

Thanks for your support!

@davosian

davosian commented Apr 9, 2018

I spoke too soon - getting port collisions again when running these steps:

vagrant init centos/7
vagrant up
vagrant halt
vagrant destroy
rm -rf .vagrant
vagrant global-status   # --> no machine listed

When I then check /Library/Preferences/VMware Fusion/vmnet8/nat.conf I still see a port entry for ssh (2222) and running vagrant up again results in collisions.

VMWare is set as the default provider in my .zshrc file:

# Vagrant default provider
export VAGRANT_DEFAULT_PROVIDER='vmware_desktop';

Any idea what might be causing this?

@jen20
Contributor

jen20 commented Apr 14, 2018

This is still occurring for me with the new vagrant-vmware-desktop plugin:

Bringing machine 'linux1' up with 'vmware_desktop' provider...
Bringing machine 'linux2' up with 'vmware_desktop' provider...
==> linux1: Cloning VMware VM: 'bento/ubuntu-18.04'. This can take some time...
==> linux1: Checking if box 'bento/ubuntu-18.04' is up to date...
==> linux1: Verifying vmnet devices are healthy...
==> linux1: Preparing network adapters...
==> linux1: Starting the VMware VM...
==> linux1: Waiting for the VM to receive an address...
Some of the defined forwarded ports would collide with existing
forwarded ports on VMware network devices. This can be due to
existing Vagrant-managed VMware machines, or due to manually
configured port forwarding with VMware. Please fix the following
port collisions and try again:

2222

The Vagrantfile is:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
	config.vm.define "linux1" do |vmCfg|
		vmCfg.vm.box = "bento/ubuntu-18.04"
		vmCfg.vm.box_check_update = true
		vmCfg.vm.hostname = "linux1"

		vmCfg.vm.network "private_network", ip: "10.16.0.10"

		vmCfg.vm.provider "vmware_desktop" do |v|
			v.vmx["memsize"] = "4096"
			v.vmx["numvcpus"] = "2"
		end
	end
	
	config.vm.define "linux2" do |vmCfg|
		vmCfg.vm.box = "bento/ubuntu-18.04"
		vmCfg.vm.box_check_update = true
		vmCfg.vm.hostname = "linux2"
		
		vmCfg.vm.network "private_network", ip: "10.16.0.11"
		
		vmCfg.vm.provider "vmware_desktop" do |v|
			v.vmx["memsize"] = "4096"
			v.vmx["numvcpus"] = "2"
		end
	end
end
$ vagrant version
Installed Version: 2.0.3
Latest Version: 2.0.3

You're running an up-to-date version of Vagrant!
$ vagrant plugin list
vagrant-vmware-desktop (1.0.0)

VMWare Fusion is Professional Version 10.1.1 (7520154)

Looks like this issue needs to be reopened.

@jen20
Contributor

jen20 commented Apr 14, 2018

This actually seems to cause numerous issues, including SSH'ing into the wrong box when multi-box setups are present. For example, in the case above, I can vagrant ssh linux1, but get prompted for a password on linux2, and if I enter the password baked into the box, I find myself on linux2.

Not really sure how to proceed at this point - I must use VMware (I need nested virtualization) but cannot fix the bugs in the provider as the source is not available.

@jchappell82

I can also confirm that this is still an issue in the latest vagrant-vmware-desktop plugin, running against the most recent version of VMware Fusion 10 on macOS 10.13.4 here as well.

@WizBangCrash

I can also confirm that this is still an issue.
I'm using VMWare Fusion 10.1.1, macOS 10.13.4, vagrant 2.0.4, plugin vagrant-share 1.1.9 & plugin vagrant-vmware-desktop 1.0.3

@jscheid

jscheid commented Jan 27, 2019

Still an issue for me as well. VMware Fusion 10.1.5, macOS 10.12.6, Vagrant 2.2.3, plugin vagrant-share 1.1.9, plugin vagrant-vmware-desktop 2.0.3. Is there anything else I can do to help fix this? It's quite annoying, takes me around 15 minutes to sort out every time I need to restart the VM (which, for unrelated reasons, is quite often, pretty much once a day).

@ghost

ghost commented Oct 13, 2019

Still an issue for me too.

@ghost

ghost commented Jan 28, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Jan 28, 2020