socket_vmnet failing on M1 (start(): vmnet_return_t VMNET_FAILURE) #7

Open
jandubois opened this issue Sep 28, 2022 · 7 comments

@jandubois
Member

I've now observed the error from lima-vm/lima#1049 two more times (qemu failing to start up because fd_connect throws an error). Both times have been on an M1 mini; I cannot remember if the bug report on the lima repo was also based on a failure on M1, or if it was Intel.

Unfortunately I've been running with lima 0.12.0, which doesn't have the error reporting fix. However, I can see errors in the daemon logs (after qemu failed):

jan@zilicon _networks % cat rancher-desktop-shared_socket_vmnet.stderr.log
start(): vmnet_return_t VMNET_FAILURE
start: Undefined error: 0
jan@zilicon _networks % cat rancher-desktop-shared_socket_vmnet.stdout.log
Initializing vmnet.framework (mode 1001)
jan@zilicon _networks % cat rancher-desktop-bridged_en0_socket_vmnet.stderr.log
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0

The bridged network was running, but the shared network was not.

The only way I found to get things working again was by rebooting the machine.
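For anyone comparing daemon logs across machines, the stderr lines above can be tallied by function name and vmnet status. This is only a sketch: the regular expression is based on the few lines quoted in this issue, and `summarize_vmnet_errors` is a hypothetical helper, not part of socket_vmnet or lima.

```python
import re
from collections import Counter

# Matches lines shaped like "start(): vmnet_return_t VMNET_FAILURE".
# Assumption: the format is inferred only from the log excerpts in this issue.
VMNET_LINE = re.compile(r"^(?P<func>\w+)\(\): vmnet_return_t (?P<status>VMNET_\w+)$")


def summarize_vmnet_errors(lines):
    """Count (function, status) pairs found in socket_vmnet stderr lines."""
    counts = Counter()
    for line in lines:
        match = VMNET_LINE.match(line.strip())
        if match:
            counts[(match["func"], match["status"])] += 1
    return counts
```

Passing the contents of the two stderr logs quoted above would yield one entry for `start` with `VMNET_FAILURE` and one for `on_accept` with `VMNET_INVALID_ARGUMENT`; lines such as `start: Undefined error: 0` are ignored because they carry no vmnet status.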

jandubois added a commit to rancher-sandbox/lima-and-qemu that referenced this issue Sep 28, 2022
to get the error message when `fd_connect` fails (to help debugging
lima-vm/socket_vmnet#7).

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
@AkihiroSuda changed the title from "socket_vmnet failing on M1" to "socket_vmnet failing on M1 (start(): vmnet_return_t VMNET_FAILURE)" on Oct 3, 2022
@AkihiroSuda
Member

Does this error happen with vde_vmnet too?
The vmnet code is almost unchanged from vde_vmnet.

@jandubois
Member Author

> Does this error happen with vde_vmnet too?

It is possible, but I haven't seen it. One difference is that with socket_vmnet the failure is catastrophic: qemu will not start the VM. With vde_vmnet you would just not get an IP address on the interface, so you might not notice unless you use the external IP address for ingress.

We have seen on Rancher Desktop that some users don't get an IP address in specific environments, but have never been able to determine the reason for it. Maybe it is related, but I don't know. We detect this and configure flannel with the SLIRP interface when that happens, so things are still working with reduced functionality in that case.
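The fallback described above amounts to "use the vmnet-backed interface if it received an address, otherwise use the SLIRP one". A minimal sketch of that decision follows; the function name, the interface names, and the address mapping are all hypothetical, and this is not Rancher Desktop's actual code.

```python
from typing import Mapping, Optional


def pick_flannel_interface(
    addresses: Mapping[str, Optional[str]],
    preferred: str,
    fallback: str,
) -> str:
    """Return `preferred` if it has an address, otherwise `fallback`.

    `addresses` maps interface names to their IPv4 address, or to None
    (or an empty string) when no address was assigned.
    """
    if addresses.get(preferred):
        return preferred
    return fallback
```

For example, `pick_flannel_interface({"vmnet_if": None, "slirp_if": "192.168.5.15"}, "vmnet_if", "slirp_if")` selects `"slirp_if"`, which matches the reduced-functionality mode mentioned above. The names and the address here are placeholders.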

@jandubois
Member Author

> It is possible, but I haven't seen it.

All the failures I've seen last week were on a remote M1 mini that is running inside the Vancouver office, so it is a different environment from what I regularly use. However, the failures were neither immediate nor frequent; they happened about once a day, after restarting VMs (and daemons) multiple times. The machine was running Big Sur, whereas my regular Intel machine is running Catalina.

@medyagh

medyagh commented Nov 15, 2022

I have also noticed this: changing my location (and connecting to a different Wi-Fi network) has caused problems that I was able to fix only by uninstalling, rebooting, and reinstalling.

@mprimeaux

mprimeaux commented Jan 12, 2023

A bit more information: I can confirm that my DHCP 'server' is allocating a DHCP address to socket_vmnet, because I receive a 'new device detected' alert from my firewall.

Tailing the stderr shows the same errors as reported by @jandubois.

on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0

I'm running macOS Ventura 13.1 (22C65).

@ProjectJYL

ProjectJYL commented Jul 18, 2023

I ran into a similar issue, but in socket mode instead of shared mode, where socket_vmnet is "unmanaged", meaning it is started and stopped by brew services. Starting the VMs for the first time that day worked fine. After a couple of minutes, the VM network became unreachable. I was not able to start the VM again after stopping it.

ha.stderr.log

{"level":"debug","msg":"QEMU version 8.0.2 detected","time":"2023-07-18T13:28:12-04:00"}
{"level":"debug","msg":"firmware candidates = [/Users/jylee/.local/share/qemu/edk2-aarch64-code.fd /opt/homebrew/share/qemu/edk2-aarch64-code.fd /usr/share/AAVMF/AAVMF_CODE.fd /usr/share/qemu-efi-aarch64/QEMU_EFI.fd]","time":"2023-07-18T13:28:12-04:00"}
{"level":"fatal","msg":"template: :1:21: executing \"\" at \u003cfd_connect \"/opt/homebrew/var/run/socket_vmnet\"\u003e: error calling fd_connect: fd_connect: dial unix /opt/homebrew/var/run/socket_vmnet: connect: connection refused","time":"2023-07-18T13:28:12-04:00"}
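The `connection refused` in the fatal message above means nothing was accepting connections on the socket path at that moment. The sketch below checks that condition before a VM is started. It assumes the daemon listens on a stream-type Unix socket, and `can_connect` is a hypothetical helper rather than anything lima provides. A successful probe opens (and immediately closes) a real connection that the daemon will see.

```python
import socket


def can_connect(path: str, timeout: float = 1.0) -> bool:
    """Return True if a Unix stream socket at `path` accepts a connection."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except OSError:
            # Covers a missing socket file, a refused connection,
            # a permission error, and a timeout.
            return False
        return True
```

Calling `can_connect("/opt/homebrew/var/run/socket_vmnet")` with the path from the log would return False in the state shown above.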

The socket_vmnet service itself shows

% sudo brew services list
Name         Status     User File
socket_vmnet error  256 root /Library/LaunchDaemons/homebrew.mxcl.socket_vmnet.plist
unbound      none

and /opt/homebrew/var/log/socket_vmnet/stderr shows repeated runs of these log lines

vmnet_write: Bad file descriptor
writev: Bad file descriptor
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Broken pipe

To restore the network, I had to restart the socket_vmnet service and all the VMs. After a while, the problem recurs. Is there any other workaround for this?
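Until the root cause is known, one way to notice the failed state early is to read the status column of the `brew services list` output shown earlier. The parser below is a sketch that assumes the column layout pasted in this comment (name first, status second); other Homebrew versions may print something different, and `parse_brew_services` is a hypothetical helper.

```python
def parse_brew_services(output: str) -> dict:
    """Map each service name to its status from `brew services list` text."""
    statuses = {}
    lines = output.strip().splitlines()
    for line in lines[1:]:  # skip the "Name Status User File" header
        fields = line.split()
        if len(fields) >= 2:
            statuses[fields[0]] = fields[1]
    return statuses
```

With the output above, the entry for `socket_vmnet` would be `"error"`, which a wrapper script could treat as the cue to restart the service and the VMs.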

By the way, in case you're wondering, this doesn't just happen in "socket" mode. It also happened in "shared" mode, where socket_vmnet is managed by lima.

I have an M1 MacBook Pro on macOS Ventura 13.4.1, with socket_vmnet 1.1.2.

@saintdle

I'm seeing the same behaviour on macOS 13.4.1 (c) on an M2, with socket_vmnet 1.1.2 and lima 0.16.0.

I can build and start a new VM, but as soon as I stop it, I see the same behaviour as described here, with the same stderr output. The difference is that I don't see an error on the service; since it's not running, I can't restart it.

sudo brew services
Name         Status User File
socket_vmnet none   

The only fix I've found so far is to uninstall socket_vmnet and reinstall it.
