Hang with QEMU 8.1.0 (after showing Linux agpgart interface v0.103) #1758

Closed
dustinsoftware opened this issue Aug 24, 2023 · 13 comments

Comments

@dustinsoftware

Description

This morning I noticed that Homebrew updated both lima and qemu. After the update, my x86 VM would no longer boot. I had to downgrade QEMU to 8.0.4 manually to work around the issue.

My environment

  • lima 0.17.2
  • qemu 8.1.0
  • M1 Max Ventura 13.5
  • Homebrew 4.1.6

lima.yml:

arch: "x86_64"

images:
  - location: "https://cloud-images.ubuntu.com/releases/20.04/release/ubuntu-20.04-server-cloudimg-amd64.img"

mounts:
  - location: "~"
  - location: "/tmp/lima"
    writable: true

containerd:
  system: false
  user: false

Issue Repro

➜ limactl start sqlserver
INFO[0000] Using the existing instance "sqlserver"
INFO[0000] [hostagent] Starting QEMU (hint: to watch the boot progress, see "/Users/dustin.masters/.lima/sqlserver/serial.log")
INFO[0000] SSH Local Port: 51020
INFO[0000] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
INFO[0085] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
INFO[0170] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
INFO[0255] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
INFO[0340] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
INFO[0425] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
➜  ~ tail /Users/dustin.masters/.lima/sqlserver/serial.log
[    1.654562] Console: switching to colour frame buffer device 160x50
[    1.656358] fb0: EFI VGA frame buffer device
[    1.656358] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    1.656358] ACPI: Power Button [PWRF]
[    1.672642] PCI Interrupt Link [GSIF] enabled at IRQ 21
[    1.673868] PCI Interrupt Link [GSIG] enabled at IRQ 22
[    1.673868] PCI Interrupt Link [GSIE] enabled at IRQ 20
[    1.685615] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    1.689453] 00:02: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    1.723694] Linux agpgart interface v0.103 (hangs here forever)
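
One way to check for this signature from a script is to test whether the last line of serial.log is the agpgart message. This is a minimal sketch, assuming a POSIX shell; the function name `stuck_at_agpgart` is made up for this example and is not part of lima.

```shell
# Returns 0 if the last non-empty line of the given serial log is the
# "Linux agpgart interface" message (the hang signature in this report),
# and 1 otherwise.
stuck_at_agpgart() {
  # Drop blank lines, then keep only the final line of the log.
  last_line="$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)"
  case "$last_line" in
    *"Linux agpgart interface"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For the instance above this could be called as `stuck_at_agpgart ~/.lima/sqlserver/serial.log && echo "looks like this hang"`. A match only means the log currently ends at that line, so it is worth waiting a while before treating it as a hang.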

Workaround:

Manually link qemu@8.0.4

  • curl https://raw.githubusercontent.com/Homebrew/homebrew-core/05d02418e00ef6e9af79018e3655536063f68ab2/Formula/q/qemu.rb -o qemu.rb
  • brew unlink qemu
  • brew install qemu.rb (you should see a warning about qemu being out of date, but it will install)

I do not know why updating qemu from 8.0.4 to 8.1.0 caused this issue. I have tried upgrading / downgrading lima while leaving qemu alone and it made no difference. Only by downgrading qemu was I able to get un-stuck.
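
The three workaround steps can be collected into a small script. This is only a sketch: the `run` helper, the `downgrade_qemu` function, and the `DRY_RUN` variable are names invented here for illustration, while the formula URL and the `brew` commands are the ones listed above. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the downgrade workaround from this report.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Pinned homebrew-core revision of the qemu formula quoted in this report (qemu 8.0.4).
FORMULA_URL="https://raw.githubusercontent.com/Homebrew/homebrew-core/05d02418e00ef6e9af79018e3655536063f68ab2/Formula/q/qemu.rb"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

downgrade_qemu() {
  run curl -fsSL "$FORMULA_URL" -o qemu.rb   # fetch the old formula
  run brew unlink qemu                       # unlink the current qemu
  run brew install qemu.rb                   # install from the local formula file
}

downgrade_qemu
```

Running it with `DRY_RUN=0` performs the same steps as the list above. Whether `brew install` accepts a local formula file may depend on the Homebrew version in use.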

@damienmckenna

I'm hitting this too.

My environment:

  • 2021 MacBook Pro with M1 Max
  • OS release: Ventura 13.5.1

Some relevant lines from ha.stderr.log:

{"level":"info","msg":"Waiting for the essential requirement 1 of 3: \"ssh\"","time":"2023-08-24T13:57:48-04:00"}
{"level":"debug","msg":"executing script \"ssh\"","time":"2023-08-24T13:57:48-04:00"}
{"level":"info","msg":"[VZ] - vm state change: running","time":"2023-08-24T13:57:48-04:00"}
{"level":"debug","msg":"executing ssh for script \"ssh\": /usr/bin/ssh [ssh -F /dev/null -o IdentityFile=\"/Users/dmckenna/.lima/_config/user\" -o IdentityFile=\"/Users/dmckenna/.ssh/id_rsa\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^aes128-gcm@openssh.com,aes256-gcm@openssh.com\" -o User=dmckenna -o ControlMaster=auto -o ControlPath=\"/Users/dmckenna/.lima/colima/ssh.sock\" -o ControlPersist=5m -p 49853 127.0.0.1 -- /bin/bash]","time":"2023-08-24T13:57:48-04:00"}
{"level":"debug","msg":"stdout=\"\", stderr=\"ssh: connect to host 127.0.0.1 port 49853: Connection refused\\r\\n\", err=failed to execute script \"ssh\": stdout=\"\", stderr=\"ssh: connect to host 127.0.0.1 port 49853: Connection refused\\r\\n\": exit status 255","time":"2023-08-24T13:57:48-04:00"}

Going to revert to QEMU 8.0.4 per the workaround.

@AkihiroSuda
Member

arch: "x86_64"

Is this specific to Intel-on-ARM?
Also, does a non-Ubuntu image work?

@AkihiroSuda changed the title from "Hang with QEMU 8.1.0" to "Hang with QEMU 8.1.0 (after showing Linux agpgart interface v0.103)" on Aug 25, 2023
@AkihiroSuda
Member

{"level":"info","msg":"[VZ] - vm state change: running","time":"2023-08-24T13:57:48-04:00"}

Seems different from OP.
Please open another issue.
limactl start --video may be useful to see how it is hanging.

@damienmckenna

You are correct. After looking a little more, I realized it was slightly different, though related: we're both having problems with the ssh step and we're both running Ventura. A number of others have reported similar problems both here and on abiosoft/colima.

@rmcveigh

Using the downgrade workaround also resolved the issue for me on an Intel i9 Mac.

@dagstuan

dagstuan commented Sep 5, 2023

Noticed the same thing, though I'm using lima via colima.

I'm getting some CPU-related errors in serial.log during startup.

[   60.979262] 	(detected by 1, t=6005 jiffies, g=-1171, q=1981 ncpus=2)
[   60.982317] Sending NMI from CPU 1 to CPUs 0:
[   11.583693] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xb/0x10
[   11.583693] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.006 msecs
[   60.982317] rcu: rcu_preempt kthread timer wakeup didn't happen for 6004 jiffies! g-1171 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[   60.982317] rcu: 	Possible timer handling issue on cpu=0 timer-softirq=15
[   60.982317] rcu: rcu_preempt kthread starved for 6005 jiffies! g-1171 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[   60.982317] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[   60.982317] rcu: RCU grace-period kthread stack dump:
[   60.982317] task:rcu_preempt     state:I stack:0     pid:15    ppid:2      flags:0x00004000

@AkihiroSuda
Member

If this is a regression in QEMU v8.1, please report to QEMU: https://gitlab.com/qemu-project/qemu/-/issues

@dagstuan

dagstuan commented Sep 5, 2023

@AkihiroSuda Ok thanks. Reported to qemu: https://gitlab.com/qemu-project/qemu/-/issues/1864

@AkihiroSuda
Member

brew install --HEAD qemu may work
https://gitlab.com/qemu-project/qemu/-/issues/1864#note_1543993006

@dustinsoftware
Author

Yes, that did the trick. FYI for others on this thread: building qemu from source takes a while and clones a ton of dependencies.

➜  ~ brew unlink qemu
➜  ~ brew install --HEAD qemu
==> Installing qemu --HEAD
==> ./configure --cc=clang --host-cc=clang --disable-bsd-user --disable-guest-agent --enable-slirp --enable-capstone --enable-curses --enable
==> make V=1 install
🍺  /opt/homebrew/Cellar/qemu/HEAD-17780ed_2: 162 files, 533.3MB, built in 6 minutes 8 seconds

@AkihiroSuda
Member

The patch is now cherry-picked to qemu 8.1.0_3
Homebrew/homebrew-core@6f7d27f
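
To tell whether an installed Homebrew qemu predates that revision, the version string can be compared against 8.1.0_3. This is a rough sketch, assuming Homebrew's `version_revision` naming (for example `8.1.0_2`); the function name `needs_qemu_fix` is hypothetical, and it only covers the 8.1.0 bottles discussed in this thread.

```shell
# Returns 0 when the given Homebrew version string is an 8.1.0 build older
# than revision 3 (the revision this thread says carries the fix), and a
# non-zero status for anything else.
needs_qemu_fix() {
  version="$1"
  case "$version" in
    8.1.0)
      # No "_N" suffix means the original, unrevised bottle.
      return 0
      ;;
    8.1.0_*)
      revision="${version#8.1.0_}"
      # A non-numeric revision makes the test fail, which reads as "no fix needed".
      [ "$revision" -lt 3 ] 2>/dev/null
      ;;
    *)
      return 1
      ;;
  esac
}
```

A possible call is `needs_qemu_fix "$(brew list --versions qemu | awk '{print $NF}')" && brew upgrade qemu`, which takes the last version that `brew list --versions` prints for qemu.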

@dagstuan

dagstuan commented Sep 6, 2023

Thank you! qemu 8.1.0_3 works fine on two arm-based machines here 👍

@erikpragt-connectid

I ran into the same problem, except I was using qemu 8.2.1 and colima 0.6.8. I tried installing qemu 8.1.0_3, but this doesn't seem to be available anymore.

I got it to work however by:

  • unlinking qemu (brew unlink qemu)
  • installing it from HEAD (brew install --HEAD qemu)
  • then starting colima in x86_64 mode (colima start --arch x86_64)
  • this took a while, so I tailed the logs (tail -f ~/.colima/_lima/colima/serial*.log)

This output ended with:

[FAILED] Failed to start cloud-fina… Execute cloud user/final scripts.
See 'systemctl status cloud-final.service' for details.
[  OK  ] Reached target cloud-init.target - Cloud-init target.

But after this, colima was running (and I could run oracle-free on my Mac M1 Pro).
