This repository has been archived by the owner on Jan 28, 2023. It is now read-only.

Kernel panic in macOS causing hard system hang #93

Closed
mborgerson opened this issue Sep 6, 2018 · 19 comments
@mborgerson
Contributor

mborgerson commented Sep 6, 2018

Currently trying to emulate an x86-based system using QEMU with HAXM on macOS. When using the HAXM accelerator, I'm able to cause a reproducible kernel panic which results in an entire system hard reset (making debugging a significant challenge). Appears to be related to the test instruction, or emulation thereof. Thanks!

Thu Sep  6 01:04:50 2018

*** Panic Report ***
panic(cpu 2 caller 0xffffff80267fd245): Kernel trap at 0x0000000000000000, type 14=page fault, registers:
CR0: 0x0000000080010033, CR2: 0x0000000000000000, CR3: 0x00000003e1de403b, CR4: 0x00000000001626e0
RAX: 0x0000000000000000, RBX: 0xffffff805256c000, RCX: 0x0000000000000000, RDX: 0x0000000000000004
RSP: 0xffffff9214523a48, RBP: 0xffffff9214523ac0, RSI: 0xffffff805256c770, RDI: 0xffffff805256c6a8
R8:  0xffffff80531a0dd8, R9:  0xffffff8026e7eff0, R10: 0x0000000000000000, R11: 0xffffff9214523ae0
R12: 0xffffff805256c6a8, R13: 0x00000000ffffffed, R14: 0xffffff92140a1000, R15: 0xffffff805256c748
RFL: 0x0000000000010246, RIP: 0x0000000000000000, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x0000000000000000, Error code: 0x0000000000000010, Fault CPU: 0x2, PL: 0, VF: 0

Backtrace (CPU 2), Frame : Return Address
0xffffff92145236d0 : 0xffffff80266e756c 
0xffffff9214523750 : 0xffffff80267fd245 
0xffffff9214523930 : 0xffffff80266985a3 
0xffffff9214523950 : 0x0 
0xffffff9214523ac0 : 0xffffff7fa9eebe24 
0xffffff9214523af0 : 0xffffff7fa9ee683e 
0xffffff9214523b60 : 0xffffff8026952c43 
0xffffff9214523b90 : 0xffffff80269480ee 
0xffffff9214523c10 : 0xffffff8026939874 
0xffffff9214523e10 : 0xffffff8026b50e3b 
0xffffff9214523e40 : 0xffffff8026b9b943 
0xffffff9214523f50 : 0xffffff8026c234d5 
0xffffff9214523fb0 : 0xffffff8026698d96 
      Kernel Extensions in backtrace:
         com.intel.kext.intelhaxm(7.3)[65C423BE-0315-30BD-8DA8-CABFE8487E10]@0xffffff7fa9ee5000->0xffffff7fa9f0cfff

BSD process name corresponding to current thread: qemu-system-x86_

Mac OS version:
16G1510

Kernel version:
Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018;
@raphaelning
Contributor

Thanks for the report. I was able to correlate two addresses from the call trace with the disassembly of HAXM 7.3.0 KEXT (otool -tVj intelhaxm.kext/Contents/MacOS/intelhaxm). It seems the crash happens in em_emulate_insn(), so the bug is indeed related to instruction emulation.
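
The correlation described above comes down to subtracting the KEXT load address from each return address that falls inside the KEXT's range. A minimal sketch of that arithmetic follows, using the load range and the two in-range frames from the panic report. The helper names are hypothetical, and it assumes the addresses printed by otool are relative to the start of the KEXT image, which may not hold for every build.

```c
#include <stdint.h>

/* Load range of com.intel.kext.intelhaxm, copied from the panic report. */
#define HAXM_KEXT_BASE 0xffffff7fa9ee5000ULL
#define HAXM_KEXT_END  0xffffff7fa9f0cfffULL

/* Returns 1 if a backtrace return address lies inside the KEXT. */
static int addr_in_kext(uint64_t addr)
{
    return addr >= HAXM_KEXT_BASE && addr <= HAXM_KEXT_END;
}

/*
 * Offset of a return address from the KEXT load base.
 * Only meaningful when addr_in_kext(addr) is true.
 * Assumption: this offset is what should be looked up in the disassembly.
 */
static uint64_t kext_offset(uint64_t addr)
{
    return addr - HAXM_KEXT_BASE;
}
```

With the values from the report, kext_offset(0xffffff7fa9eebe24) is 0x6e24 and kext_offset(0xffffff7fa9ee683e) is 0x183e; the remaining frames lie outside the KEXT range.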

Is it possible for us to reproduce the issue? Alternatively, if you have the test instruction in assembly or machine code form, that would also be very helpful.

@raphaelning raphaelning added the bug label Sep 6, 2018
@mborgerson
Contributor Author

Thank you for the fast response! The way I've been debugging this is to add instruction decodes around VM exits [1]. Because the crash brings my entire system down (along with my logging mechanisms), it is difficult to give a precise answer. However, I've crudely resorted to using my cellphone to capture a video of my console log just before it crashes, so I did manage to get some data. The culprit appears to be:

test %esi, 0xfec00130

The address decodes to an AC97 MMIO register in my QEMU system.

[1] I was previously using version 7.0.0, which also has issues with test instruction emulation. I understand that some better instruction emulation was added recently, so I upgraded, and this is what I'm running into now.

raphaelning added a commit that referenced this issue Sep 7, 2018
When decoding an instruction with an unsupported opcode (indicated
by the INSN_NOTIMPL flag), em_decode_insn() does not fail, which
can lead to a disaster in em_emulate_insn(), e.g. calling an
invalid handler function (soft_handler == NULL) and causing a host
kernel panic (#93).

1. As soon as the opcode is decoded, check if it is unsupported.
   If so, return a fatal error, raise a vCPU panic, and log the
   raw bytes of the instruction.
2. In em_emulate_insn(), make sure soft_handler is valid before
   calling it.
3. Before decoding a new instruction, reset the emulation context,
   so the old context is not accidentally referred to.
4. Add a unit test for the unsupported opcode case. This requires
   refactoring EmulatorTest::run() first.

Signed-off-by: Yu Ning <yu.ning@intel.com>
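
The register dump in the panic report is consistent with the "soft_handler == NULL" explanation in this commit message: RIP and CR2 are both 0 and the page-fault error code is 0x10. A minimal sketch of that check follows. The bit positions are the architectural x86 page-fault error code bits; the function name is hypothetical, and the check is a heuristic, not proof of the cause.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits. */
#define PF_PRESENT (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE   (1u << 1) /* 0: read access, 1: write access */
#define PF_USER    (1u << 2) /* 0: supervisor mode, 1: user mode */
#define PF_RSVD    (1u << 3) /* reserved bit set in a paging structure */
#define PF_INSTR   (1u << 4) /* fault was caused by an instruction fetch */

/*
 * Heuristic: a kernel-mode instruction fetch from address 0 on a
 * non-present page is what a call through a NULL function pointer
 * looks like in a panic report.
 */
static bool looks_like_null_call(uint64_t rip, uint64_t cr2, uint32_t err)
{
    return rip == 0 && cr2 == 0 &&
           (err & PF_INSTR) != 0 &&
           (err & PF_USER) == 0 &&
           (err & PF_PRESENT) == 0;
}
```

For the report above, looks_like_null_call(0, 0, 0x10) returns true.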
@raphaelning
Contributor

Thanks for providing the crucial piece of information! This test instruction has the TEST r/m32, r32 (Intel syntax) format, so its opcode is 0x85. It turns out that this opcode is not implemented by core/emulate.c yet. So there are two issues we need to fix here:

a) Any unsupported MMIO instruction crashes the host.
b) HAXM instruction emulator does not support opcode 0x85.

I've fixed a) with #95. Attached is a signed intelhaxm.kext (haxm-unimpl-opcode-fix.tar.gz) built with this patch. If you want to test it:

  1. Start Console.app to capture HAXM logs.
  2. sudo tar xpf haxm-unimpl-opcode-fix.tar.gz
  3. sudo kextunload /Library/Extensions/intelhaxm.kext
  4. sudo kextutil intelhaxm.kext

This time, hopefully you'll get a guest crash instead of a host crash, and HAXM should report em_decode_insn() failed.
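
For reference, the TEST r/m32, r32 form discussed above is encoded as opcode 0x85 followed by a ModRM byte. The sketch below decodes only the one addressing form that matches the reported instruction (an absolute 32-bit address with a register operand). It is an illustration, not HAXM's decoder: the struct and function names are hypothetical, and the byte sequence in the usage note was assembled by hand under the assumption of a 32-bit guest, not captured from the actual guest.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded operands of a "TEST [disp32], r32" instruction. */
struct test_insn {
    uint8_t  reg;  /* ModRM.reg: register operand (6 = ESI) */
    uint32_t addr; /* absolute 32-bit memory operand */
};

/*
 * Decodes "85 /r" where ModRM has mod=00 and rm=101, which in 32-bit
 * addressing means "disp32 with no base register".
 * Returns 0 on success, -1 for any other encoding.
 */
static int decode_test_disp32(const uint8_t *bytes, size_t len,
                              struct test_insn *out)
{
    uint8_t modrm;

    if (len < 6 || bytes[0] != 0x85)
        return -1;

    modrm = bytes[1];
    if ((modrm >> 6) != 0 || (modrm & 7) != 5)
        return -1;

    out->reg = (modrm >> 3) & 7;
    /* The displacement is stored little-endian. */
    out->addr = (uint32_t)bytes[2] |
                ((uint32_t)bytes[3] << 8) |
                ((uint32_t)bytes[4] << 16) |
                ((uint32_t)bytes[5] << 24);
    return 0;
}
```

Under those assumptions, the bytes 85 35 30 01 c0 fe decode to reg = 6 (ESI) and addr = 0xfec00130.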

@billtlee

billtlee commented Sep 8, 2018

I will give it a try. So, your goal is for it to capture the correct crash info for you to further debug?

@billtlee

billtlee commented Sep 8, 2018

I tried to install it but got:

Untrusted kexts are not allowed
Kext with invalid signature (-67062) denied: /Library/StagedExtensions/System/Library/Extensions/7BC8F971-5866-4328-8506-3AE52F5CE475.kext
Bundle (/System/Library/Extensions/HuaweiDataCardDriver.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/7BC8F971-5866-4328-8506-3AE52F5CE475.kext
Unable to stage kext (/System/Library/Extensions/HuaweiDataCardDriver.kext) to secure location.
Untrusted kexts are not allowed
Kext with invalid signature (-2147409652) denied: /Library/StagedExtensions/System/Library/Extensions/D3F1BC9E-DF6F-4EDF-9DCB-1AA660F81FD5.kext
Bundle (/System/Library/Extensions/Pen Tablet.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/D3F1BC9E-DF6F-4EDF-9DCB-1AA660F81FD5.kext
Unable to stage kext (/System/Library/Extensions/Pen Tablet.kext) to secure location.
Untrusted kexts are not allowed
Kext with invalid signature (-67062) denied: /Library/StagedExtensions/System/Library/Extensions/93DC5BA7-765F-47E2-B262-DD9560C0B7D3.kext
Bundle (/System/Library/Extensions/USBExpressCardCantWake.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/93DC5BA7-765F-47E2-B262-DD9560C0B7D3.kext
Unable to stage kext (/System/Library/Extensions/USBExpressCardCantWake.kext) to secure location.
Untrusted kexts are not allowed
Kext with invalid signature (-67062) denied: /Library/StagedExtensions/System/Library/Extensions/193756DD-BF4A-47C6-90A0-397B43392B75.kext
Bundle (/System/Library/Extensions/OnKeyHIDOverrideDriver.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/193756DD-BF4A-47C6-90A0-397B43392B75.kext
Unable to stage kext (/System/Library/Extensions/OnKeyHIDOverrideDriver.kext) to secure location.

@raphaelning
Contributor

So, your goal is for it to capture the correct crash info for you to further debug?

Ah, not really. It's actually pretty easy to implement opcode 0x85, thanks to the scalable instruction emulator framework we have now. I just need to write a unit test for it before uploading the patch. But I didn't want to fix both issues in one shot, because then we would have no easy way to verify my fix for the host crash bug, unless your guest OS also happened to use another unsupported MMIO instruction.

It's weird that the KEXT load error you got does not mention intelhaxm.kext at all. I'll see if I can reproduce this problem on another Mac.

raphaelning added a commit that referenced this issue Sep 10, 2018
When decoding an instruction with an unsupported opcode (indicated
by the INSN_NOTIMPL flag), em_decode_insn() does not fail, which
can lead to a disaster in em_emulate_insn(), e.g. calling an
invalid handler function (soft_handler == NULL) and causing a host
kernel panic (#93).

1. In em_decode_insn(), check if it is unsupported, i.e. either
   the INSN_NOTIMPL flag is set or there is no emulation handler.
   If so, return a fatal error, raise a vCPU panic, and log the
   raw bytes of the instruction.
2. Before decoding a new instruction, reset the emulation context,
   so the old context is not accidentally referred to.
3. Add unit tests for some unsupported opcode cases. This requires
   refactoring EmulatorTest::run() first.

Signed-off-by: Yu Ning <yu.ning@intel.com>
raphaelning added a commit that referenced this issue Sep 10, 2018
Thanks to the scalable emulator design, implementing opcode 0x84
and 0x85 (TEST r/mN, rN) is quite simple. In addition, add a unit
test for the TEST instruction, and update one of the unimplemented
opcode unit tests to use XCHG instead of TEST.

Fixes #93.

Signed-off-by: Yu Ning <yu.ning@intel.com>
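
To illustrate what emulating TEST involves: the instruction ANDs its two operands, discards the result, and updates the flags (CF and OF cleared; ZF, SF and PF set from the result; AF left undefined). The sketch below shows those semantics for 32-bit operands. It is not the code from this commit; the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Bit positions of the affected flags in EFLAGS/RFLAGS. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/*
 * Returns the new flags value after "TEST dst, src" on 32-bit operands.
 * Neither operand is modified; only the flags change.
 */
static uint32_t test32_flags(uint32_t dst, uint32_t src, uint32_t flags)
{
    uint32_t result = dst & src;
    uint8_t low = (uint8_t)result;

    /* CF and OF are always cleared; ZF, SF, PF are recomputed below. */
    flags &= ~(FLAG_CF | FLAG_PF | FLAG_ZF | FLAG_SF | FLAG_OF);

    if (result == 0)
        flags |= FLAG_ZF;
    if (result & 0x80000000u)
        flags |= FLAG_SF;

    /* PF is set when the low byte has an even number of 1 bits. */
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    if ((low & 1) == 0)
        flags |= FLAG_PF;

    return flags;
}
```

For an MMIO operand, the emulator would first read dst from the device register and then apply the same flag update; no write back to the device is needed.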
raphaelning added a commit that referenced this issue Sep 10, 2018
When decoding an instruction with an unsupported opcode (indicated
by the INSN_NOTIMPL flag), em_decode_insn() does not fail, which
can lead to a disaster in em_emulate_insn(), e.g. calling an
invalid handler function (soft_handler == NULL) and causing a host
kernel panic (#93).

1. In em_decode_insn(), check if the opcode is unsupported, i.e.
   the INSN_NOTIMPL flag is set or there is no emulation handler.
   If so, return a fatal error, raise a vCPU panic, and log the
   raw bytes of the instruction.
2. Before decoding a new instruction, reset the emulation context,
   so the old context is not accidentally referred to.
3. Add unit tests for two unsupported opcode cases. This requires
   refactoring EmulatorTest::run() first.

Signed-off-by: Yu Ning <yu.ning@intel.com>
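
The idea behind this patch can be sketched as a decode step that refuses any opcode that is flagged as unimplemented or has no handler, so the emulate step never calls through a NULL pointer. The code below is a simplified illustration, not HAXM's implementation: only the names INSN_NOTIMPL and soft_handler come from the commit message, while the flag value, the table, and every other identifier are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define INSN_NOTIMPL 0x1u /* name from the commit message; value is made up */

enum { DECODE_OK = 0, DECODE_UNSUPPORTED = -1 };

typedef int (*soft_handler_t)(uint32_t *dst, uint32_t src);

struct opcode_entry {
    uint32_t       flags;
    soft_handler_t soft_handler;
};

/* Example handler for AND r/m32, r32 (opcode 0x21). */
static int handle_and(uint32_t *dst, uint32_t src)
{
    *dst &= src;
    return 0;
}

/* Hypothetical two-entry "table": one supported opcode, rest unsupported. */
static const struct opcode_entry *lookup_opcode(uint8_t opcode)
{
    static const struct opcode_entry and_entry = { 0, handle_and };
    static const struct opcode_entry notimpl_entry = { INSN_NOTIMPL, NULL };

    return opcode == 0x21 ? &and_entry : &notimpl_entry;
}

/* Decode fails early for unsupported opcodes instead of deferring. */
static int decode(uint8_t opcode, const struct opcode_entry **out)
{
    const struct opcode_entry *entry = lookup_opcode(opcode);

    if ((entry->flags & INSN_NOTIMPL) || entry->soft_handler == NULL)
        return DECODE_UNSUPPORTED;
    *out = entry;
    return DECODE_OK;
}

/* Emulation only runs a handler that decode() has already validated. */
static int emulate(uint8_t opcode, uint32_t *dst, uint32_t src)
{
    const struct opcode_entry *entry = NULL;
    int rc = decode(opcode, &entry);

    if (rc != DECODE_OK)
        return rc; /* the caller would report a guest-side failure here */
    return entry->soft_handler(dst, src);
}
```

In this sketch, emulate(0x85, ...) returns DECODE_UNSUPPORTED and leaves the destination untouched, which mirrors the intended outcome of a guest-side failure instead of a host kernel panic.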
@raphaelning
Contributor

raphaelning commented Sep 10, 2018

I'll see if I can reproduce this problem on another Mac.

I was able to load the KEXT on another MacBook, so the KEXT and the install instructions I provided should be fine, although I could add a few details. Since I've updated PR #95, let me update the signed KEXT as well:

i. haxm-PR95-patch1.tar.gz: This only includes (the latest revision of) my first patch. Therefore, it doesn't implement opcode 0x85, but should at least avoid the host kernel panic.
ii. haxm-PR95-patch2.tar.gz: This includes both patches. Therefore, it should emulate opcode 0x85 correctly.

More detailed instructions on testing these KEXTs:

  1. Start Console.app. In the Search box, type hax and ENTER. Select Action > Include Info Messages.
  2. Unload the official intelhaxm.kext (e.g. 7.3.0): sudo kextunload /Library/Extensions/intelhaxm.kext
  3. Create a new folder for the first KEXT.
  4. cd into that folder.
  5. sudo tar xpf /path/to/haxm-PR95-patch1.tar.gz
  6. Load the first KEXT using either kextload or kextutil: sudo kextload ./intelhaxm.kext
  7. Run QEMU (from another shell).
  8. Unload the first KEXT: sudo kextunload ./intelhaxm.kext
  9. Repeat steps 3-8 for the second KEXT.
  10. Restore the official intelhaxm.kext (e.g. 7.3.0): sudo kextload /Library/Extensions/intelhaxm.kext

@mborgerson Could you try these steps?

@billtlee I'm still not sure why you got those "invalid signature" errors. It's possible that our intelhaxm.kext was loaded successfully despite those errors. If you run kextstat | grep intelhaxm and get something like:

  181    0 0xffffff7f85c75000 0x3a000    0x3a000    com.intel.kext.intelhaxm (1) 54AF3CA4-09E9-385E-BC52-0AF3FAD7EB66 <7 5 4 3 1>

then the test KEXT is indeed loaded. Note that the (1) after com.intel.kext.intelhaxm indicates that this is a test KEXT. If you load intelhaxm.kext from an official HAXM release, you'll see a valid version number instead.

@billtlee

I don't know why I can't load the kernel extension...

Bills-MacBook-Pro:Downloads Bill$ sudo kextutil intelhaxm.kext
Untrusted kexts are not allowed
Kext with invalid signature (-67062) denied: /Library/StagedExtensions/System/Library/Extensions/B75BCD5E-256F-41C7-BA81-726EA5804284.kext
Bundle (/System/Library/Extensions/HuaweiDataCardDriver.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/B75BCD5E-256F-41C7-BA81-726EA5804284.kext
Unable to stage kext (/System/Library/Extensions/HuaweiDataCardDriver.kext) to secure location.
Untrusted kexts are not allowed
Kext with invalid signature (-2147409652) denied: /Library/StagedExtensions/System/Library/Extensions/C76505B4-46D5-43D2-A5D8-6CC32BF00429.kext
Bundle (/System/Library/Extensions/Pen Tablet.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/C76505B4-46D5-43D2-A5D8-6CC32BF00429.kext
Unable to stage kext (/System/Library/Extensions/Pen Tablet.kext) to secure location.
Untrusted kexts are not allowed
Kext with invalid signature (-67062) denied: /Library/StagedExtensions/System/Library/Extensions/8E17B261-07D5-44CD-949C-1F508664F1FC.kext
Bundle (/System/Library/Extensions/USBExpressCardCantWake.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/8E17B261-07D5-44CD-949C-1F508664F1FC.kext
Unable to stage kext (/System/Library/Extensions/USBExpressCardCantWake.kext) to secure location.
Untrusted kexts are not allowed
Kext with invalid signature (-67062) denied: /Library/StagedExtensions/System/Library/Extensions/DDF009D5-9819-4327-BD5E-43942B6F5358.kext
Bundle (/System/Library/Extensions/OnKeyHIDOverrideDriver.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/DDF009D5-9819-4327-BD5E-43942B6F5358.kext
Unable to stage kext (/System/Library/Extensions/OnKeyHIDOverrideDriver.kext) to secure location.
Kext rejected due to improper filesystem permissions: <OSKext 0x7fa8140ebde0 [0x7fff9c758af0]> { URL = "file:///Library/StagedExtensions/Users/Bill/Downloads/intelhaxm.kext/", ID = "com.intel.kext.intelhaxm" }
Diagnostics for intelhaxm.kext:
Authentication Failures:
File owner/permissions are incorrect (must be root:wheel, nonwritable by group/other):
/Library/StagedExtensions/Users/Bill/Downloads/intelhaxm.kext
Contents
_CodeSignature
CodeResources
MacOS
intelhaxm
Resources
English.lproj
InfoPlist.strings
readme
Info.plist

Bills-MacBook-Pro:Downloads Bill$ kextstat | grep intelhaxm
Bills-MacBook-Pro:Downloads Bill$

@raphaelning
Contributor

raphaelning commented Sep 10, 2018

File owner/permissions are incorrect (must be root:wheel, nonwritable by group/other):
/Library/StagedExtensions/Users/Bill/Downloads/intelhaxm.kext

OK, so this is the real error. Could you show me the output of:

ls -l /Library/StagedExtensions/Users/Bill/Downloads/intelhaxm.kext

My guess is that intelhaxm.kext is not owned by root:wheel, probably because you used Finder to extract the .tar.gz. To fix that, please make sure to extract the .tar.gz from the command line using the exact commands I give below:

  1. cd ~/Downloads/
  2. mkdir haxm-PR95-patch1
  3. cd haxm-PR95-patch1/
  4. sudo tar xpf ~/Downloads/haxm-PR95-patch1.tar.gz

In particular, sudo and the p (preserve) flag for tar are both very important. Now, if you run ls -l ~/Downloads/haxm-PR95-patch1/intelhaxm.kext/, you should see that it's owned by root:wheel.
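
The rule quoted in the error message (owner root:wheel, not writable by group or other) can be checked programmatically. A minimal sketch follows, assuming root is UID 0 and wheel is GID 0 as on a stock macOS install. The function names are hypothetical, and it checks a single path only, whereas the error output above lists every file inside the bundle.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* True if the owner is root:wheel and group/other cannot write. */
static bool kext_perms_ok(uid_t uid, gid_t gid, mode_t mode)
{
    return uid == 0 && gid == 0 && (mode & (S_IWGRP | S_IWOTH)) == 0;
}

/*
 * Applies the check to one file or directory.
 * Returns false if the path cannot be inspected at all.
 */
static bool kext_path_ok(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;
    return kext_perms_ok(st.st_uid, st.st_gid, st.st_mode);
}
```

A directory listed as drwxr-xr-x root wheel (mode 0755) passes this check, while the same directory owned by a regular user or with mode 0775 does not.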

@billtlee

Bills-MacBook-Pro:Downloads Bill$ sudo tar xpf haxm-PR95-patch2.tar.gz
Bills-MacBook-Pro:Downloads Bill$ sudo kextunload ./intelhaxm.kext
(kernel) Kext com.intel.kext.intelhaxm not found for unload request.
Failed to unload com.intel.kext.intelhaxm - (libkern/kext) not found.
Bills-MacBook-Pro:Downloads Bill$ sudo kextload ./intelhaxm.kext
/Users/Bill/Downloads/intelhaxm.kext failed to load - (libkern/kext) authentication failure (file ownership/permissions); check the system/kernel logs for errors or try kextutil(8).
Bills-MacBook-Pro:Downloads Bill$

@billtlee

Bills-MacBook-Pro:Downloads Bill$ ls -l intelhaxm.kext
total 0
drwxr-xr-x@ 6 root wheel 192 Sep 10 16:25 Contents
Bills-MacBook-Pro:Downloads Bill$

@billtlee

I'm on High Sierra; does that matter? I read that High Sierra blocks third-party kernel extensions.

@raphaelning
Contributor

I'm also on High Sierra, so that's not the problem.

I think macOS caches the KEXT you've failed to load in /Library/StagedExtensions/, and may just ignore any change you make to the KEXT (e.g. ownership) as long as the KEXT remains in the same place (in this case ~/Downloads/intelhaxm.kext). That's why I recommended that you create a new folder first, and re-extract the .tar.gz there. Could you try to stick to the exact commands I gave you above?

@billtlee

Bills-MacBook-Pro:Downloads Bill$ mkdir temp
Bills-MacBook-Pro:Downloads Bill$ mv haxm-PR95-patch2.tar.gz temp
Bills-MacBook-Pro:Downloads Bill$ cd temp
Bills-MacBook-Pro:temp Bill$ sudo tar xpf haxm-PR95-patch2.tar.gz
Password:
Bills-MacBook-Pro:temp Bill$ ls -l intelhaxm.kext
total 0
drwxr-xr-x@ 6 root wheel 192 Sep 10 16:25 Contents
Bills-MacBook-Pro:temp Bill$ sudo kextutil intelhaxm.kext
Untrusted kexts are not allowed
Kext with invalid signature (-67062) denied: /Library/StagedExtensions/System/Library/Extensions/45C8AC01-B092-42CC-9602-0E0212CA4FE4.kext
Bundle (/System/Library/Extensions/HuaweiDataCardDriver.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/45C8AC01-B092-42CC-9602-0E0212CA4FE4.kext
Unable to stage kext (/System/Library/Extensions/HuaweiDataCardDriver.kext) to secure location.
Untrusted kexts are not allowed
Kext with invalid signature (-2147409652) denied: /Library/StagedExtensions/System/Library/Extensions/D6AEE54F-F0FB-48CE-8E59-F74140FDF8D8.kext
Bundle (/System/Library/Extensions/Pen Tablet.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/D6AEE54F-F0FB-48CE-8E59-F74140FDF8D8.kext
Unable to stage kext (/System/Library/Extensions/Pen Tablet.kext) to secure location.
Untrusted kexts are not allowed
Kext with invalid signature (-67062) denied: /Library/StagedExtensions/System/Library/Extensions/3D68E2B4-01EA-413D-8080-0B7DF8948CD2.kext
Bundle (/System/Library/Extensions/USBExpressCardCantWake.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/3D68E2B4-01EA-413D-8080-0B7DF8948CD2.kext
Unable to stage kext (/System/Library/Extensions/USBExpressCardCantWake.kext) to secure location.
Untrusted kexts are not allowed
Kext with invalid signature (-67062) denied: /Library/StagedExtensions/System/Library/Extensions/BC903C55-ED53-435A-A417-7F98402B7C49.kext
Bundle (/System/Library/Extensions/OnKeyHIDOverrideDriver.kext) failed to validate, deleting: /Library/StagedExtensions/System/Library/Extensions/BC903C55-ED53-435A-A417-7F98402B7C49.kext
Unable to stage kext (/System/Library/Extensions/OnKeyHIDOverrideDriver.kext) to secure location.
Bills-MacBook-Pro:temp Bill$ kextstat | grep intelhaxm
172 0 0xffffff7f84499000 0x3a000 0x3a000 com.intel.kext.intelhaxm (1) 528B3049-007C-3B7C-A838-C117FBCF7537 <7 5 4 3 1>

Does this mean it successfully installed?

@raphaelning
Contributor

Yes it does, congratulations :) If you had started Console.app and prepared it for capturing HAXM logs before the kextutil command, you should now see some output there.

@billtlee

I only see the following in the Library/Extensions folder though:

ACS6x.kext
ArcMSR.kext
ATTOCelerityFC8.kext
ATTOExpressSASHBA2.kext
ATTOExpressSASRAID2.kext
BJUSBLoad.kext
CalDigitHDProDrv.kext
CIJUSBLoad.kext
EPSONUSBPrintClass.kext
HighPointIOP.kext
HighPointRR.kext
PromiseSTEX.kext
SoftRAID.kext

@raphaelning
Contributor

I believe kextutil/kextload installs the KEXT in /Library/StagedExtensions/, only for this session (meaning that the KEXT won't be automatically loaded after a reboot). Do you really want it to be installed in /Library/Extensions/ and loaded at system startup? I thought the purpose was just to test this KEXT and see if it fixes the kernel panic issue that @mborgerson reported. Actually, do you have the setup to reproduce that issue?

@billtlee

I have been experiencing consistent kernel panic crashes, and Apple Support suggested I remove intelhaxm from /Library/Extensions. After I removed it, the crashes appear to have stopped. I stumbled upon this post and wanted to see if the new versions address my crash, since I need to use the Android Emulator... I'm not 100% sure my issue is the same as mborgerson's.

@raphaelning
Contributor

Ah, I just saw your comment on #88. Let's discuss your issue there.

@wcwang wcwang closed this as completed in e836110 Sep 12, 2018