
BSOD SYSTEM_SERVICE_EXCEPTION gvm.sys with qemu #14

Open
thesword53 opened this issue Jun 4, 2020 · 36 comments


@thesword53

Host system gets BSOD when guest (Windows 7) also gets BSOD or during boot.


Systems tested:

cpu: AMD Ryzen 7 3700X
host: Windows 10 Pro
guest: Windows 7 Ultimate


cpu: Intel Core i7-4810MQ
host: Arch Linux (KVM with nested virtualization)
guest1: Windows 7 Ultimate (with gvm installed)
guest2: Windows 7 Ultimate

[Attached image: 20200601_155035]

@Taogle2018
Collaborator

What is the hypervisor used in AMD Ryzen Win10 Pro?

@Taogle2018
Collaborator

And for the Intel case, how can you run gvm on Intel with Android Emulator?

@thesword53
Author

What is the hypervisor used in AMD Ryzen Win10 Pro?

gvm

And for the Intel case, how can you run gvm on Intel with Android Emulator?

I didn't use Android Emulator, I used qemu with gvm acceleration: https://github.com/qemu-gvm/qemu-gvm #5 (qemu-system-x86_64 -accel gvm ...) with Windows 7 as guest.

@Taogle2018
Collaborator

OK. I just realized that you are not using Android Emulator.
Thanks for the bug report. I have only tried Ubuntu 18.04 myself when using gvm as a generic solution. I tried to install Windows 10, but the guest hangs. Using this as a generic hypervisor is possible, but I have not had much time to work on that. It is not on the project plan yet.
I will still try to see if I can fix this. However, please do not set any expectation on when. :)

@thesword53
Author

thesword53 commented Jun 6, 2020

Windows 7 and Windows 10 don't work with SeaBIOS. I have to use OVMF UEFI.
Your GVM hypervisor works better than WHPX on qemu, because I am not able to boot Windows 7 at all with WHPX.

@Taogle2018
Collaborator

Thanks for the tips. I tried UEFI and now I could install Win7 and Win10. Your information helped me a lot. Here is my result.
My system: Ryzen 2700, Host Win10 2004 Pro, Guest Win7 SP1 Ultimate. I did a fresh install and Win7 booted normally.
Any special operations that can trigger the BSOD?

@thesword53
Author

thesword53 commented Jun 8, 2020

Any special operations that can trigger the BSOD?

Boot a Windows 7 VM and trigger a BSOD on the guest (kill the csrss.exe process, for example). Your host will also get a BSOD.

@Taogle2018
Collaborator

I tried but I could not reproduce. When I triggered a crash using NotMyFault from sysinternals, the guest got a crashdump and rebooted. The host is not impacted. It is weird that the BSOD screen does not show inside the guest so it will look like a hang.
I am wondering if there is a way to share your crash dump with me?

@thesword53
Author

Here is the crashdump: https://drive.google.com/file/d/1Rrh4qH_-ki1PGLU-DVvajkUsNADPN4OA/view?usp=sharing

The host is not impacted. It is weird that the BSOD screen does not show inside the guest so it will look like a hang.

You need to wait a bit and the host will crash.

@Taogle2018
Collaborator

Thanks for the crash dump. It does look like a "use-after-free" issue. I will come back when I find out the reason.

@thesword53
Author

Here is the memory dump (~700MB): https://drive.google.com/file/d/1qTHQy2uQyN1KzqbJ4rutel9R8m8N9uzK/view?usp=sharing. I found the stack trace with WinDbg, but I don't have the symbol names for gvm.

STACK_TEXT:
fffff880052d9520 fffff88003b035da : fffffa80080f5000 0000000000000003 0000000000000000 fffffa8007d14aa0 : gvm+0x11007
fffff880052d9580 fffff88003b09ba3 : fffffa80080f5000 0000000000186a76 0000000000000000 000000000000008e : gvm+0xf5da
fffff880052d9620 fffff88003b0538f : 000000027eeee000 0000000000000000 0000000000000001 0000000000186a76 : gvm+0x15ba3
fffff880052d9680 fffff88003b14804 : 0000000000000000 0000000000000000 0000000000000000 fffffa80080f5000 : gvm+0x1138f
fffff880052d96d0 fffff88003b167a1 : 0000000000000000 0000000000000000 0000000000000081 fffffa80080f5000 : gvm+0x20804
fffff880052d9740 fffff88003b28340 : 0000000000000000 00000000fffffffb 00000000fffffffb 0000000000002c20 : gvm+0x227a1
fffff880052d9770 fffff88003b283f0 : 0000000000000000 fffffa80080f5000 fffff880052d9b60 fffffa800879fc20 : gvm+0x34340
fffff880052d97e0 fffff88003b2433f : fffffa80080f5000 0000000000000000 fffffa80080f5150 0000000000000001 : gvm+0x343f0
fffff880052d9810 fffff88003b2c43c : fffffa80080f5000 fffff880052d9b60 0000000000000000 fffffa80080f5110 : gvm+0x3033f
fffff880052d9840 fffff88003b29171 : fffffa80080f9f20 fffff880052d9918 fffff880052d9968 fffff800028e704a : gvm+0x3843c
fffff880052d9890 fffff80002d092b5 : fffffa8007c5b3d0 fffff88002f1e180 fffffa8007c5b490 0000000000000000 : gvm+0x35171
fffff880052d98c0 fffff80002b9b5d6 : fffff8a000009b80 0000000000000000 0000000000000000 0000000000000000 : nt!IopXxxControlFile+0x6d5
fffff880052d9a00 fffff800028f2bd3 : 0000000000000000 0000000000000000 0000000000000000 0000000008e6fb20 : nt!NtDeviceIoControlFile+0x56
fffff880052d9a70 0000000076fb98fa : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiSystemServiceCopyEnd+0x13
0000000008e6fa68 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : 0x76fb98fa
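
Since the gvm frames in the trace above appear only as `gvm+0x...` offsets, one way to prepare them for lookup against a matching symbol file is to pull out the module and offset of each frame. The following is a minimal sketch, not part of gvm or WinDbg; the function name `parse_frames` and the regular expression are assumptions made for illustration, and the sample lines are copied from the trace above.

```python
import re

# Matches the final field of a STACK_TEXT line: "module+0xOFFSET"
# or "module!Symbol+0xOFFSET". Frames with a bare address are skipped.
FRAME_RE = re.compile(
    r":\s*(?P<module>[\w.]+?)(?:!(?P<symbol>\w+))?\+0x(?P<offset>[0-9a-fA-F]+)\s*$"
)

def parse_frames(stack_text):
    """Return (module, symbol_or_None, offset) for each resolvable frame."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (match["module"], match["symbol"], int(match["offset"], 16))
            )
    return frames

# A few lines copied from the stack trace in this comment.
SAMPLE = """\
fffff880052d9520 fffff88003b035da : fffffa80080f5000 0000000000000003 0000000000000000 fffffa8007d14aa0 : gvm+0x11007
fffff880052d9890 fffff80002d092b5 : fffffa8007c5b3d0 fffff88002f1e180 fffffa8007c5b490 0000000000000000 : gvm+0x35171
fffff880052d98c0 fffff80002b9b5d6 : fffff8a000009b80 0000000000000000 0000000000000000 0000000000000000 : nt!IopXxxControlFile+0x6d5
0000000008e6fa68 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : 0x76fb98fa
"""

if __name__ == "__main__":
    for module, symbol, offset in parse_frames(SAMPLE):
        name = f"{module}!{symbol}" if symbol else module
        print(f"{name} +{offset:#x}")
```

The unsymbolized `gvm` offsets collected this way are the ones that would need the driver's symbols to be turned into function names.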

@Taogle2018
Collaborator

Symbols for 1.5 can be downloaded here. FYI.
https://1drv.ms/u/s!AljlID0ntVyugehHeyCgYHkiJSUAew?e=JekxoT
Thanks for sharing the dump.

@Taogle2018
Collaborator

Taogle2018 commented Jul 6, 2020

I think #23 is probably the same issue, although I have not gotten that dump yet.

@thesword53
Author

@Taogle2018
Collaborator

https://1drv.ms/u/s!AljlID0ntVyugehxayBpYN3uOnXidw?e=ZQ1cuo
Can you try this build and see if it fixes the problem?

@thesword53
Author

https://1drv.ms/u/s!AljlID0ntVyugehxayBpYN3uOnXidw?e=ZQ1cuo
Can you try this build and see if it fixes the problem?

I can't boot the Windows 7 guest at all with this build. "Starting Windows" shows up and then the screen becomes black.

@Taogle2018
Collaborator

OK. I will do another build for you, will be back later.

@Taogle2018
Collaborator

https://1drv.ms/u/s!AljlID0ntVyugehyHXoKYGgtriDJrA?e=1lAXXh
Can you try this one? This build is exactly v1.5 + intended fix, removing any other irrelevant patches from the former build.

@thesword53
Author

https://1drv.ms/u/s!AljlID0ntVyugehyHXoKYGgtriDJrA?e=1lAXXh
Can you try this one? This build is exactly v1.5 + intended fix, removing any other irrelevant patches from the former build.

I have the same issue with this build. The guest seems to get a BSOD but the screen is black.

@Taogle2018
Collaborator

Thanks. It is hard to guess the reason, as this is actually a one-line change, which should not alter guest behavior.
Let me explore more before getting back.

@Taogle2018
Collaborator

I've tried to install and run a Windows 7 64 guest successfully with both builds. The commandline options are "-accel gvm -cpu host -m 8G -smp cores=8 -hda=win7.file -sdl".
It is weird that these builds gave you a black guest screen. So let me confirm: the guest is OK when using the 1.5 release but turns to a black screen when switching to one of these two testing builds.
If that's the case, I can build another one that is exactly the same as v1.5. This will help us to identify anything changed in my local build system. Otherwise, I really cannot think of a reason why.
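
For anyone trying to reproduce with the same configuration, the options quoted above can be assembled programmatically so that the firmware choice is the only thing that varies between runs. This is only a sketch under assumptions: the helper name `qemu_gvm_command` and the `OVMF.fd` path are hypothetical, `-accel gvm` is taken from this thread and requires the qemu-gvm build, and the disk image is passed as a separate argument to `-hda`, which is the form stock QEMU accepts.

```python
import shlex

def qemu_gvm_command(disk, firmware=None, memory="8G", cores=8):
    """Build an argv list mirroring the options quoted in this thread.

    disk     -- path to the guest disk image (hypothetical file name)
    firmware -- optional firmware image passed via -bios, e.g. an OVMF
                build; when omitted, QEMU uses its default (SeaBIOS)
    """
    argv = [
        "qemu-system-x86_64",
        "-accel", "gvm",      # only available in the qemu-gvm fork
        "-cpu", "host",
        "-m", memory,
        "-smp", f"cores={cores}",
        "-hda", disk,
        "-sdl",
    ]
    if firmware is not None:
        argv += ["-bios", firmware]
    return argv

if __name__ == "__main__":
    # Print both variants so they can be compared or pasted into a shell.
    print(shlex.join(qemu_gvm_command("win7.file")))
    print(shlex.join(qemu_gvm_command("win7.file", firmware="OVMF.fd")))
```

Keeping the two command lines identical apart from `-bios` makes it easier to tell whether a difference in guest behavior comes from the firmware or from the gvm build.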

@thesword53
Author

I've tried to install and run a Windows 7 64 guest successfully with both builds. The commandline options are "-accel gvm -cpu host -m 8G -smp cores=8 -hda=win7.file -sdl".
It is weird that these builds gave you a black guest screen. So let me confirm: the guest is OK when using the 1.5 release but turns to a black screen when switching to one of these two testing builds.
If that's the case, I can build another one that is exactly the same as v1.5. This will help us to identify anything changed in my local build system. Otherwise, I really cannot think of a reason why.

This issue only happened if I use OVMF UEFI with Windows 7.

@Taogle2018
Collaborator

I also used OVMF UEFI bios. So OVMF UEFI with Windows 7 can work with gvm v1.5, but cannot work with the two builds I sent. Right?

@thesword53
Author

I also used OVMF UEFI bios. So OVMF UEFI with Windows 7 can work with gvm v1.5, but cannot work with the two builds I sent. Right?

Yes

@thesword53
Author

I tested GVM 1.6 and I can't boot any Windows OS.

  • Windows XP SP3 (SeaBIOS):
    GVM internal error. Suberror: 1
    emulation failure
    EAX=80000011 EBX=00067ff2 ECX=00010080 EDX=00000001
    ESI=00061dfa EDI=00007ffa EBP=00060dcc ESP=00067ff2
    EIP=00000255 EFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
    CS =2000 00020000 0000ffff 00009b00 DPL=0 CS16 [-RA]
    SS =22f4 00022f40 0000ffff 00009300 DPL=0 DS16 [-WA]
    DS =22f4 00022f40 0000ffff 00009300 DPL=0 DS16 [-WA]
    FS =0030 00000300 0000ffff 00009300 DPL=0 DS16 [-WA]
    GS =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
    LDT=0000 00000000 000fffff 00000000
    TR =0028 00024470 00000077 00008b00 DPL=0 TSS32-busy
    GDT= 0003f000 000003ff
    IDT= 0003f400 000007ff
    CR0=80000011 CR2=00000000 CR3=00000000 CR4=00000000
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000000
    Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  • Windows 7 (SeaBIOS): Stuck on "Windows is loading files..."

  • Windows 7 (OVMF): Black screen after "Starting Windows..." (I think it's a BSOD)

  • Windows 8 (OVMF): BSOD (system_thread_exception_not_handled) during loading screen

  • Windows 10 (OVMF): System reboots during boot screen

With gvm 1.5 I was able to start Windows 7 (OVMF only) and Windows 10

I will look for Linux guests.
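
To make the register dump in the Windows XP case above easier to read, the control and flag registers can be decoded into named bits. The sketch below is illustrative only: the helper names are assumptions, the bit positions follow the standard x86 definitions of CR0 and EFLAGS, and it says nothing about why the emulation failure occurs.

```python
import re

# Standard x86 bit positions for CR0 and EFLAGS.
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
EFLAGS_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
               9: "IF", 10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM"}

def parse_registers(dump):
    """Collect NAME=HEX pairs (8 to 16 hex digits) from a QEMU register dump."""
    pairs = re.findall(r"\b(\w+)=([0-9a-fA-F]{8,16})\b", dump)
    return {name: int(value, 16) for name, value in pairs}

def set_flags(value, names):
    """Return the names of the bits that are set, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]

# Two lines copied from the dump in this comment.
DUMP = """\
EIP=00000255 EFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
CR0=80000011 CR2=00000000 CR3=00000000 CR4=00000000
"""

if __name__ == "__main__":
    regs = parse_registers(DUMP)
    print("CR0:", set_flags(regs["CR0"], CR0_BITS))     # ['PE', 'ET', 'PG']
    print("EFL:", set_flags(regs["EFL"], EFLAGS_BITS))  # ['PF', 'SF', 'RF']
```

Decoded this way, the dump shows protected mode and paging both enabled (CR0.PE and CR0.PG) while CR3 reads as zero, which may be a useful detail when comparing against a working gvm 1.5 run.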

@thesword53
Author

I tested Linux (Ubuntu 16.04 and Ubuntu 19.10) and it works, but I get lots of hardware errors (machine check exceptions) in the guest.

@Taogle2018
Collaborator

On your Intel or AMD, btw?

@thesword53
Author

On your Intel or AMD, btw?

Intel

@Taogle2018
Collaborator

Perhaps I should find a similar CPU and do a test. Are you still using nested virtualization with Arch Linux?

@thesword53
Author

Perhaps I should find a similar CPU and do a test. Are you still using nested virtualization with Arch Linux?

Yes I am using nested virtualization with an Intel Core i7-4810MQ (Haswell). I can't test gvm with my AMD computer now because I'm not at home.

@thesword53
Author

I tested GVM 1.6 on AMD and it works. On Intel with OVMF, I get a BSOD on guest (system_thread_exception_not_handled) with Windows 7/8/10. I think it's related to 4edc540.

@Taogle2018
Collaborator

I feel the same, as the change may surprise KVM. It is hard to tell whether this exposes a KVM bug, as it does work natively on my Intel machine. But right now, I am too busy to work on this.

@thesword53
Author

I tried to compile GVM, and the Intel issue seems to be caused by c2693c9

@Taogle2018
Collaborator

Taogle2018 commented Sep 22, 2020 via email

Taogle2018 added a commit that referenced this issue Sep 22, 2020
This seems to break nested run on top of KVM. See
#14

This reverts commit defbb8dbb2797ef76177ea5a188249225a0e8021.
@Taogle2018
Collaborator

Hi, 1.7 is released and c2693c9 is reverted.

@thesword53
Author

Hi Taogle2018,

I was wrong. Reverting c2693c9 didn't solve the issue.

I also tested an Arch Linux VM and I got a kernel panic: "MCA architectural violation!"
The panic occurs here: https://github.com/torvalds/linux/blob/v5.16/arch/x86/kernel/cpu/mce/core.c#L361 in ex_handler_msr_mce
I think 4edc540 does something wrong with MSRs in a nested VM.

I also tested GVM on an Intel host PC, and Windows 7/10 and Arch Linux all boot as guests.
