OpenBSD 6.4 panic on boot (QEMU) #172

Open
polprog opened this issue Feb 14, 2019 · 19 comments

6 participants
@polprog
Contributor

commented Feb 14, 2019

Describe the Bug

Summary: OpenBSD 6.4 panics on boot (#GP code 4)

Host Environment

  • HAXM version: 7.4.1 built from emulators/haxm
  • Host OS version: NetBSD 8.0/amd64
  • Host OS architecture: x86_64
  • Host CPU model: Core i3-3220
  • Host RAM size: 8GB
  • (Optional) Host computer model: Dell Optiplex 7010
  • QEMU version: 3.1.0

Guest Environment

  • Guest OS version: OpenBSD 6.4
  • Guest OS architecture: x86

To Reproduce

Steps to reproduce the behavior:

  1. Boot in qemu with default options and HAX enabled
    qemu-system-i386 -accel hax -cdrom OPENBSD-cd64.iso

Expected Behavior

Expectation: Guest OS boots like on a physical machine

Reproducibility
Always

Diagnostic Information
See screenshot

Host crash dump: n/a

HAXM log (in dmesg): No useful information. It contains only the startup version info, "HAX_LOWMEM_4G ignored", and hax_teardown_vm, even though we have already recompiled HAXM with the noisiest log level.

Android Emulator or QEMU log:
The only message states that HAXM is working and emulator runs in fast virt mode

Screenshots:
Panic screenshot

Additional context

QEMU 3.1.0 is known to not work "well" with HAX; I will try to downgrade to 3.0.0 and test it there.
I'm working on porting HAXM to NetBSD with Kamil Rytarowski

@raphaelning

Contributor

commented Feb 15, 2019

Thanks for the detailed report.

The screenshot indicates the guest panic is caused by "trap type 4", which I believe is the #OF (Overflow) exception. According to Intel SDM Vol. 3A, Section 6.15 (Exception and Interrupt Reference), #OF can only be triggered by the INTO instruction. So I'd say that the first step is to disassemble the guest code around the faulting EIP (pc=d03a4861) and try to understand what the correct behavior should be. Maybe you could launch QEMU with -accel tcg -s -S (HAXM off, GDB server on), and then set a breakpoint at the faulting EIP?

@raphaelning added the bug label Feb 15, 2019

@raphaelning

Contributor

commented Feb 15, 2019

> I'm working on porting HAXM to NetBSD with Kamil Rytarowski

That's great! BTW, we've never tested OpenBSD and Solaris guests before, so the bugs you have run into are probably in the HAXM core. Therefore, I'd really appreciate your effort to track them down and perhaps eventually fix them, which will benefit all HAXM users :-)

@polprog

Contributor Author

commented Feb 15, 2019

I connected to QEMU with GDB and extracted the kernel ELF to load the symbol table from it.

This is the offending function: cpu_paenable (sic! not pae enable).
The instruction at the address shown in the screenshot is the RDMSR here.

(gdb) disas 0xd03a4861
Dump of assembler code for function cpu_paenable:
   0xd03a482c <+0>:	mov    $0xffffffff,%eax  
   0xd03a4831 <+5>:	testl  $0x40,0xd096d010  
   0xd03a483b <+15>:	je     0xd03a488d <cpu_paenable+97>  
   0xd03a483d <+17>:	push   %esi  
   0xd03a483e <+18>:	push   %edi  
   0xd03a483f <+19>:	mov    0xc(%esp),%esi  
   0xd03a4843 <+23>:	mov    %cr3,%edi  
   0xd03a4846 <+26>:	or     $0xfe0,%edi  
   0xd03a484c <+32>:	mov    %edi,%cr3  
   0xd03a484f <+35>:	add    $0xd0000000,%edi
   0xd03a4855 <+41>:	mov    $0x8,%ecx
   0xd03a485a <+46>:	rep movsl %ds:(%esi),%es:(%edi)
   0xd03a485c <+48>:	mov    $0xc0000080,%ecx
   0xd03a4861 <+53>:	rdmsr  
   0xd03a4863 <+55>:	or     $0x800,%eax
   0xd03a4868 <+60>:	wrmsr  
   0xd03a486a <+62>:	mov    %cr4,%eax
   0xd03a486d <+65>:	or     $0x20,%eax
   0xd03a4870 <+68>:	mov    %eax,%cr4
   0xd03a4873 <+71>:	mov    0xc(%esp),%eax
   0xd03a4877 <+75>:	sub    $0xd0000000,%eax
   0xd03a487c <+80>:	mov    %eax,%cr3
   0xd03a487f <+83>:	mov    $0x4000,%eax
   0xd03a4884 <+88>:	mov    %eax,0xd096d0b8
   0xd03a4889 <+93>:	xor    %eax,%eax
   0xd03a488b <+95>:	pop    %edi
   0xd03a488c <+96>:	pop    %esi
   0xd03a488d <+97>:	ret    
End of assembler dump.
(gdb) 

I can't find out which MSR this is. If you could also tell me where I can look that up, I'd be grateful :)

@VelocityRa


commented Feb 16, 2019

It's the C000_0080h (EFER) MSR. Both the SDM and AMD's docs have an index for MSRs, you can find them easily there.

So, not sure if that helps but the code at that point is enabling no-execute page protection.
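
For reference, a minimal sketch of the EFER bits involved, matching the `or $0x800,%eax` in the disassembly. The bit positions are the architectural ones from the Intel and AMD manuals; the macro and function names are illustrative only and are not taken from HAXM or OpenBSD.

```c
#include <stdint.h>

/* MSR index loaded into ECX before RDMSR/WRMSR in cpu_paenable. */
#define MSR_EFER_INDEX 0xC0000080u

/* Architectural EFER bits (names are illustrative). */
#define EFER_SCE (1u << 0)   /* SYSCALL enable */
#define EFER_LME (1u << 8)   /* long mode enable */
#define EFER_LMA (1u << 10)  /* long mode active (status bit) */
#define EFER_NXE (1u << 11)  /* no-execute enable, i.e. 0x800 */

/* Mirrors "or $0x800,%eax": set NXE in the low half of EFER. */
static uint32_t efer_low_with_nxe(uint32_t efer_low)
{
    return efer_low | EFER_NXE;
}
```

RDMSR returns the low half of the MSR in EAX, which is why the guest only needs to touch EAX before writing the value back with WRMSR.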

@krytarowski

Contributor

commented Feb 16, 2019

How do I get OPENBSD-cd64.iso? I cannot find any reference to such a file.

Is this the same as pub/OpenBSD/6.4/amd64/cd64.iso?

        case IA32_EFER: {   
            if (!(state->_cr4 & CR4_PAE) && (state->_cr0 & CR0_PG)) {
                r = 1; /// <- returns this
            } else {  
                *val = state->_efer;
            }
            break;
        }

-- core/vcpu.c

If I read it correctly, we end up in the branch that injects an exception.

static int exit_msr_read(struct vcpu_t *vcpu, struct hax_tunnel *htun)
{
    struct vcpu_state_t *state = vcpu->state;
    uint32_t msr = state->_ecx;
    uint64_t val;

    htun->_exit_reason = vmx(vcpu, exit_reason).basic_reason;
 
    if (!handle_msr_read(vcpu, msr, &val)) {
        state->_rax = val & 0xffffffff;
        state->_rdx = (val >> 32) & 0xffffffff;
    } else {
        hax_inject_exception(vcpu, VECTOR_GP, 0);  /// <-- here!
        return HAX_RESUME;
    }

    advance_rip(vcpu);
    return HAX_RESUME;
}

-- core/vcpu.c
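
The register split in exit_msr_read follows the RDMSR convention: the 64-bit MSR value comes back in EDX:EAX. A standalone sketch of just that split, with a hypothetical struct and function name that are not part of HAXM:

```c
#include <stdint.h>

/* Hypothetical stand-in for the two guest registers that RDMSR writes. */
struct rdmsr_result {
    uint64_t rax;  /* low 32 bits of the MSR value */
    uint64_t rdx;  /* high 32 bits of the MSR value */
};

/* Split a 64-bit MSR value the way exit_msr_read does on success. */
static struct rdmsr_result split_msr_value(uint64_t val)
{
    struct rdmsr_result r;
    r.rax = val & 0xffffffffu;
    r.rdx = (val >> 32) & 0xffffffffu;
    return r;
}
```

On the failure path nothing is written to the registers and the instruction pointer is not advanced, so the guest sees the fault at the RDMSR itself. That matches the faulting address in the panic.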

@krytarowski

Contributor

commented Feb 16, 2019

It looks like OpenBSD/i386 uses PAE in order to enable "no execute" mapping.

slide 7

https://www.openbsd.org/papers/hackfest2015-w-xor-x.pdf
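
Some background on why PAE is needed for this: the execute-disable flag is bit 63 of a paging-structure entry, and only the 64-bit entries used by PAE (and 64-bit) paging have such a bit; the 32-bit entries of legacy paging do not. A small sketch, with names chosen for illustration only:

```c
#include <stdint.h>
#include <stdbool.h>

#define PTE_PRESENT (1ull << 0)
#define PTE_NX      (1ull << 63)  /* execute-disable, usable only when EFER.NXE = 1 */

/* A PAE page-table entry is 64 bits wide, so bit 63 exists. */
typedef uint64_t pae_pte_t;

static pae_pte_t pae_pte_make_noexec(pae_pte_t pte)
{
    return pte | PTE_NX;
}

static bool pae_pte_is_noexec(pae_pte_t pte)
{
    return (pte & PTE_NX) != 0;
}
```

With EFER.NXE clear, bit 63 is a reserved bit in these entries, which is presumably why cpu_paenable sets NXE before switching CR4.PAE on.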

@krytarowski

Contributor

commented Feb 16, 2019

On the other hand, the amd64 guest works.

@krytarowski

Contributor

commented Feb 16, 2019

ENTRY(cpu_paenable)
	movl	$-1, %eax
	testl	$CPUID_PAE, _C_LABEL(cpu_feature)
	jz	1f

	pushl	%esi
	pushl	%edi
	movl	12(%esp), %esi
	movl	%cr3, %edi
	orl	$0xfe0, %edi    /* PDPT will be in the last four slots! */
	movl	%edi, %cr3
	addl	$KERNBASE, %edi /* and make it back virtual again */
	movl	$8, %ecx
	rep
	movsl

	movl	$MSR_EFER, %ecx
	rdmsr                                  #  !!! HAXM exception! because PAE is disabled
	orl	$EFER_NXE, %eax
	wrmsr

	movl	%cr4, %eax
	orl	$CR4_PAE, %eax
	movl	%eax, %cr4      /* BANG!!! */

http://src.illumos.org/source/xref/openbsd-src/sys/arch/i386/i386/locore.s#1617
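
The constants in that listing can be checked with a little arithmetic. A PAE page-directory-pointer table has four 8-byte entries, so placing it in "the last four slots" of a 4 KiB page puts it at offset 0xfe0, and copying it takes eight 32-bit moves. The sketch below only restates that arithmetic; the macro and function names are made up for illustration.

```c
#include <stdint.h>

#define PG_SIZE_4K       0x1000u
#define PDPT_ENTRIES     4u   /* PAE page-directory-pointer table */
#define PDPT_ENTRY_SIZE  8u   /* each entry is 64 bits */

/* Offset of a PDPT stored in the last four 8-byte slots of a page. */
static uint32_t pdpt_offset_in_page(void)
{
    return PG_SIZE_4K - PDPT_ENTRIES * PDPT_ENTRY_SIZE;
}

/* Number of 32-bit words "rep movsl" has to copy to fill the PDPT. */
static uint32_t pdpt_size_in_dwords(void)
{
    return PDPT_ENTRIES * PDPT_ENTRY_SIZE / 4u;
}
```

As far as I can tell from the SDM's CR3 layout tables, bits 11:5 of CR3 are ignored under 32-bit paging but are part of the PDPT base under PAE paging, so OR-ing 0xfe0 into CR3 ahead of time is harmless until CR4.PAE is actually set.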

@krytarowski

Contributor

commented Feb 16, 2019

So, what's the rationale (x86 specification reference) to prevent reads of EFER with PAE disabled?

With the following change I can get OpenBSD to start booting.

diff --git a/core/vcpu.c b/core/vcpu.c
index 14990c8..a81811d 100644
--- a/core/vcpu.c
+++ b/core/vcpu.c
@@ -3284,11 +3284,11 @@ static int handle_msr_read(struct vcpu_t *vcpu, uint32_t msr, uint64_t *val)
             break;
         }
         case IA32_EFER: {
-            if (!(state->_cr4 & CR4_PAE) && (state->_cr0 & CR0_PG)) {
-                r = 1;
-            } else {
+//            if (!(state->_cr4 & CR4_PAE) && (state->_cr0 & CR0_PG)) {
+//                r = 1;
+//            } else {
                 *val = state->_efer;
-            }
+//            }
             break;
         }
         case IA32_STAR:

It breaks later in the boot process with [ 32796,795571] haxm_panic: Unexpected page fault, kill the VM! but that might be a separate issue.
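
To make the effect of the change concrete, here is a toy model of the two read-side policies, evaluated against the state OpenBSD/i386 is in when it executes RDMSR (paging on, PAE still off). This is a simplified sketch and not HAXM code: the struct, constants and function names are invented for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define CR0_PG_BIT  (1u << 31)
#define CR4_PAE_BIT (1u << 5)

/* Minimal stand-in for the guest state the handler looks at. */
struct toy_state {
    uint32_t cr0;
    uint32_t cr4;
    uint64_t efer;
};

/* Old policy: refuse the read (the caller injects #GP) when PG=1 and PAE=0. */
static bool efer_read_old(const struct toy_state *s, uint64_t *val)
{
    if (!(s->cr4 & CR4_PAE_BIT) && (s->cr0 & CR0_PG_BIT))
        return false;
    *val = s->efer;
    return true;
}

/* Changed policy: the read always succeeds. */
static bool efer_read_new(const struct toy_state *s, uint64_t *val)
{
    *val = s->efer;
    return true;
}
```

With `cr0 = CR0_PG_BIT` and `cr4 = 0`, `efer_read_old` reports failure while `efer_read_new` returns the stored value, which is the difference the guest observes.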

@krytarowski

Contributor

commented Feb 16, 2019

@doug65536 mentioned that it is a hardware bug to allow such code to change the interpretation of the page tables while they are in use, and that doing so is just asking for breakage.

However, virtualization has to reproduce silicon quirks and bugs and keep software running, so we should allow reading EFER, setting NXE, and enabling PAE.

@polprog

Contributor Author

commented Feb 16, 2019

Thanks, this is very interesting (and ironic).

@krytarowski the ISO file is here: https://ftp.openbsd.org/pub/OpenBSD/6.4/i386/cd64.iso. For the record, my md5 of it is ed6286dba0a4c1d3c523a7fd561aa427 (so that we can make sure it's the same version, set, or whatever; it's the current release, so the ISO might change).

@krytarowski

Contributor

commented Feb 16, 2019

After digging deeper, this turns out to be legal. SDM Vol. 3, 4.1.2 (Paging-Mode Enabling) allows the switch from PG=1, PAE=0 to PG=1, PAE=1, but it is unusual, probably because of potential issues.

It's also stated explicitly:

"Software can make transitions between 32-bit paging and PAE paging by changing the value of CR4.PAE with MOV to CR4."

So we can defer the discussion on this.
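
A rough sketch of how the control bits select the paging mode, as I understand SDM 4.1.1. The enum and function names are hypothetical, and the sketch ignores anything beyond the three modes discussed here.

```c
#include <stdint.h>

#define CR0_PG   (1u << 31)
#define CR4_PAE  (1u << 5)
#define EFER_LME (1u << 8)

enum paging_mode {
    PAGING_NONE,    /* CR0.PG = 0 */
    PAGING_32BIT,   /* PG = 1, PAE = 0 */
    PAGING_PAE,     /* PG = 1, PAE = 1, LME = 0 */
    PAGING_4LEVEL   /* PG = 1, PAE = 1, LME = 1 */
};

/* Classify the paging mode from CR0, CR4 and EFER. */
static enum paging_mode classify_paging(uint32_t cr0, uint32_t cr4, uint64_t efer)
{
    if (!(cr0 & CR0_PG))
        return PAGING_NONE;
    if (!(cr4 & CR4_PAE))
        return PAGING_32BIT;
    return (efer & EFER_LME) ? PAGING_4LEVEL : PAGING_PAE;
}
```

In these terms, cpu_paenable moves the guest from PAGING_32BIT to PAGING_PAE with a single MOV to CR4 while CR0.PG stays set.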

Back to the boot failure: it's forbidden to enable IA32_EFER.LMA with !PAE, but reading IA32_EFER should be allowed. At least, I don't see any reason in the SDM right now to forbid it.

@raphaelning

Contributor

commented Feb 20, 2019

> After digging deeper, this turns out to be legal. SDM Vol. 3, 4.1.2 (Paging-Mode Enabling) allows the switch from PG=1, PAE=0 to PG=1, PAE=1, but it is unusual, probably because of potential issues.

Thanks for the investigation! I agree that HAXM should allow those valid IA32_EFER read/write requests instead of injecting #GP. cpu_paenable tries to enable PAE paging mode with a security feature (Execute Disable Bit, enabled by IA32_EFER.NXE), which I think is completely reasonable (how can this be a hardware bug?). According to Intel SDM Vol. 4, the IA32_EFER MSR is available as long as the CPU supports either NX (Execute Disable) or LM (Long Mode), and HAXM exposes both features to the guest via CPUID. In particular, the spec requires that IA32_EFER.NXE = 1 be disallowed only when the NX feature is not supported (cf. Intel SDM Vol. 3A 4.1.4), which has nothing to do with the current paging mode.
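
The rule described above can be sketched as a predicate on the extended CPUID feature bits: CPUID.80000001H:EDX bit 20 reports Execute Disable and bit 29 reports long mode. The function names are hypothetical and this is not how HAXM structures the check; it only illustrates that the paging mode does not enter into it.

```c
#include <stdint.h>
#include <stdbool.h>

#define CPUID_EXT_EDX_NX (1u << 20)  /* CPUID.80000001H:EDX[20] */
#define CPUID_EXT_EDX_LM (1u << 29)  /* CPUID.80000001H:EDX[29] */
#define EFER_NXE         (1ull << 11)

/* IA32_EFER exists if the (virtual) CPU reports NX or long mode. */
static bool efer_present(uint32_t ext_edx)
{
    return (ext_edx & (CPUID_EXT_EDX_NX | CPUID_EXT_EDX_LM)) != 0;
}

/* Setting NXE is rejected only when NX is not reported by CPUID. */
static bool efer_nxe_write_ok(uint32_t ext_edx, uint64_t new_efer)
{
    if (!(new_efer & EFER_NXE))
        return true;
    return (ext_edx & CPUID_EXT_EDX_NX) != 0;
}
```

Neither function takes CR0 or CR4 as input, which is the point: a guest in 32-bit paging mode with NX reported should be able to set NXE.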

PAE paging is less common than standard 32-bit paging and 64-bit (4-level) paging modes, and has not been very well tested on HAXM. My guess is that the original author of the IA32_EFER handler simply did not want to support PAE guests at the time. But the fact that 32-bit Windows 7 uses PAE and boots on HAXM now (see #152) should give us some confidence.

> Back to the boot failure: it's forbidden to enable IA32_EFER.LMA with !PAE, but reading IA32_EFER should be allowed.

As a 32-bit guest, OpenBSD i386 should never attempt to set IA32_EFER.LMA (which is used to enable 64-bit mode). Have you tried to remove the !PAE check from handle_msr_write() as well? It prevents wrmsr in cpu_paenable from enabling NXE.

@krytarowski

Contributor

commented Feb 20, 2019

> how can this be a hardware bug?

We were not wondering about NXE, but about reinterpreting page tables live while they are in use and enabling PAE with 32-bit paging active. OpenBSD uses a hack to make it work and the SDM allows it, so please disregard.

> My guess is that the original author of the IA32_EFER handler simply did not want to support PAE guests at the time. But the fact that 32-bit Windows 7 uses PAE and boots on HAXM now (see #152) should give us some confidence.

I see, so this is legacy code that was not aware of 32-bit PAE at the time.

> Have you tried to remove the !PAE check from handle_msr_write() as well? It prevents wrmsr in cpu_paenable from enabling NXE.

I've tested this patch: the guest starts booting but then breaks because of an instruction missing from the emulator. I'm going to file a report for it.

diff --git a/core/vcpu.c b/core/vcpu.c
index 14990c8..91cb18d 100644
--- a/core/vcpu.c
+++ b/core/vcpu.c
@@ -3284,11 +3284,7 @@ static int handle_msr_read(struct vcpu_t *vcpu, uint32_t msr, uint64_t *val)
             break;
         }
         case IA32_EFER: {
-            if (!(state->_cr4 & CR4_PAE) && (state->_cr0 & CR0_PG)) {
-                r = 1;
-            } else {
-                *val = state->_efer;
-            }
+            *val = state->_efer;
             break;
         }
         case IA32_STAR:
@@ -3548,11 +3544,8 @@ static int handle_msr_write(struct vcpu_t *vcpu, uint32_t msr, uint64_t val)
             hax_info("Guest writing to EFER[%u]: 0x%x -> 0x%llx, _cr0=0x%llx,"
                      " _cr4=0x%llx\n", vcpu->vcpu_id, state->_efer, val,
                      state->_cr0, state->_cr4);
-            if ((state->_cr0 & CR0_PG) && !(state->_cr4 & CR4_PAE)) {
-                state->_efer = 0;
-            } else {
-                state->_efer = val;
-            }
+            state->_efer = val;
+
             if (!(ia32_rdmsr(IA32_EFER) & IA32_EFER_LMA) &&
                 (state->_efer & IA32_EFER_LME)) {
                 hax_panic_vcpu(
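
The write-side hunk can be modelled the same way. In the toy functions below (hypothetical names, not HAXM code), the old policy discards whatever the guest writes while PG=1 and PAE=0, so the NXE bit set by cpu_paenable never reaches the stored EFER value; the patched policy stores the value as written.

```c
#include <stdint.h>

#define CR0_PG   (1u << 31)
#define CR4_PAE  (1u << 5)
#define EFER_NXE (1ull << 11)

/* Old write policy: EFER is forced to 0 while PG=1 and PAE=0. */
static uint64_t efer_after_write_old(uint32_t cr0, uint32_t cr4, uint64_t val)
{
    if ((cr0 & CR0_PG) && !(cr4 & CR4_PAE))
        return 0;
    return val;
}

/* Patched write policy: the guest's value is stored as written. */
static uint64_t efer_after_write_new(uint32_t cr0, uint32_t cr4, uint64_t val)
{
    (void)cr0;  /* control registers no longer influence the result */
    (void)cr4;
    return val;
}
```

The later sanity checks in handle_msr_write (such as the LME/LMA check visible in the diff context) are outside the scope of this sketch.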

I will keep this report open until we get OpenBSD/i386 6.4 booting to a shell prompt.

@raphaelning

Contributor

commented Feb 20, 2019

> I've tested this patch: the guest starts booting but then breaks because of an instruction missing from the emulator. I'm going to file a report for it.

Cool, thanks! That sounds better than the Unexpected page fault error.

@HaHoYou

Contributor

commented Mar 5, 2019

I tried three times and did not get the decode error reported in #182.
Image: https://ftp.openbsd.org/pub/OpenBSD/6.4/i386/cd64.iso
Command: qemu-system-x86_64 -accel hax -cdrom ./openBSD6.4cd64.iso
Hax version: latest origin\master (44b21c5 on Mar 5)
Error: 2 kinds:
triple.LOG
mmio.LOG

@krytarowski

Contributor

commented Mar 5, 2019

Waiting for #185

@doug65536


commented Mar 31, 2019

The way they implemented paging on x86 they already have a mechanism to flush the TLB when you turn global paging off and on (for instance), so they just threw in a thing to do that same thing if you flick PAE off and on. I call that whole thing one big design bug because elsewhere, it is documented that the TLB is explicitly allowed to toss out entries at any moment. If that moment happens at a time in the middle of changing the page table layout, it's toast.

Relying on that is not very elegant.

@doug65536


commented Mar 31, 2019

If I had been at any design meeting about PAE-toggle TLB flushing, I would have strongly objected, because program code must be implemented as if the entire TLB is being flushed after every instruction and, simultaneously, as if an infinite number of TLB entries can be cached forever. Changing PAE while paging is enabled violates the first requirement.

Is the argument that it is switching into a recursive mapping so it is simultaneously valid in both interpretations? It is wrong for a hypervisor to have that restriction, but I think it is also slightly wrong for code to rely on strange corner cases, even documented ones.
