This repository has been archived by the owner on Jan 28, 2023. It is now read-only.

Paging issue on Haswell and pre-Haswell CPUs #218

Open
VelocityRa opened this issue Jun 24, 2019 · 9 comments

@VelocityRa

VelocityRa commented Jun 24, 2019

Describe the Bug

Summary: On Haswell and pre-Haswell CPUs, HAXM fails to map a host virtual address to a guest physical address (GPA) at or above a specific address (Haswell-E works).

Instead of a successful address translation, a page fault occurs with CR2 containing the failed address. In the log below it escalates to a triple fault because the guest has no interrupt handling.

Please see @StrikerX3's comment below for details.

Host Environment

  • HAXM version: v7.5.1
  • Host OS version: Windows 10 Pro version 1803
  • Host OS architecture: x86_64
  • Host CPU model: i5-4690K; also confirmed not working on an i7-2630QM, while it works on e.g. an i7-5930K
  • Host RAM size: 16GB

Guest environment

Tiny piece of code that mostly just boots to long mode, see below.

To Reproduce

Run the test here: https://github.com/StrikerX3/virt86-demos/tree/master/apps/x64-guest

Expected Behavior

Expectation: Test completes without "VCPU shutting down"/"VCPU execution failed" messages appearing.

Reproducibility

100%. The test, which comes with full source, should help narrow down the problem.

Diagnostic Information

HAXM log (debug level): https://cdn.discordapp.com/attachments/532915071697944576/592552716124028965/haxm.log

@StrikerX3

StrikerX3 commented Jun 24, 2019

To be more exact, we can map GPAs successfully up to 0x7F'FFFF'F000 on all CPUs we tested (all three CPUs mentioned above and an i5-4460S). From 0x80'0000'0000 and up, only the Haswell-E CPU works.

This also happens with WHPX, which leads me to believe this is a limitation of the CPUs and not a HAXM bug. However, it would be nice if the IOCTLs that map GPA ranges returned an error in those cases.

Is there a way to programmatically determine the maximum usable GPA address on the current host?

@StrikerX3

Seems like we can use CPUID 8000_0008h.EAX[23..16] (or [7..0] if those bits are zero) to find out how many bits are supported in a GPA on the host's CPU.
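A minimal sketch of that decoding, assuming the raw EAX value of CPUID leaf 8000_0008h has already been read. The helper names `gpa_width_from_eax` and `max_gpa_page` are hypothetical and not part of HAXM or virt86.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the GPA width in bits from CPUID.80000008h:EAX.
 * Bits 23..16 are used when non-zero; otherwise bits 7..0 apply. */
static unsigned gpa_width_from_eax(uint32_t eax)
{
    unsigned guest_bits = (eax >> 16) & 0xFF;
    unsigned phys_bits  = eax & 0xFF;
    return guest_bits != 0 ? guest_bits : phys_bits;
}

/* Highest 4 KiB-aligned GPA representable with the given width. */
static uint64_t max_gpa_page(unsigned width)
{
    assert(width <= 64); /* wider values are not meaningful for a 64-bit GPA */
    uint64_t limit = (width == 64) ? UINT64_MAX
                                   : ((UINT64_C(1) << width) - 1);
    return limit & ~UINT64_C(0xFFF);
}
```

With a 39-bit width, `max_gpa_page(39)` evaluates to 0x7F'FFFF'F000, which matches the highest GPA that could be mapped on the failing CPUs. On GCC or Clang the EAX value can be obtained with `__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`.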

@wcwang
Contributor

wcwang commented Jul 1, 2019

Thanks for your report. We will try to reproduce this issue using the project mentioned above.
With the release of HAXM v7.5.1, we resolved an issue with VCPU shutdown, and the patch has been merged into QEMU master. Could you check whether your QEMU version contains that patch, and provide the QEMU launch command with its parameters for further analysis? If the arguments do not contain '-smp', the test case should pass even without that patch.

@VelocityRa
Author

Hello,
we're not using QEMU; the test above uses HAXM (or other hypervisors) directly.

@wcwang
Contributor

wcwang commented Jul 3, 2019

Thanks for your reply. We will investigate the test case using the test project. Meanwhile, you are welcome to submit a patch if you have any ideas; we would be glad to review it and discuss the issue further. Thanks.

@hyuan3
Contributor

hyuan3 commented Jul 3, 2019

I can't access the URL of the HAXM log. As I understand it, though, GPA mapping in HAXM (including set_ram and EPT violation handling) does not report a triple fault when it fails.

@VelocityRa
Author

Reposting the log:
haxm.log

The triple fault is because there is no IDT in the guest (it's empty).


Anyway, based on what @StrikerX3 said, the fix that would need to be implemented in HAXM is probably just a check, based on the aforementioned host CPUID value, that makes the HAXM IOCTL return an error when you try to map GPAs higher than the CPU supports.
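Such a check could look roughly like the sketch below. It is not HAXM code: `gpa_range_fits`, `GPA_PAGE_SIZE` and the `width` parameter (the GPA width in bits taken from the host CPUID value) are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GPA_PAGE_SIZE UINT64_C(0x1000)

/* The alignment mask below relies on the page size being a power of two. */
static_assert((GPA_PAGE_SIZE & (GPA_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* Returns true when [start, start + size) is page-aligned, non-empty and
 * lies entirely below 2^width. A mapping call would return an error to the
 * caller whenever this returns false. */
static bool gpa_range_fits(uint64_t start, uint64_t size, unsigned width)
{
    if (size == 0 || width == 0)
        return false;
    if ((start | size) & (GPA_PAGE_SIZE - 1))
        return false;                          /* not page-aligned */
    if (width >= 64)
        return size - 1 <= UINT64_MAX - start; /* only wrap-around matters */

    uint64_t limit = UINT64_C(1) << width;     /* first unusable address */
    if (start >= limit)
        return false;
    return size <= limit - start;              /* overflow-safe comparison */
}
```

With a 39-bit width, a one-page range at 0x7F'FFFF'F000 is accepted and one at 0x80'0000'0000 is rejected, which mirrors the behaviour observed on the failing CPUs.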

@StrikerX3

StrikerX3 commented Jul 3, 2019

Exactly. To clarify: there is an upper limit on the GPAs that the CPU supports, which can be determined from the CPUID value mentioned previously. The limit is higher on Haswell-E CPUs than on previous-generation processors. In our tests, we were trying to map a GPA range above the limit of pre-Haswell-E CPUs but within Haswell-E's range. The issue is that the IOCTL went through without any kind of error on the older CPUs, leading us to believe that the memory was mapped, but when the guest attempted to execute code in that area, we got a page fault. We worked around this by detecting the maximum allowed GPA range based on CPUID and platform limitations and generating an error when a user attempts to map a GPA range beyond the limit.

I also noticed that HAXM imposes an upper limit of 2^31 pages on the GPA range to limit the size of the protection bitmap (as seen here). This limit is not enforced by other platforms such as WHPX or KVM. Additionally, there is an off-by-one error in the check that prevents use of the highest page in the allowed range.
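To illustrate the kind of off-by-one described here, the sketch below contrasts a bound check that keeps the highest page usable with one that drops it. This is an illustration of the bug class only, not a copy of HAXM's check; the function names and `MAX_GPA_PAGES` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_GPA_PAGES (UINT64_C(1) << 31)   /* the 2^31-page limit */

static_assert(MAX_GPA_PAGES == UINT64_C(0x80000000), "limit is 2^31 pages");

/* Accepts the page range [first_pfn, first_pfn + npages). The end is
 * exclusive, so a range ending exactly at MAX_GPA_PAGES is valid and the
 * highest page (MAX_GPA_PAGES - 1) remains usable. */
static bool pfn_range_ok(uint64_t first_pfn, uint64_t npages)
{
    if (npages == 0 || first_pfn >= MAX_GPA_PAGES)
        return false;
    return npages <= MAX_GPA_PAGES - first_pfn;
}

/* Off-by-one variant: using '<' instead of '<=' rejects any range that
 * touches the highest page. */
static bool pfn_range_ok_off_by_one(uint64_t first_pfn, uint64_t npages)
{
    if (npages == 0 || first_pfn >= MAX_GPA_PAGES)
        return false;
    return npages < MAX_GPA_PAGES - first_pfn;
}
```

The two functions differ only for ranges that include page 2^31 - 1: the first accepts them, the second rejects them.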

@varinderpratap

varinderpratap commented Sep 16, 2022

Hello, we are facing a similar page fault issue when accessing RAM with the Tizen Emulator/QEMU on HAXM.
13:18:11.221|17380|T| yagl| 154|[10784/17380] {{{ yagl_transport_begin():154
13:18:11.221|21276|T| yagl| 974|[2878/2878] {{{ glTexSubImage2DData(target = 0xde1, level = 0, xoffset = 1, yoffset = 1, width = 720, height = 1280, format = 0x80e1, type = 0x1401, pixels = 00000000792af148):974
13:18:11.221|17380|E| yagl| 174|[0/0] yagl_transport_begin:174 - yagl_transport_begin - batch_size=6560, out_arrays_size=3683524, fence_seq=0, num_out_da=1
13:18:11.221|17380|T| yagl| 57|[2878/2878] {{{ yagl_mem_get(va = 0x00000000ab2f7000, len = 3683524):57
13:18:11.221|17380|W| yagl| 62|[2878/2878] yagl_mem_get:62 - page fault at 0x00000000ab2f7000, len= 3683524
13:18:11.221|17380|T| yagl| 65|[2878/2878] }}} yagl_mem_get:65

Qemu Source : https://review.tizen.org/gerrit/gitweb?p=sdk%2Femulator%2Fqemu.git;a=shortlog;h=refs%2Fheads%2Ftizen_qemu_5.0.1

Error Line: yagl| 62|[2878/2878] yagl_mem_get:62 - page fault at 0x00000000ab2f7000, len= 3683524
This happens mainly when the memory length is greater than 2 MB.

Yagl: Yet another graphics library. Yagl is a vPCI device used to perform OpenGL operations.
https://review.tizen.org/gerrit/gitweb?p=sdk/emulator/qemu.git;a=blob;f=hw/yagl/yagl_device.c;h=2d52864a4bb71f56f17caba3b257823330cad8d4;hb=069e0b790db3c2d2c80148f3baead4be613bf8f9

Please note that the same code works on WHPX and KVM, and that the same issue occurs with HAXM on macOS.

Could you suggest any pointers or an approach to fix this? Thanks!
