Support nested virtualization (for rusty-hermit) #6

Open
jschwe opened this issue Apr 25, 2020 · 10 comments

Comments

@jschwe
Contributor

jschwe commented Apr 25, 2020

When using nested virtualization, rusty-hermit currently panics while detecting the CPU frequency, since all detection methods fail. Uhyve should provide the CPU frequency even in a nested virtualization environment.
This can be done either by ensuring detect_from_hypervisor() works or by modifying the CPUID brand string to contain the clock speed.
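
To illustrate the brand-string route, here is a minimal sketch of how a clock speed could be read out of a brand string like the ones shown in the lscpu output further down. The function name `brand_string_mhz` is hypothetical and is not taken from rusty-hermit or uhyve; the sketch only assumes that the frequency, when present, is the last token and ends in `GHz` or `MHz`.

```rust
/// Hypothetical helper: extracts a clock speed in MHz from a CPUID brand
/// string such as "Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz".
/// Returns `None` if the string carries no recognizable frequency.
fn brand_string_mhz(brand: &str) -> Option<u32> {
    // Brand strings are often NUL-padded; drop the padding first.
    // The frequency, if present, is the last whitespace-separated token.
    let token = brand.trim_end_matches('\0').split_whitespace().last()?;

    // Split the token into its numeric part and a scale factor to MHz.
    let (number, scale) = if let Some(n) = token.strip_suffix("GHz") {
        (n, 1000.0)
    } else if let Some(n) = token.strip_suffix("MHz") {
        (n, 1.0)
    } else {
        return None;
    };

    let mhz = (number.parse::<f64>().ok()? * scale).round();

    // Reject zero, negative, NaN and out-of-range values.
    if mhz >= 1.0 && mhz <= u32::MAX as f64 {
        Some(mhz as u32)
    } else {
        None
    }
}
```

Called with `"Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz"` this yields `Some(3500)`; a brand string without a frequency suffix yields `None`, so a caller would still need another detection method in that case.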

Also (this still needs some more testing on my side, though), uhyve should print error messages and quit in the same way under nested virtualization as it does under normal virtualization. Currently uhyve seems to print less when using nested virtualization.

@jschwe
Contributor Author

jschwe commented Apr 26, 2020

I'd like to add the output of lscpu for both native and virtual Ubuntu (see below). Since all the necessary information is available inside the VirtualBox guest, uhyve should be able to provide the clock speed to rusty-hermit. I have not yet fully understood how detect_from_hypervisor() works, so I might be wrong about this.

I also double-checked, and uhyve does drop error messages when running with nested virtualization. I compiled the same program in both environments, inspected it with gdb, and verified that both runs panic at the same .expect() line. The nested uhyve, however, doesn't output any clear error message. Is there a good way to debug uhyve itself with gdb?

lscpu native Ubuntu

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Stepping:            3
CPU MHz:             800.017
CPU max MHz:         3900,0000
CPU min MHz:         800,0000
BogoMIPS:            6999.82
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

lscpu virtualized Ubuntu (kvm hypervisor)

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Stepping:            3
CPU MHz:             3503.998
BogoMIPS:            7007.99
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow flexpriority fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d

panic with uhyve running on virtual ubuntu

$ HERMIT_VERBOSE=1 ../uhyve/target/debug/uhyve target/x86_64-unknown-hermit/debug/rusty_demo
[0][INFO] Welcome to HermitCore-rs 0.3.25
[0][INFO] Kernel starts at 0x200000
[0][INFO] BSS starts at 0x446600
[0][INFO] TLS starts at 0x444d60 (size 264 Bytes)
[0][INFO] Total memory size: 64 MB
[0][INFO] A pure Rust application is running on top of HermitCore!
[0][INFO] Heap: size 54 MB, start address 0x600000
[0][INFO] Heap is located at 0x600000 -- 0x3c00000 (0 Bytes unmapped)
[0][INFO] 
[0][INFO] ===================== PHYSICAL MEMORY FREE LIST ======================
[0][INFO] 0x00000003C00000 - 0x00000004000000
[0][INFO] ======================================================================
[0][INFO] 
[0][INFO] 
[0][INFO] ================== KERNEL VIRTUAL MEMORY FREE LIST ===================
[0][INFO] 0x00000003C00000 - 0x00800000000000
[0][INFO] ======================================================================
[0][INFO] 
ERROR 2020-04-26T08:23:34Z: uhyve::linux::vcpu: Internal error
ERROR 2020-04-26T08:23:34Z: uhyve: CPU 0 crashes! Unknown exit reason.

panic with uhyve running on native ubuntu

$ HERMIT_VERBOSE=1 ../uhyve/target/debug/uhyve target/x86_64-unknown-hermit/debug/rusty_demo
[0][INFO] Welcome to HermitCore-rs 0.3.25
[0][INFO] Kernel starts at 0x200000
[0][INFO] BSS starts at 0x43f600
[0][INFO] TLS starts at 0x43d048 (size 264 Bytes)
[0][INFO] Total memory size: 64 MB
[0][INFO] A pure Rust application is running on top of HermitCore!
[0][INFO] Heap: size 54 MB, start address 0x600000
[0][INFO] Heap is located at 0x600000 -- 0x3c00000 (0 Bytes unmapped)
[0][INFO] 
[0][INFO] ===================== PHYSICAL MEMORY FREE LIST ======================
[0][INFO] 0x00000003C00000 - 0x00000004000000
[0][INFO] ======================================================================
[0][INFO] 
[0][INFO] 
[0][INFO] ================== KERNEL VIRTUAL MEMORY FREE LIST ===================
[0][INFO] 0x00000003C00000 - 0x00800000000000
[0][INFO] ======================================================================
[0][INFO] 
thread '<unnamed>' panicked at 'Could not determine the processor frequency: ()', src/arch/x86_64/kernel/processor.rs:405:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[0][INFO] Shutting down system

@stlankes
Collaborator

Hm, I have a similar setup and it works for me. Can you check whether https://github.com/hermitcore/uhyve/blob/master/src/vm.rs#L683 determines the correct frequency on your system? Which Linux kernel do you use?

@jschwe
Contributor Author

jschwe commented Apr 29, 2020

On native Ubuntu, freq is 3500, which is correct.
On virtual Ubuntu (tested on my vagrant box), freq is 0. Uhyve should probably also emit an error/info message for this case, similar to the None case, and skip the write_volatile().
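
As a rough sketch of that suggestion: a reading of 0 could be folded into the existing None case before anything is written. Everything named here (`BootInfo`, `usable_frequency`, `publish_frequency`) is a hypothetical stand-in rather than uhyve's actual code, and a plain field assignment takes the place of the real write_volatile() call.

```rust
/// Hypothetical stand-in for the boot information shared with the guest.
#[derive(Debug, Default)]
struct BootInfo {
    cpu_freq_mhz: u32,
}

/// Treats a reading of 0 MHz the same way as "nothing detected".
fn usable_frequency(detected: Option<u32>) -> Option<u32> {
    detected.filter(|&mhz| mhz > 0)
}

/// Writes the frequency only if it is usable and reports whether it did.
/// An unusable reading produces a warning and leaves `info` untouched.
fn publish_frequency(info: &mut BootInfo, detected: Option<u32>) -> bool {
    match usable_frequency(detected) {
        Some(mhz) => {
            // In uhyve this is where the write_volatile() would happen.
            info.cpu_freq_mhz = mhz;
            true
        }
        None => {
            eprintln!(
                "warning: could not determine the CPU frequency (got {:?})",
                detected
            );
            false
        }
    }
}
```

With this shape, `Some(0)` and `None` both take the warning path, so the guest never sees a frequency of 0 that looks like a valid value.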

This behavior is expected, I guess, since rusty-hermit on uhyve on native Ubuntu also can't read the processor frequency with the CpuId crate. This implies that CpuId doesn't work in (some) virtual environments.

Kernel version virtual Ubuntu (vagrant box): 4.15.0-96-generic. This box also has the VirtualBox guest additions installed.
Kernel version native Ubuntu: 5.3.0-46-generic

@jschwe
Contributor Author

jschwe commented May 1, 2020

#9 works on my local machine: uhyve on virtual Ubuntu can detect the CPU frequency on my computer.

However, this doesn't work everywhere. For example, it doesn't work on Travis: https://travis-ci.com/github/jschwe/rusty-hermit/jobs/326283414
The problem here is that the model name of the CPU is "Intel(R) Xeon(R) CPU", which doesn't contain the frequency. Since Intel Xeon CPUs are very common in servers, we should consider adding an additional method for determining the CPU frequency.
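
One possible additional method on the host side, sketched here purely as an assumption and not as something uhyve implements, would be to read the `cpu MHz` field from `/proc/cpuinfo` on Linux. The helper name `cpuinfo_mhz` is hypothetical. Note that this field reports the current clock rather than the nominal one: the native lscpu output above shows 800.017 MHz for a 3.50GHz part, so the value is only a rough fallback.

```rust
/// Hypothetical helper: returns the first usable "cpu MHz" value found in
/// `/proc/cpuinfo`-style text, rounded to whole MHz.
fn cpuinfo_mhz(cpuinfo: &str) -> Option<u32> {
    cpuinfo.lines().find_map(|line| {
        // Each entry has the form "key<whitespace>: value".
        let (key, value) = line.split_once(':')?;
        if key.trim() != "cpu MHz" {
            return None;
        }
        let mhz = value.trim().parse::<f64>().ok()?.round();
        // Reject zero, negative, NaN and out-of-range values.
        if mhz >= 1.0 && mhz <= u32::MAX as f64 {
            Some(mhz as u32)
        } else {
            None
        }
    })
}
```

On a Linux host the input could come from `std::fs::read_to_string("/proc/cpuinfo")`. The function takes the text as a parameter so it can be tried on captured output without touching the file system.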

When looking at the job log, you can also see that the original error reason, which should have been "Could not determine the processor frequency" from the failed expect, is not printed. This only happens in nested environments. It is also not completely consistent, since there were cases where I have seen the reason for a panic printed on Travis or in my VirtualBox VM. I'll try to investigate this further.
Worth noting in this context: the following messages only appeared after I added some debug printlns to libhermit, without changing anything else.

[0][TRACE] __sys_malloc: allocate memory at 0x600180 (size 0x408, align 0x8)
[0][ERROR] Page Fault (#PF) Exception: ExceptionStackFrame {
    instruction_pointer: 0x37569b,
    code_segment: 0x8,
    cpu_flags: 0x10283,
    stack_pointer: 0x1ffb90,
    stack_segment: 0x10,
}
[0][ERROR] virtual_address = 0x3830000, page fault error = The fault was caused by a non-present page.
The access causing the fault was a read.
The access causing the fault originated when the processor was executing in supervisor mode.
The fault was not caused by reserved bit violation.
The fault was not caused by an instruction fetch.
[0][ERROR] fs = 0x0, gs = 0x44A678

Before adding the prints the error contained much less info
Adding the first prints changed the error

You can view the changes I made here: hermit-os/hermit-rs#5
I basically only added debug outputs and the travis pipeline.

@jschwe
Contributor Author

jschwe commented May 1, 2020

I've now also tested this with my second travis pipeline.
When run without the debug prints I get the following error:

thread '<unnamed>' panicked at 'attempt to create unaligned or null slice', /home/travis/build/jschwe/hermit/rust/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/src/rust/src/libcore/slice/mod.rs:5694:5

stack backtrace:

With the added debug prints I actually get the expected error message this time:

thread '<unnamed>' panicked at 'Could not determine the processor frequency: ()', src/arch/x86_64/kernel/processor.rs:419:9

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

[0][INFO] Shutting down system

However, either the kernel or uhyve doesn't terminate correctly. It hangs and is terminated by Travis after 10 minutes. I recall having seen this behaviour two or three times locally too.

Something strange is definitely going on here. Could there be some kind of race condition in the panic handler?

@stlankes
Collaborator

stlankes commented May 8, 2020

@jschwe Can you check whether hermitcore/libhermit-rs#48 determines the CPU frequency correctly on your test setup?

@jschwe
Contributor Author

jschwe commented May 8, 2020

@stlankes I checked, and this does not determine the CPU frequency on Travis. It might work if we use the cpuid function from uhyve, so I'll test that when I have time and post an update here.

@jschwe
Contributor Author

jschwe commented May 9, 2020

Update: Using this method in uhyve also doesn't work on Travis. raw_cpuid is able to detect that the hypervisor is KVM, but returns a frequency of 0.

@stlankes
Collaborator

Do you still sometimes get a page fault?

@jschwe
Contributor Author

jschwe commented May 15, 2020

Currently I'm not experiencing any panics, and therefore no page faults, when running rusty-demo. However, I believe there is still an issue with the panic_handler, since I can still reproduce the page fault when deliberately panicking (hermit-os/kernel#43).

@jounathaen jounathaen removed their assignment May 15, 2020