Skip to content
This repository has been archived by the owner on Jan 28, 2023. It is now read-only.

Make HAXM run on system with more than 64 host CPUs #255

Merged
merged 1 commit into from Dec 24, 2019
Merged

Make HAXM run on system with more than 64 host CPUs #255

merged 1 commit into from Dec 24, 2019

Conversation

coxuintel
Copy link
Contributor

Although this patch changed lots of files but only do one thing: make
HAXM run on system with more than 64 logical CPUs.

Previously, HAX_MAX_CPUS is defined as 64, and HAXM stores CPU online
bitmap in 64-bit variable. When running on a system with more than 64
logical CPUs, IPI call actually executed on all CPUs, although internally
HAXM only maintains a 64-bit bitmap. So some per-CPU routine actually
runs on different CPUs, but the 64 loop will check the same pair of
VMXON/VMXOFF for VMX operations, then leads to the error.
Simply increasing the 64-bit bitmap to larger size may resolve the issue
but not efficient and clean.
Previous implementation also has another issue that it invokes
KeQueryActiveProcessors() to get the total logical CPU number, and
KeGetCurrentProcessorNumber() to get the current logical CPU ID. However,
both APIs are NOT designed to get information on Windows with more than
1 CPU groups. Otherwise, both APIs only return value from group 0, which
can't reveal the actual logical CPU information. Instead, user should use
KeQueryActiveProcessorCountEx() and KeGetCurrentProcessorNumberEx().

This patch defines the CPU bitmap in 2-dimention way, in unit of group,
each group can hold up to 64 CPUs for bitmap, same as old implementation.
And introduce another array to store the per-CPU group/bitmap information
so that indexing could be fast. This patch also unify cpu init routines,
more common implementation into same header/source instead of OS specific,
like cpu_info_init(), smp_cfunction(), smp_call_parameter{}, etc. Since
they are very fundamental functions, several files are modified.

Change summary:

  • Define 2-dimention structure hax_cpumap_t to store CPU bitmap info.
    Including total group number, total logical CPU number, bitmap within
    each group, group/bit position for each CPU id.
  • For Windows/Linux/Darwin/BSD, use simliar routine cpu_info_init()
    to initialize host CPUs, and implement in OS specific way.
  • On Windows, use KeQueryActiveProcessorCountEx(),
    KeGetCurrentProcessorNumberEx() and KeQueryActiveGroupCount() to get
    correct logical CPUs number and group information, and fill hax_cpumap_t.
  • For Linux/Darwin/BSD, use OS specific routine to get the total logical
    CPU number, and fill into groups consecutive. This is different against
    Windows since logical process group is Windows definition, and CPU bitmap
    is not guaranteed to consecutively fit into a 64-bit bitmap: 64 logical
    CPUs could be in two groups.
  • Implement the new cpu_is_online() function with the new hax_cpumap_t.
  • Implement the new cpu2cpumap() function with the new hax_cpumap_t.
  • Unify OS specific smp_cfunction() implementations in to one, since
    the function is executed by IPI on each CPU, check the CPU online with
    the new cpu_is_online().
  • For all functions refer to the CPU bitmap, use the new implementation.
  • For all per-CPU IPI function, add current CPU id in log.

After this patch, HAXM design won't block running on system with a large
number CPUs, and easy to expand in case limitted by the date type range:
now the upper bound is 65536*64 regardless of other resource limitation.

Signed-off-by: Colin Xu colin.xu@intel.com

@HaxmCI HaxmCI added CI:Build Pass CI:Build Pass CI:Mac Test Pass CI:Mac Test Pass labels Dec 23, 2019
@krytarowski
Copy link
Contributor

This breaks on NetBSD as there is an unresolved cpu_index symbol.

[  1671,976779] kobj_checksyms, 988: [haxm]: linker error: symbol `cpu_index' not found
[  1671,976779] WARNING: module error: unable to affix module `haxm', error 8
$ nm -A  *.o|grep cpu_index 
cpu.o:                 U cpu_index
ept.o:                 U cpu_index
hax.o:                 U cpu_index
vcpu.o:                 U cpu_index

The problem is with inlining:

static inline uint32_t hax_cpuid(void)
{
    return (uint32_t)cpu_number();
}

without including <sys/cpu.h>.

I don't know if this is a clean approach, but it works:

--- a/include/netbsd/hax_netbsd.h
+++ b/include/netbsd/hax_netbsd.h
@@ -34,6 +34,9 @@
 
 #define HAX_RAM_ENTRY_SIZE 0x4000000
 
+#include <sys/types.h>
+#include <sys/cpu.h>
+
 hax_spinlock *hax_spinlock_alloc_init(void);
 void hax_spinlock_free(hax_spinlock *lock);

I don't have 64 or more CPUs to test this patch, but for my 8cpu NetBSD computer haxm still works.

@coxuintel
Copy link
Contributor Author

This breaks on NetBSD as there is an unresolved cpu_index symbol.

[  1671,976779] kobj_checksyms, 988: [haxm]: linker error: symbol `cpu_index' not found
[  1671,976779] WARNING: module error: unable to affix module `haxm', error 8

Thanks for catching this. Maybe I'd better move the implementation into *.c file. Include more headers in *.h isn't quite good.

@krytarowski
Copy link
Contributor

Is this function called in performance critical places?

@coxuintel
Copy link
Contributor Author

Is this function called in performance critical places?

Sometimes, that's why I define them as inline at first. I"ll see if I can find a better way. Thanks.

@coxuintel
Copy link
Contributor Author

Is this function called in performance critical places?

Sometimes, that's why I define them as inline at first. I"ll see if I can find a better way. Thanks.

Renaming hax_cpuid to hax_cpu_id since we'll need the name for other accurate purpose in future.
Forward declaration hax_cpu_id() in hax.h, and move OS specific implementation into hax_wrapper.c, still keeps as inline for performance consideration.

count = 0;
for (group = 0; group < cpu_online_map.group_num; group++) {
cpu_online_map.cpu_map[group].map = (hax_cpumask_t)KeQueryGroupAffinity(
group);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 spaces indentation for line wrap or align to the parenthesis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updated.

Although this patch changed lots of files but only do one thing: make
HAXM run on system with more than 64 logical CPUs.

Previously, HAX_MAX_CPUS is defined as 64, and HAXM stores CPU online
bitmap in 64-bit variable. When running on a system with more than 64
logical CPUs, IPI call actually executed on all CPUs, although internally
HAXM only maintains a 64-bit bitmap. So some per-CPU routine actually
runs on different CPUs, but the 64 loop will check the same pair of
VMXON/VMXOFF for VMX operations, then leads to the error.
Simply increasing the 64-bit bitmap to larger size may resolve the issue
but not efficient and clean.
Previous implementation also has another issue that it invokes
KeQueryActiveProcessors() to get the total logical CPU number, and
KeGetCurrentProcessorNumber() to get the current logical CPU ID. However,
both APIs are NOT designed to get information on Windows with more than
1 CPU group. Otherwise, both APIs only return value from group 0, which
can't reveal the actual logical CPU information. Instead, user should use
KeQueryActiveProcessorCountEx() and KeGetCurrentProcessorNumberEx().

This patch defines the CPU bitmap in 2-dimention way, in unit of group,
each group can hold up to 64 CPUs for bitmap, same as old implementation.
And introduce another array to store the per-CPU group/bitmap information
so that indexing could be fast. This patch also unify cpu init routines,
more common implementation into same header/source instead of OS specific,
like cpu_info_init(), smp_cfunction(), smp_call_parameter{}, etc. Since
they are very fundamental functions, several files are modified.

Change summary:
- Define 2-dimention structure hax_cpumap_t to store CPU bitmap info.
Including total group number, total logical CPU number, bitmap within
each group, group/bit position for each CPU id.
- For Windows/Linux/Darwin/BSD, use simliar routine cpu_info_init()
to initialize host CPUs, and implement in OS specific way.
- On Windows, use KeQueryActiveProcessorCountEx(),
KeGetCurrentProcessorNumberEx() and KeQueryActiveGroupCount() to get
correct logical CPUs number and group information, and fill hax_cpumap_t.
- For Linux/Darwin/BSD, use OS specific routine to get the total logical
CPU number, and fill into groups consecutive. This is different against
Windows since logical process group is Windows definition, and CPU bitmap
is not guaranteed to consecutively fit into a 64-bit bitmap: 64 logical
CPUs could be in two groups.
- Implement the new cpu_is_online() function with the new hax_cpumap_t.
- Implement the new cpu2cpumap() function with the new hax_cpumap_t.
- Unify OS specific smp_cfunction() implementations in to one, since
the function is executed by IPI on each CPU, check the CPU online with
the new cpu_is_online().
- For all functions refer to the CPU bitmap, use the new implementation.
- For all per-CPU IPI function, add current CPU id in log.

After this patch, HAXM design won't block running on system with a large
number CPUs, and easy to expand in case limitted by the date type range:
now the upper bound is 65536*64 regardless of other resource limitation.

Signed-off-by: Colin Xu <colin.xu@intel.com>
@wcwang wcwang merged commit d62d8f9 into intel:master Dec 24, 2019
@coxuintel coxuintel deleted the more_cpus branch December 30, 2019 01:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CI:Build Pass CI:Build Pass CI:Mac Test Pass CI:Mac Test Pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants