New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored host feature detection via CPUID #63

Merged
merged 2 commits into from Jun 20, 2018

Conversation

Projects
None yet
3 participants
@AlexAltea
Copy link
Contributor

AlexAltea commented Jun 14, 2018

This patch allows checking for host features to conditionally enable guest features added in #61 and #62, since the old code only checked for host-support for few hardcoded features (VMX, NX, EMT64T).

Design

The newer one is designed following the comments following #61 (comment):

  • Old features (feat_*) have been replaced by unique 32-bit integer keys (X86_FEATURE_*) which are much more flexible than the approach used in Linux and FreeBSD: These keys encode the entire CPUID input state (both EAX and leaf registers), the cache index, and the bit position in just 25-bits.
  • All CPUID-related definitions/code have been moved to cpuid.h/cpuid.c. These files are self-contained and provide a simple interface to: (1) initialize a CPUID cache, (2) check for features (either in the cache or re-querying CPUID). This interface is consumed by cpu.c.

Aditionally, I had few questions:

  • I have noticed that during each (non-APM) iteration of save_guest_msr, load_guest_msr, save_host_msr, load_host_msr, HAXM checks if cpu_has_emt64_support() is true. This means a total of 2 * (NR_GMSR + NR_HMSR) = 22 queries. Is it really necessary to query it every time, i.e. does the feature set really change? This probably causes some overhead, specially without cache.
  • Some code (see cpu_init_vmx) flags host features for each CPU: Is it safe to assume that all host CPU cores present the same features? That would be nice, because otherwise we would need to have N CPUID caches for N cores.

@AlexAltea AlexAltea force-pushed the AlexAltea:cpuid branch from ba69810 to 4bdd80f Jun 14, 2018

@raphaelning

This comment has been minimized.

Copy link
Contributor

raphaelning commented Jun 15, 2018

Thanks for initiating this effort! I haven't reviewed the code yet, but let me answer the questions first:

Is it really necessary to query it every time, i.e. does the feature set really change? This probably causes some overhead, specially without cache.

This is a good catch! It's just the lm (Long Mode) feature flag, which should never change and should be cached.

Is it safe to assume that all host CPU cores present the same features?

I believe so. It's safe to assume nobody would run HAXM on a system with heterogeneous CPUs. As you said, KVM only checks the boot CPU (BSP), but I guess we can pick an arbitrary CPU.

@AlexAltea AlexAltea force-pushed the AlexAltea:cpuid branch from 4bdd80f to 70ca3ad Jun 15, 2018

*
* This allows representing any feature obtained via:
* - Root register (EAX): 00h-1Fh and 80000000h-8000001Fh
* - Leaf register (ECX, EDX, EBX): 00h-1Fh

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

Overall this is a very neat scheme. I have slightly different ideas about the following:

  1. nit: I'm curious about the nomenclature. IIUC, you refer to the input EAX value as root and the input ECX value as leaf. In Intel SDM, they are called leaf and subleaf (or sub leaf/sub-leaf), respectively. Maybe your terms are also common, so I'm fine with keeping them, but it would be helpful to explain how they map to the Intel terms in a comment, especially because the meaning of leaf can cause confusion.
  2. The only subleaf register allowed by Intel SDM is ECX, and I don't think future extensions to CPUID would add a third input register (after EAX and ECX), so I'd only use 1 bit for leaf_reg, and perhaps rename it to has_leaf (or has_subleaf).
  3. I think we can spend more bits in other fields. For instance, I'd represent root_key_hi with 2 bits, because it is now a de-facto standard among hypervisors to implement the reserved CPUID roots/leaves EAX=0x40000000+i, which enable a paravirtualized guest to identify the hypervisor and detect its features. It may not be useful for HAXM yet, but we could make the option available. We could also cut a couple of reserved bits to make root_key_lo and leaf_key a little longer, so as to support more roots/leaves and subleaves.

This comment has been minimized.

@AlexAltea

AlexAltea Jun 15, 2018

Contributor

Regarding to your points:

  1. You are right, I will use the leaf/subleaf naming to remain consistent with the Intel SDM (which is also what I quoted in the code comments). My previous terms were chosen arbitrarily, so there's not much value in keeping them.

  2. True, I have transformed that field into boolean 1-bit field named subleaf_used.

  3. True, I have extended leaf_hi (previously root_key_hi) to 2-bits. However, I have kept the remaining "key" fields unmodified (5-bits), since no leaf or subleaf has a lower part larger than 0x1F, both on Intel and AMD reference manuals. I think we should only extend fields when necessary to avoid extending the wrong fields and later running out of extra bits for other fields. Since these fields are only consumed internally by cpuid.h, increasing their size can be done at any time easily.

* - Root register (EAX): 00h-1Fh and 80000000h-8000001Fh
* - Leaf register (ECX, EDX, EBX): 00h-1Fh
*/
#define FEATURE_KEY(index, root_val, leaf_key, leaf_reg, value_reg, value_bit) ( \

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

nit: root_val => root_key.

((value_reg & 0x03) << 18) | \
((value_bit & 0x1F) << 20))

#define FEATURE_KEY_ROOT(index, root_val, value_reg, value_bit) \

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

nit: root_val => root_key

#define LEAF(bit) \
FEATURE_KEY_ROOT(4, 0x80000001, CPUID_REG_ECX, bit)
X86_FEATURE_LAHF = LEAF(0), /* 0x00000001 LAHF/SAHF Instructions */
X86_FEATURE_SVM = LEAF(2), /* 0x00000004 Secure Virtual Machine */

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

Some of these features don't seem to come from Intel SDM (I'm looking at Vol. 2A Table 3-8, Initial EAX Value being 80000001H). Maybe you were referring to the AMD manual?

This comment has been minimized.

@AlexAltea

AlexAltea Jun 15, 2018

Contributor

Yes, some of these came from AMD manuals. But on second thoughts, I will remove all AMD-specific features, at least until we add AMD-support at a later point (if).

The only feature that I will keep, not mentioned in the Intel SDM is:

    /* Features for CPUID with EAX=01h stored in ECX  */
    X86_FEATURE_HYPERVISOR    = FEAT(31), /* 0x80000000  Hypervisor Running */

The reason is that it was already defined (and used) by HAXM and according to multiple guides, a de-facto flag to signal the presence of an hypervisor.

*/
#define LEAF(bit) \
FEATURE_KEY_ROOT(5, 0x80000001, CPUID_REG_EBX, bit)
X86_FEATURE_SYSCALL = LEAF(11), /* 0x00000800 SYSCALL/SYSRET Instructions */

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

Same here. Intel SDM doesn't define any feature for output EBX. Instead, some important features (including 64-bit/Long Mode and RDTSCP) are found in EDX. Shall we stick to the Intel spec and worry about AMD support later?


// Functions
void cpuid_cache_init(uint32_t* cache);
bool cpuid_cache_has_feature(uint32_t* cache, uint32_t feature_key);

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

nit: Pointer declaration style

return args.regs[value_reg];
}

void cpuid_cache_init(uint32_t* cache)

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

nit: Use C-style pointer declaration: uint32_t *cache

cache[5] = cpuid_query_root(0x80000001, CPUID_REG_EBX);
}

bool cpuid_cache_has_feature(uint32_t* cache, uint32_t feature_key)

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

nit: Pointer declaration style


void cpuid_cache_init(uint32_t* cache)
{
cache[0] = cpuid_query_root(0x00000001, CPUID_REG_ECX);

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

nit: Maybe we could adjust the API and make fewer CPUID calls:

cpuid_args_t res;

// res is output only
cpuid_query_root(&res, 0x00000001);
cache[0] = res.ecx;
cache[1] = res.edx;

// Assuming (sub)leaf register is always ECX
cpuid_query_leaf(&res, 0x00000007, 0x00);
cache[2] = res.ecx;
cache[3] = res.ebx;
...
// Functions
void cpuid_cache_init(uint32_t* cache);
bool cpuid_cache_has_feature(uint32_t* cache, uint32_t feature_key);
bool cpuid_has_feature(uint32_t feature_key);

This comment has been minimized.

@raphaelning

raphaelning Jun 15, 2018

Contributor

I'm considering renaming these functions to:

  • cpuid_cache_init => cpuid_host_init
  • cpuid_cache_has_feature => cpuid_host_has_feature
  • cpuid_has_feature => cpuid_host_has_feature_uncached

Reasoning:

  1. We may want to cache the CPUID result for each guest later (instead of wasting time to recompute the result at each guest CPUID call), so we need to distinguish between host and guest CPUID APIs.
  2. Since the CPU feature set should never change for a given host, the uncached variant of the host CPUID query API should be rarely used and can have a long name. Meanwhile, we assign the short name to the more common, cached variant.

What do you think?

This comment has been minimized.

@AlexAltea

AlexAltea Jun 15, 2018

Contributor

I fully agree with the names you suggested.

Regarding the 2nd point, although it's true, I don't think these names are too important. After all, these functions are going to be wrapped by cpu.c, so the actual functions that will appear all over the codebase are cpu_has_feature (for host features), and maybe at some point in the future vcpu_has_feature (for guest features).

@AlexAltea AlexAltea force-pushed the AlexAltea:cpuid branch from 70ca3ad to f25bdda Jun 15, 2018

@AlexAltea

This comment has been minimized.

Copy link
Contributor

AlexAltea commented Jun 15, 2018

All fixed (@ dfa8810) and all pending work is finished, so this PR is ready for the final review. I will update things if requested: Once you give green-light to the patch, I'll rewrite the Git history into the following two commits for clarity and to avoid breaking git-bisect:

  1. Added x86 feature definitions and new CPUID interface.
  2. Updated host/guest CPUID feature management.

As side note: This patch (conditionally) enables AESNI, thus fixing #43.

@AlexAltea AlexAltea force-pushed the AlexAltea:cpuid branch 3 times, most recently from cd7cc07 to a13aad0 Jun 15, 2018

@raphaelning
Copy link
Contributor

raphaelning left a comment

Thanks! We are having a long weekend in China, so I'll have to finish the review tomorrow. I've read all your comments and I agree with everything you said.

uint32_t subleaf_used : 1;
uint32_t reg : 2;
uint32_t bit : 5;
uint32_t /*reserved*/ : 8;

This comment has been minimized.

@raphaelning

raphaelning Jun 18, 2018

Contributor

8 => 7

@raphaelning
Copy link
Contributor

raphaelning left a comment

The code looks very good, and I only caught one small issue during the review.

I also tried to compile the PR on Mac and ran into a few issues there. I was able to fix them, and I understand that it must be a pain for you to run tests on Mac ;-) So let me just share my patches with you, which you can then incorporate into your final patch set.

core/cpu.c Outdated
.initialized = 0
};
if (!cache.initialized) {
cpuid_host_init(&cache);

This comment has been minimized.

@raphaelning

raphaelning Jun 19, 2018

Contributor

nit: Is there still a race condition, when two host CPUs both see the cache as uninitialized and try to fill it at the same time? This may not cause a problem, since they only write to the cache and should write the same data. Still, it would be nice if only one CPU performed the initialization. C++11 actually guarantees that static local initializers are only called once:

https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables

I'm not sure if we can take advantage of that, because this is C code, and our compilers may not support or be configured to use C++11.

A simple fix would be to call cpu_has_feature() with an arbitrary feature very early, e.g. before this call:

haxm/core/hax.c

Line 543 in 2c0ac6c

ret = hax_vmx_init();

so as to force CPU cache initialization in a non-smp_call_function() context.

@raphaelning

This comment has been minimized.

Copy link
Contributor

raphaelning commented Jun 19, 2018

haxm-PR63-mac-build-fix.zip

The first patch is not really related to this PR, but my second patch relies on it, so I've also created a separate PR for it: #65. I'm not sure what works best for you: I could merge it into master and then you rebase your work on top of it, or you could pick it into your final patch set for this PR.

The second patch should probably be incorporated into one of your final patches for this PR.

@AlexAltea

This comment has been minimized.

Copy link
Contributor

AlexAltea commented Jun 19, 2018

I applied your second patch 0002-darwin-Fix-CPUID-module-build.patch, since it's relevant to this PR. The other one can be merged via #65 (maybe will require rebasing).

The race condition is fixed by adding a cpu_init_feature_cache function, since I felt that calling cpu_has_feature for no reason was not good enough.

The Git history for this PR is now cleaned too.

@AlexAltea AlexAltea force-pushed the AlexAltea:cpuid branch from 1946f80 to e9f8c89 Jun 19, 2018

@raphaelning
Copy link
Contributor

raphaelning left a comment

The other one can be merged via #65 (maybe will require rebasing).

No problem! I'll rebase #65 after merging this PR.

The race condition is fixed by adding a cpu_init_feature_cache function, since I felt that calling cpu_has_feature for no reason was not good enough.

Cool, that works.

We should be able to merge this PR tomorrow morning after regular testing :-)

@wcwang wcwang merged commit 7279e2d into intel:master Jun 20, 2018

@AlexAltea AlexAltea deleted the AlexAltea:cpuid branch Jun 20, 2018

@AlexAltea AlexAltea referenced this pull request Oct 19, 2018

Open

Support AES-NI #43

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment