
Commit 3377a92

KVM: x86: Load guest/host XCR0 and XSS outside of the fastpath run loop
Move KVM's swapping of XFEATURE masks, i.e. XCR0 and XSS, out of the fastpath loop now that the guts of the #MC handler run in task context, i.e. won't invoke schedule() with preemption disabled and clobber state (or crash the kernel) due to trying to context switch XSTATE with a mix of host and guest state.

For all intents and purposes, this reverts commit 1811d97 ("x86/kvm: move kvm_load/put_guest_xcr0 into atomic context"), which papered over an egregious bug/flaw in the #MC handler where it would do schedule() even though IRQs are disabled. E.g. the call stack from the commit:

  kvm_load_guest_xcr0
  ...
  kvm_x86_ops->run(vcpu)
    vmx_vcpu_run
      vmx_complete_atomic_exit
        kvm_machine_check
          do_machine_check
            do_memory_failure
              memory_failure
                lock_page

Commit 1811d97 "fixed" the immediate issue of XRSTORS exploding, but completely ignored that scheduling out a vCPU task while IRQs and preemption are disabled is wildly broken. Thankfully, commit 5567d11 ("x86/mce: Send #MC singal from task work") (somewhat incidentally?) fixed that flaw by pushing the meat of the work to the user-return path, i.e. to task context.

KVM has also hardened itself against #MC goofs by moving #MC forwarding to kvm_x86_ops.handle_exit_irqoff(), i.e. out of the fastpath. While that's by no means a robust fix, restoring as much state as possible before handling the #MC will hopefully provide some measure of protection in the event that #MC handling goes off the rails again.

Note, KVM always intercepts XCR0 writes for vCPUs without protected state, e.g. there's no risk of consuming a stale XCR0 when determining if a PKRU update is needed; kvm_load_host_xfeatures() only reads, and never writes, vcpu->arch.xcr0.

Deferring the XCR0 and XSS loads shaves ~300 cycles off the fastpath for Intel, and ~500 cycles for AMD. E.g. using INVD in KVM-Unit-Tests' vmexit.c, with an extra hack to enable CR4.OSXSAVE, latency numbers for AMD Turin go from ~2000 => ~1500, and for Intel Emerald Rapids from ~1300 => ~1000.

Cc: Jon Kohler <jon@nutanix.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Jon Kohler <jon@nutanix.com>
Link: https://patch.msgid.link/20251030224246.3456492-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
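For readers skimming the diff below, here is a minimal sketch of the ordering this change establishes in vcpu_enter_guest(). The helper names (kvm_load_guest_xfeatures(), kvm_load_host_xfeatures(), handle_exit_irqoff()) come from the patch; the loop shape, the EXIT_FASTPATH_REENTER_GUEST check, and the call arguments are simplified assumptions for illustration, not the actual kernel code.

/*
 * Sketch only: simplified vcpu_enter_guest() ordering after this commit.
 * Helper names match the patch; the surrounding control flow is assumed.
 */
static void vcpu_enter_guest_ordering_sketch(struct kvm_vcpu *vcpu)
{
        fastpath_t exit_fastpath;

        /* Guest XCR0/XSS are now loaded once, before the inner run loop. */
        kvm_load_guest_xfeatures(vcpu);

        do {
                /* Fastpath re-entries no longer pay the XSETBV/WRMSR cost. */
                exit_fastpath = kvm_x86_call(vcpu_run)(vcpu, 0);
        } while (exit_fastpath == EXIT_FASTPATH_REENTER_GUEST);

        /* Host XCR0/XSS are restored before #MC forwarding ... */
        kvm_load_host_xfeatures(vcpu);

        /* ... so handle_exit_irqoff() runs with host xfeatures in place. */
        kvm_x86_call(handle_exit_irqoff)(vcpu);
}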
1 parent 8934c59 commit 3377a92

File tree: 1 file changed, +26 -13 lines
arch/x86/kvm/x86.c

Lines changed: 26 additions & 13 deletions
@@ -1219,20 +1219,40 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lmsw);
 
-void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
+static void kvm_load_guest_xfeatures(struct kvm_vcpu *vcpu)
 {
         if (vcpu->arch.guest_state_protected)
                 return;
 
         if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) {
-
                 if (vcpu->arch.xcr0 != kvm_host.xcr0)
                         xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
 
                 if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
                     vcpu->arch.ia32_xss != kvm_host.xss)
                         wrmsrq(MSR_IA32_XSS, vcpu->arch.ia32_xss);
         }
+}
+
+static void kvm_load_host_xfeatures(struct kvm_vcpu *vcpu)
+{
+        if (vcpu->arch.guest_state_protected)
+                return;
+
+        if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) {
+                if (vcpu->arch.xcr0 != kvm_host.xcr0)
+                        xsetbv(XCR_XFEATURE_ENABLED_MASK, kvm_host.xcr0);
+
+                if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
+                    vcpu->arch.ia32_xss != kvm_host.xss)
+                        wrmsrq(MSR_IA32_XSS, kvm_host.xss);
+        }
+}
+
+void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
+{
+        if (vcpu->arch.guest_state_protected)
+                return;
 
         if (cpu_feature_enabled(X86_FEATURE_PKU) &&
             vcpu->arch.pkru != vcpu->arch.host_pkru &&
@@ -1254,17 +1274,6 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
                 if (vcpu->arch.pkru != vcpu->arch.host_pkru)
                         wrpkru(vcpu->arch.host_pkru);
         }
-
-        if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) {
-
-                if (vcpu->arch.xcr0 != kvm_host.xcr0)
-                        xsetbv(XCR_XFEATURE_ENABLED_MASK, kvm_host.xcr0);
-
-                if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
-                    vcpu->arch.ia32_xss != kvm_host.xss)
-                        wrmsrq(MSR_IA32_XSS, kvm_host.xss);
-        }
-
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_load_host_xsave_state);
 
@@ -11314,6 +11323,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
         if (vcpu->arch.guest_fpu.xfd_err)
                 wrmsrq(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
 
+        kvm_load_guest_xfeatures(vcpu);
+
         if (unlikely(vcpu->arch.switch_db_regs &&
                      !(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH))) {
                 set_debugreg(DR7_FIXED_1, 7);
@@ -11400,6 +11411,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
         vcpu->mode = OUTSIDE_GUEST_MODE;
         smp_wmb();
 
+        kvm_load_host_xfeatures(vcpu);
+
         /*
          * Sync xfd before calling handle_exit_irqoff() which may
          * rely on the fact that guest_fpu::xfd is up-to-date (e.g.
