Skip to content

Commit 35053cd

Browse files
sean-jcgregkh
authored andcommitted
KVM: x86: Defer non-architectural deliver of exception payload to userspace read
commit d0ad1b0 upstream. When attempting to play nice with userspace that hasn't enabled KVM_CAP_EXCEPTION_PAYLOAD, defer KVM's non-architectural delivery of the payload until userspace actually reads relevant vCPU state, and more importantly, force delivery of the payload in *all* paths where userspace saves relevant vCPU state, not just KVM_GET_VCPU_EVENTS. Ignoring userspace save/restore for the moment, delivering the payload before the exception is injected is wrong regardless of whether L1 or L2 is running. To make matters even more confusing, the flaw *currently* being papered over by the !is_guest_mode() check isn't even the same bug that commit da998b4 ("kvm: x86: Defer setting of CR2 until #PF delivery") was trying to avoid. At the time of commit da998b4, KVM didn't correctly handle exception intercepts, as KVM would wait until VM-Entry into L2 was imminent to check if the queued exception should morph to a nested VM-Exit. I.e. KVM would deliver the payload to L2 and then synthesize a VM-Exit into L1. But the payload was only the most blatant issue, e.g. waiting to check exception intercepts would also lead to KVM incorrectly escalating a should-be-intercepted #PF into a #DF. That underlying bug was eventually fixed by commit 7709aba ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time"), but in the interim, commit a06230b ("KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTS") came along and subtly added another dependency on the !is_guest_mode() check. While not recorded in the changelog, the motivation for deferring the !exception_payload_enabled delivery was to fix a flaw where a synthesized MTF (Monitor Trap Flag) VM-Exit would drop a pending #DB and clobber DR6. On a VM-Exit, VMX CPUs save pending #DB information into the VMCS, which is emulated by KVM in nested_vmx_update_pending_dbg() by grabbing the payload from the queue/pending exception. I.e. prematurely delivering the payload would cause the pending #DB to not be recorded in the VMCS, and of course, clobber L2's DR6 as seen by L1. Jumping back to save+restore, the quirked behavior of forcing delivery of the payload only works if userspace does KVM_GET_VCPU_EVENTS *before* CR2 or DR6 is saved, i.e. before KVM_GET_SREGS{,2} and KVM_GET_DEBUGREGS. E.g. if userspace does KVM_GET_SREGS before KVM_GET_VCPU_EVENTS, then the CR2 saved by userspace won't contain the payload for the exception save by KVM_GET_VCPU_EVENTS. Deliberately deliver the payload in the store_regs() path, as it's the least awful option even though userspace may not be doing save+restore. Because if userspace _is_ doing save restore, it could elide KVM_GET_SREGS knowing that SREGS were already saved when the vCPU exited. Link: https://lore.kernel.org/all/20200207103608.110305-1-oupton@google.com Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: stable@vger.kernel.org Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Tested-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20260218005438.2619063-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1 parent f3deabe commit 35053cd

1 file changed

Lines changed: 39 additions & 23 deletions

File tree

arch/x86/kvm/x86.c

Lines changed: 39 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -695,9 +695,6 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
695695
vcpu->arch.exception.error_code = error_code;
696696
vcpu->arch.exception.has_payload = has_payload;
697697
vcpu->arch.exception.payload = payload;
698-
if (!is_guest_mode(vcpu))
699-
kvm_deliver_exception_payload(vcpu,
700-
&vcpu->arch.exception);
701698
return;
702699
}
703700

@@ -5147,18 +5144,8 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
51475144
return 0;
51485145
}
51495146

5150-
static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
5151-
struct kvm_vcpu_events *events)
5147+
static struct kvm_queued_exception *kvm_get_exception_to_save(struct kvm_vcpu *vcpu)
51525148
{
5153-
struct kvm_queued_exception *ex;
5154-
5155-
process_nmi(vcpu);
5156-
5157-
#ifdef CONFIG_KVM_SMM
5158-
if (kvm_check_request(KVM_REQ_SMI, vcpu))
5159-
process_smi(vcpu);
5160-
#endif
5161-
51625149
/*
51635150
* KVM's ABI only allows for one exception to be migrated. Luckily,
51645151
* the only time there can be two queued exceptions is if there's a
@@ -5169,21 +5156,46 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
51695156
if (vcpu->arch.exception_vmexit.pending &&
51705157
!vcpu->arch.exception.pending &&
51715158
!vcpu->arch.exception.injected)
5172-
ex = &vcpu->arch.exception_vmexit;
5173-
else
5174-
ex = &vcpu->arch.exception;
5159+
return &vcpu->arch.exception_vmexit;
5160+
5161+
return &vcpu->arch.exception;
5162+
}
5163+
5164+
static void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu)
5165+
{
5166+
struct kvm_queued_exception *ex = kvm_get_exception_to_save(vcpu);
51755167

51765168
/*
5177-
* In guest mode, payload delivery should be deferred if the exception
5178-
* will be intercepted by L1, e.g. KVM should not modifying CR2 if L1
5179-
* intercepts #PF, ditto for DR6 and #DBs. If the per-VM capability,
5180-
* KVM_CAP_EXCEPTION_PAYLOAD, is not set, userspace may or may not
5181-
* propagate the payload and so it cannot be safely deferred. Deliver
5182-
* the payload if the capability hasn't been requested.
5169+
* If KVM_CAP_EXCEPTION_PAYLOAD is disabled, then (prematurely) deliver
5170+
* the pending exception payload when userspace saves *any* vCPU state
5171+
* that interacts with exception payloads to avoid breaking userspace.
5172+
*
5173+
* Architecturally, KVM must not deliver an exception payload until the
5174+
* exception is actually injected, e.g. to avoid losing pending #DB
5175+
* information (which VMX tracks in the VMCS), and to avoid clobbering
5176+
* state if the exception is never injected for whatever reason. But
5177+
* if KVM_CAP_EXCEPTION_PAYLOAD isn't enabled, then userspace may or
5178+
* may not propagate the payload across save+restore, and so KVM can't
5179+
* safely defer delivery of the payload.
51835180
*/
51845181
if (!vcpu->kvm->arch.exception_payload_enabled &&
51855182
ex->pending && ex->has_payload)
51865183
kvm_deliver_exception_payload(vcpu, ex);
5184+
}
5185+
5186+
static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
5187+
struct kvm_vcpu_events *events)
5188+
{
5189+
struct kvm_queued_exception *ex = kvm_get_exception_to_save(vcpu);
5190+
5191+
process_nmi(vcpu);
5192+
5193+
#ifdef CONFIG_KVM_SMM
5194+
if (kvm_check_request(KVM_REQ_SMI, vcpu))
5195+
process_smi(vcpu);
5196+
#endif
5197+
5198+
kvm_handle_exception_payload_quirk(vcpu);
51875199

51885200
memset(events, 0, sizeof(*events));
51895201

@@ -5364,6 +5376,8 @@ static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
53645376
{
53655377
unsigned long val;
53665378

5379+
kvm_handle_exception_payload_quirk(vcpu);
5380+
53675381
memset(dbgregs, 0, sizeof(*dbgregs));
53685382
memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db));
53695383
kvm_get_dr(vcpu, 6, &val);
@@ -11396,6 +11410,8 @@ static void __get_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
1139611410
if (vcpu->arch.guest_state_protected)
1139711411
goto skip_protected_regs;
1139811412

11413+
kvm_handle_exception_payload_quirk(vcpu);
11414+
1139911415
kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
1140011416
kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
1140111417
kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);

0 commit comments

Comments
 (0)