Commit cc7ed33

KVM: x86/mmu: Always set SPTE's dirty bit if it's created as writable
When creating a SPTE, always set the Dirty bit if the Writable bit is set, i.e. if KVM is creating a writable mapping. If two (or more) vCPUs are racing to install a writable SPTE on a !PRESENT fault, only the "winning" vCPU will create a SPTE with W=1 and D=1, all "losers" will generate a SPTE with W=1 && D=0. As a result, tdp_mmu_map_handle_target_level() will fail to detect that the losing faults are effectively spurious, and will overwrite the D=1 SPTE with a D=0 SPTE.

For normal VMs, overwriting a present SPTE is a small performance blip; KVM blasts a remote TLB flush, but otherwise life goes on. For upcoming TDX VMs, overwriting a present SPTE is much more costly, and can even lead to the VM being terminated if KVM isn't careful, e.g. if KVM attempts TDH.MEM.PAGE.AUG because the TDX code doesn't detect that the new SPTE is actually the same as the old SPTE (which would be a bug in its own right).

Suggested-by: Sagi Shahar <sagis@google.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20241011021051.1557902-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
1 parent 0819769 commit cc7ed33

File tree

1 file changed: +9 −19 lines

arch/x86/kvm/mmu/spte.c

Lines changed: 9 additions & 19 deletions
@@ -219,30 +219,21 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (pte_access & ACC_WRITE_MASK) {
 		spte |= PT_WRITABLE_MASK | shadow_mmu_writable_mask;
 
-		/*
-		 * When overwriting an existing leaf SPTE, and the old SPTE was
-		 * writable, skip trying to unsync shadow pages as any relevant
-		 * shadow pages must already be unsync, i.e. the hash lookup is
-		 * unnecessary (and expensive).
-		 *
-		 * The same reasoning applies to dirty page/folio accounting;
-		 * KVM marked the folio dirty when the old SPTE was created,
-		 * thus there's no need to mark the folio dirty again.
-		 *
-		 * Note, both cases rely on KVM not changing PFNs without first
-		 * zapping the old SPTE, which is guaranteed by both the shadow
-		 * MMU and the TDP MMU.
-		 */
-		if (is_last_spte(old_spte, level) && is_writable_pte(old_spte))
-			goto out;
-
 		/*
 		 * Unsync shadow pages that are reachable by the new, writable
 		 * SPTE.  Write-protect the SPTE if the page can't be unsync'd,
 		 * e.g. it's write-tracked (upper-level SPs) or has one or more
 		 * shadow pages and unsync'ing pages is not allowed.
+		 *
+		 * When overwriting an existing leaf SPTE, and the old SPTE was
+		 * writable, skip trying to unsync shadow pages as any relevant
+		 * shadow pages must already be unsync, i.e. the hash lookup is
+		 * unnecessary (and expensive).  Note, this relies on KVM not
+		 * changing PFNs without first zapping the old SPTE, which is
+		 * guaranteed by both the shadow MMU and the TDP MMU.
 		 */
-		if (mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, synchronizing, prefetch)) {
+		if ((!is_last_spte(old_spte, level) || !is_writable_pte(old_spte)) &&
+		    mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, synchronizing, prefetch)) {
 			wrprot = true;
 			pte_access &= ~ACC_WRITE_MASK;
 			spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
@@ -252,7 +243,6 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (pte_access & ACC_WRITE_MASK)
 		spte |= spte_shadow_dirty_mask(spte);
 
-out:
 	if (prefetch && !synchronizing)
 		spte = mark_spte_for_access_track(spte);
 