RISC-V - Make updates to PTE accessed and dirty bits atomic #105
RISC-V - Make updates to PTE accessed and dirty bits atomic #105
Conversation
The RISC-V Privileged Specification requires that updates to PTE accessed and dirty bits are atomic. There is a small race window in the current code between reading the PTE and updating it. The only reasonable way to make updates atomic is to use atomic_cmpxchg. To use platform atomics, we need to translate the guest physical address to a host address. In the rare case that the atomic_cmpxchg of the PTE fails, it means the PTE is stale, and we must restart the walk. The PTE may longer be valid, the permissions may have changed or the PTE may even point to another physical page. The simplist approach to handling a conflict is to re-walk the page tables for the current translation. This is safe and fast as the likelihood the PTE is stale is rare. Maliciously crafted supervisor code could attempt to limit forward progress by hammering the PTE with updates, but it is likely the page table walker will eventually win the race. Frequent atomic update failures may exhibit as a pathological side-effect of a specially crafted supervisor page-walker denial of service attack. In the uncontended common case, translation overhead is the same as stl_phys/stq_phys. This code uses the QEMU address space translation API to lookup the memory region in the same manner as ldl_phys/ ldq_phys but elides operations if the PTEs are in IO space or non-writable RAM. This seems to be a reasonable behaviour for the typical case where PTEs are in RAM, and the atypical case where PTEs are in ROM or an IO device, where it is not possible for the updates to occur or be atomic.
|
@sorear @palmer-dabbelt This is an alternative to PR #103 I've tested it. You could try something like this. I don't think we'll see Linux hit the contended PTE case, nevertheless, I believe this is a correct fix. This change suppresses updates to PTEs in IO space or ROM. I might like to raise a clarification as to what should happen if a PTE is in ROM with the AD bits clear (the spec advises that the AD bits should be set by the supervisor to suppress updates, however, a supervisor could misconfigure PTEs in ROM or unusually point PTEs into IO space). We use the public guest physical to memory region API ( The previous code had some hacks (static variables or somesuch) to prevent PTEs outside RAM. This code is clean, as it detects the case where PTEs are not in RAM, and just elides the PTE AD bit updates. It has no measurable impact on performance as the uncontended case is similar to the present, except that we use Interested in your feedback. That said, it's likely rare. It would be nice to have a test for this... |
|
This does require 64-bit |
target/riscv/helper.c
Outdated
| } else { | ||
| pte = updated_pte; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an else here which fails the address translation if the PTE cannot be atomically updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing. When running qemu-system-riscv64 on i386 (or let's make this more interesting and say mips32/riscv32), multi-threading is automatically disabled (search the code for TCG_OVERSIZED_GUEST). So I think there should be an #ifdef TCG_OVERSIZED_GUEST here that falls back to a non-atomic update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. We can add a TRANSLATE_FAIL here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my first two comments addressed I'd be fine with this. A very slight concern is that none of the functions you're calling seem to handle TB invalidation or hardware watchpoints, although having either overlap a page table would be quite strange.
target/riscv/helper.c
Outdated
| target_ulong *pte_pa = | ||
| qemu_map_ram_ptr(mr->ram_block, addr1); | ||
| target_ulong old_pte = | ||
| atomic_cmpxchg(pte_pa, pte, updated_pte); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#ifdef TCG_OVERSIZED_GUEST
/* qemu-system-riscv64 on a host without 64-bit atomics. Multithreading is disabled, so do this naively */
*pte_pa = updated_pte;
#else
/* We have target_ulong-bit atomics */
target_ulong old_pte = atomic_cmpxchg(pte_pa, pte, updated_pte);
if (old_pte != pte) {
goto restart;
} else {
pte = updated_pte;
}
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Okay i'll merge it. It also fixes i386 build by making mip uint32_t so we can close PR #106 Tested on i386, x86_64 Linux and x86_64 macOS |
The RISC-V Privileged Specification requires that updates
to PTE accessed and dirty bits are atomic. There is a small
race window in the current code between reading the PTE and
updating it. The only reasonable way to make updates atomic
is to use atomic_cmpxchg. To use platform atomics, we need
to translate the guest physical address to a host address.
In the rare case that the atomic_cmpxchg of the PTE fails,
it means the PTE is stale, and we must restart the walk. The
PTE may longer be valid, the permissions may have changed
or the PTE may even point to another physical page. The
simplist approach to handling a conflict is to re-walk
the page tables for the current translation. This is safe
and fast as the likelihood the PTE is stale is rare.
Maliciously crafted supervisor code could attempt to limit
forward progress by hammering the PTE with updates, but it
is likely the page table walker will eventually win the race.
Frequent atomic update failures may exhibit as a pathological
side-effect of a specially crafted supervisor page-walker
denial of service attack. In the uncontended common case,
translation overhead is the same as stl_phys/stq_phys.
This code uses the QEMU address space translation API to
lookup the memory region in the same manner as ldl_phys/
ldq_phys but elides operations if the PTEs are in IO space
or non-writable RAM. This seems to be a reasonable behaviour
for the typical case where PTEs are in RAM, and the atypical
case where PTEs are in ROM or an IO device, where it is
not possible for the updates to occur or be atomic.