Skip to content
This repository has been archived by the owner on Apr 13, 2019. It is now read-only.

RISC-V - Make updates to PTE accessed and dirty bits atomic #105

Merged

Conversation

michaeljclark
Copy link
Collaborator

The RISC-V Privileged Specification requires that updates
to PTE accessed and dirty bits are atomic. There is a small
race window in the current code between reading the PTE and
updating it. The only reasonable way to make updates atomic
is to use atomic_cmpxchg. To use platform atomics, we need
to translate the guest physical address to a host address.

In the rare case that the atomic_cmpxchg of the PTE fails,
it means the PTE is stale, and we must restart the walk. The
PTE may longer be valid, the permissions may have changed
or the PTE may even point to another physical page. The
simplist approach to handling a conflict is to re-walk
the page tables for the current translation. This is safe
and fast as the likelihood the PTE is stale is rare.

Maliciously crafted supervisor code could attempt to limit
forward progress by hammering the PTE with updates, but it
is likely the page table walker will eventually win the race.
Frequent atomic update failures may exhibit as a pathological
side-effect of a specially crafted supervisor page-walker
denial of service attack. In the uncontended common case,
translation overhead is the same as stl_phys/stq_phys.

This code uses the QEMU address space translation API to
lookup the memory region in the same manner as ldl_phys/
ldq_phys but elides operations if the PTEs are in IO space
or non-writable RAM. This seems to be a reasonable behaviour
for the typical case where PTEs are in RAM, and the atypical
case where PTEs are in ROM or an IO device, where it is
not possible for the updates to occur or be atomic.

The RISC-V Privileged Specification requires that updates
to PTE accessed and dirty bits are atomic. There is a small
race window in the current code between reading the PTE and
updating it. The only reasonable way to make updates atomic
is to use atomic_cmpxchg. To use platform atomics, we need
to translate the guest physical address to a host address.

In the rare case that the atomic_cmpxchg of the PTE fails,
it means the PTE is stale, and we must restart the walk. The
PTE may longer be valid, the permissions may have changed
or the PTE may even point to another physical page. The
simplist approach to handling a conflict is to re-walk
the page tables for the current translation. This is safe
and fast as the likelihood the PTE is stale is rare.

Maliciously crafted supervisor code could attempt to limit
forward progress by hammering the PTE with updates, but it
is likely the page table walker will eventually win the race.
Frequent atomic update failures may exhibit as a pathological
side-effect of a specially crafted supervisor page-walker
denial of service attack. In the uncontended common case,
translation overhead is the same as stl_phys/stq_phys.

This code uses the QEMU address space translation API to
lookup the memory region in the same manner as ldl_phys/
ldq_phys but elides operations if the PTEs are in IO space
or non-writable RAM. This seems to be a reasonable behaviour
for the typical case where PTEs are in RAM, and the atypical
case where PTEs are in ROM or an IO device, where it is
not possible for the updates to occur or be atomic.
@michaeljclark
Copy link
Collaborator Author

@sorear @palmer-dabbelt This is an alternative to PR #103

I've tested it. You could try something like this.

diff --git a/target/riscv/helper.c b/target/riscv/helper.c
index a607f6a..d4efff4 100644
--- a/target/riscv/helper.c
+++ b/target/riscv/helper.c
@@ -213,6 +213,7 @@ restart:
                     target_ulong old_pte =
                         atomic_cmpxchg(pte_pa, pte, updated_pte);
                     if (old_pte != pte) {
+                        printf("rare\n");
                         goto restart;
                     } else {
                         pte = updated_pte;

I don't think we'll see Linux hit the contended PTE case, nevertheless, I believe this is a correct fix. This change suppresses updates to PTEs in IO space or ROM. I might like to raise a clarification as to what should happen if a PTE is in ROM with the AD bits clear (the spec advises that the AD bits should be set by the supervisor to suppress updates, however, a supervisor could misconfigure PTEs in ROM or unusually point PTEs into IO space).

We use the public guest physical to memory region API (address_space_translate) and (memory_access_is_direct) so that we can perform a regular atomic_cmpxchg as defined in ./docs/devel/atomics.txt. This is the same codepath as is used in stl_phys/stq_physso the performance will be the same.

The previous code had some hacks (static variables or somesuch) to prevent PTEs outside RAM. This code is clean, as it detects the case where PTEs are not in RAM, and just elides the PTE AD bit updates.

It has no measurable impact on performance as the uncontended case is similar to the present, except that we use atomic_cmpxchg instead of a stl_phys/stq_phys. It takes a different approach to PR #103 and instead makes PTE updates atomic. In the unlikely contended case, the page walker restarts the walk, which I believe is the most sensible approach, after researching the topic and looking into how other paging implementations handle concurrency and contention. i.e. in the rare case of an invalid PTE, we just restart the page table walk.

Interested in your feedback. That said, it's likely rare. It would be nice to have a test for this...

@michaeljclark
Copy link
Collaborator Author

This does require 64-bit atomic_cmpxchg on 32-bit hosts. This should be okay on x86 assuming QEMU maps atomic_cmpxchg on long long to CMPXCHG8B. I will test boot RV64 Linux on i386 qemu (it may crash and burn there... it worked last time I tested RV64 on i386 so hopefully we haven't regressed that)...

} else {
pte = updated_pte;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be an else here which fails the address translation if the PTE cannot be atomically updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing. When running qemu-system-riscv64 on i386 (or let's make this more interesting and say mips32/riscv32), multi-threading is automatically disabled (search the code for TCG_OVERSIZED_GUEST). So I think there should be an #ifdef TCG_OVERSIZED_GUEST here that falls back to a non-atomic update.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. We can add a TRANSLATE_FAIL here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my first two comments addressed I'd be fine with this. A very slight concern is that none of the functions you're calling seem to handle TB invalidation or hardware watchpoints, although having either overlap a page table would be quite strange.

target_ulong *pte_pa =
qemu_map_ram_ptr(mr->ram_block, addr1);
target_ulong old_pte =
atomic_cmpxchg(pte_pa, pte, updated_pte);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#ifdef TCG_OVERSIZED_GUEST
                    /* qemu-system-riscv64 on a host without 64-bit atomics.  Multithreading is disabled, so do this naively */
                    *pte_pa = updated_pte;
#else
                    /* We have target_ulong-bit atomics */
                    target_ulong old_pte = atomic_cmpxchg(pte_pa, pte, updated_pte);
                    if (old_pte != pte) {
                        goto restart;
                    } else {
                        pte = updated_pte;
                    }
#endif

Copy link
Contributor

@sorear sorear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@michaeljclark
Copy link
Collaborator Author

Okay i'll merge it. It also fixes i386 build by making mip uint32_t so we can close PR #106

Tested on i386, x86_64 Linux and x86_64 macOS

@michaeljclark michaeljclark merged commit e2f1651 into riscvarchive:qemu-upstream-v5 Feb 19, 2018
@michaeljclark michaeljclark deleted the atomic-pte-updates branch February 25, 2018 00:08
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants