Proposal: Remove 32-bit support from the v1 Zcheri standards #294

Open
davidchisnall opened this issue Jun 24, 2024 · 0 comments

davidchisnall commented Jun 24, 2024

Since 2010, almost all of the research on CHERI has revolved around 64-bit architectures: CHERI MIPS, Morello, and CHERI RISC-V prototypes. Morello is a superset architecture with a (relatively) high-performance implementation. It provided the most useful feedback and has over 100 MLoC running on it in pure-capability mode. We are now in a position where we can quite confidently make some claims about the minimum necessary subset of CHERI for some useful properties on 64-bit systems.

RISC-V is not a research architecture

RISC-V is explicitly not a research ISA. It is an ISA that can enable research, but the official ISA specification is intended to embody only validated research outputs. 64-bit CHERI meets this requirement. I believe CHERIoT meets it too, but only in the context of embedded systems. It does not meet it in the context of application cores, where the constraints are different.

The non-CHERIoT 32-bit CHERI software story is incomplete

64-bit CHERI has a mature FreeBSD port, at least three Linux ports of varying maturity that take different approaches, ports of FreeRTOS, and several clean-slate operating systems. CHERIoT has a clean-slate OS that was co-designed with the ISA and a (not production-quality) port of FreeRTOS. For a 32-bit CHERI, there is no port of any *NIX system, and there are a number of technical obstacles to creating one.

32-bit and 64-bit CHERI are unlikely to coexist in a single processor

RV32 is a subset of RV64, in part, to allow RV64 cores to run RV32 binaries. This approach makes it easy for a single core to run 32- and 64-bit programs and even operating systems. It makes it easy for a 64-bit OS to provide a 32-bit compatibility layer to run 32-bit programs.

A 64-bit CHERI core will require a tag bit per 128 bits of memory. A 32-bit CHERI core requires a tag bit per 64 bits of memory. A system that supports both has five states for a 128-bit word:

  • Not a capability.
  • 64-bit capability, data.
  • Data, 64-bit capability.
  • Two 64-bit capabilities.
  • One 128-bit capability.

This does not fit in two tag bits. Because 5 is not a power of two, no number of granules packs these states into a whole number of tag bits without waste. A 128-byte cache line (eight 128-bit granules) has 5^8 possible states. It is therefore possible to imagine a memory subsystem with 19 bits of tag per cache line and some wasted space. However, this is more than twice the space overhead of the 8 tag bits required for a pure 64-bit system, and it comes with a lot of complexity. Similarly, it is possible to imagine a system with a 2-bit tag per 128-bit granule, where the 11 state means either two 64-bit capabilities or one 128-bit capability depending on a page-table permission. This would allow 32-bit and 64-bit CHERI processes to coexist (without sharing pages that contain pointers). It would have software implications for the kernel, though, and would need to be the subject of more research.
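The packing arithmetic can be checked with a few lines of Python. The sketch below counts the bits needed to encode every combination of the five per-granule states. For eight 128-bit granules (a 128-byte line) it gives 19 bits against 8 for a pure 64-bit system. For a 64-byte line (four granules) it gives 10 bits against 4. The `decode_2bit_tag` function is a hypothetical illustration of the 2-bit alternative. Its bit ordering (bit 0 tags the lower half) and the per-page flag that disambiguates `0b11` are assumptions, not part of any specification.

```python
STATES_PER_GRANULE = 5  # the five states listed above, per 128-bit granule


def min_tag_bits(granules: int, states: int = STATES_PER_GRANULE) -> int:
    """Fewest bits that can encode every combination of per-granule states."""
    combinations = states ** granules
    # Encoding the values 0 .. combinations-1 needs this many bits.
    return (combinations - 1).bit_length()


def decode_2bit_tag(tag: int, page_has_128bit_caps: bool) -> str:
    """Hypothetical 2-bit tag per 128-bit granule.

    Bit 0 tags the lower 64-bit half and bit 1 the upper half (an assumed
    ordering); the meaning of 0b11 comes from an assumed page-table permission.
    """
    if tag == 0b00:
        return "not a capability"
    if tag == 0b01:
        return "64-bit capability, data"
    if tag == 0b10:
        return "data, 64-bit capability"
    if tag == 0b11:
        return "one 128-bit capability" if page_has_128bit_caps else "two 64-bit capabilities"
    raise ValueError(f"not a 2-bit tag: {tag}")


for line_bytes in (64, 128):
    granules = line_bytes * 8 // 128
    mixed = min_tag_bits(granules)
    pure = granules  # one tag bit per 128-bit granule in a pure 64-bit system
    print(f"{line_bytes}-byte line: {mixed} tag bits (mixed) vs {pure} (pure 64-bit)")
```

Both line sizes need more than twice the tag storage of a pure 64-bit system, before counting the cost of the encode and decode logic.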

As such, there is no reason to align the 32- and 64-bit specifications strongly, and this kind of interoperability is already an explicit non-goal of the Zcheri specification. There is no legacy 32-bit pure-capability software that we need to support on 64-bit systems.

32-bit requires a new page-table format to run *NIX

All of the existing *NIX work relies on being able to restrict which pages can store capabilities. For example, memory-mapped files may not be able to persist capabilities. Without such a restriction, capabilities would silently become untagged at unspecified points, whenever a memory-mapped page left and re-entered the buffer cache, which would be very confusing. Similarly, shared-memory pages between disjoint address spaces must not carry capabilities. Otherwise, one process could place capabilities there that let the other process violate its own internal memory-safety guarantees.
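A per-page restriction of this kind can be modelled in a few lines. The sketch below is hypothetical: the names `Page`, `may_store_capabilities`, `store`, and `CapabilityStoreFault` are illustrative and not taken from any specification. It shows that with a per-page permission, a tagged capability stored to a forbidden page either traps or loses its tag at the store itself. The loss therefore happens deterministically, rather than at some unspecified later point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    address: int
    tag: bool = True  # the out-of-band validity tag


@dataclass
class Page:
    # Hypothetical per-page permission: may tagged capabilities be stored here?
    may_store_capabilities: bool


class CapabilityStoreFault(Exception):
    """Stands in for the page fault a core might raise instead of dropping a tag."""


def store(page: Page, value: Capability, trap: bool = True) -> Capability:
    """Return what actually reaches memory when `value` is stored to `page`."""
    if not value.tag or page.may_store_capabilities:
        return value
    if trap:
        # Policy 1: fault and let the OS decide how to handle the store.
        raise CapabilityStoreFault(hex(value.address))
    # Policy 2: strip the tag at the store, deterministically.
    return Capability(value.address, tag=False)


private = Page(may_store_capabilities=True)
shared = Page(may_store_capabilities=False)
print(store(private, Capability(0x1000)))
print(store(shared, Capability(0x1000), trap=False))
```

Which of the two policies a real design should use is a separate question. The sketch only shows that either one needs a bit per page to decide.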

The Sv32 page table specification uses all of its bits in the base specification and so cannot express this.
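To make the Sv32 constraint concrete, the sketch below lists the Sv32 page-table entry fields as defined in the RISC-V privileged specification and checks that every bit position is assigned. The only bits hardware does not interpret are the two RSW bits, which are reserved for supervisor software. A hardware "may store capabilities" bit therefore has nowhere to go without taking bits from software or defining a new format. Sv39 and the larger formats, by contrast, reserve upper PTE bits for future standard extensions.

```python
# Sv32 page-table entry layout (RISC-V privileged architecture):
# (field name, most significant bit, least significant bit)
SV32_PTE_FIELDS = [
    ("PPN[1]", 31, 20),
    ("PPN[0]", 19, 10),
    ("RSW", 9, 8),  # reserved for supervisor software; ignored by hardware
    ("D", 7, 7),
    ("A", 6, 6),
    ("G", 5, 5),
    ("U", 4, 4),
    ("X", 3, 3),
    ("W", 2, 2),
    ("R", 1, 1),
    ("V", 0, 0),
]


def unassigned_bits(fields, width=32):
    """Return the PTE bit positions that no field claims."""
    used = set()
    for name, msb, lsb in fields:
        bits = set(range(lsb, msb + 1))
        if used & bits:
            raise ValueError(f"field {name} overlaps another field")
        used |= bits
    return sorted(set(range(width)) - used)


print("unassigned Sv32 bits:", unassigned_bits(SV32_PTE_FIELDS))
```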

It might be possible (future research is needed) to use capability permissions for this, but only if the virtual memory subsystem could revoke capabilities when memory is remapped (see below).

32-bit requires a new page-table format for temporal safety with an MMU

Revocation on systems with MMUs requires at least two page-table bits to track the revoker state. The lack of free PTE bits in Sv32 makes this impossible. This means that there is no clear path to temporal safety on 32-bit systems with an MMU.
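The text does not say which two bits are needed. Purely as an illustrative assumption, loosely in the style of load-barrier revocation designs, the sketch below gives each page two hypothetical bits:

  • A "capability dirty" bit, set when a capability is stored to the page, so the revoker can skip pages that never held capabilities.
  • A one-bit "load generation", compared against a global epoch, so that the first capability load from a page not yet swept in the current epoch traps.

The names `Page`, `cap_dirty`, `load_gen`, `Revoker`, and `RevocationTrap` are illustrative only.

```python
from dataclasses import dataclass


class RevocationTrap(Exception):
    """Stands in for the page fault that hands a page to the revoker."""


@dataclass
class Page:
    cap_dirty: bool = False  # hypothetical PTE bit 1: a capability was stored here
    load_gen: int = 0        # hypothetical PTE bit 2: epoch parity at the last sweep


class Revoker:
    def __init__(self) -> None:
        self.epoch = 0  # global one-bit generation

    def store_capability(self, page: Page) -> None:
        if not page.cap_dirty:
            # A clean page held no capabilities, so it counts as already swept.
            page.load_gen = self.epoch
        page.cap_dirty = True

    def load_capability(self, page: Page) -> None:
        # Loading a capability from a page that may hold capabilities and has
        # not been swept in the current epoch traps so the revoker can sweep it.
        if page.cap_dirty and page.load_gen != self.epoch:
            raise RevocationTrap()

    def begin_epoch(self) -> None:
        # Flipping the global bit makes every capability-holding page stale at once.
        self.epoch ^= 1

    def sweep(self, page: Page) -> None:
        # A real revoker would invalidate capabilities to freed memory here.
        page.load_gen = self.epoch
```

Both bits live in the PTE so that the check happens during address translation, which is why Sv32's lack of free bits matters here.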

32-bit cores are specialised

Any ISA will, either explicitly or implicitly, favour implementations at different scales. Arm's LDM and STM instructions, for an extreme example, are easy to implement on simple in-order pipelines with no MMU but become incredibly painful in the general case on large superscalar pipelines, especially when they can fault in the middle.

The target for a 64-bit CHERI spec is likely to be application cores that, at least, support register renaming, will typically have long pipelines (5+, often 10+ stages), and will mostly be superscalar out-of-order implementations. A 32-bit specification that supports application cores needs to scale down to 2–3-stage in-order implementations and up to 10+ stage out-of-order implementations.

With CHERIoT, we intentionally reduced the scope:

  • We aim to optimise for 2–7-stage pipelines.
  • We aim to support low core counts (1–4).
  • We aim to support smallish fast memory.

This is fine for a microcontroller running an RTOS. It does not lead to the same set of choices that you would make for something that wanted to compete in the same space as a Cortex-A15 running Linux. For example:

  • Our revocation model should work nicely for systems with up to around 16 MiB of fast SRAM, but would not be ideal for systems with shorter pipelines and slower RAM.
  • We reduce the shift of AUIPCC by one to remove the need for out-of-bounds representability. This restricts the size of any compartment to 2 GiB of code. This is probably fine for any real system (I am aware of only one codebase in the world where this would be a problem), but it is a divergence from RV32 that may cause problems on cores that want to support both CHERI and non-CHERI modes.
  • We have sentries to control interrupt state, which are very easy to implement on in-order cores and which would be incredibly painful on a pipeline with a lot of speculative execution.
  • We are able to disallow W&X capabilities by construction, because we do not have to support APIs such as mmap / VirtualAlloc that imply this conflation.
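The 2 GiB figure in the AUIPCC bullet follows from the immediate arithmetic. In RISC-V, AUIPC adds a 20-bit signed immediate, shifted left by 12, to the PC. The sketch below is purely arithmetic and assumes AUIPCC keeps the same 20-bit immediate with the shift reduced to 11, as the bullet describes. It computes the reachable window for both shifts. It does not model how CHERIoT uses the change to avoid out-of-bounds representability.

```python
U_IMM_BITS = 20  # width of the signed U-type immediate used by AUIPC/AUIPCC
GiB = 1 << 30


def auipc_reach(shift: int, imm_bits: int = U_IMM_BITS) -> tuple[int, int]:
    """Most negative and most positive PC-relative offsets one AUIPC(C) can add."""
    min_imm = -(1 << (imm_bits - 1))
    max_imm = (1 << (imm_bits - 1)) - 1
    return min_imm << shift, max_imm << shift


def window_size(shift: int) -> int:
    """Total size of the address window reachable from a given PC."""
    lo, hi = auipc_reach(shift)
    return hi - lo + (1 << shift)


for shift in (12, 11):  # standard RV32 AUIPC, then the reduced shift
    print(f"shift {shift}: {window_size(shift) // GiB} GiB window")
```

Halving the shift halves the window from 4 GiB to 2 GiB.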

These are all good design choices for something that aims to support microcontrollers and an RTOS. They are not the right choices for a superscalar pipeline running Linux. A 32-bit specification that aims to do both would add a lot of complexity, which translates directly into design-verification (DV) costs. It would do so without a clear benefit, since there is no mature software stack for the proposed 32-bit CHERI spec.
