# Capabilities, MMIO and Robust Safety

March 10, 2020

## 1 Memory Mapped I/O: Operational Semantics

The proposal is to simply represent read and writes to memory-mapped IO addresses as events in a trace. This says nothing a priori about devices that might be connected to these IO regions. In particular, without additional assumptions, reading a byte from a memory-mapped region just returns an arbitrary value.

If at some point we want to reason under the assumption that we are connected to a specific device that reacts in a specific way, we can express that as an extra Separation Logic assertion, that we assume as a pre-condition, and that restricts the set of different traces that we might observe.

 $\begin{array}{lll} \text{EventTy} & := & \text{IOWrite} \mid \text{IORead} \\ \text{Event} & := & \text{EventTy} \times \text{Addr} \times \mathbb{Z} \\ \text{Trace} & := & \text{list Event} \\ \text{State} & := & \underbrace{\text{Reg} \times \text{Mem}}_{\text{eldertets}} \times \text{Trace} \end{array}$ 

Values of type State represent the state of a configuration in the small-step operational semantics. In this setup, we assume the whole semantics to be parameterized by the range of memory-mapped addresses: MMIO.

$$\mathsf{MMIO} := [\mathsf{MMIO}_{\mathsf{b}}, \mathsf{MMIO}_{\mathsf{e}})$$

In the (current) operational semantics without MMIO, the operational semantics of the Load instruction is:

$$\frac{r[\operatorname{src}] = (p, g, b, e, a) \quad \operatorname{readAllowed} p \quad a \in [b, e)}{(r, m) \xrightarrow{\operatorname{Load} \operatorname{dst} \operatorname{src}} (r[\operatorname{dst} := m[a]], m)}$$

With MMIO, we obtain two rules for Load:

$$\frac{r[\operatorname{src}] = (p,g,b,e,a) \qquad \operatorname{readAllowed} \ p \qquad a \in [b,e) \qquad a \notin \mathsf{MMIO}}{(r,m,t) \xrightarrow{\mathsf{Load} \ \operatorname{dst} \ \operatorname{src}} (r[\operatorname{dst} := m[a]],m,t)}$$
 
$$\frac{\mathsf{IOLoad}}{r[\operatorname{src}] = (p,g,b,e,a)} \qquad \operatorname{readAllowed} \ p \qquad a \in [b,e) \qquad a \in \mathsf{MMIO}}{(r,m,t) \xrightarrow{\mathsf{Load} \ \operatorname{dst} \ \operatorname{src}} (r[\operatorname{dst} := x],m,t ++ (\operatorname{IORead},a,x))}$$

Notice how in the second rule, we read an arbitrary integer x, and record it in the trace.

The Store rule would be similar.

**Remark:** As Lars remarked, the read rule introduces non-determinism into our operational semantics. This has lead to some complications in the past (the nature of which is still mysterious to me; n.b. that we use

wp\_lift\_atomic\_head\_step\_no\_fork in our WP rules, and that one would allow non-determinism). (one question I-Thomas- had in this respect; does the Alloc rule in HeapLang not introduce non-determinism in a similar way? Why is this not problematic?) Lars' suggestion was to additionally parameterize the operational semantics by an input-stream in. The updated IOLOAD load rule might then look something like this:

IOLOAD

$$\frac{r[\operatorname{src}] = (p,g,b,e,a)}{\underset{(r,m,t) \xrightarrow{\operatorname{Load} \ \operatorname{dst} \ \operatorname{src}}}{\operatorname{readAllowed} \ p} \quad a \in [b,e) \quad a \in \operatorname{\mathsf{MMIO}} \quad n = \#\{i \mid t[i] = (\operatorname{IORead}, \_, \_)\}$$

When parameterizing our development with multiple values, we could either write them out as different Coq Parameters where we need them, or store them as fields in our DriverG typeclass (the latter is the approach taken in e.g. the code-base of [SGDL19]).

We now have two options when it comes to modeling the memory m:

- 1. Include the MMIO addresses into the map. Notice that, for any address in MMIO, the specific value held by the map m for that address is now irrelevant. Indeed, m is never read nor modified for addresses in MMIO (possible exception: see next remark). One could choose to enforce that a particular dummy value is stored in m for these addresses. Instead, we choose to leave them unconstrained.
- 2. Have m model the physical RAM addresses only. This model is closer to the conceptual model of state that we wish to implement, since it does not require us to talk about the values in the heap that MMIO locations are associated with. In case it is not harder to implement (which I currently do not think it is), it is to be preferred.

Both approaches require us to make sure that no points-to chunks  $l \mapsto \bot$  with  $l \in \mathsf{MMIO}$  can be created. In the next section, we assume the second approach (until someone proves me wrong about it working).

**Remark:** Additionally, we have the option of disallowing an execution with a PC register pointing to a MMIO location in the semantics. First off, consider that this is an edge case; since we always have points-to chunks describing the layout of code in memory (and a points-to chunk for location l doubles as a proof that  $l \notin \text{MMIO}$ ), it is trivial to prove that this case would not occur in any of our current examples. If we make no changes to the current code base, this behavior would be allowed, and the following would happen in both cases described above:

- 1. An unconstrained value is read from the location. Any instruction could be executed.
- 2. The predicate MemLocate in lang.v will make sure that the default value 0 is read and decoded into an instruction when we try to execute an MMIO location. This would not cause a conflict with the current WP rules, since a points-to chunk is required in the precondition, ensuring that this case does not occur. Another alternative is to read the encoding of a NOP instruction, instead of just the value 0.

If we were to simply disallow execution when the PC points into MMIO, we would make the semantics of all instructions slightly more complicated, but in a uniform way. We could simply add  $a \notin \mathsf{MMIO}$  to the predicate <code>isCorrectPC</code> (or add it as an additional predicate), which is used to define failing execution. It would again not be a drastic change, since a points-to for the address a the PC points to is required by the WP rules regardless, from which we can prove that  $a \notin \mathsf{MMIO}$ .

**Remark:** With this presentation, the specification of the "machine" is very much decoupled from the model of the devices it might be communicating with. This is a good thing, I think.

Nevertheless, one could consider an alternative presentation where the model of the devices is more tightly integrated with the semantics of the machine. For instance, one could make the operational semantics parameterized with the devices' model, where each device is modeled as having some internal state, the ability to react on reads or writes, or to perform an internal step. Then, the operational semantics would either step the usual way, or whenever a device steps.

I believe this would be somewhat similar to the semantics of I/O system calls through a foreign function interfaces as formalized in CakeML  $[FPK^+18]$ , and also in Perennial [CTKZ].

**Remark:** Alix says that the proposed style of operational semantics (using a trace) is close to the semantics of volatile as in CompCert—which is also how memory mapped addresses seem to be exposed to C compilers in practice. So this is probably a good sign.

## 2 Separation Logic Resources

We added a trace as part of the state, so we wish to also expose it as a Separation Logic assertion. Recall the current definition of the state interpretation:

```
state\_interp(r, m) := gen\_heap\_ctx r * gen\_heap\_ctx m
```

The simplest way to account for the trace is to directly expose it in a monolithic fashion. In that case, the state interpretation additionally holds a resource for the trace:

```
 \begin{aligned} \text{state\_interp} \; (r,m,t) := & \text{gen\_heap\_ctx} \; r* \\ & \text{gen\_heap\_ctx} \; m* \text{dom}(m) \cap \mathsf{MMIO} = \emptyset * \begin{bmatrix} \bullet \; t \end{bmatrix}^{\gamma_{\mathrm{T}}}  \end{aligned}
```

**Remark:** For the case where the MMIO locations are still part of the heap m, this would look as follows:

```
state_interp (r, m, t) := \text{gen\_heap\_ctx } r * \text{gen\_heap\_ctx } (m \setminus \mathsf{MMIO}) * [ \bullet t ]^{\gamma_{\mathrm{T}}}
```

Sanity check: we never have allocation of new locations l in our capability machine, but if we did, that would work in both settings by requiring that  $l \notin \mathsf{MMIO}$  in the operational semantics rule for Alloc.

Note that we restrict the usual "points-to" assertions to be used only for non-MMIO addresses. Instead, to assert ownership of the MMIO region, the user works with assertions of the form  $\left[ \underbrace{\circ t}_{-} \right]^{\gamma_{\mathrm{T}}}$ . Such an assertion is not duplicable, and grants full ownership over the trace. In particular, it allows one to *update* the trace by emitting events, i.e. by performing I/O operations.

The wp-rules for Load and Store need to be updated consequently. For instance, the rule for a Load reading the integer x on a MMIO address a now requires  $[ \underbrace{\circ t} ]^{\gamma_{\mathrm{T}}}$  in the pre-condition (for some t), and provides  $[ \underbrace{\circ t} + ( \underline{\mathrm{IORead}}, a, \underline{x} ]^{\gamma_{\mathrm{T}}}$  in the post-condition.

The corresponding resource algebra is Auth(Ex(Trace)). A few relevant rules are:

$$\begin{bmatrix} \bullet t \\ \bullet t \end{bmatrix}^{\gamma_{\mathrm{T}}} * \begin{bmatrix} \circ t' \\ \bullet t \end{bmatrix}^{\gamma_{\mathrm{T}}} - * \begin{bmatrix} t \\ \bullet t' \end{bmatrix}^{\gamma_{\mathrm{T}}} \\ \vdots \\ \bullet t \end{bmatrix}^{\gamma_{\mathrm{T}}} * \begin{bmatrix} \circ t' \\ \bullet t' \end{bmatrix}^{\gamma_{\mathrm{T}}} - * \mathsf{False} \\ \vdots \\ \bullet t \end{bmatrix}^{\gamma_{\mathrm{T}}} * \begin{bmatrix} \bullet t' \\ \bullet t' \end{bmatrix}^{\gamma_{\mathrm{T}}} * \begin{bmatrix} \circ t' \\ \bullet t' \end{bmatrix}^{\gamma_{\mathrm{T}}}$$

**Remark:** This is very coarse-grained: either one has the full ownership for performing I/O and reasoning about it, or one cannot know anything about it.

A first extension could be to allow observing prefixes of the trace (since events can only be appended to the trace). The observation that the trace has a given prefix would be duplicable. This again seems to be an application of monotonicity, that could be realized by having a duplicable AtLeast part as part of  $\lceil \bar{c} \, \bar{t'} \rceil^{\gamma_T}$ , similarly to how we currently model Monotone References.

Another extension, that seems very useful for modular reasoning, would be to allow splitting the trace along separate range of addresses. Concretely, a trace containing events about the range of MMIO addresses [a,c) could be split into two traces, granting ownership over events on addresses [a,b) and [b,c) respectively. Note that recombining these two traces would only yield some unspecified interleaving of the events from the two traces, and not necessarily yield the original trace, since in our model, we cannot know the exact interleaving of IO operations issued by different parties.

One application of this second extension could be the verification of an example involving "multiplexed" I/O, where two separate parts of the code are granted separate ownership over separate MMIO addresses. These two separate pieces of code would be verified separately; then, in the end, one could prove that one gets *some* interleaving of all the events emitted by both components.

**Remark:** As an alternative to having the set of MMIO addresses (MMIO) as a global parameter, we might want to make it part of the state in the operational semantics. In that case, we would need another resource algebra to model the MMIO locations, i.e. add do the state interpretation we had before:

$$\text{state\_interp}\;(r,m,t,\mathsf{MMIO}) := \ldots * \left[ \bullet (\overline{\mathrm{MMIO}}) \right]^{\gamma_{\mathrm{MMIO}}}$$

(where AUTH(Ex(Trace)) is the resource algebra).

Remark: In the future, we might be interested in the dynamic allocation of MMIO memory. The question is what this would mean, though, since the set of available devices allowing for MMIO access and the concrete buffers they provide will most likely still be modeled as fixed at runtime. Rather, this would allow us to take (un)mapped MMIO locations, connected to the different devices, and (un)map them anywhere in main memory. This would require a different way of modeling MMIO, where we need both a notion of the total pool of possible MMIO locations, and a description of the currently mapped locations. We could model this using an authoritative RA.

## 3 Toplevel Theorem (à la OCPL)

Let us start with a brief recap of the toplevel "Robust Safety" theorem for OCPL itself (and its key ingredients), then the extension (by Thomas) of OCPL to include a print capability, and finally move to the capability machine setting.

#### 3.1 OCPL

The RobustSafety theorem of OCPL is as follows:

$$\frac{C \in AdvCtx}{e \text{ closed } \qquad \{\mathsf{True}\}\,e\,\{x.\;\mathsf{lowval}\;x\} \qquad (C[e]); (\emptyset,\mathsf{OK}) \longrightarrow^* T'; (h',g')}{g' = \mathsf{OK}}$$

For any closed expression e, if e has been verified to only return low values, then running e wrapped in an adversarial context C from an initial state, then we can observe that every reachable state is good ( $g' = \mathsf{OK}$  means that no assertion has failed in e).

 $C \in AdvCtx$  means that C cannot contain assertions (otherwise one could trivially contradict the theorem by taking  $C[\cdot] = \text{assert false}$ ), and cannot contain references to raw memory locations (otherwise one could directly access e's private state, invalidating the local state encapsulation mechanisms).

lowval x intuitively means that x cannot be used to directly access private (or "high") locations. It is formally defined using a logical relation "lift  $\Psi$  v", which more generally asserts that the value v only gives direct access to locations that satisfy the predicate  $\Psi$ .

```
\begin{array}{ll} \operatorname{lift} \Psi \left(\operatorname{rec} f \; x. \; e\right) & \triangleq & \triangleright \forall v. \; \{\operatorname{lift} \Psi \; v\} \; e[v/x, \operatorname{rec} f \; x. \; e/f] \; \{y. \; \operatorname{lift} \Psi \; y\} \\ \operatorname{lift} \Psi \left(v_1, v_2\right) & \triangleq & \triangleright \left(\operatorname{lift} \Psi \; v_1, \operatorname{lift} \Psi \; v_2\right) \\ \operatorname{lift} \Psi \; \ell & \triangleq & \Psi \; \ell \end{array}
```

Then, lowval x is defined as lift lowloc x, where lowloc is a predicate characterizing the region of memory containing "low locations" (distinct from the other region containing "high locations").

### 3.2 OCPL with print

In Thomas' extension of OCPL, a new value Out is added, denoting an "output object capability", as well as a print primitive, where print Out v effectively "prints" the value v, i.e. adds it to the trace of printed values.

One then wants to be able to encapsulate the use of Out, for instance by defining object capabilities that enforce some invariants on the values being printed. In that setting, Out is considered as a "high" value:

lift 
$$\Psi$$
 Out  $\triangleq$  False

The theorem then becomes:

That is, if e has been verified under the assumption that the predicate P holds as a trace invariant, then executing e in an adversarial context yields a trace that does satisfy P.

 $C \in AdvCtx$  also has to be extended to forbid C from containing Out.

To be slightly more general, we could actually allow sharing Out in case the predicate P does not place any constraints on the output, i.e. it is the True predicate. Technically, we could hence have the following alternative definition for lift (although admittedly, it does look a bit hard-coded):

lift 
$$\Psi$$
 Out  $\triangleq \exists \iota. \left[ \exists t. \left[ \overbrace{\circ t} \right]^{\Upsilon T} * \lceil (\lambda ... \text{True}) t \rceil \right]^{t}$ 

### 3.3 Capability Machine with MMIO

### 3.3.1 Logical relation

Similarly to lift, we can generalize the existing logical relation to thread a constraint on directly accessible memory locations. One would parameterize the value, expression and register relations (V,  $\mathcal{E}$  and  $\mathcal{R}$ ) with a predicate  $\Psi$  on addresses.

Then, for any permission p that includes either the R or W bit,  $\mathcal V$  is extended as follows:

$$\mathcal{V}^{\Psi}(p,g,b,e,a) \triangleq \underbrace{\cdots}_{\text{as before}} * \lceil \forall a' \in [b,e). \ \Psi(a') \rceil$$

Then,  $\mathcal{E}^{\Psi}$  and  $\mathcal{R}^{\Psi}$  are simplify defined by threading the extra  $\Psi$  parameter through the existing definition.

Finally, for our relation to characterize "low values" that do not give direct access to MMIO addresses, one would instantiate  $\Psi$  with a predicate  $\overline{\mathsf{MMIO}}$  that excludes memory mapped addresses:

$$\overline{\mathsf{MMIO}}(a) \triangleq a \notin \mathsf{MMIO}$$

Then,  $\mathcal{V}^{\overline{\text{MMIO}}}$  is similar to the lowval predicate of OCPL. Intuitively, a value in the relation does not directly point to memory-mapped locations, nor can it gain access to memory-mapped locations indirectly, similarly to how lowvals cannot grant access to high locations directly. The relation  $\mathcal{E}^{\overline{\text{MMIO}}}$  specifies that, if we fill the registers with values in  $\mathcal{V}^{\overline{\text{MMIO}}}$ , then the invariants in a private future world of our starting world will be satisfied at the end of execution.

**Remark:** As it is defined above, the "generalized" value relation is in fact not very useful for instantiation of  $\Psi$  other than  $\overline{\text{MMIO}}$ . Indeed, even if  $\Psi$  does allow referring to addresses in the MMIO region, the value relation does not grant any corresponding resources. The fix would be to use the generalized trace resources mentioned previously, and grant ownership for the part of the trace corresponding to the addresses in  $[b,e) \cap \text{MMIO}_{\text{pub}}$  with MMIO<sub>pub</sub> the shareable subset of addresses in MMIO.

### 3.3.2 Robust safety theorem

A first attempt at adapting the OCPL robust safety theorem to the capability machine setting might look something like the following (incomplete!) statement.

$$\forall a_i \in A_{\text{entry}}. \overline{\left[\exists t. \left[ \circ t \right]^{\gamma_{\text{T}}} * P(t) \right]^{\iota}} \vdash \forall W. \left( \mathcal{E}^{\overline{\text{MMIO}}}(\text{RX}, g, b, e, a_i) \right) W$$

$$(r[\overline{r_{a_i}} := (E, g, b, e, a_i)], m, \emptyset) \longrightarrow^* (r', m', t)$$

$$P(t)$$

The code that is verified (similar to the expression e in OCPL) is here given as a set of *entrypoints* (addresses)  $A_{\text{entry}}$ . The capability (RX, g, b, e,  $a_i$ ) then points to the ith entry point of the code we want to gradually verify.

The capability  $(E, g, b, e, a_i)$  then represents a closure for this same piece of code. Adversarial code only gets access to the closure (as an enter capability), in order to avoid them from executing or reading arbitrary lines of code within the trusted module.

Notice the universal quantification over worlds in the precondition; this needs to be there, since we do not know in what world our code will be called.

**Remark:** We could simplify the theorem statement by only considering the case of a single entrypoint. This is equivalent: a piece of code that wants to expose two distinct "methods" can implement a single toplevel entrypoint that immediately returns two pointers that can then be used to invoke the two methods. This would make the toplevel theorem simpler, at the cost of somewhat more contrieved calling convention between trusted and untrusted

code. Another alternative is to provide an opcode to specify the operation that needs to be performed. This is slightly less general, since it does not allow providing read access only to an adversary.

Remark: The universal quantification on worlds was not explicitly present in the OCPL formalization. The reason for that is that (I think so at least, do not quote me on this - Thomas) they used the built-in Iris worlds within their definition of weakest precondition, by leveraging the state interpretation, which is universally quantified over in the definition of weakest precondition. As we discussed before, if we do away with local capabilities, it should be possible to fall back onto Iris' notion of worlds, which would allow us to remove the universal quantification in the above definition too because it would be implicitly present in the WP inside  $\mathcal{E}$ .

It would be interesting to have a discussion about how to do away with the worlds if we remove local capabilities, regardless of whether we will in the end, since that might shed some light on why the worlds look the way they do right now.

**Remark:** Note that we could have written ...  $\vdash \forall W. \ (\mathcal{V}^{\overline{\mathsf{MMIO}}}(E, ...)) \ W$  in the above theorem, instead of using the expression relation. This is equivalent, and easily unfolds to the former statement through the definition of the value relation. I wrote  $\mathcal{E}$  to place the focus on the execution of the driver being safe. I consider this a matter of taste.

The theorem above is *incomplete*. In other words, it is not provable as is, because it includes no notion of adversarial context (similar to OCPL's AdvCtx), and therefore imposes no restriction on the adversarial code (i.e., the contents of the r and m other than our verified code). As such, the statement can be trivially falsified by taking an attacker with direct read or write access to any location in MMIO, or with direct access to the trusted code's internal state.

We move to a theorem statement that correctly constrains the adversary in two steps:

- 1. We add a syntactic condition (on the state of memory and registers) to the tentative statement of our robust safety theorem, ensuring that the adversary code cannot gain access to any non-E capabilities pointing into the driver code or any capabilities pointing into MMIO directly. Alternatively, we can formulate this condition in a semantic way, see below. Additionally, we specify what exact code the driver contains, so that we can use this in our proof that the driver is secure. This will involve adding an invariant to the robust safety theorem.
- 2. Then, we provide an end-to-end statement that assumes the existence of a piece of boot code, that gets to run first when the capability machine

starts up, and makes sure the above condition indeed holds before passing control to the adversary.

The boot code is part of the trusted code base, and must satisfy a given specification, as a precondition of the toplevel theorem.

The particular implementation of the boot code depends on the model of the initial state of the machine. If the memory can contain any capability at startup, then the boot code has to erase the memory before passing control. If we assume that the initial memory cannot contain any capabilities whatsoever, then the boot code does not need to do any cleanup.

In this setting, we can subdivide the memory m into 3 relevant subsections;  $m = m_{\text{MMIO}} \uplus m_{\text{driver}} \uplus m_{\text{adv}}$  with the following meaning:

- $\bullet$   $m_{\mathsf{MMIO}}$  contains the memory-mapped locations
- $m_{\text{driver}}$  contains the driver code
- $m_{\text{adv}}$  contains the adversary's code and data (this will also include the boot code in the general setting, since it does not contain any inherently dangerous capabilities itself).

We can then implement the syntactic check as requiring registers and memory to only contain non-capability values:

$$\begin{aligned} \operatorname{nonCap}(\operatorname{inl} z) &\triangleq \operatorname{True} & \operatorname{nonCap}(\operatorname{inr} (p,g,b,e,a)) &\triangleq \operatorname{\mathsf{False}} \\ \operatorname{nonCap}(reg) &\triangleq \forall r \in \operatorname{dom}(reg). \operatorname{nonCap}(reg(r)) \\ \operatorname{nonCap}(mem) &\triangleq \forall l \in \operatorname{dom}(mem). \operatorname{nonCap}(mem(l)) \end{aligned}$$

Given this definition, we can now formalize the initialization constraints on registers r and memory m, given the region MMIO and the driver's address space [b,e) as follows (for a resource-aware formulation; see the bootSpec post-condition below):

$$\begin{split} m &= m_{\mathsf{MMIO}} \uplus m_{\mathsf{driver}} \uplus m_{\mathsf{adv}} \\ \operatorname{dom}(m_{\mathsf{MMIO}}) &= \mathsf{MMIO} \quad \operatorname{dom}(m_{\mathsf{driver}}) = [b, e) \quad \operatorname{dom}(m_{\mathsf{adv}}) = [b_{\mathsf{adv}}, e_{\mathsf{adv}}) \\ &\underbrace{\quad \operatorname{nonCap}(r) \quad \operatorname{nonCap}(m_{\mathsf{adv}}) \quad r[\mathsf{PC}] = (\mathsf{RWX}, \mathsf{G}, b_{\mathsf{adv}}, e_{\mathsf{adv}}, b_{\mathsf{adv}})}_{ & \operatorname{init}_{\mathsf{OK}}([b, e), r, m) \end{split}}$$

It should be noted that a more general, semantic version of this specification is also possible. It would contain the following elements, that should all follow directly from the specification above (TODO: write this out properly and do the same for the semantic bootspec):

- All registers except PC are in the value relation with respect to either the empty world Ø (if we do not want to allow any non-empty, non-enter capabilities in registers), or a world W<sub>init</sub> containing a single standard region for all of the adversary's memory. The PC register satisfies both the readand write conditions (note the parallel with the fundamental theorem) with respect to this same world, if we picked the non-empty world, or just points to the adversary's memory, which does not contain capabilities (if we picked the empty world, a capability pointing into memory can never be valid). It feels slightly less artificial to pick the non-empty world option here.
- We have possession of the predicates region and sts\_full\_world for the
  world we picked in the previous bullet. we will need these to apply the
  fundamental theorem.

and subsequently formulate the robust safety theorem as follows (using the semantic specification for  $init_{OK}$ , which references the initial world):

$$\forall a_i \in A_{\text{entry}}. \begin{cases} \left[ \exists t. \left[ c.t \right]^{-r} * P(t) \right]^{\iota} * \operatorname{code}(b, e, \overline{op_l}) \vdash \\ \forall W \ \exists^{\text{priv}} \ W_{\text{init}}. \ (\mathcal{E}^{\overline{\mathsf{MMIO}}}(\operatorname{RX}, g, b, e, a_i)) \ W \end{cases}$$
 
$$\underbrace{\operatorname{init}^{\operatorname{sem}}_{\operatorname{OK}}([b, e), r, m, W_{\operatorname{init}}) \quad (r[\overline{r_{a_i} := (\operatorname{E}, g, b, e, a_i)}], m, \emptyset) \longrightarrow^* (r', m', t)}_{P(t)}$$

where we have set the PC up to be a sensible capability (pointing to  $b_{\text{adv}}$  without loss of generality).

Notice how the syntactic check for the registers is only performed on the set of registers r, allowing the closures in  $\overline{r_{a_i}}$  to point into to the driver address space (TODO: we should change this though and add a requirement to init<sub>OK</sub>). (TODO: introduce notation for trace-owndership and predicate P holding)

**Remark:** If useful, the syntactic check can be relaxed to allow any capabilities that do not point into the MMIO region or the driver's memory. One would define the predicate  $\rightarrow^{\overline{R}}(w)$ , that takes a word, memory region or register file and checks if it points into the forbidden region R as follows:

$$\begin{array}{lll} \rightarrow^{\overline{R}}(\operatorname{inl} z) & \triangleq & \mathsf{True} \\ \rightarrow^{\overline{R}}(\operatorname{inr} \left(p,g,b,e,a\right)) & \triangleq & R \cap [b,e) = \emptyset \\ \rightarrow^{\overline{R}}(\operatorname{reg}) & \triangleq & \forall r \in \operatorname{dom}(\operatorname{reg}). \rightarrow^{\overline{R}}(\operatorname{reg}(r)) \\ \rightarrow^{\overline{R}}(\operatorname{mem}) & \triangleq & \forall l \in \operatorname{dom}(\operatorname{mem}). \rightarrow^{\overline{R}}(\operatorname{mem}(l)) \end{array}$$

Then, one would use  $\rightarrow^{\overline{\mathsf{MMIO}}\cup[b,e)}$ () in place of nonCap().

Having this theorem in our toolbox, we can now take a closer look at scenario 2 above, where a piece of boot code is used to set the memory up correctly, obviating the need for any other assumptions than that the boot code behaves according to some spec. Concretely, we want the boot code to uphold a spec that allows us to prove the above theorem. (Alternatively, we could also just implement one specific instance of boot code, and prove that after it finishes executing, all preconditions to apply the above theorem are met.)

Let  $r_0$  and  $m_0$  be the respective initial state of registers and memory immediately after booting. We assume that the following two properties hold, where  $l_{\text{boot}}$  is some memory location (initially loaded in PC):

$$dom(m_0) = [0, MEM_{MAX}) r_0[PC] = (RWX, G, 0, MEM_{MAX}, l_{boot})$$

Depending on the specifics of the operiational semantics, we might have more information about  $m_0$  and  $r_0$ , that can then be used in the implementation of the boot code (e.g.,  $m_0$  might be initialized to only zeroes).

We also assume the capability machine to initially start execution in the boot code at address  $l_{\rm boot}$ . Then, the syntactic specification for the boot code that we have to prove is stated as follows:

$$\begin{aligned} & \text{bootSpec}(P, g, b, e, A_{\text{entry}}) \triangleq \\ & \left\{ \begin{array}{l} \text{PC} \mapsto_r (\text{RWX}, \text{G}, 0, \text{MEM}_{\text{MAX}}, l_{\text{boot}}) * \\ & * r \mapsto_r - * & * m \mapsto_{a - *} \left[ \begin{array}{c} \bullet \bullet \bullet \bullet \end{array} \right]^{\gamma_{\text{T}}} \end{array} \right\} \\ & \text{Seq (Instr Executable)} \\ & \left\{ \begin{array}{l} \exists b_{adv} \ e_{adv} \ t. \\ & \text{PC} \mapsto_r (\text{RWX}, \text{G}, b_{adv}, e_{adv}, b_{adv}) * \\ & * r_{a_i} \mapsto_r (\text{E}, g, b, e, a_i) * & * r \mapsto_r \text{inl} - * \\ & a_i \in A_{\text{entry}} \end{cases} \\ & * m \mapsto_{a} \text{inl} - * & * m \mapsto_{a - *} \left[ \begin{array}{c} \bullet \bullet \bullet \bullet \bullet \end{array} \right]^{\gamma_{\text{T}}} * \left[ P(t) \right] \\ & * m \mapsto_{a} \text{inl} - * & * m \mapsto_{a - *} \left[ \begin{array}{c} \bullet \bullet \bullet \bullet \bullet \bullet \end{array} \right] \\ & * m \mapsto_{a} \text{inl} - * & * m \mapsto_{a - *} \left[ \begin{array}{c} \bullet \bullet \bullet \bullet \bullet \bullet \bullet \bullet \bullet \end{array} \right] \end{aligned}$$

Intuitively, establishing this specification ensures that, after running the boot code, we arrive in a state that satisfies  $\mathrm{init}_{\mathrm{OK}}$ .

The resulting boot-code-base formulation of the robust safety theorem is now a corollary of the general robust safety theorem, and has the following statement:

$$\forall a_i \in A_{\text{entry}}. \begin{cases} \left[ \exists t. \left[ c t \right]^{\gamma_{\text{T}}} * P(t) \right]^t * \operatorname{code}(b, e, \overline{op_l}) \vdash \\ \forall W \ \exists^{\text{priv}} \ W_{\text{init}}. \ (\mathcal{E}^{\overline{\text{MMIO}}}(\operatorname{RX}, g, b, e, a_i)) \ W \end{cases}$$

$$\underbrace{\operatorname{bootSpec}^{\operatorname{sem}}(P, g, b, e, A_{\operatorname{entry}}, W_{\operatorname{init}}) \quad (r_0, m_0, \emptyset) \longrightarrow^* (r, m, t)}_{P(t)}$$

Note that to link scenario 2 to scenario 1, we need a slightly adapted version of the robust safety theorem stated previously, where the initial trace is allowed to not be empty, but rather just satisfy P(t), but that should not cause any problems.

Note as well that if the boot-code performs I/O, then it is required to preserve the invariant P on the trace. If needed, one could come up with a more general theorem statement, where the boot code is allowed to perform arbitrary I/O, and where P is thus only true of the suffix of the trace for events that occur after the boot code has run...

**Remark:** On the level of the operational semantics, proving the boot specification corresponds to proving that the following holds, in order to use our concrete driver implementation with the robust safety theorem above (the driver still needs to be proven safe separately):

(BOOTCODEOK)  

$$\exists n \ b \ e. \ (r_0, m_0, \emptyset) \longrightarrow^n (r'[\overline{r_{a_i}} := (E, g, b, e, a_i)], m', t)$$
  
 $\land P(t) \land \operatorname{init}_{OK}([b, e), r', m')$ 

Remark: Dominique made the remark that we might want to allow capabilities pointing into the driver code that are not enter capabilities, but can just generally be proven safe wrt. the driver code. This scenario (which we currently do not support out of simplicity considerations) would force us to add the driver code as a static region to the world. While this would work on paper, it does not in the current Coq formalization, since all regions that standard regions point to are forced to be standard as well (through the conditions in the different cases of the interp1 predicate). Essentially, this behavior is disallowed because the Coq development merged the notion of read- and write condition, and there is hence no room to define a static region which defines a stricter reading policy, but hence disallowes writing. More generally, it is currently impossible to plug non-standard regions into the world, and allow standard regions to depend on

them in any way whatsoever. This was not a problem up until now, since our examples never required this sort of dependency.

#### 3.3.3 Concrete boot and driver code

Now we have a closer look at the concrete example we would like to verify, including what its boot code looks like. Our goal is to prove that this concrete example satisfies bootSpec (alternatively, BOOTCODEOK) and this last robust safety theorem.

The code for for the driver scenario we want to gradually verify can be found in driver\_code.v. This example file contains more details on the scenario in the comments. From a more high-level perspective, it contains a trusted part, consisting of a single MMIO location, boot code to properly set up the capability machine at start-up, and the code for the driver itself. On the other hand, the example also contains an untrusted part, consisting of a known, but untrusted code section and sandbox section (which we do not make assumptions over). The first address of the adversary code section is what the trusted part jumps to once it has finished setting up.

The boot code is where execution starts off when the capability machine is powered on. It contains an omnipotent RWX capability (i.e. ranging over all of memory and providing full permissions over it) starts off execution. (Design Alternative: in the long run, it might be worthwhile to provide the boot code with a RWLX capability. Since RWX cannot be upgraded to RWLX, it is impossible in the current setting to develop a secure stack calling convention á la Lau, where the stack capability has to be local-WL and (in this case) the driver code has to ensure that there is no other WL than the stack. However, having a global stack capability is currently not an issue, since the interface to our driver is first order, i.e. it will never create another adversary stack frame on top of itself (currently, it does not even use the stack in the first place!), and since we are working in a single-threaded setting (not very relevant, but I think the multi-threaded setting is an interesting thought experiment). I avoided unnecessary complications and hence kept the omnipotent capability as RWX. If the omnipotent capability were to become RWLX in the future, we would have to be careful using a stack, since if we want our secure calling convention with a higher order interface to our driver API, there cannot be any adversaryaccessible global write-local memory, i.e. the reduced omnipotent capability cannot be allowed to be global if we pass it to the adversary. We would thus have to pass a stack capability that we explicitly make local, and a global RWX capability for the rest of memory)

The boot code has the following responsibilities:

• Generate the necessary closures (i.e. Enter capabilities) for the driver's read and write methods, so that the adversary can use safely use them to perform I/O, and any properties that the driver wants to uphold on the input-output stream can actually be enforced.

- Wipe all of the adversary's sandbox section, to make sure no remaining capabilities can be found there. (**Design Alternative:** maybe we should try to find out how CHERI handles this; is it possible that any lingering capabilities are left in RAM on machine start-up? In any case, we are already being slightly unrealistic by assuming the adversary code section is just there. We have the option to not do any erasure, and just make the assumption that adversary memory contains no capabilities pointing into MMIO or pointing into the driver.). It does not, however, wipe the untrusted, hard-coded adversary routine that reads N lines of code from a single (and the only) MMIO location in memory.
- Reduce the omnipotent capability to provide RWX access to all of the adversary's code, makes it point to the first element of the adversary's code section and jumps to it, in order to hand control to the adversary. Before the jump, it clears all registers except for the 2 containing the read and write driver closures, in order to make sure that no extra permissions leak to the adversary.

## 4 Potential driver clients/properties

The concrete driver that we described in subsec:driverimpl does not yet enforce any useful safety properties on the traces that it outputs. This section describes a couple such properties that we might want to enforce, modifications to the driver that they require and example clients that make use of our modified driver.

#### 4.1 Stateless properties

Printing values < 1000

Modelling channel-specific safety properties

### 4.2 Stateful properties

**Performing**  $\leq 1000$  **I/O operations** To ensure that no more than 1000 values will be read or sent, the driver needs to keep some internal state, e.g. a single integer value n (at location  $l_n$ ) representing the number of times it has been called.

To accommodate this local state, the initial world will need to contain one extra region for location  $l_n$ , that allows any integer values  $\geq 0$ . Since this region only has a single state, we can again short-circuit the world and enforce the constraints we want on n through an invariant directly. We now discuss the concrete constraint that we are looking for.

We have to tie this integer n to the number of I/O events present in the trace t, otherwise it will prove impossible to verify our driver. Concretely, the invariant we will verify our driver under changes from  $\exists t. \left[ \overline{\circ} \, \overline{t} \right]^{\gamma_{\text{T}}} * \lceil |t| \le 1000 \rceil$  to  $\exists t. \left[ \overline{\circ} \, \overline{t} \right]^{\gamma_{\text{T}}} * \lceil |t| \le 1000 \rceil * \exists n. l_n \mapsto n * n = |t| \rceil^t$ . Concretely, this means that instead of having a pure predicate P in our robust safety theorem, we should also allow non-pure predicates that entail P, since we might have to enforce other invariants relating to local state over the trace as well.

**Remark:** I swept this under the rug in the above exposition, but if we allow boot code to run first, we have to make sure that the value n indeed corresponds to the number of I/O events that have occurred when control is passed to the adversary.

To make sure we can prove the driver specification, we will have to update the robust safety theorem slightly to incorporate these changes. It will then look as follows:

Never performing I/O again after some stop token has been read on in

### References

- [CTKZ] Tej Chajed, Joseph Tassarotti, Frans Kaashoek, and Nickolai Zeldovich. Perennial: language tics. https://github.com/mit-pdos/perennial/blob/ d71e8fa9291ba51eb4b59915f5ac94b728578899/src/goose\_ lang/lang.v.
- [FPK+18] Hugo Férée, Johannes Åman Pohjola, Ramana Kumar, Scott Owens, Magnus O. Myreen, and Son Ho. Program verification in the presence of I/O: Semantics, verified library routines, and verified applications. In Ruzica Piskac and Philipp Rümmer, editors, Verified Software. Theories, Tools, and Experiments (VSTTE), volume 11294 of Lecture Notes in Computer Science. Springer, 2018.
- [SGDL19] Michael Sammler, Deepak Garg, Derek Dreyer, and Tadeusz Litak. The high-level benefits of low-level sandboxing. *Proc. ACM Program. Lang.*, 4(POPL), December 2019.