# Rust and type safety for MPI and scientific computing

## **Jed Brown**, CU Boulder


## 2025-02-12

# What does this function do?

```c
int table[4];
bool exists_in_table(int v) {
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
```
Compiles and runs cleanly with `-Wall -Wextra -fstack-protector`

---
* https://godbolt.org/z/64Yxsr31f
* https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633

```asm
exists_in_table:
        mov     al, 1
        ret
```
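The optimizer is allowed to assume the out-of-bounds read `table[4]` never happens, so every execution it must honor returns `true` before `i` reaches 4. For comparison, a minimal Rust sketch of the same lookup (the names `TABLE`, `exists_in_table`, and `exists_in_table_ported` only mirror the C code and are not from any library): the off-by-one becomes an observable, deterministic failure instead of UB, and the idiomatic form avoids index arithmetic altogether.

```rust
static TABLE: [i32; 4] = [0; 4];

/// Idiomatic lookup: no index arithmetic, so no off-by-one to get wrong.
fn exists_in_table(v: i32) -> bool {
    TABLE.contains(&v)
}

/// Direct port of the C loop, including its off-by-one bound.
/// Returns `None` where the C version would read past the end.
fn exists_in_table_ported(v: i32) -> Option<bool> {
    for i in 0..=4 {
        // `get` yields `None` for i == 4; plain `TABLE[i]` would panic there.
        if *TABLE.get(i)? == v {
            return Some(true);
        }
    }
    Some(false)
}

fn main() {
    assert!(exists_in_table(0));
    assert!(!exists_in_table(7));
    assert_eq!(exists_in_table_ported(0), Some(true));
    // The bug is visible to the caller rather than silently "true".
    assert_eq!(exists_in_table_ported(7), None);
}
```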

# What does this program print?

```c++
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v {10, 11, 12};
    v.pop_back();
    int &vref = v[1];
    v.push_back(13);
    std::cout << vref << std::endl;
    return 0;
}
```

---
* https://godbolt.org/z/aMs9fTKhG
* https://cacm.acm.org/research/safe-systems-programming-in-rust/

```
11
```

## Comment out `v.pop_back()`

```
5
```
(or **anything**)

# Undefined Behavior (UB) is painful and costly in parallel

* UB is masked by abstraction
  * You're never looking at the whole context
* Reliably avoiding UB in code review and CI is intractable
  * tools help with some forms, but lack of detection is not lack of UB
* Debugging at scale is hard and HPC facilities do not accommodate that need well
* Debugging your user's code via email when it fails nondeterministically at scale is worse yet

# Cognitive load slows everything

* Seasoned developers have been burned enough to have internalized paranoia
* It's hard for new developers to "learn" that paranoia without being burned themselves
  * Part of being burned is learning the arcane tools to debug
  * Or wait helplessly until an overworked senior colleague can reproduce and figure it out
* The cognitive load is a tax on your critical and creative thinking

# Rust: a type-safe systems language

```rust
fn main() {
    let mut v = vec![10, 11, 12];
    let vref = &v[1];
    v.push(13);
    println!("{}", *vref);
}
```

**Type-safe/memory-safe**

**Near-zero cost**

**Expressive, low-level control**

<pre><font color="#F66151"><b>error[E0502]</b></font><b>: cannot borrow `v` as mutable because it is also borrowed as immutable</b>
 <font color="#2A7BDE"><b>--&gt; </b></font>src/main.rs:4:5
  <font color="#2A7BDE"><b>|</b></font>
<font color="#2A7BDE"><b>3</b></font> <font color="#2A7BDE"><b>|</b></font>     let vref = &amp;v[1];
  <font color="#2A7BDE"><b>|</b></font>                 <font color="#2A7BDE"><b>-</b></font> <font color="#2A7BDE"><b>immutable borrow occurs here</b></font>
<font color="#2A7BDE"><b>4</b></font> <font color="#2A7BDE"><b>|</b></font>     v.push(13);
  <font color="#2A7BDE"><b>|</b></font>     <font color="#F66151"><b>^^^^^^^^^^</b></font> <font color="#F66151"><b>mutable borrow occurs here</b></font>
<font color="#2A7BDE"><b>5</b></font> <font color="#2A7BDE"><b>|</b></font>     println!(&quot;{}&quot;, *vref);
  <font color="#2A7BDE"><b>|</b></font>                    <font color="#2A7BDE"><b>-----</b></font> <font color="#2A7BDE"><b>immutable borrow later used here</b></font>

<b>For more information about this error, try `rustc --explain E0502`.</b></pre>
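Two ways to satisfy the borrow checker here, as a minimal sketch (the helper names `copy_then_push` and `push_then_borrow` are made up for illustration): either copy the value out so no borrow survives the `push`, or take the reference only after the vector has finished growing.

```rust
/// Copy the element out; `i32` is `Copy`, so no borrow outlives this line.
fn copy_then_push(v: &mut Vec<i32>) -> i32 {
    let second = v[1];
    v.push(13); // may reallocate, but nothing points into the old storage
    second
}

/// Grow first, borrow afterwards; the reference points into the final storage.
fn push_then_borrow(v: &mut Vec<i32>) -> &i32 {
    v.push(13);
    &v[1]
}

fn main() {
    let mut v = vec![10, 11, 12];
    println!("{}", copy_then_push(&mut v));

    let mut w = vec![10, 11, 12];
    println!("{}", push_then_borrow(&mut w));
}
```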

# MPI is error-prone

```c
int x[] = {1, 2, 3, 4}, buf[4];
MPI_Request reqs, reqr;
MPI_Isend(x, 4, MPI_INT, right, tag, comm, &reqs);
MPI_Irecv(buf, 4, MPI_INT, left, tag, comm, &reqr);
buf[0]; // data race
MPI_Wait(&reqs, MPI_STATUS_IGNORE);
MPI_Wait(&reqr, MPI_STATUS_IGNORE);
```

* Caller is responsible for matching types (some special compiler support, but does not cover all cases)
* Caller must not mutate `x` until send completes
* Caller must not access `buf` until receive completes.
* Caller must specify sizes correctly

* Bugs will be benign for smaller sizes due to the eager threshold
* Even worse in the presence of GPU-aware MPI
* Tooling is not great at detecting issues

# RSMPI: safe Rust bindings to MPI

```rs
let mut buf = [0; 4];
mpi::request::scope(|sc| {
    let reqs = world.process_at_rank(right)
        .immediate_send(sc, &x[0]);
    let reqr = world.process_at_rank(left)
        .immediate_receive_into(sc, &mut buf);
    buf[0]; // data race
    reqs.wait(); reqr.wait();
});
println!("{:?}", buf);
```

* Correct types guaranteed by type-checker
* Lifetimes prevent data races
  * Aliasing XOR Mutability (AXM)
  * Allows many immutable references `&x`
  * If one `&mut x` exists, there can be no others
* Bounds checking guaranteed
* Clean and concise error messages

<pre><font color="#F66151"><b>error[E0503]</b></font><b>: cannot use `buf[_]` because it was mutably borrowed</b>
  <font color="#2A7BDE"><b>--&gt; </b></font>examples/immediate_multiple_test.rs:75:13
   <font color="#2A7BDE"><b>|</b></font>
<font color="#2A7BDE"><b>74</b></font> <font color="#2A7BDE"><b>|</b></font>                 .immediate_receive_into(scope, &amp;mut buf);
   <font color="#2A7BDE"><b>|</b></font>                                                <font color="#2A7BDE"><b>--------</b></font> <font color="#2A7BDE"><b>`buf` is borrowed here</b></font>
<font color="#2A7BDE"><b>75</b></font> <font color="#2A7BDE"><b>|</b></font>             buf[0]; // data race
   <font color="#2A7BDE"><b>|</b></font>             <font color="#F66151"><b>^^^^^^</b></font> <font color="#F66151"><b>use of borrowed `buf`</b></font>
<font color="#2A7BDE"><b>76</b></font> <font color="#2A7BDE"><b>|</b></font>             reqs.wait(); reqr.wait();
   <font color="#2A7BDE"><b>|</b></font>                          <font color="#2A7BDE"><b>----</b></font> <font color="#2A7BDE"><b>borrow later used here</b></font>

<b>For more information about this error, try `rustc --explain E0503`.</b></pre>
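The same scoped-borrow idea exists in the standard library, which makes it easy to try without an MPI installation. This is only an analogy, not RSMPI code: a thread spawned in `std::thread::scope` stands in for the in-flight receive, and the hypothetical function `exchange` plays the role of the communication step.

```rust
use std::thread;

/// Fill a local buffer from `x` on another thread, then return it.
fn exchange(x: &[i32; 4]) -> [i32; 4] {
    let mut buf = [0; 4];
    thread::scope(|s| {
        // The spawned closure holds `&mut buf` until the scope ends,
        // much like a pending receive holds its buffer.
        s.spawn(|| buf.copy_from_slice(x));
        // Reading `buf[0]` at this point is rejected by the borrow checker.
    });
    // The scope joined the thread, so the mutable borrow is over.
    buf
}

fn main() {
    println!("{:?}", exchange(&[1, 2, 3, 4]));
}
```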

# Collective type matching

```rs
if comm.rank() == 0 {
    let x = [1.0_f32, 2.0];
    comm.process_at_rank(1).send(&x);
} else if comm.rank() == 1 {
    let mut y = [0.0_f32; 2];
    comm.process_at_rank(0).receive_into(&mut y);
    println!("Rank 1 received: {:?}", y);
}
```
Yields
```
Rank 1 received: [1.0, 2.0]
```

```rs
if comm.rank() == 0 {
    let x = [1.0_f32, 2.0];
    comm.process_at_rank(1).send(&x);
} else if comm.rank() == 1 {
    let mut y = [0_u32; 2]; // <---- changed
    comm.process_at_rank(0).receive_into(&mut y);
    println!("Rank 1 received: {:?}", y);
}
```
Yields
```
Rank 1 received: [1065353216, 1073741824]
```

## Was that type-safe?

# Technically, yes

This is not `unsafe`

```rs
impl f32 {
    pub const fn to_bits(self) -> u32;
    pub const fn from_bits(v: u32) -> Self {
        // SAFETY: `u32` is a plain old datatype so we can always transmute from it.
        unsafe { mem::transmute(v) }
    }
}
```
https://doc.rust-lang.org/std/primitive.f32.html#method.to_bits

* Every bit pattern is a valid `u32` and `f32` (even if NaN)
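The integers printed by the mismatched receive are exactly the IEEE 754 bit patterns of the floats that were sent. A small check using only the standard library (the helper name `reinterpret` is made up for this sketch):

```rust
/// Reinterpret each `f32` as the `u32` with the same bits.
fn reinterpret(x: [f32; 2]) -> [u32; 2] {
    x.map(f32::to_bits)
}

fn main() {
    let y = reinterpret([1.0, 2.0]);
    println!("{:?}", y); // prints [1065353216, 1073741824]
    assert_eq!(y, [0x3F80_0000, 0x4000_0000]);
    // The round trip is lossless, and no `unsafe` is needed at the call site.
    assert_eq!(f32::from_bits(y[0]), 1.0);
}
```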

## Unsafe Rust

* You can manipulate raw pointers in Rust (`*mut f32` vs `&mut f32`).
* AXM does not apply to pointers
* Most pointer operations are `unsafe`

```rs
unsafe {
    // unsafe operations must be enclosed in an unsafe block
    danger(ptr);
}

unsafe fn danger(x: *mut f32) {
    *x.offset(1) += 1.0;
}
```

# Was that really type-safe?

## What if instead of `f32` and `u32`, we had `bool` or an enum?

A `bool` has the same storage as a `u8`, but only two of the 256 bit patterns are valid. Producing a `bool` with any other bit pattern is UB.

We cannot allow receiving into a type for which some bit patterns are invalid (unless we check types).
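One safe pattern, sketched here under the assumption that the data arrives as raw bytes, is to receive into `u8` (for which every bit pattern is valid) and validate before converting. The function `decode_bools` is hypothetical and not part of RSMPI.

```rust
/// Decode received bytes as `bool`s, rejecting invalid bit patterns.
/// On failure, returns the first offending byte.
fn decode_bools(raw: &[u8]) -> Result<Vec<bool>, u8> {
    raw.iter()
        .map(|&b| match b {
            0 => Ok(false),
            1 => Ok(true),
            other => Err(other), // one of the 254 invalid patterns
        })
        .collect()
}

fn main() {
    println!("{:?}", decode_bools(&[0, 1, 1]));
    println!("{:?}", decode_bools(&[0, 2]));
}
```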

## It's also probably just a bug if you inadvertently match an `f32` send to a `u32` receive.

Type-safety doesn't eliminate bugs, but it provides powerful tools.

# Improving MPI safety for modern languages
## Jake Tronge, Howard Pritchard, JB (2023) https://doi.org/10.1145/3615318.3615328

* Match types at run-time using Serde (`bincode`), `iovec`, or `flat` (in-place; implemented by modifying Open MPI)
* Mismatches detected at run-time (hard to trace back to root cause, especially for nonblocking)

<img src="figures/tronge2023/latency-simple-2023-08-15.1.svg" width="90%" />

# An alternative: typed communicators (with Nafees Iqbal)

```rs
let comm = TypedCommunicator::new(&world);
if comm.rank() == 0 {
    let x = [1.0_f32, 2.0];
    comm.send_slice(&x, 1, tag);
} else {
    let mut y = [0_u32; 2];
    comm.receive_slice(&mut y, 0, tag);
    println!("Rank 1 received: {:?}", y);
}
```

* Type of `comm` is inferred to be `TypedCommunicator<f32>`
* The mismatch is caught at compile time

<pre><font color="#F66151"><b>error[E0271]</b></font><b>: type mismatch resolving `&lt;u32 as Equivalence&gt;::Base == f32`</b>
  <font color="#2A7BDE"><b>--&gt; </b></font>examples/typed_communicator_test.rs:43:28
   <font color="#2A7BDE"><b>|</b></font>
<font color="#2A7BDE"><b>43</b></font> <font color="#2A7BDE"><b>|</b></font>         comm.receive_slice(&amp;mut y, 0, tag);
   <font color="#2A7BDE"><b>|</b></font>              <font color="#2A7BDE"><b>-------------</b></font> <font color="#F66151"><b>^^^^^^</b></font> <font color="#F66151"><b>expected `f32`, found `u32`</b></font>
   <font color="#2A7BDE"><b>|</b></font>              <font color="#2A7BDE"><b>|</b></font>
   <font color="#2A7BDE"><b>|</b></font>              <font color="#2A7BDE"><b>required by a bound introduced by this call</b></font>
   <font color="#2A7BDE"><b>|</b></font>
<font color="#33D17A"><b>note</b></font>: required by a bound in `TypedCommunicator::&lt;&apos;a, T&gt;::receive_slice`
  <font color="#2A7BDE"><b>--&gt; </b></font>/home/jed/src/rsmpi/src/typed_communicator.rs:94:24
   <font color="#2A7BDE"><b>|</b></font>
<font color="#2A7BDE"><b>92</b></font> <font color="#2A7BDE"><b>|</b></font>     pub fn receive_slice&lt;U&gt;(&amp;self, buffer: &amp;mut [U], source: i32, tag: i32)
   <font color="#2A7BDE"><b>|</b></font>            <font color="#2A7BDE"><b>-------------</b></font> <font color="#2A7BDE"><b>required by a bound in this associated function</b></font>
<font color="#2A7BDE"><b>93</b></font> <font color="#2A7BDE"><b>|</b></font>     where
<font color="#2A7BDE"><b>94</b></font> <font color="#2A7BDE"><b>|</b></font>         U: Equivalence&lt;Base = T&gt;,
   <font color="#2A7BDE"><b>|</b></font>                        <font color="#33D17A"><b>^^^^^^^^</b></font> <font color="#33D17A"><b>required by this bound in `TypedCommunicator::&lt;&apos;a, T&gt;::receive_slice`</b></font></pre>
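The inference mechanism can be reproduced without MPI. The sketch below is a toy model, not the RSMPI implementation: `TypedChannel` is an in-process queue standing in for the communicator, and this `Equivalence` trait (with made-up `to_base`/`from_base` methods) only imitates the associated `Base` type seen in the error message.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// Hypothetical stand-in for RSMPI's `Equivalence`: maps a type to the
/// base type that is actually transferred.
trait Equivalence: Copy {
    type Base: Copy;
    fn to_base(self) -> Self::Base;
    fn from_base(base: Self::Base) -> Self;
}

impl Equivalence for f32 {
    type Base = f32;
    fn to_base(self) -> f32 { self }
    fn from_base(base: f32) -> f32 { base }
}

impl Equivalence for u32 {
    type Base = u32;
    fn to_base(self) -> u32 { self }
    fn from_base(base: u32) -> u32 { base }
}

/// A newtype that is congruent to `f32`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Meters(f32);

impl Equivalence for Meters {
    type Base = f32;
    fn to_base(self) -> f32 { self.0 }
    fn from_base(base: f32) -> Self { Meters(base) }
}

/// In-process queue standing in for a communicator with base type `T`.
struct TypedChannel<T> {
    queue: RefCell<VecDeque<T>>,
}

impl<T: Copy> TypedChannel<T> {
    fn new() -> Self {
        Self { queue: RefCell::new(VecDeque::new()) }
    }

    fn send_slice<U: Equivalence<Base = T>>(&self, data: &[U]) {
        self.queue.borrow_mut().extend(data.iter().map(|u| u.to_base()));
    }

    fn receive_slice<U: Equivalence<Base = T>>(&self, buffer: &mut [U]) {
        let mut queue = self.queue.borrow_mut();
        for slot in buffer.iter_mut() {
            *slot = U::from_base(queue.pop_front().expect("no message queued"));
        }
    }
}

fn round_trip() -> [Meters; 2] {
    let chan = TypedChannel::new(); // `T` is inferred as `f32` from the send
    chan.send_slice(&[1.0_f32, 2.0]);
    let mut y = [Meters(0.0); 2];
    chan.receive_slice(&mut y); // a `[0_u32; 2]` buffer would not compile
    y
}

fn main() {
    println!("{:?}", round_trip());
}
```

Receiving into `Meters` compiles because its `Base` is `f32`; that is the sense in which congruent types need not match exactly.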

# Not zero-cost: need to check at run-time

This mismatch type-checks if we move the constructor into each arm of the conditional.
```rs
if world.rank() == 0 {
    let comm = TypedCommunicator::new(&world);
    let x = [1.0_f32, 2.0];
    comm.send_slice(&x, 1, tag);
} else {
    let comm = TypedCommunicator::new(&world);
    let mut y = [0_u32; 2];
    comm.receive_slice(&mut y, 0, tag);
    println!("Rank 1 received: {:?}", y);
}
```

* The constructor `TypedCommunicator::new` must have a run-time check for compatibility (cost of one reduction)
  * Root cause of a mismatch is more explainable
* Zero-cost after construction
* Can send/receive congruent types (do not need to match exactly)
* Is it expressive enough?
  * Although PETSc uses derived types, all but one message type contain a homogeneous "base" of integer or floating-point data.
* Run-time checks don't prevent deadlock
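One way such a constructor check could cost a single reduction, as a sketch under stated assumptions: each rank contributes a signature of its base type, and a (min, max) reduction shows agreement exactly when the two are equal. The functions `signature` and `all_ranks_agree` are hypothetical, the slice of signatures stands in for the ranks of a real allreduce, and hashing `type_name` is only a placeholder that is meaningful when every rank runs the same binary.

```rust
use std::any::type_name;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-rank signature of the base type `T`.
fn signature<T>() -> u64 {
    let mut h = DefaultHasher::new();
    type_name::<T>().hash(&mut h);
    h.finish()
}

/// Models an allreduce computing (min, max) over the per-rank signatures.
/// All ranks hold the same signature exactly when min == max.
fn all_ranks_agree(signatures: &[u64]) -> bool {
    signatures.iter().min() == signatures.iter().max()
}

fn main() {
    let matching = [signature::<f32>(), signature::<f32>()];
    let mismatched = [signature::<f32>(), signature::<u32>()];
    println!("{} {}", all_ranks_agree(&matching), all_ranks_agree(&mismatched));
}
```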

# Compile-time convergent/collective semantics?

## Choreographic programming

Inspired by https://lsd-ucsc.github.io/ChoRus/

```rs
struct X {
    local: Local<Part>,
    global_param: f64,
}

comm.locally(|access| {
    let mut loc = access(x.local);
    loc[0] += 1.0;
});
// no access to divergent/local state
comm.reduce(x.global_param);
```

* Fits collective intuition; enshrines best practices
* Cannot prevent leaking divergent data out of `comm.locally` via interior mutability

## Session types

* Inconvenient: write algorithm once in type and again to implement
* Session type inference using generativity ("branding") and RPIT:

```rs
if tid == 0 {
    let (unique_x, token) = sync(
        data_x.with_policy(consteval!(UniqueAccess { owner: 2 })),
        token,
    );
    token // one type
} else {
    let (unique_x, token) = sync(
        data_x.with_policy(consteval!(UniqueAccess { owner: 0 })),
        token,
    );
    token // different type
}
```

* Awkward, high cognitive load
* Poor error messages


# Effects systems

Rust has `const fn` and `async fn`, both of which limit what you can do.

https://rust-lang.github.io/keyword-generics-initiative/updates/2024-02-09-extending-rusts-effect-system.html

> * **no-panic**: guarantees a function will never produce a panic, causing the function to unwind.
> * **parametricity**: guarantees that a function only operates on its arguments. That means no implicit access to statics, no global filesystem, no thread-locals.

## `#t-lang/effects` (314 members) on effect generics initiative

We could add `convergent fn`: similar to parametricity, ensuring that divergent data cannot be used (no `comm.rank()`, etc.), including in error code paths (via `Result` or panic).

# RSMPI niceties

Derive macro for `Equivalence` is all you need to send/receive structs

```rs
#[derive(Equivalence, Default, PartialEq, Debug)]
struct MyDataRust {
    b: bool,
    f: f64,
    i: u16,
}
```

No need for `MPI_Type_create_struct`, etc.

# On soundness

Any function that is safe to call (i.e., not an `unsafe fn`) must have defined behavior for all well-typed inputs.

* You cannot rely on a trait being correctly implemented by the user.
* You can rely on correctness of concrete implementations that you use.

## Preventing bugs beyond soundness

Crichton, *Typed design patterns for the functional era* https://doi.org/10.1145/3609025.3609477

Numerical code often relies on invariants.
```rs
struct Permutation {
    perm: Vec<usize>, // private
}
impl Permutation {
    pub fn new(perm: Vec<usize>) -> Result<Self, PermError> {
        if Self::is_perm(&perm) {
            Ok(Self { perm })
        } else {
            Err(PermError)
        }
    }
}
```
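A runnable completion of this pattern, as a minimal sketch: the bodies of `is_perm` and `PermError` and the `apply` method are assumptions added for illustration. Because `perm` is private and `new` is the only constructor, `apply` can rely on every index being in range.

```rust
#[derive(Debug, PartialEq)]
pub struct PermError;

pub struct Permutation {
    perm: Vec<usize>, // private: only `new` can construct one
}

impl Permutation {
    pub fn new(perm: Vec<usize>) -> Result<Self, PermError> {
        if Self::is_perm(&perm) {
            Ok(Self { perm })
        } else {
            Err(PermError)
        }
    }

    /// True if `perm` contains each of `0..perm.len()` exactly once.
    fn is_perm(perm: &[usize]) -> bool {
        let mut seen = vec![false; perm.len()];
        perm.iter()
            .all(|&i| i < seen.len() && !std::mem::replace(&mut seen[i], true))
    }

    /// Gather `x[perm[i]]`; the invariant keeps every index in range.
    pub fn apply<T: Clone>(&self, x: &[T]) -> Vec<T> {
        assert_eq!(x.len(), self.perm.len());
        self.perm.iter().map(|&i| x[i].clone()).collect()
    }
}

fn main() {
    let p = Permutation::new(vec![2, 0, 1]).unwrap();
    println!("{:?}", p.apply(&['a', 'b', 'c']));
    println!("{}", Permutation::new(vec![0, 0, 1]).is_err());
}
```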

# Rust ecosystem

## rustup
Cross-platform toolchain management

## Cargo

* `cargo run` (and `build`, `test`, etc.)
* `Cargo.toml`
```toml
[dependencies]
mpi = { version = "0.8.0", features = ["derive"] }
```
* Parallel across your dependency graph
* `cargo publish` to [crates.io](https://crates.io)

## rust-analyzer

IDE integration, works for any project without setup steps

## rustdoc

Cross-referenced documentation including doctests; [docs.rs](https://docs.rs) integrated with [crates.io](https://crates.io)

## test

Unit testing, doctests, integration tests, custom test harnesses, editor integration.

## built-in cross-compilation

# Quality diagnostics

<img src="figures/rust/gankra-ekuber.jpg" width="100%" />

* This ethos permeates the ecosystem and is a central factor in language evolution

## [Stability without stagnation](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html)

Experimental features are available only on `nightly`, not the `stable` release channel.

## [MIRI](https://github.com/rust-lang/miri): An interpreter for Rust's mid-level intermediate representation

* `cargo miri run`
* Quality diagnostics explain when `unsafe` code leads to UB
* Much more capable than valgrind, stack protector, address sanitizer

# Scientific ecosystem

<img src="https://scientificcomputing.rs/img/ferris.png" width="80%" />

## Scientific Computing in Rust
* Virtual conference (since 2023): https://scientificcomputing.rs/
* Zulip
* See program for lots of exciting projects

## [Faer](https://faer-rs.github.io/)

* Rust-native library covering most of BLAS/BLIS, LAPACK, and SuiteSparse
* Performance on par with best implementations

## Rayon

* Similar to OpenMP (CPU), but safe and integrated in the type system (not annotation).

## ndarray/nalgebra
## RSMPI
## Bindings to many popular libraries
* PETSc, Scotch, BLIS, BLAS/LAPACK

# What properties must constitutive models have?

* Dimensional consistency/invariance
* Reference-frame invariance/equivariance
\begin{align}
\text{invariant} && \psi(\mathbf E) &= \psi(Q \mathbf E Q^T) & \forall Q \in O(3) \\
\text{equivariant} && Q \mathbf S(\mathbf E) Q^T &= \mathbf S(Q \mathbf E Q^T) & \forall Q \in O(3)
\end{align}
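
As a quick worked check of the invariance condition: any energy built only from $\operatorname{trace} \mathbf E$ and $\lvert \mathbf I + 2\mathbf E \rvert$ is invariant, because both quantities are unchanged under $\mathbf E \mapsto Q \mathbf E Q^T$ for orthogonal $Q$ (using $Q^T Q = \mathbf I$ and $\lvert Q \rvert = \pm 1$):
\begin{align}
\operatorname{trace}(Q \mathbf E Q^T) &= \operatorname{trace}(Q^T Q \mathbf E) = \operatorname{trace} \mathbf E \\
\lvert \mathbf I + 2 Q \mathbf E Q^T \rvert &= \lvert Q (\mathbf I + 2 \mathbf E) Q^T \rvert = \lvert Q \rvert^2 \, \lvert \mathbf I + 2 \mathbf E \rvert = \lvert \mathbf I + 2 \mathbf E \rvert
\end{align}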

## Integrity bases can represent all equivariant functions
<img src="figures/wineman-pipkin.svg" width="100%" />

# What is the state of practice?

* Materials scientists like to model in terms of:
  * Free energy
  * Dissipation potential
* Derivatives are required for observable relations
* Further derivatives for efficient solvers
* Many publications per year:
  * dedicate lots of space to representation of derivatives
  * software interfaces are error-prone

* Heavy C++ hierarchies, hard to reuse
* Tedious code in Fortran (e.g., for Abaqus `UMAT`/`UHYPER`)
<img src="figures/ratel/abaqus-uhyper.png" />

# Enzyme `#[feature(autodiff)]` for Neo-Hookean model

$$\DeclareMathOperator{\trace}{trace}
\psi(\mathbf E) = \frac{\lambda}{4}(J^2 - 1 - 2\ln J) + \mu (\trace \mathbf E - \ln J)$$
where $\mathbf E$ is Green-Lagrange strain and $J = \sqrt{\lvert \mathbf I + 2\mathbf E \rvert}$.
```rs
#[autodiff(d_psi, Reverse, Duplicated, Const, Active)]
fn psi(e: &KelvinMandel, nh: &NeoHookean) -> f64 {
    let J = e.cauchy_green().det().sqrt();
    let lnJ = J.ln();
    0.25 * nh.lambda * (J * J - 1. - 2. * lnJ) + nh.mu * (e.trace() - lnJ)
}
```

Compute stress $\mathbf \tau(\mathbf e) = \frac{\partial \psi}{\partial \mathbf e} \mathbf b$ for current configuration, with its derivative for use by Newton solvers:
```rs
#[autodiff(d_stress, Forward, Dual, Const, Dual)]
fn stress(e: &KelvinMandel, nh: &NeoHookean, tau: &mut KelvinMandel) {
    let mut dpsi_de = KelvinMandel::zero();
    d_psi(&e, &mut dpsi_de, &nh, 1.0);
    let b = e.cauchy_green();
    *tau = dpsi_de * b;
}
```

# Diman: Zero-cost compile-time dimensional analysis (Toni Peter)

```rs
struct Primitive {
    pressure: Pressure<f64>,
    velocity: [Velocity<f64>; 3],
    temperature: Temperature<f64>,
}
struct Conservative {
    density: MassDensity<f64>,
    momentum: [MomentumDensity<f64>; 3],
    energy: EnergyDensity<f64>,
}
```

In gas dynamics, one needs to convert between primitive and conservative variables. This depends on the equation of state (gas model), which is an active research area (e.g., [CoolProp](http://coolprop.org/fluid_properties/PurePseudoPure.html#introduction)).

```rs
impl From<&Primitive> for Conservative {
    fn from(s: &Primitive) -> Self {
        let gas = IdealGas::air();
        let density: MassDensity<f64> = s.pressure / (gas.r * s.temperature);
        let energy_internal: SpecificEnergy<f64> = gas.cv * s.temperature;
        let energy_kinetic =
            0.5 * s.velocity.iter().map(|v| v * v).sum::<SpecificEnergy<f64>>();
        let momentum = s.velocity.map(|v| density * v);
        let energy = density * (energy_internal + energy_kinetic);
        Self {density, momentum, energy}
    }
}

#[autodiff(d_primitive_to_conservative, Forward, Dual, Dual)]
fn primitive_to_conservative(p: &Primitive, c: &mut Conservative) {
    *c = Conservative::from(p);
}
```
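To see why a unit mismatch becomes a type error, here is a hand-rolled sketch using plain newtypes and operator traits. It is not Diman's API (Diman generates these relations from dimension definitions); all types and the function `ideal_gas_density` below are defined locally for illustration, and only the products and quotients that are dimensionally meaningful get an `impl`.

```rust
use std::ops::{Div, Mul};

#[derive(Clone, Copy, Debug, PartialEq)]
struct Pressure(f64); // Pa
#[derive(Clone, Copy, Debug, PartialEq)]
struct Temperature(f64); // K
#[derive(Clone, Copy, Debug, PartialEq)]
struct SpecificGasConstant(f64); // J/(kg K)
#[derive(Clone, Copy, Debug, PartialEq)]
struct SpecificEnergy(f64); // J/kg
#[derive(Clone, Copy, Debug, PartialEq)]
struct MassDensity(f64); // kg/m^3

// J/(kg K) * K = J/kg
impl Mul<Temperature> for SpecificGasConstant {
    type Output = SpecificEnergy;
    fn mul(self, t: Temperature) -> SpecificEnergy {
        SpecificEnergy(self.0 * t.0)
    }
}

// Pa / (J/kg) = kg/m^3
impl Div<SpecificEnergy> for Pressure {
    type Output = MassDensity;
    fn div(self, e: SpecificEnergy) -> MassDensity {
        MassDensity(self.0 / e.0)
    }
}

/// Ideal gas law; `p / t` or `p * r` has no impl and would not compile.
fn ideal_gas_density(p: Pressure, r: SpecificGasConstant, t: Temperature) -> MassDensity {
    p / (r * t)
}

fn main() {
    let rho = ideal_gas_density(
        Pressure(101_325.0),
        SpecificGasConstant(287.0),
        Temperature(300.0),
    );
    println!("{:?}", rho);
}
```

The wrappers are single-field structs around `f64`, so the checking happens entirely at compile time.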

# Diman prevents coding bugs

<img src="figures/diman/diman-mismatch.png" />

## Diman works seamlessly with `#[feature(autodiff)]`

<img src="figures/diman/primitive_to_conservative.png" />

# A day in the life

## What I dream I do
### cross-cutting methods and community software

## What my students think I do
### review pull requests

## What my university thinks I do
### teach classes

## What I actually do

### debug broken environments, linker errors, and memory errors via email during faculty meetings

# Outlook

## We need more libraries

* Low-level bindings to mature C/C++ libraries
* Safe/ergonomic/higher-level interfaces
* Pure Rust implementations (portability for free)

## GPU support

* Several initiatives, including "reboot" of Rust-CUDA
* NVPTX (and nascent AMDGPU) provides low-level (unsafe) support, available on nightly
* Research questions about safety (cf. convergent semantics)