Merged
139 changes: 139 additions & 0 deletions MEMORY-MODEL.md
@@ -0,0 +1,139 @@
# ilo Memory Model and Cycle Freedom

Reference notes on ilo's runtime memory model, written up while
investigating the "improve cycle detection" memory brief. Captures
*why* ilo does not have (and does not currently need) a cycle
collector, what would change that, and where the foundation already
exists.

## TL;DR

ilo's runtime is reference-counted (Arc in the tree interpreter,
custom RC on `HeapObj` in the VM). Pure RC cannot reclaim reference
cycles. Most RC languages pair RC with a cycle collector. **ilo does
not, because the surface language is structurally cycle-free.** Every
heap-allocated value in both the tree interpreter and the VM is
immutable after construction, and there is no surface-language
construct that lets a user install a reference back into a
previously-allocated holder.

The static foundation for a future collector is already in place:
`Type::can_form_cycle` in `src/ast/mod.rs` classifies each static type
as cycle-capable or cycle-incapable, and is tested against the cases
that would matter the day ilo grows a mutation primitive. Until then
the classifier sits unused at runtime and serves as a regression
surface for the invariant.

## Tree interpreter (`src/interpreter/mod.rs`)

```rust
pub enum Value {
    Number(f64),
    Text(Arc<String>),
    Bool(bool),
    Nil,
    List(Arc<Vec<Value>>),
    Map(Arc<HashMap<MapKey, Value>>),
    Record { type_name: String, fields: HashMap<String, Value> },
    Ok(Box<Value>),
    Err(Box<Value>),
    FnRef(String),
    Closure { fn_name: String, captures: Vec<Value> },
}
```

Every shared structure is an `Arc` around an *immutable* payload.
Mutation builtins (`mset`, `slc`, list updates) consume an owned `Arc`
and write through `Arc::make_mut`, which clones the payload whenever
another `Arc` handle to it exists. The value bound to the new key is
an already-evaluated `Value`. If that value holds a handle to the map
being mutated, the extra handle forces the clone, so the value lands
in a fresh copy and still points at the old map; it can never point at
its own holder. Closures capture by value. Records hold a plain
`HashMap` each (no sharing) and are rebuilt by `with`.
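
The copy-on-share argument can be checked with a minimal sketch. The `Value` enum and `map_set` below are hypothetical cut-down stand-ins (string keys instead of `MapKey`), not ilo's actual `mset`; they only assume the builtin takes the map handle by value and writes through `Arc::make_mut`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical cut-down value type: nil or a shared map.
#[derive(Clone, Debug)]
enum Value {
    Nil,
    Map(Arc<HashMap<String, Value>>),
}

// Illustrative stand-in for a map-set builtin (not ilo's real `mset`).
fn map_set(
    mut map: Arc<HashMap<String, Value>>,
    key: &str,
    value: Value,
) -> Arc<HashMap<String, Value>> {
    // `make_mut` clones the HashMap first if any other `Arc` points at it.
    Arc::make_mut(&mut map).insert(key.to_string(), value);
    map
}

// Tries to store a map inside itself and checks that no cycle results.
fn self_insert_is_acyclic() -> bool {
    let original = map_set(Arc::new(HashMap::new()), "n", Value::Nil);
    // The value holds a second handle, so the strong count is now 2.
    let value = Value::Map(Arc::clone(&original));
    let updated = map_set(original, "me", value);
    match updated.get("me") {
        // The stored handle is the *old* map: a different allocation
        // that never received the "me" key.
        Some(Value::Map(inner)) => {
            !Arc::ptr_eq(&updated, inner) && !inner.contains_key("me")
        }
        _ => false,
    }
}

fn main() {
    assert!(self_insert_is_acyclic());
    println!("self-insert produced a nested copy, not a cycle");
}
```

The only way to make `updated` contain itself would be to write into it while another handle to it exists, and that is exactly the case in which `Arc::make_mut` copies.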

## VM (`src/vm/mod.rs`)

```rust
enum HeapObj {
    Str(String),
    List(Vec<NanVal>),
    ListView { src: NanVal, start: usize, len: usize },
    Map(HashMap<MapKey, NanVal>),
    Record { type_info: Rc<TypeInfo>, fields: Box<[NanVal]> },
    OkVal(NanVal),
    ErrVal(NanVal),
    Closure { kind: FnRefKind, id: u32, captures: Vec<NanVal> },
}
```

The VM has a manual RC on `HeapObj` (the existing SAFETY comments
around `OP_RECSETFIELD` document the invariants). The closest thing
to in-place mutation is `OP_RECSETFIELD`, which the compiler emits
only against records the same instruction sequence just allocated via
`OP_RECNEW_EMPTY` or `OP_RECCOPY`. At that moment the record has
refcount 1 and is not reachable from any other value; the field being
stored is itself a NanVal computed before the assignment. There is no
way to weave the record's own NanVal back into one of its own fields
from the source language.
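
The uniqueness condition can be made observable with a small model. This is only a sketch: the real VM relies on the compiler's emission discipline and a manual refcount, whereas the hypothetical `Record`, `fresh_record`, and `set_field` below use `std::rc::Rc` and a runtime check through `Rc::get_mut`.

```rust
use std::rc::Rc;

// Hypothetical stand-in for a VM record: a fixed number of field slots,
// each empty or holding a handle to another record.
#[derive(Debug)]
struct Record {
    fields: Vec<Option<Rc<Record>>>,
}

// Models a freshly allocated record: refcount 1, all slots empty.
fn fresh_record(slots: usize) -> Rc<Record> {
    Rc::new(Record { fields: vec![None; slots] })
}

// In-place field store that succeeds only while `rec` is the sole handle.
// Returns false (and drops `value`) if the record is already shared.
fn set_field(rec: &mut Rc<Record>, slot: usize, value: Option<Rc<Record>>) -> bool {
    match Rc::get_mut(rec) {
        Some(inner) => {
            inner.fields[slot] = value;
            true
        }
        None => false,
    }
}

fn main() {
    let mut rec = fresh_record(2);

    // Storing an independently built value works: `rec` has refcount 1.
    assert!(set_field(&mut rec, 0, Some(fresh_record(0))));

    // To store the record in itself we first need a second handle,
    // which breaks uniqueness, so the in-place write is refused.
    let alias = Rc::clone(&rec);
    assert!(!set_field(&mut rec, 1, Some(alias)));
    assert_eq!(Rc::strong_count(&rec), 1);
}
```

In this model a self-referential store is self-defeating: obtaining the value to store raises the refcount above 1, which is the one state in which the in-place write is not allowed.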

## Closure captures

Both engines snapshot captures at the `make_closure` site. There is no
by-reference capture form. A closure cannot mutate a captured value in
a way that would point another value back at the closure.
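
As a rough illustration of snapshot capture, the sketch below uses a hypothetical cut-down `Value` and an illustrative `make_closure` helper; neither is the engines' real code, and the function name `add_one` is made up.

```rust
// Hypothetical cut-down value type, loosely modelled on the enum above.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Number(f64),
    Closure { fn_name: String, captures: Vec<Value> },
}

// Illustrative `make_closure`: copies the captured values out of the
// environment at creation time. The closure value does not exist until
// this function returns, so it cannot appear in its own capture list.
fn make_closure(fn_name: &str, env: &[Value]) -> Value {
    Value::Closure {
        fn_name: fn_name.to_string(),
        captures: env.to_vec(),
    }
}

// Rebinds the captured variable after the closure exists and checks
// that the closure still holds the value it saw at creation.
fn snapshot_survives_rebinding() -> bool {
    let mut env = vec![Value::Number(1.0)];
    let closure = make_closure("add_one", &env);
    env[0] = Value::Number(2.0);
    closure
        == Value::Closure {
            fn_name: "add_one".to_string(),
            captures: vec![Value::Number(1.0)],
        }
}

fn main() {
    assert!(snapshot_survives_rebinding());
}
```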

## Why cycles are unreachable

The combination of immutable post-construction heap objects,
copy-on-share for the "mutable" container builtins, by-value closure
captures, and the absence of any field-assignment expression means
the static structure of an ilo program cannot produce a cycle in the
runtime heap. Refcounts will always reach zero through normal drop
chains.
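
A minimal sketch of such a drop chain, using plain `Arc` and a `Weak` probe to observe the free. The two-variant `Value` is a hypothetical reduction of the interpreter's enum, not the real type.

```rust
use std::sync::{Arc, Weak};

// Hypothetical cut-down value type: nil or a shared list.
#[derive(Debug)]
enum Value {
    Nil,
    List(Arc<Vec<Value>>),
}

// Builds outer -> inner, drops the root, and reports whether the inner
// allocation was alive before the drop and freed after it.
fn inner_freed_after_root_drop() -> bool {
    let inner = Arc::new(vec![Value::Nil]);
    // A weak probe lets us observe the allocation without keeping it alive.
    let probe: Weak<Vec<Value>> = Arc::downgrade(&inner);
    let outer = Arc::new(vec![Value::List(inner)]);
    let root = Value::List(outer);

    let alive_before = probe.upgrade().is_some();
    // Dropping the root drops `outer`, whose element drops `inner`.
    drop(root);
    alive_before && probe.upgrade().is_none()
}

fn main() {
    assert!(inner_freed_after_root_drop());
    println!("acyclic heap: dropping the root freed everything below it");
}
```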

## What flips this

If ilo ever grows one of these features, cycles become reachable and
a real cycle collector becomes necessary:

1. A field-assignment expression on records (`r.x = y`).
2. A mutable reference cell type (`Ref t`, similar to OCaml `ref` or
Rust `RefCell`).
3. By-reference closure captures.
4. Any FFI builtin that exposes a writable handle into a previously
constructed heap object.
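
For contrast, here is a sketch in Rust (not ilo) of how the second item breaks the invariant. The `Node` type and `leaks` function are hypothetical; `RefCell` plays the role of the mutable reference cell that ilo does not currently have.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// A node whose `next` slot can be rewritten after construction.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Builds b -> a, optionally ties the knot a -> b, drops both handles,
// and reports whether `a` is still alive (that is, leaked).
fn leaks(tie_knot: bool) -> bool {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    if tie_knot {
        // The write ilo lacks: install a reference into an object
        // that was allocated earlier.
        *a.next.borrow_mut() = Some(Rc::clone(&b));
    }
    let probe: Weak<Node> = Rc::downgrade(&a);
    drop(a);
    drop(b);
    // With the knot tied, a and b keep each other's count at 1 forever.
    probe.upgrade().is_some()
}

fn main() {
    assert!(!leaks(false)); // acyclic: the drop chain frees both nodes
    assert!(leaks(true)); // cyclic: pure RC never reclaims them
}
```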

When that day comes, `Type::can_form_cycle` is the pruning oracle a
Bacon-Rajan-style synchronous cycle collector would consult to decide
whether an allocation needs a colour field, an entry in the roots set,
or trial-deletion treatment at all.

## The classifier

```rust
impl Type {
    // Signature only; the body lives in `src/ast/mod.rs`.
    pub fn can_form_cycle<F>(&self, resolve_record: &F) -> bool
    where
        F: Fn(&str) -> Option<Vec<Type>>;
}
```

Returns `true` if a runtime value of the given static type could
possibly participate in a cycle. Defaults to `true` whenever
information is missing (unknown `Named`, `Any`, function types). It
is always sound to mark a type cycle-capable; the cost is unnecessary
scanning. The unsound case is the reverse: marking a cycle-capable
type clean would let a real cycle leak forever.
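
A usage sketch, assuming the record table is available as a `HashMap`. The `Type` enum here is a deliberately reduced copy (five variants only) so the example stands alone, and `classify` is a hypothetical helper, not part of the codebase.

```rust
use std::collections::HashMap;

// Reduced copy of the static type enum, for illustration only.
#[derive(Clone, Debug)]
enum Type {
    Number,
    Text,
    Any,
    List(Box<Type>),
    Named(String),
}

impl Type {
    fn can_form_cycle<F>(&self, resolve_record: &F) -> bool
    where
        F: Fn(&str) -> Option<Vec<Type>>,
    {
        fn rec<F>(ty: &Type, seen: &mut Vec<String>, resolve_record: &F) -> bool
        where
            F: Fn(&str) -> Option<Vec<Type>>,
        {
            match ty {
                Type::Number | Type::Text => false,
                Type::Any => true,
                Type::List(inner) => rec(inner, seen, resolve_record),
                Type::Named(name) => {
                    // Already on the resolution path: the type is recursive.
                    if seen.iter().any(|s| s == name) {
                        return true;
                    }
                    match resolve_record(name.as_str()) {
                        Some(fields) => {
                            seen.push(name.clone());
                            let result = fields.iter().any(|f| rec(f, seen, resolve_record));
                            seen.pop();
                            result
                        }
                        // Unknown name: conservative default.
                        None => true,
                    }
                }
            }
        }
        rec(self, &mut Vec::new(), resolve_record)
    }
}

// Hypothetical helper: adapts a record table to the resolver closure.
fn classify(records: &HashMap<String, Vec<Type>>, ty: &Type) -> bool {
    let resolver = |name: &str| records.get(name).cloned();
    ty.can_form_cycle(&resolver)
}

fn main() {
    let mut records = HashMap::new();
    records.insert("Point".to_string(), vec![Type::Number, Type::Number]);
    records.insert("Node".to_string(), vec![Type::Named("Node".to_string())]);

    assert!(!classify(&records, &Type::Named("Point".to_string())));
    assert!(classify(&records, &Type::Named("Node".to_string())));
    // A name missing from the table falls back to cycle-capable.
    assert!(classify(&records, &Type::Named("Missing".to_string())));
}
```

Passing the resolver as a closure keeps the classifier independent of how record declarations are stored; a caller with no record table can pass `&|_| None` and get the conservative answer for every `Named` type.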

Tests in `src/ast/mod.rs` cover: primitives, list/map/optional/result
of primitives, `Any`, `Fn`, unknown `Named`, primitive records,
records with primitive collections, records with `Any` or `Fn` fields,
self-referential records, mutually recursive records, and lists of
records.

## Status of the original brief

The memory-2 brief (`zero-gap-specs/briefs/memory/2-cycle-detection-brief.md`)
asks for two optimisations to "ilo's runtime cycle detector":
incremental detection, and type-based pruning. Both presume a
detector that does not exist. The right follow-up is to retire or
rewrite the brief as "when ilo grows mutation, here is what a cycle
collector should look like and what we have already pre-built".
216 changes: 216 additions & 0 deletions src/ast/mod.rs
@@ -883,6 +883,92 @@ fn desugar_expr(expr: &mut Expr, scope: &[String], rf: &std::collections::HashSe
}
}

/// Cycle-capability classifier for runtime values of a given static type.
///
/// Background: ilo's runtime is reference-counted (Arc in the tree
/// interpreter, custom RC on `HeapObj` in the VM). Pure RC cannot reclaim
/// reference cycles (A -> B -> A). Most RC languages pair RC with a cycle
/// collector to handle this. ilo deliberately does NOT — the surface
/// language is structurally cycle-free:
///
/// * Records are immutable after construction. `with` allocates a fresh
/// record; there is no field-assignment expression. Fields are bound
/// from already-evaluated values, so a field cannot refer forward to
/// the record being built.
/// * Lists and maps are persistent. Mutation goes through copy-on-share
/// (`Arc::make_mut`, fresh `HeapObj` allocation). You cannot install
/// a reference back to a holder you no longer have a write handle to.
/// * Closures capture by value.
/// * Numbers, booleans, text, sums-of-strings, nil are inline / immutable.
///
/// This classifier exists as a foundation. It defines the invariant
/// explicitly and gives us a regression surface if a future language
/// change quietly introduces a cycle-forming construct. It is also the
/// hook a future cycle collector would consult to prune immutable types.
///
/// Default policy: when in doubt, return `true` (cycle-capable). It is
/// always sound to mark a type cycle-capable; the cost is unnecessary
/// scanning. The unsound case is the reverse: marking a cycle-capable
/// type clean would let a real cycle leak forever.
impl Type {
    /// Returns true if a runtime value of this type could possibly
    /// participate in a reference cycle under ilo's current memory model.
    ///
    /// `resolve_record` resolves a record type name to its field types.
    /// Pass `&|_| None` to treat all `Named` references conservatively
    /// (cycle-capable).
    pub fn can_form_cycle<F>(&self, resolve_record: &F) -> bool
    where
        F: Fn(&str) -> Option<Vec<Type>>,
    {
        fn rec<F>(ty: &Type, seen: &mut Vec<String>, resolve_record: &F) -> bool
        where
            F: Fn(&str) -> Option<Vec<Type>>,
        {
            match ty {
                // Inline primitives.
                Type::Number | Type::Bool | Type::Sum(_) => false,
                // Immutable shared bytes; no embedded references.
                Type::Text => false,
                // No information at the type level.
                Type::Any => true,
                // Closures capture by value. Without a per-closure capture
                // type list at this layer we conservatively mark Fn as
                // cycle-capable. Param/return types here describe the
                // call-site arrow, not the captured environment.
                Type::Fn(_, _) => true,
                // Wrappers inherit cycle-capability from their inner type.
                Type::Optional(inner) | Type::List(inner) => rec(inner, seen, resolve_record),
                Type::Result(ok, err) => {
                    rec(ok, seen, resolve_record) || rec(err, seen, resolve_record)
                }
                Type::Map(k, v) => rec(k, seen, resolve_record) || rec(v, seen, resolve_record),
                Type::Named(name) => {
                    if seen.iter().any(|s| s == name) {
                        // Closed loop on the resolution path: by definition
                        // cycle-capable.
                        return true;
                    }
                    match resolve_record(name.as_str()) {
                        Some(fields) => {
                            seen.push(name.clone());
                            let result = fields.iter().any(|f| rec(f, seen, resolve_record));
                            seen.pop();
                            result
                        }
                        // Unknown name (type variable, missing record,
                        // unresolved alias): conservative default.
                        None => true,
                    }
                }
            }
        }

        let mut seen: Vec<String> = Vec::new();
        rec(self, &mut seen, resolve_record)
    }
}

#[cfg(test)]
#[allow(clippy::approx_constant)]
mod tests {
@@ -1525,4 +1611,134 @@ mod tests {
            "expected Field unchanged when field name not in scope, got {last:?}"
        );
    }

    // ---- can_form_cycle tests ----
    //
    // The whole point of the classifier is to make our cycle-freedom
    // invariant testable. If any of these change, the language has grown
    // a new cycle-forming construct and the runtime needs a cycle
    // collector before that change ships.

    fn no_records(_: &str) -> Option<Vec<Type>> {
        None
    }

    #[test]
    fn primitives_cannot_cycle() {
        let resolver = |s: &str| no_records(s);
        assert!(!Type::Number.can_form_cycle(&resolver));
        assert!(!Type::Bool.can_form_cycle(&resolver));
        assert!(!Type::Text.can_form_cycle(&resolver));
        assert!(!Type::Sum(vec!["a".into(), "b".into()]).can_form_cycle(&resolver));
    }

    #[test]
    fn lists_and_maps_of_primitives_cannot_cycle() {
        let resolver = |s: &str| no_records(s);
        assert!(!Type::List(Box::new(Type::Number)).can_form_cycle(&resolver));
        assert!(!Type::List(Box::new(Type::Text)).can_form_cycle(&resolver));
        assert!(!Type::Map(Box::new(Type::Text), Box::new(Type::Number)).can_form_cycle(&resolver));
        assert!(!Type::Optional(Box::new(Type::Number)).can_form_cycle(&resolver));
        assert!(
            !Type::Result(Box::new(Type::Number), Box::new(Type::Text)).can_form_cycle(&resolver)
        );
    }

    #[test]
    fn any_is_conservative() {
        let resolver = |s: &str| no_records(s);
        assert!(Type::Any.can_form_cycle(&resolver));
        assert!(Type::List(Box::new(Type::Any)).can_form_cycle(&resolver));
    }

    #[test]
    fn fn_is_conservative() {
        // Closures capture by value. The capture types aren't exposed at the
        // Type level, so the classifier marks Fn cycle-capable.
        let resolver = |s: &str| no_records(s);
        assert!(Type::Fn(vec![Type::Number], Box::new(Type::Number)).can_form_cycle(&resolver));
    }

    #[test]
    fn unknown_named_is_conservative() {
        let resolver = |s: &str| no_records(s);
        assert!(Type::Named("Whatever".into()).can_form_cycle(&resolver));
    }

    #[test]
    fn record_of_primitives_cannot_cycle() {
        let resolver = |s: &str| match s {
            "Point" => Some(vec![Type::Number, Type::Number]),
            _ => None,
        };
        assert!(!Type::Named("Point".into()).can_form_cycle(&resolver));
    }

    #[test]
    fn record_with_primitive_list_cannot_cycle() {
        let resolver = |s: &str| match s {
            "Bag" => Some(vec![Type::Text, Type::List(Box::new(Type::Number))]),
            _ => None,
        };
        assert!(!Type::Named("Bag".into()).can_form_cycle(&resolver));
    }

    #[test]
    fn record_containing_any_field_can_cycle() {
        let resolver = |s: &str| match s {
            "Box" => Some(vec![Type::Any]),
            _ => None,
        };
        assert!(Type::Named("Box".into()).can_form_cycle(&resolver));
    }

    #[test]
    fn record_containing_function_field_can_cycle() {
        let resolver = |s: &str| match s {
            "Handler" => Some(vec![Type::Fn(vec![Type::Number], Box::new(Type::Number))]),
            _ => None,
        };
        assert!(Type::Named("Handler".into()).can_form_cycle(&resolver));
    }

    #[test]
    fn self_referential_record_marked_cycle_capable() {
        // type Node { next:Node } is not constructible today (records are
        // immutable, so you can't tie the knot), but the classifier still
        // marks the *type* cycle-capable. If we ever add a primitive that
        // would let you build one, the runtime needs to know.
        let resolver = |s: &str| match s {
            "Node" => Some(vec![Type::Named("Node".into())]),
            _ => None,
        };
        assert!(Type::Named("Node".into()).can_form_cycle(&resolver));
    }

    #[test]
    fn mutually_recursive_records_marked_cycle_capable() {
        let resolver = |s: &str| match s {
            "A" => Some(vec![Type::Named("B".into())]),
            "B" => Some(vec![Type::Named("A".into())]),
            _ => None,
        };
        assert!(Type::Named("A".into()).can_form_cycle(&resolver));
        assert!(Type::Named("B".into()).can_form_cycle(&resolver));
    }

    #[test]
    fn list_of_record_inherits_record_capability() {
        let primitive_record = |s: &str| match s {
            "Point" => Some(vec![Type::Number, Type::Number]),
            _ => None,
        };
        assert!(
            !Type::List(Box::new(Type::Named("Point".into()))).can_form_cycle(&primitive_record)
        );

        let cyclic_record = |s: &str| match s {
            "Node" => Some(vec![Type::Named("Node".into())]),
            _ => None,
        };
        assert!(Type::List(Box::new(Type::Named("Node".into()))).can_form_cycle(&cyclic_record));
    }
}