diff --git a/MEMORY-MODEL.md b/MEMORY-MODEL.md
new file mode 100644
index 00000000..3ffb2a88
--- /dev/null
+++ b/MEMORY-MODEL.md
@@ -0,0 +1,139 @@
+# ilo Memory Model and Cycle Freedom
+
+Reference notes on ilo's runtime memory model, written up while
+investigating the "improve cycle detection" memory brief. Captures
+*why* ilo does not have (and does not currently need) a cycle
+collector, what would change that, and where the foundation already
+exists.
+
+## TL;DR
+
+ilo's runtime is reference-counted (Arc in the tree interpreter,
+custom RC on `HeapObj` in the VM). Pure RC cannot reclaim reference
+cycles. Most RC languages pair RC with a cycle collector. **ilo does
+not, because the surface language is structurally cycle-free.** Every
+heap-allocated value in both the tree interpreter and the VM is
+immutable after construction, and there is no surface-language
+construct that lets a user install a reference back into a
+previously-allocated holder.
+
+The static foundation for a future collector is already in place:
+`Type::can_form_cycle` in `src/ast/mod.rs` classifies each static type
+as cycle-capable or cycle-incapable, and is tested against the cases
+that would matter the day ilo grows a mutation primitive. Until then
+the classifier sits unused at runtime and serves as a regression
+surface for the invariant.
+
+## Tree interpreter (`src/interpreter/mod.rs`)
+
+```rust
+pub enum Value {
+    Number(f64),
+    Text(Arc),
+    Bool(bool),
+    Nil,
+    List(Arc<Vec<Value>>),
+    Map(Arc>),
+    Record { type_name: String, fields: HashMap },
+    Ok(Box<Value>),
+    Err(Box<Value>),
+    FnRef(String),
+    Closure { fn_name: String, captures: Vec<Value> },
+}
+```
+
+Every shared structure is wrapped in `Arc` of an *immutable* payload.
+Mutation builtins (`mset`, `slc`, list updates) consume an owned `Arc`
+and use `Arc::make_mut`, which clones if the strong count is greater
+than one.
The value bound to the new key is an already-evaluated
+`Value`; it is impossible for that value to be a reference forward to
+the map being mutated. Closures capture by value. Records use plain
+`HashMap` per record (no sharing) and are reconstructed by `with`.
+
+## VM (`src/vm/mod.rs`)
+
+```rust
+enum HeapObj {
+    Str(String),
+    List(Vec<NanVal>),
+    ListView { src: NanVal, start: usize, len: usize },
+    Map(HashMap),
+    Record { type_info: Rc, fields: Box<[NanVal]> },
+    OkVal(NanVal),
+    ErrVal(NanVal),
+    Closure { kind: FnRefKind, id: u32, captures: Vec<NanVal> },
+}
+```
+
+The VM has a manual RC on `HeapObj` (the existing SAFETY comments
+around `OP_RECSETFIELD` document the invariants). The closest thing
+to in-place mutation is `OP_RECSETFIELD`, which the compiler emits
+only against records the same instruction sequence just allocated via
+`OP_RECNEW_EMPTY` or `OP_RECCOPY`. At that moment the record has
+refcount 1 and is not reachable from any other value; the field being
+stored is itself a NanVal computed before the assignment. There is no
+way to weave the record's own NanVal back into one of its own fields
+from the source language.
+
+## Closure captures
+
+Both engines snapshot captures at the `make_closure` site. There is no
+by-reference capture form. A closure cannot mutate a captured value in
+a way that would point another value back at the closure.
+
+## Why cycles are unreachable
+
+The combination of immutable post-construction heap objects,
+copy-on-share for the "mutable" container builtins, by-value closure
+captures, and the absence of any field-assignment expression means
+the static structure of an ilo program cannot produce a cycle in the
+runtime heap. Refcounts will always reach zero through normal drop
+chains.
+
+## What flips this
+
+If ilo ever grows one of these features, cycles become reachable and
+a real cycle collector becomes necessary:
+
+1. A field-assignment expression on records (`r.x = y`).
+2.
A mutable reference cell type (`Ref t`, similar to OCaml `ref` or
+   Rust `RefCell`).
+3. By-reference closure captures.
+4. Any FFI builtin that exposes a writable handle into a previously
+   constructed heap object.
+
+When that day comes, `Type::can_form_cycle` is the pruning oracle a
+Bacon-Rajan-style synchronous cycle collector would consult to decide
+whether an allocation needs a colour field, an incoming entry in the
+roots set, or trial-deletion treatment at all.
+
+## The classifier
+
+```rust
+impl Type {
+    pub fn can_form_cycle<F>(&self, resolve_record: &F) -> bool
+    where F: Fn(&str) -> Option<Vec<Type>>;
+}
+```
+
+Returns `true` if a runtime value of the given static type could
+possibly participate in a cycle. Defaults to `true` whenever
+information is missing (unknown `Named`, `Any`, function types). It
+is always sound to mark a type cycle-capable; the cost is unnecessary
+scanning. The unsound case is the reverse: marking a cycle-capable
+type clean would let a real cycle leak forever.
+
+Tests in `src/ast/mod.rs` cover: primitives, list/map/optional/result
+of primitives, `Any`, `Fn`, unknown `Named`, primitive records,
+records with primitive collections, records with `Any` or `Fn` fields,
+self-referential records, mutually recursive records, and lists of
+records.
+
+## Status of the original brief
+
+The memory-2 brief (`zero-gap-specs/briefs/memory/2-cycle-detection-brief.md`)
+asks for two optimisations to "ilo's runtime cycle detector":
+incremental detection, and type-based pruning. Both presume a
+detector that does not exist. The right follow-up is to retire or
+rewrite the brief as "when ilo grows mutation, here is what a cycle
+collector should look like and what we have already pre-built".
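To make item 2 under "What flips this" concrete, here is a minimal standalone Rust sketch (plain `std`, not ilo code) of how a mutable cell turns pure reference counting into a leak. The `Node` type and both helper functions are hypothetical and exist only for illustration; the `RefCell` stands in for the `Ref t` primitive that ilo does not have.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node type. The RefCell plays the role of a mutable
// reference cell; nothing like it exists in ilo today.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

/// Builds a two-node cycle a -> b -> a, drops both owning handles, and
/// reports whether `a`'s allocation is still alive afterwards.
fn cycle_leaks() -> bool {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    // Write-after-construction: the step the ilo surface language lacks.
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    let probe: Weak<Node> = Rc::downgrade(&a);
    drop(a);
    drop(b);
    // If the weak handle still upgrades, the strong count never hit zero.
    probe.upgrade().is_some()
}

/// Same shape without the back edge: a plain chain b -> a.
fn chain_leaks() -> bool {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    let probe: Weak<Node> = Rc::downgrade(&a);
    drop(a);
    drop(b);
    probe.upgrade().is_some()
}
```

Under these assumptions, `cycle_leaks()` reports that the allocation survives both drops, while `chain_leaks()` reports that the normal drop chain reclaims it. The only difference between the two is the single assignment into an already-constructed node.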
diff --git a/src/ast/mod.rs b/src/ast/mod.rs
index 0e858b7a..9db87df3 100644
--- a/src/ast/mod.rs
+++ b/src/ast/mod.rs
@@ -883,6 +883,92 @@ fn desugar_expr(expr: &mut Expr, scope: &[String], rf: &std::collections::HashSe
     }
 }

+/// Cycle-capability classifier for runtime values of a given static type.
+///
+/// Background: ilo's runtime is reference-counted (Arc in the tree
+/// interpreter, custom RC on `HeapObj` in the VM). Pure RC cannot reclaim
+/// reference cycles (A -> B -> A). Most RC languages pair RC with a cycle
+/// collector to handle this. ilo deliberately does NOT, because the
+/// surface language is structurally cycle-free:
+///
+/// * Records are immutable after construction. `with` allocates a fresh
+///   record; there is no field-assignment expression. Fields are bound
+///   from already-evaluated values, so a field cannot refer forward to
+///   the record being built.
+/// * Lists and maps are persistent. Mutation goes through copy-on-share
+///   (`Arc::make_mut`, fresh `HeapObj` allocation). You cannot install
+///   a reference back to a holder you no longer have a write handle to.
+/// * Closures capture by value.
+/// * Numbers, booleans, text, sums-of-strings, nil are inline / immutable.
+///
+/// This classifier exists as a foundation. It defines the invariant
+/// explicitly and gives us a regression surface if a future language
+/// change quietly introduces a cycle-forming construct. It is also the
+/// hook a future cycle collector would consult to prune immutable types.
+///
+/// Default policy: when in doubt, return `true` (cycle-capable). It is
+/// always sound to mark a type cycle-capable; the cost is unnecessary
+/// scanning. The unsound case is the reverse: marking a cycle-capable
+/// type clean would let a real cycle leak forever.
+impl Type {
+    /// Returns true if a runtime value of this type could possibly
+    /// participate in a reference cycle under ilo's current memory model.
+    ///
+    /// `resolve_record` resolves a record type name to its field types.
+    /// Pass `&|_| None` to treat all `Named` references conservatively
+    /// (cycle-capable).
+    pub fn can_form_cycle<F>(&self, resolve_record: &F) -> bool
+    where
+        F: Fn(&str) -> Option<Vec<Type>>,
+    {
+        fn rec<F>(ty: &Type, seen: &mut Vec<String>, resolve_record: &F) -> bool
+        where
+            F: Fn(&str) -> Option<Vec<Type>>,
+        {
+            match ty {
+                // Inline primitives.
+                Type::Number | Type::Bool | Type::Sum(_) => false,
+                // Immutable shared bytes; no embedded references.
+                Type::Text => false,
+                // No information at the type level.
+                Type::Any => true,
+                // Closures capture by value. Without a per-closure capture
+                // type list at this layer we conservatively mark Fn as
+                // cycle-capable. Param/return types here describe the
+                // call-site arrow, not the captured environment.
+                Type::Fn(_, _) => true,
+                // Wrappers inherit cycle-capability from their inner type.
+                Type::Optional(inner) | Type::List(inner) => rec(inner, seen, resolve_record),
+                Type::Result(ok, err) => {
+                    rec(ok, seen, resolve_record) || rec(err, seen, resolve_record)
+                }
+                Type::Map(k, v) => rec(k, seen, resolve_record) || rec(v, seen, resolve_record),
+                Type::Named(name) => {
+                    if seen.iter().any(|s| s == name) {
+                        // Closed loop on the resolution path: by definition
+                        // cycle-capable.
+                        return true;
+                    }
+                    match resolve_record(name.as_str()) {
+                        Some(fields) => {
+                            seen.push(name.clone());
+                            let result = fields.iter().any(|f| rec(f, seen, resolve_record));
+                            seen.pop();
+                            result
+                        }
+                        // Unknown name (type variable, missing record,
+                        // unresolved alias): conservative default.
+                        None => true,
+                    }
+                }
+            }
+        }
+
+        let mut seen: Vec<String> = Vec::new();
+        rec(self, &mut seen, resolve_record)
+    }
+}
+
 #[cfg(test)]
 #[allow(clippy::approx_constant)]
 mod tests {
@@ -1525,4 +1611,134 @@ mod tests {
             "expected Field unchanged when field name not in scope, got {last:?}"
         );
     }
+
+    // ---- can_form_cycle tests ----
+    //
+    // The whole point of the classifier is to make our cycle-freedom
+    // invariant testable. If any of these change, the language has grown
+    // a new cycle-forming construct and the runtime needs a cycle
+    // collector before that change ships.
+
+    fn no_records(_: &str) -> Option<Vec<Type>> {
+        None
+    }
+
+    #[test]
+    fn primitives_cannot_cycle() {
+        let resolver = |s: &str| no_records(s);
+        assert!(!Type::Number.can_form_cycle(&resolver));
+        assert!(!Type::Bool.can_form_cycle(&resolver));
+        assert!(!Type::Text.can_form_cycle(&resolver));
+        assert!(!Type::Sum(vec!["a".into(), "b".into()]).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn lists_and_maps_of_primitives_cannot_cycle() {
+        let resolver = |s: &str| no_records(s);
+        assert!(!Type::List(Box::new(Type::Number)).can_form_cycle(&resolver));
+        assert!(!Type::List(Box::new(Type::Text)).can_form_cycle(&resolver));
+        assert!(!Type::Map(Box::new(Type::Text), Box::new(Type::Number)).can_form_cycle(&resolver));
+        assert!(!Type::Optional(Box::new(Type::Number)).can_form_cycle(&resolver));
+        assert!(
+            !Type::Result(Box::new(Type::Number), Box::new(Type::Text)).can_form_cycle(&resolver)
+        );
+    }
+
+    #[test]
+    fn any_is_conservative() {
+        let resolver = |s: &str| no_records(s);
+        assert!(Type::Any.can_form_cycle(&resolver));
+        assert!(Type::List(Box::new(Type::Any)).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn fn_is_conservative() {
+        // Closures capture by value. The capture types aren't exposed at the
+        // Type level, so the classifier marks Fn cycle-capable.
+        let resolver = |s: &str| no_records(s);
+        assert!(Type::Fn(vec![Type::Number], Box::new(Type::Number)).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn unknown_named_is_conservative() {
+        let resolver = |s: &str| no_records(s);
+        assert!(Type::Named("Whatever".into()).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn record_of_primitives_cannot_cycle() {
+        let resolver = |s: &str| match s {
+            "Point" => Some(vec![Type::Number, Type::Number]),
+            _ => None,
+        };
+        assert!(!Type::Named("Point".into()).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn record_with_primitive_list_cannot_cycle() {
+        let resolver = |s: &str| match s {
+            "Bag" => Some(vec![Type::Text, Type::List(Box::new(Type::Number))]),
+            _ => None,
+        };
+        assert!(!Type::Named("Bag".into()).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn record_containing_any_field_can_cycle() {
+        let resolver = |s: &str| match s {
+            "Box" => Some(vec![Type::Any]),
+            _ => None,
+        };
+        assert!(Type::Named("Box".into()).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn record_containing_function_field_can_cycle() {
+        let resolver = |s: &str| match s {
+            "Handler" => Some(vec![Type::Fn(vec![Type::Number], Box::new(Type::Number))]),
+            _ => None,
+        };
+        assert!(Type::Named("Handler".into()).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn self_referential_record_marked_cycle_capable() {
+        // type Node { next:Node } is not constructible today (records are
+        // immutable, so you can't tie the knot), but the classifier still
+        // marks the *type* cycle-capable. If we ever add a primitive that
+        // would let you build one, the runtime needs to know.
+        let resolver = |s: &str| match s {
+            "Node" => Some(vec![Type::Named("Node".into())]),
+            _ => None,
+        };
+        assert!(Type::Named("Node".into()).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn mutually_recursive_records_marked_cycle_capable() {
+        let resolver = |s: &str| match s {
+            "A" => Some(vec![Type::Named("B".into())]),
+            "B" => Some(vec![Type::Named("A".into())]),
+            _ => None,
+        };
+        assert!(Type::Named("A".into()).can_form_cycle(&resolver));
+        assert!(Type::Named("B".into()).can_form_cycle(&resolver));
+    }
+
+    #[test]
+    fn list_of_record_inherits_record_capability() {
+        let primitive_record = |s: &str| match s {
+            "Point" => Some(vec![Type::Number, Type::Number]),
+            _ => None,
+        };
+        assert!(
+            !Type::List(Box::new(Type::Named("Point".into()))).can_form_cycle(&primitive_record)
+        );
+
+        let cyclic_record = |s: &str| match s {
+            "Node" => Some(vec![Type::Named("Node".into())]),
+            _ => None,
+        };
+        assert!(Type::List(Box::new(Type::Named("Node".into()))).can_form_cycle(&cyclic_record));
+    }
 }
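The tests above all hand-write a `match` resolver. As a rough sketch of how a caller with a real record table might supply `resolve_record`, the following standalone example backs the resolver with a `HashMap`. Everything here is an assumption for illustration: the `Type` enum is a cut-down stand-in with four variants, the classifier is a trimmed copy of the one in the diff, and `sample_records` and `classify` are hypothetical helpers that do not exist in the codebase.

```rust
use std::collections::HashMap;

// Cut-down stand-in for the real `Type` enum (assumption: four variants only).
#[derive(Clone, Debug)]
enum Type {
    Number,
    Any,
    List(Box<Type>),
    Named(String),
}

impl Type {
    // Trimmed copy of the classifier from the diff, limited to the variants above.
    fn can_form_cycle<F>(&self, resolve_record: &F) -> bool
    where
        F: Fn(&str) -> Option<Vec<Type>>,
    {
        fn rec<F>(ty: &Type, seen: &mut Vec<String>, resolve_record: &F) -> bool
        where
            F: Fn(&str) -> Option<Vec<Type>>,
        {
            match ty {
                Type::Number => false,
                Type::Any => true,
                Type::List(inner) => rec(inner, seen, resolve_record),
                Type::Named(name) => {
                    // A name already on the resolution path closes a loop.
                    if seen.iter().any(|s| s == name) {
                        return true;
                    }
                    match resolve_record(name.as_str()) {
                        Some(fields) => {
                            seen.push(name.clone());
                            let result = fields.iter().any(|f| rec(f, seen, resolve_record));
                            seen.pop();
                            result
                        }
                        // Unknown record: conservative default.
                        None => true,
                    }
                }
            }
        }
        rec(self, &mut Vec::new(), resolve_record)
    }
}

// Hypothetical record table mapping a record name to its field types.
fn sample_records() -> HashMap<String, Vec<Type>> {
    let mut records = HashMap::new();
    records.insert("Point".to_string(), vec![Type::Number, Type::Number]);
    records.insert(
        "Path".to_string(),
        vec![Type::List(Box::new(Type::Named("Point".to_string())))],
    );
    records.insert("Node".to_string(), vec![Type::Named("Node".to_string())]);
    records.insert("Bag".to_string(), vec![Type::Any]);
    records
}

// Classifies a named record by resolving field types through the table.
fn classify(name: &str) -> bool {
    let records = sample_records();
    let resolver = |n: &str| records.get(n).cloned();
    Type::Named(name.to_string()).can_form_cycle(&resolver)
}
```

In this sketch `classify("Point")` and `classify("Path")` come back cycle-incapable, while `classify("Node")`, `classify("Bag")`, and any name missing from the table come back cycle-capable, matching the conservative default policy described in the doc comment. Cloning the field list on every lookup keeps the closure simple; a real integration could presumably borrow instead, at the cost of a different resolver signature.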