mly — Verified ML Framework

v0.2.0-dev | Apache-2.0 | Repo

Current import path: give mly a torch.export graph plus safetensors weights and get back an improving Rust/Metal import-and-compile bridge, with machine-readable ConvertReport output and optional proof/report hooks where they are actually wired today, including current composition-bounds method/soundness/proof-strength when that verifier path runs.

mly is building toward a Rust-native ML stack where model code, GPU kernels, and proof tooling live together. Today the concrete production story is Kani-backed kernel verification, an exported-artifact compiler bridge that keeps getting better, partial gamma-crown integration, and Metal as the real backend in production use.

The direct one-function PyTorch/ONNX compiler vision is not here yet. Today the import path is torch.export JSON + safetensors, and the mly CLI can shell out to mly_export.py to produce those artifacts for you.

PyTorch export + weights
        |
        v
  mly::convert() bridge
        |
        ├── Parse torch.export graph
        ├── Map aten ops → imported graph / mly TraceOps
        ├── Load safetensors weights
        ├── Trace → compile → constant-fold / fuse / optimize
        ├── Compile to GPU (Metal MSL today; HIP / SPIR-V planned)
        ├── Structured `ConvertReport` + optional verification hooks where wired
        |
        v
  CompiledModel + `ConvertReport`
        |
        ├── execute_dyn(&inputs) → outputs  (GPU-optimized inference)
        └── report / parity artifacts       (when configured and available)

This is the exported-artifact bridge that exists today, not yet a universal one-call PyTorch/ONNX compiler for arbitrary models.

Quick Start

# Build / trace / execute on Metal
mly = { git = "https://github.com/andrewdyates/mly", features = ["metal"] }

# Compiled exported-artifact conversion via `mly::convert()`
mly = { git = "https://github.com/andrewdyates/mly", features = ["import-metal"] }

# Same compiled import bridge plus available verification hooks
mly = { git = "https://github.com/andrewdyates/mly", features = ["import-full"] }

# Backend-agnostic `convert_from_trace()`;
# combine with `metal` for one-function exported-artifact Metal helpers
mly = { git = "https://github.com/andrewdyates/mly", features = ["metal", "convert-model"] }

Option A: Build a model from scratch

use mly::{DType, Device, DynTensor, Module, Result, VarBuilder};
use mly::{linear, embedding, layer_norm};

fn main() -> Result<()> {
    let device = Device::Cpu;
    let vb = VarBuilder::zeros(DType::F32, &device);

    let emb = embedding(1000, 128, &vb.pp("embed"))?;
    let ln = layer_norm(128, Default::default(), &vb.pp("ln"))?;
    let proj = linear(128, 32, &vb.pp("proj"))?;

    let ids = DynTensor::from_vec_u32(vec![1, 42, 7], &[3], &device)?;
    let x = emb.forward(&ids)?;   // [3, 128]
    let x = ln.forward(&x)?;      // [3, 128]
    let out = proj.forward(&x)?;  // [3, 32]

    assert_eq!(out.dims(), &[3, 32]);
    Ok(())
}

Option B: Compile exported PyTorch artifacts to Metal

use std::path::Path;

use mly::{convert, metal::PipelineCache, OptLevel};

let cache = PipelineCache::new()?;
let result = convert(
    Path::new("model_graph.json"),    // from torch.export
    Path::new("weights.safetensors"), // from PyTorch
    &cache,
)
.optimize(OptLevel::Full)
.build()?;

// Optional import-time verification/report hooks exist on `import-full`,
// but they are still partial and model-dependent today.

// result.result.model: CompiledModel — optimized GPU inference
// result.result.proof: optional report/parity artifacts when configured
let output = result.result.model.execute_dyn(&cache, &[&input])?;

mly::convert() is gated by import-metal and returns the builder-style Metal bridge shown above. If you want the backend-agnostic imported-model surface instead, enable convert-model and call mly::convert_from_trace(). If you want the newer one-function exported-artifact Metal helpers, enable metal + convert-model and call mly::compile_metal_from_exported_artifacts() or mly::compile_metal_from_exported_artifacts_with_report().

mly::compile_metal_from_exported_artifacts_with_report() returns the compiled Metal model plus the same structured ConvertReport used by the CLI. Both mly convert and mly compile can persist that JSON schema with --report-output and print it with --json. The populated verification/report fields are feature- and pipeline-dependent today; when the current composition-bounds verifier path runs, the JSON can record composition_method, composition_soundness_mode, and composition_proof_strength for that run. This is audit-friendly reporting for the exported-artifact path, not a finished proof-powered compiler certificate.

For automation, mly compile --output model.mlyc --report-output model.compile.json --json persists the same structured ConvertReport JSON to disk and stdout. If --report-output is omitted, mly compile still writes a default sibling report next to the .mlyc output and prints that report path on stderr.

Broad direct PyTorch/ONNX intake is still roadmap work.

Option C: Trace and compile any mly model

use mly_core::dyn_tensor::trace::trace_graph;
use mly_metal::compiled_model::CompiledModel;

// Trace a forward pass (like torch.compile)
let (_output, graph) = trace_graph(|| model.forward(&input))?;

// Compile: fuse ops, optimize, upload weights to GPU
let compiled = CompiledModel::builder(&graph, &cache).build()?;

// Subsequent calls are fast — pre-compiled GPU dispatch
let result = compiled.execute_dyn(&cache, &[&input])?;

Features

Feature	What it enables
`metal`	Apple Metal GPU backend (macOS)
`dsl`	`#[mly::kernel]` and `#[mly::model]` proc-macros
`training`	Autodiff (reverse-mode AD) + optimizers (AdamW, SGD, LoRA)
`verify`	`metal`-backed verification surfaces plus Kani / gamma-crown / z4 integrations
`import`	Exported-artifact parsing and graph-building re-exports (`torch.export` JSON, op mapping, imported graphs)
`import-metal`	Compiled exported-artifact conversion via `mly::convert()` / `mly::import::convert()`
`import-full`	`import-metal` plus the available import-time verification hooks
`convert-model`	Backend-agnostic `mly::convert_from_trace()` → `ConvertedModel`; combine with `metal` for `mly::compile_metal_from_exported_artifacts()` / `mly::compile_metal_from_exported_artifacts_with_report()`
`whisper`	Whisper speech-to-text model
`qwen3`	Qwen3 LLM
`glm5`	GLM-4/5 (ChatGLM) LLM
`models`	Kokoro TTS, HTDemucs, ECAPA-TDNN, WeSpeaker, STFT/iSTFT
`reftest`	Reference tensor comparison tooling
`tts-verify`	Audio-domain quality verification for TTS/SVS pipelines

What You Get

nn layers and building blocks — Linear, Conv1d/2d, ConvTranspose1d/2d, LSTM, BiLSTM, Embedding, LayerNorm, GroupNorm, InstanceNorm, BatchNorm, RMSNorm, AdaIN, AdaLN, MultiHeadAttention, RoPE, SwiGLU, MoE, and more.

Model implementations — Kokoro TTS (flagship integration target), Silero VAD, HTDemucs, Whisper, Qwen3, GLM-5, ECAPA-TDNN, and additional import/model-conversion paths under active development.

DynTensor — dynamic-rank tensor with 60+ ops, GPU dispatch, BF16/F16/F32 support. API mirrors candle for easy migration.

Compilation pipeline — trace_graph() captures computation graphs, compile_trace() lowers to GPU dispatch plans, and configurable fusion / peephole / native-op passes reduce dispatch overhead before execution.

Fused NativeOps — LSTM sequence, Flash Attention, SIMD GEMM, fused norm+activation+conv, batched projections, and related dispatch-collapsing kernels.

Verified kernels — #[mly::kernel] generates GPU code + Kani proof harness + differential test from one Rust function. Live proof totals are tracked in operational_state.json.

Metal GPU backend — precompiled Metal kernels, lazy command buffer batching, activation arena for buffer reuse, simdgroup matrix multiply, and F16 autocast.

Certification pipeline — certify_model() orchestrates trace, verify, and proof bundling. Certificate coverage is still model-dependent today; Kokoro/Moonshot-style checks are partial, not a universal end-to-end guarantee.

Training — reverse-mode autodiff with gradient tape, AdamW/SGD optimizers, LoRA adapters, learning rate schedules, gradient clipping/scaling.

Current Status

Fast-moving repo totals live in operational_state.json rather than this README. Current tracked counts there are:

Metric	Value
Rust source LOC	1,450,257
Tests	46,803
Kani proof harnesses	11,475
Bounds registry entries	22
nn `Module` impls	55

Kokoro TTS (flagship model)

Kokoro is mly's primary integration target and the model that drove most of the GPU/compiler work. The single-voice path is useful today, but the chorus path is still below the production bar:

What Kokoro drove	Framework benefit
LSTM GPU kernel	Any model with LSTMs runs on GPU
Flash Attention	Any transformer model benefits
Lazy command buffer batching	All models: fewer GPU sync points
Activation arena	All models: GPU buffer reuse
Trace-compile pipeline	`trace_graph() → CompiledModel` works for any model
Fusion / peephole passes	General: AddLayerNorm, Linear+activation, batched Q/K/V, NormLinear fusion. Kokoro-specific: ResBlock, style projection, conv/norm fusion work
iSTFT GPU	Any vocoder or audio model
F16 autocast	All models: mixed-precision inference
VarBuilder weight loading	All models: safetensors import

Kokoro scorecards are split. The live machine-readable single-voice production scorecard is operational_state.json (kokoro_rtf_tracking / kokoro_performance), which records production RTF, dispatch, flush, and related gate metadata for the single-voice path only. The current chorus-side report path is reports/2026-03-29-gate-measurements.md, which captures chorus-adjacent gate results such as TTFA snapshots and chorus test outcomes. That report is useful for execution-state honesty, but it is not a proof that chorus quality or production throughput is solved. Multi-voice chorus quality and throughput are still not production-good.

The production Kokoro gates also expose stable machine-readable scrape surfaces for local automation: single-voice RTF uses KOKORO_RTF_SCORECARD_JSON= / KOKORO_RTF_SCORECARD_OUT, production chorus uses KOKORO_PRODUCTION_CHORUS_SCORECARD_JSON= / KOKORO_PRODUCTION_CHORUS_SCORECARD_OUT, and streaming chorus uses KOKORO_STREAMING_CHORUS_SCORECARD_JSON= / KOKORO_STREAMING_CHORUS_SCORECARD_OUT. Those are scorecard/report surfaces, not proof that chorus is production-good. The repo also now has explicit recommended-autocast guard coverage on batch/shared-encode/streaming chorus surfaces. Those tests are regression guardrails for finite-audio, parity, and chunk-contract stability under the recommended config, not a certificate of chorus quality or throughput.

Near-Term Execution

Exported-artifact compiler bridge: keep expanding supported torch.export coverage on real models, persist real ConvertReport JSON from mly convert / mly compile, and land reusable equivalence checks for exported graphs. Raw PyTorch/ONNX intake is still future work.
Verification: raise gamma-crown from selective IBP/CROWN on current segments to broader composed single-voice coverage. The verifier path is useful today, but it is still partial rather than a finished end-to-end proof-powered compiler.
Chorus: keep the scorecard split honest until there is a real chorus production benchmark. operational_state.json is the live single-voice production scorecard; reports/2026-03-29-gate-measurements.md is still a chorus TTFA/test snapshot, not proof that chorus quality or throughput is solved.

How Verification Works

mly combines GPU kernel formal verification (Kani/CBMC model checking) with partial neural-network bound propagation via gamma-crown. Proofs operate on Rust source; Metal is the production backend today while broader model-property coverage is still in progress.

System	Verifies	Trusts (unverified)
alpha-beta-CROWN	ONNX graph bounds	PyTorch→ONNX export, CUDA runtime, GPU arithmetic
Marabou	ONNX graph via SMT	Same
ERAN	ONNX graph via abstract interpretation	Same
mly	Rust source, GPU kernels, selected model bounds	Hardware; broader gamma-crown/model coverage still in progress

Five verification levels:

Level	Method	What it catches
0	Type system	Rank mismatches (compile error)
1	Static analysis	Dimension mismatches (runtime)
2	IBP (gamma-crown)	Unbounded outputs, NaN/Inf
3	CROWN (gamma-crown)	Where wired, tighter-than-IBP correlations
4	SMT (z4)	Complete verification of small subgraphs

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│  MODEL DEFINITION                                                    │
│                                                                      │
│  Option A: Rust with proc-macros                                     │
│    #[mly::kernel] fn snake(x: f32, alpha: f32) -> f32 { ... }       │
│    #[mly::model] fn encoder(audio: Tensor<3, F32>) -> ...           │
│                                                                      │
│  Option B: Exported-artifact import                                  │
│    mly::convert(graph.json, weights.safetensors, &cache).build()    │
│                                                                      │
│  Option C: Trace compilation                                         │
│    trace_graph(|| model.forward(&x)) → CompiledModel                │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────────┐
│  OPTIMIZATION (fusion + peephole/native-op passes)                   │
│                                                                      │
│  Elementwise chain fusion, ResBlock fusion, Linear+activation,       │
│  AddLayerNorm, batched Q/K/V projection, NormLinear, flip+LSTM,     │
│  Conv+LayerNorm, style projection batching                           │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────────┐
│  VERIFICATION (on Rust source, backend-agnostic)                     │
│                                                                      │
│  Kani:        Each kernel proved correct (no overflow, no NaN)       │
│  gamma-crown: Partial model bounds via IBP/CROWN where wired         │
│  z4:          Complete SMT verification of small components          │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
              ┌─────────────────┼─────────────────┐
              ▼                 ▼                 ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│  Metal (Apple)   │  │  HIP (AMD)       │  │  Vulkan          │
│  MSL → .metallib │  │  HIP codegen     │  │  SPIR-V          │
│  Production      │  │  Codegen only    │  │  Planned         │
└──────────────────┘  └──────────────────┘  └──────────────────┘

Crates

Core

Crate	Description
mly	Top-level re-exports with feature gates
mly-core	DynTensor, Tensor, nn modules, VarBuilder
mly-dsl	`#[kernel]` proc-macro, KernelIR, tensor IR, trace compiler, fusion / peephole infrastructure
mly-macros	Proc-macro entry point for `#[mly::kernel]` and `#[mly::model]`

Backends

Crate	Description
mly-metal	Metal GPU backend — compiled model execution, arena, lazy batching
mly-cuda	CUDA/HIP backend — HIP codegen from DispatchStep IR
mly-cpu	CPU backend (NEON/AVX2 SIMD) — planned
mly-vulkan	Vulkan backend (SPIR-V) — planned

Verification

Crate	Description
mly-verify	Kani harness generation, gamma-crown bridge (partial), z4 SMT encoding, fusion proofs
mly-reftest	Reference tensor comparison, differential testing, spectral comparison

Models

Crate	Description
mly-models	HTDemucs, Silero VAD, Kokoro TTS, ECAPA-TDNN, WeSpeaker, STFT/iSTFT
mly-whisper	Whisper speech-to-text (encoder-decoder transformer, mel spectrogram)
mly-qwen3	Qwen3 LLM (decoder-only transformer with RoPE, GQA, SwiGLU)
mly-glm5	GLM-4/5 decoder-only LLM

Training & Import

Crate	Description
mly-autodiff	Reverse-mode automatic differentiation
mly-optim	AdamW, SGD, LoRA, learning rate schedules
mly-import	Exported PyTorch artifact bridge — `torch.export` graphs / segmented bundles → `ComputationGraph` / Metal-compiled segments
mly-tts-verify	Audio-domain quality verification for TTS/SVS pipelines

Building

cargo check --workspace              # Type-check all crates
cargo test --workspace               # Run all tests
cargo test -p mly-core               # Test core tensor types and nn layers
cargo test -p mly-metal              # Test Metal GPU backend
cargo test -p mly-verify             # Test verification infrastructure

Roadmap

Milestone	Status	Description
Core types + Metal backend	Complete	DynTensor, Tensor, nn layers/modules, Metal dispatch
Kernel DSL + verification	Kani complete	Counts live in `operational_state.json`; gamma-crown IBP/CROWN coverage remains partial
Training	Complete	Reverse-mode AD, AdamW/SGD/LoRA, Kani-verified gradients
Proof certificates	Partial	`certify_model()` orchestrator and checker exist; end-to-end property coverage still varies by model
Trace-compile pipeline	Complete	`trace_graph()` → fusion / optimization → `CompiledModel` → GPU
PyTorch import	Bridge improving	`convert()` handles supported `torch.export` artifacts; not yet a universal one-function PyTorch/ONNX compiler
Kokoro TTS pipeline	In progress	Strong integration target; multi-voice chorus quality/perf still below the production bar
Multi-backend	Metal prod	Metal is the production backend; CUDA/HIP codegen exists but runtime work remains
General model compilation	Next	Broaden `convert()` and compiler coverage beyond Kokoro and selected paths
Verified quantization	Planned	Prove bf16/int8 preserves bounds for all inputs

References

Kani — Rust model checker used for kernel verification
alpha-beta-CROWN — VNN-COMP winner 2021-2025, NN bound propagation
Marabou — SMT-based NN verification
ERAN — Abstract interpretation for NNs
VNN-COMP — Annual NN verification competition

License

Author: Andrew Yates

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
crates		crates
presentation		presentation
tests		tests
.gitignore		.gitignore
Cargo.toml		Cargo.toml
INTEGRITY.json		INTEGRITY.json
LICENSE		LICENSE
README.md		README.md
kokoro_optimizer_summary.json		kokoro_optimizer_summary.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mly — Verified ML Framework

Quick Start

Option A: Build a model from scratch

Option B: Compile exported PyTorch artifacts to Metal

Option C: Trace and compile any mly model

Features

What You Get

Current Status

Kokoro TTS (flagship model)

Near-Term Execution

How Verification Works

Architecture

Crates

Core

Backends

Verification

Models

Training & Import

Building

Roadmap

References

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

mly — Verified ML Framework

Quick Start

Option A: Build a model from scratch

Option B: Compile exported PyTorch artifacts to Metal

Option C: Trace and compile any mly model

Features

What You Get

Current Status

Kokoro TTS (flagship model)

Near-Term Execution

How Verification Works

Architecture

Crates

Core

Backends

Verification

Models

Training & Import

Building

Roadmap

References

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages