statsim-lang

C++ transpiler that compiles .sm math notation into multiple probabilistic programming backends.

Quick example

p ~ Beta(1, 1)
x ~ Bin(10, p) | 7
sample(5000)

This compiles to ~20 lines of Stan, ~15 lines of PyMC, or the equivalent TFP.js code, depending on the target.
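As a sanity check on what the quick example describes, the posterior has a closed form if `| 7` is read as conditioning on 7 observed successes (an assumption about the notation; PLAN.md is the authority). Under that reading, a Beta(1, 1) prior with 7 successes in 10 trials gives a Beta(8, 4) posterior. The sketch below uses only the Python standard library and does not involve the compiler; the function name is illustrative.

```python
import random
from statistics import fmean


def beta_binomial_posterior(a, b, n, k):
    """Return the (alpha, beta) parameters of the conjugate Beta posterior."""
    return a + k, b + (n - k)


# Prior Beta(1, 1), 10 trials, 7 observed successes (assumed meaning of `| 7`).
alpha_post, beta_post = beta_binomial_posterior(1, 1, 10, 7)

# Exact posterior mean of p.
exact_mean = alpha_post / (alpha_post + beta_post)

# Monte Carlo estimate from 5000 draws, mirroring sample(5000).
rng = random.Random(0)
draws = [rng.betavariate(alpha_post, beta_post) for _ in range(5000)]
mc_mean = fmean(draws)

print(alpha_post, beta_post, round(exact_mean, 3), round(mc_mean, 3))
```

Any backend's sampled mean for `p` should land close to `exact_mean`, which makes this a cheap cross-check for generated code.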

Usage

statsim-lang model.sm --target stan    # Stan code
statsim-lang model.sm --target tfpjs   # TFP.js (JavaScript)
statsim-lang model.sm --target pymc    # PyMC (Python)
statsim-lang model.sm --target z3      # Z3 (Python, optimization only)
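For scripting several targets at once, the CLI can be wrapped from Python. This is a minimal sketch: `build_command` and `compile_model` are hypothetical helper names, and `compile_model` assumes the generated code is written to stdout and that failure is signalled by a non-zero exit status, neither of which is stated above.

```python
import subprocess

# Targets listed in the usage examples above.
TARGETS = ("stan", "tfpjs", "pymc", "z3")


def build_command(model_path, target, executable="statsim-lang"):
    """Build the argv list for one compiler invocation."""
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {TARGETS}")
    return [executable, model_path, "--target", target]


def compile_model(model_path, target):
    """Run the compiler and return whatever it writes to stdout.

    Assumption: generated code goes to stdout and a non-zero exit
    status signals failure. Adjust if the CLI behaves differently.
    """
    result = subprocess.run(
        build_command(model_path, target),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with model paths that contain spaces.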

Available backends

| Backend | Status |
| --- | --- |
| TFP.js | 10/10 models validated |
| Stan | 10/10 models validated |
| Z3 | model 10 validated (constraint/optimization only) |
| PyMC | implemented (compile + syntax validation tests) |
| NumPyro | planned |

Build

cmake -B build
cmake --build build

Requires C++17. Tests run via CTest + Python (no network fetch required).

Module mode (Python / JavaScript)

Module mode compiles a .sm model into a runner object you can call from host code.

Python (PyMC)

import statsim_lang as sl

runner = sl.model("model.sm", target="pymc")
trace = runner.run({"x": data_x, "N": len(data_x)})   # run inference
model = runner.build({"x": data_x, "N": len(data_x)})  # get pm.Model

JavaScript (TFP.js)

import { model } from '@statsim/lang'

const runner = model(source, 'tfpjs')
const results = await runner.run({ x: dataX, N: dataX.length })   // run inference
const { targetLogProb, bijectors } = await runner.build({ x: dataX, N: dataX.length })  // get log-prob function and bijectors

Host functions (Call In)

Models can reference functions the compiler doesn't know about. The host provides them at runtime — no special syntax needed.

h = hazard(time, X, beta, gamma)
y ~ Poisson(h), i=1:N
sample(1000)

Python — pass callables as keyword arguments:

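import pytensor.tensor as pt

def hazard(time, X, beta, gamma):
    # log-linear hazard built from PyTensor ops so it can join the PyMC graph
    return pt.exp(X @ beta + gamma * time)

trace = runner.run(data, hazard=hazard)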

JavaScript — pass callables alongside data (auto-separated):

// `encoder` is a placeholder for a model loaded elsewhere that exposes predict()
const encode = (images) => encoder.predict(images)
const results = await runner.run({ images, labels, encode })
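One plausible way to picture the automatic separation is a partition of the input object by callability. The sketch below is written in Python for illustration and is not the library's implementation; `split_inputs` and the sample inputs are hypothetical.

```python
def split_inputs(inputs):
    """Partition a mapping into (data, functions) by callability."""
    data = {}
    functions = {}
    for name, value in inputs.items():
        if callable(value):
            functions[name] = value
        else:
            data[name] = value
    return data, functions


# Hypothetical inputs mirroring the JavaScript call above.
inputs = {
    "images": [[0.0, 1.0], [1.0, 0.0]],
    "labels": [0, 1],
    "encode": lambda images: [sum(row) for row in images],
}
data, functions = split_inputs(inputs)
print(sorted(data), sorted(functions))
```

A scheme like this keeps the call site flat, at the cost of making it impossible to pass a callable as plain data.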

Model composition (Compose Out)

Use runner.build() to get the underlying model object and extend it in host code.

Python — returns a pm.Model:

import pymc as pm

model = runner.build(data)
with model:
    # add further components inside the compiled model's context
    gp = pm.gp.Latent(cov_func=pm.gp.cov.Matern52(1, ls=1.0))
    trace = pm.sample(1000)

JavaScript — returns { targetLogProb, bijectors, stateNames }:

const { targetLogProb, bijectors } = await runner.build(data)
// use with custom MCMC or optimization
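To make "custom MCMC" concrete, here is a minimal random-walk Metropolis sampler that needs nothing but a log-probability callable. It is a Python sketch rather than JavaScript, it uses a stand-in standard-normal log-density instead of a real `targetLogProb`, and it ignores bijectors and multi-dimensional state; the actual signature returned by `runner.build()` is not specified here.

```python
import math
import random


def metropolis(target_log_prob, init, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis for a scalar state."""
    rng = random.Random(seed)
    x = init
    logp = target_log_prob(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_new = target_log_prob(proposal)
        # Accept with probability min(1, exp(logp_new - logp)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)
    return samples


def std_normal_log_prob(x):
    """Stand-in target: unnormalized log-density of N(0, 1)."""
    return -0.5 * x * x


samples = metropolis(std_normal_log_prob, init=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))
```

Only differences of log-probabilities enter the acceptance test, so an unnormalized target is enough.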

Inline functions

Define reusable functions with f(x) = expr syntax:

logistic(x) = 1 / (1 + exp(-x))
p = logistic(alpha + beta * x)
y ~ Bernoulli(p), i=1:N

Inline functions are emitted as native functions in each backend (Python def, JS function, Stan functions {} block).
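For intuition, a hand-written Python counterpart of the `logistic` definition might look like the sketch below. It is not the compiler's actual output (a PyMC target would presumably use tensor operations rather than `math.exp`), and the values of `alpha`, `beta`, and `xs` are made up.

```python
import math


def logistic(x):
    """Python counterpart of the .sm definition logistic(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# p = logistic(alpha + beta * x) for a few inputs; alpha and beta are made-up values.
alpha, beta = -1.0, 2.0
xs = [0.0, 0.5, 1.0]
ps = [logistic(alpha + beta * x) for x in xs]
print(ps)
```

Each `p` lies strictly between 0 and 1, which is what makes it a valid Bernoulli parameter.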

Language reference

See PLAN.md for the full language spec, spec/syntax.ebnf for the grammar, and spec/distributions.md for the distribution parameter mapping across backends.

Examples

10 reference models in spec/examples/:

| Model | Features |
| --- | --- |
| 01_coin_flip | Basic sample + observe |
| 02_ab_test | Multiple priors, Binomial |
| 03_linear_reg | Plated observations |
| 04_robust_reg | Student-t, heavy tails |
| 05_hierarchical | Nested priors, hyperparameters |
| 06_mixture | Gaussian mixture, Dirichlet |
| 07_gp | Gaussian process, matrix ops |
| 08_ar2 | Autoregressive time series |
| 09_changepoint | Discrete switching |
| 10_portfolio | Optimization with constraints |
