C++ transpiler that compiles .sm math notation into multiple probabilistic programming backends.
```
p ~ Beta(1, 1)
x ~ Bin(10, p) | 7
sample(5000)
```
This three-line model compiles to roughly 20 lines of Stan, about 15 lines of PyMC, and comparably sized code in TFP.js and the other backends.
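As a rough illustration of that mapping (a hand-written sketch, not the literal generated code), the PyMC output is approximately equivalent to:

```python
# Hand-written approximation of the coin-flip model in PyMC;
# the actual generated code may differ.
import pymc as pm

with pm.Model():
    p = pm.Beta("p", alpha=1, beta=1)            # p ~ Beta(1, 1)
    x = pm.Binomial("x", n=10, p=p, observed=7)  # x ~ Bin(10, p) | 7
    idata = pm.sample(5000)                      # sample(5000)
```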
```bash
statsim-lang model.sm --target stan    # Stan code
statsim-lang model.sm --target tfpjs   # TFP.js (JavaScript)
statsim-lang model.sm --target pymc    # PyMC (Python)
statsim-lang model.sm --target z3      # Z3 (Python, optimization only)
```

| Backend | Status |
|---|---|
| TFP.js | 10/10 models validated |
| Stan | 10/10 models validated |
| Z3 | model 10 validated (constraint/optimization only) |
| PyMC | implemented (compile + syntax validation tests) |
| NumPyro | planned |
```bash
cmake -B build
cmake --build build
```

Requires C++17. Tests run via CTest + Python (no network fetch required).
Module mode compiles a .sm model into a runner object you can call from host code.
```python
import statsim_lang as sl

runner = sl.model("model.sm", target="pymc")
trace = runner.run({"x": data_x, "N": len(data_x)})    # run inference
model = runner.build({"x": data_x, "N": len(data_x)})  # get pm.Model
```
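Assuming the PyMC target returns a standard ArviZ InferenceData from run() (as pm.sample does by default), the usual diagnostics apply directly:

```python
# Assumption: `trace` is an ArviZ InferenceData produced by pm.sample.
import arviz as az

print(az.summary(trace))  # posterior means, credible intervals, R-hat
az.plot_trace(trace)      # visual convergence check
```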
```js
import { model } from '@statsim/lang'

const runner = model(source, 'tfpjs')
const results = await runner.run({ x: dataX, N: dataX.length })
const { targetLogProb, bijectors } = await runner.build({ x: dataX })
```

Models can reference functions the compiler doesn't know about. The host provides them at runtime — no special syntax needed.
```
h = hazard(time, X, beta, gamma)
y ~ Poisson(h), i=1:N
sample(1000)
```
Python — pass callables as keyword arguments:
```python
import pytensor.tensor as pt  # PyMC's tensor library

def hazard(time, X, beta, gamma):
    return pt.exp(X @ beta + gamma * time)

trace = runner.run(data, hazard=hazard)
```

JavaScript — pass callables alongside data (auto-separated):
```js
const encode = (images) => model.predict(images)
const results = await runner.run({ images, labels, encode })
```

Use runner.build() to get the underlying model object and extend it in host code.
Python — returns a pm.Model:
```python
import pymc as pm

model = runner.build(data)
with model:
    f = pm.gp.Latent(cov_func=pm.gp.cov.Matern52(1, ls=1.0))
    pm.sample(1000)
```

JavaScript — returns { targetLogProb, bijectors, stateNames }:
```js
const { targetLogProb, bijectors } = await runner.build(data)
// use with custom MCMC or optimization
```

Define reusable functions with f(x) = expr syntax:
```
logistic(x) = 1 / (1 + exp(-x))
p = logistic(alpha + beta * x)
y ~ Bernoulli(p), i=1:N
```
Function definitions emit as native functions in each backend (a Python `def`, a JS `function`, a Stan `functions {}` block).
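As a rough sketch (not the literal generated code), the logistic definition above corresponds to a Python helper along these lines when targeting PyMC:

```python
# Illustrative sketch only; the emitted helper may differ in detail.
import pytensor.tensor as pt

def logistic(x):
    return 1 / (1 + pt.exp(-x))
```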
See PLAN.md for the full language spec, spec/syntax.ebnf for the grammar, and spec/distributions.md for the distribution parameter mapping across backends.
10 reference models in spec/examples/:
| Model | Features |
|---|---|
| 01_coin_flip | Basic sample + observe |
| 02_ab_test | Multiple priors, Binomial |
| 03_linear_reg | Plated observations |
| 04_robust_reg | Student-t, heavy tails |
| 05_hierarchical | Nested priors, hyperparameters |
| 06_mixture | Gaussian mixture, Dirichlet |
| 07_gp | Gaussian process, matrix ops |
| 08_ar2 | Autoregressive time series |
| 09_changepoint | Discrete switching |
| 10_portfolio | Optimization with constraints |