SQLite math/stat UDFs (ln/log/sqrt/pow/exp + stddev/var/corr/covar) with O(1) Welford memory and cross-dialect parity
SLayer 0.4.1
Bug-fix-and-capability release that brings SQLite to math/stat parity
with Postgres, DuckDB, MySQL, and ClickHouse, and exposes seven new
statistical aggregations as built-in colon-syntax aggregates. Motivated
by DEV-1317: a
BIRD-Interact KB-translation effort that hit ~30–40 formulas which were
unencodable on bare SQLite for want of basic math/stat operators.
Backwards-compatible. No schema migrations.
What's new
Scalar math UDFs on SQLite
Use these inside Column.sql, ModelMeasure.formula, or
Aggregation.formula. NULL inputs return NULL (matching cross-dialect
SQL semantics). Math-domain errors propagate as
sqlite3.OperationalError — matching Postgres's strict semantics, not
SQLite ≥3.35's silent-NULL log() built-in.
| Name | Args | Notes |
|---|---|---|
ln(x) |
1 | Natural logarithm |
log10(x) |
1 | Base-10 logarithm |
log(B, X) |
2 | Returns log_B(X). Base first, value second — matches Postgres LOG(b, x), sqlglot's emission, and the SQLite ≥3.35 built-in. The UDF registers unconditionally and overrides any built-in so the strict-error policy is uniform across SQLite versions. |
exp(x) |
1 | e^x |
sqrt(x) |
1 | Square root |
pow(x, n) / power(x, n) |
2 | Both spellings registered. Uses math.pow internally — rejects negative-base + non-integer exponent (Python's ** would silently return a complex), and bounds huge exponents to IEEE-754 (Python's ** would build unbounded big-ints). |
Statistical aggregations (colon syntax)
Available as built-in aggregations on every numeric column:
{"formula": "latency:stddev_samp"}
{"formula": "latency:var_pop"}
{"formula": "price:corr(other=quantity)"}
{"formula": "price:covar_samp(other=quantity)"}| Aggregation | Postgres-matching edge cases |
|---|---|
stddev_samp / var_samp / covar_samp |
NULL when N ≤ 1 |
stddev_pop / var_pop / covar_pop |
NULL at N = 0; 0 at N = 1 |
corr |
NULL when fewer than 2 non-null pairs OR either side has zero variance |
The two-column aggregations (corr, covar_samp, covar_pop) take
the second column as a named other= parameter — same shape as
weighted_avg(weight=…). NULL pairs (either x or y NULL) are dropped
entirely from the calculation.
Memory profile: implemented with Welford online accumulators —
O(1) memory regardless of group size, and numerically more stable than
the naive two-pass formula. Safe for SQLite analytics workloads with
millions of rows per group.
Cross-dialect support
| Dialect | Stat aggregates | Math scalars |
|---|---|---|
| SQLite | All seven via Python UDFs (slayer/sql/sqlite_udfs.py); var_samp/var_pop aliased to variance/variance_pop to handle sqlglot's transpilation rewrite. |
All seven UDFs above. |
| Postgres / DuckDB | All native: STDDEV_* / VAR_* / CORR / COVAR_*. |
All native. |
| ClickHouse | All native (sqlglot may emit camelCase, e.g. varSamp). |
All native. |
| MySQL | STDDEV_SAMP / STDDEV_POP / VAR_SAMP / VAR_POP are native. corr / covar_samp / covar_pop raise NotImplementedError at SQL generation time (no native function, no Python-UDF mechanism). Use MariaDB or compute client-side. |
All native. |
Internal improvements
These don't change behaviour but are worth knowing about:
-
Unified aggregate builders. The three dialect-aware builders
(_build_percentile,_build_formula_agg,_build_stat_aggin
slayer/sql/generator.py) now share two helpers —_wrap_filterand
SQLGenerator._resolve_agg_param— and emit fully model-qualified
column references via_resolve_sql. Cognitive complexity of
_build_stat_aggdropped from 22 → ~6, addressing a Sonar
python:S3776finding. -
percentile(p=…)validation.pis now validated as a numeric
literal in[0, 1]after parameter resolution. Closes a security gap
where a malicious model-level default likep=pg_sleep(10)could
bypass the SQL-injection regex check (Codex review finding).
Verify your work
# Full unit suite
poetry run pytest -m "not integration"
# SQLite + DuckDB integration (no Docker needed)
poetry run pytest tests/integration/test_integration.py -m integration
poetry run pytest tests/integration/test_integration_duckdb.py -m integration
# Lint
poetry run ruff check slayer/ tests/ examples/If you run the Docker-backed examples, the new aggregations are smoke-
tested against live ClickHouse and MySQL via
examples/{clickhouse,mysql}/verify.py (with check_stddev_var() and,
on ClickHouse, check_corr_covar() — MySQL skips the latter since
corr/covar_* aren't supported there).
Documentation
- CLAUDE.md — Key Conventions + Aggregation caveats sections updated.
docs/concepts/{formulas,queries,models,terminology}.md— new aggregations + scalar functions documented end-to-end, with thelog(B, X)arg-order pin called out explicitly.docs/database-support.md— per-dialect support matrix + SQLite UDF caveats.docs/examples/07_aggregations/aggregations.md— extended quick-reference table.slayer/help/topics/03_aggregations.md+.claude/skills/slayer-{query,models}.md— agent-facing summaries updated.
Known limitations & follow-ups
Filed as sibling Linear issues for future releases:
- DEV-1320 — SQLite string/regex UDFs (
regexp,regexp_replace,split_part, …). Not blocking any known KB pattern but rounds out parity. - DEV-1321 — ClickHouse stat aggregates may return
NaN(not SQL NULL) for single-row / empty / zero-variance groups, contradicting the documented Postgres-style semantics. Fix needsIF(isNaN(x), NULL, x)wrapping plus ClickHouse integration coverage. - DEV-1322 — Type-check kwarg-as-column args for
weighted_avg(weight=…)/corr(other=…)/covar_*(other=…). Todayamount:corr(other=string_column)slips past enrichment and fails at the database with a dialect-specific runtime error rather than a clear "not numeric" message.
Acknowledgements
CodeRabbit, SonarCloud, and Codex (via the codex-review skill) all
contributed review feedback that shaped the final shape of this
release.