SLayer 0.4.1

Bug-fix-and-capability release that brings SQLite to math/stat parity
with Postgres, DuckDB, MySQL, and ClickHouse, and exposes seven new
statistical aggregations as built-in colon-syntax aggregates. Motivated
by DEV-1317: a
BIRD-Interact KB-translation effort that hit ~30–40 formulas which were
unencodable on bare SQLite for want of basic math/stat operators.

Backwards-compatible. No schema migrations.

What's new

Scalar math UDFs on SQLite

Use these inside Column.sql, ModelMeasure.formula, or
Aggregation.formula. NULL inputs return NULL (matching cross-dialect
SQL semantics). Math-domain errors propagate as
sqlite3.OperationalError — matching Postgres's strict semantics, not
SQLite ≥3.35's silent-NULL log() built-in.

Name	Args	Notes
`ln(x)`	1	Natural logarithm
`log10(x)`	1	Base-10 logarithm
`log(B, X)`	2	Returns log_B(X). Base first, value second — matches Postgres `LOG(b, x)`, sqlglot's emission, and the SQLite ≥3.35 built-in. The UDF registers unconditionally and overrides any built-in so the strict-error policy is uniform across SQLite versions.
`exp(x)`	1	e^x
`sqrt(x)`	1	Square root
`pow(x, n)` / `power(x, n)`	2	Both spellings registered. Uses `math.pow` internally — rejects negative-base + non-integer exponent (Python's `` would silently return a complex), and bounds huge exponents to IEEE-754 (Python's `` would build unbounded big-ints).

Statistical aggregations (colon syntax)

Available as built-in aggregations on every numeric column:

{"formula": "latency:stddev_samp"}
{"formula": "latency:var_pop"}
{"formula": "price:corr(other=quantity)"}
{"formula": "price:covar_samp(other=quantity)"}

Aggregation	Postgres-matching edge cases
`stddev_samp` / `var_samp` / `covar_samp`	NULL when N ≤ 1
`stddev_pop` / `var_pop` / `covar_pop`	NULL at N = 0; 0 at N = 1
`corr`	NULL when fewer than 2 non-null pairs OR either side has zero variance

The two-column aggregations (corr, covar_samp, covar_pop) take
the second column as a named other= parameter — same shape as
weighted_avg(weight=…). NULL pairs (either x or y NULL) are dropped
entirely from the calculation.

Memory profile: implemented with Welford online accumulators —
O(1) memory regardless of group size, and numerically more stable than
the naive two-pass formula. Safe for SQLite analytics workloads with
millions of rows per group.

Cross-dialect support

Dialect	Stat aggregates	Math scalars
SQLite	All seven via Python UDFs (`slayer/sql/sqlite_udfs.py`); `var_samp`/`var_pop` aliased to `variance`/`variance_pop` to handle sqlglot's transpilation rewrite.	All seven UDFs above.
Postgres / DuckDB	All native: `STDDEV_` / `VAR_` / `CORR` / `COVAR_*`.	All native.
ClickHouse	All native (sqlglot may emit camelCase, e.g. `varSamp`).	All native.
MySQL	`STDDEV_SAMP` / `STDDEV_POP` / `VAR_SAMP` / `VAR_POP` are native. `corr` / `covar_samp` / `covar_pop` raise `NotImplementedError` at SQL generation time (no native function, no Python-UDF mechanism). Use MariaDB or compute client-side.	All native.

Internal improvements

These don't change behaviour but are worth knowing about:

Unified aggregate builders. The three dialect-aware builders
(_build_percentile, _build_formula_agg, _build_stat_agg in
slayer/sql/generator.py) now share two helpers — _wrap_filter and
SQLGenerator._resolve_agg_param — and emit fully model-qualified
column references via _resolve_sql. Cognitive complexity of
_build_stat_agg dropped from 22 → ~6, addressing a Sonar
python:S3776 finding.
percentile(p=…) validation. p is now validated as a numeric
literal in [0, 1] after parameter resolution. Closes a security gap
where a malicious model-level default like p=pg_sleep(10) could
bypass the SQL-injection regex check (Codex review finding).

Verify your work

# Full unit suite
poetry run pytest -m "not integration"

# SQLite + DuckDB integration (no Docker needed)
poetry run pytest tests/integration/test_integration.py -m integration
poetry run pytest tests/integration/test_integration_duckdb.py -m integration

# Lint
poetry run ruff check slayer/ tests/ examples/

If you run the Docker-backed examples, the new aggregations are smoke-
tested against live ClickHouse and MySQL via
examples/{clickhouse,mysql}/verify.py (with check_stddev_var() and,
on ClickHouse, check_corr_covar() — MySQL skips the latter since
corr/covar_* aren't supported there).

Documentation

CLAUDE.md — Key Conventions + Aggregation caveats sections updated.
docs/concepts/{formulas,queries,models,terminology}.md — new aggregations + scalar functions documented end-to-end, with the log(B, X) arg-order pin called out explicitly.
docs/database-support.md — per-dialect support matrix + SQLite UDF caveats.
docs/examples/07_aggregations/aggregations.md — extended quick-reference table.
slayer/help/topics/03_aggregations.md + .claude/skills/slayer-{query,models}.md — agent-facing summaries updated.

Known limitations & follow-ups

Filed as sibling Linear issues for future releases:

DEV-1320 — SQLite string/regex UDFs (regexp, regexp_replace, split_part, …). Not blocking any known KB pattern but rounds out parity.
DEV-1321 — ClickHouse stat aggregates may return NaN (not SQL NULL) for single-row / empty / zero-variance groups, contradicting the documented Postgres-style semantics. Fix needs IF(isNaN(x), NULL, x) wrapping plus ClickHouse integration coverage.
DEV-1322 — Type-check kwarg-as-column args for weighted_avg(weight=…) / corr(other=…) / covar_*(other=…). Today amount:corr(other=string_column) slips past enrichment and fails at the database with a dialect-specific runtime error rather than a clear "not numeric" message.

Acknowledgements

CodeRabbit, SonarCloud, and Codex (via the codex-review skill) all
contributed review feedback that shaped the final shape of this
release.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQLite math/stat UDFs (ln/log/sqrt/pow/exp + stddev/var/corr/covar) with O(1) Welford memory and cross-dialect parity

Choose a tag to compare

Sorry, something went wrong.