BM25 memory + tighter filters

@ZmeiGorynych ZmeiGorynych released this 11 May 08:19
· 410 commits to main since this release
013fca7

SLayer 0.5.1 Release Notes

A maintenance release: six PRs since 0.5.0 tighten the agent-facing surface that the bird-interact-agents benchmark pounded on. The headline items are: a BM25 ranker for the memory layer; a cleaned-up two-mode reference model that pins what is and isn't valid in Column.sql / Column.filter / SlayerModel.filters versus DSL formula and filter strings; SQL-mode model filters that now properly accept arbitrary SQL functions, plus a small string-hygiene allowlist for DSL-mode query filters (lower, upper, trim, replace, substr, instr, length, concat, and ||); an MCP query tool that finally accepts inline source-model dicts; and clearer diagnostics for two filter-parser edge cases (path-qualified LIKE and subquery-shaped predicates). One small breaking API change rides along: RecallHit.match_count: int is renamed to RecallHit.score: float -- no alias. The two-mode enforcement is also breaking for some stored models and queries; see its migration notes below.

BM25 ranking for memory recall (DEV-1365, BREAKING)

MemoryService.recall_memories previously ranked candidates by raw entity-overlap count, which trivially favoured memories tagged with large entity sets -- a memory linked to 50 entities would out-overlap a precisely-tagged one of 2 regardless of relevance. The new slayer/memories/ranker.py module exposes bm25_rank(memories, query_entities) over canonical entity sets, using rank_bm25.BM25Plus (now a core dep). IDF / avgdl are computed across the full memory corpus (recall calls storage.list_memories(entities=None) rather than the intersection-filtered form); an explicit set-intersection pre-filter still enforces "must overlap on at least one entity", and BM25 is used purely to rank the eligible set.

BM25Plus (rather than BM25Okapi) is chosen specifically because at small corpus sizes Okapi's IDF goes negative for terms appearing in even a moderate fraction of documents, which inverts BM25's length normalisation and reproduces the exact bug DEV-1365 was meant to fix; the rationale is documented in the module header. The empty-about recency fallback is unchanged.

The breaking part: RecallHit.match_count: int is replaced by RecallHit.score: float across MCP, REST, CLI, and SlayerClient. Hard rename, no alias -- callers introspecting the field break loudly rather than silently misreading. inspect_model's Learnings section is unchanged (it is a per-model browsing view, not a retrieval query, and still pulls memories via the storage entity-intersection filter in insertion order).
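To make the length-normalisation argument concrete, here is a minimal pure-Python sketch of a BM25+-style score over entity sets. It is not the code in slayer/memories/ranker.py (which uses rank_bm25.BM25Plus); the function name bm25_plus_scores, the parameter defaults, and the sample entity names are assumptions chosen for illustration.

```python
import math


def bm25_plus_scores(corpus, query, k1=1.5, b=0.75, delta=1.0):
    """Score each entity set in `corpus` against `query` (BM25+-style).

    Hand-rolled illustration only; the defaults are assumptions.
    """
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs

    # Document frequency: in how many memories each entity appears.
    df = {}
    for doc in corpus:
        for term in doc:
            df[term] = df.get(term, 0) + 1

    scores = []
    for doc in corpus:
        # Length normalisation: larger-than-average sets get a bigger penalty.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query & doc:
            # log((N + 1) / df) stays positive whenever df <= N.
            idf = math.log((n_docs + 1) / df[term])
            tf = 1  # entity sets: a term is either present or absent
            score += idf * (delta + tf * (k1 + 1) / (tf + norm))
        scores.append(score)
    return scores


# Hypothetical corpus: one precisely-tagged memory, one tagged with 50 entities.
precise = {"orders", "customers"}
broad = {"orders", "customers"} | {f"entity_{i}" for i in range(48)}
query = {"orders", "customers"}

overlap = [len(query & m) for m in (precise, broad)]  # raw overlap count
scores = bm25_plus_scores([precise, broad], query)    # length-normalised score
```

In this toy corpus the raw overlap count cannot separate the two memories, while the length-normalised score ranks the precisely-tagged one first.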

Two-mode reference semantics (DEV-1369, BREAKING)

SLayer has always had two distinct expression layers but the implementation had drifted; this release pins the intended semantics and enforces them. SQL mode (Column.sql, Column.filter, SlayerModel.filters) is sqlglot-parsed free SQL: arbitrary function calls (json_extract, coalesce, nullif, CASE WHEN, ...), dialect operators, and __-delimited join paths (customers__regions.name) are all accepted; DSL constructs (aggregation colon syntax, transform calls, raw OVER (...) in filters) are rejected. DSL mode (ModelMeasure.formula, SlayerQuery.{measures, filters, dimensions, time_dimensions, order, main_time_dimension}) is the Python-AST DSL: bare names strict-resolve at enrichment time against the model's defined Column / ModelMeasure / custom-aggregation / query-level alias set, and raw SQL function calls or unknown names raise actionable errors. The DEV-1336 escape hatch -- auto-promoting a query filter against a Column whose sql contained a window function to a post-aggregation outer WHERE -- is removed; the rank-family transforms (DEV-1353) cover top-N filtering in pure DSL and the new error message points at them or at multi-stage source_queries.

Opportunistic consolidation: the new slayer/core/refs.py is now the single source of truth for identifier-shape regexes (AGG_REF_RE, IDENT_OR_PATH_RE, DOTTED_IDENT_REF_RE, IDENTIFIER_RE), the agg-suffix parser (canonical_agg_name, strip_agg_suffix), and the user-input dunder helper -- four prior duplicates in formula.py, dbt/converter.py, engine/enrichment.py, and memories/resolver.py re-export from here. _walk_join_chain collapses the two near-duplicate join walkers in query_engine.py.

Migration: stored YAML / SQLite models with aggregation colon syntax or transform calls inside Column.filter / SlayerModel.filters will fail at load (move the predicate to a query-level filter or a ModelMeasure.formula); stored SlayerQuery objects that filter on a windowed Column will raise (switch to filters=["rank(<measure>) <= N"] or factor the window column into a multi-stage source); stored queries that filter on bare names not declared as Column entries on the model (silent pass-through to underlying-table columns) will raise at enrichment (declare the column on the model first). The new docs/concepts/references.md is the single source of truth; CLAUDE.md, docs/concepts/queries.md, and docs/concepts/models.md point at it.
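The DSL-mode rule (bare names must resolve against a declared set, and unknown function calls are rejected) can be sketched with Python's standard ast module. This is an illustration of the idea only, not SLayer's enrichment code: check_dsl_expression, its parameters, and the error wording are hypothetical, and the sample expressions use Python's == so that ast.parse accepts them as written.

```python
import ast


def check_dsl_expression(expr, declared_names, known_functions):
    """Raise ValueError on unknown bare names or unknown function calls.

    Hypothetical helper: `declared_names` stands in for the model's
    columns / measures / aliases, `known_functions` for DSL transforms.
    """
    tree = ast.parse(expr, mode="eval")

    # First pass: validate call targets and remember their Name nodes
    # so they are not mistaken for column references below.
    call_targets = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            call_targets.add(id(node.func))
            if node.func.id not in known_functions:
                raise ValueError(
                    f"Unknown function {node.func.id!r}: raw SQL calls "
                    "belong in SQL-mode fields"
                )

    # Second pass: every remaining bare name must be declared.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Name)
            and id(node) not in call_targets
            and node.id not in declared_names
        ):
            raise ValueError(
                f"Unknown name {node.id!r}: declare it on the model first"
            )


# Passes: `revenue` is declared and `rank` is a known transform.
check_dsl_expression("rank(revenue) <= 3", {"revenue"}, {"rank"})
```

Under these assumptions, an expression such as json_extract(payload, 'a') == 1 or a misspelled column name fails at check time rather than at SQL execution.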

Filters referencing unjoined models now raise at enrichment (DEV-1367)

A query filter of the form <other_model>.<col> <op> <value>, where <other_model> is not in the source model's joins, used to silently render SQL with an unbound table reference in the WHERE clause and fail at execution with a cryptic "no such column" -- the agent received an opaque SQL-runtime error instead of a clear translation-time error. DEV-1369's strict-resolution check fired only on the bare-name branch of resolve_filter_columns; this PR closes the dotted-path gap. The dotted-path branch now raises ValueError at enrichment when the path's head segment doesn't match any ModelJoin.target_model on the source model (error: "Filter '' references model '' but it is not in joins for source model ''. Add it to source_model.joins or rewrite the filter to use a local derived column.") or when the path resolves through joins but the leaf column doesn't exist on the terminal model. The cross-model-measure rerooting CTE path (_build_rerooted_enriched) inherits the outer query's filter list and some of those filters reference models reachable from the outer source but not from the re-rooted source; a new drop_unreachable_filters: bool = False kwarg threads through enrich_query / _enrich onto resolve_filter_columns's drop_if_unresolved parameter. The rerooting call sets it True so unresolved dotted paths drop from the parsed-filter list rather than raising; outer-query callers keep the strict default.
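A minimal sketch of the dotted-path check and the drop-instead-of-raise switch, assuming a single-hop path and plain dicts in place of SLayer's model objects. The function name, the joins / columns structures, and the error wording are all hypothetical; treating a missing leaf column as droppable is likewise an assumption.

```python
def resolve_dotted_filter(path, source_model, joins, columns,
                          drop_if_unresolved=False):
    """Resolve a single-hop '<model>.<column>' filter reference.

    Returns the path if it resolves, None if it was dropped, and raises
    ValueError otherwise. `joins` maps a model name to the set of models
    it joins to; `columns` maps a model name to its column names.
    """
    head, _, leaf = path.partition(".")
    if head not in joins.get(source_model, set()):
        problem = (f"model {head!r} is not in joins "
                   f"for source model {source_model!r}")
    elif leaf not in columns.get(head, set()):
        problem = f"column {leaf!r} does not exist on model {head!r}"
    else:
        return path

    if drop_if_unresolved:
        # Lenient mode (as used for the re-rooted source): skip the filter.
        return None
    raise ValueError(f"Filter reference {path!r} cannot be resolved: {problem}")


# Hypothetical models: `orders` joins `customers`, but not `regions`.
joins = {"orders": {"customers"}}
columns = {"customers": {"name"}}

ok = resolve_dotted_filter("customers.name", "orders", joins, columns)
dropped = resolve_dotted_filter("regions.name", "orders", joins, columns,
                                drop_if_unresolved=True)
```

With the strict default, the same regions.name reference raises at resolution time instead of producing SQL with an unbound table reference.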

MCP query tool accepts inline source_model (DEV-1372)

The MCP query tool's source_model parameter is now typed str | ModelExtension | SlayerModel, matching SlayerQuery.source_model's native polymorphism. It was previously typed str, which forced agents to JSON-encode inline dicts -- SLayer then validated the JSON blob as a model name and rejected it with Invalid model name '{...}': must not contain '/'. Wire-level evidence from a single 15-task households / Haiku 4.5 benchmark run: 15 of 88 query-tool calls (17%) failed this way. The run-by-name shortcut (engine.execute(str) with model-name strings) is preserved -- it layers model.query_variables -> stage -> runtime precedence that the regular SlayerQuery path doesn't reach -- and is gated on isinstance(source_model, str) so inline values fall through to SlayerQuery.model_validate. The variable-precedence asymmetry that forces this gate is filed separately as DEV-1373.
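The isinstance gate can be illustrated with a toy dispatcher. The function route_source_model, its return labels, and the sample inline dict are hypothetical stand-ins, not the MCP tool's code; the sketch only shows why a JSON-encoded dict used to land on the model-name path.

```python
import json


def route_source_model(source_model):
    """Return which (hypothetical) path would handle the argument."""
    if isinstance(source_model, str):
        # Plain strings are treated as model names (run-by-name shortcut).
        return "run_by_name"
    # Dicts and model objects fall through to regular query validation.
    return "validate_inline"


# Hypothetical inline source model passed as a dict.
inline = {"name": "households_ext", "source": "households"}

by_name = route_source_model("households")
as_inline = route_source_model(inline)
# A JSON-encoded dict is still a str, so it is routed as a model name.
as_json_blob = route_source_model(json.dumps(inline))
```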

SQL-mode predicate routing and string-hygiene allowlist (DEV-1378)

Closes two implementation gaps left over from the DEV-1369 reference-semantics work. First, Column.filter and SlayerModel.filters (SQL mode) were authored as raw SQL but re-parsed at enrichment time through the DSL parser, so any model filter containing json_extract, coalesce, CASE WHEN, or any other arbitrary SQL function raised Unknown filter function 'json_extract' at runtime. The new slayer/sql/sql_predicate.py module exposes a dedicated parse_sql_predicate that rejects DSL constructs (colon syntax, transform calls, raw OVER (...)) up front, extracts column-shaped identifiers, and otherwise accepts any SQL function call; every enrichment site that handles an SQL-mode filter (measure_def.filter, the model-filter validation loop, _collect_needed_paths, and _resolve_joins) now routes through this parser. DSL-mode query filters (SlayerQuery.filters) continue through the DSL parser unchanged. Strictness is split per mode: model filters resolve with strict=False, query filters with strict=True.

Second, DSL-mode query filters now accept a small lowercase string-hygiene allowlist -- lower, upper, trim, replace, substr, instr, length, concat -- exposed as the STRING_HYGIENE_OPS frozenset in slayer/core/formula.py; the SQL || operator is rewritten to concat(...) by a new _preprocess_concat pass so agents can write lower(name) || '_x' = 'foo_x' in filter strings. Names are lowercase only, matching SLayer's existing transform convention; sqlglot translates per-dialect at SQL-generation time.

Also folded in: the SQL generator's predicate parser now wraps user-supplied fragments in SELECT 1 WHERE ... before parsing (_parse_predicate), which dodges a sqlglot trap on SQLite / MySQL where REPLACE at the start of a fragment was parsed as the REPLACE INTO statement keyword rather than the function call. Docs updated in docs/concepts/queries.md; CLAUDE.md and .claude/skills/slayer-query.md reflect the new allowlist.
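As an illustration of the || rewrite, the sketch below turns a chain of top-level || operands into a concat(...) call while leaving || inside string literals alone. It is not SLayer's _preprocess_concat: the function names are hypothetical, and it handles only a bare concatenation expression (one side of a comparison), not a full filter string.

```python
def split_top_level_concat(expr):
    """Split `expr` on `||` outside single-quoted literals and parentheses."""
    parts, buf = [], []
    depth, in_quote, i = 0, False, 0
    while i < len(expr):
        ch = expr[i]
        if ch == "'":
            # A doubled '' escape toggles twice, leaving the state unchanged.
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif depth == 0 and expr.startswith("||", i):
                parts.append("".join(buf).strip())
                buf = []
                i += 2
                continue
        buf.append(ch)
        i += 1
    parts.append("".join(buf).strip())
    return parts


def rewrite_concat(expr):
    """Rewrite `a || b || c` as `concat(a, b, c)`; leave other input alone."""
    parts = split_top_level_concat(expr)
    if len(parts) == 1:
        return expr
    return "concat(" + ", ".join(parts) + ")"


rewritten = rewrite_concat("lower(name) || '_x'")
untouched = rewrite_concat("'a||b'")
```

Here rewritten is concat(lower(name), '_x'), and the literal 'a||b' is returned unchanged because its || sits inside quotes.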

Filter-parser fixes for path-qualified LIKE and subquery-in-filter (DEV-1376)

Two DSL filter-parser gaps surfaced by the bird-interact-agents benchmark. Path-qualified LIKE / NOT LIKE (<joined_model>.<col> like '...') was rejected with "Unsupported filter syntax" because _preprocess_like's LHS regex excluded .; the four-step preprocess collapses into a single regex (_LIKE_RE) whose LHS accepts dotted paths and whose pattern group captures the closing quote in one shot. Subquery-shaped predicates (housenum in (select ...), not in (select ...), exists (select ...)) used to surface Python's misleading "Perhaps you forgot a comma?" advice and send agents on a nonsense recovery path; a new _SUBQUERY_IN_FILTER_RE early-rejects in parse_filter next to the existing has_window_function check and points at source_queries / Column.sql / joins instead. Narrowly scoped to IN (SELECT...) / NOT IN (SELECT...) / EXISTS (SELECT...); existing UNION SELECT / ; SELECT 1 paths keep their original handling.
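Both fixes are regex-level, so a rough sketch is possible with the standard re module. The patterns below are assumptions written for illustration; they are not the actual _LIKE_RE or _SUBQUERY_IN_FILTER_RE, and the group names and helper function are hypothetical.

```python
import re

# LHS accepts a bare identifier or a dotted path; the pattern group captures
# the whole quoted literal (with '' as an escaped quote) in one shot.
LIKE_RE = re.compile(
    r"(?P<lhs>[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s+"
    r"(?P<negated>not\s+)?like\s+"
    r"(?P<pattern>'(?:[^']|'')*')",
    re.IGNORECASE,
)

# Narrowly matches IN (SELECT ...), NOT IN (SELECT ...), EXISTS (SELECT ...).
SUBQUERY_RE = re.compile(
    r"\b(?:(?:not\s+)?in|exists)\s*\(\s*select\b",
    re.IGNORECASE,
)


def is_subquery_predicate(filter_str):
    """True if the filter looks like a subquery-shaped predicate."""
    return SUBQUERY_RE.search(filter_str) is not None


match = LIKE_RE.search("customers.name like 'A%'")
```

A parser could use a check like is_subquery_predicate to reject such filters early with a targeted message, before Python's own syntax error has a chance to surface.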

Schema versions

SlayerModel remains version 5, SlayerQuery remains version 3, DatasourceConfig remains version 1. No storage migrations were needed for this release.