
SLayer 0.6.0: Semantic search on memories and entities


ZmeiGorynych released this on 12 May 09:24 (commit 6f2813d).

SLayer 0.6.0 Release Notes

A feature release. Three PRs since 0.5.1 collapse the agent retrieval surface into a single search tool; add multi-channel ranking over memories and schema entities (entity-overlap BM25 + tantivy full-text + optional dense embeddings via litellm, with memories ranked across all three channels and entities across the last two); introduce a persisted per-column sampled-value snapshot that feeds the search index; and tighten the bundled Jaffle Shop demo (streamed jafgen logs, default size dropped from 4 years to 2). One breaking change rides along: recall_memories is gone, with no deprecation shim and no alias, and is fully replaced by channel 1 of search. SlayerModel bumps from v5 to v6 (forward no-op converter), and two new versioned entities, Memory and Embedding, both start at v1.

Unified search tool with two retrieval channels (DEV-1375, BREAKING)

A new search tool retrieves memories AND schema entities (datasources, non-hidden models, non-hidden columns, named ModelMeasures, custom Aggregations) through two parallel channels merged by Reciprocal Rank Fusion (RRF, k=60). Channel 1 is the BM25Plus entity-overlap ranker that recall_memories used to wrap, now scoped to memories only. Channel 2 builds a fresh in-memory tantivy index on every call, covering both memories and entity docs. It tokenises with en_stem (Porter stemming plus default tokenisation, which splits on _ and .), so "shipped" matches "shipping" and "customer" matches customer_id. An exact-match canonical field lets agents paste a literal <ds>.<model>.<col> string and get that doc back directly.

The behaviour matrix is documented in docs/concepts/search.md: supplying both inputs (entities and a question) runs both channels; entity-only input runs channel 1 only; question-only input runs channel 2 only; and an empty input returns the newest learning-only and query-bearing memories together with a warning. Memory hits are partitioned on whether Memory.query is None: learning-only memories land in memories and query-bearing memories in example_queries. Each list is capped independently, so bulky example queries cannot crowd out small learning-only notes. SearchResponse.resolved_input_entities echoes the resolver output for diagnostics.

The same surface is exposed through MCP search, REST POST /search, the CLI (slayer search [--entity ...] [--question ...] [--query ...]), and SlayerClient.search.

The breaking part: the entire recall_memories surface is gone, namely the MCP tool, REST POST /memories/recall, CLI slayer memory recall, SlayerClient.recall_memories, the RecallHit / RecallResponse Pydantic models, and MemoryService.recall_memories. Channel 1 of search is the same BM25 ranker over the same canonical-entity index, so migration is a one-call swap.
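The channel dispatch, fusion, and memory partitioning described above fit in a short Python sketch. It is illustrative only: `channels_to_run`, `rrf_fuse`, `MemoryHit`, `partition_memories`, and the channel labels are hypothetical names, not SLayer's API, and breaking score ties by document id is an assumption made to keep the output deterministic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple

RRF_K = 60  # the k constant quoted in these release notes


def channels_to_run(entities: Sequence[str], question: Optional[str]) -> Set[str]:
    """Hypothetical dispatch mirroring the documented behaviour matrix."""
    channels: Set[str] = set()
    if entities:
        channels.add("entity_overlap_bm25")  # channel 1: memories only
    if question:
        channels.add("tantivy")  # channel 2: memories + entity docs
    # An empty set means empty input: the caller returns the newest memories plus a warning.
    return channels


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = RRF_K) -> List[str]:
    """Reciprocal Rank Fusion: each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id (an assumption of this sketch).
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


@dataclass
class MemoryHit:
    id: str
    query: Optional[str] = None  # None marks a learning-only memory


def partition_memories(
    hits: Sequence[MemoryHit], max_memories: int, max_examples: int
) -> Tuple[List[MemoryHit], List[MemoryHit]]:
    """Split fused hits into (memories, example_queries), capping each list independently."""
    memories: List[MemoryHit] = []
    examples: List[MemoryHit] = []
    for hit in hits:  # hits are assumed to arrive already fused and ordered
        if hit.query is None:
            if len(memories) < max_memories:
                memories.append(hit)
        elif len(examples) < max_examples:
            examples.append(hit)
    return memories, examples
```

With k=60, a doc ranked 1st by one channel and 2nd by another scores 1/61 + 1/62, which beats a doc ranked 1st by a single channel (1/61). Agreement across channels outweighs any one channel's position, and because RRF only looks at ranks, it can merge BM25 and tantivy outputs whose raw scores are not on a common scale.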

Persisted per-column sample-value cache (DEV-1375)

Every Column gains an optional field, sampled: Optional[str], that caches the per-column profile string the search index renders into each entity's text field. Previously inspect_model recomputed this string on every call; the cache makes search indexing cheap and gives inspect_model a stable rendered snapshot. The cache is populated on slayer ingest, the ingest_datasource_models MCP tool, and POST /ingest, for every table-backed model in the touched datasource; on the new slayer search refresh-samples [--data-source X] [--model M ...] CLI command; on edit_model (column-level edits refresh that column, while model-level filter, sql, or source-query changes refresh every column); and lazily on inspect_model, with a best-effort write-back when the cache is None. sql-mode and query-backed models are silently skipped in this release. The schema bump to SlayerModel v6 is a no-op forward converter (slayer/storage/v6_migration.py): the field defaults to None and is populated by the first subsequent ingest.
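A minimal sketch of the lazy inspect_model path and the edit_model refresh rule, assuming a plain dataclass in place of SLayer's Pydantic Column. `get_sampled`, `columns_to_refresh`, and the injected `profile` / `save` callables are hypothetical, and the real profile-string format is not described in these notes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Column:
    """Stand-in for SLayer's Column with only the fields this sketch needs."""
    name: str
    sampled: Optional[str] = None  # None until ingest, refresh-samples, or a lazy fill


def get_sampled(
    column: Column,
    profile: Callable[[Column], str],
    save: Callable[[Column], None],
) -> str:
    """Return the cached profile string, computing it and attempting a write-back on a miss."""
    if column.sampled is not None:
        return column.sampled  # cache hit: no database round-trip
    column.sampled = profile(column)
    try:
        save(column)
    except Exception:
        # Best-effort: a failed write-back must not fail the inspection itself.
        pass
    return column.sampled


def columns_to_refresh(
    columns: List[Column], edited: Optional[str] = None, model_level_change: bool = False
) -> List[Column]:
    """Model-level edits (filter, sql, source query) refresh all columns; column edits refresh one."""
    if model_level_change:
        return list(columns)
    return [c for c in columns if c.name == edited]
```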

Optional embedding-based third retrieval channel (DEV-1386)

A third channel runs alongside tantivy and entity-overlap BM25: dense embedding similarity via litellm. The question is embedded once per call, and cosine similarity is computed in numpy against a corpus matrix loaded fresh from storage. Memory rankings are RRF-fused across all three channels; entity hits, previously tantivy-only with raw scores, are now RRF-fused across channels 2 and 3.

The channel is gated behind the optional embedding_search extra (pip install motley-slayer[embedding_search], which pulls in litellm and numpy). When the extra is missing, no provider key is configured, or no embedding rows exist for the active model, the channel emits a single warning into SearchResponse.warnings and search degrades gracefully to tantivy + BM25. The default model is openai/text-embedding-3-small; override it with the SLAYER_EMBEDDING_MODEL env var in litellm's <provider>/<model-name> format. Provider credentials (OPENAI_API_KEY, AZURE_API_KEY, etc.) are read by litellm directly.

Embeddings are persisted in a sidecar table (the SQLite embeddings table or YAML embeddings.yaml) keyed by (canonical_id, embedding_model_name). Each row carries a SHA256 content_hash, so idempotent re-runs skip the litellm call when the source text hasn't changed. Vectors are stored as JSON lists of floats, which keeps them portable, debuggable, and dialect-neutral; a 1536-dim vector is ~6 KB as raw float32, and its JSON text is larger.

Per-entity embed failures (rate limits, transient network errors, bad keys) are non-fatal: the failing row is not written and a warning is appended to the response. Switching SLAYER_EMBEDDING_MODEL mid-project leaves the old model's rows inert; re-run slayer ingest or re-save the memories to populate rows for the new model. A dimension mismatch between the question embedding and the stored rows is detected and produces a warning instead of a crash. Embeddings are refreshed on the same edges as Column.sampled (slayer ingest and edit_model) and on save_memory. The new Memory and Embedding schema entities both ship at v1.
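A minimal sketch of the idempotent write path and the cosine ranking, assuming an in-memory dict in place of the SQLite/YAML sidecar and an injected `embed` callable in place of the litellm call. `upsert_embedding`, `rank_by_cosine`, and the row layout are hypothetical; only the keying, the SHA256 content hash, JSON float storage, and the warn-instead-of-crash behaviour come from the notes.

```python
import hashlib
import json
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np

# Hypothetical sidecar: (canonical_id, embedding_model_name) -> {"content_hash", "vector"}
Store = Dict[Tuple[str, str], Dict[str, str]]


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert_embedding(
    store: Store,
    canonical_id: str,
    model_name: str,
    text: str,
    embed: Callable[[str], Sequence[float]],
    warnings: List[str],
) -> bool:
    """Embed `text` unless an identical row exists; failures become warnings, not exceptions."""
    key = (canonical_id, model_name)
    digest = content_hash(text)
    row = store.get(key)
    if row is not None and row["content_hash"] == digest:
        return False  # source text unchanged: skip the provider call
    try:
        vector = embed(text)
    except Exception as exc:  # e.g. rate limit, network error, bad key
        warnings.append(f"embedding failed for {canonical_id}: {exc}")
        return False  # the failing row is not written
    store[key] = {"content_hash": digest, "vector": json.dumps([float(x) for x in vector])}
    return True


def rank_by_cosine(
    question_vec: Sequence[float], store: Store, model_name: str, warnings: List[str]
) -> List[Tuple[str, float]]:
    """Rank stored rows for one model by cosine similarity to the question vector."""
    ids = [cid for (cid, m) in store if m == model_name]
    if not ids:
        warnings.append(f"no embedding rows for {model_name}")
        return []
    matrix = np.array([json.loads(store[(cid, model_name)]["vector"]) for cid in ids], dtype=float)
    q = np.asarray(question_vec, dtype=float)
    if matrix.shape[1] != q.shape[0]:
        warnings.append(f"dimension mismatch: question has {q.shape[0]}, rows have {matrix.shape[1]}")
        return []
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(q)
    sims = (matrix @ q) / np.where(norms == 0, 1.0, norms)  # guard against zero vectors
    order = np.argsort(-sims)
    return [(ids[i], float(sims[i])) for i in order]
```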

Storage cascade-delete refactor (DEV-1386)

delete_model and delete_datasource are now defined on the StorageBackend ABC. Backends implement only the row-level _delete_model_row / _delete_datasource_row primitives, and the ABC wrappers cascade to embedding rows by canonical-id prefix: delete_model drops every embedding under <ds>.<model>% (model doc, columns, measures, and aggregations), delete_datasource drops every row under <ds>%, and delete_memory drops the matching memory:<id> row. This follows the pattern established in DEV-1361, where collision and validation rules live in the ABC rather than being duplicated per backend.
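The split described above is a template-method pattern: the ABC owns the public delete and the cascade, and each backend supplies only the row deletion. The sketch below is a hypothetical illustration that keeps embedding ids in an in-memory set instead of the real sidecar table. A plain string prefix would also match sibling names that share a prefix (orders vs orders_v2), so this sketch anchors on the "." separator as its own design choice; the notes do not say how the shipped pattern handles that case.

```python
from abc import ABC, abstractmethod
from typing import Set, Tuple


class StorageBackend(ABC):
    """Sketch: subclasses delete rows, the ABC cascades to embedding rows."""

    def __init__(self) -> None:
        # Hypothetical stand-in for the embeddings sidecar, keyed by canonical id.
        self.embedding_ids: Set[str] = set()

    @abstractmethod
    def _delete_model_row(self, ds: str, model: str) -> None: ...

    @abstractmethod
    def _delete_datasource_row(self, ds: str) -> None: ...

    def _drop_embeddings_under(self, prefix: str) -> None:
        # Keep only ids that are neither the entity itself nor one of its dotted children.
        self.embedding_ids = {
            cid for cid in self.embedding_ids
            if not (cid == prefix or cid.startswith(prefix + "."))
        }

    def delete_model(self, ds: str, model: str) -> None:
        self._delete_model_row(ds, model)
        self._drop_embeddings_under(f"{ds}.{model}")  # model doc, columns, measures, aggregations

    def delete_datasource(self, ds: str) -> None:
        self._delete_datasource_row(ds)
        self._drop_embeddings_under(ds)


class InMemoryBackend(StorageBackend):
    """Example backend: implements only the two row-level primitives."""

    def __init__(self) -> None:
        super().__init__()
        self.models: Set[Tuple[str, str]] = set()
        self.datasources: Set[str] = set()

    def _delete_model_row(self, ds: str, model: str) -> None:
        self.models.discard((ds, model))

    def _delete_datasource_row(self, ds: str) -> None:
        self.datasources.discard(ds)
        self.models = {m for m in self.models if m[0] != ds}
```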

Bundled demo: streamed logs and faster default (PR #104, follow-up d34b8bf)

slayer datasources create demo now forwards jafgen's stdout/stderr live to the user's terminal, so the multi-minute generation step shows real-time progress instead of a silent wait. When the active stream lacks a real file descriptor (e.g. an ipykernel OutStream shim), the implementation falls back to a line-pumping Popen so notebook integration tests still pass; on real TTYs it keeps the fast path in which the child inherits the terminal, so Rich progress animations render correctly. The default demo size drops from 4 years to 2: enough to exercise every Jaffle Shop schema feature, yet fast enough that slayer serve --demo and slayer mcp --demo finish within MCP clients' startup timeouts. Override it with --years N on slayer datasources create demo.
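The fd check and fallback described above can be sketched with the standard library alone. This is a minimal illustration, not SLayer's implementation: `has_real_fd` and `run_streaming` are hypothetical names, and the command being run stands in for the jafgen invocation, whose exact arguments are not given in these notes.

```python
import subprocess
import sys
from typing import Optional, Sequence, TextIO


def has_real_fd(stream: TextIO) -> bool:
    """True if the stream is backed by an OS file descriptor a child process can write to."""
    try:
        stream.fileno()
    except (AttributeError, OSError, ValueError):
        # io.UnsupportedOperation (raised by StringIO and similar shims)
        # subclasses both OSError and ValueError.
        return False
    return True


def run_streaming(cmd: Sequence[str], stream: Optional[TextIO] = None) -> int:
    """Run cmd, showing its output live on `stream` (default: sys.stdout); return the exit code."""
    stream = sys.stdout if stream is None else stream
    if has_real_fd(stream):
        stream.flush()  # avoid interleaving our buffered text with the child's output
        # Fast path: the child writes straight to the same descriptor, so a TTY
        # stays a TTY and in-place progress animations keep rendering.
        return subprocess.run(cmd, stdout=stream, stderr=stream).returncode
    # Fallback: capture merged stdout/stderr through a pipe and copy it line by line.
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            stream.write(line)
            stream.flush()
    # Leaving the `with` block waits for the child, so returncode is set here.
    return proc.returncode
```

The fallback loses TTY-only behaviour (the child sees a pipe, not a terminal), which is why the sketch, like the release, prefers the direct-descriptor path whenever one is available.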

Schema versions

SlayerModel v5 -> v6 (forward no-op for the new optional Column.sampled field). SlayerQuery remains v3. DatasourceConfig remains v1. New entities: Memory v1 and Embedding v1. Existing storage migrates automatically on load via the converter chain in slayer/storage/migrations.py.
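A forward converter chain of the kind described above can be sketched as a version-keyed registry that is applied step by step until the document reaches the current version. The names here (`CONVERTERS`, `migrate`, `_v5_to_v6`) and the dict-based document shape are hypothetical; the real chain in slayer/storage/migrations.py also covers earlier versions that are not shown.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]
CURRENT_SLAYER_MODEL_VERSION = 6


def _v5_to_v6(doc: Doc) -> Doc:
    # No-op forward step: Column.sampled is optional and defaults to None,
    # so existing column dicts are left untouched; only the version tag moves.
    return {**doc, "version": 6}


# Hypothetical registry: source version -> converter to the next version.
CONVERTERS: Dict[int, Callable[[Doc], Doc]] = {5: _v5_to_v6}


def migrate(doc: Doc, target: int = CURRENT_SLAYER_MODEL_VERSION) -> Doc:
    """Apply converters one version at a time until `doc` reaches `target`."""
    version = doc["version"]
    while version < target:
        step = CONVERTERS.get(version)
        if step is None:
            raise ValueError(f"no converter registered for v{version}")
        doc = step(doc)
        version = doc["version"]
    return doc
```

Because each converter returns a new dict, loading an old model never mutates the stored copy, and a document already at the current version passes through unchanged.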