Semantic layer overhaul: make every documented feature real #1
Merged
added 10 commits
June 1, 2026 10:53
- SemanticCompileError/MetricYamlError domain errors (core, no web dep)
- Nested metric schema (simple|ratio|derived|cumulative) under a metric: root key
- count_distinct agg, name regex, per-type required-field validation
- validate_dimension_yaml for categorical|time dimensions
- rewrite schema tests for the nested shape
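To make the per-type validation concrete, here is a minimal Python sketch. It assumes the YAML has already been parsed into a dict; validate_metric_yaml is a hypothetical stand-in, and the name regex, aggregation set, and required fields per type are illustrative guesses rather than the PR's actual schema.

```python
import re
from typing import Any


class MetricYamlError(ValueError):
    """Stand-in for the PR's core MetricYamlError (no web dependency)."""


# Hypothetical rules: the real name regex and per-type fields are not shown in the PR.
NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")
ALLOWED_AGGS = {"sum", "count", "count_distinct", "avg", "min", "max"}
REQUIRED_BY_TYPE = {
    "simple": ("measure",),
    "ratio": ("numerator", "denominator"),
    "derived": ("expr", "metrics"),
    "cumulative": ("measure", "window"),
}


def validate_metric_yaml(doc: dict[str, Any]) -> dict[str, Any]:
    """Validate an already-parsed YAML document that nests the metric under 'metric:'."""
    metric = doc.get("metric")
    if not isinstance(metric, dict):
        raise MetricYamlError("expected a mapping under the 'metric:' root key")

    name = metric.get("name")
    if not isinstance(name, str) or not NAME_RE.fullmatch(name):
        raise MetricYamlError(f"invalid metric name: {name!r}")

    mtype = metric.get("type")
    if mtype not in REQUIRED_BY_TYPE:
        raise MetricYamlError(f"{name}: unknown type {mtype!r}")

    # Per-type required-field validation.
    missing = [field for field in REQUIRED_BY_TYPE[mtype] if field not in metric]
    if missing:
        raise MetricYamlError(f"{name}: {mtype} metric missing {', '.join(missing)}")

    # Types built on a single measure must name an expression and a supported aggregation.
    if mtype in ("simple", "cumulative"):
        measure = metric["measure"]
        if not isinstance(measure, dict) or "expr" not in measure:
            raise MetricYamlError(f"{name}: measure needs an 'expr'")
        if measure.get("agg") not in ALLOWED_AGGS:
            raise MetricYamlError(f"{name}: unsupported agg {measure.get('agg')!r}")
    return metric
```

Validating before compiling keeps schema mistakes in MetricYamlError, so the compiler only ever sees structurally complete definitions.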
- compile() emits templates with {extra_filter_clause}/{group_by_append} slots
- simple/ratio/cumulative/derived strategies; count_distinct -> COUNT(DISTINCT)
- resolve_table/resolve_dimension callbacks for bare columns + dimension refs
- bind() substitutes slots and normalises whitespace
- metricflow_compiler.py kept as a back-compat shim
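A minimal sketch of the compile/bind split for a simple metric follows. Only the slot names {extra_filter_clause} and {group_by_append} come from the PR; compile_simple, the WHERE 1 = 1 anchor, and the way filters and group-bys are rendered are assumptions for illustration.

```python
from __future__ import annotations

import re

# SQL shape for each aggregation; count_distinct maps to COUNT(DISTINCT ...).
AGG_SQL = {
    "sum": "SUM({})",
    "count": "COUNT({})",
    "count_distinct": "COUNT(DISTINCT {})",
    "avg": "AVG({})",
    "min": "MIN({})",
    "max": "MAX({})",
}


def compile_simple(name: str, table: str, expr: str, agg: str) -> str:
    """Compile a simple metric into a template that still carries the two runtime slots."""
    select = AGG_SQL[agg].format(expr)
    # Only the first two lines are f-strings; the slot placeholders stay literal.
    return (
        f"SELECT {select} AS {name}\n"
        f"FROM {table}\n"
        "WHERE 1 = 1 {extra_filter_clause}\n"
        "{group_by_append}"
    )


def bind(template: str, extra_filter: str | None = None, group_by: list[str] | None = None) -> str:
    """Fill (or strip) the slots, then collapse whitespace to single spaces."""
    filter_sql = f"AND ({extra_filter})" if extra_filter else ""
    group_sql = f"GROUP BY {', '.join(group_by)}" if group_by else ""
    sql = template.replace("{extra_filter_clause}", filter_sql)
    sql = sql.replace("{group_by_append}", group_sql)
    # Naive normalisation: this would also collapse runs of spaces inside string literals.
    return re.sub(r"\s+", " ", sql).strip()
```

Calling bind(template) with no runtime arguments simply strips both slots, which is the form a fast path can execute directly.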
- firewall: single-SELECT, no DDL/commands/multi-statement, no subqueries, anonymous-function allowlist (blocks read_csv_auto/read_parquet/etc.)
- SemanticCompile(400, code=semantic_compile_error) + handler bridging the core SemanticCompileError so invalid definitions return 400, not 500
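The firewall rules above can be approximated with a naive, regex-based check. This is only a sketch under stated assumptions: check_sql and FirewallError are hypothetical names, the allowlist and forbidden-keyword set are guesses, and a production firewall should parse the SQL into an AST rather than pattern-match it.

```python
import re


class FirewallError(ValueError):
    """Hypothetical error raised for SQL the firewall rejects."""


FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|create|alter|truncate|attach|detach|copy|pragma|install|load|export|call)\b",
    re.IGNORECASE,
)
# Hypothetical allowlist; anything else called like a function (read_csv_auto, ...) is rejected.
ALLOWED_FUNCTIONS = {"count", "sum", "avg", "min", "max", "coalesce", "nullif", "cast", "round", "date_trunc"}
# Keywords that may legitimately be followed by "(" without being function calls.
NOT_FUNCTIONS = {"in", "and", "or", "not", "distinct", "as", "over", "filter", "from", "where", "by"}


def check_sql(sql: str) -> str:
    """Return sql unchanged if it passes every rule, else raise FirewallError."""
    body = re.sub(r"'(?:[^']|'')*'", "''", sql)  # blank out string literals first
    if "--" in body or "/*" in body:
        raise FirewallError("comments are not allowed")
    body = body.strip().rstrip(";").strip()
    if ";" in body:
        raise FirewallError("multiple statements are not allowed")
    if not re.match(r"select\b", body, re.IGNORECASE):
        raise FirewallError("only a single SELECT statement is allowed")
    if len(re.findall(r"\bselect\b", body, re.IGNORECASE)) != 1:
        raise FirewallError("subqueries are not allowed")
    found = FORBIDDEN.search(body)
    if found:
        raise FirewallError(f"forbidden keyword: {found.group(1)}")
    for fn in re.findall(r"\b([a-z_][a-z0-9_]*)\s*\(", body, re.IGNORECASE):
        if fn.lower() not in NOT_FUNCTIONS and fn.lower() not in ALLOWED_FUNCTIONS:
            raise FirewallError(f"function not on the allowlist: {fn}")
    return sql
```

An allowlist fails closed: a table function nobody thought to block (read_parquet, or a future one) is rejected by default instead of needing its own deny rule.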
…ension_type, aliases
- migration 0014: metadata_json on metrics+dimensions; dimension_type on dimensions (categorical|time) with check constraint
- entities: metadata_json + dimension_type mapped columns + constraint
- DTOs: metric/dimension metadata_json, dimension Read uses dimension_type, SemanticVersionRead exposes version_number/metric_id (doc-aligned aliases)
- glossary DTOs accept documented synonyms/related_metrics/tags keys via alias
…ning, recompile, dimensions)
- SemanticRepository: get_by_name (PUBLISHED), metadata_json, version rows capture compiled SQL, publish records compiled SQL on the current version, status filter, tenant+workspace predicates on all reads/writes
- SemanticService: nested validator + compiler + firewall; recompile-on-update for published metrics; dimension group_by resolution; tenant-scoped
- dimensions repo+service: validate_dimension_yaml, compile to grain-aware expression, version on publish, dimension_type
- controllers thread tenant context; metrics/dimensions list gains a status filter
- semantic error handling moved to flyquery-local web/semantic_error_handler.py (conventions files restored to canon; lockstep clean)
- integration test updated to nested YAML + asserts version carries compiled SQL
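Bridging a core error to an HTTP response without giving the core a web dependency might look like the sketch below. SemanticCompileError is a stand-in class, handle() and problem_response() are hypothetical names, and only the 400 status, the RFC 7807 shape, and the semantic_compile_error code are taken from the PR.

```python
from __future__ import annotations

import json
from http import HTTPStatus
from typing import Any, Callable


class SemanticCompileError(Exception):
    """Stand-in for the core domain error; it knows nothing about HTTP."""


def problem_response(exc: SemanticCompileError, instance: str) -> tuple[int, dict[str, str], str]:
    """Map the domain error to an RFC 7807 problem document with status 400."""
    status = HTTPStatus.BAD_REQUEST
    body = {
        "type": "about:blank",
        "title": status.phrase,
        "status": status.value,
        "detail": str(exc),
        "instance": instance,
        "code": "semantic_compile_error",  # extension member
    }
    return status.value, {"Content-Type": "application/problem+json"}, json.dumps(body)


def handle(call: Callable[[], Any], instance: str) -> tuple[int, dict[str, str], str]:
    """Web-layer wrapper: only SemanticCompileError becomes a 400; anything else propagates."""
    try:
        result = call()
    except SemanticCompileError as exc:
        return problem_response(exc, instance)
    return HTTPStatus.OK.value, {"Content-Type": "application/json"}, json.dumps(result)
```

Because the mapping lives in the web layer, the core package can raise SemanticCompileError without importing any HTTP types, and unrelated exceptions still surface as 500s.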
- wire SemanticRepository into all 3 QueryService factories (user/agent/conversations)
- _compiled_metric_sql: tenant+workspace scoped get_by_name, SemanticCompiler.bind to strip runtime slots; narrowed exception handling (no more silent swallow)
- SEMANTIC_LAYER branch executes bound compiled SQL with NO GenerationAgent and pins metric name+version into the persisted query record
- glossary retrieval surfaces related_metrics; grounding prompt routes a matched term to its metric via SEMANTIC_LAYER
- tests: QueryService SEMANTIC_LAYER path (compiled used, generation skipped, version pinned) + synthesis fallback when no published metric matches
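The SEMANTIC_LAYER branch and its synthesis fallback can be sketched with injected callables. Every name here (answer, PublishedMetric, QueryRecord, MetricNotFound) is hypothetical; what is taken from the PR is the tenant+workspace-scoped lookup, binding instead of generating, pinning name+version in the record, and letting unexpected errors propagate.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


class MetricNotFound(LookupError):
    """Hypothetical: repository miss for (tenant, workspace, name) among PUBLISHED metrics."""


@dataclass(frozen=True)
class PublishedMetric:
    name: str
    version: int
    compiled_sql_template: str


@dataclass(frozen=True)
class QueryRecord:
    route: str
    sql: str
    metric_name: Optional[str] = None
    metric_version: Optional[int] = None


def answer(
    question: str,
    matched_metric: Optional[str],
    tenant_id: str,
    workspace_id: str,
    get_by_name: Callable[[str, str, str], PublishedMetric],
    bind: Callable[[str], str],
    execute: Callable[[str], object],
    generate_sql: Callable[[str], str],
) -> QueryRecord:
    metric = None
    if matched_metric is not None:
        try:
            metric = get_by_name(tenant_id, workspace_id, matched_metric)
        except MetricNotFound:
            pass  # expected miss: fall back; any other error propagates instead of being swallowed
    if metric is not None:
        sql = bind(metric.compiled_sql_template)  # strip runtime slots; no generation step
        execute(sql)
        return QueryRecord("SEMANTIC_LAYER", sql, metric.name, metric.version)
    sql = generate_sql(question)  # synthesis fallback (the LLM path in the real service)
    execute(sql)
    return QueryRecord("SYNTHESIS", sql)
```

Pinning the version in the record means a later republish of the metric cannot silently change what an earlier answer claims to have run.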
- /api/v1/agent/semantic/metrics, /api/v1/agent/semantic/dimensions, /api/v1/agent/glossary: full lifecycle mirrors guarded by flyquery.semantic:author (writes) / flyquery.semantic:read (reads)
- delegate to the same SemanticService/SemanticDimensionsService/GlossaryService
- auto-discovered via existing scan_packages(flyquery.web.controllers.agent)
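One way to express the read/author split for the agent-tier routes is a small method-to-scope mapping. The scope strings and route prefixes come from the PR; required_scope and authorize are hypothetical helpers, and treating GET/HEAD/OPTIONS as reads (with author not implying read) is an assumption.

```python
from __future__ import annotations

READ_SCOPE = "flyquery.semantic:read"
AUTHOR_SCOPE = "flyquery.semantic:author"
AGENT_PREFIXES = (
    "/api/v1/agent/semantic/metrics",
    "/api/v1/agent/semantic/dimensions",
    "/api/v1/agent/glossary",
)
READ_METHODS = {"GET", "HEAD", "OPTIONS"}  # assumption: these count as reads


def required_scope(method: str, path: str) -> str | None:
    """Return the scope an agent-tier route needs, or None for routes outside that tier."""
    in_tier = any(path == prefix or path.startswith(prefix + "/") for prefix in AGENT_PREFIXES)
    if not in_tier:
        return None
    return READ_SCOPE if method.upper() in READ_METHODS else AUTHOR_SCOPE


def authorize(method: str, path: str, granted: set[str]) -> bool:
    """Allow the call if no scope is needed or the caller's token carries the required one."""
    scope = required_scope(method, path)
    return scope is None or scope in granted
```

Matching on prefix plus "/" keeps a sibling path such as /api/v1/agent/glossaryx from inheriting the glossary's scopes.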
- semantic-layer.md: nested MetricFlow is authoritative; all 4 types execute;
qualified measure exprs; corrected compile/bind template + slot names;
dimension :retire/GET{id}/status rows; agent-tier mirrors (metrics/dims/glossary)
- CHANGELOG: semantic-layer-overhaul entry (fast-path fix + new features)
- openapi.json regenerated (108 paths incl. 12 new agent-tier semantic/glossary)
- ruff import-sort fixes across touched controllers/tests
Bump version 26.5.14 -> 26.6.0 (pyproject, __init__, app.py, README badge, SDK version args). Finalize CHANGELOG. Regenerate openapi.json (info.version 26.6.0) and both SDKs (Python + Java) from the spec; adds the new semantic/glossary + agent-tier APIs and the nested-schema/dimension_type/metadata_json/version_number models.
Summary
Brings flyquery's semantic layer up to its documented contract. The headline fix: the SEMANTIC_LAYER fast-path now actually executes. Previously a published metric's compiled SQL was never run (two independent defects masked by a bare except), so every query silently fell through to the LLM.
Originated from a 157-agent audit (49/50 findings confirmed), cross-checked by direct code reading. Built with TDD, phase by phase.
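The failure mode is easy to reproduce in miniature. In this hypothetical sketch the two defects are modeled on the Critical fixes (repository not wired into the factory, get_by_name missing); the bare except turns both into an ordinary "no metric" result, while the narrowed handler only treats a genuine miss as a fallback. OldRepo, NewRepo, and MetricNotFound are illustrative names, not the PR's classes.

```python
from typing import Optional


class MetricNotFound(LookupError):
    """Hypothetical: the only 'no match' outcome callers should treat as a miss."""


class OldRepo:
    """Models the pre-fix repository: it has no get_by_name method at all."""


class NewRepo:
    def __init__(self, published: dict) -> None:
        self._published = published  # metric name -> compiled SQL

    def get_by_name(self, name: str) -> str:
        try:
            return self._published[name]
        except KeyError:
            raise MetricNotFound(name) from None


def compiled_sql_masked(repo, name: str) -> Optional[str]:
    try:
        return repo.get_by_name(name)
    except:  # noqa: E722 (bare except: a None repo or a missing method also means "no metric")
        return None


def compiled_sql_narrow(repo, name: str) -> Optional[str]:
    try:
        return repo.get_by_name(name)
    except MetricNotFound:
        return None  # a genuine miss still falls back; wiring bugs now raise
```

With the masked version, a None repository and a repository without get_by_name both look exactly like "no published metric", which is why every query fell through to the LLM without any error.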
What changed (by audit finding)
Critical
- SEMANTIC_LAYER path executes the bound compiled SQL with no GenerationAgent; SemanticRepository is now wired into all 3 QueryService factories; added get_by_name; metric name+version pinned in the query record. (C1, C2, C4, M3)
- categorical|time validator + compiler: the documented dimension YAML is now creatable. (C3, H11)
High
- SIMPLE/RATIO/DERIVED/CUMULATIVE) compile to templates with {extra_filter_clause}/{group_by_append} slots filled by SemanticCompiler.bind. (H1, H3, L4)
- count_distinct supported. (H6)
- compiled_sql_template; publish records it; update-of-published recompiles + re-firewalls. (H7, H8)
- synonyms/related_metrics); related metrics surfaced to grounding for routing. (H9, H10)
Medium/Low
- SemanticCompileError → RFC 7807 400 semantic_compile_error (flyquery-local handler; conventions kept lockstep-clean). (M1)
- /api/v1/agent/semantic/metrics|dimensions, /api/v1/agent/glossary. (M2)
- metadata_json column (migration 0014); dimension_type column. (M4)
- SemanticVersionRead exposes version_number/metric_id; metrics/dimensions list gains status; repos tenant+workspace scoped. (M5, M6, M7)
Tests: run for real
- 0014.
- openapi.json regenerated (108 paths incl. 12 new agent-tier).
Migration
0014_semantic_meta_dimtype: adds metadata_json (metrics+dimensions) and dimension_type (dimensions). Reversible.
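A reversible migration of this shape can be sketched with the standard-library sqlite3 module so it runs anywhere. The PR does not say which database or migration tool is used; the table names metrics and dimensions, the nullable dimension_type, and the upgrade()/downgrade() functions are assumptions.

```python
import sqlite3


def upgrade(conn: sqlite3.Connection) -> None:
    """Add metadata_json to both tables and a constrained dimension_type to dimensions."""
    conn.execute("ALTER TABLE metrics ADD COLUMN metadata_json TEXT")
    conn.execute("ALTER TABLE dimensions ADD COLUMN metadata_json TEXT")
    # Nullable here (an assumption); the CHECK rejects anything but the two documented types.
    conn.execute(
        "ALTER TABLE dimensions ADD COLUMN dimension_type TEXT "
        "CHECK (dimension_type IN ('categorical', 'time'))"
    )


def downgrade(conn: sqlite3.Connection) -> None:
    """Reverse upgrade(); ALTER TABLE ... DROP COLUMN needs SQLite 3.35.0 or newer."""
    conn.execute("ALTER TABLE dimensions DROP COLUMN dimension_type")
    conn.execute("ALTER TABLE dimensions DROP COLUMN metadata_json")
    conn.execute("ALTER TABLE metrics DROP COLUMN metadata_json")
```

Dropping columns in the reverse order they were added keeps downgrade() a mirror image of upgrade().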