Phase 3: Discovery, catalog & trust (M1 + M2) #16
Merged
Milestone 1 (certification & semantic versioning, migration 007):
- status/version/certified_by/certified_at on metrics, glossary, sample queries (+ cert stamps on saved queries)
- SemanticVersion changelog + versioning_service: one governed lifecycle (draft -> in_review -> certified -> deprecated) for all four entity types; editors submit/revert, admins certify/deprecate; certify validates SQL.
- /status + /versions endpoints; PUTs bump version + snapshot.

Milestone 2 (catalog & lineage, migration 008):
- catalog_service: hybrid search (reuses pgvector + keyword scorer, no tsvector) across tables/columns/metrics/glossary/sample+saved queries/knowledge, certified-first ranking; /search + /facets
- lineage_service + ArtifactDependency: sqlglot parses saved-query/metric SQL into table/column edges on create/update (best-effort, lazy import); per-artifact "touches" + impact "depended on by" views

Frontend: shared CertificationBadge/StatusActions/VersionHistory wired into Metrics/Glossary/SavedQueries; new CatalogPage with search, facets, lineage drawer; nav + route.

Adds 22 unit tests (177 total). sqlglot added as optional [lineage] extra and to the backend image. Column profiling deferred to a later milestone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
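The governed lifecycle in this commit (draft -> in_review -> certified -> deprecated, with editors submitting/reverting and admins certifying/deprecating) could be sketched roughly as a transition table. This is a minimal illustration, not the code in `versioning_service`: the names `Status`, `TRANSITIONS`, `ROLE_RANK`, `InvalidTransition`, and `check_transition` are hypothetical, and the exact set of allowed transitions (for example, which states a revert may start from) is an assumption.

```python
from enum import Enum


class Status(str, Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    CERTIFIED = "certified"
    DEPRECATED = "deprecated"


# Assumed mapping: (current, target) -> minimum role allowed to perform it.
TRANSITIONS = {
    (Status.DRAFT, Status.IN_REVIEW): "editor",      # submit for review
    (Status.IN_REVIEW, Status.DRAFT): "editor",      # revert
    (Status.IN_REVIEW, Status.CERTIFIED): "admin",   # certify
    (Status.CERTIFIED, Status.DEPRECATED): "admin",  # deprecate
}

# Hypothetical role ordering: admins can do everything editors can.
ROLE_RANK = {"editor": 1, "admin": 2}


class InvalidTransition(ValueError):
    """Raised when the requested status change is not in the lifecycle."""


def check_transition(current: Status, target: Status, role: str) -> Status:
    """Return the target status if the move is allowed for this role."""
    required = TRANSITIONS.get((current, target))
    if required is None:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    if ROLE_RANK.get(role, 0) < ROLE_RANK[required]:
        raise PermissionError(f"{role!r} may not move to {target.value}")
    return target
```

An API layer built on a check like this could translate `InvalidTransition` into an HTTP error response and `PermissionError` into an authorization failure.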
The lineage tests assert the populated extraction path, which needs sqlglot (the optional [lineage] extra). CI installed only [llm,dev,observability], so extract_refs hit its ImportError no-op branch and the tests failed. Add lineage to the CI install and guard the module with importorskip so it skips cleanly where sqlglot isn't present. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
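To make the failure mode concrete, here is a minimal sketch of a lazily imported, best-effort extractor that silently returns nothing when sqlglot is missing. The signature and return shape of `extract_refs` below are assumptions for illustration (the real function may return table/column edges rather than plain table names); `sqlglot.parse_one`, `find_all`, and `exp.Table` are existing sqlglot APIs.

```python
def extract_refs(sql: str) -> list[str]:
    """Best-effort list of table names referenced by `sql`.

    Returns [] when sqlglot is not installed or the SQL cannot be parsed,
    which is the no-op branch the CI run was hitting.
    """
    try:
        # Lazy import: the dependency is only needed when lineage is extracted.
        import sqlglot
        from sqlglot import exp
    except ImportError:
        return []

    try:
        tree = sqlglot.parse_one(sql)
    except Exception:
        # Best-effort: unparseable SQL yields no edges instead of an error.
        return []

    return sorted({table.name for table in tree.find_all(exp.Table)})


# In the test module, a guard along these lines skips the whole file
# when the optional extra is absent instead of failing on empty results:
#
#     import pytest
#     pytest.importorskip("sqlglot")
```

With a guard like this in place, tests that assert populated results are skipped rather than failed in environments installed without the `[lineage]` extra.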
Reflect the CI fix in the docs: the backend install/dev commands and CI note now include the [lineage] extra (sqlglot), with the importorskip caveat. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements Phase 3 from `planfull.md` (the trust + discovery layer), scoped to M1 (certification & versioning) and M2 (catalog & lineage). Reuses the existing pgvector + keyword search (no tsvector) and uses sqlglot for lineage. Column profiling is deferred to a later milestone.

## Milestone 1: Certification & semantic versioning (migration `007`)

- `status`/`version`/`certified_by_id`/`certified_at` on metrics, glossary, sample queries (+ cert stamps on saved queries).
- `SemanticVersion` changelog table + `versioning_service.py`: one governed lifecycle (draft → in_review → certified → deprecated) for all four entity types. Editors submit-for-review/revert; admins certify/deprecate. Certifying validates SQL (read-only blocklist + sqlglot parse). Every content edit and transition appends a snapshot.
- `POST .../{entity}/{id}/status` + `GET .../versions[/{v}]` on the metric/glossary/sample-query/saved-query routers; PUTs bump version + snapshot. Saved-query status changes are routed through the governed lifecycle.

## Milestone 2: Catalog & lineage (migration `008`)

- `catalog_service.py`: hybrid search across tables/columns/metrics/glossary/sample+saved queries/knowledge, reusing the existing pgvector embeddings + keyword scorer (no new full-text infra). Certified-first ranking; `GET .../catalog/search` + `/facets`.
- `lineage_service.py` + `ArtifactDependency`: sqlglot parses saved-query/metric SQL into table/column edges on create/update (best-effort, lazy import → graceful no-op if sqlglot absent). Per-artifact "what it touches" (`.../{entity}/{id}/lineage`) and impact view "what depends on this table" (`.../catalog/lineage?table=`).

## Frontend

- `CertificationBadge`/`StatusActions`/`VersionHistory`, wired into the Metrics, Glossary, and Saved Queries pages.
- `CatalogPage` (search + facet sidebar + lineage detail drawer); nav item + route.

## Tests & docs

- `CHANGELOG`, `README`, `CLAUDE.md`, and `planfull.md` status table updated.
- `sqlglot` added as the optional `[lineage]` extra and to the backend image.

## Verification (live against the sample stack)

- `007`/`008` apply and reverse cleanly (downgrade 006 → upgrade head).
- `in_review → certified` stamps owner/time; invalid transition → 422; changelog records both; revert clears the cert stamp.
- `ruff` clean on new files, new modules `mypy`-clean, `npm run build` + `lint` pass.

## Notes

- `MissingGreenlet` on the `onupdate` timestamp after an UPDATE (Postgres uses RETURNING on INSERT but not here); resolved by refreshing the entity in `versioning_service`, same pattern as `dashboard_service._finalize`.

## Still open for Phase 3
🤖 Generated with Claude Code
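
For readers unfamiliar with the "certified-first ranking" mentioned for the catalog search, one plausible reading is a two-level sort: certified artifacts ahead of everything else, then a blended relevance score within each group. The sketch below only illustrates that idea. The `Hit` dataclass, the `rank` function, the linear blend, and the `vector_weight` default are all assumptions, not the behaviour of `catalog_service.py`.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    status: str            # e.g. "draft", "in_review", "certified", "deprecated"
    vector_score: float    # assumed similarity from the embedding search, in [0, 1]
    keyword_score: float   # assumed score from the keyword scorer, in [0, 1]


def rank(hits: list[Hit], vector_weight: float = 0.6) -> list[Hit]:
    """Order hits certified-first, then by a blended relevance score.

    The linear blend and its weight are hypothetical; only the
    "certified before everything else" ordering comes from the PR text.
    """
    def sort_key(hit: Hit):
        blended = (
            vector_weight * hit.vector_score
            + (1 - vector_weight) * hit.keyword_score
        )
        # False sorts before True, so certified hits come first;
        # negating the score gives descending relevance; name breaks ties.
        return (hit.status != "certified", -blended, hit.name)

    return sorted(hits, key=sort_key)


hits = [
    Hit("revenue_draft", "draft", 0.9, 0.9),
    Hit("revenue", "certified", 0.5, 0.4),
    Hit("arr", "certified", 0.7, 0.6),
]
ranked_names = [hit.name for hit in rank(hits)]
```

In this example the high-scoring draft still lands after both certified entries, which is the trade-off such a ranking makes: trust status outranks raw relevance.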