feat: multi-graph support with schema isolation and RBAC #156
Merged
charlie83Gs merged 19 commits into main on Apr 6, 2026
Introduces the foundation for multiple isolated knowledge graphs:
- Graph, GraphMember, DatabaseConnection models with graph_type and byok_enabled flags
- GraphSessionResolver for per-graph session factory caching (schema or database isolation)
- GraphRepository for CRUD + member management
- Graph management API (CRUD, member roles, synchronous provisioning)
- Graph-scoped data endpoints via GraphContext dependency
- Parameterized Qdrant collection names (per-graph isolation)
- GraphAwareMixin on all key Hatchet workflow inputs (backward compatible)
- Multi-graph sync worker (iterates all active graphs)
- Multi-schema Alembic migration runner with ALEMBIC_SCHEMA env override
- Frontend /graphs page with create form, detail page, and member management
- GraphDatabaseConfig in Settings for named DB connection pairs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
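Per-graph Qdrant isolation comes down to a naming rule for collections. A minimal sketch of one possible rule, assuming the default graph keeps its legacy, unsuffixed collection names so existing data stays reachable; the base name "nodes", the "__" separator and the helper name are hypothetical, not taken from the PR:

```python
DEFAULT_GRAPH_SLUG = "default"


def collection_name(base: str, graph_slug: str = DEFAULT_GRAPH_SLUG) -> str:
    """Return the Qdrant collection name for a graph (hypothetical scheme).

    The default graph keeps the unsuffixed legacy name so existing
    collections keep working; every other graph gets its own suffix.
    """
    if graph_slug == DEFAULT_GRAPH_SLUG:
        return base
    return f"{base}__{graph_slug}"


print(collection_name("nodes"))              # nodes
print(collection_name("nodes", "research"))  # nodes__research
```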
- Add graph_slugs column to ApiToken for per-token graph restriction
- Add graph parameter to all 8 MCP tools (default: "default")
- GraphSessionResolver in MCP dependencies for per-graph session routing
- OAuth tokens carry graph:{slug} scopes from API token graph_slugs
- GraphContext checks token graph scope before granting access
- Frontend token creation form with graph selector
- Token list shows graph scope (all graphs vs specific)
- Updated MCP instructions to document multi-graph support
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
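OAuth tokens carry graph:{slug} scopes derived from the API token's graph_slugs. A sketch of that mapping, assuming graph_slugs is None for a token that is not restricted to particular graphs; the helper name is hypothetical:

```python
from typing import List, Optional, Sequence


def graph_scopes(graph_slugs: Optional[Sequence[str]]) -> List[str]:
    """Map an API token's graph_slugs to OAuth scopes (hypothetical helper).

    None means "all graphs", so no graph:* scopes are emitted.
    """
    if graph_slugs is None:
        return []
    # De-duplicate and sort so the scope list is stable across requests.
    return [f"graph:{slug}" for slug in sorted(set(graph_slugs))]
```

Note that in this sketch an empty list also produces no scopes and would therefore read as unrestricted; if an empty list is meant to grant no graphs at all, that case needs explicit handling.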
Critical:
- #1: Validate schema names with strict ^[a-z0-9_]+$ regex before DDL
- #2: Escape ILIKE special chars (%, _, \) in graph_nodes search
- #3: Replace cached Graph ORM instances with frozen GraphInfo dataclass to prevent DetachedInstanceError
High:
- #4: Reuse system session factories for default graph (no duplicate pools) via default_graph_session_factory/default_write_session_factory params
- #5: Add 23 unit tests: GraphInfo, GraphSessions, GraphSessionResolver, slug/schema validation, CreateGraphRequest, role validation
- #6: Scope sync watermarks by graph_slug; SyncEngine now passes graph_slug to _get_watermark/_set_watermark, composite PK on (table_name, graph_slug)
Medium:
- #7: Replace N+1 member count queries with batch GROUP BY
- #8: Replace catch { // ignore } with console.error in frontend
- #9: Engine pool disposal on GraphSessionResolver.invalidate()
- #10: Run Alembic migrations during graph provisioning
- #11: (node_count in list deferred; requires cross-schema queries)
Low:
- #13: Replace "Cycle Role" button with role dropdown
- #14: require_writer/require_graph_admin kept for future endpoints
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
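Fix #2 escapes ILIKE wildcards in the graph_nodes search so user input matches literally. A minimal sketch of the escaping step, assuming backslash as the escape character (PostgreSQL's default for LIKE/ILIKE); the helper name is hypothetical. The backslash itself is escaped first, otherwise the escapes added for % and _ would be doubled:

```python
def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE/ILIKE wildcards so the term matches literally."""
    # Order matters: escape the escape character before the wildcards.
    term = term.replace(escape, escape + escape)
    term = term.replace("%", escape + "%")
    term = term.replace("_", escape + "_")
    return term


# Wrap in % only after escaping, to get a "contains" search.
pattern = f"%{escape_like('50%_off')}%"
```

The resulting pattern should still be bound as a query parameter rather than interpolated into SQL; with SQLAlchemy that is, for example, column.ilike(pattern, escape="\\").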
Fixes TypeScript type error: ApiTokenRead now requires graph_slugs field. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- GraphProvider context wrapping the app layout, persists active graph
to localStorage, syncs to api module via setActiveGraphSlug()
- GraphPicker component in sidebar (dropdown when expanded, icon when collapsed);
auto-hides when only one graph exists
- graphRequest() helper in api.ts routes through /graphs/{slug}/... for
non-default graphs, falls back to standard paths for default
- setActiveGraphSlug/getActiveGraphSlug exports for module-level state
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
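graphRequest() sends non-default graphs through /graphs/{slug}/... and keeps the standard paths for the default graph. The frontend code is TypeScript; the routing rule itself is small, so here it is sketched in Python for consistency with the other examples. The /api/v1 prefix comes from the PR summary; the function name is hypothetical:

```python
from urllib.parse import quote

API_PREFIX = "/api/v1"
DEFAULT_GRAPH_SLUG = "default"


def graph_path(path: str, active_slug: str = DEFAULT_GRAPH_SLUG) -> str:
    """Build the request path for a graph-scoped endpoint (sketch).

    The default graph keeps the legacy unscoped paths, so existing routes
    are untouched; other graphs are nested under /graphs/{slug}.
    """
    if not path.startswith("/"):
        path = "/" + path
    if active_slug == DEFAULT_GRAPH_SLUG:
        return f"{API_PREFIX}{path}"
    return f"{API_PREFIX}/graphs/{quote(active_slug, safe='')}{path}"
```

Control-plane resources (members, systemSettings, invites) have no graph-scoped backend route, so they must keep using the plain path regardless of the active graph.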
Critical:
- #1: Remove dead quote_ident call; regex is the sole injection guard
- #2: Add ^[a-z0-9_]+$ validation for ALEMBIC_SCHEMA in both env.py files
High:
- #3: Derive kt_db_root from kt_db package location instead of fragile parents[5]
- #4: Document that MCP omits default_write_session_factory intentionally (read-only)
- #5: GraphContext now uses GraphInfo (frozen dataclass) instead of ORM Graph
Medium:
- #6: Replace user._token_graph_slugs monkey-patching with request.state
- #7: Fix remaining catch { // ignore } in graphs/page.tsx
- #9: Document MCP graph access check limitation, planned for follow-up
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
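Both env.py files now validate ALEMBIC_SCHEMA, and a later commit makes the control-plane migrations skip when the target is non-public. A sketch of that guard logic in isolation, assuming an unset variable means the default public schema; the function names are hypothetical, and the actual env.py wiring (search_path, version table) is not shown:

```python
import os
import re
from typing import Mapping, Optional

_SCHEMA_RE = re.compile(r"[a-z0-9_]+")  # used with fullmatch, i.e. ^...$


def alembic_target_schema(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the validated ALEMBIC_SCHEMA override, or None if unset."""
    schema = environ.get("ALEMBIC_SCHEMA")
    if schema is None:
        return None
    if not _SCHEMA_RE.fullmatch(schema):
        raise ValueError(f"invalid ALEMBIC_SCHEMA: {schema!r}")
    return schema


def run_control_plane_migrations(environ: Mapping[str, str] = os.environ) -> bool:
    # Control-plane tables (graphs, graph_members, api_tokens) belong only in
    # public; per-graph schema runs must not create copies of them.
    return alembic_target_schema(environ) in (None, "public")
```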
Critical:
- #1: Invalidate resolver cache after provisioning (both success and error) so subsequent resolve() picks up fresh status
- #2: Combine status="active" + add_member in a single commit to prevent orphaned graphs on crash
High:
- #3: Run Alembic migrations via asyncio.to_thread() to avoid blocking the event loop during HTTP requests
- #5: Store AsyncEngine references in GraphSessions for proper disposal instead of accessing sessionmaker.kw["bind"] internals
Medium:
- #7: Replace silent .catch(() => {}) with console.error in tokens page
- migrate.py path comment clarified for consistency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
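Fix #3 moves the synchronous Alembic run off the event loop. A minimal sketch of the pattern with asyncio.to_thread (Python 3.9+); run_migrations is a hypothetical stand-in for the blocking Alembic upgrade, not the PR's code:

```python
import asyncio
import time


def run_migrations(schema: str) -> str:
    """Hypothetical stand-in for a blocking Alembic `upgrade head`."""
    time.sleep(0.05)  # simulates synchronous database work
    return f"migrated {schema}"


async def provision(schema: str) -> str:
    # to_thread runs the blocking call in the default thread pool, so the
    # event loop keeps serving other HTTP requests meanwhile.
    return await asyncio.to_thread(run_migrations, schema)


print(asyncio.run(provision("graph_research")))  # migrated graph_research
```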
Critical:
- #1: Validate schema_name in GraphRepository.create() (data layer guard)
- #2: Enforce graph:{slug} scopes in MCP _get_graph_factory via get_access_token(); tokens without matching scope are denied
- #3: Disallow hyphens in slugs to prevent schema name collisions (my_graph and my-graph can no longer coexist)
High:
- #4: Add asyncio.Lock to GraphSessionResolver.resolve/resolve_by_slug with double-check pattern to prevent duplicate engine pool creation
- #5: Evict from cache on graph deletion (invalidate in delete_graph)
Medium:
- #8: Last-admin protection: prevent removing or demoting the last admin
- #9: Defense-in-depth schema_name validation in _make_session_factory
Low:
- Validate stored graph slug still exists in GraphProvider (reset to default)
- Update tests and frontend for no-hyphens slug policy
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
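Fix #3 closes a collision. If schema names are derived by folding hyphens into underscores (an assumption here; the PR does not show the derivation), my_graph and my-graph land in the same schema. Rejecting hyphens at the slug level makes the slug-to-schema mapping one-to-one again. A sketch assuming the slug alphabet matches the schema regex; both helper names are hypothetical:

```python
import re

_SLUG_RE = re.compile(r"[a-z0-9_]+")  # used with fullmatch, i.e. ^...$


def is_valid_slug(slug: str) -> bool:
    """Slug policy after this commit: lowercase letters, digits, underscores."""
    return _SLUG_RE.fullmatch(slug) is not None


def hyphen_folding_schema(slug: str) -> str:
    """Hypothetical derivation that folds hyphens into underscores."""
    return "graph_" + slug.replace("-", "_")
```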
Replace single serial sync_wf with two workflows:
- sync_dispatch_wf (cron every minute): fans out one sync per graph
- sync_graph_wf (on-demand): syncs a single graph with per-graph concurrency (max_runs=1 keyed by input.graph_slug)
This prevents high-activity bursts on one graph from starving sync for other graphs. Each graph syncs independently and in parallel.
Worker slots increased from 1 to 10 to allow parallel graph syncs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
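The Hatchet wiring is not reproduced here. Instead, this plain-asyncio sketch shows the same concurrency shape: a per-slug lock stands in for max_runs=1 keyed by graph_slug, and a semaphore stands in for the 10 worker slots. Extra runs for the same graph wait on the lock in this sketch; how Hatchet treats them depends on the configured concurrency strategy. All names are hypothetical:

```python
import asyncio
from typing import Dict, List


class SyncDispatcher:
    """Fan out one sync per graph, with at most one run per graph at a time."""

    def __init__(self, worker_slots: int = 10) -> None:
        # Construct from inside a running event loop (e.g. in a coroutine).
        self._slots = asyncio.Semaphore(worker_slots)  # total parallel syncs
        self._graph_locks: Dict[str, asyncio.Lock] = {}  # ~ max_runs=1 per slug
        self._active: Dict[str, int] = {}
        self.peak_per_graph: Dict[str, int] = {}
        self.completed: List[str] = []

    async def sync_graph(self, graph_slug: str) -> None:
        if graph_slug not in self._graph_locks:
            self._graph_locks[graph_slug] = asyncio.Lock()
        # Take the per-graph lock before a worker slot, so queued runs for a
        # busy graph do not tie up slots that other graphs could use.
        async with self._graph_locks[graph_slug], self._slots:
            self._active[graph_slug] = self._active.get(graph_slug, 0) + 1
            self.peak_per_graph[graph_slug] = max(
                self.peak_per_graph.get(graph_slug, 0), self._active[graph_slug]
            )
            await asyncio.sleep(0.01)  # placeholder for the real per-graph sync
            self._active[graph_slug] -= 1
            self.completed.append(graph_slug)

    async def dispatch(self, active_graphs: List[str]) -> None:
        # The cron fan-out step: one independent task per graph.
        await asyncio.gather(*(self.sync_graph(s) for s in active_graphs))


async def demo() -> SyncDispatcher:
    dispatcher = SyncDispatcher(worker_slots=2)
    # "default" twice simulates an overlapping run for the same graph.
    await dispatcher.dispatch(["default", "research", "default"])
    return dispatcher
```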
Critical:
- #1: Control-plane migrations (zzai, zzaj) now skip when ALEMBIC_SCHEMA is non-public; prevents duplicating the graphs/graph_members/api_tokens tables in per-graph schemas
High:
- #3: Replace global asyncio.Lock with per-graph locks via _locks dict + lightweight _meta_lock for dict insertion only
- #7: Default graph now enforces min_role for write operations (PUT /graphs/default requires admin)
Medium:
- #9: Validate that storage_mode=database requires a connection key at creation time (422 instead of a confusing ValueError at resolve)
- #12: Fix SyncWatermark docstring (defaults to "default", not NULL)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
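Fix #3 swaps the global lock for per-graph locks, with _meta_lock guarding only insertion into the _locks dict, and the resolver re-checks the cache after acquiring the lock (the double-check pattern from an earlier commit). A sketch of that structure; build_factory is a hypothetical stand-in for creating the engine and session factory, and the class name is hypothetical too:

```python
import asyncio
from typing import Awaitable, Callable, Dict


class PerGraphCache:
    def __init__(self, build_factory: Callable[[str], Awaitable[object]]) -> None:
        # Construct from inside a running event loop (e.g. in a coroutine).
        self._build = build_factory
        self._cache: Dict[str, object] = {}
        self._locks: Dict[str, asyncio.Lock] = {}
        self._meta_lock = asyncio.Lock()  # guards _locks insertion only
        self.builds = 0

    async def _lock_for(self, graph_id: str) -> asyncio.Lock:
        async with self._meta_lock:
            return self._locks.setdefault(graph_id, asyncio.Lock())

    async def resolve(self, graph_id: str) -> object:
        cached = self._cache.get(graph_id)  # fast path without any lock
        if cached is not None:
            return cached
        lock = await self._lock_for(graph_id)
        async with lock:  # only callers for this graph wait here
            # Double-check: another task may have built it while we waited.
            cached = self._cache.get(graph_id)
            if cached is None:
                cached = await self._build(graph_id)
                self._cache[graph_id] = cached
                self.builds += 1
            return cached


async def demo():
    async def build(graph_id: str) -> str:
        await asyncio.sleep(0.01)  # simulates engine creation
        return f"factory:{graph_id}"

    cache = PerGraphCache(build)
    results = await asyncio.gather(
        *(cache.resolve("g1") for _ in range(5)), cache.resolve("g2")
    )
    return results, cache.builds
```

In single-threaded asyncio, a setdefault with no intervening await is already atomic, so _meta_lock mainly documents intent and protects against a future await being added inside that critical section.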
…rors
- #1: Extract validate_schema_name() into kt_db.keys as single source of truth. Remove duplicate regex from graphs.py, repositories/graphs.py, graph_sessions.py, and both alembic env.py files. Remove redundant double-quotes in SET search_path.
- #3: Provisioning no longer caches via resolver; uses a temporary write session factory for DDL, avoiding stale cached engines mid-migration.
- #5: Qdrant collection failures now propagate (not swallowed), causing graph to go "error" instead of "active" without collections.
- #7: GraphProvider gates listGraphs() on auth loading complete + user !== null, preventing race with AuthProvider.
- #11: Replace <a> with Next.js <Link> on graphs list page.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
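Fix #5 lets Qdrant collection failures propagate so the graph ends in "error" rather than "active" without collections. A sketch of that status transition, assuming provisioning is a sequence of async steps (schema DDL, migrations, Qdrant collections) and the status lives in a plain dict; all names are hypothetical:

```python
import asyncio
from typing import Awaitable, Callable, MutableMapping, Sequence

Step = Callable[[], Awaitable[None]]


class ProvisioningError(RuntimeError):
    """Raised when any provisioning step fails."""


async def provision_graph(graph: MutableMapping[str, str], steps: Sequence[Step]) -> None:
    """Run provisioning steps in order; mark 'active' only if all succeed."""
    try:
        for step in steps:
            await step()  # e.g. schema DDL, migrations, Qdrant collections
    except Exception as exc:
        # Record the failure, then propagate instead of swallowing it: a graph
        # without its collections must not look usable.
        graph["status"] = "error"
        raise ProvisioningError(f"provisioning failed: {exc}") from exc
    graph["status"] = "active"


async def _qdrant_down() -> None:
    raise ConnectionError("qdrant unavailable")  # simulated outage


graph = {"status": "provisioning"}
try:
    asyncio.run(provision_graph(graph, [_qdrant_down]))
except ProvisioningError:
    pass
print(graph["status"])  # error
```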
Critical:
- #1: Add SECURITY comments to all DDL f-strings tying them to validate_schema_name() regex; future-proofs against regex loosening
- #2: Add POST /graphs/{slug}/retry-provision endpoint for graphs stuck in "error" status. Idempotent (CREATE SCHEMA IF NOT EXISTS + Alembic upgrade head). Also adds admin member if none exist.
High:
- #3: MCP now requires explicit graph:{slug} scopes for non-default graphs; tokens without graph scopes are denied (not silently allowed)
- #4: Document default graph policy: open reads, superuser-only writes
- #5: Use one-off engine with dispose() for write-db DDL during provisioning; no leaked connection pools
Medium:
- #8: Document connection budget math in sync worker slots comment
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
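Fixes #1 and #2 tie each DDL f-string to validate_schema_name() and make provisioning idempotent via CREATE SCHEMA IF NOT EXISTS. A sketch of that pairing, not the kt_db.keys implementation; create_schema_ddl is a hypothetical name. It uses re.fullmatch because in Python, re.match with a trailing $ also accepts a string ending in a newline:

```python
import re

_SCHEMA_RE = re.compile(r"[a-z0-9_]+")


def validate_schema_name(name: str) -> str:
    # fullmatch, not match: re.match(r"^[a-z0-9_]+$", "x\n") succeeds.
    if not _SCHEMA_RE.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    return name


def create_schema_ddl(schema: str) -> str:
    # SECURITY: identifiers cannot be bound as query parameters, so this
    # f-string is only safe because validate_schema_name() restricts the
    # input to [a-z0-9_]. Loosening that regex would make it injectable.
    # IF NOT EXISTS keeps a retried provisioning run idempotent.
    return f"CREATE SCHEMA IF NOT EXISTS {validate_schema_name(schema)}"
```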
- #3: Deny non-default graph access when get_access_token() returns None (SKIP_AUTH or missing auth context)
- #4: Use SELECT ... FOR UPDATE on admin members during role demotion and member removal to prevent concurrent last-admin race
- #5: Add _slug_to_id index to GraphSessionResolver for O(1) slug lookups instead of linear cache scan. Maintained in _build_and_cache and invalidate.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
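#4 serializes last-admin checks by locking the admin rows with SELECT ... FOR UPDATE; the next commit locks them unconditionally, before looking at the target's role. Once those rows are locked, the check itself is a small pure function. A sketch assuming roles are strings such as "admin" and "writer"; the function name is hypothetical:

```python
from typing import Mapping, Optional


def would_remove_last_admin(
    roles: Mapping[str, str], user_id: str, new_role: Optional[str]
) -> bool:
    """True if changing user_id to new_role (None = removal) leaves no admin.

    `roles` should come from rows locked with SELECT ... FOR UPDATE, so no
    concurrent request can change the admin set between check and write.
    """
    if roles.get(user_id) != "admin":
        return False  # only demoting or removing an admin can lower the count
    if new_role == "admin":
        return False
    remaining = sum(
        1 for uid, role in roles.items() if role == "admin" and uid != user_id
    )
    return remaining == 0
```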
…ease
- #1: Lock admin members unconditionally before checking role; prevents race where two concurrent requests both see admin_count=2 before lock
- #5: Release control session before acquiring per-graph lock in resolve_by_slug to avoid holding pool slot during lock wait
- #7: require_writer now enforces superuser-only for default graph writes, matching the documented policy
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- #2/#14: Fix MCP scope check: empty scopes = unrestricted access (graph_slugs=null tokens and SKIP_AUTH both work again). Only tokens with explicit graph:* scopes are restricted to those graphs.
- #3: Block reserved PG schema names (public, pg_catalog, pg_toast, pg_temp, information_schema, pg_*) in validate_schema_name()
- #4: Fix scalar_one() → scalar_one_or_none() in resolve_by_slug (introduced by session-release refactor in prior commit)
- #10: Sync raises RuntimeError instead of returning error dict when graph_resolver is unavailable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
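The corrected rule: a token with no graph:* scopes is unrestricted, while a token with explicit graph:* scopes may reach only those graphs. A sketch of that rule, assuming scopes arrive as a list of strings; non-graph scopes are ignored. Whether a restricted token can still reach the default graph is not spelled out, so the sketch does not special-case it. The function name is hypothetical:

```python
from typing import Iterable

GRAPH_SCOPE_PREFIX = "graph:"


def can_access_graph(scopes: Iterable[str], graph_slug: str) -> bool:
    """Apply the graph-scope rule to a token's scopes (sketch)."""
    allowed = {
        scope[len(GRAPH_SCOPE_PREFIX):]
        for scope in scopes
        if scope.startswith(GRAPH_SCOPE_PREFIX)
    }
    if not allowed:
        return True  # no graph:* scopes: unrestricted token
    return graph_slug in allowed
```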
Workers:
- Add resolve_sessions() helper to WorkerState for one-line graph_id resolution to (graph_sf, write_sf) tuple
- All input models now extend GraphAwareMixin (graph_id on every input)
- worker-bottomup: _open_sessions + _build_agent_context accept graph_id, all 9 call sites updated
- worker-ingest: _open_sessions + _build_agent_context accept graph_id
- worker-nodes: HatchetPipeline constructor accepts graph_id, resolves sessions in _open_sessions and _build_ctx
- worker-search: direct session factory calls resolve per graph_id
- worker-synthesis: synthesizer + super-synthesizer resolve graph_id to per-graph ReadGraphEngine + write session factories
Frontend:
- Switch 30 graph-scoped API methods from request() to graphRequest() (nodes, edges, facts, sources, seeds, conversations, syntheses, etc.)
- Non-graph-scoped methods (auth, config, members, usage) unchanged
- Graph picker now actually affects all data queries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
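resolve_sessions() turns a workflow input's graph_id into a (graph_sf, write_sf) pair, and GraphAwareMixin makes graph_id=None mean the default graph. A sketch of that contract using dataclasses in place of the project's actual input model classes; it is synchronous for brevity, the session factories are stand-in strings, and everything except the names GraphAwareMixin, WorkerState, resolve_sessions and graph_id is hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

SessionPair = Tuple[object, object]  # stand-in for (graph_sf, write_sf)


@dataclass
class GraphAwareMixin:
    graph_id: Optional[str] = None  # None = default graph (backward compatible)


@dataclass
class NodeInput(GraphAwareMixin):  # hypothetical workflow input
    node_id: str = ""


@dataclass
class WorkerState:
    default_sessions: SessionPair
    per_graph: Dict[str, SessionPair] = field(default_factory=dict)

    def resolve_sessions(self, graph_id: Optional[str]) -> SessionPair:
        """Return (graph_sf, write_sf), falling back to the default graph."""
        if graph_id is None:
            return self.default_sessions
        try:
            return self.per_graph[graph_id]
        except KeyError:
            raise LookupError(f"unknown graph_id: {graph_id}") from None
```

Old inputs that never set graph_id keep resolving to the default sessions, which is what makes the mixin backward compatible.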
Critical:
- Fix 38 remaining API methods still using request() instead of graphRequest(): node sub-resources (dimensions, facts, edges, history, convergence), conversations, seeds, edge candidates, syntheses, sources. All graph-scoped data now routes correctly.
High:
- Add graph_pool_size / graph_max_overflow settings (defaults 5/10) for schema-mode non-default graphs, replacing hardcoded values
Low:
- migrate.py path derivation now uses kt_db.__file__ (matches graphs.py)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
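The graph_pool_size / graph_max_overflow defaults bound each schema-mode graph engine. With SQLAlchemy's QueuePool, one engine holds at most pool_size + max_overflow connections, so the defaults allow 5 + 10 = 15 per engine. A back-of-envelope helper, assuming one such engine per non-default graph (count two if a graph also gets its own write-db engine); the function name is hypothetical:

```python
def worst_case_connections(n_engines: int, pool_size: int = 5, max_overflow: int = 10) -> int:
    """Upper bound on simultaneous connections across n QueuePool engines."""
    # Each engine keeps up to pool_size persistent connections and may open
    # up to max_overflow extra ones under load.
    return n_engines * (pool_size + max_overflow)


print(worst_case_connections(1))  # 15
print(worst_case_connections(7))  # 105
```

At 105, seven graphs would already exceed PostgreSQL's default max_connections of 100 in the worst case, before counting the system pools.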
systemSettings, members, waitlist, and invites are global/control-plane resources without graph-scoped backend routes. Using graphRequest() would 404 on non-default graphs. Reverted 7 methods to request(). Graph updated_at confirmed working — onupdate=_utcnow on the ORM column handles auto-update on every flush. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Graph, GraphMember, DatabaseConnection models with graph_type (versioned, for backward compat) and byok_enabled (honors early adopter BYOK access)
- GraphSessionResolver caches per-graph session factories using PostgreSQL search_path for schema isolation or separate DB connections
- Graph-scoped data endpoints via GraphContext dependency (/api/v1/graphs/{slug}/nodes/...)
- GraphAwareMixin on all key Hatchet workflow inputs (backward compatible: graph_id=None means default graph)
- Multi-schema Alembic migration runner (ALEMBIC_SCHEMA env override)
- Frontend /graphs page with create form, detail/member management page
Test plan
- alembic upgrade head for both graph-db and write-db migrations
- POST /api/v1/graphs and verify schema + Qdrant collections are created
- … /api/v1/nodes) continue to work unchanged
🤖 Generated with Claude Code