
feat: multi-graph support with schema isolation and RBAC #156

Merged
charlie83Gs merged 19 commits into main from
worktree-feat+multigraph
Apr 6, 2026

Conversation

@charlie83Gs
Contributor

Summary

  • Adds full multi-graph infrastructure: Graph, GraphMember, DatabaseConnection models with graph_type (versioned, for backward compat) and byok_enabled (honors early adopter BYOK access)
  • GraphSessionResolver caches per-graph session factories using PostgreSQL search_path for schema isolation or separate DB connections
  • Graph management API with CRUD, member roles (reader/writer/admin), and synchronous schema provisioning
  • Graph-scoped data endpoints via GraphContext dependency (/api/v1/graphs/{slug}/nodes/...)
  • Parameterized Qdrant collection names for per-graph vector isolation
  • GraphAwareMixin on all key Hatchet workflow inputs (backward compatible — graph_id=None means default graph)
  • Sync worker iterates all active graphs per cycle
  • Multi-schema Alembic migration runner (ALEMBIC_SCHEMA env override)
  • Frontend /graphs page with create form, detail/member management page
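
The backward-compatibility rule for workflow inputs (graph_id=None means the default graph) can be sketched as below. This is a minimal illustration using stdlib dataclasses; the real GraphAwareMixin may be built on a different base class, and `SearchInput` and `targets_default_graph` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GraphAwareMixin:
    # None selects the default graph, so callers that predate
    # multi-graph support keep working without changes.
    graph_id: Optional[str] = None

    @property
    def targets_default_graph(self) -> bool:
        return self.graph_id is None


@dataclass(frozen=True)
class SearchInput(GraphAwareMixin):  # hypothetical workflow input
    query: str = ""


legacy = SearchInput(query="protein folding")
scoped = SearchInput(graph_id="g-123", query="protein folding")
```

Because the field defaults to None, existing call sites that never pass graph_id keep targeting the default graph.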

Test plan

  • Verify all existing tests pass (40 API, 64 Qdrant, 19 Hatchet, 123 frontend — all green locally)
  • Run alembic upgrade head for both graph-db and write-db migrations
  • Create a graph via POST /api/v1/graphs and verify schema + Qdrant collections are created
  • Add members with different roles and verify access control on graph-scoped endpoints
  • Verify default graph endpoints (/api/v1/nodes) continue to work unchanged
  • Verify sync worker syncs both default and non-default graphs
  • Test BYOK-enabled graph creation and verify the flag persists

🤖 Generated with Claude Code

charlie83Gs and others added 19 commits April 5, 2026 11:49
Introduces the foundation for multiple isolated knowledge graphs:

- Graph, GraphMember, DatabaseConnection models with graph_type and byok_enabled flags
- GraphSessionResolver for per-graph session factory caching (schema or database isolation)
- GraphRepository for CRUD + member management
- Graph management API (CRUD, member roles, synchronous provisioning)
- Graph-scoped data endpoints via GraphContext dependency
- Parameterized Qdrant collection names (per-graph isolation)
- GraphAwareMixin on all key Hatchet workflow inputs (backward compatible)
- Multi-graph sync worker (iterates all active graphs)
- Multi-schema Alembic migration runner with ALEMBIC_SCHEMA env override
- Frontend /graphs page with create form, detail page, and member management
- GraphDatabaseConfig in Settings for named DB connection pairs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add graph_slugs column to ApiToken for per-token graph restriction
- Add graph parameter to all 8 MCP tools (default: "default")
- GraphSessionResolver in MCP dependencies for per-graph session routing
- OAuth tokens carry graph:{slug} scopes from API token graph_slugs
- GraphContext checks token graph scope before granting access
- Frontend token creation form with graph selector
- Token list shows graph scope (all graphs vs specific)
- Updated MCP instructions to document multi-graph support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical:
- #1: Validate schema names with strict ^[a-z0-9_]+$ regex before DDL
- #2: Escape ILIKE special chars (%, _, \) in graph_nodes search
- #3: Replace cached Graph ORM instances with frozen GraphInfo dataclass
  to prevent DetachedInstanceError

High:
- #4: Reuse system session factories for default graph (no duplicate pools)
  via default_graph_session_factory/default_write_session_factory params
- #5: Add 23 unit tests — GraphInfo, GraphSessions, GraphSessionResolver,
  slug/schema validation, CreateGraphRequest, role validation
- #6: Scope sync watermarks by graph_slug — SyncEngine now passes
  graph_slug to _get_watermark/_set_watermark, composite PK on
  (table_name, graph_slug)

Medium:
- #7: Replace N+1 member count queries with batch GROUP BY
- #8: Replace catch { // ignore } with console.error in frontend
- #9: Engine pool disposal on GraphSessionResolver.invalidate()
- #10: Run Alembic migrations during graph provisioning
- #11: (node_count in list deferred — requires cross-schema queries)

Low:
- #13: Replace "Cycle Role" button with role dropdown
- #14: require_writer/require_graph_admin kept for future endpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
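
The ILIKE escaping from item #2 above might look roughly like the following. It is a sketch, not the code from this PR: `escape_like` is a hypothetical helper name, and the backslash is assumed to be the escape character.

```python
def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE/ILIKE wildcards so user input is matched literally."""
    # The escape character itself must be handled first; otherwise the
    # backslashes added for % and _ would be escaped a second time.
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


# Wrap the escaped term in wildcards for a "contains" search.
pattern = f"%{escape_like('50%_off')}%"
```

With SQLAlchemy, the pattern would typically be passed as `column.ilike(pattern, escape="\\")` so the database knows which escape character is in use.
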
Fixes TypeScript type error: ApiTokenRead now requires graph_slugs field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- GraphProvider context wrapping the app layout, persists active graph
  to localStorage, syncs to api module via setActiveGraphSlug()
- GraphPicker component in sidebar (dropdown expanded, icon collapsed)
  auto-hides when only one graph exists
- graphRequest() helper in api.ts routes through /graphs/{slug}/... for
  non-default graphs, falls back to standard paths for default
- setActiveGraphSlug/getActiveGraphSlug exports for module-level state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
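
The routing rule that graphRequest() applies can be sketched as follows. The real helper lives in the TypeScript api.ts module; it is written in Python here only to illustrate the rule. `graph_path` is a hypothetical name, and the default graph's slug is assumed to be "default".

```python
DEFAULT_GRAPH_SLUG = "default"  # assumption: slug of the default graph


def graph_path(path: str, active_slug: str = DEFAULT_GRAPH_SLUG) -> str:
    """Prefix a data path with /graphs/{slug} unless the default graph is active."""
    if active_slug == DEFAULT_GRAPH_SLUG:
        return path  # default graph keeps the standard, unprefixed path
    return f"/graphs/{active_slug}{path}"
```
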
Critical:
- #1: Remove dead quote_ident call — regex is the sole injection guard
- #2: Add ^[a-z0-9_]+$ validation for ALEMBIC_SCHEMA in both env.py files

High:
- #3: Derive kt_db_root from kt_db package location instead of fragile parents[5]
- #4: Document MCP omits default_write_session_factory intentionally (read-only)
- #5: GraphContext now uses GraphInfo (frozen dataclass) instead of ORM Graph

Medium:
- #6: Replace user._token_graph_slugs monkey-patching with request.state
- #7: Fix remaining catch { // ignore } in graphs/page.tsx
- #9: Document MCP graph access check limitation, planned for follow-up

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical:
- #1: Invalidate resolver cache after provisioning (both success and error)
  so subsequent resolve() picks up fresh status
- #2: Combine status="active" + add_member in single commit to prevent
  orphaned graphs on crash

High:
- #3: Run Alembic migrations via asyncio.to_thread() to avoid blocking
  the event loop during HTTP requests
- #5: Store AsyncEngine references in GraphSessions for proper disposal
  instead of accessing sessionmaker.kw["bind"] internals

Medium:
- #7: Replace silent .catch(() => {}) with console.error in tokens page
- migrate.py path comment clarified for consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
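
Item #3 above moves the blocking migration call off the event loop. A minimal sketch of that pattern, where `run_migrations_blocking` is a placeholder that only sleeps (the real code would invoke Alembic) and `provision` is a hypothetical name:

```python
import asyncio
import time


def run_migrations_blocking(schema: str) -> str:
    """Placeholder for a synchronous Alembic upgrade; here it only sleeps."""
    time.sleep(0.05)
    return f"migrated:{schema}"


async def provision(schema: str) -> str:
    # to_thread runs the blocking call in a worker thread, so the event
    # loop stays free to serve other coroutines in the meantime.
    return await asyncio.to_thread(run_migrations_blocking, schema)


async def main() -> list:
    ticks = []

    async def heartbeat() -> None:
        # Stands in for other requests handled while the migration runs.
        for _ in range(3):
            ticks.append("tick")
            await asyncio.sleep(0.01)

    result, _ = await asyncio.gather(provision("graph_a"), heartbeat())
    return [result, len(ticks)]
```

Calling `asyncio.run(main())` shows the heartbeat making progress while the migration placeholder is still sleeping. `asyncio.to_thread` requires Python 3.9 or later.
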
Critical:
- #1: Validate schema_name in GraphRepository.create() (data layer guard)
- #2: Enforce graph:{slug} scopes in MCP _get_graph_factory via
  get_access_token() — tokens without matching scope are denied
- #3: Disallow hyphens in slugs to prevent schema name collisions
  (my_graph and my-graph can no longer coexist)

High:
- #4: Add asyncio.Lock to GraphSessionResolver.resolve/resolve_by_slug
  with double-check pattern to prevent duplicate engine pool creation
- #5: Evict from cache on graph deletion (invalidate in delete_graph)

Medium:
- #8: Last-admin protection — prevent removing or demoting the last admin
- #9: Defense-in-depth schema_name validation in _make_session_factory

Low:
- Validate stored graph slug still exists in GraphProvider (reset to default)
- Update tests and frontend for no-hyphens slug policy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace single serial sync_wf with two workflows:
- sync_dispatch_wf (cron every minute) — fans out one sync per graph
- sync_graph_wf (on-demand) — syncs a single graph with per-graph
  concurrency (max_runs=1 keyed by input.graph_slug)

This prevents high-activity bursts on one graph from starving sync
for other graphs. Each graph syncs independently and in parallel.

Worker slots increased from 1 to 10 to allow parallel graph syncs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical:
- #1: Control-plane migrations (zzai, zzaj) now skip when
  ALEMBIC_SCHEMA is non-public — prevents duplicating graphs/
  graph_members/api_tokens tables in per-graph schemas

High:
- #3: Replace global asyncio.Lock with per-graph locks via
  _locks dict + lightweight _meta_lock for dict insertion only
- #7: Default graph now enforces min_role for write operations
  (PUT /graphs/default requires admin)

Medium:
- #9: Validate storage_mode=database requires connection key at
  creation time (422 instead of confusing ValueError at resolve)
- #12: Fix SyncWatermark docstring (defaults to "default", not NULL)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
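
The per-graph locking in item #3 above, combined with the double-check pattern, can be sketched as follows. `PerKeyCache` and its `build` callback are hypothetical stand-ins; the real GraphSessionResolver builds engines and session factories rather than arbitrary objects.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class PerKeyCache:
    """Hypothetical cache that builds each entry at most once per key."""

    def __init__(self, build: Callable[[str], Awaitable[object]]) -> None:
        self._build = build
        self._cache: Dict[str, object] = {}
        self._locks: Dict[str, asyncio.Lock] = {}
        self._meta_lock = asyncio.Lock()  # guards insertion into _locks only

    async def _lock_for(self, key: str) -> asyncio.Lock:
        async with self._meta_lock:
            return self._locks.setdefault(key, asyncio.Lock())

    async def resolve(self, key: str) -> object:
        cached = self._cache.get(key)
        if cached is not None:  # fast path: no lock needed
            return cached
        async with await self._lock_for(key):
            cached = self._cache.get(key)  # second check, now under the lock
            if cached is None:
                cached = await self._build(key)
                self._cache[key] = cached
            return cached
```

A slow build for one key blocks only callers waiting on that key; resolves for other keys proceed in parallel, which is the point of replacing the global lock.
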
…rors

- #1: Extract validate_schema_name() into kt_db.keys as single source
  of truth. Remove duplicate regex from graphs.py, repositories/graphs.py,
  graph_sessions.py, and both alembic env.py files. Remove redundant
  double-quotes in SET search_path.
- #3: Provisioning no longer caches via resolver — uses temporary write
  session factory for DDL, avoiding stale cached engines mid-migration.
- #5: Qdrant collection failures now propagate (not swallowed), causing
  graph to go "error" instead of "active" without collections.
- #7: GraphProvider gates listGraphs() on auth loading complete +
  user !== null, preventing race with AuthProvider.
- #11: Replace <a> with Next.js <Link> on graphs list page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
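
A single-source-of-truth validator along the lines of item #1 above could look like this. Only the regex comes from the PR; the return value and the exception type are assumptions, since the actual signature in kt_db.keys is not shown here.

```python
import re

_SCHEMA_NAME_RE = re.compile(r"^[a-z0-9_]+$")


def validate_schema_name(name: str) -> str:
    """Return name unchanged if it is safe to interpolate into DDL, else raise."""
    # fullmatch (rather than match) also rejects a trailing newline,
    # which "$" alone would otherwise tolerate.
    if not _SCHEMA_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    return name


# Illustrative use: validate before building a statement via an f-string.
statement = f"SET search_path TO {validate_schema_name('graph_research')}"
```
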
Critical:
- #1: Add SECURITY comments to all DDL f-strings tying them to
  validate_schema_name() regex — future-proofs against regex loosening
- #2: Add POST /graphs/{slug}/retry-provision endpoint for graphs
  stuck in "error" status. Idempotent (CREATE SCHEMA IF NOT EXISTS +
  Alembic upgrade head). Also adds admin member if none exist.

High:
- #3: MCP now requires explicit graph:{slug} scopes for non-default
  graphs — tokens without graph scopes are denied (not silently allowed)
- #4: Document default graph policy: open reads, superuser-only writes
- #5: Use one-off engine with dispose() for write-db DDL during
  provisioning — no leaked connection pools

Medium:
- #8: Document connection budget math in sync worker slots comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- #3: Deny non-default graph access when get_access_token() returns None
  (SKIP_AUTH or missing auth context)
- #4: Use SELECT ... FOR UPDATE on admin members during role demotion
  and member removal to prevent concurrent last-admin race
- #5: Add _slug_to_id index to GraphSessionResolver for O(1) slug
  lookups instead of linear cache scan. Maintained in _build_and_cache
  and invalidate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ease

- #1: Lock admin members unconditionally before checking role — prevents
  race where two concurrent requests both see admin_count=2 before lock
- #5: Release control session before acquiring per-graph lock in
  resolve_by_slug to avoid holding pool slot during lock wait
- #7: require_writer now enforces superuser-only for default graph
  writes, matching the documented policy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- #2/#14: Fix MCP scope check — empty scopes = unrestricted access
  (graph_slugs=null tokens and SKIP_AUTH both work again). Only tokens
  with explicit graph:* scopes are restricted to those graphs.
- #3: Block reserved PG schema names (public, pg_catalog, pg_toast,
  pg_temp, information_schema, pg_*) in validate_schema_name()
- #4: Fix scalar_one() → scalar_one_or_none() in resolve_by_slug
  (introduced by session-release refactor in prior commit)
- #10: Sync raises RuntimeError instead of returning error dict when
  graph_resolver is unavailable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
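
The scope rule from #2/#14 above can be sketched as follows. `can_access_graph` is a hypothetical name; the sketch also assumes that a restricted token gets no special access to the default graph, which the commit message does not state either way.

```python
from typing import Iterable

GRAPH_SCOPE_PREFIX = "graph:"


def can_access_graph(scopes: Iterable[str], slug: str) -> bool:
    """Apply the rule: no graph scopes means unrestricted, otherwise allow-list."""
    allowed = {
        scope[len(GRAPH_SCOPE_PREFIX):]
        for scope in scopes
        if scope.startswith(GRAPH_SCOPE_PREFIX)
    }
    if not allowed:
        return True  # no graph:* scopes on the token: unrestricted
    return slug in allowed  # restricted to the listed graphs
```
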
Workers:
- Add resolve_sessions() helper to WorkerState for one-line graph_id
  resolution to (graph_sf, write_sf) tuple
- All input models now extend GraphAwareMixin (graph_id on every input)
- worker-bottomup: _open_sessions + _build_agent_context accept graph_id,
  all 9 call sites updated
- worker-ingest: _open_sessions + _build_agent_context accept graph_id
- worker-nodes: HatchetPipeline constructor accepts graph_id, resolves
  sessions in _open_sessions and _build_ctx
- worker-search: direct session factory calls resolve per graph_id
- worker-synthesis: synthesizer + super-synthesizer resolve graph_id
  to per-graph ReadGraphEngine + write session factories

Frontend:
- Switch 30 graph-scoped API methods from request() to graphRequest()
  (nodes, edges, facts, sources, seeds, conversations, syntheses, etc.)
- Non-graph-scoped methods (auth, config, members, usage) unchanged
- Graph picker now actually affects all data queries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical:
- Fix 38 remaining API methods still using request() instead of
  graphRequest() — node sub-resources (dimensions, facts, edges,
  history, convergence), conversations, seeds, edge candidates,
  syntheses, sources. All graph-scoped data now routes correctly.

High:
- Add graph_pool_size / graph_max_overflow settings (defaults 5/10)
  for schema-mode non-default graphs, replacing hardcoded values

Low:
- migrate.py path derivation now uses kt_db.__file__ (matches graphs.py)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
systemSettings, members, waitlist, and invites are global/control-plane
resources without graph-scoped backend routes. Using graphRequest()
would 404 on non-default graphs. Reverted 7 methods to request().

Graph updated_at confirmed working — onupdate=_utcnow on the ORM
column handles auto-update on every flush.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charlie83Gs charlie83Gs merged commit 8eecab7 into main Apr 6, 2026
18 checks passed
@charlie83Gs charlie83Gs deleted the worktree-feat+multigraph branch April 6, 2026 14:26
@github-actions

github-actions Bot commented Apr 6, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.
