feat(graphs): pick a database when creating a graph#171

Merged
charlie83Gs merged 4 commits into main from feat/multi-graph-db-picker on Apr 7, 2026

@charlie83Gs
Contributor

Summary

Drops the confusing "schema vs database" storage-mode toggle in the graph creation UI and replaces it with a single Database picker. The first option is always default (the system DB); other options come from a new GET /api/v1/graphs/database-connections endpoint that lists rows from the database_connections table.

Conceptually: every non-default graph is a schema in some database. The only choice the user makes is which database. Multiple graphs in the same DB are schema-isolated by convention.

Closes the gap reported on prod where the new graph UI had no database options to pick from.

What changes for users

Before — confusing dropdown:

Storage Mode: [ Schema (same DB, separate schema) | Database (different DB) ]

…and even when "Database" was selected, there was no way to pick which DB.

After — clear single dropdown:

Database: [ default | shared | ...other provisioned DBs ]

Backend

  • New GET /api/v1/graphs/database-connections (admin-only). Synthetic default entry is always first; rows from database_connections follow in name order.
  • Drop storage_mode from CreateGraphRequest. Server derives it from database_connection_config_key (None or "default" → schema mode in system DB; otherwise → database mode in the referenced external DB).
  • Fix _provision_graph to honor database_connection_id: resolves the graph/write/qdrant URLs from settings.graph_databases[config_key], creates schemas via one-off engines pointed at the correct DBs, runs alembic with DATABASE_URL / WRITE_DATABASE_URL env overrides (Pydantic Settings re-reads them in the subprocess), and provisions Qdrant collections against the per-graph URL when it differs.
  • New make_qdrant_client() factory in kt-qdrant for non-singleton clients pointed at arbitrary URLs.
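
A minimal sketch of the ordering the new endpoint is described as returning: a synthetic default entry first, then the database_connections rows in name order. ConnectionEntry and list_database_connections are hypothetical names used only for illustration, and the in-memory rows stand in for a real query; this is not the PR's actual handler.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_KEY = "default"  # reserved key for the system DB, per the description


@dataclass(frozen=True)
class ConnectionEntry:
    """Hypothetical stand-in for one dropdown option."""

    id: Optional[str]  # None for the synthetic default entry
    name: str
    config_key: str


def list_database_connections(rows: list[ConnectionEntry]) -> list[ConnectionEntry]:
    """Return the synthetic default entry first, then real rows in name order."""
    synthetic = ConnectionEntry(id=None, name=DEFAULT_KEY, config_key=DEFAULT_KEY)
    return [synthetic, *sorted(rows, key=lambda row: row.name)]


# Rows deliberately out of order to show the sort.
rows = [
    ConnectionEntry(id="2", name="shared", config_key="shared"),
    ConnectionEntry(id="1", name="analytics", config_key="analytics"),
]
ordered_keys = [entry.config_key for entry in list_database_connections(rows)]
```

Because the default entry is built in code rather than read from the table, it can never be missing from the dropdown, even when database_connections is empty.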

Frontend

  • New DatabaseConnectionResponse type and listDatabaseConnections() API helper.
  • frontend/src/app/graphs/page.tsx: Storage Mode <select> replaced with Database <select> populated from the new endpoint; default selection is "default".
  • frontend/src/app/graphs/[slug]/page.tsx: "Storage" field renamed to "Database", showing the connection name (looked up by database_connection_id), falling back to default / external.

Tests

  • services/api/tests/test_graph_schemas.py — drop the storage_mode-specific cases; add a test_default_connection_key_accepted for the magic string.
  • New services/api/tests/test_database_connections_endpoint.py — covers synthetic-default ordering, name-ordered rows, and string-UUID id field. All 3 tests pass locally against real Postgres.
  • kt-config (26), kt-qdrant (64), services/api graph schemas (15), and frontend (123) tests all green.

Out of scope (follow-ups)

  • Dropping the storage_mode column entirely (would need an alembic migration; column kept for now)
  • Per-graph runtime use of qdrant_url (the resolver in graph_sessions.py already plumbs it through; no consumer reads it yet)

Test plan

  • Merge & wait for CI to publish the new image
  • Reconcile flux on prod
  • Hit GET /api/v1/graphs/database-connections as admin → expect [{default}, {shared}]
  • In the UI: open Graphs → Create Graph → confirm dropdown shows default and Shared
  • Create a graph with slug prov_test against Shared
  • Verify schema lands in knowledge-tree-shared-graph-db (NOT in the system graph-db)
    kubectl exec -n knowledge-tree knowledge-tree-shared-graph-db-1 -- \
      psql -U postgres -d knowledge_tree_shared -c "\dn" | grep prov_test
  • Verify Qdrant collections exist on knowledge-tree-shared-qdrant
  • Verify the graph card shows DB: Shared instead of Separate DB / Shared DB

Risks

  • _provision_graph external-DB code path is exercised in prod for the first time. If alembic migrations fail against a fresh external DB it likely means search_path isn't being honored — fixable, may need a follow-up.
  • DDL through PgBouncer transaction mode for the write-db schema creation should work for a single CREATE SCHEMA. If it fails, plan B is to point provisioning at the underlying CNPG -rw service directly.

🤖 Generated with Claude Code

Replaces the confusing "schema vs database" storage-mode toggle with a
single Database picker. Schema is now the only isolation strategy: every
non-default graph gets its own schema in some database. The user's only
choice is *which* database. To get DB-level isolation, just don't put
another graph in the same connection.

Fixes the bug where selecting any database in the new graph UI silently
created the schema in the system DB instead of the chosen one.

Addresses PR #171 review feedback (#1, #2, #4, #5, #6, nits) — supersedes
that PR with a much smaller diff that builds on the upstream multi-graph
foundation that landed since.

Backend
- GET /api/v1/graphs/database-connections now prepends a synthetic
  ``default`` entry (id=null, config_key="default") so the system DB is
  selectable from the dropdown.
- DatabaseConnectionResponse: id and created_at are nullable to carry the
  synthetic entry.
- CreateGraphRequest: drop ``storage_mode`` field. Server hardcodes
  storage_mode="schema" (the column stays for backward compat with old
  rows but is no longer load-bearing at the API).
- create_graph handler: treat ``database_connection_config_key`` of None
  or "default" as the system DB; any other key resolves to a row.
- _provision_graph: previously hardcoded the system graph-db / write-db
  / Qdrant. Now resolves the target URLs from
  ``settings.graph_databases[config_key]`` based on
  ``graph.database_connection_id``, creates schemas via one-off engines
  pointed at the correct DBs, runs alembic with DATABASE_URL /
  WRITE_DATABASE_URL env overrides (Pydantic Settings re-reads them in
  the subprocess), and provisions Qdrant collections against the
  per-graph URL when it differs.
- graph_sessions.py: collapse the "schema vs database" branch in
  _build_and_cache; non-default graphs now route by
  ``database_connection_id IS NULL`` only.
- New ``make_qdrant_client(url, timeout)`` factory in kt-qdrant for
  non-singleton clients pointed at arbitrary URLs.
- GraphRepository.create_database_connection now rejects the reserved
  config_key "default" — without this, an admin could insert a row that
  silently shadows the synthetic entry.
- GraphDatabaseConfig: add a Pydantic field validator that normalizes
  ``postgresql://...`` to ``postgresql+asyncpg://...`` so plain URLs
  from EXTRA_DB_* env vars or YAML don't blow up create_async_engine.
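
The URL normalization in the last bullet can be sketched as a plain function. The commit describes it as a Pydantic field validator on GraphDatabaseConfig; the function name normalize_asyncpg_url below is hypothetical, and only the rewrite rule itself comes from the commit message.

```python
def normalize_asyncpg_url(url: str) -> str:
    """Rewrite a plain postgresql:// URL so it names the asyncpg driver."""
    plain_prefix = "postgresql://"
    if url.startswith(plain_prefix):
        return "postgresql+asyncpg://" + url[len(plain_prefix):]
    # Already carries a driver suffix (or is not a Postgres URL): leave as-is.
    return url


normalized = normalize_asyncpg_url("postgresql://kt@host:5432/db")
unchanged = normalize_asyncpg_url("postgresql+asyncpg://kt@host:5432/db")
```

Matching on the full `postgresql://` prefix keeps the rewrite idempotent: a URL that already names a driver passes through untouched.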

Frontend
- Drop the Storage Mode <select> entirely. Database <select> is always
  visible, populated from listDatabaseConnections() (which now includes
  the synthetic "default" first), default value "default".
- On submit, omit ``database_connection_config_key`` when "default" so
  the backend treats it as the system DB.
- Drop the legacy "Separate DB / Shared DB" badge fallback in the graph
  card and detail page; render ``g.database_connection_name ?? "default"``.
- Filter dropdown: rename the "schema mode" option to "default".

Tests
- libs/kt-config/tests/test_graph_databases.py — covers the new asyncpg
  URL validator.
- services/api/tests/test_graph_schemas.py — drop the storage_mode-
  specific cases; add test_default_connection_key_accepted.
- services/api/tests/integration/test_database_connections_endpoint.py —
  TestClient-based coverage of the synthetic-default ordering, real-row
  ordering, admin-only auth (403 for non-superuser), and the reserved
  "default" rejection in the repository.

Out of scope (follow-ups)
- Dropping the ``storage_mode`` column entirely (would need a migration).
- Per-graph runtime use of qdrant_url in worker code (the resolver in
  graph_sessions.py already plumbs it through; no consumer reads it yet).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charlie83Gs charlie83Gs force-pushed the feat/multi-graph-db-picker branch from 9b2e7a9 to 879573c April 7, 2026 02:03
@charlie83Gs
Contributor Author

Pushed a fresh take addressing all review feedback. Force-pushed because the previous branch had diverged from main and the rebase had multiple conflicts, so it was easier to start clean from the latest main.

Changes vs the original PR

  • Rebased onto main — discovered that v0.36.0/v0.37.0 already landed: _discover_extra_databases, qdrant_url on GraphDatabaseConfig, the database picker UI scaffolding, and database_connection_name on GraphResponse. So most of the original diff was duplicating upstream work.
  • Conceptual simplification per follow-up feedback: drop the "schema vs database" toggle entirely. Schema is now the only isolation strategy. The user just picks a database — to get DB-level isolation, just don't put another graph in the same connection.
  • graph_sessions.py: collapsed the storage_mode == "database" branch in _build_and_cache. Non-default graphs now route by database_connection_id IS NULL only.

Review feedback addressed

  1. Reserved default config_key: GraphRepository.create_database_connection now raises ValueError if you try to insert a row with config_key="default". Covered by test_create_database_connection_rejects_default_key.
  2. asyncpg URL normalization: added a field_validator on GraphDatabaseConfig that auto-rewrites postgresql:// to postgresql+asyncpg://. Covered by libs/kt-config/tests/test_graph_databases.py::TestGraphDatabaseConfigValidator.
  3. storage_mode grep: confirmed nothing else branches on it at runtime: graph_sessions.py:200 was the only consumer and it's been refactored away. The column stays for backward-compat with old rows but is no longer load-bearing.
  4. Admin auth coverage: services/api/tests/integration/test_database_connections_endpoint.py::test_blocked_for_non_admin uses httpx.AsyncClient + dependency_overrides[require_auth] to assert non-superusers get 403 from the route.
  5. make_qdrant_client cosmetic: only calls get_settings() when timeout is None.
  6. (skipped per request — no CI smoke test for the external-DB provisioning path)

Nits

  • from kt_config.settings import get_settings moved to module top in graphs.py.
  • 🟡 Did not extract a dbLabel(g) helper in the frontend — the badge is now a one-line nullish coalesce: g.database_connection_name ?? "default". No ternary chain to extract.

Test plan

  • kt-config (34), kt-qdrant (64), kt-db graph models (8), api graph schemas (15), api permissions (14), api database-connections endpoint (4) — all green locally.
  • Frontend lint clean; 123 frontend tests passing.
  • Wait for CI build, deploy, then exercise create-graph-against-shared end-to-end on prod.

- Fix Backend Lint failure: hoist DatabaseConnection/delete imports out of
  the cleanup_connections fixture body so ruff I001 (import-block sort)
  passes (services/api/tests/integration/test_database_connections_endpoint.py).
- Provisioning engines: tag them with application_name="kt-provision-{slug}"
  via connect_args.server_settings, and use poolclass=NullPool since these
  are single-shot connections. Connections are now visible in
  pg_stat_activity and don't sit in a pool waiting for nothing.
- Frontend create form: disable the submit button when dbError is set so a
  user whose database-connections fetch failed can't silently fall through
  to the system DB by accident.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charlie83Gs
Contributor Author

Round 2 fixes pushed (e07b6ca):

Lint failure

Fixed: moved the DatabaseConnection and delete imports out of the cleanup_connections fixture body to module level so ruff I001 passes. Confirmed locally with "uv run --frozen ruff check .".

Review feedback addressed

  1. application_name on provisioning engines — both one-off engines now pass connect_args={"server_settings": {"application_name": f"kt-provision-{graph.slug}"}}. They show up in pg_stat_activity tagged.
  2. NullPool on one-off engines — added poolclass=NullPool so we don't keep an idle pool around for a single CREATE SCHEMA.
  3. (acknowledged — DatabaseConnectionResponse duplication is pre-existing)
  4. Follow-up issue opened: #172, "Drop graphs.storage_mode column (alembic migration)", to remove the column with a proper alembic migration.
  5. (FYI — test path is correct)
  6. Disable submit on dbError: <Button disabled={creating || dbError !== null}> so a failed db-connections fetch can't silently fall through to the system DB.
  7. (verified) — checked libs/kt-db/alembic/env.py: it honors ALEMBIC_SCHEMA by setting version_table_schema and running SET search_path TO {schema}, public in do_run_migrations. Combined with the DATABASE_URL / WRITE_DATABASE_URL env override in the subprocess, the alembic-against-external-DB path should work end-to-end. Will still verify with the manual smoke test in the test plan after merge.

Test status

  • ruff check . clean
  • test_graph_schemas.py (15) ✓
  • test_database_connections_endpoint.py (4) ✓
  • test_permissions.py (14) ✓
  • Frontend lint clean, 123 frontend tests ✓

Picks up the actionable items from the round-3 review.

#1 Orphan-schema risk — defended in docstring + recovery path documented
   The Graph row is committed in "provisioning" status BEFORE _provision_graph
   runs (graphs.py:313). All provisioning steps are idempotent
   (CREATE SCHEMA IF NOT EXISTS, alembic upgrade head, ensure_collection),
   so a partial failure marks the row "error" and /retry-provision is the
   recovery path. Added a "Failure recovery" section to the _provision_graph
   docstring explaining why we deliberately don't drop schemas on failure.

#3 Fragile qdrant_url comparison — strip trailing slashes when comparing
   so http://h:6333 and http://h:6333/ are treated as the same target and
   we don't needlessly spawn+close a fresh Qdrant client.

#4 "default" magic string — extracted DEFAULT_DB_CONFIG_KEY constant in
   kt_config.settings and used everywhere (API handler, repo guard,
   list endpoint, startup check). Added a startup check in
   kt_api.main._assert_default_db_key_unreserved() that logs an error
   if a real database_connections row holds the reserved key — catches
   anything that may have slipped in via raw SQL or a previous version
   that lacked the repo guard.

#6 Test coverage gap — added two new test files:
   - services/api/tests/test_provision_graph_routing.py: 3 tests that mock
     create_async_engine, subprocess.run, and the Qdrant client factories
     to assert _provision_graph routes to the EXTERNAL DB URLs when
     database_connection_id is set, routes to the SYSTEM DB URLs when it's
     null, and raises a clear error when the config_key is missing from
     settings.graph_databases.
   - test_database_connections_endpoint.py: new
     test_create_graph_with_default_key_uses_system_db that POSTs a graph
     with database_connection_config_key="default" and asserts the
     resulting row has database_connection_id=None. Added a stub_users_in_db
     fixture that inserts the test users into the User table to satisfy
     the FK constraint on graphs.created_by.

#7 Frontend grep — confirmed nothing references storage_mode on
   CreateGraphRequest in frontend/. The only remaining reference is on
   GraphResponse, which is the read-side type (kept for backward compat).

Skipped per reviewer note: #2 (alembic env override — reviewer self-resolved),
#5 (cosmetic _admin rename).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charlie83Gs
Contributor Author

Round 3 fixes pushed (7b21f76):

Items addressed

#1 — Orphan-schema risk (defended + documented)
After re-reading the actual control flow, I think the original concern was based on a misread of the order. The Graph row commits at graphs.py:313 before _provision_graph runs at graphs.py:317, so there's no "orphan row" possibility. There IS a possible "orphan schema in the external DB after a partial provisioning failure" scenario, but every step is idempotent (CREATE SCHEMA IF NOT EXISTS, alembic upgrade head, ensure_collection), so a failure marks the row "error" and POST /api/v1/graphs/{slug}/retry-provision is the safe recovery path. I added a "Failure recovery" section to the _provision_graph docstring explaining this so future readers don't have to re-derive it. Open to adding cleanup-on-failure if you still think it's worth the complexity, but the cleanup itself can fail and would muddy the error path.
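
The recovery argument above can be sketched with a toy model: every step is safe to repeat, a failure only flips the status to "error", and the retry simply runs the same steps again. The provision function, the dict-based graph row, and the "active" success status are assumptions for illustration; the PR only names the "provisioning" and "error" states.

```python
from typing import Callable


def provision(graph: dict, steps: list[Callable[[], None]]) -> str:
    """Run idempotent steps in order; on failure record "error" and stop."""
    graph["status"] = "provisioning"  # the row already exists in this state
    graph.pop("error", None)  # clear the message from a previous attempt
    try:
        for step in steps:
            step()
    except Exception as exc:
        # No cleanup here: whatever was created is left in place for the retry.
        graph["status"] = "error"
        graph["error"] = str(exc)
    else:
        graph["status"] = "active"  # assumed name for the success status
    return graph["status"]


schemas: set[str] = set()
attempts = {"migrate": 0}


def create_schema() -> None:
    schemas.add("prov_test")  # like CREATE SCHEMA IF NOT EXISTS: safe to repeat


def migrate() -> None:
    attempts["migrate"] += 1
    if attempts["migrate"] == 1:
        raise RuntimeError("simulated migration failure")


graph = {"slug": "prov_test"}
first = provision(graph, [create_schema, migrate])
second = provision(graph, [create_schema, migrate])  # the retry-provision path
```

The schema created during the failed first attempt is reused by the second one, which is why dropping it on failure would add risk without adding safety.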

#3 — Fragile qdrant_url comparison
Now strips trailing slashes before comparing: qdrant_url.rstrip("/") != settings.qdrant_url.rstrip("/"). So http://h:6333 and http://h:6333/ no longer needlessly spawn a fresh client.
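
As a small illustration of that comparison, the helper below wraps the same rstrip("/") check. The name same_qdrant_target is hypothetical; the expression inside it mirrors the one quoted above.

```python
def same_qdrant_target(per_graph_url: str, system_url: str) -> bool:
    """Treat URLs that differ only by trailing slashes as the same target."""
    return per_graph_url.rstrip("/") == system_url.rstrip("/")


with_slash = same_qdrant_target("http://h:6333/", "http://h:6333")
other_host = same_qdrant_target("http://other:6333", "http://h:6333")
```

This only handles trailing slashes. Differences in host casing or an explicit default port would still compare as different targets.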

#4 — "default" magic string

  • Extracted DEFAULT_DB_CONFIG_KEY = "default" in libs/kt-config/src/kt_config/settings.py with a docstring explaining the contract.
  • All four sites now import and use it: API handler (graphs.py:288), synthetic list entry (graphs.py:355), repo guard (repositories/graphs.py), and the startup check (kt_api.main).
  • Added a startup check in kt_api.main._assert_default_db_key_unreserved() that logs an error at API startup if any real database_connections row holds the reserved key. Catches anything that may have slipped in via raw SQL or a previous version that lacked the repo guard.
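
A rough sketch of how one shared constant can back the handler, the repo guard, and the startup check. Only DEFAULT_DB_CONFIG_KEY and its value come from the PR; the three function names are hypothetical, and the real startup check logs an error rather than returning a flag.

```python
from typing import Iterable, Optional

DEFAULT_DB_CONFIG_KEY = "default"  # reserved: always means the system DB


def resolve_config_key(key: Optional[str]) -> Optional[str]:
    """Handler side: None or the reserved key both mean the system DB."""
    if key is None or key == DEFAULT_DB_CONFIG_KEY:
        return None
    return key


def check_new_connection_key(config_key: str) -> str:
    """Repo guard: refuse a row that would shadow the synthetic entry."""
    if config_key == DEFAULT_DB_CONFIG_KEY:
        raise ValueError(f"config_key {DEFAULT_DB_CONFIG_KEY!r} is reserved")
    return config_key


def reserved_key_in_use(existing_keys: Iterable[str]) -> bool:
    """Startup check: detect a row inserted before the guard existed."""
    return DEFAULT_DB_CONFIG_KEY in set(existing_keys)


try:
    check_new_connection_key("default")
    guard_rejected = False
except ValueError:
    guard_rejected = True
```

With all three checks reading the same constant, changing the reserved key later is a one-line edit instead of a grep.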

#6 — Test coverage gap
Two new test files:

  • services/api/tests/test_provision_graph_routing.py: 3 unit tests that mock create_async_engine, subprocess.run, and the Qdrant client factories to assert that _provision_graph:
    • routes to the external DB URLs when database_connection_id is set,
    • routes to the system DB URLs when it's NULL,
    • raises a clear RuntimeError if the config_key is missing from Settings.graph_databases.
  • test_database_connections_endpoint.py::test_create_graph_with_default_key_uses_system_db — POSTs a graph with database_connection_config_key="default" and asserts the resulting row has database_connection_id=None. Added a stub_users_in_db fixture to insert the test users into the User table so the graphs.created_by FK doesn't trip.
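
The three routing cases those tests assert can be sketched as a lookup function. TargetUrls, its field names, and resolve_targets are hypothetical; the real _provision_graph goes from database_connection_id to a row and then to settings.graph_databases, which this sketch collapses into a single optional config key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetUrls:
    """Hypothetical bundle of the three URLs provisioning needs."""

    graph_url: str
    write_url: str
    qdrant_url: str


def resolve_targets(
    config_key: Optional[str],
    system: TargetUrls,
    graph_databases: dict[str, TargetUrls],
) -> TargetUrls:
    """Pick system URLs for None, external URLs for a known key, else fail."""
    if config_key is None:
        return system
    if config_key not in graph_databases:
        raise RuntimeError(
            f"config_key {config_key!r} is not present in settings.graph_databases"
        )
    return graph_databases[config_key]


system = TargetUrls("pg://system/graph", "pg://system/write", "http://qdrant:6333")
shared = TargetUrls("pg://shared/graph", "pg://shared/write", "http://shared-qdrant:6333")
databases = {"shared": shared}

try:
    resolve_targets("missing", system, databases)
    missing_error = ""
except RuntimeError as exc:
    missing_error = str(exc)
```

Failing loudly on an unknown key matters here because the bug this PR fixes was a silent fallback to the system DB.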

#7 — Frontend storage_mode references
grep -r "storage_mode" frontend/ returns one hit: GraphResponse.storage_mode in types/index.ts. That's the read-side type (kept for backward compat with old graph rows). Nothing references CreateGraphRequest.storage_mode.

Skipped per reviewer note

  • #2 (alembic env override; reviewer self-resolved)
  • #5 (cosmetic _admin rename)

Test results

  • ruff check . clean (auto-fixed I001 import-order issues from the new code)
  • 94 api tests pass, including the 3 new _provision_graph routing tests and the new test_create_graph_with_default_key_uses_system_db
  • 34 kt-config tests pass
  • 8 kt-db graph models tests pass

CI runs ``ruff format --check`` and the new test file had a long
assertion line that needed reformatting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charlie83Gs charlie83Gs merged commit 6c8543c into main Apr 7, 2026
18 checks passed
@charlie83Gs charlie83Gs deleted the feat/multi-graph-db-picker branch April 7, 2026 02:28
@github-actions

github-actions Bot commented Apr 7, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

charlie83Gs added a commit that referenced this pull request Apr 7, 2026
When the create-graph handler runs alembic against an external DB whose
CNPG-generated password contains characters URL-encoded as ``%XX`` (e.g.
``+`` → ``%2B``), ``alembic/env.py`` crashes:

    ValueError: invalid interpolation syntax in
    'postgresql+asyncpg://kt:%2BzYZK...@host:5432/db' at position 24

Python's ``configparser`` interprets ``%(...)s`` as interpolation tokens.
``set_main_option`` validates the value at write time and rejects any
``%`` that isn't part of a valid interpolation reference. The fix is to
escape ``%`` to ``%%`` before calling ``set_main_option`` — the
unescape happens automatically at ``.get()`` time, so SQLAlchemy still
receives the original URL.
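
The behaviour described above can be reproduced with the standard-library parser alone. This sketch uses configparser.ConfigParser directly as a stand-in for the parser behind ``set_main_option``; the section name, option name, and credentials are made up for illustration.

```python
import configparser

# Made-up URL whose password contains a URL-encoded "+" (%2B).
url = "postgresql+asyncpg://kt:%2Bsecret@host:5432/db"

parser = configparser.ConfigParser()
parser.add_section("alembic")

# Writing the raw URL fails: "%2B" is not a valid interpolation reference.
try:
    parser.set("alembic", "sqlalchemy.url", url)
    raw_rejected = False
except ValueError:
    raw_rejected = True

# Escaping "%" as "%%" passes validation at write time...
parser.set("alembic", "sqlalchemy.url", url.replace("%", "%%"))

# ...and get() undoes the escape, so the caller sees the original URL.
round_tripped = parser.get("alembic", "sqlalchemy.url")
```

The escape has to happen on the way in only; reading the value back with ``get()`` needs no extra handling.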

Same fix applied to ``alembic_write/env.py`` for write-db.

Reproducer + fix verified by ``test_alembic_env_url_escape.py``.

Discovered while running PR #171's _provision_graph against the prod
shared graph-db: the password starts with ``+`` and broke alembic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>