
fix(ci): reset version to 0.1.0 and fix Docker builds#6

Merged
charlie83Gs merged 1 commit into main from fix/release-and-docker-builds on Mar 23, 2026

Conversation

@charlie83Gs
Contributor

Summary

Fixes three issues from the initial release:

1. Version reset: v1.0.0 → v0.1.0

Semantic-release treated the initial release as a major bump (1.0.0) despite major_on_zero = false. This happened because there were no prior tags, so it defaulted to a major release.
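For reference, the relevant knob lives in pyproject.toml. This is an illustrative excerpt (exact keys depend on the semantic-release tooling in use, here assumed to be python-semantic-release):

```toml
[tool.semantic_release]
version_toml = ["pyproject.toml:project.version"]
major_on_zero = false   # stay on 0.x until an explicit 1.0 decision
```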

  • Reset pyproject.toml version to 0.1.0
  • Deleted the v1.0.0 release and tag
  • Reset CHANGELOG.md

After merge, a v0.1.0 tag will be created manually as the baseline for future semantic-release runs.

2. Docker build failures

All 9 backend service builds failed with:

error: Failed to determine installation plan
  Caused by: Distribution not found at: file:///app/services/worker-conversations

Each Dockerfile copies only libs/ and its own service, but uv sync --frozen tries to resolve ALL workspace members from uv.lock.

Fix: Use --package <name> flag so uv only installs the target package and its dependencies:

RUN uv sync --frozen --no-dev --package kt-api

3. Build matrix fail-fast disabled

Added fail-fast: false so one service's build failure doesn't cancel all other builds.
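An illustrative GitHub Actions snippet (job name and matrix entries are placeholders; service names are taken from this PR):

```yaml
jobs:
  build:
    strategy:
      fail-fast: false   # let sibling builds finish even if one fails
      matrix:
        service: [kt-api, worker-conversations]
```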

Test plan

  • CI passes (lint + tests)
  • After merge, manually create v0.1.0 tag to trigger build pipeline
  • All 11 Docker images build and push successfully

🤖 Generated with Claude Code

- Reset version from 1.0.0 back to 0.1.0 (semantic-release incorrectly
  treated initial release as major)
- Fix all 9 service Dockerfiles: use `uv sync --frozen --no-dev --package <name>`
  so uv only resolves the target package, not all workspace members
- Add fail-fast: false to build matrix so one failure doesn't cancel all
- Reset CHANGELOG.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charlie83Gs charlie83Gs merged commit 3c2d838 into main Mar 23, 2026
3 checks passed
@charlie83Gs charlie83Gs deleted the fix/release-and-docker-builds branch March 23, 2026 16:15
charlie83Gs added a commit that referenced this pull request Mar 27, 2026
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 5, 2026
Critical:
- #1: Validate schema names with strict ^[a-z0-9_]+$ regex before DDL
- #2: Escape ILIKE special chars (%, _, \) in graph_nodes search
- #3: Replace cached Graph ORM instances with frozen GraphInfo dataclass
  to prevent DetachedInstanceError
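The escaping in #2 can be sketched as follows (function name and escape character are illustrative, not the project's actual helper):

```python
def escape_ilike(term: str, escape_char: str = "\\") -> str:
    """Escape LIKE/ILIKE special characters so user input matches literally.

    The escape character itself must be escaped first, otherwise the
    escapes inserted for % and _ would be doubled up afterwards.
    """
    return (
        term.replace(escape_char, escape_char * 2)
            .replace("%", escape_char + "%")
            .replace("_", escape_char + "_")
    )

# The escaped term is then passed as a bound parameter, e.g. with
# SQLAlchemy: Node.name.ilike(f"%{escape_ilike(q)}%", escape="\\")
```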

High:
- #4: Reuse system session factories for default graph (no duplicate pools)
  via default_graph_session_factory/default_write_session_factory params
- #5: Add 23 unit tests — GraphInfo, GraphSessions, GraphSessionResolver,
  slug/schema validation, CreateGraphRequest, role validation
- #6: Scope sync watermarks by graph_slug — SyncEngine now passes
  graph_slug to _get_watermark/_set_watermark, composite PK on
  (table_name, graph_slug)

Medium:
- #7: Replace N+1 member count queries with batch GROUP BY
- #8: Replace catch { // ignore } with console.error in frontend
- #9: Engine pool disposal on GraphSessionResolver.invalidate()
- #10: Run Alembic migrations during graph provisioning
- #11: (node_count in list deferred — requires cross-schema queries)
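The batch query in #7 replaces one COUNT(*) per graph with a single GROUP BY. A minimal sketch using sqlite3 for illustration (the project uses Postgres; table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE graph_members (graph_id INTEGER, user_id INTEGER);
    INSERT INTO graph_members VALUES (1, 10), (1, 11), (2, 12);
""")

# One query for all graphs instead of N separate COUNT(*) queries.
rows = conn.execute(
    """
    SELECT graph_id, COUNT(*) AS member_count
    FROM graph_members
    WHERE graph_id IN (?, ?)
    GROUP BY graph_id
    """,
    (1, 2),
).fetchall()
counts = dict(rows)
```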

Low:
- #13: Replace "Cycle Role" button with role dropdown
- #14: require_writer/require_graph_admin kept for future endpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 5, 2026
Critical:
- #1: Remove dead quote_ident call — regex is the sole injection guard
- #2: Add ^[a-z0-9_]+$ validation for ALEMBIC_SCHEMA in both env.py files
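The validation in #1/#2 amounts to a strict allowlist check before any identifier is interpolated into DDL. A minimal sketch (function name hypothetical):

```python
import re

_SCHEMA_NAME_RE = re.compile(r"^[a-z0-9_]+$")

def validate_schema_name(name: str) -> str:
    """Reject anything that could smuggle SQL into a CREATE SCHEMA statement."""
    if not _SCHEMA_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    return name
```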

High:
- #3: Derive kt_db_root from kt_db package location instead of fragile parents[5]
- #4: Document MCP omits default_write_session_factory intentionally (read-only)
- #5: GraphContext now uses GraphInfo (frozen dataclass) instead of ORM Graph

Medium:
- #6: Replace user._token_graph_slugs monkey-patching with request.state
- #7: Fix remaining catch { // ignore } in graphs/page.tsx
- #9: Document MCP graph access check limitation, planned for follow-up

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 7, 2026
Replaces the confusing "schema vs database" storage-mode toggle with a
single Database picker. Schema is now the only isolation strategy: every
non-default graph gets its own schema in some database. The user's only
choice is *which* database. To get DB-level isolation, just don't put
another graph in the same connection.

Fixes the bug where selecting any database in the new graph UI silently
created the schema in the system DB instead of the chosen one.

Addresses PR #171 review feedback (#1, #2, #4, #5, #6, nits) — supersedes
that PR with a much smaller diff that builds on the upstream multi-graph
foundation that landed since.

Backend
- GET /api/v1/graphs/database-connections now prepends a synthetic
  ``default`` entry (id=null, config_key="default") so the system DB is
  selectable from the dropdown.
- DatabaseConnectionResponse: id and created_at are nullable to carry the
  synthetic entry.
- CreateGraphRequest: drop ``storage_mode`` field. Server hardcodes
  storage_mode="schema" (the column stays for backward compat with old
  rows but is no longer load-bearing at the API).
- create_graph handler: treat ``database_connection_config_key`` of None
  or "default" as the system DB; any other key resolves to a row.
- _provision_graph: previously hardcoded the system graph-db / write-db
  / Qdrant. Now resolves the target URLs from
  ``settings.graph_databases[config_key]`` based on
  ``graph.database_connection_id``, creates schemas via one-off engines
  pointed at the correct DBs, runs alembic with DATABASE_URL /
  WRITE_DATABASE_URL env overrides (Pydantic Settings re-reads them in
  the subprocess), and provisions Qdrant collections against the
  per-graph URL when it differs.
- graph_sessions.py: collapse the "schema vs database" branch in
  _build_and_cache; non-default graphs now route by
  ``database_connection_id IS NULL`` only.
- New ``make_qdrant_client(url, timeout)`` factory in kt-qdrant for
  non-singleton clients pointed at arbitrary URLs.
- GraphRepository.create_database_connection now rejects the reserved
  config_key "default" — without this, an admin could insert a row that
  silently shadows the synthetic entry.
- GraphDatabaseConfig: add a Pydantic field validator that normalizes
  ``postgresql://...`` to ``postgresql+asyncpg://...`` so plain URLs
  from EXTRA_DB_* env vars or YAML don't blow up create_async_engine.
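The core of the URL normalization in the last bullet, extracted from its Pydantic validator context for brevity (function name hypothetical):

```python
def normalize_pg_url(url: str) -> str:
    """Rewrite plain postgresql:// URLs to the asyncpg driver form
    expected by create_async_engine; leave other schemes untouched."""
    prefix = "postgresql://"
    if url.startswith(prefix):
        return "postgresql+asyncpg://" + url[len(prefix):]
    return url
```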

Frontend
- Drop the Storage Mode <select> entirely. Database <select> is always
  visible, populated from listDatabaseConnections() (which now includes
  the synthetic "default" first), default value "default".
- On submit, omit ``database_connection_config_key`` when "default" so
  the backend treats it as the system DB.
- Drop the legacy "Separate DB / Shared DB" badge fallback in the graph
  card and detail page; render ``g.database_connection_name ?? "default"``.
- Filter dropdown: rename the "schema mode" option to "default".

Tests
- libs/kt-config/tests/test_graph_databases.py — covers the new asyncpg
  URL validator.
- services/api/tests/test_graph_schemas.py — drop the storage_mode-
  specific cases; add test_default_connection_key_accepted.
- services/api/tests/integration/test_database_connections_endpoint.py —
  TestClient-based coverage of the synthetic-default ordering, real-row
  ordering, admin-only auth (403 for non-superuser), and the reserved
  "default" rejection in the repository.

Out of scope (follow-ups)
- Dropping the ``storage_mode`` column entirely (would need a migration).
- Per-graph runtime use of qdrant_url in worker code (the resolver in
  graph_sessions.py already plumbs it through; no consumer reads it yet).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 7, 2026
Picks up the actionable items from the round-3 review.

#1 Orphan-schema risk — defended in docstring + recovery path documented
   The Graph row is committed in "provisioning" status BEFORE _provision_graph
   runs (graphs.py:313). All provisioning steps are idempotent
   (CREATE SCHEMA IF NOT EXISTS, alembic upgrade head, ensure_collection),
   so a partial failure marks the row "error" and /retry-provision is the
   recovery path. Added a "Failure recovery" section to the _provision_graph
   docstring explaining why we deliberately don't drop schemas on failure.

#3 Fragile qdrant_url comparison — strip trailing slashes when comparing
   so http://h:6333 and http://h:6333/ are treated as the same target and
   we don't needlessly spawn+close a fresh Qdrant client.
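The comparison in #3 reduces to stripping trailing slashes on both sides (helper name illustrative):

```python
def same_qdrant_target(a: str, b: str) -> bool:
    """Compare Qdrant URLs ignoring trailing slashes, so
    http://h:6333 and http://h:6333/ count as the same target."""
    return a.rstrip("/") == b.rstrip("/")
```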

#4 "default" magic string — extracted DEFAULT_DB_CONFIG_KEY constant in
   kt_config.settings and used everywhere (API handler, repo guard,
   list endpoint, startup check). Added a startup check in
   kt_api.main._assert_default_db_key_unreserved() that logs an error
   if a real database_connections row holds the reserved key — catches
   anything that may have slipped in via raw SQL or a previous version
   that lacked the repo guard.

#6 Test coverage gap — added two new test files:
   - services/api/tests/test_provision_graph_routing.py: 3 tests that mock
     create_async_engine, subprocess.run, and the Qdrant client factories
     to assert _provision_graph routes to the EXTERNAL DB URLs when
     database_connection_id is set, routes to the SYSTEM DB URLs when it's
     null, and raises a clear error when the config_key is missing from
     settings.graph_databases.
   - test_database_connections_endpoint.py: new
     test_create_graph_with_default_key_uses_system_db that POSTs a graph
     with database_connection_config_key="default" and asserts the
     resulting row has database_connection_id=None. Added a stub_users_in_db
     fixture that inserts the test users into the User table to satisfy
     the FK constraint on graphs.created_by.

#7 Frontend grep — confirmed nothing references storage_mode on
   CreateGraphRequest in frontend/. The only remaining reference is on
   GraphResponse, which is the read-side type (kept for backward compat).

Skipped per reviewer note: #2 (alembic env override — reviewer self-resolved),
#5 (cosmetic _admin rename).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 8, 2026
PR4 review (#192) flagged five substantive correctness gaps. All
addressed; full test suite still green (kt-graph 95 / kt-hatchet 40).

## Correctness fixes

1. **`_match_or_create_node` no longer over-reports `created` and no
   longer trusts the remote `node_uuid`** (review #1).
   - The local uuid is now derived from `key_to_uuid(make_node_key(...))`
     so it never collides with the existing unique index on
     `write_nodes.node_uuid` when the same concept already exists locally
     under a different historical id.
   - The insert uses `RETURNING node_uuid` to distinguish a real insert
     from an ON CONFLICT no-op. On no-op the bridge re-SELECTs the
     existing local row's `node_uuid` and reports `created=False` so
     `ImportResult.nodes_matched` increments correctly.

2. **`_load_linked_nodes` array-overlap query now has integration
   coverage** (review #2). New test file
   `tests/integration/test_public_bridge_db.py` spins up a real write-db
   schema and exercises the full SQL surface that the unit tests can't:
   - The `write_nodes.fact_ids && ARRAY[...]` overlap operator —
     including the concept/entity type filter (perspective rows must NOT
     match) and an empty-input short-circuit.
   - `_load_linked_facts` / `_load_linked_fact_sources` provenance joins.
   - `_upsert_raw_source` returning the correct local id on both insert
     and ON CONFLICT branches.
   - `_match_or_create_node` reuse-vs-create branches without Qdrant.
   - `_upsert_fact_source` idempotency under re-imports.

3. **`_upsert_raw_source` now returns the local id, not the remote one**
   (review #3). Both call sites updated:
   - `import_cached_source` records `result.raw_source_id = local_raw_id`
     so PR5's workflow code attaches downstream rows to the right id.
   - `contribute_source_and_facts` discards the return — fact_source
     rows there are keyed on `content_hash`, not the source id.

4. **`make_worker_engine` now refuses to wire a bridge for a non-default
   graph that's missing its Qdrant collection prefix** (review #4).
   Empty prefix would silently dedup against the default graph's
   collection — exactly the cross-contamination this whole subsystem
   exists to prevent. Fail loud at construction rather than discover
   it in production.

5. **`_upsert_fact_source` now uses a deterministic UUID5** keyed on
   `(local_fact_id, raw_source_content_hash)` instead of a fresh
   `uuid.uuid4()` (review #8). Re-imports of the same source into the
   same target graph become true no-ops without needing a schema-level
   unique constraint. PR5's workflow should still avoid re-imports,
   this is defence in depth.
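The deterministic id in #5 can be sketched with the stdlib (the namespace constant and function name here are hypothetical; the real code would pin its own namespace):

```python
import uuid

_FACT_SOURCE_NS = uuid.uuid5(uuid.NAMESPACE_URL, "kt:fact-source")

def fact_source_id(local_fact_id: str, content_hash: str) -> uuid.UUID:
    """Deterministic id: re-importing the same (fact, source) pair
    yields the same UUID, so the upsert becomes a true no-op."""
    return uuid.uuid5(_FACT_SOURCE_NS, f"{local_fact_id}:{content_hash}")
```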

## Smaller things

- **lifespan.py**: extracted `_resolve_default_graph_id()` helper —
  the duplicated try/except block in `worker_lifespan()` and
  `build_worker_state()` collapses to one call (review #5).
- **test_public_bridge.py**: comment on the staleness assertion now
  reflects the actual fixture date (`2023-01-01`, not `2026-01-01`)
  (review #6).
- **CLAUDE.md**: noted the one-way `kt-hatchet → kt-graph` workspace dep
  added in PR4 so future contributors don't reverse it (review #9).

## Test plan

- [x] kt-graph: 95 passed (13 unit + 9 new integration on
      `test_public_bridge_db.py`, plus the existing 73)
- [x] kt-hatchet: 40 passed
- [ ] CI all green
charlie83Gs added a commit that referenced this pull request Apr 21, 2026
Non-blocking review items on the sync wiring PR:

#4 Replace `assert worker_state.services is not None` with an
   explicit `if ... raise RuntimeError` so the invariant holds
   under `python -O` (which strips asserts) and we never dispatch
   against a silently-None services container.
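The shape of the #4 change, with illustrative names (under `python -O`, `assert worker_state.services is not None` is stripped entirely; the explicit raise survives optimization):

```python
def require_services(worker_state):
    """Fail loudly if the worker's services container was never set,
    instead of dispatching against None."""
    if worker_state.services is None:
        raise RuntimeError("worker services container is not initialized")
    return worker_state.services
```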

#5 Dedup the 'no plugin contributes this id' WARN from
   `resolve_sync_provider` per `(graph_id, provider_id)` pair.
   Sync dispatch is a cron that runs once per graph every minute;
   without dedup a rolling deployment fills the log with N×60
   duplicate lines per hour per missing plugin. First occurrence
   still fires at WARNING; subsequent occurrences drop to DEBUG so
   operators can still see them under a verbose logger but don't
   get paged on spam. Set stays bounded by (graph_id, provider_id)
   cardinality (handful × handful) — no eviction needed; a worker
   restart re-warns once, which is the signal operators want.
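The dedup in #5 is essentially a warn-once set keyed on the pair. A minimal sketch (names illustrative; returning the level used is for testability, not part of the real API):

```python
import logging

logger = logging.getLogger(__name__)
_warned: set[tuple[str, str]] = set()  # bounded by (graph, provider) cardinality

def warn_missing_provider(graph_id: str, provider_id: str) -> int:
    """First occurrence per (graph_id, provider_id) logs at WARNING;
    repeats drop to DEBUG so a once-a-minute cron can't spam the log."""
    key = (graph_id, provider_id)
    level = logging.DEBUG if key in _warned else logging.WARNING
    _warned.add(key)
    logger.log(level, "no plugin contributes sync provider %s for graph %s",
               provider_id, graph_id)
    return level
```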

#6 Surface `SyncResult.failures` in the task log line + emitted
   event. Legacy path doesn't have a failures counter (engine's
   dead-letter inserts are logged inside the engine), so the field
   is 0 on the legacy branch and populated from the provider result
   on the registry branch. No behaviour change on legacy; parity
   for the provider contract.

#7 Move the `from kt_worker_sync.sync_engine import SyncEngine`
   import inside the legacy-path branch — the provider-driven path
   never needs it, so we skip the import on workers whose plugin is
   registered. Micro-optimisation; clearer at the call site too.

Skipped:
- #1 PR body overstates scope: will amend body on GitHub instead.
- #2 ABC lifecycle mismatch (init once vs per-task): flagged for a
  Phase-5 follow-up. Cache-on-WorkerState design needs more thought
  than fits in a review pass.
- #3 options-dict workaround: real design tension — per-graph
  session factories don't fit the `initialize(services)` once-per-
  worker contract. Phase-5 follow-up.

Tests: 15 composition helper tests (2 new dedup tests — one
asserting repeated resolves produce a single WARN, one asserting
distinct keys warn independently), 18 worker-sync unit tests.