
fix(ci): reset version to 0.1.0 and fix Docker builds#6

Merged
charlie83Gs merged 1 commit into main from fix/release-and-docker-builds on Mar 23, 2026

Conversation

@charlie83Gs
Contributor

Summary

Fixes three issues from the initial release:

1. Version reset: v1.0.0 → v0.1.0

Semantic-release treated the initial release as a major bump (1.0.0) despite major_on_zero = false. This happened because there were no prior tags, so it defaulted to a major release.
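For reference, the relevant knob lives in pyproject.toml. This is an illustrative excerpt (exact keys depend on the semantic-release tooling in use, here assumed to be python-semantic-release):

```toml
[tool.semantic_release]
version_toml = ["pyproject.toml:project.version"]
major_on_zero = false   # stay on 0.x until an explicit 1.0 decision
```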

  • Reset pyproject.toml version to 0.1.0
  • Deleted the v1.0.0 release and tag
  • Reset CHANGELOG.md

After merge, a v0.1.0 tag will be created manually as the baseline for future semantic-release runs.

2. Docker build failures

All 9 backend service builds failed with:

error: Failed to determine installation plan
  Caused by: Distribution not found at: file:///app/services/worker-conversations

Each Dockerfile copies only libs/ and its own service, but uv sync --frozen tries to resolve ALL workspace members from uv.lock.

Fix: Use --package <name> flag so uv only installs the target package and its dependencies:

RUN uv sync --frozen --no-dev --package kt-api

3. Build matrix fail-fast disabled

Added fail-fast: false so one service's build failure doesn't cancel all other builds.
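An illustrative GitHub Actions snippet (job name and matrix entries are placeholders; service names are taken from this PR):

```yaml
jobs:
  build:
    strategy:
      fail-fast: false   # let sibling builds finish even if one fails
      matrix:
        service: [kt-api, worker-conversations]
```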

Test plan

  • CI passes (lint + tests)
  • After merge, manually create v0.1.0 tag to trigger build pipeline
  • All 11 Docker images build and push successfully

🤖 Generated with Claude Code

- Reset version from 1.0.0 back to 0.1.0 (semantic-release incorrectly
  treated initial release as major)
- Fix all 9 service Dockerfiles: use `uv sync --frozen --no-dev --package <name>`
  so uv only resolves the target package, not all workspace members
- Add fail-fast: false to build matrix so one failure doesn't cancel all
- Reset CHANGELOG.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charlie83Gs charlie83Gs merged commit 3c2d838 into main Mar 23, 2026
3 checks passed
@charlie83Gs charlie83Gs deleted the fix/release-and-docker-builds branch March 23, 2026 16:15
charlie83Gs added a commit that referenced this pull request Mar 27, 2026
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 5, 2026
Critical:
- #1: Validate schema names with strict ^[a-z0-9_]+$ regex before DDL
- #2: Escape ILIKE special chars (%, _, \) in graph_nodes search
- #3: Replace cached Graph ORM instances with frozen GraphInfo dataclass
  to prevent DetachedInstanceError
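The escaping in #2 can be sketched as follows (function name and escape character are illustrative, not the project's actual helper):

```python
def escape_ilike(term: str, escape_char: str = "\\") -> str:
    """Escape LIKE/ILIKE special characters so user input matches literally.

    The escape character itself must be escaped first, otherwise the
    escapes inserted for % and _ would be doubled up afterwards.
    """
    return (
        term.replace(escape_char, escape_char * 2)
            .replace("%", escape_char + "%")
            .replace("_", escape_char + "_")
    )

# The escaped term is then passed as a bound parameter, e.g. with
# SQLAlchemy: Node.name.ilike(f"%{escape_ilike(q)}%", escape="\\")
```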

High:
- #4: Reuse system session factories for default graph (no duplicate pools)
  via default_graph_session_factory/default_write_session_factory params
- #5: Add 23 unit tests — GraphInfo, GraphSessions, GraphSessionResolver,
  slug/schema validation, CreateGraphRequest, role validation
- #6: Scope sync watermarks by graph_slug — SyncEngine now passes
  graph_slug to _get_watermark/_set_watermark, composite PK on
  (table_name, graph_slug)

Medium:
- #7: Replace N+1 member count queries with batch GROUP BY
- #8: Replace catch { // ignore } with console.error in frontend
- #9: Engine pool disposal on GraphSessionResolver.invalidate()
- #10: Run Alembic migrations during graph provisioning
- #11: (node_count in list deferred — requires cross-schema queries)
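The batch query in #7 replaces one COUNT(*) per graph with a single GROUP BY. A minimal sketch using sqlite3 for illustration (the project uses Postgres; table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE graph_members (graph_id INTEGER, user_id INTEGER);
    INSERT INTO graph_members VALUES (1, 10), (1, 11), (2, 12);
""")

# One query for all graphs instead of N separate COUNT(*) queries.
rows = conn.execute(
    """
    SELECT graph_id, COUNT(*) AS member_count
    FROM graph_members
    WHERE graph_id IN (?, ?)
    GROUP BY graph_id
    """,
    (1, 2),
).fetchall()
counts = dict(rows)
```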

Low:
- #13: Replace "Cycle Role" button with role dropdown
- #14: require_writer/require_graph_admin kept for future endpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 5, 2026
Critical:
- #1: Remove dead quote_ident call — regex is the sole injection guard
- #2: Add ^[a-z0-9_]+$ validation for ALEMBIC_SCHEMA in both env.py files
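The validation in #1/#2 amounts to a strict allowlist check before any identifier is interpolated into DDL. A minimal sketch (function name hypothetical):

```python
import re

_SCHEMA_NAME_RE = re.compile(r"^[a-z0-9_]+$")

def validate_schema_name(name: str) -> str:
    """Reject anything that could smuggle SQL into a CREATE SCHEMA statement."""
    if not _SCHEMA_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    return name
```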

High:
- #3: Derive kt_db_root from kt_db package location instead of fragile parents[5]
- #4: Document MCP omits default_write_session_factory intentionally (read-only)
- #5: GraphContext now uses GraphInfo (frozen dataclass) instead of ORM Graph

Medium:
- #6: Replace user._token_graph_slugs monkey-patching with request.state
- #7: Fix remaining catch { // ignore } in graphs/page.tsx
- #9: Document MCP graph access check limitation, planned for follow-up

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 7, 2026
Replaces the confusing "schema vs database" storage-mode toggle with a
single Database picker. Schema is now the only isolation strategy: every
non-default graph gets its own schema in some database. The user's only
choice is *which* database. To get DB-level isolation, just don't put
another graph in the same connection.

Fixes the bug where selecting any database in the new graph UI silently
created the schema in the system DB instead of the chosen one.

Addresses PR #171 review feedback (#1, #2, #4, #5, #6, nits) — supersedes
that PR with a much smaller diff that builds on the upstream multi-graph
foundation that landed since.

Backend
- GET /api/v1/graphs/database-connections now prepends a synthetic
  ``default`` entry (id=null, config_key="default") so the system DB is
  selectable from the dropdown.
- DatabaseConnectionResponse: id and created_at are nullable to carry the
  synthetic entry.
- CreateGraphRequest: drop ``storage_mode`` field. Server hardcodes
  storage_mode="schema" (the column stays for backward compat with old
  rows but is no longer load-bearing at the API).
- create_graph handler: treat ``database_connection_config_key`` of None
  or "default" as the system DB; any other key resolves to a row.
- _provision_graph: previously hardcoded the system graph-db / write-db
  / Qdrant. Now resolves the target URLs from
  ``settings.graph_databases[config_key]`` based on
  ``graph.database_connection_id``, creates schemas via one-off engines
  pointed at the correct DBs, runs alembic with DATABASE_URL /
  WRITE_DATABASE_URL env overrides (Pydantic Settings re-reads them in
  the subprocess), and provisions Qdrant collections against the
  per-graph URL when it differs.
- graph_sessions.py: collapse the "schema vs database" branch in
  _build_and_cache; non-default graphs now route by
  ``database_connection_id IS NULL`` only.
- New ``make_qdrant_client(url, timeout)`` factory in kt-qdrant for
  non-singleton clients pointed at arbitrary URLs.
- GraphRepository.create_database_connection now rejects the reserved
  config_key "default" — without this, an admin could insert a row that
  silently shadows the synthetic entry.
- GraphDatabaseConfig: add a Pydantic field validator that normalizes
  ``postgresql://...`` to ``postgresql+asyncpg://...`` so plain URLs
  from EXTRA_DB_* env vars or YAML don't blow up create_async_engine.
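The core of the URL normalization in the last bullet, extracted from its Pydantic validator context for brevity (function name hypothetical):

```python
def normalize_pg_url(url: str) -> str:
    """Rewrite plain postgresql:// URLs to the asyncpg driver form
    expected by create_async_engine; leave other schemes untouched."""
    prefix = "postgresql://"
    if url.startswith(prefix):
        return "postgresql+asyncpg://" + url[len(prefix):]
    return url
```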

Frontend
- Drop the Storage Mode <select> entirely. Database <select> is always
  visible, populated from listDatabaseConnections() (which now includes
  the synthetic "default" first), default value "default".
- On submit, omit ``database_connection_config_key`` when "default" so
  the backend treats it as the system DB.
- Drop the legacy "Separate DB / Shared DB" badge fallback in the graph
  card and detail page; render ``g.database_connection_name ?? "default"``.
- Filter dropdown: rename the "schema mode" option to "default".

Tests
- libs/kt-config/tests/test_graph_databases.py — covers the new asyncpg
  URL validator.
- services/api/tests/test_graph_schemas.py — drop the storage_mode-
  specific cases; add test_default_connection_key_accepted.
- services/api/tests/integration/test_database_connections_endpoint.py —
  TestClient-based coverage of the synthetic-default ordering, real-row
  ordering, admin-only auth (403 for non-superuser), and the reserved
  "default" rejection in the repository.

Out of scope (follow-ups)
- Dropping the ``storage_mode`` column entirely (would need a migration).
- Per-graph runtime use of qdrant_url in worker code (the resolver in
  graph_sessions.py already plumbs it through; no consumer reads it yet).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 7, 2026
Picks up the actionable items from the round-3 review.

#1 Orphan-schema risk — defended in docstring + recovery path documented
   The Graph row is committed in "provisioning" status BEFORE _provision_graph
   runs (graphs.py:313). All provisioning steps are idempotent
   (CREATE SCHEMA IF NOT EXISTS, alembic upgrade head, ensure_collection),
   so a partial failure marks the row "error" and /retry-provision is the
   recovery path. Added a "Failure recovery" section to the _provision_graph
   docstring explaining why we deliberately don't drop schemas on failure.

#3 Fragile qdrant_url comparison — strip trailing slashes when comparing
   so http://h:6333 and http://h:6333/ are treated as the same target and
   we don't needlessly spawn+close a fresh Qdrant client.
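The comparison in #3 reduces to stripping trailing slashes on both sides (helper name illustrative):

```python
def same_qdrant_target(a: str, b: str) -> bool:
    """Compare Qdrant URLs ignoring trailing slashes, so
    http://h:6333 and http://h:6333/ count as the same target."""
    return a.rstrip("/") == b.rstrip("/")
```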

#4 "default" magic string — extracted DEFAULT_DB_CONFIG_KEY constant in
   kt_config.settings and used everywhere (API handler, repo guard,
   list endpoint, startup check). Added a startup check in
   kt_api.main._assert_default_db_key_unreserved() that logs an error
   if a real database_connections row holds the reserved key — catches
   anything that may have slipped in via raw SQL or a previous version
   that lacked the repo guard.

#6 Test coverage gap — added two new test files:
   - services/api/tests/test_provision_graph_routing.py: 3 tests that mock
     create_async_engine, subprocess.run, and the Qdrant client factories
     to assert _provision_graph routes to the EXTERNAL DB URLs when
     database_connection_id is set, routes to the SYSTEM DB URLs when it's
     null, and raises a clear error when the config_key is missing from
     settings.graph_databases.
   - test_database_connections_endpoint.py: new
     test_create_graph_with_default_key_uses_system_db that POSTs a graph
     with database_connection_config_key="default" and asserts the
     resulting row has database_connection_id=None. Added a stub_users_in_db
     fixture that inserts the test users into the User table to satisfy
     the FK constraint on graphs.created_by.

#7 Frontend grep — confirmed nothing references storage_mode on
   CreateGraphRequest in frontend/. The only remaining reference is on
   GraphResponse, which is the read-side type (kept for backward compat).

Skipped per reviewer note: #2 (alembic env override — reviewer self-resolved),
#5 (cosmetic _admin rename).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 8, 2026
PR4 review (#192) flagged five substantive correctness gaps. All
addressed; full test suite still green (kt-graph 95 / kt-hatchet 40).

## Correctness fixes

1. **`_match_or_create_node` no longer over-reports `created` and no
   longer trusts the remote `node_uuid`** (review #1).
   - The local uuid is now derived from `key_to_uuid(make_node_key(...))`
     so it never collides with the existing unique index on
     `write_nodes.node_uuid` when the same concept already exists locally
     under a different historical id.
   - The insert uses `RETURNING node_uuid` to distinguish a real insert
     from an ON CONFLICT no-op. On no-op the bridge re-SELECTs the
     existing local row's `node_uuid` and reports `created=False` so
     `ImportResult.nodes_matched` increments correctly.

2. **`_load_linked_nodes` array-overlap query now has integration
   coverage** (review #2). New test file
   `tests/integration/test_public_bridge_db.py` spins up a real write-db
   schema and exercises the full SQL surface that the unit tests can't:
   - The `write_nodes.fact_ids && ARRAY[...]` overlap operator —
     including the concept/entity type filter (perspective rows must NOT
     match) and an empty-input short-circuit.
   - `_load_linked_facts` / `_load_linked_fact_sources` provenance joins.
   - `_upsert_raw_source` returning the correct local id on both insert
     and ON CONFLICT branches.
   - `_match_or_create_node` reuse-vs-create branches without Qdrant.
   - `_upsert_fact_source` idempotency under re-imports.

3. **`_upsert_raw_source` now returns the local id, not the remote one**
   (review #3). Both call sites updated:
   - `import_cached_source` records `result.raw_source_id = local_raw_id`
     so PR5's workflow code attaches downstream rows to the right id.
   - `contribute_source_and_facts` discards the return — fact_source
     rows there are keyed on `content_hash`, not the source id.

4. **`make_worker_engine` now refuses to wire a bridge for a non-default
   graph that's missing its Qdrant collection prefix** (review #4).
   Empty prefix would silently dedup against the default graph's
   collection — exactly the cross-contamination this whole subsystem
   exists to prevent. Fail loud at construction rather than discover
   it in production.

5. **`_upsert_fact_source` now uses a deterministic UUID5** keyed on
   `(local_fact_id, raw_source_content_hash)` instead of a fresh
   `uuid.uuid4()` (review #8). Re-imports of the same source into the
   same target graph become true no-ops without needing a schema-level
   unique constraint. PR5's workflow should still avoid re-imports,
   this is defence in depth.
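The deterministic id in #5 can be sketched with the stdlib (the namespace constant and function name here are hypothetical; the real code would pin its own namespace):

```python
import uuid

_FACT_SOURCE_NS = uuid.uuid5(uuid.NAMESPACE_URL, "kt:fact-source")

def fact_source_id(local_fact_id: str, content_hash: str) -> uuid.UUID:
    """Deterministic id: re-importing the same (fact, source) pair
    yields the same UUID, so the upsert becomes a true no-op."""
    return uuid.uuid5(_FACT_SOURCE_NS, f"{local_fact_id}:{content_hash}")
```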

## Smaller things

- **lifespan.py**: extracted `_resolve_default_graph_id()` helper —
  the duplicated try/except block in `worker_lifespan()` and
  `build_worker_state()` collapses to one call (review #5).
- **test_public_bridge.py**: comment on the staleness assertion now
  reflects the actual fixture date (`2023-01-01`, not `2026-01-01`)
  (review #6).
- **CLAUDE.md**: noted the one-way `kt-hatchet → kt-graph` workspace dep
  added in PR4 so future contributors don't reverse it (review #9).

## Test plan

- [x] kt-graph: 95 passed (13 unit + 9 new integration on
      `test_public_bridge_db.py`, plus the existing 73)
- [x] kt-hatchet: 40 passed
- [ ] CI all green
charlie83Gs added a commit that referenced this pull request Apr 21, 2026
Non-blocking review items on the sync wiring PR:

#4 Replace `assert worker_state.services is not None` with an
   explicit `if ... raise RuntimeError` so the invariant holds
   under `python -O` (which strips asserts) and we never dispatch
   against a silently-None services container.
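The shape of the #4 change, with illustrative names (under `python -O`, `assert worker_state.services is not None` is stripped entirely; the explicit raise survives optimization):

```python
def require_services(worker_state):
    """Fail loudly if the worker's services container was never set,
    instead of dispatching against None."""
    if worker_state.services is None:
        raise RuntimeError("worker services container is not initialized")
    return worker_state.services
```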

#5 Dedup the 'no plugin contributes this id' WARN from
   `resolve_sync_provider` per `(graph_id, provider_id)` pair.
   Sync dispatch is a cron that runs once per graph every minute;
   without dedup a rolling deployment fills the log with N×60
   duplicate lines per hour per missing plugin. First occurrence
   still fires at WARNING; subsequent occurrences drop to DEBUG so
   operators can still see them under a verbose logger but don't
   get paged on spam. Set stays bounded by (graph_id, provider_id)
   cardinality (handful × handful) — no eviction needed; a worker
   restart re-warns once, which is the signal operators want.
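The dedup in #5 is essentially a warn-once set keyed on the pair. A minimal sketch (names illustrative; returning the level used is for testability, not part of the real API):

```python
import logging

logger = logging.getLogger(__name__)
_warned: set[tuple[str, str]] = set()  # bounded by (graph, provider) cardinality

def warn_missing_provider(graph_id: str, provider_id: str) -> int:
    """First occurrence per (graph_id, provider_id) logs at WARNING;
    repeats drop to DEBUG so a once-a-minute cron can't spam the log."""
    key = (graph_id, provider_id)
    level = logging.DEBUG if key in _warned else logging.WARNING
    _warned.add(key)
    logger.log(level, "no plugin contributes sync provider %s for graph %s",
               provider_id, graph_id)
    return level
```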

#6 Surface `SyncResult.failures` in the task log line + emitted
   event. Legacy path doesn't have a failures counter (engine's
   dead-letter inserts are logged inside the engine), so the field
   is 0 on the legacy branch and populated from the provider result
   on the registry branch. No behaviour change on legacy; parity
   for the provider contract.

#7 Move the `from kt_worker_sync.sync_engine import SyncEngine`
   import inside the legacy-path branch — the provider-driven path
   never needs it, so we skip the import on workers whose plugin is
   registered. Micro-optimisation; clearer at the call site too.

Skipped:
- #1 PR body overstates scope: will amend body on GitHub instead.
- #2 ABC lifecycle mismatch (init once vs per-task): flagged for a
  Phase-5 follow-up. Cache-on-WorkerState design needs more thought
  than fits in a review pass.
- #3 options-dict workaround: real design tension — per-graph
  session factories don't fit the `initialize(services)` once-per-
  worker contract. Phase-5 follow-up.

Tests: 15 composition helper tests (2 new dedup tests — one
asserting repeated resolves produce a single WARN, one asserting
distinct keys warn independently), 18 worker-sync unit tests.