refactor(kt-graph,worker-nodes): route auto_build through WorkerGraphEngine #298

Closed
charlie83Gs wants to merge 1 commit into main from feat/engine-batch-absorb

@charlie83Gs
Contributor

Summary

  • auto_build_graph was reaching into WriteNode/Edge/Dimension repos and Qdrant directly, duplicating key derivation, edge-rekey logic and Qdrant plumbing. The rest of the worker-nodes pipeline already goes through WorkerGraphEngine — auto_build was the lone divergent path and risked silently bypassing future engine side-effects (hooks, counters, telemetry).
  • Add batch + lifecycle methods on WorkerGraphEngine so the engine remains the single source of write-routing truth, then route auto_build through them.

New on WorkerGraphEngine (libs/kt-graph/src/kt_graph/worker_engine.py)

  • bulk_create_nodes(specs) — accepts list[NodeCreateSpec], upserts each into write-db, populates the in-memory node cache, returns BulkCreateResult per spec. No commit, no Qdrant.
  • bulk_upsert_nodes_to_qdrant(items) — single batched Qdrant call (chunked at 200 internally by QdrantNodeRepository.upsert_batch).
  • delete_node(node_id) / delete_node_qdrant(node_uuid) — write-db node delete + best-effort Qdrant cleanup.
  • delete_edge_by_key(edge_key) — write-db edge delete with cache eviction.
  • absorb_node(loser_id, winner_id) — composite: dimension transfer, edge re-key, fact merge, loser delete. Returns an AbsorbResult so the caller can do Qdrant cleanup outside any savepoint.

All new methods leave transaction control to the caller (no commit, no begin_nested) and keep Qdrant calls non-transactional — matches the per-seed semantics auto_build was already relying on.
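As a rough illustration of the composite absorb_node step listed above, here is a minimal in-memory sketch. It is not the engine's code: ToyGraph, the AbsorbResult fields, and the choice to drop edges that would become self-edges are all assumptions made for the example, and dimension transfer is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class AbsorbResult:
    # Hypothetical shape; the real AbsorbResult fields are not shown in this PR.
    absorbed: bool
    loser_id: str
    rekeyed_edges: int = 0
    dropped_self_edges: int = 0


@dataclass
class ToyGraph:
    # node_id -> set of fact ids (stand-in for the write-db node rows)
    nodes: dict = field(default_factory=dict)
    # set of (source_id, target_id) pairs (stand-in for keyed edges)
    edges: set = field(default_factory=set)

    def absorb_node(self, loser_id: str, winner_id: str) -> AbsorbResult:
        # Missing loser/winner or a self-absorb is a no-op.
        if (
            loser_id not in self.nodes
            or winner_id not in self.nodes
            or loser_id == winner_id
        ):
            return AbsorbResult(absorbed=False, loser_id=loser_id)

        # Fact merge: the winner keeps the union of both fact sets.
        self.nodes[winner_id] |= self.nodes[loser_id]

        rekeyed = dropped = 0
        for src, dst in list(self.edges):  # iterate over a copy while mutating
            if loser_id not in (src, dst):
                continue
            self.edges.discard((src, dst))
            new_src = winner_id if src == loser_id else src
            new_dst = winner_id if dst == loser_id else dst
            if new_src == new_dst:
                # Assumed policy: an edge that collapses onto the winner is dropped.
                dropped += 1
                continue
            self.edges.add((new_src, new_dst))
            rekeyed += 1

        # Loser delete. No commit and no vector-store call happen here;
        # the caller uses the returned result to do that cleanup afterwards.
        del self.nodes[loser_id]
        return AbsorbResult(True, loser_id, rekeyed, dropped)
```

Returning a result object instead of touching the vector store inside the method is what lets the caller run Qdrant cleanup after its savepoint has been released.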

auto_build (services/worker-nodes/src/kt_worker_nodes/workflows/auto_build.py)

  • _promote_seeds builds a NodeCreateSpec per active seed, calls bulk_create_nodes inside a per-seed savepoint, marks the seed promoted, then issues one batched bulk_upsert_nodes_to_qdrant outside savepoints. Drops direct imports of WriteNodeRepository, make_node_key, key_to_uuid, QdrantNodeRepository.
  • _absorb_merged_nodes resolves loser/winner UUIDs and calls engine.absorb_node inside a savepoint, then engine.delete_node_qdrant outside. Drops direct imports of WriteEdgeRepository, WriteDimensionRepository, make_edge_key.
  • _check_fact_stale_nodes unchanged.
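The per-seed savepoint followed by one batched upsert can be sketched as below. This is only an analogy: it uses the standard-library sqlite3 module with explicit SAVEPOINT statements rather than the project's own session and begin_nested, and promote_seeds, the nodes/seeds tables, and the upsert_batch callback are hypothetical names introduced for the example.

```python
import sqlite3


def promote_seeds(conn, seeds, upsert_batch):
    """Promote each seed inside its own savepoint, then upsert once.

    conn must already be inside a transaction opened by the caller;
    this function never commits. upsert_batch stands in for the
    non-transactional vector-store call.
    """
    promoted = []
    for seed in seeds:
        conn.execute("SAVEPOINT seed")
        try:
            conn.execute(
                "INSERT INTO nodes (key, concept) VALUES (?, ?)",
                (seed["key"], seed["concept"]),
            )
            conn.execute(
                "UPDATE seeds SET promoted = 1 WHERE key = ?",
                (seed["key"],),
            )
        except sqlite3.Error:
            # Undo only this seed's writes; earlier seeds stay intact.
            conn.execute("ROLLBACK TO SAVEPOINT seed")
            conn.execute("RELEASE SAVEPOINT seed")
            continue
        conn.execute("RELEASE SAVEPOINT seed")
        promoted.append(seed["key"])

    # One batched call, outside every savepoint.
    if promoted:
        upsert_batch(promoted)
    return promoted
```

A caller would open the connection with isolation_level=None, issue BEGIN, call promote_seeds, and then COMMIT itself. A seed that fails (for instance on a duplicate key) is rolled back alone and is left out of the batched upsert.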

Seed lifecycle (get_promotable_seeds, mark_seed_promoted, get_merged_promoted_seeds, clear_promoted_node_key, get_seed_fact_ids) stays in WriteSeedRepository — seeds are not graph entities.

Tests

  • New libs/kt-graph/tests/test_worker_engine_bulk.py — 17 unit tests covering all 6 new methods, including absorb-node happy path, missing loser/winner, self-edge, no-commit / no-Qdrant invariants.
  • services/worker-nodes/tests/test_merge_absorption.py rewritten — mocks the engine seam (WorkerGraphEngine.absorb_node, delete_node_qdrant) instead of three repo classes. Engine internals are now covered by the engine's own tests.
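Mocking at the engine seam might look roughly like the following. The sketch assumes the engine methods are coroutines and that AbsorbResult exposes an absorbed flag; neither is confirmed by this PR. absorb_merged is a hypothetical stand-in for _absorb_merged_nodes, not its real body.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def absorb_merged(engine, pairs):
    """Hypothetical stand-in for _absorb_merged_nodes.

    Calls the engine once per (loser, winner) pair and cleans up the
    vector store only for pairs the engine reports as absorbed.
    """
    absorbed = 0
    for loser_id, winner_id in pairs:
        result = await engine.absorb_node(loser_id, winner_id)
        if result.absorbed:
            await engine.delete_node_qdrant(loser_id)
            absorbed += 1
    return absorbed


def run_example():
    # Only the two engine methods are mocked; no repository classes are involved.
    engine = MagicMock()
    engine.absorb_node = AsyncMock(
        side_effect=[MagicMock(absorbed=True), MagicMock(absorbed=False)]
    )
    engine.delete_node_qdrant = AsyncMock()
    count = asyncio.run(absorb_merged(engine, [("l1", "w1"), ("l2", "w2")]))
    return engine, count
```

With two mocked methods instead of three repository classes, the workflow test asserts only on what the workflow decides: which pairs it hands to the engine and when it triggers cleanup.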

Test plan

  • uv run --project libs/kt-graph pytest libs/kt-graph/tests/ --ignore=libs/kt-graph/tests/integration — 43 passed
  • uv run --project services/worker-nodes pytest services/worker-nodes/tests/ --ignore=services/worker-nodes/tests/integration — 122 passed
  • ruff check + ruff format clean on changed files
  • CI green

🤖 Generated with Claude Code

…Engine

auto_build was reaching into WriteNode/Edge/Dimension repos and Qdrant
directly, duplicating key derivation, edge-rekey logic and Qdrant
plumbing. Add batch + lifecycle methods on WorkerGraphEngine and have
auto_build go through them so the engine remains the single source of
write-routing truth.

New on WorkerGraphEngine:
- bulk_create_nodes, bulk_upsert_nodes_to_qdrant
- delete_node, delete_node_qdrant, delete_edge_by_key
- absorb_node (composite: dims + edges-rekey + facts + delete-loser)

All new methods leave transaction control to the caller (no commit, no
savepoint) and keep Qdrant calls non-transactional, matching today's
per-seed semantics.

auto_build:
- _promote_seeds builds NodeCreateSpec list per batch, calls
  bulk_create_nodes inside per-seed savepoints, then a single batched
  bulk_upsert_nodes_to_qdrant outside.
- _absorb_merged_nodes calls engine.absorb_node + engine.delete_node_qdrant.

Tests: new libs/kt-graph/tests/test_worker_engine_bulk.py (17 cases) for
the engine surface; test_merge_absorption rewritten to mock the engine
seam instead of three repo classes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@charlie83Gs
Contributor Author

Merged locally; switching to local-iteration mode (no remote PRs for now).
