
Phase 4 follow-ups: source_id ABC debt + remaining DecompositionPipeline consumer wire-ups #281

@charlie83Gs


Follow-ups flagged during PR #279/#280 review. Neither is a blocker for closing Phase 4, but both should land before Phase 4 is marked fully done.

1. `DecompositionContext.source_id → source_url` mapping is lossy

`kt_core_engine_api.decomposition.DecompositionContext.source_id` and the built-in `LlmDefaultFactDecompositionProvider` conflate "DB id" with "URL": the adapter forwards `ctx.source_id` onto `TextExtractor`'s `source_url` kwarg because that is the closest analogue on the existing extractor.

Options:

  • Rename: `source_id → source_url` on the ABC. Matches existing extractor semantics; forces callers threading a bare DB id to be explicit.
  • Split: add both `source_id` and `source_url` typed fields. Providers pick whichever they need.

Prefer split: URL-less sources (uploaded files, future connectors) have no URL but still have a DB id.
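A minimal sketch of the split option, assuming string ids and a frozen dataclass. The real `DecompositionContext` carries other fields that are omitted here, and `extractor_kwargs` is a hypothetical adapter helper, not an existing function. The point it illustrates is that the adapter forwards only a genuine URL as `source_url` and never substitutes the DB id for it:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecompositionContext:
    """Hypothetical split shape; the real ABC's other fields are omitted."""

    # DB identifier; present even for URL-less sources such as uploaded files.
    source_id: Optional[str] = None
    # Only set when the source genuinely has a URL.
    source_url: Optional[str] = None


def extractor_kwargs(ctx: DecompositionContext) -> dict:
    """Hypothetical adapter helper: build kwargs for a TextExtractor-style call.

    Only a real URL is forwarded as source_url; the DB id is never
    substituted for it (that substitution is the lossy mapping above).
    """
    kwargs: dict = {}
    if ctx.source_url is not None:
        kwargs["source_url"] = ctx.source_url
    return kwargs


# An uploaded file: has a DB id but no URL, so no source_url is forwarded.
uploaded = DecompositionContext(source_id="42")
# A fetched web page: both fields are populated independently.
fetched = DecompositionContext(source_id="43", source_url="https://example.com/a")
```

Providers that need the DB id read `ctx.source_id` directly; nothing has to be smuggled through the URL field.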

Files:

  • `libs/kt-core-engine-api/src/kt_core_engine_api/decomposition/provider.py`
  • `plugins/backend-engine-fact-decomposition/src/kt_plugin_be_fact_decomposition/provider.py`
  • `libs/kt-facts/src/kt_facts/pipeline.py` (extract_text → DecompositionContext construction)

2. Remaining `DecompositionPipeline` consumer wire-ups

#280 wired only `decompose_chunk_task` (worker-search). The other call sites still run the legacy `TextExtractor`:

  • `services/worker-search/src/kt_worker_search/workflows/decompose.py` — `decompose_source_task`
  • `services/worker-nodes/src/kt_worker_nodes/pipelines/gathering/pipeline.py:532` — `_decompose_images_with_session`
  • `services/worker-nodes/src/kt_worker_nodes/pipelines/building/unified.py:137,287` — refresh + expand paths
  • `services/worker-nodes/src/kt_worker_nodes/pipelines/nodes/pipeline.py:97,464,751`

Pattern per call site:

  1. Resolve provider via `kt_hatchet.composition.resolve_fact_decomposition_provider(state, graph_id, gateway)` at task entry.
  2. Pass as `fact_decomposition_provider=` to `DecompositionPipeline`.
  3. Worker-nodes sites inside `AgentContext`-wrapped pipelines need `graph_id` threaded through: either extend `AgentContext` or pass it as an explicit kwarg, as Phase 2 (#33) did.
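A rough sketch of the per-call-site pattern. The classes and the resolver below are stand-in stubs; only the resolver's argument order and the `fact_decomposition_provider=` kwarg come from this issue. The `graph_id` field on `AgentContext` and the `build_pipeline` helper are hypothetical. It shows both threading options at once: an explicit `graph_id` kwarg takes precedence, with an extended `AgentContext` field as the fallback.

```python
from dataclasses import dataclass
from typing import Any, Optional


# --- Hypothetical stand-ins; the real kt_hatchet / kt_facts types may differ. ---
@dataclass(frozen=True)
class FactDecompositionProvider:
    graph_id: str  # Recorded so the sketch can show which graph was resolved.


def resolve_fact_decomposition_provider(state: Any, graph_id: str, gateway: Any) -> FactDecompositionProvider:
    """Stub with the argument order of kt_hatchet.composition.resolve_fact_decomposition_provider."""
    return FactDecompositionProvider(graph_id=graph_id)


class DecompositionPipeline:
    def __init__(self, *, fact_decomposition_provider: FactDecompositionProvider) -> None:
        self.fact_decomposition_provider = fact_decomposition_provider


@dataclass(frozen=True)
class AgentContext:
    # Option A: extend AgentContext with graph_id (field name is an assumption).
    graph_id: Optional[str] = None


def build_pipeline(
    state: Any,
    gateway: Any,
    ctx: AgentContext,
    graph_id: Optional[str] = None,  # Option B: explicit kwarg, Phase 2 style.
) -> DecompositionPipeline:
    """Hypothetical task-entry helper following the three steps in the list."""
    # Step 3: prefer the explicit kwarg, fall back to the context field.
    resolved = graph_id if graph_id is not None else ctx.graph_id
    if resolved is None:
        raise ValueError("graph_id was not threaded through to this call site")
    # Step 1: resolve the provider once, at task entry.
    provider = resolve_fact_decomposition_provider(state, resolved, gateway)
    # Step 2: hand it to the pipeline instead of constructing TextExtractor.
    return DecompositionPipeline(fact_decomposition_provider=provider)
```

Raising when no `graph_id` reaches the call site is one possible design choice: it makes a missed thread-through fail loudly instead of quietly resolving a provider for the wrong graph.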

One PR per call site or a single bundled PR; either works. Use caveman canary test runs to catch regressions (`validate_all_graph_types` should stay quiet for the default graph type).

Related

No blockers. Both items are cleanup on top of the Phase 4 foundations.
