feat(plugins): backend-engine-fact-decomposition plugin (Phase 4 #44)#279
Conversation
Contributes the ``llm-default`` :class:`FactDecompositionProvider` via
the plugin registry, wiring the id ``DefaultGraphTypePlugin`` declares
in ``composition().fact_decomposition``.
- ``LlmDefaultFactDecompositionProvider`` adapts the existing
``kt_facts.extraction.TextExtractor``. Registry lookups
(``plugin_registry.get_fact_decomposition_provider("llm-default",
gw)``) resolve here instead of callers importing ``TextExtractor``
directly. No new extraction logic — extractor source stays in
``kt-facts`` for this slice; a follow-up can relocate it.
- Plugin registered in ``_DEFAULT_PLUGIN_TARGETS``. ``is_enabled()``
gated on ``plugin.fact_decomposition.enabled`` (default ``True`` —
built-in convention).
- CI: new ``plugin-fact-decomposition`` entry in
``.github/workflows/test.yml`` paths-filter + maybe_add matrix.
First concrete provider under the Phase 4 ABC. The consumer-side
wire-up (``DecompositionPipeline`` / workers resolving via
``ctx.config.composition.fact_decomposition``) lands next (#45).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Pipeline consumers Addresses PR #279 review items 1 + 2 directly on this branch. Item 1 — ABC split: - ``DecompositionContext.source_url`` added as typed field alongside existing ``source_id``. ``source_id`` now reserved for opaque DB identifiers (future callers); ``source_url`` carries the retrieval URL for prompt-building providers. Drops the lossy conflation where the adapter forwarded ``source_id`` onto ``TextExtractor.source_url``. - Plugin adapter reads ``ctx.source_url`` directly; ignores ``source_id`` (it's opaque, URL-only extractor doesn't need it). - ``DecompositionPipeline.extract_text`` passes the URL kwarg to ``source_url=`` (was ``source_id=``). - Test updated to assert ``ctx.source_url`` + ``ctx.source_id is None`` on a URL-only caller. Item 2 — remaining consumer wire-ups: - ``AgentContext`` grows ``fact_decomposition_provider`` + ``fact_decomposition_settings`` optional fields. Propagates through ``create_child_context``. - Both ``_build_agent_context`` helpers (worker-bottomup.shared, worker-ingest.workflows.ingest) auto-resolve from ``kt_hatchet.composition`` when not pre-supplied. Workflow task entries can still pre-resolve to avoid duplicate lookups; caller- supplied values take precedence. - All 5 bare ``DecompositionPipeline(ctx.model_gateway)`` call sites in worker-nodes (building/unified ×2, nodes/pipeline ×3) now thread ``phase_settings`` + ``fact_decomposition_provider`` from ctx. - ``GatherFactsPipeline._decompose_images_with_session`` forwards ``ctx.fact_decomposition_provider`` into its pipeline. - ``decompose_source_task`` (worker-search) resolves provider at entry. Full coverage — no legacy ``TextExtractor`` path remains for graphs with a composition that names a registered provider. Unregistered-id path still warns + falls back (canary tolerance preserved). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Review items 1 + 2 addressed directly on this branch (df790b3). Item 1 — source_id / source_url split on ABC
Item 2 — all DecompositionPipeline consumers wired
No legacy Tests: 175 green across kt-agents-core, kt-facts, worker-search, worker-nodes, worker-bottomup. |
Fixes Backend Lint CI failure — auto-format after manual edits in prior commits.
_build_agent_context is the single source of truth for resolving the composition-driven fact-decomposition provider + phase settings. Before: task entries (worker-bottomup.bottom_up.workflow and the ingest equivalent) called resolve_fact_decomposition_provider + resolve_phase_settings themselves, then passed the results into _build_agent_context, which would re-resolve only if the arg was None. A legitimate "unregistered plugin → None" return from the task-entry resolve would collapse with "not provided" and trigger a second resolution plus duplicate WARN log for the same graph. After: task entries don't pre-resolve. _build_agent_context always resolves and stamps the values onto the returned AgentContext. Downstream calls that still need the settings (run_bottom_up_scope_pipeline) read them off agent_ctx.fact_decomposition_settings. Also drops the fact_decomposition_provider/settings kwargs from both _build_agent_context signatures (no callers pass them anymore). Tests: 34 worker-bottomup, 28 worker-ingest, 124 worker-nodes pass.
|
Addressed review feedback in 2 commits on top of #279: Commit e7be609 — Commit 8bee23a — Notes on other items:
All 672 files formatted; all unit tests pass (14 libs + 7 services + 2 plugins). |
|
I have read the CLA Document and I hereby sign the CLA You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot. |
Summary
First concrete Phase 4 provider — `LlmDefaultFactDecompositionProvider` wraps the existing `kt_facts.extraction.TextExtractor` and contributes `llm-default` via the plugin registry. Registry lookups for the id `DefaultGraphTypePlugin` declares in `composition().fact_decomposition` resolve here instead of callers importing `TextExtractor` directly.
No new extraction logic — extractor source stays in `kt-facts` for this slice to keep blast radius to a registration adapter; a follow-up can relocate `TextExtractor` into this package once no other caller imports it directly.
Why
Closes first half of Phase 4 #44 / #45. The worker-side wire-up (`DecompositionPipeline` resolving via `ctx.config.composition.fact_decomposition` instead of constructing `TextExtractor` by hand) lands in a stacked follow-up.
Test plan
🤖 Generated with Claude Code