Skip to content

feat(plugins): backend-engine-fact-decomposition plugin (Phase 4 #44)#279

Merged
charlie83Gs merged 6 commits intomainfrom
feat/backend-engine-fact-decomposition-plugin
Apr 21, 2026
Merged

feat(plugins): backend-engine-fact-decomposition plugin (Phase 4 #44)#279
charlie83Gs merged 6 commits intomainfrom
feat/backend-engine-fact-decomposition-plugin

Conversation

@charlie83Gs
Copy link
Copy Markdown
Contributor

Summary

First concrete Phase 4 provider — `LlmDefaultFactDecompositionProvider` wraps the existing `kt_facts.extraction.TextExtractor` and contributes `llm-default` via the plugin registry. Registry lookups for the id `DefaultGraphTypePlugin` declares in `composition().fact_decomposition` resolve here instead of callers importing `TextExtractor` directly.

  • New `plugins/backend-engine-fact-decomposition/` package.
  • Registered in `_DEFAULT_PLUGIN_TARGETS`. `is_enabled()` gated on `plugin.fact_decomposition.enabled` (default True — built-in convention).
  • CI: new `plugin-fact-decomposition` entry in paths-filter + maybe_add matrix.

No new extraction logic — extractor source stays in `kt-facts` for this slice to keep blast radius to a registration adapter; a follow-up can relocate `TextExtractor` into this package once no other caller imports it directly.

Why

Closes first half of Phase 4 #44 / #45. The worker-side wire-up (`DecompositionPipeline` resolving via `ctx.config.composition.fact_decomposition` instead of constructing `TextExtractor` by hand) lands in a stacked follow-up.

Test plan

  • `uv run --project plugins/backend-engine-fact-decomposition pytest plugins/backend-engine-fact-decomposition/tests/ -x -v` — 5/5 green (plugin identity, is_enabled default, llm-default contribution, factory→ABC, registry round-trip)
  • `uv run --project libs/kt-config pytest libs/kt-config/tests/ -x -q` — 236/236 green
  • `uv run --project libs/kt-facts pytest libs/kt-facts/tests/ --ignore=tests/integration -x -q` — 232/232 green
  • CI: full workspace tests

🤖 Generated with Claude Code

Contributes the ``llm-default`` :class:`FactDecompositionProvider` via
the plugin registry, wiring the id ``DefaultGraphTypePlugin`` declares
in ``composition().fact_decomposition``.

- ``LlmDefaultFactDecompositionProvider`` adapts the existing
  ``kt_facts.extraction.TextExtractor``. Registry lookups
  (``plugin_registry.get_fact_decomposition_provider("llm-default",
  gw)``) resolve here instead of callers importing ``TextExtractor``
  directly. No new extraction logic — extractor source stays in
  ``kt-facts`` for this slice; a follow-up can relocate it.
- Plugin registered in ``_DEFAULT_PLUGIN_TARGETS``. ``is_enabled()``
  gated on ``plugin.fact_decomposition.enabled`` (default ``True`` —
  built-in convention).
- CI: new ``plugin-fact-decomposition`` entry in
  ``.github/workflows/test.yml`` paths-filter + maybe_add matrix.

First concrete provider under the Phase 4 ABC. The consumer-side
wire-up (``DecompositionPipeline`` / workers resolving via
``ctx.config.composition.fact_decomposition``) lands next (#45).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
charlie83Gs and others added 2 commits April 21, 2026 08:31
…d provider (Phase 4 #44) (#280)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Pipeline consumers

Addresses PR #279 review items 1 + 2 directly on this branch.

Item 1 — ABC split:
- ``DecompositionContext.source_url`` added as typed field alongside
  existing ``source_id``. ``source_id`` now reserved for opaque DB
  identifiers (future callers); ``source_url`` carries the retrieval
  URL for prompt-building providers. Drops the lossy conflation where
  the adapter forwarded ``source_id`` onto ``TextExtractor.source_url``.
- Plugin adapter reads ``ctx.source_url`` directly; ignores
  ``source_id`` (it's opaque, URL-only extractor doesn't need it).
- ``DecompositionPipeline.extract_text`` passes the URL kwarg to
  ``source_url=`` (was ``source_id=``).
- Test updated to assert ``ctx.source_url`` + ``ctx.source_id is None``
  on a URL-only caller.

Item 2 — remaining consumer wire-ups:
- ``AgentContext`` grows ``fact_decomposition_provider`` +
  ``fact_decomposition_settings`` optional fields. Propagates through
  ``create_child_context``.
- Both ``_build_agent_context`` helpers (worker-bottomup.shared,
  worker-ingest.workflows.ingest) auto-resolve from
  ``kt_hatchet.composition`` when not pre-supplied. Workflow task
  entries can still pre-resolve to avoid duplicate lookups; caller-
  supplied values take precedence.
- All 5 bare ``DecompositionPipeline(ctx.model_gateway)`` call sites
  in worker-nodes (building/unified ×2, nodes/pipeline ×3) now thread
  ``phase_settings`` + ``fact_decomposition_provider`` from ctx.
- ``GatherFactsPipeline._decompose_images_with_session`` forwards
  ``ctx.fact_decomposition_provider`` into its pipeline.
- ``decompose_source_task`` (worker-search) resolves provider at
  entry.

Full coverage — no legacy ``TextExtractor`` path remains for graphs
with a composition that names a registered provider. Unregistered-id
path still warns + falls back (canary tolerance preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@charlie83Gs
Copy link
Copy Markdown
Contributor Author

Review items 1 + 2 addressed directly on this branch (df790b3).

Item 1 — source_id / source_url split on ABC

  • DecompositionContext.source_url promoted to typed field alongside source_id. source_id reserved for opaque DB identifiers; source_url carries retrieval URL. Drops the lossy conflation where the adapter forwarded source_id onto TextExtractor.source_url.
  • Plugin adapter reads ctx.source_url directly; ignores source_id (URL-only extractor doesn't need it).
  • DecompositionPipeline.extract_text passes the URL kwarg to source_url= (was source_id=).
  • Test updated to assert ctx.source_url + ctx.source_id is None on URL-only caller.

Item 2 — all DecompositionPipeline consumers wired

  • AgentContext grows fact_decomposition_provider + fact_decomposition_settings optional fields. Propagates through create_child_context.
  • Both _build_agent_context helpers (worker-bottomup.shared, worker-ingest.workflows.ingest) auto-resolve from kt_hatchet.composition when not pre-supplied. Caller-supplied values take precedence (workflow entries can pre-resolve to avoid duplicate lookups).
  • All 5 bare DecompositionPipeline(ctx.model_gateway) sites in worker-nodes (building/unified ×2, nodes/pipeline ×3) now thread phase_settings + fact_decomposition_provider from ctx.
  • GatherFactsPipeline._decompose_images_with_session forwards ctx.fact_decomposition_provider.
  • decompose_source_task (worker-search) resolves provider at entry.

No legacy TextExtractor path remains for graphs whose composition names a registered provider. Unregistered-id path still WARN + falls back (canary tolerance preserved).

Tests: 175 green across kt-agents-core, kt-facts, worker-search, worker-nodes, worker-bottomup.

Fixes Backend Lint CI failure — auto-format after manual edits in prior commits.
_build_agent_context is the single source of truth for resolving the
composition-driven fact-decomposition provider + phase settings.

Before: task entries (worker-bottomup.bottom_up.workflow and the ingest
equivalent) called resolve_fact_decomposition_provider +
resolve_phase_settings themselves, then passed the results into
_build_agent_context, which would re-resolve only if the arg was None.
A legitimate "unregistered plugin → None" return from the task-entry
resolve would collapse with "not provided" and trigger a second
resolution plus duplicate WARN log for the same graph.

After: task entries don't pre-resolve. _build_agent_context always
resolves and stamps the values onto the returned AgentContext. Downstream
calls that still need the settings (run_bottom_up_scope_pipeline) read
them off agent_ctx.fact_decomposition_settings.

Also drops the fact_decomposition_provider/settings kwargs from both
_build_agent_context signatures (no callers pass them anymore).

Tests: 34 worker-bottomup, 28 worker-ingest, 124 worker-nodes pass.
@charlie83Gs
Copy link
Copy Markdown
Contributor Author

Addressed review feedback in 2 commits on top of #279:

Commit e7be609style(worker): ruff format 6 files — fixes the Backend Lint CI failure.

Commit 8bee23arefactor(worker): drop pre-resolve at fact-decomp task entries — addresses the double-resolve risk (issue #1). Task entries no longer pre-resolve provider/settings; _build_agent_context is the single source of truth and stamps the values onto the returned AgentContext. Downstream calls that still need the settings (run_bottom_up_scope_pipeline) read from agent_ctx.fact_decomposition_settings. Also drops the now-unused kwargs from both _build_agent_context signatures.

Notes on other items:

All 672 files formatted; all unit tests pass (14 libs + 7 services + 2 plugins).

@charlie83Gs charlie83Gs merged commit 0b8efc2 into main Apr 21, 2026
30 checks passed
@charlie83Gs charlie83Gs deleted the feat/backend-engine-fact-decomposition-plugin branch April 21, 2026 15:17
@github-actions
Copy link
Copy Markdown


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant