Design proposal: ModelOutputThunk structural cleanup (#909) #1197

ajbozarth · 2026-06-03T22:58:16Z

ajbozarth
Jun 3, 2026
Maintainer

This Discussion is the design review for #909 (ModelOutputThunk structural cleanup). The issue itself was filed in April as a follow-up to #793 and has aged through the streaming-validation epic (#891, now closed) and the in-flight tracing refactor (#1181, approved). The shape of the cleanup has shifted with the codebase, and one related issue (#1191) is scoped closely enough that it should be folded in.

Posting here rather than on the issue so each area and the appendix can thread independently. Per-area open questions live at the bottom of each area's comment.

How to comment

Per-area details (with their open questions) and the bug appendix are in the comments below — reply to the relevant comment for scoped feedback.
For overall shape / sequencing / scope feedback, reply at the top level of this Discussion.
CC @jakelorocco @planetf1 @psschwei specifically since you were part of the prior discussion

TL;DR

Area 1 (raw responses): move the backend _meta[...] keys onto a typed mot.raw: RawProviderResponse namespace. Migrate chat.py:_parse to switch on mot.raw.provider.
Area 2 (telemetry_span): already resolved by #1181, "refactor(telemetry)!: move backend tracing onto plugin/hook pattern" (approved, pending merge). Drops out of scope on merge.
Area 3 (computation logistics): split the seventeen private attrs into two internal sub-objects, mot._call (originating call, preserved across copies) and mot._gen (in-flight machinery, reset on copy). Make explicit the preserve-vs-reset semantics that __deepcopy__ already implies. Keep mot._generate_log exposed (see Area 5); do not bury it.
Area 4 (_thinking): promote to public mot.thinking, with a property alias for _thinking through one deprecation cycle.
Area 5 (new; folds in #1191, "Soft-fail errors on ModelOutputThunk are only reachable via a private attribute"): give MOT a public surface for errors and the generate log. Specifically, expose mot.generate_log as a public property, and add a public mot.error channel so callers can detect a soft-failed thunk without reading private state.
Coordinated, not merged in: #1013, the Phase 2 streaming chunking placeholder (Area 3's shape should accommodate it); #123, "Standardize logits in MoT" (logits should land inside mot.generation per the #793 precedent, not in mot.raw); #269, "ModelOutputThunk should not be a CBlock" (orthogonal and out of scope; called out so reviewers know).

State of the world today

This is based on reading mellea/core/base.py directly, contrasted against #909's April field list because the class has shifted since then.

| #909 area | Resolution today |
| --- | --- |
| 1. Raw provider responses | Unchanged. `_meta` still carries `chat_response`, `oai_chat_response[_choice/_streamed]`, `oai_streaming_usage`, `litellm_full_response`, `litellm_chat_response[_streamed]`, `litellm_streaming_usage`, `hf_output`, etc. `chat.py:_parse` still branches on these keys. |
| 2. `telemetry_span` | Pending #1181 (approved 2026-06-03, not yet merged). On `feat/enhanced-tracing`, `_meta["_telemetry_span"]` is gone from `base.py` and all five backends; spans live on the OTel context owned by `BackendTracingPlugin`, and `cancel_generation` / `astream` error path fire `GENERATION_ERROR` hooks instead of poking `_meta`. Drop from #909 once #1181 merges. |
| 3. Computation logistics | Unchanged in shape, but #908 added `mot.generation` (precedent for sub-structures), #942 added `cancel_generation()` + `_cancelled` + public `cancelled` property (locked in the single-consumer queue contract), and #1181 will add `_generation_id` as another flat private attr. |
| 4. `_thinking` | Unchanged. Still private-by-name, public-by-docs. |

The audit also surfaced things #909 didn't enumerate:

_generation_id (post-#1181): Mellea-side hook correlation ID. Joins the Area 3 field set.
__deepcopy__ already groups fields implicitly. It preserves _action, _context, _model_options, _generate_log, generation (originating-call data) and resets the in-flight machinery. Any area-3 grouping should mirror this distinction, not collapse it.
_generate_log is heavily relied on as private surface: 55 external accesses across 19 files (every backend, every sampling strategy, functional.py, session.py, multiple test files). It is not movable without a property alias, and #1191 explicitly wants it more visible, not less.
mot.cancelled exists but streaming.py:437 still reads mot._cancelled. See appendix.
Stale _meta["usage"] reads in budget_forcing_alg.py predate the mot.generation.usage move. See appendix.

Cross-cutting constraints

These shape every area below.

_meta is CBlock surface, not just MOT surface. This proposal does not change CBlock._meta; it only removes MOT's own writes into the dict. Reshaping CBlock is out of scope, and #269 (which would moot the question by removing the inheritance) is also out of scope.
The public API is wider than the field names suggest. The public mot.value, mot.parsed_repr, mot.tool_calls, mot.generation, mot.cancelled, and mot.is_computed(), along with the nominally private mot._thinking, mot._meta["chat_response" | "oai_chat_response" | "usage"], and mot._generate_log, are all read by code outside base.py today. Any move needs property aliases through at least one minor release.
Streaming validation locks in current shape. cancel_generation(), mot._async_queue single-consumer semantics, and mot.cancelled are load-bearing for stream_with_chunking() (#942, merged May 2026). Area 3 cannot rename in a way that breaks those names without coordinated migration.
Phase 2 streaming chunking (#1013) is a placeholder. The Phase 2 design hasn't been written; #1013 has two comments and no code. Area 3's _gen shape should leave room for parsed-repr stream buffers but should not block on #1013, since that issue could remain open indefinitely.
#1181 should land first, both because Area 2 resolves on its merge and because it adds _generation_id, which Area 3's grouping needs to know about.
Mellea is pre-1.0. Property aliases with DeprecationWarning are the right migration tool, but a long deprecation horizon isn't required.

Related-issues coordination summary

| Issue | Disposition |
| --- | --- |
| #1181 | Resolves Area 2 on merge. Area 3 inherits `_generation_id` from this PR. |
| #1191 | Folded into #909 as Area 5. Closed by Area 5's two changes. |
| #1013 | Coordinated, not blocking. `_gen` shape in Area 3 should accommodate Phase 2 streaming buffers. Land Area 3 first; #1013 benefits. |
| #123 (logits) | Not in #909's scope. Flagged: when implemented, logits go in `mot.generation` (e.g. `generation.logprobs`), not `mot.raw`. Per #793 precedent. |
| #269 (MOT-not-CBlock) | Out of scope. Flagged so reviewers know I'm aware. If acted on later, Area 1's `mot.raw` is unaffected (already a sibling of `_meta`, not nested under it); the rest of #909 is also unaffected. |

Sequencing

If this shape is approved, I'll convert #909 into an epic with sub-issues per area, fold in #1191 as Area 5, and land in this order:

#1181 lands first. Independent, already approved. After this, Area 2 is closed and _generation_id is in the Area 3 field set.
Area 5 (generate_log + error properties). Smallest. Closes #1191.
Area 1 (mot.raw). No dependency on 3/4/5. Can ship after #1181 merges.
Area 4 (mot.thinking). Independent. Same release as Area 1 is fine.
Area 3 (_call + _gen grouping) lands last. Largest internal rename. The original April rationale ("hold for #891") is satisfied: Phase 1 is done. Recommended to land now rather than wait for the #1013 placeholder to be fleshed out; the _gen shape is independently motivated by the existing copy-method maintenance pain.

Per-area details (each with its open questions) and the bug appendix are in the comments below.

ajbozarth · 2026-06-03T22:58:37Z

ajbozarth
Jun 3, 2026
Maintainer Author

Area 1 — raw provider responses

Today

Each backend writes a different bag of keys to mot._meta during streaming aggregation and post-processing:

Ollama: chat_response: ollama.ChatResponse
OpenAI: oai_chat_response, oai_chat_response_choice, oai_streaming_usage, oai_chat_response_streamed
LiteLLM: litellm_full_response, litellm_chat_response, litellm_chat_response_streamed, litellm_streaming_usage
Watsonx: oai_chat_response, oai_chat_response_choice, oai_chat_response_streamed (OpenAI-shape proxy)
HuggingFace: hf_output (transformers GenerateOutput / token tensor)

External readers: stdlib/components/chat.py:_parse (10 sites; switches on chat_response vs oai_chat_response), tests in test/backends/ and test/stdlib/components/. (budget_forcing_alg.py reads _meta["usage"] but that's stale code — see the appendix comment.)

Proposal

Add a typed mot.raw namespace as a sibling of mot.generation:

from dataclasses import dataclass
from typing import Any


@dataclass
class RawProviderResponse:
    """Backend-native response payload. Provider-coupling escape hatch.

    Anyone reading these fields is opting into provider-specific shape. Portable
    consumers should read `mot.value`, `mot.parsed_repr`, `mot.tool_calls`, or
    `mot.generation` instead.
    """
    provider: str | None = None         # mirrors generation.provider for self-description
    response: Any | None = None         # post-merge final response object
    streamed_chunks: list[Any] | None = None  # populated only for streaming

mot.raw.response replaces the merged chat_response / oai_chat_response / litellm_chat_response / hf_output keys.
mot.raw.streamed_chunks replaces the parallel *_streamed lists.
*_streaming_usage keys disappear entirely; that data is already in mot.generation.usage post-#793.
oai_chat_response_choice (a convenience alias to response.choices[0]) disappears. Callers can compute it.

chat.py:_parse becomes a switch on mot.raw.provider ("ollama" → response.message.role/content, "openai" | "watsonx" → response["choices"][0]["message"]…). The huggingface fallback path is unchanged.

Migration

Ship mot.raw as the new write target; have _meta[<old key>] become a DeprecationWarning-emitting __getitem__ shim on a custom dict for one minor release. Internal callers (chat.py, tests) move in the same PR.

Coordination notes

Logits (#123) are not part of mot.raw. Per the #793 precedent, logits are execution metadata and belong on mot.generation (e.g. generation.logprobs). Calling that out so #123 doesn't get parked in raw later.
Area 1 is independent of #1181. Can ship in either order.

Open questions on Area 1

OQ1: Does mot.raw.response carry typed sub-shapes, or is Any correct?

I lean toward Any. The whole point of raw is "you opted into provider coupling, here's the native object". Anyone wanting portable structured access should read mot.generation (which #1181 broadens with response_model, finish_reasons, response_id).

Counter-argument worth airing: a Protocol for the OpenAI-shape responses would let chat.py:_parse keep type safety on the openai/watsonx branch. Probably overkill for the gain, but raising for visibility.

OQ2: Should chat.py:_parse lose its raw-response branch entirely?

With mot.tool_calls and mot.value carrying canonical content, the only thing the raw branch in _parse recovers is the role (assistant vs other). If we add generation.response_role (or similar), _parse becomes provider-independent and mot.raw has no internal callers — pure escape hatch.

Cleaner; adds one generation field. Worth it? Or is provider-aware parsing in _parse fine as-is and we keep mot.raw as the access path?

3 replies

planetf1 Jun 4, 2026
Maintainer

Reading the _meta shim as read-only — backends migrate writes to mot.raw.response in the Area 1 PR, and internal reads (_parse, budget forcing) migrate in the same PR. The shim then covers only external consumer reads for one minor release. That makes sense given we control the backends. Worth confirming that is the intent.

OQ1: personally uncomfortable with Any almost anywhere, and a per-provider Union is technically possible — but each would just be a thin wrapper over types we do not own, so it adds a layer without adding information. Lean toward Any with the intent made explicit in the docstring: "typed access goes via mot.generation; mot.raw is an intentional escape hatch."

OQ2: the role from a chat completion response is always "assistant" across the APIs we support — thinking models do not change the response role, they put reasoning in a separate field. So hardcoding "assistant" in _parse is safe and the raw branch can be eliminated without needing response_role on GenerationMetadata. Separately, promoting mot.thinking to public is only half the story — Message has no thinking field and _parse ignores _thinking entirely, so reasoning is silently dropped in multi-turn chat. Filed #1201 on this — Area 4 should cross-reference it.

jakelorocco Jun 5, 2026
Maintainer

OQ1: Any seems good to me.
OQ2: We should confirm this. I think tool requests may technically have a "tool" role for some of the providers we support, etc...

Also, I would like to confirm: the raw response is the full response object returned, right? I see this line, which would indicate that: "oai_chat_response_choice (a convenience alias to response.choices[0]) disappears. Callers can compute it."

ajbozarth Jun 5, 2026
Maintainer Author

@planetf1 @jakelorocco — confirming OQ2 with the actual SDK type signatures.

Roles across providers:

OpenAI (ChatCompletionMessage.role): Literal["assistant"] — always "assistant".
Watsonx: OpenAI-shape proxy, same.
LiteLLM (Message.role): typed as Literal["assistant", "user", "system", "tool", "function"] but defaults to "assistant" in response init; chat completions return "assistant" in practice.
Ollama (Message.role): docstring is explicit — "Response messages has role 'assistant' or 'tool'." Ollama can legitimately return "tool".

So Jake's instinct was right — hardcoding "assistant" would silently change behavior for tool-returning Ollama responses.

OQ2 resolution: keep the raw-response branch in _parse (provider-aware on the role dimension), keep mot.raw as the access path, do not add generation.response_role.

Confirming Jake's other question: yes, mot.raw.response is the full SDK response object (e.g. ollama.ChatResponse, OpenAI ChatCompletion) — no per-field unpacking, that's the point of raw as the escape hatch.

Read-only _meta shim (planetf1's framing): correct. Backend writes go to mot.raw.response in the Area 1 PR; internal reads (_parse, budget forcing, tests) migrate in the same PR. The deprecation shim only covers external consumers reading _meta[<old key>] for one minor cycle, then the keys disappear.

ajbozarth · 2026-06-03T22:58:53Z

ajbozarth
Jun 3, 2026
Maintainer Author

Area 3 — computation logistics

Today

The seventeen private attrs (post-#1181) sit flat on the MOT and split naturally into three groups, one of which is already enforced by __deepcopy__:

# Originating call info — preserved (deep-copied) by __deepcopy__
_action, _context, _model_options, _generate_log, _generation_id

# In-flight machinery — reset by __deepcopy__
_async_queue, _chunk_size, _first_chunk_received,
_generate, _generate_type, _generate_extra,
_cancel_hook, _process, _post_process, _on_computed,
_start

# Computation status — flat, read by user code
_computed, _cancelled

Proposal

Two internal sub-objects matching the deepcopy semantics, plus _computed/_cancelled left flat (they back is_computed() and the public cancelled property and need to remain cheap):

from __future__ import annotations  # lets the ModelOutputThunk annotations stay unevaluated

import asyncio
import datetime
from collections.abc import Callable, Coroutine
from dataclasses import dataclass, field
from typing import Any

# Component, CBlock, GenerateType, and ModelOutputThunk are assumed to be in
# scope already in the module where these classes would be defined.


@dataclass
class _CallInfo:
    """Originating-call data. Preserved across copies because retries and
    sampling routinely need it."""
    action: Component | CBlock | None = None
    context: list[Component | CBlock] | None = None
    model_options: dict[str, Any] | None = None
    generation_id: str | None = None  # added by #1181

@dataclass
class _GenerationState:
    """In-flight computation machinery. Reset to a fresh empty instance on
    __copy__ / __deepcopy__ — a copied MOT is a distinct (non-generating)
    object and must not share queues, tasks, or thread signals."""
    queue: asyncio.Queue = field(default_factory=lambda: asyncio.Queue(maxsize=20))
    chunk_size: int = 3
    first_chunk_received: bool = False
    generate: asyncio.Task[None] | None = None
    generate_type: GenerateType = GenerateType.NONE
    generate_extra: asyncio.Task[Any] | None = None
    cancel_hook: Callable[[], None] | None = None
    process: Callable[[ModelOutputThunk, Any], Coroutine] | None = None
    post_process: Callable[[ModelOutputThunk], Coroutine] | None = None
    on_computed: Callable[[ModelOutputThunk], Coroutine] | None = None
    start: datetime.datetime | None = None

Note _generate_log is not in _CallInfo — see Area 5; it gets public exposure as mot.generate_log and stays accessible at that path.

__copy__ / __deepcopy__ deep-copy _call and reset _gen to a fresh _GenerationState(). The preserve-vs-reset rule that's currently enforced field-by-field becomes one line per method.

Migration

All mot._async_queue, mot._action, etc. accesses inside mellea/ move to mot._gen.queue, mot._call.action. These are all internal — backends, the streaming orchestrator, sampling — and migrate in the same PR. Tests get updated alongside.

For private attrs read by external callers, _action / _context / _model_options are not currently read outside mellea/, so no aliases are needed. _generate_log is the exception, addressed in Area 5.

Tradeoffs

Pro: the copy/deepcopy contract becomes "deep-copy _call, reset _gen", one line per method. Adding fields for #1013 lands inside _gen without three follow-on edits.
Pro: the existing __deepcopy__ distinction becomes self-documenting via the type split.
Con: wide internal rename. Mitigation: confined to mellea/ and test/.
Con: two sub-objects vs one is more API surface. Worth it because the preserve-vs-reset semantics genuinely differ.

Interaction with `ComputedModelOutputThunk`

ComputedModelOutputThunk reassigns __class__ in-place — this proposal does not change that contract. _call and _gen are inherited like any other attribute; _gen on a ComputedModelOutputThunk is just dormant (no in-flight task). Mentioned only because the rename touches every internal call site.

Coordination with #1013

#1013's stream_parsed_repr will likely add a parsed-repr stream buffer adjacent to the current text queue. With _gen in place, that's a new field on _GenerationState; without it, another flat private attr to remember in three copy methods. Recommendation: do not block Area 3 on #1013 — that issue could remain a placeholder for months. Land Area 3 first; #1013 inherits the cleaner shape.

Open questions on Area 3

OQ4: Does _gen reset (rather than copy) on __copy__ / __deepcopy__ match all current callers?

The current copy methods do reset generation plumbing field-by-field, so this should be a wash. The audit found no external code reading _async_queue etc. on a copied MOT.

Flagging this as an open question only because the change goes from "implicit, enforced by inspection" to "structural, enforced by _gen = _GenerationState()" — if any caller has been relying on weird subset-reset behavior, it would break. PR review is the place to catch that, but raising it here in case anyone has historical context.

3 replies

planetf1 Jun 4, 2026
Maintainer

OQ4: no concerns — this formalises what the copy methods already do implicitly.

jakelorocco Jun 5, 2026
Maintainer

This sounds good to me. I think losing _gen information on copy is fine since the computed flag will be set in those cases. We should probably disallow copying an uncomputed model output thunk?

ajbozarth Jun 5, 2026
Maintainer Author

We should probably disallow copying an uncomputed model output thunk?

Agreed, folding into Area 3 scope.

The contract is already implicitly "copies are post-generation": __copy__/__deepcopy__ reset all in-flight machinery and the docstring says "A copied ModelOutputThunk cannot be used for generation." Raising when _computed is False makes that rule explicit instead of letting a silent reset hide bugs.

Audit of actual copy sites in mellea/: nothing copies a ModelOutputThunk directly. The closest is streaming.py:438 doing copy(self._mot.generation) — that's the metadata sub-object, not the MOT. Sampling does deepcopy(action), never the thunk. So the blast radius for an uncomputed-copy guard is essentially zero in our own code.

The Area 3 issue will include: __copy__ / __deepcopy__ raise RuntimeError if self._computed is False, with a corresponding test in test/core/test_base.py alongside the existing post-computed copy assertions.

ajbozarth · 2026-06-03T22:59:03Z

ajbozarth
Jun 3, 2026
Maintainer Author

Area 4 — `_thinking`

Today

Private by naming convention (single underscore) but referenced publicly in docs/docs/integrations/openai.md:

If you intend to use thinking mode, read the reasoning trace from result._thinking rather than result.value.

Populated by all four chat backends (openai, litellm, watsonx, ollama). Not on mot.generation; not on mot.parsed_repr; not on mot.tool_calls. The underscore is the only path.

Proposal

Promote to public mot.thinking with _thinking kept as a deprecated alias for one minor release. Update docs/docs/integrations/openai.md. Backends keep writing through the property setter (which writes _thinking) until the deprecation cycle ends.

Considered alternatives

mot.generation.thinking — matches the "execution metadata" flavor of usage and finish_reasons. Rejected for now: reasoning is content the model produced, semantically closer to value than to usage; longer dotted path for a piece of content most users want directly.
mot.reasoning.* namespace — would give room for future structured reasoning data (encrypted reasoning, multiple streams, etc.). Rejected as speculative: reasoning is currently a flat string across all backends. mot.thinking doesn't preclude introducing mot.reasoning.* later if structured cases arrive.

Open questions on Area 4

OQ3: Deprecation horizon for _meta keys, _thinking, _generate_log.

Putting this on Area 4 because _thinking is the most user-visible of the deprecated names (published docs reference it), but the answer applies equally to the _meta keys retired in Area 1 and the _generate_log underscore retired in Area 5.

One minor or two? Pre-1.0 says one is fine; published docs reference result._thinking, which argues for caution. My lean: one minor with DeprecationWarning, then remove.

If there's any external (non-mellea-team) usage you're aware of, that pushes toward two. Otherwise one.

1 reply

planetf1 Jun 4, 2026
Maintainer

OQ3: one minor cycle. The DeprecationWarning mechanism and CONTRIBUTING_DOCS marker are already established — what is missing is a documented removal timeline. This would be a good opportunity to pin that down. Given _thinking is the most visible, one minor feels right, especially since the doc page gets updated in the same PR.

ajbozarth · 2026-06-03T22:59:14Z

ajbozarth
Jun 3, 2026
Maintainer Author

Area 5 (new) — fold in #1191: public error / generate_log surface

Why include this

#1191 is labeled needs-design, identifies the same root problem as #909 (the "reach into private state" pattern), and explicitly says "the right shape touches the public thunk API and should be consistent across backends, not just Ollama." Designing it separately would force two coordinated proposals that touch the same surface in the same release. Folding it in here is cheaper.

Today

When OllamaModelBackend.generate_from_raw soft-fails on an empty done-response (#1161), the affected slot returns a ModelOutputThunk with value="" and stashes the error at thunk._generate_log.extra["error"] (raw response at ["empty_response"]). A caller iterating gather results cannot distinguish a soft-failure from a legitimately empty completion without reading _generate_log (private) and knowing about ollama-specific keys (no precedent elsewhere).

_generate_log itself is private by convention but read by 55 sites across 19 files (every backend, every sampling strategy, functional.py, session.py, tests). It's de-facto public.

Proposal

Two changes:

Promote _generate_log to a public property mot.generate_log. No shape change to GenerateLog; just expose the existing private attr officially. The attr stays as _generate_log underneath; backends and sampling code keep writing to it (internal); external consumers read mot.generate_log. This addresses #1191's "no public accessor" concern at minimum cost.
Add a public mot.error: Exception | None property. Backed by an internal _error attr on the MOT. Backends that soft-fail set it. cancel_generation(error=...) already records its error at the OTel / hook layer; this gives the same information a public read path on the MOT itself. error is not None becomes the canonical "did this thunk soft-fail" check, sibling to mot.cancelled.

The ollama soft-fail path migrates from thunk._generate_log.extra["error"] to thunk.error. The extra["empty_response"] raw key is dropped — if the raw response is wanted, Area 1's mot.raw.response carries it.

The migration path for #1191's specific case: after both changes ship, the caller code is:

results = await asyncio.gather(*calls)
for r in results:
    if r.error is not None:
        # soft-failure
        ...
    elif r.cancelled:
        # explicit cancellation
        ...
    else:
        # ok
        use(r.value)

…which is what #1191 asks for. Closes #1191 as part of this work.

Coordination

Order: Area 5 can ship independently of Areas 1, 3, 4 — it adds public properties and one private attr, no rename.
#1191's "soft-fail vs cancel" decision (proposal direction 3 in #1191) is not settled here. We expose both channels (error and cancelled); whether to converge them or keep them distinct is OQ5 below.

Open questions on Area 5

OQ5: mot.error vs mot.cancelled — separate channels or unified?

#1191 floats merging them ("mark the soft-failed empty thunk cancelled, carrying the error").

I lean toward keeping them separate:

cancelled means "an active reader called cancel_generation" (deliberate, by some consumer)
error means "the backend produced an unusable result without raising" (involuntary, by the inference layer)

Conflating loses information that callers may want — was this a reader cancellation or a backend failure? Different remediation.

But there's a real case for converging too: from a "did this thunk produce a usable result" perspective, the answer is the same.

Worth resolving here before Area 5 implementation lands.

OQ6: Should mot.generate_log be read-only or mutable?

Internal code mutates _generate_log.is_final_result (sampling sets it to True on the chosen result). External code reading the property doesn't need write access.

Probably expose the attr by reference (mutation possible but discouraged); sampling stays on the underlying private name in mellea/ if we want to be strict about the public surface.

Alternative: expose a frozen view (e.g. dataclass replace-style) for external readers; keep _generate_log mutable for internal sampling. Heavier; probably not worth the complexity.

4 replies

planetf1 Jun 4, 2026
Maintainer

OQ5: keep them separate — cancelled means a consumer chose to stop, error means the backend failed. Conflating them loses information callers need to decide what to do next.

OQ6: return by reference. Internal code sets _generate_log.is_final_result directly on the GenerateLog object and will continue to — the proposed mot.generate_log property just exposes that object for external reading. A frozen copy would be surprising and adds complexity without solving a real problem.

jakelorocco Jun 5, 2026
Maintainer

OQ5: I like keeping them separate.
OQ6: Do we need generate_log anymore? My understanding is that most of the information in that is duplicated in the MOT already. Or if we need those fields, maybe we should break them out further into some other field of MOT. generate_log is mostly a relic of our initial implementation.

ajbozarth Jun 5, 2026
Maintainer Author

Do we need generate_log anymore? My understanding is that most of the information in that is duplicated in the MOT already.

@jakelorocco you're largely right — field-by-field audit:

| `GenerateLog` field | Duplicated? | Where |
| --- | --- | --- |
| `model_options` | yes | `mot._model_options` (→ `mot._call.model_options` after Area 3) |
| `model_output` | yes | `mot.raw.response` (Area 1) |
| `action` | yes | `mot._action` (→ `mot._call.action` after Area 3) |
| `result` | yes | circular ref back to the MOT |
| `backend` | partial | `mot.generation.provider` + `mot.generation.model` (different shape: `"litellm::granite"` vs `provider="litellm"` + `model="granite"`) |
| `date` | no | no datetime on `generation` (only `ttfb_ms`) |
| `prompt` | no | the formatted prompt-as-sent isn't anywhere on MOT today |
| `is_final_result` | no | sampling state |
| `extra` | no | free-form; only systematic use is the ollama error stash, which Area 5's `mot.error` already replaces |

Three fields have a unique home today: prompt, date, is_final_result. Everything else is redundant or migrating in this very proposal.

This is a real decision that gates opening the Area 5 sub-issue, so flagging two options:

Option A — keep Area 5 narrow. Ship mot.error + the mot.generate_log property as proposed. Close #1191. File GenerateLog deprecation as a separate follow-up issue.

Option B — expand Area 5. Fold the deprecation in. Add mot.generation.prompt, mot.generation.timestamp, and a flat mot._is_final_result; deprecate GenerateLog, the public m.validate(generate_logs=...) parameter, and _generate_log together; drop extra entirely.

I lean A. The deprecation touches every backend's post_processing, sampling's is_final_result plumbing, the public validate(generate_logs=...) API in session.py and functional.py, and the test_vision_* tests that read _generate_log.prompt. Bigger than Area 5 should swallow, and #1191 closes either way.

@jakelorocco @planetf1 — does A work for you, or do you want this rolled into Area 5?

jakelorocco Jun 5, 2026
Maintainer

Okay, I'm fine with Option A. I think we may eventually want to deprecate generate_log, but I'm willing to accept that that is a different question / work item.

ajbozarth · 2026-06-03T22:59:22Z

ajbozarth
Jun 3, 2026
Maintainer Author

Appendix A — bugs, stale code, and inconsistencies found during research

These surfaced during the audit but are not part of #909 itself. Listed here so they can either ride along in the relevant area's PR or be split out as separate fix PRs. None are blockers.

1. Stale _meta["usage"] reads in budget_forcing_alg.py. Lines 121, 152, 154-156, 192, 195, 197-199 in mellea/stdlib/sampling/sampling_algos/budget_forcing_alg.py. Predates refactor!: partition ModelOutputThunk execution metadata into Generat… #908 (mot.generation.usage). Should read result.generation.usage["completion_tokens"] etc. Independent bugfix, not an Epic: ModelOutputThunk structural cleanup #909 dependency. Suggested: separate fix(sampling) PR.
2. streaming.py:437 reads self._mot._cancelled instead of the public mot.cancelled property that feat(stdlib): add stream_with_chunking() with per-chunk validation (#901) #942 added. mellea/stdlib/streaming.py:437. One-line change. Suggested: ride along with whichever Epic: ModelOutputThunk structural cleanup #909 area touches streaming.py first (most likely Area 5 or Area 3).
3. chat.py:_parse falls through to computed.value for huggingface. mellea/stdlib/components/chat.py:144-148. The "no guarantees" branch is intentional (HF doesn't have a uniform raw shape), but the role is hardcoded to "assistant" and the content is the raw text without the role-extraction the other branches do. Worth confirming during Area 1 migration that this fallback is still desired with mot.raw.response in place, or whether a richer shape can be extracted from HF outputs.
4. oai_chat_response_choice is redundant. Always equals oai_chat_response["choices"][0]. Both backends (openai, watsonx) write it. Drop during Area 1.
5. chat.py:_parse huggingface tool-call branch returns Message(role="assistant", content=computed.value) without recovering tool call structure, even though the precondition computed.tool_calls is not None was already true. The string content will be the raw tool-call text the model emitted. Possibly desired; possibly should serialize computed.tool_calls instead. Flag during Area 1 migration.
6. _meta["_telemetry_span"] cleanup in cancel_generation and astream is duplicated with backend post_processing cleanup. Pre-refactor(telemetry)!: move backend tracing onto plugin/hook pattern #1181 only. Resolved by refactor(telemetry)!: move backend tracing onto plugin/hook pattern #1181's removal of the key entirely. No action.
7. functional.py asserts result._generate_log is not None in multiple places. Tightly couples to a private attr being set, with no public accessor. Resolved by Area 5's mot.generate_log exposure: these callers can either keep the underscore (internal code) or migrate.
8. Implicit deepcopy preserve-vs-reset rule has no tests. __deepcopy__'s field-by-field choices (preserve _action, reset _async_queue) are enforceable only by inspection. Area 3's _call / _gen split makes the rule structural, but a test for "deepcopy of a generating MOT is not itself generating" is worth adding in the same PR.
9. docs/docs/integrations/openai.md instructs users to read result._thinking. Resolved by Area 4. Doc gets updated to result.thinking in the same PR.
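
The missing deepcopy test could look roughly like the following. This is a sketch against toy stand-ins (`CallInfo`, `GenState`, and `ThunkSketch` are illustrative names, and the `generating` flag is an assumed simplification of the real in-flight state), so it shows the preserve-vs-reset rule rather than the actual mellea fields.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CallInfo:  # stand-in for _call: preserved across copies
    action: Any = None
    model_options: dict[str, Any] = field(default_factory=dict)


@dataclass
class GenState:  # stand-in for _gen: reset on copy
    generating: bool = False
    chunks: list[str] = field(default_factory=list)


class ThunkSketch:
    def __init__(self) -> None:
        self._call = CallInfo()
        self._gen = GenState()

    def __deepcopy__(self, memo: dict[int, Any]) -> ThunkSketch:
        copied = ThunkSketch()  # _gen stays the fresh default from __init__
        memo[id(self)] = copied
        copied._call = copy.deepcopy(self._call, memo)
        return copied


def test_deepcopy_of_generating_thunk_is_not_generating() -> None:
    original = ThunkSketch()
    original._call.model_options["temperature"] = 0.2
    original._gen.generating = True
    original._gen.chunks.append("partial")

    clone = copy.deepcopy(original)

    assert clone._call.model_options == {"temperature": 0.2}  # preserved
    assert clone._call is not original._call                  # but not shared
    assert clone._gen.generating is False                     # reset
    assert clone._gen.chunks == []


test_deepcopy_of_generating_thunk_is_not_generating()
```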

2 replies

planetf1 Jun 4, 2026
Maintainer

Item 2: streaming.py:437 reads self._mot._cancelled on the RHS instead of the public mot.cancelled property — fix is thunk._cancelled = self._mot.cancelled. One-line, no dependency on #909. Worth a standalone fix: PR.

Item 1: the framing here is worth revisiting. Budget forcing exclusively calls generate_from_raw, and the raw paths were never migrated in #908: Ollama's raw path still writes _meta["usage"] directly, so these reads are not stale. The real issue is the inconsistency: chat paths surface token usage on mot.generation.usage, raw paths do not. A caller cannot reliably read one field regardless of which path produced the thunk. Worth migrating raw paths to mot.generation.usage for uniformity, though not urgent. Separate latent risk: Ollama can return eval_count=None; budget forcing does arithmetic on those values without a None guard (lines 121, 192), which would raise a TypeError in edge cases.
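
A None guard for the arithmetic could be as small as the sketch below. `completion_tokens` is a hypothetical helper, the `"completion_tokens"` key is assumed from the appendix item, and treating a missing count as 0 is only one possible policy (raising or skipping the budget check are others).

```python
from __future__ import annotations

from typing import Any, Mapping


def completion_tokens(usage: Mapping[str, Any] | None) -> int:
    """Return a usable completion-token count, treating missing or None as 0."""
    if usage is None:
        return 0
    value = usage.get("completion_tokens")
    return int(value) if value is not None else 0


# Arithmetic on the guarded value no longer fails when the backend reports None.
remaining = 512 - completion_tokens({"completion_tokens": None})
```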

ajbozarth Jun 5, 2026
Maintainer Author

You're right — I had this framed wrong. Confirmed on main and on feat/enhanced-tracing:

mot.generation.usage is only written in the chat path (post_processing of generate_from_chat_context).
generate_from_raw across all five backends still writes only _meta["usage"] — never migrated in refactor!: partition ModelOutputThunk execution metadata into Generat… #908.
refactor(telemetry)!: move backend tracing onto plugin/hook pattern #1181 leaves the raw-path _meta["usage"] write in place too, so this isn't a "post-merge mot.generation.usage will fix it" situation.

So the budget-forcing reads aren't stale; they're consistent with what the raw paths actually write today. The real issue is the chat-vs-raw uniformity gap: a caller can't read mot.generation.usage reliably regardless of which path produced the thunk.

Reframing Item 1 for the follow-up: not "fix stale _meta[\"usage\"] reads in budget forcing" but "migrate raw-path token usage from _meta[\"usage\"] to mot.generation.usage across all backends, then update budget forcing to read the unified field." Plus the latent eval_count is None bug on lines 121/192 — separate fix: PR, doesn't depend on the migration.

Both will be filed as separate follow-up issues (not Area 1 sub-issue scope) when the sub-issues go out.
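
Until the raw paths are migrated, a caller that needs token usage from either path would have to check both locations. A minimal sketch of that fallback, assuming only the two locations named above (`read_usage` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real thunks):

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def read_usage(mot: Any) -> dict[str, Any] | None:
    """Prefer mot.generation.usage; fall back to the raw-path _meta["usage"]."""
    generation = getattr(mot, "generation", None)
    usage = getattr(generation, "usage", None)
    if usage is not None:
        return usage
    meta = getattr(mot, "_meta", None) or {}
    return meta.get("usage")


# Stand-ins for a chat-path thunk and a raw-path thunk.
chat_mot = SimpleNamespace(
    generation=SimpleNamespace(usage={"completion_tokens": 12}), _meta={}
)
raw_mot = SimpleNamespace(
    generation=SimpleNamespace(usage=None), _meta={"usage": {"completion_tokens": 30}}
)
```

Once the migration lands the fallback branch becomes dead code, which is the point of unifying on one field.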

planetf1 · 2026-06-04T11:04:49Z

planetf1
Jun 4, 2026
Maintainer

Overall shape and sequencing look good — #1181 first, then Area 5, then 1/4, then 3 makes sense. A few per-area replies below. One thing being filed separately: promoting mot.thinking to public in Area 4 is only half the story — Message has no thinking field and reasoning is silently dropped in multi-turn chat. That warrants its own issue so Area 4 does not land looking complete when it is not. Filed as #1201.

0 replies

psschwei · 2026-06-04T16:45:52Z

psschwei
Jun 4, 2026
Maintainer

Do you have a stubbed out version of what the updated MOT class would look like?

2 replies

ajbozarth Jun 5, 2026
Maintainer Author

Here's the post-#909 __init__. The new sub-object dataclasses are defined in the per-area threads above; this is just where each lands on the MOT itself.

class ModelOutputThunk(CBlock, Generic[S]):
    def __init__(self, value, meta=None, parsed_repr=None, tool_calls=None):
        super().__init__(value, meta)

        # Public output surface
        self.parsed_repr = parsed_repr
        self.tool_calls = tool_calls
        self.thinking: str | None = None             # Area 4 (was _thinking)
        self.error: Exception | None = None          # Area 5, NEW
        self.generate_log: GenerateLog | None = None # Area 5 (was _generate_log)
        self.generation = GenerationMetadata()       # backend execution metadata (#793)
        self.raw = RawProviderResponse()             # Area 1, NEW

        # Status flags stay flat: hot-path reads that back is_computed() and the cancelled property
        self._computed = value is not None
        self._cancelled = False

        # Internal grouping (Area 3)
        self._call = _CallInfo()                     # preserved on copy
        self._gen = _GenerationState()               # reset on copy

That's the steady state after the deprecation cycle. Copy semantics fall out structurally: _call is deep-copied, _gen is reinitialized to its __init__ default. The seventeen flat private attrs from today collapse to two status flags, and __copy__/__deepcopy__ go from enumerating 15+ fields to one line each.

Full stub including GenerationMetadata, the new dataclasses, properties, and __copy__

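from __future__ import annotations  # lets __copy__ name ModelOutputThunk in its own annotation

import asyncio
import datetime
from collections.abc import Callable
from copy import copy
from dataclasses import dataclass, field
from typing import Any, Generic

# CBlock, Component, GenerateType, and the S type variable are existing mellea names; their imports are elided here.


@dataclass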
class GenerationMetadata:
    """Backend execution metadata. (#793, broadened by #1181.)"""
    usage: dict[str, Any] | None = None
    model: str | None = None
    provider: str | None = None
    ttfb_ms: float | None = None
    streaming: bool = False
    response_model: str | None = None       # added by #1181
    finish_reasons: list[str] | None = None # added by #1181
    response_id: str | None = None          # added by #1181
    # Future: logprobs (#123)


@dataclass
class RawProviderResponse:                  # NEW — Area 1
    provider: str | None = None
    response: Any | None = None
    streamed_chunks: list[Any] | None = None


@dataclass
class _CallInfo:                            # NEW — Area 3 (internal, preserved on copy)
    action: Component | CBlock | None = None
    context: list[Component | CBlock] | None = None
    model_options: dict[str, Any] | None = None
    generation_id: str | None = None        # added by #1181


@dataclass
class _GenerationState:                     # NEW — Area 3 (internal, reset on copy)
    queue: asyncio.Queue = field(default_factory=lambda: asyncio.Queue(maxsize=20))
    chunk_size: int = 3
    first_chunk_received: bool = False
    generate: asyncio.Task | None = None
    generate_type: GenerateType = GenerateType.NONE
    generate_extra: asyncio.Task | None = None
    cancel_hook: Callable | None = None
    process: Callable | None = None
    post_process: Callable | None = None
    on_computed: Callable | None = None
    start: datetime.datetime | None = None


class ModelOutputThunk(CBlock, Generic[S]):
    # ... __init__ as shown above ...

    # Existing public surface unchanged: value, cancelled, is_computed(),
    # avalue(), astream(), cancel_generation()

    def __copy__(self) -> ModelOutputThunk:
        copied = ModelOutputThunk(self._underlying_value, self._meta, self.parsed_repr, self.tool_calls)
        copied._computed = self._computed
        copied._cancelled = self._cancelled
        copied.thinking = self.thinking
        copied.error = self.error
        copied.generate_log = self.generate_log
        copied.generation = copy(self.generation)
        copied.raw = copy(self.raw)
        copied._call = copy(self._call)              # preserved
        # _gen left as the fresh _GenerationState() from __init__ — reset on copy
        return copied
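
The stub above elides the properties. One possible shape for the Area 4 `_thinking` alias during the deprecation cycle is sketched below on a toy class (`ThunkSketch` is an illustrative name). Emitting a `DeprecationWarning` on both read and write is an assumption here; the proposal only says there is a property alias.

```python
from __future__ import annotations

import warnings


class ThunkSketch:
    """Toy stand-in for ModelOutputThunk; only the alias is shown."""

    def __init__(self) -> None:
        self.thinking: str | None = None

    @property
    def _thinking(self) -> str | None:
        warnings.warn(
            "_thinking is deprecated; use thinking",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.thinking

    @_thinking.setter
    def _thinking(self, value: str | None) -> None:
        warnings.warn(
            "_thinking is deprecated; use thinking",
            DeprecationWarning,
            stacklevel=2,
        )
        self.thinking = value


mot = ThunkSketch()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mot._thinking = "step 1"   # old spelling still works, but warns
    legacy_read = mot._thinking
```

Because the alias stores nothing of its own, `__copy__` only has to carry `thinking`, as in the stub.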

psschwei Jun 5, 2026
Maintainer

thanks, this is helpful

jakelorocco · 2026-06-05T18:14:51Z

jakelorocco
Jun 5, 2026
Maintainer

I think this proposal looks good! Happy with the direction and think that it's thought out; I don't see any blockers to implementing. Thank you!

0 replies

ajbozarth · 2026-06-05T21:36:30Z

ajbozarth
Jun 5, 2026
Maintainer Author

Thank you for all the feedback. The issues are now all open and the epic is updated.

Further design questions on individual areas should go on the relevant sub-issue. Closing this discussion as resolved.

0 replies

Replies: 9 comments · 15 replies
