
Feature Request: Expose Model Internal Reasoning States (logits, attention, token-level trace) #5360

@tandenghui

Description

Hi LocalAI team 👋,

I'd like to request a feature that would greatly enhance the interpretability and debuggability of LocalAI models: the ability to expose the internal reasoning process during text generation.

Problem:
Currently, LocalAI only returns the final generated output (tokens or text), which limits insight into how the model arrived at its response.

There is no way to access intermediate model states such as the following (a possible request/response shape is sketched after the list):

- logprobs or top-k token scores at each decoding step
- attention weights per layer/head
- hidden states / intermediate token embeddings
- any kind of token-level reasoning trace
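
To make the first item concrete, here is a minimal sketch of what the feature could look like if LocalAI mirrored the OpenAI-style `logprobs` / `top_logprobs` chat-completions fields. The endpoint URL, model name, and field support are assumptions for illustration only, not current LocalAI behavior (the lack of which is exactly what this issue is about):

```python
# Hedged sketch: requesting per-token logprobs from an OpenAI-compatible endpoint.
# The URL, model name, and logprobs fields are assumptions, not confirmed LocalAI support.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed local LocalAI address
    json={
        "model": "my-local-model",                # hypothetical model name
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "logprobs": True,        # proposed: return per-token log-probabilities
        "top_logprobs": 5,       # proposed: return the top-5 alternatives per step
    },
    timeout=60,
)
resp.raise_for_status()
choice = resp.json()["choices"][0]

# If the feature existed, each generated token could carry its logprob plus the
# top-k alternatives the model considered at that decoding step.
for tok in (choice.get("logprobs") or {}).get("content", []):
    alts = ", ".join(f"{t['token']!r}: {t['logprob']:.2f}" for t in tok.get("top_logprobs", []))
    print(f"{tok['token']!r}  logprob={tok['logprob']:.2f}  top-k: [{alts}]")
```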

This makes it hard to:

- debug model behavior
- understand model uncertainty (one possible use of exposed logprobs is sketched after this list)
- build explainable AI systems (e.g. chain-of-thought visualization, step-by-step validation)
- evaluate how model biases or hallucinations might arise
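
As one example of the uncertainty use case, here is a minimal sketch of how token-level uncertainty could be estimated from exposed top-k logprobs. The response shape is assumed (OpenAI-style `{"token": str, "logprob": float}` entries), and the numbers are made up for illustration:

```python
# Hedged sketch: estimating per-step uncertainty from top-k logprobs.
import math

def token_entropy(top_logprobs):
    """Shannon entropy (bits) over the reported top-k alternatives.

    Higher entropy means the model was less certain at that decoding step.
    This is only an approximation, since probability mass outside the
    reported top-k is not included.
    """
    probs = [math.exp(t["logprob"]) for t in top_logprobs]
    total = sum(probs) or 1.0
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)

# Made-up top-k entries for one decoding step, for illustration only:
step = [
    {"token": "blue", "logprob": -0.1},
    {"token": "clear", "logprob": -2.5},
    {"token": "bright", "logprob": -3.0},
]
print(f"entropy ~= {token_entropy(step):.2f} bits")  # low entropy -> confident step
```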
