Response from model is replaced by query sent to model #445

@MelanieT

Description

LocalAGI: 2.8.0 (also reproduced on master)
Jetson Orin NX, using the container localai/localai:latest-nvidia-l4t-arm64 for LocalAI

LocalAI itself, using its web UI, works correctly.

With LocalAGI, the response is replaced by the query. This happens on almost every query; the few times I observed a correct response, it was the very first query after a restart.
This can be observed both through the "chat" functionality in LocalAGI and through the OpenAI-compatible API calls.

Relevant information:

API call:
curl http://localhost:8080/v1/responses -H "Content-Type: application/json" -d '{"model": "Nova", "input": "Tell me a three sentence bedtime story about a unicorn."}'

API Response:
{"id":"dce1e717-af54-46fa-8f8c-0a489b761fdd","object":"response","created_at":1772662092,"status":"completed","error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":null,"model":"Nova","output":[{"type":"message","id":"msg_1772662092683299122","status":"completed","role":"assistant","content":[{"type":"output_text","text":"Tell me a three sentence bedtime story about a unicorn.","annotations":null}]}],"parallel_tool_calls":false,"previous_response_id":null,"reasoning":{},"store":false,"temperature":0,"text":{},"tool_choice":"","tools":null,"top_p":0,"truncation":"","usage":{"input_tokens":0,"input_tokens_details":{"cached_tokens":0},"output_tokens":0,"output_tokens_details":{"cached_tokens":0},"total_tokens":0},"user":null,"metadata":null}
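The echo can be confirmed programmatically. The sketch below parses the response body pasted above (trimmed to the fields needed here) and shows that the assistant's `output_text` is character-for-character the original query; the field path `output[0].content[0].text` follows the OpenAI Responses API shape returned by LocalAGI.

```python
import json

# Trimmed copy of the failing /v1/responses body pasted above.
response = json.loads("""{
  "id": "dce1e717-af54-46fa-8f8c-0a489b761fdd",
  "object": "response",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{
      "type": "output_text",
      "text": "Tell me a three sentence bedtime story about a unicorn."
    }]
  }]
}""")

query = "Tell me a three sentence bedtime story about a unicorn."

# Walk output -> message -> content -> output_text.
text = response["output"][0]["content"][0]["text"]

# The bug: the "assistant" text is the user's query, verbatim,
# even though the response reports status "completed".
print(text == query)  # → True
```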

Relevant portion of log:
localai-1 | Mar 04 22:08:12 DEBUG GRPC stdout id="gemma-3-4b-it-qat-127.0.0.1:37881" line="[DEBUG] Received 1 results" caller={caller.file="/build/pkg/model/process.go" caller.L=162 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stderr id="gemma-3-4b-it-qat-127.0.0.1:37881" line=" total time = 3216.38 ms / 79 tokens" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stdout id="gemma-3-4b-it-qat-127.0.0.1:37881" line="[DEBUG] Predict request completed successfully" caller={caller.file="/build/pkg/model/process.go" caller.L=162 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stderr id="gemma-3-4b-it-qat-127.0.0.1:37881" line="slot release: id 0 | task 38 | stop processing: n_tokens = 78, truncated = 0" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stderr id="gemma-3-4b-it-qat-127.0.0.1:37881" line="srv update_slots: all slots are idle" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
localai-1 | Mar 04 22:08:12 DEBUG Response response="{"created":1772662089,"object":"chat.completion","id":"ae9eb1fe-917b-4f10-bce6-201a01048bbd","model":"gemma-3-4b-it-qat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Once upon a time, in a field of shimmering moonlight, lived a unicorn named Lumi. Lumi loved to dance among the fireflies, leaving trails of sparkling dust wherever she went. As she drifted off to sleep, she dreamed of flying through the stars with her friends, safe and sound."}}],"usage":{"prompt_tokens":20,"completion_tokens":59,"total_tokens":79}}" caller={caller.file="/build/core/http/endpoints/openai/chat.go" caller.L=780 }
localai-1 | Mar 04 22:08:12 INFO HTTP request method="POST" path="/chat/completions" status=200 caller={caller.file="/build/core/http/app.go" caller.L=118 }
localagi-1 | Mar 04 22:08:12 DEBUG Long term memory is disabled agent="Nova" caller={caller.file="/work/core/agent/knowledgebase.go" caller.L=133 }
localagi-1 | Mar 04 22:08:12 ERROR Observable completed without any progress id=1 name="job" caller={caller.file="/work/core/types/observable.go" caller.L=52 }
localagi-1 | Mar 04 22:08:12 DEBUG Agent is now waiting for a new job agent="Nova" caller={caller.file="/work/core/agent/agent.go" caller.L=1421 }
localagi-1 | Mar 04 22:08:12 DEBUG Agent has finished agent="Nova" caller={caller.file="/work/core/agent/agent.go" caller.L=245 }
localagi-1 | Mar 04 22:08:12 DEBUG Agent has finished being asked agent="Nova" caller={caller.file="/work/core/agent/agent.go" caller.L=220 }
localagi-1 | Mar 04 22:08:12 INFO we got a response from the agent agent="Nova" response="Tell me a three sentence bedtime story about a unicorn." caller={caller.file="/work/webui/app.go" caller.L=641 }
localrecall-postgres-1 | 2026-03-04 22:08:22.207 UTC [81] LOG: checkpoint complete: wrote 216 buffers (1.3%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 1 recycled; write=21.668 s, sync=0.031 s, total=21.724 s; sync files=32, longest=0.009 s, average=0.001 s; distance=1672 kB, estimate=1672 kB; lsn=0/207B428, redo lsn=0/207B398

The relevant portion seems to be this:
Mar 04 22:08:12 ERROR Observable completed without any progress id=1 name="job" caller={caller.file="/work/core/types/observable.go" caller.L=52 }

This is the only message of type ERROR in the log.

The log shows the response actually received from LocalAI; it is correct for the query in every case. Yet in most cases the text returned to the client is the query instead.
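The "Observable completed without any progress" error suggests the job's observable finishes without the completion payload ever being published, and the caller then falls back to the last message it holds, which is the user's query. A minimal sketch of that suspected pattern follows; all names (`Observable`, `result_or_fallback`) are hypothetical illustrations, not LocalAGI's actual code.

```python
# Hypothetical sketch of the suspected failure mode; this is NOT
# LocalAGI's implementation, only an illustration of the symptom.
class Observable:
    def __init__(self):
        self.progress = []      # chunks emitted while the job runs
        self.completion = None  # final result, set on success

    def result_or_fallback(self, fallback):
        if self.completion is not None:
            return self.completion
        if self.progress:
            return "".join(self.progress)
        # Nothing was ever published: a naive caller returns its
        # fallback -- in this bug, the original user query.
        print('ERROR Observable completed without any progress')
        return fallback

obs = Observable()  # job finished, but nothing was published to it
reply = obs.result_or_fallback(
    "Tell me a three sentence bedtime story about a unicorn.")
print(reply)  # the query comes back as the "response"
```

If something like this is happening, the interesting question is why the LocalAI response (which the log shows arriving intact) never reaches the observable before it completes.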