Response from model is replaced by query sent to model #445

@MelanieT

Description

LocalAGI: 2.8.0 (also reproduced on master)
Jetson Orin NX, using the container localai/localai:latest-nvidia-l4t-arm64 for LocalAI

LocalAI itself, using its web UI, works correctly.

With LocalAGI, the response is replaced by the query. This happens on almost every query; the few times I observed a correct response, it was the very first query after a restart.
This can be observed both through the "chat" functionality in LocalAGI and through the OpenAI-compatible API calls.

Relevant information:

API call:
curl http://localhost:8080/v1/responses -H "Content-Type: application/json" -d '{"model": "Nova", "input": "Tell me a three sentence bedtime story about a unicorn."}'

API Response:
{"id":"dce1e717-af54-46fa-8f8c-0a489b761fdd","object":"response","created_at":1772662092,"status":"completed","error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":null,"model":"Nova","output":[{"type":"message","id":"msg_1772662092683299122","status":"completed","role":"assistant","content":[{"type":"output_text","text":"Tell me a three sentence bedtime story about a unicorn.","annotations":null}]}],"parallel_tool_calls":false,"previous_response_id":null,"reasoning":{},"store":false,"temperature":0,"text":{},"tool_choice":"","tools":null,"top_p":0,"truncation":"","usage":{"input_tokens":0,"input_tokens_details":{"cached_tokens":0},"output_tokens":0,"output_tokens_details":{"cached_tokens":0},"total_tokens":0},"user":null,"metadata":null}
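The echo can be confirmed programmatically. The sketch below parses the response body pasted above (trimmed to the fields needed here) and shows that the assistant's `output_text` is character-for-character the original query; the field path `output[0].content[0].text` follows the OpenAI Responses API shape returned by LocalAGI.

```python
import json

# Trimmed copy of the failing /v1/responses body pasted above.
response = json.loads("""{
  "id": "dce1e717-af54-46fa-8f8c-0a489b761fdd",
  "object": "response",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{
      "type": "output_text",
      "text": "Tell me a three sentence bedtime story about a unicorn."
    }]
  }]
}""")

query = "Tell me a three sentence bedtime story about a unicorn."

# Walk output -> message -> content -> output_text.
text = response["output"][0]["content"][0]["text"]

# The bug: the "assistant" text is the user's query, verbatim,
# even though the response reports status "completed".
print(text == query)  # → True
```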

Relevant portion of log:
localai-1 | Mar 04 22:08:12 DEBUG GRPC stdout id="gemma-3-4b-it-qat-127.0.0.1:37881" line="[DEBUG] Received 1 results" caller={caller.file="/build/pkg/model/process.go" caller.L=162 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stderr id="gemma-3-4b-it-qat-127.0.0.1:37881" line=" total time = 3216.38 ms / 79 tokens" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stdout id="gemma-3-4b-it-qat-127.0.0.1:37881" line="[DEBUG] Predict request completed successfully" caller={caller.file="/build/pkg/model/process.go" caller.L=162 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stderr id="gemma-3-4b-it-qat-127.0.0.1:37881" line="slot release: id 0 | task 38 | stop processing: n_tokens = 78, truncated = 0" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stderr id="gemma-3-4b-it-qat-127.0.0.1:37881" line="srv update_slots: all slots are idle" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
localai-1 | Mar 04 22:08:12 DEBUG Response response="{"created":1772662089,"object":"chat.completion","id":"ae9eb1fe-917b-4f10-bce6-201a01048bbd","model":"gemma-3-4b-it-qat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Once upon a time, in a field of shimmering moonlight, lived a unicorn named Lumi. Lumi loved to dance among the fireflies, leaving trails of sparkling dust wherever she went. As she drifted off to sleep, she dreamed of flying through the stars with her friends, safe and sound."}}],"usage":{"prompt_tokens":20,"completion_tokens":59,"total_tokens":79}}" caller={caller.file="/build/core/http/endpoints/openai/chat.go" caller.L=780 }
localai-1 | Mar 04 22:08:12 INFO HTTP request method="POST" path="/chat/completions" status=200 caller={caller.file="/build/core/http/app.go" caller.L=118 }
localagi-1 | Mar 04 22:08:12 DEBUG Long term memory is disabled agent="Nova" caller={caller.file="/work/core/agent/knowledgebase.go" caller.L=133 }
localagi-1 | Mar 04 22:08:12 ERROR Observable completed without any progress id=1 name="job" caller={caller.file="/work/core/types/observable.go" caller.L=52 }
localagi-1 | Mar 04 22:08:12 DEBUG Agent is now waiting for a new job agent="Nova" caller={caller.file="/work/core/agent/agent.go" caller.L=1421 }
localagi-1 | Mar 04 22:08:12 DEBUG Agent has finished agent="Nova" caller={caller.file="/work/core/agent/agent.go" caller.L=245 }
localagi-1 | Mar 04 22:08:12 DEBUG Agent has finished being asked agent="Nova" caller={caller.file="/work/core/agent/agent.go" caller.L=220 }
localagi-1 | Mar 04 22:08:12 INFO we got a response from the agent agent="Nova" response="Tell me a three sentence bedtime story about a unicorn." caller={caller.file="/work/webui/app.go" caller.L=641 }
localrecall-postgres-1 | 2026-03-04 22:08:22.207 UTC [81] LOG: checkpoint complete: wrote 216 buffers (1.3%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 1 recycled; write=21.668 s, sync=0.031 s, total=21.724 s; sync files=32, longest=0.009 s, average=0.001 s; distance=1672 kB, estimate=1672 kB; lsn=0/207B428, redo lsn=0/207B398

The relevant portion seems to be this:
Mar 04 22:08:12 ERROR Observable completed without any progress id=1 name="job" caller={caller.file="/work/core/types/observable.go" caller.L=52 }

This is the only message of type ERROR in the log.

The log shows the response actually received from LocalAI; it is correct for the query in every case. Yet in most cases the text returned to the client is the query instead.
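The "Observable completed without any progress" error suggests the job's observable finishes without the completion payload ever being published, and the caller then falls back to the last message it holds, which is the user's query. A minimal sketch of that suspected pattern follows; all names (`Observable`, `result_or_fallback`) are hypothetical illustrations, not LocalAGI's actual code.

```python
# Hypothetical sketch of the suspected failure mode; this is NOT
# LocalAGI's implementation, only an illustration of the symptom.
class Observable:
    def __init__(self):
        self.progress = []      # chunks emitted while the job runs
        self.completion = None  # final result, set on success

    def result_or_fallback(self, fallback):
        if self.completion is not None:
            return self.completion
        if self.progress:
            return "".join(self.progress)
        # Nothing was ever published: a naive caller returns its
        # fallback -- in this bug, the original user query.
        print('ERROR Observable completed without any progress')
        return fallback

obs = Observable()  # job finished, but nothing was published to it
reply = obs.result_or_fallback(
    "Tell me a three sentence bedtime story about a unicorn.")
print(reply)  # the query comes back as the "response"
```

If something like this is happening, the interesting question is why the LocalAI response (which the log shows arriving intact) never reaches the observable before it completes.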