Description
LocalAGI: 2.8.0, also tested with master
Jetson Orin NX, using container localai/localai:latest-nvidia-l4t-arm64 for LocalAI
LocalAI itself, using its web UI, works correctly.
With LocalAGI, the response is replaced by the query. This happens on almost every query; the few times I observed a correct response, it was the very first query after a restart.
This can be observed using the "chat" functionality in LocalAGI and also via the OpenAI-compatible API calls.
Relevant information:
API call:
curl http://localhost:8080/v1/responses -H "Content-Type: application/json" -d '{"model": "Nova", "input": "Tell me a three sentence bedtime story about a unicorn."}'
API Response:
{
  "id": "dce1e717-af54-46fa-8f8c-0a489b761fdd",
  "object": "response",
  "created_at": 1772662092,
  "status": "completed",
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "max_output_tokens": null,
  "model": "Nova",
  "output": [
    {
      "type": "message",
      "id": "msg_1772662092683299122",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Tell me a three sentence bedtime story about a unicorn.",
          "annotations": null
        }
      ]
    }
  ],
  "parallel_tool_calls": false,
  "previous_response_id": null,
  "reasoning": {},
  "store": false,
  "temperature": 0,
  "text": {},
  "tool_choice": "",
  "tools": null,
  "top_p": 0,
  "truncation": "",
  "usage": {
    "input_tokens": 0,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens": 0,
    "output_tokens_details": { "cached_tokens": 0 },
    "total_tokens": 0
  },
  "user": null,
  "metadata": null
}
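For anyone trying to reproduce this, the echo can also be detected programmatically. Below is a minimal Python sketch; the helper names (`output_text`, `is_echo`) are my own and not part of LocalAGI. It extracts the `output_text` parts from a Responses-API payload like the one above and flags when the "answer" is just the original query:

```python
def output_text(response: dict) -> str:
    """Concatenate all output_text parts from a Responses-API payload."""
    parts = []
    for item in response.get("output") or []:
        if item.get("type") != "message":
            continue
        for content in item.get("content") or []:
            if content.get("type") == "output_text":
                parts.append(content.get("text", ""))
    return "".join(parts)


def is_echo(query: str, response: dict) -> bool:
    """True when the model's 'answer' merely repeats the original query."""
    return output_text(response).strip() == query.strip()


if __name__ == "__main__":
    # The payload pasted above, abbreviated to the fields we inspect.
    payload = {
        "output": [{
            "type": "message",
            "content": [{
                "type": "output_text",
                "text": "Tell me a three sentence bedtime story about a unicorn.",
            }],
        }],
    }
    query = "Tell me a three sentence bedtime story about a unicorn."
    print(is_echo(query, payload))  # prints True for the buggy response above
```

Running this against repeated API calls confirms the bug fires on nearly every request, not just the one pasted here.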
Relevant portion of log:
localai-1 | Mar 04 22:08:12 DEBUG GRPC stdout id="gemma-3-4b-it-qat-127.0.0.1:37881" line="[DEBUG] Received 1 results" caller={caller.file="/build/pkg/model/process.go" caller.L=162 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stderr id="gemma-3-4b-it-qat-127.0.0.1:37881" line=" total time = 3216.38 ms / 79 tokens" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stdout id="gemma-3-4b-it-qat-127.0.0.1:37881" line="[DEBUG] Predict request completed successfully" caller={caller.file="/build/pkg/model/process.go" caller.L=162 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stderr id="gemma-3-4b-it-qat-127.0.0.1:37881" line="slot release: id 0 | task 38 | stop processing: n_tokens = 78, truncated = 0" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
localai-1 | Mar 04 22:08:12 DEBUG GRPC stderr id="gemma-3-4b-it-qat-127.0.0.1:37881" line="srv update_slots: all slots are idle" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
localai-1 | Mar 04 22:08:12 DEBUG Response response="{"created":1772662089,"object":"chat.completion","id":"ae9eb1fe-917b-4f10-bce6-201a01048bbd","model":"gemma-3-4b-it-qat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Once upon a time, in a field of shimmering moonlight, lived a unicorn named Lumi. Lumi loved to dance among the fireflies, leaving trails of sparkling dust wherever she went. As she drifted off to sleep, she dreamed of flying through the stars with her friends, safe and sound."}}],"usage":{"prompt_tokens":20,"completion_tokens":59,"total_tokens":79}}" caller={caller.file="/build/core/http/endpoints/openai/chat.go" caller.L=780 }
localai-1 | Mar 04 22:08:12 INFO HTTP request method="POST" path="/chat/completions" status=200 caller={caller.file="/build/core/http/app.go" caller.L=118 }
localagi-1 | Mar 04 22:08:12 DEBUG Long term memory is disabled agent="Nova" caller={caller.file="/work/core/agent/knowledgebase.go" caller.L=133 }
localagi-1 | Mar 04 22:08:12 ERROR Observable completed without any progress id=1 name="job" caller={caller.file="/work/core/types/observable.go" caller.L=52 }
localagi-1 | Mar 04 22:08:12 DEBUG Agent is now waiting for a new job agent="Nova" caller={caller.file="/work/core/agent/agent.go" caller.L=1421 }
localagi-1 | Mar 04 22:08:12 DEBUG Agent has finished agent="Nova" caller={caller.file="/work/core/agent/agent.go" caller.L=245 }
localagi-1 | Mar 04 22:08:12 DEBUG Agent has finished being asked agent="Nova" caller={caller.file="/work/core/agent/agent.go" caller.L=220 }
localagi-1 | Mar 04 22:08:12 INFO we got a response from the agent agent="Nova" response="Tell me a three sentence bedtime story about a unicorn." caller={caller.file="/work/webui/app.go" caller.L=641 }
localrecall-postgres-1 | 2026-03-04 22:08:22.207 UTC [81] LOG: checkpoint complete: wrote 216 buffers (1.3%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 1 recycled; write=21.668 s, sync=0.031 s, total=21.724 s; sync files=32, longest=0.009 s, average=0.001 s; distance=1672 kB, estimate=1672 kB; lsn=0/207B428, redo lsn=0/207B398
The relevant portion seems to be this:
Mar 04 22:08:12 ERROR Observable completed without any progress id=1 name="job" caller={caller.file="/work/core/types/observable.go" caller.L=52 }
This is the only message of type ERROR in the log.
In all cases the log shows that LocalAI actually produced a correct response to the query, but in most cases the text LocalAGI returns to the caller is the query itself instead.
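My reading of that ERROR line is that the agent's job observable completes without ever publishing a result, and some caller then falls back to returning the original input. A purely speculative Python sketch of that failure mode follows; none of these names exist in LocalAGI, and this only illustrates the hypothesis, not the actual Go code in observable.go:

```python
class Observable:
    """Toy stand-in for LocalAGI's job observable (hypothetical)."""

    def __init__(self):
        self.progress = []  # results published while the job runs

    def publish(self, text: str):
        self.progress.append(text)

    def completed_without_progress(self) -> bool:
        # Mirrors the logged "Observable completed without any progress".
        return not self.progress


def run_job(query: str, obs: Observable, model_reply) -> str:
    # If the worker publishes the model reply, the caller returns it...
    if model_reply is not None:
        obs.publish(model_reply)
    # ...but if the observable finished empty, a buggy fallback could
    # hand the original query back to the user instead of an error,
    # producing exactly the echo seen above.
    if obs.completed_without_progress():
        return query
    return obs.progress[-1]
```

Under this hypothesis the fix would be to surface an error (or retry) when the observable completes empty, rather than echoing the input; whether the real code path in /work/core/types/observable.go behaves this way would need to be confirmed in the LocalAGI source.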