Running LocalAI with Docker from the CLI, I get this error: Error: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
LocalAI version:
localai/localai:latest-gpu-nvidia-cuda-13
Environment, CPU architecture, OS, and Version:
Linux titanio 7.1.1-2-cachyos #1 SMP PREEMPT_DYNAMIC Tue, 23 Jun 2026 21:38:08 +0000 x86_64 GNU/Linux
Describe the bug
docker run --rm --gpus all --name local-ai -v /home/ac/projects/localai/models:/models -v /home/ac/projects/localai/backends:/backends -v /home/ac/projects/localai/data:/data -p 8090:8080 localai/localai:latest-gpu-nvidia-cuda-13
With Docker Compose I get the same error: Error: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
services:
  api:
    image: localai/localai:latest-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8090:8080
    environment:
      - DEBUG=false
    volumes:
      - ./models:/models
      - ./backends:/backends
      - ./data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
If I run LocalAI with CPU-only capabilities (no GPU), both the docker and docker compose setups work fine.
To Reproduce
Expected behavior
Logs
api-1 | Jul 01 16:40:49 INFO BackendLoader starting modelID="ornith-1.0-9b" backend="llama-cpp" model="llama-cpp/models/Ornith-1.0-9B-GGUF/ornith-1.0-9b-Q4_K_M.gguf"
api-1 | Jul 01 16:40:49 INFO effective runtime tuning (override in the model YAML; LOCALAI_DISABLE_HARDWARE_DEFAULTS=true disables hardware auto-tuning) modelID="ornith-1.0-9b" context=262144 n_batch=512 n_gpu_layers=99999999 parallel="1" flash_attention="auto" f16=false
api-1 | Jul 01 16:40:54 WARN Backend process exited unexpectedly id="ornith-1.0-9b" address="127.0.0.1:37207" process="run.sh" exitCode="-1"
api-1 | Jul 01 16:40:54 ERROR Failed to load model modelID="ornith-1.0-9b" error=failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF backend="llama-cpp"
api-1 | Jul 01 16:40:54 ERROR Stream ended with error error=failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
Additional context
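One value in the tuning log line may be relevant: hardware auto-tuning picked context=262144 with every layer offloaded to the GPU (n_gpu_layers=99999999), and the backend process exits about five seconds later. As a rough sketch only, the script below reads the context from that log line and estimates the KV-cache size with the usual formula 2 × layers × context × KV heads × head dimension × bytes per element. The architecture values (36 layers, 8 KV heads, head dimension 128, 2-byte f16 cache entries) are hypothetical placeholders, not taken from the Ornith GGUF, so the numbers are only an order-of-magnitude illustration.

```shell
#!/bin/sh
# Rough KV-cache size estimate for the context LocalAI auto-selected.
# ASSUMPTION: the architecture values below are hypothetical placeholders,
# not read from the Ornith GGUF; substitute the model's real metadata.

# Tuning line from the log above (wrapped parts joined onto one line).
log_line='INFO effective runtime tuning (override in the model YAML; LOCALAI_DISABLE_HARDWARE_DEFAULTS=true disables hardware auto-tuning) modelID="ornith-1.0-9b" context=262144 n_batch=512 n_gpu_layers=99999999 parallel="1" flash_attention="auto" f16=false'

# Print the value of a key=value field from the log line, quotes stripped.
field() {
  printf '%s\n' "$log_line" | grep -o "$1=[^ ]*" | tr -d '"' | cut -d= -f2
}

n_ctx=$(field context)

# Hypothetical architecture (NOT from the model file):
n_layer=36        # transformer blocks
n_head_kv=8       # key/value heads (grouped-query attention)
head_dim=128      # dimension per head
bytes_per_elem=2  # assumed f16 K/V cache entries

# K and V caches together: 2 * layers * ctx * kv_heads * head_dim * bytes, in MiB.
kv_mib() {
  echo $(( 2 * n_layer * $1 * n_head_kv * head_dim * bytes_per_elem / 1024 / 1024 ))
}

echo "context=${n_ctx} (auto-tuned): ~$(kv_mib "$n_ctx") MiB KV cache"
echo "context=8192 (for comparison): ~$(kv_mib 8192) MiB KV cache"
```

With those placeholder values it prints roughly 36864 MiB for the auto-selected context versus 1152 MiB at 8192. If the real model is in a similar range, the crash could be a GPU out-of-memory during load; per the log message itself, the context can be overridden in the model YAML, or hardware auto-tuning disabled with LOCALAI_DISABLE_HARDWARE_DEFAULTS=true.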