Works on CPU, not on GPU #10625

Running with Docker from the CLI, I get this error: Error: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF



**LocalAI version:**

localai/localai:latest-gpu-nvidia-cuda-13

**Environment, CPU architecture, OS, and Version:**

Linux titanio 7.1.1-2-cachyos #1 SMP PREEMPT_DYNAMIC Tue, 23 Jun 2026 21:38:08 +0000 x86_64 GNU/Linux

**Describe the bug**

docker run --rm --gpus all --name local-ai -v /home/ac/projects/localai/models:/models -v /home/ac/projects/localai/backends:/backends -v /home/ac/projects/localai/data:/data -p 8090:8080 localai/localai:latest-gpu-nvidia-cuda-13

With Docker Compose I get the same error: Error: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF

services:
  api:
    image: localai/localai:latest-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8090:8080
    environment:
      - DEBUG=false
    volumes:
      - ./models:/models
      - ./backends:/backends
      - ./data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]


If I run LocalAI in CPU-only mode (no GPU), both the docker run and docker compose setups work fine.

**To Reproduce**


**Expected behavior**


**Logs**

api-1  | Jul 01 16:40:49 INFO  BackendLoader starting modelID="ornith-1.0-9b" backend="llama-cpp" model="llama-cpp/models/Ornith-1.0-9B-GGUF/ornith-1.0-9b-Q4_K_M.gguf"
api-1  | Jul 01 16:40:49 INFO  effective runtime tuning (override in the model YAML; LOCALAI_DISABLE_HARDWARE_DEFAULTS=true disables hardware auto-tuning) modelID="ornith-1.0-9b" context=262144 n_batch=512 n_gpu_layers=99999999 parallel="1" flash_attention="auto" f16=false
api-1  | Jul 01 16:40:54 WARN  Backend process exited unexpectedly id="ornith-1.0-9b" address="127.0.0.1:37207" process="run.sh" exitCode="-1"
api-1  | Jul 01 16:40:54 ERROR Failed to load model modelID="ornith-1.0-9b" error=failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF backend="llama-cpp"
api-1  | Jul 01 16:40:54 ERROR Stream ended with error error=failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
**Additional context**
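
The "effective runtime tuning" log line reports auto-tuned values (context=262144, n_gpu_layers=99999999) and names LOCALAI_DISABLE_HARDWARE_DEFAULTS=true as the switch that disables hardware auto-tuning. Whether those values are related to the backend exit is not established by the logs. A minimal sketch of a compose change that might help narrow it down, assuming DEBUG=true produces more detailed backend output (the posted file sets DEBUG=false) and that the other keys of the service stay as posted:

```yaml
services:
  api:
    environment:
      # Assumption: verbose logging shows why the llama-cpp backend process exits
      - DEBUG=true
      # Taken from the log message itself: disables hardware auto-tuning
      - LOCALAI_DISABLE_HARDWARE_DEFAULTS=true
```

Only the `environment` entries differ from the compose file above; the image, volumes, ports, and GPU reservation are unchanged.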
