Connection to the server API fails and website returns an empty response when using Docker on Windows with WSL2 #4884

@celarye

Description

I installed llama.cpp on Windows via Docker with a WSL2 backend. When I run the server and try to connect to it from a Python script using the OpenAI module, the request fails with a connection error. I also noticed that visiting http://127.0.0.1:2080 (I changed the port) returns ERR_EMPTY_RESPONSE. I'm not sure whether something in the server is failing or whether it's something on my end.

Here is my docker compose file:

name: <docker compose stack name>

networks:
  <network name>:
    external: true

services:
  llamacpp:
    container_name: llamacpp
    image: ghcr.io/ggerganov/llama.cpp:full-cuda
    networks: [ ai-discord-bot ]
    environment:
      - PUID=1014
      - PGID=1017
    volumes:
      - ./llamacpp/models:/models
    ports:
      - "2080:2080"
    command: "-s -m /models/openhermes-2.5-mistral-7b.Q5_K_M.gguf -a llm -c 4096 -ngl 33 --port 2080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
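As a first diagnostic step, it can help to separate "the published port is reachable" from "the server answers HTTP". The stdlib sketch below (my own illustrative helper, not part of llama.cpp) checks whether anything accepts TCP connections on the mapped port. If this returns False while the container is running, a common cause is the server inside the container binding only to 127.0.0.1; llama.cpp's server accepts a --host flag, so adding --host 0.0.0.0 to the compose command is worth trying (an assumption about this setup, not a confirmed fix).

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, unreachable hosts, etc.
        return False


if __name__ == "__main__":
    print(port_open("127.0.0.1", 2080))
```

If this prints True but the browser still shows ERR_EMPTY_RESPONSE, the port mapping is fine and the problem is at the HTTP layer instead.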

My docker container logs: https://pastebin.com/raw/jkfGu7Ty

My Python script:

"""Main file"""
import time  # for measuring time duration of API calls
from openai import OpenAI

# Example of an OpenAI ChatCompletion request
# https://platform.openai.com/docs/guides/chat

client = OpenAI(
    base_url="http://127.0.0.1:2080/v1",
    api_key="sk-no-key-required"
)

# record the time before the request is sent
start_time = time.time()

# send a chat completion request asking for a limerick
COMPLETION = client.chat.completions.create(
    model="OpenHermes",
    messages=[
        {"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ]
)

# calculate the time it took to receive the response
response_time = time.time() - start_time

# print the time delay and text received
print(f"Full response received {response_time:.2f} seconds after request")

reply_content = COMPLETION.choices[0].message.content
print(f"Extracted content: \n{reply_content}")
