Connection to the server API fails and website returns an empty response when using Docker on Windows with WSL2 #4884

@celarye

Description

I installed llama.cpp on Windows via Docker with a WSL2 backend. When I run the server and try to connect to it from a Python script using the OpenAI module, the request fails with a connection error. I also noticed that visiting http://127.0.0.1:2080 (I changed the port) returns ERR_EMPTY_RESPONSE. I'm not sure whether something in the server is failing or whether it's something on my end.

Here is my docker compose file:

name: <docker compose stack name>

networks:
  <network name>:
    external: true

services:
  llamacpp:
    container_name: llamacpp
    image: ghcr.io/ggerganov/llama.cpp:full-cuda
    networks: [ ai-discord-bot ]
    environment:
      - PUID=1014
      - PGID=1017
    volumes:
      - ./llamacpp/models:/models
    ports:
      - "2080:2080"
    command: "-s -m /models/openhermes-2.5-mistral-7b.Q5_K_M.gguf -a llm -c 4096 -ngl 33 --port 2080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
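As a first diagnostic step, it can help to separate "the published port is reachable" from "the server answers HTTP". The stdlib sketch below (my own illustrative helper, not part of llama.cpp) checks whether anything accepts TCP connections on the mapped port. If this returns False while the container is running, a common cause is the server inside the container binding only to 127.0.0.1; llama.cpp's server accepts a --host flag, so adding --host 0.0.0.0 to the compose command is worth trying (an assumption about this setup, not a confirmed fix).

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, unreachable hosts, etc.
        return False


if __name__ == "__main__":
    print(port_open("127.0.0.1", 2080))
```

If this prints True but the browser still shows ERR_EMPTY_RESPONSE, the port mapping is fine and the problem is at the HTTP layer instead.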

My docker container logs: https://pastebin.com/raw/jkfGu7Ty

My Python script:

"""Main file"""
import time  # for measuring time duration of API calls
from openai import OpenAI

# Example of an OpenAI ChatCompletion request
# https://platform.openai.com/docs/guides/chat

client = OpenAI(
    base_url="http://127.0.0.1:2080/v1",
    api_key="sk-no-key-required"
)

# record the time before the request is sent
start_time = time.time()

# send a chat completion request asking for a limerick
COMPLETION = client.chat.completions.create(
    model="OpenHermes",
    messages=[
        {"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ]
)

# calculate the time it took to receive the response
response_time = time.time() - start_time

# print the time delay and text received
print(f"Full response received {response_time:.2f} seconds after request")

reply_content = COMPLETION.choices[0].message.content
print(f"Extracted content: \n{reply_content}")
