I installed llama.cpp on Windows via Docker with a WSL2 backend. When I run the server and try to connect to it from a Python script using the OpenAI module, the request fails with a connection error. I also noticed that visiting http://127.0.0.1:2080 (I changed the port) returns ERR_EMPTY_RESPONSE. I'm not sure whether the server itself is failing or whether it's something on my end.
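To narrow down whether anything is even listening on the published port, a probe like the one below can distinguish "connection refused" from an empty response. This is a minimal sketch, assuming the server exposes a `/health` endpoint (recent llama.cpp server builds document one) and the same port as in my compose file:

```python
import http.client
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:2080"  # same host/port as in my compose file

try:
    # /health is assumed here; any GET that returns an HTTP response would do
    with urllib.request.urlopen(f"{BASE_URL}/health", timeout=5) as resp:
        print("Server reachable, HTTP status:", resp.status)
        print(resp.read().decode())
except (urllib.error.URLError, http.client.HTTPException, OSError) as err:
    # "Connection refused" -> nothing is listening on the published port;
    # an empty/reset response -> something is listening but not answering HTTP
    print("Probe failed:", err)
```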
Here is my docker compose file:
```yaml
name: <docker compose stack name>

networks:
  <network name>:
    external: true

services:
  llamacpp:
    container_name: llamacpp
    image: ghcr.io/ggerganov/llama.cpp:full-cuda
    networks: [ ai-discord-bot ]
    environment:
      - PUID=1014
      - PGID=1017
    volumes:
      - ./llamacpp/models:/models
    ports:
      - "2080:2080"
    command: "-s -m /models/openhermes-2.5-mistral-7b.Q5_K_M.gguf -a llm -c 4096 -ngl 33 --port 2080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
```

My docker container logs: https://pastebin.com/raw/jkfGu7Ty
My Python script:
"""Main file"""
import time # for measuring time duration of API calls
from openai import OpenAI
# Example of an OpenAI ChatCompletion request
# https://platform.openai.com/docs/guides/chat
client = OpenAI(
base_url="http://127.0.0.1:2080/v1",
api_key="sk-no-key-required"
)
# record the time before the request is sent
start_time = time.time()
# send a ChatCompletion request to count to 100
COMPLETION = client.chat.completions.create(
model="OpenHermes",
messages=[
{"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
{"role": "user", "content": "Write a limerick about python exceptions"}
]
)
# calculate the time it took to receive the response
response_time = time.time() - start_time
# print the time delay and text received
print(f"Full response received {response_time:.2f} seconds after request")
reply_content = COMPLETION.choices[0].message.content
print(f"Extracted content: \n{reply_content}")kinshukk