Currently, the `HFClientVLLM` module only calls the completions endpoint. Since vLLM now supports applying chat templates, would it be possible to add support for the chat completions endpoint?
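For reference, here is a minimal sketch of the difference between the two OpenAI-compatible endpoints when hit directly (the base URL and model name are placeholders for whatever vLLM server you are running):

```python
import requests

BASE = "http://localhost:8000"  # assumed vLLM OpenAI-compatible server
MODEL = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model name

# Completions endpoint: takes a raw prompt string.
completion = requests.post(
    f"{BASE}/v1/completions",
    json={"model": MODEL, "prompt": "Hello", "max_tokens": 32},
).json()
print(completion["choices"][0]["text"])

# Chat completions endpoint: takes a message list; the server applies
# the model's chat template before generation.
chat = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32,
    },
).json()
print(chat["choices"][0]["message"]["content"])
```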
The following is a naive approach to adding chat completion support, editing the `_generate` method of `HFClientVLLM`:
```python
def _generate(self, prompt, **kwargs):
    kwargs = {**self.kwargs, **kwargs}
    # Pop these so they are not forwarded to the server inside the payload.
    model_type = kwargs.pop("model_type", None)
    system_prompt = kwargs.pop("system_prompt", None)

    # Round-robin over the configured server URLs.
    url = self.urls.pop(0)
    self.urls.append(url)

    if model_type == "chat":
        messages = [{"role": "user", "content": prompt}]
        if system_prompt:
            messages.insert(0, {"role": "system", "content": system_prompt})
        payload = {
            "model": self.kwargs["model"],
            "messages": messages,
            **kwargs,
        }
        response = send_hfvllm_chat_request_v00(
            f"{url}/v1/chat/completions",
            json=payload,
            headers=self.headers,
        )
        try:
            json_response = response.json()
            completions = json_response["choices"]
            return {
                "prompt": prompt,
                "choices": [{"text": c["message"]["content"]} for c in completions],
            }
        except Exception:
            print("Failed to parse JSON response:", response.text)
            raise Exception("Received invalid JSON response from server")
    else:
        payload = {
            "model": self.kwargs["model"],
            "prompt": prompt,
            **kwargs,
        }
        response = send_hfvllm_request_v00(
            f"{url}/v1/completions",
            json=payload,
            headers=self.headers,
        )
        try:
            json_response = response.json()
            completions = json_response["choices"]
            return {
                "prompt": prompt,
                "choices": [{"text": c["text"]} for c in completions],
            }
        except Exception:
            print("Failed to parse JSON response:", response.text)
            raise Exception("Received invalid JSON response from server")


@CacheMemory.cache
def send_hfvllm_request_v00(arg, **kwargs):
    return requests.post(arg, **kwargs)


@CacheMemory.cache
def send_hfvllm_chat_request_v00(arg, **kwargs):
    return requests.post(arg, **kwargs)
```
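With a patch along these lines, a caller could opt into the chat endpoint via `model_type`. A hypothetical usage sketch (the constructor arguments beyond `model` and `port` are assumptions based on how extra kwargs flow into `self.kwargs`):

```python
import dspy

# Assumed: extra keyword arguments are stored in self.kwargs and
# picked up by _generate, so model_type/system_prompt reach the patch above.
llm = dspy.HFClientVLLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder model name
    port=8000,
    model_type="chat",
    system_prompt="You are a helpful assistant.",
)
dspy.settings.configure(lm=llm)
```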