
Passing a HF endpoint URL to client.chat_completion() doesn't seem to work anymore #2484

Open · MoritzLaurer opened this issue Aug 23, 2024 · 9 comments · May be fixed by #2540
Labels: bug (Something isn't working)
MoritzLaurer (Contributor) commented Aug 23, 2024

Describe the bug

I could previously use the following code to call the Inference Client with a dedicated endpoint URL, and it worked (e.g. in this cookbook recipe for the HF endpoints):

```python
from huggingface_hub import InferenceClient

client = InferenceClient()

API_URL = "https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud"  # endpoint.url

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Count to 10"},
]

output = client.chat_completion(
    messages,
    model=API_URL,
    temperature=0.2,
    max_tokens=100,
    seed=42,
)

print("The output from your API/Endpoint call with the InferenceClient:\n")
print(output)
```

This code now results in the error below.
(Additional observation: if the endpoint is scaled to zero, the code initially works in the sense that it triggers the endpoint to start up, but once the endpoint is running, the error is thrown.)

```
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
File ~/miniconda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py:90, in hf_raise_for_status(response, endpoint_name)
     89 try:
---> 90     response.raise_for_status()
     91 except HTTPError as e:

File ~/miniconda/lib/python3.9/site-packages/requests/models.py:1024, in Response.raise_for_status(self)
   1023 if http_error_msg:
-> 1024     raise HTTPError(http_error_msg, response=self)

HTTPError: 422 Client Error: Unprocessable Entity for url: https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud/

The above exception was the direct cause of the following exception:

HfHubHTTPError                            Traceback (most recent call last)
Cell In[34], line 12
      5 API_URL = "https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud/"  #endpoint.url
      7 messages = [
      8     {"role": "system", "content": "You are a helpful assistant."},
      9     {"role": "user", "content": "Count to 10"},
     10 ]
---> 12 output = client.chat_completion(
     13     messages,  # the chat template is applied automatically, if your endpoint uses a TGI container
     14     model=API_URL,
     15     temperature=0.2,
     16     max_tokens=100,
     17     seed=42,
     18 )
     20 print("The output from your API/Endpoint call with the InferenceClient:\n")
     21 print(output)

File ~/miniconda/lib/python3.9/site-packages/huggingface_hub/inference/_client.py:861, in InferenceClient.chat_completion(self, messages, model, stream, frequency_penalty, logit_bias, logprobs, max_tokens, n, presence_penalty, response_format, seed, stop, temperature, tool_choice, tool_prompt, tools, top_logprobs, top_p)
    840 payload = dict(
    841     model=model_id,
    842     messages=messages,
   (...)
    858     stream=stream,
    859 )
    860 payload = {key: value for key, value in payload.items() if value is not None}
--> 861 data = self.post(model=model_url, json=payload, stream=stream)
    863 if stream:
    864     return _stream_chat_completion_response(data)  # type: ignore[arg-type]

File ~/miniconda/lib/python3.9/site-packages/huggingface_hub/inference/_client.py:305, in InferenceClient.post(self, json, data, model, task, stream)
    302         raise InferenceTimeoutError(f"Inference call timed out: {url}") from error  # type: ignore
    304 try:
--> 305     hf_raise_for_status(response)
    306     return response.iter_lines() if stream else response.content
    307 except HTTPError as error:

File ~/miniconda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py:162, in hf_raise_for_status(response, endpoint_name)
    158     raise HfHubHTTPError(message, response=response) from e
    160 # Convert `HTTPError` into a `HfHubHTTPError` to display request information
    161 # as well (request id and/or server error message)
--> 162 raise HfHubHTTPError(str(e), response=response) from e

HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud/ (Request ID: iwTQuL)
```

I still get correct outputs via raw HTTP requests, so it doesn't seem to be an issue with the endpoint or my token:

```python
import requests
import huggingface_hub

API_URL = "https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud"  # endpoint.url
headers = {
    "Accept": "application/json",
    "Authorization": f"Bearer {huggingface_hub.get_token()}",
    "Content-Type": "application/json",
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Tell me a story",
    "parameters": {
        # **generation_params
    },
})

output
# [{'generated_text': 'Tell me a story about a time when you felt truly alive.\nI remember a time when I was 25 years old, and I was traveling through Europe with a group of friends. We had been on the road for weeks, and we had finally arrived in Interlaken, Switzerland. We had planned to hike to the top of the Schilthorn mountain, but the weather forecast was looking grim, with heavy rain and strong winds predicted.\nBut we were determined to make it happen. We packed our gear,'}]
```

Reproduction

No response

Logs

No response

System info

```python
{'huggingface_hub version': '0.25.0.dev0',
 'Platform': 'Linux-5.10.205-195.807.amzn2.x86_64-x86_64-with-glibc2.31',
 'Python version': '3.9.5',
 'Running in iPython ?': 'Yes',
 'iPython shell': 'ZMQInteractiveShell',
 'Running in notebook ?': 'Yes',
 'Running in Google Colab ?': 'No',
 'Token path ?': '/home/user/.cache/huggingface/token',
 'Has saved token ?': True,
 'Who am I ?': 'MoritzLaurer',
 'Configured git credential helpers': 'store',
 'FastAI': 'N/A',
 'Tensorflow': 'N/A',
 'Torch': 'N/A',
 'Jinja2': '3.1.4',
 'Graphviz': 'N/A',
 'keras': 'N/A',
 'Pydot': 'N/A',
 'Pillow': 'N/A',
 'hf_transfer': 'N/A',
 'gradio': 'N/A',
 'tensorboard': 'N/A',
 'numpy': '2.0.1',
 'pydantic': '2.8.2',
 'aiohttp': 'N/A',
 'ENDPOINT': 'https://huggingface.co',
 'HF_HUB_CACHE': '/home/user/.cache/huggingface/hub',
 'HF_ASSETS_CACHE': '/home/user/.cache/huggingface/assets',
 'HF_TOKEN_PATH': '/home/user/.cache/huggingface/token',
 'HF_HUB_OFFLINE': False,
 'HF_HUB_DISABLE_TELEMETRY': False,
 'HF_HUB_DISABLE_PROGRESS_BARS': None,
 'HF_HUB_DISABLE_SYMLINKS_WARNING': False,
 'HF_HUB_DISABLE_EXPERIMENTAL_WARNING': False,
 'HF_HUB_DISABLE_IMPLICIT_TOKEN': False,
 'HF_HUB_ENABLE_HF_TRANSFER': False,
 'HF_HUB_ETAG_TIMEOUT': 10,
 'HF_HUB_DOWNLOAD_TIMEOUT': 10}
```
MoritzLaurer added the bug label on Aug 23, 2024
Wauplin (Contributor) commented Aug 23, 2024

Hi @MoritzLaurer, thanks for reporting. It should work if you pass the URL as base_url instead. This is due to how URLs are handled: in this case, /v1/chat/completions must be appended to the model URL. I'll see what I can do to fix it.

Wauplin (Contributor) commented Aug 23, 2024

In the meantime you can just do:

```python
client = InferenceClient(base_url=API_URL)
```

or

```python
client = InferenceClient(API_URL + "/v1/chat/completions")
```
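Putting the workaround together with the original snippet, a minimal sketch (the endpoint URL and generation parameters are the placeholders from the report above):

```python
from huggingface_hub import InferenceClient

# Endpoint URL copied from the report above; substitute your own.
API_URL = "https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud"

# With base_url, the client appends /v1/chat/completions itself.
client = InferenceClient(base_url=API_URL)

output = client.chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    temperature=0.2,
    max_tokens=100,
    seed=42,
)
print(output.choices[0].message.content)
```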

minmin-intel commented:

I wanted to raise a related issue. I posted it in langchain previously but got no response from the maintainers so far. I believe this is a huggingface-hub issue, so I'm posting it here again: langchain-ai/langchain#24720

I got the same 422 error when using the ChatHuggingFace and HuggingFaceEndpoint APIs in langchain-huggingface. I get this error with huggingface-hub==0.24.6, but not when I downgrade to 0.24.0.
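Until a fix lands, one stopgap (assuming, per the comment above, that 0.24.0 is the last version without the error) is to pin the older release:

```
pip install "huggingface-hub==0.24.0"
```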

michael-newsrx commented:

> I wanted to raise a related issue. I posted it in langchain previously but got no response from the maintainers so far. I believe this is a huggingface-hub issue, so I'm posting it here again: langchain-ai/langchain#24720
>
> I got the same 422 error when using the ChatHuggingFace and HuggingFaceEndpoint APIs in langchain-huggingface. I get this error with huggingface-hub==0.24.6, but not when I downgrade to 0.24.0.

Also: langchain-ai/langchain#25675

There also seems to be a binding issue with parameters, but that is most likely a ChatHuggingFace-specific issue: langchain-ai/langchain#23586 (comment)

And my nightmare with dealing with HuggingFace dedicated endpoints: langchain-ai/langchain#25675 (reply in thread). (It appears the JSON schema is sent as a tool option?) The endpoint generates the tool error, not the client code.

Saisri534 commented Sep 12, 2024

I am getting the same error, even when using

```python
client = InferenceClient(base_url=API_URL)
```

or

```python
client = InferenceClient(API_URL + "/v1/chat/completions")
```

Error:

```
raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct/v1/chat/completions (Request ID: gjZMzbcCD6oWDQFV4HIXe)
```

Code:

```python
from huggingface_hub import InferenceClient

API_URL = "https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct"

client = InferenceClient(API_URL + "/v1/chat/completions")

output = client.chat_completion([{"role": "user", "content": "ok"}], response_format={"type": "json_object"})

print(output.choices[0].message.content)
```

Wauplin self-assigned this on Sep 13, 2024
hanouticelina (Contributor) commented:

Hi @Saisri534, thanks for reporting this! Looks like you're running into a different bug than what this issue is about. Could you open up a new issue for that? 😄

Saisri534 commented:

Hi @hanouticelina,
The issue I am facing may be due to "tiiuae/falcon-7b-instruct", because the same code works fine for "mistralai/Mixtral-8x7B-Instruct-v0.1".

Code:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [
    {
        "role": "user",
        "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I saw and when?",
    },
]

response_format = {
    "type": "json",
    "value": {
        "properties": {
            "location": {"type": "string"},
            "activity": {"type": "string"},
            "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
            "animals": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["location", "activity", "animals_seen", "animals"],
    },
}

response = client.chat_completion(
    messages=messages,
    response_format=response_format,
    max_tokens=500,
)

print(response.choices[0].message.content)
```

Output while using Mixtral:

```json
{
    "activity": "saw",
    "animals": ["puppy", "cat", "raccoon"],
    "animals_seen": 3,
    "location": "in the park during a bike ride"
}
```

Error while using Falcon:

```
raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct/v1/chat/completions (Request ID: hfRdP0ueGjO47DHM7ENhn)
```
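As a debugging sketch (not a fix), one way to check whether the 422 comes from the response_format parameter rather than the URL handling this issue is about: call the same model with and without the JSON constraint. Model name and parameters are copied from the comments above.

```python
from huggingface_hub import InferenceClient
from huggingface_hub.utils import HfHubHTTPError

client = InferenceClient("tiiuae/falcon-7b-instruct")
messages = [{"role": "user", "content": "ok"}]

try:
    # Constrained JSON output requested, as in the failing snippet above.
    out = client.chat_completion(
        messages, response_format={"type": "json_object"}, max_tokens=50
    )
    print("With response_format:", out.choices[0].message.content)
except HfHubHTTPError as e:
    print("With response_format -> error:", e)

# If this second call succeeds while the first one 422s, the problem is the
# model backend's support for constrained JSON output, not the client's URL
# handling.
out = client.chat_completion(messages, max_tokens=50)
print("Without response_format:", out.choices[0].message.content)
```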

Wauplin (Contributor) commented Sep 13, 2024

@Saisri534 can you summarize this in a new separate issue please? Thanks! https://github.com/huggingface/huggingface_hub/issues/new

Wauplin linked a pull request (#2540) on Sep 13, 2024 that will close this issue
minmin-intel commented:

> I wanted to raise a related issue. I posted it in langchain previously but got no response from the maintainers so far. I believe this is a huggingface-hub issue, so I'm posting it here again: langchain-ai/langchain#24720
>
> I got the same 422 error when using the ChatHuggingFace and HuggingFaceEndpoint APIs in langchain-huggingface. I get this error with huggingface-hub==0.24.6, but not when I downgrade to 0.24.0.

@Wauplin @hanouticelina Could you please help look into the issue that I described? I got this error when using huggingface-hub==0.24.6, but no such error when I downgrade to 0.24.0. Thanks!
