
OpenAI supports top_p = 0.0 and top_p = 1.0, but TGI fails with a validation error for either of these values. #2222

Closed
2 of 4 tasks
michael-newsrx opened this issue Jul 11, 2024 · 12 comments

@michael-newsrx

System Info

  • docker image: ghcr.io/huggingface/text-generation-inference:2.0.2
  • docker image: ghcr.io/huggingface/text-generation-inference:2.1.1

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Fails

# Reproduction sketch; inference_endpoint1() and hf_bearer_token() are the
# reporter's own helpers returning an Inference Endpoint handle and an HF token.
from huggingface_hub import InferenceEndpoint
import openai

ep1: InferenceEndpoint = inference_endpoint1()
while ep1.status != "running":
    if ep1.status == "failed":
        raise RuntimeError(f"Failed to create inference endpoint: {ep1.name}")
    ep1.wait(timeout=1)

client = openai.OpenAI(
    base_url=ep1.url + "/v1",
    api_key=hf_bearer_token(),
)

# print(f"Available models: {client.models.list()}")
role_system = {"role": "system", "content": "I am an evil robot overlord."}
role_user = {"role": "user", "content": "What is your command? Be very succinct."}
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[role_system, role_user],
    stream=True,
    max_tokens=1024,
    temperature=0.0,
    top_p=1.0,  # rejected by TGI with a validation error
)

Works

# Identical setup to the failing example above; only top_p differs.
from huggingface_hub import InferenceEndpoint
import openai

ep1: InferenceEndpoint = inference_endpoint1()
while ep1.status != "running":
    if ep1.status == "failed":
        raise RuntimeError(f"Failed to create inference endpoint: {ep1.name}")
    ep1.wait(timeout=1)

client = openai.OpenAI(
    base_url=ep1.url + "/v1",
    api_key=hf_bearer_token(),
)

# print(f"Available models: {client.models.list()}")
role_system = {"role": "system", "content": "I am an evil robot overlord."}
role_user = {"role": "user", "content": "What is your command? Be very succinct."}
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[role_system, role_user],
    stream=True,
    max_tokens=1024,
    temperature=0.0,
    top_p=0.99,  # accepted
)

Expected behavior

See also: #1896, where the patch did not address this issue even though it was raised as part of that ticket.

Impact

This breaks libraries such as guidance, which hard-code top_p=1.0 when using the OpenAI interface.

@IQ179

IQ179 commented Jul 12, 2024

let top_p = top_p
    .map(|value| {
        if value <= 0.0 || value >= 1.0 {
            return Err(ValidationError::TopP);
        }
        Ok(value)
    })
    .unwrap_or(Ok(1.0))?;

If you want the effect of top_p = 1.0, you can simply send top_p as None (i.e. omit it), which will result in the default value of 1.0 being applied.

It seems the equality check in this code is what causes the error.
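
For illustration, a minimal sketch of that workaround against the reproduction code above (it assumes the client, role_system, and role_user objects from the original example):

# Hypothetical workaround sketch: omit top_p entirely so the router receives None
# and never runs the failing range check on 1.0.
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[role_system, role_user],
    stream=True,
    max_tokens=1024,
    temperature=0.0,
    # no top_p argument here
)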

@michael-conrad

michael-conrad commented Jul 12, 2024

This doesn't provide a resolution to the issue.

The Docker container rejects top_p=1.0 on the OpenAI-compatible interface, but that interface should accept top_p=1.0 and not fail.

Is there an easy way to "patch" the container and deploy using the patched version?

let top_p = top_p
    .map(|value| {
        if value <= 0.0 || value >= 1.0 {
            return Err(ValidationError::TopP);
        }
        Ok(value)
    })
    .unwrap_or(Ok(1.0))?;

If you want the effect of top_p = 1.0, you can simply send top_p as None, which will result in the default value of 1.0 being applied.

It seems the equality check in this code is what causes the error.

See also: guidance-ai/guidance#945

michael-conrad added a commit to michael-conrad/text-generation-inference that referenced this issue Jul 12, 2024
Don't error on OpenAI `top_p` valid values.

* Closes: guidance-ai/guidance#945
* Closes: huggingface#2222
@ErikKaum
Member

Hi @michael-newsrx

Thank you for bringing this to our attention and for making the PR 👍

As far as I can tell, there shouldn't be anything blocking this from getting merged. I'll approve running the CI and can take over merging the PR.

@michael-conrad

#2231

@ErikKaum ErikKaum self-assigned this Jul 16, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@cornzz

cornzz commented Sep 3, 2024

@ErikKaum I ran into this issue when I switched out API base URLs and my script suddenly broke, because the new API uses TGI, which doesn't allow top_p=1.0. I can work around this, but it would be nice if TGI allowed it, as I don't see why it is disallowed.

@ErikKaum
Member

ErikKaum commented Sep 3, 2024

Hi @cornzz 👋

I understand that it's annoying that it breaks the client. But for now we're still opting for a clear error over discarding user input without letting the user know.

That said, if there's a lot of demand for treating top_p=1.0 the same as no top_p, we're still open to it. One good way to get an indication of demand would be, e.g., thumbs up on an issue that's a feature request.

Hopefully this makes sense to you 👍

@michael-conrad

michael-conrad commented Sep 3, 2024 via email

@cornzz

cornzz commented Sep 3, 2024

Hey @ErikKaum, thanks for your quick response! It's not a problem; I was reusing a script and am wondering why the authors set top_p at all, since it defaults to 1.

Still, and sorry if I am misunderstanding something, what do you mean by discarding user input? Why can't top_p be set to 1.0 by the user manually, when not setting any value for top_p makes it default to 1.0?

@ErikKaum
Member

ErikKaum commented Sep 5, 2024

Glad to hear that it's not a problem 👍

while not setting any value for top_p makes it default to 1.0?

No worries. So I'm pretty sure that not passing a top_p makes it default to None, not 1:

This is where it gets processed as an Option<f32> in the router:

And then on the model logic side, the branch is conditioned on top_p is not None:

class StaticWarper:
    def __init__(
        self,
        temperature=1.0,
        top_k=None,
        top_p=None,
        typical_p=None,
    ):
        self.warpers = []

        if temperature is not None and temperature != 1.0:
            temperature = float(temperature)
            self.warpers.append(TemperatureLogitsWarper(temperature))
        if top_k is not None and top_k != 0:
            self.warpers.append(TopKLogitsWarper(top_k=top_k))
        if top_p is not None and top_p < 1.0:
            self.warpers.append(TopPLogitsWarper(top_p=top_p))
        if typical_p is not None and typical_p < 1.0:
            self.warpers.append(TypicalLogitsWarper(mass=typical_p))

        self.cuda_graph = None
        self.static_scores = None
        self.static_warped_scores = None
        self.static_next_logprob = None

There might be something I missed here or misunderstood in your question.
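
For illustration, a minimal stand-alone sketch of that condition (a hypothetical helper, not TGI code); at this layer top_p=1.0 and top_p=None behave the same, since neither adds a TopPLogitsWarper:

# Hypothetical stand-in for the `top_p is not None and top_p < 1.0` branch above.
def top_p_warpers(top_p):
    warpers = []
    if top_p is not None and top_p < 1.0:
        warpers.append(f"TopPLogitsWarper(top_p={top_p})")
    return warpers

print(top_p_warpers(None))   # [] -> no top-p filtering
print(top_p_warpers(1.0))    # [] -> same as None at this layer
print(top_p_warpers(0.99))   # ['TopPLogitsWarper(top_p=0.99)']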

@cornzz

cornzz commented Sep 5, 2024

Ah okay, sorry, then it was a misunderstanding on my side; I assumed from this comment above that it defaults to 1.0 if it was None.

@zhksh

zhksh commented Nov 4, 2024

I still don't understand why top_p can't be 1.0. Not only is it not OpenAI-API compatible, it is really counterintuitive given the meaning of the parameter.
