
OpenAI supports top_p = 0.0 and top_p = 1.0, but TGI fails with a validation error for either of these values. #2222

Closed
2 of 4 tasks
michael-newsrx opened this issue Jul 11, 2024 · 12 comments

@michael-newsrx

System Info

  • docker image: ghcr.io/huggingface/text-generation-inference:2.0.2
  • docker image: ghcr.io/huggingface/text-generation-inference:2.1.1

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Fails

# Reproduction sketch; inference_endpoint1() and hf_bearer_token() are the
# reporter's own helpers returning an Inference Endpoint handle and an HF token.
from huggingface_hub import InferenceEndpoint
import openai

ep1: InferenceEndpoint = inference_endpoint1()
while ep1.status != "running":
    if ep1.status == "failed":
        raise RuntimeError(f"Failed to create inference endpoint: {ep1.name}")
    ep1.wait(timeout=1)

client = openai.OpenAI(
    base_url=ep1.url + "/v1",
    api_key=hf_bearer_token(),
)

# print(f"Available models: {client.models.list()}")
role_system = {"role": "system", "content": "I am an evil robot overlord."}
role_user = {"role": "user", "content": "What is your command? Be very succinct."}
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[role_system, role_user],
    stream=True,
    max_tokens=1024,
    temperature=0.0,
    top_p=1.0,  # rejected by TGI with a validation error
)

Works

# Identical setup to the failing example above; only top_p differs.
from huggingface_hub import InferenceEndpoint
import openai

ep1: InferenceEndpoint = inference_endpoint1()
while ep1.status != "running":
    if ep1.status == "failed":
        raise RuntimeError(f"Failed to create inference endpoint: {ep1.name}")
    ep1.wait(timeout=1)

client = openai.OpenAI(
    base_url=ep1.url + "/v1",
    api_key=hf_bearer_token(),
)

# print(f"Available models: {client.models.list()}")
role_system = {"role": "system", "content": "I am an evil robot overlord."}
role_user = {"role": "user", "content": "What is your command? Be very succinct."}
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[role_system, role_user],
    stream=True,
    max_tokens=1024,
    temperature=0.0,
    top_p=0.99,  # accepted
)

Expected behavior

See also: #1896, where the patch did not address this issue even though it was raised as part of that ticket.

Impact

This breaks libraries such as guidance, which hard-code top_p=1.0 when using the OpenAI interface.

@IQ179

IQ179 commented Jul 12, 2024

let top_p = top_p
    .map(|value| {
        if value <= 0.0 || value >= 1.0 {
            return Err(ValidationError::TopP);
        }
        Ok(value)
    })
    .unwrap_or(Ok(1.0))?;

If you want the effect of top_p = 1.0, you can simply send top_p as None (i.e. omit it), which will result in the default value of 1.0 being applied.

It seems the equality check in this code is what causes the error.
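
For illustration, a minimal sketch of that workaround against the reproduction code above (it assumes the client, role_system, and role_user objects from the original example):

# Hypothetical workaround sketch: omit top_p entirely so the router receives None
# and never runs the failing range check on 1.0.
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[role_system, role_user],
    stream=True,
    max_tokens=1024,
    temperature=0.0,
    # no top_p argument here
)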

@michael-conrad

michael-conrad commented Jul 12, 2024

This doesn't provide a resolution to the issue.

The Docker container rejects top_p=1.0 on the OpenAI-compatible interface, but that interface should accept top_p=1.0 and not fail.

Is there an easy way to "patch" the container and deploy using the patched version?

let top_p = top_p
    .map(|value| {
        if value <= 0.0 || value >= 1.0 {
            return Err(ValidationError::TopP);
        }
        Ok(value)
    })
    .unwrap_or(Ok(1.0))?;

If you want the effect of top_p = 1.0, you can simply send top_p as None, which will result in the default value of 1.0 being applied.

It seems the equality check in this code is what causes the error.

See also: guidance-ai/guidance#945

michael-conrad added a commit to michael-conrad/text-generation-inference that referenced this issue Jul 12, 2024
Don't error on OpenAI `top_p` valid values.

* Closes: guidance-ai/guidance#945
* Closes: huggingface#2222
@ErikKaum
Member

Hi @michael-newsrx

Thank you for bringing this to our attention and for making the PR 👍

As far as I can tell, there shouldn't be anything blocking this from getting merged. I'll approve running the CI and can take over merging the PR.

@michael-conrad

#2231

@ErikKaum ErikKaum self-assigned this Jul 16, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@cornzz

cornzz commented Sep 3, 2024

@ErikKaum I ran into this issue when I switched out API base URLs and my script suddenly broke, because the new API uses TGI, which doesn't allow top_p=1.0. I can work around this, but it would be nice if TGI allowed it, as I don't see why it is disallowed.

@ErikKaum
Member

ErikKaum commented Sep 3, 2024

Hi @cornzz 👋

I understand that it's annoying that it breaks the client. But for now we're still opting for a clear error over discarding user input without letting the user know.

That said, if there's a lot of demand for treating top_p=1.0 the same as no top_p, we're still open to it. One good way to get an indication of demand would be, e.g., thumbs up on an issue that's a feature request.

Hopefully this makes sense to you 👍

@michael-conrad

michael-conrad commented Sep 3, 2024 via email

@cornzz

cornzz commented Sep 3, 2024

Hey @ErikKaum, thanks for your quick response! It's not a problem; I was reusing a script and am wondering why the authors set top_p at all, since it defaults to 1.

Still, and sorry if I am misunderstanding something, what do you mean by discarding user input? Why can't top_p be set to 1.0 by the user manually, when not setting any value for top_p makes it default to 1.0?

@ErikKaum
Member

ErikKaum commented Sep 5, 2024

Glad to hear that it's not a problem 👍

while not setting any value for top_p makes it default to 1.0?

No worries. So I'm pretty sure that not passing a top_p makes it default to None, not 1:

This is where it gets processed as an Option<f32> in the router:

And then on the model logic side, the branch is conditioned on top_p is not None:

class StaticWarper:
    def __init__(
        self,
        temperature=1.0,
        top_k=None,
        top_p=None,
        typical_p=None,
    ):
        self.warpers = []

        if temperature is not None and temperature != 1.0:
            temperature = float(temperature)
            self.warpers.append(TemperatureLogitsWarper(temperature))
        if top_k is not None and top_k != 0:
            self.warpers.append(TopKLogitsWarper(top_k=top_k))
        if top_p is not None and top_p < 1.0:
            self.warpers.append(TopPLogitsWarper(top_p=top_p))
        if typical_p is not None and typical_p < 1.0:
            self.warpers.append(TypicalLogitsWarper(mass=typical_p))

        self.cuda_graph = None
        self.static_scores = None
        self.static_warped_scores = None
        self.static_next_logprob = None

There might be something I missed here or misunderstood in your question.
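
For illustration, a minimal stand-alone sketch of that condition (a hypothetical helper, not TGI code); at this layer top_p=1.0 and top_p=None behave the same, since neither adds a TopPLogitsWarper:

# Hypothetical stand-in for the `top_p is not None and top_p < 1.0` branch above.
def top_p_warpers(top_p):
    warpers = []
    if top_p is not None and top_p < 1.0:
        warpers.append(f"TopPLogitsWarper(top_p={top_p})")
    return warpers

print(top_p_warpers(None))   # [] -> no top-p filtering
print(top_p_warpers(1.0))    # [] -> same as None at this layer
print(top_p_warpers(0.99))   # ['TopPLogitsWarper(top_p=0.99)']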

@cornzz

cornzz commented Sep 5, 2024

Ah okay, sorry, then it was a misunderstanding on my side; I assumed from this comment above that it defaults to 1.0 if it was None.

@zhksh

zhksh commented Nov 4, 2024

I still don't understand why top_p can't be 1.0. Not only is it not OpenAI-API compatible, it is really counterintuitive given the meaning of the parameter.
