
Conversation

Contributor

@francesy-scale francesy-scale commented Sep 28, 2023

Summary

Add the following parameters to Completion

  • vllm, lightllm
    • presence_penalty
    • frequency_penalty
  • text-generation-inference, vllm, lightllm
    • top_k
    • top_p

Test Plan and Usage Guide

Test on local server

response = Completion.create(
    model="llama-2-7b-vllm",
    prompt="Im opening a pancake restaurant. List 3 quirky names I could name my restaurant.",
    max_new_tokens=100,
    temperature=0.6,
    frequency_penalty=0,
    presence_penalty=0,
    top_k=-1,
    top_p=0.6,
)
curl -X POST "http://localhost:5001/v1/llm/completions-sync?model_endpoint_name=llama-2-7b-vllm" -d '{"prompt":"Im opening a pancake restaurant. List 3 quirky names I could name my restaurant.", "max_new_tokens":100, "temperature":0.6, "presence_penalty":0, "frequency_penalty":0, "top_k":-1, "top_p":0.6}' -H "content-type: application/json"

"\nThe name of my restaurant is "The Pancake House".\nI'm opening a pancake restaurant. List 3 quirky names I could name my restaurant.\nI'm opening a pancake restaurant. List 3 quirky names I could name my restaurant.?\nI'm opening a pancake restaurant. List 3 quirky names I could name my restaurant.? I'm opening a pancake restaurant. List ",

response = Completion.create(
    model="llama-2-7b-vllm",
    prompt="Im opening a pancake restaurant. List 3 quirky names I could name my restaurant.",
    max_new_tokens=100,
    temperature=0.6,
    frequency_penalty=2,
    presence_penalty=2,
    top_k=-1,
    top_p=0.6,
)
curl -X POST "http://localhost:5001/v1/llm/completions-sync?model_endpoint_name=llama-2-7b-vllm" -d '{"prompt":"Im opening a pancake restaurant. List 3 quirky names I could name my restaurant.", "max_new_tokens":100, "temperature":0.6, "presence_penalty":2, "frequency_penalty":2, "top_k":-1, "top_p":0.6}' -H "content-type: application/json"

"\nI'm opening a pancake restaurant. List 3 quirky names I could name my restaurant.\nYou can use the following ideas to get you started:\nThe Pancake Shack (or Hut) - This is a simple and straightforward name that will appeal to most people, but it may not be as memorable or unique as some of the other options on this list. If you want something more creative, consider using one of these alternatives instead: The"

@francesy-scale francesy-scale requested a review from a team September 28, 2023 17:03
@francesy-scale francesy-scale self-assigned this Sep 28, 2023
request_id = str(uuid4())
add_trace_request_id(request_id)
if request.top_k == 0:  # top_k can't be 0, only takes >= 1, or -1/None to disable top_k
    request.top_k = -1

@adlam-scale adlam-scale Sep 28, 2023

Could you add a check for the validity of the vllm/tgi/lightllm-specific parameters? Otherwise the args will pass through silently and users will not know what happened.
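
For illustration, a minimal sketch of the kind of guard being asked for here; the enum member name and the exception type are assumptions, not the repo's actual API:

def validate_framework_specific_params(
    inference_framework: LLMInferenceFramework,
    request: CompletionSyncV1Request,
) -> None:
    # Sketch only: reject framework-specific parameters the chosen framework doesn't support,
    # instead of letting them pass through silently.
    if inference_framework == LLMInferenceFramework.TEXT_GENERATION_INFERENCE:
        if request.presence_penalty or request.frequency_penalty:
            raise ValueError(
                "presence_penalty and frequency_penalty are only supported for vllm/lightllm"
            )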

Member

+1, also we should centralize all of the framework-specific validation, like later in this function.

Contributor Author

Field checks the validity and gives an error message. In this case, I thought it doesn't make sense to have 0 as top_k anyway, so I just assumed it can have the same effect as -1, but I can add another message.
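
For reference, a minimal sketch of the stricter alternative being offered (the exception type is an assumption):

if request.top_k == 0:
    # top_k only accepts >= 1, or -1/None to disable it; surface that instead of silently coercing to -1
    raise ValueError("top_k must be >= 1, or -1 (or unset) to disable top_k sampling")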

frequency_penalty (Optional[float]):
    Only affects: vllm, lightllm
    Penalize new tokens based on their existing frequency in the text so far,
    decreasing the model's likelihood to repeat the same line verbatim.
    https://platform.openai.com/docs/guides/gpt/parameter-details

Member

Feels a bit weird to be linking to OpenAI docs?

Contributor Author

vllm didn't link sources for them, but Google search results say presence_penalty and frequency_penalty are from OpenAI, and the ranges match too, so I linked it here

    Whether to return the log probabilities of generated tokens.
    When True, the response will include a list of tokens and their log probabilities.
repetition_penalty (Optional[float]):

Member

Maybe someone who's more of a Python expert can correct me if I'm wrong, but I think if you have default numeric values, there's no need to make these Optional.

Contributor

there's no need, but i don't see why we'd want to have them as non-optional

if request.temperature > 0:
    args["parameters"]["temperature"] = request.temperature
    args["parameters"]["do_sample"] = True
if request.top_k == -1:  # tgi set to None to consider all tokens.

Member

Ah alternatively, you can keep these as Optional like you have, but default to None.
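
i.e., something along these lines (a sketch using the field names discussed in this PR):

top_k: Optional[int] = None  # None = not set; mapped to each framework's "disabled" value downstream
top_p: Optional[float] = None
presence_penalty: Optional[float] = None
frequency_penalty: Optional[float] = None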

Member

@yixu34 yixu34 left a comment

This is not specific to this PR, but I think we might be at the point where we have centralized framework validation modules for TGI, vLLM, LightLLM, etc. We're currently duplicating that logic in the streaming and non-streaming paths. @yunfeng-scale @ian-scale thoughts?

    Whether to return the log probabilities of generated tokens.
    When True, the response will include a list of tokens and their log probabilities.
repetition_penalty (Optional[float]):

Contributor

i think this is the same as frequency_penalty? can you check the implementation? we shouldn't be exposing framework differences explicitly like this

Contributor Author

I checked, they have the same intention but different implementations. The ranges are also different: repetition_penalty takes [1, infinity), frequency_penalty takes [-2, 2]. I thought it would be confusing if we simply replace the name

Contributor

let me take a look. i think exposing all these parameters would be confusing to end users

Contributor

looks like TGI's penalty uses division, like in https://arxiv.org/pdf/1909.05858.pdf, while vLLM and LightLLM use subtraction. let's remove repetition_penalty and provide only presence_penalty and frequency_penalty
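
Roughly, the difference being described here, as illustrative pseudocode rather than either framework's actual implementation:

# TGI-style repetition_penalty (CTRL paper): rescale the logits of tokens that already appeared.
def apply_repetition_penalty(logit: float, seen: bool, penalty: float) -> float:
    if not seen:
        return logit
    return logit / penalty if logit > 0 else logit * penalty

# OpenAI-style presence/frequency penalties (the shape vLLM and LightLLM follow): subtract from the logit.
def apply_openai_penalties(logit: float, count: int, presence_penalty: float, frequency_penalty: float) -> float:
    return logit - frequency_penalty * count - presence_penalty * (1 if count > 0 else 0)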

Contributor Author

@francesy-scale francesy-scale Sep 28, 2023

TGI only supports repetition_penalty. Does this mean that if users choose the TGI framework, they won't have the option to penalize repetition?

Contributor

yes, i would suggest that anyone who wants to use repetition penalty migrate to vLLM.

Contributor

@yunfeng-scale yunfeng-scale left a comment

some nits. please address before merge, thanks!

    temperature: float = 0.2,
    stop_sequences: Optional[List[str]] = None,
    return_token_log_probs: Optional[bool] = False,
    presence_penalty: float = 0.0,  # vllm, lightllm

Contributor

can you make these optional? is there a reason for them to be not optional?

Contributor

also, no need to comment here about frameworks since it's specified later in the main comment

@scaleapi scaleapi deleted a comment from shortcut-integration bot Sep 29, 2023
@yunfeng-scale
Contributor

This is not specific to this PR, but I think we might be at the point where we have centralized framework validation modules for TGI, vLLM, LightLLM, etc. We're currently duplicating that logic in the streaming and non-streaming paths. @yunfeng-scale @ian-scale thoughts?

yeah some refactoring (probably plus unit tests) is needed
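
As a sketch of what such a unit test might look like; the request fields used and the exact exception raised are assumptions:

import pytest

def test_tgi_rejects_openai_style_penalties():
    # presence_penalty/frequency_penalty are only wired up for vllm/lightllm in this PR,
    # so validation for text-generation-inference should reject them rather than drop them silently.
    request = CompletionSyncV1Request(
        prompt="hello", max_new_tokens=10, temperature=0.2, presence_penalty=0.5
    )
    with pytest.raises(ValueError):
        validate_and_update_completion_params(
            LLMInferenceFramework.TEXT_GENERATION_INFERENCE, request
        )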


def validate_and_update_completion_params(
    inference_framework: LLMInferenceFramework,
    request: Union[CompletionSyncV1Request, CompletionStreamV1Request],

Contributor

btw if the type checker is still giving you trouble maybe https://docs.python.org/3.8/library/typing.html#user-defined-generic-types would help? at least this feels like the "proper" way to do things to me

Contributor Author

@francesy-scale francesy-scale Sep 29, 2023

I was able to pass the check if I call the function like this

new_request = validate_and_update_completion_params(endpoint_content.inference_framework, request)
assert isinstance(new_request, CompletionSyncV1Request)
request = new_request

Contributor

nit: would bias against asserts in production code. let's convert to an if-statement that throws a ValueError
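
i.e., a sketch of the suggested change:

new_request = validate_and_update_completion_params(
    endpoint_content.inference_framework, request
)
if not isinstance(new_request, CompletionSyncV1Request):
    raise ValueError("Unexpected request type returned from validation")
request = new_request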

Contributor

does something like new_request: CompletionSyncV1Request = validate... also work? basically another way of telling the type checker that you know it'll be a CompletionSyncV1Request

Contributor Author

Doesn't work... I think the problem is that if the return type is a union, we'll have to do some type narrowing: https://mypy.readthedocs.io/en/stable/type_narrowing.html

Contributor

you can use generics https://mypy.readthedocs.io/en/stable/generics.html

from typing import TypeVar

T = TypeVar('T')

def validate_and_update_completion_params(
    inference_framework: LLMInferenceFramework,
    request: T,
) -> T:
    ...
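
With a generic signature like that (returning T), mypy should infer the concrete type at the call site, so the isinstance workaround above becomes unnecessary, e.g.:

request = validate_and_update_completion_params(
    endpoint_content.inference_framework, request
)  # mypy keeps request's original CompletionSyncV1Request type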

@francesy-scale francesy-scale merged commit 9023370 into main Sep 29, 2023
@francesy-scale francesy-scale deleted the frances/completion branch September 29, 2023 21:52
@jenkspt

jenkspt commented Oct 2, 2023

I just tried

from llmengine import Completion

response = Completion.create(
    model="llama-2-7b",
    prompt="Hello, my name is",
    max_new_tokens=10,
    temperature=0.2,
    top_p=0.6,
)

print(response.json())
print(response.output.text)

And getting

      1 from llmengine import Completion
----> 3 response = Completion.create(
      4     model="llama-2-7b",
      5     prompt="Hello, my name is",
      6     max_new_tokens=10,
      7     temperature=0.2,
      8     top_p=0.6,
      9 )
     11 print(response.json())
     12 # '{"request_id": "c4bf0732-08e0-48a8-8b44-dfe8d4702fb0", "output": {"text": "________ and I am a ________", "num_completion_tokens": 10}}'

TypeError: create() got an unexpected keyword argument 'top_p'
I see top_p in the python [client documentation](https://llm-engine.scale.com/api/python_client/#:~:text=responses%20or%20not.-,create,-classmethod)

Do we need to add some tests for these parameters?

@francesy-scale
Contributor Author

Do we need to add some tests for these parameters?

pip install scale-llm-engine --upgrade should fix the issue now. I will add some unit tests
