Add repetition_penalty, top_k, top_p to Completion #295
Conversation
```python
request_id = str(uuid4())
add_trace_request_id(request_id)
if request.top_k == 0:  # top_k can't be 0; it only takes >= 1, or -1/None to disable top_k
    request.top_k = -1
```
Please add a check for the validity of vllm/tgi/lightllm-specific parameters? Otherwise the args will pass through silently and users won't know what happened.
+1, also we should centralize all of the framework-specific validation, like later in this function.
Field checks the validity and gives out an error message. In this case, I thought it doesn't make sense to have 0 as top_k anyway, so I assumed it can have the same effect as -1, but I can add another message.
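For illustration, a sketch of the extra validation message being discussed, assuming a plain ValueError (the codebase's own exception type may differ):

```python
if request.top_k == 0:
    # top_k only accepts values >= 1, or -1/None to disable top-k sampling
    raise ValueError("top_k must be >= 1, or -1/None to disable top_k")
```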
```
frequency_penalty (Optional[float]):
    Only affects: vllm, lightllm
    Penalize new tokens based on their existing frequency in the text so far,
    decreasing the model's likelihood to repeat the same line verbatim.
    https://platform.openai.com/docs/guides/gpt/parameter-details
```
Feels a bit weird to be linking to OpenAI docs?
vLLM didn't link sources for them, but Google search results say presence_penalty and frequency_penalty are from OpenAI, and the ranges match too, so I linked it here.
```
    Whether to return the log probabilities of generated tokens.
    When True, the response will include a list of tokens and their log probabilities.
repetition_penalty (Optional[float]):
```
Maybe someone who's more of a Python expert can correct me if I'm wrong, but I think if you have default numeric values, there's no need to make these Optional.
There's no need, but I don't see why we'd want to have them as non-optional.
```python
if request.temperature > 0:
    args["parameters"]["temperature"] = request.temperature
    args["parameters"]["do_sample"] = True
    if request.top_k == -1:  # tgi: set to None to consider all tokens.
```
Ah alternatively, you can keep these as Optional like you have, but default to None.
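A minimal sketch of the two signatures under discussion (function names are hypothetical; the parameter name is from this PR):

```python
from typing import Optional

# Non-optional with a numeric default: callers always see a concrete float.
def create_nonoptional(presence_penalty: float = 0.0): ...

# Optional defaulting to None: "not set" stays distinguishable from 0.0,
# so each framework can apply its own default downstream.
def create_optional(presence_penalty: Optional[float] = None): ...
```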
yixu34 left a comment:
This is not specific to this PR, but I think we might be at the point where we need centralized framework-validation modules for TGI, vLLM, LightLLM, etc. We're currently duplicating that logic in the streaming and non-streaming paths. @yunfeng-scale @ian-scale thoughts?
```
    Whether to return the log probabilities of generated tokens.
    When True, the response will include a list of tokens and their log probabilities.
repetition_penalty (Optional[float]):
```
I think this is the same as frequency_penalty? Can you check the implementation? We shouldn't be exposing framework differences explicitly like this.
I checked; they have the same intention but different implementations. The ranges also differ: repetition_penalty takes [1, infinity), while frequency_penalty takes [-2, 2]. I thought it would be confusing if we simply replaced the name.
Let me take a look. I think exposing all these parameters would be confusing to end users.
Looks like TGI's penalty uses division, as in https://arxiv.org/pdf/1909.05858.pdf, while vLLM and LightLLM use subtraction. Let's remove repetition_penalty and provide only presence_penalty and frequency_penalty.
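For reference, a simplified sketch of the two penalty styles, reduced to a single logit (assumed from the CTRL paper and the frameworks' samplers; real implementations operate on full logit tensors):

```python
def repetition_penalty_ctrl_style(logit: float, penalty: float) -> float:
    # CTRL-style (TGI): divide positive logits, multiply negative ones,
    # so penalty > 1.0 always makes an already-seen token less likely.
    return logit / penalty if logit > 0 else logit * penalty

def openai_style_penalties(
    logit: float,
    count: int,  # how many times the token has appeared so far
    frequency_penalty: float,
    presence_penalty: float,
) -> float:
    # OpenAI-style (vLLM, LightLLM): subtract from the logit.
    # frequency_penalty scales with the count; presence_penalty applies
    # once if the token has appeared at all.
    presence = presence_penalty if count > 0 else 0.0
    return logit - count * frequency_penalty - presence
```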
TGI only supports repetition_penalty. Does this mean that if users choose the TGI framework, they won't have the option to penalize repetition?
Yes, I would suggest that anyone who wants to use repetition penalty migrate to vLLM.
yunfeng-scale left a comment:
Some nits. Please address before merge, thanks!
```python
temperature: float = 0.2,
stop_sequences: Optional[List[str]] = None,
return_token_log_probs: Optional[bool] = False,
presence_penalty: float = 0.0,  # vllm, lightllm
```
Can you make these optional? Is there a reason for them to be non-optional?
Also, no need to comment about the frameworks here, since that's specified later in the main comment.
Yeah, some refactoring (probably plus unit tests) is needed.
```python
def validate_and_update_completion_params(
    inference_framework: LLMInferenceFramework,
    request: Union[CompletionSyncV1Request, CompletionStreamV1Request],
```
BTW, if the type checker is still giving you trouble, maybe https://docs.python.org/3.8/library/typing.html#user-defined-generic-types would help? At least this feels like the "proper" way to do things to me.
I was able to pass the check if I call the function like this:

```python
new_request = validate_and_update_completion_params(endpoint_content.inference_framework, request)
assert isinstance(new_request, CompletionSyncV1Request)
request = new_request
```
Nit: I'd bias against asserts in production code. Let's convert this to an if-statement that raises a ValueError.
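A sketch of the suggested change, using the same names as the snippet above:

```python
new_request = validate_and_update_completion_params(
    endpoint_content.inference_framework, request
)
if not isinstance(new_request, CompletionSyncV1Request):
    raise ValueError(
        f"Expected CompletionSyncV1Request, got {type(new_request).__name__}"
    )
request = new_request
```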
Does something like new_request: CompletionSyncV1Request = validate... also work? Basically another way of telling the type checker that you know it'll be a CompletionSyncV1Request.
Doesn't work... I think the problem is that when the return type is a union, we have to do some type narrowing: https://mypy.readthedocs.io/en/stable/type_narrowing.html
You can use generics: https://mypy.readthedocs.io/en/stable/generics.html

```python
from typing import TypeVar

T = TypeVar("T")

def validate_and_update_completion_params(
    inference_framework: LLMInferenceFramework,
    request: T,
) -> T:
    ...
```
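A variant with a constrained TypeVar (constraint set assumed from the Union earlier in this thread) keeps mypy's inference tight at the call site without the assert:

```python
from typing import TypeVar

ReqT = TypeVar("ReqT", CompletionSyncV1Request, CompletionStreamV1Request)

def validate_and_update_completion_params(
    inference_framework: LLMInferenceFramework,
    request: ReqT,
) -> ReqT:
    ...

# Call site: mypy now infers CompletionSyncV1Request directly.
request = validate_and_update_completion_params(
    endpoint_content.inference_framework, request
)
```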
I just tried:

```python
from llmengine import Completion

response = Completion.create(
    model="llama-2-7b",
    prompt="Hello, my name is",
    max_new_tokens=10,
    temperature=0.2,
    top_p=0.6,
)
print(response.json())
print(response.output.text)
```

And getting:

I see.

Do we need to add some tests for these parameters?
Summary
Add the following parameters to Completion: repetition_penalty, top_k, top_p.
Test Plan and Usage Guide
Test on local server. Sample outputs:
"\nThe name of my restaurant is "The Pancake House".\nI'm opening a pancake restaurant. List 3 quirky names I could name my restaurant.\nI'm opening a pancake restaurant. List 3 quirky names I could name my restaurant.?\nI'm opening a pancake restaurant. List 3 quirky names I could name my restaurant.? I'm opening a pancake restaurant. List ",
"\nI'm opening a pancake restaurant. List 3 quirky names I could name my restaurant.\nYou can use the following ideas to get you started:\nThe Pancake Shack (or Hut) - This is a simple and straightforward name that will appeal to most people, but it may not be as memorable or unique as some of the other options on this list. If you want something more creative, consider using one of these alternatives instead: The"