Describe the feature or improvement you're requesting
The chatml.md document says every message is represented as
<|im_start|>{role}\n{text}<|im_end|>\n
That would add 5 more tokens on top of the text itself.
This is correct for the gpt-3.5-turbo-0301 model. For the gpt-3.5-turbo-0613 and gpt-4 models, however, it seems that every message is represented in the following form:
<|im_start|>{role}\n{text}<|im_end|>
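If that's the case, the per-message overhead differs by exactly one token between the two formats. Here is a minimal sketch with tiktoken, assuming the special tokens and each newline cost one token apiece (message_tokens and the trailing_newline flag are just illustrative names, not anything from the API):

import tiktoken

# "hello" encodes to [15339] below, so the encoder is presumably cl100k_base.
enc = tiktoken.get_encoding("cl100k_base")

def message_tokens(role: str, content: str, trailing_newline: bool) -> int:
    # Assumed framing cost: <|im_start|> + newline after the role + <|im_end|>,
    # each counted as a single token (not confirmed anywhere official).
    framing = 3
    if trailing_newline:
        framing += 1  # the extra newline after <|im_end|> described in chatml.md
    return framing + len(enc.encode(role)) + len(enc.encode(content))

# chatml.md rule (matches gpt-3.5-turbo-0301): 6 tokens for a "hello" user message
print(message_tokens("user", "hello", trailing_newline=True))
# Guessed gpt-3.5-turbo-0613 / gpt-4 rule (no trailing newline): 5 tokens
print(message_tokens("user", "hello", trailing_newline=False))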
For example:
>>> openai.ChatCompletion.create(
...     model="gpt-3.5-turbo-0613",
...     messages=[{"role":"user", "content":"hello"}])
<OpenAIObject chat.completion id=chatcmpl-7WK2PVLogo1vxUXDAgApf9JWbXsET at 0x1da6e756210> JSON: {
  "id": "chatcmpl-7WK2PVLogo1vxUXDAgApf9JWbXsET",
  "object": "chat.completion",
  "created": 1687937877,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 9,
    "total_tokens": 17
  }
}
>>> encoder.encode("hello")
[15339]
According to the response, prompt_tokens is 8. If we follow the rule described in chatml.md, the prompt would be represented as
<|im_start|>user\nhello<|im_end|>\n<|im_start|>assistant\n
That would be 9 tokens. My guess is that there's no need for the '\n' after the <|im_end|> token.
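Spelling out the arithmetic behind that guess, assuming each special token, the role, and each newline cost one token, and that the reply is primed with <|im_start|>assistant\n (also an assumption on my part):

# Piece-by-piece tally for messages=[{"role":"user", "content":"hello"}]
# under the guessed gpt-3.5-turbo-0613 format (no "\n" after <|im_end|>).
pieces = [
    ("<|im_start|>", 1),  # special token, assumed cost
    ("user",         1),  # role, one token in cl100k_base
    ("\n",           1),  # newline after the role
    ("hello",        1),  # content, [15339]
    ("<|im_end|>",   1),  # special token, assumed cost
    ("<|im_start|>", 1),  # start of the assumed assistant priming
    ("assistant",    1),  # one token in cl100k_base
    ("\n",           1),  # newline after "assistant"
]
print(sum(count for _, count in pieces))  # 8, matching prompt_tokens above
# Adding the trailing "\n" after <|im_end|> from chatml.md would make this 9.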
How can I confirm that? Thanks.
Additional context
No response