Description
Identify the file to be fixed
The name of the file containing the problem: How_to_count_tokens_with_tiktoken.ipynb
Describe the problem
The example code supplied for computing token counts for chat messages appears to undercount by one token per message: the numbers returned by num_tokens_from_messages() did not match those returned by the API endpoint. The problem is the same with both the gpt-3.5-turbo and gpt-4 endpoints, even though these are separate code paths.
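A minimal sketch of how the discrepancy can be reproduced, assuming the pre-1.0 openai Python client that the cookbook used at the time, with num_tokens_from_messages() taken from the notebook:

```python
import openai  # pre-1.0 client; assumes openai.api_key is already set

example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

for model in ["gpt-3.5-turbo-0301", "gpt-4-0314"]:
    # Token count computed locally by the notebook's helper
    local_count = num_tokens_from_messages(example_messages, model=model)
    # Authoritative count reported back by the API itself
    response = openai.ChatCompletion.create(
        model=model, messages=example_messages, max_tokens=1
    )
    api_count = response["usage"]["prompt_tokens"]
    # With the unpatched notebook code, local_count comes back 1 low per message
    print(f"{model}: local={local_count}, api={api_count}")
```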
Describe a solution
By trial and error I made the following changes on the two lines with the WAS comments:

```python
elif model == "gpt-3.5-turbo-0301":
    tokens_per_message = 5  # WAS 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
    tokens_per_name = -1  # if there's a name, the role is omitted
elif model == "gpt-4-0314":
    tokens_per_message = 4  # WAS 3
    tokens_per_name = 1
```
With the above changes I get the correct token counts for both chat endpoints.
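For completeness, here is a sketch of the helper with both changes applied. The two tokens_per_message lines are the actual fix; the surrounding structure and the final reply-priming constant follow the cookbook version I was working against, so treat those parts as assumptions:

```python
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    """Return the number of tokens used by a list of chat messages."""
    encoding = tiktoken.encoding_for_model(model)
    if model == "gpt-3.5-turbo-0301":
        tokens_per_message = 5  # WAS 4
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif model == "gpt-4-0314":
        tokens_per_message = 4  # WAS 3
        tokens_per_name = 1
    else:
        raise NotImplementedError(f"Not implemented for model {model}.")
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # reply priming, as in the cookbook version I used
    return num_tokens
```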
May I suggest that the tiktoken library itself handle the details of knowing the chat wrapper encoding?
Additional context
I tried to get tiktoken to encode the message wrappers directly, to measure the actual per-message token overhead:

```python
encoding.encode("<|im_start|>system\n<|im_end|>\n", allowed_special="all")
```

and

```python
encoding.encode("<|start|>system\n<|end|>\n", allowed_special="all")
```

But tiktoken does not recognize those start/end markers as special tokens, so the wrapper overhead cannot be measured this way.
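One possible workaround, following the "Extending tiktoken" example in the tiktoken README, is to build a custom Encoding that registers the chat wrapper tokens on top of cl100k_base. The token IDs below are the ones that README example uses, and the private attributes are an implementation detail, so this is a sketch rather than a supported API:

```python
import tiktoken

cl100k_base = tiktoken.get_encoding("cl100k_base")

# Register the ChatML wrapper tokens on top of cl100k_base.
# IDs and private-attribute access follow the "Extending tiktoken"
# example in the tiktoken README.
enc = tiktoken.Encoding(
    name="cl100k_im",
    pat_str=cl100k_base._pat_str,
    mergeable_ranks=cl100k_base._mergeable_ranks,
    special_tokens={
        **cl100k_base._special_tokens,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265,
    },
)

# Now the wrapper encodes as two special tokens plus the ordinary text,
# so the per-message overhead can be counted directly.
tokens = enc.encode("<|im_start|>system\n<|im_end|>\n", allowed_special="all")
print(tokens)
```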