token count differs from the actual token usage from completions API #37

edwardxwu · 2023-02-24T21:19:35Z

tiktoken returns 1942 tokens for the prompt below, while the completions API reports a usage of only 748.

```python
import openai
import tiktoken

a = """
{
"data": {
"attributes": {
"last_dns_records": [
{
"type": "AAAA",
"value": "2a04:4e42::773",
"ttl": 24
},
{
"type": "TXT",
"value": "d1xTs9+kADZZSz3bPphLpkMXXxBGjqn5vsQHhi2M6lo0r8AdIbm6j8LfQXPujsywVgeGSP+AXWX0vO9Iep5cUg==",
"ttl": 300
},
{
"type": "TXT",
"value": "299762315-4422055",
"ttl": 300
},
{
"type": "TXT",
"value": "google-site-verification=_QivaXNjhXy-V1y_YqrycXdAWZi2mVrcwbXerX6THeY",
"ttl": 300
},
{
"type": "TXT",
"value": "764482256-4422025",
"ttl": 300
},
{
"type": "TXT",
"value": "umfe3f9bni2s85tm3m666qbfal.",
"ttl": 300
},
{
"type": "TXT",
"value": "553992719-4400647",
"ttl": 300
},
{
"type": "TXT",
"value": "755973593-4422016",
"ttl": 300
},
{
"rname": "awsdns-hostmaster.amazon.com",
"retry": 900,
"refresh": 7200,
"minimum": 86400,
"value": "ns-47.awsdns-05.com",
"expire": 1209600,
"ttl": 900,
"serial": 1,
"type": "SOA"
},
{
"type": "TXT",
"value": "globalsign-domain-verification=2lI5pahhCu_jg_2RC5GEdolQmAa4K7rhP7_OA-lZBK",
"ttl": 300
},
{
"type": "TXT",
"value": "google-site-verification=R-Btow3Z8oU_9H1IWU4Gm4lvUQ_OVmsfxonIKhIaiPE",
"ttl": 300
},
{
"type": "TXT",
"value": "294913881-4422049",
"ttl": 300
},
{
"type": "TXT",
"value": "882269757-4422010",
"ttl": 300
},
"""

print(len(tiktoken.get_encoding("gpt2").encode(a)))

completion_api_params = {
    # We use temperature of 0.0 because it gives the most predictable, factual answer.
    "temperature": 0.0,
    "max_tokens": 1,
    "model": "text-davinci-003",
}
response = openai.Completion.create(prompt=a, **completion_api_params)
print(f"completion token usage: {response['usage']}")
```

edwardxwu · 2023-02-24T21:24:43Z

Never mind, I should have read https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb more closely.
`gpt2` is not the right encoding for `text-davinci-003`.

hauntsaninja · 2023-02-24T21:36:54Z

Use `tiktoken.encoding_for_model`.

edwardxwu closed this as completed Feb 24, 2023
