Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add im_start / im_end in cl100k_base #48

Open
spolu opened this issue Mar 5, 2023 · 4 comments
Open

Add im_start / im_end in cl100k_base #48

spolu opened this issue Mar 5, 2023 · 4 comments

Comments

@spolu
Copy link

spolu commented Mar 5, 2023

Not really directly useful given the Chat API...

But triangulating:

We know they exist.

@zyxue
Copy link

zyxue commented Jun 1, 2023

what does im mean in im_start/im_end?

@spolu
Copy link
Author

spolu commented Jun 2, 2023

They are the special tokens used in the OpenAI Chat format as it gets translated and presented to the model.

@youkaichao
Copy link
Contributor

@zyxue It seems to be "input message". Check here.

@microsoftbuild
Copy link

@spolu

It's possible that the APi side uses and extended tokenizer like: https://github.com/openai/tiktoken/tree/main#extending-tiktoken

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants