-
Notifications
You must be signed in to change notification settings - Fork 833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to add extra tokens in tiktoken? #9
Comments
You can create your own Encoding object by passing in a dict from token-bytes to the integer token value. Line 10 in 4226a6c
You can see examples of the arguments passed to the constructor here: tiktoken/tiktoken_ext/openai_public.py Line 15 in 4226a6c
|
If you want to have your own encoding registered so that Line 22 in 4226a6c
|
Thank you so much for the swift reply! I'll have a try. |
Sorry for "hijacking", this isssue, but feel it is related enough, instead of opening a separate issue. When looking at https://huggingface.co/docs/tokenizers/api/added-tokens, what would be the best way, to (as much as possible) stay in-sync with the original tokenizer configs), when adding special tokens in tiktoken? As to match, |
I've added more documentation for this over here: https://github.com/openai/tiktoken#extending-tiktoken |
Great work! I want to know how can we add customizable extra tokens to tiktoken. Thank you!
The text was updated successfully, but these errors were encountered: