comparison to other tokenizers #11

transitive-bullshit opened this issue May 27, 2023 · 2 comments

This library looks great.

I tried to add it to, but kept running into various ESM import issues.

I'd love to compare it to the other node.js tokenizers on a consistent test set for both accuracy and speed.

Also, the one thing this library is missing currently (from what I could tell; I wasn't able to get it working in my test bed) is a dynamic function to return the tokenizer given a model name. I know the examples show you can do this statically using imports, but for a lot of libraries, the model needs to be customizable at runtime.
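To make the request concrete, here is a minimal sketch of a runtime lookup. It is not the gpt-tokenizer API: the `Tokenizer` interface, `makeStubTokenizer`, `encodingForModel`, and `getTokenizer` are hypothetical names, and the stub encoder is a placeholder that only splits on whitespace. The model-to-encoding pairs follow the mapping published in OpenAI's tiktoken.

```typescript
// Hypothetical interface for illustration; not the gpt-tokenizer API.
interface Tokenizer {
  encodingName: string;
  encode(text: string): number[];
}

type EncodingName = "cl100k_base" | "p50k_base" | "r50k_base";

// Model-to-encoding pairs as listed in OpenAI's tiktoken; extend as needed.
const MODEL_TO_ENCODING = new Map<string, EncodingName>([
  ["gpt-4", "cl100k_base"],
  ["gpt-3.5-turbo", "cl100k_base"],
  ["text-davinci-003", "p50k_base"],
  ["davinci", "r50k_base"],
]);

// Placeholder factory (assumption): a real version would load the BPE ranks
// for the encoding. This stub just emits one number per whitespace-separated word.
function makeStubTokenizer(encodingName: EncodingName): Tokenizer {
  return {
    encodingName,
    encode: (text: string) =>
      text.split(/\s+/).filter(Boolean).map((word) => word.length),
  };
}

// One tokenizer instance per encoding, created on first use.
const tokenizerCache = new Map<EncodingName, Tokenizer>();

// Resolve a model name to its encoding name, failing loudly on unknown models.
function encodingForModel(model: string): EncodingName {
  const encoding = MODEL_TO_ENCODING.get(model);
  if (encoding === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  return encoding;
}

// Return the (cached) tokenizer for a model name chosen at runtime.
function getTokenizer(model: string): Tokenizer {
  const encoding = encodingForModel(model);
  let tokenizer = tokenizerCache.get(encoding);
  if (tokenizer === undefined) {
    tokenizer = makeStubTokenizer(encoding);
    tokenizerCache.set(encoding, tokenizer);
  }
  return tokenizer;
}
```

With something like this, a caller could pass a user-supplied model string to `getTokenizer` instead of choosing a static import at build time. In a real package the factory would probably load the encoding data lazily, for example through a dynamic `import()`, which would make the function asynchronous.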


niieani closed this as completed in 2a55474 on Jun 1, 2023

niieani commented Jun 1, 2023

Thanks @transitive-bullshit!
I saw the issue with default imports and fixed it. The latest version should include the fix.

Submitted a PR to your comparison repo: transitive-bullshit/compare-tokenizers#3.
I see there's some room for improvement in my package regarding performance.
I believe the extra safety features of gpt-tokenizer are what's slowing it down currently.
I'll try to bring the overhead down by making the safety check (allowedSpecialTokens) optional.

github-actions bot commented Jun 1, 2023

:tada: This issue has been resolved in version 2.1.1 :tada:

The release is available on:

Your semantic-release bot 📦🚀
