Add nanochat #1441
cc @nico-martin for tokenizers.js
We'd need to add these to the mapping
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I tried it on my MacBook and it takes super long to process and generate tokens. Also, it only runs on the wasm runtime. Is that expected?
Yeah, I'm still figuring out the best quantization strategy here: the current q4 produces very poor outputs, so I think a mixed-precision approach is needed.
Do you see any errors on WebGPU? 👀 The model might only run in q4f16 on WebGPU (but with poor quality).
The new quantizations are much better from my testing 👍
I did also encounter the WebGPU bug, but this is an issue with the model (not really a bug, more of a "backwards-compatibility" issue) and not the PR, so I will fix it on the HF side 👍
This has now been fixed 👍 https://huggingface.co/onnx-community/nanochat-d32-ONNX/commit/5e500c2ad822ea5379361f6fc08f3da9bb55fec1 The model (q4) now runs well on WebGPU in-browser, even on the older JS EP. q4f16 and fp16 seem to have some overflow issues on WebGPU. Not sure what the best options are here, since the spec doesn't support bf16. tc39/proposal-float16array#4 (comment)
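To illustrate why bf16 would help with the overflow mentioned above, here is a small sketch comparing the dynamic range of the two 16-bit formats. It is not taken from the PR: `overflowsFp16`, `toBf16`, and the sample value 70000 are hypothetical, and the bf16 helper truncates rather than rounds. Whether the model's activations actually exceed the fp16 range is an assumption.

```javascript
// Illustrative only: neither helper comes from Transformers.js or ONNX Runtime.

// The largest finite float16 value is 65504. With round-to-nearest-even,
// any magnitude from 65520 upward rounds to Infinity.
const FP16_OVERFLOW_THRESHOLD = 65520;

// True when a finite value would become +/-Infinity once cast to float16.
function overflowsFp16(x) {
  return Number.isFinite(x) && Math.abs(x) >= FP16_OVERFLOW_THRESHOLD;
}

// Approximate a bfloat16 cast by keeping only the top 16 bits of the
// float32 encoding (sign, 8 exponent bits, 7 mantissa bits).
// Real hardware usually rounds instead of truncating.
function toBf16(x) {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, x);
  view.setUint32(0, (view.getUint32(0) & 0xffff0000) >>> 0);
  return view.getFloat32(0);
}

const activation = 70000; // hypothetical intermediate value
console.log(overflowsFp16(activation)); // true
console.log(toBf16(activation)); // 69632 (less precise, but still finite)
```

bf16 keeps the 8-bit exponent of float32, so it trades precision for range, which is why it tends to avoid this kind of overflow.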
https://github.com/karpathy/nanochat
Example usage:
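A minimal sketch of what usage might look like, not the snippet from the PR description. It assumes a `@huggingface/transformers` release that includes nanochat support, and uses the `onnx-community/nanochat-d32-ONNX` model with the q4 weights reported above to run well on WebGPU. `buildMessages` and `generate` are hypothetical helper names, and the prompt and `max_new_tokens` value are arbitrary.

```javascript
// Hypothetical helper: wraps a prompt in the chat-message format that
// text-generation pipelines accept.
function buildMessages(prompt) {
  return [{ role: "user", content: prompt }];
}

// Assumes @huggingface/transformers is installed and supports nanochat.
async function generate(prompt, device = "webgpu") {
  const { pipeline } = await import("@huggingface/transformers");
  const generator = await pipeline(
    "text-generation",
    "onnx-community/nanochat-d32-ONNX",
    { dtype: "q4", device },
  );
  const output = await generator(buildMessages(prompt), { max_new_tokens: 64 });
  // For chat input, generated_text is the message list with the reply appended.
  return output[0].generated_text.at(-1).content;
}

// Usage (downloads the model on first call):
// generate("Why is the sky blue?").then(console.log);
```

Passing `"wasm"` as the second argument to `generate` would select the CPU fallback instead of WebGPU.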
Linked to huggingface/transformers#41634