Conversation

@xenova (Collaborator) commented on Oct 16, 2025:

https://github.com/karpathy/nanochat

Example usage:

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/nanochat-d32-ONNX",
  { dtype: "q4" }, // Options: "fp32", "fp16", "q4", "q4f16"
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: true,
  }),
});
console.log(output[0].generated_text.at(-1).content);

Linked to huggingface/transformers#41634

@xenova (Collaborator, Author) commented:
cc @nico-martin for tokenizers.js

We'd need to add these to the mapping

@HuggingFaceDocBuilderDev (Collaborator) commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@nico-martin (Collaborator) commented:
I tried it on my MacBook and it takes super long to process and generate tokens. Also it only runs on the wasm runtime. Is that expected?

@xenova (Collaborator, Author) commented on Oct 18, 2025:

Yeah, I'm still figuring out the best quantization strategy here: the current q4 export produces very poor outputs, so I think a mixed-precision approach is needed.

> Also it only runs on the wasm runtime.

Do you see any errors on WebGPU? 👀 The model might only run in q4f16 on WebGPU (but with poor quality).
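
For reference, a minimal sketch for forcing the WebGPU backend (this uses the standard `device`/`dtype` pipeline options; picking q4f16 here is an assumption based on the comment above):

import { pipeline } from "@huggingface/transformers";

// Sketch: explicitly request the WebGPU execution provider so that any
// runtime errors surface instead of staying on the WASM backend.
const generator = await pipeline(
  "text-generation",
  "onnx-community/nanochat-d32-ONNX",
  { device: "webgpu", dtype: "q4f16" }, // dtype choice is an assumption
);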

@xenova (Collaborator, Author) commented on Oct 19, 2025:

The new quantizations are much better in my testing 👍
https://huggingface.co/onnx-community/nanochat-d32-ONNX/commit/e0d8a83ed5e1cd954a7377ad57beee4cd653f7c1

@xenova merged commit 85b8eb2 into main on Oct 19, 2025, with 4 checks passed, and deleted the add-nanochat branch at 17:47 the same day.
@xenova (Collaborator, Author) commented on Oct 19, 2025:

I did also encounter the WebGPU bug, but this is an issue with the model (or rather, a "backwards-compatibility" issue) and not with the PR, so I will fix it on the HF side 👍

An error occurred during model execution: "Error: [WebGPU] Kernel "[Mul] /model/layers.0/attn/k_rotary/x2_cos_mul" failed. Error: Can't perform binary op on the given tensors".

@xenova (Collaborator, Author) commented on Oct 20, 2025:

This has now been fixed 👍 https://huggingface.co/onnx-community/nanochat-d32-ONNX/commit/5e500c2ad822ea5379361f6fc08f3da9bb55fec1

The model (q4) now runs well on WebGPU in-browser, even on the older JS EP.

q4f16 and fp16 seem to have some overflow issues on WebGPU. I'm not sure what the best options are here, since the spec doesn't support bf16. See tc39/proposal-float16array#4 (comment).
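
If it helps with testing, here is a rough sketch (an assumed harness, not part of this PR) for comparing dtypes on WebGPU to spot the overflow by eye:

import { pipeline } from "@huggingface/transformers";

const messages = [{ role: "user", content: "What is the capital of France?" }];

// Sketch: load the model at each dtype on WebGPU and print a short greedy
// completion, so degraded or overflowing exports are easy to spot.
for (const dtype of ["q4", "q4f16", "fp16"]) {
  const generator = await pipeline(
    "text-generation",
    "onnx-community/nanochat-d32-ONNX",
    { device: "webgpu", dtype },
  );
  const output = await generator(messages, { max_new_tokens: 32, do_sample: false });
  console.log(`[${dtype}]`, output[0].generated_text.at(-1).content);
  await generator.dispose(); // free the session before loading the next dtype
}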
