fast tok update #13036
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@sayakpaul here is the fix around the

```python
tokenizer._update_trie()
# set correct total vocab size after removing tokens
tokenizer._update_total_vocab_size()
# Fast tokenizers: serialize, filter tokens, reload
```
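Below is a minimal sketch of the "serialize, filter tokens, reload" approach for fast tokenizers mentioned in the last comment above. The function name, the `tokens_to_remove` argument, and the exact JSON layout are illustrative assumptions, not the PR's actual code:

```python
import json

from tokenizers import Tokenizer


def remove_added_tokens_fast(tokenizer, tokens_to_remove):
    # Serialize the Rust backend tokenizer to its JSON representation
    state = json.loads(tokenizer.backend_tokenizer.to_str())
    # Filter the unwanted entries out of the added-tokens table
    state["added_tokens"] = [
        entry for entry in state["added_tokens"]
        if entry["content"] not in tokens_to_remove
    ]
    # Reload the filtered state as the new backend tokenizer
    # (assigning to the private `_tokenizer` attribute is an assumption
    # about how PreTrainedTokenizerFast stores its backend)
    tokenizer._tokenizer = Tokenizer.from_str(json.dumps(state))
    return tokenizer
```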
Would it work with transformers (< v5) as well?
If not, maybe we could keep maintaining two code paths? One for v5 and another one for < v5? That way, in the next release cycle, we can pin the transformers version to >=5.0.0.
WDYT?
sounds good! I added it back
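As a rough illustration of the two-code-path idea agreed on above, the version gate could look something like this; `remove_tokens_fast` is a hypothetical helper standing in for the fast-tokenizer path, and only the two `_update_*` calls come from the diff itself:

```python
from packaging import version

import transformers

IS_TRANSFORMERS_V5 = version.parse(transformers.__version__) >= version.parse("5.0.0")

if IS_TRANSFORMERS_V5:
    # v5+: only fast tokenizers exist, so filter via serialize/reload
    # (remove_tokens_fast is a hypothetical helper, not the PR's code)
    remove_tokens_fast(tokenizer, tokens_to_remove)
else:
    # < v5: slow tokenizers keep a Trie that must be rebuilt after
    # tokens are removed, and the vocab size must be recomputed
    tokenizer._update_trie()
    tokenizer._update_total_vocab_size()
```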
sayakpaul left a comment
Thanks for this work! I left one comment regarding versioning. LMK what you think.
sayakpaul left a comment
Thanks a lot!
@bot /style

Style bot fixed some files and pushed the changes.
Following transformers v5, we no longer have "slow" tokenizers that use a Trie; fast tokenizers are the default. This script previously assumed slow tokenizers were always used, so it is updated to work with fast ones!
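For context, a quick way to see the v5 behavior described here; the model ID is just an example, and `is_fast` is the standard transformers flag distinguishing Rust-backed tokenizers from the removed slow ones:

```python
from transformers import AutoTokenizer

# Under transformers v5, AutoTokenizer returns a fast (Rust-backed)
# tokenizer by default; there is no slow, Trie-based fallback anymore.
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tokenizer.is_fast)  # True
```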