
feat: Add weakref support to Tokenizer class #1958

Merged
ArthurZucker merged 3 commits into huggingface:main from mrkm4ntr:weakref on Mar 25, 2026

Conversation

@mrkm4ntr (Contributor) commented Mar 2, 2026

Motivation

The tokenizers.Tokenizer class currently does not support Python weak references, which prevents its use with frameworks that rely on weakref for resource management.

Error without this change:

```python
import weakref
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
weak_ref = weakref.ref(tokenizer)
# TypeError: cannot create weak reference to 'tokenizers.Tokenizer' object
```

Use cases that benefit:

  • Apache Beam's shared.Shared for multi-threaded model sharing
  • Dask, Ray, and other distributed computing frameworks
  • Memory-efficient caching patterns using weakref.WeakValueDictionary
  • Any Python code using weak references for resource lifecycle management

Solution

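Add the weakref option to the #[pyclass] attribute of the Tokenizer class in the PyO3 bindings, so that its instances can be the target of weakref.ref and related APIs.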
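The weakref option is a PyO3 feature, so the Rust change itself is not reproduced here. As a rough pure-Python analogy (not the tokenizers code), a class that declares __slots__ without "__weakref__" has no weak-reference slot and fails in the same way the Tokenizer did, while adding "__weakref__" to __slots__ reserves that slot, much as #[pyclass(weakref)] does for an extension type. NoWeakref, WithWeakref, and supports_weakref are hypothetical names used only for illustration.

```python
import weakref


class NoWeakref:
    # No "__weakref__" in __slots__: instances lack a weak-reference slot,
    # similar to a #[pyclass] declared without the weakref option.
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


class WithWeakref:
    # "__weakref__" reserves the slot weakref.ref() needs; roughly the
    # pure-Python counterpart of #[pyclass(weakref)].
    __slots__ = ("value", "__weakref__")

    def __init__(self, value):
        self.value = value


def supports_weakref(obj) -> bool:
    """Return True if a weak reference to `obj` can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True
```

The same supports_weakref probe could be pointed at a Tokenizer instance to check whether an installed tokenizers version includes this change.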

@ArthurZucker (Collaborator) left a comment


SGTM ty

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator)

can you run fmt + clippy!?

@ArthurZucker merged commit fbf1f1a into huggingface:main on Mar 25, 2026
30 of 33 checks passed

3 participants