I noticed that `PreTrainedTokenizer`, specifically `applyChatTemplate`, creates a new Jinja `Template` on every call. This can get expensive, and I'm seeing bottlenecks while building https://github.com/dinoki-ai/osaurus. Benchmarking against the Python implementation, I'm noticing significant TTFT (time to first token) delays.

Is it naive of me to think we could just memoize the compiled templates so a new one isn't created on each call to `applyChatTemplate`?

The Python version uses a cache for exactly this, as seen here: https://github.com/huggingface/transformers/blob/main/src/transformers/utils/chat_template_utils.py#L480
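For reference, here's a minimal sketch of what I mean, assuming the throwing `Template(_:)` initializer from the Swift Jinja package that swift-transformers uses; `TemplateCache` and its API are hypothetical names, not existing library code:

```swift
import Foundation
import Jinja

/// Hypothetical cache of compiled templates, keyed by the raw template
/// string, so repeated applyChatTemplate calls reuse the parsed Template
/// instead of re-parsing the Jinja source every time.
final class TemplateCache {
    static let shared = TemplateCache()

    private var cache: [String: Template] = [:]
    private let lock = NSLock()  // keep lookups safe across concurrent calls

    /// Returns the cached compiled template, compiling and storing it on a miss.
    func template(for source: String) throws -> Template {
        lock.lock()
        defer { lock.unlock() }
        if let cached = cache[source] {
            return cached
        }
        let compiled = try Template(source)  // assumed Jinja API: throwing init from a String
        cache[source] = compiled
        return compiled
    }
}
```

Then `applyChatTemplate` could call `try TemplateCache.shared.template(for: chatTemplate)` instead of constructing the `Template` directly, which would roughly mirror the cached template compilation in the Python utility linked above.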