I noticed that `PreTrainedTokenizer`, specifically `applyChatTemplate`, creates a new Jinja `Template` on every call. This can get expensive, and I'm seeing bottlenecks while building https://github.com/dinoki-ai/osaurus. Benchmarking against the Python implementation, I'm noticing significant TTFT (time to first token) delays.

Is it naive of me to think we could just memoize the compiled templates so a new one isn't created on each call to `applyChatTemplate`?

The Python version uses a cache for exactly this, as seen here: https://github.com/huggingface/transformers/blob/main/src/transformers/utils/chat_template_utils.py#L480
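For reference, here's a minimal sketch of what I mean, assuming the throwing `Template(_:)` initializer from the Swift Jinja package that swift-transformers uses; `TemplateCache` and its API are hypothetical names, not existing library code:

```swift
import Foundation
import Jinja

/// Hypothetical cache of compiled templates, keyed by the raw template
/// string, so repeated applyChatTemplate calls reuse the parsed Template
/// instead of re-parsing the Jinja source every time.
final class TemplateCache {
    static let shared = TemplateCache()

    private var cache: [String: Template] = [:]
    private let lock = NSLock()  // keep lookups safe across concurrent calls

    /// Returns the cached compiled template, compiling and storing it on a miss.
    func template(for source: String) throws -> Template {
        lock.lock()
        defer { lock.unlock() }
        if let cached = cache[source] {
            return cached
        }
        let compiled = try Template(source)  // assumed Jinja API: throwing init from a String
        cache[source] = compiled
        return compiled
    }
}
```

Then `applyChatTemplate` could call `try TemplateCache.shared.template(for: chatTemplate)` instead of constructing the `Template` directly, which would roughly mirror the cached template compilation in the Python utility linked above.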