- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The word_ids() method only returns a list of zeros instead of the correct word IDs.
```python
from transformers import AutoTokenizer

sentence = "I love my cat"
tokenizer = AutoTokenizer.from_pretrained("google/Gemma-7b")  # revision a0eac5b
encoded = tokenizer(sentence, return_tensors="pt")
print(encoded.word_ids())
# [None, 0, 0, 0, 0]
```
I tried several variations of the configurations suggested in the issues linked from #28881, but for Gemma the result doesn't change. The Llama 3 tokenizer outputs the correct values with this code.
Expected behavior
The output of word_ids() should be [None, 0, 1, 2, 3].
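For reference, here is what word_ids() looks like when the mapping works, using a minimal fast tokenizer built in memory (the tiny word-level vocabulary is made up for illustration; no gated model download is needed). Each token should map back to the index of the word it came from:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Toy vocabulary covering the example sentence (illustrative only)
vocab = {"[UNK]": 0, "I": 1, "love": 2, "my": 3, "cat": 4}
backend = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()

# Wrap the backend so we get the same BatchEncoding API as AutoTokenizer
tok = PreTrainedTokenizerFast(tokenizer_object=backend, unk_token="[UNK]")

enc = tok("I love my cat")
print(enc.word_ids())
# [0, 1, 2, 3] -- one distinct word index per token (no special tokens here)
```

This is the shape of output I would expect from the Gemma tokenizer as well (plus None for its BOS token), rather than all zeros.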
System Info

transformers version: 4.41.2

Who can help?
@ArthurZucker