- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Code to reproduce:
```python
from transformers.models.auto.tokenization_auto import AutoTokenizer

phi_tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/phi-2",
    add_bos_token=True,
    use_fast=False,
    trust_remote_code=True,
)
gpt2_tokenizer = AutoTokenizer.from_pretrained(
    "gpt2",
    add_bos_token=True,
    use_fast=False,
    trust_remote_code=True,
)

a = "The cat sat on the mat"

gpt2_tokens = gpt2_tokenizer(a, return_tensors="pt")["input_ids"][0]  # torch.Size([7])
gpt2_str_tokens = gpt2_tokenizer.batch_decode(gpt2_tokens)  # Essentially: [gpt2_tokenizer.decode(seq) for seq in gpt2_tokens]
print(gpt2_str_tokens)  # <-- This is fine and will output: ['<|endoftext|>', 'The', ' cat', ' sat', ' on', ' the', ' mat']

gpt2_single_decode = [gpt2_tokenizer.decode(gpt2_tokens[0])]
print(gpt2_single_decode)  # <-- Decoding a 0-d tensor, this is fine and will output: ['<|endoftext|>']

phi_tokens = phi_tokenizer(a, return_tensors="pt")["input_ids"][0]  # torch.Size([7])
phi_str_tokens = phi_tokenizer.batch_decode(phi_tokens)  # Essentially: [phi_tokenizer.decode(seq) for seq in phi_tokens]
print(phi_str_tokens)  # <-- Cannot do this due to below...

phi_single_decode = [phi_tokenizer.decode(phi_tokens[0])]
print(phi_single_decode)  # <-- Cannot decode a 0-d tensor, hence cannot do the above either
```
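For context, the failure pattern can be reproduced without transformers or torch installed. `batch_decode` is essentially `[decode(seq) for seq in sequences]`, so passing a single 1-D tensor makes each `seq` a 0-d tensor, and a slow-tokenizer `decode` that iterates over its input then raises. The `FakeTensor` and `naive_decode` below are illustrative stand-ins for this sketch, not the real torch/transformers APIs:

```python
class FakeTensor:
    """Illustrative stand-in for a torch tensor: iterating a 1-D tensor
    yields 0-d tensors, and a 0-d tensor refuses iteration."""

    def __init__(self, data):
        self.data = data  # list of ints (1-D) or a single int (0-d)

    def __iter__(self):
        if isinstance(self.data, int):
            raise TypeError("iteration over a 0-d tensor")
        return (FakeTensor(x) for x in self.data)

    def tolist(self):
        return self.data


def naive_decode(token_ids):
    """Stand-in for a slow decode() that iterates over its input."""
    return [int(getattr(t, "data", t)) for t in token_ids]


tokens = FakeTensor([50256, 464, 3797])

# Decoding the 1-D tensor directly is fine:
print(naive_decode(tokens))  # [50256, 464, 3797]

# But a batch_decode-style loop treats each 0-d element as a "sequence":
try:
    [naive_decode(seq) for seq in tokens]
except TypeError as e:
    print(e)  # iteration over a 0-d tensor

# One workaround: hand the per-sequence decode plain Python ints instead.
print([naive_decode([i]) for i in tokens.tolist()])  # [[50256], [464], [3797]]
```

This suggests the 0-d iteration happens inside the Phi-2 remote-code decode path rather than in `batch_decode` itself, which is why the gpt2 slow tokenizer handles the same input fine.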
System Info

transformers version: 4.34.1

Who can help?
@ArthurZucker @rooa
Running the Phi-2 code above returns:

TypeError: iteration over a 0-d tensor
Expected behavior
In the above example:

- `print(phi_str_tokens)` should output ['<|endoftext|>', 'The', ' cat', ' sat', ' on', ' the', ' mat'], which is what the gpt2 tokenizer returns.
- `print(phi_single_decode)` should output ['<|endoftext|>'], which is what the gpt2 tokenizer returns.