The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)

The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
I followed the steps described here to use the pipeline for the NER task, with one small change: I pass a list of sequences instead of a single string. My script is as follows:
from transformers import pipeline
nlp = pipeline("ner")
sequence = [
"Hugging Face Inc. is a company based in New York City.",
"Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge which is visible from the window."
]
print(nlp(sequence))
ValueError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_base.py in convert_to_tensors(self, tensor_type, prepend_batch_axis)
770 if not is_tensor(value):
--> 771 tensor = as_tensor(value)
772
ValueError: expected sequence of length 16 at dim 1 (got 38)
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
6 frames
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_base.py in convert_to_tensors(self, tensor_type, prepend_batch_axis)
786 )
787 raise ValueError(
--> 788 "Unable to create tensor, you should probably activate truncation and/or padding "
789 "with 'padding=True' 'truncation=True' to have batched tensors with the same length."
790 )
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
I know the problem comes from the tokenizer, and that the tokenizer should be called with some arguments like this:
but it is not clear from the documentation how to pass these arguments (truncation=True, padding=True, max_length=512) when using the pipeline for the NER task.
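Until the pipeline itself accepts these arguments, one option is to tokenize manually and run the token-classification model directly, which makes the padding/truncation arguments explicit. A minimal sketch, assuming the default NER checkpoint of that era (`dbmdz/bert-large-cased-finetuned-conll03-english`); any token-classification checkpoint should work the same way:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Assumption: the pipeline's default NER checkpoint; substitute any
# token-classification checkpoint you prefer.
model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

sequences = [
    "Hugging Face Inc. is a company based in New York City.",
    "Hugging Face Inc. is a company based in New York City. Its headquarters "
    "are in DUMBO, therefore very close to the Manhattan Bridge which is "
    "visible from the window.",
]

# The padding/truncation arguments belong to the tokenizer call: they make
# both rows the same length so they can be stacked into one batch tensor.
inputs = tokenizer(sequences, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, num_labels)
predictions = logits.argmax(dim=-1)

# Map label ids back to entity tag names for the first sequence.
labels = [model.config.id2label[p.item()] for p in predictions[0]]
```

Note this skips the pipeline's post-processing (grouping sub-word tokens into entities and filtering the "O" tag), so the output is one tag per token rather than a list of entity dicts.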
Environment info
transformers version: 4.3.2

Who can help
Library:
Documentation: @sgugger
Expected behavior
I expected to get a list of entity predictions like this:
but instead I got the error above.
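As a simple workaround, each sequence can be passed to the pipeline individually; single-sequence calls never need batching, so no padding or truncation is required. A minimal sketch:

```python
from transformers import pipeline

nlp = pipeline("ner")

sequences = [
    "Hugging Face Inc. is a company based in New York City.",
    "Hugging Face Inc. is a company based in New York City. Its headquarters "
    "are in DUMBO, therefore very close to the Manhattan Bridge which is "
    "visible from the window.",
]

# One call per sequence avoids stacking differently sized inputs into
# a single tensor, which is what triggers the ValueError.
results = [nlp(seq) for seq in sequences]
```

This gives the same per-sequence entity lists as a batched call would, at the cost of running the model once per input.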