The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)

The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
I came across this issue when setting `output_hidden_states=True` while instantiating a pretrained model, in order to obtain the inferred CLS sentence embeddings using the approach below. I override the pipeline's postprocess function to also return the last hidden-state layer; however, the error occurs in the pipeline code itself when using a batch size of 16 or larger.
```python
from transformers import BertTokenizerFast, BertForSequenceClassification
from transformers import pipeline

finbert = BertForSequenceClassification.from_pretrained(
    'yiyanghkust/finbert-tone', num_labels=3, output_hidden_states=True
)
tokenizer = BertTokenizerFast.from_pretrained('yiyanghkust/finbert-tone')
nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer, device=0)

varying_length_sentences = [
    "there is a shortage of capital, and we need extra financing " * 5,
    "growth is strong and we have plenty of liquidity ",
    "there are doubts about our finances" * 10,
    "profits are flat",
    "profits are flat " * 30,
] * 1000
similar_length_sentences = [
    "there is a shortage",
    "growth is strong ",
    "there are doubts",
    "profits are flat",
] * 1000

results = nlp(similar_length_sentences, batch_size=16, num_workers=2)
```
See the Colab Notebook for reference.
Running step 3 produces the following error:
```
/usr/local/lib/python3.7/dist-packages/transformers/pipelines/base.py in loader_batch_item(self)
    750                     if k == "past_key_values":
    751                         continue
--> 752                     if isinstance(element[self._loader_batch_index], torch.Tensor):
    753                         loader_batched[k] = element[self._loader_batch_index].unsqueeze(0)
    754                     elif isinstance(element[self._loader_batch_index], np.ndarray):

IndexError: tuple index out of range
```
Executing the pipeline with batch sizes smaller than 16 seems to work (see the Colab notebook).
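The failure mode can be illustrated without transformers: the pipeline's un-batching step indexes every value in the model output by the example's position within the batch, but `hidden_states` is a tuple over layers (13 entries for BERT-base: 12 layers plus the embedding output), not over batch items. A simplified sketch follows, with plain Python objects standing in for tensors; `loader_batch_item` here is a toy stand-in, not the real implementation:

```python
# Toy stand-in for the pipeline's un-batching step: every value in the
# model output is indexed by the example's position within the batch.
def loader_batch_item(outputs, batch_index):
    return {k: v[batch_index] for k, v in outputs.items()}

batch_size = 16
num_layers = 13  # 12 BERT-base layers + the embedding output

outputs = {
    # per-example logits: indexing by batch position is correct here
    "logits": [[0.1, 0.2, 0.7]] * batch_size,
    # hidden_states is a tuple over *layers*, so indexing it by batch
    # position silently returns a whole layer for indices < 13 ...
    "hidden_states": tuple(f"layer {i}" for i in range(num_layers)),
}

item = loader_batch_item(outputs, 3)
print(item["hidden_states"])  # prints "layer 3": a layer, not example 3

# ... and raises IndexError for batch indices >= 13, which is why only
# batch sizes of 16 (or anything above 13) crash.
try:
    loader_batch_item(outputs, 15)
except IndexError as exc:
    print("IndexError:", exc)  # prints "IndexError: tuple index out of range"
```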
Expected behavior
Pipeline runs successfully with any batch size using a model loaded to output hidden states and attention.
Yes, the current system for automatic batching/un-batching doesn't support hidden_states or attentions.
I opened a PR. Currently it explicitly checks for specific keys to handle these tuples of tensors, since they are not the norm.
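A simplified sketch of that kind of key-based special-casing (illustrative only, not the actual PR code; the real fix also has to unsqueeze tensors): tuples of per-layer values are sliced layer by layer along the batch dimension instead of being indexed directly.

```python
# Illustrative sketch: special-case keys whose values are tuples over
# layers, slicing each layer at the batch position instead of indexing
# the tuple itself by batch position.
TUPLE_KEYS = ("hidden_states", "attentions", "past_key_values")

def loader_batch_item_fixed(outputs, batch_index):
    item = {}
    for key, element in outputs.items():
        if key in TUPLE_KEYS:
            item[key] = tuple(layer[batch_index] for layer in element)
        else:
            item[key] = element[batch_index]
    return item

# Plain lists stand in for tensors: 2 layers, a batch of 2 examples.
outputs = {
    "logits": [[0.9, 0.1], [0.2, 0.8]],
    "hidden_states": (["l0-ex0", "l0-ex1"], ["l1-ex0", "l1-ex1"]),
}
print(loader_batch_item_fixed(outputs, 1))
# {'logits': [0.2, 0.8], 'hidden_states': ('l0-ex1', 'l1-ex1')}
```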
@alwayscurious by the way, don't rely on the previous behavior with batch_size < 16 either: it's just incorrect. You will receive the first layer's hidden states (for the full batch) as your first item, the second layer's as your second item, and so on.
Environment info

transformers version: 4.13.0.dev0

Who can help

@Narsil

Information

Model I am using (Bert, XLNet ...): FinBert
To reproduce

Steps to reproduce the behavior:

1. !pip install git+https://github.com/huggingface/transformers.git
2. Run the snippet above to build the pipeline and the sentence lists.
3. results = nlp(similar_length_sentences, batch_size=16, num_workers=2)

Running step 3 produces the error shown above (see the Colab Notebook for reference).