[1.2.2][Ubuntu] Sentiment analysis raises an exception with an empty input #769

Armillus · 2021-07-20T12:28:31Z

Describe the bug
Sentiment analysis on an empty input raises the following exception: IndexError: tuple index out of range.
I didn't see this error mentioned in the current issues.

The pipeline used to obtain this result:

stanza.Pipeline(lang='en', processors='tokenize,sentiment')

The actual result:

2021-07-20 14:21:11 INFO: Loading these models for language: en (English):
========================
| Processor | Package  |
------------------------
| tokenize  | combined |
| sentiment | sstplus  |
========================

2021-07-20 14:21:11 INFO: Use device: gpu
2021-07-20 14:21:11 INFO: Loading: tokenize
2021-07-20 14:21:18 INFO: Loading: sentiment
2021-07-20 14:21:18 INFO: Done loading processors!
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    doc = nlp('')
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/pipeline/core.py", line 253, in __call__
    doc = self.process(doc)
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/pipeline/core.py", line 247, in process
    doc = process(doc)
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/pipeline/sentiment_processor.py", line 53, in process
    labels = cnn_classifier.label_text(self._model, text, batch_size=self._batch_size)
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/models/classifiers/cnn_classifier.py", line 449, in label_text
    text, orig_idx = sort_with_indices(text, key=len, reverse=True)
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/models/common/utils.py", line 223, in sort_with_indices
    return result[1], result[0]
IndexError: tuple index out of range
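
The last frame of the traceback suggests why an empty input fails: unzipping an empty sequence yields an empty tuple, so indexing it raises IndexError. The sketch below is a hypothetical reconstruction that is only consistent with the `return result[1], result[0]` line shown above; it is not Stanza's actual source, and the guarded variant is one possible fix, not necessarily what the maintainers did.

```python
def sort_with_indices_sketch(data, key=None, reverse=False):
    """Hypothetical stand-in for the helper named in the traceback.

    Returns the sorted items and their original indices.
    """
    if key is None:
        key = lambda item: item
    # Pair each item with its original position, then sort by the item.
    ordered = sorted(enumerate(data), key=lambda pair: key(pair[1]), reverse=reverse)
    # For empty data, zip(*[]) produces nothing, so result == ().
    result = tuple(zip(*ordered))
    # Indexing the empty tuple raises "IndexError: tuple index out of range".
    return result[1], result[0]


def sort_with_indices_guarded(data, key=None, reverse=False):
    """Same sketch with an early return for empty input (an assumed fix)."""
    if not data:
        return [], []
    return sort_with_indices_sketch(data, key=key, reverse=reverse)
```

With a non-empty list such as `["a", "bbb", "cc"]`, `key=len` and `reverse=True`, both functions return the items longest first together with their original positions; only the unguarded one fails on `[]`.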

To Reproduce
To get the error, you can try this basic snippet of code, which is a slightly modified example from the official documentation.
However, this example assumes that you've already downloaded the corresponding model.

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize,sentiment')
doc = nlp('')
for i, sentence in enumerate(doc.sentences):
    print(i, sentence.sentiment)

Expected behavior
The expected result would be to get back a working document, without any exception.
The expected output (from my limited experience) should be:

2021-07-20 14:06:44 INFO: Loading these models for language: en (English):
========================
| Processor | Package  |
------------------------
| tokenize  | combined |
| sentiment | sstplus  |
========================

2021-07-20 14:06:44 INFO: Use device: gpu
2021-07-20 14:06:44 INFO: Loading: tokenize
2021-07-20 14:06:52 INFO: Loading: sentiment
2021-07-20 14:06:52 INFO: Done loading processors!

Environment (please complete the following information):

OS: Ubuntu (WSL 2 - 21H2 - Build 22000.71)
Python version: Python 3.8.10
Stanza version: 1.2.2

Additional context
I don't know whether this error also happens with older versions of Stanza, since I hadn't fed an empty input to a pipeline before. That said, I never encountered this error with Stanza 1.2.0.


AngledLuffa · 2021-07-20T16:17:37Z

Oops! This is why I'll never write code for the space shuttle.

47889e3

I can push a version to testpypi if you find yourself frequently running into this

Armillus · 2021-07-21T06:55:02Z

No problem, it can be worked around quickly by checking whether the input is empty before calling the pipeline, so I can wait until the next version :)

Thank you for your quick reaction time and for this excellent library!

flatplate · 2021-08-08T06:18:50Z

I ran into the same problem with a string containing only a space character: nlp(" "). Does this commit fix that as well?

AngledLuffa · 2021-08-09T06:12:50Z

Yes!
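
Until a release containing the fix is installed, the workaround mentioned in this thread can be sketched as a small wrapper that skips the pipeline for empty or whitespace-only input. The name `run_if_nonblank` is hypothetical and not part of Stanza; `nlp` is assumed to be any callable that takes a string, such as the `stanza.Pipeline` object from the report.

```python
from typing import Any, Callable, Optional


def run_if_nonblank(nlp: Callable[[str], Any], text: str) -> Optional[Any]:
    """Call the pipeline only when the text has non-whitespace content.

    Returns None for '' or whitespace-only input, the two cases reported
    in this thread; otherwise returns whatever nlp(text) returns.
    """
    if not text or not text.strip():
        return None
    return nlp(text)
```

For example, `doc = run_if_nonblank(nlp, '')` yields `None` without invoking the pipeline, so the caller has to handle the `None` case before iterating over `doc.sentences`.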

Armillus added the bug label Jul 20, 2021

manning closed this as completed Aug 21, 2021
