[1.2.2][Ubuntu] Sentiment analysis raises an exception with an empty input #769

Closed
Armillus opened this issue Jul 20, 2021 · 4 comments
@Armillus

Describe the bug
Sentiment analysis on an empty input raises the following exception: IndexError: tuple index out of range.
I didn't see this error mentioned in the current issues.

The pipeline used to obtain this result:

stanza.Pipeline(lang='en', processors='tokenize,sentiment')

The actual result:

2021-07-20 14:21:11 INFO: Loading these models for language: en (English):
========================
| Processor | Package  |
------------------------
| tokenize  | combined |
| sentiment | sstplus  |
========================

2021-07-20 14:21:11 INFO: Use device: gpu
2021-07-20 14:21:11 INFO: Loading: tokenize
2021-07-20 14:21:18 INFO: Loading: sentiment
2021-07-20 14:21:18 INFO: Done loading processors!
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    doc = nlp('')
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/pipeline/core.py", line 253, in __call__
    doc = self.process(doc)
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/pipeline/core.py", line 247, in process
    doc = process(doc)
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/pipeline/sentiment_processor.py", line 53, in process
    labels = cnn_classifier.label_text(self._model, text, batch_size=self._batch_size)
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/models/classifiers/cnn_classifier.py", line 449, in label_text
    text, orig_idx = sort_with_indices(text, key=len, reverse=True)
  File "/home/amillo/.local/lib/python3.8/site-packages/stanza/models/common/utils.py", line 223, in sort_with_indices
    return result[1], result[0]
IndexError: tuple index out of range
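
The last frame of the traceback shows `return result[1], result[0]` failing inside `sort_with_indices`. Below is a minimal sketch of how that can happen, assuming the function builds `result` by unzipping a sorted `enumerate` of its input. The function bodies are an illustrative reconstruction, not Stanza's actual source, and the names `sort_with_indices_sketch` and `sort_with_indices_guarded` are hypothetical. The guard is one possible fix; the thread does not show what the referenced commit changed.

```python
def sort_with_indices_sketch(data, key=None, reverse=False):
    """Sort data and also return each item's original position.

    Hypothetical stand-in for the helper named in the traceback.
    """
    if key is None:
        ordered = sorted(enumerate(data), key=lambda pair: pair[1], reverse=reverse)
    else:
        ordered = sorted(enumerate(data), key=lambda pair: key(pair[1]), reverse=reverse)
    # zip(*[]) yields nothing, so for empty input this tuple is ()
    result = tuple(zip(*ordered))
    # Indexing an empty tuple raises IndexError: tuple index out of range
    return result[1], result[0]


def sort_with_indices_guarded(data, key=None, reverse=False):
    """Same behaviour, but return two empty lists for empty input."""
    if not data:
        return [], []
    return sort_with_indices_sketch(data, key=key, reverse=reverse)
```

With an empty list, the first function raises the same `IndexError` as in the traceback, while the guarded variant returns two empty lists.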

To Reproduce
The following snippet, a slightly modified example from the official documentation, reproduces the error.
It assumes that you've already downloaded the corresponding model.

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize,sentiment')
doc = nlp('')
for i, sentence in enumerate(doc.sentences):
    print(i, sentence.sentiment)

Expected behavior
The expected result would be to get back a working document, without any exception.
The expected output (from my limited experience) should be:

2021-07-20 14:06:44 INFO: Loading these models for language: en (English):
========================
| Processor | Package  |
------------------------
| tokenize  | combined |
| sentiment | sstplus  |
========================

2021-07-20 14:06:44 INFO: Use device: gpu
2021-07-20 14:06:44 INFO: Loading: tokenize
2021-07-20 14:06:52 INFO: Loading: sentiment
2021-07-20 14:06:52 INFO: Done loading processors!

Environment (please complete the following information):

  • OS: Ubuntu (WSL 2 - 21H2 - Build 22000.71)
  • Python version: Python 3.8.10
  • Stanza version: 1.2.2

Additional context
I don't know whether this error also happens with older versions of Stanza, since I hadn't fed an empty input to a pipeline before. That said, I never ran into this error with Stanza 1.2.0.

@Armillus Armillus added the bug label Jul 20, 2021
@AngledLuffa
Collaborator

Oops! This is why I'll never write code for the space shuttle.

47889e3

I can push a version to testpypi if you find yourself frequently running into this

@Armillus
Author

Armillus commented Jul 21, 2021

No problem, it can be worked around quickly by checking whether the input is empty before calling the pipeline, so I can wait until the next version :)

Thank you for your quick reaction time and for this excellent library!
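
A minimal sketch of that workaround, assuming the pipeline is any callable that returns a document with a `sentences` list whose items carry a `sentiment` attribute, as in the reproduction snippet. The helper name `sentence_sentiments` is hypothetical. Stripping whitespace before the check is an extra precaution added here, not something the thread confirms is required.

```python
def sentence_sentiments(nlp, text):
    """Return one sentiment value per sentence, or [] for blank input.

    `nlp` is expected to behave like the stanza.Pipeline from the report.
    """
    # Skip the pipeline entirely for empty or whitespace-only text
    if not text or not text.strip():
        return []
    doc = nlp(text)
    return [sentence.sentiment for sentence in doc.sentences]
```

With the pipeline from the report, this would be called as `sentence_sentiments(nlp, '')`, which returns an empty list without invoking the sentiment processor.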

@flatplate

I run into the same problem with a string with only a space character: nlp(" "). Does this commit fix that also?

@AngledLuffa
Collaborator

AngledLuffa commented Aug 9, 2021 via email

@manning manning closed this as completed Aug 21, 2021