Coref model predicted a span that crossed two sentences #1339

Closed
Ubadub opened this issue Jan 29, 2024 · 3 comments
Ubadub commented Jan 29, 2024

Describe the bug
Depending on whether tokenize_pretokenized and tokenize_no_ssplit are each set to True or False, the following sentence causes the coref processor to raise either ValueError: The coref model predicted a span that crossed two sentences! (line 120 of stanza/pipeline/coref_processor.py) or IndexError: list index out of range (line 119 of the same file).

The sentence: The son of Mr. and Mrs. X. He is four during the events of the first book . <eos>

To Reproduce
Steps to reproduce the behavior:

Set up code:

import stanza

s = "The son of Mr. and Mrs. X. He is four during the events of the first book . <eos>"

pipeline = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse,coref")
pipeline_no_ssplit = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse,coref", tokenize_no_ssplit=True)
pipeline_pretok = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse,coref", tokenize_pretokenized=True)
pipeline_pretok_no_ssplit = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse,coref", tokenize_pretokenized=True, tokenize_no_ssplit=True)

Then the following line of code:

a = pipeline(s)

produces the exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/core.py", line 476, in __call__
    return self.process(doc, processors)
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/core.py", line 427, in process
    doc = process(doc)
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/coref_processor.py", line 120, in process
    raise ValueError("The coref model predicted a span that crossed two sentences!  Please send this example to us on our github")
ValueError: The coref model predicted a span that crossed two sentences!  Please send this example to us on our github

whereas any of the following lines of code:

b = pipeline_no_ssplit(s)
c = pipeline_pretok(s)
d = pipeline_pretok_no_ssplit(s)

produce the exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/core.py", line 476, in __call__
    return self.process(doc, processors)
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/core.py", line 427, in process
    doc = process(doc)
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/coref_processor.py", line 119, in process
    if sent_ids[span[1]] != sent_id:
IndexError: list index out of range
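
The two tracebacks point at adjacent lines of the same check, so a toy version of that check may help show how one comparison can produce both exceptions. This is only a sketch: `check_span`, `outcome`, and the sample `sent_ids` lists are hypothetical names invented here, and the idea that the predicted span's end index is exclusive is an assumption that is not confirmed anywhere in this thread.

```python
def check_span(sent_ids, span):
    """Toy stand-in for the sanity check quoted in the traceback.

    sent_ids[i] is the sentence index of word i; span is (start, end).
    Hypothetical helper, not the actual Stanza code.
    """
    sent_id = sent_ids[span[0]]
    # Same comparison as the line shown in the second traceback.
    if sent_ids[span[1]] != sent_id:
        raise ValueError("span crossed two sentences")
    return sent_id


def outcome(sent_ids, span):
    """Return 'ok' or the name of the exception the check raises."""
    try:
        check_span(sent_ids, span)
        return "ok"
    except (ValueError, IndexError) as err:
        return type(err).__name__


# A document split into two sentences of 3 and 2 words.
two_sentences = [0, 0, 0, 1, 1]
# The same 5 words treated as a single sentence (no sentence splitting).
one_sentence = [0, 0, 0, 0, 0]

# ASSUMPTION: the end index is exclusive, so a span covering words 1..2
# (ending on the last word of sentence 0) is written as (1, 3).
print(outcome(two_sentences, (1, 3)))  # end index lands in sentence 1
print(outcome(one_sentence, (3, 5)))   # end index is past the last word
print(outcome(two_sentences, (0, 2)))  # end index still inside sentence 0
```

Under that assumption, a span ending on the last word of a non-final sentence would look like it crosses a boundary, while a span ending on the last word of the document would index past the end of the list. That would be consistent with the split pipeline raising ValueError and the unsplit ones raising IndexError, but it is a guess from the traceback, not a diagnosis.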

Expected behavior
All of these should just work. None of them should raise either of the exceptions above.

Environment (please complete the following information):

  • OS: Reproduced on Mac (with CPU) and Oracle Linux (with GPU)
  • Python version: Python 3.9.16 | packaged by conda-forge
  • Stanza version: 1.7.0

Additional context
I have also seen sporadic instances of the "coref model predicted a span that crossed two sentences!" error elsewhere, but previously only with a large group of sentences in a single doc. Strangely, omitting any one of those sentences made the error disappear. This is the first time I've been able to reproduce it with a single sentence, which is why I am reporting it. I can, however, provide other batches of sentences that result in the same issue, if it helps.
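
For anyone processing large batches on an affected version, a stopgap is to catch the two exceptions per document so one bad input does not abort the whole run. The sketch below is an assumption-laden illustration: `safe_process` is a hypothetical helper, and `pipeline` is any callable that takes text (for example a `stanza.Pipeline` instance). Note that catching IndexError this broadly can also hide unrelated bugs.

```python
def safe_process(pipeline, text):
    """Run pipeline(text); return (doc, None) on success or (None, error).

    Only the two exception types reported in this issue are caught.
    Hypothetical helper, not part of Stanza.
    """
    try:
        return pipeline(text), None
    except (ValueError, IndexError) as err:
        return None, err


def process_batch(pipeline, texts):
    """Process every text, collecting results and failures separately."""
    docs, failures = [], []
    for text in texts:
        doc, err = safe_process(pipeline, text)
        if err is None:
            docs.append(doc)
        else:
            # Keep the failing input so it can be reported upstream.
            failures.append((text, type(err).__name__))
    return docs, failures
```

The `failures` list keeps each failing input next to the exception name, which makes it easier to collect further reproduction cases.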

@Ubadub Ubadub added the bug label Jan 29, 2024
AngledLuffa commented Jan 29, 2024 via email

Ubadub commented Jan 30, 2024

> Thank you. This has actually already been addressed in the dev branch - I should probably make a new release with that

Ah, I see that now, as reported in #1333. Yes, if you could make a release for that, it would be very helpful.

AngledLuffa (Collaborator) commented

The fix was released in 1.8.0, which has since been superseded by 1.8.1 because the 1.8.0 release had some critical bugs.
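
To check whether an environment already has a release with the fix, something like the following could be used. It is a rough sketch: `parse_version`, `includes_fix`, `is_recommended`, and `installed_stanza_version` are hypothetical helpers written for this note, the version parsing is deliberately simplistic (it ignores pre-release suffixes), and the only facts taken from the thread are the version numbers 1.8.0 and 1.8.1.

```python
from importlib import metadata
from itertools import takewhile

FIX_RELEASE = (1, 8, 0)          # release that contained the fix, per the thread
RECOMMENDED_RELEASE = (1, 8, 1)  # superseded 1.8.0 because of critical bugs


def parse_version(version):
    """Turn a string such as '1.7.0' into a tuple such as (1, 7, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each component.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def includes_fix(version):
    return parse_version(version) >= FIX_RELEASE


def is_recommended(version):
    return parse_version(version) >= RECOMMENDED_RELEASE


def installed_stanza_version():
    """Return the installed stanza version string, or None if not installed."""
    try:
        return metadata.version("stanza")
    except metadata.PackageNotFoundError:
        return None


version = installed_stanza_version()
if version is None:
    print("stanza is not installed in this environment")
elif is_recommended(version):
    print(f"stanza {version}: at or above 1.8.1")
else:
    print(f"stanza {version}: consider upgrading to 1.8.1 or later")
```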
