Coref model predicted a span that crossed two sentences #1339

Ubadub · 2024-01-29T23:29:12Z

Describe the bug
Depending on the values of tokenize_pretokenized and tokenize_no_ssplit, the following sentence makes the coref processor raise either ValueError: The coref model predicted a span that crossed two sentences! (line 120 of stanza/pipeline/coref_processor.py) or IndexError: list index out of range (line 119 of the same file).

The sentence: The son of Mr. and Mrs. X. He is four during the events of the first book . <eos>

To Reproduce
Steps to reproduce the behavior:

Set up code:

import stanza

s = "The son of Mr. and Mrs. X. He is four during the events of the first book . <eos>"

pipeline = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse,coref")
pipeline_no_ssplit = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse,coref", tokenize_no_ssplit=True)
pipeline_pretok = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse,coref", tokenize_pretokenized=True)
pipeline_pretok_no_ssplit = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse,coref", tokenize_pretokenized=True, tokenize_no_ssplit=True)

Then the following line of code:

a = pipeline(s)

produces the exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/core.py", line 476, in __call__
    return self.process(doc, processors)
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/core.py", line 427, in process
    doc = process(doc)
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/coref_processor.py", line 120, in process
    raise ValueError("The coref model predicted a span that crossed two sentences!  Please send this example to us on our github")
ValueError: The coref model predicted a span that crossed two sentences!  Please send this example to us on our github

whereas any of the following lines of code:

b = pipeline_no_ssplit(s)
c = pipeline_pretok(s)
d = pipeline_pretok_no_ssplit(s)

produce the exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/core.py", line 476, in __call__
    return self.process(doc, processors)
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/core.py", line 427, in process
    doc = process(doc)
  File "/Users/apatil/anaconda3/envs/base_nlp/lib/python3.9/site-packages/stanza/pipeline/coref_processor.py", line 119, in process
    if sent_ids[span[1]] != sent_id:
IndexError: list index out of range
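
For anyone wondering how a single check can produce both exceptions, here is a minimal sketch. It assumes (this is not confirmed anywhere in this thread) that span is a (start, end) pair with an exclusive end, and that sent_ids maps each word index to the index of its sentence. check_span, check_span_inclusive, outcome, and the sample data are hypothetical illustrations, not Stanza's actual code.

```python
def check_span(sent_ids, span):
    """Mimic the check seen in the traceback (hypothetical reconstruction)."""
    sent_id = sent_ids[span[0]]
    # If span[1] is an exclusive end, it points one word past the span:
    # possibly into the next sentence, or past the end of the document.
    if sent_ids[span[1]] != sent_id:
        raise ValueError("span crossed two sentences")


def check_span_inclusive(sent_ids, span):
    """Same check, but comparing the last word actually inside the span."""
    sent_id = sent_ids[span[0]]
    if sent_ids[span[1] - 1] != sent_id:
        raise ValueError("span crossed two sentences")


# Two sentences: words 0-2 belong to sentence 0, words 3-5 to sentence 1.
sent_ids = [0, 0, 0, 1, 1, 1]


def outcome(check, span):
    """Run a check and report 'ok' or the name of the exception raised."""
    try:
        check(sent_ids, span)
        return "ok"
    except (ValueError, IndexError) as exc:
        return type(exc).__name__


print(outcome(check_span, (1, 3)))            # span ends at a sentence boundary
print(outcome(check_span, (4, 6)))            # span ends at the end of the document
print(outcome(check_span_inclusive, (1, 3)))
print(outcome(check_span_inclusive, (4, 6)))
```

Under that assumption, a span ending exactly at a sentence boundary would trip the ValueError when the text is split into several sentences, while a span ending at the last word of the document would trip the IndexError, which would match the split between the default pipeline and the other three.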

Expected behavior
All of these calls should just work. None of them should raise the exceptions above.

Environment (please complete the following information):

OS: Reproduced on Mac (with CPU) and Oracle Linux (with GPU)
Python version: Python 3.9.16 | packaged by conda-forge
Stanza version: 1.7.0

Additional context
I have also seen sporadic instances of the "coref model predicted a span that crossed two sentences!" error elsewhere, but previously only with a large group of sentences in a single doc; strangely, omitting any one of them made the error disappear. This is the first time I've been able to reproduce it with a single sentence, which is why I am reporting it. I can, however, provide other batches of sentences that trigger the same issue, if it helps.


AngledLuffa · 2024-01-29T23:58:27Z

Thank you. This has actually already been addressed in the dev branch - I should probably make a new release with that


Ubadub · 2024-01-30T00:10:00Z

> Thank you. This has actually already been addressed in the dev branch - I should probably make a new release with that

Ah, I see that now, as reported in #1333. Yes, if you could make a release for that, it would be very helpful.

AngledLuffa · 2024-03-03T21:44:00Z

The fix shipped in 1.8.0, which has since been superseded by 1.8.1 because the 1.8.0 release had some critical bugs.
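
For anyone landing here on an older install, a small sketch of a version check follows. has_coref_fix is a hypothetical helper, and the 1.8.1 threshold is simply taken from the comment above; it only handles plain numeric versions such as 1.7.0, not pre-release tags.

```python
from importlib.metadata import PackageNotFoundError, version


def has_coref_fix(version_string, minimum=(1, 8, 1)):
    """Return True if a plain numeric version like '1.8.1' is >= minimum."""
    parts = tuple(int(p) for p in version_string.split(".")[:3])
    return parts >= minimum


try:
    installed = version("stanza")
    if has_coref_fix(installed):
        print(installed, "should include the coref fix")
    else:
        print(installed, "predates the fix; consider: pip install -U stanza")
except PackageNotFoundError:
    print("stanza is not installed in this environment")
```

The threshold is 1.8.1 rather than 1.8.0 only because 1.8.0 is described above as having critical bugs of its own.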

Ubadub added the bug label Jan 29, 2024

AngledLuffa closed this as completed Mar 3, 2024
