Span crossing two sentences? #1333

rizpras · 2024-01-20T01:31:10Z

Hello

I got an error saying that the model predicted a span that crosses two sentences, and asking me to send the example to GitHub. Here is my code (pretty simple):

```python
import stanza

pipe = stanza.Pipeline("en", processors="tokenize, coref")
out = pipe("""If an electrical machine or equipment generates mechanical vibrations when in service, e.g. because it is out of balance, the vibration amplitude measured on the machine or the equipment on board shall not lie outside area A. For this evaluation, reference is made only to the self-generated vibration components. Area A may only be utilized if the loading of all components, with due allowance for local excess vibration, does not impair reliable long-term operation""")

print(out)
```

My guess is that the term "Area A" is the cause. Is the model currently unable to process coreference that crosses two sentences? What can I do about this sentence?

Thank you

AngledLuffa · 2024-01-20T04:15:48Z

Ah, this was me being an idiot. I put an error check in the coref model to make sure the spans were all in the same sentence (the original code masks for that AFAIK), but the error check itself was buggy.

AngledLuffa · 2024-01-20T04:16:59Z

If you use the dev branch, it should now be fixed...

I was thinking that perhaps waiting for a bigger feature to be finished would be good for a new release, but seeing as how we've fixed a couple of bugs in the last couple of months, it might be worth doing an interim release.

rizpras · 2024-01-21T13:30:58Z

Thank you very much! Just curious, can I still use it in Google Colab if it's in the dev branch? Another thing, why does the model need to make sure that the spans are all in the same sentence?

AngledLuffa · 2024-01-21T19:23:46Z

> Just curious, can I still use it in google colab if it's in dev branch?

I don't know how you've installed Stanza, but you should be able to pip install from a branch, if that's what you did:

https://stackoverflow.com/questions/20101834/pip-install-from-git-repo-branch
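As a rough sketch of what installing from a branch looks like when driven from Python (for example inside a notebook cell), the snippet below builds the pip command for a `git+URL@branch` requirement. The repository URL and the branch name `dev` are assumptions based on this thread, and the helper names are made up for illustration; the install itself is left as a function that is not called here.

```python
import subprocess
import sys
from typing import List


def branch_install_command(repo_url: str, branch: str) -> List[str]:
    """Build a pip command that installs a package from a git branch.

    Hypothetical helper: pip accepts requirements of the form
    git+<repo_url>@<branch>.
    """
    return [sys.executable, "-m", "pip", "install", f"git+{repo_url}@{branch}"]


def install_branch(repo_url: str, branch: str) -> None:
    """Run the install (needs network access and git); not called below."""
    subprocess.check_call(branch_install_command(repo_url, branch))


# Assumed values: Stanza's GitHub repository and its dev branch.
cmd = branch_install_command("https://github.com/stanfordnlp/stanza.git", "dev")
print(" ".join(cmd))
```

In a notebook, the same requirement string can usually be passed to pip directly in a shell cell; restarting the runtime afterwards may be needed so the new version is picked up.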

> Another thing, why does the model need to make sure that the spans are all in the same sentence?

Technically it doesn't, but the model was trained to only have spans which are contained in a single sentence, and I used that assumption downstream when turning the spans into human-readable output. I had put in an assertion to test that, but the assertion itself was buggy in the event that a span ended exactly at the end of a sentence. Since sentence endings are usually punctuation, that hadn't come up until you hit one of the sentences which the tokenizer splits incorrectly.
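To make the off-by-one concrete, here is a minimal sketch of that kind of check. It is not Stanza's actual code: the function names, the per-word `sent_ids` list, and the half-open `(start, end)` span convention are all assumptions made for illustration.

```python
from typing import List, Tuple

# Illustrative only: hypothetical names, not Stanza's implementation.
# A span is assumed to be half-open: it covers words[start:end].


def same_sentence_buggy(sent_ids: List[int], span: Tuple[int, int]) -> bool:
    start, end = span
    # Bug: `end` is one past the last word of the span, so when the span
    # finishes at a sentence boundary this reads the NEXT sentence's id.
    # (It would also raise IndexError for a span ending at the document end.)
    return sent_ids[start] == sent_ids[end]


def same_sentence_fixed(sent_ids: List[int], span: Tuple[int, int]) -> bool:
    start, end = span
    # Fix: compare against the sentence id of the actual last word.
    return sent_ids[start] == sent_ids[end - 1]


# Five words: the first three are in sentence 0, the last two in sentence 1.
sent_ids = [0, 0, 0, 1, 1]
span_at_sentence_end = (1, 3)   # words 1..2, ends exactly where sentence 0 ends
span_crossing = (2, 4)          # words 2..3, genuinely crosses the boundary

print(same_sentence_buggy(sent_ids, span_at_sentence_end))
print(same_sentence_fixed(sent_ids, span_at_sentence_end))
print(same_sentence_fixed(sent_ids, span_crossing))
```

Under these assumptions the buggy check rejects a perfectly valid span that happens to end on the last word of a sentence, while the fixed check accepts it and still rejects a span that really does cross into the next sentence.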

rizpras · 2024-01-23T09:26:38Z

I installed Stanza from the dev branch and now it works! Thank you very much @AngledLuffa, you have been very helpful!

I'm closing this issue.

rizpras added the bug label Jan 20, 2024

AngledLuffa added a commit that referenced this issue Jan 20, 2024

Oops, off by one. Must check the sentence id based on the actual last…

f1fbaaa

… word... reported in #1333

rizpras closed this as completed Jan 23, 2024

Ubadub mentioned this issue Jan 30, 2024

Coref model predicted a span that crossed two sentences #1339

Closed
