NER tags are on the tokens, not the words #999

AngledLuffa · 2022-04-08T17:30:21Z

When using an English tokenizer with MWT (such as EWT, which treats possessive clitics as MWT), the NER model puts its tags on the entire token rather than on the individual words. It also sometimes misses or mislabels entities in cases it should handle correctly.

MWT tokenizer:

John's headache is bad -> no NER tags on John or 's

This makes John's headache worse -> NER tags on both John and 's

Non-MWT tokenizer:

John's headache is bad -> NER tag on just John, not 's, as expected
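To illustrate why a token-level tag ends up on both words, here is a minimal, self-contained sketch. It does not use Stanza's actual classes: the `Token` dataclass, the `word_level_view` helper, and the `S-PERSON` tag value are all hypothetical stand-ins chosen for illustration. It only models the idea that when a tag is stored on an MWT token, every word inside that token appears to carry it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for a token that may expand into several words."""
    text: str
    words: List[str]
    ner: str = "O"  # tag stored on the token, not on its words


def word_level_view(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Pair each word with the tag of the token that contains it."""
    return [(word, token.ner) for token in tokens for word in token.words]


# "John's" is a single MWT token that expands into the words "John" and "'s".
# The tag value is an assumption made for this example.
mwt_tokens = [
    Token("This", ["This"]),
    Token("makes", ["makes"]),
    Token("John's", ["John", "'s"], ner="S-PERSON"),
    Token("headache", ["headache"]),
    Token("worse", ["worse"]),
]

tagged_words = [(w, t) for w, t in word_level_view(mwt_tokens) if t != "O"]
print(tagged_words)  # [('John', 'S-PERSON'), ("'s", 'S-PERSON')]
```

Under this model the clitic `'s` inherits the tag simply because it shares a token with `John`, which matches the second MWT example above. Getting the tag on just `John` would require tagging at the word level instead.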


AngledLuffa added the bug label Apr 8, 2022
