When using an English tokenizer with MWT (such as EWT, which treats possessive clitics as MWT), the NER model puts its tags on the entire token rather than on the individual words, and it sometimes gets the tags wrong in cases it should handle correctly.
MWT tokenizer:
John's headache is bad -> no NER tags on John or 's
This makes John's headache worse -> NER tags on both John and 's
Non-MWT tokenizer:
John's headache is bad -> NER tag on just John, not 's, as expected
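To illustrate what token-level tagging means for the MWT case, here is a minimal sketch in plain Python. The `Word` and `Token` classes, the `word_level_tags` helper, and the `S-PERSON` label are all hypothetical stand-ins made up for this example; they are not the library's actual data model or output, and the tags are hard-coded rather than predicted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    # Hypothetical syntactic word, e.g. "John" or "'s"
    text: str


@dataclass
class Token:
    # Hypothetical surface token; an MWT token holds more than one word
    text: str
    words: List[Word]
    ner: str = "O"  # the tag lives on the token, not on its words


def word_level_tags(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Project each token's NER tag onto every word inside that token."""
    return [(word.text, token.ner) for token in tokens for word in token.words]


# MWT tokenizer: "John's" is one token that expands into two words.
# The tag value is an assumed example, not real model output.
mwt_tokens = [
    Token("John's", [Word("John"), Word("'s")], ner="S-PERSON"),
    Token("headache", [Word("headache")]),
]

# Non-MWT tokenizer: "John" and "'s" are separate tokens with separate tags.
plain_tokens = [
    Token("John", [Word("John")], ner="S-PERSON"),
    Token("'s", [Word("'s")]),
    Token("headache", [Word("headache")]),
]

print(word_level_tags(mwt_tokens))
print(word_level_tags(plain_tokens))
```

In this sketch the MWT variant cannot tag John without also tagging 's, because both words inherit whatever tag the enclosing token carries. The non-MWT variant can tag them independently.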