Model I am using (Bert, XLNet ...): pipeline('ner', grouped_entities=True)
Language I am using the model on (English, Chinese ...): English
The problem arises when using:
the official example scripts: (give details below)
my own modified scripts: (give details below)
The task I am working on is:
an official GLUE/SQuAD task: (give the name)
my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
from transformers import pipeline

# NER pipeline that merges adjacent tokens into entity groups
ner = pipeline('ner', grouped_entities=True)
print(ner("the sapodilla tree is native to Central America"))
Expected behavior
The output lists "##di", a subword fragment of "sapodilla", as one of the named entities. Partial tokens should not be returned as predicted named entities. I would expect either the entire word "sapodilla" to be returned as an entity group, or nothing at all. Is this a bug, or was this behavior deliberately allowed?
As a side note, a related quirk: for input containing "U.S.", the pipeline occasionally returns just "U" or "S" as individual named entities instead of "U.S.". I believe this has the same cause as the issue above.
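One way to get the all-or-nothing behavior described above is word-level aggregation: merge WordPiece pieces back into whole words before grouping, and give each word a single label. The sketch below is a hypothetical post-processing step, not what the 3.0.2 pipeline does. It assumes you have a label for every token, including "O" tokens (the pipeline output omits those). The helper names `merge_subwords` and `group_entities` and the example tokenization of "sapodilla" are made up for illustration. It follows a "first subword" convention: a word takes the label of its first piece.

```python
# Sketch only: assumes full token-level predictions, "O" tokens included.

def merge_subwords(tokens):
    """Merge WordPiece continuation pieces ('##' prefix) into whole words.

    Each word keeps the label of its first piece ("first subword" rule).
    """
    words = []
    for token, label in tokens:
        if token.startswith("##") and words:
            words[-1][0] += token[2:]  # glue the continuation onto the previous word
        else:
            words.append([token, label])
    return [(word, label) for word, label in words]


def group_entities(words):
    """Group consecutive words that share an entity type (B-/I- prefix stripped)."""
    groups = []
    prev_type = None
    for word, label in words:
        if label == "O":
            prev_type = None
            continue
        ent_type = label.split("-", 1)[-1]
        if ent_type == prev_type and not label.startswith("B-"):
            groups[-1]["word"] += " " + word  # extend the current group
        else:
            groups.append({"entity_group": ent_type, "word": word})
        prev_type = ent_type
    return groups


# Hypothetical tokenization and labels, for illustration only
tokens = [("the", "O"), ("sa", "O"), ("##po", "O"), ("##di", "I-MISC"), ("##lla", "O"),
          ("tree", "O"), ("is", "O"), ("native", "O"), ("to", "O"),
          ("Central", "I-LOC"), ("America", "I-LOC")]
print(group_entities(merge_subwords(tokens)))
# [{'entity_group': 'LOC', 'word': 'Central America'}]
```

Under the first-subword rule the stray "##di" prediction is discarded, because "sapodilla" inherits the "O" label of its first piece. Averaging scores across all pieces of a word is a common alternative if you would rather let any piece promote the whole word.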
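The "U.S." case can be handled the same way when working with character offsets instead of tokens. BERT's basic tokenizer splits "U.S." at each period into "U", ".", "S", ".". One plausible cause of the quirk is that a "." piece gets an "O" label, which breaks the group apart. The minimal sketch below assumes you have (start, end, label) character spans for the predicted tokens. The 3.0.2 pipeline does not return these, but a fast tokenizer called with `return_offsets_mapping=True` can supply offsets. It widens every span to the whitespace-delimited word containing it and merges spans that then overlap. `expand_to_words` is a hypothetical helper.

```python
import re


def expand_to_words(text, spans):
    """Widen (start, end, label) character spans to whole whitespace-delimited
    words, then merge overlapping spans that share a label."""
    # Character ranges of every whitespace-delimited word in the text
    word_bounds = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    expanded = []
    for start, end, label in spans:
        # Cover every word the span intersects
        covering = [(s, e) for s, e in word_bounds if s < end and e > start]
        if covering:
            start, end = covering[0][0], covering[-1][1]
        expanded.append((start, end, label))
    expanded.sort()
    merged = []
    for start, end, label in expanded:
        if merged and merged[-1][2] == label and start <= merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)  # fold into previous span
        else:
            merged.append((start, end, label))
    return [(text[s:e], label) for s, e, label in merged]


text = "He moved to the U.S. last year"
u = text.index("U.S.")
# Hypothetical predictions: "U" and "S" tagged, the periods not
spans = [(u, u + 1, "LOC"), (u + 2, u + 3, "LOC")]
print(expand_to_words(text, spans))
# [('U.S.', 'LOC')]
```

The same rule turns a "di" span inside "sapodilla" into the whole word. A caveat of this design is that sentence-final punctuation gets pulled in too, so "America." at the end of a sentence would come back with its period.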
Environment info
transformers version: 3.0.2