make add_unk optional and don't use it for ner #2839

Merged
merged 1 commit into from
Jun 28, 2022

Conversation

helpmefindaname
Collaborator

Adding an <unk> token is a common way to handle labels that are too rare to train on. It is mainly relevant for classification, where it effectively acts as an "others" category.
However, NER already has a different mechanism for this in the O tag, so I don't think <unk> has any practical use there.
Using <unk> for NER can lower the overall score when the NER model is not trained long enough, because random appearances of <unk> still happen (see #2762 and #2761), and it can break pipelines that do not expect entities marked as <unk> (#2224).

I propose making <unk> optional: it stays enabled by default (True), but it can be disabled for all NER tasks. I also propose disabling it in all NER-related examples.
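
To illustrate the idea, here is a minimal sketch in plain Python. It is not the code from this PR: the `make_label_dictionary` helper below is a hypothetical stand-in written for this comment, and only the `add_unk` flag name is taken from the PR title. It shows how the flag decides whether `<unk>` ends up in the label set.

```python
from typing import Iterable, List


def make_label_dictionary(labels: Iterable[str], add_unk: bool = True) -> List[str]:
    """Hypothetical helper (an assumption, not the library's implementation).

    Returns the distinct labels in order of first appearance, optionally
    preceded by an "<unk>" entry.
    """
    # Reserve a slot for unknown labels only when the caller asks for it.
    items = ["<unk>"] if add_unk else []
    # dict.fromkeys removes duplicates while preserving insertion order.
    items.extend(dict.fromkeys(labels))
    return items


# Toy NER tag sequence; "O" already covers everything that is not an entity.
ner_tags = ["O", "B-PER", "I-PER", "O", "B-LOC", "O"]

with_unk = make_label_dictionary(ner_tags)  # default keeps <unk>
without_unk = make_label_dictionary(ner_tags, add_unk=False)  # NER setting

print(with_unk)
print(without_unk)
```

With `add_unk=False` the model has no `<unk>` class to predict, so an undertrained tagger cannot emit it by accident, and downstream pipelines only ever see the real tags plus O.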

@whoisjones
Member

👍

@alanakbik
Collaborator

@helpmefindaname thanks for improving this!


3 participants