One word two entity labels #5475

Closed
tabergma opened this issue Mar 24, 2020 · 0 comments · Fixed by #5476
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@tabergma
Contributor

Description of the problem
ConveRT and the other language models in our pipeline split words into sub-words during tokenization. As a result, DIETClassifier can assign different entity labels to the individual sub-words of a single word.

Example:

{
    "text": "Aarhus",
    "entities": [
        {
            "start": 0,
            "end": 6,
            "value": "Aarhus",
            "entity": "city"
        }
    ],
    "predicted_entities": [
        {
            "entity": "iata",
            "start": 0,
            "end": 3,
            "extractor": "DIETClassifier",
            "value": "Aar"
        },
        {
            "entity": "city",
            "start": 3,
            "end": 6,
            "extractor": "DIETClassifier",
            "value": "hus"
        }
    ]
}
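
To reproduce the symptom outside the pipeline, a check along these lines could flag any word that is covered by more than one entity label. This is only a sketch: `find_conflicting_words` is a hypothetical helper, not part of Rasa, and it assumes words are separated by whitespace.

```python
import re


def find_conflicting_words(text, predicted_entities):
    """Map each whitespace-delimited word to its labels if it carries more than one.

    Hypothetical helper for illustration only; not a Rasa API.
    """
    conflicts = {}
    for match in re.finditer(r"\S+", text):
        word_start, word_end = match.span()
        # Collect the labels of all predictions that overlap this word.
        labels = {
            entity["entity"]
            for entity in predicted_entities
            if entity["start"] < word_end and entity["end"] > word_start
        }
        if len(labels) > 1:
            conflicts[match.group()] = sorted(labels)
    return conflicts


# Predictions taken from the example above.
predicted = [
    {"entity": "iata", "start": 0, "end": 3, "value": "Aar"},
    {"entity": "city", "start": 3, "end": 6, "value": "hus"},
]
print(find_conflicting_words("Aarhus", predicted))
# {'Aarhus': ['city', 'iata']}
```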

Overview of the solution:
It should not be possible to assign two different entities to one word/token. We should add a sanity check that prevents double assignments. When the sub-words of a word receive conflicting labels, we might want to keep the assignment with the higher confidence.

We need to check if this also happens with the CRFEntityExtractor.
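
A minimal sketch of such a sanity check, under stated assumptions: each prediction carries a `confidence` value (the confidences below are made up for illustration, since the example in this issue does not include any), words are separated by whitespace, and `merge_subword_entities` is a hypothetical name rather than an existing Rasa function.

```python
import re


def merge_subword_entities(text, predicted_entities):
    """Keep one entity per whitespace-delimited word, chosen by highest confidence.

    Hypothetical post-processing step for illustration; not the actual fix.
    """
    merged = []
    for match in re.finditer(r"\S+", text):
        word_start, word_end = match.span()
        # All sub-word predictions that overlap the current word.
        overlapping = [
            entity
            for entity in predicted_entities
            if entity["start"] < word_end and entity["end"] > word_start
        ]
        if not overlapping:
            continue
        # Assumption: a missing confidence is treated as 0.0.
        best = max(overlapping, key=lambda entity: entity.get("confidence", 0.0))
        merged.append(
            {
                "entity": best["entity"],
                "start": word_start,
                "end": word_end,
                "value": text[word_start:word_end],
                "extractor": best.get("extractor"),
                "confidence": best.get("confidence"),
            }
        )
    return merged


# Confidence values are invented for this example.
predictions = [
    {"entity": "iata", "start": 0, "end": 3, "value": "Aar",
     "extractor": "DIETClassifier", "confidence": 0.41},
    {"entity": "city", "start": 3, "end": 6, "value": "hus",
     "extractor": "DIETClassifier", "confidence": 0.87},
]
print(merge_subword_entities("Aarhus", predictions))
```

With these made-up confidences, the result is a single `city` entity spanning the whole word (`start` 0, `end` 6). Note that this sketch emits one entity per word, so an entity that spans several words would be returned as several word-level entities; a real fix would need to handle that case.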

@tabergma tabergma added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Mar 24, 2020
@tabergma tabergma self-assigned this Mar 24, 2020