Update data format to support composite entities #5465

Closed
5 tasks
tabergma opened this issue Mar 23, 2020 · 2 comments
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@tabergma
Contributor

tabergma commented Mar 23, 2020

Description of Problem:
In a sentence like "I want to fly from Berlin to Amsterdam." we want to annotate Berlin and Amsterdam as entities of type city, and we also want to assign a specific role to each of them: Berlin should have the role origin and Amsterdam should have the role destination. The current data format cannot express this.

Overview of the Solution:
Markdown
We want to change the annotation format for entities:

I want to fly from [Berlin]{"entity": "city", "role": "origin"} to [LA]{"entity": "city", "role": "destination", "value": "Los Angeles"}.

or

Show me [red]{"entity": "colour", "group": 0} [balloons]{"entity": "item", "group": 0} and [black]{"entity": "colour", "group": 1} [shoes]{"entity": "item", "group": 1}.

We will use {} to define entities that have roles/groups/synonyms. If you just have a simple entity without any role/group/synonym, the old format can still be used ([Berlin](city)).
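To make the proposed syntax concrete, here is a minimal sketch of how an annotated example could be turned into plain text plus entity dictionaries. It is not Rasa's actual parser: the names `ENTITY_PATTERN` and `parse_annotated` are hypothetical, and it assumes the JSON object after the brackets contains no nested braces.

```python
import json
import re

# Matches the new format [text]{...} or the old format [text](entity).
# Assumption: the attribute object has no nested braces.
ENTITY_PATTERN = re.compile(
    r"\[(?P<text>[^\]]+)\](?:\{(?P<attrs>[^}]*)\}|\((?P<old>[^)]+)\))"
)


def parse_annotated(example: str):
    """Return (plain_text, entities) for one annotated training example."""
    plain_parts = []
    entities = []
    cursor = 0  # position in the annotated string
    offset = 0  # length of the plain text emitted so far
    for match in ENTITY_PATTERN.finditer(example):
        before = example[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)

        surface = match.group("text")
        if match.group("attrs") is not None:
            # New format: the braces hold a JSON object.
            attrs = json.loads("{" + match.group("attrs") + "}")
        else:
            # Old format: (entity) or (entity:synonym).
            name, _, synonym = match.group("old").partition(":")
            attrs = {"entity": name}
            if synonym:
                attrs["value"] = synonym

        # Without an explicit "value", the surface text is the value.
        value = attrs.pop("value", surface)
        entities.append(
            {"start": offset, "end": offset + len(surface), "value": value, **attrs}
        )
        plain_parts.append(surface)
        offset += len(surface)
        cursor = match.end()
    plain_parts.append(example[cursor:])
    return "".join(plain_parts), entities
```

With this sketch, `[Berlin](city)` and `[Berlin]{"entity": "city"}` yield the same entity, which mirrors the idea that the old format stays valid for simple entities.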
We will deprecate the old inline syntax for synonyms, e.g. [LA](city:Los Angeles). Instead, users should use [LA]{"entity": "city", "value": "Los Angeles"}. We should offer a script that converts the old format into the new format.
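The conversion script could look roughly like the sketch below. It only rewrites the inline synonym form and leaves simple annotations such as [Berlin](city) untouched. The names `OLD_SYNONYM` and `convert_inline_synonyms` are hypothetical and not part of Rasa; the sketch also assumes that entity names contain no colon, so a regular Markdown link whose target contains a colon would be rewritten by mistake.

```python
import json
import re

# Old inline synonym syntax: [text](entity:value).
# Assumption: the entity name itself contains no ":" and no ")".
OLD_SYNONYM = re.compile(
    r"\[(?P<text>[^\]]+)\]\((?P<entity>[^):]+):(?P<value>[^)]+)\)"
)


def convert_inline_synonyms(line: str) -> str:
    """Rewrite [text](entity:value) as [text]{"entity": ..., "value": ...}."""

    def _replace(match):
        attrs = {"entity": match.group("entity"), "value": match.group("value")}
        # ensure_ascii=False keeps non-ASCII synonym values readable.
        return "[" + match.group("text") + "]" + json.dumps(attrs, ensure_ascii=False)

    return OLD_SYNONYM.sub(_replace, line)
```

Applied line by line to a training data file, this would turn `[LA](city:Los Angeles)` into `[LA]{"entity": "city", "value": "Los Angeles"}`.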

JSON
We simply add new keywords to the entity definition in the JSON format:

{
    "intent": "fly",
    "entities": [
        {
            "start": 19,
            "end": 25,
            "value": "Berlin",
            "entity": "city",
            "role": "from"
        },
        {
            "start": 29,
            "end": 31,
            "value": "Los Angeles",
            "entity": "city",
            "role": "to"
        }
    ],
    "text": "I want to fly from Berlin to LA."
}
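As a quick sanity check on the JSON form, the sketch below slices the text with each entity's start and end offsets. It shows that the offsets point at the surface text ("LA") while "value" carries the synonym ("Los Angeles"). The helper name `surface_forms` is hypothetical, and the example is loaded as a standalone object rather than as part of a full training data file.

```python
import json

example = json.loads("""
{
    "intent": "fly",
    "entities": [
        {"start": 19, "end": 25, "value": "Berlin", "entity": "city", "role": "from"},
        {"start": 29, "end": 31, "value": "Los Angeles", "entity": "city", "role": "to"}
    ],
    "text": "I want to fly from Berlin to LA."
}
""")


def surface_forms(example: dict) -> list:
    """Return (surface_text, value, role) for every entity in the example."""
    text = example["text"]
    result = []
    for entity in example["entities"]:
        start, end = entity["start"], entity["end"]
        # Offsets are character positions; "end" is exclusive.
        if not 0 <= start < end <= len(text):
            raise ValueError(f"offsets out of range: {entity}")
        # "role" is optional, so entities without one yield None.
        result.append((text[start:end], entity["value"], entity.get("role")))
    return result


for surface, value, role in surface_forms(example):
    print(surface, value, role)
```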

Definition of Done:

  • Tests are added
  • New training data format is supported
  • We have a script in place that converts the old format to the new format
  • Documentation
  • Changelog
@tabergma tabergma added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Mar 23, 2020
@tabergma tabergma self-assigned this Mar 23, 2020
@erohmensing
Contributor

@tabergma would there be a big argument for keeping in-line synonyms and not deprecating that completely? i.e. just leaving the separate

synonym:Los Angeles
- LA

format?

@tabergma
Contributor Author

No reason to remove this. We can keep this.
