Create ClassLabel for labelling tasks datasets #850

Merged
merged 4 commits on Nov 16, 2020


jplu
Contributor

@jplu jplu commented Nov 13, 2020

This PR adds a specific ClassLabel to the datasets that cover a labelling task such as POS tagging, NER, or chunking.

Member

@lhoestq lhoestq left a comment


Cool!
So to summarize, it seems to follow this convention:

  • the tokens are in the tokens column as a sequence of str (note that sometimes the tokenization simply splits words on spaces)
  • the tags for POS, chunking and NER are in the columns pos_tags, chunk_tags and ner_tags, each as a sequence of ClassLabel
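
To make the convention above concrete, here is a minimal, library-free sketch of what one row might look like. It is not the code from this PR: `TagSet` is a hypothetical stand-in for a class-label feature (a fixed list of tag names mapped to integer ids), `make_example` is an illustrative helper, and the tag inventories are made up for the example. The chunking column is left out for brevity and would follow the same pattern.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TagSet:
    """Hypothetical stand-in for a class-label feature: tag names <-> integer ids."""

    names: List[str]
    _ids: Dict[str, int] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # The id of a tag is simply its position in the list of names.
        self._ids = {name: i for i, name in enumerate(self.names)}

    def encode(self, tags: List[str]) -> List[int]:
        return [self._ids[tag] for tag in tags]

    def decode(self, ids: List[int]) -> List[str]:
        return [self.names[i] for i in ids]


# Illustrative tag inventories only; each real dataset defines its own.
POS = TagSet(["DET", "NOUN", "VERB"])
NER = TagSet(["O", "B-PER", "I-PER"])


def make_example(tokens: List[str], pos: List[str], ner: List[str]) -> dict:
    """Build one row: a sequence of str plus one aligned id sequence per tag column."""
    if not (len(tokens) == len(pos) == len(ner)):
        raise ValueError("every tag column must align one-to-one with tokens")
    return {
        "tokens": list(tokens),
        "pos_tags": POS.encode(pos),
        "ner_tags": NER.encode(ner),
    }


example = make_example(
    tokens="The dog barks".split(" "),  # tokenization by simple space splitting
    pos=["DET", "NOUN", "VERB"],
    ner=["O", "O", "O"],
)
```

Storing integer ids alongside a fixed list of names keeps the tag columns compact while still letting a consumer recover the string tags, for instance with `NER.decode(example["ner_tags"])`.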

Review threads:
datasets/xtreme/xtreme.py (resolved)
datasets/conll2000/conll2000.py (outdated, resolved)
datasets/germeval_14/germeval_14.py (outdated, resolved)
Contributor Author

jplu commented Nov 13, 2020

@lhoestq Better?

Member

@lhoestq lhoestq left a comment


awesome thanks :)
feel free to merge

@jplu jplu merged commit d21457e into huggingface:master Nov 16, 2020
@jplu jplu deleted the add-classlabel branch November 16, 2020 10:32

3 participants