Add German MobIE NER Dataset #3351

Merged
merged 5 commits into from Oct 24, 2023


stefan-it
Member

Hey,

this PR adds support for the German MobIE NER Dataset.

MobIE is a German-language dataset, human-annotated with 20 coarse- and fine-grained entity types, that includes entity linking information for geographically linkable entities. It comprises 3,232 social media texts and traffic reports totaling 91K tokens, with 20.5K annotated entities, of which 13.1K are linked to a knowledge base.

Usage

In Flair the dataset can be used like this:

from flair.datasets import NER_GERMAN_MOBIE

corpus = NER_GERMAN_MOBIE()

print(corpus)
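To look a bit closer at the loaded corpus, something like the following sketch might help. It assumes the annotations are stored under the label type `ner` (not stated in this PR) and that the corpus exposes the usual `train`, `dev` and `test` splits; `split_sizes`, `count_labels` and `inspect_mobie` are hypothetical helper names, not part of Flair.

```python
from collections import Counter


def split_sizes(corpus):
    """Return the number of sentences per split; a missing split counts as 0."""
    return {
        name: len(split) if split is not None else 0
        for name, split in (
            ("train", corpus.train),
            ("dev", corpus.dev),
            ("test", corpus.test),
        )
    }


def count_labels(sentences, label_type="ner"):
    """Count how often each label value occurs in the given sentences.

    The label type "ner" is an assumption about how this dataset is annotated.
    """
    counts = Counter()
    for sentence in sentences:
        for label in sentence.get_labels(label_type):
            counts[label.value] += 1
    return counts


def inspect_mobie():
    """Load the corpus (downloads the data on first use) and print a summary."""
    # Imported here so the helpers above stay usable without Flair installed.
    from flair.datasets import NER_GERMAN_MOBIE

    corpus = NER_GERMAN_MOBIE()
    print(split_sizes(corpus))
    print(count_labels(corpus.train).most_common(5))
```

Calling `inspect_mobie()` prints the split sizes and the five most frequent entity labels in the training split.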

Example

I've already fine-tuned models with Flair (see this blog post) and uploaded them to the Hugging Face Model Hub. Here's an example image:

image

@alanakbik
Collaborator

Thanks a lot for adding this @stefan-it! And cool blog post!

@alanakbik merged commit 073b339 into master Oct 24, 2023
1 check passed
@alanakbik deleted the add-german-mobie-ner-dataset branch October 24, 2023 15:06