IGBO_NER addition to FlairNLP #2222

Merged
merged 4 commits on Apr 21, 2021


PhilippThamm

Added the corpus IGBO_NER for named entity recognition in the language Igbo to the flair framework.

@alanakbik
Collaborator

@PhilippThamm I am getting some weird results when I load this dataset:

from flair.datasets import IGBO_NER

# load the corpus
corpus = IGBO_NER()

# print statistics
print(corpus)

# print some example sentences
print(corpus.train[0])
print(corpus.train[1])
print(corpus.train[2])
print(corpus.train[3])

# print the NER dictionary (should be only NER tags like B-PER, I-PER, E-PER, B-ORG, I-ORG)
print(corpus.make_label_dictionary('ner'))

This prints a lot of weird characters. Perhaps the encoding is wrongly set?

Collaborator

@alanakbik left a comment

Fix encoding and set tag_to_bioes variable
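
For context on the tagging scheme mentioned above, here is a minimal sketch of how a BIO tag sequence maps onto BIOES (the scheme with E- and S- prefixes). The function name `bio_to_bioes` and the sample tags are hypothetical and chosen for illustration; this is not flair's own conversion code.

```python
def bio_to_bioes(tags):
    """Convert a BIO tag sequence to BIOES (illustrative helper, not flair's code)."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # Look ahead: the span continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # A B- tag with no continuation is a single-token span (S-).
            bioes.append(f"B-{label}" if continues else f"S-{label}")
        else:
            # An I- tag with no continuation closes the span (E-).
            bioes.append(f"I-{label}" if continues else f"E-{label}")
    return bioes


example = ["B-PER", "I-PER", "I-PER", "O", "B-ORG"]
print(bio_to_bioes(example))
# ['B-PER', 'I-PER', 'E-PER', 'O', 'S-ORG']
```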

@Dabendorf
Contributor

As I understand it, the encoding should be set to UTF-8 instead of Latin-1.
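
The symptom described here can be reproduced with the standard library alone. The sketch below assumes the corpus files are stored as UTF-8; the sample string is an illustrative Igbo phrase and is not taken from the dataset.

```python
# Illustrative sample containing a non-ASCII Igbo letter (not from the corpus).
text = "Ndị Igbo"

# Bytes as they would be stored in a UTF-8 encoded file.
raw = text.encode("utf-8")

# Decoding those bytes as Latin-1 never raises an error, because every byte
# maps to some character, but multi-byte letters turn into several wrong ones.
garbled = raw.decode("latin-1")

# Decoding with the matching codec restores the original text.
correct = raw.decode("utf-8")

print(repr(garbled))
print(repr(correct))
```

Because Latin-1 decoding succeeds silently, the problem only shows up as odd characters in the printed sentences rather than as an exception.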

@alanakbik
Collaborator

@PhilippThamm thanks for adding this!

@alanakbik alanakbik merged commit f65beb8 into flairNLP:master Apr 21, 2021
3 participants