The class STACKOVERFLOW_NER is taking some metadata in the corpus #2228

Closed
symeneses opened this issue Apr 18, 2021 · 0 comments · Fixed by #2229
Labels
bug Something isn't working

Describe the bug
The class STACKOVERFLOW_NER loads the metadata lines that identify questions and answers into the corpus as if they were sentences. In addition, the entity labels need some cleaning to match those used in the authors' paper.

To Reproduce

from flair.data import Corpus
from flair.datasets import STACKOVERFLOW_NER

corpus: Corpus = STACKOVERFLOW_NER()
print(corpus)

Corpus: 14545 train + 4607 dev + 4940 test sentences

The corpus has more sentences than reported in the paper. Looking inside the datasets in the corpus, we can see that it contains metadata lines.

corpus.train[0:3]

[Sentence: "Question_ID : 37985879" [− Tokens: 3],
Sentence: "Question_URL : https://stackoverflow.com/questions/37985879/" [− Tokens: 3],
Sentence: "If I would have 2 tables" [− Tokens: 6 − Token-Labels: "If I would have 2 tables <S-Data_Structure>"]]
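As a rough illustration of the kind of filtering that would remove these lines, here is a minimal sketch that works on plain token lists rather than on flair objects. The helper names `is_metadata` and `drop_metadata` are hypothetical and not part of flair. The markers `Question_ID` and `Question_URL` come from the output above; `Answer_to_Question_ID` is an assumption about how answers are marked in the raw files.

```python
from typing import List

# Markers that open a metadata line. The first two appear in the output above;
# "Answer_to_Question_ID" is an assumed marker for answers, not confirmed here.
METADATA_MARKERS = ("Question_ID", "Question_URL", "Answer_to_Question_ID")


def is_metadata(tokens: List[str]) -> bool:
    """Return True if the sentence starts with one of the metadata markers."""
    return bool(tokens) and tokens[0] in METADATA_MARKERS


def drop_metadata(sentences: List[List[str]]) -> List[List[str]]:
    """Keep only the sentences that are not metadata lines."""
    return [tokens for tokens in sentences if not is_metadata(tokens)]


# The three sentences shown above, written as token lists.
sentences = [
    ["Question_ID", ":", "37985879"],
    ["Question_URL", ":", "https://stackoverflow.com/questions/37985879/"],
    ["If", "I", "would", "have", "2", "tables"],
]

cleaned = drop_metadata(sentences)
print(f"{len(sentences)} sentences before, {len(cleaned)} after filtering")
```

The same check could probably be applied while reading the raw files, before any sentence is added to the corpus, so that the metadata never reaches the train, dev, or test splits.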

Expected behavior
The summary of the corpus should be:

print(corpus)

Corpus: 9263 train + 2896 dev + 3108 test sentences

These counts match the number of sentences produced by the paper authors' code.

Environment (please complete the following information):

  • OS: Debian GNU/Linux 10 (buster)
  • Version: 0.8
@symeneses symeneses added the bug Something isn't working label Apr 18, 2021