Describe the bug
The class STACKOVERFLOW_NER reads the metadata lines that mark questions and answers into the corpus as if they were sentences. In addition, the entity labels need some cleaning so that they match the ones used in the authors' paper.
To Reproduce
Corpus: 14545 train + 4607 dev + 4940 test sentences
The corpus has more sentences than the paper reports. Looking inside the datasets in the corpus, we can see that metadata lines are included as sentences:
[Sentence: "Question_ID : 37985879" [− Tokens: 3],
Sentence: "Question_URL : https://stackoverflow.com/questions/37985879/" [− Tokens: 3],
Sentence: "If I would have 2 tables" [− Tokens: 6 − Token-Labels: "If I would have 2 tables <S-Data_Structure>"]]
Expected behavior
The summary of the corpus should be:
Corpus: 9263 train + 2896 dev + 3108 test sentences
These values match the number of sentences obtained when the data is processed with the paper authors' code.
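As a rough illustration of the expected behavior, the metadata sentences shown under To Reproduce could be dropped before the corpus is built. The following is a minimal sketch in plain Python, not the actual flair loader or the authors' code. The helper names `is_metadata` and `drop_metadata` are hypothetical, and the `Answer_to_Question_ID` marker is an assumption; only `Question_ID` and `Question_URL` appear in the output above.

```python
from typing import Iterable, List

# Markers that open a metadata line. "Question_ID" and "Question_URL" come from
# the output in this issue; "Answer_to_Question_ID" is an assumed marker name.
METADATA_MARKERS = ("Question_ID", "Question_URL", "Answer_to_Question_ID")


def is_metadata(tokens: List[str]) -> bool:
    """Return True when the first token of a sentence is a metadata marker."""
    return bool(tokens) and tokens[0] in METADATA_MARKERS


def drop_metadata(sentences: Iterable[List[str]]) -> List[List[str]]:
    """Keep only the sentences that are not metadata lines."""
    return [tokens for tokens in sentences if not is_metadata(tokens)]


# The three sentences from the output above, already split into tokens.
sentences = [
    ["Question_ID", ":", "37985879"],
    ["Question_URL", ":", "https://stackoverflow.com/questions/37985879/"],
    ["If", "I", "would", "have", "2", "tables"],
]

cleaned = drop_metadata(sentences)
print(len(sentences), "->", len(cleaned))
```

Matching on the first token only keeps ordinary sentences that merely mention one of these strings later in the text.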
Environment (please complete the following information):