The problem is that the file is read with readline(), which keeps the trailing newline. With the default delimiter, the split treats that newline as a separator and produces an empty column after it, so the tag itself stays clean. That's why the issue doesn't come up with the default delimiter. With a custom delimiter, the newline stays attached to the last column. My current workaround is to add the delimiter after the NER tag on every line. The correct solution is to trim NER tags when creating a corpus.
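To make the mechanism concrete, here is a minimal sketch in plain Python. It assumes the default delimiter is a whitespace regex (`\s+`) and that columns are split with `re.split`; both are assumptions for illustration, not confirmed details of the loader. The sample tokens and the `|` delimiter are made up.

```python
import re

# What readline() hands back: the newline is still attached.
line_default = "Berlin\tB-LOC\n"
line_custom = "Berlin|B-LOC\n"

# Assumed default delimiter: runs of whitespace. The newline is consumed
# as a separator, leaving an empty trailing column and a clean tag.
default_fields = re.split(r"\s+", line_default)

# Custom delimiter: nothing consumes the newline, so it stays on the tag.
custom_fields = re.split(r"\|", line_custom)

# Proposed fix: strip the line ending before splitting.
fixed_fields = re.split(r"\|", line_custom.rstrip("\r\n"))

print(default_fields)  # ['Berlin', 'B-LOC', '']
print(custom_fields)   # ['Berlin', 'B-LOC\n']
print(fixed_fields)    # ['Berlin', 'B-LOC']
```

Stripping only `\r\n` rather than all whitespace keeps any intentional trailing spaces inside a column, which may or may not matter for a given corpus.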
Describe the bug
When using a custom column delimiter for the CoNLL format, the newline character is taken as part of the tag.
To Reproduce
Create a CoNLL file, load it with column_delimiter set to a custom value, and check the loaded tags. The tags come out as 'O\n', 'B-something\n' and 'I-something\n'.
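The steps above can be simulated without the library. In this sketch, `read_columns` is a hypothetical stand-in for the loader (readline() followed by a split, with no trimming), and the `;` delimiter and sample tokens are invented for the example.

```python
import os
import tempfile


def read_columns(path, delimiter):
    """Hypothetical stand-in for the loader: readline() plus split, no trimming."""
    rows = []
    with open(path, encoding="utf-8") as f:
        line = f.readline()
        while line:
            if line.strip():  # skip blank sentence separators
                rows.append(line.split(delimiter))
            line = f.readline()
    return rows


# Write a tiny CoNLL-style file that uses ';' as the column delimiter.
with tempfile.NamedTemporaryFile(
    "w", suffix=".conll", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("George;B-something\nWashington;I-something\nwent;O\n")
    path = tmp.name

tags = [row[1] for row in read_columns(path, ";")]
os.remove(path)

print(tags)  # ['B-something\n', 'I-something\n', 'O\n']
```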
Expected behavior
Tags should be 'O', 'B-something' or 'I-something'.