NER column corpus with custom delimiter doesn't work properly #1861

aragaer · 2020-09-13T12:32:39Z

Describe the bug
When using a custom column delimiter for the CoNLL format, the newline character ends up as part of the tag.

To Reproduce
Create a CoNLL file, load it with column_delimiter set, and check the loaded tags. The tags are 'O\n', 'B-something\n' and 'I-something\n'.

Expected behavior
The tags should be 'O', 'B-something' or 'I-something'.
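A minimal stdlib-only sketch of the symptom. It assumes the reader calls readline() and then splits each line with re.split(column_delimiter, line). That is an assumption about the parsing step, not Flair's exact code. The two-line snippet, the tab delimiter and the hypothetical read_tags helper (tag in column 1) are made up for illustration:

```python
import io
import re

# Hypothetical CoNLL-style snippet: token and NER tag separated by a tab.
data = "Berlin\tB-LOC\nis\tO\n"


def read_tags(text: str, column_delimiter: str, tag_column: int = 1) -> list:
    """Read lines with readline() (which keeps the '\n') and collect the tag column."""
    tags = []
    stream = io.StringIO(text)
    line = stream.readline()
    while line:
        fields = re.split(column_delimiter, line)
        tags.append(fields[tag_column])
        line = stream.readline()
    return tags


print(read_tags(data, "\t"))    # custom delimiter: ['B-LOC\n', 'O\n']
print(read_tags(data, r"\s+"))  # whitespace regex: ['B-LOC', 'O']
```

With the whitespace regex, the trailing newline is consumed as a delimiter and only yields an empty extra field. With a tab, nothing splits off the newline, so it stays glued to the tag.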

aragaer · 2020-09-13T12:38:28Z

The problem is that the file is read with readline(), which keeps the trailing newline. With the default delimiter (whitespace), the split treats that newline as a delimiter and produces an empty extra column after it, so the tag itself stays clean. That's why the issue doesn't come up with the default delimiter. My current workaround is to add the delimiter after the NER tag. The correct solution is to trim NER tags when creating a corpus.
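A sketch of the trimming fix described above, written as a hypothetical parse_line helper. This is not Flair's API, and it is not necessarily how the merged PR does it. The helper removes the line terminator once before splitting, which has the same effect as trimming each tag:

```python
import re


def parse_line(line: str, column_delimiter: str = r"\s+") -> list:
    """Split one column-format line into fields, ignoring the line terminator."""
    # readline() keeps '\n' (or '\r\n'); drop it so it can never end up
    # attached to the last column, whatever delimiter is used.
    line = line.rstrip("\r\n")
    return re.split(column_delimiter, line)


print(parse_line("Berlin\tB-LOC\n", "\t"))  # ['Berlin', 'B-LOC']
print(parse_line("Berlin\tB-LOC\n"))        # ['Berlin', 'B-LOC']

# The workaround from the comment (a trailing delimiter after the tag) works
# even without the fix, because the newline lands in an extra, unused column:
print(re.split("\t", "Berlin\tB-LOC\t\n"))  # ['Berlin', 'B-LOC', '\n']
```

Stripping only "\r\n" rather than all whitespace keeps the change narrow. A line whose columns are meant to be separated by spaces or tabs is left intact apart from its terminator.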

alanakbik · 2020-09-21T18:48:42Z

@aragaer thanks for reporting this! I'll push a PR to fix this!


aragaer added the bug label ("Something isn't working") Sep 13, 2020

alanakbik added a commit that referenced this issue Sep 21, 2020

GH-1861: fix problem with custom delimiters in ColumnDataset

1f19842

alanakbik mentioned this issue Sep 21, 2020

GH-1861: fix problem with custom delimiters in ColumnDataset #1876

Merged

alanakbik closed this as completed in #1876 Sep 21, 2020

alanakbik added a commit that referenced this issue Sep 21, 2020

Merge pull request #1876 from flairNLP/GH-1861-other-delimiters

c1b703a

GH-1861: fix problem with custom delimiters in ColumnDataset
