Fix text delimiter #631

lhoestq · 2020-09-15T08:08:42Z

I changed the delimiter in the text dataset script.
This should fix the pyarrow.lib.ArrowInvalid: CSV parse error reported in #622.

The new delimiter is \b (backspace), an ASCII control character that is not expected to appear in text files.
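
To illustrate why the choice of delimiter matters when each line of a text file is parsed as a single-column CSV row, here is a minimal sketch. It uses Python's standard csv module as a stand-in for the pyarrow CSV reader, so it is an assumption about the mechanism, not the actual dataset script; split_lines is a hypothetical helper name.

```python
import csv
import io

DELIMITER = "\b"  # backspace (0x08), the delimiter named in this PR


def split_lines(text, delimiter=DELIMITER):
    """Parse text as delimiter-separated rows with quoting disabled.

    Stand-in for a CSV reader configured to treat each line as one column.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quoting=csv.QUOTE_NONE,
    )
    return list(reader)


# Ordinary text (commas, tabs) never contains the delimiter: one column per line.
clean = split_lines("first line, with a comma\nsecond\tline\n")

# A line that does contain the delimiter three times splits into four columns.
dirty = split_lines("a\bb\bc\bd\n")

print([len(row) for row in clean])  # [1, 1]
print([len(row) for row in dirty])  # [4]
```

In this sketch, a row only splits into extra columns when the delimiter character actually occurs in the line, which is why a character that is rare in real text is a reasonable choice.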

thomwolf

Youpi 🎶

abhi1nandy2

I got the error below when using the delimiter \b. Reverting to \r resolved it, though I don't know why.

pyarrow.lib.ArrowInvalid: CSV parse error: Expected 1 columns, got 4

lhoestq · 2020-09-22T14:46:17Z

Which OS are you using, @abhi1nandy2?

abhi1nandy2 · 2020-09-22T14:47:21Z

> Which OS are you using?

PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
VERSION_CODENAME=stretch
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

lhoestq · 2020-09-22T14:49:03Z

Would you mind sharing the data you used (or part of it) so I can try to reproduce the issue?
If not, could you at least share some information about the text file you're using (size, number of lines, encoding)?

abhi1nandy2 · 2020-09-22T14:59:18Z

It's a lot of data and difficult to share. There are 46 shards, each with about 256000 lines. Running the file command on them gives: ASCII text, with very long lines.
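
When the data cannot be shared, one way to narrow this down locally is to check whether any line in a shard contains the delimiter byte. The sketch below is only a diagnostic idea under that assumption, not something from the dataset script; find_delimiter_lines is a hypothetical helper, and the temporary file stands in for a real shard.

```python
import os
import tempfile


def find_delimiter_lines(path, delimiter=b"\x08"):
    """Return (line_number, count) for every line containing the delimiter byte.

    The file is read in binary mode so that no newline translation hides
    control characters such as \\b (0x08) or \\r (0x0d).
    """
    hits = []
    with open(path, "rb") as f:
        for line_number, line in enumerate(f, start=1):
            count = line.count(delimiter)
            if count:
                hits.append((line_number, count))
    return hits


# Demo on a small temporary file standing in for one shard.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"plain line\nhas\x08three\x08back\x08spaces\nlast line\n")

hits = find_delimiter_lines(tmp.name)
os.unlink(tmp.name)

print(hits)  # [(2, 3)]
```

A line reported with a count of 3 would be split into 4 columns by a reader using that delimiter, which is the shape of the error quoted above. An empty result would suggest the cause lies elsewhere.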

lhoestq · 2020-09-22T15:03:05Z

OK, I see, no problem :)
I'll see what I can do.

Could you test with a single dummy text file containing a few lines, to see whether you still have the issue?
Also, which version of datasets do you have?

fix text delimiter

d3a62fe

thomwolf approved these changes Sep 15, 2020


lhoestq merged commit f38a871 into master Sep 15, 2020

lhoestq deleted the fix-text-delimiter branch September 15, 2020 08:26

JetRunner pushed a commit that referenced this pull request Sep 17, 2020

fix text delimiter (#631)

ab399aa

abhi1nandy2 reviewed Sep 22, 2020
