Fix text delimiter #631
Youpi 🎶
I got the error below when using the delimiter \b. I resolved it by reverting to \r. I don't know why, though!
pyarrow.lib.ArrowInvalid: CSV parse error: Expected 1 columns, got 4
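For context, here is a rough illustration of how a column-count mismatch of this kind can arise. It uses plain Python `str.split` instead of pyarrow's actual CSV parser (an assumption made to keep the sketch dependency-free), and the sample lines and the `count_fields` helper are hypothetical. It may or may not be what happened in the report above: a reader that expects one column per line sees extra columns whenever the delimiter occurs inside a line.

```python
def count_fields(line: str, delimiter: str) -> int:
    """Number of fields a naive delimiter split yields for one line."""
    return len(line.rstrip("\n").split(delimiter))


# Hypothetical sample lines; the second contains three backspace characters.
lines = ["a plain line of text\n", "one\btwo\bthree\bfour\n"]

field_counts = [count_fields(line, "\b") for line in lines]

# A single-column reader would accept the first line and reject the second.
mismatched = [i for i, n in enumerate(field_counts) if n != 1]
```

Three occurrences of the delimiter in one line give four fields, which matches the shape of the "Expected 1 columns, got 4" message.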
Which OS are you using, @abhi1nandy2?
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
Do you mind sharing the data you used (or part of it), so I can try to reproduce?
Lot of data, difficult to share. There are 46 shards, each having about 256000 lines. using
OK, I see, no problem :) Could you just test with a single dummy text file with a few lines to see if you're having the issue?
I changed the delimiter in the text dataset script. It should fix the pyarrow.lib.ArrowInvalid: CSV parse error from #622.
I changed the delimiter to an unused ASCII character that is not present in text files: \b
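A minimal sketch of how one might check that a candidate delimiter really is absent from some text before relying on it. The `delimiter_is_unused` helper and the sample lines are hypothetical and not part of the dataset script; the check only covers the lines it is given.

```python
def delimiter_is_unused(lines, delimiter="\b"):
    """Return True if `delimiter` never occurs in any of the given lines."""
    return all(delimiter not in line for line in lines)


# Hypothetical sample lines standing in for a text file's contents.
sample = ["first line of text\n", "second, with commas, and\ttabs\n"]

comma_ok = delimiter_is_unused(sample, ",")       # commas occur in the sample
backspace_ok = delimiter_is_unused(sample, "\b")  # backspace does not
```

Because iterating over an open file yields its lines, the same function can be called with a file handle in place of the list to scan a shard on disk.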