Multiple bug fixes for CSV parsing #5388
Based on the documentation … and …
`single_quotes = FALSE` should actually mean …
According to the doc, maybe that was the intent, but the pseudo-autodetection logic was just systematically considering …
Maybe the documentation needs to be changed as well, then. If it didn't work anyway, changing a "documented" behavior doesn't matter. We are free to change it in whatever way makes sense.
…etection and during parsing
@michalkurka tested and (almost) ready to merge.
Would be nice to have this …
@sebhrusen I don't see an issue in the modified code; the question is what other code should have been modified as well :) The change LGTM. Eventually we should add support for autodetecting single quotes vs. double quotes. Maybe it is as easy as running `determineTokens` once with single quotes and once with double quotes and comparing the results.
Yes, for the autodetection I'll implement it before the next fix build, and my first idea is the same as yours.
@michalkurka the new small data file was added and the new parsing test passed.
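The autodetection idea discussed above (tokenize once per quote character and compare) could look roughly like the following. This is a hypothetical, simplified Java illustration, not H2O code: `determineTokens` is not called, and `splitLine`, `consistencyScore` and `guessSingleQuotes` are invented names. "Comparing the results" is assumed to mean preferring the quote character that gives the most sample lines with the same column count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: choose between ' and " by tokenizing a sample with each
// candidate and keeping the one that yields the most consistent column count.
// NOT H2O's determineTokens; splitLine() is a simplified stand-in.
class QuoteAutodetectSketch {

    static final char SINGLE = '\'';
    static final char DOUBLE = '"';

    // Split one line on `sep`, treating text between `quote` chars as one field.
    static List<String> splitLine(String line, char sep, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes;          // toggle quoted state, drop the quote char
            } else if (c == sep && !inQuotes) {
                fields.add(cur.toString());    // separator outside quotes ends a field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Score = number of sample lines whose column count equals the most common count.
    static int consistencyScore(List<String> sample, char sep, char quote) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (String line : sample) {
            freq.merge(splitLine(line, sep, quote).size(), 1, Integer::sum);
        }
        int best = 0;
        for (int n : freq.values()) best = Math.max(best, n);
        return best;
    }

    // True if single quotes look like the quoting character.
    // Ties fall back to double quotes, i.e. the default singleQuotes=false.
    static boolean guessSingleQuotes(List<String> sample, char sep) {
        return consistencyScore(sample, sep, SINGLE) > consistencyScore(sample, sep, DOUBLE);
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "id,name,comment",
            "1,'Smith, John',ok",
            "2,'Doe, Jane',fine");
        System.out.println(guessSingleQuotes(sample, ','));
    }
}
```

The naive quote toggle in `splitLine` would misread an apostrophe inside unquoted text (e.g. `O'Brien`) as a quote, so a real implementation would likely need a stricter rule for where a quote may open a field.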
https://h2oai.atlassian.net/browse/PUBDEV-7996
Fixes a few bugs in the backend setup and parsing logic:
- …) before the quote char when guessing the separator.
- … as a way to escape quotes inside a quoted string.

Also, `TestUtil` hardcoded (without explanation) some parsing methods with `singleQuotes=true` instead of the default `singleQuotes=false`: given that there are currently about 100 different parsing methods there, it was difficult to know for sure which quote character the tests used. ==> Those parsing utility methods are a nightmare; all tests should switch to `ParseSetupTransformer` if they can't use the defaults.

Future improvement/PR: autodetect the quoting character in `ParseSetup`.
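The description touches on three related concerns: which quote character the `singleQuotes` flag selects, escaping quotes inside a quoted string, and not being misled by quoted text when guessing the separator. A small, hypothetical Java tokenizer can illustrate all three. It is not H2O's `CsvParser`, and the class and method names are invented. Because the description is truncated at that point, the escape convention is assumed to be the RFC 4180 doubled quote (`""`). Separator guessing is reduced to counting candidate separators outside quotes on a single line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not H2O's CsvParser): a quote-aware tokenizer where
//  - the quote char comes from a singleQuotes flag (false -> '"', true -> '\''),
//  - a doubled quote inside a quoted field is an escaped literal quote (RFC 4180 style),
//  - separator guessing only counts candidate separators outside quoted regions.
class QuoteAwareCsvSketch {

    static char quoteChar(boolean singleQuotes) {
        return singleQuotes ? '\'' : '"';
    }

    static List<String> tokenize(String line, char sep, boolean singleQuotes) {
        char quote = quoteChar(singleQuotes);
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        cur.append(quote);   // doubled quote inside quotes -> literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    cur.append(c);           // separators inside quotes are plain text
                }
            } else if (c == quote) {
                inQuotes = true;             // opening quote
            } else if (c == sep) {
                fields.add(cur.toString());  // separator outside quotes ends a field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Count occurrences of `sep` outside quoted regions.
    // A doubled quote toggles the state twice, so it has no net effect.
    static int countOutsideQuotes(String line, char sep, char quote) {
        int count = 0;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) inQuotes = !inQuotes;
            else if (c == sep && !inQuotes) count++;
        }
        return count;
    }

    // Pick the candidate with the most occurrences outside quotes (first candidate wins ties).
    static char guessSeparator(String line, char[] candidates, boolean singleQuotes) {
        char quote = quoteChar(singleQuotes);
        char best = candidates[0];
        int bestCount = -1;
        for (char cand : candidates) {
            int n = countOutsideQuotes(line, cand, quote);
            if (n > bestCount) { best = cand; bestCount = n; }
        }
        return best;
    }

    public static void main(String[] args) {
        String line = "1,\"He said \"\"hi; bye\"\"\",x";
        char sep = guessSeparator(line, new char[] {',', ';', '\t'}, false);
        System.out.println(sep + " " + tokenize(line, sep, false));
    }
}
```

Counting only separators outside quotes keeps the `;` inside the quoted field in `main` from competing with the real `,` separator.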