How should Miller behave with double quotes in a tab-delimited file (--tsv)? #238
Comments
Using the only RFC 4180-compliant TSV file (test_rfc4180.tsv):
one two three
"value one" "value two" "value three has a ""quoted string"""
I get no error. The other files do not work because they are not RFC 4180 compliant. The RFC 4180 version of 36" is this one:
It's useful to pretty print it.
To read TSV that is not RFC 4180 compliant, you should use --tsvlite.
OK, thanks for your explanation, this is slowly starting to make more sense. A field that contains double quotes must itself be wrapped in double quotes, and the wrapped quotes have to be "escaped" by doubling them. I would've at least expected all three fields to be quoted in order to be compliant with the RFC, but on further consideration, I guess it isn't really required to quote fields just because they contain spaces.
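For anyone else who trips over this, here is a minimal sketch of the quoting rule described above. It uses Python's standard `csv` module with a tab delimiter as a stand-in for RFC 4180-style quoting; it is not Miller's code, and the helper names `to_tsv_line` and `from_tsv_line` are made up for illustration.

```python
import csv
import io


def to_tsv_line(fields):
    """Encode one record with a tab delimiter and RFC 4180-style quoting.

    With the default QUOTE_MINIMAL policy, a field is wrapped in double
    quotes only when it has to be (for example, when it contains a double
    quote), and embedded double quotes are doubled.
    """
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


def from_tsv_line(line):
    """Decode one record, unwrapping quoted fields and undoubling quotes."""
    return next(csv.reader([line], delimiter="\t"))


# Fields with spaces stay bare; the field with embedded quotes gets wrapped.
print(to_tsv_line(["value one", "value two", 'value three has a "quoted string"']))

# A lone trailing quote still forces wrapping and doubling.
print(to_tsv_line(['36"']))

# Quoting every field is allowed too; it decodes to the same values.
print(from_tsv_line('"value one"\t"value two"\t"value three has a ""quoted string"""'))
```

The spaces in the first two fields do not trigger quoting, which matches the conclusion above that quoting is only needed for fields containing special characters.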
After @aborruso's clarifications, I see the problem was with my understanding of the RFC, not any fault of Miller's. Thank you, sir! :)
I have these three sample input files:
And when I cat them with mlr --tsv, none of them really yield the "expected" (correct?) behavior:
I understand, after reading #4, that tab-delimited support with --tsv is basically RFC 4180 CSV support, with tab as the delimiter. But if the TSV support really is just "RFC 4180 with a tab delimiter," surely one of the above files should've satisfied the RFC 4180 requirement for "escaping" double quotes:
...and printed out just value three has a "quoted string" as one would expect for the third column?
If TSV is considered to be its own animal, then why the syntax error: unwrapped double quote at all? Unescaped double quotes should just be allowed everywhere.
My workaround has been to use --tsvlite, but I'm starting to feel like --tsvlite should just be --tsv and --tsv should be called something like --tsv-strict, because it's really hard to make it happy when it comes to double quotes within fields.
I feel pretty willing to try to resolve this problem in a personal fork and open a PR, but what is the correct/expected behavior in this case?
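To make the two candidate behaviors concrete, here is a rough sketch in Python rather than Miller's own code. `parse_lite` is a hypothetical quote-agnostic reader that only splits on tabs, and `parse_quoted` uses the standard `csv` module with a tab delimiter to approximate "RFC 4180 with a tab delimiter." Whether these match --tsvlite and --tsv exactly is an assumption on my part.

```python
import csv


def parse_lite(line):
    """Split on tabs only; double quotes are treated as ordinary characters."""
    return line.rstrip("\n").split("\t")


def parse_quoted(line):
    """Tab-delimited parsing with RFC 4180-style quote handling."""
    return next(csv.reader([line], delimiter="\t"))


# Same record, written without and with RFC 4180-style escaping.
unescaped = 'value one\tvalue two\tvalue three has a "quoted string"'
escaped = 'value one\tvalue two\t"value three has a ""quoted string"""'

# The quote-agnostic reader returns whatever sits between the tabs.
print(parse_lite(unescaped)[2])

# On the escaped form it leaves the wrapping and doubled quotes in place.
print(parse_lite(escaped)[2])

# The quote-aware reader unwraps the field and undoubles the quotes.
print(parse_quoted(escaped)[2])
```

The point of the sketch is that each reader gives the "expected" third column for only one of the two spellings, so the input has to be written for the reader that will consume it.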