DKVP: pair separator in value breaks parsing #62
Comments
The RFC-compliant CSV reader handles embedded separators within double quotes; the readers for the other formats do not handle them at all. This is a non-ideal situation, for sure. This is a dup of #52, and all four of the current on-deck or active tasks are what I'm actively working on for v2.2.0.
@johnkerl While proper quoting can solve this issue, I'm not sure it is the only solution, and I'm not certain that it is preferable in this case. Miller failing to make a best effort to parse unquoted-but-still-parsable data is a bug, IMO. Given the log file that I am currently working with, for example: adding quotes to all the fields would make it a LOT harder to read (as a human reading plaintext, I mean).
Good feedback. I'll dig harder into your request this evening.
Thanks for looking into it further, @johnkerl -- I have not spent long looking into this, but I'm wondering if the problem isn't in /c/input/lrec_reader_mmap_dkvp.c within the I wonder if this couldn't be fixed by tweaking that code to ensure that you only match on ips once per field, which should result in only the first ips being matched. |
My apologies for the hasty read; I've got double-quotes on the brain & assigned too much weight to the double-quotes in your data. :^/ This is (was) definitely a bug; fixed in c2e11c0. Thank you for the report!! :)
No problem at all, @johnkerl -- thanks very much for the quick fix!
Nice fix, @johnkerl -- I have confirmed that with it in place, Miller now deals very nicely with things like a web-server access.log file in DKVP format (using = as the pair separator now works well, even though = also appears in some field values).
Awesome!!
Miller has difficulty parsing DKVP when the pair separator character appears within a value in the data that it is parsing. To wit:
Source file:
Expected output of mlr --opprint cat:
Actual output of mlr --opprint cat:
It looks like Miller reads the key up to the first pair separator in a field, but then starts reading the value after the last pair separator in that field, so everything between the first and last pair separators is lost.
What it should do, I think, is read the key up to the first pair separator, and then treat everything after that first pair separator, right up until the field separator is found, as the value.
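To make the proposed behavior concrete, here is a minimal sketch in Python (not Miller's actual C code). The function name parse_dkvp_line, the default separators, and the sample line are all hypothetical and chosen for illustration; the positional-key fallback for fields that contain no pair separator is likewise an assumption of this sketch.

```python
def parse_dkvp_line(line, ifs=",", ips="="):
    """Parse one DKVP line into a dict, splitting each field on the FIRST
    pair separator only, so later occurrences stay part of the value.

    Hypothetical helper for illustration; not part of Miller.
    """
    line = line.rstrip("\n")
    if not line:
        return {}

    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        # str.partition splits at the first occurrence of ips and leaves
        # any further occurrences untouched in the third element.
        key, sep, value = field.partition(ips)
        if sep:
            record[key] = value
        else:
            # Assumption of this sketch: a field with no pair separator
            # is keyed by its 1-based position in the line.
            record[str(position)] = field
    return record


if __name__ == "__main__":
    # Hypothetical access-log-style line where "=" also occurs inside a value.
    print(parse_dkvp_line("url=/search?q=miller&page=2,status=200"))
```

With this approach the url field keeps its full value, /search?q=miller&page=2, because only the first = in each field is treated as the pair separator. A field separator embedded in a value would still split the field; that case is outside the scope of this report.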