splitCsv does not handle quoted values containing commas correctly #1102
Comments
I'm not sure why you want this behaviour... CSV stands for "comma-separated values".
@edgano Well, for one, the tools I'm using generate CSV files with quoted values. Further, quoting seems to be a widely accepted interpretation of the CSV specification.
Yep, the quotes are not the "problem"...
Quote handling should take precedence over splitting on the separator, be it a comma or another character. Anyway, the comma case might be solved just with the regex proposed in the issue, then stripping quotes from the tokens where present.
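One way to make quote handling take precedence over the separator, whatever character it is, is a small character scanner that tracks whether it is currently inside a quoted section. This is a minimal Java sketch, not the regex approach and not Nextflow's actual fix. The class QuoteAwareSplit and its split method are hypothetical, and the sketch assumes RFC 4180-style double quotes, with "" inside a quoted field meaning a literal quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: quote-aware splitting of a single CSV line.
final class QuoteAwareSplit {

    // Splits `line` on `sep`, ignoring separators inside double-quoted sections.
    // Wrapping quotes are dropped; a doubled quote ("") inside quotes becomes one ".
    static List<String> split(String line, char sep) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == sep && !inQuotes) {
                tokens.add(current.toString()); // separator outside quotes ends a token
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString()); // last token, possibly empty
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a,\"1,234\",b", ',')); // [a, 1,234, b]
        System.out.println(split("x;\"p;q\";", ';'));    // [x, p;q, ]
    }
}
```

Because the quote state is checked before the separator, the same loop works for ';' or tab separators without rewriting any pattern.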
Yes, you are right. This is a bug.
@pditommaso Thank you for the quick response and fix. I can confirm this is working for my pipelines using
Bug report
It appears that .splitCsv does not handle quoted tokens containing commas correctly.
Expected behavior and actual behavior
Given a token wrapped in quotes, "1,234", I would expect .splitCsv to parse it as a single value: [..., '1,234', ...]. However, it breaks this token into two, leaving a stray quote on each half: [..., "1, 234", ...].
Steps to reproduce the problem
Using the test case above as test.csv and parsing it as:
I've tried a variety of values for the quote option above, to no avail (I'm new to Groovy, so advice on how best to specify this option would be helpful).
Program output
Expected output
Actual output
Environment
Additional context
I think the issue is related to how StringUtils.splitPreserveAllTokens is used in csvSplitter.groovy. The line is first tokenized on "," and the quotes are removed afterwards, but tokenizing with .splitPreserveAllTokens doesn't respect quotes; splitting the example line this way yields 5 tokens.
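To illustrate the failure mode, here is a minimal Java sketch of the "split first, strip quotes afterwards" approach. Java's String.split(",", -1) stands in for StringUtils.splitPreserveAllTokens (both keep empty tokens and ignore quotes). The class NaiveSplitDemo, the method splitThenStrip, the quote-stripping rule and the sample line a,"1,234",b are all hypothetical; this is not the code in csvSplitter.groovy and not the issue's own example line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of splitting on "," before handling quotes.
final class NaiveSplitDemo {

    static List<String> splitThenStrip(String line) {
        List<String> out = new ArrayList<>();
        // Stand-in for StringUtils.splitPreserveAllTokens: quotes are not considered.
        for (String token : line.split(",", -1)) {
            // Strip quotes only when they wrap the whole token (assumed rule).
            if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
                token = token.substring(1, token.length() - 1);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "a,\"1,234\",b";
        System.out.println(Arrays.toString(line.split(",", -1))); // [a, "1, 234", b]
        System.out.println(splitThenStrip(line));                 // [a, "1, 234", b]
    }
}
```

The quoted token is cut at its inner comma before quote removal runs, so neither half is wrapped in quotes and each keeps a stray quote, the same shape as the actual output reported in this issue.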
Perhaps a regex is needed to extract the proper tokens (example), after which the quotes can be removed from them; this yields the desired 3 tokens.
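A minimal Java sketch of the regex-then-strip idea follows. The issue's own regex is not reproduced here; the pattern below is an assumption, using the common lookahead trick that treats a comma as a separator only when an even number of double quotes follows it on the line (i.e. the comma is outside any quoted field). RegexCsvTokens and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: split on unquoted commas, then strip wrapping quotes.
final class RegexCsvTokens {

    // Matches a comma only if the rest of the line contains an even number of quotes.
    private static final Pattern SEPARATOR =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        // limit -1 keeps trailing empty tokens, e.g. "a,b," -> [a, b, ]
        for (String token : SEPARATOR.split(line, -1)) {
            if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
                // Remove the wrapping quotes, then unescape doubled quotes ("" -> ").
                token = token.substring(1, token.length() - 1).replace("\"\"", "\"");
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a,\"1,234\",b")); // [a, 1,234, b]
    }
}
```

The lookahead rescans the remainder of the line for every comma, so the cost grows quadratically with line length, and the pattern assumes quotes are balanced; a single-pass scanner avoids both limitations.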