-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV node outputs non valid RFC4180 CSV data under certain conditions #3934
Comments
@dceejay has already started works in branch CSV-fix-n-handling-inside-text-fields |
Following our discussion on RFC for CSV node, I am almost finished (tests written etc). One last bone of contention for me is the use of So, since number parsing is on the hot path, I have been doing some performance testing and seeing if there is a better AND faster method. TL;DRI am proposing using the Do you have any objections?
|
@dceejay ^ scratch that ^ It would mess with things like phone numbers 0800100100 and the likes :( Was a nice idea until reality got in the way. PR incoming for the work we discussed last week. |
Also would it have worked with scientific notation ? 3E5 10E-2 etc ? |
@dceejay Yeah, +casting does handle scientific notation too.
I think I have a compromise which yields 13%+ performance increase (not as snappy as previous try out, but it passes all the tests and gives us 0x0, 0b0, 0o0 parsing too) By simplifying the number check regex and reversing the logic (i.e. find the things we know we DONT want to convert) then let the // original regex
const isNumberTest = /^ *[-]?(?!E)(?!0\d)\d*\.?\d*(E-?\+?)?\d+ *$/i;
function toNumber_regex(value) {
const isNumber = isNumberTest.test(value)
return isNumber ? parseFloat(value) : value
}
// alt logic
const skipNumberConversion = /^ *(\+|-0\d|0\d)/
function toNumber_proposed_alt(value) {
if(!skipNumberConversion.test(value)) {
const v = +value
return !isNaN(v) ? v : value
}
return value
} operational comparisonPerformance Summary
Performance Details (click to view)
Differences that we need to agree:
Final wordsI have now spent far too much time on this in the pursuit of a few ms. What do they say about premature optimisation? 😆 |
Hi @Steve-Mcl, can you please close this issue (resolved in 4540) because it seems that GH did not do it. Thanks 🙂 |
It will be closed when merged into master |
I have removed the needs-triage label and added fixed label |
Current Behavior
This is related #3919 (but to be addressed separately)
In the screenshot below, an object prop with a double quote generates invalid CSV then fails completely to reconvert to object
Here is what Google Sheets / Excel makes of double quotes...
Note that double quotes are doubled up and surrounded in double quotes as per RFC
Here is what convertcsv.com make of it...
Expected Behavior
If this is considered a bug, then target
master
with fixesIf this is considered "current/expected behaviour" then target
dev
with suggested enhancements...Ideally, we would make the CSV output data that is 100% compatible with the specification however if it is deemed making the CSV node 100% RFC compatible would break users existing flows, it may be preferable to implement a new
[x] Strict Mode
option that fully adheres to the specificationSteps To Reproduce
Import provided flows - they demonstrate the various problems
Example flow
Environment
The text was updated successfully, but these errors were encountered: