Quote handling issue #713

jseter · 2019-10-03T20:18:29Z

In my test 'questionably formatted' file, which is 2 MB the browser freezes, trying the parse the whole file as a single quote even though the quoting is properly formed where it is stuck. This completely locks the browser tab at 100% cpu, never making it into the step function until it exhausts the entire file or runs out of memory queuing errors.

In debugging, the main issue i see is that:

// Check up to nextDelim or nextNewline, whichever is closest
var checkUpTo = nextNewline === -1 ? nextDelim : Math.min(nextDelim, nextNewline);
var spacesBetweenQuoteAndDelimiter = extraSpaces(checkUpTo);

nextDelim and nextNewline have not been updated at this point in the code, so if they are within the quoted text they will be less than quoteSearch, used in extraSpaces function.

File delimiter is "~", but the user saved the file using ',' and settings such that all the contents ended up in the first column. the parser should have ended parsing the line at the first newline, but ends up in a loop trying to load the whole file as a single column.
ex.

"33~000114~000020333500000~003335~""CHAPPED LIPS FRESH SV    """,,,,
"33~000114~000200333500000~003335~""CHAPPED LIPS FRESH SV    """,,,,
"06~000196~000020777700000~007777~""WALL ZYGNA LIGHT SWAG CHS""",,,,

The text was updated successfully, but these errors were encountered:

pokoli · 2019-10-04T07:14:06Z

It seems that the issue is parsed due to using a bad delimiter. Is the file parsed correctly when using the right delimiter? I guess you should use the guessDelimiter functionality to avoid this kind of errors.

jseter · 2019-10-04T13:54:48Z

Using a comma would behave better here of course, although would still be the wrong data since the file is malformed. The main point though is that the after-quote handling in papa parse would not be correct in the case of either the delimiter being present in the quoted string or newline being in the quoted string. Both of these cases are part of the reason quoting exists.

It should at a minimum update these indexes if the starting point, in this case quoteSearch is greater than or equal to their current value, which is a clear indication that those indexes are inside the quoted string and not after.
Additionally it is only checking for whitespace after the quoted text. In this case there is unquoted text after the quote, which is lacking a defined behavior. Since the RFC only specifies quoted/un-quoted for a field, i would assume that this case should have reported that as malformed and quit the quote parsing. since it keeps searching for a closing quote followed by newline or delimiter it will keep scanning until it hits the end of the file.

pokoli · 2019-10-23T17:37:21Z

@jseter I've merged all your pull request but I'm not sure if this issues is already solved.

Could you please check?

jseter · 2019-10-23T18:23:03Z

I've been incrementally doing PR's for each item, but I have not submitted a PR to close #713 yet. I need to implement the strict quote option to handle the issue for this ticket.

jseter mentioned this issue Oct 16, 2019

#713 improve quote handling #719

Closed

jseter mentioned this issue Oct 23, 2019

Quote processing uses stale indexes if quote contains delimiter or newline #727

Closed

jseter pushed a commit to jseter/PapaParse that referenced this issue Oct 28, 2019

mholt#713 initial implementation of strict quoting.

287999a

jseter pushed a commit to jseter/PapaParse that referenced this issue Oct 28, 2019

mholt#713 remove duplicate test.

a6d5428

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quote handling issue #713

Quote handling issue #713

jseter commented Oct 3, 2019

pokoli commented Oct 4, 2019

jseter commented Oct 4, 2019

pokoli commented Oct 23, 2019

jseter commented Oct 23, 2019

Quote handling issue #713

Quote handling issue #713

Comments

jseter commented Oct 3, 2019

pokoli commented Oct 4, 2019

jseter commented Oct 4, 2019

pokoli commented Oct 23, 2019

jseter commented Oct 23, 2019