Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quote handling issue #713

Open
jseter opened this issue Oct 3, 2019 · 4 comments
Open

Quote handling issue #713

jseter opened this issue Oct 3, 2019 · 4 comments

Comments

@jseter
Copy link
Contributor

jseter commented Oct 3, 2019

In my test 'questionably formatted' file, which is 2 MB the browser freezes, trying the parse the whole file as a single quote even though the quoting is properly formed where it is stuck. This completely locks the browser tab at 100% cpu, never making it into the step function until it exhausts the entire file or runs out of memory queuing errors.

In debugging, the main issue i see is that:

// Check up to nextDelim or nextNewline, whichever is closest
var checkUpTo = nextNewline === -1 ? nextDelim : Math.min(nextDelim, nextNewline);
var spacesBetweenQuoteAndDelimiter = extraSpaces(checkUpTo);

nextDelim and nextNewline have not been updated at this point in the code, so if they are within the quoted text they will be less than quoteSearch, used in extraSpaces function.

File delimiter is "~", but the user saved the file using ',' and settings such that all the contents ended up in the first column. the parser should have ended parsing the line at the first newline, but ends up in a loop trying to load the whole file as a single column.
ex.

"33~000114~000020333500000~003335~""CHAPPED LIPS FRESH SV    """,,,,
"33~000114~000200333500000~003335~""CHAPPED LIPS FRESH SV    """,,,,
"06~000196~000020777700000~007777~""WALL ZYGNA LIGHT SWAG CHS""",,,,
@pokoli
Copy link
Collaborator

pokoli commented Oct 4, 2019

It seems that the issue is parsed due to using a bad delimiter. Is the file parsed correctly when using the right delimiter? I guess you should use the guessDelimiter functionality to avoid this kind of errors.

@jseter
Copy link
Contributor Author

jseter commented Oct 4, 2019

Using a comma would behave better here of course, although would still be the wrong data since the file is malformed. The main point though is that the after-quote handling in papa parse would not be correct in the case of either the delimiter being present in the quoted string or newline being in the quoted string. Both of these cases are part of the reason quoting exists.

It should at a minimum update these indexes if the starting point, in this case quoteSearch is greater than or equal to their current value, which is a clear indication that those indexes are inside the quoted string and not after.
Additionally it is only checking for whitespace after the quoted text. In this case there is unquoted text after the quote, which is lacking a defined behavior. Since the RFC only specifies quoted/un-quoted for a field, i would assume that this case should have reported that as malformed and quit the quote parsing. since it keeps searching for a closing quote followed by newline or delimiter it will keep scanning until it hits the end of the file.

@pokoli
Copy link
Collaborator

pokoli commented Oct 23, 2019

@jseter I've merged all your pull request but I'm not sure if this issues is already solved.

Could you please check?

@jseter
Copy link
Contributor Author

jseter commented Oct 23, 2019

I've been incrementally doing PR's for each item, but I have not submitted a PR to close #713 yet. I need to implement the strict quote option to handle the issue for this ticket.

jseter pushed a commit to jseter/PapaParse that referenced this issue Oct 28, 2019
jseter pushed a commit to jseter/PapaParse that referenced this issue Oct 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants