CSV parser: do not trim quoted strings
Trimming strings can help correct simple mistakes in CSV data files and headers, but quoted strings are typically written that way deliberately and should not be altered by the parser. The trimming logic has been moved into BufferedCharSeeker, which means it can make better decisions about trimming when quotes are involved. Trimming now means that whitespace around delimiters is removed, and quotes are removed if the value starts and ends with a quote once that surrounding whitespace has been trimmed. This makes trimming more predictable, especially in combination with quoting.

Examples with and without trimming:

- Without trimming: `"a", " b " , c , "d "," e "` -> `|a| " b " | c | "d "| e |`. The line is simply split at each delimiter. Quotes are only recognized when they are precisely the first and last characters of the value; otherwise they are kept as literal characters.
- With trimming, BEFORE this commit: `"a", " b " , c , "d "," e "` -> `|a|" b "|c|"d "|e|`. Quoted values that have whitespace before the opening quote only get that whitespace trimmed and keep their quote characters. A value whose quotes were precisely its first and last characters has its quotes removed and is then trimmed as well, so the whitespace inside the quotes is lost.
- With trimming, AFTER this commit: `"a", " b " , c , "d "," e "` -> `|a| b |c|d | e |`. Whitespace around delimiters is trimmed first; quotes are then correctly recognized, and removed, if they are the first and last characters of what remains.
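The three behaviors above can be reproduced with a small, self-contained Java sketch. It is not the BufferedCharSeeker implementation: the class `CsvTrimSketch`, its `Mode` enum and its methods are hypothetical, the "BEFORE" mode is a reconstruction from the description rather than the old code, and the naive `split` assumes no delimiter appears inside a quoted value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the trimming rules described above;
// this is NOT Neo4j's BufferedCharSeeker code or API.
class CsvTrimSketch {

    enum Mode { NO_TRIM, TRIM_BEFORE, TRIM_AFTER }

    // True if the value begins and ends with a double quote.
    static boolean isQuoted(String v) {
        return v.length() >= 2 && v.charAt(0) == '"' && v.charAt(v.length() - 1) == '"';
    }

    static String unquote(String v) {
        return v.substring(1, v.length() - 1);
    }

    static String value(String raw, Mode mode) {
        if (mode == Mode.NO_TRIM) {
            // Quotes count only if they are exactly the first and last characters.
            return isQuoted(raw) ? unquote(raw) : raw;
        }
        if (mode == Mode.TRIM_BEFORE) {
            // Reconstructed old behavior: quotes are detected on the raw value,
            // and the unquoted content is then trimmed as well.
            return isQuoted(raw) ? unquote(raw).trim() : raw.trim();
        }
        // New behavior: trim whitespace around the delimiters first, then strip
        // quotes; whitespace inside the quotes is preserved.
        String t = raw.trim();
        return isQuoted(t) ? unquote(t) : t;
    }

    // Assumes ',' never occurs inside a quoted value (no quote-aware splitting).
    static List<String> parseLine(String line, Mode mode) {
        List<String> out = new ArrayList<>();
        for (String raw : line.split(",", -1)) {
            out.add(value(raw, mode));
        }
        return out;
    }

    // Renders values as |v1|v2|...| like the examples in the commit message.
    static String render(List<String> values) {
        return "|" + String.join("|", values) + "|";
    }

    public static void main(String[] args) {
        String line = "\"a\", \" b \" , c , \"d \",\" e \"";
        System.out.println(render(parseLine(line, Mode.NO_TRIM)));     // prints |a| " b " | c | "d "| e |
        System.out.println(render(parseLine(line, Mode.TRIM_BEFORE))); // prints |a|" b "|c|"d "|e|
        System.out.println(render(parseLine(line, Mode.TRIM_AFTER)));  // prints |a| b |c|d | e |
    }
}
```

The two trimming modes differ only in the order of operations: trimming before quote detection lets ` " b " ` be recognized as a quoted value, and removing the quotes last keeps the whitespace the author put inside them.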
Showing 8 changed files with 113 additions and 48 deletions.