Fix string handling and use Regex.Unescape to convert escaped characters #76
Fixes #75.
The root cause was that the tokenizer handled only one special case of escaping.
When the tokenizer encounters a `"` character, it enters a "string tokenizing mode" until it reads an unescaped `"` again.
If the tokenizer reads a `\` while in "string tokenizing mode", it ignores the `\`, enters an "escaped character mode", and appends the next character verbatim to the current token builder.
This worked fine as long as only `"` was escaped. If any other character was escaped, the tokenizer left the "escaped character mode" only when it read a `"`.
The fix is a more general approach and an overhaul of the "escaped character mode".
It is now guaranteed to reset to the "string tokenizing mode" after an escaped character.
The escaped characters are now "unescaped" using `Regex.Unescape`, which takes care of individual escapes (like `\n` or `\t`) and hex escapes (`\xa0`).
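The mode handling described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the `StringTokenizerSketch` class, the `Mode` enum, and the `ReadStrings` helper are hypothetical names, and it assumes the backslash is kept in the token builder so that `Regex.Unescape` can process the whole string token once the closing `"` is read.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch of the three tokenizer modes; names are illustrative only.
public static class StringTokenizerSketch
{
    private enum Mode { Default, InString, Escaped }

    // Returns the unescaped contents of every double-quoted string in the input.
    public static List<string> ReadStrings(string input)
    {
        var tokens = new List<string>();
        var builder = new StringBuilder();
        var mode = Mode.Default;

        foreach (char c in input)
        {
            switch (mode)
            {
                case Mode.Default:
                    // An opening quote starts "string tokenizing mode".
                    if (c == '"') mode = Mode.InString;
                    break;

                case Mode.InString:
                    if (c == '\\')
                    {
                        // Assumption: keep the backslash so Regex.Unescape sees it later.
                        builder.Append(c);
                        mode = Mode.Escaped;
                    }
                    else if (c == '"')
                    {
                        // Unescaped closing quote: convert escapes and emit the token.
                        tokens.Add(Regex.Unescape(builder.ToString()));
                        builder.Clear();
                        mode = Mode.Default;
                    }
                    else
                    {
                        builder.Append(c);
                    }
                    break;

                case Mode.Escaped:
                    // Append the escaped character, then always return to string
                    // mode, whatever the character was.
                    builder.Append(c);
                    mode = Mode.InString;
                    break;
            }
        }

        return tokens;
    }
}
```

With this sketch, the input `"a\"b" "x\ty" "\xa0"` yields three tokens: `a"b`, `x` and `y` separated by a tab character, and a single U+00A0 character. Note that `Regex.Unescape` throws an `ArgumentException` for escape sequences it does not recognize (for example `\q`), so a real tokenizer would need to decide how to report those.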