Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support escaped CR and CRLF in strings #58

Merged
merged 4 commits into from Jul 30, 2014
Merged

Support escaped CR and CRLF in strings #58

merged 4 commits into from Jul 30, 2014

Conversation

jordanbtucker
Copy link
Member

Support escaped CR and CRLF in strings.
Add new peek method to facilitate look-aheads.

@jordanbtucker
Copy link
Member Author

The multi-string test cases were failing on Windows due to CRLF line endings.

@aseemk
Copy link
Member

aseemk commented Jul 29, 2014

@jordanbtucker: you are totally awesome! Thanks for so many great suggestions and pull requests. I'm glad @aeisenberg stepped in for one, but he's on vacation, and I'm super busy with work right now.

Given your awesome contributions so far, I'm going to make you a collaborator and give you push rights to this repo. Please do keep on opening pull requests and giving others a chance to review your changes, but you can now make the call to merge yourself. (E.g. for minor changes, or if no one has reviewed your code after some time.)

Welcome to the team! =)

} else if (ch === '\r') {
if (peek() === '\n') {
next();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions about this:

  1. Why do you need to see whether a \n is the next character? Is \r by itself not enough to ignore?
  2. Why is this within the if (ch === '\\') block? Isn't this looking at carriage returns only after an escaped backslash? Why not put this further down, next to the regular ch === '\n' check?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ignore the second question; I understand. I forgot that our multi-line strings require a backslash escape, so this needs to come after that. =)

Still curious about the first q.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And btw, if \r by itself should be accepted too, then you might not need this block of code at all (nor peek of course). See the escapee dictionary up top (if you haven't already) where \n is handled.

@jordanbtucker
Copy link
Member Author

The reason I'm handling \r as a special case is due to the differences in how operating systems represent new lines.

  • \n is used by Unix, Linux, and OS X and later.
  • \r is used by OS 9 and earlier.
  • \r\n is used by Windows (note that it is a \r followed by \n).

Therefore, any of the following combinations constitute an escaped new line:

  • \ followed by \n
  • \ followed by \r not followed by \n
  • \ followed by \r\n

This conforms to the LineContinuation production, which uses the LineTerminatorSequence production.

There's no need to add \r to escapee since it's always handled as a special case.

@jordanbtucker
Copy link
Member Author

I'm thinking I should create test cases for these before merging. Working on that now.

@aseemk
Copy link
Member

aseemk commented Jul 29, 2014

I didn't mean why \r in general — yep, I'm aware of different line endings =D — I meant why are you peeking ahead in the \r case to look for an \n? Why not just ignore \r and \n each independently? It'll then work when they're back-to-back as well. And to ignore \r independent of \n, you would just add an entry for \r into escapees just like \n.

@jordanbtucker
Copy link
Member Author

I don't think that will work. Supposed we added \r to escapee and removed its special case. Then we parse a string like this:

'line1 \\\r\n line 2'

When the parser hits the \ character, it will check for a following u, which won't be there. So it moves on to check for one of the escapees, finding \r. It appends an empty string to string, and moves on to the next iteration.

Now ch contains \n, but it doesn't know that there was a \ before the \r that was before this character. so it will hit the check for \n returning a Bad string error.

I ran some tests to confirm this too.

@aseemk
Copy link
Member

aseemk commented Jul 30, 2014

Oh, got it! Duh, sorry about that. Thanks for explaining. =)

# Since we're testing different representations of new lines,
# treat all tests in this folder as binary files.

* binary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, both this and the editorconfig.

@aseemk
Copy link
Member

aseemk commented Jul 30, 2014

Great test cases! 🚢

jordanbtucker added a commit that referenced this pull request Jul 30, 2014
Support escaped CR and CRLF in strings
@jordanbtucker jordanbtucker merged commit 1d54c2c into json5:develop Jul 30, 2014
@jordanbtucker jordanbtucker deleted the escaped-cr-and-crlf branch July 30, 2014 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants