-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Lexer #3115
Refactor Lexer #3115
Conversation
09def45
to
2cd8510
Compare
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
@leebyron It's hard to review since a lot of changes happened simultaneously. |
I'm happy to factor some things out if it helps |
@leebyron Thanks, a lot it would be helpful. |
@IvanGoncharov - Applied most of your suggested changes and a few other related clarifying changes in 0d6a096 |
The lexer needed some cleanup, I found myself doing this as part of a Unicode RFC, but factoring all that out to make the Unicode RFC PR easier to follow. * Always use hexadecimal form for code values. * Remove use of `isNaN` for checking source over-reads. * Defines `isSourceCharacter` * Add more documentation and comments, also replaces regex with lexical grammar * Simplifies error messages * Adds additional tests
Co-authored-by: Ivan Goncharov <ivan.goncharov.ua@gmail.com>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
Depends on #3115 Implements RFC at graphql/graphql-spec#849. * Replaces `isSourceCharacter` with `isUnicodeScalarValue` * Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source. * Updates `printCodePointAt` to correctly print supplementary code points. * Adds variable-width Unicode escape sequences * Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs. * Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing. Implements: > When producing a {StringValue}, implementations should use escape sequences to > represent non-printable control characters (U+0000 to U+001F and U+007F to > U+009F). Other escape sequences are not necessary, however an implementation may > use escape sequences to represent any other range of code points. Closes #2449 Co-authored-by: Andreas Marek <andimarek@fastmail.fm>
The lexer needed some cleanup, I found myself doing this as part of a Unicode RFC, but factoring all that out to make the Unicode RFC PR easier to follow.
isNaN
for checking source over-reads.isSourceCharacter