Clarify that lexing is greedy
The GraphQL syntactic grammar is intended to be unambiguous. While the lexical grammar should be as well, there has historically been an unstated assumption that lexical parsing is greedy. This is obvious for numbers and words, but less obvious for empty block strings.

Either way, the additional clarity removes ambiguity from the spec.

Partial fix for #564

Specifically addresses #564 (comment)
leebyron committed Jul 23, 2019
1 parent dfd7571 commit 919f3f2
Showing 2 changed files with 25 additions and 9 deletions.
8 changes: 5 additions & 3 deletions spec/Appendix A -- Notation Conventions.md
@@ -48,10 +48,12 @@ ListOfLetterA :

The GraphQL language is defined in a syntactic grammar where terminal symbols
are tokens. Tokens are defined in a lexical grammar which matches patterns of
source characters. The result of parsing a sequence of source Unicode characters
produces a GraphQL AST.
source characters. The result of parsing a source text sequence of Unicode
characters first produces a sequence of lexical tokens according to the lexical
grammar, which then produces an abstract syntax tree (AST) according to the
syntactical grammar.

A Lexical grammar production describes non-terminal "tokens" by
A lexical grammar production describes non-terminal "tokens" by
patterns of terminal Unicode characters. No "whitespace" or other ignored
characters may appear between any terminal Unicode characters in the lexical
grammar production. A lexical grammar production is distinguished by a two colon
26 changes: 20 additions & 6 deletions spec/Section 2 -- Language.md
@@ -7,11 +7,13 @@ common unit of composition allowing for query reuse.

A GraphQL document is defined as a syntactic grammar where terminal symbols are
tokens (indivisible lexical units). These tokens are defined in a lexical
grammar which matches patterns of source characters (defined by a
double-colon `::`).
grammar which matches patterns of source characters. In this document, syntactic
grammar productions are distinguished with a colon `:` while lexical grammar
productions are distinguished with a double-colon `::`.

Note: See [Appendix A](#sec-Appendix-Notation-Conventions) for more details about the definition of lexical and syntactic grammar and other notational conventions
used in this document.
Note: See [Appendix A](#sec-Appendix-Notation-Conventions) for more information
about the lexical and syntactic grammar and other notational conventions used
throughout this document.
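
The two conventions look like this in practice (productions paraphrased from the spec for illustration):

```
BooleanValue : one of `true` `false`

ExponentIndicator :: one of `e` `E`
```

The single colon marks {BooleanValue} as a syntactic production defined over tokens, while the double colon marks {ExponentIndicator} as a lexical production defined over source characters.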


## Source Text
@@ -25,6 +25,19 @@ ASCII range so as to be as widely compatible with as many existing tools,
languages, and serialization formats as possible and avoid display issues in
text editors and source control.

**Greedy Lexical Parsing**

The source text of a GraphQL document is first converted into a sequence of
lexical tokens, {Token}, and ignored tokens, {Ignored}. The source text is
scanned from left to right, repeatedly taking the longest possible sequence of
Unicode characters as the next token.

For example, the sequence `123` is always interpreted as a single {IntValue},
and `""""""` is always interpreted as a single block {StringValue}.

This sequence of lexical tokens is then scanned from left to right to produce
an abstract syntax tree (AST) according to the {Document} syntactical grammar.
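
The greedy rule can be sketched with a toy lexer. This is not the spec's full token grammar; the pattern table below is an illustrative assumption, but because regex quantifiers are greedy, each match consumes the longest possible run of characters, matching the behavior described above:

```python
import re

# Illustrative sketch only -- a tiny subset of the GraphQL token grammar.
# Each pattern is anchored at the current position; regex quantifiers are
# greedy, so the longest possible character sequence becomes one token.
TOKEN_PATTERNS = [
    # A block string: """ ... """, where a lone " inside is allowed but
    # a closing """ terminates the token. Matches '""""""' as one token.
    ("BlockString", re.compile(r'"""(?:[^"]|"(?!""))*"""')),
    ("String", re.compile(r'"[^"\n]*"')),
    ("IntValue", re.compile(r'-?(?:0|[1-9][0-9]*)')),
    ("Name", re.compile(r'[_A-Za-z][_0-9A-Za-z]*')),
    # Commas and whitespace are ignored tokens and are dropped.
    ("Ignored", re.compile(r'[,\s]+')),
]

def lex(source: str):
    """Scan source left to right, emitting the longest match each time."""
    tokens = []
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(source, pos)
            if m:
                if kind != "Ignored":
                    tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"Unexpected character at position {pos}")
    return tokens
```

With this sketch, `lex('123')` yields a single IntValue token rather than three, and `lex('""""""')` yields a single empty BlockString token rather than two empty strings.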


### Unicode

@@ -118,8 +133,7 @@ Token ::
A GraphQL document is comprised of several kinds of indivisible lexical tokens
defined here in a lexical grammar by patterns of source Unicode characters.

Tokens are later used as terminal symbols in a GraphQL Document
syntactic grammars.
Tokens are later used as terminal symbols in GraphQL syntactic grammar rules.


### Ignored Tokens
