Lack of lexical token delimiter specification #49

Open
liancheng opened this issue Mar 19, 2018 · 0 comments

It seems that the current JMESPath specification doesn't specify any lexical token delimiters, which leads to ambiguity and confusion.

For example, consider the following two grammar rules in the spec:

comparator-expression = expression comparator expression
raw-string            = "'" *raw-string-char "'"

Sub-rules are concatenated in the same way in both rules, but the first one clearly allows whitespace between expression and comparator, since the examples page contains the following official example:

people[?age > `20`].[name, age]

while whitespace is not allowed between "'" and *raw-string-char, since a raw-string is expected to be a single lexical token.
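
To make the asymmetry concrete (a hypothetical illustration, not taken from the spec): the two comparator expressions below are equivalent, because whitespace around > is insignificant, whereas the two raw strings are distinct values, because the spaces become part of the token itself:

people[?age>`20`].[name, age]
people[?age > `20`].[name, age]

'foo'
' foo '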

After studying RFC 4234, I found the following statements in section 3.1:

   LINEAR WHITE SPACE: Concatenation is at the core of the ABNF parsing
   model.  A string of contiguous characters (values) is parsed
   according to the rules defined in ABNF.  For Internet specifications,
   there is some history of permitting linear white space (space and
   horizontal tab) to be freely and implicitly interspersed around major
   constructs, such as delimiting special characters or atomic strings.

   NOTE:

      This specification for ABNF does not provide for implicit
      specification of linear white space.

   Any grammar that wishes to permit linear white space around
   delimiters or string segments must specify it explicitly.  It is
   often useful to provide for such white space in "core" rules that are
   then used variously among higher-level rules.  The "core" rules might
   be formed into a lexical analyzer or simply be part of the main
   ruleset.

Shall we specify lexical token delimiters explicitly and distinguish lexical rules from grammar rules to eliminate the ambiguity?
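
A non-normative sketch of what that could look like, using a hypothetical ws "core" rule in the spirit of RFC 4234 section 3.1 (neither the ws rule nor these annotations are part of the current spec):

ws                    = *(%x20 / %x09)   ; "core" rule: optional space / horizontal tab

; grammar rule: whitespace explicitly permitted around the comparator
comparator-expression = expression ws comparator ws expression

; lexical rule: no ws inside, so a raw-string remains a single token
raw-string            = "'" *raw-string-char "'"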
