Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recursive grammar #6

Closed
danieldroit opened this issue Feb 9, 2018 · 2 comments
Closed

recursive grammar #6

danieldroit opened this issue Feb 9, 2018 · 2 comments
Labels

Comments

@danieldroit
Copy link

danieldroit commented Feb 9, 2018

Having the following grammar:

// a partial javascript grammar in simple JSON format
{
        
// prefix ID for regular expressions used in the grammar
"RegExpID"                          : "RE::",
    
// Style model
"Style"                             : {
    
     "comment"                      : "comment"
    ,"atom"                         : "atom"
    ,"keyword"                      : "keyword"
    ,"this"                         : "keyword"
    ,"builtin"                      : "builtin"
    ,"operator"                     : "operator"
    ,"identifier"                   : "variable"
    ,"property"                     : "attribute"
    ,"number"                       : "number"
    ,"string"                       : "string"
    ,"date"                         : "string-2"
    ,"regex"                        : "string-2"
    
},

// Lexical model
"Lex"                               : {
    
     "comment"                      : {"type":"comment","tokens":[
                                    // line comment
                                    // start, end delims  (null matches end-of-line)
                                    [  "//",  null ],
                                    // block comments
                                    // start,  end    delims
                                    [  "/*",   "*/" ]
                                    ]}
    ,"identifier"                   : "RE::/[_A-Za-z$][_A-Za-z0-9$]*/"
    ,"this"                         : "RE::/this\\b/"
    ,"property"                     : "RE::/[_A-Za-z$][_A-Za-z0-9$]*/"
    ,"date"                         : "RE::/\\d{4}-\\d{2}-\\d{2}$]*/"
    ,"number"                       : [
                                    // floats
                                    "RE::/\\d*\\.\\d+(e[\\+\\-]?\\d+)?/",
                                    "RE::/\\d+\\.\\d*/",
                                    "RE::/\\.\\d+/",
                                    // integers
                                    // hex
                                    "RE::/0x[0-9a-fA-F]+L?/",
                                    // binary
                                    "RE::/0b[01]+L?/",
                                    // octal
                                    "RE::/0o[0-7]+L?/",
                                    // decimal
                                    "RE::/[1-9]\\d*(e[\\+\\-]?\\d+)?L?/",
                                    // just zero
                                    "RE::/0(?![\\dx])/"
                                    ]
    ,"string"                       : {"type":"escaped-block","escape":"\\","tokens":
                                    // start, end of string (can be the matched regex group ie. 1 )
                                    [ "RE::/(['\"])/",   1 ]
                                    }
    ,"regex"                        : {"type":"escaped-block","escape":"\\","tokens":
                                    // javascript literal regular expressions can be parsed similar to strings
                                    [ "/",    "RE::#/[gimy]{0,4}#" ]
                                    }
    ,"operator"                     : {"tokens":[
                                    "+", "-", "++", "--", "%", ">>", "<<", ">>>",
                                    "*", "/", "^", "|", "&", "!", "~",
                                    ">", "<", "<=", ">=", "!=", "!==",
                                    "=", "==", "===", "+=", "-=", "%=",
                                    ">>=", ">>>=", "<<=", "*=", "/=", "|=", "&="
                                    ]}
    ,"delimiter"                    : {"tokens":[
                                    "(", ")", "[", "]", "{", "}", ",", "=", ";", "?", ":",
                                    "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "++", "--",
                                    ">>=", "<<="
                                    ]}
    ,"atom"                         : {"autocomplete":true,"tokens":[
                                    "null", "undefined", 
                                    "NaN", "Infinity"
                                    ]}
    ,"boolean"                      : {"autocomplete":true,"tokens":["true", "false"]}
    ,"keyword"                      : {"autocomplete":true,"tokens":[ 
                                    "if", "while", "with", "else", "do", "try", "finally",
                                    "return", "break", "continue", "new", "delete", "throw",
                                    "var", "const", "let", "function", "catch", "void",
                                    "for", "switch", "case", "default", "class", "import", "yield",
                                    "in", "typeof", "instanceof"
                                    ]}
    ,"builtin"                      : {"autocomplete":true,"tokens":[ 
                                    "Object", "Function", "Array", "String", 
                                    "Date", "Number", "RegExp", "Math", "Exception",
                                    "setTimeout", "setInterval", "parseInt", "parseFloat", 
                                    "isFinite", "isNan", "alert", "prompt", "console", 
                                    "window", "global", "this"
                                    ]}

},

// Syntax model (optional)
"Syntax"                            : {
    
    "dot_property"                  : {"sequence":[".", "property"]},
    "prim"                          : "number | string | boolean",
    "kv"                            : {"sequence": ["string", "':'", "json"]},
    "json"                          : "object | array | prim",
    "object"                        : "'{' kv (',' kv)* '}'",
    "array"                         : "'[' json (',' json)* ']'",
    "a" : "'\"and\"'",
    "relation"                      : "'[' op ',' operand ',' operand ']'",
    "op"                            : "'\"=\"' | '\">\"' | '\">=\"' | '\"<\"' | '\"<=\"'",
    "operand"                       : "identifier | date | string | number | boolean",    
    "criteria"                      : "relation | and | or",
    "and"                           : "'['  '\"and\"' (',' criteria)+ ']'",
    "or"                             : "'['   '\"or\"' (',' criteria)+ ']'",
    "query"                         : "'{' '\"query\"'    ':' string ',' '\"where\"' ':' criteria '}'",
    "top"                           : "query"
},

// what to parse and in what order
"Parser"                            : [ ["top"] ]

}

I'd expect the input :

{"query": "sdasdas",
 "where": 
  ["and",
     ["=", 1, 1],
     ["=", 1, 1],
     ["or", 
        ["=", 1, 1],
        ["=", 1, 1]
     ]
  ]   
}

would have no syntax errors, but the there is an error reported.
How do I define recursive grammar like this one?

@foo123
Copy link
Owner

foo123 commented Feb 10, 2018

The problem is that the criteria token includes "relation | and | or" which all begin with a '[' token and the parfser automaticaly selects the first one ("relation") so next expects an "op, operand, operand". No matter what. This happens because the parsing is greedy, it does not backtrack (it is impossible for this type of grammars). One thing you can do is factor the grammar like the following part:

    "relation"                      : "op ',' operand ',' operand",
    "op"                            : "'\"=\"' | '\">\"' | '\">=\"' | '\"<\"' | '\"<=\"'",
    "operand"                       : "identifier | date | string | number | boolean",    
    "criteria"                      : "'[' (relation | and | or) ']'",
    "and"                           : "'\"and\"' (',' criteria)+",
    "or"                             : "'\"or\"' (',' criteria)+",

This parses correctly what you expect (you can try it here http://foo123.github.io/examples/codemirror-grammar/)

@foo123 foo123 closed this as completed Feb 10, 2018
@danieldroit
Copy link
Author

Many thanks for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants