Racc-based parser for the Spark API filter syntax.


SparkQL query language parser

This gem contains the syntax parser that turns Spark API filter queries into manageable expressions. For an overview of the language syntax, refer to the following files:

  • lib/sparkql/parser.y # BNF Grammar
  • lib/sparkql/token.rb # Token matching rules


Add the gem to your Gemfile:

gem 'sparkql', '~> 0.0.1'

Then run 'bundle install'.


Ruby 1.9 or greater is required.

See test/unit/parser_test.rb for generic parsing examples. In most cases an extended parser is needed to do anything of significance, such as the PostgreSQL and DB2 search implementations in the API.

Here is a basic example:

require 'sparkql'
expressions = Sparkql::Parser.new.parse("Hello Eq 'World'")

The return value will be an array with one expression element containing the query information:

{
  :field => "Hello",
  :type => :character,
  :value => "'World'",
  :operator => 'Eq'
  # ...
}
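
As a rough illustration of what an extended parser might do with that structure, the sketch below walks a hand-written array shaped like the return value above and renders each expression as a SQL-like condition. The `to_condition` helper, its operator mapping, and the output format are hypothetical and not part of the gem; only the `:field`, `:type`, `:value`, and `:operator` keys are taken from the example above.

```ruby
# Hand-written stand-in for the parser's return value, limited to the
# keys shown above; a real result carries additional keys.
expressions = [
  { :field => "Hello", :type => :character, :value => "'World'", :operator => "Eq" }
]

# Render one expression hash as a SQL-like condition string.
# The operator mapping is a hypothetical default for illustration only.
def to_condition(expression, operator_sql = { "Eq" => "=", "Ne" => "<>" })
  sql_operator = operator_sql.fetch(expression[:operator]) do
    raise ArgumentError, "unsupported operator: #{expression[:operator]}"
  end
  # :value already includes its own quotes for character types.
  %("#{expression[:field]}" #{sql_operator} #{expression[:value]})
end

conditions = expressions.map { |expression| to_condition(expression) }
puts conditions.join(" AND ")
```

A real backend would presumably pass values as bind parameters instead of interpolating them into the statement; the string form is used here only to keep the sketch short.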


The parser is built with Racc, a yacc-like LR parser generator that is part of the Ruby runtime. The grammar is located at lib/sparkql/parser.y and is compiled as part of the test process; refer to the Rakefile for details. When modifying the grammar, please check in BOTH the parser.y and parser.rb files.

Grammar issues can be debugged by hand with the "racc" command. For example, a dump of the parser states (and conflicts) can be generated with:

racc -o lib/sparkql/parser.rb lib/sparkql/parser.y -v  # see lib/sparkql/parser.output
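
The same command can be wrapped in a small Ruby helper, for example to regenerate the parser only when the grammar is newer than the generated file. This is a sketch and not the project's Rakefile: `racc_command` and `regenerate_parser` are hypothetical names, and the code assumes the `racc` executable is on the PATH.

```ruby
# Build the racc invocation shown above as an argument list.
# The -v flag additionally writes the parser state dump.
def racc_command(grammar, output)
  ["racc", "-o", output, grammar, "-v"]
end

# Regenerate the parser when it is missing or older than the grammar.
# Returns nil when the generated file is already up to date; otherwise
# returns whatever Kernel#system reports for the racc run.
def regenerate_parser(grammar = "lib/sparkql/parser.y",
                      output = "lib/sparkql/parser.rb")
  stale = !File.exist?(output) || File.mtime(output) < File.mtime(grammar)
  return nil unless stale
  system(*racc_command(grammar, output))
end
```

Calling `regenerate_parser` from the repository root would then rebuild lib/sparkql/parser.rb only after parser.y has changed.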

The rails/journey project was an inspiration for this gem. Look it up on GitHub for reference.