Added possibility to define Treetop parsers in pure Ruby code.
Extended some core Ruby classes. Modified 'lib/treetop/ruby_extensions.rb'
file to load all .rb files from 'lib/treetop/ruby_extensions' directory.
dejw authored and cjheath committed Aug 22, 2009
1 parent cb3d629 commit 1346477
Showing 13 changed files with 426 additions and 4 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -4,4 +4,5 @@
*.ipr
doc/site/*.*
benchmark/*.dat
benchmark/*.log
benchmark/*.log
*~
126 changes: 126 additions & 0 deletions doc/using_in_ruby.markdown
@@ -19,3 +19,129 @@ If a grammar by the name of `Foo` is defined, the compiled Ruby source will defi
else
puts 'failure'
end

##Defining Grammars Directly in Ruby
A parser can be defined directly in a Ruby source file.

###Grammars
Defining a parser in Ruby code is kept as close as possible to the original grammar syntax. To create a grammar, write:

    include Treetop::Syntax
    grammar :Foo do
    end
    parser = FooParser.new

Treetop automatically compiles the grammar and loads it into memory, so an instance of `FooParser` can be created right away.

###Syntactic Recognition
To create a rule inside a grammar, write:

    include Treetop::Syntax
    grammar :Foo do
      rule :bar do
        ...
      end
    end

Any of Treetop's syntactic elements can be used inside a rule. Each element is written with a standard Ruby class: Strings act as terminals, Symbols stand for nonterminals, Arrays are sequences, and Regexps are character classes.

_Note: it is better not to use Numbers as terminal symbols; use Strings instead._

Sequences are written as Arrays:

    rule :sequence do
      [ "foo", "bar", "baz" ]
    end

Ordered choices use the `/` operator:

    rule :choice do
      "foo" / "bar"
    end

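In Treetop grammar files, sequences have higher precedence than choices, so a choice must be parenthesized to be used as an element of a sequence. In the Ruby syntax the array commas already delimit the elements, and a choice written inside an array is parenthesized in the generated grammar. For example:

    rule :nested do
      ["foo", "bar", "baz" / "bop" ] # -> "foo" "bar" ( "baz" / "bop" )
    end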
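The parenthesizing behaviour described above can be illustrated with a small, self-contained sketch. It does not use Treetop: `Term`, `t`, `sequence` and `render` are hypothetical names invented for this example, and the rendering rules are a simplified assumption modelled on the description above, not the library's actual code.

```ruby
# Simplified stand-in for a rule element. Term, t, sequence and render are
# hypothetical names for illustration; they are not part of Treetop.
class Term
  def initialize(text)
    @text = text
    @alternatives = []
  end

  # Ordered choice: remember the alternative and return self for chaining.
  def /(other)
    @alternatives << other
    self
  end

  # Render as grammar text. A choice rendered inline (inside a sequence)
  # is wrapped in parentheses; a plain terminal is rendered as-is.
  def render(inline = false)
    own = @text.inspect
    return own if @alternatives.empty?

    choice = ([own] + @alternatives.map { |alt| alt.render(true) }).join(" / ")
    inline ? "( #{choice} )" : choice
  end
end

# Shorthand constructor for a terminal.
def t(text)
  Term.new(text)
end

# A sequence renders each element inline, separated by spaces.
def sequence(*elements)
  elements.map { |element| element.render(true) }.join(" ")
end

puts sequence(t("foo"), t("bar"), t("baz") / t("bop"))
# prints: "foo" "bar" ( "baz" / "bop" )
```

The sketch uses a dedicated class so that it does not need to extend `Object`; the commit itself takes the other route and adds the methods to core classes.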

The special operators `!`, `&`, `?`, `+` and `*` are available as methods on the elements of a rule. Each method returns the element itself, so calls can be chained:

    Op. | Method
    ----|-------
    !   | bang
    &   | amper
    ?   | mark
    +   | plus
    *   | kleene

For example, the grammar:

    grammar :Foo do
      rule :bar do
        [ "baz" / "bop" ].kleene
      end
    end

accepts any concatenation of the words "baz" and "bop", including the empty string.
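As a rough illustration of how chained operator methods can translate into grammar notation, the following sketch records a flag per call and renders the prefix and suffix operators afterwards. `Element` and `render` are hypothetical names, and the whole class is an assumption made for this example; it runs without Treetop.

```ruby
# Hypothetical stand-in for a rule element; not Treetop's own class.
class Element
  PREFIX = { amper: "&", bang: "!" }.freeze
  SUFFIX = { kleene: "*", plus: "+", mark: "?" }.freeze

  def initialize(source)
    @source = source
    @flags = []
  end

  # Each operator method records its flag and returns self so that
  # calls can be chained.
  (PREFIX.keys + SUFFIX.keys).each do |name|
    define_method(name) do
      @flags << name
      self
    end
  end

  # Prefix operators go in front of the element, suffix operators behind it.
  def render
    text = @source
    PREFIX.each { |flag, op| text = op + text if @flags.include?(flag) }
    SUFFIX.each { |flag, op| text += op if @flags.include?(flag) }
    text
  end
end

puts Element.new('"foo"').kleene.render  # prints: "foo"*
puts Element.new('"foo"').bang.render    # prints: !"foo"
```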

###Semantic Interpretation

A syntax node class can be declared with the `node` method, which is called in the same way as the operator methods above:

    grammar :Parens do
      rule :parenthesized_letter do
        ([ '(', :parenthesized_letter, ')'] / /[a-z]/ ).node(:ParenNode)
      end
    end

It is also possible to add inline blocks of code. These are strings that are inserted verbatim into the generated grammar:

    grammar :Parens do
      rule :parenthesized_letter do
        (['(', :parenthesized_letter, ')'] / /[a-z]/ ).block(%{
          def depth
            if nonterminal?
              parenthesized_letter.depth + 1
            else
              0
            end
          end
        })
      end
    end
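To make clear what the `depth` method in this example computes, here is a hand-written recursive version that works on plain strings. It replaces the generated parser with ordinary Ruby string checks, so it only illustrates the intended result; the top-level `depth` function is a hypothetical helper written for this sketch.

```ruby
# Hand-written illustration, not Treetop-generated code.
# Returns the nesting depth of a parenthesized letter such as "((a))",
# or nil when the input does not match the rule.
def depth(input)
  # Base case: a single lowercase letter has depth 0.
  return 0 if input.match?(/\A[a-z]\z/)

  # Recursive case: '(' parenthesized_letter ')'
  if input.length >= 3 && input.start_with?("(") && input.end_with?(")")
    inner = depth(input[1..-2])
    return inner + 1 if inner
  end

  nil
end

p depth("a")      # prints: 0
p depth("((a))")  # prints: 2
p depth("(a")     # prints: nil
```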

Labels in rule definitions can be written as follows (example taken from the documentation):

    rule :labels do
      [/[a-z]/.label(:first_letter), [', ', /[a-z]/.kleene.label(:letter)].label(:rest_letters)].block(%{
        ...
      })
    end

###Composition

A grammar is included with an `include` call inside the grammar definition:

    grammar :One do
      rule :a do
        "foo"
      end

      rule :b do
        "baz"
      end
    end

    grammar :Two do
      include :One
      rule :a do
        :super / "bar" / :c
      end
      rule :c do
        :b
      end
    end

Grammar `Two` accepts the words `"foo"`, `"bar"` and `"baz"`.
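The way `:super` and the inherited rule `:b` interact can be compared to ordinary Ruby module inclusion. The sketch below is only an analogy written with plain modules: each "rule" is a method returning the list of words it accepts, and `TwoWords` is a hypothetical class used to call them. It is not what Treetop generates.

```ruby
# Analogy in plain Ruby: rules are modelled as methods that return
# the list of words they accept. Not Treetop-generated code.
module One
  def a
    ["foo"]
  end

  def b
    ["baz"]
  end
end

module Two
  include One

  # Mirrors `:super / "bar" / :c`: the inherited rule first, then "bar", then c.
  def a
    super + ["bar"] + c
  end

  # Mirrors `:b`, which is inherited from One.
  def c
    b
  end
end

# Hypothetical host class so that the methods can be called.
class TwoWords
  include Two
end

p TwoWords.new.a  # prints: ["foo", "bar", "baz"]
```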
105 changes: 105 additions & 0 deletions examples/ruby_syntax/syntax_test.rb
@@ -0,0 +1,105 @@
dir = File.dirname(__FILE__)
require File.expand_path("#{dir}/test_helper")

class SyntaxTest < Test::Unit::TestCase
  include Treetop::Syntax
  include SyntaxTestHelper

  def test_simple
    assert_grammar {
      grammar :OnlyGrammar do
      end
    }
  end

  def test_rules
    assert_grammar {
      grammar :Simple do
        rule :foo do
          ["foo", :bar]
        end

        rule :bar do
          "bar" / "baz"
        end
      end
    }
    parse('foobar')
    parse('foobaz')
  end

  def test_nested
    assert_grammar {
      grammar :Nested do
        rule :nested do
          ["foo", "bar", "baz" / "bop"]
        end
      end
    }
    parse('foobarbaz')
    parse('foobarbop')
  end

  def test_operators
    assert_grammar {
      grammar :Kleene do
        rule :Kleene do
          "foo".kleene
        end
      end
    }
    parse("")
    parse("foo")
    parse("foofoo")

    assert_grammar {
      grammar :Plus do
        rule :Plus do
          "foo".plus
        end
      end
    }
    parse("foo")
    parse("foofoo")

    assert_grammar {
      grammar :Optional do
        rule :Optional do
          "foo".mark
        end
      end
    }
    parse("")
    parse("foo")
  end

  def test_inclusion
    assert_grammar {
      grammar :One do
        rule :a do
          "foo"
        end

        rule :b do
          "baz"
        end
      end
    }

    assert_grammar {
      grammar :Two do
        include :One
        rule :a do
          :super / "bar" / :c
        end

        rule :c do
          :b
        end
      end
    }
    parse("foo")
    parse("bar")
    parse("baz")
  end
end
28 changes: 28 additions & 0 deletions examples/ruby_syntax/test_helper.rb
@@ -0,0 +1,28 @@
require 'test/unit'
require 'rubygems'
require 'treetop'

dir = File.dirname(__FILE__)
require File.expand_path("#{dir}/../../lib/treetop/ruby_extensions")
require File.expand_path("#{dir}/../../lib/treetop/syntax")

module SyntaxTestHelper
  def assert_grammar
    g = yield
    assert_not_nil g
    flunk "Badly generated parser" unless g
    @parser = eval("#{g}.new")
  end

  def parse(input)
    result = @parser.parse(input)
    unless result
      puts @parser.terminal_failures.join("\n")
    end
    assert_not_nil result
    if result
      assert_equal input, result.text_value
    end
    result
  end
end
1 change: 1 addition & 0 deletions lib/treetop.rb
@@ -11,6 +11,7 @@ module Treetop
require File.join(TREETOP_ROOT, "ruby_extensions")
require File.join(TREETOP_ROOT, "runtime")
require File.join(TREETOP_ROOT, "compiler")
require File.join(TREETOP_ROOT, "syntax")

require 'polyglot'
Polyglot.register(Treetop::VALID_GRAMMAR_EXT, Treetop)
4 changes: 3 additions & 1 deletion lib/treetop/ruby_extensions.rb
@@ -1,2 +1,4 @@
dir = File.dirname(__FILE__)
require "#{dir}/ruby_extensions/string"
Dir.glob("#{dir}/ruby_extensions/*.rb").each do |file|
  require file
end
22 changes: 22 additions & 0 deletions lib/treetop/ruby_extensions/array.rb
@@ -0,0 +1,22 @@
class Array
  def join_with(method, pattern = "")
    return join(pattern) unless method
    return "" if self.length == 0

    args = []
    if method.respond_to? :to_hash
      args = method[:args] || []
      method = method[:name]
    end

    output = self[0].send(method, *args)
    for i in (1...self.length)
      output += pattern + self[i].send(method, *args)
    end
    output
  end

  def to_tt
    self.join_with({:name => :seq_to_tt, :args => [true]}, " ")
  end
end
5 changes: 5 additions & 0 deletions lib/treetop/ruby_extensions/nil.rb
@@ -0,0 +1,5 @@
class NilClass
  def to_tt
    ""
  end
end
57 changes: 57 additions & 0 deletions lib/treetop/ruby_extensions/object.rb
@@ -0,0 +1,57 @@
class Object
  def sequence
    @sequence ||= []
  end

  def /(other)
    sequence.push(other)
    self
  end

  def seq_to_tt(inline = false)
    separator = inline ? " / " : "\n/\n"
    tt = if sequence.length == 0
      self.to_tt
    else
      output = self.to_tt + separator + sequence.join_with({:name => :seq_to_tt, :args => [true]}, separator)
      output = "( #{output} )" if inline
      output
    end

    # Operators
    tt = "&" + tt if @amper
    tt = "!" + tt if @bang
    tt += "*" if @kleene
    tt += "+" if @plus
    tt += "?" if @mark

    tt += " <#{@node.to_s}>" if @node
    tt += " {\n#{@block.gsub("\t", " ").justify.indent_paragraph(2)}\n}" if @block
    tt = @label.to_s + ':' + tt if @label
    tt
  end

  def node(name)
    @node = name
    self
  end

  def block(content)
    @block = content
    self
  end

  def label(name)
    @label = name
    self
  end

  [:mark, :kleene, :plus, :amper, :bang].each do |sym|
    Object.class_eval(%{
      def #{sym.to_s}
        @#{sym.to_s} = true
        self
      end
    })
  end
end
5 changes: 5 additions & 0 deletions lib/treetop/ruby_extensions/regexp.rb
@@ -0,0 +1,5 @@
class Regexp
  def to_tt
    self.inspect
  end
end
