Is there an easy way to ignore whitespace? #24

geovanisouza92 · 2012-01-19T15:01:09Z

Hi!

I'm writing a grammar with Treetop for a new programming language (a prototype for now), but I cannot find a way to ignore whitespace and comments reliably.

Is there such a feature in the tool, or will one be implemented soon?

geovanisouza92 · 2012-01-19T16:18:05Z

I tried modifying the source code, and I imagine something like this in treetop/runtime/compiled_parser.rb:

  def has_terminal?(terminal, regex, index)
    # FIXME: Enable to ignore tabs and newlines (separated)
    if @ignore_whitespace
      index += 1 if input[index] == " "
    end
    if regex
      rx = @regexps[terminal] ||= Regexp.new(terminal)
      input.index(rx, index) == index
    else
      input[index, terminal.size] == terminal
    end
  end

And add a '-w/--ignore-whitespace' option to the 'tt' binary...

Any ideas or comments?
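
A standalone sketch of the idea above, written outside Treetop so it can be run on its own. It skips a whole run of spaces, tabs, and newlines instead of a single space. The method name `terminal_at` and its `ignore_whitespace:` keyword are made up for illustration and are not part of Treetop. It returns the index where the terminal starts, because reassigning the local `index` inside a method is not visible to the caller.

```ruby
# Hypothetical helper, not Treetop API: returns the index at which
# `terminal` matches (after optionally skipping whitespace), or nil.
def terminal_at(input, terminal, index, ignore_whitespace: false)
  if ignore_whitespace
    # Advance past any run of spaces, tabs, and newlines.
    index += 1 while index < input.size && input[index].match?(/[ \t\r\n]/)
  end
  input[index, terminal.size] == terminal ? index : nil
end

terminal_at("   \n\tfoo", "foo", 0, ignore_whitespace: true)  # => 5
terminal_at("   foo", "foo", 0)                                # => nil
```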

presidentbeef · 2012-01-19T18:34:57Z

You could just gsub(/\s/, "") your source before parsing it?
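
One caveat with this approach, shown in plain Ruby: removing every whitespace character also removes the ones that separate tokens or sit inside string literals. The sample source line is invented for illustration.

```ruby
source = %q(let total = "a b" + x)

# Strip every whitespace character, as suggested above.
stripped = source.gsub(/\s/, "")
puts stripped  # prints: lettotal="ab"+x
```

Here `let total` has fused into `lettotal` and the string literal has changed, so this only suits languages where whitespace never carries meaning.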

cjheath · 2012-01-19T22:14:57Z

On 20/01/2012, at 2:01 AM, Geovani de Souza wrote:

I'm writing a grammar with Treetop for a new programming language (a prototype for now), but I cannot find a way to ignore whitespace and comments reliably.

Is there such a feature in the tool, or will one be implemented soon?

PEG parsers do not (usually) have such a feature, because they do not
separate lexing from parsing. You need to implement whitespace skipping
along with your lexical rules.

For an example of how to do this, you could view my parser for CQL at
https://github.com/cjheath/activefacts/tree/master/lib/activefacts/cql
Note that CQLParser.treetop includes multiple other grammars from the
associated files, including LexicalRules.treetop, in which I define S for
mandatory whitespace/comments and s for optional whitespace. You'll
see these rules used widely to skip whitespace and comments.
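
To make the S / s idea concrete, here is a rough plain-Ruby sketch using the standard library's StringScanner. It is not Treetop grammar syntax and not the CQL rules themselves; the helper names and the "#" comment syntax are assumptions made for the example. It only shows the pattern of calling a whitespace-skipping rule between tokens.

```ruby
require "strscan"

# Hypothetical comment syntax for this sketch: "#" to end of line.
WS_OR_COMMENT = /(?:\s+|#[^\n]*)+/

# Optional whitespace/comments (like rule `s`): always succeeds.
def skip_optional(scanner)
  scanner.skip(WS_OR_COMMENT)
  true
end

# Mandatory whitespace/comments (like rule `S`): fails if nothing was skipped.
def skip_mandatory(scanner)
  !scanner.skip(WS_OR_COMMENT).nil?
end

# Parses "<name> = <digits>" with whitespace allowed around each token.
def parse_assignment(text)
  sc = StringScanner.new(text)
  skip_optional(sc)
  name = sc.scan(/[a-z]+/) or return nil
  skip_optional(sc)
  sc.scan(/=/) or return nil
  skip_optional(sc)
  value = sc.scan(/\d+/) or return nil
  skip_optional(sc)
  sc.eos? ? [name, value.to_i] : nil
end

p parse_assignment("  x =  # the answer\n 42 ")  # => ["x", 42]
p parse_assignment("x=42")                        # => ["x", 42]
p skip_mandatory(StringScanner.new("x"))          # => false
```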

When parsing keywords, be careful not to omit a trailing look-ahead that
rules out a following alphanumeric character. The lookahead prevents the
parser from recognising the first characters of "foobar" as the keyword "foo".
See the bottom of this file for an example:
https://github.com/cjheath/activefacts/blob/master/lib/activefacts/cql/Language/English.treetop
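
The same trap can be reproduced in plain Ruby with a regular expression, as a small illustration (the constant and method names are invented for this sketch). In a Treetop grammar the equivalent guard would be a negative lookahead predicate after the keyword terminal.

```ruby
require "strscan"

# Without lookahead: matches the prefix of a longer identifier.
NAIVE_FOO = /foo/
# With negative lookahead: "foo" must not be followed by a word character.
KEYWORD_FOO = /foo(?![A-Za-z0-9_])/

# True if `regex` matches at the very start of `text`.
def keyword?(regex, text)
  !StringScanner.new(text).scan(regex).nil?
end

keyword?(NAIVE_FOO, "foobar")     # => true  (wrong: "foobar" is an identifier)
keyword?(KEYWORD_FOO, "foobar")   # => false
keyword?(KEYWORD_FOO, "foo bar")  # => true
keyword?(KEYWORD_FOO, "foo")      # => true
```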

Please send any further requests to the mailing list at treetop-dev@googlegroups.com
Best of luck,

Clifford Heath.

cjheath · 2012-01-19T23:18:53Z

On Thu, Jan 19, 2012 at 2:14 PM, Clifford Heath
clifford.heath@gmail.com wrote:

associated files, including LexicalRules.treetop, in which I define S for
mandatory whitespace/comments and s for optional whitespace. You'll
see these rules used widely to skip whitespace and comments.

GMTA, at least for rule s. :)

I found a huge productivity gain in doing this work by preprocessing
input. Instead of making the grammar handle vagaries of whitespace and
comments, have the preprocessor do it.

I found I used 's' a lot less, and ' ' a lot more. My grammars are
cleaner and have to deal with fewer edge cases. It also leads to a
cleaner separation of concerns, I feel.
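
A minimal sketch of what such a preprocessor might look like, so that the grammar can rely on a single ' ' between tokens. The `preprocess` name and the "#" comment syntax are assumptions for the example, and string literals are deliberately not handled.

```ruby
# Hypothetical conventions for this sketch: "#" starts a comment that runs
# to end of line, and string literals are not handled.
def preprocess(source)
  source
    .gsub(/#[^\n]*/, "")   # drop comments
    .gsub(/\s+/, " ")      # collapse each whitespace run to one space
    .strip                 # trim leading/trailing space
end

puts preprocess("let   x =\t1  # counter\n\nlet y = 2\n")
# prints: let x = 1 let y = 2
```

Unlike stripping all whitespace, this keeps one separator between tokens. It does discard newlines, so a language with significant line breaks would need a gentler rule.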

cjheath closed this as completed Jul 13, 2013
