
Is there an easy way to ignore whitespace? #24

Closed
geovanisouza92 opened this issue Jan 19, 2012 · 4 comments

Comments

@geovanisouza92

Hi!

I'm writing a grammar with Treetop for a new programming language (a prototype, for now), but I can't find a reliable way to ignore whitespace and comments.

Is there a feature for this in the tool, or will one be implemented soon?

@geovanisouza92
Author

I tried modifying the source code, and I imagine something like this in treetop/runtime/compiled_parser.rb:

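  def has_terminal?(terminal, regex, index)
    # FIXME: also allow ignoring tabs and newlines (separately)
    if @ignore_whitespace
      # Skip a whole run of spaces, not just the first one.
      # Caveat: this only moves the local copy of index. The generated rule
      # methods advance their own index by the terminal's length, so the
      # skipped spaces would still need to be consumed there as well.
      index += 1 while input[index] == " "
    end
    if regex
      # Compile each regex terminal once and cache it by its source text
      rx = @regexps[terminal] ||= Regexp.new(terminal)
      input.index(rx, index) == index
    else
      input[index, terminal.size] == terminal
    end
  end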

And add a '-w/--ignore-whitespace' option to the 'tt' binary.

Any ideas or comments?

@presidentbeef

You could just gsub(/\s/, "") your source before parsing it?
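For illustration, here is what that suggestion does to a made-up snippet of the new language (the while/end syntax is hypothetical). Because \s matches spaces, tabs, newlines and other ASCII whitespace, every separator disappears, including the ones that keep words apart, so this only suits a grammar that never relies on whitespace between tokens.

```ruby
# Sketch of the "strip all whitespace before parsing" idea.
# The sample program below is made up for illustration.
def strip_whitespace(source)
  # \s matches spaces, tabs, newlines and other ASCII whitespace
  source.gsub(/\s/, "")
end

sample = "while x < 10\n  x = x + 1\nend"
puts strip_whitespace(sample)
# prints: whilex<10x=x+1end
```

Note how "while x" becomes "whilex": after stripping, a keyword followed by an identifier looks the same as one longer identifier.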

@cjheath
Collaborator

cjheath commented Jan 19, 2012

On 20/01/2012, at 2:01 AM, Geovani de Souza wrote:

I'm writing my grammar using treetop, for a new programming language (prototype for while), but I cannot found a way to ignore whitespace/comments assertively.

There's an feature in tool, or will be implemented soon?

PEG parsers do not (usually) have such a feature, because they do not
separate lexing from parsing. You need to implement whitespace skipping
along with your lexical rules.

For an example of how to do this, you could view my parser for CQL at
https://github.com/cjheath/activefacts/tree/master/lib/activefacts/cql
Note that CQLParser.treetop includes multiple other grammars from the
associated files, including LexicalRules.treetop, in which I define S for
mandatory whitespace/comments and s for optional whitespace. You'll
see these rules used widely to skip whitespace and comments.
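The same idea can be sketched without Treetop, as a hand-written Ruby parser in the PEG style, where whitespace and comments are rules like any other. This is not the CQL grammar: opt_space and req_space are hypothetical stand-ins for the s and S rules described above, and the tiny "let x = 1" language with "#" line comments is made up for the example.

```ruby
require "strscan"

# Sketch of scannerless, PEG-style whitespace handling in plain Ruby.
# opt_space stands in for an optional-whitespace rule (s) and req_space for a
# mandatory one (S). The "let x = 1" language and its "#" line comments are
# hypothetical, made up for this example.
class MiniParser
  WHITESPACE = /[ \t\r\n]+/
  COMMENT    = /#[^\n]*/

  def initialize(input)
    @ss = StringScanner.new(input)
  end

  # s <- (whitespace / comment)*   always succeeds; returns chars skipped
  def opt_space
    start = @ss.pos
    loop do
      break unless @ss.scan(WHITESPACE) || @ss.scan(COMMENT)
    end
    @ss.pos - start
  end

  # S <- (whitespace / comment)+   true only if something was skipped
  def req_space
    opt_space > 0
  end

  # program <- s (statement s)* !.
  # Returns [[name, value], ...] or nil if the whole input was not consumed.
  def program
    opt_space
    stmts = []
    while (stmt = statement)
      stmts << stmt
      opt_space
    end
    @ss.eos? ? stmts : nil
  end

  # statement <- "let" S ident s "=" s number
  def statement
    start = @ss.pos
    if @ss.scan(/let/) && req_space
      name = @ss.scan(/[a-z_]\w*/)
      # opt_space returns an Integer, which is always truthy in Ruby
      if name && opt_space && @ss.scan(/=/) && opt_space
        value = @ss.scan(/\d+/)
        return [name, value.to_i] if value
      end
    end
    @ss.pos = start # backtrack on failure, as a PEG rule would
    nil
  end
end

p MiniParser.new("let x = 1  # first\nlet y=2").program
# prints: [["x", 1], ["y", 2]]
p MiniParser.new("letx = 1").program
# prints: nil
```

Here req_space is what rejects "letx = 1" while "let y=2" is accepted; everywhere else opt_space is called explicitly between tokens, which is the bookkeeping a scannerless grammar has to do.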

When parsing keywords, be careful to avoid the trap of omitting a trailing
lookahead for a non-alphanumeric character. Using the lookahead prevents the parser
from recognising the first characters of "foobar" as the keyword "foo".
See the bottom of this file for an example:
https://github.com/cjheath/activefacts/blob/master/lib/activefacts/cql/Language/English.treetop
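To make the "foobar" trap concrete, here is a small Ruby sketch using regular expressions rather than a Treetop grammar: (?!...) is Ruby's negative lookahead, playing the role that a trailing ! predicate plays after a keyword in a PEG rule. The keyword "foo" comes from the text; treating [A-Za-z0-9_] as the identifier characters is an assumption.

```ruby
require "strscan"

# Without lookahead, "foo" also matches the start of "foobar".
KEYWORD_NAIVE = /foo/
# With a negative lookahead, "foo" only matches when no identifier
# character (assumed to be [A-Za-z0-9_]) follows it.
KEYWORD_SAFE  = /foo(?![A-Za-z0-9_])/

# True if pattern matches at the very start of text.
def keyword_at_start?(pattern, text)
  !StringScanner.new(text).scan(pattern).nil?
end

puts keyword_at_start?(KEYWORD_NAIVE, "foobar") # prints: true
puts keyword_at_start?(KEYWORD_SAFE, "foobar")  # prints: false
puts keyword_at_start?(KEYWORD_SAFE, "foo bar") # prints: true
```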

Please send any further requests to the mailing list at treetop-dev@googlegroups.com
Best of luck,

Clifford Heath.

@cjheath
Collaborator

cjheath commented Jan 19, 2012

On Thu, Jan 19, 2012 at 2:14 PM, Clifford Heath <clifford.heath@gmail.com> wrote:

associated files, including LexicalRules.treetop, in which I define S for
mandatory whitespace/comments and s for optional whitespace. You'll
see these rules used widely to skip whitespace and comments.

GMTA, at least for rule s. :)

I found a huge productivity gain in doing this work by preprocessing
input. Instead of making the grammar handle vagaries of whitespace and
comments, have the preprocessor do it.

I found I used 's' a lot less, and ' ' a lot more. My grammars are
cleaner and have to deal with fewer edge cases. It also leads to a
cleaner separation of concerns, I feel.
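A minimal sketch of such a preprocessor, assuming "#" line comments (a hypothetical syntax): it drops comments and collapses each run of whitespace into a single space, so the grammar can match a literal ' ' where it would otherwise need an s rule. It is deliberately naive: a "#" inside a string literal would also be treated as the start of a comment, and newlines stop being available as statement separators.

```ruby
# Hypothetical preprocessor: strip "#" line comments, then normalise whitespace.
def preprocess(source)
  text = source.gsub(/#[^\n]*/, "") # drop comments (naive: ignores string literals)
  text = text.gsub(/\s+/, " ")      # collapse each whitespace run to one space
  text.strip                        # remove leading/trailing space
end

puts preprocess("let x  =\t1   # set x\n\nlet y = 2\n")
# prints: let x = 1 let y = 2
```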

cjheath closed this as completed Jul 13, 2013