
Using nanopass with non-Lispy source language. #16

Open
abhi18av opened this issue Feb 27, 2017 · 2 comments

@abhi18av

Hi @akeep

I've been intensely engaged in understanding and exploring the nanopass framework, and I'm curious whether it can be used for non-Lispy languages as well. I do sense that it's, of course, possible.

I've come up with a very basic flow that I'd like to benchmark against a single source file for now.

  1. Pass the abc.x source file via standard input or via an explicit in-Scheme command.
  2. Parse (into what form?) the contents of the file, read in as a Unicode string.
  3. Use nanopass-framework-based transformations to move gradually towards the target language.

So, as you can see, it's the very initial part that's hazy.

Could you give a general sketch of

  1. How to use the lexer phase
  2. How to convert the chunked (tokenized) form into the form expected by nanopass

I know it's not an issue with the framework itself, but something like a basic tutorial for a non-Lispy source language would be really helpful, in addition to the current docs and references.

@akeep
Member

akeep commented Mar 5, 2017

Hey @abhi18av,

Yes, you can definitely use the nanopass framework for languages that are not S-expression based, but you do need to write your own lexer and parser (or scanner-less parser) to turn the source files into an AST. One thing you may notice about S-expressions is that they look a great deal like ASTs.

There are a few toolkits out there for writing parsers in Scheme, though I'm not sure what to recommend, since I've not really used most of them. We do have an LALR parser toolkit that we've used in the past, with a hand-written lexer, but I need to double-check that we can share it. I've also written my own hand-rolled recursive descent parser, partly to deal with some grammar productions that required more look-ahead than our LALR framework allows for.

My approach is something like the following:

First, we need to have the grammar we want to parse in some form that we can fairly easily read, so something like EBNF.
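
For concreteness, a hypothetical toy arithmetic grammar in EBNF (an illustration, not taken from any particular project) might look like:

```
expr   = term , { ( "+" | "-" ) , term } ;
term   = factor , { ( "*" | "/" ) , factor } ;
factor = number | "(" , expr , ")" ;
number = digit , { digit } ;
```

Layering `expr`, `term`, and `factor` this way is one conventional method of encoding operator precedence directly in the grammar.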

Second, I usually try to make as exact a translation from the original EBNF into a nanopass grammar as I can. This just keeps things simpler, since the parser can then simply produce nanopass forms directly as the parser discovers the structure of the program from the source file.

Third, I usually write the lexer (a.k.a. scanner or tokenizer) to tokenize the input. As I said, I've always done this by hand, writing a state machine that matches the input and can be called to provide the next token.
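
A minimal sketch of such a hand-written lexer (illustrative Python for a hypothetical toy arithmetic language, not code from nanopass or any existing toolkit) could look like:

```python
# Hand-rolled tokenizer for a toy arithmetic language (hypothetical example).
# It walks the input one character at a time, grouping digit runs into
# NUMBER tokens and emitting single-character OP tokens.

def tokenize(source):
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                      # skip whitespace between tokens
        elif ch.isdigit():
            start = i                   # accumulate a run of digits
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("NUMBER", int(source[start:i])))
        elif ch in "+-*/()":
            tokens.append(("OP", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

For example, `tokenize("1 + 23 * 4")` produces `[("NUMBER", 1), ("OP", "+"), ("NUMBER", 23), ("OP", "*"), ("NUMBER", 4)]`. A real lexer would add identifiers, keywords, and multi-character operators as further states in the same loop.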

Finally, I write the parser, either with a parser toolkit or by hand as a recursive descent parser. There are some tricks to parsing expressions correctly: you need to either structure the grammar so that it encodes the relative precedence of operators or use an expression-parsing technique (such as precedence climbing) to get this right.
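
As a hedged sketch of the second option, here is a precedence-climbing expression parser over a token list (illustrative Python for a hypothetical toy language; the token shapes and `PRECEDENCE` table are assumptions, not part of nanopass):

```python
# Recursive-descent expression parser using precedence climbing, so that
# "1 + 2 * 3" parses as ("+", 1, ("*", 2, 3)) without encoding operator
# precedence as separate grammar levels. Hypothetical example.

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self, min_prec=1):
        left = self.parse_atom()
        # keep folding in operators whose precedence is high enough
        while True:
            tok = self.peek()
            if not (tok and tok[0] == "OP" and tok[1] in PRECEDENCE
                    and PRECEDENCE[tok[1]] >= min_prec):
                return left
            op = self.advance()[1]
            # "+ 1" makes operators of equal precedence left-associative
            right = self.parse_expr(PRECEDENCE[op] + 1)
            left = (op, left, right)

    def parse_atom(self):
        tok = self.advance()
        if tok and tok[0] == "NUMBER":
            return tok[1]
        if tok == ("OP", "("):
            inner = self.parse_expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected closing parenthesis")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")
```

Instead of building tuples, `parse_atom` and `parse_expr` would construct nanopass language forms directly, which is what makes the exact EBNF-to-nanopass translation mentioned above convenient.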

One of the nice things about hand-writing the parser is that you can capture whatever source information you'd like to gather in order to provide better error messages, both during the parsing phase and later in the compiler, if the source information is preserved.
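
One way to sketch that source tracking (again hypothetical Python, extending the toy tokenizer above rather than showing real nanopass code) is to stamp each token with the line and column where it starts:

```python
# Tokenizer that records source locations (line, column) on each token,
# so later phases can report "line 2, column 1: ..." style errors.
# Hypothetical example for a toy arithmetic language.

def tokenize_with_locations(source):
    tokens = []
    line, col = 1, 1
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line, col = line + 1, 1
            i += 1
        elif ch.isspace():
            col += 1
            i += 1
        elif ch.isdigit():
            start, start_col = i, col
            while i < len(source) and source[i].isdigit():
                i += 1
                col += 1
            tokens.append(("NUMBER", int(source[start:i]), line, start_col))
        elif ch in "+-*/()":
            tokens.append(("OP", ch, line, col))
            i += 1
            col += 1
        else:
            raise SyntaxError(f"line {line}, column {col}: unexpected {ch!r}")
    return tokens
```

The parser can then copy these locations into the AST nodes it builds, which is the "preserved source information" that makes later compiler errors point back at the original file.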

I'll try to write up an example of this and include it in the documentation and testing examples over the next couple of weeks.

@abhi18av
Author

abhi18av commented Mar 9, 2017

Thank you @akeep for the wonderfully detailed response!

The kind of source language I have in mind is somewhat like Ruby, and I'm trying to implement the compiler frontend based on the nanopass framework. Since you've mentioned the possibility of a well-tested lexer and parser framework in Scheme, I'm actually quite excited!

The thing is, I have an inkling that the parser might involve a slightly more complex mechanism, since I have to implement the type-inference algorithm within the compiler itself.

Please give me a couple more days to think over what you have outlined and how I could integrate the information; I'm running a bit busy for a while now.

Absolutely looking forward to the write-up! 👍
