VIP: Custom Parser #563

fubuloubu · 2017-12-08T17:44:09Z

Preamble

VIP: 563
Title: Custom Parser
Author: @fubuloubu @DavidKnott @jacqueswww
Type: Standard
Status: Draft
Created: 2017-12-08

Simple Summary

Implement a custom parser for Viper that doesn't tie us directly to Python-only syntax, enabling a more focused grammar for our language.

Abstract

We've been discussing this for a while. A custom parser would allow us to define our syntax more precisely, rather than leveraging Python syntax and being limited to what Python's syntax can provide. We will continue to use Python as a template due to its clarity and ease of reading, but we need to make decisions that diverge from Python, and a custom parser will enable that.

Motivation

There are specific things that have been discussed where this is necessary:

Clarifying the external contract type
Changing the mapping type for greater clarity
Custom Types
etc...

Specification

We may be able to leverage a Python-compatible lex/yacc library like ply. We should also leverage some of the work the K framework folks are doing in order to infer a grammar that is consistent and free of formalized conflicts.
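To make the lexer/parser split concrete, here is a minimal hand-written sketch using only the standard library. It is a stand-in for illustration, not ply and not a K-derived grammar. It handles a single declaration of the form `name: map(K -> V)`, the mapping form floated later in this thread, and every name in it (`lex`, `parse_decl`, the tuple layout of the result) is hypothetical.

```python
import re

# Lexer: names, the "->" arrow, and the punctuation used by the rule below.
TOKEN_RE = re.compile(r"\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<ARROW>->)|(?P<PUNCT>[():]))")


def lex(text):
    """Split a declaration into (kind, text) tokens."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near column {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def expect(tokens, i, text):
    """Consume the token `text` at index i or fail."""
    if i >= len(tokens) or tokens[i][1] != text:
        raise SyntaxError(f"expected {text!r}")
    return i + 1


def parse_type(tokens, i):
    """type := 'map' '(' type '->' type ')' | NAME"""
    if i >= len(tokens) or tokens[i][0] != "NAME":
        raise SyntaxError("expected a type name")
    name = tokens[i][1]
    if name == "map" and i + 1 < len(tokens) and tokens[i + 1] == ("PUNCT", "("):
        key, i = parse_type(tokens, i + 2)
        i = expect(tokens, i, "->")
        value, i = parse_type(tokens, i)
        i = expect(tokens, i, ")")
        return ("map", key, value), i
    return name, i + 1


def parse_decl(text):
    """decl := NAME ':' type"""
    tokens = lex(text)
    if not tokens or tokens[0][0] != "NAME":
        raise SyntaxError("expected a variable name")
    i = expect(tokens, 1, ":")
    type_node, i = parse_type(tokens, i)
    if i != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tokens[0][1], type_node


print(parse_decl("my_map: map(basetype1 -> basetype2)"))
```

A generated parser would replace the hand-written functions with rules derived from a grammar file, but the shape of the result (tokens in, a small tree out) would be similar.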

Backwards Compatibility

Try to maintain backwards compatibility initially; however, some of the VIPs that this one enables will be breaking changes to the syntax.

Copyright

Copyright and related rights waived via CC0

dani-jozsef · 2017-12-12T14:32:48Z

Please don't. Let's not turn this into another Solidity. :(

fubuloubu · 2017-12-13T00:51:33Z

We're already running into limitations of the underlying Python syntax, and future growth will require violating that syntax in subtle and not-so-subtle ways. Viper is definitely a different language from Python. We will try to stick to the syntax as closely as possible, but there are situations that we need a custom parser to handle.

DavidKnott · 2017-12-13T07:22:03Z

I don't think having a custom parser will negatively affect clarity. Given how different writing smart contract code is from writing Python code (particularly in terms of security), it would be helpful to customize certain parts of the parser. Here are a couple of examples where a custom parser could come in handy:

Using a more descriptive keyword than class for defining external contracts
Changing the syntax of logging to make it clearer, as MyLog: __log__({arg1: num}) is limited by python syntax

tmke8 · 2017-12-15T15:01:58Z

The danger of a custom parser is bugs. (There were 8 critical bugs in the Serpent compiler when Augur ordered an audit...)

External contracts could be nicely specified if this suggestion was implemented. You could say that you can only inherit from Contract and ExternalContract and there can only be one Contract per file. The inheritance would basically do nothing.

More descriptive keywords could be realized by the preprocessor that is mentioned in this proposal.

For the mapping syntax, something like this should be possible: Mapping[KeyType, ValueType].
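As a rough illustration of this point, the subscript form already parses with Python's stock `ast` module, so the key and value types can be read straight off the annotation. This is only a sketch: `mapping_types` is a hypothetical helper rather than anything in Viper, and `address` and `num` are placeholder type names.

```python
import ast


def mapping_types(declaration):
    """Return (variable, key type, value type) for `name: Mapping[K, V]`."""
    node = ast.parse(declaration).body[0]
    if not (isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name)):
        raise ValueError("expected an annotated name")
    annotation = node.annotation
    if not (
        isinstance(annotation, ast.Subscript)
        and isinstance(annotation.value, ast.Name)
        and annotation.value.id == "Mapping"
    ):
        raise ValueError("expected a Mapping[...] annotation")
    # Older Python versions wrap the subscript in ast.Index; newer ones do not.
    index = getattr(annotation.slice, "value", annotation.slice)
    if not (isinstance(index, ast.Tuple) and len(index.elts) == 2):
        raise ValueError("Mapping takes exactly two type arguments")
    key, value = index.elts
    if not (isinstance(key, ast.Name) and isinstance(value, ast.Name)):
        raise ValueError("this sketch only handles simple type names")
    return node.target.id, key.id, value.id


print(mapping_types("balances: Mapping[address, num]"))
```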

mslipper · 2017-12-18T05:33:16Z

One potential solution is to copy the existing Python grammar, modify it to match the Viper language, then generate a parser from that grammar using a tool like ANTLR. This confers a number of benefits:

Generated parsers have been around forever and are used regularly in security-critical applications.
Development is simplified considerably. The Viper team can focus on the language's syntax and features without spending time maintaining a bespoke parser.
Parser generators favor a grammar-first development workflow. The grammar won't be just another document to maintain - it'll be an integral part of the language. Having an up-to-date grammar simplifies third-party tool development (i.e., static analysis tools, syntax highlighters, etc.) considerably.

If there's interest, I'd be happy to build a small prototype.

fubuloubu · 2017-12-18T14:34:38Z

@mslipper 👍 I think this is the approach we were getting at. Thanks for suggesting a tool!

Is there an easy way to integrate this with our Python flow (e.g. ANTLR wrapper module) so that the build process could be managed 100% in Python? ANTLR is a Java program, but I see some evidence that this is possible here

fubuloubu · 2017-12-18T15:34:28Z

Meeting Minutes:

Can we use custom keywords in the Python AST module?
Is there a better way to segment/streamline work with an external module to handle lexing/parsing?

mslipper · 2017-12-18T20:30:28Z

@fubuloubu ANTLR is written in Java, but it'll generate a parser in any language it supports. The python-target you linked is the exact solution you're looking for 😄. For build flow, I'd suggest adding a target to your Makefile that runs ANTLR prior to making the egg and running tests. That way there is no Java dependency for Viper's users, only developers.

fubuloubu · 2017-12-18T21:13:09Z

We were discussing this along with a few other things in the call we had today. I think we're still a little reluctant to move fully to a custom solution. We were trying to figure out if there was a way to modify or extend the AST module to get what we're looking for. To do that, I think we need a summary of the changes we are looking to make.

From the original post above, these are (with examples):

Clarifying the external contract type VIP: Contract data type #541

my_contract: contract(
    foo(),
    bar() -> num,
)

Changing the mapping syntax VIP: Change Mapping Syntax #564

my_map: map(basetype1 -> basetype2)

Allowing custom types/type aliasing VIP: Named Structs #300

wei := num("wei")
fee: wei
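A quick check with the standard library's `ast` module shows where the three proposed forms stand relative to Python's grammar. This is a minimal sketch: `parses_as_python` is a hypothetical helper, and the logging line from earlier in the thread is included only as a baseline that Python does accept.

```python
import ast


def parses_as_python(source):
    """Return True if Python's own parser accepts the source text."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


examples = {
    # Current logging syntax quoted earlier in the thread: a plain annotation.
    "logging (current)": "MyLog: __log__({arg1: num})",
    # The three proposed forms from the list above.
    "contract type": "my_contract: contract(\n    foo(),\n    bar() -> num,\n)",
    "mapping": "my_map: map(basetype1 -> basetype2)",
    "type alias": 'wei := num("wei")',
}

results = {name: parses_as_python(source) for name, source in examples.items()}
for name, ok in results.items():
    print(f"{name}: {'parses' if ok else 'rejected by the Python parser'}")
```

Only the baseline parses. The `->` inside a call and the statement-level `:=` are both rejected, which is why these proposals either need a custom parser or a rewrite step before `ast.parse`.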

We also chatted a bit today about #584, and I believe my proposed solution may be able to sidestep all of this by changing how types are handled a bit. I think most of our reasons for wanting a custom parser have more to do with being able to specify and easily work with different kinds of globals. Check out the bottom of that issue and feel free to add to the discussion.

maurelian · 2018-07-25T18:10:16Z

I've started working on defining the grammar in ANTLR. I'm following a similar approach used to create a js-solidity parser.

This will enable us to generate a parser to use in our Surya tool.

So far I've just been extracting the grammar from documentation and examples, but if the vyper project itself might make use of it in the future, it would be better to ensure the names and structure of nodes is similarly defined. Would someone from the core team be willing to spend 30 minutes walking me through the parser code, or even collaborate on defining the grammar?

jacqueswww · 2018-07-25T18:44:02Z

@maurelian sure, glad to help - we can arrange a call time on gitter.

This will be a good start for defining a grammar: https://github.com/python/cpython/blob/master/Parser/Python.asdl

fubuloubu · 2018-07-25T19:52:22Z

One approach I've been wanting to take is a conversion step from the Python AST to a Vyper-specific AST. This can be defined in a friendly way for ANTLR or the K framework.

from vyper import ast
# Parses with the Python ast module first, then converts to the Vyper AST
print('Vyper AST:', ast.parse(code))
# Prints out the grammar, perhaps in an ANTLR/K-friendly format
print('Vyper AST grammar rules:', ast._grammer)
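The conversion step could look roughly like the following. This is a minimal sketch under stated assumptions: the node classes `VyperGlobal` and `VyperFunction` and the function `to_vyper_ast` are hypothetical names, not the `vyper.ast` interface imagined above, and only two top-level constructs are handled.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class VyperGlobal:
    name: str
    type_name: str
    lineno: int


@dataclass
class VyperFunction:
    name: str
    args: List[str]
    lineno: int


def to_vyper_ast(source):
    """Parse with Python's ast, then convert to a restricted node set."""
    nodes = []
    for node in ast.parse(source).body:
        if (
            isinstance(node, ast.AnnAssign)
            and isinstance(node.target, ast.Name)
            and isinstance(node.annotation, ast.Name)
        ):
            nodes.append(VyperGlobal(node.target.id, node.annotation.id, node.lineno))
        elif isinstance(node, ast.FunctionDef):
            arg_names = [arg.arg for arg in node.args.args]
            nodes.append(VyperFunction(node.name, arg_names, node.lineno))
        else:
            # Anything Python allows but this sketch does not is rejected here.
            kind = type(node).__name__
            raise SyntaxError(f"line {node.lineno}: {kind} is not allowed at module level")
    return nodes


code = "fee: num\n\ndef pay(amount):\n    pass\n"
for converted in to_vyper_ast(code):
    print(converted)
```

Because the second tree only contains node types the converter knows about, it doubles as an explicit list of what the language accepts, which is the part that could later be exported in an ANTLR- or K-friendly form.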

jakerockland · 2018-11-28T00:50:30Z

Arrived here from what I've been following in #300. Curious on what the status is for the pathway to implementing this. From previous calls/conversations was the consensus on moving forward with a solution built with Sly @fubuloubu? That's what it seemed from the convo with @charles-cooper on #300, which makes sense to me but was also curious if/why the parser generator route had been ruled out.

fubuloubu · 2018-11-28T04:28:13Z

@jakerockland we can discuss it for sure at the next meeting. If people want to take on this challenge, it may be the time to do it.

A few things to note:

Writing a compiler front end is difficult. This will take at least one person-month to get right.
Any good front end should be formally verified to avoid the risk of really insidious bugs. K framework can help.
You will probably have to refactor a substantial portion of the compiler. That might not be the worst idea from a readability/maintainability perspective.

In regards to refactoring our current codebase, that is something @davesque was exploring. The current codebase mixes too many things from parsing into code generation. It would be nicer to see all compiler stages as separate modules with distinct interfaces between the stages, more akin to how you build compilers with functional languages like OCaml (which has an excellent set of libraries for that). I have a really, really, really old example of how that might be done here: https://github.com/fubuloubu/blocktract/blob/master/blocktract/ast.py

The idea is that each stage would be formalized in separate modules e.g. tokens.py, grammar.py, ast.py, a types/ directory, optimization/, etc. When adding new features or types, it would be pretty obvious how to do that, and there could be an auto-registration function that makes it trivial to register these new features with the overall compiler pipeline. That makes it easy to traverse through the stages and see how the code forms and morphs between each stage, making it really easy to debug when things go wrong and also easy to work on each step in series as you progress. Finally, the compiler interface should make it easy to hook into each stage and configure how the stages work together, which is important for handling optimization correctly. Example of that here
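One way to picture the auto-registration and per-stage hooks is the toy pipeline below. It is a sketch under loud assumptions: the `stage` decorator, the `STAGES` registry, `compile_source`, and the three placeholder stages are all hypothetical and do not reflect the Vyper codebase or the linked example.

```python
from typing import Any, Callable, List, Optional, Tuple

# Registry of (stage name, stage function) in the order they were declared.
STAGES: List[Tuple[str, Callable[[Any], Any]]] = []


def stage(name):
    """Decorator that registers a function as the next pipeline stage."""
    def register(fn):
        STAGES.append((name, fn))
        return fn
    return register


@stage("tokens")
def to_tokens(source):
    return source.split()


@stage("ast")
def to_tree(tokens):
    return {"type": "Module", "body": tokens}


@stage("codegen")
def to_code(tree):
    return " ".join(token.upper() for token in tree["body"])


def compile_source(source, hook: Optional[Callable[[str, Any], None]] = None):
    """Run every registered stage in order, reporting each result to the hook."""
    value = source
    for name, fn in STAGES:
        value = fn(value)
        if hook is not None:
            hook(name, value)
    return value


seen = []
output = compile_source("push 1", hook=lambda name, value: seen.append(name))
print(seen, output)
```

The hook receives the intermediate value after every stage, which is the debugging benefit described above: a failure can be localized to the first stage whose output looks wrong.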

jacqueswww · 2018-11-28T08:48:32Z

I don't think a custom parser is a good idea at this point; we can plan this for the 0.2 release. At this stage there are plenty of other "not as flashy" issues to work on. Using the tokeniser for class isn't the most elegant solution, but it can be done without too much trouble. There is a reason we want to keep vyper parsable by the Python ast: it will always stay firmly rooted in Python.
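For readers unfamiliar with the tokeniser trick, the following is a minimal sketch using the standard library's `tokenize` module. It assumes a hypothetical `contract` keyword that should be treated like `class`; the helper name `rewrite_keyword` is also an assumption and is not taken from the Vyper codebase.

```python
import ast
import io
import tokenize


def rewrite_keyword(source, keyword="contract"):
    """Replace `keyword` at the start of a line with `class` before parsing."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        starts_line = tok.start[1] == 0
        if tok.type == tokenize.NAME and tok.string == keyword and starts_line:
            tokens.append((tokenize.NAME, "class"))
        else:
            tokens.append((tok.type, tok.string))
    # With (type, string) pairs, untokenize re-spaces the code but keeps it valid.
    return tokenize.untokenize(tokens)


source = "contract Token:\n    def balance(addr): pass\n"
tree = ast.parse(rewrite_keyword(source))
print(type(tree.body[0]).__name__, tree.body[0].name)
```

The tokenizer is purely lexical, so it accepts the unknown keyword; only the rewritten text ever reaches `ast.parse`. The cost is that column offsets in the resulting tree refer to the rewritten text rather than the original source.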

To me the codebase isn't ready for a custom parser (yet), and needs refactoring, whereafter one probably does not need the custom parser :P

Happy to discuss further on the next call.

jakerockland · 2018-11-28T15:16:54Z

@fubuloubu @jacqueswww Thank you both for all the input here! Would be great to loop back on this on the next call but definitely doesn't have to be a deep dive as there are a lot of hotter button issues that need to be resolved. Was mostly just curious what the state of this issue was 😄 👍

davesque · 2018-11-28T17:26:36Z

@jakerockland I also had some thoughts about using a custom parser when I originally started looking at Vyper. However, I think I agree with @jacqueswww that there are higher priorities. Of course, I'm still learning a lot about the entire codebase so my opinion is tastier with salt. 😄

charles-cooper · 2019-03-11T15:29:15Z

A resource for researching different parser generators and tools https://wiki.python.org/moin/LanguageParsing

charles-cooper · 2019-03-11T18:39:28Z

Of the options in the above link, the following seem reasonably modern/maintained, and also use grammars defined as some variant of EBNF (rather than python code):
https://github.com/erikrose/parsimonious
https://github.com/lark-parser/lark/
https://github.com/neogeny/TatSu/
https://github.com/pyparsing/pyparsing/

lark, pyparsing and tatsu have pre-written python grammar examples:
lark example
tatsu example
pyparsing example

pipermerriam · 2019-03-11T19:37:26Z

We have been using https://github.com/erikrose/parsimonious for eth-abi for a bit and I'm in the process of using it to define the grammar for the s-expression format of webassembly. My experience thus far is quite positive, though I have little to compare it to. The library is quite small and simple and the manner in which the parsing happens has been easy to reason about.

charles-cooper · 2019-03-19T17:35:22Z

I'm kind of liking tatsu, it lets us create our VyperAST module by just annotating the grammar (https://tatsu.readthedocs.io/en/stable/mini-tutorial.html#object-models) and it has abstractions for ast traversal (https://tatsu.readthedocs.io/en/stable/mini-tutorial.html#one-rule-per-expression-type) and code generation (https://tatsu.readthedocs.io/en/stable/mini-tutorial.html#code-generation). Not sure how powerful the latter is but I can see its potential.

fubuloubu · 2019-03-19T17:45:00Z

Just noticed TatSu is a refactor of Grako, so it has a LOT more pedigree than the GitHub would lead you to believe!

fubuloubu · 2019-03-19T21:06:11Z

Also, to summarize a discussion we had, this PR will be split into a few "stages". The stages of this VIP that concern actually replacing the use of the AST module are beyond the scope of the v0.1 release, but the early stages will prepare for this to make it as seamless as possible.

jacqueswww · 2019-04-04T09:38:56Z

Closing in favour of #1363.

fubuloubu · 2019-04-04T14:02:56Z

Can we make #1363 a VIP then? Capture the important bits of this one?

jacqueswww · 2019-04-04T14:05:15Z

@fubuloubu We can, but we haven't really ever had to do a VIP for internal before? (or have we?)

fubuloubu · 2019-04-04T14:08:50Z

That's true! If it doesn't change syntax, then it's just a refactor. If there are any syntax changes, we should make sure to capture those separately as VIPs so people can stay informed.

fubuloubu mentioned this issue Dec 8, 2017

VIP: Change Mapping Syntax #564

Closed

DavidKnott added the "VIP: Discussion" and "post beta" labels Jan 15, 2018

GNSPS mentioned this issue Jun 14, 2018

Support Vyper Consensys/surya#15

Closed

charles-cooper mentioned this issue Nov 27, 2018

VIP: Named Structs #300

Closed

charles-cooper mentioned this issue Feb 28, 2019

VIP: [research] Static balance sheet analysis for Vyper contracts #1277

Open

jacqueswww mentioned this issue Mar 11, 2019

Meeting 11th March #1305

Closed

fubuloubu added this to To do in v1.0 Release Candidate via automation Mar 19, 2019

fubuloubu added this to To do in v2.0 Release Candidate via automation Mar 19, 2019

jacqueswww mentioned this issue Mar 20, 2019

The road to a custom parser #1363

Open

3 tasks

jacqueswww closed this as completed Apr 4, 2019

v1.0 Release Candidate automation moved this from To do to Done Apr 4, 2019

v2.0 Release Candidate automation moved this from To do to Done Apr 4, 2019

fubuloubu removed this from Done in v1.0 Release Candidate Apr 4, 2019

fubuloubu removed this from Done in v2.0 Release Candidate Apr 4, 2019

fubuloubu mentioned this issue Jul 15, 2020

Standalone Binary #1953

Closed

3 tasks
