Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern-matching compiler #1593

Merged
merged 15 commits into from
May 20, 2018

Conversation

Kodiologist
Copy link
Member

@Kodiologist Kodiologist commented Apr 24, 2018

(Currently, this PR depends on most of the small PRs I've posted recently. Once those PRs are merged, I can rebase out the corresponding commits.) (Done.)

I've written a small module hy.model_patterns that uses funcparserlib to provide parser combinators for Hy model objects. I expect this library to be useful to Hy programmers, but its chief purpose is for the compiler. I've rewritten the compiler to parse special forms with these model patterns rather than by manipulating model trees by hand (closes #1573). While I'm at it, I've also done a lot of refactoring of the compiler, although unfortunately, it's so huge that there's still more refactoring work to be done. The compiler is about 400 lines shorter and is now finally below 2,000 lines. I haven't made Hy models immutable (#1542), but I've made some progress in that direction by removing a lot of mutating code.

There's a catch, which is that error messages for special-form parse failures aren't very informative. They show where the parser stopped, but they don't say which tokens would be legal in the given position. I think I could patch funcparserlib to fix this, but its author (@vlasovskikh) seems to no longer be maintaining it, so we'd need to make our own fork.

Other thoughts for the future:

  • hy.model_patterns could use some documentation and tests and perhaps a helper function or two for using model patterns in macros.
  • We now depend on funcparserlib as well as rply, which is kind of silly since both of these are parsing libraries. We could rewrite Hy's core lexer and parser to use funcparserlib, which would make them more concise, but possibly lead to a noticeable slowdown for large Hy programs.
  • Parsing special forms more strictly exposes some obscure bugs in autoboxing; e.g., a docstring might not be autoboxed soon enough to be recognized as a docstring for Python 3.7. I think the thing to do here is to clean up the autoboxing and source-position-jockeying code so that a single recursive run is performed at compiler entry and after each macro-expansion. Making this change turns out to be harder than it sounds, so I'm leaving it for another PR.

@gilch
Copy link
Member

gilch commented Apr 24, 2018

Veto. It's not that I'm opposed to this pattern matching, it's just that this is a big change that will take a while to review, and as a deep change, it should have two reviewers. (Then we'll have at least three people who still understand how the compiler works.) It does seem like an improvement, and this pattern matching stuff could really help with macros too. I wonder how this compares to the Scheme-style macros.

@Kodiologist
Copy link
Member Author

Two reviews would definitely be nice, but the chances of getting them seem slim.

@Kodiologist Kodiologist mentioned this pull request Apr 25, 2018
@vodik
Copy link
Contributor

vodik commented Apr 27, 2018

Oh man, this looks awesome. I'll make some time to review this tomorrow.

@vodik vodik self-requested a review April 27, 2018 04:23
@Kodiologist Kodiologist force-pushed the compiler-pattern-matching branch 4 times, most recently from 117346d to 15c1436 Compare April 30, 2018 16:53
@Kodiologist
Copy link
Member Author

@gilch Sure enough, here we are two weeks later with no reviews. If you have any objections to the changes, then let's by all means discuss them, but a veto with no actual opposition to the changes defeats the purpose of the two-week policy, which is to allow uncontroversial PRs to be merged even when reviewers aren't available.

@gilch
Copy link
Member

gilch commented May 8, 2018

Not objecting to the concept is different than approving this implementation of it. I'll consider rescinding my veto after I have reviewed it myself, if no-one else is planning to. (@vodik @kirbyfan64 @tuturto you three have been the most active lately, any takers?)

If you feel the team is too much of a hindrance to be bothered with, we can't stop you from maintaining your own fork. But otherwise understand that we're all volunteers here. We're maintaining Hy without pay in our spare time. Hy is one of many priorities competing for our time and energy.

@Kodiologist
Copy link
Member Author

Not objecting to the concept is different than approving this implementation of it.

I'm asking for objections to the implementation of it, not just the concept.

The problem is not that the team is hindering me (or anyone else). The problem is that the team is often not there, because we're all volunteers who are maintaining Hy without pay in our spare time, and Hy is one of many priorities competing for our time and energy. That's why I pushed for the two-week policy, which largely solves the problem of non-availability of reviewers. To veto because nobody has the time to review a PR recreates the problem.

I really don't want to fork, because that would fragment the already small core team.

@tuturto
Copy link
Contributor

tuturto commented May 8, 2018

I'll have a look tonight

Copy link
Contributor

@tuturto tuturto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks super-cool

if type(atom) not in _model_compilers:
return
# Compilation methods may mutate the atom, so copy it first.
atom = copy.copy(atom)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily something that needs to be done now, but what would it require to make atoms immutable? I sort of thought that would be the case already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's #1542. Basically we'll just need to remove the remaining mutating code from the compiler.

some(lambda x: isinstance(x, group_type)) >>
(lambda x: group_type(whole(parsers).parse(x)).replace(x, recursive=False)))

def sb(*parsers):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sb I couldn't parse before reading the definition down here. We don't want to use too long name, because typing it would get annoying, but is there better than sb that isn't too long?

Copy link
Member Author

@Kodiologist Kodiologist May 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

square or squareb? brackets is perhaps obvious, but people sometimes use that word to mean bracketing characters in general, including parentheses. Maybe sbrackets or sqbrackets or sqbrack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of those, I like brackets best. But this is nothing we need to solve if good name doesn't come easy. I just thought that since I really like the pattern matching rules we have now, we could have obvious names for them. Now new developer has to check the definition to understand what's going on (not that the code is long).

Copy link
Member Author

@Kodiologist Kodiologist May 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I'll do a git filter-branch to change it to brackets.

@Kodiologist
Copy link
Member Author

Thanks!

hy/compiler.py Outdated
if entries[0] == "quote":
@special(["quote", "quasiquote"], [FORM])
def compile_quote(self, expr, root, arg):
if root == "quote":
# Never allow unquoting
level = float("inf")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably just be Inf here to be consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Copy link
Contributor

@vodik vodik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I don't see any reason to block this. Most of my comments don't really have anything to do with the code, I think this is very reasonable.

Question is should we maybe make another small release before merging this, or just going for it?

@@ -30,7 +30,9 @@ Other Breaking Changes
* `hy-repr` uses registered functions instead of methods
* `HyKeyword` no longer inherits from the string type and has been
made into its own object type.
* `HySymbol` no longer inherits from `HyString`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

* `(except)` is no longer allowed. Use `(except [])` instead.
* `(import [foo])` is no longer allowed. Use `(import foo)` instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to do this, maybe it makes sense to go ahead and remove the second level nesting of arrays: (import [foo bar baz]) now means from foo import bar, baz, instead of the existing (import [foo [bar baz]])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad idea, but it can be done in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vodik The import syntax has bothered me for a while. See #851. require too for that matter.

(defmacro a-set [] #{1 2})
(assert (= (a-set) #{1 2}))
(defmacro a-none [])
(assert (= (a-none) None)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an observation, but maybe it makes sense to add another test for NotImplemented. Its another special value that, while it works fine in Hy, we never test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, testing a macro that returns NotImplemented? It doesn't work (TypeError: Don't know how to wrap a <class 'NotImplementedType'> object to a HyObject).

hy/compiler.py Outdated
_decoratables += (ast.AsyncFunctionDef,)
# _bad_roots are illegal at the top level.
_bad_roots = tuple(ast_str(x) for x in (
"unquote", "unquote-splice", "unpack-mapping", "except"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add return, yield, yield-from and await here?

Top level, these forms are currently accepted by the hy compiler, but generate invalid AST and thus causes Python's compile to crash with "SyntaxError: '...' outside function", which is completely detached from the source code.

Maybe it makes for a better user experience to catch it early.

Copy link
Member Author

@Kodiologist Kodiologist May 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is perhaps misleading. The detection for _bad_roots doesn't distinguish between forms inside a function and forms at the top level; it just triggers for any attempt to use one of the given names as a special operator. except is illegal at the top level in the sense that it isn't a true special operator; it's handled inside @special("try", …) instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the comment more descriptive.

expression.pop(0)
cond = self.compile(expression.pop(0))
body = self.compile(expression.pop(0))
@special("if*", [FORM, FORM, maybe(FORM)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool. Nice work.

In the process, I've banned the syntax `(import [foo])` in favor of `(import foo)`.
I haven't bothered to refine the patterns for these since I intend to completely overhaul the comprehension forms in the near future.
@Kodiologist
Copy link
Member Author

Question is should we maybe make another small release before merging this, or just going for it?

I want to go for it. It'll be at least a few weeks till the next release, and I'll be running Hy master in the meantime, so at least I'll find any big new bugs if nobody else does.

@gilch
Copy link
Member

gilch commented May 20, 2018

I was stepping through this with the debugger today, but looks like @vodik beat me to it. That's two approvals, since we already had one from @tuturto. You're free to merge. If I find anything it will just have to wait for another PR.

@Kodiologist
Copy link
Member Author

Okay. If you find something bad, you can just post an issue and I'll probably be able to make PR for a fix promptly.

@Kodiologist Kodiologist merged commit 3705635 into hylang:master May 20, 2018
@vodik
Copy link
Contributor

vodik commented May 20, 2018

I'll open an issue with the other import syntax change suggestion then (and maybe try to implement it to see if I really understand how this works now)

@vlasovskikh
Copy link
Contributor

@Kodiologist What are the changes to funcparserlib that you're proposing? I haven't been developing funcparserlib for a few years, but I still maintain it and I'd like to see improvements in the library by the community in the form of pull requests.

@Kodiologist
Copy link
Member Author

@vlasovskikh Oh, cool, good to talk to you. The change I'm proposing is to store the token that failed to match every time an alternation (|) switches to another option. Then, if the parse fails, the parser can report all the token types that would have been legal at the final position, and we can provide error messages like "expected symbol or number".

@vlasovskikh
Copy link
Contributor

@Kodiologist I like the general idea of being able to list expected terminals or non-terminals. However it's a bit unclear to me how you propose to fix it. We can continue this discussion at https://github.com/vlasovskikh/funcparserlib. Could you file an issue with a simple example grammar, an example file to parse and the desired output on parse errors? Or maybe even a pull request with an implementation of your idea for finding expected tokens? :)

@Kodiologist
Copy link
Member Author

See vlasovskikh/funcparserlib#52.

@Kodiologist Kodiologist deleted the compiler-pattern-matching branch January 20, 2019 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pattern-matching compiler
5 participants