Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewritten interpreter and AST optimizations #4

Merged
merged 49 commits into from
Sep 16, 2015

Conversation

iconara
Copy link
Contributor

@iconara iconara commented Jan 22, 2015

This adds a new JIT-friendly interpreter and AST optimizations that makes evaluating expressions much faster, 2-4x for the benchmarks I'm running.

The main thing that has changes is that instead of TreeInterpreter#dispatch walking the AST using a big case statement, each AST node contains can interpret itself. This removes almost all of the branching from the interpretation, and it makes it easier for JITing runtimes like JRuby to optimize the interpreter code. In the current implementation quite a lot of time is spent just running Symbol#=== because of all of the cases that needs to be checked to decide which code to run for the next node in the AST, this has been completely side-stepped since each node calls the next node directly.

I've tried to move as many decisions as possible to the parser phase to avoid branching in the interpreter. For example, each function is its own AST node, instead of having a case statement that needs to be run at interpretation. The same goes for projections, comparisons, etc.

I've also started on AST optimizations, such as flattening long chains of field and index lookups, conditions with literals, and a few other things. This part is a work in progress, but it can speed up some expressions by 20-30%.

I'd like to work on this a bit more, I don't think it's ready to merge yet. The tests all pass, but I think there's a need for some more tests around the optimizer, for example.

I'm also considering adding options for the optimizer to assume that values will always be hashes, never structs, and always have strings, never symbol keys (I would of course not change the current defaults, this would be an option only), because this would enable another level of optimizations that are not currently possible.

Replace the hash AST with one that consists of node objects. Each AST node can evaluate itself, which means that the job of TreeInterpreter is replaced by calling #visit on the top node in the AST.
#compare_values is exactly #==, and it's not much more code to write x.is_a?(Integer) than is_int(x).
if/else tends to be much easier to read.

The semantics of Condition changes slightly with this change, the test is no longer strictly for true, but for truthiness, but it looks like we're guaranteed to only get true or false so it doesn't seem to matter.
Get rid of ExprNode completely.
Instead of creating a new array for each iteration just accumulate.
They're not used so there is no need for them to be there.
The only reason Field has to have an accessor for #key is that it is used to hold the function name while parsing. This introduces a temporary object that will hold the name.
It always has two children, so it makes more sense for it to name them than taking in an array
It always has two children, so it makes more sense for it to name them than taking in an array
It always has just a single child, so it makes more sense for it to name it than taking in an array
It always has a value, so it makes more sense for it to name it than taking in an array
It always has two children, so it makes more sense for it to name them than taking in an array
It always has two children, so it makes more sense for it to name them than taking in an array
It always has two children, so it makes more sense for it to name them than taking in an array

This temporarily moves #hash_like? into Leaf, but the Node/Leaf thing will be refactored soon.
It always has two children, so it makes more sense for it to name them than taking in an array
Node was introduced for things that had an arbitrary number of children, but most things actually don't, it only looked that way.
This avoids branching at runtime.
This avoids using #send at runtime.
This inlines #hash_like?, which might be unfortunate, but it means that we can avoid creating arrays and check each branch twice, and a few other things.
It was only used temporarily for checking compatibility with the old TreeInterpreter
They are exactly the same thing
Also remove the comment, it doesn't explain anything the code doesn't.
Better to let Expression encapsulate the visiting.
Each literal string and array is an object allocation, every time the expression is run.
E.g. foo.bar.baz becomes one Node instead of several Subexpression:s with Field children.
When the projection is Current there's no need to run it
It shouldn't skip false values, just nil values
Also combine runs of Field
Right now the children of a Chain are guaranteed to be optimized, but if it were used from somewhere else that would not hold.
This is maybe a bit of an aggressive optimization, but Field and Index are probably common in longer runs, so this avoids a lot of method calls, potentially.
Every clause really has a subclause, so if/else makes more sense. The change also avoids using #key? when #[] + !#nil? will avoid another hash lookup.
@trevorrowe
Copy link
Contributor

I need to apologize for forgetting about this pull-request. I do intend to take a closer look at this soon. Jmespath has recently added support at the language level for python style ranges and so I intend to add support for these as well.

@iconara
Copy link
Contributor Author

iconara commented Apr 14, 2015

Thanks. I put the work aside for a while, partly to get your comments and partly because the project where we were going to use JMESPath got put on the back burner. I'd love to see this merged in one form or another, though, so when you have time your comments and feedback would be much appreciated.

@bjorne
Copy link

bjorne commented Jun 9, 2015

While @iconara's specific use-case for this change got halted, we are now using this branch in production with success. I just wanted to make a bump here and report that we have not yet found any unexpected behavior. Would love to see this merged.

@iconara
Copy link
Contributor Author

iconara commented Sep 4, 2015

@trevorrowe is there any chance of this getting merged and released? Is there anything we can do to help?

@trevorrowe trevorrowe merged commit de5e54d into jmespath:master Sep 16, 2015
@trevorrowe
Copy link
Contributor

Thanks for being patient. I did get this merged, tested and pushed to master. I also took the opportunity to update the bundled suite of compliance tests to add newer functionality. I'll try to get this released shortly.

@iconara
Copy link
Contributor Author

iconara commented Sep 17, 2015

Thanks!

We forked and released our own gem, burtpath, but we'll probably switch back to jmespath.rb. There are some more contributions coming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants