Rewritten interpreter and AST optimizations #4

iconara · 2015-01-22T19:17:10Z

This adds a new JIT-friendly interpreter and AST optimizations that make evaluating expressions much faster: 2-4x for the benchmarks I'm running.

The main change is that instead of TreeInterpreter#dispatch walking the AST using a big case statement, each AST node can interpret itself. This removes almost all of the branching from the interpretation, and it makes it easier for JITing runtimes like JRuby to optimize the interpreter code. In the current implementation quite a lot of time is spent just running Symbol#===, because of all the cases that need to be checked to decide which code to run for the next node in the AST. This has been completely side-stepped, since each node calls the next node directly.
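
To make the difference concrete, here is a minimal Ruby sketch. Field and Subexpression are names used in this PR, but the hash node format, CaseInterpreter, and the method bodies are assumptions for illustration, not the actual jmespath.rb code. Semantically, each `when` in the case version is a `Symbol#===` check that has to run before any real work happens.

```ruby
# Illustrative sketch only; not the jmespath.rb implementation.
module SelfInterpreting
  # Before (hypothetical): one interpreter walks a hash-based AST and has to
  # test the node type on every step. Each `when` is a Symbol#=== check.
  class CaseInterpreter
    def visit(node, value)
      case node[:type]
      when :field
        value.is_a?(Hash) ? value[node[:key]] : nil
      when :subexpression
        left = visit(node[:children][0], value)
        visit(node[:children][1], left)
      when :current
        value
      end
    end
  end

  # After: each node evaluates itself, so there is no type switch at runtime.
  class Field
    def initialize(key)
      @key = key
    end

    def visit(value)
      value.is_a?(Hash) ? value[@key] : nil
    end
  end

  class Subexpression
    def initialize(left, right)
      @left = left
      @right = right
    end

    # Evaluate the left side, then hand its result straight to the right side.
    def visit(value)
      @right.visit(@left.visit(value))
    end
  end
end

ast = SelfInterpreting::Subexpression.new(
  SelfInterpreting::Field.new('foo'),
  SelfInterpreting::Field.new('bar')
)
ast.visit({ 'foo' => { 'bar' => 42 } }) # => 42
```

The work per node is the same in both versions; what disappears is the type switch in front of it.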

I've tried to move as many decisions as possible to the parser phase to avoid branching in the interpreter. For example, each function is its own AST node, instead of a case statement that has to run during interpretation. The same goes for projections, comparisons, etc.
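
A sketch of moving one such decision to the parser, using comparisons. EqComparator and LtComparator follow the naming of the comparator subclasses in this PR; the COMPARATORS table, comparator_for, and the Literal/Current helpers are hypothetical. The parser would look the class up once per operator token, so the node's #visit contains no case statement.

```ruby
# Illustrative sketch; COMPARATORS and comparator_for are hypothetical names.
module ParseTimeDispatch
  class Comparator
    def initialize(left, right)
      @left = left
      @right = right
    end
  end

  class EqComparator < Comparator
    def visit(value)
      @left.visit(value) == @right.visit(value)
    end
  end

  class LtComparator < Comparator
    # Ordering comparisons only apply to numbers; anything else yields nil.
    def visit(value)
      left = @left.visit(value)
      right = @right.visit(value)
      left.is_a?(Numeric) && right.is_a?(Numeric) ? left < right : nil
    end
  end

  # Leaf nodes so the sketch runs on its own.
  class Literal
    def initialize(value)
      @value = value
    end

    def visit(_value)
      @value
    end
  end

  class Current
    def visit(value)
      value
    end
  end

  # Consulted once, when the parser sees the operator token.
  COMPARATORS = { eq: EqComparator, lt: LtComparator }.freeze

  def self.comparator_for(operator, left, right)
    COMPARATORS.fetch(operator).new(left, right)
  end
end

lt = ParseTimeDispatch.comparator_for(:lt, ParseTimeDispatch::Current.new, ParseTimeDispatch::Literal.new(10))
lt.visit(3) # => true
```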

I've also started on AST optimizations, such as flattening long chains of field and index lookups, specializing conditions that compare against literals, and a few other things. This part is a work in progress, but it can speed up some expressions by 20-30%.

I'd like to work on this a bit more, I don't think it's ready to merge yet. The tests all pass, but I think there's a need for some more tests around the optimizer, for example.

I'm also considering adding options for the optimizer to assume that values will always be hashes, never structs, and will always have string keys, never symbol keys (I would of course not change the current defaults; this would be opt-in only), because this would enable another level of optimizations that are not currently possible.

if/else tends to be much easier to read. This slightly changes the semantics of Condition: the test is no longer strictly for true but for truthiness. It looks like we're guaranteed to only get true or false, though, so it doesn't seem to matter.

trevorrowe · 2015-04-13T17:43:46Z

I need to apologize for forgetting about this pull request. I do intend to take a closer look at it soon. JMESPath has recently added language-level support for Python-style ranges, so I intend to add support for these as well.

iconara · 2015-04-14T05:44:49Z

Thanks. I put the work aside for a while, partly to get your comments and partly because the project where we were going to use JMESPath got put on the back burner. I'd love to see this merged in one form or another, though, so when you have time your comments and feedback would be much appreciated.

bjorne · 2015-06-09T15:13:47Z

While @iconara's specific use case for this change was put on hold, we are now using this branch in production with success. I just wanted to bump this and report that we have not yet found any unexpected behavior. Would love to see this merged.

iconara · 2015-09-04T12:23:46Z

@trevorrowe is there any chance of this getting merged and released? Is there anything we can do to help?

trevorrowe · 2015-09-16T21:15:41Z

Thanks for being patient. I did get this merged, tested and pushed to master. I also took the opportunity to update the bundled suite of compliance tests to add newer functionality. I'll try to get this released shortly.

iconara · 2015-09-17T06:22:42Z

Thanks!

We forked and released our own gem, burtpath, but we'll probably switch back to jmespath.rb. There are some more contributions coming.

iconara added 30 commits January 21, 2015 08:48

Make the AST interpret itself

e9c732e

Replace the hash AST with one that consists of node objects. Each AST node can evaluate itself, which means that the job of TreeInterpreter is replaced by calling #visit on the top node in the AST.

Inline comparisons and type checks in Comparator

6820e75

#compare_values is exactly #==, and it's not much more code to write x.is_a?(Integer) than is_int(x).

Simplify the interpretation of Expression

a8b9002

Get rid of ExprNode completely.

Optimize Flatten

7cc2729

Instead of creating a new array for each iteration just accumulate.
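
A sketch of the difference, using two standalone methods with made-up names rather than the real Flatten node. Both flatten exactly one level, as JMESPath's `[]` does; the first allocates a new array with `+` on every element, the second appends into one accumulator.

```ruby
module FlattenSketch
  # Hypothetical "before": every + allocates a brand new array.
  def self.flatten_with_copies(value)
    return nil unless value.is_a?(Array)

    value.inject([]) do |acc, item|
      item.is_a?(Array) ? acc + item : acc + [item]
    end
  end

  # "After": one result array, mutated in place. Flattens a single level only.
  def self.flatten_accumulating(value)
    return nil unless value.is_a?(Array)

    value.each_with_object([]) do |item, acc|
      if item.is_a?(Array)
        acc.concat(item)
      else
        acc << item
      end
    end
  end
end

FlattenSketch.flatten_accumulating([1, [2, 3], [[4]]]) # => [1, 2, 3, [4]]
```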

Inline #projection into #visit in Projection

666347a

Inline slicing logic in Slice#visit

f729e40

Divide the nodes into nodes with children and leaf nodes

9d6ce38

Remove all unnecessary Node/Leaf accessors

c853e20

They're not used so there is no need for them to be there.

Don't re-use Field for function names

54bd3cf

The only reason Field has to have an accessor for #key is that it is used to hold the function name while parsing. This introduces a temporary object that will hold the name.

Use #each_with_object instead of #each + #with_object

348010b

Change Comparator to take explicit arguments

f004a6c

It always has two children, so it makes more sense for it to name them than to take in an array.

Change Condition to take explicit arguments

19c2535

It always has two children, so it makes more sense for it to name them than to take in an array.

Change Flatten to take explicit arguments

4c7f84e

It always has just a single child, so it makes more sense for it to name it than to take in an array.

Change KeyValuePair to take explicit arguments

765bf10

It always has a value, so it makes more sense for it to name it than to take in an array.

Change Or to take explicit arguments

727d837

It always has two children, so it makes more sense for it to name them than to take in an array.

Change Pipe to take explicit arguments

cd38ec0

It always has two children, so it makes more sense for it to name them than to take in an array.

Change Projection to take explicit arguments

be89ffa

It always has two children, so it makes more sense for it to name them than to take in an array. This temporarily moves #hash_like? into Leaf, but the Node/Leaf thing will be refactored soon.

Change Subexpression to take explicit arguments

08be998

It always has two children, so it makes more sense for it to name them than to take in an array.

Get rid of Leaf, everything is a Node

5c22392

Node was introduced for things that had an arbitrary number of children, but most things actually don't, it only looked that way.

Break Projection apart into {Array,Object}Projection

8530247

This avoids branching at runtime.

Break apart Comparator into {Eq,Neq,Gt,Gte,Lt,Lte}Comparator

3fa270f

This avoids branching at runtime.

Break apart Function into subclasses for each function

c016b84

This avoids using #send at runtime.
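
A sketch of the two shapes, using `length` as the example function. DynamicFunction, FUNCTIONS, and function_for are hypothetical names, and the argument handling is simplified (no type validation).

```ruby
# Illustrative sketch; FUNCTIONS and function_for are hypothetical names.
module FunctionSketch
  class Current
    def visit(value)
      value
    end
  end

  # Hypothetical "before": one node that picks its behaviour with #send on
  # every evaluation, building the method name string each time as well.
  class DynamicFunction
    def initialize(name, args)
      @name = name
      @args = args
    end

    def visit(value)
      send("function_#{@name}", @args.map { |arg| arg.visit(value) })
    end

    private

    def function_length(values)
      values.first.size
    end
  end

  # "After": the parser picks the subclass, and #visit is a direct call.
  class Function
    def initialize(args)
      @args = args
    end
  end

  class LengthFunction < Function
    # Works for strings, arrays and hashes; argument validation is omitted.
    def visit(value)
      @args.first.visit(value).size
    end
  end

  FUNCTIONS = { 'length' => LengthFunction }.freeze

  def self.function_for(name, args)
    FUNCTIONS.fetch(name).new(args)
  end
end

FunctionSketch.function_for('length', [FunctionSketch::Current.new]).visit('abcd') # => 4
```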

Optimize type detection in Function

36eb6e9

This inlines #hash_like?, which might be unfortunate, but it means that we can avoid creating arrays and checking each branch twice, among a few other things.

Remove #to_h from all AST nodes

da83fc4

It was only used temporarily for checking compatibility with the old TreeInterpreter

Make Pipe an alias for Subexpression

235f322

They are exactly the same thing

Rename things in Projection

a5077fc

Also remove the comment, it doesn't explain anything the code doesn't.

Change how Expression and Function interact

a97e20a

Better to let Expression encapsulate the visiting.

Avoid creating so many strings and arrays in Function

2b422cd

Each literal string and array is an object allocation, every time the expression is run.
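
A sketch of one way to avoid those allocations, assuming the file does not use the `# frozen_string_literal: true` magic comment: hoist the strings into frozen constants so each call returns the same object. The type names are the ones JMESPath's `type()` returns; the module, the constants, and type_name are hypothetical. The type checks are inlined into a single case, in the spirit of the "Optimize type detection in Function" commit.

```ruby
module AllocationSketch
  # Without `# frozen_string_literal: true`, a string literal inside a method
  # allocates a new String every time that line runs. Constants are built once.
  STRING = 'string'.freeze
  NUMBER = 'number'.freeze
  BOOLEAN = 'boolean'.freeze
  ARRAY = 'array'.freeze
  OBJECT = 'object'.freeze
  NULL = 'null'.freeze

  # Type detection inlined as a single case; structs count as objects.
  def self.type_name(value)
    case value
    when String then STRING
    when Numeric then NUMBER
    when true, false then BOOLEAN
    when Array then ARRAY
    when Hash, Struct then OBJECT
    when nil then NULL
    end
  end
end

AllocationSketch.type_name(1).equal?(AllocationSketch.type_name(2.5)) # => true
```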

Dispatch max_by and min_by on a symbol instead of interpolating a string

96659b0
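
A sketch of the change. Function#number_compare is mentioned in a later commit, but the signature here is an assumption, and key_fn is a plain lambda standing in for an expression reference.

```ruby
module MinMaxBySketch
  # Hypothetical "before": builds "max_by" / "min_by" as a new String per call.
  def self.number_compare_interpolated(values, key_fn, mode)
    values.public_send("#{mode}_by") { |item| key_fn.call(item) }
  end

  # "After": the caller passes :max_by or :min_by. Symbols are interned,
  # so nothing is allocated to name the method.
  def self.number_compare(values, key_fn, method_name)
    values.public_send(method_name) { |item| key_fn.call(item) }
  end

  def self.max_by(values, key_fn)
    number_compare(values, key_fn, :max_by)
  end

  def self.min_by(values, key_fn)
    number_compare(values, key_fn, :min_by)
  end
end

# Returns the hash with age 40.
MinMaxBySketch.max_by([{ 'age' => 30 }, { 'age' => 40 }], ->(p) { p['age'] })
```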

iconara added 19 commits January 21, 2015 19:56

Name the parameters in Function#number_compare and ContainsFunction

00779d2

Move utility functions out of Function and into mixins

07eb986

Attempt to optimize runs of Field to a single Node

4e20823

E.g. foo.bar.baz becomes one Node instead of several Subexpression nodes with Field children.
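
A sketch of how such an #optimize pass could collapse nested Subexpression nodes whose children are plain key lookups. The Chain name appears later in this PR, but its implementation here, including the rule that only Field and Chain are merged, is an assumption.

```ruby
# Illustrative sketch; Chain's implementation here is an assumption.
module ChainSketch
  class Field
    attr_reader :key

    def initialize(key)
      @key = key
    end

    def visit(value)
      value.is_a?(Hash) ? value[@key] : nil
    end

    def optimize
      self
    end
  end

  # One node that walks several keys in a loop instead of nesting nodes.
  class Chain
    attr_reader :keys

    def initialize(keys)
      @keys = keys
    end

    def visit(value)
      @keys.each do |key|
        return nil unless value.is_a?(Hash)

        value = value[key]
      end
      value
    end

    def optimize
      self
    end
  end

  class Subexpression
    def initialize(left, right)
      @left = left
      @right = right
    end

    def visit(value)
      @right.visit(@left.visit(value))
    end

    # Optimize children first, then merge them if both are key lookups.
    def optimize
      left = @left.optimize
      right = @right.optimize
      if key_lookup?(left) && key_lookup?(right)
        Chain.new(keys_of(left) + keys_of(right))
      else
        Subexpression.new(left, right)
      end
    end

    private

    def key_lookup?(node)
      node.is_a?(Field) || node.is_a?(Chain)
    end

    def keys_of(node)
      node.is_a?(Field) ? [node.key] : node.keys
    end
  end
end

ast = ChainSketch::Subexpression.new(
  ChainSketch::Subexpression.new(ChainSketch::Field.new('foo'), ChainSketch::Field.new('bar')),
  ChainSketch::Field.new('baz')
)
ast.optimize.keys # => ["foo", "bar", "baz"]
```

The merge works whichever way the parser nests `foo.bar.baz`, since both sides are optimized before they are combined.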

Make a slightly ugly optimization in Projection

b585f1a

When the projection is Current there's no need to run it

Fix a bug in Projection

283c88f

It shouldn't skip false values, just nil values
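
A sketch covering both of the Projection commits above: the shortcut when the projection is Current, and dropping only nil results. ArrayProjection comes from the earlier commit that split Projection; the body is an assumption. JMESPath drops null results from a projection, which is why the Current shortcut still uses `compact`.

```ruby
module ProjectionSketch
  class Current
    def visit(value)
      value
    end
  end

  class Field
    def initialize(key)
      @key = key
    end

    def visit(value)
      value.is_a?(Hash) ? value[@key] : nil
    end
  end

  class ArrayProjection
    def initialize(target, projection)
      @target = target
      @projection = projection
    end

    def visit(value)
      targets = @target.visit(value)
      return nil unless targets.is_a?(Array)
      # Projecting onto Current would return each element unchanged, so skip
      # the per-element #visit calls; nil elements are still dropped.
      return targets.compact if @projection.is_a?(Current)

      targets.each_with_object([]) do |item, acc|
        result = @projection.visit(item)
        # `acc << result if result` would also drop false; only nil is skipped.
        acc << result unless result.nil?
      end
    end
  end
end

ProjectionSketch::ArrayProjection.new(
  ProjectionSketch::Current.new,
  ProjectionSketch::Field.new('a')
).visit([{ 'a' => false }, { 'a' => nil }]) # => [false]
```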

Flatten runs of Subexpression

65f4dd2

Also combine runs of Field

Make sure all nodes with children propagate the #optimize call

b1fe522

Make sure projections get optimized

8b95d61

Inline Comparator into Condition on #optimize

925f928

Optimize the case when the RHS of a comparison in a condition is a Literal

296582e
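
A sketch of what these two commits could look like together. LiteralEqCondition is a hypothetical name, only the equality case is handled, and the Condition checks truthiness.

```ruby
# Illustrative sketch; LiteralEqCondition is a hypothetical name.
module LiteralConditionSketch
  class Current
    def visit(value)
      value
    end
  end

  class Literal
    attr_reader :value

    def initialize(value)
      @value = value
    end

    def visit(_value)
      @value
    end
  end

  class Field
    def initialize(key)
      @key = key
    end

    def visit(value)
      value.is_a?(Hash) ? value[@key] : nil
    end
  end

  class EqComparator
    attr_reader :left, :right

    def initialize(left, right)
      @left = left
      @right = right
    end

    def visit(value)
      @left.visit(value) == @right.visit(value)
    end
  end

  class Condition
    def initialize(test, child)
      @test = test
      @child = child
    end

    # Truthiness check: the child only runs when the test passes.
    def visit(value)
      @child.visit(value) if @test.visit(value)
    end

    # Replace "equality against a literal" with a specialised node.
    def optimize
      if @test.is_a?(EqComparator) && @test.right.is_a?(Literal)
        LiteralEqCondition.new(@test.left, @test.right.value, @child)
      else
        self
      end
    end
  end

  # The comparator is inlined and the literal unwrapped once, so each
  # #visit skips both the comparator call and the Literal#visit call.
  class LiteralEqCondition
    def initialize(left, literal, child)
      @left = left
      @literal = literal
      @child = child
    end

    def visit(value)
      @child.visit(value) if @left.visit(value) == @literal
    end
  end
end
```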

Rename "children" in MultiSelectHash to "kv_pairs"

c48eced

Make the arguments to Slice explicit

fd00e14

Optimize single step slices with positive start and stop

be4ac61
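
A sketch of the fast path, under the assumption that "positive" here includes zero. Slice implements the Python-style semantics JMESPath specifies (negative indices, any non-zero step); SimpleSlice is a hypothetical name for the special case where Ruby's own range indexing gives the same answer. `Array#[]` with a range returns nil when the start is past the end, hence the `|| []`.

```ruby
# Illustrative sketch; SimpleSlice is a hypothetical name.
module SliceSketch
  # General slice with Python-style semantics (negative indices, any step).
  class Slice
    def initialize(start, stop, step)
      @start = start
      @stop = stop
      @step = step
    end

    def visit(value)
      return nil unless value.is_a?(Array)

      step = @step || 1
      raise ArgumentError, 'slice step cannot be 0' if step.zero?

      forward = step > 0
      len = value.size
      start = adjust(@start, step, len, forward ? 0 : len - 1)
      stop = adjust(@stop, step, len, forward ? len : -1)
      result = []
      i = start
      while (forward ? i < stop : i > stop)
        result << value[i]
        i += step
      end
      result
    end

    # Only step 1 with non-negative bounds gets the fast path.
    def optimize
      if (@step.nil? || @step == 1) && @start && @start >= 0 && @stop && @stop >= 0
        SimpleSlice.new(@start, @stop)
      else
        self
      end
    end

    private

    # Clamp an index into range the way Python's slice.indices does.
    def adjust(index, step, len, default)
      return default if index.nil?

      if index < 0
        index += len
        return index if index >= 0

        step < 0 ? -1 : 0
      elsif index >= len
        step < 0 ? len - 1 : len
      else
        index
      end
    end
  end

  # Fast path: Ruby's range indexing, which already clamps the stop.
  class SimpleSlice
    def initialize(start, stop)
      @start = start
      @stop = stop
    end

    def visit(value)
      return nil unless value.is_a?(Array)

      value[@start...@stop] || []
    end
  end
end

SliceSketch::Slice.new(1, 3, nil).optimize.visit([0, 1, 2, 3, 4]) # => [1, 2]
```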

Optimize an extra time in Chain

407f236

Right now the children of a Chain are guaranteed to be optimized, but if it were used from somewhere else that would not hold.

Make the chaining done in Chain more generic

a244238

Optimize runs of Index to ChainIndex

bd9edca

Make Index an alias for Field

e6e7b9b

This is maybe a bit of an aggressive optimization, but Field and Index are probably common in longer runs, so this avoids a lot of method calls, potentially.

Cache the symbolized key in Key

033afd2

Rewrite the case/whens in Field to if/else

de5e54d

Every clause really has a subclause, so if/else makes more sense. The change also avoids using #key? when #[] + !#nil? will avoid another hash lookup.
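
A sketch of a Field lookup along these lines, assuming values can be hashes with string or symbol keys, or structs. The cached symbol stands in for the "Cache the symbolized key" commit. A missing key and a key explicitly set to nil are treated the same, which matches JMESPath's null semantics.

```ruby
# Illustrative sketch of a Field lookup; not the jmespath.rb code.
module FieldSketch
  class Field
    def initialize(key)
      @key = key
      # Symbolized once when the node is built, not on every lookup.
      @key_sym = key.to_sym
    end

    def visit(value)
      if value.is_a?(Hash)
        # One #[] plus a nil check, instead of #key? followed by #[].
        result = value[@key]
        result.nil? ? value[@key_sym] : result
      elsif value.is_a?(Struct) && value.members.include?(@key_sym)
        value[@key_sym]
      end
    end
  end
end

FieldSketch::Field.new('name').visit({ name: 'Ada' }) # => "Ada"
```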

trevorrowe merged commit de5e54d into jmespath:master Sep 16, 2015
