Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify filter handling in adapters #320

Closed
dmajda opened this issue Feb 3, 2016 · 6 comments
Closed

Simplify filter handling in adapters #320

dmajda opened this issue Feb 3, 2016 · 6 comments
Assignees

Comments

@dmajda
Copy link
Contributor

dmajda commented Feb 3, 2016

Right now, adapters which translate static filter expressions into native queries have to walk a non-trivial AST, typically using ASTVisitor. This is not a great API, for several reasons:

  • Filter AST is complex and it requires handling of many cases, some of them irrelevant for the adapters (e.g. FilterLiteral).
  • Unsupported features have to be explicitly handled (typically by visitXxx functions throwing exceptions). This is tedious and prone to errors and divergence across adapters.
  • Various levels of filter expressions are handled at the same place. For example, visitBinaryExpression needs to handle expression-level operators AND and OR as well as term-level comparison operators like <.

The goal of this issue is to solve these problems by providing adapters a nicer filter processing API.

Proposal

The key insight is that filter expressions coming into the adapter consist of these concepts, building on top of each other:

  • field references
  • values
  • expression terms which compare field references and values using comparison operators
  • full-text terms
  • expressions which combine both kinds of terms using AND, OR and NOT operators

For most adapters, it probably makes sense to handle each of these concepts separately when building a native query.

One way to implement this idea is to provide a class (named e.g. SimpleFilterCompilerBase) which the adapters would inherit from and implement methods responsible for translating concepts they support. For unsupported concepts the adapter simply wouldn’t implement the relevant method, which means a default base class implementation which throws a suitable exception would be used.

The SimpleFilterCompilerBase class can have an interface like this (in TypeScript):

// Juttle AST node.
interface Node {}

// Base class of simple filter compilers.
class SimpleFilterCompilerBase {
    // Compiles a node. Generic method to be called when recursing.
    compile: (node: Node) => any;

    // Compiles a field reference. It is called only with Field nodes, e.g.:
    //
    //     { type: "Field", name: "a" }
    //
    compileField: (node: Node) => any;

    // Compiles a Juttle literal. It is called only with *Literal nodes, e.g.:
    //
    //     { type: "NumericLiteral", value: 5 }
    //
    // Typically, the method will switch on node.type, compile supported node
    // types and throw an exception in the default clause.
    compileLiteral: (node: Node) => any;

    // Compiles an expression term. It is only called with BinaryExpression
    // nodes, e.g.:
    //
    //    {
    //        type: "BinaryExpression",
    //        operator: "<",
    //        left: { type: "Variable", name: "a" },
    //        right: { type: "NumericLiteral", value: 5 }
    //    }
    //
    // Typically, the the method will switch on the operator, compile supported
    // operations (checking their operators for the right node types) and throw
    // an exception in the default clause.
    compileExpressionTerm: (node: Node) => any;

    // Compiles a fulltext term. It is only called with StringLiteral nodes,
    // e.g.:
    //
    //    { type: "StringLiteral", value: "abcd" }
    //
    compileFulltextTerm: (node: Node) => any;

    // Compiles an AND expression. It is only called with BinaryExpression
    // nodes, e.g.:
    //
    //    {
    //        type: "BinaryExpression",
    //        operator: "AND"
    //        left: { type: "BinaryExpression", ... },
    //        right: { type: "BinaryExpression", ... }
    //    }
    //
    compileAndExpression: (node: Node) => any

    // Compiles an OR expression. It is only called with BinaryExpression nodes,
    // e.g.:
    //
    //    {
    //        type: "BinaryExpression",
    //        operator: "OR"
    //        left: { type: "BinaryExpression", ... },
    //        right: { type: "BinaryExpression", ... }
    //    }
    //
    compileOrExpression: (node: Node) => any

    // Compiles a NOT expression. It is only called with UnaryExpression nodes,
    // e.g.:
    //
    //    {
    //        type: "UnaryExpression",
    //        operator: "NOT",
    //        expression: { type: "BinaryExpression", ... }
    //    }
    //
    compileNotExpression: (node: Node) => any;
}

Technically, the class would be implemented using a visitor which would call its methods. But this would be invisible to the adapters.

Discusssion

Why not just offer some kind of specialized ASTVisitor?

Offering a visitor which would provide default implementation of most methods (which would throw an exception or do an obvious thing like descending into a subexpression in cases like visitExpressionFilterTerm) is definitely a viable alternative. However this solution would retain two problems solved by SimpleFilterCompilerBase:

  • Conflation of AND/OR expressions and other binary operators in visitBinaryExpression
  • Complexity in cases like visitSimpleFilterTerm

What about preventing conflation by using different AST node types for AND/OR and other operators?

If done across all ASTs it would multiply the number of AST node types, which I’d like to avoid. And doing it only for filter ASTs would mean having two different kinds of ASTs, which I’d also like to avoid if possible.

What about getting rid of some of the complexity by simplifying the AST?

It is possible to some degree (e.g. eliminating FilterLiteral and ExpressionFilterTerm nodes before passing the AST to adapters) but not everywhere (e.g. we still need SimpleFilterTerm or something similar to designate FTS). And again, it would mean having two different kinds of ASTs, which I’d like to avoid if possible.

Wouldn’t it make more sense to avoid AST nodes in the API?

It is true that compileField could just take a string with a name, similarly for compileFulltextTerm. And compileLiteral could just take a Javascript representation of the value. In case of compileExpressionTerm, compileAndExpression, compileOrExpression, and compileNotExpression it would make sense to compile the operands and pass the methods the results.

The approach would probably work and it would save some complexity and explicit recursive calls. It would mean AST nodes wouldn’t have to be part of the external API. However, there are quite severe disadvantages:

  • Nodes contain location info which is used mainly for error reporting. Without them, the location info would have to be passed explicitly. While this is doable using extra arguments, it is ugly and it would also give methods only the location info covering the whole compiled expression (while methods like compileExpressionTerm would presumably also want to use location info covering some parts of the expression, e.g. when reporting invalid type of one of the operands).
  • In case of expressionFilterTerm we need to distinguish between field and value operands. In compiled form, these may not be distinguishable (e.g. they can both compile to strings).
  • We need explicit recursion in some cases anyway, e.g. in compilation of array literals.

For these reasons I think the node-based approach makes more sense.

Do we really want to use switch statements in compileLiteral and compileExpressionTerm?

We can either use switch statements, split the handling into separate methods, or follow the AST (like we do in visitors). This is mostly a style choice. The described solution roughly maps the methods to filter expression concepts as described above.

One advantage of fine-grained visitXxx approach is that the default implementations can just throw an exception and no code is required to make a construct unsupported. But if adapter authors use default clauses in their switch statements, it is easy to achieve a similar result.

@dmajda
Copy link
Contributor Author

dmajda commented Feb 3, 2016

@demmer Long read but I’d appreciate if you could have a look. Consider the proposal as a starting point for a discussion, not necessarily the final design.

@dmajda
Copy link
Contributor Author

dmajda commented Feb 9, 2016

Updated the description based on the results of a discussion with @demmer:

  • The compileBinaryBooleanExpression method was split and became compileAndExpression & compileOrExpression.
  • The compileUnaryBooleanExpression became compileNotExpression.
  • The methods were slightly reordered.

@dmajda
Copy link
Contributor Author

dmajda commented Feb 11, 2016

Current status:

@dmajda
Copy link
Contributor Author

dmajda commented Feb 16, 2016

There are three prototype implementations which use the API now:

Lessons learned:

  1. The new code is about the same length as with the visitors. Its structure is slightly better (e.g. the handling of AND/OR is separated from handling of comparison operators), but the “feel” is mostly the same.
  2. Generally, various compilers which use ASTVisitor present a compile method as an interface to users. It optionally sets some state and then calls this.visit(...). With the new API, the compile method is used by the framework in essentially the same sense as visit, which means it can’t be used to set any state. This can be worked around only by renaming (which leads to unnatural interface) or setting state in the constructor (which means it isn’t safe to use the compiler instance multiple times — something which is not readily apparent from its interface).
  3. Handling all literals in one method is good when the handling is trivial, bad if not (which may be the case for arrays/objects).
  4. We need to explicitly say how unsupported literals and operators are to be handled. In the prototype, we used calls like super.compileLiteral(node) in a default clause of a switch statement over literal and operator types, implicitly relying on the base class implementation doing the right thing (throwing an exception with an appropriate error message). This seems OK, it just needs some small tweaks in the base class to work well.

Overall, the transformation removed some boilerplate code and error handling, but the benefits are small. This made me wonder whether the abstraction is worth its weight. I originally wanted to avoid AST modification, but what if we do a small pass on the filter before it is handed over to an adapter, which would:

  1. Remove ExpressionFilterTerm nodes (which the compiler typically just passes-through).
  2. Replace SimpleFilterTerm nodes which contain StringLiteral (and thus represent FTS) with structurally simpler FulltextFilterTerm nodes.
  3. Replace SimpleFilterTerm nodes which contain FilterLiteral (and thus represent embedded filter expression coming from an input ) with the actual FilterLiteral AST (effectively inlining the literal).

This, together with already introduced Filter node and a ASTVisitor subclass which would throw errors for filter various constructs by default, would eliminate all the boilerplate and give us 90% of the new API’s benefits without introducing any new abstraction. The only remaining problem would be mixing AND/OR with the remaining operators.

I’m currently leaning to this solution, but I welcome thoughts (@demmer).

@demmer
Copy link
Contributor

demmer commented Feb 16, 2016

I haven't looked at the code at all but from what you wrote here I agree that the AST rewrite makes sense.

@dmajda
Copy link
Contributor Author

dmajda commented Feb 17, 2016

OK, going ahead with that approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants