This repository has been archived by the owner on Apr 12, 2023. It is now read-only.

Query DSL representation #2

Closed
fangel opened this issue Feb 25, 2015 · 4 comments

Comments

@fangel
Member

fangel commented Feb 25, 2015

We need to decide how to store/process the Query DSL internally in this plugin. In the original Metadata-search issue (imbo/imbo#268), it was proposed and agreed to use a subset of the Mongo query language to specify searches. So this determines the external/textual representation of our query-DSL.

However, the internal structure can technically be whatever we want it to be. So here is my proposal for the three obvious internal representations (AST) of the query-DSL.

Option 1 Store the Mongo JSON as is

That is, the input-query

{"foo": "bar", "baz": {"$not": {"$gt": 42}}}

would be stored internally as

['foo' => 'bar', 'baz' => ['$not' => ['$gt' => 42]]]

So basically the result of calling json_decode, albeit with a few modifications (translate to lower-case) and checks (throw exceptions on unknown operators like $regex).
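A minimal sketch of what option 1 could look like. The function names and the operator whitelist are hypothetical, and applying the lower-casing to keys only is a guess at what "translate to lower-case" means here.

```php
<?php
// Hypothetical whitelist; the real operator set is not settled in this thread.
const ALLOWED_OPERATORS = ['$not', '$gt', '$gte', '$lt', '$lte', '$and', '$or'];

/**
 * Decode a JSON query, keeping the Mongo structure as-is apart from validation.
 */
function decodeQuery(string $json): array {
    $query = json_decode($json, true);
    if (!is_array($query)) {
        throw new InvalidArgumentException('Query must be a JSON object');
    }
    return checkOperators($query);
}

/**
 * Recursively lower-case string keys and reject operators outside the whitelist.
 */
function checkOperators(array $node): array {
    $result = [];
    foreach ($node as $key => $value) {
        if (is_string($key)) {
            $key = strtolower($key);
            // Keys starting with '$' are treated as operators
            if ($key !== '' && $key[0] === '$' && !in_array($key, ALLOWED_OPERATORS, true)) {
                throw new InvalidArgumentException("Unknown operator: $key");
            }
        }
        $result[$key] = is_array($value) ? checkOperators($value) : $value;
    }
    return $result;
}
```

With this, decoding the example query above yields the array shown, while a query using $regex throws an InvalidArgumentException.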

Option 2 Store the Mongo JSON as normalised Mongo queries

In Mongo it is possible to represent many queries in multiple different, equivalent ways. Take for instance the two queries

{"foo": "bar", "baz": "blargh"}
{"$and": [{"foo": "bar"}, {"baz": "blargh"}]}

They are equivalent when executed, but the latter is much easier to transform into other query languages, because there are only a few ways of building up queries. Basically all queries can be normalised into one of the following 5 query structures:

{"$and": [term, term, term]}
{"$or": [term, term, term]}
{"field": "value"}
{"field": {"$operator": "value"}}
{"field": {"$not": {"$operator": "value"}}}

So for instance the query

{"foo": "bar", "baz": {"$not": {"$gt": 42}}}

would be stored internally as

['$and' => [
  ['foo' => 'bar'],
  ['baz' => ['$not' => ['$gt' => 42]]],
]]

Doing recursive descents over such a simple data-structure makes it a lot easier to translate it into e.g. ElasticSearch queries.
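The normalisation step could be sketched roughly as follows. The function name normalise is hypothetical, and the sketch deliberately skips cases such as several operators on one field (e.g. {"$gt": 1, "$lt": 5}), which a real implementation would have to split up as well.

```php
<?php
/**
 * Normalise a decoded Mongo-style query so that every term constrains one field.
 * Sketch only: values holding several operators are left unsplit.
 */
function normalise(array $query): array {
    $terms = [];
    foreach ($query as $key => $value) {
        if ($key === '$and' || $key === '$or') {
            // Recurse into each sub-query of an explicit $and / $or
            $terms[] = [$key => array_map('normalise', $value)];
        } else {
            // A single field term: {"field": value} or {"field": {"$op": value}}
            $terms[] = [$key => $value];
        }
    }
    // One term needs no wrapper; several terms form an implicit conjunction
    return count($terms) === 1 ? $terms[0] : ['$and' => $terms];
}
```

Applied to the decoded example query, this produces the '$and' array shown above, while a single-field query such as ['foo' => 'bar'] is returned unchanged.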

Option 3 Store normalized Mongo queries as instances of AST-classes

This is basically doing the normalisation from option 2, but instead of storing the result as associative arrays, it would be stored as instances of specific classes, like \Imbo\MetadataSearch\Dsl\Ast\And.

So the query

{"foo": "bar", "baz": {"$not": {"$gt": 42}}}

would internally be stored as

new Dsl\Ast\And([
    new Dsl\Ast\Field('foo', new Dsl\Ast\Comparison\Equal('bar')),
    new Dsl\Ast\NegatedField('baz', new Dsl\Ast\Comparison\GreaterThan(42))
])

This structure makes it even easier / more readable to do recursive descents over the query-DSL's AST. You could do something like the following (I admit this looks a bit silly, but you know - without pattern matching, there is only so much you can do)

function transformToEs(Dsl\Ast $query) {
    switch (true) {
        case $query instanceof Dsl\Ast\And:
            // $query->terms is an assumed name for the node's array of sub-terms
            return '(' . implode(' AND ', array_map('transformToEs', $query->terms)) . ')';
        case $query instanceof Dsl\Ast\Or:
            return '(' . implode(' OR ', array_map('transformToEs', $query->terms)) . ')';
        case $query instanceof Dsl\Ast\Field:
            return $query->field . ':' . transformComparisonToEs($query->comparison);
        case $query instanceof Dsl\Ast\NegatedField:
            return 'NOT ' . $query->field . ':' . transformComparisonToEs($query->comparison);
        default:
            throw new \InvalidArgumentException('Unknown AST node');
    }
}
function transformComparisonToEs(Dsl\Ast\Comparison $query) {
    switch (true) {
        case $query instanceof Dsl\Ast\Comparison\Equal:
            return $query->value;
        case $query instanceof Dsl\Ast\Comparison\GreaterThan:
            return '>' . $query->value;
        // and so forth, for >=, <= and <
        default:
            throw new \InvalidArgumentException('Unknown comparison');
    }
}

Personally, I would want to go with either option 2 or 3. By going with option 1, we're going to make it harder than necessary to write transformations for multiple search backends. Doing the normalisation will also allow us to reject more malformed queries...

The difference between 2 and 3 is basically just that option 3 adds a more rigidly enforced structure on the internal representation (AST). It can also make transformation functions easier to read, because we can have potentially more descriptive class-names than the text-strings that Mongo uses for operators. But this structure does come with the "overhead" of requiring quite a few class-definitions, all of them rather small classes that need to contain 1-2 values.
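To give a feel for that overhead, here is a rough sketch of what such small classes might look like. All names are hypothetical and the namespaces are flattened for brevity. It assumes PHP 8 (constructor property promotion), and it uses Conjunction / Disjunction rather than And / Or, since `and` and `or` are reserved words in PHP and cannot be declared as class names.

```php
<?php
// Marker interfaces for the two kinds of AST nodes (hypothetical names)
interface Node {}
interface Comparison {}

final class Equal implements Comparison {
    public function __construct(public mixed $value) {}
}

final class GreaterThan implements Comparison {
    public function __construct(public mixed $value) {}
}

final class Field implements Node {
    public function __construct(public string $field, public Comparison $comparison) {}
}

final class NegatedField implements Node {
    public function __construct(public string $field, public Comparison $comparison) {}
}

// Stand-ins for the And / Or nodes; each holds an array of Node instances
final class Conjunction implements Node {
    public function __construct(public array $terms) {}
}

final class Disjunction implements Node {
    public function __construct(public array $terms) {}
}

// The example query {"foo": "bar", "baz": {"$not": {"$gt": 42}}} as an AST
$ast = new Conjunction([
    new Field('foo', new Equal('bar')),
    new NegatedField('baz', new GreaterThan(42)),
]);
```

Each class is only a few lines, but one is needed per operator and per node type, which is the trade-off against the plain arrays of option 2.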

So what are people's opinions on how the query-DSL should be represented internally in this plugin?

-Morten.

@kbrabrand
Contributor

My vote goes to number three. It looks a bit messy now, but adding a few use statements will shorten the code considerably. I think the slightly more rigid parsing and rule set is a good thing in this case.

We need to at least implement the $exists operator. A wildcard operator was also mentioned, but I'm not sure which operator was used for wildcards in the end. We should check the implementation by @christeredvartsen and implement the same set of operators.

@fangel
Member Author

fangel commented Feb 26, 2015

Yes. Mongo already has support for $exists in their Query DSL, so our subset will just have to include this too.

The Mongo Query-DSL does not have (as far as I know) support for a $wildcard operator, so technically speaking, our DSL won't be a subset of Mongo's - but hey. $wildcard can however be emulated using $regex, so Mongo does support it, albeit with a different syntax. So it's fine for us to have it, as all of our search backends can support it (in SQL it would likely have to be some sort of LIKE, which is slow but functional).
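As a rough illustration of that emulation, the two helpers below translate a wildcard pattern into a regular expression and into a LIKE pattern. Both function names are hypothetical, and the wildcard syntax ('*' matching any run of characters) is an assumption, since the thread does not pin it down.

```php
<?php
/**
 * Translate a wildcard pattern ('*' = any run of characters, an assumed syntax)
 * into an anchored regular expression body, e.g. for use with $regex.
 */
function wildcardToRegex(string $pattern): string {
    // preg_quote escapes '*' as '\*'; turn that back into '.*'
    $quoted = preg_quote($pattern, '/');
    return '^' . str_replace('\*', '.*', $quoted) . '$';
}

/**
 * Translate the same wildcard pattern into a SQL LIKE pattern,
 * assuming backslash is the LIKE escape character.
 */
function wildcardToLike(string $pattern): string {
    // Escape the backslash first, then LIKE's own wildcards '%' and '_'
    $escaped = str_replace(['\\', '%', '_'], ['\\\\', '\%', '\_'], $pattern);
    return str_replace('*', '%', $escaped);
}
```

For the pattern foo*, the first helper gives ^foo.*$ and the second gives foo%.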

@kbrabrand
Contributor

I don't know how you planned to implement the AST building and handling, but I looked into a few different alternatives and Dissect seems to provide a pretty straightforward way of building and working with ASTs. Just a thought.

@fangel
Member Author

fangel commented Feb 27, 2015

Closing this issue, as we now have a good start to an implementation that follows option 3.

@fangel fangel closed this as completed Feb 27, 2015