Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify a DSL for metadata searches #268

Closed
wants to merge 26 commits into from
Closed

Specify a DSL for metadata searches #268

wants to merge 26 commits into from

Conversation

christeredvartsen
Copy link
Member

Imbo needs to be able to perform metadata searches against the images resource, and for this we need to specify a DSL.

Metadata is usually key => value pairs, the can also have nested key => value pairs. The search needs to support this along with other basic operations.

@christeredvartsen christeredvartsen added this to the DrPublish milestone Feb 19, 2014
@christeredvartsen christeredvartsen self-assigned this Feb 19, 2014
@christeredvartsen
Copy link
Member Author

One idea is to take a small subset of MongoDB query operators, and simply send the query as JSON in a query parameter.

The following comparison operators should be supported (taken from http://docs.mongodb.org/manual/reference/operator/query-comparison/):

  • $gt: Matches values that are greater than the value specified in the query.
  • $gte: Matches values that are equal to or greater than the value specified in the query.
  • $in: Matches any of the values that exist in an array specified in the query.
  • $lt: Matches values that are less than the value specified in the query.
  • $lte: Matches values that are less than or equal to the value specified in the query.
  • $ne: Matches all values that are not equal to the value specified in the query.

The following logical operators should be supported (taken from http://docs.mongodb.org/manual/reference/operator/query-logical/):

  • $or: Joins query clauses with a logical OR returns all documents that match the conditions of either clause.
  • $and: Joins query clauses with a logical AND returns all documents that match the conditions of both clauses.

To do wildcard searches I propose the following operator:

  • $wildcard: Used to perform wildcard searches, for instance *foo*.

@christeredvartsen
Copy link
Member Author

Today the Doctrine adapter disallows :: in key names as :: is used to separate parent/child relations (only in the Doctrine adapter). Other rules should be applied to metadata keys:

  • Can not contain .: This is not allowed in MongoDB, and because of this it should not be allowed in any other adapter either.
  • Can not start with $: Same reason as above.

@christeredvartsen
Copy link
Member Author

For case insensitive searches to work in both the Doctrine and MongoDB adapters the data stored in MongoDB will be duplicated, and searches will go against the normalized (lowercased) data. Searches will also be lowercased (for now).

@christeredvartsen
Copy link
Member Author

Before merging this PR the following tasks must be completed:

  • Document the metadata query feature - fixed in 9b48678
  • Document the change of rules in metadata keys - fixed in 4a79845
  • Update ChangeLog - fixed in 828250c
  • Create a feature file for Behat - fixed in 94fe17e
  • Add more integration tests for the different search operators to make sure they work in all adapters
  • Add a parser for the metadata query for Doctrine that will update the query builder object accordingly
  • Create a script for populating MongoDB with the normalized data that the search will use - fixed in 48e3ff6
  • Enforce the rules regarding characters that are no longer valid in metadata key names - fixed in 578df1b

@rexxars
Copy link
Member

rexxars commented Mar 12, 2014

Two issues I've found:

  • Wildcard queries should be quoted before sending them to MongoDBs regex operator (to prevent people from using regex stuff where we only intend to support wildcards)
  • Wildcard character (*) needs to be replaced by .* to be valid regex and actually do what we intend it to

@rexxars
Copy link
Member

rexxars commented Mar 13, 2014

An $exists operator would also be nice - useful when you want to return any image that has a metadata key but don't care about the value.

@christeredvartsen
Copy link
Member Author

@rexxars $exists has been implemented in 565f675 and the wildcard stuff was fixed in f26f8cd.

@christeredvartsen
Copy link
Member Author

I discovered a whole bunch of issues in the Doctrine version of the metadata query. Once fixed this PR is getting close to finished.

@kbrabrand
Copy link
Contributor

Given the recent discussion on this, and the conclusion that the functionality should be provided by a plugin and not be part of the core, this looks a bit different.

I'll use elasticsearch for metadata search and I suggest using the elasticsearch query DSL as well. It is well thought out, and proven.

Thoughts?

@rexxars
Copy link
Member

rexxars commented Feb 23, 2015

Slightly torn on this. It would certainly make it easier to implement.

I'd say if we rename the plugin to have es or elasticsearch in the name, that is an acceptable solution.

@fangel
Copy link
Member

fangel commented Feb 23, 2015

In my mind, the point of using a DSL for searching is that is should be possible to easily translate it into whatever query-language the search-provider (Mongo, Elasticsearch, etc) is using. So two things needs to be considered

  1. What textual syntax would Imbo like to use - ie, what do you send to Imbo. In this PR the proposal is using the textual (ehh, JSON) syntax that MongoDB uses - but with some extensions like $exists. But it could also easily be the Elasticsearch syntax. The easier it is to parse into a AST to transform into other query-languages the better (IMHO).
    Many of ES' search-features would be rather specific to ES - like boosting, proximity searches, fuzziness etc. As for "writability", the basic syntax of field:value [operator field:value]+ that ES uses, is nicer than the JSON-based one that Mongo uses. But for straight ease-of-parsing into a (pseudo)-AST, MongoDBs is way easier.
  2. What is the lowest-common-denominator to target. If the desire is to support Mongo, Doctrine (ie MySQL + PostgreSQL + ?) and Elasticsearch "out of the box", then it needs to be decided if you should only be allowed to write queries in the DSL that can actually be executed on all of the targets. In MySQL for instance, it might be hard to implement wildcard searches in a responsible way. Does this mean that the DSL shouldn't have wildcard searches? In my opinion, wildcard searches are important, and if you want to use certain relational databases as a search providers, you will just have to accept that it'll likely be slow.

The reason for this is of course to avoid lock-in and enable many different search-providers. But also to avoid having a leaky abstraction. Should it be Imbo that does the searches, or should Imbo just act as a "proxy" for one specific search-provider (Elasticsearch)? To avoid the latter, it's important to not rely on being able to take a unknown text-blob search-query and just forward it to Elasticsearch. Instead, Imbo must be able to parse the query-DSL and the compile it into a (possible identical) Elasticsearch query.

At least that's my to cents.

@kbrabrand
Copy link
Contributor

Lots of good points here. First; by moving the search feature out of the Imbo core and instead providing it as an addon to Imbo, the requirement of supporting Doctrine and Mongo as search backends is in my opinion not sound.

On the point of the DSL you're making a good point of bring able to translate the query into something that is consumeable for different search backends.

After some thinking I'm leaning towards making the search plugin easily pluggable and with interfaces that makes integrating other search backends (Solr and Lucene are obvious candidates) fairly easy.. And I'll stick to the DSL proposed by @christeredvartsen in this issue.

I'll implement the ES backend only now due to deadlines, but adding other backends sholdn't be a big job.

@fangel
Copy link
Member

fangel commented Feb 24, 2015

@kbrabrand, We, here at TV 2 Danmark, will likely also want to use ES as our metadata search-backend, so let me know if there is anything we can help with, then we can allocate resources for it too.

I can try and write a first draft of a Mongo/Imbo search-DSL -> ElasticSearch query transformation, if that has any interest...

@kbrabrand
Copy link
Contributor

That sounds great @fangel. I think the transformation should be done from the DSL and to something that can be used for querying with the elasticsearch-php client, which is basically just an array with properties.

We are running against the clock here in Oslo, so in order to plan the next weeks; do you know what your time frame for this looks like?

@fangel
Copy link
Member

fangel commented Feb 24, 2015

Yes, transforming into elastisearch-php query-parameters sounds like a good approach. It should be easy to write some unit-tests for the test-cases we care about and write a transformation that fits these...

And we're have scheduling our implementation beginning this week, so we can help from today if necessary.

@kbrabrand
Copy link
Contributor

That sounds awesome, @fangel!

@kbrabrand
Copy link
Contributor

@fangel: @christeredvartsen implemented the transformations for mongo for this issue, so you might want to take a look at the implementation and tests. It was done a while ago, so I applied the commits to a recent version of develop when I picked up this task last week; https://github.com/kbrabrand/imbo/tree/issue-268

@rexxars
Copy link
Member

rexxars commented Mar 19, 2015

Metadata search is implemented as a separate project/plugin, see https://github.com/imbo/imbo-metadata-search. Closing this.

@rexxars rexxars closed this Mar 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants