Evaluate rules on "startup" #15

magro · 2015-05-03T21:17:45Z

Rules should be evaluated when they're loaded/parsed to improve performance at query time.

magro · 2015-05-03T21:18:06Z

@renekrie Do you have a hint where to start? Then I'd give it a try.

renekrie · 2015-05-04T17:54:33Z

The idea would be to partially evaluate queries that constitute the right-hand side of Common Rules, for example, 'personal computers' in

pc =>
    SYNONYM: personal computers

The following information could be loaded and cached on startup:

- The Lucene query that is created for a given term of the rhs query (per field). This will probably be a simple TermQuery in most cases, but it could also be a BooleanQuery if the analysis chain emits more than one token for the input term.
- Whether the Lucene query has any results. This information may be optional, but at least for TermQueries it is easy to retrieve via the document frequency (DF).

Loading this info on startup would save query time especially if there are many query fields and if a Common Rule adds many query terms. Adding 10 synonyms with 10 query fields would result in 100 additional TermQueries. We would always have to go through the Lucene analysis in order to create these queries - regardless of Solr caching - and some of these TermQueries would never match any document. Doing the analysis on startup and caching the TermQueries (or BooleanQueries) together with the DF information should therefore reduce query execution time later.
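To make the caching idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene or Querqy APIs: `TermQueryCacheSketch`, `CachedTermQuery` and the `docFreqLookup` callback are hypothetical stand-ins for the cached TermQuery and the DF lookup, and the multi-token BooleanQuery case is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;

final class TermQueryCacheSketch {

    /** Hypothetical stand-in for a cached per-field term query plus its DF. */
    static final class CachedTermQuery {
        final String field;
        final String term;
        final int docFreq;

        CachedTermQuery(String field, String term, int docFreq) {
            this.field = field;
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    // Keyed by "field:term"; assumes field names contain no ':'.
    private final Map<String, CachedTermQuery> cache = new HashMap<>();

    /** Runs once on startup: one cache entry per (field, term) pair. */
    void preload(List<String> fields, List<String> terms,
                 ToIntBiFunction<String, String> docFreqLookup) {
        for (String field : fields) {
            for (String term : terms) {
                int df = docFreqLookup.applyAsInt(field, term);
                cache.put(key(field, term), new CachedTermQuery(field, term, df));
            }
        }
    }

    /** Query time: return only cached queries that can match a document. */
    List<CachedTermQuery> matchingQueries(List<String> fields, String term) {
        List<CachedTermQuery> result = new ArrayList<>();
        for (String field : fields) {
            CachedTermQuery query = cache.get(key(field, term));
            if (query != null && query.docFreq > 0) {
                result.add(query);
            }
        }
        return result;
    }

    int size() {
        return cache.size();
    }

    private static String key(String field, String term) {
        return field + ":" + term;
    }
}
```

At query time, only the entries with a non-zero DF would be added to the query, so the analysis chain is skipped entirely and terms that cannot match are dropped.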

Where to start:

I've created a branch 'crpreload' for the development of this feature. In this branch, querqy-core/querqy.trie.TrieMap has already been made an Iterable over its values. TrieMap contains the mapping between an input and the resulting instructions, and it is filled on startup. You can thus iterate over its values (which gives you Instructions objects, which are just lists of Instruction objects) and inspect the instructions to get the rhs queries that you need for the preload (visitor pattern to deal with the different instruction types?).
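A rough sketch of the visitor idea, in plain Java. All types here (`Instruction`, `Synonym`, `Delete`, `InstructionVisitor`, `RhsTermCollector`) are hypothetical stand-ins, not the actual Querqy classes, and the `Iterable` argument only mimics iterating over the TrieMap values.

```java
import java.util.ArrayList;
import java.util.List;

final class InstructionVisitorSketch {

    /** One visit method per instruction type (hypothetical types). */
    interface InstructionVisitor {
        void visitSynonym(Synonym synonym);
        void visitDelete(Delete delete);
    }

    interface Instruction {
        void accept(InstructionVisitor visitor);
    }

    /** Carries a right-hand-side query, reduced here to a list of terms. */
    static final class Synonym implements Instruction {
        final List<String> rhsTerms;

        Synonym(List<String> rhsTerms) {
            this.rhsTerms = rhsTerms;
        }

        @Override
        public void accept(InstructionVisitor visitor) {
            visitor.visitSynonym(this);
        }
    }

    /** Has no right-hand-side query, so there is nothing to preload. */
    static final class Delete implements Instruction {
        final String term;

        Delete(String term) {
            this.term = term;
        }

        @Override
        public void accept(InstructionVisitor visitor) {
            visitor.visitDelete(this);
        }
    }

    /** Collects the rhs terms that a preloader would have to analyze. */
    static final class RhsTermCollector implements InstructionVisitor {
        final List<String> terms = new ArrayList<>();

        @Override
        public void visitSynonym(Synonym synonym) {
            terms.addAll(synonym.rhsTerms);
        }

        @Override
        public void visitDelete(Delete delete) {
            // Nothing to collect for this instruction type.
        }
    }

    /** Mimics iterating over the trie values (lists of instructions). */
    static List<String> collectRhsTerms(Iterable<List<Instruction>> trieValues) {
        RhsTermCollector collector = new RhsTermCollector();
        for (List<Instruction> instructions : trieValues) {
            for (Instruction instruction : instructions) {
                instruction.accept(collector);
            }
        }
        return collector.terms;
    }
}
```

The visitor keeps the type dispatch out of the preloader itself: adding a new instruction type means adding one visit method rather than another instanceof check.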

Note that instructions and the Querqy query object model are search-engine independent. Maybe you'd want to pass an abstract Preloader to querqy.rewrite.commonrules.SimpleCommonRulesRewriterFactory and provide the implementation on the querqy-lucene module.

To create the cached Lucene query, have a look at querqy-lucene/querqy.lucene.rewrite.LuceneQueryBuilder, and at querqy-solr/querqy.solr.QuerqyDismaxQParser for passing in analyzers and params.

We'll have to clone the cached query when executing it later as boost factors might change per request. This will still be cheaper than running the full analysis chain.
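The clone-per-request step could look roughly like this. It is a plain-Java sketch under stated assumptions: `CachedTermQuery` and `withBoost` are hypothetical names, not Lucene or Querqy API, and the cached object is treated as immutable so that requests cannot affect each other.

```java
final class CachedQueryCloneSketch {

    /** Hypothetical stand-in for a cached, already analyzed term query. */
    static final class CachedTermQuery {
        final String field;
        final String term;
        final float boost;

        CachedTermQuery(String field, String term, float boost) {
            this.field = field;
            this.term = term;
            this.boost = boost;
        }

        /** Returns a copy with the request's boost; the cached instance stays unchanged. */
        CachedTermQuery withBoost(float requestBoost) {
            return new CachedTermQuery(field, term, requestBoost);
        }
    }
}
```

Copying only the small query object per request leaves the expensive part, the analysis result, shared in the cache.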

renekrie · 2016-07-25T13:34:01Z

This has been available for some time and is now described here: https://github.com/renekrie/querqy#advanced-configuration-caching .

renekrie closed this as completed Jul 25, 2016
