Evaluate rules on "startup" #15
Comments
@renekrie Do you have a hint where to start? Then I'd give it a try.
The idea would be to partially evaluate queries that constitute the right-hand side of Common Rules, for example, 'personal computers' in
The following information could be loaded and cached on startup:
Loading this info on startup would save query time, especially if there are many query fields and if a Common Rule adds many query terms. Adding 10 synonyms with 10 query fields would result in 100 additional TermQueries. We would always have to go through the Lucene analysis in order to create these queries, regardless of Solr caching, and some of these TermQueries would never match any document. Doing the analysis on startup and caching the TermQueries (or BooleanQueries) together with the DF (document frequency) information should therefore reduce query execution time later.

Where to start: I've created a branch 'crpreload' for the development of this feature. In this branch, querqy-core/querqy.trie.TrieMap has already been made an Iterable over its values. TrieMap contains the mapping between an input and the resulting instructions, and it is filled on startup. You can thus iterate over its values (getting you

Note that instructions and the Querqy query object model are search-engine independent. Maybe you'd want to pass an abstract Preloader to querqy.rewrite.commonrules.SimpleCommonRulesRewriterFactory and provide the implementation in the querqy-lucene module. To create the cached Lucene query, have a look at querqy-lucene/querqy.lucene.rewrite.LuceneQueryBuilder and at querqy-solr/querqy.solr.QuerqyDismaxQParser for passing in analyzers and params.

We'll have to clone the cached query when executing it later, as boost factors might change per request. This will still be cheaper than running the full analysis chain.
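The preload-and-clone idea described above can be illustrated with a minimal sketch. It uses plain Java only and does not touch the real Querqy or Lucene APIs: `CachedTermQuery`, `RulePreloader`, and the `docFreqLookup` callback are hypothetical names invented for this example, standing in for a cached TermQuery, the Preloader, and the analysis plus DF lookup respectively.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;

// Hypothetical stand-in for a cached term query plus its document frequency (DF).
final class CachedTermQuery {
    final String field;
    final String term;
    final int docFreq;
    final float boost;

    CachedTermQuery(String field, String term, int docFreq, float boost) {
        this.field = field;
        this.term = term;
        this.docFreq = docFreq;
        this.boost = boost;
    }

    // Returns a copy carrying the per-request boost; the cached instance is not modified.
    CachedTermQuery withBoost(float requestBoost) {
        return new CachedTermQuery(field, term, docFreq, requestBoost);
    }
}

// Hypothetical preloader: does the expensive work once per rule at startup.
final class RulePreloader {
    private final Map<String, List<CachedTermQuery>> cache = new HashMap<>();
    private int analysisCalls = 0;

    // Called at startup for each rule input. docFreqLookup stands in for
    // running the analysis chain and reading the DF of (field, term).
    void preload(String ruleInput, List<String> synonyms, List<String> fields,
                 ToIntBiFunction<String, String> docFreqLookup) {
        List<CachedTermQuery> queries = new ArrayList<>();
        for (String synonym : synonyms) {
            for (String field : fields) {
                analysisCalls++; // one analysis per synonym/field pair
                int df = docFreqLookup.applyAsInt(field, synonym);
                if (df > 0) {
                    // Only keep term queries that can match at least one document.
                    queries.add(new CachedTermQuery(field, synonym, df, 1.0f));
                }
            }
        }
        cache.put(ruleInput, Collections.unmodifiableList(queries));
    }

    // Called per request: copies the cached queries and applies the request boost.
    List<CachedTermQuery> queriesFor(String ruleInput, float requestBoost) {
        List<CachedTermQuery> copies = new ArrayList<>();
        for (CachedTermQuery q : cache.getOrDefault(ruleInput, Collections.emptyList())) {
            copies.add(q.withBoost(requestBoost));
        }
        return copies;
    }

    int analysisCalls() {
        return analysisCalls;
    }
}
```

With 10 synonyms and 10 fields, `preload` performs the 100 analysis steps once; each later call to `queriesFor` only copies the surviving entries and sets the boost. Whether dropping zero-DF terms at startup is safe depends on how often the index changes, so that part is an assumption of the sketch rather than something the thread settles.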
This has been available for some time and is now described here: https://github.com/renekrie/querqy#advanced-configuration-caching
Rules should be evaluated when they're loaded/parsed to improve performance at query time.