Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Opening an Interface Carring SearchContext and Query for Pre-Execution between QueryBuilder and QueryPhaseSearcher #13320

Open
conggguan opened this issue Apr 22, 2024 · 5 comments
Labels
enhancement Enhancement or improvement to existing feature or request Plugins RFC Issues requesting major changes Roadmap:Search and ML Project-wide roadmap label Search:Query Insights Search Search query, autocomplete ...etc

Comments

@conggguan
Copy link

conggguan commented Apr 22, 2024

Description

This RFC is related opensearch-project/neural-search#646.

Intro

Briefly, we want enhance the performance, divide a neuralSparseQuery to a less computationally expensive query and a more expensive query, we use the former to get a TopDocs firstly, and then use the more computationally expensive query as a rescoreQuery to adjust the score of the results.

limitation

There have lots of entry point to process the query, such as doRewrite(QueryRewriteContext queryShardContext), doToQuery(QueryShardContext queryShardContext). But each of them have no access to searchContext and whole query information. When we build a new feature which needs considering the global compound query information. Beside NeuralSparseQuery, HybridQuery also needs a interface carrying searchContext and whole query.

Existing compromise solution

@Override  
public Optional<QueryPhaseSearcher> getQueryPhaseSearcher() {   
    return Optional.of(new HybridQueryPhaseSearcher());  
}

The current solution is that register a QueryPhaseSearcher by the interface of SearchPlugin. So that we can get a searchWith entry point, all of query will reach this function and with searchContext, in this function, we can do what we want to do of searchContext and query.

public boolean searchWith(  
    final SearchContext searchContext,  
    final ContextIndexSearcher searcher,  
    final Query query,  
    final LinkedList<QueryCollectorContext> collectors,  
    final boolean hasFilterCollector,  
    final boolean hasTimeout  
){
	// do any preprocess we need...
	preProcess(query, searchContext);
	// do the origin searchWith
	super.searchWith();
}

pain point

  1. We take over the searchWith function, but what we do is just a preProcess without any search logic.
  2. Any query will reach here, but we don't need the query doesn't be support.
  3. One cluster, One QueryPhaseSearcher, this makes a limitation for different plugins including customQueryPhaseSearcher.
  4. When reach this, the query's error will not be explicit error any more , but a query shard failed instead.

Describe the solution you'd like

Open a plugin preProcess API that can do some custom progress before QueryPhaseSearcher.searchWith.

Design draft

Earlier AggregationProcessor

In QueryPhaseSearcher, we can Override aggregationProcessor() interface to add a preProcess or postProcess. We can do a similar but earlier interface in QueryPhase.

	// QueryPhase

	// plugin's custom preProcess
	customProcessor().preProcess();

	// origin logic ....
	final AggregationProcessor aggregationProcessor = queryPhaseSearcher.aggregationProcessor(searchContext);  
	aggregationProcessor.preProcess(searchContext);  
	boolean rescore = executeInternal(searchContext, queryPhaseSearcher);
	// origin logic ....

	// plugin's custom postProcess
	customProcessor().preProcess();

A TwoPhaseScorer

There is a class named TwoPhaseIterator, but indeed, it's two phase means there has two phases to determine the next iterate's step.
Build a Scorer that can use score result of first loop and do a second score process. It will make's more easy to do some job about multi-stage query.

open the access of searchContext in QueryBuilder's doToQuery() and add a proProcess interface for searchContext

We can register some proProcess lambda function of this funciton.

Related component

QueryPhaseSearcher, SearchPlugin

@conggguan conggguan added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 22, 2024
@conggguan conggguan changed the title [Feature Request] Open More Flexable Interface in QueryBuilder Stage [Feature Request] Open More Flexable Interface between QueryBuilder and QueryPhaseSearcher Apr 22, 2024
@conggguan conggguan changed the title [Feature Request] Open More Flexable Interface between QueryBuilder and QueryPhaseSearcher [Feature Request] Opening an Interface Carring SearchContext and Query for Pre-Execution between QueryBuilder and QueryPhaseSearcher Apr 24, 2024
@conggguan conggguan changed the title [Feature Request] Opening an Interface Carring SearchContext and Query for Pre-Execution between QueryBuilder and QueryPhaseSearcher [RFC] Opening an Interface Carring SearchContext and Query for Pre-Execution between QueryBuilder and QueryPhaseSearcher Apr 24, 2024
@peternied peternied added Search Search query, autocomplete ...etc Search:Query Insights RFC Issues requesting major changes and removed untriaged labels Apr 24, 2024
@peternied
Copy link
Member

[Triage - attendees 1 2 3 4 5 6 7]
@conggguan Thanks for creating this RFC looking forward to seeing how this evolves.

@andrross andrross added the Roadmap:Search and ML Project-wide roadmap label label May 14, 2024
@jed326
Copy link
Collaborator

jed326 commented May 14, 2024

Hey @conggguan thanks for opening the issue. I definitely agree the 1 QueryPhaseSearcher limit is a pain point that we need to address and I think the main problem is that the function of QueryPhaseSearcher::searchWith is too broad currently.

Like you mentioned, there are cases such as the neural-search plugin where basically the HybridQueryPhaseSearcher::searchWith is used to perform some preprocessing with the Query and SearchContext but there are also cases like the ConcurrentQueryPhaseSearcher where it's used to execute a completely different search path. Cases like the latter are why we don't support multiple QueryPhaseSearchers today but cases like the former seem pretty reasonable to move out of the QueryPhaseSearcher construct entirely.

I think the high level idea of having some sort of preprocess(Query, SearchContext) is useful, but I'm wondering if you have thought about how to make this support multiple plugins? Especially if we want to be able to mutate SearchContext from this preprocess entry point then it seems like we would need to have some notion of ordering for the processors.

@jed326
Copy link
Collaborator

jed326 commented May 14, 2024

@conggguan I read through the related issue opensearch-project/neural-search#646 and from that it looks like you are exploring the search pipelines approach instead, does that mean you do not need this functionality?

Regardless it would be nice to hear if you have any thoughts on how to compose the preProcess steps if there are multiple plugins providing their own preprocess implementation. Thanks!

@conggguan
Copy link
Author

Hi @jed326
Thanks your reply.
As you mentioned, I am exploring the search pipelines approach for now, but this is more of a short-term plan. In the future, as the Neural-Search plugin evolves, I anticipate there will be an increasing need for preprocessing requirements.

And for your concerns of how to compose the different preprocess of plugins, I gave it some thought and came up with the following possibilities.

PreProcess Level

We can manage and prioritize different action by different level.
For example,

Level Action Limitation
Basic Level Only validation and error reporting Only Read
Mutate SubQuery Do some change of subQuery Only for registered query type, and only one preprocess for one query type.
Mutate QueryCollectorContext Change/add QueryCollector
Mutate SearchContext Add/change rescore/size/searchExt

Consider the compatibility, I think can use a similar way, gradually open up some interfaces for developers to use.

@jed326
Copy link
Collaborator

jed326 commented May 15, 2024

Thanks @conggguan. I'm also working on a feature (RFC to be published soon) where we need to modify the QueryCollectorContext and we need to work around the 1 QueryPhaseSearcher limitation as well. We were also thinking of a priority based approach similar to what you described where we can perform the preprocessing based on priority, sort of similar to what ActionFilters do today:

/**
* The position of the filter in the chain. Execution is done from lowest order to highest.
*/
int order();

But aside from that I think one of the big problems still is that the current searchWIth plugin interface is very broad and should only be used for changing the actual search execution path while any query/context processing should happen outside of it because only 1 plugin should be changing the search execution path but multiple plugins can introduce their own processing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Plugins RFC Issues requesting major changes Roadmap:Search and ML Project-wide roadmap label Search:Query Insights Search Search query, autocomplete ...etc
Projects
Status: 🆕 New
Development

No branches or pull requests

4 participants