
[RFC] Improving Search relevancy through Generic Reranker interfaces #485

Closed
HenryL27 opened this issue Nov 3, 2023 · 53 comments
Labels
Features Introduces a new unit of functionality that satisfies a requirement RFC

Comments

@HenryL27
Contributor

HenryL27 commented Nov 3, 2023

Problem statement

Addresses #248

Reranking the top search results with a cross-encoder has been shown to improve search relevance rather dramatically. We’d like to do that. Furthermore, we’d like to do it inside of OpenSearch, for a couple of reasons: 1/ it belongs there - it’s a technique to make your search engine search better, and 2/ to integrate with RAG it needs to come after the initial retrieval (obviously) and before generation - the retrieval that augments the generation needs to be as good as possible - so it should live in OpenSearch.

Goals

  • Users will be able to create reranking search pipelines in OpenSearch to rerank search results using a cross encoder.
  • Users will be able to upload, deploy, and predict with cross-encoder models with ml-commons ([RFC] Support more local model types ml-commons#1164)

Non-goals

  • Query plans
  • recursive reranking
  • remote LLM-based reranking
  • prompt-engineering your cross-encoder

Proposed solution

Reranking will be implemented as a search response processor, similar to RAG. Cross-Encoders will be introduced into ml-commons to support this.

Architecture / Rerank Search Path

[Image: Reranker architecture diagram]

Rest APIs

Create Rerank Pipeline

PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": id of TEXT_SIMILARITY model [required]
        },
        "context": {
          "document_fields": [ "title", "text_representation", ...]
        }
      }
    }
  ]
}

"ml_opensearch" refers to the kind of rerank processor.
"model_id" should be the id of the text_similarity model in ml-commons
"context" tells the pipeline how to construct the context it needs in order to rerank
"document_fields" are a list of fields of the document (in _source or fields) to rerank based on. Multiple fields will be concatenated as strings.

Query Rerank Pipeline

Provide the reranker's params to the search pipeline as a search ext. Use either "query_text", which acts as the direct text to compare all the docs against, or "query_text_path", which is a path pointing to the query text at another location in the search request body.

POST index/_search?search_pipeline=rerank_pipeline
{
  "query": {...}
  "ext": {
    "rerank": {
      "query_context": {
         "query_text | query_text_path": <the query text to use for reranking | 
                                         the path to the query text to use for reranking>
      }
    }
  }
}

For example, with a neural query we might have

"query_text_path": "query.neural.embedding.query_text"

The rerank processor will rescore all of the search results and then sort them based on the new scores.

Upload Cross Encoder Model

POST /_plugins/_ml/models/_upload
{
  "name": "model name" [required],
  "version": "1.0.0 or something" [required],
  "description": "description" [required],
  "model_format": "TORCH_SCRIPT" [required],
  "function_name": "TEXT_SIMILARITY" [required],
  "model_content_hash_value": "hash browns" [required],
  "url": "https://url-of-model" [required]
}

This is not a new API and all the other model-based APIs should still work for the cross encoder model/function name with minimal work to integrate.

Predict with Cross Encoder Model

See the Cross-Encoder PR
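
For orientation, a rough sketch of what a predict call against a TEXT_SIMILARITY model might look like (the exact request and response shapes are defined in that PR, so treat the field names here as illustrative):

POST /_plugins/_ml/models/<model_id>/_predict
{
  "query_text": "Oh where is my hairbrush",
  "text_docs": ["first document text", "second document text"]
}

The response would carry one relevance score per (query_text, text_doc) pair, which is what the rerank processor uses to rescore the hits.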

Risks

  • Query latency. Cross-encoder latency can be large: one inference costs about as much as an embedding inference, but every rerank query makes k of them, so total cost depends on hardware, batch size, GPU size, etc., and grows quickly if you’re trying to rerank a lot of documents.

Implementation Details

The overall reranking flow will be:

  1. Generate a context object used to rerank documents
  2. Use that context to rescore them
  3. Sort the documents according to the new scores

We will implement two main base classes for this work: RerankProcessor and ContextSourceFetcher.

ContextSourceFetcher
This will retrieve the context needed to rerank documents. Essentially, step 1. A particular rerank processor may make use of several of these, and they can get their context from any source.

RerankProcessor
Orchestrates the flow: it combines all the context from the ContextSourceFetchers, generates scores for the documents via an abstract score method, and then does the sorting, as sketched below.
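
For illustration, here is a minimal, self-contained Java sketch of how these two pieces might fit together. It deliberately uses simplified placeholder types (ScoredDoc and plain maps) instead of the real OpenSearch search-pipeline plumbing, and the method names and signatures are illustrative assumptions, not the final interfaces:

import java.util.*;

// Simplified sketch of the proposed flow; the real processor plugs into the
// OpenSearch search-pipeline APIs. All names and signatures here are assumptions.
record ScoredDoc(String id, Map<String, Object> fields, float score) {}

interface ContextSourceFetcher {
    // Step 1: contribute context (document fields, query text, etc.) to a shared map.
    void fetchContext(Map<String, Object> searchRequest, List<ScoredDoc> hits, Map<String, Object> context);
}

abstract class RerankProcessor {
    private final List<ContextSourceFetcher> fetchers;

    protected RerankProcessor(List<ContextSourceFetcher> fetchers) {
        this.fetchers = fetchers;
    }

    // Step 2: subclasses decide how to score hits given the combined context,
    // e.g. by calling a TEXT_SIMILARITY model in ml-commons.
    protected abstract float[] score(List<ScoredDoc> hits, Map<String, Object> context);

    public List<ScoredDoc> rerank(Map<String, Object> searchRequest, List<ScoredDoc> hits) {
        Map<String, Object> context = new HashMap<>();
        for (ContextSourceFetcher fetcher : fetchers) {
            fetcher.fetchContext(searchRequest, hits, context);
        }
        float[] newScores = score(hits, context);
        List<ScoredDoc> rescored = new ArrayList<>();
        for (int i = 0; i < hits.size(); i++) {
            ScoredDoc hit = hits.get(i);
            rescored.add(new ScoredDoc(hit.id(), hit.fields(), newScores[i]));
        }
        // Step 3: sort by the new scores, descending.
        rescored.sort(Comparator.comparingDouble(ScoredDoc::score).reversed());
        return rescored;
    }
}

A cross-encoder implementation would then be a RerankProcessor subclass whose score method calls an ml-commons TEXT_SIMILARITY model, with a DocumentContextSourceFetcher and a QueryContextSourceFetcher supplying the context, as the Extensibility section below describes.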

Extensibility

It is my hope that these interfaces are simple enough to extend and configure that we can create a rich ecosystem of rerank processors. To implement the cross-encoder reranker, all I need to do is create a NlpComparisonReranker subclass that says "score things with ml-commons", a DocumentContextSourceFetcher subclass that retrieves fields from documents, and a QueryContextSourceFetcher that retrieves context from the query ext.

If I wanted to implement the Amazon Personalize reranker of the search-processors repo, I would implement an AmazonPersonalizeSourceContextFetcher and an AmazonPersonalizeReranker, which only have to do the minimal amount of work to make the logic functional.

I also think it should be possible to incorporate some of the work from the Score Normalization and Combination feature, but that's outside the scope of this RFC.

Alternative solutions

Rerank Query Type

Another option is to implement some kind of rerank query. This would wrap another query and rerank it. For example

POST index/_search
{
  "query": {
    "rerank": {
      "query": {
	      "neural": {
	        "embedding": {
	          "query_text": "Oh where is my hairbrush",
	          "k": 100,
	          "model_id": "embedding model id"
	        }
        }
      },
      "top_k": 25,
      "model_id": "reranker model id",
      "context_field": "text_representation",
      "query_text": "Oh where is my hairbrush"
    }
  }
}

Pros:

  • Abstracts away the search pipeline layer
  • Potentially allows composing reranks in other kinds of boolean and hybrid queries

Cons:

  • Rerank should be a post-search step. Allowing composition with other kinds of queries defeats the purpose of reranking
  • Rerank probably needs to be global - after the fetch phase
  • I personally think this api is kinda messy. Can probably be improved, but I’m not sure how
@dylan-tong-aws

dylan-tong-aws commented Nov 4, 2023

Hi Henry,

Thanks, for putting this together. I have a few questions...

  1. Can you clarify how you propose to support pre- and post-request format processing in this pipeline? Is this built into the pipeline, or did you envision this being part of the connector? It would be great to have a search processor that provides an easy way to configure JSON-to-JSON transforms to simplify the effort of integrating with various downstream APIs and models.

  2. What controls does the user have around configuring how results are sent to the re-ranker? Let's say the re-ranker isn't a managed API and it's hosted on a model server--are you proposing any controls like the ability to send results as async mini-batches and perform post-processing like merge and sort?

  3. What controls does the user have for configuring what data gets sent to the re-ranker model? There are slight variations in re-ranking use cases in terms of what inputs are passed to the re-ranker model. In some cases, it's just the search results. Other use cases require the query context.

@HenryL27
Contributor Author

HenryL27 commented Nov 4, 2023

Thanks @dylan-tong-aws. I have a few responses!

  1. My mental model is that JSON-to-JSON search-result transforms should belong in their own response processor. This processor will look for a source field (specified at pipeline creation) and package that off to the reranker with the query_text. If you want your context text to look in a certain way, throw a processor before the reranker processor that performs your transformation.
  2. I'm currently adding ml-commons support for cross-encoder (text similarity) models; that entails a new kind of MLInput that contains a list of text pairs that the model will evaluate the relevance between. What this processor will do (sorry if that was unclear from the RFC) is use this interface to re-score the top k search results, and then re-sort them inside the processor itself. I understand that Cohere has a rerank API? I'm not using it by default here. But hopefully the ml-commons text similarity interface I'm building will integrate via connector with that.
  3. Cross encoders (the case I'm trying to support here) always require the search results and the query. See point 1 for what I think about customizing what the document text looks like. As for the query text, the user specifies it, so that should be sufficiently controllable. This work does not intend to support reranking with only the documents, nor reranking based on user context from other sources. I'm not convinced that there's a general rerank interface sufficiently different from the response processor interface to justify such a thing existing, so this is not trying to do that.

This is a narrow use-case. Just take all your docs and ask a (text-to-float) language model how similar they are. Then sort based off of that. Nonetheless, this alone can give like a 15-20% boost to recall in the top couple of results, so I think it's worth knocking out.

p.s. OK, I read up on the Cohere rerank API, and it should be able to connect to this work fairly readily.

@navneet1v navneet1v added Features Introduces a new unit of functionality that satisfies a requirement RFC and removed untriaged labels Nov 4, 2023
@navneet1v
Collaborator

Hi,
@HenryL27 thanks for creating the RFC. I have some suggestions and comments:

  1. For all the new API inputs you have provided, can you please indicate which parameters are required and which are not?
  2. Why do we need a top_k parameter in both the processor and the ext?
  3. When the documents are re-ranked, what will happen to the scores of the documents?
  4. context_field in the response processor does not reflect what the actual value will be. Please rename it.
  5. When working with vector search, the recommendation is not to fetch _source, since fetching the vector field adds latency. Rather, it is advised to fetch only the fields that are required. How will re-ranking work in that case?
  6. The risk that has been added as part of the RFC seems like a big one; do you have any proposal for how latencies can be reduced?
  7. Will the cross encoder model be a local model or a remote model?
  8. As part of the RFC, can you add some open-source cross-encoder models?
  9. Can you please explain why we need Python extensions for the cross-encoder model? Given that OpenSearch with ML Commons provides support for models, why do we need this?
  10. For the Query Rerank Pipeline I see two options; please indicate which is the recommended solution.

@HenryL27
Contributor Author

HenryL27 commented Nov 5, 2023

@navneet1v thanks!

  1. done
  2. I'm not sure we need it in both places. In most cases you'll probably just set it in the processor and forget about it. But I thought that maybe if you know you need to rerank a lot of things for a particular query (or you only need to rerank a few things for a particular query) it would be nice if you had an override switch.
  3. The scores will be overwritten with the new scores from the cross encoder. Yes, this throws out any previous normalization, and yes, if you only rerank some of the documents you can get weird inconsistencies. I can normalize the cross-encoder scores maybe? I'm not sure it's worth it though. If you have a good idea of what the behavior should be I'm all ears
  4. I'm not sure I understand what you mean by this. context_field tells the processor what text to send from each document as context against which to compare the query. A cross encoder takes (query, context) pairs. So we need to specify what the query and contexts are. What did you have in mind?
  5. You wouldn't rerank based on a vector; it only makes sense to rerank based on semantically meaningful text. Presumably, that semantically meaningful text that you're reranking is also the semantically meaningful text that you care about as a search user; though I recognize that that's an assumption. If you don't fetch the context field, then the reranking processor should either do nothing or error out, since there's nothing to rerank.
  6. This is the purpose of top_k (and maybe it should be required bc of this?) - all we can really do is force the user to think about it. If the user tries to rerank 10,000 documents, that's kinda on them, ya know? We can optimize the hell out of this code, but the performance bottleneck is the cross encoder model and there's really not much we can do about that besides make it clear that this can be an issue.
  7. Currently building cross encoders as local, but it should be possible to Connect to the Cohere endpoint as well (does anyone else offer a rerank API?)
  8. Sure thing. For my own testing I've been playing with the BGE reranker. Once I have cross-encoder support in ml-commons I'll at least publish some upload recipes, and try to put something in the pretrained model archive.
  9. Hah! No, we don't need python extensions. Someone asked me if we could use python extensions for this, and that section of the rfc is me saying "we could, but we really shouldn't."
  10. I think the best option is option 1 (fully write your query text). I included option 2 because people have gotten indignant in the past when we ask them to rewrite their query text. In theory, we could support both options (make it like an either/or - you must have a query_text xor a query_text_path), but that may make the API unnecessarily overcomplicated. Most queries will be constructed by code anyway, where putting a variable in two places instead of one is trivial.

@vamshin
Member

vamshin commented Nov 6, 2023

thanks @HenryL27. Few comments/questions

  • It looks like this RFC focuses only on local model support for cross-encoder reranking. To be a complete solution, can we also incorporate support for remote models? To me, supporting both local and remote models should be a goal.

  • Rescoring only a subset of results definitely leads to inconsistencies, as you called out. This should be addressed as part of the solution. Maybe you can give the lowest possible score to non-competing docs? Let's take a goal of making the reranking processor leave results consistent.

  • We seem to name the processor neural_rerank. While this name suggests a generic reranker, the RFC focuses only on cross-encoder-based reranking. Are we sure the current interface can support different techniques in the future? If not, should we call it something like neural_crossencoder_rerank to avoid backward compatibility issues as we try to make it more generic? I am not a fan of creating a processor for every use case; I would rather evaluate the current approach to make it more generic.

@HenryL27
Contributor Author

HenryL27 commented Nov 6, 2023

@vamshin thanks

  • Yep, I'll work on making it compatible with Cohere's cross-encoder API. I don't know of any other remote APIs that do reranking but I think they'll all look relatively similar. May require some Connector finagling but otherwise this should work.
  • Yeah, I think spreading the lowest score to the other docs (or maybe minus a delta?) is probably the behavior we want. Another option I considered was introducing the rerank-score as another search hit field altogether, maybe _rescore so we don't override the original _score value, wdyt?
  • How about
PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "cross-encoder": {
          "top_k": int (how many to rerank) [optional],
          "model_id": id of cross-encoder [required],
          "context_field": str (source field to compare to query) [required]
        }
      }
    }
  ]
}

Implementation-wise I think this becomes a single "rerank" processor and depending on the type ("cross-encoder" here) it casts itself to whatever it needs to be or something

@navneet1v
Collaborator

@HenryL27

I'm not sure we need it in both places. In most cases you'll probably just set it in the processor and forget about it. But I thought that maybe if you know you need to rerank a lot of things for a particular query (or you only need to rerank a few things for a particular query) it would be nice if you had an override switch.

If a user needs to do this, they can add the processor in the search request itself, so I would not provide multiple overrides.

The scores will be overwritten with the new scores from the cross encoder. Yes, this throws out any previous normalization, and yes, if you only rerank some of the documents you can get weird inconsistencies. I can normalize the cross-encoder scores maybe? I'm not sure it's worth it though. If you have a good idea of what the behavior should be I'm all ears

As these inconsistencies arise because of the top_k parameter, I would not even put that parameter in the processor. So if the user is getting X documents from OpenSearch, we should re-rank all of them.

I'm not sure I understand what you mean by this. context_field tells the processor what text to send from each document as context against which to compare the query. A cross encoder takes (query, context) pairs. So we need to specify what the query and contexts are. What did you have in mind?

The main point was that the name context_field is very generic. Let's rename it.

You wouldn't rerank based on a vector; it only makes sense to rerank based on semantically meaningful text. Presumably, that semantically meaningful text that you're reranking is also the semantically meaningful text that you care about as a search user; though I recognize that that's an assumption. If you don't fetch the context field, then the reranking processor should either do nothing or error out, since there's nothing to rerank.

I am not saying re-rank based on a vector field. When running a query, a customer may set _source to false (which is a very standard use case for vector search) and request fields: ['title', 'description'], etc. In that case _source will be empty, but there will be an array of fields in the response. So we should not rely only on _source.

This is the purpose of top_k (and maybe it should be required bc of this?) - all we can really do is force the user to think about it. If the user tries to rerank 10,000 documents, that's kinda on them, ya know? We can optimize the hell out of this code, but the performance bottleneck is the cross encoder model and there's really not much we can do about that besides make it clear that this can be an issue.

My recommendation would be that these re-ranker models should run outside the OpenSearch cluster as remote models, where users can use GPU-based instances for re-ranking. The reason is that if the latency for re-ranking is in the hundreds of milliseconds for, say, 100 records, the feature becomes unusable.

Currently building cross encoders as local, but it should be possible to Connect to the Cohere endpoint as well (does anyone else offer a rerank API?)

We should explore this more. Maybe our local models could be deployed in other services like SageMaker, etc., and not specifically Cohere.

@HenryL27
Contributor Author

HenryL27 commented Nov 6, 2023

@navneet1v

If a user needs to do this, he can add the processor in the search request itself, rather than this, so I would not provide multiple overrides.

True, this is possible. But in a case where I already have a rather complicated search pipeline I might not want to rewrite it all, and I'm not sure that saying "if you want to use a different value for top_k then rewrite your processor" actually makes the API cleaner. Maybe it can just be a required param of ext and leave it out of the processor definition entirely?

As this inconsistencies are arriving because of topk parameter, I would not even put that parameter in the processor. So if user is getting X documents from opensearch, we should re-rank all of them.

Maybe. I guess the assumption with reranking is generally that anything not in the top k is irrelevant and therefore doesn't need to be reranked; as such why return it in the first place? That said, you might want to see things that are not in the top k - in high-latency cases where reranking is constrained, or in testing/tuning cases where you want to see what the reranker doesn't get to see to troubleshoot. We could also take @vamshin's suggestion and fix the inconsistencies by rescoring the docs outside of the top k or something.

The main point in this was the name context_field is very generic. Lets rename this.

Gotcha. Do you have a suggestion? I'm following what @austintlee did in the RAG processor.

I am not saying re-rank based on vector field. When doing query customer may put _source as false(which is very standard usecase for vector search) and make fields:['title', 'description'] etc. In that case _source will be empty, but there will be an array of fields in the response. So we should not just rely on the _source.

Huh, I didn't know you could do this! I guess we'll look for the field in the "fields" array then? Those fields are still (key, value) pairs, right? Should be easy to look at both, then.

Regarding remote models, yep, I'm looking into that. Fundamentally the Connector interface should handle all the juicy api discrepancies between remote models. I'll make sure that the text-similarity model type I'm adding to ml-commons can talk to connectors and then we should be all good, right?

@navneet1v
Collaborator

We could also take @vamshin's suggestion and fix the inconsistencies by rescoring the docs outside of the top k or something.

I don't completely agree with this thought. We should be consistent: either we re-rank k documents and return only k, or we re-rank all the documents and return all of them. Being in a middle state is bad. One way to solve this is with an OverSampling processor: the customer asked for X, we retrieve, let's say, 2 * X, re-rank all 2X documents, and return X documents back.
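
For illustration, such a pipeline could look roughly like the following - the oversample request processor, its sample_factor parameter, and a truncate_hits response processor are assumptions here, not settled APIs:

PUT /_search/pipeline/oversample_and_rerank
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 2.0
      }
    }
  ],
  "response_processors": [
    {
      "rerank": {
        ...
      }
    },
    {
      "truncate_hits": {}
    }
  ]
}

The request processor multiplies the requested size before retrieval, the rerank processor rescores everything it sees, and the final processor trims the hits back down to the originally requested size.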

Gotcha. Do you have a suggestion? I'm following what @austintlee did in the RAG processor.

Re-ranking field is one name that comes to my mind.

Huh, I didn't know you could do this! I guess then we'll look for the field in the "fields" array then? Those fields are still (key, value) pairs, right? Should be easy to look at both then.

Yes, they are key-value pairs. But what I meant to say was, we need to handle this use case too, because doing queries like this provides a latency boost.

I'll make sure that the text-similarity model type I'm adding to ml-commons can talk to connectors and then we should be all good, right?

Yeah, maybe. Will wait for that to be out.

@dylan-tong-aws

dylan-tong-aws commented Nov 7, 2023

  • Yep, I'll work on making it compatible with Cohere's cross-encoder API. I don't know of any other remote APIs that do reranking but I think they'll all look relatively similar. May require some Connector finagling but otherwise this should work.

@HenryL27, another scenario we're looking to support is for custom re-rank models that are hosted on an external model server like an Amazon SageMaker endpoint. The search pipeline will require more flexibility than what it takes to integrate with a managed API. The hosted model may simply be a classification/regression model that's trained for a re-ranking task (eg. XGBoost).

The gist is that we'll need some flexibility around the data transfer and protocol:

  1. Data transform (request/response): We need the ability to perform a request and response data transform within the search pipeline or a model serving script on the external endpoint. The latter option is covered by the user. What you suggest around a generic JSON-to-JSON request processor would make it easier to package all the required functionality within an OpenSearch query workflow.

  2. Data exchange: Unlike the managed API, which may have more sophisticated functionality built into it, a hosted model may require multiple inference calls per query. The hosted model is likely suited to scoring a mini-batch of search results at a time, so multiple async mini-batches might have to be performed to score results once "k" reaches a certain size. We need to do further research to determine the importance of more sophisticated scenarios. LTR, for instance, does shard-level re-ranking. We need to evaluate how critical this is and whether re-ranking as a post-process suffices. We also want to investigate how best to integrate LTR ranking models into this pipeline.

@dylan-tong-aws

dylan-tong-aws commented Nov 7, 2023

We should be consistent either by saying we are going to re-rank k documents and return only K, or we are going to re-rank all the documents and return all of them.

This should suffice. It's generally how second stage re-ranking works, and what we were planning to support. I can re-validate this with our customers--I didn't receive requirements to normalize the re-scored results with results that weren't re-scored.

@vamshin
Member

vamshin commented Nov 7, 2023

@HenryL27

Yeah, I think spreading the lowest score to the other docs (or maybe minus a delta?) is probably the behavior we want. Another option I considered was introducing the rerank-score as another search hit field altogether, maybe _rescore so we don't override the original _score value, wdyt?

I like this idea. It is also easier to debug when we have both scores. The only concern/question I have is that it might impact customers using existing OpenSearch clients that do not know about the rescore field. We may need to validate this.

PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "cross-encoder": {
          "top_k": int (how many to rerank) [optional],
          "model_id": id of cross-encoder [required],
          "context_field": str (source field to compare to query) [required]
        }
      }
    }
  ]
}

This LGTM! This can be the direction if rerankers cannot be generic

@dylan-tong-aws

I like this idea. It is also easier to debug when we have both scores. Only concern/question I have is, it might impact customers using existing OpenSearch clients which do not know about rescore fields ? We may need to validate this

@vamshin, what are your thoughts on @navneet1v's point about combining re-rank and original scores? I am not aware of use cases that require some way to normalize and combine scores. As far as I know, customers just expect to re-rank "k" results or default to all the results retrieved by the initial retrieval. There's no need for anything fancy.

@navneet1v
Collaborator

@vamshin, what are your thoughts on @navneet1v about combining re-rank and original scores?

@dylan-tong-aws I am not sure if you understood what I was trying to say, but it's definitely not this.

What I am trying to say is: let's say a customer retrieves 100 results and we re-rank only the first 50; then the scores of the first 50 documents and the later ones will not be consistent. We should be consistent in our result scores.

@dylan-tong-aws

dylan-tong-aws commented Nov 7, 2023

@navneet1v, right, so if a user says return K re-scored results, it just returns K results even if the first-stage retrieval had N > K results. It's my understanding that the proposal is to return N results and find a way to normalize the K re-scored results so they are consistent. I am in agreement that we can just return the K results. As far as I know, this sufficiently delivers on customer requirements.

@HenryL27
Contributor Author

HenryL27 commented Nov 7, 2023

Ok, consensus on the top k issue; can I get thumbs up?

We will simply rerank every search result that goes through the processor.

Top K is removed entirely. If you request 5000000 documents through a rerank processor and it kills your reranker, that's on you. (Doing this to an embedding ingest processor can also OOM your system, so I think that's okay)

@navneet1v
Collaborator

Yes, I am aligned with removing topK. If for any other reason a customer wants to fetch more results and only re-rank a few of them, they can use the Oversampling processor, as mentioned in my previous comment (#485 (comment)).

@dylan-tong-aws

Ok, consensus on the top k issue; can I get thumbs up?

We will simply rerank every search result that goes through the processor.

Top K is removed entirely. If you request 5000000 documents through a rerank processor and it kills your reranker, that's on you. (Doing this to an embedding ingest processor can also OOM your system, so I think that's okay)

This implementation still honors the search result limit safeguards and query timeout settings, correct?

@HenryL27
Contributor Author

HenryL27 commented Nov 7, 2023

I think so? That's more a question for Froh I think

@vamshin
Member

vamshin commented Nov 7, 2023

Aligned on removing topK to keep consistent results. Also this is not a one way door decision. If use cases arise to expose such params we can always revisit.

@HenryL27
Contributor Author

HenryL27 commented Nov 11, 2023

@vamshin @navneet1v Alright, here's a rough sketch of my low-level plan for generality here:
We'll introduce an interface called RerankProcessor that extends SearchResponseProcessor or something. We'll implement a RerankProcessorFactory that will construct various implementations of this interface as we come up with them. The interface (maybe actually ABC, idk) will have 3 important methods:

  1. abstract score(SearchResults, ScoringContext) - scores all search results given a context
  2. rerank(SearchResults, ScoringContext) - reranks the search results given a context
  3. abstract generateScoringContext(SearchResults, SearchQuery) - generate the context that the prev. two use

The default rerank implementation will simply call score to get new scores, replace them in the search results, and then re-sort the search results.
The default processSearchResults implementation will first generateScoringContext and then rerank.
ScoringContext is just gonna be like a <String, Object> map.

I think this should allow any implementable reranking processor to be implemented cleanly, and will align nicely with the new PUT /_search/pipeline API.

In the case of the cross encoder reranker the score method will call an ml-commons cross-encoder, and the generateScoringContext method will find the query_text field in the search query.

@HenryL27
Contributor Author

HenryL27 commented Nov 16, 2023

@navneet1v I ran a query where I asked for stuff as fields and it came back and told me not to do that as it would cost performance. I'd have to turn it into some kind of re-invertible inverted index or something? It seems to want to use fields for keyword fields, whereas typically for reranking you'll want to rerank based off of a more fuzzy similarity field, right? idk, I feel like I'm not understanding something. Is the fielddata=true index shenanigans more efficient than _source anyway? I'm also not convinced that this would ever be the bottleneck... idk, it's easy to implement so I will/did, it's just not quite adding up for me

@navneet1v
Collaborator

@HenryL27

I worry that all reranking options might not use a model. Maybe I have a reranker that attempts to fit as many individual documents into a given context window for a future rag step. Arguably that isn't reranking, but you can see there is potential for rerankers that do different things than simply compare a query to a set of fields

Here is my understanding: if we see this kind of use case, then we can have another processor called Non-ML-Re-ranker or some better name. But I don't see us creating re-ranker types based on model types, like a cross-encoder re-ranker or a Cohere re-ranker; that is too much granularity.

For the _search part of the API, do we need query_context layer? The current implementation simply looks for the path xor the text as fields of the rerank object

I didn't get this question.

Another worry. Models are gonna want to have different contexts. XGBoost, for example, will want a feature vector. Maybe that's constructed ahead of time in a JSON2JSON processor, but I think it would make sense for an XGBoost rerank processor to be configured at pipeline creation to construct such a vector.

This brings up an interesting question: who should own the responsibility of creating the feature vector, the Neural Search plugin or the ML Commons plugin?

Or a model that uses user information to help rerank. We have to tell it where to find that user info, no?

This is the reason why I was suggesting the context option in the API: so that we can add different context source fetcher options, like fetching data from a data source or from _source, etc. Currently I suggested only ranking_source_fields; in the future we can implement different context sources like, let's say, DDB, S3, MongoDB, etc.

Cross encoders can score the similarity of a pair of strings - what would be the expected behavior in this particular use-case if a user said to rerank on multiple fields?

I see 2 options here. In general, what I saw from Cohere is that if we want to provide more than 1 field value as context, we can concatenate the strings and pass them as 1 string. So we can go with this. Going forward, if we need different behavior or a different way of concatenating, we can provide those options in the processor, and the default behavior can be simple concatenation.

@HenryL27
Contributor Author

HenryL27 commented Dec 4, 2023

@navneet1v

  • I agree that specifying the model type is too much granularity. What about naming the subtype after the ml-algorithm/function name it uses? So this would be a text similarity re-ranker or something?

  • Here I'm just asking why the "query_context" layer can't be left out of this API

POST index/_search?search_pipeline=rerank_pipeline
{
  "query": {...}
  "ext": {
    "rerank": {
      "query_context": {
         "query_string": "", // optional
         "path": "" optional str (path in the search body to the query text) [required],
      }
    }
  }
}

instead taking something like

POST index/_search?search_pipeline=rerank_pipeline
{
  "query": {...}
  "ext": {
    "rerank": {
       "query_string": "", // optional
       "path": "" optional str (path in the search body to the query text) [required],
    }
  }
}
  • My take is that the ml-commons predict API should be a very thin wrapper around models themselves. So ml-commons would take a feature vector (or some kind of MLInput that represents that) and translate it into the form that the model wants. The decisions about what goes in the vector/input belong in neural search.

  • makes sense, ok

  • ok, will concat for now. I think there's some work in the pipes for some kind of prompt framework (which is essentially just f-strings) that maybe we can make use of in the future? Although maybe the best model is for that to just be another processor so we don't need to construct such a string at all

@navneet1v
Collaborator

navneet1v commented Dec 4, 2023

@HenryL27

Here I'm just asking why the "query_context" layer can't be left out of this API

POST index/_search?search_pipeline=rerank_pipeline
{
  "query": {...}
  "ext": {
    "rerank": {
      "query_context": {
         "query_string": "", // optional
         "path": "" optional str (path in the search body to the query text) [required],
      }
    }
  }
}

This one is a better abstraction. I am aligned on this.

Although maybe the best model is for that to just be another processor so we don't need to construct such a string at all

On this I am not that convinced about creating a processor. But we can defer this decision until the use case arrives. As of now, let's go with concatenation. This has quite a similarity with the Summarization Processor; there, too, we might want to summarize multiple fields, so we can think of a common generic way.

My take is that the ml-commons predict API should be a very thin wrapper around models themselves. So ml-commons would take a feature vector (or some kind of MLInput that represents that) and translate it into the form that the model wants. The decisions about what goes in the vector/input belong in neural search.

This is a valid point if we look at it just from the predict API standpoint. I am not sure whether, going forward, the predict API will be integrated with the Agents framework of ML Commons; if so, it becomes counterintuitive to call the predict API a thin wrapper, because then we could build a re-ranking agent to which we pass the search response, model, and other information, and it makes sure to return the final re-ranked results.

Again, it's not a use case for now, so let's park it for the future.

I agree that specifying the model type is too much granularity. What about naming the subtype after the ml-algorithm/function name it uses? So this would be a text similarity re-ranker or something?

For this, can you comment with the interface which you have in mind?

@HenryL27
Contributor Author

HenryL27 commented Dec 4, 2023

@navneet1v
example text similarity-based rerank API

PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "text_similarity": {
          "model_id": id of TEXT_SIMILARITY model [required],
        },
        "context": {
          "document_fields": [ "title", "text_representation", ...],
          ...
        }
      }
    }
  ]
}

In the future, when there are other function names in ml commons for other kinds of rerank models (or we wanna bypass ml-commons entirely) this is represented in the API

@navneet1v
Collaborator

@HenryL27

@navneet1v example text similarity-based rerank API

PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "text_similarity": {
          "model_id": id of TEXT_SIMILARITY model [required],
        },
        "context": {
          "document_fields": [ "title", "text_representation", ...],
          ...
        }
      }
    }
  ]
}

In the future, when there are other function names in ml commons for other kinds of rerank models (or we wanna bypass ml-commons entirely) this is represented in the API

I found 2 re-rankers that don't use ML models. Please check these: https://github.com/opensearch-project/search-processor/tree/main/amazon-personalize-ranking, https://github.com/opensearch-project/search-processor/tree/main/amazon-kendra-intelligent-ranking

I think we should see how we can merge all these interfaces, or whether the interface we are building is extensible enough to support those re-rankers in the future.

@HenryL27
Contributor Author

HenryL27 commented Dec 7, 2023

@navneet1v
Just glancing at these, I don't think it will be too difficult. Your interface would look something like (e.g. for personalize)

PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "amazon_personalize": {
          "campaign": blah,
          "iam_role_arn": blah,
          "recipe": blah,
          "region": blah,
          "weight": blah,
        },
        "context": {
          "personalize_context": {
             "item_id_field": blah,
             "user_id_field": blah
          },
          "document_fields": ["not", "sure", "these", "are", "used", "in", "this?"]
        }
      }
    }
  ]
}

Implementing this within the framework I've provided should be fairly straightforward (just implement an AmazonPersonalizeSourceContextFetcher and an AmazonPersonalizeRerankProcessor). We may also want to include some score-normalization stuff - so we might want to implement a sibling of RescoringRerankProcessor called ScoreCombinationRerankProcessor, but that can consume a lot of work that's already been done for hybrid search.

Oh also to update on the architecture in case you haven't seen the latest changes to the PR:
I introduced the concept of a ContextSourceFetcher, which is something that, well, fetches context. Currently there are two implementations being used by text_similarity - DocumentContextSourceFetcher and QueryContextSourceFetcher. They do pretty much what you would expect.

The factory now creates the context source fetchers based on the configuration, and they are used by the top-level RerankProcessor, which is now an abstract class rather than an interface. I probably need to reorganize the files a bit.

@navneet1v
Collaborator

navneet1v commented Dec 7, 2023

@HenryL27 yeah this looks pretty neat.

One more thing: in the original ML-model-based re-ranker, I see we want to use text_similarity to define what kind of re-ranker it is. Can we think of a better name?

Another thing: can we now update the proposal (by creating a new section with updated interfaces) and also add a comment describing what changes we have made to the proposal?

@HenryL27
Contributor Author

HenryL27 commented Dec 7, 2023

define "better". I used text_similarity to mirror the function name in ml-commons. Do you have a suggestion?

updated RFC

@navneet1v
Collaborator

Because text_similarity mirrors ML Commons; when a user looks at text_similarity, what does it tell them? amazon_personalize tells them that it is using the Amazon Personalize re-ranker, but that is not the case with text_similarity.

I have some suggestions, but those are also not that great: maybe ml_ranker or model_reranker.

@HenryL27
Contributor Author

HenryL27 commented Dec 8, 2023

@navneet1v
Well, I would argue that when a user looks at text_similarity it tells them that this reranker (remember rerank is still the top layer of the api - I don't think we need a reminder that this is a reranking method) is measuring the similarity of text to rerank. I think that's what we would want, although perhaps the model id comes a bit out of left field.

How about nlp_comparison? That says "this reranks by comparing natural language snippets," and also implies "this uses machine learning to do it"

@navneet1v
Collaborator

navneet1v commented Dec 8, 2023

Well, I would argue that when a user looks at text_similarity it tells them that this reranker (remember rerank is still the top layer of the api - I don't think we need a reminder that this is a reranking method)

Yes that is fair. Hence I was saying my suggestions are not that great. :D

For nlp_comparison, can we drop comparison and just use nlp? But nlp_comparison is better than text_similarity. Let's update the proposal with this new name. I think @dylan-tong-aws can help us come up with a better name.

@HenryL27 can you update the proposal with the final interfaces as a summary and the recommended approach?

Once that is done we can ask @sean-zheng-amazon, @vamshin, @dylan-tong-aws to review.
Also please mention how other re-rankers in OpenSearch can be extended from this base re-ranker.

@HenryL27
Contributor Author

HenryL27 commented Dec 9, 2023

updated. @navneet1v to your satisfaction?

@HenryL27
Contributor Author

@navneet1v so, where are we at with this?

@navneet1v
Collaborator

@HenryL27 So, we had some discussion and here are some names that got suggested:
text_similarity, ml-commons, ml_commons_text_similarity, ml_opensearch

Among all the above options I am leaning towards ml_opensearch. Here is the thought process: names like amazon_personalize, amazon_kendra, and ml_opensearch refer to the vendors who provide the re-ranking capability. For the local or remote model use case, from a neural-search standpoint the vendor is ML Commons, not the local model or Cohere as a remote model. The model entity of ML Commons provides the abstraction. Hence ml_opensearch suits very well here.

cc: @vamshin , @dylan-tong-aws

@HenryL27
Contributor Author

@navneet1v ml_opensearch it is! I've also gone through your CR comments; thank you

@navneet1v navneet1v changed the title [RFC] Simple Reranking with Cross Encoders [RFC] Improving Search relevancy through Generic Second stage reranker Dec 19, 2023
@navneet1v navneet1v changed the title [RFC] Improving Search relevancy through Generic Second stage reranker [RFC] Improving Search relevancy through Generic Reranker interfaces Dec 19, 2023
@macohen

macohen commented Dec 21, 2023

Is ml_opensearch the only provider of rerankers inside OpenSearch? I know I seem like the "LTR Champion" or whatever, but how do you see Learning-to-Rank fitting in here? It works at the shard level, so maybe it doesn't, but it might be good for users to think of the API for re-ranking as re-ranking, however it works under the covers. Is this feasible?

@navneet1v, @HenryL27

@HenryL27
Contributor Author

@macohen shard level does seem to imply that it wouldn't fit in well as a rerank response processor. But maybe we at some point make a reranking search phase results processor or whatever it needs to be - and then I would simply give it a name like ltr_opensearch. (ltr is its own plugin right, not run through ml-commons? following navneet's vendor-based naming scheme I think this makes sense)

@navneet1v
Collaborator

navneet1v commented Dec 21, 2023

+1 on @HenryL27's comment. @macohen please provide any other feedback you have.

@dylan-tong-aws

dylan-tong-aws commented Dec 27, 2023

I had a chat with a customer who has substantial OpenSearch usage and experience. We discussed their re-ranking pipelines. One major takeaway is that they need a way to communicate with feature stores. The features they send to the re-ranker aren't available in their OpenSearch cluster. They use the search results to look up features in various feature stores to construct the inputs (feature vectors) to the re-ranker.

Would be great to have some connectors to feature stores so that they can be used to help construct the request payload for re-ranking within a pipeline. A simpler interim option--which isn't an ideal solution--is to allow users to provide feature vector(s) in the query context. So, re-ranking will likely involve a two-pass query on the client side. Run a query to retrieve results, which they use to construct feature vector(s) on the client-side using tools like existing feature stores. Then run a second query to perform the re-ranking using the feature vector(s) and possibly search and user context to construct the re-ranking request.

A third option, which is a heavier lift, is to enable OpenSearch to operate as a feature store. Perhaps someone is interested in implementing OpenSearch as a storage option for Feast (https://docs.feast.dev/reference/online-stores)? Perhaps some users would be interested in having OpenSearch double up as a feature store?

@HenryL27
Contributor Author

HenryL27 commented Dec 28, 2023

@dylan-tong-aws the ContextSourceFetcher interface we're introducing here should make connecting to external feature stores / constructing feature vectors relatively easy. I'm not sure about the details but it should look something like implementing a (e.g.) FeastVectorContextSourceFetcher or something that just makes the appropriate network calls. We also don't currently have a plan for a FeatureVectorRerankProcessor implementation but it should be a fairly simple extension of RerankProcessor.
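
For illustration, continuing the simplified Java sketch from the RFC above (the ContextSourceFetcher signature there was already an assumption), a hypothetical feature-store fetcher might look roughly like this - FeastVectorContextSourceFetcher, FeatureStoreClient, and lookupFeatures are all made-up placeholders, not an existing Feast or OpenSearch API:

import java.util.*;

// Hypothetical sketch only; FeatureStoreClient and lookupFeatures are placeholders.
class FeastVectorContextSourceFetcher implements ContextSourceFetcher {
    private final FeatureStoreClient client;

    FeastVectorContextSourceFetcher(FeatureStoreClient client) {
        this.client = client;
    }

    @Override
    public void fetchContext(Map<String, Object> searchRequest, List<ScoredDoc> hits, Map<String, Object> context) {
        // Look up a feature vector per hit and stash it in the shared context so a
        // (hypothetical) FeatureVectorRerankProcessor could score with it.
        Map<String, float[]> features = new HashMap<>();
        for (ScoredDoc hit : hits) {
            features.put(hit.id(), client.lookupFeatures(hit.id()));
        }
        context.put("feature_vectors", features);
    }
}

// Placeholder for whatever client talks to the external feature store.
interface FeatureStoreClient {
    float[] lookupFeatures(String docId);
}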

@navneet1v
Collaborator

I had a chat with a customer who has substantial OpenSearch usage and experience. We discussed their re-ranking pipelines. One major takeaway is that they need a way to communicate with feature stores. The features they send to the re-ranker aren't available in their OpenSearch cluster. They use the search results to look up features in various feature stores to construct the inputs (feature vectors) to the re-ranker.

Would be great to have some connectors to feature stores so that they can be used to help construct the request payload for re-ranking within a pipeline. A simpler interim option--which isn't an ideal solution--is to allow users to provide feature vector(s) in the query context. So, re-ranking will likely involve a two-pass query on the client side. Run a query to retrieve results, which they use to construct feature vector(s) on the client-side using tools like existing feature stores. Then run a second query to perform the re-ranking using the feature vector(s) and possibly search and user context to construct the re-ranking request.

A third option, which is a heavier lift, is to enable OpenSearch to operate as a feature store. Perhaps someone is interested in implementing OpenSearch as a storage option for Feast (https://docs.feast.dev/reference/online-stores)? Perhaps some users would be interested in having OpenSearch double up as a feature store?

@dylan-tong-aws thanks for adding the info. The way I look at this is that options 1 and 3 are basically the same thing: we need to fetch the context for the reranker from a source.

As @HenryL27 noted, the interface is already in place, and we can add these fetchers as the need arises.

@HenryL27
Contributor Author

@navneet1v are there any next steps for me? Or am I just waiting on security review?

@martin-gaievski
Member

@HenryL27 I can help you with that while Navneet is off for some time. We put the steps into GH issue #542. The Application Security Review step is on us as repo maintainers; you need to take care of the rest of the steps.

@tianjing-li

@HenryL27 I'm following this issue and am very interested in this feature, as I'm looking to build on top of it to enable Cohere rerank. Has there been any progress?

@navneet1v
Collaborator

@tianjing-li the code has been merged into the feature branch of the Neural Search plugin: https://github.com/opensearch-project/neural-search/tree/feature/reranker. Soon it will be merged into the main branch and backported to 2.x.

You can use the feature branch and start your development of the Cohere re-ranker.
