
Improving Search relevancy through Generic Second stage reranker #248

Closed
aliasneo1 opened this issue Aug 8, 2023 · 33 comments
Assignees
Labels
backlog, Features, neural-search, v2.12.0

Comments

@aliasneo1

Issue Description:
We are currently utilizing a neural retriever based on the bi-encoder vector search method. However, it has come to our attention that the performance of the bi-encoder approach is suboptimal when compared to the cross-encoder method, as highlighted in the referenced research paper (link).

Desired Solution:
We propose the integration of both Cross-Encoder and Bi-Encoder methods to enhance retrieval performance, particularly in scenarios involving large datasets. Cross-Encoders demonstrate superior performance, but they encounter scalability challenges with extensive datasets. To address this, a hybrid approach can be employed in scenarios like Information Retrieval and Semantic Search. Here's the suggested process:

Initiate retrieval using an efficient Bi-Encoder to identify the top 100 most similar sentences for a given query.
Subsequently, employ a Cross-Encoder to re-rank the initial 100 matches. This involves computing scores for each (query, hit) pairing.
By incorporating a Cross-Encoder-based re-ranker after the initial retrieval, a notable enhancement in final results for users can be achieved.
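
As an illustration, the retrieve-then-rerank flow above could be sketched in Python. The scoring functions below are hypothetical stand-ins (token overlap instead of real model scores): in practice the first stage would be a bi-encoder/ANN lookup and the second stage a cross-encoder model.

```python
# Two-stage retrieval sketch: cheap first-stage scoring over the whole corpus,
# expensive second-stage re-scoring over only the top candidates.
# Both scorers are hypothetical stand-ins for real bi-/cross-encoder models.

def bi_encoder_score(query: str, doc: str) -> float:
    # Stand-in for a vector-similarity score: Jaccard overlap of tokens.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def cross_encoder_score(query: str, doc: str) -> float:
    # Stand-in for a (query, doc) cross-encoder: rewards an exact phrase match
    # that a bag-of-words first stage cannot distinguish.
    return bi_encoder_score(query, doc) + (1.0 if query.lower() in doc.lower() else 0.0)

def retrieve_and_rerank(query: str, corpus: list[str],
                        retrieve_k: int = 100, final_k: int = 10) -> list[str]:
    # Stage 1: score every document with the cheap bi-encoder, keep retrieve_k.
    candidates = sorted(corpus, key=lambda doc: bi_encoder_score(query, doc),
                        reverse=True)[:retrieve_k]
    # Stage 2: re-rank only those candidates with the expensive cross-encoder.
    return sorted(candidates, key=lambda doc: cross_encoder_score(query, doc),
                  reverse=True)[:final_k]

corpus = [
    "the cat sat on the mat",
    "machine learning for search",
    "search relevance with machine learning models",
]
results = retrieve_and_rerank("machine learning", corpus, retrieve_k=3, final_k=2)
```

The key design point is that the cross-encoder cost is bounded by `retrieve_k`, not by the corpus size.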

Considered Alternatives:
We have evaluated several alternative solutions and features in pursuit of improved retrieval performance. However, none have proven as effective as the combined Cross-Encoder and Bi-Encoder approach proposed above.

Additional Context:
For a more comprehensive understanding, any supplementary context or relevant screenshots related to this feature request will be provided as necessary. Your consideration of this enhancement would be greatly appreciated.

@msfroh

msfroh commented Aug 9, 2023

@navneet1v -- Let's move this to the neural-search repo

@macohen

macohen commented Aug 9, 2023

@opensearch-project/admin I think we need an OpenSearch maintainer or admin to move this to https://github.com/opensearch-project/neural-search

@prudhvigodithi prudhvigodithi transferred this issue from opensearch-project/OpenSearch Aug 9, 2023
@navneet1v
Collaborator

@aliasneo1 thanks for opening this GitHub issue. This seems very interesting. I have a couple of questions related to this:

  1. Where in the overall query flow of OpenSearch do we want the re-ranking to happen? Can you please add some details around that?

@msfroh msfroh removed the untriaged label Aug 16, 2023
@navneet1v navneet1v added the Features Introduces a new unit of functionality that satisfies a requirement label Aug 25, 2023
@GauravTech1986

GauravTech1986 commented Sep 8, 2023

Waiting eagerly for this feature! Do we have an estimated release date for it? I believe this is an important use case for search

@navneet1v navneet1v added help wanted Extra attention is needed backlog All the backlog features should be marked with this label labels Sep 15, 2023
@austintlee

@navneet1v @ylwu-amzn What do you guys think about adding support for this in the RAG pipeline? This is a perfect use case for RAG. We can add a search response processor (similar to the Kendra re-ranker) that makes use of cross encoders.

What are some good candidates for pre-trained models to bring into ml-commons? Any suggestions?

@navneet1v
Collaborator

What do you guys think about adding support for this in the RAG pipeline?

what do you mean by this?

We can add a search response processor (similar to the Kendra re-ranker) that makes use of cross encoders.

I think there has been some thoughts given to this, but if you want to start coming up with a proposal for this feel free to do that. As per my understanding we want to build this feature in Neural search. So if you are interested feel free to create a proposal for that.

What are some good candidates for pre-trained models to bring into ml-commons? Any suggestions?

No idea around this. This needs to be researched.

@austintlee

A desired solution was already stated above.

Initiate retrieval using an efficient Bi-Encoder to identify the top 100 most similar sentences for a given query.
Subsequently, employ a Cross-Encoder to re-rank the initial 100 matches.

Since KNN does the first part, we need a search processor that does the second part (re-ranking). We have a reranker that uses Amazon Kendra. What I had in mind is a reranker that uses a cross encoder which executes in a search response processor. I'm hoping that this can help improve the results that the RAG processor feeds to LLMs. So, in this use case, the reranker would run after the hybrid search processor runs.

ml-commons might be a better place for this since cross encoders can run on BM25 (or sparse vector) results (independent of semantic search/KNN).

@navneet1v
Collaborator

@austintlee in terms of improving search relevance, we are targeting putting features in the Neural Search plugin, or we should put a basic re-ranking interface in core. ML Commons can provide the model that does the re-ranking, but in terms of providing the interface for re-ranking, ML Commons is not the right option. It's closely related to search.

@austintlee

I don't think of ml-commons as just a model serving layer, although I think we do want to use it for serving cross encoders. I mainly want to see this functionality come to life. Even if we put this in neural-search, it's still going to be a search processor, right?

@navneet1v
Collaborator

I don't think of ml-commons as just a model serving layer

Actually, it is. That is why I was thinking of building the re-ranking feature outside of ML Commons, so that users can write their own re-rankers that use models not served by ML Commons.

So here is what I was thinking:

  1. Define a standard/extensible Re-ranking Search Results Processor interface in OpenSearch core, to be used by users to create more specific re-rankers using a library, a remote model, cross encoders, etc.
  2. The Neural Search plugin extends that interface to create a re-ranker that uses a model provided by ML Commons.
  3. Other plugins, external or internal to the OpenSearch Project, can use the extensible re-ranker interface to query an external re-ranking service to do the re-ranking for them, with more personalized, user-based re-ranking.
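
A language-agnostic sketch of that layering (in Python here, though the actual OpenSearch interfaces would be Java); every class and method name below is hypothetical, not an existing API:

```python
from abc import ABC, abstractmethod

class RerankProcessor(ABC):
    """Step 1: a generic re-ranking interface a core search pipeline could define."""

    @abstractmethod
    def rescore(self, query: str, hits: list[dict]) -> list[float]:
        """Return a new score for each (query, hit) pair."""

    def process(self, query: str, hits: list[dict]) -> list[dict]:
        # Shared logic: apply the new scores and re-sort the hits.
        for hit, score in zip(hits, self.rescore(query, hits)):
            hit["_score"] = score
        return sorted(hits, key=lambda h: h["_score"], reverse=True)

class MLCommonsRerankProcessor(RerankProcessor):
    """Step 2: a Neural Search implementation backed by an ML Commons model."""

    def __init__(self, model_id: str, predict):
        self.model_id = model_id
        self.predict = predict  # stand-in for an ML Commons predict call

    def rescore(self, query, hits):
        return [self.predict(self.model_id, query, h["_source"]["text"]) for h in hits]

# Step 3 would be an external plugin providing its own RerankProcessor
# subclass that calls a remote, user-personalized re-ranking service instead.
```

The point of the split is that the sort-and-replace mechanics live in the shared interface, while each implementation only decides where new scores come from.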

@navneet1v
Collaborator

Even if we put this in neural-search, it's still going to be a search processor, right?

Yes, the initial thought is exactly that. But I was stepping back to look at the feature more broadly and see if we can do better.

Plus, there is one big question we need to answer before we start working: does re-ranking using a Cross-Encoder improve search relevance? If yes, by how much, and if we compare it with techniques like Normalization and Score Combination, what is the difference?

@navneet1v
Collaborator

@austintlee I have added this in the VectorDB hot backlog. @vamshin lets see if we can prioritize this.

@austintlee

Plus, there is one big question we need to answer before we start working: does re-ranking using a Cross-Encoder improve search relevance? If yes, by how much, and if we compare it with techniques like Normalization and Score Combination, what is the difference?

I know this is an important question, and we want to get at least a rough idea of how much improvement, if any, this will get us before we take it on. But at the same time, there clearly seems to be appetite from the community for making this feature available so people can experiment on their own data.

@austintlee

I'm super interested in the vector db roadmap. Can we have a public meeting where we can discuss hot topics, what's coming, etc?

@navneet1v
Collaborator

navneet1v commented Oct 17, 2023

I'm super interested in the vector db roadmap. Can we have a public meeting where we can discuss hot topics, what's coming, etc?

@austintlee
Thanks for showing interest. We are actually working on getting it up and running. I will check with @vamshin where we are on that.

FYI this is the public roadmap: https://github.com/orgs/opensearch-project/projects/145

@samuel-oci

Plus, there is one big question we need to answer before we start working: does re-ranking using a Cross-Encoder improve search relevance? If yes, by how much, and if we compare it with techniques like Normalization and Score Combination, what is the difference?

@navneet1v curious, are you saying that you see re-ranking and normalization as mutually exclusive? Or maybe I misunderstood?
Also, I have seen some solutions that have up to 3 layers of re-ranking, with each one applying additional sorting and filtering. Are we planning to support multiple tiers of re-ranking?
Should we just treat the normalization and score combination step as orthogonal to re-ranking? Re-ranking seems like just another step that can come in the response processing pipeline in a specific order.

@austintlee

That's how I am approaching this: another search response processor. Maybe there is a processor that is tailored to re-ranking, but it would just be an extension of a search processor.

@HenryL27
Contributor

HenryL27 commented Nov 1, 2023

Plus there is one big question that we need to ans before we start working is, does re-ranking using Cross Encoder improve Search relevance, if yes by how much and if we compare it with techniques like Normalization and Score Combination what is the diff.

@navneet1v Some metrics from experiments using a reranker in python-land on a customer dataset:

|             | recall@5 | recall@100 | MRR      | avg. pos. of first gold |
|-------------|----------|------------|----------|-------------------------|
| no reranker | 0.607456 | 0.975877   | 0.326642 | 8.526316                |
| reranker    | 0.692982 | 0.975877   | 0.557356 | 2.842105                |

These experiments were run using hybrid search (min-max norm, linear combination, weighted (0.111, 0.889) towards neural over BM25) with embedding model "thenlper/gte-small" and reranker "BAAI/bge-reranker-large", reranking the top 100 documents. There are about 1000 documents in the index and 19 (question, docs) pairs (each of the 19 questions matches about 1-3 relevant documents).
I have tried a couple of other hybrid search configurations (though not exhaustively) and haven't been able to beat these. So I'd say that, at least on this particular dataset, yes, reranking is worth it.
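
For reference, the metrics in the table above can be computed as follows. This is a generic sketch, not the script used for the experiment; `ranked` is a result list of document ids and `gold` is the set of relevant ids for a query.

```python
def recall_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    # Fraction of relevant documents that appear in the top k results.
    return len(set(ranked[:k]) & gold) / len(gold)

def reciprocal_rank(ranked: list[str], gold: set[str]) -> float:
    # 1 / position of the first relevant ("gold") document; 0 if none is found.
    for pos, doc_id in enumerate(ranked, start=1):
        if doc_id in gold:
            return 1.0 / pos
    return 0.0

def mrr(runs: list[tuple[list[str], set[str]]]) -> float:
    # MRR = mean of reciprocal ranks over all (ranked, gold) query runs.
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)
```

The "avg. pos. of first gold" column is simply the mean of the positions that `reciprocal_rank` inverts, averaged over queries.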

@navneet1v
Collaborator

navneet1v commented Nov 2, 2023

@HenryL27 thanks for sharing the results. This looks really awesome. I think this is good evidence. I have already put this task in the hot backlog, meaning it will be picked up soon.

@vamshin do you have some timeline in mind around this?

@austintlee

Can we target this for 2.12?

@HenryL27
Contributor

HenryL27 commented Nov 2, 2023

@navneet1v @vamshin Is someone working on this? If not, I'll take it

@vamshin
Member

vamshin commented Nov 2, 2023

@HenryL27 thanks for your interest. We are looking into this. We will need your support with RFC/Code reviews.

@HenryL27
Contributor

HenryL27 commented Nov 2, 2023

@vamshin so yes, someone is working on this? What's the timeline? Any sub-issues I can pick up? I don't see any issues yet.

@vamshin
Member

vamshin commented Nov 2, 2023

@HenryL27 we started scoping this work and are using this GitHub issue as a feature request. As a first step, we will do an RFC to get community feedback on the approaches. Our idea is to build a more generic (multi-stage) reranker capable of passing metadata (user context) and OpenSearch results to a remote connector for reranking results.

Let me get back on the timelines. Happy to collaborate.

@HenryL27
Contributor

HenryL27 commented Nov 2, 2023

@vamshin We have a customer who wants this now - can we scope this down to a simple 1-stage reranker and then expand it later?

Here's the API spec I have in mind

APIs

Create Rerank Pipeline

```
PUT /_search/pipeline/<pipeline-name>
{
  "response_processors": [
    {
      "neural-rerank": {
        "top_k": int (how many to rerank),
        "model_id": id of cross-encoder,
        "context_field": str (source field to compare to query)
      }
    }
  ]
}
```

Query Rerank Pipeline

```
POST index/_search
{
  "query": {...},
  "ext": {
    "rerank": {
      "query_text": str (query text to compare)
    }
  }
}
```

or alternatively

```
POST index/_search
{
  "query": {...},
  "ext": {
    "rerank": {
      "query_text_path": str (path in the search body to the query text)
    }
  }
}
```

For example, with a neural query we might have

`"query_text_path": "query.neural.embedding.query_text"`

The rerank processor will evaluate the "top_k" search results, and then sort them based on the new scores. Documents outside of the top k will be left in place and not evaluated, meaning that they might conceivably have higher "_score" values than the reranked documents, while being lower on the list. This will also override any sorts in the search query, although I think the use case for sorts and semantic reranking are largely non-overlapping.
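
The "top_k reranked, everything else left in place" behavior described above could look like this in outline (a Python sketch, not the actual processor; `rescore` stands in for the cross-encoder call and is hypothetical):

```python
def apply_rerank(hits: list[dict], top_k: int, rescore) -> list[dict]:
    # Rescore and re-sort only the first top_k hits.
    head = hits[:top_k]
    for hit in head:
        hit["_score"] = rescore(hit)
    head.sort(key=lambda h: h["_score"], reverse=True)
    # Hits beyond top_k keep their original scores and positions, so they may
    # end up with a higher _score than a reranked hit while sitting below it.
    return head + hits[top_k:]

hits = [{"id": "a", "_score": 3.0}, {"id": "b", "_score": 2.0}, {"id": "c", "_score": 1.0}]
reranked = apply_rerank(hits, top_k=2, rescore=lambda h: 1.0 if h["id"] == "b" else 0.5)
# "c" keeps its original _score of 1.0, higher than "a"'s new 0.5, yet stays last.
```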

Upload Cross Encoder Model

POST /_plugins/_ml/models/_upload
{
  "name": "model name",
  "version": "1.0.0 or something",
  "description": "description",
  "model_format": "TORCH_SCRIPT",
  "function_name": "TEXT_SIMILARITY",
  "model_content_hash_value": "hash browns",
  "url": "https://url-of-model"
}

This is not a new API or anything, and all the other model-based APIs should still work for the cross encoder model/function name with minimal work to integrate.

Basically, a simple search response processor that a user can plug in to whatever search request they have, that looks a lot like how a neural search works so should be familiar. That would solve probably 90% of use cases.

@HenryL27
Contributor

HenryL27 commented Nov 3, 2023

@vamshin RFC draft before I post it for real

@dylan-tong-aws

All, I'm catching up on this thread. This is on our product roadmap. We're going to implement a generic second-stage re-ranking search pipeline. You will be able to integrate a second-stage re-ranker like a cross-encoder via the AI connectors available in ml-commons. This functionality will be integrated with the neural search and LTR experience.

I made a forum post a while back to collect community feedback.

@austintlee

@dylan-tong-aws Henry is essentially signing up to do the work described in that post.

@vamshin
Member

vamshin commented Nov 4, 2023

@HenryL27 please feel free to publish RFC and we can let community provide the feedback. Thanks

@HenryL27
Contributor

HenryL27 commented Nov 4, 2023

See #485

@vamshin
Member

vamshin commented Dec 11, 2023

@HenryL27 @navneet1v Is it ok to change the title to "Generic Second stage reranker to Improve search relevancy"?

@navneet1v
Collaborator

I don't see any concern. Let me change this.

@navneet1v navneet1v changed the title Improving Retrieval Performance through Combined Cross-Encoder and Bi-Encoder Approach Improving Search relevancy through Generic Second stage reranker Dec 11, 2023
@vamshin vamshin added v2.12.0 Issues targeting release v2.12.0 and removed v2.13.0 labels Feb 21, 2024
@vamshin
Member

vamshin commented Feb 21, 2024

Closing this issue as it's released in 2.12.0.

@vamshin vamshin closed this as completed Feb 21, 2024