Added Hybrid Search Notebook #1389
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: reina-w. It still needs approval from an approver in each of the changed files.
@@ -0,0 +1,584 @@
{
In this tutorial, we will demonstrate how to conduct hybrid search with Milvus and the BGE-M3 model. The BGE-M3 model can convert text into both dense and sparse vectors. Milvus supports storing both types of vectors in one collection, so a hybrid search across them can improve the relevance of search results.
Milvus supports Dense, Sparse, and Hybrid retrieval methods:
- Dense Retrieval: Utilizes semantic context to understand the meaning behind queries.
- Sparse Retrieval: Emphasizes keyword matching to find results based on specific terms, similar to full-text search.
- ...
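The two retrieval styles differ mainly in how a query vector is compared with a document vector. Below is a library-free sketch, not Milvus's implementation, assuming dense vectors are plain float lists, sparse vectors are dicts mapping token ID to weight, and inner product is the similarity in both cases; `dense_score` and `sparse_score` are hypothetical names.

```python
from typing import Dict, List

DenseVector = List[float]
SparseVector = Dict[int, float]  # token ID -> weight; absent IDs count as 0


def dense_score(q: DenseVector, d: DenseVector) -> float:
    """Inner product over every dimension: reflects overall semantic similarity."""
    if len(q) != len(d):
        raise ValueError("dense vectors must have the same dimension")
    return sum(a * b for a, b in zip(q, d))


def sparse_score(q: SparseVector, d: SparseVector) -> float:
    """Inner product over shared token IDs only: rewards exact term overlap."""
    # Iterate over the smaller dict and look tokens up in the larger one.
    small, large = (q, d) if len(q) <= len(d) else (d, q)
    return sum(w * large[t] for t, w in small.items() if t in large)


if __name__ == "__main__":
    print(dense_score([0.1, 0.7, 0.2], [0.2, 0.6, 0.1]))
    print(sparse_score({101: 0.5, 2048: 0.3}, {2048: 0.4, 77: 0.9}))
```

Every dimension contributes to the dense score, while the sparse score only counts tokens that the query and the document share, which is why sparse retrieval behaves like keyword matching.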
Updated from 4fae2dd to 8ffb814.
@@ -0,0 +1,552 @@
{
The BGE-M3 model can convert text into dense and sparse vectors. Milvus supports storing both types of vectors in one collection, enabling a hybrid search that improves result relevance.
To demonstrate search, we need a corpus of documents. Let's download the Quora Duplicate Questions dataset and place it in the local directory.
Source of the dataset: First Quora Dataset Release: Question Pairs
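A minimal sketch of turning the downloaded file into a search corpus with only the standard library. It assumes the file is the tab-separated `quora_duplicate_questions.tsv` with a header row containing `question1` and `question2` columns; the file name, column names, and the `load_questions` helper are assumptions to check against the actual download. Questions are deduplicated because the same question can appear in many pairs.

```python
import csv
from pathlib import Path
from typing import List, Set


def load_questions(path: Path, max_docs: int = 500) -> List[str]:
    """Hypothetical helper: collect up to max_docs unique, non-empty questions."""
    seen: Set[str] = set()
    docs: List[str] = []
    # newline="" lets the csv module handle quoted fields that contain newlines.
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            for column in ("question1", "question2"):
                # Missing or empty cells are skipped.
                text = (row.get(column) or "").strip()
                if text and text not in seen:
                    seen.add(text)
                    docs.append(text)
                    if len(docs) >= max_docs:
                        return docs
    return docs
```

For example, `load_questions(Path("quora_duplicate_questions.tsv"))` would return the first 500 distinct questions in file order, which keeps the embedding step quick in a tutorial setting.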
Move this cell under "Use BGE-M3 Model for Embeddings". This is a backup option. We should show the suggested approach first, and then mention the backup option in case the reader has trouble running the suggested one.
This code block doesn't generate embeddings; it only defines a function. The last line of code should call random_embedding() to generate docs_embeddings.
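A minimal sketch of the shape the reviewer is asking for: the cell defines the fallback and, on its last line, actually calls it so that `docs_embeddings` holds vectors. The function body, the 1024-dimension default (BGE-M3's dense output size), the hypothetical vocabulary size, and the `{"dense": ..., "sparse": ...}` output layout are assumptions for illustration. Random vectors only let the rest of the notebook run; they carry no meaning.

```python
import random
from typing import Dict, List


def random_embedding(
    texts: List[str],
    dense_dim: int = 1024,      # assumed to match BGE-M3's dense output size
    vocab_size: int = 250_000,  # hypothetical range for sparse token IDs
    nnz: int = 8,               # non-zero entries per sparse vector
    seed: int = 42,
) -> Dict[str, list]:
    """Hypothetical fallback: one random dense and one random sparse vector per text."""
    rng = random.Random(seed)  # seeded so reruns produce identical vectors
    dense = [[rng.uniform(-1.0, 1.0) for _ in range(dense_dim)] for _ in texts]
    sparse = [
        {idx: rng.random() for idx in rng.sample(range(vocab_size), nnz)}
        for _ in texts
    ]
    return {"dense": dense, "sparse": sparse}


docs = ["What is hybrid search?", "How do sparse vectors work?"]
# The last line generates the embeddings instead of only defining a function.
docs_embeddings = random_embedding(docs)
```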
This way of writing it is very confusing. The name dense_results should refer to what the dense retrieval method returns, not to something produced by weighting with sparse_weight=0.0, dense_weight=1.0. You could call it
only_pick_dense_results = xxx
The best approach is:
dense_results = client.search(anns_field="dense_vector" ...)
sparse_results = client.search(anns_field="sparse_vector" ...)
hybrid_results = hybrid_search(query_embeddings, sparse_weight=0.7, dense_weight=1.0)
@reina-w Change this to three separate searches: sparse_search, dense_search, and hybrid_search (sparse + dense).
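To illustrate the distinction the reviewers draw, here is a library-free, in-memory sketch in which `dense_search` and `sparse_search` are genuinely separate retrievals and `hybrid_search` fuses both scores with weights. The toy collection, the function names, and the weighted-sum fusion are assumptions for illustration (the default weights mirror the reviewer's 0.7 / 1.0 example); this does not reproduce how Milvus ranks results internally.

```python
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]
Hits = List[Tuple[str, float]]

# Toy in-memory "collection": each doc stores both a dense and a sparse vector.
DOCS = {
    "doc_a": ([0.9, 0.1], {1: 0.8}),
    "doc_b": ([0.2, 0.9], {2: 0.9, 3: 0.4}),
    "doc_c": ([0.6, 0.5], {3: 0.7}),
}


def _ip(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sparse_ip(a: SparseVec, b: SparseVec) -> float:
    return sum(w * b[t] for t, w in a.items() if t in b)


def _top_k(scores: Dict[str, float], k: int) -> Hits:
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def dense_search(q: List[float], k: int = 2) -> Hits:
    """Dense retrieval only: ranks docs by inner product of dense vectors."""
    return _top_k({d: _ip(q, v[0]) for d, v in DOCS.items()}, k)


def sparse_search(q: SparseVec, k: int = 2) -> Hits:
    """Sparse retrieval only: ranks docs by overlap of weighted terms."""
    return _top_k({d: _sparse_ip(q, v[1]) for d, v in DOCS.items()}, k)


def hybrid_search(
    q_dense: List[float],
    q_sparse: SparseVec,
    sparse_weight: float = 0.7,
    dense_weight: float = 1.0,
    k: int = 2,
) -> Hits:
    """Weighted sum of dense and sparse scores per doc (one simple fusion choice)."""
    scores = {
        d: dense_weight * _ip(q_dense, v[0]) + sparse_weight * _sparse_ip(q_sparse, v[1])
        for d, v in DOCS.items()
    }
    return _top_k(scores, k)


q_dense, q_sparse = [0.8, 0.3], {3: 1.0}
dense_results = dense_search(q_dense)
sparse_results = sparse_search(q_sparse)
hybrid_results = hybrid_search(q_dense, q_sparse)
```

Here the three result lists differ (dense favors doc_a, sparse favors doc_c, and the fused ranking puts doc_c first and doc_a second). In this sketch, calling `hybrid_search` with `sparse_weight=0.0` merely reproduces the dense ranking, which is why labeling that output `dense_results` is misleading: it is a degenerate hybrid search rather than a dense retrieval.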
Added a Hybrid Search notebook corresponding to the Streamlit application.