Added Hybrid Search Notebook #1389

Merged 1 commit on Jul 29, 2024

Conversation


@reina-w reina-w commented Jul 24, 2024

Added a Hybrid Search notebook corresponding to the streamlit application


@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: reina-w

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

bootcamp/tutorials/quickstart/hybrid_demo.ipynb Outdated Show resolved Hide resolved
@@ -0,0 +1,584 @@
{
Collaborator

@codingjaguar codingjaguar Jul 25, 2024

In this tutorial, we will demonstrate how to conduct hybrid search with Milvus and the BGE-M3 model. The BGE-M3 model can convert text into both dense and sparse vectors. Milvus supports storing both types of vectors in one collection, so hybrid search across them can improve the relevance of search results.

Milvus supports Dense, Sparse, and Hybrid retrieval methods:

  • Dense Retrieval: Utilizes semantic context to understand the meaning behind queries.
  • Sparse Retrieval: Emphasizes keyword matching to find results based on specific terms. This is equivalent to full-text search.
  • ...
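
As a rough illustration of the dense and sparse retrieval styles listed above, the sketch below scores one document against one query both ways. It is plain Python with hand-made toy vectors; the names `cosine_similarity` and `sparse_dot` and all the numbers are made up for illustration and are not taken from the notebook, BGE-M3, or Milvus.

```python
import math


def cosine_similarity(a, b):
    """Dense score: cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sparse_dot(a, b):
    """Sparse score: inner product over the tokens both vectors share."""
    return sum(weight * b[token] for token, weight in a.items() if token in b)


# Toy dense vectors (a real embedding model emits far more dimensions).
query_dense = [1.0, 0.0, 1.0]
doc_dense = [1.0, 1.0, 0.0]

# Toy sparse vectors: token -> weight, with absent tokens implicitly zero.
query_sparse = {"milvus": 0.5, "hybrid": 0.5}
doc_sparse = {"milvus": 0.5, "search": 0.25}

dense_score = cosine_similarity(query_dense, doc_dense)
sparse_score = sparse_dot(query_sparse, doc_sparse)
print(f"dense={dense_score:.3f} sparse={sparse_score:.3f}")
```

The dense score depends on every dimension, while the sparse score only moves when the query and the document share a token, which is why sparse retrieval behaves like keyword matching.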



@reina-w reina-w force-pushed the hybrid_notebook branch 2 times, most recently from 4fae2dd to 8ffb814 Compare July 26, 2024 00:56
@@ -0,0 +1,620 @@
{
Contributor

@jaelgu jaelgu Jul 26, 2024

remove these outputs



@@ -0,0 +1,620 @@
{
Contributor

@jaelgu jaelgu Jul 26, 2024

still invalid link to the app example



@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

Hybrid Search with Dense and Sparse Vectors in Milvus



@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

The BGE-M3 model can convert text into dense and sparse vectors. Milvus supports storing both types of vectors in one collection, allowing for hybrid search that enhances result relevance.



@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

To demonstrate search, we need a corpus of documents. Let's use the Quora Duplicate Questions dataset and place it in the local directory. 

Source of the dataset: First Quora Dataset Release: Question Pairs



@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

Move this cell under "Use BGE-M3 Model for Embeddings". This is a backup option. We should show the suggested approach first, and then mention the backup option in case the reader has trouble running the suggested one.



@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

This code block doesn't generate embeddings; it only defines a function. The last line of code should call random_embedding() to generate docs_embeddings.
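
A minimal sketch of what this comment asks for, assuming the fallback cell defines a `random_embedding()` helper that returns a dict with `dense` and `sparse` entries: the cell should end by calling the helper so that `docs_embeddings` actually exists. The body of the helper, the `docs` list, the `SPARSE_VOCAB` size, and the dict-of-token-id sparse layout are stand-ins written for this example, not the notebook's code.

```python
import random

DENSE_DIM = 1024       # BGE-M3 dense vectors have 1024 dimensions
SPARSE_VOCAB = 10_000  # hypothetical vocabulary size for the fake sparse vectors


def random_embedding(texts, seed=0):
    """Return placeholder dense and sparse embeddings, one pair per text."""
    rng = random.Random(seed)
    dense = [[rng.random() for _ in range(DENSE_DIM)] for _ in texts]
    # Each sparse vector is a dict of {token_id: weight} with up to 8 entries
    # (fewer if the same token id is drawn twice).
    sparse = [
        {rng.randrange(SPARSE_VOCAB): rng.random() for _ in range(8)}
        for _ in texts
    ]
    return {"dense": dense, "sparse": sparse}


# Stand-in corpus; the notebook would use the loaded dataset instead.
docs = ["How do I learn Python?", "What is a vector database?"]

# The call the review comment asks for: defining the function is not enough.
docs_embeddings = random_embedding(docs)
print(len(docs_embeddings["dense"]), len(docs_embeddings["dense"][0]))
```

Random vectors carry no meaning, so search results built on them are only useful for checking that the rest of the pipeline runs.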



@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

@mengjia

This way of writing it is very confusing. dense_results should be what the dense retrieval method returns, not something produced by mixing with sparse_weight=0.0, dense_weight=1.0. You could call it

only_pick_dense_results = xxx

The best approach is

dense_results = client.search(anns="dense_vector" ...)

sparse_results = client.search(anns="sparse_vector" ...)

hybrid_results = hybrid_search(query_embeddings, sparse_weight=0.7, dense_weight=1.0)



Contributor

@reina-w change this to 3 different searches: sparse_search, dense_search, hybrid_search (sparse+dense)
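
The three-way split suggested in this thread can be sketched without a running Milvus instance. The example below is a pure-Python stand-in over a tiny in-memory corpus: `dense_search` and `sparse_search` each rank by a single vector type, and `hybrid_search` ranks by a weighted sum of both scores. The function bodies, the toy vectors, and the weighted-sum fusion are assumptions made for illustration; the notebook itself would issue these searches through Milvus, and its reranking may differ.

```python
def dot(a, b):
    """Inner product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a, b):
    """Inner product of two sparse vectors stored as {token: weight} dicts."""
    return sum(w * b[t] for t, w in a.items() if t in b)


# Hypothetical three-document corpus with one dense and one sparse vector each.
CORPUS = [
    {"id": 0, "dense": [1.0, 0.0], "sparse": {"apple": 1.0}},
    {"id": 1, "dense": [0.0, 1.0], "sparse": {"banana": 1.0, "apple": 0.5}},
    {"id": 2, "dense": [0.8, 0.2], "sparse": {"cherry": 1.0, "banana": 0.5}},
]


def _rank(score_fn, limit):
    """Return the ids of the `limit` highest-scoring documents."""
    ranked = sorted(CORPUS, key=score_fn, reverse=True)
    return [doc["id"] for doc in ranked[:limit]]


def dense_search(query_dense, limit=2):
    return _rank(lambda d: dot(query_dense, d["dense"]), limit)


def sparse_search(query_sparse, limit=2):
    return _rank(lambda d: sparse_dot(query_sparse, d["sparse"]), limit)


def hybrid_search(query_dense, query_sparse,
                  sparse_weight=0.7, dense_weight=1.0, limit=2):
    def score(d):
        return (dense_weight * dot(query_dense, d["dense"])
                + sparse_weight * sparse_dot(query_sparse, d["sparse"]))
    return _rank(score, limit)


query_dense = [1.0, 0.0]
query_sparse = {"banana": 1.0}

dense_results = dense_search(query_dense)
sparse_results = sparse_search(query_sparse)
hybrid_results = hybrid_search(query_dense, query_sparse)

print(dense_results)   # [0, 2]
print(sparse_results)  # [1, 2]
print(hybrid_results)  # [2, 0]
```

Keeping the three searches separate makes each variable name describe how its results were produced, instead of encoding a dense-only search as a hybrid search with a zero sparse weight.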

@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

ditto



@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

Line #1.    # Dense search results

ditto



@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

Line #2.    print("Dense Search Results:")

ditto



@@ -0,0 +1,552 @@
{
Collaborator

@codingjaguar codingjaguar Jul 26, 2024

The red highlighting is really cool! 👍



@jaelgu jaelgu merged commit 8b347c6 into milvus-io:master Jul 29, 2024
1 check passed

4 participants