Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scores are explained in vectorestore docs #5613

Merged
merged 3 commits into from Jun 6, 2023

Conversation

berkedilekoglu
Copy link
Contributor

@berkedilekoglu berkedilekoglu commented Jun 2, 2023

Scores in Vectorestores' Docs Are Explained

Following vectorestores can return scores with similar documents by using similarity_search_with_score:

  • chroma
  • docarray_hnsw
  • docarray_in_memory
  • faiss
  • myscale
  • qdrant
  • supabase
  • vectara
  • weaviate

However, in documents, these scores were either not explained at all or explained in a way that could lead to misunderstandings (e.g., FAISS). For instance in FAISS document: if we consider the score returned by the function as a similarity score, we understand that a document returning a higher score is more similar to the source document. However, since the scores returned by the function are distance scores, we should understand that smaller scores correspond to more similar documents.

For the libraries other than Vectara, I wrote the scores they use by investigating from the source libraries. Since I couldn't be certain about the score metric used by Vectara, I didn't make any changes in its documentation. The links mentioned in Vectara's documentation became broken due to updates, so I replaced them with working ones.

VectorStores / Retrievers / Memory

my twitter: berkedilekoglu

Copy link
Contributor

@hwchase17 hwchase17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is awesome!

would it be possible to update the docstrings in the files themselves as well?

@berkedilekoglu
Copy link
Contributor Author

this is awesome!

would it be possible to update the docstrings in the files themselves as well?

Of course! I'm on it.

@berkedilekoglu
Copy link
Contributor Author

I updated the docstrings in the relevant functions in vectorestore files.
@hwchase17

Copy link
Contributor

@hwchase17 hwchase17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is greatly appreciated - thanks!

@hwchase17 hwchase17 added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jun 5, 2023
@hwchase17 hwchase17 merged commit f907b62 into langchain-ai:master Jun 6, 2023
13 checks passed
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this pull request Jun 19, 2023
# Scores in Vectorestores' Docs Are Explained

Following vectorestores can return scores with similar documents by
using `similarity_search_with_score`:
- chroma
- docarray_hnsw
- docarray_in_memory
- faiss
- myscale
- qdrant
- supabase
- vectara
- weaviate

However, in documents, these scores were either not explained at all or
explained in a way that could lead to misunderstandings (e.g., FAISS).
For instance in FAISS document: if we consider the score returned by the
function as a similarity score, we understand that a document returning
a higher score is more similar to the source document. However, since
the scores returned by the function are distance scores, we should
understand that smaller scores correspond to more similar documents.

For the libraries other than Vectara, I wrote the scores they use by
investigating from the source libraries. Since I couldn't be certain
about the score metric used by Vectara, I didn't make any changes in its
documentation. The links mentioned in Vectara's documentation became
broken due to updates, so I replaced them with working ones.

VectorStores / Retrievers / Memory
  - @dev2049

my twitter: [berkedilekoglu](https://twitter.com/berkedilekoglu)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This was referenced Jun 25, 2023
kacperlukawski pushed a commit to kacperlukawski/langchain that referenced this pull request Jun 29, 2023
# Scores in Vectorestores' Docs Are Explained

Following vectorestores can return scores with similar documents by
using `similarity_search_with_score`:
- chroma
- docarray_hnsw
- docarray_in_memory
- faiss
- myscale
- qdrant
- supabase
- vectara
- weaviate

However, in documents, these scores were either not explained at all or
explained in a way that could lead to misunderstandings (e.g., FAISS).
For instance in FAISS document: if we consider the score returned by the
function as a similarity score, we understand that a document returning
a higher score is more similar to the source document. However, since
the scores returned by the function are distance scores, we should
understand that smaller scores correspond to more similar documents.

For the libraries other than Vectara, I wrote the scores they use by
investigating from the source libraries. Since I couldn't be certain
about the score metric used by Vectara, I didn't make any changes in its
documentation. The links mentioned in Vectara's documentation became
broken due to updates, so I replaced them with working ones.

VectorStores / Retrievers / Memory
  - @dev2049

my twitter: [berkedilekoglu](https://twitter.com/berkedilekoglu)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm PR looks good. Use to confirm that a PR is ready for merging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants