Add Similarity Metric Used to leaderboard #766

Closed
aminst opened this issue May 20, 2024 · 7 comments
Labels
leaderboard: issues related to the leaderboard

Comments

@aminst

aminst commented May 20, 2024

Hi, thanks for this awesome benchmark.
Is it possible to add the similarity metric used by each model in the benchmark? From what I understand, the similarity metric a model was trained with determines which metric people should use when storing the generated embeddings in a vector database for later similarity searches.
I believe this would help people easily choose which similarity metric to use when storing the embeddings.
I can help and add this if it's valuable. Thanks!

@KennethEnevoldsen added the "leaderboard" label (issues related to the leaderboard) May 20, 2024
@KennethEnevoldsen
Contributor

@tomaarsen what are your thoughts on adding this to the leaderboard? My guess is that almost all models would use cosine similarity, in which case it wouldn't add much information.

@tomaarsen

@KennethEnevoldsen I do think it makes sense to show this in the leaderboard for all tasks - I think we currently only say it for STS:

Metric: Spearman correlation based on cosine similarity

But the other tasks primarily (exclusively?) use cosine similarity too. There are some models/tasks that perform a bit better with the (non-normalized) dot product, as it favors longer passages, but they're few and far between and not high on the leaderboard.
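
To make the cosine versus dot product distinction concrete, here is a minimal pure-Python sketch. The vectors are made-up toy values rather than real embeddings, the function names are illustrative, and the sketch assumes that a larger embedding norm stands in for a longer passage.

```python
import math

def dot(a, b):
    """Non-normalized dot product: the score grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: the dot product divided by both vector norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors (assumed values, for illustration only).
query = [1.0, 0.0]
short_doc = [1.0, 0.0]  # same direction as the query, small norm
long_doc = [3.0, 3.0]   # different direction, larger norm

cosine_scores = {"short_doc": cosine(query, short_doc), "long_doc": cosine(query, long_doc)}
dot_scores = {"short_doc": dot(query, short_doc), "long_doc": dot(query, long_doc)}

# The top-ranked document under each metric.
best_by_cosine = max(cosine_scores, key=cosine_scores.get)
best_by_dot = max(dot_scores, key=dot_scores.get)
```

With these toy values the two metrics disagree on the top result: cosine picks the vector pointing in the same direction as the query, while the dot product picks the vector with the larger norm.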

  • Tom Aarsen

@KennethEnevoldsen
Contributor

From my understanding, @aminst refers to the intended distance metric of the model itself (@aminst do correct me if I am wrong) and not the task?

However, I do agree that a model might have been trained with a different metric in mind, and assuming a distance metric seems problematic. Ideally, I would allow the model to supply the distance metric, and then we would just report the score (e.g. Spearman correlation) for whatever distance metric the model selects.
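
A rough sketch of that idea, assuming the model declares its metric by name: the benchmark looks up the matching similarity function and reports Spearman correlation against the gold scores. Everything here (`SIMILARITY_FUNCTIONS`, `evaluate_sts`, the toy pairs and gold scores) is hypothetical and not part of the benchmark's code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def neg_euclidean(a, b):
    # Negated distance, so that a larger value always means "more similar".
    return -math.dist(a, b)

# Hypothetical registry keyed by the name a model would declare.
SIMILARITY_FUNCTIONS = {"cosine": cosine, "dot": dot, "euclidean": neg_euclidean}

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def evaluate_sts(pairs, gold_scores, similarity_fn_name):
    """Score embedding pairs with the metric the model itself declares."""
    sim = SIMILARITY_FUNCTIONS[similarity_fn_name]
    predicted = [sim(a, b) for a, b in pairs]
    return spearman(predicted, gold_scores)

# Toy embedding pairs and gold similarity scores (assumed values).
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
gold = [5.0, 3.0, 1.0]

score_cosine = evaluate_sts(pairs, gold, "cosine")
score_dot = evaluate_sts(pairs, gold, "dot")
```

On this toy data the reported score depends on the metric chosen, which is the reason for letting the model pick it rather than assuming one.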

@aminst
Author

aminst commented May 21, 2024

@KennethEnevoldsen
Yes, that is exactly what I meant. It would be great if the leaderboard also shows the distance metric the model used during training. It would also help people to not misuse the embeddings with a different metric.
The use case I have in mind is the following. Does it make sense?

  1. Somebody wants to convert their data into vector embeddings and store it in a vector database for later retrieval and semantic search.
  2. The person uses the leaderboard to find the model to use.
  3. They then have to manually search for the distance metric to use, which the leaderboard itself could provide.
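
The workflow above could be sketched roughly as follows. This is a toy in-memory store and not the API of any real vector database; the class name `TinyVectorStore`, its `metric` argument, and the example vectors are all assumptions for illustration. The point is that the metric is fixed when the store is created, so it should match the one the embedding model was trained for.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cosine(a, b):
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))

class TinyVectorStore:
    """Hypothetical in-memory store; real vector databases differ."""

    _METRICS = {"cosine": _cosine, "dot": _dot}

    def __init__(self, metric):
        if metric not in self._METRICS:
            raise ValueError(f"unsupported metric: {metric!r}")
        self.metric = metric
        self._items = []  # list of (doc_id, vector) tuples

    def add(self, doc_id, vector):
        self._items.append((doc_id, list(vector)))

    def search(self, query, top_k=1):
        """Return the ids of the top_k most similar stored vectors."""
        score = self._METRICS[self.metric]
        ranked = sorted(self._items, key=lambda item: score(query, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:top_k]]

def build_store(metric):
    """Fill a store with two toy vectors (assumed values)."""
    store = TinyVectorStore(metric)
    store.add("a", [1.0, 0.0])
    store.add("b", [3.0, 3.0])
    return store

query = [1.0, 0.0]
top_with_cosine = build_store("cosine").search(query)
top_with_dot = build_store("dot").search(query)
```

The same stored vectors return a different top hit depending on the metric the store was configured with, which is the kind of misuse the leaderboard column would help avoid.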

@tomaarsen

Ohh, I see! Yes, that would indeed be optimal. I realised something similar with Sentence Transformers, so in Sentence Transformers v3 it will be possible to configure the similarity function in the model configuration. This will then be used when calling the new SentenceTransformer.similarity or SentenceTransformer.similarity_pairwise methods.

Additionally, ST models will start reporting their similarity function in the model card automatically, e.g. here.

That should help, at least with ST-based models.

  • Tom Aarsen

@KennethEnevoldsen changed the title from "Similarity Metric Used" to "Add Similarity Metric Used to leaderboard" May 22, 2024
@KennethEnevoldsen
Contributor

It sounds like this is something that we might consider adding after the additions to ST3. I will leave the issue open, but at the moment we probably won't add it.

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Jun 5, 2024

I have added an issue related to using a custom similarity function within the benchmark, but for the similarity function of the model itself we will probably leave that to the model card.

edit: will close for now, but feel free to re-open the discussion if you believe that there is more to add.


3 participants