Add CosineSimilarity #75

Closed
SkafteNicki opened this issue Mar 14, 2021 · 6 comments · Fixed by #305
Labels: enhancement, good first issue, help wanted, New metric

@SkafteNicki
Member

🚀 Feature

Add Cosine similarity metric
https://en.wikipedia.org/wiki/Cosine_similarity
Measures the similarity between two feature vectors by calculating the cosine of the angle between them.


SkafteNicki added the enhancement, help wanted, and good first issue labels on Mar 14, 2021
@avinashsai

Hi,
There is already a cosine similarity module in torch (https://pytorch.org/docs/stable/generated/torch.nn.CosineSimilarity.html). Can I use it?

@SkafteNicki
Member Author

@avinashsai It depends on your purpose.
If you want to calculate the mean score on a single batch, no problem.
But let's say you want to accumulate over a complete dataset. Then you need some kind of custom accumulation:

import torch

# Two batches of different sizes (5 and 10 samples), each with 2 features
preds = [torch.randn(5, 2), torch.randn(10, 2)]
target = [torch.randn(5, 2), torch.randn(10, 2)]

cosine = torch.nn.CosineSimilarity()  # default dim=1: one score per sample

individual_scores = []
for p, t in zip(preds, target):
    individual_scores.append(cosine(p, t).mean())

# The mean of the per-batch means and the mean over all samples generally differ
total_score = torch.mean(torch.stack(individual_scores))
correct_score = cosine(torch.cat(preds, dim=0), torch.cat(target, dim=0)).mean()

When the batches have different sizes, the global mean generally does not equal the mean of the scores on the individual batches.
This is basically why we also have custom implementations of mean squared error, mean absolute error, etc., even though these are also part of PyTorch: when you accumulate over multiple batches, you need to be careful about the order of operations.
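
To make the order-of-operations point concrete, here is a minimal, dependency-free sketch. The per-sample scores are made-up numbers chosen for illustration, and plain Python lists stand in for tensors. It compares averaging the per-batch means against keeping a running sum and sample count and dividing once at the end:

```python
# Hypothetical per-sample scores for two batches of unequal size (1 and 3 samples)
batches = [[1.0], [0.0, 0.0, 0.0]]

# Naive: average the per-batch means, which weights every batch equally
batch_means = [sum(b) / len(b) for b in batches]
mean_of_means = sum(batch_means) / len(batch_means)

# Accumulating: keep a running sum and sample count, divide once at the end
running_sum, running_count = 0.0, 0
for b in batches:
    running_sum += sum(b)
    running_count += len(b)
global_mean = running_sum / running_count

print(mean_of_means)  # 0.5
print(global_mean)    # 0.25
```

The single sample in the first batch counts as much as the three samples in the second in the naive version, which is why the two numbers disagree.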

@avinashsai

@SkafteNicki Thanks for the clarification. So, will this cosine similarity metric be computed between two n-dimensional tensors?

@SkafteNicki
Member Author

It should still take two [N, d] tensors, meaning N vectors each of dimension d.
The function should first calculate the cosine similarity between corresponding rows of the two tensors, and it should have a reduction argument that determines whether we take the sum or the mean over the batch dimension.
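
A rough sketch of the behaviour described above, not the eventual library implementation. The function name `cosine_similarity` and its signature are hypothetical, and nested Python lists stand in for the [N, d] tensors so the example runs without any dependencies:

```python
import math

def cosine_similarity(preds, target, reduction="mean"):
    """Hypothetical sketch: row-wise cosine similarity of two [N, d] inputs."""
    if len(preds) != len(target):
        raise ValueError("preds and target must contain the same number of rows")
    scores = []
    for p, t in zip(preds, target):
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        scores.append(dot / norm)  # assumes neither row is all zeros
    # Reduce over the batch dimension (the N rows)
    if reduction == "sum":
        return sum(scores)
    if reduction == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown reduction: {reduction!r}")

preds = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
target = [[2.0, 0.0], [3.0, 0.0], [1.0, 0.0]]
# Row-wise similarities are 1, 0 and 1
total = cosine_similarity(preds, target, reduction="sum")
average = cosine_similarity(preds, target, reduction="mean")
print(total, average)
```

A metric that accumulates over batches would additionally need to store state between calls, for example the running sum of scores and the number of rows seen so far.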

@N-georgakopoulos

Hey, I would like to work on this. Could you assign it to me?

Borda added this to the v0.4 milestone on Apr 19, 2021
@stale

stale bot commented Jun 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Jun 18, 2021
stale bot closed this as completed on Jun 28, 2021
Borda changed the title from "[New metric] CosineSimilarity" to "Add CosineSimilarity" on Jan 26, 2022