Question about scoring function #62

Closed
zhiqihuang opened this issue Jul 27, 2021 · 2 comments
Comments

@zhiqihuang

Hi

Thanks for sharing the code. I have a question about score calculation.

In the `doc` function, the document representation is multiplied by the mask.

However, could the `max(2)` in the `score` function accidentally pick a padding token's score (which is 0) when all the non-padding token scores are negative?

I ask because I checked the implementation here and found that it assigns a large negative value, `score[~exp_mask] = -10000`, before searching for the max score.
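To make the edge case concrete, here is a minimal sketch in plain Python (no tensors) of what I mean. The helper names `dot` and `maxsim`, the toy vectors, and the `fill` parameter are my own illustration, not code from this repository. It assumes a dot-product similarity and a padding embedding that has already been zeroed by the mask.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(Q, D, mask=None, fill=-10000.0):
    """Sum over query tokens of the max similarity over document tokens.

    If `mask` is given, positions where it is False (padding) have their
    similarity replaced by `fill` before the max is taken.
    """
    total = 0.0
    for q in Q:
        sims = [dot(q, d) for d in D]
        if mask is not None:
            sims = [s if keep else fill for s, keep in zip(sims, mask)]
        total += max(sims)
    return total


# One query token; two real document tokens plus one zeroed padding token.
Q = [[1.0, 0.0]]
D = [[-0.3, 0.1], [-0.5, 0.2], [0.0, 0.0]]
mask = [True, True, False]

unmasked = maxsim(Q, D)      # the padding token's 0 wins the max
masked = maxsim(Q, D, mask)  # the best real token (-0.3) wins the max

print(unmasked, masked)
```

Without the mask the padding position wins and the score is 0; with the large negative fill the score comes from the best real token instead.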

@okhat
Collaborator

okhat commented Sep 19, 2021

That's indeed a good edge case to keep in mind — thanks! Since we care about the largest/max similarities per query token for the high-scoring documents, these are practically always positive. But we might want to enforce this more directly with a ReLU before the max sim.
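As a rough sketch of the ReLU idea (plain Python with made-up toy vectors; `dot`, `relu`, and `maxsim_relu` are hypothetical helper names, not functions from the codebase), clamping each similarity at zero before the max means a zeroed padding token can only tie with the real tokens and never beat them:

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def relu(x):
    """Clamp negative values to zero."""
    return max(x, 0.0)


def maxsim_relu(Q, D):
    """MaxSim with a ReLU applied to every similarity before the max."""
    return sum(max(relu(dot(q, d)) for d in D) for q in Q)


Q = [[1.0, 0.0], [0.0, 1.0]]
D_real = [[-0.3, 0.4], [-0.5, 0.2]]
D_padded = D_real + [[0.0, 0.0]]  # same document plus a zeroed padding token

score_real = maxsim_relu(Q, D_real)
score_padded = maxsim_relu(Q, D_padded)

print(score_real, score_padded)
```

Both scores agree, so padding no longer changes the result. Note the trade-off compared with masking: a query token whose real matches are all negative now contributes 0 rather than its negative best match.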

@okhat closed this as completed Sep 19, 2021
@zhiqihuang
Author

It depends on the similarity function. L2 distance is always non-negative, but cosine similarity can be negative. I agree that the case where all similarities are negative is definitely a concern.
