Question about scoring function #62

zhiqihuang · 2021-07-27T21:43:03Z

Hi

Thanks for sharing the code. I have a question about score calculation.

In the doc function, the document representation is multiplied by the mask.

However, could the max(2) in the score function accidentally pick a padding token's score (which is 0) if all the non-padding token scores are negative?

I ask because I checked the implementation here and found that it assigns a large negative value, score[~exp_mask] = -10000, before searching for the max score.
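To make the concern concrete, here is a minimal sketch in plain Python rather than the repository's tensor code. The names maxsim, dot, NEG_FILL, and the toy vectors are all hypothetical; only the fill value -10000 comes from the implementation quoted above. It assumes a dot-product similarity and a document whose last position is padding.

```python
# Hypothetical toy setup: one query token, a document with two real tokens
# and one padding position. None of these names come from the repository.
NEG_FILL = -10000.0  # the fill value quoted above


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(scores, mask, fill_padding):
    """Sum over query tokens of the max similarity over document tokens.

    scores[i][j] is the similarity of query token i and document token j.
    mask[j] is True for real document tokens and False for padding.
    """
    total = 0.0
    for row in scores:
        if fill_padding:
            # Push padding positions far below any real similarity.
            row = [s if keep else NEG_FILL for s, keep in zip(row, mask)]
        total += max(row)
    return total


query = [[1.0, 0.0]]
doc = [[-0.5, 0.2], [-0.3, -0.9], [0.7, 0.7]]  # last row is a padding position
mask = [True, True, False]

# Multiplying by the mask zeroes the padding embedding, so its similarity is 0.
masked_doc = [[x * float(keep) for x in vec] for vec, keep in zip(doc, mask)]
scores = [[dot(q, d) for d in masked_doc] for q in query]

unfilled = maxsim(scores, mask, fill_padding=False)  # max picks the padding score
filled = maxsim(scores, mask, fill_padding=True)     # max picks a real token
print(unfilled, filled)
```

In this toy case both real tokens score below zero, so the unfilled max returns the padding position's 0, while the filled variant returns the best real token's score.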


okhat · 2021-09-19T15:42:32Z

That's indeed a good edge case to keep in mind — thanks! Since we care about the largest/max similarities per query token for the high-scoring documents, these are practically always positive. But we might want to enforce this more directly with a ReLU before the max sim.
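A rough sketch of what a ReLU before the max could look like, again in plain Python and not taken from the repository. The function name relu_maxsim and the similarity values are hypothetical.

```python
def relu_maxsim(scores):
    """Clamp each similarity at zero, then sum the per-query-token maxima."""
    return sum(max(max(s, 0.0) for s in row) for row in scores)


# Hypothetical similarities: rows are query tokens, columns are document tokens.
real_only = [[-0.5, -0.3], [0.8, 0.1]]
with_padding = [[-0.5, -0.3, 0.0], [0.8, 0.1, 0.0]]  # extra column: padding scored as 0

# After clamping, a zero-scored padding position can no longer change the result.
assert relu_maxsim(real_only) == relu_maxsim(with_padding)
print(relu_maxsim(with_padding))
```

One trade-off to keep in mind: with the clamp, a query token whose similarities are all negative contributes 0 instead of its true (negative) maximum, so the score differs from what the large-negative fill would give.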

zhiqihuang · 2021-10-01T18:54:42Z

It depends on your similarity function. An L2 distance is always non-negative, but cosine similarity can be negative. I agree that the case where all similarities are negative is definitely a concern.

okhat closed this as completed Sep 19, 2021
