Negative sampling #36
For implicit models, the only user-item pairs your dataset should contain are those where an interaction has been observed. You should not include pairs where an interaction is missing. Does this answer your question?
Thank you for the reply. The negative sampling part of the code samples from all items. I think it should exclude the observed interactions, since those are definitely not negatives. For example, for user A who has already bought items 1, 2, and 3, shouldn't we exclude items 1, 2, and 3 when sampling for the negative prediction?
```python
def _get_negative_prediction(self, user_ids):
    negative_items = sample_items(
        self._num_items,
        len(user_ids),
        random_state=self._random_state)
    negative_var = Variable(
        gpu(torch.from_numpy(negative_items), self._use_cuda)
    )
    negative_prediction = self._net(user_ids, negative_var)

    return negative_prediction
```
In principle you are right. In practice, as long as the total number of items is (much) larger than the number of positive items per user (which is usually the case), I haven't found this omission to be detrimental to accuracy. I suspect you could even argue that this approach acts as an implicit regularizer.
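To put a rough number on that argument: if a user has `k` observed positives out of `N` catalogue items, a uniformly sampled "negative" collides with a true positive with probability `k / N`. A minimal sketch (the function name and the example numbers are hypothetical, not from the library):

```python
def collision_probability(num_positive_items, num_total_items):
    # Probability that a uniformly sampled item is actually one of
    # the user's observed positives.
    return num_positive_items / num_total_items

# Hypothetical example: a user with 20 interactions in a 10,000-item
# catalogue mislabels a positive as negative only 0.2% of the time.
print(collision_probability(20, 10_000))  # 0.002
```

With a small catalogue the collision rate grows quickly, which is consistent with the MovieLens observation below.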
Yeah, I agree. It should only be a potential problem when the number of items is small. Thank you!
Hi, I tried both approaches on the MovieLens 1M dataset, and I found that this sampling significantly reduces performance.
@maciejkula I guess we should remove the items found in the training dataset before negative sampling. Otherwise, it might make the learning less effective?
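One way to do the exclusion is rejection sampling: draw items uniformly and redraw any that appear in the user's observed set. A minimal sketch, assuming a hypothetical helper (`sample_negative_items` is not Spotlight's API) and per-user positive sets:

```python
import numpy as np

def sample_negative_items(positive_item_ids, num_items, num_samples, rng):
    """Sample item ids uniformly at random, resampling any id that the
    user has already interacted with (hypothetical helper)."""
    positives = set(positive_item_ids)
    samples = rng.randint(0, num_items, size=num_samples)
    for i in range(num_samples):
        # Redraw until the sampled item is not an observed positive.
        while samples[i] in positives:
            samples[i] = rng.randint(0, num_items)
    return samples

# Usage: user A bought items 1, 2, 3; negatives must avoid those ids.
rng = np.random.RandomState(42)
negatives = sample_negative_items([1, 2, 3], num_items=100,
                                  num_samples=5, rng=rng)
print(negatives)
```

Rejection sampling stays cheap as long as each user's positive set is a small fraction of the catalogue; with very dense users, the inner loop could be replaced by sampling from the explicit complement set.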