Custom dataset, negative sample and metrics #55
Comments
Glad you find it useful!
I am trying to add precision@k and recall@k to Spotlight; I hope to finish it soon.
@maciejkula RE your second bullet point, I was having issues reliably evaluating my models, for two reasons:
To resolve this, I changed the API of the `fit` function. I now pass in train_pos, test_pos, and test_neg. train_pos is the list of positive training examples (previously just called interactions), while test_pos and test_neg together constitute the test set, which the user constructs ahead of time. I fix issue 1 by checking during negative sample generation that the samples are present in neither train_pos nor test_pos. I fix issue 2 by checking during negative sample generation that the samples are not present in test_neg.
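For concreteness, here is a minimal sketch of the filtering I mean; the `sample_negatives` helper and its dict-of-sets inputs are my own illustration, not Spotlight's API or the exact code behind the modified `fit`:

```python
import numpy as np

def sample_negatives(user_ids, num_items, known_positives, test_negatives, seed=42):
    """Draw one negative item per interaction, rejecting any item that is a known
    positive (train_pos or test_pos) for that user, or reserved in test_neg."""
    rng = np.random.RandomState(seed)
    negatives = np.empty(len(user_ids), dtype=np.int64)
    for i, user in enumerate(user_ids):
        forbidden = known_positives[user] | test_negatives[user]
        item = rng.randint(num_items)
        # Rejection sampling: cheap as long as each user's forbidden set is
        # small relative to the item catalogue.
        while item in forbidden:
            item = rng.randint(num_items)
        negatives[i] = item
    return negatives

# known_positives[u] holds the union of train_pos and test_pos items for user u;
# test_negatives[u] holds the items already reserved as that user's test negatives.
```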
For the negative samples, I ran a couple of experiments using BPR on a different project, testing the same model with and without excluding negative samples that had already been liked by the user:
It seems that there is a decent improvement from removing positives from the negative samples, even when almost all of the randomly selected negative items are in fact negative. As an example, with the last.fm dataset only 1.75% of the sampled negative items were actually positive, but excluding these items still raised p@10 by 10%. Anyways, just thought I'd pipe in here. I'm not sure of the best way to implement this in torch; I'm guessing it will require a native extension to do it efficiently =(
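As an illustration of how one might measure that collision rate (the 1.75% figure above) on a user-item matrix, here is a rough sketch; the function name and the uniform per-interaction sampler are my own assumptions, not the actual experiment code:

```python
import numpy as np
import scipy.sparse as sp

def false_negative_rate(interactions: sp.csr_matrix, num_samples: int = 100_000, seed: int = 0) -> float:
    """Fraction of uniformly sampled 'negative' items that are actually positives.

    Users are drawn in proportion to their interaction counts, mirroring a
    per-interaction negative sampler; items are drawn uniformly."""
    rng = np.random.RandomState(seed)
    user_idx, _ = interactions.nonzero()                  # one entry per interaction
    users = user_idx[rng.randint(len(user_idx), size=num_samples)]
    items = rng.randint(interactions.shape[1], size=num_samples)
    hits = np.asarray(interactions[users, items]).ravel() > 0
    return float(hits.mean())
```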
Hey @benfred, thanks for the heads-up! I already do this in LightFM, so it makes sense to do this here as well (though, as you say, a native extension may be necessary). I am surprised the differences are so large; I would have imagined that not doing this would simply have acted as a mild regularizer on the model. Did you see this in SGD/backprop training? I remember implicit uses coordinate gradient descent (?) for ALS?
Great idea running those experiments! Thanks for doing it.
I just want to mention what I did to accomplish my task at hand. Since I needed to exclude negative samples, I implemented a negative train-test split routine using scipy's sparse matrix library. It's fairly efficient but it does have some limitations. I'm happy to go into the cost/benefit if someone thinks it would be useful.
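I can't speak to the exact routine, but a minimal sketch of a negative split in that spirit, built on scipy's sparse CSR format, might look like this (the helper name and the oversample-then-filter strategy are my own assumptions):

```python
import numpy as np
import scipy.sparse as sp

def sample_test_negatives(positives: sp.csr_matrix, per_user: int = 100, seed: int = 0) -> sp.csr_matrix:
    """For each user, pick up to `per_user` items that are not among their positives.

    The result is a fixed set of test negatives, so training-time sampling and
    evaluation can both exclude the same items."""
    rng = np.random.RandomState(seed)
    num_users, num_items = positives.shape
    rows, cols = [], []
    for user in range(num_users):
        known = set(positives.indices[positives.indptr[user]:positives.indptr[user + 1]])
        candidates = rng.randint(num_items, size=4 * per_user)      # oversample, then filter
        negatives = list(dict.fromkeys(int(i) for i in candidates if int(i) not in known))[:per_user]
        rows.extend([user] * len(negatives))
        cols.extend(negatives)
    data = np.ones(len(rows), dtype=np.float32)
    return sp.csr_matrix((data, (rows, cols)), shape=(num_users, num_items))
```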
@maciejkula: I was also surprised the results were so large; I wouldn't have thought that this would make that much of a difference either. I actually started testing this out when I noticed that the BPR model in LightFM was substantially outperforming the prediction accuracy of the BPR model I added in implicit. When looking into this, I found that most of the accuracy difference was because LightFM was removing true positives from the negative samples and I originally wasn't. I am using SGD for training the BPR model.
To verify this isn't just a problem with my code, I quickly ran some tests with LightFM and found a similar drop in P@K without the check (I hacked up the `in_positives` function to always return 0 to test out not verifying the negatives). Using a LightFM model like

    model = LightFM(learning_rate=0.05, loss='bpr', no_components=16)
    model.fit(train, epochs=20)

I got results like:
It's worth pointing out that I'm calculating P@K differently than you do in LightFM: I don't include the positive items from the training set in the ranking (that is, P@K is evaluated by ranking only the negative/missing items from the training set and comparing those ranks against the withheld positive items in the test set). Removing train-set positives when testing avoids some potential problems in evaluating models; I can go into much more depth on why I think this is necessary if you're interested =). Calculating P@K with the training set positives included in the ranking leads to these results for LightFM, which is probably more in line with what you've seen in your experiments:
The difference isn't quite as pronounced when calculating P@K like this, but it's still pretty large.
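To make the definition concrete, here is a small sketch of P@K computed the way described above, with the train positives removed from the ranking; the function and argument names are mine, not the LightFM or implicit evaluation code:

```python
import numpy as np

def precision_at_k(scores, train_positives, test_positives, k=10):
    """P@K for a single user: rank every item except the train positives, then
    count how many of the top-k ranked items are withheld test positives."""
    candidate_mask = np.ones(len(scores), dtype=bool)
    candidate_mask[list(train_positives)] = False     # drop train positives from the ranking
    candidates = np.flatnonzero(candidate_mask)
    top_k = candidates[np.argsort(-scores[candidates])[:k]]
    return float(np.isin(top_k, list(test_positives)).mean())
```

Scoring with the train positives left in would instead rank all of `scores`, which is the other definition compared above.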
Finally, the ALS model in implicit uses a Conjugate Gradient optimizer, which is well suited to the least squares loss. It also doesn't sample negative items, so it doesn't have this problem: instead, ALS treats all the unliked items as negatives, but exploits the structure of the loss so that the work only scales with the positive items (I talked about this model here https://www.benfrederickson.com/matrix-factorization/ and https://www.benfrederickson.com/fast-implicit-matrix-factorization/).
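For reference, a basic usage sketch of that model from the implicit library; the constructor defaults and the orientation of the matrix passed to `fit` have changed between implicit releases, so treat the details as illustrative rather than exact:

```python
import numpy as np
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

# Toy user x item confidence matrix; in practice this would hold play counts
# or other implicit-feedback weights.
user_items = sp.random(1_000, 5_000, density=0.01, format="csr", dtype=np.float32)

model = AlternatingLeastSquares(factors=64, regularization=0.01, iterations=15)
model.fit(user_items)   # recent implicit versions expect a user x item CSR matrix
```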
Hi @maciejkula,
Thanks for your awesome work!
I am now using it for my own research, and I have found a few small issues:
I think right now the project is not very friendly to custom datasets: user and item ids do not always start from 0, and sometimes they are unique strings like "AMEVO2LY6VEJA". I recommend using a dictionary to convert each unique identifier to an int; it would be nice if the project had a built-in data converter for this (see the sketch below).
For negative sampling, there is currently no guarantee that a sampled item is truly "negative". In other words, it can actually be a positive, though the probability is relatively low.
It would be good if Spotlight had more ranking metrics, such as Precision, Recall, AUC and MAP.
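On the first point, a minimal sketch of the kind of converter I have in mind (my own helper, not something that exists in Spotlight) for mapping raw identifiers like "AMEVO2LY6VEJA" to contiguous integer ids:

```python
import numpy as np

class IdMapper:
    """Maps arbitrary hashable identifiers to contiguous integer ids starting at 0."""

    def __init__(self):
        self._to_internal = {}
        self._to_original = []

    def fit_transform(self, raw_ids):
        internal = np.empty(len(raw_ids), dtype=np.int64)
        for i, raw in enumerate(raw_ids):
            if raw not in self._to_internal:
                self._to_internal[raw] = len(self._to_original)
                self._to_original.append(raw)
            internal[i] = self._to_internal[raw]
        return internal

    def inverse(self, internal_id):
        return self._to_original[internal_id]

# usage (hypothetical ids): IdMapper().fit_transform(["AMEVO2LY6VEJA", "B000X", "AMEVO2LY6VEJA"])
# -> array([0, 1, 0])
```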