Add Cornac BPR deep dive notebook #937

Merged · 15 commits · Sep 26, 2019
Conversation

@tqtg (Contributor) commented Sep 20, 2019

Description

Add Cornac Bayesian Personalized Ranking (BPR) deep dive notebook

Related Issues

#931

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.

@review-notebook-app
Check out this pull request on ReviewNB. You'll be able to see the Jupyter notebook diff and discuss changes. Powered by ReviewNB.

@tqtg tqtg changed the title Tqtg/cornac bpr tutorial Add Cornac BPR deep dive notebook Sep 20, 2019
@msftclas commented Sep 20, 2019

CLA assistant check
All CLA requirements met.

@tqtg (Contributor, Author) commented Sep 20, 2019

Hi @yueguoguo,

I see the failed checks. How can I get access to the logs to find out the reasons?

@gramhagen (Collaborator)

We currently don't have a way for external contributors to trigger the tests from the forked branch, so these aren't failures due to your code.

Thanks for putting this in. I'll take a look at it soon.

@miguelgfierro (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@miguelgfierro (Collaborator)

@eisber, I'm trying to use the trick you told me about, but I'm getting this error:

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

Have you seen this error before? staging shouldn't be excluded.

@gramhagen (Collaborator)

This is going to run on the fork, not staging, right? We would need to enable that somehow; we just disabled automatic triggers on pull requests.

@yueguoguo (Collaborator)

Thanks @tqtg for the contribution!

We may have to come back to this after sorting out some of the testing pipeline issues.

tests/integration/test_notebooks_python.py (outdated; resolved)
tests/smoke/test_notebooks_python.py (outdated; resolved)
item = rating_true.iloc[1]["itemID"]
assert preds[(preds["userID"] == user) & (preds["itemID"] == item)][
"prediction"
].values == pytest.approx(mf.rate(train_set.uid_map[user],
Collaborator:

Here we are cheating :-) because we are comparing two routines that are the same. What we want is to compare the prediction that comes from predict_rating with a df explicitly created by us as a fixture, so you would need to generate a new df with the result.

Question: any reason to select MF instead of BPR for the test?
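The fixture-based comparison suggested here can be sketched as follows. This is a minimal illustration of the idea, not the repo's actual test: all numbers, column names, and frames below are made-up assumptions; the point is that the expected values are written out by hand rather than recomputed with the routine under test.

```python
import numpy as np
import pandas as pd

# Hand-written expected predictions (the "fixture"): values are spelled
# out explicitly, NOT recomputed with the routine under test.
expected = pd.DataFrame({
    "userID": [1, 1, 2],
    "itemID": [10, 11, 10],
    "prediction": [3.0, 1.5, 4.5],
})

# Stand-in for the output of the model's prediction routine
# (illustrative numbers only).
preds = pd.DataFrame({
    "userID": [1, 2, 1],
    "itemID": [10, 10, 11],
    "prediction": [3.01, 4.49, 1.52],
})

# Align on (user, item) and compare values within a tolerance.
merged = preds.merge(expected, on=["userID", "itemID"], suffixes=("", "_expected"))
assert np.allclose(merged["prediction"], merged["prediction_expected"], atol=0.05)
```

Comparing against a frozen frame like this catches regressions in the prediction routine, which comparing two calls of the same routine cannot.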

Contributor Author:

I think this test is still valid. We are testing against ground-truth training data and the predictions from a trained MF model. In theory, MF should be able to overfit on this small input, so the approximation error here will be small.

Contributor Author:

MF is used instead of BPR because I want to reuse the tests from Surprise (MF is equivalent to SVD in Surprise). For the purpose of testing the utility functions, it should be fine.

Contributor Author:

Hi @miguelgfierro,

The tests are updated to use some evaluation metrics. It might not be that clean, since it involves the implementation of the metrics, but I think it's reliable enough for unit tests.

train_set = cornac.data.Dataset.from_uir(rating_true.itertuples(index=False), seed=42)
mf.fit(train_set)

preds = predict_ranking(mf, rating_true, remove_seen=True)
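The call above passes remove_seen=True, which filters out items the user already interacted with in training before ranking. A minimal sketch of that filtering idea, with made-up data and assumed column names (not the repo's implementation):

```python
import pandas as pd

# Training interactions the model has already "seen".
train = pd.DataFrame({"userID": [1, 1], "itemID": [10, 11]})

# Candidate predictions for user 1 over three items.
preds = pd.DataFrame({
    "userID": [1, 1, 1],
    "itemID": [10, 11, 12],
    "prediction": [0.9, 0.8, 0.7],
})

# Left-merge with indicator=True tags each row as 'both' (seen in
# training) or 'left_only' (unseen); keep only the unseen pairs.
merged = preds.merge(train, on=["userID", "itemID"], how="left", indicator=True)
unseen = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```

Here only item 12 survives, since (1, 10) and (1, 11) appear in the training data.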
Collaborator:

Same here: you would need to input the result df as a fixture and compare it with the output of predict_ranking.

Contributor Author:

updated!

assert preds["itemID"].dtypes == rating_true["itemID"].dtypes
user = preds.iloc[1]["userID"]
item = preds.iloc[1]["itemID"]
assert preds[(preds["userID"] == user) & (preds["itemID"] == item)][
Collaborator:

Same here.

Contributor Author:

updated!

notebooks/02_model/cornac_bpr_deep_dive.ipynb (outdated; resolved)
One usual approach to item recommendation is to directly predict a preference score $\hat{x}_{u,i}$ given to item $i$ by user $u$. BPR takes a different approach: it uses item pairs $(i, j)$ and optimizes for the correct ranking given the preferences of user $u$; thus there are notions of *positive* and *negative* items. The training data $D_S : U \times I \times I$ is defined as:

$$D_S := \{(u, i, j) \mid i \in I^{+}_{u} \wedge j \in I \setminus I^{+}_{u}\}$$
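The definition of $D_S$ can be sketched in a few lines: for each user, every observed (positive) item $i$ is paired with every unobserved (negative) item $j$. The toy item set and user histories below are made up for illustration, not taken from the notebook.

```python
items = {1, 2, 3, 4}                      # the item set I
positives = {"u1": {1, 2}, "u2": {3}}     # I+_u for each user u

# D_S := {(u, i, j) | i in I+_u and j in I \ I+_u}
D_S = [
    (u, i, j)
    for u, pos in positives.items()
    for i in pos                          # i ranges over I+_u
    for j in items - pos                  # j ranges over I \ I+_u
]
# ("u1", 1, 3) is in D_S, but ("u1", 1, 2) is not, since both 1 and 2
# are positive items for u1 and only positive/negative pairs are formed.
```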
Collaborator:

Would you mind checking whether the math is rendered correctly? I can't see it on GitHub.

Contributor Author:

This is rendered properly on my browser.

Contributor Author:

All the formulas are rendered correctly on GitHub.

tests/integration/test_notebooks_python.py (resolved)
notebooks/02_model/cornac_bpr_deep_dive.ipynb (outdated; resolved)
@miguelgfierro (Collaborator)

Hey @tqtg, I'm really impressed by how quickly you were able to understand our ways of working. You did a deep dive, added the tests, even used black. To be honest, not many people outside our team are able to ramp up that fast; even some folks at MS don't.

I would like to ask: how much time did you spend reading our documentation to be able to create the PR? Also, is there anything you think could be clearer, or anywhere there is too much or too little information?

@tqtg (Contributor, Author) commented Sep 24, 2019

> hey @tqtg I'm really impressed how quickly you were able to understand our ways of working. You did a deep dive, added the tests, even added black. To be honest, not many people outside our team are able to ramp up that fast, even some folks at MS don't do it.
>
> I would like to ask you, how much time did you spend reading our documentation to be able to create the PR? also, if you think there is something that can be clearer, also when there is too much or less info about anything

Thank you @miguelgfierro for your kind words.

To be honest, I didn't read the documentation but read through the codebase (it's quite easy to read and not that big, though :D). To create the notebook, I mainly relied on the other available notebooks and the existing utilities.

As for the whole project, to me it's quite clear and well structured, though some parts of the documentation are not rendered properly, I guess.

I hope to contribute more to the project because it's very useful for the community!

@yueguoguo (Collaborator)

> For the whole project, to me, it's quite clear and well structured, though, some part of the documentation is not rendered properly I guess.

Awesome @tqtg! Thanks again. Which part of the doc was rendered wrongly?

@tqtg (Contributor, Author) commented Sep 25, 2019

> For the whole project, to me, it's quite clear and well structured, though, some part of the documentation is not rendered properly I guess.

> Awesome @tqtg ! Thanks again. which part of the doc was rendered wrongly?

For example: https://microsoft-recommenders.readthedocs.io/en/latest/dataset.html
Only the last section, Knowledge graph utilities, appears, while the other sections (e.g., Recommendation datasets) don't. Is that expected or not?

@yueguoguo (Collaborator)

@tqtg good catch. It looks like the Read the Docs build needs a pre-configured environment to successfully parse the source files.

FYI @miguelgfierro

@miguelgfierro (Collaborator)

> Only the last section Knowledge graph utilities appears where other sections don't (e.g. Recommendation datasets). Is it something expected or not?

Good catch @tqtg, thanks.

notebooks/02_model/cornac_bpr_deep_dive.ipynb (resolved)
@@ -0,0 +1,595 @@
{
Collaborator:

"Now that our model is fitted" → "Now that our model is trained"

(Reply via ReviewNB)

Contributor Author:

updated!

@gramhagen
Copy link
Collaborator

Cool. OK @miguelgfierro, ready to merge?

@miguelgfierro (Collaborator) left a comment:

this is awesome!

@miguelgfierro (Collaborator)

@miguelgfierro miguelgfierro merged commit cb8c7b4 into recommenders-team:staging Sep 26, 2019
@miguelgfierro (Collaborator)

Hey @tqtg, for large contributions like yours, we add the authors to this list: https://github.com/microsoft/recommenders/blob/staging/AUTHORS.md#contributors--sorted-alphabetically

Feel free to add your name if you want.

@tqtg tqtg deleted the tqtg/cornac_bpr_tutorial branch September 26, 2019 11:01
@tqtg (Contributor, Author) commented Sep 26, 2019

@miguelgfierro: can you tell me more about how to add myself in? Would it be another PR from staging?

Labels: none · Projects: none · 5 participants