Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Add DCG and NDCG #7739

Merged
merged 27 commits into from Jun 8, 2017
Merged

[MRG] Add DCG and NDCG #7739

merged 27 commits into from Jun 8, 2017

Conversation

@davidgasquez
Copy link
Contributor

@davidgasquez davidgasquez commented Oct 24, 2016

Reference Issue

DCG and NDCG implementation. Issue #2805

What does this implement/fix? Explain your changes.

Add two scores, dcg_score and ndcg_score to compute Discounted Cumulative Gain (DCG) or the Normalized one (NDCG) at rank K.

Any other comments?

As a first time collaborator, let me know if I should add or change something!

return np.sum(gain / discounts)


def ndcg_score(ground_truth, predictions, k=5):

This comment has been minimized.

@TomDLT

TomDLT Oct 24, 2016
Member

you probably should use y_true and y_pred, or did I miss something?

"""
lb = LabelBinarizer()
lb.fit(range(len(predictions) + 1))
T = lb.transform(ground_truth)

This comment has been minimized.

@TomDLT

TomDLT Oct 24, 2016
Member

We generally avoid single-letter variable name

@TomDLT
Copy link
Member

@TomDLT TomDLT commented Oct 24, 2016

  • Could you add a reference link for these metrics?
  • Also, we will need some unit testing, that you can add in sklearn/metrics/tests/test_ranking.py.
  • And you can add the metrics in scikit-learn/sklearn/metrics/__init__.py
  • Ideally, you could write a small text on the documentation, with the maths formula and explaining when these metrics should be used.
>>> ground_truth = [1, 0, 2]
>>> predictions = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
>>> score = ndcg_score(ground_truth, predictions, k=2)
1.0

This comment has been minimized.

@TomDLT

TomDLT Oct 24, 2016
Member

If you store the result in score, there is no printing, which is why this doctest is failing

@davidgasquez
Copy link
Contributor Author

@davidgasquez davidgasquez commented Oct 27, 2016

Hey there @TomDLT! Thanks a lot for this useful tips for someone like me! I've tried to fix and improve the implementation a bit. I still have to:

  • Add reference links to the metrics
  • Include unit testing
  • Add the metrics in scikit-learn/sklearn/metrics/__init__.py

Again, I'd appreciate a lot if you want to let me know anything that I could improve. Thanks again! 😄

@davidgasquez
Copy link
Contributor Author

@davidgasquez davidgasquez commented Dec 14, 2016

Hey there @TomDLT! Sorry for the delay with this one. Would love to hear how it feels now that it has his own test and they are complete. 😄

Also, I'd love to get some input on which might be the next steps for me. Thanks for your help!

Copy link
Member

@TomDLT TomDLT left a comment

Thanks for the update, it looks much better!
You have to fix your unit test, it's failing.

Can you also add a small text on the documentation (doc/modules/model_evaluation.rst, in the ranking section), with the maths formula and explaining when these metrics should be used.

----------
.. [1] `Wikipedia entry for the Discounted Cumulative Gain
<https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`_
.. [2] `Gist about Ranking Metrics`

This comment has been minimized.

@TomDLT

TomDLT Dec 15, 2016
Member

I am not sure we want to keep this reference.

y_score, y_true = check_X_y(y_score, y_true)

lb = LabelBinarizer()
lb.fit(range(max(max(y_true) + 1, len(y_true))))

This comment has been minimized.

@TomDLT

TomDLT Dec 15, 2016
Member

why do you need this?
can you add a comment?

This comment has been minimized.

@davidgasquez

davidgasquez Jan 31, 2017
Author Contributor

This is to avoid skipping some test labels when the input is [0, 1, 3] in the test set but in the training step we were using [0, 1, 2, 3]

@davidgasquez
Copy link
Contributor Author

@davidgasquez davidgasquez commented Jan 31, 2017

Just pushed another series of small commits @TomDLT!

Can you also add a small text on the documentation (doc/modules/model_evaluation.rst, in the ranking section), with the maths formula and explaining when these metrics should be used.

Not sure if I'm the best one to do that as I've only used it once and briefly!

…into dcg_metrics
@davidgasquez
Copy link
Contributor Author

@davidgasquez davidgasquez commented Mar 24, 2017

Any updates on this?

@TomDLT
Copy link
Member

@TomDLT TomDLT commented Mar 24, 2017

If your pull-request is ready to be merged, change the prefix from [WIP] to [MRG], and wait for some reviews. I'll try to take a look shortly.

@davidgasquez davidgasquez changed the title [WIP] Add DCG and NDCG [MRG] Add DCG and NDCG Mar 24, 2017
@agramfort agramfort merged commit 419f21b into scikit-learn:master Jun 8, 2017
3 checks passed
3 checks passed
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@agramfort
Copy link
Member

@agramfort agramfort commented Jun 8, 2017

thanks @davidgasquez

I'll send a PR now to fix cosmits + update what's new

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
* Add DCG and NDCG ranking functions
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* Add DCG and NDCG ranking functions
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* Add DCG and NDCG ranking functions
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
* Add DCG and NDCG ranking functions
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
* Add DCG and NDCG ranking functions
AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
* Add DCG and NDCG ranking functions
@jnothman
Copy link
Member

@jnothman jnothman commented Oct 16, 2017

This appears to have been merged without review..? Issues at #9921 (comment), #9930, #9929, #9931

@jnothman
Copy link
Member

@jnothman jnothman commented Oct 16, 2017

I think we should consider reverting this for 0.19.1. It implements a ?non-standard definition of NDCG without sufficient documentation, and does so in a way that makes it unusable.

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
* Add DCG and NDCG ranking functions
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
* Add DCG and NDCG ranking functions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.