[MRG] Add DCG and NDCG #7739

Merged
merged 27 commits into from Jun 8, 2017

Conversation

Projects
None yet
4 participants
@davidgasquez
Contributor

davidgasquez commented Oct 24, 2016

Reference Issue

DCG and NDCG implementation. Issue #2805

What does this implement/fix? Explain your changes.

Add two scores, dcg_score and ndcg_score to compute Discounted Cumulative Gain (DCG) or the Normalized one (NDCG) at rank K.

Any other comments?

As a first time collaborator, let me know if I should add or change something!

sklearn/metrics/ranking.py
+ return np.sum(gain / discounts)
+
+
+def ndcg_score(ground_truth, predictions, k=5):

This comment has been minimized.

@TomDLT

TomDLT Oct 24, 2016

Member

you probably should use y_true and y_pred, or did I miss something?

@TomDLT

TomDLT Oct 24, 2016

Member

you probably should use y_true and y_pred, or did I miss something?

sklearn/metrics/ranking.py
+ """
+ lb = LabelBinarizer()
+ lb.fit(range(len(predictions) + 1))
+ T = lb.transform(ground_truth)

This comment has been minimized.

@TomDLT

TomDLT Oct 24, 2016

Member

We generally avoid single-letter variable name

@TomDLT

TomDLT Oct 24, 2016

Member

We generally avoid single-letter variable name

@TomDLT

This comment has been minimized.

Show comment
Hide comment
@TomDLT

TomDLT Oct 24, 2016

Member
  • Could you add a reference link for these metrics?
  • Also, we will need some unit testing, that you can add in sklearn/metrics/tests/test_ranking.py.
  • And you can add the metrics in scikit-learn/sklearn/metrics/__init__.py
  • Ideally, you could write a small text on the documentation, with the maths formula and explaining when these metrics should be used.
Member

TomDLT commented Oct 24, 2016

  • Could you add a reference link for these metrics?
  • Also, we will need some unit testing, that you can add in sklearn/metrics/tests/test_ranking.py.
  • And you can add the metrics in scikit-learn/sklearn/metrics/__init__.py
  • Ideally, you could write a small text on the documentation, with the maths formula and explaining when these metrics should be used.
sklearn/metrics/ranking.py
+ >>> ground_truth = [1, 0, 2]
+ >>> predictions = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
+ >>> score = ndcg_score(ground_truth, predictions, k=2)
+ 1.0

This comment has been minimized.

@TomDLT

TomDLT Oct 24, 2016

Member

If you store the result in score, there is no printing, which is why this doctest is failing

@TomDLT

TomDLT Oct 24, 2016

Member

If you store the result in score, there is no printing, which is why this doctest is failing

@davidgasquez

This comment has been minimized.

Show comment
Hide comment
@davidgasquez

davidgasquez Oct 27, 2016

Contributor

Hey there @TomDLT! Thanks a lot for this useful tips for someone like me! I've tried to fix and improve the implementation a bit. I still have to:

  • Add reference links to the metrics
  • Include unit testing
  • Add the metrics in scikit-learn/sklearn/metrics/__init__.py

Again, I'd appreciate a lot if you want to let me know anything that I could improve. Thanks again! 😄

Contributor

davidgasquez commented Oct 27, 2016

Hey there @TomDLT! Thanks a lot for this useful tips for someone like me! I've tried to fix and improve the implementation a bit. I still have to:

  • Add reference links to the metrics
  • Include unit testing
  • Add the metrics in scikit-learn/sklearn/metrics/__init__.py

Again, I'd appreciate a lot if you want to let me know anything that I could improve. Thanks again! 😄

@davidgasquez

This comment has been minimized.

Show comment
Hide comment
@davidgasquez

davidgasquez Dec 14, 2016

Contributor

Hey there @TomDLT! Sorry for the delay with this one. Would love to hear how it feels now that it has his own test and they are complete. 😄

Also, I'd love to get some input on which might be the next steps for me. Thanks for your help!

Contributor

davidgasquez commented Dec 14, 2016

Hey there @TomDLT! Sorry for the delay with this one. Would love to hear how it feels now that it has his own test and they are complete. 😄

Also, I'd love to get some input on which might be the next steps for me. Thanks for your help!

@TomDLT

Thanks for the update, it looks much better!
You have to fix your unit test, it's failing.

Can you also add a small text on the documentation (doc/modules/model_evaluation.rst, in the ranking section), with the maths formula and explaining when these metrics should be used.

sklearn/metrics/ranking.py
+ ----------
+ .. [1] `Wikipedia entry for the Discounted Cumulative Gain
+ <https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`_
+ .. [2] `Gist about Ranking Metrics`

This comment has been minimized.

@TomDLT

TomDLT Dec 15, 2016

Member

I am not sure we want to keep this reference.

@TomDLT

TomDLT Dec 15, 2016

Member

I am not sure we want to keep this reference.

sklearn/metrics/ranking.py
+ y_score, y_true = check_X_y(y_score, y_true)
+
+ lb = LabelBinarizer()
+ lb.fit(range(max(max(y_true) + 1, len(y_true))))

This comment has been minimized.

@TomDLT

TomDLT Dec 15, 2016

Member

why do you need this?
can you add a comment?

@TomDLT

TomDLT Dec 15, 2016

Member

why do you need this?
can you add a comment?

This comment has been minimized.

@davidgasquez

davidgasquez Jan 31, 2017

Contributor

This is to avoid skipping some test labels when the input is [0, 1, 3] in the test set but in the training step we were using [0, 1, 2, 3]

@davidgasquez

davidgasquez Jan 31, 2017

Contributor

This is to avoid skipping some test labels when the input is [0, 1, 3] in the test set but in the training step we were using [0, 1, 2, 3]

@davidgasquez

This comment has been minimized.

Show comment
Hide comment
@davidgasquez

davidgasquez Jan 31, 2017

Contributor

Just pushed another series of small commits @TomDLT!

Can you also add a small text on the documentation (doc/modules/model_evaluation.rst, in the ranking section), with the maths formula and explaining when these metrics should be used.

Not sure if I'm the best one to do that as I've only used it once and briefly!

Contributor

davidgasquez commented Jan 31, 2017

Just pushed another series of small commits @TomDLT!

Can you also add a small text on the documentation (doc/modules/model_evaluation.rst, in the ranking section), with the maths formula and explaining when these metrics should be used.

Not sure if I'm the best one to do that as I've only used it once and briefly!

@davidgasquez

This comment has been minimized.

Show comment
Hide comment
@davidgasquez

davidgasquez Mar 24, 2017

Contributor

Any updates on this?

Contributor

davidgasquez commented Mar 24, 2017

Any updates on this?

@TomDLT

This comment has been minimized.

Show comment
Hide comment
@TomDLT

TomDLT Mar 24, 2017

Member

If your pull-request is ready to be merged, change the prefix from [WIP] to [MRG], and wait for some reviews. I'll try to take a look shortly.

Member

TomDLT commented Mar 24, 2017

If your pull-request is ready to be merged, change the prefix from [WIP] to [MRG], and wait for some reviews. I'll try to take a look shortly.

@davidgasquez davidgasquez changed the title from [WIP] Add DCG and NDCG to [MRG] Add DCG and NDCG Mar 24, 2017

davidgasquez added some commits Mar 31, 2017

@agramfort agramfort merged commit 419f21b into scikit-learn:master Jun 8, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@agramfort

This comment has been minimized.

Show comment
Hide comment
@agramfort

agramfort Jun 8, 2017

Member

thanks @davidgasquez

I'll send a PR now to fix cosmits + update what's new

Member

agramfort commented Jun 8, 2017

thanks @davidgasquez

I'll send a PR now to fix cosmits + update what's new

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG] Add DCG and NDCG (#7739)
* Add DCG and NDCG ranking functions

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG] Add DCG and NDCG (#7739)
* Add DCG and NDCG ranking functions

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG] Add DCG and NDCG (#7739)
* Add DCG and NDCG ranking functions

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG] Add DCG and NDCG (#7739)
* Add DCG and NDCG ranking functions

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG] Add DCG and NDCG (#7739)
* Add DCG and NDCG ranking functions

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

[MRG] Add DCG and NDCG (#7739)
* Add DCG and NDCG ranking functions
@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 16, 2017

Member

This appears to have been merged without review..? Issues at #9921 (comment), #9930, #9929, #9931

Member

jnothman commented Oct 16, 2017

This appears to have been merged without review..? Issues at #9921 (comment), #9930, #9929, #9931

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 16, 2017

Member

I think we should consider reverting this for 0.19.1. It implements a ?non-standard definition of NDCG without sufficient documentation, and does so in a way that makes it unusable.

Member

jnothman commented Oct 16, 2017

I think we should consider reverting this for 0.19.1. It implements a ?non-standard definition of NDCG without sufficient documentation, and does so in a way that makes it unusable.

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG] Add DCG and NDCG (#7739)
* Add DCG and NDCG ranking functions

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG] Add DCG and NDCG (#7739)
* Add DCG and NDCG ranking functions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment