[MRG+1] Add Normalized Discounted Cumulative Gain #9951
Conversation
Thanks!
Please try to make use of, or extend, test_common.py. Narrative docs in doc/modules/model_evaluation.rst should be added, and probably a scorer in metrics/scorers.py.
I've not yet looked at tests and implementation in detail.
sklearn/metrics/__init__.py (outdated)

```diff
@@ -117,5 +120,5 @@
     'silhouette_score',
     'v_measure_score',
     'zero_one_loss',
-    'brier_score_loss',
+    'brier_score_loss'
```
Please don't make unrelated changes if you can help it!
sklearn/metrics/ranking.py (outdated)

```python
    Parameters
    ----------
    y_true : array, shape = [n_samples, n_labels]
        True labels.
```
Are these classes (possibly strings)? Ints? Floats?
sklearn/metrics/ranking.py
Outdated
""" | ||
if y_true.shape != y_score.shape: | ||
raise ValueError("y_true and y_score have different shapes") | ||
y_true = np.atleast_2d(y_true) |
Usually we'd be more explicit, using something like check_array and check_consistent_length.
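For illustration, a minimal sketch of the more explicit validation being suggested, using scikit-learn's public utilities (the helper name `_check_dcg_inputs` is hypothetical; the PR's actual code may differ):

```python
import numpy as np
from sklearn.utils import check_array, check_consistent_length

def _check_dcg_inputs(y_true, y_score):
    # validate array-likes and coerce to numeric ndarrays
    y_true = check_array(y_true, ensure_2d=False)
    y_score = check_array(y_score, ensure_2d=False)
    # raise a clear error if the number of samples differs
    check_consistent_length(y_true, y_score)
    return np.atleast_2d(y_true), np.atleast_2d(y_score)
```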
sklearn/metrics/ranking.py (outdated)

```python
    Parameters
    ----------
    y_true : array, shape = [n_samples, n_labels]
        True labels.
```
Type?
Codecov Report

```
@@            Coverage Diff             @@
##           master    #9951      +/-   ##
==========================================
+ Coverage   96.17%   96.17%    +<.01%
==========================================
  Files         336      336
  Lines       62613    62674       +61
==========================================
+ Hits        60218    60279       +61
  Misses       2395     2395
```
It would be good if you added to the PR description a list of tasks you intend to complete before changing WIP to MRG.
I would like to see tests including:
- known toy examples (e.g. from a reference paper or easy to calculate by hand)
- boundary cases (all scores equal for some samples, perfect score)
- perhaps invariants due to perturbing perfect y_score
And please add narrative docs.
Also, is there any value in supporting multiclass inputs, then binarized, as the previous implementation attempted to?
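To make the first request concrete, here is a sketch of what hand-computed toy tests could look like (the example values and test names are mine, not from the PR; they use the public function this PR adds):

```python
import numpy as np
import pytest
from sklearn.metrics import ndcg_score

def test_ndcg_toy_example():
    y_true = np.asarray([[0, 1, 2]])   # graded relevance judgements
    y_score = np.asarray([[2, 1, 0]])  # ranks the labels in order 0, 1, 2
    # DCG places the gains 0, 1, 2 at ranks 1, 2, 3:
    dcg = 0 / np.log2(2) + 1 / np.log2(3) + 2 / np.log2(4)
    # the ideal ranking places the gains 2, 1, 0 instead:
    ideal = 2 / np.log2(2) + 1 / np.log2(3) + 0 / np.log2(4)
    assert ndcg_score(y_true, y_score) == pytest.approx(dcg / ideal)

def test_ndcg_perfect_score():
    y_true = np.asarray([[3, 1, 2, 0]])
    # scoring with the ground truth itself must give the perfect score of 1
    assert ndcg_score(y_true, y_true) == pytest.approx(1.0)
```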
```python
    Returns
    -------
    normalized_discounted_cumulative_gain : float in [0., 1.]
        The averaged NDCG scores for all samples.
```
Please add an Examples section where you demonstrate a simple invocation.
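Such an Examples section might look like the following (a sketch; the exact values shown are mine):

```python
>>> import numpy as np
>>> from sklearn.metrics import ndcg_score
>>> # ground-truth relevance of some answers to a query
>>> y_true = np.asarray([[10, 0, 0, 1, 5]])
>>> # scores predicted for the same answers
>>> y_score = np.asarray([[.1, .2, .3, 4, 70]])
>>> ndcg_score(y_true, y_score)
0.69...
```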
sklearn/metrics/ranking.py
Outdated
"multiclass-multioutput"): | ||
raise ValueError("{0} format is not supported".format(y_type)) | ||
|
||
ranking = np.argsort(y_score)[:, ::-1] |
Should we be using rankdata to handle ties?
It's true that we should handle ties.

Averaging the ranks of equally scored results may not work, because the summation of gains has to be cut off at k (we need to know how many elements of a tied group fall beyond k). In

Computing Information Retrieval Performance Measures Efficiently in the Presence of Tied Scores. Marc Najork, Frank McSherry. ECIR, 2008

the authors average the true gains of the results in a tied group before multiplying by the discount (discounts beyond k are 0).
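A toy illustration of that tie-averaging scheme (the numbers are mine, purely for illustration):

```python
import numpy as np

# k = 2: discounts are 1 / log2(rank + 1) within the cutoff and 0 beyond it
discounts = np.array([1.0, 1.0 / np.log2(3.0), 0.0])
# three results tied on score, with true gains 3, 0, 0; one of them
# necessarily falls beyond k, but we cannot know which one
gains = np.array([3.0, 0.0, 0.0])
# averaging the gains within the tied group before applying the discounts
# makes the result independent of how the tie happens to be broken
tie_averaged_dcg = (gains.mean() * discounts).sum()
print(tie_averaged_dcg)  # ~1.63 for every permutation of the tied gains
```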
Sorry for the late answer. TODO list:
@jeromedockes Thanks for working on this! A few comments are below.
You can also edit the first post of this PR and include the above todo list there, so it would be included in the PR summary view (cf. the GitHub docs).
sklearn/metrics/ranking.py (outdated)

```python
        The NDCG score for each sample (float in [0., 1.]).

    References
    ----------
```
These are internal functions, and docs won't be built for them, so I think you could remove the References section here and in _dcg_sample_scores, particularly since these references can be found in the corresponding public functions.
```diff
@@ -30,6 +31,7 @@
 from sklearn.metrics import label_ranking_loss
 from sklearn.metrics import roc_auc_score
 from sklearn.metrics import roc_curve
+from sklearn.metrics.ranking import _ndcg_sample_scores, _dcg_sample_scores
```
Would it be possible to test the public (score-averaged) functions in addition to the private ones? They are tested in common tests (with respect to symmetry invariance etc.), but there are currently no tests verifying that ndcg_score and dcg_score produce the right values.
I'm having a hard time implementing this efficiently. I have tried writing the loop explicitly, and writing it as a dot product with a sparse block-diagonal matrix, but it takes a long time. It must not take a long time, because in the vast majority of cases there shouldn't be any ties: since this is a metric for evaluating a ranking, the scores computed by the estimator should indeed induce an ordering on the labels. For example, if we are scoring a document retrieval or recommendation system, its scores should allow it to decide in which order to display results for a user, so there shouldn't be ties, at least among the relevant results. I'll start working on improving the tests in the meanwhile.
Assuming we deal with one row (i.e. y_score and y_true are vectors) at a time, I think you can do the tie handling with something like:

```python
_, inv, count = np.unique(y_score, return_inverse=True, return_counts=True)
n_unique = len(count)
ranked = np.zeros(n_unique)
np.add.at(ranked, inv, y_true)  # or ranked = np.bincount(inv, weights=y_true, minlength=n_unique)
ranked /= count
```

I'm not sure if this is more efficient than what you've experimented with... If this slows things down a great deal, we can eventually optimise in a way that fast-paths the all-unique-scores case.
sklearn/metrics/ranking.py (outdated)

```python
    ranked = y_true[np.arange(ranking.shape[0])[:, np.newaxis], ranking]
    if k is not None:
        ranked = ranked[:, :k]
    discount = 1 / (np.log(np.arange(ranked.shape[1]) + 2) / np.log(log_basis))
```
`np.arange(2, k + 2)` would be clearer.
Ah sorry, I forgot to make the rank descending with respect to scores... just do
Btw, if you choose a solution with
Just a few nitpicks.
Thanks @jeremiedbb!
I'm wondering if the cost of using
After timing a few examples, actually I am not seeing such big differences anymore; maybe we can remove the `ignore_ties` parameter:

```python
import time
import numpy as np
from sklearn.metrics.ranking import ndcg_score

y_true = np.random.randn(10000, 100)
y_score = np.random.randn(*y_true.shape)
# y_true = np.random.binomial(5, .2, (10000, 100))
# y_score = np.random.binomial(5, .2, y_true.shape)

start = time.time()
dcg = ndcg_score(y_true, y_score)
stop = time.time()
print('with ties:', stop - start)

start = time.time()
dcg_ignore_ties = ndcg_score(y_true, y_score, ignore_ties=True)
stop = time.time()
print('ignore ties:', stop - start)
```

Trying with a few different sizes, I see a speedup around 5x in some cases, but not much more.
We can keep this parameter. It gives a bit more flexibility, and 5x is not bad. Besides, it only adds 3 lines to the code, and it's the easiest part of the code to follow.
I just added a small request. Besides that, LGTM!
```python
def _tie_averaged_dcg(y_true, y_score, discount_cumsum):
    _, inv, counts = np.unique(
```
I think this function deserves a comment about what it does (and how), because it's not easy to follow.
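For instance, a commented sketch of the tie-averaged DCG computation, reconstructed from the fragments quoted in this thread and the Najork & McSherry scheme discussed above (the PR's final code may differ in details):

```python
import numpy as np

def _tie_averaged_dcg(y_true, y_score, discount_cumsum):
    # Group the results by tied score. np.unique sorts ascending, so
    # negate the scores to get the groups in ranking (descending) order.
    _, inv, counts = np.unique(-y_score, return_inverse=True, return_counts=True)
    # Average the true gains within each tied group.
    ranked = np.zeros(len(counts))
    np.add.at(ranked, inv, y_true)
    ranked /= counts
    # groups[i] is the last rank position occupied by tied group i, so
    # consecutive differences of the cumulated discounts give the total
    # discount applied to each group (discounts beyond k are 0).
    groups = np.cumsum(counts) - 1
    discount_sums = np.empty(len(counts))
    discount_sums[0] = discount_cumsum[groups[0]]
    discount_sums[1:] = np.diff(discount_cumsum[groups])
    return (ranked * discount_sums).sum()
```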
I think "basis" is not the right term... log_base might be better than log_basis but it might be best to check other parts of the library / ecosystem.
Thanks
sklearn/metrics/ranking.py (outdated)

```python
    ignore_ties : bool, optional (default=False)
        Assume that there are no ties in y_score (which is likely to be the
        case if y_score is continuous) for performance gains.
```
"Performance" is ambiguous; use "efficiency".
```python
    """
    gain = _dcg_sample_scores(y_true, y_score, k, ignore_ties=ignore_ties)
    normalizing_gain = _dcg_sample_scores(y_true, y_true, k, ignore_ties=True)
```
Please comment on why it is safe to ignore_ties here.
sklearn/metrics/ranking.py (outdated)

```python
    np.add.at(ranked, inv, y_true)
    ranked /= counts
    groups = np.cumsum(counts) - 1
    discount_sums = np.zeros(len(counts))
```
Use `np.empty` instead of `np.zeros`:

```diff
-discount_sums = np.zeros(len(counts))
+discount_sums = np.empty(len(counts))
```
```python
        -.2, .2, size=y_score.shape)
    assert _dcg_sample_scores(y_true, y_score) == pytest.approx(
        3 / np.log2(np.arange(2, 7)))
    assert _dcg_sample_scores(y_true, y_score) == pytest.approx(
```
Can we use pytest.mark.parametrize to test the ignore_ties equivalence?
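A sketch of what such a parametrized equivalence test could look like (the test name and values are mine, not the PR's):

```python
import numpy as np
import pytest
from sklearn.metrics import ndcg_score

@pytest.mark.parametrize("k", [None, 3])
def test_ndcg_ignore_ties_equivalence(k):
    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 4, size=(5, 10)).astype(float)
    # continuous random scores are tie-free with probability 1, so the
    # fast path and the tie-aware path must agree
    y_score = rng.randn(5, 10)
    assert ndcg_score(y_true, y_score, k=k, ignore_ties=True) == pytest.approx(
        ndcg_score(y_true, y_score, k=k, ignore_ties=False))
```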
```python
def test_ndcg_ignore_ties_with_k():
    a = np.arange(12).reshape((2, 6))
    ndcg_score(a, a, k=3, ignore_ties=True)
```
Shouldn't this be ensuring the result is the same as with ignore_ties=False?
Thanks, "base" is indeed the right term (used in the references, in /benchmarks/bench_isotonic.py, and everywhere else -- I don't know why I wrote "basis") |
@jnothman Do you have other changes to request?
I'm happy with the API so will merge on the basis of the existing approvals
Thanks @jeromedockes
Thanks!
After #9921, it was decided that the old implementation of NDCG would be removed (#9932), but that a new one might be useful.
Discounted Cumulative Gain and Normalized Discounted Cumulative Gain are popular ranking metrics (https://en.wikipedia.org/wiki/Discounted_cumulative_gain).
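For reference, with $rel_i$ denoting the true gain of the item that y_score ranks at position $i$, the usual definitions (notation mine) are

$$\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i + 1)}, \qquad \mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k},$$

where $\mathrm{IDCG@}k$ is the $\mathrm{DCG@}k$ of the ideal ranking, i.e. the one obtained by sorting on y_true itself.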
TODO: