
[MRG+1] Add Normalized Discounted Cumulative Gain #9951

Merged: 67 commits into scikit-learn:master from jeromedockes:add_ndcg on Aug 13, 2019

Conversation

jeromedockes (Contributor) commented Oct 18, 2017

After #9921, it was decided that the old implementation of NDCG would be removed (#9932), but that a new one might be useful.

Discounted Cumulative Gain and Normalized Discounted Cumulative Gain are popular ranking metrics (https://en.wikipedia.org/wiki/Discounted_cumulative_gain)

TODO:

  • handle ties correctly
  • test on toy examples
  • test boundary cases (all scores equal for some samples, perfect score)
  • test invariants due to perturbing perfect y_score
  • add narrative docs.
  • examples section in the docstring
  • add public functions to tests
jnothman (Member) left a comment

Thanks!

Please try to make use of, or extend, test_common.py. Narrative docs should be added in doc/modules/model_evaluation.rst, and probably a scorer in metrics/scorer.py.

I've not yet looked at tests and implementation in detail.
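For reference, a hypothetical sketch of what such a scorer registration could look like, following the make_scorer pattern used in sklearn/metrics/scorer.py; needs_threshold=True is an assumption that continuous decision values would serve as y_score:

from sklearn.metrics import make_scorer, ndcg_score

# Hypothetical sketch, not the PR's final wiring: ask the scorer to feed
# decision_function / predict_proba output to ndcg_score as y_score.
ndcg_scorer = make_scorer(ndcg_score, needs_threshold=True)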

@@ -117,5 +120,5 @@
     'silhouette_score',
     'v_measure_score',
     'zero_one_loss',
-    'brier_score_loss',
+    'brier_score_loss'

jnothman (Member) commented Oct 18, 2017

Please don't make unrelated changes if you can help it!

Parameters
----------
y_true : array, shape = [n_samples, n_labels]
    True labels.

jnothman (Member) commented Oct 18, 2017

Are these classes (possibly strings)? Ints? Floats?

"""
if y_true.shape != y_score.shape:
raise ValueError("y_true and y_score have different shapes")
y_true = np.atleast_2d(y_true)

jnothman (Member) commented Oct 18, 2017

Usually we'd be more explicit using something like check_array and check_consistent_length
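A minimal sketch of that more explicit validation (the helper name is hypothetical; check_array and check_consistent_length are the real sklearn.utils functions):

import numpy as np
from sklearn.utils import check_array, check_consistent_length

def _check_dcg_inputs(y_true, y_score):
    # check_array rejects sparse input and enforces numeric dtype;
    # ensure_2d=False keeps 1-D vectors valid before atleast_2d below
    y_true = check_array(y_true, ensure_2d=False)
    y_score = check_array(y_score, ensure_2d=False)
    # verify that both arrays have the same number of samples
    check_consistent_length(y_true, y_score)
    return np.atleast_2d(y_true), np.atleast_2d(y_score)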

codecov bot commented Oct 19, 2017

Codecov Report

Merging #9951 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #9951      +/-   ##
==========================================
+ Coverage   96.17%   96.17%   +<.01%     
==========================================
  Files         336      336              
  Lines       62613    62674      +61     
==========================================
+ Hits        60218    60279      +61     
  Misses       2395     2395
Impacted Files Coverage Δ
sklearn/metrics/__init__.py 100% <100%> (ø) ⬆️
sklearn/metrics/ranking.py 98.93% <100%> (+0.12%) ⬆️
sklearn/metrics/tests/test_ranking.py 100% <100%> (ø) ⬆️
sklearn/feature_selection/base.py 94.76% <0%> (ø) ⬆️
sklearn/tests/test_pipeline.py 99.64% <0%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 98c4db3...dbf3826.

TomDLT added the New Feature label Oct 20, 2017
jnothman (Member) commented Nov 1, 2017

It would be good if you added to the PR description a list of tasks you intend to complete before changing WIP to MRG.

jnothman (Member) left a comment

I would like to see tests including:

  • known toy examples (e.g. from a reference paper or easy to calculate by hand)
  • boundary cases (all scores equal for some samples, perfect score)
  • perhaps invariants due to perturbing perfect y_score

And please add narrative docs.

Also, is there any value in supporting multiclass inputs, then binarized, as the previous implementation attempted to?

Returns
-------
normalized_discounted_cumulative_gain : float in [0., 1.]
    The averaged NDCG scores for all samples.

jnothman (Member) commented Nov 7, 2017

Please add an Examples section where you demonstrate a simple invocation.
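One possible Examples section along these lines (a sketch; the relevance values are illustrative, and the quoted output, approximately 0.69, assumes linear gain with binary log discounts):

>>> import numpy as np
>>> from sklearn.metrics import ndcg_score
>>> # ground-truth relevance of some answers to a query
>>> true_relevance = np.asarray([[10, 0, 0, 1, 5]])
>>> # scores predicted for the same answers
>>> scores = np.asarray([[.1, .2, .3, 4, 70]])
>>> ndcg_score(true_relevance, scores)
0.69...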

"multiclass-multioutput"):
raise ValueError("{0} format is not supported".format(y_type))

ranking = np.argsort(y_score)[:, ::-1]

jnothman (Member) commented Nov 7, 2017

should we be using rankdata to handle ties?

jeromedockes (Contributor, Author) commented Nov 7, 2017

It's true that we should handle ties.

Averaging the ranks of equally scored results may not work, because the summation of gains has to be cut off at k (we need to know how many elements of a tied group fall beyond k). In

Computing Information Retrieval Performance Measures Efficiently in the Presence of Tied Scores. Marc Najork, Frank McSherry. ECIR, 2008

the authors average the true gains of results in a tied group before multiplying by the discount (discounts beyond k are 0).
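For concreteness, here is a single-sample sketch of that scheme (a hypothetical helper assuming linear gain and binary log discounts, not the code in this PR):

import numpy as np

def tie_averaged_dcg(y_true, y_score, k=None):
    # Negate scores so np.unique's ascending sort yields tied groups in
    # descending-score order; inv[i] is the group index of item i.
    _, inv, counts = np.unique(-np.asarray(y_score),
                               return_inverse=True, return_counts=True)
    # Average the true gains within each tied group.
    group_gain = np.bincount(inv, weights=y_true) / counts
    # Logarithmic discounts for ranks 1..n, zeroed beyond the cutoff k.
    discount = 1.0 / np.log2(np.arange(len(y_true)) + 2)
    if k is not None:
        discount[k:] = 0.0
    # Total discount mass covered by each group's block of rank positions.
    cum = np.concatenate(([0.0], np.cumsum(discount)))
    ends = np.cumsum(counts)
    group_discount = cum[ends] - cum[ends - counts]
    return float(group_gain @ group_discount)

With no ties this reduces to the ordinary DCG; with all scores tied it returns the mean gain times the total discount mass up to k.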

jeromedockes (Contributor, Author) commented Nov 7, 2017

Sorry for the late answer. TODO list:

  • handle ties correctly
  • test on toy examples
  • test boundary cases (all scores equal for some samples, perfect score)
  • test invariants due to perturbing perfect y_score
  • add narrative docs.
  • examples section in the docstring
  • allow multiclass inputs?
  • add public functions to tests
rth (Member) left a comment

@jeromedockes Thanks for working on this! A few comments are below.

You can also edit your first post of this PR and include the above todo list there so it would be included in the PR summary view (cf github docs).

The NDCG score for each sample (float in [0., 1.]).
References
----------

rth (Member) commented Nov 9, 2017

These are internal functions, and docs won't be built for them, so I think you could remove the References section here and in _dcg_sample_scores, particularly since these references can be found in the corresponding public functions.

@@ -30,6 +31,7 @@
 from sklearn.metrics import label_ranking_loss
 from sklearn.metrics import roc_auc_score
 from sklearn.metrics import roc_curve
+from sklearn.metrics.ranking import _ndcg_sample_scores, _dcg_sample_scores

rth (Member) commented Nov 9, 2017

Would it be possible to test the public (score-averaged) functions in addition to the private ones? They are tested in common tests (with respect to symmetry invariance etc.), but there are currently no tests verifying that ndcg_score and dcg_score produce the right values.

jeromedockes (Contributor, Author) commented Nov 19, 2017

the authors average the true gains of results in a tied group before multiplying by the discount (discounts beyond k are 0)

I'm having a hard time implementing this efficiently. I have tried writing the loop explicitly, and writing it as a dot product with a sparse block-diagonal matrix, but both take a long time. It must not take a long time, because in the vast majority of cases there shouldn't be any ties: since this is a metric for evaluating a ranking, the scores computed by the estimator should induce an ordering on the labels. For example, if we are scoring a document retrieval or recommendation system, its scores should allow it to decide in which order to display results for a user, so there shouldn't be ties, at least among the relevant results.

I'll start working on improving the tests in the meantime.

jnothman (Member) left a comment

Assuming we deal with one row (i.e. y_score and y_true are vectors) at a time, I think you can do the tie handling with something like:

_, inv, count = np.unique(y_score, return_inverse=True, return_counts=True)
n_unique = len(count)
ranked = np.zeros(n_unique)
np.add.at(ranked, inv, y_true)  # or ranked = np.bincount(inv, weights=y_true, minlength=n_unique)
ranked /= count

I'm not sure if this is more efficient than what you've experimented with... If this slows things down a great deal, we can eventually optimise in a way that fast-paths the all-unique-scores case.

ranked = y_true[np.arange(ranking.shape[0])[:, np.newaxis], ranking]
if k is not None:
    ranked = ranked[:, :k]
discount = 1 / (np.log(np.arange(ranked.shape[1]) + 2) / np.log(log_basis))

jnothman (Member) commented Nov 29, 2017

np.arange(2, k + 2) would be clearer.
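Applying that suggestion (and simplifying 1 / (log(x) / log(b)) to log(b) / log(x)), the line could read, assuming ranked has already been cut off at k:

# equivalent to 1 / log_basis-base logarithm of the ranks 2..n+1
discount = np.log(log_basis) / np.log(np.arange(2, ranked.shape[1] + 2))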

jnothman (Member) commented Nov 29, 2017

Ah sorry I forgot to make the rank descending with respect to scores... just do ranked = ranked[:, ::-1][:, :k]

jnothman (Member) commented Nov 29, 2017

Btw, if you choose a solution with unique's return_counts, there's a backport for old numpy versions in #9976 that can be merged into this PR.

glemaitre changed the title from "[MRG] Add Normalized Discounted Cumulative Gain" to "[MRG+1] Add Normalized Discounted Cumulative Gain" Jul 19, 2019
jeremiedbb (Member) left a comment

just a few nitpicks

doc/modules/model_evaluation.rst
doc/whats_new/v0.22.rst
sklearn/metrics/tests/test_ranking.py
jeromedockes (Contributor, Author) commented Jul 19, 2019

thanks @jeremiedbb!

jeremiedbb (Member) commented Jul 19, 2019

I'm wondering whether the cost of using np.unique to determine whether we can take the fast path or need the general method would be too high. That could avoid having the ignore_ties parameter.

jeromedockes (Contributor, Author) commented Jul 19, 2019

I'm wondering whether the cost of using np.unique to determine whether we can take the fast path or need the general method would be too high. That could avoid having the ignore_ties parameter.

After timing a few examples, I am actually not seeing such big differences anymore; maybe we can remove the ignore_ties option altogether:

import time

import numpy as np

from sklearn.metrics.ranking import ndcg_score


y_true = np.random.randn(10000, 100)
y_score = np.random.randn(*y_true.shape)

# y_true = np.random.binomial(5, .2, (10000, 100))
# y_score = np.random.binomial(5, .2, y_true.shape)

start = time.time()
dcg = ndcg_score(y_true, y_score)
stop = time.time()
print('with ties:', stop - start)

start = time.time()
dcg_ignore_ties = ndcg_score(y_true, y_score, ignore_ties=True)
stop = time.time()
print('ignore ties:', stop - start)

Trying a few different sizes, I see a speedup of around 5x in some cases, but not much more.

jeremiedbb (Member) left a comment

We can keep this parameter. It gives a bit more flexibility, and 5x is not bad. Besides, it only adds 3 lines, and it's the easiest part of the code to follow.

I just added a small request. Besides that, LGTM!



def _tie_averaged_dcg(y_true, y_score, discount_cumsum):
    _, inv, counts = np.unique(

jeremiedbb (Member) commented Jul 19, 2019

I think this function deserves a comment about what it does (and how), because it's not easy to follow.

jnothman (Member) left a comment

I think "basis" is not the right term... log_base might be better than log_basis but it might be best to check other parts of the library / ecosystem.

Thanks

ignore_ties : bool, optional (default=False)
    Assume that there are no ties in y_score (which is likely to be the
    case if y_score is continuous) for performance gains.

jnothman (Member) commented Jul 25, 2019

"Performance" is ambiguous. Use "efficiency".

"""
gain = _dcg_sample_scores(y_true, y_score, k, ignore_ties=ignore_ties)
normalizing_gain = _dcg_sample_scores(y_true, y_true, k, ignore_ties=True)

jnothman (Member) commented Jul 25, 2019

Please comment why it is safe to ignore_ties here
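One possible comment (a sketch of the reasoning, not the PR's final wording):

# y_true is ranked against itself to obtain the ideal (maximal) DCG.
# Ties are harmless here: tied entries of y_true have equal gains, so
# any ordering within a tied group yields the same DCG, and the cheaper
# ignore_ties=True path is exact.
normalizing_gain = _dcg_sample_scores(y_true, y_true, k, ignore_ties=True)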

np.add.at(ranked, inv, y_true)
ranked /= counts
groups = np.cumsum(counts) - 1
discount_sums = np.zeros(len(counts))

jnothman (Member) commented Jul 25, 2019

Use:

-discount_sums = np.zeros(len(counts))
+discount_sums = np.empty(len(counts))
    -.2, .2, size=y_score.shape)
assert _dcg_sample_scores(y_true, y_score) == pytest.approx(
    3 / np.log2(np.arange(2, 7)))
assert _dcg_sample_scores(y_true, y_score) == pytest.approx(

jnothman (Member) commented Jul 25, 2019

Can we use pytest.mark.parametrize to test the ignore_ties equivalence?
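A hypothetical shape for such a test (names and sizes are illustrative; random continuous scores are assumed to be tie-free, so both code paths must agree):

import numpy as np
import pytest
from sklearn.metrics import dcg_score, ndcg_score

@pytest.mark.parametrize("score_func", [dcg_score, ndcg_score])
@pytest.mark.parametrize("k", [None, 3])
def test_score_ignore_ties_equivalence(score_func, k):
    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 4, size=(5, 8)).astype(float)
    y_score = rng.randn(5, 8)  # continuous, so no ties in practice
    assert score_func(y_true, y_score, k=k, ignore_ties=True) == pytest.approx(
        score_func(y_true, y_score, k=k, ignore_ties=False))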


def test_ndcg_ignore_ties_with_k():
    a = np.arange(12).reshape((2, 6))
    ndcg_score(a, a, k=3, ignore_ties=True)

jnothman (Member) commented Jul 25, 2019

Shouldn't this be ensuring the result is the same as with ignore_ties=False?
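A sketch of the strengthened test (hypothetical, not necessarily the PR's final code; np.arange gives all-unique scores, so the two paths should agree exactly):

def test_ndcg_ignore_ties_with_k():
    a = np.arange(12).reshape((2, 6))
    # both code paths must yield the same value when there are no ties
    assert ndcg_score(a, a, k=3, ignore_ties=True) == pytest.approx(
        ndcg_score(a, a, k=3, ignore_ties=False))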

jeromedockes (Contributor, Author) commented Jul 25, 2019

I think "basis" is not the right term... log_base might be better than log_basis but it might be best to check other parts of the library / ecosystem.

Thanks, "base" is indeed the right term (used in the references, in /benchmarks/bench_isotonic.py, and everywhere else -- I don't know why I wrote "basis")

glemaitre (Contributor) commented Aug 12, 2019

@jnothman Do you have other changes to request?

glemaitre self-assigned this Aug 12, 2019
jnothman (Member) left a comment

I'm happy with the API, so I will merge on the basis of the existing approvals.

jnothman merged commit be27d90 into scikit-learn:master Aug 13, 2019
15 checks passed:
LGTM analysis: C/C++ (no code changes detected)
LGTM analysis: JavaScript (no code changes detected)
LGTM analysis: Python (no new or fixed alerts)
ci/circleci: deploy
ci/circleci: doc
ci/circleci: doc-min-dependencies
ci/circleci: lint
scikit-learn.scikit-learn Build #20190812.16
scikit-learn.scikit-learn (Linux py35_conda_openblas)
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas)
scikit-learn.scikit-learn (Linux pylatest_conda_mkl_pandas)
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit)
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit)
scikit-learn.scikit-learn (Windows py37_conda_mkl)
scikit-learn.scikit-learn (macOS pylatest_conda_mkl)
jnothman (Member) commented Aug 13, 2019

Thanks @jeromedockes

jeromedockes (Contributor, Author) commented Aug 13, 2019

thanks!

jeromedockes deleted the jeromedockes:add_ndcg branch Aug 13, 2019