Exponential gain function in dcg calculation #18817

Open
zkid18 opened this issue Nov 12, 2020 · 4 comments
Comments

zkid18 commented Nov 12, 2020

Describe the workflow you want to enable

Please correct me if I'm wrong.
Currently, the DCG score is calculated with the linear gain function.

    discount = 1 / (np.log(np.arange(y_true.shape[1]) + 2) / np.log(log_base))
    if k is not None:
        discount[k:] = 0
    if ignore_ties:
        ranking = np.argsort(y_score)[:, ::-1]
        ranked = y_true[np.arange(ranking.shape[0])[:, np.newaxis], ranking]
        cumulative_gains = discount.dot(ranked.T)
    else:
        discount_cumsum = np.cumsum(discount)
        cumulative_gains = [_tie_averaged_dcg(y_t, y_s, discount_cumsum)
                            for y_t, y_s in zip(y_true, y_score)]
        cumulative_gains = np.asarray(cumulative_gains)
    return cumulative_gains

However, in industry we use an alternative formulation of DCG with an exponential gain function, such as
gains = 2 ** y_true - 1
The exponential gain puts more emphasis on retrieving highly relevant documents.

Can we add the alternative formulation of gain?
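
To make the difference concrete, here is a minimal NumPy sketch of DCG for a single, already ranked list under both gain functions. The `dcg` helper and its `gain` argument are hypothetical names used for illustration only; they are not part of scikit-learn.

```python
import numpy as np


def dcg(relevance_in_ranked_order, gain="linear"):
    """DCG of one ranked list (hypothetical helper, not a scikit-learn API)."""
    rel = np.asarray(relevance_in_ranked_order, dtype=np.float64)
    if gain == "linear":
        gains = rel
    elif gain == "exponential":
        # 2 ** rel - 1, computed in floating point
        gains = np.exp2(rel) - 1.0
    else:
        raise ValueError(f"unknown gain: {gain!r}")
    # Log2 discount: position i (0-based) is divided by log2(i + 2)
    discount = 1.0 / np.log2(np.arange(rel.size) + 2)
    return float(np.sum(gains * discount))


# Relevance grades of the documents, in the order the model ranked them
ranked = [3, 2, 0, 1]
linear = dcg(ranked, gain="linear")
exponential = dcg(ranked, gain="exponential")
print(linear, exponential)
```

With exponential gain, the grade-3 document contributes 7 instead of 3 before discounting, so it dominates the sum much more strongly than with linear gain.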

Author

zkid18 commented Nov 12, 2020

A couple of reference implementations of DCG with a dedicated interface for selecting the gain function:
https://gist.github.com/mblondel/7337391
https://github.com/catalyst-team/catalyst/blob/master/catalyst/metrics/ndcg.py (disclaimer: author of the patch)

Author

zkid18 commented Nov 12, 2020

Reference for the DCG calculation, from the Stanford CS276 slides:
https://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf

@jeromedockes
Contributor

I agree that, since this transformation is used very often, it could be useful to add a parameter that applies it. In the meantime, IIUC, it just amounts to providing different gains, so this or any other transformation of the gains can be applied to y_true before passing it to ndcg_score: ndcg_score(2 ** y_true - 1, y_score). You can think of the first parameter as "the true gains".

(detail: if a parameter is added to apply the exponential transformation, the implementation should check that the input gains are not too large)
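
A minimal sketch of that workaround, including a guard against overly large grades. The helper name `exponential_gains` and the `max_grade=30` threshold are assumptions chosen for illustration; scikit-learn does not provide them.

```python
import numpy as np
from sklearn.metrics import ndcg_score


def exponential_gains(y_true, max_grade=30):
    """Map relevance grades to 2 ** grade - 1 (hypothetical helper).

    max_grade is an arbitrary illustrative limit that keeps the
    transformed gains well within float64 range.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    if y_true.max() > max_grade:
        raise ValueError(
            f"relevance grades above {max_grade} are too large "
            "for the exponential gain"
        )
    return np.exp2(y_true) - 1.0


y_true = np.array([[3, 2, 0, 1]])
y_score = np.array([[0.1, 0.9, 0.3, 0.8]])

# Linear gain: y_true is used directly as "the true gains"
linear = ndcg_score(y_true, y_score)
# Exponential gain: transform the gains before passing them in
exponential = ndcg_score(exponential_gains(y_true), y_score)
print(linear, exponential)
```

In this example the most relevant document receives the lowest score, so the exponential variant penalizes the ranking more heavily than the linear one.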

@jeromedockes
Contributor

> https://github.com/catalyst-team/catalyst/blob/master/catalyst/metrics/ndcg.py (disclaimer: author of the patch)

I think this link is outdated?
