
different NDCG@10 score #19

Closed · w-y-li opened this issue Feb 22, 2024 · 4 comments

Comments
@w-y-li commented Feb 22, 2024

Sorry to bother you; I have two small questions:

  1. I noticed that your NDCG@10 score for BM25 on the NFCorpus dataset differs from Pyserini's BM25 flat score, while the other datasets are consistent. Is there anything special about this one?

  2. I have also tried to implement the NDCG function myself instead of using trec_eval directly, but the two produce different scores. Could you take a look at this? Thanks a lot!
    My function:

import numpy as np

def ndcg(golden, current):
    # golden: ground-truth relevance scores (original, unranked order)
    # current: ground-truth relevance scores of the same docs, in the
    #          model's predicted ranking order
    log2_table = np.log2(np.arange(2, len(golden) + 2))

    def dcg_at_n(rel, n):
        # Exponential-gain DCG over the top-n items:
        # sum((2^rel - 1) / log2(rank + 1))
        rel = np.asarray(rel, dtype=float)[:n]  # np.asfarray is deprecated in NumPy 2.0
        return np.sum((np.power(2.0, rel) - 1.0) / log2_table[:rel.shape[0]])

    k = len(current)
    idcg = dcg_at_n(sorted(golden, reverse=True), n=k)  # ideal (best-possible) DCG
    dcg = dcg_at_n(current, n=k)
    return 0.0 if idcg == 0 else dcg / idcg
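
(For illustration, a minimal usage sketch of the function above; the relevance values and ranking are made up:)

golden = [2, 0, 1, 0, 2]      # ground-truth graded relevance, original order
current = [2, 1, 0, 2, 0]     # same scores, reordered by a hypothetical model
print(ndcg(golden, current))  # ~0.913; would be 1.0 if current were sorted descending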
@sunnweiwei (Owner)
Hi,

  1. I did not perform any special processing on this data. I am currently unsure why the results are different.
  2. This code does not seem to use the golden relevance score when calculating the DCG (dcg = dcg_at_n(current, n=k)), which could be problematic (ref. https://en.wikipedia.org/wiki/Discounted_cumulative_gain).
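
For reference, the exponential-gain formulation of DCG on the linked page, where $rel_i$ is the graded relevance of the item at rank $i$:

$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}, \qquad \mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$

where $\mathrm{IDCG}_p$ is the DCG of the ideal (relevance-descending) ordering.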

@w-y-li (Author) commented Feb 25, 2024

Thanks for your help! However, after double-checking the code, I still can't find any problem with it according to the DCG formula, so I'm still confused about my code.
To clarify: golden and current are both lists, every item in both lists is a relevance score (not a docid), and the order of each list is its ranking order.
Thanks again for your help!

@w-y-li (Author) commented Mar 4, 2024

Excuse me, do you have any ideas?

@sunnweiwei (Owner)

Hi, I am uncertain about the problem. Regarding the function input, golden should be the ground truth doc relevance sorted by ground truth order, and current should be the ground truth (not model-predicted) doc relevance scores sorted in the order predicted by the model. If there's no issue with the input either, then the problem might be related to the truncation of the doc list. This code (https://github.com/cvangysel/pytrec_eval/blob/master/benchmarks/native_python_vs_pytrec_eval.py) includes a Python implementation of NDCG and compares it with the C++ implementation in pytrec_eval, which might be helpful.
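
(For what it's worth, a minimal sketch of such a comparison, assuming pytrec_eval is installed; the query/doc IDs and judgments below are made up. One plausible source of divergence on graded-relevance datasets such as NFCorpus: trec_eval's ndcg_cut uses linear gain, rel_i / log2(i + 1), whereas the function above uses exponential gain, 2^rel_i - 1; the two coincide only for binary relevance.)

import pytrec_eval

# Hypothetical single-query example: qrel holds ground-truth judgments,
# run holds model scores (higher = ranked earlier).
qrel = {'q1': {'d1': 2, 'd2': 0, 'd3': 1, 'd4': 0, 'd5': 2}}
run = {'q1': {'d1': 0.9, 'd4': 0.8, 'd2': 0.7, 'd5': 0.6, 'd3': 0.5}}

evaluator = pytrec_eval.RelevanceEvaluator(qrel, {'ndcg_cut'})
print(evaluator.evaluate(run)['q1']['ndcg_cut_10'])  # trec_eval-style NDCG@10

# Same ranking fed to the hand-rolled function (exponential gain):
golden = [2, 0, 1, 0, 2]   # qrel values in original order
current = [2, 0, 0, 2, 1]  # qrel values in the run's rank order
print(ndcg(golden, current))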
