
different NDCG@10 score #19

Closed · w-y-li opened this issue Feb 22, 2024 · 4 comments

Comments
@w-y-li commented Feb 22, 2024

Sorry to bother you; I have two small questions:

  1. I noticed that your NDCG@10 score for BM25 on the NFCorpus dataset differs from Pyserini's BM25 flat score, while the other datasets are consistent. Is there anything special about this one?

  2. I have also tried to implement the NDCG function myself instead of using trec_eval directly, but the two produce different scores. Could you take a look at this? Thanks a lot!
    My function:

import numpy as np

def ndcg(golden, current):
    # golden: ground-truth relevance scores (original, unranked order)
    # current: ground-truth relevance scores of the same docs, in the
    #          model's predicted ranking order
    log2_table = np.log2(np.arange(2, len(golden) + 2))

    def dcg_at_n(rel, n):
        # Exponential-gain DCG over the top-n items:
        # sum((2^rel - 1) / log2(rank + 1))
        rel = np.asarray(rel, dtype=float)[:n]  # np.asfarray is deprecated in NumPy 2.0
        return np.sum((np.power(2.0, rel) - 1.0) / log2_table[:rel.shape[0]])

    k = len(current)
    idcg = dcg_at_n(sorted(golden, reverse=True), n=k)  # ideal (best-possible) DCG
    dcg = dcg_at_n(current, n=k)
    return 0.0 if idcg == 0 else dcg / idcg
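
(For illustration, a minimal usage sketch of the function above; the relevance values and ranking are made up:)

golden = [2, 0, 1, 0, 2]      # ground-truth graded relevance, original order
current = [2, 1, 0, 2, 0]     # same scores, reordered by a hypothetical model
print(ndcg(golden, current))  # ~0.913; would be 1.0 if current were sorted descending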
@sunnweiwei (Owner)
Hi,

  1. I did not perform any special processing on this data. I am currently unsure why the results are different.
  2. This code does not seem to use the golden relevance score when calculating the DCG (dcg = dcg_at_n(current, n=k)), which could be problematic (ref. https://en.wikipedia.org/wiki/Discounted_cumulative_gain).
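
For reference, the exponential-gain formulation of DCG on the linked page, where $rel_i$ is the graded relevance of the item at rank $i$:

$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}, \qquad \mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$

where $\mathrm{IDCG}_p$ is the DCG of the ideal (relevance-descending) ordering.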

@w-y-li (Author) commented Feb 25, 2024

Thanks for your help! However, after double-checking the code, I still can't find any problem with it according to the DCG formula, so I'm still confused about my code.
To clarify: golden and current are both lists, every item in both lists is a relevance score (not a docid), and the order of each list is its ranking order.
Thanks again for your help!

@w-y-li (Author) commented Mar 4, 2024

Excuse me, do you have any ideas?

@sunnweiwei (Owner)

Hi, I am uncertain about the problem. Regarding the function input, golden should be the ground truth doc relevance sorted by ground truth order, and current should be the ground truth (not model-predicted) doc relevance scores sorted in the order predicted by the model. If there's no issue with the input either, then the problem might be related to the truncation of the doc list. This code (https://github.com/cvangysel/pytrec_eval/blob/master/benchmarks/native_python_vs_pytrec_eval.py) includes a Python implementation of NDCG and compares it with the C++ implementation in pytrec_eval, which might be helpful.
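
(For what it's worth, a minimal sketch of such a comparison, assuming pytrec_eval is installed; the query/doc IDs and judgments below are made up. One plausible source of divergence on graded-relevance datasets such as NFCorpus: trec_eval's ndcg_cut uses linear gain, rel_i / log2(i + 1), whereas the function above uses exponential gain, 2^rel_i - 1; the two coincide only for binary relevance.)

import pytrec_eval

# Hypothetical single-query example: qrel holds ground-truth judgments,
# run holds model scores (higher = ranked earlier).
qrel = {'q1': {'d1': 2, 'd2': 0, 'd3': 1, 'd4': 0, 'd5': 2}}
run = {'q1': {'d1': 0.9, 'd4': 0.8, 'd2': 0.7, 'd5': 0.6, 'd3': 0.5}}

evaluator = pytrec_eval.RelevanceEvaluator(qrel, {'ndcg_cut'})
print(evaluator.evaluate(run)['q1']['ndcg_cut_10'])  # trec_eval-style NDCG@10

# Same ranking fed to the hand-rolled function (exponential gain):
golden = [2, 0, 1, 0, 2]   # qrel values in original order
current = [2, 0, 0, 2, 1]  # qrel values in the run's rank order
print(ndcg(golden, current))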
