# NDCG Calculation

In this exercise, you will evaluate system rankings by computing the Normalized Discounted Cumulative Gain (NDCG) measure.

In [None]:
import ipytest
import math
import pytest

from typing import Dict, List

ipytest.autoconfig()

### Rankings produced for each query

Each key is a query ID (string); each value is a ranked list of document IDs (ints).

In [None]:
system_rankings = {
    "q1": [2, 1, 3, 4, 5, 6, 10, 7, 9, 8],
    "q2": [1, 2, 9, 4, 5, 6, 7, 8, 3, 10],
    "q3": [1, 7, 4, 5, 3, 6, 9, 8, 10, 2]
}

### Ground truth

Each key is a query ID; each value is a dictionary of (document ID, relevance) pairs. Relevance is measured on a 4-point scale: non-relevant (0), poor (1), good (2), excellent (3). Documents not listed here are non-relevant (relevance=0).

In [None]:
ground_truth = {
    "q1": {4: 3, 1: 2, 2: 1},
    "q2": {3: 3, 4: 3, 1: 2, 2: 1, 8: 1},
    "q3": {1: 3, 4: 3, 7: 2, 5: 2, 6: 1, 8: 1}
}
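
The rule that unlisted documents count as non-relevant can be illustrated with a dictionary lookup that defaults to 0. This is only a small sketch on made-up data; `toy_ranking` and `toy_judgments` are hypothetical names and values, not part of the exercise data.

```python
from typing import Dict, List

# Hypothetical toy data, unrelated to the exercise's queries.
toy_ranking: List[int] = [7, 3, 9]
toy_judgments: Dict[int, int] = {3: 2, 7: 1}

# dict.get returns the default (0) for document IDs without a judgment.
toy_relevances = [toy_judgments.get(doc_id, 0) for doc_id in toy_ranking]
print(toy_relevances)  # [1, 2, 0]
```

Document 9 has no judgment, so it receives relevance 0.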

## Computing evaluation metrics

Discounted cumulative gain at rank $k$ is computed as:

$$DCG_k = rel_1 + \sum_{i=2}^k\frac{rel_i}{\log_2 i}$$
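
To make the formula concrete, here is a worked example on a made-up relevance list. It is a minimal sketch of one way to evaluate the sum, not the reference solution; the helper name `dcg_at_k` and the list `toy_relevances` are assumptions introduced for illustration.

```python
import math
from typing import List


def dcg_at_k(relevances: List[int], k: int) -> float:
    """Hypothetical helper: rel_1 is undiscounted, rel_i is divided by log2(i) for i >= 2."""
    total = 0.0
    for i, rel in enumerate(relevances[:k], start=1):
        total += rel if i == 1 else rel / math.log2(i)
    return total


toy_relevances = [3, 2, 0, 1]  # made-up relevance levels in ranked order
# 3 + 2/log2(2) + 0/log2(3) + 1/log2(4) = 3 + 2 + 0 + 0.5
print(dcg_at_k(toy_relevances, 4))  # 5.5
```

Note that the cut-off simply truncates the list: with `k=2` the same input yields 3 + 2 = 5.0.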

Normalized discounted cumulative gain at rank $k$ is computed as:

$$NDCG_k = \frac{DCG_k}{IDCG_k}$$

where $IDCG_k$ is the $DCG_k$ score of an idealized (perfect) ranking.
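
The normalization step can be sketched as follows, assuming the idealized ranking is obtained by sorting all judged relevance levels in descending order. The names `dcg_sketch` and `ndcg_sketch` and the toy inputs are hypothetical; returning 0 when $IDCG_k$ is 0 is an assumption of this sketch, not something the exercise prescribes.

```python
import math
from typing import List


def dcg_sketch(relevances: List[int], k: int) -> float:
    """DCG@k: rel_1 undiscounted, rel_i / log2(i) for i >= 2."""
    return sum(
        rel if i == 1 else rel / math.log2(i)
        for i, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_sketch(ranked_relevances: List[int], judged_relevances: List[int], k: int) -> float:
    """NDCG@k from relevance levels in ranked order and all judged relevance levels."""
    # Idealized ranking: the judged relevance levels sorted from highest to lowest.
    ideal = sorted(judged_relevances, reverse=True)
    idcg = dcg_sketch(ideal, k)
    # Assumption: a query with no relevant documents scores 0 instead of raising.
    return dcg_sketch(ranked_relevances, k) / idcg if idcg > 0 else 0.0


# Made-up example: DCG = 2 + 3/1 + 0 + 1/2 = 5.5, IDCG = 3 + 2/1 + 1/log2(3)
print(round(ndcg_sketch([2, 3, 0, 1], [3, 2, 1], k=4), 3))  # 0.977
```

A ranking that already lists documents in descending relevance order scores exactly 1 under this definition.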

In [None]:
def dcg(relevances: List[int], k: int) -> float:
    """Computes DCG@k, given the corresponding relevance levels for a ranked list of documents.
    
    For example, given a ranking [2, 3, 1] where the relevance levels according to the ground 
    truth are {1:3, 2:4, 3:1}, the input list will be [4, 1, 3].
    
    Args:
        relevances: List with the ground truth relevance levels corresponding to a ranked list of documents.
        k: Rank cut-off.
        
    Returns:
        DCG@k (float).
    """
    # TODO: complete
    return 0

In [None]:
def ndcg(system_ranking: List[int], ground_truth: Dict[int, int], k: int = 10) -> float:
    """Computes NDCG@k for a given system ranking.
    
    Args:
        system_ranking: Ranked list of document IDs (from most to least relevant).
        ground_truth: Dict with document ID: relevance level pairs. Documents not present here are to be taken with relevance = 0.
        k: Rank cut-off.
    
    Returns:
        NDCG@k (float).
    """
    relevances = []  # Holds corresponding relevance levels for the ranked docs.
    # TODO: complete
        
    relevances_ideal = []  # Relevance levels of the idealized ranking.
    # TODO: complete
    
    return dcg(relevances, k) / dcg(relevances_ideal, k)        

Tests.

In [None]:
%%run_pytest[clean]

@pytest.mark.parametrize("qid,k,correct_value", [
    ("q1", 5, 0.799),
    ("q1", 10, 0.799),
    ("q2", 5, 0.549),
    ("q2", 10, 0.705),
    ("q3", 5, 0.908),
    ("q3", 10, 0.949),
])
def test_queries(qid, k, correct_value):    
    assert ndcg(system_rankings[qid], ground_truth[qid], k) == pytest.approx(correct_value, rel=1e-3)
