# Walking through the nitty gritty: nDCG calculations

## nDCG Calculations w/ worked example

In a given testset, User 001 has ten different ratings. For nDCG, we don't actually care which movies they are for; we just care about the numerical values of the ratings. In this case, I'll make up some ratings for a user that we can use as a worked example.

In [None]:
ratings_in_testset = [3, 4, 5, 1, 2, 3, 4, 5, 5, 4]

For these 10 items, the RecSys will estimate a rating, presumably with some error. In this worked example, let's assume the error is an alternating plus or minus 0.5.


In [None]:
estimated_vals = []
flip = True
for rating in ratings_in_testset:
    if flip:
        estimated_vals.append(rating - 0.5)
    else:
        estimated_vals.append(rating + 0.5)
    flip = not flip

Currently, the function that calculates precision, recall, and nDCG expects a list of tuples, with each tuple being a pair of (estimated value, true value).

In [None]:

user_ratings = [(x, y) for x, y in zip(estimated_vals, ratings_in_testset)]
user_ratings

In [None]:
# Now for the calculations
# Sort user ratings by estimated value
user_ratings_sorted_by_est = sorted(user_ratings, key=lambda x: x[0], reverse=True)
user_ratings_sorted_by_est

In [None]:
# also need to sort by true value for Ideal DCG
user_ratings_sorted_by_true = sorted(user_ratings, key=lambda x: x[1], reverse=True)
user_ratings_sorted_by_true

We're going to need to define a function that calculates DCG for a given list of ratings. Let's use the formula defined in this paper from MSR: https://dl.acm.org/citation.cfm?doid=1102351.1102363

In this definition, the numerator (the gain) is 2^relevance_score - 1. Other definitions use relevance_score itself as the numerator.
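Written out, and assuming the conventions the code in this notebook uses (positions are 1-based and the discount is a base-2 logarithm), the definition looks like this, where $rel_i$ is the true rating of the item at position $i$:

$$
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)}
$$

For example, a two-item list with true ratings 5 and 4 gives $(2^5 - 1)/\log_2 2 + (2^4 - 1)/\log_2 3 = 31 + 15/1.585 \approx 40.46$.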

In [None]:
import math


def dcg_at_k(ratings):
    """
    Discounted cumulative gain at k
    https://en.wikipedia.org/wiki/Discounted_cumulative_gain
    Using formula from this MSR IR paper:
    https://dl.acm.org/citation.cfm?doid=1102351.1102363

    k is assumed to be the length of the input list
    args:
        ratings: a list of relevance scores, e.g. explicit ratings 1-5
    returns:
        a dcg_at_k value
    """
    k = len(ratings)

    # Positions are 1-based; the item at position i is discounted by log2(i + 1)
    return sum(
        (2 ** rating - 1) / math.log2(i + 1)
        for rating, i in zip(ratings, range(1, k + 1))
    )

We can use the ratings sorted by true values to calculate the ideal DCG for various k values. In this example, let's just do 10 and 5.

We'll want to get the first k TRUE RATINGS from the list sorted by true ratings as well as the list sorted by estimated ratings.

In [None]:
true_ratings_of_first_10_true = [x[1] for x in user_ratings_sorted_by_true[:10]]
true_ratings_of_first_10_est = [x[1] for x in user_ratings_sorted_by_est[:10]]

In [None]:
true_ratings_of_first_5_true = [x[1] for x in user_ratings_sorted_by_true[:5]]
true_ratings_of_first_5_est = [x[1] for x in user_ratings_sorted_by_est[:5]]

In [None]:
ideal_dcg_at_10 = dcg_at_k(true_ratings_of_first_10_true)
ideal_dcg_at_5 = dcg_at_k(true_ratings_of_first_5_true)
print('At 10:', ideal_dcg_at_10, 'At 5:', ideal_dcg_at_5)

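Now calculate the DCG of the ranking produced by the estimated values. Note that the inputs are still the true ratings; the estimates only decide the order.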

In [None]:
est_dcg_at_10 = dcg_at_k(true_ratings_of_first_10_est)
est_dcg_at_5 = dcg_at_k(true_ratings_of_first_5_est)
print('At 10:', est_dcg_at_10, 'At 5:', est_dcg_at_5)

And finally, we can add the n to nDCG by normalizing!

In [None]:
ndcg_at_10 = est_dcg_at_10 / ideal_dcg_at_10
ndcg_at_5 = est_dcg_at_5 / ideal_dcg_at_5
print('nDCG@10:', ndcg_at_10, 'nDCG@5:', ndcg_at_5)
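The steps above can be collected into one function. This is a minimal sketch, not a library API: `dcg` and `ndcg_at_k` are hypothetical helper names, and the sketch assumes every true rating is at least 1, so the ideal DCG is never zero.

```python
import math


def dcg(ratings):
    """DCG of a list of true ratings, taken in ranked order."""
    return sum(
        (2 ** rating - 1) / math.log2(position + 1)
        for position, rating in enumerate(ratings, start=1)
    )


def ndcg_at_k(user_ratings, k):
    """nDCG@k for a list of (estimated value, true value) tuples.

    Hypothetical helper; assumes true ratings are >= 1.
    """
    # Rank by estimate, then keep only the true ratings of the top k
    by_est = sorted(user_ratings, key=lambda pair: pair[0], reverse=True)
    ranked_true = [true for _, true in by_est[:k]]
    # The ideal ranking puts the highest true ratings first
    ideal_true = sorted((true for _, true in user_ratings), reverse=True)[:k]
    return dcg(ranked_true) / dcg(ideal_true)


# Sanity check: estimates that preserve the true order score exactly 1.0
perfect = [(4.9, 5), (4.1, 4), (2.7, 3), (1.2, 1)]
print(ndcg_at_k(perfect, 4))
```

A ranking that matches the ideal order is a handy sanity check, because the numerator and denominator are then the same sum.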

## Some problems that might arise...

What if a user doesn't have ten ratings in a testset? How do we compute nDCG@10 for that testset?
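One possible convention, offered as an assumption rather than a settled answer, is to let k shrink to the number of ratings the user actually has. Python slicing already behaves this way: `some_list[:10]` on a four-item list returns all four items, so the DCG and the ideal DCG are both computed over the same shortened list. The sketch below makes that explicit with a hypothetical helper, `ndcg_at_k_truncated`, which returns `None` for a user with no ratings so the caller can decide whether to skip that user.

```python
import math


def ndcg_at_k_truncated(user_ratings, k):
    """nDCG@k where k is capped at the number of available ratings.

    Hypothetical helper; user_ratings is a list of (estimated, true) tuples.
    Assumes true ratings are >= 1. Returns None when there are no ratings.
    """
    if not user_ratings:
        return None
    k = min(k, len(user_ratings))  # assumption: shrink k for short lists

    def dcg(ratings):
        return sum(
            (2 ** r - 1) / math.log2(i + 1)
            for i, r in enumerate(ratings, start=1)
        )

    by_est = sorted(user_ratings, key=lambda pair: pair[0], reverse=True)
    by_true = sorted(user_ratings, key=lambda pair: pair[1], reverse=True)
    return dcg([t for _, t in by_est[:k]]) / dcg([t for _, t in by_true[:k]])


# A user with only four ratings: nDCG@10 falls back to nDCG@4
short_user = [(3.5, 4), (4.5, 3), (2.0, 5), (1.0, 1)]
print(ndcg_at_k_truncated(short_user, 10) == ndcg_at_k_truncated(short_user, 4))
```

Other choices, such as skipping users who have fewer than k ratings, are just as defensible. Whichever one is used, it should be applied consistently, since it changes which users contribute to an averaged nDCG@10.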