`#6` Hands-on (1 of 1)

---
`Page 27` : Quick workout

- There are 5 relevant documents in total.
    
    *T represents a relevant document, and F represents an irrelevant document.*
    - System A results: `FTTTTTFFFF`
    - System B results: `TTTFFFFFTT`
- Compare the mAP values of the two systems.
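
For reference, the quantity computed for each system here is the average precision (AP) of a single ranked list. mAP is the mean of AP over several queries, so with one list per system the two values coincide. A sketch of the usual definition follows; the symbols $R$ (total number of relevant documents), $n$ (length of the ranked list), $P(k)$ (precision at rank $k$), and $\mathrm{rel}(k)$ (1 if the document at rank $k$ is relevant, otherwise 0) are notation assumed here, not taken from the slides.

$$
\mathrm{AP} = \frac{1}{R} \sum_{k=1}^{n} P(k)\,\mathrm{rel}(k)
$$

For System A, the relevant documents sit at ranks 2 through 6 and $R = 5$, so the sum expands to:

$$
\mathrm{AP}_A = \frac{1}{5}\left(\frac{1}{2} + \frac{2}{3} + \frac{3}{4} + \frac{4}{5} + \frac{5}{6}\right) = 0.71
$$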

In [29]:
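def mean_average_precision(system_results, relevant_count):
    """Return (average precision, precision at each relevant rank) for one ranked list."""
    relevant_seen = 0
    precision_at_k = []

    for k, result in enumerate(system_results, start=1):
        if result == 'T':
            relevant_seen += 1
            # Precision@k is recorded only at ranks that hold a relevant document
            precision = round(relevant_seen / k, 3)
            precision_at_k.append(precision)

    # Divide by the total number of relevant documents, not only those retrieved
    if relevant_count <= 0:
        return 0, precision_at_k
    return round(sum(precision_at_k) / relevant_count, 3), precision_at_k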

# System A
system_a = list('FTTTTTFFFF')
map_a, prec_a = mean_average_precision(system_a, 5)

# System B
system_b = list('TTTFFFFFTT')
map_b, prec_b = mean_average_precision(system_b, 5)

print(f"System A precisions: {prec_a}")
print(f"mAP System A: {map_a}")
print(f"System B precisions: {prec_b}")
print(f"mAP System B: {map_b}")
print(f"mAP total: {(map_a+map_b)/2:.3f}")

System A precisions: [0.5, 0.667, 0.75, 0.8, 0.833]
mAP System A: 0.71
System B precisions: [1.0, 1.0, 1.0, 0.444, 0.5]
mAP System B: 0.789
mAP total: 0.750
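
The function above rounds each precision to three decimals before averaging. As a sanity check, the same values can be recomputed with exact fractions to confirm that the intermediate rounding does not shift the reported results. This is a minimal sketch; `average_precision_exact` is a hypothetical helper name, not part of the original notebook.

```python
from fractions import Fraction

def average_precision_exact(results, relevant_count):
    """Average precision of one ranked list, computed with exact fractions."""
    hits = 0
    total = Fraction(0)
    for k, flag in enumerate(results, start=1):
        if flag == 'T':
            hits += 1
            total += Fraction(hits, k)  # precision at rank k
    return total / relevant_count

ap_a = average_precision_exact('FTTTTTFFFF', 5)
ap_b = average_precision_exact('TTTFFFFFTT', 5)

print(ap_a, round(float(ap_a), 3))  # 71/100 0.71
print(ap_b, round(float(ap_b), 3))  # 71/90 0.789
```

Both rounded values agree with the output above, so the per-step rounding is harmless for this input.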


---
`Page 38` : In-class activity

- An automated system ranks reported bugs so that the most critical bugs are addressed first. Suppose 5 of the 10 reported bugs are critical.
    - Bug ranking system A outputs the following order: Minor, Critical, Critical, Critical, Critical, Minor, Critical, Minor, Minor, Minor.
    - Bug ranking system B outputs the following order: Critical, Critical, Critical, Minor, Minor, Minor, Minor, Critical, Critical, Minor.
    - Relevance scores are defined as: Critical = 3, Major = 2, Minor = 1.
    - Using Python, calculate the NDCG@5 for both ranking systems A and B.
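
The metric requested above can be written as follows. This is a sketch of the linear-gain form of DCG, which matches the `rs / np.log2(i + 1)` term in the code cell; another common variant uses $2^{rel_i} - 1$ as the gain and would give different numbers. The symbol $rel_i$ (relevance score of the item at rank $i$) is notation assumed here.

$$
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i + 1)}, \qquad \mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
$$

IDCG@$k$ is the DCG@$k$ of the ideal ordering. Both rankings contain five Critical bugs, so the ideal top 5 has scores $(3, 3, 3, 3, 3)$.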

In [56]:
import numpy as np

# DCG with linear gain: sum of rel_i / log2(i + 1) over ranks i = 1, 2, ...
def dcg(relevance_scores):
    return sum(rs / np.log2(i + 1) for i, rs in enumerate(relevance_scores, start=1))

# NDCG@k Calculation
def ndcg_at_k(system_ranking, relevance_dict, k=5):
    # Actual relevances of the top-k items in the system ranking
    actual_relevances = [relevance_dict[bug] for bug in system_ranking[:k]]

    # Ideal ranking: all items sorted by relevance, highest first, then cut at k
    ideal_ranking = sorted(system_ranking, key=lambda bug: relevance_dict[bug], reverse=True)
    ideal_relevances = [relevance_dict[bug] for bug in ideal_ranking[:k]]

    dcg_k = dcg(actual_relevances)
    idcg_k = dcg(ideal_relevances)

    # Guard against division by zero when every relevance score is 0
    ndcg_k = dcg_k / idcg_k if idcg_k > 0 else 0
    return ndcg_k, dcg_k, idcg_k

# Define relevance scores
relevance = {'Critical': 3, 'Major': 2, 'Minor': 1}

# System A Ranking
system_a_ranking = ['Minor', 'Critical', 'Critical', 'Critical', 'Critical', 'Minor', 'Critical', 'Minor', 'Minor', 'Minor']
ndcg_a, dcg_a, idcg_a = ndcg_at_k(system_a_ranking, relevance, 5)

# System B Ranking
system_b_ranking = ['Critical', 'Critical', 'Critical', 'Minor', 'Minor', 'Minor', 'Minor', 'Critical', 'Critical', 'Minor']
ndcg_b, dcg_b, idcg_b = ndcg_at_k(system_b_ranking, relevance, 5)

print(f"System A, NDCG@5 : {ndcg_a:.4f}, DCG : {dcg_a}, IDCG : {idcg_a}")
print(f"System B, NDCG@5 : {ndcg_b:.4f}, DCG : {dcg_b}, IDCG : {idcg_b}")


System A, NDCG@5 : 0.7739, DCG : 6.845377356638177, IDCG : 8.845377356638178
System B, NDCG@5 : 0.8152, DCG : 7.210318626022307, IDCG : 8.845377356638178
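
The two NDCG@5 values can be cross-checked without NumPy by working directly on the numeric scores. The sketch below assumes the same linear-gain DCG as the cell above; `dcg_linear` and `ndcg_from_scores` are hypothetical names chosen so they do not overwrite the notebook's functions.

```python
import math

def dcg_linear(scores):
    # Linear gain with a log2(rank + 1) discount; ranks start at 1.
    return sum(s / math.log2(rank + 1) for rank, s in enumerate(scores, start=1))

def ndcg_from_scores(scores, k=5):
    # Ideal ordering: all scores sorted from highest to lowest, then cut at k.
    ideal = sorted(scores, reverse=True)[:k]
    idcg = dcg_linear(ideal)
    return dcg_linear(scores[:k]) / idcg if idcg > 0 else 0.0

# Scores transcribed from the activity: Critical = 3, Minor = 1.
system_a_scores = [1, 3, 3, 3, 3, 1, 3, 1, 1, 1]
system_b_scores = [3, 3, 3, 1, 1, 1, 1, 3, 3, 1]

check_a = ndcg_from_scores(system_a_scores)
check_b = ndcg_from_scores(system_b_scores)
print(f"{check_a:.4f} {check_b:.4f}")  # 0.7739 0.8152
```

System B scores higher because it places three Critical bugs at ranks 1 to 3, where the discount is smallest, while System A spends rank 1 on a Minor bug.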
