# Assignment 2A, Part 1: Evaluation

You are given two sample files, `data/sample_ranking.csv` and `data/sample_qrels.csv`, to test your solution.

Use this notebook to evaluate the rankings generated in [Part 2](2_Retrieval.ipynb) and [Part 3](3_Multifield_retrieval.ipynb).

In [1]:
#RANKING_FILE = "data/bm25_output.csv"  # file with the document rankings
RANKING_FILE = "data/lm_output.csv"  # file with the document rankings
QRELS_FILE = "data/qrels2.csv"  # file with the relevance judgments (ground truth)

**TODO**: Complete the function that calculates evaluation metrics for a given ranking (`ranking`) against the ground truth (`gt`). It should return the results as a dictionary keyed by the name of the retrieval metric (`P10`, `AP`, `RR`).

(Hint: see [Exercises #1 and #2 from Lecture 8](https://github.com/kbalog/uis-dat640-fall2019/tree/master/exercises/lecture_08).)
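
For reference, the three metrics can be written out as below. This is a reading aid that assumes binary relevance; the notation is chosen here and is not taken from the assignment text: $d_i$ is the document at rank $i$, $n$ is the length of the ranking, $R$ is the set of relevant documents for the query, and $\mathrm{rel}(d_i)$ is 1 if $d_i \in R$ and 0 otherwise (also 0 for ranks beyond $n$).

$$P@k = \frac{1}{k} \sum_{i=1}^{k} \mathrm{rel}(d_i), \qquad P@10 = \frac{1}{10} \sum_{i=1}^{10} \mathrm{rel}(d_i)$$

$$AP = \frac{1}{|R|} \sum_{i=1}^{n} P@i \cdot \mathrm{rel}(d_i)$$

$$RR = \begin{cases} \dfrac{1}{\min \{\, i : \mathrm{rel}(d_i) = 1 \,\}} & \text{if a relevant document is retrieved} \\ 0 & \text{otherwise} \end{cases}$$

Note that AP divides by $|R|$, not by the number of relevant documents retrieved, so relevant documents missing from the ranking lower the score.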

In [2]:
def eval_query(ranking, gt):
    """Computes P@10, AP, and RR for a single query's ranking against the ground truth."""
    p10, ap, rr, num_rel = 0, 0, 0, 0
    
    for i, doc_id in enumerate(ranking):
        if doc_id in gt:  # doc is relevant
            num_rel += 1  
            pi = num_rel / (i + 1)  # P@i
            ap += pi  # AP
            if i < 10:  # P@10
                p10 += 1
            if rr == 0:  # Reciprocal rank
                rr = 1 / (i + 1)

    
    p10 /= 10
    ap = ap / len(gt) if gt else 0  # divide by the number of relevant documents (0 if there are none)
    
    return {"P10": p10, "AP": ap, "RR": rr}
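
As a quick sanity check, the metrics can be worked through on a toy query. The sketch below is self-contained: `toy_metrics` is a hypothetical stand-in that mirrors the logic of `eval_query`, and the document IDs are made up for illustration.

```python
def toy_metrics(ranking, gt):
    """Hypothetical stand-in mirroring eval_query: P@10, AP, and RR for one query."""
    relevant = set(gt)  # set membership is faster than scanning a list
    num_rel, ap, rr, hits_top10 = 0, 0.0, 0.0, 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            num_rel += 1
            ap += num_rel / rank  # precision at this rank
            if rank <= 10:
                hits_top10 += 1
            if rr == 0:  # first relevant document seen
                rr = 1 / rank
    ap = ap / len(relevant) if relevant else 0.0
    return {"P10": hits_top10 / 10, "AP": ap, "RR": rr}


# Made-up example: relevant documents appear at ranks 1 and 3; d6 is never retrieved.
ranking = ["d1", "d2", "d3", "d4", "d5"]
gt = ["d1", "d3", "d6"]
scores = toy_metrics(ranking, gt)
print(scores)
```

Here P@10 is 2/10 = 0.2, AP is (1/1 + 2/3) / 3 ≈ 0.556 because the unretrieved relevant document contributes nothing to the sum, and RR is 1 since the top-ranked document is relevant.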

**TODO**: Complete the function that evaluates an output file, which contains rankings for a set of queries. It is almost complete; you only need to add the computation of the mean scores (over the entire query set).
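
One way to think about the mean scores is as a per-metric average over all evaluated queries. The following is a minimal sketch under that assumption; `mean_scores` and the per-query numbers are hypothetical and only illustrate the arithmetic.

```python
def mean_scores(per_query):
    """Averages each metric over all queries (hypothetical helper)."""
    if not per_query:
        return {}
    n = len(per_query)
    metrics = next(iter(per_query.values())).keys()
    return {m: sum(res[m] for res in per_query.values()) / n for m in metrics}


# Made-up per-query results in the same shape that eval_query returns.
per_query = {
    "q1": {"P10": 0.2, "AP": 0.5, "RR": 1.0},
    "q2": {"P10": 0.0, "AP": 0.0, "RR": 0.0},  # nothing relevant retrieved
}
means = mean_scores(per_query)
print("%5s %6.3f %6.3f %6.3f" % ("ALL", means["P10"], means["AP"], means["RR"]))
```

Queries for which nothing relevant was retrieved still count in the denominator with a score of zero, which is why the mean is taken over every query in the ground truth.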

In [3]:
def eval(gt_file, output_file):
    """Prints evaluation scores for each query as well as the means over the query set."""
    # load data from ground truth file
    gt = {}  # holds a list of relevant documents for each queryID
    with open(gt_file, "r") as fin:
        header = fin.readline().strip()
        if header != "queryID,docIDs":
            raise Exception("Incorrect file format!")
        for line in fin.readlines():
            qid, docids = line.strip().split(",")
            gt[qid] = docids.split()
         
    # load data from output file
    output = {}
    with open(output_file, "r") as fin:
        header = fin.readline().strip()
        if header != "QueryId,DocumentId":
            raise Exception("Incorrect file format!")
        for line in fin.readlines():
            qid, docid = line.strip().split(",")
            if qid not in output:
                output[qid] = []
            output[qid].append(docid)
    # evaluate each query that is in the ground truth
    print("  QID  P@10   (M)AP  (M)RR")
    sum_p10, sum_ap, sum_rr = 0, 0, 0
    for qid in sorted(gt.keys()):
        
        res = eval_query(output.get(qid, []), gt.get(qid, []))
        sum_p10 += res["P10"]
        sum_ap += res["AP"]
        sum_rr += res["RR"]

        print("%5s %6.3f %6.3f %6.3f" % (qid, res["P10"], res["AP"], res["RR"]))
    
    # compute averages over the entire query set (all queries in the ground truth)
    num_queries = max(len(gt), 1)  # avoid division by zero on an empty ground truth

    # print sums, then averages
    print("%5s %6.3f %6.3f %6.3f" % ("ALL", sum_p10, sum_ap, sum_rr))
    print("%5s %6.3f %6.3f %6.3f" % ("AVG", sum_p10 / num_queries, sum_ap / num_queries, sum_rr / num_queries))
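
The two input formats the function expects can be illustrated with in-memory files. In the sketch below the headers match the checks in the code above, while the file contents and the helper names `parse_qrels` and `parse_ranking` are made up for illustration.

```python
import io

# Made-up contents: the ground truth has one line per query with
# space-separated document IDs; the ranking has one line per retrieved document.
qrels_text = "queryID,docIDs\nq1,d1 d3 d6\nq2,d2\n"
ranking_text = "QueryId,DocumentId\nq1,d1\nq1,d2\nq1,d3\nq2,d5\n"


def parse_qrels(fin):
    """Hypothetical helper: maps each query ID to its list of relevant doc IDs."""
    if fin.readline().strip() != "queryID,docIDs":
        raise ValueError("Incorrect file format!")
    gt = {}
    for line in fin:
        if not line.strip():  # skip blank lines
            continue
        qid, docids = line.strip().split(",")
        gt[qid] = docids.split()
    return gt


def parse_ranking(fin):
    """Hypothetical helper: maps each query ID to its ranked list of doc IDs."""
    if fin.readline().strip() != "QueryId,DocumentId":
        raise ValueError("Incorrect file format!")
    output = {}
    for line in fin:
        if not line.strip():  # skip blank lines
            continue
        qid, docid = line.strip().split(",")
        output.setdefault(qid, []).append(docid)  # file order is rank order
    return output


gt = parse_qrels(io.StringIO(qrels_text))
output = parse_ranking(io.StringIO(ranking_text))
print(gt)
print(output)
```

The order of lines in the ranking file matters, since each document's rank is taken from its position within the query's list.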

### Main

In [4]:
RANKING_FILE = "data/lm_jm_output.csv"  # file with the document rankings
eval(QRELS_FILE, RANKING_FILE)

FileNotFoundError: [Errno 2] No such file or directory: 'data/lm_jm_output.csv'

In [5]:
RANKING_FILE = "data/bm25_output.csv"  # file with the document rankings
eval(QRELS_FILE, RANKING_FILE)

  QID  P@10   (M)AP  (M)RR
  303  0.400  0.324  1.000
  307  0.100  0.028  0.143
  310  0.100  0.016  0.143
  314  0.100  0.071  0.500
  322  0.000  0.005  0.077
  325  0.000  0.000  0.000
  330  0.000  0.000  0.000
  336  0.300  0.341  0.333
  341  0.100  0.043  0.143
  344  0.100  0.010  0.111
  347  0.500  0.068  1.000
  353  0.000  0.000  0.000
  354  0.800  0.044  1.000
  362  0.600  0.210  0.333
  363  0.500  0.106  1.000
  367  0.000  0.019  0.091
  372  0.000  0.016  0.031
  374  0.500  0.215  0.200
  383  0.300  0.030  0.143
  389  0.000  0.000  0.000
  393  0.000  0.018  0.091
  399  0.700  0.172  1.000
  401  0.400  0.052  1.000
  404  0.200  0.028  0.250
  408  0.100  0.005  0.500
  409  0.200  0.107  1.000
  416  0.000  0.048  0.071
  419  0.300  0.140  0.333
  426  0.000  0.031  0.050
  427  0.600  0.431  0.500
  433  0.200  0.156  1.000
  435  0.100  0.021  0.333
  436  0.400  0.131  1.000
  439  0.000  0.007  0.077
  443  0.300  0.074  0.500
  448  0.000  0.002  0.050
 

In [31]:
RANKING_FILE = "data/lm_dir_output.csv"  # file with the document rankings
eval(QRELS_FILE, RANKING_FILE)

  QID  P@10   (M)AP  (M)RR
  303  0.000  0.000  0.000
  307  0.000  0.000  0.000
  310  0.000  0.000  0.000
  314  0.000  0.000  0.000
  322  0.000  0.000  0.000
  325  0.000  0.000  0.000
  330  0.000  0.000  0.000
  336  0.000  0.000  0.000
  341  0.000  0.000  0.000
  344  0.000  0.000  0.000
  347  0.200  0.022  0.200
  353  0.000  0.000  0.000
  354  0.300  0.016  0.167
  362  0.000  0.000  0.000
  363  0.000  0.000  0.018
  367  0.200  0.027  0.500
  372  0.000  0.000  0.000
  374  0.000  0.000  0.000
  383  0.000  0.000  0.000
  389  0.000  0.000  0.000
  393  0.000  0.000  0.000
  399  0.100  0.008  1.000
  401  0.000  0.000  0.010
  404  0.000  0.000  0.000
  408  0.000  0.000  0.011
  409  0.000  0.000  0.000
  416  0.000  0.000  0.000
  419  0.000  0.004  0.050
  426  0.000  0.000  0.000
  427  0.000  0.000  0.000
  433  0.000  0.000  0.000
  435  0.000  0.000  0.000
  436  0.000  0.000  0.000
  439  0.000  0.000  0.000
  443  0.000  0.000  0.000
  448  0.000  0.000  0.019
 

In [23]:
RANKING_FILE = "data/bm25F_output.csv"  # file with the document rankings
eval(QRELS_FILE, RANKING_FILE)

  QID  P@10   (M)AP  (M)RR
  303  0.600  0.317  1.000
  307  0.100  0.033  0.167
  310  0.000  0.010  0.091
  314  0.100  0.071  0.500
  322  0.000  0.003  0.043
  325  0.000  0.000  0.000
  330  0.000  0.000  0.000
  336  0.300  0.299  0.333
  341  0.000  0.011  0.019
  344  0.100  0.011  0.125
  347  0.500  0.052  1.000
  353  0.000  0.000  0.000
  354  0.900  0.050  1.000
  362  0.800  0.237  0.500
  363  0.600  0.106  1.000
  367  0.000  0.029  0.071
  372  0.000  0.017  0.033
  374  0.600  0.247  1.000
  383  0.100  0.022  0.125
  389  0.000  0.000  0.000
  393  0.000  0.014  0.071
  399  0.900  0.172  1.000
  401  0.200  0.031  0.333
  404  0.200  0.013  0.167
  408  0.000  0.001  0.034
  409  0.400  0.098  0.500
  416  0.100  0.026  0.100
  419  0.300  0.177  0.500
  426  0.000  0.027  0.042
  427  0.700  0.456  0.500
  433  0.200  0.157  1.000
  435  0.100  0.029  0.167
  436  0.600  0.115  0.500
  439  0.000  0.006  0.071
  443  0.400  0.101  0.500
  448  0.000  0.002  0.042
 

In [7]:
RANKING_FILE = "data/output_mlm_jm.csv"  # file with the document rankings
eval(QRELS_FILE, RANKING_FILE)

  QID  P@10   (M)AP  (M)RR
  303  0.700  0.383  1.000
  307  0.000  0.000  0.000
  310  0.000  0.000  0.002
  314  0.000  0.000  0.000
  322  0.000  0.000  0.000
  325  0.000  0.009  0.009
  330  0.000  0.000  0.000
  336  0.000  0.001  0.004
  341  0.000  0.001  0.002
  344  0.000  0.000  0.001
  347  0.000  0.050  0.091
  353  0.000  0.000  0.000
  354  0.200  0.123  0.500
  362  0.100  0.009  0.167
  363  0.000  0.001  0.001
  367  0.000  0.042  0.042
  372  0.000  0.000  0.000
  374  0.400  0.362  0.200
  383  0.000  0.004  0.007
  389  0.000  0.000  0.000
  393  0.000  0.010  0.023
  399  0.400  0.107  0.200
  401  0.000  0.001  0.031
  404  0.000  0.010  0.026
  408  0.000  0.023  0.017
  409  0.000  0.001  0.000
  416  0.000  0.001  0.000
  419  0.000  0.009  0.014
  426  0.000  0.001  0.001
  427  0.000  0.086  0.059
  433  0.000  0.000  0.000
  435  0.000  0.002  0.000
  436  0.000  0.017  0.043
  439  0.100  0.029  0.167
  443  0.000  0.001  0.001
  448  0.000  0.009  0.006
 