# Task 1 Evaluation

This notebook contains the evaluation for Task 1 of the TREC Fair Ranking track.

## Setup

We begin by loading the necessary libraries:

In [None]:
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import gzip
import binpickle

Set up progress bar and logging support:

In [None]:
from tqdm.auto import tqdm
tqdm.pandas(leave=False)

In [None]:
import sys, logging
logging.basicConfig(level=logging.INFO, stream=sys.stderr)
log = logging.getLogger('task1-eval')

Import metric code:

In [None]:
import metrics
from trecdata import scan_runs

And finally, load the saved metric object itself:

In [None]:
metric = binpickle.load('task1-eval-metric.bpk')

## Importing Data

Let's load the runs now:

In [None]:
runs = pd.DataFrame.from_records(row for (task, rows) in scan_runs() if task == 1 for row in rows)
runs

Since we only have annotations for the first 20 ranked items in each run, limit the data to those ranks:

In [None]:
runs = runs[runs['rank'] <= 20]
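
As a quick sanity check on this truncation, the sketch below applies the same filter to a small made-up frame and counts the rows left per (run, topic) pair. The frame `toy_runs` and its values are hypothetical; only the column names follow the notebook, and ranks are assumed to start at 1.

```python
import pandas as pd

# Hypothetical run data: one run, two topics, 25 ranked pages each (ranks assumed 1-based).
toy_runs = pd.DataFrame({
    'run_name': 'toy-run',
    'topic_id': [t for t in (1, 2) for _ in range(25)],
    'rank': list(range(1, 26)) * 2,
    'page_id': list(range(1000, 1050)),
})

# Same filter as above: keep only the ranks that have annotations.
toy_top = toy_runs[toy_runs['rank'] <= 20]

# Each (run, topic) pair should now have at most 20 rows.
toy_sizes = toy_top.groupby(['run_name', 'topic_id']).size()
print(toy_sizes)
```

A pair with fewer than 20 rows would indicate a run that submitted a short ranking for that topic.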

## Computing Metrics

We are now ready to compute the metric for each (system, topic) pair. Let's go!

In [None]:
rank_awrf = runs.groupby(['run_name', 'topic_id'])['page_id'].progress_apply(metric)
rank_awrf = rank_awrf.unstack()
rank_awrf
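
The shape of this computation may be easier to see on toy data. The sketch below assumes the metric is a callable that takes the ranked `page_id` series for one (run, topic) group and returns a Series of named values. The `toy_metric` function and its numbers are placeholders, not the track's actual metric. Because each group yields a Series, `apply` returns a long Series with a three-level index, and `unstack()` pivots the innermost level into columns.

```python
import pandas as pd

def toy_metric(pages: pd.Series) -> pd.Series:
    """Hypothetical stand-in for the real metric: returns placeholder values."""
    ndcg = (pages % 2 == 0).mean()   # placeholder "utility": share of even page IDs
    awrf = 1.0 - ndcg / 2            # placeholder "fairness" value
    # placeholder combination of the two values, for illustration only
    return pd.Series({'nDCG': ndcg, 'AWRF': awrf, 'Score': ndcg * awrf})

toy_runs = pd.DataFrame({
    'run_name': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'topic_id': [1, 1, 2, 2, 1, 1, 2, 2],
    'page_id':  [10, 11, 12, 14, 21, 23, 20, 22],
})

# One Series of metric values per (run, topic) group, stacked into a long Series.
toy_long = toy_runs.groupby(['run_name', 'topic_id'])['page_id'].apply(toy_metric)

# Pivot the metric names (innermost index level) into columns.
toy_wide = toy_long.unstack()
print(toy_wide)
```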

Now let's average over topics to get one set of scores per run:

In [None]:
run_scores = rank_awrf.groupby('run_name').mean()
run_scores.sort_values('Score', ascending=False)

## Analyzing Scores

What is the distribution of scores?

In [None]:
run_scores.describe()

In [None]:
sns.displot(x='Score', data=run_scores)
plt.show()

In [None]:
sns.relplot(x='nDCG', y='AWRF', data=run_scores)
sns.rugplot(x='nDCG', y='AWRF', data=run_scores)
plt.show()

## Per-Topic Stats

We need to return per-topic stats to each participant, at least for the score.

In [None]:
topic_stats = rank_awrf.groupby('topic_id').agg(['mean', 'median', 'min', 'max'])
topic_stats

Extract the per-topic statistics for the final score, dropping the mean:

In [None]:
topic_range = topic_stats.loc[:, 'Score']
topic_range = topic_range.drop(columns=['mean'])
topic_range
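
Because `agg` receives a list of functions, the resulting columns form a two-level index of (metric, statistic), and selecting `'Score'` on the top level returns just that metric's statistics. A minimal sketch on made-up numbers, where `toy_scores` is a hypothetical stand-in for `rank_awrf`:

```python
import pandas as pd

# Hypothetical per-(run, topic) metric values, indexed like rank_awrf.
toy_index = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1, 2]], names=['run_name', 'topic_id'])
toy_scores = pd.DataFrame({
    'nDCG':  [0.5, 0.6, 0.7, 0.2, 0.9, 0.4],
    'Score': [0.2, 0.3, 0.4, 0.1, 0.6, 0.5],
}, index=toy_index)

# Aggregating with a list of functions yields (metric, statistic) columns.
toy_stats = toy_scores.groupby('topic_id').agg(['mean', 'median', 'min', 'max'])

# Selecting the top-level 'Score' key leaves plain statistic columns.
toy_range = toy_stats.loc[:, 'Score'].drop(columns=['mean'])
print(toy_range)
```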

And now we combine scores with these results to return to participants.

In [None]:
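ret_dir = Path('results')
# make sure the output directory exists before writing
ret_dir.mkdir(parents=True, exist_ok=True)
# use a distinct loop variable so the `runs` frame loaded above is not overwritten
for system, sys_scores in rank_awrf.groupby('run_name'):
    # attach the per-topic median/min/max to this system's per-topic scores
    aug = sys_scores.join(topic_range).reset_index().drop(columns=['run_name'])
    fn = ret_dir / f'{system}.tsv'
    log.info('writing %s', fn)
    aug.to_csv(fn, sep='\t', index=False)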
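
The join in this loop relies on pandas matching the single `topic_id` index of the per-topic statistics against the `topic_id` level of the run's two-level index. A minimal sketch with made-up values (all `toy_*` names are hypothetical):

```python
import pandas as pd

# Hypothetical scores for one run, indexed by (run_name, topic_id).
toy_run = pd.DataFrame(
    {'Score': [0.2, 0.3]},
    index=pd.MultiIndex.from_tuples(
        [('a', 1), ('a', 2)], names=['run_name', 'topic_id']),
)

# Hypothetical per-topic statistics, indexed by topic_id only.
toy_range = pd.DataFrame(
    {'median': [0.4, 0.3], 'min': [0.2, 0.1], 'max': [0.6, 0.5]},
    index=pd.Index([1, 2], name='topic_id'),
)

# join aligns on the shared index level name; the run name is then dropped
# because it is already encoded in the output file name.
toy_aug = toy_run.join(toy_range).reset_index().drop(columns=['run_name'])
print(toy_aug)
```

Each returned file then has one row per topic, with the participant's own score next to the median, minimum, and maximum across all runs.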