# Analyzing and ordering reviews for proposal selection

Every year, SciPy gets $N$ proposals, yet must restrict its selection to $M \ll N$ for each of its tracks.
Selecting these is a tricky exercise, especially when trying to avoid one's own biases.
One useful way of doing so is to rely on the scores provided by reviewers to list proposals in decreasing order of presumed importance.
However, despite the scoring guidelines, reviewers don't all score the same way.
Some score high on average, making their top scores difficult to leverage as strong statements of quality regarding a paper.
Others score low on average, which makes their low scores just as difficult to read as strong statements of poor quality.
Yet others make a deliberate effort to spread their scores across the whole $[-10, 10]$ scale, trying to align some notion of score centrality to 0.

In this notebook, we endeavor to emulate the latter scorers,
providing a projection of every review score on a zero-centered scale with respect to the set of scores specific to each reviewer.
Aligning the median score of every reviewer to zero, we linearly rescale each of their scores so that their $0.75$ and $0.25$ quantiles map to 1.0 and -1.0, respectively.
The result is a measurement of how amazed or disappointed a reviewer is with respect to their median appreciation of a proposal.
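
Stated as a formula, in notation introduced here for illustration only: for a score $s$ given by a reviewer whose scores have median $m$ and quartiles $q_{0.25}$ and $q_{0.75}$, the projection $p(s)$ can be written as below. The zero fallback for a degenerate quartile mirrors what `normalize_by_quantiles` does further down.

$$
p(s) =
\begin{cases}
\dfrac{s - m}{m - q_{0.25}} & \text{if } s \le m \text{ and } m \ne q_{0.25}, \\
\dfrac{s - m}{q_{0.75} - m} & \text{if } s > m \text{ and } q_{0.75} \ne m, \\
0 & \text{otherwise.}
\end{cases}
$$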

Note that such score projections are not an authoritative recasting of reviewer scores.
We should heed reviewer comments, and still look at their nominal scores.
However, the mean score projection for each proposal turns out to be useful in a pivot table analysis,
enabling a conference chair to list proposals by decreasing order of how _wowed_ reviewers were by papers on average,
and this can make the selection process a little easier.

In [None]:
%pip install python-dotenv pandas pyarrow requests tqdm ipywidgets

In [None]:
from contextlib import closing
import json
import numpy as np
import os
import pandas as pd
import requests as rq
from tqdm.auto import tqdm

We get the data necessary for this work out of Pretalx, to which we authenticate using our API token.
In order to avoid writing up tokens in notebook code
(which I keep forgetting to take out before committing),
let's write it up in a file named `.env` in the current directory.
All this file needs to contain is the following:

```
TOKEN = "API token copy-pasted out of one's Pretalx profile"
```

Fetch the token from [here](https://cfp.scipy.org/orga/me),
then use the following `%%writefile` cell magic to set it up once
(by turning it into a code cell).
Once done, make it a raw cell again and carry on.
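
As a sketch of what that cell might contain (an assumption on our part, since only the file contents are given above): `%%writefile` is the IPython cell magic that writes the rest of the cell to the named file.

```
%%writefile .env
TOKEN = "API token copy-pasted out of one's Pretalx profile"
```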

In [None]:
%load_ext dotenv
%dotenv

Check here that the token was properly loaded.
On error, check that you actually wrote a file named `.env` in the current directory,
and that this file contains the definition of a `TOKEN` as described above.
If everything looks fine, run the previous cell again to retry.

In [None]:
TOKEN = os.environ["TOKEN"]
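
If the bare `KeyError` raised by a missing variable feels too terse, a small wrapper can spell out what to check. This is only a sketch: `require_env` is a hypothetical helper, not part of `python-dotenv` or of this notebook.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail with a hint."""
    value = os.environ.get(name)
    if not value:
        # Covers both a missing variable and one defined as an empty string.
        raise RuntimeError(
            f"{name} is not set: check that a file named .env exists in the "
            f"current directory and defines {name}, then reload it."
        )
    return value
```

With it, the cell above would read `TOKEN = require_env("TOKEN")`.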

## Fetch proposals and reviews

Both proposals (_submissions_ in Pretalx parlance) and reviews are served by the Pretalx API as paginated HTTP responses.
Let's fetch all of them.

In [None]:
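def fetch_sequence_cfp_scipy(url1, max_queries=50):
    # Follow the paginated API starting at url1, accumulating every page's results.
    # max_queries is only an initial guess for the progress bar total.
    sequence = []
    url = url1
    num_queries = 0
    num_results_expected = None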

    with closing(tqdm(total=max_queries)) as progress:
        while True:
            response = rq.get(url, headers={"Authorization": f"Token {TOKEN}"})
            response.raise_for_status()  # Fail loudly on HTTP errors (e.g. a bad token)
            data = response.json()
            progress.update()
            num_queries += 1

            assert "results" in data
            assert "next" in data

            if "count" in data:
                if num_results_expected is None:
                    # First page: derive the number of pages from the total count.
                    num_results_expected = data["count"]
                    page_size = max(len(data["results"]), 1)
                    max_queries = int(np.ceil(num_results_expected / page_size))
                    progress.reset(max_queries)
                    progress.update(num_queries)
                else:
                    # Later pages must agree on the total.
                    assert num_results_expected == data["count"]

            sequence += data["results"]
            url = data["next"]
            if not url:
                break

    return sequence
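
The function above relies on each response carrying a `results` list and a `next` URL that is empty on the last page. The loop can be tried without network access or a token using the minimal sketch below, in which `FAKE_PAGES` and `collect_pages` are hypothetical stand-ins for the API and the fetcher.

```python
# Hypothetical in-memory stand-in for a paginated API: each "URL" maps to one page.
FAKE_PAGES = {
    "page-1": {"count": 5, "next": "page-2", "results": ["A", "B"]},
    "page-2": {"count": 5, "next": "page-3", "results": ["C", "D"]},
    "page-3": {"count": 5, "next": None, "results": ["E"]},
}


def collect_pages(first_url, get_page):
    """Follow `next` links from `first_url`, concatenating every `results` list."""
    collected = []
    url = first_url
    while url:
        page = get_page(url)
        collected += page["results"]
        url = page["next"]  # None on the last page, which ends the loop
    return collected


all_results = collect_pages("page-1", FAKE_PAGES.__getitem__)
```

Here `all_results` ends up holding the five items in page order, matching the `count` announced by every page.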

In [None]:
submissions_ = fetch_sequence_cfp_scipy(
    "https://cfp.scipy.org/api/events/2024/submissions/"
)
len(submissions_)

In [None]:
reviews_ = fetch_sequence_cfp_scipy("https://cfp.scipy.org/api/events/2024/reviews/")
len(reviews_)

Submissions are delivered in JSON form; let's turn them into a Pandas data frame.

In [None]:
submissions = pd.DataFrame.from_records(submissions_)
submissions

A bit of data clean-up: when the submission type comes as a dictionary of localized names, keep only the English one.

In [None]:
submissions["submission_type"] = submissions["submission_type"].apply(
    lambda x: x["en"] if isinstance(x, dict) else x
)
submissions

Now the reviews, along with their own data clean-up.

In [None]:
reviews = pd.DataFrame.from_records(reviews_)
reviews

In [None]:
reviews["score"] = reviews["score"].map(float)
reviews

## Compute score projections

The score projection scheme described [above](#score-projection) involves computing the $(0.25, 0.5, 0.75)$ quantiles of review scores, grouped by reviewer.

In [None]:
score_quantiles = (
    reviews.groupby("user", as_index=False)
    .agg({"score": lambda g: list(g.quantile(q=[0.25, 0.5, 0.75]))})
    .rename(columns={"score": "quantiles"})
)
score_quantiles

Compute score projections and append them to other relevant review parameters.

In [None]:
def normalize_by_quantiles(score, q_low, med, q_up):
    if score <= med:
        if med == q_low:
            return 0.0
        else:
            return (score - med) / (med - q_low)
    else:
        if med == q_up:
            return 0.0
        else:
            return (score - med) / (q_up - med)


reviews_proj = reviews[["submission", "text", "user", "score"]].merge(
    score_quantiles, on="user"
)
reviews_proj["projection"] = [
    normalize_by_quantiles(score, q_low, med, q_up)
    for score, (q_low, med, q_up) in reviews_proj[["score", "quantiles"]].itertuples(
        index=False
    )
]
reviews_proj
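
To get a feel for the numbers, here is a small worked example on made-up scores from a single, rather generous reviewer. It is a sketch only: `project` restates the rule of `normalize_by_quantiles` so that the block runs on its own, and the quartiles come from the standard library, whose `"inclusive"` method interpolates linearly between data points, as the default pandas `quantile` does to our understanding.

```python
from statistics import quantiles


def project(score, q_low, med, q_up):
    """Same piecewise-linear rule as `normalize_by_quantiles`, restated here."""
    if score <= med:
        return 0.0 if med == q_low else (score - med) / (med - q_low)
    return 0.0 if med == q_up else (score - med) / (q_up - med)


# Hypothetical scores from one reviewer who never goes below 2.
toy_scores = [2.0, 4.0, 6.0, 8.0, 10.0]

# Quartiles of this reviewer's own scores: 4.0, 6.0 and 8.0.
q_low, med, q_up = quantiles(toy_scores, n=4, method="inclusive")

toy_projections = [project(s, q_low, med, q_up) for s in toy_scores]
print(toy_projections)  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

This reviewer's median score of 6 projects to 0, and their lowest score of 2, which looks positive on the raw scale, projects to -2: a clear disappointment relative to their habits.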

And now join submissions to reviews.

In [None]:
submissions_reviewed = (
    submissions.assign(
        authors=submissions["speakers"].map(lambda x: ", ".join(a["name"] for a in x))
    )[["code", "authors", "title"]]
    .merge(
        reviews_proj[["submission", "user", "text", "score", "projection"]],
        how="left",
        left_on="code",
        right_on="submission",
    )
    .drop(columns=["submission"])
)
submissions_reviewed

## Eyeball it all 👀

To display submissions in some order of aggregate score or projection, compute these aggregates over `submissions_reviewed` and use `.sort_values()`.

In [None]:
submissions_reviewed_agg = (
    submissions_reviewed.groupby(["code", "authors", "title"], as_index=False)
    .agg({"text": "count", "score": "median", "projection": "mean"})
    .rename(columns={"text": "num_reviews"})
)
submissions_reviewed_agg

In [None]:
submissions_reviewed_agg.sort_values("score", ascending=False)

In [None]:
submissions_reviewed_agg.sort_values("projection", ascending=False)

Do we have any submissions for which we don't have a review?

In [None]:
submissions_reviewed_agg.loc[submissions_reviewed_agg["score"].isna()]

One may go and check in Pretalx whether these submissions got reviewed and, if they did, get on with debugging.
One funny quirk is that if the user running the analysis has submitted a proposal,
this user is barred from seeing the raw reviews their proposal received.
Hence, their proposal will show up in the previous listing as if it went unreviewed.