Data generation
---------------------------

First, we generate a dataset with the `peerannot simulate` command.
The dataset contains 200 tasks with 5 classes, labeled by a pool of 30 workers; each task receives 10 votes.

In [None]:
from pathlib import Path

# Folder where the simulated dataset is written
path = Path("..") / "_build" / "notebooks"
path.mkdir(exist_ok=True, parents=True)

! peerannot simulate --n-worker=30 --n-task=200 --n-classes=5 \
                     --strategy independent-confusion \
                     --feedback=10 --seed 0 \
                     --folder ../_build/notebooks/
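To give an idea of what the `independent-confusion` strategy produces, here is a minimal numpy sketch built on our own assumptions: each worker gets an independent confusion matrix (row k is the distribution of their answer when the true class is k), and each task is answered by 10 distinct workers drawn at random. The function `simulate_independent_confusion` and the way it draws confusion matrices are hypothetical; `peerannot simulate` may generate them differently. The output mimics the `{task: {worker: label}}` layout of the `answers.json` file read below.

```python
import numpy as np

def simulate_independent_confusion(n_worker=30, n_task=200, n_classes=5,
                                   feedback=10, seed=0):
    """Hypothetical simulator: one independent confusion matrix per worker."""
    rng = np.random.default_rng(seed)
    gt = rng.integers(n_classes, size=n_task)  # true label of each task
    # confusions[j, k] = distribution of worker j's answer when the truth is k.
    # Random rows, then a boosted diagonal that makes correct answers more likely.
    confusions = rng.dirichlet(np.ones(n_classes), size=(n_worker, n_classes))
    confusions += np.eye(n_classes) * rng.uniform(0.5, 3.0, size=(n_worker, 1, 1))
    confusions /= confusions.sum(axis=2, keepdims=True)
    answers = {}
    for t in range(n_task):
        # each task is answered by `feedback` distinct workers
        workers = rng.choice(n_worker, size=feedback, replace=False)
        answers[str(t)] = {
            str(w): int(rng.choice(n_classes, p=confusions[w, gt[t]]))
            for w in workers
        }
    return answers, gt

sim_answers, sim_gt = simulate_independent_confusion()
print("Task 5:", sim_answers["5"], "ground truth:", sim_gt[5])
```

The variables are named `sim_answers` and `sim_gt` so that running the sketch does not overwrite the data loaded from disk.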

We can now visualize the votes and the true label of a given task, for example task 5:

In [None]:
import json
import numpy as np
import matplotlib.pyplot as plt

# answers.json maps each task id to a {worker id: label} dictionary
with open(path / "answers.json") as f:
    answers = json.load(f)
gt = np.load(path / "ground_truth.npy")  # true label of each task

print("Task 5:", answers["5"])
print("Number of votes:", len(answers["5"]))
print("Ground truth:", gt[5])

# Count the votes received by each of the 5 classes for task 5
counts = np.bincount(list(answers["5"].values()), minlength=5)
classes = [f"class {i}" for i in range(5)]

fig, ax = plt.subplots()
ax.bar(classes, counts)
ax.set_yticks(range(counts.max() + 1))
ax.set_ylabel("Number of votes")
ax.set_title("Number of votes for each class for task 5")
plt.tight_layout()
plt.show()
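The bar plot counts the votes of a single task. Stacking the same `np.bincount` over every task gives a tasks by classes count matrix, the basic input of vote-based strategies. The helper `vote_matrix` below is a hypothetical sketch; it assumes task ids are the strings "0" to "n_task - 1", as with `answers["5"]`.

```python
import numpy as np

def vote_matrix(answers, n_classes):
    """Return an (n_task, n_classes) array whose row t counts the votes for task t."""
    counts = np.zeros((len(answers), n_classes), dtype=int)
    for task, votes in answers.items():
        # same per-task count as the bar plot, stored at row int(task)
        counts[int(task)] = np.bincount(list(votes.values()), minlength=n_classes)
    return counts

# Toy answers in the {task: {worker: label}} layout of answers.json
toy = {"0": {"0": 1, "1": 1, "2": 3}, "1": {"0": 0, "2": 4}}
toy_counts = vote_matrix(toy, n_classes=5)
print(toy_counts)
```

Under the same assumption, `vote_matrix(answers, n_classes=5)` on the loaded data would return an array of shape (200, 5).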

Command Line Aggregation
------------------------

Let us run several aggregation strategies on the generated dataset using the command line interface.

In [None]:
for strat in ["MV", "NaiveSoft", "DS", "GLAD", "DSWC[L=5]", "Wawa"]:
    # quoting keeps the shell from reading the brackets of DSWC[L=5] as a glob pattern
    ! peerannot aggregate ../_build/notebooks/ -s "{strat}"
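The simplest of these strategies only look at vote counts. The sketch below gives our own numpy versions of three of them, under stated assumptions: MV takes the most voted class with ties broken uniformly at random, NaiveSoft returns the empirical distribution of the votes, and Wawa (worker agreement with aggregate) weights each worker by their agreement rate with MV before voting again. The helpers `vote_counts`, `majority_vote`, `naive_soft` and `wawa` are hypothetical and are not peerannot's implementations, which may differ in details such as tie-breaking or smoothing.

```python
import numpy as np

def vote_counts(answers, n_classes, weights=None):
    """(n_task, n_classes) matrix of votes, optionally weighted per worker."""
    counts = np.zeros((len(answers), n_classes))
    for task, votes in answers.items():
        for worker, label in votes.items():
            counts[int(task), label] += 1.0 if weights is None else weights[worker]
    return counts

def majority_vote(counts, rng):
    """MV: most voted class per task, ties broken uniformly at random."""
    return np.array([rng.choice(np.flatnonzero(row == row.max())) for row in counts])

def naive_soft(counts):
    """NaiveSoft: empirical distribution of the votes of each task."""
    return counts / counts.sum(axis=1, keepdims=True)

def wawa(answers, n_classes, rng):
    """Wawa: weight each worker by their agreement rate with MV, then vote again."""
    mv = majority_vote(vote_counts(answers, n_classes), rng)
    agree, total = {}, {}
    for task, votes in answers.items():
        for worker, label in votes.items():
            total[worker] = total.get(worker, 0) + 1
            agree[worker] = agree.get(worker, 0) + int(label == mv[int(task)])
    weights = {w: agree[w] / total[w] for w in total}
    return majority_vote(vote_counts(answers, n_classes, weights), rng)

rng = np.random.default_rng(0)
toy = {"0": {"0": 1, "1": 1, "2": 2},
       "1": {"0": 0, "1": 0, "2": 0},
       "2": {"0": 3, "1": 3, "2": 1}}
toy_counts = vote_counts(toy, n_classes=5)
print(majority_vote(toy_counts, rng))
print(naive_soft(toy_counts)[0])
print(wawa(toy, n_classes=5, rng=rng))
```

MV and Wawa return hard labels (one class per task), while NaiveSoft returns soft labels (one distribution per task).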

Since we know the ground truth, we can evaluate the aggregation strategies.
Here we use the accuracy; other metrics such as the F1-score, precision or recall could be used as well.

In [None]:
import pandas as pd

def accuracy(labels, gt):
    """Accuracy of hard labels (1D) or soft labels (2D, argmax taken per task)."""
    if labels.ndim == 2:
        labels = np.argmax(labels, axis=1)
    return np.mean(labels == gt)

results = {  # initialize results dictionary
    "mv": [],
    "naivesoft": [],
    "glad": [],
    "ds": [],
    "wawa": [],
    "dswc[l=5]": [],
}
for strategy in results:
    path_labels = path / "labels" / f"labels_independent-confusion_{strategy}.npy"
    labels = np.load(path_labels)  # load aggregated labels
    results[strategy].append(accuracy(labels, gt))  # compute accuracy
results["NS"] = results.pop("naivesoft")  # rename naivesoft to NS

# Styling the results
df = pd.DataFrame(results, index=["AccTrain"])
df.columns = df.columns.str.upper()
styled = (
    df.style.set_table_styles([dict(selector="th", props=[("text-align", "center")])])
    .set_properties(**{"text-align": "center"})
    .format(precision=3)
)
styled
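The other metrics mentioned above can be computed in the same spirit. Here is a numpy-only sketch of macro-averaged precision, recall and F1; `macro_prf` is a hypothetical helper, and scoring an undefined ratio as 0 is our own convention. If scikit-learn is available, `sklearn.metrics.precision_recall_fscore_support(gt, labels, average="macro")` provides these metrics directly.

```python
import numpy as np

def macro_prf(labels, gt, n_classes):
    """Macro-averaged precision, recall and F1 for hard (1D) or soft (2D) labels."""
    labels = np.asarray(labels)
    if labels.ndim == 2:  # soft labels: keep the most probable class
        labels = labels.argmax(axis=1)
    scores = []
    for k in range(n_classes):
        tp = np.sum((labels == k) & (gt == k))
        fp = np.sum((labels == k) & (gt != k))
        fn = np.sum((labels != k) & (gt == k))
        p = tp / (tp + fp) if tp + fp else 0.0  # undefined ratios scored as 0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores.append((p, r, f1))
    precision, recall, f1 = np.mean(scores, axis=0)  # average over classes
    return precision, recall, f1

toy_gt = np.array([0, 1, 2, 2])
toy_pred = np.array([0, 1, 1, 2])
print(macro_prf(toy_pred, toy_gt, n_classes=3))
```

Applied to the aggregated labels, `macro_prf(labels, gt, n_classes=5)` would give three extra scores per strategy.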

API Aggregation
------------------------

We showed how to use the command line interface, but what about the API?
It's just as simple!

In [None]:
from peerannot.models import agg_strategies

strategies = ["MV", "GLAD", "DS", "NaiveSoft", "DSWC", "Wawa"]
yhats = []
for strat in strategies:
    agg = agg_strategies[strat]  # aggregation class registered under this name
    if strat != "DSWC":
        agg = agg(answers, n_classes=5, n_workers=30, n_tasks=200, dataset=path)
    else:  # DSWC takes the extra parameter L (here L=5)
        agg = agg(answers, L=5, n_classes=5, n_workers=30, n_tasks=200)
    if hasattr(agg, "run"):  # strategies with a fitting step are run first
        agg.run(maxiter=20)
    yhats.append(agg.get_answers())  # aggregated labels for every task
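Strategies such as DS are fitted iteratively, which is why the loop calls `run(maxiter=20)` when the aggregator provides it. To illustrate what such a fit does, here is a minimal expectation-maximization sketch of the Dawid and Skene model: each worker has a confusion matrix, the M-step re-estimates these matrices and the class priors from the current task posteriors, and the E-step updates the posteriors. This is an independent sketch, not peerannot's `DS` class; the initialization from normalized vote counts, the 0.01 smoothing, and the assumption that task and worker ids are the strings "0", "1", ... are our own choices.

```python
import numpy as np

def dawid_skene(answers, n_classes, n_workers, maxiter=20, tol=1e-6):
    """Minimal Dawid and Skene EM returning (n_task, n_classes) posteriors."""
    tasks = sorted(answers, key=int)
    # votes[t, j, l] = 1 if worker j answered class l on task t
    votes = np.zeros((len(tasks), n_workers, n_classes))
    for t, task in enumerate(tasks):
        for worker, label in answers[task].items():
            votes[t, int(worker), label] = 1.0
    # Initialize the posteriors with normalized vote counts
    post = votes.sum(axis=1)
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(maxiter):
        # M-step: class priors and one smoothed confusion matrix per worker
        prior = post.mean(axis=0)
        conf = np.einsum("tk,tjl->jkl", post, votes) + 0.01
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior of the true class, computed in log space
        log_post = np.log(prior + 1e-12) + np.einsum("tjl,jkl->tk", votes, np.log(conf))
        log_post -= log_post.max(axis=1, keepdims=True)
        new_post = np.exp(log_post)
        new_post /= new_post.sum(axis=1, keepdims=True)
        converged = np.abs(new_post - post).max() < tol
        post = new_post
        if converged:
            break
    return post

toy = {"0": {"0": 0, "1": 0, "2": 1},
       "1": {"0": 1, "1": 1, "2": 0},
       "2": {"0": 0, "1": 0, "2": 1},
       "3": {"0": 1, "1": 1, "2": 1}}
post = dawid_skene(toy, n_classes=2, n_workers=3)
print(post.argmax(axis=1))
```

On the simulated data, `dawid_skene(answers, n_classes=5, n_workers=30)` returns soft labels that the `accuracy` function above can score directly.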

In [None]:
results = {  # keys listed in the same order as `strategies`
    "mv": [],
    "glad": [],
    "ds": [],
    "naivesoft": [],
    "dswc[l=5]": [],
    "wawa": [],
}
for strategy, labels in zip(results, yhats):  # pair each name with its labels
    results[strategy].append(accuracy(labels, gt))  # compute accuracy
results["NS"] = results.pop("naivesoft")  # rename naivesoft to NS

# Styling the results
df = pd.DataFrame(results, index=["AccTrain"])
df.columns = df.columns.str.upper()
styled = (
    df.style.set_table_styles([dict(selector="th", props=[("text-align", "center")])])
    .set_properties(**{"text-align": "center"})
    .format(precision=3)
)
styled

The differences in performance between the command line and API results come from ties being broken at random.
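Assuming, as the sentence above suggests, that ties are broken at random, two runs of a majority vote can only disagree on tasks where several classes share the top vote count. The sketch below locates such tasks; `tied_tasks` and `majority_vote` are hypothetical helpers, not part of peerannot. Note that `np.argmax`, used by `accuracy` on soft labels, always returns the first maximal class, so that step is deterministic.

```python
import numpy as np

def tied_tasks(counts):
    """Indices of the tasks whose top vote count is shared by several classes."""
    is_top = counts == counts.max(axis=1, keepdims=True)
    return np.flatnonzero(is_top.sum(axis=1) > 1)

def majority_vote(counts, rng):
    """MV: most voted class per task, ties broken uniformly at random."""
    return np.array([rng.choice(np.flatnonzero(row == row.max())) for row in counts])

toy_counts = np.array([[3, 1, 0],   # clear winner: class 0
                       [2, 2, 0],   # tie between classes 0 and 1
                       [0, 1, 3]])  # clear winner: class 2
print(tied_tasks(toy_counts))  # [1]
runs = [majority_vote(toy_counts, np.random.default_rng(seed)) for seed in range(10)]
# Tasks on which at least one run disagrees with the first one
unstable = {int(t) for run in runs[1:] for t in np.flatnonzero(run != runs[0])}
print(unstable <= set(tied_tasks(toy_counts).tolist()))  # True
```

The number of tied tasks gives an upper bound on how many labels, and hence how much accuracy, can change between two runs.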