## Simulate the dataset

We simulate 100 workers labelling 300 tasks with 5 classes in a hammer-spammer setting. With a spammer ratio of 0.7, there are $100\times 0.7=70$ spammers that answer randomly. All other workers (the hammers) answer the true labels.

In [None]:
from pathlib import Path
path = (Path() / ".." / "_build" / "notebooks")
path.mkdir(exist_ok=True, parents=True)

! peerannot simulate --n-worker=100 --n-task=300  --n-classes=5 \
                     --strategy hammer-spammer \
                     --ratio 0.7 \
                     --feedback=10 --seed 0 \
                     --folder {path}
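
To make the hammer-spammer setting concrete, here is a minimal NumPy sketch of such a simulation. It is not peerannot's implementation: the function name `simulate_hammer_spammer`, the uniform random answers of spammers, and the reading of `--feedback=10` as "each task receives 10 answers" are assumptions made for illustration.

```python
import numpy as np


def simulate_hammer_spammer(n_workers=100, n_tasks=300, n_classes=5,
                            ratio=0.7, feedback=10, seed=0):
    """Hypothetical hammer-spammer simulator (illustration only)."""
    rng = np.random.default_rng(seed)
    truth = rng.integers(0, n_classes, size=n_tasks)

    # Mark a fraction `ratio` of the workers as spammers
    n_spammers = int(round(ratio * n_workers))
    is_spammer = np.zeros(n_workers, dtype=bool)
    is_spammer[rng.choice(n_workers, size=n_spammers, replace=False)] = True

    answers = {}
    for task in range(n_tasks):
        # Assumption: each task is answered by `feedback` distinct workers
        workers = rng.choice(n_workers, size=feedback, replace=False)
        answers[task] = {
            # Spammers answer at random, hammers give the true label
            int(w): int(rng.integers(0, n_classes)) if is_spammer[w]
            else int(truth[task])
            for w in workers
        }
    return answers, truth, is_spammer


answers_sim, truth_sim, is_spammer_sim = simulate_hammer_spammer()
print(f"{is_spammer_sim.sum()} spammers among {len(is_spammer_sim)} workers")
```

The returned dictionary maps each task to a `{worker: label}` dictionary, the same nesting the aggregation code later in this notebook iterates over, except that the keys here are integers rather than JSON strings.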

Note that if the dataset comes with an install file (like the `LabelMe` dataset available in peerannot), simply run the install file to download the dataset:

```
$ peerannot install labelme.py
```

Below, we always specify where the labels are stored in the dataset. This highlights that the same code can be used with multiple datasets, as long as the labels are stored in the same way.

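## Value of Krippendorff's alpha

Krippendorff's alpha measures the agreement between annotators. The closer to 0, the less reliable the data (agreement is close to what chance would produce). The closer to 1, the more reliable the data.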
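
For intuition, the statistic can be written as $\alpha = 1 - D_o / D_e$, where $D_o$ is the observed disagreement and $D_e$ the disagreement expected by chance. The sketch below computes it for nominal labels from a coincidence matrix. It is a minimal illustration, not the code run by `peerannot identify`; the function name `krippendorff_alpha_nominal` is hypothetical, and it assumes integer labels stored as `{task: {worker: label}}` with at least two distinct classes observed overall (otherwise $D_e = 0$).

```python
import numpy as np


def krippendorff_alpha_nominal(answers, n_classes):
    """Krippendorff's alpha for nominal labels (illustrative sketch)."""
    coincidences = np.zeros((n_classes, n_classes))
    for votes in answers.values():
        labels = list(votes.values())
        m = len(labels)
        if m < 2:
            continue  # a task with a single answer carries no agreement information
        counts = np.bincount(labels, minlength=n_classes)
        # Number of ordered pairs of labels given by two different workers
        pairs = np.outer(counts, counts) - np.diag(counts)
        coincidences += pairs / (m - 1)

    n_c = coincidences.sum(axis=1)  # how often each class appears in pairable answers
    n = n_c.sum()
    # Observed and expected disagreement, both multiplied by n (the factor cancels)
    observed = coincidences.sum() - np.trace(coincidences)
    expected = (n**2 - np.sum(n_c**2)) / (n - 1)
    return 1.0 - observed / expected


# Two workers, two tasks, full agreement
full_agreement = {0: {0: 0, 1: 0}, 1: {0: 1, 1: 1}}
print(krippendorff_alpha_nominal(full_agreement, n_classes=2))
```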


In [None]:
! peerannot identify -s krippendorffalpha {path} \
                     --labels {path}/answers.json \
                     --n-classes 5


We obtain $\alpha\simeq 0.08$, which is close to 0 and indicates that the data is not reliable.

## Identify spammers

If there are ambiguities, we can identify spammers by looking at the spam score of each worker. The closer to 0, the more likely the annotator is a spammer.
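
To see why a score near 0 points to a spammer, consider one common definition based on a worker's confusion matrix $\pi$ (rows indexed by the true class): $s = \frac{1}{K(K-1)}\sum_{c<c'}\|\pi_{c,\cdot}-\pi_{c',\cdot}\|_2^2$. A spammer answers independently of the true label, so all rows of $\pi$ are identical and $s=0$. The sketch below assumes this definition and normalization and assumes a confusion matrix is already available (for example one estimated by a Dawid-Skene model); the function name is hypothetical and the exact computation done by `peerannot identify -s spam_score` may differ.

```python
import numpy as np


def spam_score_from_confusion(pi):
    """Spam score of one worker from a (K, K) confusion matrix (sketch)."""
    pi = np.asarray(pi, dtype=float)
    n_classes = pi.shape[0]
    # Squared Euclidean distance between every ordered pair of rows
    diffs = pi[:, None, :] - pi[None, :, :]
    sq_dists = (diffs**2).sum(axis=-1)
    # Summing over c < c' is half the sum over all ordered pairs
    return sq_dists.sum() / 2 / (n_classes * (n_classes - 1))


hammer = np.eye(5)                # always answers the true label
spammer = np.full((5, 5), 0.2)    # same answer distribution whatever the truth
print(spam_score_from_confusion(hammer), spam_score_from_confusion(spammer))
```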

In [None]:
! peerannot identify -s spam_score {path} \
                     --labels {path}/answers.json \
                     --n-classes 5

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

spam_scores = np.load(path / "identification" / "spam_score.npy")
plt.figure()
plt.hist(spam_scores, bins=20)
plt.xlabel("Spam score")
plt.ylabel("Count")
plt.show()

We can get the IDs of the workers with a spam score below $0.5$:

In [None]:
print(np.where(spam_scores < 0.5))

## Aggregation with and without identification

In [None]:
from peerannot.models import Dawid_Skene as DS
from peerannot.models import MV
import json

with open(path / "answers.json") as f:
    answers = json.load(f)

gt = np.load(path / "ground_truth.npy")

In [None]:
y_mv = MV(answers, n_classes=5).get_answers()
ds = DS(answers, n_classes=5, n_workers=100)
ds.run()
y_ds = ds.get_answers()
print(f"""
        - MV accuracy: {np.mean(y_mv == gt)}
        - DS accuracy: {np.mean(y_ds == gt)}
      """)

Because the DS model accounts for the confusions of each worker, it was able to generate better predictions than the majority vote. Let's see if we can identify the spammers and improve the predictions.
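
Majority voting counts every answer equally, whoever gave it, which is why a large share of spammers hurts it. A minimal sketch of the idea follows; it is not peerannot's `MV` class, the function name `majority_vote` is hypothetical, and breaking ties in favor of the smallest label is an assumption.

```python
import numpy as np


def majority_vote(answers, n_classes):
    """Most frequent label per task, tasks ordered by integer ID (sketch)."""
    tasks = sorted(answers, key=int)
    y_hat = np.empty(len(tasks), dtype=int)
    for i, task in enumerate(tasks):
        counts = np.bincount(list(answers[task].values()), minlength=n_classes)
        y_hat[i] = counts.argmax()  # ties go to the smallest label
    return y_hat


# Toy example with the same {task: {worker: label}} layout as answers.json
toy_answers = {"0": {"0": 1, "1": 1, "2": 3}, "1": {"0": 2, "1": 0, "2": 2}}
print(majority_vote(toy_answers, n_classes=5))
```

Removing the answers of identified spammers changes the counts directly, so the effect on this estimator is immediate.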

In [None]:
# Workers with a spam score below 0.5 are treated as spammers
id_spammers = set(np.where(spam_scores < 0.5)[0].tolist())

ans_cleaned = {}
worker_ids = {}  # maps each kept worker to a new contiguous index
for task in answers.keys():
    ans_cleaned[task] = {}
    for worker, label in answers[task].items():
        if int(worker) in id_spammers:
            continue  # drop the answers of identified spammers
        if worker not in worker_ids:
            worker_ids[worker] = len(worker_ids)
        ans_cleaned[task][worker_ids[worker]] = label

In [None]:
y_mv = MV(ans_cleaned, n_classes=5).get_answers()
ds = DS(ans_cleaned, n_classes=5, n_workers=len(worker_ids))
ds.run()
y_ds = ds.get_answers()
print(
    f"""
        - MV accuracy: {np.mean(y_mv == gt)}
        - DS accuracy: {np.mean(y_ds == gt)}
      """
)

Now that we have cleaned the data, we can aggregate the labels again and obtain a majority vote that performs as well as the DS strategy!

A similar cleaning step can be applied by identifying the ambiguous tasks instead of the workers.