The algorithm is a variation of the expectation-maximization algorithm of Dawid and Skene from the paper: _Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm, Applied Statistics, Vol. 28, No. 1 (1979), pp. 20-28_. The algorithm runs in rounds, performing the following steps in each round:

1. Using the labels given by multiple workers, estimate the most likely "correct" label for each object.
2. Based on the estimated correct label for each object, compute the error rates for each worker.
3. Taking the error rates for each worker into consideration, recompute the most likely "correct" label for each object.
4. Go to step 2.
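The round structure above can be sketched as follows. This is a minimal, illustrative Dawid-Skene-style loop, not the project's actual implementation; the function name `estimate_labels` and the data layout are assumptions made for the example:

```python
from collections import defaultdict

def estimate_labels(labels, categories, rounds=20):
    """labels: iterable of (worker, object, assigned_category) tuples.
    Returns {object: {category: probability}}.  Illustrative sketch only."""
    workers = {w for w, _, _ in labels}
    objects = {o for _, o, _ in labels}

    # Step 1: initialize each object's distribution by (soft) majority vote.
    probs = {}
    for o in objects:
        counts = defaultdict(float)
        for w, obj, l in labels:
            if obj == o:
                counts[l] += 1.0
        total = sum(counts.values())
        probs[o] = {c: counts[c] / total for c in categories}

    for _ in range(rounds):
        # Step 2: estimate each worker's confusion matrix
        # conf[w][true][assigned] from the current label distributions
        # (small floor value avoids zero rows).
        conf = {w: {c: {k: 1e-9 for k in categories} for c in categories}
                for w in workers}
        for w, o, l in labels:
            for c in categories:
                conf[w][c][l] += probs[o][c]
        for w in workers:
            for c in categories:
                row = sum(conf[w][c].values())
                for k in categories:
                    conf[w][c][k] /= row

        # Category priors estimated from the current distributions.
        prior = {c: sum(p[c] for p in probs.values()) / len(objects)
                 for c in categories}

        # Step 3: recompute each object's label distribution, weighting
        # each worker's vote by that worker's confusion matrix.
        for o in objects:
            post = {c: prior[c] for c in categories}
            for w, obj, l in labels:
                if obj == o:
                    for c in categories:
                        post[c] *= conf[w][c][l]
            z = sum(post.values())
            probs[o] = {c: post[c] / z for c in categories}

    return probs
```

In this sketch, a worker who disagrees with the consensus gets a noisier confusion matrix, so their votes carry less weight in subsequent rounds.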
A few key differences from the original algorithm:
- When evaluating the quality of a worker (i.e., their confusion matrix), we compare the labels assigned by the worker with the "most probable" category of the object. However, unlike the original Dawid-Skene algorithm, when determining the category of an object we do not take that worker's own labels into consideration; we use only the labels assigned by the other workers who labeled the object.
- We compute scalar quality metrics based on the expected cost of each worker's misclassifications; these metrics are normalized to be between 0 and 1 and take the different costs of the misclassification decisions into consideration.
- We give the ability to set fixed prior values for the categories, instead of estimating the priors from the data in a maximum-likelihood manner.
- We allow the inclusion of "gold" data, which are objects that have immutable categories that are not being modified by the algorithm but are used to evaluate workers. (This makes the algorithm "semi"-supervised instead of completely unsupervised.)
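As an illustration of the expected-cost quality metric mentioned above, here is one simplified way such a score could be computed from a worker's confusion matrix, the category priors, and a misclassification cost matrix. This is a hedged sketch under assumed data structures; the project's actual computation may differ (for example, by working with soft labels rather than the raw confusion matrix):

```python
def worker_quality(conf, prior, cost):
    """Scalar quality score for a worker, normalized so that 1.0 is a
    perfect worker and 0.0 is a "spammer" who ignores the object and
    assigns categories at random from the prior.  In this simplification,
    worse-than-random workers can score below 0.  All argument names and
    layouts are illustrative assumptions."""
    cats = list(prior)
    # Expected misclassification cost of this worker:
    # sum over (true category c, assigned label l).
    exp_cost = sum(prior[c] * conf[c][l] * cost[c][l]
                   for c in cats for l in cats)
    # Expected cost of the prior-guessing spammer (the normalizer).
    spam_cost = sum(prior[c] * prior[l] * cost[c][l]
                    for c in cats for l in cats)
    return 1.0 - exp_cost / spam_cost
```

Normalizing by the spammer's cost is what makes the metric comparable across tasks with different numbers of categories and different cost matrices.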
For details on how to run the algorithm, see How-to-Run-Get-Another-Label.