Panos Ipeirotis edited this page May 19, 2022 · 4 revisions

The Dawid-Skene Algorithm

The algorithm is a variation of the expectation-maximization (EM) algorithm of Dawid and Skene, from the paper: _Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm, Applied Statistics, Vol. 28, No. 1. (1979), pp. 20-28_. The algorithm starts from an initial estimate of the correct labels and then runs in rounds, following these steps:

  1. Using the labels given by multiple workers, estimate the most likely "correct" label for each object.
  2. Based on the estimated correct answer for each object, compute the error rates for each worker.
  3. Considering the error rates for each worker, recompute the most likely "correct" label for each object.
  4. Return to step 2 for the next round.
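
The steps above can be sketched in a few lines of Python. This is a minimal illustration of plain Dawid-Skene EM, not the Get-Another-Label implementation: the function name `dawid_skene`, the `(worker, object, label)` input format, the fixed number of `rounds`, and the small `smoothing` constant are all assumptions made for the example.

```python
def dawid_skene(labels, categories, rounds=20, smoothing=0.01):
    """Estimate object categories and worker confusion matrices.

    labels: list of (worker, obj, label) triples (hypothetical input format).
    Returns (posteriors, confusion), where posteriors[obj][category] is a
    probability and confusion[worker][true_category][assigned_label] is a rate.
    """
    objects = sorted({obj for _, obj, _ in labels})
    workers = sorted({worker for worker, _, _ in labels})

    # Step 1: initial estimate from the fraction of votes for each category.
    posteriors = {}
    for obj in objects:
        votes = [label for _, o, label in labels if o == obj]
        posteriors[obj] = {c: votes.count(c) / len(votes) for c in categories}

    confusion = {}
    for _ in range(rounds):
        # Step 2: error rates (confusion matrix) for each worker, weighting
        # every assigned label by the current category estimate of its object.
        counts = {w: {t: {l: smoothing for l in categories} for t in categories}
                  for w in workers}
        for worker, obj, label in labels:
            for true_cat in categories:
                counts[worker][true_cat][label] += posteriors[obj][true_cat]
        confusion = {
            w: {t: {l: counts[w][t][l] / sum(counts[w][t].values())
                    for l in categories}
                for t in categories}
            for w in workers
        }
        # Category priors estimated from the data (maximum likelihood).
        priors = {c: sum(posteriors[o][c] for o in objects) / len(objects)
                  for c in categories}

        # Step 3: recompute the category estimate of each object from the
        # priors and the error rates of the workers who labeled it.
        scores = {obj: dict(priors) for obj in objects}
        for worker, obj, label in labels:
            for true_cat in categories:
                scores[obj][true_cat] *= confusion[worker][true_cat][label]
        for obj in objects:
            total = sum(scores[obj].values())
            posteriors[obj] = {c: scores[obj][c] / total for c in categories}
        # Step 4: the loop returns to step 2.
    return posteriors, confusion


# Toy data: workers "a" and "b" agree; worker "c" disagrees on o1 and o4.
labels = [
    ("a", "o1", "pos"), ("a", "o2", "pos"), ("a", "o3", "neg"), ("a", "o4", "neg"),
    ("b", "o1", "pos"), ("b", "o2", "pos"), ("b", "o3", "neg"), ("b", "o4", "neg"),
    ("c", "o1", "neg"), ("c", "o2", "pos"), ("c", "o3", "neg"), ("c", "o4", "pos"),
]
posteriors, confusion = dawid_skene(labels, ["pos", "neg"])
best = {obj: max(p, key=p.get) for obj, p in posteriors.items()}
```

In this toy run, `best` holds the most likely category for each object, and the confusion matrix of worker "c" ends up noticeably worse than those of "a" and "b", because "c" disagrees with the consensus on two of the four objects.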

Differences of "Get Another Label" from The Dawid-Skene Algorithm

A few key differences from the original algorithm:

  • When evaluating the quality of a worker (i.e., the worker's confusion matrix), we compare the labels assigned by the worker with the "most probable" category of the object. However, unlike the original Dawid-Skene, we do not take the worker's own labels into consideration when determining the category of the object; we use only the labels assigned by the other workers who labeled this object.
  • We compute scalar quality metrics based on the expected cost of the misclassifications of the workers. The worker quality metrics have values between 0 and 1 and take into consideration the different costs of the misclassification decisions.
  • We allow fixed prior values for the categories, instead of estimating the priors from the data by maximum likelihood.
  • We allow the inclusion of "gold" data: objects with immutable categories, which the algorithm does not modify and uses only to evaluate workers. (This makes the algorithm "semi"-supervised instead of completely unsupervised.)
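
The first and third differences can be illustrated together. The sketch below estimates the category of one object while leaving out the labels of the worker being evaluated, and takes the category priors as a fixed argument rather than estimating them. It is only an illustration under assumed names and data layouts (`category_without_worker`, `object_labels`, the nested `confusion` dictionary); it is not the Get-Another-Label code.

```python
def category_without_worker(object_labels, excluded_worker, confusion, priors):
    """Category probabilities for one object, ignoring one worker's labels.

    object_labels: dict mapping worker -> label assigned to this object.
    confusion[worker][true_category][assigned_label]: worker error rates.
    priors: fixed category priors, supplied by the caller.
    """
    scores = dict(priors)
    for worker, label in object_labels.items():
        if worker == excluded_worker:
            continue  # the evaluated worker does not vote on the category
        for category in scores:
            scores[category] *= confusion[worker][category][label]
    total = sum(scores.values())
    return {category: score / total for category, score in scores.items()}


# Hypothetical error rates: "a" and "b" are right 80% of the time, "c" 70%.
def symmetric(accuracy):
    return {"pos": {"pos": accuracy, "neg": 1 - accuracy},
            "neg": {"pos": 1 - accuracy, "neg": accuracy}}

confusion = {"a": symmetric(0.8), "b": symmetric(0.8), "c": symmetric(0.7)}
fixed_priors = {"pos": 0.5, "neg": 0.5}
object_labels = {"a": "pos", "b": "pos", "c": "neg"}

# Estimate used when evaluating worker "c": only "a" and "b" contribute.
without_c = category_without_worker(object_labels, "c", confusion, fixed_priors)
# For comparison: all three workers contribute (no worker is excluded).
with_all = category_without_worker(object_labels, None, confusion, fixed_priors)
```

Here `without_c` leans more strongly toward "pos" than `with_all`, since the dissenting label of worker "c" no longer pulls the estimate toward "neg". Worker "c" is then judged against an estimate that its own label did not influence.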

Details on Running the Algorithm

For details on how to run the algorithm, see How-to-Run-Get-Another-Label.