# Specific agreement coefficient
Specific agreement is an index of the reliability of categorical measurements. Unlike omnibus indexes that summarize agreement across all categories, it describes the amount of agreement observed with regard to each specific category; thus, multiple specific agreement scores are typically reported (i.e., one per category). With two raters, specific agreement for any category is interpreted as the probability of one rater assigning an item to that category given that the other rater has also done so. With more than two raters, it becomes the probability of a randomly chosen rater assigning an item to that category given that another randomly chosen rater has also done so. When applied to binary (i.e., dichotomous) tasks, specific agreement on the positive category is often referred to as positive agreement (PA) and specific agreement on the negative category as negative agreement (NA).
Dice (1945) proposed specific agreement as a measure of ecological association between two species. Beck et al. (1962) recognized its applicability to inter-observer reliability and used it to assess agreement between two diagnosticians on specific psychiatric disorders. Uebersax (1982) provided extended formulas that could be applied to multiple raters and designs (including those with missing data). Finally, Cicchetti and Feinstein (1990) proposed specific agreement as a solution to the problems and paradoxes associated with chance-adjusted reliability indexes. It is worth noting that positive agreement (PA) for two raters is equivalent to the F1 score commonly used in computer science (Van Rijsbergen, 1979). As far as I know, I am the first person (here) to generalize the specific agreement coefficient to accept any weighting scheme.
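To make the PA–F1 equivalence concrete, here is a small check (the counts `a`, `b`, `c` are made up for illustration): if rater 1 is treated as the "truth" and rater 2 as the "prediction", then PA computed from the agreement table matches F1 computed from precision and recall.

```python
# Hypothetical two-rater binary counts (illustrative only):
a = 40  # both raters assigned category 1 (positive)
b = 5   # rater 1 assigned category 1, rater 2 assigned category 2
c = 10  # rater 1 assigned category 2, rater 2 assigned category 1

# Positive agreement: 2a / (2a + b + c)
pa = 2 * a / (2 * a + b + c)

# F1 score, treating rater 1 as truth: TP = a, FN = b, FP = c
precision = a / (a + c)
recall = a / (a + b)
f1 = 2 * precision * recall / (precision + recall)

print(pa, f1)  # the two values are identical
```

Algebraically, F1 = 2·TP / (2·TP + FP + FN) = 2a / (2a + b + c), which is exactly PA.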
MATLAB function:
- mSPECIFIC %Calculates SA using vectorized formulas
Use these formulas with two raters and two (dichotomous) categories:

$$PA = \frac{2a}{2a + b + c} \qquad NA = \frac{2d}{2d + b + c}$$

where
- $a$ is the number of items that both raters assigned to category 1
- $b$ is the number of items that rater 1 assigned to category 1 and rater 2 assigned to category 2
- $c$ is the number of items that rater 1 assigned to category 2 and rater 2 assigned to category 1
- $d$ is the number of items that both raters assigned to category 2
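The two-rater formulas above can be sketched in a few lines of Python (the parallel lists of codes are hypothetical; categories are labeled 1 and 2):

```python
# Hypothetical codes from two raters on the same ten items
rater1 = [1, 1, 1, 2, 2, 1, 2, 2, 1, 2]
rater2 = [1, 1, 2, 2, 2, 1, 1, 2, 1, 2]

# Tally the 2x2 agreement table
a = sum(r1 == 1 and r2 == 1 for r1, r2 in zip(rater1, rater2))
b = sum(r1 == 1 and r2 == 2 for r1, r2 in zip(rater1, rater2))
c = sum(r1 == 2 and r2 == 1 for r1, r2 in zip(rater1, rater2))
d = sum(r1 == 2 and r2 == 2 for r1, r2 in zip(rater1, rater2))

pa = 2 * a / (2 * a + b + c)  # specific agreement on category 1
na = 2 * d / (2 * d + b + c)  # specific agreement on category 2

print(pa, na)
```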
Use this formula with multiple raters, multiple categories, and missing data:

$$SA_k = \frac{\sum_{i=1}^{n'} r_{ik}(r_{ik} - 1)}{\sum_{i=1}^{n'} r_{ik}(r_i - 1)}$$

where
- $n'$ is the number of items that were coded by two or more raters
- $r_{ik}$ is the number of raters that assigned item $i$ to category $k$
- $r_i$ is the number of raters that assigned item $i$ to any category
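A minimal sketch of this generalized formula, assuming ratings are stored as one list of codes per item with `None` marking a missing code (the data layout and function name are illustrative, not part of the original page):

```python
# Hypothetical ratings: one inner list per item, one code per rater,
# None where a rater did not code that item
ratings = [
    [1, 1, 2],       # item coded by 3 raters
    [1, 2, None],    # third rater missing
    [2, 2, 2],
    [1, None, None], # coded by only one rater: excluded from the sums
]

def specific_agreement(ratings, category):
    """Specific agreement on one category across multiple raters."""
    num = 0
    den = 0
    for item in ratings:
        codes = [c for c in item if c is not None]
        r_i = len(codes)              # raters assigning the item to any category
        if r_i < 2:
            continue                  # only items coded by two or more raters
        r_ik = codes.count(category)  # raters assigning the item to this category
        num += r_ik * (r_ik - 1)
        den += r_ik * (r_i - 1)
    return num / den

print(specific_agreement(ratings, 1), specific_agreement(ratings, 2))
```

With two raters and two categories this reduces to the simplified formulas above: the numerator sums to $2a$ and the denominator to $2a + b + c$ for category 1.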
- Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302.
- Beck, A. T., Ward, C. H., Mendelson, M., Mock, J. E., & Erbaugh, J. K. (1962). Reliability of psychiatric diagnosis: 2. A study of consistency of clinical judgments and ratings. The American Journal of Psychiatry, 119, 351–357.
- Spitzer, R. L., & Fleiss, J. L. (1974). A re-analysis of the reliability of psychiatric diagnosis. British Journal of Psychiatry, 125, 341–347.
- Uebersax, J. S. (1982). A design-independent method for measuring the reliability of psychiatric diagnosis. Journal of Psychiatric Research, 17(4), 335–342.
- Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43, 551–558.
- Van Rijsbergen, C. J. (1979). Information retrieval (2nd ed.). Butterworth.