Specific agreement coefficient


Overview

Specific agreement is an index of the reliability of categorical measurements. Unlike other measures, it describes the amount of agreement observed with regard to specific categories. Thus, multiple specific agreement scores are typically used (i.e., one for each category). With two raters, the interpretation of specific agreement for any category is the probability of one rater assigning an item to that category given that the other rater has also assigned that item to that category. With more than two raters, the interpretation becomes the probability of a randomly chosen rater assigning an item to that category given that another randomly chosen rater has also assigned that item to that category. When applied to binary (i.e., dichotomous) tasks, specific agreement on the positive category is often referred to as positive agreement (PA) and specific agreement on the negative category is often referred to as negative agreement (NA).

History

Dice (1945) proposed specific agreement as a measure of ecological association between two species. Beck et al. (1962) recognized its applicability to inter-observer reliability and used it to assess agreement between two diagnosticians on specific psychiatric disorders. Uebersax (1982) provided extended formulas that could be applied to multiple raters and designs (including those with missing data). Finally, Cicchetti and Feinstein (1990) proposed specific agreement as a solution to the problems and paradoxes associated with chance-adjusted reliability indexes. It is worth noting that positive agreement (PA) for two raters is equivalent to the F1 score commonly used in computer science (Van Rijsbergen, 1979). As far as I know, I am the first person (here) to generalize the specific agreement coefficient to accept any weighting scheme.

MATLAB Functions

  • mSPECIFIC — calculates specific agreement (SA) scores using vectorized formulas

Simplified Formulas

Use these formulas with two raters and two (dichotomous) categories:


$$SA_1 = \frac{2n_{11}}{2n_{11} + n_{12} + n_{21}}$$

$$SA_2 = \frac{2n_{22}}{2n_{22} + n_{12} + n_{21}}$$


n_11 is the number of items that both raters assigned to category 1

n_12 is the number of items that rater 1 assigned to category 1 and rater 2 assigned to category 2

n_21 is the number of items that rater 1 assigned to category 2 and rater 2 assigned to category 1

n_22 is the number of items that both raters assigned to category 2

Contingency Table

|                     | Rater 2: Category 1 | Rater 2: Category 2 |
| ------------------- | ------------------- | ------------------- |
| Rater 1: Category 1 | n_11                | n_12                |
| Rater 1: Category 2 | n_21                | n_22                |
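
To make the simplified formulas concrete, here is a minimal MATLAB sketch that computes both specific agreement scores from a hypothetical set of counts. It only illustrates the two formulas above; it is not the mSPECIFIC implementation, and the counts are made up for the example.

```matlab
% Hypothetical counts for two raters and two dichotomous categories
n11 = 40;   % both raters assigned category 1
n12 = 5;    % rater 1 assigned category 1, rater 2 assigned category 2
n21 = 10;   % rater 1 assigned category 2, rater 2 assigned category 1
n22 = 45;   % both raters assigned category 2

% Specific agreement on each category (simplified formulas)
SA_1 = (2 * n11) / (2 * n11 + n12 + n21);   % positive agreement (PA), here 0.842
SA_2 = (2 * n22) / (2 * n22 + n12 + n21);   % negative agreement (NA), here 0.857

fprintf('SA_1 = %.3f, SA_2 = %.3f\n', SA_1, SA_2);
```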

Extended Formulas

Use this formula with multiple raters, multiple categories, and missing data:


$$SA_k = \frac{\sum_{i=1}^{n'} r_{ik}(r_{ik} - 1)}{\sum_{i=1}^{n'} r_{ik}(r_i - 1)}$$


n' is the number of items that were coded by two or more raters

r_ik is the number of raters that assigned item i to category k

r_i is the number of raters that assigned item i to any category
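
As a rough illustration of the extended formula, the following MATLAB sketch computes SA_k for one category from a hypothetical CODES matrix (one row per item, one column per rater, NaN marking missing ratings). The matrix, the category choice, and the variable names are assumptions made for this example; the mSPECIFIC function linked above is the intended way to compute these scores in practice.

```matlab
% Hypothetical ratings: rows are items, columns are raters, NaN = missing
CODES = [1 1 2; 1 2 NaN; 2 2 2; 1 1 1; 2 NaN 2];
k = 1;                          % category of interest

r_ik = sum(CODES == k, 2);      % raters assigning each item to category k
r_i  = sum(~isnan(CODES), 2);   % raters assigning each item to any category
keep = r_i >= 2;                % restrict to items coded by two or more raters (n')

% Extended formula for specific agreement on category k
SA_k = sum(r_ik(keep) .* (r_ik(keep) - 1)) / ...
       sum(r_ik(keep) .* (r_i(keep)  - 1));
```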

References

  1. Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302.
  2. Beck, A. T., Ward, C. H., Mendelson, M., Mock, J. E., & Erbaugh, J. K. (1962). Reliability of psychiatric diagnosis. 2. A study of consistency of clinical judgments and ratings. The American Journal of Psychiatry, 119, 351–357.
  3. Spitzer, R. L., & Fleiss, J. L. (1974). A re-analysis of the reliability of psychiatric diagnosis. British Journal of Psychiatry, 125, 341–347.
  4. Uebersax, J. S. (1982). A design-independent method for measuring the reliability of psychiatric diagnosis. Journal of Psychiatric Research, 17(4), 335–342.
  5. Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43, 551–558.
  6. Van Rijsbergen, C. J. (1979). Information Retrieval (2nd edition). Butterworth.