# Specific agreement coefficient
Specific agreement is an index of the reliability of categorical measurements. Unlike omnibus indexes that summarize agreement across all categories, it describes the amount of agreement observed with regard to each specific category; thus, multiple specific agreement scores are typically reported (i.e., one per category). With two raters, specific agreement for any category is interpreted as the probability of one rater assigning an item to that category given that the other rater has also done so. With more than two raters, it becomes the probability of a randomly chosen rater assigning an item to that category given that another randomly chosen rater has also done so. When applied to binary (i.e., dichotomous) tasks, specific agreement on the positive category is often referred to as positive agreement (PA) and specific agreement on the negative category as negative agreement (NA).
Dice (1945) proposed specific agreement as a measure of ecological association between two species. Beck et al. (1962) recognized its applicability to inter-observer reliability and used it to assess agreement between two diagnosticians on specific psychiatric disorders. Uebersax (1982) provided extended formulas that could be applied to multiple raters and designs (including those with missing data). Finally, Cicchetti and Feinstein (1990) proposed specific agreement as a solution to the problems and paradoxes associated with chance-adjusted reliability indexes. It is worth noting that positive agreement (PA) for two raters is equivalent to the F1 score commonly used in computer science (Van Rijsbergen, 1979). As far as I know, I am the first person (here) to generalize the specific agreement coefficient to accept any weighting scheme.
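To make the PA–F1 equivalence concrete, here is a small check (the counts `a`, `b`, `c` are made up for illustration): if rater 1 is treated as the "truth" and rater 2 as the "prediction", then PA computed from the agreement table matches F1 computed from precision and recall.

```python
# Hypothetical two-rater binary counts (illustrative only):
a = 40  # both raters assigned category 1 (positive)
b = 5   # rater 1 assigned category 1, rater 2 assigned category 2
c = 10  # rater 1 assigned category 2, rater 2 assigned category 1

# Positive agreement: 2a / (2a + b + c)
pa = 2 * a / (2 * a + b + c)

# F1 score, treating rater 1 as truth: TP = a, FN = b, FP = c
precision = a / (a + c)
recall = a / (a + b)
f1 = 2 * precision * recall / (precision + recall)

print(pa, f1)  # the two values are identical
```

Algebraically, F1 = 2·TP / (2·TP + FP + FN) = 2a / (2a + b + c), which is exactly PA.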
MATLAB function:
- mSPECIFIC %Calculates SA using vectorized formulas
Use these formulas with two raters and two (dichotomous) categories:

$$PA = \frac{2a}{2a + b + c} \qquad NA = \frac{2d}{2d + b + c}$$

where
- $a$ is the number of items that both raters assigned to category 1
- $b$ is the number of items that rater 1 assigned to category 1 and rater 2 assigned to category 2
- $c$ is the number of items that rater 1 assigned to category 2 and rater 2 assigned to category 1
- $d$ is the number of items that both raters assigned to category 2
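The two-rater formulas above can be sketched in a few lines of Python (the parallel lists of codes are hypothetical; categories are labeled 1 and 2):

```python
# Hypothetical codes from two raters on the same ten items
rater1 = [1, 1, 1, 2, 2, 1, 2, 2, 1, 2]
rater2 = [1, 1, 2, 2, 2, 1, 1, 2, 1, 2]

# Tally the 2x2 agreement table
a = sum(r1 == 1 and r2 == 1 for r1, r2 in zip(rater1, rater2))
b = sum(r1 == 1 and r2 == 2 for r1, r2 in zip(rater1, rater2))
c = sum(r1 == 2 and r2 == 1 for r1, r2 in zip(rater1, rater2))
d = sum(r1 == 2 and r2 == 2 for r1, r2 in zip(rater1, rater2))

pa = 2 * a / (2 * a + b + c)  # specific agreement on category 1
na = 2 * d / (2 * d + b + c)  # specific agreement on category 2

print(pa, na)
```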
Use this formula with multiple raters, multiple categories, and missing data:

$$SA_k = \frac{\sum_{i=1}^{n'} r_{ik}(r_{ik} - 1)}{\sum_{i=1}^{n'} r_{ik}(r_i - 1)}$$

where
- $n'$ is the number of items that were coded by two or more raters
- $r_{ik}$ is the number of raters that assigned item $i$ to category $k$
- $r_i$ is the number of raters that assigned item $i$ to any category
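A minimal sketch of this generalized formula, assuming ratings are stored as one list of codes per item with `None` marking a missing code (the data layout and function name are illustrative, not part of the original page):

```python
# Hypothetical ratings: one inner list per item, one code per rater,
# None where a rater did not code that item
ratings = [
    [1, 1, 2],       # item coded by 3 raters
    [1, 2, None],    # third rater missing
    [2, 2, 2],
    [1, None, None], # coded by only one rater: excluded from the sums
]

def specific_agreement(ratings, category):
    """Specific agreement on one category across multiple raters."""
    num = 0
    den = 0
    for item in ratings:
        codes = [c for c in item if c is not None]
        r_i = len(codes)              # raters assigning the item to any category
        if r_i < 2:
            continue                  # only items coded by two or more raters
        r_ik = codes.count(category)  # raters assigning the item to this category
        num += r_ik * (r_ik - 1)
        den += r_ik * (r_i - 1)
    return num / den

print(specific_agreement(ratings, 1), specific_agreement(ratings, 2))
```

With two raters and two categories this reduces to the simplified formulas above: the numerator sums to $2a$ and the denominator to $2a + b + c$ for category 1.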
- Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302.
- Beck, A. T., Ward, C. H., Mendelson, M., Mock, J. E., & Erbaugh, J. K. (1962). Reliability of psychiatric diagnosis: 2. A study of consistency of clinical judgments and ratings. The American Journal of Psychiatry, 119, 351–357.
- Spitzer, R. L., & Fleiss, J. L. (1974). A re-analysis of the reliability of psychiatric diagnosis. British Journal of Psychiatry, 125, 341–347.
- Uebersax, J. S. (1982). A design-independent method for measuring the reliability of psychiatric diagnosis. Journal of Psychiatric Research, 17(4), 335–342.
- Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43, 551–558.
- Van Rijsbergen, C. J. (1979). Information retrieval (2nd ed.). Butterworth.