# Krippendorff’s Alpha

## What is Krippendorff’s Alpha Coefficient?

Krippendorff’s alpha (α) is a statistical measure of inter-rater agreement. It assesses how consistently different raters evaluate the same items, accounting for the disagreement that could occur by chance. It is defined by the formula:

$$
\alpha = 1 - \frac{D_o}{D_e}
$$

Where:
- **Dₒ** is the *observed disagreement* (based on the actual data).
- **Dₑ** is the *expected disagreement* (the disagreement expected by chance).

## Interpretation

- **Perfect agreement**  
  If there is no observed disagreement ($D_o = 0$):
  
  $$
  \alpha = 1 - \frac{0}{D_e} = 1
  $$

- **Agreement no better than chance**  
  If the observed disagreement equals the expected disagreement by chance ($D_o = D_e$):
  
  $$
  \alpha = 1 - \frac{D_e}{D_e} = 1 - 1 = 0
  $$

- **Systematic disagreement (worse than chance)**  
  If the observed disagreement exceeds the expected disagreement by chance ($D_o > D_e$):
  
  $$
  \alpha = 1 - \frac{D_o}{D_e} < 0
  $$
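The three cases above can be checked numerically. The following is a minimal sketch, assuming the observed and expected disagreement have already been computed; the function name `alpha_from_disagreement` and the input values are made up for illustration and do not come from any library.

```python
def alpha_from_disagreement(d_o: float, d_e: float) -> float:
    """Return alpha = 1 - D_o / D_e for already-computed disagreements."""
    if d_e == 0:
        # With no expected disagreement the ratio is undefined.
        raise ValueError("alpha is undefined when the expected disagreement is zero")
    return 1 - d_o / d_e


# Illustrative values only (not taken from real annotation data).
print(alpha_from_disagreement(0.0, 0.5))   # perfect agreement -> 1.0
print(alpha_from_disagreement(0.5, 0.5))   # agreement at chance level -> 0.0
print(alpha_from_disagreement(0.75, 0.5))  # systematic disagreement -> -0.5
```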


### References
[Computing Krippendorff’s Alpha-Reliability](https://www.asc.upenn.edu/sites/default/files/2021-03/Computing%20Krippendorff%27s%20Alpha-Reliability.pdf)

[Wikipedia](https://en.wikipedia.org/wiki/Krippendorff%27s_alpha)

In [40]:
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install krippendorff
!{sys.executable} -m pip install tabulate

In [6]:
import numpy as np
import pandas as pd
import string
import krippendorff

In [9]:
# raw annotations

annA = [1,2,3,3,2,1,4,1,2,False,False,False]
annB = [1,2,3,3,2,2,4,1,2,5,False,3] 
annC = [False,3,3,3,2,3,4,2,2,5,1,False]
annD = [1,2,3,3,2,4,4,1,2,5,1,False]

In [10]:
# reliability data matrix

data = [annA, annB, annC, annD]
units = ["text1", "text2", "text3", "text4", "text5", "text6", "text7", "text8", "text9", "text10", "text11", "text12"]
annotators = ["annA", "annB", "annC", "annD"]
dfrel = pd.DataFrame(data, columns=units, index=annotators)
dfrel

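|      | text1 | text2 | text3 | text4 | text5 | text6 | text7 | text8 | text9 | text10 | text11 | text12 |
|:-----|:------|------:|------:|------:|------:|------:|------:|------:|------:|:-------|:-------|:-------|
| annA | 1     |     2 |     3 |     3 |     2 |     1 |     4 |     1 |     2 | False  | False  | False  |
| annB | 1     |     2 |     3 |     3 |     2 |     2 |     4 |     1 |     2 | 5      | False  | 3      |
| annC | False |     3 |     3 |     3 |     2 |     3 |     4 |     2 |     2 | 5      | 1      | False  |
| annD | 1     |     2 |     3 |     3 |     2 |     4 |     4 |     1 |     2 | 5      | 1      | False  |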


In [11]:
# values by units matrix
# how many times each value was assigned per unit

values = sorted(set([v for value in dfrel.values for v in value if v != False]))
print(f"values: {values}")

data = [[list(dfrel[col].values).count(value) for value in values] for col in dfrel.columns]
dfvu = pd.DataFrame(data)
dfvu = dfvu.T
dfvu.columns = units
dfvu.index = values
dfvu

values: [1, 2, 3, 4, 5]


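|    | text1 | text2 | text3 | text4 | text5 | text6 | text7 | text8 | text9 | text10 | text11 | text12 |
|:---|------:|------:|------:|------:|------:|------:|------:|------:|------:|-------:|-------:|-------:|
| 1  |     3 |     0 |     0 |     0 |     0 |     1 |     0 |     3 |     0 |      0 |      2 |      0 |
| 2  |     0 |     3 |     0 |     0 |     4 |     1 |     0 |     1 |     4 |      0 |      0 |      0 |
| 3  |     0 |     1 |     4 |     4 |     0 |     1 |     0 |     0 |     0 |      0 |      0 |      1 |
| 4  |     0 |     0 |     0 |     0 |     0 |     1 |     4 |     0 |     0 |      0 |      0 |      0 |
| 5  |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |      3 |      0 |      0 |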


In [12]:
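# drop units that cannot be paired, i.e. units with at most one annotation (e.g. text12)
not_paired_units = []
for col in dfvu.columns:
    if dfvu[col].sum() <= 1:
        not_paired_units.append(col)

# drop the collected units, not just the last loop variable
dfvu = dfvu.drop(not_paired_units, axis=1)
dfvu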

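|    | text1 | text2 | text3 | text4 | text5 | text6 | text7 | text8 | text9 | text10 | text11 |
|:---|------:|------:|------:|------:|------:|------:|------:|------:|------:|-------:|-------:|
| 1  |     3 |     0 |     0 |     0 |     0 |     1 |     0 |     3 |     0 |      0 |      2 |
| 2  |     0 |     3 |     0 |     0 |     4 |     1 |     0 |     1 |     4 |      0 |      0 |
| 3  |     0 |     1 |     4 |     4 |     0 |     1 |     0 |     0 |     0 |      0 |      0 |
| 4  |     0 |     0 |     0 |     0 |     0 |     1 |     4 |     0 |     0 |      0 |      0 |
| 5  |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |      3 |      0 |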


In [13]:
dfvu["N"] = dfvu.sum(axis=1) # total annotations per value
dfvu.loc[len(dfvu)+1] = dfvu.sum(axis=0) # total annotations per unit

dfvu.rename({len(dfvu):"total_unit"}, inplace=True)
dfvu

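|            | text1 | text2 | text3 | text4 | text5 | text6 | text7 | text8 | text9 | text10 | text11 |  N |
|:-----------|------:|------:|------:|------:|------:|------:|------:|------:|------:|-------:|-------:|---:|
| 1          |     3 |     0 |     0 |     0 |     0 |     1 |     0 |     3 |     0 |      0 |      2 |  9 |
| 2          |     0 |     3 |     0 |     0 |     4 |     1 |     0 |     1 |     4 |      0 |      0 | 13 |
| 3          |     0 |     1 |     4 |     4 |     0 |     1 |     0 |     0 |     0 |      0 |      0 | 10 |
| 4          |     0 |     0 |     0 |     0 |     0 |     1 |     4 |     0 |     0 |      0 |      0 |  5 |
| 5          |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |     0 |      3 |      0 |  3 |
| total_unit |     3 |     4 |     4 |     4 |     4 |     4 |     4 |     4 |     4 |      3 |      2 | 40 |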
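The values-by-units counts and their totals can also be built without pandas. The sketch below is one possible way to do it with `collections.Counter`; it assumes missing annotations are marked with `None` (the walkthrough uses `False`), and the helper name `value_counts_per_unit` is hypothetical.

```python
from collections import Counter

MISSING = None  # assumption: missing annotations are marked with None

annA = [1, 2, 3, 3, 2, 1, 4, 1, 2, MISSING, MISSING, MISSING]
annB = [1, 2, 3, 3, 2, 2, 4, 1, 2, 5, MISSING, 3]
annC = [MISSING, 3, 3, 3, 2, 3, 4, 2, 2, 5, 1, MISSING]
annD = [1, 2, 3, 3, 2, 4, 4, 1, 2, 5, 1, MISSING]


def value_counts_per_unit(annotations):
    """Count, per unit, how often each value was assigned; skip unpairable units."""
    counts = {}
    for idx, unit in enumerate(zip(*annotations), start=1):
        assigned = [v for v in unit if v is not MISSING]
        if len(assigned) >= 2:  # a single annotation cannot form a pair
            counts[f"text{idx}"] = Counter(assigned)
    return counts


unit_counts = value_counts_per_unit([annA, annB, annC, annD])
value_totals = sum(unit_counts.values(), Counter())  # corresponds to the "N" column
n_total = sum(value_totals.values())                 # corresponds to the bottom-right cell

print(sorted(value_totals.items()))  # [(1, 9), (2, 13), (3, 10), (4, 5), (5, 3)]
print(n_total)                       # 40
```

Here `text12` is left out because only one annotator labeled it, which matches the column dropped above.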


In [14]:
# nominal metric differences
data = []
for c in values:
    data.append([])
    for k in values:
        if c == k:
            data[-1].append(0)
        else:
            data[-1].append(1)
        
diffdf = pd.DataFrame(data, columns=values, index=values)
diffdf

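|    | 1 | 2 | 3 | 4 | 5 |
|:---|--:|--:|--:|--:|--:|
| 1  | 0 | 1 | 1 | 1 | 1 |
| 2  | 1 | 0 | 1 | 1 | 1 |
| 3  | 1 | 1 | 0 | 1 | 1 |
| 4  | 1 | 1 | 1 | 0 | 1 |
| 5  | 1 | 1 | 1 | 1 | 0 |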


In [35]:
# ordinal metric differences
data = []
for c in values:
    data.append([])
    for k in values:
        if c == k:
            data[-1].append(0)
        else:
            lo, hi = min(c, k), max(c, k) # the metric is symmetric, so sum from the smaller to the larger value
            n_g = sum([dfvu.loc[i, "N"] for i in range(lo, hi+1)])
            data[-1].append((n_g - ((dfvu.loc[c, "N"] + dfvu.loc[k, "N"])/2))**2)
        
diffdf = pd.DataFrame(data, columns=values, index=values)
diffdf

|    |       1 |      2 |      3 |      4 |       5 |
|:---|--------:|-------:|-------:|-------:|--------:|
| 1  |    0.0  | 121.0  | 506.25 | 900.0  | 1156.0  |
| 2  |  121.0  |   0.0  | 132.25 | 361.0  |  529.0  |
| 3  |  506.25 | 132.25 |   0.0  |  56.25 |  132.25 |
| 4  |  900.0  | 361.0  |  56.25 |   0.0  |   16.0  |
| 5  | 1156.0  | 529.0  | 132.25 |  16.0  |    0.0  |
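The ordinal metric for a value pair sums the totals of every value ranked between the two (inclusive), subtracts the mean of the two end totals, and squares the result. A minimal sketch of that rule with prefix sums follows; the helper name `ordinal_metric` is hypothetical, and the totals are the "N" column of the values-by-units matrix from this walkthrough.

```python
from itertools import accumulate


def ordinal_metric(totals):
    """Return the matrix of ordinal differences for value totals listed in rank order."""
    prefix = [0] + list(accumulate(totals))  # prefix[i] == sum(totals[:i])
    size = len(totals)
    matrix = [[0.0] * size for _ in range(size)]
    for c in range(size):
        for k in range(c + 1, size):
            n_g = prefix[k + 1] - prefix[c]  # totals of every value from rank c to rank k
            delta = (n_g - (totals[c] + totals[k]) / 2) ** 2
            matrix[c][k] = matrix[k][c] = delta  # symmetric by construction
    return matrix


delta = ordinal_metric([9, 13, 10, 5, 3])  # totals for the values 1..5
print(delta[0][2], delta[2][0])  # 506.25 506.25
```

Filling both `matrix[c][k]` and `matrix[k][c]` in one step keeps the matrix symmetric regardless of the order in which a pair is looked up.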


In [23]:
# interval metric differences (doesn't work with strings)
data = []
for c in values:
    data.append([])
    for k in values:
        if c == k:
            data[-1].append(0)
        else:
            data[-1].append((c-k)**2) # cannot subtract strings!
        
diffdf = pd.DataFrame(data, columns=values, index=values)
diffdf

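|    |  1 | 2 | 3 | 4 |  5 |
|:---|---:|--:|--:|--:|---:|
| 1  |  0 | 1 | 4 | 9 | 16 |
| 2  |  1 | 0 | 1 | 4 |  9 |
| 3  |  4 | 1 | 0 | 1 |  4 |
| 4  |  9 | 4 | 1 | 0 |  1 |
| 5  | 16 | 9 | 4 | 1 |  0 |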


In [24]:
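def difference(c,k,measure, dfvu):
    """Metric difference between the values c and k for the given level of measurement."""
    if measure == 'nominal':
        return 0 if c == k else 1

    if measure == 'ordinal':
        lo, hi = min(c, k), max(c, k) # symmetric: sum the totals of all values between c and k
        n_g = sum([dfvu.loc[i, "N"] for i in range(lo, hi+1)])
        return (n_g - ((dfvu.loc[c, "N"] + dfvu.loc[k, "N"])/2))**2

    if measure == 'interval':
        return (c-k)**2 # numeric values only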

In [25]:
measure = 'nominal'

In [37]:
D_o = 0
for unit in dfvu.columns.values[:-1]:
    n_unit_dot = dfvu.loc["total_unit", unit]
    sum_ck = 0
    for c in range(1, dfvu.shape[0]):
        for k in range(c+1, dfvu.shape[0]):
            n_uc = dfvu.loc[c,unit]
            n_uk = dfvu.loc[k,unit]
            delta_ck = difference(c,k,measure,dfvu)
            sum_ck += n_uc * n_uk * delta_ck
    D_o += (1/(n_unit_dot - 1)) * sum_ck
D_o  # unnormalized sum over all units and all value pairs with c < k

np.float64(4.0)

In [32]:
D_e = 0
for c in range(1, dfvu.shape[0]):
    for k in range(c+1, dfvu.shape[0]):
        n_c = dfvu.loc[c, "N"]
        n_k = dfvu.loc[k, "N"]
        D_e += n_c * n_k * difference(c,k,measure,dfvu)
D_e

np.int64(608)

In [33]:
n_dotdot = dfvu.loc["total_unit", "N"]
n_dotdot

np.int64(40)

In [34]:
alpha = 1 - ((n_dotdot-1)*(D_o/D_e))
alpha

np.float64(0.743421052631579)
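The factor $(n_{\cdot\cdot} - 1)$ in the last cell is needed because the variables `D_o` and `D_e` computed above are unnormalized sums, not the disagreements themselves. As a sketch, assuming the usual coincidence-based definitions and writing $S_o$ and $S_e$ (shorthand introduced here) for the code variables `D_o` and `D_e`, with $\delta_{ck}$ the metric difference returned by `difference`:

$$
D_o = \frac{2}{n_{\cdot\cdot}} \underbrace{\sum_u \frac{1}{n_{u\cdot} - 1} \sum_{c<k} n_{uc}\, n_{uk}\, \delta_{ck}}_{S_o},
\qquad
D_e = \frac{2}{n_{\cdot\cdot}\,(n_{\cdot\cdot} - 1)} \underbrace{\sum_{c<k} n_{c}\, n_{k}\, \delta_{ck}}_{S_e}
$$

The common factor $2 / n_{\cdot\cdot}$ cancels in the ratio, which leaves:

$$
\alpha = 1 - \frac{D_o}{D_e} = 1 - (n_{\cdot\cdot} - 1)\,\frac{S_o}{S_e} = 1 - 39 \cdot \frac{4}{608} \approx 0.7434
$$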

## All in one go

In [38]:
def difference(c,k,measure, dfvu):
    if measure == 'nominal':
        return 0 if c == k else 1

    if measure == 'ordinal':
        c = dfvu.index.get_loc(c)
        k = dfvu.index.get_loc(k)
        n_g = sum([dfvu.loc[dfvu.index.values[i], "N"] for i in range(c, k+1)])
        return (n_g - ((dfvu.loc[dfvu.index.values[c], "N"] + dfvu.loc[dfvu.index.values[k], "N"])/2))**2

    if measure == 'interval': # numeric values only (strings cannot be subtracted)
        return (c-k)**2
        
def compute_alpha(annotations, measure, value_domain=None):
    # reliability data matrix
    units = [f"text{i+1}" for i in range(len(annotations[0]))]
    annotators = [f"ann{string.ascii_uppercase[i]}" for i in range(len(annotations))]
    dfrel = pd.DataFrame(annotations, columns=units, index=annotators)

    # values by units matrix
    values = set([v for row in dfrel.values for v in row if pd.notna(v)])
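    if measure == 'nominal':
        values = sorted(values)
    elif value_domain is None:
        raise ValueError("ordinal and interval measures require an ordered value_domain")
    else:
        values = value_domain # ordinal and interval require a predefined order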
        
    data = [[list(dfrel[col].values).count(value) for value in values] for col in dfrel.columns]
    dfvu = pd.DataFrame(data)
    dfvu = dfvu.T
    dfvu.columns = units
    dfvu.index = values
    
    # drop units that cannot be paired, i.e. units with at most one annotation (e.g. text12)
    not_paired_units = []
    for col in dfvu.columns:
        if dfvu[col].sum() <= 1:
            not_paired_units.append(col)
    dfvu = dfvu.drop(not_paired_units, axis=1)
    
    # total annotations
    dfvu["N"] = dfvu.sum(axis=1) # total annotations per value
    # total annotations per unit; a named row cannot collide with a numeric value label
    dfvu.loc["total_unit"] = dfvu.sum(axis=0)
        
    D_o = 0
    n_dotdot = 0
    for unit in dfvu.columns.values[:-1]:
        n_unit_dot = dfvu.loc["total_unit", unit]

        n_dotdot += n_unit_dot
        
        sum_ck = 0
        for i in range(dfvu.shape[0]-1):
            c = dfvu.index[i]
            for j in range(i+1, dfvu.shape[0]-1):
                k = dfvu.index[j]
                n_uc = dfvu.loc[c,unit]
                n_uk = dfvu.loc[k,unit]
                delta_ck = difference(c,k,measure,dfvu)
                sum_ck += n_uc * n_uk * delta_ck
        D_o += (1/(n_unit_dot - 1)) * sum_ck
    
    D_e = 0
    for i in range(dfvu.shape[0]-1): # skip the total_unit row
        c = dfvu.index[i]
        for j in range(i+1, dfvu.shape[0]-1):
            k = dfvu.index[j]
            
            n_c = dfvu.loc[c, "N"]
            n_k = dfvu.loc[k, "N"]
            D_e += n_c * n_k * difference(c,k,measure,dfvu)
    
    alpha = 1 - ((n_dotdot-1)*(D_o/D_e))
    return alpha, dfrel, dfvu, D_o, D_e

In [50]:
annA = [1,2,3,3,2,1,4,1,2,np.nan,np.nan,np.nan]
annB = [1,2,3,3,2,2,4,1,2,5,np.nan,3] 
annC = [np.nan,3,3,3,2,3,4,2,2,5,1,np.nan]
annD = [1,2,3,3,2,4,4,1,2,5,1,np.nan]
data = [annA, annB, annC, annD]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'nominal')
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement="nominal"))
print()
#print(dfrel.to_markdown())
print()
#print(dfvu.to_markdown())

alpha 0.743421052631579
D_o 4.0
D_e 608
krippendorff module 0.743421052631579




In [51]:
annA = [1,1,1,1,1,1,1,2,2,2,2]
annB = [1,1,1,1,1,1,2,2,2,2,2]
data = [annA, annB]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'nominal')
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement="nominal"))
print()
#print(dfrel.to_markdown())
print()
#print(dfvu.to_markdown())

alpha 0.8205128205128205
D_o 1.0
D_e 117
krippendorff module 0.8205128205128205




In [42]:
data = [
    [np.nan, np.nan, np.nan, np.nan, np.nan, 3, 4, 1, 2, 1, 1, 3, 3, np.nan, 3],
    [1, np.nan, 2, 1, 3, 3, 4, 3, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    [np.nan, np.nan, 2, 1, 3, 4, 4, np.nan, 2, 1, 1, 3, 3, np.nan, 4]
]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'nominal')
print("alpha", alpha)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='nominal'))
print()

alpha 0.691358024691358
krippendorff module 0.691358024691358



In [43]:
data = [
    [np.nan, np.nan, np.nan, np.nan, np.nan, 3, 4, 1, 2, 1, 1, 3, 3, np.nan, 3],
    [1, np.nan, 2, 1, 3, 3, 4, 3, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    [np.nan, np.nan, 2, 1, 3, 4, 4, np.nan, 2, 1, 1, 3, 3, np.nan, 4]
]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'ordinal', value_domain=[1,2,3,4])
print("alpha", alpha)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='ordinal', value_domain=[1,2,3,4]))
print()

alpha 0.8067214199413153
krippendorff module 0.8067214199413153



In [44]:
annA = ["low", "low", "high"]
annB = ["high", "low", "high"]
data = [annA, annB]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'nominal')
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='nominal'))
print()
print(dfrel.to_markdown())
print()
print(dfvu.to_markdown())

alpha 0.4444444444444444
D_o 1.0
D_e 9
krippendorff module 0.4444444444444444

|      | text1   | text2   | text3   |
|:-----|:--------|:--------|:--------|
| annA | low     | low     | high    |
| annB | high    | low     | high    |

|            |   text1 |   text2 |   text3 |   N |
|:-----------|--------:|--------:|--------:|----:|
| high       |       1 |       0 |       2 |   3 |
| low        |       1 |       2 |       0 |   3 |
| total_unit |       2 |       2 |       2 |   6 |


In [45]:
annA = ["low", "low", "high"]
annB = ["high", "low", "high"]
data = [annA, annB]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'ordinal', value_domain=['low', 'high'])
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='ordinal', value_domain=['low', 'high']))
print()
print(dfrel.to_markdown())
print()
print(dfvu.to_markdown())

alpha 0.4444444444444444
D_o 9.0
D_e 81.0
krippendorff module 0.4444444444444444

|      | text1   | text2   | text3   |
|:-----|:--------|:--------|:--------|
| annA | low     | low     | high    |
| annB | high    | low     | high    |

|            |   text1 |   text2 |   text3 |   N |
|:-----------|--------:|--------:|--------:|----:|
| low        |       1 |       2 |       0 |   3 |
| high       |       1 |       0 |       2 |   3 |
| total_unit |       2 |       2 |       2 |   6 |


In [52]:
annA = ["low", "low", "low", "low", "high"]
annB = ["high", "low", "high", "low", "high"]
data = [annA, annB]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'ordinal', value_domain=['low', 'high'])
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='ordinal', value_domain=['low', 'high']))
print()
print(dfrel.to_markdown())
print()
print(dfvu.to_markdown())

alpha 0.25
D_o 50.0
D_e 600.0
krippendorff module 0.2499999999999999

|      | text1   | text2   | text3   | text4   | text5   |
|:-----|:--------|:--------|:--------|:--------|:--------|
| annA | low     | low     | low     | low     | high    |
| annB | high    | low     | high    | low     | high    |

|            |   text1 |   text2 |   text3 |   text4 |   text5 |   N |
|:-----------|--------:|--------:|--------:|--------:|--------:|----:|
| low        |       1 |       2 |       1 |       2 |       0 |   6 |
| high       |       1 |       0 |       1 |       0 |       2 |   4 |
| total_unit |       2 |       2 |       2 |       2 |       2 |  10 |


In [47]:
annA = [1,1,1,2,2,4,4]
annB = [1,1,3,3,4,4,4]
data = [annA, annB]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'interval', value_domain=[1,2,3,4])
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='interval', value_domain=[1,2,3,4]))
print()
print(dfrel.to_markdown())
print()
print(dfvu.to_markdown())

alpha 0.6443768996960486
D_o 9.0
D_e 329
krippendorff module 0.6443768996960486

|      |   text1 |   text2 |   text3 |   text4 |   text5 |   text6 |   text7 |
|:-----|--------:|--------:|--------:|--------:|--------:|--------:|--------:|
| annA |       1 |       1 |       1 |       2 |       2 |       4 |       4 |
| annB |       1 |       1 |       3 |       3 |       4 |       4 |       4 |

|            |   text1 |   text2 |   text3 |   text4 |   text5 |   text6 |   text7 |   N |
|:-----------|--------:|--------:|--------:|--------:|--------:|--------:|--------:|----:|
| 1          |       2 |       2 |       1 |       0 |       0 |       0 |       0 |   5 |
| 2          |       0 |       0 |       0 |       1 |       1 |       0 |       0 |   2 |
| 3          |       0 |       0 |       1 |       1 |       0 |       0 |       0 |   2 |
| 4          |       0 |       0 |       0 |       0 |       1 |       2 |       2 |   5 |
| total_unit |       2 |       2 |       2 |       2 |       2 |       2 |       2 |  14 |

In [48]:
annA = [1,1,1,1,1]
annB = [1,1,2,2,2]
data = [annA, annB]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'interval', value_domain=[1,2,3,4])
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='interval', value_domain=[1,2,3,4]))
print()
print(dfrel.to_markdown())
print()
print(dfvu.to_markdown())

alpha -0.2857142857142856
D_o 3.0
D_e 21
krippendorff module -0.2857142857142856

|      |   text1 |   text2 |   text3 |   text4 |   text5 |
|:-----|--------:|--------:|--------:|--------:|--------:|
| annA |       1 |       1 |       1 |       1 |       1 |
| annB |       1 |       1 |       2 |       2 |       2 |

|            |   text1 |   text2 |   text3 |   text4 |   text5 |   N |
|:-----------|--------:|--------:|--------:|--------:|--------:|----:|
| 1          |       2 |       2 |       1 |       1 |       1 |   7 |
| 2          |       0 |       0 |       1 |       1 |       1 |   3 |
| 3          |       0 |       0 |       0 |       0 |       0 |   0 |
| 4          |       0 |       0 |       0 |       0 |       0 |   0 |
| total_unit |       2 |       2 |       2 |       2 |       2 |  10 |


In [49]:
annA = [1,1,1,1,1]
annB = [1,1,1,2,2]
data = [annA, annB]

alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'interval', value_domain=[1,2])
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='interval', value_domain=[1,2]))
print()
print(dfrel.to_markdown())
print()
print(dfvu.to_markdown())

alpha -0.125
D_o 2.0
D_e 16
krippendorff module -0.125

|      |   text1 |   text2 |   text3 |   text4 |   text5 |
|:-----|--------:|--------:|--------:|--------:|--------:|
| annA |       1 |       1 |       1 |       1 |       1 |
| annB |       1 |       1 |       1 |       2 |       2 |

|            |   text1 |   text2 |   text3 |   text4 |   text5 |   N |
|:-----------|--------:|--------:|--------:|--------:|--------:|----:|
| 1          |       2 |       2 |       2 |       1 |       1 |   8 |
| 2          |       0 |       0 |       0 |       1 |       1 |   2 |
| total_unit |       2 |       2 |       2 |       2 |       2 |  10 |


In [168]:
# Alpha is high when Do (observed disagreement) is low and De (expected disagreement) is high.
# In De, value pairs with a span of 2 or 3 (such as 1/3 or 1/4) are weighted
# highest (penalized most). In our data, however, 3 is missing entirely and
# 4 occurs only twice, so the large, highly weighted spans are rare,
# which leads to a rather low denominator (De).
# On the other hand, the observed disagreement is large, because every unit
# in the matrix is a disagreement.

# Alpha increases if the annotation data contain more agreements (case 1) or
# more values that produce large spans between possible value pairs (case 2).
annA = [1,1,2,2]
annB = [2,2,4,4]
data = [annA, annB]
alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'interval', value_domain=[1,2,3,4])
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='interval', value_domain=[1,2,3,4]))
print()
print(dfrel.to_markdown())
print()
print(dfvu.to_markdown())

alpha 0.07894736842105265
D_o 10.0
D_e 76
krippendorff module 0.07894736842105254

|      |   text1 |   text2 |   text3 |   text4 |
|:-----|--------:|--------:|--------:|--------:|
| annA |       1 |       1 |       2 |       2 |
| annB |       2 |       2 |       4 |       4 |

|            |   text1 |   text2 |   text3 |   text4 |   N |
|:-----------|--------:|--------:|--------:|--------:|----:|
| 1          |       1 |       1 |       0 |       0 |   2 |
| 2          |       1 |       1 |       1 |       1 |   4 |
| 3          |       0 |       0 |       0 |       0 |   0 |
| 4          |       0 |       0 |       1 |       1 |   2 |
| total_unit |       2 |       2 |       2 |       2 |   8 |


In [140]:
# case 1 (more agreements): 
annA = [1,1,2,2]
annB = [1,2,4,4]
data = [annA, annB]
alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'interval', value_domain=[1,2,3,4])
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='interval', value_domain=[1, 2, 3, 4]))
#print()
#print(dfrel.to_markdown())
#print()
#print(dfvu.to_markdown())


alpha 0.27586206896551724
D_o 9.0
D_e 87
krippendorff module 0.27586206896551724


In [143]:
# case 2 (large-span value pairs for De):
annA = [1,2,3,4]
annB = [2,3,4,5]
data = [annA, annB]
alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'ordinal', value_domain=[1,2,3,4, 5])
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='ordinal', value_domain=[1, 2, 3, 4, 5]))
print()
print(dfrel.to_markdown())
print()
print(dfvu.to_markdown())

alpha 0.7299382716049383
D_o 12.5
D_e 324.0
krippendorff module 0.7299382716049383

|      |   text1 |   text2 |   text3 |   text4 |
|:-----|--------:|--------:|--------:|--------:|
| annA |       1 |       2 |       3 |       4 |
| annB |       2 |       3 |       4 |       5 |

|            |   text1 |   text2 |   text3 |   text4 |   N |
|:-----------|--------:|--------:|--------:|--------:|----:|
| 1          |       1 |       0 |       0 |       0 |   1 |
| 2          |       1 |       1 |       0 |       0 |   2 |
| 3          |       0 |       1 |       1 |       0 |   2 |
| 4          |       0 |       0 |       1 |       1 |   2 |
| 5          |       0 |       0 |       0 |       1 |   1 |
| total_unit |       2 |       2 |       2 |       2 |   8 |


In [167]:
# Related edge case: ties (each annotator assigns the same value to many items).
# As above, there is no agreement, so the numerator Do is large.
# At the same time, De is comparatively small, because few distinct values are present.

annA = [4,4,4,4]
annB = [1,1,1,1]
data = [annA, annB]
alpha, dfrel, dfvu, D_o, D_e = compute_alpha(data, 'ordinal', value_domain=[1,2, 3, 4])
print("alpha", alpha)
print("D_o", D_o)
print("D_e", D_e)
print('krippendorff module', krippendorff.alpha(reliability_data=data, level_of_measurement='ordinal', value_domain=[1, 2, 3, 4]))
print()
print(dfrel.to_markdown())
print()
print(dfvu.to_markdown())

alpha -0.75
D_o 64.0
D_e 256.0
krippendorff module -0.75

|      |   text1 |   text2 |   text3 |   text4 |
|:-----|--------:|--------:|--------:|--------:|
| annA |       4 |       4 |       4 |       4 |
| annB |       1 |       1 |       1 |       1 |

|            |   text1 |   text2 |   text3 |   text4 |   N |
|:-----------|--------:|--------:|--------:|--------:|----:|
| 1          |       1 |       1 |       1 |       1 |   4 |
| 2          |       0 |       0 |       0 |       0 |   0 |
| 3          |       0 |       0 |       0 |       0 |   0 |
| 4          |       1 |       1 |       1 |       1 |   4 |
| total_unit |       2 |       2 |       2 |       2 |   8 |


In [None]:
# To conclude: Krippendorff's alpha (ordinal) does not generally cope
# poorly with shifted annotations, but it does cope poorly when the
# (shifted) annotations contain ties.
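The conclusion can be reproduced without pandas. The following is a compact sketch of ordinal alpha, assuming complete data (no missing annotations) and an explicitly ordered value domain; the function name `ordinal_alpha` is hypothetical, and the code mirrors the unnormalized sums used in `compute_alpha` rather than any library's implementation.

```python
from collections import Counter
from itertools import combinations


def ordinal_alpha(ratings, domain):
    """Ordinal alpha for complete data; `domain` lists the values in rank order."""
    units = list(zip(*ratings))                        # one tuple of annotations per unit
    totals = Counter(v for unit in units for v in unit)
    n = sum(totals.values())
    rank = {value: i for i, value in enumerate(domain)}

    def delta(c, k):
        lo, hi = sorted((rank[c], rank[k]))
        n_g = sum(totals[domain[g]] for g in range(lo, hi + 1))
        return (n_g - (totals[c] + totals[k]) / 2) ** 2

    s_o = 0.0                                          # unnormalized observed disagreement
    for unit in units:
        counts = Counter(unit)
        pairs = sum(counts[c] * counts[k] * delta(c, k)
                    for c, k in combinations(counts, 2))
        s_o += pairs / (len(unit) - 1)
    s_e = sum(totals[c] * totals[k] * delta(c, k)      # unnormalized expected disagreement
              for c, k in combinations(totals, 2))
    return 1 - (n - 1) * s_o / s_e


shifted = ordinal_alpha([[1, 2, 3, 4], [2, 3, 4, 5]], domain=[1, 2, 3, 4, 5])
tied = ordinal_alpha([[4, 4, 4, 4], [1, 1, 1, 1]], domain=[1, 2, 3, 4])
print(round(shifted, 4))  # 0.7299, shifted annotations still score well
print(tied)               # -0.75, shifted ties score worse than chance
```

Both results match the values reported in the cells above for the same inputs.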