# Calculate Krippendorff's alpha for Inter-Rater Reliability

#### All imports go here

In [1]:
import krippendorff
import pandas as pd
import numpy as np
from functools import reduce
from krippendorff import alpha
from agreement.metrics import cohens_kappa, krippendorffs_alpha

### Data collection 
- We designed an experiment to understand how humans perceive non-sensationalised headlines compared to sensationalised headlines. The purpose of this survey study was to identify differences in how the two kinds of headline are perceived.
- There were 3 participants in total, and the survey was fully anonymous.
- In this survey, participants were given 20 headlines and categorized each one by labeling it as "Sensationalised" or "Non-sensationalised". The responses were collected in a spreadsheet.
- The [[dataset collection spreadsheet]](https://docs.google.com/spreadsheets/d/1-Bch4UgbQ5lSDosEB_4kJHDn4oTSAheKhmqTmE28F3c/edit?usp=sharing) and the per-participant files [[dataset1]](https://github.com/sifat-e-noor/Experiement_in_Cognitive_Science/blob/main/P1_%20Inter-Rater%20Reliability%20-%20Participant%201.csv), [[dataset2]](https://github.com/sifat-e-noor/Experiement_in_Cognitive_Science/blob/main/P2_%20Inter-Rater%20Reliability%20-%20Participant%202.csv), [[dataset3]](https://github.com/sifat-e-noor/Experiement_in_Cognitive_Science/blob/main/P3_%20Inter-Rater%20Reliability%20-%20Participant%203.csv) are available at the linked locations.

In this notebook, our goal is to determine the inter-rater reliability of their categorizations using Krippendorff's alpha.

### Data preparation

In [2]:
# Fetch data from the given source
df1 = pd.read_csv('https://raw.githubusercontent.com/sifat-e-noor/Experiement_in_Cognitive_Science/main/P1_%20Inter-Rater%20Reliability%20-%20Participant%201.csv')
df2 = pd.read_csv('https://raw.githubusercontent.com/sifat-e-noor/Experiement_in_Cognitive_Science/main/P2_%20Inter-Rater%20Reliability%20-%20Participant%202.csv')
df3 = pd.read_csv('https://raw.githubusercontent.com/sifat-e-noor/Experiement_in_Cognitive_Science/main/P3_%20Inter-Rater%20Reliability%20-%20Participant%203.csv')

In [3]:
# Rename dataframe's columns
df1.rename(columns={'Categorize': 'Person1'}, inplace=True)
df2.rename(columns={'Categorize': 'Person2'}, inplace=True)
df3.rename(columns={'Categorize': 'Person3'}, inplace=True)

In [4]:
# Collect the dataframes in a list
dfs = [df1, df2, df3]

In [5]:
# Merge dataframes
final_df = reduce(
    lambda left, right: pd.merge(left, right, on=['Headline'], how='outer'),
    dfs,
)
final_df

Unnamed: 0,Headline,Person1,Person2,Person3
0,Psychopathic Tendencies Help Some People Succe...,Non-sensationalised,Sensationalised,Non-sensationalised
1,Do We Actually ‘Hear’ Silence?,Sensationalised,Non-sensationalised,Sensationalised
2,Forgotten Memories May Remain Intact in the Brain,Sensationalised,Sensationalised,Non-sensationalised
3,AI Anxiety’ Is on the Rise - Here’s How to Man...,Non-sensationalised,Sensationalised,Sensationalised
4,Why Do We Forget So Many of Our Dreams?,Non-sensationalised,Sensationalised,Non-sensationalised
5,Brain Waves Synchronize when People Interact,Non-sensationalised,Sensationalised,Non-sensationalised
6,The Heart Can Sway Our Perception of Time,Sensationalised,Non-sensationalised,Sensationalised
7,Why Kids Are Afraid to Ask for Help,Sensationalised,Sensationalised,Non-sensationalised
8,"If AI Becomes Conscious, Here’s How We Can Tell",Non-sensationalised,Sensationalised,Sensationalised
9,Brain-Reading Devices Allow Paralyzed People t...,Non-sensationalised,Non-sensationalised,Non-sensationalised


In [6]:
def replace_with_headlines(column):
    # Initialize a counter to keep track of the incremental number
    counter = 0

    # Define a function to replace the values
    def replace_value(value):
        nonlocal counter
        counter += 1
        return f'Headline{counter}'

    # Apply the replace_value function to each element in the column
    return column.apply(replace_value)

# Apply the function to the 'Headline' column
final_df['Headline'] = replace_with_headlines(final_df['Headline'])

# Display the DataFrame with updated 'Headline' column
final_df

Unnamed: 0,Headline,Person1,Person2,Person3
0,Headline1,Non-sensationalised,Sensationalised,Non-sensationalised
1,Headline2,Sensationalised,Non-sensationalised,Sensationalised
2,Headline3,Sensationalised,Sensationalised,Non-sensationalised
3,Headline4,Non-sensationalised,Sensationalised,Sensationalised
4,Headline5,Non-sensationalised,Sensationalised,Non-sensationalised
5,Headline6,Non-sensationalised,Sensationalised,Non-sensationalised
6,Headline7,Sensationalised,Non-sensationalised,Sensationalised
7,Headline8,Sensationalised,Sensationalised,Non-sensationalised
8,Headline9,Non-sensationalised,Sensationalised,Sensationalised
9,Headline10,Non-sensationalised,Non-sensationalised,Non-sensationalised


### Agreement Matrix
- Create an agreement matrix to represent the level of agreement between participants for each headline. For each headline, the matrix lists the participants whose label matches Person1's label under "Agreement" and the remaining participants under "Disagreement".

In [7]:
# Drop the 'Headline' column so that only the rater columns remain
temp_df = final_df.drop(['Headline'], axis=1)

In [8]:
# Create agreement matrix
agreement_matrix = pd.DataFrame(index=final_df.index, columns=["Headline", "Agreement", "Disagreement"])

agreement_matrix['Headline'] = replace_with_headlines(agreement_matrix['Headline'])

# Loop through each headline
for idx, row in temp_df.iterrows():
    # List to store the raters who agreed
    agreement_raters = []  
    # List to store the raters who disagreed
    disagreement_raters = []  
    for rater in temp_df.columns:
        if row[rater] == row.iloc[0]:  # Check if the rating matches the first rater's (Person1's) rating
            agreement_raters.append(rater)
        else:
            disagreement_raters.append(rater)
    agreement_matrix.loc[idx, "Agreement"] = ', '.join(agreement_raters)
    agreement_matrix.loc[idx, "Disagreement"] = ', '.join(disagreement_raters)

# Display the coding agreement matrix
print("Coding Agreement Matrix:")
agreement_matrix

Coding Agreement Matrix:


Unnamed: 0,Headline,Agreement,Disagreement
0,Headline1,"Person1, Person3",Person2
1,Headline2,"Person1, Person3",Person2
2,Headline3,"Person1, Person2",Person3
3,Headline4,Person1,"Person2, Person3"
4,Headline5,"Person1, Person3",Person2
5,Headline6,"Person1, Person3",Person2
6,Headline7,"Person1, Person3",Person2
7,Headline8,"Person1, Person2",Person3
8,Headline9,Person1,"Person2, Person3"
9,Headline10,"Person1, Person2, Person3",
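
Because the matrix above uses Person1 as the reference, Person1 always appears in the "Agreement" column. The following is a minimal sketch of an alternative that groups raters by the label they chose, so the majority is reported regardless of who rated first. The helper name `majority_split` is hypothetical, and the sample ratings are copied from three rows of the merged table shown earlier.

```python
from collections import defaultdict

# Ratings copied from three rows of the merged table shown earlier.
ratings = {
    "Headline1": {"Person1": "Non-sensationalised", "Person2": "Sensationalised", "Person3": "Non-sensationalised"},
    "Headline4": {"Person1": "Non-sensationalised", "Person2": "Sensationalised", "Person3": "Sensationalised"},
    "Headline10": {"Person1": "Non-sensationalised", "Person2": "Non-sensationalised", "Person3": "Non-sensationalised"},
}


def majority_split(row):
    """Hypothetical helper: return (majority_raters, other_raters) for one headline."""
    groups = defaultdict(list)
    for rater, label in row.items():
        groups[label].append(rater)
    # max() keeps the first label seen when two groups have the same size.
    majority_label = max(groups, key=lambda label: len(groups[label]))
    others = [rater for rater, label in row.items() if label != majority_label]
    return groups[majority_label], others


for headline, row in ratings.items():
    majority, others = majority_split(row)
    print(headline, "| majority:", ", ".join(majority), "| others:", ", ".join(others))
```

For Headline4 this reports Person2 and Person3 as the majority, whereas the Person1-referenced matrix lists them under "Disagreement".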


### Calculate Krippendorff's Alpha
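- To calculate Krippendorff's alpha, we compute both the observed agreement and the expected agreement for our data. The standard definition for nominal data is stated in terms of disagreement, $α = 1 - D_o / D_e$, where $D_o$ is the observed disagreement and $D_e$ is the disagreement expected by chance. The cells below use a simplified ratio of agreement proportions instead:

$$α = 1 - \frac{\text{Observed Agreement}}{\text{Expected Agreement}}$$

- This ratio is not equivalent to the standard definition: it increases as the observed agreement decreases.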

- And the formula for expected agreement, as implemented in the code below:

   $$ E = \frac{1}{N(N-1)} \sum_{i=1}^{K} n_i(n_i - 1) $$


where,
- $E$ is the expected agreement.
- $N$ is the total number of items (rows).
- $K$ is the total number of categories.
- $n_i$ is the number of items coded as category $i$
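
As an illustration only, suppose $N = 20$ items fall into $K = 2$ categories with $n_1 = 12$ and $n_2 = 8$ (hypothetical counts, not taken from the dataset). Summing $n_i(n_i - 1)$ over the categories and dividing by $N(N-1)$, as the code below does, gives:

$$ E = \frac{12 \cdot 11 + 8 \cdot 7}{20 \cdot 19} = \frac{188}{380} \approx 0.49 $$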




Steps to calculate Krippendorff's alpha:
- We first calculated the observed agreement, which is the proportion of items on which all participants agree.
- We then calculated the expected agreement using the formula for nominal data given above, based on the category counts in our dataset.
- Finally, we combined the two values to compute the alpha value.
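
For comparison, the standard disagreement-based definition, $α = 1 - D_o / D_e$, can be sketched with a coincidence matrix using only the standard library. This is a minimal sketch, not the notebook's method: the function name `krippendorff_alpha_nominal` is hypothetical, the labels are abbreviated to `"N"` and `"S"`, and the sample data are the ten rows displayed in the merged table above, so the printed value illustrates the procedure rather than the result for the full survey.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (hypothetical helper).

    `units` holds one inner sequence of labels per item; a missing rating
    is simply left out of the inner sequence.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with fewer than two ratings cannot be paired
        # Every ordered pair of ratings within an item counts 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Both terms are scaled by n, so their ratio equals D_o / D_e.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        return float("nan")  # undefined when only one category is ever used
    return 1 - observed / expected


# (Person1, Person2, Person3) labels for the ten rows displayed above;
# "N" = Non-sensationalised, "S" = Sensationalised.
rows = [
    ("N", "S", "N"), ("S", "N", "S"), ("S", "S", "N"), ("N", "S", "S"), ("N", "S", "N"),
    ("N", "S", "N"), ("S", "N", "S"), ("S", "S", "N"), ("N", "S", "S"), ("N", "N", "N"),
]

alpha_standard = krippendorff_alpha_nominal(rows)
print(round(alpha_standard, 2))  # -0.16 for these ten rows
```

A negative value here means the raters agreed less often than chance would predict for these rows.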

In [9]:
# Calculate the observed agreement
observed_agreement = (final_df['Person1'] == final_df['Person2']) & (final_df['Person2'] == final_df['Person3'])

# Calculate the proportion of agreement
proportion_agreement = observed_agreement.sum() / len(final_df)

print(f'Observed Agreement: {proportion_agreement:.2f}')

Observed Agreement: 0.10
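
Requiring all three participants to match is a strict criterion. As a sketch under the assumption that pairwise agreement is also of interest, the snippet below compares the unanimous proportion with the share of agreeing rater pairs. The function names are hypothetical, the labels are abbreviated to `"N"` and `"S"`, and the data are the ten rows displayed in the merged table above.

```python
from itertools import combinations

# (Person1, Person2, Person3) labels for the ten rows displayed above;
# "N" = Non-sensationalised, "S" = Sensationalised.
rows = [
    ("N", "S", "N"), ("S", "N", "S"), ("S", "S", "N"), ("N", "S", "S"), ("N", "S", "N"),
    ("N", "S", "N"), ("S", "N", "S"), ("S", "S", "N"), ("N", "S", "S"), ("N", "N", "N"),
]


def unanimous_agreement(rows):
    """Proportion of items on which every rater chose the same label."""
    return sum(len(set(labels)) == 1 for labels in rows) / len(rows)


def pairwise_agreement(rows):
    """Proportion of rater pairs, pooled over all items, that chose the same label."""
    agree = 0
    total = 0
    for labels in rows:
        for a, b in combinations(labels, 2):
            agree += (a == b)
            total += 1
    return agree / total


print(f"Unanimous agreement: {unanimous_agreement(rows):.2f}")
print(f"Pairwise agreement:  {pairwise_agreement(rows):.2f}")
```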


In [10]:
# Get the total number of items
N = len(final_df)

# Count the number of items in each category
categories = final_df['Person1'].unique()
category_counts = final_df['Person1'].value_counts()

# Calculate the expected agreement for nominal data
expected_agreement = sum((category_counts * (category_counts - 1)) / (N * (N - 1)))

print(f'Expected Agreement: {expected_agreement:.2f}')

Expected Agreement: 0.49


In [11]:
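# Compute 1 - (observed agreement / expected agreement).
# The result is stored as alpha_value so it does not shadow the `alpha` function imported from krippendorff.
alpha_value = 1 - (proportion_agreement / expected_agreement)

print(f"Krippendorff's Alpha: {alpha_value:.2f}")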

Krippendorff's Alpha: 0.80


### Interpretation
The resulting Krippendorff's alpha value will fall between -1 and 1:

   - A value close to 1 indicates high agreement among raters.
   - A negative value suggests systematic disagreement, that is, less agreement than would be expected by chance.
   - A value around 0 indicates no agreement beyond what would be expected by chance.
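
Krippendorff's commonly cited guideline treats α ≥ 0.80 as reliable and 0.667 ≤ α < 0.80 as sufficient only for tentative conclusions. A minimal sketch of that rule of thumb follows; the function name `interpret_alpha` and the returned wording are hypothetical.

```python
def interpret_alpha(alpha):
    """Map an alpha value to Krippendorff's conventional reliability bands."""
    if alpha >= 0.80:
        return "reliable"
    if alpha >= 0.667:
        return "tentative conclusions only"
    return "insufficient reliability"


for value in (0.85, 0.70, 0.10):
    print(value, "->", interpret_alpha(value))
```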
   

After performing the calculations, we obtain a value of approximately 0.80.
Under the conventional reading, an alpha of 0.80 indicates acceptable inter-rater reliability among the three participants in categorizing headlines, with some remaining disagreement. That reading should be treated with caution here: the observed agreement was only 0.10, meaning all three participants gave the same label for just 10% of the headlines.