The Rand Index (RI) measures the similarity between two clusterings of the same data as the fraction of sample pairs on which they agree. A pair counts as an agreement if both the true and the predicted clusterings put its two points in the same cluster, or both put them in different clusters. The Adjusted Rand Index (ARI) corrects RI for the agreement expected by chance.
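The definition can be checked directly with a brute-force sketch that visits every pair of samples. It assumes labels are given as equal-length Python sequences, and `rand_index` is a hypothetical helper name. It runs in O(n²) time, so it only suits small inputs, but it mirrors the definition term for term.

```python
from itertools import combinations


def rand_index(true_labels, predicted_labels):
    """Fraction of sample pairs on which two clusterings agree (brute force, O(n^2))."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    agree = total = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = predicted_labels[i] == predicted_labels[j]
        # Agreement: the pair is together in both clusterings, or apart in both.
        agree += same_true == same_pred
        total += 1
    # With fewer than two samples there are no pairs, so nothing can disagree.
    return agree / total if total else 1.0


print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 5 of 6 pairs agree -> 0.8333333333333334
```

Only the pair (2, 3) disagrees in this example: it is together in the true labels but split in the predicted ones.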


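$$ \text{ARI} = \frac{\text{RI} - \mathbb{E}[\text{RI}]}{\text{RI}_{\max} - \mathbb{E}[\text{RI}]} $$

Here $\mathbb{E}[\text{RI}]$ is the expected RI under random labeling with the observed cluster sizes held fixed, and $\text{RI}_{\max}$ is the maximum RI, taken as 1 in the standard adjustment.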
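The formula can be expanded into the contingency-table form given by Hubert and Arabie. Write $n_{ij}$ for the number of samples in true cluster $i$ and predicted cluster $j$, $a_i = \sum_j n_{ij}$ and $b_j = \sum_i n_{ij}$ for the row and column sums, and $n$ for the number of samples. Counting pairs together in both clusterings plus pairs apart in both shows that RI is an affine function of $\sum_{ij}\binom{n_{ij}}{2}$:

$$ \text{RI} = \frac{\binom{n}{2} + 2\sum_{ij}\binom{n_{ij}}{2} - \sum_i\binom{a_i}{2} - \sum_j\binom{b_j}{2}}{\binom{n}{2}} $$

The affine offset and scale cancel in the ratio. The ARI is therefore unchanged when RI is replaced by $\sum_{ij}\binom{n_{ij}}{2}$, its expectation by $\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2} / \binom{n}{2}$, and $\text{RI}_{\max} = 1$ by $\frac{1}{2}\left[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\right]$:

$$ \text{ARI} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\right] - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right] \Big/ \binom{n}{2}} $$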

ARI is at most 1 and, unlike RI, can be negative:
- ARI = 1 indicates that the two clusterings are identical up to a relabeling of the clusters.
- ARI ≈ 0 indicates agreement no better than expected from random labeling.
- ARI < 0 indicates that the clusterings agree less than expected by chance.
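The three regimes can be illustrated with a compact, self-contained sketch that uses the pair-count (contingency-table) form of ARI. The helper name `ari` is hypothetical, and labels are assumed to be hashable.

```python
from collections import Counter
from math import comb


def ari(true_labels, predicted_labels):
    """Adjusted Rand Index via pair counts from the contingency table."""
    n_pairs = comb(len(true_labels), 2)
    # Pairs together in both clusterings: sum of C(n_ij, 2) over contingency cells
    index = sum(comb(c, 2) for c in Counter(zip(true_labels, predicted_labels)).values())
    sum_a = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_b = sum(comb(c, 2) for c in Counter(predicted_labels).values())
    expected = sum_a * sum_b / n_pairs if n_pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both all singletons or both one cluster: avoid 0/0
        return 1.0
    return (index - expected) / (max_index - expected)


# Same partition with the cluster names swapped
print(ari([0, 0, 1, 1], [1, 1, 0, 0]))
# One big cluster vs. all singletons: agreement exactly at the chance level
print(ari([0, 0, 0, 0], [0, 1, 2, 3]))
# Every pair grouped together by one clustering is split by the other
print(round(ari([0, 0, 1, 1], [0, 1, 0, 1]), 6))
```

These print 1.0, 0.0 and -0.5 respectively. The last case shows that negative values arise when the clusterings systematically disagree.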

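1. Contingency Matrix:
    - counts, for each true cluster $i$ and predicted cluster $j$, the number of samples $n_{ij}$ assigned to both.
2. Rand Index Calculation:
    - uses pair counts derived from the matrix to compute the fraction of sample pairs on which the true and predicted clusterings agree.
3. Expected Rand Index:
    - computes the RI expected under random labeling that keeps the observed cluster sizes.
4. Adjusted Rand Index:
    - rescales RI so that chance-level agreement maps to 0 and perfect agreement maps to 1.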
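The four steps can be traced explicitly in a sketch that stays on the RI scale of the formula above, with $\text{RI}_{\max} = 1$. `ari_steps` is a hypothetical name. Labels are assumed to be hashable and sortable, and sorting is used only to order the table's rows and columns.

```python
from collections import Counter
from math import comb


def ari_steps(true_labels, predicted_labels):
    """Return (contingency table, RI, expected RI, ARI), following the four steps."""
    total = comb(len(true_labels), 2)
    if total == 0:
        raise ValueError("need at least two samples")

    # Step 1: contingency matrix n_ij (rows: true clusters, columns: predicted clusters)
    rows = sorted(set(true_labels))
    cols = sorted(set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    table = [[counts[(r, c)] for c in cols] for r in rows]

    # Pair counts from the cells and from the row/column sums (marginals)
    together_both = sum(comb(x, 2) for row in table for x in row)
    together_true = sum(comb(sum(row), 2) for row in table)
    together_pred = sum(comb(sum(col), 2) for col in zip(*table))

    # Step 2: RI = (pairs together in both + pairs apart in both) / all pairs
    ri = (total + 2 * together_both - together_true - together_pred) / total

    # Step 3: expected RI, replacing together_both by its expectation
    # under random labeling with fixed cluster sizes
    expected_both = together_true * together_pred / total
    expected_ri = (total + 2 * expected_both - together_true - together_pred) / total

    # Step 4: adjust with RI_max = 1 (guard the 0/0 case of trivial, identical partitions)
    if expected_ri == 1:
        return table, ri, expected_ri, 1.0
    return table, ri, expected_ri, (ri - expected_ri) / (1 - expected_ri)


table, ri, expected_ri, ari = ari_steps([0, 0, 1, 1], [0, 0, 1, 2])
print(table, ri, expected_ri, ari)
```

For this input the table is `[[2, 0, 0], [0, 1, 1]]`, RI = 5/6, expected RI = 11/18, and ARI = 4/7 ≈ 0.571. A chance-corrected score can therefore be much lower than the raw RI.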

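from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, predicted_labels):
    """Adjusted Rand Index computed from the contingency table of the two labelings."""
    n = len(true_labels)
    if n != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have the same length")

    # Contingency table: n_ij = number of samples in true cluster i and predicted cluster j
    contingency = Counter(zip(true_labels, predicted_labels))
    # Marginals: true cluster sizes (row sums) and predicted cluster sizes (column sums)
    true_sizes = Counter(true_labels)
    pred_sizes = Counter(predicted_labels)

    # Pairs together in both clusterings, and pairs together in each clustering
    index = sum(comb(n_ij, 2) for n_ij in contingency.values())
    sum_true = sum(comb(a, 2) for a in true_sizes.values())
    sum_pred = sum(comb(b, 2) for b in pred_sizes.values())
    total_pairs = comb(n, 2)

    if total_pairs == 0:
        return 1.0  # fewer than two samples: no pairs to disagree on

    # Expected and maximum values of `index` under the permutation model
    expected_index = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2

    if max_index == expected_index:
        return 1.0  # both labelings all singletons or both one cluster: avoid division by zero
    return (index - expected_index) / (max_index - expected_index)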

Adjusted Rand Index: 0.5471938775510204
