Compute Pointwise Mutual Information
Medium
NLP

Implement a function that computes the Pointwise Mutual Information (PMI) of two events from their joint occurrence count, their individual counts, and the total number of samples. PMI measures how far the observed joint probability departs from the value expected if the two events were independent.

Example:
Input:
compute_pmi(50, 200, 300, 1000)
Output:
-0.263
Reasoning:
The PMI calculation compares the actual joint probability (50/1000 = 0.05) to the product of the individual probabilities (200/1000 * 300/1000 = 0.06). Thus, PMI = log₂(0.05 / (0.2 * 0.3)) ≈ -0.263, indicating the events co-occur slightly less than expected by chance.
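
As a cross-check on the arithmetic above, the same value can be computed straight from the raw counts, because the sample size cancels: p(x, y) / (p(x) * p(y)) = c_xy * N / (c_x * c_y). The sketch below assumes all counts are positive; the helper name `pmi_from_counts` is hypothetical and not part of the exercise.

```python
import math

def pmi_from_counts(c_xy, c_x, c_y, n):
    """PMI in bits from raw counts (hypothetical helper, assumes positive counts)."""
    # (c_xy / n) / ((c_x / n) * (c_y / n)) simplifies to c_xy * n / (c_x * c_y)
    return math.log2((c_xy * n) / (c_x * c_y))

value = pmi_from_counts(50, 200, 300, 1000)
print(round(value, 3))  # -0.263
```

Here the ratio is 50000 / 60000 = 5/6, the same 0.05 / 0.06 used in the reasoning.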

In [None]:
import numpy as np

def compute_pmi(joint_counts, total_counts_x, total_counts_y, total_samples, round_to=3):
    """
    Compute Pointwise Mutual Information (PMI) between two events.

    Parameters
    ----------
    joint_counts : int
        Number of times x and y occurred together.
    total_counts_x : int
        Number of times x occurred.
    total_counts_y : int
        Number of times y occurred.
    total_samples : int
        Total number of observations.
    round_to : int, optional
        Decimal places to round the result (default=3).

    Returns
    -------
    float
        The PMI score.
    """
    # Probabilities
    p_xy = joint_counts / total_samples
    p_x = total_counts_x / total_samples
    p_y = total_counts_y / total_samples

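    # A zero joint count gives log2(0) = -inf. A zero marginal leaves PMI
    # undefined (0/0); -inf is returned in that case too, as a convention.
    if p_xy == 0 or p_x == 0 or p_y == 0:
        return float("-inf")

    pmi = np.log2(p_xy / (p_x * p_y))
    # Convert the NumPy scalar to a built-in float to match the documented return type.
    return round(float(pmi), round_to)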

# Example
print(compute_pmi(50, 200, 300, 1000))  # -0.263
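
To see how the sign of the score reads, a few more calls may help. The sketch below restates the calculation in compact form (without rounding) so that it runs on its own; the name `pmi` and the counts are made up for illustration, chosen as powers of two so the results are exact.

```python
import math

def pmi(c_xy, c_x, c_y, n):
    # Compact restatement of the PMI calculation, for illustration only.
    if c_xy == 0 or c_x == 0 or c_y == 0:
        return float("-inf")
    return math.log2((c_xy / n) / ((c_x / n) * (c_y / n)))

# Under independence the expected joint count is 256 * 128 / 1024 = 32.
print(pmi(64, 256, 128, 1024))  # 1.0  (twice the independent rate)
print(pmi(32, 256, 128, 1024))  # 0.0  (exactly the independent rate)
print(pmi(16, 256, 128, 1024))  # -1.0 (half the independent rate)
print(pmi(0, 256, 128, 1024))   # -inf (the events never co-occur)
```

Because the logarithm is base 2, each unit of PMI corresponds to a factor of two between the observed and the expected joint probability.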

In [1]:
a = [1,2,3,4,5,1]
a.count(1)

2