# Mutual Information

YT Video - https://www.youtube.com/watch?v=eJIp_mgVLwE&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&index=8

### Measuring Relationships in Mixed Data

Before diving into Mutual Information, let's understand the problem it solves. We often want to know which variables in our data are most related to the thing we want to predict.

* Feature selection is the process of identifying the most useful variables in a dataset to predict an outcome.
* R-squared measures how well one continuous variable predicts another, but it does not apply to categorical data such as Yes/No answers. Mutual Information, by contrast, can quantify relationships between categorical variables, continuous variables, or a mix of the two.

### Mutual Information
A numeric value that measures the dependence or relationship between two variables. It tells you how much information you gain about one variable by observing another.

* A mutual information score of 0 means the variables are completely independent. Knowing one tells you nothing about the other.
* A higher mutual information score means the variables are more dependent. Knowing one tells you a lot about the other.
* Crucially, it works with both discrete and continuous variables.

**Joint Probability** - the probability of two or more things happening at the same time. In the video's example, it's the probability that someone both **Likes Popcorn** AND **Loves Troll 2**.
* You calculate it directly from your data by counting how many times both conditions are met and dividing by the total number of data points.

 `p(A and B) = (Number of times A and B both happen) / (Total Observations)`


```python
# Raw data as lists
likes_popcorn = ['Yes', 'Yes', 'Yes', 'No', 'No']
loves_troll_2 = ['Yes', 'Yes', 'Yes', 'No', 'Yes']

# p(Likes_Popcorn=Yes AND Loves_Troll_2=Yes)
# len counts the number of elements in the list
total_observations = len(likes_popcorn)

# The generator yields 1 for every row where both answers are 'Yes'; sum adds those 1s up
both_happen = sum(1 for popcorn, troll in zip(likes_popcorn, loves_troll_2)
                  if popcorn == 'Yes' and troll == 'Yes')

# divide the number of times both conditions are true by the total number of observations
joint_prob = both_happen / total_observations
print(f"Joint Probability p(Likes Popcorn=Yes, Loves Troll 2=Yes): {joint_prob:.2f}")
```

```text
Joint Probability p(Likes Popcorn=Yes, Loves Troll 2=Yes): 0.60
```
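The same counting idea extends to every combination of answers at once. The sketch below uses the standard library's `collections.Counter` to tally all (Likes Popcorn, Loves Troll 2) pairs in one pass; the name `joint` is just an illustrative choice. Combinations that never occur get a count of 0, which matters later because the Mutual Information formula loops over all combinations.

```python
from collections import Counter

likes_popcorn = ['Yes', 'Yes', 'Yes', 'No', 'No']
loves_troll_2 = ['Yes', 'Yes', 'Yes', 'No', 'Yes']
total = len(likes_popcorn)

# Count every (Likes Popcorn, Loves Troll 2) pair in a single pass
pair_counts = Counter(zip(likes_popcorn, loves_troll_2))

# Turn counts into joint probabilities for all four combinations;
# a Counter returns 0 for pairs that never occur
joint = {
    (x, y): pair_counts[(x, y)] / total
    for x in ('Yes', 'No')
    for y in ('Yes', 'No')
}

for (x, y), p in joint.items():
    print(f"p(Likes Popcorn={x}, Loves Troll 2={y}) = {p:.2f}")
```

The four joint probabilities always add up to 1, since every observation lands in exactly one combination.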


**Marginal Probability** - the probability of a single event occurring, regardless of the outcomes of other variables. For example, what is the overall probability that someone **Likes Popcorn**?
* We calculate this by ignoring the other columns. We just count the total number of times our event of interest occurs and divide by the total number of observations. It's called "marginal" because, in a probability table, these totals appear in the margins.


```python
likes_popcorn = ['Yes', 'Yes', 'Yes', 'No', 'No']
total_observations = len(likes_popcorn)

# p(Likes_Popcorn=Yes): count the 'Yes' answers and divide by the number of observations
likes_popcorn_count = likes_popcorn.count('Yes')
marginal_prob = likes_popcorn_count / total_observations

print(f"Marginal Probability p(Likes_Popcorn=Yes): {marginal_prob:.2f}")
```

```text
Marginal Probability p(Likes_Popcorn=Yes): 0.60
```
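To see where the name "marginal" comes from, a marginal probability can also be obtained by adding up joint probabilities along one row or column of the table. This is a minimal sketch of that identity, p(x) = Σ_y p(x, y); the dictionary names `joint`, `p_popcorn`, and `p_troll` are illustrative only.

```python
from collections import Counter

likes_popcorn = ['Yes', 'Yes', 'Yes', 'No', 'No']
loves_troll_2 = ['Yes', 'Yes', 'Yes', 'No', 'Yes']
total = len(likes_popcorn)
values = ('Yes', 'No')

# Joint probabilities p(x, y) for every combination
pair_counts = Counter(zip(likes_popcorn, loves_troll_2))
joint = {(x, y): pair_counts[(x, y)] / total for x in values for y in values}

# Marginal p(x): add the joint probabilities across all values of y
p_popcorn = {x: sum(joint[(x, y)] for y in values) for x in values}

# Marginal p(y): add the joint probabilities across all values of x
p_troll = {y: sum(joint[(x, y)] for x in values) for y in values}

print("p(Likes Popcorn):", {k: round(v, 2) for k, v in p_popcorn.items()})
print("p(Loves Troll 2):", {k: round(v, 2) for k, v in p_troll.items()})
```

`p_popcorn['Yes']` matches the 0.60 obtained by counting directly, which is why these totals can be written in the margins of the joint probability table.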


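### The Formula
The equation for Mutual Information looks complex, but it is built entirely from the joint and marginal probabilities we just learned about. For every combination of values x and y, it multiplies the joint probability p(x, y) by the log of the ratio between p(x, y) and p(x)p(y), then adds up all of those terms:

$$\text{Mutual Information} = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$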

```python
import math

# Raw data from the video
likes_popcorn = ['Yes', 'Yes', 'Yes', 'No', 'No']
loves_troll_2 = ['Yes', 'Yes', 'Yes', 'No', 'Yes']
total = len(likes_popcorn)

mi = 0.0
# Iterate over every combination of values ('Yes'/'No' for each variable)
for x_val in set(likes_popcorn):      # distinct answers, e.g. {'Yes', 'No'}
    for y_val in set(loves_troll_2):  # distinct answers, e.g. {'Yes', 'No'}

        # Calculate marginal probabilities
        p_x = likes_popcorn.count(x_val) / total
        p_y = loves_troll_2.count(y_val) / total

        # Calculate joint probability
        p_xy = sum(1 for i in range(total) if likes_popcorn[i] == x_val and loves_troll_2[i] == y_val) / total

        # Skip combinations that never occur: by convention 0 * log(0) counts as 0
        if p_xy > 0:
            # math.log is the natural log, so the result is measured in nats
            mi += p_xy * math.log(p_xy / (p_x * p_y))

print(f"Calculated Mutual Information: {mi:.2f}")
```

```text
Calculated Mutual Information: 0.22
```
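As a hand check of the 0.22, here are the individual terms of the formula with the table's probabilities plugged in, using the natural log to match `math.log`. The combination Likes Popcorn = Yes, Loves Troll 2 = No never occurs, so its term is 0:

$$
\begin{aligned}
\text{Mutual Information}
&= 0.6\ln\frac{0.6}{0.6 \times 0.8} + 0.2\ln\frac{0.2}{0.4 \times 0.8} + 0.2\ln\frac{0.2}{0.4 \times 0.2} + 0 \\
&= 0.6\ln 1.25 + 0.2\ln 0.625 + 0.2\ln 2.5 \\
&\approx 0.134 - 0.094 + 0.183 \\
&\approx 0.22
\end{aligned}
$$

The middle term is negative: people who don't like popcorn love Troll 2 less often (0.2) than independence would predict (0.4 × 0.8 = 0.32).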


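The formula works by going through every combination of outcomes. For each combination, it compares the actual joint probability p(x, y) with the probability we would expect if the variables were independent, p(x)p(y).

* If p(x, y) is much larger than p(x)p(y), the two outcomes occur together more often than chance alone predicts, and that term adds a large positive amount.
* If p(x, y) is smaller than p(x)p(y), the outcomes occur together less often than chance predicts and the term is negative, but the positive terms always outweigh the negative ones, so the total is never below 0.
* If p(x, y) equals p(x)p(y) for every combination, every term is p(x, y) log(1) = 0, so the Mutual Information is 0 and the variables are independent. Values close to 0 suggest the variables are nearly independent.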
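A quick way to see both extremes is to compute Mutual Information for two small made-up pairs of variables: one where every combination occurs exactly as often as independence predicts, and one where the second variable is a copy of the first. The helper `mutual_information` and the lists `a`, `b_independent`, and `b_dependent` are hypothetical names for this sketch; the helper implements the same double sum as the formula, just looping over observed pairs via `collections.Counter` instead of over all combinations.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (natural log, so in nats) between two equal-length label lists."""
    total = len(xs)
    joint_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    mi = 0.0
    # Only combinations that actually occur appear in joint_counts,
    # which skips the p(x, y) = 0 terms (0 * log 0 is treated as 0)
    for (x, y), n_xy in joint_counts.items():
        p_xy = n_xy / total
        p_x = x_counts[x] / total
        p_y = y_counts[y] / total
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


# Hypothetical independent pair: each of the 4 combinations occurs once,
# exactly as p(x) * p(y) = 0.5 * 0.5 = 0.25 predicts
a = ['Yes', 'Yes', 'No', 'No']
b_independent = ['Yes', 'No', 'Yes', 'No']

# Hypothetical dependent pair: b is a copy of a, so knowing one gives the other
b_dependent = ['Yes', 'Yes', 'No', 'No']

print(f"Independent: {mutual_information(a, b_independent):.2f}")
print(f"Dependent:   {mutual_information(a, b_dependent):.2f}")
```

The independent pair scores 0.00 and the dependent pair scores about 0.69 (ln 2). Passing the video's `likes_popcorn` and `loves_troll_2` lists to the same helper gives the same 0.22 as the loop above.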

```python
import math


def print_table(headers, rows):
    # Determine column widths based on headers and data
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    # Print header
    header_line = " | ".join(f"{h:<{w}}" for h, w in zip(headers, widths))
    separator = "-+-".join("-" * w for w in widths)
    print(header_line)
    print(separator)
    # Print rows
    for row in rows:
        row_line = " | ".join(f"{c:<{w}}" for c, w in zip(row, widths))
        print(row_line)

likes_popcorn = ['Yes', 'Yes', 'Yes', 'No', 'No']
loves_troll_2 = ['Yes', 'Yes', 'Yes', 'No', 'Yes']
total = len(likes_popcorn)

# Calculate all probabilities
p_yes_yes = sum(1 for i in range(total) if likes_popcorn[i] == 'Yes' and loves_troll_2[i] == 'Yes') / total
p_yes_no = sum(1 for i in range(total) if likes_popcorn[i] == 'Yes' and loves_troll_2[i] == 'No') / total
p_no_yes = sum(1 for i in range(total) if likes_popcorn[i] == 'No' and loves_troll_2[i] == 'Yes') / total
p_no_no = sum(1 for i in range(total) if likes_popcorn[i] == 'No' and loves_troll_2[i] == 'No') / total
p_pop_yes, p_pop_no = likes_popcorn.count('Yes') / total, likes_popcorn.count('No') / total
p_troll_yes, p_troll_no = loves_troll_2.count('Yes') / total, loves_troll_2.count('No') / total

# Prepare data and print table
print("--- Joint and Marginal Probability Table ---")
headers = ["", "Likes Popcorn: Yes", "Likes Popcorn: No", "p(Loves Troll 2)"]
rows = [
    ["Loves Troll 2: Yes", f"{p_yes_yes:.2f}", f"{p_no_yes:.2f}", f"{p_troll_yes:.2f}"],
    ["Loves Troll 2: No", f"{p_yes_no:.2f}", f"{p_no_no:.2f}", f"{p_troll_no:.2f}"],
    ["p(Likes Popcorn)", f"{p_pop_yes:.2f}", f"{p_pop_no:.2f}", ""]
]
print_table(headers, rows)

# Calculate MI: each entry is [p(x, y), p(x), p(y)] for one combination
mi = 0.0
probs = [[p_yes_yes, p_pop_yes, p_troll_yes], [p_yes_no, p_pop_yes, p_troll_no],
         [p_no_yes, p_pop_no, p_troll_yes], [p_no_no, p_pop_no, p_troll_no]]
for p_xy, p_x, p_y in probs:
    if p_xy > 0:
        mi += p_xy * math.log(p_xy / (p_x * p_y))

print(f"\nCalculated Mutual Information: {mi:.2f}")
```

```text
--- Joint and Marginal Probability Table ---
                   | Likes Popcorn: Yes | Likes Popcorn: No | p(Loves Troll 2)
-------------------+--------------------+-------------------+-----------------
Loves Troll 2: Yes | 0.60               | 0.20              | 0.80
Loves Troll 2: No  | 0.00               | 0.20              | 0.20
p(Likes Popcorn)   | 0.60               | 0.40              |

Calculated Mutual Information: 0.22
```
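Tying this back to feature selection, one simple approach is to score each candidate feature by its Mutual Information with the target and rank the features by that score. This is a minimal sketch under that assumption, not a method taken from the video: `mutual_information` is a hypothetical helper, and apart from Likes Popcorn the two feature columns are made up purely to show the extremes, a column where everyone answered Yes (no information) and an exact copy of the target (maximum information).

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (natural log, so in nats) between two equal-length label lists."""
    total = len(xs)
    joint_counts = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    # Sum only over combinations that occur; p(x, y) = 0 terms contribute 0
    return sum(
        (n / total) * math.log((n / total) / ((x_counts[x] / total) * (y_counts[y] / total)))
        for (x, y), n in joint_counts.items()
    )


# Target we want to predict (from the video's example)
loves_troll_2 = ['Yes', 'Yes', 'Yes', 'No', 'Yes']

# Candidate features: Likes Popcorn is from the example; the other two
# are hypothetical columns built only to illustrate the extremes
features = {
    'Likes Popcorn': ['Yes', 'Yes', 'Yes', 'No', 'No'],
    'Always Yes (hypothetical)': ['Yes', 'Yes', 'Yes', 'Yes', 'Yes'],
    'Copy of target (hypothetical)': list(loves_troll_2),
}

# Score each feature against the target and list from most to least informative
scores = {name: mutual_information(col, loves_troll_2) for name, col in features.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<30} MI = {score:.2f}")
```

The constant column scores 0 because knowing it tells you nothing about the target, while Likes Popcorn lands between the two extremes at 0.22.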
