# Alternative Dependence Measures

Having seen the limitations of Pearson correlation, we now explore more sophisticated measures of dependence that can capture non-linear relationships. In this notebook, we will cover:

1. **Mutual Information (MI)** - Information-theoretic measure of dependence
2. **Distance Correlation (dCor)** - A measure whose population value is zero if and only if the variables are independent
3. **Maximal Information Coefficient (MIC)** - Designed to find the "most interesting" relationships
4. **Randomized Dependence Coefficient (RDC)** - A computationally efficient estimator of the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation

We will compare these measures on various functional relationships and test their robustness to noise.

---

## Setup and Imports

In [None]:
# Environment configuration for Jupyter
%load_ext autoreload
%autoreload 2
%matplotlib inline

# Visualization libraries
import matplotlib.pyplot as plt
from IPython import display
import seaborn as sns
sns.set_theme(style="whitegrid")

# Numerical computing
import time
import numpy as np
from scipy.stats import multivariate_normal as mvn
import scipy.spatial.distance
import pandas as pd

# Machine learning utilities
from sklearn import linear_model
import sklearn.datasets as toy_datasets

# Dependence measure libraries
# Note: minepy does not build on Python 3.12, so MIC is replaced by a fallback
# from minepy import MINE  # For MIC - commented out
import dcor  # For distance correlation

# Fallback MIC stand-in: mutual information on a fixed bins x bins grid,
# normalized by log(bins). Unlike true MIC, it does not search over grid sizes.
def _fallback_mic(x, y, bins=10):
    """Crude MIC stand-in: binned mutual information normalized to [0, 1]."""
    # Joint histogram on an equal-width grid, converted to probabilities
    joint_hist, _, _ = np.histogram2d(x, y, bins=bins)
    joint_hist = joint_hist / joint_hist.sum()

    # Marginals as a column and a row so their product broadcasts to the grid
    px = joint_hist.sum(axis=1, keepdims=True)
    py = joint_hist.sum(axis=0, keepdims=True)

    # Mutual information, summed over non-empty cells only
    nz = joint_hist > 0
    mi = np.sum(joint_hist[nz] * np.log(joint_hist[nz] / (px * py)[nz]))

    # Binned MI cannot exceed log(bins), so this lies in [0, 1]
    return min(1.0, mi / np.log(bins))

---

## Mutual Information

**Mutual Information** is an information-theoretic measure that quantifies the amount of information obtained about one random variable by observing another.

$$I(X;Y) = \int_y \int_x p_{X,Y}(x,y) \log \left( \frac{p_{X,Y}(x,y)}{p_X(x) \cdot p_Y(y)} \right) dx \, dy$$

Equivalently, it can be written as the Kullback-Leibler divergence between the joint distribution and the product of marginals:

$$I(X;Y) = D_{KL}(p_{X,Y} \| p_X \otimes p_Y)$$

**Properties:**
- $I(X;Y) \geq 0$, with equality if and only if $X$ and $Y$ are independent
- $I(X;Y) = I(Y;X)$ (symmetric)
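- Not bounded above in general (unlike correlation): for discrete variables $I(X;Y) \leq \min(H(X), H(Y))$, but for continuous variables it can be arbitrarily large (e.g. $I(X;X) = \infty$ for continuous $X$)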
- Captures any type of dependence, not just linear
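
For discrete variables the integrals become sums, which makes the KL-divergence form easy to check numerically. The sketch below is a minimal NumPy illustration (the function name `mutual_information_discrete` is our own, not from any library): it computes $\sum p_{X,Y} \log \frac{p_{X,Y}}{p_X p_Y}$ for a joint probability table, confirms that an independent table gives zero, and shows that a fair coin copied into $Y$ reaches the discrete upper bound $H(X) = \log 2$.

```python
import numpy as np


def mutual_information_discrete(p_xy):
    """I(X;Y) in nats for a joint table p_xy[i, j] = P(X = i, Y = j)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_xy = p_xy / p_xy.sum()                 # normalize defensively
    p_x = p_xy.sum(axis=1, keepdims=True)    # column vector of P(X = i)
    p_y = p_xy.sum(axis=0, keepdims=True)    # row vector of P(Y = j)
    prod = p_x * p_y                         # product of marginals, p_X ⊗ p_Y
    nz = p_xy > 0                            # convention: 0 * log 0 = 0
    # KL divergence between the joint and the product of marginals
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / prod[nz])))


independent = np.outer([0.3, 0.7], [0.5, 0.5])    # P(X, Y) = P(X) P(Y)
y_equals_x = np.array([[0.5, 0.0], [0.0, 0.5]])   # Y = X for a fair coin

print(mutual_information_discrete(independent))   # 0 up to rounding
print(mutual_information_discrete(y_equals_x), np.log(2))
```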

---

## Implementation of Dependence Measures

### Randomized Dependence Coefficient (RDC)

In [None]:
from scipy.stats import rankdata


def rdc(x, y, f=np.sin, k=20, s=1/6., n=1):
    """
    Computes the Randomized Dependence Coefficient.
    
    Reference:
    David Lopez-Paz, Philipp Hennig, Bernhard Schoelkopf
    "The Randomized Dependence Coefficient"
    http://papers.nips.cc/paper/5138-the-randomized-dependence-coefficient.pdf
    
    Parameters
    ----------
    x, y : numpy arrays
        1-D arrays of size (samples,) or 2-D arrays of size (samples, variables)
    f : function
        Non-linear function applied to the random projections (default: np.sin)
    k : int
        Number of random projections to use
    s : float
        Scale parameter for the random projection weights
    n : int
        Number of times to compute RDC; the median is returned (for stability)
    
    Returns
    -------
    float
        The RDC value in [0, 1]
    """
    if n > 1:
        values = []
        for _ in range(n):
            try:
                values.append(rdc(x, y, f, k, s, 1))
            except np.linalg.LinAlgError:
                # Skip repetitions where the linear algebra fails to converge
                pass
        return np.median(values)

    # Promote 1-D inputs to column vectors of shape (samples, 1)
    if x.ndim == 1:
        x = x.reshape((-1, 1))
    if y.ndim == 1:
        y = y.reshape((-1, 1))

    # Copula transformation: ordinal ranks of each column, scaled to (0, 1]
    cx = np.column_stack([rankdata(xc, method='ordinal') for xc in x.T]) / float(x.shape[0])
    cy = np.column_stack([rankdata(yc, method='ordinal') for yc in y.T]) / float(y.shape[0])

    # Add bias term for affine projection
    O = np.ones(cx.shape[0])
    X = np.column_stack([cx, O])
    Y = np.column_stack([cy, O])

    # Random linear projections
    Rx = (s / X.shape[1]) * np.random.randn(X.shape[1], k)
    Ry = (s / Y.shape[1]) * np.random.randn(Y.shape[1], k)
    X = np.dot(X, Rx)
    Y = np.dot(Y, Ry)

    # Apply the non-linear function to get k random features per variable
    fX = f(X)
    fY = f(Y)

    # Compute full covariance matrix of [fX, fY]
    C = np.cov(np.hstack([fX, fY]).T)

    # CCA between fX and fY. If k is too large, rank(fX) < k or rank(fY) < k
    # and the eigenvalues (squared canonical correlations) can become complex
    # or leave [0, 1], so binary-search for the largest k that behaves
    k0 = k
    lb = 1
    ub = k
    while True:
        Cxx = C[:k, :k]
        Cyy = C[k0:k0+k, k0:k0+k]
        Cxy = C[:k, k0:k0+k]
        Cyx = C[k0:k0+k, :k]

        eigs = np.linalg.eigvals(
            np.dot(np.dot(np.linalg.pinv(Cxx), Cxy),
                   np.dot(np.linalg.pinv(Cyy), Cyx))
        )

        if not (np.all(np.isreal(eigs)) and 0 <= np.min(eigs) and np.max(eigs) <= 1):
            ub -= 1
            k = (ub + lb) // 2
            continue
        if lb == ub:
            break
        lb = k
        if ub == lb + 1:
            k = ub
        else:
            k = (ub + lb) // 2

    # RDC is the largest canonical correlation
    return np.sqrt(np.max(eigs))
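
The first step of `rdc` replaces each variable by its empirical copula (ordinal ranks scaled to $(0, 1]$). A quick check of what that step buys: the copula is unchanged by any strictly increasing transform, so `rdc(x, y)` and `rdc(x, np.exp(y))` see identical inputs and, with the same random seed, should return the same value. The helper `empirical_copula` is hypothetical and uses a double `argsort` as a NumPy-only equivalent of `rankdata(..., method='ordinal')`.

```python
import numpy as np


def empirical_copula(v):
    """Ordinal ranks scaled to (0, 1], mirroring the copula step inside rdc()."""
    v = np.asarray(v, dtype=float)
    # argsort of argsort gives 0-based ranks; the stable sort breaks ties by position
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable") + 1
    return ranks / len(v)


rng = np.random.default_rng(0)
x = rng.normal(size=100)

u_raw = empirical_copula(x)
u_exp = empirical_copula(np.exp(x))    # strictly increasing transform
u_cubed = empirical_copula(x ** 3)     # another strictly increasing transform
print(np.array_equal(u_raw, u_exp), np.array_equal(u_raw, u_cubed))  # True True
```

This invariance is also why RDC needs no assumption about the marginal distributions: only the ordering of the samples reaches the random features.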

### Mutual Information (Histogram-based Estimator)

In [None]:
from sklearn.metrics import mutual_info_score

def calc_MI(x, y, bins=50):
    """
    Estimate mutual information using histogram-based discretization.
    
    Parameters
    ----------
    x, y : array-like
        Input samples
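    bins : int
        Number of bins per axis. The plug-in estimate is biased upward, roughly
        in proportion to bins**2 / n_samples, so keep bins well below the sample size.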
    
    Returns
    -------
    float
        Estimated mutual information in nats
    """
    # Create 2D histogram (contingency table)
    c_xy = np.histogram2d(x, y, bins)[0]
    # Compute MI from contingency table
    mi = mutual_info_score(None, None, contingency=c_xy)
    return mi
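
Histogram (plug-in) MI is biased upward when the number of cells is large relative to the sample size: even for independent data it reports positive MI. The test vectors in this notebook have 50 samples each, the same as the default `bins=50`, so this is worth keeping in mind when reading the MI results. The sketch below assumes nothing beyond NumPy: `plugin_mi` is a hypothetical helper that computes the same plug-in estimate that `mutual_info_score` derives from a contingency table, applied to two independent Gaussian samples with 5 and with 50 bins per axis.

```python
import numpy as np


def plugin_mi(x, y, bins):
    """Plug-in (histogram) mutual information estimate in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint /= joint.sum()                       # counts -> probabilities
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                             # skip empty cells
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px * py)[nz])))


rng = np.random.default_rng(0)
x_ind = rng.normal(size=200)
y_ind = rng.normal(size=200)    # independent of x_ind, so the true MI is 0

mi_coarse = plugin_mi(x_ind, y_ind, bins=5)
mi_fine = plugin_mi(x_ind, y_ind, bins=50)
print(f"bins=5: {mi_coarse:.3f} nats, bins=50: {mi_fine:.3f} nats")
```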

### Distance Correlation

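Distance correlation is a measure of dependence between random vectors. Its population value is zero if and only if the vectors are independent (assuming finite first moments); the sample estimate is only close to zero for independent data.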

Reference: Szekely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). "Measuring and Testing Dependence by Correlation of Distances".

In [None]:
def dCor(x, y):
    """
    Compute Distance Correlation between two samples.
    
    The population distance correlation is zero if and only if the variables
    are independent; the sample estimate returned here is only close to zero.
    
    Reference:
    Szekely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007)
    "Measuring and Testing Dependence by Correlation of Distances"
    https://arxiv.org/pdf/0803.4101.pdf
    
    Parameters
    ----------
    x, y : array-like
        Input samples
    
    Returns
    -------
    float
        Distance correlation in [0, 1]
    """
    return dcor.distance_correlation(x, y)
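
`dcor.distance_correlation` hides the computation. The sketch below is a minimal NumPy version of the sample (V-statistic) distance correlation of Székely, Rizzo and Bakirov: build the pairwise distance matrices, double-center them, and combine the resulting distance covariance and variances. The name `distance_correlation_sketch` is our own; it uses $O(n^2)$ memory, and while it should agree with `dcor`'s default biased estimator, that is worth checking against the installed version.

```python
import numpy as np


def distance_correlation_sketch(x, y):
    """Sample distance correlation (V-statistic form) via double-centered distances."""
    # Treat 1-D inputs as n samples of a 1-dimensional variable
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    # Pairwise Euclidean distances a_jk = |x_j - x_k| and b_jk = |y_j - y_k|
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

    # Double centering: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    # Squared distance covariance and squared distance variances
    dcov2_xy = (A * B).mean()
    dvar2_x = (A * A).mean()
    dvar2_y = (B * B).mean()

    # Convention: dCor = 0 when either variable is constant
    if dvar2_x * dvar2_y == 0:
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / np.sqrt(dvar2_x * dvar2_y)))


x = np.linspace(-2, 2, 50)
print("Pearson r, y = x^2:", np.corrcoef(x, x ** 2)[0, 1])
print("dCor,      y = x^2:", distance_correlation_sketch(x, x ** 2))
```

On the symmetric quadratic, Pearson correlation is essentially zero while distance correlation stays clearly positive, which is the behaviour the "zero iff independent" property promises.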

### Maximal Information Coefficient (MIC)

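MIC is designed to capture a wide range of functional and non-functional relationships. For every grid with $n_x \times n_y$ cells and $n_x n_y < B(n)$ (by default $B(n) = n^{0.6}$ for $n$ samples), it finds the placement of grid lines that maximizes the mutual information, divides that maximum by $\log \min(n_x, n_y)$, and reports the largest normalized value, so MIC lies in $[0, 1]$. Because minepy is unavailable here, the `mic` function below is a simplified stand-in that uses a single fixed grid.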

Reference: Reshef et al. (2011). "Detecting Novel Associations in Large Data Sets".

In [None]:
def mic(x, y, bins=10):
    """
    Crude stand-in for the Maximal Information Coefficient (MIC).
    
    minepy does not build on Python 3.12, so this returns the mutual
    information on a single equal-width bins x bins grid, normalized by
    log(bins). Unlike true MIC, it does not search over grid resolutions
    or partitions, so its values should not be read as MIC values.
    
    Parameters
    ----------
    x, y : array-like
        Input samples
    bins : int
        Number of equal-width bins per axis
    
    Returns
    -------
    float
        Normalized binned mutual information in [0, 1]
    """
    # Joint histogram on an equal-width grid, converted to probabilities
    joint_hist, _, _ = np.histogram2d(x, y, bins=bins)
    joint_hist = joint_hist / joint_hist.sum()
    
    # Marginals as a column and a row so their product broadcasts to the grid
    px = joint_hist.sum(axis=1, keepdims=True)
    py = joint_hist.sum(axis=0, keepdims=True)
    
    # Mutual information, summed over non-empty cells only
    nz = joint_hist > 0
    mi = np.sum(joint_hist[nz] * np.log(joint_hist[nz] / (px * py)[nz]))
    
    # Binned MI cannot exceed log(bins), so this lies in [0, 1]
    return min(1.0, mi / np.log(bins))
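
The "maximal" part of MIC is what the fixed-grid stand-in leaves out. The sketch below illustrates the grid search under loudly stated simplifications: both axes use equal-frequency bins (derived from ranks) instead of the optimized partitions of Reshef et al., and the budget is the default $B(n) = n^{0.6}$. All helper names are hypothetical. Because each equal-frequency grid is only one of the candidates MIC maximizes over, the result is a lower bound on the exact statistic; it is not a replacement for minepy.

```python
import numpy as np


def _ordinal_ranks(v):
    """0-based ranks with ties broken by position (like rankdata(method='ordinal') - 1)."""
    return np.argsort(np.argsort(v, kind="stable"), kind="stable")


def _grid_mi(a, b, na, nb):
    """Plug-in mutual information (nats) of two label vectors on an na x nb grid."""
    joint = np.zeros((na, nb))
    np.add.at(joint, (a, b), 1.0)    # count samples per grid cell
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa * pb)[nz])))


def mic_grid_search_sketch(x, y, alpha=0.6):
    """Max over grids with nx * ny < n**alpha of MI / log(min(nx, ny)).

    Hypothetical helper: equal-frequency bins on both axes instead of the
    optimized partitions of Reshef et al., so this is a lower bound on MIC.
    """
    rx = _ordinal_ranks(np.asarray(x, dtype=float))
    ry = _ordinal_ranks(np.asarray(y, dtype=float))
    n = len(rx)
    budget = n ** alpha                    # B(n) = n^0.6 by default
    best = 0.0
    for nx in range(2, int(budget) + 1):
        for ny in range(2, int(budget) + 1):
            if nx * ny >= budget:
                continue
            # Equal-frequency bin labels in 0..nx-1 and 0..ny-1
            mi = _grid_mi((rx * nx) // n, (ry * ny) // n, nx, ny)
            best = max(best, mi / np.log(min(nx, ny)))
    return min(best, 1.0)


x_demo = np.linspace(-2, 2, 50)
print(mic_grid_search_sketch(x_demo, x_demo))        # 1.0
print(mic_grid_search_sketch(x_demo, x_demo ** 2))
```

The normalization by $\log \min(n_x, n_y)$ is the key design choice: it is the largest MI any $n_x \times n_y$ grid can show, so every candidate score lies in $[0, 1]$ and scores from different resolutions can be compared directly.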

---

## Comparing Dependence Measures

We will compare all four measures on different types of functional relationships:
- **Linear:** $y = x$
- **Quadratic:** $y = x^2$
- **Sinusoidal:** $y = \sin(x)$
- **Logarithmic:** $y = \log(x)$

### Test Data Visualization

In [None]:
# Define test relationships
# Linear: y = x
x = np.linspace(0, 4, 50)
y = x
t1 = (x, y)

# Quadratic: y = x^2
x = np.linspace(-2, 2, 50)
y = np.power(x, 2)
t2 = (x, y)

# Sinusoidal: y = sin(x)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
t3 = (x, y)

# Logarithmic: y = log(x)
x = np.linspace(0.01, 10, 50)
y = np.log(x)
t4 = (x, y)

# Visualize the test relationships
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, 1)
plt.plot(t1[0], t1[1], 'b-', linewidth=2)
plt.title('Linear: $y = x$')
plt.xlabel('x')
plt.ylabel('y')

plt.subplot(2, 2, 2)
plt.plot(t2[0], t2[1], 'b-', linewidth=2)
plt.title('Quadratic: $y = x^2$')
plt.xlabel('x')
plt.ylabel('y')

plt.subplot(2, 2, 3)
plt.plot(t3[0], t3[1], 'b-', linewidth=2)
plt.title('Sinusoidal: $y = \\sin(x)$')
plt.xlabel('x')
plt.ylabel('y')

plt.subplot(2, 2, 4)
plt.plot(t4[0], t4[1], 'b-', linewidth=2)
plt.title('Logarithmic: $y = \\log(x)$')
plt.xlabel('x')
plt.ylabel('y')

plt.tight_layout()

### Comparison Results

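Now let's compute each dependence measure for all four relationships and compare. Note that MI is reported in nats and is not bounded by 1, so its bars are on a different scale from the other three measures.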

In [None]:
# Define test vectors and measure functions
test_vectors = [t1, t2, t3, t4]
test_vectors_str = ['linear', 'quadratic', 'sinusoidal', 'logarithmic']
dependence_measures = [rdc, calc_MI, dCor, mic]
dependence_measures_str = ['RDC', 'MI', 'dCor', 'MIC']

# Compute all measures for all relationships
results = []
for ii, d in enumerate(dependence_measures):
    dep_str = dependence_measures_str[ii]
    for jj, t in enumerate(test_vectors):
        test_str = test_vectors_str[jj]
        x = t[0]
        y = t[1]
        r = d(x, y)
        result_dict = {'Measure': dep_str, 'Relationship': test_str, 'Value': r}
        results.append(result_dict)

# Create DataFrame and plot
df = pd.DataFrame(results)
g = sns.catplot(data=df, kind='bar', x='Relationship', y='Value', hue='Measure',
                height=5, aspect=1.5)
g.despine(left=True)
g.set_axis_labels("Dependence Relationship", "Measure Value")
g.legend.set_title("Measure")
plt.title('Comparison of Dependence Measures')
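
Pearson correlation is not among the four measures plotted, so a short baseline on the same data makes the contrast concrete. The sketch below recreates the four test relationships with the same grids as the cell above (the dictionary layout is our own choice) and computes Pearson $r$ with `np.corrcoef`. For the symmetric quadratic, $r$ is essentially zero even though $y$ is a deterministic function of $x$.

```python
import numpy as np

# Recreate the four test relationships (same grids as the test-data cell)
relationships = {
    "linear": (np.linspace(0, 4, 50), lambda v: v),
    "quadratic": (np.linspace(-2, 2, 50), lambda v: v ** 2),
    "sinusoidal": (np.linspace(0, 2 * np.pi, 50), np.sin),
    "logarithmic": (np.linspace(0.01, 10, 50), np.log),
}

pearson = {}
for name, (x_vals, fn) in relationships.items():
    y_vals = fn(x_vals)
    # Off-diagonal entry of the 2x2 correlation matrix is Pearson r
    pearson[name] = float(np.corrcoef(x_vals, y_vals)[0, 1])
    print(f"{name:12s} Pearson r = {pearson[name]:+.3f}")
```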

---

## Robustness to Noise

An important property of dependence measures is how they degrade as noise increases. We test this by adding Gaussian noise to a linear relationship and observing how each measure responds.
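
For orientation, the amount of linear dependence that survives the noise can be worked out by hand. This assumes $x$ is uniform on $[0, 4]$ (the notebook actually uses a 50-point grid, whose variance is close to but not exactly $4/3$) and $y = x + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ independent of $x$, so that $\operatorname{Cov}(x, y) = \operatorname{Var}(x) = 4^2/12 = 4/3$ and $\operatorname{Var}(y) = \operatorname{Var}(x) + \sigma^2$:

$$\rho(\sigma) = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}(x)\,\operatorname{Var}(y)}} = \sqrt{\frac{\operatorname{Var}(x)}{\operatorname{Var}(x) + \sigma^2}} = \sqrt{\frac{4/3}{4/3 + \sigma^2}}$$

At the largest noise level in the sweep, $\sigma = 2$, this gives $\rho = \sqrt{(4/3)/(16/3)} = 1/2$: the dependence is weaker but far from gone, so none of the measures should be expected to fall to zero.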

### Effect of Increasing Noise on Dependence Measures

In [None]:
# Test robustness to noise on linear relationship
test = 'linear'
kk = test_vectors_str.index(test)
test_fn = test_vectors[kk]
xx = test_fn[0]
yy = test_fn[1]

# Range of noise variances (the noise standard deviation is sqrt(noise_var))
noise_var = np.linspace(0, 4, 20)
results = np.zeros((len(dependence_measures), noise_var.shape[0]))
num_mc = 100  # Monte Carlo iterations for averaging

for ii, d in enumerate(dependence_measures):
    for jj in range(noise_var.shape[0]):
        std = np.sqrt(noise_var[jj])
        mc_results = []
        for _ in range(num_mc):
            # Add zero-mean Gaussian noise with standard deviation `std`
            y_noisy = yy + np.random.normal(scale=std, size=len(yy))
            mc_results.append(d(xx, y_noisy))
        results[ii, jj] = np.mean(mc_results)

# Plot results
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))

# Plot bounded measures (RDC, dCor, MIC)
for ii, dd in enumerate(dependence_measures_str):
    if dd != 'MI':
        ax1.plot(np.sqrt(noise_var), results[ii, :], 'o-', label=dd, linewidth=2)
ax1.legend()
ax1.set_title('Dependence Measures vs Noise Level (Linear Relationship)')
ax1.set_ylabel('Measure Value')
ax1.set_xlabel(r'Noise Standard Deviation ($\sigma$)')
ax1.grid(True)

# Plot MI separately (unbounded)
for ii, dd in enumerate(dependence_measures_str):
    if dd == 'MI':
        ax2.plot(np.sqrt(noise_var), results[ii, :], 'o-', label=dd, 
                 linewidth=2, color='green')
ax2.legend()
ax2.set_ylabel('Mutual Information (nats)')
ax2.set_xlabel(r'Noise Standard Deviation ($\sigma$)')
ax2.grid(True)

plt.tight_layout()

---

## Summary: Comparison of Dependence Measures

| Measure | Range | Key Properties | Best Use Case |
|---------|-------|----------------|---------------|
| **Pearson Correlation** | $[-1, 1]$ | Fast, interpretable, but linear only | Linear relationships, Gaussian data |
| **Mutual Information (MI)** | $[0, \infty)$ | Information-theoretic, any dependence | When you need interpretable information content |
| **Distance Correlation (dCor)** | $[0, 1]$ | Population value is zero iff independent; detects any dependence | General dependence testing |
| **MIC** | $[0, 1]$ | Designed for "equitability": similar scores for equally noisy relationships of different shapes | Exploratory data analysis |
| **RDC** | $[0, 1]$ | Fast estimator of HGR maximal correlation; uses a copula (rank) transform | Large datasets, quick screening |

### Key Observations

1. **All alternative measures detect non-linear dependence** that Pearson correlation misses.

2. **RDC, dCor, and MIC are bounded** in $[0, 1]$, making them easier to interpret.

3. **MI is unbounded** but provides interpretable information content in nats/bits.

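4. **Sensitivity to noise differs** - the measures decay at different rates as $\sigma$ grows, so a given value (say 0.5) does not indicate the same strength of dependence for every measure.

5. **Computational cost differs** - RDC runs in $O(n \log n)$ time (dominated by the rank transform), a direct distance correlation computation needs all $O(n^2)$ pairwise distances, and exact MIC is expensive for large datasets. The `mic` function used here is a cheap fixed-grid stand-in, so its runtime is not representative.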

---

## Key Takeaways

1. **Choose the right measure for your problem** - no single measure is universally best.

2. **Non-linear dependence requires specialized measures** - Pearson correlation will miss it.

3. **Consider computational constraints** - some measures scale poorly with data size.

4. **These measures complement copulas** - they help identify dependence that copulas can model.

---

**Next:** With a foundation in dependence measures, we are now ready to introduce copulas and see how they provide a complete framework for modeling multivariate dependence.