# **Independent Distribution**

## 📑 Table of Contents

- **1. [Foundation - What Is the Independent Distribution?](#foundation---what-is-the-independent-distribution)**
    - 1.1 [The Big Picture](#the-big-picture)

- **2. [Setup & The Fundamental Concept](#setup--the-fundamental-concept)**
    - 2.1 [Essential Imports & Setup](#essential-imports--setup)  
    - 2.2 [The Core Distinction: True Multivariate vs Batched Univariate](#the-core-distinction-true-multivariate-vs-batched-univariate)

- **3. [The Independent Transformation](#the-independent-transformation)**
    - 3.1 [Converting Batched to Multivariate with Independent](#converting-batched-to-multivariate-with-independent)  
    - 3.2 [Log Probability: The Key Difference](#log-probability-the-key-difference)

- **4. [Complex Batching Scenarios](#complex-batching-scenarios)**
    - 4.1 [Higher-Dimensional Batching](#higher-dimensional-batching)  
    - 4.2 [Different reinterpreted_batch_ndims Values](#different-reinterpreted_batch_ndims-values)

- **5. [Advanced Applications](#advanced-applications)**
    - 5.1 [Multi-Dimensional Independent Distributions](#multi-dimensional-independent-distributions)

- **6. [Professional Patterns & Best Practices](#professional-patterns--best-practices)**
    - 6.1 [When to Use Independent](#when-to-use-independent)  
    - 6.2 [Loss Function Considerations](#loss-function-considerations)  
    - 6.3 [Shape Debugging Utilities](#shape-debugging-utilities)

- **7. [Expert Applications](#expert-applications)**
    - 7.1 [Building Complex Probabilistic Architectures](#building-complex-probabilistic-architectures)  
    - 7.2 [Custom Independent Implementations](#custom-independent-implementations)

- **8. [Complete Reference Guide](#complete-reference-guide)**
    - 8.1 [Independent Constructor Patterns](#independent-constructor-patterns)  
    - 8.2 [Essential Independent Operations Reference](#essential-independent-operations-reference)

- **[Final Notes](#final-notes)**


## 1. **Foundation - What Is the Independent Distribution?**

### 1.1 **The Big Picture**
The Independent distribution is TensorFlow Probability's **meta-distribution** that transforms batch dimensions into event dimensions. It's the bridge between **batched univariate distributions** and **true multivariate distributions**.[1][2]

**Key Insight**: Independent doesn't create a new probability model—it **reinterprets existing dimensions** to change how probabilities are computed. It converts independent random variables into a joint distribution where `log_prob` returns a single value instead of a vector.[1]

**The Core Transformation**:
- **Input**: `batch_shape=[k], event_shape=[]` (k independent univariate)
- **Output**: `batch_shape=[], event_shape=[k]` (1 multivariate with k dimensions)
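The same rule can be written in general form. The notation below is our own restatement, not copied from the TFP docs. Take a base distribution with batch shape $[b_1,\dots,b_n]$ and event shape $E$, and let $r$ = `reinterpreted_batch_ndims`. Independent moves the rightmost $r$ batch axes in front of $E$ (the comma inside the new event shape means concatenation) and sums the base log-density over those axes:

$$
[b_1,\dots,b_n],\ E
\;\xrightarrow{\;r\;}\;
\underbrace{[b_1,\dots,b_{n-r}]}_{\text{new batch\_shape}},\
\underbrace{[b_{n-r+1},\dots,b_n,\,E]}_{\text{new event\_shape}}
$$

$$
\log p_{\mathrm{Ind}}(x) \;=\; \sum_{i_{n-r+1}=1}^{b_{n-r+1}} \cdots \sum_{i_n=1}^{b_n} \log p_{\mathrm{base}}\!\left(x_{i_{n-r+1},\dots,i_n}\right)
$$

For the core transformation above ($n = r = 1$, $b_1 = k$), this reduces to $\log p(x) = \sum_{i=1}^{k} \log p_i(x_i)$.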

## 2. **Setup & The Fundamental Concept**

### 2.1 **Essential Imports & Setup**

```python
# Importing TensorFlow Probability library
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np

# Standard alias for distributions
tfd = tfp.distributions

# Set random seed for reproducibility
tf.random.set_seed(42)
```

### 2.2 **The Core Distinction: True Multivariate vs Batched Univariate**

```python
# TRUE MULTIVARIATE DISTRIBUTION: one distribution over 2-D vectors
mv_normal = tfd.MultivariateNormalDiag(loc=[-1., 0.5], scale_diag=[1., 1.5])
print(mv_normal)

# BATCHED UNIVARIATE DISTRIBUTION: two independent scalar distributions
batched_normal = tfd.Normal(loc=[-1., 0.5], scale=[1., 1.5])
print(batched_normal)
```

```
tfp.distributions.MultivariateNormalDiag("MultivariateNormalDiag", batch_shape=[], event_shape=[2], dtype=float32)
tfp.distributions.Normal("Normal", batch_shape=[2], event_shape=[], dtype=float32)
```

**Critical Difference**:
- **Multivariate**: `batch_shape=[], event_shape=[2]` → one joint distribution over 2D vectors
- **Batched**: `batch_shape=[2], event_shape=[]` → two independent 1D distributions

## 3. **The Independent Transformation**

### 3.1 **Converting Batched to Multivariate with Independent**

```python
# TRANSFORMING A BATCHED UNIVARIATE DISTRIBUTION TO A MULTIVARIATE DISTRIBUTION
# Start with batched univariate distribution
batched_normal = tfd.Normal(loc=[-1., 0.5], scale=[1., 1.5])

# Transform using Independent
independent_normal = tfd.Independent(batched_normal, reinterpreted_batch_ndims=1)
print(independent_normal)
```

```
tfp.distributions.Independent("IndependentNormal", batch_shape=[], event_shape=[2], dtype=float32)
```


**The Magic Parameter**: `reinterpreted_batch_ndims=1`
- **Meaning**: Take the rightmost 1 dimension of `batch_shape` and move it into `event_shape`
- **Before**: `batch_shape=[2], event_shape=[]`
- **After**: `batch_shape=[], event_shape=[2]`

### 3.2 **Log Probability: The Key Difference**

```python
# Log probability computation comparison
batched_normal = tfd.Normal(loc=[-1., 0.5], scale=[1., 1.5])
independent_normal = tfd.Independent(batched_normal, reinterpreted_batch_ndims=1)

# Same input vector for both
test_input = [-0.2, 1.8]

# Batched: Returns array of independent log probabilities
batch_log_prob = batched_normal.log_prob(test_input)
print("Batched log prob:", batch_log_prob)
print("Shape:", batch_log_prob.shape)  # (2,)
# Output: [log P₁(-0.2), log P₂(1.8)] - separate probabilities

# Independent: Returns single joint log probability
independent_log_prob = independent_normal.log_prob(test_input)
print("Independent log prob:", independent_log_prob)
print("Shape:", independent_log_prob.shape)  # ()
# Output: log P₁(-0.2) + log P₂(1.8) - joint probability

# Verification: Independent sums the batched log probabilities
print("Sum of batched:", tf.reduce_sum(batch_log_prob))
print("Independent value:", independent_log_prob)
# These should be equal (within numerical precision)
```

```
Batched log prob: tf.Tensor([-1.2389386 -1.699959 ], shape=(2,), dtype=float32)
Shape: (2,)
Independent log prob: tf.Tensor(-2.9388976, shape=(), dtype=float32)
Shape: ()
Sum of batched: tf.Tensor(-2.9388976, shape=(), dtype=float32)
Independent value: tf.Tensor(-2.9388976, shape=(), dtype=float32)
```


**Mathematical Interpretation**:
- **Batched**: `[log p₁(x₁), log p₂(x₂)]` → Vector of independent log probabilities
- **Independent**: `log p₁(x₁) + log p₂(x₂)` → Single joint log probability

## 4. **Complex Batching Scenarios**

### 4.1 **Higher-Dimensional Batching**

```python
# A (3, 2) batch of scalar Normals: 3 rows of 2 components each
batched_normal = tfd.Normal(
    loc=[[-1., 0.5], [0., 1.], [0.3, -0.1]],      # Shape: (3, 2)
    scale=[[1., 1.5], [0.2, 0.8], [2., 1.]]       # Shape: (3, 2)
)
print(batched_normal)

# Apply Independent with reinterpreted_batch_ndims=1
independent_normal = tfd.Independent(batched_normal, reinterpreted_batch_ndims=1)
print(independent_normal)
```

```
tfp.distributions.Normal("Normal", batch_shape=[3, 2], event_shape=[], dtype=float32)
tfp.distributions.Independent("IndependentNormal", batch_shape=[3], event_shape=[2], dtype=float32)
```


**Shape Transformation**:
- **Original**: `batch_shape=[3, 2], event_shape=[]` → 6 independent scalars
- **After Independent**: `batch_shape=[3], event_shape=[2]` → 3 independent 2D vectors

### 4.2 **Different reinterpreted_batch_ndims Values**

```python
# Same starting distribution
batched_normal = tfd.Normal(
    loc=[[-1., 0.5], [0., 1.], [0.3, -0.1]],
    scale=[[1., 1.5], [0.2, 0.8], [2., 1.]]
)
# Original: batch_shape=[3, 2], event_shape=[]

# reinterpreted_batch_ndims=2 (reinterpret ALL batch dimensions)
independent_normal_2 = tfd.Independent(batched_normal, reinterpreted_batch_ndims=2)
print(independent_normal_2)
# Output: batch_shape=[], event_shape=[3, 2]
```

```
tfp.distributions.Independent("IndependentNormal", batch_shape=[], event_shape=[3, 2], dtype=float32)
```


**Complete Transformation**:
- **Original**: `batch_shape=[3, 2], event_shape=[]` → 6 independent scalars
- **After Independent**: `batch_shape=[], event_shape=[3, 2]` → a single joint distribution over 3×2 matrices


## 5. **Advanced Applications**

### 5.1 **Multi-Dimensional Independent Distributions**

```python
# Create a larger batch structure: a (2, 4, 5) batch of fair Bernoulli distributions
probs = 0.5 * tf.ones((2, 4, 5))  # Shape: (2, 4, 5)
bernoulli_batch = tfd.Bernoulli(probs=probs)
print("Original Bernoulli batch:")
print(f"  batch_shape: {bernoulli_batch.batch_shape}")  # [2, 4, 5]
print(f"  event_shape: {bernoulli_batch.event_shape}")  # []

# Reinterpret the two rightmost batch dims (4, 5) as event dims.
# Passing reinterpreted_batch_ndims explicitly avoids the deprecation warning for the default.
dist = tfd.Independent(bernoulli_batch, reinterpreted_batch_ndims=2)
print("\nAfter Independent:")
print(f"  batch_shape: {dist.batch_shape}")  # [2]
print(f"  event_shape: {dist.event_shape}")  # [4, 5]

# Test log probability computation
# Input shape must match batch_shape + event_shape = (2,) + (4, 5) = (2, 4, 5)
test_input = np.zeros((2, 4, 5), dtype=np.int32)
result = dist.log_prob(test_input)
print(f"\nLog prob result shape: {result.shape}")  # (2,) - one value per remaining batch member
```

```
Original Bernoulli batch:
  batch_shape: (2, 4, 5)
  event_shape: ()

After Independent:
  batch_shape: (2,)
  event_shape: (4, 5)

Log prob result shape: (2,)
```


**Shape Analysis**:
- **Input**: `probs.shape = (2, 4, 5)` → `batch_shape=[2, 4, 5], event_shape=[]`
- **After Independent** (`reinterpreted_batch_ndims=2`): `batch_shape=[2], event_shape=[4, 5]`
- **Interpretation**: 2 independent distributions, each over 4×5 binary matrices (20 independent coin flips per event)

## 6. **Professional Patterns & Best Practices**

### 6.1 **When to Use Independent**

```python
def create_probabilistic_layer_output(loc, raw_scale, use_independent=True):
    """
    Turn raw network outputs into a probabilistic output distribution
    """
    # softplus keeps the scale positive; the small constant keeps it away from zero
    scale = tf.nn.softplus(raw_scale) + 1e-6

    # Create batched normal distribution: batch_shape=[batch_size, output_dim]
    batched_dist = tfd.Normal(loc=loc, scale=scale)

    if use_independent:
        # Fold output_dim into the event: one joint log probability per example
        return tfd.Independent(batched_dist, reinterpreted_batch_ndims=1)
    else:
        # Keep one log probability per output component
        return batched_dist

# Example usage in model
class ProbabilisticRegressor(tf.keras.Model):
    def __init__(self, output_dim):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(64, activation='relu')
        self.dense2 = tf.keras.layers.Dense(32, activation='relu')
        # Parameter heads are created once here so their weights are tracked and trained;
        # creating Dense layers inside call() would give fresh random weights on every call
        self.loc_head = tf.keras.layers.Dense(output_dim)
        self.scale_head = tf.keras.layers.Dense(output_dim)

    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.dense2(x)
        return create_probabilistic_layer_output(
            self.loc_head(x), self.scale_head(x), use_independent=True)

# The model outputs an Independent Normal: batch_shape=[batch_size], event_shape=[3]
model = ProbabilisticRegressor(output_dim=3)
```

### 6.2 **Loss Function Considerations**

```python
def independent_loss_comparison(y_true, batched_dist, independent_dist):
    """
    Compare loss computation with and without Independent
    """
    # Batched approach: Sum of individual losses
    batched_losses = -batched_dist.log_prob(y_true)  # Shape: (batch_size, output_dim)
    batched_total_loss = tf.reduce_sum(batched_losses, axis=-1)  # Shape: (batch_size,)

    # Independent approach: Joint loss directly
    independent_loss = -independent_dist.log_prob(y_true)  # Shape: (batch_size,)

    return {
        'batched_individual': batched_losses,
        'batched_total': batched_total_loss,
        'independent': independent_loss,
        'are_equal': tf.reduce_all(tf.abs(batched_total_loss - independent_loss) < 1e-6)
    }

# Example with dummy data
batch_size, output_dim = 32, 5
y_true = tf.random.normal((batch_size, output_dim))
loc = tf.random.normal((batch_size, output_dim))
scale = tf.nn.softplus(tf.random.normal((batch_size, output_dim))) + 1e-6

batched_dist = tfd.Normal(loc=loc, scale=scale)
independent_dist = tfd.Independent(batched_dist, reinterpreted_batch_ndims=1)

loss_comparison = independent_loss_comparison(y_true, batched_dist, independent_dist)
print("Losses are mathematically equivalent:", loss_comparison['are_equal'])
```

```
Losses are mathematically equivalent: tf.Tensor(True, shape=(), dtype=bool)
```


### 6.3 **Shape Debugging Utilities**

```python
def analyze_independent_transformation(original_dist, reinterpreted_batch_ndims):
    """
    Analyze the effect of Independent transformation on shapes
    """
    independent_dist = tfd.Independent(original_dist, reinterpreted_batch_ndims)

    print("=== INDEPENDENT TRANSFORMATION ANALYSIS ===")
    print(f"Original distribution: {original_dist}")
    print(f"  batch_shape: {original_dist.batch_shape}")
    print(f"  event_shape: {original_dist.event_shape}")
    print()
    print(f"Independent distribution: {independent_dist}")
    print(f"  batch_shape: {independent_dist.batch_shape}")
    print(f"  event_shape: {independent_dist.event_shape}")
    print(f"  reinterpreted_batch_ndims: {reinterpreted_batch_ndims}")
    print()

    # Sample shape analysis
    sample = independent_dist.sample(3)
    print(f"Sample shape (3 samples): {sample.shape}")
    print(f"Expected: (3,) + {independent_dist.batch_shape} + {independent_dist.event_shape}")

    return independent_dist

# Example usage
original = tfd.Normal(loc=tf.zeros((2, 3, 4)), scale=tf.ones((2, 3, 4)))
transformed = analyze_independent_transformation(original, reinterpreted_batch_ndims=2)
```

```
=== INDEPENDENT TRANSFORMATION ANALYSIS ===
Original distribution: tfp.distributions.Normal("Normal", batch_shape=[2, 3, 4], event_shape=[], dtype=float32)
  batch_shape: (2, 3, 4)
  event_shape: ()

Independent distribution: tfp.distributions.Independent("IndependentNormal", batch_shape=[2], event_shape=[3, 4], dtype=float32)
  batch_shape: (2,)
  event_shape: (3, 4)
  reinterpreted_batch_ndims: 2

Sample shape (3 samples): (3, 2, 3, 4)
Expected: (3,) + (2,) + (3, 4)
```


## 7. **Expert Applications**

### 7.1 **Building Complex Probabilistic Architectures**

```python
class HierarchicalProbabilisticModel(tf.keras.Model):
    """
    Example: Hierarchical model using Independent at multiple levels
    """
    def __init__(self, hierarchy_dims=(10, 5, 3)):
        super().__init__()
        self.hierarchy_dims = tuple(hierarchy_dims)
        # `layers` is already a Keras Model property, so store sublayers under another name
        self.model_layers = []

        for dim in self.hierarchy_dims:
            self.model_layers.append(tf.keras.layers.Dense(64, activation='relu'))
            self.model_layers.append(tf.keras.layers.Dense(dim * 2))  # loc + scale parameters

    def call(self, inputs):
        x = inputs
        distributions = []

        # Layers come in pairs per level: (feature layer, parameter layer)
        for i in range(0, len(self.model_layers), 2):
            # Feature transformation
            x = self.model_layers[i](x)

            # Distribution parameters
            params = self.model_layers[i + 1](x)
            dim = self.hierarchy_dims[i // 2]

            # Split parameters
            loc = params[..., :dim]
            raw_scale = params[..., dim:]
            scale = tf.nn.softplus(raw_scale) + 1e-6

            # Create batched then independent distribution
            batched_dist = tfd.Normal(loc=loc, scale=scale)
            independent_dist = tfd.Independent(batched_dist, reinterpreted_batch_ndims=1)
            distributions.append(independent_dist)

            # Use samples as features for next level
            x = independent_dist.sample()

        return distributions

# Usage
model = HierarchicalProbabilisticModel()
input_data = tf.random.normal((32, 20))  # Batch of 32, 20 features
hierarchy_distributions = model(input_data)

for i, dist in enumerate(hierarchy_distributions):
    print(f"Level {i}: batch_shape={dist.batch_shape}, event_shape={dist.event_shape}")
```

```
Level 0: batch_shape=(32,), event_shape=(10,)
Level 1: batch_shape=(32,), event_shape=(5,)
Level 2: batch_shape=(32,), event_shape=(3,)
```


### 7.2 **Custom Independent Implementations**

```python
class ConditionalIndependent:
    """
    Custom Independent that can dynamically choose reinterpreted_batch_ndims
    """
    def __init__(self, base_distribution_fn, max_reinterpreted_dims):
        self.base_distribution_fn = base_distribution_fn
        self.max_reinterpreted_dims = max_reinterpreted_dims

    def __call__(self, inputs, reinterpreted_dims=None):
        base_dist = self.base_distribution_fn(inputs)

        if reinterpreted_dims is None:
            # Auto-determine: reinterpret as many batch dims as allowed
            reinterpreted_dims = min(len(base_dist.batch_shape), self.max_reinterpreted_dims)

        if reinterpreted_dims == 0:
            return base_dist
        else:
            return tfd.Independent(base_dist, reinterpreted_batch_ndims=reinterpreted_dims)

# Example usage
def create_base_distribution(inputs):
    # batch_shape follows the input shape. Dense needs a Python int for `units`,
    # so use the static last dimension (tf.shape(inputs)[-1] is a Tensor)
    output_size = inputs.shape[-1]
    # Note: these layers are created on every call, so their weights are re-initialized
    # each time; in a trainable model, create them once (e.g. in a Keras layer's __init__)
    loc = tf.keras.layers.Dense(output_size)(inputs)
    scale = tf.nn.softplus(tf.keras.layers.Dense(output_size)(inputs)) + 1e-6
    return tfd.Normal(loc=loc, scale=scale)

conditional_independent = ConditionalIndependent(create_base_distribution, max_reinterpreted_dims=2)
```

## 8. **Complete Reference Guide**

### 8.1 **Independent Constructor Patterns**

```python
# === BASIC INDEPENDENT USAGE ===
# Convert 1D batch to multivariate
tfd.Independent(tfd.Normal(loc=[0., 1.], scale=[1., 2.]),
                reinterpreted_batch_ndims=1)  # → batch_shape=[], event_shape=[2]

# Convert 2D batch partially
tfd.Independent(tfd.Normal(loc=tf.zeros((3, 4)), scale=tf.ones((3, 4))),
                reinterpreted_batch_ndims=1)  # → batch_shape=[3], event_shape=[4]

# Convert all batch dimensions
tfd.Independent(tfd.Bernoulli(probs=tf.ones((2, 3, 4)) * 0.5),
                reinterpreted_batch_ndims=3)  # → batch_shape=[], event_shape=[2, 3, 4]

# === ADVANCED PATTERNS ===
# With multivariate base distributions (base: batch_shape=[2], event_shape=[2])
mv_base = tfd.MultivariateNormalDiag(loc=[[0., 1.], [2., 3.]], scale_diag=[[1., 1.], [2., 2.]])
tfd.Independent(mv_base, reinterpreted_batch_ndims=1)  # → batch_shape=[], event_shape=[2, 2]

# With transformed distributions
transformed_base = tfd.TransformedDistribution(
    tfd.Normal(loc=tf.zeros((5,)), scale=tf.ones((5,))),
    bijector=tfp.bijectors.Exp()
)
tfd.Independent(transformed_base, reinterpreted_batch_ndims=1)  # Log-normal over 5-D vectors
```

```
<tfp.distributions.Independent 'IndependentexpNormal' batch_shape=[] event_shape=[5] dtype=float32>
```

### 8.2 **Essential Independent Operations Reference**

Because the components of an Independent distribution are independent, its covariance matrix (for a rank-1 `event_shape`) is diagonal, with the per-component variances on the diagonal. You can therefore construct it manually from `variance()`:

```python
# Your independent distribution
base_dist = tfd.Normal([0.0, 1.0, 2.0], [1.0, 0.5, 1.5])
independent_dist = tfd.Independent(base_dist, reinterpreted_batch_ndims=1)

# Manual covariance matrix construction
variances = independent_dist.variance()  # This works - gives diagonal elements
cov_matrix = tf.linalg.diag(variances)   # Create diagonal covariance matrix

print("Covariance matrix:", cov_matrix)
```

```
Covariance matrix: tf.Tensor(
[[1.   0.   0.  ]
 [0.   0.25 0.  ]
 [0.   0.   2.25]], shape=(3, 3), dtype=float32)
```


For a distribution that implements `covariance()` directly, use `MultivariateNormalDiag`, which describes the same diagonal Gaussian:

```python
# Instead of Independent, use MultivariateNormalDiag
mvn_diag = tfd.MultivariateNormalDiag(
    loc=[0.0, 1.0, 2.0],
    scale_diag=[1.0, 0.5, 1.5]
)

# This supports covariance() method
cov_matrix = mvn_diag.covariance()
mean_vector = mvn_diag.mean()
var_vector = mvn_diag.variance()

print("Covariance matrix:", cov_matrix)
```

```
Covariance matrix: tf.Tensor(
[[1.   0.   0.  ]
 [0.   0.25 0.  ]
 [0.   0.   2.25]], shape=(3, 3), dtype=float32)
```


If you need to stick with Independent, work around the limitation by building the diagonal matrix from `variance()` yourself:

```python
# Access variance from Independent (this works)
variances = independent_dist.variance()

# Create diagonal covariance manually
cov_matrix = tf.eye(len(variances)) * variances

# Alternative: reshape variance vector into diagonal matrix
cov_matrix_alt = tf.linalg.diag(variances)

print("Are equal:", tf.reduce_all(tf.abs(cov_matrix - cov_matrix_alt) < 1e-6))
```

```
Are equal: tf.Tensor(True, shape=(), dtype=bool)
```


The Independent distribution wrapper focuses on reinterpreting dimensions rather than implementing all possible statistical operations. Many operations like `mean()`, `variance()`, and `log_prob()` are implemented, but `covariance()` is not.

### **🧮 Shape Transformation Rules**

| **reinterpreted_batch_ndims** | **Original batch_shape** | **Original event_shape** | **New batch_shape** | **New event_shape** |
|------------------------------|--------------------------|---------------------------|---------------------|---------------------|
| **0** | `[a, b, c]` | `[d]` | `[a, b, c]` | `[d]` |
| **1** | `[a, b, c]` | `[d]` | `[a, b]` | `[c, d]` |
| **2** | `[a, b, c]` | `[d]` | `[a]` | `[b, c, d]` |
| **3** | `[a, b, c]` | `[d]` | `[]` | `[a, b, c, d]`|

## **Final Notes**

- **The Independent Mindset**: Independent is not about creating new probability models—it's about **reinterpreting how you compute probabilities**. It transforms "multiple independent calculations" into "single joint calculations."

- **When to Use Independent**:
  - ✅ **Neural network outputs**: When you want joint loss over multiple predictions
  - ✅ **Multivariate modeling**: When treating related variables as a single entity
  - ✅ **Probabilistic layers**: When building end-to-end probabilistic models
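  - ❌ **Already multivariate**: Don't wrap an unbatched MultivariateNormal with Independent; it has no batch dimensions to reinterpret (wrapping a *batch* of them, as in 8.1, is fine)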
  - ❌ **Need correlations**: Independent assumes independence—use true multivariate for correlations

- **Shape is Everything**: The relationship `reinterpreted_batch_ndims` ≤ `len(batch_shape)` must hold. Independent moves dimensions from the right end of `batch_shape` to the left end of `event_shape`.

- **Equivalence Insight**: Independent and manual reduction (`tf.reduce_sum(dist.log_prob(...), axis=...)`) produce the same values, but Independent is cleaner and integrates better with TFP's ecosystem because the event structure travels with the distribution object.

- **Debug Strategy**: When Independent doesn't work as expected, always print and compare shapes before and after transformation. The math should be clear from the dimensions.

> The Independent distribution is the **Swiss Army knife** of TensorFlow Probability: it bridges the gap between batch processing and multivariate modeling, enabling clean and efficient probabilistic architectures!