# **Sampling and Log Probabilities**

## 📑 Table of Contents

- **1. [Foundation - What Are Sampling and Log Probabilities?](#1-foundation---what-are-sampling-and-log-probabilities)**
    - 1.1 [The Big Picture](#11-the-big-picture)

- **2. [Setup & Basic Concepts](#2-setup--basic-concepts)**
    - 2.1 [Essential Imports & Setup](#21-essential-imports--setup)  
    - 2.2 [Your First Sampling & Log Prob Operations](#22-your-first-sampling--log-prob-operations)

- **3. [Basic Sampling Operations](#3-basic-sampling-operations)**
    - 3.1 [Sampling from Batched Distributions](#31-sampling-from-batched-distributions)  
    - 3.2 [Independent Wrapper for Joint Sampling](#32-independent-wrapper-for-joint-sampling)

- **4. [Complex Multi-Dimensional Sampling](#4-complex-multi-dimensional-sampling)**
    - 4.1 [Advanced Batch Structures](#41-advanced-batch-structures)  
    - 4.2 [Multi-Dimensional Sampling](#42-multi-dimensional-sampling)

- **5. [Log Probability Computations](#5-log-probability-computations)**
    - 5.1 [Basic Log Probability Evaluation](#51-basic-log-probability-evaluation)  
    - 5.2 [Vector Input Log Probability](#52-vector-input-log-probability)

- **6. [Advanced Broadcasting & Shape Mechanics](#6-advanced-broadcasting--shape-mechanics)**
    - 6.1 [Complex Broadcasting Patterns](#61-complex-broadcasting-patterns)  
    - 6.2 [Multivariate Distribution with Independent](#62-multivariate-distribution-with-independent)

- **7. [Professional Patterns & Best Practices](#7-professional-patterns--best-practices)**
    - 7.1 [Efficient Sampling Strategies](#71-efficient-sampling-strategies)  
    - 7.2 [Broadcasting Validation Utilities](#72-broadcasting-validation-utilities)  
    - 7.3 [Probabilistic Training Loss Patterns](#73-probabilistic-training-loss-patterns)

- **8. [Expert Applications](#8-expert-applications)**
    - 8.1 [Monte Carlo Variational Inference](#81-monte-carlo-variational-inference)  
    - 8.2 [Advanced Sampling Techniques](#82-advanced-sampling-techniques)

- **9. [Complete Reference Guide](#9-complete-reference-guide)**
    - 9.1 [Sampling Operations Reference](#91-sampling-operations-reference)  
    - 9.2 [Log Probability Operations Reference](#92-log-probability-operations-reference)  
    - 9.3 [Shape Rules for Sampling and Log Prob](#93-shape-rules-for-sampling-and-log-prob)

- **[Learning Path](#learning-path)**

- **[Final Notes](#final-notes)**

## 1. **Foundation - What Are Sampling and Log Probabilities?**

### 1.1 **The Big Picture**
Sampling and log probability computations are the **core computational operations** in TensorFlow Probability. Most probabilistic machine learning algorithms rely on these two operations: **drawing samples** from distributions and **evaluating probabilities** of observed data.

**Key Insight**: `sample()` generates random draws for Monte Carlo estimation and training, while `log_prob()` evaluates how likely observed data is under your probabilistic model, which is the basis of maximum likelihood estimation and Bayesian inference.

**Mathematical Foundation**:
- **Sampling**: Generate `x ~ p(x)` for Monte Carlo approximation
- **Log Probability**: Compute `log p(x)` for numerically stable likelihood evaluation
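
**Minimal Illustration (NumPy sketch)**: The two operations can be tried without any TFP machinery. The snippet below is only a sketch: it assumes a single exponential distribution with an arbitrarily chosen `rate`, and `exponential_log_prob` is a hypothetical helper written for this example, not a TFP function.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 1.5  # arbitrary rate chosen for illustration

# Sampling: draw x ~ Exponential(rate) and approximate E[x] = 1 / rate
samples = rng.exponential(scale=1.0 / rate, size=100_000)
mc_mean = samples.mean()

# Log probability: log p(x) = log(rate) - rate * x for x >= 0
def exponential_log_prob(x, rate):
    return np.log(rate) - rate * x

print("Monte Carlo mean:", mc_mean, "vs exact:", 1.0 / rate)
print("log p(0.5):", exponential_log_prob(0.5, rate))
```

The Monte Carlo mean approaches `1 / rate` as the number of samples grows, while the log probability is an exact, deterministic function of `x`.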


## 2. **Setup & Basic Concepts**

### 2.1 **Essential Imports & Setup**

In [66]:
# Importing TensorFlow Probability library
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np

# Standard alias for distributions
tfd = tfp.distributions

# Set random seed for reproducibility
tf.random.set_seed(42)

### 2.2 **Your First Sampling & Log Prob Operations**

In [None]:
# Create a batched exponential distribution
exp = tfd.Exponential(rate=[[1., 1.5, 0.8], [0.3, 0.4, 1.8]])
print(exp)
# Output: <tfp.distributions.Exponential 'Exponential' batch_shape=[2, 3] event_shape=[] dtype=float32>

print("Batch shape:", exp.batch_shape)  
print("Event shape:", exp.event_shape)  

tfp.distributions.Exponential("Exponential", batch_shape=[2, 3], event_shape=[], dtype=float32)
Batch shape: (2, 3)
Event shape: ()


**Key Properties**:
- **`batch_shape=[2, 3]`**: 2×3 = 6 independent exponential distributions
- **`event_shape=[]`**: Each distribution produces scalar values
- **`rate` parameter**: Different rates for each distribution in the batch

## 3. **Basic Sampling Operations**

### 3.1 **Sampling from Batched Distributions**

In [68]:
# Create batched exponential distribution
exp = tfd.Exponential(rate=[[1., 1.5, 0.8], [0.3, 0.4, 1.8]])

# Single sample from each distribution
single_sample = exp.sample()
print("Single sample shape:", single_sample.shape)  # (2, 3)
print("Single sample:\n", single_sample)

# Multiple samples
multiple_samples = exp.sample(4)  
print("Multiple samples shape:", multiple_samples.shape)  # (4, 2, 3)
print("Multiple samples:\n", multiple_samples)

Single sample shape: (2, 3)
Single sample:
 tf.Tensor(
[[4.390422   2.7434108  0.16488166]
 [0.350755   2.5880651  0.47506964]], shape=(2, 3), dtype=float32)
Multiple samples shape: (4, 2, 3)
Multiple samples:
 tf.Tensor(
[[[2.1462562  0.86743593 0.0618375 ]
  [2.3312993  2.7058575  0.34583405]]

 [[0.7877366  0.6282713  0.04425894]
  [0.20140766 4.1841908  0.49354032]]

 [[0.07714589 0.8290665  0.32057405]
  [3.3196666  0.78857696 0.37590277]]

 [[0.05160386 0.35619614 0.31295168]
  [0.9867175  0.29580456 0.33078858]]], shape=(4, 2, 3), dtype=float32)


**Shape Analysis**:
- **Single sample**: `shape = batch_shape + event_shape = (2, 3) + () = (2, 3)`
- **Multiple samples**: `shape = sample_shape + batch_shape + event_shape = (4,) + (2, 3) + () = (4, 2, 3)`
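
**Shape Rule Check (NumPy sketch)**: The same `sample_shape + batch_shape + event_shape` layout can be reproduced with NumPy, which may help when reasoning about what each axis means. This is an analogy under the assumption that NumPy's `size` argument plays the role of the full output shape; it does not call TFP.

```python
import numpy as np

rates = np.array([[1.0, 1.5, 0.8], [0.3, 0.4, 1.8]])  # plays the role of batch_shape (2, 3)
sample_shape, batch_shape, event_shape = (4,), rates.shape, ()

rng = np.random.default_rng(0)
# `scale` broadcasts against `size`, much like TFP parameters broadcast over sample_shape
samples = rng.exponential(scale=1.0 / rates, size=sample_shape + batch_shape + event_shape)
print(samples.shape)  # (4, 2, 3)

# samples[k, i, j] is draw k from the distribution with rate rates[i, j],
# so averaging over axis 0 estimates each distribution's mean 1 / rates[i, j]
per_dist_mean = rng.exponential(scale=1.0 / rates, size=(50_000,) + rates.shape).mean(axis=0)
print(per_dist_mean)
```

Reducing over axis 0 leaves one estimate per batch member, which is why the sample axes always come first.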

### 3.2 **Independent Wrapper for Joint Sampling**

In [69]:
# Convert batched to multivariate using Independent
exp = tfd.Exponential(rate=[[1., 1.5, 0.8], [0.3, 0.4, 1.8]])
ind_exp = tfd.Independent(exp)  # Default moves all but the first batch dim into the event (here: 1 dim)
print(ind_exp)
# Output: <tfp.distributions.Independent 'IndependentExponential' batch_shape=[2] event_shape=[3] dtype=float32>

# Sample from Independent distribution
joint_samples = ind_exp.sample(4)
print("Joint samples shape:", joint_samples.shape)  # (4, 2, 3)
print("Joint samples:\n", joint_samples)

tfp.distributions.Independent("IndependentExponential", batch_shape=[2], event_shape=[3], dtype=float32)
Joint samples shape: (4, 2, 3)
Joint samples:
 tf.Tensor(
[[[2.6272154  2.051555   3.347836  ]
  [1.6800407  0.51907015 0.9736745 ]]

 [[1.0240827  0.8342139  2.1261241 ]
  [0.98191    1.8131511  0.6493894 ]]

 [[0.26463446 0.08187366 5.945581  ]
  [5.35103    5.085251   0.5521324 ]]

 [[0.81465316 0.52829653 0.18442416]
  [2.2659454  1.7780305  0.37588385]]], shape=(4, 2, 3), dtype=float32)


**Shape Transformation**:
- **Original**: `batch_shape=[2, 3], event_shape=[]` → 6 independent scalars
- **After Independent**: `batch_shape=[2], event_shape=[3]` → 2 joint 3D vectors
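
**Shape Bookkeeping (sketch)**: The shape arithmetic that `tfd.Independent` performs can be written as a few lines of plain Python. `independent_shapes` is a hypothetical helper for this tutorial. The `None` branch follows the TFP documentation, which describes the default as moving all but the first batch axis into the event; treat that as an assumption to verify against your installed version.

```python
def independent_shapes(batch_shape, event_shape, reinterpreted_batch_ndims=None):
    """Return (batch_shape, event_shape) after wrapping with Independent."""
    batch_shape, event_shape = tuple(batch_shape), tuple(event_shape)
    if reinterpreted_batch_ndims is None:
        # Assumed default: keep only the first batch axis as batch
        reinterpreted_batch_ndims = max(len(batch_shape) - 1, 0)
    split = len(batch_shape) - reinterpreted_batch_ndims
    # The rightmost batch dims are moved to the front of the event shape
    return batch_shape[:split], batch_shape[split:] + event_shape

print(independent_shapes((2, 3), ()))           # ((2,), (3,))
print(independent_shapes((2, 1, 2, 3), (), 2))  # ((2, 1), (2, 3))
print(independent_shapes((10,), ()))            # ((10,), ())
```

The last call shows a common pitfall: with a rank-1 batch the default moves nothing, so `reinterpreted_batch_ndims=1` has to be passed explicitly to obtain a joint distribution.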

## 4. **Complex Multi-Dimensional Sampling**

### 4.1 **Advanced Batch Structures**

In [70]:
# Create complex multi-dimensional batch structure
rates = [
    [[[1., 1.5, 0.8], [0.3, 0.4, 1.8]]],
    [[[0.2, 0.4, 1.4], [0.4, 1.1, 0.9]]]
]

exp = tfd.Exponential(rate=rates)
print(exp)
# Output: batch_shape=[2, 1, 2, 3], event_shape=[]

# Apply Independent with reinterpreted_batch_ndims=2
ind_exp = tfd.Independent(exp, reinterpreted_batch_ndims=2)
print(ind_exp)
# Output: batch_shape=[2, 1], event_shape=[2, 3]

tfp.distributions.Exponential("Exponential", batch_shape=[2, 1, 2, 3], event_shape=[], dtype=float32)
tfp.distributions.Independent("IndependentExponential", batch_shape=[2, 1], event_shape=[2, 3], dtype=float32)


**Complex Shape Analysis**:
- **Original**: `batch_shape=[2, 1, 2, 3], event_shape=[]` → 12 independent scalars
- **After Independent**: `batch_shape=[2, 1], event_shape=[2, 3]` → 2×1 = 2 joint distributions over 2×3 matrices

### 4.2 **Multi-Dimensional Sampling**

In [None]:
# Sample with specific sample shape
rates = [
    [[[1., 1.5, 0.8], [0.3, 0.4, 1.8]]],
    [[[0.2, 0.4, 1.4], [0.4, 1.1, 0.9]]]
]

exp = tfd.Exponential(rate=rates)
ind_exp = tfd.Independent(exp, reinterpreted_batch_ndims=2)

# Sample with shape [4, 2]
complex_samples = ind_exp.sample([4, 2])
print("Complex samples shape:", complex_samples.shape)  
print("Complex samples structure:")
print("  Sample shape: [4, 2]")
print("  Batch shape: [2, 1]") 
print("  Event shape: [2, 3]")
print("  Final shape: (4, 2, 2, 1, 2, 3)")

Complex samples shape: (4, 2, 2, 1, 2, 3)
Complex samples structure:
  Sample shape: [4, 2]
  Batch shape: [2, 1]
  Event shape: [2, 3]
  Final shape: (4, 2, 2, 1, 2, 3)


## 5. **Log Probability Computations**

### 5.1 **Basic Log Probability Evaluation**

In [None]:
# Create the same distribution structure
rates = [
    [[[1., 1.5, 0.8], [0.3, 0.4, 1.8]]],
    [[[0.2, 0.4, 1.4], [0.4, 1.1, 0.9]]]
]

exp = tfd.Exponential(rate=rates)
ind_exp = tfd.Independent(exp, reinterpreted_batch_ndims=2)

# Single scalar input (broadcasts across all distributions)
scalar_log_prob = ind_exp.log_prob(0.5)
print("Scalar log prob shape:", scalar_log_prob.shape)  
print("Scalar log prob values:", scalar_log_prob)


Scalar log prob shape: (2, 1)
Scalar log prob values: tf.Tensor(
[[-4.2501554]
 [-5.3155975]], shape=(2, 1), dtype=float32)


**Broadcasting Behavior**:
- **Input**: `0.5` (scalar)
- **Distribution event_shape**: `[2, 3]` 
- **Broadcasting**: Scalar expands to `[2, 3]` filled with `0.5`
- **Output**: `(2, 1)` - one joint log probability per batch member

### 5.2 **Vector Input Log Probability**

In [None]:
# Vector input matching event shape
rates = [
    [[[1., 1.5, 0.8], [0.3, 0.4, 1.8]]],
    [[[0.2, 0.4, 1.4], [0.4, 1.1, 0.9]]]
]

exp = tfd.Exponential(rate=rates)
ind_exp = tfd.Independent(exp, reinterpreted_batch_ndims=2)

# Input shape (1, 3) broadcasts against event_shape [2, 3] (the row is reused for both event rows)
vector_input = [[0.3, 0.5, 0.8]]  # Shape: (1, 3)
vector_log_prob = ind_exp.log_prob(vector_input)
print("Vector log prob shape:", vector_log_prob.shape)  
print("Vector log prob values:", vector_log_prob)

Vector log prob shape: (2, 1)
Vector log prob values: tf.Tensor(
[[-4.7701554]
 [-5.8855977]], shape=(2, 1), dtype=float32)
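
**Hand Check (NumPy sketch)**: Both log probabilities in this section can be reproduced from the exponential log density `log(rate) - rate * x`, summed over the two event axes. The helper `joint_exponential_log_prob` is hypothetical and only mirrors what the `Independent` wrapper is expected to do here.

```python
import numpy as np

rates = np.array([
    [[[1.0, 1.5, 0.8], [0.3, 0.4, 1.8]]],
    [[[0.2, 0.4, 1.4], [0.4, 1.1, 0.9]]],
])  # shape (2, 1, 2, 3): batch (2, 1) followed by event (2, 3)

def joint_exponential_log_prob(x, rates, event_ndims=2):
    x = np.asarray(x, dtype=np.float64)
    elementwise = np.log(rates) - rates * x     # x broadcasts against rates
    event_axes = tuple(range(-event_ndims, 0))  # the last `event_ndims` axes
    return elementwise.sum(axis=event_axes)     # sum out the event dims

scalar_result = joint_exponential_log_prob(0.5, rates)
vector_result = joint_exponential_log_prob([[0.3, 0.5, 0.8]], rates)
print(scalar_result)  # compare with the scalar output in 5.1
print(vector_result)  # compare with the vector output in 5.2
```

Up to float32 rounding, the results agree with the TFP outputs shown above, which confirms that the `(1, 3)` input is reused for both rows of the `[2, 3]` event.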


## 6. **Advanced Broadcasting & Shape Mechanics**

### 6.1 **Complex Broadcasting Patterns**

In [74]:
tfd = tfp.distributions

# Create complex distribution structure
rates = [
    [[[1., 1.5, 0.8], [0.3, 0.4, 1.8]]],
    [[[0.2, 0.4, 1.4], [0.4, 1.1, 0.9]]]
]

exp = tfd.Exponential(rate=rates)
ind_exp = tfd.Independent(exp, reinterpreted_batch_ndims=2)

# Input whose trailing dims (2, 1) broadcast against event_shape [2, 3]
complex_input = tf.random.uniform((5, 1, 1, 2, 1))
print("Complex input shape:", complex_input.shape)  

complex_log_prob = ind_exp.log_prob(complex_input)
print("Complex log prob shape:", complex_log_prob.shape)

Complex input shape: (5, 1, 1, 2, 1)
Complex log prob shape: (5, 2, 1)


**Advanced Broadcasting Rules**:
- **Distribution**: `batch_shape=[2, 1], event_shape=[2, 3]`
- **Input**: `(5, 1, 1, 2, 1)`
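- **Broadcasting**: The trailing input dims `(2, 1)` broadcast against `event_shape=[2, 3]`; the leading dims `(5, 1, 1)` broadcast against `batch_shape=[2, 1]`
- **Output**: `(5, 2, 1)`, the broadcast of `(5, 1, 1)` with `(2, 1)` once the event dims are summed out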
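
**Predicting the Output Shape (sketch)**: The rule can be phrased as "broadcast the input against `batch_shape + event_shape`, then drop the event dims". The helper below is a hypothetical utility built on `np.broadcast_shapes` (assumed available, NumPy 1.20 or later); it only does shape arithmetic and never touches a distribution.

```python
import numpy as np

def predict_log_prob_shape(x_shape, batch_shape, event_shape):
    """Predict the shape of dist.log_prob(x) from shapes alone."""
    x_shape = tuple(x_shape)
    batch_shape, event_shape = tuple(batch_shape), tuple(event_shape)
    # Step 1: broadcast the input against the full distribution shape
    full = np.broadcast_shapes(x_shape, batch_shape + event_shape)
    # Step 2: the event dims are reduced away by log_prob
    return full[:len(full) - len(event_shape)]

print(predict_log_prob_shape((5, 1, 1, 2, 1), (2, 1), (2, 3)))  # (5, 2, 1)
print(predict_log_prob_shape((), (2, 1), (2, 3)))               # (2, 1)
```

`np.broadcast_shapes` raises `ValueError` for incompatible shapes, so the same helper doubles as a quick compatibility check before calling `log_prob`.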

### 6.2 **Multivariate Distribution with Independent**

In [None]:
tfd = tfp.distributions

# Create multivariate distribution with Independent wrapper
loc = tf.zeros((2, 3, 1))          # Shape: (2, 3, 1)
scale_diag = tf.ones(4)            # Shape: (4,)

# Create base multivariate normal
base_mv_normal = tfd.MultivariateNormalDiag(loc=loc, scale_diag=scale_diag)
print("Base distribution:")
print(f"  batch_shape: {base_mv_normal.batch_shape}")  
print(f"  event_shape: {base_mv_normal.event_shape}") 

# Wrap with Independent
dist = tfd.Independent(base_mv_normal)  # Default reinterpreted_batch_ndims=1
print("After Independent:")
print(f"  batch_shape: {dist.batch_shape}")  
print(f"  event_shape: {dist.event_shape}")  

# Log probability evaluation
test_input = tf.random.uniform((2, 1, 1, 4))
result_log_prob = dist.log_prob(test_input)
print("Result shape:", result_log_prob.shape)


Base distribution:
  batch_shape: (2, 3)
  event_shape: (4,)
After Independent:
  batch_shape: (2,)
  event_shape: (3, 4)
Result shape: (2, 2)


**Multivariate + Independent Analysis**:
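- **Base**: `MultivariateNormalDiag` with `batch_shape=[2, 3], event_shape=[4]` (`loc` of shape `(2, 3, 1)` broadcasts against `scale_diag` of shape `(4,)`)
- **Independent**: Moves the last batch dimension into the event: `batch_shape=[2], event_shape=[3, 4]`
- **Result**: Joint log probability over `[3, 4]`-shaped events; the input's leading dims `(2, 1)` broadcast against `batch_shape=[2]` to give `(2, 2)`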
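
**Step-by-Step Shapes (NumPy sketch)**: The `(2, 2)` result is easier to trust after walking through the intermediate shapes by hand. The code below re-implements the diagonal normal log density in NumPy as an illustration; it is a sketch of the expected arithmetic, not TFP's internal implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
loc = np.zeros((2, 3, 1))           # same shapes as in the cell above
scale_diag = np.ones(4)
x = rng.uniform(size=(2, 1, 1, 4))  # stand-in for test_input

# Elementwise normal log density; x, loc and scale_diag broadcast together
z = (x - loc) / scale_diag
elementwise = -0.5 * z**2 - np.log(scale_diag) - 0.5 * np.log(2.0 * np.pi)

mvn_log_prob = elementwise.sum(axis=-1)     # sum over the event dim of size 4
joint_log_prob = mvn_log_prob.sum(axis=-1)  # Independent sums the reinterpreted dim of size 3

print(elementwise.shape, mvn_log_prob.shape, joint_log_prob.shape)
# (2, 2, 3, 4) (2, 2, 3) (2, 2)
```

The extra leading `2` comes from the input's first axis, which does not line up with the batch axis and is therefore broadcast against it.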

## 7. **Professional Patterns & Best Practices**

### 7.1 **Efficient Sampling Strategies**

In [76]:
def efficient_monte_carlo_estimation(distribution, n_samples=1000, chunk_size=100):
    """
    Memory-efficient Monte Carlo sampling for large n_samples
    """
    samples = []
    log_probs = []
    
    for i in range(0, n_samples, chunk_size):
        current_chunk = min(chunk_size, n_samples - i)
        
        # Sample and evaluate in chunks to manage memory
        chunk_samples = distribution.sample(current_chunk)
        chunk_log_probs = distribution.log_prob(chunk_samples)
        
        samples.append(chunk_samples)
        log_probs.append(chunk_log_probs)
    
    return tf.concat(samples, axis=0), tf.concat(log_probs, axis=0)

# Example usage
# reinterpreted_batch_ndims=1 is passed explicitly: for a rank-1 batch the default reinterprets nothing
dist = tfd.Independent(tfd.Normal(loc=tf.zeros(10), scale=tf.ones(10)), reinterpreted_batch_ndims=1)
samples, log_probs = efficient_monte_carlo_estimation(dist, n_samples=5000)
print(f"Samples shape: {samples.shape}")    
print(f"Log probs shape: {log_probs.shape}")  

Samples shape: (5000, 10)
Log probs shape: (5000,)
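
**Streaming Alternative (NumPy sketch)**: The function above still concatenates every chunk, so its peak memory grows with `n_samples`. When only an expectation is needed, a running sum avoids storing the samples at all. The sketch below assumes a hypothetical `sample_fn(n)` callable that returns `n` draws; with TFP, something like `lambda n: dist.sample(n).numpy()` could play that role.

```python
import numpy as np

def streaming_mc_mean(sample_fn, function, n_samples=100_000, chunk_size=10_000):
    """Estimate E[function(X)] while holding only one chunk in memory."""
    total = 0.0
    count = 0
    for start in range(0, n_samples, chunk_size):
        current_chunk = min(chunk_size, n_samples - start)
        chunk = sample_fn(current_chunk)         # only this chunk is alive
        total += float(np.sum(function(chunk)))  # accumulate the running sum
        count += current_chunk
    return total / count

rng = np.random.default_rng(0)
# E[X^2] for a standard normal is exactly 1
estimate = streaming_mc_mean(lambda n: rng.standard_normal(n), np.square)
print(estimate)
```

The trade-off is that individual samples are no longer available afterwards, so this pattern fits expectation estimates rather than diagnostics that need the full sample set.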


### 7.2 **Broadcasting Validation Utilities**

In [77]:
def validate_log_prob_input_shape(distribution, input_tensor):
    """
    Validate that input tensor can be evaluated by distribution.log_prob()
    """
    batch_shape = distribution.batch_shape
    event_shape = distribution.event_shape
    input_shape = input_tensor.shape
    
    print("=== LOG PROB INPUT VALIDATION ===")
    print(f"Distribution batch_shape: {batch_shape}")
    print(f"Distribution event_shape: {event_shape}")
    print(f"Input tensor shape: {input_shape}")
    
    # Check if input is compatible with event shape
    event_ndims = len(event_shape)
    if len(input_shape) < event_ndims:
        print("❌ ERROR: Input rank too small for event shape")
        return False
    
    input_event_shape = input_shape[-event_ndims:] if event_ndims > 0 else tf.TensorShape([])
    
    # Check event shape compatibility
    try:
        tf.broadcast_static_shape(input_event_shape, event_shape)
        print("✅ Event shape compatible")
    except ValueError:  # tf.broadcast_static_shape raises ValueError for incompatible shapes
        print("❌ ERROR: Input event shape incompatible")
        return False
    
    # Predict output shape
    try:
        sample_and_batch_shape = input_shape[:-event_ndims] if event_ndims > 0 else input_shape
        output_shape = tf.broadcast_static_shape(sample_and_batch_shape, batch_shape)
        print(f"✅ Expected output shape: {output_shape}")
        return True
    except ValueError:  # tf.broadcast_static_shape raises ValueError for incompatible shapes
        print("❌ ERROR: Batch broadcasting will fail")
        return False

# Example validation
dist = tfd.Independent(tfd.Normal(loc=tf.zeros((3, 4)), scale=tf.ones((3, 4))))
input_tensor = tf.random.normal((2, 1, 4))
is_valid = validate_log_prob_input_shape(dist, input_tensor)

=== LOG PROB INPUT VALIDATION ===
Distribution batch_shape: (3,)
Distribution event_shape: (4,)
Input tensor shape: (2, 1, 4)
✅ Event shape compatible
✅ Expected output shape: (2, 3)


### 7.3 **Probabilistic Training Loss Patterns**

In [78]:
class ProbabilisticLoss(tf.keras.losses.Loss):
    """
    Custom loss for probabilistic outputs using log_prob
    """
    def __init__(self, reduction=tf.keras.losses.Reduction.AUTO, name='probabilistic_loss'):
        super().__init__(reduction=reduction, name=name)
    
    def call(self, y_true, y_pred_distribution):
        """
        y_true: Ground truth values
        y_pred_distribution: TFP Distribution object
        """
        # Negative log likelihood
        nll = -y_pred_distribution.log_prob(y_true)
        return nll

# Example usage in training
@tf.function
def train_step_with_sampling(features, targets, model, optimizer):
    """
    Training step that uses both sampling and log_prob
    """
    with tf.GradientTape() as tape:
        # Model outputs distribution
        pred_distribution = model(features, training=True)
        
        # Main loss: negative log likelihood
        nll_loss = -tf.reduce_mean(pred_distribution.log_prob(targets))
        
        # Regularization using samples
        samples = pred_distribution.sample(5)
        sample_std = tf.math.reduce_std(samples, axis=0)
        regularization = 0.01 * tf.reduce_mean(sample_std)  # Added to the loss, this penalizes sample spread (subtract it to encourage diversity)
        
        total_loss = nll_loss + regularization
    
    gradients = tape.gradient(total_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    
    return {
        'total_loss': total_loss,
        'nll_loss': nll_loss,
        'regularization': regularization
    }

## 8. **Expert Applications**

### 8.1 **Monte Carlo Variational Inference**

In [79]:
def monte_carlo_elbo(encoder, decoder, data, n_samples=10):
    """
    Compute ELBO using Monte Carlo estimation with sampling and log_prob
    """
    # Encode to get posterior distribution
    posterior = encoder(data)
    
    # Sample from posterior
    z_samples = posterior.sample(n_samples)  # Shape: (n_samples, batch_size, latent_dim)
    
    # Decode samples
    reconstruction_logits = decoder(z_samples)
    
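    # Prior distribution: standard normal over the latent dimension
    # (a scalar Normal has no batch dim to reinterpret, so loc needs shape (latent_dim,))
    latent_dim = z_samples.shape[-1]
    prior = tfd.Independent(tfd.Normal(loc=tf.zeros(latent_dim), scale=1.), reinterpreted_batch_ndims=1)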
    
    # Compute ELBO components
    # 1. Reconstruction term: E_q[log p(x|z)]
    reconstruction_dist = tfd.Independent(
        tfd.Bernoulli(logits=reconstruction_logits), 
        reinterpreted_batch_ndims=1
    )
    reconstruction_term = tf.reduce_mean(reconstruction_dist.log_prob(data))
    
    # 2. KL divergence: E_q[log q(z|x) - log p(z)]
    posterior_log_prob = tf.reduce_mean(posterior.log_prob(z_samples))
    prior_log_prob = tf.reduce_mean(prior.log_prob(z_samples))
    kl_divergence = posterior_log_prob - prior_log_prob
    
    # ELBO = Reconstruction - KL
    elbo = reconstruction_term - kl_divergence
    
    return {
        'elbo': elbo,
        'reconstruction': reconstruction_term,
        'kl_divergence': kl_divergence
    }
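
**Sanity Check for the KL Term (NumPy sketch)**: The Monte Carlo KL estimate `E_q[log q(z) - log p(z)]` used above can be checked against the closed form in a one-dimensional Gaussian case. The values `mu` and `sigma` below are arbitrary stand-ins for an encoder output, and `normal_log_prob` is a hypothetical helper written for this check.

```python
import numpy as np

def normal_log_prob(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5  # q(z) = Normal(mu, sigma); p(z) = Normal(0, 1)

# Monte Carlo estimate: sample from q, average log q(z) - log p(z)
z = rng.normal(mu, sigma, size=200_000)
kl_mc = np.mean(normal_log_prob(z, mu, sigma) - normal_log_prob(z, 0.0, 1.0))

# Closed form for KL(Normal(mu, sigma) || Normal(0, 1))
kl_exact = -np.log(sigma) + 0.5 * (sigma**2 + mu**2) - 0.5

print("Monte Carlo KL:", kl_mc)
print("Exact KL:      ", kl_exact)
```

When both distributions are Gaussian, the closed form has no sampling noise; the sampled estimate remains useful for posteriors without an analytic KL.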

### 8.2 **Advanced Sampling Techniques**

In [None]:
class ImportanceSampler:
    """
    Importance sampling using proposal distribution
    """
    def __init__(self, target_distribution, proposal_distribution):
        self.target = target_distribution
        self.proposal = proposal_distribution
    
    def sample(self, n_samples):
        """
        Generate importance-weighted samples
        """
        # Sample from proposal
        samples = self.proposal.sample(n_samples)
        
        # Compute importance weights
        target_log_prob = self.target.log_prob(samples)
        proposal_log_prob = self.proposal.log_prob(samples)
        log_weights = target_log_prob - proposal_log_prob
        
        # Normalize weights
        weights = tf.nn.softmax(log_weights)
        
        return {
            'samples': samples,
            'weights': weights,
            'log_weights': log_weights,
            'effective_sample_size': 1.0 / tf.reduce_sum(weights**2)
        }
    
    def estimate_expectation(self, function, n_samples):
        """
        Estimate E_target[function(X)] using importance sampling
        """
        result = self.sample(n_samples)
        function_values = function(result['samples'])
        
        # Weighted average
        expectation = tf.reduce_sum(result['weights'] * function_values)
        
        return expectation, result['effective_sample_size']

# Example usage
target = tfd.Normal(loc=5., scale=1.)
proposal = tfd.Normal(loc=4., scale=2.)
sampler = ImportanceSampler(target, proposal)

# Estimate E[X^2] under target distribution
def square_function(x):
    return x**2

expectation, ess = sampler.estimate_expectation(square_function, n_samples=1000)
print(f"Estimated E[X^2]: {expectation}")
print(f"Effective sample size: {ess}")

Estimated E[X^2]: 26.391807556152344
Effective sample size: 568.8297119140625


## 9. **Complete Reference Guide**

### 9.1 **Sampling Operations Reference**

In [81]:
# === BASIC SAMPLING ===
dist = tfd.Normal(loc=0., scale=1.)
single = dist.sample()                          # Shape: ()
batch = dist.sample(100)                        # Shape: (100,)
seeded = dist.sample(100, seed=42)              # Reproducible

print("Single sample:", single.numpy())
print("Batch samples:", batch.numpy())
print("Seeded samples:", seeded.numpy())

Single sample: -0.5220753
Batch samples: [-0.1365969  -1.218801   -0.9215488   2.7218962   1.5125786   1.4108888
  1.6950028   1.4028981   0.7990797  -0.65465575  1.0263169   0.8830985
  0.04019995 -1.0114911   0.25672132 -0.0426529   0.17955922 -0.99527174
 -0.601428    1.4637355   1.6804155  -0.70276546 -0.9713734  -1.0440786
 -0.86533374  0.02161358  0.7340682   2.1229079   0.7858814  -0.5298925
 -1.0749215   1.1408054   0.3555776  -0.08745395  0.00865405 -1.32541
  0.39447325  0.56572866  0.25718474  0.77076024  0.8543685  -1.1066352
 -0.62045103  0.98014045 -0.30651346  1.415835   -1.933607    0.85357845
  1.3835907   0.25166965 -1.2337768   1.040264   -2.7738826  -0.6555695
  2.380446   -1.4398961  -0.21188603 -0.09578023  0.5760158   0.41028446
 -0.93355393  1.5430417  -2.0077636   0.45634276 -1.5455538  -0.21137801
  0.60432875  1.9447913  -1.3108555  -1.3553311  -1.6904684   0.6628786
  0.64199805  0.30658808  2.322921   -1.2522532   0.9507786   1.1831026
  1.0265836  -0.67858

In [82]:
# === BATCH SAMPLING ===
batch_dist = tfd.Normal(loc=[0., 1.], scale=[1., 2.])
batch_samples = batch_dist.sample(50)           # Shape: (50, 2)

print("Batch samples shape:", batch_samples.shape)

Batch samples shape: (50, 2)


In [83]:
# === MULTIVARIATE SAMPLING ===
mv_dist = tfd.MultivariateNormalDiag(loc=[0., 1.], scale_diag=[1., 2.])
mv_samples = mv_dist.sample(30)                 # Shape: (30, 2)
print("Multivariate samples shape:", mv_samples.shape)

Multivariate samples shape: (30, 2)


In [84]:
# === INDEPENDENT SAMPLING ===
indep_dist = tfd.Independent(tfd.Normal(loc=tf.zeros(5), scale=tf.ones(5)))
indep_samples = indep_dist.sample(20)           # Shape: (20, 5)
print("Independent samples shape:", indep_samples.shape)

Independent samples shape: (20, 5)


In [85]:
# === COMPLEX SAMPLING ===
complex_samples = dist.sample([10, 3, 2])       # Shape: (10, 3, 2) + batch_shape + event_shape
print("Complex samples shape:", complex_samples.shape)

Complex samples shape: (10, 3, 2)


### 9.2 **Log Probability Operations Reference**

In [86]:
# Creating distribution
dist = tfd.Normal(loc=0., scale=1.)
print(dist.log_prob(0.))                        # Scalar input
print(dist.log_prob([0., 1., 2.]))              # Vector input

tf.Tensor(-0.9189385, shape=(), dtype=float32)
tf.Tensor([-0.9189385 -1.4189385 -2.9189386], shape=(3,), dtype=float32)


In [87]:
# === BASIC LOG PROB ===
scalar_log_prob = dist.log_prob(0.5)            # Shape: ()
vector_log_prob = dist.log_prob([0., 1., 2.])   # Shape: (3,)

print("Scalar log prob:", scalar_log_prob.numpy())
print("Vector log prob:", vector_log_prob.numpy())

Scalar log prob: -1.0439385
Vector log prob: [-0.9189385 -1.4189385 -2.9189386]


In [88]:
# === BATCH LOG PROB ===
batch_dist = tfd.Normal(loc=[0., 1.], scale=[1., 2.])
batch_log_prob = batch_dist.log_prob([0.5, 1.5])  

print("Batch log prob:", batch_log_prob.numpy())

Batch log prob: [-1.0439385 -1.6433357]


In [89]:
# === MULTIVARIATE LOG PROB ===
mv_dist = tfd.MultivariateNormalDiag(loc=[0., 1.], scale_diag=[1., 2.])
mv_log_prob = mv_dist.log_prob([0.5, 1.5])      # Shape: () - joint probability

print("Multivariate log prob:", mv_log_prob.numpy())

Multivariate log prob: -2.6872742


In [90]:
# === INDEPENDENT LOG PROB ===
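# With a rank-1 batch, the default reinterpreted_batch_ndims is 0, so nothing is summed here;
# pass reinterpreted_batch_ndims=1 to get the scalar joint log probability (sum of components)
indep_dist = tfd.Independent(tfd.Normal(loc=tf.zeros(3), scale=tf.ones(3)))
indep_log_prob = indep_dist.log_prob([0., 1., 2.])  # Shape: (3,)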
print("Independent log prob:", indep_log_prob.numpy())

Independent log prob: [-0.9189385 -1.4189385 -2.9189386]


In [91]:
# === BROADCASTING LOG PROB ===
broadcast_log_prob = dist.log_prob(tf.ones((5, 3, 2)))  # Broadcasting rules apply
print("Broadcast log prob shape:", broadcast_log_prob.shape)  

Broadcast log prob shape: (5, 3, 2)


### 9.3 **Shape Rules for Sampling and Log Prob**

| **Operation** | **Distribution Shape** | **Input/Sample Shape** | **Output Shape** |
|---------------|------------------------|------------------------|------------------|
| **`sample(S)`** | `batch=[B], event=[E]` | `sample_shape=[S]` | `[S] + [B] + [E]` |
| **`sample(n)`** | `batch=[B], event=[E]` | `n` (scalar) | `[n] + [B] + [E]` |
| **`log_prob(x)`** | `batch=[B], event=[E]` | `x.shape` (broadcastable to `[B] + [E]`) | `broadcast(x.shape, [B] + [E])` with the last `len(E)` dims removed |

## 💡 **Final Notes**

- **The Sampling-Inference Cycle**: Modern probabilistic ML alternates between **sampling** (generating data, exploring parameter space) and **log_prob evaluation** (measuring likelihood, computing gradients). Master both to unlock the full power of probabilistic modeling.

- **Memory vs Accuracy Trade-off**: Large sample sizes give better Monte Carlo estimates but consume more memory. Use chunked sampling for large-scale problems.

- **Log-Space Numerics**: Always use `log_prob()` instead of `prob()` for numerical stability. Probabilities can underflow, but log probabilities remain stable.

- **Shape Debugging Strategy**: When shapes don't work, trace through: `sample_shape + batch_shape + event_shape` for sampling, and broadcasting rules for log_prob inputs.

- **Broadcasting is Your Friend**: TFP's broadcasting enables vectorized operations across complex hierarchies. Learn it well—it's essential for efficient probabilistic computation.

- **Gradient Flow**: `log_prob()` is differentiable through its inputs and distribution parameters. This enables gradient-based optimization of probabilistic models—the foundation of modern deep learning.

- Sampling and log probabilities are the **computational engines** of probabilistic machine learning. Every advanced technique—from VAEs to Bayesian neural networks to MCMC—builds on these fundamental operations. Master them, and you unlock the full potential of uncertainty quantification and probabilistic reasoning! 🚀
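
**Underflow Demonstration (NumPy sketch)**: The log-space note above is easy to verify numerically. The example assumes 1000 independent standard-normal observations placed at the mode, the most favourable case for the raw product, and compares multiplying densities with summing log densities.

```python
import numpy as np

x = np.zeros(1000)  # 1000 observations at the mode of Normal(0, 1)
log_p = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)  # per-observation log density

joint_prob = np.prod(np.exp(log_p))  # product of densities underflows to 0.0
joint_log_prob = np.sum(log_p)       # sum of log densities stays finite

print("Product of probabilities:", joint_prob)
print("Sum of log probabilities:", joint_log_prob)
```

Even in float64 the product collapses to exactly zero, while the summed log density (about -918.94) keeps full information for optimization and model comparison.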


> Once comfortable with sampling and log probabilities, explore bijectors for distribution transformations, custom training loops with `tf.GradientTape`, and advanced inference techniques like Hamiltonian Monte Carlo to build cutting-edge probabilistic systems.
