# Probability & Statistics for Neural Networks

This notebook contains PyTorch examples demonstrating probability and statistics concepts essential for understanding neural networks.

## Table of Contents
1. [Expectation](#expectation)
2. [Variance](#variance)
3. [Softmax](#softmax)
4. [Cross-Entropy Loss](#cross-entropy-loss)

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt

## Expectation

**Formula:** $\mathbb{E}[X] = \sum_x x \cdot P(X = x)$

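The probability-weighted average of the values a random variable can take. In practice the sum is rarely computed exactly; instead, $\mathbb{E}[X]$ is estimated by the sample mean of observed values (a Monte Carlo estimate).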
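The formula can be checked on a small discrete distribution. The sketch below assumes a fair six-sided die as the example distribution (an illustration, not part of the original notebook). It computes the exact sum $\sum_x x \cdot P(X = x) = 3.5$ and compares it with the sample mean of simulated rolls, which should land close to 3.5.

```python
import torch

# Exact expectation of a fair six-sided die: E[X] = sum_x x * P(X = x)
values = torch.arange(1, 7, dtype=torch.float64)      # outcomes 1..6
probs = torch.full((6,), 1 / 6, dtype=torch.float64)  # uniform probabilities
exact = torch.sum(values * probs)

# Monte Carlo estimate: the average of many simulated rolls
torch.manual_seed(0)
rolls = torch.randint(1, 7, (100_000,)).to(torch.float64)
estimate = rolls.mean()

print(f"Exact E[X]: {exact.item():.4f}")
print(f"Monte Carlo estimate: {estimate.item():.4f}")
```

With 100,000 rolls the standard error of the mean is about 0.005, so the estimate typically agrees with 3.5 to two decimal places.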

In [None]:
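# Monte Carlo estimation of expectation: the sample mean approximates E[X]
def estimate_expectation(samples):
    return torch.mean(samples)

# Sanity check: for X ~ N(3, 1), E[X] = 3
samples = 3.0 + torch.randn(100_000)
print(f"Monte Carlo estimate of E[X]: {estimate_expectation(samples):.4f} (true value: 3.0)")

# Expectation in neural network training: the training objective is the
# expected loss over the data distribution, estimated by averaging over batches
data_loader = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(100)]
model = torch.nn.Linear(10, 2)
criterion = torch.nn.CrossEntropyLoss()  # averages the loss within each batch

# Compute expected loss over the dataset (evaluation only, so no gradients)
total_loss = 0.0
num_batches = 0
with torch.no_grad():
    for x, y in data_loader:
        loss = criterion(model(x), y)
        total_loss += loss.item()
        num_batches += 1

# Averaging per-batch means equals the per-sample mean here because all batches have equal size
expected_loss = total_loss / num_batches
print(f"Expected loss over dataset: {expected_loss:.4f}")

# Batch normalization centers each feature with its batch mean, an estimate of E[X]
x = torch.randn(100, 50)
batch_mean = torch.mean(x, dim=0)  # E[X] per feature
print(f"Feature means: {batch_mean[:5]}")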
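Batch normalization uses the per-feature batch mean together with the per-feature batch variance. A minimal sketch, assuming `torch.nn.BatchNorm1d` with `affine=False` so that no learned scale or shift is applied: in training mode it normalizes each feature as $(x - \mathbb{E}[x]) / \sqrt{\text{Var}(x) + \epsilon}$ using the biased batch variance, which the manual computation below reproduces. The input shift and scale (mean 5, std 3) are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(100, 50) * 3.0 + 5.0  # 100 samples, 50 features with mean ~5, std ~3

# Manual normalization: subtract the per-feature batch mean E[X],
# divide by the per-feature batch standard deviation
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)  # BatchNorm normalizes with the biased (divide-by-N) variance
eps = 1e-5                          # BatchNorm1d's default eps
x_manual = (x - mean) / torch.sqrt(var + eps)

# Reference: BatchNorm1d without learnable parameters, in training mode (uses batch statistics)
bn = torch.nn.BatchNorm1d(50, affine=False)
bn.train()
x_bn = bn(x)

print(f"Manual matches BatchNorm1d: {torch.allclose(x_manual, x_bn, atol=1e-4)}")
print(f"Normalized feature means: {x_manual.mean(dim=0)[:3]}")
```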

## Variance

**Formula:** $\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]$

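The expected squared deviation from the mean; it measures how spread out a distribution is around $\mathbb{E}[X]$. Its square root, the standard deviation, has the same units as $X$.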
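The definition can be evaluated directly on samples and compared with the equivalent form $\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$ and with `torch.var`. One detail worth knowing: `torch.var` divides by $N - 1$ (Bessel's correction) by default, while the definition applied to a sample divides by $N$; passing `unbiased=False` gives the divide-by-$N$ version. The sketch assumes normal samples with true variance 4.

```python
import torch

torch.manual_seed(0)
x = torch.randn(10_000, dtype=torch.float64) * 2.0  # samples with true variance 4

# Definition: Var(X) = E[(X - E[X])^2]
var_def = torch.mean((x - x.mean()) ** 2)

# Equivalent form: Var(X) = E[X^2] - (E[X])^2
var_alt = torch.mean(x ** 2) - x.mean() ** 2

# torch.var divides by N - 1 by default; unbiased=False divides by N
var_unbiased = torch.var(x)
var_biased = torch.var(x, unbiased=False)

print(f"Definition: {var_def:.4f}, E[X^2] - E[X]^2: {var_alt:.4f}")
print(f"torch.var (N-1): {var_unbiased:.4f}, torch.var (N): {var_biased:.4f}")
```

For large $N$ the two conventions differ only slightly; the distinction matters for small batches.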

In [None]:
# Variance in weight initialization
def xavier_init(input_dim, output_dim):
    """Xavier/Glorot initialization: Var(W) = 2 / (fan_in + fan_out).
    Keeps activation variance roughly constant across layers for linear/tanh activations."""
    std = (2.0 / (input_dim + output_dim)) ** 0.5
    return torch.randn(output_dim, input_dim) * std

def he_init(input_dim, output_dim):
    """He/Kaiming initialization for ReLU networks: Var(W) = 2 / fan_in.
    The factor 2 compensates for ReLU zeroing out roughly half of its inputs."""
    std = (2.0 / input_dim) ** 0.5
    return torch.randn(output_dim, input_dim) * std

def random_init(input_dim, output_dim):
    """Naive baseline: fixed standard deviation of 0.1, independent of layer width."""
    return torch.randn(output_dim, input_dim) * 0.1

# Compare initialization schemes by propagating the same input through a
# 5-layer ReLU network, keeping separate activations for each scheme.
# Note: with fan_in = fan_out = 100, Xavier's std is sqrt(0.01) = 0.1,
# so the Xavier and naive schemes coincide in this particular setup.
inits = {"Xavier": xavier_init, "He": he_init, "Random": random_init}
x = torch.randn(1000, 100)
activations = {name: x for name in inits}

for i in range(5):
    for name, init in inits.items():
        h = activations[name]
        w = init(h.shape[1], 100)
        activations[name] = torch.relu(h @ w.T)
    summary = ", ".join(f"{name}: {torch.var(a):.4f}" for name, a in activations.items())
    print(f"Layer {i+1} variance - {summary}")
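PyTorch also ships these schemes in `torch.nn.init`. The short sketch below fills a weight matrix with `xavier_normal_` and `kaiming_normal_` and compares the empirical standard deviation against the formulas above; the 100×100 shape is an arbitrary choice, and `nonlinearity="relu"` is passed explicitly so the He gain is $\sqrt{2}$.

```python
import torch

torch.manual_seed(0)
fan_in, fan_out = 100, 100
w = torch.empty(fan_out, fan_in)  # same (out, in) layout as nn.Linear.weight

# Xavier/Glorot normal: std = gain * sqrt(2 / (fan_in + fan_out)), gain defaults to 1
torch.nn.init.xavier_normal_(w)
xavier_std, xavier_expected = w.std().item(), (2 / (fan_in + fan_out)) ** 0.5
print(f"Xavier std: {xavier_std:.4f} (formula: {xavier_expected:.4f})")

# He/Kaiming normal for ReLU: std = sqrt(2 / fan_in) when mode="fan_in"
torch.nn.init.kaiming_normal_(w, mode="fan_in", nonlinearity="relu")
he_std, he_expected = w.std().item(), (2 / fan_in) ** 0.5
print(f"He std: {he_std:.4f} (formula: {he_expected:.4f})")
```

With 10,000 weights the empirical standard deviations should match the formulas to within about one percent.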

## Softmax

**Formula:** $\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$

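Converts a vector of real-valued scores (logits) into a probability distribution: every output lies in $(0, 1)$ and the outputs sum to 1. Larger logits receive exponentially larger probabilities.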
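Evaluating the formula literally can overflow: in float32, $e^{x_i}$ becomes `inf` once $x_i$ exceeds roughly 88.7. A common remedy is to subtract $\max_j x_j$ from every logit first; the factor $e^{-\max_j x_j}$ cancels between numerator and denominator, so the result is unchanged. The function name `stable_softmax` is our own, used only for this sketch; `torch.softmax` returns finite results for the same inputs.

```python
import torch

def stable_softmax(x, dim=-1):
    # Shift so the largest logit is 0: exp() then stays in [0, 1] and cannot overflow
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=dim, keepdim=True)

logits = torch.tensor([1000.0, 1001.0, 1002.0])
naive = torch.exp(logits) / torch.exp(logits).sum()  # exp(1000) overflows to inf, inf/inf = nan
stable = stable_softmax(logits)

print(f"Naive:  {naive}")
print(f"Stable: {stable}")
print(f"Matches torch.softmax: {torch.allclose(stable, torch.softmax(logits, dim=-1))}")
```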

In [None]:
# Multi-class classification with softmax
logits = torch.tensor([2.0, 1.0, 0.1])
probabilities = torch.softmax(logits, dim=0)
print(f"Logits: {logits}")
print(f"Probabilities: {probabilities}")
print(f"Sum: {probabilities.sum()}")

# Temperature: T < 1 sharpens the distribution, T > 1 flattens it toward uniform
def softmax_with_temperature(logits, temperature=1.0):
    return torch.softmax(logits / temperature, dim=-1)

temps = [0.1, 1.0, 10.0]
for temp in temps:
    probs = softmax_with_temperature(logits, temp)
    entropy = -torch.sum(probs * torch.log(probs + 1e-8))  # small epsilon guards against log(0)
    print(f"Temperature {temp}: {probs.numpy()} (entropy: {entropy:.3f})")

# Attention mechanism using softmax (scaled dot-product attention)
d_k = 64
query = torch.randn(1, d_k)
keys = torch.randn(10, d_k)
scores = query @ keys.T / d_k ** 0.5  # scale by sqrt(d_k) to keep score variance near 1
attention_weights = torch.softmax(scores, dim=-1)
print(f"Attention weights sum: {attention_weights.sum():.6f}")
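Scaled dot-product attention, as used in Transformers, divides the scores by $\sqrt{d}$ before the softmax, and the reason is a variance argument. If query and key entries are independent with mean 0 and variance 1, each product $q_i k_i$ has variance 1, so the dot product of two $d$-dimensional vectors has variance $d$. Large-variance scores act like a very low softmax temperature and push the weights toward one-hot. The check below assumes standard-normal queries and keys, an idealization of real activations.

```python
import torch

torch.manual_seed(0)
d = 64
q = torch.randn(10_000, d)  # 10,000 query vectors
k = torch.randn(10_000, d)  # 10,000 key vectors

# Each score sums d independent products with variance 1, so Var(q . k) = d
raw_scores = (q * k).sum(dim=-1)
scaled_scores = raw_scores / d ** 0.5  # dividing by sqrt(d) brings the variance back to ~1

print(f"Var of raw scores:    {raw_scores.var():.2f}")
print(f"Var of scaled scores: {scaled_scores.var():.2f}")
```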

## Cross-Entropy Loss

**Formula:** $\mathcal{L} = -\sum_i y_i \log(\hat{y}_i)$

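Measures how far the predicted distribution $\hat{y}$ is from the true distribution $y$. For a one-hot target with true class $c$, the sum collapses to $-\log \hat{y}_c$, the negative log-probability the model assigns to the correct class.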
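The formula can be checked against PyTorch's built-in loss. A minimal sketch, assuming a single sample with three classes: with a one-hot $y$ only the true-class term of the sum survives, and `F.cross_entropy` (which takes raw logits and applies `log_softmax` internally) gives the same value. The second half uses soft targets, where every term of the sum contributes; passing class probabilities as the target to `F.cross_entropy` requires PyTorch 1.10 or newer.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # raw model outputs for one sample
target = torch.tensor([0])                 # true class index

# Formula with one-hot y: every term except the true class is multiplied by 0
y = F.one_hot(target, num_classes=3).float()
log_y_hat = F.log_softmax(logits, dim=-1)  # log of the softmax probabilities
manual = -(y * log_y_hat).sum(dim=-1).mean()

# F.cross_entropy takes logits and applies log_softmax internally
builtin = F.cross_entropy(logits, target)

# Soft (non one-hot) targets: same formula, all terms contribute
y_soft = torch.tensor([[0.7, 0.2, 0.1]])
manual_soft = -(y_soft * log_y_hat).sum(dim=-1).mean()
builtin_soft = F.cross_entropy(logits, y_soft)

print(f"One-hot target: manual {manual:.4f}, F.cross_entropy {builtin:.4f}")
print(f"Soft target:    manual {manual_soft:.4f}, F.cross_entropy {builtin_soft:.4f}")
```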

In [None]:
import torch.nn.functional as F

# Cross-entropy for classification
predictions = torch.tensor([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])  # Predicted probabilities (rows sum to 1)
targets = torch.tensor([1, 0])  # True class indices

# F.cross_entropy expects raw logits and applies log_softmax itself; since these
# inputs are already probabilities, pass their log to F.nll_loss instead
ce_loss = F.nll_loss(torch.log(predictions), targets)
manual_ce = -torch.mean(torch.log(predictions[torch.arange(len(targets)), targets]))

print(f"Cross-entropy loss: {ce_loss:.4f}")
print(f"Manual calculation: {manual_ce:.4f}")

# Confidence and loss relationship
confident_wrong = torch.tensor([[0.05, 0.05, 0.9]])  # Confident but wrong
uncertain = torch.tensor([[0.4, 0.3, 0.3]])          # Uncertain
target = torch.tensor([0])  # True class is 0

loss_confident = F.nll_loss(torch.log(confident_wrong), target)  # -log(0.05) ≈ 3.00
loss_uncertain = F.nll_loss(torch.log(uncertain), target)        # -log(0.4) ≈ 0.92

print(f"Confident wrong prediction loss: {loss_confident:.4f}")
print(f"Uncertain prediction loss: {loss_uncertain:.4f}")
print("Cross-entropy heavily penalizes confident wrong predictions")