<div style="text-align: center; font-size: 32px; font-weight: bold;">
    Deep Learning Matrix Computations in PyTorch
</div>

Deep learning relies heavily on matrix operations for neural network training, backpropagation, and optimization. Let's explore the core matrix computations used in deep learning and how they are implemented in PyTorch.

| **Operation** | **PyTorch Function** | **Use Case** |
|--------------|----------------------|-------------|
| **Linear Transformation (Y = WX + b)** | `X @ W.T + b` | Neural Networks |
| **Activation Functions (ReLU, Sigmoid, Tanh)** | `torch.relu(x), torch.sigmoid(x), torch.tanh(x)` | Deep Learning Models |
| **Softmax (Converts Scores to Probabilities)** | `torch.nn.functional.softmax(x, dim=-1)` | Classification |
| **Gradient Computation (Backpropagation)** | `tensor.backward()` | Training Models |
| **Loss Function (MSE, Cross Entropy)** | `torch.nn.functional.mse_loss(y_pred, y_true)` | Training Optimization |
| **CNN Convolution** | `torch.nn.functional.conv2d()` | Feature Extraction |
| **Singular Value Decomposition (SVD)** | `torch.linalg.svd(A)` (supersedes the deprecated `torch.svd(A)`) | Dimensionality Reduction |


### Essential Matrix Operations in Deep Learning
Deep learning models work with tensors (multi-dimensional arrays that generalize matrices) and use the following operations:

- Matrix Multiplication $\rightarrow$ Used in linear layers.
- Element-wise Operations $\rightarrow$ Used in activation functions.
- Gradient Computation $\rightarrow$ Used in backpropagation.
- Batch Operations $\rightarrow$ Used in mini-batch training.
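
As a small illustration of matrix multiplication, element-wise operations, and batch operations, here is a minimal sketch. The tensors `A` and `B` and their shapes are arbitrary values chosen for the demo, not taken from any particular model.

```python
import torch

# Matrix multiplication: (2x3) @ (3x2) -> (2x2)
A = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
B = torch.tensor([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
product = A @ B
print("Matrix product:\n", product)

# Element-wise operation: every entry is transformed independently
doubled = A * 2.0
print("Element-wise result:\n", doubled)

# Batch operation: torch.bmm multiplies matching pairs of matrices in two batches
batch_A = torch.stack([A, doubled])          # shape (2, 2, 3)
batch_B = torch.stack([B, B])                # shape (2, 3, 2)
batch_product = torch.bmm(batch_A, batch_B)  # shape (2, 2, 2)
print("Batched product shape:", batch_product.shape)
```

The second matrix in the batch is `2 * A`, so its product with `B` is twice the first product.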

### Linear Transformations (Fully Connected Layers)
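Neural networks are built from linear (affine) transformations of the form:
$$Y=WX+b$$
- X = Input Matrix (features)
- W = Weight Matrix (learnable parameters)
- b = Bias Vector (learnable parameters)

This form treats each input as a column of X. When each sample is stored as a row of X, as in the code below, the same transformation is written $Y = XW^\top + b$.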

Used in: Fully Connected (Dense) Layers, Convolutional Layers.

In [None]:
import torch

# Define input tensor (features)
X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Define weight matrix
W = torch.tensor([[0.5, -1.0], [1.5, 2.0]])

# Define bias vector
b = torch.tensor([0.1, -0.2])

# Compute the linear transformation. Each row of X is one sample,
# so Y = WX + b is written as X @ W.T + b (@ is matrix multiplication)
Y = X @ W.T + b
print("Linear Transformation Output:\n", Y)
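
The same computation can be cross-checked against `torch.nn.Linear`, which stores its weight with shape `(out_features, in_features)`. The sketch below copies the `W` and `b` values from above into a layer; the variable names `layer`, `Y_manual`, and `Y_layer` are chosen here for illustration.

```python
import torch

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
W = torch.tensor([[0.5, -1.0], [1.5, 2.0]])
b = torch.tensor([0.1, -0.2])

# Build a 2 -> 2 linear layer and overwrite its random initial parameters
layer = torch.nn.Linear(in_features=2, out_features=2)
with torch.no_grad():
    layer.weight.copy_(W)
    layer.bias.copy_(b)

Y_manual = X @ W.T + b   # explicit matrix form
Y_layer = layer(X)       # same computation through nn.Linear

print("Manual:\n", Y_manual)
print("nn.Linear:\n", Y_layer)
print("Match:", torch.allclose(Y_manual, Y_layer))
```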


### Activation Functions (Element-wise Computation)
Non-linear activations are applied element-wise to introduce non-linearity.

#### Common Activation Functions

1. ReLU (Rectified Linear Unit)
$$f(x) = \max(0, x)$$

2. Sigmoid
$$f(x) = \frac{1}{1+e^{-x}}$$

3. Tanh
$$f(x) = \frac{e^{x}-e^{-x}}{e^{x} + e^{-x}}$$

- ReLU sets negative values to zero and leaves positive values unchanged.
- Sigmoid squashes inputs to (0,1).
- Tanh squashes to (-1,1).

Used in: Activation layers in deep neural networks.

In [None]:
# Define a tensor
Z = torch.tensor([[1.0, -2.0, 3.0], [-1.0, 0.5, -0.5]])

# Apply activation functions
relu_output = torch.relu(Z)
sigmoid_output = torch.sigmoid(Z)
tanh_output = torch.tanh(Z)

print("ReLU Output:\n", relu_output)
print("Sigmoid Output:\n", sigmoid_output)
print("Tanh Output:\n", tanh_output)
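
To connect the built-in functions to their definitions, this sketch evaluates each activation directly from its formula and compares it with the PyTorch result. The `*_manual` names are introduced here for illustration.

```python
import torch

Z = torch.tensor([[1.0, -2.0, 3.0], [-1.0, 0.5, -0.5]])

# ReLU: max(0, x)
relu_manual = torch.clamp(Z, min=0.0)

# Sigmoid: 1 / (1 + e^{-x})
sigmoid_manual = 1.0 / (1.0 + torch.exp(-Z))

# Tanh: (e^x - e^{-x}) / (e^x + e^{-x})
tanh_manual = (torch.exp(Z) - torch.exp(-Z)) / (torch.exp(Z) + torch.exp(-Z))

print("ReLU matches:   ", torch.allclose(relu_manual, torch.relu(Z)))
print("Sigmoid matches:", torch.allclose(sigmoid_manual, torch.sigmoid(Z)))
print("Tanh matches:   ", torch.allclose(tanh_manual, torch.tanh(Z)))
```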


In [None]:
# Visualizing Activation Functions
import torch
import matplotlib.pyplot as plt
import numpy as np

# Generate values from -5 to 5
x = torch.linspace(-5, 5, 100)

# Compute activations
relu_y = torch.relu(x)
sigmoid_y = torch.sigmoid(x)
tanh_y = torch.tanh(x)

# Plot activation functions
plt.figure(figsize=(10, 6))

plt.plot(x, relu_y, label="ReLU", linewidth=2)
plt.plot(x, sigmoid_y, label="Sigmoid", linewidth=2)
plt.plot(x, tanh_y, label="Tanh", linewidth=2)

plt.axhline(0, color='black', linewidth=0.5, linestyle="dashed")
plt.axvline(0, color='black', linewidth=0.5, linestyle="dashed")

plt.xlabel("Input")
plt.ylabel("Output")
plt.title("Activation Functions")
plt.legend()
plt.grid()
plt.show()


### Softmax Function (Probability Distribution)
Softmax converts a vector of raw scores (logits) into a probability distribution:
$$ \text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$$

- Every output lies in (0, 1) and the outputs sum to 1.
- Larger scores receive larger probabilities, so the ordering of the scores is preserved.

Used in: Classification Problems (e.g., ImageNet, NLP Models).

In [None]:
# Define logits (raw scores)
logits = torch.tensor([2.0, 1.0, 0.1])

# Apply softmax
softmax_output = torch.nn.functional.softmax(logits, dim=0)
print("Softmax Output:\n", softmax_output)
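
In practice softmax is usually applied to a batch of score vectors, and the `dim` argument selects the axis that should sum to 1. The sketch below uses a hypothetical batch of two samples and also computes softmax by hand, subtracting the row maximum first, a common trick for numerical stability that leaves the result unchanged.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 2 samples, 3 class scores each
batch_logits = torch.tensor([[2.0, 1.0, 0.1],
                             [0.5, 0.5, 0.5]])

# dim=1 normalizes across the classes of each sample (each row)
probs = F.softmax(batch_logits, dim=1)
print("Probabilities:\n", probs)
print("Row sums:", probs.sum(dim=1))

# Manual softmax; subtracting the row maximum avoids overflow in exp()
shifted = batch_logits - batch_logits.max(dim=1, keepdim=True).values
manual = torch.exp(shifted) / torch.exp(shifted).sum(dim=1, keepdim=True)
print("Manual matches:", torch.allclose(probs, manual))
```

Equal scores, as in the second row, yield a uniform distribution.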


In [None]:
# Define logits (raw scores)
logits = torch.tensor([2.0, 1.0, 0.1])
softmax_output = torch.nn.functional.softmax(logits, dim=0)

# Convert to numpy for plotting
probabilities = softmax_output.numpy()

# Plot softmax output
plt.figure(figsize=(6, 4))
plt.bar(["Class 1", "Class 2", "Class 3"], probabilities, color=["blue", "green", "red"])
plt.xlabel("Classes")
plt.ylabel("Probability")
plt.title("Softmax Function Output")
plt.ylim(0, 1)
plt.show()


### Backpropagation (Computing Gradients with Autograd)
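PyTorch computes gradients by automatic differentiation (autograd): operations on tensors created with `requires_grad=True` are recorded, and calling `.backward()` applies the chain rule through them in reverse. \
Used in: Training Neural Networks (Gradient Descent, Backpropagation).

- Gradients tell gradient descent which direction moves towards the minimum loss.
- They are the key ingredient for training neural networks.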

In [None]:
# Define a tensor with requires_grad=True (track gradients)
X = torch.tensor([2.0, 3.0], requires_grad=True)

# Define a simple function: Y = X^2 + 3X
Y = X**2 + 3*X

# Compute gradients (backpropagation)
Y.sum().backward()

# Print gradients (dY/dX)
print("Gradients:\n", X.grad)
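
For this function the derivative can be written down by hand, $dY/dX = 2X + 3$, which gives a simple check on autograd. The sketch also shows that `.backward()` accumulates into `.grad` rather than overwriting it, which is why training loops reset gradients between steps. The names `first_grad`, `analytic`, and `accumulated` are introduced here for illustration.

```python
import torch

X = torch.tensor([2.0, 3.0], requires_grad=True)

# First backward pass
Y = X**2 + 3*X
Y.sum().backward()
first_grad = X.grad.clone()

# Analytical derivative: d/dX (X^2 + 3X) = 2X + 3
analytic = 2 * X.detach() + 3
print("Autograd:", first_grad)   # tensor([7., 9.])
print("Analytic:", analytic)     # tensor([7., 9.])

# A second backward pass without resetting adds to the stored gradient
Y2 = X**2 + 3*X
Y2.sum().backward()
accumulated = X.grad.clone()
print("Accumulated:", accumulated)  # tensor([14., 18.])

# Reset before the next independent gradient computation
X.grad.zero_()
```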


In [None]:
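# Visualize how gradient descent moves x towards the minimum of a loss function.
# Define a simple quadratic loss function: L(x) = (x-3)^2
def loss_function(x):
    return (x - 3) ** 2

# Analytical gradient dL/dx = 2(x-3), for reference only;
# the loop below obtains the same values from autograd via loss.backward()
def gradient(x):
    return 2 * (x - 3)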

# Gradient Descent Parameters
x = torch.tensor(10.0, requires_grad=True)  # Initial value
learning_rate = 0.1
history = []

# Perform Gradient Descent
for i in range(20):
    loss = loss_function(x)
    history.append(x.item())  # Store x values
    loss.backward()  # Compute gradients
    with torch.no_grad():  # Update x
        x -= learning_rate * x.grad
        x.grad.zero_()  # Reset gradients

# Plot loss function
x_vals = np.linspace(-1, 10, 100)
y_vals = loss_function(torch.tensor(x_vals))

plt.figure(figsize=(8, 5))
plt.plot(x_vals, y_vals, label="Loss Function", linewidth=2)
plt.scatter(history, loss_function(torch.tensor(history)), color="red", label="Gradient Descent Steps")
plt.xlabel("x")
plt.ylabel("Loss")
plt.title("Gradient Descent Optimization")
plt.legend()
plt.grid()
plt.show()
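
The manual update in the loop above is what `torch.optim.SGD` does for plain gradient descent. A minimal sketch of the same optimization with the built-in optimizer, assuming the same starting point, learning rate, and step count:

```python
import torch

x = torch.tensor(10.0, requires_grad=True)   # same initial value as above
optimizer = torch.optim.SGD([x], lr=0.1)

for step in range(20):
    optimizer.zero_grad()        # reset gradients from the previous step
    loss = (x - 3) ** 2          # L(x) = (x-3)^2
    loss.backward()              # compute dL/dx
    optimizer.step()             # x <- x - lr * dL/dx

print("Final x:", x.item())
```

Each step multiplies the distance to the minimum by $1 - 2 \cdot 0.1 = 0.8$, so after 20 steps x sits at roughly $3 + 7 \cdot 0.8^{20} \approx 3.08$.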


### Loss Function Computation
Deep learning uses loss functions to measure the difference between predictions and actual labels.
- Example: Mean Squared Error (MSE)
$$ \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_{true,i} - y_{pred,i})^2 $$

Used in: Regression Tasks, Neural Network Training.

In [None]:
# Define predicted and actual labels
y_pred = torch.tensor([3.0, 5.0, 7.0])
y_true = torch.tensor([2.5, 5.0, 8.0])

# Compute Mean Squared Error (MSE)
mse_loss = torch.nn.functional.mse_loss(y_pred, y_true)
print("MSE Loss:", mse_loss.item())
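
The MSE value can be reproduced by hand from the formula: the squared errors are 0.25, 0, and 1, so the mean is 1.25 / 3. The sketch below checks that, and also illustrates cross entropy (the other loss named in the overview table) on a hypothetical single-sample example where the true class index is assumed to be 0.

```python
import torch
import torch.nn.functional as F

y_pred = torch.tensor([3.0, 5.0, 7.0])
y_true = torch.tensor([2.5, 5.0, 8.0])

# MSE from the definition: mean of squared differences
mse_manual = ((y_true - y_pred) ** 2).mean()
mse_builtin = F.mse_loss(y_pred, y_true)
print("MSE manual: ", mse_manual.item())
print("MSE builtin:", mse_builtin.item())

# Cross entropy for classification (hypothetical logits and label)
logits = torch.tensor([[2.0, 1.0, 0.1]])   # shape (batch=1, classes=3)
target = torch.tensor([0])                 # assumed true class index
ce_builtin = F.cross_entropy(logits, target)

# Equivalent definition: negative log-probability of the true class
ce_manual = -F.log_softmax(logits, dim=1)[0, 0]
print("Cross entropy builtin:", ce_builtin.item())
print("Cross entropy manual: ", ce_manual.item())
```

Note that `F.cross_entropy` expects raw logits and applies log-softmax internally.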


### Matrix Computation for Convolutional Neural Networks (CNNs)
A convolution layer slides a kernel (filter) over the input and, at each position, computes the sum of element-wise products between the kernel and the patch beneath it. \
Because every output value is a dot product, the whole operation can be expressed as a matrix multiplication. \
Used in: Image Processing, CNNs, Feature Extraction.

In [None]:
import torch.nn.functional as F

# Define input image (3x3 matrix)
image = torch.tensor([[1.0, 2.0, 3.0], 
                      [4.0, 5.0, 6.0], 
                      [7.0, 8.0, 9.0]])

# Define kernel (filter)
kernel = torch.tensor([[0.0, 1.0, 0.0], 
                       [1.0, -4.0, 1.0], 
                       [0.0, 1.0, 0.0]])

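# Perform 2D convolution (PyTorch computes cross-correlation: the kernel is not flipped).
# F.conv2d expects input of shape (N, C, H, W) and weight of shape (C_out, C_in, kH, kW),
# so batch and channel dimensions are added with unsqueeze.
output = F.conv2d(image.unsqueeze(0).unsqueeze(0), kernel.unsqueeze(0).unsqueeze(0), padding=1)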
print("Convolution Output:\n", output)
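
To make the link between convolution and matrix multiplication concrete, the sketch below uses `F.unfold` to lay out every 3x3 patch of the image as a column (often called im2col), multiplies by the flattened kernel, and compares the result with `F.conv2d`. The names `patches`, `out_matmul`, and `out_conv` are chosen here for illustration.

```python
import torch
import torch.nn.functional as F

image = torch.tensor([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0],
                      [7.0, 8.0, 9.0]]).reshape(1, 1, 3, 3)
kernel = torch.tensor([[0.0, 1.0, 0.0],
                       [1.0, -4.0, 1.0],
                       [0.0, 1.0, 0.0]]).reshape(1, 1, 3, 3)

# Each column of `patches` is one flattened 3x3 patch (zero-padded at the border)
patches = F.unfold(image, kernel_size=3, padding=1)   # shape (1, 9, 9)

# Flatten the kernel to a row vector; one matrix product yields all 9 outputs
out_matmul = (kernel.reshape(1, -1) @ patches).reshape(1, 1, 3, 3)

# Reference result from the built-in convolution
out_conv = F.conv2d(image, kernel, padding=1)

print("Via matrix multiplication:\n", out_matmul)
print("Match:", torch.allclose(out_matmul, out_conv))
```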


#### Visualizing Convolution in a CNN
We’ll apply a convolution filter (edge detection) to an image.

- Shows how CNN filters extract features from images.
- Used in object detection and image processing.

In [None]:
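import io
import urllib.request
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Image.open() cannot read a URL directly, so download the bytes first.
# Some servers reject requests without a User-Agent header; the value here is an arbitrary placeholder.
url = "https://upload.wikimedia.org/wikipedia/commons/thumb/2/2d/Square_200x200.svg/200px-Square_200x200.svg.png"
request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(request) as response:
    image = Image.open(io.BytesIO(response.read())).convert("L")  # Grayscale
image = np.array(image, dtype=np.float32) / 255.0  # Normalize to [0, 1]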

# Convert to PyTorch tensor and add batch & channel dimensions
image_tensor = torch.tensor(image).unsqueeze(0).unsqueeze(0)

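# Define an edge detection filter. This is a Laplacian-style kernel rather than a
# Sobel operator: it responds to intensity changes in every direction, and its
# entries sum to zero, so flat regions map to 0.
edge_filter = torch.tensor([[-1, -1, -1], 
                            [-1,  8, -1], 
                            [-1, -1, -1]], dtype=torch.float32).unsqueeze(0).unsqueeze(0)

# Apply convolution
edge_detected = F.conv2d(image_tensor, edge_filter, padding=1)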

# Convert output tensor to numpy
edge_image = edge_detected.squeeze().detach().numpy()

# Plot original and edge-detected images
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.imshow(image, cmap="gray")
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(edge_image, cmap="gray")
plt.title("Edge Detection (Laplacian-style Filter)")
plt.axis("off")

plt.show()
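
The effect of the filter is easier to verify on a tiny synthetic image that needs no download. The sketch below builds an assumed 8x8 image containing a white square and applies the same 3x3 kernel values as the cell above; the response is zero wherever the neighborhood is flat and non-zero only around the square's border.

```python
import torch
import torch.nn.functional as F

# Synthetic 8x8 image: a white 4x4 square on a black background
image = torch.zeros(8, 8)
image[2:6, 2:6] = 1.0

# Same kernel values as the edge detection filter above
edge_kernel = torch.tensor([[-1.0, -1.0, -1.0],
                            [-1.0,  8.0, -1.0],
                            [-1.0, -1.0, -1.0]])

# Add batch and channel dimensions, convolve, then drop them again
edges = F.conv2d(image.reshape(1, 1, 8, 8),
                 edge_kernel.reshape(1, 1, 3, 3),
                 padding=1).squeeze()

print(edges)
print("Inside the square:", edges[3, 3].item())   # flat region
print("Corner of square: ", edges[2, 2].item())   # edge region
```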


### Eigenvalues and Singular Value Decomposition (SVD) in Deep Learning
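SVD factors a matrix $A$ into $U \Sigma V^\top$, where the diagonal of $\Sigma$ holds the singular values in descending order. \
Keeping only the largest singular values gives a low-rank approximation of $A$, which is the basis for dimensionality reduction (as in PCA) and for compressing neural network weight matrices. \
The singular values are the square roots of the eigenvalues of $A^\top A$.

Used in: PCA, Model Compression, Feature Reduction.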

In [None]:
# Create a random matrix
A = torch.randn(5, 3)

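# Compute SVD. torch.linalg.svd supersedes the deprecated torch.svd
# and returns V transposed (Vh) instead of V.
U, S, Vh = torch.linalg.svd(A, full_matrices=False)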
print("Singular Values:\n", S)
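
To show how SVD supports compression, this sketch rebuilds a matrix from its factors and then forms a rank-2 approximation by keeping the two largest singular values. The fixed seed, the `float64` dtype, and the choice `k = 2` are assumptions made for a repeatable demo.

```python
import torch

torch.manual_seed(0)                          # assumed seed, for repeatability
A = torch.randn(5, 3, dtype=torch.float64)

# Reduced SVD: U is (5, 3), S is (3,), Vh is (3, 3)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)

# Exact reconstruction: A = U diag(S) Vh
A_rebuilt = U @ torch.diag(S) @ Vh
print("Reconstruction matches:", torch.allclose(A, A_rebuilt))

# Rank-k approximation: keep only the k largest singular values
k = 2
A_k = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

# The Frobenius-norm error equals the norm of the discarded singular values
error = torch.linalg.norm(A - A_k)
print("Approximation error:      ", error.item())
print("Norm of discarded values: ", S[k:].norm().item())
```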


In [None]:
# Visualizing Singular Value Decomposition (SVD)
# Create a random matrix
A = torch.randn(10, 10)

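# Compute SVD. torch.linalg.svd supersedes the deprecated torch.svd
# and returns V transposed (Vh) instead of V.
U, S, Vh = torch.linalg.svd(A, full_matrices=False)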

# Plot singular values
plt.figure(figsize=(8, 5))
plt.plot(S.numpy(), marker="o", linestyle="dashed", color="red", label="Singular Values")
plt.xlabel("Index")
plt.ylabel("Value")
plt.title("Singular Value Decomposition (SVD)")
plt.legend()
plt.grid()
plt.show()
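
The connection between singular values and eigenvalues can be checked numerically: the squared singular values of $A$ are the eigenvalues of $A^\top A$. This is a minimal sketch under an assumed seed and `float64` precision.

```python
import torch

torch.manual_seed(0)                          # assumed seed, for repeatability
A = torch.randn(10, 10, dtype=torch.float64)

# Singular values of A, returned in descending order
S = torch.linalg.svdvals(A)

# Eigenvalues of the symmetric matrix A^T A, returned in ascending order
eigvals = torch.linalg.eigvalsh(A.T @ A)

# Reverse the eigenvalues so both sequences are in descending order
print("Squared singular values:\n", S**2)
print("Eigenvalues of A^T A:\n", eigvals.flip(0))
print("Match:", torch.allclose(S**2, eigvals.flip(0)))
```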
