# Linear Algebra Properties

Definition:

1. Vector Space Properties:
- Closure under addition: u + v ∈ V
- Closure under scalar multiplication: cv ∈ V
- Associativity: (u + v) + w = u + (v + w)
- Commutativity: u + v = v + u
- Zero vector existence: 0 + v = v
- Inverse existence: v + (-v) = 0

2. Matrix Properties:
- Associativity: (AB)C = A(BC)
- Distributivity: A(B + C) = AB + AC
- Non-commutativity: AB ≠ BA generally
- Transpose: (AB)^T = B^TA^T
- Inverse: (AB)⁻¹ = B⁻¹A⁻¹ (A and B invertible)

3. Determinant Properties:
- det(AB) = det(A)det(B)
- det(A^T) = det(A)
- det(A⁻¹) = 1/det(A) (A invertible)
- det(cA) = cⁿdet(A) for A ∈ ℝⁿˣⁿ

4. Trace Properties:
- tr(A + B) = tr(A) + tr(B)
- tr(AB) = tr(BA)
- tr(cA) = c tr(A)
- tr(A^T) = tr(A)

5. Rank Properties:
- rank(AB) ≤ min(rank(A), rank(B))
- rank(A) = rank(A^T)
- rank(A) ≤ min(m,n) for A ∈ ℝᵐˣⁿ
- rank(A) = number of non-zero singular values
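
As a quick sanity check, a few of the identities listed above can be verified on small integer matrices. The sketch below uses plain Python lists instead of tensors so the arithmetic stays exact; the helper names (`matmul2`, `transpose2`, `det2`, `trace2`) and the two sample matrices are illustrative assumptions, not library functions.

```python
# Hypothetical 2x2 helpers operating on lists of rows
def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def trace2(X):
    return X[0][0] + X[1][1]

M1 = [[2, 3], [1, 4]]
M2 = [[1, 5], [2, 6]]
M1M2 = matmul2(M1, M2)
M2M1 = matmul2(M2, M1)

transpose_rule = transpose2(M1M2) == matmul2(transpose2(M2), transpose2(M1))
det_rule = det2(M1M2) == det2(M1) * det2(M2)
trace_rule = trace2(M1M2) == trace2(M2M1)
commutes = M1M2 == M2M1

print(transpose_rule, det_rule, trace_rule, commutes)  # True True True False
```

The last value shows that these two matrices do not commute, even though their products share the same trace and determinant.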

Applications in ML & Data Science:
- Algorithm design
- Optimization problems
- Dimensionality analysis
- Matrix factorizations
- Feature engineering
- Model validation
- System analysis
- Numerical stability

# Linear Algebra Operations

**Basic Matrix Operations:**

- Matrix multiplication (mm, matmul, @)
- Vector operations (mv, dot, cross)
- Element-wise operations


**Matrix Properties:**

- Determinant and rank
- Trace and diagonal
- Various matrix norms


**Matrix Decompositions:**

- SVD
- Eigendecomposition
- LU, QR, and Cholesky decompositions


**System Solving:**

- Linear system solutions
- Matrix inverse and pseudoinverse


**Advanced Operations:**

- Matrix functions (exp, power)
- Special matrix operations
- Batch operations


**Special Matrices:**

- Creation of identity, zero, random matrices
- Specialized matrix types

In [22]:
import torch
import torch.linalg as LA

$$ \text{A} = 
\begin{pmatrix}
1 & 2 & 3\\
a & b & c
\end{pmatrix}
$$ 

$$ \text{B} = 
\begin{bmatrix}
1 & 2 & 3\\
a & b & c
\end{bmatrix}
$$ 

In [24]:
A = torch.tensor([[1,2,3,4], [10,20,30, 40]])
A

tensor([[ 1,  2,  3,  4],
        [10, 20, 30, 40]])

In [37]:
B = torch.randint(low=1,high=100 ,size=(4,4))
B

tensor([[21, 36, 90,  4],
        [96, 15, 14, 59],
        [95, 62, 55, 94],
        [90, 39, 49, 10]])

In [51]:
V = torch.randn(4)
V

tensor([-0.6367, -0.1909, -0.4705,  0.7097])

In [52]:
W = torch.randn(4)
W

tensor([ 0.7690, -1.8104,  0.1360,  0.7058])

## Matrix Multiplication

Definition:

User-Friendly:
Matrix multiplication combines two grids of numbers using systematic rules to create a new grid. Like a recipe that combines ingredients in specific proportions, each element in the result comes from multiplying and adding corresponding elements from both matrices. For example, in image processing, one matrix might represent an image while another represents a transformation to blur or sharpen it.

Analytical:
Matrix multiplication is a bilinear operation that combines two matrices A ∈ ℝᵐˣⁿ and B ∈ ℝⁿˣᵖ to produce C ∈ ℝᵐˣᵖ. The operation preserves both row and column relationships while combining elements through a series of dot products. The compatibility requires the number of columns in the first matrix to equal the number of rows in the second.

Mathematical:
For matrices A ∈ ℝᵐˣⁿ and B ∈ ℝⁿˣᵖ, their product C = AB has elements:

$$C_{ij} = \sum_{k=1}^n A_{ik}B_{kj}$$

where i = 1,...,m and j = 1,...,p. Each element $C_{ij}$ is the dot product of row i from A with column j from B.

Mathematical Notation:
A = $[A_{ij}]$ ∈ ℝᵐˣⁿ 

B = $[B_{ij}]$ ∈ ℝⁿˣᵖ

C = AB = $[C_{ij}]$ = $[\sum_{k=1}^n A_{ik}B_{kj}]$ ∈ ℝᵐˣᵖ

Numerical Example:

A = $$\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$$
B = $$\begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix}$$

Computing C = AB:
$C_{11} = (2 × 1) + (3 × 2) = 8$
$C_{12} = (2 × 5) + (3 × 6) = 28$
$C_{21} = (1 × 1) + (4 × 2) = 9$
$C_{22} = (1 × 5) + (4 × 6) = 29$

C = $\begin{bmatrix} 8 & 28 \\ 9 & 29 \end{bmatrix}$
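
The summation formula can also be written out directly as three nested loops. This is a minimal plain-Python sketch for illustration (the function name `matmul_from_definition` is an assumption); in practice the PyTorch calls in the cells that follow do the same work far more efficiently.

```python
def matmul_from_definition(A, B):
    """Multiply matrices given as lists of rows, using C_ij = sum_k A_ik * B_kj."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    C = [[0] * p for _ in range(m)]
    for i in range(m):          # rows of A
        for j in range(p):      # columns of B
            for k in range(n):  # shared inner dimension
                C[i][j] += A[i][k] * B[k][j]
    return C

C_example = matmul_from_definition([[2, 3], [1, 4]], [[1, 5], [2, 6]])
print(C_example)  # [[8, 28], [9, 29]]
```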

Applications:
- Linear transformations composition
- Computer graphics (3D transformations)
- Neural network layers
- Markov chains
- Data compression
- Signal processing

In [None]:
# Matrix multiplication
C = torch.mm(A, B)
C

tensor([[ 858,  408,  479,  444],
        [8580, 4080, 4790, 4440]])

In [43]:

C = A @ B                      # Using @ operator
C

tensor([[ 858,  408,  479,  444],
        [8580, 4080, 4790, 4440]])

In [44]:
C =  torch.matmul(A, B)         # General matrix multiplication
C

tensor([[ 858,  408,  479,  444],
        [8580, 4080, 4790, 4440]])

## Matrix-Vector Multiplication

User-Friendly Definition:
Matrix-vector multiplication is a powerful data transformation technique that works like a customized data processing system. Just as Instagram filters transform photos using preset rules, a matrix acts as a set of mathematical rules that systematically transforms input numbers (vector) into output numbers. Each element in the output is created by combining input numbers in specific ways defined by the matrix. For example, if you have data points representing temperature and humidity, the matrix multiplication could combine them in different proportions to give you a "comfort index." This operation is fundamental in many applications, from image processing to data analysis, where we need to transform data in consistent and meaningful ways.

Analytical Definition:
Matrix-vector multiplication represents a systematic linear transformation that converts vectors from one space to another while maintaining crucial mathematical properties. It decomposes complex transformations into elementary operations of scaling and combining, encoded within the matrix elements. The transformation preserves the fundamental properties of linearity: additivity and homogeneity. This means that transforming the sum of two vectors equals the sum of their individual transformations, and scaling a vector before transformation is equivalent to scaling after transformation. These properties make matrix-vector multiplication an essential tool in linear algebra, enabling applications from coordinate transformations to solving systems of equations. The operation's power lies in its ability to represent complex linear operations in a compact, computationally efficient form.

Mathematical Definition:
Matrix-vector multiplication is a precisely defined operation between a matrix A ∈ ℝᵐˣⁿ and a vector x ∈ ℝⁿ that produces a vector b ∈ ℝᵐ. Each component of the resulting vector is computed as the inner product of a row of the matrix with the input vector, following the formula $b_i = \sum_{j=1}^n A_{ij}x_j$ for i = 1,...,m. This operation can be viewed as a linear transformation from n-dimensional space to m-dimensional space, where each matrix row represents the coefficients of a linear combination. The operation satisfies the distributive property over vector addition and associative property with scalar multiplication, making it a fundamental building block of linear algebra. The dimensionality of the output vector is determined by the number of rows in the matrix, while the input vector's dimension must match the number of matrix columns.

Mathematical Notation:
Given:
A = $[A_{ij}]$ ∈ ℝᵐˣⁿ

x = $\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ ∈ ℝⁿ

b = Ax = $\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$ = $\begin{bmatrix} \sum_{j=1}^n A_{1j}x_j \\ \sum_{j=1}^n A_{2j}x_j \\ \vdots \\ \sum_{j=1}^n A_{mj}x_j \end{bmatrix}$ ∈ ℝᵐ

Numerical Example:

A = $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$

x = $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$

Computing b = Ax:
$b_1 = (2 \times 1) + (3 \times 2) = 8$

$b_2 = (1 \times 1) + (4 \times 2) = 9$

b = $\begin{bmatrix} 8 \\ 9 \end{bmatrix}$
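
The two linearity properties described above, additivity and homogeneity, can be checked numerically. The following is a small plain-Python sketch; the helper `matvec`, the second vector `y_vec`, and the scalar `c` are illustrative assumptions.

```python
def matvec(A, x):
    """Each output component is the inner product of a row of A with x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A_lin = [[2, 3], [1, 4]]
x_vec = [1, 2]
y_vec = [3, -1]   # arbitrary second vector, chosen for illustration
c = 5             # arbitrary scalar

b_vec = matvec(A_lin, x_vec)
print(b_vec)  # [8, 9]

# Additivity: A(x + y) == Ax + Ay
lhs_add = matvec(A_lin, [xi + yi for xi, yi in zip(x_vec, y_vec)])
rhs_add = [bi + di for bi, di in zip(b_vec, matvec(A_lin, y_vec))]

# Homogeneity: A(c x) == c (Ax)
lhs_hom = matvec(A_lin, [c * xi for xi in x_vec])
rhs_hom = [c * bi for bi in b_vec]

print(lhs_add == rhs_add, lhs_hom == rhs_hom)  # True True
```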

Applications:
- Linear transformations (rotation, scaling, shearing)
- Feature extraction in ML
- Solving linear systems
- Neural network layers





In PyTorch, this can be done using `torch.mv(A, x)` or `torch.matmul(A, x)`.

In [57]:

# Vector operations
w = torch.mv(A.to(torch.float), W)             # Matrix-vector multiplication
w

tensor([0.3793, 3.7929])

## Vector Dot Product

Definition:

User-Friendly:
The vector dot product is like calculating a weighted sum: we multiply corresponding elements of two vectors and add the results. For example, if one vector holds item prices and the other holds the quantities bought, their dot product is the total cost.

Analytical:
The dot product is a fundamental operation that maps two vectors to a scalar, measuring their alignment and magnitude interaction. It's a bilinear form that gives geometric information about vectors' relative orientations and lengths while preserving key algebraic properties.

Mathematical:
For vectors x, y ∈ ℝⁿ, their dot product is:
$x · y = \sum_{i=1}^n x_iy_i = x_1y_1 + x_2y_2 + ... + x_ny_n$
Geometrically: $x · y = ||x|| ||y|| \cos θ$, where θ is the angle between vectors.

Mathematical Notation:
x = $\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$
y = $\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$
x · y = $\sum_{i=1}^n x_iy_i$ = $x^Ty$ = scalar

Numerical Example:
x = $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$
y = $\begin{bmatrix} 1 \\ 4 \end{bmatrix}$

Computing x · y:
= (2 × 1) + (3 × 4)
= 2 + 12
= 14
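
The geometric form of the dot product can be rearranged to recover the angle between two vectors. The sketch below does this in plain Python for the example vectors; `dot` is an illustrative helper, not a library call.

```python
import math

def dot(x, y):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

x_ex, y_ex = [2, 3], [1, 4]
d = dot(x_ex, y_ex)
print(d)  # 14

# cos(theta) = (x . y) / (||x|| ||y||), with ||x|| = sqrt(x . x)
cos_theta = d / (math.sqrt(dot(x_ex, x_ex)) * math.sqrt(dot(y_ex, y_ex)))
theta_deg = math.degrees(math.acos(cos_theta))
print(round(cos_theta, 3))  # 0.942
```

A cosine close to 1 means the two vectors point in nearly the same direction, which is the idea behind cosine similarity.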

Applications:
- Calculating work in physics (force · displacement)
- Finding vector projections
- Computing angles between vectors
- Neural network layer computations
- Similarity measures in ML

In [54]:
dot = torch.dot(V, W)          # Vector dot product
dot


tensor(0.2928)

## Vector Cross Product

Definition:

User-Friendly:
Cross product creates a new vector perpendicular to two input vectors, with magnitude related to both vectors' lengths and the sine of angle between them. Like finding a line perpendicular to a plane defined by two vectors.

Analytical:
A binary operation on three-dimensional vectors that produces a vector orthogonal to both inputs. The result's magnitude represents the area of the parallelogram formed by the vectors, with direction determined by the right-hand rule.

Mathematical:
For vectors a, b ∈ ℝ³, their cross product a × b is:
$(a × b) = ||a|| ||b|| \sin(θ)\hat{n}$
where θ is angle between vectors and $\hat{n}$ is unit vector perpendicular to both.

Mathematical Notation:
a = $\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}$
b = $\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$
a × b = $\begin{bmatrix} a_2b_3 - a_3b_2 \\ a_3b_1 - a_1b_3 \\ a_1b_2 - a_2b_1 \end{bmatrix}$

Numerical Example:
a = $\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$
b = $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$

Computing a × b:
= $\begin{bmatrix} (3×1) - (4×2) \\ (4×1) - (2×1) \\ (2×2) - (3×1) \end{bmatrix}$
= $\begin{bmatrix} -5 \\ 2 \\ 1 \end{bmatrix}$
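
The component formula is easy to check by hand or in code. This plain-Python sketch (helper names `cross3` and `dot3` are assumptions) reproduces the example and confirms that the result is orthogonal to both inputs and that swapping the arguments flips the sign.

```python
import math

def cross3(a, b):
    """Cross product of two 3-element vectors (component formula)."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot3(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

a_vec, b_vec = [2, 3, 4], [1, 2, 1]
n_vec = cross3(a_vec, b_vec)
print(n_vec)                                   # [-5, 2, 1]
print(dot3(n_vec, a_vec), dot3(n_vec, b_vec))  # 0 0  (orthogonal to both inputs)
print(cross3(b_vec, a_vec))                    # [5, -2, -1]  (anti-commutative)

# The length of a x b equals the area of the parallelogram spanned by a and b
area = math.sqrt(dot3(n_vec, n_vec))
```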


Applications in Data Science & ML:

- Surface normals for 3D meshes and point clouds
- Torque and angular momentum in physics simulations
- Camera and pose geometry in computer vision
- Orientation and area tests in computational geometry
- Geometric deep learning on 3D data
- Normal vector of a plane spanned by two 3D vectors

In [59]:
a = torch.tensor([2., 3., 4.])
b = torch.tensor([1., 2., 1.])
cross = LA.cross(a, b)         # Vector cross product (defined only for 3-element vectors)
cross

tensor([-5.,  2.,  1.])

## Outer Product

Definition:

User-Friendly:
Outer product multiplies each element of one vector with every element of another vector, creating a matrix. Like creating a multiplication table between two lists of numbers.

Analytical:
A tensor product of two vectors that produces a matrix, representing all possible pairwise multiplications between elements of the input vectors. Results in a rank-1 matrix that captures directional information.

Mathematical:
For vectors x ∈ ℝᵐ and y ∈ ℝⁿ, their outer product is:
$(x ⊗ y)_{ij} = x_iy_j$ resulting in matrix A ∈ ℝᵐˣⁿ

Mathematical Notation:
x = $\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}$
y = $\begin{bmatrix} y_1 & y_2 & ... & y_n \end{bmatrix}$
x ⊗ y = $\begin{bmatrix} x_1y_1 & x_1y_2 & ... & x_1y_n \\ x_2y_1 & x_2y_2 & ... & x_2y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_my_1 & x_my_2 & ... & x_my_n \end{bmatrix}$

Numerical Example:
x = $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$
y = $\begin{bmatrix} 1 & 4 \end{bmatrix}$

Result = $\begin{bmatrix} 2×1 & 2×4 \\ 3×1 & 3×4 \end{bmatrix}$ = $\begin{bmatrix} 2 & 8 \\ 3 & 12 \end{bmatrix}$
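
The "multiplication table" picture translates directly into a nested list comprehension. A minimal plain-Python sketch follows (the helper name `outer_product` is an assumption); it also checks the rank-1 claim for the 2×2 case by showing that the determinant is zero.

```python
def outer_product(x, y):
    """Entry (i, j) of the result is x[i] * y[j]."""
    return [[xi * yj for yj in y] for xi in x]

M_outer = outer_product([2, 3], [1, 4])
print(M_outer)  # [[2, 8], [3, 12]]

# Every row is a multiple of y, so the rows are linearly dependent (rank 1)
det_outer = M_outer[0][0] * M_outer[1][1] - M_outer[0][1] * M_outer[1][0]
print(det_outer)  # 0
```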

Applications in ML & Data Science:
- Attention mechanisms in transformers
- Feature interaction modeling
- Kernel methods
- Embedding space calculations
- Covariance matrix estimation
- Distance matrix computation
- Word embeddings
- Recommendation systems

In [60]:
outer = torch.outer(V, W)      # Outer product
outer

tensor([[-0.4896,  1.1527, -0.0866, -0.4494],
        [-0.1468,  0.3455, -0.0260, -0.1347],
        [-0.3618,  0.8517, -0.0640, -0.3320],
        [ 0.5458, -1.2848,  0.0965,  0.5009]])

## Hadamard (Element-wise) Product

Definition:

User-Friendly:
Element-wise product multiplies corresponding elements of two matrices of same size, like comparing item-by-item price changes between two time periods.

Analytical:
A binary operation that produces a matrix of same dimensions as inputs, where each element is product of corresponding elements from input matrices, preserving element-wise relationships.

Mathematical:
For matrices A, B ∈ ℝᵐˣⁿ, their Hadamard product is:
(A ⊙ B)ᵢⱼ = Aᵢⱼ × Bᵢⱼ

Mathematical Notation:
A = [aᵢⱼ], B = [bᵢⱼ] ∈ ℝᵐˣⁿ
A ⊙ B = [aᵢⱼ × bᵢⱼ] ∈ ℝᵐˣⁿ

Numerical Example:
A = $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$
B = $\begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}$

A ⊙ B = $\begin{bmatrix} 2×1 & 3×2 \\ 1×3 & 4×1 \end{bmatrix}$ = $\begin{bmatrix} 2 & 6 \\ 3 & 4 \end{bmatrix}$

Applications in ML & Data Science:
- Neural network gates (LSTM, GRU)
- Attention mechanisms
- Feature interactions
- Gradient computations
- Mask operations
- Element-wise weighting
- Signal processing
- Data normalization

## Matrix Addition and Subtraction

Definition:

User-Friendly:
Matrix addition/subtraction combines/reduces corresponding elements of same-sized matrices, like combining/comparing expenses across different departments in monthly budgets.

Analytical:
Binary operations on matrices of identical dimensions, applied element by element. Addition is commutative and associative; subtraction is neither.

Mathematical:
For matrices A, B ∈ ℝᵐˣⁿ, their sum/difference is:
(A ± B)ᵢⱼ = Aᵢⱼ ± Bᵢⱼ

Mathematical Notation:
A = [aᵢⱼ], B = [bᵢⱼ] ∈ ℝᵐˣⁿ
A ± B = [aᵢⱼ ± bᵢⱼ] ∈ ℝᵐˣⁿ

Numerical Example:
A = $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$
B = $\begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}$

Addition:
A + B = $\begin{bmatrix} 3 & 5 \\ 4 & 5 \end{bmatrix}$

Subtraction:
A - B = $\begin{bmatrix} 1 & 1 \\ -2 & 3 \end{bmatrix}$

Applications in ML & Data Science:
- Residual networks
- Gradient operations
- Feature engineering
- State updates in RNNs
- Error computation
- Layer normalization
- Feature differencing
- Anomaly detection

In [5]:

# Element-wise operations (operands must have the same shape, or be broadcastable)
hadamard = B * B.T             # Hadamard (element-wise) product
sum_matrices = B + B.T         # Matrix addition
diff_matrices = B - B.T        # Matrix subtraction

## Matrix Determinant

Definition:

User-Friendly:
The determinant is a scalar value calculated from a square matrix that indicates important matrix properties like invertibility and scaling factor for area/volume transformation.

Analytical:
A scalar function that maps square matrices to real numbers, providing crucial information about linear transformations including orientation, scaling, and singularity.

Mathematical:
For a 2×2 matrix A = $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$
det(A) = ad - bc

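For n×n matrices, the determinant is calculated recursively using cofactor expansion along the first row:
det(A) = $\sum_{j=1}^n a_{1j}(-1)^{1+j}M_{1j}$
where $M_{1j}$ is the determinant of the (n−1)×(n−1) submatrix obtained by deleting row 1 and column j.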

Mathematical Notation:
For matrix A:
det(A) or |A|

Numerical Example:
A = $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$

det(A) = (2 × 4) - (3 × 1)
= 8 - 3
= 5
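
The recursive cofactor expansion can be sketched in a few lines of plain Python. This is for illustration only (the name `det_cofactor` is an assumption): its cost grows factorially with n, so real code should rely on a library routine such as `torch.linalg.det`.

```python
def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (A is a list of rows)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # With 0-based j, the sign (-1)**j matches (-1)**(1+j) for 1-based indices
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total

print(det_cofactor([[2, 3], [1, 4]]))                   # 5
print(det_cofactor([[1, 2, 3], [2, 4, 6], [3, 5, 7]]))  # 0 (row 2 is twice row 1)
```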

Applications in ML & Data Science:
- Feature importance assessment
- Principal Component Analysis
- Matrix invertibility checking
- Volume calculations
- Multivariate distributions
- Change of variables
- Eigenvalue analysis
- Feature selection

## Matrix Rank

Definition:

User-Friendly:
Matrix rank represents the number of linearly independent rows or columns, indicating how much unique information the matrix contains.

Analytical:
A measure of the dimensional span of a matrix's row/column space, equal to the number of non-zero singular values or dimension of matrix's range.

Mathematical:
For matrix A ∈ ℝᵐˣⁿ, rank(A) is:
- Number of linearly independent rows/columns
- Dimension of column/row space
- Number of non-zero singular values

Mathematical Notation:
rank(A) = r ≤ min(m,n)
where r is number of linearly independent rows/columns

Numerical Example:
A = $\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 5 & 7 \end{bmatrix}$

rank(A) = 2 because:
- Row 2 = 2 × Row 1 
- All rows are linear combinations of Rows 1 and 3
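
One way to count linearly independent rows is Gaussian elimination: the rank equals the number of pivots found. The sketch below is a plain-Python illustration using exact fractions to avoid rounding issues; the function name `rank_by_elimination` is an assumption, and numerical libraries typically use singular values with a tolerance instead.

```python
from fractions import Fraction

def rank_by_elimination(A):
    """Rank of a matrix (list of rows) = number of pivots in row echelon form."""
    M = [[Fraction(v) for v in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a non-zero entry
        pivot = next((i for i in range(rank, n_rows) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for i in range(rank + 1, n_rows):
            factor = M[i][col] / M[rank][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank

print(rank_by_elimination([[1, 2, 3], [2, 4, 6], [3, 5, 7]]))  # 2
print(rank_by_elimination([[1, 2, 3, 4], [10, 20, 30, 40]]))   # 1
```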

Applications in ML & Data Science:
- Dimensionality reduction
- Feature selection
- Matrix factorization
- Compression techniques
- Low-rank approximation
- Principal Component Analysis
- Linear dependency analysis
- Subspace learning

In [6]:

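# ===== Matrix Properties =====
# Determinant and rank (both need floating-point input; det also needs a square matrix)
B_f = B.to(torch.float)
det = torch.det(B_f)                       # Matrix determinant
det = LA.det(B_f)                          # Alternative using torch.linalg
rank = LA.matrix_rank(A.to(torch.float))   # Matrix rank (the rows of A are proportional)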

## Matrix Trace

Definition:

User-Friendly:
The trace is the sum of elements on main diagonal (top-left to bottom-right), representing a quick way to measure total self-interaction in a square matrix.

Analytical:
A linear operator mapping square matrices to scalars, equal to sum of eigenvalues and invariant under similarity transformations.

Mathematical:
For matrix A ∈ ℝⁿˣⁿ:
tr(A) = $\sum_{i=1}^n a_{ii}$ = $\sum_{i=1}^n \lambda_i$
where λᵢ are eigenvalues

Mathematical Notation:
tr(A) or $\sum_{i=1}^n a_{ii}$

Numerical Example:
A = $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$

tr(A) = 2 + 4 = 6

Applications in ML & Data Science:
- Loss function computation
- Network architecture optimization
- Kernel methods
- Covariance analysis
- Model regularization
- Dimensionality reduction
- Feature selection
- Gradient calculations

## Eigenvalues

Definition:

User-Friendly:
Eigenvalues are special scaling factors that tell us how a matrix stretches or compresses vectors in particular directions.

Analytical:
Scalars that characterize linear transformation's action on special vectors (eigenvectors), representing invariant scaling factors under transformation.

Mathematical:
For matrix A ∈ ℝⁿˣⁿ, λ is an eigenvalue if:
Av = λv
for some non-zero vector v (eigenvector).
Equivalently, λ is a root of det(A - λI) = 0.

Mathematical Notation:
Characteristic equation:
det(A - λI) = 0
where I is identity matrix

Numerical Example:
A = $\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$

det(A - λI) = $\begin{vmatrix} 2-λ & 1 \\ 1 & 2-λ \end{vmatrix}$ = 0
(2-λ)(2-λ) - 1 = 0
λ = 3 or 1
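
For a 2×2 matrix the characteristic equation is the quadratic λ² − tr(A)λ + det(A) = 0, so the eigenvalues follow from the quadratic formula. The plain-Python sketch below illustrates this for the example; the function name `eigenvalues_2x2` is an assumption, and the complex case is deliberately left out.

```python
import math

def eigenvalues_2x2(M):
    """Real eigenvalues of a 2x2 matrix from lambda^2 - tr(M) lambda + det(M) = 0."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return (tr + root) / 2, (tr - root) / 2

S_ex = [[2, 1], [1, 2]]
lam1, lam2 = eigenvalues_2x2(S_ex)
print(lam1, lam2)  # 3.0 1.0

# Check A v = lambda v for the eigenvector v = (1, 1) belonging to lambda = 3
v = [1, 1]
Av = [S_ex[0][0] * v[0] + S_ex[0][1] * v[1], S_ex[1][0] * v[0] + S_ex[1][1] * v[1]]
print(Av == [lam1 * vi for vi in v])  # True
```

The two results also satisfy λ₁ + λ₂ = tr(A) = 4 and λ₁λ₂ = det(A) = 3.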

Applications in ML & Data Science:
- Principal Component Analysis
- Dimensionality reduction
- Spectral clustering
- Covariance analysis
- Feature importance
- Graph analysis
- Stability analysis
- Network embedding

## Similarity Transformations

Definition:

User-Friendly:
A transformation that preserves matrix's essential properties (eigenvalues) while changing its representation, like viewing same object from different angles.

Analytical:
A transformation of form B = P⁻¹AP where P is invertible, preserving eigenvalues and other spectral properties while potentially simplifying matrix structure.

Mathematical:
For matrix A, similarity transformation produces B:
B = P⁻¹AP
where P is invertible

Mathematical Notation:
A ~ B ⟺ ∃P: B = P⁻¹AP
where ~ denotes "similar to"

Numerical Example:
A = $\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$
P = $\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$

B = P⁻¹AP = $\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$
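
The product P⁻¹AP from the example can be computed step by step. The following plain-Python sketch uses exact fractions for the inverse; the helpers `mul2` and `inverse2` are illustrative assumptions that only handle 2×2 matrices.

```python
from fractions import Fraction

def mul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(X):
    """Inverse of a 2x2 integer matrix via the adjugate formula."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(X[1][1], det), Fraction(-X[0][1], det)],
            [Fraction(-X[1][0], det), Fraction(X[0][0], det)]]

A_sim = [[2, 1], [1, 2]]
P_sim = [[1, 1], [1, -1]]

B_sim = mul2(mul2(inverse2(P_sim), A_sim), P_sim)
print(B_sim == [[3, 0], [0, 1]])  # True
```

Because the columns of P are eigenvectors of A, the result is diagonal with the eigenvalues on the diagonal, and the trace (4) and determinant (3) are unchanged.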

Applications in ML & Data Science:
- Diagonalization
- Feature transformation
- Basis change
- Dimensionality reduction
- Spectral methods
- Matrix decomposition
- Kernel methods
- Pattern recognition

In [7]:

# Trace and diagonal
tr = torch.trace(B)            # Matrix trace (sum of the main diagonal of a square matrix)
diag = torch.diag(B)           # Extract diagonal
diag_mat = torch.diag(V)       # Create diagonal matrix from vector

## Frobenius Norm

Definition:

User-Friendly:
A measure of matrix size that considers all elements, calculated by taking square root of sum of squared elements, like measuring total energy in a system.

Analytical:
Matrix norm that treats matrix as vector of its elements, providing Euclidean-like measure of matrix magnitude while being computationally tractable.

Mathematical:
For matrix A ∈ ℝᵐˣⁿ:
$\|A\|_F$ = $\sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2}$ = $\sqrt{tr(A^TA)}$

Mathematical Notation:
$\|A\|_F$ (not $\|A\|_2$, which denotes the spectral norm of a matrix)
= $\sqrt{\sum_{i,j} |a_{ij}|^2}$
= $\sqrt{\sum_{i=1}^r \sigma_i^2}$ (using singular values, with r = rank(A))

Numerical Example:
A = $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$

$\|A\|_F$ = $\sqrt{2^2 + 3^2 + 1^2 + 4^2}$
= $\sqrt{4 + 9 + 1 + 16}$
= $\sqrt{30}$ ≈ 5.48

Applications in ML & Data Science:
- Loss function design
- Model regularization
- Matrix approximation
- Error measurement
- Weight initialization
- Gradient clipping
- Model comparison
- Convergence analysis

## Nuclear Norm

Definition:

User-Friendly:
Nuclear norm is sum of matrix's singular values, providing measure of matrix's "complexity" or "effective rank", useful in finding simple yet effective solutions.

Analytical:
A matrix norm equal to sum of singular values, serving as convex approximation of matrix rank and promoting low-rank solutions in optimization problems.

Mathematical:
For matrix A ∈ ℝᵐˣⁿ:
||A||* = $\sum_{i=1}^r \sigma_i$ = tr($\sqrt{A^TA}$)
where σᵢ are singular values and r = rank(A)

Mathematical Notation:
||A||* = $\sum_{i=1}^r σᵢ$
where σᵢ are singular values

Numerical Example:
A = $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$

Singular values: σ₁ ≈ 5.40, σ₂ ≈ 0.93 (square roots of the eigenvalues 15 ± 10√2 of A^TA)
||A||* = σ₁ + σ₂ ≈ 6.32
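
For a 2×2 matrix the singular values have a closed form: they are the square roots of the eigenvalues of AᵀA, which solve a quadratic. The plain-Python sketch below computes them for the example matrix; the function name `singular_values_2x2` is an assumption, and for larger matrices a routine such as `torch.linalg.svdvals` would be the practical choice.

```python
import math

def singular_values_2x2(M):
    """Singular values of a real 2x2 matrix: sqrt of the eigenvalues of M^T M."""
    (a, b), (c, d) = M
    # Entries of the symmetric matrix M^T M = [[p, q], [q, r]]
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    tr, det = p + r, p * r - q * q
    root = math.sqrt(tr * tr - 4 * det)
    return math.sqrt((tr + root) / 2), math.sqrt(max((tr - root) / 2, 0.0))

s1, s2 = singular_values_2x2([[2, 3], [1, 4]])
nuclear = s1 + s2                          # sum of singular values
spectral = s1                              # largest singular value
frobenius = math.sqrt(s1 ** 2 + s2 ** 2)   # equals sqrt(30) for this matrix

print(round(s1, 3), round(s2, 3), round(nuclear, 3))  # 5.398 0.926 6.325
```

A useful cross-check is that the product of the singular values equals |det(A)|, which is 5 here.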

Applications in ML & Data Science:
- Matrix completion
- Recommendation systems
- Dimensionality reduction
- Low-rank approximation
- Feature selection
- Collaborative filtering
- Robust PCA
- Signal recovery

## Spectral Norm

Definition:

User-Friendly:
Spectral norm represents maximum "stretching" factor of matrix transformation, measuring largest possible amplification when matrix acts on a unit vector.

Analytical:
Maximum singular value of matrix, equivalent to square root of largest eigenvalue of A^TA, representing operator's maximum gain.

Mathematical:
For matrix A ∈ ℝᵐˣⁿ:
||A||₂ = $\sqrt{λ_{max}(A^TA)}$ = σ₁
= max{||Ax||₂ : ||x||₂ = 1}
where σ₁ is largest singular value

Mathematical Notation:
||A||₂ or σ₁(A)
= max{||Ax||₂/||x||₂ : x ≠ 0}

Numerical Example:
A = $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$

Singular values: σ₁ ≈ 5.40, σ₂ ≈ 0.93
||A||₂ = σ₁ ≈ 5.40

Applications in ML & Data Science:
- Neural network stability
- Gradient clipping
- Model regularization
- Lipschitz continuity
- Robustness analysis
- Weight normalization
- Convergence analysis
- Network architecture design

## Vector Norm

Definition:

User-Friendly:
A measure of vector's "size" or "length", like measuring distance or magnitude. Different norms measure size in different ways, similar to measuring distance using different paths.

Analytical:
A function mapping vectors to non-negative real numbers, satisfying properties of non-negativity, homogeneity, and triangle inequality.

Mathematical:
For vector x ∈ ℝⁿ:
L₁ norm: ||x||₁ = $\sum_{i=1}^n |x_i|$
L₂ norm: ||x||₂ = $\sqrt{\sum_{i=1}^n x_i^2}$
L∞ norm: ||x||∞ = max|xᵢ|

Mathematical Notation:
General p-norm:
||x||ₚ = $(\sum_{i=1}^n |x_i|^p)^{1/p}$

Numerical Example:
x = $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$

L₁ norm: |3| + |4| = 7
L₂ norm: $\sqrt{3² + 4²}$ = 5
L∞ norm: max(|3|, |4|) = 4
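
All three norms are special cases of the general p-norm, with L∞ as the limiting case. A minimal plain-Python sketch (the function name `p_norm` is an assumption):

```python
def p_norm(x, p):
    """General p-norm of a vector; p = float('inf') gives the max norm."""
    if p == float("inf"):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1 / p)

x_norm = [3, 4]
print(p_norm(x_norm, 1))             # 7.0
print(p_norm(x_norm, 2))             # 5.0
print(p_norm(x_norm, float("inf")))  # 4
```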

Applications in ML & Data Science:
- Feature scaling
- Distance metrics
- Loss functions
- Regularization
- Outlier detection
- Gradient clipping
- Robustness analysis
- Model comparison

In [8]:

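# Norms (torch.linalg norms require floating-point or complex input)
A_f = A.to(torch.float)
norm_frobenius = LA.norm(A_f, 'fro')  # Frobenius norm
norm_nuclear = LA.norm(A_f, 'nuc')    # Nuclear norm (sum of singular values)
norm_spectral = LA.norm(A_f, 2)       # Spectral norm (largest singular value)
norm_vector = LA.vector_norm(V)       # Vector norm (2-norm by default)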

## Matrix Decomposition

Definition:

User-Friendly:
Breaking down a complex matrix into simpler component matrices, like factoring a number into primes. Each decomposition reveals different aspects of original matrix.

Analytical:
Factorization of a matrix into a product of matrices with special properties, enabling efficient computation and revealing underlying structure of linear transformations.

Mathematical:
Common decompositions:
- SVD: A = UΣV^T
- LU: A = LU
- QR: A = QR
- Eigendecomposition: A = PDP⁻¹ (if A diagonalizable)

Mathematical Notation:
SVD: A = UΣV^T where
U: orthogonal left singular vectors
Σ: diagonal singular values
V^T: orthogonal right singular vectors

Numerical Example:
A = $\begin{bmatrix} 4 & 0 \\ 3 & -5 \end{bmatrix}$

LU Decomposition:
L = $\begin{bmatrix} 1 & 0 \\ 3/4 & 1 \end{bmatrix}$
U = $\begin{bmatrix} 4 & 0 \\ 0 & -5 \end{bmatrix}$

Applications in ML & Data Science:
- Dimensionality reduction (PCA)
- Feature extraction
- Data compression
- Recommendation systems
- Image processing
- Noise reduction
- Matrix completion
- Spectral clustering

## Singular Value Decomposition (SVD)

Definition:

User-Friendly:
SVD breaks down any matrix into three components: directions of maximum variation (U), strength of these variations (Σ), and transformed directions (V^T), like decomposing complex motion into principal movements.

Analytical:
A fundamental matrix factorization that decomposes matrix into product of orthogonal and diagonal matrices, revealing geometric structure of linear transformation.

Mathematical:
For matrix A ∈ ℝᵐˣⁿ:
A = UΣV^T where
- U ∈ ℝᵐˣᵐ (orthogonal)
- Σ ∈ ℝᵐˣⁿ (diagonal)
- V^T ∈ ℝⁿˣⁿ (orthogonal)

Mathematical Notation:
A = UΣV^T
where σᵢ (diagonal elements of Σ) are singular values
U: left singular vectors
V: right singular vectors

Numerical Example:
A = $\begin{bmatrix} 4 & 0 \\ 3 & -5 \end{bmatrix}$

SVD (singular vectors are unique only up to sign):
U ≈ $\begin{bmatrix} 0.45 & 0.89 \\ 0.89 & -0.45 \end{bmatrix}$
Σ ≈ $\begin{bmatrix} 6.32 & 0 \\ 0 & 3.16 \end{bmatrix}$
V^T ≈ $\begin{bmatrix} 0.71 & -0.71 \\ 0.71 & 0.71 \end{bmatrix}$

Applications in ML & Data Science:
- Principal Component Analysis
- Data compression
- Dimensionality reduction
- Matrix approximation
- Recommendation systems
- Image processing
- Feature extraction
- Noise reduction

In [9]:

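# ===== Matrix Decompositions =====
# Singular Value Decomposition (SVD); the input must be floating point
U, S, Vh = LA.svd(A.to(torch.float))                       # Full SVD: U is 2×2, Vh is 4×4
U, S, Vh = LA.svd(A.to(torch.float), full_matrices=False)  # Economy SVD: U is 2×2, Vh is 2×4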

## Eigendecomposition

Definition:

User-Friendly:
Factorizes a square matrix into eigenvectors (directions that the transformation only scales) and eigenvalues (the scaling factors), revealing fundamental behavior of linear transformations.

Analytical:
A decomposition of square matrix expressing it as product of eigenvectors and diagonal matrix of eigenvalues, applicable when matrix has complete set of eigenvectors.

Mathematical:
For diagonalizable matrix A ∈ ℝⁿˣⁿ:
A = PDP⁻¹ where
- P: eigenvector matrix
- D: diagonal matrix of eigenvalues
- P⁻¹: inverse of eigenvector matrix

Mathematical Notation:
A = PDP⁻¹
where D = diag(λ₁,...,λₙ)
and P = [v₁|v₂|...|vₙ]
λᵢ: eigenvalues
vᵢ: eigenvectors

Numerical Example:
A = $\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$

Eigendecomposition:
P = $\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$
D = $\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$
P⁻¹ = $\begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix}$

Applications in ML & Data Science:
- Principal Component Analysis
- Spectral clustering
- Covariance analysis
- Dimensionality reduction
- Feature transformation
- Signal processing
- Graph analysis
- Dynamic systems

## Hermitian/Symmetric Eigendecomposition

Definition:

User-Friendly:
Special decomposition for symmetric matrices where eigenvalues are real and eigenvectors are orthogonal, simplifying many computations like finding principal directions in data.

Analytical:
Eigendecomposition of Hermitian/symmetric matrices featuring real eigenvalues and orthogonal eigenvectors, guaranteeing diagonalization with orthogonal matrix.

Mathematical:
For symmetric matrix A ∈ ℝⁿˣⁿ:
A = QΛQ^T where
- Q: orthogonal matrix of eigenvectors
- Λ: diagonal matrix of real eigenvalues
- Q^T = Q⁻¹ (orthogonality)

Mathematical Notation:
A = QΛQ^T
where Λ = diag(λ₁,...,λₙ)
Q = [q₁|q₂|...|qₙ]
⟨qᵢ,qⱼ⟩ = δᵢⱼ (orthonormality)

Numerical Example:
A = $\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$

Q = $\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}$
Λ = $\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$

Applications in ML & Data Science:
- Covariance matrix analysis
- Principal Component Analysis
- Kernel methods
- Spectral clustering
- Dimensionality reduction
- Signal processing
- Feature extraction
- Correlation analysis

In [10]:

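# Eigendecomposition (square, floating-point input)
B_f = B.to(torch.float)
eigenvals, eigenvecs = LA.eig(B_f)            # General eigendecomposition (complex-valued results)
eigenvals, eigenvecs = LA.eigh(B_f + B_f.T)   # Hermitian/symmetric eigendecomposition (input must be symmetric)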

## LU Decomposition

Definition:

User-Friendly:
Breaks matrix into product of lower triangular (L) and upper triangular (U) matrices, like solving equations step by step, making complex calculations simpler.

Analytical:
Factorization of matrix into lower and upper triangular matrices, useful for solving linear systems efficiently and avoiding repeated eliminations.

Mathematical:
For matrix A ∈ ℝⁿˣⁿ:
A = LU where
- L: lower triangular (lᵢⱼ = 0 for i < j)
- U: upper triangular (uᵢⱼ = 0 for i > j)
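- L typically has diagonal of 1s
- If a zero pivot appears, rows must be exchanged first, giving PA = LU with a permutation matrix P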

Mathematical Notation:
A = LU
L = [lᵢⱼ], lᵢⱼ = 0 for i < j
U = [uᵢⱼ], uᵢⱼ = 0 for i > j

Numerical Example:
A = $\begin{bmatrix} 4 & 3 \\ 6 & 3 \end{bmatrix}$

L = $\begin{bmatrix} 1 & 0 \\ 3/2 & 1 \end{bmatrix}$
U = $\begin{bmatrix} 4 & 3 \\ 0 & -1.5 \end{bmatrix}$
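
The factors in this example come from ordinary Gaussian elimination: each multiplier used to clear an entry below the diagonal is stored in L. The plain-Python sketch below shows the Doolittle variant without pivoting, using exact fractions; the name `lu_doolittle` is an assumption, and library routines such as `torch.linalg.lu` add row pivoting for numerical stability.

```python
from fractions import Fraction

def lu_doolittle(A):
    """LU factorization without pivoting; L has a unit diagonal."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(v) for v in row] for row in A]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: row exchanges would be needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]   # elimination multiplier
            U[i] = [u_i - L[i][k] * u_k for u_i, u_k in zip(U[i], U[k])]
    return L, U

L_ex, U_ex = lu_doolittle([[4, 3], [6, 3]])
print(L_ex == [[1, 0], [Fraction(3, 2), 1]])   # True
print(U_ex == [[4, 3], [0, Fraction(-3, 2)]])  # True
```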

Applications in ML & Data Science:
- Linear system solving
- Matrix inversion
- Determinant calculation
- Numerical stability analysis
- Network analysis
- Time series modeling
- System identification
- Pattern recognition

In [11]:

# LU Decomposition (with partial pivoting; floating-point input)
B_f = B.to(torch.float)
LU, pivots = LA.lu_factor(B_f)        # Compact LU factorization
P, L, U = LA.lu(B_f)                  # Complete LU decomposition, B_f = P @ L @ U

## QR Decomposition

Definition:

User-Friendly:
Decomposes matrix into orthogonal matrix (Q) and upper triangular matrix (R), like breaking movement into perpendicular directions and their magnitudes.

Analytical:
Factorization expressing matrix as product of orthogonal matrix and upper triangular matrix, useful for solving least squares and finding orthonormal bases.

Mathematical:
For matrix A ∈ ℝᵐˣⁿ:
A = QR where
- Q: orthogonal matrix (Q^TQ = I)
- R: upper triangular matrix
- Q^T = Q⁻¹ (orthogonality)

Mathematical Notation:
A = QR
Q ∈ ℝᵐˣᵐ (orthogonal)
R ∈ ℝᵐˣⁿ (upper triangular)
Q^TQ = I

Numerical Example:
A = $\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$

Q ≈ $\begin{bmatrix} 0.707 & 0.707 \\ 0.707 & -0.707 \end{bmatrix}$
R ≈ $\begin{bmatrix} 1.414 & 0.707 \\ 0 & 0.707 \end{bmatrix}$
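
One classical way to obtain Q and R is Gram-Schmidt orthogonalization of the columns of A. The plain-Python sketch below reproduces the example; the name `qr_gram_schmidt` is an assumption, the code assumes linearly independent columns, and production libraries generally use Householder reflections, which are more stable numerically.

```python
import math

def qr_gram_schmidt(A):
    """Reduced QR via classical Gram-Schmidt (assumes independent columns)."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        v = list(a)
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))   # projection coefficient
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]  # remove that component
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / R[j][j] for vk in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R

Q_ex, R_ex = qr_gram_schmidt([[1, 1], [1, 0]])
print([[round(v, 3) for v in row] for row in Q_ex])  # [[0.707, 0.707], [0.707, -0.707]]
print([[round(v, 3) for v in row] for row in R_ex])  # [[1.414, 0.707], [0.0, 0.707]]
```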

Applications in ML & Data Science:
- Least squares problems
- Linear regression
- Feature orthogonalization
- Eigenvalue algorithms
- Signal processing
- Matrix factorization
- Subspace tracking
- Optimization problems

In [12]:

# QR Decomposition (floating-point input)
Q, R = LA.qr(A.to(torch.float))       # QR decomposition (reduced mode by default)

## Cholesky Decomposition

Definition:

User-Friendly:
Special decomposition for symmetric positive-definite matrices into product of lower triangular matrix and its transpose, like finding "square root" of matrix.

Analytical:
Unique factorization of symmetric positive-definite matrix into product of lower triangular matrix and its transpose, computationally efficient and numerically stable.

Mathematical:
For symmetric positive-definite A ∈ ℝⁿˣⁿ:
A = LL^T where
- L: lower triangular matrix
- L^T: transpose of L (upper triangular)
- A must be symmetric and positive definite

Mathematical Notation:
A = LL^T
L ∈ ℝⁿˣⁿ (lower triangular)
A = A^T and x^TAx > 0 for x ≠ 0

Numerical Example:
A = $\begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix}$

L = $\begin{bmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{bmatrix}$
L^T = $\begin{bmatrix} 2 & 1 \\ 0 & \sqrt{2} \end{bmatrix}$
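
The entries of L can be computed row by row from the relation A = LLᵀ. The following plain-Python sketch shows one common ordering of that computation; the name `cholesky_lower` is an assumption, and the code reads only the lower triangle of its input, so symmetry is taken on trust.

```python
import math

def cholesky_lower(A):
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

L_chol = cholesky_lower([[4, 2], [2, 3]])
print(L_chol[0], L_chol[1][0])  # [2.0, 0.0] 1.0
```

The remaining entry `L_chol[1][1]` is √2, matching the example. A non-positive value under the square root is exactly how the routine detects a matrix that is not positive definite.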

Applications in ML & Data Science:
- Covariance matrix analysis
- Monte Carlo simulation
- Linear regression
- Optimization algorithms
- Gaussian processes
- Mahalanobis distance
- Kalman filtering
- Portfolio optimization

In [13]:

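# Cholesky Decomposition (input must be symmetric positive-definite)
B_f = B.to(torch.float)
spd = B_f @ B_f.T + torch.eye(4)      # Gram matrix plus identity is symmetric positive-definite
L = LA.cholesky(spd)                  # Cholesky decomposition, spd = L @ L.T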

In [14]:

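# ===== System Solving =====
# Solve linear system Bx = b (solve needs a square, invertible, floating-point matrix)
B_f = B.to(torch.float)
b = torch.randn(4)
x = LA.solve(B_f, b)                              # Solve B_f @ x = b
x_ls = LA.lstsq(B_f, b.unsqueeze(-1)).solution    # Least squares solution (lstsq returns a named tuple)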

In [15]:

# Matrix inverse
inv_B = LA.inv(B.to(torch.float))     # Matrix inverse (square, invertible matrices only)
pinv_A = LA.pinv(A.to(torch.float))   # Pseudoinverse (works for any shape)

In [16]:

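# ===== Advanced Operations =====
# Matrix functions (square, floating-point matrices only)
B_f = B.to(torch.float)
exp_B = LA.matrix_exp(B_f / 100)      # Matrix exponential (input scaled down to avoid overflow)
power_B = LA.matrix_power(B_f, 3)     # Matrix power B_f @ B_f @ B_f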

In [17]:
# Matrix products
A_f, B_f = A.to(torch.float), B.to(torch.float)
multi_mm = LA.multi_dot([A_f, B_f, A_f.T])  # Chain matrix multiplication (2×4)(4×4)(4×2); replaces deprecated torch.chain_matmul


In [18]:
# Triangular operations
upper_tri = torch.triu(A)             # Extract upper triangular
lower_tri = torch.tril(A)             # Extract lower triangular

In [19]:

# ===== Additional Properties =====
def matrix_properties(A):
    """Compute various matrix properties"""
    props = {
        'is_symmetric': torch.allclose(A, A.T),
        'is_positive_definite': is_positive_definite(A),
        'condition_number': LA.cond(A),
        'rank': LA.matrix_rank(A),
        'determinant': LA.det(A),
        'trace': torch.trace(A)
    }
    return props

In [20]:

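def is_positive_definite(A):
    """Check if matrix is symmetric positive definite"""
    if not torch.allclose(A, A.T):    # Cholesky reads only one triangle, so test symmetry first
        return False
    try:
        LA.cholesky(A)
        return True
    except RuntimeError:
        return False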

In [21]:
# ===== Special Matrices =====
def create_special_matrices(n):
    """Create various special matrices"""
    random = torch.randn(n, n)
    matrices = {
        'identity': torch.eye(n),
        'zeros': torch.zeros(n, n),
        'ones': torch.ones(n, n),
        'random': random,
        'orthogonal': LA.qr(torch.randn(n, n))[0],
        'symmetric': (random + random.T) / 2,   # Symmetric part of the random matrix
        'diagonal': torch.diag(torch.randn(n))
    }
    return matrices

# ===== Performance Optimized Operations =====
# Batch operations
batch_A = torch.randn(10, 3, 3)  # Batch of matrices
batch_b = torch.randn(10, 3)     # Batch of vectors

# Batch matrix multiplication
batch_C = torch.bmm(batch_A, batch_A.transpose(1, 2))  # Batch matrix-matrix product
batch_x = LA.solve(batch_A, batch_b.unsqueeze(-1))     # Batch solve