## Cholesky Decomposition
Cholesky Decomposition is a method used to decompose a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. It's essentially a square root of a matrix, specifically for positive-definite matrices.

### Resources
- [Explanation](https://www.geeksforgeeks.org/cholesky-decomposition-matrix-decomposition/)
- [Example](https://www.youtube.com/watch?v=NppyUqgQqd0)

### Intuition

Imagine you have a positive-definite matrix `A`. This matrix could represent something like the correlations between different variables in your dataset. Now, you want to "simplify" this matrix while keeping its essential features. This is where Cholesky Decomposition comes in.

Cholesky Decomposition breaks down a positive-definite matrix into a product of a lower triangular matrix and its transpose. The lower triangular matrix is like a simpler "building block" that, when multiplied by its transpose, gives back the original matrix.

Mathematically, if `A` is your positive-definite matrix, Cholesky Decomposition gives you a lower triangular matrix `L` such that `A = LL*`, where `L*` is the conjugate transpose of `L`.
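To make the factorization concrete, here is a minimal sketch using NumPy. The matrix `A` below is a made-up positive-definite example chosen for illustration; it is not taken from any dataset.

```python
import numpy as np

# Hypothetical symmetric positive-definite matrix, chosen for illustration
A = np.array([[4.0, 2.0, 0.6],
              [2.0, 5.0, 1.0],
              [0.6, 1.0, 3.0]])

# np.linalg.cholesky returns the lower triangular factor L
L = np.linalg.cholesky(A)

# L should have zeros above the diagonal
is_lower = np.allclose(L, np.tril(L))

# For a real matrix the conjugate transpose L* is simply L.T
reconstructs = np.allclose(L @ L.T, A)

print(L)
print("lower triangular:", is_lower, "| L @ L.T == A:", reconstructs)
```

If `A` were not positive-definite, `np.linalg.cholesky` would raise `numpy.linalg.LinAlgError` instead of returning a factor.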

### Use Cases in Machine Learning

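1. **Covariance Matrix Decomposition**: Covariance matrices are symmetric and positive semi-definite, and positive-definite whenever no variable is an exact linear combination of the others. Cholesky Decomposition factors such a matrix into a triangular matrix of the same size. Triangular systems are cheap to solve by forward and back substitution, which can make computations more efficient and stable.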

2. **Multivariate Normal Distribution**: When generating samples from a multivariate normal distribution, we need to ensure they have the correct covariance structure. Cholesky Decomposition is used to achieve this.

3. **Linear Least Squares for Multiple Outputs**: In multiple-output linear least squares problems, Cholesky Decomposition of the Gram matrix (a matrix representing inner products of vectors) can make the problem easier to solve.

4. **Preconditioning**: In optimization, Cholesky Decomposition can be used to transform a problem into a form that's easier for algorithms like the conjugate gradient method to solve.

5. **Kalman Filters**: Kalman filters are used in systems with linear dynamics to estimate the state of the system from noisy measurements. Cholesky Decomposition is used to update the covariance matrix, which is a key part of the Kalman filter algorithm.
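As an illustration of the least squares use case above, the sketch below solves a two-output regression through the Cholesky factor of the Gram matrix `X.T @ X`. The data is synthetic and the names `X`, `Y`, `W_true` are assumptions made up for this example. `np.linalg.solve` is used for simplicity, although it does not exploit the triangular structure of `L`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem with two output columns (illustrative data)
X = rng.normal(size=(50, 3))
W_true = rng.normal(size=(3, 2))
Y = X @ W_true + 0.01 * rng.normal(size=(50, 2))

# Normal equations: (X^T X) W = X^T Y, where G = X^T X is the Gram matrix
G = X.T @ X
L = np.linalg.cholesky(G)          # G = L @ L.T

# Two solves: first L Z = X^T Y, then L^T W = Z
Z = np.linalg.solve(L, X.T @ Y)
W = np.linalg.solve(L.T, Z)

# Reference solution from NumPy's general least squares routine
W_ref = np.linalg.lstsq(X, Y, rcond=None)[0]
print("matches lstsq:", np.allclose(W, W_ref))
```

The same factor `L` is reused for every output column, which is where the saving comes from when there are many outputs.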

### Formal Definition

Cholesky Decomposition is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. It is named after André-Louis Cholesky, who found that real, symmetric, positive-definite matrices could be written as the product of a lower triangular matrix and its transpose.

If `A` is a Hermitian, positive-definite matrix, then `A` can be written as `LL*` where `L` is a lower triangular matrix with real and positive diagonal entries, and `L*` is the conjugate transpose of `L`.

### Use Cases in Machine Learning

1. **Covariance Matrix Decomposition**: In statistics and machine learning, the covariance matrix of a dataset often needs to be decomposed for further analysis. Cholesky Decomposition applies here because a covariance matrix is symmetric and positive semi-definite, and strictly positive-definite when its variables are not linearly dependent.

2. **Multivariate Normal Distribution**: When generating samples from a multivariate normal distribution, the covariance matrix is decomposed using Cholesky Decomposition to transform independent standard normally distributed random variables to the desired covariance.

3. **Linear Least Squares for Multiple Outputs**: When solving a linear least squares problem for multiple outputs, a Cholesky Decomposition of the Gram matrix is often used.

4. **Preconditioning**: In optimization, an incomplete Cholesky factorization is commonly used as a preconditioner for the conjugate gradient method.

5. **Kalman Filters**: In Kalman Filters, which are used for linear dynamical systems, Cholesky Decomposition is used to update the covariance matrix, which is a key part of the Kalman filter algorithm.


Generating samples from a distribution means creating data points whose spread matches the properties of that distribution. For a Gaussian (or normal) distribution, those properties are the mean (the peak of the distribution) and the standard deviation (how wide the distribution is).

This is a common task in machine learning and statistics, as it allows us to create synthetic data that behaves like the real-world data we're interested in. This can be useful for testing algorithms, making predictions, and understanding the behavior of different statistical distributions.

In the context of the Cholesky decomposition example, generating samples from a multivariate Gaussian distribution means creating vectors of numbers where the distribution of vectors follows a multivariate Gaussian distribution. This distribution is specified by a mean vector (the average vector) and a covariance matrix (which indicates how each pair of elements in the vectors varies together).

```python
import numpy as np

# Desired mean and covariance
mu = np.array([0, 0])
Sigma = np.array([[1, 0.6], [0.6, 1]])

# Cholesky decomposition of the covariance matrix (Sigma = L @ L.T)
L = np.linalg.cholesky(Sigma)

# Generate independent standard normal samples, one row per sample
n_samples = 1000
Z = np.random.normal(size=(n_samples, 2))

# Each row z becomes mu + L @ z, written for the whole batch as Z @ L.T
X = mu + np.dot(Z, L.T)

# The rows of X are now samples from a Gaussian distribution with mean mu and covariance Sigma
```
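A quick way to convince yourself that the sampling recipe works is to compare the empirical mean and covariance of the samples with the targets. This is a sketch under assumptions: the seed, the larger sample size, and the variable names `sample_mean` and `sample_cov` are choices made here for the check, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the check is repeatable

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)

# A larger sample makes the empirical estimates tighter
n_samples = 100_000
Z = rng.normal(size=(n_samples, 2))
X = mu + Z @ L.T

# Empirical statistics should be close to mu and Sigma
sample_mean = X.mean(axis=0)
sample_cov = np.cov(X, rowvar=False)

print("sample mean:", sample_mean)
print("sample covariance:")
print(sample_cov)
```

The reason this works is that if `z` has identity covariance, then `L @ z` has covariance `L @ L.T`, which is `Sigma` by construction.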

# Eigendecomposition and Diagonalization

### Resources
- [Explain](https://builtin.com/data-science/eigendecomposition)
- [Relation to PCA](https://pages.mtu.edu/~shanem/psy5220/daily/Day04/PCA.html#:~:text=Similarly%20V3%20is%20%27average%20of,the%20most%20to%20least%20important.)
- [Great Code Demo](https://youtu.be/oshZQtYAh84?si=0O0nt2K-tqj2RW98)

### Intuition

Eigendecomposition is the process of breaking down a matrix into its constituent parts. It's a kind of 'factorization' where we represent the matrix in terms of its eigenvalues and eigenvectors. 

If `A` is a diagonalizable square matrix, we can write it as `A = PDP^-1`, where `D` is a diagonal matrix containing the eigenvalues of `A`, and `P` is a matrix whose columns are the corresponding eigenvectors of `A`. This is the eigendecomposition of `A`.

Diagonalization is the same factorization seen from the other side. A matrix is diagonalizable if it is similar to a diagonal matrix, i.e., if its eigenvectors form a basis. In other words, an `n x n` matrix is diagonalizable exactly when it has `n` linearly independent eigenvectors, so that `P` is invertible and the eigendecomposition `A = PDP^-1` exists. Not every square matrix qualifies: `[[1, 1], [0, 1]]` has only one independent eigenvector.
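Here is a small sketch of the factorization `A = PDP^-1` in NumPy. The matrix `A` is a made-up symmetric example, picked so that it is guaranteed to be diagonalizable.

```python
import numpy as np

# Illustrative symmetric matrix (symmetric matrices are always diagonalizable)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigvals[i] is the eigenvalue belonging to the column P[:, i]
eigvals, P = np.linalg.eig(A)
D = np.diag(eigvals)

# Rebuild A from its eigenvalues and eigenvectors
A_rebuilt = P @ D @ np.linalg.inv(P)

# Each column of P satisfies the defining equation A v = lambda v
first_pair_ok = np.allclose(A @ P[:, 0], eigvals[0] * P[:, 0])

print("A == P D P^-1:", np.allclose(A, A_rebuilt))
print("A v = lambda v:", first_pair_ok)
```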

### Explanation

The power of eigendecomposition and diagonalization lies in simplifying the matrix to make it easier to work with. Diagonal matrices have the property that they are easy to raise to a power or invert, which can be useful in many mathematical contexts.
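To see why diagonal matrices are convenient, note that `A^k = P D^k P^-1` and `A^-1 = P D^-1 P^-1`, so powers and inverses only require operating on the diagonal entries. The sketch below checks both identities on a made-up example matrix; `k = 5` is an arbitrary choice.

```python
import numpy as np

# Illustrative diagonalizable matrix with nonzero eigenvalues
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals, P = np.linalg.eig(A)
P_inv = np.linalg.inv(P)

# Raising A to a power only requires raising the eigenvalues to that power
k = 5
A_pow = P @ np.diag(eigvals ** k) @ P_inv

# Inverting A only requires inverting the (nonzero) eigenvalues
A_inv = P @ np.diag(1.0 / eigvals) @ P_inv

print("power matches:", np.allclose(A_pow, np.linalg.matrix_power(A, k)))
print("inverse matches:", np.allclose(A_inv, np.linalg.inv(A)))
```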

### Use Cases in Machine Learning

1. **Principal Component Analysis (PCA)**: PCA is a technique used to reduce the dimensionality of data. It works by computing the eigendecomposition of the data's covariance matrix, and using the eigenvectors (principal components) to project the data into a lower-dimensional space.

2. **Spectral Clustering**: Spectral clustering techniques use the eigendecomposition of a graph Laplacian built from the data's similarity matrix, and then cluster the data in the lower-dimensional space spanned by a few of its eigenvectors.

3. **Linear Discriminant Analysis (LDA)**: LDA is a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. It involves solving the eigenproblem for `S_W^-1 S_B`, where `S_W` is the within-class scatter matrix and `S_B` is the between-class scatter matrix.

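4. **Deep Learning**: In deep learning, eigendecomposition is mostly an analysis tool rather than a step inside the training loop. The eigenvalues of the loss Hessian describe the curvature that optimizers have to cope with (Adam and RMSProp themselves use per-parameter running averages of gradients, not an eigendecomposition), orthogonal weight initialization draws on orthogonal matrix factors, and the spectral norm of a layer (its largest singular value) is used to reason about and constrain the model's capacity.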
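The PCA use case from the list above can be sketched in a few lines. This is a minimal illustration on synthetic data, assuming one sample per row; the `mixing` matrix and the choice of `k = 2` components are made up for the example. `np.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data: 200 samples, 3 features (illustrative only)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing

# Center the data and compute the sample covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues in ascending order, so reorder to descending
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top k eigenvectors (principal components) and project onto them
k = 2
components = eigvecs[:, :k]
X_reduced = Xc @ components

# Fraction of total variance kept by the first k components
explained = eigvals[:k].sum() / eigvals.sum()
print("reduced shape:", X_reduced.shape, "| variance kept:", explained)
```

The variance of the data along each principal component equals the corresponding eigenvalue, which is why sorting by eigenvalue ranks the components by importance.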

## Relationship between Eigendecomposition and the Determinant of a Matrix

The determinant of a square matrix is a special number that can be calculated from its elements. It has two important properties related to eigendecomposition:

1. **Determinant and Eigenvalues**: The determinant of a matrix is equal to the product of its eigenvalues (including their multiplicities). This is true regardless of the basis in which the matrix is expressed. If `A = PDP^-1` is the eigendecomposition of `A`, then `det(A) = det(D)`, where `det` denotes the determinant.

2. **Determinant and Invariance**: The determinant of a matrix is invariant under change of basis. This means that if we perform an eigendecomposition or any other similarity transformation on the matrix, its determinant remains the same.

These properties make the determinant a useful tool in linear algebra and matrix analysis, as it allows us to infer global properties of the matrix from its eigenvalues.
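Both determinant properties are easy to check numerically. The matrices `A` and `Q` below are arbitrary examples made up for this sketch; `Q` only needs to be invertible.

```python
import numpy as np

# Illustrative non-symmetric matrix
A = np.array([[4.0, 1.0, 2.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])

det_A = np.linalg.det(A)

# Property 1: the determinant equals the product of the eigenvalues
eigvals = np.linalg.eigvals(A)
eig_product = np.prod(eigvals).real

# Property 2: a change of basis B = Q A Q^-1 leaves the determinant unchanged
Q = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
B = Q @ A @ np.linalg.inv(Q)
det_B = np.linalg.det(B)

print("det(A):", det_A)
print("product of eigenvalues:", eig_product)
print("det(Q A Q^-1):", det_B)
```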
## Relationship between Eigendecomposition and the Trace of a Matrix

The trace of a square matrix is the sum of its diagonal elements. It has two important properties related to eigendecomposition:

1. **Trace and Eigenvalues**: The trace of a matrix is equal to the sum of its eigenvalues (including their multiplicities). This is true regardless of the basis in which the matrix is expressed. If `A = PDP^-1` is the eigendecomposition of `A`, then `Tr(A) = Tr(D)`, where `Tr` denotes the trace.

2. **Trace and Invariance**: The trace of a matrix is invariant under change of basis. This means that if we perform an eigendecomposition or any other similarity transformation on the matrix, its trace remains the same.

These properties make the trace a useful tool in linear algebra and matrix analysis, as it allows us to infer global properties of the matrix from its eigenvalues.
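The same kind of numerical check works for the trace. As before, `A` and `Q` are arbitrary example matrices invented for this sketch.

```python
import numpy as np

# Illustrative non-symmetric matrix
A = np.array([[4.0, 1.0, 2.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])

trace_A = np.trace(A)

# Property 1: the trace equals the sum of the eigenvalues
eigvals = np.linalg.eigvals(A)
eig_sum = np.sum(eigvals).real

# Property 2: a change of basis B = Q A Q^-1 leaves the trace unchanged
Q = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
B = Q @ A @ np.linalg.inv(Q)
trace_B = np.trace(B)

print("Tr(A):", trace_A)
print("sum of eigenvalues:", eig_sum)
print("Tr(Q A Q^-1):", trace_B)
```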
## Relationship between Eigendecomposition and Singular Value Decomposition (SVD)

Eigendecomposition and SVD are both methods of factorizing a matrix, but they are used for different types of matrices and provide different insights.

### Eigendecomposition

Eigendecomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way.

If `A` is a diagonalizable square `n x n` matrix, we can write it as `A = PDP^-1`, where `D` is a diagonal matrix containing the eigenvalues of `A`, and `P` is a matrix whose columns are the eigenvectors of `A`.

### Singular Value Decomposition (SVD)

SVD is a factorization of a real or complex matrix. It has many useful applications in signal processing and statistics.

For a given `m x n` matrix `M`, the SVD is written as `M = UΣV*`, where `U` (`m x m`) and `V` (`n x n`) are unitary matrices and `Σ` is an `m x n` rectangular diagonal matrix whose non-negative diagonal entries are the singular values of `M`.

### Relationship

The main difference between the two is that SVD can be applied to any `m x n` matrix, whereas eigendecomposition can only be applied to diagonalizable matrices, which are necessarily square.

The columns of `U` in the SVD are actually the eigenvectors of `MM*`, and the columns of `V` are the eigenvectors of `M*M`. The singular values in `Σ` are the square roots of the nonzero eigenvalues of `M*M` (equivalently, of `MM*`).

In other words, SVD extends the idea of eigendecomposition to arbitrary matrices, including non-square ones. For a symmetric positive semi-definite matrix the two factorizations coincide.
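The link between the two factorizations can be verified directly. The sketch below uses a random real `4 x 3` matrix (so `M*` is just `M.T`); the seed and shape are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 3))  # arbitrary non-square real matrix

# Reduced SVD: U is 4x3, s holds the 3 singular values, Vh is V transposed
U, s, Vh = np.linalg.svd(M, full_matrices=False)
V = Vh.T

# Eigenvalues of M^T M, reordered to descending to match the order of s
eig_MtM = np.linalg.eigvalsh(M.T @ M)[::-1]
squares_match = np.allclose(s ** 2, eig_MtM)

# Columns of V are eigenvectors of M^T M with eigenvalues s**2
V_is_eigvecs = np.allclose(M.T @ M @ V, V * s ** 2)

# Columns of U are eigenvectors of M M^T with the same eigenvalues
U_is_eigvecs = np.allclose(M @ M.T @ U, U * s ** 2)

print(squares_match, V_is_eigvecs, U_is_eigvecs)
```

Here `V * s ** 2` scales column `i` of `V` by `s[i] ** 2`, which is the right-hand side of the eigenvector equation written for all columns at once.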

# Singular Value Decomposition

### Resources
- [MUST WATCH](https://www.youtube.com/watch?v=_So8j8T-E1o&list=PLRDl2inPrWQW1QSWhBU0ki-jq_uElkh2a&index=42)
- [Data Compression with SVD](https://www.youtube.com/watch?v=RscKCtF--NI&list=PLRDl2inPrWQW1QSWhBU0ki-jq_uElkh2a&index=43)
### Intuition

Singular Value Decomposition (SVD) is a method of decomposing a matrix into three other matrices. If `M` is an `m x n` matrix, the SVD is written as `M = UΣV*`, where:

- `U` is an `m x m` unitary matrix
- `Σ` is an `m x n` rectangular diagonal matrix with non-negative real entries, conventionally sorted in descending order
- `V*` (the conjugate transpose of `V`) is an `n x n` unitary matrix

The diagonal entries of `Σ` are the singular values of `M`. The columns of `U` are the left singular vectors (eigenvectors of `MM*`), and the columns of `V` are the right singular vectors (eigenvectors of `M*M`).
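The shapes listed above can be checked with `np.linalg.svd`. Note that NumPy returns the singular values as a 1-D array and returns `V*` directly (called `Vh` here), so the `m x n` matrix `Σ` has to be assembled by hand. The `2 x 3` matrix `M` is a made-up example.

```python
import numpy as np

# Illustrative 2x3 matrix (m = 2, n = 3)
M = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# By default NumPy returns the full SVD: U is m x m and Vh is n x n
U, s, Vh = np.linalg.svd(M)

# Assemble the m x n rectangular diagonal matrix from the singular values
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print("shapes:", U.shape, Sigma.shape, Vh.shape)
print("U Sigma V* == M:", np.allclose(U @ Sigma @ Vh, M))
print("U unitary:", np.allclose(U.T @ U, np.eye(2)))
print("V unitary:", np.allclose(Vh @ Vh.T, np.eye(3)))
```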

### Explanation

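SVD identifies an orthonormal set of directions in the input space (the columns of `V`) that `M` maps onto an orthonormal set of directions in the output space (the columns of `U`). The singular values (diagonal elements of `Σ`) tell us the amount of stretching or scaling that happens along each of these directions: `M v_i = σ_i u_i`.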

### Use Cases in Machine Learning

1. **Principal Component Analysis (PCA)**: PCA can be performed using SVD. For mean-centered data with `n` samples (one per row), the right singular vectors `V` correspond to the principal components, and the eigenvalues of the covariance matrix are the squared singular values in `Σ` divided by `n - 1`.

2. **Latent Semantic Analysis (LSA)**: In natural language processing, LSA uses SVD to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.

3. **Image Compression**: SVD can be used for image compression by approximating an image matrix with a low-rank matrix.

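4. **Data Science**: Low-rank approximations built from SVD are used in data science to predict unseen entries (for example, user-item ratings in a recommender system), fill in missing data, and surface the dominant patterns in a dataset.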
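The compression idea from the list above boils down to keeping only the largest `k` singular values. The sketch below uses a random array as a stand-in for a grayscale image; the size `60 x 40` and the rank `k = 10` are arbitrary assumptions. It also checks a known property of the truncated SVD: the Frobenius-norm error equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a grayscale image (random values, for illustration only)
image = rng.random((60, 40))

U, s, Vh = np.linalg.svd(image, full_matrices=False)

# Keep only the k largest singular values and their singular vectors
k = 10
approx = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

# Frobenius-norm error of the approximation
error = np.linalg.norm(image - approx)
expected_error = np.sqrt(np.sum(s[k:] ** 2))

# Numbers stored: k columns of U, k singular values, k rows of Vh
stored = k * (image.shape[0] + image.shape[1] + 1)

print("error:", error, "| predicted from discarded singular values:", expected_error)
print("values stored:", stored, "instead of", image.size)
```

Real images compress much better than random noise, because most of their energy sits in the first few singular values.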

## Geometric Intuition of Singular Value Decomposition (SVD)

The SVD of a matrix `M` is a factorization of the form `M = UΣV*`. Geometrically, this factorization represents a series of transformations to a given vector `x` in the input space:

1. **Rotation or Reflection (`V*`)**: The first transformation is a rotation or reflection that aligns the principal input directions of `M` with the coordinate axes. This is represented by the matrix `V*`, whose rows are the (conjugated) right singular vectors of `M`; equivalently, the right singular vectors are the columns of `V`.

2. **Scaling (`Σ`)**: The next transformation is a scaling that stretches or shrinks the vector along each of the principal axes. This is represented by the matrix `Σ`, which is a diagonal matrix containing the singular values of `M`. The singular values are the lengths of the semi-axes of the hyperellipse that `M` maps the unit sphere onto.

3. **Rotation or Reflection (`U`)**: The final transformation is another rotation or reflection that maps the principal axes to the axes of the output space. This is represented by the matrix `U`, whose columns are the left singular vectors of `M`.

In summary, any linear transformation represented by a matrix `M` can be decomposed into a rotation or reflection, a scaling, and another rotation or reflection. This is the geometric interpretation of the SVD.
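The three steps can be traced on a single vector. This is a sketch with a made-up `2 x 2` matrix `M` and vector `x`; it also samples the unit circle to check that the longest and shortest image vectors have lengths equal to the two singular values.

```python
import numpy as np

# Illustrative 2x2 transformation and input vector
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, -0.5])

U, s, Vh = np.linalg.svd(M)

# Step 1: rotate or reflect the input with V*
step1 = Vh @ x
# Step 2: scale each coordinate by the matching singular value
step2 = s * step1
# Step 3: rotate or reflect into the output space with U
step3 = U @ step2

print("three steps == M @ x:", np.allclose(step3, M @ x))

# Map points on the unit circle through M and measure their lengths
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.stack([np.cos(theta), np.sin(theta)])
lengths = np.linalg.norm(M @ circle, axis=0)

print("longest / shortest image vector:", lengths.max(), lengths.min())
print("singular values:", s)
```

Step 1 preserves the length of `x`, so all of the stretching happens in step 2.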

## Notes for whoever reads this:
I had an easier time watching YouTube videos explaining SVD, which helped push my understanding. I recommend going on a YouTube binge and reading some papers to advance your understanding of these decompositions.