# Resources
- 3Blue1Brown
    - [3-d linear Transformations](https://youtu.be/rHLEWRxRGiM)
    - [Inverse Matrices, column space, and null space](https://youtu.be/uQhTuRlWMxw)
    - [Nonsquare Matrices as transformations between dimensions](https://youtu.be/v8VSDg_WQlA)
    - [Cross products in the light of linear Transformations](https://youtu.be/BaM7OCEm3G0)
    - [Eigenvectors and Eigenvalues](https://youtu.be/PFDu9oVAE-g)
    - [Quick trick for computing eigenvalues](https://youtu.be/e50Bj7jn9IQ)
    - [Abstract vector spaces](https://youtu.be/TgKwz5Ikpc8)

# Linear Mappings

### Definition
A linear mapping (also known as a linear transformation or linear function) is a function between two vector spaces that preserves the operations of vector addition and scalar multiplication.

### Intuition
The intuition behind a linear mapping is that it's a way to "transform" vectors from one space to another while preserving the "structure" of the space in terms of vector addition and scalar multiplication. This means that straight lines remain straight, and the origin remains fixed.

### Explanation
Formally, a function `f: V -> W` between two vector spaces `V` and `W` is a linear mapping if for every two vectors `v1, v2` in `V` and every scalar `c`, the following two conditions hold:
1. `f(v1 + v2) = f(v1) + f(v2)` (preserves vector addition)
2. `f(c * v1) = c * f(v1)` (preserves scalar multiplication)

### Example
An example of a linear mapping is the function `f: R^2 -> R^2` defined by `f([x, y]) = [2x, 3y]`. This function doubles the first component of any vector and triples the second component, which is a linear transformation.
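
The two conditions can be spot-checked numerically for this example. The sketch below uses plain Python lists; the helper names `add` and `scale`, the sample vectors, and the counterexample `shift` are assumptions made for illustration. Passing on a few vectors is a sanity check, not a proof of linearity.

```python
def f(v):
    """The example map f([x, y]) = [2x, 3y]."""
    x, y = v
    return [2 * x, 3 * y]

def shift(v):
    """Hypothetical counterexample: adding a constant is not linear."""
    x, y = v
    return [x + 1, y]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    return [c * a for a in v]

v1, v2, c = [1, 2], [3, -1], 5

# Condition 1: f(v1 + v2) == f(v1) + f(v2)
f_is_additive = f(add(v1, v2)) == add(f(v1), f(v2))
# Condition 2: f(c * v1) == c * f(v1)
f_is_homogeneous = f(scale(c, v1)) == scale(c, f(v1))
# The shifted map already fails the first condition
shift_is_additive = shift(add(v1, v2)) == add(shift(v1), shift(v2))

print(f_is_additive, f_is_homogeneous, shift_is_additive)  # True True False
```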

### Properties
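1. **Matrix Representation:** Every linear mapping between finite-dimensional vector spaces can be represented by a matrix once bases for the domain and codomain are chosen. The action of the linear mapping on a vector is then given by matrix-vector multiplication.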
2. **Preservation of Linear Combinations:** Linear mappings preserve linear combinations. This means that if `v = c1*v1 + c2*v2`, then `f(v) = c1*f(v1) + c2*f(v2)`.

### Linear Mapping ML Applications
1. **Data Transformation:** Linear mappings are often used to transform data in machine learning. For example, Principal Component Analysis (PCA) uses a linear mapping to transform the original data into a new coordinate system in which the basis vectors are the principal components of the data.

2. **Feature Scaling:** Rescaling each feature by a constant factor (for example, dividing by its standard deviation or its range) is a linear mapping given by a diagonal matrix. Standardization and min-max normalization also subtract an offset (the mean or the minimum), which makes them affine mappings rather than strictly linear ones.

3. **Linear Regression:** In linear regression, the relationship between the predictors and the response is modeled as a linear mapping. The coefficients in the linear regression model define this mapping.

4. **Neural Networks:** In a neural network without activation functions, each layer applies a linear mapping to its inputs to produce its outputs. The weights of the network define these linear mappings.

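5. **Support Vector Machines (SVMs):** A linear SVM separates the data with a hyperplane in the original feature space. Kernel SVMs implicitly map the data to a higher-dimensional space, usually through a non-linear feature map, and then fit a linear separator there; a purely linear mapping cannot make non-separable data linearly separable.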

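6. **Eigen-decomposition and Singular Value Decomposition (SVD):** These matrix decompositions, which are used in many machine learning algorithms, find bases in which a linear mapping (represented by a matrix) becomes diagonal. Eigen-decomposition uses a single basis of eigenvectors and exists only for diagonalizable square matrices, while the SVD uses two orthonormal bases, one for the domain and one for the codomain, and exists for every matrix.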

Note: The power of linear mappings in machine learning comes from their simplicity and from the fact that they preserve linear combinations, so their behavior on a whole space is determined by their behavior on a basis. However, real-world data is often not linearly structured, which is why many machine learning algorithms, like neural networks, combine linear mappings with non-linear activation functions.
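
One way to see why the non-linearity matters: two linear layers applied in sequence are equivalent to a single linear layer whose matrix is the product of the two. The sketch below uses made-up weight matrices `W1` and `W2` and a ReLU as the activation; these are illustrative assumptions, not values from any particular model.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    return [max(0, a) for a in v]

W1 = [[1, 2], [0, 1], [3, -1]]   # hypothetical layer 1: R^2 -> R^3
W2 = [[2, 0, 1], [1, 1, 0]]      # hypothetical layer 2: R^3 -> R^2
x = [4, -2]

two_layers = matvec(W2, matvec(W1, x))        # apply the layers one after another
one_layer = matvec(matmul(W2, W1), x)         # apply the single matrix W2 * W1
with_relu = matvec(W2, relu(matvec(W1, x)))   # insert a ReLU between the layers

print(two_layers, one_layer, with_relu)  # [14, -2] [14, -2] [14, 0]
```

Without the activation the extra layer adds no expressive power; with the ReLU in between, the result is no longer given by the product `W2 * W1`.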

## Injective, Surjective, and Bijective Linear Mappings

### Definitions

- **Injective (One-to-One) Linear Mapping:** A linear mapping `f: V -> W` is injective if different vectors in `V` always map to different vectors in `W`. In other words, if `f(v1) = f(v2)`, then `v1` must equal `v2`.

- **Surjective (Onto) Linear Mapping:** A linear mapping `f: V -> W` is surjective if every vector in `W` is the image of at least one vector in `V`. In other words, for every `w` in `W`, there exists a `v` in `V` such that `f(v) = w`.

- **Bijective (One-to-One and Onto) Linear Mapping:** A linear mapping `f: V -> W` is bijective if it is both injective and surjective. This means that `f` is a one-to-one correspondence between `V` and `W`.

### Intuition

These concepts relate to how vectors in one space map to vectors in another space. An injective mapping ensures that distinct vectors remain distinct when transformed, a surjective mapping ensures that the entire target space is covered, and a bijective mapping does both.

### Example

Consider the linear mapping `f: R -> R` defined by `f(x) = 2x`. This mapping is both injective and surjective, and therefore bijective. Different real numbers map to different real numbers, and every real number is the image of some real number, so the mapping is one-to-one and onto.
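
For a map given by a matrix with `m` rows and `n` columns, injectivity and surjectivity can be read off from the rank: the map is injective exactly when the rank equals `n`, and surjective exactly when the rank equals `m`. The sketch below is a minimal illustration using exact fractions; the function names and the example matrices `embed` and `project` are assumptions chosen for this note.

```python
from fractions import Fraction

def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_injective(matrix):
    return rank(matrix) == len(matrix[0])   # rank equals number of columns

def is_surjective(matrix):
    return rank(matrix) == len(matrix)      # rank equals number of rows

double = [[2]]                         # f(x) = 2x, R -> R
embed = [[1, 0], [0, 1], [1, 1]]       # hypothetical map R^2 -> R^3
project = [[1, 0, 0], [0, 1, 0]]       # hypothetical map R^3 -> R^2

for name, m in [("double", double), ("embed", embed), ("project", project)]:
    print(name, is_injective(m), is_surjective(m))
```

Here `double` is bijective, `embed` is injective but not surjective, and `project` is surjective but not injective.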

### Importance in Machine Learning, Neural Networks, and AI

Understanding these concepts is important in machine learning and AI because they relate to the properties of the functions that models learn:

- **Injective Mappings:** These are important in tasks where we need to preserve distinctness, such as in embeddings or encodings.
- **Surjective Mappings:** These are important in tasks where we need to cover the entire output space, such as in generative models.
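- **Bijective Mappings:** These are important in tasks where we need both properties, so that the mapping can be inverted exactly, as in invertible models such as normalizing flows. An autoencoder with a bottleneck is generally not bijective: a linear encoder into a lower-dimensional space cannot be injective, and a linear decoder out of it cannot be surjective, so the reconstruction is only approximate.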

## Homomorphism, Isomorphism, Endomorphism, and Automorphism

### Definitions

- **Homomorphism:** A homomorphism is a map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces), that preserves the operations of the structures. In the context of linear algebra, a linear map is a homomorphism of vector spaces.

- **Isomorphism:** An isomorphism is a homomorphism that has an inverse, i.e., there is a map in the opposite direction that undoes the operation of the original map. In terms of linear maps, an isomorphism is a bijective linear map.

- **Endomorphism:** An endomorphism is a homomorphism from a structure to itself. In the context of linear algebra, an endomorphism is a linear map from a vector space to itself.

- **Automorphism:** An automorphism is an isomorphism from a structure to itself. In terms of linear maps, an automorphism is a bijective linear map from a vector space to itself.

### Intuition

These concepts relate to the types of structure-preserving transformations we can have between (or within) algebraic structures. Homomorphisms and isomorphisms concern the preservation of structure between different structures, while endomorphisms and automorphisms concern transformations within a single structure.

### Example

Consider the vector space `R^2` and the linear map `f: R^2 -> R^2` defined by `f([x, y]) = [2x, 3y]`. This map is an endomorphism of `R^2` because it maps `R^2` to itself. It is also an automorphism: the map `g: R^2 -> R^2` defined by `g([x, y]) = [x/2, y/3]` is linear and satisfies `g(f(v)) = v` and `f(g(v)) = v`, so `f` is a bijective linear map from `R^2` to itself with inverse `g`.
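
The inverse relationship between `f` and `g` can be checked directly. This is a small sketch with one sample vector (an arbitrary choice); exact fractions are used so that the divisions by 2 and 3 do not introduce rounding error.

```python
from fractions import Fraction

def f(v):
    x, y = v
    return [2 * x, 3 * y]

def g(v):
    x, y = v
    return [Fraction(x) / 2, Fraction(y) / 3]

v = [5, -7]

g_undoes_f = g(f(v)) == v   # g(f(v)) returns the original vector
f_undoes_g = f(g(v)) == v   # f(g(v)) returns the original vector

print(g_undoes_f, f_undoes_g)  # True True
```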

### Importance in Machine Learning, Neural Networks, and AI

Understanding these concepts is important in machine learning and AI because they relate to the properties of the functions that models learn:

- **Homomorphisms and Isomorphisms:** These concepts are important in understanding the structure-preserving transformations that machine learning models can learn, and in understanding the relationships between different models or different model parameters.

- **Endomorphisms and Automorphisms:** These concepts are important in understanding transformations that a model applies to its input space (for example, in the hidden layers of a neural network), and in understanding the invariances of a model.

## Matrix Representation of Linear Mappings

### Definition
Every linear mapping between finite-dimensional vector spaces can be represented by a matrix. The matrix representation of a linear mapping is a convenient way to express and manipulate the mapping.

### Intuition
The intuition behind the matrix representation of a linear mapping is that each column of the matrix tells us where the corresponding basis vector of the domain ends up in the codomain.

### Explanation
Given a linear mapping `f: V -> W` and bases for `V` and `W`, the matrix representation of `f` is the matrix `A` where the `i`-th column of `A` is the coordinates of `f(v_i)` in the basis of `W`, where `v_i` is the `i`-th basis vector of `V`. The action of the linear mapping on a vector `v` in `V` is then given by the matrix-vector multiplication `Av`.

### Example
Consider the linear mapping `f: R^2 -> R^2` defined by `f([x, y]) = [2x, 3y]`. The matrix representation of `f` in the standard basis is the matrix `[[2, 0], [0, 3]]`. This matrix doubles the first component of any vector and triples the second component, which is exactly what `f` does.
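
The column-by-column construction can be sketched in a few lines: apply `f` to each standard basis vector and place the results side by side as columns. The helper name `matrix_of` is an assumption made for this note, and the sketch only covers the standard basis.

```python
def f(v):
    x, y = v
    return [2 * x, 3 * y]

def matrix_of(linear_map, dim):
    """Matrix of a linear map R^dim -> R^k in the standard bases."""
    basis = [[1 if i == j else 0 for i in range(dim)] for j in range(dim)]
    columns = [linear_map(e) for e in basis]      # images of the basis vectors
    return [list(row) for row in zip(*columns)]   # columns -> row-major matrix

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = matrix_of(f, 2)
print(A)                   # [[2, 0], [0, 3]]
print(matvec(A, [4, 5]))   # [8, 15], the same as f([4, 5])
```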

### Properties
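1. **Change of Basis:** The matrix representation of a linear mapping changes if we change the bases of the vector spaces. If `P` and `Q` are the change-of-basis matrices for the domain and codomain, the new matrix is `Q^-1 A P`. When the mapping goes from a space to itself and the same basis is used on both sides, this reduces to the similarity transformation `P^-1 A P`.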
2. **Composition and Inverse:** The matrix representation of the composition of two linear mappings is the product of the matrices representing the mappings, and the matrix representing the inverse of a linear mapping is the inverse of the matrix representing the mapping (if it exists).
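
A short sketch of the composition and inverse properties, using the matrix `F = [[2, 0], [0, 3]]` of the example map together with a shear matrix `S` that is an assumption introduced only for illustration. The helper `inverse_2x2` handles the 2x2 case only.

```python
from fractions import Fraction

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular, so the mapping has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

F = [[2, 0], [0, 3]]   # the example map f
S = [[1, 1], [0, 1]]   # hypothetical shear
v = [1, 2]

step_by_step = matvec(F, matvec(S, v))   # apply S, then F
composed = matvec(matmul(F, S), v)       # apply the product matrix F * S
F_inv = inverse_2x2(F)                   # [[1/2, 0], [0, 1/3]]

print(step_by_step, composed)            # [6, 6] [6, 6]
print(matmul(F, F_inv) == [[1, 0], [0, 1]])  # True
```

Note the order: the matrix of "first `S`, then `F`" is `F * S`, with the map applied first written on the right.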


## Transformation Matrix

### Definition
A transformation matrix is a specific matrix that is used to perform a linear transformation on a vector in a vector space. 

### Intuition
The intuition behind a transformation matrix is that it encodes how every point in the space is transformed. Each column of the matrix represents the new coordinates of the original basis vectors after the transformation.

### Explanation
Given a linear transformation `T: V -> W` and bases for `V` and `W`, the transformation matrix `A` of `T` is the matrix where the `i`-th column of `A` is the coordinates of `T(v_i)` in the basis of `W`, where `v_i` is the `i`-th basis vector of `V`. The action of the linear transformation on a vector `v` in `V` is then given by the matrix-vector multiplication `Av`.

### Example
Consider the linear transformation `T: R^2 -> R^2` defined by `T([x, y]) = [2x, 3y]`. The transformation matrix of `T` in the standard basis is the matrix `[[2, 0], [0, 3]]`. This matrix doubles the first component of any vector and triples the second component, which is exactly what `T` does.

### Properties
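1. **Change of Basis:** The transformation matrix of a linear transformation changes if we change the bases of the vector spaces. If `P` and `Q` are the change-of-basis matrices for the domain and codomain, the new matrix is `Q^-1 A P`. When the transformation goes from a space to itself and the same basis is used on both sides, this reduces to the similarity transformation `P^-1 A P`.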
2. **Composition and Inverse:** The transformation matrix of the composition of two linear transformations is the product of the matrices representing the transformations, and the transformation matrix of the inverse of a linear transformation is the inverse of the matrix representing the transformation (if it exists).
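
As a worked illustration of the change-of-basis property for a map from `R^2` to itself, the sketch below re-expresses `A = [[2, 0], [0, 3]]` in a new basis. The basis vectors `[2, 1]` and `[1, 1]` (the columns of `P`) are an assumption chosen so that `P^-1` has integer entries.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

A = [[2, 0], [0, 3]]        # matrix of T in the standard basis
P = [[2, 1], [1, 1]]        # columns: new basis vectors in standard coordinates
P_inv = [[1, -1], [-1, 2]]  # inverse of P (det P = 1)

A_new = matmul(P_inv, matmul(A, P))   # matrix of T in the new basis
print(A_new)                           # [[1, -1], [2, 4]]

# Consistency check: new coords -> standard coords -> apply T -> new coords
c = [1, 0]
via_standard = matvec(P_inv, matvec(A, matvec(P, c)))
print(via_standard == matvec(A_new, c))  # True
```

The two matrices look different but describe the same transformation; quantities such as the trace (5) and the determinant (6) are unchanged.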

## Linear Transformations of Sets of Vectors

### Definition
A linear transformation is a function between two vector spaces that preserves the operations of vector addition and scalar multiplication. 

### Types of Linear Transformations
There are several types of linear transformations that have special properties:

1. **Identity Transformation:** The identity transformation leaves every vector unchanged. It's represented by the identity matrix.

2. **Scaling Transformation:** A scaling transformation multiplies all vectors by a scalar. If the scalar is different for each dimension, we get a transformation that can stretch or shrink vectors along each axis.

3. **Rotation Transformation:** A rotation transformation rotates all vectors around the origin by a certain angle.

4. **Shear Transformation:** A shear transformation displaces each point in a fixed direction, by an amount proportional to its signed distance from a line parallel to that direction.

5. **Reflection Transformation:** A reflection transformation reflects all vectors across a subspace.
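
Each of these types has a simple matrix in two dimensions. The sketch below applies one representative of each to the same vector; the specific matrices (scale factors 2 and 3, a 90-degree rotation, a horizontal shear, a reflection across the x-axis) are illustrative choices rather than the only possibilities.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rotation(theta):
    """Counter-clockwise rotation of the plane by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

examples = {
    "identity":   [[1, 0], [0, 1]],
    "scaling":    [[2, 0], [0, 3]],    # stretch x by 2 and y by 3
    "rotation":   [[0, -1], [1, 0]],   # exact rotation by 90 degrees
    "shear":      [[1, 1], [0, 1]],    # shift x by an amount proportional to y
    "reflection": [[1, 0], [0, -1]],   # mirror across the x-axis
}

v = [1, 1]
results = {name: matvec(M, v) for name, M in examples.items()}
for name, image in results.items():
    print(name, image)

# General angles use floating point, so compare with a tolerance
rotated = matvec(rotation(math.pi / 2), [1, 0])
```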

### Importance
Linear transformations are fundamental in linear algebra and have many applications:

1. **Changing Representations:** Linear transformations allow us to change the representation of a vector, which can simplify problem solving. For example, we can use a rotation transformation to rotate the coordinate system so that a linear equation becomes simpler.

2. **Data Transformation:** In machine learning, we often use linear transformations to transform data, for example to normalize it, reduce its dimensionality (PCA), or project it onto a different subspace (LDA).

3. **Modeling Transformations:** In computer graphics and robotics, linear transformations are used to model the transformations of objects in the world.

4. **Linear Systems:** A linear system `Ax = b` asks which vectors `x` the linear transformation represented by `A` sends to `b`, which gives us a geometric way to understand these systems.

5. **Neural Networks:** Each layer in a neural network (ignoring the activation function) is a linear transformation of the data.


## Basis Change

### Definition
A basis change, also known as a change of basis, is a transformation that changes the basis of a vector space to a different set of basis vectors.

### Intuition
The intuition behind a basis change is that it allows us to express the same vectors in a different "language" or coordinate system. This can simplify calculations or provide a different perspective on the data.

### Explanation
Given a vector space `V` and two bases `B` and `C` for `V`, a basis change from `B` to `C` is a linear transformation that takes the coordinates of a vector with respect to `B` and returns its coordinates with respect to `C`. If `P` is the matrix whose columns are the `C` basis vectors written in `B` coordinates, then `P` converts `C` coordinates into `B` coordinates, and its inverse `P^-1` converts `B` coordinates into `C` coordinates.

### Example
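Consider the vector space `R^2` with the standard basis `B = {[1, 0], [0, 1]}` and a new basis `C = {[2, 1], [1, 1]}`. The matrix `P = [[2, 1], [1, 1]]`, whose columns are the `C` basis vectors, converts `C` coordinates into `B` coordinates. The basis change from `B` to `C` is represented by its inverse, `P^-1 = [[1, -1], [-1, 2]]`. For example, `v = [3, 2]` in `B` coordinates has `C` coordinates `[1, 1]`, since `1*[2, 1] + 1*[1, 1] = [3, 2]`.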
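
The direction of the conversion is easy to get backwards, so a small check helps. In this sketch `P` has the `C` basis vectors as its columns and `P_inv` is its inverse; the sample vector `[3, 2]` is an arbitrary choice.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

P = [[2, 1], [1, 1]]        # columns are the C basis vectors [2, 1] and [1, 1]
P_inv = [[1, -1], [-1, 2]]  # inverse of P (det P = 1)

v_in_B = [3, 2]                     # coordinates in the standard basis B
v_in_C = matvec(P_inv, v_in_B)      # B coordinates -> C coordinates
back_in_B = matvec(P, v_in_C)       # C coordinates -> B coordinates

print(v_in_C)     # [1, 1], i.e. v = 1*[2, 1] + 1*[1, 1]
print(back_in_B)  # [3, 2]
```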

### Properties
1. **Invertibility:** The matrix representing a basis change is always invertible, and its inverse represents the basis change in the opposite direction.
2. **Composition:** The matrix representing the composition of two basis changes is the product of the matrices representing the basis changes.

# Image and Kernel

## Image and Kernel (Null Space) of a Linear Transformation

### Definition
The **image** (also known as the range) of a linear transformation is the set of all vectors that can be reached by applying the transformation to some vector in the domain. 

The **kernel** (also known as the null space) of a linear transformation is the set of all vectors in the domain that are mapped to the zero vector in the codomain.

### Intuition
The image of a transformation can be thought of as the "output" of the transformation, while the kernel can be thought of as the "blind spots" of the transformation.

### Explanation
Given a linear transformation `T: V -> W`, the image of `T` is the set `{T(v) | v in V}`, and the kernel of `T` is the set `{v in V | T(v) = 0}`.

### Example
Consider the linear transformation `T: R^2 -> R^2` defined by `T([x, y]) = [2x, 0]`. The image of `T` is the x-axis, and the kernel of `T` is the y-axis.
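
The claim can be explored on a small grid of sample vectors. This is only a finite spot check on integer points (the grid range is an arbitrary choice), but it shows the pattern: every output has second component 0, and exactly the vectors with `x = 0` are sent to zero.

```python
def T(v):
    x, y = v
    return [2 * x, 0]

samples = [[x, y] for x in range(-2, 3) for y in range(-2, 3)]

images = [T(v) for v in samples]
kernel_samples = [v for v in samples if T(v) == [0, 0]]

all_on_x_axis = all(image[1] == 0 for image in images)
kernel_on_y_axis = all(v[0] == 0 for v in kernel_samples)

print(all_on_x_axis, kernel_on_y_axis)  # True True
print(kernel_samples)  # [[0, -2], [0, -1], [0, 0], [0, 1], [0, 2]]
```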

### Properties
1. **Dimension Theorem:** The dimension of the domain of a linear transformation is equal to the dimension of the image plus the dimension of the kernel. This is known as the Rank-Nullity Theorem.
2. **Injectivity and Surjectivity:** A linear transformation is injective (one-to-one) if and only if its kernel is {0}, and it is surjective (onto) if and only if its image is the entire codomain.

### Relationship between the Dimensions of the Image and the Kernel

The relationship between the dimension of the image and the dimension of the kernel of a linear transformation is given by the Rank-Nullity Theorem, which states that the dimension of the domain of the transformation is equal to the dimension of the image (also known as the rank) plus the dimension of the kernel (also known as the nullity).

In mathematical terms, if `T: V -> W` is a linear transformation, then:

`dim(V) = dim(Im(T)) + dim(Ker(T))`

where `dim(V)` is the dimension of the domain, `dim(Im(T))` is the dimension of the image, and `dim(Ker(T))` is the dimension of the kernel.

This theorem is fundamental in linear algebra and has many applications, for example in solving systems of linear equations, analyzing the structure of linear transformations, and studying the properties of matrices.

## Importance of Null Space and Column Space

### Null Space (Kernel)
The null space of a matrix (or a linear transformation) is the set of all vectors that are mapped to the zero vector. It has several important applications:

1. **Solving Linear Systems:** The null space gives the solutions to the homogeneous system `Ax = 0`. This is useful for understanding the solutions to a system of linear equations.

2. **Determining Injectivity:** A linear transformation is injective (one-to-one) if and only if its null space contains only the zero vector. Together with surjectivity, this is what is needed for the transformation to have an inverse.

3. **Machine Learning and Data Analysis:** In machine learning, understanding the null space can help identify features that do not contribute to a prediction, allowing for dimensionality reduction.

### Column Space (Image)
The column space of a matrix (or the image of a linear transformation) is the set of all possible output vectors. It also has several important applications:

1. **Solving Linear Systems:** The column space gives the possible output values for `Ax = b`. If `b` is in the column space, then the system has a solution.

2. **Determining Surjectivity:** A linear transformation is surjective (onto) if and only if its column space is the entire codomain. This is useful for determining whether every output can be reached from some input.

3. **Machine Learning and Data Analysis:** In machine learning, the column space can represent the feature space that the model can capture. Understanding this space can help in feature engineering and model selection.
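
Both ideas can be tested with a rank computation: `b` lies in the column space of `A` exactly when appending `b` as an extra column does not increase the rank. The sketch below uses a deliberately rank-deficient matrix; the matrix, the two right-hand sides, and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def has_solution(A, b):
    """Ax = b is solvable iff b is in the column space of A."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2], [2, 4]]                 # second row is twice the first
print(has_solution(A, [3, 6]))       # True: [3, 6] is a multiple of [1, 2]
print(has_solution(A, [3, 7]))       # False: [3, 7] is outside the column space
print(matvec(A, [2, -1]))            # [0, 0]: [2, -1] is in the null space
```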

### Relationship Between Dimension of Null Space and Dimension of Column Space
The relationship between the dimension of the null space (kernel) and the dimension of the column space (image) of a matrix is given by the Rank-Nullity Theorem. This theorem states that for any linear transformation `T` from a finite-dimensional vector space `V` to another vector space `W`, the dimension of `V` (the domain of `T`) is equal to the dimension of the image of `T` (the rank) plus the dimension of the kernel of `T` (the nullity).

In terms of a matrix `A`, this can be written as:

`dim(V) = rank(A) + nullity(A)`

where `dim(V)` is the dimension of the domain `V`, which equals the number of columns of `A`, `rank(A)` is the dimension of the column space of `A`, and `nullity(A)` is the dimension of the null space of `A`.

This theorem is fundamental in linear algebra and has many applications, such as determining the number of solutions to a system of linear equations, analyzing the structure of linear transformations, and studying the properties of matrices.

## Rank-Nullity Theorem

### Definition
The Rank-Nullity Theorem states that for any linear transformation `T` from a finite-dimensional vector space `V` to another vector space `W`, the dimension of `V` (the domain of `T`) is equal to the dimension of the image of `T` (the rank) plus the dimension of the kernel of `T` (the nullity).

### Intuition
The intuition behind the Rank-Nullity Theorem is that it provides a balance between the "output" of a linear transformation (the image) and its "blind spots" (the kernel). The total "size" of the domain is split between these two components.

### Explanation
Given a linear transformation `T: V -> W`, the Rank-Nullity Theorem can be written as:

`dim(V) = dim(Im(T)) + dim(Ker(T))`

where `dim(V)` is the dimension of the domain, `dim(Im(T))` is the dimension of the image (rank), and `dim(Ker(T))` is the dimension of the kernel (nullity).

### Example
Consider the linear transformation `T: R^3 -> R^2` defined by `T([x, y, z]) = [x, y]`. The image of `T` is `R^2` (rank = 2), and the kernel of `T` is the z-axis (nullity = 1). According to the Rank-Nullity Theorem, the dimension of the domain is 2 + 1 = 3, which is indeed the case.
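
The split of the domain can be made visible with Gaussian elimination: each column of the matrix is either a pivot column (counted in the rank) or a free column (counted in the nullity). The sketch below applies this to the example; the function name `pivot_columns` is an assumption made for this note.

```python
from fractions import Fraction

def pivot_columns(matrix):
    """Indices of the pivot columns found by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots = []
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    return pivots

# T([x, y, z]) = [x, y] in the standard bases
T = [[1, 0, 0],
     [0, 1, 0]]

dim_domain = len(T[0])
pivots = pivot_columns(T)
free = [c for c in range(dim_domain) if c not in pivots]

rank_T = len(pivots)     # dimension of the image
nullity_T = len(free)    # dimension of the kernel

print(rank_T, nullity_T, dim_domain)  # 2 1 3
```

The free column here is the third one, matching the z-axis as the kernel.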

# Affine Spaces
### Definition
An affine space is a geometric structure that generalizes some of the properties of Euclidean spaces in such a way that these are independent of the concepts of distance and measure of angles, keeping only the properties related to parallelism and ratio of lengths for parallel line segments.

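In simpler terms, an affine space is a set of points together with a vector space of displacements: subtracting two points gives a vector, and adding a vector to a point gives another point. Unlike a vector space, it does not have a natural origin (a special "zero" point), so adding two points or scaling a point is not meaningful by itself; only affine combinations of points (weighted sums whose weights add up to 1) are well defined.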

### Intuition
The intuition behind affine spaces is that they allow us to study geometric properties that are preserved under affine transformations (which include translations, scaling, and rotations). For example, in an affine space, we can talk about lines, planes, and volumes, and we can say that two lines are parallel or that two planes are the same, without reference to any origin or coordinate system.

### Example
A common example of an affine space is the Euclidean plane. Even though we often choose an origin and axes in order to do calculations, the geometric properties of the plane (such as the parallelism of lines) do not depend on this choice.

## Affine Subspaces

### Definition
An affine subspace is a subset of an affine space that is itself an affine space. It can be thought of as the result of applying an affine transformation (such as translation) to a linear subspace.

### Intuition
The intuition behind affine subspaces is that they preserve the structure of a linear subspace while allowing for translations. This means that, unlike linear subspaces, affine subspaces do not have to pass through the origin.

### Example
In a 3-dimensional affine space (like the space we live in), a plane that does not pass through the origin is an example of an affine subspace. It preserves the structure of a 2-dimensional linear subspace (a plane through the origin), but it has been translated away from the origin.

### Properties
1. **Closed under affine combinations:** If you take any points in an affine subspace, their affine combination (a weighted sum where the weights add up to 1) is also in the affine subspace.
2. **Closed under intersection:** The intersection of affine subspaces is either empty or itself an affine subspace.

### Affine Subspaces Parameters
The parameters of an affine subspace are typically the coefficients of the linear equations that define the subspace. For example, in a 3-dimensional space, a plane (which is an affine subspace) can be defined by a linear equation of the form `ax + by + cz = d`. The parameters `a`, `b`, `c`, and `d` define the position and orientation of the plane.
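
As a small check of how the plane equation and affine combinations fit together, the sketch below takes the plane `x + 2y + 3z = 6` (an arbitrary choice of `a`, `b`, `c`, `d`) and three points on it. Weighted sums whose weights add up to 1 stay on the plane; other weighted sums generally do not.

```python
from fractions import Fraction

a, b, c, d = 1, 2, 3, 6   # hypothetical plane x + 2y + 3z = 6

def on_plane(p):
    x, y, z = p
    return a * x + b * y + c * z == d

def combine(weights, points):
    """Weighted sum of 3D points, computed component by component."""
    return [sum(w * p[i] for w, p in zip(weights, points)) for i in range(3)]

points = [[6, 0, 0], [0, 3, 0], [0, 0, 2]]   # three points on the plane

affine = combine([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)], points)
affine_negative = combine([2, -1, 0], points)   # weights still sum to 1
not_affine = combine([1, 1, 0], points)         # weights sum to 2

print(on_plane(affine), on_plane(affine_negative), on_plane(not_affine))
# True True False
```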

## Affine Mappings

### Definition
An affine mapping (or affine transformation) is a function between affine spaces that preserves affine combinations of points. As a consequence, it maps straight lines to straight lines (or to single points, if the map is not injective), and sets of parallel lines remain parallel after an affine transformation. In coordinates, an affine map has the form `x -> Ax + b`, where `A` is a matrix (the linear part) and `b` is a translation vector.

### Intuition
The intuition behind affine mappings is that they are combinations of linear transformations and translations. They can scale, rotate, reflect, and shear an object, and then translate (move) it.

### Example
A simple example of an affine transformation is a translation. If we have a 2D space with a point at `(x, y)`, an affine transformation could move this point to `(x+2, y+3)`. This transformation can be represented by the matrix equation `x' = Ax + b`, where `A` is the identity matrix (since we're not scaling or rotating the point), and `b` is the vector `(2, 3)`.

### Properties
1. **Composability:** The composition of two affine transformations is also an affine transformation.
2. **Invertibility:** If an affine transformation is invertible, its inverse is also an affine transformation.
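
Both properties follow from a short calculation: applying `x -> A1 x + b1` and then `x -> A2 x + b2` gives `x -> (A2 A1) x + (A2 b1 + b2)`, and the inverse of `x -> Ax + b` is `y -> A^-1 y - A^-1 b` when `A` is invertible. The sketch below checks this on made-up matrices and vectors; all the specific values and helper names are assumptions for illustration.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    return [p + q for p, q in zip(u, v)]

def affine(A, b, x):
    return add(matvec(A, x), b)

A1, b1 = [[2, 0], [0, 3]], [1, 0]    # scale, then shift
A2, b2 = [[0, -1], [1, 0]], [0, 5]   # rotate by 90 degrees, then shift
x = [1, 1]

# Composition: one affine map with linear part A2*A1 and translation A2*b1 + b2
A12, b12 = matmul(A2, A1), add(matvec(A2, b1), b2)
step_by_step = affine(A2, b2, affine(A1, b1, x))
composed = affine(A12, b12, x)
print(step_by_step, composed)        # [-3, 8] [-3, 8]

# Inverse of the second map: linear part A2^-1, translation -A2^-1 * b2
A2_inv = [[0, 1], [-1, 0]]
b2_inv = [-t for t in matvec(A2_inv, b2)]
y = affine(A2, b2, [3, 3])
print(affine(A2_inv, b2_inv, y))     # [3, 3]
```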

## Translation Vectors in Affine Mappings

### Definition
In the context of affine mappings, a translation vector is the vector `b` in the affine transformation `x -> Ax + b`, where `A` is a linear transformation and `x` is a point in the affine space. The vector `b` represents a shift that is applied after the linear transformation `A`.

### Intuition
The intuition behind translation vectors in affine mappings is that they allow us to move objects around in space, in addition to scaling, rotating, reflecting, and shearing them. This makes affine transformations more flexible than linear transformations.

### Example
For example, consider a 2D space and an object at position `(x, y)`. We could first apply a linear transformation (represented by a matrix `A`) to scale the object, and then add the translation vector `(2, 3)` to move it. If `A` scales both axes by a factor `s`, the new position is `(s*x + 2, s*y + 3)`.
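
A minimal sketch of this example, assuming a uniform scale factor of 2 and the sample point `(1, 4)` (both chosen only for illustration). It also shows that the order matters: translating first and scaling afterwards gives a different result.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def add(u, v):
    return [p + q for p, q in zip(u, v)]

A = [[2, 0], [0, 2]]   # hypothetical scaling by a factor of 2
b = [2, 3]             # translation vector from the example
p = [1, 4]             # hypothetical starting position

scale_then_translate = add(matvec(A, p), b)   # A p + b
translate_then_scale = matvec(A, add(p, b))   # A (p + b)

print(scale_then_translate)   # [4, 11]
print(translate_then_scale)   # [6, 14]
```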

### Properties
1. **Additivity:** The sum of two translation vectors is another translation vector, representing the combined translation.
2. **Scalar Multiplication:** Multiplying a translation vector by a positive scalar changes the magnitude of the translation but not its direction; a negative scalar also reverses the direction, and zero gives no translation at all.