# Linear Operators
An operator $A: X \rightarrow Y$ (read as: operator $A$ transforms a vector from vector space $X$ to vector space $Y$) is said to be linear if, for every $\mathbf{x}_1, \mathbf{x}_2 \in X$ and scalars $\alpha_1, \alpha_2 \in R$ (here $R$ denotes the scalar field of the vector spaces, and not necessarily the real numbers),

$$
A(\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2) = \alpha_1 A(\mathbf{x}_1) + \alpha_2 A(\mathbf{x}_2)
$$

Often the parentheses are dropped and the operation is denoted as $A\mathbf{x}$. This shouldn't be confused with multiplying $A$ with $\mathbf{x}$. If $A$ is a matrix, then it is indeed a multiplication. But a matrix isn't the only linear operator. For example, the left-hand side of the differential equation $ \dfrac{dx(t)}{dt} - ax(t) = b(t) $ can be thought of as a linear operator acting on $x(t)$ and transforming it to $b(t)$, so the equation can be written as:

$$
Ax(t) = b(t)
$$

where $A$ is the "differential operator" $ A = \dfrac{d}{dt} - a $. Linear operator theory applies to all linear operators, and not just matrices.
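As a rough numerical illustration of the linearity of $A = \dfrac{d}{dt} - a$, the sketch below discretises the operator on a uniform time grid. The finite-difference approximation via `np.gradient`, the value of `a`, the test signals and the helper name `apply_A` are all assumptions made for this example; they are not part of the text above.

```python
import numpy as np

# Assumed setup: a uniform grid on [0, 1] and an arbitrary constant a.
a = 0.5
t = np.linspace(0.0, 1.0, 201)

def apply_A(x):
    """Apply a discretised version of A = d/dt - a to samples of x(t)."""
    return np.gradient(x, t) - a * x

# Two arbitrary test signals and two arbitrary scalars.
x1 = np.sin(2 * np.pi * t)
x2 = np.exp(-t)
alpha1, alpha2 = 2.0, -3.0

# Linearity: A(alpha1*x1 + alpha2*x2) == alpha1*A(x1) + alpha2*A(x2)
lhs = apply_A(alpha1 * x1 + alpha2 * x2)
rhs = alpha1 * apply_A(x1) + alpha2 * apply_A(x2)
linear_ok = np.allclose(lhs, rhs)
print(linear_ok)
```

The check passes because both differentiation (here, finite differencing) and scaling by $a$ are themselves linear operations.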

## Range and Nullity
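The range of an operator $A: X \rightarrow Y$, denoted $\mathcal{R}(A)$, is the set of all vectors in $Y$ that are reachable by $A$, i.e., that can be results of the operation $A\mathbf{x}$ for some $\mathbf{x} \in X$.

The nullity of an operator $A: X \rightarrow Y$, denoted $\mathcal{N}(A)$, is the set of all vectors in $X$ that are transformed to $\mathbf{0}$ in $Y$. (Many texts reserve the word "nullity" for the dimension of this set and call the set itself the nullspace or kernel.)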

The nullity of any linear operator is a vector subspace of $X$, i.e., $\mathcal{N}(A) \subseteq X$. It is easy to see why: suppose there exists a vector $\mathbf{q}_1 \in X$ that is transformed to $\mathbf{0}$ by $A$; then, by the definition of a linear operator, every vector $\mathbf{x} = \alpha \mathbf{q}_1$ will also be transformed by $A$ to $\mathbf{0}$. Similarly, *if* there is another vector $\mathbf{q}_2 \in X$, linearly independent of $\mathbf{q}_1$, that is also in the nullity of $A$, then all linear combinations of $\mathbf{q}_1$ and $\mathbf{q}_2$ will also be transformed by $A$ to $\mathbf{0}$. We can generalize this argument: if we find $k$ linearly independent vectors in the nullity of $A$, then all of their linear combinations are also in the nullity, and these combinations form a subspace. Hence the nullity of $A$ is more popularly called the **nullspace** of $A$. When the $k$ vectors are chosen so that every vector in the nullspace is a linear combination of them, they form a Hamel basis of the nullspace: $\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_k\}$

The dimension of the nullspace, $k$, could be $0$, i.e., there may exist no vector in $X$, other than $\mathbf{0}$, that $A$ transforms to $\mathbf{0}$. In this case, $A$ is said to have a **trivial nullspace**, i.e., $\mathcal{N}(A) = \{\mathbf{0}\}$

Suppose the nullspace is non-trivial. Then, if $\mathbf{x}_r \in X$ is a solution to $A\mathbf{x} = \mathbf{b}$, $\mathbf{x}_r + \mathbf{x}_0$ is also a solution for any $\mathbf{x}_0 \in \mathcal{N}(A)$, because $A(\mathbf{x}_r + \mathbf{x}_0) = A\mathbf{x}_r + A\mathbf{x}_0 = \mathbf{b} + \mathbf{0}$. In other words, if the nullspace is non-trivial, then $A\mathbf{x} = \mathbf{b}$ does not have a unique solution.
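A small numerical sketch of this non-uniqueness is given below. The matrix `A` and the vectors `x_r` and `x_0` are made-up values chosen only so that `x_0` lies in the nullspace; they do not come from the text.

```python
import numpy as np

# Hypothetical 2 x 3 matrix with a non-trivial nullspace.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

x_r = np.array([1.0, 1.0, 1.0])    # one particular solution
x_0 = np.array([1.0, -1.0, 1.0])   # a vector in N(A): A @ x_0 is zero
b = A @ x_r

# Adding any multiple of x_0 to x_r leaves A x unchanged.
for c in (0.0, 1.0, -2.5, 3.0):
    x = x_r + c * x_0
    print(c, np.allclose(A @ x, b))
```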

Since $X$ is a vector space, we know that it must have a Hamel basis. We also know that, since $\mathcal{N}(A)$ is a subspace of $X$, we can find a Hamel basis of $X$ that contains, among other vectors, $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_k $. Let us denote this Hamel basis as $ \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_r, \mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_k\} $, where $ \mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_r $ are the other linearly independent vectors in that Hamel basis. Given this, any vector in $X$ can be represented as:

$$
\mathbf{x} = c_1\mathbf{p}_1 + c_2\mathbf{p}_2 + \ldots + c_r\mathbf{p}_r + c_{r+1}\mathbf{q}_1 + c_{r+2}\mathbf{q}_2 + \ldots + c_n\mathbf{q}_k
$$

where $n$ is the dimension of $X$. Consequently, $A\mathbf{x}$ can be represented as:

$$
\begin{align}
A\mathbf{x} &= c_1 A\mathbf{p}_1 + c_2 A\mathbf{p}_2 + \ldots + c_r A\mathbf{p}_r \\
            &= c_1 \mathbf{u}_1 + c_2 \mathbf{u}_2 + \ldots + c_r \mathbf{u}_r
\end{align}
$$

The $\mathbf{q}_i$ terms go to $\mathbf{0}$. $\mathcal{R}(A)$, by definition, consists of all possible linear combinations of $\mathbf{u}_i = A\mathbf{p}_i$. We can prove that the $\mathbf{u}_i$ are linearly independent of one another. Suppose that they are *not* linearly independent. Then we can find a set of scalars $\{d_1, d_2, \ldots, d_r\}$, not all zero, such that

$$
d_1 \mathbf{u}_1 + d_2 \mathbf{u}_2 + \ldots + d_r \mathbf{u}_r = \mathbf{0}
$$

Substituting $\mathbf{u}_i = A\mathbf{p}_i$, we have,

$$
d_1 A\mathbf{p}_1 + d_2 A\mathbf{p}_2 + \ldots + d_r A\mathbf{p}_r = \mathbf{0} \\
A(d_1 \mathbf{p}_1 + d_2 \mathbf{p}_2 + \ldots + d_r \mathbf{p}_r) = \mathbf{0}
$$

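The last equation says that the operand $d_1 \mathbf{p}_1 + d_2 \mathbf{p}_2 + \ldots + d_r \mathbf{p}_r$ lies in $\mathcal{N}(A)$, so it can also be written as a linear combination of the $\mathbf{q}_i$. But the $\mathbf{p}_i$ and the $\mathbf{q}_i$ together form a basis of $X$, so a combination of the $\mathbf{p}_i$ can equal a combination of the $\mathbf{q}_i$ only when both are $\mathbf{0}$. Hence the only possible solution to that equation is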

$$
d_1 \mathbf{p}_1 + d_2 \mathbf{p}_2 + \ldots + d_r \mathbf{p}_r = \mathbf{0}
$$

Since the $\mathbf{p}_i$ are linearly independent of one another and the $d_i$ are not all zero, this is not possible, which means that our assumption that the $\mathbf{u}_i$ are *not* linearly independent of one another has to be wrong! A beautiful consequence of this finding is that $\mathcal{R}(A)$ is not just a set of vectors, but a subspace, i.e., $\mathcal{R}(A) \subseteq Y$, and that its dimension is $r$, which is the number of vectors in the Hamel basis of $X$ after discounting the $\mathbf{q}_i$s. Since $\mathcal{R}(A)$ is a subspace, it is often referred to as the **range space** of $A$.
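The relationship between $n$, $r$ and $k$ can be checked numerically for a matrix. The sketch below is only an illustration: the matrix is made up, and it uses the singular value decomposition (`np.linalg.svd`), which this text has not introduced, purely as a numerical tool for extracting a basis of the nullspace.

```python
import numpy as np

# Hypothetical 3 x 3 matrix whose second row is twice the first.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
n = A.shape[1]

# r = dim R(A), computed numerically.
r = np.linalg.matrix_rank(A)

# The last n - r right singular vectors span N(A).
_, s, Vh = np.linalg.svd(A)
null_basis = Vh[r:].T          # columns play the role of the q_i
k = null_basis.shape[1]

print("r =", r, "k =", k, "n =", n)
print(np.allclose(A @ null_basis, 0.0))
```

For this matrix the nullspace is spanned by a multiple of $(1, 1, -1)$, so $k = 1$ and $r + k = n$.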

Since $Y$ is a vector space, we know that it must have a Hamel basis. We also know that, since $\mathcal{R}(A)$ is a subspace of $Y$, we can find a Hamel basis of $Y$ that contains, among other vectors, $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r $. Let us denote this Hamel basis as $ \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r, \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_l\} $, where $ \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_l $ are the other linearly independent vectors in that Hamel basis. Given this, any vector in $Y$ can be represented as:

$$
\mathbf{y} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \ldots + c_r\mathbf{u}_r + c_{r+1}\mathbf{v}_1 + c_{r+2}\mathbf{v}_2 + \ldots + c_m\mathbf{v}_l
$$

where $m$ is the dimension of $Y$.

We see that the nullspace is a subspace of $X$ and that it is spanned by the $\mathbf{q}_i$s. But what about the $\mathbf{p}_i$s? They must span a complementary subspace of $X$. Is there any point in analysing that subspace? Similarly, in the case of $Y$, we have a subspace complementary to the range space, which is spanned by the $\mathbf{v}_i$s. Is that subspace of any interest to us? It happens that they are interesting subspaces in their own right, but understanding their importance requires the introduction of the adjoint operator $A^*$.

## Adjoint operator
For a linear operator $A: X \rightarrow Y$, there is an adjoint operator $A^*: Y \rightarrow X$, such that, 

$$
\langle A\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, A^*\mathbf{y} \rangle
$$

We have seen the least squares solution in the case of an overdetermined set of equations, $A\mathbf{x} = \mathbf{b}$, which is reproduced below for convenience:

$$
\mathbf{x} = (A^H A)^{-1}A^H \mathbf{b}
$$

But this is the case when,
1. $A$ is a matrix
2. The inner product is defined as: $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{y}^H \mathbf{x} $

Can the method of least squares be used when $A$ isn't a matrix? And when the inner product is defined differently? Yes. This is where the use of the adjoint operator becomes apparent. At the heart of least squares is the fact that the norm of the error vector $\mathbf{e} = \mathbf{b} - A\mathbf{x}$ is minimized when $\mathbf{e}$ is orthogonal to $\mathcal{R}(A)$, and in particular to $A \mathbf{x}$. Orthogonality itself is defined based on the inner product: two vectors are said to be orthogonal if their inner product is $0$. So, in the case of minimum error, we have:

$$
\langle A\mathbf{x}, \mathbf{e} \rangle = 0 = \langle \mathbf{x}, A^*\mathbf{e} \rangle
$$

Substituting, $\mathbf{e} = \mathbf{b} - A\mathbf{x}$ in the RHS of the above equation, we have,

$$
\langle \mathbf{x}, A^*(\mathbf{b} - A\mathbf{x}) \rangle = 0 \\
\langle \mathbf{x}, A^*\mathbf{b} \rangle - \langle \mathbf{x}, A^*A\mathbf{x} \rangle = 0 \\
\langle \mathbf{x}, A^*\mathbf{b} \rangle = \langle \mathbf{x}, A^*A\mathbf{x} \rangle
$$

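Because the error is orthogonal to every vector in $\mathcal{R}(A)$, the last equation must hold with any vector of $X$ in the first argument, not just the solution $\mathbf{x}$. That is possible only when,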

$$
A^*A\mathbf{x} = A^*\mathbf{b} \\
\mathbf{x} = (A^*A)^{-1} A^*\mathbf{b}
$$

The last equation is the least squares solution for any linear operator and any inner product. In the case of matrices, when the inner product is defined as $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{y}^H \mathbf{x} $, we have,

$$
\langle A\mathbf{x}, \mathbf{y} \rangle = \mathbf{y}^H A \mathbf{x} \\
\text{And, } \langle \mathbf{x}, A^*\mathbf{y} \rangle = (A^*\mathbf{y})^H \mathbf{x}
$$

Using the definition of adjoint, we can equate,

$$
\begin{align}
\mathbf{y}^H A \mathbf{x} &= (A^*\mathbf{y})^H \mathbf{x} \\
                          & = \mathbf{y}^H (A^*)^H \mathbf{x}  
\end{align}
$$

Comparing both sides of the above equation, we get,

$$
\begin{align}
A &= (A^*)^H \\
A^H &= ((A^*)^H)^H \\
A^H &= A^*
\end{align}
$$

So for matrices, for the given inner product, $A^* = A^H$. Plugging this into the generalised least squares solution, we get our familiar least squares solution for matrices for the given inner product:

$$
\mathbf{x} = (A^H A)^{-1}A^H \mathbf{b}
$$
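The claim that $A^* = A^H$ under the inner product $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{y}^H \mathbf{x}$ can be spot-checked numerically. The sketch below uses randomly generated complex data; the sizes, the seed and the helper name `inner` are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3

# Random complex matrix A: X -> Y and random vectors x in X, y in Y.
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)

def inner(u, v):
    """<u, v> = v^H u, the inner product used in the text."""
    return np.vdot(v, u)   # np.vdot conjugates its first argument

A_star = A.conj().T        # candidate adjoint: the conjugate transpose

lhs = inner(A @ x, y)      # <A x, y>
rhs = inner(x, A_star @ y) # <x, A* y>
print(np.isclose(lhs, rhs))
```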

Even in the case of matrices, if we were to change the definition of inner product, we can get a different least squares solution! For instance, if we define the inner product as:

$$
\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{y}^H W \mathbf{x}
$$

where $W$ is a Hermitian positive-definite (and hence invertible) weighting matrix, we will find that $A^* = W^{-1} A^H W$. So our least squares solution would change to:

$$
W^{-1} A^H W A \mathbf{x} = W^{-1} A^H W \mathbf{b}
$$

Since $W$ is invertible, we can multiply both sides of the equation on the left by $W$ and get

$$
A^H W A \mathbf{x} = A^H W \mathbf{b} \\
\mathbf{x} = (A^H W A)^{-1}A^H W\mathbf{b}
$$
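A minimal numerical sketch of the weighted solution $\mathbf{x} = (A^H W A)^{-1}A^H W\mathbf{b}$ follows. It assumes real data (so $A^H$ is just the transpose) and a diagonal $W$ with arbitrary positive weights acting on the equations; the data are random and purely illustrative. As a cross-check, scaling each equation by the square root of its weight turns the problem into ordinary least squares, which `np.linalg.lstsq` can solve.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Assumed weights: arbitrary positive numbers, one per equation.
w = np.array([1.0, 2.0, 0.5, 4.0, 1.0, 3.0])
W = np.diag(w)

# Weighted normal equations: (A^T W A) x = A^T W b
x_w = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# Cross-check via ordinary least squares on the row-scaled system.
sw = np.sqrt(w)
x_ref, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)

print(np.allclose(x_w, x_ref))
```

Solving the linear system with `np.linalg.solve` rather than forming the inverse explicitly is a common numerical preference; the result is the same formula.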

We will show later how, in some cases (e.g., an overdetermined set of equations), although $A^{-1}$ and $(A^*)^{-1}$ may not exist, $(A^*A)^{-1}$ does exist, provided $\mathcal{N}(A)$ is trivial. In fact, one can view the whole idea of least squares as follows: because $A$ isn't invertible, we cannot solve $A\mathbf{x} = \mathbf{b}$ as $\mathbf{x} = A^{-1}\mathbf{b}$; instead, we multiply both sides of $A\mathbf{x} = \mathbf{b}$ by $A^*$ and obtain our generalised least squares solution, because now $(A^*A)^{-1}$ can exist.

The adjoint operator $A^*$ has the following properties:
1. $ (A_1 + A_2)^* = A^*_1 + A^*_2 $
2. $ (\alpha A)^* = \bar{\alpha} A^* $, where $\bar{\alpha}$ is the complex conjugate of $\alpha$
3. $ (A_2 A_1)^* = A^*_1 A^*_2 $
4. If $A$ has an inverse, then, $ (A^{-1})^* = (A^*)^{-1} $

In a Hilbert space, we also have the property that $ A^{**} = A $
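For matrices under the standard inner product, where $A^* = A^H$, these properties can be verified numerically. The sketch below uses random complex square matrices; the helper names `rand_c` and `adj` and the scalar `alpha` are illustrative choices, not notation from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_c(shape):
    """Random complex array (illustrative test data)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def adj(M):
    """Adjoint of a matrix under <x, y> = y^H x: the conjugate transpose."""
    return M.conj().T

A1, A2 = rand_c((3, 3)), rand_c((3, 3))
alpha = 2.0 - 1.5j

checks = {
    "sum":     np.allclose(adj(A1 + A2), adj(A1) + adj(A2)),
    "scalar":  np.allclose(adj(alpha * A1), np.conj(alpha) * adj(A1)),
    "product": np.allclose(adj(A2 @ A1), adj(A1) @ adj(A2)),
    "inverse": np.allclose(adj(np.linalg.inv(A1)), np.linalg.inv(adj(A1))),
    "double":  np.allclose(adj(adj(A1)), A1),
}
for name, ok in checks.items():
    print(name, ok)
```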

## Range and nullspace of $A^*$
**Theorem**
The range of $A^*$ is the orthogonal complement of the nullspace of $A$, i.e., $ \mathcal{R}(A^*) = [\mathcal{N}(A)]^{\perp} $

*Proof*: 
Say, vector $\mathbf{x}_n \in \mathcal{N}(A)$, then for *any* $\mathbf{y} \in Y$, 

$$ 
\begin{align}
\langle \mathbf{x}_n, A^* \mathbf{y} \rangle &= \langle A\mathbf{x}_n, \mathbf{y} \rangle \\
                                             &= \langle \mathbf{0}, \mathbf{y} \rangle \\
                                             &= 0
\end{align}
$$

This proves that $\mathcal{N}(A) \subset [\mathcal{R}(A^*)]^\perp$, i.e., the nullspace of $A$ lies within the space perpendicular to $\mathcal{R}(A^*)$. But it does not rule out some vector $\mathbf{x} \in X$, with $\mathbf{x} \notin \mathcal{N}(A)$, that is also in $[\mathcal{R}(A^*)]^\perp$. To prove that no such vector exists, let us assume the contrary: that there is such an $\mathbf{x}$. Then, for every $\mathbf{y} \in Y$, we have,

$$
\langle \mathbf{x}, A^* \mathbf{y} \rangle = 0 = \langle A \mathbf{x}, \mathbf{y} \rangle
$$

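But since $\mathbf{y}$ is *any* vector in $Y$, $\langle A \mathbf{x}, \mathbf{y} \rangle = 0$ must be true for *all* the vectors in $Y$, including $\mathbf{y} = A\mathbf{x}$ itself. That is only possible if $A\mathbf{x} = \mathbf{0}$, which, by definition, means $\mathbf{x}$ lies in $\mathcal{N}(A)$. This goes against our assumption, and hence our assumption must be false. So $\mathcal{N}(A)$ isn't just a subset of $[\mathcal{R}(A^*)]^\perp$, but $\mathcal{N}(A) = [\mathcal{R}(A^*)]^\perp$. Taking orthogonal complements of both sides (and using $[S^\perp]^\perp = S$ for a finite-dimensional subspace $S$) gives the statement of the theorem.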

Using $A^{**} = A$ and the above result, it can easily be proven that we also have $\mathcal{N}(A^*) = [\mathcal{R}(A)]^\perp$. With these two results, we have the following facts:

$$
\mathcal{R}(A) \subset Y \\
\mathcal{N}(A) \subset X \\
\mathcal{R}(A^*) \subset X \\
\mathcal{N}(A^*) \subset Y \\
\\
X = \mathcal{R}(A^*) \oplus \mathcal{N}(A) \\
Y = \mathcal{R}(A) \oplus \mathcal{N}(A^*)
$$

These facts follow from realising that, since we have already proven $\mathcal{R}(A)$ and $\mathcal{N}(A)$ to be subspaces, and since $\mathcal{N}(A^*)$ and $\mathcal{R}(A^*)$ contain *all* the vectors orthogonal to them respectively, $\mathcal{N}(A^*)$ and $\mathcal{R}(A^*)$ are themselves subspaces. Together, $\mathcal{R}(A)$, $\mathcal{N}(A)$, $\mathcal{R}(A^*)$ and $\mathcal{N}(A^*)$ are called the "four fundamental subspaces of a linear operator".

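Earlier, we saw how $X$ has a Hamel basis, $ \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_r, \mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_k\} $, where $ \mathcal{N}(A) = \operatorname{span}\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_k\} $. Now, with the new information we have obtained, if the $\mathbf{p}_i$ are chosen to be orthogonal to every $\mathbf{q}_j$ (which is always possible, since $\mathcal{R}(A^*)$ is the orthogonal complement of $\mathcal{N}(A)$), we have $ \mathcal{R}(A^*) = \operatorname{span}\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_r\} $. Similarly, if the $\mathbf{v}_i$ are chosen to be orthogonal to every $\mathbf{u}_j$, we have $ \mathcal{N}(A^*) = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_l\} $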
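For a real matrix under the standard inner product, orthonormal bases for all four subspaces can be read off numerically. The sketch below is an illustration only: the matrix is made up, the tolerance `1e-10` is an arbitrary choice, and the singular value decomposition is used as a computational tool that this text has not introduced.

```python
import numpy as np

# Hypothetical 4 x 3 matrix of rank 2 (row 1 = row 3 + 2 * row 4).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
m, n = A.shape

U, s, Vh = np.linalg.svd(A)          # full SVD: U is m x m, Vh is n x n
r = int(np.sum(s > 1e-10))           # numerical rank

range_A      = U[:, :r]              # basis of R(A), a subspace of Y
null_A_star  = U[:, r:]              # basis of N(A*), a subspace of Y
range_A_star = Vh[:r].T              # basis of R(A*), a subspace of X
null_A       = Vh[r:].T              # basis of N(A), a subspace of X

print("dims:", range_A.shape[1], null_A_star.shape[1],
      range_A_star.shape[1], null_A.shape[1])
print(np.allclose(A @ null_A, 0.0))                 # A maps N(A) to 0
print(np.allclose(A.T @ null_A_star, 0.0))          # A* maps N(A*) to 0
print(np.allclose(range_A_star.T @ null_A, 0.0))    # R(A*) is orthogonal to N(A)
print(np.allclose(range_A.T @ null_A_star, 0.0))    # R(A) is orthogonal to N(A*)
```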

### Dimensionality of the four subspaces
Continuing with our notation, we see that both $\mathcal{R}(A^*)$ and $\mathcal{R}(A)$ are spanned by the same number of linearly independent vectors, $dim(\mathcal{R}(A^*)) = dim(\mathcal{R}(A)) = r$. This hints at the possibility that, for every $\mathbf{y}_r \in \mathcal{R}(A)$, there is a unique $\mathbf{x}_r \in \mathcal{R}(A^*)$ that satisfies $ A\mathbf{x}_r = \mathbf{y}_r $. Let us see if this is indeed the case. Since $ \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\} $ are all linearly independent, $\mathbf{y}_r$ has a unique representation,

$$
\mathbf{y}_r = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \ldots + c_r\mathbf{u}_r 
$$

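Since $\mathbf{u}_i = A\mathbf{p}_i$, this means $ \mathbf{y}_r = A (c_1\mathbf{p}_1 + c_2\mathbf{p}_2 + \ldots + c_r\mathbf{p}_r )$. Now, since $ \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_r\} $ are linearly independent, the coefficients $c_i$ determine a unique vector $\mathbf{x}_r = c_1\mathbf{p}_1 + c_2\mathbf{p}_2 + \ldots + c_r\mathbf{p}_r$ in $\mathcal{R}(A^*)$. No other vector in $\mathcal{R}(A^*)$ maps to $\mathbf{y}_r$: the difference of two such vectors would lie in both $\mathcal{R}(A^*)$ and $\mathcal{N}(A)$, and the only vector common to a subspace and its orthogonal complement is $\mathbf{0}$.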

Suppose the dimensionality of $X$ is $n$ and that of $Y$ is $m$; then we have $dim(\mathcal{N}(A)) = n-r$ and $dim(\mathcal{N}(A^*)) = m-r$. The letters we have chosen to denote the dimensionalities are not accidental: they were purposefully selected to match matrix conventions. When $A$ is an $m \times n$ matrix and the inner product is defined as $\langle \mathbf{x}_1, \mathbf{x}_2 \rangle = \mathbf{x}^H_2 \mathbf{x}_1$, we have seen that $A^* = A^H$. If $A$ has $r$ linearly independent column vectors, then it is said to have a **rank** equal to $r$, and its nullspace will have a dimension of $n-r$, i.e., the number of columns of $A$ that are linearly dependent on the other $r$. Since $\mathcal{R}(A)$ consists of linear combinations of columns of $A$, it is often referred to as the **column space** of $A$. Since $A^* = A^H$, the columns of $A^*$ are the complex conjugates of the rows of $A$. Hence $\mathcal{R}(A^*)$ is often referred to as the **row space** of $A$.

We can think of $A$ as mapping its row space to its column space, and what we proved in the previous paragraph can be rephrased as "the mapping from row space to column space is invertible". We have seen that the row space has the same dimensionality as the column space, and hence a matrix always has the same number of linearly independent columns as it has linearly independent rows. A matrix is said to be **full rank** when $r = \min(m, n)$. In the special case $m = n = r$, it has a trivial $\mathcal{N}(A)$ and a trivial $\mathcal{N}(A^*)$; such a matrix is square and invertible.

## Least squares and minimum norm solution
When we have an equation $A\mathbf{x} = \mathbf{y}$ and $\mathbf{y} \notin \mathcal{R}(A)$, the equation has no solution (in the case of simultaneous equations, this situation is called an overdetermined system of equations). This can only happen if $\mathcal{N}(A^*)$ is non-trivial. In that case, we are tasked with finding an $\mathbf{x}$ that gives us a $\mathbf{y}_r = A\mathbf{x}$ that is the best approximation of $\mathbf{y}$. Using the principle that the best approximation is obtained when the approximation error is orthogonal to $\mathbf{y}_r$, we see that the error must lie in $\mathcal{N}(A^*)$, because $\mathcal{N}(A^*) = [\mathcal{R}(A)]^\perp$. In other words, when we are given a $\mathbf{y}$ that doesn't reside in $\mathcal{R}(A)$, the best solution is obtained by orthogonally projecting $\mathbf{y}$ onto $\mathcal{R}(A)$ (which is equivalent to throwing away all the terms involving the $\mathbf{v}_i$s in the representation of $\mathbf{y}$ in terms of its Hamel basis, when the $\mathbf{v}_i$s are chosen orthogonal to $\mathcal{R}(A)$). We saw earlier how, when we tried to find the least squares solution, we used the definition of an adjoint operator and came across the equation:

$$
A^*A\mathbf{x} = A^*\mathbf{y}
$$

Here, unlike in the case of $A\mathbf{x} = \mathbf{y}$, where the equality is a hypothesis that is sometimes true and sometimes not, the equality can always be satisfied: choosing $\mathbf{x}$ so that $A\mathbf{x}$ is the orthogonal projection of $\mathbf{y}$ onto $\mathcal{R}(A)$ puts the error $\mathbf{y} - A\mathbf{x}$ in $\mathcal{N}(A^*)$. When $\mathcal{N}(A)$ is trivial, $(A^*A)^{-1}$ exists, and the least squares solution, i.e., $\mathbf{x} = (A^*A)^{-1} A^*\mathbf{y}$, exists and is unique. What we are doing in this solution is that, given a $\mathbf{y} \notin \mathcal{R}(A)$, we first find a vector $\mathbf{x}^\prime_r$ in $\mathcal{R}(A^*)$ by computing $A^*\mathbf{y}$. Then we compute $(A^*A)^{-1} \mathbf{x}^\prime_r$ to get our least squares solution.
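The least squares recipe can be tried on a small overdetermined system. The sketch below assumes real data and the standard inner product, so $A^* = A^T$; the numbers are made up for illustration. It solves $A^*A\mathbf{x} = A^*\mathbf{y}$, confirms that the error lies in $\mathcal{N}(A^*)$, and compares against NumPy's own least squares routine.

```python
import numpy as np

# Hypothetical overdetermined system: 3 equations, 2 unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.0])   # not in R(A)

# Normal equations: (A* A) x = A* y
x_ls = np.linalg.solve(A.T @ A, A.T @ y)

# The error e = y - A x should lie in N(A*), i.e. A* e = 0.
e = y - A @ x_ls

# Reference solution from NumPy's least squares solver.
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)

print(x_ls)
print(np.allclose(A.T @ e, 0.0), np.allclose(x_ls, x_ref))
```

For this data the normal equations are $3x_1 + 3x_2 = 5$ and $3x_1 + 5x_2 = 6$, giving $\mathbf{x} = (7/6, 1/2)$.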

Sometimes the equation $A\mathbf{x} = \mathbf{y}$ does have a solution, but more than one (in the case of simultaneous equations, this situation is called an underdetermined system of equations). This happens when $\mathcal{N}(A)$ is non-trivial. This is because, for every $\mathbf{x} \in X$ such that $A\mathbf{x} = \mathbf{y}$, we will have $A(\mathbf{x} + \mathbf{x}_0) = \mathbf{y}$, where $\mathbf{x}_0$ is any vector that lies in $\mathcal{N}(A)$. In this case, we are tasked with finding the $\mathbf{x}$ that, among all the solutions, has the minimum norm (because smaller vectors are easy to store in computer memory and do not cause register overflows when used in computer operations along with other numbers, etc.). Using our knowledge that every $\mathbf{y} \in \mathcal{R}(A)$ has a corresponding unique $\mathbf{x}_r \in \mathcal{R}(A^*)$ that satisfies $A\mathbf{x}_r = \mathbf{y}$, we realise that all the solutions must simply form a **linear variety** through this $\mathbf{x}_r$, i.e., one can represent every solution $\mathbf{x}$ as $\mathbf{x} = \mathbf{x}_r + c \mathbf{x}_0$, where $\mathbf{x}_0$ is a vector in $\mathcal{N}(A)$. Since $\mathbf{x}_r$ and $\mathbf{x}_0$ are orthogonal, $\|\mathbf{x}\|^2 = \|\mathbf{x}_r\|^2 + \|c\mathbf{x}_0\|^2$, and we immediately realise that $\mathbf{x}_r$ must have the minimum norm among all the solutions! But how do we find this $\mathbf{x}_r$ given a $\mathbf{y}$?

Since $\mathbf{x}_r$ lies in $\mathcal{R}(A^*)$, there must be some vector $\mathbf{y^\prime} \in Y$ that is transformed by $A^*$ to $\mathbf{x}_r$. Since $A\mathbf{x}_r = \mathbf{y}$, we get $ A A^* \mathbf{y^\prime} = \mathbf{y} $. Such a $\mathbf{y^\prime}$ is guaranteed to exist, and when $\mathcal{N}(A^*)$ is trivial, $(A A^*)^{-1}$ exists as well. This, in turn, means we can get $\mathbf{y^\prime} = (A A^*)^{-1} \mathbf{y}$. And since $\mathbf{x}_r = A^*\mathbf{y^\prime}$, we finally get the **minimum norm solution** as:

$$
\mathbf{x}_r = A^*(A A^*)^{-1}\mathbf{y}
$$
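The minimum norm formula can be tried on a small underdetermined system. The sketch below assumes real data and the standard inner product, so $A^* = A^T$; the matrix, right-hand side and nullspace vector are made up for illustration. It also compares the result with NumPy's pseudo-inverse, which is expected to return the same minimum norm solution here.

```python
import numpy as np

# Hypothetical underdetermined system: 2 equations, 3 unknowns.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
y = np.array([2.0, 2.0])

# Minimum norm solution: x_r = A* (A A*)^{-1} y
x_r = A.T @ np.linalg.solve(A @ A.T, y)

# Any other solution differs from x_r by a vector in N(A).
x_0 = np.array([1.0, -1.0, 1.0])       # A @ x_0 = 0
x_other = x_r + 2.0 * x_0

print(x_r)
print(np.allclose(A @ x_r, y), np.allclose(A @ x_other, y))
print(np.linalg.norm(x_r), np.linalg.norm(x_other))
print(np.allclose(x_r, np.linalg.pinv(A) @ y))
```

Here $AA^*$ has rows $(2, 1)$ and $(1, 2)$, so $\mathbf{y^\prime} = (2/3, 2/3)$ and $\mathbf{x}_r = (2/3, 4/3, 2/3)$, which is orthogonal to `x_0`.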

In literature, $(A^*A)$ appearing in the least squares solution is often referred to as the **inner product** and the $(AA^*)$ appearing in the minimum norm solution as the **outer product**.

## Left and right inverse
If, for an operator $A$, there exists an operator $B$ such that $BA = I$, where $I$ is the identity operator, then $A$ is said to have a left inverse $B$. If $A$ has a left inverse, then the solution to $A\mathbf{x} = \mathbf{y}$ is $\mathbf{x} = B\mathbf{y}$: this is the result of simply multiplying both sides of $A\mathbf{x} = \mathbf{y}$ on the left by $B$.

If, for an operator $A$, there exists an operator $C$ such that $AC = I$, then $A$ is said to have a right inverse $C$. If $A$ has a right inverse, then a solution to $A\mathbf{x} = \mathbf{y}$ is $\mathbf{x} = C\mathbf{y}$: this is because, if we substitute $\mathbf{x} = C\mathbf{y}$ in $A\mathbf{x}$, we get $AC\mathbf{y} = \mathbf{y}$.

The term $(A^*A)^{-1}A^*$ appearing in the least squares solution is a left inverse of $A$, since $(A^*A)^{-1}A^*A = I$. And the $A^*(A A^*)^{-1}$ appearing in the minimum norm solution is a right inverse of $A$, since $AA^*(A A^*)^{-1} = I$.
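Both one-sided inverses can be checked on small real matrices, where $A^* = A^T$. The tall and wide matrices below are made up for illustration; the sketch also shows that multiplying in the other order does not give the identity.

```python
import numpy as np

# Tall matrix (trivial nullspace): has a left inverse B = (A* A)^{-1} A*.
A_tall = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 2.0]])
B = np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T

# Wide matrix (trivial N(A*)): has a right inverse C = A* (A A*)^{-1}.
A_wide = np.array([[1.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0]])
C = A_wide.T @ np.linalg.inv(A_wide @ A_wide.T)

print(np.allclose(B @ A_tall, np.eye(2)))   # B A = I
print(np.allclose(A_wide @ C, np.eye(2)))   # A C = I
print(np.allclose(A_tall @ B, np.eye(3)))   # A B is not the identity
print(np.allclose(C @ A_wide, np.eye(3)))   # C A is not the identity
```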

## Projections
A linear operator $P$ that maps a linear space into itself is called a projection if $P^2 = P$. So suppose we have a vector $\mathbf{x} \in S$ such that $\mathbf{x} = \mathbf{v} + \mathbf{w}$, where $\mathbf{v} \in V$ and $\mathbf{w} \in W$, and $V$ and $W$ are two subspaces of $S$ that have only $\mathbf{0}$ in common. Then a linear transform $P$ that operates on $\mathbf{x}$ and gives $\mathbf{v}$ is a projection, since $P(P(\mathbf{x})) = P(\mathbf{v}) = \mathbf{v}$. One can similarly think of a projection that results in $\mathbf{w}$.

A projection $P$ is called an **orthogonal projection** when its range and nullspace are orthogonal, i.e., $\mathcal{R}(P) \perp \mathcal{N}(P)$.

In the least squares method, the best approximation of $\mathbf{y}$ is given as:

$$
\mathbf{y}_r = A(A^*A)^{-1}A^*\mathbf{y}
$$

This can be thought of as the orthogonal projection of $\mathbf{y}$ on to $\mathcal{R}(A)$ by the projection operator, $P = A(A^*A)^{-1}A^*$.

Similarly, in the case of the minimum norm solution, given some solution, $\mathbf{x}$, we can find the minimum norm solution as:

$$
\mathbf{x}_r = A^*(A A^*)^{-1}A\mathbf{x}
$$

Again, this can be thought of as the orthogonal projection of $\mathbf{x}$ on to $\mathcal{R}(A^*)$ by the projection operator, $P = A^*(A A^*)^{-1}A$.
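The two projection operators can be examined numerically for real matrices, where $A^* = A^T$. The matrices and vectors below are made up for illustration. The sketch checks that each $P$ satisfies $P^2 = P$, and that the part removed by the projection is orthogonal to the subspace being projected onto.

```python
import numpy as np

# Projection onto R(A) for a tall matrix: P = A (A* A)^{-1} A*
A_tall = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 2.0]])
P_range = A_tall @ np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T
y = np.array([1.0, 2.0, 2.0])
y_r = P_range @ y

# Projection onto R(A*) for a wide matrix: P = A* (A A*)^{-1} A
A_wide = np.array([[1.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0]])
P_row = A_wide.T @ np.linalg.inv(A_wide @ A_wide.T) @ A_wide
x = np.array([3.0, -1.0, 3.0])          # some solution of A x = [2, 2]
x_r = P_row @ x

print(np.allclose(P_range @ P_range, P_range))    # idempotent
print(np.allclose(P_row @ P_row, P_row))          # idempotent
print(np.allclose(A_tall.T @ (y - y_r), 0.0))     # y - y_r lies in N(A*)
print(np.allclose(A_wide @ (x - x_r), 0.0))       # x - x_r lies in N(A)
print(x_r)
```

Both projectors are also symmetric in this real setting, which is another common way to recognise an orthogonal projection under the standard inner product.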