<h1 align="center">Linear Algebra (Part II)</h1>

<h3 align="center">Linear transformations</h3>


$\textbf{Definition}$. A $linear$ $transformation$ (or $operator$) $A$ on a vector space $\mathcal{V}$ over a field $\mathcal{F}$, is a correspondence that assigns to every vector $x \in \mathcal{V}$ a vector $Ax \in \mathcal{V}$ in such a way that:

$$A(\alpha x  + \beta y) = \alpha Ax + \beta A y$$

for any $x, y \in \mathcal{V}$ and $\alpha, \beta \in \mathcal{F}$.


<h3 align="center">Examples</h3>

$\bullet$ Two special types of transformations:
<br>
&emsp; $\bullet$ **Null** transformation: $\Theta x = 0$ for any $x \in \mathcal{V}$;
<br>
&emsp; $\bullet$ **Identity** transformation: $Ix = x$ for any $x \in \mathcal{V}$.

$\bullet$ Let $\mathcal{X} = \{ x_1, ..., x_n \}$ be any basis in an $n$-dimensional vector space $\mathcal{V}$ and let $y_1, ..., y_n$ be any $n$ linear functionals in $\mathcal{V}'$. Write:
$$A(x) = y_1(x)x_1 + ... + y_n(x)x_n.$$
&ensp; It is easy to prove that every linear transformation $A$ on $\mathcal{V}$ has the form just described (take $y_i(x)$ to be the $i$-th coordinate of $Ax$ in the basis $\mathcal{X}$). If $\{y_1, ..., y_n\}$ is the dual basis $\mathcal{X}'$, we obtain the identity transformation.

$\bullet$ For any $x \in \mathcal{P}$, such that $x(t) = \sum_{i = 0}^{n-1} \xi_i t^i$, write:
$$(Dx)(t) = \sum_{i = 0}^{n-1} i \xi_i t^{i-1}; \quad
(Sx)(t) = \sum_{i = 0}^{n-1} \frac{\xi_i}{i+1}  t^{i+1}.$$
&ensp; Observe that for polynomials the definition of differentiation and integration can be given purely algebraically, and does not need the usual theory of limiting processes.
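
The purely algebraic nature of $D$ and $S$ can be illustrated with a minimal sketch that stores a polynomial as its list of coefficients $\xi_0, \xi_1, ...$ (lowest degree first). The function names `D` and `S` and the list representation are assumptions made for this illustration only.

```python
from fractions import Fraction

def D(coeffs):
    """Differentiate: coeffs[i] is the coefficient xi_i of t**i."""
    n = len(coeffs)
    # i * xi_i becomes the coefficient of t**(i-1); pad with 0 to keep length n
    return [Fraction(i) * coeffs[i] for i in range(1, n)] + [Fraction(0)]

def S(coeffs):
    """Integrate term by term: xi_i / (i+1) becomes the coefficient of t**(i+1)."""
    return [Fraction(0)] + [Fraction(c) / (i + 1) for i, c in enumerate(coeffs)]

x = [Fraction(5), Fraction(0), Fraction(3)]   # x(t) = 5 + 3 t^2
print(D(x))   # coefficients of 6 t
print(S(x))   # coefficients of 5 t + t^3
```

No limits are involved: both operations only rescale and shift coefficients.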

<h3 align="center">Transformations as vectors</h3>

$\textbf{Theorem} \space \textbf{10}$. The set of all linear transformations $A$ on a vector space $\mathcal{V}$ is itself a vector space.


$\textbf{Proof}$.
Let $A$ and $B$ be linear transformations on a vector space $\mathcal{V}$. We define their $sum$ $S = A + B$ by the equation:
$$S(x) = Ax + Bx \text{ (for every } x \in \mathcal{V} \text{)}.$$
We observe that the commutativity and associativity of addition imply immediately that the addition of linear transformations is commutative and associative.
<br>
If we consider the sum of any linear transformation $A$ and the null transformation $\Theta$, we see that:
$$A + \Theta = A.$$
If, for each $A$, we denote by $-A$ the transformation defined by $(-A)x = -(Ax)$, we see that:
$$A + (-A) = \Theta.$$
For any $A$, and any $\alpha \in \mathcal{F}$ we define the product $\alpha A$ by the equation: 
$$(\alpha A)x = \alpha(Ax).$$
To sum up: the properties of a vector space, described in the axioms **(A)**, **(B)** and **(C)**, appear again in the set of all linear transformations on the space. 

<h3 align="center">Exercises 6.1</h3>

Prove that if $\mathcal{V}$ is a finite-dimensional vector space, then the space of all linear transformations $A$ on $\mathcal{V}$ is finite-dimensional, and find its dimension.

<h3 align="center">Exercises 6.2</h3>

&emsp; (a) Suppose that $\mathcal{U}$ and $\mathcal{V}$ are vector spaces over the same field $\mathcal{F}$.
<br>
&emsp; &emsp; If $A$ and $B$ are linear transformations from $\mathcal{U}$ to $\mathcal{V}$, if $\alpha, \beta \in \mathcal{F}$, and if:

$$Cx = \alpha Ax + \beta Bx,$$

&emsp; &emsp; for each $x \in \mathcal{U}$, then $C$ is a linear transformation from $\mathcal{U}$ to $\mathcal{V}$.

&emsp; (b) If we write, by definition, $C = \alpha A + \beta B$, then the set of all linear transformations from $\mathcal{U}$ to $\mathcal{V}$
<br>
&emsp; &emsp; becomes a vector space with respect to this definition of the linear operations.

&emsp; (c) Prove that if $\mathcal{U}$ and $\mathcal{V}$ are finite-dimensional, 
<br>
&emsp; &emsp; then so is the space of all linear transformations from $\mathcal{U}$ to $\mathcal{V}$, and find its dimension.


<h3 align="center">Products</h3>


$\textbf{Definition}$. The $product$ $P$ of two linear transformations $A$ and $B$, $P = AB$, is defined by the equation:

$$Px = A(Bx).$$

Transformation multiplication is, in general, **not commutative**: the order in which we transform makes a lot of difference.
To transform by $AB$ means to transform first by $B$ and then by $A$. 

Most of the formal algebraic properties of numerical multiplication are still valid for products:
$$A\Theta = \Theta A =\Theta$$
$$AI = IA = A$$
$$A(B+C) = AB + AC$$
$$(A+B)C =AC + BC$$
$$A(BC)= (AB)C$$



<h3 align="center">Examples</h3>

Let's consider the differentiation and multiplication transformations $D$ and $T$, defined by:
$$(Dx)(t) = \frac{dx}{dt} \text{ and } (Tx)(t) = tx(t).$$
We have that:
$$(DTx)(t)= \frac{d}{dt}(tx(t))= x(t) + t \frac{dx}{dt},$$
and 
$$(TDx)(t) = t \frac{dx}{dt}.$$

In other words, not only is it false that $DT = TD$, but, in fact, $(DT - TD)x = x$ for every $x \in \mathcal{P}$, 
<br> i.e. we have that $DT - TD = I$.
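
The relation $DT - TD = I$ can be checked on a concrete polynomial. The sketch below again represents a polynomial by its coefficient list (lowest degree first); the helper names `D`, `T` and `sub` are hypothetical and exist only for this illustration.

```python
def D(c):
    # derivative of sum c[i] t^i, as a coefficient list (lowest degree first)
    return [i * c[i] for i in range(1, len(c))]

def T(c):
    # multiplication by t shifts every coefficient up by one degree
    return [0] + list(c)

def sub(a, b):
    # coefficient-wise difference, padding the shorter list with zeros
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [p - q for p, q in zip(a, b)]

x = [4, -1, 0, 2]                      # x(t) = 4 - t + 2 t^3
commutator = sub(D(T(x)), T(D(x)))     # (DT - TD) x
print(commutator == x)
```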

<h3 align="center">Polynomials</h3>


$\bullet$ Although in general transformation multiplication is not commutative, for the powers of one transformation $A$ we do have the usual laws of exponents:

$$A^nA^m = A^{n+m} \text{ and } (A^n)^m = A^{nm}.$$

$\bullet$ We observe that $A^1 = A$. It is customary also to write, by definition, $A^0 = I$.

$\bullet$ The calculus of powers of a single transformation is almost exactly the same as in ordinary arithmetic.
<br> 
&ensp; We may, in particular, define polynomials in a linear transformation. Thus, if:

$$p(t) = \alpha_0 + \alpha_1 t + ... + \alpha_n t^n,$$

we may form the linear transformation:

$$p(A) = \alpha_0 I + \alpha_1 A + ... + \alpha_n A^n.$$

$\bullet$ The rules for the algebraic manipulation of such polynomials are easy:
$$p(t)q(t) = r(t) \text{ implies } p(A)q(A) = r(A).$$
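
As a numerical illustration of this rule, the following sketch evaluates a polynomial in a matrix by Horner's rule and compares $p(A)q(A)$ with $r(A)$. It assumes NumPy is available; the function name `poly_of_matrix` and the sample matrix are chosen for this example only.

```python
import numpy as np

def poly_of_matrix(coeffs, A):
    """Evaluate p(A) = a_0 I + a_1 A + ... + a_n A^n by Horner's rule.
    coeffs = [a_0, a_1, ..., a_n] (lowest degree first)."""
    result = np.zeros_like(A, dtype=float)
    for a in coeffs[::-1]:
        result = result @ A + a * np.eye(A.shape[0])
    return result

A = np.array([[1.0, 2.0], [0.0, 3.0]])
p = [1.0, 1.0]           # p(t) = 1 + t
q = [2.0, 0.0, 1.0]      # q(t) = 2 + t^2
r = np.convolve(p, q)    # coefficients of r(t) = p(t) q(t)

lhs = poly_of_matrix(p, A) @ poly_of_matrix(q, A)
rhs = poly_of_matrix(r, A)
print(np.allclose(lhs, rhs))
```

The rule works because all powers of a single transformation commute with each other.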

<h3 align="center">Exercises 7.1</h3>

Suppose that $x$ is any polynomial in $\mathcal{P}$ and $D$ is a differentiation transformation on $\mathcal{P}$.
<br>
It is easy to see that for every positive integer $k$:
$$(D^kx)(t) = \frac{d^kx}{dt^k}.$$
If $x$ is a polynomial of degree $n - 1$, what is $D^nx$?

$\textbf{Definition}$.  A non-zero transformation whose product with some non-zero transformation is zero is called a $divisor$ $of$ $zero$.

<h3 align="center">Exercises 7.2</h3>

Calculate the linear transformations $D^n S^n$ and $S^n D^n$, $n = 1, 2, 3, ...$; 
<br>In other words, compute the effect of each such transformation on an arbitrary element of $\mathcal{P}$. 

(Here $D$ and $S$ denote the differentiation and integration transformations)

<h3 align="center">Exercises 7.3</h3>

Suppose that $(Ax)(t) = x(t + 1)$ for every $x \in \mathcal{P}_n$. Prove that if $D$ is the differentiation operator, then:
$$1 + \frac{D}{1!} + \frac{D^2}{2!} + ... + \frac{D^{n-1}}{(n-1)!} = A.$$

<h3 align="center">Inverses</h3>


$\textbf{Definition}$. If a linear transformation $A$ has the properties:
<br>
&emsp; (1) If $x_1 \neq x_2$, then $Ax_1 \neq Ax_2$;
<br>
&emsp; (2) To every vector $y$ there corresponds (at least) one vector $x$ such that $Ax = y$;
<br>
&emsp; then we say that $A$ is $invertible$.

For any invertible linear transformation $A$ we can define a linear transformation, called the $inverse$ of
$A$ and denoted by $A^{-1}$, as follows:
<br>
&emsp; $\bullet$ If $y$ is any vector, we may (by (2)) find $x$ for which $Ax = y$;
<br>
&emsp; $\bullet$ This $x$ is uniquely determined, since $x_1 \neq x_2$ implies (by (1)) that $Ax_1 \neq Ax_2$, so that $Ax_1$ and $Ax_2$ cannot both be equal to $y$.
<br>
&emsp; $\bullet$ We define $A^{-1}y$ to be $x$.

It is immediate from the definition that for any invertible $A$ we have:
$$A A^{-1} = A^{-1} A = I.$$

$\textbf{Theorem} \space \textbf{11}$. 
If $A$, $B$, and $C$ are linear transformations such that $AB = CA = I$, then $A$ is invertible and $A^{-1}=B=C$.

$\textbf{Proof}$.
<br>
If $Ax_1 = Ax_2$, then $CAx_1 = CAx_2$, so that $x_1 = x_2$, i.e. the first condition of invertibility is satisfied.
<br>
If $y$ is any vector and $x = By$, then $y = ABy = Ax$, i.e. the second condition is also satisfied.
<br>
Multiplying $AB = I$ on the left, and $CA = I$ on the right, by $A^{-1}$, we see that $A^{-1} = B = C$.

$\textbf{Theorem} \space \textbf{11}$. 
A linear transformation $A$ on a finite-dimensional vector space $\mathcal{V}$ is invertible if and only if $Ax = 0$ implies that $x = 0$, or, alternatively, if and only if every $y \in \mathcal{V}$ can be written in the form $y = Ax$.

$\textbf{Proof}$. 
If $A$ is invertible, then both conditions are satisfied; in particular, $Ax = 0 = A0$ implies (by (1)) that $x = 0$, and (2) says that every $y$ is an $Ax$.
<br> Suppose now that $Ax = 0$ implies that $x = 0$. 
Then $u \neq v$, that is, $u-v \neq 0$, implies that $A(u - v) \neq 0 $, that is, that $Au \neq Av$. This proves (1).
<br> To prove (2), let $\{ x_1, ... , x_n\}$ be a basis in $\mathcal{V}$. We assert that $\{Ax_1, ..., Ax_n\}$ is also a basis. 
<br>
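According to $\textbf{Theorem} \space \textbf{4}$, we need only prove linear independence. Suppose that:
$$\sum_{i=1}^{n} \alpha_i A x_i = A \left ( \sum_{i=1}^{n} \alpha_i x_i \right ) = 0.$$
By hypothesis, this implies that $\sum_{i} \alpha_i x_i = 0$. 
The linear independence of the $x_i$ implies that $\alpha_i = 0$ for all $i$. Hence $\{Ax_1, ..., Ax_n\}$ is a basis, so every $y$ can be written as $y = \sum_{i} \alpha_i A x_i = A \left ( \sum_{i} \alpha_i x_i \right )$, which proves (2).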

Now let's assume that every $y$ is an $Ax$, and let $\{y_1, ..., y_n\}$ be any basis in $\mathcal{V}$.
<br> Corresponding to each $y_i$ we may find an $x_i$ for which $y_i = Ax_i$. We assert that $\{x_1, ..., x_n\}$ is also a basis. 
For $\sum_{i} \alpha_i x_i = 0 $ implies $\sum_{i} \alpha_i A x_i = \sum_{i} \alpha_i y_i = 0$, so that $\alpha_i = 0$ for all $i$.
Consequently every $x$ may be written in the form $\sum_{i} \alpha_i x_i$, and $Ax = 0$ implies, as in the argument just given, that $x = 0$.

$\textbf{Theorem} \space \textbf{12}$.

1. If $A$ and $B$ are invertible, then $AB$ is invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
2. If $A$ is invertible and $\alpha \neq 0$, then $\alpha A$ is invertible and $(\alpha A)^{-1}$= $\alpha^{-1}A^{-1}$.
3. If $A$ is invertible, then $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.

$\textbf{Proof}$. 
The proofs of these statements we leave to the students!

<h3 align="center">Exercises 8.1</h3>

A linear transformation $A$ is defined on $\mathbb{C}^2$ by:

$$A(\xi_1, \xi_2) = (\alpha \xi_1 + \beta \xi_2, \gamma \xi_1 + \delta \xi_2),$$

where $\alpha, \beta, \gamma$ and $\delta$ are fixed scalars. 

Prove that $A$ is invertible if and only if $\alpha \delta - \beta \gamma \neq 0$.

<h3 align="center">Exercises 8.2</h3>

Show that if $A$ is a linear transformation such that $A^2 - A + I = \Theta$ then $A$ is invertible.

<h3 align="center">Exercises 8.3</h3>

If $A$ and $B$ are linear transformations (on the same vector space) and if $AB = I$, 
<br> then $A$ is called a $left$ $inverse$ of $B$ and $B$ is called a $right$ $inverse$ of $A$. 

Prove that if $A$ has exactly one right inverse, say $B$, then $A$ is invertible. 

($\textbf{Hint:}$ consider $BA + B - I$)

<h3 align="center">Exercises 8.4</h3>

If $A$ is an invertible linear transformation on a finite-dimensional vector space $\mathcal{V}$,
then there exists a polynomial $p$ such that $A^{-1} = p(A)$. 

($\textbf{Hint:}$ find a non-zero polynomial $q$ of least degree such that $q(A) = 0$ and prove that its constant term cannot be $0$.)

<h3 align="center">Matrices</h3>


$\textbf{Definition}$. Let $\mathcal{V}$ be an $n$-dimensional vector space over a field $\mathcal{F}$, 
let $\mathcal{X} = \{x_1, ..., x_n \}$ be any basis of $\mathcal{V}$, and let $A$ be a linear transformation on $\mathcal{V}$. 
Since every vector is a linear combination of the $x_i$, we have:
$$A x_j = \sum_{i=1}^{n} \alpha_{ij} x_i$$
for $j = \overline{1,n}$. The set $(\alpha_{ij})$ of $n^2$ scalars, indexed with the double subscript $i$, $j$, 
is the $matrix$ of $A$ in the coordinate system $\mathcal{X}$. 
We denote it by $[A]$.
<br>
A matrix $[A]$ is usually written in the form of a square array:

$$[A] = \begin{bmatrix}
\alpha_{11} & \alpha_{12} & \cdots  & \alpha_{1n} \\ 
\alpha_{21} & \alpha_{22} & \cdots  & \alpha_{2n} \\ 
\vdots & \vdots & \ddots  & \vdots \\ 
\alpha_{n1} & \alpha_{n2} & \cdots  & \alpha_{nn} 
\end{bmatrix};$$

the scalars $(\alpha_{i1}, ... , \alpha_{in})$ form a $row$, and scalars $(\alpha_{1j}, ... , \alpha_{nj})$ a $column$, of $[A]$.

$\textbf{Note:}$ We comment on notation. It is customary to use the same symbol, say, $A$, for the matrix as for the transformation. The justification for this is to be found in the discussion below (of properties of matrices).
We do not follow this custom here, because one of our principal aims, in connection with matrices, is to emphasize that they depend on a coordinate system (whereas the notion of linear transformation does not), and to study how the relation between matrices and linear transformations changes as we pass from one coordinate system to another.

<h3 align="center">Examples</h3>

Let's consider the differentiation transformation $D$ on the space $\mathcal{P}_n$, and the basis $\mathcal{X} =\{x_1, ..., x_n\}$ defined by $x_i(t) = t^{i-1}$ for $i = \overline{1, n}$. What is the matrix of $D$ in the basis $\mathcal{X}$?

We have:
<font size="5">
$$\begin{matrix}
Dx_1 = 0x_1 + 0x_2 + & \cdots  & + 0x_{n-1} + 0x_n \\
Dx_2 = 1x_1 + 0x_2 + & \cdots  & + 0x_{n-1} + 0x_n \\ 
Dx_3 = 0x_1 + 2x_2 + & \cdots  & + 0x_{n-1} + 0x_n \\ 
\vdots & \cdots  & \vdots\\ 
Dx_n = 0x_1 + 0x_2 + & \cdots  & + (n-1)x_{n-1} + 0 x_n
\end{matrix};$$
</font>
so that
<font size="5">
$$[D] = \begin{bmatrix}
0 & 1 & 0 & \cdots  & 0 & 0 \\ 
0 & 0 & 2 & \cdots  & 0 & 0 \\ 
\vdots & \vdots & \vdots & \ddots  & \vdots & \vdots \\ 
0 & 0 & 0 & \cdots  & 0 & n-1 \\ 
0 & 0 & 0 & \cdots  & 0 & 0
\end{bmatrix}.$$
</font>
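
The matrix above can be built and applied to the coordinates of a polynomial. This is a minimal sketch assuming NumPy; the function name `differentiation_matrix` is hypothetical. It relies on the fact that the coordinates of $Dx$ in the basis $\mathcal{X}$ are obtained by multiplying $[D]$ by the coordinate column of $x$.

```python
import numpy as np

def differentiation_matrix(n):
    """Matrix of D on P_n in the basis x_i(t) = t^(i-1), i = 1..n."""
    M = np.zeros((n, n))
    for j in range(1, n):
        # D x_{j+1} = j x_j (1-based), so column j+1 has the entry j in row j;
        # with 0-based indexing this is the entry [j-1, j]
        M[j - 1, j] = j
    return M

D4 = differentiation_matrix(4)
x = np.array([1.0, 0.0, -2.0, 5.0])   # x(t) = 1 - 2 t^2 + 5 t^3
print(D4 @ x)                         # coordinates of x'(t) = -4 t + 15 t^2
```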

<h3 align="center">Exercises 9.1</h3>

Let's consider the multiplication transformation $T$ on the space $\mathcal{P}_n$, defined by:
$(Tx)(t) = tx(t)$.
<br>
Let $\mathcal{X} =\{x_1, ..., x_n\}$ be a basis defined by $x_i(t) = t^{i-1}$ for $i = \overline{1, n}$. 

What is the matrix of $T$ in the basis $\mathcal{X}$?


<h3 align="center">Matrices of transformations</h3>


Given, in a fixed coordinate system $\mathcal{X} = \{x_1, ..., x_n \}$, the matrices of $A$ and $B$, how can we find the matrices of $\Theta$ and $I$, of $C = \alpha A + \beta B$, of $P = AB$, etc.?

Write $[A] = (\alpha_{ij})$, $[B] = (\beta_{ij})$, $[C] = (\gamma_{ij})$, $[P] = (\rho_{ij})$, and $[\Theta] = (o_{ij})$, $[I] = (e_{ij})$ then:


$$\begin{matrix}
\gamma_{ij}  & = & \alpha \alpha_{ij} + \beta \beta_{ij}; \\
\rho_{ij}  & = &\sum_{k=1}^{n} \alpha_{ik} \beta_{kj}; \\
o_{ij}  & = & 0; \\
e_{ij}  & = & \delta_{ij}.\\
\end{matrix}$$

To prove the second rule we use the definition of the matrix associated with a transformation,
and juggle, thus:
$$Px_j = A(Bx_j) = A \left ( \sum_{k=1}^{n} \beta_{kj} x_k \right ) = \sum_{k=1}^{n} \beta_{kj} A x_k = 
\sum_{k=1}^{n} \beta_{kj} \left ( \sum_{i=1}^{n} \alpha_{ik} x_i \right ) = \sum_{i=1}^{n} \left ( \sum_{k=1}^{n} \alpha_{ik} \beta_{kj} \right ) x_i $$
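
The product rule $\rho_{ij} = \sum_k \alpha_{ik}\beta_{kj}$ can be written out directly. The sketch below uses plain nested lists (rows of the matrix); the helper names `matmul` and `apply` and the sample matrices are assumptions for illustration. It also checks that applying $[P]$ to a coordinate column agrees with applying $[B]$ and then $[A]$.

```python
def matmul(a, b):
    """rho_ij = sum_k alpha_ik * beta_kj for square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(m, v):
    """Coordinates of the image of a vector with coordinates v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

A = [[1, 2], [0, 1]]
B = [[0, 1], [1, 0]]
v = [3, 5]

P = matmul(A, B)
print(P)                                      # [[2, 1], [1, 0]]
print(apply(P, v) == apply(A, apply(B, v)))   # first B, then A
print(matmul(A, B) == matmul(B, A))           # False: these two do not commute
```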





$\textbf{Theorem} \space \textbf{13}$. Among the set of all matrices $(\alpha_{ij})$, $(\beta_{ij})$, etc., $i, j = \overline{1,n}$ (not considered in relation to linear transformations), we define sum, scalar multiplication, product, $(o_{ij})$, and $(e_{ij})$, by:
$$(\alpha_{ij}) + (\beta_{ij}) = (\alpha_{ij} + \beta_{ij});$$
$$\alpha (\alpha_{ij}) = (\alpha  \alpha_{ij});$$
$$(\alpha_{ij})(\beta_{ij}) = \left ( \sum_{k=1}^{n} \alpha_{ik}\beta_{kj} \right );$$
$$o_{ij} = 0;$$
$$e_{ij} = \delta_{ij}.$$

Then the correspondence (established by means of an arbitrary coordinate system $\mathcal{X} =\{x_1, ..., x_n\}$ of the $n$-dimensional vector space $\mathcal{V}$), between all linear transformations $A$ on $\mathcal{V}$ and all matrices $(\alpha_{ij})$, described by $Ax_j = \sum_{i} \alpha_{ij} x_i$, is an isomorphism. In other words, it is a one-to-one correspondence
that preserves sum, scalar multiplication, product, 0, and 1.

<h3 align="center">Matrix multiplication</h3>

<img src="images/RM_Matrix_Multiplication.png" width="1500" height="1000" alt="Example"  align="center"/>

<h3 align="center">Exercises 10.1</h3>

Let $A$ be the linear transformation on $\mathcal{P}_n$ defined by $(Ax)(t) = x(t+1)$,
<br> and let $\mathcal{X} = \{x_0, ..., x_{n-1}\}$ be the basis of $\mathcal{P}_n$ defined by $x_i(t) = t^i$ for $i = \overline{0, n-1}$. 

What is the matrix of $A$ in the basis $\mathcal{X}$?


<h3 align="center">Change of basis</h3>


Let $\mathcal{V}$ be an $n$-dimensional vector space and $\mathcal{X} =\{x_1, ..., x_n\}$ and $\mathcal{Y} =\{y_1, ..., y_n\}$ be two bases in $\mathcal{V}$. 
<br>
We may ask the following two questions.

$\textbf{Question I}$. If $x \in \mathcal{V}$ and $x = \sum_{i} \xi_i x_i = \sum_{i} \eta_i y_i$, what is the relation between its coordinates $(\xi_1, ..., \xi_n)$ with respect to $\mathcal{X}$ and its coordinates $(\eta_1, ..., \eta_n)$ with respect to $\mathcal{Y}$?

$\textbf{Question II}$. If $(\xi_1, ..., \xi_n)$ is an ordered set of $n$ scalars, what is the relation between the vectors $x = \sum_{i} \xi_i x_i$ and $y = \sum_{i} \xi_i y_i$?

Both these questions are easily answered in the language of linear transformations.
<br>
We consider, namely, the linear transformation $A$ defined by $Ax_i = y_i$ for $i = \overline{1,n}$.

$\textbf{Answer to Question I}$. Since:
$$\sum_{j=1}^{n} \eta_j y_j = \sum_{j=1}^{n} \eta_j A x_j = \sum_{j=1}^{n} \eta_j \sum_{i=1}^{n} \alpha_{ij} x_i = \sum_{i=1}^{n} \left ( \sum_{j=1}^{n} \alpha_{ij} \eta_j \right ) x_i,$$
i.e. we have:
$$\xi_i = \sum_{j=1}^{n} \alpha_{ij} \eta_j.$$

$\textbf{Answer to Question II}$. $$y = Ax$$

Roughly speaking, the invertible linear transformation $A$ (or, more properly, the matrix $[A]$) may be considered as a **transformation of coordinates**, or it may be considered as a **transformation of vectors**.
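
The answer to Question I can be tried out numerically. In this sketch (assuming NumPy, with hypothetical numbers) the columns of `A_mat` hold the coordinates of $y_1, y_2$ in the basis $\mathcal{X}$, so that `A_mat` plays the role of $[A]$.

```python
import numpy as np

# Columns of A_mat are the coordinates of y_1, y_2 in the basis X (made-up values).
A_mat = np.array([[1.0, 1.0],
                  [0.0, 2.0]])

eta = np.array([3.0, -1.0])     # coordinates of x with respect to Y
xi = A_mat @ eta                # xi_i = sum_j alpha_ij eta_j

# Going back from xi to eta requires the inverse transformation.
eta_back = np.linalg.solve(A_mat, xi)
print(xi, eta_back)
```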

<h3 align="center">Similarity</h3>

The following two questions are closely related to those of the preceding
section:

$\textbf{Question III}$. If $B$ is a linear transformation on $\mathcal{V}$, what is the relation between its matrix $(\beta_{ij})$ with respect to $\mathcal{X}$ and its matrix $(\gamma_{ij})$ with respect to $\mathcal{Y}$?

$\textbf{Question IV}$. If $(\beta_{ij})$ is a matrix, what is the relation between the linear transformations $B$ and $C$ defined, respectively, by $B x_j = \sum_{i=1}^{n} \beta_{ij} x_i$ and $C y_j = \sum_{i=1}^{n} \beta_{ij} y_i$?

Questions III and IV are explicit formulations of a problem we raised before: 
<br>
$\bullet$ to **one transformation** there correspond **many matrices** (in different coordinate systems);
<br>
$\bullet$ to **one matrix** there correspond **many transformations**.

$\textbf{Answer to Question III}$. We have:
$$B x_j = \sum_{i=1}^{n} \beta_{ij} x_i$$
$$B y_j = \sum_{i=1}^{n} \gamma_{ij} y_i$$
Using the linear transformation $A$ defined in the preceding section, we may write:
$$B y_j = B A x_j = B \left ( \sum_{k=1}^{n} \alpha_{kj} x_k\right ) = \sum_{k=1}^{n} \alpha_{kj} B x_k = 
\sum_{k=1}^{n} \alpha_{kj} \sum_{i=1}^{n} \beta_{ik} x_i = \sum_{i=1}^{n} \left ( \sum_{k=1}^{n} \beta_{ik} \alpha_{kj} \right ) x_i,$$
$$\sum_{k=1}^{n} \gamma_{kj} y_k = \sum_{k=1}^{n} \gamma_{kj} A x_k = 
\sum_{k=1}^{n} \gamma_{kj} \sum_{i=1}^{n} \alpha_{ik} x_i = \sum_{i=1}^{n} \left ( \sum_{k=1}^{n} \alpha_{ik} \gamma_{kj} \right ) x_i.$$
Comparing these two expressions for $B y_j$, we see that:
$$\sum_{k=1}^{n} \alpha_{ik} \gamma_{kj} = \sum_{k=1}^{n} \beta_{ik} \alpha_{kj}$$

Using matrix multiplication, we can write this in the "dangerously" simple form:
$$[A][C] = [B][A],$$
where here $[B] = (\beta_{ij})$ and $[C] = (\gamma_{ij})$; equivalently, $[C] = [A]^{-1}[B][A]$.

$\textbf{Answer to Question IV}$. We observe that:
$$C y_j = C A x_j,$$
$$\sum_{i=1}^{n} \beta_{ij} y_i = \sum_{i=1}^{n} \beta_{ij} A x_i = 
A \left ( \sum_{i=1}^{n} \beta_{ij} x_i \right ) = AB x_j.$$
Hence $C$ is such that $CA x_j = AB x_j$, or, finally:
$$C = A B A^{-1}.$$

The situation is conveniently summed up in the following mnemonic diagram:
<center><img src="images/RM_Diagram.svg" width="300" height="300" alt="Example" /></center>
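
A small numerical check of the answer to Question III, assuming NumPy and made-up matrices: `A_mat` stands for $[A]$ (its columns are the coordinates of the $y_j$ in $\mathcal{X}$), `B_X` for $(\beta_{ij})$ and `B_Y` for $(\gamma_{ij})$.

```python
import numpy as np

A_mat = np.array([[1.0, 1.0],
                  [0.0, 2.0]])      # A x_j = y_j
B_X = np.array([[2.0, 0.0],
                [1.0, 3.0]])        # matrix (beta_ij) of B in the basis X

# Matrix (gamma_ij) of the same transformation B in the basis Y:
# solving A_mat @ B_Y = B_X @ A_mat avoids forming the inverse explicitly.
B_Y = np.linalg.solve(A_mat, B_X @ A_mat)

print(np.allclose(A_mat @ B_Y, B_X @ A_mat))
# Similar matrices share trace and determinant.
print(np.isclose(np.trace(B_X), np.trace(B_Y)),
      np.isclose(np.linalg.det(B_X), np.linalg.det(B_Y)))
```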

<h3 align="center">Range and null-space</h3>

$\textbf{Definition}$. If $A$ is a linear transformation on a vector space $\mathcal{V}$ and if $\mathfrak{M}$ is a subspace of $\mathcal{V}$, the $image$ of $\mathfrak{M}$ under $A$, in symbols $A\mathfrak{M}$, is the set of all vectors of the form $Ax$ with $x \in \mathfrak{M}$. 

$\textbf{Definition}$. The $range$ of $A$ is the set $\mathfrak{R}(A)= A\mathcal{V}$. 

$\textbf{Definition}$. The $null$-$space$ of $A$ is the set $\mathfrak{N}(A)$ (written $\mathfrak{N}$ when $A$ is clear from the context) of all vectors $x\in \mathcal{V}$ for which $Ax = 0$.

$\textbf{Statements:}$
<br>
$\bullet$ The transformation $A$ is invertible if and only if $\mathfrak{R}(A) = \mathcal{V}$ and $\mathfrak{N} = 0$.
<br>
$\bullet$ In case $\mathcal{V}$ is finite-dimensional, $A$ is invertible if and only if $\mathfrak{R}(A) = \mathcal{V}$ or $\mathfrak{N} = 0$.

<h3 align="center">Rank and nullity</h3>

$\textbf{Definition}$. The $rank$, $\rho(A)$, of a linear transformation $A$ on a finite-dimensional vector space is the dimension of $\mathfrak{R}(A)$.


$\textbf{Definition}$. The $nullity$, $\nu(A)$, of a linear transformation $A$ on a finite-dimensional vector space is the dimension of $\mathfrak{N}(A)$.

$\textbf{Theorem} \space \textbf{14}$. If $A$ is a linear transformation on an $n$-dimensional vector space $\mathcal{V}$, then $\nu(A) = n - \rho(A)$.

$\textbf{Proof}$. 
The proof of this theorem we leave to the students!

$\textbf{Theorem} \space \textbf{15}$. If $A$ and $B$ are linear transformations on a finite-dimensional vector space $\mathcal{V}$, then:
$$\rho(A + B) \leq \rho(A) + \rho(B);$$

$$\rho(AB) \leq \min \{ \rho(A), \rho(B)\};$$

$$\nu(AB) \leq \nu(A) + \nu(B).$$

The last inequality is known as Sylvester's law of nullity.

If $B$ is invertible, then:
$$\rho(AB)  = \rho(BA)=\rho(A).$$

$\textbf{Proof}$. Since $(AB)x = A(Bx)$, it follows that  $\mathfrak{R}(AB)$ is contained in  $\mathfrak{R}(A)$, so that $\rho(AB) \leq \rho(A)$. 
<br> In other words, the rank of a product is not greater than the rank of the first factor. 
<br> If $B$ is invertible, then:
$$\rho(A) = \rho(ABB^{-1}) \leq \rho (AB) \text{ and } \rho(A) = \rho(B^{-1}BA) \leq \rho (BA).$$
The proofs of the other statements we leave as exercises for the students.
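
The inequalities of the theorem can be checked on an example. The sketch below assumes NumPy and uses `numpy.linalg.matrix_rank`; the matrices and the helper `nullity` are chosen for illustration only, and a numerical check is of course not a proof.

```python
import numpy as np

rank = np.linalg.matrix_rank

def nullity(M):
    # Theorem 14: nullity = n - rank
    return M.shape[0] - rank(M)

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])     # rank 2 (second row is twice the first)
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])     # rank 2

print(rank(A + B) <= rank(A) + rank(B))
print(rank(A @ B) <= min(rank(A), rank(B)))
print(nullity(A @ B) <= nullity(A) + nullity(B))
```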

<h3 align="center">Determinant</h3>


$\textbf{Definition}$. The $determinant$ of a linear transformation $A$ on an $n$-dimensional vector space $\mathcal{V}$, denoted by $det(A)$ or $|A|$, is a scalar value that describes the (signed) $n$-dimensional volume **scaling factor** of $A$.

<br>
<img src="images/RM_Determinant.gif" width="1000" height="300" alt="Example" />

<h3 align="center">How to define the determinant</h3>

$\bullet$ The $Leibniz$ formula for the determinant of a $2 \times 2$ matrix is:
$$|A| = \begin{vmatrix}
a & b\\ 
c & d
\end{vmatrix}
= ad - bc.
$$

$\bullet$ The $Laplace$ formula for the determinant of a $3 \times 3$ matrix is:

$$|A| =
\begin{vmatrix}
a & b & c\\
d & e & f\\
g & h & i
\end{vmatrix}
=
a \cdot
\begin{vmatrix}
e & f \\
h & i
\end{vmatrix}
- b \cdot
\begin{vmatrix}
d & f \\
g & i
\end{vmatrix}
+ c \cdot
\begin{vmatrix}
d & e \\
g & h
\end{vmatrix}
= aei + bfg + cdh - ceg - bdi - afh.
$$

$\bullet$ The $Leibniz$ formula for the determinant of an $n \times n$ matrix is:
$$|A| = \sum_{\sigma \in S_n} \left ( sign(\sigma)\prod_{i=1}^{n} \alpha_{i\sigma_i} \right ),$$
where the sum is computed over all permutations $\sigma$ of the set $\{1, 2, ..., n\}$.
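
The Leibniz formula translates almost literally into code. The following sketch uses only the Python standard library; the function names `sign` and `det_leibniz` are hypothetical. Since the sum has $n!$ terms, this is meant for small matrices only.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices (via inversion count)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(a):
    """Determinant by the Leibniz formula: sum over all permutations sigma
    of sign(sigma) * prod_i a[i][sigma(i)]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

print(det_leibniz([[1, 2], [3, 4]]))                     # 1*4 - 2*3 = -2
print(det_leibniz([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))    # 18
```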

<h3 align="center">Properties of the determinant</h3>

The determinant has many properties. Some basic properties of determinants are:
1. $|I_n| = 1$, where $I_n$ is the $n \times n$ identity matrix;

2. $|A^T| = |A|$, where $A^T$ denotes the transpose of $A$;

3. $|A^{-1}| = |A|^{-1}$;

4. $|AB| = |A|\cdot |B|$ for square matrices $A$ and $B$ of equal size;

5. $|\alpha A| = \alpha^n |A|$, for an $n \times n$ matrix $A$;

6. $|A + B| \geq |A| + |B|$ for positive semidefinite matrices $A$ and $B$ of equal size;

7. $|A| = \prod_{i=1}^{n} \alpha_{ii} $ for a triangular matrix $A$, i.e. $\alpha_{ij} =0$ whenever $i >j$ (upper triangular) or whenever $i < j$ (lower triangular);

8. $|A| = \prod_{i=1}^{n} \lambda_{i} $ for an $n \times n$ matrix $A$ with eigenvalues $\lambda_1, ..., \lambda_n$ (counted with algebraic multiplicity).

<h3 align="center">Eigenvectors and eigenvalues</h3>

$\textbf{Definition}$. If $A$ is a linear transformation on an $n$-dimensional vector space $\mathcal{V}$ over a field $\mathcal{F}$, and there exist a **non-zero** vector $x\in \mathcal{V}$ and a scalar $\lambda \in \mathcal{F}$, such that $Ax = \lambda x$, then $x$ is called an $eigenvector$ and $\lambda$ is called an $eigenvalue$ of the linear transformation $A$.

A more usual form is:
$$Ax = \lambda x \rightarrow Ax - \lambda x = 0 \rightarrow (A - \lambda I) x = 0$$

If this equation has a non-zero solution $x$, then $\lambda$ is an eigenvalue and $x$ is a corresponding eigenvector of the linear transformation $A$.

The eigenvalues can be obtained from the equation $det(A - \lambda I ) = 0$.

<h3 align="center">Eigendecomposition of a matrix</h3>


Let $A$ be a linear transformation on an $n$-dimensional vector space $\mathcal{V}$.
Suppose that $\mathcal{Y} = \{q_1, ..., q_n\}$ is a basis of linearly independent eigenvectors of $A$ with corresponding eigenvalues $\lambda_1 \geq ...\geq \lambda_n$ (sorted in descending order).
It is easy to see that the matrix of the linear transformation $A$ in the basis $\mathcal{Y}$ has the form:

$$
\Lambda = 
\begin{bmatrix}
 \lambda_1 & 0 & \cdots & 0 \\ 
 0 & \lambda_2 & \cdots & 0 \\ 
 \vdots & \vdots & \ddots & \vdots \\ 
 0 & 0 & \cdots & \lambda_n \\ 
\end{bmatrix}
$$

Let us suppose now that $\mathcal{X} = \{ x_1, ..., x_n\}$ is any basis of the vector space $\mathcal{V}$, 
and $[A]$ is the matrix of $A$ in the coordinate system $\mathcal{X}$.
If $[Q]$ is the matrix (in $\mathcal{X}$) of the transformation $Q$ that carries $\mathcal{X}$ to $\mathcal{Y}$, i.e. $Q x_i = q_i$, 
then the matrix $[A]$ of the linear transformation $A$ can be factorized in the form:

$$[A] = [Q]\Lambda[Q]^{-1}$$
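
The factorization can be reproduced numerically. This sketch assumes NumPy and a symmetric sample matrix (so that the eigenvalues are real and `numpy.linalg.eigh` applies); the columns of `Q` are the coordinates of the eigenvectors $q_i$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric, so the eigenvalues are real

eigvals, Q = np.linalg.eigh(A)      # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]   # re-sort in descending order, as in the text
eigvals, Q = eigvals[order], Q[:, order]

Lam = np.diag(eigvals)
print(eigvals)                                      # roots of det(A - lambda I) = 0
print(np.allclose(A @ Q, Q @ Lam))                  # A q_i = lambda_i q_i, column by column
print(np.allclose(A, Q @ Lam @ np.linalg.inv(Q)))   # [A] = [Q] Lambda [Q]^{-1}
```

For this matrix $det(A - \lambda I) = (2 - \lambda)^2 - 1$, whose roots are $3$ and $1$.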


<h3 align="center">Visualisation of eigenvalues and eigenvectors</h3>

<br>
<img src="images/RM_Eigenvalues_and_Eigenvectors.gif" width="1000" height="300" alt="Example" />

<h3 align="center">Application of matrices for solving a system of linear equations</h3>

$\bullet$ A general system of $m$ linear equations with $n$ unknowns can be written as:
$$\begin{matrix}
\alpha_{11} x_1 + \alpha_{12} x_2 + \cdots + \alpha_{1n} x_n = b_1\\ 
\alpha_{21} x_1 + \alpha_{22} x_2 + \cdots + \alpha_{2n} x_n = b_2\\  
\vdots \\
\alpha_{m1} x_1 + \alpha_{m2} x_2 + \cdots + \alpha_{mn} x_n = b_m
\end{matrix}$$

where $x_1, ..., x_n$ are the unknowns, $\alpha_{11}, ..., \alpha_{mn}$ are the coefficients of the system, and $b_1, ..., b_m$ are the constant terms.
<br>
$\bullet$ We can write this system of linear equations in the equivalent matrix form:
$$Ax = b,$$
$$\text{where } A = 
\begin{bmatrix}
\alpha_{11} & \alpha_{12} & \cdots  & \alpha_{1n} \\ 
\alpha_{21} & \alpha_{22} & \cdots  & \alpha_{2n} \\ 
\vdots & \vdots & \ddots  & \vdots \\ 
\alpha_{m1} & \alpha_{m2} & \cdots  & \alpha_{mn} 
\end{bmatrix}
\text{, }
x = 
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n \\
\end{bmatrix}
\text{ and }
b = 
\begin{bmatrix}
b_1 \\ b_2 \\ \vdots \\ b_m \\
\end{bmatrix}
$$

<h3 align="center">Solving a linear system</h3>

There are **several algorithms** for solving a system of linear equations:
- **Elimination of variables**. The simplest method, which solves a system of linear equations by repeatedly eliminating variables.
- **Gaussian elimination**. The augmented matrix of the system is modified until it reaches reduced **Row Echelon Form**, using three types of elementary row operations:
    - **Type 1:** Swap the positions of two rows;
    - **Type 2:** Multiply a row by a nonzero scalar;
    - **Type 3:** Add to one row a scalar multiple of another.
- **Cramer's rule**. An explicit formula for the solution of a system of linear equations, with each variable given by a fraction of two determinants;
- **Matrix solution**: If the matrix $A$ is square and has full rank then the system has a unique solution given by $x = A^{-1}b$.
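
As an illustration of Gaussian elimination, here is a minimal sketch for a square system of full rank. It uses exact rational arithmetic from the standard library; the function name `solve` is hypothetical, and a singular matrix would make the pivot search fail (this case is not handled).

```python
from fractions import Fraction

def solve(a, b):
    """Solve a x = b for a square, full-rank a by Gauss-Jordan elimination."""
    n = len(a)
    # augmented matrix [a | b] with exact rational entries
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Type 1: swap in a row with a non-zero pivot
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Type 2: scale the pivot row so that the pivot equals 1
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Type 3: subtract multiples of the pivot row from all other rows
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    # the left block is now the identity, the last column is the solution
    return [row[n] for row in m]

x = solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print(x)   # the solution is x_1 = 2, x_2 = 3, x_3 = -1
```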

<h3 align="center">Singular Value Decomposition (SVD)</h3>

The matrices $AA^T$ and $A^TA$ are very special in linear algebra, since they have the following useful properties:
<br> $\bullet$ they are symmetric;
<br> $\bullet$ they are square;
<br> $\bullet$ they are positive semidefinite, i.e. their eigenvalues are zero or positive: $\sigma_i \geq 0$;
<br> $\bullet$ both matrices have the same positive eigenvalues;
<br> $\bullet$ both have the same rank: $\rho(AA^T)=\rho(A^TA)=\rho(A)$.

Let $u_i$ and $v_i$ be the eigenvectors of $AA^T$ and $A^TA$ respectively: $(AA^T)u_i = \sigma_i u_i$ and $(A^TA)v_i = \sigma_i v_i$.

$\textbf{Definition}$. The eigenvectors $u_i$ and $v_i$ are called the (left and right) $singular$ $vectors$, and the square roots $\sqrt{\sigma_i}$ of the eigenvalues are called the $singular$ $values$ of the matrix $A$.

$\textbf{Theorem} \space \textbf{16}$. Let $A$ be a real $m \times n$ matrix of rank $r$. Then $A$ can be factorized as:
$$A = USV^T,$$
where $U$ and $V$ are $m \times r$ and $n \times r$ matrices with orthonormal columns, i.e. $U^TU = V^TV = I$, 
<br>whose columns are eigenvectors of $AA^T$ and $A^TA$ respectively,
<br>and $S$ is an $r \times r$ diagonal matrix with elements equal to the **square roots of the positive eigenvalues** of $AA^T$ or $A^TA$.

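$\textbf{Proof}$. Let $r = \rho(A)$. The matrix $A^TA$ is symmetric and positive semidefinite, so it has orthonormal eigenvectors $v_1, ..., v_n$ whose eigenvalues can be ordered so that $\sigma_1 \geq ... \geq \sigma_r > 0$ and $\sigma_{r+1} = ... = \sigma_n = 0$:
$$A^TA v_i = \sigma_i v_i \text{, for } i =\overline{1,n}.$$
For $i = \overline{1,r}$ put $u_i = \frac{1}{\sqrt{\sigma_i}} A v_i$. Then:
$$AA^T u_i = \frac{1}{\sqrt{\sigma_i}} A (A^TA v_i) = \sigma_i u_i \text{ and } u_i^T u_j = \frac{v_i^T A^TA v_j}{\sqrt{\sigma_i \sigma_j}} = \delta_{ij},$$
i.e. the $u_i$ are orthonormal eigenvectors of $AA^T$. For $i > r$ we have $\| A v_i \|^2 = v_i^T A^TA v_i = 0$, so that $A v_i = 0$.
<br>
Write $U = [u_1, ..., u_r]$, $V = [v_1, ..., v_r]$ and $S = diag(\sqrt{\sigma_1}, ... , \sqrt{\sigma_r})$. Since $\{v_1, ..., v_n\}$ is an orthonormal basis, $\sum_{i=1}^{n} v_i v_i^T = I$, and therefore:
$$A = A \sum_{i=1}^{n} v_i v_i^T = \sum_{i=1}^{r} \sqrt{\sigma_i} u_i v_i^T = U S V^T.$$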
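
The statement can be checked numerically with `numpy.linalg.svd`. This is a sketch on a made-up $3 \times 2$ matrix of rank $2$; with `full_matrices=False` NumPy returns the reduced factors, which for this full-column-rank example have the sizes described in the theorem.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 0.0]])                         # 3 x 2, rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # U is 3 x 2, Vt is 2 x 2
print(np.allclose(A, U @ np.diag(s) @ Vt))         # A = U S V^T

# the singular values are the square roots of the eigenvalues of A^T A
eig = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s, np.sqrt(eig)))

# the columns of U and V are orthonormal
print(np.allclose(U.T @ U, np.eye(2)), np.allclose(Vt @ Vt.T, np.eye(2)))
```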

<h3 align="center">Visualisation of SVD</h3>

<br>
<img src="images/RM_SVD.jpg" width="1500" height="300" alt="Example" />

<h3 align="center">Principal Component Analysis (PCA)</h3>

$\textbf{Definition}$. $Principal \text{ } Component \text{ } Analysis$ ($PCA$) is a dimensionality reduction technique that enables you to identify correlations and patterns in a data set so that it can be transformed into a data set of significantly lower dimension while retaining most of the variance of the original data.


The standard context for PCA as an exploratory data analysis tool involves a dataset with $m$ observations, each of which records the values of $n$ numerical $features$. 
These observations define $m$ $n$-dimensional vectors $\{x_1, ..., x_m\}$, where $x_i = (x_{i1}, ..., x_{in})$ for $i = \overline{1, m}$ ($x_{ij} \in \mathbb{R}$ for $i=\overline{1,m}$ and $j=\overline{1,n}$).
<br> Or, equivalently, the data is defined as an $m \times n$ data matrix $X =(x_{ij})$.

<h3 align="center">Step by step computation of PCA</h3>

The below steps need to be followed to perform dimensionality reduction using PCA:

1. Standardization of the data set;

2. Calculation of the covariance matrix;

3. Calculation of the singular values and singular vectors and factorization of the covariance matrix;

4. Calculation of the Principal Components and reduction of the data set size;

5. Data reconstruction from a reduced data set;

6. Validation of the reconstructed data.

<h3 align="center">Step 1: Standardization of the data</h3>

$\bullet$ The properties of PCA have some **undesirable features** when variables are measured in **different units**;

$\bullet$ To **overcome** this undesirable feature, it is common practice to begin by **standardizing** the variables;

$\bullet$ Standardization is carried out by replacing initial data matrix $X$ with the standardized data matrix $Y$;

$\bullet$ Each data value $x_{ij}$ is both centered and divided by the standard deviation $s_j$ of the $m$ observations of the variable $j$:
$$y_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},$$
$$\text{where } \bar{x}_j = \frac{1}{m}\sum_{i=1}^{m} x_{ij} \text{ and } s_j = \sqrt{\frac{1}{m-1} \sum_{i=1}^{m}(x_{ij} - \bar{x}_j)^2}.$$
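
The standardization step can be sketched as follows, assuming NumPy and a small made-up data matrix with observations in rows and features in columns. The argument `ddof=1` selects the $\frac{1}{m-1}$ normalisation used in the formula for $s_j$.

```python
import numpy as np

# hypothetical data: m = 5 observations (rows), n = 2 features (columns)
X = np.array([[170.0, 65.0],
              [180.0, 80.0],
              [160.0, 55.0],
              [175.0, 72.0],
              [165.0, 60.0]])

mean = X.mean(axis=0)               # x-bar_j for each feature j
std = X.std(axis=0, ddof=1)         # s_j with the 1/(m-1) normalisation
Y = (X - mean) / std                # y_ij = (x_ij - x-bar_j) / s_j

print(Y.mean(axis=0))               # approximately 0 for every feature
print(Y.std(axis=0, ddof=1))        # approximately 1 for every feature
```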

<h3 align="center">Step 2: Computing the covariance matrix $\Sigma$</h3>

$\bullet$ A covariance matrix $\Sigma$ expresses the **correlation between each two different features** in the data set:

$$\Sigma =  
\begin{bmatrix}
cov[y_1, y_1] & cov[y_1, y_2] & \cdots & cov[y_1, y_n]\\
cov[y_2, y_1] & cov[y_2, y_2] & \cdots & cov[y_2, y_n]\\
\vdots & \vdots & \ddots & \vdots\\ 
cov[y_n, y_1] & cov[y_n, y_2] & \cdots & cov[y_n, y_n]\\ 
\end{bmatrix}
,$$
where each element represents the **covariance between two features** (remember that the $y_i$ are centered variables, and $y_{ki}$ denotes the value of feature $i$ in observation $k$):
$$cov[y_i, y_j] = \frac{1}{m-1} \sum_{k=1}^{m} y_{ki} y_{kj}.$$


$\bullet$ If the covariance value is **negative**, then the respective features tend to vary in **opposite directions** (one increases as the other decreases);

$\bullet$ A **positive** covariance denotes that the respective features tend to increase and decrease **together**.
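
A minimal sketch of the covariance computation, assuming NumPy and synthetic data generated with a fixed seed. In matrix form the definition reads $\Sigma = \frac{1}{m-1} Y^T Y$ for the standardized data matrix $Y$; the result is compared with `numpy.cov`.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, synthetic data
X = rng.normal(size=(50, 3))
X[:, 1] += 2.0 * X[:, 0]                   # second feature moves with the first
X[:, 2] -= 1.5 * X[:, 0]                   # third feature moves against the first

Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
m = Y.shape[0]
Sigma = Y.T @ Y / (m - 1)                  # cov[y_i, y_j] = sum_k y_ki y_kj / (m - 1)

print(np.allclose(Sigma, np.cov(Y, rowvar=False)))
print(Sigma[0, 1] > 0, Sigma[0, 2] < 0)    # signs reflect how the features co-vary
```

Because the data were standardized first, the diagonal of $\Sigma$ consists of ones and the off-diagonal entries are correlation coefficients.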

<h3 align="center">Step 3: Calculating the singular values and singular vectors</h3>

$\bullet$ The next step is to factorize the matrix $\Sigma$ using the SVD:

$$\Sigma = U S V^T,$$

$\bullet$ $U$ and $V$ are $n \times n$ orthogonal matrices, i.e. $UU^T = U^TU = I$ and $VV^T = V^TV = I$,
whose columns are the singular vectors, i.e. eigenvectors of $\Sigma\Sigma^T$ and $\Sigma^T\Sigma$ respectively. 

$\bullet$  $S$ is an $n \times n$ diagonal matrix whose elements are the singular values of $\Sigma$, i.e. the square roots $\sqrt{\sigma_i}$ of the eigenvalues $\sigma_i$ of $\Sigma\Sigma^T$ or $\Sigma^T\Sigma$.

$\bullet$  Singular values are sorted in descending order: $\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_n$.

<h3 align="center">Step 4: Computing the Principal Components (PC)</h3>

$\bullet$ $Principal$ $Components$ (PC) are the new set of variables that are obtained from the initial set of variables;

$\bullet$ Once we have computed the **singular values** and **singular vectors**, we order them in **descending order**, <br>where the singular vector with the **highest singular value** is the most significant and forms the first principal component.

$\bullet$ The principal components of **lesser significance** can thus be **removed** in order to **reduce the dimensions** of the data.

$\bullet$ Thus we take first $k \leq n$ columns of $U$ and consider new matrix $U_k$:

$$U_k = \{u_1, ..., u_k\}.$$

<h3 align="center">Step 5: Reducing the dimensions of the data set</h3>

$\bullet$ The last step in performing PCA is to re-arrange the original data with the final principal components which represent the maximum and the most significant information of the data set. 

$\bullet$ In order to replace the original data axes with the newly formed Principal Components, you simply multiply the transpose of the feature matrix $U_k$ by each standardized observation $y_i$ (the $i$-th row of $Y$, written as a column vector).

$\bullet$ Thus newly formed Principal Components are:

$$z_i = U_k^T y_i.$$

Since $U_k^T \in \mathbb{R}^{k\times n}$ and $y_i \in \mathbb{R}^{n}$, we have $z_i \in \mathbb{R}^{k}$.

<h3 align="center">Step 6: Validation of the reconstructed data</h3>

$\bullet$ We can approximate the reconstruction of the original value by ${y_{i}}' = U_{k}z_{i}$ and compare it with the original value $y_i$:
$$\frac{
\frac{1}{m}\sum_{i=1}^{m} \left \| y_{i} - {y_{i}}' \right \|^{2}
}{
\frac{1}{m}\sum_{i=1}^{m} \left \| y_{i} \right \|^{2}
} \leq 1 - \epsilon,$$
where $\epsilon$ is the $proportion \text{ } of \text{ } total \text{ } variance$ that we want to retain.
<br>Using the identity:
$$\frac{
\frac{1}{m}\sum_{i=1}^{m}\left \| y_{i} - {y_{i}}' \right \|^{2}
}{
\frac{1}{m}\sum_{i=1}^{m}\left \| y_{i} \right \|^{2}
} = 1 -
\frac{
\sum_{i=1}^{k}S_{ii}
}{
\sum_{j=1}^{n}S_{jj}
},$$
we can write this condition as:
$$\frac{
\sum_{i=1}^{k}S_{ii}
}{
\sum_{j=1}^{n}S_{jj}
} \geq \epsilon.$$
$\bullet$ It is common practice to use $\epsilon = 70\%$ to decide how many PCs should be retained.
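
The six steps can be put together in one short sketch. It assumes NumPy, synthetic data with a fixed seed, and the reading of $\epsilon$ as the proportion of variance to be retained; the function name `pca_reduce` and its return values are hypothetical choices for this illustration, not a reference implementation.

```python
import numpy as np

def pca_reduce(X, eps=0.70):
    """Sketch of steps 1-6; eps is the proportion of total variance to retain."""
    Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # step 1: standardize
    Sigma = Y.T @ Y / (Y.shape[0] - 1)                    # step 2: covariance matrix
    U, S, _ = np.linalg.svd(Sigma)                        # step 3: S is sorted descending
    retained = np.cumsum(S) / np.sum(S)
    # smallest k with retained variance >= eps (clamped against rounding effects)
    k = min(int(np.searchsorted(retained, eps)) + 1, len(S))
    U_k = U[:, :k]                                        # step 4: first k columns of U
    Z = Y @ U_k                                           # step 5: rows are z_i = U_k^T y_i
    Y_rec = Z @ U_k.T                                     # reconstruction y_i' = U_k z_i
    error = np.sum((Y - Y_rec) ** 2) / np.sum(Y ** 2)     # step 6: relative error
    return Z, k, error, retained[k - 1]

# synthetic data: three features driven by one latent variable plus small noise
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 2.0, -1.0]]) + 0.1 * rng.normal(size=(200, 3))

Z, k, error, kept = pca_reduce(X)
print(k, Z.shape)
print(np.isclose(error, 1.0 - kept))   # relative error = 1 - retained variance
```

Since the three features are almost perfectly correlated, a single principal component is expected to be enough here.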