$$
\newcommand{\F}{\mathbb{F}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\v}{\mathbf{v}}
\newcommand{\w}{\mathbf{w}}
\newcommand{\a}{\mathbf{a}}
\newcommand{\b}{\mathbf{b}}
\newcommand{\c}{\mathbf{c}}
\newcommand{\u}{\mathbf{u}}
\newcommand{\0}{\mathbf{0}}
\newcommand{\1}{\mathbf{1}}
\newcommand{\A}{\mathbf{A}}
\newcommand{\B}{\mathbf{B}}
\newcommand{\Q}{\mathbf{Q}}
\newcommand{\q}{\mathbf{q}}
\newcommand{\e}{\mathbf{e}}
\newcommand{\I}{\mathbf{I}}
$$

## Matrix Operations

### Matrix Addition and Subtraction

The sum or difference of two matrices $\A$ and $\B$ **of the same size** $m \times n$ is calculated elementwise:

$$
(\A \pm \B)_{i, j} = \A_{i, j} \pm \B_{i, j}
$$

where $1 \leq i \leq m , \quad 1 \leq j \leq n$.
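
As a small illustration, the elementwise definition can be tried out with NumPy (an assumed choice of library; the matrices below are arbitrary):

```python
import numpy as np

# Two matrices of the same size (2 x 3); the values are arbitrary.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])

S = A + B  # elementwise sum
D = A - B  # elementwise difference

print(S)
print(D)
```

Note that NumPy broadcasting can silently accept some mismatched shapes, so checking `A.shape == B.shape` first keeps the code faithful to the definition.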

### Scalar-Matrix Multiplication

For any scalar $\lambda \in \F$ and any matrix $\A$, the **scalar-matrix multiplication** $\lambda \A$ is given by:

$$
(\lambda \A)_{i, j} = \lambda \cdot \A_{i, j}
$$

#### Commutativity of Scalar-Matrix Multiplication

Not surprisingly, a scalar commutes with matrices: given any scalar $\lambda$ and any matrices $\A, \B$ whose product $\A\B$ is defined, we have:

$$
\lambda \A\B = \A\lambda\B = \A\B\lambda
$$
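
A quick numerical check of both the definition and the commutativity claim, sketched with NumPy (the scalar and matrices are arbitrary illustrative values):

```python
import numpy as np

lam = 2.5
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [5.0, -2.0]])

# Scalar-matrix multiplication scales every entry.
scaled = lam * A

# The scalar can sit anywhere in the product.
left = (lam * A) @ B
middle = A @ (lam * B)
right = (A @ B) * lam

print(np.allclose(left, middle) and np.allclose(middle, right))
```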

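### Matrix Operations Fulfill Vector Space Properties

Matrix addition and scalar multiplication fulfill the vector space properties. (Matrices do not form a field: matrix multiplication is not commutative in general, and not every nonzero matrix has an inverse.) That is, for any matrices $\A$, $\B$ and $\mathbf{C}$ of the same size and any scalars $c, d \in \F$, we have[^courtesy_macro_analyst]: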

1. $\A+ \B= \B+ \A$
2. $(\A+\B)+ \mathbf{C}=\A+(\B+\mathbf{C})$
3. $c(\A+\B)=c\A+c\B$
4. $(c+d)\A=c\A+d\A$
5. $c(d\A)=(cd)\A$
6. $\A+\mathbf{0}=\A$, where $\mathbf{0}$ is the zero matrix
7. For any $\A$, there exists a $-\A$, such that $\A+(- \A)=\mathbf{0}$.


Although we have not yet covered **matrix multiplication**, its properties (whenever the products involved are defined) are:

1. $\A({\B\mathbf{C}})=({\A\B}) \mathbf{C}$
2. $c({\A\B})=(c\A)\B=\A(c\B)$ for any scalar $c$
3. $\A(\B+ \mathbf{C})={\A\B}+{\A\mathbf{C}}$
4. $(\B+\mathbf{C})\A={\B\A}+{\mathbf{C}\A}$
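
The properties listed above can be spot-checked numerically. The sketch below uses NumPy with random integer matrices (the seed, sizes and scalars are arbitrary assumptions); passing assertions illustrate the identities but do not prove them.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
# Small integer matrices keep every comparison exact.
A, B, C = (rng.integers(-5, 6, size=(3, 3)) for _ in range(3))
Z = np.zeros((3, 3), dtype=A.dtype)  # zero matrix
c, d = 2, -3                         # arbitrary scalars

# Addition and scalar multiplication
assert np.array_equal(A + B, B + A)
assert np.array_equal((A + B) + C, A + (B + C))
assert np.array_equal(c * (A + B), c * A + c * B)
assert np.array_equal((c + d) * A, c * A + d * A)
assert np.array_equal(c * (d * A), (c * d) * A)
assert np.array_equal(A + Z, A)
assert np.array_equal(A + (-A), Z)

# Matrix multiplication
assert np.array_equal(A @ (B @ C), (A @ B) @ C)
assert np.array_equal(c * (A @ B), (c * A) @ B)
assert np.array_equal(c * (A @ B), A @ (c * B))
assert np.array_equal(A @ (B + C), A @ B + A @ C)
assert np.array_equal((B + C) @ A, B @ A + C @ A)
```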

[^courtesy_macro_analyst]: https://github.com/MacroAnalyst/Linear_Algebra_With_Python

### Matrix Transpose

Given a matrix $\A \in \R^{m \times n}$, the **transpose of $\A$** is denoted $\A^\top \in \R^{n \times m}$ and is formed by turning the rows of $\A$ into columns and the columns of $\A$ into rows:

$$
(\A^\top)_{i, j} = \A_{j, i}
$$
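
In NumPy (an assumed choice of library), the transpose is available as the `.T` attribute. The sketch below checks the index rule entry by entry on an arbitrary $2 \times 3$ matrix:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)
At = A.T                   # shape (3, 2)

# Entry (i, j) of the transpose equals entry (j, i) of A.
for i in range(At.shape[0]):
    for j in range(At.shape[1]):
        assert At[i, j] == A[j, i]

print(At)
```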

#### Theorem (The Transpose of a Transpose Is the Original Matrix)

The transpose of the transpose of a matrix $\mathbf{A}$ is $\mathbf{A}$ itself: $(\mathbf{A}^\top)^\top = \mathbf{A}$.

##### Proof

Consider a matrix $\mathbf{A} \in \mathbb{R}^{n \times k}$ as follows:
$$\mathbf{A}=\begin{bmatrix}
 a_{11} & a_{12} & \cdots & a_{1k} \\
 a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{n1} & a_{n2} & \cdots & a_{nk} \\
\end{bmatrix}
$$

By the definition of the **transpose**, the $(i,j)$ entry of $\mathbf{A}$ becomes the $(j,i)$ entry of $\mathbf{A}^\top$; for example, $a_{1,2}$ moves to position $(2,1)$. Transposing once more moves every entry at position $(j,i)$ back to position $(i,j)$, that is, $((\mathbf{A}^\top)^\top)_{i,j} = (\mathbf{A}^\top)_{j,i} = a_{i,j}$. Every entry is back where it started, and thus $(\mathbf{A}^\top)^\top = \mathbf{A}$.
**Q.E.D**

#### Theorem (Sum of Transpose is Transpose of Sum)

Given two matrices $\mathbf{A}$ and $\mathbf{B}$ of the same size, show that the sum of transposes is equal to the transpose of the sum: $\mathbf{A}^\top + \mathbf{B}^\top = (\mathbf{A} + \mathbf{B})^\top$.

##### Proof

Say that we have two matrices of the same size, $\mathbf{A} \in \mathbb{R}^{n \times k}$ and $\mathbf{B} \in \mathbb{R}^{n \times k}$:

$$\mathbf{A}=\begin{bmatrix}
 a_{11} & a_{12} & \cdots & a_{1k} \\
 a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{n1} & a_{n2} & \cdots & a_{nk} \\
\end{bmatrix},\quad
\mathbf{B}=\begin{bmatrix}
 b_{11} & b_{12} & \cdots & b_{1k} \\
 b_{21} & b_{22} & \cdots & b_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
 b_{n1} & b_{n2} & \cdots & b_{nk} \\
\end{bmatrix},\quad
\mathbf{A}+\mathbf{B}=\begin{bmatrix}
 a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1k}+b_{1k} \\
 a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2k}+b_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{n1}+b_{n1} & a_{n2}+b_{n2} & \cdots & a_{nk}+b_{nk} \\
\end{bmatrix}.$$

We prove the identity by comparing the LHS and RHS entry by entry. Pick any entries $a_{i,j}$ of $\mathbf{A}$ and $b_{i,j}$ of $\mathbf{B}$; the corresponding entry of $\mathbf{A}+\mathbf{B}$ is $(a+b)_{i,j} = a_{i,j}+b_{i,j}$.

Transposing moves $a_{i,j}$ and $b_{i,j}$ to position $(j,i)$ of $\mathbf{A}^\top$ and $\mathbf{B}^\top$ respectively, so the $(j,i)$ entry of $\mathbf{A}^\top + \mathbf{B}^\top$ is $a_{i,j}+b_{i,j}$. Transposing $\mathbf{A}+\mathbf{B}$ moves $(a+b)_{i,j} = a_{i,j}+b_{i,j}$ to the same position $(j,i)$. Since the two sides agree at every position, $\mathbf{A}^\top + \mathbf{B}^\top = (\mathbf{A} + \mathbf{B})^\top$. **Q.E.D**
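
Both transpose theorems can be sanity-checked numerically. This is a minimal sketch with NumPy on random integer matrices of equal size (the seed and shapes are arbitrary assumptions); it illustrates the results rather than replacing the proofs.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, arbitrary choice
A = rng.integers(-9, 10, size=(4, 3))
B = rng.integers(-9, 10, size=(4, 3))  # same size as A

# (A^T)^T == A
double_transpose_ok = np.array_equal(A.T.T, A)

# A^T + B^T == (A + B)^T
sum_transpose_ok = np.array_equal(A.T + B.T, (A + B).T)

print(double_transpose_ok, sum_transpose_ok)
```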

### Shifting a Matrix

When we say we shift a matrix, we really mean the following:

Given a **square matrix** $\A \in \R^{n \times n}$, shifting $\A$ by a constant $\lambda$ means adding $\lambda$ times the identity matrix to it:

$$
\widetilde{\A} = \A + \lambda \I_n, \quad \A \in \R^{n \times n}, \; \lambda \in \R
$$

#### Example and Motivation

Mike X Cohen gives a motivating example in **Linear Algebra: Theory, Intuition, Code, 2021 (p. 127)**. Consider the matrix:

$$
\widetilde{\A} = \A + 0.1 \cdot \I_3 = \begin{bmatrix}1 & 3 & 0  \\ 1 & 3 & 0 \\ 2 & 2 & 7 \end{bmatrix} + 0.1 \cdot \I_3 = \begin{bmatrix}1.1 & 3 & 0  \\ 1 & 3.1 & 0 \\ 2 & 2 & 7.1 \end{bmatrix}
$$

We observe:

- Only the diagonal elements are affected by shifting, because the off-diagonal elements of the scaled identity matrix are all zero.
- Rows 1 and 2 of $\A$ are identical, and thus linearly dependent, so $\A$ is singular. After shifting by just a little, the rows of $\widetilde{\A}$ are distinct and linearly independent.
- In practice, we choose $\lambda$ to be small so that the shifted matrix stays close to the original matrix $\A$, while still satisfying some constraints.
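
The example can be reproduced with NumPy. In this sketch, `shift_matrix` is a hypothetical helper name introduced for illustration, and the matrix rank is used as one way to see that the shift removes the linear dependence between the rows.

```python
import numpy as np

def shift_matrix(A, lam):
    """Return A + lam * I for a square matrix A (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    if n != m:
        raise ValueError("shifting is only defined for square matrices")
    return A + lam * np.eye(n)

A = np.array([[1, 3, 0],
              [1, 3, 0],
              [2, 2, 7]])
A_shifted = shift_matrix(A, 0.1)

print(A_shifted)
# Rank before and after the shift.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A_shifted))
```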

#### Applications in Machine Learning

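Ridge regression, a well-known regularization technique, is matrix shifting in disguise: the matrix $\mathbf{X}^\top \mathbf{X}$ in the least-squares normal equations is replaced by the shifted matrix $\mathbf{X}^\top \mathbf{X} + \lambda \I$.

One can read more about it here[^Tikhonov_regularization].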
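
Assuming the technique meant here is ridge regression (a special case of Tikhonov regularization), the following sketch shows the shift at work: the normal equations $\mathbf{X}^\top \mathbf{X} \w = \mathbf{X}^\top \mathbf{y}$ become $(\mathbf{X}^\top \mathbf{X} + \lambda \I) \w = \mathbf{X}^\top \mathbf{y}$. The data and the value of $\lambda$ are made up for illustration.

```python
import numpy as np

# Made-up design matrix whose second column duplicates the first,
# so X.T @ X is singular.
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

gram = X.T @ X
lam = 0.1  # assumed regularization strength

# Shift the Gram matrix, then solve (X^T X + lam * I) w = X^T y.
gram_shifted = gram + lam * np.eye(gram.shape[0])
w = np.linalg.solve(gram_shifted, X.T @ y)

print(np.linalg.matrix_rank(gram), np.linalg.matrix_rank(gram_shifted))
print(w)
```

Without the shift, the system has no unique solution because the Gram matrix is rank-deficient; with it, the system is solvable for any $\lambda > 0$.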

[^Tikhonov_regularization]: https://en.wikipedia.org/wiki/Tikhonov_regularization

### Diagonal

We can extract the diagonal of a matrix into a vector:

$$
\v = \text{diag}(\A), \quad \A \in \R^{m \times n}, \quad v_i = \A_{i, i}, \quad i \in \{1, 2, \ldots, \min(m, n)\}
$$
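
In NumPy (an assumed choice of library), `np.diag` applied to a 2-D array returns its main diagonal, including for non-square matrices:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # m = 2, n = 3

v = np.diag(A)  # main diagonal as a 1-D array
print(v, len(v) == min(A.shape))
```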

#### Applications in Machine Learning

The diagonal elements of a matrix can be extracted and placed into a vector. This is used, for example, in statistics: the diagonal elements of a covariance matrix contain the variance of each variable. **- Mike X Cohen: Linear Algebra: Theory, Intuition, Code, 2021. (pp. 129)**
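
The covariance example can be sketched as follows. The data are randomly generated for illustration, and `np.cov` with `rowvar=False` treats each column as a variable; its default normalization matches the sample variance with `ddof=1`.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, arbitrary choice
# Made-up data: 200 observations (rows) of 3 variables (columns).
X = rng.normal(size=(200, 3)) * np.array([1.0, 2.0, 0.5])

cov = np.cov(X, rowvar=False)          # 3 x 3 covariance matrix
variances = np.var(X, axis=0, ddof=1)  # sample variance of each column

# The diagonal of the covariance matrix holds the variances.
print(np.allclose(np.diag(cov), variances))
```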

### Trace

The trace of a square matrix $\A \in \R^{n \times n}$ is the sum of its diagonal elements:

$$
\text{tr}(\A) = \sum_{i=1}^{n}\A_{i, i}
$$

#### Applications in Machine Learning

The trace operation has two applications in machine learning: It is used to compute the Frobenius norm of a matrix (a measure of the magnitude of a matrix) and it is used to measure the "distance" between two matrices. **- Mike X Cohen: Linear Algebra: Theory, Intuition, Code, 2021. (pp. 129)**
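
Both uses can be sketched with NumPy, assuming the "distance" meant here is the Frobenius norm of the difference. The sketch relies on the identity $\lVert \A \rVert_F = \sqrt{\text{tr}(\A^\top \A)}$; the matrices are arbitrary.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])

trace_A = np.trace(A)  # sum of the diagonal entries: 1 + 4

# Frobenius norm via the trace: ||A||_F = sqrt(tr(A^T A))
fro_A = np.sqrt(np.trace(A.T @ A))

# Frobenius distance between A and B, computed the same way
diff = A - B
dist_AB = np.sqrt(np.trace(diff.T @ diff))

print(trace_A, fro_A, dist_AB)
```

The results agree with `np.linalg.norm(A, "fro")` and `np.linalg.norm(A - B, "fro")`, which is usually the more direct call in practice.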