## Engineering Mathematics I

### I. Introduction to Vectors

#### I.1 Scalars and Fields

Numbers: <br>
* Real Numbers ($\mathbb{R}$)
* Natural Numbers ($\mathbb{N}$)
* Integers ($\mathbb{Z}$)
* Complex Numbers ($\mathbb{C}$)

##### Fields: 
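A field consists of a set of objects ($\mathbb{F}$) together with addition ($+$) and multiplication ($\bullet$). <br>
It satisfies five properties: <br>
1. It is closed under addition and under multiplication.
2. Addition and multiplication are commutative and associative.
3. It has an additive identity ($0$), and every element has an additive inverse.
4. It has a multiplicative identity ($1$), and every nonzero element has a multiplicative inverse.
5. Multiplication distributes over addition.

$\mathbb{R}$ and $\mathbb{C}$ are fields; $\mathbb{N}$ and $\mathbb{Z}$ are not (e.g., $2 \in \mathbb{Z}$ has no multiplicative inverse in $\mathbb{Z}$).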

$\rightarrow$ an element of such a field $\mathbb{F}$ is called a scalar

#### I.2 Vectors

An n-vector: an ordered list of n scalars over a field (mostly $\mathbb{R}$ or $\mathbb{C}$) <br>
It is written as a column: $x = \begin{bmatrix} x_1\\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ &emsp;
(n is the <em> dimension </em> or <em> size </em> of x) <br>
It is different from the <em> transpose </em> of x, the row vector $x^T = [x_1, x_2, \dots, x_n]$


$\mathbb{R}^n$: set of all n-vectors over $\mathbb{R}$ <br>
##### Some interesting vectors: <br>
* $0_n$: the zero vector. Also written as 0. An n-vector whose every entry is 0.
* $1_n$: the sum vector (ones vector). Also written as 1. An n-vector whose every entry is 1.
* $e_i$: the $i^{th}$ unit vector. An n-vector whose every entry is 0, except the $i^{th}$, which is 1.


##### Applications of Vectors: <br>
- Location or a displacement.
- Bag-of-words representation of a document, or word2vec representation of a word
- Time-series: the evolution in discrete time of some quantity
- Image: pixel values of a greyscale or color image
- Features: attributes of an entity

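Gradient ($\nabla$) of a differentiable function $f : \mathbb{R}^n \rightarrow \mathbb{R}$: <br>
Differentiate the function with respect to each variable to get the <strong> gradient vector </strong>: $\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(x) & \cdots & \frac{\partial f}{\partial x_n}(x) \end{bmatrix}^T$.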
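
As a sketch only, each partial derivative can be approximated numerically by a central difference, $\frac{\partial f}{\partial x_i} \approx \frac{f(x + he_i) - f(x - he_i)}{2h}$. The helper name `numerical_gradient`, the step size `h`, and the example function are our own illustrative choices.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f: R^n -> R at x with central differences."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h   # x + h e_i
        x_minus[i] -= h  # x - h e_i
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Hypothetical example: f(x) = x1^2 + 3 x1 x2, exact gradient (2 x1 + 3 x2, 3 x1)
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
print(numerical_gradient(f, [1.0, 2.0]))  # approximately [8.0, 3.0]
```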

#### I.3 Linear Combinations

##### Addition and Subtraction: <br>
Add/subtract the corresponding entries. <br>
$x + y = \begin{bmatrix} x_1 + y_1\\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$ <br> <br>
Geometric Interpretation: adding displacements.

##### Scalar Multiplication: <br>
Multiply every entry by the scalar. <br>
$\alpha x = \begin{bmatrix} \alpha x_1\\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{bmatrix}$ <br> <br>
Geometric Interpretation: scaling the vector.

##### Linear Combination: <br>
For m vectors $x^1, x^2, \dots, x^m \in \mathbb{R}^n$, a vector $x$ is a <strong> linear combination </strong> of those vectors with coefficients $\alpha_i \in \mathbb{R}$ if <br>
$x = \alpha_1 x^1 + \alpha_2 x^2 + \dots + \alpha_m x^m$ <br>

Any vector $b = (b_1, b_2, \dots, b_n)$ can be represented as a linear combination of the unit vectors: <br> $b = b_1 e_1 + b_2 e_2 + \dots + b_n e_n$

##### Span: <br>
The span of m vectors, span$(\{x^1, x^2, \dots, x^m\})$, is the set of all linear combinations of those vectors. <br> <br>

##### Conic Combination: <br>
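If all the coefficients are non-negative ($\alpha_i \geq 0$), the linear combination is called a <strong> conic combination. </strong> The set of all conic combinations (the conic hull) is the cone generated by the vectors; for two non-parallel vectors in $\mathbb{R}^2$ it is the wedge between them.<br> <br>

##### Affine Combination: <br>
If the coefficients sum to 1 ($\sum_i \alpha_i = 1$), the linear combination is called an <strong> affine combination. </strong> The set of all affine combinations (the affine hull) is the smallest translated subspace containing the vectors; for two distinct vectors it is the line through them. <br> <br>

##### Convex Combination: <br>
A <strong> convex combination </strong> is both conic and affine ($\alpha_i \geq 0$, $\sum_i \alpha_i = 1$). The set of all convex combinations (the convex hull) is the part of the affine hull between the vectors; for two vectors it is the line segment joining them.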
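
The coefficient conditions above can be checked mechanically. This is a minimal sketch; `linear_combination` and `combination_type` are hypothetical helper names, and the tolerance `tol` is an assumption to absorb floating-point round-off. A convex combination is reported as "convex" even though it is also conic and affine.

```python
def linear_combination(coeffs, vectors):
    """Return alpha_1 x^1 + ... + alpha_m x^m for equal-length vectors."""
    n = len(vectors[0])
    return [sum(a * v[k] for a, v in zip(coeffs, vectors)) for k in range(n)]


def combination_type(coeffs, tol=1e-12):
    """Classify coefficients: 'convex', 'conic', 'affine', or plain 'linear'."""
    conic = all(a >= -tol for a in coeffs)      # all alpha_i >= 0
    affine = abs(sum(coeffs) - 1) <= tol        # sum of alpha_i = 1
    if conic and affine:
        return "convex"
    if conic:
        return "conic"
    if affine:
        return "affine"
    return "linear"


x1, x2 = [1.0, 0.0], [0.0, 1.0]
print(linear_combination([0.25, 0.75], [x1, x2]))  # [0.25, 0.75], on the segment
print(combination_type([0.25, 0.75]))              # convex
print(combination_type([2.0, -1.0]))               # affine: on the line through x1, x2
```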

##### Linear Dependence: <br>
A set of vectors $V = \{x^1, \dots, x^m\}$, $x^i \in \mathbb{R}^n$, is <strong> linearly dependent </strong> if <br>
$\alpha_1 x^1 + \alpha_2 x^2 + \dots + \alpha_m x^m = 0$ for some coefficients $\alpha_i$ that are not all 0. <br> 
If V is linearly dependent, at least one of the vectors can be written as a linear combination of the others <br>
(if $\alpha_k \neq 0$, solve the equation for $x^k$). That is, at least one vector is redundant.
<br><br>
##### Linear Independence: <br>
V is <strong> linearly independent </strong> if it is not linearly dependent, i.e., $\alpha_1 x^1 + \dots + \alpha_m x^m = 0$ only when every $\alpha_i = 0$.
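
One practical test, sketched here under our own assumptions (exact `fractions.Fraction` arithmetic, hypothetical helper names): stack the vectors as rows, count pivots by elimination, and compare the rank with the number of vectors. The vectors are independent exactly when the rank equals their count.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (exact elimination)."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0]) if rows else 0):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    # m vectors are independent exactly when they have rank m
    return rank(vectors) == len(vectors)


print(linearly_independent([[1, 0, 1], [0, 1, 1]]))  # True
print(linearly_independent([[1, 2, 3], [2, 4, 6]]))  # False: second = 2 * first
```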

#### I.4 Lengths and Dot Products

##### Inner Product: <br>
$x \bullet y = x^T y = y^T x = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$ (results in a scalar) <br>

Attributes:
- $x^T x \geq 0$ &emsp; $(x^T x = 0 \iff x = 0)$ <br>
- $e_i^T x = x_i$ (ith element)
- $1^T x = x_1 + x_2 + \dots + x_n$ (sum of elements)
- $x^T x= x_1^2 + x_2^2 + \dots + x_n^2$ (sum of squares) <br>
- if $w$ is a weight vector and $f$ is a feature vector, then $w^T f$ is the total weighted score
- $x - (\frac{1^T x} {n}) 1$: de-meaned vector (shifted so that its mean is 0)
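
A minimal plain-Python sketch of the attributes listed above; the function names `inner` and `demean` are our own.

```python
def inner(x, y):
    """x^T y = x1*y1 + ... + xn*yn."""
    return sum(a * b for a, b in zip(x, y))


def demean(x):
    """x - (1^T x / n) 1: subtract the average from every entry."""
    avg = sum(x) / len(x)
    return [xi - avg for xi in x]


x = [1.0, 2.0, 6.0]
print(inner([0.0, 1.0, 0.0], x))  # 2.0  (e_2^T x picks out x_2)
print(inner([1.0] * 3, x))        # 9.0  (1^T x is the sum of entries)
print(inner(x, x))                # 41.0 (sum of squares)
print(demean(x))                  # [-2.0, -1.0, 3.0], entries sum to 0
```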

##### Linear Functions and Affine Functions: <br>
Linear Function: $f: \mathbb{R}^n \rightarrow \mathbb{R}$
is linear if for any scalars $\alpha, \beta$ and n-vectors $x, y$: <br>
$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$ <br>
$\rightarrow$ every linear function can be written as an inner product of x with a fixed n-vector a: <br>
$f(x) = a^T x = a_1 x_1 + a_2 x_2 + \dots + a_n x_n$, &emsp; where $a_i = f(e_i)$ <br>
So, $f(0) = 0$ for every linear function. <br>
<br>
Affine Function: $f(x) = a^T x + b$ (b is a constant scalar) <br>
The gradient of a linear or affine function is the constant vector $a$.
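
Because $f(x) = f(\sum_i x_i e_i) = \sum_i x_i f(e_i)$ by linearity, the vector $a$ can be recovered by evaluating $f$ at the unit vectors. A sketch with hypothetical example functions:

```python
def coefficients_of_linear_function(f, n):
    """Recover a with f(x) = a^T x by evaluating f at the unit vectors: a_i = f(e_i)."""
    a = []
    for i in range(n):
        e_i = [0] * n
        e_i[i] = 1
        a.append(f(e_i))
    return a


f = lambda x: 2 * x[0] - x[1] + 5 * x[2]      # hypothetical linear function
g = lambda x: 2 * x[0] - x[1] + 5 * x[2] + 7  # hypothetical affine function, b = 7
print(coefficients_of_linear_function(f, 3))  # [2, -1, 5]
# For an affine g, subtract b = g(0) first: a_i = g(e_i) - g(0)
b = g([0, 0, 0])
print([v - b for v in coefficients_of_linear_function(g, 3)])  # [2, -1, 5]
```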

##### Norm and Distance: <br>
The Euclidean Norm: <br>
$\Vert x \rVert = \sqrt{x^T x} = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$ $\rightarrow$ induced by the inner product. <br>
The Euclidean ($l_2$) norm measures the magnitude (length) of a vector. <br>
$\Vert x - y \rVert$ is the distance between $x$ and $y$.

##### Properties of Norms: <br>
- Homogeneity: $\Vert\alpha x\rVert = |\alpha|\Vert x\rVert$
- Triangle Inequality: $\Vert x + y \rVert \leq \Vert x \rVert + \Vert y\rVert$
- Nonnegativity: $\Vert x\rVert \geq 0$
- Definiteness: $\Vert x\rVert = 0 \iff x = 0$

##### Other Kinds of Norms: <br>
- $l_p$ norm ($p \geq 1$): $(\sum_{i=1}^n |x_i|^p)^{1/p}$
- $l_0$ "norm": the number of nonzero elements. (technically not a norm)
- $l_1$ norm: $\sum_{i=1}^n{|x_i|}$
- $l_\infty$ norm: $\max(|x_1|, |x_2|, \dots, |x_n|)$
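
A short sketch computing the norms above in plain Python (function names are our own). As $p$ grows, the $l_p$ norm approaches the $l_\infty$ norm.

```python
def lp_norm(x, p):
    """(sum |x_i|^p)^(1/p) for p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def l1_norm(x):
    return sum(abs(xi) for xi in x)


def linf_norm(x):
    return max(abs(xi) for xi in x)


def l0_count(x):
    # number of nonzero entries; not a true norm (l0_count(2x) == l0_count(x))
    return sum(1 for xi in x if xi != 0)


x = [3.0, -4.0, 0.0]
print(lp_norm(x, 2))   # 5.0
print(l1_norm(x))      # 7.0
print(linf_norm(x))    # 4.0
print(l0_count(x))     # 2
print(lp_norm(x, 50))  # close to 4.0, the l_infinity norm
```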

##### Application of Norms: <br>
- $rms(x) = \frac {\Vert x \rVert}{\sqrt n}$
- $std(x) = rms(\hat x)$ &emsp; where $\hat x$ is the de-meaned vector.
- $rms(x)^2 = avg(x)^2 + std(x)^2$
- Chebyshev Inequality: for $a > 0$, the fraction of elements with <br> 
$|x_i| \geq a$ is at most $(\frac {rms(x)}  {a})^2$
- Chebyshev Inequality for De-meaned Vectors: <br>
for $\alpha > 0$, the fraction of elements of x with 
$|x_i - avg(x)| \geq \alpha \, std(x)$ is at most $\frac {1}{\alpha^2}$
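
The identities and bounds above can be spot-checked on a small data vector. This is a sketch with an arbitrary example vector and our own helper names; it illustrates the inequalities but does not prove them.

```python
import math


def avg(x):
    return sum(x) / len(x)


def rms(x):
    return math.sqrt(sum(xi * xi for xi in x) / len(x))


def std(x):
    # rms of the de-meaned vector
    m = avg(x)
    return rms([xi - m for xi in x])


def chebyshev_check(x, a):
    """Return (fraction of entries with |x_i| >= a, Chebyshev bound (rms(x)/a)^2)."""
    frac = sum(1 for xi in x if abs(xi) >= a) / len(x)
    return frac, (rms(x) / a) ** 2


x = [1.0, 2.0, 2.0, 3.0, 10.0]
print(math.isclose(rms(x) ** 2, avg(x) ** 2 + std(x) ** 2))  # True
print(chebyshev_check(x, 5.0))  # the fraction 0.2 is below the bound
```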

##### Angles: <br>
- Cauchy-Schwarz Inequality: $|x^T y| \leq \Vert x\rVert \Vert y \rVert$
- Angle: the <strong> angle </strong> between two vectors is defined thus: <br>
    $\theta = \arccos (\frac {x^T y}{\Vert x \rVert \Vert y \rVert})$ &emsp; (for nonzero $x, y$)
- Cosine Similarity: $\cos\theta = \frac {x^T y}{\Vert x \rVert \Vert y \rVert}$ measures the similarity between two vectors <br>
(1: same direction, 0: orthogonal, -1: opposite directions).
- Correlation Coefficient: the Cosine Similarity of De-meaned vectors.
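
A minimal sketch of angle, cosine similarity, and correlation (helper names are our own). The clamp inside `angle` is a defensive assumption: round-off can push the cosine slightly outside $[-1, 1]$, where `math.acos` would fail.

```python
import math


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def cosine_similarity(x, y):
    """x^T y / (||x|| ||y||); x and y must be nonzero."""
    return sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))


def angle(x, y):
    # clamp into [-1, 1] before arccos to guard against round-off
    return math.acos(max(-1.0, min(1.0, cosine_similarity(x, y))))


def correlation(x, y):
    """Correlation coefficient: cosine similarity of the de-meaned vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])


print(angle([1.0, 0.0], [1.0, 1.0]))                  # about 0.785 (pi / 4)
print(correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.5]))  # close to 1
```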

#### I.5 Matrices

Matrix: an $m \times n$ rectangular array of numbers <br>
(m is the row count, n is the column count) <br>

$A = \begin{bmatrix} a_{11} & a_{12} & ... & a_{1n} \\ a_{21} & a_{22} & ... & a_{2n} 
\\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & ... & a_{mn} \end{bmatrix}$

##### Applications of Matrices: <br>
* images ($a_{ij}$ is the pixel value of a greyscale image)
* contingency table
* feature data: ($a_{ij}$ is the value of feature $i$ for data $j$)
* node-arc matrix: matrix for directed graphs ($a_{ij}$ is the value for node $i$ and arc $j$)

##### Transpose of a Matrix: <br>
The transpose of a matrix $A \in \mathbb{R}^{m \times n}$ is denoted $A^T$ <br>
$A^T$ is an $n \times m$ matrix. $(A^T)_{ij} = a_{ji}$ <br>
The rows of $A$ become the columns of $A^T$, and vice versa. <br>
$(AB)^T = B^T A^T$

##### Special Matrices: <br>
* Zero Matrix: all entries are zero. 
* Identity Matrix: a square matrix whose diagonal entries are 1 and all other entries are 0. <br>
It is also a diagonal matrix.
* Triangular Matrix: the lower triangular matrix has zeros above the diagonal; <br>
the upper triangular matrix has zeros below the diagonal.
* Symmetric Matrix: a square matrix such that $A = A^T$, that is, $a_{ij} = a_{ji}$
* Diagonal Matrix: a square matrix such that all entries are zero except on the diagonal. <br>
It is Symmetric, Upper Triangular, and also Lower Triangular.

##### Block Representations:
Matrices whose entries are themselves matrices (blocks); the block dimensions must be compatible. <br>
$\rightarrow$ Row and Column Representation of Matrices: <br>
$ A = \begin{bmatrix} a_{1} & a_{2} & ... & a_{n} \end{bmatrix}$
&emsp; where $a_{j}$ are the column vectors of the matrix. <br>
$ A = \begin{bmatrix} \hat a_{1}^T\\ \hat a_{2}^T \\ \vdots \\ \hat a_{m}^T \end{bmatrix}$
&emsp; where $\hat a_{i}^T$ are the row vectors of the matrix.

##### Matrix Norm: <br>
The Frobenius Norm is defined thus: <br>
$\Vert A \rVert _F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2}$ <br>
The Distance between two matrices is defined as $\Vert A-B \rVert _F$

##### Matrix Operations: <br>
Addition, Subtraction, and Scalar multiplication: element-wise. <br>
Multiplication with Vectors:
$Ax = \begin{bmatrix}
a_{11} & a_{12} & ... & a_{1n} 
\\ a_{21} & a_{22} & ... & a_{2n} 
\\ \vdots & \vdots & \ddots & \vdots \\ 
a_{m1} & a_{m2} & ... & a_{mn} 
\end{bmatrix}$ 
$\begin{bmatrix} x_1\\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ <br>
= $\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} x_1 +
\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} x_2 +
\dots +
\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} x_n$ <br>
= $a_1 x_1 + a_2 x_2 + \dots + a_n x_n$ &emsp; 
($a_j$ are the column vectors of A, $x_j$ are the components of x.) 
<br><br>
Column Interpretation: <br>
A linear combination of the column vectors of A where the coefficients are $x_j$
<br> Row Interpretation: <br>
The inner products of the row vectors with $x$: $(Ax)_i = \hat a_i^T x$.
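
The two interpretations compute the same vector in a different order. A plain-Python sketch (matrices as lists of rows; function names are our own):

```python
def matvec_columns(A, x):
    """Ax as a linear combination of the columns of A with coefficients x_j."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):
        for i in range(m):
            result[i] += A[i][j] * x[j]  # add x_j times column j
    return result


def matvec_rows(A, x):
    """Ax as the inner products of each row of A with x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]
print(matvec_columns(A, [1, 0, -1]))  # [-2, -2]
print(matvec_rows(A, [1, 0, -1]))     # [-2, -2]
```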

##### Special Products: <br>
* $0x$: the 0 vector.
* $Ix$: the original x vector.
* De-meaned vector: $x - \frac{1^Tx}{n}1 = (I - \frac{1}{n}11^T)x$, i.e., multiply by A when A is 
$\begin{bmatrix} 
1- \frac {1}{n} & -\frac {1}{n} & ... & -\frac {1}{n} 
\\ -\frac {1}{n} &  1- \frac {1}{n} & ... & -\frac {1}{n} 
\\ \vdots & \vdots & \ddots & \vdots \\ 
-\frac {1}{n} & -\frac {1}{n} & ... & 1- \frac {1}{n} 
\end{bmatrix}$ 

##### Matrix Products: <br>
To multiply matrices, the inner dimensions must agree: <br>
$A \in \mathbb{R}^{m \times p}, B \in \mathbb{R}^{p \times n} \rightarrow C = AB \in \mathbb{R}^{m \times n}$ <br>
$C_{ij} = \hat a_i^T b_j$ &emsp; (row $i$ of $A$ times column $j$ of $B$) <br><br>

Array of Matrix-Vector Products:
$C = \begin{bmatrix} Ab_1 & Ab_2 & \dots & Ab_n \end{bmatrix}$ <br>
Sum of outer products: 
$C = a_1 \hat b_1^T + a_2 \hat b_2^T + \dots + a_p \hat b_p^T$ &emsp; ($a_k$: columns of $A$, $\hat b_k^T$: rows of $B$)
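
The three views of $C = AB$ can be written as three loops that must agree. A plain-Python sketch with our own function names and an arbitrary example:

```python
def matmul_inner(A, B):
    """C_ij = (row i of A)^T (column j of B)."""
    p, n = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(len(A))]


def matmul_columns(A, B):
    """Column j of C is A times column j of B."""
    n = len(B[0])
    cols = [[sum(a * B[k][j] for k, a in enumerate(row)) for row in A] for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(A))]


def matmul_outer(A, B):
    """C = sum over k of (column k of A)(row k of B)^T."""
    m, p, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for k in range(p):
        for i in range(m):
            for j in range(n):
                C[i][j] += A[i][k] * B[k][j]  # entry (i, j) of the k-th outer product
    return C


A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]    # 2 x 3
print(matmul_inner(A, B))     # [[1, 2, 8], [3, 4, 18], [5, 6, 28]]
```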

##### Gram Matrix: <br>
The <strong> Gram Matrix </strong> of a matrix $A \in \mathbb{R}^{m \times n}$ is defined thus: <br>
$G = A^T A \in \mathbb{R}^{n \times n}$, &emsp; $G_{ij} = a_i^T a_j$ <br>
If the Gram Matrix is $I$, the columns of $A$ are orthonormal. If, in addition, $A$ is square, $A$ is called an orthogonal matrix. <br>
Orthogonal Matrix: a square matrix whose columns (and therefore also rows) are orthonormal, so $A^{-1} = A^T$.

##### Powers of Matrices: <br>
$A^n = AA \cdots A$ &emsp; ($n$ factors, $A$ square) <br>
Negative powers: $A^{-k} = (A^{-1})^k$, defined only if $A$ is invertible. <br>
Fractional powers: if A is positive definite, $\sqrt A$ is defined. <br>
In the adjacency matrix $A$ of an undirected graph: <br>
$(A^l)_{ij}$: the number of walks of length $l$ (that is, $l$ moves along edges) from node $i$ to node $j$.
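
A sketch of walk counting by repeated multiplication, on a hypothetical 3-node path graph $0 - 1 - 2$ (names are our own; repeated squaring would be faster for large $l$).

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matrix_power(A, l):
    """A^l for square A and integer l >= 0, by repeated multiplication."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # A^0 = I
    for _ in range(l):
        result = matmul(result, A)
    return result


# Undirected path graph 0 - 1 - 2 (symmetric adjacency matrix)
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(matrix_power(A, 2)[0][2])  # 1: the walk 0-1-2
print(matrix_power(A, 3)[0][1])  # 2: walks 0-1-0-1 and 0-1-2-1
```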

##### Inverses: <br>
If A is a square matrix, and a square matrix B satisfies <br>
$AB = BA = I$, then B is the inverse of A (denoted $A^{-1}$), <br>
and A is called invertible or non-singular. <br>
Not all square matrices have inverses; those that do not are called singular. <br><br>

In the $2 \times 2$ case (when $ad - bc \neq 0$):
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} ^{-1} 
= \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$

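If $A$ and $B$ are invertible, then $AB$ is invertible, with $(AB)^{-1} = B^{-1}A^{-1}$. <br>
If A is invertible, $A^{-1}$ is invertible, with $(A^{-1})^{-1} = A$. <br>
If $BA = I$ and $AC = I$, then $B = C$, since $B = B(AC) = (BA)C = C$. <br>
Hence the inverse of a square matrix, when it exists, is unique.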
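
A sketch of the $2 \times 2$ formula with exact `fractions.Fraction` arithmetic (our choice, so results can be compared with `==`); the helper names are hypothetical. The usage checks $AA^{-1} = I$ and $(AB)^{-1} = B^{-1}A^{-1}$ on example matrices.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] via 1/(ad - bc) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (ad - bc = 0)")
    s = Fraction(1) / det
    return [[s * d, -s * b], [-s * c, s * a]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[2, 1], [5, 3]]  # ad - bc = 1
B = [[1, 2], [3, 4]]  # ad - bc = -2
print(matmul(A, inverse_2x2(A)) == [[1, 0], [0, 1]])                   # True
print(inverse_2x2(matmul(A, B)) == matmul(inverse_2x2(B), inverse_2x2(A)))  # True
```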

### II. Systems of Linear Equations and LU Decomposition

#### II.1 Systems of Linear Equations

m linear equations in n unknowns: <br>
$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1$ <br>
$a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2$ <br>
$\vdots$ <br>
$a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m$ <br>
represented as $Ax = b$, where $A \in \mathbb{R}^{m \times n},
x \in \mathbb{R}^n, b \in \mathbb{R}^m$ <br>
$a_{ij}$ is the gain factor from the $j^{th}$ input to the $i^{th}$ output.

##### Linear Functions <br>
$f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ is linear if for all 
$x, y, \alpha, \beta$: <br>
$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$ <br>
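$\rightarrow$ Every linear function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ can be written as $f(x) = Ax$ for a unique $A \in \mathbb{R}^{m \times n}$ (the $j^{th}$ column of $A$ is $f(e_j)$). <br>
Any differentiable function can be approximated near a point by an affine function using the Jacobian (first-order Taylor approximation). <br>
When $x \approx x_0$, $f(x) \approx f(x_0) + D_f(x_0)(x - x_0)$ &emsp; where $D_f$ is the Jacobian, $(D_f)_{ij} = \frac{\partial f_i}{\partial x_j}$.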
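
A small sketch of the approximation for a hypothetical map $f(x) = (x_1^2, x_1 x_2)$, whose Jacobian we wrote out by hand; all names are illustrative. Near $x_0$ the error shrinks roughly like $\Vert x - x_0 \rVert^2$.

```python
def f(x):
    # hypothetical example map R^2 -> R^2: f(x) = (x1^2, x1 * x2)
    return [x[0] ** 2, x[0] * x[1]]


def jacobian_f(x):
    # (D_f)_ij = d f_i / d x_j, derived by hand for this particular f
    return [[2 * x[0], 0.0],
            [x[1], x[0]]]


def linearize(x0, x):
    """First-order approximation f(x0) + D_f(x0)(x - x0)."""
    fx0, J = f(x0), jacobian_f(x0)
    dx = [xi - x0i for xi, x0i in zip(x, x0)]
    return [fx0[i] + sum(J[i][j] * dx[j] for j in range(2)) for i in range(2)]


x0, x = [1.0, 2.0], [1.01, 2.02]
print(f(x))              # approximately [1.0201, 2.0402]
print(linearize(x0, x))  # approximately [1.02, 2.04]
```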

##### Applications: <br>
* Function fitting $\rightarrow$ Vandermonde matrix
* Regression
* Forecasting
* Control 
* Markov Matrix: a matrix with nonnegative entries whose columns each sum to 1. <br> 
It represents states and the transitions between them as probabilities. <br>
ex. $\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}$ <br>
$\rightarrow$ 0.9 is the probability that state 1 will remain state 1, <br> 0.1 is the probability state 1 will change to state 2, <br> 0.8 is the probability state 2 will remain state 2, <br> and 0.2 is the probability state 2 will change to state 1. <br>
$\rightarrow$ if $x_0$ is the initial probability vector, $M^N x_0$ is the state distribution after $N$ steps. <br>
The equilibrium (steady state) can be found by solving $Mx = x$, i.e., $x$ is an eigenvector of $M$ with eigenvalue 1.
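
For the example matrix, $Mx = x$ with $x_1 + x_2 = 1$ gives $0.2x_2 = 0.1x_1$, so $x = (2/3, 1/3)$. A sketch that simply iterates $x \leftarrow Mx$ from state 1 (plain Python, names our own) converges to the same vector:

```python
def step(M, x):
    """One transition: x_next = M x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


M = [[0.9, 0.2],
     [0.1, 0.8]]  # columns sum to 1
x = [1.0, 0.0]    # start in state 1 with probability 1
for _ in range(100):
    x = step(M, x)
print(x)  # approaches [2/3, 1/3]
```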

#### II.2 Geometric Views on Linear Systems

##### Hyperplanes: <br>

$H = \{x \in \mathbb{R}^n : a^Tx = b\}$ <br>
If $a = 0$, $H$ is empty (when $b \neq 0$) or all of $\mathbb{R}^n$ (when $b = 0$); assume $a \neq 0$ below. <br>
$H_0 = \{x \in \mathbb{R}^n : a^Tx = 0\}$ <br>
$\rightarrow H_0$ is the set of vectors that are orthogonal to $a$. <br> 
Fix any particular element $x_0 \in H$. For any $x \in H$, $a^T(x - x_0) = b - b = 0$, so $x = x_0 + v$ with $v \in H_0$. <br>
$\rightarrow$ $H = \{x_0 + v : v \in H_0 \}$ <br>
<br> Halfspaces: $H_1 = \{x \in \mathbb{R}^n : a^Tx \geq b\}$ <br>
$H_2 = \{x \in \mathbb{R}^n : a^Tx \leq b\}$ <br>

##### Row Space and Column Space:<br>
Row Picture: intersection of $m$ hyperplanes in $n$-dimensions. <br>
Column Picture: linear combination of $n$ $m$-vectors to make $b$. <br>
Column-Space: $C(A) = span(a_1, a_2, ... a_n)$ <br>
The Column Space is the range of $f(x) = Ax$.

##### Onto (Surjective) and One-to-one (Injective) Functions: <br>
Surjective Functions:<br>
every element of the codomain is the image of at least one element of the domain. <br><br>
Injective Functions:<br>
every element of the range is the image of exactly one element of the domain. <br><br>
Bijective Functions: <br>
the function is both surjective and injective $\rightarrow$ the function is invertible. (For $f(x) = Ax$ with square $A$: $A$ is non-singular.)

##### Nullspace (Kernel): <br>
$N(A) = \{x \in \mathbb{R}^n : Ax = 0\}$ <br>
$f(x) = Ax$ is injective $\iff N(A) = \{0\}$: a nonzero $x \in N(A)$ gives $f(x) = f(0) = 0$, and conversely $Ax = Ay$ implies $x - y \in N(A)$.

#### II.3 Solutions to Linear Systems

##### Equivalent Linear Systems: <br>
Elementary Row Operations (each one is reversible, so the new system has the same solutions as the original):
* Multiplication of one row by a nonzero scalar $\alpha$.
* Replacement of (row $k$) by (row $k$) + $\alpha$(row $l$), 
where $k \neq l$ and $\alpha$ is a scalar
* Interchanging two rows.

##### Gaussian Elimination: <br>
Input: $Ax = b$ <br>
Goal: Use the elementary row operations to find an equivalent system $Ux = c$ where U is in (row) echelon form. <br>
$\rightarrow$ make zeros below the pivots to eliminate variables, then solve by back substitution. <br>
The first nonzero entry of each nonzero row of U is called a <strong> pivot </strong>; for an invertible square matrix the pivots lie on the diagonal. <br>
A system $Ax = b$ may have no solution, exactly one solution, or infinitely many solutions.

$Ax = b$ has a solution $\iff$ for every $y \in \mathbb{R}^m$ with $y^TA = 0$, also $y^Tb = 0$. <br>

Computational cost: roughly $\frac{1}{3} n^3$ multiplications (and as many subtractions) for an $n \times n$ matrix. <br>
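
A minimal sketch of elimination followed by back substitution for a square, invertible $A$ (the function name `solve` and the example system are our own). It uses exact `Fraction` arithmetic, so a row exchange is made only when a pivot is exactly zero; a floating-point version would usually pick the largest available pivot instead (partial pivoting).

```python
from fractions import Fraction


def solve(A, b):
    """Solve Ax = b for square invertible A: Gaussian elimination + back substitution."""
    n = len(A)
    # augmented matrix [A | b] with exact rational arithmetic
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        # find a row with a nonzero pivot in column k (row exchange if needed)
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            l = M[i][k] / M[k][k]  # multiplier
            M[i] = [a - l * c for a, c in zip(M[i], M[k])]
    # back substitution on the upper triangular system Ux = c
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


x = solve([[2, 1, 1], [4, -6, 0], [-2, 7, 2]], [5, -2, 9])
print([str(v) for v in x])  # ['1', '1', '2']
```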

RREF: (Row-Reduced-Echelon Form) <br>
Every pivot is 1 and is the only nonzero entry in its column. Calculated from the Gauss-eliminated U <br>
by dividing each row by its pivot and eliminating the entries above each pivot (Gauss-Jordan elimination). <br>
$\rightarrow$ For square $A$: $R = I \iff$ A is invertible $\iff N(A) = \{0\}$ &emsp; (N(A) is the nullspace of A)
<br> $\iff$ the columns of $A$ are independent $\iff$ the rank of $A$ is $n$ 
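
A sketch of Gauss-Jordan reduction to RREF with exact arithmetic; `rref` is a hypothetical helper that also returns the pivot columns, whose count is the rank. The test applies it to the $4 \times 3$ matrix used in Section III.

```python
from fractions import Fraction


def rref(A):
    """Return (R, pivot_columns) for the reduced row echelon form of A."""
    R = [[Fraction(v) for v in row] for row in A]
    m, n = len(R), len(R[0])
    pivots, r = [], 0  # r: row where the next pivot goes
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free column
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [v / piv for v in R[r]]  # scale so the pivot is 1
        for i in range(m):
            if i != r and R[i][c] != 0:  # clear the rest of column c
                l = R[i][c]
                R[i] = [a - l * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


R, piv = rref([[1, 2], [3, 4]])
print(R == [[1, 0], [0, 1]], piv)  # True [0, 1]: R = I, so A is invertible
```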

##### LDU Decomposition: <br>

$L$: the inverse of the product of the elimination matrices used for GE; lower triangular with 1s on the diagonal and the multipliers $l_{ij}$ below it. <br>
$D$: Diagonal matrix consisting of pivots. <br>
$U$: the echelon form produced by GE with each row divided by its pivot, so its diagonal entries are all 1. <br> <br>

If row exchanges are needed, include a Permutation Matrix P. <br>
$P$: the Identity Matrix with its rows reordered. <br> <br>

$\rightarrow$ $PA = LDU$ &emsp; $\rightarrow$ always possible for invertible $A$! <br>
If A is symmetric and no row exchanges are needed, then $U = L^T$ and $A = LDL^T$.
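
A sketch of $A = LDU$ for the case without row exchanges (the helper `ldu` is hypothetical and simply raises an error when a zero pivot would require $P$). The multipliers are stored in $L$, the pivots go into $D$, and each row of the eliminated matrix is divided by its pivot to get $U$.

```python
from fractions import Fraction


def ldu(A):
    """A = L D U for square A when elimination needs no row exchanges."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: a row exchange (P) would be needed")
        for i in range(k + 1, n):
            l = U[i][k] / U[k][k]  # multiplier l_ik, stored in L
            L[i][k] = l
            U[i] = [a - l * b for a, b in zip(U[i], U[k])]
    D = [[U[i][i] if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    U = [[U[i][j] / U[i][i] for j in range(n)] for i in range(n)]  # unit diagonal
    return L, D, U


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
L, D, U = ldu(A)
print(matmul(matmul(L, D), U) == A)  # True
```

For a symmetric input such as `[[2, 4], [4, 11]]`, the returned `U` equals the transpose of `L`, matching $A = LDL^T$.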

##### Least Squares: <br>
If $Ax = b$ does not have a solution, one can find x such that $Ax$ is as close as possible to b. <br>
$\rightarrow$ find x that minimizes $\Vert Ax - b \rVert$ <br>
$A^TAx = A^Tb$ always has a solution; it is called the <strong> normal equation. </strong><br>
Every solution of the normal equation solves the least squares problem. <br>
If the columns of A are independent, $(A^TA)^{-1}$ exists, and the unique solution is $\hat x = (A^TA)^{-1}A^Tb$. <br>
$\rightarrow$ used in linear regression.
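
A sketch of a straight-line fit $y \approx c + dt$: $A$ has rows $[1, t_i]$, and the $2 \times 2$ normal equation is solved in closed form (Cramer's rule). The function name and data points are our own; exact `Fraction` arithmetic is used so the residual checks are exact.

```python
from fractions import Fraction


def least_squares_line(ts, ys):
    """Fit y ~ c + d t by solving A^T A [c, d] = A^T y, where A has rows [1, t_i]."""
    ts = [Fraction(t) for t in ts]
    ys = [Fraction(y) for y in ys]
    n = len(ts)
    s_t, s_tt = sum(ts), sum(t * t for t in ts)
    s_y, s_ty = sum(ys), sum(t * y for t, y in zip(ts, ys))
    # A^T A = [[n, s_t], [s_t, s_tt]],  A^T y = [s_y, s_ty]
    det = n * s_tt - s_t * s_t  # nonzero when the t_i are not all equal
    c = (s_tt * s_y - s_t * s_ty) / det
    d = (n * s_ty - s_t * s_y) / det
    return c, d


c, d = least_squares_line([0, 1, 2], [1, 2, 2])
print(c, d)  # 7/6 1/2
```

The residual $b - A\hat x$ is orthogonal to both columns of $A$, which is exactly what the normal equation states.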

##### Non-negative, Integer Solutions: <br>
Some equations $Ax = b$ require nonnegative variables or integer variables. <br>
Nonnegativity turns a system of equations into a system with inequalities: <br>
$Ax = b, x \geq 0$ $\rightarrow$ Linear Programming <br>
Integer systems are called Diophantine Equations: <br>
$Ax = b, x \in \mathbb{Z}^n$ $\rightarrow$ Diophantine Equations <br>
When both requirements are combined, it becomes Integer Programming. <br>
$Ax = b, x \geq 0, x \in \mathbb{Z}^n$ $\rightarrow$ Integer Programming <br>
Integer Programming is NP-hard: no polynomial-time algorithm for it is known (related to the P vs NP problem).

### III. Vector Spaces

#### III.1 Definition of Vector Spaces

##### Vector Space: <br>
* set V of elements called vectors.
* a sum operation which returns a vector in V.
* a scalar multiplication operation which returns a vector in V
* an element 0 called the zero vector.
<br><br>

##### Properties: <br>
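* $x + y = y + x$, &emsp; $(x + y) + z = x + (y + z)$
* $0 + x = x$, &emsp; $x + (-x) = 0$, &emsp; $1x = x$
* $a(bx) = (ab)x$, &emsp; $a(x+y) = ax + ay$, &emsp; $(a+b)x = ax + bx$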

##### Subspaces: <br>
A subset of the vector space V which is itself a vector space under the same operations. <br>
$\rightarrow$ Closed under linear combination. <br>
$\rightarrow$ Must contain the 0 vector.

##### Four Fundamental Subspaces: <br>
1. Column Space of $A \in \mathbb{R}^{m \times n}$: $C(A)$ <br>
span($a_1, a_2, \dots, a_n) \subseteq \mathbb{R}^m$ <br>
2. Null Space (Kernel) of $A$: $N(A)$ <br>
$\{x \in \mathbb{R}^n | Ax = 0\}$ $\rightarrow$ always contains the $0$ vector. <br>
3. Row Space of $A$: $C(A^T)$ <br>
span($\hat a_1, \hat a_2, \dots, \hat a_m) \subseteq \mathbb{R}^n$ <br>
4. Left Null Space of $A$: $N(A^T)$ <br>
$\{y \in \mathbb{R}^m | y^TA = 0\}$

##### Basis, Dimension, Rank: <br>
* Basis: if $V = span(v_1, v_2, \dots, v_k)$ and $\{v_1, v_2, \dots, v_k\}$ is independent, <br>
$\{v_1, v_2, \dots, v_k\}$ is a basis of $V$. $\rightarrow$ bases are not unique, but every basis has the same number of vectors.
* Dimension: the number of vectors in a basis. It is unique for any subspace $V$.
* Rank: the maximal number of linearly independent column vectors (equivalently, row vectors) of a matrix $A$. <br>
The pivot columns of $A$ (the columns where $rref(A)$ has pivots) form a basis of $C(A)$. <br>
Rank($A$) = dim($C(A)$) = dim($C(A^T)$) $\rightarrow$ at most min($m,n$)

A solution to $Ax = b$ exists $\iff b \in C(A) \iff Bb = 0$, where the rows of $B$ form a basis of $N(A^T)$ (so $C(A) = N(B)$). <br>
In elimination: the row space is preserved, but not the column space. <br>

$A = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 3 & 1 & 4 \\ 4 & 1 & 5 \end{bmatrix}$ &emsp;
$U = \begin{bmatrix} 1 & 1 & 2 \\ 0 & -1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ &emsp;
$R = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ <br> <br>

1) $C(A) = span(\{a_1, a_2\})$ <br>
Since the pivots are in column 1 and column 2 in $R$, C(A) is made out of $a_1, a_2$ in $A$. <br>
$\rightarrow$ $dim(C(A)) =$ # of pivots ($rank(A)$) $= 2$ <br>

2) $N(A) = \left\{ \alpha \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix} : \alpha \in \mathbb{R} \right\}$. <br>
Since A does not have full column rank, $N(A)$ is not trivial. <br>
The special solution consists of the entries of the third (free) column of $R$ times -1, followed by a 1. <br>
This is because $a_3$ is a free column and can be formed from the first two columns, $a_3 = a_1 + a_2$, <br>
the coefficients being the entries of $r_3$ (the third column of $R$). <br>
In general, there is one special solution per free column, so $dim(N(A)) = n - rank(A)$ 

3) $C(A^T) = C(U^T) = C(R^T)$ <br>
In elimination, the row space is preserved. Therefore $C(A^T) = C(U^T) = C(R^T)$. <br>
The basis of $C(A^T)$ is the first two rows of $R$, that is, the rows that contain pivots. <br>
That is, $dim(C(A^T)) = dim(C(A)) = rank(A)$ $\rightarrow$ Fundamental Theorem of Linear Algebra. <br>

4) $N(A^T)$ <br>
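The rows that do not end up containing pivots are eliminated to 0. <br>
If $EA = U$, the rows of $E$ that produce the zero rows of $U$ satisfy $y^TA = 0$; they form a basis of the left nullspace of A. <br>
Here: $y_1 = (1, -2, 1, 0)$ and $y_2 = (2, -3, 0, 1)$. <br>
$dim(N(A^T)) = m - rank(A)$ <br>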

$C(A^T)$ and $N(A)$ are orthogonal complements in $\mathbb{R}^n$. <br>
$C(A)$ and $N(A^T)$ are orthogonal complements in $\mathbb{R}^m$

#### III.2 Complete Solutions and the Null Space 

To get a complete solution to $Ax = b$: <br>
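* Find a particular solution $x_p$ (Gaussian Elimination, setting the free variables to 0).
* Add to it all linear combinations of a basis of $N(A)$ (the special solutions).
* There are $n - rank(A)$ free (non-basic) columns in rref(A), hence that many special solutions.

After reordering the columns so that the pivot columns come first, <br>
$R' = \begin{bmatrix} I_{r \times r} & F_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix}$ <br>
The special solutions, i.e., a basis of the null space, are the columns of $\begin{bmatrix} -F_{r \times (n-r)} \\ I_{(n-r) \times (n-r)} \end{bmatrix}$ (with the same column reordering applied to their entries).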
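
A sketch of the recipe, assuming exact arithmetic and our own helper names: row-reduce $[A \mid b]$, read off $x_p$ with the free variables set to 0, and build one special solution per free column. The usage applies it to the Section III matrix with $b = a_1 + a_2$.

```python
from fractions import Fraction


def rref(M):
    """Reduced row echelon form and pivot columns (exact arithmetic)."""
    R = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [v / piv for v in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                l = R[i][c]
                R[i] = [a - l * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def complete_solution(A, b):
    """Return (x_p, special): every solution of Ax = b is x_p + sum_k c_k special[k]."""
    n = len(A[0])
    R, pivots = rref([list(row) + [bi] for row, bi in zip(A, b)])
    if n in pivots:  # pivot in the b column: inconsistent system
        raise ValueError("no solution: b is not in C(A)")
    free = [j for j in range(n) if j not in pivots]
    x_p = [Fraction(0)] * n
    for row, pc in zip(R, pivots):
        x_p[pc] = row[n]  # free variables are set to 0
    special = []
    for f in free:
        s = [Fraction(0)] * n
        s[f] = Fraction(1)
        for row, pc in zip(R, pivots):
            s[pc] = -row[f]  # the -F part
        special.append(s)
    return x_p, special


A = [[1, 1, 2], [2, 1, 3], [3, 1, 4], [4, 1, 5]]
x_p, special = complete_solution(A, [2, 3, 4, 5])
print([str(v) for v in x_p], [[str(v) for v in s] for s in special])
# ['1', '1', '0'] [['-1', '-1', '1']]
```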

#### III.3 Orthogonality of the Four Subspaces

Two subspaces are orthogonal if every vector in one is orthogonal to every vector in the other. <br>
$\rightarrow$ their only intersection is the zero vector.

The Column Space and the Left Null Space are orthogonal. <br>
pf) Take any $Ax \in C(A)$ and $y \in N(A^T)$. Then $y^T(Ax) = (A^Ty)^Tx = 0^Tx = 0$. <br>
In particular, if $y = Ax$ lies in both, then $\Vert Ax \rVert^2 = (Ax)^T Ax = y^T(Ax) = 0$, so $y = Ax = 0$. <br> <br>
Also, the Row Space and the Null Space are orthogonal. <br>
pf) Apply the same argument to $A^T$.

#### III.4 More on Vector Spaces

* Solutions of $f''(x) + f(x) = 0$ <br>
$f(x) = a\cos(x) + b\sin(x)$ <br>
basis: $\{\cos(x), \sin(x)\}$
* Polynomials of degree at most 2: $f(x) = a + bx + cx^2$ <br>
basis: $\{1, x, x^2\}$ <br>
* $V + W$ when $V$, $W$ are subspaces of the same vector space (Minkowski sum) <br>
$V + W = \{x + y | x \in V, y \in W\}$

##### More on Independence, Bases, and Dimensions <br>
$A_{n \times k} = [v_1, v_2, ... v_k]$ <br>
* if $v_i$ are independent: $k \leq n$
* if $v_i$ are independent: $rank(A) = k, N(A) = \{0\}$
* if $v_i$ are a basis of $V$: $dim(V) = k$ <br>
* if $\{v_1, \dots, v_k\}$ and $\{w_1, \dots, w_p\}$ are both bases of $V$, then $k = p$. <br>
pf) Suppose $p > k$. <br>
$\rightarrow$ since each $w_j \in V$, one can construct a $k \times p$ matrix C such that <br>
$W = \begin{bmatrix}w_1 & w_2 & ... & w_p\end{bmatrix} = \begin{bmatrix}v_1 & v_2 & ... & v_k\end{bmatrix} 
\begin{bmatrix} c_1 & c_2 & ... & c_p \end{bmatrix} = VC$ <br>
$N(C) \subseteq N(W)$, since if $Cx = 0 \rightarrow Wx = VCx = 0$ <br>
but since $k < p$, $C$ has more columns than rows, so $N(C)$ contains a non-trivial solution. <br>
$\rightarrow$ $W$ is dependent, a contradiction. By symmetry, $p < k$ is also impossible.
* if $dim(V) = k$, any independent set $S$ with $len(S) = k$ forms a basis of $V$.

* $rank(A) = dim(C(A)) = dim(C(A^T)) = $ # of pivots$ = r$
* $r \leq min(m, n)$
* $r = m < n\rightarrow$ full row rank. $\rightarrow$ $Ax = b$ has $\infty$ solutions for every $b$ ($N(A^T) = \{0\}$) <br>
Independent Rows. Right-Inverse exists. $R = \begin{bmatrix} I & F\end{bmatrix}$
* $r = n < m\rightarrow$ full column rank. $\rightarrow$ has $1$ or $0$ solutions ($N(A) = \{0\}$). <br>
Independent Columns. Left-Inverse exists. $R = \begin{bmatrix} I \\ 0 \end{bmatrix}$
* $r = m = n \rightarrow$ has exactly $1$ solution $\rightarrow$ invertible, non-singular. <br>
$R = \begin{bmatrix} I \end{bmatrix}$
* $r < m, r < n \rightarrow$ has $0$ or $\infty$ solutions <br>
$R = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix}$ <br> <br>

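pf) (left inverse for full column rank) $N(A) = \{0\}$ <br>
$\rightarrow N(A^T A) = \{0\}$, since $A^TAx = 0 \Rightarrow x^TA^TAx = \Vert Ax \rVert^2 = 0 \Rightarrow Ax = 0 \Rightarrow x = 0$ <br>
$\rightarrow$ the square matrix $A^T A$ is invertible <br>
$\rightarrow (A^T A)^{-1} A^T$ is a left inverse, since $(A^T A)^{-1} A^T A = I$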

### IV. Graphs and Networks

#### IV.1 Graphs and Networks

##### Graphs: <br>
Graph: a pair composed of the set of Vertices (Nodes) and the set of Edges (Arcs) <br>
$G = (V, E)$ <br>
Undirected Graph: all arcs are non-directional. <br>
Directed Graph: all arcs are directional. <br>
Simple Graph: a graph that does not have self-loops or repeated edges. <br>
Degree of a Node: the number of edges incident to it; in a directed graph, the number of arcs leading into it + the number of arcs leading out of it.

##### Networks: <br>
Networks: a graph with numerical attributes (e.g., weights, capacities, costs) on its edges and nodes. <br>
Generally, networks mean simple directed graphs.

##### Subgraphs: <br>
If $G = (V, E)$ and $G' = (V', E')$ with $V' \subseteq V$, $E' \subseteq E$,  <br>
and every edge in $E'$ joins two nodes of $V'$, then $G'$ is a subgraph of $G$. <br>

##### Special Graphs: <br>
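* Complete Graph: every pair of distinct nodes is joined by exactly one edge.
* Bipartite Graph: the nodes split into two disjoint sets such that every edge joins a node in one set to a node in the other.
* Tree: a graph with exactly one path between any two nodes (connected and acyclic).
* Star: a tree with one inner (center) node joined to $k$ leaf nodes. 
* Forest: a graph whose connected components are trees (an acyclic graph).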

#### IV.2 Routes

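* Walk: an alternating sequence of nodes and arcs, where each arc joins the nodes before and after it.
* Trail: a walk with distinct edges.
* Path: a walk with distinct nodes (hence also a trail). A closed walk that repeats only its start/end node $\rightarrow$ cycle.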

##### Matrix Form of Undirected Graphs: <br>
Node-node adjacency matrix. An $n \times n$ matrix where there are $n$ nodes. <br>
$a_{ij} = 1$ if there is a connection between node $i$ and node $j$. If not, it is 0. <br>
Symmetric matrix.

##### Matrix Form of Directed Graphs: <br>
Node-arc incidence matrix. An $n \times m$ matrix where there are $n$ nodes and $m$ arcs. <br>
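$a_{ij} = 1$ if arc $j$ leads into node $i$, $-1$ if arc $j$ goes out from node $i$, and 0 otherwise. <br>
Rank: $n-1$ if the graph is connected (every column sums to 0, so $1^TA = 0$); in general $n - c$, where $c$ is the number of connected components.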
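
A sketch that builds the node-arc incidence matrix with the sign convention above and checks its rank on two hypothetical graphs; arcs are given as `(tail, head)` pairs, self-loops are assumed absent, and the helper names are our own.

```python
from fractions import Fraction


def incidence_matrix(n_nodes, arcs):
    """Node-arc incidence matrix: +1 where arc j enters node i, -1 where it leaves."""
    A = [[0] * len(arcs) for _ in range(n_nodes)]
    for j, (tail, head) in enumerate(arcs):
        A[tail][j] = -1
        A[head][j] = 1
    return A


def rank(M):
    """Number of pivots found by exact Gaussian elimination."""
    R = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):
            l = R[i][c] / R[r][c]
            R[i] = [a - l * b for a, b in zip(R[i], R[r])]
        r += 1
    return r


# connected graph on 4 nodes: 0->1, 1->2, 2->0, 2->3
print(rank(incidence_matrix(4, [(0, 1), (1, 2), (2, 0), (2, 3)])))  # 3 = n - 1
# two components: 0->1 and 2->3
print(rank(incidence_matrix(4, [(0, 1), (2, 3)])))  # 2 = n - 2
```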

### V. Orthogonality and QR Decomposition

#### V.1 Orthogonality

Orthogonality: two vectors $v, w$ are orthogonal ($v \perp w$) if $v^T w = 0$ <br>
Orthogonal Vectors: $\{v_1, v_2, \dots, v_n\}$ is orthogonal $\iff$ $v_i^T v_j = 0 \ (i \neq j)$ <br>
Orthonormal Vectors: $\{v_1, v_2, \dots, v_n\}$ is orthonormal $\iff$ $\{v_1, v_2, \dots, v_n\}$ is orthogonal and $\Vert v_i \rVert = 1$ <br>
$\rightarrow$ Non-zero orthogonal vectors are linearly independent: taking the inner product of $\sum_j \alpha_j v_j = 0$ with $v_i$ gives $\alpha_i \Vert v_i \rVert^2 = 0$, so $\alpha_i = 0$.

##### Orthogonal Matrix: <br>
$A = \begin{bmatrix} v_1 & v_2 & ... & v_n \end{bmatrix} \in \mathbb{R}^{n \times n}$ when $\{v_1, v_2, ... v_n\}$ are orthonormal. <br> Then $A^TA = I$, so $A$ has an inverse: $A^{-1} = A^T$

##### Bases and Orthogonality: <br>
Any basis of $\mathbb{R}^n$ defines a coordinate system. <br>
If the columns of $A$ are the new basis vectors and $x$ is the coordinate vector under the current (standard) system, <br>
$c = A^{-1}x$ is the coordinate vector under the new system, <br>
since $Ac = A A^{-1}x = x.$ <br>
If $A$ is an orthogonal matrix, $A^{-1} = A^T$ needs no computation: $c_i = a_i^Tx$.

##### Orthogonality of Subspaces: <br>
Subspaces $S$ and $T$ are orthogonal ($S \perp T$) 
$\iff$ $s \perp t$ for any $s \in S, t \in T$ <br>
If $S \perp T$, then $S \cap T = \{0\}$ <br>
$\rightarrow N(A) \perp C(A^T), C(A) \perp N(A^T)$ <br>
<br>
##### Orthogonal Complements: <br>
The orthogonal complement of $S$ is $S^{\perp} = \{t : t^Ts = 0 \text{ for all } s \in S\}$, the set of all vectors orthogonal to $S$. <br>
In $\mathbb{R}^n$, $dim(S) + dim(S^{\perp}) = n$. <br>
$N(A) = C(A^T)^{\perp}, C(A) = N(A^T)^{\perp}$ <br>
$\rightarrow$ for any $x \in \mathbb{R}^n$, $x = x_r + x_n$ such that $x_r \in C(A^T), x_n \in N(A)$. <br> Any subspace $S$ is the nullspace of some matrix $A$ (e.g., one whose rows form a basis of $S^{\perp}$).

#### V.2 Projection

##### Projection onto a Subspace: <br>
Decomposing a given vector into two components, <br>
one in the subspace and the other orthogonal to the subspace.

##### Projection Matrix: <br>
The projection of $b$ onto $C(A)$ is thus: <br>
$p = Pb$, where $P = A(A^T A)^{-1} A^T$, where $A$ is full column rank <br>
$r = b - p = (I-P)b$ <br>
$p \in C(A), r \in N(A^T)$ <br><br>
In the 1-d case (projection onto the line through $a$): $p = \frac{aa^T}{a^Ta} b$
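
A sketch of $P = A(A^TA)^{-1}A^T$ with exact arithmetic (helper names are our own; `inverse` is a small Gauss-Jordan routine that assumes an invertible input). For the example, $b = (6, 0, 0)$ projects to $p = (5, 2, -1)$, and the residual satisfies $A^Tr = 0$.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(M):
    """Gauss-Jordan inverse of a small invertible square matrix (exact)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c:
                l = aug[i][c]
                aug[i] = [a - l * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def projection_matrix(A):
    """P = A (A^T A)^{-1} A^T, assuming A has full column rank."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)


A = [[1, 0], [1, 1], [1, 2]]
b = [[6], [0], [0]]  # column vector
P = projection_matrix(A)
p = matmul(P, b)
r = [[bi[0] - pi[0]] for bi, pi in zip(b, p)]
print(matmul(transpose(A), r) == [[0], [0]])  # True: r lies in N(A^T)
```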

##### Properties of Projection Matrices: <br>
Projection Matrices are
* Symmetric $(P = P^T)$
* Idempotent $(P^2 = P)$ 
<br>

If a matrix $P$ is symmetric and idempotent, it is a projection matrix (onto $C(P)$). <br>
pf) In order for $P$ to be a projection matrix, for any vector $b$, <br>
$Pb \perp b - Pb$. <br>
$P^T (b - Pb) = P(b-Pb) = Pb - P^2 b = Pb - Pb = 0$ <br>
$\rightarrow Pb \in C(P), b-Pb \in N(P^T) \rightarrow Pb \perp b-Pb$

#### V.3 Least Squares

The projection onto $C(A)$ solves the least squares problem: <br>
 minimize $\Vert Ax - b\rVert$ over $x \in \mathbb{R}^n$ <br>
That is, $\hat x = (A^TA)^{-1}A^T b$ is the best approximate answer to $Ax = b$, and $A\hat x = Pb = A(A^TA)^{-1}A^T b$.

Least Norm problem: <br>
minimize $\Vert x \rVert$ subject to $Ax = b$, $x \in \mathbb{R}^n$ (assume $A$ has full row rank) <br>
$x = A^T(AA^T)^{-1}b$, since the minimizer lies in $C(A^T)$, i.e., its projection onto $N(A)$ is $0$ <br>
The projection of $x$ onto $N(A)$: 
$\rightarrow (I - A^T(AA^T)^{-1}A)x$
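
A sketch of $x = A^T(AA^T)^{-1}b$, deliberately restricted to matrices with exactly two rows so that $AA^T$ can be inverted with the $2 \times 2$ formula (an illustrative simplification; `least_norm_2row` is a hypothetical name). For the example, $x = (1/3, 1/3, 2/3)$ is orthogonal to $N(A) = span\{(-1, -1, 1)\}$.

```python
from fractions import Fraction


def least_norm_2row(A, b):
    """x = A^T (A A^T)^{-1} b for A with exactly 2 rows and full row rank."""
    G = [[sum(Fraction(u) * v for u, v in zip(r1, r2)) for r2 in A] for r1 in A]  # A A^T
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    G_inv = [[G[1][1] / det, -G[0][1] / det],
             [-G[1][0] / det, G[0][0] / det]]
    w = [G_inv[i][0] * b[0] + G_inv[i][1] * b[1] for i in range(2)]      # (A A^T)^{-1} b
    return [A[0][k] * w[0] + A[1][k] * w[1] for k in range(len(A[0]))]  # A^T w


A = [[1, 0, 1], [0, 1, 1]]
x = least_norm_2row(A, [1, 1])
print([str(v) for v in x])  # ['1/3', '1/3', '2/3']
```

Any other solution, such as $(1, 1, 0)$, differs from $x$ by a null-space vector and therefore has a larger norm.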

#### V.4 QR Decomposition

Orthonormal Matrix: <br>
$A = \begin{bmatrix} q_1 & q_2 & ... & q_n\end{bmatrix}$ where the $q_i$ are orthonormal. <br>
Then $A^TA = I$, so the projection matrix $A(A^TA)^{-1}A^T$ simplifies to $P = AA^T$ <br>
Projection onto the column space of a matrix with orthonormal columns: <br>
$p = AA^Tb$

Gram-Schmidt Algorithm: <br>
Turn the columns of $A$ into orthonormal vectors spanning the same space. <br>
(Assume $A$ has full column rank. If not, keep only linearly independent columns)<br>
* $\hat q_1 = a_1$
* $q_1 = \frac {\hat q_1}{\Vert \hat q_1 \rVert}$
* $\hat q_2 = a_2 - q_1q_1^Ta_2$
* $q_2 = \frac{\hat q_2}{\Vert \hat q_2 \rVert}$
<br> $\vdots$
* $\hat q_n = a_n - (q_1q_1^Ta_n + q_2q_2^Ta_n + \dots + q_{n-1}q_{n-1}^Ta_n)$
* $q_n = \frac{\hat q_n}{\Vert \hat q_n \rVert}$

QR Decomposition: given a matrix $A = \begin{bmatrix} a_1 & a_2 & ... & a_n \end{bmatrix}$ with independent columns, factor it into the form $A = QR$ ($Q$ with orthonormal columns, $R$ upper triangular), where <br>
$Q = \begin{bmatrix} q_1 & q_2 & ... & q_n \end{bmatrix}, 
R = \begin{bmatrix} q_1^Ta_1 & q_1^Ta_2 & ... & q_1^Ta_n \\ 0 & q_2^Ta_2 & ... & q_2^Ta_n \\ \vdots & \vdots & \ddots & \vdots
\\ 0 & 0 & ... & q_n^T a_n \end{bmatrix}$
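
A sketch of classical Gram-Schmidt producing $Q$ and $R$ exactly as in the formulas above (floating point; independent columns assumed; `gram_schmidt_qr` is a hypothetical name). Note $R_{jj} = \Vert \hat q_j \rVert = q_j^Ta_j$. In floating point, modified Gram-Schmidt or Householder reflections are usually preferred for numerical stability; this version mirrors the notes instead.

```python
import math


def gram_schmidt_qr(A):
    """Classical Gram-Schmidt on the columns of A (assumed independent): A = QR."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    Q, R = [], [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        q_hat = list(a)
        for i, q in enumerate(Q):
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))  # q_i^T a_j
            q_hat = [h - R[i][j] * qk for h, qk in zip(q_hat, q)]
        R[j][j] = math.sqrt(sum(h * h for h in q_hat))      # ||q_hat_j|| = q_j^T a_j
        Q.append([h / R[j][j] for h in q_hat])
    Q_matrix = [[Q[j][i] for j in range(n)] for i in range(m)]  # q_j as columns
    return Q_matrix, R


A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
Q, R = gram_schmidt_qr(A)
print(R)  # upper triangular: R[1][0] == 0.0
```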

### VI. Determinants

#### VI.1 Definition of the Determinant

Defining Properties of the Determinant: <br>
* $\det(I) = 1$.
* exchanging two rows flips the sign.
* row-linear (linear in each row separately): $\det(a_1^T, a_2^T, ... a_n^T + t b_n^T) = \det(a_1^T, a_2^T, ... a_n^T) + t\det(a_1^T, a_2^T, ... b_n^T)$

<br>
Exactly one function satisfies all these properties. <br>
(Only defined for square matrices)

Properties of the Determinant: <br>
* 2 equal rows $\rightarrow \det(A) = 0$
* elimination (subtracting a multiple of one row from another) does not change the determinant $\rightarrow \det(EA) = \det(A)$
* a row of zeros $\rightarrow \det(A) = 0$
* $A$ is triangular $\rightarrow \det(A) = d_1d_2...d_n$ where $d_i$ are the diagonal entries.
* $\det(A) = (-1)^kd_1d_2...d_n$, where $d_i$ are the pivots and $k$ is the number of row exchanges in elimination.
* $A$ is singular $\iff \det(A) = 0$
* $\det(A)\det(B) = \det(AB)$
* $\det(A) = \det(A^T)$, and $\det(A^{-1}) = \frac{1}{\det(A)}$ for invertible $A$

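$\det(A) = \sum_{\sigma\in\Omega}(a_{1\sigma_1}a_{2\sigma_2}...a_{n\sigma_n})\det(P_\sigma)$, where $\Omega$ is the set of all $n!$ permutations $\sigma$ of $(1, \dots, n)$ and $\det(P_\sigma) = \pm 1$ is the sign of $\sigma$. <br>
But with $n!$ terms, it is an inefficient way to calculate determinants. <br>
$\rightarrow$ if all the elements in $A$ are integers, then the determinant is an integer.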
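
A sketch comparing the two routes (helper names are our own): elimination uses $(-1)^k$ times the product of pivots in about $n^3/3$ steps, while the permutation sum visits all $n!$ terms and computes the sign of $\sigma$ by counting inversions. With integer input, the permutation sum only adds and multiplies integers, which is why the result is an integer.

```python
from fractions import Fraction
from itertools import permutations


def det_elimination(A):
    """det(A) = (-1)^k * (product of pivots), k = number of row exchanges."""
    M = [[Fraction(v) for v in row] for row in A]
    n, sign, prod = len(M), 1, Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            return Fraction(0)  # no pivot in this column: A is singular
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign
        for i in range(c + 1, n):
            l = M[i][c] / M[c][c]
            M[i] = [a - l * b for a, b in zip(M[i], M[c])]
        prod *= M[c][c]
    return sign * prod


def det_permutation_sum(A):
    """Sum over all n! permutations sigma of sign(sigma) * a_{1 sigma_1} ... a_{n sigma_n}."""
    n, total = len(A), 0
    for sigma in permutations(range(n)):
        # sign(sigma) = det(P_sigma) = (-1)^(number of inversions)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
        term = (-1) ** inversions
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total


A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
print(det_elimination(A), det_permutation_sum(A))  # -16 -16
```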

#### VI.2 Calculating Determinants