# Optional Reference Material: Linear Algebra 

A curious collection of various and sundry linear algebra things that have in some way at some point in time related to STA410...

*Totally optional and just here for your reference in the event that you're chasing some clarity on something referenced in our homework or pre-lecture reading material for the week (and beyond?).*

- Linear Independence and Orthogonality
    - Rank and Bases
- Eigenvalues and Eigenvectors
    - Eigendecomposition 
    - Eigen Analysis: Understanding $Ax$ by its Eigenvalues
- Stuff about $A^{-1}$ 
    - Inverse computation is unnecessarily wasteful
    - Inversion computation is actually very often prone to numerical inaccuracy 
        - Sherman-Morrison-Woodbury Formula
    - Backward Substitution
    - Gaussian Elimination
        - Elementary Operations
        - LU Decomposition
    - Generalized Inverses

- Vector and Matrix Norms
- Deriving the Matrix Condition Number
    - The Matrix Condition Number using the $L_2$ norm


## Linear Independence and Orthogonality

---

The columns of a matrix $A_{\cdot j}$ are **linearly independent** if

  $$ \underbrace{\sum_{j = 1}^n c_j A_{\cdot j} = 0  \;\; \Longrightarrow  \;\; c_j = 0 \text{ for all } j}_{Ac \;=\; 0 \;\;\Longrightarrow \;\;c \;=\; 0}$$

A stronger condition than **linear independence** is **orthogonality** where 

$$ (A_{\cdot j})^T A_{\cdot k} =  \sum_i A_{ij} A_{ik} = 0 \text{ for all } j \neq k \quad \text{ and } \quad (A_{\cdot j})^T A_{\cdot j} \neq 0 \text{ for all } j$$

since for nonzero columns $A_{\cdot j}$ 

$$  \underbrace{(A_{\cdot j})^T A_{\cdot k} = 0}_{\text{Orthogonality}} \quad \Longrightarrow \quad \underbrace{c_jA_{\cdot j} + c_kA_{\cdot k} = 0 \Longrightarrow c_j=c_k=0}_{\text{Linear Independence}}$$

but 

$$\require{\cancel} \underbrace{c_jA_{\cdot j} + c_kA_{\cdot k} = 0 \Longrightarrow c_j=c_k=0}_{\text{Linear Independence}} \quad \cancel{\Longrightarrow} \quad  \underbrace{(A_{\cdot j})^T A_{\cdot k} = 0}_{\text{Orthogonality}}$$

It is possible to transform two **linearly independent** columns $A_{\cdot j}$ and $A_{\cdot k}$ so that they are **orthogonal**, and the most common way to do this is known as the **(modified) Gram-Schmidt procedure**. 
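
As a rough illustration of the procedure just mentioned, here is a minimal sketch of modified Gram-Schmidt in NumPy. The function name `modified_gram_schmidt` and the example matrix are made up for this note, and the sketch assumes the input columns really are linearly independent (it does no checking).

```python
import numpy as np

def modified_gram_schmidt(A):
    """Orthonormalize the (assumed linearly independent) columns of A."""
    Q = np.array(A, dtype=float)  # work on a float copy
    n_cols = Q.shape[1]
    for j in range(n_cols):
        # normalize column j to unit length
        Q[:, j] /= np.linalg.norm(Q[:, j])
        # remove the component along column j from all later columns
        for k in range(j + 1, n_cols):
            Q[:, k] -= (Q[:, j] @ Q[:, k]) * Q[:, j]
    return Q

A_example = np.array([[1.0, 1.0],
                      [0.0, 1.0],
                      [1.0, 0.0]])
Q_example = modified_gram_schmidt(A_example)
print(Q_example.T @ Q_example)  # approximately the 2x2 identity
```

The "modified" part is that each later column is updated immediately after a column is finalized, rather than projecting against the original columns all at once.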


### Rank and Bases

The **rank** of the matrix $A_{n\times m}$ is the number of **linearly independent** columns (and equivalently, rows) of $A$. The matrix $A$ is said to be **full rank** if $\text{rank}(A_{n \times m}) = \min(n,m)$.  When $A$ is **square** so $A_{n\times m} = A_{n\times n}$, if $A$ is **full rank** then $\text{rank}(A_{n \times n}) = n$ and the $n$ columns of $A_{n \times n}$ are **linearly independent** and form a **basis** in $n$-dimensional space. 

- A **basis** formed by the $n$ **linearly independent** columns of a **square** matrix $A$ is a set of axes defining a coordinate system from which to index the $n$-dimensional space.  

  >Any vector $x$ of an $n$-dimensional space may be given in terms of the coordinates $c_j$ of any **basis** formed by a **full rank square matrix** as 
>
>$$x = \sum_j c_j A_{\cdot j}$$
>
>which illustrates that a ***basis*** does not define the space; rather, the ***basis*** just defines the way points $x$ in the space are referenced.
>Changing the ***basis*** does not change the space itself. 

The **standard basis** is $A_{n \times n}=I$. The columns of $I$ are the **standard basis vectors** $e_j$.  The $e_j$ are **linearly independent** and **orthogonal**; and, because the length of these vectors in the n-dimensional space is 1, they are called **normal vectors**. 

>The (**Euclidean distance**) length of a vector is given by the square root of its **inner (dot) product** with itself, so a column vector $A_{\cdot j}$ is a **normal vector** if 
>
> $$\sqrt{A_{\cdot j} \cdot A_{\cdot j}} = \sqrt{(A_{\cdot j})^T A_{\cdot j}} = \sqrt{\sum_{i = 1}^n A_{i j}^2} = 1 \quad \text{ e.g., } \quad e_j \cdot e_j = e_j^T e_j =1$$ 

Vectors which are both **normal** and **orthogonal** are called **orthonormal**. The **standard basis** is thus an **orthonormal basis**. Two standard conventions that are common in this context are

1. since two vectors are **linearly independent** regardless of their length, it is usual to specify the vectors of a **basis** in their **normal form**, and
2. an **orthonormal basis** is often just called an **orthogonal basis** as the **orthogonality** is a much more crucial property of such a basis.
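
To make the rank and change-of-basis ideas a bit more concrete, the following is a small sketch using NumPy. The matrix `A_basis` and the point `x_point` are hypothetical examples chosen for this note; `np.linalg.matrix_rank` and `np.linalg.solve` are used to count independent columns and to find the coordinates $c$ with $x = \sum_j c_j A_{\cdot j}$.

```python
import numpy as np

# hypothetical full rank 2x2 matrix whose columns form a (non-orthogonal) basis
A_basis = np.array([[1.0, 1.0],
                    [0.0, 2.0]])
print(np.linalg.matrix_rank(A_basis))  # number of linearly independent columns

x_point = np.array([3.0, 4.0])                 # a point given in standard coordinates
c_coords = np.linalg.solve(A_basis, x_point)   # coordinates of x in the basis A
print(c_coords)
print(A_basis @ c_coords)                      # recombines to the same point x

# normalizing the columns changes the coordinates but not the point itself
A_unit = A_basis / np.linalg.norm(A_basis, axis=0)
print(np.linalg.norm(A_unit, axis=0))          # each column now has length 1
```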

## Eigenvalues and Eigenvectors

---

**Eigenvalue** and **eigenvector** analysis of the linear transformation $A_{n\times n}$ examines the rate of the expansion (and/or contraction) along the invariant directions of the transformation, respectively, as

$$A_{n\times n} V_{\cdot j} = \lambda_j V_{\cdot j} \quad \text{often usefully encountered as} \quad (A_{n\times n} - \lambda_j I) V_{\cdot j} = 0$$

Thus, for $x = \sum_{j} c_j V_{\cdot j}$ expressed in an **eigenvector basis** (regardless of the **rank** of $A_{n\times n}$) 

   $$Ax= A\left(\sum_{j} c_j V_{\cdot j}\right) = \sum_{j} c_j A  V_{\cdot j} = \sum_{j} c_j \lambda_j V_{\cdot j}$$
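
The identity above can be checked numerically. This is only a sketch with a made-up $2\times 2$ matrix `A_eig` and vector `x_vec`; it uses `np.linalg.eig`, whose returned eigenvectors are the columns of `V`.

```python
import numpy as np

A_eig = np.array([[2.0, 1.0],
                  [1.0, 3.0]])           # hypothetical example matrix
lam, V = np.linalg.eig(A_eig)            # eigenvalues and eigenvectors (columns of V)

# each column satisfies A v_j = lambda_j v_j
for j in range(len(lam)):
    print(np.allclose(A_eig @ V[:, j], lam[j] * V[:, j]))

x_vec = np.array([1.0, -2.0])
c = np.linalg.solve(V, x_vec)            # coordinates of x in the eigenvector basis
Ax_from_eigen = V @ (lam * c)            # sum_j c_j lambda_j V_j
print(np.allclose(A_eig @ x_vec, Ax_from_eigen))
```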

### Eigendecomposition 

---

Any matrix $\Sigma$ such that
\begin{align*}
\Sigma =  {}& \Sigma^T & \textbf{symmetric}\\
x^T\Sigma x {}& > 0 \text{ for all } x \neq 0 & \textbf{positive definite}\\ 
\end{align*}

is **full rank** (so $\text{rank}(\Sigma_{n\times n}) = n$) and may be a **covariance matrix**. For such matrices, there exists an **eigendecomposition** (or synonymously, **spectral decomposition** or **diagonal factorization**)

\begin{align*}
\Sigma_{n\times n} = {} & V_{n\times n} \Lambda_{n\times n} (V^T)_{n\times n}\\
\end{align*}

such that  
- **orthonormal eigenvectors** of $\Sigma$ form the columns of the **orthonormal matrix** $V_{n\times n}$  


  $$\begin{align*}
  V_{\cdot j}^TV_{\cdot j} & {} = \,\;1\;\,  = V_{j\cdot}^TV_{j\cdot} & {} \textbf{normal vectors}\\
  V_{\cdot j}^TV_{\cdot k} & {} = \,\;0\;\,  = V_{j\cdot}^TV_{k\cdot}, j\not=k & {} \textbf{orthogonality}\\
  V^TV & {} = I_{n\times n}  = VV^T & {} \textbf{orthonormality}
  \end{align*}$$

- and corresponding positive **eigenvalues** 

  $$\Lambda_{11}=\lambda_1 \geq \Lambda_{22}=\lambda_2 \geq \cdots \geq \Lambda_{nn}=\lambda_n > 0$$

  comprise the entries of the diagonal matrix $\Lambda_{n\times n}$ 
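
For a quick numerical look at these properties, here is a sketch that builds a sample covariance matrix from simulated data (the data, seed, and variable names are assumptions for illustration) and decomposes it with `np.linalg.eigh`, NumPy's routine for symmetric matrices. Note that `eigh` returns eigenvalues in ascending order, so they are reversed here to match the ordering above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))             # hypothetical data
Sigma = np.cov(X, rowvar=False)          # symmetric (and here positive definite) 3x3

lam_asc, V_asc = np.linalg.eigh(Sigma)   # eigh returns eigenvalues in ascending order
order = np.argsort(lam_asc)[::-1]        # reorder so lambda_1 >= ... >= lambda_n
lam_sorted, V_sorted = lam_asc[order], V_asc[:, order]

print(lam_sorted)                                        # all positive
print(np.allclose(V_sorted.T @ V_sorted, np.eye(3)))     # orthonormal columns
print(np.allclose(V_sorted @ np.diag(lam_sorted) @ V_sorted.T, Sigma))
```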

> The case of **symmetric positive definite** is quite distinct compared to more general **eigendecomposition**. 
>
> [**Eigendecomposition**](https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix) exists more generally for (**square**) [**diagonalizable matrices**](https://math.stackexchange.com/questions/1811983/diagonalizable-vs-full-rank-vs-nonsingular-square-matrix) which might be neither **positive definite** nor **symmetric**. In this case, the  **eigendecomposition** is $V\Lambda V^{-1}$ where $V^{-1}\neq V^T$, and the  **eigenvalues** may not all be positive and the **eigenvectors** may not all be **orthogonal**:
> - the **eigenvectors** are **orthogonal** when $\Sigma$ is **symmetric** since this means $V^{-1} = V^T$
> - the **eigenvalues** $\lambda_i > 0$ of $\Sigma$ are positive when **symmetric** $\Sigma$ is **positive definite**
>
> **Eigendecomposition** also exists for **diagonalizable matrices** which are not **full rank**. In this case, only $0<r<n$ **eigenvalues** will be nonzero, with $\Lambda_{ii} = \lambda_{i} \neq 0$ for $i\leq r$ and $\Lambda_{ii} = 0$ for $r < i \leq n$ in the diagonal matrix $\Lambda$. The **eigendecomposition** then has the **compact** form
>
> $$A_{n\times n} = V_{n \times n} \Lambda_{n \times n} V^{-1}_{n \times n} = V_{n \times r} \Lambda_{r \times r} V^{-1}_{r \times n}$$
> 
> and the columns of $V_{n \times r}$ will be **linearly independent** [so long as](https://math.stackexchange.com/questions/157382/are-the-eigenvectors-of-a-real-symmetric-matrix-always-an-orthonormal-basis-with) all **non-zero eigenvalues are unique** (i.e., have multiplicity $1$). The remaining columns in $V^T_{n \times n}$ may be chosen arbitrarily, e.g., to also be **linearly independent** since they will not contribute to $V \Lambda V^T$ for any diagonal element $\Lambda_{ii} = 0$.  

### Eigen Analysis: Understanding $Ax$ by its Eigenvalues

---


**Eigenvalues** determine many properties of an $A_{n \times n}$ matrix.

1. The **determinant** of the matrix $A_{n \times n}$ is the product of the **eigenvalues** 

   $$\det(A_{n\times n}) = \prod_{i=1}^n \lambda_i$$ 

   and so characterizes the multiplicative change in the "geometric volume" of the space under the linear transformation $Ax$.

2. The **spectral radius** of $A_{n\times n}$ is the largest absolute **eigenvalue**

   $$\rho(A_{n\times n}) = \underset{i=1,...,n}{\max} |\lambda_i| \leq \begin{array}{c}\underset{i=1,...,n}{\max} \sum_{j=1}^n |A_{ij}| \\ \underset{j=1,...,n}{\max} \sum_{i=1}^n |A_{ij}|\end{array}$$
   
   which represents the maximal "radius" of the transformation of the space under $A_{n\times n}$ and influences many statistical and computational characteristics of $A_{n\times n}$.

3. The **trace** (sum of diagonal elements) of $A_{n\times n}$ is the sum of the **eigenvalues** 
  
   $$\text{tr}(A_{n\times n}) = \sum_{i=1}^n A_{ii} = \sum_{i=1}^n \lambda_i$$

   so, e.g., the "total variance" (sum of the diagonal elements) of a **covariance matrix** is the sum of the **eigenvalues** of the covariance matrix.
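
The three properties listed above are easy to sanity check numerically. The matrix `A_props` below is an arbitrary example made up for this note; since a non-symmetric matrix can have complex eigenvalues in general, the sketch takes the real part of the product and sum before comparing.

```python
import numpy as np

A_props = np.array([[4.0, 1.0, 0.0],
                    [2.0, 3.0, 1.0],
                    [0.0, 1.0, 2.0]])    # hypothetical example
lam_props = np.linalg.eigvals(A_props)

# determinant = product of eigenvalues, trace = sum of eigenvalues
print(np.isclose(np.linalg.det(A_props), np.prod(lam_props).real))
print(np.isclose(np.trace(A_props), np.sum(lam_props).real))

rho = np.max(np.abs(lam_props))                        # spectral radius
max_row_sum = np.max(np.sum(np.abs(A_props), axis=1))  # largest absolute row sum
max_col_sum = np.max(np.sum(np.abs(A_props), axis=0))  # largest absolute column sum
print(rho <= max_row_sum, rho <= max_col_sum)
```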

<!--
   > which can be shown using the [Jordan canonical form](https://math.stackexchange.com/questions/546155/proof-that-the-trace-of-a-matrix-is-the-sum-of-its-eigenvalues) $A=P J P^{-1}$ (whose diagonal elements $J_{ii}$ are the ***eigenvalues*** of $A$) and the cyclical $\text{trace}(AB)=\text{trace}(BA)$ property of the ***trace*** operator 
   >
   > $$\begin{align*}\text{tr}(A) = {} & \text{tr}(PJP^{-1}) = \text{tr}(JP^{-1}P) = \text{tr}(J) = \sum_{i=1}^n J_{ii} = \sum_{i=1}^n \lambda_i \end{align*}$$
   -->

## Stuff about $A^{-1}$ 

---

If we write $x=A^{-1}b$ as the solution to the system of linear equations $Ax = b$ we are implying that $A$ is an $n\times n$ square matrix which is **full rank** or (synonymously) **invertible or nonsingular**. $A^{-1}$ doesn't exist for **non full rank** or (synonymously) **non-invertible or singular** matrices.

> The use of the synonym **nonsingular** in place of the more straightforward term **invertible** is because if $A^{-1}$ does not exist, then the **inverse function**
>
> $$f(A) = A^{-1}$$
>
> is not defined at $A$ and so then $A$ is a point of [mathematical singularity](https://en.wikipedia.org/wiki/Singularity_(mathematics)) in the **domain** of $f$. 

$A_{n\times n}^{-1} = $ [$\det(A_{n\times n})^{-1}\operatorname {adj}(A_{n\times n})$](https://en.wikipedia.org/wiki/Adjugate_matrix#Definition) and the **determinant** is the product of the **eigenvalues** of $A$, while its absolute value is the product of the **singular values** of $A$. Thus, for **singular values** (and **eigenvalues**) of $A_{n\times n}, \lambda_j, 1 \leq j \leq n$, if

| $\lambda_j = 0$ for some $j$| $\lambda_j \not = 0$ for all $j$ |
|-|-|
| $\det A = 0$ | $\det A \neq 0$ |
| division by $0$ | no division by $0$ | 
|$A$ is **singular** | $A$ is **nonsingular** |
| $A$ is **not invertible** | $A$ is **invertible** |
| $A$ is **not full rank** | $A$ is **full rank** |
| Some columns (rows) are **linearly dependent** | All columns (rows) are **linearly independent** |
| $A^{-1}$ does not exist | $A^{-1}$ exists |

Even if $A^{-1}$ exists, there are three problems:

0. Inverse computation is **(usually)** not a simple algorithm, unlike the **transpose** $A^T$ 
  - which just reverses the indexing scheme $\quad[A^T]_{ij} = A_{ji}$
  - and has simple higher order properties $\quad (AB)^T = B^TA^T$

    > However, notice that for **orthonormal** matrices (which are often simply just referred to as **orthogonal** matrices since the columns can be easily **standardized** into **normal vectors**)
    > $$W_{n \times n}^TW_{n \times n}=W_{n \times n}W_{n \times n}^T = I_{n \times n}.$$    
    >
    > and **inversion** is **transposition**; and, this is partially true for 
    > **semi-orthogonal** (or **semi-orthonormal**) matrices where 
    >
    >   $$S_{n \times m}^TS_{n \times m}=I_{m \times m}$$


### Inverse computation is unnecessarily wasteful

  - since solving for $x$ in $Ax = b$ means solving $$A \left[\begin{array}{c}x_1\\\vdots\\x_n\end{array}\right] = \left[\begin{array}{c}b_1\\\vdots\\b_n\end{array}\right]$$
  - but computing $x=A^{-1}b$ means either knowing or solving for $A^{-1}$
  $$A \left[\!\!\!\!\!\!\begin{array}{c:c:c:c} & 
  \begin{array}{c}A_{11}^{-1}\\\vdots\\A_{n1}^{-1}\end{array}& 
  \begin{array}{c}A_{12}^{-1}\\\vdots\\A_{n2}^{-1}\end{array}&\cdots&
  \begin{array}{c}A_{1n}^{-1}\\\vdots\\A_{nn}^{-1}\end{array}& 
  \end{array}\!\!\!\!\!\!\right] = 
  \left[\!\!\!\!\!\!\begin{array}{c:c:c:c} & 
  \begin{array}{c}1\\0\\\vdots\\\vdots\\0\end{array}& 
  \begin{array}{c}0\\1\\0\\\vdots\\0\end{array}&\cdots&
  \begin{array}{c}0\\\vdots\\\vdots\\0\\1\end{array}& 
  \end{array}\!\!\!\!\!\!\right]$$

     which requires solving the $n$ equations $AA_{\cdot j}^{-1} = e_{j}$ for $j = 1, \cdots, n$ for $A_{\cdot j}^{-1}$ where $A_{\cdot j}^{-1}$ is the $j^{th}$ column of $A^{-1}$ and $e_{j}$ is the **standard basis vector** with all elements equal to $0$ except the $j^{th}$ element which is equal to $1$.
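
The contrast between one solve and $n$ solves can be sketched as follows. The matrix `A_sys` is a made-up example (a random matrix shifted by a multiple of the identity so that it is comfortably invertible for this seed), and the column-by-column construction of the inverse is only meant to mirror the description above, not to suggest how `np.linalg.inv` works internally.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A_sys = rng.normal(size=(n, n)) + n * np.eye(n)   # hypothetical invertible matrix
b_sys = rng.normal(size=n)

# one linear solve gives x directly
x_direct = np.linalg.solve(A_sys, b_sys)

# forming the inverse amounts to n solves, one per standard basis vector e_j
A_inv_cols = np.column_stack([np.linalg.solve(A_sys, e_j) for e_j in np.eye(n)])
x_via_inverse = A_inv_cols @ b_sys

print(np.allclose(A_inv_cols, np.linalg.inv(A_sys)))
print(np.allclose(x_direct, x_via_inverse))
```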



### Inversion computation is actually very often prone to numerical inaccuracy 

<!-- *as is seen in this example taken from Keith Knight's STA410 [notes7.pdf](https://q.utoronto.ca/courses/296804/files?preview=24300633) document*
--> 

   $$A = \left[\begin{array}{cc}1&1-\epsilon\\1+\epsilon&1\end{array}\right] \quad \text{with analytical inverse} \quad 
A^{-1} = \left[\begin{array}{cc}\epsilon^{-2}&\epsilon^{-1}-\epsilon^{-2}\\-\epsilon^{-1}-\epsilon^{-2}&\epsilon^{-2}\end{array}\right]$$
  - and $\det(A) = |A| = A_{11}A_{22} - A_{12}A_{21} = \epsilon^{2} \not = 0 $ so $1/\det(A) = \det(A^{-1}) \not = 0$ so $A$ is mathematically **invertible**
   

  - but if the magnitude of $\epsilon^{-2}$ outranges that of $\epsilon^{-1}$, the $\epsilon^{-1}$ terms are lost due to **roundoff error** and so $A^{-1}$ can never be accurately represented since in that case 

  $$A^{-1} \approx [A^{-1}]_c = \left[ \left[\begin{array}{cc}\epsilon^{-2}&\epsilon^{-1}-\epsilon^{-2}\\-\epsilon^{-1}-\epsilon^{-2}&\epsilon^{-2}\end{array}\right] \right]_c = \left[\begin{array}{cc}\epsilon^{-2}&-\epsilon^{-2}\\-\epsilon^{-2}&\epsilon^{-2}\end{array}\right]$$

  so $\det([A^{-1}]_c)=0$ so $[A^{-1}]_c$ is no longer ***invertible***.

In [1]:
import numpy as np

In [2]:
# https://stackoverflow.com/questions/2891790/how-to-pretty-print-a-numpy-array-without-scientific-notation-and-with-given-pre
np.set_printoptions(precision=16)

# For the matrix
epsilon =  .5/100 #  2**-30 # which is not that extreme, e.g., 2**1023 # 
A = np.array([[1, 1-epsilon], 
              [1+epsilon, 1]])
# (some other potentially helpful matrix functionality:
#  e.g, np.ones, np.diag_indices, np.fill_diagonal, etc.)
print("A")
print(A)

print("Condition(A)")
print(np.linalg.cond(A))

# The analytical inverse is
A_inv = np.array([[epsilon**-2, 1/epsilon-epsilon**-2],
                  [-1/epsilon-epsilon**-2, epsilon**-2]])
print("\n\nA**-1")
print(A_inv)

# Which can be confirmed
print("\n\nA @ A_inv")
print(A @ A_inv) # matrix multiplication
print("\n\nI")
print(np.eye(2)) # identity matrix
print("\n\n(A @ A_inv) == I")
print(A @ A_inv == np.eye(2)) # Confirmation

# However, this breaks because of
# (0) general roundoff error; but, even for numbers that are exactly representable,
# (1) the magnitude of epsilon**-2 will outrange epsilon**-1 if epsilon is small...

A
[[1.    0.995]
 [1.005 1.   ]]
Condition(A)
160001.99999113867


A**-1
[[ 40000. -39800.]
 [-40200.  40000.]]


A @ A_inv
[[ 1.0000000000001785e+00 -1.7763568394002505e-13]
 [-7.2759576141834259e-12  1.0000000000072760e+00]]


I
[[1. 0.]
 [0. 1.]]


(A @ A_inv) == I
[[False False]
 [False False]]


### Sherman-Morrison-Woodbury Formula

---

<!--
- *The presentation in this section is taken from Keith Knight's STA410 [notes7.pdf](https://q.utoronto.ca/courses/296804/files?preview=24300633) document*. 
-->

Also known as the ***Woodbury Matrix Identity***,

$$(A + UCV)^{-1} = A^{-1} - A^{-1}U (C^{-1} + VA^{-1}U)^{-1}VA^{-1}$$

makes inversion simple if $A$ and $C$ are diagonal.

Thus, ***low rank*** $m<n$ matrix approximations, with $A=I_{n \times n}$ and $C=I_{m \times m}$ 

$$
\begin{align*}
\Sigma_{n \times n}^{-1} \approx {} & (I_{n \times n} + \mathbf{u}_{n\times m}(\mathbf{v}^T)_{m\times n})^{-1}\\
= {} & I_{n \times n} - \mathbf{u}(I_{m \times m}+\mathbf{v}^T\mathbf{u})^{-1}\mathbf{v}^T\\
= {} & I_{n \times n} - \frac{\mathbf{u}\mathbf{v}^T}{1+\mathbf{v}^T\mathbf{u}} \quad \text{ if } m=1
\end{align*}$$

can be used to trivialize matrix inversion approximation calculations.
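
A quick numerical check of the rank-one ($m=1$) case is sketched below. The vectors `u` and `v` are random and scaled small purely for illustration (so that $1+\mathbf{v}^T\mathbf{u}$ stays safely away from zero); nothing here is specific to the course material.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
u = rng.normal(size=(n, 1)) * 0.1     # hypothetical rank-one factors
v = rng.normal(size=(n, 1)) * 0.1

# Sherman-Morrison: (I + u v^T)^{-1} = I - u v^T / (1 + v^T u)
denom = 1.0 + (v.T @ u).item()
inv_rank_one = np.eye(n) - (u @ v.T) / denom

print(np.allclose(inv_rank_one, np.linalg.inv(np.eye(n) + u @ v.T)))
```

The appeal is that the right-hand side needs only a scalar division and an outer product, with no general $n \times n$ inversion.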

In fact, performing computations on the basis of this identity can even avoid numeric problems.  Returning to the example of the previous section 

$$A = \left[\begin{array}{cc}1 & 1 - \epsilon\\ 1
+ \epsilon & 1 \end{array}\right] = \left[\begin{array}{cc}0 & - \epsilon\\  \epsilon & 0 \end{array}\right] + \left[\begin{array}{c}1 \\1 \end{array}\right]
\left[\begin{array}{c}1 \\1 \end{array}\right]^T$$

and $x=A^{-1}b$ is not a computation that will work to solve $Ax = b$ if $A^{-1}$ cannot be accurately computed; however, by instead computing

$$
x = A^{-1}b = \left(\left[\begin{array}{cc}0 & \frac{1}{\epsilon}\\  -\frac{1}{\epsilon} & 0 \end{array}\right] - 
\frac{
\left[\begin{array}{cc}0 & \frac{1}{\epsilon}\\  -\frac{1}{\epsilon} & 0 \end{array}\right] \left[\begin{array}{c}1 \\1 \end{array}\right] \left[\begin{array}{c}1 \\1 \end{array}\right]^T \left[\begin{array}{cc}0 & \frac{1}{\epsilon}\\  -\frac{1}{\epsilon} & 0 \end{array}\right]  
}{1 +  \left[\begin{array}{c}1 \\1 \end{array}\right]^T   
\left[\begin{array}{cc}0 & \frac{1}{\epsilon}\\  -\frac{1}{\epsilon} & 0 \end{array}\right]
\left[\begin{array}{c}1 \\1 \end{array}\right]}\right) b$$

$Ax = b$ can be accurately solved.

In [3]:
ep = 1e-2  # also try 1e-5, 1e-7, 1e-8, 1e-9, 1e-12
A = np.array([[1,1-ep],[1+ep,1]])
A_inv = np.array([[ep**-2, 1/ep - ep**-2],[-1/ep + -ep**-2,  ep**-2]])
b = np.array([[1],[1]])
print("epsilon", ep)
print("Condition number", np.linalg.cond(A))
print("\nA^-1 @ b = ?")
print("@ means matrix multiply")

print("\nTrue Answer")
print(np.array([[1/ep],[-1/ep]])) 
print("\nAnalytical Inverse")
print(A_inv@b)
B = np.array([[0,-ep],[ep,0]])
u = np.ones((2,1))
v = u.T
B_inv = np.linalg.inv(B)
print("\nWoodbury's Identity")
print((B_inv - (B_inv @ u @ v @ B_inv) / (1 + v @ B_inv @ u) ) @ b)
print("\nCalculated Inverse")
print(np.linalg.inv(A) @ b)
print("\nLinear Equation Solver")
print(np.linalg.solve(A, b))
print("\nCalculated Generalized Inverse")
print(np.linalg.pinv(A) @ b)

epsilon 0.01
Condition number 40001.99997498112

A^-1 @ b = ?
@ means matrix multiply

True Answer
[[ 100.]
 [-100.]]

Analytical Inverse
[[ 100.]
 [-100.]]

Woodbury's Identity
[[ 100.]
 [-100.]]

Calculated Inverse
[[ 100.]
 [-100.]]

Linear Equation Solver
[[ 100.]
 [-100.]]

Calculated Generalized Inverse
[[ 99.99999999994907]
 [-99.99999999994725]]


### Backward Substitution

---

Consider the easier problem of solving for $x$ in $A_{n \times n}x = b$ when $A_{n \times n}$ is given in ***upper triangular form***, where everything below the diagonal is zero and everything on the diagonal is non-zero.

$$\left[\begin{array}{cccccc} 
a_{11}&a_{12}&a_{13}& \cdots & a_{1(n-1)} & a_{1n}\\
 &a_{22} &a_{23} & \cdots &a_{2(n-1)} & a_{2n} \\ 
 &&a_{33} & \cdots &a_{3(n-1)} & a_{3n} \\ 
 &&& \ddots & \vdots & \vdots \\
& &&& a_{(n-1)(n-1)}& a_{(n-1)n}\\
0 & &&& & a_{nn}\\
\end{array}\right] 
\left[\begin{array}{c} 
x_1\\x_2\\x_3\\\vdots\\x_{n-1}\\x_{n}\\
\end{array}\right] = 
\left[\begin{array}{c} 
b_1\\b_2\\b_3\\\vdots\\b_{n-1}\\b_{n}\\
\end{array}\right]$$

In this form $x$ can be solved for using ***backward substitution*** as

$$x_n = \frac{b_n}{a_{nn}} \quad x_{n-1} = \frac{b_{n-1} - a_{(n-1)n}x_n}{a_{(n-1)(n-1)}} \quad \cdots \quad x_{n-j} = \frac{b_{n-j} - \sum_{i=n-j+1}^{n}a_{(n-j)i}x_i}{a_{(n-j)(n-j)}}$$

so long as (the so-called ***pivot points***) $a_{jj} \neq 0$ so there is no division by zero. 

For $x_j$, the final formula shows that there is $1$ division and $n-j$ multiplications and $n-j$ subtractions, so the total number of arithmetic computations to solve for all $x_j$ is 

$$\sum_{j=1}^{n} \big(1 + 2(n-j)\big) = \sum_{j=0}^{n-1} (1 + 2j) = n + 2 \sum_{j=0}^{n-1} j = n + 2\frac{n(n-1)}{2} = n^2$$
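
The recursion above can be written as a short loop. This is a minimal sketch: the function name `backward_substitution` and the example system are made up for this note, and the function simply refuses to proceed if it meets a zero pivot.

```python
import numpy as np

def backward_substitution(U, b):
    """Solve U x = b for upper triangular U with nonzero diagonal."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):               # last row first
        if U[i, i] == 0:
            raise ValueError("zero pivot: cannot divide")
        # subtract the already-solved terms, then divide by the pivot
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U_example = np.array([[2.0, 1.0, -1.0],
                      [0.0, 3.0,  2.0],
                      [0.0, 0.0,  4.0]])
b_example = np.array([1.0, 8.0, 8.0])
x_back = backward_substitution(U_example, b_example)
print(x_back)
print(np.allclose(U_example @ x_back, b_example))
```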

> The presentation above is given for ***square invertible*** $A$; but, the ***backward substitution*** algorithm can also find solutions to ***non-square*** systems of equations based on $A_{n\times m}$ where the ***upper triangular form*** is exchanged with [row echelon form](https://en.wikipedia.org/wiki/Row_echelon_form) (where every row must have more leading zeros than the row above it). A system $A_{n\times m}x = b$ itself may be 
> 
> - ***overdetermined*** ($n>m$) with more equations (rows) than the unknown variables (and the "triangle" completes before the final row of $A$)
> - ***underdetermined*** $(n<m)$ so there are more free unknown variables than the number of equations (and the triangle doesn't complete before the final row of $A$)
> 
> and the system $A_{n\times m}x = b$ may be 
> 
> - ***consistent*** *with a single solution*, e.g.,    
>
>   $$\left[\begin{array}{cc}
1 & 1\\
0 & 1
\end{array}\right]
\left[\begin{array}{c}
x_1\\
x_2
\end{array}\right] = 
\left[\begin{array}{c}
1\\
1
\end{array}\right] 
$$
> 
>   in which case $\text{rank}(A) = \text{rank}(A|b)$, and the columns (and rows) of $A$ are ***linearly independent*** so no columns (and rows) will be linear combinations of each other
> 
> - ***consistent*** *with infinitely many solutions*, e.g., 
>
>   $$\left[\begin{array}{cc}
1 & 1\\
0 & 0
\end{array}\right]
\left[\begin{array}{c}
x_1\\
x_2
\end{array}\right] = 
\left[\begin{array}{c}
1\\
0
\end{array}\right] 
$$
> 
>   in which case $\text{rank}(A) = \text{rank}(A|b)$, but some columns (and rows) of $A$ are ***linearly dependent*** so some columns (and rows) will be linear combinations of each other
>   
>   > i.e., for some sets of indices $\mathcal{J} = \{j_k: k=1,...,K\}$ and $\mathcal{I} = \{i_k: k=1,...,K\}$
>   >
>   > $$\sum_{j \in \mathcal{J}} c_j A_{*j} = 0  \;\; \not \! \Longrightarrow  \;\; c_j = 0 \quad \text{ and } \quad \sum_{i \in \mathcal{I}} c_i A_{i*} = 0  \;\; \not \! \Longrightarrow  \;\; c_i = 0$$
> 
> - ***inconsistent*** *with no solutions at all*, e.g., 
> 
>   $$\left[\begin{array}{cc}
1 & 1\\
0 & 0
\end{array}\right]
\left[\begin{array}{c}
x_1\\
x_2
\end{array}\right] = 
\left[\begin{array}{c}
0\\
1
\end{array}\right] 
$$
> 
>   in which case $\text{rank}(A) < \text{rank}(A|b)$, and the column $b$ cannot be constructed as a linear combination of the columns of $A$.
> 
> where $A|b$ is the $n \times (m+1)$ [*augmented matrix*](https://en.wikipedia.org/wiki/Augmented_matrix)
> 
> $$\left[\begin{array}{ccc:c} 
a_{11} & \cdots & a_{1m} & b_1 \\
\vdots & \ddots & \vdots  & \vdots \\
a_{n1} & \cdots & a_{nm} & b_n \end{array}\right]$$
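
The three cases can be told apart numerically by comparing ranks. The helper `classify_system` below is a hypothetical name introduced for this note; it relies on `np.linalg.matrix_rank`, and uses the fact that a consistent system has a single solution exactly when the rank equals the number of unknowns.

```python
import numpy as np

def classify_system(A, b):
    """Classify A x = b by comparing rank(A) with rank(A|b)."""
    A = np.asarray(A, dtype=float)
    augmented = np.column_stack([A, b])       # the augmented matrix A|b
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(augmented)
    if rank_A < rank_Ab:
        return "inconsistent"
    if rank_A == A.shape[1]:                  # every unknown is pinned down
        return "consistent, single solution"
    return "consistent, infinitely many solutions"

# the three small examples from the text
print(classify_system([[1, 1], [0, 1]], [1, 1]))
print(classify_system([[1, 1], [0, 0]], [1, 0]))
print(classify_system([[1, 1], [0, 0]], [0, 1]))
```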





### Gaussian Elimination

---

Converting a system of linear equations into ***upper triangular form*** (or the more general ***row echelon form***) is itself a quite simple process known as ***Gaussian elimination***.

- Multiplying a row of the augmented matrix $A|b$ by a constant and adding to another row of the augmented matrix produces an equivalent system of linear equations to the one originally defined by the augmented matrix, i.e.,

  $$x \quad \text{ solving } \quad E^{ci+j}Ax = E^{ci+j}b \quad \text{ also solves } \quad Ax = b$$

- Multiplying and adding rows in this manner can produce leading zeros, e.g.,

  $$\left[\begin{array}{ccc:c} 
a_{11} & \cdots & a_{1m} & b_1 \\
\vdots & \ddots & \vdots  & \vdots \\
a_{n1} & \cdots & a_{nm} & b_n \end{array}\right]
\quad \overset{A|b \; \rightarrow \; E^{c1+n}[A|b]}{\longrightarrow} \quad
\left[\begin{array}{ccc:c} 
a_{11} & \cdots & a_{1m} & b_1 \\
\vdots & \ddots & \vdots  & \vdots \\
a_{n1} + ca_{11} & \cdots & a_{nm} +c a_{1m} & b_n + cb_1\end{array}\right]$$

  and if $c = -\frac{a_{n1}}{a_{11}}$ then $a_{n1} + ca_{11} = 0$ and the bottom left element of the resulting matrix vanishes (i.e., becomes $0$).

When the column elements below a ***pivot point*** have been turned into zeros, the column and row of the ***pivot point*** are completed, and the ***Gaussian elimination*** process recursively restarts on the next ***pivot point*** in the top left corner of the submatrix without the completed row and column. 

$$\left[\begin{array}{c|ccc:c} 
a_{11} & a_{12} &  \cdots & a_{1m} & b_1 \\\hline
0 & a_{22} + c_2 a_{12} & \cdots & a_{2m} + c_2 a_{1m} & b_2 + c_2 b_1 \\
\vdots & \vdots & \ddots & \vdots  & \vdots \\
0 & a_{n2} + c_n a_{12} & \cdots & a_{nm} + c_n a_{1m} & b_n + c_n b_1\end{array}\right]$$

For a ***square*** matrix $A$ where $m=n$, the above formulation [shows](http://www.it.uom.gr/teaching/linearalgebra/chapt6.pdf) that the number of divisions and multiplication-additions that are required to create an ***upper triangular form*** matrix (augmented with the transformed $b$ column) are

$$\sum_{j=1}^n \Big[(j+1)(j-1) + \underset{\text{due to } b}{(j-1)}\Big] = \sum_{j=1}^n \big[j^2 - 1 + (j-1)\big] = \frac{n(n+1)(2n+1)}{6} - n + \frac{n(n+1)}{2} - n $$
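
The elimination sweep described above can be sketched as a pair of nested loops. The function name `forward_elimination` and the example system are assumptions for illustration; this version deliberately does no row interchanges, so it stops if it meets a zero pivot.

```python
import numpy as np

def forward_elimination(A, b):
    """Reduce [A|b] to upper triangular form by row operations (no pivoting)."""
    Ab = np.column_stack([np.asarray(A, dtype=float), np.asarray(b, dtype=float)])
    n = Ab.shape[0]
    for k in range(n - 1):                    # pivot row/column k
        if Ab[k, k] == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        for i in range(k + 1, n):
            c = -Ab[i, k] / Ab[k, k]          # multiplier that zeroes entry (i, k)
            Ab[i, :] += c * Ab[k, :]
            Ab[i, k] = 0.0                    # set the eliminated entry exactly to zero
    return Ab[:, :-1], Ab[:, -1]

A_ge = np.array([[2.0, 1.0, 1.0],
                 [4.0, 3.0, 3.0],
                 [8.0, 7.0, 9.0]])
b_ge = np.array([1.0, 2.0, 3.0])
U_ge, b_prime = forward_elimination(A_ge, b_ge)
print(U_ge)
# the transformed system has the same solution as the original one
print(np.allclose(np.linalg.solve(U_ge, b_prime), np.linalg.solve(A_ge, b_ge)))
```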

> #### Pivoting
> An important computational caveat is that when the scalar multiplier $c$ is large, the numerical precision of the floating-point operation will be insufficient if 
> $$[b_{i'} + cb_i]_c = [cb_i]_c$$ 
> e.g., for three digits of precision, one step of ***Gaussian elimination*** on 
>
> \begin{align*}
0.0001 x_1 + x_2 & {} = 1\\
x_1 + x_2 & {} =  2\\
\quad \quad \quad \quad \quad \; \text{produces } \quad \quad \quad & {}    \\
 \quad 0.0001 x_1 + x_2 & {} = 1\\
 -10000x_2 & {} = -10000 \quad \text{ (the "$+x_2$" and the 2 are lost due to precision!)}\\
\end{align*}
>
> To fix this issue the rows may be reordered with a ***partial pivot*** so $c$ will be as small as possible, and the ***Gaussian elimination*** step on 
>
> \begin{align*}
x_1 + x_2 & {} =  2\\
0.0001 x_1 + x_2 & {} = 1\\
\text{instead produces } \quad \quad \quad & {}    \\
x_1 + x_2 & {} =  2\\
x_2 & {} = 1 \quad \text{ (has roundoff error but solution's more accurate)}\\
\end{align*}
>
> which gives $x_1=x_2=1$ which is a more accurate solution than $x_1=0$ $x_2=1$.
> 
> If this was not sufficient, a ***full pivot*** which reorders both the rows and the columns as well could be used to produce an even smaller $c$.
>
> *This example is inspired by the **Pivoting** subsection of the 
Section 5.2 **Gaussian Elimination and Elementary Operator Matrices** in Chapter 5 **Numerical Linear Algebra** on page 212 of James E. Gentle's **Computational Statistics** textbook.*
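
The three-digit example can be imitated in ordinary floating point by rounding after every arithmetic operation. This is only a crude simulation: `round_sig` and `solve_2x2_low_precision` are hypothetical helpers written for this note, and rounding via string formatting is just one convenient way to mimic limited precision.

```python
def round_sig(value, digits=3):
    """Round to a fixed number of significant digits (a crude stand-in for low precision)."""
    return float(f"{value:.{digits - 1}e}")

def solve_2x2_low_precision(rows, digits=3):
    """One elimination step plus back substitution, rounding after every operation."""
    (a11, a12, b1), (a21, a22, b2) = rows
    c = round_sig(-a21 / a11, digits)                 # multiplier for the elimination step
    a22_new = round_sig(a22 + round_sig(c * a12, digits), digits)
    b2_new = round_sig(b2 + round_sig(c * b1, digits), digits)
    x2 = round_sig(b2_new / a22_new, digits)
    x1 = round_sig(round_sig(b1 - round_sig(a12 * x2, digits), digits) / a11, digits)
    return x1, x2

# original row order: tiny pivot 0.0001, so the multiplier c is huge
no_pivot = solve_2x2_low_precision([(0.0001, 1.0, 1.0), (1.0, 1.0, 2.0)])
# rows swapped (partial pivot): pivot 1, so the multiplier c is tiny
with_pivot = solve_2x2_low_precision([(1.0, 1.0, 2.0), (0.0001, 1.0, 1.0)])
print(no_pivot)
print(with_pivot)
```

Each row is given as `(a_i1, a_i2, b_i)`, so swapping the two tuples is exactly the row interchange described above.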

#### Elementary Operations

---

Mathematically, multiplying row $i$ by scalar $c$ and adding it to row $j$, and ***partial pivoting*** two rows $i$ and $j$ are so-called ***elementary operations*** and are represented by simple matrix multiplications $E^{ci+j}[A|b]$ and $E^{i\leftrightarrow j}[A|b]$, respectively, where

- $E^{ci+j} = I + c e_je_i^T$, so $E^{ci+j}_{kk}=1$ and $E^{ci+j}_{ji}=c$ and all other entries of $E^{ci+j}$ are $0$.

  - $\left(E^{ci+j}\right)^{-1} = E^{-ci+j}$ since $E^{ci+j} E^{(-c)i+j} =  E^{(-c)i+j}E^{ci+j} = I$.

- $E^{i\leftrightarrow j} = I^{i\leftrightarrow j}$ where rows $i$ and $j$ of the identity matrix $I$ have been switched, so all $E^{i\leftrightarrow j}_{kk}=1$ except $E^{i\leftrightarrow j}_{ii} = E^{i\leftrightarrow j}_{jj} = 0$ and all other elements are $0$ except $E^{i\leftrightarrow j}_{ij}=E^{i\leftrightarrow j}_{ji}=1$.

  - $(E^{i\leftrightarrow j})^{-1} = E^{i\leftrightarrow j}$ since $E^{i\leftrightarrow j}E^{i\leftrightarrow j}=I$ so it's (of course) easy to "undo" row interchanges.


$$
E^{ci+j} = \left[\begin{array}{ccccccc}
1 & &&&&&0\\
&1&&&&&\\
 && \ddots&&\\
& && 1 &&\\
 &&c&& \ddots\\
&&\uparrow&&&1&\\
0&&E^{ci+j}_{ji}&&&&1\\
\end{array}\right] 
\quad\quad
E^{i\leftrightarrow j} = \left[\begin{array}{ccccccc}
1 &0&&&&0&0\\
0& \ddots &&&&&0\\
 &\cdots & 0 &\cdots& 1&\cdots \\
 &&& \ddots \\
 &\cdots & 1 &\cdots& 0 &\cdots \\
 0 &&&&& \ddots &0\\    
0  &0&&&&0& 1 \\  
\end{array}\right]
\begin{array}{c}\leftarrow \text{ row }i\\\\\leftarrow \text{ row }j\\\end{array}
$$
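
Both kinds of elementary operation matrix are simple to build and test. The helpers `E_add` and `E_swap` below are names invented for this note, and they use 0-based row indices as is usual in Python (so "row 0" is the first row).

```python
import numpy as np

def E_add(n, c, i, j):
    """I + c e_j e_i^T: adds c times row i to row j (0-based indices)."""
    E = np.eye(n)
    E[j, i] = c
    return E

def E_swap(n, i, j):
    """Identity matrix with rows i and j interchanged (0-based indices)."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

M = np.arange(1.0, 10.0).reshape(3, 3)      # hypothetical matrix to operate on
print(E_add(3, -4.0, 0, 1) @ M)             # row 1 becomes row 1 - 4 * row 0
print(E_swap(3, 0, 2) @ M)                  # rows 0 and 2 trade places

# both kinds of elementary operation are easy to undo
print(np.allclose(E_add(3, -4.0, 0, 1) @ E_add(3, 4.0, 0, 1), np.eye(3)))
print(np.allclose(E_swap(3, 0, 2) @ E_swap(3, 0, 2), np.eye(3)))
```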

#### The LU Decomposition

---

Ignoring row interchanges $E^{i\leftrightarrow j}$ which are easily applied and undone, the ***Gaussian elimination*** transformation sequence 

$$\prod E^{ci+j} \quad \text{and} \quad \left(\prod E^{ci+j}\right)^{-1} = \prod E^{-ci+j} = L$$ 

will be ***lower triangular matrices*** and 

  $$U = \left(\prod E^{ci+j}\right) A$$ 

[may](https://math.stackexchange.com/questions/218770/when-does-a-square-matrix-have-an-lu-decomposition/2274657) (if ***Gaussian elimination*** is working) be ***upper triangular***, and when so
\begin{align*}
Ax = {} & b\\
\left(\prod E^{ci+j}\right) Ax = {} & \left(\prod E^{ci+j}\right) b\\
Ux = {} & L^{-1}b
\end{align*}

and $x$ may be solved for by simple ***backward substitution***.

***LU decomposition*** is thus seen to be a byproduct of solving for $x$ using ***Gaussian elimination*** where the operations are done using the extended augmented matrix [$A|I|b$](https://en.wikipedia.org/wiki/Gaussian_elimination#Finding_the_inverse_of_a_matrix) instead of only $A|b$  

$$\left[\begin{array}{ccc:ccc:c} 
a_{11} & \cdots & a_{1m} & 1 & \cdots & 0 & b_1 \\
\vdots & \ddots & \vdots  & \vdots & \ddots & \vdots & \vdots  \\
a_{n1} & \cdots & a_{nm} & 0 & \cdots & 1 & b_n \end{array}\right]
\quad \overset{[A|I|b] \;\rightarrow \;L^{-1}[A|I|b]}{\longrightarrow} \quad 
\left[ \!\begin{array}{c:c:c}  U & L^{-1} & b' \!\end{array}  \right]$$

Computationally all that needs to be kept track of is the sequence of the ***elementary operations*** $\left(\prod E^{ci+j}\right)$ actualized during the ***Gaussian elimination*** process, since 

$$U = \left(\prod E^{ci+j}\right) A \quad \text{and} \quad
\left(\prod E^{ci+j}\right)I = L^{-1} \quad \text{and} \quad
\left(\prod E^{ci+j}\right)b = b'$$

> The ***LU decomposition*** will be [unique](https://math.stackexchange.com/questions/1799854/is-the-l-in-lu-factorization-unique) if
>
> $$x^TAx \geq 0 \quad\quad \text{ subject to the constraint } \quad \quad L_{kk} = 1 \; \text{ or } \; U_{kk} = 1 \; \text{ for all $k$}$$
>
> i.e., if $A$ is a ***square nonnegative definite matrix***, and the diagonal elements of either $L$ or $U$ are all one.
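
To see $L$ and $U$ fall out of the elimination bookkeeping, here is a minimal sketch with a unit diagonal on $L$ and no row interchanges. The function name `lu_no_pivot` and the example matrix are assumptions for illustration, and `np.linalg.solve` is used for the two triangular solves only for brevity (a dedicated forward or backward substitution routine would exploit the triangular structure).

```python
import numpy as np

def lu_no_pivot(A):
    """LU with unit diagonal L via Gaussian elimination without row interchanges."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        if U[k, k] == 0:
            raise ValueError("zero pivot: this sketch does not handle row interchanges")
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # store the negated multiplier -c in L
            U[i, :] -= L[i, k] * U[k, :]
    return L, U

A_lu = np.array([[4.0, 3.0, 2.0],
                 [2.0, 3.0, 1.0],
                 [2.0, 1.0, 3.0]])
L_lu, U_lu = lu_no_pivot(A_lu)
print(np.allclose(L_lu @ U_lu, A_lu))

# solving A x = b then takes one forward and one backward triangular solve
b_lu = np.array([1.0, 2.0, 3.0])
y_lu = np.linalg.solve(L_lu, b_lu)           # L y = b  (y plays the role of L^{-1} b)
x_lu = np.linalg.solve(U_lu, y_lu)           # U x = y
print(np.allclose(A_lu @ x_lu, b_lu))
```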

#### Generalized Inverses

---

Returning to our system of linear equations $Ax = b$, it is very straightforward to analytically calculate $x = A^{-1} b$ for ***full rank square*** matrices $A$ if we have the ***eigendecomposition*** or ***SVD*** of $A$ since 

- if $A$ is ***symmetric*** and ***full rank*** then $A = V \Lambda V^T$ and $A^{-1} = V \Lambda^{-1} V^T$ with $\Lambda^{-1}_{ii} = \frac{1}{\Lambda_{ii}}$ since

\begin{align*}
A^{-1}A = {} & V \Lambda^{-1} V^T V \Lambda V^T\\
= {} & V \Lambda^{-1} \;\;\, I \;\;\, \Lambda V^T\\
= {} & V \quad \;\;\, I \;\;\, \quad V^T = I\\
\end{align*}

- if $A$ is ***square*** and ***full rank*** then $A = U D V^T$ and $A^{-1} = V D^{-1} U^T$ with $D^{-1}_{ii} = \frac{1}{D_{ii}}$ since

\begin{align*}
A^{-1}A = {} & V D^{-1} U^T U D V^T\\
= {} & V D^{-1} \;\;\, I \;\;\, D V^T\\
= {} & V \quad \;\;\; I \;\;\;\quad V^T = I\\
\end{align*}

Further, $Ax = b$ can be ***consistent*** (i.e., have at least one solution) even if $A$ is not square or full rank.

- $Ax = b$ is ***consistent*** if 

  $$\text{rank}(A|b) = \text{rank}(A)$$

  where $A|b$ is the matrix made by appending the column $b$ as the rightmost column of $A$.

If $Ax = b$ is ***consistent***, then a solution $x = A^{-}b$ based on $A^{-}$ only requires that 

$$A = A A^{-}A $$

since

\begin{align*}
 Ax = {} & A A^{-}Ax\\
  = {} & A A^{-}b\\
 \Longrightarrow x = {} & A^- b \quad \text{ is a possible solution}\\
\end{align*}

Matrices which can play the role of $A^{-}$ above, from strongest to weakest, are

0. $A^{-1}$:  ***inverses*** which satisfy $A^{-1}A = AA^{-1} = I$
1. $A^{+}$: (unique) ***Moore-Penrose inverses*** for which $A^{+}A$ and $AA^{+}$ are ***symmetric*** and which are also ***g1*** and ***g2 inverses***
  > also called ***p-inverses***, ***normalized generalized inverses***, or ***pseudoinverses***
2. $A^{*}$: ***g2 inverses*** for which $A^{*}AA^{*} = A^{*}$ and which are also ***g1 inverses***
  > also called ***reflexive generalized inverses*** and ***outer pseudoinverses***
3. $A^{-}$: ***g1 inverses*** for which $A A^{-}A = A$
  > also called ***conditional inverses*** or ***inner pseudoinverses***

Now, if $Ax = b$ is ***consistent***, then 

$$x = A^{+}b \quad \text{is a solution to} \quad Ax = b$$

where $A^{+} = V D^{+} U^T$ is built from the SVD $A = U D V^T$ by setting $D^{+}_{ii}=\frac{1}{D_{ii}}$ for each nonzero $D_{ii}$ (and $D^{+}_{ii}=0$ wherever $D_{ii}=0$), and can be seen to satisfy $AA^{+}A = A$ and be the (unique) ***Moore-Penrose inverse***.
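
A minimal sketch of this construction is given below. The function name `pinv_via_svd` and the cutoff `tol` are assumptions of the example, not part of the notes: in floating point some threshold is needed to decide which singular values count as zero, and only the nonzero ones are inverted.

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Moore-Penrose inverse A+ = V D+ U^T from the thin SVD A = U D V^T.

    Singular values at or below tol * (largest singular value) are treated
    as zero and left at zero in D+ (tol is an assumption of this sketch).
    """
    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    d_plus = np.zeros_like(d)
    nonzero = d > tol * d.max()
    d_plus[nonzero] = 1 / d[nonzero]
    return Vt.T @ np.diag(d_plus) @ U.T

A = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])          # rank 1, so A has no ordinary inverse
b = np.array([1., 2., 3.])        # consistent: b is the first column of A
A_plus = pinv_via_svd(A)
x = A_plus @ b

print(np.allclose(A @ x, b))                      # x solves Ax = b
print(np.allclose(A @ A_plus @ A, A))             # g1 condition
print(np.allclose(A_plus @ A @ A_plus, A_plus))   # g2 condition
print(np.allclose(A @ A_plus, (A @ A_plus).T))    # A A+ is symmetric
print(np.allclose(A_plus @ A, (A_plus @ A).T))    # A+ A is symmetric
```

In practice `np.linalg.pinv` performs the same SVD-based computation with its own default cutoff.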

## Vector and Matrix Norms
---

For $f(x) = Ax$ the statement $f(x) \approx f(x+\epsilon_x)$ means that $y = f(x) - f(x+\epsilon_x) \approx 0$ which is judged on the basis of the ***norm*** (or magnitude, or size) of the ***vector*** $y$, notated as $||y||$. The most common ***vector norms*** are the ubiquitous $L_p$ ***norms***

$$||x||_p = \left(\sum_i |x_i|^{p}\right)^{\frac{1}{p}}$$

$$||x||_2 = \underset{L_2 \text{: Euclidean}}{\sqrt{\sum_i x_i^2}} \quad   \quad ||x||_1 = \underset{L_1 \text{: Manhattan}}{\sum_i |x_i|} \quad  \quad ||x||_\infty = \underset{L_\infty \text{: Chebyshev}}{\max_i |x_i|}$$


We can also define the ***norm*** (or magnitude, or size) of a matrix $A$.  One common method to do so is to induce a ***matrix norm*** from the $L_p$ ***vector norms*** as 

$$||A||_p = \underset{x\not=0}{\max} \frac{||Ax||_p}{||x||_p}$$

Another common ***matrix norm*** is the ***Frobenius norm***
$$||A||_F = \left(\sum_i\sum_j A_{ij}^2\right)^{\frac{1}{2}} =  \underbrace{\sqrt{\text{tr}(AA^T)} = \sqrt{\text{tr}(A^TA)} }_{\text{trace doesn't care about transposes}} = \underset{\text{singular values of } A}{\overset{\text{The $L_2$ norm of the}}{\sqrt{ \sum_i \lambda_i^2}}}$$

which is just a direct extension of $L_2$ ***Euclidean distance***. 
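
These norms are all available through `np.linalg.norm`. The sketch below uses arbitrary example values; the random search at the end is only a sanity check that no sampled ratio $||Ax||_2/||x||_2$ exceeds the induced norm, not a way of computing it.

```python
import numpy as np

x = np.array([3., -4., 0.])
l1 = np.linalg.norm(x, 1)          # Manhattan: |3| + |-4| + |0| = 7
l2 = np.linalg.norm(x, 2)          # Euclidean: sqrt(9 + 16) = 5
linf = np.linalg.norm(x, np.inf)   # Chebyshev: max(|3|, |-4|, |0|) = 4

A = np.array([[1., 2.],
              [3., 4.]])
A2 = np.linalg.norm(A, 2)          # L2 induced norm (largest singular value)
fro = np.linalg.norm(A, 'fro')     # Frobenius norm: sqrt(1 + 4 + 9 + 16)
sing = np.linalg.svd(A, compute_uv=False)

# The Frobenius norm is the L2 norm of the vector of singular values
print(np.isclose(fro, np.sqrt(np.sum(sing**2))))

# Sampled ratios ||Ax||_2 / ||x||_2 never exceed the induced norm ||A||_2
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 1000))     # 1000 random nonzero vectors as columns
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
print(ratios.max() <= A2)
```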

<!-- ***Norms*** will be considered again in the context of function (vector) spaces. Some additional considerations regarding ***norms*** are available in Keith Knight's STA410 [notes8.pdf](https://q.utoronto.ca/courses/296804/files?preview=24301082) document. 

- Further analysis of ***condition*** in the context of $L_p$ ***induced matrix norms*** and of $L_p$ ***induced matrix norms*** themselves (e.g., using ***Manhattan*** and ***Chebyshev distance*** ***induced matrix norms*** to derive bounds on the ***spectral radius***) is available in Keith Knight's STA410 [notes8.pdf](https://q.utoronto.ca/courses/296804/files?preview=24301082) document. 

-->


## Deriving the Matrix Condition Number of $x=f_{A}(b)=A^{-1}b$ 


When computing a solution $x=f_{A}(b)=A^{-1}b$ to $Ax = b$ for an invertible linear transformation (***square full rank***) $A$, either

$$\require{cancel} \underset{\Large \text{$f_A$ is well-conditioned}}{b+\epsilon_b \approx b \Longrightarrow f_A(b+\epsilon_b) \approx f_A(b)} \quad \text{ or } \quad \underset{\Large \text{$f_A$ is ill-conditioned}}{b+\epsilon_b \approx b \cancel{\Longrightarrow} f_A(b+\epsilon_b) \approx f_A(b)}$$

Since the actual value $\tilde x = [A^{-1}b]_c$ obtainable in the ${\rm I\!F}$ ***floating-point*** representation of ${\rm I\!R}$ will actually be a solution to a different problem 

$$A \tilde x = \tilde b \quad \text{with} \quad \tilde x = x + \epsilon_x \; \text{and}\; \tilde b = b + \epsilon_b,$$

for any reasonable ***vector norm*** $||\cdot||$ measuring magnitude, $A$ is called ***well-conditioned*** if 

- whenever $\frac{||\epsilon_b||}{||b||}$ is small $\quad\quad\quad\quad\quad\quad\;\;$  whenever the "nearby problem" actually being solved is "close"
- $\Longrightarrow$ then $\frac{||\epsilon_x||}{||x||}$ is also small $\quad\quad\quad\quad\quad\;\Longrightarrow$ the solution is also "close"


So a matrix $A$ is ***well-conditioned*** with respect to $x = f_A(b) = A^{-1}b$ if small changes in the input $b$ to the function do not produce large changes in the output $x$.  Contrarily, $A$ is ***ill-conditioned*** if $f_A(b) = A^{-1}b$ is highly volatile relative to small changes in the input $b$.

The ***condition*** of $A$ in $f_A(b) = A^{-1}b$ can actually be given a precise numeric quantification on the basis of the ***induced matrix norm*** $||A|| = \underset{x\not=0}{\max} \frac{||Ax||}{||x||}$ since

\begin{align*}
\text{it implies } && ||b|| = {} &  ||Ax|| \leq ||A|| \; ||x|| \\
\text{and thus } && \frac{1}{||x||} \leq {} & \frac{||A||}{||b||} \\ \\
\text{and since } && \epsilon_x = {} & A^{-1} \epsilon_b \quad \text{ (by the definition of $\epsilon_x$ and $\epsilon_b$)},\\
&&||\epsilon_x|| = {} & ||A^{-1} \epsilon_b|| \leq {} ||A^{-1}|| \; ||\epsilon_b||\\\\
\text{the product } &&\frac{||\epsilon_x||}{||x||} \leq {} & \underbrace{||A||\;||A^{-1}||}_{\Large \kappa(A)}\;\frac{||\epsilon_b||}{||b||} \quad \text{follows}\\ 
\end{align*}

Thus the [condition number](https://math.stackexchange.com/questions/4116544/any-example-of-condition-number-of-matrix-less-than-1) of $A$ for the problem of solving for $x$ in $Ax=b$ is defined as 

$$\Large \kappa(A) = ||A||\;||A^{-1}||\quad  \quad  1 \leq \kappa(A) < \infty$$

and explicitly bounds how large $\frac{||\epsilon_x||}{||x||}$ can be relative to $\frac{||\epsilon_b||}{||b||}$, characterizing how rapidly the output of the function $x = f(b) = A^{-1}b$ can change for small changes in the input $b$. Large $\kappa(A)$ means $\frac{||\epsilon_x||}{||x||}$ may not be small even if $\frac{||\epsilon_b||}{||b||}$ is small, so $A$ is ***well-conditioned*** if $\kappa(A)$ is small and ***ill-conditioned*** otherwise. 
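
To make the bound concrete, here is a small sketch using a nearly singular $2 \times 2$ matrix chosen purely for illustration. It uses the $L_2$ induced norm and forms $A^{-1}$ explicitly only to evaluate $\kappa(A)$ as defined above.

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.0001]])          # nearly singular, so ill-conditioned
b = np.array([2., 2.0001])            # exact solution is x = [1, 1]
eps_b = np.array([0., 1e-4])          # a small perturbation of b

x = np.linalg.solve(A, b)
x_tilde = np.linalg.solve(A, b + eps_b)
eps_x = x_tilde - x

kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
rel_in = np.linalg.norm(eps_b) / np.linalg.norm(b)    # relative change in b
rel_out = np.linalg.norm(eps_x) / np.linalg.norm(x)   # relative change in x

print(f"kappa(A)        = {kappa:.1f}")
print(f"relative input  = {rel_in:.2e}")
print(f"relative output = {rel_out:.2e}")
print(rel_out <= kappa * rel_in)      # the bound from the derivation holds
```

Here a relative change in $b$ of order $10^{-5}$ moves the solution from roughly $(1, 1)$ to roughly $(0, 2)$, which the bound permits because $\kappa(A)$ is of order $10^{4}$.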


### The Matrix Condition Number using the $L_2$ norm

For the $L_2$ ***induced matrix norm***, and a ***symmetric real valued*** matrix $A$ (which thus has ***real eigenvalues***)

\begin{align*}
||A||_2 = {} & \underset{x\not=0}{\max} \frac{||Ax||_2}{||x||_2}
= \underset{||x||_2=1}{\max} \frac{||A(cx)||_2}{||(cx)||_2} 
= \underset{||x||_2=1}{\max} \frac{|c|||Ax||_2}{|c|||x||_2} 
= \underset{||x||_2=1}{\max} ||Ax||_2 \\
= {} & \underset{i}{\max} |\lambda_i| \quad \text{ (the largest magnitude eigenvalue of symmetric $A$)}\\
\color{gray}{=} {} & \color{gray}{\sqrt{\rho(A^TA)} \quad \text{ (the square root of the spectral radius of the gramian of non-square $A$)}} 
\end{align*}

and for ***symmetric*** and ***diagonalizable*** ([***normal***](https://en.wikipedia.org/wiki/Normal_matrix)) $A$ with a real-valued ***orthogonal eigendecomposition*** 

$$\begin{align*}
||A^{-1}||_2  = {} & ||\overset{\text{eigendecom-}}{\overbrace{(V\Lambda V^T)}^{\text{position of } A}} {}^{-1}||_2  = ||V\Lambda^{-1}V^T||_2  \\
 = {} & \frac{1}{|\lambda_\min^A|} \quad \text{ (the reciprocal of the smallest magnitude eigenvalue of $A$)}\end{align*}$$

Thus, for the $L_2$ ***induced matrix norm***, the ***condition number*** $\kappa(A) = ||A||_2\;||A^{-1}||_2$ depends on the relative magnitudes of the smallest and largest ***eigenvalues*** with

$$\underset{\text{the ratio of the largest and smallest eigenvalues}}{\kappa(A) = ||A||_2 ||A^{-1}||_2 = \frac{|\lambda_\max^A|}{|\lambda_\min^A|}} \quad \text{ and } \quad \frac{||\epsilon_x||_2}{||x||_2} \leq \frac{|\lambda_\max^A|}{|\lambda_\min^A|}\frac{||\epsilon_b||_2}{||b||_2}$$
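
The eigenvalue form of $\kappa(A)$ can be checked against the norm-based definition. The sketch below uses a small symmetric matrix with one negative eigenvalue (an arbitrary choice) to emphasize that it is the magnitudes $|\lambda_i|$ that matter.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., -3.]])                 # symmetric, one negative eigenvalue

lam = np.linalg.eigvalsh(A)               # real eigenvalues of symmetric A
kappa_eig = np.abs(lam).max() / np.abs(lam).min()

# Definition: product of the L2 induced norms of A and A^{-1}
kappa_norm = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

# NumPy's built-in L2 condition number (ratio of extreme singular values)
kappa_np = np.linalg.cond(A, 2)

print(np.isclose(kappa_eig, kappa_norm))
print(np.isclose(kappa_eig, kappa_np))
```

For a matrix that is not symmetric the eigenvalue ratio and the $L_2$ condition number generally differ, so `np.linalg.cond` (based on singular values) is the safer general-purpose choice.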


<!--
> Thus for ***square full rank*** $A$, the previously noted bound on the ***spectral radius***
>
> $$\rho(A_{n\times n}) = \max_i |\lambda_i| = ||A||_2 \leq \begin{array}{c} \max_i \sum_j |A_{ij}| \\ \max_j \sum_i |A_{ij}|\end{array}$$
> 
> follows since for ***eigenvalue*** and ***eigenvector*** pair $\lambda_i$ and $v_i$, $|\lambda_i| \leq ||A||_p$ as seen from 
$$|\lambda_i| = |\lambda_i| \,||v_i||_p = ||\lambda_i v_i||_p = \overbrace{||A v_i||_p \leq ||A||_p ||v_i||_p}^{\text{by definition of the induced norm}} = ||A||_p \cancel{||v_i||_p}^1$$
>
> and the bounds are the ***Manhattan*** and ***Chebyshev distance*** ***induced matrix norms*** (as shown in Keith Knight's STA410 [class notes](https://q.utoronto.ca/courses/244990/files?preview=18669503))
>
> $$||A||_1=\max_j \sum_i |A_{ij}| \quad \text{ and } \quad ||A||_\infty =\max_i \sum_j |A_{ij}|$$
-->