---
title: "Matrices I"
format: 
  html:
    toc: true
    code-fold: false
    page-layout: full
    fig-cap-location: bottom
    number-sections: true
    number-depth: 2
    html-to-math: katex
    html-math-method: katex
jupyter: python3
---

# Motivation
Linear algebra pops up almost everywhere in physics, so the matrix-related techniques developed below will be used repeatedly in later lectures. As a result, we will spend lots of time on matrices. We will take the time to introduce several numerical techniques in detail. 

## Examples from Physics
We discuss some elementary examples from undergraduate physics.

### Rotations in two dimensions
Consider a two-dimensional Cartesian coordinate system. A point $\boldsymbol{r} = (x,y)^T$ can be rotated counter-clockwise through an angle $\theta$ about the origin, producing a new point $\boldsymbol{r}' = (x',y')^T$. The two points' coordinates are related as follows:
$$
\begin{pmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix}
\begin{pmatrix}
x \\
y
\end{pmatrix}
=
\begin{pmatrix}
x' \\
y'
\end{pmatrix}
$$
The $2\times 2$ matrix appearing here is an example of a _rotation matrix_ in Euclidean space. If you know $\boldsymbol{r}'$ and wish to calculate $\boldsymbol{r}$, you need to solve this system of two linear equations. 
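As a minimal sketch of both directions of this problem, the snippet below rotates a point and then recovers it by solving the linear system. The angle, the starting point, and the helper name `rotation_matrix` are illustrative choices, not part of the lecture.

```python
import numpy as np

def rotation_matrix(theta):
    """Return the 2x2 counter-clockwise rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

theta = np.pi / 6            # illustrative angle (30 degrees)
r = np.array([1.0, 2.0])     # illustrative starting point

R = rotation_matrix(theta)
r_prime = R @ r              # forward problem: rotate r into r'

# Inverse problem: given r', solve the linear system R r = r' for r.
r_recovered = np.linalg.solve(R, r_prime)

print(r_prime)
print(r_recovered)
```

Because a rotation matrix is orthogonal, `R.T @ r_prime` would give the same result here; the general-purpose solver is used only to mirror the linear-system viewpoint of the text.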

### Electrostatic potentials
Assume you have $n$ electric charges $q_j$ (which are unknown) held at the positions $\boldsymbol{R}_j$ (which are known). Further assume that you have measured the electric potential $\phi(\boldsymbol{r}_i)$ at the $n$ known positions $\boldsymbol{r}_i$. From the definition of the potential (as well as the fact that the potential obeys the principle of superposition), we see that:
$$
\phi(\boldsymbol{r}_i) = \sum_{j=0}^{n-1}\left(\frac{k q_j}{|\boldsymbol{r}_i - \boldsymbol{R}_j|}\right),
$$
where $i = 0,1,\dots,n-1$.
If you assume you have four charges, the above relation turns into the following $4\times 4$ linear system of equations:
$$
\begin{pmatrix}
k/|\boldsymbol{r}_0 - \boldsymbol{R}_0| &k/|\boldsymbol{r}_0 - \boldsymbol{R}_1| &k/|\boldsymbol{r}_0 - \boldsymbol{R}_2| &k/|\boldsymbol{r}_0 - \boldsymbol{R}_3| \\
k/|\boldsymbol{r}_1 - \boldsymbol{R}_0| &k/|\boldsymbol{r}_1 - \boldsymbol{R}_1| &k/|\boldsymbol{r}_1 - \boldsymbol{R}_2| &k/|\boldsymbol{r}_1 - \boldsymbol{R}_3| \\
k/|\boldsymbol{r}_2 - \boldsymbol{R}_0| &k/|\boldsymbol{r}_2 - \boldsymbol{R}_1| &k/|\boldsymbol{r}_2 - \boldsymbol{R}_2| &k/|\boldsymbol{r}_2 - \boldsymbol{R}_3| \\
k/|\boldsymbol{r}_3 - \boldsymbol{R}_0| &k/|\boldsymbol{r}_3 - \boldsymbol{R}_1| &k/|\boldsymbol{r}_3 - \boldsymbol{R}_2| &k/|\boldsymbol{r}_3 - \boldsymbol{R}_3|
\end{pmatrix}
\begin{pmatrix}
q_0 \\ q_1 \\ q_2 \\ q_3
\end{pmatrix}
=
\begin{pmatrix}
\phi(\boldsymbol{r}_0) \\ \phi(\boldsymbol{r}_1) \\ \phi(\boldsymbol{r}_2) \\ \phi(\boldsymbol{r}_3)
\end{pmatrix}
$$
which needs to be solved for the 4 unknowns $q_0$, $q_1$, $q_2$ and $q_3$.
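The following sketch shows how such a system could be assembled and solved numerically. The geometry (charges on the $x$ axis, measurement points slightly above them) and the charge values are made up for illustration; the "measured" potentials are generated from those made-up charges so that the recovered result can be checked.

```python
import numpy as np

k = 8.99e9  # Coulomb constant, approximate value in SI units

# Hypothetical geometry (not from the text): four charges on the x axis,
# with one measurement point a short distance above each charge.
R = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])      # positions R_j
r = np.array([[0.0, 0.25], [1.0, 0.25], [2.0, 0.25], [3.0, 0.25]])  # positions r_i

# Coefficient matrix: M[i, j] = k / |r_i - R_j|
dist = np.linalg.norm(r[:, None, :] - R[None, :, :], axis=2)
M = k / dist

# Made-up charges (in coulombs), used only to generate consistent potentials.
q_true = np.array([1.0e-9, -2.0e-9, 0.5e-9, 3.0e-9])
phi = M @ q_true

# Solve the 4x4 linear system for the unknown charges.
q = np.linalg.solve(M, phi)
print(q)
```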

### Principal moments of inertia
In the study of the rotation of a rigid body about an arbitrary axis in three dimensions, you may have encountered the moment of inertia tensor:
$$
I_{\alpha \beta} = \int \rho(\boldsymbol{r}) \left(\delta_{\alpha \beta}r^2 - r_\alpha r_\beta\right)d^3 r,
$$
where $\rho(\boldsymbol{r})$ is the mass density, $\alpha$ and $\beta$ denote Cartesian components, and $\delta_{\alpha \beta}$ is the Kronecker delta. 

The moment of inertia tensor is represented by a $3\times 3$ matrix: 
$$
\boldsymbol{I} = 
\begin{pmatrix}
I_{xx} & I_{xy} & I_{xz} \\
I_{yx} & I_{yy} & I_{yz} \\
I_{zx} & I_{zy} & I_{zz}
\end{pmatrix}.
$$
This is a symmetric matrix. It is possible to choose a coordinate system such that the off-diagonal elements vanish. 
The axes of this coordinate system are known as the _principal axes_ for the body at the origin. In this coordinate system the moment of inertia tensor is represented by a diagonal matrix, with diagonal elements $I_0$, $I_1$, and $I_2$, known as the principal moments. Finding them is an instance of the "eigenvalue problem".
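As a rough illustration, the principal moments and principal axes of a symmetric tensor can be obtained with NumPy's symmetric eigensolver. The numerical values of the tensor below are hypothetical and chosen only to give a symmetric $3\times 3$ matrix.

```python
import numpy as np

# Hypothetical symmetric inertia tensor (arbitrary units), for illustration only.
I = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  3.0, -1.0],
              [ 0.0, -1.0,  5.0]])

# eigh is intended for symmetric (Hermitian) matrices: it returns real
# eigenvalues in ascending order and orthonormal eigenvectors as columns.
moments, axes = np.linalg.eigh(I)

# Expressed in the basis of the principal axes, the tensor becomes diagonal.
I_diag = axes.T @ I @ axes

print(moments)
print(np.round(I_diag, 12))
```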

## The problems to be solved
First, we look at the problem where we have $n$ unknowns $x_i$, along with $n\times n$ coefficients $A_{ij}$ and $n$ constants $b_i$:
$$
\begin{pmatrix}
A_{00} & A_{01} & \dots & A_{0,n-1} \\
A_{10} & A_{11} & \dots & A_{1,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n-1,0} & A_{n-1,1} & \dots & A_{n-1,n-1} 
\end{pmatrix}
\begin{pmatrix}
x_0 \\ x_1 \\ \vdots \\ x_{n-1}
\end{pmatrix}
= 
\begin{pmatrix}
b_0 \\ b_1 \\ \vdots \\ b_{n-1}
\end{pmatrix}
$$
where we used a comma to separate two indices when this was necessary to avoid confusion. 
These are $n$ equations linear in $n$ unknowns. 

In compact matrix form, this problem is written as 
$$
\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b},
$$
where $\boldsymbol{A}$ is called the _coefficient matrix_. This is a problem that we will spend considerable time solving in this lecture. 
We will be doing this mainly by using the _augmented coefficient matrix_ which places together the elements of $\boldsymbol{A}$ and $\boldsymbol{b}$, i.e.:
$$
(\boldsymbol{A}|\boldsymbol{b})= \left(
\begin{matrix}
A_{00} & A_{01} & \dots & A_{0,n-1} \\
A_{10} & A_{11} & \dots & A_{1,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n-1,0} & A_{n-1,1} & \dots & A_{n-1,n-1} 
\end{matrix}\right|
\left.
\begin{matrix}
b_0 \\ b_1 \\ \vdots \\b_{n-1}
\end{matrix}
\right).
$$
For now we assume the determinant of $\boldsymbol{A}$ satisfies $|\boldsymbol{A}| \neq 0$.

In a course on linear algebra you have seen examples of legitimate operations one can carry out while solving the system of linear equations. 
Such operations change the elements of $\boldsymbol{A}$ and $\boldsymbol{b}$, but leave the solution vector $\boldsymbol{x}$ unchanged. 
More generally, we are allowed to carry out the following elementary row operations:

- _Scaling_: each row/equation may be multiplied by a nonzero constant (multiplies $|\boldsymbol{A}|$ by the same constant).
- _Pivoting_: two rows/equations may be interchanged (changes sign of $|\boldsymbol{A}|$).
- _Elimination_: a row/equation may be replaced by the sum of that row/equation and a multiple of any other row/equation (doesn't change $|\boldsymbol{A}|$).

Keep in mind that these are operations that are carried out on the augmented coefficient matrix $(\boldsymbol{A}|\boldsymbol{b})$.
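The effect of the three row operations can be checked numerically. The sketch below applies each one to the augmented matrix of a small illustrative $2\times 2$ system and prints the solution and the determinant of the coefficient part; the helper names `split_solve` and `det_of` are introduced here for convenience only.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([2.0, 7.0])
Ab = np.column_stack([A, b])           # augmented matrix (A|b)

def split_solve(M):
    """Solve the system encoded by an augmented matrix (A|b)."""
    return np.linalg.solve(M[:, :-1], M[:, -1])

def det_of(M):
    """Determinant of the coefficient part of an augmented matrix."""
    return np.linalg.det(M[:, :-1])

scaled = Ab.copy()
scaled[0] *= 5.0                       # scaling: row 0 -> 5 * row 0

swapped = Ab[[1, 0]]                   # pivoting: interchange rows 0 and 1

eliminated = Ab.copy()
eliminated[1] -= 0.5 * eliminated[0]   # elimination: row 1 -> row 1 - 0.5 * row 0

for name, M in [("original", Ab), ("scaled", scaled),
                ("swapped", swapped), ("eliminated", eliminated)]:
    print(name, split_solve(M), det_of(M))
```

In each case the solution vector is unchanged, while the determinant is multiplied by 5, changes sign, or stays the same, respectively.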

Second, we wish to tackle the standard form of the matrix eigenvalue problem:
$$
\boldsymbol{A}\boldsymbol{v} = \lambda \boldsymbol{v}.
$${#eq-eigenvalue}
Here, both $\lambda$ and the column vector $\boldsymbol{v}$ are unknown. This $\lambda$ is called an _eigenvalue_ and $\boldsymbol{v}$ is called an _eigenvector_.

Let's sketch one possible approach to solving this problem. Moving everything to the left-hand side, we have
$$
(\boldsymbol{A} - \lambda \boldsymbol{I})\boldsymbol{v} = \boldsymbol{0},
$$
where $\boldsymbol{I}$ is the $n\times n$ identity matrix and $\boldsymbol{0}$ is an $n\times 1$ column vector made up of $0$s. 
It is easy to see that we are faced with a system of $n$ linear equations: the coefficient matrix here is $\boldsymbol{A} - \lambda \boldsymbol{I}$. 

The trivial solution is $\boldsymbol{v} = \boldsymbol{0}$. In order for a non-trivial solution to exist, we must have vanishing determinant $|\boldsymbol{A} - \lambda \boldsymbol{I}| = 0$.
In other words, the matrix $\boldsymbol{A} - \lambda \boldsymbol{I}$ is singular. Expanding the determinant gives us a polynomial equation, known as the _characteristic equation_:
$$
(-1)^n\lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0 = 0.
$$

Thus, an $n \times n $ matrix has at most $n$ distinct eigenvalues, which are the roots of the characteristic polynomial. When a root occurs twice, we say that root has multiplicity $2$. If a root occurs only once, in other words if it has multiplicity 1, we are dealing with a _simple_ eigenvalue.

Having calculated the eigenvalues, one way to evaluate the eigenvectors is simply by using @eq-eigenvalue again. 

- Specifically, for a given/known eigenvalue, $\lambda_i$, one tries to solve the system of linear equations $(\boldsymbol{A}-\lambda_i\boldsymbol{I})\boldsymbol{v}_i = \boldsymbol{0}$ for $\boldsymbol{v}_i$. 
- For each value $\lambda_i$, we will not be able to determine unique values of $\boldsymbol{v}_i$, so we will limit ourselves to computing the relative values of the components of $\boldsymbol{v}_i$. 
- We will in the following use the notation $(v_j)_0$, $(v_j)_1$ etc. to denote the $n$ elements of the column vector $\boldsymbol{v}_j$.
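For a $2\times 2$ matrix the recipe above can be carried out almost by hand. The sketch below, using an illustrative symmetric matrix, finds the roots of the characteristic polynomial $\lambda^2 - \mathrm{tr}(\boldsymbol{A})\,\lambda + |\boldsymbol{A}| = 0$ and then builds one eigenvector per eigenvalue from the first row of $(\boldsymbol{A} - \lambda\boldsymbol{I})\boldsymbol{v} = \boldsymbol{0}$. Normalizing to unit length is just one convention for fixing the otherwise arbitrary overall scale. This is meant as a demonstration of the idea, not as a recommended numerical method for larger matrices.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # illustrative symmetric matrix

# For n = 2 the characteristic equation is lambda^2 - tr(A) lambda + det(A) = 0.
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
eigenvalues = np.sort(np.roots(coeffs))

# The first row of (A - lambda I) v = 0 reads
# (A[0,0] - lambda) v_0 + A[0,1] v_1 = 0,
# so for A[0,1] != 0 we may take v = (A[0,1], lambda - A[0,0]) up to scale.
eigenvectors = []
for lam in eigenvalues:
    v = np.array([A[0, 1], lam - A[0, 0]])
    v = v / np.linalg.norm(v)      # fix the arbitrary scale
    eigenvectors.append(v)
    print(lam, v)

print(np.linalg.eigvalsh(A))       # library result for comparison
```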

# Error Analysis

We now turn to a discussion of practical error estimation in work with matrices. We will provide some general derivations and examples of when a problem is "well-conditioned", typically by using matrix perturbation theory (i.e., by checking what happens if there are uncertainties in the input data).

After some preliminary comments, examples, and definitions, we will investigate quantitatively how linear systems, eigenvalues, and eigenvectors depend on the input data. We will be examining in each case the simplest scenario but, hopefully, this will be enough to help you grasp the big picture. 

## From _a posteriori_ to _a priori_ Estimates

Let us look at a specific $2\times 2$ linear system, namely $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ for the case where
$$
(\boldsymbol{A}|\boldsymbol{b}) = \left(
\begin{matrix}
0.2161 & 0.1441 \\
1.2969 & 0.8648 
\end{matrix}\ \right|
\left.
\begin{matrix}
0.1440 \\
0.8642
\end{matrix}
\right).
$${#eq-linear_eq}

Simply put, there are two options on how to analyze errors: 

a. an _a priori_ analysis, in which case we try to see how easy/hard the problem is to solve before we begin solving it.
b. an _a posteriori_ analysis, where we have produced a solution, and attempt to see how good it is.

Let us start with the latter option, an _a posteriori_ approach.  Say you are provided with the following approximate solution to the problem defined in @eq-linear_eq:
$$
\tilde{\boldsymbol{x}}^T = (0.9911 \quad -0.4870).
$${#eq-approximate_sol}

One way of testing how good a solution is, is to evaluate the residual vector:
$$
\boldsymbol{r} = \boldsymbol{b} - \boldsymbol{A} \tilde{\boldsymbol{x}}.
$$
Plugging in @eq-approximate_sol, we find the residual vector
$$
\boldsymbol{r}^T = (-10^{-8} \quad 10^{-8})
$$
which might naturally lead you to the conclusion that our approximate solution $\tilde{\boldsymbol{x}}$ is pretty good!

However, here’s the thing: the exact solution to our problem is actually:
$$
\boldsymbol{x}^T = (2 \quad -2).
$$
The approximate solution $\tilde{\boldsymbol{x}}$ doesn't contain even a single correct significant figure!
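The a posteriori check described here is easy to reproduce. The sketch below evaluates $\boldsymbol{b} - \boldsymbol{A}\tilde{\boldsymbol{x}}$ for the approximate solution and compares $\tilde{\boldsymbol{x}}$ with a reference solution from `np.linalg.solve`. Note that for a system this sensitive the reference solution may itself lose several digits of accuracy.

```python
import numpy as np

A = np.array([[0.2161, 0.1441],
              [1.2969, 0.8648]])
b = np.array([0.1440, 0.8642])
x_tilde = np.array([0.9911, -0.4870])   # the approximate solution quoted above

residual = b - A @ x_tilde              # a posteriori check
x_ref = np.linalg.solve(A, b)           # reference solution from NumPy

print("residual:", residual)
print("reference solution:", x_ref)
print("error in x_tilde:", x_tilde - x_ref)
```

The residual is tiny even though `x_tilde` is far from the reference solution, which is exactly the warning sign discussed in the text.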

With the disclaimer that there’s much more that could be said at the a posteriori level, we now drop this line of attack and turn to an _a priori_ analysis: could we have realized that solving the problem in @eq-linear_eq was difficult? How could we know that there's something pathological about it?

## Magnitude of Determinant?

### Example 1
In an attempt to see what is wrong with our previous example in @eq-linear_eq, we start to make small perturbations to the input data. 
Imagine we didn't know the values of the coefficients in $\boldsymbol{A}$ all that precisely. Would anything change? 

Let us take
$$
\Delta \boldsymbol{A} = 
\begin{pmatrix}
0.0001 & 0 \\
0 & 0
\end{pmatrix}
$$

We want to study the effect of this perturbation on the solution, namely
$$
(\boldsymbol{A} + \Delta \boldsymbol{A})(\boldsymbol{x} + \Delta \boldsymbol{x}) = \boldsymbol{b},
$$
where $\boldsymbol{b}$ is kept fixed/unperturbed. 
For the specific case studied here, we find (using the following program)
$$
(\boldsymbol{x} + \Delta \boldsymbol{x})^T  = (-2.31294091\times 10^{-4} \quad 9.99653059\times 10^{-1}).
$$

We see that this is not a "small" effect. Our perturbation amounted to changing only one element of $\boldsymbol{A}$ by less than $0.1\%$, yet it had a dramatic impact on the solution to our problem.

In [1]:
import numpy as np
A = np.array([[0.2161, 0.1441],[1.2969, 0.8648]])
deltaA = np.array([[0.0001, 0],[0, 0]])
b = np.array([0.1440, 0.8642])
# we use np.linalg.solve(A,b) to solve the equation Ax = b.
x = np.linalg.solve(A,b)
x_dx = np.linalg.solve(A+deltaA,b)

print(x)
print(x_dx)

print("determinant of A is: ", np.linalg.det(A))

[ 2. -2.]
[-2.31294091e-04  9.99653059e-01]
determinant of A is:  -9.999999998544968e-09


### Example 2
Let us look at the following example
$$
(\boldsymbol{A} | \boldsymbol{b}) = 
\left(
    \begin{matrix}
    1 & 1 \\
    1 & 1.001
    \end{matrix}\ 
\right|
\left.
    \begin{matrix}
    2 \\
    2.001
    \end{matrix}
\right).
$$
The exact solution is (from the following program)
$$
\boldsymbol{x}^T = (1 \quad 1).
$$
We can then add a perturbation
$$
\Delta\boldsymbol{A} = 
\begin{pmatrix}
0 & 0 \\
0 & 0.001
\end{pmatrix}.
$$
Then, the perturbed solution is
$$
(\boldsymbol{x} + \Delta\boldsymbol{x})^T = (1.5 \quad 0.5).
$$

Instead of adding a perturbation to $\boldsymbol{A}$, one can also add a perturbation to $\boldsymbol{b}$, with 
$$
\Delta \boldsymbol{b}^T = (0 \quad 0.001).
$$
We find (in the following program)
$$
(\boldsymbol{x} + \Delta \boldsymbol{x})^T = (0 \quad 2).
$$

In [2]:
import numpy as np
A = np.array([[1, 1],[1, 1.001]])
deltaA = np.array([[0, 0],[0, 0.001]])
b = np.array([2, 2.001])
deltab = np.array([0, 0.001])
# we use np.linalg.solve(A,b) to solve the equation Ax = b.
x = np.linalg.solve(A,b)
x_dx = np.linalg.solve(A+deltaA,b)
x_dx2 = np.linalg.solve(A,b+deltab)

print(x)
print(x_dx)
print(x_dx2)


print("determinant of A is: ", np.linalg.det(A))

[1. 1.]
[1.5 0.5]
[0. 2.]
determinant of A is:  0.00099999999999989


### Example 3
There are also cases where small perturbations won't lead to dramatic consequences in the solutions. See the following code. 

In [9]:
import numpy as np
# A|b: 2   1 | 2
#      1   2 | 7
A = np.array([[2, 1],[1, 2]])
b = np.array([2,7])
# Delta A: 0    0
#          0.01 0
deltaA = np.array([[0, 0],[0.01, 0]])
# Delta b: 0.01
#          0
deltab = np.array([0.01, 0])
# we use np.linalg.solve(A,b) to solve the equation Ax = b.
x = np.linalg.solve(A,b)
x_dx = np.linalg.solve(A+deltaA,b)
x_dx2 = np.linalg.solve(A,b+deltab)

print(x)
print(x_dx)
print(x_dx2)


print("determinant of A is: ", np.linalg.det(A))

[-1.  4.]
[-1.00334448  4.00668896]
[-0.99333333  3.99666667]
determinant of A is:  2.9999999999999996


## Norms for Matrices and Vectors
### Example 4
Consider the following question: what does "small determinant" mean? If the definition is "much less than 1", then one might counter-argue: what about the following matrix:
$$
\boldsymbol{A} = 
\begin{pmatrix}
0.2 & 0.1 \\
0.1 & 0.2
\end{pmatrix},
$$
which is just the matrix $\boldsymbol{A}$ in Example 3 multiplied by 0.1. This matrix has a determinant $\det(\boldsymbol{A}) = 0.03$, which is certainly much less than $1$.
If you also multiply each element of $\boldsymbol{b}$ in the previous example by $0.1$, you should get the same answer. What's more, this linear system of equations should be equally sensitive to perturbations. 

Thus, the value of the determinant should be compared with the magnitude of the relevant matrix elements. 
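A quick numerical check of this scaling argument is sketched below, using the system from Example 3 and a common scale factor `s` (a name introduced here). Both the unperturbed and the perturbed solutions are unchanged by the scaling, even though the determinant shrinks by a factor of $s^2$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([2.0, 7.0])
deltaA = np.array([[0.0, 0.0],
                   [0.01, 0.0]])

s = 0.1  # common scale factor applied to A, b and the perturbation

x = np.linalg.solve(A, b)
x_scaled = np.linalg.solve(s * A, s * b)

x_pert = np.linalg.solve(A + deltaA, b)
x_scaled_pert = np.linalg.solve(s * (A + deltaA), s * b)

print(x, x_scaled)
print(x_pert, x_scaled_pert)
print(np.linalg.det(A), np.linalg.det(s * A))
```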


### Definitions and Properties for Matrices
Let us provide our intuitions with quantitative backing. 
We shall introduce the _matrix norm_, which measures the magnitude of $\boldsymbol{A}$.
There are several possible definitions of a norm, but we will employ only two of them. 

1. **Euclidean norm**:
   $$
   \|\boldsymbol{A} \|_E = \sqrt{\sum_{i=0}^{n-1}\sum_{j=0}^{n-1} |A_{ij}|^2},
   $$
   which is sometimes also called the _Frobenius norm_. 
2. **Infinity norm**:
   $$
   \| \boldsymbol{A} \|_{\infty} = \max_{0\leq i \leq n-1} \sum_{j=0}^{n-1} |A_{ij}|,
   $$
   which is also known as the _maximum row-sum norm_.

Regardless of the specific norm definitions, all matrix norms for square matrices obey:

- $\|\boldsymbol{A} \| \geq 0$
- $\| \boldsymbol{A} \| = 0$ if and only if all $A_{ij} = 0$
- $\| k \boldsymbol{A} \| = |k| \|\boldsymbol{A}\|$
- $\| \boldsymbol{A} + \boldsymbol{B} \| \leq \| \boldsymbol{A} \| + \| \boldsymbol{B} \|$
- $\| \boldsymbol{A B} \| \leq \|\boldsymbol{A}\|\|\boldsymbol{B}\|$

Now, we return to the question of when the determinant is "small". A reasonable definition would be $|\det(\boldsymbol{A})| \ll \|\boldsymbol{A}\|$.

In [15]:
import numpy as np
A1 = np.array([[0.2161, 0.1441],[1.2969, 0.8648]])
A2 = np.array([[1, 1],[1, 1.001]])
A3 = np.array([[2, 1],[1, 2]])
A4= np.array([[0.2, 0.1],[0.1, 0.2]])

Alist = [A1, A2, A3, A4]

for ii,A in enumerate(Alist):
    print("Example", ii+1, ": det(A) =", np.linalg.det(A), ", Euclidean norm =", np.linalg.norm(A))


Example 1 : det(A) = -9.999999998544968e-09 , Euclidean norm = 1.5802824652573981
Example 2 : det(A) = 0.00099999999999989 , Euclidean norm = 2.000500187453128
Example 3 : det(A) = 2.9999999999999996 , Euclidean norm = 3.1622776601683795
Example 4 : det(A) = 0.03000000000000001 , Euclidean norm = 0.31622776601683794


These results seem to be consistent with what we had seen above: 

- Examples 1 and 2 are near-singular, while Example 3 is not singular. 
- For Example 4, this criterion claims that our matrix is not quite singular (though it's getting there). 

Our introduction of the concept of the matrix norm seems to have served its purpose: a small determinant needs to be compared to the matrix norm, so Example 4 (despite having a small determinant) is not singular, given that its matrix elements are small, too.

### Definitions for Vectors
Let us also introduce norms for vectors:
$$
\|\boldsymbol{x}\|_E = \sqrt{\sum_{i = 0}^{n-1} |x_i|^2}, \quad \|\boldsymbol{x} \|_{\infty} = \max_{0 \leq i \leq n-1} |x_i|.
$$
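To make the definitions concrete, the sketch below evaluates the matrix and vector norms directly from the formulas and compares them with `np.linalg.norm`. The matrix is the one from Example 1 and the vector is its right-hand side; any other choice would do.

```python
import numpy as np

A = np.array([[0.2161, 0.1441],
              [1.2969, 0.8648]])
x = np.array([0.1440, 0.8642])

# Matrix norms written out from the definitions
euclid = np.sqrt(np.sum(np.abs(A)**2))            # Euclidean (Frobenius) norm
infinity = np.max(np.sum(np.abs(A), axis=1))      # maximum row-sum norm

# Vector norms written out from the definitions
x_euclid = np.sqrt(np.sum(np.abs(x)**2))
x_infinity = np.max(np.abs(x))

print(euclid, np.linalg.norm(A, 'fro'))
print(infinity, np.linalg.norm(A, np.inf))
print(x_euclid, np.linalg.norm(x))
print(x_infinity, np.linalg.norm(x, np.inf))
```

Called without an `ord` argument, `np.linalg.norm` returns the Euclidean (Frobenius) norm for both vectors and matrices.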

## Condition Number for Linear Systems

Unfortunately, our criterion $|\det(\boldsymbol{A})|\ll \| \boldsymbol{A} \|$ is flawed, even though it appears in many textbooks. 
We will look at two examples. 

### Example 5
We shall consider
$$
\boldsymbol{A} = 
\begin{pmatrix}
2 \times 10^{-10} & 1 \times 10^{-10} \\
1 \times 10^{-10} & 2 \times 10^{-10}
\end{pmatrix},
$$
which is the matrix in Example 3 multiplied by $10^{-10}$. 
Here, we have $|\det(\boldsymbol{A})| = 3 \times 10^{-20}$, and $\|\boldsymbol{A} \|_E \simeq 3.16 \times 10^{-10}$, so 
$|\det(\boldsymbol{A})| \ll \|\boldsymbol{A} \|_E$ holds.

But isn't this strange? Simply multiplying a set of equations by a small number cannot be enough to make the problem near-singular. 

### Example 6
Let us look at the following $8 \times 8$ problem: 
$$
\boldsymbol{A} = 
\begin{pmatrix*}[r]
2 & -2 & -2 & \cdots & -2 \\
0 & 2 & -2 & \cdots & -2 \\
0 & 0 & 2 & \cdots & -2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2
\end{pmatrix*}
$$
The corresponding results are 
$|\det(\boldsymbol{A})| = 256$, and $\|\boldsymbol{A} \|_E = 12$, so $|\det(\boldsymbol{A})| \gg \|\boldsymbol{A} \|_E$ holds.

Now, take a look at the following code. 

In [4]:
import numpy as np
A = np.array([[2, -2, -2, -2, -2, -2, -2, -2],
              [0,  2, -2, -2, -2, -2, -2, -2],
              [0,  0,  2, -2, -2, -2, -2, -2],
              [0,  0,  0,  2, -2, -2, -2, -2],
              [0,  0,  0,  0,  2, -2, -2, -2],
              [0,  0,  0,  0,  0,  2, -2, -2],
              [0,  0,  0,  0,  0,  0,  2, -2],
              [0,  0,  0,  0,  0,  0,  0,  2]])
b = np.array([1, -1, 1, -1, 1, -1, 1, -1])


deltaA = np.zeros((8,8))
deltaA[-1,0] = -0.01  # bottom-left element is -0.01

# we use np.linalg.solve(A,b) to solve the equation Ax = b.
x = np.linalg.solve(A,b)
x_dx = np.linalg.solve(A+deltaA,b)

print(x)
print(x_dx)


print("determinant of A is: ", np.linalg.det(A))
print("norm of A is: ", np.linalg.norm(A))

[-21.  -11.   -5.   -3.   -1.   -1.    0.   -0.5]
[-30.88235294 -15.94117647  -7.47058824  -4.23529412  -1.61764706
  -1.30882353  -0.15441176  -0.65441176]
determinant of A is:  255.99999999999994
norm of A is:  12.0


### Derivation

In the present subsection, we will carry out an informal derivation that will point us toward a quantitative measure of ill-conditioning. This measure of the sensitivity of our problem to small changes in its elements will be called the condition number.

Let us start with the unperturbed problem
$$
\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}
$$
and the perturbed one
$$
(\boldsymbol{A} + \Delta \boldsymbol{A})(\boldsymbol{x} + \Delta \boldsymbol{x}) = \boldsymbol{b}.
$$

Combining the above two equations, we have
$$
\boldsymbol{A}\Delta \boldsymbol{x} = -\Delta \boldsymbol{A}(\boldsymbol{x} + \Delta \boldsymbol{x}).
$$
Assuming $\boldsymbol{A}$ is nonsingular (so you can invert it), we get
$$
\Delta \boldsymbol{x} = -\boldsymbol{A}^{-1}\Delta \boldsymbol{A} (\boldsymbol{x} + \Delta \boldsymbol{x}).
$$
Taking the norm on both sides, we obtain
$$
\| \Delta \boldsymbol{x}\| = \| \boldsymbol{A}^{-1}\Delta \boldsymbol{A} (\boldsymbol{x} + \Delta \boldsymbol{x})\|
\leq \| \boldsymbol{A}^{-1} \| \| \Delta \boldsymbol{A} \| \| \boldsymbol{x}+ \Delta \boldsymbol{x} \|.
$$

This means
$$
\frac{\| \Delta \boldsymbol{x} \|}{\|\boldsymbol{x} + \Delta \boldsymbol{x} \|}
\leq \| \boldsymbol{A}^{-1} \| \|\Delta \boldsymbol{A} \| =
\| \boldsymbol{A}\|\| \boldsymbol{A}^{-1} \| \frac{\|\Delta \boldsymbol{A} \|}{\| \boldsymbol{A}\|}.
$$

In other words, an error bound on $\|\Delta \boldsymbol{A} \|/ \|\boldsymbol{A} \|$ translates into an error bound on $\|\Delta \boldsymbol{x}\|/\|\boldsymbol{x}\| \simeq \|\Delta \boldsymbol{x}\|/\|\boldsymbol{x} + \Delta \boldsymbol{x}\|$, where the approximation holds when $\Delta \boldsymbol{x}$ is small compared to $\boldsymbol{x}$.

This leads to the introduction of the _condition number_:
$$
\kappa(\boldsymbol{A}) = \| \boldsymbol{A}\|\| \boldsymbol{A}^{-1} \|,
$$
which determines if a small perturbation gets amplified when solving for $\boldsymbol{x}$ or not. 

A large condition number leads to an amplification of a small perturbation: we say we are dealing with an _ill-conditioned_ problem. If the condition number is of order unity, then a small perturbation is not amplified, so we are dealing with a _well-conditioned_ problem.
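The bound derived above can be checked on a concrete case. The sketch below uses the matrix, right-hand side, and perturbation from Example 2, with the Euclidean (Frobenius) norm for matrices and the Euclidean norm for vectors, and compares the relative change in the solution with $\kappa(\boldsymbol{A})\,\|\Delta\boldsymbol{A}\|/\|\boldsymbol{A}\|$. The variable names `lhs` and `rhs` are introduced here for the two sides of the inequality.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.001]])
deltaA = np.array([[0.0, 0.0],
                   [0.0, 0.001]])
b = np.array([2.0, 2.001])

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A + deltaA, b)   # this is x + Delta x
dx = x_pert - x

# Condition number from its definition, using the Frobenius norm
kappa = np.linalg.norm(A) * np.linalg.norm(np.linalg.inv(A))

lhs = np.linalg.norm(dx) / np.linalg.norm(x_pert)        # ||dx|| / ||x + dx||
rhs = kappa * np.linalg.norm(deltaA) / np.linalg.norm(A)  # kappa * ||dA|| / ||A||

print("kappa:", kappa)
print("relative change in x:", lhs)
print("bound:", rhs)
```

The bound is satisfied but not tight: the condition number describes the worst case over all perturbations of a given size, not the effect of one particular perturbation.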

### Examples

- Example 1: $\kappa(\boldsymbol{A}) = 249729267.388$, ill-conditioned.
- Example 2: $\kappa(\boldsymbol{A}) = 4002.001$, ill-conditioned.
- Example 3: $\kappa(\boldsymbol{A}) = 3.33$, well-conditioned.
- Example 4: $\kappa(\boldsymbol{A}) = 3.33$, well-conditioned.
- Example 5: $\kappa(\boldsymbol{A}) = 3.33$, well-conditioned.
- Example 6: $\kappa(\boldsymbol{A}) = 512.18$, ill-conditioned.
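The values listed above are consistent with using the Euclidean (Frobenius) norm in the definition of $\kappa(\boldsymbol{A})$. A sketch of how they could be reproduced with `np.linalg.cond` follows; the dictionary layout and the compact construction of the Example 6 matrix via `np.triu` are choices made here for brevity.

```python
import numpy as np

# Example 6: 2 on the diagonal, -2 everywhere above it, 0 below.
n = 8
A6 = 2.0 * np.eye(n) - 2.0 * np.triu(np.ones((n, n)), k=1)

examples = {
    1: np.array([[0.2161, 0.1441], [1.2969, 0.8648]]),
    2: np.array([[1.0, 1.0], [1.0, 1.001]]),
    3: np.array([[2.0, 1.0], [1.0, 2.0]]),
    4: np.array([[0.2, 0.1], [0.1, 0.2]]),
    5: 1e-10 * np.array([[2.0, 1.0], [1.0, 2.0]]),
    6: A6,
}

kappas = {}
for label, A in examples.items():
    # With p='fro', cond returns ||A||_E * ||A^{-1}||_E.
    kappas[label] = np.linalg.cond(A, 'fro')
    print("Example", label, ": kappa =", kappas[label])
```

Examples 3, 4 and 5 share the same condition number because they differ only by an overall scale factor, which cancels between $\|\boldsymbol{A}\|$ and $\|\boldsymbol{A}^{-1}\|$.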