In [1]:
using DrWatson;
@quickactivate "NumericalAnalysis"

# Conditioning and Stability

To supplement these notes, we recommend watching the video lectures on:

1. [conditioning](https://www.youtube.com/watch?v=a5oQktSURoE&list=PLvUvOH0OYx3BcZivtXMIwP6hKoYv0YvGn&index=2&t=4s)
2. [stability](https://www.youtube.com/watch?v=GgEPL3_wlDo&list=PLvUvOH0OYx3BcZivtXMIwP6hKoYv0YvGn&index=3)

## The Big Picture

When we study numerical methods, it is important to assess and control for what can go wrong. In order to do this, we will distinguish between the concepts of **problem** (which can often be described abstractly as a mathematical function) and the method of solution, *i.e.*, an **algorithm** for solving the problem. Even though we have yet to define these terms, we will emphasize from the start that

1. **conditioning** is a property of a problem, and 
2. **stability** is a property of an algorithm. 

We start by observing that both a problem and an algorithm are sources of error. A problem typically involves some data; for example, it may require the specification of initial conditions, coefficients, or a matrix. When the problem is entered into a computer, a perturbation occurs even before an algorithm is applied, for instance due to roundoff error. By how much might the solution change as a result of this type of perturbation to the problem? This is the issue that conditioning addresses. If a problem is highly sensitive to perturbations, then the problem is said to be **ill-conditioned**. If a problem is ill-conditioned, then there is likely no algorithm that will perform well in solving the problem, even approximately. 

Similarly, an algorithm will often require the input of some data. For example, an iterative method to approximate the root of a polynomial will require the input of an initial guess. For such an algorithm, what happens if the initial guess is off by some number of digits? Will the error decrease after some number of iterations, or will it grow? If the error grows even when our initial guess is only a small perturbation of the exact solution, then our algorithm is said to be **unstable**. Stable algorithms are always preferred over unstable algorithms. 

In order to proceed, we need to develop some theoretical concepts around stability and conditioning. Before doing so, let's develop some further intuition for these ideas. 

## Developing Intuition

An algorithm designed to solve (perhaps approximately) a problem or class of problems will typically also involve input data. For example, our iterative method

$$x_{n+1} = \frac{1}{2}\left(x_{n} + \frac{a}{x_{n}} \right),$$

for approximating $\sqrt{a}$ requires an initial guess $x_{0}$ to get started. We know that if $x=\sqrt{a}$, then

$$
\begin{align*}
\frac{1}{2}\left(x + \frac{a}{x} \right) &= \frac{1}{2}\left(\sqrt{a} + \frac{a}{\sqrt{a}} \right) \\
&= \frac{1}{2}\left(\sqrt{a} + \sqrt{a} \right) \\
&= \sqrt{a} \\
&= x.
\end{align*}
$$

Thus, $x=\sqrt{a}$ is said to be a **fixed-point** (a concept we will define precisely and discuss in detail later) of the iteration

$$x_{n+1} = \frac{1}{2}\left(x_{n} + \frac{a}{x_{n}} \right).$$

Therefore, if $x_{0} = \sqrt{a}$, then in exact arithmetic every iterate equals the exact solution. What happens if $x_{0} = \sqrt{a + \epsilon}$, where $\epsilon$ is a small perturbation? Later we will prove in a more general setting that this algorithm for approximating $\sqrt{a}$ is stable. 
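Before the proof, it may help to try the perturbed starting point numerically. The following is a minimal sketch, not part of the original notes: the names `babylonian_step` and `stable_root` and the choice $\epsilon = 10^{-3}$ are assumptions made only for illustration.

```julia
# One step of the iteration x_{n+1} = (x_n + a/x_n)/2 for approximating sqrt(a).
babylonian_step(x, a) = (x + a / x) / 2

# Apply the step num_its times, starting from init_guess.
# (Hypothetical helper, named to mirror unstable_root defined later in these notes.)
function stable_root(a, init_guess, num_its)
    x = init_guess
    for _ in 1:num_its
        x = babylonian_step(x, a)
    end
    return x
end

a = 2.0
x0 = sqrt(a + 1e-3)   # perturbed starting point; epsilon = 1e-3 is an illustrative choice
for n in 0:4
    xn = stable_root(a, x0, n)
    println("n = ", n, "  relative error = ", abs(xn - sqrt(a)) / sqrt(a))
end
```

In this experiment the relative error should shrink rapidly from one iteration to the next, which is the behavior we expect from a stable iteration.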

Let's examine an unstable algorithm for approximating $\sqrt{a}$. In the homework, you will be asked to show that $x=\sqrt{a}$ is a fixed-point for the iteration 

$$x_{n+1} = x_{n} + x_{n}^{2} - a.$$

Consider the specific case where $a=2$. Let $x_{0} = 1.414$, and note that this initial guess is accurate to four significant digits. Let's implement this new algorithm in Julia and apply it with this particular initial guess:

In [2]:
f(x,a) = x + x^2 - a;

In [3]:
x = 1.414;
x = f(x,2);
x = f(x,2);
println(x)

1.4110842528159986


Observe that after just two iterations, our error has grown:

In [4]:
initial_rel_error = abs(1.414^2 - 2)/abs(2);
two_it_rel_error = abs(x^2 - 2)/abs(2);
println("Initial relative error ", initial_rel_error);
println("Relative error after two iterations ", two_it_rel_error);

Initial relative error 0.0003020000000001355
Relative error after two iterations 0.004420615727357524


Let's explore this some more by trying different numbers of iterations and/or different initial guesses. To do this, we will write a Julia function that iterates $x_{n+1} = x_{n} + x_{n}^{2} - 2$ some specified number of times.  

In [5]:
function unstable_root(init_guess,num_its)
    x = init_guess;
    for i = 1:num_its
        x = f(x,2);
    end
    return x
end

unstable_root (generic function with 1 method)

In [6]:
y1 = unstable_root(1.414,10);
y2 = unstable_root(1.5,3);
y3 = unstable_root(1.5,10);

Let's look at the results. First we display the value of `y1`, and then we compute the relative errors for all three runs:

In [7]:
y1

-0.627342646471925

In [8]:
y1_rel_err = abs(y1^2 - 2)/abs(2);
y2_rel_err = abs(y2^2 - 2)/abs(2);
y3_rel_err = abs(y3^2 - 2)/abs(2);
println("Relative error ", y1_rel_err);
println("Relative error ", y2_rel_err);
println("Relative error ", y3_rel_err);

Relative error 0.8032206019588006
Relative error 37.04236602783203
Relative error 3.4629837042076306e245


For one last experiment, let's see what happens if we start with $x_{0}$ equal to `sqrt(2)`:

In [9]:
yf = unstable_root(sqrt(2),1);
yf_rel_err = abs(yf^2 - 2)/abs(2);
println("Relative error ", yf_rel_err);

Relative error 1.1102230246251565e-15


**Question:** What do you conclude about the algorithm $x_{n+1} = x_{n} + x_{n}^{2} - a$ when applied to  approximate $\sqrt{2}$? 

Now let's build some intuition regarding conditioning. Suppose we want to add 1 to some real number $x$. We can describe this problem (abstractly) by a function $h(x) = x + 1$. When we solve this on a computer we will have to deal with finite precision arithmetic since $x \mapsto \text{fl}(x) = x(1+\delta)$, where $|\delta|\leq \frac{\epsilon_{\text{mac}}}{2}$. Thus, our problem gets perturbed:

$$x + 1 \mapsto x(1+\delta) + 1.$$ 

Let's compute the resulting relative error:

$$
\begin{align*}
\frac{|h(x(1+\delta))-h(x)|}{|x+1|} &= \frac{|x(1+\delta) + 1 - (x+1)|}{|x+1|} \\
&= \frac{|\delta x|}{|x+1|},
\end{align*}
$$

and this error can be quite large if $|x+1|$ is very small. For example, suppose that $x=-1.0012$ and that we round this to $-1.0$. We can compute the relative error for the input as

In [10]:
abs(-1.0-(-1.0012))/abs(-1.0012)

0.001198561725928975

Now let's add 1.0 and compute the relative error, noting that $-1.0 + 1 = 0$ while $-1.0012 + 1 = -0.0012$:

In [11]:
abs((-1.0012 + 1) - (-1.0 + 1))/abs(-1.0012 + 1)

1.0

We see that we have a **loss of significance** due to **subtractive cancellation**. That is, the problem of adding 1 to a number $x$ when $|x+1|$ is very small is **ill-conditioned**. Regardless of the algorithm used to perform the addition, this loss of significance cannot be avoided. 
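To see how the relative error $\frac{|\delta x|}{|x+1|}$ behaves as $x$ approaches $-1$, here is a small sketch. The helper name `rel_err_h`, the perturbation size `delta = 1e-8`, and the sample values of $x$ are assumptions chosen for illustration; they are not from the notes.

```julia
# The problem of adding one, written as a function.
h(x) = x + 1

# Relative error in h when the input x is replaced by x*(1 + delta).
rel_err_h(x, delta) = abs(h(x * (1 + delta)) - h(x)) / abs(h(x))

delta = 1e-8  # hypothetical relative perturbation of the input
for x in (10.0, -0.5, -1.0012, -1.0000012)
    println("x = ", x, "  relative error in x + 1 = ", rel_err_h(x, delta))
end
```

The same input perturbation should produce a much larger relative error in the output as $x$ gets closer to $-1$.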

Now let's be a little more formal in our treatment of conditioning and stability.

## The Theory of Conditioning

Suppose that we have a (mathematical) problem that can be represented as a function $f:\mathbb{R} \rightarrow \mathbb{R}$. When this problem is treated computationally, the input gets mapped to its finite precision representation, that is, $x \mapsto \tilde{x} = \text{fl}(x)$. We are interested in the ratio of relative errors:

$$\frac{\frac{|f(x) - f(\tilde{x})|}{|f(x)|}}{\frac{|x-\tilde{x}|}{|x|}}.$$

Now, using that $\tilde{x} = \text{fl}(x) = x(1+\delta)$, where $|\delta|\leq \frac{\epsilon_{\text{mac}}}{2}$ we have that

$$
\begin{align*}
\frac{\frac{|f(x) - f(\tilde{x})|}{|f(x)|}}{\frac{|x-\tilde{x}|}{|x|}} &= \frac{\frac{|f(x) - f(x(1+\delta))|}{|f(x)|}}{\frac{|x-x(1+\delta)|}{|x|}} \\
&= \frac{\frac{|f(x) - f(x(1+\delta))|}{|f(x)|}}{\frac{|x\delta|}{|x|}} \\
&= \frac{|f(x) - f(x(1+\delta))|}{|\delta f(x)|}.
\end{align*}
$$

Under the ideal situation, $\delta = 0$. Thus, we ask, what happens as $\delta \rightarrow 0$? This leads us to an important definition. Let

$$\kappa_{f}(x) = \lim_{\delta \rightarrow 0}\frac{|f(x) - f(x(1+\delta))|}{|\delta f(x)|}.$$

Then $\kappa_{f}(x)$ is called the **relative condition number** for the problem $f(x)$.  

It is often possible to compute $\kappa_{f}(x)$ without having to take a limit, provided that $f$ is differentiable. This fact comes from the following calculation:

$$
\begin{align*}
\kappa_{f}(x) &= \lim_{\delta \rightarrow 0}\frac{|f(x) - f(x(1+\delta))|}{|\delta f(x)|} \\
&= \lim_{\delta \rightarrow 0}\left|\frac{f(x(1+\delta)) - f(x)}{\delta x} \frac{x}{f(x)} \right| \\
&= \lim_{\delta \rightarrow 0}\left|\frac{f(x+\delta x) - f(x)}{\delta x} \frac{x}{f(x)} \right| \\
&= \left|\frac{xf'(x)}{f(x)} \right|.
\end{align*}
$$
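The formula $\kappa_{f}(x) = \left|\frac{xf'(x)}{f(x)}\right|$ can be checked numerically by evaluating the ratio of relative errors at a small but nonzero $\delta$. The sketch below is illustrative only: the helper names `relcond_estimate` and `relcond_formula`, the default `delta = 1e-6`, and the test function $f(x) = e^{x}$ (for which the formula reduces to $|x|$) are all assumptions, not part of the notes.

```julia
# Finite-delta estimate of the relative condition number, taken directly from the definition.
relcond_estimate(f, x; delta=1e-6) = abs(f(x * (1 + delta)) - f(x)) / abs(delta * f(x))

# Closed-form value |x f'(x) / f(x)|, with the derivative supplied by hand.
relcond_formula(f, fprime, x) = abs(x * fprime(x) / f(x))

x = 3.0
println("estimate from the definition: ", relcond_estimate(exp, x))
println("value from the formula:       ", relcond_formula(exp, exp, x))
```

The two values should agree to several digits; the small remaining difference comes from using a finite $\delta$ instead of the limit.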

As an example, consider a generalization of our problem to add one to a real number. That is, let $f(x) = x - c$ where $c\in \mathbb{R}$. Note that the problem of adding one is a special case where $c=-1$. Then

$$\kappa_{f}(x) = \left|\frac{x}{x - c}\right|,$$

and this will be large if $|x| \gg |x-c|$. Note that we get something for free here. There is no significant mathematical difference between the operations of addition and subtraction. Furthermore, $x$ and $c$ play interchangeable roles in $f$ (up to a sign), so if we perturb $c$ instead of $x$, the relative condition number will be $\left|\frac{c}{c - x}\right|$. 
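As a quick check, we can compare this formula with the earlier experiment in which $x = -1.0012$ was rounded to $-1.0$ before adding one. This is a sketch; the helper name `kappa_shift` and the variable names are assumptions introduced for illustration.

```julia
# Relative condition number of f(x) = x - c, from the formula |x / (x - c)|.
kappa_shift(x, c) = abs(x / (x - c))

x, c = -1.0012, -1.0   # adding one corresponds to c = -1
xt = -1.0              # the rounded input used in the earlier experiment

input_rel_err  = abs(xt - x) / abs(x)
output_rel_err = abs((xt - c) - (x - c)) / abs(x - c)

println("kappa           = ", kappa_shift(x, c))
println("ratio of errors = ", output_rel_err / input_rel_err)
```

Because $f$ is linear in $x$, the ratio of relative errors equals $\kappa_{f}(x)$ for any perturbation, not only in the limit. Both printed values should be roughly 834, which is how much the input error was amplified.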

**A problem where the relative condition number is much larger than 1 is typically considered ill-conditioned.**

As an exercise, you should analyze the conditioning of the problem of multiplication by a constant $c$. That is, compute $\kappa_{f}(x)$ for $f(x) = cx$. You will also show in the homework that the problem of evaluating the square root function for an input near 1 is well-conditioned. 

As the course proceeds, we will consider the conditioning of many common problems. For example, when we study numerical methods for solving linear systems $Ax=b$, where $A$ is a matrix, we will see that the condition of the problem is determined by the condition of the matrix $A$ (to be defined later). At this point you should have some significant understanding of what conditioning is all about. Next, let's look at the concept of stability in greater detail.  

## The Theory of Stability

Recall (or realize) that an algorithm is a complete specification of how, exactly, to solve a problem; each step of an algorithm must be unambiguously defined and there can be only a finite number of steps. Roughly, an algorithm is stable if it returns results that are about as accurate as the conditioning of the problem and the accuracy of the input data allow. We have already seen that there might be more than one way to solve (or approximate a solution to) a problem. That is, there might be more than one algorithm that can be applied to a particular problem or class of problems. When the error in the result of an algorithm exceeds what conditioning can explain, we call the algorithm **unstable**.

A reasonable question is: how do we assess the stability of an algorithm? A common approach is via the use of **backward error** and backward error analysis. To define this concept, let $f$ be a problem and let $\tilde{f}$ be an algorithm for computing the problem $f$. If our (exact) data is $x$ and $\tilde{y} = \tilde{f}(x)$, and if there is a value $\tilde{x}$ such that 

$$f(\tilde{x}) = \tilde{y} = \tilde{f}(x),$$

then the quantity 

$$\frac{|x-\tilde{x}|}{|x|},$$

is called the relative **backward error**. 

The point is, if an algorithm always produces small backward errors, then it is stable. 
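As a small illustration of the definition, consider the problem $f(a) = \sqrt{a}$. If an algorithm returns $\tilde{y}$, then $\tilde{y}$ is the exact square root of $\tilde{a} = \tilde{y}^{2}$, so the relative backward error is $|a - \tilde{y}^{2}|/|a|$. This is the quantity `abs(x^2 - 2)/abs(2)` that we printed in the experiments with the unstable iteration. The sketch below uses assumed helper names (`sqrt_backward_error`, `sqrt_forward_error`) and two hand-picked values standing in for the output of some algorithm.

```julia
# Relative backward error of a computed square root y_tilde of a:
# y_tilde is the exact square root of a_tilde = y_tilde^2, so compare a_tilde with a.
sqrt_backward_error(y_tilde, a) = abs(a - y_tilde^2) / abs(a)

# Relative forward error, using Julia's sqrt as the reference value.
sqrt_forward_error(y_tilde, a) = abs(y_tilde - sqrt(a)) / abs(sqrt(a))

a = 2.0
for y_tilde in (1.414, 1.5)   # hypothetical outputs of some algorithm
    println("y = ", y_tilde,
            "  backward error = ", sqrt_backward_error(y_tilde, a),
            "  forward error = ", sqrt_forward_error(y_tilde, a))
end
```

Note that the backward error can be evaluated without knowing the exact answer $\sqrt{a}$, which is one reason it is convenient in practice.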

We will illustrate this in the context of rootfinding for polynomials. It is convenient to make use of the Julia package [`Polynomials.jl`](https://github.com/JuliaMath/Polynomials.jl). 

In [12]:
using Polynomials

We will define a degree-six polynomial with six roots (counted with multiplicity) that we know exactly:

In [13]:
r = [-2.0,-1,1,1,3,6] # list the roots
p = fromroots(r) # construct the polynomial

In [14]:
r_computed = sort(roots(p)) # numerically compute and sort the computed roots

6-element Vector{Float64}:
 -2.0000000000000013
 -0.999999999999999
  0.9999999902778504
  1.0000000097221495
  2.9999999999999996
  5.999999999999992

We can easily compute the relative error for each of the computed roots:

In [15]:
abs.(r - r_computed) ./ abs.(r)

6-element Vector{Float64}:
 6.661338147750939e-16
 9.992007221626409e-16
 9.722149640900568e-9
 9.722149529878266e-9
 1.4802973661668753e-16
 1.3322676295501878e-15

Now let's compute the backward error:

In [16]:
p_computed = fromroots(r_computed) # form a polynomial using the computed (estimated) roots of the original polynomial

To assess the backward error, take note that the relevant data in rootfinding are the polynomial coefficients. Thus, we will examine the relative errors between the coefficients of the two polynomials:

In [17]:
abs.(coeffs(p) - coeffs(p_computed)) ./ abs.(coeffs(p))

7-element Vector{Float64}:
 1.973729821555834e-15
 2.3684757858670005e-15
 6.609699867535816e-16
 4.844609562000683e-16
 4.440892098500626e-15
 1.1102230246251565e-15
 0.0

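We see that, even though some of the computed roots are relatively far from the exact values, they are nevertheless the exact roots of a polynomial whose coefficients are very close to the coefficients of the original polynomial. In other words, the backward error is small even where the forward error is not.  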
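One way to build intuition for why the two computed roots near $1$ have errors around $10^{-8}$ is to look at a simplified model of a double root. This model is an assumption made for illustration and is not the full polynomial above: the quadratic $(x-1)^{2}$ perturbed to $(x-1)^{2} - \epsilon$ has roots $1 \pm \sqrt{\epsilon}$. The helper name `perturbed_double_roots` is hypothetical.

```julia
# Roots of the perturbed quadratic (x - 1)^2 - epsilon = 0 are 1 - sqrt(epsilon) and 1 + sqrt(epsilon).
perturbed_double_roots(epsilon) = (1 - sqrt(epsilon), 1 + sqrt(epsilon))

for epsilon in (1e-8, 1e-12, 1e-16)
    lo, hi = perturbed_double_roots(epsilon)
    println("epsilon = ", epsilon, "  shift of the upper root = ", hi - 1)
end
```

In this model a perturbation of size $10^{-16}$, comparable to roundoff, moves the roots by about $10^{-8}$. This suggests that the larger forward error at the repeated root reflects the conditioning of the problem rather than a defect of the algorithm.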

## Up Next

In the next lecture we will begin to study numerical methods for systems of linear equations, beginning with square linear systems, which corresponds to Chapter 2 of the textbook. If you feel a bit rusty on linear algebra, it is suggested that you study [this matrix algebra review video](https://www.youtube.com/watch?v=bRM3zrzZYg8&list=PLvUvOH0OYx3BcZivtXMIwP6hKoYv0YvGn&index=5). 