# Review-01-Scientific-Computing

Answers to review questions from Chapter 1: Scientific Computing <cite data-cite="heath2018scientific">(Heath, 2018)</cite>.

---
Questions marked with $\bigtriangledown$ are considered more difficult.

> 1.1. True or false: A problem is ill-conditioned if its solution is highly sensitive to small changes in the problem data.

True.  A problem is ill-conditioned if a small relative change in the input data can cause a much larger relative change in the solution.

> 1.2. True or false: Using higher-precision arithmetic will make an ill-conditioned problem better conditioned.

False. The condition number is the ratio of the relative forward error to the relative backward error.  It is a property of the problem itself and does not depend on the precision of the arithmetic used.


> 1.3. True or false: The conditioning of a problem depends on the algorithm used to solve it.

False.  Conditioning refers to data.  Stability refers to algorithms.

> 1.4. True or false: A good algorithm will produce an accurate solution regardless of the condition of the problem being solved.

False. For an ill-conditioned problem, even a stable algorithm can produce an inaccurate solution, since a small backward error may still correspond to a large forward error.

> 1.5. True or false: The choice of algorithm for solving a problem has no effect on the propagated data error.

True.  The propagated data error = $f(\hat{x}) - f(x)$ and compares the result of the true function using approximate and true input, hence ignoring the choice of algorithm. 

> 1.6. True or false: A stable algorithm applied to a well-conditioned problem necessarily produces an accurate solution.

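True.  A stable algorithm yields the exact solution to a nearby problem (small backward error), and for a well-conditioned problem a small backward error implies a small forward error, so the computed solution is accurate.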

> 1.7. True or false: If two real numbers are exactly representable as floating-point numbers, then the result of a real arithmetic operation on them will also be representable as a floating-point number.

False.  There is no such guarantee.
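
A minimal sketch of a counterexample, assuming IEEE double precision as provided by Python's `float`. The variable names are illustrative. `Fraction` recovers the exact rational value stored in a float, so the rounded quotient can be compared with the true one.

```python
from fractions import Fraction

a, b = 1.0, 3.0                      # both are exactly representable
exact = Fraction(a) / Fraction(b)    # the real-arithmetic quotient, exactly 1/3
computed = Fraction(a / b)           # exact value of the rounded floating-point quotient

quotient_is_representable = (computed == exact)
print(quotient_is_representable)     # False: 1/3 has no finite binary expansion

# Some results are representable, so the failure is not universal.
sum_is_representable = (Fraction(0.5 + 0.25) == Fraction(0.5) + Fraction(0.25))
print(sum_is_representable)          # True
```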

> 1.8. True or false: Floating-point numbers are distributed uniformly throughout their range.

False.  Floating point numbers are **not** distributed uniformly.
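
The non-uniform spacing can be inspected directly. This is a small sketch assuming IEEE double precision and Python 3.9 or later, where `math.ulp(x)` returns the gap between `x` and the next larger float.

```python
import math

# Gap to the next representable number at several magnitudes.
points = (1.0, 2.0, 1024.0, 1e16)
gaps = {x: math.ulp(x) for x in points}

for x, gap in gaps.items():
    print(f"x = {x:>8g}   spacing = {gap:g}")

# The spacing doubles each time x crosses a power of 2.
spacing_doubles = (math.ulp(2.0) == 2 * math.ulp(1.0))
```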

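> 1.9. True or false: Floating-point addition is associative but not commutative.

False.  Floating point addition is commutative, but **not** associative, e.g., $(1 + \epsilon) + \epsilon \neq 1 + (\epsilon + \epsilon)$ when $\epsilon$ is a positive number slightly smaller than the unit roundoff: each addition on the left rounds back to $1$, whereas $2\epsilon$ is large enough to perturb $1$ on the right.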
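
A quick check of this example, assuming IEEE double precision with round-to-nearest, ties-to-even (the default for Python's `float`). Here `u` is taken to be exactly the unit roundoff $2^{-53}$, so `1.0 + u` is a tie that rounds back to `1.0`.

```python
u = 2.0 ** -53                 # unit roundoff for IEEE double (assumed format)

left = (1.0 + u) + u           # each addition rounds back to 1.0
right = 1.0 + (u + u)          # u + u = 2**-52 is exactly the gap above 1.0

print(left == right)           # False: addition is not associative
print(0.1 + 0.2 == 0.2 + 0.1)  # True: addition is commutative
```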

> 1.10. True or false: In a floating-point number system, the underflow level is the smallest positive number that perturbs the number 1 when added to it.

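False.  That describes the machine epsilon $\epsilon$, the smallest number such that $\text{fl}(1 + \epsilon) > 1$.  The underflow level is the smallest positive normalized floating-point number; it is determined by the exponent range and is far smaller than $\epsilon$.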
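
The two quantities can be compared in Python, assuming IEEE double precision. Note that conventions for "machine epsilon" differ: `sys.float_info.epsilon` is the gap between 1.0 and the next float ($\beta^{1-p} = 2^{-52}$), which is twice the round-to-nearest unit roundoff.

```python
import sys

eps = sys.float_info.epsilon   # gap between 1.0 and the next float, 2**-52
ufl = sys.float_info.min       # smallest positive normalized float, 2**-1022

eps_perturbs_one = (1.0 + eps > 1.0)   # True
ufl_perturbs_one = (1.0 + ufl > 1.0)   # False: far too small to change 1.0

print(eps, ufl)
print(eps_perturbs_one, ufl_perturbs_one)
```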


> 1.11. True or false: The mantissa in IEEE double precision floating-point arithmetic is exactly twice the length of the mantissa in IEEE single precision.

False.  The mantissa in IEEE single precision is 24 bits and in IEEE double precision is 53 bits.

> 1.12. What three properties characterize a well-posed problem?

1. solution exists
2. solution is unique
3. solution depends continuously on the problem data (e.g., small changes in the input do not cause jumps in the solution)

> 1.13. List three sources of error in scientific computation.

1. computational error
  * truncation error
  * rounding error 
2. data error
  * approximations

> 1.14. Explain the distinction between truncation (or discretization) and rounding.

Truncation error is caused by mathematical approximations, such as truncating an infinite series or replacing a continuous problem with a discrete one.

Rounding error is caused by the inexact representation of real numbers, and of arithmetic operations on them, in the finite-precision floating point system used by a computer.

> 1.15. Explain the distinction between absolute error and relative error.

absolute error = approximate value - true value

relative error = absolute error / true value

The **relative error** is required in order to interpret the magnitude of an error in context of the problem.
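
A small illustration of why the relative error gives context. The helper `errors` and the sample values are hypothetical, chosen only to show the same absolute error at two different scales.

```python
def errors(approx, true):
    """Return (absolute error, relative error) of approx with respect to true."""
    abs_err = approx - true
    return abs_err, abs_err / true

small = errors(11.0, 10.0)                # absolute error 1 on a value near 10
large = errors(1_000_001.0, 1_000_000.0)  # absolute error 1 on a value near 10**6

print(small)   # same absolute error, relative error about 10%
print(large)   # same absolute error, relative error about 0.0001%
```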

> 1.16. Explain the distinction between computational error and propagated data error.

computational error = $\hat{f}(\hat{x}) - f(\hat{x})$

propagated data error = $f(\hat{x}) - f(x)$

The computational error is the difference between the approximating function and the true function, both evaluated at the same (approximate) input.

The propagated data error is the difference between the true function evaluated at the approximate input and at the true input.
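
The decomposition can be checked numerically. The sketch below uses an illustrative setup that is not taken from the answers above: $f = \sin$, true input $x = \pi/8$, approximate input $\hat{x} = 3/8$, and the approximating function $\hat{f}(x) = x$ (the first term of the Taylor series).

```python
import math

def f(t):
    """True function."""
    return math.sin(t)

def f_hat(t):
    """Approximating function: first Taylor term of sin (an assumption for this sketch)."""
    return t

x = math.pi / 8      # true input
x_hat = 3 / 8        # approximate input

computational = f_hat(x_hat) - f(x_hat)   # same input, different functions
propagated = f(x_hat) - f(x)              # same function, different inputs
total = f_hat(x_hat) - f(x)

print(computational, propagated, total)   # total is the sum of the two parts
```

In this example the two errors have opposite signs and partly offset each other, so the total error is smaller in magnitude than the propagated data error alone.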

---


> 1.17. Explain the distinction between precision and accuracy.

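Precision refers to the number of digits with which a number is expressed.

Accuracy refers to the number of correct significant digits, i.e., how close a computed solution is to the true solution.  For example, 3.252603764690804 is a very precise but not very accurate approximation to $\pi$.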

> 1.18. (a) What is meant by the conditioning of a problem?
(b) Is it affected by the algorithm used to solve the problem?
(c) Is it affected by the precision of the arithmetic used to solve the problem?

(a) Conditioning refers to data and is the ratio of the relative forward error to the relative backward error.  Values of this ratio which are much larger than 1 indicate an ill-conditioned problem.

(b) Conditioning refers to data, **not** algorithms.

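(c) No. Conditioning is a property of the problem and is independent of the precision of the arithmetic.  Precision affects the rounding error, and hence the accuracy of the computed solution, but not the condition number.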

> 1.19. If a computational problem has a condition number of 1, is this good or bad? Why?

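Good. A condition number of 1 means that a relative change in the input produces a relative change of about the same size in the solution, so errors in the data are not amplified.  If $\text{cond} \gg 1$, we say the problem is ill-conditioned.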
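
A rough numerical estimate of the condition number of evaluating a function, using the ratio of relative changes. The helper `cond_estimate` and the perturbation size `rel_dx` are assumptions of this sketch; for a differentiable $f$ the analytic value is $\left| x f'(x) / f(x) \right|$.

```python
import math

def cond_estimate(f, x, rel_dx=1e-8):
    """Estimate cond as (relative change in output) / (relative change in input)."""
    x_hat = x * (1.0 + rel_dx)
    rel_forward = abs((f(x_hat) - f(x)) / f(x))
    rel_backward = abs((x_hat - x) / x)
    return rel_forward / rel_backward

cond_sqrt = cond_estimate(math.sqrt, 2.0)     # analytic value is 1/2: well-conditioned
cond_tan = cond_estimate(math.tan, 1.57079)   # x close to pi/2: ill-conditioned

print(cond_sqrt, cond_tan)
```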

> 1.20. Explain the distinction between relative condition number and absolute condition number.

The relative condition number is the ratio of the relative change in the solution to the relative change in the input.  Since it has the input or output in a denominator, it is undefined when either is 0.  In such cases, use the absolute condition number, which is defined as the ratio of the absolute change in the solution to the absolute change in the input.  The absolute condition number is useful in some kinds of problems such as root finding where the solution is expected to be 0.

> 1.21. What is an inverse problem? How are the conditioning of a problem and its inverse related?

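An inverse problem asks for the input $x$ that produces a given output $y = f(x)$, i.e., it evaluates $x = f^{-1}(y)$.  The condition number of the inverse problem is the reciprocal of the condition number of the original problem: for function evaluation $\text{cond}(f) = \left| \frac{x f'(x)}{f(x)} \right|$, and for the inverse it is $\left| \frac{f(x)}{x f'(x)} \right|$.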

> 1.22. (a) What is meant by the backward error in a computed result?
(b) When is an approximate solution to a given problem considered to be good according to backward error analysis?

(a) Backward error = approximate input - true input

(b) An approximate solution is good when it is an exact solution to a problem having a small backward error (referred to as "nearby").

> 1.23. Suppose you are solving a given problem using a given algorithm. For each of the following, state whether it is affected by the stability of the algorithm, and why.
(a) Propagated data error
(b) Accuracy of computed result
(c) Conditioning of problem

(a) Propagated data error relates the result obtained from the true function using approximate and true input and is **not** related to stability.

(b) Stability is concerned with the computational error which can affect the accuracy of the computed result, either through the introduction of truncation or rounding error.

(c) Not affected. Conditioning is a property of the problem, whereas stability is a property of the algorithm; the two are independent.

> 1.24. (a) Explain the distinction between forward error and backward error.
(b) How are forward error and backward error related to each other quantitatively?

(a) Forward error is the difference between the computed result and true result.   Backward error is the difference between the approximate input and the true input.

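(b) They are related through the condition number, which is the ratio of the relative forward error to the relative backward error: $|\text{relative forward error}| \lesssim \text{cond} \times |\text{relative backward error}|$.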
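
A worked example, chosen here for illustration: approximating $\sqrt{2}$ by $1.4$. The backward error is found by asking for which input $1.4$ would be the exact answer, namely $1.4^2 = 1.96$. The library value `math.sqrt(2.0)` stands in for the true result.

```python
import math

x = 2.0
y = math.sqrt(x)        # stand-in for the true result
y_hat = 1.4             # approximate result

forward = y_hat - y     # forward error, about -0.0142
x_hat = y_hat ** 2      # input for which y_hat is the exact answer, about 1.96
backward = x_hat - x    # backward error, about -0.04

rel_forward = abs(forward / y)
rel_backward = abs(backward / x)
ratio = rel_forward / rel_backward   # close to 1/2, the condition number of sqrt

print(rel_forward, rel_backward, ratio)
```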

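> 1.25. For a given floating-point number system, describe in words the distribution of machine numbers along the real line.

Floating point numbers are finite and discrete.  They are equally spaced only between successive powers of the base $\beta$; the spacing grows by a factor of $\beta$ at each power, so machine numbers become sparser as their magnitude increases.

This is in contrast to real numbers which are infinite and continuous.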
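
To make the distribution concrete, the sketch below enumerates a hypothetical toy system with base $\beta = 2$, precision $p = 3$, and exponent range $[-1, 1]$ (normalized numbers only, plus zero). These parameters are assumptions chosen to keep the list short.

```python
beta, p, L, U = 2, 3, -1, 1   # toy system parameters (assumed for illustration)

# Normalized mantissas are m / beta**(p-1) with beta**(p-1) <= m < beta**p.
positives = sorted(
    (m / beta ** (p - 1)) * float(beta) ** e
    for m in range(beta ** (p - 1), beta ** p)
    for e in range(L, U + 1)
)
gaps = [b - a for a, b in zip(positives, positives[1:])]
count = 2 * len(positives) + 1   # positives, negatives, and zero

print(positives)
print(gaps)      # constant within [0.5, 1), [1, 2), [2, 4), doubling between them
print(count)
```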

> 1.26. In floating-point arithmetic, which is generally more harmful, underflow or overflow? Why?

Overflow is generally more harmful since there is no way to approximate a quantity with an arbitrarily large magnitude.

In contrast, 0 is often a good approximation for underflow.

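> 1.27. In floating-point arithmetic, which of the following operations on two positive floating-point operands can produce an overflow?
(a) Addition
(b) Subtraction
(c) Multiplication
(d) Division

Addition, multiplication, and division can produce overflow.  Subtraction of two positive operands cannot, since the magnitude of the result is no larger than the larger operand.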

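> 1.28. In floating-point arithmetic, which of the following operations on two positive floating-point operands can produce an underflow?
(a) Addition
(b) Subtraction
(c) Multiplication
(d) Division

Multiplication and division can produce underflow.  Addition of two positive operands cannot, since the sum is at least as large as either operand.  Subtraction of two nearly equal operands can yield a difference below the underflow level, which is flushed to zero in a system without gradual underflow.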
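
Overflow and underflow with two positive operands can be tried out directly. This sketch assumes IEEE double precision with gradual underflow, as used by Python's `float`, and uses only the `+`, `-`, `*`, and `/` operators, which return `inf` or `0.0` rather than raising an exception in these cases.

```python
import sys

big = sys.float_info.max    # largest finite double
tiny = sys.float_info.min   # smallest positive normalized double
inf = float("inf")

# Overflow with two positive operands.
add_overflows = (big + big == inf)
mul_overflows = (big * 2.0 == inf)
div_overflows = (big / 0.5 == inf)

# Underflow with two positive operands.
mul_underflows = (tiny * tiny == 0.0)
div_underflows = (tiny / big == 0.0)
add_underflows = (tiny + tiny == 0.0)   # False: the sum is larger than either operand

# Subtraction can land below the underflow level; with gradual underflow
# the result is a nonzero subnormal instead of being flushed to zero.
diff = 1.5 * tiny - tiny
diff_is_subnormal = (0.0 < diff < tiny)

print(add_overflows, mul_overflows, div_overflows)
print(mul_underflows, div_underflows, add_underflows, diff_is_subnormal)
```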


> 1.29. List two reasons why floating-point number systems are usually normalized.

* Makes the representation of each number unique.
* Eliminates leading zeros, maximizing available precision.

> 1.30. In a floating-point system, what quantity determines the maximum relative error in representing a given real number by a machine number?

The unit roundoff $\epsilon_{\text{mach}}$ bounds the relative error in representing a nonzero real number $x$ within the range of the system: $\left| \frac{\text{fl}(x) - x}{x} \right| \leq \epsilon_{\text{mach}}$.

> 1.31. (a) Explain the difference between the rounding rules “round toward zero” and “round to nearest” in a floating-point system.
(b) Which of these two rounding rules is more accurate?
(c) What quantitative difference does this make in the unit roundoff $\epsilon_{\text{mach}}$?

(a) Round-toward-zero chops (truncates) the digits beyond the $p$th, whereas round-to-nearest picks the closest representable number.

(b) Round-to-nearest is more accurate.

(c) The unit roundoff using round-to-nearest, $\frac{1}{2} \beta^{1-p}$, is half of the unit roundoff using round-toward-zero, $\beta^{1-p}$.

> 1.32. In a p-digit binary floating-point system with rounding to nearest, what is the value of the unit roundoff $\epsilon_{\text{mach}}$?

$
\epsilon_{\text{mach}} = \frac{1}{2} \beta^{1-p} = 2^{-p} \quad \text{for } \beta = 2
$
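
The formula can be evaluated for the host's floating point format. The sketch assumes Python's `float` is IEEE double precision with round-to-nearest, and uses the decimal constant $1/10$ as an arbitrary test value for the relative error bound.

```python
import sys
from fractions import Fraction

beta = sys.float_info.radix        # base, 2 for IEEE double
p = sys.float_info.mant_dig        # precision in base-beta digits, 53 for IEEE double
u = 0.5 * float(beta) ** (1 - p)   # unit roundoff for round-to-nearest

# Relative error of storing the real number 1/10 as a float, computed exactly.
x = Fraction(1, 10)
rel_err = abs(Fraction(0.1) - x) / x
within_bound = (rel_err <= Fraction(u))

print(u, float(rel_err), within_bound)
```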


> 1.33. In a floating-point system with gradual underflow (subnormal numbers), is the representation of each number still unique? Why?

Yes.  A leading digit of zero is allowed only when the exponent is at its minimum value (a particular bit pattern in the exponent field identifies the subnormals), so no number has two representations.

> 1.34. In a floating-point system, is the product of two machine numbers usually exactly representable in the floating-point system? Why?

No.  The exact product of two $p$-digit mantissas can have up to $2p$ digits, so it generally must be rounded to fit in $p$ digits.

> 1.35. In a floating-point system, is the quotient of two nonzero machine numbers always exactly representable in the floating-point system? Why?

No.  The exact quotient may need more than $p$ digits, or even infinitely many; e.g., $1/3$ has no finite binary expansion.

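> 1.36. (a) Give an example to show that floating-point addition is not necessarily associative.
(b) Give an example to show that floating-point multiplication is not necessarily associative.

(a) $(1 + \epsilon) + \epsilon \neq 1 + (\epsilon + \epsilon)$, where $\epsilon$ is a positive number slightly smaller than the unit roundoff: each addition on the left rounds back to $1$, whereas the right side exceeds $1$.

(b) $(\frac{1}{2} \cdot \text{max}) \cdot 2 \neq \frac{1}{2} \cdot (\text{max} \cdot 2)$, where max is the largest finite floating point number: the left side equals max, whereas $\text{max} \cdot 2$ overflows on the right.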
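
A minimal sketch of a multiplication example based on overflow, assuming IEEE double precision (Python's `float`): the grouping decides whether an intermediate result exceeds the largest finite number.

```python
import sys

big = sys.float_info.max       # largest finite double

left = (0.5 * big) * 2.0       # halving first keeps every intermediate finite
right = 0.5 * (big * 2.0)      # big * 2.0 overflows to inf before the halving

print(left == big)             # True
print(right)                   # inf
```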

> 1.37. (a) In what circumstances does cancellation occur in a floating-point system?
(b) Does the occurrence of cancellation imply that the true result of the specific operation causing it is not exactly representable in the floating-point system? Why?
(c) Why is cancellation usually bad?

(a) Cancellation occurs when subtracting two numbers of the same sign and similar magnitude: the leading (most significant) digits agree and cancel.

(b) No.  The difference of two nearly equal floating point numbers has fewer significant digits than the operands and is usually exactly representable.

(c) Cancellation is bad because the most significant digits are lost, so any rounding or data error already present in the operands becomes large relative to the result.  Compare this to rounding in which the least significant digits are lost.
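
A small demonstration, assuming IEEE double precision. The value `1e-15` is an arbitrary choice for this sketch. The subtraction itself is exact, yet the result is badly wrong in a relative sense, because the rounding error committed in the earlier addition is all that remains after the leading digits cancel.

```python
from fractions import Fraction

x = 1e-15
s = 1.0 + x      # a rounding error is committed here
d = s - 1.0      # leading digits cancel

# The subtraction introduces no new error: d is the exact difference of s and 1.
subtraction_is_exact = (Fraction(s) - 1 == Fraction(d))

# But relative to x, the earlier rounding error is now large.
rel_err = abs(d - x) / x

print(subtraction_is_exact)   # True
print(rel_err)                # roughly 0.11, i.e., only about one correct digit
```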

> 1.38. Give an example of a number whose decimal representation is finite (i.e., it has only a finite number of nonzero digits) but whose binary representation is not.

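1/10: its decimal representation is $0.1$, but its binary representation $0.0\overline{0011}_2$ repeats forever, so it is not exactly representable in binary.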
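
This can be confirmed by inspecting the value actually stored for `0.1`, assuming Python's `float` is a binary (IEEE double) format. `Fraction` and `Decimal` both convert a float to its exact stored value.

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(0.1)                     # exact rational value of the float 0.1
is_exact = (stored == Fraction(1, 10))     # False: the stored value is only close to 1/10

# Every binary float is an integer divided by a power of 2.
den = stored.denominator
denominator_is_power_of_two = (den & (den - 1) == 0)

print(is_exact, denominator_is_power_of_two)
print(Decimal(0.1))                        # full decimal expansion of the stored value
```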

> 1.39. Give examples of floating-point arithmetic operations that would produce each of the exceptional values Inf and NaN.

Inf: dividing a nonzero finite number by 0, e.g., 1/0 or 2/0, or any operation whose result overflows.

NaN: an undefined or indeterminate operation, e.g., 0/0, 0 * Inf, Inf - Inf, or Inf/Inf.
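
A short Python sketch of these exceptional values, assuming IEEE double precision. One caveat specific to Python: the built-in `float` raises `ZeroDivisionError` for division by zero instead of returning Inf or NaN, so overflow is used here to produce Inf.

```python
import math
import sys

inf_from_overflow = sys.float_info.max * 2.0   # result too large: Inf
nan_from_inf_minus_inf = math.inf - math.inf   # indeterminate: NaN
nan_from_zero_times_inf = 0.0 * math.inf       # indeterminate: NaN

# Python-specific behavior: float division by zero raises an exception.
try:
    1.0 / 0.0
    division_raised = False
except ZeroDivisionError:
    division_raised = True

print(inf_from_overflow, nan_from_inf_minus_inf, nan_from_zero_times_inf)
print(division_raised)
```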