# Signal processing course 2018/2019-1 @ ELTE
# Assignment 1
## 09.17.2018

## Task 5
### Floating-point arithmetic

##### Representation

We can represent floating point numbers [as follows](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html):

$$
\hat{z} = \underbrace{\pm d_{0}.d_{1}d_{2} \cdots d_{p-1}}_\text{Mantissa/Significand} \times {\underbrace{\beta}_\text{Base}}^{\overbrace{e}^\text{Exponent}} = \pm \left( \sum^{p-1}_{i=0}  d_{i} \beta^{-i} \right) \beta^{e}
$$

where $p$ is the precision (the number of digits in the significand), $i$ is the position of a significand digit counted from the left (starting at 0), $e$ is the exponent, and $0 \leq d_{i} < \beta$.
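To make the digit expansion concrete, here is a minimal sketch that extracts the sign, the digits $d_i$ and the exponent $e$ of a number for a given base and precision. The helper name `to_digits` is hypothetical and not part of the assignment; it truncates instead of rounding, and it assumes exact rational input (`fractions.Fraction`) so that the illustration is not itself disturbed by binary rounding.

```python
from fractions import Fraction

def to_digits(x, beta, p):
    """Return (sign, digits, e) such that |x| is approximated by
    (sum of digits[i] * beta**-i) * beta**e, truncated to p digits.
    Hypothetical helper for illustration only."""
    x = Fraction(x)
    if x == 0:
        return 1, [0] * p, 0
    sign = -1 if x < 0 else 1
    m, e = abs(x), 0
    # Normalize so that 1 <= m < beta
    while m >= beta:
        m /= beta
        e += 1
    while m < 1:
        m *= beta
        e -= 1
    digits = []
    for _ in range(p):
        d = int(m)            # leading digit, 0 <= d < beta
        digits.append(d)
        m = (m - d) * beta    # shift the next digit into the leading position
    return sign, digits, e

print(to_digits(Fraction(1, 10), 10, 3))
print(to_digits(Fraction(1, 10), 2, 4))
```

With $\beta = 10$ and $p = 3$, the value $0.1$ yields the digits `[1, 0, 0]` and $e = -1$, that is $1.00 \times 10^{-1}$. With $\beta = 2$ and $p = 4$ it yields `[1, 1, 0, 0]` and $e = -4$, which is only an approximation of $0.1$.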

Two other parameters associated with floating-point representations are the largest and smallest allowable exponents, $e_{max}$ and $e_{min}$. Since there are $\beta^{p}$ possible significands, and $e_{max} - e_{min} + 1$ possible exponents, a floating-point number can be encoded in

$$
1 + \lceil \log_2(e_{max} - e_{min} + 1) \rceil + \lceil \log_2(\beta^{p}) \rceil
$$

bits, where the leading $1$ accounts for the sign bit.
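As a quick check of this bit count, the sketch below evaluates the formula with each logarithm rounded up to a whole number of bits. The function names `ceil_log2` and `encoding_bits` are hypothetical, and the example parameters ($\beta = 2$, $p = 24$, $e_{min} = -126$, $e_{max} = 127$) are assumed here as single-precision-like values, following the linked reference.

```python
def ceil_log2(n):
    """Smallest k with 2**k >= n for an integer n >= 1 (exact, no floats)."""
    return (n - 1).bit_length()

def encoding_bits(beta, p, e_min, e_max):
    """Bits needed for sign + exponent + significand (hypothetical helper)."""
    sign_bits = 1
    exponent_bits = ceil_log2(e_max - e_min + 1)
    significand_bits = ceil_log2(beta ** p)
    return sign_bits + exponent_bits + significand_bits

# Assumed single-precision-like parameters: 1 + 8 + 24 bits
print(encoding_bits(2, 24, -126, 127))
```

The count comes out as 33 rather than 32: binary formats such as IEEE 754 single precision save one bit because the leading significand digit of a normalized binary number is always 1 and does not have to be stored.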

***

##### Precision

Machine precision is a quantity that characterizes the accuracy of a floating-point system, and is used in backward error analysis of floating-point algorithms. It is also known as unit roundoff or machine epsilon. It is usually denoted $\varepsilon_{mach}$, and its value depends on the rounding mode being used.

With rounding toward zero,

$$
\varepsilon_{mach} = \beta^{1-p}
$$

whereas with rounding to nearest,

$$
\varepsilon_{mach} = \frac{1}{2} \beta^{1-p}
$$

This quantity is important because it bounds the relative error in representing any non-zero real number $x$ within the normalized range of a floating-point system.
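The two definitions can be compared against the parameters Python reports for its own `float` type. This is a small sketch, assuming the standard library's `sys.float_info` (fields `radix`, `mant_dig` and `epsilon`); the function name `machine_epsilon` and its `rounding` argument are hypothetical.

```python
import sys

def machine_epsilon(beta, p, rounding="nearest"):
    """Machine epsilon for base beta and precision p (hypothetical helper).
    rounding: "zero" gives beta**(1-p), "nearest" gives half of that."""
    gap = float(beta) ** (1 - p)
    if rounding == "zero":
        return gap
    if rounding == "nearest":
        return 0.5 * gap
    raise ValueError(f"unknown rounding mode: {rounding}")

beta = sys.float_info.radix      # base of the native float type
p = sys.float_info.mant_dig      # number of significand digits
print(machine_epsilon(beta, p, "zero"), sys.float_info.epsilon)
print(machine_epsilon(beta, p, "nearest"))
```

Note that `sys.float_info.epsilon` is defined as the gap between 1.0 and the next representable float, so it corresponds to $\beta^{1-p}$; the round-to-nearest value is half of it.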

To compute the relative error that corresponds to 0.5 ULP (unit in the last place or unit of least precision), observe that when a real number is approximated by the closest possible floating-point number

$$
d.dd \cdots dd \times \beta^{e}
$$

the error can be as large as

$$
0.00 \cdots 00\beta' \times \beta^{e}
$$

where $\beta'$ is the digit $\frac{\beta}{2}$, there are $p$ digits in the significand of the floating-point number, and $p$ zeros in the significand of the error. This error is

$$
\left( \left( \frac{\beta}{2} \right) \times \beta^{-p} \right) \times \beta^{e} = \frac{1}{2} \beta^{1-p} \times \beta^{e}
$$

Since numbers of the form

$$
d.dd \cdots dd \times \beta^{e}
$$

all have the same absolute error, but have values that range between $\beta^{e}$ and $\beta \times \beta^{e}$, the relative error ranges between

$$
\left( \left( \frac{\beta}{2} \right) \times \beta^{-p} \right) \times \frac{\beta^{e}}{\beta^{e + 1}} = \frac{1}{2} \beta^{-p}
$$

and

$$
\left( \left( \frac{\beta}{2} \right) \times \beta^{-p} \right) \times \frac{\beta^{e}}{\beta^{e}} = \frac{1}{2} \beta^{1-p}
$$

That is

$$
\frac{1}{2} \beta^{-p} \leq \Delta_{0.5ULP} \leq \frac{1}{2} \beta^{1-p}
$$

Taking the larger of these two bounds as the relative error,

$$
\varepsilon = \frac{1}{2} \beta^{1-p}
$$

we can say that when a real number is rounded to the closest floating-point number, the relative error is always bounded by $\varepsilon$, which is referred to as machine epsilon.
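The bound can also be checked numerically. The following sketch rounds exact rational numbers to $p$ digits in base $\beta$ and compares the resulting relative error with $\frac{1}{2} \beta^{1-p}$. The helper names `round_to_precision` and `relative_error` are hypothetical, the sample values $k/997$ are arbitrary, and ties are rounded to even, which is an assumption about the rounding rule.

```python
from fractions import Fraction

def round_to_precision(x, beta, p):
    """Round x to the nearest number with p significand digits in base beta.
    Hypothetical helper; ties go to the even neighbour."""
    x = Fraction(x)
    if x == 0:
        return x
    m, e = abs(x), 0
    # Find the exponent e with beta**e <= |x| < beta**(e + 1)
    while m >= beta:
        m /= beta
        e += 1
    while m < 1:
        m *= beta
        e -= 1
    ulp = Fraction(beta) ** (e - p + 1)   # value of one unit in the last place
    return round(x / ulp) * ulp

def relative_error(x, beta, p):
    """Exact relative error made by rounding a non-zero x."""
    x = Fraction(x)
    return abs(round_to_precision(x, beta, p) - x) / abs(x)

beta, p = 10, 3
eps = Fraction(1, 2) * Fraction(beta) ** (1 - p)
worst = max(relative_error(Fraction(k, 997), beta, p) for k in range(1, 2000))
print(float(worst), "<=", float(eps))
```

Exact fractions are used so that the measured error comes only from the simulated rounding and not from Python's own binary floats.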

The absolute error $\Delta$ of a quantity $\hat{z}$ can then be estimated from the relative error $\varepsilon$ as follows:

$$
\Delta = \varepsilon \times |\hat{z}|
$$

In [None]:
import numpy as np

### Display a number in scientific notation

In [None]:
num_test = -3 * np.sqrt(7)

In [None]:
def scientific(num):
    # Format the number in exponent notation with six decimal places
    num_str = f'{num:.6e}'
    # Split the string into the coefficient and exponent parts
    c, p = num_str.split('e')
    # Return the formatted string; int() drops the exponent's sign padding
    return f'{c}*10^{int(p)}'

In [None]:
print(f'Simple notation: {num_test}')
print(f'Scientific (normal) notation: {scientific(num_test)}')

### Calculate relative and absolute errors

In [None]:
def calc_eps_error(beta, p):
    # Machine epsilon for rounding to nearest: 1/2 * beta^(1 - p)
    return 0.5 * float(beta) ** (1 - p)

#### First variation

In [None]:
beta, p = 10, 2
eps_1 = calc_eps_error(beta, p)
print(f'Relative error in the first case: {eps_1}')

In [None]:
delta_1 = np.abs(eps_1 * num_test)
print(f'Absolute error in the first case: {delta_1}')

#### Second variation

In [None]:
beta, p = 16, 4
eps_2 = calc_eps_error(beta, p)
print(f'Relative error in the second case: {eps_2}')

In [None]:
delta_2 = np.abs(eps_2 * num_test)
print(f'Absolute error in the second case: {delta_2}')