# Floating Point Operations
https://docs.python.org/3/tutorial/floatingpoint.html

In [11]:
a = 9.4
b = a - 9
c = b - 0.4
print('a = ', a)
print('a - 9 = ', b)
print('(a - 9) - 0.4 = ', c)
print('c == 0?', c == 0.0)

a =  9.4
a - 9 =  0.40000000000000036
(a - 9) - 0.4 =  3.3306690738754696e-16
c == 0? False


In [12]:
a = 9.4
b = a - 0.4
c = b - 9
print('a = ', a)
print('a - 0.4 = ', b)
print('(a - 0.4) - 9 = ', c)
print('c == 0?', c == 0.0)

a =  9.4
a - 0.4 =  9.0
(a - 0.4) - 9 =  0.0
c == 0? True


## IEEE 754 Floating Point Standard

**Normalized** IEEE floating point: $\pm1.bbb\ldots b\times2^p$

| precision | sign bits | exponent bits | mantissa bits |
|------|------|------|------|
|   single  | 1 | 8 | 23 |
|   double  | 1 | 11 | 52 |
|   long double  | 1 | 15 | 64 |
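
The double-precision row of the table can be inspected directly. The following is a minimal sketch, assuming Python's `float` is an IEEE 754 double (true on common platforms); the helper name `decompose_double` is hypothetical and only used for illustration.

```python
import struct

def decompose_double(x):
    """Split a Python float (assumed IEEE 754 double) into its three bit fields."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)    # 52 bits after the implicit leading 1
    return sign, exponent, mantissa

sign, exponent, mantissa = decompose_double(9.4)
print("sign     =", sign)
print("exponent =", exponent - 1023, "(unbiased)")
print("mantissa =", format(mantissa, "052b"))
```

The printed mantissa is the string of bits $bbb\ldots b$ in the normalized form $\pm1.bbb\ldots b\times2^p$.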

## Machine epsilon
**Machine epsilon**, $\epsilon_{\rm mach}$, is the distance between 1 and the smallest floating-point number greater than 1. For double precision:
$$\epsilon_{\rm mach} = 2^{-52}\approx 2.22045\times10^{-16}$$
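
As a quick check of this value, the sketch below finds machine epsilon by repeated halving and compares it with `sys.float_info.epsilon`. It assumes Python floats are IEEE 754 doubles.

```python
import sys

eps = 1.0
# Halve eps until adding half of it to 1 no longer changes the result.
while 1.0 + eps / 2 > 1.0:
    eps /= 2

print("eps by halving         =", eps)
print("2**-52                 =", 2.0**-52)
print("sys.float_info.epsilon =", sys.float_info.epsilon)
```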

## IEEE Rounding to Nearest Rule
For double precision, if the 53rd bit to the right of the binary point is 0, round down (truncate after the 52nd bit). If the 53rd bit is 1, round up (add 1 to the 52nd bit), unless all known bits to the right of that 1 are 0's, in which case 1 is added to the 52nd bit if and only if the 52nd bit is 1.
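
The three cases of the rule can be observed near 1, where neighbouring doubles are $2^{-52}$ apart. This is a small sketch, assuming IEEE 754 doubles with the default round-to-nearest mode; the variable names are illustrative only.

```python
ulp = 2.0**-52     # spacing of doubles in [1, 2)
half = 2.0**-53    # exactly half of that spacing

# Exact tie and bit 52 of 1.0 is 0: round down.
tie_down = 1.0 + half
# Exact tie and bit 52 of 1 + ulp is 1: round up.
tie_up = (1.0 + ulp) + half
# Slightly more than half a spacing: round up.
above_half = 1.0 + (half + 2.0**-60)

print("1 + 2^-53             == 1         :", tie_down == 1.0)
print("(1 + 2^-52) + 2^-53   == 1 + 2^-51 :", tie_up == 1.0 + 2 * ulp)
print("1 + (2^-53 + 2^-60)   == 1 + 2^-52 :", above_half == 1.0 + ulp)
```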

**Example 1**

Floating-point representation of 9.4.

\begin{align}
9 \div 2 = 4\ R\ 1 & \hspace{10pt} 1\\
4 \div 2 = 2\ R\ 0 & \hspace{10pt} 0\\
2 \div 2 = 1\ R\ 0 & \hspace{10pt} 0\\
1 \div 2 = 0\ R\ 1 & \hspace{10pt} 1\\
\end{align}
Reading the remainders from bottom to top, $9_{10} = 1001_2$.

\begin{align}
0.4 \times 2 = 0.8 + 0& \hspace{10pt}0\\
0.8 \times 2 = 0.6 + 1& \hspace{10pt}1\\
0.6 \times 2 = 0.2 + 1& \hspace{10pt}1\\
0.2 \times 2 = 0.4 + 0& \hspace{10pt}0\\
0.4 \times 2 = 0.8 + 0& \hspace{10pt}0\\
&\hspace{10pt}\vdots\\
\end{align}
Reading the integer parts from top to bottom, $0.4_{10} = 0.\overline{0110}_2$.
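
The repeated-doubling procedure above can be sketched in a few lines. The helper `fraction_bits` is a hypothetical name; it uses `fractions.Fraction` so that the doubling is exact and is not itself affected by floating-point rounding.

```python
from fractions import Fraction

def fraction_bits(x, n):
    """Return the first n binary digits of x, where 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x)          # the integer part is the next binary digit
        digits.append(str(bit))
        x -= bit              # keep only the fractional part
    return "".join(digits)

print("0.4 = 0." + fraction_bits(Fraction(4, 10), 16) + "...")
print("0.1 = 0." + fraction_bits(Fraction(1, 10), 16) + "...")
```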

Keeping 52 bits after the binary point and rounding up (the 53rd bit is 1), the floating-point representation is
$$\mathsf{fl}(9.4)=+1.0010110011001100110011001100110011001100110011001101\times2^{3}$$

\begin{align}
\mathsf{fl}(9.4)&=9.4 + 2^{-49} - 0.4\times 2^{-48}\\
&=9.4 + 0.2\times 2^{-49}
\end{align}

Subtracting 9 is exact here, so the representation error of 9.4 carries over unchanged:
$$\mathsf{fl}(9.4 - 9)=0.4 + 0.2\times 2^{-49}$$

$$\mathsf{fl}(0.4) = 0.4 + 0.1\times 2^{-52}$$

\begin{align}
\mathsf{fl}(9.4 - 9) - \mathsf{fl}(0.4) &= 0.2\times 2^{-49} - 0.1\times 2^{-52}\\
&=0.1\times 2^{-52}(2^4 - 1)\\&= 3\times 2^{-53}
\end{align}
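
The error terms in this derivation can be checked exactly with rational arithmetic. The sketch below relies on the fact that `Fraction(x)` converts a float to the exact rational value it stores; it assumes IEEE 754 doubles with round to nearest.

```python
from fractions import Fraction

# Exact representation errors fl(x) - x.
err_94 = Fraction(9.4) - Fraction(94, 10)
err_04 = Fraction(0.4) - Fraction(4, 10)

print("fl(9.4) - 9.4 == 0.2 * 2^-49:", err_94 == Fraction(2, 10) / 2**49)
print("fl(0.4) - 0.4 == 0.1 * 2^-52:", err_04 == Fraction(1, 10) / 2**52)

# The computed difference is exactly 3 * 2^-53.
diff = Fraction((9.4 - 9) - 0.4)
print("(9.4 - 9) - 0.4 == 3 * 2^-53:", diff == Fraction(3, 2**53))
```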

In [3]:
print('fl(9.4 - 9) - fl(0.4) = {}'.format((9.4 - 9) - 0.4))
print('3x2^-53 = {}'.format(3*2**(-53)))

fl(9.4 - 9) - fl(0.4) = 3.3306690738754696e-16
3x2^-53 = 3.3306690738754696e-16


**Example 2**

Floating-point representation of 1.1.

$1_{10} = 1_2$

\begin{align}
0.1 \times 2 = 0.2 + 0& \hspace{10pt}0\\
0.2 \times 2 = 0.4 + 0& \hspace{10pt}0\\
0.4 \times 2 = 0.8 + 0& \hspace{10pt}0\\
0.8 \times 2 = 0.6 + 1& \hspace{10pt}1\\
0.6 \times 2 = 0.2 + 1& \hspace{10pt}1\\
0.2 \times 2 = 0.4 + 0& \hspace{10pt}0\\
0.4 \times 2 = 0.8 + 0& \hspace{10pt}0\\
&\hspace{10pt}\vdots\\
\end{align}
Reading the integer parts from top to bottom, $0.1_{10} = 0.0\overline{0011}_2$.

Keeping 52 bits after the binary point and rounding up (the 53rd bit is 1), the floating-point representation is
$$\mathsf{fl}(1.1)=+1.0001100110011001100110011001100110011001100110011010\times 2^0$$

\begin{align}
\mathsf{fl}(1.1)&=1.1 + 2^{-52} - 1.2\times 2^{-53}\\
&=1.1 + 0.8\times 2^{-53}
\end{align}

Subtracting 1 is exact here, so the representation error of 1.1 carries over unchanged:
$$\mathsf{fl}(1.1 - 1) = 0.1 + 0.8\times 2^{-53}$$

$$\mathsf{fl}(0.1) = 0.1 + 0.8\times 2^{-57}$$

\begin{align}
\mathsf{fl}(1.1 - 1) - \mathsf{fl}(0.1) &= 0.8\times 2^{-53} - 0.8\times 2^{-57}\\
&=0.8\times 2^{-57}(2^4 - 1)\\&= 3\times 2^{-55}
\end{align}

In [4]:
a = 1.1
b = a - 1
c = b - 0.1
print('a = ', a)
print('a - 1 = ', b)
print('(a - 1) - 0.1 = ', c)
print('3x2^-55 = ', 3*2**-55)

a =  1.1
a - 1 =  0.10000000000000009
(a - 1) - 0.1 =  8.326672684688674e-17
3x2^-55 =  8.326672684688674e-17


**Example 3**

$$\mathsf{fl}(1 + 2^{-53}) = 1$$

In [5]:
print('fl(1 + 2^(-53)) = ', 1 + 2**(-53))
print('Is fl(1 + 2^(-53)) equal to 1?', 1 + 2**(-53) == 1)

fl(1 + 2^(-53)) =  1.0
Is fl(1 + 2^(-53)) equal to 1? True


In [6]:
def almost_equal(a, b, eps = 1e-15):
    return abs(a - b) < eps

a = 1.1
b = a - 1
c = b - 0.1
print('a = ', a)
print('a - 1 = ', b)
print('(a - 1) - 0.1 = ', c)
print('3x2^-55 = ', 3*2**-55)

print('Is a - 1 ≐ 0.1? ', almost_equal(a - 1, 0.1))

a =  1.1
a - 1 =  0.10000000000000009
(a - 1) - 0.1 =  8.326672684688674e-17
3x2^-55 =  8.326672684688674e-17
Is a - 1 ≐ 0.1?  True
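
An absolute tolerance such as the `eps = 1e-15` used above only suits values of magnitude around 1. A hedged alternative is the standard-library function `math.isclose`, which by default compares with a relative tolerance (`rel_tol=1e-09`). The sketch below contrasts the two; the test values are chosen purely for illustration.

```python
import math

def almost_equal(a, b, eps=1e-15):
    # Same absolute-tolerance test as in the cell above.
    return abs(a - b) < eps

# Large values: tiny relative difference, huge absolute difference.
big_a, big_b = 1e20, 1e20 + 1e6
# Small values: they differ by a factor of 2, yet the absolute difference is tiny.
small_a, small_b = 1e-20, 2e-20

print("large, absolute:", almost_equal(big_a, big_b))
print("large, relative:", math.isclose(big_a, big_b))
print("small, absolute:", almost_equal(small_a, small_b))
print("small, relative:", math.isclose(small_a, small_b))
```

When comparing against zero, a relative tolerance alone never succeeds, so `math.isclose` also accepts an `abs_tol` argument for that case.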


**Relative error:** $\mathsf{fl}(x)=x(1+\epsilon),\ \text{with }|\epsilon| \le \tfrac{1}{2}\epsilon_{\rm mach}$ under the rounding to nearest rule.

**Floating point operations:**
\begin{align}
\mathsf{fl}(x + y)&=(x + y)(1 + \epsilon_1)\\
\mathsf{fl}(x - y)&=(x - y)(1 + \epsilon_2)\\
\mathsf{fl}(x \times y)&=(x \times y)(1 + \epsilon_3)\\
\mathsf{fl}(x \div y)&=(x \div y)(1 + \epsilon_4)
\end{align}
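
The relative rounding error of a stored number can be measured exactly with `fractions.Fraction`. This sketch checks it against the round-to-nearest bound $\epsilon_{\rm mach}/2 = 2^{-53}$, which is at least as tight as the bound stated above; it assumes IEEE 754 doubles and uses the decimal values from the earlier examples.

```python
from fractions import Fraction

half_eps = Fraction(1, 2**53)   # eps_mach / 2 for doubles

for text in ["0.1", "0.4", "1.1", "9.4"]:
    exact = Fraction(text)            # the exact decimal value
    stored = Fraction(float(text))    # the exact value of the nearest double
    rel_err = abs(stored - exact) / exact
    print(text, float(rel_err), rel_err <= half_eps)
```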

**Example 4 – Error propagation**
\begin{align}
y &= a + b + c\\
\dot{y} &= \mathsf{fl}((a + b) + c)\\
\ddot{y} &= \mathsf{fl}(a + (b + c))
\end{align}

\begin{align}
\eta &= \mathsf{fl}(a + b) = (a + b)(1 + \epsilon_1)\\
\dot{y} &= \mathsf{fl}(\eta + c) = (\eta + c)(1 + \epsilon_2)\\
 &= ((a + b)(1 + \epsilon_1) + c)(1 + \epsilon_2)\\
 &=(a+b+c)\left[1 + \frac{a + b}{a + b + c}\epsilon_1(1+\epsilon_2)+\epsilon_2\right].
\end{align}

To first order, the relative error of $\dot{y}$, $\epsilon_{\dot{y}} = (\dot{y} - y)/y$, is:
$$ \epsilon_{\dot{y}} \dot{=} \frac{a + b}{a + b + c}\epsilon_1 + \epsilon_2$$

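With similar computations, where $\epsilon_1$ now denotes the rounding error of $b + c$ and $\epsilon_2$ that of the final addition, we obtain: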
$$ \epsilon_{\ddot{y}} \dot{=} \frac{b + c}{a + b + c}\epsilon_1 + \epsilon_2$$
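
The factor multiplying $\epsilon_1$ decides which ordering is safer. As an illustration, the sketch below evaluates both factors exactly for the values of $a$, $b$ and $c$ used in the numerical experiment in this example; `factor_dot` and `factor_ddot` are hypothetical names.

```python
from fractions import Fraction

a = Fraction("0.000023371258")
b = Fraction("3367.8429")
c = Fraction("-3367.7811")
y = a + b + c                  # exact sum

factor_dot = (a + b) / y       # multiplies eps_1 in (a + b) + c
factor_ddot = (b + c) / y      # multiplies eps_1 in a + (b + c)

print("exact a + b + c         =", float(y))
print("(a + b) / (a + b + c)   =", float(factor_dot))
print("(b + c) / (a + b + c)   =", float(factor_ddot))
```

For these values the first factor is in the tens of thousands while the second is below 1, so the rounding error of the first addition is amplified far more in $(a + b) + c$ than in $a + (b + c)$.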



In [7]:
import numpy as np

a = 0.000023371258
b = 3367.8429
c = -3367.7811
eta = np.float32(a + b)
zeta = np.float32(b + c)
print('(a + b) + c = {0:.7e}'.format(np.float32((eta + c))))
print('a + (b + c) = {0:.7e}'.format(np.float32((a + zeta))))
print('Exact: a + b + c = 6.1823371258e-02')

(a + b) + c = 6.1917577e-02
a + (b + c) = 6.1823372e-02
Exact: a + b + c = 6.1823371258e-02


In [8]:
a = 0.000023371258
b = 3367.8429
c = -3367.7811
eta = a + b
zeta = b + c
print('(a + b) + c = {0:.15e}'.format(eta + c))
print('a + (b + c) = {0:.15e}'.format(a + zeta))
print('Exact: a + b + c = 6.1823371258e-02')

(a + b) + c = 6.182337125801496e-02
a + (b + c) = 6.182337125794834e-02
Exact: a + b + c = 6.1823371258e-02


In [9]:
import math

# sin(2*pi*n) is exactly 0 for every integer n, but rounding errors in 2*pi*n make the computed value nonzero.
# Try different integers n to make fl(sin(2*pi*n)) as large as possible.
x = 2*math.pi*999999999999999999
print('fl(sin(x)) =', math.sin(x))

fl(sin(x)) = 0.9842895889634229


# Python modules: decimal and fractions

In [10]:
import decimal
import fractions
import math
import timeit

decimal.getcontext().prec = 100
a = decimal.Decimal(2).sqrt()
print(a)
a = decimal.Decimal(1)/decimal.Decimal(10)
print(a)
b = fractions.Fraction(1,10)
c = fractions.Fraction(4,40)
print(b)
print(b*c)

a_sqrt_2_dec = decimal.Decimal(2).sqrt()
a_sqrt_2 = math.sqrt(2)

def test_decimal_product():
    a_sqrt_2_dec*a_sqrt_2_dec
    return

def test_product():
    a_sqrt_2*a_sqrt_2
    return

print('time to take product of decimals:', timeit.timeit('test_decimal_product()', setup="from __main__ import test_decimal_product", number=1000000),'s')
print('time to take product of floats:', timeit.timeit('test_product()', setup="from __main__ import test_product", number=1000000),'s')

1.414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641573
0.1
1/10
1/100
time to take product of decimals: 0.552694801997859 s
time to take product of floats: 0.11496358399745077 s
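
The extra cost of `decimal` and `fractions` buys exact handling of decimal and rational values. A minimal sketch of the difference, using only the standard library:

```python
from decimal import Decimal
from fractions import Fraction

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly.
float_equal = (0.1 + 0.2 == 0.3)
# Decimal works in base 10, so these decimal literals are exact.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
# Fraction stores exact rational numbers.
fraction_equal = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

print("float:   ", float_equal)
print("Decimal: ", decimal_equal)
print("Fraction:", fraction_equal)

# Constructing a Decimal from a float shows the exact value the float stores.
print(Decimal(0.1))
```

Note that `Decimal("0.1")` is built from a string; `Decimal(0.1)` inherits the binary rounding error of the float.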
