# Number Representation and Precision

Real numbers are stored with a fixed decimal precision (set by the mantissa) and a fixed decimal exponent range. The mantissa contains the significant figures of the number (and thereby the precision of the number). A number like $(9.90625)_{10}$ in the decimal representation is given in a binary representation by

$(1001.11101)_2 = 1\times2^3 + 0\times2^2 + 0\times2^1 + 1\times2^0 + 1\times2^{-1} + 1\times2^{-2} + 1\times2^{-3} + 0\times2^{-4} + 1\times2^{-5}$

and it has an exact machine number representation since we need a finite number of bits to represent this number. This representation is however not very practical. Rather, we prefer to use a scientific notation. In the decimal system we would write a number like 9.90625 in what is called the normalized scientific notation. This means simply that the decimal point is shifted and appropriate powers of 10 are supplied. Our number could then be written as
$9.90625 = 0.990625 \times 10^1$,
and a real non-zero number could be generalized as
$x = \pm r \times 10^n$,
with $r$ a number in the range $1/10 \le r < 1$. In a similar way we can represent a binary number in scientific notation as
$x = \pm q \times 2^m$,
with $q$ a number in the range $1/2 \le q < 1$.
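
As a quick check of the binary normalized form, Python's standard `math.frexp` returns this same decomposition, with $1/2 \le |q| < 1$. The sketch below applies it to the example value 9.90625; the variable names are only illustrative.

```python
import math

x = 9.90625

# math.frexp splits x into (q, m) with x == q * 2**m and 0.5 <= |q| < 1
q, m = math.frexp(x)
print(q, m)

# Reassemble the number from its mantissa and exponent
reconstructed = math.ldexp(q, m)
print(reconstructed == x)
```

Since $9.90625 < 16 = 2^4$, the exponent is $m = 4$ and the mantissa is $q = 9.90625/16$.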

In a typical computer, floating-point numbers are represented in the way described above, but with certain restrictions on q and m imposed by the available word length. In the machine, our number x is represented as

$x = (-1)^s \times \text{mantissa} \times 2^{\text{exponent}}$

where $s$ is the sign bit, and the exponent gives the available range. With a single-precision word, 32 bits, 8 bits would typically be reserved for the exponent, 1 bit for the sign and 23 for the mantissa. 
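
To make the 1 + 8 + 23 bit split concrete, the following sketch unpacks the example value 9.90625 with the standard `struct` module. It assumes the usual IEEE 754 single-precision layout, in which the mantissa is normalized to $1 \le \text{mantissa} < 2$ with an implicit leading 1 (rather than the $1/2 \le q < 1$ convention used above) and the exponent is stored with a bias of 127.

```python
import struct

x = 9.90625

# Pack as a 32-bit float (big-endian) and reinterpret the 4 bytes as an unsigned integer
bits = struct.unpack(">I", struct.pack(">f", x))[0]

sign = bits >> 31                 # 1 bit
exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127 (IEEE 754 assumption)
fraction = bits & 0x7FFFFF        # 23 explicitly stored mantissa bits

print(f"{bits:032b}")
print(sign, exponent, fraction)

# Rebuild the value: the leading 1 of the mantissa is implicit, not stored
value = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)
print(value)
```

Because 9.90625 needs only nine significant bits, it survives the round trip through single precision unchanged.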

## 32-bit – single precision:

Sign bit: 1 bit

Exponent: 8 bits

Significand precision: 24 bits (23 explicitly stored)

This gives 6–9 significant decimal digits of precision.

## 64-bit – double precision:

Sign bit: 1 bit

Exponent: 11 bits

Significand precision: 53 bits (52 explicitly stored)

This gives 15–17 significant decimal digits of precision.
This is the default precision of Python's built-in `float`.
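
The double-precision figures quoted above can be read back from the interpreter. This is a minimal sketch using the standard `sys.float_info` structure; it assumes the platform's C `double` follows the 64-bit layout described here, which is the common case.

```python
import sys

info = sys.float_info

print(info.mant_dig)        # number of significand bits, including the implicit one
print(info.dig)             # decimal digits that are always preserved
print(info.epsilon)         # gap between 1.0 and the next larger float
print(info.max, info.min)   # largest finite float and smallest positive normalized float
```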


## 128-bit – quadruple precision:

Sign bit: 1 bit

Exponent: 15 bits

Significand precision: 113 bits (112 explicitly stored)

This gives 33–36 significant decimal digits of precision.


## 256-bit – octuple precision:

Sign bit: 1 bit
    
Exponent: 19 bits
    
Significand precision: 237 bits (236 explicitly stored)

**This format is rarely implemented.**


# Precision effects

One important consequence of rounding error is that you should **NEVER use an if statement to test the equality of two floats.**  For instance, you should never, in any program, have a statement like:

In [None]:
x = 3 * 1.1
if x == 3.3:
    print(x)

If you need to branch on the value of a float, test whether it lies within a small tolerance of the target instead:

In [None]:
epsilon = 1e-12
if abs(x-3.3) < epsilon:
    print(x)
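
The standard library also offers `math.isclose` for this kind of test. The sketch below compares it with the exact `==` test; the tolerance values are illustrative choices, not recommendations.

```python
import math

x = 3 * 1.1

# Exact comparison: False, because 3 * 1.1 is not stored as exactly 3.3
print(x == 3.3)

# Relative-tolerance comparison (the default is rel_tol=1e-09)
print(math.isclose(x, 3.3))

# Purely absolute tolerance, equivalent in spirit to the epsilon test above
print(math.isclose(x, 3.3, rel_tol=0.0, abs_tol=1e-12))
```

A relative tolerance scales with the size of the numbers being compared, which a fixed `epsilon` does not, so the absolute form is mainly useful for comparisons against values near zero.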

## Which operations are most important in dealing with precision?

__Subtraction__ and __Derivatives__

## Subtraction

$a = b - c$

We have:   $fl(a) = fl(b) - fl(c) = a(1+\epsilon_a)$,  or

$fl(a) = b(1+\epsilon_b) - c(1+\epsilon_c)$

So, $fl(a)/a = 1 + \epsilon_b (b/a) - \epsilon_c (c/a)$

If $b \approx c$, then $a$ is small compared with $b$ and $c$, so the factors $b/a$ and $c/a$ are large and the relative error on $fl(a)$ can be greatly amplified.


If we have:

$x = 1000000000000000$

$y = 1000000000000001.2345678901234$

then as far as the computer is concerned:
    

In [None]:
x = 1000000000000000
y = 1000000000000001.2345678901234
 
print(y-x) 

**The true result should be 1.2345678901234, but the code prints 1.25!**

In other words, instead of 16-figure accuracy we now have only two or three figures, and the fractional error is about one percent of the true value.  This is much worse than the precision with which $x$ and $y$ themselves are stored!
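
To see where the digits go, the sketch below uses the standard `decimal` module to display the value of $y$ that is actually stored and to measure the relative error of the difference. `Decimal(float)` converts the stored binary value exactly; the variable names are illustrative.

```python
from decimal import Decimal

x = 1000000000000000
y = 1000000000000001.2345678901234

computed = y - x
exact = Decimal("1.2345678901234")   # difference of the numbers as written

# The float closest to y: everything past the 16th significant figure is gone
stored_y = Decimal(y)
print(stored_y)

# Relative error of the computed difference
rel_error = abs(Decimal(computed) - exact) / exact
print(computed)
print(float(rel_error))
```

The subtraction itself is exact here; the damage was done when $y$ was rounded on input, and subtracting the nearly equal $x$ merely exposes it.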


To see another example of this in practice, consider two numbers:

$x = 1$, and $y = 1+10^{-14}\sqrt 2$

Clearly,

$10^{14} (y - x) = \sqrt 2$

Let us try the same calculation in Python:
 

In [None]:
from math import sqrt
x = 1.0
y = 1.0 + (1e-14)*sqrt(2)

print((1e14)*(y-x))
print(sqrt(2))

Again the result is wrong in the third significant figure, an error of about half a percent.  We need to be careful in how we code math!
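
The size of this error follows from the spacing of floats near 1. The sketch below, assuming standard double precision, shows that $y - x$ is a whole number of machine-epsilon steps, so $10^{-14}\sqrt 2$ has been rounded to the nearest such step.

```python
import sys
from math import sqrt

x = 1.0
y = 1.0 + 1e-14 * sqrt(2)

eps = sys.float_info.epsilon   # spacing between adjacent floats in [1, 2)

# y - x is an exact whole number of these spacings
steps = (y - x) / eps
print(steps)

# Relative error of the rescaled difference compared with sqrt(2)
rel_error = abs(1e14 * (y - x) - sqrt(2)) / sqrt(2)
print(rel_error)
```

The true value corresponds to roughly 63.7 steps, which cannot be represented, so it is rounded to 64.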

## Example 1:  Summing $1/n$ 

Consider the series:

$$s_1 = \sum_{n=1}^N \frac{1}{n}$$ which is finite when $N$ is finite. Then consider the same sum taken in reverse order,

$$s_2 = \sum_{n=N}^1 \frac{1}{n}$$ which, when summed analytically, should give $s_2 = s_1$.

In [None]:
# Write code to perform both of these sums for N = 1e8 and compare
sum1 = 0
for i in range(1,100000001):
    sum1 += 1/i
sum2 = 0
for i in range(100000000,0,-1):
    sum2 += 1/i
    
print(sum1,sum2)
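
One way to judge the two orderings is to compare each against `math.fsum`, which keeps track of the low-order bits that ordinary addition discards. The sketch below uses a smaller `N` than the cell above so that it runs quickly; the names are illustrative.

```python
import math

N = 10**6   # smaller than the 1e8 used above so that it runs quickly

# Largest terms first
forward = 0.0
for n in range(1, N + 1):
    forward += 1 / n

# Smallest terms first
backward = 0.0
for n in range(N, 0, -1):
    backward += 1 / n

# math.fsum compensates for rounding, so its result does not depend on the order
reference_fwd = math.fsum(1 / n for n in range(1, N + 1))
reference_bwd = math.fsum(1 / n for n in range(N, 0, -1))

print(forward - reference_fwd)
print(backward - reference_fwd)
print(reference_fwd == reference_bwd)
```

Adding a small term to a large running total loses the term's low-order digits, so summing the smallest terms first usually, though not always, lands closer to the reference.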

## Example 2: $e^{-x}$

There are three possible algorithms for $e^{-x}$

1) Simple: $$e^{-x} = \sum_{n=0}^{\infty} (-1)^n \; \frac{x^n}{n!}$$  

2) Recursion: $$e^{-x} = \sum_{n=0}^{\infty} s_n = \sum_{n=0}^{\infty} (-1)^n \; \frac{x^n}{n!}$$  where $$ s_n = -s_{n-1} \frac{x}{n}, \qquad s_0 = 1$$

3) Inverse:  $$e^{x} = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$  Then take the inverse:   $$e^{-x} = \frac{1}{e^{x}}$$


In [84]:
import numpy as np
from math import factorial

# Write a function to compute e^-x for each of the three methods,
# then check their output for x = 0 to 100 in steps of 10 and
# compare to the numpy version of exp(-x), which is imported above.

def e_minusx_simple(x,n):
    emxsmp = 0
    for i in range(0,n):
        emxsmp += (-1)**i * ((x**i)/(factorial(i)))
    return emxsmp

def e_minusx_recurse(x,n):
    if x == 0:
        return 1.0
    else:
        sn = x/n
        if n == 1:
            return 0
        else:
            return (-1*e_minusx_recurse(x,n-1)*(x/n)) + sn

def e_minusx_inverse(x,n):
    emx = 0
    for i in range(0,n):
        emx += x**i / factorial(i)
    emxinv = 1/emx
    return emxinv

x = 10
n = 100
print(x,n, e_minusx_simple(x,n),e_minusx_recurse(x,n), e_minusx_inverse(x,n), np.exp(-x))
print(e_minusx_recurse(10,100))

10 100 4.5399929433607724e-05 0.09083346138551655 4.539992976248486e-05 4.5399929762484854e-05
0.09083346138551655


In [87]:
# Still not sure the recursive version is correct, but at least it runs now
n = 100
for i in range(0,101,10):
    print(i,e_minusx_simple(i,n))
    print(i,e_minusx_recurse(i,n))
    print(i,e_minusx_inverse(i,n))
    print(i,np.exp(-i))
    print('~'*25)

0 1.0
0 1.0
0 1.0
0 1.0
~~~~~~~~~~~~~~~~~~~~~~~~~
10 4.5399929433607724e-05
10 0.09083346138551655
10 4.539992976248486e-05
10 4.5399929762484854e-05
~~~~~~~~~~~~~~~~~~~~~~~~~
20 5.47810291652921e-10
20 0.1664342247961777
20 2.0611536224385583e-09
20 2.061153622438558e-09
~~~~~~~~~~~~~~~~~~~~~~~~~
30 -8.553020689132783e-05
30 0.23035862408259405
30 9.357622968840171e-14
30 9.357622968840175e-14
~~~~~~~~~~~~~~~~~~~~~~~~~
40 -123.09303988190482
40 0.0846033444068402
40 4.248354255291594e-18
40 4.248354255291589e-18
~~~~~~~~~~~~~~~~~~~~~~~~~
50 -564767275982.2412
50 -638600477.118587
50 1.9287498485811295e-22
50 1.9287498479639178e-22
~~~~~~~~~~~~~~~~~~~~~~~~~
60 -4.385497375758544e+19
60 -3.706623274465469e+16
60 8.756523735728401e-27
60 8.75651076269652e-27
~~~~~~~~~~~~~~~~~~~~~~~~~
70 -2.0436358336030707e+26
70 -1.35721236651527e+23
70 3.977161397179805e-31
70 3.975449735908647e-31
~~~~~~~~~~~~~~~~~~~~~~~~~
80 -1.2156134999799205e+32
80 -6.577212533547068e+28
80 1.8362668153382484e-35
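
The recursive column above does not track `np.exp(-x)`, so here is one possible way to write method 2, offered as a sketch rather than as the intended solution. It applies the recurrence $s_n = -s_{n-1}\,x/n$ with $s_0 = 1$ in a loop, and reuses the same idea with all-positive terms for method 3. The function names `exp_minus_x_terms` and `exp_minus_x_inverse` are hypothetical.

```python
import math

def exp_minus_x_terms(x, n_terms):
    """Sum the first n_terms of the series for e^{-x}, building each term from the last."""
    term = 1.0              # s_0 = 1
    total = term
    for n in range(1, n_terms):
        term *= -x / n      # s_n = -s_{n-1} * x / n
        total += term
    return total

def exp_minus_x_inverse(x, n_terms):
    """Sum the series for e^{x} with the same recurrence (all terms positive), then invert."""
    term = 1.0
    total = term
    for n in range(1, n_terms):
        term *= x / n
        total += term
    return 1.0 / total

for x in (1, 10):
    print(x, exp_minus_x_terms(x, 100), exp_minus_x_inverse(x, 100), math.exp(-x))
```

Building each term from the previous one avoids computing large powers and factorials separately, but the alternating sum still suffers from cancellation as $x$ grows, which is why the inverse form behaves better in the table above.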
