### Machine limits for integer and floating-point types
Get the machine limits for integer types (<i>np.int32</i>, <i>int</i>) using <i>np.iinfo()</i>, and for floating-point types (<i>np.float32</i>, <i>np.float64</i>, <i>np.double</i>) using <i>np.finfo()</i>.

In [None]:
import numpy as np
np.iinfo(np.int32)

In [None]:
np.iinfo(np.int32).max

In [None]:
np.iinfo(int)

In [None]:
np.finfo(np.float32)

In [None]:
np.finfo(np.float32).eps

In [None]:
np.finfo(np.float64)

In [None]:
np.finfo(np.double)
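
One way to read <i>eps</i> is as the gap between 1.0 and the next representable number. The sketch below checks this for <i>np.float32</i>. The variable names are illustrative, and every operand is wrapped in <i>np.float32</i> on the assumption that this keeps the arithmetic in single precision under any NumPy promotion rules.

```python
import numpy as np

one = np.float32(1.0)
eps = np.finfo(np.float32).eps                 # gap between 1.0 and the next float32
half_eps = np.float32(eps / np.float32(2.0))   # half of that gap, still float32

print(one + eps > one)         # a full eps reaches the next float32
print(one + half_eps == one)   # half an eps is rounded away again
```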

### Check subtraction of two floating-point numbers

In [None]:
import numpy as np

def diff(n1, n2, d):
    print("First number:", n1)
    print("Second number:", n2)
    print("Their difference:", d)
    print("Computed difference - input difference = ", n1 - n2 - d)

In [None]:
x1 = np.float32(1.5)
x2 = np.float32(1.0)
x_diff = np.float32(0.5)

diff(x1, x2, x_diff)

In [None]:
x1 = np.float32(1.1)
x2 = np.float32(1.0)
x_diff = np.float32(0.1)

diff(x1, x2, x_diff)
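
The nonzero result in the second case comes from the stored values rather than from the subtraction itself. As a sketch (the helper name <i>exact</i> is made up for illustration), widening each <i>np.float32</i> to a Python float and then to <i>Decimal</i> shows the exact value held in memory:

```python
from decimal import Decimal
import numpy as np

def exact(value):
    """Return the exact decimal expansion of a NumPy or Python float."""
    return Decimal(float(value))   # both conversions are exact

for v in (1.5, 1.0, 0.5, 1.1, 0.1):
    print(v, "is stored as", exact(np.float32(v)))
```

The values 1.5, 1.0 and 0.5 are sums of powers of two and are stored exactly. The values 1.1 and 0.1 are not, and their representation errors differ, so they do not cancel in the subtraction.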

### Inexactness
Which of the following expressions are <i>True</i>?

In [None]:
0.1 + 0.1 + 0.1 == 0.3

In [None]:
x1 = 0.1
x2 = 0.10000000000000001
x3 = 0.1000000000000000055511151231257827021181583404541015625

In [None]:
eval(repr(x1)) == x1

In [None]:
eval(repr(x1)) == x2

In [None]:
eval(repr(x1)) == x3
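
Because exact equality is fragile for computed values, a common alternative is to compare within a tolerance. A minimal sketch using <i>math.isclose</i> from the standard library (the variable name <i>total</i> is illustrative):

```python
import math

total = 0.1 + 0.1 + 0.1

print(total == 0.3)               # exact comparison
print(math.isclose(total, 0.3))   # comparison with the default relative tolerance
print(repr(total))                # shortest string that round-trips to the same float
```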

### Holes in value range
What is the result of the following subtraction? Is it 0.0?

In [None]:
a = 1.0
b = 0.1
c = 1.1
c - a - b
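
The "holes" are the gaps between neighbouring representable numbers, and they widen as the magnitude grows. The sketch below measures the gap with <i>math.ulp</i>, which assumes Python 3.9 or newer:

```python
import math
import sys

print(math.ulp(1.0))        # gap just above 1.0, the same as sys.float_info.epsilon
print(math.ulp(1e16))       # gap just above 1e16
print(1e16 + 1.0 == 1e16)   # 1e16 + 1 falls into a hole and rounds back to 1e16
```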

### Fractions
The [fractions](https://docs.python.org/3/library/fractions.html) module provides support for rational number arithmetic.

The constructor returns a new Fraction instance with value numerator/denominator.

In [None]:
from fractions import Fraction

Fraction(1.1)

Fraction(1.1) does not return Fraction(11, 10), because the float 1.1 is not exactly equal to 11/10. To find a rational approximation to a given floating-point number, use limit_denominator():

In [None]:
Fraction(1.1).limit_denominator()
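
When the decimal value itself is what matters, one option is to pass it as a string so that no binary float is involved. A short sketch comparing the two constructions (variable names are illustrative):

```python
from fractions import Fraction

from_float = Fraction(1.1)      # exact value of the binary float 1.1
from_string = Fraction("1.1")   # parsed as the decimal 11/10

print(from_float)
print(from_string)
print(from_float.limit_denominator() == from_string)
```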

### Conversions
Converting to an integer can reveal the inaccuracy of a floating-point number. The closest single-precision floating-point number to 20.23 is slightly less than 20.23, so multiplying it by a hundred gives a result slightly less than 2023.0. Note that converting 'y' to an integer 'i' does not round; the number is truncated:

In [None]:
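x = np.float32(20.23)
# Widen to float64 first so the product is not rounded back to float32
# (NumPy 2 keeps float32 * Python float in float32).
y = np.float64(x) * 100.
i = int(y)
print(i, y)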

In [None]:
x = np.float64(20.23)
y = x * 100.
i = int(y)
print(i, y)
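
If the nearest integer is wanted rather than the truncated one, rounding before the conversion is one way to get it. A minimal sketch for the single-precision case, widening to a Python float explicitly:

```python
import numpy as np

x = np.float32(20.23)
y = float(x) * 100.0   # widen exactly to double precision, then scale

print(int(y))          # truncates toward zero
print(round(y))        # rounds to the nearest integer
```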

Assigning a single-precision number to a double-precision number doesn't increase the number of significant digits:

In [None]:
x = np.float32(1.66661)
y = np.float64(x)
print(y)

Why are the trailing digits a seemingly random <i>00025177002</i> and not <i>00000000000</i>?

The padding with zeros is done in the binary representation, not in the decimal one: 1.10101010101001101111010000000000000000000000000000000000010101...
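
The zero padding can be made visible with <i>float.hex()</i>, which prints the 52 fraction bits of a double as 13 hexadecimal digits. The sketch below assumes the same value as in the cell above; <i>fraction_digits</i> is an illustrative name.

```python
import numpy as np

x = np.float32(1.66661)
y = np.float64(x)

print(y == x)   # widening is exact: same value, more bits

# Take the hex digits between the "." and the "p" exponent marker.
fraction_digits = float(y).hex().split(".")[1].split("p")[0]
print(fraction_digits)   # the digits beyond single precision are all zero
```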

### Rounding

In [None]:
round(256.49999) == 256

In [None]:
-1.225 * 100
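
Two effects are worth separating here: Python's <i>round()</i> sends exact ties to the even neighbour, and many decimal "ties" are not ties at all once stored in binary. A short sketch, with <i>Decimal.quantize</i> as one way to round the decimal value itself:

```python
from decimal import Decimal, ROUND_HALF_UP

print(round(0.5), round(1.5), round(2.5))   # exact ties go to the even neighbour
print(round(2.675, 2))                      # 2.675 is stored slightly below the tie

# Decimal applies the requested rounding rule to the exact decimal value.
print(Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
```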

### [Decimal](https://docs.python.org/3/library/decimal.html) fixed-point and floating-point arithmetic

In [None]:
from decimal import Decimal, getcontext
getcontext()

In [None]:
Decimal('0.1') + Decimal('0.1') + Decimal('0.1') == Decimal('0.3')
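
Two details matter when using <i>Decimal</i>: a value built from a string is exact, while a value built from a float inherits the float's binary error, and the context precision applies to the results of operations. A minimal sketch, using <i>localcontext</i> so the global context is left untouched:

```python
from decimal import Decimal, localcontext

print(Decimal("0.1"))   # exactly one tenth
print(Decimal(0.1))     # exact value of the binary float 0.1

with localcontext() as ctx:
    ctx.prec = 6        # six significant digits, inside this block only
    quotient = Decimal(1) / Decimal(7)

print(quotient)
```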

### Accuracy of floating-point arithmetic
Examples from Donald E. Knuth, "The Art of Computer Programming", Volume 2: Seminumerical Algorithms, Section 4.2.2.

In [None]:
u, v, w = 11111113, -11111111, 7.51111111
(u + v) + w

In [None]:
u + (v + w)

In [None]:
u, v, w = 20000, -6, 6.0000003
(u*v) + (u*w)

In [None]:
u * (v+w)

In [None]:
from decimal import Decimal, getcontext
getcontext().prec = 8

u, v, w = Decimal(20000.), Decimal(-6.), Decimal('6.0000003')
(u*v) + (u*w)

In [None]:
u * (v+w)
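
The associativity example can also be replayed in 8-digit decimal arithmetic, where the lost digits are easy to see by hand. This is a sketch under one assumption: <i>w</i> is shortened to eight significant digits so that it fits the context exactly.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 8   # eight significant digits, inside this block only
    u, v, w = Decimal(11111113), Decimal(-11111111), Decimal("7.5111111")
    left = (u + v) + w    # 2 + 7.5111111, nothing is lost
    right = u + (v + w)   # v + w is rounded to eight digits before u is added

print(left, right)
```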

### Summing many numbers
What are the potential arithmetic issues when summing many numbers?

[Patriot Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia](https://www.gao.gov/products/imtec-92-26)

In [None]:
import numpy as np

def summ():
    tenth = np.float32(0.1)
    count = np.float32(60*60*100*10)
    print(f"{count} {count*0.1}")
    sum = np.float32(0)
    n = np.int64(0)
    while n < 1000000:
        sum += tenth
        n += 1
        if n < 21 or n%36000 == 0:
            print(f"step {n} expected {0.1*n} solution {sum} diff {np.abs(0.1*n - sum)}")

summ()
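
For double-precision Python floats, the standard library offers <i>math.fsum</i>, which keeps track of the low-order bits that a plain running sum discards. A small sketch comparing it with the built-in <i>sum</i> (the list <i>values</i> is illustrative):

```python
import math

values = [0.1] * 10

print(sum(values))         # plain left-to-right accumulation
print(math.fsum(values))   # sum computed without intermediate rounding loss
```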

### [Kahan summation algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm)

In [None]:
import numpy as np

def kahan_summ():
    tenth = np.float32(0.1)
    count = np.float32(60*60*100*10)
    print(f"{count} {count*0.1}")
    sum = np.float32(0)      # Prepare the accumulator.
    n = np.int64(0)
    c = np.float32(0)        # A running compensation for lost low-order bits.
    while n < 1000000:
        y = tenth - c        # c is zero the first time around.
        t = sum + y          # Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y    # (t - sum) cancels the high-order part of y; subtracting y recovers negative (low part of y)
        sum = t              # Algebraically, c should always be zero. Beware overly-aggressive optimizing compilers!
        n += 1               # Next time around, the lost low part will be added to y in a fresh attempt.
        if n < 21 or n%36000 == 0:
            print(f"step {n} expected {0.1*n} solution {sum} diff {np.abs(0.1*n - sum)}")

kahan_summ()
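
The same compensation idea can be packaged as a function that accepts any sequence. The sketch below is illustrative rather than a reference implementation; <i>kahan_sum</i>, <i>naive_sum</i> and <i>data</i> are hypothetical names, and all arithmetic is kept in <i>np.float32</i> on purpose.

```python
import numpy as np

def kahan_sum(values):
    """Compensated sum of values, accumulated in float32."""
    total = np.float32(0.0)
    c = np.float32(0.0)              # running compensation for lost low-order bits
    for value in values:
        y = np.float32(value) - c    # apply the correction from the previous step
        t = total + y                # low-order bits of y may be lost here
        c = (t - total) - y          # recover what was lost, with opposite sign
        total = t
    return total

def naive_sum(values):
    """Plain running sum, accumulated in float32."""
    total = np.float32(0.0)
    for value in values:
        total = total + np.float32(value)
    return total

data = [np.float32(0.1)] * 100_000
print("naive:", naive_sum(data))
print("kahan:", kahan_sum(data))
```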

### Harmonic number
The following function calculates the n-th [harmonic number](https://en.wikipedia.org/wiki/Harmonic_number) in two ways. What is the difference between the two loops? The first loop runs through the numbers from 1 up to n, adding the reciprocals. The second loop runs from n down to 1 and sums the reciprocals.

In [None]:
import numpy as np

def harmonic_number(n):
    # forward sum
    f_sum = np.float32(0.0)
    for i in range(1, n+1):
        f_sum += np.float32(1.0/i)
    
    # backward sum
    b_sum = np.float32(0.0)
    for i in range(n, 0, -1):
        b_sum += np.float32(1.0/i)

    print("Forward sum", f_sum)
    print("Backward sum", b_sum)

In [None]:
%%time
harmonic_number(10_000_000)

In [None]:
harmonic_number(100_000_000)
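
The order matters because a small term added to a large running sum can be lost entirely, whereas small terms added to each other first can grow large enough to register. A deliberately tiny sketch of that effect in <i>np.float32</i>; the values 16.0 and 1e-7 are chosen for illustration only.

```python
import numpy as np

big = np.float32(16.0)
tiny = np.float32(1.0e-7)

# Large value first: each tiny term is below half the spacing at 16 and is lost.
large_first = big
for _ in range(10):
    large_first = large_first + tiny

# Tiny terms first: they accumulate to about 1e-6 before meeting the large value.
small_first = np.float32(0.0)
for _ in range(10):
    small_first = small_first + tiny
small_first = small_first + big

print(large_first, small_first)
```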

### Quadratic equations

In [None]:
import math
import numpy as np

a=1.0
b=1.786737589984535
c=1.149782767465722e-8

In [None]:
x_1 = (-b - math.sqrt(b**2 - 4*a*c))/(2*a)
x_2 = (-b + math.sqrt(b**2 - 4*a*c))/(2*a)
print(x_1, x_2)

In [None]:
x_1 = (-b - np.sign(b)*math.sqrt(b**2 - 4*a*c))/(2*a)
x_2 = c/(a*x_1)
print(x_1, x_2)
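
The second variant avoids subtracting two nearly equal numbers: it computes the larger-magnitude root first and obtains the other from the product of the roots, x_1 * x_2 = c/a. A sketch of both approaches as functions, assuming real roots, a nonzero <i>a</i> and a nonzero <i>q</i>; the function names and the test coefficients are illustrative and not taken from the cells above.

```python
import math

def naive_roots(a, b, c):
    """Textbook formula; one root suffers cancellation when b*b >> 4*a*c."""
    d = math.sqrt(b * b - 4.0 * a * c)
    return (-b - d) / (2.0 * a), (-b + d) / (2.0 * a)

def stable_roots(a, b, c):
    """Compute the larger-magnitude root first, then use x1 * x2 = c / a."""
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))   # b and the square root have the same sign
    return q / a, c / q

# Illustrative coefficients with roots near -1e8 and -1e-8.
print(naive_roots(1.0, 1.0e8, 1.0))
print(stable_roots(1.0, 1.0e8, 1.0))
```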