### Machine limits for integer and floating-point types
Get the machine limits of integer types with <i>np.iinfo()</i> and of floating-point types such as <i>np.float32</i>, <i>np.float64</i>, and <i>np.double</i> with <i>np.finfo()</i>.

In [1]:
import numpy as np
np.iinfo(np.int32)

iinfo(min=-2147483648, max=2147483647, dtype=int32)

In [2]:
np.iinfo(np.int32).max

2147483647

In [3]:
np.iinfo(int)

iinfo(min=-2147483648, max=2147483647, dtype=int32)

In [4]:
np.finfo(np.float32)

finfo(resolution=1e-06, min=-3.4028235e+38, max=3.4028235e+38, dtype=float32)

In [5]:
np.finfo(np.float32).eps

1.1920929e-07

In [6]:
np.finfo(np.float64)

finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)

In [7]:
np.finfo(np.double)

finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)
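For double precision, the standard library exposes comparable limits through <i>sys.float_info</i>. The short sketch below is not part of the original notebook; it illustrates what the machine epsilon means: it is the gap between 1.0 and the next larger representable number, so adding only half of it to 1.0 leaves 1.0 unchanged.

```python
import sys

eps = sys.float_info.epsilon       # gap between 1.0 and the next larger double
one_plus_eps = 1.0 + eps           # representable, strictly larger than 1.0
one_plus_half_eps = 1.0 + eps / 2  # falls halfway into the gap and rounds back to 1.0

print(eps)
print(one_plus_eps > 1.0)
print(one_plus_half_eps == 1.0)
```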

### Check subtraction of two floating-point numbers

In [8]:
import numpy as np

def diff(n1, n2, d):
    print("First number:", n1)
    print("Second number:", n2)
    print("Their difference:", d)
    print("Computed difference - input difference = ", n1 - n2 - d)

In [9]:
x1 = np.float32(1.5)
x2 = np.float32(1.0)
x_diff = np.float32(0.5)

diff(x1, x2, x_diff)

First number: 1.5
Second number: 1.0
Their difference: 0.5
Computed difference - input difference =  0.0


In [10]:
x1 = np.float32(1.1)
x2 = np.float32(1.0)
x_diff = np.float32(0.1)

diff(x1, x2, x_diff)

First number: 1.1
Second number: 1.0
Their difference: 0.1
Computed difference - input difference =  2.2351742e-08


### Inexactness
Which of the following expressions are <i>True</i>?

In [11]:
0.1 + 0.1 + 0.1 == 0.3

False
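Because the three additions accumulate rounding error, an exact comparison fails. A common workaround, sketched here with the standard-library function <i>math.isclose()</i>, is to compare within a tolerance. The variable name `total` is only illustrative.

```python
import math

total = 0.1 + 0.1 + 0.1

print(total == 0.3)              # exact comparison of two binary approximations
print(math.isclose(total, 0.3))  # comparison within a relative tolerance (default 1e-09)
```

Which tolerance is appropriate depends on the computation; the default is only a starting point.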

In [12]:
x1 = 0.1
x2 = 0.10000000000000001
x3 = 0.1000000000000000055511151231257827021181583404541015625

In [13]:
eval(repr(x1)) == x1

True

In [14]:
eval(repr(x1)) == x2

True

In [15]:
eval(repr(x1)) == x3

True
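All three literals map to the same double, which is why every comparison above is true. One way to inspect the value that is actually stored, sketched here with standard-library tools only, is to convert the float to a <i>Decimal</i> (the conversion is exact) or to ask for its integer ratio.

```python
from decimal import Decimal

exact = Decimal(0.1)                # exact decimal expansion of the stored binary value
ratio = (0.1).as_integer_ratio()    # the same value as numerator / denominator

print(exact)
print(exact == Decimal("0.1000000000000000055511151231257827021181583404541015625"))
print(ratio)
```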

### Holes in value range
What is the result of the following subtraction? Is it 0.0?

In [16]:
a = 1.0
b = 0.1
c = 1.1
c - a - b

8.326672684688674e-17
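The "holes" are the gaps between neighbouring representable numbers, and they grow with the magnitude of the number. The following sketch measures the gap with <i>math.ulp()</i> and <i>math.nextafter()</i>; it assumes Python 3.9 or newer, where both functions are available.

```python
import math

gap_at_one = math.ulp(1.0)                      # spacing of doubles just above 1.0
gap_at_1e16 = math.ulp(1e16)                    # spacing of doubles near 1e16
absorbed = (1e16 + 1.0 == 1e16)                 # 1.0 is smaller than the gap, so it is lost
next_after_one = math.nextafter(1.0, 2.0) - 1.0 # distance to the next double above 1.0

print(gap_at_one, gap_at_1e16, absorbed, next_after_one)
```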

### Fractions
The [fractions](https://docs.python.org/3/library/fractions.html) module provides support for rational number arithmetic.

The <i>Fraction</i> constructor returns a new instance with the value numerator/denominator.

In [17]:
from fractions import Fraction

Fraction(1.1)

Fraction(2476979795053773, 2251799813685248)

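Fraction(1.1) does not return Fraction(11, 10), because the float 1.1 is not stored exactly and the constructor converts the stored binary value. To find a rational approximation with a bounded denominator for a given floating-point number: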

In [18]:
Fraction(1.1).limit_denominator()

Fraction(11, 10)
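When the decimal value itself is wanted, the fraction can be built from a string or from integers instead of a float. As a small sketch, the subtraction from the "Holes in value range" section then yields exactly zero:

```python
from fractions import Fraction

a = Fraction(1)
b = Fraction(1, 10)
c = Fraction("1.1")      # parsed as the decimal 1.1, not as a binary float

residual = c - a - b     # exact rational arithmetic, no rounding

print(c, residual)
```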

### Conversions
Converting to an integer can reveal the inaccuracy in a floating-point number. The closest single-precision floating-point number to 20.23 is slightly less than 20.23, so multiplying it by one hundred gives a result slightly less than 2023.0. Converting 'y' to the integer 'i' does not round; the fractional part is truncated:

In [19]:
x = np.float32(20.23)
y = x * 100.
i = int(y)
print(i, y)

2022 2022.9999542236328


In [20]:
x = np.float64(20.23)
y = x * 100.
i = int(y)
print(i, y)

2023 2023.0
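If the nearest integer is intended, rounding before the conversion avoids the truncation effect. The sketch below reproduces the single-precision case with the standard library only; the helper `to_float32` is a hypothetical stand-in for <i>np.float32</i> that rounds a Python float to single precision through the <i>struct</i> module.

```python
import struct

def to_float32(value):
    """Round a Python float to the nearest single-precision value (illustrative helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]

x = to_float32(20.23)   # slightly less than 20.23
y = x * 100.0           # slightly less than 2023.0

print(int(y))           # truncates toward zero
print(round(y))         # rounds to the nearest integer
```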


Assigning a single-precision number to a double-precision number doesn't increase the number of significant digits:

In [21]:
x = np.float32(1.66661)
y = np.float64(x)
print(y)

1.6666100025177002


Why are there seemingly random digits <i>00025177002</i> instead of <i>00000000000</i>?

The padding with zeros happens in the binary representation, not in the decimal one: 1.10101010101001101111010000000000000000000000000000000000010101...
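The zero padding becomes visible when the value is printed in hexadecimal, where each digit stands for four bits. This is a minimal sketch that uses the <i>struct</i> module instead of NumPy to round to single precision; that substitution is an assumption made to keep the example dependency-free.

```python
import struct

# Nearest float32 to 1.66661, held in a Python float (a double).
x32 = struct.unpack("f", struct.pack("f", 1.66661))[0]

print(x32)               # decimal digits look "random"
print(x32.hex())         # the low bits of the significand are all zero
print((1.66661).hex())   # the double nearest to 1.66661 uses all of its bits
```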

### Rounding

In [22]:
round(256.49999) == 256

True

In [23]:
-1.225 * 100

-122.50000000000001
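Two effects are easy to confuse here: the built-in <i>round()</i> rounds exact halves to the nearest even integer, and many decimal "halves" such as 2.675 are not stored exactly in binary. The sketch below shows both, and uses <i>Decimal.quantize()</i> with <i>ROUND_HALF_UP</i> as one possible way to get schoolbook rounding on decimal input.

```python
from decimal import Decimal, ROUND_HALF_UP

halves = (round(0.5), round(1.5), round(2.5))   # ties go to the even neighbour
binary_result = round(2.675, 2)                 # 2.675 is stored slightly below 2.675
decimal_result = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(halves, binary_result, decimal_result)
```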

### [Decimal](https://docs.python.org/3/library/decimal.html) fixed-point and floating-point arithmetic

In [24]:
from decimal import Decimal, getcontext
getcontext()

Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])

In [25]:
Decimal('0.1') + Decimal('0.1') + Decimal('0.1') == Decimal('0.3')

True

### Accuracy of floating-point arithmetic
Examples from Donald E. Knuth, "The Art of Computer Programming", Volume 2: Seminumerical Algorithms, Section 4.2.2.

In [26]:
u, v, w = 11111113, -11111111, 7.51111111
(u + v) + w

9.51111111

In [27]:
u + (v + w)

9.511111110448837

In [28]:
u, v, w = 20000, -6, 6.0000003
(u*v) + (u*w)

0.005999999993946403

In [29]:
u * (v+w)

0.005999999999062311

In [30]:
from decimal import Decimal, getcontext
getcontext().prec = 8

u, v, w = Decimal(20000.), Decimal(-6.), Decimal('6.0000003')
(u*v) + (u*w)

Decimal('0.01')

In [31]:
u * (v+w)

Decimal('0.0060000')
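The decimal results differ for the same reason as the binary ones: every intermediate result is rounded to the working precision. The sketch below makes the intermediate product visible. It uses <i>localcontext()</i> so that the precision of 8 digits applies only inside the `with` block; the names `uw`, `left`, and `right` are illustrative.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 8                      # same precision as above, but only inside this block
    u, v, w = Decimal(20000), Decimal(-6), Decimal("6.0000003")
    uw = u * w                        # exact product 120000.006 is rounded to 8 digits
    left = (u * v) + uw               # the rounding error of uw survives the cancellation
    right = u * (v + w)               # v + w is exact, so the product is exact too

print(uw, left, right)
```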

### Summing many numbers
What are the potential arithmetic issues when summing many numbers?

[Patriot Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia](https://www.gao.gov/products/imtec-92-26)

In [32]:
import numpy as np

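def summ():
    tenth = np.float32(0.1)              # 0.1 has no exact binary representation
    count = np.float32(60*60*100*10)     # number of tenths of a second in 100 hours
    print(f"{count} {count*0.1}")
    sum = np.float32(0)                  # single-precision accumulator
    n = np.int64(0)
    while n < 1000000:
        sum += tenth                     # every addition is rounded to single precision
        n += 1
        # Report the first 20 steps, then every 36000th step (one hour of tenths).
        if n < 21 or n%36000 == 0:
            print(f"step {n} expected {0.1*n} solution {sum} diff {np.abs(0.1*n - sum)}")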

summ()

3600000.0 360000.0
step 1 expected 0.1 solution 0.10000000149011612 diff 1.4901161138336505e-09
step 2 expected 0.2 solution 0.20000000298023224 diff 2.980232227667301e-09
step 3 expected 0.30000000000000004 solution 0.30000001192092896 diff 1.1920928910669204e-08
step 4 expected 0.4 solution 0.4000000059604645 diff 5.960464455334602e-09
step 5 expected 0.5 solution 0.5 diff 0.0
step 6 expected 0.6000000000000001 solution 0.6000000238418579 diff 2.3841857821338408e-08
step 7 expected 0.7000000000000001 solution 0.7000000476837158 diff 4.768371575369912e-08
step 8 expected 0.8 solution 0.8000000715255737 diff 7.152557368605983e-08
step 9 expected 0.9 solution 0.9000000953674316 diff 9.536743161842054e-08
step 10 expected 1.0 solution 1.0000001192092896 diff 1.1920928955078125e-07
step 11 expected 1.1 solution 1.1000001430511475 diff 1.4305114737211966e-07
step 12 expected 1.2000000000000002 solution 1.2000001668930054 diff 1.6689300519345807e-07
step 13 expected 1.3 solution 1.300000190
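For double-precision values, the standard library offers <i>math.fsum()</i>, which tracks the lost low-order bits and returns an accurately rounded sum. The sketch below compares it with a plain accumulation loop. An explicit loop is used on purpose, because since Python 3.12 the built-in <i>sum()</i> applies a compensated algorithm to floats itself.

```python
import math

values = [0.1] * 10

naive = 0.0
for v in values:
    naive += v                  # each addition is rounded, errors accumulate

accurate = math.fsum(values)    # accurately rounded sum of the same values

print(naive, accurate)
```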

### [Kahan summation algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm)

In [33]:
import numpy as np

def kahan_summ():
    tenth = np.float32(0.1)
    count = np.float32(60*60*100*10)
    print(f"{count} {count*0.1}")
    sum = np.float32(0)      # Prepare the accumulator.
    n = np.int64(0)
    c = np.float32(0)        # A running compensation for lost low-order bits.
    while n < 1000000:
        y = tenth - c        # c is zero the first time around.
        t = sum + y          # Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y    # (t - sum) cancels the high-order part of y; subtracting y recovers negative (low part of y)
        sum = t              # Algebraically, c should always be zero. Beware overly-aggressive optimizing compilers!
        n += 1               # Next time around, the lost low part will be added to y in a fresh attempt.
        if n < 21 or n%36000 == 0:
            print(f"step {n} expected {0.1*n} solution {sum} diff {np.abs(0.1*n - sum)}")

kahan_summ()

3600000.0 360000.0
step 1 expected 0.1 solution 0.10000000149011612 diff 1.4901161138336505e-09
step 2 expected 0.2 solution 0.20000000298023224 diff 2.980232227667301e-09
step 3 expected 0.30000000000000004 solution 0.30000001192092896 diff 1.1920928910669204e-08
step 4 expected 0.4 solution 0.4000000059604645 diff 5.960464455334602e-09
step 5 expected 0.5 solution 0.5 diff 0.0
step 6 expected 0.6000000000000001 solution 0.6000000238418579 diff 2.3841857821338408e-08
step 7 expected 0.7000000000000001 solution 0.699999988079071 diff 1.1920929021691506e-08
step 8 expected 0.8 solution 0.800000011920929 diff 1.1920928910669204e-08
step 9 expected 0.9 solution 0.9000000357627869 diff 3.5762786843029915e-08
step 10 expected 1.0 solution 1.0 diff 0.0
step 11 expected 1.1 solution 1.100000023841858 diff 2.3841857821338408e-08
step 12 expected 1.2000000000000002 solution 1.2000000476837158 diff 4.7683715642676816e-08
step 13 expected 1.3 solution 1.3000000715255737 diff 7.152557368605983e-08
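The same compensation scheme can be packaged as a reusable function. The following is a minimal sketch for Python floats (doubles), not a tuned implementation; the function name `kahan_sum` is illustrative, and <i>math.fsum()</i> serves as the reference value.

```python
import math

def kahan_sum(values):
    """Sum an iterable of floats with Kahan compensation (illustrative sketch)."""
    total = 0.0
    c = 0.0                      # running compensation for lost low-order bits
    for x in values:
        y = x - c                # apply the correction from the previous step
        t = total + y            # low-order bits of y may be lost here
        c = (t - total) - y      # recover what was lost (with opposite sign)
        total = t
    return total

values = [0.1] * 1_000_000

naive = 0.0
for v in values:                 # plain accumulation for comparison
    naive += v

exact = math.fsum(values)
compensated = kahan_sum(values)

print(abs(naive - exact), abs(compensated - exact))
```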

### Harmonic number
The following function calculates the n-th [harmonic number](https://en.wikipedia.org/wiki/Harmonic_number) in two ways. What is the difference between the two loops? The first loop runs from 1 up to n, adding the reciprocals. The second loop runs from n down to 1 and sums the reciprocals.

In [34]:
import numpy as np

def harmonic_number(n):
    # forward sum
    f_sum = np.float32(0.0)
    for i in range(1, n+1):
        f_sum += np.float32(1.0/i)
    
    # backward sum
    b_sum = np.float32(0.0)
    for i in range(n, 0, -1):
        b_sum += np.float32(1.0/i)

    print("Forward sum", f_sum)
    print("Backward sum", b_sum)

In [35]:
%%time
harmonic_number(10_000_000)

Forward sum 15.403683
Backward sum 16.686031
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 23.5 s


In [36]:
%%time
harmonic_number(100_000_000)

Forward sum 15.403683
Backward sum 18.807919
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 4min 26s
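The forward sum stops growing because, once the accumulator is large, the remaining reciprocals are smaller than half the spacing between neighbouring single-precision numbers and are absorbed. The backward sum adds the small terms first, while the accumulator is still small. The sketch below checks this against a double-precision reference for a smaller n. It emulates single precision with the <i>struct</i> module instead of NumPy; the helpers `to_float32` and `harmonic_float32` are hypothetical names introduced for this illustration.

```python
import math
import struct

def to_float32(value):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def harmonic_float32(n, reverse=False):
    """Sum 1/i for i = 1..n, rounding every step to single precision."""
    indices = range(n, 0, -1) if reverse else range(1, n + 1)
    total = 0.0
    for i in indices:
        total = to_float32(total + to_float32(1.0 / i))
    return total

n = 1_000_000
reference = math.fsum(1.0 / i for i in range(1, n + 1))  # double-precision reference
forward = harmonic_float32(n)
backward = harmonic_float32(n, reverse=True)

print(abs(forward - reference), abs(backward - reference))
```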


### Quadratic equations

In [37]:
import math
import numpy as np

a = 1.0
b = 1.786737589984535
c = 1.149782767465722e-8

In [38]:
x_1 = (-b - math.sqrt(b**2 - 4*a*c))/(2*a)
x_2 = (-b + math.sqrt(b**2 - 4*a*c))/(2*a)
print(x_1, x_2)

-1.786737583549439 -6.435095900592103e-09


In [39]:
x_1 = (-b - np.sign(b)*math.sqrt(b**2 - 4*a*c))/(2*a)
x_2 = c/(a*x_1)
print(x_1, x_2)

-1.786737583549439 -6.435095886781673e-09
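The two results for x_2 differ because the textbook formula subtracts two nearly equal numbers (b and the square root) when 4ac is tiny compared with b squared. The second variant computes the larger root without cancellation and derives the other one from the product of the roots, c/a. A minimal sketch of that idea as a function follows; the name `stable_roots` is illustrative, and complex roots are deliberately not handled.

```python
import math

def stable_roots(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0, avoiding cancellation (illustrative sketch)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("complex roots are not handled in this sketch")
    # b and the signed square root have the same sign, so nothing cancels.
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    x1 = q / a
    x2 = c / q if q != 0 else x1   # q == 0 only when b == 0 and c == 0
    return x1, x2

x1, x2 = stable_roots(1.0, 1.786737589984535, 1.149782767465722e-8)
print(x1, x2)
```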
