[**Demo %s**](#demo-float-double)


Python has native `int` and `float` types.

In [2]:
print(f"The type of {1} is {type(1)}")
print(f"The type of {1.0} is {type(1.0)}")

The type of 1 is <class 'int'>
The type of 1.0 is <class 'float'>


The `numpy` package defines its own floating-point types, such as `float64`:

In [3]:
from numpy import float64  # NumPy's double-precision scalar type
one = float64(1)
print(f"The type of {one} is {type(one)}")

The type of 1.0 is <class 'numpy.float64'>


Both `float` and `float64` are double precision, using 64 binary bits per value. Although it is not normally necessary to do so, we can deconstruct a float into its significand and exponent:

In [4]:
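from numpy import frexp
x = 3.14
# frexp gives x = m * 2**e with 0.5 <= m < 1; rescale so the significand is in [1, 2)
mantissa, exponent = frexp(x)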
print(f"significand: {mantissa * 2}, exponent: {exponent - 1}")

significand: 1.57, exponent: 1


In [5]:
mantissa, exponent = frexp(x / 8)
print(f"significand: {mantissa * 2}, exponent: {exponent - 1}")

significand: 1.57, exponent: -2

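As a sanity check on the decomposition above, the following sketch rebuilds `x` from its significand and exponent. It assumes `numpy.frexp`, as in the cells above, and also uses Python's built-in `float.hex`, which does not appear elsewhere in this demo. The variable names are illustrative only.

```python
import numpy as np

x = 3.14
m, e = np.frexp(x)           # x = m * 2**e with 0.5 <= m < 1
significand = 2 * m          # rescaled to lie in [1, 2)
exponent = int(e) - 1

# Scaling by a power of two is exact, so the product recovers x exactly.
rebuilt = significand * 2.0**exponent
print(f"rebuilt == x: {rebuilt == x}")

# float.hex shows the same pieces: hexadecimal significand digits,
# then "p" followed by the binary exponent.
print(float(x).hex())
```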

The spacing between floating-point values in $[2^n,2^{n+1})$ is $2^n \epsilon_\text{mach}$, where $\epsilon_\text{mach}$ is machine epsilon, given here for double precision:

In [6]:
from numpy import finfo
mach_eps = finfo(float).eps
print(f"machine epsilon is {mach_eps:.4e}")

machine epsilon is 2.2204e-16

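The spacing formula can be checked numerically. The sketch below compares $2^n \epsilon_\text{mach}$ against `numpy.spacing`, which reports the gap between a value and its neighboring float. The helper `predicted_spacing` is a hypothetical name introduced here for illustration, and the test values are arbitrary.

```python
import numpy as np

eps = np.finfo(float).eps

def predicted_spacing(x):
    """Gap 2**n * eps for a positive x in [2**n, 2**(n+1))."""
    _, e = np.frexp(x)       # x = m * 2**e with 0.5 <= m < 1, so n = e - 1
    return 2.0 ** (int(e) - 1) * eps

for x in [1.0, 3.14, 10.0, 1e6]:
    actual = np.spacing(x)   # gap between x and the adjacent float above it
    print(f"x = {x}: spacing = {actual:.4e}, formula = {predicted_spacing(x):.4e}")
```

If the formula holds, the two columns should agree for every value tested.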

Because double precision allocates 52 bits to the fractional part of the significand, machine epsilon in double precision is $2^{-52}$.

In [7]:
from numpy import log2
print(f"machine epsilon is 2 to the power {log2(mach_eps)}")

machine epsilon is 2 to the power -52.0


A common mistake is to think that $\epsilon_\text{mach}$ is the smallest floating-point number. It is only the smallest gap *relative to 1*, that is, the distance from 1 to the next larger floating-point value. How small or how large a value can be is limited by the exponent, not the significand. The actual range of positive values in double precision is:

In [8]:
finf = finfo(float)
print(f"range of positive values: [{finf.tiny}, {finf.max}]")

range of positive values: [2.2250738585072014e-308, 1.7976931348623157e+308]

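To make the distinction concrete, here is a small sketch contrasting what $\epsilon_\text{mach}$ means near 1 with how small a positive value can be. It assumes only `numpy.finfo`, as used above. The comparisons are chosen for illustration.

```python
import numpy as np

finf = np.finfo(float)
eps = finf.eps

print(1.0 + eps > 1.0)       # eps is the gap from 1 to the next float
print(1.0 + eps / 2 == 1.0)  # half of that gap is lost when added to 1
print(eps / 2 > 0.0)         # yet eps / 2 is itself a valid positive float
print(finf.tiny < eps)       # and far smaller positive values exist
```

Each comparison should print `True`: the limitation expressed by machine epsilon concerns resolution near a given value, not the magnitude of representable numbers.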

For the most part you can mix integers and floating-point values and get what you expect.

In [9]:
1/7

0.14285714285714285

In [10]:
37.3 + 1

38.3

In [11]:
2**(-4)

0.0625

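The sketch below inspects the result types of mixed expressions like those above, and adds one caveat that follows from the 52-bit significand: very large integers may not survive conversion to a float. The variable `big` is a hypothetical example value, not something from the original demo.

```python
print(type(1 / 7))       # true division gives a float, even for two ints
print(type(7 // 2))      # floor division of two ints stays an int
print(type(37.3 + 1))    # the int is converted to a float before adding
print(type(2**4), type(2**(-4)))  # a negative integer power gives a float

# Caveat: integers beyond 2**53 are not all representable as floats,
# so converting one can silently change its value.
big = 2**53 + 1
print(float(big) == 2**53)
```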
You can convert a floating-point value to an integer by wrapping it in `int`, which discards the fractional part.

In [12]:
int(3.14)

3
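
Note that `int` truncates toward zero rather than rounding. The following sketch compares it with the built-in `round` and with `math.floor` and `math.ceil` from the standard library. The sample values are arbitrary.

```python
import math

for x in [3.99, -3.99]:
    print(f"x = {x}: int -> {int(x)}, round -> {round(x)}, "
          f"floor -> {math.floor(x)}, ceil -> {math.ceil(x)}")
```

For negative inputs, `int` and `math.floor` give different results, so it is worth choosing the conversion that matches the intent.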