In [2]:
import numpy as np
np.array([1, 2, 3]).dtype


dtype('int32')

In [3]:
np.array([255], np.uint8) + 1

array([0], dtype=uint8)

In [4]:
np.array([2 ** 31 - 1])

array([2147483647])

In [5]:
np.array([2 ** 31 - 1]) + 1

array([-2147483648])

In [6]:
np.array([2 ** 63 - 1]) + 1

array([-9223372036854775808], dtype=int64)

In [8]:
np.array([255], np.uint8)[0] + 1

256

In [9]:
np.array([2 ** 31 - 1])[0] + 1

RuntimeWarning: overflow encountered in scalar add
  np.array([2 ** 31 - 1])[0] + 1


-2147483648

In [10]:
np.array([2 ** 63 - 1])[0] + 1

RuntimeWarning: overflow encountered in scalar add
  np.array([2 ** 63 - 1])[0] + 1


-9223372036854775808

Unlike true floating point errors (where the hardware FPU sets a flag whenever it does an atomic operation that overflows), we need to implement the integer overflow detection ourselves. We do it on the scalars, but not arrays because it would be too slow to implement for every atomic operation on arrays. (Robert Kern, one of the NumPy core developers)
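Since array operations wrap around silently, a check has to be done by hand when it matters. The following is a minimal sketch of one way to do that for addition; `checked_add` is a hypothetical helper written for this illustration, not a NumPy function, and it assumes both operands fit into the dtype of the first one.

```python
import numpy as np

def checked_add(a, b):
    """Add integer arrays of the same dtype, raising instead of wrapping.

    Hypothetical helper for illustration; not part of NumPy.
    """
    a = np.asarray(a)
    b = np.asarray(b, dtype=a.dtype)
    info = np.iinfo(a.dtype)
    # a + b > max  <=>  a > max - b   (only relevant when b > 0)
    # a + b < min  <=>  a < min - b   (only relevant when b < 0)
    # The sign masks discard the lanes where the right-hand side itself wraps.
    too_big = (b > 0) & (a > info.max - b)
    too_small = (b < 0) & (a < info.min - b)
    if np.any(too_big | too_small):
        raise OverflowError(f"integer overflow in {a.dtype} addition")
    return a + b

print(checked_add(np.array([1, 2], dtype=np.int32),
                  np.array([3, 4], dtype=np.int32)))

try:
    checked_add(np.array([2 ** 31 - 1], dtype=np.int32), 1)
except OverflowError as e:
    print("caught:", e)
```

The extra comparisons roughly triple the work per element, which is consistent with the reason given in the quote for not doing this by default.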

In [13]:
# You can turn it into an error
with np.errstate(over = 'raise'):
    print(np.array([2 ** 31 - 1])[0] + 1)

FloatingPointError: overflow encountered in scalar add

In [14]:
# or suppress it temporarily
with np.errstate(over = 'ignore'):
    print(np.array([2 ** 31 - 1])[0] + 1)

-2147483648


#### Floats
As the pure Python `float` did not diverge from the IEEE 754-standardized C `double` type (note the difference in naming), the transition of floating point numbers from Python to NumPy is pretty much hassle-free: Python `float` is directly compatible with `np.float64`, and Python `complex` with `np.complex128`.
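The compatibility described above can be checked directly. This short sketch relies only on the standard `sys.float_info` and `np.finfo` lookups:

```python
import sys
import numpy as np

x = np.float64(0.1)
z = np.complex128(1 + 2j)

# The 64-bit float and 128-bit complex scalars subclass the Python types.
print(isinstance(x, float))
print(isinstance(z, complex))

# Same binary format, so values and limits coincide.
print(x == 0.1)
print(np.finfo(np.float64).max == sys.float_info.max)
print(np.finfo(np.float64).eps == sys.float_info.epsilon)
```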


In [15]:

x = np.array([-1234.5])
1 / (1 + np.exp(-x))
# Output: RuntimeWarning: overflow encountered in exp
# array([0.])

np.exp(np.array([1234.5]))
# Output: RuntimeWarning: overflow encountered in exp
# array([inf])




array([inf])

One thing that distinguishes floats from integers is that they are inexact.
You can’t compare two floats with `a == b` unless you’re sure they are represented exactly.
You can expect floats to represent integers exactly, but only below a certain magnitude (limited by the number of significant digits):

In [16]:
9279945539648888.0 + 1

9279945539648888.0

In [17]:
len('9279945539648888')

16
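A brief sketch of what this means in practice: compare floats with a tolerance (here `np.isclose` and `np.allclose` with their default tolerances), and keep in mind that a 64-bit float has a 53-bit significand, so consecutive integers stop being distinguishable at 2**53.

```python
import numpy as np

a = 0.1 + 0.2
print(a == 0.3)             # exact comparison fails
print(np.isclose(a, 0.3))   # comparison with a tolerance succeeds

# Element-wise version for arrays.
print(np.allclose(np.array([0.1, 0.2]) * 3, [0.3, 0.6]))

# Above 2**53 adjacent integers are no longer representable.
limit = 2.0 ** 53
print(limit + 1 == limit)
print(limit - 1 == limit)
```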

For financial data, the `decimal.Decimal` type is handy as it involves no additional tolerances at all:

In [18]:
from decimal import Decimal as D

a = np.array([D('0.1'), D('0.2')]); a
# Output: array([Decimal('0.1'), Decimal('0.2')], dtype=object)

a.sum()
# Output: Decimal('0.3')


Decimal('0.3')

For pure mathematical calculations, `fractions.Fraction` can be used:

In [19]:
from fractions import Fraction

a = np.array([1, 2]) + Fraction(); a
# Output: array([Fraction(1, 1), Fraction(2, 1)], dtype=object)

a /= 10; a
# Output: array([Fraction(1, 10), Fraction(1, 5)], dtype=object)

a.sum()
# Output: Fraction(3, 10)


Fraction(3, 10)

Complex numbers are treated the same way as floats. There are extra convenience functions with intuitive names like `np.real(z)`, `np.imag(z)`, `np.abs(z)`, `np.angle(z)` that work on both scalars and arrays as a whole. The only difference from the pure Python `complex` is that `np.complex_` does not work with integers:

In [20]:
np.array([1 + 2j])

array([1.+2.j])
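As a small illustration of the convenience functions mentioned above, the sketch below applies them to a whole array and to a single scalar. The sample values are arbitrary.

```python
import numpy as np

z = np.array([3 + 4j, 1j, -2 + 0j])

print(np.real(z))             # real parts
print(np.imag(z))             # imaginary parts
print(np.abs(z))              # moduli
print(np.angle(z, deg=True))  # arguments in degrees

# The same functions accept a plain Python complex scalar.
print(np.abs(3 + 4j))

# Integer components are stored as floats.
print(np.array([1 + 2j]).real.dtype)
```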

#### Bools
The boolean values are stored as single bytes for better performance. `np.bool_` is a separate type from Python’s `bool` because it doesn’t need reference counting and a link to the base class required for any pure Python type. So if you think that using 8 bits to store one bit of information is excessive, look at this:

In [22]:
import sys
sys.getsizeof(True)

28
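For comparison, here is a sketch of the storage cost on the NumPy side. The `np.packbits` step is an aside for cases where one bit per value really is needed; the exact `sys.getsizeof` figure depends on the Python build.

```python
import sys
import numpy as np

flags = np.ones(1000, dtype=np.bool_)

print(flags.itemsize)        # bytes per element
print(flags.nbytes)          # bytes for the whole data buffer
print(sys.getsizeof(True))   # size of a single Python bool object

# Optional: pack eight booleans into each byte, at the cost of
# having to unpack before doing element-wise work.
packed = np.packbits(flags)
print(packed.nbytes)
restored = np.unpackbits(packed)[:flags.size].astype(np.bool_)
print(np.array_equal(restored, flags))
```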

#### Strings
Initializing a NumPy array with a list of Python strings packs them into a fixed-width native NumPy dtype called `np.str_`.
Reserving the space necessary to fit the longest string for every element might look wasteful (especially in the fixed UCS-4 encoding, as opposed to the ‘dynamic’ choice of the UTF width in Python `str`):

In [23]:
import numpy as np

np.array(['abcde', 'x', 'y', 'x'])
# Output: array(['abcde', 'x', 'y', 'x'], dtype='<U5')
# Comments: 4 bytes per character, so 5 characters * 4 bytes = 20 bytes per element


array(['abcde', 'x', 'y', 'x'], dtype='<U5')

Another option is to keep references to Python `str` objects in a NumPy array of objects:

In [24]:
import numpy as np

np.array(['abcde', 'x', 'y', 'x'], object)
# Output: array(['abcde', 'x', 'y', 'x'], dtype=object)
# Comments: 1 byte per ASCII character, so each element size is 49 + len(element) bytes


array(['abcde', 'x', 'y', 'x'], dtype=object)

If you’re dealing with a raw sequence of bytes, NumPy has a fixed-length version of the Python `bytes` type called `np.bytes_`:

In [25]:
import numpy as np

np.array([b'abcde', b'x', b'y', b'x'])
# Output: array([b'abcde', b'x', b'y', b'x'], dtype='|S5')
# Comments: 1 byte per ASCII character, so each element size is 5 bytes


array([b'abcde', b'x', b'y', b'x'], dtype='|S5')
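The three storage options above can be compared side by side through `itemsize` and `nbytes`. In this sketch the object array reports only the size of the stored references; the string objects themselves live elsewhere, and their `sys.getsizeof` values are specific to the Python build.

```python
import sys
import numpy as np

words = ['abcde', 'x', 'y', 'x']

u = np.array(words)                                # fixed-width unicode
s = np.array([w.encode('ascii') for w in words])   # fixed-width bytes
o = np.array(words, dtype=object)                  # references to Python str

print(u.dtype, u.itemsize, u.nbytes)   # 4 bytes per character
print(s.dtype, s.itemsize, s.nbytes)   # 1 byte per character
print(o.dtype, o.itemsize, o.nbytes)   # one pointer per element

# Memory held by the Python strings that the object array refers to.
print([sys.getsizeof(w) for w in words])
```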

Here’s a useful function that decomposes a `datetime64` array into an array of 7 integer columns (years, months, days, hours, minutes, seconds, microseconds):

In [26]:
def dt2cal(dt):
    # allocate output
    out = np.empty(dt.shape + (7,), dtype='u4')
    # decompose calendar floors
    Y, M, D, h, m, s = [dt.astype(f'M8[{x}]') for x in "YMDhms"]
    out[..., 0] = Y + 1970  # Gregorian Year
    out[..., 1] = (M - Y) + 1  # month
    out[..., 2] = (D - M) + 1  # day
    out[..., 3] = (dt - D).astype("m8[h]")  # hour
    out[..., 4] = (dt - h).astype("m8[m]")  # minute
    out[..., 5] = (dt - m).astype("m8[s]")  # second
    out[..., 6] = (dt - s).astype("m8[us]")  # microsecond
    return out

# Example usage:
a = np.array(['2021-12-15T09:00:00.000000', 
              '2021-12-18T19:00:00.000000', 
              '2021-12-24T09:00:00.000000'], dtype='datetime64[us]')

dt2cal(a)
# Output:
# array([[2021, 12, 15,  9,  0,  0,     0],
#        [2021, 12, 18, 19,  0,  0,     0],
#        [2021, 12, 24,  9,  0,  0,     0]], dtype=uint32)


array([[2021,   12,   15,    9,    0,    0,    0],
       [2021,   12,   18,   19,    0,    0,    0],
       [2021,   12,   24,    9,    0,    0,    0]], dtype=uint32)
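The function relies on the fact that casting a `datetime64` to a coarser unit floors it, so each calendar field is the distance between two consecutive floors. A step-by-step sketch of the first few fields on a single timestamp (the variable names are chosen for this illustration):

```python
import numpy as np

t = np.array(['2021-12-15T09:30:45'], dtype='datetime64[us]')

# Casting to a coarser unit truncates towards the start of that unit.
year_floor = t.astype('M8[Y]')
month_floor = t.astype('M8[M]')
day_floor = t.astype('M8[D]')
print(year_floor, month_floor, day_floor)

# Years are counted from 1970; months and days are offsets from the
# enclosing floor, shifted by one because calendars count from 1.
year = year_floor.astype('int64') + 1970
month = (month_floor - year_floor).astype('int64') + 1
day = (day_floor - month_floor).astype('int64') + 1
hour = (t - day_floor).astype('m8[h]').astype('int64')
print(year, month, day, hour)
```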

#### Combinations thereof
A ‘structured array’ in NumPy is an array with a custom dtype made from the types described above as the basic building blocks (akin to `struct` in C).
A typical example is an RGB pixel color: a 3-byte type (usually 4 for alignment), in which the colors can be accessed by name:

In [27]:
import numpy as np

rgb = np.dtype([('x', np.uint8), ('y', np.uint8), ('z', np.uint8)])

a = np.zeros(5, rgb); a
# Output:
# array([(0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)],
#       dtype=[('x', 'u1'), ('y', 'u1'), ('z', 'u1')])

a[0]
# Output: (0, 0, 0)

a[0]['x']
# Output: 0

a[0]['x'] = 10
a
# Output:
# array([(10,  0,  0), ( 0,  0,  0), ( 0,  0,  0), ( 0,  0,  0), ( 0,  0,  0)],
#       dtype=[('x', 'u1'), ('y', 'u1'), ('z', 'u1')])

a['z'] = 5
a
# Output:
# array([(10,  0,  5), ( 0,  0,  5), ( 0,  0,  5), ( 0,  0,  5), ( 0,  0,  5)],
#       dtype=[('x', 'u1'), ('y', 'u1'), ('z', 'u1')])


array([(10, 0, 5), ( 0, 0, 5), ( 0, 0, 5), ( 0, 0, 5), ( 0, 0, 5)],
      dtype=[('x', 'u1'), ('y', 'u1'), ('z', 'u1')])
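The remark about 3 versus 4 bytes can be made concrete with `itemsize`. The sketch below uses the field names `r`, `g`, `b` (chosen for this illustration) and the dictionary form of `np.dtype`, whose optional `itemsize` key pads each record to a given size:

```python
import numpy as np

# Tightly packed: three one-byte fields, three bytes per pixel.
rgb = np.dtype([('r', np.uint8), ('g', np.uint8), ('b', np.uint8)])
print(rgb.itemsize)

# Padded to four bytes per pixel; the fourth byte is unused.
rgb_padded = np.dtype({'names': ['r', 'g', 'b'],
                       'formats': [np.uint8] * 3,
                       'itemsize': 4})
print(rgb_padded.itemsize)

pixels = np.zeros(2, dtype=rgb_padded)
pixels['r'] = [255, 10]
print(pixels['r'], pixels.nbytes)
```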

Even though this syntax is convenient for addressing particular columns as a whole, neither structured arrays nor recarrays are something you’d want to use in the innermost loop of compute-intensive code:


In [29]:
a = np.random.rand(100000, 2)  # two float64 columns, matching the ('x', 'y') dtype below

b = a.view(dtype=[('x', np.float64), ('y', np.float64)])

c = np.recarray(buf=a, shape=len(a), dtype=[('x', np.float64), ('y', np.float64)])

# Reference calculation
s1 = 0
for r in a:
    s1 += (r[0]**2 + r[1]**2)**-1.5

# 5x slower
s2 = 0
for r in b:
    s2 += (r['x']**2 + r['y']**2)**-1.5

# 7x slower
s3 = 0
for r in c:
    s3 += (r.x**2 + r.y**2)**-1.5

# 20x faster
s1_fast = np.sum((a[:, 0]**2 + a[:, 1]**2)**-1.5)

# Same as s1
s2_fast = np.sum((b['x']**2 + b['y']**2)**-1.5)

# Same as s1
s3_fast = np.sum((c.x**2 + c.y**2)**-1.5)
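To reproduce this kind of comparison on your own machine, something like the following sketch can be used. It works on a smaller array so that it finishes quickly, flattens the structured view with `ravel()` so that iteration yields one record at a time, and prints timings without assuming any particular ratio, since those depend on the hardware and the NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((2000, 2))
fields = [('x', np.float64), ('y', np.float64)]
b = a.view(dtype=fields).ravel()   # structured view of the same memory

def loop_plain():
    s = 0.0
    for r in a:
        s += (r[0]**2 + r[1]**2)**-1.5
    return s

def loop_structured():
    s = 0.0
    for r in b:
        s += (r['x']**2 + r['y']**2)**-1.5
    return s

def vectorized():
    return np.sum((b['x']**2 + b['y']**2)**-1.5)

# All three compute the same sum; only the running time differs.
for f in (loop_plain, loop_structured, vectorized):
    print(f.__name__, f(), timeit.timeit(f, number=5))
```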
