# Optimizing Storage: NumPy Data Types

Now that you have a bit more practical experience, it’s time to go back to theory and look at data types. Data types don’t play a central role in a lot of Python code. Numbers work like they’re supposed to, strings do other things, Booleans are true or false, and other than that, you make your own objects and collections.

In NumPy, though, there’s a little more detail that needs to be covered. NumPy uses C code under the hood to optimize performance, and it can’t do that unless all the items in an array are of the same type. That doesn’t just mean the same Python type. They have to be the same underlying C type, with the same shape and size in bits!
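
As a minimal sketch of what "same underlying type" means in practice (the variable name `mixed` is only illustrative), NumPy converts a mix of Python integers and floats to one common type, so every item takes the same number of bytes:

```python
import numpy as np

# Mixing Python ints and a float: NumPy picks one common type for every item.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)     # float64

# Every item occupies the same number of bytes.
print(mixed.itemsize)  # 8
print(mixed.nbytes)    # 24 (3 items * 8 bytes each)
```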

Python defines only one type of a particular data class (there is only one integer type, one floating-point type, etc.). This can be convenient in applications that don’t need to be concerned with all the ways data can be represented in a computer. For scientific computing, however, more control is often needed.

In NumPy, there are 24 new fundamental Python types to describe different types of scalars. These type descriptors are mostly based on the types available in the C language that CPython is written in, with several additional types compatible with Python’s types.
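
The extra control matters for storage. The following sketch, with illustrative variable names, stores the same four values as 8-bit and as 64-bit integers and compares the memory used:

```python
import numpy as np

values = [0, 1, 2, 3]

small = np.array(values, dtype=np.int8)   # 1 byte per item
large = np.array(values, dtype=np.int64)  # 8 bytes per item

print(small.nbytes)  # 4
print(large.nbytes)  # 32

# The trade-off: a narrower type covers a narrower range of values.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
```

Choosing the narrowest type that safely holds the data is one straightforward way to reduce the memory footprint of large arrays.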

## Data Type Objects (`dtype`)

A data type object (an instance of the `numpy.dtype` class) describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted. It describes the following aspects of the data:

- Type of the data (integer, float, Python object, etc.)
- Size of the data (how many bytes an item occupies, e.g., 4 bytes for a 32-bit integer)
- Byte order of the data (little-endian or big-endian)
- If the data type is a structured data type, that is, an aggregate of other data types (e.g., describing an array item consisting of an integer and a float):
  - what are the names of the “fields” of the structure, by which they can be accessed,
  - what is the data-type of each field, and
  - which part of the memory block each field takes.
- If the data type is a sub-array, what is its shape and data type.
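
Most of the aspects listed above can be read directly from a `dtype` object. The sketch below uses a hypothetical record layout (the field names `id` and `matrix` are made up for illustration) and inspects its fields, sizes, and offsets:

```python
import numpy as np

# Hypothetical layout: a 32-bit integer id plus a 2x2 block of 64-bit floats.
record = np.dtype([('id', np.int32), ('matrix', np.float64, (2, 2))])

print(record.names)                # ('id', 'matrix')
print(record.itemsize)             # 36 (4 bytes + 4 * 8 bytes)
print(record['id'].kind)           # i (integer)
print(record.fields['matrix'][1])  # 4: byte offset of the second field
print(record['matrix'].shape)      # (2, 2): shape of the sub-array
print(record['matrix'].base)       # float64: type of the sub-array items
```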

To describe the type of scalar data, NumPy provides several built-in scalar types for integers, floating-point numbers, and so on, at various precisions. An item extracted from an array, e.g., by indexing, will be a Python object whose type is the scalar type associated with the data type of the array.

> **Note:** The scalar types are not dtype objects, even though they can be used in place of one whenever a data type specification is needed in NumPy.
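
The distinction between a scalar type and a dtype object can be checked directly. This is a small sketch with illustrative variable names:

```python
import numpy as np

arr = np.array([1.5, 2.5], dtype=np.float32)
item = arr[0]

# An indexed item is an instance of the array's scalar type.
print(type(item) is np.float32)            # True

# The scalar type is a class, not a dtype instance...
print(isinstance(np.float32, np.dtype))    # False
print(isinstance(arr.dtype, np.dtype))     # True

# ...but it converts to the matching dtype wherever one is needed.
print(np.dtype(np.float32) == arr.dtype)   # True
print(arr.dtype.type is np.float32)        # True
```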

Structured data types are formed by creating a data type whose fields contain other data types. Each field has a name by which it can be accessed. The parent data type should be of sufficient size to contain all its fields; the parent is nearly always based on the void type, which allows an arbitrary item size. Structured data types may also contain nested structured sub-array data types in their fields.

Finally, a data type can describe items that are themselves arrays of items of another data type. These sub-arrays must, however, be of a fixed size.

If an array is created using a data type describing a sub-array, the dimensions of the sub-array are appended to the shape of the array when the array is created. Sub-arrays in a field of a structured type behave differently; see [Field Access Numpy Documentation](https://numpy.org/doc/stable/reference/arrays.indexing.html#arrays-indexing-fields).

Sub-arrays always have a C-contiguous memory layout.
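
The two sub-array behaviors described above can be sketched as follows. The names `block`, `a`, and `rec` are illustrative only:

```python
import numpy as np

# A sub-array dtype: each item is a 2x3 block of 16-bit integers.
block = np.dtype((np.int16, (2, 3)))

# Used directly, the sub-array dimensions are appended to the array shape.
a = np.zeros(4, dtype=block)
print(a.shape)  # (4, 2, 3)
print(a.dtype)  # int16

# Inside a structured field, the array keeps its own shape and the
# extra dimensions only appear when the field is accessed.
rec = np.zeros(4, dtype=[('block', np.int16, (2, 3))])
print(rec.shape)           # (4,)
print(rec['block'].shape)  # (4, 2, 3)
```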

A simple data type containing a 32-bit big-endian integer:

In [36]:
dt = np.dtype('>i4')

In [37]:
dt.byteorder

'>'

In [38]:
dt.itemsize

4

In [39]:
dt.name

'int32'

In [40]:
dt.type is np.int32

True
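
To see what the byte order changes, the following sketch stores the value 1 as a big-endian and as a little-endian 32-bit integer and prints the raw bytes. The values compare equal; only the memory layout differs:

```python
import numpy as np

big = np.array([1], dtype='>i4')
little = np.array([1], dtype='<i4')

# Same value, opposite byte layout in memory.
print(big.tobytes())     # b'\x00\x00\x00\x01'
print(little.tobytes())  # b'\x01\x00\x00\x00'
print(big[0] == little[0])  # True

# astype() converts between byte orders while preserving the values.
converted = big.astype('<i4')
print(converted.tobytes() == little.tobytes())  # True
```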

A structured data type containing a 16-character string (in field `name`) and a sub-array of two 64-bit floating-point numbers (in field `grades`):

In [49]:
# np.str_ is used here because the np.unicode_ alias was removed in NumPy 2.0
dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])

In [50]:
dt['name']

dtype('<U16')

In [51]:
dt['grades']

dtype(('<f8', (2,)))

Items of an array of this data type are wrapped in an array scalar type that also has two fields:

In [53]:
x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)

In [64]:
x['name']

array(['Sarah', 'John'], dtype='<U16')

In [26]:
x[1]['grades']

array([6., 7.])

In [28]:
type(x[1])

numpy.void

In [29]:
type(x[1]['grades'])

numpy.ndarray
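
Fields can also be accessed across the whole array at once. The sketch below rebuilds the same array (using `np.str_` for the string field) and treats the `grades` field as an ordinary two-dimensional array:

```python
import numpy as np

dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)

# One row of grades per record.
print(x['grades'].shape)         # (2, 2)
print(x['grades'].mean(axis=1))  # [7.5 6.5]

# Field access returns a view, so writing through it updates the record.
x['grades'][1, 0] = 9.0
print(x[1]['grades'])            # [9. 7.]
```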

## Specifying and constructing data types

Whenever a data-type is required in a NumPy function or method, either a dtype object or something that can be converted to one can be supplied. Such conversions are done by the dtype constructor:

In [30]:
dt = np.dtype(np.int32)      # 32-bit integer

In [31]:
dt = np.dtype(np.complex128) # 128-bit complex floating-point number
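
Strings are another common way to specify a data type. As a brief sketch, the following checks that a few string specifications convert to the same dtype objects as the scalar types:

```python
import numpy as np

# Type names and short type codes both convert to dtype objects.
print(np.dtype('int32') == np.dtype(np.int32))     # True
print(np.dtype('i4') == np.dtype(np.int32))        # True
print(np.dtype('c16') == np.dtype(np.complex128))  # True

# Any function that takes a dtype argument accepts the same forms.
a = np.zeros(3, dtype='float32')
b = np.zeros(3, dtype=np.float32)
print(a.dtype == b.dtype)  # True
```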

Several Python types are equivalent to a corresponding array scalar when used to generate a dtype object:

|Built-in Python type|NumPy type|
|:--|:--|
|`int`|`int_`|
|`bool`|`bool_`|
|`float`|`float64`|
|`complex`|`complex128`|
|`bytes`|`bytes_`|
|`str`|`str_`|
|`memoryview`|`void`|
|all others|`object_`|
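
A few of these equivalences can be verified with a short sketch. It uses the names `float64` and `complex128`, which are available across NumPy versions:

```python
import numpy as np

# Built-in Python types convert to the corresponding NumPy dtypes.
print(np.dtype(int) == np.dtype(np.int_))            # True
print(np.dtype(bool) == np.dtype(np.bool_))          # True
print(np.dtype(float) == np.dtype(np.float64))       # True
print(np.dtype(complex) == np.dtype(np.complex128))  # True

# For bytes, str, and object, compare the associated scalar type.
print(np.dtype(bytes).type is np.bytes_)    # True
print(np.dtype(str).type is np.str_)        # True
print(np.dtype(object).type is np.object_)  # True
```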

### More on Data Types

This section of the tutorial was designed to give you just enough knowledge to be productive with NumPy’s data types, understand a little of how things work under the hood, and recognize some common pitfalls. It’s certainly not an exhaustive guide. The [NumPy documentation on `ndarrays`](https://numpy.org/doc/stable/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray) has tons more resources.

There’s also a lot more information on [`dtype` objects](https://numpy.org/doc/stable/reference/arrays.dtypes.html), including the different ways to construct, customize, and optimize them and how to make them more robust for all your data-handling needs. If you run into trouble and your data isn’t loading into arrays exactly how you expected, then that’s a good place to start.

Lastly, the NumPy `recarray` is a powerful object in its own right, and you’ve really only scratched the surface of the capabilities of structured datasets. It’s definitely worth reading through the [`recarray` documentation](https://numpy.org/doc/stable/reference/generated/numpy.recarray.html) as well as the documentation for the other specialized array [subclasses](https://numpy.org/doc/stable/reference/arrays.classes.html) that NumPy provides.