## Data types

### What is a data type

The data type (dtype) of an array tells `numpy` how to store and interpret the elements of this array. The most common data types are:

- `int_`
- `float64` (the older alias `float_` was removed in NumPy 2.0)
- `str_`
- `bool_`

These types differ slightly from Python's built-in types. The data type of an array is available through its `dtype` attribute, whereas its Python type is given by `type(...)`. A numpy array is always of type `numpy.ndarray`, but its dtype depends on its content:

In [2]:
import numpy as np
arr = np.array([1, 2, 3])
print(type(arr))
print(arr.dtype)

<class 'numpy.ndarray'>
int64
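
As a small illustration of the common types listed above, the sketch below builds one array per kind of data and prints its dtype. The variable names are only illustrative, and the exact integer width (`int64` on most platforms) depends on the system, so the code inspects `dtype.kind` rather than the full name.

```python
import numpy as np

# One small array per common kind of data
samples = {
    "integers": np.array([1, 2, 3]),
    "floats": np.array([1.5, 2.5]),
    "strings": np.array(["a", "bc"]),
    "booleans": np.array([True, False]),
}

for label, values in samples.items():
    # dtype.kind is a one-letter code: 'i' int, 'f' float, 'U' str, 'b' bool
    print(label, values.dtype, values.dtype.kind)
```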



### Use data types

#### Automatically assigned data type

In most cases, numpy chooses a dtype automatically: it picks one that can represent every element of the array.

In [3]:
np.array([1, 2, 3]).dtype

dtype('int64')

If one of the values is written with a decimal point, Python treats it as a float (even if its decimal part is 0), so the whole array gets a float dtype:

In [4]:
np.array([1, 2, 3.]).dtype

dtype('float64')
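
The same promotion applies to other combinations of types. The following sketch (variable names are illustrative) mixes booleans, integers, floats and complex numbers, and checks only the kind of the resulting dtype, since the exact width can depend on the platform.

```python
import numpy as np

# Booleans and integers together are promoted to an integer dtype
mixed_bool_int = np.array([True, 2, 3])

# Integers and floats together are promoted to a float dtype
mixed_int_float = np.array([1, 2.5])

# Floats and complex numbers together are promoted to a complex dtype
mixed_float_complex = np.array([1.5, 2 + 1j])

print(mixed_bool_int.dtype.kind)       # 'i'
print(mixed_int_float.dtype.kind)      # 'f'
print(mixed_float_complex.dtype.kind)  # 'c'
```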

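If the array contains non-numeric values such as strings, every element is converted to a string. The dtype is then non-numeric (here `<U21`, a Unicode string of at most 21 characters) and mathematical operations such as `sum` fail: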

In [5]:
arr = np.array(['azerty', 45, 98])
print(arr.dtype)
arr.sum()

<U21


UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> None
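
When the strings actually hold numbers, one possible way out is to convert them before doing any arithmetic. This is a minimal sketch with made-up values; the conversion raises a `ValueError` if a string such as `'azerty'` cannot be parsed.

```python
import numpy as np

# Numbers stored as text: the dtype is a string type, so sum() would fail
text_values = np.array(["1.5", "45", "98"])
print(text_values.dtype.kind)   # 'U' (Unicode string)

# astype(float) parses each string into a number
numeric_values = text_values.astype(float)
print(numeric_values.sum())     # 144.5
```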

#### Change data type

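The data type can be changed with `astype`, which returns a converted copy of the array. The target type can be given as:

- a numpy dtype, either as an object (e.g. `np.float64`) or as a string (e.g. `'complex'`)
- a Python type for which an equivalent dtype exists in `numpy` (e.g. `int`)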


In [12]:
arr = np.array([1, 2, 3])
print(arr.dtype)
arr = arr.astype(np.float64)  # numpy dtype, specified as an object (np.float_ was removed in NumPy 2.0)
print(arr.dtype)

int64
float64


In [11]:
arr = arr.astype(int)         # python type
print(arr.dtype)

int64


In [13]:
arr = arr.astype('complex')   # numpy dtype, specified as a string
print(arr.dtype)

complex128
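
The different ways of naming a dtype are interchangeable. The sketch below, with illustrative variable names, compares three spellings of the same dtype and shows that `astype` leaves the original array untouched.

```python
import numpy as np

# Object, string and Python type all describe the same dtype
same_dtype = np.dtype(np.float64) == np.dtype("float64") == np.dtype(float)
print(same_dtype)            # True

original = np.array([1, 2, 3])
converted = original.astype("float64")

# astype returns a new array; the original keeps its integer dtype
print(original.dtype.kind)   # 'i'
print(converted.dtype)       # float64
```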


Converting to another _dtype_ can change the data. For instance, converting floats to integers drops the fractional part:

In [6]:
np.array([1, 2, 3.65]).astype(int)

array([1, 2, 3])
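
If rounding is wanted rather than truncation, the values can be rounded before the conversion. This is a minimal sketch using `np.rint`; the sample values are made up for illustration.

```python
import numpy as np

values = np.array([1.2, 2.5, 3.65, -1.7])

# astype(int) truncates toward zero
truncated = values.astype(int)
print(truncated)    # [ 1  2  3 -1]

# np.rint rounds to the nearest integer (halves go to the nearest even value)
rounded = np.rint(values).astype(int)
print(rounded)      # [ 1  2  4 -2]
```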

_Casting_ to a boolean dtype is also possible: zero becomes `False` and any non-zero value becomes `True`:

In [18]:
np.array([1, 2, 0]).astype(bool)

array([ True,  True, False])
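
The same rule holds for floats, and the conversion also works in the other direction. A short sketch with illustrative variable names:

```python
import numpy as np

# Any non-zero value is True, including negative numbers and nan
as_bool = np.array([0.0, -1.5, np.nan]).astype(bool)
print(as_bool)   # [False  True  True]

# Going the other way, True becomes 1 and False becomes 0
as_int = np.array([True, False, True]).astype(int)
print(as_int)    # [1 0 1]
```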

## Working with `nan`

### Definition

`nan` means 'not a number'. A `nan` value (`np.nan`) is used to describe:

- a missing or unknown value
- the result of an impossible mathematical operation

Never test for `np.nan` with equality (`==`): `nan` is not equal to anything, including itself, so such a test is always `False`. Use the dedicated functions of `numpy` instead.
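
To see why equality tests are unreliable here, the sketch below compares an equality test with `np.isnan` on a small made-up array.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# Equality never matches nan, not even against nan itself
equality_mask = values == np.nan
print(equality_mask)   # [False False False]

# np.isnan is the reliable test
isnan_mask = np.isnan(values)
print(isnan_mask)      # [False  True False]
```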

### `nan` propagation

Like `np.inf` (infinity), `nan` values propagate through mathematical operations:

In [7]:
arr = np.arange(16).reshape((4,4)).astype(float)
arr[1, 2] = np.nan
arr

array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5., nan,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.]])

In [8]:
arr.sum(axis=1)

array([ 6., nan, 38., 54.])
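
When propagation is not wanted, `numpy` also provides nan-aware reductions such as `np.nansum` and `np.nanmean`, which skip `nan` values. The sketch rebuilds the same array so that it runs on its own.

```python
import numpy as np

arr = np.arange(16).reshape((4, 4)).astype(float)
arr[1, 2] = np.nan

# nan-aware reductions skip nan values instead of propagating them
row_sums = np.nansum(arr, axis=1)
print(row_sums)     # [ 6. 16. 38. 54.]

# The mean of the second row is computed over 4, 5 and 7 only
row_means = np.nanmean(arr, axis=1)
print(row_means[1])
```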

`numpy.isnan()` returns a boolean array indicating which values are `nan`. This mask can then be used to replace them, either in place with boolean indexing (as in the cell below) or with `numpy.where`:

In [9]:
cond = np.isnan(arr)
arr[cond] = 0
arr.sum(axis=1)

array([ 6., 16., 38., 54.])
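
The replacement can also be done without modifying the original array. This is a minimal sketch using `np.where` and `np.nan_to_num`; the names `filled` and `also_filled` are only illustrative.

```python
import numpy as np

arr = np.arange(16).reshape((4, 4)).astype(float)
arr[1, 2] = np.nan

# np.where(condition, x, y) picks x where condition is True, else y;
# it returns a new array and leaves arr untouched
filled = np.where(np.isnan(arr), 0, arr)

# np.nan_to_num replaces nan in one call
# (by default it also replaces infinities with large finite numbers)
also_filled = np.nan_to_num(arr, nan=0.0)

print(filled.sum(axis=1))    # [ 6. 16. 38. 54.]
print(np.isnan(arr).sum())   # 1, the original still holds its nan
```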