# 1.3.3.1. More data types
## Casting
"Bigger" type wins in mixed-type operations:

In [2]:
(import [numpy :as np])
(import [helpers [*]])
(require [helpers [*]])

[None, None, None]

In [3]:
(+ (np.array [1 2 3]) 1.5)

[array([2.5, 3.5, 4.5])]
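Since Hy compiles to Python, the same promotion rule can be checked in plain Python. This is a minimal sketch. `np.promote_types` and `np.result_type` are standard NumPy functions that report which dtype a mixed operation would produce:

```python
import numpy as np

ints = np.array([1, 2, 3])        # default integer dtype
mixed = ints + 1.5                # int array + Python float -> float array
print(mixed, mixed.dtype)

# Ask NumPy directly which dtype "wins"
print(np.promote_types(np.int8, np.int16))   # int16: the larger integer type
print(np.result_type(np.int32, np.float32))  # float64: float32 cannot hold every int32 exactly
```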

Assignment never changes the type!

In [4]:
(setv a (np.array [1 2 3]))
a.dtype
(setv #s(a 0) 8.9) ; <-- float is truncated to integer
a

[None, dtype('int64'), None, array([8, 2, 3])]
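A plain-Python sketch of the same behaviour. It also shows that the cast truncates toward zero, so a negative float loses its fractional part rather than being rounded down:

```python
import numpy as np

a = np.array([1, 2, 3])
a[0] = 8.9     # the float is cast to the array's integer dtype
a[1] = -2.7    # truncation goes toward zero: -2.7 becomes -2, not -3
print(a, a.dtype)   # the dtype is still an integer type
```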

Forced casts:

In [5]:
(setv a (np.array [1.7 1.2 1.6]))
(setv b (a.astype int)) ; <-- cast to integer via truncation
b

[None, None, array([1, 1, 1])]
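Because `astype(int)` truncates toward zero, it differs from flooring when values are negative. Here is a short Python sketch contrasting it with the standard NumPy ufuncs `np.floor` and `np.ceil`:

```python
import numpy as np

a = np.array([1.7, -1.7])
truncated = a.astype(int)            # toward zero:      [1, -1]
floored = np.floor(a).astype(int)    # toward -infinity: [1, -2]
ceiled = np.ceil(a).astype(int)      # toward +infinity: [2, -1]
print(truncated, floored, ceiled)
```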

Rounding:

In [6]:
(setv a (np.array [1.2 1.5 1.6 2.5 3.5 4.5]))
(setv b (np.around a))
b                      ; still floating-point

[None, None, array([1., 2., 2., 2., 4., 4.])]

In [10]:
(setv c (.astype (np.around a) int))
c

[None, array([1, 2, 2, 2, 4, 4])]
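The output above shows that `np.around` rounds exact halves to the nearest even value: 1.5 and 2.5 both give 2, while 3.5 and 4.5 both give 4. Python's built-in `round` follows the same "round half to even" rule. A quick Python check, which also uses the `decimals` parameter of `np.around`:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
numpy_rounded = np.around(halves)                     # ties go to the even neighbour
python_rounded = [round(v) for v in halves.tolist()]  # built-in round: same rule

# decimals= rounds to a given number of decimal places; the result stays float
two_places = np.around(np.array([3.14159, 2.71828]), decimals=2)
print(numpy_rounded, python_rounded, two_places)
```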

## Different data type sizes

Integers (signed):

|Type     |Definition                              |
| ------- |:--------------------------------------:|
|**int8** |8 bits                                  |
|**int16**|16 bits                                 |
|**int32**|32 bits (same as `int` on 32-bit platforms)|
|**int64**|64 bits (same as `int` on 64-bit platforms)|

In [11]:
(. (np.array [1] :dtype int) dtype)

[dtype('int64')]

In [12]:
(. (np.iinfo np.int32) max)
(- (** 2 31) 1)

[2147483647, 2147483647]
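An n-bit signed integer covers the range from -2**(n-1) to 2**(n-1) - 1. A Python sketch that reads these limits for every signed type in the table via `np.iinfo`:

```python
import numpy as np

ranges = {}
for t in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(t)
    ranges[info.bits] = (info.min, info.max)   # Python ints, exact
    print(t.__name__, info.min, info.max)
```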

Unsigned integers:

|Type      |Definition|
| -------- |:--------:|
|**uint8** |8 bits    |
|**uint16**|16 bits   |
|**uint32**|32 bits   |
|**uint64**|64 bits   |

In [13]:
(. (np.iinfo np.uint32) max)
(- (** 2 32) 1)

[4294967295, 4294967295]
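Unsigned types start at 0. Like all NumPy fixed-width integers, array arithmetic that exceeds the range wraps around modulo 2**bits, and array operations raise no error when this happens. A small Python sketch with `uint8`:

```python
import numpy as np

print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)   # 0 and 255

counts = np.array([254, 255], dtype=np.uint8)
wrapped = counts + np.uint8(1)   # 255 + 1 wraps around to 0
print(wrapped, wrapped.dtype)
```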

Floating-point numbers:

|Type        |Description                                             |
| ---------- |:------------------------------------------------------:|
|**float16** |16 bits                                                 |
|**float32** |32 bits                                                 |
|**float64** |64 bits (same as float)                                 |
|**float96** |96 bits, platform-dependent (same as **np.longdouble**) |
|**float128**|128 bits, platform-dependent (same as **np.longdouble**)|

In [14]:
(. (np.finfo np.float32) eps)

[1.1920929e-07]

In [15]:
(. (np.finfo np.float64) eps)

[2.220446049250313e-16]

In [20]:
(= (+ (np.float32 1e-8) (np.float32 1)) 1)

[True]

In [21]:
(= (+ (np.float64 1e-8) (np.float64 1)) 1)

[False]
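`eps` is the gap between 1.0 and the next larger representable number of that type. Adding something much smaller than `eps` to 1 therefore leaves 1 unchanged. This explains why `1e-8` vanishes in `float32` (eps ≈ 1.2e-07) but survives in `float64` (eps ≈ 2.2e-16). A Python sketch that measures the gap with `np.nextafter`:

```python
import numpy as np

for t in (np.float32, np.float64):
    one = t(1)
    gap = np.nextafter(one, t(2)) - one   # distance from 1 to the next value above it
    print(t.__name__, gap, np.finfo(t).eps)

lost_in_float32 = bool(np.float32(1) + np.float32(1e-8) == np.float32(1))
kept_in_float64 = bool(np.float64(1) + np.float64(1e-8) != np.float64(1))
print(lost_in_float32, kept_in_float64)
```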

### Long integers

Python 2 had a separate `long` type for arbitrary-precision integers, written with an `L` immediately after the number (no space), e.g. `10L`. In Python 3, all integers (`int`) are arbitrary-precision and, thus, cannot overflow. NumPy's fixed-size integer types, however, have a finite range:

In [22]:
(. (np.iinfo np.int64) max)
(- (** 2 63) 1)

[9223372036854775807, 9223372036854775807]
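The contrast matters in practice. A Python `int` simply grows past this limit, but an `int64` array wraps around to the most negative value without complaint. A minimal Python sketch:

```python
import numpy as np

big = np.iinfo(np.int64).max          # a Python int equal to 2**63 - 1
python_sum = big + 1                  # Python int: grows as needed
arr = np.array([big], dtype=np.int64)
numpy_sum = arr + 1                   # int64 array: wraps to -2**63
print(python_sum, numpy_sum)
```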

Complex floating-point numbers:

|Type          |Description                           |
| ------------ |:------------------------------------:|
|**complex64** |two 32-bit floats                     |
|**complex128**|two 64-bit floats                     |
|**complex192**|two 96-bit floats, platform-dependent |
|**complex256**|two 128-bit floats, platform-dependent|
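Each complex value stores a real and an imaginary part of the listed float type. A Python sketch checking the element sizes and the component dtypes:

```python
import numpy as np

z64 = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)
z128 = z64.astype(np.complex128)
print(z64.itemsize, z128.itemsize)        # bytes per element: 8 and 16
print(z64.real.dtype, z128.real.dtype)    # float32 and float64 components
print(np.abs(z128))                       # magnitudes sqrt(5) and 5
```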

### Smaller data types

If you don't know you need special data types, then you probably don't.

Comparison of using `float32` instead of `float64`:

- Half the size in memory and on disk
- Half the memory bandwidth required (may be a bit faster in some operations)

In [37]:
(setv a (np.zeros [ 10000000 ] :dtype np.float64))
(setv b (np.zeros [ 10000000 ] :dtype np.float32))
(do (import [time [time]])
    (setv time0 (time))
    (* a a)
    (setv time1 (time))
    (- time1 time0))
(do (import [time [time]])
    (setv time2 (time))
    (* b b)
    (setv time3 (time))
    (- time3 time2))


[None, None, 0.012767314910888672, 0.009488582611083984]

- **But**: bigger rounding errors, sometimes in surprising places (i.e., don't use them unless you really need them)
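Both sides of this trade-off can be measured without timing anything. In the Python sketch below, `nbytes` confirms the halved memory footprint. `fractions.Fraction` is used to compute the exact error of storing 0.1 in each type:

```python
from fractions import Fraction

import numpy as np

n = 10_000_000
a64 = np.zeros(n, dtype=np.float64)
a32 = np.zeros(n, dtype=np.float32)
print(a64.nbytes, a32.nbytes)   # 80000000 vs 40000000 bytes

# 0.1 is not exactly representable in binary; compare the exact error of each type
exact = Fraction(1, 10)
err32 = abs(Fraction(float(np.float32(0.1))) - exact)
err64 = abs(Fraction(float(np.float64(0.1))) - exact)
print(float(err32), float(err64))
```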

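## 1.3.3.2. Structured data types

A structured data type groups several named fields, each with its own type, into a single record. For example, sensor samples with the following fields:

|Field        |Type                          |
| ----------- |:----------------------------:|
|`sensor_code`|4-character byte string (`S4`)|
|`position`   |float                         |
|`value`      |float                         |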

In [38]:
(setv samples (np.zeros [6] :dtype [(, "sensor_code" "S4")
                                    (, "position" float)
                                    (, "value" float)]))
samples

[None, array([(b'', 0., 0.), (b'', 0., 0.), (b'', 0., 0.), (b'', 0., 0.),
       (b'', 0., 0.), (b'', 0., 0.)],
      dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])]
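The dtype passed to `np.zeros` can also be built once as a standalone `np.dtype` object and reused. The Python sketch below inspects the record layout. The 4-byte string and the two 8-byte floats are packed back to back, because NumPy adds no padding unless `align=True` is requested:

```python
import numpy as np

sample_dtype = np.dtype([("sensor_code", "S4"),
                         ("position", float),
                         ("value", float)])
print(sample_dtype.names)               # field names, in order
print(sample_dtype.itemsize)            # 4 + 8 + 8 = 20 bytes per record
print(sample_dtype.fields["position"])  # (field dtype, byte offset in the record)

samples = np.zeros(6, dtype=sample_dtype)
```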

In [39]:
samples.ndim

[1]

In [40]:
samples.shape

[(6,)]

In [41]:
samples.dtype.names

[('sensor_code', 'position', 'value')]

In [43]:
(setv #s(samples :) [(, "ALFA" 1 0.37)
                     (, "BETA" 1 0.11)
                     (, "TAU" 1 0.13)
                     (, "ALFA" 1.5 0.37) 
                     (, "ALFA" 3 0.11) 
                     (, "TAU" 1.2 0.13)])
samples

[None, array([(b'ALFA', 1. , 0.37), (b'BETA', 1. , 0.11), (b'TAU', 1. , 0.13),
       (b'ALFA', 1.5, 0.37), (b'ALFA', 3. , 0.11), (b'TAU', 1.2, 0.13)],
      dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])]

Field access works by indexing with field names:

In [46]:
(get samples "sensor_code")

[array([b'ALFA', b'BETA', b'TAU', b'ALFA', b'ALFA', b'TAU'], dtype='|S4')]

In [48]:
(get samples "value")

[array([0.37, 0.11, 0.13, 0.37, 0.11, 0.13])]

In [49]:
#s(samples 0)

[(b'ALFA', 1., 0.37)]

In [50]:
(get #s(samples 0) "sensor_code")

[b'ALFA']

In [51]:
(setv (get #s(samples 0) "sensor_code") "TAU")
#s(samples 0)

[None, (b'TAU', 1., 0.37)]
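The `"TAU"` assignment above works because indexing a field, or a single record, gives a view into the original array, so writes go straight through. Here is a Python sketch using a smaller, illustrative three-record array:

```python
import numpy as np

samples = np.zeros(3, dtype=[("sensor_code", "S4"),
                             ("position", float),
                             ("value", float)])
values = samples["value"]            # a view on the "value" field, not a copy
values[:] = [0.37, 0.11, 0.13]       # fills the field inside `samples`
samples[1]["sensor_code"] = "BETA"   # writing through one record updates the array too
print(samples)
```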

Multiple fields at once:

In [52]:
(get samples ["position" "value"])

[array([(1. , 0.37), (1. , 0.11), (1. , 0.13), (1.5, 0.37), (3. , 0.11),
       (1.2, 0.13)], dtype=[('position', '<f8'), ('value', '<f8')])]

Fancy indexing works, as usual:

In [64]:
#s(samples (= (get samples "sensor_code") b"ALFA"))

[array([(b'ALFA', 1.5, 0.37), (b'ALFA', 3. , 0.11)],
      dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])]
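Conditions on different fields can be combined with `&`, with each condition in parentheses. Whole records can also be sorted by a field using the `order` argument of `np.sort`. The Python sketch below rebuilds the same six records as above, after the `"TAU"` edit:

```python
import numpy as np

samples = np.array([(b"TAU", 1.0, 0.37), (b"BETA", 1.0, 0.11),
                    (b"TAU", 1.0, 0.13), (b"ALFA", 1.5, 0.37),
                    (b"ALFA", 3.0, 0.11), (b"TAU", 1.2, 0.13)],
                   dtype=[("sensor_code", "S4"), ("position", float), ("value", float)])

# Both conditions must hold: ALFA sensor and value above 0.2
selected = samples[(samples["sensor_code"] == b"ALFA") & (samples["value"] > 0.2)]
print(selected)

# Sort complete records by the "position" field
by_position = np.sort(samples, order="position")
print(by_position["position"])
```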

**Note:** There are several other syntaxes for constructing structured arrays; see the [structured arrays guide](http://docs.scipy.org/doc/numpy/user/basics.rec.html) and the [data type reference](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#specifying-and-constructing-data-types).

## 1.3.3.3. `maskedarray`: dealing with (propagation of) missing data

- For floats, one could use `NaN`s, but masks work for all types:

In [66]:
(setv x (np.ma.array [1 2 3 4] :mask [0 1 0 1]))
x

[None, masked_array(data=[1, --, 3, --],
             mask=[False,  True, False,  True],
       fill_value=999999)]

In [67]:
(setv y (np.ma.array [1 2 3 4] :mask [0 1 1 1]))
y

[None, masked_array(data=[1, --, --, --],
             mask=[False,  True,  True,  True],
       fill_value=999999)]

In [69]:
(+ x y)

[masked_array(data=[2, --, --, --],
             mask=[False,  True,  True,  True],
       fill_value=999999)]
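In the sum above, an element is masked whenever it is masked in either operand. Reductions and conversions on masked arrays also respect the mask. A Python sketch using standard `numpy.ma` methods:

```python
import numpy as np

x = np.ma.array([1, 2, 3, 4], mask=[0, 1, 0, 1])
y = np.ma.array([1, 2, 3, 4], mask=[0, 1, 1, 1])

total = x + y
print(total.mask)       # union of the two masks
print(x.mean())         # 2.0: mean of the unmasked values 1 and 3
print(x.count())        # 2 valid entries
print(x.compressed())   # plain ndarray with only the valid values
print(x.filled(0))      # plain ndarray with masked entries replaced by 0
```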

- Masking versions of common functions:

In [70]:
(np.ma.sqrt [1 -1 2 -2])

[masked_array(data=[1.0, --, 1.4142135623730951, --],
             mask=[False,  True, False,  True],
       fill_value=1e+20)]
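For float data that already marks missing values with NaN, `np.ma.masked_invalid` converts those entries into masked ones. Masking functions such as `np.ma.sqrt` then add their own masks for invalid inputs. A Python sketch, shown alongside `np.nanmean` as the NaN-only alternative:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, -1.0])
masked = np.ma.masked_invalid(data)   # NaN (and inf) entries become masked
print(masked.mean())                  # mean of 1, 3 and -1

roots = np.ma.sqrt(masked)            # also masks the negative value
print(roots)

print(np.nanmean(data))               # NaN-aware function on the plain float array
```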

**Note:** There are other useful [array siblings](http://scipy-lectures.org/advanced/advanced_numpy/index.html#array-siblings)

While it is off-topic in a chapter on NumPy, let's take a moment to recall good coding practices that really do pay off in the long run:

### Good practices

- Explicit variable names (no need for a comment to explain what is in the variable)
- Style: spaces after commas, around `=`, etc.
    
    Rules for writing "beautiful" code (and, more importantly, for using the same conventions as everybody else!) are given in the [Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008) and the [Docstring Conventions](https://www.python.org/dev/peps/pep-0257) page (for writing help strings).

- Except in some rare cases, variable names and comments should be in English.