# NumPy

* Jake VanderPlas. 2016. *Python Data Science Handbook: Essential Tools for Working with Data*. O'Reilly Media, Inc.
* Chapter 2 - Introduction to NumPy
* https://github.com/jakevdp/PythonDataScienceHandbook

In [None]:
import numpy as np
np.__version__

In [None]:
# type TAB to get the HUGE numpy namespace
#np.

## Data Types

NumPy provides a typed, fixed-size array implementation for numerical data, which makes data-driven computation faster and more memory-efficient than the same work done with standard Python built-in lists.


### Python Integers vs C Integers

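* Python `int`s are *complex* objects (implemented as C structures in CPython)
   * Python is dynamically typed, so every value carries its own type information
   * Arbitrary-precision integer arithmetic, limited only by available memory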
```c
struct _longobject {
    long ob_refcnt;         /* reference count */
    PyTypeObject *ob_type;  /* type of the variable */
    size_t ob_size;         /* size of the following data members */
    long ob_digit[1];       /* integer value encoded into a long array */
};
```
* C integers (`char`, `short`, `int`, `long`, `long long`, ...) are essentially labels for a position in memory whose bytes encode an integer value.
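
The difference in overhead can be observed directly. The following is a minimal sketch, assuming CPython (where `sys.getsizeof` is available); the variable names are illustrative only, and the exact byte counts of Python `int`s vary between Python versions and platforms.

```python
import sys
import numpy as np

small = 1
large = 10 ** 100   # far beyond the range of a 64-bit C integer

# A Python int is a full object: it carries a header, and it grows with the value.
size_small = sys.getsizeof(small)
size_large = sys.getsizeof(large)

# An int64 element inside a NumPy array always occupies exactly 8 bytes.
item_bytes = np.dtype(np.int64).itemsize

print(size_small, size_large, item_bytes)
```

The larger Python integer needs more memory than the smaller one, while the fixed-width NumPy type cannot represent `10 ** 100` at all: that is the trade-off between flexibility and compactness.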

### Python Lists vs NumPy Arrays

* Python `list`s are *complex* objects (much more so than `int`s)
   * `list`s are heterogeneous
   * Objects of different types have different sizes
   * A `list` holds a pointer to a block of <u>references</u>, each of which points to a separate Python object
* *Standard* NumPy arrays are homogeneous.
   * They contain a single pointer to one contiguous block of data.

<center><img src="img/array_vs_list.png" alt="NumPy array vs python list" style="width: 100%;"/></center>
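
The contrast between the two layouts can be sketched as follows. This is only an illustration with made-up values; it shows that a list accepts mixed element types, while an array has one `dtype` and one contiguous buffer whose size is the element count times the element size.

```python
import numpy as np

# A list may hold objects of different types, each stored as a separate object.
mixed = [1, "two", 3.0, True]
element_types = [type(item).__name__ for item in mixed]

# An array has one dtype for every element, stored in one contiguous buffer.
arr = np.array([1, 4, 2, 5, 3], dtype=np.int64)
is_contiguous = arr.flags['C_CONTIGUOUS']
buffer_bytes = arr.nbytes          # number of elements * bytes per element

print(element_types)
print(arr.dtype, is_contiguous, buffer_bytes)
```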

## NumPy Arrays

### Creating Arrays from Python Lists

* `np.array(some_list)` &rarr; create a (homogeneous) array
* `np.array(some_list, dtype=<data type>)` &rarr; create an array of a given type

<br>

In [None]:
np.array([1, 4, 2, 5, 3])

<br>

In [None]:
np.array([1, 4, 2, 5, 3], dtype='float32')

<br>
If types do not match, NumPy will upcast if possible:

In [None]:
np.array([1, 4, 2, 5.9, 3])

<br>
Nested lists result in multi-dimensional arrays:

In [None]:
np.array([[ 0,  1,  2,  3], [10, 11, 12, 13], [20, 21, 22, 23]])
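
To confirm what these constructors produce, the resulting `dtype` and `shape` can be inspected. A small sketch using the same inputs as the cells above (the variable names are illustrative; the default integer type depends on the platform, so it is only checked to be some integer type):

```python
import numpy as np

ints = np.array([1, 4, 2, 5, 3])
mixed = np.array([1, 4, 2, 5.9, 3])      # one float forces an upcast of all values
nested = np.array([[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]])

print(ints.dtype)     # platform default integer type
print(mixed.dtype)    # float64
print(nested.shape)   # (3, 4): three rows of four elements
```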

### NumPy Standard Data Types

When constructing an array, the data type (`dtype`) argument can be specified using either:
 * a string &rarr; `dtype='float32'`
 * a NumPy type object &rarr; `dtype=np.float32`

<br>

In [None]:
np.array([1, 4, 2, 5, 3], dtype='float32')

<br>

In [None]:
np.array([1, 4, 2, 5, 3], dtype=np.float32)

|  Data type | Description                                                                   |
|:----------:|-------------------------------------------------------------------------------|
| bool_      | Boolean (True or False) stored as a byte                                      |
| int_       | Default integer type (same as C long; normally either int64 or int32)         |
| intc       | Identical to C int (normally int32 or int64)                                  |
| intp       | Integer used for indexing (same as C ssize_t; normally either int32 or int64) |
| int8       | Byte (-128 to 127)                                                            |
| int16      | Integer (-32768 to 32767)                                                     |
| int32      | Integer (-2147483648 to 2147483647)                                           |
| int64      | Integer (-9223372036854775808 to 9223372036854775807)                         |
| uint8      | Unsigned integer (0 to 255)                                                   |
| uint16     | Unsigned integer (0 to 65535)                                                 |
| uint32     | Unsigned integer (0 to 4294967295)                                            |
| uint64     | Unsigned integer (0 to 18446744073709551615)                                  |
| float_     | Shorthand for float64 (alias removed in NumPy 2.0; use float64)               |
| float16    | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa             |
| float32    | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa           |
| float64    | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa          |
| complex_   | Shorthand for complex128 (alias removed in NumPy 2.0; use complex128)         |
| complex64  | Complex number, represented by two 32-bit floats                              |
| complex128 | Complex number, represented by two 64-bit floats                              |
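
The limits in the table do not need to be memorized: `np.iinfo` and `np.finfo` report them at runtime. A short sketch (the dictionary names `int_ranges` and `float_layout` are illustrative only):

```python
import numpy as np

# Integer limits for a few of the fixed-width types
int_ranges = {}
for name in ("int8", "int16", "int32", "int64", "uint8", "uint16"):
    info = np.iinfo(np.dtype(name))
    int_ranges[name] = (int(info.min), int(info.max))
    print(f"{name:>6}: {info.min} to {info.max}")

# Floating-point layout: total bits, exponent bits, mantissa bits
float_layout = {}
for name in ("float16", "float32", "float64"):
    info = np.finfo(np.dtype(name))
    float_layout[name] = (info.bits, info.nexp, info.nmant)
    print(f"{name:>7}: {info.bits} bits, {info.nexp} exponent, {info.nmant} mantissa")
```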

### Creating Arrays from Scratch

Default values of `dtype` (in most cases):
 * Integers &rarr; `int64`
 * Reals (floating point) &rarr; `float64`
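
Since the integer default depends on the platform and NumPy version, it may be worth checking on the machine at hand. A minimal sketch (variable names are illustrative):

```python
import numpy as np

float_default = np.zeros(3).dtype     # creation routines default to floating point
int_default = np.arange(3).dtype      # integer arguments give the default integer type

print(float_default)   # float64
print(int_default)     # int64 on most 64-bit systems
```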

<br>

* `np.zeros(10)` &rarr; a length-10 array filled with zeros 

<br>

In [None]:
np.zeros(10)

<br>

In [None]:
np.zeros(10, dtype='int64')

<br>

In [None]:
np.zeros(10, dtype='float32')

<br>

In [None]:
np.zeros(10, dtype='int32')

<br>

* `np.eye(3)` &rarr; a 3x3 identity matrix

<br>

In [None]:
np.eye(3)

<br>

* `np.ones((3,5))` &rarr; a 3x5 array filled with ones

<br>

In [None]:
np.ones((3,5))

<br>

* `np.full((3,5), 1.8)` &rarr; a 3x5 array filled with `1.8`

<br>

In [None]:
np.full((3,5), 1.8)

<br>

* `np.empty((3,5))` &rarr; an <u>*uninitialized*</u> 3x5 array

<br>

In [None]:
np.empty((3,5))
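
Because `np.empty` skips initialization, the values shown above are whatever happened to be in memory. It is typically used when every element will be overwritten anyway, as in this illustrative sketch (the fill rule is arbitrary):

```python
import numpy as np

rows, cols = 3, 5
table = np.empty((rows, cols))      # memory is allocated but not cleared

# Overwrite every element before reading anything back
for i in range(rows):
    table[i, :] = i * 10            # row i gets the constant value i * 10

print(table)
```

If some elements might be left unassigned, `np.zeros` or `np.full` is the safer choice.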

<br>

* `np.random.random((3,5))` &rarr; a 3x5 array of uniformly distributed random values in the half-open interval $[0,1)$

<br>

In [None]:
np.random.random((3,5))

<br>

* `np.random.seed(int)` &rarr; sets a seed for random number generation (**provides reproducibility**)


<br>

In [None]:
np.random.seed(43)
np.random.random((3,5))
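
Reproducibility here means that re-seeding with the same value replays the same sequence. The sketch below checks this, and also shows the newer `np.random.default_rng` generator interface as an alternative that avoids global state (the variable names are illustrative):

```python
import numpy as np

# Legacy global-state interface: same seed, same numbers
np.random.seed(43)
first = np.random.random((3, 5))
np.random.seed(43)
second = np.random.random((3, 5))
same_legacy = np.array_equal(first, second)

# Generator interface: each generator object carries its own seed
rng_a = np.random.default_rng(43)
rng_b = np.random.default_rng(43)
same_generator = np.array_equal(rng_a.random((3, 5)), rng_b.random((3, 5)))

print(same_legacy, same_generator)
```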

<br>

* `np.random.normal(10, 2, (3,5))` &rarr; a 3x5 array of normally distributed random values with $\mu = 10$ and $\sigma = 2$

<br>

In [None]:
np.random.normal(10, 2, (3, 5))

<br>

* `np.random.randint(-7, 3, (3, 5))` &rarr; a 3x5 array of random integers in the half-open interval $[-7,3)$

<br>

In [None]:
np.random.randint(-7, 3, (3, 5))
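
The parameters of the two distributions can be sanity-checked on a larger sample. A rough sketch, with sample sizes and seed chosen arbitrarily for illustration:

```python
import numpy as np

np.random.seed(0)

# Normal distribution: sample statistics should be close to mu = 10, sigma = 2
normal_draws = np.random.normal(10, 2, 100_000)
sample_mean = normal_draws.mean()
sample_std = normal_draws.std()

# Random integers: the upper bound 3 is excluded
int_draws = np.random.randint(-7, 3, 10_000)
lowest, highest = int_draws.min(), int_draws.max()

print(sample_mean, sample_std)
print(lowest, highest)
```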

<br>

* `np.arange(4, 20, 2)` &rarr; an array of evenly spaced values in the half-open interval $[4,20)$ with step $2$

<br>

In [None]:
np.arange(4, 20, 2)

<br>

* `np.linspace(1, 7, 5)` &rarr; an array of five values evenly spaced over the closed interval $[1,7]$

<br>

In [None]:
np.linspace(1, 7, 5)
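
The two functions differ in what is specified: `np.arange` takes the step and excludes the stop value, while `np.linspace` takes the number of points and includes both endpoints. A small sketch comparing them (variable names are illustrative):

```python
import numpy as np

stepped = np.arange(4, 20, 2)     # step given; the stop value 20 is excluded
spaced = np.linspace(1, 7, 5)     # count given; both 1 and 7 are included

# With n points over [a, b], the spacing is (b - a) / (n - 1)
spacing = (7 - 1) / (5 - 1)

print(stepped)
print(spaced)
print(spacing)
```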

### More on NumPy Arrays

* Array dimensions
* Array attributes
* Array indexing
* Array slicing
* Reshaping of arrays
* Array concatenation and splitting

#### Array dimensions

NumPy arrays can have any number of dimensions:

* 0-dimensional arrays &rarr; **scalars** or **rank-0 tensors**
* 1-dimensional arrays &rarr; **vectors** or **rank-1 tensors**
* 2-dimensional arrays &rarr; **matrices** or **rank-2 tensors**
* 3-dimensional arrays &rarr; **tensors** or **rank-3 tensors**
* ...
* N-dimensional arrays &rarr; **tensors** or **rank-N tensors**


In [None]:
np.random.randint(1,10,(2,2,2,3,6))
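
The number of dimensions of any array is available as `ndim`. A brief sketch with one array per case from the list above (the names are illustrative):

```python
import numpy as np

scalar = np.array(5)                                   # 0-dimensional
vector = np.array([1, 2, 3])                           # 1-dimensional
matrix = np.array([[1, 2], [3, 4]])                    # 2-dimensional
tensor = np.random.randint(1, 10, (2, 2, 2, 3, 6))     # 5-dimensional

ranks = [a.ndim for a in (scalar, vector, matrix, tensor)]
print(ranks)
```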

#### Array attributes

Given a NumPy array `x`:

* `x.ndim` &rarr; number of dimensions
* `x.shape` &rarr; a tuple of size `x.ndim` containing the size of each dimension
* `x.size` &rarr; total number of elements in the array
* `x.dtype` &rarr; data type of the array

<br>

In [None]:
x = np.random.rand(3, 4)
print(x)
print(f'{x.ndim=}  {x.shape=}  {x.size=}  {x.dtype=}')
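
These attributes are related to one another: `size` is the product of the entries of `shape`, and the memory footprint follows from `size` and the element width. A sketch of those relationships, assuming Python 3.8 or later for `math.prod`:

```python
import math
import numpy as np

x = np.random.rand(3, 4)

elements_from_shape = math.prod(x.shape)     # 3 * 4
bytes_from_size = x.size * x.itemsize        # elements * bytes per element

print(len(x.shape) == x.ndim)
print(elements_from_shape == x.size)
print(bytes_from_size == x.nbytes)
```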


## UFuncs - Universal Functions

## Aggregation Functions

## Broadcasting

## Boolean Manipulation

## Fancy Indexing