# Numerical Computing - Numpy Arrays

# Table of contents

[Executive Summary](#summary)


### **Resources**: 

- [_Python for Finance (2nd ed.)_](http://shop.oreilly.com/product/0636920117728.do): Sec. 4.Numpy Arrays, 4.Basic Vectorization
- _[Numpy Quickstart Tutorial - The Basics](https://docs.scipy.org/doc/numpy/user/quickstart.html#the-basics)_ (An Example; Array Creation; Printing Arrays; Basic Operations; Universal Functions; Indexing, Slicing and Iterating), _[Numpy Quickstart Tutorial - Shape Manipulation](https://docs.scipy.org/doc/numpy/user/quickstart.html#shape-manipulation)_ (Changing the shape of an array; Stacking together different arrays).

# Executive Summary <a name="summary"></a>

**TODO**

The following sections are organized as follows: 
- ...

# 1. Introduction: from Lists to Arrays <a name="intro"></a>

We'll keep this discussion as intuitive and informal as possible.
The concept of _array_ belongs to (at least) two knowledge domains:

- **Mathematics**: an array is a sequence of numbers of the same type (Naturals, Rationals, Reals,...). It can be:

    - a 1-dimensional vector $v$. That is, a sequence of elements indexed by one integer $i$;
    - a 2-dimensional matrix $M$. That is, a sequence of elements indexed by a pair of integers $(i,j)$;
    - an $N$-dimensional tensor $T$ (with $N>2$). That is, a sequence of elements indexed by an $N$-tuple of integers $(i_1, \cdots, i_N)$. (*)
    
(*) Feeling lost? No problem. Let's take a 3-dimensional tensor as an example. If $N=3$, the sequence of numbers can be visualized as a _cube_ of numbers indexed by the three indexes $(i,j,k)$, where $i$ and $j$ run along the _height_ (rows axis) and the _width_ (columns axis) of the cube, respectively, while $k$ runs along its depth (say, the _pages_ axis). Each page is therefore identified by the value of the index $k=0,1,2,...$, whereas numbers on the same page differ by the values of the $(i,j)$ indexes (yes, you can think of each page as a matrix of numbers). For example, the $\color{red}{\text{red 1}}$ on the front page has indexes $(i,j,k) = (3,2,0)$, whereas the three $\color{green}{\text{green 2}}$ in the bottom-right corner share the same $i=4$ and $j=4$ indexes and differ by the value of $k$: $k=0$ (front page), $k=1$ (second page) and $k=2$ (back page). See the picture below.

<img src="../images/tensor3d.png" width="500">

- **Computer science**: an array is a sequence of data of the same data-type. Having all items of the same data-type matters because it allows the same amount of memory (bits) to be allocated for each item in the array. Moreover, being a _sequence_ means that consecutive items are stored in consecutive portions of memory, which makes them easy to index and therefore quick to access.


We have already seen a great example of a sequence-like data-structure in basic Python: the `list`. In particular, lists have the following key features:

**a**: _Lists are sequences._ Therefore, consecutive elements of a list can be allocated in consecutive slots of memory.

**b**: _Lists can simultaneously store data of heterogeneous data-types._ Therefore, it's not known _a priori_ whether we can reserve the same amount of memory for each element of the list.

**c**: _Lists are mutable (e.g. think of the `.append()` method)._ Therefore, the total amount of memory to be reserved for the whole list is not known _a priori_ or, at least, is not fixed.

Points **b** and **c** make lists very flexible, but they are also bottlenecks in terms of memory usage and performance. Lists are somehow too _general_ to excel in performance too.

A more _specialized_ data-structure is needed, one that shares the sequential nature of lists but trades some flexibility for performance. That's why [NumPy](https://docs.scipy.org/doc/numpy/user/quickstart.html#quickstart-tutorial) and its data-structure `numpy.ndarray` were created.

Key facts about NumPy arrays:

**a**: arrays extend the sequential nature of lists, introducing a built-in notion of dimensions (called _axes_);

**b**: an array's length (_size_) is constrained to be immutable;

**c**: an array's items are constrained to have the same data-type.

The built-in notion of dimensions makes it easy to map the mathematical concepts of vectors, matrices and N-dimensional tensors onto 1-dim, 2-dim and N-dim NumPy arrays, respectively. Moreover, the constraints on array size (**b**) and data-type (**c**) enable several speed improvements and the _vectorization_ of code. That is, they allow fast(er) memory access and let us write functions that work on all the elements of an array "at once".
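To make the idea of working on all elements "at once" concrete, here is a minimal sketch comparing a plain list with an array. The names `prices`, `doubled_list`, `prices_arr` and `doubled_arr` and the numbers are made up for illustration only.

```python
import numpy as np

# A plain Python list: doubling every item needs an explicit loop
prices = [10.0, 20.0, 30.0]            # hypothetical example data
doubled_list = [2 * p for p in prices]

# A NumPy array: the same operation is applied to all items "at once"
prices_arr = np.array(prices)
doubled_arr = 2 * prices_arr

print(doubled_list)   # [20.0, 40.0, 60.0]
print(doubled_arr)    # [20. 40. 60.]
```

The array version contains no explicit loop: the multiplication is applied to every element by NumPy itself.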

## 1.1. `numpy.ndarray` $\mu \epsilon \tau \alpha$-information <a name="meta_info"></a>

These key facts translate into the following meta-information, which can be accessed as [attributes of any array](https://docs.scipy.org/doc/numpy/user/quickstart.html#the-basics):

Attribute | Meaning | Constraints (if any)
:---: | :---: | :---:
`.ndim` | The number of axes (dimensions) of an array: 1 for a vector, 2 for a matrix, ... | -
`.shape` | The dimensions of the array: a tuple `(n,m)` for a matrix of `n` rows and `m` columns | -
`.size` | The number of elements of the array: `n` $\times$ `m` for a matrix of shape `(n,m)` | fixed (*)
`.dtype` | The data-type of the array's elements | fixed for all elements

We'll use these attributes to explore the arrays we introduce.

(*) The `.resize()` method does let you change the size of an array, modifying it in place, whereas the `np.resize()` function returns a new, re-sized array. See section [2.4.3. Changing the size: `.resize()`](#reshape).


The function `type()` returns `numpy.ndarray` for NumPy's arrays. 

As a preliminary step, we import the `numpy` module and give it the `np` alias:

In [1]:
import numpy as np

Now we have access to all the contents of the NumPy module. Let's start!
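As a first taste, the attributes listed in the table above can be inspected as follows. This is only a sketch: the 2 x 3 matrix `mat` is a made-up example, and the exact integer width reported by `.dtype` depends on your platform.

```python
import numpy as np

# A hypothetical 2 x 3 matrix built from a list of lists
mat = np.array([[1, 2, 3],
                [4, 5, 6]])

print(type(mat))    # <class 'numpy.ndarray'>
print(mat.ndim)     # 2 -> two axes (rows and columns)
print(mat.shape)    # (2, 3)
print(mat.size)     # 6 -> 2 x 3 elements
print(mat.dtype)    # an integer dtype (int32 or int64, depending on the platform)
```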

# 2. 1-dim arrays <a name="1_dim"></a>

We start with one-dimensional arrays (i.e. vectors), that is, sequences of elements (usually numbers), all of the same data-type. As said, in NumPy dimensions are called _axes_, and 1-dim arrays have exactly one axis.

## 1.1. Array Creation <a name="creation"></a>

Arrays can be created:
- from Lists or Tuples;
- from sequences of numbers;
- using placeholder functions.

### 1.1.1. From Lists or Tuples  <a name="from_lists"></a>

We define a list of squares

In [16]:
lis = [i**2 for i in range(10)]
lis

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [25]:
type(lis)

list

and we can define a 1-dim array `vec` accordingly

In [17]:
vec = np.array(lis)
vec

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [26]:
type(vec)

numpy.ndarray

Notice that the list `lis` was defined separately just for clarity. The above definition is equivalent to

In [57]:
vec = np.array([i**2 for i in range(10)])
vec

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

Let's take a look at `vec` meta-informations:

`vec` is 1-dimensional:

In [27]:
vec.ndim

1

Although not very significant in the 1-dim case, let's look at its shape: all 10 values are arranged along its only dimension

In [28]:
vec.shape

(10,)

The number of elements:

In [29]:
vec.size

10

observe that in the 1-dim case, you can retrieve the number of elements also using the `len()` function...

In [30]:
len(vec)

10

...but we'll see that this does not hold in the N-dimensional case, so please use `vec.size` if you want to know how many numbers your array holds.
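A quick sketch of why `len()` and `.size` differ once an array has more than one axis. The 2 x 3 matrix `mat` is a made-up example, anticipating the N-dim arrays discussed later.

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6]])   # hypothetical 2 x 3 example

print(len(mat))    # 2 -> length of the first axis (number of rows)
print(mat.size)    # 6 -> total number of elements
```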

Finally, having defined our vector as the array of the first 10 integers squared, it is created with elements of integer data-type and its `.dtype` is inferred accordingly

In [33]:
vec.dtype

dtype('int32')

Don't be scared by the fact that what is returned is `dtype('int32')` and not simply `int`: NumPy just has its own data-types. You can safely read `dtype('int32')` as `int` (depending on your platform and NumPy version, you may see `dtype('int64')` instead).

Notice that we could also have explicitly chosen to define our array as an array of floats, instead of integers, using the `dtype` parameter of the `np.array()` function

In [35]:
vec_float = np.array(lis, dtype='float')
vec_float

array([ 0.,  1.,  4.,  9., 16., 25., 36., 49., 64., 81.])

In [37]:
vec_float.dtype

dtype('float64')

and, as you can see, `vec_float` is the same as `vec`, but its elements are all cast to decimal numbers and its data-type is therefore `dtype('float64')` (NumPy's version of `float`).

A possible _signature_ for the creation function `np.array()` would be 

`np.array(sequence[, dtype])` 

where `sequence` could be a list and `dtype`, if not provided, is inferred from the data-type of `sequence`'s elements, as we have just seen. The `[, optionalArgument]` syntax is a common convention for marking optional arguments. Get familiar with it.
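The same signature works when `sequence` is a tuple rather than a list. A minimal sketch, where the names `tup`, `vec_default` and `vec_float` are chosen only for this example:

```python
import numpy as np

tup = (1, 2, 3)                           # a tuple instead of a list

vec_default = np.array(tup)               # dtype inferred from the items (integers)
vec_float = np.array(tup, dtype=float)    # dtype requested explicitly

print(vec_default)        # [1 2 3]
print(vec_float)          # [1. 2. 3.]
print(vec_float.dtype)    # float64
```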

#### What if we mix data-types?

- `int` and `float`: NumPy promotes integers to floats and defines the array with a float `dtype`

In [66]:
lis = [1, 2.5, 5, 6, 7.5]
print("lis: ", lis)

vec = np.array(lis)
print("dtype: ", vec.dtype)
vec

lis:  [1, 2.5, 5, 6, 7.5]
dtype:  float64


array([1. , 2.5, 5. , 6. , 7.5])

- numbers and `str`: NumPy casts all the numbers to strings and defines the array with a Unicode string `dtype` (the `U` stands for Unicode and the number that follows is the maximum number of characters per item), that is, an array of strings.

In [65]:
lis = [1, 2.5, "EUR"]
print("lis: ", lis)

vec = np.array(lis)
print("dtype: ", vec.dtype)
vec

lis:  [1, 2.5, 'EUR']
dtype:  <U32


array(['1', '2.5', 'EUR'], dtype='<U32')

Again, in these examples the lists are defined separately just for clarity. You could equivalently write:

In [64]:
vec = np.array([1, 2.5, "EUR"])
print("dtype: ", vec.dtype)
vec

dtype:  <U32


array(['1', '2.5', 'EUR'], dtype='<U32')

### 1.1.2. From sequences of numbers: `np.arange()`, `np.linspace()`  <a name="arange_linspace"></a>

If you want to create an array from a sequence of numbers, you can use 

`arange([start,] stop[, step])` 

which creates a 1-dim array of numbers from `start` (included) up to `stop` (excluded), spaced by `step`.

As suggested by the conventional `[]` syntax in `arange`'s signature, the parameters `start` and `step` are optional; if not specified, they default to `start=0` and `step=1`.

In [67]:
vec = np.arange(10)
vec

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

which is equal to

In [42]:
vec = np.arange(0, 10, 1)

In general we can write

In [44]:
vec = np.arange(1, 7, 0.25)
vec

array([1.  , 1.25, 1.5 , 1.75, 2.  , 2.25, 2.5 , 2.75, 3.  , 3.25, 3.5 ,
       3.75, 4.  , 4.25, 4.5 , 4.75, 5.  , 5.25, 5.5 , 5.75, 6.  , 6.25,
       6.5 , 6.75])
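The section title also mentions `np.linspace()`. As a rough sketch of how it differs from `np.arange()`: with `linspace` you choose the _number_ of points rather than the step, and the end value is included by default. The names `by_step` and `by_count` are made up for this example.

```python
import numpy as np

# 24 points spaced by 0.25; the end value 7 is excluded
by_step = np.arange(1, 7, 0.25)

# 25 equally spaced points; the end value 7 is included by default
by_count = np.linspace(1, 7, 25)

print(by_step.size, by_step[-1])     # 24 6.75
print(by_count.size, by_count[-1])   # 25 7.0
```

When the step is not an integer, `np.linspace()` is often the safer choice, because the number of elements is stated explicitly instead of depending on floating-point rounding.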

### 1.1.3. Using placeholder content: `np.zeros()`, `np.ones()`, `np.empty()`  <a name="arange_linspace"></a>
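A minimal sketch of the three placeholder functions named in the heading. The length 4 is chosen only for illustration; by default all three produce arrays of floats.

```python
import numpy as np

zeros = np.zeros(4)    # four items, all equal to 0.0
ones = np.ones(4)      # four items, all equal to 1.0
empty = np.empty(4)    # four uninitialized items: their values are arbitrary

print(zeros)                      # [0. 0. 0. 0.]
print(ones)                       # [1. 1. 1. 1.]
print(empty.shape, empty.dtype)   # (4,) float64
```

Since `np.empty()` does not initialize its content, it only makes sense when every item is going to be overwritten afterwards.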

## 1.2. Indexing, Slicing and Iterating <a name="indexing_slicing_iterating"></a>

**TODO**: discuss the role of the `:` (colon)
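As a first sketch of the role of the colon, assuming a small example vector `vec`: the general slice is `vec[start:stop:step]`, and it follows the same conventions as list slicing.

```python
import numpy as np

vec = np.arange(10)    # the integers from 0 to 9

print(vec[2])      # 2            -> single item (indexes start at 0)
print(vec[2:5])    # [2 3 4]      -> from index 2 up to, but excluding, index 5
print(vec[:3])     # [0 1 2]      -> omitted start means "from the beginning"
print(vec[7:])     # [7 8 9]      -> omitted stop means "up to the end"
print(vec[::2])    # [0 2 4 6 8]  -> every second item
```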

## 1.3. Basic operations (are _element-wise_ ) <a name="indexing_slicing_iterating"></a>

- array + scalar
- array + array

### 1.3.1. _Focus on:_ `+` and `*` operators on lists <a name="elementwise"></a>

**TODO**: also discuss `+=` and `*=`
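A short sketch of the contrast this section is about: on lists `+` and `*` concatenate and repeat, while on arrays they act element-wise. The names `lis` and `arr` are example names only.

```python
import numpy as np

lis = [1, 2, 3]
arr = np.array(lis)

print(lis + lis)   # [1, 2, 3, 1, 2, 3] -> concatenation
print(lis * 2)     # [1, 2, 3, 1, 2, 3] -> repetition
print(arr + arr)   # [2 4 6]            -> element-wise sum
print(arr * 2)     # [2 4 6]            -> element-wise product

arr += 10          # in-place, element-wise update
print(arr)         # [11 12 13]
```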

### 1.3.2. Built-in methods: `.min()`, `.max()`, `.sum()` and more  <a name="built_in_methods"></a>

### 1.3.3. Universal functions  <a name="univ_func"></a>

# 2. N-dim arrays <a name="N_dim"></a>

## 2.1. Array Creation <a name="creation_ndim"></a>

### 2.1.1. From Lists or Tuples  <a name="from_lists_ndim"></a>

### 2.1.2. _Focus on:_ printing arrays  <a name="from_lists_ndim"></a>

### 2.1.3. Using placeholder content: `np.zeros()`, `np.ones()`, `np.empty()` with `shape` parameter <a name="arange_linspace"></a>

## 2.2. Indexing, Slicing and Iterating <a name="indexing_slicing_iterating"></a>

## 2.3. Basic operations (are _element-wise_ ) <a name="indexing_slicing_iterating"></a>

- array + scalar
- array + array
- broadcasting
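The three cases listed above can be sketched as follows, assuming a made-up 2 x 3 matrix `mat` and a vector `row` with as many items as `mat` has columns.

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

print(mat + 1)      # array + scalar: 1 is added to every element
# [[2 3 4]
#  [5 6 7]]
print(mat + mat)    # array + array: element-wise sum of arrays with the same shape
# [[ 2  4  6]
#  [ 8 10 12]]
print(mat + row)    # broadcasting: `row` is added to each row of `mat`
# [[11 22 33]
#  [14 25 36]]
```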

### 2.3.1. _Focus on:_ Matrix Operations <a name="matrix_operations"></a>

### 2.3.2. Built-in methods: `.min()`, `.max()`, `.sum()` and more with `axis` parameter  <a name="built_in_methods"></a>

### 2.3.3. Universal functions  <a name="univ_func"></a>

## 2.4. Shape Manipulation: <a name="shape_manipulation"></a>

### 2.4.1. Changing the shape: `.reshape()` <a name="reshape"></a>

### 2.4.2. _Focus on:_ Matrix Transpose `.T` <a name="matrix_operations"></a>

### 2.4.3. Changing the size: `.resize()` <a name="reshape"></a>

### 2.4.4. From N-dim to 1-dim: `.flatten()` <a name="flatten"></a>

## 2.5. Stacking Arrays together: `np.hstack()` and `np.vstack()` <a name="flatten"></a>
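A minimal sketch of the two stacking functions, assuming two made-up vectors `a` and `b` of the same length: `np.hstack()` joins them side by side, `np.vstack()` piles them up as rows.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.hstack((a, b)))   # [1 2 3 4 5 6]
print(np.vstack((a, b)))
# [[1 2 3]
#  [4 5 6]]
```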

# 3. Lists Vs Arrays <a name="list_vs_arrays"></a>

## 3.1. Speed-comparison: Lists 0 - 1 Arrays <a name="flatten"></a>

## 3.2. _Vectorization_ of code: Lists 0 - 2 Arrays <a name="vectorization"></a>