# Numerical Computing - Numpy Arrays

# Table of contents

[Executive Summary](#summary)


### **Resources**: 

- [_Python for Finance (2nd ed.)_](http://shop.oreilly.com/product/0636920117728.do): Sec. 4.Numpy Arrays, 4.Basic Vectorization
- _[Numpy Quickstart Tutorial - The Basics](https://docs.scipy.org/doc/numpy/user/quickstart.html#the-basics)_ (An Example; Array Creation; Printing Arrays; Basic Operations; Universal Functions; Indexing, Slicing and Iterating), _[Numpy Quickstart Tutorial - Shape Manipulation](https://docs.scipy.org/doc/numpy/user/quickstart.html#shape-manipulation)_ (Changing the shape of an array; Stacking together different arrays).

# Executive Summary <a name="summary"></a>

We'll keep this discussion as intuitive and as informal as possible.
The concept of _array_ belongs to (at least) two knowledge domains:

- **Mathematics**: an array is a sequence of numbers of the same type (natural, rational, real numbers, ...). It can be:

    - a 1-dimensional vector $v$: that is, a sequence of elements indexed by a single integer $i$;
    - a 2-dimensional matrix $M$: that is, a sequence of elements indexed by a pair of integers $(i,j)$;
    - an $N$-dimensional tensor $T$, with $N>2$: that is, a sequence of elements indexed by an $N$-tuple of integers $(i_1, \dots, i_N)$. (*)
    
(*) Feel lost? No problem. Let's work through an example with a 3-dimensional tensor. If $N=3$, a 3-dimensional sequence of numbers can be visualized as a _cube_ of numbers indexed by the three indexes $(i,j,k)$, where $i$ and $j$ run along the _height_ (rows axis) and the _width_ (columns axis) of the cube, respectively, while $k$ runs along its _depth_ (say, the _pages_ axis). Therefore, each page is identified by the value of the index $k=0,1,2,\dots$, whereas numbers on the same page differ in the values of the $(i,j)$ indexes (yes, you can think of each page as a matrix of numbers). For example, the $\color{red}{\text{red 1}}$ on the front page has indexes $(i,j,k) = (3,2,0)$, whereas the three $\color{green}{\text{green 2}}$ entries in the bottom-right corner share the same indexes $i=4$ and $j=4$ and differ in the value of $k$: $k=0$ (front page), $k=1$ (second page) and $k=2$ (back page). See the picture.

<img src="../images/tensor3d.png" width="500">
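To connect vectors, matrices and tensors with NumPy, here is a minimal sketch. The $5 \times 5 \times 3$ size of the cube is an assumption (the picture only guarantees that the indexes $(3,2,0)$ and $(4,4,k)$ with $k=0,1,2$ exist), indexes start at 0 as in NumPy, and the index order `[i, j, k]` follows the text: rows, columns, pages.

```python
import numpy as np

# A 1-dim vector: one index i
v = np.array([1, 2, 3])
# A 2-dim matrix: two indexes (i, j)
M = np.array([[1, 2], [3, 4]])
# A 3-dim tensor: three indexes (i, j, k).
# Hypothetical 5x5x3 cube: 5 rows (i), 5 columns (j), 3 pages (k), all zeros.
T = np.zeros((5, 5, 3), dtype=int)

# The "red 1": (i, j, k) = (3, 2, 0), i.e. on the front page
T[3, 2, 0] = 1
# The three "green 2": same (i, j) = (4, 4) on every page k = 0, 1, 2
T[4, 4, :] = 2

print(v.ndim, M.ndim, T.ndim)  # 1 2 3
print(T[:, :, 0])              # the front page (k = 0) as a 5x5 matrix
print(T[4, 4, :])              # [2 2 2]
```

Keep in mind that when NumPy *prints* a 3-dim array it splits it along the first axis, not the last one; that is why the front page is extracted explicitly with the slice `T[:, :, 0]`.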

- **Informatics**: an array is a sequence of data of the same data-type. The fact that all the data stored in an array share the same data-type is important because it allows the same amount of memory (bits) to be allocated for each item. Moreover, being a _sequence_ means that consecutive items are stored in consecutive portions of memory, which makes them easy to index and therefore fast to access.
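These consequences of a uniform data-type can be inspected directly. A minimal sketch, assuming 64-bit integers (`dtype=np.int64`) so that each item takes 8 bytes; `.itemsize`, `.nbytes`, `.strides` and `.flags` are standard `ndarray` attributes:

```python
import numpy as np

# All items share one dtype, so each one takes the same number of bytes
a = np.arange(5, dtype=np.int64)

print(a.itemsize)               # 8: bytes per item (64-bit integers)
print(a.nbytes)                 # 40 = 5 items * 8 bytes
print(a.strides)                # (8,): jump 8 bytes to reach the next item
print(a.flags['C_CONTIGUOUS'])  # True: items sit in one contiguous block
```

Since every item has the same size, the address of item `i` is simply the start address plus `i * itemsize`, which is what makes indexing fast.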


We have already seen a great example of a sequence-like data-structure in basic Python: the `list`. In particular, lists have the following key features:

**a**: _Lists are sequences._ Therefore, consecutive elements of a list occupy consecutive slots of memory (strictly speaking, in CPython these slots hold references to the elements, which may live elsewhere in memory).

**b**: _Lists can simultaneously store data of heterogeneous data-types._ Therefore, it is not known _a priori_ whether the same amount of memory can be reserved for each element of the list.

**c**: _Lists are mutable (e.g. think of the `.append()` method)._ Therefore, the total amount of memory to be reserved for the whole list is not fixed _a priori_: it can change over time.

Although points **b** and **c** make lists very flexible, they also represent bottlenecks in terms of memory usage and performance. Lists are somewhat too _general_ to excel in performance as well.
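Points **b** and **c** can be seen in a couple of lines of plain Python (no NumPy needed):

```python
# A list can mix data-types...
items = [1, "two", 3.0]
print([type(x).__name__ for x in items])  # ['int', 'str', 'float']

# ...and can grow after creation (it can even contain another list)
items.append([4, 5])
print(len(items))  # 4
```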

There is a need for a more _specialized_ data-structure, one that shares the sequentiality of data with lists but trades some flexibility for performance. That's why [NumPy](https://docs.scipy.org/doc/numpy/user/quickstart.html#quickstart-tutorial) and its data-structure `numpy.ndarray` were created.

Key facts about NumPy arrays:

**a**: arrays extend the sequentiality of lists by introducing a built-in notion of dimensions (called _axes_);

**b**: an array's length (its _size_) is fixed once the array is created;

**c**: all the items of an array must have the same data-type.
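By contrast, here is a minimal sketch of how NumPy enforces **b** and **c**: mixed values are converted (upcast) to a single common `dtype`, and "growing" an array actually means building a new one.

```python
import numpy as np

# Mixed ints and floats are converted to one common dtype (here float64)
a = np.array([1, 2.5, 3])
print(a.dtype)  # float64

# Arrays have no .append() method; np.append() builds and returns a NEW array
b = np.append(a, 4.0)
print(a.size, b.size)  # 3 4  (the original array is unchanged)
```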

The built-in notion of dimensions makes it easy to map the mathematical concepts of vectors, matrices and N-dimensional tensors onto 1-dim, 2-dim and N-dim NumPy arrays, respectively. Moreover, the constraints of a fixed size (**b**) and a uniform data-type (**c**) enable several speed improvements and the _vectorization_ of code: they allow fast(er) memory access and make it possible to write functions that work on all the elements of an array "at once".
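A minimal sketch of vectorization, comparing a list comprehension with the equivalent array expression (the values are arbitrary):

```python
import numpy as np

values = [0.5, 1.5, 2.5, 3.5]

# With a list: an explicit Python loop visits one element at a time
doubled_list = [2 * x for x in values]

# With an array: a single vectorized expression on all elements "at once"
arr = np.array(values)
doubled_arr = 2 * arr

print(doubled_list)          # [1.0, 3.0, 5.0, 7.0]
print(doubled_arr.tolist())  # [1.0, 3.0, 5.0, 7.0]
```

Both give the same numbers; the difference is that in the array version the loop over the elements runs inside NumPy's compiled code rather than in the Python interpreter.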

These key facts translate into the following metadata, which can be accessed as [attributes of any array](https://docs.scipy.org/doc/numpy/user/quickstart.html#the-basics):

Attribute | Meaning | Constraints (if any)
:---: | :---: | :---:
`.ndim`  | The number of axes (dimensions) of an array: 1 for a vector, 2 for a matrix, and so on | -  |
`.shape` | The dimensions of the array: a tuple `(n,m)` for a matrix of `n` rows and `m` columns | -  |
`.size` | The number of elements of the array: `n` $\times$ `m` for a matrix of shape `(n,m)` | fixed (*)  |
`.dtype` | The data-type of the array's elements | the same for all elements  |

We'll use these attributes to explore the arrays introduced in the following sections.

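(*) Strictly speaking, the size of an array can be changed: the `.resize()` method changes the size of an array in place (reallocating its memory if needed), while the `np.resize()` function returns a new array of the requested size. See section [2.4.3. Changing the size: `.resize()`](#reshape).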


The function `type()` returns `numpy.ndarray` for NumPy's arrays. 
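Here is how the attributes in the table, and `type()`, look on a small $2 \times 3$ matrix. A sketch; note that the default integer `dtype` depends on the platform (e.g. `int64` on most 64-bit Linux and macOS systems), so only its kind (integer) is certain:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

print(type(M))   # <class 'numpy.ndarray'>
print(M.ndim)    # 2: rows and columns
print(M.shape)   # (2, 3)
print(M.size)    # 6 = 2 * 3
print(M.dtype)   # an integer dtype, e.g. int64 (platform-dependent)
```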

**TODO**

The following sections are organized as follows:
- In Sec. [1](#1_dim) 1-dim arrays are introduced: how to create them, how to index, slice and iterate over them, and how basic (element-wise) operations, built-in methods and universal functions work on them.
- In Sec. [2](#N_dim) the same topics are extended to N-dim arrays, together with shape manipulation (reshaping, transposing, resizing, flattening) and stacking of arrays.
- In Sec. [3](#list_vs_arrays) lists and arrays are compared in terms of speed and vectorization of code.

As a preliminary step, we import the `numpy` module and give it the conventional alias `np`:

```python
import numpy as np
```

# 1. 1-dim arrays <a name="1_dim"></a>

## 1.1. Array Creation <a name="creation"></a>

### 1.1.1. From Lists or Tuples  <a name="from_lists"></a>

### 1.1.2. From sequences of numbers: `np.arange()`, `np.linspace()`  <a name="arange_linspace"></a>
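A minimal sketch of the two functions named in the heading; the key difference is whether the end point is included:

```python
import numpy as np

# np.arange(start, stop, step): like range(), the stop value is EXCLUDED
a = np.arange(0, 10, 2)
print(a)  # [0 2 4 6 8]

# np.linspace(start, stop, num): num evenly spaced points, stop INCLUDED
b = np.linspace(0, 1, 5)
print(b.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```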

### 1.1.3. Using placeholder content: `np.zeros()`, `np.ones()`, `np.empty()`  <a name="arange_linspace"></a>

## 1.2. Indexing, Slicing and Iterating <a name="indexing_slicing_iterating"></a>

Discuss the role of the `:` (colon).

## 1.3. Basic operations (are _element-wise_ ) <a name="indexing_slicing_iterating"></a>

- array + scalar
- array + array
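A minimal sketch of both cases for 1-dim arrays; the operation is applied item by item:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + 1)  # array + scalar: [2 3 4]
print(a + b)  # array + array (same shape), item by item: [11 22 33]
print(a * b)  # also element-wise: [10 40 90]
```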

### 1.3.1. _Focus on:_ `+` and `*` operators on lists <a name="elementwise"></a>

Also discuss `+=` and `*=`.
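A minimal sketch contrasting the same operators on a list and on an array, including the in-place forms `+=` and `*=`:

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array([1, 2, 3])

# On lists, + concatenates and * repeats
print(lst + lst)  # [1, 2, 3, 1, 2, 3]
print(lst * 2)    # [1, 2, 3, 1, 2, 3]

# On arrays, + and * act element-wise
print(arr + arr)  # [2 4 6]
print(arr * 2)    # [2 4 6]

# The in-place versions follow the same rules
lst += [4]        # extends the list: [1, 2, 3, 4]
arr *= 10         # multiplies each item in place: [10 20 30]
```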

### 1.3.2. Built-in methods: `.min()`, `.max()`, `.sum()` and more  <a name="built_in_methods"></a>

### 1.3.3. Universal functions  <a name="univ_func"></a>

# 2. N-dim arrays <a name="N_dim"></a>

## 2.1. Array Creation <a name="creation_ndim"></a>

### 2.1.1. From Lists or Tuples  <a name="from_lists_ndim"></a>

### 2.1.2. _Focus on:_ printing arrays  <a name="from_lists_ndim"></a>

### 2.1.3. Using placeholder content: `np.zeros()`, `np.ones()`, `np.empty()` with `shape` parameter <a name="arange_linspace"></a>

## 2.2. Indexing, Slicing and Iterating <a name="indexing_slicing_iterating"></a>

## 2.3. Basic operations (are _element-wise_ ) <a name="indexing_slicing_iterating"></a>

- array + scalar
- array + array
- broadcasting
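A minimal sketch of broadcasting between a $2 \times 3$ matrix and arrays of compatible shapes (values chosen arbitrarily):

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])    # shape (3,)

# The 1-dim array is "stretched" along the rows to match shape (2, 3)
print(M + row)
# [[11 22 33]
#  [14 25 36]]

col = np.array([[100], [200]])  # shape (2, 1)
# The column is "stretched" along the columns to match shape (2, 3)
print(M + col)
# [[101 102 103]
#  [204 205 206]]
```

In both cases NumPy virtually repeats the smaller array along the missing (or length-1) axis, without actually copying it in memory.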

### 2.3.1. _Focus on:_ Matrix Operations <a name="matrix_operations"></a>

### 2.3.2. Built-in methods: `.min()`, `.max()`, `.sum()` and more with `axis` parameter  <a name="built_in_methods"></a>
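A minimal sketch of the `axis` parameter on a $2 \times 3$ matrix: `axis=0` runs down the rows (one result per column) and `axis=1` runs across the columns (one result per row):

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

print(M.sum())        # 21: all elements
print(M.sum(axis=0))  # [5 7 9]: collapse the rows, one result per column
print(M.sum(axis=1))  # [ 6 15]: collapse the columns, one result per row
print(M.max(axis=0))  # [4 5 6]
```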

### 2.3.3. Universal functions  <a name="univ_func"></a>

## 2.4. Shape Manipulation: <a name="shape_manipulation"></a>

### 2.4.1. Changing the shape: `.reshape()` <a name="reshape"></a>

### 2.4.2. _Focus on:_ Matrix Transpose `.T` <a name="matrix_operations"></a>

### 2.4.3. Changing the size: `.resize()` <a name="reshape"></a>
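A minimal sketch of the difference between the `np.resize()` function and the `.resize()` method. The method modifies the array in place and, by default, refuses to run if the array is referenced elsewhere (which can happen in interactive sessions); `refcheck=False` disables that check and is used here only because nothing else refers to `c`:

```python
import numpy as np

a = np.array([1, 2, 3])

# Function np.resize(): returns a NEW array, repeating the data to fill it
b = np.resize(a, 5)
print(b)       # [1 2 3 1 2]
print(a.size)  # 3: the original is untouched

# Method .resize(): changes the array itself, padding with zeros.
# refcheck=False skips the reference check; safe here because
# no other object refers to `c`.
c = np.array([1, 2, 3])
c.resize(5, refcheck=False)
print(c)       # [1 2 3 0 0]
```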

### 2.4.4. From N-dim to 1-dim: `.flatten()` <a name="flatten"></a>

## 2.5. Stacking Arrays together: `np.hstack()` and `np.vstack()` <a name="flatten"></a>
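A minimal sketch of the two functions on two 1-dim arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# np.vstack(): stack vertically, one array per row -> shape (2, 3)
print(np.vstack((a, b)))
# [[1 2 3]
#  [4 5 6]]

# np.hstack(): stack horizontally, end to end -> shape (6,)
print(np.hstack((a, b)))  # [1 2 3 4 5 6]
```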

# 3. Lists Vs Arrays <a name="list_vs_arrays"></a>

## 3.1. Speed-comparison: Lists 0 - 1 Arrays <a name="flatten"></a>

## 3.2. _Vectorization_ of code: Lists 0 - 2 Arrays <a name="vectorization"></a>