# NumPy Tests

This notebook is used to investigate features and properties of the ``numpy`` library. It loosely follows online references for the library, starting from the easiest aspects and building up to more complex usage.

In [1]:
import numpy as np

## Internal representation of ``ndarray``

An ``ndarray`` is internally stored as a contiguous chunk of memory of $M$ bytes, where $M$ depends on:
* the size $s$ of the items stored in the ``ndarray``
* the number of items $n_i$ in each dimension $i=1,\ldots,N$

It is important to notice that the elements of an ``ndarray`` are homogeneous, in the sense that elements of different types (*e.g.*, ``int64`` and ``float64``) cannot be stored in the same array.

The following example creates an ``ndarray`` of $32$ items arranged in a three-dimensional array of size $4\times 2 \times 4$.

In [2]:
a = np.arange(32).reshape([4,2,4])

In [3]:
print("ndim    ", a.ndim)
print("Shape   ", a.shape)
print("Size    ", a.size)
print("Stride  ", a.strides)
print()
print("Type    ", a.dtype)
print("ItSize  ", a.itemsize)
print("Flags\n" , a.flags)

ndim     3
Shape    (4, 2, 4)
Size     32
Stride   (64, 32, 8)

Type     int64
ItSize   8
Flags
   C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False


Notice that, since no ``dtype`` was given, ``numpy`` chose ``int64`` for the elements of the array. This is the default integer type on this platform; it may differ on other platforms.
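The size $M$ of the buffer can be checked directly: for an array stored contiguously, as in this example, it is the item size times the number of items, $M = s \prod_i n_i$. A minimal sketch with the same shape as above; the ``dtype`` is fixed explicitly to ``int64`` here (an assumption made only so that the numbers do not depend on the platform), and ``expected_bytes`` is just an illustrative name:

```python
import numpy as np

a = np.arange(32, dtype=np.int64).reshape([4, 2, 4])

# Total number of bytes: item size times the number of items
expected_bytes = a.itemsize * int(np.prod(a.shape))
print("nbytes   ", a.nbytes)
print("expected ", expected_bytes)
```

Both lines should report the same value, since ``nbytes`` is defined as the total number of bytes consumed by the elements of the array.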

### Strides
One of the central concepts in ``numpy`` is the concept of *stride*. Strides describe how data is laid out in the raw array of bytes. The strides of an ``ndarray``:
* contain one number for each of the $N$ dimensions of the ``ndarray``,
* each number indicates the distance (in bytes) between the initial bytes of two consecutive elements along the corresponding dimension
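To make this definition concrete, the byte offset of an element can be computed by hand from its indices and the strides. The sketch below uses the same $4\times 2 \times 4$ shape as the previous section, with the ``dtype`` fixed to ``int64`` for portability; the indices ``i, j, k`` are arbitrary illustrative values:

```python
import numpy as np

a = np.arange(32, dtype=np.int64).reshape([4, 2, 4])
i, j, k = 2, 1, 3

# Offset (in bytes) of a[i, j, k] from the start of the buffer
offset = i * a.strides[0] + j * a.strides[1] + k * a.strides[2]

# Read the same element from the flat buffer (valid because `a` is C-contiguous)
flat = a.ravel()
print("offset   ", offset)
print("element  ", flat[offset // a.itemsize], a[i, j, k])
```

Dividing the byte offset by the item size gives the position of the element in the flattened array, so the two printed elements should coincide.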

Let's start with the "trivial" example of a one-dimensional array. Consider an ``ndarray`` with $n$ elements, each of size $s$. With just one dimension, the distance between consecutive elements is the number of bytes occupied by a single element. For ``float16`` items this is $2$.

In [4]:
n = 50
v = np.ndarray([n], dtype=np.float16)
print("ndim      ", v.ndim)
print("Strides   ", v.strides)

ndim       1
Strides    (2,)


Now let's add a dimension to extend the vector into a $1\times n$ matrix (*i.e.*, a row vector).

In [5]:
w = v.reshape([1,n])
print("ndim      ", w.ndim)
print("Strides   ", w.strides)

ndim       2
Strides    (100, 2)


Even though the added dimension has size $n_0 = 1$, strides are reported for both dimensions. In particular, the stride of dimension $0$ is $s_0 = n_1 \cdot s_1$, where $n_1$ is the number of elements in dimension $1$ ($50$ in our example) and $s_1$ is the stride of dimension $1$, which here equals the size of a single element ($s_1 = 2$ for ``float16``). In this example every element has the same size $s$, whatever the dimension it is reached through (**Question**: is it possible to have an ``ndarray`` with heterogeneous items in different dimensions?)
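The relation $s_0 = n_1 \cdot s_1$ generalizes to more dimensions: in a C-contiguous array each stride is the next stride multiplied by the next dimension's size, starting from the item size. The helper ``c_contiguous_strides`` below is a hypothetical function written for this notebook, not part of ``numpy``; it is a sketch under the assumption of a freshly allocated, C-contiguous array with no dimension of size $1$:

```python
import numpy as np

def c_contiguous_strides(shape, itemsize):
    """Strides (in bytes) of a C-contiguous array with the given shape."""
    strides = []
    step = itemsize
    # Walk the dimensions from last to first: the last one advances by one item
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))

a = np.zeros([4, 2, 4], dtype=np.int64)
print("computed ", c_contiguous_strides(a.shape, a.itemsize))
print("numpy    ", a.strides)
```

The two printed tuples should agree for this array.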

An interesting application of strides is obtaining the transpose $M^T$ of a matrix $M$. When a matrix is transposed, the rows (dimension $0$) become the columns (dimension $1$) and vice versa. In terms of strides, this simply swaps the strides of rows and columns; the underlying data is not moved.

In [33]:
M = np.arange(10).reshape([5,2])
print("Matrix M")
print("shape    ", M.shape)
print("strides  ", M.strides)
print("Data     ", M.data)
print()
print("Matrix M^T")
print("shape    ", M.T.shape)
print("strides  ", M.T.strides)
print("Data     ", M.T.data)

Matrix M
shape     (5, 2)
strides   (16, 8)
Data      <memory at 0x115036990>

Matrix M^T
shape     (2, 5)
strides   (8, 16)
Data      <memory at 0x115036990>
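Since both arrays report the same memory address, the transpose is a view on the same buffer rather than a copy. A short sketch to confirm this, using ``np.shares_memory`` and a write through the view (``Mt`` is just an illustrative name):

```python
import numpy as np

M = np.arange(10).reshape([5, 2])
Mt = M.T

# The transpose is a view: same buffer, swapped strides
print("shares memory ", np.shares_memory(M, Mt))
print("strides       ", M.strides, Mt.strides)

# Writing through the view is visible in the original matrix
Mt[1, 0] = 100
print("M[0, 1]       ", M[0, 1])
```

A consequence worth keeping in mind is that modifying ``M.T`` in place also modifies ``M``; an explicit ``M.T.copy()`` is needed to get independent data.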


## Constructing ``ndarray``

To construct an ``ndarray`` the library offers several possibilities:
* Use the *constructor* of the class ``ndarray``
* Use convenience factory functions (like ``zeros``, ``ones``, ``asarray``, ...)

Factory functions are more convenient because most of them also initialize the content of the array. This is not guaranteed if one uses the constructor, as the arbitrary values in the following example show.

In [14]:
a = np.ndarray(shape=[2,3], dtype=np.float16)
print("ndarray array with constructor")
print(a)
print()
b = np.zeros(shape=[2,3], dtype=np.float16)
print("ndarray with factory function zeros")
print(b)

ndarray array with constructor
[[0.000e+00 0.000e+00 5.960e-08]
 [0.000e+00 4.495e-01 4.504e-01]]

ndarray with factory function zeros
[[0. 0. 0.]
 [0. 0. 0.]]
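When an uninitialized array is actually wanted (for instance because every element is going to be overwritten anyway), the factory function ``empty`` can be used in place of the constructor. A minimal sketch, where the fill value ``0.5`` is an arbitrary choice for illustration:

```python
import numpy as np

# `empty` allocates the array without initializing its content
c = np.empty(shape=[2, 3], dtype=np.float16)
print(c.shape, c.dtype)  # the values themselves are arbitrary at this point

# Assign every element explicitly before reading the array
c.fill(0.5)
print(c)
```

Only the shape and the type of ``c`` are meaningful until the elements have been assigned.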


### Factory functions
Let's look at some convenient (and commonly used) factory functions. We have already seen the ``zeros`` function, which creates an ``ndarray`` with all elements initialized to zero.

Similarly, there is a ``ones`` function and the closely related ``ones_like``, which returns an ``ndarray`` filled with ones and having the same shape and type as the passed array.

In [18]:
shape = [4,2]
a_ones = np.ones(shape=shape)
print("ones")
print(a_ones)

print("ones_like")
print(np.ones_like(a_ones.T))

ones
[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]
ones_like
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [21]:
print(np.full(shape=shape, fill_value=4))

[[4 4]
 [4 4]
 [4 4]
 [4 4]]


The available functions (each also has a ``*_like`` version) are:
* ``zeros`` array of all $0$
* ``ones`` array of all $1$
* ``empty`` uninitialized array
* ``full`` array filled with given value
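The ``*_like`` versions take an existing array instead of a shape and, by default, reuse both its shape and its ``dtype``. A small sketch, where ``template`` and the fill value ``7`` are illustrative choices:

```python
import numpy as np

template = np.arange(6, dtype=np.int32).reshape([2, 3])

z = np.zeros_like(template)
f = np.full_like(template, fill_value=7)
e = np.empty_like(template)  # uninitialized: only shape and dtype are meaningful

for name, arr in [("zeros_like", z), ("full_like", f), ("empty_like", e)]:
    print(name, arr.shape, arr.dtype)
```

All three results should report shape ``(2, 3)`` and type ``int32``, matching ``template``.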