#### **Author**: `marimuthu`(mario) [[@kmario23](https://github.com/kmario23)]

# **An introduction to NumPy**

NumPy (*Numerical Python*) is a Python library for scientific computing that provides high-performance vector, matrix, and higher-dimensional data structures. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

It offers the `ndarray` data structure for storing (homogeneous) data and `ufuncs` (universal functions) for processing it efficiently. Some of its important functionalities include `basic slicing`, `advanced or fancy indexing`, and `broadcasting`.

#### **How are NumPy arrays different from Python lists?**

 - Python lists are very general: they can contain any kind of object, and they are dynamically typed.
 - Lists do not support mathematical operations such as matrix multiplication and dot products. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
 - NumPy arrays are statically typed and homogeneous: the type of the elements is determined when the array is created.
 - NumPy arrays are memory efficient.
 - Because of the static typing, fast versions of mathematical functions such as multiplication and addition of NumPy arrays can be implemented in a compiled language (C and Fortran are used).
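
A minimal sketch of the first points above: the same operators mean different things for a list and for an array, and an array coerces its elements to a single dtype. The variable names are illustrative only.

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array(lst)

# on a list, `*` repeats the sequence and `+` concatenates
list_times_two = lst * 2          # [1, 2, 3, 1, 2, 3]
list_plus_list = lst + lst        # [1, 2, 3, 1, 2, 3]

# on an array, both operators act elementwise (vectorized, no Python loop)
arr_times_two = arr * 2           # array([2, 4, 6])
arr_plus_arr = arr + arr          # array([2, 4, 6])

# a list can hold mixed types; an array coerces everything to one dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                # float64
```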

In [1]:
# mandatory imports

import numpy as np
import matplotlib.pyplot as plt

In [2]:
# check version
np.__version__

'1.15.2'

#### **Getting help**

In [23]:
# read about signature and docstring 
np.ndarray?

Init signature: np.ndarray(self, /, *args, **kwargs)
Docstring:
ndarray(shape, dtype=float, buffer=None, offset=0,
        strides=None, order=None)

An array object represents a multidimensional, homogeneous array
of fixed-size items.  An associated data-type object describes the
format of each element in the array (its byte-order, how many bytes it
occupies in memory, whether it is an integer, a floating point number,
or something else, etc.)

Arrays should be constructed using `array`, `zeros` or `empty` (refer
to the See Also section below).  The parameters given here refer to
a low-level method (`ndarray(...)`) for instantiating an array.

For more information, refer to the `numpy` module and examine the
methods and attributes of an array.

Parameters
----------
(for the __new__ method; see Notes below)
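
The `?` suffix is IPython/Jupyter syntax. In a plain Python session, a sketch of the equivalents (using only the built-in `help()`, `np.info`, and the `__doc__` attribute) looks like this:

```python
import numpy as np

# np.info prints a summary of a NumPy object's documentation
np.info(np.zeros)

# the built-in help() shows the same docstring (it may open a pager when run interactively)
# help(np.ndarray)

# the raw docstring is also available as a plain string
doc = np.ndarray.__doc__
print(doc.strip().splitlines()[0])
```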

-------

### **Creating N-dimensional NumPy arrays**
There are a number of ways to initialize new NumPy arrays, for example:

 - from Python lists or tuples
 - using functions dedicated to generating NumPy arrays, such as *`numpy.arange`*, *`numpy.linspace`*, etc.
 - by reading data from files


#### **From lists**
  To create new vector and matrix arrays from Python lists, we can use the `numpy.array()` function.

In [10]:
# a vector: the argument to the array function is a Python list
# more generally, 1D array
lst = [1,2,3,4]
v = np.array(lst)

v

array([1, 2, 3, 4])

In [12]:
# get its datatype
v.dtype

dtype('int64')

In [15]:
# a matrix: the argument to the array function is a nested Python list (can also be a tuple of tuples)
# more generally, a 2D array
list_of_lists = [[1, 2], [3, 4]]
M = np.array(list_of_lists)

M

array([[1, 2],
       [3, 4]])

In [16]:
# a row vector

row_vec = v[np.newaxis, :]  # v[None, :]
row_vec.shape

(1, 4)

In [17]:
# a column vector
col_vec = v[:, np.newaxis]  # v[:, None]
col_vec.shape

# read more about newaxis here: https://stackoverflow.com/questions/29241056/how-does-numpy-newaxis-work-and-when-to-use-it

(4, 1)
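
The same row and column vectors can also be obtained with `reshape`, where `-1` lets NumPy infer that dimension from the array's size. A small sketch comparing the two spellings:

```python
import numpy as np

v = np.array([1, 2, 3, 4])

# reshape with -1 lets NumPy infer that dimension from the array's size
row_vec = v.reshape(1, -1)   # same shape as v[np.newaxis, :]
col_vec = v.reshape(-1, 1)   # same shape as v[:, np.newaxis]

print(row_vec.shape, col_vec.shape)  # (1, 4) (4, 1)

# like indexing with np.newaxis, reshape returns a view of v here, not a copy
print(np.shares_memory(row_vec, v))  # True
```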

#### **Construction using intrinsic array generating functions**

NumPy provides many functions for generating arrays. Some of them are:

- `numpy.arange()`
- `numpy.linspace()`
- `numpy.logspace()`

In [38]:
# when using linspace, both end points ARE included
np.linspace(0, 10, 25)

array([ 0.        ,  0.41666667,  0.83333333,  1.25      ,  1.66666667,
        2.08333333,  2.5       ,  2.91666667,  3.33333333,  3.75      ,
        4.16666667,  4.58333333,  5.        ,  5.41666667,  5.83333333,
        6.25      ,  6.66666667,  7.08333333,  7.5       ,  7.91666667,
        8.33333333,  8.75      ,  9.16666667,  9.58333333, 10.        ])

In [41]:
np.logspace(0, 5, 10, base=np.e)

array([  1.        ,   1.742909  ,   3.03773178,   5.29449005,
         9.22781435,  16.08324067,  28.03162489,  48.85657127,
        85.15255772, 148.4131591 ])
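
`numpy.arange` is listed above but not demonstrated. A short sketch of how the three functions relate: `arange` excludes the stop value, `linspace` includes it unless `endpoint=False`, and `logspace` is the base raised to a `linspace` of exponents.

```python
import numpy as np

# arange excludes the stop value (like Python's range)
a = np.arange(0, 10, 2.5)
print(a)   # [0.  2.5 5.  7.5]

# linspace includes the stop value by default; endpoint=False excludes it
b = np.linspace(0, 10, 4, endpoint=False)
print(b)   # [0.  2.5 5.  7.5]

# logspace(start, stop, num, base=b) equals b ** linspace(start, stop, num)
c = np.logspace(0, 5, 10, base=np.e)
d = np.e ** np.linspace(0, 5, 10)
print(np.allclose(c, d))   # True
```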

In [37]:
# a 3D array
# a random array whose values come from a standard normal distribution
gaussian = np.random.randn(2 * 3 * 4)

# reshape the array to the desired shape
# only the shape can be altered;
# the number of elements CANNOT change during a reshape operation

gaussian = gaussian.reshape(2, 3, 4)
gaussian.shape

(2, 3, 4)
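
Passing `-1` for one dimension lets NumPy infer it from the element count, and a reshape that would change the element count fails. A minimal sketch (the exact error message varies between NumPy versions, so it is printed rather than quoted):

```python
import numpy as np

x = np.arange(24)

# -1 asks NumPy to infer one dimension: 24 / (2 * 3) = 4
y = x.reshape(2, 3, -1)
print(y.shape)  # (2, 3, 4)

# the total number of elements must stay the same: 24 != 5 * 5
try:
    x.reshape(5, 5)
except ValueError as err:
    print("reshape failed:", err)
```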

In [19]:
# an array full of zero values
# one can also specify a desired datatype

zero_arr = np.zeros((3, 4), dtype=np.uint8)
zero_arr

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=uint8)

In [21]:
# an array full of ones
# one can also specify datatype

ones_arr = np.ones((3, 4), dtype=np.float32)
ones_arr

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], dtype=float32)

In [24]:
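# a 4x4 identity (matrix) array
# note: np.float128 is platform-dependent (e.g. unavailable on Windows); np.longdouble is the portable spelling

iden = np.identity(4, dtype=np.float128)  # np.eye(4, dtype=np.float128)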
iden

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]], dtype=float128)

In [25]:
# a diagonal array
# the single float (4.0) makes the whole array float64

diag = np.diag([1, 2, 3, 4.0])
diag.dtype

dtype('float64')

In [26]:
# get a mapping of the supported scalar types, grouped by kind
# note: np.sctypes exists in NumPy 1.x but was removed in NumPy 2.0
np.sctypes

{'int': [numpy.int8, numpy.int16, numpy.int32, numpy.int64],
 'uint': [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint64],
 'float': [numpy.float16, numpy.float32, numpy.float64, numpy.float128],
 'complex': [numpy.complex64, numpy.complex128, numpy.complex256],
 'others': [bool, object, bytes, str, numpy.void]}

#### **A note on datatypes**

If no datatype is specified when constructing an array with `np.array()`, NumPy infers a default `dtype` from the elements of the array and from the platform.

- If all the values of the array are integers, the platform's default integer type is assigned: `np.int32` on 32-bit systems and `np.int64` on most 64-bit systems (on Windows, NumPy versions before 2.0 used `np.int32` even on 64-bit machines).
- If at least one value is a float, the integers are up-cast to floating point and `np.float64` is assigned, regardless of whether the system is 32- or 64-bit.
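
A quick sketch of these inference rules; the integer dtype printed on the first line depends on your platform and NumPy version, so it is left without an expected value:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.0])

# the default integer type is platform-dependent
print(ints.dtype)

# a single float up-casts the whole array to float64 on every platform
print(mixed.dtype)  # float64

# an explicit dtype overrides the inference
small = np.array([1, 2, 3], dtype=np.int16)
print(small.dtype)  # int16
```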

---------

## **NumPy Array Attributes**

- *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays
- *Indexing of arrays*: Getting and setting the value of individual array elements
- *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
- *Reshaping of arrays*: Changing the shape of a given array
- *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many

Each array has attributes such as:
 - `ndim` (the number of dimensions)
 - `shape` (the size of each dimension)
 - `size` (the total number of elements in the array)
 - `itemsize` (the memory consumed by each element, in bytes)
 - `nbytes` (the total memory consumed by the array's elements, in bytes)

In [34]:
# a 3D random array where the values come from a standard Normal distribution
gaussian = np.random.randn(2 * 3 * 4).reshape((2, 3, 4))

In [36]:
# get the number of dimensions of the array
print("total dimensions of the array is: ", gaussian.ndim)

# get the shape of the array
print("the shape of the array is: ", gaussian.shape)

# get the total number of elements in the array
print("total number of items is: ", gaussian.size)

# get the memory consumed by each item in the array (in bytes)
print("memory consumed by each item is: ", gaussian.itemsize)

# get the memory consumed by the whole array (in bytes)
print("total memory consumed by the whole array is: ", gaussian.nbytes)

total dimensions of the array is:  3
the shape of the array is:  (2, 3, 4)
total number of items is:  24
memory consumed by each item is:  8
total memory consumed by the whole array is:  192
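
The last few attributes are related: `nbytes` is `size` times `itemsize`, and `size` is the product of `shape`. A sketch that checks this on a fresh random array (the `float32` copy is only there to show the effect of a smaller dtype):

```python
import numpy as np

gaussian = np.random.randn(2, 3, 4)

# nbytes is the element count times the per-element size: 24 * 8 = 192
print(gaussian.nbytes == gaussian.size * gaussian.itemsize)  # True

# size is the product of the shape
print(gaussian.size == np.prod(gaussian.shape))  # True

# a smaller dtype shrinks the memory footprint proportionally
half = gaussian.astype(np.float32)
print(half.itemsize, half.nbytes)  # 4 96
```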


## **Array Indexing**

 - We can index elements in an array using square brackets and indices. For 1D arrays, indexing works the same as with Python lists.

In [50]:
# 1D array of random integers
# get 10 integers from 0 (inclusive) to 23 (exclusive)

num_samples = 10
integers = np.random.randint(23, size=num_samples)
integers

array([15, 20, 17,  8, 15, 13, 10,  7, 17,  7])

In [55]:
# indexing 1D array needs only one index
# get the 3rd element (remember: unlike MATLAB, NumPy uses 0-based indexing)
integers[2]

17

In [66]:
twoD_arr = np.arange(1, 10).reshape(3, -1)
twoD_arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [68]:
# indexing a 2D array with two indices
# returns a scalar value

# value at last row and last column
twoD_arr[-1, -1]

9

In [80]:
# however, if we use only one (valid) index, a 1D array is returned

# get all elements in the last row
twoD_arr[-1]  # or twoD_arr[-1, ] or twoD_arr[-1, :]

array([7, 8, 9])

In [75]:
# remember `gaussian` is a 3D array. 
gaussian

array([[[-0.71197255,  0.06679773, -1.25741239,  0.47844702],
        [-1.23454505, -0.44886724,  1.48619942,  1.42442859],
        [ 0.52162646, -0.15109659,  0.67476457, -0.06875083]],

       [[-0.87707042, -0.07463458,  0.62705308,  1.26295083],
        [ 0.35081138, -1.61352544, -4.72934193, -1.35457184],
        [ 0.95712844, -0.99410615, -0.78774352, -0.65398486]]])

In [76]:
# So, a 2D array is returned when using one index

# return last slice
gaussian[-1]

array([[-0.87707042, -0.07463458,  0.62705308,  1.26295083],
       [ 0.35081138, -1.61352544, -4.72934193, -1.35457184],
       [ 0.95712844, -0.99410615, -0.78774352, -0.65398486]])

In [77]:
# a 1D array is returned when using a pair of indices

# return first row from last slice
gaussian[-1, 0]

array([-0.87707042, -0.07463458,  0.62705308,  1.26295083])

In [78]:
# return last row from last slice
gaussian[-1, -1]

array([ 0.95712844, -0.99410615, -0.78774352, -0.65398486])

In [79]:
# return the last element of the last row of the last slice
idx = (-1, -1, -1)
gaussian[idx]

-0.6539848636123415

We can also assign new values to elements in an array using indexing:

In [81]:
# updating the array by assigning values
# the float 99.21 is truncated to 99 because `integers` has an integer dtype
integers[2] = 99.21
integers

array([15, 20, 99,  8, 15, 13, 10,  7, 17,  7])

#### **Index slicing**

Index slicing is the technical name for the syntax `M[lower:upper:step]`, used to extract part of an array.  
Negative indices count from the end of the array (positive indices from the beginning):

In [82]:
# slice a portion of the array
# similar to Python list slicing
# x[start:stop:step]

# get last 5 elements
integers[-5:]

# if `stop` is omitted, the array is sliced until its end
# by default, step is 1

array([13, 10,  7, 17,  7])

In [83]:
# get alternate elements (every other element) from the array
# i.e., step = 2

integers[::2]

array([15, 99, 15, 10, 17])

In [64]:
# reversing the array
integers[::-1]

array([ 7, 17,  7, 10, 13, 15,  8, 99, 20, 15])

In [67]:
# forward traversal of the array (starting from the 4th element)
integers[3::]

array([ 8, 15, 13, 10,  7, 17,  7])

In [69]:
# reverse traversal of the array (starting from the 4th element)
integers[3::-1]

array([ 8, 99, 20, 15])

Array slices are views: if a slice is assigned new values, the original array from which it was extracted is modified:

In [85]:
# assign new values to the last two elements
integers[-2:] = [-23, -46]

integers

array([ 15,  20,  99,   8,  15,  13,  10,   7, -23, -46])
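
Two more consequences of slices being views, sketched on a fresh array (the name `data` is just for illustration): a scalar assigned to a slice is broadcast to every selected element, and writing through a slice bound to a name also changes the parent array.

```python
import numpy as np

data = np.arange(10)

# assigning a scalar to a slice broadcasts it to every selected element
data[:3] = 0
print(data)    # [0 0 0 3 4 5 6 7 8 9]

# a slice bound to a name is still a view: writing through it changes `data`
tail = data[-3:]
tail[:] = -1
print(data)    # [ 0  0  0  3  4  5  6 -1 -1 -1]
```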

## **nD arrays (a.k.a. tensors)**

In [86]:
# a 2D array
twenty = (np.arange(4 * 5)).reshape(4, 5)
twenty

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [87]:
# slice first 2 rows and 3 columns
twenty[:2, :3]

array([[0, 1, 2],
       [5, 6, 7]])

In [88]:
# slice and get only the corner elements
# a step of 3 along dimension 0 (rows 0 and 3)
# a step of 4 along dimension 1 (columns 0 and 4)
twenty[::3, ::4]

array([[ 0,  4],
       [15, 19]])

In [89]:
# reversing the order of the rows (i.e., along dimension 0)
twenty[::-1, ...]

array([[15, 16, 17, 18, 19],
       [10, 11, 12, 13, 14],
       [ 5,  6,  7,  8,  9],
       [ 0,  1,  2,  3,  4]])

In [90]:
# reversing the order of the columns (i.e., along dimension 1)
twenty[..., ::-1]

array([[ 4,  3,  2,  1,  0],
       [ 9,  8,  7,  6,  5],
       [14, 13, 12, 11, 10],
       [19, 18, 17, 16, 15]])

In [91]:
# reversing the rows and columns (i.e. along both dimensions)
twenty[::-1, ::-1]

array([[19, 18, 17, 16, 15],
       [14, 13, 12, 11, 10],
       [ 9,  8,  7,  6,  5],
       [ 4,  3,  2,  1,  0]])

In [92]:
# or more intuitively
np.flip(twenty, axis=(0, 1))

# or equivalently
np.flipud(np.fliplr(twenty))
np.fliplr(np.flipud(twenty))

array([[19, 18, 17, 16, 15],
       [14, 13, 12, 11, 10],
       [ 9,  8,  7,  6,  5],
       [ 4,  3,  2,  1,  0]])
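
The slicing and `np.flip` spellings are interchangeable. A minimal sketch checking this on the same `twenty` array; note that `np.flip` with no `axis` reverses every axis, and that, like the slices, it returns a view rather than a copy:

```python
import numpy as np

twenty = np.arange(4 * 5).reshape(4, 5)

# np.flip along one axis is the same as a reversed slice on that axis
print(np.array_equal(np.flip(twenty, axis=0), twenty[::-1]))     # True
print(np.array_equal(np.flip(twenty, axis=1), twenty[:, ::-1]))  # True

# without `axis`, np.flip reverses every axis
print(np.array_equal(np.flip(twenty), twenty[::-1, ::-1]))       # True

# np.flip returns a view, not a copy
print(np.shares_memory(np.flip(twenty), twenty))                 # True
```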

#### **Fancy indexing**
 - Fancy indexing is the name for when an array or a list is used in place of an index:

In [95]:
# a 2D array
twenty

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [97]:
# get 2nd, 3rd, and 4th rows
row_indices = [1, 2, 3]
twenty[row_indices]

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [99]:
col_indices = [1, 2, -1] # remember, index -1 means the last element
twenty[row_indices, col_indices]

array([ 6, 12, 19])
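
Note that `row_indices` and `col_indices` are paired element by element, selecting `(1, 1)`, `(2, 2)`, and `(3, -1)`, rather than forming a submatrix. To select every combination of the listed rows and columns, `np.ix_` builds suitably shaped index arrays, as in this sketch:

```python
import numpy as np

twenty = np.arange(4 * 5).reshape(4, 5)
row_indices = [1, 2, 3]
col_indices = [1, 2, -1]

# paired: picks (1, 1), (2, 2), (3, -1) -> a 1D array of 3 elements
paired = twenty[row_indices, col_indices]
print(paired)        # [ 6 12 19]

# cross product: every listed row combined with every listed column
sub = twenty[np.ix_(row_indices, col_indices)]
print(sub)
# [[ 6  7  9]
#  [11 12 14]
#  [16 17 19]]
```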

We can also use index masks:
 - If the index mask is a NumPy array of data type *bool*, then an element is selected (*True*) or not (*False*) depending on the value of the index mask at the position of each element

In [100]:
# 1D array
integers

array([ 15,  20,  99,   8,  15,  13,  10,   7, -23, -46])

In [108]:
# the mask must have the same shape as the array being indexed; otherwise an IndexError is raised
# mask for indexing alternate elements in the array
row_mask = np.array([True, False, True, False, True, False, True, False, True, False])

integers[row_mask]

array([ 15,  99,  15,  10, -23])

In [109]:
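# alternatively, convert a 0/1 array to booleans
# (use the built-in `bool`; the `np.bool` alias was deprecated in NumPy 1.20 and removed in 1.24)
row_mask = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)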

integers[row_mask]

array([ 15,  99,  15,  10, -23])

This feature is very useful to conditionally select elements from an array, using for example comparison operators:

In [110]:
range_arr = np.arange(0, 10, 0.5)
range_arr

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])

In [113]:
# multiplying two boolean arrays acts as an elementwise logical AND
mask = (range_arr > 5) * (range_arr < 7.5)
mask

array([False, False, False, False, False, False, False, False, False,
       False, False,  True,  True,  True,  True, False, False, False,
       False, False])

In [114]:
range_arr[mask]

array([5.5, 6. , 6.5, 7. ])

In [119]:
# or equivalently

mask = (5 < range_arr) & (range_arr < 7.5)
range_arr[mask]

array([5.5, 6. , 6.5, 7. ])
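
Two details about boolean masks, shown in a short sketch: the comparisons must be wrapped in parentheses because `&` and `|` bind more tightly than `<` and `>` in Python, and `np.where(mask)` converts a mask into the integer indices of its `True` entries.

```python
import numpy as np

range_arr = np.arange(0, 10, 0.5)

# the parentheses matter: `&` and `|` bind more tightly than `<` and `>`
outside = (range_arr < 1) | (range_arr > 9)
print(range_arr[outside])   # [0.  0.5 9.5]

# `~` negates a mask
inside = range_arr[~outside]

# np.where(mask) returns a tuple of index arrays, one per dimension
idx = np.where((5 < range_arr) & (range_arr < 7.5))[0]
print(idx)                  # [11 12 13 14]
```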

## **view** vs **copy**

A *view*, as the name suggests, is simply another way of **viewing** the data of an array. Technically, that means that the data of both objects is _shared_. You can create *views* by selecting a slice of the original array, by changing the dtype, or by a combination of both. These kinds of views are described below:

- **Slice views**
  - This is probably the most common source of view creations in NumPy. The rule of thumb for creating a slice view is that the viewed elements can be addressed with offsets, strides, and counts in the original array. For example:

In [122]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [123]:
# create a slice view
s1 = a[1::3]
s1

array([1, 4, 7])

In the above code snippet, `s1` is a *view* of `a`. If we update elements of `a`, then the changes are reflected in `s1`.

In [127]:
a[7] = 77
s1

array([ 1,  4, 77])

- **Dtype views**
  - Another way to create array views is by assigning another dtype to the same data area. For example:

In [147]:
b = np.arange(10, dtype='int16')
b

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16)

In [148]:
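# reinterpret each pair of int16 values as one int32 (no data is copied)
b32 = b.view(np.int32)
# adding 1 to each int32 increments only its low-order int16 half
# (on a little-endian machine, that is every even-indexed element of b)
b32 += 1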

# check array b and see the changes reflected
b

array([1, 1, 3, 3, 5, 5, 7, 7, 9, 9], dtype=int16)

In [149]:
b8 = b.view(np.int8)
b8

array([1, 0, 1, 0, 3, 0, 3, 0, 5, 0, 5, 0, 7, 0, 7, 0, 9, 0, 9, 0],
      dtype=int8)

#### **Note**: `dtype` views are not as useful as slice views, but can come in handy in some cases (for example, for quickly looking at the bytes of a generic array).

 - Fancy indexing returns copies, not *views*.
 - Basic slicing returns *views*, not copies.
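
`np.shares_memory` gives a direct way to check which operations produce views. A minimal sketch contrasting basic slicing, fancy indexing, and an explicit `.copy()` on an `a = np.arange(10)` array like the one used above:

```python
import numpy as np

a = np.arange(10)

sliced = a[1::3]       # basic slicing -> view
fancy = a[[1, 4, 7]]   # fancy indexing -> copy (same values)

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, fancy))   # False

# writes to the copy do not propagate back to `a`
fancy[0] = 100
print(a[1])  # 1

# an explicit .copy() detaches a slice from its parent
detached = a[1::3].copy()
print(np.shares_memory(a, detached))  # False
```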

## **Super useful functions**

In [101]:
# toy data
arr = np.arange(5 * 7).reshape(5, 7)
arr

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34]])

In [102]:
# randomly shuffle the array along axis 0
# NOTE: this is an in-place operation
np.random.shuffle(arr)
arr

array([[28, 29, 30, 31, 32, 33, 34],
       [14, 15, 16, 17, 18, 19, 20],
       [ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [21, 22, 23, 24, 25, 26, 27]])
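
If the original order must be kept, `np.random.permutation` is the non-in-place counterpart: it shuffles along axis 0 and returns a new array. A sketch (variable names illustrative):

```python
import numpy as np

arr = np.arange(5 * 7).reshape(5, 7)

# np.random.permutation shuffles along axis 0 but returns a new array,
# leaving `arr` untouched (unlike the in-place np.random.shuffle)
shuffled = np.random.permutation(arr)

# the rows are the same, only their order may differ
print(sorted(map(tuple, shuffled)) == sorted(map(tuple, arr)))  # True
print(arr[0])  # [0 1 2 3 4 5 6] (unchanged)

# given an integer n, permutation returns a shuffled np.arange(n): handy for shuffling indices
order = np.random.permutation(arr.shape[0])
reordered = arr[order]
```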

In [150]:
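# argmax of an array
arr = np.arange(4, 2 * 11).reshape(2, 9)
# index of the maximum in the flattened array, then the column index of the maximum in each row (axis=1)
np.argmax(arr), np.argmax(arr, axis=1)

(17, array([8, 8]))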

#### **Storing NumPy arrays using native file format**

In [48]:
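import os

random_arr = np.random.randn(2, 3, 4)
# make sure the target directory exists; np.save does not create it
os.makedirs("persist", exist_ok=True)
np.save("persist/random-array.npy", random_arr)

# the exclamation mark tells IPython to run this line in the system shell, as though it were typed at a terminal
!file persist/random-array.npy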

persist/random-array.npy: data
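
A matching `np.load` reads the file back. This sketch writes to a temporary directory instead of `persist/` so that it cleans up after itself, and also shows `np.savez` for storing several arrays in one `.npz` file; the key names `gaussian` and `ints` are arbitrary choices for illustration.

```python
import os
import tempfile

import numpy as np

random_arr = np.random.randn(2, 3, 4)

with tempfile.TemporaryDirectory() as tmpdir:
    # a single array: .npy
    path = os.path.join(tmpdir, "random-array.npy")
    np.save(path, random_arr)
    loaded = np.load(path)

    # several arrays in one file: .npz, keyed by the keyword names
    npz_path = os.path.join(tmpdir, "arrays.npz")
    np.savez(npz_path, gaussian=random_arr, ints=np.arange(5))
    with np.load(npz_path) as archive:
        ints = archive["ints"]

# .npy preserves shape and dtype exactly
print(loaded.shape, loaded.dtype)          # (2, 3, 4) float64
print(np.array_equal(loaded, random_arr))  # True
```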
