<a href="https://colab.research.google.com/github/schoppfe/Deep-Learning-with-PyTorch-2.x/blob/main/01_Numpy_Refresher_Part_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 style="font-size:30px;">NumPy Refresher (Part-1)</h1>

- Why NumPy?
- Performance Comparison
- NumPy Basics
- Mathematical Functions


<img src='https://opencv.org/wp-content/uploads/2023/05/c3_w1_NumPy_logo.jpg' width="75%" align='left'><br/>

## 1 Why do we need a special library for math and DL?
Python provides data types such as lists and tuples out of the box. Why, then, do we use specialized libraries such as PyTorch or TensorFlow for deep learning tasks instead of the standard types?

The major reason is efficiency. Pure Python has no primitive numeric types like those in other languages (e.g., C/C++). Every value in Python, including a plain integer, is an object with many attributes and methods. You can see this with the `dir` function:

In [None]:
a = 3
print(len(dir(a)))
dir(a)[-10:]

72


['as_integer_ratio',
 'bit_count',
 'bit_length',
 'conjugate',
 'denominator',
 'from_bytes',
 'imag',
 'numerator',
 'real',
 'to_bytes']

### 1.1 Python Performance Issues

- Slow in tasks that require a lot of simple math operations on numbers
- Huge memory overhead due to storing plain numbers as objects
- Runtime overhead during memory dereferencing - cache issues

NumPy is short for "Numerical Python" and, as the name indicates, it provides a rich collection of operations on numerical data types with a Python interface. The core data structure of NumPy is the `ndarray`, a multidimensional array. Let's take a look at its interface in comparison with plain Python lists.
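To make the memory overhead mentioned above concrete, here is a minimal sketch that compares the approximate footprint of a Python list of integers with a NumPy array holding the same values. The variable names and the size `n = 1000` are chosen for illustration only, and `sys.getsizeof` gives CPython-specific numbers, so treat the exact list figure as indicative rather than exact.

```python
import sys
import numpy as np

n = 1000  # illustrative size, not from the original notebook
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores one pointer per element, and each int is a separate object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An ndarray stores the raw 8-byte integers in one contiguous buffer.
array_bytes = np_array.nbytes

print('Python list (approx.):', list_bytes, 'bytes')
print('NumPy array buffer:   ', array_bytes, 'bytes')
```

The contiguous buffer is also what makes the array cache-friendly: neighboring elements sit next to each other in memory instead of being reached through pointers.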

## 2 Performance Comparison: NumPy Arrays vs Python Lists

Let's imagine a simple task - we have several 2-dimensional points and we want to represent them as a list of points for further processing. For the sake of simplicity, we will not create a `Point` object and will instead use a list where each 'point' in the list is another list that contains the coordinates of each point (`x` and `y`):

In [None]:
# Create points list using explicit specification of coordinates of each point.
points = [[0, 1], [10, 5], [7, 3]]
points

[[0, 1], [10, 5], [7, 3]]

### <font color="CornFlowerBlue">Random Integers</font>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

``` python
random.randint(a, b)
```

Return a random integer *N* such that `a <= N <= b`.

Documentation: <a href="https://docs.python.org/3/library/random.html?highlight=randint#random.randint" target=_blank>random.randint</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
# Create random points.
from random import randint

num_points = 10
x_range = (0, 10)
y_range = (0, 50)
points = [[randint(*x_range), randint(*y_range)] for _ in range(num_points)]
points

[[9, 9],
 [5, 18],
 [6, 47],
 [7, 23],
 [6, 16],
 [7, 45],
 [3, 5],
 [2, 27],
 [4, 42],
 [8, 35]]

In the example below, we demonstrate how to convert a Python list to a NumPy `ndarray`.

In [None]:
import numpy as np
points = np.array(points)  # We are able to create numpy arrays from python lists.
points

array([[ 9,  9],
       [ 5, 18],
       [ 6, 47],
       [ 7, 23],
       [ 6, 16],
       [ 7, 45],
       [ 3,  5],
       [ 2, 27],
       [ 4, 42],
       [ 8, 35]])

### <font color="CornFlowerBlue">Numpy Random Integers</font>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />


``` python
np.random.randint(low, high=None, size=None, dtype=int)
```
Return random integers from low (inclusive) to high (exclusive).

Return random integers from the “discrete uniform” distribution of the specified dtype in the “half-open” interval `[low, high)`. If `high` is `None` (the default), then results are from `[0, low)`.

Documentation: <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html" target=_blank>np.random.randint</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

We can use the NumPy function `randint` to create an `ndarray` of points.

In [None]:
# Create random points using numpy library.
num_dims = 2
num_points = 10
x_range = (0, 11)
y_range = (0, 51)
points = np.random.randint(
    low=(x_range[0], y_range[0]), high=(x_range[1], y_range[1]), size=(num_points, num_dims)
)
points

array([[10, 44],
       [ 0, 30],
       [ 9, 44],
       [ 4, 25],
       [ 2, 31],
       [ 0, 34],
       [ 6, 38],
       [ 8, 34],
       [10, 12],
       [ 3, 10]])
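Note that the upper bounds in the cell above are `11` and `51` rather than `10` and `50`. This is because `random.randint` includes both endpoints, while `np.random.randint` excludes `high`. The following sketch (with an arbitrary range of `0` to `3` and an arbitrary sample count) illustrates the difference:

```python
from random import randint

import numpy as np

num_samples = 10_000  # illustrative sample count

# Standard library: both endpoints are included, so 3 can appear.
py_samples = [randint(0, 3) for _ in range(num_samples)]

# NumPy: the interval is half-open, [0, 3), so 3 never appears.
np_samples = np.random.randint(0, 3, size=num_samples)

print('Largest value from random.randint:   ', max(py_samples))
print('Largest value from np.random.randint:', np_samples.max())
```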

Using NumPy to create such a set of points may look like an over-complication, and the benefits of this approach are not yet visible. But let's take a look at the performance side.

In [None]:
num_dims = 2
num_points = 100000
x_range = (0, 10)
y_range = (0, 50)

### 2.1 Python Performance

In [None]:
%timeit \
points = [[randint(*x_range), randint(*y_range)] for _ in range(num_points)]

206 ms ± 6.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


### 2.2 NumPy Performance

In [None]:
%timeit \
points = np.random.randint(low=(x_range[0], y_range[0]),  \
                           high=(x_range[1], y_range[1]), \
                           size=(num_points, num_dims))

3.5 ms ± 8.85 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Wow, NumPy is **nearly 60 times faster** than pure Python on this task (206 ms vs. 3.5 ms)! One may say that the array we're generating is relatively large, but it is smaller by many orders of magnitude than the number of computations required to train a neural network, as we'll see later in this course.
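The same gap typically shows up when computing on the data, not only when generating it. As a rough sketch, assuming we want the squared distance of each point from the origin (a task chosen here for illustration, not taken from the course), the loop version and the vectorized version look like this:

```python
import numpy as np

# Illustrative data: 1000 random 2-D points, generated as in the cells above.
points = np.random.randint(low=(0, 0), high=(11, 51), size=(1000, 2))

# Pure Python: iterate over a list of [x, y] pairs.
points_list = points.tolist()
loop_result = [x * x + y * y for x, y in points_list]

# NumPy: square every element at once, then sum along each row.
vec_result = (points ** 2).sum(axis=1)

print(loop_result[:5])
print(vec_result[:5])
```

Both versions produce the same values; wrapping each one in `%timeit` in a notebook is a simple way to compare their speed on your own machine.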

## 3 NumPy Basics
In this section we will review some of the more useful operations of NumPy arrays, which are most commonly used in machine learning tasks.

### 3.1 Converting Lists to Arrays


In [None]:
py_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

np_array = np.array(py_list)
np_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [None]:
py_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

np_array= np.array(py_list)
np_array

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

### 3.2 Slicing and Indexing

In [None]:
print('First row:\t\t\t{}\n'.format(np_array[0]))
print('First row:\t\t\t{}\n'.format(np_array[0, :]))
print('First column:\t\t\t{}\n'.format(np_array[:, 0]))
print('3rd row 2nd column element:\t{}\n'.format(np_array[2, 1]))
print('2nd row onwards and 2nd column onwards :\n{}\n'.format(np_array[1:, 1:]))
print('Last 2 rows and last 2 columns:\n{}\n'.format(np_array[-2:, -2:]))
print('Array with 3rd, 1st, and 4th row:\n{}\n'.format(np_array[[2, 0, 3]]))
print('Array with 1st and 3rd col:\n{}\n'.format(np_array[:, [0, 2]]))

First row:			[1 2 3]

First row:			[1 2 3]

First column:			[ 1  4  7 10]

3rd row 2nd column element:	8

2nd row onwards and 2nd column onwards :
[[ 5  6]
 [ 8  9]
 [11 12]]

Last 2 rows and last 2 columns:
[[ 8  9]
 [11 12]]

Array with 3rd, 1st, and 4th row:
[[ 7  8  9]
 [ 1  2  3]
 [10 11 12]]

Array with 1st and 3rd col:
[[ 1  3]
 [ 4  6]
 [ 7  9]
 [10 12]]



### 3.3 Basic Attributes of NumPy Arrays

Get a full list of attributes of an ndarray object <a href="https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html" target=_blank>here</a>.

In [None]:
print('Data type:\t{}'.format(np_array.dtype))
print('Array shape:\t{}'.format(np_array.shape))

Data type:	int32
Array shape:	(4, 3)


### <font color="CornFlowerBlue">Create a convenience function for printing array information</font>

Let's create a convenience function (named `array_info`) to print a NumPy array's data, its shape, and its data type. We will use this function to print various arrays further below in this notebook.

In [None]:
def array_info(array):
    print('Array:\n{}'.format(array))
    print('Data type:\t{}'.format(array.dtype))
    print('Array shape:\t{}\n'.format(array.shape))

array_info(np_array)

Array:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Data type:	int32
Array shape:	(4, 3)



### 3.4 Creating NumPy Arrays Using Built-in Functions and Datatypes

The full list of supported data types can be found <a href="https://numpy.org/devdocs/user/basics.types.html" target=_blank>here</a>.


### <font color="CornFlowerBlue">Sequence Arrays: arange</font>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

``` python
np.arange([start, ]stop, [step, ]dtype=None)
```

Return evenly spaced values in `[start, stop)`.

Documentation: <a href="https://numpy.org/doc/stable/reference/generated/numpy.arange.html" target=_blank>np.arange</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
# Sequence array.
array = np.arange(10, dtype=np.int64)
array_info(array)

Array:
[0 1 2 3 4 5 6 7 8 9]
Data type:	int64
Array shape:	(10,)



In [None]:
# Sequence array.
array = np.arange(5, 10, 2, dtype=np.float32)
array_info(array)

Array:
[5. 7. 9.]
Data type:	float32
Array shape:	(3,)



### <font color="CornFlowerBlue">Sequence Arrays: linspace</font>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

``` python
np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
```

Returns num evenly spaced samples, calculated over the interval `[start, stop]`.

Note that `linspace` lets you specify the number of values and infers the step size, while `arange` lets you specify the step size and infers the number of points. `linspace` also lets you specify whether or not the endpoint is included.

Documentation: <a href="https://numpy.org/doc/stable/reference/generated/numpy.linspace.html" target=_blank>np.linspace</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
# Linspace.
linespace = np.linspace(0, 5, 7, dtype=np.float32)   # 7 elements between 0 and 5
array_info(linespace)

Array:
[0.        0.8333333 1.6666666 2.5       3.3333333 4.1666665 5.       ]
Data type:	float32
Array shape:	(7,)
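To illustrate the difference between `arange` and `linspace` described above, the following sketch builds the same sequence both ways. The range `0` to `1` and the step `0.25` are arbitrary choices for the example.

```python
import numpy as np

# arange: the step is given, the number of points is inferred.
a = np.arange(0, 1, 0.25)

# linspace: the number of points is given, the step is inferred.
# endpoint=False excludes the stop value, matching arange's half-open interval.
b = np.linspace(0, 1, 4, endpoint=False)

# With the default endpoint=True the stop value is included.
# retstep=True additionally returns the inferred step size.
c, step = np.linspace(0, 1, 5, retstep=True)

print(a)
print(b)
print(c, step)
```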



### <font color="CornFlowerBlue">Zeros Array</font>

In [None]:
# Zero array/matrix.
zeros = np.zeros((2, 3), dtype=np.float32)
array_info(zeros)

Array:
[[0. 0. 0.]
 [0. 0. 0.]]
Data type:	float32
Array shape:	(2, 3)



### <font color="CornFlowerBlue">Ones Array</font>

In [None]:
# Ones array/matrix.
ones = np.ones((3, 2), dtype=np.int8)
array_info(ones)

Array:
[[1 1]
 [1 1]
 [1 1]]
Data type:	int8
Array shape:	(3, 2)



### <font color="CornFlowerBlue">Constant Array</font>

In [None]:
# Constant array/matrix.
array = np.full((3, 3), 3.14)
array_info(array)

Array:
[[3.14 3.14 3.14]
 [3.14 3.14 3.14]
 [3.14 3.14 3.14]]
Data type:	float64
Array shape:	(3, 3)



### <font color="CornFlowerBlue">Identity Array</font>

In [None]:
# Identity array/matrix.
identity = np.eye(5, dtype=np.float32)  # Identity matrix of shape 5x5
array_info(identity)

Array:
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
Data type:	float32
Array shape:	(5, 5)



### <font color="CornFlowerBlue">Random Integers Array</font>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

``` python
np.random.randint(low, high=None, size=None, dtype=int)
```
Return random integers from the `discrete uniform` distribution in `[low, high)`. If `high` is `None`, the returned elements are in `[0, low)`.

Documentation: <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html" target=_blank>np.random.randint</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
# Random integers array/matrix.
rand_int = np.random.randint(5, 10, (2,3)) # Random integer array of shape 2x3, values lie in [5, 10).
array_info(rand_int)

Array:
[[6 7 5]
 [9 6 6]]
Data type:	int32
Array shape:	(2, 3)



### <font color="CornFlowerBlue">Random Array</font>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

``` python
np.random.random(size=None)
```

Return random floats in the half-open interval `[0.0, 1.0)`.

Results are from the `continuous uniform` distribution in `[0.0, 1.0)`.

Documentation: <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.random.html#numpy-random-random" target=_blank>np.random.random</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
# Random array/matrix.
random_array = np.random.random((5, 5))   # Random array of shape 5x5.
array_info(random_array)

Array:
[[0.05895785 0.97584925 0.61851749 0.91807735 0.20773511]
 [0.41814773 0.69043864 0.46579991 0.96900877 0.361576  ]
 [0.55125011 0.26091529 0.4094983  0.98888228 0.24258659]
 [0.22430972 0.80618191 0.49751464 0.86630376 0.63467053]
 [0.362403   0.49147104 0.30757364 0.00550166 0.70009507]]
Data type:	float64
Array shape:	(5, 5)



### <font color="CornFlowerBlue">Boolean Array</font>

If we compare the `random_array` above with a constant or with another array of the same shape, we get a boolean array.

In [None]:
# Boolean array/matrix.
bool_array = random_array > 0.5
array_info(bool_array)

Array:
[[False  True  True  True False]
 [False  True False  True False]
 [ True False False  True False]
 [False  True False  True  True]
 [False False False False  True]]
Data type:	bool
Array shape:	(5, 5)



A boolean array can be used to select values from another array. If we index a numerical array with a boolean array of the same shape, we get back a 1-D array containing only the values at positions where the boolean array is `True`; all other values are dropped.

Let's use the above `bool_array` to get values from `random_array`.

In [None]:
# Use boolean array/matrix to get values from array/matrix.
values = random_array[bool_array]
array_info(values)

Array:
[0.97584925 0.61851749 0.91807735 0.69043864 0.96900877 0.55125011
 0.98888228 0.80618191 0.86630376 0.63467053 0.70009507]
Data type:	float64
Array shape:	(11,)



In effect, the steps above keep only the values of `random_array` that are greater than `0.5`.
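Because random values change on every run, a small fixed array makes the behavior easier to verify by hand. The sketch below uses made-up values; besides selecting elements, it shows two other common uses of a mask (counting matches and assigning through the mask), which go slightly beyond the cells above.

```python
import numpy as np

data = np.array([[0.1, 0.7],
                 [0.9, 0.3]])  # illustrative values
mask = data > 0.5

# Selection: the result is always 1-D, in row-major order.
selected = data[mask]

# Counting: True counts as 1 when summed.
count = mask.sum()

# Assignment: set every element that fails the test to zero.
thresholded = data.copy()
thresholded[~mask] = 0.0

print(selected)
print(count)
print(thresholded)
```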

### 3.5 Data Type Conversion

Sometimes it is necessary to convert one data type to another data type.

In [None]:
age_in_years = np.random.randint(0, 100, 10)
array_info(age_in_years)

Array:
[24 52 26 15 73 98 16 17 68 17]
Data type:	int32
Array shape:	(10,)



Do we really need a 32-bit integer (or a 64-bit one, the default on many platforms) to store an age?

So let's convert it to `uint8`.

In [None]:
age_in_years = age_in_years.astype(np.uint8)
array_info(age_in_years)

Array:
[24 52 26 15 73 98 16 17 68 17]
Data type:	uint8
Array shape:	(10,)
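The payoff of the smaller type is memory, but the conversion is only safe when every value fits in the target range (`0` to `255` for `uint8`). A minimal sketch with made-up values, assuming an explicit `int64` starting type so the numbers are the same on every platform:

```python
import numpy as np

ages = np.array([24, 52, 26, 15, 73, 98], dtype=np.int64)
ages_small = ages.astype(np.uint8)

# 8 bytes per element versus 1 byte per element.
print(ages.nbytes, ages_small.nbytes)

# Caution: astype does not check the range. Values outside 0..255
# wrap around silently (300 % 256 == 44).
too_big = np.array([300], dtype=np.int64)
wrapped = too_big.astype(np.uint8)
print(wrapped)
```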



Let's convert it to `float64`. 😜

In [None]:
age_in_years = age_in_years.astype(np.float64)
array_info(age_in_years)

Array:
[24. 52. 26. 15. 73. 98. 16. 17. 68. 17.]
Data type:	float64
Array shape:	(10,)



## 4 Mathematical Functions

NumPy supports many mathematical operations on arrays and matrices. Here we will look at a few that are useful in deep learning. All supported functions can be found <a href="https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html" target=_blank>here</a>.

### 4.1 Exponential Function

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

The exponential function (`exp`) appears in several neural-network activation functions. For example, it is the core of the softmax function, which is widely used in classification tasks.


``` python
np.exp(x)
```

Calculate the exponential of all elements in the input array.

Return element-wise `exponential` of `array`.

Documentation: <a href="https://numpy.org/doc/stable/reference/generated/numpy.exp.html" target=_blank>np.exp</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
array = np.array([np.full(3, -1), np.zeros(3), np.ones(3)])
array_info(array)

# Exponential of an array/matrix
print('Exponential of an array:')
exp_array = np.exp(array)
array_info(exp_array)

Array:
[[-1. -1. -1.]
 [ 0.  0.  0.]
 [ 1.  1.  1.]]
Data type:	float64
Array shape:	(3, 3)

Exponential of an array:
Array:
[[0.36787944 0.36787944 0.36787944]
 [1.         1.         1.        ]
 [2.71828183 2.71828183 2.71828183]]
Data type:	float64
Array shape:	(3, 3)
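Since softmax was mentioned as a motivation for `exp`, here is a minimal sketch of it for a 1-D array of scores. The function name `softmax` and the input values are chosen for illustration, and subtracting the maximum before exponentiating is a common numerical-stability trick assumed here, not something the notebook prescribes.

```python
import numpy as np

def softmax(logits):
    """Map a 1-D array of scores to probabilities that sum to 1."""
    # Shifting by the max does not change the result but avoids overflow in exp.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

probs = softmax(np.array([1.0, 2.0, 3.0]))
print(probs)
print(probs.sum())
```

The largest score receives the largest probability, and the outputs always sum to 1.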



### 4.2 Square Root

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

Root Mean Square Error (RMSE) is a common measure of how far predicted continuous values are from the true values. We will use the `sqrt` function to compute such quantities later in the course.


``` python
np.sqrt(x)
```

Return the non-negative square root of an array, element-wise.

Documentation: <a href="https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html" target=_blank>np.sqrt</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
array = np.arange(10)
array_info(array)

print('Square root:')
root_array = np.sqrt(array)
array_info(root_array)

Array:
[0 1 2 3 4 5 6 7 8 9]
Data type:	int32
Array shape:	(10,)

Square root:
Array:
[0.         1.         1.41421356 1.73205081 2.         2.23606798
 2.44948974 2.64575131 2.82842712 3.        ]
Data type:	float64
Array shape:	(10,)
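As a preview of the RMSE use case mentioned above, the sketch below computes it for two small arrays. The names `y_true` and `y_pred` and their values are made up for the example.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])  # illustrative targets
y_pred = np.array([2.0, 5.0, 4.0, 8.0])  # illustrative predictions

# Square the errors, average them, then take the square root.
squared_errors = (y_true - y_pred) ** 2   # [1, 0, 4, 1]
rmse = np.sqrt(np.mean(squared_errors))   # sqrt(1.5)

print(rmse)
```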



### 4.3 Logarithm

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

'Cross-Entropy' and 'Log Loss' are the most commonly used loss functions in Machine Learning classification problems. We will use the `log` function to compute such quantities.

``` python
np.log(x)
```

Natural logarithm, element-wise.

The natural logarithm `log` is the inverse of the exponential function, so that `log(exp(x)) = x`. The natural logarithm is the logarithm in base `e`.

Documentation: <a href="https://numpy.org/doc/stable/reference/generated/numpy.log.html" target=_blank>np.log</a>



<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
array = np.array([0, np.exp(1), np.exp(1)**2, 1, 10])
array_info(array)

print('Logarithm:')
log_array = np.log(array)
array_info(log_array)

Array:
[ 0.          2.71828183  7.3890561   1.         10.        ]
Data type:	float64
Array shape:	(5,)

Logarithm:
Array:
[      -inf 1.         2.         0.         2.30258509]
Data type:	float64
Array shape:	(5,)



<font color='red'>**Note:** NumPy emits a `RuntimeWarning` for this cell because the array contains `0`, and `log(0)` evaluates to `-inf`.</font>
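To connect `log` to the loss functions mentioned above, here is a minimal sketch of cross-entropy for a single example with a one-hot target. The probabilities and the names `probs` and `target` are made up for illustration; real training code usually works on batches and guards against `log(0)`.

```python
import numpy as np

probs = np.array([0.7, 0.2, 0.1])   # illustrative predicted class probabilities
target = np.array([1.0, 0.0, 0.0])  # one-hot label: the true class is index 0

# Only the true class contributes, so the loss reduces to -log(0.7).
cross_entropy = -np.sum(target * np.log(probs))

print(cross_entropy)
```

The loss shrinks toward `0` as the probability assigned to the true class approaches `1`.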

### 4.4 Power

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

``` python
np.power(x1, x2)
```
Returns first array elements raised to powers from second array, element-wise.

Raise each base in `x1` to the positionally-corresponding power in `x2`. `x1` and `x2` must be broadcastable to the same shape. Note that an integer type raised to a negative integer power will raise a ValueError.

What is **broadcasting**? We will see later.

Documentation: <a href="https://numpy.org/doc/stable/reference/generated/numpy.power.html" target=_blank>np.power</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
array = np.arange(0, 6, dtype=np.int64)
array_info(array)

print('Power 3:')
pow_array = np.power(array, 3)
array_info(pow_array)

Array:
[0 1 2 3 4 5]
Data type:	int64
Array shape:	(6,)

Power 3:
Array:
[  0   1   8  27  64 125]
Data type:	int64
Array shape:	(6,)
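The description above also mentions element-wise exponents and the `ValueError` for negative integer powers. A short sketch with made-up values shows both, along with the usual workaround of converting to a floating-point type first:

```python
import numpy as np

bases = np.array([2, 3, 4])
exponents = np.array([3, 2, 1])

# Each base is raised to the exponent at the same position.
elementwise = np.power(bases, exponents)   # 2**3, 3**2, 4**1

# Integer arrays cannot be raised to a negative integer power.
try:
    np.power(bases, -1)
    raised = False
except ValueError:
    raised = True

# Converting to float first makes the operation valid.
reciprocals = np.power(bases.astype(np.float64), -1)

print(elementwise)
print(raised)
print(reciprocals)
```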



### 4.5 Clip Values

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

``` python
np.clip(a, a_min, a_max)
```

Clip (limit) the values in an array. Return element-wise clipped values between `a_min` and `a_max`.

Documentation:  <a href="https://numpy.org/doc/stable/reference/generated/numpy.clip.html" target=_blank>np.clip</a>

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

In [None]:
array = np.random.random((3, 3))
array_info(array)

# Clip to the range [0.2, 0.5].
print('Clipped between 0.2 and 0.5')
clipped_array = np.clip(array, 0.2, 0.5)
array_info(clipped_array)

# Clip from below only: values under 0.2 become 0.2; np.inf leaves the upper end unbounded.
print('Clipped to 0.2')
clipped_array = np.clip(array, 0.2, np.inf)
array_info(clipped_array)

Array:
[[0.33110926 0.88180262 0.43539401]
 [0.61913961 0.32578066 0.14985472]
 [0.15753426 0.399099   0.46209346]]
Data type:	float64
Array shape:	(3, 3)

Clipped between 0.2 and 0.5
Array:
[[0.33110926 0.5        0.43539401]
 [0.5        0.32578066 0.2       ]
 [0.2        0.399099   0.46209346]]
Data type:	float64
Array shape:	(3, 3)

Clipped to 0.2
Array:
[[0.33110926 0.88180262 0.43539401]
 [0.61913961 0.32578066 0.2       ]
 [0.2        0.399099   0.46209346]]
Data type:	float64
Array shape:	(3, 3)
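One place where `clip` is often paired with `log` is in keeping probabilities away from exactly `0` or `1` before taking a logarithm. The sketch below assumes a small constant `eps = 1e-7`, a hypothetical choice for illustration rather than a value from the course.

```python
import numpy as np

eps = 1e-7  # hypothetical small constant
probs = np.array([0.0, 0.5, 1.0])

# Without clipping, np.log(probs) would contain -inf for the first element.
safe_probs = np.clip(probs, eps, 1.0 - eps)
log_probs = np.log(safe_probs)

print(safe_probs)
print(log_probs)
```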



## 5 Additional References

* https://numpy.org/doc/stable/
* https://numpy.org/devdocs/user/quickstart.html
* https://numpy.org/numpy-tutorials/index.html

