<img style="float: right; width: 150px;" src="firrm.jpg">

## <span style="color:#4375c7">Quantitative Finance</span>
***
*Course materials are for educational purposes only. Nothing contained herein should be considered investment advice or an opinion regarding the suitability of any security. For more information about this course, please contact us.*
***

## Introduction to the Scientific Toolbox of Python: NumPy

**NumPy** (Numerical Python) is the fundamental package for scientific computing with Python. Element-wise computation with plain Python lists is tedious and inefficient because it requires explicit loops, so NumPy provides a more elegant alternative: the NumPy array. Unlike with Python lists, calculations can be performed over entire arrays at once and are applied **element-wise**. The core of the library is a powerful n-dimensional array object called `ndarray`, together with many routines to manipulate it. NumPy also offers derived objects such as masked arrays and matrices. We will look at the creation of arrays and matrices in more detail below. More information about the library can be found in the **[NumPy documentation](https://www.numpy.org/doc/)**.

Note that NumPy is used by other libraries like SciPy, matplotlib, scikit-learn (a machine learning tool) and Pandas to store multi-dimensional data.

### Session contents:
1. **[Ndarrays, Indexing and Slicing](#ndarrays)**
2. **[Ndarrays Properties](#properties)**
3. **[NumPy Constants](#constants)** 
4. **[More on the Creation of Arrays and Matrices](#arrays)** 
5. **[Statistical Functions](#stats)** 
***

## 1. Ndarrays, Indexing and Slicing  <a name="ndarrays"></a> 

Let's import the NumPy library and create a simple one-dimensional ndarray.

In [2]:
!pip install -r https://raw.githubusercontent.com/firrm/DAI/main/requirements.txt #ensure that the required packages are installed
# Import the library

import numpy as np

# Creating a simple one-dimensional array
# we define the elements of the array to be 16-bit integers
fcf = np.array([50, 100, 80], dtype=np.int16)
print(fcf)
print(type(fcf))


[ 50 100  80]
<class 'numpy.ndarray'>


Before we dive deeper into ndarrays, let us emphasize the differences between Python lists and NumPy arrays.
Note that NumPy arrays can only contain one data type!

In [4]:
python_list = [0.05, 0.1, -0.08]

# Summing up
python_list + python_list

[0.05, 0.1, -0.08, 0.05, 0.1, -0.08]

In [6]:
# Performing the same summation using the NumPy array created above
fcf + fcf

array([100, 200, 160], dtype=int16)

As we can see, the `+` operator applied to Python lists simply concatenates the two lists. In contrast, with NumPy arrays, the sum is **computed element-wise**.
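To make the contrast concrete, here is a minimal sketch of what an element-wise sum looks like with plain lists compared with an ndarray. It also illustrates the single-data-type rule mentioned earlier. The variable names are chosen for illustration only.

```python
import numpy as np

returns_list = [0.05, 0.1, -0.08]

# With plain lists, an element-wise sum needs an explicit loop or comprehension
list_sum = [a + b for a, b in zip(returns_list, returns_list)]
print(list_sum)

# With an ndarray, the + operator is applied element by element
returns_arr = np.array(returns_list)
array_sum = returns_arr + returns_arr
print(array_sum)

# An ndarray holds a single data type: mixing ints and floats upcasts to float64
mixed = np.array([1, 2, 0.5])
print(mixed.dtype)
```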

Extending the example to a **2-dimensional array**:

In [7]:
# Creating a 2-dimensional array
fcf = np.array([[50, 100, 80], [30, 45, 20]], np.int16)
print(fcf)

[[ 50 100  80]
 [ 30  45  20]]


This method of creating arrays can be extended to n dimensions.

Now, let us print the first element of the first row, the second element of the first row, and the third element of the second row:

In [8]:
print(fcf[0, 0]); print(fcf[0, 1]); print(fcf[1, 2])

50
100
20


We have learned the basics of ndarrays and indexing. Now we come to **slicing** this data structure. Imagine slicing a piece of paper (a 2-dimensional structure): you can cut it either horizontally or vertically, that is, along one of its two axes. That is what we want to do with our example above.

In [9]:
# Slicing the first column of fcf
print(fcf[:, 0])

# Slicing the second row of fcf
print(fcf[1, :])

[50 30]
[30 45 20]
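Slices can also select ranges rather than a whole row or column. The following sketch, which rebuilds the 2-dimensional array under the hypothetical name `fcf_2d`, shows `start:stop` ranges, negative indices, and the fact that a basic slice is a view on the original data rather than a copy.

```python
import numpy as np

fcf_2d = np.array([[50, 100, 80], [30, 45, 20]], np.int16)

# start:stop selects a range of columns; stop is exclusive
last_two_cols = fcf_2d[:, 1:3]
print(last_two_cols)

# Negative indices count from the end
last_col = fcf_2d[:, -1]
print(last_col)

# A basic slice is a view: writing to it changes the original array
view = fcf_2d[0, :]
view[0] = 55
print(fcf_2d[0, 0])

# Use .copy() when an independent array is needed
independent = fcf_2d[1, :].copy()
independent[0] = 0
print(fcf_2d[1, 0])
```

If the original data must stay untouched, taking a `.copy()` of the slice is the safer choice.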


Now, we move to the **3-dimensional array**:

In [10]:
# Creating a 3-dimensional array
fcf = np.array([[[50, 100, 80], [30, 45, 20]], 
                [[10, 30, 20], [15, 25, 35]]], np.int16)
print(fcf)

[[[ 50 100  80]
  [ 30  45  20]]

 [[ 10  30  20]
  [ 15  25  35]]]


In [11]:
# Printing the third column of the first row of the first 2-dimensional array
print(fcf[0, 0, 2]) # The syntax is: x[array, row, column]

# Printing the second column of the second row of the second 2-dimensional array
print(fcf[1, 1, 1])

# Slicing 3-dimensional arrays:
# Slicing the second column for all rows of each 2-dimensional array
print(fcf[:, :, 1])

# Slicing the third column and second row of each 2-dimensional array
print(fcf[:, 1, 2])

80
25
[[100  45]
 [ 30  25]]
[20 35]


How can we confirm that this is actually a 3-dimensional array?

## 2. Ndarrays Properties  <a name="properties"></a> 

To answer the question of dimensionality, let us look at the same example once again.

In [12]:
# Creating a 3-dimensional array 
fcf = np.array([[[50, 100, 80], [30, 45, 20]], 
                [[10, 30, 20], [15, 25, 35]]], np.int16)
print(fcf)

[[[ 50 100  80]
  [ 30  45  20]]

 [[ 10  30  20]
  [ 15  25  35]]]


We can confirm the dimensionality by checking the properties of the array. Every ndarray has the following properties:

In [10]:
# Checking for the shape of the array
print(fcf.shape) # shape gives us more information about what the data structure looks like

# Checking for the number of dimensions of the array
print(fcf.ndim)

# Checking for the data type of all the individual elements
print(fcf.dtype)

# Checking for the size (number of elements in the array)
print(fcf.size)

# Calculate the size in terms of bytes
print(fcf.nbytes)

# Transposing the array (for n-dimensional arrays, .T reverses the order of the axes)
print(fcf.T)


(2, 2, 3)
3
int16
12
24
[[[ 50  10]
  [ 30  15]]

 [[100  30]
  [ 45  25]]

 [[ 80  20]
  [ 20  35]]]
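The properties above are related to each other. As a small sketch (the array is rebuilt under the hypothetical name `fcf_3d`), we can check that `size` is the product of `shape`, that `nbytes` is `size` times the bytes per element, and that `.T` reverses the axes.

```python
import numpy as np

fcf_3d = np.array([[[50, 100, 80], [30, 45, 20]],
                   [[10, 30, 20], [15, 25, 35]]], np.int16)

# size is the product of the entries of shape: 2 * 2 * 3
n_elements = int(np.prod(fcf_3d.shape))
print(n_elements == fcf_3d.size)

# nbytes is size times the bytes per element (2 bytes for int16)
print(fcf_3d.itemsize)
print(fcf_3d.size * fcf_3d.itemsize == fcf_3d.nbytes)

# For n-dimensional arrays, .T reverses the order of the axes
print(fcf_3d.T.shape)

# Element [i, j, k] of the transpose equals element [k, j, i] of the original
print(fcf_3d.T[2, 1, 0] == fcf_3d[0, 1, 2])
```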


## 3. NumPy Constants  <a name="constants"></a> 

In this section, we will take a brief look at important constants included in the NumPy library.

In [11]:
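print(np.inf) # inf: infinity
print(np.nan) # nan: not a number
print(-np.inf) # negative infinity (the alias np.NINF was removed in NumPy 2.0)
print(-0.0) # negative zero (the alias np.NZERO was removed in NumPy 2.0)
print(0.0) # positive zero (the alias np.PZERO was removed in NumPy 2.0)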

print(np.pi) # constant pi

inf
nan
-inf
-0.0
0.0
3.141592653589793
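In practice, `nan` often stands for a missing observation, and it behaves differently from ordinary numbers. The following sketch uses a hypothetical return series with one gap to show why `np.isnan` should be used instead of `==`, and how the nan-aware function `np.nanmean` differs from `np.mean`.

```python
import numpy as np

returns_with_gap = np.array([0.05, np.nan, 0.1])

# nan is not equal to anything, including itself
print(np.nan == np.nan)

# Use np.isnan to locate missing values instead of ==
missing = np.isnan(returns_with_gap)
print(missing)

# Ordinary aggregations propagate nan; the nan-aware variants skip it
plain_mean = np.mean(returns_with_gap)
clean_mean = np.nanmean(returns_with_gap)
print(plain_mean, clean_mean)

# inf compares as larger than any finite float
print(np.inf > 1e308, np.isinf(-np.inf))
```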


## 4. More on the Creation of Arrays and Matrices  <a name="arrays"></a> 

We already know how to create simple arrays. NumPy also offers functions for creating arrays with a special structure. For econometric calculations, for example, we often need the identity matrix, which can be created with the `eye` function.

In [17]:
# Let's start by creating an empty array: np.empty allocates memory without initializing it,
# so the values shown are arbitrary leftovers, not random draws
r_matrix = np.empty([3, 3])
print(r_matrix)

[[3.45126646e-31 6.90253292e-31 1.72563323e-31]
 [6.90253292e-31 1.50130091e-30 4.65920972e-31]
 [1.72563323e-31 4.65920972e-31 2.67473151e-31]]


In [19]:
# Creating a 3x3 identity matrix (uint8 stores integers from 0 to 255)
id_matrix = np.eye(3, dtype=np.uint8)
print(id_matrix)

[[1 0 0]
 [0 1 0]
 [0 0 1]]


The diagonal of ones can be shifted with the parameter `k`: a positive `k` moves it above the main diagonal, and a negative `k` moves it below (the default `k=0` is the main diagonal itself). For the above example:

In [20]:
# Shifting the position of the main diagonal upwards
id_matrix = np.eye(3, dtype = np.uint8, k=1)
print(id_matrix)

[[0 1 0]
 [0 0 1]
 [0 0 0]]
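For completeness, here is a short sketch of the opposite shift and of a non-square result. `np.eye` takes an optional second argument for the number of columns. The variable names are illustrative.

```python
import numpy as np

# k < 0 moves the ones below the main diagonal
lower_shift = np.eye(3, dtype=np.uint8, k=-1)
print(lower_shift)

# np.eye also accepts a column count, giving a rectangular matrix
rect = np.eye(2, 3, dtype=np.uint8)
print(rect)
```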


Another way of creating an identity matrix is the function `np.identity`, which always returns a square matrix with ones on the main diagonal.

In [21]:
id_matrix = np.identity(5, dtype = np.uint8)
print(id_matrix)

[[1 0 0 0 0]
 [0 1 0 0 0]
 [0 0 1 0 0]
 [0 0 0 1 0]
 [0 0 0 0 1]]


Furthermore, we can create arrays filled with ones, zeros, or any other constant value.

In [22]:
ones = np.ones((2, 3, 3), dtype = np.int16)
print(ones)

[[[1 1 1]
  [1 1 1]
  [1 1 1]]

 [[1 1 1]
  [1 1 1]
  [1 1 1]]]


In [23]:
zeros = np.zeros((2, 3, 3), dtype = np.int16)
print(zeros)

[[[0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]]]


In [24]:
fill_matrix = np.full((3, 3, 3),  dtype = np.int16, fill_value = 3)
print(fill_matrix)

[[[3 3 3]
  [3 3 3]
  [3 3 3]]

 [[3 3 3]
  [3 3 3]
  [3 3 3]]

 [[3 3 3]
  [3 3 3]
  [3 3 3]]]


Suppose we want to create upper or lower triangular matrices. We can do this with `np.triu` and `np.tril`:

In [19]:
ones = np.ones((6,6), dtype = np.uint8)
upper = np.triu(ones, k=0) # elements below the k-th diagonal are set to zero
lower = np.tril(ones, k=0) # elements above the k-th diagonal are set to zero
print(upper)
print(lower)

[[1 1 1 1 1 1]
 [0 1 1 1 1 1]
 [0 0 1 1 1 1]
 [0 0 0 1 1 1]
 [0 0 0 0 1 1]
 [0 0 0 0 0 1]]
[[1 0 0 0 0 0]
 [1 1 0 0 0 0]
 [1 1 1 0 0 0]
 [1 1 1 1 0 0]
 [1 1 1 1 1 0]
 [1 1 1 1 1 1]]
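The role of `k` becomes clearer with a matrix whose entries differ. The sketch below uses a hypothetical 3x3 matrix `m` to extract the strictly upper and strictly lower triangles, and then checks that these two parts plus the main diagonal rebuild the original matrix.

```python
import numpy as np

# Hypothetical 3x3 matrix, used only for illustration
m = np.arange(1, 10).reshape(3, 3)

# k=1 starts one diagonal above the main diagonal, so the main diagonal is zeroed
strict_upper = np.triu(m, k=1)
print(strict_upper)

# k=-1 starts one diagonal below the main diagonal
strict_lower = np.tril(m, k=-1)
print(strict_lower)

# Main diagonal plus the two strict triangles rebuild the original matrix
rebuilt = strict_upper + strict_lower + np.diag(np.diag(m))
print(np.array_equal(rebuilt, m))
```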


Just as in the Pandas library, we can perform random sampling in NumPy as follows:

In [20]:
random_returns = np.random.randint(low=0, high=10, size=10) # integers from 0 to 9 (high is exclusive)
print(random_returns)

# Creating a 2-dimensional array of uniform random numbers in [0, 1)
random_returns_2d = np.random.rand(3,3)
print(random_returns_2d)

[0 9 0 6 9 0 2 4 3 0]
[[0.18302332 0.70037983 0.60880259]
 [0.99738317 0.6655288  0.07639514]
 [0.30654183 0.76082136 0.291935  ]]


Because no seed is set, the output differs every time we run the code.
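When results need to be reproducible, a seed can be fixed. The following is a minimal sketch that assumes the `Generator` interface obtained from `np.random.default_rng` rather than the legacy functions used above. The seed value 42 is an arbitrary choice.

```python
import numpy as np

# The seed 42 is an arbitrary choice for illustration
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Two generators with the same seed produce the same draws
draws_a = rng_a.integers(low=0, high=10, size=10)  # high is exclusive
draws_b = rng_b.integers(low=0, high=10, size=10)
print(np.array_equal(draws_a, draws_b))

# Uniform draws on [0, 1) in a 3x3 array, the Generator counterpart of np.random.rand(3, 3)
uniform_2d = rng_a.random((3, 3))
print(uniform_2d.shape)
```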

## 5. Statistical Functions <a name="stats"></a> 

In this section, we will briefly introduce some of NumPy's useful statistical functions. Take the following return series as a starting point:

In [25]:
r1 = np.array([0.05, 0.01, 0.1, 0.12, 0.06, 0.04, 0.05])

We can apply the following statistical functions (note that this list is not exhaustive):
- calculate the median `np.median()`
- calculate the (optionally weighted) average `np.average()`
- calculate the arithmetic mean `np.mean()`
- calculate the standard deviation `np.std()` (population formula by default)
- calculate the variance `np.var()` (population formula by default)

Applying the above functions to our return series yields:

In [26]:
print(np.median(r1))
print(np.average(r1))
print(np.mean(r1))
print(np.std(r1))
print(np.var(r1))

0.05
0.06142857142857143
0.06142857142857143
0.03440455593940656
0.0011836734693877553
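Two details are worth a short sketch. First, `np.std` and `np.var` divide by n by default, and the sample versions that divide by n - 1 are obtained with `ddof=1`. Second, `np.average` coincides with `np.mean` only when no weights are passed. The weights below are hypothetical and serve only as an illustration.

```python
import numpy as np

r1 = np.array([0.05, 0.01, 0.1, 0.12, 0.06, 0.04, 0.05])

# Default ddof=0 divides by n (population formula)
pop_std = np.std(r1)

# ddof=1 divides by n - 1 (sample formula)
sample_std = np.std(r1, ddof=1)
print(pop_std, sample_std)

# np.average accepts weights; these weights are hypothetical
weights = np.array([1, 1, 1, 1, 1, 1, 2])
weighted_avg = np.average(r1, weights=weights)
print(weighted_avg)
```

With financial return samples, the `ddof=1` version is usually the one that matches the sample standard deviation reported by other tools.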


In statistics and econometrics, it is often necessary to draw a histogram of the underlying distribution. We can compute the data for such a histogram with `np.histogram`:

In [27]:
print(np.histogram(r1))

(array([1, 0, 1, 2, 1, 0, 0, 0, 1, 1]), array([0.01 , 0.021, 0.032, 0.043, 0.054, 0.065, 0.076, 0.087, 0.098,
       0.109, 0.12 ]))


The first array of the output contains the counts (the number of observations in each bin), and the second array contains the bin edges. With the default of 10 bins, there are 11 edges. In the next session, we will look at how to use this information to create plots. 
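The number of bins can be controlled with the `bins` parameter. The sketch below uses 4 bins, an arbitrary choice for illustration, and checks the relationship between counts and edges.

```python
import numpy as np

r1 = np.array([0.05, 0.01, 0.1, 0.12, 0.06, 0.04, 0.05])

# bins=4 is an arbitrary choice; the default is 10 equal-width bins
counts, edges = np.histogram(r1, bins=4)
print(counts)
print(edges)

# There is always one more edge than there are bins
print(len(edges) == len(counts) + 1)

# Every observation falls into exactly one bin
print(counts.sum() == r1.size)
```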

To practice the basics of Python, please take the DataCamp course "Introduction to Python".
For a quick overview of the NumPy package, check: https://datacamp-community-prod.s3.amazonaws.com/e9f83f72-a81b-42c7-af44-4e35b48b20b7