<b> 
    <font size="7">
        Computational Finance and FinTech <br><br>
        M.Sc. International Finance
    </font>
</b>
<br><br>
<img src="pics/HWR.png" width=400px>
<br><br>
<b>
    <font size="5"> 
        Prof. Dr. Natalie Packham <br>
        Berlin School of Economics and Law <br>
        Summer Term 2025
    </font>
</b>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Numerical-and-Computational-Foundations" data-toc-modified-id="Numerical-and-Computational-Foundations-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Numerical and Computational Foundations</a></span><ul class="toc-item"><li><span><a href="#Arrays-with-Python-lists" data-toc-modified-id="Arrays-with-Python-lists-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Arrays with Python lists</a></span></li><li><span><a href="#NumPy-arrays" data-toc-modified-id="NumPy-arrays-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>NumPy arrays</a></span></li><li><span><a href="#Structured-NumPy-arrays" data-toc-modified-id="Structured-NumPy-arrays-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Structured NumPy arrays</a></span></li><li><span><a href="#Data-Analysis-with-pandas:-DataFrame" data-toc-modified-id="Data-Analysis-with-pandas:-NumFrame-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Data Analysis with pandas: DataFrame</a></span></li></ul></li></ul></div>

# Numerical and Computational Foundations

* Further reading: __Py4Fi, Chapters 4 and 5__

## Arrays with Python lists

### Introduction to Python arrays

* Before introducing more sophisticated objects for data storage, let's take a look at the built-in Python `list` object. 
* A `list` object is a one-dimensional array:

In [None]:
v = [0.5, 0.75, 1.0, 1.5, 2.0] 

* `list` objects can contain arbitrary objects. 
* In particular, a `list` can contain other `list` objects, creating two- or higher-dimensional arrays:

In [None]:
m = [v, v, v]  
m

### `list` objects

In [None]:
m[1]

In [None]:
m[1][0]

* Feel free to push this to higher dimensions...

In [None]:
v1 = [0.5, 1.5]
v2 = [1, 2]
m = [v1, v2]
c = [m, m]  
c

In [None]:
c[1][1][0]

### Reference pointers

* Important: `list` objects work with __reference pointers__. 
* Internally, when creating new objects out of existing objects, only pointers to the objects are copied, not the data!

In [None]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]
m = [v, v, v]
m

In [None]:
v[0] = 'Python'
m
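
A minimal sketch of how this side effect can be avoided, assuming independent copies of the rows are wanted. It uses `copy.deepcopy` from the standard library; the names `m_ref` and `m_copy` are illustrative only.

```python
from copy import deepcopy

v = [0.5, 0.75, 1.0, 1.5, 2.0]
m_ref = [v, v, v]                          # three references to the same list
m_copy = [deepcopy(v) for _ in range(3)]   # three independent copies of the data

v[0] = 'Python'                            # modify the original list

print(m_ref[0][0])    # the change is visible through the references
print(m_copy[0][0])   # the copies still hold the original value
```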

### Python array class

* Python also has an `array` module
* See [Documentation](https://docs.python.org/3/library/array.html)

## NumPy arrays


### NumPy arrays

* `NumPy` is a library for richer array data structures.
* The basic object is `ndarray`, which comes in two flavours:

![ndarray](pics/ndarray.png)

<div align="right" style="font-size:14px">Source: Python for Finance, 2nd ed.</div>

* The `ndarray` object is more specialised than the `list` object, but comes with more functionality. 
* An array object represents a multidimensional, homogeneous array of fixed-size items. 
* Here is a useful [tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html)

### Regular NumPy arrays
* Creating an array:

In [None]:
import numpy as np # import numpy
a = np.array([0, 0.5, 1, 1.5, 2]) # np.array(...) creates an ndarray object

In [None]:
type(a)

* An `ndarray` holds elements of a single type and converts the input accordingly (below, the integer is converted to a string): 

In [None]:
b = np.array([0, 'test'])
b

In [None]:
type(b[0])

### Constructing arrays by specifying a range
* `np.arange()` creates an array spanning a range of numbers (= a sequence).
* Basic syntax: `np.arange(start, stop, step)`
* It is possible to specify the data type (e.g. `float`)
* To invoke an explanation of `np.arange` (or any other object or method), type `np.arange?`

In [None]:
np.arange?

In [None]:
np.arange(0, 2.5, 0.5)

<div class="alert alert-block alert-warning">
    NOTE: The interval specification refers to a half-open interval: [start, stop).
</div>
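
A short sketch of the half-open interval in practice. As a possible alternative when the end point should be included, it also shows `np.linspace()`, which takes the number of points rather than a step size. The variable names are illustrative.

```python
import numpy as np

x = np.arange(0, 2.5, 0.5)    # stop value 2.5 is excluded
y = np.linspace(0, 2.5, 6)    # 6 equally spaced points, stop value 2.5 is included

print(x)
print(y)
```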

### `ndarray` methods
* The `ndarray` object has a multitude of useful built-in methods, e.g.
    * `sum()` (the sum), 
    * `std()` (the standard deviation), 
    * `cumsum()` (the cumulative sum). 
* Type `a.` and hit `TAB` to obtain a list of the available functions.
* More documentation is found [here](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.html#numpy.ndarray). 

In [None]:
a.sum()

In [None]:
a.std()

In [None]:
a.cumsum()

### Slicing 1d-Arrays

* With one-dimensional `ndarray` objects, indexing works as usual.

In [None]:
a

In [None]:
a[1]

In [None]:
a[:2]

In [None]:
a[2:]

### Mathematical operations

* Mathematical operations are applied in a __vectorised__ way on an `ndarray` object. 
* Note that these operations work differently on `list` objects.

In [None]:
l = [0, 0.5, 1, 1.5, 2]
l

In [None]:
2 * l

* `ndarray`:

In [None]:
a = np.arange(0, 7, 1)
a

In [None]:
2 * a

### Mathematical operations (cont'd)

In [None]:
a + a 

In [None]:
a ** 2 

In [None]:
2 ** a

In [None]:
a ** a
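
To make the contrast between `list` and `ndarray` explicit, here is a small sketch with illustrative variable names. Scalar multiplication repeats a `list`, so an element-wise operation on a `list` needs an explicit loop (here a list comprehension), whereas an `ndarray` applies the operation to each element.

```python
import numpy as np

values = [0, 0.5, 1, 1.5, 2]
arr = np.array(values)

repeated_list = 2 * values                 # repeats the list: 10 elements
doubled_list = [2 * x for x in values]     # element-wise with an explicit loop
doubled_array = 2 * arr                    # element-wise, vectorised

print(len(repeated_list))
print(doubled_list)
print(doubled_array)
```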

### Universal functions in NumPy

* A number of universal functions in `NumPy` are applied element-wise to arrays: 

In [None]:
np.exp(a)

In [None]:
np.sqrt(a)

### Multiple dimensions

* All features introduced so far carry over to multiple dimensions.
* An array with two rows:

In [None]:
b = np.array([a, 2 * a])
b

* Selecting the first row, a particular element, a column:

In [None]:
b[0]

In [None]:
b[1,1]

In [None]:
b[:,1]

### Multiple dimensions

* Calculating the sum of all elements, column-wise and row-wise:

In [None]:
b.sum()

In [None]:
b.sum(axis = 0)

In [None]:
b.sum(axis = 1)

__Note:__ `axis = 0` refers to column-wise and `axis = 1` to row-wise. 
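
One way to remember the `axis` argument is that it names the dimension that is collapsed by the operation. The sketch below illustrates this on a small 2 x 3 array; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # two rows, three columns: [[0, 1, 2], [3, 4, 5]]

col_sums = arr.sum(axis=0)   # collapses the rows: one sum per column
row_sums = arr.sum(axis=1)   # collapses the columns: one sum per row

print(col_sums.shape, col_sums)
print(row_sums.shape, row_sums)
```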

### Further methods for creating arrays

* Often, we want to create an array and populate it later.
* Here are some methods for this:

In [None]:
np.zeros((2,3), dtype = 'i') # array with two rows and three columns

In [None]:
np.ones((2,3,4), dtype = 'i') # array dimensions: 2 x 3 x 4

In [None]:
np.empty((2,3))

### Further methods for creating arrays

In [None]:
np.eye(3)

In [None]:
np.diag(np.array([1,2,3,4]))

### NumPy dtype objects

![dtype object](pics/dtype.png)

<div align="right" style="font-size:14px">Source: Python for Finance, 2nd ed.</div>

### Logical operations

* NumPy arrays can be compared element-wise with the usual comparison operators. The result is an array of Boolean values.

In [None]:
first = np.array([0, 1, 2, 3, 3, 6,])
second = np.array([0, 1, 2, 3, 4, 5,])

In [None]:
first > second

In [None]:
first.sum() == second.sum()

In [None]:
np.any([a == 4])

In [None]:
np.all([a == 4])
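
The Boolean arrays produced by comparisons can also be used to select elements. The following is a brief sketch of Boolean indexing and of `np.where()`, reusing the arrays `first` and `second` from above; `mask` is an illustrative name.

```python
import numpy as np

first = np.array([0, 1, 2, 3, 3, 6])
second = np.array([0, 1, 2, 3, 4, 5])

mask = first > second                      # Boolean array from the comparison
selected = first[mask]                     # keeps the elements where mask is True
larger = np.where(mask, first, second)     # picks from first where True, else from second

print(selected)
print(larger)
```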

### Reshape and resize 

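* The number of elements of an `ndarray` is fixed when it is created, but the array can be reshaped (`reshape()` returns a new view on the same data) and resized (the method `resize()` changes the array in place, while the function `np.resize()` creates a new object):  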

In [None]:
ar = np.arange(15)
ar

In [None]:
ar.reshape((3,5))

In [None]:
ar

### Reshape and resize 

In [None]:
ar.resize((5,3))

In [None]:
ar

__Note:__ `reshape()` did not change the original array. `resize()` did change the array's shape permanently.
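
A small sketch of what "view" means here, using illustrative names: for a contiguous array such as the one below, `reshape()` returns an array that shares its data with the original, whereas the function `np.resize()` returns an independent array. `np.shares_memory()` is used to check this.

```python
import numpy as np

base = np.arange(6)
view = base.reshape((2, 3))          # new shape, same underlying data

view[0, 0] = 99                      # writing through the reshaped array ...
print(base[0])                       # ... also changes the original array
print(np.shares_memory(base, view))

bigger = np.resize(base, (3, 3))     # np.resize() builds a new array, repeating elements
print(np.shares_memory(base, bigger))
```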

### Reshape and resize 

* `reshape()` does not alter the total number of elements in the array. 
* `resize()` can decrease (down-size) or increase (up-size) the total number of elements. 

In [None]:
ar

In [None]:
np.resize(ar, (3,3))

### Reshape and resize 

In [None]:
np.resize(ar, (5,5))

In [None]:
a.shape # returns the array's dimensions

### Further operations

* Transpose:

In [None]:
g = np.arange(0, 6)
g.resize(2,3)
g

In [None]:
g.T

* Flattening:

In [None]:
g.flatten()

### Further operations

* Stacking: `hstack` or `vstack` can be used to connect two arrays horizontally or vertically.

In [None]:
b = np.ones((2,3))

In [None]:
np.vstack((g, b))

<div class="alert alert-block alert-warning">
    NOTE: The arrays must have the same size along the dimension that is not stacked (e.g. the same number of columns for <code>vstack</code>).
</div>
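
The following sketch illustrates the size requirement with illustrative names: `vstack` needs matching column counts, `hstack` needs matching row counts, and a mismatch raises a `ValueError`.

```python
import numpy as np

left = np.arange(6).reshape(2, 3)
right = np.ones((2, 3))

stacked_v = np.vstack((left, right))   # stacks rows: column counts must match
stacked_h = np.hstack((left, right))   # stacks columns: row counts must match
print(stacked_v.shape, stacked_h.shape)

try:
    np.vstack((left, np.ones((2, 4))))  # 3 columns vs. 4 columns
except ValueError as err:
    print('vstack failed:', err)
```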

## Structured NumPy arrays

### Structured NumPy arrays

* The specialisation of `ndarray` may be too narrow. 
* However, one can instantiate `ndarray` with a dedicated `dtype`.
* This allows building database-like data sets where each row corresponds to an "entry".

### Structured NumPy arrays

* Creating a data type:

In [None]:
dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', 2)])  
dt  

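* A similar data type can be specified with a dictionary (note that the formats differ slightly: `O` stores Python objects, and `int` and `float` map to NumPy's default integer and float types):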

In [None]:
dt = np.dtype({'names': ['Name', 'Age', 'Height', 'Children/Pets'],
             'formats':'O int float int,int'.split()})  

dt  

### Structured NumPy arrays

* Now create the `ndarray` with the new data type:

In [None]:
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)  

s  

In [None]:
type(s)  

### Structured NumPy arrays

* The columns can be accessed through their names:

In [None]:
s['Name']  

In [None]:
s['Height'].mean()  

In [None]:
s[0]  

In [None]:
s[1]['Age']  
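
Because each field behaves like an ordinary array, records can also be filtered and sorted by field. The sketch below uses a reduced version of the data type from above; the names `person_dt`, `people`, `older` and `by_height` are illustrative.

```python
import numpy as np

person_dt = np.dtype([('Name', 'S10'), ('Age', 'i4'), ('Height', 'f')])
people = np.array([('Smith', 45, 1.83), ('Jones', 53, 1.72)], dtype=person_dt)

older = people[people['Age'] > 50]             # Boolean mask built from one field
by_height = np.sort(people, order='Height')    # sort the records by a field

print(older['Name'])      # names are stored as byte strings because of 'S10'
print(by_height['Name'])
```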

## Data Analysis with pandas: DataFrame

### Data analysis with pandas

* `pandas` is a powerful Python library for data manipulation and analysis. Its name is derived from **pan**el **da**ta.
* We cover the following data structures: 

![Pandas datatypes](pics/pandas.png)


<div align="right" style="font-size:14px">Source: Python for Finance, 2nd ed.</div>

### DataFrame Class

* [`DataFrame`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.html) is a class that handles tabular data, organised in columns.
* Each row corresponds to an entry or a data record.
* It is thus similar to a table in a relational database or an Excel spreadsheet.

In [None]:
import pandas as pd

df = pd.DataFrame([10,20,30,40], # data as a list 
                 columns=['numbers'], # column label
                 index=['a', 'b', 'c', 'd']) # index values for entries

In [None]:
df

### DataFrame Class

* The `columns` can be named (but don't need to be).

* The `index` can  take different forms such as numbers or strings.

* The input data for the `DataFrame` Class can come in different types, such as `list`, `tuple`, `ndarray` and `dict` objects.  
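
As a brief illustration of two of these input types, the sketch below builds one `DataFrame` from a `dict` (keys become column labels) and one from a two-dimensional `ndarray`. The names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# dict input: keys become column labels, values become the columns
from_dict = pd.DataFrame({'numbers': [10, 20], 'floats': [1.5, 2.5]},
                         index=['a', 'b'])

# ndarray input: rows and columns follow the array's shape
from_array = pd.DataFrame(np.arange(4).reshape(2, 2), columns=['c1', 'c2'])

print(from_dict)
print(from_array)
```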

### Simple operations

* Some simple operations applied to a `DataFrame` object:

In [None]:
df.index

In [None]:
df.columns

### Simple operations

In [None]:
df.loc['c'] # selects value corresponding to index c

In [None]:
df.loc[['a', 'd']] # selects values corresponding to indices a and d

In [None]:
df.iloc[1:3] # select second and third rows

### Simple operations

In [None]:
df.sum()

* Vectorised operations as with `ndarray`:

In [None]:
df ** 2

### Extending `DataFrame` objects

In [None]:
df['floats'] = (1.5, 2.5, 3.5, 4.5) # adds a new column

In [None]:
df

In [None]:
df['floats']

### Extending `DataFrame` objects

* A `DataFrame` object can be taken to define a new column:

In [None]:
df['names'] = pd.DataFrame(['Yves', 'Sandra', 'Lilli', 'Henry'], 
                          index = ['d', 'a', 'b', 'c'])

In [None]:
df

### Extending `DataFrame` objects

* Appending data:

In [None]:
df = pd.concat([df, pd.DataFrame({'numbers': 100, 'floats': 5.75, 'names': 'Jill'},
                                 index = ['y',])]) # DataFrame.append() was removed in pandas 2.0

In [None]:
df

### Extending `DataFrame` objects

* Be careful when appending with `ignore_index=True` -- the index gets replaced by a simple range index:

In [None]:
pd.concat([df, pd.DataFrame([{'numbers': 100, 'floats': 5.75, 'names': 'Jill'}])], ignore_index=True)

### Extending `DataFrame` objects

* Appending with missing data:

In [None]:
df = pd.concat([df, pd.DataFrame({'names': 'Liz'},
                                 index = ['z'])],
               sort = False)

In [None]:
df
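
Missing entries show up as `NaN`. The sketch below, on a small made-up data set, shows how they can be detected with `isna()`, that an integer column is converted to floats to hold `NaN`, and that `mean()` skips missing values by default.

```python
import pandas as pd

frame = pd.DataFrame({'numbers': [10, 20], 'names': ['Yves', 'Sandra']},
                     index=['a', 'b'])
extra = pd.DataFrame({'names': 'Liz'}, index=['z'])   # no value for 'numbers'
combined = pd.concat([frame, extra], sort=False)

print(combined['numbers'].isna())    # True where the value is missing
print(combined['numbers'].dtype)     # the integer column now holds floats
print(combined['numbers'].mean())    # NaN entries are skipped by default
```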

### Mathematical operations on Data Frames

* A lot of mathematical methods are implemented for `DataFrame` objects:

In [None]:
df[['numbers', 'floats']].sum()

In [None]:
df['numbers'].var()

In [None]:
df['numbers'].max()

### Time series with Data Frame

* In this section we show how a DataFrame can be used to manage time series data. 
* First, we create a `DataFrame` object using random numbers in an `ndarray` object.

In [None]:
import numpy as np
import pandas as pd
np.random.seed(100)
a = np.random.standard_normal((9,4))
a

In [None]:
df = pd.DataFrame(a)

__Note:__ The code above uses NumPy's pseudo-random number generator (PRNG), `np.random`. To learn more about Python's built-in PRNG, see [here](https://docs.python.org/3/library/random.html). 

### Practical example using `DataFrame` class

In [None]:
df

### Practical example using `DataFrame` class

* Arguments to the `DataFrame()` function for instantiating a `DataFrame` object: 

![DataFrame object](pics/data_frame.png)

<div align="right" style="font-size:14px">Source: Python for Finance, 2nd ed.</div>

### Practical example using `DataFrame` class

* In the next steps, we set column names and add a time dimension for the rows. 

In [None]:
df.columns = ['No1', 'No2', 'No3', 'No4']

In [None]:
df

In [None]:
df['No3'].values.flatten()

### Practical example using `DataFrame` class

* `pandas` is especially strong at handling time series data efficiently. 
* Assume that the data rows in the `DataFrame` consist of monthly observations starting in January 2019.
* The function `pd.date_range()` generates a `DatetimeIndex` object that can be used as the row index. 

In [None]:
dates = pd.date_range('2019-1-1', periods = 9, freq = 'M') # 'M' = month-end frequency; pandas >= 2.2 expects 'ME' instead
dates

### Practical example using `DataFrame` class

* Parameters of the `date_range()` function:

![Date range parameters](pics/date_range.png)


<div align="right" style="font-size:14px">Source: Python for Finance, 2nd ed.</div>

### Practical example using `DataFrame` class

* Frequency parameter of `date_range()` function:

![Date range frequencies](pics/date_range_freq.png)
![Date range frequencies](pics/date_range_freq_2.png)

<div align="right" style="font-size:14px">Source: Python for Finance, 2nd ed.</div>

### Practical example using `DataFrame` class

* Now set the row index to the dates:

In [None]:
df.index = dates

df
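
A date index also allows rows to be selected by date strings. The sketch below is an illustration on made-up data; it assumes a month-start frequency (`'MS'`) instead of the month-end frequency used above, and the names `idx` and `ts` are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2019-01-01', periods=9, freq='MS')   # first day of each month
ts = pd.DataFrame({'value': np.arange(9)}, index=idx)

march = ts.loc['2019-03']                  # all rows that fall in March 2019
spring = ts.loc['2019-02':'2019-04']       # date slices include both end points

print(march)
print(spring)
```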

### Practical example using `DataFrame` class

* Next, we visualise the data: 

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

* More about customising the plot style: [here](https://seaborn.pydata.org/tutorial/aesthetics.html).

### Practical example using `DataFrame` class

* Plot the cumulative sum for each column of `df`:

In [None]:
df.cumsum().plot(lw = 2.0, figsize = (10,6));

### Practical example using `DataFrame` class

* A bar chart:

In [None]:
df.plot.bar(figsize = (10,6), rot = 15);

### Practical example using `DataFrame` class

* Parameters of `plot()` method:

![Parameters of plot method](pics/plot_1.png)


<div align="right" style="font-size:14px">Source: Python for Finance, 2nd ed.</div>

### Practical example using `DataFrame` class

* Parameters of `plot()` method:


![Plot_parameters](pics/plot_2.png)

<div align="right" style="font-size:14px">Source: Python for Finance, 2nd ed.</div>

### Practical example using `DataFrame` class

* Useful functions:

In [None]:
df.info() # provide basic information

### Practical example using `DataFrame` class


In [None]:
df.sum()

In [None]:
df.mean(axis=0) # column-wise mean

In [None]:
df.mean(axis=1) # row-wise mean

### Useful functions: `groupby()`

In [None]:
df['Quarter'] = ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3',]

In [None]:
df

### Useful functions: `groupby()`

In [None]:
groups = df.groupby('Quarter')

In [None]:
groups.mean()

In [None]:
groups.max()

### Useful functions: `groupby()`

In [None]:
groups.aggregate(['min', 'max']).round(3) # aggregation functions given by name
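
When the result columns should carry descriptive labels, named aggregation is one option. The sketch below uses a small made-up data set; the output labels `low`, `high` and `average` are arbitrary choices for the example.

```python
import pandas as pd

sales = pd.DataFrame({'Quarter': ['Q1', 'Q1', 'Q2', 'Q2'],
                      'No1': [1.0, 3.0, 2.0, 6.0]})

# each keyword defines an output column as (input column, aggregation function)
summary = sales.groupby('Quarter').agg(
    low=('No1', 'min'),
    high=('No1', 'max'),
    average=('No1', 'mean'),
)
print(summary)
```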

### Selecting and filtering data

* Logical operators can be used to filter data. 
* First, construct a `DataFrame` filled with random numbers to work with. 

In [None]:
data = np.random.standard_normal((10,2))

In [None]:
df = pd.DataFrame(data, columns = ['x', 'y'])

In [None]:
df.head(2) # the first two rows

In [None]:
df.tail(2) # the last two rows

### Selecting and filtering data

In [None]:
(df['x'] > 1) & (df['y'] < 1) # check if value in x-column is greater than 1 and value in y-column is smaller than 1

In [None]:
df[df['x'] > 1]

In [None]:
df.query('x > 1') # query()-method takes string as parameter

### Selecting and filtering data

In [None]:
(df > 1).head(3) # Find values greater than 1

In [None]:
df[df > 1].head(3) # Select values greater than 1 and put NaN (not-a-number) in the other entries
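
Conditions can be combined with `&` (and), `|` (or) and `~` (not); each condition needs its own parentheses. The sketch below uses fixed, made-up numbers instead of random ones so that the result is easy to follow.

```python
import pandas as pd

pts = pd.DataFrame({'x': [1.5, -0.2, 2.0, 0.3],
                    'y': [0.5, 1.2, 1.8, -0.4]})

both = pts[(pts['x'] > 1) & (pts['y'] < 1)]          # both conditions hold
either = pts[(pts['x'] > 1) | (pts['y'] < 1)]        # at least one condition holds
neither = pts[~((pts['x'] > 1) | (pts['y'] < 1))]    # no condition holds

print(both)
print(either)
print(neither)
```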

### Concatenation

* Adding rows from one data frame to another data frame can be done with `pd.concat()`. Versions of pandas before 2.0 also provided the `DataFrame.append()` method for this:

In [None]:
df1 = pd.DataFrame(['100', '200', '300', '400'],
                  index = ['a', 'b', 'c', 'd'],
                  columns = ['A',])

df2 = pd.DataFrame(['200', '150', '50'],
                  index = ['f', 'b','d'],
                  columns = ['B',])

### Concatenation


In [None]:
pd.concat([df1, df2], sort = False) # df1.append(df2, sort = False) in pandas versions before 2.0

### Concatenation


In [None]:
pd.concat((df1, df2), sort = False)

### Joining 

* In pandas, `join()` refers to joining `DataFrame` objects according to their index values. 
* There are four different types of joining: 
    1. `left` join
    2. `right` join
    3. `inner` join
    4. `outer` join

### Joining 

In [None]:
df1.join(df2, how = 'left') # default join, based on indices of first dataset

In [None]:
df1.join(df2, how = 'right') # based on indices of second dataset

### Joining 

In [None]:
df1.join(df2, how = 'inner') # preserves those index values that are found in both datasets

In [None]:
df1.join(df2, how = 'outer') # preserves all index values found in either dataset
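
To compare the four join types side by side, the following sketch collects the resulting index for each of them. It rebuilds the two data sets from above under the illustrative names `left` and `right`.

```python
import pandas as pd

left = pd.DataFrame({'A': ['100', '200', '300', '400']},
                    index=['a', 'b', 'c', 'd'])
right = pd.DataFrame({'B': ['200', '150', '50']},
                     index=['f', 'b', 'd'])

# resulting row index for each join type
index_by_how = {how: list(left.join(right, how=how).index)
                for how in ['left', 'right', 'inner', 'outer']}

for how, idx in index_by_how.items():
    print(how, idx)
```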

### Merging

* Join operations on `DataFrame` objects are based on the datasets' indices.
* __Merging__ operates on a shared column of two `DataFrame` objects.
* To demonstrate the usage we add a new column `C` to `df1` and `df2`.

In [None]:
c = pd.Series([250, 150, 50], index = ['b', 'd', 'c'])
df1['C'] = c
df2['C'] = c

### Merging

In [None]:
df1

In [None]:
df2

### Merging

* By default, a merge takes place on a shared column, preserving only the shared data rows: 

In [None]:
pd.merge(df1, df2)

* An __outer merge__ preserves all data rows: 

In [None]:
pd.merge(df1, df2, how = 'outer')

### Merging

* There are numerous other ways to merge `DataFrame` objects. 
* To learn more about merging in pandas, see the pandas documentation on [DataFrame merging](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html). 

In [None]:
pd.merge(df1, df2, left_on = 'A', right_on = 'B')

In [None]:
pd.merge(df1, df2, left_on = 'A', right_on = 'B', how = 'outer')
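
When inspecting the result of an outer merge, the `indicator` parameter of `pd.merge()` can help: it adds a column `_merge` that records whether a row came from the left data set, the right data set, or both. The sketch below uses made-up data with a shared column `key`.

```python
import pandas as pd

left = pd.DataFrame({'key': [250, 150, 50], 'A': ['200', '400', '300']})
right = pd.DataFrame({'key': [250, 150, 75], 'B': ['150', '50', '200']})

# outer merge on the shared column; '_merge' records the origin of each row
merged = pd.merge(left, right, on='key', how='outer', indicator=True)
print(merged)
```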