![head.png](figures/head.jpg)

# Financial Data Analytics in Python

**Prof. Dr. Fabian Woebbeking**</br>
Assistant Professor of Financial Economics

IWH - Leibniz Institute for Economic Research</br>
MLU - Martin Luther University Halle-Wittenberg

fabian.woebbeking@iwh-halle.de

# Introduction


## Learning experience

This course is predominantly hands-on and draws on several subject areas, such as financial economics, data science, and textual analysis. The fundamental concept is to **introduce and implement concepts that are relevant to financial economists**. As such, in class and in grading, we focus on both knowledge transfer and the solution.

*I hate programming!*</br>
*I hate programming!*</br>
*I hate programming!*</br>
*OH! IT WORKS!*</br>
*I love programming!*</br>

### Materials

The course is hosted as a Git repository on GitHub (see README.md therein): https://github.com/cafawo/FinancialDataAnalytics

We will rely heavily on Jupyter notebooks during the class. **Especially during class and as a backup**, you can always open the latest version of the material using:
* the interactive [**Colab version**](https://colab.research.google.com/github/cafawo/FinancialDataAnalytics/blob/master/slides.ipynb)
* or a static [html version](https://cafawo.github.io/FinancialDataAnalytics/slides.html)

> "Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing access free of charge to computing resources including GPUs." [(Google, 2023)](https://research.google.com/colaboratory/faq.html)

Here you can find Colab links for all notebooks in the repository:

In [30]:
"""
Yeah I know, this cell does not run on Colab. :)
The solution requires web scraping or the GitHub Python API.
If you have a better solution, open a pull request.
"""
import os
colab_url = "https://colab.research.google.com/github/cafawo/FinancialDataAnalytics/blob/master/"
# Loop through all the directories and files in the root directory and its subdirectories
for root, _, files in os.walk('.'):
    for file in files:
        # Check if the file has a .ipynb extension
        if file.endswith('.ipynb'):
            # Extract the relative path to the notebook
            notebook_path = os.path.join(root, file).replace("\\","/")
            # Print the Colab url
            print(colab_url + notebook_path[2:])

https://colab.research.google.com/github/cafawo/FinancialDataAnalytics/blob/master/slides.ipynb
https://colab.research.google.com/github/cafawo/FinancialDataAnalytics/blob/master/homework/01_setup.ipynb


### Grading

**Deliverables:**

1. Small homework assignments are designed to keep track of your progress (individually)
2. Two case studies on current topics in data science and financial economics (individually)
3. One presentation on libraries, tools or data science topics that cannot be covered in detail during the classes (potentially in groups)

All students are requested to hand in their homework, cases and presentation slides as **one Git repository**. This repository is submitted only once, by the end of the entire course. Please note that Git keeps track of the changes made to a repository (when and by whom). Copy-pasting a repository from a colleague, i.e. plagiarism, will not be tolerated.

**Deadlines** are enforced through the **commit timestamps** within your repository. For example, a homework assignment has to be committed before the start of the following lecture (where the solution is discussed). Homework assignments that are committed before the deadline are equally weighted; others are ignored(!). You have to submit a **minimum of 4** homework assignments.



**Bonus points:**

Bonus points are awarded to incentivize a collaborative class environment. Points are awarded for **asking, answering and voting** on Q&A here: https://github.com/cafawo/FinancialDataAnalytics/discussions

You are also allowed to ask and answer questions yourself. Please note that the discussion board is monitored and should be used strictly for Q&A that relates to the class. "How do you like the food in the dining hall?" is not considered a relevant question and hence does not earn you any points. You are allowed to cite and link external sources (e.g. Stack Overflow). You can add an additional answer (there is a voting system).

Additionally, you are very much invited to propose changes (so-called pull requests) to the course repository, see https://github.com/cafawo/FinancialDataAnalytics/pulls

## Tech requirements

This class requires you to run Python code (incl. scripts and Jupyter notebooks). Furthermore, we use Git and GitHub for version control. All software is open source and/or free!

If you do not have the tech yet:

* Install Python (I highly recommend Anaconda):
    * https://www.anaconda.com/products/distribution
* Install Git (choose one of these):
    * https://git-scm.com/downloads
    * https://anaconda.org/anaconda/git

The remainder of the slides assumes that you have both Python and Git installed on your system. Nevertheless, should you encounter difficulties, consider using Colab (see "Materials" above).

## Git and GitHub

![git.png](https://github.com/cafawo/Derivatives/blob/main/figures/git.png?raw=1)

### Git (local repository)
> Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. [(see Git, 2023)](https://git-scm.com/)

Some source-code editors come with built-in Git (and even GitHub) capabilities or can be extended (e.g. [Microsoft's Visual Studio Code](https://code.visualstudio.com/), which I use during this course).

Your local repository is essentially a folder on your local file system. Changes made in that folder can be committed to the (local) Git repository.

First, "stage" your changes. This is something like a pre-commit:

```Bash
# The '*' adds all changes made in your folder (you should be more selective)
git add *
```

Second, commit your staged changes to the local repository:
```Bash
git commit -m 'Commit message'
```

In the broadest sense, you could see Git as a blockchain of commits (changes) made to your repository. You can thus
* observe a complete history.
* `git checkout` the state at any commit to the repository.
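
The chain-of-commits idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Git's actual object format: the function `make_commit`, the dictionary layout and the example messages are hypothetical, and SHA-1 is used purely for illustration.

```python
import hashlib


def make_commit(parent_id, message, snapshot):
    """Return a toy commit whose id depends on its parent, message and content."""
    payload = f"{parent_id}|{message}|{snapshot}".encode("utf-8")
    commit_id = hashlib.sha1(payload).hexdigest()
    return {"id": commit_id, "parent": parent_id, "message": message, "snapshot": snapshot}


# Build a small history: each commit points to the previous one.
history = []
parent_id = None
for message, snapshot in [
    ("Add homework 1", "print('hello')"),
    ("Fix typo", "print('Hello')"),
    ("Add greeting", "print('Hello World!')"),
]:
    commit = make_commit(parent_id, message, snapshot)
    history.append(commit)
    parent_id = commit["id"]

# Observe the complete history, newest first (similar in spirit to `git log`).
for commit in reversed(history):
    print(commit["id"][:7], commit["message"])

# "Check out" the state stored with the first commit.
print(history[0]["snapshot"])
```

Because every id is computed from the parent's id, changing an early commit would change the ids of all later commits. This is why the history is hard to alter unnoticed.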

Why Git? ... 

![conint.png](https://d1.awsstatic.com/product-marketing/DevOps/continuous_integration.4f4cddb8556e2b1a0ca0872ace4d5fe2f68bbc58.png)


>Continuous integration is a DevOps software development practice where developers regularly merge their code changes into a central repository, after which automated builds and tests are run. [AWS, 2023](https://aws.amazon.com/devops/continuous-integration)




More on Git:

* About Git itself: https://git-scm.com/about
* Getting started (videos, tutorials): https://git-scm.com/doc


### GitHub (remote repository)

> GitHub is an Internet hosting service for software development and version control using Git.

Connecting to the remote repository

After installing Git, you can `clone` the course repository to your local system:

```Bash
git clone https://github.com/cafawo/FinancialDataAnalytics.git
```

Your local Git repository remembers its origins. This enables you to `pull` updates from the remote (Git does not synchronize automatically). 

```Bash
git pull
```

If you have write access to the remote, you can also `push` changes to it.

```Bash
git push
```

Careful: Git tries its best to merge the remote with the local repository, but the merge might fail if the two repositories have diverged too much. This should not concern you too much as a single user, but it becomes very relevant when collaborating on a remote.

This is all we need for this course; however, it is only the tip of the iceberg. More on GitHub:

* Working with GitHub (remotes): https://skills.github.com/


## Python

- Python is an open-source programming language that can be downloaded and used for free.
- Python was created by Guido van Rossum and first published in 1991.
- Today the language is developed by a community of core developers and supported by the Python Software Foundation, a nonprofit organization.
- It is named after the British comedy group "Monty Python".

If you are new to Python, many examples and extended information can be found on the following websites:  

* [Beginners' Guide](https://wiki.python.org/moin/BeginnersGuide)
* [Python.org](https://docs.python.org/3/tutorial/)
* [Scipy Lectures](http://scipy-lectures.org/_downloads/ScipyLectures-simple.pdf)
* [The Hitchhiker’s Guide to Python](https://docs.python-guide.org)

Pros:
- __Universal__: Python runs on any operating system.
- __Easy to learn__: Although Python is highly versatile (e.g. can be used for scientific computing), it is relatively easy to learn.
- __Readable code__: Python is a high-level programming language, making it easy to read and work with.
- __General purpose__: The language can be applied to solve different problems at hand.
- __Open source__ and __free__.
- __Cross-platform__
- __Indentation aware__: indentation is used instead of braces to mark code blocks.

Cons:
- __Speed__: While Python is not slow, it cannot keep up with compiled languages such as C, C++, Fortran, COBOL, etc.

### Running Python

You can call `python` directly from a shell or start an online version at [online shell](https://www.python.org/shell/). This gives you a nice calculator, but has limited use for our class. We will focus on scripts (.py suffix) and Jupyter notebooks (.ipynb suffix).

#### Scripts (**.py** suffix)

For our development process, we usually focus on Python scripts. These scripts can be edited with any text editor. A Python script, e.g. run.py, can be executed in the shell:

```
python run.py
```
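
As a minimal sketch, a file such as run.py might look as follows. The function name `main` and the numbers are assumptions chosen for illustration. The `if __name__ == "__main__":` guard is a common Python idiom, not a requirement.

```python
# run.py (hypothetical example script)


def main():
    """Compute and print a simple future value."""
    savings = 1000
    interest_rate = 0.025
    future_value = savings * (1 + interest_rate) ** 10
    print(f"Future value: {future_value:,.2f}")
    return future_value


# Only run main() when the file is executed directly (python run.py),
# not when it is imported from another module.
if __name__ == "__main__":
    main()
```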

You will have more fun when opening scripts inside an __IDE__ (Integrated Development Environment), which is an application that integrates programming, running code, debugging, etc. One example is [Spyder](https://www.spyder-ide.org/), which ships directly with Anaconda.

#### Jupyter notebooks (**.ipynb** suffix)

> The Jupyter Notebook is a web-based interactive computing platform that allows users to author data- and code-driven narratives that combine live code, equations, narrative text, visualizations, interactive dashboards and other media. [(see Jupyter.org, 2023)](https://jupyter.org/about)

Jupyter notebooks run out of the box with our Anaconda distribution. This script here is written in Jupyter. (Remember the backup solution using Colab, which we discussed initially.)

Some source-code editors can also open and run Jupyter notebooks (e.g. [Microsoft's Visual Studio Code](https://code.visualstudio.com/), which I use during this course).

I recommend Jupyter notebooks as a presentation tool and Python scripts for the development of more complex systems.

Jupyter notebooks distinguish between two basic types of input cells:

#### Text cells aka markdown cells
This is a **text cell**. You can **double-click** to edit this cell. Text cells
use markdown syntax. To learn more, see this [markdown
guide](https://colab.research.google.com/notebooks/markdown_guide.ipynb).

You can also add math to text cells using [LaTeX](http://www.latex-project.org/)
to be rendered by [MathJax](https://www.mathjax.org). Just place the statement
within a pair of **\$** signs. For example `$\sqrt{3x-1}+(1+x)^2$` becomes
$\sqrt{3x-1}+(1+x)^2.$

#### Code cells

Below is a **code cell**. To run it, either:

* click the **Play icon** in the left gutter of the cell, or
* press **Cmd/Ctrl+Enter** to run the cell in place.

In [31]:
a = 10 + 2
print(a)

12


# Python basics

## Style

Without being too pedantic, we follow the [PEP 8 – Style Guide for Python Code](https://peps.python.org/pep-0008/). When in doubt, return to this source for guidance.

### Naming convention
Here are some best practices to follow when naming stuff.
* Use all lowercase. Ex: name instead of Name
* One exception: class names use the CapWords convention, i.e. each word starts with a capital letter and there are no underscores. Ex: SavingsAccount instead of savings_account.
* Use snake_case convention (i.e., separate words by underscores, look like a snake). Ex: gross_profit instead of grossProfit or GrossProfit.
* Should be meaningful and easy to remember. Ex: interest_rate instead of r or ir.
* Should have a reasonable length. Ex: sales_apr instead of sales_data_for_april
* Avoid names of popular functions and modules. Ex: avoid print, math, or collections.
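
The conventions above can be illustrated with a short sketch. All names and numbers (`sales_apr`, `calculate_net_profit`, `SavingsAccount`, etc.) are made up for this example.

```python
# Variables and functions: lowercase snake_case with meaningful names.
interest_rate = 0.025
gross_profit = 1_500.0
sales_apr = [120, 95, 143]


def calculate_net_profit(gross_profit, tax_rate):
    """Return the profit after taxes."""
    return gross_profit * (1 - tax_rate)


# Classes: CapWords, as recommended by PEP 8.
class SavingsAccount:
    def __init__(self, balance):
        self.balance = balance


# Avoid shadowing built-ins: use `total_sales` rather than `sum`.
total_sales = sum(sales_apr)
account = SavingsAccount(balance=1000)
print(total_sales, calculate_net_profit(gross_profit, 0.3), account.balance)
```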

### Comments

Comments should help to understand how your code works and your intentions behind it! 

> Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes! Comments should be complete sentences. The first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!). [(PEP 8)](https://peps.python.org/pep-0008/#comments)
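
A small sketch of the difference between a comment that restates the code and one that explains the intention. The function `calculate_present_value` and its numbers are hypothetical and only serve as an illustration.

```python
def calculate_present_value(future_value, interest_rate, maturity):
    """Discount a future payment back to today.

    All three arguments are plain numbers; maturity is measured in years.
    """
    # Discounting is the inverse of compounding, hence the division.
    return future_value / (1 + interest_rate) ** maturity


# Unhelpful comment (restates the code): "Divide 1100 by 1.1".
# Helpful comment (states the intention):
# Value today of EUR 1,100 received in one year at 10% interest.
present_value = calculate_present_value(1100, 0.1, 1)
print(f"{present_value:,.2f}")
```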

## Data types and structures

Python comes with a small set of functions and types built into it that are always available [(see overview HERE)](https://docs.python.org/3/library/functions.html).

See: 
* https://docs.python.org/3/tutorial/introduction.html
* https://docs.python.org/3/tutorial/datastructures.html

In [32]:
# Hi, I'm a comment

In [33]:
# Printing to the standard output
print("Hello World!")

Hello World!


In [34]:
# A simple calculator
print(2 + 2 * 2 + 2 ** 2)

10


### Variables

In [35]:
# Assigning variables
savings = 1000
r = 0.025

In [36]:
# Calculate the savings value in ten years, given an interest rate of 2.5%.
future_value = savings*(1+r)**10 

In [37]:
# Note that assigning a variable (above) does not print an output.
print(future_value)

1280.0845441963565
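
Written as a formula, the cells above apply annual compounding. The symbols $PV$ (present value, here `savings`), $FV$ (future value) and $n$ (number of years) are notation introduced here for illustration:

$$
FV = PV \cdot (1 + r)^{n} = 1000 \cdot (1 + 0.025)^{10} \approx 1280.08
$$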


### Basic data types

| Object type | Meaning | Used for |
|-|---|-|
| int | Integer value | Whole numbers |
| float | Floating-point number | Real numbers |
| bool | Boolean value | Something true or false |
| str | String object | Character, word, text |

#### `int()`

Zero, natural numbers and their negative counterparts, i.e., $\mathbb Z=\{\ldots,-3,-2,-1,0,1,2,3,\ldots\}$

In [38]:
int_example = 4
print(int_example)
print(type(int_example))

4
<class 'int'>


#### `float()`

Floating-point numbers ("floats") are a computer representation of real numbers, i.e., $\mathbb R$

In [39]:
float_example = 1/4
print(float_example)
print(type(float_example))

0.25
<class 'float'>
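
Because floats are only an approximation of real numbers, exact comparisons can fail. A minimal sketch of the classic example, using `math.isclose` from the standard library to compare with a tolerance:

```python
import math

total = 0.1 + 0.2
print(total)                     # Not exactly 0.3 due to binary rounding
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare floats with a tolerance
```

This matters in financial calculations, e.g. when checking whether two computed amounts are equal.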


#### `bool()`

* A Boolean is either `True` or `False`
* Data type associated with logical expressions

In [40]:
bool_example = 5 > 4
print(bool_example)
print(type(bool_example))

True
<class 'bool'>


In [41]:
# Booleans behave like the integers 1 and 0 (bool is a subclass of int)
print(True == 1)  # !
print(False == 0)  # !

True
True
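
Since `True` counts as 1 and `False` as 0, Booleans can be summed. A short sketch that counts how many values satisfy a condition (the list of daily returns is hypothetical):

```python
returns = [0.02, -0.01, 0.03, -0.04, 0.01]  # Hypothetical daily returns

# Each comparison yields a bool; summing treats True as 1 and False as 0.
positive_days = sum(ret > 0 for ret in returns)
print(positive_days)
```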


#### `str()` 
* Strings represent text, i.e., a string is a sequence of characters. 
* A string object is defined by wrapping its contents in single or double quotation marks.

In [42]:
str_example = "Hello World!"
print(str_example)
print(type(str_example))

Hello World!
<class 'str'>


In [43]:
# Built-in string methods, e.g.
print(str_example.upper())
print(str_example.split(' '))
print(str_example.split('l'))

HELLO WORLD!
['Hello', 'World!']
['He', '', 'o Wor', 'd!']


In [44]:
# String formatting (this is cool stuff)
formatted_str = f"Plain: The future value from above is equal to {future_value}"
print(formatted_str)
# Fancy string formatting (this is cool stuff)
fancy_formatted_str = f"Fancy: The future value from above is equal to {future_value:,.2f} with {r*100:.2f}% interest."
print(fancy_formatted_str)

Plain: The future value from above is equal to 1280.0845441963565
Fancy: The future value from above is equal to 1,280.08 with 2.50% interest.
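
The names `int()`, `float()`, `bool()` and `str()` used in the headings above are also built-in functions that convert between types. A minimal sketch with made-up values:

```python
price_text = "19.99"       # Numbers often arrive as text, e.g. from a file
price = float(price_text)  # str -> float
quantity = int("3")        # str -> int
total = price * quantity

print(type(price), type(quantity))
print(f"{total:.2f}")
print(str(quantity) + " items")  # int -> str for concatenation
print(int(2.9))                  # Truncates toward zero, does not round
# Zero and empty values convert to False, any non-empty string to True.
print(bool(0), bool(""), bool("False"))
```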


### Basic data structures

| Object type | Meaning | Used for |
|-|---|-|
| list | Mutable container | Changing set of objects |
| tuple | Immutable container | Fixed set of objects, record |
| dict | Mutable container | Key-value store |
| set | Mutable container | Collection of unique objects |

Please note that in Python **indices start at 0!**

#### `list()`
* A list is created by listing its elements within square brackets, separated by commas.

In [45]:
# Integers are just an example, you could store any other data type
example_list = [1, 2, 3, 4, 5, 6, 7]
# List indices
print(example_list[0])    # First element
print(example_list[-1])   # Last element
print(example_list[:4])   # First four elements
print(example_list[1:4])  # 2nd to 4th element


1
7
[1, 2, 3, 4]
[2, 3, 4]


In [46]:
# Lists are mutable, hence,
example_list[0] = 'A'
example_list[2:4] = ['B', 'C']
print(example_list)
example_list.append('D')
print(example_list)

['A', 2, 'B', 'C', 5, 6, 7]
['A', 2, 'B', 'C', 5, 6, 7, 'D']


In [47]:
# List comprehensions: very fast, very powerful (also see control flows below)
example_list = [1, 2, 3, 4, 5, 6, 7]
example_list_new = [x**2 for x in example_list]
print(example_list_new)
example_list_str = [element.upper() for element in ['aaa', 'aab', 'aba']]
print(example_list_str)

[1, 4, 9, 16, 25, 36, 49]
['AAA', 'AAB', 'ABA']
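
A list comprehension is a compact way of writing a loop that builds a list. The sketch below shows the same result computed both ways and adds an optional filter condition (`if ...`). The variable names are chosen for illustration.

```python
example_list = [1, 2, 3, 4, 5, 6, 7]

# Explicit loop: collect the squares of all even numbers.
squares_loop = []
for x in example_list:
    if x % 2 == 0:
        squares_loop.append(x**2)

# Equivalent list comprehension with a filter condition.
squares_comprehension = [x**2 for x in example_list if x % 2 == 0]

print(squares_loop)
print(squares_comprehension)
```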


#### `tuple()`
* The elements of a tuple are written between parentheses, or just separated by commas.
* Tuples are immutable (an immutable object is an object whose state cannot be modified after it is created).

In [48]:
example_tuple = (1, 2, 3, 'Tom')
print(example_tuple)

(1, 2, 3, 'Tom')
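
A brief sketch of what immutability means in practice, plus tuple unpacking, which is a common use of tuples as records. The ticker and price are hypothetical.

```python
example_tuple = (1, 2, 3, 'Tom')

# Tuples support indexing just like lists ...
print(example_tuple[-1])

# ... but assigning to an element raises a TypeError.
try:
    example_tuple[0] = 'A'
except TypeError as error:
    print(f"TypeError: {error}")

# Unpacking assigns the elements of a record to separate names.
ticker, price = ('ABC', 101.5)  # Hypothetical record
print(ticker, price)
```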


#### `dict()`

* key -> value pairs

In [49]:
example_dict = {'A':2.5, '2':2.5, 2.5:3, 'B':'Peter'}
print(example_dict)
print(example_dict['B'])
print(example_dict.keys())
print(example_dict.values())

{'A': 2.5, '2': 2.5, 2.5: 3, 'B': 'Peter'}
Peter
dict_keys(['A', '2', 2.5, 'B'])
dict_values([2.5, 2.5, 3, 'Peter'])
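
Dictionaries are mutable, so keys can be added and values overwritten. The sketch below also shows `get()`, membership tests and iteration over key-value pairs. The tickers and prices are made up.

```python
prices = {'AAA': 101.5, 'BBB': 99.0}  # Hypothetical tickers and prices

prices['CCC'] = 12.3      # Add a new key
prices['AAA'] = 102.0     # Overwrite an existing value
print(prices.get('ZZZ'))  # Missing key: get() returns None instead of raising a KeyError
print('BBB' in prices)    # Membership test on the keys

# items() yields (key, value) pairs.
for ticker, price in prices.items():
    print(f"{ticker}: {price:.2f}")
```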


#### `set()`

* Unique objects

In [50]:
example_set = {1, 1, 1, 2, 3, 4}
print(example_set)
example_set_2 = {1, 2, 3, 4, 5, 6}
print(example_set_2)
print(example_set - example_set_2)
print(example_set_2 - example_set)

{1, 2, 3, 4}
{1, 2, 3, 4, 5, 6}
set()
{5, 6}
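
Besides the difference (`-`) shown above, sets support the other usual set operations. A minimal sketch with hypothetical ticker symbols (note that the print order of set elements is not guaranteed):

```python
portfolio_a = {'AAA', 'BBB', 'CCC'}  # Hypothetical tickers
portfolio_b = {'BBB', 'CCC', 'DDD'}

print(portfolio_a | portfolio_b)  # Union: held in at least one portfolio
print(portfolio_a & portfolio_b)  # Intersection: held in both
print(portfolio_a ^ portfolio_b)  # Symmetric difference: held in exactly one
print('AAA' in portfolio_a)       # Membership test

# Converting a list to a set removes duplicates (the order is not preserved).
trades = ['AAA', 'BBB', 'AAA', 'AAA']
print(set(trades))
```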


### Control flow

* Control flow refers to control structures such as if-statements and for-loops that control the order in which code is executed.
* Python uses indentation to mark which lines belong to a control flow statement.

See:
* https://docs.python.org/3/tutorial/controlflow.html

#### `if` statements

In [51]:
if 5 + 5 == 10:
    print("This is True")

This is True


In [52]:
x = 5
if x == 1:
    print(f"{x} = 1")
elif x == 2:
    print(f"{x} = 2")
elif x == 3:
    print(f"{x} = 3")
else:
    print(f"{x} > 3")

5 > 3


#### `for` loop

In [53]:
for flower in ['tulip', 'rose', 'cucumber']:
    print(f"Is a {flower} a flower?")

Is a tulip a flower?
Is a rose a flower?
Is a cucumber a flower?


In [54]:
for i in range(5):
    print(i)

0
1
2
3
4
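
Loops are often used to update a value step by step. The sketch below compounds a balance year by year, reusing the savings example from above, and shows `enumerate()`, which returns an index together with each element. The variable names `balance`, `year` and `position` are chosen for illustration.

```python
savings = 1000
r = 0.025

# Compound the balance year by year; range(1, 4) yields 1, 2, 3.
balance = savings
for year in range(1, 4):
    balance = balance * (1 + r)
    print(f"Year {year}: {balance:,.2f}")

# enumerate() returns the index together with each element.
for position, flower in enumerate(['tulip', 'rose']):
    print(position, flower)
```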


#### `while` loop

Runs while the condition is true.

In [55]:
i = 0
while i <= 5:
    print(i)
    i += 1

0
1
2
3
4
5


In [56]:
# Breaking a loop (see: https://docs.python.org/3/tutorial/controlflow.html)
i = 0
while True:  # This runs for a very long time ...
    print(i)
    i += 1
    if i >= 3:  # ... unless you break it.
        break

0
1
2


#### Functions

See:
* https://docs.python.org/3/tutorial/controlflow.html#defining-functions


In [57]:
def calculate_future_value(present_value, interest_rate, maturity):
    return present_value * (1 + interest_rate) ** maturity

print(f"The future value after one year is  EUR {calculate_future_value(1000, 0.1, 1):,.2f}")
print(f"The future value after ten years is EUR {calculate_future_value(1000, 0.1, 10):,.2f}")

The future value after one year is  EUR 1,100.00
The future value after ten years is EUR 2,593.74
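
Functions can also declare default values and be called with keyword arguments. The variant below is a sketch: the name `calculate_future_value_default` and the chosen defaults are assumptions for illustration, not part of the course code.

```python
def calculate_future_value_default(present_value, interest_rate=0.025, maturity=1):
    """Return the value of present_value after maturity years of annual compounding."""
    return present_value * (1 + interest_rate) ** maturity


# Positional argument only: the defaults for rate and maturity apply.
print(f"{calculate_future_value_default(1000):,.2f}")
# Keyword arguments make the call self-explanatory and order-independent.
print(f"{calculate_future_value_default(1000, maturity=10, interest_rate=0.1):,.2f}")
```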


In [58]:
# Convert this notebook to an HTML file
import os
os.system('jupyter nbconvert slides.ipynb --to html')

0