# <span style='color:red'>Quantitative Investing with Python</span>

### Professor Juhani Linnainmaa

Dartmouth College and Kepos Capital (Co-Director of Research)

--- 

## **Learning objective:**

- Working knowledge of quantitative investing: **how to construct and backtest trading strategies**
  - Factor investing / alternative risk premiums
  - Brief introduction to statistical arbitrage (time permitting)

## **Goals:**

1. Basics of using Python 
   - Python coding is often done in an IDE such as PyCharm/VSCode or in **Jupyter notebooks**
   - Jupyter notebooks are convenient in that you can write notes and code side by side and, because code is executed in steps, debugging is easy
   - We can write standalone 'scripts' or larger projects that consist of multiple files
     - Python files typically have .py extensions; notebooks have .ipynb extensions
   - We will use Jupyter Lab through Dartmouth's own system at jhub.dartmouth.edu
   - The goal is to get you up and running to the point where you know where to look up more information
   - **If** you work in an organization that uses Python, there would likely be a common codebase (repository) that is maintained with git or svn 


2. Loading and analyzing financial data
   - Get data from public sources using APIs as well as from commonly used databases (CRSP and Compustat)
   - Explore data


3. Construct academic factors
   - Understanding the decisions made for constructing factors
   - Examine sensitivity to the strategy rules: when you construct a trading rule, you have to make many decisions about *what* and *how* you trade
   - Construct additional factors


4. Predict stock returns and trade these predictions
   - Overfitting

## **Learning method:**

- Learn by doing: we write and modify some code to understand how things can be done and give ideas of what is possible

---

Outside the scope of this course:

- Multiperiod optimization with trading costs

# **Topic 1:** Getting Familiar with Python

## Working with Jupyter notebooks

A <span style='color:red'>Jupyter notebook</span> consists of cells

There are two types of cells: code and markdown

When we work with notebooks, we create, delete, and modify cells and execute code in those cells

We are either in **edit** or **command** mode
- In the EDIT mode you are inside a cell with a blinking cursor
- In the COMMAND mode you are outside the cell
- You can move from the COMMAND mode to EDIT mode by hitting ENTER; you move from EDIT mode to COMMAND mode by hitting ESC

Here are the keyboard shortcuts in the **command** mode:

- Up- and down-arrows: move up and down between cells
- A: create a new cell above
- B: create a new cell below
- X: delete cell
- Z: undo (e.g., undelete cell)
- M: convert a CODE cell into a MARKDOWN cell 
- Y: convert a MARKDOWN cell into a CODE cell

To execute code, press CTRL + ENTER in either mode

## Background on Python

When you start a new Python session, there is nothing (except basic Python) in the memory
- This is true even if you open an old notebook that has code in it

When we run code in cells, objects get defined and they remain in memory

There is a counter next to each cell that has been executed to indicate the execution order

You can clear the memory by **restarting the kernel**
- Each notebook (if you have many open) has its own kernel

If you execute code and want to force it to stop, you can **interrupt the kernel**
- Execution breaks but memory isn't cleared (but whatever *was* executed has altered objects)

## Object-oriented programming language 

Python is an object-oriented programming language
- This means that most of the things in the language are objects with their own properties and methods
- What does this mean?
  - If we have a list of numbers, we (typically) would **not** call a separate function to add an element to that list
    
    MY_LIST = [1, 2, 3]
    MY_LIST = ADD_ITEM(MY_LIST, 4)
    
  - Rather, if you have a list, the list object itself has a number of methods for doing things to it, such as appending a new element:
  
    MY_LIST = [1, 2, 3]
    MY_LIST.append(4)
    
- Python methods either modify an object **in place** or return a new object (a copy) and leave the original unchanged. Append modifies the list in place, which is why we do not assign its result back to MY_LIST.
- **What does this mean?** In a language such as Python you'll see lots of methods being applied to objects. 
  - This will be *extremely* convenient when we work with data because we can "chain" methods
  - Instead of writing 10 lines of code to reshape and filter your data, you can do it in 1 line
- If you haven't done OOP, just go with it and it'll make sense/click at some point 
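
The difference between modifying in place and returning a copy, and the idea of chaining methods, can be sketched with built-in types only. This is a minimal illustration; the variable names and the ticker string are made up for the example.

```python
# In-place method: list.sort() reorders the existing list and returns None
numbers = [3, 1, 2]
result = numbers.sort()
print(numbers)  # the original list is now sorted
print(result)   # None, because sort() modified the list in place

# Copy-returning function: sorted() leaves its input untouched
original = [3, 1, 2]
ordered = sorted(original)
print(original, ordered)

# Method chaining: each string method returns a new string,
# so the next method can be called directly on the result
raw_ticker = '  aapl us equity '
clean_ticker = raw_ticker.strip().upper().replace(' ', '_')
print(clean_ticker)
```

A common mistake is writing `numbers = numbers.sort()`, which replaces the list with None.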

## Extending basic Python

When you start a new session, Python comes with a basic set of commands

However, we typically want to extend functionality by bringing in additional functions written by others

Almost every Python example that you see starts with a bunch of **import** statements, such as 

```
import numpy as np
import pandas as pd
```

I'll come back to the meaning of these statements after introducing some basics

- To see what methods an object has, write [object]. and press tab
  - You can also use this to autocomplete

- To see information about an object and its "docstring" (manual page), write object? (or object?? to also see the source code, when it is available)
  - You can also see a function's signature by pressing SHIFT+TAB inside the parentheses
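
The TAB and ? shortcuts are notebook conveniences. A rough equivalent that works in any Python session uses the built-ins dir and help, as in this small sketch (the list is just an example object):

```python
my_list = [1, 2, 3]

# dir() lists the attributes and methods of an object (roughly what TAB completion shows)
public_methods = [name for name in dir(my_list) if not name.startswith('_')]
print(public_methods)

# The docstring that `my_list.append?` displays is stored in __doc__
print(my_list.append.__doc__)

# help() prints the manual page for the object
help(my_list.append)
```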

## Introduction

- We can define variables to contain (or, rather, refer to) single numbers, lists of numbers, arrays (vectors and matrices) or data
- There are many different object types in Python
- We have the first code cell below
  - If we want to add comments into a code block, we can do so by putting the # symbol before the comment
  - If we want to have longer comments, we can use triple quotation marks (a triple-quoted string can span multiple lines)

In [None]:
# Strings: Define a string and print it

x = 'Hello world!'
print(x)

# Define another string -- Python does both single and double-quotation marks
y = "Hello world pt. 2!"
print(y)

# Why? You might want to have quotes inside a string (more easily)
z = 'Here is a "quote" from someone'
print(z)

In [None]:
# Integers and Floats

x = 1
y = 1.7

print(x+y)

In [None]:
# A list is a collection of multiple objects, e.g., strings or numbers. It is indicated by square brackets.

L = [1, 2, 3, 10, 27]
print(L)

In [None]:
# A tuple is a list-like object that cannot be changed (mutated) after it is created.
# It is indicated by parentheses ()
# This means that it can be used in places where a list wouldn't work (e.g., as a dictionary key)

my_list = [1, 2, 3]
print('My original list: ', my_list)

my_list[1] = 99
print('\nMy updated list: ', my_list)

my_tuple = (1, 2, 3)
print('\nMy original tuple: ', my_tuple)

print('\nTrying to change my tuple will give an error:')

#my_tuple[1] = 99

In [None]:
my_tuple
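
To see both claims in action (a tuple cannot be changed, and it can be used as a dictionary key), here is a small sketch. The tickers, dates, and prices are made-up placeholder values, not real data.

```python
my_tuple = (1, 2, 3)

# Assigning to an element of a tuple raises a TypeError
try:
    my_tuple[1] = 99
except TypeError as error:
    print('Error:', error)

# Because tuples are immutable, they can serve as dictionary keys
# (hypothetical ticker/date pairs with placeholder prices)
prices = {('AAPL', '2023-09-22'): 100.0,
          ('MSFT', '2023-09-22'): 200.0}
print(prices[('AAPL', '2023-09-22')])

# A list cannot be used as a key because it is mutable
try:
    bad = {['AAPL', '2023-09-22']: 100.0}
except TypeError as error:
    print('Error:', error)
```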

In [None]:
# A dictionary is a collection of key-value pairs. The values can be accessed by using the key

D = {'student1': 'John',
     'student2': 'Mary',
     'student3': 'Michael',
     'student4': 'Alice'}

print(D['student1'])
print(D['student3'])
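
Beyond looking up a value by its key, a few other dictionary operations come up constantly. A minimal sketch, reusing the student names from above (the key 'student9' is deliberately missing):

```python
D = {'student1': 'John', 'student2': 'Mary'}

# Add or overwrite an entry by assigning to a key
D['student3'] = 'Michael'

# .get() returns a default instead of raising KeyError for a missing key
missing = D.get('student9', 'not enrolled')
print(missing)

# .items() yields (key, value) pairs that we can loop over
for key, name in D.items():
    print(key, '->', name)
```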

- There are many small things that are specific to Python
- The simple things are easy to learn and if you've done any coding, the logic is the same as always
- But there are lots of small things:
  - += and -=
  - Variables: by reference or create a new variable? 
  - f-strings (formatted strings)
  - The use of indentation in if-else and loops
  - Iterating and generators 
  - Unpacking 
  - Indexing starts at zero and we stop before the stop value
  - List comprehensions
  - Lambda functions
  - Ternary expressions
- What makes learning Python a bit tricky is that when you read code, you'll inevitably run into all these "small things" -- and it is easy to get lost
  - Moving to larger scale projects, there are classes and class inheritances that would need some studying to understand

In [None]:
# By reference

a = [1, 2, 3]
b = a
c = a.copy()

print(f'{a=} and {b=} and {c=}')

a += [5]

print(f'{a=} and {b=} and {c=}')

print('both a and b refer to the same object, c is its own object!')
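
A related subtlety, sketched here as a supplement: for lists, `a += [5]` changes the existing object, whereas `a = a + [5]` builds a new list and rebinds the name. The `is` operator checks whether two names refer to the same object.

```python
a = [1, 2, 3]
b = a            # b is another name for the same list object
c = a.copy()     # c is a new list with the same contents

print(a is b)    # True: same object
print(a is c)    # False: different objects
print(a == c)    # True: equal contents

# a = a + [5] creates a NEW list and rebinds the name a;
# b still refers to the old, unchanged list
a = a + [5]
print(f'{a=} and {b=}')
print(a is b)    # False: a now refers to a different object
```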

In [None]:
# there is no *end* or {} brackets in if-else statements. We indent to indicate the start and end.
# elif stands for "else if"

a = 3

if a==1:
    print('Number one')
elif a==2:
    print('Number two')
elif a==3:
    print('Number three')
else:
    print('Something else')
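
When an if-else only chooses between two values, it can be written on one line as a ternary expression. A small sketch with an arbitrary number:

```python
a = 3

# Ternary expression: value_if_true if condition else value_if_false
parity = 'odd' if a % 2 == 1 else 'even'
print(f'{a} is {parity}')

# The equivalent if-else block
if a % 2 == 1:
    parity_long = 'odd'
else:
    parity_long = 'even'
print(parity_long)
```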

In [None]:
# We often iterate through (go through, item by item) objects such as lists

# Note that in Python we don't have to create a counter and then use it to access the elements;
# the loop automatically stops when we've looped through all items
my_list = [1, 1, 2, 3, 5, 8, 13]

for item in my_list:
    print(item)
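
If we do need the position of each item, or want to walk through two lists together, the built-ins enumerate and zip avoid manual counters. A sketch with made-up tickers and portfolio weights:

```python
tickers = ['AAPL', 'GOOGL', 'MSFT']
weights = [0.5, 0.3, 0.2]   # hypothetical portfolio weights

# enumerate() yields (position, item) pairs; positions start at 0
for position, ticker in enumerate(tickers):
    print(position, ticker)

# zip() pairs up items from several lists
for ticker, weight in zip(tickers, weights):
    print(f'{ticker}: {weight:.0%}')

pairs = list(zip(tickers, weights))
print(pairs)
```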

In [None]:
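# We can define 'generators' that we can then iterate over to yield values one by one;
# this is like the list above, except that new values are "generated" as we ask for them
# "range" behaves similarly: strictly speaking it is a lazy sequence rather than a generator,
# but it also produces its values on demand instead of storing them all
# we need to specify the STOP value.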

my_range = range(10)
for r in my_range:
    print(r)

print('Note that the list starts at 0 and it stops BEFORE it sees the STOP value of 10')
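
To define our own generator, we write a function that uses yield instead of return. The sketch below is a hypothetical helper (the name fibonacci is ours) that produces the same numbers as the list used in the loop example above.

```python
def fibonacci(count):
    """Yield the first `count` Fibonacci numbers (1, 1, 2, 3, ...)."""
    previous, current = 0, 1
    for _ in range(count):
        yield current
        previous, current = current, previous + current

fib_gen = fibonacci(7)

# next() asks the generator for one value at a time
print(next(fib_gen))
print(next(fib_gen))

# list() consumes whatever is left
remaining = list(fib_gen)
print(remaining)

# A generator can only be consumed once; now it is exhausted
print(list(fib_gen))
```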

In [None]:
# List comprehensions are a popular Python feature for creating lists (or dictionaries) without writing an explicit loop
# Any time you want to create a list of something based on some rule (take out some letters, apply some 
# transformation, add something, filter based on some rule...) you can probably use a list comprehension

# If I want to create a list that says Student_1, Student_2,..., Student_10, I could write:

student_list = ['Student_' + str(n) for n in range(1,11)]
print(student_list)

In [None]:
# If I have a list of numbers and want to pick just the numbers *not* divisible by three:

my_list = list(range(1,51))
print('My numbers: ', end='')
print(my_list)

not_by_three = [n for n in my_list if n % 3 != 0]

print('\nTake out those divisible by three: ', end='')
print(not_by_three)
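
The same syntax with curly braces and a key: value pair builds a dictionary, and a comprehension can transform and filter at the same time. A short sketch using the student names from the dictionary example:

```python
names = ['John', 'Mary', 'Michael', 'Alice']

# Dictionary comprehension: {key: value for ... in ...}
students = {f'student{n}': name for n, name in enumerate(names, start=1)}
print(students)

# List comprehension with a transformation (upper case) and a filter (long names only)
long_names_upper = [name.upper() for name in names if len(name) > 4]
print(long_names_upper)
```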

### Functions

- Functions are defined with def function_name(arguments): followed by an indented body
- They can take "positional" or "keyword" arguments (a positional argument is identified by its position in the call, a keyword argument by its name)
- Similar to other blocks, the beginning and end are identified with indentation
- Functions typically return something

In [None]:
def add_exclamation_mark(input_str):
    output_str = input_str + "!"
    return output_str

In [None]:
print('Write something and press enter:')
my_input = input('-> ')

print('\nYou wrote: ' + my_input)

output = add_exclamation_mark(my_input)

print('\nHere is what we get after applying the function: ' + output)
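
The function above takes a single positional argument. To illustrate keyword arguments and default values, here is a hypothetical variant (add_punctuation and its parameters are our own names for this sketch):

```python
def add_punctuation(input_str, mark='!', repeat=1):
    """Return input_str followed by `mark` repeated `repeat` times."""
    return input_str + mark * repeat

print(add_punctuation('Hello'))                      # one positional argument, defaults used
print(add_punctuation('Hello', '?'))                 # two positional arguments
print(add_punctuation('Hello', repeat=3))            # keyword argument lets us skip `mark`
print(add_punctuation(input_str='Hello', mark='.'))  # everything passed by keyword
```

Keyword arguments make calls easier to read when a function has many options, which will be the norm with pandas.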


### Lambda functions are a convenience for creating small 'anonymous' functions

- They have a weird syntax -- before you get used to it
- Lambda functions are useful when you need to do some transformation on the fly (e.g., list comprehensions, applying something to every value in your data, etc.)
- **Key thing to remember at first**: when you see "lambda", it means that some function is being created on the fly for one-off use 
  - We *can* use lambda to define named functions (as I do below), but it is considered bad form

In [None]:
add_question_mark = lambda input_str: input_str + '?'

In [None]:
print('Write something and press enter:')
my_input = input('-> ')

print('\nYou wrote: ' + my_input)

output = add_question_mark(my_input)

print('\nHere is what we get after applying the (lambda) function: \n\n' + output)

A typical example of using lambda functions is when sorting something

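- Below, we create a list of four people with their ages (each person is a (name, age) tuple)
- Suppose we want to list these people sorted by age: the key function tells sorted to compare the second element of each tuple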

In [None]:
unsorted_people = [('Alice', 30), ('Bob', 25), ('Cindy', 10), ('Daphne', 37)]
sorted(unsorted_people, key=lambda x: x[1])
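
A few variations on the same idea, as a sketch: reversing the order, sorting on more than one criterion by returning a tuple from the key function, and using key with max.

```python
unsorted_people = [('Alice', 30), ('Bob', 25), ('Cindy', 10), ('Daphne', 37)]

# Oldest first: reverse=True flips the order
oldest_first = sorted(unsorted_people, key=lambda person: person[1], reverse=True)
print(oldest_first)

# Sort by length of the name, then alphabetically, by returning a tuple as the key
by_name_length = sorted(unsorted_people, key=lambda person: (len(person[0]), person[0]))
print(by_name_length)

# max() and min() accept the same key argument
oldest = max(unsorted_people, key=lambda person: person[1])
print(oldest)
```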

### Unpacking lists and tuples

- Unpacking refers to extracting values from lists etc. and assigning them into different variables
- There is a special unpacking * operator that is used in different places in Python. Below, I use it to get "everything in the middle"

In [None]:
my_list = [1, 8]
a, b = my_list
print(f'{a=} and {b=}')

my_other_list = [1, 2, 3, 4, 5, 8]
a, *b, c = my_other_list

print(f'{a=} and {b=} and {c=}')
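
Unpacking shows up in several other places. The sketch below covers three common ones: unpacking inside a for loop, spreading a list into function arguments with *, and swapping two variables. The function total is a made-up helper for the illustration.

```python
people = [('Alice', 30), ('Bob', 25)]

# Unpack each tuple directly in the for statement
for name, age in people:
    print(f'{name} is {age}')

# * in a function call spreads a list into separate positional arguments
def total(x, y, z):
    return x + y + z

values = [1, 2, 3]
print(total(*values))

# * can also collect "everything else" at the end
first, *rest = [1, 2, 3, 4]
print(first, rest)

# Swapping two variables is packing and unpacking in one line
a, b = 1, 8
a, b = b, a
print(a, b)
```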

# Expanding capabilities -- Importing packages

- When you start Python, only a small set of built-in functions is available
- Base Python ships with many more functions (the standard library), but they have to be imported before use
- We don't want to have a system that has everything in it -- it would "pollute the namespace" and create overhead
- We typically just import what we need
- Python code typically begins with a block of import statements
  - When you write code, you realize you need something else and then you add that to the list of packages you import
  
#### Details

- We can import everything from a package (don't), we can import the package with some alias (so that everything can be accessed through that alias), or we can import specific functions from a package
- Typical packages are numpy (for arrays, matrices, and math) and pandas (for data analysis). We often import the datetime class from the datetime module
- So we could have the following:

In [None]:
import numpy as np
import pandas as pd
from datetime import datetime

A = np.array([[1, 2, 0], [4, 5, 6], [7, 8, 9]])

print('Original matrix:')
print(A)
print('\nInverse of the matrix:')
print(np.linalg.inv(A))

df = pd.DataFrame({'col a': [1, 2, 3], 'col b': [10, 11, 12]})
print('\nA dataframe:')
print(df)
print('\n')

print(datetime(2023,9,23))
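
The three import styles can be compared side by side using only modules that ship with Python, so this sketch runs without installing anything:

```python
# 1. Import the whole module; functions are accessed through the module name
import math
print(math.sqrt(16))

# 2. Import a module under an alias (same idea as `import numpy as np`)
import statistics as stats
print(stats.mean([1, 2, 3, 4]))

# 3. Import only specific names from a module
from math import pi, floor
print(floor(pi))

# 4. `from math import *` would import everything into the namespace; avoid it
```

With an alias, it stays clear where each function comes from (stats.mean versus np.mean, for example).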

In [None]:
import yfinance as yf
from datetime import datetime, timedelta

# Define the ticker symbols for Apple and Alphabet (Google)
ticker_symbols = ['AAPL', 'GOOGL']

# Define the start and end dates for the data
end_date = datetime.today()
start_date = end_date - timedelta(days=5 * 365)  # Approximately five years ago (ignores leap days)

# Download the data for each stock
stock_data = yf.download(ticker_symbols, start=start_date, end=end_date)

# Print the first few rows of the downloaded data
print(stock_data.head())