In [1]:
# imports
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
import time
import pickle
import matplotlib.pyplot as plt
import seaborn as sns

# IEOR 4500. Project 5. Pairs trading

In this project we address the basic elements of the pairs-trading strategy.

## Notation:
$p_i^t$ denotes the (closing) price of asset $i$ at time $t$.

The basic premise is as follows. Consider a pair $(i,j)$ of assets. When we invest $k$ dollars in this pair at time $t$, we do the following:

- We take the position $k$ in asset $i$.
- We take the position $-k$ in asset $j$.

The value of this position is computed as follows:

- The number of shares of asset $i$ held equals $k/p_i^t$.
- The value of the position in asset $i$, at time $t+1$, equals $kp_i^{t+1}/p_i^t$.
- The value of the position in asset $j$, at time $t+1$, equals $-kp_j^{t+1}/p_j^t$.

Hence, if we close the pair position at time $t+1$, the value we accrue (gain or loss) equals
$$kp_i^{t+1}/p_i^t - kp_j^{t+1}/p_j^t.$$

Throughout, you may have assumed that $k > 0$, i.e., that we are long $i$ and short $j$. However, make sure you understand that the formula is also correct when $k < 0$, i.e., when we short $i$ and go long $j$.
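As a quick numeric check of the closing-value formula, the sketch below evaluates it for both signs of $k$. The function name `pair_value` and the prices are made up for illustration and are not part of the project data.

```python
def pair_value(k: float, p_i_t: float, p_i_next: float,
               p_j_t: float, p_j_next: float) -> float:
    """Value at time t+1 of a position k in asset i and -k in asset j opened at time t."""
    return k * p_i_next / p_i_t - k * p_j_next / p_j_t

# Hypothetical prices: asset i rises 10%, asset j rises 5%.
long_i_short_j = pair_value(100.0, 50.0, 55.0, 20.0, 21.0)
short_i_long_j = pair_value(-100.0, 50.0, 55.0, 20.0, 21.0)
print(long_i_short_j, short_i_long_j)
```

Flipping the sign of $k$ flips the sign of the result, which is why the same formula covers both directions of the trade.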

Denote
$$\Delta_{ij}^t = p_i^{t+1}/p_i^t - p_j^{t+1}/p_j^t,$$
and
$$\bar{\Delta}_{ij} = \frac{1}{T} \sum_{t=0}^{T-1} \Delta_{ij}^t.$$

The optimization problem we want to solve is:

Minimize
$$-\sum_{i<j} x_{ij}\bar{\Delta}_{ij} + \theta \left( \frac{1}{T-1} \sum_{t=0}^{T-1} \Big( \sum_{i<j} x_{ij} (\Delta_{ij}^t - \bar{\Delta}_{ij}) \Big)^2 \right) \tag{1a}$$

Subject to
$$-1 \leq x_{ij} \leq 1 \quad \text{for all pairs } i < j. \tag{1b}$$

Here, $\theta \geq 0$ is a risk-tolerance parameter. Your code should work for values of $\theta$ ranging from very small to large, e.g., $0 \leq \theta \leq 10^6$.
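The per-period return differences and their averages, which are the inputs to the problem above, can be computed from a price matrix in a few vectorized steps. The sketch below is one way to do it, assuming a `(T+1, n)` array of closing prices with dates in rows, assets in columns, and no missing values. The names `pair_deltas` and `prices` are hypothetical, and the toy prices are invented.

```python
import numpy as np
from itertools import combinations

def pair_deltas(prices: np.ndarray):
    """Return (pairs, delta, delta_bar) for a (T+1, n) array of closing prices.

    delta[t, m] is the gross return of asset i minus that of asset j between
    t and t+1, where (i, j) = pairs[m] and i < j.
    """
    ratios = prices[1:] / prices[:-1]              # shape (T, n): p^{t+1} / p^t
    pairs = list(combinations(range(prices.shape[1]), 2))
    i_idx = [i for i, _ in pairs]
    j_idx = [j for _, j in pairs]
    delta = ratios[:, i_idx] - ratios[:, j_idx]    # shape (T, number of pairs)
    delta_bar = delta.mean(axis=0)                 # average over the T periods
    return pairs, delta, delta_bar

# Toy example with 3 assets and 2 periods (invented prices).
prices = np.array([[10.0, 20.0, 40.0],
                   [11.0, 20.0, 44.0],
                   [11.0, 22.0, 44.0]])
pairs, delta, delta_bar = pair_deltas(prices)
delta_centered = delta - delta_bar                 # deviations from the per-pair mean
```

With $n$ assets there are $n(n-1)/2$ pairs, so for thousands of names the `delta` array grows quickly; restricting the pair list before building it may be necessary in practice.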

1. Implement a first-order method, using projected gradients, for this problem. Yes, you can also attempt to handle it using a solver, but I want to see the first-order implementation.

2. You should test it using the daily data that I have uploaded: Wilshire 5000 and Russell 1000. Using the first data set, you should be able to get at least 3000 names with more than 1000 valid data values that are date-aligned.
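A minimal sketch of the projected-gradient iteration asked for in item 1 is shown below. It assumes the risk term is $\theta/(T-1)$ times the squared 2-norm of the centered deltas applied to $x$, uses a fixed step of one over the gradient's Lipschitz constant, and runs on synthetic data rather than the uploaded files. The names `projected_gradient` and `objective` are hypothetical, and this is only one reasonable setup, not the required implementation.

```python
import numpy as np

def projected_gradient(delta_bar, delta_centered, theta, x0=None,
                       max_iter=5000, tol=1e-10):
    """Minimize -delta_bar @ x + theta/(T-1) * ||delta_centered @ x||^2 over -1 <= x <= 1."""
    T, P = delta_centered.shape
    scale = theta / (T - 1)
    # Lipschitz constant of the gradient: 2 * scale * largest eigenvalue of D^T D.
    lip = 2.0 * scale * np.linalg.norm(delta_centered, 2) ** 2
    step = 1.0 / max(lip, 1e-12)   # very large step when theta = 0 (linear objective)
    x = np.zeros(P) if x0 is None else np.clip(x0, -1.0, 1.0)
    for _ in range(max_iter):
        grad = -delta_bar + 2.0 * scale * (delta_centered.T @ (delta_centered @ x))
        x_new = np.clip(x - step * grad, -1.0, 1.0)   # project onto the box
        if np.linalg.norm(x_new - x) <= tol:
            x = x_new
            break
        x = x_new
    return x

def objective(x, delta_bar, delta_centered, theta):
    """Objective value under the same assumed form of the risk term."""
    T = delta_centered.shape[0]
    return -delta_bar @ x + theta / (T - 1) * np.sum((delta_centered @ x) ** 2)

# Synthetic stand-in for real data: T = 250 periods, P = 6 pairs.
rng = np.random.default_rng(0)
delta = 0.01 * rng.standard_normal((250, 6)) + 0.001
delta_bar = delta.mean(axis=0)
delta_centered = delta - delta_bar

x_lin = projected_gradient(delta_bar, delta_centered, theta=0.0)    # pure return chasing
x_reg = projected_gradient(delta_bar, delta_centered, theta=100.0)  # with risk penalty
```

For the box constraint, the projection is just a componentwise clip, which is what makes a first-order method attractive here. With $\theta = 0$ the objective is linear and the iterate lands on a corner of the box, with each component equal to the sign of the corresponding average delta.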


In [2]:
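# reading in data
data_folder = './data'
index_name = 'closeRussell1000'

# load the precomputed objects for the chosen index, keyed by file ending
stuff = dict()
file_endings = ['delta', 'delta_bar', 'delta_centered', 'pair_names']
for end in file_endings:
    # use data_folder instead of a hard-coded path; fh avoids shadowing the objective f defined below
    with open(f'{data_folder}/{index_name}_{end}.pkl', 'rb') as fh:
        stuff[end] = pickle.load(fh)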

In [11]:
data = np.array([[1,2,3], [np.nan,2,3]])

In [13]:
data[:, 1][np.isnan(data)]

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
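The error above arises because `data[:, 1]` is one-dimensional while `np.isnan(data)` is a two-dimensional boolean mask, and a boolean mask must match the shape of the array it indexes. The sketch below shows three alternatives that do run; which one is appropriate depends on what the cell was meant to select, which the notebook does not say.

```python
import numpy as np

data = np.array([[1, 2, 3], [np.nan, 2, 3]])

# Option 1: mask a column with that column's own 1-D mask.
col = data[:, 1]
nan_in_col = col[np.isnan(col)]                  # empty here: column 1 has no NaN

# Option 2: index the full 2-D array with the full 2-D mask.
all_nans = data[np.isnan(data)]                  # one entry: the NaN at position (1, 0)

# Option 3: keep only rows with no missing values.
clean_rows = data[~np.isnan(data).any(axis=1)]
```

The third form is the kind of filter that could be used to align series on dates where every retained name has a valid price, assuming rows are dates and columns are names.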

In [None]:
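# Arrays loaded above; assumed shapes: delta_bar (P,), delta_centered (T, P)
delta_bar = np.asarray(stuff['delta_bar'])
delta_centered = np.asarray(stuff['delta_centered'])

def f(x: np.ndarray, theta: float, pi: float) -> float:
    '''Objective to minimize: -delta_bar.x + theta/(T-1) * sum_t |(delta_centered @ x)_t|**pi. Vectorized.'''
    t = delta_centered.shape[0]  # number of time periods T (len(x) would be the number of pairs)
    return -np.dot(delta_bar, x) + theta / (t - 1) * np.linalg.norm(delta_centered @ x, pi)**pi

def g(x: np.ndarray, theta: float, pi: float) -> np.ndarray:
    '''Gradient of f with respect to x. Vectorized.'''
    t = delta_centered.shape[0]
    delta_centered_at_x = delta_centered @ x
    # d/dr |r|**pi = pi * sign(r) * |r|**(pi-1); this reduces to pi * r when pi = 2
    dr = np.sign(delta_centered_at_x) * np.abs(delta_centered_at_x)**(pi - 1)
    return -delta_bar + (theta / (t - 1) * pi) * (dr @ delta_centered)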

def gradient_descent() -> tuple:
    '''Gradient descent function. Using gradient normalization, momentum, clipping, and batches. Vectorized.'''
    # for iter in iters:
    #   for batch in batches:
    #       eval grad => norm grad => step with momentum => clip it => check convergence
    pass
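
One possible reading of the outline in the comments above is sketched below for the quadratic case (`pi = 2`). It is an assumption-laden illustration, not the notebook's implementation: the name `minibatch_projected_gd`, all default hyperparameters, the decaying step size, and the interpretation of "clip" as projection onto the box $[-1, 1]$ are choices made here, and the data is synthetic.

```python
import numpy as np

def minibatch_projected_gd(delta_bar, delta_centered, theta, x0,
                           iters=200, batch_size=50, lr=0.05,
                           momentum=0.9, tol=1e-8, seed=0):
    """Hypothetical mini-batch solver for the pi = 2 case. Returns (x, passes run)."""
    rng = np.random.default_rng(seed)
    T = delta_centered.shape[0]
    x = np.clip(np.asarray(x0, dtype=float), -1.0, 1.0)
    velocity = np.zeros_like(x)
    for it in range(iters):
        x_start = x.copy()
        step = lr / (1.0 + it)                       # decaying step size
        order = rng.permutation(T)                   # reshuffle the periods each pass
        for start in range(0, T, batch_size):
            rows = delta_centered[order[start:start + batch_size]]
            # Batch estimate of the full gradient of the risk term.
            risk_scale = 2.0 * theta * T / ((T - 1) * len(rows))
            grad = -delta_bar + risk_scale * (rows.T @ (rows @ x))
            norm = np.linalg.norm(grad)
            if norm > 0.0:
                grad = grad / norm                   # gradient normalization
            velocity = momentum * velocity - step * grad
            x = np.clip(x + velocity, -1.0, 1.0)     # clip back into the box
        if np.linalg.norm(x - x_start) <= tol:       # no movement over a full pass
            break
    return x, it + 1

# Synthetic check: exaggerated average deltas so the theta = 0 answer is obvious.
rng = np.random.default_rng(1)
delta_centered_demo = 0.01 * rng.standard_normal((250, 3))
delta_centered_demo -= delta_centered_demo.mean(axis=0)
delta_bar_demo = np.array([0.5, -0.5, 0.5])

x_lin, passes_lin = minibatch_projected_gd(delta_bar_demo, delta_centered_demo, 0.0, np.zeros(3))
x_reg, passes_reg = minibatch_projected_gd(delta_bar_demo, delta_centered_demo, 1e4, np.zeros(3))
```

Because the gradient is normalized, the step length is set entirely by `lr`, so a constant step would keep the iterate oscillating around an interior solution; the decay used here is one simple way to damp that.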

In [None]:
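# number of pairs, taken as the length of delta_bar (an assumption about the pickle contents)
p = len(stuff['delta_bar'])

params = dict(
    x_0 = np.random.uniform(-1, 1, p),  # random feasible start inside the box [-1, 1]^p
)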