# Sharpe Ratio

## Using LSTMs to Predict Sharpe Ratio of the Next Quarter

In [1]:
import pandas as pd
import yfinance as yf  # Yahoo Finance

In [2]:
import numpy as np
import json
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset

from sklearn.decomposition import PCA

In this notebook, we will be attempting to predict the quarterly sharpe ratio of several commodity futures. All data in this notebook was collected via Yahoo Finance's API. We will predict the sharpe ratio using the futures with the nearest expiration dates, specifically:

1. Corn/Maize Futures (ZC=F), daily close values.
2. Soybean Futures (ZS=F), daily close values.
3. Wheat Futures (ZW=F), daily close values.
4. Oats Futures (ZO=F), daily close values.

To make predictions, we will use two different types of features - static and serial features. Static features are the word embedding (GloVe) of the name of the future contract (Corn / Soybeans / etc.), as they do not change with time there is no reason to use them in the LSTM model. On the other hand, serial features are the quarterly Sharpe Ratios and the years / quarters of the value. Overall, the data for each future contract includes a series of all sharpe ratios for every quarter available in yahoo finance and a vector of size 10 including the embeddings of the future contract's name and category.

The word embeddings come frome gensim, a python module that collected data from wikipedia to train its GloVe model. GloVe is an open-source project by Stanford launched in 2014. Briefly, GloVe learns a vector representation of words based on their neighborhoods and how they appear in each others contexts. The goal is for two words with embeddings $w_1, w_2$:

$$
w_i^T w_j \approx \ln P_{ij} + b_i + b_j
$$

where $P_{ij}$ is the probability that $j$ is in the context (some window around the word) of $i$ when $i$ is a randomly sampled occurence of $i$ in the corpus.

In [3]:
import gensim.downloader as api

# Load word vectors
word_vectors = api.load("glove-wiki-gigaword-100")




In [4]:
def get_vector(string):
    sol = np.zeros(100)
    for word in string.lower().split():
        try:
            sol += word_vectors[word]
        except:
            continue
    return sol

In [5]:
with open('commodities_tickers.json', 'r') as f:
    data = json.load(f)

categories = data.keys()
categories_vectors = {category: get_vector(category) for category in categories}

data_reshaped = {key2: [key1, value2] for key1, value1 in data.items() for key2, value2 in value1.items()}

# make matrix of word vectors for each commodity
commodities = data_reshaped.keys()
commodities_vectors = np.array([get_vector(commodity) for commodity in commodities])

# apply PCA to reduce the dimensionality of the word vectors
pca = PCA(n_components=5)
commodities_vectors_pca = pca.fit_transform(commodities_vectors)
categories_vectors_pca = pca.transform(np.array(list(categories_vectors.values())))
categories_vectors_pca = {key: value for key, value in zip(categories, categories_vectors_pca)}

data_reshaped = {key: [np.concatenate([categories_vectors_pca[value[0]], commodities_vectors_pca[i]]), value[1]] for i, (key, value) in enumerate(data_reshaped.items())}

In order to compute the sharpe ratio, we must find a risk-free stock to compare our stocks' returns to. In this case we chose for the risk-free stock to be ^IRX (13 WEEK USA TREASURY BILL). All values used will be CLOSE values. The data collected is the maximum available in yahoo finance.

In [6]:
RISK_FREE = yf.Ticker('^IRX')
RISK_FREE = RISK_FREE.history(period='max')
RISK_FREE['Datetime'] = RISK_FREE.index
RISK_FREE.reset_index(drop=True, inplace=True)
RISK_FREE['Date'] = RISK_FREE['Datetime'].dt.date
RISK_FREE['Returns-RF'] = RISK_FREE['Close'].pct_change()

def retrieve_feature(product, data_reshaped):
    return yf.Ticker(data_reshaped[product][1]).history(period='max')

def feature_extraction(data):
    data['Datetime'] = data.index
    data.reset_index(drop=True, inplace=True)
    data['Date'] = data['Datetime'].dt.date
    data.dropna(inplace=True)
    data['Return'] = data['Close'].pct_change()
    return data[['Datetime', 'Date', 'Return']]

def risk_free_return(data):
    data = data.merge(RISK_FREE[['Date', 'Returns-RF']], on='Date', how='left')
    data.rename(columns={'Returns-RF': 'rf'}, inplace=True)
    data.dropna(inplace=True)
    data['Risk-Adjusted Return'] = data['Return'] - data['rf']
    return data[['Datetime', 'Risk-Adjusted Return']]

def sharpe_ratio(data):
    # add feature of quarter such that jan/feb/mar -> 1, apr/may/jun -> 2, jul/aug/sep -> 3, oct/nov/dec -> 4
    data['Quarter'] = data['Datetime'].dt.quarter
    # group by quarter and year, compute the mean and standard deviation of risk-adjusted return
    data_grouped = data.groupby([data['Datetime'].dt.year, 'Quarter'])

    mean = data_grouped['Risk-Adjusted Return'].mean()
    std = data_grouped['Risk-Adjusted Return'].std()
    length = data_grouped['Risk-Adjusted Return'].count()

    sharp_ratio = mean / std * (length ** 0.5)
    return sharp_ratio.reset_index()

In [7]:
def get_final_series(product, data_reshaped):
    """
    Given the name of a product and the dictionary of all products, return the financial data of the product, the risk-adjusted return, and the Sharpe ratio.
    """
    data = retrieve_feature(product, data_reshaped)
    data = feature_extraction(data)
    data = risk_free_return(data)
    sharpe = sharpe_ratio(data)
    values = sharpe.values
    # for every row in values, concatenate the corresponding vector in data_reshaped
    meta_data = data_reshaped[product][0]
    # copy meta_data for each row in values
    meta_data = np.tile(meta_data, (values.shape[0], 1))
    return np.concatenate([meta_data, values], axis=1)


We will train on all financial data that we have except for the cereals we are interested in predicting.

In [8]:
data_train = [torch.tensor(get_final_series(product, data_reshaped)) for product in data_reshaped.keys() if product not in ['Corn', 'Soybeans', 'Wheat', 'Oats']]
data_test = {product: torch.tensor(get_final_series(product, data_reshaped)) for product in ['Corn', 'Soybeans', 'Wheat', 'Oats']}

Predictor is an LSTM. The theoretical background of LSTMs is in the file deep_sequential.ipynb. Hyperparameter training was used, but we only present the final product.

Only the sequential data is inserted to the LSTM part, because the static data is the same over all time points. After the LSTM returns a final vector for the serial input of values (year/quarter/sharpe ratio), we concatenate that information with the meta-information about the futures (GloVe embeddings) and insert it into a fully connected network with one hidden layer and a

In [9]:
class Predictor(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, device):
        super(Predictor, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(3, hidden_size, num_layers, batch_first=True)
        self.fc1 = nn.Linear(hidden_size + 10, 8)
        self.fc2 = nn.Linear(8, output_size)
        self.device = device

    def forward(self, x):
        x_serial = x[:, :, -3:]
        # print(x_serial)
        x_static = x[:, 0, :-3]
        # print(x_static)

        out, (h_n, c_n) = self.lstm(x_serial)
        # print(torch.cat([out[:, -1, :], x_static], dim=1))
        out = F.relu(self.fc1(torch.cat([out[:, -1, :], x_static], dim=1)))
        return self.fc2(out)

In [10]:
def create_inputs(series, device):
    """
    Every vector in the sequence should predict the last index (sharpe ratio) of the next vector in the sequence. Hence the last row of the input should not be used. In the end, we are only interested in predicting the final sharpe ratio of the sequence.
    """
    x = series.unsqueeze(0).float().to(device)
    x = x[:, :-1, :]
    y = series[-1, -1].unsqueeze(0).unsqueeze(0).float().to(device)
    return x, y

In [11]:
def create_inputs2(series, device):
    train_data_size = int(0.8 * series.shape[0])
    train_data = series[:train_data_size]
    test_data = series[train_data_size:-1]
    train_x = train_data.unsqueeze(0).float().to(device)
    val_x = test_data.unsqueeze(0).float().to(device)
    train_y = series[train_data_size, -1].unsqueeze(0).unsqueeze(0).float().to(device)
    val_y = series[-1, -1].unsqueeze(0).unsqueeze(0).float().to(device)
    return train_x, train_y, val_x, val_y

In [12]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Predictor(13, 64, 2, 1, device)
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
criterion = nn.MSELoss()

running_loss = np.inf
for epoch in range(1, 101):
    previous_running_loss = running_loss
    running_loss = 0
    for data in data_train:
        train_x, train_y = create_inputs(data, device)
        optimizer.zero_grad()
        outputs = model(train_x)

        loss = criterion(outputs, train_y)
        running_loss += loss.item()
        loss.backward()
        optimizer.step()
    if abs(previous_running_loss - running_loss) < 1e-4:
      print(f'Epoch {epoch}, Loss {running_loss / len(data_train)}')
      break
    if epoch % 10 == 0 or epoch == 1:
        print(f'Epoch {epoch}, Loss {running_loss / len(data_train)}')


Epoch 1, Loss 0.9990135297801317
Epoch 10, Loss 0.49878592299646696
Epoch 20, Loss 0.4208293967647478
Epoch 30, Loss 0.3849711169543298
Epoch 40, Loss 0.3612002514518281
Epoch 50, Loss 0.3445150070904447
Epoch 60, Loss 0.32934238641610136
Epoch 70, Loss 0.3140702413373447
Epoch 80, Loss 0.30151009191991995
Epoch 90, Loss 0.2901322982099373
Epoch 100, Loss 0.27983164646536807


In [13]:
with torch.no_grad():
    for product in data_test:
        data = data_test[product]
        val_x, val_y = create_inputs(data, device)
        outputs = model(val_x)
        # compute absolute error
        error = (val_y - outputs) ** 2
        try:
            overall = torch.stack(overall, error)
        except:
            overall = error

print('Test MSE:', overall.mean().item())

Test MSE: 4.091559410095215


Unfortunately we get a relatively high MSE on our test set. This score is high because Sharpe ratio doesn't tend to reach such high values as it is normalized. This may be because our model overfit the training data, or many other reasons such as not taking macro events into consideration, Sharpe ratio containing a large element of randomness, etc. Regardless, below we can see the predictions of the Sharpe Ratios in the next quarter (final quarter of 2024), based on all past data and the embeddings.

In [14]:
with torch.no_grad():
    for product in data_test:
        data = data_test[product]
        x = data.unsqueeze(0).float().to(device)
        outputs = model(x)
        print(f'Sharpe ratio prediction for {product} for next quarter: {outputs.item()}')

Sharpe ratio prediction for Corn for next quarter: -0.43392103910446167
Sharpe ratio prediction for Soybeans for next quarter: -0.33158957958221436
Sharpe ratio prediction for Wheat for next quarter: -0.4126572608947754
Sharpe ratio prediction for Oats for next quarter: -0.15434390306472778
