<div style="background-color:#000;"><img src="pqn.png"></img></div>

In [None]:
import time
import numpy as np
import pandas as pd
import cudf
import cupy as cp

### Load and preprocess price data using pandas

First, we define a function that loads price data from a CSV file with pandas. It reads the file, sets the `date_time` column as the index, converts that index to datetime, and fills missing prices.

In [None]:
def get_prices_as_pandas(prices_file):
    d = pd.read_csv(prices_file)
    d.set_index("date_time", inplace=True)
    d.index = pd.to_datetime(d.index)
    return d.bfill().ffill()

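This function reads a CSV file of prices into a pandas DataFrame, sets the "date_time" column as the index, and converts it to datetime. It then fills missing values in two passes: `bfill()` replaces each gap with the next valid price, and `ffill()` fills any gap left at the end of the series with the last valid price. The result has no missing values and is ready for analysis. Because the backward fill runs first, interior gaps take a later price, which introduces a small look-ahead; calling `ffill()` first would fill them with earlier prices instead.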
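
To see exactly what the fill order does, the sketch below runs both orders on a small hypothetical price series with gaps at the start, in the middle, and at the end. The timestamps and the `AAA` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute price series with gaps at the start, middle, and end
idx = pd.to_datetime(["2024-01-02 09:30", "2024-01-02 09:31",
                      "2024-01-02 09:32", "2024-01-02 09:33",
                      "2024-01-02 09:34"])
prices = pd.DataFrame({"AAA": [np.nan, 10.0, np.nan, 11.0, np.nan]}, index=idx)

# Order used in get_prices_as_pandas: bfill fills the leading and interior gaps
# with the NEXT valid price, then ffill fills the trailing gap
filled = prices.bfill().ffill()
print(filled["AAA"].tolist())

# Reversed order: ffill carries the PREVIOUS price into the interior gap,
# and bfill is only needed for the leading gap
filled_past_first = prices.ffill().bfill()
print(filled_past_first["AAA"].tolist())
```

The first order gives `[10.0, 10.0, 11.0, 11.0, 11.0]` and the second gives `[10.0, 10.0, 10.0, 11.0, 11.0]`; only the 09:32 value differs.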

### Load and preprocess price data using cuDF

Next, we define an equivalent function that loads the price data with cuDF, the GPU-accelerated DataFrame library from NVIDIA RAPIDS whose API closely mirrors pandas. This lets us run the subsequent computations on the GPU.

In [None]:
def get_prices_as_cudf(prices_file):
    c = cudf.read_csv(prices_file)
    c.set_index("date_time", inplace=True)
    c.index = cudf.to_datetime(c.index)
    return c.bfill().ffill()

This function performs the same steps as the pandas version but uses cuDF, so the data lives in GPU memory. It reads the CSV file, sets the "date_time" column as the index, converts it to datetime, and fills missing values with the same backward-then-forward fill, leaving the data ready for GPU processing.
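
Because the pandas and cuDF calls used here are identical, one loader can serve both backends. The sketch below is a hypothetical `get_prices` helper (not part of the original notebook) that uses cuDF when it can be imported and otherwise falls back to pandas; it assumes an unavailable cuDF installation shows up as an `ImportError`.

```python
import pandas as pd

try:
    import cudf as xdf  # GPU DataFrame library, used when RAPIDS is installed
except ImportError:
    xdf = pd  # CPU fallback: pandas exposes the same calls used below


def get_prices(prices_file):
    """Hypothetical backend-agnostic loader mirroring get_prices_as_pandas/get_prices_as_cudf."""
    d = xdf.read_csv(prices_file)
    d = d.set_index("date_time")
    d.index = xdf.to_datetime(d.index)
    return d.bfill().ffill()


print(f"Loading prices with: {xdf.__name__}")
```

This keeps the notebook runnable on a laptop without a GPU, at the cost of hiding which backend actually ran, so printing `xdf.__name__` is worth keeping.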

### Compute optimal asset weights using pandas on the CPU

We will now compute the global minimum-variance portfolio, the point on the Markowitz mean-variance efficient frontier with the lowest variance, using pandas and NumPy on the CPU. This involves loading the price data, calculating returns, and solving for the fully invested weights (summing to 1) that minimize portfolio variance.
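
The closed form used in the code can be derived with a Lagrange multiplier. This derivation assumes the covariance matrix $\Sigma$ is invertible and that short positions (negative weights) are allowed, since no other constraints are imposed:

$$
\begin{aligned}
&\min_{w}\; w^\top \Sigma w \quad \text{subject to} \quad \mathbf{1}^\top w = 1 \\
&\mathcal{L}(w, \lambda) = w^\top \Sigma w - \lambda\,(\mathbf{1}^\top w - 1) \\
&\nabla_w \mathcal{L} = 2\Sigma w - \lambda \mathbf{1} = 0 \;\Rightarrow\; w = \tfrac{\lambda}{2}\,\Sigma^{-1}\mathbf{1} \\
&\mathbf{1}^\top w = 1 \;\Rightarrow\; \tfrac{\lambda}{2} = \frac{1}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}} \\
&w^* = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}
\end{aligned}
$$

In the code, `inv_cov_cpu.dot(ones_cpu)` computes $\Sigma^{-1}\mathbf{1}$, and dividing by `ones_cpu.T.dot(w_cpu)` applies the denominator $\mathbf{1}^\top \Sigma^{-1}\mathbf{1}$.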

In [None]:
print("=== Pandas (CPU) Computation ===")

In [None]:
df_pandas = get_prices_as_pandas("intraday_prices.csv")
n_assets = len(df_pandas.columns)

In [None]:
# Start the timer after loading so the CPU and GPU timings cover the same steps
start_cpu = time.time()

In [None]:
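# Per-period returns; the first row is NaN after pct_change and is dropped
df_returns_cpu = df_pandas.pct_change().dropna()
mean_returns_cpu = df_returns_cpu.mean()  # not needed for minimum-variance weights
cov_matrix_cpu = df_returns_cpu.cov()  # sample covariance matrix (ddof=1)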

In [None]:
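# Global minimum-variance weights: w = inv(S) @ 1 / (1' @ inv(S) @ 1)
inv_cov_cpu = np.linalg.inv(cov_matrix_cpu.values)
ones_cpu = np.ones((n_assets, 1))
w_cpu = inv_cov_cpu.dot(ones_cpu)  # unnormalized weights, shape (n_assets, 1)
w_cpu = w_cpu / (ones_cpu.T.dot(w_cpu))  # rescale so the weights sum to 1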

In [None]:
end_cpu = time.time()
cpu_elapsed = end_cpu - start_cpu

In [None]:
print(f"CPU elapsed time: {cpu_elapsed:.4f} seconds")
print(f"Optimal weights (first 5):\n{w_cpu[:5].flatten()}")
print()

We start by reading the price data and computing period-over-period returns with `pct_change`; because the file holds intraday prices, these are per-bar returns rather than daily returns. We then compute the mean returns and the covariance matrix $\Sigma$. The minimum-variance weights depend only on $\Sigma$, through the closed form $w^* = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1}\mathbf{1})$, so the mean returns are not used in this calculation. The elapsed time for these steps is printed, along with the first five weights.
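
A quick way to sanity-check the matrix formula is a two-asset case, where the minimum-variance weight has a well-known scalar form, $w_1 = (\sigma_2^2 - \sigma_{12}) / (\sigma_1^2 + \sigma_2^2 - 2\sigma_{12})$. The covariance values below are hypothetical. The sketch also swaps `np.linalg.inv` for `np.linalg.solve`, which computes $\Sigma^{-1}\mathbf{1}$ without forming the inverse and is generally the more numerically stable choice.

```python
import numpy as np

# Hypothetical two-asset covariance matrix: variances 0.04 and 0.09, covariance 0.006
cov = np.array([[0.040, 0.006],
                [0.006, 0.090]])
ones = np.ones((2, 1))

# Matrix form from the notebook, using solve() instead of an explicit inverse
raw = np.linalg.solve(cov, ones)
w_matrix = (raw / ones.T.dot(raw)).flatten()

# Scalar two-asset minimum-variance formula for comparison
s11, s22, s12 = cov[0, 0], cov[1, 1], cov[0, 1]
w1 = (s22 - s12) / (s11 + s22 - 2 * s12)
w_closed = np.array([w1, 1 - w1])

# The minimum-variance portfolio should beat an equal-weight portfolio on variance
w_equal = np.array([0.5, 0.5])
print("matrix form:", w_matrix)
print("scalar form:", w_closed)
print("variance (min-var vs equal):", w_matrix @ cov @ w_matrix, w_equal @ cov @ w_equal)
```

Both forms give weights of roughly 0.712 and 0.288, and the resulting variance (about 0.0302) is below the equal-weight variance of 0.0355.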

### Perform the same computations using cuDF and CuPy on the GPU

Now, we perform the same computations with cuDF and CuPy to take advantage of the GPU. The steps mirror the CPU version, but the returns, covariance matrix, and linear algebra run on the GPU.

In [None]:
print("=== cuDF (GPU) Computation ===")

In [None]:
df_cudf = get_prices_as_cudf("intraday_prices.csv")
n_assets = len(df_cudf.columns)

In [None]:
start_gpu = time.time()

In [None]:
df_returns_gpu = df_cudf.pct_change().dropna()
mean_returns_gpu = df_returns_gpu.mean()
cov_matrix_gpu = df_returns_gpu.cov()

In [None]:
# cuDF's .values returns a CuPy array, so the covariance matrix stays on the GPU
cov_cp = cov_matrix_gpu.values
inv_cov_gpu = cp.linalg.inv(cov_cp)

In [None]:
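# Same closed form as on the CPU: w = inv(S) @ 1 / (1' @ inv(S) @ 1)
ones_gpu = cp.ones((n_assets, 1))
w_gpu = inv_cov_gpu.dot(ones_gpu)  # unnormalized weights, shape (n_assets, 1)
w_gpu = w_gpu / (ones_gpu.T.dot(w_gpu))  # rescale so the weights sum to 1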

In [None]:
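# CuPy launches GPU work asynchronously; wait for it to finish before stopping the timer
cp.cuda.Device().synchronize()
end_gpu = time.time()
gpu_elapsed = end_gpu - start_gpu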

In [None]:
print(f"GPU elapsed time: {gpu_elapsed:.4f} seconds")
print(f"Optimal weights (first 5):\n{w_gpu[:5].flatten()}")
print()

This code mirrors the CPU version using cuDF and CuPy. cuDF computes the returns, mean returns, and covariance matrix on the GPU; `.values` exposes the covariance matrix as a CuPy array, and CuPy performs the inversion and weight calculation. The GPU is not guaranteed to be faster: with only a handful of assets the covariance matrix is tiny, and the first GPU calls also pay one-time costs such as CUDA context setup and kernel compilation. We print the elapsed time and the first five weights, which should match the CPU results up to floating-point error.
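
Since NumPy and CuPy share most of their array API, the weight calculation can be written once and pointed at either library. The sketch below is a hypothetical `min_variance_weights` helper (not from the original notebook) that takes the array module as a parameter; it assumes the returns array has no NaNs, and it uses `cov` with `rowvar=False` (columns are assets) and `linalg.solve`, both of which exist in NumPy and CuPy.

```python
import numpy as np


def min_variance_weights(returns, xp=np):
    """Global minimum-variance weights from a (T x N) array of returns.

    xp is the array module: numpy on the CPU, or cupy on the GPU.
    Hypothetical helper; assumes returns contain no NaNs.
    """
    cov = xp.cov(returns, rowvar=False)  # N x N sample covariance (ddof=1, like pandas)
    ones = xp.ones((cov.shape[0], 1))
    raw = xp.linalg.solve(cov, ones)  # inv(cov) @ 1 without an explicit inverse
    return raw / ones.T.dot(raw)  # rescale so the weights sum to 1


# CPU usage with hypothetical random returns for four assets
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(500, 4))
w = min_variance_weights(returns)
print(w.flatten())

# On a GPU, the same function would run as:
#   import cupy as cp
#   w_gpu = min_variance_weights(cp.asarray(returns), xp=cp)
```

This keeps a single source of truth for the math, so the CPU and GPU paths cannot drift apart.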

### Compare the computation times between CPU and GPU

Finally, we calculate the speedup achieved by using the GPU over the CPU. This comparison helps us understand the benefits of GPU acceleration for financial computations.

In [None]:
speedup = cpu_elapsed / gpu_elapsed if gpu_elapsed > 0 else float('inf')
print(f"Speedup (CPU / GPU) ~ {speedup:.2f}x")

Here, we compute the speedup as the ratio of CPU time to GPU time; a value above 1 means the GPU was faster. Keep in mind that each timing comes from a single run, so it is noisy, and that GPU gains typically appear only once the data is large enough to outweigh kernel-launch and memory-transfer overhead. Measuring both approaches on your own data and hardware is the most reliable way to decide which one to use.
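
A single `time.time()` measurement can be dominated by one-time setup or by asynchronous GPU work that has not finished yet. The sketch below is a hypothetical `benchmark` helper that runs a warm-up call first, optionally synchronizes after each run, and reports the median of several runs. For CuPy code you would pass `sync=cp.cuda.Device().synchronize`; the CPU example here needs no synchronization.

```python
import time

import numpy as np


def benchmark(fn, repeats=5, sync=None):
    """Run fn once to warm up, then return the median wall time of `repeats` runs.

    sync is an optional callable that blocks until queued work finishes,
    e.g. cupy.cuda.Device().synchronize for GPU code (hypothetical helper).
    """
    fn()  # warm-up: absorbs one-time costs such as CUDA context setup or kernel compilation
    if sync is not None:
        sync()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous GPU kernels before reading the clock
        times.append(time.perf_counter() - start)
    return float(np.median(times))


# CPU example: invert a hypothetical 50 x 50 covariance matrix
rng = np.random.default_rng(0)
cov = np.cov(rng.normal(size=(1000, 50)), rowvar=False)
cpu_time = benchmark(lambda: np.linalg.inv(cov))
print(f"Median CPU time: {cpu_time:.6f} seconds")
```

Timing the CPU and GPU versions with the same helper, on the same steps, gives a speedup ratio that is much less sensitive to one-off effects.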

### Your next steps

You've seen how to compute optimal portfolio weights using both CPU and GPU. Try modifying the code to use a different dataset or change the optimization criteria. Experiment with different parameter values or methods to see how they affect the results. This hands-on practice will deepen your understanding of the optimization process and its applications in finance.
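
One way to change the optimization criterion is to use the mean returns the notebook already computes. The sketch below swaps in the maximum-Sharpe (tangency) portfolio, whose unconstrained weights are proportional to $\Sigma^{-1}(\mu - r_f\mathbf{1})$. It is a hypothetical helper, not part of the original notebook; it assumes short positions are allowed, $\Sigma$ is invertible, and the unnormalized weights have a positive sum (otherwise the rescaling flips the portfolio). With the notebook's data, $\mu$ and $r_f$ must both be expressed per bar.

```python
import numpy as np


def tangency_weights(mean_returns, cov, risk_free=0.0):
    """Maximum-Sharpe (tangency) weights with short positions allowed.

    Solves cov @ z = (mu - rf) and rescales z to sum to 1. Assumes cov is
    invertible and sum(z) > 0; otherwise the rescaling flips the portfolio.
    """
    excess = np.asarray(mean_returns, dtype=float).reshape(-1, 1) - risk_free
    z = np.linalg.solve(np.asarray(cov, dtype=float), excess)
    return (z / z.sum()).flatten()


def sharpe(w, mean_returns, cov, risk_free=0.0):
    """Sharpe ratio of fully invested weights w."""
    return (w @ mean_returns - risk_free) / np.sqrt(w @ cov @ w)


# Hypothetical inputs: two uncorrelated assets
mu = np.array([0.08, 0.09])
cov = np.diag([0.04, 0.09])
w_tan = tangency_weights(mu, cov, risk_free=0.02)
print(w_tan, sharpe(w_tan, mu, cov, 0.02))

# With the notebook's variables this would be, for example:
#   tangency_weights(mean_returns_cpu.values, cov_matrix_cpu.values)
```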

<a href="https://pyquantnews.com/">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href="https://gettingstartedwithpythonforquantfinance.com/">get started with Python for quant finance</a>. For educational purposes. Not investment advice. Use at your own risk.