# Lab 2: Portfolio Optimization

In this lab we will explore:
- How to compute the return of a portfolio of assets.
- How to compute the volatility of a portfolio of assets.
- How to optimize a portfolio.

## Imports

In this lab we use the following new packages:
- `cvxpy`: Used for convex optimization and quadratic programming.
- `numpy`: Useful for matrix opperations.
- `matplotlib`: Plotting library.
- `seaborn`: A wrapper for `matplotlib` that makes charting a little easier.

In [2]:
import sf_quant.data as sfd
import sf_quant.optimizer as sfo
import polars as pl
import datetime as dt
import cvxpy as cp
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Data

Here we pull daily returns for Apple, Ford, Verizon, and Coke.

In [4]:
start = dt.date(2024, 1, 1)
end = dt.date(2024, 12, 31)

in_sample = [start, dt.date(2024, 5, 31)]
out_of_sample = [dt.date(2024, 6, 1), end]

columns = [
    'date',
    'barrid',
    'ticker',
    'return'
]

tickers = sorted(['AAPL', 'F', 'VZ', 'K'])

returns = (
        sfd.load_assets(
        start=start,
        end=end,
        in_universe=True,
        columns=columns
    )
    .filter(
        pl.col('ticker').is_in(tickers)
    )
    .with_columns(
        pl.col('return').truediv(100)
    )
)

returns

date,barrid,ticker,return
date,str,str,f64
2024-01-02,"""USAB1X1""","""AAPL""",-0.035787
2024-01-03,"""USAB1X1""","""AAPL""",-0.007488
2024-01-04,"""USAB1X1""","""AAPL""",-0.0127
2024-01-05,"""USAB1X1""","""AAPL""",-0.004013
2024-01-08,"""USAB1X1""","""AAPL""",0.024175
…,…,…,…
2024-12-24,"""USAHQN1""","""K""",0.001487
2024-12-26,"""USAHQN1""","""K""",0.003588
2024-12-27,"""USAHQN1""","""K""",0.00074
2024-12-30,"""USAHQN1""","""K""",-0.004312


## Expected Returns Forecast

In practice we will forecast returns using a combination of signal forecasts, but for demonstration purposes we will just use the average historical returns as our expected returns forecast.

### Instructions
1. Create a expected returns vector for the `in_sample` period (it should be 1x4)

In [5]:
def expected_returns_task(returns: pl.DataFrame) -> np.array:
    """
    Compute the average daily return of each asset for the in_sample period.

    Args:
        returns (pl.DataFrame): a data frame containing date, ticker, and return columns

    Returns:
        np.array: a numpy array (1x4) with an entry for the average return of each ticker (sorted alphabetically by ticker)
    """

    return (
        returns
        .filter(
            pl.col('date').is_between(*in_sample)
        )
        .group_by('ticker')
        .agg(
            pl.col('return').mean()
        )
        .sort('ticker')
        ['return'].to_numpy()
    )

er_vector = expected_returns_task(returns)

er_vector

array([0.00011135, 0.00052617, 0.00092021, 0.0012419 ])

## Covariance Matrix Forecast

In practice we will use forecasted covariance matrices from Barra, but for demonstration purposes we will use the historical covariance matrix as our forecast.

### Instructions

1. Create the covariance matrix using returns from the `in_sample` period.

In [11]:
returns.filter(
    pl.col('date').is_between(*in_sample)
).sort('ticker').pivot(on='ticker', index='date', values='return')

date,AAPL,F,K,VZ
date,f64,f64,f64,f64
2024-01-02,-0.035787,-0.002461,0.033089,0.0313
2024-01-03,-0.007488,-0.037007,-0.004848,0.007202
2024-01-04,-0.0127,-0.002562,-0.011134,0.005363
2024-01-05,-0.004013,0.014555,-0.003167,0.021082
2024-01-08,0.024175,0.010127,-0.002118,-0.002488
…,…,…,…,…
2024-05-24,0.016588,0.004129,-0.004729,0.007862
2024-05-28,0.000053,-0.039474,-0.025229,-0.009562
2024-05-29,0.001579,-0.010274,0.001681,-0.006606
2024-05-30,0.005255,0.018166,-0.004362,0.031458


In [13]:
def covariance_matrix_task(returns: pl.DataFrame) -> np.array:
    """
    Compute the historical covariance matrix using the returns from the in_sample period.

    Args:
        returns (pl.DataFrame): a data frame containing date, ticker, and return columns

    Returns:
        np.array: a numpy array (4x4) containing the covariances of the assets (columns and rows are sorted alphabetically by ticker)
    """

    return (
        pl.from_pandas(
            returns
            .filter(
                pl.col('date').is_between(*in_sample)
            )
            .sort('ticker')
            .pivot(on='ticker', index='date', values='return')
            .sort('date')
            .drop('date')
            .to_pandas()
            .cov()
        )
        .to_numpy()
    )

cov_mat = covariance_matrix_task(returns)

cov_mat

array([[ 2.03265610e-04,  4.40064300e-05, -5.49098285e-06,
        -1.03171749e-05],
       [ 4.40064300e-05,  4.23402071e-04,  7.82493618e-05,
         4.68156080e-05],
       [-5.49098285e-06,  7.82493618e-05,  1.99415916e-04,
         4.00302177e-05],
       [-1.03171749e-05,  4.68156080e-05,  4.00302177e-05,
         1.88445492e-04]])

## Mean Variance Optimization

In order to determine how much of our portfolio to invest in each assset, we will use mean variance optimization to determine the maximum sharpe ratio portfolio.

### Instructions
1. Create the objective function for the `cp.Problem`

utility = portfolio_return - 0.5 * gamma * portfolio_variance

2. Constrain the problem such that
- Weights sum to 1 (full investment)
- Weights are greater than 0 (long only)
- Weights are less than 1 (no buying on margin)

3. Find the optimal portfolio weights by solving the `cp.Problem`

In [15]:
def optimization_task(tickers: list[str], er_vector: np.array, cov_mat: np.array) -> pl.DataFrame:
    """
    Compute the optimal weights of the portfolio given the expected returns forecast and the covariance matrix forecast.

    Args:
        ticker (list[str]): a list of tickers sorted alphabetically
        er_vector (np.array): a numpy array (1x4) with an entry for the average return of each ticker (sorted alphabetically by ticker)
        cov_mat (np.array): a numpy array (4x4) containing the covariances of the assets (columns and rows are sorted alphabetically by ticker)

    Returns:
        pl.DataFrame: a data frame with columns ticker and weight
    """
    n_assets = cov_mat.shape[0]
    
    weights = cp.Variable(n_assets)

    gamma = 10

    portfolio_return = weights.T @ er_vector
    portfolio_variance = weights.T @ cov_mat @ weights

    objective = cp.Maximize(portfolio_return - 0.5 * gamma * portfolio_variance)

    constraints = [
        cp.sum(weights) == 1,
        weights >= 0,
        weights <= 1
    ]

    problem = cp.Problem(objective, constraints)
    problem.solve()

    return pl.DataFrame(
        {
            'ticker': tickers,
            'weight': weights.value
        }
    )

weights = optimization_task(tickers, er_vector, cov_mat)

weights

ticker,weight
str,f64
"""AAPL""",0.082826
"""F""",-1.8768e-23
"""K""",0.336433
"""VZ""",0.580741


## In Sample vs. Out of Sample Performance

Now that we've identified the optimal weights let's see if our optimized portfolio outperforms a simple equal weighted portfolio.

### Instructions
1. Compute the baseline returns for the `out_of_sample` period by computing the equal weight returns for each period.
2. Compute the optimized returns for the `out_of_sample` period by computing the weighted return of the portfolio for each period using our optimal weights.
3. Plot the cumulative returns of each strategy in one chart.
4. Compute the total return, average daily return (annualized), volatility (annualized), and sharpe ratio (annualized) for both strategies.
5. Go back and try a handful of different `gamma` values in your optimizer and see how your performance results change.
6. Write a few sentences commenting on the results.

In [20]:
def baseline_returns_task(returns: pl.DataFrame) -> pl.DataFrame:
    """
    Compute the average daily return of each asset for the out_of_sample period.

    Args:
        returns (pl.DataFrame): a data frame containing date, ticker, and return columns

    Returns:
        pl.DataFrame: a data frame containing date, return, and cumulative_return columns
    """

    return (
        returns
        .filter(
            pl.col('date').is_between(*out_of_sample)
        )
        .sort('date')
        .group_by('date')
        .agg(
            pl.col('return').mean()
        )
        .sort('date')
        .with_columns(
            cumulative_return = pl.col('return').add(1).cum_prod().sub(1)
        )
        .with_columns(
            pl.col('return', 'cumulative_return').mul(100)
        )
    )

baseline_returns = baseline_returns_task(returns)

baseline_returns

date,return,cumulative_return
date,f64,f64
2024-06-03,0.297425,0.297425
2024-06-04,0.313175,0.611531
2024-06-05,-0.12875,0.481994
2024-06-06,-0.0683,0.413365
2024-06-07,0.048225,0.461789
…,…,…
2024-12-24,0.539525,13.904218
2024-12-26,0.39435,14.353399
2024-12-27,-0.436875,13.853817
2024-12-30,-1.01365,12.699738


In [23]:
def optimized_returns_task(returns: pl.DataFrame, weights: pl.DataFrame) -> pl.DataFrame:
    """
    Compute the average daily return of each asset for the out_of_sample period.

    Args:
        returns (pl.DataFrame): a data frame containing date, ticker, and return columns
        weights (pl.DataFrame): a data frame with columns ticker and weight

    Returns:
        pl.DataFrame: a data frame containing date, return, and cumulative_return columns
    """

    return (
        returns
        .filter(
            pl.col('date').is_between(*out_of_sample)
        )
        .sort('date')
        .join(weights, on='ticker', how='left')
        .group_by('date')
        .agg(
            pl.col('return').mul('weight').mean()
        )
        .sort('date')
        .with_columns(
            cumulative_return = pl.col('return').add(1).cum_prod().sub(1)
        )
        .with_columns(
            pl.col('return', 'cumulative_return').mul(100)
        )
    )

optimized_returns = optimized_returns_task(returns, weights)

optimized_returns

date,return,cumulative_return
date,f64,f64
2024-06-03,-0.025471,-0.025471
2024-06-04,0.29175,0.266205
2024-06-05,-0.15173,0.114071
2024-06-06,0.019062,0.133154
2024-06-07,-0.175722,-0.042802
…,…,…
2024-12-24,-0.014614,3.652702
2024-12-26,0.095119,3.751296
2024-12-27,-0.035729,3.714227
2024-12-30,-0.180111,3.527426


In [None]:
# TODO: Chart the baseline and optimized cumulative returns

In [None]:
# TODO: Compute the total_return, average_return, volatility, and sharpe for the baseline and optimized strategies

Comment here on the results that you have found.

## `sf-quant` Optimizer Module

What if I told you that you don't have to write out the optimization code by hand again? Wouldn't that be great? Lucky for you we've pre-built a module in the `sf-quant` package that provides abstracted functionality for portfolio optimization.

### Instructions

- Create a `list` of constraints including `FullInvestment`, `LongOnly`, and `NoBuyingOnMargin`.
- Get the weights from the `sf_quant.optimizer.mve_optimizer` function using your previously computed returns forecast and covariance matrix.

Note: you should get really similar weights to your previous results.

In [None]:
def task_sf_optimizer_weights(tickers: list[str], er_vector: np.ndarray, cov_mat: np.ndarray) -> pl.DataFrame:
    """
    Compute the optimal weights of the portfolio given the expected returns forecast and the covariance matrix forecast.
    Make sure to use the sf_quant package.

    Args:
        ticker (list[str]): a list of tickers sorted alphabetically
        er_vector (np.array): a numpy array (1x4) with an entry for the average return of each ticker (sorted alphabetically by ticker)
        cov_mat (np.array): a numpy array (4x4) containing the covariances of the assets (columns and rows are sorted alphabetically by ticker)

    Returns:
        pl.DataFrame: a data frame with columns ticker and weight
    """
    
    # TODO: Finish function
    pass

weights_sf = task_sf_optimizer_weights(tickers, er_vector, cov_mat)

weights_sf

## Barra Covariance Matrix

Now let's improve our portfolio by using a better forecast for the covariance matrix. We are very fortunate to have access to the MSCI Barra covariance matrix forecasts. In this task you will load the Barra covariance matrix for the last day of your in sample period and compute the new optimal weights.

### Instructions

- Use the `sf_quant.data.construct_covariance_matrix` function to pull the covariance matrix for the last market date of the in sample period.
- Compute the optimal weights of your portfolio using this new covariance matrix.
- Hint: since the Barra covariance matrix is annualized, you will need to multiply your returns forecast by 252.

In [None]:
def task_barra_optimizer_weights(returns: pl.DataFrame) -> pl.DataFrame:
    """ 
    Compute the optimal weights of the portfolio given the expected returns forecast and the barra covariance matrix forecast.
    Make sure to use the sf_quant package.

    Args:
        returns (pl.DataFrame): a data frame containing date, ticker, and return columns

    Returns:
        pl.DataFrame: a data frame with columns barrid and weight
    """
    # TODO: Finish function
    pass

weights_barra = task_barra_optimizer_weights(returns)

weights_barra

## Performance Analysis

Now that we've got the Barra optimal weights, let's see how they perform compared to our out of sample baseline.

### Instructions
1. Compute the optimized barra returns for the `out_of_sample` period by computing the weighted return of the portfolio for each period using our optimal weights.
2. Plot the cumulative returns of each strategy in one chart.
3. Compute the total return, average daily return (annualized), volatility (annualized), and sharpe ratio (annualized) for both strategies.
4. Go back and try a handful of different `gamma` values in your optimizer and see how your performance results change.
5. Write a few sentences commenting on the results.

In [None]:
def optimized_barra_returns_task(returns: pl.DataFrame, weights_barra: pl.DataFrame) -> pl.DataFrame:
    """
    Compute the average daily return of each asset for the out_of_sample period.

    Args:
        returns (pl.DataFrame): a data frame containing date, ticker, and return columns
        weights_barra (pl.DataFrame): a data frame with columns ticker and weight

    Returns:
        pl.DataFrame: a data frame containing date, return, and cumulative_return columns
    """
    # TODO: Finish function
    pass

optimized_returns_barra = optimized_barra_returns_task(returns, weights_barra)

optimized_returns_barra

In [None]:
# TODO: Chart the baseline and optimized cumulative returns

In [None]:
# TODO: Compute the total_return, average_return, volatility, and sharpe for the baseline and optimized strategies

Comment here on the results that you have found.