# Midterm 2

## FINM 36700 - 2024

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

# Instructions

## Please note the following:

Points
* The exam is 100 points.
* You have 120 minutes to complete the exam.
* For every minute late you submit the exam, you will lose one point.


Submission
* You will upload your solution to the `Midterm 2` assignment on Canvas, where you downloaded this. 
* Be sure to **submit** on Canvas, not just **save** on Canvas.
* Your submission should be readable (the graders must be able to understand your answers).
* Your submission should **include all code used in your analysis, in a file format in which the code can be executed.** 

Rules
* The exam is open-material, closed-communication.
* You do not need to cite material from the course github repo - you are welcome to use the code posted there without citation.

Advice
* If you find any question to be unclear, state your interpretation and proceed. We will only answer questions of interpretation if there is a typo, error, etc.
* The exam will be graded for partial credit.

## Data

**All data files are found in the class github repo, in the `data` folder.**

This exam makes use of the following data files:
* `midterm_2_data.xlsx`

This file contains the following sheets:
- for Section 2:
    * `sector excess returns` - MONTHLY excess returns for 49 sector stocks
    * `factors excess returns` - MONTHLY excess returns of AQR factor model from Homework 5
- for Section 3:
    * `factors excess returns` - MONTHLY excess returns of AQR factor model from Homework 5

## Scoring

| Problem | Points |
|---------|--------|
| 1       | 25     |
| 2       | 40     |
| 3       | 35     |

### Each numbered question is worth 5 points unless otherwise specified.

# 1. Short Answer

#### No Data Needed

These problems do not require any data file. Rather, analyze them conceptually. 

### 1.1.

Historically, which pricing factor among the ones we studied has shown a considerable decrease in importance?

Historically, within the Fama-French five-factor (FF5F) model, the value factor (HML) has shown a considerable decrease in importance. HML measures the return difference between stocks with high and low book-to-market ratios. Over recent decades, the value premium (the excess return of value stocks over growth stocks) has diminished significantly and has at times turned negative, as noted in the case study.

### 1.2.

True or False: For a given factor model and a set of test assets, the addition of one more factor to that model will surely decrease the cross-sectional MAE. 

True or False: For a given factor model and a set of test assets, the addition of one more factor to that model will surely decrease the time-series MAE. 

Along with stating T/F, explain your reasoning for the two statements.

### 1.3.

Consider the scenario in which you are helping two people with investments.

* The young person has a 50 year investment horizon.
* The elderly person has a 10 year investment horizon.
* Both individuals have the same portfolio holdings.

State who has the more certain cumulative return and explain your reasoning.

### 1.4.

Suppose we find that the 10-year bond yield works well as a new pricing factor, along with `MKT`.

Consider two ways of building this new factor.
1. Directly use the index of 10-year yields, `YLD`
1. Construct a Fama-French style portfolio of equities, `FFYLD`. (Rank all the stocks by their correlation to bond yield changes, and go long the highest ranked and short the lowest ranked.)

Could you test the model with `YLD` and the model with `FFYLD` in the exact same ways? Explain.

### 1.5.

Suppose we implement a momentum strategy on cryptocurrencies rather than US stocks.

Conceptually speaking, but specific to the context of our course discussion, how would the risk profile differ from the momentum strategy of US equities?

***

# 2. Pricing and Tangency Portfolio

You work in a hedge fund that believes that the AQR 4-Factor Model (present in Homework 5) is the perfect pricing model for stocks.

$$
\mathbb{E} \left[ \tilde{r}^i \right] = \beta^{i,\text{MKT}} \mathbb{E} \left[ \tilde{f}_{\text{MKT}} \right] + \beta^{i,\text{HML}} \mathbb{E} \left[ \tilde{f}_{\text{HML}} \right] + \beta^{i,\text{RMW}} \mathbb{E} \left[ \tilde{f}_{\text{RMW}} \right] + \beta^{i,\text{UMD}} \mathbb{E} \left[ \tilde{f}_{\text{UMD}} \right]
$$

The factors are available in the sheet `factors excess returns`.

The hedge fund invests in sector-tracking ETFs available in the sheet `sector excess returns`. You are to allocate into these sectors according to a mean-variance optimization with...

* regularization: off-diagonal elements of the covariance matrix divided by 2.
* modeled risk premia: expected excess returns given by the factor model rather than just using the historic sample averages.
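
The regularization rule above can be sketched as follows. This is a minimal illustration, assuming the covariance matrix is available as a NumPy array; the helper name `regularize_covariance` and the 3-asset matrix are hypothetical and are not part of the exam data.

```python
import numpy as np


def regularize_covariance(cov, off_diag_scale=0.5):
    """Scale the off-diagonal entries of a covariance matrix, keeping the variances."""
    cov = np.asarray(cov, dtype=float)
    diag = np.diag(np.diag(cov))
    # Weighted average of the full matrix and its diagonal part:
    # the diagonal is unchanged, every off-diagonal entry is multiplied by off_diag_scale.
    return off_diag_scale * cov + (1.0 - off_diag_scale) * diag


# Hypothetical covariance matrix, for illustration only.
example_cov = np.array([
    [0.04, 0.01, 0.02],
    [0.01, 0.09, 0.03],
    [0.02, 0.03, 0.16],
])
regularized = regularize_covariance(example_cov)
print(regularized)
```

With `off_diag_scale=0.5` the covariances are halved, which matches the `cov_mat=0.5` setting used with `calc_tangency_weights` in this notebook.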

You are to train the portfolio and test out-of-sample. The timeframes should be:
* Training timeframe: Jan-2018 to Dec-2022.
* Testing timeframe: Jan-2023 to most recent observation.

In [7]:
import pandas as pd
import numpy as np
from typing import Union


FILEIN = '../data/midterm_2_data.xlsx'

sector_excess_returns = pd.read_excel(FILEIN, sheet_name='sector excess returns', index_col=0)
factors_excess_returns = pd.read_excel(FILEIN, sheet_name='factors excess returns', index_col=0)
training_data_sectors = sector_excess_returns.loc['2018-01':'2022-12']
training_data_factors = factors_excess_returns.loc['2018-01':'2022-12']

### 2.1.
(8pts)

Calculate the model-implied expected excess returns of every asset.

The time-series estimations should...
* NOT include an intercept. (You assume the model holds perfectly.)
* use data from the `training` timeframe.

With the time-series estimates, use the `training` timeframe's sample average of the factors as the factor premia. Together, this will give you the model-implied risk premia, which we label as
$$
\lambda_i := \mathbb{E}[\tilde{r}_i]
$$

* Store $\lambda_i$ and $\boldsymbol{\beta}^i$ for each asset.
* Print $\lambda_i$ for `Agric`, `Food`, `Soda`

In [8]:

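# Factor premia: training-sample averages of the factor excess returns.
factor_premia = training_data_factors.mean()
betas = pd.DataFrame(index=training_data_sectors.columns, columns=factor_premia.index, dtype=float)

# Time-series regression of each sector on the factors, with no intercept.
for sector in training_data_sectors.columns:
    X = training_data_factors
    Y = training_data_sectors[sector]
    betas.loc[sector] = np.linalg.lstsq(X, Y, rcond=None)[0]

# Model-implied premium of each asset: its betas times the factor premia.
lambda_i = betas.dot(factor_premia)
lambda_agric = lambda_i['Agric']
# Some column labels in the sheet carry a trailing space.
lambda_food = lambda_i['Food ']
lambda_soda = lambda_i['Soda ']
print("Agri:", lambda_agric)
print("Food:", lambda_food)
print("Soda:", lambda_soda)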


Agri: 0.003655102106916501
Food: 0.005454267868380435
Soda: 0.007336244651963082


In [9]:
def calc_tangency_weights(
    returns: pd.DataFrame,
    cov_mat: float = 1,
    return_graphic: bool = False,
    return_port_ret: bool = False,
    target_ret_rescale_weights: Union[None, float] = None,
    annual_factor: int = 12,
    name: str = 'Tangency'
):
    """
    Calculates tangency portfolio weights from the sample means and covariance matrix of returns.

    Parameters:
    returns (pd.DataFrame): Time series of returns.
    cov_mat (float, default=1): Weight on the full sample covariance matrix; (1 - cov_mat) is placed
        on its diagonal. A value of 1 applies no regularization; 0.5 halves the off-diagonal elements.
    return_graphic (bool, default=False): If True, plots the tangency weights.
    return_port_ret (bool, default=False): If True, returns the portfolio returns.
    target_ret_rescale_weights (float or None, default=None): Target return for rescaling weights.
    annual_factor (int, default=12): Factor for annualizing returns.
    name (str, default='Tangency'): Name for labeling the weights and portfolio.

    Returns:
    pd.DataFrame: Tangency portfolio weights, or the portfolio returns if `return_port_ret` is True.
    """
    returns = returns.copy()
    
    if 'date' in returns.columns.str.lower():
        returns = returns.rename({'Date': 'date'}, axis=1)
        returns = returns.set_index('date')
    returns.index.name = 'date'

    if cov_mat == 1:
        cov_inv = np.linalg.inv((returns.cov() * annual_factor))
    else:
        cov = returns.cov()
        covmat_diag = np.diag(np.diag((cov)))
        covmat = cov_mat * cov + (1 - cov_mat) * covmat_diag
        cov_inv = np.linalg.pinv((covmat * annual_factor))  
        
    ones = np.ones(returns.columns.shape) 
    mu = returns.mean() * annual_factor
    scaling = 1 / (np.transpose(ones) @ cov_inv @ mu)
    tangent_return = scaling * (cov_inv @ mu)
    tangency_wts = pd.DataFrame(
        index=returns.columns,
        data=tangent_return,
        columns=[f'{name} Weights']
    )
    port_returns = returns @ tangency_wts.rename({f'{name} Weights': f'{name} Portfolio'}, axis=1)

    if return_graphic:
        tangency_wts.plot(kind='bar', title=f'{name} Weights')

    if isinstance(target_ret_rescale_weights, (float, int)):
        scaler = target_ret_rescale_weights / port_returns[f'{name} Portfolio'].mean()
        tangency_wts[[f'{name} Weights']] *= scaler
        port_returns *= scaler
        tangency_wts = tangency_wts.rename(
            {f'{name} Weights': f'{name} Weights Rescaled Target {target_ret_rescale_weights:.2%}'},
            axis=1
        )
        port_returns = port_returns.rename(
            {f'{name} Portfolio': f'{name} Portfolio Rescaled Target {target_ret_rescale_weights:.2%}'},
            axis=1
        )

    if cov_mat != 1:
        port_returns = port_returns.rename(columns=lambda c: c.replace('Tangency', f'Tangency Regularized {cov_mat:.2f}'))
        tangency_wts = tangency_wts.rename(columns=lambda c: c.replace('Tangency', f'Tangency Regularized {cov_mat:.2f}'))
        
    if return_port_ret:
        return port_returns
    return tangency_wts


### 2.2.

Use the expected excess returns derived from (2.1) with the **regularized** covariance matrix to calculate the weights of the tangency portfolio.

- Use the covariance matrix only for `training` timeframe.
- Calculate and store the vector of weights for all the assets.
- Return the weights of the tangency portfolio for `Agric`, `Food`, `Soda`.

$$
\textbf{w}_{t} = \dfrac{\tilde{\Sigma}^{-1} \boldsymbol{\lambda}}{\boldsymbol{1}' \tilde{\Sigma}^{-1} \boldsymbol{\lambda}}
$$

where $\tilde{\Sigma}$ is the regularized covariance matrix and $\tilde{\Sigma}^{-1}$ is its inverse.

In [11]:
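# cov_mat=0.5 halves the off-diagonal covariances. calc_tangency_weights takes its
# expected returns from returns.mean(), i.e. the training-sample averages.
tangency_weights = calc_tangency_weights(
    returns=training_data_sectors,
    cov_mat=0.5,
    return_port_ret=False,
    annual_factor=12,
    name='Tangency'
)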

tangency_agric = tangency_weights.loc['Agric']
tangency_food = tangency_weights.loc['Food ']
tangency_soda = tangency_weights.loc['Soda ']

tangency_agric, tangency_food, tangency_soda

(Tangency Regularized 0.50 Weights    0.14409
 Name: Agric, dtype: float64,
 Tangency Regularized 0.50 Weights   -0.06981
 Name: Food , dtype: float64,
 Tangency Regularized 0.50 Weights    0.32268
 Name: Soda , dtype: float64)
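
`calc_tangency_weights` draws its expected returns from `returns.mean()`, so supplying the model-implied premia from 2.1 needs a separate step. The sketch below is one way to apply the weight formula of 2.2 to an arbitrary premia vector; the function name `tangency_weights_from_premia` and the numeric inputs are hypothetical, chosen only for illustration.

```python
import numpy as np


def tangency_weights_from_premia(premia, cov, off_diag_scale=0.5):
    """Tangency weights from given expected excess returns and a regularized covariance."""
    premia = np.asarray(premia, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Regularize: keep the variances, scale the covariances by off_diag_scale.
    cov_reg = off_diag_scale * cov + (1.0 - off_diag_scale) * np.diag(np.diag(cov))
    # Solve cov_reg @ raw = premia rather than forming the inverse explicitly.
    raw = np.linalg.solve(cov_reg, premia)
    # Normalize so the weights sum to one.
    return raw / raw.sum()


# Hypothetical premia and covariance matrix, for illustration only.
example_premia = np.array([0.004, 0.005, 0.007])
example_cov = np.array([
    [0.04, 0.01, 0.02],
    [0.01, 0.09, 0.03],
    [0.02, 0.03, 0.16],
])
weights = tangency_weights_from_premia(example_premia, example_cov)
print(weights, weights.sum())
```

In the notebook, the analogous call would pass `lambda_i.astype(float).values` and `training_data_sectors.cov().values`, which share the same asset ordering because `lambda_i` is indexed by the sector columns.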

### 2.3.

Evaluate the performance of this allocation in the `testing` period. Report the **annualized**
- mean
- vol
- Sharpe

In [15]:
def stats_mean_vol_sharpe(data, portfolio=None, portfolio_name='Portfolio', annualize=12):
    """Return the annualized mean, volatility, and Sharpe ratio of `data`, or of `data @ portfolio`."""
    # With weights supplied, evaluate the weighted portfolio; otherwise evaluate each column.
    returns = data if portfolio is None else data @ portfolio

    output = returns.agg(['mean', 'std'])
    output.loc['sharpe'] = output.loc['mean'] / output.loc['std']

    # Scale the per-period statistics to annual units.
    output.loc['mean'] *= annualize
    output.loc['std'] *= np.sqrt(annualize)
    output.loc['sharpe'] *= np.sqrt(annualize)

    if portfolio is not None:
        output.columns = [portfolio_name]

    return output

In [16]:
testing_data_sectors = sector_excess_returns.loc['2023-01':]

# Order the test-period columns to match the index of the weights before multiplying.
testing_data_sectors_aligned = testing_data_sectors[tangency_weights.index]

performance_stats = stats_mean_vol_sharpe(
    data=testing_data_sectors_aligned,
    portfolio=tangency_weights,
    portfolio_name='Tangency Portfolio',
    annualize=12
)

In [17]:
performance_stats

|        | Tangency Portfolio |
|--------|--------------------|
| mean   | 0.176801           |
| std    | 0.15301            |
| sharpe | 1.155487           |


### 2.4.

(7pts)

Construct the same tangency portfolio as in `2.2` but with one change:
* replace the risk premia of the assets, (denoted $\lambda_i$) with the sample averages of the excess returns from the `training` set.

So instead of using $\lambda_i$ suggested by the factor model (as in `2.1-2.3`) you're using sample averages for $\lambda_i$.

- Return the weights of the tangency portfolio for `Agric`, `Food`, `Soda`.

Evaluate the performance of this allocation in the `testing` period. Report the **annualized**
- mean
- vol
- Sharpe

In [21]:
# calc_tangency_weights takes its expected returns from returns.mean(), so passing the
# training returns uses the training-sample averages as the risk premia.


tangency_weights_with_sample_averages = calc_tangency_weights(
    returns=training_data_sectors,
    cov_mat=0.5,
    return_port_ret=False,
    annual_factor=12,
    name='Tangency Sample Averages'
)


common_sectors_sample_averages = tangency_weights_with_sample_averages.index.intersection(testing_data_sectors.columns)
testing_data_sectors_aligned_sample_averages = testing_data_sectors[common_sectors_sample_averages]
tangency_weights_with_sample_averages_aligned = tangency_weights_with_sample_averages.loc[common_sectors_sample_averages]


tangency_agric_sample_avg = tangency_weights_with_sample_averages_aligned.loc['Agric']
tangency_food_sample_avg = tangency_weights_with_sample_averages_aligned.loc['Food ']
tangency_soda_sample_avg = tangency_weights_with_sample_averages_aligned.loc['Soda ']


performance_with_sample_averages = stats_mean_vol_sharpe(
    data=testing_data_sectors_aligned_sample_averages,
    portfolio=tangency_weights_with_sample_averages_aligned,
    portfolio_name='Tangency Portfolio Sample Averages',
    annualize=12
)


tangency_agric_sample_avg, tangency_food_sample_avg, tangency_soda_sample_avg


(Tangency Regularized 0.50 Sample Averages Weights    0.14409
 Name: Agric, dtype: float64,
 Tangency Regularized 0.50 Sample Averages Weights   -0.06981
 Name: Food , dtype: float64,
 Tangency Regularized 0.50 Sample Averages Weights    0.32268
 Name: Soda , dtype: float64)

### 2.5.

Which allocation performed better in the `testing` period: the allocation based on premia from the factor model or from the sample averages?

Why might this be?

### 2.6.
Suppose you now want to build a tangency portfolio solely from the factors, without using the sector ETFs.

- Calculate the weights of the tangency portfolio using `training` data for the factors.
- Again, regularize the covariance matrix of factor returns by dividing off-diagonal elements by 2.

Report, in the `testing` period, the factor-based tangency stats **annualized**...
- mean
- vol
- Sharpe


In [29]:

# Test-period factor returns, using the same split as for the sectors.
testing_data_factors = factors_excess_returns.loc['2023-01':]

# cov_mat=0.5 halves the off-diagonal covariances inside calc_tangency_weights,
# so no manual regularization is needed here.
tangency_weights_factors_only = calc_tangency_weights(
    returns=training_data_factors,
    cov_mat=0.5,
    return_port_ret=False,
    annual_factor=12,
    name='Tangency Factors Only'
)

# Order the test-period columns to match the index of the weights.
testing_data_factors_aligned = testing_data_factors[tangency_weights_factors_only.index]
performance_factors_only = stats_mean_vol_sharpe(
    data=testing_data_factors_aligned,
    portfolio=tangency_weights_factors_only,
    portfolio_name='Tangency Portfolio Factors Only',
    annualize=12
)

performance_factors_only

|        | Tangency Portfolio Factors Only |
|--------|---------------------------------|
| mean   | 0.062376                        |
| std    | 0.058191                        |
| sharpe | 1.071918                        |


### 2.7.

Based on the hedge fund's beliefs, would you prefer to use the ETF-based tangency or the factor-based tangency portfolio? Explain your reasoning. Note that you should answer based on broad principles and not on the particular estimation results.

***

# 3. Long-Run Returns

For this question, use only the sheet `factors excess returns`.

Suppose we want to measure the long run returns of various pricing factors.

### 3.1.

Turn the data into log returns.
- Display the first 5 rows of the data.

Using these log returns, report the **annualized**
* mean
* vol
* Sharpe

### 3.2.

Consider 15-year cumulative log excess returns. Following the assumptions and modeling of Lecture 6, report the following 15-year stats:
- mean
- vol
- Sharpe

How do they compare to the estimated stats (1-year horizon) in `3.1`? 

In [26]:
print(factors_excess_returns)

               MKT     HML     RMW     UMD
date                                      
1980-01-01  0.0551  0.0175 -0.0170  0.0755
1980-02-01 -0.0122  0.0061  0.0004  0.0788
1980-03-01 -0.1290 -0.0101  0.0146 -0.0955
1980-04-01  0.0397  0.0106 -0.0210 -0.0043
1980-05-01  0.0526  0.0038  0.0034 -0.0112
...            ...     ...     ...     ...
2024-04-01 -0.0467 -0.0052  0.0148 -0.0042
2024-05-01  0.0434 -0.0166  0.0298 -0.0002
2024-06-01  0.0277 -0.0331  0.0051  0.0090
2024-07-01  0.0124  0.0573  0.0022 -0.0242
2024-08-01  0.0161 -0.0112  0.0085  0.0478

[536 rows x 4 columns]


In [27]:
# Work on a copy so the factor DataFrame is not modified in place.
df = factors_excess_returns.copy()
# Log excess return of the market factor.
df['log_returns'] = np.log(1 + df['MKT'])

print(df['log_returns'].head())

# The data are monthly, so annualize with 12 periods per year.
periods_per_year = 12
annualized_mean = df['log_returns'].mean() * periods_per_year
annualized_vol = df['log_returns'].std() * np.sqrt(periods_per_year)
annualized_sharpe = annualized_mean / annualized_vol

print(f"Mean: {annualized_mean:.4f}")
print(f"Volatility: {annualized_vol:.4f}")
print(f"Sharpe Ratio: {annualized_sharpe:.4f}")

date
1980-01-01    0.053636
1980-02-01   -0.012275
1980-03-01   -0.138113
1980-04-01    0.038932
1980-05-01    0.051263
Name: log_returns, dtype: float64
Mean: 0.0735
Volatility: 0.1588
Sharpe Ratio: 0.4630
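
For the 15-year horizon in 3.2, a minimal sketch of the horizon scaling, assuming log excess returns are iid across years (taken here as the modeling setup the question refers to; it is not spelled out in this document): the cumulative mean grows with the horizon $h$, the volatility with $\sqrt{h}$, and therefore the Sharpe ratio with $\sqrt{h}$. The function name `horizon_stats` and the numeric inputs are hypothetical, not estimates from the exam data.

```python
import numpy as np


def horizon_stats(annual_mean, annual_vol, years):
    """Mean, vol, and Sharpe of the cumulative log return over `years`, under iid returns."""
    mean_h = years * annual_mean          # means add across periods
    vol_h = np.sqrt(years) * annual_vol   # variances add, so vol scales with sqrt(h)
    return {"mean": mean_h, "vol": vol_h, "sharpe": mean_h / vol_h}


# Hypothetical annualized inputs, for illustration only.
stats_1y = horizon_stats(0.06, 0.15, years=1)
stats_15y = horizon_stats(0.06, 0.15, years=15)
print(stats_1y)
print(stats_15y)
```

The same function can be applied column by column to the annualized log-return statistics of each factor.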


### 3.3.

What is the probability that the momentum factor has a negative mean excess return over the next 
* single period?
* 15 years?

In [None]:
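from scipy.stats import norm

# Monthly log excess returns of the momentum factor.
momentum_returns = np.log1p(df['UMD'])

mean_return = momentum_returns.mean()
std_dev = momentum_returns.std()

# One month: P(r < 0) under normality.
single_period_prob = norm.cdf(0, loc=mean_return, scale=std_dev)

# 15 years = 180 months. Under iid log returns, the cumulative return has
# mean h * mu and standard deviation sqrt(h) * sigma.
horizon_months = 15 * 12
mean_15_year = mean_return * horizon_months
std_dev_15_year = std_dev * np.sqrt(horizon_months)
multi_period_prob = norm.cdf(0, loc=mean_15_year, scale=std_dev_15_year)

print(f"single period probability of negative Return: {single_period_prob}")
print(f"15-year probability of negative mean Return: {multi_period_prob}")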
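
The calculation in 3.3 reduces to a single formula: under iid normal log returns with per-period mean $\mu$ and volatility $\sigma$, the probability of a negative cumulative (equivalently, negative average) return over $h$ periods is $\Phi(-\sqrt{h}\,\mu/\sigma)$. A minimal standard-library sketch follows; the function names and the numeric inputs are hypothetical, not estimates from the exam data.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_negative_cumulative(mu, sigma, periods):
    """P(sum of `periods` iid N(mu, sigma^2) log returns < 0)."""
    return normal_cdf(-math.sqrt(periods) * mu / sigma)


# Hypothetical monthly mean and volatility, for illustration only.
example_mu, example_sigma = 0.005, 0.045
p_one_month = prob_negative_cumulative(example_mu, example_sigma, periods=1)
p_fifteen_years = prob_negative_cumulative(example_mu, example_sigma, periods=180)
print(f"1 month: {p_one_month:.4f}, 15 years: {p_fifteen_years:.4f}")
```

With a positive mean, the probability shrinks as the horizon grows because the per-period Sharpe ratio is multiplied by $\sqrt{h}$.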

### 3.4.

Recall from the case that momentum has been underperforming since 2009. 

Using data from 2009 to present, what is the probability that momentum *outperforms* the market factor over the next
* period?
* 15 years?

In [39]:
from scipy.stats import norm
import numpy as np

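# Restrict to 2009 onward and work with monthly log excess returns.
log_factors = np.log1p(factors_excess_returns.loc['2009-01':, ['MKT', 'UMD']])

# Momentum outperforms the market when this spread is positive.
spread = log_factors['UMD'] - log_factors['MKT']
mu, sigma = spread.mean(), spread.std()

# One month: P(spread > 0) under normality.
single_period_prob = 1 - norm.cdf(0, mu, sigma)

# 15 years = 180 months: the cumulative spread has mean h * mu and
# standard deviation sqrt(h) * sigma under iid log returns.
horizon_months = 15 * 12
multi_period_prob = 1 - norm.cdf(0, horizon_months * mu, np.sqrt(horizon_months) * sigma)

print("probability of outperformance over a single period:", single_period_prob.round(2))
print("probability of outperformance over 15 years:", multi_period_prob.round(2))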


### 3.5.
Conceptually, why is there such a discrepancy between this probability for 1 period vs. 15 years?

What assumption about the log-returns are we making when we use this technique to estimate underperformance?

### 3.6.

Using your previous answers, explain what is meant by time diversification.

### 3.7.

Is the probability that `HML` and `UMD` both have negative cumulative returns over the next year higher or lower than the probability that `HML` and `MKT` both have negative cumulative returns over the next year?

Answer conceptually, but specifically. (No need to calculate the specific probabilities.)
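
The comparison in 3.7 turns on how strongly each pair co-moves. As a simplified illustration, assuming jointly normal returns with zero means (a simplification; the factors' actual means are not zero), the probability that both returns are negative depends on the correlation $\rho$ alone: $\tfrac{1}{4} + \arcsin(\rho) / (2\pi)$. The function name below is hypothetical, and the correlations are example values rather than estimates for any factor pair.

```python
import math


def prob_both_negative(correlation):
    """P(X < 0, Y < 0) for a zero-mean bivariate normal with the given correlation."""
    return 0.25 + math.asin(correlation) / (2.0 * math.pi)


# Example correlations, for illustration only.
for rho in (-0.5, 0.0, 0.5):
    print(f"correlation {rho:+.1f}: P(both negative) = {prob_both_negative(rho):.4f}")
```

The joint probability rises with the correlation, so under this simplification the pair with the higher correlation is the more likely to post negative returns together.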

***