# Midterm

## FINM 25000 - 2024

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

# Instructions

## Please note the following:

Points
* The exam is `115` points.
* You have `180` minutes to complete the exam.
* For every minute late you submit the exam, you will lose one point.

Submission
* You will upload your solution to the `Midterm` assignment on Canvas. (Be sure to **submit** on Canvas, not just **save** on Canvas.)
* Your submission should be readable (the graders can understand your answers), and it should **include all code used in your analysis in a file format in which the code can be executed.**

Rules
* The exam is open-material, closed-communication.
* You do not need to cite material from the course github repo--you are welcome to use the code posted there without citation.

Advice
* If you find any question to be unclear, state your interpretation and proceed. We will only answer questions of interpretation if there is a typo, error, etc.
* The exam will be graded for partial credit.

## Data

**All data files are found in the class github repo, in the `data` folder.**

This exam makes use of the following data files:
* `midterm_data.xlsx`

This file has sheets for...
* `info` - names of each stock ticker
* `excess returns` - weekly excess returns on several stocks
* `SPY` - weekly excess returns on SPY
* `forecasting` - monthly data on `USO` asset returns and two forecasting signals.

Note 
* the data for `excess returns` and `SPY` is **weekly** so any annualizations should use `52` weeks in a year.
* the data for `forecasting` is monthly, so any annualization should use `12` months in a year.

#### If useful
here is code to load in the data.

In [30]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt



In [37]:
FILEIN = 'midterm_data.xlsx'
sheet_exrets = 'excess returns'
sheet_spy = 'spy'

retsx = pd.read_excel(FILEIN, sheet_name=sheet_exrets).set_index('date')
spy = pd.read_excel(FILEIN, sheet_name=sheet_spy).set_index('date')
forecasting = pd.read_excel(FILEIN, sheet_name='forecasting').set_index('date')

In [3]:
display(retsx)

| date | AAPL | MSFT | AMZN | NVDA | GOOGL | TSLA | XOM |
|------|------|------|------|------|-------|------|-----|
| 2016-01-15 | -0.006077 | -0.033445 | -0.068591 | -0.092883 | -0.035777 | -0.036346 | 0.030855 |
| 2016-01-22 | 0.045278 | 0.026605 | 0.047061 | 0.050539 | 0.050316 | -0.010817 | -0.011908 |
| 2016-01-29 | -0.051720 | 0.042053 | -0.027223 | 0.018024 | 0.009834 | -0.067483 | 0.005221 |
| 2016-02-05 | -0.036236 | -0.096825 | -0.151901 | -0.104974 | -0.082989 | -0.156939 | 0.021309 |
| 2016-02-12 | -0.007887 | -0.000785 | 0.002275 | -0.034052 | -0.003101 | -0.078688 | 0.013486 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-06-16 | 0.022605 | 0.048275 | 0.017411 | 0.101882 | 0.011357 | 0.066761 | -0.020323 |
| 2023-06-23 | 0.007143 | -0.023729 | 0.028225 | -0.013688 | -0.012008 | -0.017497 | -0.028343 |
| 2023-06-30 | 0.044200 | 0.021626 | 0.013113 | 0.007353 | -0.016430 | 0.025297 | 0.052513 |
| 2023-07-07 | 0.001491 | 0.008703 | 0.014003 | 0.023204 | 0.016614 | 0.066815 | -0.019683 |


In [4]:
display(spy)

| date | SPY |
|------|-----|
| 2016-01-15 | -0.021430 |
| 2016-01-22 | 0.014429 |
| 2016-01-29 | 0.016801 |
| 2016-02-05 | -0.029789 |
| 2016-02-12 | -0.007023 |
| ... | ... |
| 2023-06-16 | 0.026036 |
| 2023-06-23 | -0.014222 |
| 2023-06-30 | 0.023245 |
| 2023-07-07 | -0.010670 |


In [38]:
display(forecasting)

| date | USO | Tnote rate | Tnote rate change |
|------|-----|------------|-------------------|
| 2009-05-31 | 0.27 | 3.46 | 0.34 |
| 2009-06-30 | 0.04 | 3.52 | 0.06 |
| 2009-07-31 | -0.03 | 3.50 | -0.02 |
| 2009-08-31 | -0.02 | 3.40 | -0.10 |
| 2009-09-30 | 0.00 | 3.31 | -0.09 |
| ... | ... | ... | ... |
| 2023-07-31 | 0.15 | 3.96 | 0.14 |
| 2023-08-31 | 0.03 | 4.09 | 0.13 |
| 2023-09-30 | 0.08 | 4.57 | 0.48 |
| 2023-10-31 | -0.07 | 4.88 | 0.30 |


## Scoring

| Problem | Points |
|---------|--------|
| 1       | 50     |
| 2       | 25     |
| 3       | 20     |
| 4       | 20     |

### Each numbered question is worth 5 points.

### Notation
(Hidden LaTeX commands)

$$\newcommand{\mux}{\tilde{\boldsymbol{\mu}}}$$
$$\newcommand{\wtan}{\boldsymbol{\text{w}}^{\text{tan}}}$$
$$\newcommand{\wtarg}{\boldsymbol{\text{w}}^{\text{port}}}$$
$$\newcommand{\mutarg}{\tilde{\boldsymbol{\mu}}^{\text{port}}}$$
$$\newcommand{\wEW}{\boldsymbol{\text{w}}^{\text{EW}}}$$
$$\newcommand{\wRP}{\boldsymbol{\text{w}}^{\text{RP}}}$$
$$\newcommand{\wREG}{\boldsymbol{\text{w}}^{\text{REG}}}$$
$$\newcommand{\targ}{\text{USO}}$$

# 1. Short Answer

### No Data Needed

These problems do not require any data file. Rather, analyze the situation conceptually, based on the information below.

### 1

In what sense was ProShares `HDG` successful in hedging the `HFRI`, and in what sense was it unsuccessful in tracking the `HFRI`?

ProShares HDG was successful in hedging the HFRI in the sense that it was very good at mimicking hedge funds and allowing investors an opportunity to diversify the way hedge funds do. HDG was not successful in tracking the HFRI in the sense that hedge fund replication is a new field, and it is not clear whether delivering hedge fund beta exposure accomplishes anything, because the point of a hedge fund is to deliver alpha. Additionally, the number of factors used by HDG is debatable and may not fully encompass the factors considered by hedge funds.

### 2

Did we find that **TIPS** have been useful in expanding the mean-variance frontier in the past? Did we conclude they might be useful in the future? Explain.

TIPS have historically been useful in expanding the mean-variance frontier. This is because TIPS had low correlation with many other assets, which makes them attractive to include in a portfolio as a hedge against those assets. They may be useful in the future as well because they remain an asset class that has low correlation with other assets like SPY.

### 3.

Consider a Linear Factor Pricing Model (LFPM).

Which metric do we examine to understand its fit, (or errors)?

An LFPM describes the returns of an asset based on its exposure to different risk factors. The metric used to evaluate fit is R-squared, which measures the amount of variability that is accounted for by the model.

### 4.

What aspect of the classic mean-variance optimization approach leads to extreme answers? How did regularization help with this issue?

Mean-variance optimization can lead to extreme answers when the assets are highly correlated. The optimization also depends heavily on the expected return estimates, which may not always be accurate. Small changes in expected returns can propagate into large changes in the mean-variance optimization weights, which is not ideal. Regularization helps with this issue by lessening the impact that a small difference in expected returns can have.
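
As a minimal sketch of this sensitivity, the example below uses two hypothetical assets with a 0.95 correlation and nearly equal means (these inputs are assumptions, not the exam data). It assumes one common form of regularization, averaging the covariance matrix with its own diagonal, and compares the resulting tangency weights.

```python
import numpy as np

def tangency_weights(mu, cov):
    """Tangency weights, normalized to sum to one."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()

# Hypothetical inputs: two assets with a 0.95 correlation and similar means.
vol = np.array([0.20, 0.20])
corr = np.array([[1.0, 0.95], [0.95, 1.0]])
cov = np.outer(vol, vol) * corr
mu = np.array([0.10, 0.09])

# Assumed regularization: average the full covariance matrix with its diagonal.
cov_reg = 0.5 * (cov + np.diag(np.diag(cov)))

w_classic = tangency_weights(mu, cov)
w_reg = tangency_weights(mu, cov_reg)
print("classic:    ", w_classic.round(2))
print("regularized:", w_reg.round(2))
```

Shrinking the off-diagonal terms makes the covariance matrix better conditioned, so the one-point gap in means no longer translates into a large long-short position.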

### 5.

Suppose investors are **not** mean-variance investors. If we find an investment with a Sharpe ratio higher than the "market", would this be inconsistent with the CAPM?

This would be inconsistent with the CAPM because, according to the CAPM, the market portfolio is supposed to be on the efficient frontier (it is the tangency portfolio), so an investment with a higher Sharpe ratio than the market violates this.

### 6.

Which is more useful in assessing the model’s fit for pricing: the r-squared of the time-series regressions, the r-squared of the cross-sectional regression, or neither?

The r-squared of the cross-sectional regression will generally be more useful because it gives the variation in average returns that is explained by the variation in betas. This is useful in understanding returns across all assets rather than the time-series of just one asset.

### 7.

GMO stated that they had a “contrarian” investment style. What did they mean by this? Was this seen in our investigation of the fund, GMWAX?

They stated they had a contrarian investment style because they used a long-term, value-oriented approach in which they made 7-year forecasts of asset returns. This was seen in our investigation because they achieved a much better Sharpe ratio than the market (higher returns and lower volatility).

### 8.

How does Harvard make their portfolio allocation more realistic than a basic mean-variance optimization would imply? Is their approach easily implemented and computed from a numerical standpoint?

Harvard's analysis incorporates portfolio constraints that investors might face, such as minimum and maximum positions. They take these constraints into account in their calculation to produce a more realistic efficient frontier. Their approach is still readily implemented from a numerical standpoint because it remains a quadratic optimization problem with linear constraints, which standard numerical solvers handle.

### 9.

If we want to hedge a portfolio's returns with respect to SPY, how could we calculate the optimal ratio? How would this ratio then be used to build the hedged position?

We would calculate the optimal ratio by collecting historical data and running a regression of the portfolio's returns on SPY returns; the estimated beta is the optimal hedge ratio. To build the hedged position, we would take a short position in SPY equal to the hedge ratio times the value of the portfolio.
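
The regression-based hedge described above can be sketched as follows. The return series are simulated (hypothetical, with an assumed true beta of 1.5), and the hedge ratio is the OLS slope of the portfolio on SPY.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weekly excess returns: a portfolio with a true SPY beta of 1.5.
spy_rets = rng.normal(0.002, 0.02, size=400)
port_rets = 0.001 + 1.5 * spy_rets + rng.normal(0, 0.01, size=400)

# OLS slope of the portfolio on SPY: cov(port, spy) / var(spy).
hedge_ratio = np.cov(port_rets, spy_rets)[0, 1] / np.var(spy_rets, ddof=1)

# Hedged position: long the portfolio, short hedge_ratio dollars of SPY
# per dollar of portfolio value.
hedged_rets = port_rets - hedge_ratio * spy_rets

print(f"hedge ratio: {hedge_ratio:.2f}")
print(f"correlation of hedged returns with SPY: "
      f"{np.corrcoef(hedged_rets, spy_rets)[0, 1]:.4f}")
```

By construction, the hedged returns are uncorrelated with SPY in the estimation sample; out of sample, the hedge is only as good as the stability of the estimated beta.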

### 10.

Name one way in which Fama and French construct the factors that helps reduce cross-factor correlation.

Fama and French reduce the cross-factor correlation by choosing factors that are close to orthogonal, i.e., largely unrelated to one another.

***

# 2. Allocation


Consider a mean-variance optimization of **excess** returns provided in `midterm_data.xlsx`.

### 1. 

Report the following **annualized** statistics:
* mean
* volatility
* Sharpe ratio

In [5]:
def summary_stats(rets, adj_factor=12):
    """
    Given a dataframe of returns, this function returns a dataframe with
    a summary of performance statistics for individual securities.

    adj_factor is the number of periods per year: 12 for monthly data,
    52 for weekly data such as `retsx`.
    """
    stats = {}
    
    stats['Annualized Mean'] = rets.mean() * adj_factor
    stats['Annualized Volatility'] = rets.std() * np.sqrt(adj_factor)
    stats['Annualized Sharpe Ratio'] = (stats['Annualized Mean'] / stats['Annualized Volatility'])
    
    return pd.DataFrame(stats, index=rets.columns)

# Note: this call uses the default adj_factor=12; weekly data calls for adj_factor=52.
stats = summary_stats(retsx).sort_values("Annualized Sharpe Ratio", ascending=False)

display(stats)

|  | Annualized Mean | Annualized Volatility | Annualized Sharpe Ratio |
|------|-----------------|-----------------------|-------------------------|
| NVDA | 0.150152 | 0.224866 | 0.667739 |
| MSFT | 0.066482 | 0.115391 | 0.576142 |
| AAPL | 0.073712 | 0.136373 | 0.54052 |
| TSLA | 0.131476 | 0.291606 | 0.450868 |
| AMZN | 0.055259 | 0.149106 | 0.370604 |
| GOOGL | 0.044614 | 0.13173 | 0.338681 |
| XOM | 0.028661 | 0.149694 | 0.191461 |
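
The Data note asks for `52` periods per year on weekly series. The sketch below, on simulated weekly returns (hypothetical, not the exam data), shows how the choice of factor enters each statistic; the helper name `annualized_stats` is an assumption for illustration.

```python
import numpy as np

def annualized_stats(rets, periods_per_year):
    """Return (mean, volatility, Sharpe ratio), annualized from per-period excess returns."""
    mean = rets.mean() * periods_per_year
    vol = rets.std(ddof=1) * np.sqrt(periods_per_year)
    return mean, vol, mean / vol

# Hypothetical weekly excess returns (not the exam data).
rng = np.random.default_rng(1)
weekly = rng.normal(0.003, 0.03, size=390)

mean_52, vol_52, sharpe_52 = annualized_stats(weekly, 52)
mean_12, vol_12, sharpe_12 = annualized_stats(weekly, 12)

# Moving from a factor of 12 to 52 scales the mean by 52/12, and the
# volatility and Sharpe ratio by sqrt(52/12).
print(mean_52 / mean_12, vol_52 / vol_12, sharpe_52 / sharpe_12)
```

Because every asset is scaled by the same constants, the ranking of assets by Sharpe ratio does not depend on the factor; only the reported levels change.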


### 2.

Report the weights of the tangency portfolio.

In [6]:
def tan_portfolio(mean_rets, cov_matrix):
    """
    Given a vector of mean returns and a covariance matrix of returns, this function returns
    a vector of tangency portfolio weights.
    """
    inv_cov = np.linalg.inv(cov_matrix)
    ones = np.ones(mean_rets.shape)
    return (inv_cov @ mean_rets) / (ones.T @ inv_cov @ mean_rets)


def gmv_portfolio(cov_matrix):
    """
    Given a covariance matrix of returns, this function returns a vector of GMV portfolio weights.
    """
    try:
        cov_inv = np.linalg.inv(cov_matrix)
    except TypeError:
        cov_inv = np.linalg.inv(np.array(cov_matrix))

    one_vector = np.ones(len(cov_matrix.index))
    return cov_inv @ one_vector / (one_vector @ cov_inv @ (one_vector))


def mv_portfolio(mean_rets, cov_matrix, target=None):
    """
    Given a vector of mean returns and a covariance matrix of returns, this function returns
    a vector of MV portfolio weights.
    """
    w_tan = tan_portfolio(mean_rets, cov_matrix)

    if target is None:
        return w_tan

    w_gmv = gmv_portfolio(cov_matrix)
    delta = (target - mean_rets @ w_gmv) / (mean_rets @ w_tan - mean_rets @ w_gmv)
    return delta * w_tan + (1 - delta) * w_gmv

pd.options.display.float_format = '{:.2f}'.format

w_tan = mv_portfolio(retsx.mean(), retsx.cov())
w_tan_df = pd.DataFrame(w_tan, index=retsx.columns, columns=['Tangency Portfolio'])
display(w_tan_df.sort_values(by='Tangency Portfolio', ascending=False))

|  | Tangency Portfolio |
|------|--------------------|
| MSFT | 0.79 |
| NVDA | 0.5 |
| AAPL | 0.32 |
| TSLA | 0.11 |
| XOM | 0.02 |
| AMZN | -0.23 |
| GOOGL | -0.5 |


### 3.
Report the Sharpe ratio achieved by the tangency portfolio over this sample. Annualize it (accounting for weekly data.)

In [7]:
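# Weekly excess returns of the tangency portfolio.
w_tan_rets = retsx @ w_tan_df
# Note: summary_stats defaults to adj_factor=12; weekly data calls for adj_factor=52.
tan_summ = summary_stats(w_tan_rets)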
display(tan_summ)

|  | Annualized Mean | Annualized Volatility | Annualized Sharpe Ratio |
|------|-----------------|-----------------------|-------------------------|
| Tangency Portfolio | 0.13 | 0.17 | 0.76 |


### 4.

* What weight is given to the asset with the lowest Sharpe ratio?
* What Sharpe ratio does the lowest (most negative) weight asset have?

Explain why the weights are not most extreme for the assets with the largest/smallest Sharpe Ratios.

* XOM had the lowest Sharpe ratio, 0.19, and is given a weight of 0.02.
* The most negative weight asset is GOOGL, which has a Sharpe ratio of 0.34.

The weights are not most extreme for the assets with the largest and smallest Sharpe ratios because the optimization does not simply load up on assets with higher Sharpe ratios. It also accounts for correlations, hedging assets against each other to keep the portfolio volatility from getting too high.
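
A small sketch of this effect, using three hypothetical assets (the means, volatilities, and correlations are assumptions chosen for illustration): asset B has a higher Sharpe ratio than asset C, but because B is highly correlated with the better asset A, the tangency portfolio shorts B and holds C for diversification.

```python
import numpy as np

# Hypothetical annualized inputs: A and B are 0.9 correlated, C is uncorrelated.
names = ["A", "B", "C"]
mu = np.array([0.12, 0.10, 0.04])
vol = np.array([0.20, 0.20, 0.20])
corr = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
cov = np.outer(vol, vol) * corr

sharpe = mu / vol

# Tangency weights: inverse covariance times means, normalized to sum to one.
raw = np.linalg.solve(cov, mu)
w = raw / raw.sum()

for name, s, weight in zip(names, sharpe, w):
    print(f"{name}: Sharpe {s:.2f}, weight {weight:.2f}")
```

The weight on each asset reflects its marginal contribution to the portfolio's Sharpe ratio given everything else held, not its standalone Sharpe ratio.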

### 5.

To target a mean return of `0.001` weekly, would you be invested in the risk-free rate or borrowing from the risk-free rate?

No asset is completely risk-free other than cash, which does not achieve 0.001 weekly. Therefore, we would need to borrow at the risk-free rate and invest in assets that will achieve 0.001 weekly returns. However, 0.001 is not very high, so we can probably invest in bonds or other low-risk assets without taking on higher-risk options.
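
One way to check the direction of the risk-free position: with excess returns, a mean-variance investor holds a share $\delta$ of wealth in the tangency portfolio and $1-\delta$ in the risk-free asset, where $\delta$ is the target mean divided by the tangency portfolio's mean. The sketch below uses a hypothetical tangency mean (an assumption, not the exam estimate).

```python
def tangency_allocation(target_mean, tangency_mean):
    """Share of wealth in the tangency portfolio needed to hit target_mean (excess returns)."""
    return target_mean / tangency_mean

# Hypothetical weekly excess-return mean of the tangency portfolio (not the exam estimate).
tangency_mean_weekly = 0.004

delta = tangency_allocation(0.001, tangency_mean_weekly)
risk_free_share = 1 - delta

# risk_free_share > 0: part of wealth sits in the risk-free asset (lending).
# risk_free_share < 0: the position is levered by borrowing at the risk-free rate.
print(f"tangency share: {delta:.2f}, risk-free share: {risk_free_share:.2f}")
```

Applying the same comparison to the sample's weekly tangency mean indicates whether the target of `0.001` requires lending or borrowing.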

***

# 3. Performance

### 1. 

Report the following performance metrics of excess returns for Tesla (`TSLA`).
* skewness
* kurtosis

You are not annualizing any of these stats.

What do these metrics indicate about the nature of the returns?

In [26]:
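def risk_stats(rets):
    """
    Given a dataframe of returns, this function returns a dataframe with
    the skewness and excess kurtosis of each column (not annualized).
    """
    stats = {}
    
    stats['Skewness'] = rets.skew()
    # pandas reports excess kurtosis, so a normal distribution scores 0.
    stats['Kurtosis'] = rets.kurtosis()
    
    return pd.DataFrame(stats, index=rets.columns)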

stats = risk_stats(pd.DataFrame(retsx['TSLA']))
display(stats)

|  | Skewness | Kurtosis |
|------|----------|----------|
| TSLA | 0.44 | 1.53 |


### 2. 

Report the maximum drawdown for `TSLA` over the sample.
* Ignore that your data is in excess returns rather than total returns.
* Simply proceed with the excess return data for this calculation.

In [28]:
def drawdown(rets):
    """
    Given a dataframe of returns, report the min and max single-period return,
    the date of peak cumulative wealth, the date of the deepest drawdown,
    and the recovery date and duration (if the series recovers).
    """
    stats = {}

    stats["Min"] = rets.min()
    stats["Max"] = rets.max()

    # Cumulative wealth, its running peak, and the percentage drawdown from that peak.
    cum_prod = (1 + rets).cumprod()
    cum_max = cum_prod.cummax()
    drawdowns = (cum_prod - cum_max) / cum_max
    stats["Peak"] = cum_prod.idxmax()
    stats["Bottom"] = drawdowns.idxmin()

    # Recovery: first date after the bottom on which wealth regains its prior peak.
    recovery_date = []
    for col in cum_prod.columns:
        bottom = drawdowns[col].idxmin()
        prev_max = cum_max[col][:bottom].max()
        recovery_wealth = cum_prod[col][bottom:]
        recovery_date.append(
            recovery_wealth[recovery_wealth >= prev_max].index.min()
        )
    stats["Recovery"] = ["-" if pd.isnull(i) else i for i in recovery_date]

    stats["Duration (days)"] = [
        (i - j).days if i != "-" else "-"
        for i, j in zip(stats["Recovery"], stats["Bottom"])
    ]

    return pd.DataFrame(stats, index=rets.columns)

stats = drawdown(pd.DataFrame(retsx['TSLA']))

display(stats)
# Caution: Max - Min is the range of single-week returns, not the peak-to-trough drawdown.
print("Max Drawdown is: ")
print(stats['Max'] - stats['Min'])

|  | Min | Max | Peak | Bottom | Recovery | Duration (days) |
|------|-----|-----|------|--------|----------|-----------------|
| TSLA | -0.28 | 0.33 | 2021-11-05 | 2023-01-06 | - | - |


Max Drawdown is: 
TSLA   0.62
dtype: float64
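
For reference, maximum drawdown is conventionally the largest peak-to-trough decline in cumulative wealth, which is a different quantity from the spread between the largest and smallest single-period returns. A minimal sketch on a short hypothetical return path (the helper name `max_drawdown` is an assumption):

```python
import numpy as np

def max_drawdown(rets):
    """Largest peak-to-trough decline of cumulative wealth, as a negative fraction."""
    wealth = np.cumprod(1 + np.asarray(rets, dtype=float))
    running_peak = np.maximum.accumulate(wealth)
    drawdowns = wealth / running_peak - 1
    return drawdowns.min()

# Hypothetical return path: up 10%, down 20%, down 10%, up 5%.
example = [0.10, -0.20, -0.10, 0.05]
print(f"max drawdown: {max_drawdown(example):.2%}")
```

Here the two consecutive losses compound from the peak reached after the first period, so the drawdown is deeper than any single-period loss.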


### 3.

For `TSLA`, calculate the following metrics, relative to `SPY`:
* market beta
* alpha
* information ratio

Annualize alpha and information ratio.

*Recall that this is weekly data, with 52 weeks per year.*

In [31]:
def univariate_regression(y, X, intercept = True, adj = 12):
    """
    Regress y on X and report the beta, the annualized Treynor ratio, and the
    annualized information ratio. adj is the number of periods per year.
    """
    if intercept:
        X = sm.add_constant(X)
        
    model = sm.OLS(y, X, missing = 'drop')
    results = model.fit()
    # Keep the boolean flag separate from the estimated intercept.
    alpha = results.params.iloc[0] if intercept else 0
    beta = results.params.iloc[1] if intercept else results.params.iloc[0]

    summary = {}
    summary['Market Beta'] = beta
    summary['Treynor Ratio'] = (y.mean() / beta) * adj
    summary['Information Ratio'] = (alpha / results.resid.std()) * np.sqrt(adj)

    return pd.DataFrame(summary, index = [y.name])

def iterative_regression(y, X, intercept = True, one_to_many = False, adj = 12):
    if one_to_many:
        summary = pd.concat([univariate_regression(y[col], X, intercept, adj) for col in y.columns], axis = 0)
        summary.index = y.columns
        return summary
    else:
        summary = pd.concat([univariate_regression(y, X[col], intercept, adj) for col in X.columns], axis = 0)
        summary.index = X.columns
        return summary


# Note: adj = 12 annualizes as if the data were monthly; weekly data calls for adj = 52.
regression_metrics_hfs = iterative_regression(pd.DataFrame(retsx['TSLA']), spy, one_to_many = True, adj = 12)
display(regression_metrics_hfs)

|  | Market Beta | Treynor Ratio | Information Ratio |
|------|-------------|---------------|-------------------|
| TSLA | 1.78 | 0.07 | 0.29 |


### 4.

Comment on what you conclude about `TSLA` based on the statistics calculated in the previous question.

The market beta of 1.78 indicates that `TSLA` is highly sensitive to the market: beta measures how much the stock's excess return moves per unit move in SPY, so `TSLA` amplifies market swings. The Treynor ratio is low but positive (0.07), which indicates that `TSLA` earned a positive but unexceptional mean excess return per unit of market risk. The information ratio measures alpha relative to the volatility of the regression residual. An information ratio of 0.29 indicates that `TSLA` earned positive alpha, roughly 0.29 units per unit of non-market risk, which is a positive result for `TSLA`.
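
The three metrics requested in the previous question (beta, alpha, information ratio) can be sketched with a plain least-squares fit. The data below are simulated (hypothetical, with an assumed true beta of 1.8), the function name `market_regression` is an assumption, and the annualization uses 52 periods per year as the exam specifies for weekly data.

```python
import numpy as np

def market_regression(asset, market, periods_per_year=52):
    """OLS of asset on market excess returns.

    Returns (beta, annualized alpha, annualized information ratio).
    """
    X = np.column_stack([np.ones_like(market), market])
    coefs, *_ = np.linalg.lstsq(X, asset, rcond=None)
    alpha, beta = coefs
    resid = asset - X @ coefs
    info_ratio = alpha / resid.std(ddof=1) * np.sqrt(periods_per_year)
    return beta, alpha * periods_per_year, info_ratio

# Hypothetical weekly excess returns (not the exam data).
rng = np.random.default_rng(2)
market = rng.normal(0.002, 0.02, size=390)
asset = 0.001 + 1.8 * market + rng.normal(0, 0.04, size=390)

beta, alpha_ann, ir_ann = market_regression(asset, market)
print(f"beta: {beta:.2f}, annualized alpha: {alpha_ann:.3f}, annualized IR: {ir_ann:.2f}")
```

Alpha scales with the number of periods while the residual volatility scales with its square root, which is why the information ratio is annualized by the square root of 52.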

***

# 4. Forecasting

Forecast (total) returns on oil as tracked by the ETF ticker, `USO`. 

As signals, use two interest rate signals, as seen in Treasury-notes. (No need to consider anything specific about Treasury notes, just read these as macroeconomic signals.)
* T-note rate
* month-over-month change in the T-note rate

Find all the data needed for this problem in the sheet `forecasting`.

Note that the data in this sheet is monthly, not weekly.

### 1.

Estimate a forecasting regression of $\targ$ on the two (lagged) signals.

$$r_{t+1}^{\targ} = \alpha + \beta^{x}x_t + \beta^z z_t + \epsilon_{t+1}$$

where
* $x$ denotes the interest-rate signal.
* $z$ denotes the change in rate signal.

Report the r-squared, as well as the OLS estimates for the intercept and the two betas. (No need to annualize the stats.)

In [43]:



# Multivariate forecasting regression: USO return at t+1 on the two signals observed at t.
signals = forecasting[['Tnote rate', 'Tnote rate change']].shift(1)
X = sm.add_constant(signals)
multi_regr = sm.OLS(forecasting['USO'], X, missing='drop').fit()
# Forecast of each month's USO return from the prior month's signals (NaN for the first row).
forecasting['Multi'] = X @ multi_regr.params

print('-'*50)
print('Multivariate Regression Summary')
display(multi_regr.params)
print(f'R^2: {multi_regr.rsquared:.4f}')

### 2.

Use your forecasted returns, $\hat{r}^{\targ}_{t+1}$ to build trading weights:

$$w_t = 0.50 + 50\;\hat{r}^{\targ}_{t+1}$$

(So the rule says to hold 50% in the ETF plus/minus 50x the forecast. Recall the forecast is a monthly percentage, so it is a small number.)

Calculate the return from implementing this strategy. Denote this as $r^x_t$.

Report the first and last 5 values.
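
The trading rule can be sketched on a few hypothetical numbers (the forecasts and realized returns below are assumptions, not regression output). The weight is set at time $t$ from the forecast and then applied to the return realized over $t+1$.

```python
import numpy as np

# Hypothetical monthly forecasts made at t for t+1, and the returns realized at t+1.
forecast = np.array([0.004, -0.002, 0.010, -0.012])
realized = np.array([0.03, -0.01, 0.05, 0.02])

# Weight chosen at t from the forecast: w_t = 0.50 + 50 * forecast.
weights = 0.50 + 50 * forecast

# Strategy return: the weight set at t times the return realized over t+1.
strategy = weights * realized

print(weights)
print(strategy)
```

A sufficiently negative forecast (below -1% per month) pushes the weight below zero, so the rule can go short the ETF as well as lever up.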

### 3.

Calculate the following (annualized) performance metrics for both the passive investment, $r^\targ$, as well as the strategy implemented in the previous problem, $r^x$.

* mean
* volatility
* max drawdown

**Remember to annualize mean and volatility for monthly data.** (No need to annualize max drawdown.)

In [None]:
# this function gives mean, volatility, max drawdown
def performance_metrics(rets, adj_factor=12):
    """
    Given a dataframe of returns, report the annualized mean and volatility,
    the max drawdown, and the dates of its bottom and recovery.
    """
    stats = {}
    
    stats['Annualized Mean'] = rets.mean() * adj_factor
    stats['Annualized Volatility'] = rets.std() * np.sqrt(adj_factor)

    stats["Min"] = rets.min()
    stats["Max"] = rets.max()

    # Cumulative wealth, its running peak, and the percentage drawdown from that peak.
    cum_prod = (1 + rets).cumprod()
    cum_max = cum_prod.cummax()
    drawdowns = (cum_prod - cum_max) / cum_max
    stats["Max Drawdown"] = drawdowns.min()
    stats["Peak"] = cum_prod.idxmax()
    stats["Bottom"] = drawdowns.idxmin()

    # Recovery: first date after the bottom on which wealth regains its prior peak.
    recovery_date = []
    for col in cum_prod.columns:
        bottom = drawdowns[col].idxmin()
        prev_max = cum_max[col][:bottom].max()
        recovery_wealth = cum_prod[col][bottom:]
        recovery_date.append(
            recovery_wealth[recovery_wealth >= prev_max].index.min()
        )
    stats["Recovery"] = ["-" if pd.isnull(i) else i for i in recovery_date]

    stats["Duration (days)"] = [
        (i - j).days if i != "-" else "-"
        for i, j in zip(stats["Recovery"], stats["Bottom"])
    ]
    
    return pd.DataFrame(stats, index=rets.columns)

### 4.

Comment on whether the active strategy (using forecasting), $r^x$, is an improvement on the passive strategy of just holding $\targ$ directly.

Need to compare the mean, volatility, and max drawdown of the passive investment and $r^x$.