# Homework 2

## FINM 25000 - 2025

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

## HBS Case

### *ProShares Hedge Replication ETF*

***

# 1. The ProShares ETF Product

**Section 1 is not graded**, and you do not need to submit your answers. But you are encouraged to think about them, and we will discuss them.

## 1. Alternative ETFs

Describe the two types of investments referenced by this term.

Answer:

The two types of investments referenced by "Alternative ETFs" are (1) alternative asset classes that are not "mainstream," such as real estate, commodities, precious metals, currencies, and volatility; and (2) alternative strategies, which are unconventional investment processes such as geared investing, long/short strategies, market neutral, absolute return, convertible/merger arbitrage, managed futures, and global macro.

## 2. Hedge Funds.


* a. Using just the information in the case, what are two measures by which hedge funds are an attractive investment?

1. Attractive returns. From 1994 to 2013, hedge funds had 10% higher returns than stocks and more than twice the returns of bonds. Additionally, hedge funds provided a "smoother ride" (i.e., lower volatility).
2. Attractive risk-return characteristics in terms of Sharpe ratio. Hedge funds had less than half the volatility of the S&P 500, with annualized returns less than 1% below those of the S&P 500. As a result, a hedge fund allocation could produce an efficient frontier that dominates the one without hedge funds, along with diversification benefits.

* b. What are the main benefits of investing in hedge funds via an ETF instead of directly?

There are many benefits to investing in hedge fund ETFs compared to investing directly:
- Democratization: broader access for retail investors to hedge fund beta, a rules-based investment strategy, and lower fees (more money in the hands of investors)
- Accessibility
- Transparency
- Lower fees
- Liquidity
- Diversification
- Regulatory oversight
- Tax reporting

## 3. The Benchmarks

* a. Explain as simply as possible how HFRI, MLFM, MLFM-ES, and HDG differ in their construction and purpose.

- HFRI: an index designed to reflect the collective performance of hedge funds, built as an equally weighted composite of over 2,000 constituent hedge funds that were available to accredited investors.
- MLFM: a statistical multi-factor model designed to track the performance of hedge funds (factors: S&P 500, Russell 2000, MSCI EAFE, MSCI Emerging Markets, the Euro/US dollar exchange rate, and three-month Eurodollar deposit yields).
- MLFM-ES: an adapted version of the MLFM in which all six components are tradable: the three-month Eurodollar deposit yields were replaced with US Treasury bills, and the dollar/euro exchange rate was replaced with the ProShares UltraShort Euro ETF (EUO).
- HDG: the ProShares Hedge Replication ETF, a liquid alternative product that provides exposure to hedge fund returns at low fees, with full transparency and daily liquidity.

* b. How well does the Merrill Lynch Factor Model (MLFM) track the HFRI?

- The MLFM had a correlation coefficient of 90% with the HFRI.

* c. In which factor does the MLFM have the largest loading? (See a slide in Exhibit 1.)

- The factor with the largest loading was 3-month T-bills.

* d. What are the main concerns you have for how the MLFM attempts to replicate the HFRI?

- The factors were limited and did not fully capture the potential sources of return in the HFRI.
- The regression used to determine the weights was backward-looking, so it lagged behind changes in hedge fund styles.
- Whether the model captures hedge fund alpha or only beta. The purpose of investing in hedge funds is to earn "alpha," but if the MLFM only captures beta to the factors, then it might not replicate the skill of hedge fund managers.


## 4. The HDG Product

* a. What does ProShares ETF, HDG, attempt to track? Is the tracking error small?

- HDG attempts to track the MLFM-ES, the tradable version of the MLFM, and does so with very high correlation (99.7%), so the tracking error is small.

* b. HDG is, by construction, delivering beta for investors. Isn't the point of hedge funds to generate alpha? Then why would HDG be valuable?

- Yes, the point of hedge funds is to generate alpha. However, HDG is valuable because it provides the benefits of hedge fund ETFs mentioned earlier, including accessibility, transparency, lower fees, and liquidity.

* c. The fees of a typical hedge-fund are 2% on total assets plus 20% of excess returns if positive. HDG's expense ratio is roughly 1% on total assets. What would their respective net Sharpe Ratios be, assuming both have a gross excess returns of 10% and volatility of 20%?


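In [1]:
hf_sharpe = (0.10 - 0.02) * (1 - 0.20) / 0.20   # 2% mgmt fee, then 20% of the remaining positive excess return
hdg_sharpe = (0.10 - 0.01) / 0.20               # 1% expense ratio only; volatility assumed unchanged at 20%
print(f"Hedge fund net Sharpe ratio: {hf_sharpe:.2f}")
print(f"HDG net Sharpe ratio: {hdg_sharpe:.2f}")

Hedge fund net Sharpe ratio: 0.32
HDG net Sharpe ratio: 0.45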


***

# 2.  Analyzing the Data

Use the data found on Canvas, in <b>'proshares analysis data.xlsx'</b>. 

It has monthly data on financial indexes and ETFs from `Aug 2011` through `May 2025`.

In [None]:
import pandas as pd
import numpy as np
#import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt

def calc_return_metrics(rets, adj_factor=12):
    """
    Calculate return metrics for a given dataset. Specifically:
    - Annualized Return
    - Annualized Volatility
    - Annualized Sharpe Ratio

    Args:
        rets : Returns time series.
        adj_factor (int, optional): Annualization factor (12 for monthly data). Defaults to 12.

    Returns:
        DataFrame or dict: Summary of return metrics.
    """
    summary = {}
    summary['Annualized Mean'] = rets.mean() * adj_factor
    summary['Annualized Volatility'] = rets.std() * np.sqrt(adj_factor)
    summary['Annualized Sharpe Ratio'] = (
        summary['Annualized Mean'] / summary['Annualized Volatility']
        )
    return pd.DataFrame(summary, index=rets.columns)

def calc_risk_metrics(data, as_df=False, adj_factor=12):
    """
    Calculate risk metrics for a given dataset. Specifically:
    - Skewness
    - Excess Kurtosis
    - VaR (0.05)
    - CVaR (0.05)
    - Min / Max
    - Max Drawdown
    - Peak, Bottom, and Recovery dates of the max drawdown

    Args:
        data : Returns time series (DataFrame, one column per asset).
        as_df (bool, optional): Return a DataFrame instead of a dictionary. Defaults to False.
        adj_factor (int, optional): Annualization factor. Unused, since none of these
            statistics are annualized; kept for a consistent signature. Defaults to 12.

    Returns:
        DataFrame or Dictionary: Summary of risk metrics.
    """
    summary = dict() # an empty dictionary
    summary['Skewness'] = data.skew() # skewness of the data (asymmetry of the distribution)
    summary['Excess Kurtosis'] = data.kurtosis() # pandas reports excess (Fisher) kurtosis ("tailedness" relative to a normal)
    summary['VaR (0.05)'] = data.quantile(0.05, axis=0) # historical VaR: the 5th percentile of returns
    summary['CVaR (0.05)'] = data[data <= data.quantile(0.05, axis=0)].mean() # CVaR: mean of the returns at or below the 5% VaR
    summary['Min'] = data.min() # Min return observed in data
    summary['Max'] = data.max() # Max return observed in data

    # Cumulative wealth from an initial investment of $1000
    # (1 + data): converts returns to growth factors
    # .cumprod(): compounds the growth factors over time
    wealth_index = 1000 * (1 + data).cumprod()

    # Running maximum of wealth: the highest value reached at or before each date
    previous_peaks = wealth_index.cummax()

    # Drawdown: percentage decline of current wealth from the running peak
    drawdowns = (wealth_index - previous_peaks) / previous_peaks

    # Largest (most negative) drawdown: the worst peak-to-trough decline
    summary['Max Drawdown'] = drawdowns.min()
    summary['Bottom'] = drawdowns.idxmin() # date when the max drawdown reached its lowest point

    peak_dates, recovery_dates = [], []
    for col in data.columns:
        bottom = summary['Bottom'][col]
        # Peak: date of the highest wealth at or before the bottom
        peak_dates.append(wealth_index.loc[:bottom, col].idxmax())
        # Recovery: first date after the bottom when wealth regains that peak (NaT if never)
        after_bottom = wealth_index.loc[bottom:, col]
        recovered = after_bottom[after_bottom >= previous_peaks.loc[bottom, col]]
        recovery_dates.append(recovered.index[0] if len(recovered) > 0 else pd.NaT)

    summary['Peak'] = pd.Series(peak_dates, index=data.columns)
    summary['Recovery'] = pd.Series(recovery_dates, index=data.columns)

    return pd.DataFrame(summary, index=data.columns) if as_df else summary

## 1. 

For the series in the "hedge fund series" tab, report the following summary statistics:
* mean
* volatility
* Sharpe ratio

Annualize these statistics.

In [None]:
hedge_fund_series_df = pd.read_excel(io = './proshares_analysis_data.xlsx',
                                     sheet_name='hedge_fund_series',
                                     index_col=0,
                                     parse_dates=[0])

hedge_fund_series_df.head(5)

In [None]:
metrics = calc_return_metrics(hedge_fund_series_df).sort_values(
    'Annualized Sharpe Ratio', ascending=False)
display(metrics)

## 2.

For the series in the "hedge fund series" tab, calculate the following statistics related to tail-risk.
* Skewness
* Excess Kurtosis (in excess of 3)
* VaR (.05) - the fifth quantile of historic returns
* CVaR (.05) - the mean of the returns at or below the fifth quantile
* Maximum drawdown - include the dates of the max/min/recovery within the max drawdown period.

There is no need to annualize any of these statistics.
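A minimal standalone sketch of these tail-risk statistics for a single return series follows. The function names `tail_stats` and `drawdown_summary` are hypothetical, and it assumes a date-indexed `pd.Series` of monthly returns with no gaps. The key step for the drawdown dates is the running peak of cumulative wealth: the bottom is where wealth sits furthest below that peak, the peak is the high-water mark just before it, and the recovery is the first later date at which wealth regains that mark.

```python
import numpy as np
import pandas as pd


def tail_stats(rets: pd.Series) -> dict:
    """Skewness, excess kurtosis, and historical VaR/CVaR at the 5% level."""
    var_05 = rets.quantile(0.05)
    return {
        "Skewness": rets.skew(),
        "Excess Kurtosis": rets.kurtosis(),        # pandas reports excess (Fisher) kurtosis
        "VaR (0.05)": var_05,                      # 5th percentile of historical returns
        "CVaR (0.05)": rets[rets <= var_05].mean(),  # mean of returns at or below VaR
    }


def drawdown_summary(rets: pd.Series) -> dict:
    """Max drawdown of a return series, with its peak, bottom, and recovery dates."""
    wealth = (1 + rets).cumprod()          # growth of $1
    running_peak = wealth.cummax()         # highest wealth reached so far
    drawdowns = wealth / running_peak - 1  # percent below the running peak

    bottom = drawdowns.idxmin()            # date of the worst drawdown
    peak = wealth.loc[:bottom].idxmax()    # high-water mark preceding the bottom
    after = wealth.loc[bottom:]
    recovered = after[after >= running_peak.loc[bottom]]
    recovery = recovered.index[0] if len(recovered) > 0 else pd.NaT  # NaT if never recovered

    return {
        "Max Drawdown": drawdowns.min(),
        "Peak": peak,
        "Bottom": bottom,
        "Recovery": recovery,
    }
```

Applied column by column, for example `hedge_fund_series_df.apply(lambda s: pd.Series(drawdown_summary(s.dropna())))`, this should give one column per series and one row per statistic.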

## 3. 

For the series in the "hedge fund series" tab, run a regression of each against SPY (found in the "merrill factors" tab.) Include an intercept. Report the following regression-based statistics:
* Market Beta
* Treynor Ratio
* Information ratio

Annualize these three statistics as appropriate.
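A minimal sketch of these regression statistics, assuming the series are already excess returns (as the Sharpe ratios in 2.1 implicitly treat them) and that monthly figures are annualized with a factor of 12. It uses plain NumPy least squares in place of statsmodels, and the function name `market_stats` is hypothetical. Beta needs no annualization; the Treynor ratio scales the mean by 12; the information ratio scales alpha by 12 and residual volatility by $\sqrt{12}$.

```python
import numpy as np
import pandas as pd


def market_stats(y: pd.Series, mkt: pd.Series, adj_factor: int = 12) -> dict:
    """Regress y on the market with an intercept; return beta, Treynor and information ratios."""
    data = pd.concat([y, mkt], axis=1).dropna()
    Y = data.iloc[:, 0].to_numpy()
    X = np.column_stack([np.ones(len(data)), data.iloc[:, 1].to_numpy()])  # [const, SPY]

    (alpha, beta), *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ np.array([alpha, beta])

    return {
        "Alpha (ann.)": alpha * adj_factor,
        "Market Beta": beta,
        # Treynor: annualized mean excess return per unit of market beta
        "Treynor Ratio": Y.mean() * adj_factor / beta,
        # Information ratio: annualized alpha over annualized residual volatility
        "Information Ratio": (alpha / resid.std(ddof=1)) * np.sqrt(adj_factor),
    }
```

Here `y` would be one hedge-fund series and `mkt` the SPY column from the "merrill factors" tab, aligned on the same dates.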

## 4. 

Discuss the previous statistics, and what they tell us about...

* the differences between SPY and the hedge-fund series?
* which performs better between HDG and QAI.
* whether HDG and the ML series capture the most notable properties of HFRI.

## 5. 

Report the correlation matrix for these assets.
* Show the correlations as a heat map.
* Which series have the highest and lowest correlations?

## 6.

Replicate HFRI with the six factors listed on the "merrill factors" tab. Include a constant, and run the unrestricted regression,

$\newcommand{\hfri}{\text{hfri}}$
$\newcommand{\merr}{\text{merr}}$

$$\begin{align}
r^{\hfri}_{t} &= \alpha^{\merr} + x_{t}^{\merr}\beta^{\merr} + \epsilon_{t}^{\merr}\\[5pt]
\hat{r}^{\hfri}_{t} &= \hat{\alpha}^{\merr} + x_{t}^{\merr}\hat{\beta}^{\merr}
\end{align}$$

Note that the second equation is just our notation for the fitted replication.

a. Report the intercept and betas.

b. Are the betas realistic position sizes, or do they require huge long-short positions?

c. Report the R-squared.

d. Report the volatility of $\epsilon^{\merr}$, the tracking error.
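A minimal sketch of the full-sample replication regression, using NumPy least squares rather than statsmodels. The function name `replicate` is hypothetical, and it assumes `y` is the HFRI series and `X` the six-factor DataFrame from the "merrill factors" tab, aligned on dates. For part b, the magnitudes of the returned betas show whether the replication needs large long-short positions; the tracking error is the annualized volatility of the residual $\epsilon^{\merr}$.

```python
import numpy as np
import pandas as pd


def replicate(y: pd.Series, X: pd.DataFrame, adj_factor: int = 12) -> dict:
    """OLS of y on X with an intercept; returns coefficients, R-squared and tracking error."""
    data = pd.concat([y, X], axis=1).dropna()
    Y = data.iloc[:, 0].to_numpy()
    Z = np.column_stack([np.ones(len(data)), data.iloc[:, 1:].to_numpy()])  # [const, factors]

    coefs, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    fitted = Z @ coefs                     # the in-sample replication, \hat r_t
    resid = Y - fitted                     # epsilon^merr

    r_squared = 1 - (resid @ resid) / ((Y - Y.mean()) @ (Y - Y.mean()))
    return {
        "alpha": coefs[0],
        "betas": pd.Series(coefs[1:], index=X.columns),
        "r_squared": r_squared,
        # Tracking error: annualized volatility of the residual
        "tracking_error": resid.std(ddof=1) * np.sqrt(adj_factor),
        "fitted": pd.Series(fitted, index=data.index),
    }
```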

## 7.

Let's examine the replication out-of-sample (OOS).

Starting with month $t = 61$ of the sample, do the following:

* Use the previous 60 months of data to estimate the regression equation. 
This gives time-t estimates of the regression parameters, $\tilde{\alpha}^{\merr}_{t}$ and $\tilde{\beta}^{\merr}_{t}$.

* Use the estimated regression parameters, along with the time-t regressor values $x^{\merr}_{t}$, to calculate the time-t replication value. This value is built "out-of-sample" (OOS) with respect to the regression estimate, since the time-t data were not used in the estimation.

$$\hat{r}^{\hfri}_{t} \equiv \tilde{\alpha}^{\merr}_{t} + x_{t}^{\merr}\tilde{\beta}^{\merr}_{t}$$

* Step forward to $t = 62$, and now use $t = 2$ through $t = 61$ for the estimation. Re-run the steps above, and continue this process throughout the data series. Thus, we are running a rolling, 60-month regression for each point-in-time.

How well does the out-of-sample replication perform with respect to the target?
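The rolling procedure can be sketched as a loop that refits OLS on a 60-row window and predicts only the next row. This is a minimal sketch assuming `y` and `X` are aligned and have no missing values; the function name `rolling_oos_replication` is hypothetical. In 0-based indexing, the first prediction is at row 60 (month 61), fitted on rows 0 to 59 (months 1 to 60).

```python
import numpy as np
import pandas as pd


def rolling_oos_replication(y: pd.Series, X: pd.DataFrame, window: int = 60) -> pd.Series:
    """For each row t >= window, fit OLS with an intercept on the previous
    `window` rows and predict y at t from the time-t regressors only."""
    Z = np.column_stack([np.ones(len(X)), X.to_numpy()])  # [const, factors]
    Y = y.to_numpy()
    preds = {}
    for t in range(window, len(Y)):
        # Estimation sample: rows t-window .. t-1; row t is excluded from the fit
        coefs, *_ = np.linalg.lstsq(Z[t - window:t], Y[t - window:t], rcond=None)
        preds[y.index[t]] = Z[t] @ coefs  # out-of-sample replication value
    return pd.Series(preds, name="oos_replication")
```

To judge performance, one option is to compare the OOS series with the target over the same dates, e.g. `y.loc[oos.index].corr(oos)`, and to compute an OOS R-squared as one minus the ratio of squared OOS errors to the squared deviations of the target from its mean.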

***

# 3.  Extensions
**This section is not graded, and you do not need to submit it.** Still, we may discuss some of these extensions in class.

For those looking for a challenge, try a few of these.

## 1. 

Merrill constrains the weights of each asset in its replication regression of HFRI. Try constraining your weights by re-doing 2.6.

* Use Non-Negative Least Squares (NNLS) instead of OLS.
* Go further by using a Generalized Linear Model to put separate interval constraints on each beta, rather than simply constraining them to be non-negative.

#### Hints
* Try using LinearRegression in scikit-learn with the parameter `positive=True`. 
* Try using GLM in statsmodels.
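As an alternative to the scikit-learn hint, here is a minimal NNLS sketch using SciPy's `scipy.optimize.nnls`, which constrains every coefficient it sees. To keep the intercept unconstrained, it demeans the data first: for any slope vector $\beta$, the best intercept is $\bar{y} - \bar{x}\beta$, so solving NNLS on demeaned data yields the constrained slopes, and the intercept follows. The function name `nnls_with_intercept` is hypothetical. `LinearRegression(positive=True)` from the hint should give comparable slopes, since it also fits the intercept separately.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_with_intercept(y: np.ndarray, X: np.ndarray):
    """Least squares with non-negative slopes and a free (unconstrained) intercept."""
    x_bar, y_bar = X.mean(axis=0), y.mean()
    # NNLS on demeaned data: the intercept drops out of the problem
    beta, _ = nnls(X - x_bar, y - y_bar)
    # Recover the intercept that is optimal for these slopes
    alpha = y_bar - x_bar @ beta
    return alpha, beta
```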

## 2. 

Let's decompose a few other targets to see if they behave as their name suggests.

* Regress HEFA on the same style factors used to decompose HFRI. Does HEFA appear to be a currency-hedged version of EFA?

* Decompose TRVCI with the same style factors used to decompose HFRI. The TRVCI Index tracks venture capital funds. In terms of our styles, what best describes venture capital?

* TAIL is an ETF that tracks SPY, but that also buys put options to protect against market downturns. Calculate the statistics in questions 2.1-2.3 for TAIL. Does it seem to behave as indicated by this description? That is, does it have high correlation to SPY while delivering lower tail risk?

## 3. 

The ProShares case introduces Levered ETFs. ProShares made much of its name originally through levered, or "geared" ETFs.

Explain conceptually why Levered ETFs may track their index well for a given day but diverge over time. How is this exacerbated in volatile periods like 2008?
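A toy worked example helps here. A levered ETF resets its exposure daily, so each day's return is applied to a different base; in a choppy market this compounding drag grows roughly with the square of the leverage. The path below (up 10%, down 10%, repeated) is hypothetical, not market data, and `compound` is an illustrative helper.

```python
def compound(daily_returns, leverage=1.0):
    """Cumulative return from rebalancing to `leverage` times the index every day."""
    wealth = 1.0
    for r in daily_returns:
        wealth *= 1 + leverage * r  # exposure is reset to `leverage` x each day
    return wealth - 1


# A choppy market: up 10%, down 10%, repeated five times (toy numbers)
path = [0.10, -0.10] * 5
index_ret = compound(path)                # 0.99**5 - 1
levered_ret = compound(path, leverage=3)  # 0.91**5 - 1
print(f"index: {index_ret:.2%}, 3x daily: {levered_ret:.2%}, 3 x index: {3 * index_ret:.2%}")
# index: -4.90%, 3x daily: -37.60%, 3 x index: -14.70%
```

Each day the 3x fund delivers exactly three times the index return, yet over ten days it loses far more than three times the index's loss. Larger daily swings, as in 2008, widen this gap.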

## 4.

Analyze SPXU and UPRO relative to SPY.
- SPXU is ProShares -3x SPX ETF.
- UPRO is ProShares +3x SPX ETF.

Questions:
* Analyze them with the statistics from 2.1-2.3. 

* Do these two ETFs seem to live up to their names?

* Plot the cumulative returns of both these ETFs along with SPY.

* What do you conclude about levered ETFs?

## 5.

In `Section 2`, we estimated the replications using an intercept. Try the full-sample estimation, but this time without an intercept.

$$\begin{align}
r^{\hfri}_{t} &= x_{t}^{\merr}\beta^{\merr} + \epsilon_{t}^{\merr}\\[5pt]
\check{r}^{\hfri}_{t} &= x_{t}^{\merr}\check{\beta}^{\merr}
\end{align}$$

Report

* the regression beta. How does it compare to the estimated beta with an intercept, $\hat{\beta}^{\merr}$?

* the mean of the fitted value, $\check{r}^{\hfri}_{t}$. How does it compare to the mean of the HFRI?

* the correlations of the fitted values, $\check{r}^{\hfri}_{t}$, to the HFRI. How does the correlation compare to that of the fitted values with an intercept, $\hat{r}^{\hfri}_{t}$?

Do you think Merrill and ProShares fit their replicators with an intercept or not?
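A minimal sketch for comparing the two fits, assuming `y` is a NumPy array of HFRI returns and `X` a 2-D array of the six factors; the function name `compare_intercept` is hypothetical. Two OLS facts guide the comparison: with a constant, the fitted values have exactly the same mean as the target; without one, they need not. Correlation ignores constants, and the with-intercept slopes maximize the correlation between $y$ and any linear combination of the factors, so $\hat{r}$ should correlate at least as well as $\check{r}$.

```python
import numpy as np


def compare_intercept(y: np.ndarray, X: np.ndarray) -> dict:
    """Fit OLS with and without a constant; compare slopes, fitted means and correlations."""
    Z = np.column_stack([np.ones(len(y)), X])
    with_const, *_ = np.linalg.lstsq(Z, y, rcond=None)
    no_const, *_ = np.linalg.lstsq(X, y, rcond=None)

    fit_hat = Z @ with_const   # \hat r: fitted values with intercept
    fit_check = X @ no_const   # \check r: fitted values without intercept
    return {
        "beta_hat": with_const[1:],
        "beta_check": no_const,
        "mean_y": y.mean(),
        "mean_hat": fit_hat.mean(),      # equals mean_y for OLS with a constant
        "mean_check": fit_check.mean(),  # need not match mean_y
        "corr_hat": np.corrcoef(fit_hat, y)[0, 1],
        "corr_check": np.corrcoef(fit_check, y)[0, 1],
    }
```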

***