# Midterm

## FINM 25000 - 2025

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

***

# Instructions

## Please note the following:

Submission
* You will upload your solution to the `Midterm` assignment on Canvas, where you downloaded this.
* Be sure to **submit** on Canvas, not just **save** on Canvas.
* Your submission should be readable (the graders must be able to understand your answers).
* It should **include all code used in your analysis, in a file format from which the code can be executed.**

Rules
* The exam is open-material, closed-communication.
* You do not need to cite material from the course github repo--you are welcome to use the code posted there without citation.
* If you prompt AI for help, you must cite the AI and the prompt. If you use AI embedded coding tools, cite that you used such tools.

Advice
* If you find any question to be unclear, state your interpretation and proceed. We will only answer questions of interpretation if there is a typo, error, etc.
* The exam will be graded for partial credit.

## Scoring

| Problem | Points |
|---------|--------|
| 1       | 45     |
| 2       | 40     |
| 3       | 35     |


Each numbered question is worth `5 points` unless otherwise specified.

For every minute late you submit the exam, you will lose `1 point`.

***

## Data

**All data files are found in the class github repo, in the `data` folder.**

This exam makes use of the following data files:
* `midterm_1_data.xlsx`

This file has sheets for...
* `excess returns` - excess returns of some of the biggest companies in the S&P, along with the SPY ETF.

Note the data is **weekly**, so annualization should use a factor of `52`.

In [139]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression

In [140]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [141]:
info = pd.read_excel('/content/drive/My Drive/midterm_data.xlsx', sheet_name='info', index_col=0,parse_dates=[0])
info.set_index('shortName',inplace=True)



In [142]:
rets = pd.read_excel('/content/drive/My Drive/midterm_data.xlsx', sheet_name='excess returns', index_col=0,parse_dates=[0])
#rets.set_index('date',inplace=True)
#rets.columns = [s.split(' ')[0] for s in rets.columns]
#rets = rets[info.index]

In [143]:
info

shortName,quoteType,currency,volume,totalAssets,longBusinessSummary
SPDR S&P 500,ETF,USD,63670226,603517000000.0,The trust seeks to achieve its investment obje...
Apple Inc.,EQUITY,USD,39765812,,"Apple Inc. designs, manufactures, and markets ..."
NVIDIA Corporation,EQUITY,USD,193633263,,"NVIDIA Corporation, a computing infrastructure..."
Microsoft Corporation,EQUITY,USD,16459512,,Microsoft Corporation develops and supports so...
Alphabet Inc.,EQUITY,USD,34282922,,Alphabet Inc. offers various products and plat...
"Amazon.com, Inc.",EQUITY,USD,50518307,,"Amazon.com, Inc. engages in the retail sale of..."
"Meta Platforms, Inc.",EQUITY,USD,10873880,,"Meta Platforms, Inc. engages in the developmen..."
"Tesla, Inc.",EQUITY,USD,79236442,,"Tesla, Inc. designs, develops, manufactures, l..."
Broadcom Inc.,EQUITY,USD,14274674,,"Broadcom Inc. designs, develops, and supplies ..."
Berkshire Hathaway Inc. New,EQUITY,USD,4416578,,"Berkshire Hathaway Inc., through its subsidiar..."


In [144]:
rets

date,AAPL,AMZN,AVGO,BRK-B,GOOGL,LLY,META,MSFT,NVDA,SPY,TSLA
2015-01-09,0.024332,-0.037748,0.047875,0.001830,-0.054624,-0.002037,-0.009232,0.009014,-0.009620,-0.005925,-0.057862
2015-01-16,-0.053927,-0.021028,-0.010477,-0.001921,0.019271,0.010544,-0.033112,-0.020313,0.000822,-0.013009,-0.065942
2015-01-23,0.065950,0.074465,0.030437,-0.000603,0.061689,0.020515,0.035249,0.020329,0.037575,0.016565,0.042575
2015-01-30,0.037088,0.134986,-0.038234,-0.034848,-0.008028,-0.001712,-0.024578,-0.143614,-0.072821,-0.026840,0.011567
2015-02-06,0.015017,0.055617,0.017989,0.043478,-0.006918,-0.022868,-0.019061,0.049662,0.062409,0.030493,0.067493
...,...,...,...,...,...,...,...,...,...,...,...
2025-05-30,0.028032,0.019457,0.057825,0.000450,0.018866,0.033027,0.032037,0.022070,0.028705,0.017208,0.020438
2025-06-06,0.018092,0.044561,0.022884,-0.017889,0.014103,0.046472,0.080368,0.024573,0.051575,0.019350,-0.145320
2025-06-13,-0.037631,-0.007882,0.006169,-0.013136,0.004701,0.063271,-0.022268,0.008738,0.000765,-0.004571,0.101224
2025-06-20,0.022435,-0.012088,0.004461,-0.006243,-0.046698,-0.069841,-0.001487,0.004412,0.012517,-0.005282,-0.010409


***

# 1. Risk Statistics and Decomposition

### 1.1. (10pts)

Display a table with the following metrics for each of the return series.

* mean (annualized)
* volatility (annualized)
* Sharpe ratio (annualized)
* skewness
* kurtosis
* maximum drawdown

In [145]:
def performance_summary(rets, adj_factor):
    summary = {}
    summary['Annualized Mean'] = rets.mean() * adj_factor
    summary['Annualized Volatility'] = rets.std() * np.sqrt(adj_factor)
    summary['Annualized Sharpe Ratio'] = (
        summary['Annualized Mean'] / summary['Annualized Volatility']
        )
    return pd.DataFrame(summary, index=rets.columns)
metrics = performance_summary(rets,52)

def tail_risk(returns_df, quantile=0.05):
  """Tail-risk and drawdown statistics for each column of returns_df."""
  summary = pd.DataFrame(index=returns_df.columns)
  summary['Skewness'] = returns_df.skew()
  summary['Excess Kurtosis'] = returns_df.kurtosis()  # pandas reports excess kurtosis
  var = returns_df.quantile(q=quantile, axis='index')  # historical VaR at the given quantile
  summary[f'Var ({quantile})'] = var
  summary[f'CVar ({quantile})'] = returns_df[returns_df <= var].mean()  # mean return at or below the VaR

  # max drawdown calculations
  index = 1000 * (1 + returns_df).cumprod()  # value of $1000 invested at the start
  peaks = index.cummax()  # running maximum up to each date
  drawdowns = (index - peaks) / peaks  # percent below the running maximum
  summary['Max Drawdown'] = drawdowns.min()  # most negative value is the largest drawdown

  # date of the trough of each maximum drawdown
  troughs = drawdowns.idxmin()
  summary['Trough (in max. drawdown period)'] = troughs

  peak_dates, recovery_dates = dict(), dict()
  for col in returns_df.columns:
    trough_date = troughs[col]
    # the relevant peak is the highest value reached on or before the trough
    peak_date = index[col].loc[:trough_date].idxmax()
    peak_value = index[col].loc[peak_date]

    # first date after the trough at which the index regains the prior peak
    index_after_trough = index[col].loc[trough_date:]
    recovered = index_after_trough[index_after_trough >= peak_value]
    peak_dates[col] = peak_date
    recovery_dates[col] = recovered.index[0] if len(recovered) else pd.NaT

  summary['Peak (in max. drawdown period)'] = pd.Series(peak_dates)
  summary['Recovery Date'] = pd.Series(recovery_dates)

  return summary
risks = tail_risk(rets,0.05)
cols_to_drop = ['Var (0.05)', 'CVar (0.05)','Peak (in max. drawdown period)', 'Trough (in max. drawdown period)', 'Recovery Date']
risks.drop(columns=cols_to_drop, inplace=True)

desired_metrics = pd.merge(metrics, risks, left_index=True, right_index=True)
display(desired_metrics)

Unnamed: 0,Annualized Mean,Annualized Volatility,Annualized Sharpe Ratio,Skewness,Excess Kurtosis,Max Drawdown
AAPL,0.227994,0.276003,0.826057,-0.214185,1.852149,-0.348104
AMZN,0.300886,0.305453,0.985051,0.06155,1.754491,-0.54583
AVGO,0.382419,0.375069,1.019597,0.639696,3.515656,-0.409481
BRK-B,0.130218,0.189958,0.685508,-0.199913,2.608872,-0.266894
GOOGL,0.220044,0.279465,0.787376,0.572854,3.673239,-0.415141
LLY,0.268667,0.283345,0.948195,0.210125,1.638064,-0.254568
META,0.274415,0.352006,0.779574,0.062078,3.990455,-0.758756
MSFT,0.25332,0.239516,1.057631,0.066817,2.37236,-0.350826
NVDA,0.653633,0.461871,1.415184,0.336949,1.391086,-0.657787
SPY,0.118939,0.171315,0.694271,-0.627808,6.3637,-0.325741
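
The drawdown logic can be checked by hand on a small series. The sketch below uses four made-up returns (not taken from the data) and a hypothetical helper, `max_drawdown`, that follows the same cumulative-product and running-peak steps as the cell above.

```python
import pandas as pd

def max_drawdown(returns: pd.Series) -> float:
    """Largest peak-to-trough decline of the cumulative-return path."""
    wealth = (1 + returns).cumprod()      # growth of $1
    running_peak = wealth.cummax()        # highest value seen so far
    drawdown = wealth / running_peak - 1  # percent below the running peak
    return drawdown.min()

# made-up returns: wealth path is 1.10, 0.88, 0.924, 0.8316
toy = pd.Series([0.10, -0.20, 0.05, -0.10])
mdd = max_drawdown(toy)
print(f"{mdd:.4f}")  # -0.2440, i.e. 0.8316 / 1.10 - 1
```

The peak of 1.10 is never regained, so the deepest point is the last observation rather than the single worst return.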


### 1.2.

As a standalone investment, which is most attractive? And least? Justify your answer.

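I'm drawn towards MSFT because it has good returns (annualized mean of 25.3%) with moderate volatility (24.0%) and one of the highest Sharpe ratios (1.06, second only to NVDA). Additionally, it is slightly positively skewed (0.07), so large moves are not tilted toward the downside. Its kurtosis and maximum drawdown (-35.1%) fare well too. Overall, MSFT is the preferred option because it raises no red flags in these metrics, and a safer asset is ideal for a standalone investment. BRK-B is the least attractive on the same criteria: it has the lowest Sharpe ratio (0.69), slightly below that of SPY.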

### 1.3. (10pts)

For each investment, estimate a regression against `SPY`. Report the
* alpha (annualized as a mean)
* beta
* info ratio
* r-squared

In [146]:
def univariate_regression(funds, explanatory):
    """
    Regress each column of `funds` on `explanatory` (with an intercept).

    Returns:
        DataFrame: annualized alpha, beta, r-squared, and annualized
        information ratio, one row per column of `funds`.
    """
    reg_results = []
    for fund in funds.columns:
        response = funds[fund]
        # ordinary least squares with a constant; rows with missing values are dropped
        results = sm.OLS(response, sm.add_constant(explanatory), missing='drop').fit()

        # intercept and slope on the explanatory variable (index 0 and 1 respectively)
        parameters = results.params
        intercept = parameters.iloc[0]  # mean return not explained by the regressor
        beta = parameters.iloc[1]

        summary = dict()
        summary['Alpha'] = intercept * 52  # weekly data, so annualize with 52
        summary['Beta'] = beta
        summary['R-Squared'] = results.rsquared

        # alpha per unit of residual volatility, annualized
        # (regressing the explanatory series on itself leaves residuals of
        # numerical noise, so that row's information ratio is meaningless)
        residuals = results.resid
        summary['Information Ratio'] = (intercept / residuals.std()) * np.sqrt(52)

        reg_results.append(pd.DataFrame(summary, index=[response.name]))
    return pd.concat(reg_results)
univariate_regression(rets, rets['SPY'])

Unnamed: 0,Alpha,Beta,R-Squared,Information Ratio
AAPL,0.09651137,1.105465,0.470818,0.480686
AMZN,0.1747534,1.060484,0.353762,0.711681
AVGO,0.2211647,1.355773,0.38348,0.750985
BRK-B,0.03378997,0.810733,0.534602,0.260746
GOOGL,0.09387001,1.060832,0.422891,0.44215
LLY,0.1953153,0.616715,0.139036,0.742895
META,0.1371958,1.153691,0.31526,0.471008
MSFT,0.1316832,1.022681,0.535057,0.806298
NVDA,0.4494312,1.71686,0.405526,1.262048
SPY,8.597723000000001e-17,1.0,1.0,1.397793


### 1.4.

Based on this table, which investment seems most attractive relative to holding `SPY`? Justify your answer.

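NVDA, because it has the highest alpha in the table (44.9% annualized) and the highest information ratio among the stocks shown (1.26), which implies strong returns beyond its `SPY` exposure relative to the risk that `SPY` does not explain. (The `SPY` row is degenerate: regressing `SPY` on itself leaves residuals that are zero up to rounding, so its information ratio is numerical noise.)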
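
For reference, the information ratio computed in the regression cell corresponds to the expression below, where $\hat{\alpha}$ is the weekly intercept and $\sigma_{\hat{\epsilon}}$ is the weekly standard deviation of the residuals (this notation is assumed here and does not appear in the exam):

$$\text{IR} = \sqrt{52}\;\frac{\hat{\alpha}}{\sigma_{\hat{\epsilon}}}$$

The NVDA row is consistent with this: $0.4494 / 1.2620 \approx 0.356$, which matches the annualized residual volatility implied by the first table, $0.4619 \times \sqrt{1 - 0.4055} \approx 0.356$.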

### 1.5.

Suppose you expect `AAPL` to do well relatively, but you want to hedge broad market risk (`SPY`) and A.I. risk (`NVDA`).

For every $100 in `AAPL`, what should you hold in `SPY` and `NVDA`?

Estimate the regression including an intercept.

In [147]:
def calc_multi_regr(y, X, intercept=True, adj=52):
    """
    Calculate a multivariate regression of y on X. Adds useful metrics such
    as the Information Ratio and Tracking Error. Note that we can't calculate
    Treynor Ratio or Downside Beta here.

    Args:
        y : target variable
        X : independent variables
        intercept (bool, optional): Defaults to True.
        adj (int, optional): Annualization factor. Defaults to 52.

    Returns:
        DataFrame: Summary of regression results
    """
    if intercept:
        X = sm.add_constant(X)

    model = sm.OLS(y, X, missing="drop")
    results = model.fit()
    summary = dict()

    inter = results.params.iloc[0] if intercept else 0
    betas = results.params[1:] if intercept else results.params

    summary["Alpha (intercept)"] = inter * adj
    summary["R-Squared"] = results.rsquared

    X_cols = X.columns[1:] if intercept else X.columns

    for i, col in enumerate(X_cols):
        summary[f"{col} Beta"] = betas.iloc[i]

    summary["Information Ratio"] = (inter / results.resid.std()) * np.sqrt(adj)
    # tracking error is the annualized volatility of the regression residuals
    summary["Tracking Error"] = results.resid.std() * np.sqrt(adj)
    return pd.DataFrame(summary, index=[y.name])
hedged_data = calc_multi_regr(rets['AAPL'], rets[['SPY', 'NVDA']])
hedge_SPY = -100 * hedged_data['SPY Beta']
hedge_NVDA = -100 * hedged_data['NVDA Beta']
display(hedged_data)

print(f"Hold in SPY: ${hedge_SPY.iloc[0]:.2f}")
print(f"Hold in NVDA: ${hedge_NVDA.iloc[0]:.2f}")

Unnamed: 0,Alpha (intercept),R-Squared,SPY Beta,NVDA Beta,Information Ratio,Tracking Error
AAPL,0.07294,0.475397,1.015422,0.052446,0.36487,0.1999


Hold in SPY: $-101.54
Hold in NVDA: $-5.24


Note: the negative sign implies a short position.
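
To illustrate why the hedge positions are the negatives of the regression betas, here is a minimal sketch on synthetic data. The series `spy`, `nvda`, and `aapl` are simulated stand-ins with made-up parameters, not the exam data, and plain NumPy least squares is used in place of `statsmodels`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# synthetic weekly excess returns (assumed parameters, for illustration only)
spy = rng.normal(0.002, 0.02, n)
nvda = 1.5 * spy + rng.normal(0.005, 0.04, n)
aapl = 0.001 + 1.0 * spy + 0.05 * nvda + rng.normal(0.0, 0.02, n)

# regression of aapl on spy and nvda, including an intercept
X = np.column_stack([np.ones(n), spy, nvda])
coef, *_ = np.linalg.lstsq(X, aapl, rcond=None)
alpha, beta_spy, beta_nvda = coef

# dollar positions per $100 long in the target: short beta times the notional
notional = 100.0
pos_spy = -notional * beta_spy
pos_nvda = -notional * beta_nvda

# the hedged return is the intercept plus the residual
hedged = aapl - beta_spy * spy - beta_nvda * nvda
corr_spy = np.corrcoef(hedged, spy)[0, 1]
corr_nvda = np.corrcoef(hedged, nvda)[0, 1]
print(f"SPY position: {pos_spy:.2f}, NVDA position: {pos_nvda:.2f}")
print(f"in-sample correlation with SPY: {corr_spy:.1e}, with NVDA: {corr_nvda:.1e}")
```

In sample, the hedged return is uncorrelated with both hedging instruments by construction. Out of sample this holds only to the extent that the betas are stable.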

### 1.6.

Without estimating anything new, consider the idea of replicating `AAPL` using `SPY`, and `NVDA`. Which regression statistic best indicates if your replication tracks the target well?

Since we're not trying to 'beat' AAPL, the best statistic here would be R-squared, as it purely measures the fit.
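
As a small check of why R-squared measures tracking quality, the sketch below (synthetic data with assumed parameters, not the exam data) shows that in a regression with an intercept, R-squared equals the squared correlation between the target and its replication.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
# synthetic regressors (a constant plus two return series) and a synthetic target
X = np.column_stack([np.ones(n), rng.normal(0, 0.02, n), rng.normal(0, 0.04, n)])
target = X @ np.array([0.001, 1.0, 0.05]) + rng.normal(0, 0.02, n)

coef, *_ = np.linalg.lstsq(X, target, rcond=None)
replication = X @ coef          # fitted values are the replicating portfolio's returns
resid = target - replication

r_squared = 1 - resid.var() / target.var()
corr = np.corrcoef(target, replication)[0, 1]
tracking_error = resid.std(ddof=1) * np.sqrt(52)  # annualized, assuming weekly data

print(f"R-squared: {r_squared:.4f}, squared correlation: {corr**2:.4f}")
print(f"annualized tracking error: {tracking_error:.4f}")
```

The annualized residual volatility (tracking error) conveys the same information in return units rather than as a fraction of variance.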

### 1.7.

In the ProShares case, did we find the attempts at hedge-fund replication were successful?

Specifically, did we achieve high **correlation** to the...
* Merrill Lynch Benchmark?
* HFRI Index?

Were there any drawbacks to using our replication rather than the direct product?


In the ProShares case, HDG showed tracking error relative to its benchmark, so it did not follow the benchmark exactly. In addition, the regression-based replication consistently lagged behind the HFRI, which limited its ability to fully capture hedge-fund dynamics.

Note: I am assuming you are asking purely about the case and not the regression we just performed.

***

# 2. Portfolio Allocation

### 2.1.

Display the correlation matrix of the returns.

Based on this information, which investment do you anticipate will get extra weight in the portfolio, beyond what it would merit for its mean return? Explain.

In [148]:
display(rets.corr())

Unnamed: 0,AAPL,AMZN,AVGO,BRK-B,GOOGL,LLY,META,MSFT,NVDA,SPY,TSLA
AAPL,1.0,0.483467,0.50973,0.410575,0.54351,0.230372,0.431583,0.586599,0.489128,0.686162,0.447456
AMZN,0.483467,1.0,0.400055,0.291329,0.593167,0.163836,0.518642,0.619179,0.528325,0.594779,0.400784
AVGO,0.50973,0.400055,1.0,0.332626,0.451265,0.154145,0.385706,0.532544,0.585338,0.619258,0.36555
BRK-B,0.410575,0.291329,0.332626,1.0,0.363985,0.285065,0.29676,0.405488,0.320359,0.731165,0.220023
GOOGL,0.54351,0.593167,0.451265,0.363985,1.0,0.194443,0.530472,0.64943,0.46246,0.6503,0.36131
LLY,0.230372,0.163836,0.154145,0.285065,0.194443,1.0,0.167004,0.275814,0.193175,0.372875,0.163093
META,0.431583,0.518642,0.385706,0.29676,0.530472,0.167004,1.0,0.550671,0.428785,0.561481,0.266692
MSFT,0.586599,0.619179,0.532544,0.405488,0.64943,0.275814,0.550671,1.0,0.598311,0.731476,0.396805
NVDA,0.489128,0.528325,0.585338,0.320359,0.46246,0.193175,0.428785,0.598311,1.0,0.636809,0.410513
SPY,0.686162,0.594779,0.619258,0.731165,0.6503,0.372875,0.561481,0.731476,0.636809,1.0,0.501967


BRK-B and LLY have consistently lower correlations with the other assets (LLY is below 0.29 with every other stock), which suggests that they will get extra weight in the tangency portfolio because they help diversify risk.

### 2.2.

Calculate and report the weights of the mean-variance optimized portfolio, also called the tangency portfolio.

*Note that these are excess returns.*
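
For reference, the standard closed-form solution that the code cell implements is given below, writing $\tilde{\mu}$ for the vector of mean excess returns and $\Sigma$ for their covariance matrix (symbols assumed here, not taken from the exam):

$$w^{\text{tan}} = \frac{\Sigma^{-1}\tilde{\mu}}{\mathbf{1}'\Sigma^{-1}\tilde{\mu}}$$

Scaling $\tilde{\mu}$ and $\Sigma$ by the same annualization factor leaves $\Sigma^{-1}\tilde{\mu}$ unchanged, so weekly and annualized inputs give the same weights.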

In [149]:
def tan_weights(df):
  """
  Given a df of excess returns, computes the weights of the tangency
  portfolio.
  """
  mu = df.mean().values  # mean excess returns (per period; weekly here)
  Sigma = df.cov().values  # covariance matrix (per period; weekly here)
  ones = np.ones(len(mu))

  # Sigma^{-1} mu, solved directly rather than forming the inverse
  top = np.linalg.solve(Sigma, mu)
  bottom = ones @ top  # scaling so that the weights sum to one
  w_tan = top / bottom
  tangency_weights = pd.Series(w_tan, index=df.columns)
  tangency_df = tangency_weights.to_frame(name="Tangency Portfolio Weights")
  return tangency_df

#To get sharpe ratio and other metrics wrt tan portfolio weights
w_tan_df = tan_weights(rets)
print("Tangency Portfolio Weights: ")
display(w_tan_df.sort_values(by='Tangency Portfolio Weights', ascending=False))
print()
display(metrics['Annualized Sharpe Ratio'].sort_values(ascending=False))

Tangency Portfolio Weights: 


Unnamed: 0,Tangency Portfolio Weights
BRK-B,3.084359
LLY,1.202377
NVDA,1.067801
MSFT,0.859136
AVGO,0.64562
AMZN,0.536354
AAPL,0.477018
META,0.385017
TSLA,0.337308
GOOGL,0.298598





Unnamed: 0,Annualized Sharpe Ratio
NVDA,1.415184
MSFT,1.057631
AVGO,1.019597
AMZN,0.985051
LLY,0.948195
AAPL,0.826057
TSLA,0.788407
GOOGL,0.787376
META,0.779574
SPY,0.694271


### 2.3.

Report the following performance statistics of the portfolio achieved with the optimized weights calculated above.
* mean
* volatility
* Sharpe

(Annualize all three statistics.)

In [150]:
w_tan_returns = rets @ w_tan_df
tan_summary = performance_summary(w_tan_returns,52)
display(tan_summary)

Unnamed: 0,Annualized Mean,Annualized Volatility,Annualized Sharpe Ratio
Tangency Portfolio Weights,1.545781,0.723538,2.136419


### 2.4.

Consider the biggest positive weight (long) and most negative weight (short).

Do they align with the most extreme Sharpe ratios? Explain.

The biggest positive weight belongs to BRK-B and the most negative weight belongs to SPY. These do not align with the most extreme Sharpe ratios, which belong to NVDA (largest) and BRK-B (smallest): the asset with the lowest Sharpe ratio receives the largest long position. An asset's individual Sharpe ratio does not determine its weight, because the tangency weights depend on how the assets work together in a portfolio. BRK-B is highly correlated with SPY (0.73), so a long BRK-B position offsets much of the risk of the short SPY position.
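
A stylized three-asset example makes the same point. The means, volatilities, and correlations below are invented for illustration and are unrelated to the exam data. The second asset has a respectable Sharpe ratio but is shorted, because it is highly correlated with a slightly better asset. The third asset has the lowest Sharpe ratio but earns a positive weight, because it is uncorrelated with the others.

```python
import numpy as np

# hypothetical annualized inputs (assumptions, not estimates from the data)
mu = np.array([0.10, 0.08, 0.03])    # mean excess returns
vol = np.array([0.20, 0.20, 0.20])   # volatilities
corr = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
Sigma = np.outer(vol, vol) * corr    # covariance matrix

raw = np.linalg.solve(Sigma, mu)     # Sigma^{-1} mu
w = raw / raw.sum()                  # tangency weights, scaled to sum to one
sharpe = mu / vol

for name, s, wt in zip(["A", "B", "C"], sharpe, w):
    print(f"asset {name}: Sharpe {s:.2f}, weight {wt:+.3f}")
```

The optimizer rewards an asset for what it adds to the portfolio, so correlations can matter more than the standalone Sharpe ratio.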

### 2.5.

Try dropping `SPY` from the set of assets.

Re-run the optimization and report the new tangency weights.

In [151]:
new_ret = rets.drop(columns='SPY')
w_tan_df_new = tan_weights(new_ret)
print("New Tangency Portfolio Weights: ")
display(w_tan_df_new.sort_values(by='Tangency Portfolio Weights', ascending=False))

New Tangency Portfolio Weights: 


Unnamed: 0,Tangency Portfolio Weights
LLY,0.399535
NVDA,0.370675
AMZN,0.14516
AVGO,0.104653
BRK-B,0.047814
TSLA,0.038724
META,0.028621
MSFT,0.014582
GOOGL,-0.071598
AAPL,-0.078166


### 2.6.

Mark each of the statements as `True or False`. No justification is needed.

In our analysis of the multi-asset portfolio optimization, we found that a change in TIPS mean excess returns caused a large change in the...

* performance of the tangency portfolio.
* weights of the tangency portfolio.
* correlation structure of the assets.

Performance of the tangency portfolio: True

Weights of the tangency portfolio: True

Correlation structure of the assets: False

### 2.7. (10pts)

1. Briefly explain why the optimized portfolio is unrealistic in practice.

1. What does Harvard do to make the optimization more practical?

1. Why did Harvard optimize in levels (securities within an asset class, then all the broad asset classes) rather than directly optimizing all the securities?

1. Basic mean-variance optimization relies heavily on precise estimates of expected returns, variances, and covariances, which are hard to estimate accurately. It can produce highly concentrated allocations that are impractical due to factors like transaction costs, liquidity constraints, and estimation error.
2. Harvard makes the optimization more realistic by imposing practical constraints on asset class weights, such as minimum and maximum allocations. This constrains the optimization based on institutional guidelines, liquidity needs, and peer benchmarking, which makes it more implementable in the real world.
3. They did this to improve scalability and manage complexity. Optimizing all the securities at once would be infeasible in practice: with $n$ securities there are $n(n+1)/2$ distinct variances and covariances to estimate.

***

# 3. Expected Returns

### 3.1.

Consider the CAPM as tested with a single stock,

$$\mathbb{E}\left[r\right] = \beta\,  \mathbb{E}\left[x\right]$$

where
* $r$ denotes the excess return on `NVDA`.
* $x$ denotes the excess return on `SPY`, an ETF tracking the S&P 500.

Estimate the associated regression,
$$r_t = \alpha + \beta x_t + \epsilon_t$$

Report,
* $\alpha$
* $\beta$
* r-squared

Annualize alpha.

In [152]:
def calc_iter_regr(y, X, one_to_many, adj=52):
    """
    Iterative regression for checking one X column against many different y columns,
    or vice versa. "one_to_many=True" means that we are checking one X column against many
    y columns, and "one_to_many=False" means that we are checking many X columns against a
    single y column.

    Args:
        y : Target variable(s)
        X : Independent variable(s)
        one_to_many (bool): Which way to run the regression (required).
        adj (int, optional): Annualization factor. Currently unused, since
            univariate_regression annualizes with 52 internally.

    Returns:
        DataFrame : Summary of regression results.
    """
    if one_to_many:
        summary = pd.concat(
            [univariate_regression(y[[col]], X) for col in y.columns], axis=0
        )
        summary.index = y.columns
        return summary
    else:
        summary = pd.concat(
            [univariate_regression(y.to_frame(), X[col]) for col in X.columns], axis=0
        )
        summary.index = X.columns
        return summary

capm_regr = calc_iter_regr(rets[['NVDA']], rets[['SPY']], one_to_many=True)
"""
I realized last minute that I should've used uni as my original answer
regressed on all rets. I just changed the inputs as a quick fix.
"""
display(capm_regr)

Unnamed: 0,Alpha,Beta,R-Squared,Information Ratio
NVDA,0.449431,1.71686,0.405526,1.262048


### 3.2.

What evidence is there that this (simplistic) factor pricing model does not price `NVDA` correctly?

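The annualized alpha is very large (about 44.9%). Under the CAPM the intercept should be zero, so the model leaves most of NVDA’s average excess return unexplained and understates its expected return.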

### 3.3.

What is the (annualized) average excess return of `NVDA`?

According to the CAPM, what level of average excess return is explained by the factor risk?

In [153]:
avg_spy_annual = rets['SPY'].mean() * 52
beta = capm_regr.loc['NVDA', 'Beta']
avg_nvda_annual = rets['NVDA'].mean() * 52  # sample average excess return, annualized
explained_return = beta * avg_spy_annual  # CAPM-implied premium: beta times the market premium
print(f"Average excess return of NVDA: {avg_nvda_annual:.2%}")
print(f"Return explained by CAPM factor risk: {explained_return:.2%}")

Average excess return of NVDA: 65.36%
Return explained by CAPM factor risk: 20.42%
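
The split between the explained return and alpha is an exact in-sample identity for OLS with an intercept: the average of the dependent variable equals alpha plus beta times the average of the regressor. A minimal sketch on synthetic data (assumed parameters, not the exam data):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 520
x = rng.normal(0.002, 0.02, n)                   # synthetic weekly market excess returns
r = 0.005 + 1.7 * x + rng.normal(0.0, 0.05, n)   # synthetic weekly stock excess returns

beta, alpha = np.polyfit(x, r, 1)                # returns the slope first, then the intercept

explained = beta * x.mean() * 52                 # part attributed to factor risk
unexplained = alpha * 52                         # annualized alpha
total = r.mean() * 52                            # annualized average excess return

print(f"explained {explained:.4f} + alpha {unexplained:.4f} = {explained + unexplained:.4f}")
print(f"average excess return: {total:.4f}")
```

The figures reported above satisfy the same identity: 20.42% explained plus an alpha of 44.94% gives the 65.36% average excess return.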


### 3.4.

Now let's disregard the equilibrium pricing model above, and try calculating the expected return of `NVDA` via forecasting signals. To keep things simple, just use a forecasting regression (rather than a neural network, regression tree, etc.)
$$r_{t+1} = \alpha + \beta x_t + \epsilon_{t+1}$$

We estimate a forecasting regression of `NVDA` on `SPY`.


From this **forecasting** regression, report
* $\alpha$
* $\beta$
* r-squared

In [154]:
rets = rets.rename(columns={'SPY': 'r_SPY', 'NVDA': 'r_NVDA'})
rets['r_NVDA_t+1'] = rets['r_NVDA'].shift(-1)
df = rets.dropna()

X = sm.add_constant(df['r_SPY'])
y = df['r_NVDA_t+1']
model = sm.OLS(y, X)
results = model.fit()

results_df = pd.DataFrame({
    'Alpha': [results.params['const']],
    'Beta': [results.params.iloc[1]],
    'R-squared': [results.rsquared]
})
display(results_df)

Unnamed: 0,Alpha,Beta,R-squared
0,0.013119,-0.227943,0.007127


### 3.5.

Given the stats in `3.4.`, what do you think of this forecast for `NVDA`? Be specific.

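The R-squared of about 0.7% indicates that this model has almost no predictive power. The beta is negative (-0.23), but with such a low R-squared, lagged SPY returns are a poor and potentially misleading predictor of NVDA’s next-week return. The intercept (about 1.3% per week) is close to NVDA’s average weekly excess return (0.6536 / 52, about 1.26%), so the forecast is essentially the historical mean with a small adjustment.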
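
The timing in a forecasting regression is easy to get wrong, so here is a minimal sketch of the alignment on synthetic data. The series are simulated so that next week's return is an exact linear function of this week's signal (an assumption made only so that the recovered coefficients are known), and `np.polyfit` stands in for `statsmodels`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
x = rng.normal(0.0, 0.02, n)          # synthetic signal observed at time t
r = np.empty(n)
r[0] = 0.0
r[1:] = 0.01 + 0.5 * x[:-1]           # return at t+1 depends on the signal at t (noiseless)

df = pd.DataFrame({"x": x, "r": r})
df["r_next"] = df["r"].shift(-1)      # align the return at t+1 with the signal at t
est = df.dropna()                     # the last row has no next-period return yet

beta, alpha = np.polyfit(est["x"].to_numpy(), est["r_next"].to_numpy(), 1)

# the forecast for the next, unobserved period uses the latest signal,
# which sits in the row that was dropped from the estimation sample
latest_x = df["x"].iloc[-1]
forecast = alpha + beta * latest_x
print(f"alpha {alpha:.4f}, beta {beta:.4f}, forecast {forecast:.4f}")
```

The estimation sample has one fewer row than the data, and the out-of-sample forecast is built from the signal in the final row of the full data set.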

### 3.6.

According to the signal forecasting model, what is the expected return (annualized) of `NVDA` in `July 2025`?

In [155]:
alpha = results_df.loc[0, 'Alpha']
beta = results_df.loc[0, 'Beta']
spy_return_july2025 = df.loc[:'2025-06-30', 'r_SPY'].iloc[-1]

nvda_weekly_forecast = alpha + beta * spy_return_july2025
nvda_annualized_forecast = (1 + nvda_weekly_forecast) ** 52 - 1

print(f"Expected weekly return of NVDA: {nvda_weekly_forecast:.4f}")
print(f"Annualized expected return of NVDA: {nvda_annualized_forecast:.2%}")

Expected weekly return of NVDA: 0.0143
Annualized expected return of NVDA: 109.49%


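I acknowledge that these numbers look unrealistically high. One likely reason is the annualization: the code compounds the weekly forecast, $(1 + r)^{52} - 1$, whereas scaling by 52 as in the rest of this exam gives roughly 74%. Either way, the forecast rests on a regression with an R-squared below 1%, and I did not have enough time to diagnose it further.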
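
A quick comparison shows how much the annualization convention matters at this magnitude. The weekly figure below is the rounded value printed above, so the results are approximate.

```python
weekly_forecast = 0.0143   # rounded weekly forecast taken from the printed output
periods = 52               # weeks per year

scaled = weekly_forecast * periods                  # arithmetic scaling, as used for means in this exam
compounded = (1 + weekly_forecast) ** periods - 1   # geometric compounding

print(f"scaled by 52: {scaled:.2%}")
print(f"compounded:   {compounded:.2%}")
```

Compounding always gives the larger number for a positive weekly return, and the gap widens as the weekly return grows.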

### 3.7.

Why is it important to create style factors which go **long** and **short** the targeted style?

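It is important because this approach refines factor signals by isolating the style’s unique impact on returns: going long the targeted style and short its opposite offsets much of the broad market exposure that both legs share. This makes the factors more precise for portfolio construction and performance attribution.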
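
A minimal simulation illustrates the point. Everything below is synthetic: 30 hypothetical stocks share a market component and load on a style component in proportion to a made-up `score`. The long-only leg remains highly correlated with the market, while the long-short factor is close to market-neutral.

```python
import numpy as np

rng = np.random.default_rng(7)
T, N = 500, 30
market = rng.normal(0.0, 0.02, T)          # common market return
style = rng.normal(0.0, 0.01, T)           # return to the style itself
score = np.linspace(-1.0, 1.0, N)          # each stock's exposure to the style
noise = rng.normal(0.0, 0.01, (T, N))      # idiosyncratic returns

# every stock has a market beta of one plus its style exposure
returns = market[:, None] + style[:, None] * score[None, :] + noise

order = np.argsort(score)
short_leg = returns[:, order[:10]].mean(axis=1)    # ten lowest scores
long_leg = returns[:, order[-10:]].mean(axis=1)    # ten highest scores
long_short = long_leg - short_leg                  # the style factor

corr_long_only = np.corrcoef(long_leg, market)[0, 1]
corr_long_short = np.corrcoef(long_short, market)[0, 1]
print(f"long-only leg vs market:  {corr_long_only:.2f}")
print(f"long-short factor vs market: {corr_long_short:.2f}")
```

In real data the two legs rarely have identical market betas, so the cancellation is only approximate.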

***