# ECO462: Homework 4

```python
# Import basic Python libraries
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.api as sm
```

## Question 1: Properties of Fama-French’s Five Factors

```python
# Question 1a Coding Portion

df_factors = pd.read_excel("../data/HW4 Data.xlsx", sheet_name="Factors")
df_factors["Date"] = pd.to_datetime(df_factors["Date"])
df_factors = df_factors.set_index("Date")
print("\n================== Summary Statistics ==================\n")
print(df_factors.describe())
print("\n================== Correlation Matrix ==================\n")
print(df_factors.corr())
```

```
           Mkt-RF         SMB         HML         RMW         CMA          RF
count  733.000000  733.000000  733.000000  733.000000  733.000000  733.000000
mean     0.579468    0.204843    0.287285    0.288595    0.260628    0.003635
std      4.486843    3.048178    3.000065    2.219564    2.075952    0.002644
min    -23.240000  -15.320000  -13.880000  -18.650000   -7.200000    0.000000
25%     -1.970000   -1.560000   -1.400000   -0.780000   -1.020000    0.001400
50%      0.930000    0.070000    0.220000    0.260000    0.090000    0.003800
75%      3.440000    2.010000    1.750000    1.320000    1.470000    0.005000
max     16.100000   18.280000   12.800000   13.070000    9.070000    0.013500


          Mkt-RF       SMB       HML       RMW       CMA        RF
Mkt-RF  1.000000  0.279300 -0.204133 -0.186581 -0.361344 -0.083120
SMB     0.279300  1.000000  0.001820 -0.352176 -0.085131 -0.035799
HML    -0.204133  0.001820  1.000000  0.083357  0.685399  0.066428
RMW    -0.186581 -0.352176
```

### Question 1b Written Portion

**Question:** *Let’s focus on the High-Minus-Low (HML) factor for a moment. How would you
construct a mimicking portfolio that replicates it? Why would you expect this
portfolio to move one-to-one with HML and not with the other factors?*

**Answer**: A mimicking portfolio for the High-Minus-Low (HML) factor, as defined in the Fama-French factor framework, takes long positions in high book-to-market stocks and short positions in low book-to-market stocks. In Fama and French's construction, stocks are sorted independently into two size groups (split at the NYSE median market equity) and three book-to-market groups (split at the 30th and 70th NYSE percentiles), producing six value-weighted portfolios. HML is then the average return of the two high book-to-market ("value") portfolios minus the average return of the two low book-to-market ("growth") portfolios, so each leg weights its small and big portfolios equally and the factor is designed to be largely neutral to size.
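
Fama and French build HML from an independent 2x3 sort on size and book-to-market. As a rough illustration, the sketch below computes one month's HML return from a single cross-section of stocks. It is a minimal sketch under stated assumptions, not Fama and French's production code: the function `hml_return` and the column names `me` (market equity), `bm` (book-to-market), and `ret` (holding-month return) are hypothetical, the breakpoints come from the sample itself rather than from NYSE stocks only, and the demo data are simulated so that value stocks earn 2% and every other stock earns 0%.

```python
import numpy as np
import pandas as pd


def hml_return(df: pd.DataFrame) -> float:
    """Sketch of one month's HML return from a 2x3 size / book-to-market sort.

    Assumes hypothetical columns 'me' (market equity at formation),
    'bm' (book-to-market) and 'ret' (return over the holding month).
    Breakpoints come from the sample itself; Fama and French use NYSE
    breakpoints instead.
    """
    size_bp = df["me"].median()
    bm_lo, bm_hi = df["bm"].quantile([0.3, 0.7]).to_numpy()

    small = df["me"] <= size_bp
    growth = df["bm"] <= bm_lo  # roughly the bottom 30% of book-to-market
    value = df["bm"] >= bm_hi   # roughly the top 30% of book-to-market

    def vw(mask: pd.Series) -> float:
        # Value-weighted return of the stocks selected by the mask
        sub = df[mask]
        return float(np.average(sub["ret"], weights=sub["me"]))

    # Average the small and big portfolios within each leg, then take the spread
    value_leg = 0.5 * (vw(small & value) + vw(~small & value))
    growth_leg = 0.5 * (vw(small & growth) + vw(~small & growth))
    return value_leg - growth_leg


# Hypothetical simulated cross-section: value stocks earn 2%, all others 0%
rng = np.random.default_rng(0)
n = 600
demo = pd.DataFrame({"me": rng.lognormal(size=n), "bm": rng.uniform(0.1, 2.0, size=n)})
demo["ret"] = np.where(demo["bm"] >= demo["bm"].quantile(0.7), 0.02, 0.0)
print(round(hml_return(demo), 4))
```

With this simulated data both value portfolios return 2% and both growth portfolios return 0%, so the printed HML is 0.02; with real data the inputs would be each stock's market equity, book-to-market, and realized return for the month.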

In the broader factor-modeling framework, mimicking portfolios are designed to isolate exposure to a single systematic factor. In the first-pass (time-series) regression, each asset's factor loadings (betas) are estimated by regressing its excess returns on the factor returns. With $n$ assets and $k$ factors, this yields a loading matrix $B \in \mathbb{R}^{n\times k}$. The goal is to find portfolio weights $w \in \mathbb{R}^{n}$ such that:
$$B^{\intercal}w = e_j$$
where $e_j \in \mathbb{R}^{k}$ is the unit vector corresponding to the HML factor (or, more generally, to any target factor at index $j$), so the portfolio has a beta of 1 with respect to HML and 0 with respect to all other factors.

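Given that $n>k$ and the columns of $B$ are linearly independent, $B^{\intercal}$ has full row rank $k$, so the system has a solution (in fact infinitely many, since there are more unknowns than equations). Writing the assets' excess returns as $r = \alpha + Bf + \varepsilon$, the portfolio return is $w^{\intercal}r = w^{\intercal}\alpha + f_j + w^{\intercal}\varepsilon$: it moves one-for-one with HML and has zero loading on every other factor, while the idiosyncratic term $w^{\intercal}\varepsilon$ can be made small by diversifying across many assets. Because the portfolio explicitly neutralizes its exposure to all non-HML factors, its systematic variation comes solely from HML. It can still be correlated with another factor, but only to the extent that HML itself is (for example, HML and CMA have a sample correlation of about 0.69); it carries no direct exposure to that factor.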
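
A small numerical check of the system $B^{\intercal}w = e_j$ may help. The sketch below assumes a hypothetical loading matrix drawn at random (10 assets, 3 factors) rather than estimated loadings, picks the minimum-norm solution through the Moore-Penrose pseudoinverse (one choice among the infinitely many solutions when $n>k$), and then simulates returns $r_t = Bf_t + \varepsilon_t$ to confirm that regressing the portfolio's return on the factors recovers loadings close to $e_j$.

```python
import numpy as np

rng = np.random.default_rng(42)
n_assets, n_factors, j = 10, 3, 2  # hypothetical sizes; j indexes the target factor

# Hypothetical first-pass loading matrix B (n_assets x n_factors)
B = rng.normal(size=(n_assets, n_factors))
e_j = np.zeros(n_factors)
e_j[j] = 1.0

# Minimum-norm weights solving B^T w = e_j via the Moore-Penrose pseudoinverse
w = np.linalg.pinv(B.T) @ e_j

# Simulate excess returns r_t = B f_t + eps_t (no alpha) for T periods
T = 600
F = rng.normal(size=(T, n_factors))
eps = 0.01 * rng.normal(size=(T, n_assets))
R = F @ B.T + eps
r_p = R @ w  # mimicking-portfolio return in each period

# Regress the portfolio return on the factors (no intercept needed here)
coef, *_ = np.linalg.lstsq(F, r_p, rcond=None)

print("B^T w    =", np.round(B.T @ w, 6))
print("loadings =", np.round(coef, 3))
```

The pseudoinverse is just a convenient way to pick one solution; any other $w$ satisfying the constraints (for example, one that also imposes a budget or minimizes residual variance) would have the same factor loadings.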

```python
# Question 1c Coding Portion

# One-sample t-test of H0: mean(HML) = 0 (SciPy's default alternative is two-sided)
t_statistic, p_value = stats.ttest_1samp(df_factors["HML"], 0)
print("\n================== T-Test Results ==================")
print(f"t-statistic: {t_statistic:.2f}\np-value: {p_value:.2f}")
```

```
t-statistic: 2.59
p-value: 0.01
```


### Question 1c Written Portion

**Question:** *Based on the moments of the HML factor, run a simple test to determine whether high book-to-market firms have, on average, higher returns than low book-to-market firms during the sample period. Hint: How can you test whether the HML factor’s mean is statistically different from zero?*

To test whether high book-to-market firms earn higher returns on average than low book-to-market firms, we perform a one-sample t-test on the mean of the HML factor. The null hypothesis is that the HML factor's mean equals 0 ($H_0: \mu _{HML} = 0$), implying no difference in average returns between high and low book-to-market firms. The alternative hypothesis is that high book-to-market firms earn higher average returns ($H_A:\mu _{HML} > 0$).

Using a significance level of 5% ($\alpha = 0.05$), the t-test compares the sample mean of the HML factor to zero. By default, `stats.ttest_1samp` reports a two-sided p-value, here **0.01**; because the t-statistic of 2.59 is positive, the one-sided p-value for $H_A:\mu_{HML} > 0$ is half of that, roughly 0.005. Both are below the significance threshold, providing strong evidence to reject the null hypothesis in favor of the alternative. The positive t-statistic further indicates that the mean of the HML factor is greater than 0.

Therefore, we can conclude that during the sample period, high book-to-market firms earned significantly higher average returns than low book-to-market firms.
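
The t-statistic can be reproduced by hand from the Question 1a summary statistics for HML (mean 0.287285, standard deviation 3.000065, 733 observations). The sketch below does this and contrasts the two-sided p-value that `stats.ttest_1samp` reports by default with the one-sided p-value that matches $H_A:\mu_{HML} > 0$. With the raw series, SciPy 1.6 and later can also return the one-sided value directly via `stats.ttest_1samp(df_factors["HML"], 0, alternative="greater")`.

```python
import math

from scipy import stats

# HML moments taken from the Question 1a summary statistics
mean_hml, sd_hml, n_obs = 0.287285, 3.000065, 733

# One-sample t-statistic: sample mean divided by its standard error
t_stat = mean_hml / (sd_hml / math.sqrt(n_obs))

# Two-sided p-value (the ttest_1samp default) for H_A: mu != 0
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n_obs - 1)
# One-sided p-value for H_A: mu > 0
p_one_sided = stats.t.sf(t_stat, df=n_obs - 1)

print(f"t = {t_stat:.2f}")
print(f"two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```

Because the t-statistic is positive, the one-sided p-value is exactly half of the two-sided one, so both lead to rejecting $H_0$ at the 5% level.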

## Question 2: Regression Using Fama-French's Three Factors

```python
# Question 2a Coding Portion

factors = df_factors.columns.tolist()  # all factor columns, including RF
ff3_factors = ["Mkt-RF", "SMB", "HML"]  # Fama-French three factors for this question

df_stock_returns = pd.read_excel("../data/HW4 Data.xlsx", sheet_name="Stock Returns")
df_stock_returns["DATE"] = pd.to_datetime(df_stock_returns["DATE"])
df_stock_returns = df_stock_returns.set_index("DATE")

tickers = df_stock_returns.columns.tolist()


def factor_model(t, factor_list):
    """Build a DataFrame with stock t's excess return and the requested factors."""
    df_asset = pd.DataFrame(index=df_stock_returns.index)
    # Excess return = raw return minus the risk-free rate (aligned on date)
    df_asset[t + "_Excess_Returns"] = df_stock_returns[t] - df_factors["RF"]
    for factor in factor_list:
        df_asset[t + "_" + factor] = df_factors[factor]
    return df_asset


rows = []
for t in tickers:
    df_stock = factor_model(t, ff3_factors)

    # First-pass time-series regression of excess returns on the factors
    X = sm.add_constant(df_stock.iloc[:, 1:])
    y = df_stock[t + "_Excess_Returns"]
    model = sm.OLS(y, X).fit()

    row = {
        "Stock": t,
        "Alpha": model.params["const"],
        "p-value_Alpha": round(model.pvalues["const"], 5),
    }
    for factor, label in zip(ff3_factors, ["MKT", "SMB", "HML"]):
        name = t + "_" + factor
        row["Beta_" + label] = model.params[name]
        # t-statistic and p-value for H0: loading = 0
        row["t-Beta_" + label] = round(model.tvalues[name], 5)
        row["p_Beta_" + label] = round(model.pvalues[name], 5)
    row["R²"] = model.rsquared
    rows.append(row)

df_results = pd.DataFrame(rows).set_index("Stock")
print(df_results)
```

```
          Alpha  p-value_Alpha  Beta_MKT  t-Beta_MKT  p_Beta_MKT  Beta_SMB  \
Stock                                                                        
AEP    0.000483        0.79217  0.005847    13.51708         0.0 -0.003207   
KMB    0.000436        0.83002  0.007251    15.14206         0.0 -0.003790   
BMY    0.003823        0.08370  0.008349    16.02675         0.0 -0.003609   
XOM    0.001423        0.38664  0.007662    19.76416         0.0 -0.002654   
IBM    0.001007        0.63876  0.008973    17.73273         0.0 -0.001787   

       t-Beta_SMB  p_Beta_SMB  Beta_HML  t-Beta_HML  p_Beta_HML        R²  
Stock                                                                      
AEP      -5.16880     0.00000  0.005363     8.42684     0.00000  0.237145  
KMB      -5.51796     0.00000  0.002353     3.34014     0.00088  0.248561  
BMY      -4.82991     0.00000 -0.000986    -1.28627     0.19878  0.286456  
XOM      -4.77326     0.00000  0.003746     6.56787     0.00000  0.362594
```

### Question 2a Written Portion

**Question:** Report your estimates for all factor loadings from the first-pass regression.
Before running the regressions, you need to construct excess returns on each of
the five stocks. Interpret the results. Are the factor loadings estimated precisely?

To assess the precision of the factor loadings estimated in the first-pass Fama-French three-factor regressions, we test each coefficient ($b_i, s_i, h_i$) for each stock using its OLS t-statistic. Formally, the null hypothesis for each factor loading is:
$$H_0:\beta _{factor}=0$$
and the alternative hypothesis is:
$$H_A:\beta _{factor} \neq 0$$
This test evaluates whether the estimated factor loading is statistically distinguishable from zero at a conventional significance level. In the context of factor models, a rejection of the null indicates that the factor contributes meaningfully to explaining the excess return of the stock.

Using the same significance level of $\alpha = 0.05$, the estimated loadings on the market factor (*Beta_MKT*) and the size factor (*Beta_SMB*) are statistically significant for all five stocks, with p-values below 0.05, confirming that these factor exposures are estimated with high precision. For the value factor (*Beta_HML*), however, the loading is not statistically significant for BMY ($p=0.19878$), suggesting that its exposure to HML cannot be distinguished from zero and is estimated less precisely. Because a single asset does not determine the validity of the factor model as a whole, and the HML loadings for the other four stocks have p-values below 0.05, the factor loadings can still be considered reasonably precise overall.

The signs of the factor loadings are also informative, since several stocks have negative exposure to some factors. All five SMB loadings are negative (with negative t-statistics), meaning these stocks tend to move opposite to the small-minus-big factor, much like a short position in SMB, which is typical of large-capitalization stocks. The HML loading is also negative for some stocks, such as BMY, indicating a short-like exposure to the value factor, although BMY's estimate is not significant. By contrast, every stock has a positive market loading, i.e., a long exposure to the market.

## Question 3: Regression Using Fama-French's Five Factors

***Statement of Collaboration (including ChatGPT)***: I collaborated with **Rosalia Mwidege** and **Theodore Ouyang**. Additionally, **ChatGPT** was used to debug any error-prone code and find the proper Excel-equivalent Python functions and libraries to properly execute the solutions to the problems, in conjunction with the hints listed on the problem set.

***Honor Code***: This assignment represents my own work in accordance with University regulations and class policy.