# <font face="garamond" size="18" color="#122DAC">***Replication Report on "A five-factor asset pricing model" (Fama & French, 2015)***</font>
## <font face="garamond" size="6" color="#122DAC">*Econ 430*</font> 
### <font face="garamond" size="6" color="#122DAC">*Jacob Williams, Josh Kentworthy, and Ignacio Ramirez*</font>

## <font face="garamond" size="6" color="black">*University of California, Los Angeles*</font>
## <font face="garamond" size="6" color="black">*November 20, 2025*</font>

### This notebook replicates the key findings of the Fama and French (2015) paper, "A five-factor asset pricing model." We will download the authors' data, replicate their main asset pricing tests, and discuss the results.

### **Original Paper:** Fama, E. F., & French, K. R. (2015). A five-factor asset pricing model. *Journal of Financial Economics, 116*(1), 1-22.

---

# Primary Goal - Provide a replication report that includes:
Original paper citation, research question summary, data description, replication methodology, your results compared to original findings, critical assessment, robustness checks, discussion of agreement/disagreement with authors’ conclusions, and Python source code. Attach a PDF copy of the original paper as an appendix. 


# Necessary Components

## 1. Original Study Summary

### (a) Research Question
The paper's main research question is: **Does a five-factor asset pricing model, which adds profitability (RMW) and investment (CMA) factors to the Fama and French (1993) three-factor model, provide a superior description of average stock returns?**

The economic motivation stems from evidence that the three-factor model incompletely captures variation in average returns related to profitability and investment.

### (b) Theoretical Framework
The analysis is motivated by the **dividend discount model** and the valuation theory of Miller and Modigliani (1961).

1. The authors start with the dividend discount model (Eq. 1), which states a stock's price is the discounted value of expected future dividends.
2. Through manipulation (Eq. 3), they show that a firm's market value ($M_t$) divided by its book value ($B_t$) is related to expected future earnings (proxy for **profitability**) and expected growth in book equity (proxy for **investment**).
3.  This valuation equation (Eq. 3) implies that:
    * Higher $B_t/M_t$ (Value) implies a higher expected return.
    * Higher expected earnings (Robust Profitability) implies a higher expected return.
    * Higher expected investment (Aggressive Investment) implies a lower expected return.
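
For reference, the three steps above can be written out explicitly. This is a transcription of the paper's Eqs. (1) to (3) in its notation and should be checked against the original: $m_t$ is the share price, $d_{t+\tau}$ the dividend per share, $r$ (approximately) the long-term average expected stock return, $M_t$ total market value, $Y_{t+\tau}$ total equity earnings, and $dB_{t+\tau} = B_{t+\tau} - B_{t+\tau-1}$ the change in total book equity.

$$m_t = \sum_{\tau=1}^{\infty} \frac{E(d_{t+\tau})}{(1+r)^{\tau}}$$

$$M_t = \sum_{\tau=1}^{\infty} \frac{E(Y_{t+\tau} - dB_{t+\tau})}{(1+r)^{\tau}}$$

$$\frac{M_t}{B_t} = \frac{\sum_{\tau=1}^{\infty} E(Y_{t+\tau} - dB_{t+\tau})/(1+r)^{\tau}}{B_t}$$

Each comparative static holds the other terms fixed. A lower $M_t$ (higher $B_t/M_t$) requires a higher $r$. Higher expected earnings $Y_{t+\tau}$ with $M_t$, $B_t$, and $dB_{t+\tau}$ fixed also require a higher $r$. Higher expected growth in book equity $dB_{t+\tau}$ lowers the numerator, so $r$ must be lower.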

### (c) Methodology
The primary econometric approach is a **time-series regression**.

* **Model Specification (Five-Factor):**
    $R_{it}-R_{Ft}=a_{i}+b_{i}(R_{Mt}-R_{Ft})+s_{i}SMB_{t}+h_{i}HML_{t}+r_{i}RMW_{t}+c_{i}CMA_{t}+e_{it}$


* **Estimation & Testing:**
    * The models are estimated by regressing the excess returns of test portfolios (LHS) on the factor returns (RHS).
    * The main evaluation tool is the **Gibbons, Ross, and Shanken (1989) GRS statistic**, which tests the null hypothesis that the intercepts ($a_i$) for all portfolios in a set are jointly equal to zero.
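
A minimal sketch of the estimation and the GRS test follows, on synthetic data rather than the paper's portfolios. It assumes the common maximum-likelihood form of the statistic, $\frac{T-N-L}{N}\,\frac{\hat a'\hat\Sigma^{-1}\hat a}{1+\bar\mu'\hat\Omega^{-1}\bar\mu} \sim F(N,\,T-N-L)$, where $N$ is the number of test portfolios and $L$ the number of factors. The function name `grs_test` and all simulated inputs are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def grs_test(excess_returns, factors):
    """GRS test that all time-series intercepts are jointly zero.

    excess_returns: (T, N) array of portfolio excess returns (LHS).
    factors:        (T, L) array of factor returns (RHS).
    Returns (alphas, grs_statistic, p_value).
    """
    R = np.asarray(excess_returns, dtype=float)
    F = np.asarray(factors, dtype=float)
    T, N = R.shape
    L = F.shape[1]

    # OLS with identical regressors in every equation: one least-squares
    # solve gives the same coefficients as N separate regressions.
    X = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    alphas = coef[0]
    resid = R - X @ coef

    sigma = resid.T @ resid / T          # ML residual covariance (N x N)
    mu = F.mean(axis=0)                  # factor means
    centered = F - mu
    omega = centered.T @ centered / T    # ML factor covariance (L x L)

    quad_alpha = alphas @ np.linalg.solve(sigma, alphas)
    quad_mu = mu @ np.linalg.solve(omega, mu)
    grs = (T - N - L) / N * quad_alpha / (1.0 + quad_mu)
    p_value = stats.f.sf(grs, N, T - N - L)
    return alphas, grs, p_value


# Synthetic example (NOT real data): 25 portfolios, 5 factors, true alphas of zero.
rng = np.random.default_rng(0)
T, N, L = 600, 25, 5
factors = rng.normal(0.3, 2.5, size=(T, L))
betas = rng.normal(0.5, 0.3, size=(L, N))
excess = factors @ betas + rng.normal(0.0, 1.5, size=(T, N))

alphas, grs, p_value = grs_test(excess, factors)
print(f"GRS = {grs:.3f}, p-value = {p_value:.3f}")
```

With real data, `excess_returns` would hold the test-portfolio excess returns and `factors` the five factor columns over the same months.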

### (d) Key Variables
* **Dependent Variables (LHS):** Monthly excess returns ($R_{it}-R_{Ft}$) on portfolios formed from sorts on Size, $B/M$, operating profitability (OP), and Investment.
  
* **Independent Variables (RHS Factors):**
    * **$R_M-R_F$**: The market excess return.


    * **SMB (Small Minus Big):** The size factor.

        * What it represents: This factor captures the size premium, which is the historical observation that small-company stocks (small market capitalization) tend to have higher average returns than large-company stocks.
     
        * How it's built:
            * Firms are sorted based on their market capitalization (Size).
            * The factor buys (goes long) a diversified portfolio of small stocks.
            * It sells (goes short) a diversified portfolio of big stocks.

        * What it means:
            * A positive SMB return means small-cap stocks outperformed large-cap stocks for that period.
            * A portfolio with a positive SMB beta is tilted toward small-cap stocks.


    * **HML (High Minus Low):** The value factor.

        * What it represents: This factor captures the value premium. This is the historical observation that "value" stocks (which appear cheap) tend to outperform "growth" stocks (which appear expensive).
          
        * How it's built:
            * Firms are sorted based on their Book-to-Market (B/M) ratio (the accounting value of the company divided by its stock market value).
            * A High B/M ratio means the stock is "cheap" (a value stock).
            * A Low B/M ratio means the stock is "expensive" (a growth stock).
            * The factor buys (goes long) a portfolio of High B/M (value) stocks.
            * It sells (goes short) a portfolio of Low B/M (growth) stocks.

        * What it means:
            * A positive HML return means value stocks outperformed growth stocks for that period.
            * A portfolio with a positive HML beta is tilted toward value stocks. A negative beta means it's tilted toward growth stocks.


    * **RMW (Robust Minus Weak):** The profitability factor.
 
        * What it represents: This factor captures the profitability premium. This is the observation that firms with high (robust) profitability tend to have higher average returns than firms with low (weak) profitability.
          
        * How it's built:
     
            * Firms are sorted based on their Operating Profitability (OP), which the paper measures as annual revenues minus cost of goods sold, interest expense, and selling, general, and administrative expenses, all divided by book equity.
            * The factor buys (goes long) a portfolio of Robust Profitability stocks.
            * It sells (goes short) a portfolio of Weak Profitability stocks.
              
        * What it means:
            * A positive RMW return means high-profitability firms outperformed low-profitability firms.
            * A portfolio with a positive RMW beta is tilted toward "quality" or high-profitability firms.

     
    * **CMA (Conservative Minus Aggressive):** The investment factor.

        * What it represents: This factor captures the investment premium. This is based on the theory and observation that companies that invest conservatively (grow their assets slowly) tend to have higher returns than firms that invest aggressively (grow their assets very quickly).
 
        * How it's built:
            * Firms are sorted based on their investment rate, which the paper measures as the year-over-year growth in total assets.
            * The factor buys (goes long) a portfolio of low-investment (Conservative) firms.
            * It sells (goes short) a portfolio of high-investment (Aggressive) firms.
              
        * What it means:
            * A positive CMA return means low-investment firms outperformed high-investment firms.
            * A portfolio with a positive CMA beta is tilted toward firms that are more conservative with their asset growth.
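
The long-short descriptions above are simplified. In the paper's main (2x3) version, stocks are split into two size groups and three groups on the second characteristic, and the factors are differences between averages of the resulting value-weighted portfolio returns. The sketch below illustrates that logic for a single month on synthetic data. The column names (`me`, `bm`, `ret`), the function `two_by_three`, and the use of all-stock breakpoints instead of NYSE breakpoints are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def two_by_three(stocks, char):
    """2x3 sort on size ('me') and `char` for one cross-section.

    stocks: DataFrame with columns 'me' (market cap), `char`, and 'ret'.
    Returns (smb_char, high_minus_low) from value-weighted portfolio returns.
    """
    df = stocks.copy()
    size_cut = df["me"].median()                 # simplification: all-stock median
    lo, hi = df[char].quantile([0.3, 0.7])       # 30th and 70th percentiles
    df["size_grp"] = np.where(df["me"] <= size_cut, "S", "B")
    df["char_grp"] = np.select([df[char] <= lo, df[char] >= hi], ["L", "H"], default="N")

    # Value-weighted return of each of the six portfolios.
    df["w_ret"] = df["ret"] * df["me"]
    sums = df.groupby(["size_grp", "char_grp"])[["w_ret", "me"]].sum()
    vw = sums["w_ret"] / sums["me"]

    # Size factor: average of the three small portfolios minus the three big ones.
    smb_char = vw.loc["S"].mean() - vw.loc["B"].mean()
    # Characteristic factor: average high minus average low across size groups.
    high = (vw.loc[("S", "H")] + vw.loc[("B", "H")]) / 2
    low = (vw.loc[("S", "L")] + vw.loc[("B", "L")]) / 2
    return smb_char, high - low


# Synthetic cross-section (NOT real data).
rng = np.random.default_rng(42)
n = 1000
stocks = pd.DataFrame({
    "me": rng.lognormal(mean=6.0, sigma=1.5, size=n),
    "bm": rng.lognormal(mean=-0.5, sigma=0.6, size=n),
    "ret": rng.normal(1.0, 8.0, size=n),
})
smb_bm, hml = two_by_three(stocks, "bm")
print(f"SMB (B/M sort) = {smb_bm:.2f}, HML = {hml:.2f}")
```

The same function applied to an OP column gives RMW as high minus low, while CMA is low minus high investment, so its sign is flipped. The paper's SMB averages the size factors from the B/M, OP, and Inv sorts.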

### (e) Main Findings
1.  **Superior Performance:** The five-factor model provides a better description of average returns than the three-factor model (Table 5).
2.  **Redundancy of HML:** A key finding is that the value factor, HML, becomes **redundant** for describing average returns. Its average return is fully captured by the other four factors (Table 6).
3.  **Main Failure:** The model's biggest problem is its **failure to capture the low average returns of small stocks** whose returns "behave like those of firms that invest a lot despite low profitability".

### (f) Contribution
The paper proposes and tests a new, five-factor model as a better benchmark for describing the cross-section of average stock returns, addressing the known shortcomings of the three-factor model related to profitability and investment.

## 2. Data Acquisition and Description

The following modules and libraries are required.

In [3]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.linear_model import OLS
import requests  # HTTP client; downloads the .zip files from the Fama-French data library URL
import zipfile  # Standard-library ZIP reader; opens the downloaded archive while it is still in memory so its files can be extracted
from io import BytesIO, TextIOWrapper  # BytesIO holds the downloaded bytes as an in-memory file; TextIOWrapper decodes the byte stream of a .txt file inside the archive into readable text

(a) Data Sources: Document where you obtained the data (journal website, author’s page, public repository, etc.)
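
One way to obtain the factors is sketched below, using the same `requests`, `zipfile`, and `io` tools imported above. The URL and the file layout (a few description lines, a header row that starts with a comma, monthly rows keyed by `YYYYMM`, then other tables after a blank line) are assumptions about the data library and should be verified against the downloaded file. `parse_monthly_block` and `download_ff5` are hypothetical helper names.

```python
import zipfile
from io import BytesIO, TextIOWrapper

import pandas as pd
import requests

# Assumed location of the monthly 2x3 five-factor file; verify on the data library page.
FF5_URL = (
    "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
    "F-F_Research_Data_5_Factors_2x3_CSV.zip"
)


def parse_monthly_block(zip_bytes):
    """Parse the first (monthly) table of a Fama-French style CSV stored in a zip."""
    with zipfile.ZipFile(BytesIO(zip_bytes)) as zf:
        name = zf.namelist()[0]
        with zf.open(name) as raw:
            lines = TextIOWrapper(raw, encoding="utf-8").read().splitlines()

    # The table header is the first line starting with a comma (empty date column).
    start = next(i for i, ln in enumerate(lines) if ln.startswith(","))
    columns = [c.strip() for c in lines[start].split(",")[1:]]

    dates, values = [], []
    for ln in lines[start + 1:]:
        parts = [p.strip() for p in ln.split(",")]
        # Monthly rows carry a six-digit YYYYMM date; stop at the first other line.
        if len(parts[0]) != 6 or not parts[0].isdigit():
            break
        dates.append(parts[0])
        values.append([float(p) for p in parts[1:]])

    index = pd.to_datetime(dates, format="%Y%m").to_period("M")
    return pd.DataFrame(values, index=index, columns=columns)


def download_ff5(url=FF5_URL):
    """Download the zip archive into memory and return the monthly factor table."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return parse_monthly_block(resp.content)


# Usage (requires network access):
# factors = download_ff5()
# print(factors.head())
```

Keeping the download and the parsing in separate functions lets the parser be checked on a small hand-made archive without network access.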

(b) Data Processing: Describe any cleaning, merging, or transformations needed to match original study

(c) Sample Description: Provide descriptive statistics for all variables used in the main specification

* Create comparison table: your sample statistics vs. original paper’s reported statistics
* Discuss any discrepancies in sample size, means, or distributions
* Explain potential reasons for differences (data updates, sample restrictions, etc.)
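
A possible shape for the comparison table is sketched here. The helper names and the `reported` values are placeholders, not numbers from the paper. The reported statistics have to be typed in from the original tables before the comparison means anything.

```python
import numpy as np
import pandas as pd


def describe_factors(factors):
    """Mean, standard deviation, t-statistic of the mean, and count per column."""
    n = factors.count()
    mean = factors.mean()
    std = factors.std(ddof=1)
    return pd.DataFrame({
        "mean": mean,
        "std": std,
        "t_mean": mean / (std / np.sqrt(n)),
        "n_obs": n,
    })


def compare_to_reported(ours, reported):
    """Side-by-side table: replicated statistics vs. statistics typed in from the paper."""
    out = ours[["mean", "std"]].join(
        reported[["mean", "std"]], lsuffix="_ours", rsuffix="_paper"
    )
    out["mean_diff"] = out["mean_ours"] - out["mean_paper"]
    out["std_diff"] = out["std_ours"] - out["std_paper"]
    return out


# Synthetic factor returns (NOT real data).
rng = np.random.default_rng(7)
factors = pd.DataFrame(
    rng.normal(0.3, 2.5, size=(600, 2)), columns=["SMB", "HML"]
)
ours = describe_factors(factors)

# Placeholder values only; replace with the statistics reported in the paper.
reported = pd.DataFrame(
    {"mean": [0.0, 0.0], "std": [1.0, 1.0]}, index=["SMB", "HML"]
)
print(compare_to_reported(ours, reported).round(3))
```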

(d) Variable Construction: Document how you constructed key variables, especially if transformations or combinations were needed
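
The main transformation in this replication is converting raw portfolio returns into excess returns, $R_{it}-R_{Ft}$. A minimal sketch, assuming the portfolio returns and the risk-free rate are both in percent per month and share a monthly index (`to_excess_returns` is a hypothetical helper name):

```python
import pandas as pd


def to_excess_returns(portfolio_returns, rf):
    """Subtract the risk-free rate from every portfolio column, aligned by date."""
    aligned_rf = rf.reindex(portfolio_returns.index)
    return portfolio_returns.sub(aligned_rf, axis=0)


# Tiny illustrative example (made-up numbers, NOT real data).
idx = pd.period_range("2000-01", periods=3, freq="M")
portfolios = pd.DataFrame({"P1": [1.0, 2.0, -1.0], "P2": [0.5, 0.0, 3.0]}, index=idx)
rf = pd.Series([0.4, 0.4, 0.5], index=idx, name="RF")

excess = to_excess_returns(portfolios, rf)
print(excess)
```

Reindexing before subtracting makes any month without a risk-free rate show up as missing rather than being silently misaligned.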

(e) Missing Data: Report extent of missing data and how you handled it (consistent with original paper’s approach)
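
A short sketch for reporting missing data follows. It assumes the downloaded files mark missing values with the sentinel codes -99.99 or -999, which should be confirmed in the header of each file. `MISSING_CODES` and `flag_missing` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumption: sentinel values for missing data, as stated in the data file header.
MISSING_CODES = (-99.99, -999)


def flag_missing(df, codes=MISSING_CODES):
    """Replace sentinel codes with NaN and report missing counts per column."""
    cleaned = df.replace(list(codes), np.nan)
    report = pd.DataFrame({
        "n_missing": cleaned.isna().sum(),
        "share_missing": cleaned.isna().mean(),
    })
    return cleaned, report


# Tiny illustrative example (made-up numbers, NOT real data).
raw = pd.DataFrame({"P1": [1.2, -99.99, 0.3, 2.1], "P2": [0.5, 0.7, -999.0, -0.4]})
cleaned, report = flag_missing(raw)
print(report)
```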

## 3. Replication Analysis

(a) Model Specification: Implement the authors’ main regression specification exactly as described.

(b) Estimation Results: Present your results in a professional table format comparable to the original paper’s tables.

(c) Comparison with Original: Create side-by-side comparison:

* Your coefficients vs. original paper’s coefficients
* Your standard errors vs. original standard errors
* Your $R^2$ and other fit statistics vs. original
* Statistical significance patterns (which variables are significant in each)

(d) Replication Assessment: Evaluate success of replication:

* Are coefficient signs the same?
* Are magnitudes similar? Calculate percentage differences
* Are significance levels comparable?
* Overall assessment: successful, partial, or unsuccessful replication
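
The sign and magnitude checks can be tabulated with a small helper like the one sketched below. The function name, the 10% tolerance, and the example coefficients are all placeholders chosen for illustration. The `original` column must be filled in from the paper's tables.

```python
import numpy as np
import pandas as pd


def compare_coefficients(ours, original, tol_pct=10.0):
    """Sign agreement and percentage difference between two coefficient sets."""
    table = pd.DataFrame({"ours": ours, "original": original})
    table["same_sign"] = np.sign(table["ours"]) == np.sign(table["original"])
    table["pct_diff"] = (
        100.0 * (table["ours"] - table["original"]) / table["original"].abs()
    )
    table["within_tol"] = table["pct_diff"].abs() <= tol_pct
    return table


# Made-up coefficients for illustration only (NOT estimates from the paper).
ours = pd.Series({"b": 1.02, "s": 0.48, "h": -0.05})
original = pd.Series({"b": 1.00, "s": 0.50, "h": 0.05})
print(compare_coefficients(ours, original).round(2))
```

Percentage differences are unstable when the original coefficient is close to zero, so the sign check and the absolute difference are worth reporting alongside them.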

(e) Discrepancy Investigation: If results differ, investigate potential reasons:

* Different sample periods or data versions
* Missing observations or variables
* Ambiguous specification details in original paper
* Software differences (R vs. Stata vs. Python)
* Errors in original paper or your implementation
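
A first check on the sample-period item is to cut the downloaded data to the paper's window, July 1963 to December 2013 (606 months), since current files extend well past 2013. A minimal sketch, assuming the data carry a sorted monthly `PeriodIndex` (`restrict_sample` is a hypothetical helper name):

```python
import pandas as pd


def restrict_sample(df, start="1963-07", end="2013-12"):
    """Keep only the months in [start, end]; df needs a sorted monthly PeriodIndex."""
    return df.loc[pd.Period(start, freq="M"):pd.Period(end, freq="M")]


# Placeholder data covering a longer span than the paper's sample (NOT real data).
idx = pd.period_range("1960-01", "2020-12", freq="M")
data = pd.DataFrame({"x": range(len(idx))}, index=idx)

sample = restrict_sample(data)
print(sample.index[0], sample.index[-1], len(sample))
```

Even with matching dates, later vintages of the data library can differ from the version the authors used, so small differences may remain.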

## 4. Critical Evaluation and Discussion (25 points)
   
(a) Agreement/Disagreement with Findings:

* Do your replication results support the authors’ main conclusions?
* Are there any findings you question based on your analysis?

(b) Economic Interpretation:

* Do the magnitudes make economic sense?
* Are the policy implications well-supported by the evidence?
* What are the limitations for external validity?

(c) Suggestions for Improvement:

* What additional analyses would strengthen the paper?
* Are there robustness checks the authors should have included?
* Could alternative data or methods address the paper’s limitations?



## 5. Conclusion and Reflection (10 points)
    
(a) Summary: Concise summary of your replication exercise and key findings

(b) Overall Assessment: Your overall evaluation of the paper’s quality and contribution

(c) Learning Reflection: What you learned from the replication process about econometric practice and research

(d) Transparency in Research: Comment on the importance of replication and data availability in economics