# Replicating the Nozawa Corporate Bond Portfolios from He, Kelly, and Manela (2017)

## Imports

In [None]:
import pull_bondret_treasury
import pull_CRSP_bond_returns
import pull_he_kelly_manela_factors
import calc_nozawa_portfolio
import calc_metrics
import pandas as pd
import numpy as np
from misc_tools import merge_stats

In [None]:
from pathlib import Path
from settings import config

OUTPUT_DIR = Path(config("OUTPUT_DIR"))
DATA_DIR = Path(config("DATA_DIR"))

## Data Processing

Here, we load the data and process it:

In [None]:
open_df = pull_bondret_treasury.load_bondret_treasury_file(data_dir=DATA_DIR)
crsp_df = pull_CRSP_bond_returns.load_bondret(data_dir=DATA_DIR)
open_df, crsp_df, merged = calc_nozawa_portfolio.process_all_data(open_df, crsp_df)

In [None]:
merge_stats(crsp_df, open_df, ['cusip_date'])

The data processing also generates the deciles for the 10 corresponding corporate bond portfolios per Nozawa (2017) used by He, Kelly, and Manela (2017).
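
As a rough illustration of the decile step, the sketch below sorts bonds into ten yield-spread buckets within each month using `pd.qcut`. It is not the code inside `calc_nozawa_portfolio`; the function name `assign_spread_deciles` and the column names `date` and `yield_spread` are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def assign_spread_deciles(df, date_col="date", spread_col="yield_spread"):
    """Label each bond 1 (lowest spread) to 10 (highest spread) within each month.

    Hypothetical helper: column names are assumptions, not the notebook's schema.
    """
    out = df.copy()
    out["decile"] = out.groupby(date_col)[spread_col].transform(
        # qcut returns 0-based bin codes when labels=False, so shift to 1..10
        lambda s: pd.qcut(s, 10, labels=False, duplicates="drop") + 1
    )
    return out


# Toy data: 20 bonds in each of 2 months with random spreads
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-31", "2020-02-29"]).repeat(20),
        "yield_spread": rng.uniform(0.005, 0.08, size=40),
    }
)
toy_deciles = assign_spread_deciles(toy)
print(toy_deciles.groupby(["date", "decile"]).size().unstack())
```

Sorting within each month, rather than over the pooled sample, keeps the breakpoints from drifting with the overall level of spreads.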

In [None]:
merged

Now, we can calculate the returns weighted by amount outstanding for each decile:

In [None]:
portfolio_returns_fwd, decile_returns_df = calc_nozawa_portfolio.calculate_decile_returns(merged)
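
For readers who want to see the weighting spelled out, here is a minimal sketch of a return weighted by amount outstanding for each month and decile. The helper `value_weighted_returns` and the column names `ret`, `amount_outstanding`, `date`, and `decile` are hypothetical; `calculate_decile_returns` may differ in details such as the timing of the weights.

```python
import pandas as pd


def value_weighted_returns(
    df,
    ret_col="ret",
    weight_col="amount_outstanding",
    group_cols=("date", "decile"),
):
    """Weighted-average return per (date, decile); hypothetical column names."""
    tmp = df.dropna(subset=[ret_col, weight_col]).copy()
    tmp["_weighted_ret"] = tmp[ret_col] * tmp[weight_col]
    sums = tmp.groupby(list(group_cols))[["_weighted_ret", weight_col]].sum()
    vw = sums["_weighted_ret"] / sums[weight_col]
    # One row per date, one column per decile
    return vw.unstack(group_cols[-1])


toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-31"] * 4),
        "decile": [1, 1, 2, 2],
        "ret": [0.01, 0.03, -0.02, 0.04],
        "amount_outstanding": [100.0, 300.0, 50.0, 50.0],
    }
)
vw = value_weighted_returns(toy)
print(vw)
```

In the toy data, decile 1 weights the 3% bond three times as heavily as the 1% bond, so its portfolio return sits closer to 3% than the simple average would.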

## Analysis

We can compare our decile returns with the He, Kelly, and Manela data, which report the returns for each Nozawa decile corporate bond portfolio:

In [None]:
test_df = pull_he_kelly_manela_factors.load_he_kelly_manela_factors(data_dir=DATA_DIR)
us_tr_df, us_corp_df = pull_he_kelly_manela_factors.process_he_kelly_manela_factors(test_df)

In [None]:
us_corp_df.iloc[344:]

Our calculated returns are below for comparison.

In [None]:
replication_df, updated_reproduction_df = calc_metrics.split_decile_returns(decile_returns_df, us_corp_df)
replication_df

Let's take a look at how our replication did:

In [None]:
analysis_df, benchmark_summary, replicate_summary = calc_metrics.calculate_decile_analysis(decile_returns_df, us_corp_df)
analysis_df

Summary statistics for the Nozawa portfolios per He, Kelly, and Manela:

In [None]:
benchmark_summary

Summary statistics for our replication of the Nozawa portfolios:

In [None]:
replicate_summary

Now let's take a look at our reproduction of Nozawa updated with current data:

In [None]:
calc_metrics.plot_cumulative_returns(updated_reproduction_df)

This figure shows the cumulative return of each yield-spread decile, using updated data from 2012 to 2024. Lower deciles (lower spreads) show steadier, less volatile returns, while higher-spread deciles can exhibit both higher peaks and more pronounced drawdowns. This ordering is consistent with the risk-return relationship typically associated with yield spreads.
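
The quantities discussed above can be computed from a table of periodic decile returns as in the sketch below. It assumes simple (not log) returns in decimal units and treats missing months as zero returns. The helpers `cumulative_returns` and `drawdowns` are illustrative and are not taken from `calc_metrics`.

```python
import pandas as pd


def cumulative_returns(returns):
    """Compound simple returns: prod(1 + r) - 1, column by column."""
    # Assumption: a missing month is treated as a zero return
    return (1.0 + returns.fillna(0.0)).cumprod() - 1.0


def drawdowns(returns):
    """Decline of the wealth index from its running peak (0 at a new high)."""
    wealth = (1.0 + returns.fillna(0.0)).cumprod()
    # Include the starting wealth of 1.0 in the running peak
    running_peak = wealth.cummax().clip(lower=1.0)
    return wealth / running_peak - 1.0


toy = pd.DataFrame(
    {"decile_1": [0.10, -0.05], "decile_10": [0.20, -0.50]},
    index=pd.to_datetime(["2020-01-31", "2020-02-29"]),
)
cum = cumulative_returns(toy)
dd = drawdowns(toy)
print(cum)
print(dd.min())  # worst drawdown per decile
```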

## Decile Replication Analysis

Below is a summary of the replication metrics for portfolios 11 through 20. The table includes:
- **Correlation** (Pearson) between each replicated decile return and the benchmark
- **R²** (the square of the correlation)
- **Slope** and **Intercept** from a simple linear regression of benchmark returns on replicated returns
- **MAE** (Mean Absolute Error) and **RMSE** (Root Mean Squared Error)
- **Tracking Error** (standard deviation of the difference between benchmark and replicated returns)
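
The metrics listed above can be sketched for a single decile as follows. This is an illustration under stated assumptions, not the code in `calc_metrics`: the function name `replication_metrics` is hypothetical, the regression uses `np.polyfit`, and the tracking error uses the sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd


def replication_metrics(benchmark, replicated):
    """Compare two return series aligned on their index (hypothetical helper)."""
    pair = pd.concat([benchmark, replicated], axis=1, keys=["bench", "rep"]).dropna()
    y = pair["bench"].to_numpy()  # dependent variable: benchmark returns
    x = pair["rep"].to_numpy()  # regressor: replicated returns
    diff = y - x

    corr = np.corrcoef(x, y)[0, 1]
    slope, intercept = np.polyfit(x, y, 1)  # OLS fit of y = intercept + slope * x
    return pd.Series(
        {
            "correlation": corr,
            "r_squared": corr**2,
            "slope": slope,
            "intercept": intercept,
            "mae": np.abs(diff).mean(),
            "rmse": np.sqrt((diff**2).mean()),
            "tracking_error": diff.std(ddof=1),  # assumption: sample std
        }
    )


# Toy example: a noisy copy of a simulated monthly return series
rng = np.random.default_rng(42)
rep_toy = pd.Series(rng.normal(0.005, 0.02, size=120))
bench_toy = rep_toy + rng.normal(0.0, 0.005, size=120)
print(replication_metrics(bench_toy, rep_toy))
```

With a single regressor, the regression R² equals the squared Pearson correlation, which is why the sketch computes it directly from `corr`.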

### Key Observations

1. **High Correlation and R²**  
   - Most correlation values exceed 0.80, with several deciles at or above 0.90.  
   - Corresponding R² values typically range from about 0.65 up to 0.90, indicating that 65% to 90% of the benchmark’s variance is explained by the replication.

2. **Slope and Intercept**  
   - The **slope** values range from about 0.93 to 1.0, implying that a 1 percentage point change in the replicated decile return is matched by a change of similar magnitude in the benchmark.  
   - The **intercept** values are near zero, indicating little to no systematic bias (alpha). In other words, our replication neither consistently overshoots nor undershoots the benchmark.

3. **Error Measures**  
   - **MAE** (Mean Absolute Error) and **RMSE** (Root Mean Squared Error) are generally below 1% per month (roughly 0.004 to 0.01 in decimal return units). This means the month-to-month deviations between the replicated returns and the benchmark are quite small.  
   - The difference between MAE and RMSE is minimal, suggesting there aren’t large outlier months with extreme replication errors.

4. **Tracking Error**  
   - The **tracking error** (standard deviation of replicated minus benchmark returns) mostly remains under 1% for each decile. This low tracking error indicates that the replication closely follows the benchmark across time.
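
The link between these error measures can be made explicit. Let $d_t$ be the benchmark return minus the replicated return in month $t$, over $T$ months, with mean $\bar d$. The tracking error below uses the $1/T$ (population) normalization, which is an assumption; the implementation in `calc_metrics` may use $1/(T-1)$ instead.

$$
\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T} |d_t|,
\qquad
\mathrm{RMSE}^2 = \frac{1}{T}\sum_{t=1}^{T} d_t^2
= \bar d^{\,2} + \underbrace{\frac{1}{T}\sum_{t=1}^{T} (d_t - \bar d)^2}_{\mathrm{TE}^2}
$$

Two consequences follow. First, $\mathrm{MAE} \le \mathrm{RMSE}$ always holds, with a wide gap arising only when a few months have much larger errors than the rest, so a small gap points to the absence of large outliers. Second, when the average difference $\bar d$ is close to zero, RMSE and tracking error are nearly equal, so similar values for the two indicate little systematic bias.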

### Overall Assessment

- The **strong correlation and high R²** values demonstrate that our replicated decile portfolios move in close lockstep with the benchmark.  
- **Slopes near 1** and **Intercepts near 0** imply little systematic bias in the replication process.  
- **Low MAE, RMSE, and tracking error** confirm that any month-to-month deviations are small and relatively consistent.

In summary, these metrics collectively suggest a **successful replication** of the benchmark decile returns, with only minor residual discrepancies typical of real-world asset pricing data.