A modular framework for evaluating covariance matrix estimators in portfolio optimization contexts. This project implements multiple covariance estimation methods and provides a rolling backtest framework to compare their out-of-sample performance.
Covariance matrix estimation is critical for mean-variance optimization (MVO) in portfolio construction. This project provides:
- 7 covariance estimators: sample, PCA, constant correlation, scaled identity, shrinkage (with multiple targets), and Fama-French factor model
- Rolling backtest framework: evaluate estimators on out-of-sample data
- Multiple metrics: Frobenius loss, Stein's loss, log-likelihood, Mahalanobis distance, condition number
- YAML-driven configuration: easily experiment with different estimator combinations and parameters
- Registry pattern: modular, extensible estimator management
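As a rough illustration of the registry pattern mentioned above, the sketch below maps estimator names to functions through a decorator. The names `ESTIMATOR_REGISTRY`, `register_estimator`, and `get_estimator` are hypothetical and are not taken from `src/cov_estimators.py`. The two estimators operate on plain NumPy arrays for brevity, whereas the project's own functions take DataFrames.

```python
from typing import Callable, Dict

import numpy as np

# Hypothetical registry: estimator name -> function that builds a covariance matrix.
ESTIMATOR_REGISTRY: Dict[str, Callable[..., np.ndarray]] = {}


def register_estimator(name: str):
    """Decorator that stores the decorated function under `name`."""
    def decorator(func):
        if name in ESTIMATOR_REGISTRY:
            raise ValueError(f"Estimator '{name}' is already registered")
        ESTIMATOR_REGISTRY[name] = func
        return func
    return decorator


@register_estimator("sample")
def sample_covariance(returns: np.ndarray, ddof: int = 1) -> np.ndarray:
    """Sample covariance; rows are observations, columns are assets."""
    return np.cov(returns, rowvar=False, ddof=ddof)


@register_estimator("scaled_identity")
def scaled_identity_covariance(returns: np.ndarray, ddof: int = 1) -> np.ndarray:
    """Identity matrix scaled by the average sample variance across assets."""
    avg_var = np.var(returns, axis=0, ddof=ddof).mean()
    return avg_var * np.eye(returns.shape[1])


def get_estimator(name: str) -> Callable[..., np.ndarray]:
    """Look up a registered estimator, failing loudly on unknown names."""
    if name not in ESTIMATOR_REGISTRY:
        raise KeyError(f"Unknown estimator '{name}'; available: {sorted(ESTIMATOR_REGISTRY)}")
    return ESTIMATOR_REGISTRY[name]
```

With this layout, adding an estimator only requires decorating a new function. Code that reads estimator names from a config file can then resolve them through `get_estimator` without further changes.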
MVO_Covariance_Estimation/
├── src/
│ ├── cov_estimators.py # All covariance estimators with registry
│ ├── cov_backtest.py # Rolling window backtest framework
│ ├── cov_metrics.py # Evaluation metrics
│ ├── cov_run_experiment.py # Orchestration for multiple estimators
│ ├── schema.py # YAML config dataclasses
│ ├── utils.py # Data loading utilities
│ └── returns.py # Returns calculation
├── configs/
│ └── default.yaml # Experiment configuration
├── notebooks/
│ └── covariance_estimators_comparison.ipynb # Main analysis notebook
├── data/
│ ├── sample/ # Sample data for offline testing
│ └── processed/ # (gitignored) Processed outputs
├── documentation/ # Project notes and workflows
└── tests/ # (empty) Unit tests planned
git clone <repository-url>
cd MVO_Covariance_Estimation
pip install -r requirements.txt
Create a .env file in the project root:
PIPELINE_RETURNS_PATH=/path/to/FinDataPipeline/data/processed/returns_data
Note: If PIPELINE_RETURNS_PATH is unavailable, the loader automatically falls back to sample data in data/sample/.
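The fallback described in the note could look roughly like the following. This is a minimal sketch, not the loader in `src/utils.py`: the function name `resolve_returns_path` is hypothetical, and it assumes the variable from `.env` has already been exported into the process environment (for example by a dotenv loader).

```python
import os
from pathlib import Path

# Location of the bundled sample data, relative to the project root.
SAMPLE_DATA_DIR = Path("data/sample")


def resolve_returns_path(env_var: str = "PIPELINE_RETURNS_PATH",
                         fallback: Path = SAMPLE_DATA_DIR) -> Path:
    """Return the configured path if it is set and exists, else the sample data path."""
    configured = os.environ.get(env_var)
    if configured and Path(configured).exists():
        return Path(configured)
    # Variable unset, empty, or pointing at a missing location: use sample data.
    return fallback
```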
jupyter notebook notebooks/covariance_estimators_comparison.ipynb
- Sample Covariance: Classic unbiased estimator
- PCA Covariance: Statistical factor model using principal components
- Constant Correlation: Assumes equal pairwise correlations
- Scaled Identity: Diagonal matrix with average variance
- Shrinkage: Blend sample covariance with a target (constant correlation, scaled identity, or factor model)
- Fama-French Factor Model: Economic factor model (3-factor or 5-factor)
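To make the constant-correlation and shrinkage entries above concrete, here is a small NumPy sketch under stated assumptions. The helper names `constant_correlation_target` and `shrink` are illustrative and do not come from the project. The blend is a plain convex combination with a fixed intensity, which matches the `shrink_intensity` parameter used in this README but may differ in detail from the actual implementation.

```python
import numpy as np


def constant_correlation_target(sample_cov: np.ndarray) -> np.ndarray:
    """Keep the sample variances; replace every pairwise correlation by their average."""
    std = np.sqrt(np.diag(sample_cov))
    corr = sample_cov / np.outer(std, std)
    n = corr.shape[0]
    # Average of the off-diagonal correlations (the n diagonal entries are all 1).
    avg_corr = (corr.sum() - n) / (n * (n - 1))
    target_corr = np.full((n, n), avg_corr)
    np.fill_diagonal(target_corr, 1.0)
    # Convert the correlation matrix back to covariance units.
    return target_corr * np.outer(std, std)


def shrink(sample_cov: np.ndarray, target: np.ndarray, shrink_intensity: float) -> np.ndarray:
    """Convex blend: intensity 0 returns the sample covariance, 1 returns the target."""
    if not 0.0 <= shrink_intensity <= 1.0:
        raise ValueError("shrink_intensity must lie in [0, 1]")
    return shrink_intensity * target + (1.0 - shrink_intensity) * sample_cov
```

The target keeps each asset's own variance and only pools information across correlations, so shrinking toward it reduces noise in the off-diagonal entries while leaving the diagonal unchanged.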
Estimators are configured in configs/default.yaml:
estimators:
  - name: sample
    params:
      ddof: 1
  - name: shrinkage_const_corr
    base_estimator: shrinkage
    params:
      shrink_intensity: 0.5
      shrink_target: constant_correlation
  - name: ff_factor_model
    params:
      n_factors: 5

from src.cov_estimators import sample_covariance, shrinkage_covariance
# Sample covariance
cov_sample = sample_covariance(returns_df, ddof=1)
# Shrinkage toward constant correlation
cov_shrink = shrinkage_covariance(
    returns_df=returns_df,
    shrink_target="constant_correlation",
    shrink_intensity=0.3
)

from src.cov_run_experiment import run_experiement
from src.schema import ExperimentConfig
# Load config
config = ExperimentConfig.from_yaml("configs/default.yaml")
# Prepare estimators dict
estimators = {}
for est in config.estimators:
    estimators[est.name] = {
        "estimator_kwargs": est.params,
        "base_estimator": est.base_estimator or est.name
    }
# Run rolling backtest
results = run_experiement(
    returns_df=returns_df,
    factors_df=factors_df,
    estimators=estimators,
    lookback_window=120,  # months
    scaling_factor=21  # daily to monthly scaling
)

- Frobenius Loss: ||Σ_hat - Σ_ref||_F^2
- Stein's Loss: tr(Σ_hat^-1 Σ_ref) - log det(Σ_hat^-1 Σ_ref) - n
- Out-of-Sample Log-Likelihood: Fit to held-out returns
- Mahalanobis Distance: Squared distance of realized returns, measured under the estimated covariance
- Condition Number: Eigenvalue spread (numerical stability indicator)
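Three of the metrics listed above can be written directly from their formulas. The following is a sketch with illustrative function names, not the code in `src/cov_metrics.py`. It assumes both matrices are symmetric positive definite and takes `n` in Stein's loss to be the number of assets.

```python
import numpy as np


def frobenius_loss(cov_hat: np.ndarray, cov_ref: np.ndarray) -> float:
    """Squared Frobenius norm of the difference between estimate and reference."""
    return float(np.sum((cov_hat - cov_ref) ** 2))


def steins_loss(cov_hat: np.ndarray, cov_ref: np.ndarray) -> float:
    """tr(M) - log det(M) - n with M = cov_hat^-1 @ cov_ref."""
    n = cov_hat.shape[0]
    # Solve cov_hat @ M = cov_ref rather than forming the inverse explicitly.
    m = np.linalg.solve(cov_hat, cov_ref)
    sign, logdet = np.linalg.slogdet(m)
    if sign <= 0:
        raise ValueError("cov_hat^-1 @ cov_ref must have a positive determinant")
    return float(np.trace(m) - logdet - n)


def condition_number(cov_hat: np.ndarray) -> float:
    """Ratio of the largest to the smallest eigenvalue of a symmetric matrix."""
    eigvals = np.linalg.eigvalsh(cov_hat)  # returned in ascending order
    return float(eigvals[-1] / eigvals[0])
```

Both losses are zero when the estimate equals the reference. Stein's loss additionally penalizes estimates whose inverse is poorly behaved, which matters for MVO because the optimizer works with the inverse covariance.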
- Returns data: Daily or monthly returns (DataFrame with dates as index, tickers as columns)
- Factor data: (Optional) Fama-French factors for factor model estimators
- Complete history: Tickers must have no gaps in the analysis window
The included sample data (data/sample/) contains SP100 returns and Fama-French factors for demonstration.
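The "no gaps" requirement above can be enforced before estimation by dropping tickers with missing observations in the analysis window. This is a minimal pandas sketch; the function name `select_complete_tickers` is hypothetical, and it assumes the index is a sorted `DatetimeIndex`.

```python
import pandas as pd


def select_complete_tickers(returns_df: pd.DataFrame, start, end) -> pd.DataFrame:
    """Slice to [start, end] (inclusive) and keep only columns without missing values."""
    window = returns_df.loc[start:end]
    complete = window.columns[window.notna().all(axis=0)]
    return window[complete]
```

Filtering per window rather than once over the full history keeps a ticker available in windows where its data is complete, even if it has gaps elsewhere.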
- Detailed mathematical descriptions of each estimator
- Interpretation guide for metrics
- Best practices for estimator selection
- Additional shrinkage estimators (Ledoit-Wolf, Oracle Approximating Shrinkage)
- GARCH-based covariance models
- Robust covariance estimators (e.g., Minimum Covariance Determinant, MCD)
- Cross-validation for hyperparameter tuning
- Integration with MVO optimization framework
- Portfolio construction using estimated covariances
- Turnover and transaction cost analysis
- Efficient frontier visualization
- Unit tests for all estimators
- Integration tests for backtest framework
- Performance benchmarks
- Code documentation (docstrings)
- Streamlit dashboard for interactive exploration
- Automated report generation
- Visualization templates
- This is a learning and research project focused on covariance estimation methodology
- The notebook is for testing and exploration; production workflows should use the modular API
- The factor model requires Fama-French factor data; only a sample is included, not the full history
- All estimators assume returns are excess returns (subtract risk-free rate if available)
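For the excess-returns assumption in the notes above, the conversion is a per-period subtraction. The sketch below uses a hypothetical helper `to_excess_returns` and assumes the risk-free series has the same frequency and units as the returns (for example, both daily simple returns).

```python
import pandas as pd


def to_excess_returns(returns_df: pd.DataFrame, risk_free: pd.Series) -> pd.DataFrame:
    """Subtract the per-period risk-free rate from every ticker column."""
    # Align the risk-free series to the return dates before subtracting.
    rf = risk_free.reindex(returns_df.index)
    if rf.isna().any():
        raise ValueError("risk-free series does not cover all return dates")
    return returns_df.sub(rf, axis=0)
```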
This project follows a teaching-driven development approach. When encountering issues:
- Identify the problem and understand why it occurred
- Request guidance before implementing solutions
- Learn the commands and techniques step-by-step
[Add your license here]
[Add your contact information here]