mathsuser/MVO_Covariance_Estimation

MVO Covariance Estimation

A modular framework for evaluating covariance matrix estimators in portfolio optimization contexts. This project implements multiple covariance estimation methods and provides a rolling backtest framework to compare their out-of-sample performance.

Overview

Covariance matrix estimation is critical for mean-variance optimization (MVO) in portfolio construction. This project provides:

  • 7 covariance estimators: sample, PCA, constant correlation, scaled identity, shrinkage (with multiple targets), and Fama-French factor model
  • Rolling backtest framework: evaluate estimators on out-of-sample data
  • Multiple metrics: Frobenius loss, Stein's loss, log-likelihood, Mahalanobis distance, condition number
  • YAML-driven configuration: easily experiment with different estimator combinations and parameters
  • Registry pattern: modular, extensible estimator management

Project Structure

MVO_Covariance_Estimation/
├── src/
│   ├── cov_estimators.py      # All covariance estimators with registry
│   ├── cov_backtest.py         # Rolling window backtest framework
│   ├── cov_metrics.py          # Evaluation metrics
│   ├── cov_run_experiment.py   # Orchestration for multiple estimators
│   ├── schema.py               # YAML config dataclasses
│   ├── utils.py                # Data loading utilities
│   └── returns.py              # Returns calculation
├── configs/
│   └── default.yaml            # Experiment configuration
├── notebooks/
│   └── covariance_estimators_comparison.ipynb  # Main analysis notebook
├── data/
│   ├── sample/                 # Sample data for offline testing
│   └── processed/              # (gitignored) Processed outputs
├── documentation/              # Project notes and workflows
└── tests/                      # (empty) Unit tests planned

Setup

1. Clone and Install Dependencies

git clone <repository-url>
cd MVO_Covariance_Estimation
pip install -r requirements.txt

2. Configure Data Source

Create a .env file in the project root:

PIPELINE_RETURNS_PATH=/path/to/FinDataPipeline/data/processed/returns_data

Note: If PIPELINE_RETURNS_PATH is unavailable, the loader automatically falls back to sample data in data/sample/.
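
The fallback described in the note can be sketched roughly as follows. This is an illustration only, not the project's loader: the function name `resolve_returns_path` is hypothetical, and the sketch assumes the variable from `.env` has already been loaded into the process environment.

```python
import os
from pathlib import Path


def resolve_returns_path(sample_dir="data/sample"):
    """Return the configured pipeline path if usable, else the sample data directory.

    Hypothetical helper: the real loader in src/utils.py may behave differently.
    """
    configured = os.environ.get("PIPELINE_RETURNS_PATH")
    # Use the configured path only if it is set and actually exists on disk.
    if configured and Path(configured).exists():
        return Path(configured)
    # Otherwise fall back to the bundled sample data.
    return Path(sample_dir)
```

Calling `resolve_returns_path()` with the variable unset, or pointing at a missing directory, returns `Path("data/sample")`.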

3. Run the Notebook

jupyter notebook notebooks/covariance_estimators_comparison.ipynb

Covariance Estimators

Implemented Estimators

  1. Sample Covariance: Classic unbiased estimator
  2. PCA Covariance: Statistical factor model using principal components
  3. Constant Correlation: Keeps the sample variances and assumes all pairwise correlations are equal
  4. Scaled Identity: Identity matrix scaled by the average asset variance
  5. Shrinkage: Blends the sample covariance with a target (constant correlation, scaled identity, or factor model)
  6. Fama-French Factor Model: Economic factor model (3-factor or 5-factor)
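
To make the definitions above concrete, here is a minimal NumPy sketch of several of the estimators. It is an illustration under stated assumptions, not the code in `src/cov_estimators.py`: all function names are hypothetical, inputs are plain `(T, N)` arrays rather than DataFrames, and the PCA variant assumes residual variances are placed on the diagonal, which is one common convention.

```python
import numpy as np


def sample_cov(returns, ddof=1):
    # returns has shape (T, N): one row per period, one column per asset.
    return np.cov(returns, rowvar=False, ddof=ddof)


def scaled_identity_target(cov):
    # Identity matrix scaled by the average variance (trace / N).
    n = cov.shape[0]
    return np.eye(n) * np.trace(cov) / n


def constant_correlation_target(cov):
    # Keep the sample variances, replace every pairwise correlation by their average.
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    n = cov.shape[0]
    avg_corr = (corr.sum() - n) / (n * (n - 1))  # mean of off-diagonal entries
    target = avg_corr * np.outer(std, std)
    np.fill_diagonal(target, std ** 2)
    return target


def pca_cov(cov, n_components):
    # Low-rank part from the top eigenpairs plus a diagonal of residual variances.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:n_components]
    low_rank = eigvecs[:, top] @ np.diag(eigvals[top]) @ eigvecs[:, top].T
    resid_var = np.clip(np.diag(cov) - np.diag(low_rank), 0.0, None)
    return low_rank + np.diag(resid_var)


def shrink(cov, target, intensity):
    # Convex combination: intensity = 0 gives the sample matrix, 1 gives the target.
    return intensity * target + (1.0 - intensity) * cov
```

For example, `shrink(S, constant_correlation_target(S), 0.3)` with `S = sample_cov(returns)` corresponds to shrinkage toward constant correlation with intensity 0.3.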

Configuration

Estimators are configured in configs/default.yaml:

estimators:
  - name: sample
    params:
      ddof: 1
  - name: shrinkage_const_corr
    base_estimator: shrinkage
    params:
      shrink_intensity: 0.5
      shrink_target: constant_correlation
  - name: ff_factor_model
    params:
      n_factors: 5

Usage

Basic Estimator Call

from src.cov_estimators import sample_covariance, shrinkage_covariance

# Sample covariance
cov_sample = sample_covariance(returns_df, ddof=1)

# Shrinkage toward constant correlation
cov_shrink = shrinkage_covariance(
    returns_df=returns_df,
    shrink_target="constant_correlation",
    shrink_intensity=0.3
)

Running Backtests

from src.cov_run_experiment import run_experiement
from src.schema import ExperimentConfig

# Load config
config = ExperimentConfig.from_yaml("configs/default.yaml")

# Prepare estimators dict
estimators = {}
for est in config.estimators:
    estimators[est.name] = {
        "estimator_kwargs": est.params,
        "base_estimator": est.base_estimator or est.name
    }

# Run rolling backtest
results = run_experiement(
    returns_df=returns_df,
    factors_df=factors_df,
    estimators=estimators,
    lookback_window=120,  # months
    scaling_factor=21     # daily to monthly scaling
)
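
For intuition, the core of a rolling backtest can be sketched as below. This is a simplified illustration, not the logic in `src/cov_backtest.py`: the function name and the `horizon` parameter are hypothetical, windows are measured in rows of a NumPy array, and the realized covariance of the held-out block stands in for the reference matrix. How `lookback_window` and `scaling_factor` interact in the actual framework is not shown here.

```python
import numpy as np


def rolling_backtest(returns, estimator, lookback, horizon):
    """Walk forward through a (T, N) array of returns.

    At each step, fit on the previous `lookback` rows and compare against
    the sample covariance of the next `horizon` rows.
    Returns a list of (estimated_cov, realized_cov) pairs.
    """
    results = []
    start = lookback
    while start + horizon <= returns.shape[0]:
        train = returns[start - lookback:start]  # in-sample window
        test = returns[start:start + horizon]    # out-of-sample block
        estimated = estimator(train)
        realized = np.cov(test, rowvar=False, ddof=1)
        results.append((estimated, realized))
        start += horizon                         # non-overlapping test blocks
    return results
```

Each pair can then be passed to any evaluation metric, so every estimator is scored only on data it did not see.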

Evaluation Metrics

  • Frobenius Loss: ||Σ_hat - Σ_ref||_F^2
  • Stein's Loss: tr(Σ_hat^-1 Σ_ref) - log det(Σ_hat^-1 Σ_ref) - n
  • Out-of-Sample Log-Likelihood: Log-likelihood of held-out returns under the estimated covariance
  • Mahalanobis Distance: Squared Mahalanobis distance of realized returns, measured under the estimated covariance
  • Condition Number: Eigenvalue spread (numerical stability indicator)
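
The metrics above can be written in a few lines of NumPy. The sketch below is illustrative rather than the contents of `src/cov_metrics.py`: function names are hypothetical, the log-likelihood assumes zero-mean Gaussian returns, and Stein's loss assumes both matrices are positive definite.

```python
import numpy as np


def frobenius_loss(est, ref):
    # Squared Frobenius norm of the difference.
    return np.linalg.norm(est - ref, ord="fro") ** 2


def steins_loss(est, ref):
    # tr(est^-1 ref) - log det(est^-1 ref) - n, using solve instead of an explicit inverse.
    n = est.shape[0]
    m = np.linalg.solve(est, ref)
    _, logdet = np.linalg.slogdet(m)
    return np.trace(m) - logdet - n


def mahalanobis_sq(est, returns):
    # Squared Mahalanobis distance r' est^-1 r for each row r of a (T, N) array.
    solved = np.linalg.solve(est, returns.T)  # shape (N, T)
    return np.einsum("tn,nt->t", returns, solved)


def gaussian_log_likelihood(est, returns):
    # Average log-density of held-out rows under N(0, est) (zero mean is an assumption).
    n = est.shape[0]
    _, logdet = np.linalg.slogdet(est)
    maha = mahalanobis_sq(est, returns)
    return float(np.mean(-0.5 * (n * np.log(2 * np.pi) + logdet + maha)))


def condition_number(est):
    # Ratio of largest to smallest eigenvalue of a symmetric matrix.
    eig = np.linalg.eigvalsh(est)  # ascending order
    return eig[-1] / eig[0]
```

Frobenius and Stein's loss are zero when the estimate equals the reference; a large condition number signals that inverting the estimate, as MVO does, will amplify estimation error.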

Data Requirements

  • Returns data: Daily or monthly returns (DataFrame with dates as index, tickers as columns)
  • Factor data: (Optional) Fama-French factors for factor model estimators
  • Complete history: Tickers must have no gaps in the analysis window

The included sample data (data/sample/) contains SP100 returns and Fama-French factors for demonstration.
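
The complete-history requirement can be enforced with a small pandas filter. This is a sketch under the assumption that gaps appear as NaN values; the helper name `keep_complete_tickers` is hypothetical and not part of the project.

```python
import numpy as np
import pandas as pd


def keep_complete_tickers(returns_df, start, end):
    """Restrict to the analysis window and drop tickers with any missing value in it."""
    window = returns_df.loc[start:end]  # label slicing is inclusive of both ends
    complete = window.columns[window.notna().all(axis=0)]
    return window[complete]


# Usage with a toy frame: BBB has a gap on the second date.
dates = pd.date_range("2020-01-01", periods=4, freq="D")
toy = pd.DataFrame(
    {"AAA": [0.01, 0.02, -0.01, 0.00], "BBB": [0.01, np.nan, 0.02, 0.01]},
    index=dates,
)
filtered = keep_complete_tickers(toy, dates[0], dates[-1])
```

Here `filtered` keeps only `AAA`, while a window covering just the last two dates would keep both tickers.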

Upcoming Enhancements

Documentation

  • Detailed mathematical descriptions of each estimator
  • Interpretation guide for metrics
  • Best practices for estimator selection

Features

  • Data-driven shrinkage intensity estimators (Ledoit-Wolf, Oracle Approximating Shrinkage)
  • GARCH-based covariance models
  • Robust covariance estimators (e.g., Minimum Covariance Determinant, MCD)
  • Cross-validation for hyperparameter tuning

Integration

  • Integration with MVO optimization framework
  • Portfolio construction using estimated covariances
  • Turnover and transaction cost analysis
  • Efficient frontier visualization

Testing & Quality

  • Unit tests for all estimators
  • Integration tests for backtest framework
  • Performance benchmarks
  • Code documentation (docstrings)

UI/UX

  • Streamlit dashboard for interactive exploration
  • Automated report generation
  • Visualization templates

Notes

  • This is a learning and research project focused on covariance estimation methodology
  • The notebook is for testing and exploration; production workflows should use the modular API
  • Factor model requires Fama-French factor data (not included for full history; sample provided)
  • All estimators assume returns are excess returns (subtract risk-free rate if available)

Contributing

This project follows a teaching-driven development approach. When encountering issues:

  1. Identify the problem and understand why it occurred
  2. Request guidance before implementing solutions
  3. Learn the commands and techniques step-by-step

License

[Add your license here]

Contact

[Add your contact information here]
