# Yield Curve PCA Analysis Demo

This notebook applies Principal Component Analysis (PCA) to the U.S. Treasury yield curve to identify the three main factors driving its movements: **Level**, **Slope**, and **Curvature**.

## Overview
- Fetch U.S. Treasury yield data from the FRED API
- Preprocess and clean the dataset
- Apply PCA to identify principal components
- Visualize explained variance, loadings, and component scores
- Interpret components as level, slope, and curvature factors


In [None]:
# Import required libraries
import sys
import os
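# Add the project root (the parent of the notebook directory) to the path,
# so that the `src` package used in the imports below can be found.
sys.path.append(os.path.dirname(os.getcwd()))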

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

from src.data_fetch import fetch_yield_data, save_yield_data, load_yield_data
from src.preprocessing import preprocess_yield_data
from src.pca_analysis import compute_pca_results
from src.visualizations import (
    plot_explained_variance,
    plot_pca_loadings,
    plot_component_scores,
    plot_yield_curve_heatmap
)

print("Libraries imported successfully!")


## Configuration

Set up the analysis parameters:


In [None]:
# Configuration
FRED_API_KEY = os.getenv('FRED_API_KEY')  # Set your FRED API key as environment variable
START_DATE = '2010-01-01'
END_DATE = datetime.now().strftime('%Y-%m-%d')
N_COMPONENTS = 3

print(f"Analysis period: {START_DATE} to {END_DATE}")
print(f"Number of PCA components: {N_COMPONENTS}")


## Step 1: Fetch Yield Curve Data

Fetch U.S. Treasury yield data from the FRED API across a range of maturities. If no API key is set, the cell below falls back to a previously saved CSV file.
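
As a rough illustration of what a fetch step might involve, the sketch below builds a request URL for the FRED `series/observations` endpoint and parses a response payload. The project's `fetch_yield_data` is not shown in this notebook, so this is an assumption and not its actual implementation. The helper names `build_observations_url` and `parse_observations` are hypothetical, and the maturity labels are chosen only for illustration. FRED encodes missing observations as the string `"."`, which the parser maps to `None`.

```python
from urllib.parse import urlencode

# FRED constant-maturity Treasury series IDs, keyed by a readable maturity label.
TREASURY_SERIES = {
    "3M": "DGS3MO",
    "2Y": "DGS2",
    "5Y": "DGS5",
    "10Y": "DGS10",
    "30Y": "DGS30",
}

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_observations_url(series_id, api_key, start, end):
    """Build the request URL for one series (hypothetical helper)."""
    params = {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start,
        "observation_end": end,
    }
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_observations(payload):
    """Map a decoded JSON payload to {date: yield in percent, or None}."""
    parsed = {}
    for obs in payload["observations"]:
        # FRED marks missing values (for example, market holidays) with ".".
        parsed[obs["date"]] = None if obs["value"] == "." else float(obs["value"])
    return parsed


# Hand-written payload in the shape FRED returns (values are illustrative only).
sample = {"observations": [
    {"date": "2020-01-02", "value": "1.88"},
    {"date": "2020-01-03", "value": "."},
]}
print(parse_observations(sample))
```

Keeping URL construction and parsing separate from the network call makes both easy to check offline before any API key is involved.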


In [None]:
# Option 1: Fetch from FRED API (requires API key)
if FRED_API_KEY:
    df_raw = fetch_yield_data(FRED_API_KEY, START_DATE, END_DATE)
    save_yield_data(df_raw, '../data/yield_data.csv')
else:
    # Option 2: Load existing data
    try:
        df_raw = load_yield_data('../data/yield_data.csv')
        print("Loaded existing data from data/yield_data.csv")
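    except FileNotFoundError as exc:
        # Stop here: without data, the lines below would fail with a NameError on df_raw.
        raise FileNotFoundError(
            "No API key provided and no existing data file found. "
            "Set the FRED_API_KEY environment variable or run the CLI to fetch data first."
        ) from exc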

# Display first few rows
print(f"\nData shape: {df_raw.shape}")
print(f"\nFirst few rows:")
df_raw.head()


In [None]:
# Display data summary
print("Data Summary:")
print(f"Date range: {df_raw.index.min()} to {df_raw.index.max()}")
print(f"Number of observations: {len(df_raw)}")
print(f"Number of maturities: {len(df_raw.columns)}")
print(f"\nMaturities: {list(df_raw.columns)}")
print(f"\nBasic statistics:")
df_raw.describe()


## Step 2: Preprocess Data

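Clean the yield data and center it before applying PCA. The cell below uses `standardize='demean'`, which suggests that each maturity's mean is subtracted without rescaling by its standard deviation.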
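
To make the centering step concrete, here is a minimal sketch of demeaning on a toy array. It assumes that cleaning means dropping dates with any missing quote, which may differ from what `preprocess_yield_data` actually does; the function name `demean_yields` is hypothetical. Demeaning keeps yields in their original units, so more volatile maturities carry more weight in the PCA, whereas dividing by the standard deviation as well would weight every maturity equally.

```python
import numpy as np


def demean_yields(yields):
    """Drop rows containing NaN, then subtract each column's mean.

    `yields` is a 2-D array: rows are dates, columns are maturities.
    Returns the centered array and the column means that were removed.
    """
    yields = np.asarray(yields, dtype=float)
    complete = yields[~np.isnan(yields).any(axis=1)]  # keep fully observed dates
    means = complete.mean(axis=0)
    return complete - means, means


# Toy data: 4 dates x 3 maturities, one date with a missing quote.
raw = np.array([
    [1.0, 2.0, 3.0],
    [1.2, 2.1, np.nan],
    [1.4, 2.4, 3.2],
    [1.6, 2.8, 3.4],
])
centered, means = demean_yields(raw)
print(means)                  # column means of the 3 complete rows
print(centered.mean(axis=0))  # approximately zero in every column
```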


In [None]:
# Preprocess data
df_processed, means, stds = preprocess_yield_data(df_raw, standardize='demean')

print(f"Preprocessed data shape: {df_processed.shape}")
print(f"\nStandardization means:")
print(pd.Series(means, index=df_raw.columns))
print(f"\nPreprocessed data (first few rows):")
df_processed.head()


## Step 3: Apply PCA

Perform Principal Component Analysis to identify the main factors.
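
For readers who want to see the mechanics, the sketch below computes PCA on synthetic data with a singular value decomposition in NumPy. It is not the implementation behind `compute_pca_results`, which is not shown here; the function `pca_svd` and the synthetic curve are assumptions made for illustration. The explained variance shares come from the squared singular values, the loadings are the right singular vectors, and the scores are the projections of the centered data onto those loadings.

```python
import numpy as np


def pca_svd(centered, n_components):
    """PCA of an already-centered (dates x maturities) array via SVD."""
    n_obs = centered.shape[0]
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = s ** 2 / (n_obs - 1)           # variance along each component
    explained = eigenvalues / eigenvalues.sum()  # share of total variance
    loadings = vt[:n_components].T               # maturities x components
    scores = centered @ loadings                 # dates x components
    return explained[:n_components], loadings, scores


# Synthetic curve: a large parallel shock plus a smaller tilt and noise.
rng = np.random.default_rng(0)
maturities = np.array([0.25, 2.0, 5.0, 10.0, 30.0])
level = rng.normal(0.0, 1.0, size=(500, 1))
tilt = rng.normal(0.0, 0.3, size=(500, 1))
tilt_shape = np.linspace(-1.0, 1.0, maturities.size)
yields = level + tilt * tilt_shape + rng.normal(0.0, 0.02, size=(500, maturities.size))
centered = yields - yields.mean(axis=0)

explained, loadings, scores = pca_svd(centered, n_components=3)
print(np.round(explained, 4))
```

Because the parallel shock dominates by construction, the first component should capture most of the variance and its loadings should share one sign across maturities.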


In [None]:
# Compute PCA results
pca_results = compute_pca_results(df_processed, n_components=N_COMPONENTS)

# Display results
print("PCA Results:")
print(f"\nExplained Variance:")
for i, (var, cum_var) in enumerate(zip(
    pca_results['explained_variance'],
    pca_results['cumulative_variance']
)):
    print(f"  PC{i+1}: {var:.2%} (Cumulative: {cum_var:.2%})")

print(f"\nComponent Interpretations:")
for pc, interp in pca_results['interpretations'].items():
    print(f"  {pc}: {interp}")


In [None]:
# Display PCA loadings
print("PCA Loadings (Factor Loadings):")
pca_results['loadings']


In [None]:
# Display PCA scores (first few rows)
print("PCA Scores (Component Scores - first 10 rows):")
pca_results['scores'].head(10)


## Step 4: Visualizations

Generate plots to visualize the PCA results.


In [None]:
# Plot explained variance
plot_explained_variance(
    pca_results['explained_variance'],
    output_path='../plots/explained_variance.png'
)
plt.show()


In [None]:
# Plot PCA loadings
plot_pca_loadings(
    pca_results['loadings'],
    output_path='../plots/pca_loadings.png'
)
plt.show()


In [None]:
# Plot component scores time series
plot_component_scores(
    pca_results['scores'],
    output_path='../plots/component_scores.png'
)
plt.show()


In [None]:
# Plot yield curve heatmap
plot_yield_curve_heatmap(
    df_raw,
    output_path='../plots/yield_curve_heatmap.png'
)
plt.show()


## Step 5: Interpretation and Insights

Analyze the PCA results to understand yield curve dynamics.


In [None]:
# Summary statistics for component scores
print("Component Scores Summary Statistics:")
pca_results['scores'].describe()


### Key Insights

1. **PC1 (Level)**: Typically explains 80-95% of variance. Its loadings have the same sign at every maturity, so all yields move in the same direction, an approximately parallel shift of the yield curve.

2. **PC2 (Slope)**: Typically explains 5-15% of variance. Its loadings change sign once along the curve, so short-term and long-term yields move in opposite directions, which steepens or flattens the curve.

3. **PC3 (Curvature)**: Typically explains 1-5% of variance. Its loadings change sign twice, so middle maturities move against both ends, which changes the curvature of the curve.
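
One way to read the list above in terms of loadings: a level component has no sign change across maturities, a slope component has one, and a curvature component has two. The sketch below turns that into a simple labeling heuristic. It is an assumption about how such labels could be assigned, not the logic behind `pca_results['interpretations']`; the function names and the stylized loading vectors are hypothetical.

```python
import numpy as np


def count_sign_changes(loading_vector, tol=1e-8):
    """Count sign flips along maturities, ignoring near-zero entries."""
    signs = np.sign(loading_vector[np.abs(loading_vector) > tol])
    return int(np.sum(signs[1:] != signs[:-1]))


def label_component(loading_vector):
    """Heuristic label based on the number of sign changes (an assumption)."""
    labels = {0: "Level", 1: "Slope", 2: "Curvature"}
    return labels.get(count_sign_changes(loading_vector), "Unclassified")


# Stylized loadings ordered from the shortest to the longest maturity.
print(label_component(np.array([0.44, 0.45, 0.46, 0.45, 0.44])))     # Level
print(label_component(np.array([-0.60, -0.30, 0.00, 0.35, 0.65])))   # Slope
print(label_component(np.array([0.50, -0.10, -0.60, -0.15, 0.55])))  # Curvature
```

Counting sign changes is unaffected by flipping the sign of a whole component, which matters because the sign of a principal component is arbitrary.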

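The first three components typically explain 95-99% of total yield curve variance, which is why PCA is widely used for dimensionality reduction and risk management on yield curves. These ranges are rules of thumb: the exact shares depend on the sample period, the maturities included, and whether yields are analyzed in levels or in changes. The sign of each component is arbitrary, so a level factor may appear with all-negative loadings.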
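
The dimensionality-reduction claim can be checked by rebuilding the centered yields from only the first k components and measuring what is lost. The sketch below does this on synthetic data built from three stylized shapes plus noise; the data, the shock sizes, and the helper `reconstruct` are assumptions for illustration and do not use the notebook's `pca_results`.

```python
import numpy as np


def reconstruct(centered, loadings, k):
    """Project onto the first k components and map back to yield space."""
    top = loadings[:, :k]
    return (centered @ top) @ top.T


rng = np.random.default_rng(1)
n_obs, n_mat = 400, 6
x = np.linspace(0.0, 1.0, n_mat)
# Three stylized shapes with decreasing shock sizes, plus small noise.
shapes = np.vstack([np.ones(n_mat), x - 0.5, (x - 0.5) ** 2 - 0.1])
shocks = rng.normal(size=(n_obs, 3)) * np.array([1.0, 0.4, 0.2])
yields = shocks @ shapes + rng.normal(0.0, 0.01, size=(n_obs, n_mat))
centered = yields - yields.mean(axis=0)

# Loadings are the right singular vectors of the centered data.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
loadings = vt.T

for k in (1, 2, 3):
    residual = centered - reconstruct(centered, loadings, k)
    rmse = np.sqrt(np.mean(residual ** 2))
    print(f"k={k}: RMSE = {rmse:.4f}")
```

The reconstruction error shrinks as components are added, and with three components the remaining error should be on the order of the added noise.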
