# KTND-Finance: Non-Equilibrium Koopman-Thermodynamic Neural Decomposition

**Full experiment pipeline for Physical Review E submission.**

This notebook runs the complete KTND-Finance pipeline on Google Colab (free tier):
1. Install dependencies and clone the repository
2. Run tests to verify code integrity
3. Download financial data (Yahoo Finance)
4. Train models (univariate + multiasset)
5. Run baselines
6. Run robustness tests
7. Run rolling spectral analysis
8. Generate all figures
9. (Optional) Run the ablation study
10. Summarize and download the results

**Runtime**: ~2-3 hours on Colab free tier (T4 GPU), excluding the optional ablation study

**Memory**: ~6 GB RAM (fits within Colab free 12.7 GB)
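
Before starting a multi-hour run, it can help to confirm that the runtime actually has the memory quoted above. The following is a minimal sketch using only the standard library; it assumes a Linux host (which includes Colab), and `REQUIRED_GB` is simply the ~6 GB estimate restated as a constant, not a value read from the repository.

```python
import os

def total_ram_gb() -> float:
    """Total physical RAM in GiB (relies on POSIX sysconf, available on Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    n_pages = os.sysconf("SC_PHYS_PAGES")    # number of physical pages
    return page_size * n_pages / 1024**3

REQUIRED_GB = 6.0  # assumption: the approximate peak usage quoted above

ram = total_ram_gb()
print(f"Total RAM: {ram:.1f} GiB")
if ram < REQUIRED_GB:
    print("Warning: less RAM than the pipeline's estimated peak usage.")
```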

## 1. Setup

In [None]:
# Install dependencies (Colab has torch, numpy, pandas, scipy, sklearn, matplotlib pre-installed).
# Version specifiers are quoted so the shell does not treat ">" as output redirection.
!pip install -q "yfinance>=1.0.0" "hmmlearn>=0.3.0" "statsmodels>=0.14.0" "arch>=6.0.0" "pyyaml>=6.0"

In [None]:
# Clone the repository
import os
REPO_URL = "https://github.com/keshavkrishnan08/kind_finance.git"
REPO_DIR = "/content/ktnd_finance"

if os.path.exists(REPO_DIR):
    !cd {REPO_DIR} && git pull
else:
    !git clone {REPO_URL} {REPO_DIR}

os.chdir(REPO_DIR)
print(f"Working directory: {os.getcwd()}")

In [None]:
# Verify environment
import torch
import numpy as np
import sys

print(f"Python:    {sys.version}")
print(f"PyTorch:   {torch.__version__}")
print(f"NumPy:     {np.__version__}")
print(f"CUDA:      {torch.cuda.is_available()} ({torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'})")
print(f"Device:    {'cuda' if torch.cuda.is_available() else 'cpu'}")

# Verify project imports
sys.path.insert(0, REPO_DIR)
from src.model.vampnet import NonEquilibriumVAMPNet
from src.model.losses import total_loss
print("\nAll project imports OK")

## 2. Run Tests (verify code integrity)

In [None]:
# Run the fast tests (skip the synthetic tests, which take ~3 min)
!cd {REPO_DIR} && python -m pytest tests/ -q --tb=short -k "not test_synthetic" 2>&1 | tail -10

## 3. Download Data

In [None]:
!cd {REPO_DIR} && python data/download.py --mode all

In [None]:
# Verify data
import pandas as pd
prices = pd.read_csv(f"{REPO_DIR}/data/prices.csv", index_col=0, parse_dates=True)
print(f"Prices: {prices.shape} ({prices.index[0].date()} to {prices.index[-1].date()})")
print(f"Tickers: {list(prices.columns)}")
print(f"\nNaN per ticker:")
print(prices.isna().sum().to_string())
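
The NaN counts printed above can be turned into an explicit pass/fail check before training. The sketch below is one way to do that with pandas; the function name `check_prices` and the 5% missing-data threshold are illustrative assumptions, not part of the repository. It runs on a small toy frame so it works on its own; in the notebook it would be called on the loaded `prices` frame instead.

```python
import numpy as np
import pandas as pd

def check_prices(prices: pd.DataFrame, max_nan_frac: float = 0.05) -> list[str]:
    """Return a list of human-readable problems found in a price table."""
    problems = []
    if not prices.index.is_monotonic_increasing:
        problems.append("dates are not sorted in increasing order")
    if prices.index.has_duplicates:
        problems.append("duplicate dates in index")
    nan_frac = prices.isna().mean()  # fraction of missing rows per ticker
    for ticker, frac in nan_frac.items():
        if frac > max_nan_frac:
            problems.append(f"{ticker}: {frac:.1%} missing values")
    if (prices <= 0).any().any():
        problems.append("non-positive prices (log returns would be undefined)")
    return problems

# Toy example with one gappy ticker (hypothetical tickers AAA and BBB).
dates = pd.date_range("2024-01-01", periods=5, freq="D")
toy = pd.DataFrame({"AAA": [1.0, 1.1, 1.2, 1.3, 1.4],
                    "BBB": [2.0, np.nan, np.nan, 2.1, 2.2]}, index=dates)
print(check_prices(toy))
```

An empty list means no problems were detected under these particular checks.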

## 4. Train Models

Train two models: a univariate model (SPY, 1993-2025) and a multiasset model (11 ETFs, 2007-2025).
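
VAMPNet-style models are trained on pairs of observations separated by a fixed lag time, so both the single-asset and the 11-asset case reduce to the same data layout with a different feature dimension. The sketch below shows that layout on synthetic prices. It is not the repository's preprocessing: the helper names, the use of log returns as features, and the lag of 5 days are all assumptions made for illustration.

```python
import numpy as np

def log_returns(prices: np.ndarray) -> np.ndarray:
    """Log returns r_t = ln(p_t / p_{t-1}) along the time axis (axis 0)."""
    return np.diff(np.log(prices), axis=0)

def time_lagged_pairs(x: np.ndarray, lag: int) -> tuple[np.ndarray, np.ndarray]:
    """Split a series into aligned (x_t, x_{t+lag}) arrays."""
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    return x[:-lag], x[lag:]

# Synthetic geometric random walk: 500 days, 3 assets (illustrative only).
rng = np.random.default_rng(42)
prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal((500, 3)), axis=0))

r = log_returns(prices)                    # shape (499, 3)
x_t, x_tau = time_lagged_pairs(r, lag=5)   # each of shape (494, 3)
print(x_t.shape, x_tau.shape)
```

Row `i` of `x_tau` is the observation 5 steps after row `i` of `x_t`, which is the pairing a Koopman operator estimate at that lag needs.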

In [None]:
%%time
# Univariate training (~15-30 min on T4)
!cd {REPO_DIR} && python experiments/run_main.py \
    --config config/univariate.yaml \
    --mode univariate \
    --seed 42

In [None]:
%%time
# Multiasset training (~20-40 min on T4)
!cd {REPO_DIR} && python experiments/run_main.py \
    --config config/multiasset.yaml \
    --mode multiasset \
    --seed 42

In [None]:
# Check training results
import json
results_path = f"{REPO_DIR}/outputs/results/analysis_results.json"
if os.path.exists(results_path):
    with open(results_path) as f:
        results = json.load(f)
    print("=" * 60)
    print("Key Results:")
    print("=" * 60)
    for key in ['spectral_gap', 'entropy_total', 'entropy_empirical',
                'entropy_ci_lower', 'entropy_ci_upper', 'mean_irreversibility',
                'irrev_method', 'detailed_balance_violation',
                'n_complex_modes', 'fluctuation_theorem_ratio']:
        if key in results:
            print(f"  {key:35s}: {results[key]}")
else:
    print("No results found yet. Training may still be running.")
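
Several of the reported quantities derive from the eigenvalues of the estimated Koopman operator. As a rough guide to reading them, the sketch below converts a spectrum into implied relaxation times via t_i = -τ / ln|λ_i| and counts complex eigenvalues, which cannot occur for a reversible process obeying detailed balance. The spectrum and the lag of 5 days are made up for illustration, and the repository may define `relaxation_times` and `n_complex_modes` differently.

```python
import numpy as np

def implied_timescales(eigvals: np.ndarray, lag: float) -> np.ndarray:
    """t_i = -lag / ln|lambda_i| for every eigenvalue with 0 < |lambda_i| < 1."""
    mags = np.abs(eigvals)
    mags = mags[(mags > 0) & (mags < 1 - 1e-12)]  # drop the stationary mode |lambda| = 1
    return -lag / np.log(mags)

def count_complex(eigvals: np.ndarray, tol: float = 1e-8) -> int:
    """Number of eigenvalues with a non-negligible imaginary part."""
    return int(np.sum(np.abs(np.imag(eigvals)) > tol))

# Hypothetical spectrum at a lag of 5 trading days (not pipeline output).
lam = np.array([1.0, 0.9, 0.5 + 0.3j, 0.5 - 0.3j])
ts = implied_timescales(lam, lag=5.0)
print("Implied timescales (days):", np.round(ts, 2))
print("Complex modes:", count_complex(lam))
```

Timescales are in the same units as the lag, so an eigenvalue close to 1 corresponds to a slowly decaying mode.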

## 5. Run Baselines

In [None]:
%%time
!cd {REPO_DIR} && python experiments/run_baselines.py --config config/default.yaml

## 6. Robustness Tests

In [None]:
%%time
# Robustness tests for univariate
!cd {REPO_DIR} && python experiments/run_robustness.py \
    --config config/default.yaml \
    --mode univariate

## 7. Rolling Spectral Analysis

In [None]:
%%time
!cd {REPO_DIR} && python experiments/run_rolling.py \
    --config config/default.yaml \
    --mode univariate
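
The idea behind a rolling analysis is to re-estimate a spectral quantity on overlapping windows and track how it changes over time. The sketch below illustrates this on a synthetic AR(1) series, using the least-squares slope of x_{t+lag} on x_t as a one-dimensional linear stand-in for a Koopman eigenvalue. The window length, step, and estimator are assumptions chosen for illustration and are not taken from `run_rolling.py`.

```python
from typing import Iterator

import numpy as np

def rolling_windows(n_samples: int, window: int, step: int) -> Iterator[tuple[int, int]]:
    """Yield (start, stop) index pairs for windows that fit entirely in the data."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, n_samples - window + 1, step):
        yield start, start + window

def lagged_slope(x: np.ndarray, lag: int) -> float:
    """Least-squares slope of x_{t+lag} on x_t after removing the window mean."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x[:-lag], x[:-lag]))

# Synthetic AR(1) series with coefficient 0.8 (illustrative only).
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

windows = list(rolling_windows(len(x), window=500, step=250))
estimates = [lagged_slope(x[a:b], lag=1) for a, b in windows]
for (a, b), est in zip(windows, estimates):
    print(f"[{a:4d}, {b:4d}) slope = {est:.3f}")
```

For a stationary series the estimates scatter around the true coefficient; a sustained drift across windows would suggest changing dynamics.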

## 8. Generate Figures

In [None]:
!cd {REPO_DIR} && python experiments/run_figures.py \
    --results-dir outputs/results \
    --figures-dir outputs/figures

In [None]:
# Display generated figures
from IPython.display import Image, display
import glob

fig_dir = f"{REPO_DIR}/outputs/figures"
pngs = sorted(glob.glob(f"{fig_dir}/*.png"))
print(f"Generated {len(pngs)} figures:")
for p in pngs:
    print(f"  {os.path.basename(p)}")
    display(Image(filename=p, width=800))

## 9. (Optional) Ablation Study

**Warning**: This takes ~2-6 hours even on GPU. Use reduced seeds for Colab.

In [None]:
# Ablation study with reduced seeds (3 instead of 10) for Colab time limits
# Uncomment to run:
# !cd {REPO_DIR} && python experiments/run_ablations.py \
#     --config config/default.yaml \
#     --n-seeds 3 \
#     --n-jobs 1
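
With only three seeds, it is worth reporting the spread alongside the mean for each metric. The sketch below aggregates per-seed metric dictionaries into (mean, sample standard deviation) pairs. The input format is an assumption, since the output layout of `run_ablations.py` is not shown here, and the numbers are made up.

```python
import numpy as np

def summarize_seeds(runs: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Map each metric present in every run to (mean, sample std) across seeds."""
    shared = set(runs[0]).intersection(*runs[1:])  # metrics reported by all seeds
    summary = {}
    for key in sorted(shared):
        vals = np.array([run[key] for run in runs], dtype=float)
        std = vals.std(ddof=1) if len(vals) > 1 else float("nan")
        summary[key] = (float(vals.mean()), float(std))
    return summary

# Made-up values for three seeds; not results from this pipeline.
runs = [
    {"spectral_gap": 0.10, "entropy_total": 1.0},
    {"spectral_gap": 0.12, "entropy_total": 1.2},
    {"spectral_gap": 0.14},  # this seed did not report entropy_total
]
for name, (mean, std) in summarize_seeds(runs).items():
    print(f"{name}: {mean:.3f} +/- {std:.3f} (n={len(runs)})")
```

Metrics missing from any seed are dropped rather than averaged over fewer runs, so every reported number is based on the same sample size.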

## 10. Final Summary

In [None]:
# Collect all outputs
import json  # imported here so this cell also works if the results-check cell was skipped
print("=" * 70)
print("KTND-Finance: Experiment Complete")
print("=" * 70)

output_dir = f"{REPO_DIR}/outputs"
for subdir in ['results', 'models', 'figures', 'logs']:
    path = os.path.join(output_dir, subdir)
    if os.path.exists(path):
        files = os.listdir(path)
        print(f"\n  {subdir}/ ({len(files)} files):")
        for f in sorted(files)[:10]:
            fpath = os.path.join(path, f)
            size = os.path.getsize(fpath) if os.path.isfile(fpath) else 0
            print(f"    {f:45s} {size/1024:.1f} KB")

# Load and display key results
results_file = os.path.join(output_dir, 'results', 'analysis_results.json')
if os.path.exists(results_file):
    with open(results_file) as f:
        r = json.load(f)
    print(f"\n{'='*70}")
    print("Key Metrics for PRE Paper:")
    print(f"{'='*70}")
    print(f"  Spectral gap:           {r.get('spectral_gap', 'N/A')}")
    print(f"  Entropy (spectral):     {r.get('entropy_total', 'N/A')}")
    print(f"  Entropy (empirical):    {r.get('entropy_empirical', 'N/A')} "
          f"[{r.get('entropy_ci_lower', '?')}, {r.get('entropy_ci_upper', '?')}]")
    print(f"  Mean irreversibility:   {r.get('mean_irreversibility', 'N/A')}")
    print(f"  Irrev method:           {r.get('irrev_method', 'N/A')}")
    print(f"  DB violation:           {r.get('detailed_balance_violation', 'N/A')}")
    print(f"  Complex modes:          {r.get('n_complex_modes', 'N/A')}/{r.get('n_modes', 'N/A')}")
    print(f"  FT ratio:               {r.get('fluctuation_theorem_ratio', 'N/A')}")
    relax = r.get('relaxation_times') or [None]  # guard against a missing or empty list
    print(f"  Leading relaxation:     {relax[0]} days")

In [None]:
# Download all results as a zip
!cd {REPO_DIR} && zip -r /content/ktnd_results.zip outputs/
from google.colab import files
files.download('/content/ktnd_results.zip')