# Ablation Studies - RunPod GPU

This notebook runs ablation experiments to measure which sentiment features contribute to model performance (Sharpe ratio and total return).

**Configurations:**
- `baseline` - No sentiment features (control)
- `score_only` - Just sentiment_score
- `core_3` - score + news_count + sentiment_proxy
- `all_sentiment` - All 6 sentiment features

**Plan:** 4 configs x 3 seeds = 12 experiments (~6 hours total)
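The ~6-hour estimate works out to about 30 minutes per run. The sketch below shows the budget arithmetic and assumes every run takes the same time. The config names come from the list above. Only seed 42 appears in this notebook, so the other two seeds are hypothetical placeholders; the real values live in `configs/ablation_configs.yaml`.

```python
from itertools import product

# Config names from the list above.
CONFIGS = ["baseline", "score_only", "core_3", "all_sentiment"]
# Hypothetical seeds: only 42 appears in this notebook.
SEEDS = [42, 123, 456]

TOTAL_HOURS = 6.0  # the plan's rough estimate, not a measurement

# One (config, seed) pair per experiment.
runs = list(product(CONFIGS, SEEDS))
minutes_per_run = TOTAL_HOURS * 60 / len(runs)

print(f"{len(runs)} runs, ~{minutes_per_run:.0f} min each")
```

Timing one run of Option C before launching Option A is a cheap way to check this estimate on your GPU.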

## 1. Setup

In [None]:
# Check GPU
import torch
print(f"PyTorch: {torch.__version__}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")

In [None]:
# Clone/update repository
import os
if not os.path.exists('/workspace/enhanced-rl-portfolio'):
    !git clone https://github.com/nimeshk03/enhanced-rl-portfolio.git /workspace/enhanced-rl-portfolio
else:
    !cd /workspace/enhanced-rl-portfolio && git pull

os.chdir('/workspace/enhanced-rl-portfolio')
print(f"Working directory: {os.getcwd()}")
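The `!` lines are IPython shell escapes, so the cell above only works inside Jupyter. The sketch below reproduces the same clone-or-pull decision in plain Python, for use from a script. `git_sync_command` is a hypothetical helper that only builds the command. Pass its result to `subprocess.run(..., check=True)` to execute it.

```python
import os

REPO_URL = "https://github.com/nimeshk03/enhanced-rl-portfolio.git"
REPO_DIR = "/workspace/enhanced-rl-portfolio"


def git_sync_command(repo_url, repo_dir):
    """Return the git argv that clones repo_dir if absent, otherwise pulls it."""
    if not os.path.exists(repo_dir):
        return ["git", "clone", repo_url, repo_dir]
    # `git -C <dir>` runs git as if started in <dir>, replacing `cd <dir> && git pull`.
    return ["git", "-C", repo_dir, "pull"]
```

Returning the argv list instead of running it keeps the decision testable without network access.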

In [None]:
# Install dependencies
# Quote the extras spec so the shell does not treat [extra] as a glob pattern
!pip install -q "stable-baselines3[extra]" gymnasium pandas numpy pyyaml tensorboard

In [None]:
# Check data files
import os
os.makedirs('data', exist_ok=True)

price_exists = os.path.exists('data/processed_data.csv')
sentiment_exists = os.path.exists('data/historical_sentiment_complete.csv')

if price_exists and sentiment_exists:
    print("Data files found!")
else:
    print("Missing data files - upload before continuing:")
    if not price_exists: print("  - data/processed_data.csv")
    if not sentiment_exists: print("  - data/historical_sentiment_complete.csv")
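The cell above only checks that the files exist. The stricter sketch below also confirms that the sentiment CSV header contains the feature columns named in the configuration list. It assumes the CSV stores those features under exactly these names, which this notebook does not confirm. `missing_columns` is a hypothetical helper that reads only the header row.

```python
import csv
import os

# Assumed column names: the three features named in the config list.
# all_sentiment uses three more that this notebook does not name.
REQUIRED_SENTIMENT_COLUMNS = ["sentiment_score", "news_count", "sentiment_proxy"]


def missing_columns(path, required):
    """Return the required column names absent from the CSV header at `path`."""
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])  # an empty file yields no header
    return [col for col in required if col not in header]


sentiment_path = "data/historical_sentiment_complete.csv"
if os.path.exists(sentiment_path):
    missing = missing_columns(sentiment_path, REQUIRED_SENTIMENT_COLUMNS)
    print("Missing sentiment columns:", missing if missing else "none")
```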

## 2. View Ablation Configurations

In [None]:
import yaml

with open('configs/ablation_configs.yaml', 'r') as f:
    ablation_config = yaml.safe_load(f)

print("Ablation Configurations:")
print("=" * 60)
for name, cfg in ablation_config['configurations'].items():
    print(f"\n{name}:")
    print(f"  Description: {cfg['description']}")
    print(f"  Features: {cfg['sentiment_features']}")
    print(f"  Expected Sharpe: {cfg['expected_sharpe']}")

print(f"\nSeeds: {ablation_config['defaults']['seeds']}")
n_configs = len(ablation_config['configurations'])
n_seeds = len(ablation_config['defaults']['seeds'])
print(f"Total experiments: {n_configs} x {n_seeds} = {n_configs * n_seeds}")
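The cell above expects a specific layout in `configs/ablation_configs.yaml`:

- a `configurations` mapping whose entries each have `description`, `sentiment_features`, and `expected_sharpe`;
- a `defaults.seeds` list.

The sketch below checks the dict returned by `yaml.safe_load` against that layout and lists every problem it finds, rather than stopping at the first `KeyError`. `validate_ablation_config` is a hypothetical helper, and the required keys are exactly the ones the printing loop reads.

```python
REQUIRED_CONFIG_KEYS = ("description", "sentiment_features", "expected_sharpe")


def validate_ablation_config(cfg):
    """Return a list of problems in a loaded ablation config (empty list = OK)."""
    problems = []
    configurations = cfg.get("configurations")
    if not isinstance(configurations, dict) or not configurations:
        problems.append("missing or empty 'configurations' mapping")
        configurations = {}
    for name, entry in configurations.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: expected a mapping")
            continue
        for key in REQUIRED_CONFIG_KEYS:
            if key not in entry:
                problems.append(f"{name}: missing '{key}'")
    # Guard against `defaults:` being present but empty (it loads as None).
    seeds = (cfg.get("defaults") or {}).get("seeds")
    if not seeds:
        problems.append("missing 'defaults.seeds'")
    return problems
```

In the notebook, call `validate_ablation_config(ablation_config)` right after loading the YAML. An empty list means the file matches what the printing loop expects.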

## 3. Run Ablation Experiments

Choose one of the options below:
- **Option A:** Run ALL experiments (12 runs, ~6 hours)
- **Option B:** Run specific config with all seeds
- **Option C:** Run single experiment
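All three options call the same module, `src.experiments.ablation`, and differ only in the `--config` and `--seed` arguments. The sketch below builds the argument list for any of them, for use with `subprocess.run` outside a notebook. `ablation_command` is a hypothetical helper, and it uses only the flags that appear in the option cells.

```python
import sys


def ablation_command(config, seed=None):
    """Build the argv for one ablation invocation (Option A, B, or C)."""
    # sys.executable runs the module with the same interpreter as this kernel.
    cmd = [sys.executable, "-m", "src.experiments.ablation", "--config", config]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


print(ablation_command("all"))                # Option A
print(ablation_command("baseline"))           # Option B
print(ablation_command("all_sentiment", 42))  # Option C
```

Run the command from the repository root, for example `subprocess.run(ablation_command("baseline"), check=True)`, so that `src` is importable.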

In [None]:
# Option A: Run ALL ablation experiments
# WARNING: This takes ~6 hours!

RUN_ALL = False  # Set to True to run all 12 experiments

if RUN_ALL:
    !python -m src.experiments.ablation --config all
else:
    print("Set RUN_ALL = True to run all 12 experiments")
    print("Or use the cells below to run specific configs")

In [None]:
# Option B: Run specific config with all 3 seeds
# Choose: 'baseline', 'score_only', 'core_3', 'all_sentiment'

CONFIG_TO_RUN = 'baseline'  # Change this

!python -m src.experiments.ablation --config {CONFIG_TO_RUN}

In [None]:
# Option C: Run single experiment with specific seed

CONFIG = 'all_sentiment'
SEED = 42

!python -m src.experiments.ablation --config {CONFIG} --seed {SEED}

## 4. View Results

In [None]:
import pandas as pd
import os

# Load the summary CSV if it exists
summary_path = 'experiments/ablation_results/ablation_summary.csv'

if os.path.exists(summary_path):
    df = pd.read_csv(summary_path)
    print("Ablation Results Summary:")
    print("=" * 60)
    print(df.to_string(index=False))
    
    print("\n" + "=" * 60)
    print("Aggregated by Config:")
    print("=" * 60)
    agg = df.groupby('config').agg({
        'sharpe_ratio': ['mean', 'std'],
        'total_return': ['mean', 'std'],
    }).round(4)
    print(agg)
else:
    print("No results yet. Run ablation experiments first.")
    
    # Check for individual results
    results_dir = 'experiments/ablation_results'
    if os.path.exists(results_dir):
        print("\nFound experiment directories:")
        for d in sorted(os.listdir(results_dir)):
            if os.path.isdir(os.path.join(results_dir, d)):
                print(f"  - {d}")
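The aggregation above reports the mean and standard deviation per config across seeds. pandas' `std` defaults to the sample standard deviation (`ddof=1`), so with three seeds the spread estimate is coarse, and a config with only one finished run shows `NaN`. The stdlib sketch below computes the same statistics and also returns the run count, so an incomplete config stands out. `summarize_by_config` is hypothetical and assumes each row has `config` and `sharpe_ratio` fields, as the pandas cell above does.

```python
import statistics
from collections import defaultdict


def summarize_by_config(rows, metric="sharpe_ratio"):
    """Group rows by 'config' and return {config: (n, mean, sample_std)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["config"]].append(float(row[metric]))
    summary = {}
    for config, values in groups.items():
        # statistics.stdev divides by n - 1, matching pandas' default ddof=1.
        # It needs at least two values, so a single run reports nan.
        std = statistics.stdev(values) if len(values) > 1 else float("nan")
        summary[config] = (len(values), statistics.mean(values), std)
    return summary
```

The rows can come straight from `csv.DictReader` over `ablation_summary.csv`. Values are cast with `float`, so string fields are fine.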

## 5. Download Results

In [None]:
# Create zip of ablation results
!zip -r /workspace/ablation_results.zip experiments/ablation_results/

print("\nDownload: /workspace/ablation_results.zip")
print("Use RunPod File Browser or SCP")
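The `zip` command-line tool may not be installed in a minimal container image. As a fallback, the sketch below uses Python's standard `shutil.make_archive` and keeps the same `experiments/ablation_results/...` paths inside the archive. `archive_results` is a hypothetical wrapper, and it assumes the working directory is the repository root.

```python
import shutil


def archive_results(results_dir="experiments/ablation_results",
                    archive_base="/workspace/ablation_results"):
    """Zip results_dir (relative to the current directory) and return the archive path."""
    # root_dir="." plus base_dir=results_dir stores paths relative to the repo root,
    # mirroring `zip -r <archive> experiments/ablation_results/`.
    # make_archive appends the ".zip" extension to archive_base itself.
    return shutil.make_archive(archive_base, "zip", root_dir=".", base_dir=results_dir)
```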

In [None]:
# List all saved files
!ls -la experiments/ablation_results/ 2>/dev/null || echo "No results yet"

print("\n" + "="*50)
print("REMEMBER: Stop your RunPod instance to avoid charges!")
print("="*50)