# Algo DRL SAC/IQL — Colab End-to-End (Cloud GPU Optimized)

This notebook runs the full DRL pipeline on Google Colab with a T4 or A100 GPU:
- **Step 1**: Data download (BTCUSDT hourly OHLCV)
- **Step 2**: IQL offline pretraining (200K steps, ~2-3 hours)
- **Step 3**: IQL-only evaluation (measure offline RL baseline)
- **Step 4**: SAC online fine-tuning (loads IQL weights, ~1-2 hours)
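- **Step 5**: Final evaluation of the SAC-finetuned model
- **Step 6**: IQL-only vs. SAC-finetuned comparison
- **Step 7**: Packaging results and models for download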

**Key Features**:
- Separated IQL/SAC for independent execution
- Cloud-optimized parameters (200K IQL steps, 100K SAC steps)
- Runtime overrides patch the YAMLs in the Colab working copy only; the upstream repo is not modified
- IQL-only baseline to measure SAC improvement
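
The override cells further down rewrite `config/algo_iql.yaml` and `config/algo_sac.yaml` on disk in the Colab working copy. If the original files need to survive a run, one option is to snapshot them before patching and write them back afterwards. The following is a minimal sketch of that idea; `preserved` is a hypothetical helper and not part of the repo, and the demo uses a throwaway file as a stand-in for the real config.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def preserved(*paths):
    """Write each file back to its original bytes when the block exits."""
    originals = {Path(p): Path(p).read_bytes() for p in paths}
    try:
        yield
    finally:
        for path, data in originals.items():
            path.write_bytes(data)


# Demo on a throwaway file (a stand-in for config/algo_iql.yaml).
demo_cfg = Path(tempfile.mkdtemp()) / "algo_iql.yaml"
demo_cfg.write_text("grad_steps_IQL: 1000\n")
with preserved(demo_cfg):
    demo_cfg.write_text("grad_steps_IQL: 200000\n")  # temporary override
restored = demo_cfg.read_text()  # back to the original content
```

In the notebook, the training call would sit inside the `with` block, so the patched values are visible to the training script and gone once it returns.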

In [None]:
# GPU / CUDA summary (works with or without GPU)
import shutil, importlib
print('nvidia-smi path:', shutil.which('nvidia-smi'))
torch = importlib.import_module('torch')
print('torch.cuda.is_available:', torch.cuda.is_available())
print('torch.version.cuda:', getattr(torch.version, 'cuda', None))


In [None]:
%%bash
set -e
# Remove legacy gym to avoid conflicts
pip -q uninstall -y gym || true
# Try cu121 first; fall back to default wheels
pip -q install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 || pip -q install torch torchvision torchaudio
# Core libraries
pip -q install d3rlpy==2.8.1 gymnasium pandas numpy plotly vectorbt requests pyyaml


## Clone your repository (always pulls latest)
Set REPO_URL to your GitHub repo. If private, paste a token when prompted.
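
Note that cloning with a token embedded in the URL stores that URL, token included, as the remote in `.git/config`. The sketch below shows one way to build the authenticated URL and to strip credentials again before printing or logging it. `with_token` and `redact` are hypothetical helpers introduced here for illustration; they are not part of the repo.

```python
from urllib.parse import urlsplit, urlunsplit


def with_token(repo_url: str, token: str) -> str:
    """Return repo_url with the token as userinfo; unchanged if token is empty."""
    if not token:
        return repo_url
    parts = urlsplit(repo_url)
    if parts.scheme != "https":
        raise ValueError("this sketch only handles https:// URLs")
    host = parts.netloc.rsplit("@", 1)[-1]  # drop credentials already present
    return urlunsplit((parts.scheme, f"{token}@{host}", parts.path,
                       parts.query, parts.fragment))


def redact(url: str) -> str:
    """Return url without any userinfo, safe to print in cell output."""
    parts = urlsplit(url)
    host = parts.netloc.rsplit("@", 1)[-1]
    return urlunsplit((parts.scheme, host, parts.path,
                       parts.query, parts.fragment))
```

After a token-based clone, `git remote set-url origin <public URL>` replaces the stored remote with the token-free form.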

In [None]:
import os, pathlib
from getpass import getpass
REPO_URL = os.environ.get('REPO_URL', 'https://github.com/enluha/algo-drl-sac-iql')
BRANCH   = os.environ.get('REPO_BRANCH', 'master')
TARGET   = pathlib.Path('/content/algo-drl-sac-iql')
if 'your-account' in REPO_URL:
    raise SystemExit('Please set REPO_URL to your GitHub repository URL (public) or paste a token for private repos.')
if TARGET.exists():
    %cd /content/algo-drl-sac-iql
    !git fetch origin $BRANCH && git checkout $BRANCH && git pull
else:
    token = ''
    try:
        token = getpass('GitHub token (press Enter if public): ')
    except Exception:
        token = ''
    if token:
        auth_url = REPO_URL.replace('https://', f'https://{token}@')
        !git clone -b $BRANCH $auth_url $TARGET
    else:
        !git clone -b $BRANCH $REPO_URL $TARGET
    %cd /content/algo-drl-sac-iql


## Cloud Training Configuration (Overrides repo YAMLs at runtime)

**Production Mode**: Leave `QA_STEPS` unset to use the cloud-optimized values below.  
**Quick Test**: Set `QA_STEPS='10000'` for a ~15-minute smoke test.
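
The precedence between `QA_STEPS` and the `CLOUD_*` variables can be summarized in a few lines of Python. This is a sketch under assumptions: `CloudSettings` and `load_settings` are hypothetical names, the repo defaults (`[256, 256]` hidden units, batch size 256) are taken from the comments in the configuration cell, and it is assumed that `QA_STEPS` caps both training stages. It mirrors the override cells in that no cloud override is applied when `QA_STEPS` is set.

```python
from dataclasses import dataclass
from typing import List, Mapping

# Repo defaults as stated in the configuration cell's comments (assumed accurate).
REPO_HIDDEN_UNITS = [256, 256]
REPO_BATCH_SIZE = 256


@dataclass(frozen=True)
class CloudSettings:
    iql_steps: int
    sac_steps: int
    hidden_units: List[int]
    batch_size: int


def load_settings(env: Mapping[str, str]) -> CloudSettings:
    qa_steps = env.get("QA_STEPS", "")
    if qa_steps:
        # Quick-test mode (assumption): one budget for both stages, repo-default network.
        n = int(qa_steps)
        return CloudSettings(n, n, list(REPO_HIDDEN_UNITS), REPO_BATCH_SIZE)
    hidden = env.get("CLOUD_HIDDEN_UNITS", "")
    return CloudSettings(
        iql_steps=int(env.get("CLOUD_IQL_STEPS", "200000")),
        sac_steps=int(env.get("CLOUD_SAC_STEPS", "100000")),
        hidden_units=[int(x) for x in hidden.split(",")] if hidden else list(REPO_HIDDEN_UNITS),
        batch_size=int(env.get("CLOUD_BATCH_SIZE", str(REPO_BATCH_SIZE))),
    )
```

Calling `load_settings(os.environ)` after the configuration cell has run returns the values the later cells are expected to use.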

In [None]:
import os

# ============================================================================
# CLOUD-OPTIMIZED TRAINING PARAMETERS (overrides repo YAMLs at runtime)
# ============================================================================

# Environment setup
os.environ['QA_DEVICE'] = 'cuda'
os.environ['CONFIG'] = 'config/config.yaml'

# ------------------------------------------------------------------------------
# TRAINING STEPS - Comment out for PRODUCTION, or set lower for quick test
# ------------------------------------------------------------------------------
# os.environ['QA_STEPS'] = '10000'  # Quick test (~15 min total)
# os.environ['QA_STEPS'] = '50000'  # Medium test (~1 hour total)
# If QA_STEPS is NOT set, uses cloud-optimized values below:

# ------------------------------------------------------------------------------
# IQL OFFLINE PRETRAINING (if QA_STEPS not set)
# ------------------------------------------------------------------------------
os.environ['CLOUD_IQL_STEPS'] = '200000'  # 200K steps (~2-3 hours on T4/A100)
# Recommended: 200K-300K for production, 100K minimum for decent policy

# ------------------------------------------------------------------------------
# SAC ONLINE FINE-TUNING (if QA_STEPS not set)
# ------------------------------------------------------------------------------
os.environ['CLOUD_SAC_STEPS'] = '100000'  # 100K steps (~1-2 hours)
# SAC needs fewer steps since it starts from trained IQL policy

# ------------------------------------------------------------------------------
# NETWORK ARCHITECTURE (cloud GPU can handle larger networks)
# ------------------------------------------------------------------------------
os.environ['CLOUD_HIDDEN_UNITS'] = '512,512,256'  # Larger capacity for cloud
# Default repo: [256, 256]. Cloud upgrade: [512, 512, 256] for better learning

# ------------------------------------------------------------------------------
# BUFFER & BATCH SIZE (utilize GPU memory efficiently)
# ------------------------------------------------------------------------------
os.environ['CLOUD_BATCH_SIZE'] = '512'  # Larger batches for faster training
# Default: 256. Cloud: 512 for better gradient estimates & GPU utilization

# ------------------------------------------------------------------------------
# SUMMARY
# ------------------------------------------------------------------------------
print('=' * 80)
print('CLOUD TRAINING CONFIGURATION')
print('=' * 80)
print(f"Device:              {os.environ['QA_DEVICE']}")
print(f"Config:              {os.environ['CONFIG']}")
print(f"QA_STEPS Override:   {os.environ.get('QA_STEPS', 'NOT SET (using cloud values below)')}")
print(f"IQL Training Steps:  {os.environ['CLOUD_IQL_STEPS']}")
print(f"SAC Training Steps:  {os.environ['CLOUD_SAC_STEPS']}")
print(f"Network Architecture: {os.environ['CLOUD_HIDDEN_UNITS']}")
print(f"Batch Size:          {os.environ['CLOUD_BATCH_SIZE']}")
print('=' * 80)
print('\nEstimated Training Time (T4 GPU):')
print('  - IQL: 2-3 hours (200K steps)')
print('  - SAC: 1-2 hours (100K steps)')
print('  - Total: ~3-5 hours for full pipeline')
print('=' * 80)

## 1) Download OHLCV (with volume fields)

In [None]:
!python scripts/download_ohlcv_binance.py --symbol BTCUSDT --interval 3600 --start '2024-06-10' --end '2025-10-16' --output-dir data


## 2) IQL Offline Pretraining (200K steps, ~2-3 hours)

Trains the IQL policy on offline expert demonstrations, with an auxiliary task (price direction prediction).

**What happens**:
- Loads dataset with momentum expert labels
- Trains multi-task encoder (policy + 24h price prediction)
- Generates IQL-only evaluation files for baseline comparison
- Saves model to `evaluation/artifacts/iql_policy.d3`
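
For orientation, the core of IQL's value update is an expectile regression of V(s) toward Q(s, a). The scalar sketch below illustrates that loss only; it is not the repo's `IQLWithAuxiliary` implementation, which is not shown in this notebook, and the expectile value of 0.7 is an illustrative assumption.

```python
def expectile_loss(q_value: float, v_value: float, expectile: float = 0.7) -> float:
    """Asymmetric squared error |expectile - 1(u < 0)| * u**2 with u = Q - V."""
    diff = q_value - v_value
    # Under-estimates of Q (diff > 0) are weighted by `expectile`,
    # over-estimates (diff < 0) by `1 - expectile`.
    weight = expectile if diff > 0 else 1.0 - expectile
    return weight * diff ** 2
```

With an expectile above 0.5, V(s) is pulled toward the upper range of Q-values seen in the dataset, which is how IQL approximates a maximum over actions without querying out-of-dataset actions.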

**Output Files**:
- `iql_policy.d3` - Trained IQL model
- `iql_actor_state.pt` - Actor weights for SAC warm-start
- `equity_BTCUSDT_IQLonly.html` - Equity curve (offline only)
- `candlestick_BTCUSDT_SepOct2025_IQLonly.html` - Trading signals
- `summary_report_BTCUSDT_IQLonly.txt` - Performance metrics
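
A quick way to confirm the run produced everything listed above is to check each path explicitly. This is a minimal sketch: `missing_outputs` is a hypothetical helper, and the directory of `iql_actor_state.pt` (`evaluation/artifacts/`) is an assumption inferred from the `ls` call in the training cell.

```python
from pathlib import Path

EXPECTED_IQL_OUTPUTS = [
    "evaluation/artifacts/iql_policy.d3",
    "evaluation/artifacts/iql_actor_state.pt",  # directory assumed
    "evaluation/charts/equity_BTCUSDT_IQLonly.html",
    "evaluation/charts/candlestick_BTCUSDT_SepOct2025_IQLonly.html",
    "evaluation/reports/summary_report_BTCUSDT_IQLonly.txt",
]


def missing_outputs(root, expected=EXPECTED_IQL_OUTPUTS):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Running `missing_outputs('/content/algo-drl-sac-iql')` after the training cell should return an empty list; any path it returns points to a step that did not finish.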

In [None]:
%%bash
set -e

# Apply cloud parameter overrides by rewriting the config files in this Colab working copy.
# The upstream repo is untouched; run `git checkout -- config/` to restore the originals.

# Override training steps (if QA_STEPS not set)
if [ -z "$QA_STEPS" ]; then
    echo "Using cloud-optimized IQL steps: $CLOUD_IQL_STEPS"
    # Patch the config on disk for this run
    python -c "
import yaml
from pathlib import Path

cfg_path = Path('config/algo_iql.yaml')
cfg = yaml.safe_load(cfg_path.read_text())
cfg['grad_steps_IQL'] = int('$CLOUD_IQL_STEPS')
cfg['batch_size'] = int('$CLOUD_BATCH_SIZE')

# Update network architecture if cloud override is set
if '$CLOUD_HIDDEN_UNITS':
    hidden = [int(x) for x in '$CLOUD_HIDDEN_UNITS'.split(',')]
    cfg['actor_encoder_factory']['params']['hidden_units'] = hidden
    cfg['critic_encoder_factory']['params']['hidden_units'] = hidden
    cfg['value_encoder_factory']['params']['hidden_units'] = hidden

cfg_path.write_text(yaml.dump(cfg, default_flow_style=False))
print(f'✓ Updated algo_iql.yaml: {cfg[\"grad_steps_IQL\"]} steps, batch={cfg[\"batch_size\"]}')
"
fi

# Run IQL pretraining
echo ""
echo "=========================================="
echo "Starting IQL Offline Pretraining..."
echo "=========================================="
python src/drl/offline/iql_pretrain.py

echo ""
echo "=========================================="
echo "IQL Training Complete!"
echo "=========================================="
echo "Saved artifacts:"
ls -lh evaluation/artifacts/iql_*.d3 evaluation/artifacts/iql_*.pt 2>/dev/null || echo "  (check evaluation/artifacts/)"
echo ""
echo "IQL-only evaluation files:"
ls -lh evaluation/charts/*IQLonly.html evaluation/reports/*IQLonly.txt 2>/dev/null || echo "  (check evaluation/ directories)"

## 3) View IQL-Only Results (Offline Baseline)

Before SAC fine-tuning, check the IQL-only performance to establish a baseline.

In [None]:
import os
from IPython.display import IFrame, display, Markdown

# Display IQL-only summary report
print("=" * 80)
print("IQL-ONLY PERFORMANCE (Offline Baseline)")
print("=" * 80)
report_path = 'evaluation/reports/summary_report_BTCUSDT_IQLonly.txt'
if os.path.exists(report_path):
    with open(report_path, 'r', encoding='utf-8') as f:
        print(f.read())
else:
    print("⚠️  IQL-only report not found. Training may have failed.")

print("\n" + "=" * 80)
print("IQL-ONLY CHARTS")
print("=" * 80)

# Display equity curve
equity_path = 'evaluation/charts/equity_BTCUSDT_IQLonly.html'
if os.path.exists(equity_path):
    display(Markdown("### IQL-Only Equity Curve"))
    display(IFrame(equity_path, width=1000, height=500))
else:
    print("⚠️  IQL equity chart not found")

# Display candlestick chart
candle_path = 'evaluation/charts/candlestick_BTCUSDT_SepOct2025_IQLonly.html'
if os.path.exists(candle_path):
    display(Markdown("### IQL-Only Trading Signals"))
    display(IFrame(candle_path, width=1000, height=500))
else:
    print("⚠️  IQL candlestick chart not found")

## 4) SAC Online Fine-Tuning (100K steps, ~1-2 hours)

Loads the IQL-pretrained weights and fine-tunes them with SAC through online interaction with the environment.

**What happens**:
- Loads IQL actor weights as warm-start
- Runs online SAC training with replay buffer
- Updates the policy with soft actor-critic (maximum-entropy objective, twin Q-critics)
- Saves final model to `evaluation/artifacts/sac_policy.d3`
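
As a reference for what the SAC update computes, the critics regress toward an entropy-regularized TD target, and the target critics track the online critics through Polyak averaging. The scalar sketch below illustrates both formulas; the function names and the default values of `gamma`, `alpha`, and `tau` are illustrative assumptions, not values read from `config/algo_sac.yaml`.

```python
def sac_td_target(reward, done, q1_next, q2_next, log_prob_next,
                  gamma=0.99, alpha=0.2):
    """r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    # Taking the minimum of the two target critics limits overestimation;
    # the -alpha * log_prob term is the entropy bonus.
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - float(done)) * soft_value


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]
```

For example, `sac_td_target(1.0, False, 2.0, 3.0, -1.0, gamma=0.5, alpha=0.5)` evaluates to `1.0 + 0.5 * (2.0 + 0.5) = 2.25`.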

**Expected Improvement**:
- IQL provides good initialization from offline data
- SAC refines policy through online exploration
- Expected to improve the Sharpe ratio and reduce drawdown relative to the IQL-only baseline

In [None]:
%%bash
set -e

# Apply cloud parameter overrides for SAC
if [ -z "$QA_STEPS" ]; then
    echo "Using cloud-optimized SAC steps: $CLOUD_SAC_STEPS"
    python -c "
import yaml
from pathlib import Path

cfg_path = Path('config/algo_sac.yaml')
cfg = yaml.safe_load(cfg_path.read_text())
cfg['grad_steps_SAC'] = int('$CLOUD_SAC_STEPS')
cfg['batch_size'] = int('$CLOUD_BATCH_SIZE')

# Update network architecture if cloud override is set
if '$CLOUD_HIDDEN_UNITS':
    hidden = [int(x) for x in '$CLOUD_HIDDEN_UNITS'.split(',')]
    cfg['actor_encoder_factory']['params']['hidden_units'] = hidden
    cfg['critic_encoder_factory']['params']['hidden_units'] = hidden

cfg_path.write_text(yaml.dump(cfg, default_flow_style=False))
print(f'✓ Updated algo_sac.yaml: {cfg[\"grad_steps_SAC\"]} steps, batch={cfg[\"batch_size\"]}')
"
fi

# Run SAC fine-tuning
echo ""
echo "=========================================="
echo "Starting SAC Online Fine-Tuning..."
echo "=========================================="
python src/drl/online/sac_train.py

echo ""
echo "=========================================="
echo "SAC Training Complete!"
echo "=========================================="
echo "Saved artifacts:"
ls -lh evaluation/artifacts/sac_*.d3 evaluation/artifacts/sac_*.pt 2>/dev/null || echo "  (check evaluation/artifacts/)"

## 5) Final Evaluation (SAC-Finetuned Model)

Evaluates the SAC-finetuned model on the test period with warm-up.

In [None]:
%%bash
set -e

echo "=========================================="
echo "Running Final Evaluation (SAC Model)..."
echo "=========================================="
python -m src.run_walkforward --config config/config.yaml --device cuda

echo ""
echo "=========================================="
echo "Evaluation Complete!"
echo "=========================================="
echo "Final results:"
ls -lh evaluation/charts/equity_BTCUSDT.html evaluation/reports/summary_report_BTCUSDT.txt 2>/dev/null || echo "  (check evaluation/ directories)"

## 6) Compare Results: IQL-Only vs SAC-Finetuned

View final charts and compare performance metrics.
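
When comparing the two reports, Sharpe ratio and maximum drawdown are the headline metrics. The sketch below shows how both can be computed from an equity curve. It is an illustration only: how the repo's reports compute them is not shown in this notebook, and the annualization factor assumes hourly bars in a market that trades around the clock (24 * 365 periods per year).

```python
import math
from statistics import mean, stdev


def sharpe_ratio(equity, periods_per_year=24 * 365):
    """Annualized mean/std of simple per-period returns (zero risk-free rate)."""
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    if len(returns) < 2:
        return 0.0
    sd = stdev(returns)
    if sd == 0:
        return 0.0
    return mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline as a negative fraction (0.0 if none)."""
    if not equity:
        return 0.0
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst
```

For the curve `[100, 120, 90, 130]`, `max_drawdown` returns `-0.25`: the fall from the peak of 120 to the trough of 90.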

In [None]:
import os
from IPython.display import IFrame, display, Markdown

print("=" * 80)
print("PERFORMANCE COMPARISON: IQL-Only vs SAC-Finetuned")
print("=" * 80)

# Read both reports
iql_report_path = 'evaluation/reports/summary_report_BTCUSDT_IQLonly.txt'
sac_report_path = 'evaluation/reports/summary_report_BTCUSDT.txt'

if os.path.exists(iql_report_path):
    print("\n📊 IQL-ONLY (Offline Baseline)")
    print("-" * 80)
    with open(iql_report_path, 'r', encoding='utf-8') as f:
        print(f.read())
else:
    print("\n⚠️  IQL-only report not found")

if os.path.exists(sac_report_path):
    print("\n📊 SAC-FINETUNED (Final Model)")
    print("-" * 80)
    with open(sac_report_path, 'r', encoding='utf-8') as f:
        print(f.read())
else:
    print("\n⚠️  SAC report not found")

print("\n" + "=" * 80)
print("VISUAL COMPARISON")
print("=" * 80)

# Display SAC equity curve
display(Markdown("### SAC-Finetuned Equity Curve"))
if os.path.exists('evaluation/charts/equity_BTCUSDT.html'):
    display(IFrame('evaluation/charts/equity_BTCUSDT.html', width=1000, height=500))
else:
    print("⚠️  SAC equity chart not found")

# Display SAC candlestick
display(Markdown("### SAC-Finetuned Trading Signals"))
if os.path.exists('evaluation/charts/candlestick_BTCUSDT_SepOct2025.html'):
    display(IFrame('evaluation/charts/candlestick_BTCUSDT_SepOct2025.html', width=1000, height=500))
else:
    print("⚠️  SAC candlestick chart not found")

print("\n" + "=" * 80)
print("✓ Training Complete! Compare the metrics above to see whether SAC improved on IQL.")
print("=" * 80)

## 7) Download Results & Models

Package all results for local analysis.
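
The same packaging step can be done in Python, which makes it easy to see exactly which files ended up in the archive. This is a sketch using the standard `glob` and `tarfile` modules with the same patterns as the bash cell; `package_results` is a hypothetical helper, not part of the repo.

```python
import glob
import tarfile
from pathlib import Path

PATTERNS = [
    "evaluation/artifacts/*.d3",
    "evaluation/artifacts/*.pt",
    "evaluation/charts/*.html",
    "evaluation/reports/*.txt",
    "evaluation/reports/*.csv",
    "evaluation/reports/*.json",
    "d3rlpy_logs/IQLWithAuxiliary_*/params.json",
    "d3rlpy_logs/SAC_*/params.json",
]


def package_results(root, archive_name="results_cloud_training.tar.gz",
                    patterns=PATTERNS):
    """Archive every file matching the patterns; return the archived relative paths."""
    root = Path(root)
    # Patterns without a match contribute nothing instead of raising an error.
    files = sorted({p for pat in patterns for p in glob.glob(str(root / pat))})
    names = [Path(f).relative_to(root).as_posix() for f in files]
    with tarfile.open(str(root / archive_name), "w:gz") as tar:
        for path, name in zip(files, names):
            tar.add(path, arcname=name)
    return names
```

Printing the return value of `package_results('/content/algo-drl-sac-iql')` gives a manifest of the archive, which the bash version hides by discarding stderr.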

In [None]:
%%bash
set -e

echo "Packaging results..."
cd /content/algo-drl-sac-iql

# Create results archive
tar -czf results_cloud_training.tar.gz \
    evaluation/artifacts/*.d3 \
    evaluation/artifacts/*.pt \
    evaluation/charts/*.html \
    evaluation/reports/*.txt \
    evaluation/reports/*.csv \
    evaluation/reports/*.json \
    d3rlpy_logs/IQLWithAuxiliary_*/params.json \
    d3rlpy_logs/SAC_*/params.json \
    2>/dev/null || true

echo ""
echo "=========================================="
echo "Results packaged: results_cloud_training.tar.gz"
echo "=========================================="
ls -lh results_cloud_training.tar.gz
echo ""
echo "Download this file from Colab's file browser (left sidebar)"
echo "or use Google Drive integration to save it."
echo "=========================================="