# IV Surface Generation Pipeline
## Arbitrage-Free IV Surfaces using Conditional VAE

This notebook orchestrates the complete pipeline for generating arbitrage-free volatility surfaces:

1. **Single Heston Calibration** - Fits Heston model parameters for each day
2. **Conditional VAE Training** - Trains the conditional VAE with market conditioning
3. **IV Surface Generation** - Generates surfaces for specific dates

---

- **Status:** Ready to run (all dependencies installed, data prepared)
- **Data Range:** 2019-01-01 to 2025-11-10
- **Latest GDELT Data:** 2025-11-10

---

### File Operations Summary

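**Read-Only Operations (Safe):**
- Section 5: View Results & Visualization - only reads existing results

**Generation Outputs (models and input data are not modified):**
- Section 4: IV Surface Generation - the generator script writes to `condtional_vae/results_date/<date>/`, and the cell then copies those files into `demo_results/`
- Section 8: Batch Generation (off by default, `BATCH_GENERATION = False`) - runs the same generator script for a range of dates

**Write Operations (Modifies Original Files):**
- **Section 2**: Heston Calibration writes to `calibration_single_heston/`
  - The cell reuses existing calibration results if both output files are present; delete them to force re-calibration
- **Section 3**: CVAE Training writes to `condtional_vae/best_model/`
  - Keep `RUN_CVAE_TRAINING = False` to use the existing pre-trained model

**Notebook Output Location:**
- Generated surfaces and execution logs are saved to the `demo_results/` folder
- Original data files and models remain unchanged (unless you run Sections 2 or 3)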

**Recommendation:** For a demo run, skip Sections 2 & 3 and only run Sections 4-5 to generate surfaces using existing models.

## Quick Start Guide for TAs/Reviewers

### Prerequisites Check

Before running this notebook, ensure:

1. **Python Environment**: Virtual environment with all dependencies installed
   ```bash
   source .venv/bin/activate  # Activate the virtual environment
   ```

2. **Required Files** (should already exist in project):
   - `nifty_filtered_surfaces.pickle` - Input IV surface data
   - `calibration_single_heston/NIFTY_heston_single_params_tensor.pt` - Pre-computed Heston parameters
   - `llm_options_assistant/best_model_2025/cvae_model.pt` - Pre-trained CVAE model
   - `api.json` - Gemini API key for the LLM assistant (optional; only needed for Section 7)

3. **Dependencies**: All packages from `requirements.txt` should be installed

### Recommended Demo Workflow (5-10 minutes)

**For a quick demo without re-training:**

1. **Run Section 1**: Setup and Environment 
2. **Skip Section 2**: Uses existing Heston calibration 
3. **Skip Section 3**: Uses pre-trained CVAE model 
4. **Run Section 4**: Generate IV surfaces for a date (takes ~1 minute) 
5. **Run Section 5**: View results and visualizations 
6. **Optional - Run Section 7**: Chat with LLM assistant (requires API key)

**Output Location**: All results saved to `demo_results/` folder

### Full Pipeline (if time permits - 30+ minutes)

Run all sections sequentially to see the complete training pipeline:
- Section 2: Heston Calibration (~10-30 minutes)
- Section 3: CVAE Training (~10-30 minutes)
- Sections 4-5: Surface Generation & Visualization

### Troubleshooting

**Issue**: Module not found errors
- **Solution**: Ensure virtual environment is activated

**Issue**: File not found errors
- **Solution**: Start Jupyter from the project root (or from `condtional_vae/`, which Section 1 detects automatically)

**Issue**: LLM Assistant not working
- **Solution**: API key in `api.json` is optional for demo. Skip Section 7 if not needed.

### What This Notebook Demonstrates

1. **Heston Model Calibration**: Fits stochastic volatility model to market data
2. **Conditional VAE**: Learns distribution of Heston parameters conditioned on market variables
3. **IV Surface Generation**: Generates arbitrage-free volatility surfaces for any date in the available data range
4. **AI Analysis**: Interactive LLM assistant for options analysis (optional)
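A conditional VAE generates new parameter sets by drawing a latent vector from its prior and decoding it together with the conditioning vector. The sketch below illustrates only that sampling step in NumPy. The layer sizes, the random stand-in decoder weights, the six conditioning inputs and the `(kappa, theta, sigma, rho, v0)` output order are all hypothetical, and squashing the outputs through softplus/tanh to keep them in valid Heston ranges is an assumed design choice, not necessarily what `train_cvae.py` does.

```python
import numpy as np

# Hypothetical sizes: 8-dim latent, 6 conditioning variables, 5 Heston parameters
LATENT_DIM, COND_DIM, N_PARAMS = 8, 6, 5

rng = np.random.default_rng(0)
# Stand-in decoder: one random linear layer (a trained CVAE learns a deeper network)
W = rng.normal(scale=0.1, size=(LATENT_DIM + COND_DIM, N_PARAMS))
b = np.zeros(N_PARAMS)

def softplus(x):
    """Numerically stable log(1 + exp(x)); maps any real number to a positive one."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def sample_heston_params(condition, n_samples, rng):
    """Draw z ~ N(0, I), decode [z, c], and map outputs into valid Heston ranges."""
    z = rng.standard_normal((n_samples, LATENT_DIM))
    c = np.broadcast_to(condition, (n_samples, COND_DIM))
    raw = np.concatenate([z, c], axis=1) @ W + b
    kappa, theta, sigma, v0 = (softplus(raw[:, i]) for i in range(4))
    rho = np.tanh(raw[:, 4])  # correlation constrained to (-1, 1)
    return np.stack([kappa, theta, sigma, rho, v0], axis=1)

# e.g. standardized spot, VIX, USD/INR, crude, US 10Y and GDELT unrest for one date
condition = np.zeros(COND_DIM)
params = sample_heston_params(condition, n_samples=100, rng=rng)
print(params.shape)  # (100, 5)
```

Because the condition is fixed while `z` varies, the 100 rows form a distribution of plausible parameter sets for a single date, which is what the surface generator turns into a distribution of IV surfaces.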

---

In [None]:
# Dependency and Environment Check
import sys
import os
from pathlib import Path

print("=" * 80)
print("ENVIRONMENT & DEPENDENCY CHECK")
print("=" * 80)

# Check Python version
print(f"\n Python Version: {sys.version.split()[0]}")

# Check if in virtual environment
in_venv = hasattr(sys, 'real_prefix') or (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix)
if in_venv:
    print(f" Virtual Environment: Active ({sys.prefix})")
else:
    print(f" Virtual Environment: Not detected")
    print(f"  Run: source .venv/bin/activate")

# Check critical dependencies (import name -> pip package name; they differ for some)
import importlib

critical_packages = {
    'torch': 'torch',
    'numpy': 'numpy',
    'pandas': 'pandas',
    'matplotlib': 'matplotlib',
    'scipy': 'scipy',
    'sklearn': 'scikit-learn',
    'tqdm': 'tqdm',
    'PIL': 'Pillow',  # used by Section 5 to display plots
    'google.generativeai': 'google-generativeai',
}

missing_packages = []
print("\n Checking Dependencies:")
for module_name, pip_name in critical_packages.items():
    try:
        importlib.import_module(module_name)
        print(f"   {module_name}")
    except ImportError:
        print(f"   {module_name} - NOT FOUND")
        missing_packages.append(pip_name)

if missing_packages:
    print(f"\n Missing packages: {', '.join(missing_packages)}")
    print(f"  Install with: pip install {' '.join(missing_packages)}")
else:
    print("\n All critical dependencies installed!")

# Check for required data files
print("\n Checking Required Data Files:")
# Same root detection as Section 1 (this cell runs before Section 1 changes directory)
project_root = Path.cwd().parent if Path.cwd().name == "condtional_vae" else Path.cwd()

required_files = {
    'Input Data': project_root / 'nifty_filtered_surfaces.pickle',
    'Heston Params': project_root / 'calibration_single_heston' / 'NIFTY_heston_single_params_tensor.pt',
    'Pre-trained Model': project_root / 'llm_options_assistant' / 'best_model_2025' / 'cvae_model.pt',
    'API Key (Optional)': project_root / 'api.json'
}

all_required_exist = True
for name, filepath in required_files.items():
    if filepath.exists():
        size_mb = filepath.stat().st_size / (1024 * 1024)
        print(f"   {name:20s} ({size_mb:.1f} MB)")
    else:
        if 'Optional' in name:
            print(f"   {name:20s} - NOT FOUND (optional; skip Section 7)")
        else:
            print(f"   {name:20s} - NOT FOUND")
            all_required_exist = False

print("\n" + "=" * 80)
if all_required_exist and not missing_packages:
    print(" READY TO RUN! All requirements satisfied.")
    print("   Proceed to Section 1 to start the pipeline.")
else:
    print("  SETUP INCOMPLETE - Please address the issues above.")
print("=" * 80)

## Section 1: Setup and Environment

In [None]:
import subprocess
import sys
import os
from pathlib import Path
import json
from datetime import datetime
import shutil

print("=" * 80)
print("INITIALIZING PIPELINE")
print("=" * 80)
print("[1/3] Setting up paths...", end=" ")

# Get the project root directory using relative path
PROJECT_ROOT = Path.cwd().parent if Path.cwd().name == "condtional_vae" else Path.cwd()
PROJECT_ROOT = PROJECT_ROOT.resolve()

# Create results folder if it doesn't exist - NOTEBOOK OUTPUTS ONLY
RESULTS_FOLDER = PROJECT_ROOT / "demo_results"
RESULTS_FOLDER.mkdir(parents=True, exist_ok=True)

print("")
print("[2/3] Changing to project root...", end=" ")
os.chdir(PROJECT_ROOT)
print("")

print("[3/3] Gathering environment info...", end=" ")
print("\n")

# Use current Python executable (works in any environment)
PYTHON_EXECUTABLE = sys.executable

print("=" * 80)
print("IV SURFACE GENERATION PIPELINE")
print("=" * 80)
print(f"Project Root: {PROJECT_ROOT}")
print(f"Results Folder: {RESULTS_FOLDER}")
print(f"Current Working Directory: {os.getcwd()}")
print(f"Python Executable: {PYTHON_EXECUTABLE}")
print(f"Python Version: {sys.version.split()[0]}")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 80)
print(" Setup complete - ready to run scripts\n")
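print("\n  IMPORTANT: With Sections 2 and 3 skipped, notebook outputs go to demo_results/")
print("   (Section 4 also stages files in condtional_vae/results_date/ before copying them).")
print("   Original data files and models remain unchanged.")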

In [None]:
print("[SETUP] Defining helper function...", end=" ")

def run_script(script_path, script_name, description, timeout=None):
    """
    Execute a Python script as subprocess and capture output.
    Also saves output to a log file in the results folder.
    
    Args:
        script_path: Full path to the Python script
        script_name: Display name for the script
        description: What the script does
        timeout: Timeout in seconds (None for no timeout)
    
    Returns:
        Tuple of (return_code, stdout, stderr)
    """
    print("\n" + "=" * 80)
    print(f"Running: {script_name}")
    print(f"Description: {description}")
    print(f"Script: {script_path}")
    print("=" * 80)
    
    if not os.path.exists(script_path):
        print(f" ERROR: Script not found at {script_path}")
        return -1, "", f"Script not found: {script_path}"
    
    # Create log file in results folder
    log_filename = f"{script_name.replace(' ', '_')}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
    log_file = RESULTS_FOLDER / log_filename
    
    try:
        # Use current Python executable (works in any environment)
        python_exec = PYTHON_EXECUTABLE
        
        # Set up environment with proper matplotlib backend for non-interactive use
        env = os.environ.copy()
        env['MPLBACKEND'] = 'Agg'  # Use Agg backend (non-interactive, works in subprocess)
        
        print("[1/4] Preparing environment...", end=" ")
        print("")
        
        print("[2/4] Starting subprocess...", end=" ")
        result = subprocess.run(
            [python_exec, script_path],
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=os.path.dirname(script_path),
            env=env  # Pass custom environment
        )
        print("")
        
        # Save output to log file
        print("[3/4] Saving execution log...", end=" ")
        with open(log_file, 'w') as f:
            f.write(f"Script: {script_name}\n")
            f.write(f"Description: {description}\n")
            f.write(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
            f.write("=" * 80 + "\n\n")
            if result.stdout:
                f.write("STDOUT:\n")
                f.write(result.stdout)
                f.write("\n\n")
            if result.stderr:
                f.write("STDERR:\n")
                f.write(result.stderr)
                f.write("\n\n")
            f.write(f"Return Code: {result.returncode}\n")
        print("")
        
        # Print output
        print("[4/4] Processing results...", end=" ")
        if result.stdout:
            print("")
            print(result.stdout)
        else:
            print("")
        
        if result.stderr:
            print("STDERR:", result.stderr)
        
        if result.returncode == 0:
            print(f"\n {script_name} completed successfully!")
            print(f" Log saved to: {log_file}")
        else:
            print(f"\n {script_name} exited with code {result.returncode}")
            print(f" Log saved to: {log_file}")
        
        return result.returncode, result.stdout, result.stderr
        
    except subprocess.TimeoutExpired:
        print(f" ERROR: {script_name} timed out after {timeout} seconds")
        return -1, "", f"Timeout after {timeout}s"
    except Exception as e:
        print(f" ERROR: {str(e)}")
        return -1, "", str(e)

print("\n")

## Section 2: Single Heston Calibration

Fits a single Heston model per day across all strikes and maturities.
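For reference, the textbook Heston dynamics under the risk-neutral measure are shown below, with spot $S_t$, variance $v_t$, risk-free rate $r$ and dividend yield $q$; the five per-day parameters are $v_0, \kappa, \theta, \sigma, \rho$. The notation is the standard one and may differ from the variable names used in `run_single_heston_calibration.py`.

$$
\begin{aligned}
dS_t &= (r - q)\,S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^{S},\\
dv_t &= \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW_t^{v},\\
d\langle W^{S}, W^{v}\rangle_t &= \rho\,dt.
\end{aligned}
$$

The Feller condition checked during validation is $2\kappa\theta > \sigma^2$, which guarantees that $v_t$ never reaches zero.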

**Input:** `nifty_filtered_surfaces.pickle` (IV surfaces for NIFTY options)
**Output:** 
- `calibration_single_heston/NIFTY_heston_single_params.pickle` (parameters)
- `calibration_single_heston/NIFTY_heston_single_params_tensor.pt` (PyTorch tensor)

**Process:**
1. Stage 1: Fast calibration without Wasserstein penalty
2. Stage 2: Refinement with Wasserstein penalty
3. Validation: Feller condition, parameter bounds, arbitrage checks
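The validation step can be pictured as simple per-day flags. This is a minimal sketch assuming the calibrated parameters are available as equal-length arrays named `kappa`, `theta`, `sigma`, `rho`, `v0` (hypothetical names). The script's actual parameter bounds are not shown in this notebook, so only the model's basic domain (positive parameters, $|\rho| < 1$) and the Feller condition are checked.

```python
import numpy as np

def heston_param_checks(kappa, theta, sigma, rho, v0):
    """Per-day validity flags for Heston parameters given as equal-length arrays."""
    kappa, theta, sigma, rho, v0 = (
        np.asarray(x, dtype=float) for x in (kappa, theta, sigma, rho, v0)
    )
    positive = (kappa > 0) & (theta > 0) & (sigma > 0) & (v0 > 0)
    valid_rho = (rho > -1) & (rho < 1)
    feller = 2.0 * kappa * theta > sigma ** 2  # variance process never reaches zero
    return {"positive": positive, "valid_rho": valid_rho, "feller": feller}

# Two illustrative days: the second has high vol-of-vol relative to 2*kappa*theta
checks = heston_param_checks(
    kappa=[2.0, 0.5], theta=[0.04, 0.04], sigma=[0.3, 0.5], rho=[-0.7, -0.7], v0=[0.03, 0.03]
)
print({name: flags.tolist() for name, flags in checks.items()})
# {'positive': [True, True], 'valid_rho': [True, True], 'feller': [True, False]}
```

On day 2, $2\kappa\theta = 0.04$ is smaller than $\sigma^2 = 0.25$, so the Feller flag fails even though every parameter is individually in range.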

**WARNING:** Running this cell writes calibration results to the `calibration_single_heston/` folder.
If both output files (pickle and tensor) already exist, the cell reuses them and skips calibration; delete them only if you want to re-calibrate.

In [None]:
# Run Single Heston Calibration with real-time progress
script_path = PROJECT_ROOT / "calibration_single_heston" / "run_single_heston_calibration.py"

# Check if calibration results already exist
output_pickle = PROJECT_ROOT / "calibration_single_heston" / "NIFTY_heston_single_params.pickle"
output_tensor = PROJECT_ROOT / "calibration_single_heston" / "NIFTY_heston_single_params_tensor.pt"
output_plot = PROJECT_ROOT / "calibration_single_heston" / "heston_single_calibration_errors.png"

if output_pickle.exists() and output_tensor.exists():
    print("\n" + "=" * 80)
    print("HESTON CALIBRATION - USING EXISTING RESULTS")
    print("=" * 80)
    print("\n Found existing calibration results:")
    print(f"  {'Pickle':20s} {str(output_pickle):60s} found")
    print(f"  {'Tensor':20s} {str(output_tensor):60s} found")
    print(f"  {'Plot':20s} {str(output_plot):60s} {'found' if output_plot.exists() else 'missing'}")
    print("\nSkipping calibration. To re-run calibration, delete these files and run this cell again.")
    print("=" * 80)
else:
    print("\n" + "=" * 80)
    print("Running: Single Heston Calibration")
    print("Description: Fits Heston model parameters for each day (two-stage: fast + Wasserstein refinement)")
    print(f"Script: {script_path}")
    print("=" * 80)

    if not script_path.exists():
        print(f" ERROR: Script not found at {script_path}")
    else:
        # Use current Python executable
        python_exec = PYTHON_EXECUTABLE
        
        # Set up environment with proper matplotlib backend
        env = os.environ.copy()
        env['MPLBACKEND'] = 'Agg'
        
        # Create log file
        log_filename = f"Single_Heston_Calibration_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
        log_file = RESULTS_FOLDER / log_filename
        
        try:
            print("\n[1/3] Starting calibration process...")
            print("[2/3] Running script with real-time output...\n")
            
            # Use Popen for real-time output streaming; the -u flag disables the
            # child's stdout buffering so lines appear as they are printed
            process = subprocess.Popen(
                [python_exec, "-u", str(script_path)],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                bufsize=1,  # Line buffered
                cwd=str(script_path.parent),
                env=env
            )
            
            # Capture output for logging while displaying in real-time
            output_lines = []
            
            # Stream output line by line
            for line in process.stdout:
                print(line, end='')  # Print immediately (real-time)
                output_lines.append(line)
            
            # Wait for process to complete
            return_code = process.wait()
            
            print("\n[3/3] Saving execution log...", end=" ")
            
            # Save to log file
            with open(log_file, 'w') as f:
                f.write(f"Script: Single Heston Calibration\n")
                f.write(f"Description: Fits Heston model parameters for each day\n")
                f.write(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
                f.write("=" * 80 + "\n\n")
                f.write("OUTPUT:\n")
                f.write(''.join(output_lines))
                f.write(f"\n\nReturn Code: {return_code}\n")
            
            print("")
            
            # Check if successful
            if return_code == 0:
                print("\n Heston calibration completed successfully!")
                print(f" Log saved to: {log_file}")
                
                # Check output files
                print("\nOutput files:")
                print(f"  {'Pickle':20s} {str(output_pickle):60s} {'found' if output_pickle.exists() else 'missing'}")
                print(f"  {'Tensor':20s} {str(output_tensor):60s} {'found' if output_tensor.exists() else 'missing'}")
                print(f"  {'Plot':20s} {str(output_plot):60s} {'found' if output_plot.exists() else 'missing'}")
            else:
                print(f"\n Heston calibration failed with return code {return_code}")
                print(f" Log saved to: {log_file}")
                
        except Exception as e:
            print(f" ERROR: {str(e)}")

## Section 3: Conditional VAE Training (Optional)

Trains the Conditional VAE with market conditioning variables.

**Input:** 
- `calibration_single_heston/NIFTY_heston_single_params_tensor.pt` (Heston parameters)
- Market data: India VIX, USD/INR, Crude Oil, US 10Y Yield
- GDELT unrest index

**Output:**
- `condtional_vae/best_model/` (trained model weights)
- Training curves, loss plots
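The conditioning variables live on very different scales, so they are usually normalized before entering the network. A common choice, assumed here, is to z-score each variable with statistics from the training period only, so that later dates are scaled consistently and without look-ahead. The column names and toy values below are hypothetical, and `train_cvae.py` may scale its inputs differently.

```python
import numpy as np
import pandas as pd

# Hypothetical column names for the Section 3 conditioning inputs
COND_COLS = ["india_vix", "usdinr", "crude_oil", "us10y_yield", "gdelt_unrest"]

def fit_standardizer(train_df):
    """Per-column mean and std, computed on the training period only."""
    mean = train_df[COND_COLS].mean()
    std = train_df[COND_COLS].std(ddof=0).replace(0.0, 1.0)  # avoid dividing by zero
    return mean, std

def make_condition(df, mean, std):
    """Z-score each conditioning variable with the training statistics."""
    return ((df[COND_COLS] - mean) / std).to_numpy(dtype=np.float32)

# Toy values for illustration only (not real market data)
train = pd.DataFrame({
    "india_vix": [12.0, 14.0, 16.0],
    "usdinr": [83.0, 83.5, 84.0],
    "crude_oil": [70.0, 75.0, 80.0],
    "us10y_yield": [4.0, 4.2, 4.4],
    "gdelt_unrest": [0.1, 0.2, 0.3],
})
mean, std = fit_standardizer(train)
cond = make_condition(train, mean, std)
print(cond.shape)  # (3, 5)
```

At generation time the same `mean` and `std` would be reused for the target date, rather than refitted.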

**Note:** This is optional - a pre-trained model is available at `llm_options_assistant/best_model_2025/`

**WARNING:** Running training will write model files to the `condtional_vae/best_model/` folder.
If you want to preserve existing trained models, keep `RUN_CVAE_TRAINING = False`.

In [None]:
# Set RUN_CVAE_TRAINING = True to run CVAE training (takes 10-30 minutes)
# This is optional - a pre-trained model exists

RUN_CVAE_TRAINING = False

if RUN_CVAE_TRAINING:
    # Check if Heston calibration parameters exist
    heston_tensor = PROJECT_ROOT / "calibration_single_heston" / "NIFTY_heston_single_params_tensor.pt"
    
    if not heston_tensor.exists():
        print("\n" + "=" * 80)
        print("ERROR: Heston Calibration Parameters Not Found")
        print("=" * 80)
        print(f"\nRequired file missing: {heston_tensor}")
        print("\nCVAE training requires Heston parameters as input.")
        print("\nOptions:")
        print("  1. Run Section 2 (Heston Calibration) first")
        print("  2. Or ensure the file exists from a previous calibration run")
        print("=" * 80)
    else:
        print("\n" + "=" * 80)
        print("CVAE Training - Prerequisites Check")
        print("=" * 80)
        print(f" Heston parameters found: {heston_tensor}")
        print(f"  Using existing calibration results for training")
        print("=" * 80)
        
        script_path = PROJECT_ROOT / "condtional_vae" / "train_cvae.py"
        
        return_code, stdout, stderr = run_script(
            script_path=str(script_path),
            script_name="Conditional VAE Training",
            description="Trains the conditional VAE with market conditioning variables",
            timeout=None
        )
        
        if return_code == 0:
            print("\n CVAE training completed successfully!")
            
            # Check output files
            model_dir = PROJECT_ROOT / "condtional_vae" / "best_model"
            model_file = model_dir / "cvae_model.pt"
            
            if model_file.exists():
                print(f"\n Trained model saved to: {model_file}")
                print("  Note: Section 4 uses llm_options_assistant/best_model_2025/cvae_model.pt")
                print("  if it exists and falls back to this model only otherwise.")
            else:
                print(f"\n Warning: Model file not found at expected location: {model_file}")
        else:
            print(f"\n CVAE training failed with return code {return_code}")
else:
    print("\n" + "=" * 80)
    print("CVAE Training - SKIPPED")
    print("=" * 80)
    
    # Check for pre-trained model
    pretrained_model = PROJECT_ROOT / "llm_options_assistant" / "best_model_2025" / "cvae_model.pt"
    trained_model = PROJECT_ROOT / "condtional_vae" / "best_model" / "cvae_model.pt"
    
    if pretrained_model.exists():
        print(f"\n Using pre-trained model: llm_options_assistant/best_model_2025/cvae_model.pt")
    elif trained_model.exists():
        print(f"\n Using trained model: condtional_vae/best_model/cvae_model.pt")
    else:
        print(f"\n No trained model found in either location:")
        print(f"  - {pretrained_model}")
        print(f"  - {trained_model}")
        print(f"\n  You will need to run training to generate IV surfaces.")
    
    # Check if Heston parameters exist for potential training
    heston_tensor = PROJECT_ROOT / "calibration_single_heston" / "NIFTY_heston_single_params_tensor.pt"
    if heston_tensor.exists():
        print(f"\n Heston calibration parameters available: {heston_tensor.name}")
        print(f"  Ready for training if needed.")
    else:
        print(f"\n Heston calibration parameters not found: {heston_tensor.name}")
        print(f"  Run Section 2 (Heston Calibration) before training.")
    
    print("\nTo run training:")
    print("  1. Ensure Heston calibration is complete (Section 2)")
    print("  2. Set RUN_CVAE_TRAINING = True")
    print("  3. Re-run this cell")
    print("\nNote: Training takes 10-30 minutes and requires Heston parameters as input")

## Section 4: Generate IV Surfaces

Generates IV surfaces for specific dates using the trained CVAE model.

**Inputs:**
- Date (format: YYYY-MM-DD)
- Number of samples to generate (default: 100)
- Pre-trained CVAE model
- Market data: NIFTY spot, India VIX, USD/INR, Crude Oil, US 10Y Yield, GDELT unrest

**Outputs:**
- IV surface matrices (CSV)
- Surface visualization plots (PNG)
- PyTorch tensor data
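One necessary condition for an arbitrage-free surface is the absence of calendar-spread arbitrage: at a fixed (forward) log-moneyness, total implied variance $w = \sigma_{\text{IV}}^2 T$ must not decrease as maturity grows. The minimal check below assumes rows are maturities in increasing order (in years), columns are fixed moneyness levels and vols are decimals; the CSV layout is not documented here, and with spot rather than forward moneyness the check is only approximate. It does not test butterfly (strike-direction) arbitrage.

```python
import numpy as np

def calendar_arbitrage_violations(iv, maturities, tol=1e-12):
    """
    iv:         array (n_maturities, n_moneyness) of implied vols as decimals
    maturities: strictly increasing times to expiry in years
    Returns a bool array (n_maturities - 1, n_moneyness), True where total
    variance w = iv**2 * T decreases from one maturity to the next.
    """
    iv = np.asarray(iv, dtype=float)
    T = np.asarray(maturities, dtype=float)
    if np.any(np.diff(T) <= 0):
        raise ValueError("maturities must be strictly increasing")
    total_var = iv ** 2 * T[:, None]  # broadcast T down each column
    return np.diff(total_var, axis=0) < -tol

maturities = [0.10, 0.25, 0.50]
iv = [[0.20, 0.18],
      [0.19, 0.17],
      [0.10, 0.16]]  # 10% vol at 0.5y shrinks total variance: a deliberate violation
viol = calendar_arbitrage_violations(iv, maturities)
print(viol.tolist())  # [[False, False], [True, False]]
```

Any `True` entry marks a pair of adjacent expiries where a calendar spread at that moneyness would have negative value.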

**Usage:**
Set `TARGET_DATE` and `N_SAMPLES` in the configuration cell below, then run the generation cell that follows.

In [None]:
# Configuration for IV Surface Generation
TARGET_DATE = "2025-11-10"  # Change this to any date from 2015-01-01 to 2025-11-10
N_SAMPLES = 100  # Number of surface samples to generate

print("=" * 80)
print("IV SURFACE GENERATION CONFIGURATION")
print("=" * 80)
print(f"Target Date: {TARGET_DATE}")
print(f"Number of Samples: {N_SAMPLES}")

# Check for pre-trained model
pretrained_model = PROJECT_ROOT / "llm_options_assistant" / "best_model_2025" / "cvae_model.pt"
trained_model = PROJECT_ROOT / "condtional_vae" / "best_model" / "cvae_model.pt"

if pretrained_model.exists():
    print(f"Model: llm_options_assistant/best_model_2025/cvae_model.pt ")
    MODEL_PATH = pretrained_model
elif trained_model.exists():
    print(f"Model: condtional_vae/best_model/cvae_model.pt ")
    MODEL_PATH = trained_model
else:
    print(f"Model:  No trained model found!")
    print(f"  Expected locations:")
    print(f"    - {pretrained_model}")
    print(f"    - {trained_model}")
    print(f"\n  Please run Section 3 (CVAE Training) first or ensure pre-trained model exists.")
    MODEL_PATH = None

print(f"Available Date Range: 2015-01-01 to 2025-11-10")
print("=" * 80)
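A typo in `TARGET_DATE` otherwise only surfaces when the generator script fails. A small stdlib-only guard such as the hypothetical `validate_target_date` helper below (not part of the project) can catch it earlier. The bounds are copied from the comment on `TARGET_DATE`; note that the notebook header lists a data range starting at 2019-01-01, so confirm which start date matches your data.

```python
from datetime import date, datetime

# Bounds taken from the TARGET_DATE comment; adjust if your data starts later
MIN_DATE = date(2015, 1, 1)
MAX_DATE = date(2025, 11, 10)

def validate_target_date(text, min_date=MIN_DATE, max_date=MAX_DATE):
    """Parse a YYYY-MM-DD string and check that it lies in the available range."""
    try:
        d = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError as exc:
        raise ValueError(f"TARGET_DATE must be YYYY-MM-DD, got {text!r}") from exc
    if not (min_date <= d <= max_date):
        raise ValueError(f"{d} is outside {min_date} to {max_date}")
    if d.weekday() >= 5:
        # Regular NSE sessions run Monday to Friday
        print(f"Warning: {d} falls on a weekend")
    return d

print(validate_target_date("2025-11-10"))  # 2025-11-10
```

In the notebook, calling `validate_target_date(TARGET_DATE)` at the end of the configuration cell would stop a bad date before the subprocess starts.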

In [None]:
# Run IV Surface Generator
script_path = PROJECT_ROOT / "condtional_vae" / "generate_iv_surface_by_date.py"

# Check that the configuration cell above found a model
# (MODEL_PATH is only an existence check here; it is not passed to the script)
if globals().get('MODEL_PATH') is None:
    print("\n" + "=" * 80)
    print("ERROR: No trained model available")
    print("=" * 80)
    print("\nPlease ensure one of the following:")
    print("  1. Pre-trained model exists at: llm_options_assistant/best_model_2025/cvae_model.pt")
    print("  2. Run Section 3 (CVAE Training) to train a new model")
    print("\nCannot generate IV surfaces without a trained model.")
    print("=" * 80)
else:
    # Use current Python executable
    python_exec = PYTHON_EXECUTABLE

    cmd = [
        python_exec,
        str(script_path),
        "--date", TARGET_DATE,
        "--n_samples", str(N_SAMPLES),
        "--output_dir", "results_date"
    ]

    try:
        print("\n" + "=" * 80)
        print(f"Running: IV Surface Generation for {TARGET_DATE}")
        print(f"Number of samples: {N_SAMPLES}")
        print(f"Output will be saved to: {RESULTS_FOLDER}/{TARGET_DATE}_TIMESTAMP/")
        print("=" * 80)
        
        # Set up environment with proper matplotlib backend
        env = os.environ.copy()
        env['MPLBACKEND'] = 'Agg'  # Use Agg backend (non-interactive, works in subprocess)
        
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=300,
            cwd=str(PROJECT_ROOT / "condtional_vae"),
            env=env  # Pass custom environment
        )
        
        print(result.stdout)
        if result.stderr:
            print("STDERR:", result.stderr)
        
        # Copy generated files from the script's staging location to demo_results
        # (the script writes to condtional_vae/results_date/<TARGET_DATE>/ first)
        source_dir = PROJECT_ROOT / "condtional_vae" / "results_date" / TARGET_DATE
        
        if source_dir.exists():
            # Create the timestamped folder only when there is something to copy,
            # so Section 5 never picks an empty folder as the most recent result
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            dated_results_dir = RESULTS_FOLDER / f"{TARGET_DATE}_{timestamp}"
            dated_results_dir.mkdir(parents=True, exist_ok=True)
            print(f"\n Copying results to demo_results: {dated_results_dir.name}")
            
            # Copy all files from source to demo_results folder
            for file in source_dir.glob("*"):
                if file.is_file():
                    dest_file = dated_results_dir / file.name
                    shutil.copy2(file, dest_file)
                    size = file.stat().st_size / 1024
                    print(f"   Copied {file.name} ({size:.1f} KB)")
            
            # Save execution log
            log_file = dated_results_dir / f"execution_log_{timestamp}.log"
            with open(log_file, 'w') as f:
                f.write(f"IV Surface Generation Report\n")
                f.write(f"Date: {TARGET_DATE}\n")
                f.write(f"Samples: {N_SAMPLES}\n")
                f.write(f"Executed: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
                f.write("=" * 80 + "\n\n")
                f.write("STDOUT:\n")
                f.write(result.stdout)
                f.write("\n\nSTDERR:\n")
                f.write(result.stderr)
                f.write(f"\n\nReturn Code: {result.returncode}\n")
            
            print(f"   Saved execution log")
            print(f"\n All results saved to: {dated_results_dir}")
            
            # Note about temporary files
            print(f"\n Note: Script temporarily writes to condtional_vae/results_date/{TARGET_DATE}/")
            print(f"   These files are copied to demo_results and can be safely deleted.")
            
            if result.returncode == 0:
                print(f"\n IV Surface generation completed successfully!")
            else:
                print(f"\n Generation completed with warnings (return code {result.returncode})")
        else:
            print(f" Source results directory not found: {source_dir}")
            
    except subprocess.TimeoutExpired:
        print(f" Generation timed out after 300 seconds")
    except Exception as e:
        print(f" Error: {str(e)}")

## Section 5: View Results & Visualization

Display the generated IV surfaces and analysis results.
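The cell below reads `mean_iv_surface.csv` and `median_iv_surface.csv`. If the individual sample surfaces are available as a NumPy array, the same summaries, plus a 5th to 95th percentile band showing the spread across CVAE samples, can be computed along the sample axis. This is a sketch under assumptions: the `(n_samples, n_maturities, n_moneyness)` layout and the axis labels are hypothetical, and the generator script may compute its summaries differently.

```python
import numpy as np
import pandas as pd

def summarize_samples(samples, maturities, moneyness):
    """
    samples: array (n_samples, n_maturities, n_moneyness) of generated IV surfaces.
    Returns mean, median and 5%/95% percentile surfaces as labelled DataFrames.
    """
    samples = np.asarray(samples, dtype=float)

    def to_df(grid):
        return pd.DataFrame(grid, index=maturities, columns=moneyness)

    return {
        "mean": to_df(samples.mean(axis=0)),
        "median": to_df(np.median(samples, axis=0)),
        "p05": to_df(np.percentile(samples, 5, axis=0)),
        "p95": to_df(np.percentile(samples, 95, axis=0)),
    }

# Synthetic samples around a flat 15% surface, for illustration only
rng = np.random.default_rng(1)
samples = 0.15 + 0.02 * rng.standard_normal((100, 3, 4))
summary = summarize_samples(samples, maturities=["1M", "3M", "6M"], moneyness=[0.9, 0.95, 1.0, 1.05])
print(summary["mean"].round(3))
```

A wide `p95 - p05` band at a grid point indicates that the model is uncertain there, which the mean and median surfaces alone do not reveal.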

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

# Find the most recent results folder for the target date
print(f"\n{'=' * 80}")
print(f"SEARCHING FOR RESULTS FOR {TARGET_DATE}")
print(f"{'=' * 80}\n")

# Look for folders matching TARGET_DATE in demo_results folder
matching_dirs = sorted([d for d in RESULTS_FOLDER.glob(f"{TARGET_DATE}_*") if d.is_dir()], reverse=True)

# Fallback: Check if results exist in the original condtional_vae/results_date folder
fallback_dir = PROJECT_ROOT / "condtional_vae" / "results_date" / TARGET_DATE

if matching_dirs:
    results_dir = matching_dirs[0]  # Get the most recent
    print(f" Found results in demo_results: {results_dir.name}\n")
    source = "demo_results"
elif fallback_dir.exists():
    results_dir = fallback_dir
    print(f" Found results in original location: condtional_vae/results_date/{TARGET_DATE}\n")
    print(f" Note: These results were not generated from this notebook run.")
    print(f"  To save results to demo_results, run Section 4 (IV Surface Generation).\n")
    source = "original"
else:
    results_dir = None
    print(f" No results found for {TARGET_DATE}")
    print(f"\nSearched in:")
    print(f"  - {RESULTS_FOLDER}")
    print(f"  - {fallback_dir}")
    print("\nPlease run the IV Surface Generation cell (Section 4) first.")

if results_dir and results_dir.exists():
    print(f"\n{'=' * 80}")
    print(f"RESULTS FOR {TARGET_DATE}")
    print(f"{'=' * 80}\n")
    
    # Load and display mean IV surface
    mean_iv_file = results_dir / "mean_iv_surface.csv"
    if mean_iv_file.exists():
        df_mean = pd.read_csv(mean_iv_file, index_col=0)
        print("Mean IV Surface (Implied Volatility %):")
        print(df_mean.round(2))
        print()
    
    # Load and display median IV surface
    median_iv_file = results_dir / "median_iv_surface.csv"
    if median_iv_file.exists():
        df_median = pd.read_csv(median_iv_file, index_col=0)
        print("\nMedian IV Surface (Implied Volatility %):")
        print(df_median.round(2))
        print()
    
    # Display plots if they exist
    plot_files = [
        "atm_term_structure.png",
        "mean_surface_heatmap.png",
        "iv_smiles.png"
    ]
    
    print(f"\n{'=' * 80}")
    print("GENERATED VISUALIZATIONS")
    print(f"{'=' * 80}\n")
    
    for plot_file in plot_files:
        plot_path = results_dir / plot_file
        if plot_path.exists():
            print(f"\n{plot_file}:")
            img = Image.open(plot_path)
            plt.figure(figsize=(12, 6))
            plt.imshow(img)
            plt.axis('off')
            plt.title(f"{TARGET_DATE} - {plot_file}")
            plt.tight_layout()
            plt.show()
    
    # List all files in results directory
    print(f"\n{'=' * 80}")
    print("ALL FILES IN RESULTS DIRECTORY")
    print(f"{'=' * 80}\n")
    
    for file in sorted(results_dir.glob("*")):
        size = file.stat().st_size / 1024  # KB
        print(f"  {file.name:40s} ({size:8.1f} KB)")
    
    if source == "original":
        print(f"\n{'=' * 80}")
        print("NOTE: Using results from original location")
        print(f"{'=' * 80}")
        print(f"These results are from: condtional_vae/results_date/{TARGET_DATE}")
        print(f"To generate fresh results in demo_results, run Section 4.")

## Section 7: Interactive LLM Options Assistant

Chat with an AI-powered options analyst that can:
- Generate IV surfaces for any date
- Analyze volatility patterns and market sentiment
- Identify trading opportunities
- Provide actionable recommendations

**Requirements:**
- Google Gemini API key (free, no credit card required)
- Get your key at: https://aistudio.google.com/app/apikey

**Usage:**
1. Save your key in `api.json` in the project root as `{"API_KEY": "your-key-here"}`; the cell below loads it into the `GEMINI_API_KEY` environment variable
2. Run the cell below to start chatting
3. Type 'quit' to exit the chat

In [None]:
# Load API key from api.json and run the LLM Options Assistant
import os
import json
import sys

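# Load API key: reuse GEMINI_API_KEY if already set, otherwise read api.json
api_file = PROJECT_ROOT / "api.json"

if os.environ.get('GEMINI_API_KEY'):
    print(" Using GEMINI_API_KEY already set in the environment\n")
elif api_file.exists():
    with open(api_file, 'r') as f:
        api_data = json.load(f)
    os.environ['GEMINI_API_KEY'] = api_data['API_KEY']
    print(" API key loaded from api.json\n")
else:
    print("=" * 80)
    print("  API KEY FILE NOT FOUND")
    print("=" * 80)
    print(f"\nExpected file: {api_file}")
    print("\nCreate api.json with:")
    print('{\n    "API_KEY": "your-gemini-api-key-here"\n}')
    print("\nGet your free API key at: https://aistudio.google.com/app/apikey")
    print("=" * 80)
    # Raise instead of sys.exit(): stops this cell without a SystemExit in Jupyter
    raise FileNotFoundError(f"API key file not found: {api_file}")

# Add the llm_options_assistant directory to path (only once per session)
llm_assistant_dir = PROJECT_ROOT / "llm_options_assistant"
if str(llm_assistant_dir) not in sys.path:
    sys.path.insert(0, str(llm_assistant_dir))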

# Import and run the assistant
print("=" * 80)
print(" Starting NIFTY 50 Options Analysis Assistant")
print("=" * 80)
print(f"Using script: {llm_assistant_dir / 'options_analyst_gemini.py'}")
print("=" * 80 + "\n")

# Import the main function from the script
from options_analyst_gemini import main

# Run the interactive assistant
main()

## Section 8: Advanced Usage - Batch Generation

Generate IV surfaces for multiple dates in a loop.

In [None]:
# Optional: Generate surfaces for multiple dates
# Set BATCH_GENERATION = True and adjust the date range below to use this feature

BATCH_GENERATION = False

if BATCH_GENERATION:
    from datetime import datetime, timedelta
    
    # Check if model exists
    pretrained_model = PROJECT_ROOT / "llm_options_assistant" / "best_model_2025" / "cvae_model.pt"
    trained_model = PROJECT_ROOT / "condtional_vae" / "best_model" / "cvae_model.pt"
    
    if not (pretrained_model.exists() or trained_model.exists()):
        print("\n" + "=" * 80)
        print("ERROR: No trained model available for batch generation")
        print("=" * 80)
        print("\nPlease ensure one of the following:")
        print("  1. Pre-trained model exists at: llm_options_assistant/best_model_2025/cvae_model.pt")
        print("  2. Run Section 3 (CVAE Training) to train a new model")
        print("=" * 80)
    else:
        # Define date range
        start_date = datetime(2025, 10, 1)
        end_date = datetime(2025, 11, 10)
        
        dates_to_generate = []
        current = start_date
        while current <= end_date:
            dates_to_generate.append(current.strftime("%Y-%m-%d"))
            current += timedelta(days=1)
        
        print(f"\nGenerating surfaces for {len(dates_to_generate)} dates...")
        print(f"Date range: {dates_to_generate[0]} to {dates_to_generate[-1]}")
        print(f"Results will be saved to: {RESULTS_FOLDER}\n")
        
        venv_python = PROJECT_ROOT / ".venv" / "bin" / "python"
        script_path = PROJECT_ROOT / "condtional_vae" / "generate_iv_surface_by_date.py"
        
        results = {}
        batch_timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        
        # Set up environment with proper matplotlib backend
        env = os.environ.copy()
        env['MPLBACKEND'] = 'Agg'  # Use Agg backend for batch processing
        
        for idx, date in enumerate(dates_to_generate, 1):
            print(f"\n[{idx}/{len(dates_to_generate)}] Generating for {date}...", end=" ")
            
            cmd = [
                str(venv_python),
                str(script_path),
                "--date", date,
                "--n_samples", "50",  # Fewer samples for batch to speed up
                "--output_dir", "results_date"
            ]
            
            try:
                result = subprocess.run(
                    cmd,
                    capture_output=True,
                    text=True,
                    timeout=120,
                    cwd=str(PROJECT_ROOT / "condtional_vae"),
                    env=env  # Pass custom environment
                )
                
                if result.returncode == 0:
                    # Copy results to results folder
                    source_dir = PROJECT_ROOT / "condtional_vae" / "results_date" / date
                    dest_dir = RESULTS_FOLDER / f"{date}_{batch_timestamp}"
                    
                    if source_dir.exists():
                        dest_dir.mkdir(parents=True, exist_ok=True)
                        for file in source_dir.glob("*"):
                            if file.is_file():
                                shutil.copy2(file, dest_dir / file.name)
                    
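                    print("OK")
                    results[date] = "Success"
                else:
                    print(f"FAILED (exit code {result.returncode})")
                    results[date] = f"Failed (code {result.returncode})"
                    # Show the tail of stderr to help diagnose the failure
                    if result.stderr:
                        print(result.stderr.strip()[-500:])
                    
            except subprocess.TimeoutExpired:
                print("⏱ Timeout")
                results[date] = "Timeout"
            except Exception as e:
                print(f"ERROR: {e}")
                results[date] = str(e)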
        
        # Summary
        print(f"\n{'=' * 80}")
        print("BATCH GENERATION SUMMARY")
        print(f"{'=' * 80}")
        
        success_count = sum(1 for v in results.values() if v == "Success")
        print(f"\nCompleted: {success_count}/{len(dates_to_generate)}")
        print(f"Results saved to: {RESULTS_FOLDER}\n")
        
        for date, status in results.items():
            symbol = "[OK]" if status == "Success" else "[FAIL]"
            print(f"  {symbol} {date}: {status}")
    
else:
    print("Batch generation is DISABLED")
    print("\nTo enable batch generation:")
    print("  1. Set BATCH_GENERATION = True")
    print("  2. Customize start_date and end_date inside the BATCH_GENERATION block")
    print("  3. Re-run this cell")
    print(f"\nResults will be saved to: {RESULTS_FOLDER}")
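The batch cell above steps through every calendar day, so weekends are included in the loop. If you only want weekdays, a minimal sketch of the date list is below. The helper `build_date_range` is hypothetical, and it does not remove NSE exchange holidays, which would require a holiday calendar.

```python
from datetime import date, timedelta


def build_date_range(start: date, end: date, weekdays_only: bool = True) -> list[str]:
    """Return ISO date strings from start to end, inclusive.

    With weekdays_only=True, Saturdays and Sundays are dropped. Exchange
    holidays are NOT removed; that would need an NSE holiday calendar.
    """
    dates = []
    current = start
    while current <= end:
        # weekday(): Monday == 0 ... Sunday == 6
        if not weekdays_only or current.weekday() < 5:
            dates.append(current.isoformat())
        current += timedelta(days=1)
    return dates
```

Because the batch cell defines `start_date` and `end_date` as `datetime` objects, pass `start_date.date()` and `end_date.date()`; otherwise `isoformat()` would append a time component to each string.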

## Section 9: Pipeline Summary & Notes

### Project Overview

This notebook orchestrates the **Arbitrage-Free IV Surface Generation Pipeline**, which consists of three main components:

1. **Heston Model Calibration**
   - Fits single Heston parameters for each trading day
   - Uses two-stage optimization (fast + Wasserstein refinement)
   - Input: IV surfaces from NIFTY options market
   - Output: 5 parameters (kappa, theta, sigma_v, rho, v0) per day

2. **Conditional VAE Training** (Optional)
   - Trains a variational autoencoder with market conditioning
   - Conditions on: India VIX, USD/INR, crude oil prices, US 10Y yield, and the GDELT geopolitical unrest index
   - Learns latent distribution of Heston parameters
   - Pre-trained model available: `llm_options_assistant/best_model_2025/`

3. **IV Surface Generation**
   - Generates forward-looking IV surfaces for any date
   - Conditions on current market state
   - Produces 100 sample surfaces with statistics
   - Outputs: matrices, plots, PyTorch tensors
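For reference, the five calibrated parameters play the following roles in the standard Heston dynamics. The notation for the asset price $S_t$, risk-free rate $r$ and dividend yield $q$ is an assumption; the calibration script may use a different drift convention.

$$
\begin{aligned}
dS_t &= (r - q)\,S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^{S},\\
dv_t &= \kappa\,(\theta - v_t)\,dt + \sigma_v\,\sqrt{v_t}\,dW_t^{v},\\
d\langle W^{S}, W^{v}\rangle_t &= \rho\,dt, \qquad v_{t=0} = v_0 .
\end{aligned}
$$

Here $\kappa$ is the speed of mean reversion, $\theta$ the long-run variance, $\sigma_v$ the volatility of variance, $\rho$ the correlation between price and variance shocks, and $v_0$ the initial variance. The Feller condition $2\kappa\theta \ge \sigma_v^2$ keeps the variance process strictly positive.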

### Key Features

- **Current Data:** GDELT unrest index updated to 2025-11-10
- **Market Data:** India VIX, USD/INR, Crude Oil, US 10Y Yield (via yfinance)
- **Full Date Range:** 2015-01-01 to 2025-11-10
- **Pre-trained Model:** Ready for immediate IV surface generation
- **LLM Assistant:** Interactive AI analyst for options analysis
- **Error Handling:** Graceful fallbacks for missing data

### How to Use

1. **Heston Calibration (optional):** Execute Section 2 (takes 10-30 minutes), or skip it to reuse the existing calibration results
2. **CVAE Training (optional):** Skip Section 3 to use the pre-trained model
3. **Generate Surfaces:** Set date in Section 4, run the cell
4. **View Results:** Section 5 displays matrices and plots
5. **Chat with AI:** Section 7 provides interactive options analysis
6. **Batch Generation:** Optional batch processing (Section 8)

### Typical Workflow

```
Setup (Section 1)
    ↓
Heston Calibration (Section 2) [Optional: skip to reuse existing results]
    ↓
CVAE Training (Section 3) [Skip if using pre-trained]
    ↓
IV Surface Generation (Section 4) [Main output]
    ↓
Visualize Results (Section 5)
    ↓
Chat with LLM Assistant (Section 7) [Interactive analysis]
```

### File Structure

```
/project_root/
├── api.json (Gemini API key for the LLM assistant)
├── nifty_filtered_surfaces.pickle (input IV surface data)
├── calibration_single_heston/
│   ├── run_single_heston_calibration.py
│   └── NIFTY_heston_single_params_tensor.pt (output)
├── condtional_vae/
│   ├── train_cvae.py
│   ├── generate_iv_surface_by_date.py
│   ├── fetch_and_compute_unrest_index.py (GDELT unrest index update)
│   └── results_date/ (generated surfaces)
├── llm_options_assistant/
│   ├── best_model_2025/cvae_model.pt (pre-trained model)
│   └── options_analyst_gemini.py (LLM assistant)
├── demo_results/ (notebook outputs)
└── Scripts_Orchestration.ipynb (this notebook)
```
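A quick way to confirm that the files listed in the tree above are in place before running the pipeline is sketched below. The list `REQUIRED_FILES` and the helper `missing_files` are hypothetical; the relative paths are taken from this notebook and may need adjusting for your checkout.

```python
from pathlib import Path

# Relative paths taken from the file structure above; adjust if your checkout differs.
REQUIRED_FILES = (
    "nifty_filtered_surfaces.pickle",
    "calibration_single_heston/NIFTY_heston_single_params_tensor.pt",
    "condtional_vae/generate_iv_surface_by_date.py",
    "llm_options_assistant/best_model_2025/cvae_model.pt",
    "llm_options_assistant/options_analyst_gemini.py",
)


def missing_files(project_root: Path, required=REQUIRED_FILES) -> list[str]:
    """Return the required relative paths that are not regular files under project_root."""
    return [rel for rel in required if not (project_root / rel).is_file()]
```

For example, `missing = missing_files(PROJECT_ROOT)` returns an empty list when everything is present; otherwise it names exactly which inputs still need to be created or copied.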

### Troubleshooting

**Issue:** "Module not found" errors
**Solution:** Ensure the virtual environment is activated by running `source .venv/bin/activate`

**Issue:** "File not found" in calibration
**Solution:** Check that `nifty_filtered_surfaces.pickle` exists in project root

**Issue:** No market data available
**Solution:** Ensure yfinance can reach the internet, and check the ticker symbols (`^INDIAVIX`, `EURINR=X`, etc.)

**Issue:** GDELT data not found
**Solution:** Run `condtional_vae/fetch_and_compute_unrest_index.py` to update

**Issue:** LLM Assistant not working
**Solution:** Check that `api.json` exists in the project root and contains a valid `API_KEY`. Get a free Gemini API key from https://aistudio.google.com/app/apikey

### References

- Heston Model: Stochastic volatility model in which the variance follows a mean-reverting square-root (CIR) process
- Conditional VAE: Learns conditional distribution of parameters given market state
- GDELT Data: Geopolitical event database for sentiment/unrest index
- Wasserstein Metric: Optimal transport distance for distribution matching
- Google Gemini: Free LLM API for options analysis
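How the calibration script applies the Wasserstein metric in its refinement stage is not shown in this notebook. As a general illustration only, the Wasserstein-1 distance between two equal-size 1-D samples has a closed form: with uniform weights the optimal transport plan pairs sorted values, so the distance is the mean absolute difference of the sorted samples. The function name `wasserstein_1d` is hypothetical; for unequal sample sizes or weights, `scipy.stats.wasserstein_distance` covers the general 1-D case.

```python
import numpy as np


def wasserstein_1d(x, y) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size == 0:
        raise ValueError("expected two non-empty 1-D samples of the same length")
    # With equal sizes and uniform weights, the optimal transport plan pairs
    # the k-th smallest value of x with the k-th smallest value of y.
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))
```

Shifting every point of a sample by a constant `c` moves it a Wasserstein-1 distance of exactly `|c|`, which is why the metric is a natural measure of how far two distributions are apart.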

## Section 10: Results Management

View all generated results and organize them.

In [None]:
print(f"\n{'=' * 80}")
print("RESULTS FOLDER CONTENTS")
print(f"{'=' * 80}\n")

print(f"Results Location: {RESULTS_FOLDER}\n")

if RESULTS_FOLDER.exists():
    # Get all subdirectories
    subdirs = sorted([d for d in RESULTS_FOLDER.iterdir() if d.is_dir()], reverse=True)
    
    if subdirs:
        print(f"Found {len(subdirs)} result directories:\n")
        
        for subdir in subdirs:
            dir_name = subdir.name
            
            # Count regular files only (nested directories are skipped)
            files = sorted(f for f in subdir.glob("*") if f.is_file())
            file_count = len(files)
            total_size = sum(f.stat().st_size for f in files) / 1024 / 1024  # MB
            
            print(f"{dir_name}/")
            print(f"   Files: {file_count}, Size: {total_size:.2f} MB")
            
            # List files with their sizes in KB
            for file in files:
                size = file.stat().st_size / 1024
                print(f"     • {file.name:35s} ({size:8.1f} KB)")
            print()
    else:
        print("No results generated yet in demo_results folder.")
        print("\nRun the IV Surface Generation cell (Section 4) to create results.")
        
        # Check for results in original location
        original_results = PROJECT_ROOT / "condtional_vae" / "results_date"
        if original_results.exists():
            original_subdirs = list(original_results.glob("*"))
            if original_subdirs:
                print(f"\nFound {len(original_subdirs)} result(s) in original location:")
                print(f"  {original_results}")
                print("\nThese can be viewed in Section 5 (View Results) as fallback.")
else:
    print(f"Results folder not yet created: {RESULTS_FOLDER}")
    print("It will be created when you first run the IV Surface Generation.")

print(f"\n{'=' * 80}")
print("HOW TO USE RESULTS")
print(f"{'=' * 80}\n")
print(f"1. Results are saved in: {RESULTS_FOLDER}")
print("2. Each run creates a timestamped folder: DATE_YYYYMMDD_HHMMSS/")
print("3. Inside each folder:")
print("   • iv_surfaces.pt - PyTorch tensor with samples")
print("   • mean_iv_surface.csv - Average IV surface (8x21 matrix)")
print("   • median_iv_surface.csv - Median IV surface (8x21 matrix)")
print("   • atm_term_structure.png - Term structure plot")
print("   • mean_surface_heatmap.png - IV heatmap visualization")
print("   • iv_smiles.png - Volatility smile across maturities")
print("   • execution_log_*.log - Detailed execution log")
print("\n4. You can download/share any result folder directly")
print("\n5. Fallback: If no demo_results exist, Section 5 will use results from:")
print("   condtional_vae/results_date/ (if available)")
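Result folders follow the `DATE_YYYYMMDD_HHMMSS` pattern described above, so the surface date and run time can be recovered from the folder name. A minimal sketch is below; the helper `parse_result_dir_name` is hypothetical and simply returns `None` for folders that do not match the pattern.

```python
from datetime import datetime
from typing import Optional, Tuple


def parse_result_dir_name(name: str) -> Optional[Tuple[str, datetime]]:
    """Split 'DATE_YYYYMMDD_HHMMSS' into (DATE, run timestamp).

    Returns None when the name does not follow that pattern.
    """
    # Split from the right so the ISO date (which contains '-') stays intact.
    parts = name.rsplit("_", 2)
    if len(parts) != 3:
        return None
    surface_date, run_day, run_time = parts
    try:
        run_ts = datetime.strptime(f"{run_day}_{run_time}", "%Y%m%d_%H%M%S")
        # Also validate that the surface date itself is ISO formatted.
        datetime.strptime(surface_date, "%Y-%m-%d")
    except ValueError:
        return None
    return surface_date, run_ts
```

Sorting the parsed tuples for one surface date by their timestamp, for example, picks out the most recent run for that date.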