# Stage 6: Latent Space Analysis (Variational Autoencoders)

In this upgraded stage, we move from a standard Autoencoder to a **Variational Autoencoder (VAE)**. 

### **Why a VAE?**
A standard Autoencoder maps each input window to a single point in latent space. A VAE instead maps it to a **probability distribution**: a diagonal Gaussian described by a mean vector and a (log-)variance vector.

### **Key Benefits for Market Analysis:**
1. **Continuous Manifold**: A standard AE's latent space can be "patchy", with gaps between the regions it actually uses. The VAE's sampling noise and KL penalty encourage a smoother, more continuous space, which makes it easier to follow how market conditions drift over time.
2. **KL Divergence**: A regularization term that pulls each encoder distribution $q(z \mid x)$ toward the standard normal prior $\mathcal{N}(0, I)$, keeping the latent codes compact and centered on the origin.
3. **Regime Identification**: Points that lie close together in the latent space correspond to windows with similar (standardized) feature patterns, regardless of which ticker they come from.
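For a diagonal Gaussian encoder, the KL term in point 2 has a closed form, which is what the `kld_loss` line in the loss cell computes. The NumPy sketch below evaluates it per sample and cross-checks it against a Monte Carlo estimate. The helper names `kl_to_standard_normal` and `kl_monte_carlo` are illustrative only, not part of the project code.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), one value per row.

    Same quantity as -0.5 * sum(1 + logvar - mu^2 - exp(logvar)).
    """
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar, axis=-1)

def kl_monte_carlo(mu, logvar, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the same KL for a single (mu, logvar) pair."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    std = np.exp(0.5 * np.asarray(logvar, dtype=float))
    z = mu + std * rng.standard_normal((n_samples, mu.size))
    # log q(z) - log p(z); the 0.5 * log(2 * pi) constants cancel
    log_q = -0.5 * np.sum(((z - mu) / std) ** 2 + 2.0 * np.log(std), axis=1)
    log_p = -0.5 * np.sum(z ** 2, axis=1)
    return float(np.mean(log_q - log_p))

# A code sitting exactly on the prior costs nothing; shifting one mean by 1 costs 0.5
print(kl_to_standard_normal([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))  # 0.0
print(kl_to_standard_normal([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))  # 0.5
```

The closed form is exact and cheap, so the Monte Carlo version is only a sanity check.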

---

In [None]:
import os
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler
from umap import UMAP
from tqdm import tqdm

# Settings
warnings.filterwarnings("ignore")
sns.set_theme(style="whitegrid")

# Constants
DATA_PATH = "../data/processed/stock_data_processed.parquet"
RESULTS_DIR = "../results"
os.makedirs(RESULTS_DIR, exist_ok=True)

# Hyperparameters
LOOKBACK    = 20
HIDDEN_DIM  = 32
LATENT_DIM  = 4  # encoder outputs a mean and a log-variance for each of the 4 latent dims
BATCH_SIZE  = 64
EPOCHS      = 60
LR          = 1e-3
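SEED        = 42

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"✓ Using Device: {DEVICE}")

# Seed NumPy and PyTorch so weight init, batch shuffling and latent sampling are repeatable
np.random.seed(SEED)
torch.manual_seed(SEED)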

## 1. The Variational Architecture (Reparameterization Trick)
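The core of the VAE. Instead of sampling $z \sim \mathcal{N}(\mu, \sigma^2)$ directly, we compute $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ (the reparameterization trick). Isolating the randomness in $\epsilon$ keeps $z$ differentiable with respect to $\mu$ and $\sigma$, so the encoder can be trained by backpropagation. The injected noise, together with the KL term, is what pushes the latent space toward continuity.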
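A NumPy sketch of the same sampling arithmetic, assuming hypothetical encoder outputs of $\mu = 0.3$ and $\sigma^2 = 0.25$ in every dimension. NumPy does not track gradients, so this only shows that $\mu + \sigma \epsilon$ produces samples with the intended mean and spread; in `MarketVAE.reparameterize` the same expression runs on PyTorch tensors, where the gradient flows through `mu` and `std`.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); NumPy mirror of the PyTorch method."""
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal(np.shape(mu))
    return mu + eps * std

rng = np.random.default_rng(42)
mu = np.full((100_000, 4), 0.3)               # hypothetical encoder means, 4 latent dims
logvar = np.full((100_000, 4), np.log(0.25))  # variance 0.25, i.e. std 0.5

z = reparameterize(mu, logvar, rng)
print(z.mean(axis=0).round(2))  # close to 0.3 in each dim
print(z.std(axis=0).round(2))   # close to 0.5 in each dim
```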

In [None]:
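class MarketVAE(nn.Module):
    def __init__(self, input_dim, latent_dim, hidden_dim=HIDDEN_DIM):
        super().__init__()
        # Encoder: 2-layer LSTM summarises the window into its final hidden state
        self.encoder_lstm = nn.LSTM(input_dim, hidden_dim, num_layers=2, batch_first=True)
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)

        # Decoder: project z up, unroll it over the window, then map back to feature space.
        # The linear output head lets reconstructions leave the (-1, 1) range of the
        # LSTM's tanh-bounded output, which matters because the inputs are z-scores.
        self.expand = nn.Linear(latent_dim, hidden_dim)
        self.decoder_lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers=1, batch_first=True)
        self.output_head = nn.Linear(hidden_dim, input_dim)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, eps ~ N(0, I): sampling stays differentiable w.r.t. mu, logvar
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, x):
        _, seq_len, _ = x.shape

        # Encode: last layer's final hidden state -> (mu, logvar)
        _, (hn, _) = self.encoder_lstm(x)
        h = hn[-1]
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)

        # Sample
        z = self.reparameterize(mu, logvar)

        # Decode: feed the projected latent vector at every time step
        z_expanded = self.expand(z).unsqueeze(1).repeat(1, seq_len, 1)
        dec_out, _ = self.decoder_lstm(z_expanded)
        recon = self.output_head(dec_out)

        return recon, mu, logvar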

## 2. VAE Loss Function (MSE + KLD)
The VAE loss balances two goals:
1. **Reconstruction (MSE)**: make the decoded window match the input window.
2. **KL divergence (KLD)**: keep each encoder distribution close to $\mathcal{N}(0, I)$ so the latent codes stay compact instead of drifting to arbitrary scales.
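Written out, the objective that `vae_loss_fn` minimises per batch is the following, with $B$ the batch size, $T$ = `LOOKBACK` = 20 time steps, $F$ = 5 features, $d$ = `LATENT_DIM` = 4 and $\beta = 10^{-3}$ the hard-coded KL weight:

$$
\mathcal{L} \;=\; \underbrace{\frac{1}{B\,T\,F}\sum_{b=1}^{B}\sum_{t=1}^{T}\sum_{f=1}^{F}\left(\hat{x}_{btf}-x_{btf}\right)^2}_{\text{MSE, mean over all elements}}
\;+\; \beta\,\frac{1}{B}\sum_{b=1}^{B}\underbrace{\frac{1}{2}\sum_{j=1}^{d}\left(\mu_{bj}^2+\sigma_{bj}^2-1-\log\sigma_{bj}^2\right)}_{\mathrm{KL}\left(q(z\mid x_b)\,\|\,\mathcal{N}(0,I)\right)}
$$

The first term is averaged per element while the KL term is averaged per window, so $\beta$ only has meaning relative to these particular scales. It acts as a β-VAE-style tuning knob rather than the exact weighting of the standard evidence lower bound (ELBO).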

In [None]:
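def vae_loss_fn(recon_x, x, mu, logvar, kl_weight=1e-3):
    # Reconstruction: mean squared error averaged over batch, time steps and features
    recon_loss = nn.functional.mse_loss(recon_x, x)
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims and batch
    kld_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # Divide by batch size for a per-window KL, then down-weight it (beta = kl_weight)
    # so it does not swamp the per-element MSE
    return recon_loss + kl_weight * kld_loss / x.size(0)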

# Data preparation (as before)
panel = pd.read_parquet(DATA_PATH)
feat_cols = ["log_return", "roll_vol", "range_norm", "vol_zscore", "mkt_return"]

all_ticker_seqs = []
ticker_metadata = []

for ticker in panel.index.get_level_values("ticker").unique():
    if ticker == "SPY":  # skip the SPY benchmark itself
        continue
    # Drop rows with missing features, then standardise each ticker on its own
    # so all tickers' features share a comparable z-score scale
    tk_data = panel.xs(ticker, level="ticker")[feat_cols].dropna().values
    tk_data = StandardScaler().fit_transform(tk_data)
    # Sliding windows of LOOKBACK days; the "+ 1" keeps the last complete window
    for i in range(len(tk_data) - LOOKBACK + 1):
        all_ticker_seqs.append(tk_data[i : i + LOOKBACK])
        ticker_metadata.append(ticker)
X = torch.FloatTensor(np.array(all_ticker_seqs))
model = MarketVAE(len(feat_cols), LATENT_DIM).to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
dl = torch.utils.data.DataLoader(X, batch_size=BATCH_SIZE, shuffle=True)

for epoch in range(EPOCHS):
    model.train()
    total_loss = 0
    for batch in dl:
        batch = batch.to(DEVICE)
        optimizer.zero_grad()
        recon, mu, logvar = model(batch)
        loss = vae_loss_fn(recon, batch, mu, logvar)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1} VAE Loss: {total_loss/len(dl):.6f}")

## 3. Mapping the VAE Latent Manifold
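We use the encoder **mean ($\mu$)** as each window's coordinate, ignoring the sampling noise. The KL term pulls these means toward the origin of the 4-dimensional latent space, so they tend to form a compact cloud, and UMAP then projects that cloud to 2-D for plotting. UMAP coordinates are not anchored at (0, 0), so positions and distances should be judged in the latent space itself rather than read off the plot.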

In [None]:
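model.eval()
mu_chunks = []
with torch.no_grad():
    # Encode in batches instead of moving the full dataset to the device at once;
    # only the mean (mu) is kept, the sampled reconstruction is discarded
    for start in range(0, len(X), 1024):
        _, mu, _ = model(X[start : start + 1024].to(DEVICE))
        mu_chunks.append(mu.cpu())
latent_mu = torch.cat(mu_chunks).numpy()

reducer = UMAP(n_neighbors=15, min_dist=0.1, random_state=SEED)
embedding = reducer.fit_transform(latent_mu)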

plt.figure(figsize=(12, 10))
sns.scatterplot(
    x=embedding[:, 0], y=embedding[:, 1], 
    hue=ticker_metadata, 
    palette="husl", 
    alpha=0.3, 
    s=8
)
plt.title("Probabilistic Market Latent Space (VAE + UMAP)")
plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.show()
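Whether the KL term really keeps the means centered, and whether all 4 latent dimensions are in use, can be checked directly on `latent_mu` rather than on the UMAP picture. The sketch below reports per-dimension mean and variance of $\mu$ and marks a dimension as "active" if its variance across windows exceeds 0.01, a heuristic threshold used in the VAE literature and treated here as a rule of thumb. `latent_usage_report` is a hypothetical helper, and the input is synthetic so the block runs on its own.

```python
import numpy as np

def latent_usage_report(latent_mu, active_threshold=0.01):
    """Per-dimension mean/variance of encoder means; 'active' = variance above threshold."""
    latent_mu = np.asarray(latent_mu, dtype=float)
    dim_mean = latent_mu.mean(axis=0)
    dim_var = latent_mu.var(axis=0)
    active = dim_var > active_threshold
    return dim_mean, dim_var, active

# Synthetic stand-in for latent_mu (n_windows x LATENT_DIM); the last dim is deliberately collapsed
rng = np.random.default_rng(0)
fake_mu = np.column_stack([
    rng.normal(0.0, 0.8, 5000),
    rng.normal(0.1, 0.5, 5000),
    rng.normal(-0.2, 0.3, 5000),
    rng.normal(0.0, 0.001, 5000),
])
dim_mean, dim_var, active = latent_usage_report(fake_mu)
print("mean per dim:", dim_mean.round(2))
print("var per dim: ", dim_var.round(3))
print("active dims: ", active)  # expected: first three True, last False
```

In the notebook, calling `latent_usage_report(latent_mu)` after the cell above gives the real numbers. Means near zero support the "centered cloud" reading, and an inactive dimension suggests the KL weight is squeezing that coordinate onto the prior.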

# Stage 6: Variational Autoencoder — Detailed Summary

## What This Stage Does

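In this upgraded analysis, we replaced the deterministic Autoencoder with a **Variational Autoencoder (VAE)**, which adds a generative component to the project. Instead of only learning to compress each window, the model learns a **probability distribution** over latent codes (one Gaussian per window, regularized toward $\mathcal{N}(0, I)$) together with a decoder that maps latent codes back to feature windows.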

---

## Why a VAE Suits Market-Regime Analysis

### 1. Robustness to Novelty
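Markets are known for "Black Swan" events (unexpected crashes). A standard AE places such windows wherever its unconstrained latent space happens to put them, so "unusual" has no fixed reference point. A VAE's latent space is regularized toward a standard Gaussian centered at the origin, so a window whose encoding lies far from that center (measured in the latent space, not by eyeballing the UMAP plot) can be flagged as a **candidate systemic anomaly**. A period such as the COVID-19 crash is the kind of event we would expect to surface this way, though that needs to be verified on the data.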
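A minimal anomaly-scoring sketch, assuming the squared distance of each window's $\mu$ from the origin (where the $\mathcal{N}(0, I)$ prior is centered) is a reasonable score. Under the prior that quantity would follow a chi-square distribution with $d$ degrees of freedom, but the encoder means are not exactly standard normal, so an empirical quantile is used as the cut-off instead. The 99% level and the helper name `latent_anomaly_scores` are illustrative choices; reconstruction error is a common alternative score.

```python
import numpy as np

def latent_anomaly_scores(latent_mu, quantile=0.99):
    """Score each window by squared distance from the latent origin and flag
    the top (1 - quantile) fraction as candidate anomalies."""
    latent_mu = np.asarray(latent_mu, dtype=float)
    scores = np.sum(latent_mu ** 2, axis=1)
    threshold = np.quantile(scores, quantile)
    return scores, scores > threshold

# Synthetic stand-in for latent_mu with five planted outliers
rng = np.random.default_rng(1)
fake_mu = rng.normal(0.0, 1.0, size=(1000, 4))
fake_mu[:5] += 6.0

scores, flagged = latent_anomaly_scores(fake_mu, quantile=0.99)
print(flagged.sum(), np.flatnonzero(flagged)[:5])
```

Applied to the notebook's `latent_mu`, the flagged positions line up with `ticker_metadata`, so each candidate anomaly can be traced back to a ticker and window.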

### 2. Semantic Similarity
Each point in the UMAP plot is one 20-day window, colored by ticker. The VAE's sampling noise penalizes encodings that hinge on small, noisy differences between windows, so similar windows tend to share nearby codes and the plot tends to look smoother than a standard AE's. Because each ticker's features are standardized separately, closeness says nothing about price level: if Apple and Microsoft windows land close together, it is because their standardized return, volatility, range and volume patterns are encoded similarly.
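One way to put numbers on "Apple and Microsoft are close" is to average each ticker's window encodings and compare the averages in the 4-D latent space, where UMAP's distortion of global distances does not apply. This is a sketch with a hypothetical helper `ticker_centroid_distances` and synthetic tickers. A centroid hides within-ticker variation, since one ticker's windows can spread across several regimes.

```python
import numpy as np
import pandas as pd

def ticker_centroid_distances(latent_mu, tickers):
    """Average each ticker's window encodings, then return pairwise Euclidean
    distances between those averages as a labelled DataFrame."""
    df = pd.DataFrame(np.asarray(latent_mu, dtype=float))
    centroids = df.groupby(np.asarray(tickers)).mean()
    diffs = centroids.values[:, None, :] - centroids.values[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    return pd.DataFrame(dist, index=centroids.index, columns=centroids.index)

# Synthetic example: AAA and BBB behave alike, CCC sits elsewhere in latent space
rng = np.random.default_rng(2)
tickers = np.repeat(["AAA", "BBB", "CCC"], 200)
offsets = {"AAA": 0.0, "BBB": 0.1, "CCC": 2.0}
fake_mu = rng.normal(0.0, 0.3, size=(600, 4)) + np.array([offsets[t] for t in tickers])[:, None]

dist = ticker_centroid_distances(fake_mu, tickers)
print(dist.round(2))
```

In the notebook the equivalent call would be `ticker_centroid_distances(latent_mu, ticker_metadata)`.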

### 3. Generative Potential
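Because a VAE learns a decoder together with a latent space regularized toward $\mathcal{N}(0, I)$, you can in principle draw $z \sim \mathcal{N}(0, I)$ and decode it to **generate synthetic feature windows** that resemble the training data. How realistic those samples are (for example, for data augmentation) would have to be checked before relying on them.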
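The sampling step is sketched below with a placeholder decoder, so the block runs without a trained model. `sample_synthetic_windows` and the toy linear decoder are hypothetical. In the notebook, `decode_fn` would wrap the decoding half of `MarketVAE.forward` (everything after the sampling step) inside `torch.no_grad()`. Its outputs would be per-ticker z-scores, and since the data-prep loop does not keep each ticker's `StandardScaler`, mapping samples back to raw units would require storing those scalers.

```python
import numpy as np

def sample_synthetic_windows(decode_fn, n_samples, latent_dim, seed=0):
    """Draw z ~ N(0, I) from the prior and push it through a decoder.

    decode_fn maps an (n_samples, latent_dim) array to
    (n_samples, lookback, n_features) windows.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, latent_dim))
    return z, decode_fn(z)

# Hypothetical toy decoder: a fixed random linear map repeated over 20 steps, 5 features
W = np.random.default_rng(3).normal(size=(4, 5))

def toy_decode(z):
    return np.repeat((z @ W)[:, None, :], 20, axis=1)

z, windows = sample_synthetic_windows(toy_decode, n_samples=8, latent_dim=4)
print(z.shape, windows.shape)  # (8, 4) (8, 20, 5)
```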

--- 

## Final Stop: The Master Leaderboard
This stage adds the last piece: the project now covers point predictions, probabilistic quantiles, foundation benchmarks, and latent representations.

In **Stage 7**, we will crown the champion of the *Stock Market Prediction Challenge*.