# PSformer for Stock Forecasting

This notebook implements the PSformer (Parameter Shared Transformer) model for stock price forecasting using multiple tickers.

**Important Disclaimer:** The model in this notebook runs with random (untrained) weights and serves only as a demonstration of the pipeline. For real-world use, the model must first be trained on historical data.

## Features:
- Multi-ticker stock forecasting
- Backtest validation on held-out data
- Parameter sharing for efficient computation
- RevIN normalization for better generalization
- Two-stage segment attention mechanism

In [None]:
# Install required packages
!pip install torch pandas numpy matplotlib plotly

# Mount Google Drive (optional)
# from google.colab import drive
# drive.mount('/content/drive')

# Import libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from typing import Tuple, Optional, Dict, Any
import warnings
warnings.filterwarnings('ignore')

print("Setup complete! PyTorch version:", torch.__version__)

# Upload Your Data Files

**Instructions:**
1. Use the file explorer pane on the left side of Colab
2. Upload your OHLCV data file (CSV format)
3. The CSV should have columns: ['Date', 'Ticker', 'Open', 'High', 'Low', 'Close', 'Volume']
4. Make sure the Date column is properly formatted

**Expected CSV format:**
```
Date,Ticker,Open,High,Low,Close,Volume
2023-01-01,AAPL,150.0,155.0,149.0,154.0,1000000
2023-01-02,AAPL,154.0,156.0,152.0,155.0,1100000
...
```
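
Before loading the file, it can help to confirm that the header matches the expected format. The following is a minimal sketch using only the standard library; the helper name `missing_columns` is hypothetical and not part of the notebook.

```python
import csv
import io

# Column names taken from the instructions above.
REQUIRED_COLUMNS = ["Date", "Ticker", "Open", "High", "Low", "Close", "Volume"]


def missing_columns(csv_text: str) -> list:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


sample = (
    "Date,Ticker,Open,High,Low,Close,Volume\n"
    "2023-01-01,AAPL,150.0,155.0,149.0,154.0,1000000\n"
)
print(missing_columns(sample))               # nothing missing
print(missing_columns("Date,Ticker,Close\n"))  # lists the absent columns
```

For an uploaded file, the same check could be run on the text returned by `open(path).read()`.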

In [None]:
# List files in current directory to verify upload
import os
print("Files in current directory:")
for file in os.listdir('.'):
    if file.endswith('.csv'):
        print(f"📊 {file}")
    else:
        print(f"📄 {file}")

# Configuration

This cell contains all the parameters that can be easily modified for different experiments.

In [None]:
# ========== DATA CONFIGURATION ==========
DATA_FILE_PATH = "stock_data.csv"  # Change this to your uploaded CSV file name
TICKER_COLUMN = "Ticker"
DATE_COLUMN = "Date"
OHLCV_COLUMNS = ["Open", "High", "Low", "Close", "Volume"]

# ========== MODEL HYPERPARAMETERS ==========
SEQUENCE_LENGTH = 96    # Input sequence length (L) - 96 days of historical data
PATCH_SIZE = 8          # Temporal patch size (P) - 8 days per patch
PREDICTION_LENGTH = 30  # Forecast horizon (F) - 30 days ahead
NUM_ENCODER_LAYERS = 2  # Number of PSformer encoder layers
NUM_VARIABLES = 5       # Number of features (Open, High, Low, Close, Volume)

# ========== VALIDATION CONFIGURATION ==========
MIN_DATA_POINTS = SEQUENCE_LENGTH + PREDICTION_LENGTH  # 126 days minimum

# ========== DEVICE CONFIGURATION ==========
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {DEVICE}")

# Verify configuration
print(f"\nConfiguration Summary:")
print(f"- Input sequence length: {SEQUENCE_LENGTH} days")
print(f"- Patch size: {PATCH_SIZE} days")
print(f"- Number of patches: {SEQUENCE_LENGTH // PATCH_SIZE}")
print(f"- Prediction horizon: {PREDICTION_LENGTH} days")
print(f"- Minimum data required: {MIN_DATA_POINTS} days")
print(f"- Features: {NUM_VARIABLES} ({', '.join(OHLCV_COLUMNS)})")

# PSformer Model Implementation

The following cells contain the complete source code for the PSformer model, adapted for the notebook environment.
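
As orientation for the shapes used in these cells, the sketch below works through the dimension arithmetic (N = L / P patches, each of segment length C = M × P) and the segment layout on a toy input. It uses plain Python lists rather than tensors, and the helper names `psformer_dims` and `to_segments` are illustrative assumptions, not functions from the notebook.

```python
def psformer_dims(sequence_length: int, patch_size: int, num_variables: int) -> dict:
    """Derive the number of patches (N) and the segment length (C)."""
    if sequence_length % patch_size != 0:
        raise ValueError("sequence_length must be divisible by patch_size")
    return {
        "num_patches": sequence_length // patch_size,   # N = L / P
        "segment_length": num_variables * patch_size,   # C = M * P
    }


def to_segments(series, patch_size):
    """Turn M variables of length L into N segments of length M * P.

    Each segment concatenates the same time patch from every variable,
    in variable order.
    """
    length = len(series[0])
    segments = []
    for start in range(0, length, patch_size):
        segment = []
        for variable in series:
            segment.extend(variable[start:start + patch_size])
        segments.append(segment)
    return segments


dims = psformer_dims(sequence_length=96, patch_size=8, num_variables=5)
print(dims)  # {'num_patches': 12, 'segment_length': 40}

toy = [[0, 1, 2, 3], [10, 11, 12, 13]]  # M = 2 variables, L = 4 steps
segs = to_segments(toy, patch_size=2)
print(segs)  # [[0, 1, 10, 11], [2, 3, 12, 13]]
```

With the notebook's default configuration, the encoder therefore receives 12 segments of 40 values each per sample.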

In [None]:
# ========== PS BLOCK IMPLEMENTATION ==========
class PSBlock(nn.Module):
    """
    Parameter Shared Block implementing Equation 3 from the PSformer paper:
    X_out = (GeLU(X_in W^(1)) W^(2) + X_in) W^(3)
    """
    
    def __init__(self, N: int):
        """
        Args:
            N: Dimension size for N×N weight matrices
        """
        super().__init__()
        self.N = N
        
        # Three N×N linear layers with bias
        self.linear1 = nn.Linear(N, N)
        self.linear2 = nn.Linear(N, N) 
        self.linear3 = nn.Linear(N, N)
        
        # Activation function
        self.activation = nn.GELU()
        
        # Initialize weights
        self._init_weights()
    
    def _init_weights(self):
        """Initialize weights using Xavier initialization for W1, W2 and smaller weights for W3"""
        nn.init.xavier_uniform_(self.linear1.weight)
        nn.init.xavier_uniform_(self.linear2.weight)
        # Initialize linear3 with smaller weights as it's the final transformation
        nn.init.xavier_uniform_(self.linear3.weight, gain=0.1)
        
        if self.linear1.bias is not None:
            nn.init.zeros_(self.linear1.bias)
        if self.linear2.bias is not None:
            nn.init.zeros_(self.linear2.bias)
        if self.linear3.bias is not None:
            nn.init.zeros_(self.linear3.bias)
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass implementing the three-step transformation
        
        Args:
            x: Input tensor of shape (C, N) or (batch, C, N)
            
        Returns:
            Output tensor of same shape as input
        """
        # Handle both 2D and 3D tensors
        original_shape = x.shape
        is_3d = x.dim() == 3
        
        # Validate input shape
        if x.dim() not in [2, 3]:
            raise ValueError(f"Input tensor must be 2 or 3-dimensional, got {x.dim()}")
        
        if is_3d:
            # Reshape 3D to 2D: [batch, C, N] -> [batch*C, N]
            batch, C, N = x.shape
            if N != self.N:
                raise ValueError(f"Input tensor last dimension must be {self.N}, got {N}")
            x = x.reshape(-1, N)  # [batch*C, N]; reshape also accepts non-contiguous input
        else:
            # 2D case
            if x.shape[1] != self.N:
                raise ValueError(f"Input tensor second dimension must be {self.N}, got {x.shape[1]}")
        
        # Store original input for residual connection
        residual = x
        
        # First transformation: Linear -> GeLU -> Linear + Residual
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        intermediate_output = x + residual
        
        # Second transformation: Linear
        final_output = self.linear3(intermediate_output)
        
        # Reshape back to original shape if needed
        if is_3d:
            final_output = final_output.view(batch, C, N)
        
        return final_output
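
To make the three-step transformation concrete, the following dependency-free sketch reimplements it with plain Python lists. It is a simplification: the bias terms that `nn.Linear` adds are omitted, and the helper names `gelu`, `matmul`, and `ps_block` are illustrative only.

```python
import math


def gelu(v: float) -> float:
    """Exact GELU using the Gaussian error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ps_block(x, w1, w2, w3):
    """Compute (GELU(x @ w1) @ w2 + x) @ w3 without biases."""
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    mixed = matmul(hidden, w2)
    with_residual = [
        [m + r for m, r in zip(m_row, x_row)] for m_row, x_row in zip(mixed, x)
    ]
    return matmul(with_residual, w3)


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [[1.0, -2.0], [3.0, 4.0]]

# With W1 = W2 = 0 the GELU branch vanishes, leaving only the residual path.
out = ps_block(x, zeros, zeros, identity)
print(out == x)
```

Setting `W1 = W2 = 0` and `W3 = I` isolates the residual connection, so the block returns its input unchanged.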

In [None]:
# ========== REVIN IMPLEMENTATION ==========
class RevIN(nn.Module):
    def __init__(self, num_features: int, eps=1e-5, affine=True):
        """
        :param num_features: the number of features or channels
        :param eps: a value added for numerical stability
        :param affine: if True, RevIN has learnable affine parameters
        """
        super(RevIN, self).__init__()
        self.num_features = num_features
        self.eps = eps
        self.affine = affine
        if self.affine:
            self._init_params()

    def forward(self, x, mode:str):
        if mode == 'norm':
            self._get_statistics(x)
            x = self._normalize(x)
        elif mode == 'denorm':
            x = self._denormalize(x)
        else:
            raise NotImplementedError(f"Unknown RevIN mode: {mode!r}")
        return x

    def _init_params(self):
        # initialize RevIN params: (C,)
        self.affine_weight = nn.Parameter(torch.ones(self.num_features))
        self.affine_bias = nn.Parameter(torch.zeros(self.num_features))

    def _get_statistics(self, x):
        if x.ndim != 3:
            raise IndexError(f"Expected 3D input tensor [batch, channels, length], got {x.ndim}D tensor")
        # Compute statistics over the time dimension (dim=2) for each channel separately
        # Input shape: [batch, channels, time] -> statistics shape: [batch, channels, 1]
        self.mean = torch.mean(x, dim=2, keepdim=True).detach()
        self.stdev = torch.sqrt(torch.var(x, dim=2, keepdim=True, unbiased=False) + self.eps).detach()

    def _normalize(self, x):
        x = x - self.mean
        x = x / self.stdev
        if self.affine:
            # Reshape affine parameters to broadcast properly
            # For input [B, C, T] and statistics [B, C, 1],
            # reshape affine params [C] to [1, C, 1] for broadcasting
            weight = self.affine_weight.view(1, -1, 1)
            bias = self.affine_bias.view(1, -1, 1)
            x = x * weight
            x = x + bias
        return x

    def _denormalize(self, x):
        if self.affine:
            # Reshape affine parameters to broadcast properly
            weight = self.affine_weight.view(1, -1, 1)
            bias = self.affine_bias.view(1, -1, 1)
            x = x - bias
            x = x / (weight + self.eps*self.eps)
        x = x * self.stdev
        x = x + self.mean
        return x
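
The normalize/denormalize round trip can be checked on a single series without PyTorch. This is a minimal sketch under simplifying assumptions: it handles one channel as a plain list and skips the affine step (at initialization the weight is 1 and the bias is 0). The names `revin_norm` and `revin_denorm` are hypothetical.

```python
import math

EPS = 1e-5  # same default as the eps argument of RevIN


def revin_norm(series):
    """Normalize one series and return the statistics needed to undo it."""
    mean = sum(series) / len(series)
    # Biased variance, matching torch.var(..., unbiased=False)
    var = sum((v - mean) ** 2 for v in series) / len(series)
    stdev = math.sqrt(var + EPS)
    normalized = [(v - mean) / stdev for v in series]
    return normalized, mean, stdev


def revin_denorm(normalized, mean, stdev):
    """Map normalized values back to the original scale."""
    return [v * stdev + mean for v in normalized]


closes = [150.0, 152.0, 151.0, 155.0]
normalized, mean, stdev = revin_norm(closes)
restored = revin_denorm(normalized, mean, stdev)

print(mean)  # 152.0
print(all(abs(a - b) < 1e-9 for a, b in zip(closes, restored)))
```

In the model the denormalization is applied to the forecast rather than to the input, so the stored statistics move the prediction back to the price scale of that particular input window.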

In [None]:
# ========== DATA TRANSFORMER IMPLEMENTATION ==========
class DataTransformationConfig:
    """
    Configuration class for data transformation parameters
    """
    def __init__(self, patch_size: int, sequence_length: int, num_variables: int):
        """
        Args:
            patch_size: Size of each temporal patch (P)
            sequence_length: Total input sequence length (L)
            num_variables: Number of time series variables (M)
        """
        self.patch_size = patch_size
        self.sequence_length = sequence_length
        self.num_variables = num_variables
        
        # Validate and calculate derived parameters
        self._validate()
        self.num_patches = self.sequence_length // self.patch_size
        self.segment_length = self.num_variables * self.patch_size
    
    def _validate(self):
        """Validate configuration parameters"""
        if self.patch_size <= 0:
            raise ValueError(f"Patch size must be positive, got {self.patch_size}")
        if self.sequence_length % self.patch_size != 0:
            raise ValueError(f"Sequence length {self.sequence_length} must be divisible by patch size {self.patch_size}")
        if self.num_variables <= 0:
            raise ValueError(f"Number of variables must be positive, got {self.num_variables}")


class PSformerDataTransformer:
    """
    Data transformation utility for PSformer that handles conversion between
    standard time series format and PSformer's segment-based representation.
    """
    
    def __init__(self, config: DataTransformationConfig):
        """
        Args:
            config: DataTransformationConfig instance with transformation parameters
        """
        self.config = config
        self._validate_configuration()
    
    def _validate_configuration(self):
        """Validate the configuration"""
        if not isinstance(self.config, DataTransformationConfig):
            raise TypeError("config must be an instance of DataTransformationConfig")
    
    def create_patches(self, input_tensor: torch.Tensor) -> torch.Tensor:
        """
        Transform: [batch, variables, sequence] -> [batch, variables, num_patches, patch_size]
        """
        # Input validation
        if input_tensor.dim() != 3:
            raise ValueError(f"Input tensor must be 3-dimensional, got {input_tensor.dim()}")
            
        batch, variables, sequence = input_tensor.shape
        
        if sequence != self.config.sequence_length:
            raise ValueError(f"Input sequence length {sequence} does not match configured length {self.config.sequence_length}")
            
        if variables != self.config.num_variables:
            raise ValueError(f"Input variables count {variables} does not match configured count {self.config.num_variables}")
        
        # Reshape sequence dimension to split into patches
        # [batch, variables, sequence] -> [batch, variables, num_patches, patch_size]
        patched = input_tensor.view(batch, variables, self.config.num_patches, self.config.patch_size)
        
        return patched
    
    def create_segments(self, patched_tensor: torch.Tensor) -> torch.Tensor:
        """
        Transform: [batch, variables, num_patches, patch_size] -> [batch, num_patches, segment_length]
        """
        # Input validation
        if patched_tensor.dim() != 4:
            raise ValueError(f"Patched tensor must be 4-dimensional, got {patched_tensor.dim()}")
            
        batch, variables, num_patches, patch_size = patched_tensor.shape
        
        # Step 1: Transpose to put patches before variables
        # [batch, variables, num_patches, patch_size] -> [batch, num_patches, variables, patch_size]
        transposed = patched_tensor.transpose(1, 2)
        
        # Step 2: Reshape to create segments by concatenating all variables for each patch
        # [batch, num_patches, variables, patch_size] -> [batch, num_patches, variables*patch_size]
        segments = transposed.contiguous().view(batch, num_patches, self.config.segment_length)
        
        return segments
    
    def restore_shape(self, segment_tensor: torch.Tensor, target_sequence_length: int) -> torch.Tensor:
        """
        Transform: [batch, num_patches, segment_length] -> [batch, variables, sequence]
        """
        # Input validation
        if segment_tensor.dim() != 3:
            raise ValueError(f"Segment tensor must be 3-dimensional, got {segment_tensor.dim()}")
            
        batch, num_patches, segment_length = segment_tensor.shape
        
        # Step 1: Reshape segments back to separate variables and patches
        # [batch, num_patches, segment_length] -> [batch, num_patches, variables, patch_size]
        reshaped = segment_tensor.view(batch, num_patches, self.config.num_variables, self.config.patch_size)
        
        # Step 2: Transpose to put variables before patches
        # [batch, num_patches, variables, patch_size] -> [batch, variables, num_patches, patch_size]
        transposed = reshaped.transpose(1, 2)
        
        # Step 3: Reshape to flatten patches back into sequence
        # [batch, variables, num_patches, patch_size] -> [batch, variables, num_patches*patch_size]
        output = transposed.contiguous().view(batch, self.config.num_variables, target_sequence_length)
        
        return output
    
    def forward_transform(self, input_tensor: torch.Tensor) -> torch.Tensor:
        """
        Complete pipeline: input -> patches -> segments
        """
        patches = self.create_patches(input_tensor)
        segments = self.create_segments(patches)
        return segments
    
    def inverse_transform(self, segment_tensor: torch.Tensor, target_sequence_length: int) -> torch.Tensor:
        """
        Complete pipeline: segments -> patches -> output
        """
        return self.restore_shape(segment_tensor, target_sequence_length)


def create_transformer_for_psformer(sequence_length: int, num_variables: int, patch_size: int) -> PSformerDataTransformer:
    """
    Factory function that builds a PSformerDataTransformer from the given dimensions
    """
    config = DataTransformationConfig(
        patch_size=patch_size,
        sequence_length=sequence_length,
        num_variables=num_variables
    )
    return PSformerDataTransformer(config)


def get_psformer_dimensions(transformer: PSformerDataTransformer) -> dict:
    """
    Return the key dimensions that PSformer encoder needs
    """
    return {
        'C': transformer.config.segment_length,      # This becomes the feature dimension for attention
        'N': transformer.config.num_patches,         # This becomes the sequence dimension for attention
        'segment_shape': (None, transformer.config.num_patches, transformer.config.segment_length)  # Final shape for PSformer input
    }

In [None]:
# ========== ATTENTION MECHANISM IMPLEMENTATION ==========
class ScaledDotProductAttention(nn.Module):
    """
    Scaled Dot-Product Attention mechanism as described in the Attention Is All You Need paper.
    Implements: Attention(Q, K, V) = softmax(QK^T / sqrt(dk)) * V
    """
    
    def __init__(self, dropout_rate: float = 0.0):
        """
        Args:
            dropout_rate: Dropout rate for attention weights
        """
        super().__init__()
        self.dropout = nn.Dropout(dropout_rate) if dropout_rate > 0 else None
        self.scale = None
    
    def forward(self, Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Compute scaled dot-product attention.
        """
        # Validate input dimensions
        if Q.dim() != 3 or K.dim() != 3 or V.dim() != 3:
            raise ValueError("Q, K, and V must all be 3-dimensional tensors")
        
        if Q.shape[0] != K.shape[0] or Q.shape[0] != V.shape[0]:
            raise ValueError("Batch dimensions of Q, K, and V must match")
            
        if K.shape[1] != V.shape[1]:
            raise ValueError("Number of keys in K must match number of values in V")
            
        if Q.shape[2] != K.shape[2]:
            raise ValueError("Embedding dimensions of Q and K must match")
        
        # Get the embedding dimension
        dk = K.shape[2]
        self.scale = torch.sqrt(torch.tensor(dk, dtype=torch.float32, device=Q.device))
        
        # Compute attention scores: Q @ K^T
        scores = torch.matmul(Q, K.transpose(-2, -1))  # [batch, num_queries, num_keys]
        
        # Scale the scores
        scaled_scores = scores / self.scale
        
        # Apply softmax to get attention weights
        attention_weights = F.softmax(scaled_scores, dim=-1)
        
        # Apply dropout if specified
        if self.dropout is not None:
            attention_weights = self.dropout(attention_weights)
        
        # Apply attention to values
        output = torch.matmul(attention_weights, V)  # [batch, num_queries, dv]
        
        return output, attention_weights


class SegmentAttentionStage(nn.Module):
    """
    Single stage of segment attention using a shared PS Block to generate Q, K, V.
    """
    
    def __init__(self, ps_block: PSBlock, use_dropout: bool = False):
        """
        Args:
            ps_block: Shared PS Block used to generate Q, K, V
            use_dropout: Whether to use dropout in the attention mechanism
        """
        super().__init__()
        self.ps_block = ps_block
        self.attention = ScaledDotProductAttention(dropout_rate=0.1 if use_dropout else 0.0)
    
    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Forward pass through the segment attention stage.
        """
        # Validate input shape
        if x.dim() != 3:
            raise ValueError(f"Input tensor must be 3-dimensional, got {x.dim()}")
        
        batch, N, C = x.shape
        
        # Generate Q, K, V using the shared PS Block
        # In PSformer, all three come from the same PS Block output
        ps_output = self.ps_block(x)  # [batch, N, C]
        
        # According to paper: attention operates across C dimension (spatial-temporal features)
        # Transpose to [batch, C, N] so attention is computed across C dimension
        Q = ps_output.transpose(-2, -1).contiguous()  # [batch, C, N]
        K = ps_output.transpose(-2, -1).contiguous()  # [batch, C, N] 
        V = ps_output.transpose(-2, -1).contiguous()  # [batch, C, N]
        
        # Apply attention - this will create [batch, C, C] attention matrix
        attention_output, attention_weights = self.attention(Q, K, V)
        
        # Transpose back to [batch, N, C] to maintain output format
        attention_output = attention_output.transpose(-2, -1).contiguous()  # [batch, N, C]
        
        return attention_output, attention_weights


class TwoStageSegmentAttention(nn.Module):
    """
    Two-stage segment attention mechanism as described in the PSformer paper.
    """
    
    def __init__(self, ps_block: PSBlock):
        """
        Args:
            ps_block: Shared PS Block used across both attention stages
        """
        super().__init__()
        self.ps_block = ps_block  # Single shared PS Block
        self.stage1 = SegmentAttentionStage(ps_block)
        self.stage2 = SegmentAttentionStage(ps_block)
        self.activation = nn.ReLU()
    
    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        """
        Forward pass through the two-stage segment attention.
        """
        # Validate input shape
        if x.dim() != 3:
            raise ValueError(f"Input tensor must be 3-dimensional, got {x.dim()}")
        
        # Stage 1
        stage1_output, stage1_weights = self.stage1(x)
        
        # ReLU activation between stages
        activated_output = self.activation(stage1_output)
        
        # Stage 2
        stage2_output, stage2_weights = self.stage2(activated_output)
        
        return stage2_output, (stage1_weights, stage2_weights)


class PSformerEncoderLayer(nn.Module):
    """
    Single layer of the PSformer encoder.
    """
    
    def __init__(self, ps_block: PSBlock):
        """
        Args:
            ps_block: Shared PS Block used in all components of this layer
        """
        super().__init__()
        self.ps_block = ps_block  # Shared across all components
        self.two_stage_attention = TwoStageSegmentAttention(ps_block)
        self.final_ps_block = ps_block  # Same instance for final transformation
    
    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        """
        Forward pass through the PSformer encoder layer.
        """
        # Validate input shape
        if x.dim() != 3:
            raise ValueError(f"Input tensor must be 3-dimensional, got {x.dim()}")
        
        batch, N, C = x.shape
        
        # Two-stage attention
        attention_output, attention_weights = self.two_stage_attention(x)
        
        # Residual connection
        residual_output = attention_output + x
        
        # Final PS Block processing
        output = self.final_ps_block(residual_output)
        
        return output, attention_weights


class PSformerEncoder(nn.Module):
    """
    Complete PSformer encoder with multiple layers.
    """
    
    def __init__(self, num_layers: int, segment_length: int):
        """
        Args:
            num_layers: Number of encoder layers
            segment_length: Length of each segment (C = M * P)
        """
        super().__init__()
        # Each layer has its own PS Block
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            ps_block = PSBlock(N=segment_length)
            encoder_layer = PSformerEncoderLayer(ps_block)
            self.layers.append(encoder_layer)
    
    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, list]:
        """
        Forward pass through the PSformer encoder.
        """
        # Validate input shape
        if x.dim() != 3:
            raise ValueError(f"Input tensor must be 3-dimensional, got {x.dim()}")
        
        attention_weights_list = []
        
        # Process through each layer
        for layer in self.layers:
            x, weights = layer(x)
            attention_weights_list.append(weights)
        
        return x, attention_weights_list
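
The formula softmax(QK^T / sqrt(dk)) V used by `ScaledDotProductAttention` can be traced by hand on a tiny input. The sketch below uses plain Python lists for a single sample with no batch dimension and no dropout; the function names `softmax` and `attention` are illustrative and not part of the notebook.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for a single sample."""
    scale = math.sqrt(len(k[0]))  # sqrt(dk)
    weights = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the rows of v
    output = [
        [sum(w * v_row[j] for w, v_row in zip(w_row, v)) for j in range(len(v[0]))]
        for w_row in weights
    ]
    return output, weights


q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
output, weights = attention(q, k, v)

for w_row in weights:
    print(round(sum(w_row), 6))  # each row of weights sums to 1
```

Because each query matches its own key best, the first output row lies closer to the first row of `v` than to the second.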

In [None]:
# ========== PSFORMER MAIN MODEL IMPLEMENTATION ==========
class PSformerConfig:
    """
    Configuration class for PSformer model parameters
    """
    def __init__(self, 
                 sequence_length: int,
                 num_variables: int, 
                 patch_size: int,
                 num_encoder_layers: int,
                 prediction_length: int,
                 affine_revin: bool = True,
                 revin_eps: float = 1e-5):
        """
        Args:
            sequence_length: Total input sequence length (L)
            num_variables: Number of time series variables (M)
            patch_size: Size of each temporal patch (P)
            num_encoder_layers: Number of PSformer encoder layers
            prediction_length: Length of prediction horizon (F)
            affine_revin: Whether to use learnable affine parameters in RevIN
            revin_eps: Small value for numerical stability in RevIN
        """
        self.sequence_length = sequence_length
        self.num_variables = num_variables
        self.patch_size = patch_size
        self.num_encoder_layers = num_encoder_layers
        self.prediction_length = prediction_length
        self.affine_revin = affine_revin
        self.revin_eps = revin_eps
        
        # Validate configuration
        self._validate()
    
    def _validate(self):
        """Validate configuration parameters"""
        if self.sequence_length % self.patch_size != 0:
            raise ValueError(f"Sequence length {self.sequence_length} must be divisible by patch size {self.patch_size}")
        if self.num_variables <= 0:
            raise ValueError(f"Number of variables must be positive, got {self.num_variables}")
        if self.patch_size <= 0:
            raise ValueError(f"Patch size must be positive, got {self.patch_size}")
        if self.num_encoder_layers <= 0:
            raise ValueError(f"Number of encoder layers must be positive, got {self.num_encoder_layers}")
        if self.prediction_length <= 0:
            raise ValueError(f"Prediction length must be positive, got {self.prediction_length}")


class PSformer(nn.Module):
    """
    Main PSformer model implementing the complete input processing pipeline.
    
    Architecture: Raw Input → RevIN Normalization → Data Transformation → PSformer Encoder
    
    Based on the PSformer paper architecture described in Section 3.2 and Figures 1 & 2.
    """
    
    def __init__(self, config: PSformerConfig):
        """
        Initialize PSformer model with all components.
        
        Args:
            config: PSformerConfig instance containing model parameters
        """
        super().__init__()
        self.config = config
        
        # 1. Instantiate the Reversible Instance Normalization (RevIN) layer
        # It normalizes each variable independently
        self.revin_layer = RevIN(
            num_features=config.num_variables,
            eps=config.revin_eps,
            affine=config.affine_revin
        )
        
        # 2. Instantiate the Data Transformer
        # This handles patching and segment creation
        self.data_transformer = create_transformer_for_psformer(
            sequence_length=config.sequence_length,
            num_variables=config.num_variables,
            patch_size=config.patch_size
        )
        
        # 3. Get key dimensions from the data transformer
        # This ensures the encoder is built with correct C and N dimensions
        psformer_dims = get_psformer_dimensions(self.data_transformer)
        # C = segment_length = M * P
        # N = num_patches = L / P
        
        # 4. Instantiate the PSformer Encoder
        # The encoder uses the segment_length (C) as the feature dimension for attention
        self.encoder = PSformerEncoder(
            num_layers=config.num_encoder_layers,
            segment_length=psformer_dims['C']
        )
        
        # 5. Add the final linear projection layer for forecasting
        # This layer maps the sequence length (L) to the prediction length (F)
        # Paper reference: X_pred = X_out * W_F where W_F ∈ R^(L×F)
        self.output_projection = nn.Linear(
            in_features=config.sequence_length,
            out_features=config.prediction_length
        )
        
        # Store dimensions for easy access
        self.psformer_dims = psformer_dims
    
    def forward(self, raw_input_tensor: torch.Tensor) -> torch.Tensor:
        """
        Forward pass implementing the complete pipeline: 
        Input -> Normalization -> Transformation -> Encoder -> Inverse Transformation -> Projection -> Inverse Normalization -> Output
        
        Args:
            raw_input_tensor: Input tensor of shape [batch_size, num_variables, sequence_length]
                             or [batch, M, L] as per the paper notation
        
        Returns:
            final_predictions: Tensor of shape [batch, M, F] where F is prediction_length
        """
        # Validate input tensor
        self._validate_input(raw_input_tensor)
        
        # ----- START OF INPUT PROCESSING PIPELINE -----
        
        # STEP 1: NORMALIZATION
        # Apply RevIN layer in 'norm' mode to the raw input
        # This is the first step shown in Figure 1 and mentioned in Section 3.2
        # The statistics (mean, stdev) are automatically stored inside self.revin_layer
        normalized_input = self.revin_layer(raw_input_tensor, mode='norm')
        
        # STEP 2: DATA TRANSFORMATION (Patching & Segmenting)
        # Use the data_transformer's forward_transform method
        # This transforms the input from [batch, M, L] to [batch, N, C]
        encoder_ready_data = self.data_transformer.forward_transform(normalized_input)
        
        # STEP 3: ENCODER PROCESSING
        # Feed the prepared data into the encoder
        # The encoder returns its final output and a list of attention weights
        encoder_output, attention_weights_list = self.encoder(encoder_ready_data)
        
        # ----- END OF INPUT PROCESSING, START OF OUTPUT PIPELINE -----
        
        # STEP 4.1: INVERSE DATA TRANSFORMATION
        # Description: Convert the encoder's segmented output back to time series format.
        # Paper reference: "Inverse Transformation" block in Fig 1 & 2.
        reshaped_output = self.data_transformer.inverse_transform(
            encoder_output, 
            self.config.sequence_length
        )  # Shape: [B, M, L]
        
        # STEP 4.2: LINEAR PROJECTION FOR FORECASTING
        # Description: Project the restored sequence to the desired prediction horizon.
        # Paper reference: "Linear Mapping" (Fig 1) or WF matrix multiplication (Section 3.2).
        projected_output = self.output_projection(reshaped_output)  # Shape: [B, M, F]
        
        # STEP 4.3: INVERSE NORMALIZATION (RevIN Denorm)
        # Description: Scale the forecast back to the original data distribution.
        # Paper reference: "Inverse RevIN" (Fig 1) or "RevIN⁻¹" (Fig 2).
        final_predictions = self.revin_layer(projected_output, mode='denorm')  # Shape: [B, M, F]
        
        return final_predictions
    
    def _validate_input(self, input_tensor: torch.Tensor):
        """
        Validate the input tensor shape and properties.
        """
        if input_tensor.dim() != 3:
            raise ValueError(f"Input tensor must be 3-dimensional [batch, variables, sequence], got {input_tensor.dim()}D tensor")
        
        batch, variables, sequence = input_tensor.shape
        
        if variables != self.config.num_variables:
            raise ValueError(f"Input variables count {variables} does not match configured count {self.config.num_variables}")
        
        if sequence != self.config.sequence_length:
            raise ValueError(f"Input sequence length {sequence} does not match configured length {self.config.sequence_length}")


def create_psformer_model(sequence_length: int, 
                         num_variables: int, 
                         patch_size: int, 
                         num_encoder_layers: int,
                         prediction_length: int,
                         **kwargs) -> PSformer:
    """
    Factory function to create a PSformer model with default configuration.
    """
    config = PSformerConfig(
        sequence_length=sequence_length,
        num_variables=num_variables,
        patch_size=patch_size,
        num_encoder_layers=num_encoder_layers,
        prediction_length=prediction_length,
        **kwargs
    )
    return PSformer(config)

print("✅ PSformer model implementation completed!")

# Helper Function for Data Splitting

This function separates the time series of each stock into two distinct parts:
1. **Model Input:** The historical data the model will use to make a forecast
2. **Validation Ground Truth:** The most recent `PREDICTION_LENGTH` rows of data, which the model will not see
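
The index arithmetic behind this split can be checked on a toy frame. This is a minimal sketch: the values of `SEQUENCE_LENGTH` and `PREDICTION_LENGTH` below are hypothetical small numbers chosen for illustration, and the real values come from the configuration cell.

```python
import pandas as pd

# Hypothetical small values for illustration only.
SEQUENCE_LENGTH = 6
PREDICTION_LENGTH = 3

toy = pd.DataFrame({
    "Date": pd.date_range("2023-01-02", periods=10, freq="B"),  # 10 business days
    "Close": [float(v) for v in range(100, 110)],
})

end_idx = len(toy) - PREDICTION_LENGTH   # first row of the hold-out window
start_idx = end_idx - SEQUENCE_LENGTH    # first row of the model input

model_input = toy.iloc[start_idx:end_idx]
ground_truth = toy.tail(PREDICTION_LENGTH)

print(len(model_input), len(ground_truth))
# The two windows are adjacent and do not overlap.
print(model_input["Date"].max() < ground_truth["Date"].min())
```

Rows before `start_idx` are simply unused, so a ticker needs at least `SEQUENCE_LENGTH + PREDICTION_LENGTH` rows for the split to succeed.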

In [None]:
def prepare_ticker_data(ticker_df: pd.DataFrame) -> Tuple[Optional[pd.DataFrame], Optional[pd.DataFrame]]:
    """
    Prepare data for a single ticker by splitting into model input and validation sets.
    
    Args:
        ticker_df: DataFrame containing data for a single ticker
        
    Returns:
        Tuple of (model_input_df, validation_ground_truth_df) or (None, None) if insufficient data
    """
    # Check if there's enough data
    if len(ticker_df) < MIN_DATA_POINTS:
        return None, None
    
    # Sort by date to ensure chronological order
    ticker_df = ticker_df.sort_values(DATE_COLUMN).reset_index(drop=True)
    
    # Split the data
    # Validation set: last PREDICTION_LENGTH rows
    validation_ground_truth = ticker_df.tail(PREDICTION_LENGTH).copy()
    
    # Model input set: SEQUENCE_LENGTH rows just before the validation set
    end_idx = len(ticker_df) - PREDICTION_LENGTH
    start_idx = end_idx - SEQUENCE_LENGTH
    
    if start_idx < 0:
        return None, None
    
    model_input = ticker_df.iloc[start_idx:end_idx].copy()
    
    return model_input, validation_ground_truth


def prepare_tensor_from_dataframe(df: pd.DataFrame) -> torch.Tensor:
    """
    Convert DataFrame to PyTorch tensor in the format expected by PSformer.
    
    Args:
        df: DataFrame with OHLCV columns
        
    Returns:
        Tensor of shape [1, num_variables, sequence_length]
    """
    # Extract OHLCV values
    values = df[OHLCV_COLUMNS].values  # Shape: [sequence_length, num_variables]
    
    # Convert to tensor and transpose
    tensor = torch.tensor(values, dtype=torch.float32)  # [sequence_length, num_variables]
    tensor = tensor.transpose(0, 1)  # [num_variables, sequence_length]
    tensor = tensor.unsqueeze(0)  # [1, num_variables, sequence_length]
    
    return tensor.to(DEVICE)


def tensor_to_dataframe(tensor: torch.Tensor, dates: pd.DatetimeIndex, ticker: str) -> pd.DataFrame:
    """
    Convert prediction tensor back to DataFrame format.
    
    Args:
        tensor: Prediction tensor of shape [1, num_variables, prediction_length]
        dates: Date index for the predictions
        ticker: Ticker symbol
        
    Returns:
        DataFrame with prediction results
    """
    # Convert tensor to numpy and transpose
    predictions = tensor.squeeze(0).cpu().numpy()  # [num_variables, prediction_length]
    predictions = predictions.transpose()  # [prediction_length, num_variables]
    
    # Create DataFrame
    df = pd.DataFrame(predictions, columns=[f"{col}_predicted" for col in OHLCV_COLUMNS])
    df[DATE_COLUMN] = dates
    df[TICKER_COLUMN] = ticker
    
    return df

print("✅ Data preparation functions ready!")

# Run Prediction and Validation Loop

This loop performs a backtest for each ticker by forecasting the hold-out period and comparing it to the actual known data.
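
Forecasts and actual values are joined on the date column with an inner merge, so the dates attached to each forecast step decide which rows survive. The sketch below uses a hypothetical five-day hold-out window that spans a weekend and compares calendar-day stamps with the hold-out window's own dates. All data and column values here are made up for illustration.

```python
import pandas as pd

# Hypothetical hold-out window: five trading days spanning a weekend.
actual = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2023-01-05", "2023-01-06", "2023-01-09", "2023-01-10", "2023-01-11"]
    ),
    "Close_actual": [10.0, 11.0, 12.0, 13.0, 14.0],
})
forecast_values = [10.5, 11.5, 12.5, 13.5, 14.5]  # one value per forecast step

# Calendar-day stamps include Saturday and Sunday, so they drift off the trading days.
calendar = pd.DataFrame({
    "Date": pd.date_range(start=actual["Date"].iloc[0], periods=5, freq="D"),
    "Close_predicted": forecast_values,
})

# Reusing the hold-out dates keeps every forecast step paired with its actual row.
aligned = pd.DataFrame({
    "Date": actual["Date"].to_numpy(),
    "Close_predicted": forecast_values,
})

merged_calendar = pd.merge(actual, calendar, on="Date", how="inner")
merged_aligned = pd.merge(actual, aligned, on="Date", how="inner")

print(len(merged_calendar), len(merged_aligned))
```

With calendar-day stamps the merge silently drops rows, and the surviving Monday row pairs the third actual value with the fifth forecast step. Both effects would distort any error metric computed afterwards.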

In [None]:
# Load the main CSV file
try:
    print(f"Loading data from {DATA_FILE_PATH}...")
    df = pd.read_csv(DATA_FILE_PATH)
    print(f"Data loaded successfully! Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    
    # Convert date column to datetime
    df[DATE_COLUMN] = pd.to_datetime(df[DATE_COLUMN])
    
    # Get unique tickers
    unique_tickers = df[TICKER_COLUMN].unique()
    print(f"Found {len(unique_tickers)} unique tickers: {list(unique_tickers)[:10]}{'...' if len(unique_tickers) > 10 else ''}")
    
except Exception as e:
    print(f"❌ Error loading data: {e}")
    print("Please make sure your CSV file is uploaded and the path is correct.")
    raise

# Initialize results storage
all_validation_results = []
successful_predictions = 0
skipped_tickers = []

print(f"\n🚀 Starting prediction loop for {len(unique_tickers)} tickers...")
print(f"Minimum data requirement: {MIN_DATA_POINTS} days\n")

# Loop through each ticker
for i, ticker in enumerate(unique_tickers):
    print(f"[{i+1}/{len(unique_tickers)}] Processing {ticker}...", end=" ")
    
    try:
        # Get data for current ticker
        ticker_data = df[df[TICKER_COLUMN] == ticker].copy()
        
        # Prepare data splits
        model_input, ground_truth = prepare_ticker_data(ticker_data)
        
        # Check if we have enough data
        if model_input is None or ground_truth is None:
            print(f"❌ Insufficient data ({len(ticker_data)} days)")
            skipped_tickers.append(ticker)
            continue
        
        # Create PSformer model (with random weights)
        config = PSformerConfig(
            sequence_length=SEQUENCE_LENGTH,
            num_variables=NUM_VARIABLES,
            patch_size=PATCH_SIZE,
            num_encoder_layers=NUM_ENCODER_LAYERS,
            prediction_length=PREDICTION_LENGTH
        )
        model = PSformer(config).to(DEVICE)
        model.eval()  # Set to evaluation mode
        
        # Prepare input tensor
        input_tensor = prepare_tensor_from_dataframe(model_input)
        
        # Make prediction
        with torch.no_grad():
            prediction_tensor = model(input_tensor)
        
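        # Reuse the hold-out dates so each forecast step lines up with its ground-truth row.
        # A calendar-day range (freq='D') would include weekends and holidays, so the inner
        # merge below would drop rows and pair forecast steps with the wrong trading days.
        prediction_dates = pd.DatetimeIndex(ground_truth[DATE_COLUMN])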
        
        # Convert prediction to DataFrame
        prediction_df = tensor_to_dataframe(prediction_tensor, prediction_dates, ticker)
        
        # Prepare ground truth for comparison
        ground_truth_comparison = ground_truth[[DATE_COLUMN, TICKER_COLUMN] + OHLCV_COLUMNS].copy()
        ground_truth_comparison.columns = [DATE_COLUMN, TICKER_COLUMN] + [f"{col}_actual" for col in OHLCV_COLUMNS]
        
        # Merge predictions with ground truth
        comparison_df = pd.merge(ground_truth_comparison, prediction_df, on=[DATE_COLUMN, TICKER_COLUMN], how='inner')
        
        # Add to results
        all_validation_results.append(comparison_df)
        successful_predictions += 1
        
        print(f"✅ Success ({len(model_input)} -> {len(ground_truth)} days)")
        
    except Exception as e:
        print(f"❌ Error: {e}")
        skipped_tickers.append(ticker)
        continue

print(f"\n📊 Prediction Summary:")
print(f"- Successful predictions: {successful_predictions}")
print(f"- Skipped tickers: {len(skipped_tickers)}")
if skipped_tickers:
    print(f"- Skipped: {skipped_tickers[:5]}{'...' if len(skipped_tickers) > 5 else ''}")

# Analyze and Visualize Validation Results

This final section brings all the results together and provides both quantitative and visual analysis of the model's performance on the hold-out data.
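
The per-ticker MAE can also be computed with a `groupby`. The sketch below uses a hypothetical two-ticker comparison table with the same `_actual` / `_predicted` column naming. The percentage error at the end is not part of this notebook's analysis; it is shown as one possible way to compare tickers that trade at very different price levels.

```python
import pandas as pd

# Hypothetical comparison table; tickers and values are made up.
results = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "BBB", "BBB"],
    "Close_actual": [10.0, 12.0, 100.0, 104.0],
    "Close_predicted": [11.0, 11.0, 90.0, 110.0],
})

abs_err = (results["Close_actual"] - results["Close_predicted"]).abs()

# Mean absolute error per ticker, in price units.
mae = abs_err.groupby(results["Ticker"]).mean()

# Assumption: a scale-free alternative (mean absolute percentage error).
mape = (abs_err / results["Close_actual"].abs()).groupby(results["Ticker"]).mean() * 100

print(mae)
print(mape.round(2))
```

In this toy table `BBB` has the larger MAE only because its price is ten times higher; on a percentage basis it is the closer of the two. A ranking of tickers by raw MAE should be read with that in mind.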

In [None]:
if len(all_validation_results) == 0:
    print("❌ No successful predictions to analyze!")
else:
    # ========== CONSOLIDATE RESULTS ==========
    print("📈 Consolidating results...")
    consolidated_results = pd.concat(all_validation_results, ignore_index=True)
    print(f"Total predictions: {len(consolidated_results)} rows")
    print(f"Date range: {consolidated_results[DATE_COLUMN].min()} to {consolidated_results[DATE_COLUMN].max()}")
    
    # ========== QUANTITATIVE ANALYSIS ==========
    print("\n📊 Calculating error metrics...")
    
    # Calculate MAE for each ticker and feature
    error_metrics = []
    
    for ticker in consolidated_results[TICKER_COLUMN].unique():
        ticker_data = consolidated_results[consolidated_results[TICKER_COLUMN] == ticker]
        
        ticker_metrics = {'Ticker': ticker}
        
        for feature in OHLCV_COLUMNS:
            actual_col = f"{feature}_actual"
            predicted_col = f"{feature}_predicted"
            
            if actual_col in ticker_data.columns and predicted_col in ticker_data.columns:
                mae = np.mean(np.abs(ticker_data[actual_col] - ticker_data[predicted_col]))
                ticker_metrics[f"{feature}_MAE"] = mae
        
        error_metrics.append(ticker_metrics)
    
    error_df = pd.DataFrame(error_metrics)
    
    # Display summary statistics
    print("\n📋 Error Summary (Mean Absolute Error):")
    print("=" * 50)
    
    for feature in OHLCV_COLUMNS:
        mae_col = f"{feature}_MAE"
        if mae_col in error_df.columns:
            mean_mae = error_df[mae_col].mean()
            std_mae = error_df[mae_col].std()
            print(f"{feature:>6}: {mean_mae:.4f} ± {std_mae:.4f}")
    
    # Show top 5 and bottom 5 performers for Close price
    if 'Close_MAE' in error_df.columns:
        print("\n🏆 Best Close Price Predictions (Lowest MAE):")
        best_performers = error_df.nsmallest(5, 'Close_MAE')[['Ticker', 'Close_MAE']]
        print(best_performers.to_string(index=False))
        
        print("\n📉 Worst Close Price Predictions (Highest MAE):")
        worst_performers = error_df.nlargest(5, 'Close_MAE')[['Ticker', 'Close_MAE']]
        print(worst_performers.to_string(index=False))
    
    # ========== VISUAL ANALYSIS ==========
    print("\n📊 Creating visualizations...")
    
    # Select a few tickers for detailed visualization
    sample_tickers = consolidated_results[TICKER_COLUMN].unique()[:3]  # First 3 tickers
    
    # Create subplot for each ticker
    fig, axes = plt.subplots(len(sample_tickers), 1, figsize=(12, 4 * len(sample_tickers)))
    if len(sample_tickers) == 1:
        axes = [axes]
    
    for i, ticker in enumerate(sample_tickers):
        ticker_data = consolidated_results[consolidated_results[TICKER_COLUMN] == ticker]
        ticker_data = ticker_data.sort_values(DATE_COLUMN)
        
        ax = axes[i]
        
        # Plot actual vs predicted Close prices
        ax.plot(ticker_data[DATE_COLUMN], ticker_data['Close_actual'], 
                label='Actual Close', color='blue', linewidth=2, marker='o')
        ax.plot(ticker_data[DATE_COLUMN], ticker_data['Close_predicted'], 
                label='Predicted Close', color='red', linewidth=2, marker='s')
        
        ax.set_title(f'{ticker} - Actual vs Predicted Close Price', fontsize=14, fontweight='bold')
        ax.set_xlabel('Date')
        ax.set_ylabel('Price')
        ax.legend()
        ax.grid(True, alpha=0.3)
        
        # Rotate x-axis labels for better readability
        ax.tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()
    
    # Interactive plot with Plotly (if available)
    try:
        print("\n📊 Creating interactive plot...")
        
        # Select one ticker for detailed interactive analysis
        sample_ticker = sample_tickers[0]
        sample_data = consolidated_results[consolidated_results[TICKER_COLUMN] == sample_ticker].sort_values(DATE_COLUMN)
        
        fig_plotly = go.Figure()
        
        # Add actual prices
        fig_plotly.add_trace(go.Scatter(
            x=sample_data[DATE_COLUMN],
            y=sample_data['Close_actual'],
            mode='lines+markers',
            name='Actual Close',
            line=dict(color='blue', width=2)
        ))
        
        # Add predicted prices
        fig_plotly.add_trace(go.Scatter(
            x=sample_data[DATE_COLUMN],
            y=sample_data['Close_predicted'],
            mode='lines+markers',
            name='Predicted Close',
            line=dict(color='red', width=2, dash='dash')
        ))
        
        fig_plotly.update_layout(
            title=f'{sample_ticker} - Interactive Actual vs Predicted Close Price',
            xaxis_title='Date',
            yaxis_title='Price',
            hovermode='x unified'
        )
        
        fig_plotly.show()
        
    except Exception as e:
        print(f"Note: Interactive plot not available: {e}")
    
    # ========== SAVE RESULTS ==========
    print("\n💾 Saving results...")
    
    # Save consolidated validation results
    validation_filename = 'validation_results.csv'
    consolidated_results.to_csv(validation_filename, index=False)
    print(f"✅ Validation results saved to: {validation_filename}")
    
    # Save error metrics
    error_filename = 'error_metrics.csv'
    error_df.to_csv(error_filename, index=False)
    print(f"✅ Error metrics saved to: {error_filename}")
    
    print("\n🎉 Analysis complete!")
    print(f"\n📈 Summary:")
    print(f"- Processed {len(sample_tickers)} tickers for visualization")
    print(f"- Total validation samples: {len(consolidated_results)}")
    print(f"- Files saved: {validation_filename}, {error_filename}")
    
    # Display sample of results
    print("\n📋 Sample Results:")
    print(consolidated_results.head(10).to_string(index=False))

# Summary and Next Steps

## What We Accomplished

✅ **Model Implementation**: Complete PSformer architecture with all components  
✅ **Multi-ticker Processing**: Automated prediction pipeline for multiple stocks  
✅ **Backtest Validation**: Hold-out split that keeps the last `PREDICTION_LENGTH` rows of each ticker unseen by the model (no training is performed)  
✅ **Error Analysis**: Quantitative metrics (MAE) for each ticker and feature  
✅ **Visualization**: Actual vs predicted price comparisons  
✅ **Data Export**: Results saved for further analysis  

## Important Notes

⚠️ **Random Weights**: This model uses untrained, random weights for demonstration purposes  
⚠️ **Training Required**: For real-world use, the model needs proper training on historical data  
⚠️ **Validation Results**: Current predictions come from an untrained model, are effectively random, and should not be used for trading  

## Next Steps for Real Implementation

1. **Data Collection**: Gather comprehensive historical stock data
2. **Model Training**: Implement proper training loop with loss functions
3. **Hyperparameter Tuning**: Optimize sequence length, patch size, etc.
4. **Cross-validation**: Implement rolling window validation
5. **Feature Engineering**: Add technical indicators, market sentiment
6. **Risk Management**: Implement position sizing and stop-loss mechanisms
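
Steps 2 and 4 both need many (input, target) pairs cut from each ticker's history rather than the single split used in this notebook. The sketch below is one possible way to generate them with a rolling window. The helper `rolling_windows` and its `stride` parameter are hypothetical and not part of the code above.

```python
import numpy as np

def rolling_windows(series, sequence_length, prediction_length, stride=1):
    """Yield (input, target) pairs from an array of shape [time, variables].

    Each target window starts right after its input window, so no target
    value is ever visible in the corresponding input.
    """
    last_start = len(series) - sequence_length - prediction_length
    for start in range(0, last_start + 1, stride):
        split = start + sequence_length
        yield series[start:split], series[split:split + prediction_length]

# Toy series: 10 time steps, 2 variables.
data = np.arange(20, dtype=np.float32).reshape(10, 2)

pairs = list(rolling_windows(data, sequence_length=4, prediction_length=2, stride=2))

print(len(pairs))
print(pairs[0][0].shape, pairs[0][1].shape)
```

For training, the pairs from earlier dates would feed the optimizer while later ones are held back for evaluation. Transposing each pair to `[variables, time]` matches the layout the model expects.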

## Files Generated

- `validation_results.csv`: Detailed predictions vs actual values
- `error_metrics.csv`: Performance metrics by ticker
- This notebook: Complete implementation ready for training

---

*This notebook demonstrates the PSformer architecture and provides a foundation for building a production-ready stock forecasting system.*