
## Project Proposal Summary

**Objective:**  
Develop a deep learning model using a Transformer architecture to predict the distribution of future stock returns. The model's outputs will be used to inform actionable trading decisions through a simulated trading strategy.

**Data Sources:**  
- **Stock Data:**  
  - IBM daily OHLCV data (from 1998–2025) via Yahoo Finance.  
- **Sector and Market Data:**  
  - Major sector ETFs (e.g., XLK, XLF, XLV, XLY, etc.)  
  - S&P 500 Index, VIX, and 10-Year Treasury Yield  
- **Macroeconomic Indicators:**  
  - CPI and Unemployment Rate from FRED (resampled to daily frequency)

**Feature Engineering:**  
- **Technical Indicators:**  
  - Momentum: RSI, MACD, ROC  
  - Volatility: 20-day Rolling Standard Deviation, ATR, Bollinger Bands  
  - Trend: SMA and EMA (20-day, 50-day)  
  - Volume-Based: OBV, MFI  
- **Returns:**  
  - Daily log returns, 5-day and 10-day returns, and rolling mean of returns

**Modeling Task:**  
- **Target Definition Options:**  
  - **Parametric Approach:** Predict parameters (mean and variance) assuming a Gaussian distribution.  
  - **Quantile Regression:** Predict multiple quantiles (e.g., 10th, 50th, 90th percentiles) to capture the full return distribution.
- **Architecture:**  
  - A Transformer model that processes input sequences (e.g., 30-day rolling windows) and outputs the chosen target (either distribution parameters or quantiles).
- **Loss Functions:**  
  - Negative log-likelihood loss for the parametric model or quantile loss (pinball loss) for quantile regression.

**Evaluation:**  
- **Standard Metrics:**  
  - Use MAE, RMSE, and calibration metrics to assess prediction accuracy.
- **Trading Simulation:**  
  - Implement a backtesting simulator that uses the predicted distribution to generate trading signals.  
  - Evaluate performance using portfolio metrics such as total return, Sharpe ratio, and maximum drawdown.
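
The portfolio metrics named above can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's backtester: the function name `backtest_metrics`, the zero risk-free rate, the 252-day annualization factor, and the sample returns are all assumptions made for the example.

```python
import math
import statistics


def backtest_metrics(daily_returns, periods_per_year=252):
    """Return (total return, annualized Sharpe ratio, maximum drawdown).

    Assumes simple daily strategy returns and a zero risk-free rate.
    """
    # Build the equity curve from an initial capital of 1.0.
    equity = []
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + r
        equity.append(value)
    total_return = equity[-1] - 1.0

    # Annualized Sharpe ratio using the sample standard deviation.
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)
    sharpe = math.sqrt(periods_per_year) * mean / std if std > 0 else float("nan")

    # Maximum drawdown: worst decline from a running peak (a non-positive number).
    peak = 1.0
    max_drawdown = 0.0
    for v in equity:
        peak = max(peak, v)
        max_drawdown = min(max_drawdown, v / peak - 1.0)

    return total_return, sharpe, max_drawdown


# Made-up daily returns, chosen only to make the drawdown easy to follow.
total, sharpe, mdd = backtest_metrics([0.10, -0.50, 0.20])
print(total, sharpe, mdd)
```

Here the equity curve is 1.10, 0.55, 0.66, so the drawdown is measured from the 1.10 peak to the 0.55 trough.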

**Workflow Overview:**  
1. **Data Collection & Preprocessing:**  
   Download and merge IBM, sector ETFs, market indices, and macroeconomic data.
2. **Feature Engineering:**  
   Compute technical indicators and lagged return features from the OHLCV data.
3. **Sequence Preparation:**  
   Create sliding windows from the daily data to form input sequences.
4. **Model Training:**  
   Train a Transformer model to predict the future return distribution.
5. **Evaluation & Simulation:**  
   Assess model performance with statistical metrics and simulate trading performance in a realistic backtest.


## Data Collection

### Date Range  
- **Period:** December 22, 1998 to April 2, 2025  
  This range covers more than 26 years of trading data and spans multiple market regimes (e.g., the dot-com bubble, the 2008 financial crisis, and the subsequent recoveries), which is important for building a robust model.

### Data Sources

1. **IBM Stock Data**  
   - **Source:** Yahoo Finance  
   - **Content:** Daily OHLCV (Open, High, Low, Close, Volume) for IBM  
   - **Relevance:**  
     IBM serves as our primary asset—it's a well-established blue-chip stock with a long trading history. We focus on IBM as the core subject for predicting return distributions.

2. **Sector ETFs**  
   - **Source:** Yahoo Finance  
   - **Tickers:** XLK, XLF, XLV, XLY, XLP, XLE, XLI, XLU, XLB  
   - **Relevance:**  
     These ETFs represent major industry sectors (e.g., Technology, Financials, Health Care, Consumer Discretionary, Energy, etc.). They provide additional signals about the broader economic and sector-specific trends that might influence IBM’s performance. For instance, if the technology sector (XLK) is outperforming, it could be a positive signal for IBM.

3. **Market Indices and Macro Signals**  
   - **S&P 500 (Ticker: ^GSPC):**  
     Serves as a benchmark for the overall market performance.  
   - **VIX (Ticker: ^VIX):**  
     Measures market volatility, often referred to as the “fear index,” and provides insights into market sentiment.  
   - **10-Year Treasury Yield (Ticker: ^TNX):**  
     Reflects prevailing interest rates and risk-free returns, influencing investor behavior and equity valuations.
   
   All are obtained from Yahoo Finance, complementing the stock and sector data by offering a broader market context.

4. **Macroeconomic Indicators (from FRED)**  
   - **CPI (Consumer Price Index):**  
     Represents inflation levels and the cost of living.  
   - **Unemployment Rate:**  
     Indicates overall economic health and labor market conditions.
   
   **Relevance:**  
   Although reported on a monthly basis, these indicators are critical for understanding the macroeconomic environment in which stocks trade. They are resampled to daily frequency (using forward fill) to match the rest of our dataset.

---

## Handling Missing Values

- **Issue:**  
  Macroeconomic indicators like CPI and Unemployment are reported monthly. After these series are resampled to daily frequency and aligned with the trading calendar, the days before the first monthly observation in our range are NaN.

- **Solution:**  
  We applied **forward-fill** to carry the most recent available value forward, followed by **back-fill** to fill the remaining leading gap with the first available observation. Note that back-fill copies a later value into earlier dates, so it introduces a small look-ahead for those first few days only. The final dataset then has continuous daily values for these series, which is essential for stable model training.
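
The fill order can be illustrated with a small dependency-free sketch. The notebook itself relies on pandas `ffill()` and `bfill()`; the helper name `ffill_then_bfill` and the sample values below are hypothetical and only mimic that behavior on a plain list.

```python
def ffill_then_bfill(values):
    """Forward-fill None entries, then back-fill whatever leading gap remains."""
    filled = list(values)

    # Forward pass: carry the last seen observation forward.
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v

    # Backward pass: only the leading gap is still None at this point.
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


# Made-up daily series: two days before the first monthly report, then sparse reports.
daily_series = [None, None, 164.3, None, None, 164.5, None]
print(ffill_then_bfill(daily_series))
```

Only the first two entries are filled by the backward pass; every other gap is filled with information that was already available on that day.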

---

## Feature Engineering

Using the full OHLCV data, we computed a variety of technical indicators and return features to enrich the dataset:

1. **Momentum Indicators:**  
   - **RSI (Relative Strength Index):**  
     Captures the speed and change of price movements, indicating overbought or oversold conditions.  
   - **MACD (Moving Average Convergence Divergence):**  
     Compares short-term and long-term moving averages to identify trend changes.  
   - **ROC (Rate of Change):**  
     Measures the percentage change over a specific period to quantify momentum.

2. **Volatility Indicators:**  
   - **Rolling Standard Deviation (20-day):**  
     Assesses the degree of price fluctuation over time.  
   - **ATR (Average True Range, 14-day):**  
     Provides a more comprehensive measure of volatility by incorporating gaps and intraday ranges.  
   - **Bollinger Bands (20-day):**  
     Constructed using a simple moving average and standard deviation, these bands indicate the upper and lower bounds of expected price movement.

3. **Trend Indicators:**  
   - **Simple Moving Averages (SMA) & Exponential Moving Averages (EMA) for 20-day and 50-day windows:**  
     Help smooth out price data to identify the underlying trend direction over different time horizons.

4. **Volume-Based Indicators:**  
   - **On-Balance Volume (OBV):**  
     Aggregates volume flow to gauge buying and selling pressure.  
   - **Money Flow Index (MFI, 14-day):**  
     Combines price and volume data to measure the strength of money flowing in and out of the asset.

5. **Return Features:**  
   - **Daily Log Returns:**  
     Captures the percentage change from one day to the next on a logarithmic scale, which is standard for financial time series.  
   - **5-Day and 10-Day Returns:**  
     Simple (arithmetic) returns over the past 5 and 10 trading days, capturing momentum over slightly longer intervals.  
   - **Rolling Mean of Log Returns (20-day):**  
     Serves as a smoothing function to highlight trends in daily returns.

**Rationale:**  
These features are designed to capture the diverse aspects of price behavior—momentum, volatility, trend, and volume—each of which can provide the model with useful signals for predicting future returns. They are critical for uncovering patterns in the data that can help anticipate both the magnitude and uncertainty of future price movements.
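
One reason log returns are a convenient choice for the return features is that they add up across days, which simple returns do not. The short check below uses made-up closing prices (an assumption for illustration) to show that the sum of daily log returns equals the multi-day log return.

```python
import math

# Made-up closing prices for six consecutive trading days.
prices = [100.0, 102.0, 99.0, 101.0, 104.0, 103.0]

# Daily log returns: log(P_t / P_{t-1}).
daily_log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]

# Five-day log return and five-day simple return over the same span.
five_day_log_return = math.log(prices[-1] / prices[0])
five_day_simple_return = prices[-1] / prices[0] - 1

# Log returns are additive over time.
print(sum(daily_log_returns), five_day_log_return)

# A log return converts back to a simple return with exp(x) - 1.
print(math.exp(five_day_log_return) - 1, five_day_simple_return)
```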

## Modeling Approaches

### 1. Parametric Approach

**Overview:**  
- The model predicts a set of parameters (typically two) that define a probability distribution for future returns.  
- For example, assuming a Gaussian distribution, it outputs:
  - **μ (mean):** Expected return over the forecast horizon.
  - **σ² (variance):** Uncertainty or risk associated with that return.

**Loss Function:**  
- A negative log-likelihood (NLL) loss is used. For a Gaussian:
  $$
  L = \frac{1}{2}\log(2\pi\sigma^2) + \frac{(y-\mu)^2}{2\sigma^2}
  $$
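- The second term penalizes prediction errors scaled by the predicted variance, so an overconfident (too small) σ² is punished heavily, while the log term penalizes inflating σ² to hide errors.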
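
A minimal sketch of this loss for a single sample is shown below. The function name `gaussian_nll` and the numbers are illustrative assumptions. In a real network the variance output would also need to be constrained to be positive (for example by predicting the log-variance), which is a common practice rather than something specified in this proposal.

```python
import math


def gaussian_nll(y, mu, var):
    """Per-sample Gaussian negative log-likelihood for a predicted mean and variance."""
    return 0.5 * math.log(2 * math.pi * var) + (y - mu) ** 2 / (2 * var)


# Hold the prediction error fixed at 0.02 and vary the predicted variance.
error = 0.02
for var in (1e-5, 4e-4, 1e-2):
    print(f"var={var:g}  nll={gaussian_nll(error, 0.0, var):.3f}")
```

For a fixed error the loss is smallest when the predicted variance equals the squared error (here 0.02² = 4e-4): a much smaller variance is penalized by the quadratic term, and a much larger one by the log term.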

**Pros & Cons:**  
- **Pros:**  
  - Simplicity (only 2 outputs per sample).  
  - Interpretability: Mean is directly actionable; variance gives risk context.  
  - Efficient training with fewer parameters.
- **Cons:**  
  - Assumes returns follow a normal distribution, which might not capture fat tails or skewness.  
  - Limited flexibility in representing complex distribution shapes.

**Trading Implications:**  
- Trading signals can be derived by weighing the expected return (mean) against risk (variance). For example, a strategy may buy when the predicted mean is high and the variance is low, or when the probability of a loss, computed from the Gaussian CDF, falls below a chosen threshold.
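
The loss-probability rule can be sketched as follows. Under a Gaussian forecast, P(return < 0) = Φ(−μ/σ). The names `prob_loss` and `signal` and the 0.4 threshold are hypothetical choices for illustration, not part of the proposal.

```python
import math


def prob_loss(mu, var):
    """P(return < 0) when the return is distributed N(mu, var)."""
    sigma = math.sqrt(var)
    # Standard normal CDF evaluated at (0 - mu) / sigma, written with erf.
    return 0.5 * (1.0 + math.erf((0.0 - mu) / (sigma * math.sqrt(2.0))))


def signal(mu, var, max_loss_prob=0.4):
    """Buy only when the predicted probability of a loss is below the threshold."""
    return "buy" if prob_loss(mu, var) < max_loss_prob else "hold"


# Predicted 5-day mean of 1% with a standard deviation of 2% (variance 0.0004).
print(prob_loss(0.01, 0.0004), signal(0.01, 0.0004))
```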

---

### 2. Quantile Regression

**Overview:**  
- Instead of assuming a specific distribution shape, the model directly outputs several quantiles of the future return distribution (e.g., the 10th, 50th, and 90th percentiles).
- This provides a non-parametric view of the outcome.

**Loss Function:**  
- The quantile (pinball) loss is applied individually to each quantile:
  $$
  L_q(y, \hat{y}) = \max\left( q (y-\hat{y}),\; (q-1)(y-\hat{y}) \right)
  $$
- Losses for all quantiles are typically combined (summed or averaged).
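
The pinball loss and its combination across quantiles can be written directly from the formula. The sketch below averages the per-quantile losses; the function names and the sample predictions are assumptions for illustration.

```python
def pinball_loss(y, y_hat, q):
    """Quantile (pinball) loss for one observation and one quantile level q."""
    diff = y - y_hat
    return max(q * diff, (q - 1) * diff)


def multi_quantile_loss(y, preds, quantiles):
    """Average pinball loss over several predicted quantiles of the same target."""
    losses = [pinball_loss(y, p, q) for p, q in zip(preds, quantiles)]
    return sum(losses) / len(losses)


# For q = 0.1, predicting too high costs nine times more than predicting too low.
print(pinball_loss(1.0, 0.0, 0.1))  # prediction below the outcome
print(pinball_loss(0.0, 1.0, 0.1))  # prediction above the outcome

# Made-up predictions for the 10th, 50th, and 90th percentiles of a return of 0.0.
print(multi_quantile_loss(0.0, [-0.02, 0.0, 0.03], [0.1, 0.5, 0.9]))
```

The asymmetry is what pushes each output toward its quantile: a 10th-percentile prediction is only cheap to miss from below.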

**Pros & Cons:**  
- **Pros:**  
  - Flexibility in capturing asymmetry, fat tails, and other non-Gaussian features.  
  - Direct insight into risk through multiple quantile estimates (e.g., the lower quantile indicating downside risk).  
  - Actionable: For instance, if even the 10th percentile is above zero, the downside risk is minimal.
- **Cons:**  
  - Increased output complexity (one neuron per quantile).  
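  - The quantile loss is not differentiable at zero error, and independently predicted quantiles can cross (e.g., a predicted 10th percentile above the predicted median) unless their ordering is enforced.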

**Trading Implications:**  
- Trading strategies might be built around quantile thresholds. For example, if the predicted 10th percentile (a conservative estimate) is above a certain level, the strategy might signal a buy.
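
A quantile-threshold rule of this kind could look like the sketch below. The buy condition follows the example in the text; the symmetric sell condition, the function name `quantile_signal`, and the default threshold are assumptions added for illustration.

```python
def quantile_signal(q10, q90, threshold=0.0):
    """Map predicted quantiles of the future return to a trading signal.

    buy  : even the pessimistic (10th percentile) outcome clears the threshold
    sell : even the optimistic (90th percentile) outcome is below -threshold
    hold : the predicted interval straddles the threshold band
    """
    if q10 > threshold:
        return "buy"
    if q90 < -threshold:
        return "sell"
    return "hold"


print(quantile_signal(q10=0.004, q90=0.030))   # whole interval above zero
print(quantile_signal(q10=-0.020, q90=0.030))  # interval straddles zero
print(quantile_signal(q10=-0.040, q90=-0.005)) # whole interval below zero
```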


## Implementation Similarities and Differences

- **Transformer Architecture:**  
  The core architecture remains largely the same in both cases. The differences are primarily:
  - **Output Layer Dimension:**  
    - Parametric: 2 neurons (mean, variance).  
    - Quantile Regression: One neuron per chosen quantile.
  - **Loss Function:**  
    - Parametric: Negative log-likelihood (assuming a Gaussian or another parametric form).  
    - Quantile: Pinball loss for each quantile.

- **Data Processing & Training Pipeline:**  
  Both approaches use the same input sequences (e.g., sliding windows of past 30 days) and similar training setups. The target definitions only affect the final layer and loss computation.
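
The shared sequence preparation can be sketched as below. This is a plain-Python illustration under stated assumptions: the helper name `make_windows` is hypothetical, each window ends on day t and is paired with the target stored at day t (the forward return starting from day t), and rows whose target is missing are skipped.

```python
def make_windows(features, targets, window=30):
    """Pair each window of past feature rows with the target of its last day.

    features: list of per-day feature lists, oldest first
    targets:  list aligned with features; None where no future price exists
    """
    X, y = [], []
    for t in range(window - 1, len(features)):
        if targets[t] is None:
            continue  # no realized forward return for this day
        X.append(features[t - window + 1 : t + 1])  # days t-window+1 .. t
        y.append(targets[t])
    return X, y


# Toy data: 10 days, 2 features per day, last 2 targets unavailable.
features = [[float(d), 2.0 * d] for d in range(10)]
targets = [float(d) for d in range(8)] + [None, None]

X, y = make_windows(features, targets, window=3)
print(len(X), len(X[0]), len(X[0][0]))  # samples, window length, features per day
```

Because each window contains only rows up to and including day t, the inputs never include information from the forecast horizon.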


## Potential Baselines

To gauge the value of our deep learning models, we compare them against simpler or more established baselines:

1. **Buy-and-Hold Strategy:**  
   - **Description:**  
     Simply invest in IBM for the entire period.
   - **Purpose:**  
     Serves as a basic benchmark; if the model can’t beat the buy-and-hold return, it is probably not adding much value.

2. **Technical Indicator-Based Rules:**  
   - **Moving Average Crossover:**  
     For example, generate buy/sell signals when a short-term moving average (e.g., 20-day SMA) crosses above or below a long-term moving average (e.g., 50-day SMA).
   - **RSI-Based Strategy:**  
     Buy when RSI indicates oversold conditions (e.g., below 30) and sell when it indicates overbought conditions (e.g., above 70).
   - **Purpose:**  
     These simple rules capture basic market momentum and reversals, providing a practical benchmark.

3. **Statistical Models:**  
   - **ARIMA or GARCH Models:**  
     These traditional time-series models can forecast returns or volatility.
   - **Purpose:**  
     They help assess whether the Transformer-based deep learning model offers an edge over established statistical methods.

4. **Simple Neural Network:**  
   - **Description:**  
     A straightforward MLP (multilayer perceptron) or LSTM trained on the same features.
   - **Purpose:**  
     This baseline helps determine whether the added complexity of a Transformer yields improved performance.
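
As an illustration of the rule-based baselines, the moving-average crossover from item 2 can be sketched in plain Python. The helper names, the long-or-flat position convention, and the toy prices with short windows are assumptions for the example. In a backtest, the position computed at the close of day t would be applied to the return of day t+1 to avoid look-ahead.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = [None] * len(values)
    for i in range(window - 1, len(values)):
        out[i] = sum(values[i - window + 1 : i + 1]) / window
    return out


def crossover_positions(closes, short=20, long=50):
    """Return 1 (long) while the short SMA is above the long SMA, else 0 (flat)."""
    fast, slow = sma(closes, short), sma(closes, long)
    return [
        1 if (f is not None and s is not None and f > s) else 0
        for f, s in zip(fast, slow)
    ]


# Toy prices with tiny windows so the crossover is easy to trace by hand.
closes = [1, 2, 3, 4, 5, 4, 3, 2, 1]
print(crossover_positions(closes, short=2, long=3))
```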



## Data

### Download data

In [50]:
import yfinance as yf
import pandas as pd
from pandas_datareader import data as pdr

# Set date range
start_date = "1998-12-22"
end_date = "2025-04-02"

# Define custom mapping: custom name -> actual ticker
tickers = {
    "IBM": "IBM",
    "XLK": "XLK",
    "SP500": "^GSPC",
    "VIX": "^VIX",
    "TNX": "^TNX"
}

# Major sector ETFs
sector_etfs = {
    "XLF": "XLF",
    "XLV": "XLV",
    "XLY": "XLY",
    "XLP": "XLP",
    "XLE": "XLE",
    "XLI": "XLI",
    "XLU": "XLU",
    "XLB": "XLB"
}

# Merge all mappings
all_tickers_mapping = {**tickers, **sector_etfs}

# Download using the actual ticker symbols
download_list = list(all_tickers_mapping.values())
raw_data = yf.download(download_list, start=start_date, end=end_date, interval="1d", group_by="ticker", auto_adjust=False)

# Store full OHLCV data
ohlcv_data = {}

for custom_name, actual_ticker in all_tickers_mapping.items():
    try:
        df = raw_data[actual_ticker][["Open", "High", "Low", "Close", "Volume"]].copy()
        if not df.dropna().empty:
            # Rename columns like "IBM_Open", "IBM_Close", etc.
            df.columns = [f"{custom_name}_{col}" for col in df.columns]
            ohlcv_data[custom_name] = df
            print(f"Loaded OHLCV for {custom_name}")
        else:
            print(f"No data for {custom_name}")
    except Exception as e:
        print(f"Could not process {custom_name}: {e}")

# Merge all OHLCV DataFrames by date
merged_ohlcv = pd.concat(ohlcv_data.values(), axis=1)

# Add macroeconomic indicators from FRED
fred_series = {
    "CPI": "CPIAUCNS",
    "Unemployment": "UNRATE"
}

for name, code in fred_series.items():
    try:
        print(f"Downloading FRED series: {name}...")
        fred_df = pdr.DataReader(code, "fred", start=start_date, end=end_date)
        fred_df = fred_df.resample("D").ffill()
        merged_ohlcv[name] = fred_df
    except Exception as e:
        print(f"Failed to fetch {name}: {e}")

# Final clean-up: forward-fill then back-fill
merged_ohlcv = merged_ohlcv.ffill().bfill().loc[start_date:end_date]


print("\nFinal shape:", merged_ohlcv.shape)
memory_mb = merged_ohlcv.memory_usage(deep=True).sum() / (1024 ** 2)
print(f"Total memory usage: {memory_mb:.2f} MB")


[*********************100%***********************]  13 of 13 completed


Loaded OHLCV for IBM
Loaded OHLCV for XLK
Loaded OHLCV for SP500
Loaded OHLCV for VIX
Loaded OHLCV for TNX
Loaded OHLCV for XLF
Loaded OHLCV for XLV
Loaded OHLCV for XLY
Loaded OHLCV for XLP
Loaded OHLCV for XLE
Loaded OHLCV for XLI
Loaded OHLCV for XLU
Loaded OHLCV for XLB
Downloading FRED series: CPI...
Downloading FRED series: Unemployment...

Final shape: (6609, 67)
Total memory usage: 3.43 MB


In [52]:
merged_ohlcv.tail(10)

Date,IBM_Open,IBM_High,IBM_Low,IBM_Close,IBM_Volume,XLK_Open,XLK_High,XLK_Low,XLK_Close,XLK_Volume,...,XLU_Low,XLU_Close,XLU_Volume,XLB_Open,XLB_High,XLB_Low,XLB_Close,XLB_Volume,CPI,Unemployment
2025-03-19,248.330002,253.660004,246.639999,252.289993,3853600,213.240005,217.429993,212.350006,214.910004,4419300,...,78.410004,78.940002,8081700,86.980003,87.529999,86.379997,87.199997,5126800,317.671,4.0
2025-03-20,244.240005,246.800003,237.220001,243.320007,7026800,212.470001,215.729996,212.149994,213.380005,3572600,...,78.779999,79.260002,8207600,86.940002,87.389999,86.529999,86.650002,3803100,317.671,4.0
2025-03-21,241.690002,245.210007,238.5,243.869995,9580100,210.720001,214.25,209.940002,213.960007,5212700,...,78.260002,78.75,7813800,85.93,85.93,84.730003,85.790001,8404400,317.671,4.0
2025-03-24,247.309998,248.820007,245.970001,248.449997,4753300,217.139999,218.149994,216.570007,217.630005,4100100,...,78.099998,78.169998,6998400,86.25,86.709999,85.940002,86.540001,4602200,317.671,4.0
2025-03-25,248.360001,250.899994,248.199997,249.899994,3133800,217.490005,218.639999,217.380005,218.350006,3740900,...,76.559998,76.919998,11198000,86.809998,86.989998,86.150002,86.519997,3811100,317.671,4.0
2025-03-26,251.25,254.320007,249.529999,250.339996,4450100,217.679993,218.199997,212.639999,213.479996,3936500,...,76.93,77.43,7582000,86.610001,87.199997,86.260002,86.68,4080400,317.671,4.0
2025-03-27,249.710007,250.300003,245.729996,246.210007,2889300,212.020004,213.220001,210.389999,211.509995,3288900,...,77.18,77.410004,6985900,86.43,86.940002,85.830002,86.639999,3024600,317.671,4.0
2025-03-28,246.270004,247.570007,242.070007,244.0,3125300,210.410004,211.259995,205.679993,206.380005,5333800,...,77.739998,77.980003,9539300,86.849998,86.879997,84.940002,85.050003,4573600,317.671,4.0
2025-03-31,242.740005,250.889999,242.490005,248.660004,6795000,202.779999,206.949997,200.729996,206.479996,7109500,...,78.080002,78.849998,10735000,84.75,86.389999,84.07,85.980003,4440900,317.671,4.0
2025-04-01,248.029999,250.619995,243.490005,250.339996,4412900,205.660004,208.169998,203.979996,207.990005,4740100,...,78.290001,79.050003,9042400,85.760002,86.339996,84.989998,86.300003,6098800,317.671,4.0


### Compute indicators

In [55]:
import numpy as np


def compute_technical_indicators(df, prefix):
    """
    Given a DataFrame with columns:
      [prefix + '_Open', prefix + '_High', prefix + '_Low', prefix + '_Close', prefix + '_Volume'],
    this function computes various technical indicators and return features.
    """
    # Extract price and volume series
    open_price = df[f"{prefix}_Open"]
    high = df[f"{prefix}_High"]
    low = df[f"{prefix}_Low"]
    close = df[f"{prefix}_Close"]
    volume = df[f"{prefix}_Volume"]

    features = pd.DataFrame(index=df.index)

    # Returns: Daily log returns, 5-day and 10-day returns, rolling mean (20-day) of log returns
    features[f"{prefix}_LogReturn"] = np.log(close / close.shift(1))
    features[f"{prefix}_Return_5d"] = close / close.shift(5) - 1
    features[f"{prefix}_Return_10d"] = close / close.shift(10) - 1
    features[f"{prefix}_RollingMean_Return_20d"] = features[f"{prefix}_LogReturn"].rolling(20).mean()

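    # RSI (14-day), using simple rolling means of gains and losses
    # (a common variant; Wilder's original RSI uses exponential smoothing)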
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.rolling(14).mean()
    avg_loss = loss.rolling(14).mean()
    rs = avg_gain / avg_loss
    features[f"{prefix}_RSI"] = 100 - (100 / (1 + rs))

    # MACD: EMA12 - EMA26, Signal: EMA9 of MACD, Histogram = MACD - Signal
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    macd = ema12 - ema26
    signal = macd.ewm(span=9, adjust=False).mean()
    features[f"{prefix}_MACD"] = macd
    features[f"{prefix}_MACD_Signal"] = signal
    features[f"{prefix}_MACD_Hist"] = macd - signal

    # ROC (12-day)
    features[f"{prefix}_ROC_12d"] = (close / close.shift(12) - 1) * 100

    # Volatility: Rolling Standard Deviation (20-day)
    features[f"{prefix}_Rolling_STD_20d"] = close.rolling(20).std()

    # ATR (14-day): Average True Range
    tr1 = high - low
    tr2 = (high - close.shift(1)).abs()
    tr3 = (low - close.shift(1)).abs()
    tr = pd.concat([tr1, tr2, tr3], axis=1).max(axis=1)
    features[f"{prefix}_ATR_14d"] = tr.rolling(14).mean()

    # Bollinger Bands (20-day): Upper, Lower, and Mid (SMA20)
    sma20 = close.rolling(20).mean()
    std20 = close.rolling(20).std()
    features[f"{prefix}_Bollinger_Upper"] = sma20 + 2 * std20
    features[f"{prefix}_Bollinger_Lower"] = sma20 - 2 * std20
    features[f"{prefix}_Bollinger_Mid"] = sma20

    # Trend: SMA and EMA (20-day and 50-day)
    features[f"{prefix}_SMA_20"] = close.rolling(20).mean()
    features[f"{prefix}_SMA_50"] = close.rolling(50).mean()
    features[f"{prefix}_EMA_20"] = close.ewm(span=20, adjust=False).mean()
    features[f"{prefix}_EMA_50"] = close.ewm(span=50, adjust=False).mean()

    # OBV (On-Balance Volume)
    direction = np.where(close > close.shift(1), 1, np.where(close < close.shift(1), -1, 0))
    obv = (direction * volume).cumsum()
    features[f"{prefix}_OBV"] = obv

    # MFI (14-day Money Flow Index)
    typical_price = (high + low + close) / 3
    money_flow = typical_price * volume
    pos_flow = money_flow.where(typical_price > typical_price.shift(1), 0)
    neg_flow = money_flow.where(typical_price < typical_price.shift(1), 0)
    pos_mf = pos_flow.rolling(14).sum()
    neg_mf = neg_flow.rolling(14).sum()
    features[f"{prefix}_MFI"] = 100 - (100 / (1 + pos_mf / neg_mf))

    return features


# Compute Features for Each Ticker and Merge


all_features = {}

# Loop over each ticker from our mapping (for which we have OHLCV data)
for ticker in all_tickers_mapping.keys():
    required_cols = [f"{ticker}_Open", f"{ticker}_High", f"{ticker}_Low", f"{ticker}_Close", f"{ticker}_Volume"]
    if all(col in merged_ohlcv.columns for col in required_cols):
        df_ticker = merged_ohlcv[required_cols]
        features = compute_technical_indicators(df_ticker, ticker)
        all_features[ticker] = features
        print(f"Features computed for {ticker}")
    else:
        print(f"Missing OHLCV data for {ticker}; skipping feature computation.")

features_df = pd.concat(all_features.values(), axis=1)

final_df = pd.concat([merged_ohlcv, features_df], axis=1)


print("\nFinal features DataFrame shape:", final_df.shape)
memory_mb = final_df.memory_usage(deep=True).sum() / (1024 ** 2)
print(f"Total memory usage: {memory_mb:.2f} MB")

Features computed for IBM
Features computed for XLK
Features computed for SP500
Features computed for VIX
Features computed for TNX
Features computed for XLF
Features computed for XLV
Features computed for XLY
Features computed for XLP
Features computed for XLE
Features computed for XLI
Features computed for XLU
Features computed for XLB

Final features DataFrame shape: (6609, 327)
Total memory usage: 16.54 MB


In [54]:
final_df.tail(20)

Date,IBM_Open,IBM_High,IBM_Low,IBM_Close,IBM_Volume,XLK_Open,XLK_High,XLK_Low,XLK_Close,XLK_Volume,...,XLB_ATR_14d,XLB_Bollinger_Upper,XLB_Bollinger_Lower,XLB_Bollinger_Mid,XLB_SMA_20,XLB_SMA_50,XLB_EMA_20,XLB_EMA_50,XLB_OBV,XLB_MFI
2025-03-05,251.580002,252.740005,247.009995,251.350006,4009800,219.179993,222.289993,216.240005,221.570007,5965500,...,1.466429,91.060124,86.427877,88.744001,88.744001,87.375601,88.306611,88.650997,193217800,46.296991
2025-03-06,249.75,252.100006,246.800003,248.690002,3254400,217.020004,220.449997,214.380005,215.419998,8216000,...,1.442144,91.036762,86.30224,88.669501,88.669501,87.436601,88.23741,88.608997,183971000,48.862329
2025-03-07,245.949997,261.959991,245.179993,261.540009,6700200,215.169998,219.089996,212.630005,218.539993,5631700,...,1.468572,90.96299,86.220012,88.591501,88.591501,87.481401,88.197657,88.578056,191798300,41.357636
2025-03-10,261.559998,266.450012,254.75,256.899994,8165500,214.220001,214.619995,207.080002,209.25,10780900,...,1.539287,91.117323,85.835678,88.476501,88.476501,87.499801,87.987404,88.476563,183461900,35.138376
2025-03-11,255.990005,256.700012,245.860001,248.949997,5630600,208.610001,211.910004,206.289993,208.399994,7416400,...,1.537859,91.292844,85.319158,88.306001,88.306001,87.495801,87.734318,88.353169,175748800,33.768919
2025-03-12,250.350006,253.130005,245.529999,249.630005,3848800,212.119995,213.520004,209.149994,211.679993,5490800,...,1.554287,91.395816,84.783186,88.089501,88.089501,87.486401,87.46724,88.218927,169445200,33.383673
2025-03-13,248.800003,249.270004,243.039993,245.800003,3862400,211.070007,211.539993,206.850006,207.660004,6191500,...,1.525002,91.494574,84.309427,87.902001,87.902001,87.485001,87.219884,88.087597,164303700,33.400093
2025-03-14,242.75,248.949997,241.679993,248.350006,4045200,210.380005,214.160004,210.309998,213.940002,5020400,...,1.559287,91.218193,84.167809,87.693001,87.693001,87.528601,87.1018,88.004946,169057400,38.39522
2025-03-17,249.25,254.630005,249.0,252.970001,3233900,213.770004,216.960007,213.149994,215.429993,5074400,...,1.598572,90.93727,84.169731,87.553501,87.553501,87.587001,87.097819,87.967889,173364900,38.115517
2025-03-18,252.509995,252.570007,245.119995,246.949997,4171900,213.899994,214.119995,211.369995,212.149994,5743300,...,1.593572,90.328701,84.370301,87.349501,87.349501,87.661001,87.079932,87.926403,168353000,38.72952


### Add target

We define the target as the log return over the next 5 trading days, computed from IBM’s closing price (i.e., the realized return). The observed target is the same in both approaches; only the training objective differs. In the parametric case, the model outputs distribution parameters (mean and variance) fit to this realized return with a likelihood loss, while in the quantile regression case it predicts quantiles (e.g., 10th, 50th, 90th) directly with a pinball loss.




$$
\text{target}_t = \log\left( \frac{\text{Price}_{t + d}}{\text{Price}_t} \right)
$$

In [61]:
import numpy as np
import pandas as pd

def compute_target(final_df, price_col="IBM_Close", days=5):
    """
    Computes the target log return over a future horizon for the parametric approach.

    Parameters:
        final_df (pd.DataFrame): The final dataset with daily data.
        price_col (str): Column name for the price used (e.g. "IBM_Close").
        days (int): The forecast horizon (default is 5 days).

    Returns:
        pd.Series: The log return computed as log(price[t+days] / price[t]).
    """
    # Compute the future price shift by the specified number of days
    future_price = final_df[price_col].shift(-days)
    current_price = final_df[price_col]
    # Calculate log returns
    target = np.log(future_price / current_price)
    return target


In [63]:
target = compute_target(final_df)
target.head(10)

Date,IBM_Close
1998-12-22,0.024391
1998-12-23,-0.003384
1998-12-24,-0.026623
1998-12-28,0.00198
1998-12-29,0.008646
1998-12-30,0.01824
1998-12-31,0.01714
1999-01-04,0.033583
1999-01-05,-0.024355
1999-01-06,-0.017368
