- $ y_{i,t}$: The target asset return for asset $ i $ at time $ t $.
     - $i \in [m] $: Index for the target assets, where $[m]$ denotes the set $\{1, 2, \ldots, m\}$.
     -  $t \in T $: Time index, where $ T $ represents the set of time periods.
- $ x_{i,t} \in \mathbb{R}^d $: The explanatory variables for asset $ i $ at time $ t $.
     - $ d $: Dimensionality of the explanatory variables.
- $ D_{i,s,t} = (x_{i,s:t}, y_{i,s:t}) $: The dataset for asset $ i $ from time $ s $ to time $ t $. This includes all observations of explanatory variables $ x_{i,s:t} $ and target values $ y_{i,s:t}$ within the interval $[s, t]$.
- $ D_i = \bigcup_{s < t} D_{i,s,t} = \left(\bigcup_{s < t} x_{i,s:t}, \bigcup_{s < t} y_{i,s:t}\right) $: The cumulative dataset for asset $ i $ up to time $ t $ (its dependence on $ t $ is left implicit). It contains all observations of the explanatory variables and target values up to and including time $ t $.
- $ \hat{\beta}_{i,t+1} $: The predicted coefficients (hedge ratios) for asset $ i $ at time $ t+1 $.
     - The functional form is $ \hat{\beta}_{i,t+1} = f_i(t, D_i) $, i.e., $ f_i $ maps the current time $ t $ and the data $ D_i $ available at time $ t $ to a coefficient estimate.
- $ f_i $: The model for asset $ i $. It is learned by minimizing the objective function below and, at each time $ t $, is applied to $ D_i $ to produce $ \hat{\beta}_{i,t+1} $.
- $ \tau_{i,t} $: A weight on the loss term of asset $ i $ at time $ t $, e.g., reflecting the frequency or importance of the observation. Setting $ \tau_{i,t} = 1 $ for all $ i, t $ gives the unweighted average loss.
- $ T_n $: The subset $ T_n \subset T $ of $ n $ time periods used for training or validation, i.e., the periods over which the objective function is evaluated.

- $ \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle $: The predicted return for asset $ i $ at time $ t+1 $, computed as the inner product of the coefficients $ \hat{\beta}_{i,t+1} $ and the explanatory variables $ x_{i,t+1} $.
    
- Objective Function:
    $$
    \min_{(f_i)_{i \in [m]}} \frac{1}{nm} \sum_{t \in T_n} \sum_{i \in [m]} \tau_{i,t} \, L \left(y_{i,t+1}, \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle \right)
    $$
    - $ n = |T_n| $: Number of time periods in $ T_n $.
    - $ m $: Number of target assets.
    - The objective function minimizes the average loss over all $ m $ assets and all $ n $ time periods in $ T_n $.
- $ L $: The loss function used in training. Here $ L $ is the squared error, so the objective above is a (possibly weighted) mean squared error (MSE):
   $$
    L \left(y_{i,t+1}, \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle \right) = \left(y_{i,t+1} - \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle \right)^2
   $$
    where $ \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle $ denotes the predicted return for asset $ i $ at time $ t+1 $.
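
As a minimal sketch (assuming NumPy, with hypothetical array names and shapes not taken from the source), the objective can be evaluated for a given set of predictions. Here `y[k, i]` holds $y_{i,t+1}$ for the $k$-th period $t \in T_n$, `x` and `beta_hat` hold $x_{i,t+1}$ and $\hat{\beta}_{i,t+1}$, and the optional `tau` holds the weights $\tau_{i,t}$; leaving `tau` unset uses unit weights, which gives the plain MSE.

```python
import numpy as np


def objective(y, x, beta_hat, tau=None):
    """Weighted average squared error over the periods in T_n and the m assets.

    y:        (n, m)    realized targets y_{i,t+1}, one row per period t in T_n
    x:        (n, m, d) explanatory variables x_{i,t+1}
    beta_hat: (n, m, d) predicted coefficients beta_hat_{i,t+1}
    tau:      (n, m)    optional weights tau_{i,t}; unit weights if omitted
    """
    n, m = y.shape
    if tau is None:
        tau = np.ones((n, m))
    # Predictions <beta_hat_{i,t+1}, x_{i,t+1}> for every (period, asset) pair.
    y_pred = np.einsum("tid,tid->ti", beta_hat, x)
    # Squared-error loss L, weighted by tau and normalized by 1 / (n m).
    return float(np.sum(tau * (y - y_pred) ** 2) / (n * m))


# Usage: one period, two assets, one explanatory variable, zero coefficients.
y = np.array([[1.0, 2.0]])
x = np.ones((1, 2, 1))
print(objective(y, x, np.zeros((1, 2, 1))))  # (1^2 + 2^2) / 2 = 2.5
```

In training, `beta_hat` would be the output of the models $f_i$, so this is the quantity minimized over $(f_i)_{i \in [m]}$; the sketch only evaluates it.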


\documentclass{article}
\usepackage{amsmath, amssymb}

\begin{document}

\section*{Notation}

\begin{enumerate}
    \item \( y_{i,t} \): The target asset return for asset \( i \) at time \( t \).
    \begin{itemize}
        \item \( i \in [m] \): Index for the target assets, where \([m]\) denotes the set \(\{1, 2, \ldots, m\}\).
        \item \( t \in T \): Time index, where \( T \) represents the set of time periods.
    \end{itemize}
    
    \item \( x_{i,t} \in \mathbb{R}^d \): The explanatory variables for asset \( i \) at time \( t \).
    \begin{itemize}
        \item \( d \): Dimensionality of the explanatory variables.
    \end{itemize}
    
    \item \( D_{i,s,t} = (x_{i,s:t}, y_{i,s:t}) \): The dataset for asset \( i \) from time \( s \) to time \( t \). This includes all observations of explanatory variables \( x_{i,s:t} \) and target values \( y_{i,s:t} \) within the interval \([s, t]\).
    
    \item \( D_i = \bigcup_{s < t} D_{i,s,t} = \left(\bigcup_{s < t} x_{i,s:t}, \bigcup_{s < t} y_{i,s:t}\right) \): The cumulative dataset for asset \( i \) up to time \( t \) (its dependence on \( t \) is left implicit). It contains all observations of the explanatory variables and target values up to and including time \( t \).
    
    \item \( \hat{\beta}_{i,t+1} \): The predicted coefficients (hedge ratios) for asset \( i \) at time \( t+1 \).
    \begin{itemize}
        \item The functional form is \( \hat{\beta}_{i,t+1} = f_i(t, D_i) \), i.e., \( f_i \) maps the current time \( t \) and the data \( D_i \) available at time \( t \) to a coefficient estimate.
    \end{itemize}
    
    \item \( f_i \): The model for asset \( i \). It is learned by minimizing the objective function below and, at each time \( t \), is applied to \( D_i \) to produce \( \hat{\beta}_{i,t+1} \).
    
    \item \( \tau_{i,t} \): A weight on the loss term of asset \( i \) at time \( t \), e.g., reflecting the frequency or importance of the observation. Setting \( \tau_{i,t} = 1 \) for all \( i, t \) gives the unweighted average loss.
    
    \item \( L \): The loss function used in training. Here \( L \) is the squared error, so the objective below is a (possibly weighted) mean squared error (MSE):
    \[
    L \left(y_{i,t+1}, \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle \right) = \left(y_{i,t+1} - \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle \right)^2
    \]
    where \( \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle \) denotes the predicted return for asset \( i \) at time \( t+1 \).
    
    \item \( \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle \): The predicted return for asset \( i \) at time \( t+1 \), computed as the inner product of the coefficients \( \hat{\beta}_{i,t+1} \) and the explanatory variables \( x_{i,t+1} \).
    
    \item Objective Function:
    \[
    \min_{(f_i)_{i \in [m]}} \frac{1}{nm} \sum_{t \in T_n} \sum_{i \in [m]} \tau_{i,t} L \left(y_{i,t+1}, \langle \hat{\beta}_{i,t+1}, x_{i,t+1} \rangle \right)
    \]
    \begin{itemize}
        \item \( n = |T_n| \): Number of time periods in the training (or validation) set \( T_n \subset T \).
        \item \( m \): Number of target assets.
        \item The objective function minimizes the weighted average loss over all \( m \) assets and all \( n \) time periods in \( T_n \).
    \end{itemize}
\end{enumerate}

\end{document}


| **Scenario** | **Constant $\beta$** | **Stepwise $\beta$** | **Cyclical $\beta$** |
|---|---|---|---|
| **Description** | Simplest case: a time-invariant relation between the response and the explanatory variable. | Time-varying $\beta$ with regime shifts: $\beta$ stays constant for a period and then jumps to a new level. | Cyclical patterns in financial time series (e.g., seasonality, business cycles). |
| **Ground Truth $\beta$** | $\beta_t = c$, $c \sim \mathcal{N}(1, 1)$ | $\beta_t$ is stepwise, with jumps generated from $\mathcal{N}(1, 1)$. | $\beta_t = \sin(\beta_0 + ct)$, $\beta_0 \sim \mathcal{N}(0, 1)$, $c \sim \mathcal{U}(4, 32)$ |
| **Model Evaluation** | Test whether the NeuralBeta model converges to the optimal solution derived via Bayesian linear regression. | Test NeuralBeta’s ability to adapt to sudden changes (market regime shifts). | Test NeuralBeta’s ability to capture cyclical patterns without further modification. |

Synthetic time series are generated for each scenario as
$$
x_t \sim t_{10}(0, 1), \quad y_t = \beta_t x_t + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, 1),
$$
where $t_{10}(0, 1)$ denotes Student's $t$-distribution with 10 degrees of freedom, location 0, and scale 1. Each series has length 65, and 100,000 samples (series) are generated; 70% of the data are used for training, 20% for validation, and 10% for testing.
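
The data-generating process above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the original code: the helper names are hypothetical, the stepwise scenario uses a fixed number of change points (two, an assumed value) at random positions with each new level drawn from $\mathcal{N}(1, 1)$, the cyclical scenario rescales time to $[0, 1]$ before applying the frequency $c$ (the time scale is not specified above), and the 70/20/10 split is taken over whole series.

```python
import numpy as np


def simulate_beta(scenario, length, rng, n_jumps=2):
    """Ground-truth beta path for one series (hypothetical helper)."""
    t = np.arange(length)
    if scenario == "constant":
        # beta_t = c with c ~ N(1, 1), fixed over the whole series.
        return np.full(length, rng.normal(1.0, 1.0))
    if scenario == "stepwise":
        # Assumption: n_jumps change points at random positions; each level ~ N(1, 1).
        cuts = np.sort(rng.choice(np.arange(1, length), size=n_jumps, replace=False))
        levels = rng.normal(1.0, 1.0, size=n_jumps + 1)
        return levels[np.searchsorted(cuts, t, side="right")]
    if scenario == "cyclical":
        # beta_t = sin(beta_0 + c t); assumption: t is rescaled to [0, 1].
        beta0 = rng.normal(0.0, 1.0)
        c = rng.uniform(4.0, 32.0)
        return np.sin(beta0 + c * t / (length - 1))
    raise ValueError(f"unknown scenario: {scenario}")


def make_dataset(scenario, n_series, length=65, seed=0):
    """Generate x_t ~ t_10(0, 1), y_t = beta_t * x_t + eps_t with eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    beta = np.stack([simulate_beta(scenario, length, rng) for _ in range(n_series)])
    x = rng.standard_t(df=10, size=(n_series, length))
    eps = rng.normal(0.0, 1.0, size=(n_series, length))
    return x, beta * x + eps, beta


def split_indices(n_series, train=0.7, val=0.2):
    """70/20/10 split over whole series (assumption: series are the split unit)."""
    n_train = int(round(train * n_series))
    n_val = int(round(val * n_series))
    idx = np.arange(n_series)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


x, y, beta = make_dataset("stepwise", n_series=1_000)
train_idx, val_idx, test_idx = split_indices(len(x))
```

Because the series are generated independently, splitting by series keeps every training, validation, and test sample a complete path of length 65.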

- **Streaming Data Handling**: Conventional methods such as ordinary least squares (OLS) or rolling OLS operate on a static dataset or a fixed window of past data. In contrast, our hedging task requires updating the estimate as each new data point arrives, so the dataset grows over time. Existing methods are not designed for such streaming data, where the dataset evolves dynamically.

- **Time-Varying Coefficients**: Traditional OLS assumes time-homogeneous coefficients, which can be inappropriate for real-world financial markets, where relationships between variables change over time. In our problem, the ground-truth $\beta$ is not static, and the coefficient estimate must reflect this time-varying nature.

- **Lookback Window Selection**: Rolling OLS uses a fixed lookback window $h$, and the choice of window size is often arbitrary. Because it weights all points within the window equally and discards older data entirely, it can misjudge the relevance of both long-term and short-term data.

- **Weighting Schemes in WLS**: While weighted least squares (WLS) introduces a dynamic weighting scheme to adjust the importance of data points, configuring proper weighting schemes (e.g., exponential or power law weights) across all time intervals remains challenging. This introduces complexity, additional parameters to tune, and potential misalignment with real-world data behavior.
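
For reference, the two classical baselines discussed in the list above can be sketched in a few lines of NumPy. The helper names are hypothetical, and the half-life parametrization of the exponential weights is one common WLS weighting choice, assumed here for illustration. Both estimate the current hedge ratio from past data `x` of shape `(t, d)` and `y` of shape `(t,)`.

```python
import numpy as np


def rolling_ols_beta(x, y, h):
    """Rolling OLS: fit on the last h observations only, all weighted equally."""
    beta, *_ = np.linalg.lstsq(x[-h:], y[-h:], rcond=None)
    return beta


def exp_weighted_ls_beta(x, y, half_life):
    """WLS with exponentially decaying weights; half_life is a tuning parameter."""
    age = np.arange(len(y))[::-1]      # 0 for the most recent observation
    w = 0.5 ** (age / half_life)       # weight halves every half_life steps
    sw = np.sqrt(w)
    # Scaling each row by sqrt(w) reduces weighted least squares to ordinary least squares.
    beta, *_ = np.linalg.lstsq(x * sw[:, None], y * sw, rcond=None)
    return beta


# Usage: a noise-free regime shift halfway through 100 observations.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
b_old, b_new = np.array([1.0, -1.0]), np.array([2.0, 0.5])
y = np.concatenate([x[:50] @ b_old, x[50:] @ b_new])
print(rolling_ols_beta(x, y, h=30))                # only post-shift data in the window
print(exp_weighted_ls_beta(x, y, half_life=5.0))   # old regime heavily down-weighted
```

The sketch makes the tuning burden concrete: rolling OLS requires choosing `h`, the exponential scheme requires choosing `half_life`, and neither choice adapts to the data on its own.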

We begin with a simple scenario where one hedges a single target asset against multiple hedging instruments on a daily basis. The objective is to determine the optimal hedging ratio of each instrument that minimizes the next day's hedging error, ultimately achieving a low average hedging error over the entire horizon. Once this single-asset hedging scenario is formalized, it scales naturally to the multi-asset case and can be applied to other prediction tasks.

Let $ T = \{0, 1, 2, \dots\} $ denote the discrete time index for data of a given frequency (e.g., daily). For a time interval $ (s, t] $ with $ s, t \in T $ and $ s < t $, define the dataset $ D_{s,t} = \{(x_{s+1}, y_{s+1}), (x_{s+2}, y_{s+2}), \dots, (x_t, y_t)\} $, where $ x_t \in \mathbb{R}^d $ is the explanatory variable (e.g., factor returns) and $ y_t \in \mathbb{R} $ is the scalar response variable (e.g., a single stock return).

At any time $t \in T $, we assume a linear relationship between $ x_t $ and $ y_t$ with coefficient $ \beta_t \in \mathbb{R}^d $ and noise $ \epsilon_t \in \mathbb{R} $, described by the model:

$$
y_t = \langle \beta_t, x_t \rangle + \epsilon_t \tag{1}
$$

where $ \langle \cdot, \cdot \rangle $ denotes the inner product.



A one-step hedging task at time $ t $ involves determining the optimal hedging ratio $ \hat{\beta}_{t+1} $ given the available data $ D_{0,t}$ such that the ex-ante hedging error at time $ t+1$ is minimized:

$$
\hat{\beta}_{t+1} = \underset{\beta}{\text{argmin}} \, L(y_{t+1}, \langle \beta, x_{t+1} \rangle)
$$

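where $ L $ is a risk measure (e.g., the expected quadratic loss or the negative log-likelihood). Because $ x_{t+1} $ and $ y_{t+1} $ are not yet observed at time $ t $, this risk is evaluated given the information in $ D_{0,t} $. The optimal hedging ratio $ \hat{\beta}_{t+1} $ is assumed to be inferable from the dataset $ D_{0,t} $, taking the form: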

$$
\hat{\beta}_{t+1} = f(t, D_{0,t}) \tag{2}
$$

for some function $ f : T \times \mathcal{D} \to \mathbb{R}^d $, where $ \mathcal{D} $ denotes the space of datasets; $ f $ may be time-inhomogeneous, i.e., depend on $ t $ explicitly.
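
Under the quadratic loss, a short derivation indicates what Equation 2 is estimating. It relies on assumptions not stated above: $\beta_{t+1}$ is determined by the information in $D_{0,t}$, $\mathbb{E}[\epsilon_{t+1} x_{t+1} \mid D_{0,t}] = 0$, and $\Sigma_{t+1} = \mathbb{E}[x_{t+1} x_{t+1}^\top \mid D_{0,t}]$ is positive definite. Substituting Equation 1 for $y_{t+1}$:

$$
\begin{aligned}
\mathbb{E}\left[(y_{t+1} - \langle \beta, x_{t+1} \rangle)^2 \mid D_{0,t}\right]
&= \mathbb{E}\left[(\langle \beta_{t+1} - \beta, x_{t+1} \rangle + \epsilon_{t+1})^2 \mid D_{0,t}\right] \\
&= (\beta_{t+1} - \beta)^\top \Sigma_{t+1} (\beta_{t+1} - \beta) + \mathbb{E}\left[\epsilon_{t+1}^2 \mid D_{0,t}\right].
\end{aligned}
$$

The cross term $2\,\mathbb{E}[\epsilon_{t+1} \langle \beta_{t+1} - \beta, x_{t+1} \rangle \mid D_{0,t}]$ vanishes by the uncorrelatedness assumption, and the quadratic form is zero only at $\beta = \beta_{t+1}$. Under these assumptions the minimizer is $\hat{\beta}_{t+1} = \beta_{t+1}$, the remaining term is the irreducible hedging error, and estimating the hedge ratio amounts to recovering $\beta_{t+1}$ from $D_{0,t}$.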

Consider a practical scenario over an $ n $-day horizon starting at day $ \tau \in T $, denoted $ T^n_\tau = \{\tau, \tau+1, \dots, \tau+n-1\} $ with $ \tau \geq h $ (for some chosen lookback window $h \in \mathbb{N} $). The one-step hedge ratio prediction (Equation 2) is performed each day $ t \in T^n_\tau $, with the objective of finding the function $ f $ that minimizes the average hedging error over the entire horizon:

$$
\underset{f}{\text{min}} \, \frac{1}{n} \sum_{t \in T^n_\tau} L(y_{t+1}, \langle \hat{\beta}_{t+1}, x_{t+1} \rangle)
$$
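
The walk-forward protocol behind this objective can be sketched as below, assuming the quadratic loss and hypothetical helper names. At each day $t \in T^n_\tau$, the function `f` sees only $D_{0,t} = \{(x_s, y_s) : 0 < s \le t\}$, returns $\hat{\beta}_{t+1}$, and is scored on the realized pair $(x_{t+1}, y_{t+1})$. Rolling OLS with window $h$ serves as an example `f`; since $D_{0,t}$ contains $t$ observations, the condition $\tau \geq h$ guarantees a full window.

```python
import numpy as np


def horizon_hedging_error(x, y, f, tau, n):
    """Average one-step squared hedging error over T_tau^n = {tau, ..., tau + n - 1}.

    x: (T, d) and y: (T,) hold observations for t = 0, 1, ..., T - 1 (T > tau + n).
    f: callable f(t, x_hist, y_hist) -> beta_hat of shape (d,), where
       (x_hist, y_hist) is D_{0,t} = {(x_s, y_s) : 0 < s <= t}.
    """
    errors = []
    for t in range(tau, tau + n):
        beta_hat = f(t, x[1:t + 1], y[1:t + 1])    # Equation 2: uses D_{0,t} only
        residual = y[t + 1] - x[t + 1] @ beta_hat    # hedging error on day t + 1
        errors.append(residual ** 2)                 # quadratic loss L
    return float(np.mean(errors))


def make_rolling_ols(h):
    """Example f: OLS on the last h observations of D_{0,t} (ignores t)."""
    def f(t, x_hist, y_hist):
        beta, *_ = np.linalg.lstsq(x_hist[-h:], y_hist[-h:], rcond=None)
        return beta
    return f


# Usage: hypothetical data with a constant beta and Gaussian noise.
rng = np.random.default_rng(1)
x = rng.normal(size=(300, 2))
y = x @ np.array([0.8, -0.3]) + 0.1 * rng.normal(size=300)
h = 60
print(horizon_hedging_error(x, y, make_rolling_ols(h), tau=h, n=200))
```

Any candidate `f`, including a learned model, can be plugged into the same loop, which is what makes the objective comparable across methods.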

### 3.2 Market Data

| **Aspect**                       | **Details**                                                                                                                         |
|----------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
| **Time Horizon**                 | January 1, 2010, to December 31, 2023                                                                                                 |
| **Data Used**                    | - Daily return series of the S&P 500 index <br> - Size factor index <br> - Value factor index <br> - S&P 500 components             |
| **Data Segmentation**            | - Training Period: January 1, 2010, to December 31, 2017 <br> - Validation Period: January 1, 2018, to December 31, 2019 <br> - Test Period: January 1, 2020, to December 31, 2023 |
| **Component Tracking**           | A fixed snapshot of S&P 500 components as of May 1, 2024, with price histories from January 1, 2010, resulting in 468 stocks.     |
| **Experiment Scenarios**         | - **Univariate Scenario:** CAPM $\beta$ for each S&P 500 stock <br> - **Multivariate Scenario:** Factor $\beta$ for the market, size, and value factors (similar to the Fama-French three-factor model)  |
| **S&P 500 Components with CAPM** | **Description:** Uses the daily return series of the S&P 500 index and its individual components <br> **Results:** Performance shown in the "Univariate" entry of Table 1 |
| **S&P 500 Components with Factors** | **Description:** Multivariate beta estimation using the SPX, size, and value indices <br> **Results:** Performance shown in the "Multivariate" entry of Table 1 <br> **Findings:** The interpretable architecture with Attention performs best; the NeuralBeta models perform slightly worse than the benchmark |
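
As a minimal pandas sketch of the data segmentation (the DataFrame layout, column names, and helper names are hypothetical), a date-indexed table of daily returns can be split into the three periods listed in the table above. The `static_capm_beta` helper computes a naive, time-invariant CAPM beta, $\operatorname{Cov}(r_i, r_m) / \operatorname{Var}(r_m)$, as a sanity-check baseline only; it is not the NeuralBeta estimator.

```python
import numpy as np
import pandas as pd

# Period boundaries from the table above (both ends inclusive).
PERIODS = {
    "train": ("2010-01-01", "2017-12-31"),
    "validation": ("2018-01-01", "2019-12-31"),
    "test": ("2020-01-01", "2023-12-31"),
}


def split_by_period(returns: pd.DataFrame) -> dict:
    """Slice a DatetimeIndex-ed table of daily returns into the three periods."""
    returns = returns.sort_index()
    return {name: returns.loc[start:end] for name, (start, end) in PERIODS.items()}


def static_capm_beta(stocks: pd.DataFrame, market: pd.Series) -> pd.Series:
    """Static CAPM beta per stock column: Cov(r_i, r_m) / Var(r_m) on the same dates."""
    m = market.reindex(stocks.index)   # align the market series to the stocks' dates
    return stocks.apply(lambda r: r.cov(m) / m.var())


# Usage with placeholder random data (not real S&P 500 returns).
idx = pd.bdate_range("2010-01-01", "2023-12-31")
rng = np.random.default_rng(0)
market = pd.Series(rng.normal(0.0, 0.01, len(idx)), index=idx)
stocks = pd.DataFrame({"A": 1.5 * market, "B": -0.5 * market})
parts = split_by_period(stocks)
print({name: len(df) for name, df in parts.items()})
print(static_capm_beta(parts["train"], market))   # approximately A: 1.5, B: -0.5
```

To avoid look-ahead bias, a static beta like this should be fitted on the training slice only, as in the usage line above.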
