### 📊 Statistical Arbitrage Pipeline (Inspired by G. Ordonez)

Here’s what we’re doing in simple terms (for an example with 3 assets):

1. We start with **251 days of price data** for 3 assets.
2. From this, we calculate **250 days of "residuals"**, which tell us how far each asset is from its "fair value".
3. Then, we take **rolling windows of 30 days** of the cumulative residuals, giving us 220 overlapping windows of recent market behaviour (250 residuals allow 221 windows, but the last one has no next day to test on).
4. Each 30-day window is given to a deep learning model (CNN + Transformer) to decide **how much of each asset we should long or short**.
5. We test the model's prediction by checking **how well that portfolio performs on the next day**.
6. Finally, we summarize the performance using the **Sharpe Ratio**, which balances return and risk. The model is trained to maximize this.
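
The day counts in the steps above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch; the function name `pipeline_counts` and its parameters are illustrative, not part of any library.

```python
def pipeline_counts(n_prices: int = 251, window: int = 30) -> dict:
    """Count the observations available at each stage of the pipeline."""
    n_residuals = n_prices - 1             # one return (and residual) per pair of consecutive prices
    n_windows = n_residuals - window + 1   # every full window of `window` days
    n_samples = n_windows - 1              # the last window has no next day to test on
    return {"residuals": n_residuals, "windows": n_windows, "samples": n_samples}


print(pipeline_counts())  # {'residuals': 250, 'windows': 221, 'samples': 220}
```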

_________________________________

More rigorously:

1. **Input Data**  
   Obtain 251-day cumulative returns for **N = 3** assets:
   $$
   \mathbf{R}_{t} = \begin{bmatrix}
   R_{1,t} \\
   R_{2,t} \\
   R_{3,t}
   \end{bmatrix}, \quad t = 1, \dots, 251
   $$

2. **Factor Residual Extraction**  
   Convert the 251 observations into 250 daily returns $\mathbf{r}_t$, fit a factor model (e.g., PCA or Fama-French), and obtain 250-day residuals:
   $$
   \mathbf{e}_{t} = \mathbf{r}_{t} - \hat{\mathbf{r}}_{t}, \quad t = 1, \dots, 250
   $$

3. **Generate Rolling Cumulative Residuals**  
   For each $t = 1, \dots, 220$, compute 30-day cumulative residuals:
   $$
   \mathbf{E}_{t}^{(30)} = \sum_{i=0}^{29} \mathbf{e}_{t+i} \in \mathbb{R}^{3}
   $$
   or build a full window matrix:
   $$
   \mathbf{X}_t = \begin{bmatrix}
   \mathbf{e}_{t} & \mathbf{e}_{t+1} & \dots & \mathbf{e}_{t+29}
   \end{bmatrix}^\top \in \mathbb{R}^{30 \times 3}
   $$

4. **Deep Model: CNN + Transformer + FFN**  
   Feed each $\mathbf{X}_t$ into the model to output portfolio weights:
   $$
   \mathbf{w}_t = f_{\theta}(\mathbf{X}_t) \in \mathbb{R}^{3}
   $$
   with optional constraints:
   $$
   \sum_{i=1}^{3} w_{i,t} = 0, \quad \|\mathbf{w}_t\|_1 \leq 1
   $$

5. **Calculate Next-Day Portfolio Return**  
   Use next-day returns $\mathbf{r}_{t+30}$ to get portfolio return:
   $$
   r^{\text{portfolio}}_t = \mathbf{w}_t^\top \mathbf{r}_{t+30}
   $$

6. **Sharpe Ratio as Objective**  
   Collect all returns $\{r^{\text{portfolio}}_t\}_{t=1}^{220}$ and compute Sharpe Ratio:
   $$
   \text{Sharpe} = \frac{\mathbb{E}[r^{\text{portfolio}}]}{\text{Std}[r^{\text{portfolio}}]}
   $$

   This Sharpe Ratio is used to optimize model parameters $\theta$:
   $$
   \theta^* = \arg\max_{\theta} \, \text{Sharpe}
   $$
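
The six steps above can be sketched end to end in NumPy. This is a minimal sketch under loud assumptions: the prices are synthetic, the factor model is a one-factor PCA fitted in-sample (which looks ahead and is only acceptable for illustration), and `placeholder_weights` is a hypothetical stand-in for the CNN + Transformer $f_{\theta}$, not the actual model.

```python
import numpy as np


def pca_residuals(returns: np.ndarray, n_factors: int = 1) -> np.ndarray:
    """Residuals after removing the top principal components (in-sample PCA)."""
    centered = returns - returns.mean(axis=0)
    # Rows of vt are the principal directions in asset space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_factors]                       # (n_factors, N)
    fitted = centered @ loadings.T @ loadings       # projection onto the factor space
    return centered - fitted                        # e_t = r_t - r_hat_t


def rolling_windows(residuals: np.ndarray, window: int = 30) -> np.ndarray:
    """Stack the windows X_t that still have a next-day return available."""
    n_samples = residuals.shape[0] - window         # 250 - 30 = 220
    return np.stack([residuals[t:t + window] for t in range(n_samples)])


def placeholder_weights(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for f_theta: bet against the cumulative residual."""
    raw = -x.sum(axis=0)                            # minus the 30-day cumulative residual
    raw = raw - raw.mean()                          # weights sum to zero
    gross = np.abs(raw).sum()
    return raw / gross if gross > 0 else raw        # L1 norm of 1


def sharpe_ratio(portfolio_returns: np.ndarray) -> float:
    return float(portfolio_returns.mean() / portfolio_returns.std(ddof=1))


rng = np.random.default_rng(0)
# Synthetic prices: 251 days, 3 assets (assumption, for illustration only).
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(251, 3)), axis=0))
returns = np.diff(np.log(prices), axis=0)           # (250, 3) daily log returns
residuals = pca_residuals(returns)                  # (250, 3)
windows = rolling_windows(residuals)                # (220, 30, 3)
weights = np.stack([placeholder_weights(x) for x in windows])  # (220, 3)
next_day = returns[30:]                             # return on the day after each window
portfolio = (weights * next_day).sum(axis=1)        # (220,) portfolio returns
print(windows.shape, weights.shape, portfolio.shape, sharpe_ratio(portfolio))
```

In a real training loop the placeholder would be replaced by a differentiable model and the negative Sharpe ratio used as the loss; the shapes and the window-to-next-day alignment stay the same.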

---

# Minor Adjustments:
___________________________
- Ensure you normalize residuals or cumulative residuals properly before feeding them into the CNN, as this preprocessing is important for stability (see Ordonez’s mention of scaling the cumsum residuals).
### How Ordoñez normalizes the residuals

In the empirical implementation, Ordoñez applies **instance normalization** before each activation function in the CNN. The precise form is:

$$
x^{(1)}_{l,d} = \text{ReLU} \left( \frac{y^{(0)}_{l,d} - \mu^{(0)}_d}{\sigma^{(0)}_d} \right), \quad
x^{(2)}_{l,d} = \text{ReLU} \left( \frac{y^{(1)}_{l,d} - \mu^{(1)}_d}{\sigma^{(1)}_d} \right)
$$

Where:

- $\mu^{(i)}_d$ and $\sigma^{(i)}_d$ are the **mean** and **standard deviation** across the time axis $L$, for each filter/channel $d$.
- This normalization happens **before ReLU activations** and **after the CNN’s linear transformations**.
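
The normalization formula above can be sketched in NumPy as follows. The array `y0` is a hypothetical convolution output with 30 time steps and 8 filters, and the small `eps` is an added safeguard against a zero standard deviation that does not appear in the formula.

```python
import numpy as np


def instance_norm_relu(y: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize each channel over time, then apply ReLU.

    y has shape (L, D): L time steps, D filters/channels.
    """
    mu = y.mean(axis=0, keepdims=True)       # mean over time, one per channel
    sigma = y.std(axis=0, keepdims=True)     # std over time, one per channel
    normalized = (y - mu) / (sigma + eps)    # eps: assumption, avoids division by zero
    return np.maximum(normalized, 0.0)       # ReLU


rng = np.random.default_rng(0)
y0 = rng.normal(loc=5.0, scale=3.0, size=(30, 8))   # hypothetical convolution output
x1 = instance_norm_relu(y0)
print(x1.shape)
```

In a deep learning framework the same step would normally be a built-in instance normalization layer placed between the convolution and the activation.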

### Why is this normalization needed?

They say it helps with:

- **Training stability**: Prevents saturation of ReLU units, reducing vanishing gradients.
- **Optimization speed**: Normalized activations allow for more consistent gradient magnitudes.
- **Generalization**: Keeps the learning dynamics balanced across different time periods and assets.

> *"...we include so-called 'instance normalization' before each activation function to speed up the optimization and avoid vanishing gradients caused by the saturation of the ReLU activations."*

### Summary for your implementation

You should:

- Normalize the **output of each CNN convolution** by subtracting the mean and dividing by the standard deviation **across time**, **per filter**.
- Do this **before** applying nonlinearities like ReLU.

This is done **after cumulative residuals are created** — so you do **not** normalize the raw residuals or cumulative residuals before feeding into the CNN. Instead, normalization is handled **inside the CNN layers**.

____________________________

- You might want to double-check whether the portfolio is dollar-neutral (i.e., weights sum to zero) — this constraint is often enforced in statistical arbitrage.

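G. Ordoñez **normalizes the portfolio weight vector to have an L1 norm of 1**, meaning the total absolute weight of long and short positions adds up to 1. This fixes gross leverage; on its own it does not make the weights sum to zero, so **dollar-neutrality** is a separate condition to check.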

#### Implementation

The final portfolio weights $\mathbf{w}_{t-1}^R$ are computed from an unscaled vector $\mathbf{w}_{t-1}^\varepsilon$ using:

$$
\mathbf{w}_{t-1}^R = \frac{(\mathbf{w}_{t-1}^\varepsilon)^\top \Phi_{t-1}}{\left\|(\mathbf{w}_{t-1}^\varepsilon)^\top \Phi_{t-1} \right\|_1}
$$

Where:

- $\mathbf{w}_{t-1}^\varepsilon$: raw weights from the feedforward network.
- $\Phi_{t-1}$: the mapping from factor space (residuals) to asset space.
- The **L1 normalization** ensures:

$$
\left\| \mathbf{w}_{t-1}^R \right\|_1 = 1
$$
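
The L1 normalization above can be sketched as below. The values of `w_eps` and `phi` are made-up numbers chosen only to show the mechanics, and `l1_normalize` is an illustrative helper name. The sketch also prints the net exposure so that the sum-to-zero condition can be inspected alongside the L1 norm.

```python
import numpy as np


def l1_normalize(w_eps: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Map residual-space weights to asset space and scale to unit L1 norm."""
    w_assets = w_eps @ phi                    # (w^eps)^T Phi
    gross = np.abs(w_assets).sum()
    if gross == 0:
        raise ValueError("all-zero weights cannot be normalized")
    return w_assets / gross


w_eps = np.array([0.8, -0.5, 0.1])            # hypothetical raw network output
phi = np.array([[1.0, -0.3, -0.2],            # hypothetical residual-to-asset map
                [-0.4, 1.0, -0.1],
                [-0.2, -0.3, 1.0]])
w_r = l1_normalize(w_eps, phi)
print("gross exposure:", np.abs(w_r).sum())   # L1 norm of the final weights
print("net exposure:  ", w_r.sum())           # zero only if the portfolio is dollar-neutral
```

In this toy example the gross exposure is 1 while the net exposure is not zero, so the two properties are worth checking separately.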

This fixes the portfolio's gross leverage at 1. **Dollar-neutrality** (weights summing to zero) is a separate condition and does not follow from the L1 constraint alone.


#### Why is this important?

1. **Risk Control**  
   Dollar-neutral portfolios reduce exposure to the overall market (beta), so returns mainly reflect **relative mispricing** rather than market direction.

2. **Sharpe Ratio Consistency**  
   Fixing total exposure prevents the model from inflating returns and risk through unbounded leverage. This makes **Sharpe ratio optimization meaningful**.

3. **Comparability and Robustness**  
   Normalized weights allow **comparison across time** and **between models**, and simulate realistic trading strategies with constraints.
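
A small illustration of the leverage point, using made-up daily returns: scaling every weight by a constant multiplies both the mean and the standard deviation of the portfolio return by that constant, so the Sharpe ratio alone cannot pin down the scale of the weights. Fixing the L1 norm removes that ambiguity.

```python
import statistics


def sharpe(returns):
    """Mean over sample standard deviation of a list of returns."""
    return statistics.mean(returns) / statistics.stdev(returns)


base = [0.010, -0.004, 0.007, 0.002, -0.001]   # hypothetical daily portfolio returns
levered = [3 * r for r in base]                # same strategy at 3x leverage

print("mean ratio:", statistics.mean(levered) / statistics.mean(base))
print("std ratio: ", statistics.stdev(levered) / statistics.stdev(base))
print("Sharpe (1x, 3x):", sharpe(base), sharpe(levered))
```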

> "*We include this normalization step to prevent uncontrolled leverage and to focus on relative pricing discrepancies rather than market direction.*"


### Summary

> G. Ordoñez **L1-normalizes** the raw allocation vector, which fixes the combined long and short exposure at 1 and supports better risk management and robust out-of-sample optimization. Dollar-neutrality (weights summing to zero) has to be checked or imposed separately.