# THEORY FOR THE RESIDUALS SECTION OF THE PROJECT

G. Ordonez et al. use factor models to compute the residuals that are fed into the CNN:

$$
    R_{n,t} = \boldsymbol{\beta}_{n,t-1}^\top \mathbf{F}_t + \epsilon_{n,t}
$$

- **$R_{n,t}$**: Return of asset **n** at time **t**.
- **$\boldsymbol{\beta}_{n,t-1}$**: Vector of factor loadings (sensitivities) of asset **n**, estimated with information up to time **t-1**.
- **$\mathbf{F}_t$**: Vector of factor returns at time **t**.
- **$\epsilon_{n,t}$**: Idiosyncratic error term for asset **n** at time **t**, assumed to be white noise.

They use the factor model to remove common systematic risks and extract residuals from asset returns.

However, cointegration was chosen for our project because it is easy to interpret and because, unlike G. Ordonez et al., we do not incorporate additional economic indicators or characteristics.
In the context of **global electricity prices**, cointegration provides a valid method for extracting arbitrage signals.

Reasons to choose Cointegration over Factor Models:
1. Focus on Long-Run Equilibrium Relationships
- Cointegration directly models long-term equilibrium between asset prices. If you believe two or more assets should revert to a common relationship over time (e.g., due to supply chains, regional electricity pricing, or corporate ownership links), cointegration is ideal.

- Factor models, by contrast, focus on cross-sectional risk exposure, not long-run relationships.
___________________________________________________
2. Minimal Economic Assumptions
- Cointegration is purely statistical. It doesn’t require assumptions about systematic risk factors or the existence of risk premia.

- This makes it useful in contexts where economic theory about risk factors is either weak or not applicable, such as in commodity markets, regional electricity markets, or cryptocurrencies.
___________________________________________________
3. Robust to Market-Wide Shocks
- Cointegration-based strategies focus on relative price movements. They can ignore market-wide trends as long as the relationship between the assets holds.

- Factor models rely on systematic factors, which can be misestimated, especially during structural market shifts.
___________________________________________________
4. Well-Suited for Stable, Linked Markets
- In regulated or linked markets (e.g., electricity prices across connected regions, or currency pairs in tightly coupled economies), cointegration can capture structural relationships better than factor models.

- Factor models might not capture these specific cross-market dependencies unless those are explicitly modeled.
___________________________________________________
5. Fewer Data Requirements for Small Portfolios
- Cointegration can be applied effectively with limited data or few assets.

- Factor models often require broad cross-sectional data to reliably estimate factors and loadings.

Therefore, cointegration is the better fit in the context of global electricity prices because:
- We are working with a few assets with known economic or physical linkages (e.g. parent/subsidiary companies, regional energy markets)
- The focus is on long-term price convergence, not cross-sectional risk factors
- We do not have an economic theory of risk factors in commodity markets
- We do not have a theoretical justification for the systematic risks that factor models rely on

This report looks for electricity prices in neighbouring regions that are expected to follow a long-run relationship due to shared infrastructure and demand patterns, which is a natural setting for cointegration.

To sum it up, we should use cointegration because:
- We want to model specific, stable, long-term relationships between a few assets
- We seek a purely statistical approach without relying on economic risk factors
- The market structure suggests a natural equilibrium

___________________________________________________________
___________________________________________________________
___________________________________________________________

# Cointegration in Statistical Arbitrage

### Overview:
Cointegration is a statistical property of time series variables that indicates a **long-term equilibrium relationship** between them, even though the individual series may be non-stationary. In **statistical arbitrage**, cointegration is used to identify portfolios where a linear combination of asset prices or returns results in a **stationary residual**, enabling **mean-reverting trading strategies**.

### Log Cumulative Returns:
Let **$P_{i,t}$** be the price of asset **i** at time **t**. The **log cumulative return** is defined as:

$$
R_{i,t} = \log\left( \frac{P_{i,t}}{P_{i,0}} \right)
$$

where **$R_{i,t}$** is the cumulative return of asset **i** since time **0**.
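
As a minimal sketch of the definition above, the log cumulative return of a single asset can be computed directly from its price series. The function name `log_cumulative_returns` and the sample prices are illustrative assumptions, not part of the project code. Note that the logarithm requires strictly positive prices, so the sketch rejects anything else.

```python
import math


def log_cumulative_returns(prices):
    """Return R_t = log(P_t / P_0) for one asset's price series."""
    if not prices or any(p <= 0 for p in prices):
        raise ValueError("prices must be a non-empty sequence of positive numbers")
    p0 = prices[0]
    return [math.log(p / p0) for p in prices]


# Hypothetical prices, for illustration only.
prices = [50.0, 52.0, 49.0, 55.0]
R = log_cumulative_returns(prices)

# The cumulative return equals the running sum of one-period log returns.
daily = [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]
```

By construction the first entry is zero, and the last entry matches the sum of the one-period log returns in `daily`.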

### Cointegration Relationship:
Consider **N** assets with log cumulative returns **$R_{1,t}, R_{2,t}, \ldots, R_{N,t}$**. These assets are cointegrated if there exists a non-zero vector **$\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_N)$** such that the linear combination:

$$
e_t = \beta_1 R_{1,t} + \beta_2 R_{2,t} + \cdots + \beta_N R_{N,t}
$$

or more compactly,

$$
e_t = \boldsymbol{\beta}^\top \mathbf{R}_t
$$

where **$\mathbf{R}_t = (R_{1,t}, R_{2,t}, \ldots, R_{N,t})^\top$**, is a **stationary process**.
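
For two assets, one common way to obtain a candidate cointegrating vector is to regress one series on the other by ordinary least squares and take the regression residual as $e_t$, which corresponds to $\boldsymbol{\beta} = (1, -b)$ up to an intercept. The sketch below assumes that approach and uses simulated data. The helper names and the simulation parameters are hypothetical. The AR(1) coefficient at the end is only an informal indication of mean reversion and does not replace a formal unit-root test on the residual.

```python
import random


def ols_slope_intercept(x, y):
    """Least-squares fit y ~ a + b * x, returning (b, a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx


def cointegration_residual(r1, r2):
    """Regress r1 on r2 and return (hedge ratio b, residual series e_t)."""
    b, a = ols_slope_intercept(r2, r1)
    return b, [y - a - b * x for x, y in zip(r2, r1)]


def ar1_coefficient(e):
    """Slope of e_t on e_{t-1}; values well below 1 suggest mean reversion."""
    phi, _ = ols_slope_intercept(e[:-1], e[1:])
    return phi


# Simulated example (assumed parameters): r2 is a random walk,
# r1 = 2 * r2 + stationary AR(1) noise, so the pair is cointegrated.
rng = random.Random(0)
r2, noise = [0.0], [0.0]
for _ in range(999):
    r2.append(r2[-1] + rng.gauss(0.0, 1.0))
    noise.append(0.5 * noise[-1] + rng.gauss(0.0, 0.5))
r1 = [2.0 * x + u for x, u in zip(r2, noise)]

b, e = cointegration_residual(r1, r2)
phi = ar1_coefficient(e)
```

With more than two assets the same idea extends to a multivariate regression, or to a system approach such as the Johansen procedure.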

### Statistical Arbitrage Strategy:
- The stationary residual **$e_t$** oscillates around a constant mean.
- When **$e_t$** deviates significantly from the mean, traders take positions assuming it will revert.
  - **Long** the undervalued assets.
  - **Short** the overvalued assets.
  
### Example of Trading Signal:
If **$e_t > \theta$**:
- Short the portfolio: **Sell high, expect mean reversion down.**

If **$e_t < -\theta$**:
- Long the portfolio: **Buy low, expect mean reversion up.**
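
The threshold rule above can be written as a small function. This is a sketch under the assumption that the residual is measured relative to a known long-run mean (zero by default). The name `trading_signal` and the encoding of positions as `-1`, `0`, `+1` are illustrative choices.

```python
def trading_signal(e_t, theta, mean=0.0):
    """Map a residual value to a position in the cointegrated portfolio.

    Returns -1 (short) if e_t is more than theta above the mean,
    +1 (long) if it is more than theta below the mean, and 0 otherwise.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    deviation = e_t - mean
    if deviation > theta:
        return -1  # residual is high: sell, expect reversion down
    if deviation < -theta:
        return 1   # residual is low: buy, expect reversion up
    return 0       # inside the band: no position


# Hypothetical residual path and threshold.
signals = [trading_signal(e, theta=1.0) for e in [0.2, 1.4, -0.3, -1.7]]
# signals == [0, -1, 0, 1]
```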

---

### Key Assumption:
- The assets themselves can be non-stationary (e.g., follow a random walk), but the **linear combination** (portfolio) is **stationary**.

### Summary Equations:
1. **Log cumulative returns**:  
   $$
   R_{i,t} = \log\left( \frac{P_{i,t}}{P_{i,0}} \right)
   $$
2. **Cointegrated portfolio residual**:  
   $$
   e_t = \sum_{i=1}^{N} \beta_i R_{i,t}
   $$
3. **Stationary residual**:  
   $$
   e_t \sim \text{Stationary process (e.g., AR(1))}
   $$

___________________________________________________________
___________________________________________________________
___________________________________________________________

# G. Ordonez Method for Calculating Cumulative Residuals for CNN Input

The method developed by G. Ordonez et al. involves a **three-step process** to generate cumulative residuals, which are then used as input into a Convolutional Neural Network (CNN) for statistical arbitrage.


### **1. Residual Portfolio Construction**

- Residuals **$\epsilon_{n,t}$** are constructed using a **factor model**:
  $$
  R_{n,t} = \boldsymbol{\beta}_{n,t-1}^\top \mathbf{F}_t + \epsilon_{n,t}
  $$
  where:
  - **$R_{n,t}$**: Excess return of asset **n** at time **t**.
  - **$\mathbf{F}_t$**: Vector of systematic risk factors at time **t**.
  - **$\boldsymbol{\beta}_{n,t-1}$**: Factor loadings based on past information.
  - **$\epsilon_{n,t}$**: Residual representing deviations from fair value.

- Factor models used include:
  - Fama-French factors
  - PCA latent factors
  - Instrumented PCA (IPCA) for conditional latent factors
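
To make the timing of the loadings concrete, here is a deliberately simplified sketch with one asset and one observed factor. The loading used at time $t$ is estimated by least squares (without intercept, as in the equation above) on the previous `window` observations only, and the residual is then computed out of sample. The function name, the single-factor restriction, and the rolling-window estimator are assumptions made for illustration, not the estimation procedure of G. Ordonez et al.

```python
def rolling_factor_residuals(returns, factor, window):
    """Out-of-sample residuals eps_t = R_t - beta_{t-1} * F_t.

    beta_{t-1} is the no-intercept least-squares loading estimated on
    the `window` observations strictly before time t.
    """
    if len(returns) != len(factor):
        raise ValueError("returns and factor must have the same length")
    if window < 1 or window >= len(returns):
        raise ValueError("window must be between 1 and len(returns) - 1")
    residuals = []
    for t in range(window, len(returns)):
        f_past = factor[t - window:t]
        r_past = returns[t - window:t]
        beta = sum(f * r for f, r in zip(f_past, r_past)) / sum(f * f for f in f_past)
        residuals.append(returns[t] - beta * factor[t])
    return residuals


# Hypothetical factor and asset returns, for illustration only.
factor = [0.010, -0.020, 0.015, 0.005, -0.010, 0.020]
asset = [0.016, -0.029, 0.021, 0.009, -0.013, 0.031]
eps = rolling_factor_residuals(asset, factor, window=3)
```

With `len(returns)` observations and a window of length `window`, the sketch yields `len(returns) - window` residuals, since the first `window` days are used only for estimation.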

---

### **2. Cumulative Residual Calculation**

- The **cumulative residuals** are calculated by **integrating the time-series** of residuals over a rolling window of **L** days.

- For asset **n**, define **$\epsilon^L_{n,t-1}$** as the vector of past **L** residuals:
  $$
  \epsilon^L_{n,t-1} = (\epsilon_{n,t-L}, \ldots, \epsilon_{n,t-1})
  $$

- The **cumulative residual vector** **$x$** is defined as:
  $$
  x := \text{Int}(\epsilon^L_{n,t-1}) = \left( \epsilon_{n,t-L}, \sum_{l=1}^{2} \epsilon_{n,t-L-1+l}, \ldots, \sum_{l=1}^{L} \epsilon_{n,t-L-1+l} \right)
  $$

  This operation **integrates** the residuals over time to resemble a "residual price process".
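
In code, the integration step amounts to a running sum over the window of residuals. The sketch below uses `itertools.accumulate` from the Python standard library. The function name and the sample residuals are hypothetical.

```python
from itertools import accumulate


def cumulative_residuals(residual_window):
    """Running sum of a window of L residuals (the Int operation)."""
    return list(accumulate(residual_window))


# Hypothetical window of L = 4 residuals, ordered from t-L to t-1.
eps_window = [0.010, -0.020, 0.030, 0.005]
x = cumulative_residuals(eps_window)
```

The first element of `x` is the oldest residual itself and the last element is the sum of all residuals in the window.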

---

### **3. Input to CNN**

- The **cumulative residual vector** **$x$** becomes the **input** to the **CNN**, which detects patterns for mean reversion or trending behavior.

- Example cumulative residual:
  $$
  x = \left( \epsilon_{n,t-L}, \epsilon_{n,t-L} + \epsilon_{n,t-L+1}, \ldots, \sum_{l=1}^{L} \epsilon_{n,t-L-1+l} \right)
  $$

- These cumulative values represent a transformed version of the original residual time series that captures the **temporal dynamics** more effectively for deep learning pattern recognition.

---

### **Why Use Cumulative Residuals?**

- The cumulative sum mimics a **price-like** behavior, which is more suitable for identifying trading signals.
- Raw residuals may not capture sufficient **trend information**, while cumulative forms highlight **deviations** more clearly.

---

### **Summary Equation:**

Given residuals **$\epsilon_{n,t}$**, compute:

$$
x_l = \sum_{j=1}^{l} \epsilon_{n,t-L-1+j}, \quad l = 1, \ldots, L
$$

This cumulative sequence **$x = (x_1, x_2, \ldots, x_L)$** is then fed into the CNN.


___________________________________________________________
___________________________________________________________
___________________________________________________________
# Updated understanding of the training process

### Specific Example

1. CNN Input Preparation
- **Given**: $251$ days of $N$ assets. Dimensions of $(251, N)$
- **Residuals**: Compute $250$ days of residuals for each asset. Dimensions of $(250, N)$
- **Windowing of Cumulative Residuals**: Form $30$-day cumulative residuals for each asset, which gives $250 - 30 + 1 = 221$ windows. Dimensions of $(221, 30, N)$

2. CNN+Transformer+FNN
- **Input**: Each sample (30, N) -> CNN+Transformer layers
- **Output**: A single column vector representing the portfolio weights (long-short)
    - These weights tell you how much to allocate to each asset in a long-short portfolio

3. Trading Simulation
- **Next Day Trade**: After each 30-day window, you:
    - Use the CNN+Transformer+FNN output weights,
    - Apply them to the **returns of the next day** (day 31) to compute portfolio return:

        $R_{t+1}^{\text{portfolio}} = w_t^\top R_{t+1}$
    
    - **Sharpe Ratio Objective**: Use these next-day portfolio returns over all 221 samples to compute the **Sharpe Ratio**:

        $\text{Sharpe} = \frac{\text{mean}(R^{\text{portfolio}})}{\text{std}(R^{\text{portfolio}})}$

    - **Optimized Goal**: The **model parameters** (CNN+Transformer+FNN) are **trained to maximize the Sharpe Ratio**

4. Rolling Window Shift
- After one **training cycle**, you may **shift the 251-day window** forward by **1 day**, retrain, and simulate again (this is to handle non-stationarity)
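
The shapes in step 1 can be checked with a short sketch that slides a 30-day window over a $(250, N)$ residual matrix and cumulates each window per asset. It is written with plain Python lists for clarity. The function name `cumulative_windows` and the placeholder residual values are assumptions, and an array library would normally be used for this step in practice.

```python
from itertools import accumulate


def cumulative_windows(residuals, lookback):
    """Build overlapping windows of cumulative residuals.

    residuals: list of T rows, each a list of N residuals.
    Returns a list of T - lookback + 1 windows, each lookback x N,
    where every column is the running sum of that asset's residuals.
    """
    n_days = len(residuals)
    n_assets = len(residuals[0])
    if lookback < 1 or lookback > n_days:
        raise ValueError("lookback must be between 1 and the number of days")
    windows = []
    for start in range(n_days - lookback + 1):
        block = residuals[start:start + lookback]
        # Running sum per asset, restarted at the beginning of each window.
        cols = [list(accumulate(row[n] for row in block)) for n in range(n_assets)]
        windows.append([[cols[n][l] for n in range(n_assets)] for l in range(lookback)])
    return windows


# Placeholder residuals for 250 days and N = 3 assets (illustrative values).
residuals = [[0.001 * ((t + n) % 7 - 3) for n in range(3)] for t in range(250)]
windows = cumulative_windows(residuals, lookback=30)
shape = (len(windows), len(windows[0]), len(windows[0][0]))  # (221, 30, 3)
```

Each element of `windows` is one $(30, N)$ sample of the kind described in step 2.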

### Summary
- Compute residuals
- Form cumulative residual windows
- Feed (30, N) into CNN+Transformer+FNN
- Get **portfolio weights**
- Trade on **next day**, compute Sharpe Ratio
- Optimize Sharpe over **all 221 samples**

### Next Considerations
- Implement **rolling window training over larger datasets**
- Integrate a custom loss function for **Sharpe Ratio optimization**
- Explore regularization to prevent overfitting on residual patterns
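
As a starting point for the custom loss, the Sharpe ratio objective can be expressed as a function of the model's weights and the next-day returns, and then negated so that minimizing the loss maximizes the Sharpe ratio. The sketch below is in plain Python to show the arithmetic only. The function names, the use of the population standard deviation, and the small `eps` stabilizer are assumptions. For training, the same formula would need to be written with the tensor operations of the chosen deep learning framework so that gradients can flow through it.

```python
import math


def portfolio_returns(weights, next_day_returns):
    """Per-sample portfolio return w_t^T R_{t+1}."""
    return [
        sum(w * r for w, r in zip(w_t, r_next))
        for w_t, r_next in zip(weights, next_day_returns)
    ]


def sharpe_ratio(returns, eps=1e-12):
    """Mean over standard deviation of the returns (not annualized)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    return mean / (math.sqrt(var) + eps)  # eps guards against zero variance


def negative_sharpe_loss(weights, next_day_returns):
    """Loss to minimize: the negative Sharpe ratio of the portfolio."""
    return -sharpe_ratio(portfolio_returns(weights, next_day_returns))


# Two hypothetical samples with N = 2 assets.
weights = [[1.0, -1.0], [0.5, -0.5]]
next_day = [[0.02, 0.01], [0.04, -0.02]]
loss = negative_sharpe_loss(weights, next_day)
```

In this toy example the portfolio returns are approximately 0.01 and 0.03, so the Sharpe ratio is close to 2 and the loss is close to -2.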