# More models

- ARDL
- VAR
- ECM Error correction model
- SVAR structural vector auto-regression
- GMM with constraints
- CVAR cointegrated VAR

Insight. A SARIMA model can handle data that is I(2) through its differencing order. The other models listed here cannot, with two exceptions: VARI (vector autoregression for integrated series) and a modified version of CVAR.
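
To make the integration order concrete, here is a minimal sketch on simulated data (the series and variable names are illustrative assumptions, not part of the original notes). An I(2) series needs two rounds of differencing before it becomes stationary:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(size=500)   # stationary white noise: I(0)

i1 = np.cumsum(eps)          # random walk: I(1)
i2 = np.cumsum(i1)           # cumulated random walk: I(2)

# Each round of differencing removes one unit root.
d1 = np.diff(i2)             # still a random walk (equals i1[1:])
d2 = np.diff(i2, n=2)        # back to the white noise (equals eps[2:])

print(np.allclose(d1, i1[1:]), np.allclose(d2, eps[2:]))
```

Models that only tolerate I(0) or I(1) inputs would need `d1` at most; applying them to `i2` directly violates their assumptions.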

## ARDL Auto-regressive distributed lag model

ARDL is an econometric model used to analyze the dynamic relationship between a dependent variable and one or more explanatory variables, incorporating both their current and lagged values. It is particularly useful when the variables are a mix of I(0) (stationary) and I(1) (non-stationary) series, but none of them is I(2).

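1. Formula: 

$$Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \sum_{j=1}^{k} \sum_{m=0}^{q_j} \beta_{j,m} X_{j,t-m} + \varepsilon_t$$

For a single explanatory variable $X_t$, the ARDL(p,q) model is:

$$Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \sum_{m=0}^{q} \beta_m X_{t-m} + \varepsilon_t$$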
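
The single-regressor formula can be checked numerically. The sketch below simulates an ARDL(1,1) process and recovers its coefficients by ordinary least squares with NumPy only. The parameter values, the sample size and the stationary regressor are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
c, phi, beta0, beta1 = 0.5, 0.6, 0.8, 0.3   # assumed "true" parameters

x = rng.normal(size=n)                      # stationary regressor (assumption)
eps = 0.5 * rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + beta0 * x[t] + beta1 * x[t - 1] + eps[t]

# Regressor matrix: constant, Y_{t-1}, X_t, X_{t-1}
Z = np.column_stack([np.ones(n - 1), y[:-1], x[1:], x[:-1]])
coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
c_hat, phi_hat, b0_hat, b1_hat = coef

# Long-run multiplier: total effect on Y of a permanent unit change in X
long_run = (b0_hat + b1_hat) / (1 - phi_hat)
print(coef.round(3), round(long_run, 3))
```

The long-run multiplier $(\beta_0 + \beta_1)/(1 - \phi_1)$ is what separates the short-term effects (the individual lag coefficients) from the long-term relationship.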

2. Intuition

    1. Combines lagged effects
    2. Short-term and long-term relationships: short-term effects come from the lagged terms, long-term effects from the equilibrium relationship between the variables
    3. Flexibility with mixed integration orders
    4. Error correction mechanism
    
3. Steps

    1. Check the order of integration: no variable may be $I(2)$
    2. Lag selection: AIC, BIC, HQIC
    3. Estimate the model
    4. Test for cointegration. Bounds test, check for presence of long-term relationship
    5. Transform into ECM (if cointegration exists): rewrite the model in ECM
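
As a worked illustration of step 5, take the simplest case, an ARDL(1,1) with one regressor (the choice of orders and the symbol $\theta$ for the long-run multiplier are assumptions made here for illustration). Subtracting $Y_{t-1}$ from both sides and writing $X_t = \Delta X_t + X_{t-1}$ gives:

$$
\begin{aligned}
Y_t &= c + \phi_1 Y_{t-1} + \beta_0 X_t + \beta_1 X_{t-1} + \varepsilon_t \\
\Delta Y_t &= c - (1-\phi_1) Y_{t-1} + \beta_0 \Delta X_t + (\beta_0 + \beta_1) X_{t-1} + \varepsilon_t \\
&= \beta_0 \Delta X_t - (1-\phi_1)\left[ Y_{t-1} - \frac{c}{1-\phi_1} - \theta X_{t-1} \right] + \varepsilon_t, \qquad \theta = \frac{\beta_0+\beta_1}{1-\phi_1}
\end{aligned}
$$

The bracket is the deviation from the long-run relationship $Y = c/(1-\phi_1) + \theta X$. The coefficient $-(1-\phi_1)$ is the speed of adjustment, which is negative whenever $|\phi_1| < 1$.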
    
**Assumptions**

1. Stationary residuals
2. No $I(2)$ variables: every variable must be $I(0)$ or $I(1)$
3. Linear relationship between variables
4. Exogeneity of explanatory variables
5. Sufficient Data length

**Limitations**

1. Dependent on lag selection
2. Non-linear relationships
3. Bound test limitations (not robust for small samples)
4. Model complexity
5. Endogeneity: the regressors must be at least weakly exogenous.

**Applications**

1. Finance 
    1. Asset pricing.
    2. Volatility modeling
    3. Exchange rates
    
2. Macroeconomics
    1. Monetary policy
    2. Trade analysis
    3. Consumption and saving
    
---

## VAR

A statistical model used to capture linear interdependencies among multiple time series. All variables are treated as endogenous, and the model describes the dynamics between them.

$$ Y_t = c + \sum_{i=1}^{p} A_i Y_{t-i} + \epsilon_t $$

where each term is a vector or a matrix: $Y_t$, $c$ and $\epsilon_t$ are vectors, and each $A_i$ is a coefficient matrix.
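
A minimal sketch of how such a system can be estimated, assuming a bivariate VAR(1) with made-up coefficients: each equation is an ordinary regression of $Y_t$ on a constant and $Y_{t-1}$, so OLS applied equation by equation recovers $c$ and $A_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5000, 2
c = np.array([0.1, -0.2])
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])          # assumed coefficient matrix

# Stability (stationarity) check: all eigenvalues of A1 inside the unit circle
assert np.all(np.abs(np.linalg.eigvals(A1)) < 1)

Y = np.zeros((n, k))
for t in range(1, n):
    Y[t] = c + A1 @ Y[t - 1] + rng.normal(size=k)

# OLS, all equations at once: regress Y_t on [1, Y_{t-1}]
Z = np.column_stack([np.ones(n - 1), Y[:-1]])
B, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)   # shape (1 + k, k), one column per equation
c_hat, A1_hat = B[0], B[1:].T

print(c_hat.round(3))
print(A1_hat.round(3))
```

The number of parameters is $k + k^2 p$, which is why dimensionality becomes a problem quickly as variables or lags are added.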

**Assumptions**

1. Stationarity
2. Linearity
3. No multicollinearity
4. White noise residuals
5. Sufficient lag selection.

**Limitations**

1. High dimensionality
2. Stationarity
3. No structural interpretability
4. Overfitting risk
5. Causality: the model only shows associations, not causal effects

**Applications**

1. Multivariate time series analysis
2. Forecasting
3. Shock analysis (impulse response functions)
4. Macroeconomic systems
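
For the shock analysis mentioned above, a small sketch may help. In a VAR(1), the response at horizon $h$ to a one-off unit shock is simply $A_1^h$. The matrix below is an assumed example, and these are reduced-form (non-orthogonalized) responses.

```python
import numpy as np

A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])   # assumed VAR(1) coefficient matrix

def impulse_responses(A, horizons):
    """Responses of a VAR(1) to unit shocks: Psi_h = A^h for h = 0..horizons."""
    k = A.shape[0]
    psi = [np.eye(k)]
    for _ in range(horizons):
        psi.append(A @ psi[-1])
    return np.array(psi)      # shape (horizons + 1, k, k)

irf = impulse_responses(A1, horizons=8)
# irf[h, i, j]: response of variable i at horizon h to a unit shock in variable j
print(irf[1])
print(irf[8].round(4))
```

Because the system is stable, the responses die out as the horizon grows.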

---

## ECM error correction model

Used to model dynamic relationships between time series while explicitly accounting for the long-run equilibrium relationship between them. It is employed when the variables are cointegrated (i.e. they have a stable long-term relationship despite being individually non-stationary). Its multivariate form, the VECM, is a restricted VAR.

There is only one target variable. The model focuses on cointegration, and the variables are not treated symmetrically.

$$\Delta Y_t = \alpha + \beta \Delta X_t + \gamma EC_{t-1} + \varepsilon_t$$

$$EC_{t-1} = Y_{t-1} - \beta_0 - \beta_1X_{t-1}$$

EC measures how far the system is from the equilibrium relationship.
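
The two equations suggest a two-step estimation: first estimate the long-run relationship in levels, then use its lagged residual as $EC_{t-1}$ in the regression in differences. The sketch below does this on simulated data with NumPy only. All parameter values are assumptions for illustration, and the helper `ols` is a hypothetical convenience function.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
b0, b1, beta, gamma = 1.0, 2.0, 0.5, -0.3   # assumed "true" values

x = np.cumsum(rng.normal(size=n))           # I(1) regressor (random walk)
y = np.zeros(n)
y[0] = b0 + b1 * x[0]
for t in range(1, n):
    ec_prev = y[t - 1] - b0 - b1 * x[t - 1]                 # EC_{t-1}
    y[t] = y[t - 1] + beta * (x[t] - x[t - 1]) + gamma * ec_prev + rng.normal()

def ols(Z, target):
    """Least-squares coefficients of target on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return coef

# Step 1: long-run relationship in levels, Y_t = b0 + b1 X_t + u_t
b0_hat, b1_hat = ols(np.column_stack([np.ones(n), x]), y)
ec = y - b0_hat - b1_hat * x

# Step 2: short-run dynamics, dY_t = alpha + beta dX_t + gamma EC_{t-1} + e_t
dy, dx = np.diff(y), np.diff(x)
alpha_hat, beta_hat, gamma_hat = ols(np.column_stack([np.ones(n - 1), dx, ec[:-1]]), dy)

print(round(b1_hat, 3), round(beta_hat, 3), round(gamma_hat, 3))
```

A negative $\gamma$ is what makes the mechanism "correct" errors: when $Y$ is above its equilibrium value, the next change in $Y$ is pushed downward.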

**Assumptions**

1. Cointegration: the variables are non-stationary, I(1), but have a stable long-term relationship.
2. Stationary differences: the first differences are stationary.
3. Linearity
4. White noise error
5. Exogeneity: $X_t$ is weakly exogenous

**Limitations** 

1. Assumes cointegration
2. Linearity
3. Limited short-term modeling
4. Sensitivity to specification
5. Small sample bias: ECM estimates can be inefficient and unstable in small datasets

**Applications**

1. Macroeconomics
2. Finance: stock price adjustments.
3. Energy economics
4. Policy analysis: how short-term shocks affect long-term economic outcomes
5. International trade


**How to identify cointegration** 

Cointegration is identified with statistical tests that detect long-term relationships between non-stationary variables.

1. Engle-Granger test
    1. Uses OLS to estimate relationships.
    2. Test for stationarity of residuals.
  
2. Johansen test


---

## SVAR: Structural vector auto-regression

An extension of the VAR model. It incorporates structural information based on economic or theoretical assumptions to identify causal relationships between variables. Unlike VAR, which models associations, SVAR seeks to establish cause-and-effect dynamics by imposing structural constraints.

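$$A Y_t = c + \sum_{i=1}^{p} B_i Y_{t-i} + \varepsilon_t$$

$A$ is the structural impact matrix: it captures the contemporaneous relationships between variables.

For comparison, the reduced-form VAR is $Y_t = c + \sum_{i=1}^{p} \Phi_i Y_{t-i} + e_t$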
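
Comparing the two forms gives $\Phi_i = A^{-1} B_i$ and $e_t = A^{-1} ε_t$, so $A$ has to be recovered from the covariance of the reduced-form errors. The sketch below uses one common identification scheme, a recursive (Cholesky) ordering. It assumes that $A$ is lower triangular with a positive diagonal and that the structural shocks have unit variance; both are illustrative assumptions, not something the notes prescribe.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
# Assumed structural impact matrix: lower triangular (recursive ordering)
A = np.array([[1.0, 0.0],
              [-0.5, 2.0]])

eps = rng.normal(size=(n, 2))          # structural shocks: uncorrelated, unit variance
e = eps @ np.linalg.inv(A).T           # reduced-form errors, e_t = A^{-1} eps_t

sigma_e = np.cov(e, rowvar=False)      # estimated reduced-form covariance
P = np.linalg.cholesky(sigma_e)        # sigma_e = P P', with P lower triangular
A_hat = np.linalg.inv(P)               # under the recursive assumption, A^{-1} = P

print(A_hat.round(3))
```

The zero in the upper-right corner of `A` is the identifying restriction: the first variable does not react to the second within the same period. A different ordering of the variables would give a different `A_hat`, which is why identification is considered subjective.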

**Assumptions**

1. Identifiability: To identify $A$ and $B_i$ the number of restrictions on $A$ must be sufficient.
2. Structural shocks: $ε_t$ are uncorrelated and have economic meaning (demand shock, supply shock)
3. Stationarity
4. Linearity

**Limitations**

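1. Identifiability: the choice of restrictions is very subjective
2. High dimensions: the number of identifying restrictions grows quadratically with the number of variables ($n(n-1)/2$ for a just-identified system with $n$ variables)
3. Stationarity
4. Overfitting
5. Linearity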

**Applications**

1. Macroeconomics
2. Finance: study the impact of financial shocks, credit spreads or market volatility
3. Energy economics
4. International trade

---


## GMM generalized method of moments

Econometric technique used when the standard assumptions of OLS or MLE are violated. It relies on moment conditions derived from economic theory or data properties to estimate parameters.

1. The parameters are estimated by minimizing a weighted distance between the sample moments and the values the model says they should take (zero).

$$\hat{\theta} = \arg\min_{\theta} \; g(\theta)' W g(\theta)$$

$g(\theta)$ is the vector of sample moment conditions: $g(\theta) = \frac{1}{n} \sum^n_{i=1} z_i h(x_i, \theta)$

- $h(x_i, \theta)$ is the moment function, representing theoretical relationships
- $z_i$ denotes the instruments

$W$ is a positive definite weighting matrix.


2. Steps to estimate parameters using GMM

    1. Define the moment conditions. For example $E[h(x_i, \theta)] = 0$ with $h(x_i, \theta)= x_i (y_i - x_i' \theta)$, which ensures that the residuals $(y_i - x_i' \theta)$ are uncorrelated with $x_i$
    2. Choose a weighting matrix $W$
    3. Optimize
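
The three steps can be sketched for the linear case, where the moment condition is $E[z_i (y_i - x_i' \theta)] = 0$ and the minimization has a closed form. The simulated data, the two-step weighting scheme and the function name `linear_gmm` are assumptions for illustration. The regressor is built to be endogenous so that OLS is biased while GMM is not.

```python
import numpy as np

rng = np.random.default_rng(11)
n, theta_true = 5000, 2.0

z = rng.normal(size=(n, 2))                                   # instruments
u = rng.normal(size=n)                                        # structural error
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)   # endogenous regressor
y = theta_true * x + u
X = x[:, None]

def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of g' W g for the moments g = Z'(y - X theta) / n."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

# Step 1: initial weighting matrix (Z'Z/n)^{-1}, which reproduces 2SLS
W1 = np.linalg.inv(z.T @ z / n)
theta1 = linear_gmm(y, X, z, W1)

# Step 2: re-weight with the inverse of the estimated moment covariance
resid = y - X @ theta1
S = (z * resid[:, None] ** 2).T @ z / n                       # estimate of E[u^2 z z']
theta_gmm = linear_gmm(y, X, z, np.linalg.inv(S))

theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_ols.round(3), theta_gmm.round(3))
```

With non-linear moment functions there is no closed form, and the "Optimize" step would be done with a numerical optimizer instead.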

**Assumptions**

1. Moment conditions are valid
2. Weighting matrix is positive definite
3. Instrument relevance: the instrumental variables are correlated with the endogenous variables
4. Exogeneity: the instruments are uncorrelated with the error term
5. Stationarity
6. Sufficient sample size

**Limitations**

1. Sensitive to instrument choice
2. Overfitting
3. Finite-sample problem
4. Misspecified moment conditions
5. Computational complexity

**Applications**

1. Finance: asset pricing, risk-return relationship, volatility modeling
2. Macroeconomics: estimating DSGE models (dynamic stochastic general equilibrium), Phillips curve estimation, policy analysis.

| **Feature**            | **OLS**                      | **MLE**                     | **GMM**                     |
|-------------------------|------------------------------|-----------------------------|-----------------------------|
| **Moment Conditions**   | Implicit                    | Likelihood-based            | Explicitly defined          |
| **Endogeneity**         | Cannot handle               | Requires strong assumptions | Handles via instruments     |
| **Efficiency**          | Efficient if assumptions hold | Efficient under correct model | Asymptotically efficient    |
| **Robustness**          | Sensitive to outliers       | Sensitive to distribution   | Robust to some misspecifications |
| **Applications**        | Linear regression           | Structural models           | Wide-ranging, incl. finance |

---


## CVAR: Co-integrated vector autoregression

CVAR is an extension of VAR designed for non-stationary time series that are cointegrated. It combines the short-term dynamics of VAR with the long-term equilibrium relationships between variables (cointegration).

1. The CVAR model is often represented as a Vector Error Correction Model (VECM), which is derived from the VAR model.

$$ΔY_t = Π Y_{t-1} + ∑(Γ_i ΔY_{t-i}) + ε_t$$

- $ΔY_t$: first difference of endogenous variables
- $\Pi$: long-term impact matrix, capturing the cointegration relationships.
- $Γ_i$: short-term adjustment matrices, capturing dynamic relationships in first differences.

2. Cointegration relationship

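- Matrix $\Pi$ can be decomposed as $\Pi = αβ'$, where the columns of $β$ are the cointegrating vectors and $α$ holds the adjustment (loading) coefficients.
- Short-term dynamics: captured by the $Γ_i$ matrices.
- Error correction term: $β' Y_{t-1}$, the deviation from long-term equilibrium in the previous period.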
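
A small numerical sketch of the decomposition, assuming two variables, one cointegrating relationship and no lagged differences (all values are illustrative). $\Pi = αβ'$ has reduced rank, each series wanders like a random walk, and the combination $β' Y_t$ is stationary.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2000
alpha = np.array([[-0.2], [0.1]])     # adjustment speeds (assumed values)
beta = np.array([[1.0], [-1.0]])      # cointegrating vector: Y1 - Y2 is stationary
Pi = alpha @ beta.T                   # 2x2 matrix of reduced rank 1

Y = np.zeros((n, 2))
for t in range(1, n):
    dY = Pi @ Y[t - 1] + rng.normal(size=2)   # VECM without lagged differences
    Y[t] = Y[t - 1] + dY

spread = Y @ beta[:, 0]               # error correction term beta' Y_t
# AR(1) coefficient of the spread; theory gives 1 + beta' alpha = 0.7
rho_hat = (spread[:-1] @ spread[1:]) / (spread[:-1] @ spread[:-1])

print(np.linalg.matrix_rank(Pi), round(rho_hat, 3))
```

The rank of $\Pi$ equals the number of cointegrating relationships, which is what the Johansen test estimates.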

3. Intuition

- Long-Term Equilibrium: CVAR assumes that while individual time series may be non-stationary, linear combinations of them are stationary. These stationary combinations define the long-term relationships.
- Short-Term Adjustments: The model captures how deviations from long-term equilibrium influence short-term dynamics.
- Combination of VAR and Cointegration: CVAR combines the strengths of VAR for short-term analysis with cointegration for long-term analysis.

**Assumptions**

1. Cointegration: the variables are non-stationary but have stable long-term relationships, which can be checked with the Johansen test.
2. Stationary differences: the first differences are stationary.
3. Linearity
4. White noise error
5. Exogeneity: the model assumes no significant external shocks that systematically bias relationships
6. Sufficient sample size

**Limitations**

1. Complexity
2. Sensitivity to specifications
3. Stationarity assumptions
4. Linearity assumption
5. Data intensive: estimation requires substantial data, especially when working with many variables.

**Applications**

1. Finance
    1. Asset pricing
    2. Exchange rates
    3. Interest rates
    
2. Macroeconomics
    1. Policy analysis
    2. Economic forecasting
    3. Inflation dynamics
    


| **Feature**                | **VAR**                                  | **SVAR**                                     | **CVAR**                                       |
|----------------------------|------------------------------------------|---------------------------------------------|-----------------------------------------------|
| **Stationarity**            | Requires stationarity or differencing.  | Requires stationarity or differencing.      | Handles non-stationary but cointegrated data. |
| **Long-Term Relationships** | Not modeled.                            | Not modeled explicitly.                     | Explicitly includes cointegration (via $\Pi$). |
| **Causality**               | No causal interpretation.               | Causal relationships inferred via structure.| Cointegration relationships provide economic meaning. |
| **Model Structure**         | Simple lagged dynamics.                 | Adds contemporaneous structural constraints.| Combines VAR with error correction (VECM).    |
| **Applications**            | Short-term forecasting.                 | Policy analysis and causal inference.       | Long-term equilibrium and short-term dynamics.|

---


### ARDL example

```python
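import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from statsmodels.tsa.ardl import ARDL

# `df` is assumed to be a DataFrame loaded beforehand, indexed by date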

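# Target: household term deposits and ordinary savings books (quarterly outstanding amount)
y = df['Comptes a terme et livrets ordinaires, actif des menages, encours trimestriel']
# Regressor: annual interest rate on ordinary savings books (double brackets keep a DataFrame)
X = df[['Taux de rémunération annuel des livrets ordinaires']]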

max_lag_y = 5
max_lag_X = [1, 3]

model = ARDL(y, max_lag_y, X, max_lag_X)
result = model.fit()

print(result.summary())

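# Fitted values start after the initial lags, so align them with the end of the sample
fitted = result.fittedvalues
plt.figure(figsize=(10, 5))
plt.plot(y.iloc[-len(fitted):], label='Comptes a terme et livrets ordinaires')
plt.plot(y.index[-len(fitted):], fitted, label='Fitted Values')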
plt.xlabel('Date')
plt.ylabel('Encours trimestriel')
plt.legend()
plt.grid(True)
plt.title("ARDL Model: Comptes a terme et livrets ordinaires vs. Taux de rémunération annuel")
plt.show()

def backtesting_ARDL(y, X, lag_y, lag_X, test_size=24, trend='c'):
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.ardl import ARDL
    
    # Split the data into training and testing sets (positional slicing)
    y_train, y_test = y.iloc[:-test_size], y.iloc[-test_size:]
    X_train, X_test = X.iloc[:-test_size], X.iloc[-test_size:]
    
    # Fit the ARDL model on the training window only
    model = ARDL(y_train, lag_y, X_train, lag_X, trend=trend)
    result = model.fit()
    
    # Forecast the test window; exog must contain the out-of-sample regressor values
    y_pred = result.forecast(steps=test_size, exog=X_test)
    
    # Re-index the forecast on the test dates so the error metrics align row by row
    y_pred = pd.Series(np.asarray(y_pred), index=y_test.index)
    
    # Calculate the MAPE
    mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100
    
    # Calculate the MSE
    mse = np.mean((y_test - y_pred)**2)
    
    print(f"Mean Absolute Percentage Error (MAPE): {mape:.4f}%")
    print(f"Mean Squared Error (MSE): {mse:.4f}")
    
    return y_pred
    


y_pred = backtesting_ARDL(y, X, max_lag_y, max_lag_X, test_size = 24)

plt.figure(figsize=(10, 5))
plt.plot(y, label='Comptes a terme et livrets ordinaires')
plt.plot(y_pred, label='Predicted Values')
plt.xlabel('Date')
plt.ylabel('Encours trimestriel')
plt.legend()
plt.grid(True)
plt.title("ARDL backtest: observed vs. predicted values")
plt.show()

```