# ARIMA Baseline Model for FOREX Volatility Forecasting

**Author:** Naveen Babu  
**Date:** January 18, 2026  
**Purpose:** Classical time series baseline for comparison with GARCH and deep learning models

---

## Objectives

1. Implement ARIMA as a classical baseline model
2. Understand ARIMA theory, assumptions, and limitations
3. Identify optimal ARIMA(p,d,q) parameters using ACF/PACF
4. Train and evaluate on FOREX log returns
5. Compare with GARCH, LSTM, and Hybrid models

---

## 1. Import Required Libraries

In [None]:
# Standard libraries
import sys
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Add project root to path
PROJECT_ROOT = Path.cwd().parent
sys.path.insert(0, str(PROJECT_ROOT))

# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Statistical modeling
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf, pacf, adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox

# Auto ARIMA
try:
    from pmdarima import auto_arima
    AUTO_ARIMA_AVAILABLE = True
    print("✅ pmdarima (auto_arima) is available")
except ImportError:
    AUTO_ARIMA_AVAILABLE = False
    print("⚠️ pmdarima not available. Manual parameter selection will be used.")

# Metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Project imports
from src.utils.config import PROCESSED_DATA_DIR, FIGURES_DIR, SAVED_MODELS_DIR, RANDOM_SEED
from src.models.arima_model import ARIMABaselineModel

# Set random seed
np.random.seed(RANDOM_SEED)

print("✅ All libraries imported successfully")
print(f"Random seed: {RANDOM_SEED}")

## 2. Load and Preprocess Data

We'll use the same preprocessed data as the GARCH model:
- **Log returns** of close prices
- **Chronological split**: 70% train, 15% validation, 15% test
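
As a rough illustration of the two preprocessing steps listed above, the sketch below computes log returns from a close-price series and splits them chronologically. The function names and the toy price series are hypothetical; the pipeline that actually produced `train_data.csv`, `val_data.csv`, and `test_data.csv` is not shown in this notebook and may differ in detail (for example, in how the split boundaries are rounded).

```python
import numpy as np

def log_returns(close):
    """Log returns r_t = ln(P_t) - ln(P_{t-1}); one element shorter than the input."""
    close = np.asarray(close, dtype=float)
    return np.diff(np.log(close))

def chronological_split(values, train_frac=0.70, val_frac=0.15):
    """Split without shuffling so that validation and test data lie strictly after training data."""
    n = len(values)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return values[:train_end], values[train_end:val_end], values[val_end:]

# Toy close prices (hypothetical values, for illustration only)
prices = np.array([1.1000, 1.1011, 1.0990, 1.1005, 1.1020] * 20)
returns = log_returns(prices)
train, val, test = chronological_split(returns)
print(len(train), len(val), len(test))
```

Keeping the split chronological matters because a shuffled split would let information from the future leak into training.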

In [None]:
# Load preprocessed data
train_data = pd.read_csv(PROCESSED_DATA_DIR / "train_data.csv")
val_data = pd.read_csv(PROCESSED_DATA_DIR / "val_data.csv")
test_data = pd.read_csv(PROCESSED_DATA_DIR / "test_data.csv")

# Convert Datetime to pandas datetime
for df in [train_data, val_data, test_data]:
    df['Datetime'] = pd.to_datetime(df['Datetime'])

print("="*60)
print("DATA LOADED")
print("="*60)
print(f"Train: {len(train_data):,} records")
print(f"Val:   {len(val_data):,} records")
print(f"Test:  {len(test_data):,} records")
print(f"\nDate range:")
print(f"  Train: {train_data['Datetime'].min()} to {train_data['Datetime'].max()}")
print(f"  Val:   {val_data['Datetime'].min()} to {val_data['Datetime'].max()}")
print(f"  Test:  {test_data['Datetime'].min()} to {test_data['Datetime'].max()}")

# Display first few rows
print(f"\n{'-'*60}")
print("TRAIN DATA SAMPLE")
print("-"*60)
display(train_data.head())

## 3. ARIMA Theory and Assumptions

### What is ARIMA?

**ARIMA** stands for **A**uto**R**egressive **I**ntegrated **M**oving **A**verage. It's a classical statistical model for time series forecasting.

**Model Structure: ARIMA(p, d, q)**

$$
(1-\phi_1 B - \phi_2 B^2 - ... - \phi_p B^p)(1-B)^d y_t = (1 + \theta_1 B + \theta_2 B^2 + ... + \theta_q B^q)\epsilon_t
$$

Where:
- **p**: Order of **AutoRegressive** (AR) component
- **d**: Degree of **Differencing** (I - Integrated)
- **q**: Order of **Moving Average** (MA) component
- **B**: Backshift operator $(B y_t = y_{t-1})$
- **$\epsilon_t$**: White noise error term
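
As a worked example of the operator notation (the order is chosen only for illustration), take ARIMA(1, 1, 1):

$$
(1-\phi_1 B)(1-B) y_t = (1 + \theta_1 B)\epsilon_t
$$

Multiplying out the left-hand side gives $(1 - (1+\phi_1)B + \phi_1 B^2) y_t$, so the model can be written as an explicit recursion:

$$
y_t = (1+\phi_1) y_{t-1} - \phi_1 y_{t-2} + \epsilon_t + \theta_1 \epsilon_{t-1}
$$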

---

### Components Explained

#### 1. **AR(p) - AutoRegressive**
Current value depends on past values:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + \epsilon_t$$

#### 2. **I(d) - Integrated**
Differencing to make the series stationary:
$$\Delta y_t = y_t - y_{t-1}$$

#### 3. **MA(q) - Moving Average**
Current value depends on past errors:
$$y_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + ... + \theta_q \epsilon_{t-q}$$
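
The three components can be sketched numerically with NumPy alone. The coefficients, seed, and series length below are arbitrary illustrative choices, not values estimated from the FOREX data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
eps = rng.normal(0.0, 1.0, size=n)   # white-noise errors
phi, theta = 0.6, 0.4                # illustrative AR and MA coefficients

# AR(1): the current value depends on the previous value plus a new shock
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + eps[t]

# MA(1): the current value depends on the current and previous shock
ma = eps.copy()
ma[1:] += theta * eps[:-1]

# I(1): a random walk is non-stationary, but its first difference is the noise itself
walk = np.cumsum(eps)
diffed = np.diff(walk)
```

Differencing the random walk recovers `eps[1:]` exactly, which is the sense in which the "I" step turns a non-stationary series into a stationary one.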

---

### Key Assumptions

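1. **Stationarity**: After differencing $d$ times, the series has a constant mean, variance, and autocorrelation structure
2. **Linearity**: The series depends linearly on its own past values and past errors
3. **Constant Variance**: The error term is homoscedastic (no volatility clustering)
4. **No Structural Breaks**: Model parameters don't change over time
5. **Gaussian Errors**: Residuals are normally distributed (this matters for likelihood-based inference and prediction intervals more than for the point forecasts)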

---

### Why Use ARIMA as Baseline?

✅ **Advantages:**
- Well-established theoretical foundation
- Interpretable parameters
- Efficient for univariate time series
- Captures autocorrelation patterns
- Benchmark for more complex models

❌ **Limitations for FOREX:**
- **Cannot model volatility clustering** (conditional heteroscedasticity)
- **Assumes constant variance** (FOREX has time-varying volatility)
- **Linear model** (FOREX may have non-linear dynamics)
- **No multivariate features** (ignores High, Low, Open, Volume)
- **Sensitive to structural breaks** (market regime changes)

💡 **Use Case:**
ARIMA serves as a **classical baseline** to evaluate whether more sophisticated models (GARCH, LSTM, Hybrid) provide meaningful improvements over traditional time series methods.

## 4. Stationarity Testing

ARIMA requires **stationary data**. We'll test using:
1. **ADF (Augmented Dickey-Fuller)**: Tests null hypothesis of unit root (non-stationary)
2. **KPSS (Kwiatkowski-Phillips-Schmidt-Shin)**: Tests null hypothesis of stationarity

### Decision Rule:
- **ADF p-value < 0.05** → Reject unit root → Series is stationary
- **KPSS p-value > 0.05** → Cannot reject stationarity → Series is stationary

In [None]:
# Extract log returns from training data
log_returns = train_data['Log_Returns'].dropna()

print("="*70)
print("STATIONARITY TESTS")
print("="*70)

# 1. Augmented Dickey-Fuller Test
print("\n1. AUGMENTED DICKEY-FULLER TEST")
print("-"*70)
adf_result = adfuller(log_returns, autolag='AIC')

print(f"ADF Statistic: {adf_result[0]:.6f}")
print(f"P-value: {adf_result[1]:.6f}")
print(f"Critical Values:")
for key, value in adf_result[4].items():
    print(f"  {key}: {value:.6f}")

if adf_result[1] < 0.05:
    print(f"\n✅ STATIONARY (p-value < 0.05)")
    print("   → Reject null hypothesis of unit root")
    print("   → No differencing required (d=0)")
else:
    print(f"\n❌ NON-STATIONARY (p-value >= 0.05)")
    print("   → Cannot reject unit root")
    print("   → Differencing may be required (d≥1)")

# 2. KPSS Test
print("\n\n2. KPSS TEST")
print("-"*70)
kpss_result = kpss(log_returns, regression='c', nlags='auto')

print(f"KPSS Statistic: {kpss_result[0]:.6f}")
print(f"P-value: {kpss_result[1]:.6f}")
print(f"Critical Values:")
for key, value in kpss_result[3].items():
    print(f"  {key}: {value:.6f}")

if kpss_result[1] > 0.05:
    print(f"\n✅ STATIONARY (p-value > 0.05)")
    print("   → Cannot reject stationarity")
else:
    print(f"\n❌ NON-STATIONARY (p-value <= 0.05)")
    print("   → Reject stationarity")

# Summary
print("\n" + "="*70)
print("CONCLUSION")
print("="*70)

adf_stationary = adf_result[1] < 0.05
kpss_stationary = kpss_result[1] > 0.05

if adf_stationary and kpss_stationary:
    print("✅ Log returns are STATIONARY (both tests agree)")
    print("   → ARIMA assumption satisfied")
    print("   → Suggested differencing: d=0")
elif adf_stationary and not kpss_stationary:
    print("⚠️ Mixed results: ADF says stationary, KPSS says non-stationary")
    print("   → Consider d=0 or d=1")
elif not adf_stationary and kpss_stationary:
    print("⚠️ Mixed results: ADF says non-stationary, KPSS says stationary")
    print("   → Consider d=0 or d=1")
else:
    print("❌ Log returns are NON-STATIONARY (both tests agree)")
    print("   → Differencing required: d≥1")

## 5. ACF and PACF Analysis

**Autocorrelation Function (ACF)** and **Partial Autocorrelation Function (PACF)** help identify ARIMA orders:

- **ACF**: Shows correlation between $y_t$ and $y_{t-k}$ → Identifies **MA(q)** order
- **PACF**: Shows correlation between $y_t$ and $y_{t-k}$ after removing intermediate correlations → Identifies **AR(p)** order

### Interpretation Rules:

| Pattern | ACF | PACF | Model |
|---------|-----|------|-------|
| **AR(p)** | Decays gradually | Cuts off after lag p | ARIMA(p, d, 0) |
| **MA(q)** | Cuts off after lag q | Decays gradually | ARIMA(0, d, q) |
| **ARMA(p,q)** | Decays gradually | Decays gradually | ARIMA(p, d, q) |
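
To make the "cuts off after lag q" row concrete, the sketch below computes a sample ACF by hand for a simulated MA(1) series and compares it with the approximate 95% band $\pm 1.96/\sqrt{n}$. The helper `sample_acf`, the coefficient, and the seed are illustrative assumptions; `plot_acf` from statsmodels does the equivalent work in the notebook cell.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag (biased estimator, as commonly used)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    lags = [np.dot(x[k:], x[:-k]) / denom for k in range(1, max_lag + 1)]
    return np.array([1.0] + lags)

rng = np.random.default_rng(123)
n, theta = 5000, 0.5
eps = rng.normal(size=n + 1)
ma1 = eps[1:] + theta * eps[:-1]     # MA(1): y_t = eps_t + theta * eps_{t-1}

acf_values = sample_acf(ma1, max_lag=10)
band = 1.96 / np.sqrt(n)             # approximate 95% band under white noise

significant = [k for k in range(1, 11) if abs(acf_values[k]) > band]
print("theoretical lag-1 ACF:", theta / (1 + theta**2))
print("sample lag-1 ACF:     ", round(acf_values[1], 3))
print("lags outside the band:", significant)
```

For an MA(1) process the theoretical ACF is $\theta_1/(1+\theta_1^2)$ at lag 1 and zero afterwards, so the sample ACF should show one clear spike followed by values near zero (with the occasional lag crossing the band by chance).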

In [None]:
# Plot ACF and PACF
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

# ACF Plot
plot_acf(log_returns, lags=40, ax=axes[0], alpha=0.05)
axes[0].set_title("Autocorrelation Function (ACF)", fontsize=14, fontweight='bold')
axes[0].set_xlabel("Lag", fontsize=12)
axes[0].set_ylabel("ACF", fontsize=12)
axes[0].grid(True, alpha=0.3)

# PACF Plot
plot_pacf(log_returns, lags=40, ax=axes[1], alpha=0.05, method='ywm')
axes[1].set_title("Partial Autocorrelation Function (PACF)", fontsize=14, fontweight='bold')
axes[1].set_xlabel("Lag", fontsize=12)
axes[1].set_ylabel("PACF", fontsize=12)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_DIR / "arima_acf_pacf.png", dpi=300, bbox_inches='tight')
plt.show()

print("\n" + "="*70)
print("ACF/PACF INTERPRETATION")
print("="*70)
print("\n📊 Visual Analysis:")
print("   → Check where ACF cuts off (MA order)")
print("   → Check where PACF cuts off (AR order)")
print("   → Blue shaded area = 95% confidence interval")
print("\n💡 Tip: If both decay gradually → ARMA model (mixed AR and MA)")

## 6. Parameter Identification: Auto ARIMA

We'll use `auto_arima` from pmdarima (if available) to automatically identify optimal ARIMA(p,d,q) parameters.

**Selection Criteria:**
- **AIC (Akaike Information Criterion)**: Balances model fit and complexity
- **BIC (Bayesian Information Criterion)**: Penalizes complexity more heavily than AIC (for any sample of more than about 7 observations), so it tends to select smaller models

**Search Space:**
- p: 0 to 5 (AR order)
- d: 0 to 2 (differencing)
- q: 0 to 5 (MA order)
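
The two criteria and the size of the search space can be written out directly. This is a sketch of the standard formulas, not of what `auto_arima` or `ARIMABaselineModel.identify_order_auto` do internally; the log-likelihood value below is made up for illustration.

```python
import math
from itertools import product

def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), where k is the number of estimated parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """BIC = k ln(n) - 2 ln(L), where n is the number of observations."""
    return k * math.log(n) - 2 * log_likelihood

# Every (p, d, q) combination in the search space described above
candidates = list(product(range(0, 6), range(0, 3), range(0, 6)))
print(len(candidates), "candidate orders")

# Hypothetical fit: log-likelihood of -100 with 3 parameters on 1000 observations
print("AIC:", aic(-100.0, k=3))
print("BIC:", round(bic(-100.0, k=3, n=1000), 2))
```

Lower values are better for both criteria. Because the BIC penalty per parameter is $\ln n$ rather than 2, the two can disagree on long series, with BIC preferring the more parsimonious order.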

In [None]:
# Identify optimal ARIMA order
if AUTO_ARIMA_AVAILABLE:
    print("="*70)
    print("AUTO ARIMA PARAMETER SEARCH")
    print("="*70)
    print("Searching optimal ARIMA(p,d,q) parameters...")
    print("Search space: p=[0,5], d=[0,2], q=[0,5]")
    print("Criterion: AIC (Akaike Information Criterion)\n")
    
    optimal_order = ARIMABaselineModel.identify_order_auto(
        log_returns,
        max_p=5,
        max_d=2,
        max_q=5
    )
else:
    print("="*70)
    print("MANUAL PARAMETER IDENTIFICATION (ACF/PACF)")
    print("="*70)
    
    optimal_order = ARIMABaselineModel.identify_order_manual(
        log_returns,
        max_lags=40
    )

print(f"\n✅ Optimal ARIMA order: {optimal_order}")
print(f"   → AR(p) = {optimal_order[0]}")
print(f"   → I(d) = {optimal_order[1]}")
print(f"   → MA(q) = {optimal_order[2]}")

## 7. Train ARIMA Model

Now we'll train the ARIMA model on training data only.
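
`ARIMABaselineModel.fit` is a project wrapper whose internals are not shown here. To make "training on training data only" concrete, the sketch below estimates a single AR(1) coefficient by least squares on the first 70% of a simulated series. It is a deliberately simplified stand-in: ARIMA models are typically fitted by maximum likelihood, and the function name, seed, and coefficient are assumptions made for illustration.

```python
import numpy as np

def fit_ar1_ols(series):
    """Least-squares estimate of phi in y_t = phi * y_{t-1} + eps_t (series is demeaned first)."""
    y = np.asarray(series, dtype=float)
    y = y - y.mean()
    previous, current = y[:-1], y[1:]
    return float(np.dot(previous, current) / np.dot(previous, previous))

# Simulate an AR(1) series with a known coefficient
rng = np.random.default_rng(42)
n, true_phi = 6000, 0.5
eps = rng.normal(size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = true_phi * series[t - 1] + eps[t]

# Estimate on the training segment only; later observations stay unseen
train = series[: int(0.70 * n)]
phi_hat = fit_ar1_ols(train)
print(f"true phi = {true_phi}, estimated phi = {phi_hat:.3f}")
```

Holding back the validation and test segments during estimation is what makes the later out-of-sample metrics meaningful.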

In [None]:
# Initialize and train ARIMA model
arima_model = ARIMABaselineModel(order=optimal_order)
arima_model.fit(train_data, target_col='Log_Returns')

## 8. Generate Predictions

Generate predictions on train, validation, and test sets.
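
How the forecasts are produced affects how they should be read. The sketch below contrasts one-step-ahead forecasts (each prediction uses the actual previous observation) with multi-step forecasts (predictions feed on earlier predictions) for a zero-mean AR(1) model. Which scheme `ARIMABaselineModel.predict` uses is not visible in this notebook, so both function names and the coefficient are illustrative assumptions.

```python
import numpy as np

def one_step_forecasts(last_train_value, actuals, phi):
    """Forecast each point from the true previous value (rolling, one step ahead)."""
    actuals = np.asarray(actuals, dtype=float)
    previous = np.concatenate(([last_train_value], actuals[:-1]))
    return phi * previous

def multi_step_forecasts(last_train_value, steps, phi):
    """Forecast h steps ahead from the end of training: phi**h times the last value."""
    horizons = np.arange(1, steps + 1)
    return last_train_value * phi ** horizons

phi = 0.5
print(one_step_forecasts(1.0, [2.0, 4.0, 8.0], phi))
print(multi_step_forecasts(1.0, 3, phi))
```

Multi-step forecasts from a stationary model decay geometrically toward the series mean, so a long test set forecast in one shot looks like a nearly flat line.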

In [None]:
# Generate predictions
predictions = arima_model.predict(
    val_data=val_data,
    test_data=test_data,
    target_col='Log_Returns'
)

## 9. Evaluate Model Performance

Calculate RMSE, MAE, R², and directional accuracy.
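
For reference, the four metrics can be computed with plain NumPy as sketched below. The definition of directional accuracy used here (share of observations where the predicted and actual returns have the same sign) is an assumption; `ARIMABaselineModel.evaluate` may define it differently, for example on changes between consecutive values.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, R² and sign-based directional accuracy (in percent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    ss_res = np.sum(errors**2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "rmse": float(np.sqrt(np.mean(errors**2))),
        "mae": float(np.mean(np.abs(errors))),
        "r2": float(1.0 - ss_res / ss_tot),
        "directional_accuracy": float(np.mean(np.sign(y_true) == np.sign(y_pred)) * 100),
    }

# Toy values (hypothetical, not model output)
example = regression_metrics(
    y_true=[0.01, -0.02, 0.03, -0.01],
    y_pred=[0.02, -0.01, -0.01, -0.02],
)
print(example)
```

Note that R² can be negative out of sample: it is below zero whenever the model's squared error exceeds that of simply predicting the mean of the evaluated data.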

In [None]:
# Evaluate model
metrics = arima_model.evaluate()

# Create metrics table
metrics_df = pd.DataFrame(metrics).T
metrics_df = metrics_df[['rmse', 'mae', 'r2', 'directional_accuracy', 'n_samples']]
metrics_df.columns = ['RMSE', 'MAE', 'R²', 'Directional Accuracy (%)', 'Samples']

print("\n" + "="*70)
print("PERFORMANCE METRICS")
print("="*70)
display(metrics_df.style.format({
    'RMSE': '{:.6f}',
    'MAE': '{:.6f}',
    'R²': '{:.4f}',
    'Directional Accuracy (%)': '{:.2f}',
    'Samples': '{:,.0f}'
}).background_gradient(subset=['RMSE', 'MAE'], cmap='RdYlGn_r')
   .background_gradient(subset=['R²'], cmap='RdYlGn')
   .background_gradient(subset=['Directional Accuracy (%)'], cmap='RdYlGn'))

## 10. Visualize Results

In [None]:
# Visualization: Actual vs Predicted
fig, axes = plt.subplots(3, 1, figsize=(16, 12))

subsets = ['train', 'val', 'test']
titles = ['Training Set', 'Validation Set', 'Test Set']
colors = ['#2E86AB', '#A23B72', '#F18F01']

for idx, (subset, title, color) in enumerate(zip(subsets, titles, colors)):
    y_true = predictions[subset]['y_true']
    y_pred = predictions[subset]['y_pred']
    
    # Plot actual vs predicted
    axes[idx].plot(y_true, label='Actual', color=color, linewidth=1.5, alpha=0.8)
    axes[idx].plot(y_pred, label='Predicted', color='red', linewidth=1.2, alpha=0.7, linestyle='--')
    
    axes[idx].set_title(f"{title} - ARIMA{optimal_order}", fontsize=14, fontweight='bold')
    axes[idx].set_xlabel("Time Steps", fontsize=11)
    axes[idx].set_ylabel("Log Returns", fontsize=11)
    axes[idx].legend(loc='upper right', fontsize=10)
    axes[idx].grid(True, alpha=0.3)
    
    # Add metrics text
    rmse = metrics[subset]['rmse']
    mae = metrics[subset]['mae']
    r2 = metrics[subset]['r2']
    dir_acc = metrics[subset]['directional_accuracy']
    
    textstr = f'RMSE: {rmse:.6f}\nMAE: {mae:.6f}\nR²: {r2:.4f}\nDir. Acc: {dir_acc:.2f}%'
    axes[idx].text(0.02, 0.98, textstr, transform=axes[idx].transAxes,
                   fontsize=10, verticalalignment='top',
                   bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.7))

plt.tight_layout()
plt.savefig(FIGURES_DIR / "arima_predictions.png", dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# Residual Analysis
fig, axes = plt.subplots(3, 1, figsize=(16, 10))

for idx, (subset, title, color) in enumerate(zip(subsets, titles, colors)):
    y_true = predictions[subset]['y_true']
    y_pred = predictions[subset]['y_pred']
    residuals = y_true - y_pred
    
    axes[idx].scatter(range(len(residuals)), residuals, alpha=0.6, color=color, s=20)
    axes[idx].axhline(y=0, color='red', linestyle='--', linewidth=2)
    axes[idx].set_title(f"Residuals - {title}", fontsize=14, fontweight='bold')
    axes[idx].set_xlabel("Time Steps", fontsize=11)
    axes[idx].set_ylabel("Residuals", fontsize=11)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(FIGURES_DIR / "arima_residuals.png", dpi=300, bbox_inches='tight')
plt.show()

## 11. Save Results

In [None]:
# Save predictions and metrics
output_dir = arima_model.save_results()

# Save model
model_path = arima_model.save_model()

print(f"\n✅ Results saved to: {output_dir}")
print(f"✅ Model saved to: {model_path}")

## 12. Limitations Analysis for FOREX Data

### Why ARIMA Struggles with FOREX:

#### 1. **Volatility Clustering** (ARCH Effects)
- **Problem:** FOREX exhibits **volatility clustering**: periods of high volatility tend to be followed by more high volatility, and calm periods by more calm
- **ARIMA Assumption:** Constant variance (homoscedasticity)
- **Reality:** FOREX has **conditional heteroscedasticity** (time-varying variance)
- **Solution:** Use GARCH models to capture volatility dynamics
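
Volatility clustering can be reproduced with a simulated ARCH(1) process, sketched below. The returns themselves are essentially uncorrelated, which is why a mean model like ARIMA sees little structure, while the squared returns are clearly autocorrelated. The parameters `omega` and `alpha`, the seed, and the helper name are arbitrary illustrative choices, not estimates from the FOREX data.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

rng = np.random.default_rng(7)
n, omega, alpha = 5000, 1e-5, 0.3
z = rng.normal(size=n)

# ARCH(1): the conditional variance depends on yesterday's squared return
sigma2 = np.full(n, omega / (1 - alpha))    # start at the unconditional variance
returns = np.zeros(n)
returns[0] = np.sqrt(sigma2[0]) * z[0]
for t in range(1, n):
    sigma2[t] = omega + alpha * returns[t - 1] ** 2
    returns[t] = np.sqrt(sigma2[t]) * z[t]

print("lag-1 autocorrelation of returns:        ", round(lag1_autocorr(returns), 3))
print("lag-1 autocorrelation of squared returns:", round(lag1_autocorr(returns**2), 3))
```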

#### 2. **Heavy-Tailed Distributions**
- **Problem:** FOREX returns have **fat tails** and **excess kurtosis** (more extreme events than normal distribution)
- **ARIMA Assumption:** Gaussian errors
- **Reality:** FOREX returns are leptokurtic and are often better described by heavy-tailed distributions (e.g., Student-t, Laplace)
- **Impact:** ARIMA underestimates extreme events (Black Swan events)
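
Excess kurtosis is a quick way to quantify fat tails. The sketch below compares a Gaussian sample with a Student-t sample; both are simulated, and the degrees of freedom and seed are illustrative assumptions rather than properties of the FOREX series. The same helper could be applied to `log_returns` or to the model residuals.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: fourth standardized moment minus 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered**2)
    m4 = np.mean(centered**4)
    return float(m4 / m2**2 - 3.0)

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=20000)
heavy_sample = rng.standard_t(df=5, size=20000)   # Student-t with 5 degrees of freedom

print("normal:   ", round(excess_kurtosis(normal_sample), 2))
print("Student-t:", round(excess_kurtosis(heavy_sample), 2))
```

A clearly positive value indicates more extreme observations than a normal distribution would produce, which is the situation in which Gaussian prediction intervals are too narrow.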

#### 3. **Non-Linear Dependencies**
- **Problem:** FOREX has **non-linear relationships** between past and future values
- **ARIMA Assumption:** Linear relationships
- **Reality:** Regime changes, market microstructure effects, feedback loops
- **Solution:** Use deep learning models (LSTM, GRU) or regime-switching models

#### 4. **No Leverage Effect**
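- **Problem:** FOREX may exhibit **asymmetric volatility**, where positive and negative returns of the same size affect volatility differently (the classic leverage effect, in which negative returns raise volatility more, is best documented for equities)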
- **ARIMA Limitation:** Cannot capture asymmetric responses
- **Solution:** Use EGARCH or GJR-GARCH models

#### 5. **Ignores Multivariate Information**
- **Problem:** FOREX is influenced by multiple factors (High, Low, Open, Volume, macroeconomic indicators)
- **ARIMA Limitation:** Univariate model (only uses Close price / log returns)
- **Solution:** Use VAR (Vector Autoregression) or multivariate deep learning models

---

### Test for ARCH Effects (Volatility Clustering)

In [None]:
# Test for ARCH effects (Ljung-Box test on squared residuals)
test_residuals = predictions['test']['y_true'] - predictions['test']['y_pred']
squared_residuals = test_residuals ** 2

# Ljung-Box test
lb_result = acorr_ljungbox(squared_residuals, lags=[10, 20, 30], return_df=True)

print("="*70)
print("LJUNG-BOX TEST FOR ARCH EFFECTS")
print("="*70)
print("Null Hypothesis: No autocorrelation in squared residuals (no ARCH effects)")
print("\nTest Results:")
display(lb_result)

if (lb_result['lb_pvalue'] < 0.05).any():
    print("\n❌ ARCH EFFECTS DETECTED (p-value < 0.05)")
    print("   → Squared residuals are autocorrelated")
    print("   → Volatility clustering present")
    print("   → ARIMA assumption VIOLATED")
    print("   → 💡 Consider GARCH model instead")
else:
    print("\n✅ NO ARCH EFFECTS (p-value >= 0.05)")
    print("   → ARIMA assumption satisfied")

## 13. Comparison with Other Models

Let's compare ARIMA performance with GARCH, LSTM, and Hybrid models.
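
The comparison table includes a naive baseline. One way such a baseline could be computed is sketched below, using two common naive forecasts for returns: always predicting zero, and repeating the previous observation. The function names and toy values are hypothetical, and this is not necessarily how the project defines its naive baseline.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def zero_forecast_rmse(returns):
    """Naive forecast 1: every return is predicted to be zero."""
    returns = np.asarray(returns, dtype=float)
    return rmse(returns, np.zeros_like(returns))

def persistence_forecast_rmse(returns):
    """Naive forecast 2: each return is predicted to equal the previous one."""
    returns = np.asarray(returns, dtype=float)
    return rmse(returns[1:], returns[:-1])

toy_returns = [0.01, -0.02, 0.03]   # hypothetical values
print("zero forecast RMSE:       ", round(zero_forecast_rmse(toy_returns), 5))
print("persistence forecast RMSE:", round(persistence_forecast_rmse(toy_returns), 5))
```

For return series with little autocorrelation, the zero forecast is usually the harder of the two to beat, which makes it a useful reference point for the ARIMA row.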

In [None]:
# Create comparison table; the Naive Baseline figures and the 'TBD' entries are placeholders to be replaced after running the other models
comparison_data = {
    'Model': ['Naive Baseline', 'ARIMA', 'GARCH', 'LSTM', 'Hybrid GARCH-LSTM'],
    'Type': ['Statistical', 'Statistical', 'Statistical', 'Deep Learning', 'Hybrid'],
    'Captures Volatility': ['No', 'No', 'Yes', 'Partial', 'Yes'],
    'Multivariate': ['No', 'No', 'No', 'Yes', 'Yes'],
    'Non-Linear': ['No', 'No', 'No', 'Yes', 'Yes'],
    'Test RMSE': [0.012, metrics['test']['rmse'], 'TBD', 'TBD', 'TBD'],
    'Test MAE': [0.009, metrics['test']['mae'], 'TBD', 'TBD', 'TBD'],
    'Test Dir. Acc. (%)': [50.0, metrics['test']['directional_accuracy'], 'TBD', 'TBD', 'TBD']
}

comparison_df = pd.DataFrame(comparison_data)

print("="*80)
print("MODEL COMPARISON")
print("="*80)
display(comparison_df.style.set_properties(**{
    'text-align': 'center'
}).set_table_styles([
    {'selector': 'th', 'props': [('text-align', 'center'), ('font-weight', 'bold')]}
]))

print("\n💡 Key Insights:")
print("   • ARIMA is a classical baseline but cannot capture volatility clustering")
print("   • GARCH specifically models time-varying volatility")
print("   • LSTM can learn complex non-linear patterns from multivariate features")
print("   • Hybrid model combines GARCH volatility with LSTM's deep learning capabilities")

## 14. Conclusions and Recommendations

### Summary

✅ **ARIMA Successfully Implemented:**
- Optimal order identified and stored in `optimal_order` as a (p, d, q) tuple
- Model trained on log returns (training data only)
- Evaluated on validation and test sets
- Predictions and metrics saved for comparison

---

### When to Use ARIMA

#### ✅ **Use ARIMA when:**
1. **Simple baseline needed**: Quick benchmark for time series forecasting
2. **Stationary data**: Data has constant mean and variance
3. **Linear relationships**: Autocorrelation patterns are linear
4. **Interpretability matters**: Need clear understanding of model parameters
5. **Low computational cost**: Limited resources for training

#### ❌ **Avoid ARIMA when:**
1. **Volatility clustering present**: Use GARCH instead
2. **Non-linear patterns**: Use deep learning (LSTM, GRU)
3. **Multivariate features**: Use VAR, LSTM, or Hybrid models
4. **Heavy-tailed distributions**: Use robust models or regime-switching
5. **Structural breaks**: Use regime-switching models or retrain frequently

---

### Recommendations for FOREX Forecasting

1. **Use ARIMA as a baseline** to establish a performance floor
2. **Combine with GARCH** to capture volatility dynamics
3. **Leverage deep learning** (LSTM) for multivariate non-linear patterns
4. **Hybrid approach** (GARCH-LSTM) combines the best of both worlds
5. **Ensemble methods** can further improve accuracy

---

### Next Steps

1. ✅ ARIMA baseline complete
2. 🔄 Compare with GARCH model (volatility modeling)
3. 🔄 Compare with LSTM model (deep learning)
4. 🔄 Compare with Hybrid GARCH-LSTM (combined approach)
5. 🔄 Ensemble all models for final predictions

---

**End of ARIMA Baseline Notebook**