## 9.9 SARIMA Model

- SARIMA stands for **Seasonal AutoRegressive Integrated Moving Average**. 
- It is a time series model that extends the ARIMA model to account for seasonality in data.

**Components of SARIMA**:

- **SAR (Seasonal AutoRegressive)**: Models the relationship between the current value of a time series and its own past values at seasonal lags (s, 2s, ...; e.g., the same month one year earlier for monthly data).
- **I (Integrated)**: Differences the series with the aim of making it stationary: regular differencing (order d) removes trend, and seasonal differencing at lag s (order D) removes a stable seasonal pattern.
- **MA (Moving Average)**: Models the relationship between the current value of a time series and past error terms; the seasonal MA part uses errors at seasonal lags (s, 2s, ...).
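The effect of the two kinds of differencing can be seen on a small synthetic series. This is a minimal sketch: the seasonal period `s = 12`, the trend slope, and the repeating `pattern` are made-up values chosen only for illustration.

```python
import numpy as np

s = 12                                   # assumed seasonal period (monthly data)
t = np.arange(48)
pattern = np.array([3, 1, 0, -1, -2, -3, -2, -1, 0, 1, 2, 2], dtype=float)
y = 0.5 * t + pattern[t % s]             # linear trend + repeating yearly pattern

seasonal_diff = y[s:] - y[:-s]           # y_t - y_{t-s}: one seasonal difference (D = 1)
both_diff = np.diff(seasonal_diff)       # followed by one regular difference (d = 1)

print(seasonal_diff[:3])                 # constant: only the trend's yearly increase is left
print(both_diff[:3])                     # zeros: trend and seasonal pattern both removed
```

With noisy real data the differenced series will not be exactly constant, but the same two operations are what the d and D orders refer to.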


**SARIMA Model**:

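Applied to the series after differencing (d regular and D seasonal differences), a simplified additive form of the model is:

$$
Y_t = c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \Phi_1 Y_{t-s} + \dots + \Phi_P Y_{t-Ps} + \Theta_1 \varepsilon_{t-s} + \dots + \Theta_Q \varepsilon_{t-Qs} + \varepsilon_t
$$

where:

- $Y_t$ is the value of the (differenced) time series at time t
- $c$ is a constant term
- $\phi_1, \dots, \phi_p$ are the AR coefficients
- $\theta_1, \dots, \theta_q$ are the MA coefficients
- $\Phi_1, \dots, \Phi_P$ are the seasonal AR coefficients
- $\Theta_1, \dots, \Theta_Q$ are the seasonal MA coefficients
- $\varepsilon_t$ is the error term
- $s$ is the periodicity (e.g., 12 for monthly data)

Strictly speaking, the seasonal and non-seasonal parts are multiplied rather than added, so the full model also contains cross terms built from products such as $\phi_1 \Phi_1$.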

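**(p, d, q)** are the orders of the non-seasonal AR, differencing (I), and MA components, and **(P, D, Q)** are the orders of their seasonal counterparts. The full model is usually written as SARIMA(p, d, q)(P, D, Q)s.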
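As a worked illustration, assume the standard multiplicative form written with the backshift operator $B$ (where $B^k Y_t = Y_{t-k}$; this notation is not used elsewhere in these notes). A SARIMA(1,0,0)(1,0,0)12 model with no constant is then:

$$
(1 - \phi_1 B)(1 - \Phi_1 B^{12})\, Y_t = \varepsilon_t
$$

Multiplying out the two brackets and moving the lagged terms to the right-hand side gives:

$$
Y_t = \phi_1 Y_{t-1} + \Phi_1 Y_{t-12} - \phi_1 \Phi_1 Y_{t-13} + \varepsilon_t
$$

The term at lag 13 is the cross term that comes from combining the seasonal and non-seasonal AR parts.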

**Steps in Building a SARIMA Model**:

- **Identify Seasonality**: Determine the seasonal period s of the data (e.g., s = 12 for monthly data with a yearly pattern, s = 7 for daily data with a weekly pattern).
- **Stationarize the Data**: Make the time series stationary by differencing and/or detrending.
- **Select Model Order**: Choose the appropriate values for p, d, q, P, D, and Q. This can be done using techniques like information criteria (AIC, BIC) or grid search.
- **Estimate Parameters**: Estimate the model parameters using a method like maximum likelihood estimation.
- **Evaluate Model Fit**: Assess the model's goodness of fit using diagnostic tests and information criteria.


#### -- -- -- -- --

**Advantages of SARIMA**:

- Can handle seasonal patterns: SARIMA is well-suited for modeling time series data with seasonal components.
- Flexible: SARIMA can capture a wide range of patterns in time series data.
- Forecasting: SARIMA can be used for forecasting future values of the time series.


**Disadvantages of SARIMA**:

- Complexity: SARIMA models can be complex, especially for large values of p, d, q, P, D, and Q.
- Stationarity Assumption: SARIMA assumes that the data is stationary after differencing.
- Sensitivity to Model Order: The choice of model order can significantly impact the model's performance.


#### -- -- -- -- --
In summary, SARIMA is a powerful tool for modeling and forecasting time series data with seasonal patterns. With a well-chosen model order and residuals that pass diagnostic checks, it can provide accurate forecasts, particularly over short horizons and when the seasonal pattern is stable.

### -- -- -- -- --
**Finding the Optimal SARIMA Model Order**

Determining the optimal values for p, d, q, P, D, and Q in a SARIMA model can be a challenging task. Here are some common approaches:

- 1. **Visual Inspection of ACF and PACF Plots** (computed on the differenced, stationary series):
  - **ACF (Autocorrelation Function)**: Mainly helps identify the MA order.
    - A sharp cut-off after lag q in the ACF (with a gradually decaying PACF) suggests an MA(q) term.
    - A gradual decay in the ACF suggests an AR component.
  - **PACF (Partial Autocorrelation Function)**: Mainly helps identify the AR order.
    - A sharp cut-off after lag p in the PACF (with a gradually decaying ACF) suggests an AR(p) term.
    - A gradual decay in the PACF suggests an MA component.
  - Significant spikes at the seasonal lags (s, 2s, ...) point to the seasonal orders P and Q in the same way.
    
- 2. **Information Criteria**:
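  - **AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion)**: These metrics reward goodness of fit and penalize models with more parameters. A lower AIC or BIC value indicates a better trade-off between fit and complexity, not simply a better fit.
  - **Grid Search**: Try different combinations of p, q, P, and Q and select the model with the lowest AIC or BIC. Compare only models with the same d and D, because differencing changes the data on which the likelihood is computed.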
  
- 3. **Manual Trial and Error**:
  - Start with a simple model (e.g., SARIMA(1,1,1)(1,1,1)s) and gradually increase the order of the components until the model fits the data well.

- 4. **Automatic Model Selection**:
  - Some software packages have built-in functions for automatically selecting the SARIMA model order, for example `auto_arima` in Python's `pmdarima` package and `auto.arima` in R's `forecast` package.
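The ACF behaviour described in approach 1 can be checked on simulated data. The sketch below uses a hypothetical helper, `sample_acf`, and two series with known structure: an AR(1) process, whose ACF should decay gradually, and an MA(1) process, whose ACF should cut off after lag 1. The coefficient 0.8 and the sample size are arbitrary choices.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(42)
n = 5000
eps = rng.normal(size=n)

# AR(1): y_t = 0.8 * y_{t-1} + eps_t, so the ACF decays geometrically (0.8, 0.64, ...).
ar = np.zeros(n)
for i in range(1, n):
    ar[i] = 0.8 * ar[i - 1] + eps[i]

# MA(1): y_t = eps_t + 0.8 * eps_{t-1}, so the ACF is near zero beyond lag 1.
ma = eps.copy()
ma[1:] += 0.8 * eps[:-1]

acf_ar = sample_acf(ar, 4)
acf_ma = sample_acf(ma, 4)
print("AR(1) ACF:", np.round(acf_ar, 2))
print("MA(1) ACF:", np.round(acf_ma, 2))
```

For plots with confidence bands, `plot_acf` and `plot_pacf` in `statsmodels.graphics.tsaplots` are the usual tools.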
  
  
**Tips**:
  - **Stationarity**: Check whether the series is stationary (e.g., from a time plot or a unit-root test such as the ADF test); the differencing orders d and D exist to handle non-stationarity.
  - **Seasonality**: Identify the seasonal period (s) based on the data.
  - **Differencing**: Use regular differencing (d) and seasonal differencing (D) to make the data stationary if necessary.
  - **Model Diagnostics**: Evaluate the model's fit using diagnostic tests like the Ljung-Box test for autocorrelation in the residuals.
  - **Consider Subject Matter Knowledge**: Incorporate any domain-specific knowledge about the data to guide your model selection.

### -- -- -- -- --
### Ljung-Box Test

- The Ljung-Box test is a statistical test used to determine if a series of residuals from a time series model exhibits significant autocorrelation. 
- In other words, it checks whether there are any remaining patterns or dependencies in the residuals after fitting a model.

**Null Hypothesis**:
  - H0: The residuals are uncorrelated.

**Alternative Hypothesis**:
  - H1: There is autocorrelation in the residuals.


**Test Statistic**:

  - The Ljung-Box test statistic is calculated as:

$$
Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n-k}
$$

   - where:

       - n: The sample size
       - h: The number of lags tested
       - $\hat{\rho}_k$: The sample autocorrelation of the residuals at lag k


**Critical Values**:

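 - The test statistic is compared to the chi-squared distribution with degrees of freedom equal to the number of lags tested. When the test is applied to the residuals of a fitted ARMA-type model, the degrees of freedom are reduced by the number of estimated AR and MA coefficients.
 - If the test statistic is greater than the critical value at a given significance level (equivalently, if the p-value is below that level), we reject the null hypothesis and conclude that there is evidence of autocorrelation.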

**Interpretation**:

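 - Large Q-statistic (above the critical value, i.e., a small p-value): Indicates significant autocorrelation in the residuals.
 - Small Q-statistic (below the critical value): Gives no evidence of autocorrelation; the residuals are consistent with white noise, although this does not prove they are uncorrelated.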
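A minimal sketch of the test, assuming the standard form of the statistic, Q = n(n+2) Σ ρ̂k² / (n − k) summed over lags k = 1, ..., h. The helper name `ljung_box_q` is hypothetical, and the two input series are simulated: white noise, and an AR(1) series that is strongly autocorrelated by construction.

```python
import numpy as np

def ljung_box_q(x, h):
    """Ljung-Box Q statistic based on the first h sample autocorrelations."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    denom = np.dot(x, x)
    total = 0.0
    for k in range(1, h + 1):
        rho_k = np.dot(x[k:], x[:-k]) / denom   # sample autocorrelation at lag k
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

rng = np.random.default_rng(7)
n, h = 500, 10
white = rng.normal(size=n)
ar = np.zeros(n)
for i in range(1, n):
    ar[i] = 0.8 * ar[i - 1] + white[i]

CRIT_95_DF10 = 18.307   # chi-squared 95% quantile with 10 degrees of freedom
q_white, q_ar = ljung_box_q(white, h), ljung_box_q(ar, h)
print(f"white noise: Q = {q_white:.1f}, reject H0: {q_white > CRIT_95_DF10}")
print(f"AR(1):       Q = {q_ar:.1f}, reject H0: {q_ar > CRIT_95_DF10}")
```

In practice, `acorr_ljungbox` in `statsmodels.stats.diagnostic` computes the statistic and its p-value directly, and its `model_df` argument adjusts the degrees of freedom for residuals of a fitted model.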


**Use in Time Series Analysis**:

  - Model Diagnostics: The Ljung-Box test is used to assess the adequacy of a time series model. 
     - If there is significant autocorrelation in the residuals, it indicates that the model may not be capturing all the relevant patterns in the data.
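  - Model Selection: The Ljung-Box test is a diagnostic rather than a ranking criterion. Candidate models whose residuals fail the test can be ruled out, while models that pass are better compared using AIC, BIC, or out-of-sample forecast accuracy.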

# -- -- -- -- -- 
## 10 Advanced Forecasting Methods

**1. Complex Seasonality**
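  - Seasonal patterns that a single fixed seasonal period cannot capture, such as multiple overlapping periods (e.g., daily and weekly cycles in hourly data), non-integer periods, or patterns that change over time.
  - SARIMAX (Seasonal ARIMA with Exogenous Regressors): Adds explanatory variables (e.g., holiday indicators or Fourier terms) to a SARIMA model, which can help capture effects that the seasonal terms alone miss.
  - Dynamic Harmonic Regression: Models seasonal patterns with Fourier (sine and cosine) terms used as regressors, while an ARIMA model captures the short-term dynamics of the errors.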
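The harmonic regressors can be sketched as follows. This is a simplified illustration: the helper `fourier_terms`, the hourly synthetic series, and the choice of two harmonics per period are all assumptions, and the coefficients are fitted by ordinary least squares, which ignores any autocorrelation in the errors. In a dynamic harmonic regression the same columns would instead be passed as exogenous regressors to a model with ARIMA errors.

```python
import numpy as np

def fourier_terms(t, period, K):
    """Columns sin(2*pi*k*t/period) and cos(2*pi*k*t/period) for k = 1..K."""
    cols = []
    for k in range(1, K + 1):
        angle = 2 * np.pi * k * t / period
        cols += [np.sin(angle), np.cos(angle)]
    return np.column_stack(cols)

rng = np.random.default_rng(3)
t = np.arange(24 * 7 * 8)                        # 8 weeks of hourly data (synthetic)
y = (10
     + 3 * np.sin(2 * np.pi * t / 24)            # daily cycle
     + 2 * np.cos(2 * np.pi * t / 168)           # weekly cycle
     + rng.normal(scale=0.5, size=t.size))

# Design matrix: intercept, daily harmonics, weekly harmonics.
X = np.column_stack([np.ones(t.size), fourier_terms(t, 24, K=2), fourier_terms(t, 168, K=2)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
print(np.round(coef, 2), round(float(resid.std()), 2))
```

Two seasonal periods are handled here with a handful of regressors, which is the main appeal of the approach when a single seasonal period s is not enough.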

#### -- -- -- -- --

**2. Prophet**
- Open-source forecasting library developed at Facebook (now Meta).
- Fits an additive model that combines trend, seasonality, and holiday effects.
- Tolerates missing values and is fairly robust to outliers.
- Suitable for large datasets with complex patterns.

#### -- -- -- -- --

**3. Vector Autoregression (VAR)**
- Models multiple interrelated time series simultaneously.
- Captures the interdependence between variables.
- Useful for economic forecasting and analysis of multivariate systems.
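A minimal sketch of the idea for a two-variable VAR(1) with no intercept, y_t = A y_{t-1} + e_t. The coefficient matrix `A_true` and the noise scale are made-up values, and the model is estimated by plain least squares with NumPy rather than with a dedicated time series library.

```python
import numpy as np

rng = np.random.default_rng(5)
A_true = np.array([[0.5, 0.2],
                   [-0.3, 0.4]])      # assumed coefficients (eigenvalues inside the unit circle)
n = 2000
Y = np.zeros((n, 2))
for i in range(1, n):
    Y[i] = A_true @ Y[i - 1] + rng.normal(scale=0.5, size=2)

# Regress y_t on y_{t-1}: Y[1:] is approximately Y[:-1] @ A.T
B, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)
A_hat = B.T

next_step = A_hat @ Y[-1]             # one-step-ahead forecast for both series
print(np.round(A_hat, 2))
```

The off-diagonal entries of `A_hat` are what capture the interdependence: each series is forecast from the past of both series, not only its own.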

#### -- -- -- -- --

**4. Neural Network Models**
- Flexible and powerful models capable of learning complex patterns.
- Common architectures: Recurrent Neural Networks (RNNs) and their gated variants, Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs).
- Best suited to large datasets and non-linear relationships; with short series they can overfit easily.

#### -- -- -- -- --

**5. Bootstrapping and Bagging**
- Resampling techniques for improving forecasting accuracy.
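- Bootstrapping: Creates multiple samples by randomly drawing with replacement from the original dataset. Because time series observations are ordered and dependent, contiguous blocks (block bootstrap) or model residuals are usually resampled instead of individual observations.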
- Bagging (Bootstrap Aggregating): Trains multiple models on bootstrap samples and combines their predictions.
- Ensemble Methods: Combine multiple models to reduce variance and improve accuracy.
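A simplified sketch of bagging for a time series, assuming a moving block bootstrap and a deliberately simple base model (a least-squares AR(1) with intercept). The helper names, the block length of 20, and the 200 replicates are illustrative choices. Published bagging approaches for forecasting typically bootstrap the remainder of a decomposition rather than the raw series, so treat this only as a demonstration of the mechanics.

```python
import numpy as np

def moving_block_bootstrap(x, block_len, rng):
    """Resample a series by concatenating randomly chosen contiguous blocks."""
    n = len(x)
    n_blocks = -(-n // block_len)                        # ceil(n / block_len)
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])[:n]

def fit_ar1(x):
    """Least-squares AR(1) with intercept: x_t is approximated by c + phi * x_{t-1}."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    return c, phi

rng = np.random.default_rng(11)
n = 300
y = np.zeros(n)
for i in range(1, n):
    y[i] = 1.0 + 0.6 * y[i - 1] + rng.normal(scale=0.5)

# Train one model per bootstrap sample; each forecasts from the last real observation.
forecasts = []
for _ in range(200):
    c, phi = fit_ar1(moving_block_bootstrap(y, block_len=20, rng=rng))
    forecasts.append(c + phi * y[-1])

bagged = float(np.mean(forecasts))                       # bagged one-step-ahead forecast
print(round(bagged, 3), round(float(np.std(forecasts)), 3))
```

The spread of the individual forecasts gives a rough sense of how sensitive the base model is to the particular sample, and averaging them is what reduces that variance.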


# Happy Learning