# STL Decomposition in Time Series Analysis

This notebook demonstrates how to apply STL (Seasonal and Trend decomposition using Loess) to a time series dataset. STL separates a series into its underlying components: a seasonal pattern, a trend, and a residual.

## 1. Import Required Libraries

We will import pandas, matplotlib, and statsmodels libraries for data manipulation, visualization, and STL decomposition.

In [None]:
# Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL
import statsmodels.api as sm

## 2. Load and Explore the Dataset

We will use the `AirPassengers` dataset, which contains monthly totals of international airline passengers from 1949 to 1960. After loading it, we build a monthly datetime column, set it as the index, and display the first few rows to check the structure.

In [None]:
# Load the AirPassengers dataset
# https://vincentarelbundock.github.io/Rdatasets/articles/data.html
data = sm.datasets.get_rdataset("AirPassengers", "datasets").data

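# Build a monthly datetime column (month-end labels) and set it as the index.
# Note: the 'ME' alias requires pandas 2.2 or later; older versions use 'M'.
data['time'] = pd.date_range(start='1949-01', periods=len(data), freq='ME')
data.set_index('time', inplace=True)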

# Display the first few rows of the dataset
data.head()
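
Before decomposing, it can help to confirm that the index really is a gap-free monthly sequence. The following is a minimal sketch on a hypothetical stand-in series (not the downloaded data), and it assumes month-start labels (`'MS'`), an alias accepted by both older and newer pandas releases:

```python
import pandas as pd

# Hypothetical stand-in for the 144 monthly AirPassengers rows.
n_obs = 144

# 'MS' (month start) works across pandas versions,
# whereas 'ME' (month end) needs pandas 2.2 or later.
index = pd.date_range(start="1949-01-01", periods=n_obs, freq="MS")
series = pd.Series(range(n_obs), index=index, name="value")

# Basic checks: ordered, no duplicate timestamps, 12 rows in every year.
assert series.index.is_monotonic_increasing
assert series.index.is_unique
rows_per_year = series.groupby(series.index.year).size()

print(series.index[0].date(), "to", series.index[-1].date())
print("Distinct row counts per year:", rows_per_year.unique().tolist())
```

The same checks can be run on `data` after setting its index; a year with fewer than 12 rows would point to missing months.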

## 3. Perform STL Decomposition

We will use the STL class from statsmodels to decompose the time series into three components:
- **Seasonal**: Captures repeating patterns or cycles in the data.
- **Trend**: Captures the long-term progression of the series.
- **Residual**: Captures the remaining variation after removing the seasonal and trend components.

In [None]:
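# Perform STL decomposition
# period=12: one seasonal cycle spans 12 monthly observations
# seasonal=13: length of the seasonal smoother (must be odd)
stl = STL(data['value'], period=12, seasonal=13)
result = stl.fit()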

The `seasonal=13` parameter does not set the length of the seasonal cycle. In statsmodels, the cycle length is the `period` argument, while `seasonal` is the length of the Loess window used to smooth the seasonal component. Here's why 13 is a reasonable choice in this use case:

### **Why `seasonal=13` for AirPassengers?**
1. **Monthly Data**: The AirPassengers dataset contains monthly data over twelve years.
2. **Yearly Seasonality**: The dataset exhibits a clear yearly seasonal pattern, so one cycle spans 12 observations. That cycle length is the `period` (12), which statsmodels infers from the monthly index when `period` is not passed explicitly.
3. **Odd Window Size**: `seasonal` must be an odd integer of at least 3 (the default is 7). The window is counted in cycles (years here), not in months, so the closeness of 13 to 12 is a coincidence. A value of 13 is a fairly wide window that lets the seasonal pattern change only gradually from year to year.

### **How to Choose `period` and `seasonal` for Other Use Cases**
1. **Understand the Data Frequency**:
   - If the data is **daily**, consider whether there are weekly, monthly, or yearly patterns.
   - If the data is **hourly**, consider daily or weekly cycles.
   - If the data is **monthly**, look for yearly patterns.

2. **Identify the Seasonal Cycle (`period`)**:
   - Look for repeating patterns in the data. For example:
     - Weekly patterns in daily data → `period=7`.
     - Yearly patterns in monthly data → `period=12`.
   - Use domain knowledge or exploratory data analysis (e.g., visualizing the data) to identify cycles.

3. **Choose an Odd Smoothing Window (`seasonal`)**:
   - `seasonal` must be an odd integer of at least 3. Smaller values let the seasonal pattern change quickly from one cycle to the next; larger values keep it close to constant across cycles.

4. **Experiment and Validate**:
   - If unsure, try different values for the `seasonal` parameter and compare the decompositions. Check that the seasonal component is stable yet still follows genuine changes in the pattern, and that the residuals show no leftover seasonal structure.

### **Example Adjustments**:
- **Daily Data with Weekly Seasonality**: Use `period=7`.
- **Hourly Data with Daily Seasonality**: Use `period=24`.
- **Quarterly Data with Yearly Seasonality**: Use `period=4`.

In each case, `seasonal` remains a separate, odd-valued smoothing choice (for example, the default of 7). By understanding the data's frequency and patterns, you can set `period` to the cycle length and then tune `seasonal` to control how quickly the seasonal pattern is allowed to change.

## 4. Seasonal Component

The seasonal component represents the repeating patterns or cycles in the data. Let's extract and visualize the seasonal component.

In [None]:
# Extract and visualize the seasonal component
plt.figure(figsize=(10, 6))
plt.plot(result.seasonal, label='Seasonal Component', color='blue')
plt.title('Seasonal Component of AirPassengers Data')
plt.xlabel('Time')
plt.ylabel('Seasonal Value')
plt.legend()
plt.show()

## High / Low Seasons

The seasonal component from STL decomposition can help identify high and low seasons or months in the data. Here's how you can analyze the seasonal component to determine these periods:

Steps to Identify High and Low Seasons:

1. Extract the Seasonal Component: The seasonal component represents the repeating patterns in the data. Peaks in the seasonal component indicate high seasons, while troughs indicate low seasons.

2. Find the Maximum and Minimum Values: Identify the months or periods corresponding to the highest and lowest values in the seasonal component.

3. Visualize the Seasonal Component: Use a line plot to observe the repeating patterns and visually identify high and low seasons.

4. Highlight High and Low Seasons: Annotate the plot or create a table to explicitly mark the high and low periods.

In [None]:
# Extract the seasonal component
seasonal = result.seasonal

# Find the dates with the highest and lowest seasonal values.
# Note: idxmax/idxmin return the single most extreme date in the whole series.
high_season = seasonal.idxmax()
low_season = seasonal.idxmin()

print(f"High season occurs in: {high_season:%B %Y}")
print(f"Low season occurs in: {low_season:%B %Y}")

# Visualize the seasonal component with annotations
plt.figure(figsize=(10, 6))
plt.plot(seasonal, label='Seasonal Component', color='blue')
plt.axvline(high_season, color='green', linestyle='--', label='High Season')
plt.axvline(low_season, color='red', linestyle='--', label='Low Season')
plt.title('Seasonal Component with High and Low Seasons Highlighted')
plt.xlabel('Time')
plt.ylabel('Seasonal Value')
plt.legend()
plt.show()
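
Because `idxmax` and `idxmin` return one date each, they describe the most extreme single observation rather than a typical high or low month. A possible complement is to average the seasonal component by calendar month. The sketch below uses a synthetic series as a stand-in for `result.seasonal`; its shape (peak in July, trough in January) is an assumption made for illustration:

```python
import calendar

import numpy as np
import pandas as pd

# Synthetic stand-in for result.seasonal: a yearly sine cycle whose
# amplitude grows each year, peaking in July and bottoming out in January.
index = pd.date_range("2000-01-01", periods=72, freq="MS")
months = index.month.to_numpy()
years_elapsed = (index.year - 2000).to_numpy()
amplitude = 10 + years_elapsed
seasonal = pd.Series(amplitude * np.sin(2 * np.pi * (months - 4) / 12), index=index)

# Average seasonal effect for each calendar month across all years.
monthly_profile = seasonal.groupby(seasonal.index.month).mean()

peak_month = int(monthly_profile.idxmax())
low_month = int(monthly_profile.idxmin())
print("Typical high month:", calendar.month_name[peak_month])
print("Typical low month:", calendar.month_name[low_month])
```

With the real decomposition, replacing the synthetic series with `result.seasonal` should give the month-by-month profile directly.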

## 5. Trend Component

The trend component represents the long-term progression of the time series. Let's extract and visualize the trend component.

In [None]:
# Extract and visualize the trend component
plt.figure(figsize=(10, 6))
plt.plot(result.trend, label='Trend Component', color='green')
plt.title('Trend Component of AirPassengers Data')
plt.xlabel('Time')
plt.ylabel('Trend Value')
plt.legend()
plt.show()

## 6. Residual Component

The residual component represents the remaining variation in the data after removing the seasonal and trend components. Let's extract and visualize the residual component.

In [None]:
# Extract and visualize the residual component
plt.figure(figsize=(10, 6))
plt.scatter(result.resid.index, result.resid, label='Residual Component', color='red')
plt.title('Residual Component of AirPassengers Data')
plt.xlabel('Time')
plt.ylabel('Residual Value')
plt.legend()
plt.show()

## 7. Visualize the Decomposition

The STL result provides a built-in method that plots the observed series together with the trend, seasonal, and residual components. Let's use this method to get an overview of the decomposition.

In [None]:
# Visualize all components together
result.plot()
plt.show()

## 8. Analyze the Components

### Seasonal Component
The seasonal component reveals the repeating pattern in the data. For the AirPassengers dataset, we observe a clear yearly seasonality, with peaks in the summer months and troughs in late autumn and winter.

### Trend Component
The trend component shows a steady increase in the number of passengers over time, indicating a growing trend in international airline travel during the observed period.
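
To put a number on how steady the increase is, one option is to look at the year-over-year change in the trend. This is a minimal sketch on a synthetic, perfectly linear trend (a stand-in for `result.trend`, with a made-up slope of 2 units per month):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for result.trend: rises by 2 units every month.
index = pd.date_range("2000-01-01", periods=48, freq="MS")
trend = pd.Series(100 + 2.0 * np.arange(48), index=index)

# Change relative to the same month one year earlier.
annual_change = trend.diff(12).dropna()

# Same comparison expressed as a percentage of the earlier level.
annual_pct_change = (trend.pct_change(12).dropna() * 100).round(2)

print(annual_change.describe()[["min", "max"]])
print(annual_pct_change.head())
```

A roughly constant `annual_change` suggests linear growth, while a roughly constant percentage change suggests growth proportional to the level of the series.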

### Residual Component
The residual component captures the random noise or unexplained variation in the data. Analyzing the residuals can help identify anomalies or irregularities in the time series.
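
One simple way to screen residuals for anomalies is to flag points that lie far from the median in units of a robust spread estimate. The sketch below uses synthetic residuals with one injected outlier; the median absolute deviation (MAD) rule and the cut-off of 3.5 are assumptions chosen for illustration, not part of STL itself:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for result.resid, with one injected anomaly.
rng = np.random.default_rng(42)
index = pd.date_range("2000-01-01", periods=120, freq="MS")
resid = pd.Series(rng.normal(0, 1, 120), index=index)
resid.iloc[60] = 12.0

# Robust scale: the MAD, rescaled so that it is comparable to a
# standard deviation when the data are normally distributed.
median = resid.median()
mad = (resid - median).abs().median()
robust_sigma = 1.4826 * mad

# Flag residuals more than `threshold` robust sigmas from the median.
threshold = 3.5  # assumed cut-off; tune for the application
anomalies = resid[(resid - median).abs() > threshold * robust_sigma]
print(anomalies)
```

The median and MAD are used instead of the mean and standard deviation so that the outliers being searched for do not inflate the scale estimate.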

### Implications
STL decomposition is a valuable tool for time series analysis, as it allows us to isolate and study individual components. This can be particularly useful for forecasting, anomaly detection, and understanding the underlying dynamics of the data.