# Data mining

# Lesson 5

# Forecasting Trends and Seasonality in Time Series Data

### **Objectives:**
- Understand the components of time series data: trend, seasonality, and noise.
- Perform decomposition to analyze these components.
- Apply moving averages for smoothing and exponential smoothing for forecasting.
- Use models like ARIMA or Prophet to predict future values.

### **Description**

Time series analysis is a powerful method for understanding data that evolves over time. In this lab, we will focus on identifying trends, seasonality, and noise in time series data. Using methods like moving averages, decomposition, and forecasting models, students will learn to make predictions about future values.

We will simulate a synthetic time series with a known trend, seasonality, and random noise. Because the true components are known, students can check how well each technique recovers or forecasts them.


### Libraries that we use:

- [Pandas](https://pandas.pydata.org/) - a library for working with tabular data, which will help us in the data preparation phase.
- [Matplotlib](https://matplotlib.org/) and [Seaborn](https://seaborn.pydata.org/) - for data visualization and identifying interesting patterns.
- [NumPy](https://numpy.org/) - support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on them.
- [statsmodels](https://www.statsmodels.org/stable/index.html) - for decomposition, exponential smoothing, and ARIMA.
- [scikit-learn](https://scikit-learn.org/) - for the forecast error metrics (MAE, RMSE).
- [Prophet](https://github.com/facebook/prophet) (published as `fbprophet` before version 1.0, now `prophet`) - for advanced time series forecasting; not used in the exercises below.



#### Structure of the synthetic time series

The generated series combines:

1. A linear trend.
2. Seasonality with a sinusoidal pattern (30-day period).
3. Random noise to simulate variability.

Value = trend + seasonality + noise

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set a random seed for reproducibility
np.random.seed(42)

# Generate a time range
n_periods = 1000  # Number of time steps
time = pd.date_range(start='2020-01-01', periods=n_periods, freq='D')  # Daily frequency
t = np.arange(n_periods)  # Step index 0, 1, ..., n_periods - 1

# Create components of time series
trend = np.linspace(10, 50, n_periods)  # Linear trend from 10 to 50
# 30-day seasonality. Using the step index instead of time.dayofyear keeps the
# cycle continuous: dayofyear resets every January, which breaks the 30-day phase.
seasonality = 10 * np.sin(2 * np.pi * t / 30)
noise = np.random.normal(0, 2, n_periods)  # Gaussian noise with std 2

# Combine components to form the time series
data = trend + seasonality + noise

# Create a DataFrame indexed by date
time_series = pd.DataFrame({'Date': time, 'Value': data})
time_series.set_index('Date', inplace=True)
time_series = time_series.asfreq('D')  # Record the daily frequency for statsmodels

# Plot the time series
plt.figure(figsize=(12, 6))
plt.plot(time_series['Value'], label='Synthetic Time Series')
plt.title('Synthetic Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
```
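A quick way to confirm the seasonal period before decomposing is to check the autocorrelation of the detrended series. The sketch below is an illustration, not part of the original lab. It regenerates the same recipe of data and removes the linear trend with a straight-line least-squares fit (`np.polyfit`; any detrending method would do). It then uses pandas `Series.autocorr` at a few lags. For a 30-step sinusoid, the correlation should be strongly positive at lags 30 and 60 and strongly negative at lag 15, which is half a cycle.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
n_periods = 1000
t = np.arange(n_periods)

# Same recipe as above: linear trend + 30-step sinusoid + Gaussian noise
values = (np.linspace(10, 50, n_periods)
          + 10 * np.sin(2 * np.pi * t / 30)
          + np.random.normal(0, 2, n_periods))

# Remove the linear trend with a least-squares straight-line fit
slope, intercept = np.polyfit(t, values, deg=1)
detrended = pd.Series(values - (slope * t + intercept))

# Autocorrelation (Pearson correlation with a lagged copy) at selected lags
lags = [1, 15, 30, 60]
acf = {lag: detrended.autocorr(lag=lag) for lag in lags}
for lag, r in acf.items():
    print(f"lag {lag:>2}: autocorrelation = {r:.2f}")
```

The noise variance (4) is small next to the seasonal variance (50), so the lag-30 and lag-60 values come out close to 1 and the lag-15 value close to -1.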

## **Exercise 1:** Decompose Time Series into Components
- Use additive decomposition to separate the time series into trend, seasonality, and residual components.
- Visualize the decomposition results.

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# Perform decomposition (additive model) on the 'Value' column only,
# so the call still works after extra columns are added in Exercise 2
decomposition = seasonal_decompose(time_series['Value'], model='additive', period=30)

# Plot decomposition results
plt.figure(figsize=(12, 8))

plt.subplot(411)
plt.plot(time_series['Value'], label='Original')
plt.legend(loc='upper left')

plt.subplot(412)
plt.plot(decomposition.trend, label='Trend')
plt.legend(loc='upper left')

plt.subplot(413)
plt.plot(decomposition.seasonal, label='Seasonality')
plt.legend(loc='upper left')

plt.subplot(414)
plt.plot(decomposition.resid, label='Residuals')
plt.legend(loc='upper left')

plt.tight_layout()
plt.show()
```


## **Exercise 2:** Apply Moving Average Smoothing

- Use moving averages to smooth the time series data.
- Compare different window sizes for smoothing.

```python
# Apply moving averages with different window sizes
time_series['MA_7'] = time_series['Value'].rolling(window=7).mean()
time_series['MA_30'] = time_series['Value'].rolling(window=30).mean()

# Plot original time series and smoothed versions
plt.figure(figsize=(12, 6))
plt.plot(time_series['Value'], label='Original', alpha=0.5)
plt.plot(time_series['MA_7'], label='7-Day Moving Average', color='red')
plt.plot(time_series['MA_30'], label='30-Day Moving Average', color='green')
plt.title('Moving Average Smoothing')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
```
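`rolling(window=k).mean()` is a trailing average. The value at day t is the mean of days t-k+1 through t, so the first k-1 entries are NaN and the smoothed curve lags the data by about (k-1)/2 days. Passing `center=True` centers the window instead. This removes the lag but uses future observations, so it suits smoothing rather than forecasting. A tiny example on a straight line makes both behaviours visible:

```python
import pandas as pd

s = pd.Series(range(1, 11), dtype=float)  # 1.0, 2.0, ..., 10.0

trailing = s.rolling(window=3).mean()               # mean of s[t-2], s[t-1], s[t]
centered = s.rolling(window=3, center=True).mean()  # mean of s[t-1], s[t], s[t+1]

print(pd.DataFrame({'value': s, 'trailing_3': trailing, 'centered_3': centered}))
```

On a straight line, the trailing average equals the value one step earlier, a lag of (3-1)/2 = 1. The centered average reproduces the line exactly except at the two ends, where it has no full window.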


## **Exercise 3:** Forecasting Using Exponential Smoothing
- Apply Exponential Smoothing to forecast the next 30 days.
- Plot the forecast results.

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Apply Holt-Winters Exponential Smoothing
model = ExponentialSmoothing(time_series['Value'], trend='add', seasonal='add', seasonal_periods=30)
fit_model = model.fit()

# Forecast next 30 days
forecast = fit_model.forecast(steps=30)

# Plot the forecast
plt.figure(figsize=(12, 6))
plt.plot(time_series['Value'], label='Original', alpha=0.5)
plt.plot(forecast, label='Forecast', color='red')
plt.title('Forecasting with Exponential Smoothing')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
```
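Additive Holt-Winters smoothing keeps three running estimates: a level, a slope (trend), and one seasonal offset per position in the cycle. Each estimate is updated with its own smoothing weight (alpha, beta, gamma). In statsmodels, `fit()` chooses these weights by optimization. The hypothetical helper `holt_winters_additive` below instead takes the weights as arguments and starts from a simple heuristic initialization based on the first two seasons (an assumption). Its purpose is only to show the update equations and how the h-step forecast is assembled.

```python
import numpy as np

def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Additive Holt-Winters sketch (hypothetical helper, fixed smoothing weights).

    y: observations, m: season length, h: number of steps to forecast.
    Needs at least two full seasons of data for the heuristic initialization.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Heuristic initialization from the first two seasons (an assumption)
    level0 = y[:m].mean()
    slope = (y[m:2 * m].mean() - y[:m].mean()) / m
    # Seasonal offsets: first-season values minus the local trend line
    season = np.empty(n)
    season[:m] = y[:m] - (level0 + slope * (np.arange(m) - (m - 1) / 2))
    level = level0 + slope * (m - 1) / 2  # level at the end of the first season

    for t in range(m, n):
        prev_level = level
        # Level: blend the deseasonalized observation with the one-step projection
        level = alpha * (y[t] - season[t - m]) + (1 - alpha) * (prev_level + slope)
        # Slope: blend the latest change in level with the previous slope
        slope = beta * (level - prev_level) + (1 - beta) * slope
        # Seasonal offset for this position in the cycle
        season[t] = gamma * (y[t] - level) + (1 - gamma) * season[t - m]

    # h-step forecast: level + k * slope + latest offset for that cycle position
    steps = np.arange(1, h + 1)
    return level + steps * slope + season[n - m + (steps - 1) % m]


# Usage on a noise-free trend + 12-step cycle: the forecast should match the truth
t = np.arange(120)
y = 5 + 0.1 * t + 3 * np.sin(2 * np.pi * t / 12)
forecast = holt_winters_additive(y, m=12, alpha=0.3, beta=0.1, gamma=0.2, h=24)
future_t = np.arange(120, 144)
truth = 5 + 0.1 * future_t + 3 * np.sin(2 * np.pi * future_t / 12)
print(np.max(np.abs(forecast - truth)))  # only floating-point rounding remains
```

On noise-free data, each update leaves the level, slope, and seasonal offsets unchanged. That makes the forecast exact and is a convenient sanity check. On noisy data such as this lab's series, the weights control how quickly each estimate reacts to new observations.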


## **Exercise 4:** Forecasting Using ARIMA
- Apply the ARIMA model to forecast future values.
- Compare the performance of ARIMA with Exponential Smoothing.

```python
from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA model
arima_model = ARIMA(time_series['Value'], order=(2, 1, 2))  # ARIMA(p, d, q)
arima_fit = arima_model.fit()

# Forecast next 30 days
arima_forecast = arima_fit.forecast(steps=30)

# Plot the forecast
plt.figure(figsize=(12, 6))
plt.plot(time_series['Value'], label='Original', alpha=0.5)
plt.plot(arima_forecast, label='ARIMA Forecast', color='orange')
plt.title('Forecasting with ARIMA')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
```
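The middle number in `order=(2, 1, 2)` is the differencing order d. With d = 1, the model describes the changes y[t] - y[t-1] rather than the levels y[t]. The illustrative sketch below regenerates this lab's noise-free components and shows what differencing does to them:

- A first difference turns the linear trend into a constant.
- A lag-30 seasonal difference cancels the 30-day sinusoid.

This is one reason a non-seasonal ARIMA(2, 1, 2) may struggle to reproduce the 30-day cycle. The statsmodels `ARIMA` class also accepts a `seasonal_order=(P, D, Q, s)` argument for seasonal terms, which this lab does not explore.

```python
import numpy as np

n = 1000
t = np.arange(n)
trend = np.linspace(10, 50, n)                 # linear trend, as in the lab
seasonality = 10 * np.sin(2 * np.pi * t / 30)  # 30-step sinusoid

# First difference (d = 1): y[t] - y[t-1]
trend_diff = np.diff(trend)
# Seasonal difference at lag 30: y[t] - y[t-30]
seasonal_diff = seasonality[30:] - seasonality[:-30]

print("first difference of trend, min/max:", trend_diff.min(), trend_diff.max())
print("expected constant slope:", (50 - 10) / (n - 1))
print("largest |lag-30 difference| of seasonality:", np.abs(seasonal_diff).max())
```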


## **Exercise 5:** Analyze and Compare Forecasting Results
- Compare the forecasts obtained from:
   1. Exponential Smoothing.
   2. ARIMA.
- Evaluate forecast accuracy using:
   1. Mean Absolute Error (MAE).
   2. Root Mean Squared Error (RMSE).
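For a test period of n days with actual values y_t and forecasts ŷ_t, the two error measures are defined as follows:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \lvert y_t - \hat{y}_t \rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}
$$

Both measures are in the units of the series. RMSE squares the errors before averaging, so it penalizes a few large misses more heavily than MAE does, and RMSE is never smaller than MAE.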

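```python
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

# The forecasts from Exercises 3 and 4 cover the 30 days after the data ends,
# so there are no actual values to score them against. Instead, hold out the
# last 30 days as a test set and refit both models on the remaining data.
horizon = 30
train = time_series['Value'].iloc[:-horizon]
test = time_series['Value'].iloc[-horizon:]

# Refit each model on the training period and forecast the test period
exp_test_forecast = ExponentialSmoothing(
    train, trend='add', seasonal='add', seasonal_periods=30
).fit().forecast(horizon)
arima_test_forecast = ARIMA(train, order=(2, 1, 2)).fit().forecast(horizon)

# Calculate MAE and RMSE for Exponential Smoothing
mae_exp = mean_absolute_error(test, exp_test_forecast)
rmse_exp = np.sqrt(mean_squared_error(test, exp_test_forecast))

# Calculate MAE and RMSE for ARIMA
mae_arima = mean_absolute_error(test, arima_test_forecast)
rmse_arima = np.sqrt(mean_squared_error(test, arima_test_forecast))

# Print results
print(f"Exponential Smoothing - MAE: {mae_exp:.2f}, RMSE: {rmse_exp:.2f}")
print(f"ARIMA - MAE: {mae_arima:.2f}, RMSE: {rmse_arima:.2f}")
```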
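Both metrics are simple enough to compute directly with NumPy, which gives a useful cross-check on scikit-learn's `mean_absolute_error` and `mean_squared_error`. The numbers below are a toy example, not output from this lab.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5, 0.0, 2.0, 8.0])

errors = actual - predicted
mae = np.mean(np.abs(errors))         # (0.5 + 0.5 + 0 + 1) / 4 = 0.5
rmse = np.sqrt(np.mean(errors ** 2))  # sqrt((0.25 + 0.25 + 0 + 1) / 4) = sqrt(0.375)

print(f"MAE  by hand: {mae:.4f}, scikit-learn: {mean_absolute_error(actual, predicted):.4f}")
print(f"RMSE by hand: {rmse:.4f}, scikit-learn: {np.sqrt(mean_squared_error(actual, predicted)):.4f}")
```

The single error of 1.0 contributes half of the total absolute error but two thirds of the total squared error. This is why the RMSE (about 0.61) sits above the MAE (0.5).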


## Conclusion

We learned how to:

- Identify the components of time series data: trend, seasonality, and noise.
- Perform decomposition to analyze these components.
- Apply moving averages for smoothing and exponential smoothing for forecasting.
- Use ARIMA to predict future values (Prophet is an alternative that the exercises do not cover).

This lab introduces students to time series decomposition, smoothing, and forecasting, providing them with practical skills to analyze trends and seasonality in time-dependent data.


