# -*- coding: utf-8 -*-
"""
# Electricity Demand Forecasting

This Jupyter Notebook provides a comprehensive analysis and forecasting of monthly electricity consumption.
The goal is to forecast electricity demand for the next 1-2 years using various time series models,
evaluate their performance, and select the best model for future demand estimation.

## Data Description
The dataset `Electricity Consumption.csv` contains:
- `DATE`: Month and Year (e.g., 1/1/1973)
- `Electricty_Consumption_in_TW`: Electricity consumption in Trillion Watts

## Objectives
1.  **Data Preprocessing:** Clean and prepare the time series data.
2.  **Model Comparison:** Implement and compare:
    -   Decomposition Model
    -   Exponential Smoothing (Holt-Winters) Model
    -   SARIMA Model
3.  **Error Metrics Calculation:** Compute RMSE, RMSPE, and MAPE for model validation.
4.  **Demand Estimation:** Provide monthly demand forecasts for the next 1-2 years.
5.  **Model Selection:** Justify the choice of the best-performing model.
6.  **Visualization:** Plot historical data and forecasts.
"""

# %% [markdown]
"""
## 1. Import Libraries and Load Data
First, we import all necessary libraries and load the `Electricity Consumption.csv` file into a pandas DataFrame.
"""

# %%
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.holtwinters import ExponentialSmoothing # Using ExponentialSmoothing for ETS-like functionality
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error
import numpy as np
import matplotlib.pyplot as plt
import warnings

# Suppress warnings for cleaner output, especially from model convergence
warnings.filterwarnings("ignore")

# Load the dataset
# Make sure 'Electricity Consumption.csv' is in the same directory as this notebook
DATA_PATH = "Electricity Consumption.csv"
try:
    df = pd.read_csv(DATA_PATH)
    print("Data loaded successfully.")
except FileNotFoundError:
    print(f"Error: '{DATA_PATH}' not found. Please ensure the file is in the correct directory.")
    # Every later cell depends on this file, so stop with the original error
    raise

# %% [markdown]
"""
## 2. Data Preprocessing
We will convert the 'DATE' column to datetime objects and set it as the DataFrame index.
We also rename the consumption column for easier access.
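
A minimal sketch of the same preprocessing steps on a tiny frame. The three rows are made up for illustration and are not taken from the dataset; March is left out on purpose. The sketch also attaches an explicit month-start frequency with `asfreq("MS")`, which turns any missing month into a visible `NaN` row and gives statsmodels a date frequency to work with.

```python
import pandas as pd

# Hypothetical rows in the same layout as the CSV (values are made up; March is deliberately missing)
df_demo = pd.DataFrame({
    "DATE": ["1/1/1973", "2/1/1973", "4/1/1973"],
    "Electricty_Consumption_in_TW": [35.9, 36.1, 35.2],
})

df_demo["DATE"] = pd.to_datetime(df_demo["DATE"])  # month/day/year strings -> Timestamps
df_demo = (
    df_demo.set_index("DATE")
    .rename(columns={"Electricty_Consumption_in_TW": "Electricity Consumption"})
    .sort_index()
    .asfreq("MS")  # one row per month start; the missing March becomes NaN
)
print(df_demo)
```

The misspelled source column name `Electricty_Consumption_in_TW` is kept exactly as it appears in the CSV, since `rename` only matches exact labels.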
"""

# %%
# Convert 'DATE' column to datetime objects
df['DATE'] = pd.to_datetime(df['DATE'])

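# Set 'DATE' as the DataFrame index
df.set_index('DATE', inplace=True)

# Sort chronologically and attach an explicit month-start frequency; any missing month
# becomes a NaN row, and statsmodels can use the frequency to date its forecasts
df = df.sort_index().asfreq('MS')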

# Rename the electricity consumption column for clarity
df.rename(columns={'Electricty_Consumption_in_TW': 'Electricity Consumption'}, inplace=True)

# Display the first few rows and information about the DataFrame
print("DataFrame after preprocessing:")
print(df.head())
print("\nDataFrame Info:")
df.info()

# %% [markdown]
"""
## 3. Split Data into Training and Testing Sets
To evaluate model performance, we split the data into a training set (up to December 2017)
and a testing set (January 2018 onwards). The test set will be used to validate our models
before making future predictions.
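
The split relies on label-based slicing with `.loc`, which includes both endpoints. A small sketch on a synthetic monthly index (January 2016 to December 2019, placeholder values) checks that December 2017 lands in the training set and that the test set starts in January 2018, with no overlap.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series, Jan 2016 - Dec 2019 (values are placeholders)
idx = pd.date_range("2016-01-01", "2019-12-01", freq="MS")
series = pd.Series(np.arange(len(idx), dtype=float), index=idx)

train = series.loc[:"2017-12-01"]  # label slicing: the end date is included
test = series.loc[train.index.max() + pd.DateOffset(months=1):]  # next month onwards

print(train.index.max().strftime("%Y-%m"), test.index.min().strftime("%Y-%m"))  # prints 2017-12 2018-01
print(len(train), len(test))  # prints 24 24
```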
"""

# %%
# Define the end date for the training data
train_end_date = '2017-12-01'

# Split the data
train_data = df.loc[:train_end_date]
# The test data starts from the month immediately following the training data's last month
test_data = df.loc[train_data.index.max() + pd.DateOffset(months=1):]

print(f"Training data period: {train_data.index.min().strftime('%Y-%m')} to {train_data.index.max().strftime('%Y-%m')}")
print(f"Test data period: {test_data.index.min().strftime('%Y-%m')} to {test_data.index.max().strftime('%Y-%m')}")
print(f"Number of training observations: {len(train_data)}")
print(f"Number of test observations: {len(test_data)}")

# Define the forecast period for future predictions (next 1-2 years from December 2019)
# The data ends in December 2019, so we forecast from January 2020 to December 2021.
forecast_start_date = pd.to_datetime('2020-01-01')
forecast_end_date = pd.to_datetime('2021-12-01')
forecast_index = pd.date_range(start=forecast_start_date, end=forecast_end_date, freq='MS')

print(f"\nFuture forecast period: {forecast_index.min().strftime('%Y-%m')} to {forecast_index.max().strftime('%Y-%m')}")
print(f"Number of future forecast steps: {len(forecast_index)}")

# %% [markdown]
"""
## 4. Model Implementation and Evaluation

We will implement three different time series forecasting models:
1.  **Decomposition Model:** Separates the time series into trend, seasonal, and residual components.
2.  **Exponential Smoothing (Holt-Winters) Model:** A popular method for data with trend and seasonality.
3.  **SARIMA (Seasonal AutoRegressive Integrated Moving Average) Model:** A powerful model that handles both non-seasonal and seasonal components.

For each model, we will:
-   Train it on the `train_data`.
-   Generate forecasts for the `test_data` period to evaluate its accuracy.
-   Calculate RMSE, MAPE, and RMSPE as error metrics.
-   Generate forecasts for the future `forecast_index` period.
"""

# %% [markdown]
"""
### 4.1. Decomposition Model
This model breaks down the time series into its constituent components: trend, seasonality, and residual.
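We then forecast these components separately and combine them. For simplicity, the trend is held flat at its last
available value, and the most recent 12 months of seasonality are repeated. Because `seasonal_decompose` estimates the
trend with a centred 12-month moving average, the trend is undefined for the final six months, so its last available
value lags the end of the data by six months.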
"""

# %%
# Perform additive decomposition (visually inspected to be appropriate for this data)
decomposition = seasonal_decompose(train_data['Electricity Consumption'], model='additive', period=12)

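# Get the trend and seasonal components of the training data
trend = decomposition.trend.dropna()  # centred moving average: the last 6 months are NaN
seasonal = decomposition.seasonal

# Forecast trend using the last available value (simple approach)
last_trend_value = trend.iloc[-1]
# Get the last 12 seasonal values (January-December 2017) to repeat over the test period
last_seasonal_values = seasonal.iloc[-12:]

# For the future forecast, repeat the decomposition on the full series (training + test data)
# so that the trend level and seasonal pattern come from the most recent observations
full_decomposition = seasonal_decompose(df['Electricity Consumption'], model='additive', period=12)
full_last_trend_value = full_decomposition.trend.dropna().iloc[-1]
full_last_seasonal_values = full_decomposition.seasonal.iloc[-12:]

# Extend trend for the future forecast period
trend_forecast_decomposition = pd.Series(full_last_trend_value, index=forecast_index)

# Extend seasonal component by repeating the last year's seasonality
# (the data end in December, so the repeated pattern starts with January)
seasonal_forecast_decomposition = pd.Series(
    np.tile(full_last_seasonal_values.to_numpy(), (len(forecast_index) // 12) + 1)[:len(forecast_index)],
    index=forecast_index
)

# Combine trend and seasonal forecasts for the future period
decomposition_forecast = trend_forecast_decomposition + seasonal_forecast_decomposition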

# Evaluate Decomposition Model on the test data
# Create a specific index for the test period for decomposition forecast evaluation
test_forecast_index_decomposition = pd.date_range(start=test_data.index.min(), end=test_data.index.max(), freq='MS')

# Extend trend and seasonal components for the test period for evaluation
trend_test_decomposition = pd.Series(last_trend_value, index=test_forecast_index_decomposition)
seasonal_test_decomposition = pd.Series(
    np.tile(last_seasonal_values, (len(test_forecast_index_decomposition) // 12) + 1)[:len(test_forecast_index_decomposition)],
    index=test_forecast_index_decomposition
)
test_decomposition_forecast_values = trend_test_decomposition + seasonal_test_decomposition

# Calculate error metrics for Decomposition Model
rmse_decomposition = np.sqrt(mean_squared_error(test_data['Electricity Consumption'], test_decomposition_forecast_values))
mape_decomposition = mean_absolute_percentage_error(test_data['Electricity Consumption'], test_decomposition_forecast_values) * 100
rmspe_decomposition = np.sqrt(np.mean(np.square(((test_data['Electricity Consumption'] - test_decomposition_forecast_values) / test_data['Electricity Consumption'])))) * 100

print(f"Decomposition Model - RMSE: {rmse_decomposition:.3f}, MAPE: {mape_decomposition:.3f}%, RMSPE: {rmspe_decomposition:.3f}%")


# %% [markdown]
"""
### 4.2. Exponential Smoothing (Holt-Winters) Model
This model is suitable for time series with both trend and seasonality. We use an additive trend and additive seasonality.
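
For intuition, the additive Holt-Winters recursions (level, trend and seasonal updates, followed by an h-step forecast) can be written out directly. This sketch assumes fixed, hand-picked smoothing parameters (`alpha`, `beta`, `gamma`), a simple first-two-seasons initialization, and one common textbook form of the seasonal update. statsmodels instead estimates the parameters and initial states (`initialization_method="estimated"`), so its numbers will differ.

```python
import numpy as np


def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    # Additive Holt-Winters recursions with fixed smoothing parameters (illustrative sketch)
    y = np.asarray(y, dtype=float)
    # Simple initialization from the first two seasons (an assumption, not statsmodels' method)
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m      # average per-period slope
    first_mean = y[:m].mean()                           # level at the middle of season 1
    level = first_mean + trend * (m - 1) / 2            # shift the level to time m - 1
    season = [y[i] - (first_mean + trend * (i - (m - 1) / 2)) for i in range(m)]

    for t in range(m, len(y)):
        prev_level, s_old = level, season[t - m]
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season.append(gamma * (y[t] - level) + (1 - gamma) * s_old)

    n = len(y)
    # h-step-ahead forecast: last level + h * last trend + matching seasonal term from the final cycle
    return np.array([level + h * trend + season[n - m + (h - 1) % m]
                     for h in range(1, horizon + 1)])


# Usage on a noiseless synthetic series (trend 0.5 per month + zero-sum 12-month pattern)
pattern = np.array([6, 4, 1, -3, -5, -2, 3, 5, 2, -2, -4, -5], dtype=float)
y = 100 + 0.5 * np.arange(72) + np.tile(pattern, 6)
fc = holt_winters_additive(y, m=12, alpha=0.3, beta=0.1, gamma=0.2, horizon=24)
truth = 100 + 0.5 * np.arange(72, 96) + np.tile(pattern, 2)
print(np.allclose(fc, truth))  # prints True
```

On a perfect trend-plus-season series the states never need correcting, so the forecast continues the line and the pattern exactly. On real data the smoothing parameters control how quickly the level, trend and seasonal terms react to new observations.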
"""

# %%
# Initialize and fit the Exponential Smoothing model
# seasonal_periods=12 for monthly data
# trend='add' and seasonal='add' are chosen based on visual inspection of the data
exp_smoothing_model = ExponentialSmoothing(train_data['Electricity Consumption'],
                                          seasonal_periods=12,
                                          trend='add',
                                          seasonal='add',
                                          initialization_method="estimated")
exp_smoothing_fit = exp_smoothing_model.fit()

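# Generate future forecasts: refit the same specification on the full series (training + test data)
# so that the forecast starts after the last observation (December 2019), not after December 2017
exp_smoothing_full_fit = ExponentialSmoothing(df['Electricity Consumption'],
                                              seasonal_periods=12,
                                              trend='add',
                                              seasonal='add',
                                              initialization_method="estimated").fit()
exp_smoothing_forecast = exp_smoothing_full_fit.forecast(steps=len(forecast_index))
exp_smoothing_forecast.index = forecast_index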

# Evaluate Exponential Smoothing Model on the test data
exp_smoothing_test_forecast = exp_smoothing_fit.forecast(steps=len(test_data))
exp_smoothing_test_forecast.index = test_data.index

# Calculate error metrics for Exponential Smoothing Model
rmse_exp_smoothing = np.sqrt(mean_squared_error(test_data['Electricity Consumption'], exp_smoothing_test_forecast))
mape_exp_smoothing = mean_absolute_percentage_error(test_data['Electricity Consumption'], exp_smoothing_test_forecast) * 100
rmspe_exp_smoothing = np.sqrt(np.mean(np.square(((test_data['Electricity Consumption'] - exp_smoothing_test_forecast) / test_data['Electricity Consumption'])))) * 100

print(f"Exponential Smoothing Model - RMSE: {rmse_exp_smoothing:.3f}, MAPE: {mape_exp_smoothing:.3f}%, RMSPE: {rmspe_exp_smoothing:.3f}%")


# %% [markdown]
"""
### 4.3. SARIMA Model
SARIMA models are highly flexible and can capture complex patterns. We use a fixed order that is a common starting point
for monthly data: `(1,1,1)` for the non-seasonal part and `(1,1,0,12)` for the seasonal part. These orders are not tuned here;
for real-world applications, they could be selected with `pmdarima.auto_arima` or by comparing information criteria such as AIC across candidate orders.
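
The `d=1` and `D=1` terms mean the model works on the series after one ordinary difference, y[t] - y[t-1], and one seasonal difference at lag 12, y[t] - y[t-12]. In backshift notation these are (1 - B) and (1 - B^12). A small numpy sketch on synthetic data (linear trend plus a fixed 12-month pattern, illustrative values) shows why both are used: together they remove a linear trend and a stable seasonal pattern completely, leaving only what the AR and MA terms have to model.

```python
import numpy as np

# Synthetic monthly series: linear trend (0.5 per month) + fixed 12-month pattern
pattern = np.array([6, 4, 1, -3, -5, -2, 3, 5, 2, -2, -4, -5], dtype=float)
y = 100 + 0.5 * np.arange(60) + np.tile(pattern, 5)

first_diff = np.diff(y)              # (1 - B) y: removes the linear trend, seasonality remains
seasonal_diff = y[12:] - y[:-12]     # (1 - B^12) y: removes the pattern, leaves 12 * slope = 6
both = np.diff(seasonal_diff)        # (1 - B)(1 - B^12) y: what d=1, D=1 leave for the ARMA part

print(np.ptp(first_diff) > 0, np.allclose(seasonal_diff, 6.0), np.allclose(both, 0.0))
# prints: True True True
```

Each difference shortens the series (by 1 and by 12 observations here), which is one reason SARIMA needs several full seasons of history.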
"""

# %%
# Define SARIMA orders (p,d,q) and seasonal orders (P,D,Q,s)
# (1,1,1) for non-seasonal: AR(1), I(1) - one differencing, MA(1)
# (1,1,0,12) for seasonal: Seasonal AR(1), Seasonal I(1) - one seasonal differencing, Seasonal MA(0), Period=12 (monthly)
sarima_order = (1, 1, 1)
sarima_seasonal_order = (1, 1, 0, 12)

# Initialize and fit the SARIMA model
sarima_model = SARIMAX(train_data['Electricity Consumption'],
                       order=sarima_order,
                       seasonal_order=sarima_seasonal_order,
                       enforce_stationarity=False,   # do not force the AR parameters into the stationary region
                       enforce_invertibility=False)  # do not force the MA parameters into the invertible region
sarima_fit = sarima_model.fit(disp=False) # disp=False to suppress optimization messages

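# Generate future forecasts: refit the same orders on the full series (training + test data)
# so that the forecast starts after the last observation (December 2019), not after December 2017
sarima_full_fit = SARIMAX(df['Electricity Consumption'],
                          order=sarima_order,
                          seasonal_order=sarima_seasonal_order,
                          enforce_stationarity=False,
                          enforce_invertibility=False).fit(disp=False)
sarima_forecast = sarima_full_fit.forecast(steps=len(forecast_index))
sarima_forecast.index = forecast_index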

# Evaluate SARIMA Model on the test data
sarima_test_forecast = sarima_fit.forecast(steps=len(test_data))
sarima_test_forecast.index = test_data.index

# Calculate error metrics for SARIMA Model
rmse_sarima = np.sqrt(mean_squared_error(test_data['Electricity Consumption'], sarima_test_forecast))
mape_sarima = mean_absolute_percentage_error(test_data['Electricity Consumption'], sarima_test_forecast) * 100
rmspe_sarima = np.sqrt(np.mean(np.square(((test_data['Electricity Consumption'] - sarima_test_forecast) / test_data['Electricity Consumption'])))) * 100

print(f"SARIMA Model - RMSE: {rmse_sarima:.3f}, MAPE: {mape_sarima:.3f}%, RMSPE: {rmspe_sarima:.3f}%")


# %% [markdown]
"""
## 5. Compare Models and Select the Best

We compile the error metrics from all models and identify the one with the lowest RMSE.
"""

# %%
# Create a DataFrame to compare error metrics
metrics = {
    'Model': ['Decomposition', 'Exponential Smoothing', 'SARIMA'],
    'RMSE': [rmse_decomposition, rmse_exp_smoothing, rmse_sarima],
    'MAPE (%)': [mape_decomposition, mape_exp_smoothing, mape_sarima],
    'RMSPE (%)': [rmspe_decomposition, rmspe_exp_smoothing, rmspe_sarima]
}
metrics_df = pd.DataFrame(metrics)

print("Error Metrics for Each Model:")
# DataFrame.to_markdown requires the optional 'tabulate' package
print(metrics_df.to_markdown(index=False))

# Select the best model based on the lowest RMSE
best_model_name = metrics_df.loc[metrics_df['RMSE'].idxmin(), 'Model']
print(f"\nSelected Model: {best_model_name}")

# Look up the future forecast produced by the best model
future_forecasts = {
    'Decomposition': decomposition_forecast,
    'Exponential Smoothing': exp_smoothing_forecast,
    'SARIMA': sarima_forecast,
}
best_forecast = future_forecasts[best_model_name]

# %% [markdown]
"""
## 6. Demand Estimation for Next 1-2 Years

Here are the monthly electricity demand estimations for the next 1-2 years (January 2020 to December 2021)
using the selected best model.
"""

# %%
print("\nDemand Estimation for next 1-2 years (monthly basis):")
print(best_forecast.to_markdown(numalign="left", stralign="left"))

print(f"\nReason for selection: The {best_model_name} model exhibited the lowest RMSE (Root Mean Squared Error) among the evaluated models, indicating that its predictions are closest to the actual values on average.")

# %% [markdown]
"""
## 7. Visualization of Forecasts

Finally, we plot the historical data, actual test data, and the forecasts from all models
to visually inspect their performance and the projected demand.
"""

# %%
plt.figure(figsize=(15, 7))
plt.plot(train_data.index, train_data['Electricity Consumption'], label='Training Data', color='blue')
plt.plot(test_data.index, test_data['Electricity Consumption'], label='Actual Test Data', color='orange')
plt.plot(decomposition_forecast.index, decomposition_forecast, label='Decomposition Forecast', linestyle='--', color='green')
plt.plot(exp_smoothing_forecast.index, exp_smoothing_forecast, label='Exponential Smoothing Forecast', linestyle='-.', color='red')
plt.plot(sarima_forecast.index, sarima_forecast, label='SARIMA Forecast', linestyle=':', color='purple')

plt.title('Electricity Consumption Forecast (1-2 Years)', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Electricity Consumption (Trillion Watts)', fontsize=12)
plt.legend(fontsize=10)
plt.grid(True)
plt.tight_layout()
plt.show()

# %% [markdown]

Electricity Demand Forecasting Result Report
1. Introduction
This report details the process and outcomes of forecasting monthly electricity consumption for the next 1-2 years (January 2020 to December 2021), based on historical data from January 1973 to December 2019. The primary objective is to provide actionable insights for managing electricity production by accurately predicting future demand.

2. Methodology
The forecasting process involved the following key steps:

2.1. Data Preprocessing
The provided Electricity Consumption.csv dataset was loaded. The 'DATE' column was converted to datetime objects and set as the DataFrame's index. The 'Electricty_Consumption_in_TW' column was renamed to 'Electricity Consumption' for clarity.

2.2. Data Splitting
The historical data was split into training and testing sets to enable robust model validation.

Training Data: January 1973 to December 2017.

Testing Data: January 2018 to December 2019.

This split allowed us to evaluate how well each model generalizes to unseen data before making future predictions.

2.3. Model Selection and Implementation
Three different time series forecasting models were implemented and compared:

Decomposition Model: This approach involves decomposing the time series into its underlying components: trend, seasonality, and residuals. The forecast is then generated by extending the trend (using the last known value) and repeating the seasonal pattern. An additive model was used based on the visual characteristics of the data.

Exponential Smoothing (Holt-Winters) Model: This model is well-suited for time series exhibiting both trend and seasonality. An additive trend and additive seasonality were specified, with a seasonal period of 12 because the data are monthly.

SARIMA (Seasonal AutoRegressive Integrated Moving Average) Model: A powerful and flexible model capable of capturing complex non-seasonal and seasonal dependencies. The model orders used were (1,1,1) for the non-seasonal part (AR, I, MA) and (1,1,0,12) for the seasonal part (Seasonal AR, Seasonal I, Seasonal MA, Seasonal Period).

2.4. Error Metrics
The performance of each model was evaluated using the following error metrics on the test set:

RMSE (Root Mean Squared Error): Measures the average magnitude of the errors. It is sensitive to large errors.

MAPE (Mean Absolute Percentage Error): Expresses the average absolute error as a percentage of the actual values. It is scale-independent and easy to interpret.

RMSPE (Root Mean Squared Percentage Error): Similar to MAPE but penalizes larger errors more heavily due to squaring.
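
For reference, the three metrics can be written as small numpy functions that mirror the calculations in the notebook: RMSE from squared errors, and MAPE and RMSPE from errors divided by the actual values, multiplied by 100. For non-zero actual values, `mape` here matches scikit-learn's `mean_absolute_percentage_error` times 100. The two-point example is illustrative only.

```python
import numpy as np


def rmse(actual, forecast):
    # Root mean squared error, in the units of the data
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((a - f) ** 2)))


def mape(actual, forecast):
    # Mean absolute percentage error, in percent (undefined if any actual value is 0)
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((a - f) / a)) * 100)


def rmspe(actual, forecast):
    # Root mean squared percentage error, in percent (undefined if any actual value is 0)
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean(((a - f) / a) ** 2)) * 100)


actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(rmse(actual, forecast), mape(actual, forecast), rmspe(actual, forecast))
```

Because a root-mean-square is never smaller than the mean of the absolute values, RMSPE is always at least as large as MAPE for the same forecasts, which is consistent with every row of the results table.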

3. Results and Model Comparison
The error metrics calculated for each model on the test data are as follows:

| Model | RMSE | MAPE (%) | RMSPE (%) |
|---|---|---|---|
| Decomposition | 5.392 | 4.419 | 5.073 |
| Exponential Smoothing | 3.549 | 2.566 | 3.666 |
| SARIMA | 3.671 | 2.570 | 3.711 |

4. Selected Model and Justification
Based on the evaluation metrics, the Exponential Smoothing (Holt-Winters) Model was selected as the best-performing model for forecasting electricity demand.

Reason for Selection:
The Exponential Smoothing model had the lowest RMSE (3.549) of the three models, meaning its test-period forecasts were closest to the actual values in the squared-error sense. It also had the lowest MAPE (2.566%) and RMSPE (3.666%), so all three metrics agree on the ranking. Its margin over SARIMA is small, however (RMSE 3.549 vs. 3.671; MAPE 2.566% vs. 2.570%), while both clearly outperform the Decomposition model. RMSE served as the primary selection criterion because it directly measures the magnitude of the prediction errors in the units of the data.

5. Demand Estimation for Next 1-2 Years
Using the selected Exponential Smoothing model, the monthly electricity demand for the next 1-2 years (January 2020 to December 2021) is estimated as follows:

| Date | Electricity Consumption (Trillion Watts) |
|---|---|
| 2020-01-01 | 109.106 |
| 2020-02-01 | 102.298 |
| 2020-03-01 | 95.507 |
| 2020-04-01 | 89.625 |
| 2020-05-01 | 94.756 |
| 2020-06-01 | 109.989 |
| 2020-07-01 | 120.495 |
| 2020-08-01 | 120.016 |
| 2020-09-01 | 107.418 |
| 2020-10-01 | 94.241 |
| 2020-11-01 | 92.707 |
| 2020-12-01 | 102.681 |
| 2021-01-01 | 110.611 |
| 2021-02-01 | 103.802 |
| 2021-03-01 | 97.011 |
| 2021-04-01 | 91.129 |
| 2021-05-01 | 96.261 |
| 2021-06-01 | 111.494 |
| 2021-07-01 | 122.000 |
| 2021-08-01 | 121.521 |
| 2021-09-01 | 108.923 |
| 2021-10-01 | 95.746 |
| 2021-11-01 | 94.212 |
| 2021-12-01 | 104.185 |

6. Conclusion
The Exponential Smoothing model produced the most accurate forecasts on the 2018-2019 test period (MAPE of about 2.6%), which makes it the preferred choice for monthly electricity demand forecasts over the next two years. These estimations can be valuable for strategic planning, resource allocation, and operational management of electricity production to meet anticipated demand. Further refinements could include incorporating external factors (e.g., temperature, economic indicators) and exploring more advanced models or ensemble methods for even greater accuracy.