In [7]:
# Libraries
#nbconvert:hide_input
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

import pandas as pd
from matplotlib import pyplot as plt
from sklearn.preprocessing import StandardScaler
import seaborn as sns

import plotly.express as px
from sklearn.ensemble import IsolationForest

import plotly.graph_objects as go
from plotly.subplots import make_subplots

from sktime.forecasting.model_selection import (
    CutoffSplitter,
    ExpandingWindowSplitter,
    SingleWindowSplitter,
    SlidingWindowSplitter,
    temporal_train_test_split,
)
from sktime.utils.plotting import plot_series
from sktime.forecasting.base import ForecastingHorizon

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_validate, train_test_split
from sklearn.pipeline import Pipeline, make_pipeline

from statsmodels.nonparametric.smoothers_lowess import lowess
from statsmodels.tsa.seasonal import DecomposeResult, seasonal_decompose

from sklearn.metrics import mean_squared_error

from datetime import datetime, timedelta
import requests

import math

## Power Price Prediction - a short-term forecast
#### Partner company: **Slalom Consulting** 
---------------------
#### UBC-MDS(2022-2023) Capstone Project

<div class="container">
    <div class="left-column">
        <img src="report_pic1.png" alt="Alt Text" width="350">
    </div>
    <div class="right-column">
        
- **Team:**
    - Arjun Radhakrishnan
    - Sneha Sunil
    - Gaoxiang Wang
    - Mehdi Naji
- **Mentor:**
    - Quan Nguyen
- **Partner's Agents:**
    - Zaid Haddad
    - Hayley Boyce
    </div>
</div>

<style>
.container {
    display: flex;
}

.left-column {
    flex: 3; 
}

.right-column {
    flex: 5; 
}
</style>


## EXECUTIVE SUMMARY

This project aimed to forecast power pool prices in Alberta 12 hours ahead. After extensive data analysis and a comparison of several statistical and machine learning models, we met this objective. In our evaluation, the LightGBM model performed well in terms of both accuracy and interpretability, making it the preferred choice. We also developed a Tableau dashboard that visualizes real-time pool price data for the previous 12 hours and the forecasted 12 hours, including a confidence interval and the most significant features involved in each forecasting step.

This project served as the capstone for the Master of Data Science program at the University of British Columbia, in collaboration with Slalom Consulting. It showcases our ability to address real-world energy market challenges using advanced data analysis and machine learning techniques. Our model and dashboard provide reliable tools for energy market participants to make informed decisions regarding power pool prices in Alberta. Further improvements can be made by incorporating additional data sources and exploring advanced machine learning techniques.

## INTRODUCTION


### Project Overview
<!--- 
- Has an overview of the project been provided, such as the problem domain, project origin, and related datasets or input data?
- Has enough background information been given so that an uninformed reader would understand the problem domain and following problem statement? 
--->
In recent years, electricity prices in Alberta have shown both a clear trend and considerable volatility. The most notable trend has been an overall increase in prices, driven by a combination of factors. For example, Alberta's transition towards cleaner energy sources, such as wind and solar, has brought additional costs for infrastructure development and integration.

As [Figure 1](#PriceTrendSD-image) shows, both the average and the standard deviation of electricity prices in Alberta have been rising since at least 2016. This upward trend and increasing volatility can be attributed to multiple factors. The growing demand for electricity, driven by population growth and expanding industries, has strained the power infrastructure, resulting in tighter supply-demand dynamics and higher prices. Additionally, government policies favoring cleaner energy sources have necessitated significant investments in infrastructure and new power generation facilities, the costs of which are passed on to consumers. The integration of intermittent renewable energy sources has introduced price volatility, while the maintenance and upgrades of aging transmission and distribution infrastructure have further contributed to rising prices. Furthermore, fluctuations in fuel costs, such as the cost of natural gas, affect the overall cost of electricity generation in Alberta. Collectively, these factors have contributed to the increase in power prices in the province.

<figure style="text-align: center;" id="PriceTrendSD-image">
  <img src="report_PriceTrendSD.png" alt="Mean and standard deviation of the power pool price in Alberta over time">
  <figcaption style="text-align: center; font-size: 14px;">Figure 1. Power pool price developments in Alberta in terms of mean and standard deviation.</figcaption>
</figure>
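
The yearly statistics behind a chart like Figure 1 can be computed in a few lines. The following is a minimal sketch, assuming the hourly pool price is held in a pandas Series with a datetime index; the function name `yearly_price_stats` and the synthetic series are illustrative assumptions, not the project's actual code or data.

```python
import numpy as np
import pandas as pd


def yearly_price_stats(prices: pd.Series) -> pd.DataFrame:
    """Return the mean and sample standard deviation of prices per calendar year."""
    return prices.groupby(prices.index.year).agg(["mean", "std"])


# Synthetic hourly values stand in for the real pool price series (assumption).
rng = np.random.default_rng(0)
index = pd.date_range("2016-01-01", periods=3 * 8760, freq="h")
synthetic_prices = pd.Series(rng.gamma(shape=2.0, scale=30.0, size=len(index)), index=index)

stats = yearly_price_stats(synthetic_prices)
print(stats)
```

Grouping by `prices.index.year` keeps the sketch independent of pandas frequency aliases, which have changed between versions.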

Another essential factor contributing to the rising volatility of power prices in Alberta is the deregulated nature of the electricity market. The market operates under a deregulated framework in which prices are determined by supply and demand in a competitive market. The government does not directly intervene in setting prices. Instead, prices are influenced by various market participants, including power generators, transmission companies, and retailers. This market structure allows for more flexibility but also exposes prices to market forces, and it therefore plays a crucial role in the volatility of the power price.

The prediction of electricity prices in Alberta is of great importance to a wide range of stakeholders. Consumers rely on price forecasts to manage their energy costs and make informed decisions about their electricity usage. Businesses, particularly energy-intensive industries, need accurate price predictions to effectively plan their operations and minimize the impact of price fluctuations. Investors and financial institutions depend on price forecasts to assess investment opportunities and manage financial risks. Government bodies and regulatory authorities require reliable predictions to develop effective energy policies and ensure grid stability. Market participants, including power generators, traders, and retailers, rely on price forecasts to optimize their operations and mitigate risks in the dynamic electricity market. With the increasing uncertainty caused by rising price volatility, accurate electricity price predictions play a crucial role in enabling stakeholders to navigate the market, make informed choices, and contribute to a sustainable energy landscape in Alberta.

<figure style="text-align: center;" id="report_aeso_generators_image">
  <img src="report_aeso_generators.png" alt="Share of power generated by fuel source in 2022">
  <figcaption style="text-align: center; font-size: 14px;">Figure 2. Share of power generated by fuel source in 2022.</figcaption>
</figure>

The main purpose of this project is to extend the horizon and improve the accuracy of the existing 6-hour-ahead price prediction offered by the Alberta Electric System Operator (AESO). AESO is the independent system operator responsible for operating the electrical grid and facilitating the competitive electricity market in Alberta. This report begins with a problem statement, which defines the objective and identifies the metrics used to evaluate prediction performance. The analysis section covers data exploration, exploratory data analysis, an examination of different algorithms and techniques, and a benchmark for comparison. The methodology section outlines the steps taken, including data cleaning, wrangling, and preprocessing, as well as the implementation and refinement of the predictive models. The results section focuses on model evaluation and validation, assessing the accuracy and performance of the developed models. Finally, the report addresses deployment and deliverables, discussing the practical implementation and the final outcomes of the project.






### Problem Statement
<!---
- Is the problem statement clearly defined? Will the reader understand what you are expecting to solve?
- Have you thoroughly discussed how you will attempt to solve the problem?
- Is an anticipated solution clearly defined? Will the reader understand what results you are looking for?
--->

This project aims to predict the power pool price in Alberta over a 12-hour horizon. Currently, the Alberta Electric System Operator (AESO) publishes a 6-hour-ahead prediction on an hourly basis. The goal is to extend the horizon of this prediction and, if possible, improve its accuracy.
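
One common way to frame a 12-hour-ahead forecast as a supervised learning problem is to build one target column per forecast step. The sketch below illustrates that framing under stated assumptions: the function name `make_horizon_targets` and the column labels `t+1` ... `t+12` are hypothetical, and this is not necessarily how the project constructed its targets.

```python
import pandas as pd


def make_horizon_targets(prices: pd.Series, horizon: int = 12) -> pd.DataFrame:
    """Build one target column per step ahead: column 't+h' holds the price h hours later."""
    targets = pd.concat(
        {f"t+{h}": prices.shift(-h) for h in range(1, horizon + 1)},
        axis=1,
    )
    # The last `horizon` rows have no complete set of future values, so drop them.
    return targets.dropna()


# Toy series: the value at each position equals its position, so shifts are easy to verify.
toy_prices = pd.Series(range(20), dtype=float)
targets = make_horizon_targets(toy_prices, horizon=12)
print(targets.head(3))
```

Each row then pairs the information available at hour t with the twelve prices that follow it, which is the shape a direct multi-step model would be trained on.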

### Data Exploration
<!---
- If a dataset is present for this problem, have you thoroughly discussed certain features about the dataset? Has a data sample been provided to the reader?
- If a dataset is present for this problem, are statistics about the dataset calculated and reported? Have any relevant results from this calculation been discussed?
- If a dataset is not present for this problem, has discussion been made about the input space or input data for your problem?
- Are there any abnormalities or characteristics about the input space or dataset that need to be addressed? (categorical variables, missing values, outliers, etc.) --->

## DATA SCIENCE METHODS

### Exploratory Data Analysis and Visualization
<!---
- Have you visualized a relevant characteristic or feature about the dataset or input data?
- Is the visualization thoroughly analyzed and discussed?
- If a plot is provided, are the axes, title, and datum clearly defined? --->


### Algorithms and Techniques
<!---
- Are the algorithms you will use, including any default variables/parameters in the project clearly defined?
- Are the techniques to be used thoroughly discussed and justified?
- Is it made clear how the input data or datasets will be handled by the algorithms and techniques chosen?--->

### Benchmark
<!---
- Has some result or value been provided that acts as a benchmark for measuring performance?
- Is it clear how this result or value was obtained (whether by data or by hypothesis)?--->


### Metrics
<!---
- Are the metrics you’ve chosen to measure the performance of your models clearly discussed and defined?
- Have you provided reasonable justification for the metrics chosen based on the problem and solution? --->


## DATA PRODUCTS AND RESULTS

### Data Cleaning, Wrangling, and Preprocessing
<!---
- If the algorithms chosen require preprocessing steps like feature selection or feature transformations, have they been properly documented?
- Based on the Data Exploration section, if there were abnormalities or characteristics that needed to be addressed, have they been properly corrected?
- If no preprocessing is needed, has it been made clear why?--->

### Implementation
<!---
- Is it made clear how the algorithms and techniques were implemented with the given datasets or input data?
- Were there any complications with the original metrics or techniques that required changing prior to acquiring a solution?
- Was there any part of the coding process (e.g., writing complicated functions) that should be documented? --->

### Refinement
<!---
- Has an initial solution been found and clearly reported?
- Is the process of improvement clearly documented, such as what techniques were used?
- Are intermediate and final solutions clearly reported as the process is improved?--->

### Model Evaluation and Validation
<!---
- Is the final model reasonable and aligning with solution expectations? Are the final parameters of the model appropriate?
- Has the final model been tested with various inputs to evaluate whether the model generalizes well to unseen data?
- Is the model robust enough for the problem? Do small perturbations (changes) in training data or the input space greatly affect the results?
- Can results found from the model be trusted?--->

### Justification
<!---
- Are the final results found stronger than the benchmark result reported earlier?
- Have you thoroughly analyzed and discussed the final solution?
- Is the final solution significant enough to have solved the problem?--->

## CONCLUSIONS AND RECOMMENDATIONS

### Limitations

While this project succeeded in meeting its scientific objectives, it is important to acknowledge some limitations that influenced both the process and outcomes:

**Price is complex**

Price is determined when markets clear, i.e. when demand and supply match. Any factor affecting either the supply or the demand side can potentially affect price dynamics, so price is by no means a simple quantity. Power prices in a liberalized market like Alberta are also influenced by numerous interconnected factors. While this project's machine learning pipeline accounted for several key variables, accurately predicting power prices requires considering a wide range of factors, including unexpected events and unusual circumstances. For example, unforeseen failures, interruptions, or disruptions in power generation can significantly reduce supply and subsequently affect prices. Furthermore, economic downturns or booms, and even fluctuations in the stock market, can all contribute to power price fluctuations through various mechanisms. Given the short duration of this project, it was not possible to include all relevant factors in the model, or even to identify them all.

**High volatility means high uncertainty** 

The power price data in Alberta showed few clear, stable patterns, making it challenging to develop accurate predictions. In addition, both the average and the variance of the power price have been consistently increasing over the past few years, which adds another layer of complexity to the forecasting task. Seasonal influences were intertwined with extreme price volatility, further increasing the uncertainty. As a result, accurately forecasting power prices beyond a short-term window remains difficult, and the inherent uncertainty became more pronounced as we extended our forecast window to 12 hours.
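
The growth of forecast error with the horizon can be made concrete by computing an error metric separately for each step ahead. The sketch below does this for a naive persistence forecast (the last observed value is used for every future step) on a synthetic random walk. Both the persistence baseline and the random walk are illustrative assumptions; they are not the project's model or the Alberta price data.

```python
import numpy as np


def persistence_rmse_by_horizon(series: np.ndarray, horizon: int = 12) -> np.ndarray:
    """RMSE of the 'last observed value' forecast for each step 1..horizon."""
    rmse = np.empty(horizon)
    for h in range(1, horizon + 1):
        # Actual value at t+h minus the forecast made at t (the value at t).
        errors = series[h:] - series[:-h]
        rmse[h - 1] = np.sqrt(np.mean(errors ** 2))
    return rmse


# Synthetic random walk used only to illustrate the calculation (assumption).
rng = np.random.default_rng(42)
random_walk = np.cumsum(rng.normal(size=5000))

rmse = persistence_rmse_by_horizon(random_walk, horizon=12)
for step, value in enumerate(rmse, start=1):
    print(f"{step:2d} h ahead: RMSE = {value:.2f}")
```

Reporting the metric per step, rather than as a single average over all 12 hours, shows where in the horizon a model starts to lose accuracy.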

**Data availability, a real binding constraint**

The availability of reliable and comprehensive data posed significant challenges for our project, particularly due to the requirement of predicting power prices on an hourly basis in real time. The availability of timely and accurate data is crucial for the success of our forecasting model.

Even when we identified valuable features with significant associations to price, obtaining them in an hourly, real-time format proved to be a formidable task. The continuous and up-to-date data required to feed our model was not readily accessible. As a result, we had to rely on a combination of historical data obtained from Tableau and real-time data obtained through APIs for some specific features. This reliance on multiple data sources, along with the limited availability of data for many features, resulted in a patchwork of data that introduced complexities and potential limitations to the accuracy and robustness of our model.
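
Stitching a historical extract and a real-time feed into a single hourly series might look like the following. This is a minimal sketch under stated assumptions: the function name `combine_sources` is hypothetical, both inputs are assumed to be pandas Series indexed by hourly timestamps, and preferring the real-time value on overlapping hours is a design choice made here for illustration rather than the project's documented rule.

```python
import pandas as pd


def combine_sources(historical: pd.Series, realtime: pd.Series) -> pd.Series:
    """Merge two hourly series into one, exposing missing hours as NaN."""
    combined = pd.concat([historical, realtime])
    # Where both sources cover the same hour, keep the real-time value (assumption).
    combined = combined[~combined.index.duplicated(keep="last")].sort_index()
    # Reindex on a complete hourly grid so that gaps become explicit NaN values.
    full_index = pd.date_range(combined.index.min(), combined.index.max(), freq="h")
    return combined.reindex(full_index)


historical = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 02:00"]),
)
realtime = pd.Series(
    [30.0, 50.0],
    index=pd.to_datetime(["2023-01-01 02:00", "2023-01-01 04:00"]),
)

combined = combine_sources(historical, realtime)
print(combined)
```

Making the gaps explicit, instead of silently dropping missing hours, lets later steps decide how to impute them or whether to exclude the affected windows.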

**Striking the Balance: the Accuracy-Explainability Tradeoff**

Another challenge was the classic tension between accuracy and interpretability, which often pull in opposite directions. Striking a balance between the two was an ongoing concern in this project. Simpler models like ARIMA provided better interpretability but lacked the sophistication to handle the complexity of the problem. More intricate models, on the other hand, offered higher accuracy but were often black boxes, which compromised interpretability. We opted for the LightGBM model, which provided a middle ground, but achieving the ideal balance remains an area of exploration. There is no straightforward recipe for navigating the accuracy-explainability tradeoff, and numerous candidate models remain to be explored.

**Features with known future values are not yet included**

In this project, the selected model paradigm predicted power prices without reliable knowledge of feature values for the next 12 hours. This limitation arose because data for those future values was unavailable to us. However, future values, either forecasted or actual, may be accessible for certain related features.

Adopting a model paradigm that uses features with known future values could improve the forecasting accuracy of this project. These features can provide valuable insight into the dynamics of the power market. By leveraging forecasted or actual values for the coming hours, the model could potentially capture short-term variations in power prices more accurately.
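
To illustrate what such a paradigm changes in the data preparation, the sketch below aligns a covariate whose future value is known at prediction time with the price it is meant to help predict. The covariate name `demand_forecast`, the function `build_features`, and the column names are hypothetical; the source does not specify which features have known future values.

```python
import pandas as pd


def build_features(price: pd.Series, demand_forecast: pd.Series, step: int) -> pd.DataFrame:
    """Assemble, for each hour t, the inputs available at t and the price at t+step."""
    frame = pd.DataFrame(
        {
            # Latest observed price at time t.
            "price_now": price,
            # Covariate value for hour t+step, assumed to be already published at time t.
            "demand_forecast_at_target": demand_forecast.shift(-step),
            # Price to be predicted.
            "target": price.shift(-step),
        }
    )
    # Rows at the end of the sample have no value at t+step and are dropped.
    return frame.dropna()


# Toy inputs chosen so that the alignment is easy to check by eye.
price = pd.Series(range(10), dtype=float)
demand_forecast = pd.Series(range(100, 110), dtype=float)

features = build_features(price, demand_forecast, step=3)
print(features.head(3))
```

The key point is the `shift(-step)` on the covariate: a model without known-future features can only see values up to hour t, whereas here the covariate is read at the target hour itself.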