# Netherlands Electricity Price Prediction

## Abstract - todo

## Motivation

European energy prices became a critical issue following their sharp increase in 2022. During this period, energy and supply costs surged by 110% compared to 2020 (Smal and Wieprow, 2023), forcing governments across the region to implement stimulus packages to avert a cost-of-living crisis. Although prices started to decline in 2023, the high-inflation economic environment continues to pose a significant challenge for households.

Given the increased sensitivity to energy expenditure, there is a pressing need for households to have access to accurate energy price forecasts. These forecasts can help optimize energy consumption, potentially leading to substantial cost savings. Accurate predictions can enable consumers to make informed decisions about when to use energy-intensive appliances, thus reducing their overall expenditure.

The goal of this project is to develop a model that generates precise day-ahead energy price forecasts. By providing these forecasts through a user-friendly dashboard application, I aim to empower households to optimize their energy usage and mitigate the financial burden of fluctuating energy prices.

As a case study to demonstrate the model's accuracy, I generate forecasts for day-ahead electricity prices in the Netherlands and compare the results of the proposed method against benchmarks found in the literature.

## Literature Review

### Qualitative Drivers of European Energy Prices (2021-2023)

From a qualitative perspective, numerous factors influenced European energy prices between 2021 and 2023. Key reasons for price increases in 2021 and 2022, as outlined by Alvarez and Molnar (2021) and Bolton (2022), include:

1. COVID-19 Pandemic:
    1. The pandemic led to a significant decline in energy demand, which resulted in reduced energy supply.
    2. During the post-pandemic economic recovery, energy demand rebounded rapidly, outpacing the slower increase in supply.
2. Increases in Natural Gas and Coal Prices:
    1. Reduced gas supply from Russia following the conflict with Ukraine and supply-side constraints limited gas imports, driving up prices.
3. Unfavorable Weather Conditions:
    1. A long, cold winter in 2021 diminished gas reserves.
    2. Heatwaves in the summer of 2022 affected hydroelectric and nuclear power generation, increasing reliance on gas for electricity.
    3. Lower-than-expected wind generation in 2022 further strained the energy supply.

Energy prices started returning closer to historical levels in 2023 and 2024. Factors influencing this reversal, as discussed by Tertre, Martinez, and Rábago (2023) and Power Engineering International (2023), include:

1. Declining Natural Gas Prices:
    1. Diversification of energy sources led to a decrease in natural gas demand.
    2. The EU reduced its dependency on Russian natural gas from over 40% to below 10% in 2023.
    3. Natural gas storage levels were built up to historic highs.
    4. Historically low energy demand in 2023 followed the previous years' price surges.
2. Increased Renewable Energy Production:
    1. Improved nuclear availability in France.
    2. High renewable generation led to negative day-ahead prices.
   

### Electricity Price Forecasting Literature 

The volatile nature of electricity prices has made their modeling a core research area in the energy sector. The literature extensively documents factors influencing electricity prices, commonly including:

1. Historical electricity prices.
2. Historical grid load.
3. Historical generation capacity.
4. Historical residual demand.
5. Historical fuel prices.
6. Day-ahead grid load forecasts.
7. Day-ahead generation capacity.
8. Temporal and Calendar features: hour of day, day of week, month of year, holidays.
9. Weather features: temperature, wind speed, precipitation, etc.

In terms of methods, deep learning and machine learning are prominent for modeling short-term electricity prices. Lago, De Ridder, and De Schutter (2018) benchmarked 98 different models for forecasting spot electricity prices, showing that deep learning models, specifically deep neural networks, perform best overall. Machine learning-based methods generally outperform statistical methods. Similarly, studies by Keles, Scelle, Paraschiv, and Fichtner (2016) and Lago, Marcjasz, De Schutter, and Weron (2021) benchmarked various models for forecasting day-ahead electricity prices. Both studies concluded that deep learning models offer superior performance, though Lago et al. (2021) also highlighted the benefits of linear methods, such as competitive performance, lower computational requirements, and faster forecast generation.

Lago, De Ridder, Vrancx, and De Schutter (2018) investigated the effect of incorporating cross-market data from Belgium and France to forecast electricity prices for Belgium. They demonstrated that including French market data improved the accuracy of forecasting Belgium's electricity prices. Furthermore, they found that a dual-market forecasting model, which integrates data from both Belgium and France, can enhance predictive accuracy. This finding suggests that the features listed above should not only include local market data but also relevant data from neighboring markets to achieve better forecasting performance.


## Data

### Data Description

The following features were selected for modeling day-ahead electricity prices in the Netherlands, based on their recognition in the literature as meaningful explanatory variables:

**Energy Features:**
1. Day-ahead prices
2. Total load and day-ahead total load forecasts
3. Aggregated generation per type and the day-ahead forecast of the total aggregated generation
4. Unavailability of generation units
5. Day-ahead forecast of wind
6. Net physical cross-border flows
7. Total physical cross-border imports
8. Day-ahead generation forecasts for wind and solar

**Weather Features:**
1. Temperature
2. Wind speed
3. Pressure
4. Dew point
5. Humidity

**Calendar Features:**
1. Dutch holidays

Energy data were obtained from the ENTSO-E Transparency Platform, and comprehensive details of the data items can be found in the “Detailed Data Descriptions” document by ENTSO-E (2022). Weather data were sourced from OpenWeatherMap, and Dutch holidays were retrieved using the Python package `holidays`.

### Data Processing

<div align="center">
    <img src="images/2020_2023_day_ahead_prices_nl.png" alt=" " width="800"/>
    <p><strong>Figure 1:</strong> Netherlands Day Ahead Electricity Prices 2020 - 2023.</p>
</div>

The data preprocessing involved several key steps to prepare the dataset for modeling:

**Data Cleaning**\
The data was initially cleaned by resampling to an hourly frequency where necessary, removing duplicate entries, and eliminating features with a majority of null or zero values. Outliers were identified and removed, and missing values were interpolated to ensure a complete dataset.
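
The cleaning steps above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the project's actual code: the column names, the 15-minute input frequency, the 50% sparsity threshold, and the IQR-based outlier rule are all hypothetical, since the text does not specify how outliers were identified.

```python
import numpy as np
import pandas as pd


def clean_hourly(df: pd.DataFrame, max_empty_share: float = 0.5, iqr_k: float = 3.0) -> pd.DataFrame:
    """Deduplicate, resample to hourly, drop sparse columns, mask outliers, interpolate."""
    # Remove duplicate timestamps and restore chronological order.
    df = df[~df.index.duplicated(keep="first")].sort_index()
    # Resample to an hourly frequency by averaging sub-hourly observations.
    df = df.resample("1h").mean()
    # Drop features in which most values are null or zero.
    empty_share = (df.isna() | (df == 0)).mean()
    df = df.loc[:, empty_share <= max_empty_share]
    # Mask outliers with an IQR rule (assumed here; the original rule is not stated).
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    in_range = df.ge(q1 - iqr_k * iqr, axis=1) & df.le(q3 + iqr_k * iqr, axis=1)
    df = df.where(in_range)
    # Fill the gaps by time-weighted linear interpolation, then pad the edges.
    return df.interpolate(method="time").ffill().bfill()


# Synthetic 15-minute data with a spike, an all-zero column, and duplicate rows.
idx = pd.date_range("2023-01-01", periods=192, freq="15min")
raw = pd.DataFrame(
    {"load": 100 + 10 * np.sin(2 * np.pi * np.arange(192) / 96), "mostly_zero": 0.0},
    index=idx,
)
raw.iloc[40:44, 0] = 1e6              # one hour of implausible values
raw = pd.concat([raw, raw.iloc[:4]])  # duplicated timestamps
cleaned = clean_hourly(raw)
```

Masking outliers before interpolating means the same interpolation pass fills both the original gaps and the removed outliers.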

**Weather Data Aggregation**\
Weather data were obtained for seven Dutch cities: Almere, Amsterdam, Eindhoven, Groningen, Rotterdam, The Hague, and Utrecht. Naturally, weather data from these cities were highly correlated. To address this, correlated features were aggregated into single weighted features, with weights based on the electricity consumption data from Basanisi (2020).
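
A consumption-weighted aggregation of this kind might look as follows. The sketch assumes per-city columns named `<feature>_<city>`; the consumption figures are made-up placeholders, not values from Basanisi (2020).

```python
import pandas as pd

# Hypothetical consumption per city (illustrative values only).
consumption = {"amsterdam": 4.0, "rotterdam": 3.0, "utrecht": 1.0}


def aggregate_weighted(df: pd.DataFrame, feature: str, weights: dict) -> pd.Series:
    """Collapse per-city columns of one feature into a single weighted average."""
    cols = [f"{feature}_{city}" for city in weights]
    w = pd.Series(list(weights.values()), index=cols, dtype=float)
    w = w / w.sum()  # normalise so the weights sum to one
    return df[cols].mul(w, axis=1).sum(axis=1).rename(f"{feature}_weighted")


weather = pd.DataFrame(
    {
        "temp_amsterdam": [10.0, 12.0],
        "temp_rotterdam": [11.0, 13.0],
        "temp_utrecht": [9.0, 15.0],
    }
)
temp_weighted = aggregate_weighted(weather, "temp", consumption)
```

For the first row this gives (10·4 + 11·3 + 9·1) / 8 = 10.25, so cities with higher consumption pull the aggregate towards their own readings.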

**Price Transformation**\
Common processing steps in forecasting financial prices, such as converting raw prices into returns and applying a log-transformation, were not used here because electricity prices can be negative. The standard Box-Cox transformation likewise requires strictly positive inputs. Variants that accommodate negative values, such as a shifted Box-Cox or the Yeo-Johnson transformation, were not investigated in this study but could be a point for future work.

**Feature Engineering**\
Additional features were generated to better describe the target variable. Both the literature and Figures 3 and 4 demonstrate day-of-week and hour-of-day seasonality, so these variables were included. Lagged features were also added to give the model more context. Optimal lags were selected by calculating the mutual information between lagged variables and the target variable, which showed that variables lagged by one week added the most information. To describe the current residual load relative to the residual load in preceding hours, a relative load indicator (RLI) was calculated for lags of 24, 48, and 168 hours (Keles, Scelle, Paraschiv, and Fichtner, 2016).

<div align="center">
    <img src="images/day_ahead_electricity_prices_by_year_and_month_nl.png" alt=" " width="1000"/>
    <p><strong>Figure 2:</strong> Netherlands Day Ahead Electricity Prices by Year and Month.</p>
</div>

<div align="center">
    <img src="images/day_ahead_electricity_prices_by_year_and_day_of_week_nl.png" alt=" " width="1000"/>
    <p><strong>Figure 3:</strong> Netherlands Day Ahead Electricity Prices by Year and Day of Week.</p>
</div>

<div align="center">
    <img src="images/day_ahead_electricity_prices_by_year_and_hour_of_day_nl.png" alt=" " width="1000"/>
    <p><strong>Figure 4:</strong> Netherlands Day Ahead Electricity Prices by Year and Hour of Day.</p>
</div>
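
The calendar and lag features described in the feature engineering step can be sketched with pandas as below. The function name and the `residual_load` column are hypothetical, and the relative load indicator is not reproduced because its formula is not given here.

```python
import numpy as np
import pandas as pd


def add_calendar_and_lags(df: pd.DataFrame, column: str, lags=(24, 48, 168)) -> pd.DataFrame:
    """Add hour-of-day, day-of-week, and lagged copies of one column."""
    out = df.copy()
    out["hour_of_day"] = out.index.hour
    out["day_of_week"] = out.index.dayofweek  # Monday = 0
    for lag in lags:
        # shift(lag) moves past values forward, so row t holds the value at t - lag.
        out[f"{column}_lag_{lag}"] = out[column].shift(lag)
    return out


# Two weeks of synthetic hourly data starting on a Monday.
idx = pd.date_range("2023-01-02", periods=24 * 14, freq="1h")
frame = pd.DataFrame({"residual_load": np.arange(len(idx), dtype=float)}, index=idx)
features = add_calendar_and_lags(frame, "residual_load")
```

The first `lag` rows of each lagged column are missing by construction and would need to be dropped or handled before model fitting.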

**Dummy Variables for Holidays**\
Dummy variables for Dutch public holidays were explored. An A/B test was performed to check whether electricity prices on holidays differ significantly from prices on non-holidays. Because price variance was very high from 2020 to 2024, a significant difference in that period could reflect general volatility rather than a holiday effect. To rule this out, the analysis was repeated on data from 2016-2019, which confirmed the initial results. Consequently, dummy variables were included for all holidays except Koningsdag and Bevrijdingsdag, the latter being a public holiday only once every five years.

<div align="center">
    <img src="images/2016_2019_holiday_ab_test_nl.png" alt=" " width="900"/>
    <p><strong>Table 1:</strong> Netherlands Holiday A/B Test 2020 - 2023.</p>
</div>

<div align="center">
    <img src="images/2016_2019_holiday_ab_test_nl.png" alt=" " width="900"/>
    <p><strong>Table 2:</strong> Netherlands Holiday A/B Test 2016 - 2019.</p>
</div>
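
The text does not state which statistical test underlies the holiday A/B comparison. As one possible stand-in, the sketch below builds a holiday dummy and runs a two-sided permutation test on the difference in mean prices. The prices are synthetic, and the holiday dates are hard-coded for illustration; in the project they come from the `holidays` package.

```python
import numpy as np
import pandas as pd


def permutation_test(a, b, n_perm: int = 2000, seed: int = 0):
    """Two-sided permutation test on the difference in means of two samples."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[: a.size].mean() - shuffled[a.size:].mean()
        hits += abs(diff) >= abs(observed)
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic daily mean prices with an artificial holiday discount.
days = pd.date_range("2018-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(42)
prices = pd.Series(50 + rng.normal(0, 5, len(days)), index=days)

# Illustrative subset of holiday dates (New Year's Day and both Christmas days).
holiday_dates = pd.to_datetime(
    ["2018-01-01", "2018-12-25", "2018-12-26", "2019-01-01", "2019-12-25", "2019-12-26"]
)
is_holiday = pd.Series(prices.index.isin(holiday_dates), index=days, name="is_holiday")
prices.loc[is_holiday] -= 20

diff, p_value = permutation_test(prices[is_holiday], prices[~is_holiday])
```

Running the same comparison separately on a calmer period, as done with the 2016-2019 data, helps separate a genuine holiday effect from differences driven by general price volatility.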

**Data Splitting**\
The dataset was divided into training, validation, and testing sets as follows:

1. Training set: 2020-01-08 to 2022-09-10
2. Validation set: 2022-09-11 to 2023-05-12
3. Test set: 2023-05-20 to 2024-03-21
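
A chronological split with these boundaries can be expressed as label-based slices on a sorted `DatetimeIndex`. The dates below are the ones listed above; the function name and the synthetic frame are illustrative.

```python
import pandas as pd

# Split boundaries as listed above (both ends inclusive).
SPLITS = {
    "train": ("2020-01-08", "2022-09-10"),
    "validation": ("2022-09-11", "2023-05-12"),
    "test": ("2023-05-20", "2024-03-21"),
}


def chronological_split(df: pd.DataFrame) -> dict:
    """Slice a frame with a sorted DatetimeIndex into non-overlapping periods."""
    # Date-string slicing includes every timestamp on the end date.
    return {name: df.loc[start:end] for name, (start, end) in SPLITS.items()}


idx = pd.date_range("2020-01-01", "2024-03-31 23:00", freq="1h")
data = pd.DataFrame({"day_ahead_prices": 0.0}, index=idx)
parts = chronological_split(data)
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for any lagged or leading feature.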

### Data Analysis

To test whether the time series are stationary, an augmented Dickey-Fuller test was performed. For most time series, with the exception of some of the `clouds_all` features, the test rejected the null hypothesis of non-stationarity.

<br />

<div align="center">
    <img src="images/2020_2023_acf_pacf_nl.png" alt=" " width="900"/>
    <p><strong>Figure 5:</strong> Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots for Netherlands Day Ahead Electricity Prices.</p>
</div>

<br />

Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots of the electricity prices are shown in Figure 5. The PACF plot indicates that lags in multiples of 24 are significant, demonstrating hour-of-day seasonality. This suggests that lagged price features are likely to be informative.
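
To illustrate why a daily cycle shows up at lags that are multiples of 24, the sketch below computes plain autocorrelations on a synthetic hourly series using `pandas.Series.autocorr`. It covers the ACF only, not the PACF, and the series is made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series: a 24-hour sine cycle plus Gaussian noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 60)
prices = pd.Series(np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.3, hours.size))

# Autocorrelation at half a day, one day, and two days.
acf = {lag: prices.autocorr(lag=lag) for lag in (12, 24, 48)}
```

The correlation is strongly positive at 24 and 48 hours and negative at 12 hours, which is the pattern a lagged price feature at daily multiples is meant to capture.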

<br />

<div align="center">
    <img src="images/energy_corr.png" alt="Day Ahead Prices and Energy Features Pearson Correlation Matrix" width="900"/>
    <p><strong>Figure 6:</strong> Day Ahead Prices and Energy Features Pearson Correlation Matrix.</p>
</div>

<div align="center">
    <img src="images/weather_corr.png" alt="Day Ahead Prices and Weather Features Pearson Correlation Matrix" width="900"/>
    <p><strong>Figure 8:</strong> Day Ahead Prices and Weather Features Pearson Correlation Matrix.</p>
</div>

<br />

Day-ahead prices are released for the next day at 12:00 UTC. To have one full day of prices before the actual values are released, prices need to be forecasted 48 hours in advance. This is reflected in the data by generating a new leading variable `day_ahead_prices_lead_48`.
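
Creating the leading variable amounts to shifting the price column backwards by 48 rows. The sketch assumes a gap-free hourly index and a source column named `day_ahead_prices`, which is a hypothetical name.

```python
import numpy as np
import pandas as pd


def add_lead_target(df: pd.DataFrame, column: str = "day_ahead_prices", horizon: int = 48) -> pd.DataFrame:
    """Append the value observed `horizon` rows later as the forecast target."""
    out = df.copy()
    target = f"{column}_lead_{horizon}"
    # A negative shift pulls future values back, so row t holds the price at t + horizon.
    out[target] = out[column].shift(-horizon)
    # The last `horizon` rows have no known future price and are dropped.
    return out.dropna(subset=[target])


idx = pd.date_range("2023-01-01", periods=72, freq="1h")
frame = pd.DataFrame({"day_ahead_prices": np.arange(72, dtype=float)}, index=idx)
with_target = add_lead_target(frame)
```

Because `shift` operates on row positions, any missing hours in the index would silently misalign the target, so the frame should be resampled to a complete hourly index first.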

Pearson correlation matrices for the 48-hour lead of day-ahead prices (`day_ahead_prices_lead_48`) against the energy features and the weather features are shown in Figures 6 and 8, respectively. Features with the highest absolute correlation include `generation_Fossil Hard coal` (0.42), `clouds_all_amsterdam` (-0.27), `wind_speed_weighted` (0.25), and `imports` (0.23).
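
Ranking covariates by absolute Pearson correlation with the target can be sketched as follows. The data and column names are synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd


def rank_by_abs_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Return signed Pearson correlations with the target, ordered by magnitude."""
    corr = df.corr(method="pearson")[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


rng = np.random.default_rng(1)
n = 500
strong, neg, weak = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
sample = pd.DataFrame(
    {
        "strong": strong,
        "neg": neg,
        "weak": weak,
        "target": 2 * strong - neg + 0.5 * rng.normal(size=n),
    }
)
ranked = rank_by_abs_correlation(sample, "target")
```

Sorting by magnitude while keeping the sign makes negatively correlated features, such as cloud cover in the results above, as visible as positively correlated ones.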

<br />

<div align="center">
    <img src="images/mutual_information_1.png" alt="Day Ahead Prices lead 48 hours and Covariate Mutual Information" width="900"/>
    <p><strong>Figure 9:</strong> Day Ahead Prices lead 48 hours and Covariate Mutual Information.</p>
</div>


<div align="center">
    <img src="images/mutual_information.png" alt="Day Ahead Prices lead 48 hours and Covariate Mutual Information" width="900"/>
    <p><strong>Figure 10:</strong> Day Ahead Prices lead 48 hours and Select Lagged Covariate Mutual Information.</p>
</div>

<br />

A mutual information regression was performed between the covariates and the target to identify significant variables, with results presented in Figure 9. Several weather-related variables, such as `temperature_range` (temp_rng*), `visibility`, and `wind_direction` (wind_deg*), had low mutual information values, which aligns with intuitive expectations. These features were removed from the dataset. Similarly, most `clouds_all` features were removed except for `clouds_all_amsterdam`, which had one of the highest absolute correlations with the target variable.
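
A mutual information ranking of this kind could be computed with scikit-learn's `mutual_info_regression`. This is a sketch on synthetic data, not the project's pipeline; the feature names are hypothetical, and the quadratic relationship is chosen to show a dependence that a linear correlation would largely miss.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def mutual_information(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> pd.Series:
    """Estimate mutual information between each feature and the target."""
    scores = mutual_info_regression(X.to_numpy(), y.to_numpy(), random_state=seed)
    return pd.Series(scores, index=X.columns).sort_values(ascending=False)


rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame(
    {
        "informative": rng.uniform(-3, 3, n),
        "uninformative": rng.normal(size=n),
    }
)
# Non-linear dependence on the first feature only.
y = pd.Series(X["informative"] ** 2 + rng.normal(0, 0.1, n))
scores = mutual_information(X, y)
```

Features whose scores sit near zero are candidates for removal, which mirrors the filtering of low-information weather variables described above.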

A second mutual information regression, including lagged versions of the remaining variables, was conducted. A subset of these results is shown in Figure 10. Lags of 1, 2, 3, and 4 hours were applied, as well as daily lags from 1 to 7 days. Lags of 3 to 5 days generally had the highest mutual information. Given that the target is shifted by 48 hours, a lag of 5 days is particularly informative, as it contains information from the previous week. Weekly seasonality in electricity prices has already been highlighted, making this finding intuitive.


## Methodology

* Model Selection Rationale: Explain why you chose each model (Naive models, SARIMA, GAM, LightGBM). Discuss the advantages or suitability of each model for electricity price forecasting.
* Parameter Tuning: Discuss any parameter tuning done and the justification behind these choices.

## Results

* Comparative Analysis: Provide a detailed comparison of how each model performed against the others. Include metrics such as RMSE, MAE, or others relevant to your analysis.
* Interpretation: Offer insights into what the results imply in the context of the aims set out in the Motivation section.

## System Design 

(backend and frontend)



## Conclusion

* Concise Summary: Recap the main findings and how they contribute to the field of electricity price forecasting.
* Limitations & Recommendations for Future Work: Discuss any constraints you encountered and suggest areas for further research or improvement in methodologies.


## Appendices
* Code Appendix: Include an appendix with snippets or links to code if possible, to enhance transparency and reproducibility.
* Data Sources: Cite all data sources used in your analysis.

## References

Alvarez, C.F. and Molnar, G., 2021. What is behind soaring energy prices and what happens next?.

Basanisi, L. (2020, June). Energy consumption of the Netherlands. Retrieved March 29, 2024 from https://www.kaggle.com/datasets/lucabasa/dutch-energy.

Bolton, R., 2022. Natural gas, rising prices and the electricity market.

ENTSO-E. Detailed Data Descriptions, Version 3, Release 3; 2022. Available at: ENTSO-E Detailed Data Descriptions (Accessed: March 25, 2024).

ENTSO-E transparency platform. Available at: ENTSO-E Transparency (Accessed: March 25, 2024).

holidays 0.50. Available at: https://pypi.org/project/holidays/ (Accessed: March 29, 2024).

Keles, D., Scelle, J., Paraschiv, F. and Fichtner, W., 2016. Extended forecast methods for day-ahead electricity spot prices applying artificial neural networks. Applied Energy, 162, pp.218-230.

Lago, J., De Ridder, F. and De Schutter, B., 2018. Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms. Applied Energy, 221, pp.386-405.

Lago, J., De Ridder, F., Vrancx, P. and De Schutter, B., 2018. Forecasting day-ahead electricity prices in Europe: The importance of considering market integration. Applied Energy, 211, pp.890-903.

Lago, J., Marcjasz, G., De Schutter, B. and Weron, R., 2021. Forecasting day-ahead electricity prices: A review of state-of-the-art algorithms, best practices and an open-access benchmark. Applied Energy, 293, p.116983.

OpenWeather. Available at: https://openweathermap.org/ (Accessed: March 29, 2024).

Power Engineering International (2023) Record low European power demand in Q2 as renewables output hits new high, Power Engineering International. Available at: https://www.powerengineeringint.com/world-regions/europe/record-low-european-power-demand-in-q2-as-renewables-output-hits-new-high/ (Accessed: 03 June 2024). 

Smal, T. and Wieprow, J., 2023. Energy security in the context of global energy crisis: economic and financial conditions. Energies, 16(4), p.1605.

Tertre, M.G., Martinez, I. and Rábago, M.R., 2023. Reasons behind the 2022 energy price increases and prospects for next year.


## Technical Documentation

* Model Documentation: Include documentation for each model implementation detailing the workflow, dependencies, and any necessary instructions to replicate the model.
* API Documentation: If your project has an API component, provide detailed API documentation.