# **Electricity Price Forecasting**

**Author:** Milos Saric [https://saricmilos.com/]  
**Date:** December 10, 2025 - December 18, 2025  
**Dataset:** Red Eléctrica — *[Spanish Electricity Market](https://www.ree.es/en/datos/generation)*  

---

This notebook explores Spanish electricity market datasets to build predictive models for forecasting tomorrow’s electricity prices using **machine learning** techniques. 
 
**Objective:**  
To develop an accurate **electricity price forecasting model** capable of anticipating day-ahead prices, helping energy producers, traders, and consumers make informed decisions.

The analysis will guide you through the complete data science workflow, including:

1. **Problem Definition** – Define the forecasting goal: predicting next-day electricity prices using historical data, demand patterns, and external factors (e.g., weather, holidays). Establish evaluation metrics such as **MAE, RMSE, MAPE**.  

2. **Data Collection & Overview** – Load and inspect datasets containing historical electricity prices, generation, demand, and weather information. Understand the structure, relationships, and key features.  

3. **Exploratory Data Analysis (EDA)** – Analyze price trends, daily/weekly patterns, seasonal effects, and correlations between electricity demand, weather, and prices. Visualize insights with plots.

4. **Data Preprocessing & Feature Engineering** – Clean data, handle missing values, create lag features, rolling averages, and encode categorical variables. Normalize features for model input.  

5. **Model Development** – Train and compare machine learning models for price prediction: Linear Regression, Random Forest, and Gradient Boosting.  

6. **Evaluation & Testing** – Assess model performance using metrics like MAE, RMSE, and MAPE. Compare approaches, visualize predicted vs actual prices, and interpret model results.  

7. **Deployment & Future Work** – Discuss strategies for deploying the forecasting model (via FastAPI) and potential improvements with more features or hybrid models.

---

# **About Electricity Price Forecasting Systems**

Electricity prices fluctuate rapidly throughout the day, influenced by factors such as **demand**, **weather conditions**, **fuel costs**, and the availability of **renewable energy sources** like wind and solar. Predicting these prices is essential for ensuring stability, optimizing energy usage, reducing costs, and enabling smarter grid management.

In modern energy systems, accurate forecasting plays a key role in helping both providers and consumers make informed decisions. Energy companies rely on forecasts to determine when to buy or sell electricity, while consumers and businesses can adjust consumption to take advantage of cheaper prices or avoid expensive peak periods.  

Because electricity **cannot be stored easily**, unlike fossil fuels (gas, oil, coal, etc.), it must be produced (or bought) the moment it is needed. This means shifts in supply and demand have an **immediate** impact on price, which makes forecasting a challenging yet vital task.

Effective electricity price prediction has several important applications:

- **Optimizing electricity storage** and battery usage.  
- **Enabling demand-side flexibility**, helping buildings reduce consumption during expensive periods (and increase during low or negative price periods).  
- **Reducing carbon emissions**, since electricity price often correlates with carbon intensity of generation.  
- **Supporting grid stability**, reducing strain during peak hours and balancing supply and demand efficiently.  

---

## **1. Problem Definition**

The first step is to establish a clear understanding of the forecasting challenge. This creates the foundation for the entire project and ensures all further work stays aligned with the primary objective.

### **Objective**
Develop an **electricity price forecasting system** that predicts **day-ahead electricity prices** using historical price data, generation and demand patterns, weather variables, and other relevant features.  
The system should generate accurate forecasts that support smarter decision-making for energy providers, grid operators, and end users.
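
To make the day-ahead framing concrete, here is a minimal sketch of turning an hourly price series into a supervised learning table. The synthetic prices, the 24-hour horizon, and the chosen lags (24 h, 168 h) are illustrative assumptions, not the notebook's final feature set. It also ignores the real market timetable, in which all of the next day's prices are published at once.

```python
import numpy as np
import pandas as pd

# Synthetic hourly prices standing in for the real series (illustrative only).
idx = pd.date_range("2018-01-01", periods=24 * 10, freq="h")
prices = pd.DataFrame({"price_actual": np.arange(len(idx), dtype=float)}, index=idx)

HORIZON = 24  # assumed forecast horizon: predict the price 24 hours ahead

features = pd.DataFrame(index=prices.index)
# Lag features: the price one day and one week before the current hour.
features["price_lag_24h"] = prices["price_actual"].shift(24)
features["price_lag_168h"] = prices["price_actual"].shift(168)
# Rolling mean over the most recent 24 hours (current hour included).
features["price_roll_mean_24h"] = prices["price_actual"].rolling(24).mean()

# Target: the price HORIZON hours after the current hour.
target = prices["price_actual"].shift(-HORIZON).rename("price_target")

# Drop rows where any lag or the target is unavailable.
supervised = features.join(target).dropna()
print(supervised.head())
```

Shifting the target backwards with a negative shift keeps every feature row strictly in the past relative to the value being predicted, which is the property a forecasting setup needs to avoid leakage.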

### **Scope**
The analysis focuses on electricity market data, which includes:

- **Historical electricity prices** — day-ahead and intraday market values.  
- **Energy generation data** — renewable and non-renewable production levels.  
- **Weather data** — temperature, solar irradiance, wind speed, etc.  
- **Demand/load data** — total electricity consumption patterns.  

Predictions will be based on the dataset provided, with optional integration of external weather or market data in advanced stages.

### **Stakeholders**
- **Energy Providers / Traders:** Optimize buying and selling strategies to reduce risk and increase profitability.  
- **Grid Operators:** Improve load balancing and ensure efficient grid stability.  
- **Businesses & Industries:** Adjust operational schedules to lower electricity costs.  
- **Consumers:** Reduce electricity bills through smarter usage patterns.  
- **Data Scientists / ML Engineers:** Explore forecasting algorithms and improve model accuracy.  

### **Success Criteria**
A successful forecasting system should produce **accurate, reliable, and robust predictions**, assessed using metrics such as:

- **RMSE** — Root Mean Squared Error  
- **MAE** — Mean Absolute Error  
- **MAPE** — Mean Absolute Percentage Error  

These metrics measure how close the predicted prices are to real future prices.
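
As a reference for how these three metrics are computed, here is a minimal NumPy sketch. The function names and sample values are illustrative assumptions. Note that MAPE is undefined when an actual price is zero, which can happen in electricity markets, so this sketch raises an error in that case rather than returning a misleading number.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors, in price units."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more than MAE."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero.")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Made-up prices for illustration.
actual = [50.0, 60.0, 40.0]
predicted = [45.0, 66.0, 40.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```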

> A clearly defined problem enables the development of a powerful and effective electricity price forecasting model that supports smarter, greener, and more efficient energy systems.

## **2. Data Collection**

The **Data Collection** phase focuses on gathering and preparing the datasets required to build and evaluate the electricity price forecasting models. This step also involves importing essential libraries, setting up the working environment, and organizing reusable functions to ensure a smooth analysis workflow.

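## **Dataset Description**

The analysis uses two CSV files:

- `energy_dataset`: hourly generation by source, load forecasts and actuals, and day-ahead and actual prices (35,064 rows, 29 columns).
- `weather_features`: hourly weather observations labeled by city, such as temperature, pressure, humidity, wind, rain, and cloud cover (178,396 rows, 17 columns).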



---

We can begin by:
1. Loading each dataset (`energy_dataset`, `weather_features`) individually.  
2. Performing exploratory data analysis (EDA) to understand distributions and missing values.  
3. Merging the datasets into a unified hourly view of prices, generation, load, and weather.  
4. Building and evaluating different forecasting approaches.
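
The merge in step 3 could look roughly like the sketch below. The weather file has one row per city and hour while the energy file has one row per hour, so the weather data has to be reduced to one row per hour first. Averaging across cities is an assumption made here for brevity, and the sample values are made up.

```python
import pandas as pd

# Tiny made-up samples using the column names of the two CSV files.
energy = pd.DataFrame({
    "time": ["2015-01-01 00:00:00+01:00", "2015-01-01 01:00:00+01:00"],
    "price_actual": [65.41, 64.92],
})
weather = pd.DataFrame({
    "dt_iso": ["2015-01-01 00:00:00+01:00", "2015-01-01 00:00:00+01:00",
               "2015-01-01 01:00:00+01:00", "2015-01-01 01:00:00+01:00"],
    "city_name": ["Valencia", "Madrid", "Valencia", "Madrid"],
    "temp": [270.0, 268.0, 271.0, 269.0],
})

# Parse both timestamp columns into a common, timezone-aware key.
energy["datetime"] = pd.to_datetime(energy["time"], utc=True)
weather["datetime"] = pd.to_datetime(weather["dt_iso"], utc=True)

# Collapse the per-city rows into one row per hour (assumed: simple mean).
weather_hourly = weather.groupby("datetime")[["temp"]].mean().add_prefix("mean_")

# Left join keeps every energy row, even if weather is missing for that hour.
merged = (
    energy.drop(columns="time")
    .set_index("datetime")
    .join(weather_hourly, how="left")
)
print(merged)
```

An alternative design is to pivot the weather data so that each city gets its own columns, which preserves regional differences at the cost of a wider table.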

# Required Libraries Import

### Setting working paths

In [1]:
import sys
from pathlib import Path
from datetime import timedelta

In [2]:
# Go up one level from /notebooks to the main folder
project_root = Path.cwd().parent

In [3]:
sys.path.append(str(project_root))

### Core Libraries

In [4]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

### Modules

In [5]:
%load_ext autoreload
%autoreload 2

# Modules
from src.dataloader import load_all_csvs_from_folder
from src.missing_values import (
    missing_values_heatmap,
    missing_values_barchart,
    get_missing_value_summary
    )
from src.unique_values import (
    get_column_types,
    plot_number_of_unique_values,
    unique_values
    )

In [6]:
# Build the path to the data file
data_path = project_root / "data"


In [7]:
energy_dataset_test = pd.read_csv(data_path / "energy_dataset.csv")

In [8]:
datasets = load_all_csvs_from_folder(data_path, low_memory=False)

In [9]:
print(f"{datasets.keys()}")

dict_keys(['energy_dataset', 'weather_features'])
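
The implementation of `load_all_csvs_from_folder` lives in the project's `src` package and is not shown here. A hypothetical equivalent, assuming it simply maps each CSV's file name (without extension) to a DataFrame and forwards extra keyword arguments to `pd.read_csv`, might look like this:

```python
from pathlib import Path
import tempfile

import pandas as pd

def load_csvs_sketch(folder, **read_csv_kwargs):
    """Hypothetical stand-in for the project's loader: {file stem: DataFrame}."""
    folder = Path(folder)
    return {
        path.stem: pd.read_csv(path, **read_csv_kwargs)
        for path in sorted(folder.glob("*.csv"))
    }

# Demonstrate on a temporary folder holding two small CSV files.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2]}).to_csv(Path(tmp) / "energy_dataset.csv", index=False)
    pd.DataFrame({"b": [3]}).to_csv(Path(tmp) / "weather_features.csv", index=False)
    demo = load_csvs_sketch(tmp, low_memory=False)

print(list(demo))
```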


In [10]:
energy_df, weather_df = (datasets.get(key) for key in ["energy_dataset","weather_features"])

In [11]:
energy_df.head()

Unnamed: 0,time,generation biomass,generation fossil brown coal/lignite,generation fossil coal-derived gas,generation fossil gas,generation fossil hard coal,generation fossil oil,generation fossil oil shale,generation fossil peat,generation geothermal,...,generation waste,generation wind offshore,generation wind onshore,forecast solar day ahead,forecast wind offshore eday ahead,forecast wind onshore day ahead,total load forecast,total load actual,price day ahead,price actual
0,2015-01-01 00:00:00+01:00,447.0,329.0,0.0,4844.0,4821.0,162.0,0.0,0.0,0.0,...,196.0,0.0,6378.0,17.0,,6436.0,26118.0,25385.0,50.1,65.41
1,2015-01-01 01:00:00+01:00,449.0,328.0,0.0,5196.0,4755.0,158.0,0.0,0.0,0.0,...,195.0,0.0,5890.0,16.0,,5856.0,24934.0,24382.0,48.1,64.92
2,2015-01-01 02:00:00+01:00,448.0,323.0,0.0,4857.0,4581.0,157.0,0.0,0.0,0.0,...,196.0,0.0,5461.0,8.0,,5454.0,23515.0,22734.0,47.33,64.48
3,2015-01-01 03:00:00+01:00,438.0,254.0,0.0,4314.0,4131.0,160.0,0.0,0.0,0.0,...,191.0,0.0,5238.0,2.0,,5151.0,22642.0,21286.0,42.27,59.32
4,2015-01-01 04:00:00+01:00,428.0,187.0,0.0,4130.0,3840.0,156.0,0.0,0.0,0.0,...,189.0,0.0,4935.0,9.0,,4861.0,21785.0,20264.0,38.41,56.04


In [12]:
energy_df.columns

Index(['time', 'generation biomass', 'generation fossil brown coal/lignite',
       'generation fossil coal-derived gas', 'generation fossil gas',
       'generation fossil hard coal', 'generation fossil oil',
       'generation fossil oil shale', 'generation fossil peat',
       'generation geothermal', 'generation hydro pumped storage aggregated',
       'generation hydro pumped storage consumption',
       'generation hydro run-of-river and poundage',
       'generation hydro water reservoir', 'generation marine',
       'generation nuclear', 'generation other', 'generation other renewable',
       'generation solar', 'generation waste', 'generation wind offshore',
       'generation wind onshore', 'forecast solar day ahead',
       'forecast wind offshore eday ahead', 'forecast wind onshore day ahead',
       'total load forecast', 'total load actual', 'price day ahead',
       'price actual'],
      dtype='object')

In [13]:
weather_df.head()

Unnamed: 0,dt_iso,city_name,temp,temp_min,temp_max,pressure,humidity,wind_speed,wind_deg,rain_1h,rain_3h,snow_3h,clouds_all,weather_id,weather_main,weather_description,weather_icon
0,2015-01-01 00:00:00+01:00,Valencia,270.475,270.475,270.475,1001,77,1,62,0.0,0.0,0.0,0,800,clear,sky is clear,01n
1,2015-01-01 01:00:00+01:00,Valencia,270.475,270.475,270.475,1001,77,1,62,0.0,0.0,0.0,0,800,clear,sky is clear,01n
2,2015-01-01 02:00:00+01:00,Valencia,269.686,269.686,269.686,1002,78,0,23,0.0,0.0,0.0,0,800,clear,sky is clear,01n
3,2015-01-01 03:00:00+01:00,Valencia,269.686,269.686,269.686,1002,78,0,23,0.0,0.0,0.0,0,800,clear,sky is clear,01n
4,2015-01-01 04:00:00+01:00,Valencia,269.686,269.686,269.686,1002,78,0,23,0.0,0.0,0.0,0,800,clear,sky is clear,01n


In [14]:
weather_df.columns

Index(['dt_iso', 'city_name', 'temp', 'temp_min', 'temp_max', 'pressure',
       'humidity', 'wind_speed', 'wind_deg', 'rain_1h', 'rain_3h', 'snow_3h',
       'clouds_all', 'weather_id', 'weather_main', 'weather_description',
       'weather_icon'],
      dtype='object')

In [15]:
for df in [energy_df, weather_df]:
    df.columns = (
        df.columns
        .str.strip()            # Remove leading/trailing whitespace
        .str.lower()            # Convert all column names to lowercase
        .str.replace(' ', '_', regex=False)  # Replace spaces with underscores
    )
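
The cell above leaves hyphens and slashes in place, so names such as `generation_fossil_coal-derived_gas` keep their punctuation. If fully alphanumeric names are preferred, a stricter variant is sketched below. The helper `to_snake_case` is hypothetical and is not what this notebook uses; adopting it would mean updating every later cell that refers to the original names.

```python
import re

import pandas as pd

def to_snake_case(name):
    """Lowercase a name and collapse every non-alphanumeric run into '_'."""
    cleaned = re.sub(r"[^0-9a-z]+", "_", name.strip().lower())
    return cleaned.strip("_")

demo = pd.DataFrame(columns=[
    " Generation Fossil Brown Coal/Lignite",
    "generation fossil coal-derived gas",
    "price actual ",
])
demo.columns = [to_snake_case(col) for col in demo.columns]
print(list(demo.columns))
```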

##  **3. Exploratory Data Analysis (EDA)**

Exploratory Data Analysis is all about **understanding the dataset**, uncovering patterns, spotting anomalies, and generating insights that will guide feature engineering and modeling.

In [16]:
energy_df.shape

(35064, 29)

In [17]:
weather_df.shape

(178396, 17)

In [18]:
shapes = pd.DataFrame({
    "Dataset": ["energy_df", "weather_df"],
    "Rows": [energy_df.shape[0], weather_df.shape[0]],
    "Columns": [energy_df.shape[1], weather_df.shape[1]]
})

print(shapes.to_string(index=False))

   Dataset   Rows  Columns
 energy_df  35064       29
weather_df 178396       17


# **3.1. Energy Dataset**

### **3.1.1. Basic Information**

In [19]:
original_energy_df = energy_df.copy()

In [20]:
energy_df.head()

Unnamed: 0,time,generation_biomass,generation_fossil_brown_coal/lignite,generation_fossil_coal-derived_gas,generation_fossil_gas,generation_fossil_hard_coal,generation_fossil_oil,generation_fossil_oil_shale,generation_fossil_peat,generation_geothermal,...,generation_waste,generation_wind_offshore,generation_wind_onshore,forecast_solar_day_ahead,forecast_wind_offshore_eday_ahead,forecast_wind_onshore_day_ahead,total_load_forecast,total_load_actual,price_day_ahead,price_actual
0,2015-01-01 00:00:00+01:00,447.0,329.0,0.0,4844.0,4821.0,162.0,0.0,0.0,0.0,...,196.0,0.0,6378.0,17.0,,6436.0,26118.0,25385.0,50.1,65.41
1,2015-01-01 01:00:00+01:00,449.0,328.0,0.0,5196.0,4755.0,158.0,0.0,0.0,0.0,...,195.0,0.0,5890.0,16.0,,5856.0,24934.0,24382.0,48.1,64.92
2,2015-01-01 02:00:00+01:00,448.0,323.0,0.0,4857.0,4581.0,157.0,0.0,0.0,0.0,...,196.0,0.0,5461.0,8.0,,5454.0,23515.0,22734.0,47.33,64.48
3,2015-01-01 03:00:00+01:00,438.0,254.0,0.0,4314.0,4131.0,160.0,0.0,0.0,0.0,...,191.0,0.0,5238.0,2.0,,5151.0,22642.0,21286.0,42.27,59.32
4,2015-01-01 04:00:00+01:00,428.0,187.0,0.0,4130.0,3840.0,156.0,0.0,0.0,0.0,...,189.0,0.0,4935.0,9.0,,4861.0,21785.0,20264.0,38.41,56.04


In [21]:
energy_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35064 entries, 0 to 35063
Data columns (total 29 columns):
 #   Column                                       Non-Null Count  Dtype  
---  ------                                       --------------  -----  
 0   time                                         35064 non-null  object 
 1   generation_biomass                           35045 non-null  float64
 2   generation_fossil_brown_coal/lignite         35046 non-null  float64
 3   generation_fossil_coal-derived_gas           35046 non-null  float64
 4   generation_fossil_gas                        35046 non-null  float64
 5   generation_fossil_hard_coal                  35046 non-null  float64
 6   generation_fossil_oil                        35045 non-null  float64
 7   generation_fossil_oil_shale                  35046 non-null  float64
 8   generation_fossil_peat                       35046 non-null  float64
 9   generation_geothermal                        35046 non-null  float64
 10

All of the data types make sense except `time`, which should be converted to a datetime type and set as the index.

In [22]:
# Convert to datetime in UTC, then to Spain local time
energy_df['datetime'] = pd.to_datetime(energy_df['time'], utc=True).dt.tz_convert('Europe/Madrid')
energy_df.drop('time', axis=1, inplace=True)
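
The reason for parsing with `utc=True` first is that the raw strings carry different UTC offsets across the year (`+01:00` in winter, `+02:00` in summer). A small sketch with two made-up timestamps around the March 2015 clock change shows the effect:

```python
import pandas as pd

raw = pd.Series([
    "2015-03-29 01:00:00+01:00",  # last hour on winter time (CET)
    "2015-03-29 03:00:00+02:00",  # following hour, now on summer time (CEST)
])

# Parsing with utc=True puts the mixed offsets on one common timeline.
as_utc = pd.to_datetime(raw, utc=True)

# Converting back gives timezone-aware local times for Spain.
local = as_utc.dt.tz_convert("Europe/Madrid")

print(as_utc.diff().iloc[1])   # elapsed time between the two rows
print(local.dt.hour.tolist())  # local wall-clock hours
```

The two rows are exactly one hour apart even though the local clock jumps from 01:00 to 03:00, which is why hour-of-day features should be derived from the converted local time while durations are best computed on the UTC timeline.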

If all timestamps are unique, we can set the `datetime` column as the index.

In [24]:
energy_df["datetime"].nunique()

35064

In [25]:
unique_values(energy_df)

Unnamed: 0,Column,UniqueValues,TotalValues,UniquePercent
0,datetime,35064,35064,100.0
1,total_load_actual,15127,35064,43.14
2,total_load_forecast,14790,35064,42.18
3,generation_wind_onshore,11465,35064,32.7
4,forecast_wind_onshore_day_ahead,11332,35064,32.32
5,generation_fossil_gas,8297,35064,23.66
6,generation_fossil_hard_coal,7266,35064,20.72
7,generation_hydro_water_reservoir,7029,35064,20.05
8,price_actual,6653,35064,18.97
9,price_day_ahead,5747,35064,16.39
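
The `unique_values` helper comes from the project's `src` package and its code is not shown. Judging only by the output columns above, a hypothetical equivalent could be written as follows; the function name and the sort order are assumptions.

```python
import pandas as pd

def unique_values_sketch(df):
    """Hypothetical stand-in: unique-value count and share for each column."""
    summary = pd.DataFrame({
        "Column": list(df.columns),
        "UniqueValues": [df[col].nunique() for col in df.columns],
        "TotalValues": len(df),
    })
    summary["UniquePercent"] = (
        summary["UniqueValues"] / summary["TotalValues"] * 100
    ).round(2)
    # Most varied columns first, mirroring the table above.
    return summary.sort_values("UniqueValues", ascending=False).reset_index(drop=True)

demo = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 1, 2, 2], "c": [0, 0, 0, 0]})
result = unique_values_sketch(demo)
print(result)
```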


In [26]:
energy_df.set_index("datetime",inplace=True)

Sort the rows chronologically by the datetime index.

In [27]:
energy_df.sort_index(inplace=True)
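
Once the index is set and sorted, it can be worth checking that it is unique, increasing, and free of gaps, since lag features silently misalign when hours are missing. A minimal sketch on a made-up six-hour frame with one hour deliberately removed:

```python
import pandas as pd

idx = pd.date_range("2015-01-01", periods=6, freq="h", tz="Europe/Madrid")
demo = pd.DataFrame({"price_actual": range(6)}, index=idx)
demo = demo.drop(idx[3])  # simulate one missing hour

is_unique = demo.index.is_unique
is_sorted = demo.index.is_monotonic_increasing

# Build the full hourly range and list the timestamps that are absent.
expected = pd.date_range(demo.index.min(), demo.index.max(), freq="h")
missing_hours = expected.difference(demo.index)

print(is_unique, is_sorted)
print(missing_hours)
```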

In [28]:
energy_df.head()

datetime,generation_biomass,generation_fossil_brown_coal/lignite,generation_fossil_coal-derived_gas,generation_fossil_gas,generation_fossil_hard_coal,generation_fossil_oil,generation_fossil_oil_shale,generation_fossil_peat,generation_geothermal,generation_hydro_pumped_storage_aggregated,...,generation_waste,generation_wind_offshore,generation_wind_onshore,forecast_solar_day_ahead,forecast_wind_offshore_eday_ahead,forecast_wind_onshore_day_ahead,total_load_forecast,total_load_actual,price_day_ahead,price_actual
2015-01-01 00:00:00+01:00,447.0,329.0,0.0,4844.0,4821.0,162.0,0.0,0.0,0.0,,...,196.0,0.0,6378.0,17.0,,6436.0,26118.0,25385.0,50.1,65.41
2015-01-01 01:00:00+01:00,449.0,328.0,0.0,5196.0,4755.0,158.0,0.0,0.0,0.0,,...,195.0,0.0,5890.0,16.0,,5856.0,24934.0,24382.0,48.1,64.92
2015-01-01 02:00:00+01:00,448.0,323.0,0.0,4857.0,4581.0,157.0,0.0,0.0,0.0,,...,196.0,0.0,5461.0,8.0,,5454.0,23515.0,22734.0,47.33,64.48
2015-01-01 03:00:00+01:00,438.0,254.0,0.0,4314.0,4131.0,160.0,0.0,0.0,0.0,,...,191.0,0.0,5238.0,2.0,,5151.0,22642.0,21286.0,42.27,59.32
2015-01-01 04:00:00+01:00,428.0,187.0,0.0,4130.0,3840.0,156.0,0.0,0.0,0.0,,...,189.0,0.0,4935.0,9.0,,4861.0,21785.0,20264.0,38.41,56.04


In [29]:
energy_df.tail()

datetime,generation_biomass,generation_fossil_brown_coal/lignite,generation_fossil_coal-derived_gas,generation_fossil_gas,generation_fossil_hard_coal,generation_fossil_oil,generation_fossil_oil_shale,generation_fossil_peat,generation_geothermal,generation_hydro_pumped_storage_aggregated,...,generation_waste,generation_wind_offshore,generation_wind_onshore,forecast_solar_day_ahead,forecast_wind_offshore_eday_ahead,forecast_wind_onshore_day_ahead,total_load_forecast,total_load_actual,price_day_ahead,price_actual
2018-12-31 19:00:00+01:00,297.0,0.0,0.0,7634.0,2628.0,178.0,0.0,0.0,0.0,,...,277.0,0.0,3113.0,96.0,,3253.0,30619.0,30653.0,68.85,77.02
2018-12-31 20:00:00+01:00,296.0,0.0,0.0,7241.0,2566.0,174.0,0.0,0.0,0.0,,...,280.0,0.0,3288.0,51.0,,3353.0,29932.0,29735.0,68.4,76.16
2018-12-31 21:00:00+01:00,292.0,0.0,0.0,7025.0,2422.0,168.0,0.0,0.0,0.0,,...,286.0,0.0,3503.0,36.0,,3404.0,27903.0,28071.0,66.88,74.3
2018-12-31 22:00:00+01:00,293.0,0.0,0.0,6562.0,2293.0,163.0,0.0,0.0,0.0,,...,287.0,0.0,3586.0,29.0,,3273.0,25450.0,25801.0,63.93,69.89
2018-12-31 23:00:00+01:00,290.0,0.0,0.0,6926.0,2166.0,163.0,0.0,0.0,0.0,,...,287.0,0.0,3651.0,26.0,,3117.0,24424.0,24455.0,64.27,69.88


In [None]:
energy_df.describe()

### **3.1.2. Missing Values Information**

In [None]:
energy_df.isna().sum()

In [None]:
na_counts_energy_df = energy_df.isna().sum().reset_index()
na_counts_energy_df.columns = ["Feature", "Missing Values"]

In [None]:
na_counts_energy_df

We can drop the columns `generation_hydro_pumped_storage_aggregated` and `forecast_wind_offshore_eday_ahead`, which are empty in every row displayed above.

In [None]:
energy_df.drop([
    "forecast_wind_offshore_eday_ahead",
    "generation_hydro_pumped_storage_aggregated"
], axis=1, inplace=True)
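
Instead of naming the columns by hand, the same decision can be driven by the share of missing values per column. The sketch below uses a made-up frame, and the 50% threshold is an arbitrary assumption rather than a value taken from this analysis.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "full": [1.0, 2.0, 3.0, 4.0],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "empty": [np.nan] * 4,
})

MAX_MISSING_FRACTION = 0.5  # assumed cut-off, tune per project

# isna().mean() gives the fraction of missing values in each column.
missing_fraction = demo.isna().mean()
to_drop = missing_fraction[missing_fraction > MAX_MISSING_FRACTION].index.tolist()

cleaned = demo.drop(columns=to_drop)
print(to_drop)
```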

In [None]:
get_missing_value_summary(energy_df)
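
`get_missing_value_summary` is another project helper whose code is not shown. A hypothetical version, assuming it reports the count and percentage of missing values for the columns that have any, could look like this:

```python
import numpy as np
import pandas as pd

def missing_value_summary_sketch(df):
    """Hypothetical stand-in: missing count and percent per affected column."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "Feature": counts.index,
        "MissingValues": counts.to_numpy(),
        "MissingPercent": (counts.to_numpy() / len(df) * 100).round(2),
    })
    # Keep only columns that actually contain missing values.
    summary = summary[summary["MissingValues"] > 0]
    return summary.sort_values("MissingValues", ascending=False).reset_index(drop=True)

# Made-up values for illustration.
demo = pd.DataFrame({
    "generation_biomass": [447.0, np.nan, 448.0, 438.0],
    "price_actual": [65.41, 64.92, 64.48, 59.32],
    "total_load_actual": [np.nan, np.nan, 22734.0, 21286.0],
})
summary = missing_value_summary_sketch(demo)
print(summary)
```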

In [None]:
missing_values_barchart(energy_df,"Energy")

### **3.1.3. Unique Values Information**

In [None]:
cat_cols_energy, int_cols_energy, float_cols_energy = get_column_types(energy_df)
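
`get_column_types` is defined in the project's `src` package. Based only on the three names it returns here, a hypothetical equivalent built on `DataFrame.select_dtypes` might be:

```python
import numpy as np
import pandas as pd

def get_column_types_sketch(df):
    """Hypothetical stand-in: split columns into categorical, integer, float."""
    cat_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()
    int_cols = df.select_dtypes(include=[np.integer]).columns.tolist()
    float_cols = df.select_dtypes(include=[np.floating]).columns.tolist()
    return cat_cols, int_cols, float_cols

# Made-up frame with one column of each kind.
demo = pd.DataFrame({
    "city": pd.Series(["Valencia", "Madrid"], dtype="object"),
    "count": [1, 2],
    "temp": [270.5, 268.1],
})
cat_cols, int_cols, float_cols = get_column_types_sketch(demo)
print(cat_cols, int_cols, float_cols)
```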

In [None]:
unique_values(energy_df)

In [None]:
plot_number_of_unique_values(energy_df,float_cols_energy,"Energy Dataset")

In [None]:
energy_df[[
    "generation_fossil_oil_shale",
    "generation_fossil_coal-derived_gas",
    "generation_geothermal",
    "generation_fossil_peat",
    "generation_marine",
    "generation_wind_offshore"
]].head()

We can drop these six columns because each contains only a single value, 0, and therefore carries no information for the model.

In [None]:
energy_df.drop([
    "generation_fossil_oil_shale",
    "generation_fossil_coal-derived_gas",
    "generation_geothermal",
    "generation_fossil_peat",
    "generation_marine",
    "generation_wind_offshore"
],axis=1,inplace=True)
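
Constant columns can also be detected programmatically rather than listed by hand. A minimal sketch on a made-up frame, treating a column as constant when it has at most one distinct non-missing value:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "generation_marine": [0.0, 0.0, np.nan, 0.0],
    "generation_solar": [0.0, 50.0, 120.0, 30.0],
})

# nunique ignores NaN by default, so a column of zeros plus gaps counts as constant.
n_unique = demo.nunique(dropna=True)
constant_cols = n_unique[n_unique <= 1].index.tolist()

reduced = demo.drop(columns=constant_cols)
print(constant_cols)
```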

In [None]:
plot_number_of_unique_values(energy_df,energy_df.columns,"Energy Dataset")

In [None]:
energy_df.head()

# **3.2. Weather Dataset**

In [None]:
original_weather_df = weather_df.copy()