# GridForecast: Electricity Demand Predictor

GridForecast is an end-to-end electricity demand forecasting system built on real-time ENTSO-E operational data, combining time series analytics (ARIMA, SARIMA, Prophet, LSTM), ensemble modeling, automated monitoring, and continuous retraining to enable data-driven grid operations and energy trading decisions.

The forecasting models are informed by temporal demand patterns and renewable energy variability, and they are deployed via an interactive web application. Its dashboard lets grid operators and energy traders visualize 24-hour demand predictions, confidence intervals, and real-time model performance to support operational planning and risk mitigation.

## Dataset

The objective of the project is to predict electricity demand in Germany. The data is collected through the API of the ENTSO-E Transparency Platform. 
The ENTSO-E Transparency Platform gives access to electricity generation, transportation, and consumption data for the pan-European market. It covers countries such as Austria, Belgium, Switzerland, Denmark, Germany, Spain, France, UK, Italy, Ireland, Luxembourg, the Netherlands, Norway, Portugal, and Sweden. 

To get access to the API, register on the website: 
[https://transparency.entsoe.eu/](https://transparency.entsoe.eu/). 

For complete details on obtaining the security token, see: 
[https://transparencyplatform.zendesk.com/hc/en-us/articles/12845911031188-How-to-get-security-token](https://transparencyplatform.zendesk.com/hc/en-us/articles/12845911031188-How-to-get-security-token). 

The request methods for fetching the dataset are described in detail at: [https://transparencyplatform.zendesk.com/hc/en-us/articles/15696643163924-Request-Methods](https://transparencyplatform.zendesk.com/hc/en-us/articles/15696643163924-Request-Methods). 


Apart from these direct request methods, a Python library is available for fetching the data through the API: [https://github.com/EnergieID/entsoe-py](https://github.com/EnergieID/entsoe-py). 

We use this library for convenience. 

## Import libraries and setup

In [1]:
import os
from pathlib import Path
from dotenv import load_dotenv
import pandas as pd
from datetime import date, timedelta
import time 
import requests
from entsoe import EntsoePandasClient
from dateutil.relativedelta import relativedelta

## Loading the API Key 

In [2]:
# Load the API key from the .env file

load_dotenv()
API_KEY = os.getenv('ENTSOE_API_KEY')

if not API_KEY:
    print("ERROR: ENTSOE_API_KEY not found")
else:
    print("API Key loaded successfully")


API Key loaded successfully
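
The cell above only prints a message when the key is missing, so a later cell would fail with a less obvious error. A minimal sketch of a fail-fast variant is shown here. The helper name `require_api_key` is hypothetical and not part of the original notebook, and it assumes `load_dotenv()` has already populated the process environment.

```python
import os


def require_api_key(var_name="ENTSOE_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    api_key = os.getenv(var_name, "").strip()
    if not api_key:
        # Stop immediately instead of continuing with an unusable client
        raise RuntimeError(f"{var_name} not found; add it to your .env file")
    return api_key
```

Calling `API_KEY = require_api_key()` right after `load_dotenv()` would stop the notebook at this cell rather than at the first API request.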


## Fetching data
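
The downloader class in this section queries one calendar year at a time and switches the bidding-zone code from `DE_AT_LU` (up to 2018) to `DE_LU` (2019 onwards). The windowing logic can be sketched on its own as follows. This is a simplified illustration that uses plain `date` objects instead of timezone-aware timestamps; the function names are hypothetical, and the guard that skips empty windows is an addition that the class itself does not have.

```python
from datetime import date


def zone_for_year(year):
    """Zone code as used in this notebook: DE_AT_LU up to 2018, DE_LU afterwards."""
    return "DE_AT_LU" if year <= 2018 else "DE_LU"


def yearly_windows(start, end):
    """Split [start, end) into per-calendar-year windows tagged with a zone code."""
    windows = []
    for year in range(start.year, end.year + 1):
        # Clip each calendar year to the requested range
        window_start = max(date(year, 1, 1), start)
        window_end = min(date(year + 1, 1, 1), end)
        if window_start < window_end:  # assumption: skip empty windows
            windows.append((zone_for_year(year), window_start, window_end))
    return windows


for zone, window_start, window_end in yearly_windows(date(2018, 11, 1), date(2019, 3, 1)):
    print(zone, window_start, window_end)
# DE_AT_LU 2018-11-01 2019-01-01
# DE_LU 2019-01-01 2019-03-01
```

Splitting at the year boundary means that a request spanning the 2018/2019 change is never sent with a single zone code.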

In [3]:
# Class for downloading the dataset from the ENTSO-E Transparency Platform via its API.

class GermanyElectricityDownloader:
    def __init__(self, api_key):
        """
        Initialize with ENTSO-E API key 
        """
        self.client = EntsoePandasClient(api_key=api_key)        
        self.time_zone = 'Europe/Berlin'
        self.country_code_old = 'DE_AT_LU'  # 2015 - 2018
        self.country_code_new = 'DE_LU'     # 2019 onwards

    def _get_country_code(self, year):
        """Return the appropriate country code based on the year."""
    
        return self.country_code_old if year <= 2018 else self.country_code_new

    def _fetch_with_retry(self, name, func, retries=3, delay=5):
        """Fetch a single query with retry logic."""
        for attempt in range(1, retries + 1):
            # Retry so that temporary network/API failures do not abort the download
            try:
                print(f"Fetching: {name} (attempt {attempt})")
                df = func()
                time.sleep(1)  # short pause after each successful request
                return df
            except Exception as e:
                if attempt == retries:
                    print(f"Failed to fetch '{name}' after {retries} attempts: {e}")
                    return None
                print(f"Attempt {attempt} failed for '{name}': {e}. Retrying in {delay}s...")
                time.sleep(delay)
        return None

    
    def download_all(self, start_date, end_date):
        """Download complete dataset for demand prediction
        
        Args:
            start_date: Start date of the period to fetch.
            end_date: End date of the period to fetch.
        Returns:
            A single DataFrame built by concatenating all fetched DataFrames,
            or None if nothing could be fetched.
        """
        

        print(f"Downloading complete dataset for demand prediction from {start_date} to {end_date}")
        
        try:
            start = pd.Timestamp(start_date, tz=self.time_zone)
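            # Note: this sets end to 00:30 on end_date, so only the first half hour
            # of end_date itself falls inside the query window.
            end = pd.Timestamp(end_date, tz=self.time_zone) + pd.Timedelta(minutes=30)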

            start_year = start.year
            end_year = end.year

            all_year_dfs = []

            for year in range(start_year, end_year + 1):
                country_code = self._get_country_code(year)
                year_start = max(
                        pd.Timestamp(f'{year}-01-01', tz=self.time_zone),
                        start
                    )
                year_end = min(
                        pd.Timestamp(f'{year+1}-01-01', tz=self.time_zone),
                        end
                    )

                # Code from https://github.com/EnergieID/entsoe-py library for downloading the data via API
                time_series_queries = {
                    "load": lambda: self.client.query_load(
                        country_code, start=year_start, end=year_end),

                    "wind_and_solar_forecast": lambda: self.client.query_wind_and_solar_forecast(
                        country_code, start=year_start, end=year_end, psr_type=None),

                    "intraday_wind_and_solar_forecast": lambda: self.client.query_intraday_wind_and_solar_forecast(
                        country_code, start=year_start, end=year_end, psr_type=None),

                    "generation": lambda: self.client.query_generation(
                        country_code, start=year_start, end=year_end, psr_type=None),

                    "import": lambda: self.client.query_import(
                        country_code, start=year_start, end=year_end),

                    "generation_import": lambda: self.client.query_generation_import(
                        country_code, year_start, year_end),
                }


                dfs = []

                for name, func in time_series_queries.items():
                    df = self._fetch_with_retry(name, func)

                    if df is None:
                        continue

                    df = df.sort_index()
                    if df.index.tz is None:
                        df.index = df.index.tz_localize("UTC").tz_convert(self.time_zone)
                    else:
                        df.index = df.index.tz_convert(self.time_zone)

                    if isinstance(df.columns, pd.MultiIndex):
                        df.columns = ["_".join(col).strip() for col in df.columns]

                    # Adding prefix so we get separate column names 
                    df = df.add_prefix(f"{name}_")

                    dfs.append(df)

                if dfs:
                    year_df = pd.concat(dfs, axis=1, sort=False)
                    all_year_dfs.append(year_df)

            if not all_year_dfs:
                return None

            # Combining all DataFrame to one DataFrame 
            final_df = pd.concat(all_year_dfs)
            final_df = final_df[~final_df.index.duplicated(keep='last')]



            print(f"Data Frame created with shape : {final_df.shape}")
            return final_df
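        # requests' ConnectionError does not derive from the built-in one, so catch both
        except (ConnectionError, requests.exceptions.ConnectionError) as e: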
            print(f"Network error: {e}")
            return None

        except pd.errors.EmptyDataError as e:
            print(f"No data returned from API: {e}")
            return None

        except Exception as e:
            print(f"Unexpected error downloading data: {e}")
            raise

    def update_data(self):
        """Update the local CSV with any missing days if it exists; otherwise download the complete historical data."""

        
        # Creating the artifacts folder for saving the csv file. 
        project_root = Path.cwd().parent
        output_dir = project_root / 'artifacts/raw'
        os.makedirs(output_dir, exist_ok=True)

        # Defining the csv name 
        csv_filename = 'power_consumption_germany.csv'
        csv_file_path_str = str(output_dir / csv_filename)

        # Once the data has been downloaded, later updates should not fetch everything again.
        # So check for an existing CSV and download only the days that are still missing.
        if os.path.isfile(csv_file_path_str):
            # File exists: check the last date in the CSV
            existing_df = pd.read_csv(csv_file_path_str, index_col=0, parse_dates=True)
            existing_df.index = pd.to_datetime(existing_df.index, utc=True).tz_convert(self.time_zone)

            last_date = existing_df.index.max().date()
            today = date.today()
            yesterday = today - timedelta(days=1)

            if last_date >= yesterday:
                print(f"CSV is already up to date (last record: {last_date}). Nothing to download.")
                return

            # Download missing days from day after last record to yesterday
            fetch_start = last_date + timedelta(days=1)
            fetch_end = yesterday 

            print(f"CSV exists but is outdated. Fetching missing data: {fetch_start} to {yesterday}")

            new_data = self.download_all(
                start_date=fetch_start.strftime('%Y%m%d'),
                end_date=fetch_end.strftime('%Y%m%d')
            )
            # Append the new data to the existing data
            if new_data is not None and not new_data.empty:
                updated_df = pd.concat([existing_df, new_data])
                updated_df = updated_df[~updated_df.index.duplicated(keep='last')]  # safety dedup
                updated_df.to_csv(csv_file_path_str)
                print(f"Updated CSV with {len(new_data)} new rows. Total rows: {len(updated_df)}")
            else:
                print("No new data was returned from the API.")

        else:
            # File does not exist then do the full historical backfill
            start = pd.Timestamp('20150101', tz=self.time_zone)
            end = pd.Timestamp(date.today().strftime('%Y%m%d'), tz=self.time_zone)

            batch = 0
            all_data = []
            
            # Downloading the data in batches to avoid API failures 
            while start < end:
                batch += 1
                batch_end = min(start + relativedelta(years=1), end)
                print(f"Downloading batch {batch}: {start.date()} to {batch_end.date()-timedelta(days=1)}")
                
                data = self.download_all(
                    start_date=start.strftime('%Y%m%d'),
                    end_date=(batch_end - timedelta(days=1)).strftime('%Y%m%d')
                )
                
                if data is not None and not data.empty:
                    all_data.append(data)
                
                start = batch_end

            # Combine all batches into one final dataset
            if all_data:
                final_df = pd.concat(all_data)
                final_df = final_df[~final_df.index.duplicated(keep='last')]
                final_df.to_csv(csv_file_path_str)
                print(f"\nSuccess! Saved {len(final_df)} rows.")
            else:
                print("No data was downloaded.")



def main():
    downloader = GermanyElectricityDownloader(api_key=API_KEY)

    # update_data() writes the CSV and returns nothing, so there is no result to keep
    downloader.update_data()


if __name__ == "__main__":
    main()
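
The `_fetch_with_retry` method above waits a fixed `delay` between attempts. As an alternative, and not what the notebook itself does, the wait can grow with each failure. The following is a minimal sketch of exponential backoff; the function name `fetch_with_backoff` and its `base_delay` and `sleep` parameters are hypothetical, and `sleep` is injectable only so the demo does not actually wait.

```python
import time


def fetch_with_backoff(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**(attempt - 1) seconds and retry.

    Returns func()'s result, or None once all attempts have failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return func()
        except Exception as exc:  # broad on purpose, mirroring the method above
            if attempt == retries:
                print(f"Giving up after {retries} attempts: {exc}")
                return None
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed: {exc}. Retrying in {delay}s...")
            sleep(delay)
    return None


# Demo: a stand-in query that fails twice and then succeeds
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = fetch_with_backoff(flaky, sleep=lambda seconds: None)  # no real waiting
print(result)
```

Longer waits after repeated failures give a struggling API more time to recover, at the cost of a slower download when errors persist.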

Downloading batch 1: 2015-01-01 to 2016-01-01
Downloading complete dataset for demand prediction from 20150101 to 20151231
Fetching: load (attempt 1)
Fetching: wind_and_solar_forecast (attempt 1)
Fetching: intraday_wind_and_solar_forecast (attempt 1)
Fetching: generation (attempt 1)
Fetching: import (attempt 1)
Fetching: generation_import (attempt 1)
Data Frame created with shape : (34947, 81)
Downloading batch 2: 2016-01-01 to 2017-01-01
Downloading complete dataset for demand prediction from 20160101 to 20161231
Fetching: load (attempt 1)
Fetching: wind_and_solar_forecast (attempt 1)
Fetching: intraday_wind_and_solar_forecast (attempt 1)
Fetching: generation (attempt 1)
Fetching: import (attempt 1)
Fetching: generation_import (attempt 1)
Data Frame created with shape : (35043, 81)
Downloading batch 3: 2017-01-01 to 2018-01-01
Downloading complete dataset for demand prediction from 20170101 to 20171231
Fetching: load (attempt 1)
Fetching: wind_and_solar_forecast (attempt 1)
Fetching: 