# GridForecast: Electricity Demand Predictor

GridForecast is an end-to-end electricity demand forecasting system built on real-time ENTSO-E operational data, combining time series analytics (ARIMA, SARIMA, Prophet, LSTM), ensemble modeling, automated monitoring, and continuous retraining to enable data-driven grid operations and energy trading decisions.

The forecasting models build on temporal demand patterns and renewable energy variability, and are deployed through an interactive web application. Its dashboard lets grid operators and energy traders view 24-hour demand predictions, confidence intervals, and real-time model performance to support operational planning and risk mitigation.

## Dataset

The objective of the project is to predict electricity demand in Germany. The data is collected through the API of the ENTSO-E Transparency Platform. 
The ENTSO-E Transparency Platform gives access to electricity generation, transportation, and consumption data for the pan-European market. It covers countries including Austria, Belgium, Switzerland, Denmark, Germany, Spain, France, the UK, Italy, Ireland, Luxembourg, the Netherlands, Norway, Portugal, and Sweden. 

To get access to the API, register on the website: 
[https://transparency.entsoe.eu/](https://transparency.entsoe.eu/). 

For details on obtaining a security token, see 
[https://transparencyplatform.zendesk.com/hc/en-us/articles/12845911031188-How-to-get-security-token](https://transparencyplatform.zendesk.com/hc/en-us/articles/12845911031188-How-to-get-security-token). 

The request methods for fetching the data are described in detail at [https://transparencyplatform.zendesk.com/hc/en-us/articles/15696643163924-Request-Methods](https://transparencyplatform.zendesk.com/hc/en-us/articles/15696643163924-Request-Methods). 

Besides these raw request methods, the entsoe-py library provides a Python client for the API: [https://github.com/EnergieID/entsoe-py](https://github.com/EnergieID/entsoe-py). 

This project uses that library for convenience. 
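
As a rough illustration of the library route, the sketch below wraps a load query the way entsoe-py expects it: timezone-aware `pd.Timestamp` bounds passed to the client's `query_load` method. The helper names `query_window` and `fetch_load` and the example dates are assumptions made for illustration and are not part of the project. The client is passed in as an argument so the helper can be tried without network access.

```python
import pandas as pd


def query_window(start_date: str, end_date: str, tz: str = "Europe/Berlin"):
    """Build the timezone-aware (start, end) pair used for an ENTSO-E query."""
    start = pd.Timestamp(start_date, tz=tz)
    end = pd.Timestamp(end_date, tz=tz)
    if end <= start:
        raise ValueError("end_date must be later than start_date")
    return start, end


def fetch_load(client, zone: str, start_date: str, end_date: str):
    """Query actual load through any client that exposes query_load,
    for example entsoe.EntsoePandasClient."""
    start, end = query_window(start_date, end_date)
    return client.query_load(zone, start=start, end=end)


# Example usage (needs a valid security token and network access):
#   from entsoe import EntsoePandasClient
#   client = EntsoePandasClient(api_key="<your token>")
#   load = fetch_load(client, "DE_50HZ", "20240101", "20240102")
```

Keeping the window construction in one place means every query uses the same time zone and the same validation of the date range.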

## Import libraries and setup

In [6]:
import os
from pathlib import Path
from dotenv import load_dotenv
import pandas as pd
from datetime import date, timedelta
import requests
from entsoe import EntsoePandasClient

## Loading the API Key 

In [7]:
load_dotenv()
API_KEY = os.getenv('ENTSOE_API_KEY')

if not API_KEY:
    print("ERROR: ENTSOE_API_KEY not found")
else:
    print("API Key loaded successfully")


API Key loaded successfully
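
A missing key tends to surface only later, as a failed API request. One possible way to fail fast is sketched below. `require_env` is a hypothetical helper, not something the project defines, and it assumes `load_dotenv()` has already been called so that values from `.env` are present in the environment.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Example usage:
#   API_KEY = require_env("ENTSOE_API_KEY")
```

Raising here stops the notebook at the cell that caused the problem instead of letting later download cells run with `API_KEY` set to `None`.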


## Fetching data

In [None]:
class GermanyElectricityDownloader:
    def __init__(self, api_key):
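        """
        Initialize with an ENTSO-E API key.

        Zones used:
        - DE_50HZ for load and generation data (has complete coverage)
        - DE_LU for day-ahead prices (only zone with price data)

        Note: DE_50HZ is the 50Hertz control area, one of Germany's four
        TSO zones, so its load is a regional subset of German demand.
        The DE_LU bidding zone exists only since October 2018, so price
        queries for earlier dates are expected to return no data.
        """
        self.client = EntsoePandasClient(api_key=api_key)

        self.load_gen_zone = 'DE_50HZ'      # For load and generation data
        self.price_zone = 'DE_LU'            # For day-ahead prices
        self.time_zone = 'Europe/Berlin'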

    
    def download_load_data(self, start_date, end_date):
        """Download actual electricity load (consumption) in MW"""
        print(f"Downloading load data ({self.load_gen_zone}) for {start_date} to {end_date}")
        
        try:
            start = pd.Timestamp(start_date, tz=self.time_zone)
            end = pd.Timestamp(end_date, tz=self.time_zone)
            
            load_data = self.client.query_load(self.load_gen_zone, start=start, end=end)
            
            print(f"Downloaded {len(load_data)} load records")
            return load_data
        except Exception as e:
            print(f"Error downloading load data: {e}")
            return None
    
    def download_load_forecast(self, start_date, end_date):
        """Download day-ahead load forecast in MW"""
        print(f"Downloading load forecast ({self.load_gen_zone}) for {start_date} to {end_date}")
        
        try:
            start = pd.Timestamp(start_date, tz=self.time_zone)
            end = pd.Timestamp(end_date, tz=self.time_zone)
            
            forecast_data = self.client.query_load_forecast(self.load_gen_zone, start=start, end=end)
            print(f"Downloaded {len(forecast_data)} forecast records")
            return forecast_data
        except Exception as e:
            print(f"Load forecast not available: {type(e).__name__}")
            return None
    
    def download_generation_data(self, start_date, end_date):
        """Download actual generation by source type"""
        print(f"Downloading generation data ({self.load_gen_zone}) for {start_date} to {end_date}")
        
        try:
            start = pd.Timestamp(start_date, tz=self.time_zone)
            end = pd.Timestamp(end_date, tz=self.time_zone)
            
            gen_data = self.client.query_generation(
                self.load_gen_zone, 
                start=start, 
                end=end, 
                psr_type=None
            )

            print(f"Downloaded {len(gen_data)} generation records")
            return gen_data
        except Exception as e:
            print(f"Generation data not available: {type(e).__name__}")
            return None
    
    def download_wind_solar_forecast(self, start_date, end_date):
        """Download wind and solar generation forecast"""
        print(f"Downloading wind/solar forecast ({self.load_gen_zone}) for {start_date} to {end_date}")
        
        try:
            start = pd.Timestamp(start_date, tz=self.time_zone)
            end = pd.Timestamp(end_date, tz=self.time_zone)
            
            ws_forecast = self.client.query_wind_and_solar_forecast(
                self.load_gen_zone,
                start=start,
                end=end
            )

            print(f"Downloaded {len(ws_forecast)} wind/solar forecast records")
            return ws_forecast
        except Exception as e:
            print(f"Wind/solar forecast not available: {type(e).__name__}")
            return None
    
    def download_day_ahead_prices(self, start_date, end_date):
        """Download day-ahead electricity prices in EUR/MWh"""
        print(f"Downloading day-ahead prices ({self.price_zone}) for {start_date} to {end_date}")
        
        try:
            start = pd.Timestamp(start_date, tz=self.time_zone)
            end = pd.Timestamp(end_date, tz=self.time_zone)
            
            prices = self.client.query_day_ahead_prices(
                self.price_zone,
                start=start,
                end=end
            )

            print(f"Downloaded {len(prices)} price records")
            return prices
        except Exception as e:
            print(f"Day-ahead prices not available: {type(e).__name__}")
            return None
    
    def download_generation_per_type(self, start_date, end_date):
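        """Download generation per production unit (plant).

        Note: despite the method name, query_generation_per_plant returns
        plant-level series; totals per production type (Wind, Solar,
        Nuclear, etc.) come from query_generation.
        """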
        print(f"Downloading generation per type ({self.load_gen_zone}) for {start_date} to {end_date}")
        
        try:
            start = pd.Timestamp(start_date, tz=self.time_zone)
            end = pd.Timestamp(end_date, tz=self.time_zone)
            
            gen_per_type = self.client.query_generation_per_plant(
                self.load_gen_zone,
                start=start,
                end=end
            )

            print(f"Downloaded {len(gen_per_type)} generation-per-type records")
            return gen_per_type
        except Exception as e:
            print(f"Generation per type not available: {type(e).__name__}")
            return None
    
    def download_crossborder_flows(self, start_date, end_date):
        """Download cross-border flows to neighboring countries"""
        print(f"Downloading cross-border flows for {start_date} to {end_date}")
        
        try:
            start = pd.Timestamp(start_date, tz=self.time_zone)
            end = pd.Timestamp(end_date, tz=self.time_zone)
            
            # Germany's neighbors 
            neighbors = ['FR', 'NL', 'BE', 'DK_1', 'PL', 'CZ', 'AT', 'CH']
            flows_data = {}
            
            for neighbor in neighbors:
                try:
                    flows = self.client.query_crossborder_flows(
                        self.load_gen_zone, 
                        neighbor, 
                        start=start, 
                        end=end
                    )

                    flows_data[neighbor] = flows
                    print(f"  Downloaded cross-border flows to {neighbor}: {len(flows)} records")
                except Exception as e:
                    print(f"  Could not download flows to {neighbor}: {type(e).__name__}")
            
            return flows_data
        except Exception as e:
            print(f"Error with cross-border flows: {e}")
            return None
    
    
    def download_all(self, start_date, end_date):
        """Download complete dataset for demand prediction"""

        print("Downloading the data")

        
        # Download core data
        load = self.download_load_data(start_date, end_date)
        forecast = self.download_load_forecast(start_date, end_date)
        generation = self.download_generation_data(start_date, end_date)
        gen_per_type = self.download_generation_per_type(start_date, end_date)
        wind_solar = self.download_wind_solar_forecast(start_date, end_date)
        prices = self.download_day_ahead_prices(start_date, end_date)
        flows = self.download_crossborder_flows(start_date, end_date)
          
        return {
            'load': load,
            'load_forecast': forecast,
            'generation': generation,
            'generation_per_type': gen_per_type,
            'wind_solar_forecast': wind_solar,
            'day_ahead_prices': prices,
            'crossborder_flows': flows
        }


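    def create_combined_dataset(self, start_date, end_date):
        """Download all datasets and combine them into one DataFrame aligned on the load index.

        Plant-level generation is downloaded but not merged into the result.
        """

        print("Creating combined dataset...")

        data_dict = self.download_all(start_date, end_date)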

        load = data_dict.get('load')
        forecast = data_dict.get('load_forecast')
        generation = data_dict.get('generation')
        wind_solar = data_dict.get('wind_solar_forecast')
        prices = data_dict.get('day_ahead_prices')
        flows = data_dict.get('crossborder_flows')
        
        # Check if load exists before proceeding
        if load is None or load.empty:
            print("No load data found!")
            return None

        
        # Start with load as base 
        combined = pd.DataFrame({'load': load.squeeze()})
        
        # Add other time-series 
        if forecast is not None:
            combined['load_forecast'] = forecast.squeeze()
        
        if prices is not None:
            combined['price'] = prices
        
        # Add generation columns 
        if generation is not None:
            for col in generation.columns:
                combined[f'generation_{col}'] = generation[col]
        
        # Add wind/solar columns
        if wind_solar is not None:
            for col in wind_solar.columns:
                combined[f'forecast_{col}'] = wind_solar[col]
        
        # Add cross-border flows, one column per neighbor
        if flows is not None:
            for neighbor, flow_data in flows.items():
                combined[f'flow_to_{neighbor}'] = flow_data.squeeze()

        return combined
    


def main():
    downloader = GermanyElectricityDownloader(api_key=API_KEY)

    project_root = Path.cwd().parent
    output_dir = project_root / 'artifacts/raw'
    os.makedirs(output_dir, exist_ok=True)

    csv_filename = 'power_consumption_germany.csv'
    csv_file_path_str = str(output_dir / csv_filename)

    if os.path.isfile(csv_file_path_str):
        # File exists — check the last date in the CSV
        existing_df = pd.read_csv(csv_file_path_str, index_col=0, parse_dates=True)
        existing_df.index = pd.to_datetime(existing_df.index, utc=True).tz_convert('Europe/Berlin')

        last_date = existing_df.index.max().date()
        today = date.today()
        yesterday = today - timedelta(days=1)

        if last_date >= yesterday:
            print(f"CSV is already up to date (last record: {last_date}). Nothing to download.")
            return

        # Data is stale — download missing days from day after last record to yesterday
        fetch_start = last_date + timedelta(days=1)
        fetch_end = yesterday + timedelta(days=1)  # ENTSO-E end is exclusive

        print(f"CSV exists but is outdated. Fetching missing data: {fetch_start} to {yesterday}")

        new_data = downloader.create_combined_dataset(
            start_date=fetch_start.strftime('%Y%m%d'),
            end_date=fetch_end.strftime('%Y%m%d')
        )

        if new_data is not None and not new_data.empty:
            updated_df = pd.concat([existing_df, new_data])
            updated_df = updated_df[~updated_df.index.duplicated(keep='last')]  # safety dedup
            updated_df.to_csv(csv_file_path_str)
            print(f"Updated CSV with {len(new_data)} new rows. Total rows: {len(updated_df)}")
        else:
            print("No new data was returned from the API.")

    else:
        # File does not exist — do the full historical backfill
        start = pd.Timestamp('20150101', tz='Europe/Berlin')
        end = pd.Timestamp(date.today().strftime('%Y%m%d'), tz='Europe/Berlin')

        batch = 0
        all_data = []

        while start < end:
            batch += 1
            # Clamp the final batch so it never extends past `end`
            batch_end = min(start + timedelta(days=30), end)
            print(f"Downloading batch {batch}: {start.date()} to {batch_end.date()}")

            data = downloader.create_combined_dataset(
                start_date=start.strftime('%Y%m%d'),
                end_date=batch_end.strftime('%Y%m%d')
            )

            if data is not None and not data.empty:
                all_data.append(data)

            start = batch_end

        if all_data:
            final_df = pd.concat(all_data)
            final_df = final_df[~final_df.index.duplicated(keep='last')]
            final_df.to_csv(csv_file_path_str)
            print(f"\nSuccess! Saved {len(final_df)} rows to {csv_file_path_str}")
        else:
            print("No data was downloaded.")


if __name__ == "__main__":
    main()

Downloading batch 1: 2015-01-01 to 2015-01-31
Creating combined dataset...
Downloading the data
Downloading load data (DE_50HZ) for 20150101 to 20150131
Downloaded 2880 load records
Downloading load forecast (DE_50HZ) for 20150101 to 20150131
Downloaded 2880 forecast records
Downloading generation data (DE_50HZ) for 20150101 to 20150131
Downloaded 2880 generation records
Downloading generation per type (DE_50HZ) for 20150101 to 20150131
Downloaded 624 generation-per-type records
Downloading wind/solar forecast (DE_50HZ) for 20150101 to 20150131
Downloaded 2880 wind/solar forecast records
Downloading day-ahead prices (DE_LU) for 20150101 to 20150131
Day-ahead prices not available: NoMatchingDataError
Downloading cross-border flows for 20150101 to 20150131
  Could not download flows to FR: NoMatchingDataError
  Could not download flows to NL: NoMatchingDataError
  Could not download flows to BE: NoMatchingDataError
  Could not download flows to DK_1: NoMatchingDataError
  Downloaded cros