# Hybrid Renewable Energy Forecasting and Trading Competition
Author: George Panagiotou

# Hornsea 1 Wind Farm Data Processing

## Overview

This notebook reads the availability messages published for the Hornsea 1 wind farm through REMIT (https://bmrs.elexon.co.uk/remit).

The Hornsea 1 wind farm comprises three balancing mechanism units: T_HOWAO-1, T_HOWAO-2, and T_HOWAO-3.

The messages are provided in JSON format. The challenge is to flatten them into a tabular DataFrame and combine them with the energy data provided by the competition.

## REMIT Message Format

Every REMIT message has the following format (example):

```json
[
  {
    "id": 64337,
    "dataset": "REMIT",
    "mrid": "11XDONG-PT-----2-NGET-RMT-00001018",
    "revisionNumber": 2,
    "publishTime": "2020-11-26T13:53:00Z",
    "createdTime": "2020-11-26T13:53:00Z",
    "messageType": "UnavailabilitiesOfElectricityFacilities",
    "messageHeading": "REMIT Information",
    "eventType": "Production unavailability",
    "unavailabilityType": "Planned",
    "participantId": "DONG013",
    "registrationCode": "11XDONG-PT-----2",
    "assetId": "T_HOWAO-1",
    "assetType": "Production",
    "affectedUnit": "HOWAO-1",
    "affectedUnitEIC": "48W00000HOWAO-1M",
    "affectedArea": "B7",
    "biddingZone": "10YGB----------A",
    "fuelType": "Wind Offshore",
    "normalCapacity": 400,
    "availableCapacity": 0,
    "unavailableCapacity": 400,
    "eventStatus": "Active",
    "eventStartTime": "2020-11-26T09:00:00Z",
    "eventEndTime": "2020-11-26T18:00:00Z",
    "cause": "Planned Outage",
    "relatedInformation": "HOW01 Z11 Dry run interlink test",
    "outageProfile": [
      {
        "startTime": "2020-11-26T09:00:00Z",
        "endTime": "2020-11-26T18:00:00Z",
        "capacity": 0
      }
    ]
  }
]
```

## Key Information
The information that we care about includes:

- Publish time
- End time
- Available capacity
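
As a minimal sketch of how these fields can be pulled out of a message, `pandas.json_normalize` can flatten the `outageProfile` list while carrying message-level fields such as `publishTime` along as metadata. The sample below is a trimmed copy of the example message; the variable names `messages` and `profiles` are illustrative and not part of the notebook.

```python
import pandas as pd

# Trimmed copy of the example REMIT message shown earlier
messages = [
    {
        "id": 64337,
        "publishTime": "2020-11-26T13:53:00Z",
        "assetId": "T_HOWAO-1",
        "availableCapacity": 0,
        "outageProfile": [
            {
                "startTime": "2020-11-26T09:00:00Z",
                "endTime": "2020-11-26T18:00:00Z",
                "capacity": 0,
            }
        ],
    }
]

# One row per outage-profile segment, with message-level fields repeated
profiles = pd.json_normalize(
    messages,
    record_path="outageProfile",
    meta=["id", "assetId", "publishTime"],
)

# Parse the ISO 8601 strings into timezone-aware (UTC) timestamps
for col in ["startTime", "endTime", "publishTime"]:
    profiles[col] = pd.to_datetime(profiles[col], utc=True)

print(profiles)
```

Note that this message was published at 13:53 for an outage that started at 09:00 on the same day, which is why the publish time matters when building test features.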

## Objective
The main objective is to create three new training features (one for each balancing mechanism unit) and merge them with the energy data. We also filter the messages based on their publish time.

For the training set, we use all available information regardless of the publish time. For the test set, however, we may use only information that had already been published when the forecast was submitted. Some messages report unavailabilities that started before the publish time, which we could not have used in real life. Other messages describe events occurring later the same day, so if a message covering the day-ahead market period was published after the submission time (9:20), we cannot use it.

In [None]:
import pandas as pd
import json
import os

In [11]:
# Load the two CSV files
# The first set contains the energy data from 2020/09/20 to 2024/01/18
energy_data1 = pd.read_csv("data/HEFTdata/Energy_Data_20200920_20240118.csv")
# The second set contains the energy data from 2024/01/19 to 2024/05/19
energy_data2 = pd.read_csv("data/HEFTdata/Energy_Data_20240119_20240519.csv")
# Combine the DataFrames
energy_data = pd.concat([energy_data1, energy_data2])
energy_data.to_hdf('data/combined/energy_data_20200920_20240519.h5', key='df', mode='w')

## Combine all the Availability messages of T_HOWAO-1

The REMIT link above allows at most one year of data per offline download, so we have to combine four sets of data to align with the period covered by the HEFT competition data.

In [12]:
# List of JSON file paths
# These files were downloaded in advance
json_files = [
    'data/T_Hawao/T_Hawao-1/T_HAWAO-1-2020-09-20-2021-05-21.json',
    'data/T_Hawao/T_Hawao-1/T_HAWAO-1-2021-05-20-2022-05-21.json',
    'data/T_Hawao/T_Hawao-1/T_HAWAO-1-2022-05-20-2023-05-21.json',
    'data/T_Hawao/T_Hawao-1/T_HOWAO-1-2023-05-20-2024-05-20.json'
]

# Initialize an empty list to store combined data
combined_data = []

# Read and combine each JSON file
for file_path in json_files:
    with open(file_path, 'r') as file:
        data = json.load(file)
        combined_data.extend(data)

# Remove duplicates based on 'id' field
unique_data = {entry['id']: entry for entry in combined_data}
combined_unique_data = list(unique_data.values())

# Save the combined data into a new JSON file
output_file_path = 'data/T_Hawao/T_Hawao-1/combined_T_HAWAO-1.json'
with open(output_file_path, 'w') as output_file:
    json.dump(combined_unique_data, output_file, indent=4)

print(f"Combined JSON data saved to {output_file_path}")


Combined JSON data saved to data/T_Hawao/T_Hawao-1/combined_T_HAWAO-1.json


## Combine all the Availability messages of T_HOWAO-2

In [13]:
# List of JSON file paths
json_files = [
    'data/T_Hawao/T_Hawao-2/T_HAWAO-2-2020-09-20-2021-05-21.json',
    'data/T_Hawao/T_Hawao-2/T_HAWAO-2-2021-05-20-2022-05-21.json',
    'data/T_Hawao/T_Hawao-2/T_HAWAO-2-2022-05-20-2023-05-21.json',
    'data/T_Hawao/T_Hawao-2/T_HOWAO-2-2023-05-20-2024-05-20.json'
]

# Initialize an empty list to store combined data
combined_data = []

# Read and combine each JSON file
for file_path in json_files:
    with open(file_path, 'r') as file:
        data = json.load(file)
        combined_data.extend(data)

# Remove duplicates based on 'id' field
unique_data = {entry['id']: entry for entry in combined_data}
combined_unique_data = list(unique_data.values())

# Save the combined data into a new JSON file
output_file_path = 'data/T_Hawao/T_Hawao-2/combined_T_HAWAO-2.json'
with open(output_file_path, 'w') as output_file:
    json.dump(combined_unique_data, output_file, indent=4)

print(f"Combined JSON data saved to {output_file_path}")

Combined JSON data saved to data/T_Hawao/T_Hawao-2/combined_T_HAWAO-2.json


## Combine all the Availability messages of T_HOWAO-3

In [14]:
# List of JSON file paths
json_files = [
    'data/T_Hawao/T_Hawao-3/T_HAWAO-3-2020-09-20-2021-05-21.json',
    'data/T_Hawao/T_Hawao-3/T_HAWAO-3-2021-05-20-2022-05-21.json',
    'data/T_Hawao/T_Hawao-3/T_HAWAO-3-2022-05-20-2023-05-21.json',
    'data/T_Hawao/T_Hawao-3/T_HOWAO-3-2023-05-20-2024-05-20.json'
]

# Initialize an empty list to store combined data
combined_data = []

# Read and combine each JSON file
for file_path in json_files:
    with open(file_path, 'r') as file:
        data = json.load(file)
        combined_data.extend(data)

# Remove duplicates based on 'id' field
unique_data = {entry['id']: entry for entry in combined_data}
combined_unique_data = list(unique_data.values())

# Save the combined data into a new JSON file
output_file_path = 'data/T_Hawao/T_Hawao-3/combined_T_HAWAO-3.json'
with open(output_file_path, 'w') as output_file:
    json.dump(combined_unique_data, output_file, indent=4)

print(f"Combined JSON data saved to {output_file_path}")

Combined JSON data saved to data/T_Hawao/T_Hawao-3/combined_T_HAWAO-3.json
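
The three cells above repeat the same steps for each unit. As a sketch only, the shared logic could be wrapped in one helper; the function name `combine_remit_files` is hypothetical and not part of the notebook, and creating the output directory is an added convenience.

```python
import json
from pathlib import Path


def combine_remit_files(json_files, output_file_path):
    """Merge several REMIT JSON downloads and drop duplicate message ids."""
    combined = []
    for file_path in json_files:
        with open(file_path, "r") as f:
            combined.extend(json.load(f))

    # Keyed by 'id': a later entry overwrites an earlier one with the same id
    unique = {entry["id"]: entry for entry in combined}
    result = list(unique.values())

    # Make sure the target directory exists before writing
    Path(output_file_path).parent.mkdir(parents=True, exist_ok=True)
    with open(output_file_path, "w") as f:
        json.dump(result, f, indent=4)
    return result


# Example call with the paths used above (commented out: needs the files)
# combine_remit_files(
#     json_files,
#     'data/T_Hawao/T_Hawao-3/combined_T_HAWAO-3.json',
# )
```

Because the download windows overlap by a day, the same message can appear in two files; keying on `id` keeps a single copy of each.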


## Create new features from REMIT data
We create three new features, Availability1, Availability2, and Availability3, one per balancing mechanism unit. The model is then trained with this extra information.

These features help training by indicating when Hornsea 1 cannot produce at its full capacity.
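
The update rule can be illustrated on a toy example: every half-hourly timestamp that falls inside an outage segment takes that segment's capacity, and all others keep the normal capacity of 400. The six-row frame and the shortened one-hour outage below are made up for illustration; only the column names follow the notebook.

```python
import pandas as pd

# Toy half-hourly index standing in for the competition's 'dtm' column
energy = pd.DataFrame({
    "dtm": pd.date_range("2020-11-26 08:00", periods=6, freq="30min", tz="UTC")
})
energy["Availability1"] = 400  # normal capacity of one unit

# Hypothetical outage segment (one hour long, capacity 0)
start = pd.Timestamp("2020-11-26T09:00:00Z")
end = pd.Timestamp("2020-11-26T10:00:00Z")

# Half-open interval [start, end): the period starting at 'end' is untouched
mask = (energy["dtm"] >= start) & (energy["dtm"] < end)
energy.loc[mask, "Availability1"] = 0

print(energy)
```

Only the 09:00 and 09:30 rows are set to 0; the 10:00 row stays at 400 because the end time is exclusive.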

In [15]:
# Load the HDF5 file
energy_data = pd.read_hdf('data/combined/energy_data_20200920_20240519.h5', 'df')
energy_data["dtm"] = pd.to_datetime(energy_data["dtm"])

# Initialize the new features to the normal capacity, which is 400
energy_data['Availability1'] = 400
energy_data['Availability2'] = 400
energy_data['Availability3'] = 400

# List of JSON file paths and corresponding availability columns
json_files = {
    'data/T_Hawao/T_Hawao-1/combined_T_HAWAO-1.json': 'Availability1',
    'data/T_Hawao/T_Hawao-2/combined_T_HAWAO-2.json': 'Availability2',
    'data/T_Hawao/T_Hawao-3/combined_T_HAWAO-3.json': 'Availability3'
}

# Row-wise helper that returns the capacity of the outage profile covering a timestamp
# (not called below; the vectorised mask in the loop performs the actual update)
def update_availability(row, outage_profiles, availability_column):
    for profile in outage_profiles:
        start_time = pd.to_datetime(profile['startTime'])
        end_time = pd.to_datetime(profile['endTime'])
        if start_time <= row['dtm'] < end_time:
            return profile['capacity']
    return row[availability_column]

# Iterate through each JSON file and update the corresponding Availability feature
for json_file, availability_column in json_files.items():
    with open(json_file, 'r') as file:
        outage_data = json.load(file)
    
    # Create a DataFrame from the outage profiles
    outage_entries = []
    for entry in outage_data:
        for profile in entry['outageProfile']:
            outage_entries.append({
                'startTime': pd.to_datetime(profile['startTime']),
                'endTime': pd.to_datetime(profile['endTime']),
                'capacity': profile['capacity']
            })
    
    outage_df = pd.DataFrame(outage_entries)
    
    # Update the Availability feature for each outage profile
    for _, row in outage_df.iterrows():
        mask = (energy_data['dtm'] >= row['startTime']) & (energy_data['dtm'] < row['endTime'])
        energy_data.loc[mask, availability_column] = row['capacity']

# Save the updated data back to an HDF5 file
output_file_path = 'data/combined/train_energy_data_20200920_20240519.h5'
energy_data.to_hdf(output_file_path, key='df', mode='w')

print(f"Updated energy data saved to {output_file_path}")

Updated energy data saved to data/combined/train_energy_data_20200920_20240519.h5


## Test energy data

The test data are processed slightly differently, so that only information available at submission time is used:

- If the publish time is before 9 AM: apply changes for outage profiles that affect the period from 22:00 of the same day onwards.
- If the publish time is at or after 9 AM: apply changes for outage profiles that affect the period from 22:00 of the next day onwards.
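
The cutoff rule can be sketched as a small standalone function. The name `forecast_start_for` and its `cutoff` parameter are hypothetical; the logic mirrors the rule described here, assuming publish times are timezone-aware and the comparison is made in Europe/London local time.

```python
import datetime

import pandas as pd


def forecast_start_for(publish_time, cutoff=datetime.time(9, 0)):
    """Earliest timestamp a message may affect, given its publish time."""
    local = publish_time.tz_convert("Europe/London")
    # Before the cutoff: same day at 22:00; otherwise: next day at 22:00
    days_ahead = 0 if local.time() < cutoff else 1
    return local.normalize() + pd.Timedelta(days=days_ahead, hours=22)


early = pd.Timestamp("2020-11-26T08:30:00Z")
late = pd.Timestamp("2020-11-26T13:53:00Z")
print(forecast_start_for(early))
print(forecast_start_for(late))
```

With these inputs, the message published at 08:30 can affect periods from 22:00 on 26 November, while the one published at 13:53 only takes effect from 22:00 on 27 November. A message published exactly at 09:00 falls on the "after" side of the rule.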


In [16]:
import pandas as pd
import json
import datetime

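# Provided function to get day-ahead market times
def day_ahead_market_times(today_date=None):
    # Resolve the default at call time; a default of pd.to_datetime('today')
    # would be evaluated only once, when the function is defined
    if today_date is None:
        today_date = pd.to_datetime('today')
    tomorrow_date = today_date + pd.Timedelta(1, unit="day")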
    DA_Market = [pd.Timestamp(datetime.datetime(today_date.year, today_date.month, today_date.day, 23, 0, 0),
                              tz="Europe/London"),
                 pd.Timestamp(datetime.datetime(tomorrow_date.year, tomorrow_date.month, tomorrow_date.day, 22, 30, 0),
                              tz="Europe/London")]

    DA_Market = pd.date_range(start=DA_Market[0], end=DA_Market[1], freq=pd.Timedelta(30, unit="minute"))

    return DA_Market

# Load the HDF5 file
energy_data = pd.read_hdf('data/combined/energy_data_20200920_20240519.h5', 'df')
energy_data["dtm"] = pd.to_datetime(energy_data["dtm"]).dt.tz_convert('Europe/London')

# Initialize the new features to the normal capacity, which is 400
energy_data['Availability1'] = 400
energy_data['Availability2'] = 400
energy_data['Availability3'] = 400

# List of JSON file paths and corresponding availability columns
json_files = {
    'data/T_Hawao/T_Hawao-1/combined_T_HAWAO-1.json': 'Availability1',
    'data/T_Hawao/T_Hawao-2/combined_T_HAWAO-2.json': 'Availability2',
    'data/T_Hawao/T_Hawao-3/combined_T_HAWAO-3.json': 'Availability3'
}

# Function to get the day-ahead market times
def get_day_ahead_hours(reference_time):
    return day_ahead_market_times(reference_time)

# Function to apply outage profiles based on message publish time and 9 AM cutoff
def apply_outage_profiles(energy_data, outage_data, availability_column):
    for entry in outage_data:
        publish_time = pd.to_datetime(entry['publishTime']).tz_convert('Europe/London')
        
        # Determine the start time for the forecast window
        if publish_time.time() < pd.to_datetime('09:00:00').time():
            forecast_start = publish_time.normalize() + pd.Timedelta(hours=22)  # 22:00 of the same day
        else:
            forecast_start = publish_time.normalize() + pd.Timedelta(days=1, hours=22)  # 22:00 of the next day

        # Apply the outage profiles
        for profile in entry['outageProfile']:
            start_time = pd.to_datetime(profile['startTime']).tz_convert('Europe/London')
            end_time = pd.to_datetime(profile['endTime']).tz_convert('Europe/London')
            capacity = profile['capacity']
            
            # Skip profiles that end before the forecast window opens
            if end_time <= forecast_start:
                continue

            # Clip the start so that nothing before forecast_start is changed
            start_time = max(start_time, forecast_start)
            mask = (energy_data['dtm'] >= start_time) & (energy_data['dtm'] < end_time)
            energy_data.loc[mask, availability_column] = capacity

# Iterate through each JSON file and update the corresponding Availability feature
for json_file, availability_column in json_files.items():
    with open(json_file, 'r') as file:
        outage_data = json.load(file)
    apply_outage_profiles(energy_data, outage_data, availability_column)

# Save the updated data back to an HDF5 file
output_file_path_hdf = 'data/combined/test_energy_data_20200920_20240519.h5'
energy_data.to_hdf(output_file_path_hdf, key='df', mode='w')

print(f"Updated energy data saved to {output_file_path_hdf}")

Updated energy data saved to data/combined/test_energy_data_20200920_20240519.h5
