# Prediction of energy market prices (part 2)

You are given meteorological and market data for part of 2025.

The [meteorological data files](https://github.com/jkved/enerheads-quant-challenge/blob/main/data/weather_location_Vilnius.csv) contain day-ahead and intraday forecasts of meteorological variables, each file for a single location. The variables are described in the [OpenMeteo](https://open-meteo.com/en/docs) docs, and the timezone is UTC. These files supply some of your predictor values. The day-ahead value (columns with the `previous_day1` suffix) is known 24 hours before delivery time; the intraday value (columns without a suffix) is known around 1 hour before or at delivery time.

Market data (predictors and target values) is given in the [market data file](https://github.com/jkved/enerheads-quant-challenge/blob/main/data/market_data.csv) and is also publicly available on the [Baltic transparency dashboard](https://baltic.transparency-dashboard.eu/). Its index is in UTC as well, and the following columns are considered our target variables:
- `10YLT-1001A0008Q_DA_eurmwh` - the Nord Pool day-ahead auction cleared price (EUR/MWh). It is resolved the day before the delivery day (day-ahead), i.e. today at 10:00 UTC we find out the prices for tomorrow's CET day (22:00 UTC today -> 22:00 UTC tomorrow). Only weather data with the `previous_day1` suffix is available at inference time.
- `LT_up_sa_cbmp` or `LT_down_sa_cbmp` - the mFRR activation prices. Generally only up or only down activations take place at any given time, so the price is duplicated in these columns. It is resolved at delivery time (intraday). All weather data and Nord Pool prices are available at inference time, but all other market data is visible with a 30 minute lag, i.e. for an mFRR activation price @ 11:00, all other market data is visible only up to (not including) 10:30.

Complete the following tasks:

1. Create a Nord Pool price forecasting model in the day-ahead setting.
2. Create an mFRR price forecasting model in the intraday setting.
3. Implement the following evaluation metrics for prices:
   - you want to accurately predict the times at which the smallest and largest prices of the day occur.
   - you want to know how many instances have a spread between the smallest and largest prices bigger than X (say, 200 EUR/MWh).
4. Choose a collection of 2-3 plots to visualize the performance of both models.


# Importing Data 

In [14]:
from functools import reduce
import glob
import os

from catboost import CatBoostRegressor, Pool
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

from utils import get_time_of_day, get_season

In [2]:
market_df = pd.read_csv("data/market_data.csv", index_col=0)
market_df.index = pd.to_datetime(market_df.index, utc=True)
print(market_df.shape)
market_df.head()

(11424, 100)


Unnamed: 0,EE_afrr_up_activ,EE_afrr_down_activ,LV_afrr_up_activ,LV_afrr_down_activ,LT_afrr_up_activ,LT_afrr_down_activ,EE_afrr_up_min_bid,EE_afrr_up_max_bid,EE_afrr_down_min_bid,EE_afrr_down_max_bid,...,LT_up_da_cbmp,LT_down_sa_cbmp,LT_down_da_cbmp,EE_dsb,LV_dsb,LT_dsb,EE_imbalance_price,LV_imbalance_price,LT_imbalance_price,10YLT-1001A0008Q_DA_eurmwh
2025-03-01 00:00:00+00:00,1.818,0.018,3.048,0.0,2.05,0.0,400.0,927.0,-273.0,20.0,...,,27.27,,1,-1.0,1.0,400.0,10.19,927.02,120.48
2025-03-01 00:15:00+00:00,1.729,0.0,3.547,0.0,2.4,0.0,400.0,400.0,-273.0,20.0,...,,27.27,,1,-1.0,1.0,400.0,10.19,932.81,120.48
2025-03-01 00:30:00+00:00,1.07,0.0,3.506,0.0,0.38,1.5,400.0,927.0,-273.0,20.0,...,,,,1,-1.0,1.0,400.0,10.19,119.23,120.48
2025-03-01 00:45:00+00:00,0.604,0.221,1.828,0.0,0.0,2.48,400.0,927.0,-273.0,20.0,...,,27.27,,1,-1.0,-1.0,400.0,10.19,-303.44,120.48
2025-03-01 01:00:00+00:00,2.829,0.0,2.324,0.0,0.0,1.75,400.0,927.0,-273.0,20.0,...,,27.27,,1,-1.0,-1.0,560.36,10.19,-291.36,117.15


In [9]:
files = glob.glob("data/weather_location_*.csv")
dfs = []

for file in files:
    city = os.path.splitext(os.path.basename(file))[0].replace("weather_location_", "")
    df = pd.read_csv(file, index_col=0)
    df = df.rename(columns={col: f"{col}_{city}" for col in df.columns})
    
    dfs.append(df)

weather_df = reduce(lambda left, right: pd.merge(left, right, left_index=True, right_index=True), dfs)
weather_df.index = pd.to_datetime(weather_df.index, utc=True)
print(weather_df.shape)
weather_df.head()

(2856, 140)


Unnamed: 0,wind_speed_80m_Alytus,wind_speed_80m_previous_day1_Alytus,wind_direction_80m_Alytus,wind_direction_80m_previous_day1_Alytus,direct_radiation_Alytus,direct_radiation_previous_day1_Alytus,diffuse_radiation_Alytus,diffuse_radiation_previous_day1_Alytus,cloud_cover_Alytus,cloud_cover_previous_day1_Alytus,...,direct_radiation_Vilnius,direct_radiation_previous_day1_Vilnius,diffuse_radiation_Vilnius,diffuse_radiation_previous_day1_Vilnius,cloud_cover_Vilnius,cloud_cover_previous_day1_Vilnius,temperature_2m_Vilnius,temperature_2m_previous_day1_Vilnius,relative_humidity_2m_Vilnius,relative_humidity_2m_previous_day1_Vilnius
2025-03-01 00:00:00+00:00,13.333627,13.684512,125.0,87.0,0.0,0.0,0.0,0.0,100.0,100.0,...,0.0,0.0,0.0,0.0,100.0,100.0,2.1235,1.8235,89.0,91.0
2025-03-01 01:00:00+00:00,12.280973,14.737166,123.0,99.0,0.0,0.0,0.0,0.0,100.0,100.0,...,0.0,0.0,0.0,0.0,100.0,100.0,1.9735,1.7735,89.0,91.0
2025-03-01 02:00:00+00:00,13.333627,12.631857,118.0,101.0,0.0,0.0,0.0,0.0,100.0,100.0,...,0.0,0.0,0.0,0.0,100.0,100.0,1.8735,1.8235,89.0,90.0
2025-03-01 03:00:00+00:00,11.228318,9.473893,130.0,102.0,0.0,0.0,0.0,0.0,100.0,100.0,...,0.0,0.0,0.0,0.0,100.0,100.0,1.7735,1.8235,89.0,89.0
2025-03-01 04:00:00+00:00,9.123008,7.368583,152.0,87.0,0.0,0.0,0.0,0.0,100.0,100.0,...,0.0,0.0,0.0,0.0,100.0,100.0,1.9235,1.7235,90.0,90.0
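
For the day-ahead model, only the `previous_day1` columns may be used as weather predictors. A minimal sketch of that selection is shown below on a toy frame that mimics the merged column naming; the helper name `day_ahead_weather` and the toy values are illustrative assumptions, not part of the notebook.

```python
import pandas as pd


def day_ahead_weather(weather: pd.DataFrame) -> pd.DataFrame:
    """Keep only the forecast columns issued the day before delivery."""
    cols = [c for c in weather.columns if "_previous_day1" in c]
    return weather[cols]


# Toy frame with the same naming pattern as weather_df (values are made up)
toy = pd.DataFrame({
    "temperature_2m_Vilnius": [2.1, 1.9],
    "temperature_2m_previous_day1_Vilnius": [1.8, 1.7],
    "cloud_cover_Alytus": [100.0, 100.0],
    "cloud_cover_previous_day1_Alytus": [100.0, 100.0],
})

da_weather = day_ahead_weather(toy)
print(list(da_weather.columns))
```

Matching on the substring rather than the suffix is deliberate, because the city name is appended after `previous_day1` when the files are merged.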


In [10]:
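# Drop columns where at least half of the values are missing
max_missing_fraction = 0.5
# .copy() gives an independent frame, so later column assignments do not trigger SettingWithCopyWarning
market_df_filtered = market_df.loc[:, market_df.isnull().mean() < max_missing_fraction].copy()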
print(market_df_filtered.shape)
market_df_filtered.head()

(11424, 68)


Unnamed: 0,EE_afrr_up_activ,EE_afrr_down_activ,LV_afrr_up_activ,LV_afrr_down_activ,LT_afrr_up_activ,LT_afrr_down_activ,EE_afrr_up_min_bid,EE_afrr_up_max_bid,EE_afrr_down_min_bid,EE_afrr_down_max_bid,...,LT_up_da_cbmp,LT_down_sa_cbmp,LT_down_da_cbmp,EE_dsb,LV_dsb,LT_dsb,EE_imbalance_price,LV_imbalance_price,LT_imbalance_price,10YLT-1001A0008Q_DA_eurmwh
2025-03-01 00:00:00+00:00,1.818,0.018,3.048,0.0,2.05,0.0,400.0,927.0,-273.0,20.0,...,,27.27,,1,-1.0,1.0,400.0,10.19,927.02,120.48
2025-03-01 00:15:00+00:00,1.729,0.0,3.547,0.0,2.4,0.0,400.0,400.0,-273.0,20.0,...,,27.27,,1,-1.0,1.0,400.0,10.19,932.81,120.48
2025-03-01 00:30:00+00:00,1.07,0.0,3.506,0.0,0.38,1.5,400.0,927.0,-273.0,20.0,...,,,,1,-1.0,1.0,400.0,10.19,119.23,120.48
2025-03-01 00:45:00+00:00,0.604,0.221,1.828,0.0,0.0,2.48,400.0,927.0,-273.0,20.0,...,,27.27,,1,-1.0,-1.0,400.0,10.19,-303.44,120.48
2025-03-01 01:00:00+00:00,2.829,0.0,2.324,0.0,0.0,1.75,400.0,927.0,-273.0,20.0,...,,27.27,,1,-1.0,-1.0,560.36,10.19,-291.36,117.15


# mFRR prices forecasting model in intraday setting

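I will use `LT_up_sa_cbmp` as the target column. Let's fill its NaNs by linear interpolation: a single missing row becomes the average of the rows above and below it, and longer gaps are filled along a straight line between the nearest known values.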

In [17]:
market_df_filtered["LT_up_sa_cbmp"].isna().sum()

424

In [19]:
market_df_filtered.loc[:, "LT_up_sa_cbmp"] = market_df_filtered["LT_up_sa_cbmp"].interpolate(
    method="linear", limit_direction="both"
)

In [20]:
market_df_filtered["LT_up_sa_cbmp"].isna().sum()

0
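
To see what the interpolation does to gaps of different lengths, here is a small sketch on a made-up series (the values are illustrative only). The single gap becomes the mean of its neighbours, while the two-row gap is filled along a straight line between the known values on either side.

```python
import numpy as np
import pandas as pd

# Toy series: one single-row gap and one two-row gap
s = pd.Series([10.0, np.nan, 30.0, np.nan, np.nan, 60.0])

filled = s.interpolate(method="linear", limit_direction="both")
print(filled.tolist())
```

Interpolated target values are synthetic, so it may be worth checking how much they influence the error metrics later on.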

Let's lag the market features by 30 minutes. The market data is at 15-minute resolution, so this corresponds to shifting the rows by two steps.

In [23]:
targets = ["10YLT-1001A0008Q_DA_eurmwh", "LT_up_sa_cbmp", "LT_down_sa_cbmp"]
features = [col for col in market_df_filtered.columns if col not in targets]


In [24]:
lagged_market = market_df_filtered[features].shift(2)
lagged_market.head()

Unnamed: 0,EE_afrr_up_activ,EE_afrr_down_activ,LV_afrr_up_activ,LV_afrr_down_activ,LT_afrr_up_activ,LT_afrr_down_activ,EE_afrr_up_min_bid,EE_afrr_up_max_bid,EE_afrr_down_min_bid,EE_afrr_down_max_bid,...,LV_up_sa_cbmp,LV_down_sa_cbmp,LT_up_da_cbmp,LT_down_da_cbmp,EE_dsb,LV_dsb,LT_dsb,EE_imbalance_price,LV_imbalance_price,LT_imbalance_price
2025-03-01 00:00:00+00:00,,,,,,,,,,,...,,,,,,,,,,
2025-03-01 00:15:00+00:00,,,,,,,,,,,...,,,,,,,,,,
2025-03-01 00:30:00+00:00,1.818,0.018,3.048,0.0,2.05,0.0,400.0,927.0,-273.0,20.0,...,27.27,27.27,,,1.0,-1.0,1.0,400.0,10.19,927.02
2025-03-01 00:45:00+00:00,1.729,0.0,3.547,0.0,2.4,0.0,400.0,400.0,-273.0,20.0,...,27.27,27.27,,,1.0,-1.0,1.0,400.0,10.19,932.81
2025-03-01 01:00:00+00:00,1.07,0.0,3.506,0.0,0.38,1.5,400.0,927.0,-273.0,20.0,...,27.27,27.27,,,1.0,-1.0,1.0,400.0,10.19,119.23
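
Whether two steps are enough depends on how the visibility rule is read. The sketch below uses a toy 15-minute index in which each row stores its own timestamp, so the result shows which source row a target at 11:00 would see. Under a strict reading of the task statement (data visible "up to, not including, 10:30"), three steps would be required; that reading is an assumption here, not something the data confirms.

```python
import pandas as pd

# Toy 15-minute index from 10:00 to 11:00 UTC
idx = pd.date_range("2025-03-01 10:00", periods=5, freq="15min", tz="UTC")
# Each row holds its own timestamp, which makes the effect of shift() easy to read
source_time = pd.Series(idx, index=idx)

target_time = pd.Timestamp("2025-03-01 11:00", tz="UTC")
seen_with_shift_2 = source_time.shift(2).loc[target_time]
seen_with_shift_3 = source_time.shift(3).loc[target_time]

print("shift(2) exposes the row stamped", seen_with_shift_2)
print("shift(3) exposes the row stamped", seen_with_shift_3)
```

With `shift(2)` the 11:00 target sees the 10:30 row, and with `shift(3)` it sees the 10:15 row.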


In [25]:
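# Repeat each hourly weather row four times to match the 15-minute market index.
# This assumes both frames are sorted and cover exactly the same period without gaps.
weather_upsampled = weather_df.loc[weather_df.index.repeat(4)].copy()
assert len(weather_upsampled) == len(market_df_filtered)
weather_upsampled.index = market_df_filtered.index

# Lagged market features + day-ahead price + weather + target
merged_df = pd.concat([
    lagged_market,
    market_df_filtered[["10YLT-1001A0008Q_DA_eurmwh"]],
    weather_upsampled,
    market_df_filtered[["LT_up_sa_cbmp"]]
], axis=1)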

print(merged_df.shape)
merged_df.head(8)

(11424, 207)


Unnamed: 0,EE_afrr_up_activ,EE_afrr_down_activ,LV_afrr_up_activ,LV_afrr_down_activ,LT_afrr_up_activ,LT_afrr_down_activ,EE_afrr_up_min_bid,EE_afrr_up_max_bid,EE_afrr_down_min_bid,EE_afrr_down_max_bid,...,direct_radiation_previous_day1_Vilnius,diffuse_radiation_Vilnius,diffuse_radiation_previous_day1_Vilnius,cloud_cover_Vilnius,cloud_cover_previous_day1_Vilnius,temperature_2m_Vilnius,temperature_2m_previous_day1_Vilnius,relative_humidity_2m_Vilnius,relative_humidity_2m_previous_day1_Vilnius,LT_up_sa_cbmp
2025-03-01 00:00:00+00:00,,,,,,,,,,,...,0.0,0.0,0.0,100.0,100.0,2.1235,1.8235,89.0,91.0,27.27
2025-03-01 00:15:00+00:00,,,,,,,,,,,...,0.0,0.0,0.0,100.0,100.0,2.1235,1.8235,89.0,91.0,27.27
2025-03-01 00:30:00+00:00,1.818,0.018,3.048,0.0,2.05,0.0,400.0,927.0,-273.0,20.0,...,0.0,0.0,0.0,100.0,100.0,2.1235,1.8235,89.0,91.0,27.27
2025-03-01 00:45:00+00:00,1.729,0.0,3.547,0.0,2.4,0.0,400.0,400.0,-273.0,20.0,...,0.0,0.0,0.0,100.0,100.0,2.1235,1.8235,89.0,91.0,27.27
2025-03-01 01:00:00+00:00,1.07,0.0,3.506,0.0,0.38,1.5,400.0,927.0,-273.0,20.0,...,0.0,0.0,0.0,100.0,100.0,1.9735,1.7735,89.0,91.0,27.27
2025-03-01 01:15:00+00:00,0.604,0.221,1.828,0.0,0.0,2.48,400.0,927.0,-273.0,20.0,...,0.0,0.0,0.0,100.0,100.0,1.9735,1.7735,89.0,91.0,27.27
2025-03-01 01:30:00+00:00,2.829,0.0,2.324,0.0,0.0,1.75,400.0,927.0,-273.0,20.0,...,0.0,0.0,0.0,100.0,100.0,1.9735,1.7735,89.0,91.0,27.27
2025-03-01 01:45:00+00:00,3.061,0.0,3.315,0.0,0.0,2.38,400.0,927.0,-273.0,20.0,...,0.0,0.0,0.0,100.0,100.0,1.9735,1.7735,89.0,91.0,27.27
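
Repeating rows and overwriting the index works only while the two frames line up exactly. An alternative that aligns on timestamps is sketched below with toy indices; it is one possible approach, not the method used above, and the toy values are made up.

```python
import pandas as pd

# Toy hourly weather and the 15-minute index it should be aligned to
hourly_idx = pd.date_range("2025-03-01 00:00", periods=2, freq="60min", tz="UTC")
quarter_idx = pd.date_range("2025-03-01 00:00", periods=8, freq="15min", tz="UTC")
toy_weather = pd.DataFrame({"temperature_2m_Vilnius": [2.1235, 1.9735]}, index=hourly_idx)

# Forward-fill each hourly value onto the quarter-hours that follow it
aligned = toy_weather.sort_index().reindex(quarter_idx, method="ffill")
print(aligned)
```

Because the alignment is driven by the timestamps themselves, a missing or extra hour in the weather data would surface as NaNs or repeated values instead of silently shifting every later row.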


In [26]:
df = merged_df.copy()

df["hour"] = df.index.hour
df["dayofweek"] = df.index.dayofweek
df["month"] = df.index.month
df["time_of_day"] = df["hour"].apply(get_time_of_day)
df["season"] = df["month"].apply(get_season)
df["time_of_day"] = df["time_of_day"].astype("category")
df["season"] = df["season"].astype("category")

print(df.shape)
df.head()

(11424, 212)


Unnamed: 0,EE_afrr_up_activ,EE_afrr_down_activ,LV_afrr_up_activ,LV_afrr_down_activ,LT_afrr_up_activ,LT_afrr_down_activ,EE_afrr_up_min_bid,EE_afrr_up_max_bid,EE_afrr_down_min_bid,EE_afrr_down_max_bid,...,temperature_2m_Vilnius,temperature_2m_previous_day1_Vilnius,relative_humidity_2m_Vilnius,relative_humidity_2m_previous_day1_Vilnius,LT_up_sa_cbmp,hour,dayofweek,month,time_of_day,season
2025-03-01 00:00:00+00:00,,,,,,,,,,,...,2.1235,1.8235,89.0,91.0,27.27,0,5,3,night,spring
2025-03-01 00:15:00+00:00,,,,,,,,,,,...,2.1235,1.8235,89.0,91.0,27.27,0,5,3,night,spring
2025-03-01 00:30:00+00:00,1.818,0.018,3.048,0.0,2.05,0.0,400.0,927.0,-273.0,20.0,...,2.1235,1.8235,89.0,91.0,27.27,0,5,3,night,spring
2025-03-01 00:45:00+00:00,1.729,0.0,3.547,0.0,2.4,0.0,400.0,400.0,-273.0,20.0,...,2.1235,1.8235,89.0,91.0,27.27,0,5,3,night,spring
2025-03-01 01:00:00+00:00,1.07,0.0,3.506,0.0,0.38,1.5,400.0,927.0,-273.0,20.0,...,1.9735,1.7735,89.0,91.0,27.27,1,5,3,night,spring
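
`get_time_of_day` and `get_season` come from a local `utils` module that is not shown here. A hypothetical version is sketched below. It is consistent with the output above (hour 0 maps to `night`, March maps to `spring`), but the bucket boundaries and the other labels are purely assumed and may differ from the real helpers.

```python
def get_time_of_day(hour: int) -> str:
    """Hypothetical bucketing of the hour; boundaries are assumptions, not taken from utils.py."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def get_season(month: int) -> str:
    """Hypothetical meteorological seasons (Dec-Feb winter, Mar-May spring, and so on)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"


print(get_time_of_day(0), get_season(3))
```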


In [29]:
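target = "LT_up_sa_cbmp"
# Exclude only the target itself: the day-ahead price is already known in the
# intraday setting, so it stays in as a predictor
features = [col for col in df.columns if col != target]
df = df.sort_index()

# Chronological 80/20 split (no shuffling, so the test set lies strictly in the future)
split_idx = int(len(df) * 0.8)
X_train = df[features].iloc[:split_idx]
y_train = df[target].iloc[:split_idx]
X_test = df[features].iloc[split_idx:]
y_test = df[target].iloc[split_idx:]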

In [None]:

# 1. Identify categorical features (CatBoost needs column names or indices)
categorical_features = X_train.select_dtypes(include=["category", "object"]).columns.tolist()

# 2. Create CatBoost Pool objects. The last 10% of the training window is held out
#    for early stopping, so the test set is not used to choose the number of iterations
val_idx = int(len(X_train) * 0.9)
train_pool = Pool(X_train.iloc[:val_idx], y_train.iloc[:val_idx], cat_features=categorical_features)
val_pool = Pool(X_train.iloc[val_idx:], y_train.iloc[val_idx:], cat_features=categorical_features)
test_pool = Pool(X_test, y_test, cat_features=categorical_features)

# 3. Initialize and fit the model
model = CatBoostRegressor(
    iterations=1000,
    learning_rate=0.05,
    depth=6,
    loss_function="MAE",
    early_stopping_rounds=50,
    verbose=100
)

model.fit(train_pool, eval_set=val_pool)

# 4. Predict on the untouched test set
y_pred = model.predict(test_pool)

# 5. Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.3f}")
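
The price-specific metrics from task 3 could be sketched as follows. The function names, the toy prices, and the choice to group by UTC calendar day are all assumptions made for illustration; the task statement defines the delivery day in CET, so the grouping may need to be adjusted.

```python
import pandas as pd


def extreme_time_hit_rate(y_true: pd.Series, y_pred: pd.Series) -> tuple:
    """Share of days on which the predicted daily min / max falls on the actual timestamp."""
    frame = pd.DataFrame({"true": y_true, "pred": y_pred})
    by_day = frame.groupby(frame.index.normalize())  # assumption: UTC calendar days
    min_hit = by_day["true"].idxmin() == by_day["pred"].idxmin()
    max_hit = by_day["true"].idxmax() == by_day["pred"].idxmax()
    return float(min_hit.mean()), float(max_hit.mean())


def count_large_spreads(prices: pd.Series, threshold: float = 200.0) -> int:
    """Number of days whose max-min price spread exceeds the threshold (EUR/MWh)."""
    by_day = prices.groupby(prices.index.normalize())
    spread = by_day.max() - by_day.min()
    return int((spread > threshold).sum())


# Toy example: two days with four observations each (values are made up)
idx = pd.date_range("2025-03-01", periods=8, freq="360min", tz="UTC")
true_prices = pd.Series([10.0, 50.0, 300.0, 20.0, 30.0, 40.0, 100.0, 35.0], index=idx)
pred_prices = pd.Series([15.0, 40.0, 250.0, 30.0, 60.0, 35.0, 90.0, 50.0], index=idx)

min_rate, max_rate = extreme_time_hit_rate(true_prices, pred_prices)
n_true = count_large_spreads(true_prices, threshold=200.0)
n_pred = count_large_spreads(pred_prices, threshold=200.0)
print(min_rate, max_rate, n_true, n_pred)
```

To apply this to the model above, `y_pred` would first need to be wrapped in a Series that shares the index of `y_test`, for example `pd.Series(y_pred, index=y_test.index)`.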
