# UNIPD - Deep Learning 2025 - Challenge 2: Weather Forecasting

Welcome to Challenge 2 of the UNIPD Deep Learning 2025 course! In this challenge, you will develop a deep learning model to forecast weather data based on historical observations collected from various stations.

You are provided with a dataset of daily weather measurements recorded at multiple stations. Your task is to forecast the future values of every weather variable for each station over a 30-day horizon.

This challenge will test your ability to preprocess time series data, design and train a suitable forecasting model, and prepare predictions for evaluation on Kaggle.


## Loading data

In [None]:
import pandas as pd
import numpy as np
from pathlib import Path

In [None]:
# data_dir = '/kaggle/input/unipd-deep-learning-2025-challenge-2/'
data_dir = Path('data')
data = pd.read_csv(data_dir / 'train_dataset.csv', index_col=[0, 1])
stations = [station.values for _, station in data.groupby(level=0)]
data_arr = np.stack(stations, axis=0)

In [None]:
n_stations, n_days, n_features = data_arr.shape
print(f"Number of weather stations: {n_stations}")
print(f"Number of days of data: {n_days}")
print(f"Number of weather variables: {n_features}")

Number of weather stations: 422
Number of days of data: 695
Number of weather variables: 76


## Training

In this section, you are expected to implement the full pipeline for training a deep learning model to perform time series forecasting.

Specifically, you should:

- Define a PyTorch Dataset (or any other framework of your choice) that can serve batches of time series data.
- Implement a suitable model architecture. This could be an RNN, LSTM, GRU, Transformer, TCN, or any other structure that fits time series prediction. Pre-trained models are not allowed.
- Set up a training loop that optimizes your model to minimize forecasting error (e.g., MSE, MAE, ...).
- (Optional but recommended) Add validation to monitor performance during training and avoid overfitting.

Remember, in the end your trained model should be capable of producing predictions with shape (422, 30, 76), matching the number of stations, the forecast horizon, and the number of weather variables.
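
As a starting point for the first and last items in the list above, here is a minimal sketch of a sliding-window dataset with a chronological validation split. It is only an illustration, not the expected solution: the class name `WindowDataset`, the `input_len` value, and the toy array are assumptions, and plain NumPy is used so the block runs without any deep learning framework. The class exposes only `__len__` and `__getitem__`, so it should be straightforward to adapt to a PyTorch `Dataset` or an equivalent in another framework.

```python
import numpy as np


class WindowDataset:
    """Serve (history, target) pairs cut from a (stations, days, features) array.

    Hypothetical helper: it implements only __len__ and __getitem__,
    so it has no framework dependency.
    """

    def __init__(self, series, input_len=60, horizon=30):
        # input_len is an arbitrary illustrative choice; horizon matches the 30-day task.
        self.series = np.asarray(series, dtype=np.float32)
        self.input_len = input_len
        self.horizon = horizon
        self.n_stations, n_days, _ = self.series.shape
        # Number of valid window start positions per station.
        self.per_station = n_days - input_len - horizon + 1
        if self.per_station < 1:
            raise ValueError("series too short for the requested window sizes")

    def __len__(self):
        return self.n_stations * self.per_station

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Map the flat index to a station and a window start day.
        station, start = divmod(idx, self.per_station)
        split = start + self.input_len
        history = self.series[station, start:split]                # (input_len, features)
        target = self.series[station, split:split + self.horizon]  # (horizon, features)
        return history, target


# Toy data standing in for data_arr (same axis order, smaller sizes).
toy = np.arange(3 * 100 * 4, dtype=np.float32).reshape(3, 100, 4)

# Chronological split: the last 30 days are used only as validation targets.
train_ds = WindowDataset(toy[:, :-30], input_len=20, horizon=30)
val_ds = WindowDataset(toy[:, -50:], input_len=20, horizon=30)

history, target = train_ds[0]
print(len(train_ds), len(val_ds), history.shape, target.shape)
```

Splitting by time rather than at random keeps the validation targets out of every training window, which mirrors the actual task of predicting days that come after the training data.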

## Predictions

The objective of the challenge is to forecast the same variables found in the training data for the next 30 time steps. Once you have a prediction array, its shape should therefore be (422, 30, 76). To be scored correctly, your submission must be a CSV file that meets the following requirements:
- the first column, used as the index and named "id", must have the form "{station}_{timestep}", with both station and timestep indexed from zero, as in data_arr (so stations run 0-421 and timesteps 0-29);
- each row should contain the predictions related to the station and future time step as defined by the id, for all available variables;
- the columns for the variables should be called "var1", "var2", ..., "var76".

Below is an example of a valid submission, which uses the last recorded values in the training data as the prediction for every future time step. Of course, your model should do better than this simple baseline!

In [None]:
n_forecast_steps = 30

In [None]:
last_timestep_values = data_arr[:, -1, :]

submission_data = []
for station_id in range(n_stations):
    for time_step in range(n_forecast_steps):
        # Create the 'id' in the required format: 'station_timestep'
        # Note: Station IDs here are 0-indexed based on the numpy array
        submission_id = f'{station_id}_{time_step}'
        # The values are the features from the last timestep of this station
        row_data = last_timestep_values[station_id, :]
        row = {'id': submission_id}
        row.update({f'var{i+1}': row_data[i] for i in range(n_features)})
        submission_data.append(row)

submission_df = pd.DataFrame(submission_data)
# Set the 'id' column as the index for matching with the solution in score calculation
submission_df = submission_df.set_index('id')

submission_df

id,var1,var2,var3,var4,var5,var6,var7,var8,var9,var10,...,var67,var68,var69,var70,var71,var72,var73,var74,var75,var76
0_0,49.6,35.5,30.84444,26.4,82.0,64.0,19.0,12.0,5.0,34.0,...,0.0,0.0,0.0,0.0,0.0,0.00000,0.00000,0.00000,0.0,0.0
0_1,49.6,35.5,30.84444,26.4,82.0,64.0,19.0,12.0,5.0,34.0,...,0.0,0.0,0.0,0.0,0.0,0.00000,0.00000,0.00000,0.0,0.0
0_2,49.6,35.5,30.84444,26.4,82.0,64.0,19.0,12.0,5.0,34.0,...,0.0,0.0,0.0,0.0,0.0,0.00000,0.00000,0.00000,0.0,0.0
0_3,49.6,35.5,30.84444,26.4,82.0,64.0,19.0,12.0,5.0,34.0,...,0.0,0.0,0.0,0.0,0.0,0.00000,0.00000,0.00000,0.0,0.0
0_4,49.6,35.5,30.84444,26.4,82.0,64.0,19.0,12.0,5.0,34.0,...,0.0,0.0,0.0,0.0,0.0,0.00000,0.00000,0.00000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
421_25,0.0,18.6,14.97917,11.1,38.0,10.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.36883,1.13846,0.28751,0.0,0.0
421_26,0.0,18.6,14.97917,11.1,38.0,10.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.36883,1.13846,0.28751,0.0,0.0
421_27,0.0,18.6,14.97917,11.1,38.0,10.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.36883,1.13846,0.28751,0.0,0.0
421_28,0.0,18.6,14.97917,11.1,38.0,10.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.36883,1.13846,0.28751,0.0,0.0


In [None]:
submission_df.to_csv(data_dir / 'submission.csv')
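
For predictions coming from a trained model, the same table can be built without the nested loop. The sketch below is one possible approach, not an official helper: `predictions_to_submission` and `check_submission` are hypothetical names, and the toy array stands in for `data_arr`. It reshapes a `(stations, steps, variables)` array into the required layout and asserts the format rules listed above.

```python
import numpy as np
import pandas as pd


def predictions_to_submission(preds):
    """Flatten a (stations, steps, variables) array into the submission table."""
    preds = np.asarray(preds)
    n_stations, n_steps, n_vars = preds.shape
    # Row order is station-major: 0_0, 0_1, ..., then 1_0, 1_1, ...
    ids = [f"{s}_{t}" for s in range(n_stations) for t in range(n_steps)]
    columns = [f"var{i + 1}" for i in range(n_vars)]
    # A C-order reshape yields rows in the same station-major order as ids.
    flat = preds.reshape(n_stations * n_steps, n_vars)
    return pd.DataFrame(flat, index=pd.Index(ids, name="id"), columns=columns)


def check_submission(df, n_stations, n_steps, n_vars):
    """Raise AssertionError if the table breaks one of the format rules."""
    assert df.index.name == "id"
    assert df.shape == (n_stations * n_steps, n_vars)
    assert df.index.is_unique
    assert list(df.columns) == [f"var{i + 1}" for i in range(n_vars)]
    assert not df.isna().any().any(), "predictions contain NaN"


# Toy stand-in for data_arr: 4 stations, 10 days, 3 variables.
rng = np.random.default_rng(0)
toy_arr = rng.normal(size=(4, 10, 3))

# Persistence baseline: repeat the last observed day for 5 future steps.
baseline = np.repeat(toy_arr[:, -1:, :], 5, axis=1)  # shape (4, 5, 3)

toy_submission = predictions_to_submission(baseline)
check_submission(toy_submission, n_stations=4, n_steps=5, n_vars=3)
```

With the real data, the calls would be `predictions_to_submission(preds)` on an array of shape (422, 30, 76), followed by `check_submission(df, 422, 30, 76)` before writing the CSV.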