> __NOTE__ This notebook serves as a submission template. Feel free to modify and extend the workflow, but the __whole notebook has to run, especially the final inference cells__, for our hidden evaluation process.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append('..')

import numpy as np
import matplotlib.pyplot as plt
import torch
from chaosbench import dataset, utils, criterion

In [3]:
# DON'T CHANGE THIS: For Evaluation Purposes
TARGET_VARS = ['t-850', 'z-500', 'q-700'] # MANDATORY for evaluation

## Change Me!

> __NOTE__ Remember to just use the year 2022 for training and 2023 for validation, so you can iterate on the process faster and get to learn more!
> 

In [4]:
# These are the list of parameters you can tune and change

## Define train/validation years
train_years = np.arange(2022, 2023)
val_years = test_years = np.arange(2023, 2024)

## Define additional variables
atmos_vars = ['u-850', 'v-850']
land_vars = ['tp']
ocean_vars = ['sosstsst']

## Time control 
## CAUTION: evaluation will be on `lead_time = 44`, so we discourage changes!
lead_time = 44


> __NOTE__ More details on the variables can be found at https://leap-stc.github.io/ChaosBench/dataset.html

## End-to-End Template
This is a sample template, feel free to modify or extend it, but the last cell has to run!

1. Build the dataset
2. Initialize the model
3. Train / fit the model
4. Perform inference with the model
5. Score the inference

In [5]:
# Build the dataset
train_dataset = dataset.S2SObsDataset(
    years=train_years, # Years for training
    lead_time=lead_time, # Number of days ahead as target
    atmos_vars=TARGET_VARS + atmos_vars, # Atmospheric variables; if not given will use ALL 60 variables
    land_vars=land_vars, # Land variables
    ocean_vars=ocean_vars # Ocean variables
)

val_dataset = dataset.S2SObsDataset(
    years=val_years, # Years for validation
    lead_time=lead_time, # Number of days ahead as target
    atmos_vars=TARGET_VARS + atmos_vars, # Atmospheric variables; if not given will use ALL 60 variables
    land_vars=land_vars, # Land variables
    ocean_vars=ocean_vars # Ocean variables
)


In [6]:
# Initialize model
import xgboost as xgb

## Defining our model (e.g., XGBoost)
## More info: https://xgboost.readthedocs.io/en/stable/get_started.html
model = xgb.XGBRegressor(tree_method="hist", device="cuda")


> __NOTE__ Feel free to use any model (ML or DL) as you see fit!

In [7]:
# Fit our model
def transform_2d(x):
    """
    Helper function to transform the dataset into 2D array of shape (n_samples, n_features).
    Here n_features represent width (longitude) x height (latitude) x channels (n_variables)
    """
    return x.view(
        -1, 
        len(train_dataset.atmos_vars) + len(train_dataset.land_vars) + len(train_dataset.ocean_vars)
    )

## Get our training data
time_idx = 0
_, train_x, train_y = train_dataset[time_idx]

## Fitting/training time!
model.fit(
    transform_2d(train_x),  # Input
    transform_2d(train_y)   # Target
)


> __NOTE__ We recommend for you to save your model regularly. During submission, you can comment out the training section and add a model loading part so we don't have to retrain your model. For example:
>
> - For ML (scikit-learn): https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/
> 
> - For DL (PyTorch): https://pytorch.org/tutorials/beginner/saving_loading_models.html
>
> Double check that this loading process works before submission!

In [8]:
# Inference
time_idx = 0
_, val_x, val_y = val_dataset[time_idx]

val_predict = model.predict(transform_2d(val_x))
val_predict = val_predict.reshape(val_y.shape)


> __NOTE__ The whole notebook has to run for the leaderboard, and __the scoring value has to be successfully printed__ as shown below! See notes above on how to save/load your model checkpoints so our scoring will be based on your __best model__.
>
> We also recommend building a nice notebook with figures / explanation / insights as additional scoring criteria :) 

In [9]:
# Scoring (has to successfully output the scalar loss value!)
rmse = criterion.RMSE(lat_adjusted=False)

loss = rmse(
    torch.tensor(val_predict), 
    torch.tensor(val_y)
)

print(f'rmse loss: {loss}')


rmse loss: 0.7280805110931396


  torch.tensor(val_y)
