# Quickstart guide

In this notebook we will go through all the steps, from downloading the data and training a model to evaluating the results. See the `environment.yml` file for the required Python packages.

In [None]:
import xarray as xr
import matplotlib.pyplot as plt

datadir = '/gpfs/work/nonnenma/data/forecast_predictability/weatherbench/5_625deg/'

#from src.score import *
#from src.train_nn import *

z500 = xr.open_mfdataset(f'{datadir}geopotential_500/*.nc', combine='by_coords')
# Plot an example
z500.z.isel(time=0).plot();

#z500_test = load_test_data('geopotential_500/', 'z') # Take data only every 12 hours to speed up computation on Binder

z500

In [None]:
import numpy as np
import torch
from src.train_nn_pytorch import Dataset

lead_time = 5*24
var_dict = {'z': None}
batch_size = 32

# tbd: separating train and test datasets / loaders should be avoidable with the start/end arguments of Dataset!

dg_train = Dataset(z500.sel(time=slice('2015', '2015')), var_dict, lead_time, normalize=True)
train_loader = torch.utils.data.DataLoader(
    dg_train,
    batch_size=batch_size,
    drop_last=True)

dg_test = Dataset(z500.sel(time=slice('2016', '2016')), var_dict, lead_time,
                  mean=dg_train.mean, std=dg_train.std, normalize=True)
test_loader = torch.utils.data.DataLoader(
    dg_test,
    batch_size=batch_size,
    drop_last=False)
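
The cell above passes the training mean and standard deviation to the test dataset. A minimal NumPy sketch of why this matters is shown below: both splits are scaled with the same statistics, so the model sees test inputs on the scale it was trained on. The arrays and numbers are synthetic and purely illustrative; the `Dataset` class in `src/train_nn_pytorch.py` may compute its statistics differently (for example per variable or per level).

```python
import numpy as np

# Synthetic stand-ins for the training and test fields, shaped [time, lat, lon].
# The values are made up for illustration and are not real geopotential data.
rng = np.random.default_rng(0)
train = rng.normal(loc=54000.0, scale=3000.0, size=(100, 32, 64))
test = rng.normal(loc=54500.0, scale=3000.0, size=(50, 32, 64))

# Compute the statistics on the training split only.
train_mean, train_std = train.mean(), train.std()

# Apply the same statistics to both splits.
train_norm = (train - train_mean) / train_std
test_norm = (test - train_mean) / train_std

print(train_norm.mean(), train_norm.std())  # close to 0 and 1
print(test_norm.mean())                     # offset from 0, reflecting the shifted test mean
```

Normalizing the test split with its own statistics would hide such a shift from the model and make the un-normalized forecasts inconsistent with the training setup.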

In [None]:
i = 0
for batch in dg_train:
    print((batch[0].shape, batch[1].shape))
    print('X[0]', batch[0][0,0,0]) # just verify that minibatch elements differ
    print('y[0]', batch[1][0,0,0]) # and get permuted across epochs (re-run cell!)

# debug

In [None]:
# Use 2015 for training and 2016 for validation
dg_train = DataGenerator(
    z500.sel(time=slice('2015', '2015')), var_dict, lead_time,
    batch_size=batch_size, load=True)
dg_valid = DataGenerator(
    z500.sel(time=slice('2016', '2016')), var_dict, lead_time,
    batch_size=batch_size, mean=dg_train.mean, std=dg_train.std, shuffle=False)

In [None]:
# Now also a generator for testing. Important: shuffle must be False!
dg_test = DataGenerator(z500.sel(time=slice('2017', '2018')).isel(time=slice(0, None, 12)), # Limiting the data for Binder
                        var_dict, lead_time, batch_size=batch_size, mean=dg_train.mean, std=dg_train.std, shuffle=False)

In [None]:
X, y = dg_train[0]

In [None]:
# Batches have dimensions [batch_size, lat, lon, channels]
X.shape, y.shape

Now let's build a simple fully convolutional network. We are using periodic convolutions in the longitude direction. These are defined in `train_nn.py`.
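
To illustrate the idea behind a periodic convolution: before convolving, the field is padded in longitude by wrapping values around from the opposite edge, so the kernel sees the globe as continuous across the date line. The NumPy sketch below shows only this padding step. The helper name `pad_periodic_lon` is hypothetical, and the actual `PeriodicConv2D` layer may handle padding (including in the latitude direction) differently.

```python
import numpy as np

def pad_periodic_lon(x, pad):
    """Wrap-pad the longitude axis of an array shaped [batch, lat, lon, channels].

    Hypothetical helper for illustration only; not part of the repository.
    """
    # mode='wrap' fills the padding with values from the opposite edge.
    return np.pad(x, ((0, 0), (0, 0), (pad, pad), (0, 0)), mode='wrap')

# Toy field: 1 sample, 2 latitudes, 4 longitudes, 1 channel.
x = np.arange(8, dtype=float).reshape(1, 2, 4, 1)

# A kernel of size 5 needs 2 extra columns on each side.
x_padded = pad_periodic_lon(x, pad=2)

print(x_padded.shape)        # (1, 2, 8, 1)
print(x_padded[0, 0, :, 0])  # [2. 3. 0. 1. 2. 3. 0. 1.]
```

A convolution without padding applied to `x_padded` then returns an output with the original number of longitude points.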

In [None]:
cnn = keras.models.Sequential([
    PeriodicConv2D(filters=32, kernel_size=5, activation='relu', input_shape=(32, 64, 1,)),
    PeriodicConv2D(filters=1, kernel_size=5)
])

In [None]:
cnn.summary()

In [None]:
cnn.compile(keras.optimizers.Adam(1e-4), 'mse')

In [None]:
# Train a little bit ;)
cnn.fit_generator(dg_train, epochs=1, validation_data=dg_valid)

### Create a prediction and compute score

Now that we have a model (albeit a crappy one) we can create a prediction. For this we need to create a forecast for each forecast initialization time in the testing range (2017-2018) and unnormalize it. We then convert the forecasts to an xarray dataset, which allows us to easily compute the RMSE. All of this is taken care of in the `create_predictions()` function.
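
The two steps described above, un-normalizing the network output and scoring it with a latitude-weighted RMSE, can be sketched in plain NumPy as follows. This is a sketch under stated assumptions, not the repository's implementation: the function `weighted_rmse` is hypothetical, the weights are assumed to be proportional to the cosine of latitude (to account for grid cells shrinking toward the poles), and the mean, standard deviation, and field values are made up.

```python
import numpy as np

def weighted_rmse(forecast, truth, lat_deg):
    """Latitude-weighted RMSE for arrays shaped [time, lat, lon].

    Hypothetical helper; assumes weights proportional to cos(latitude).
    """
    weights = np.cos(np.deg2rad(lat_deg))
    weights = weights / weights.mean()  # normalize so the weights average to 1
    sq_err = (forecast - truth) ** 2
    # Broadcast the weights over the time and longitude axes.
    return float(np.sqrt((sq_err * weights[None, :, None]).mean()))

lat = np.array([-60.0, 0.0, 60.0])
truth = np.full((2, 3, 4), 54000.0)

# Illustrative training statistics (made-up numbers).
mean, std = 54000.0, 1000.0

# Pretend network output in normalized units, then undo the normalization.
pred_norm = np.full((2, 3, 4), 0.002)
forecast = pred_norm * std + mean

rmse = weighted_rmse(forecast, truth, lat)
print(round(rmse, 6))  # 2.0, since the error is 2 at every grid point
```

Because the weights are normalized to average 1, a spatially uniform error gives the same value as an unweighted RMSE; the weighting only changes the score when errors vary with latitude.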

In [None]:
preds = create_predictions(cnn, dg_test)

In [None]:
preds

In [None]:
compute_weighted_rmse(preds.z, z500_test).load()

In [None]:
time = '2017-03-02T00'
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15,5))
z500_test.sel(time=time).plot(ax=ax1)
preds.sel(time=time).z.plot(ax=ax2);

# The End

This is the end of the quickstart guide. Please refer to the Jupyter notebooks in the `notebooks` directory for more examples. If you have questions, feel free to ask them as a GitHub issue.