# Graduate Climate Conference 2022 - Climate + Machine Learning Workshop
In this notebook, we will be going through an exemplary, self-contained machine learning (ML) pipeline.
Here, we will be training a neural network to forecast daily sea surface temperatures (SST) in the Pacific Ocean. The data we will be using is from the [NOAA OISSTv2](https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.highres.html) dataset, which is a gridded dataset covering SSTs from $1982$ to $2022$ at a $0.25^\circ$ resolution. We will be using the [xarray](http://xarray.pydata.org/en/stable/) library to read in the data, and [PyTorch Lightning](https://www.pytorchlightning.ai/) -- a useful abstraction layer on top of PyTorch -- to train our neural network model.

### Install libraries and download data

In [1]:
!pip install netCDF4 dask xarray pytorch_lightning torchmetrics gdown wandb
!gdown --folder https://drive.google.com/drive/folders/1TeH5Miy-rdyT4tsSWGDY0llaJyuHnHuH?usp=sharing

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pytorch_lightning
  Downloading pytorch_lightning-1.7.7-py3-none-any.whl (708 kB)
Collecting torchmetrics
  Downloading torchmetrics-0.10.1-py3-none-any.whl (529 kB)
Collecting wandb
  Downloading wandb-0.13.4-py2.py3-none-any.whl (1.9 MB)
Collecting pyDeprecate>=0.3.1
  Downloading pyDeprecate-0.3.2-py3-none-any.whl (10 kB)
Collecting docker-pycreds>=0.4.0
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting sentry-sdk>=1.0.0
  Downloading sentry_sdk-1.10.1-py2.py3-none-any.whl (166 kB)
Collecting pathtools
  Downloading pathtools-0.1.2.tar.gz (11 kB)
Collecting setproctitle
  Downloading setproctitle-1.3.2-c

In [None]:
DATA_DIR = '/content'  # folder where the data is stored
# PLEASE ADJUST if not using Google Colab or if you want to store the data elsewhere

## Some ML terminology 
*Note:* feel free to skip and come back to this as needed
- **Model**: We will often refer to the neural network model simply as the "model". We use the model to learn a function. Here, we'll try to learn a function that maps the input data (e.g. SSTs from one day) to the output data (e.g. SSTs of the next day). A model is composed of multiple layers and non-linear activation functions.
- **Layer**: A layer is a subcomponent of a model that performs a specific function. For example, a layer might be a simple matrix multiplication (i.e. linear transformation of the input data). Other popular layers include convolutional layers, normalization layers (e.g. [LayerNorm](https://arxiv.org/abs/1607.06450)), self-attention layers, etc.
- **Activation functions** are key non-linearities that enable the model to learn complex functions. Without them, a stack of linear layers would collapse into a single linear (affine) function, no matter how many layers it has. For example, the [ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) and [GELU](https://arxiv.org/abs/1606.08415) activation functions are popular choices.
- **Model architecture and parameters**: A model is defined by its architecture (e.g. the number of layers, the type of layers, etc.) and its parameters (i.e. the learnable weights in each layer). Popular architectures include fully-connected neural networks (also known as MLPs, i.e. multi-layer perceptrons), [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory), convolutional neural networks (CNNs), [ResNet](https://arxiv.org/abs/1512.03385), [Transformer](https://arxiv.org/abs/1706.03762), [ConvNeXt](https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_A_ConvNet_for_the_2020s_CVPR_2022_paper.pdf), etc.
- **Training**: The process of learning the function. This is done by optimizing the model's parameters/weights to minimize the loss function.
- **Loss function**: A function that measures how well the model is performing. The loss function is used to optimize the model's parameters. Here, we'll use the mean squared error (MSE) between the model's predictions and the true SSTs.
- **Training loop**: The process of training the model. In each iteration of the training loop, we feed a batch (i.e. a subset) of the training data into the model, compute the loss, and update the model's parameters.
- **Batch size**: The number of data points/examples that are fed into the model at each iteration of the training loop.
- **Epoch**: One epoch means iterating over the training set once. We often train the model for multiple epochs to improve the model's performance.
- **Learning rate**: The step size used to update the model's parameters.
- **Optimizer**: The algorithm used to update the model's parameters. Here, we'll be using the [Adam](https://arxiv.org/abs/1412.6980) optimizer.
- **Hyperparameters**: The parameters that are used to define and set up the model and training loop (e.g. the model architecture, loss function, learning rate, etc.). These are often tuned on a validation set to improve the model's performance.
- **Channels**: The number of channels in a data example (also called the *feature* dimension, or the *hidden* dimension for intermediate layers). For example, an RGB image has 3 channels (one for each color). Here, we will be using SSTs only, so we will have 1 channel/feature.
- **Classification vs. regression**: ML differentiates between these two tasks. In classification, the model's output is a discrete label (e.g. a class, a word, classifying whether an extreme event happened or not, etc.). In regression, the model's output is a continuous value (e.g. a temperature).
- **Overfitting**: When the model performs well on the training data, but poorly on the validation data (as measured by the loss function or some other metric).
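
To make these terms concrete, here is a minimal, plain-PyTorch training loop on synthetic data (not the SST data used below). The toy dataset, the small MLP, and all hyperparameter values are illustrative assumptions; later in the notebook, PyTorch Lightning automates most of this loop for us.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (illustration only): 256 examples with 4 features each
X = torch.randn(256, 4)
y = X @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]]) + 0.1 * torch.randn(256, 1)

# Model: layers (nn.Linear) interleaved with a non-linear activation function (nn.GELU), i.e. a small MLP
model = nn.Sequential(nn.Linear(4, 32), nn.GELU(), nn.Linear(32, 1))
loss_function = nn.MSELoss()                                # loss function for regression
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # optimizer with its learning rate
batch_size, num_epochs = 32, 20                             # further hyperparameters

epoch_losses = []
for epoch in range(num_epochs):                 # one epoch = one full pass over the training set
    permutation = torch.randperm(len(X))        # shuffle the training examples every epoch
    for start in range(0, len(X), batch_size):  # one iteration of the training loop per batch
        idx = permutation[start:start + batch_size]
        y_hat = model(X[idx])                   # forward pass: predictions for this batch
        loss = loss_function(y_hat, y[idx])     # how far off are the predictions?
        optimizer.zero_grad()                   # clear the gradients of the previous iteration
        loss.backward()                         # compute gradients of the loss w.r.t. the parameters
        optimizer.step()                        # update the parameters
    with torch.no_grad():                       # track the full-dataset loss after each epoch
        epoch_losses.append(loss_function(model(X), y).item())

print(f"MSE after epoch 1: {epoch_losses[0]:.3f}, after epoch {num_epochs}: {epoch_losses[-1]:.3f}")
```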


In [2]:
# Import libraries
import os
import time
import multiprocessing
from typing import List, Tuple, Dict, Any, Union, Optional
import numpy as np
import xarray as xr
import torch
import torchmetrics
import pytorch_lightning as pl

## Data

The creation of a dataset for ML training is often an iterative process.
It is important to understand the data and its sources, to ensure that the data is of high quality, and to check that it is representative of the problem domain.

#### Splitting the data
It is important to split the data into *separate* training, validation, and test sets.
- The training set is used to train the model (i.e. learn the weights/parameters of the model)
- The validation set is used to tune the model hyperparameters, to detect (and thereby prevent) overfitting, and to select the best model(s).
- The test set is used to evaluate the performance of the selected model(s) on data that the model has not seen before.

**Super important:** For temporal data *always* split your data temporally!
E.g.:
- In forecasting, you may use data from 2010-2019, 2020, and 2021 for training, validation, and testing, respectively.
- Or, in the exemplary task of evaluating how robust an ML model is to changing atmospheric conditions due to the Mt. Pinatubo eruption in 1991, one might train on data from 1980-2000, taking care to remove not only 1991 but also 1992-93 from the training set, since the eruption is known to have had long-lasting effects on the climate.
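
A minimal sketch of both kinds of temporal split with xarray, using a synthetic daily series in place of real data (the variable names and the random values are illustrative assumptions):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series standing in for real data (random values, illustration only)
dates = pd.date_range('1980-01-01', '2021-12-31', freq='D')
da = xr.DataArray(np.random.rand(len(dates)), coords={'time': dates}, dims='time', name='sst')

# Forecasting: contiguous, non-overlapping periods for training, validation, and testing
splits = {
    'train': da.sel(time=slice('2010-01-01', '2019-12-31')),
    'val': da.sel(time=slice('2020-01-01', '2020-12-31')),
    'test': da.sel(time=slice('2021-01-01', '2021-12-31')),
}
for name, subset in splits.items():
    print(name, subset.sizes['time'], 'days')

# Pinatubo example: train on 1980-2000, but drop the years 1991-1993 entirely
train_pinatubo = da.sel(time=slice('1980-01-01', '2000-12-31'))
train_pinatubo = train_pinatubo.sel(time=~train_pinatubo.time.dt.year.isin([1991, 1992, 1993]))
print('train (Pinatubo example):', train_pinatubo.sizes['time'], 'days')
```

Note that the splits are defined by calendar periods, never by randomly sampling days, so that strongly autocorrelated neighboring days cannot end up in different splits.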

#### Data issues
Some common issues with data are:
- ***Data imbalance*** occurs when some data slices/classes are more common than others in the training dataset. This can lead the model to learn to predict the more common data slices and perform poorly on the less common data slices (e.g. extreme events). Ways to address this issue include:
    - *Oversampling* the less common data slices or *undersampling* the more common data slices during training
    - *Weighting the loss function* to give more importance to the less common data slices
    - Learning a *probabilistic model* (e.g. a generative adversarial network (GAN), which usually produces more realistic samples than a deterministic neural network)
- ***Data scarcity*** occurs when there is not enough data to accurately train a model. Ways to address this issue include:
    - *Transfer learning* (i.e. fine-tuning/using a model pre-trained on a similar problem domain)
    - *Data augmentation* (e.g. by adding noise to the data)
    - *Simpler models* (e.g. by using a model with fewer parameters, or other model classes such as linear models, random forests etc.)
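
The first two remedies for data imbalance can be sketched in PyTorch as follows. This assumes a hypothetical binary label marking "extreme" examples in synthetic data; `WeightedRandomSampler` oversamples the rare class, and the small `weighted_mse` helper (our own illustration, not a library function) up-weights the errors of rare examples in the loss.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Synthetic, imbalanced dataset: 950 'normal' examples (label 0) and 50 'extreme' examples (label 1)
is_extreme = torch.cat([torch.zeros(950), torch.ones(50)]).long()
x = torch.randn(1000, 1)
dataset = TensorDataset(x, is_extreme)

# Oversampling: each example's sampling weight is inversely proportional to its class frequency
class_counts = torch.bincount(is_extreme)                  # tensor([950, 50])
sample_weights = 1.0 / class_counts[is_extreme].double()
sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=100, sampler=sampler)  # a sampler replaces shuffle=True

labels_seen = torch.cat([labels for _, labels in loader])
frac_extreme = labels_seen.float().mean().item()
print(f"Fraction of extreme examples per epoch: {frac_extreme:.2f} (0.05 in the raw data)")

# Loss weighting: a hypothetical weighted MSE that up-weights the errors of rare examples
def weighted_mse(y_hat: torch.Tensor, y: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    return (weights * (y_hat - y) ** 2).sum() / weights.sum()

# E.g. weight 1 for normal and 19 (= 950 / 50) for extreme examples
example_weights = 1.0 + 18.0 * is_extreme.float()
print(weighted_mse(torch.tensor([0.0, 0.0]), torch.tensor([1.0, 3.0]), torch.tensor([1.0, 3.0])))  # (1*1 + 3*9) / 4 = 7.0
```

Oversampling changes what the model sees, whereas loss weighting changes how much each error counts; both shift the model's attention toward the rare slices, at the risk of overfitting to the few rare examples available.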

#### Data preprocessing
Data preprocessing is the process of transforming the raw data into a form that is more suitable for ML training.
It is usually iterative, and often the most time-consuming part of the ML pipeline.
Some common preprocessing steps are:
- ***Standardization & Normalization:*** Usually one subtracts the mean and divides by the standard deviation. For climate data, you often want to use daily/monthly/seasonal means and standard deviations to *remove the seasonal cycle*. Variables that follow a skewed distribution (e.g. precipitation) are commonly transformed first, e.g. with the *log* function, before being standardized.
- ***Missing data and Outliers:*** Missing data and outliers can be handled by *removing* the corresponding data slices. One may also resort to *imputing* the missing data (e.g. by using the mean value), or *clipping* the outliers (e.g. by setting the values of the outliers to some minimum/maximum value).
- ***Reshaping Dimensions:*** Neural networks expect their input data to be in very specific shapes and ordered dimensions. For example:
    - 2D convolutional neural networks (CNNs) expect the input data to be of shape `(batch_size, channels, height, width)`. Note that for gridded climate data, ``height = latitude``, ``width = longitude``, and ``channels`` might consist of one or more predictors (e.g. SSTs, radiative fluxes, etc.)
    - An MLP expects its input data to be of shape `(batch_size, features)`. Note that an MLP can always be applied by flattening all dimensions into a single feature dimension, e.g. for the CNN example above: ``features = height * width * channels``
    - Transformers usually expect the input data to be of shape `(batch_size, sequence_length, features)`. Originally, the sequence length referred to the number of tokens (think of words). For image and climate data, one can build the sequence dimension by flattening the spatial dimensions, e.g. for the CNN example above: ``sequence_length = height * width = lat * lon`` and ``features = channels``. However, other choices are possible, such as patching the image/grid into multiple sub-images/grids and using the patches as the sequence dimension (see e.g. [ViT](https://arxiv.org/abs/2010.11929), or NVIDIA's weather forecasting model, [FourCastNet](https://arxiv.org/abs/2202.11214)).
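
A sketch of seasonal-cycle standardization and of the three input layouts, on a synthetic daily field. The sinusoidal seasonal cycle, the 8x8 grid, and the years are assumptions for illustration only; the climatology is grouped by day of year with xarray.

```python
import numpy as np
import pandas as pd
import torch
import xarray as xr

# Synthetic daily 'SST' field with a seasonal cycle, dims (time, lat, lon); 2001-2003 contains no leap days
rng = np.random.default_rng(0)
dates = pd.date_range('2001-01-01', '2003-12-31', freq='D')
seasonal_cycle = 5 * np.sin(2 * np.pi * dates.dayofyear.values / 365)[:, None, None]
sst = xr.DataArray(20 + seasonal_cycle + rng.standard_normal((len(dates), 8, 8)),
                   coords={'time': dates}, dims=('time', 'lat', 'lon'))

# Day-of-year climatology (per grid point). In practice, compute it on the TRAINING period only
# and reuse it for all splits, so that no information leaks from the validation/test years.
clim_mean = sst.groupby('time.dayofyear').mean('time')   # dims (dayofyear, lat, lon)
clim_std = sst.groupby('time.dayofyear').std('time')
doy = sst.time.dt.dayofyear                              # day of year of every time step
anomalies = (sst - clim_mean.sel(dayofyear=doy)) / clim_std.sel(dayofyear=doy)  # seasonal cycle removed
anomalies = anomalies.drop_vars('dayofyear', errors='ignore').transpose('time', 'lat', 'lon')

# The same data in the layouts expected by different architectures
x = torch.from_numpy(anomalies.values).float()        # (time, lat, lon)
x_cnn = x.unsqueeze(1)                                # CNN: (batch, channels=1, height=lat, width=lon)
x_mlp = x_cnn.flatten(start_dim=1)                    # MLP: (batch, features = channels * lat * lon)
x_seq = x_cnn.flatten(start_dim=2).transpose(1, 2)    # Transformer: (batch, sequence_length = lat * lon, features = channels)
print(x_cnn.shape, x_mlp.shape, x_seq.shape)
```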

In [3]:
# Each file contains daily SST data from 1982 to 2022 for one 60x60 (lat, lon) grid box
xr.open_dataset(f'{DATA_DIR}/sst.day.mean.box85.nc').sst  # display an example SST box

In [4]:
from torch.utils.data import DataLoader, TensorDataset
import dask

class OISSTv2DataModule(pl.LightningDataModule):
    """
    Data module for the OISSTv2 dataset of daily sea surface temperatures.
    A data module encapsulates the data loading, preprocessing, and data splits needed to train, validate, and test a neural network model.
    The resulting PyTorch-ready TensorDatasets are saved in the self._data_train, self._data_val, and self._data_test attributes when calling the setup() function.
    """
    _data_train: TensorDataset
    _data_val: TensorDataset
    _data_test: TensorDataset

    def __init__(self,
                 data_dir: str,
                 horizon: int = 1,
                 batch_size: int = 32,
                 eval_batch_size: int = 64,
                 ):
        """
        Args:
            data_dir (str):  A path to the data folder that contains the input and output files.
            horizon (int): The number of time steps to predict into the future (e.g. 1 for 1 day ahead prediction).
            batch_size (int): Batch size for the training dataloader
            eval_batch_size (int): Batch size for the validation and test data loaders
        """
        super().__init__()
        # The following saves all arguments to self.hparams (e.g. self.hparams.horizon)
        self.save_hyperparameters()

        # Define the temporal slices that will be used for training, validating, and testing
        # Important: training data should not contain data that is temporally near the test data (to avoid information leakage through temporal autocorrelation)
        self.train_slice = slice(None, '2018-12-31')
        self.val_slice = slice('2019-01-01', '2019-12-31')
        self.test_slice = slice('2020-01-01', '2021-12-31')
        # Set the currently non-initialized tensor datasets for training, validating, testing
        self._data_train = self._data_val = self._data_test = None


    def setup(self, stage: Optional[str] = None):
        """ Setup data. Set internal variables: self._data_train, self._data_val, self._data_test."""
        if all(d is not None for d in (self._data_train, self._data_val, self._data_test)):
            return  # Already set up, no need to do it again
        # A small auxiliary function to preprocess the netcdf4 data
        def drop_lat_lon_info(ds: xr.Dataset) -> xr.Dataset:
            """ Replace the latitude and longitude coordinates with integer indices so that the datasets of
             different grid boxes can be concatenated along a new grid_box dimension instead of being aligned on (lat, lon). """
            dummy_lat = np.arange(ds.sizes['lat'])
            dummy_lon = np.arange(ds.sizes['lon'])
            return ds.assign_coords(lat=dummy_lat, lon=dummy_lon)

        # Read all 60x60 boxes into a single xarray dataset
        ds = xr.open_mfdataset(
            paths=os.path.join(self.hparams.data_dir, 'sst.day.mean.box*.nc'),
            combine='nested', concat_dim='grid_box', preprocess=drop_lat_lon_info
        ).sst

        # Split the dataset into training, validation, and testing
        data_splits = {
            'train': ds.sel(time=self.train_slice),
            'val': ds.sel(time=self.val_slice),
            'test': ds.sel(time=self.test_slice)
        }
        # Create a TensorDataset for each split (here, we perform the same preprocessing for each split)
        for split_name, split_data_subset in data_splits.items():
            # Split ds into inputs and targets (targets is horizon time steps ahead of inputs)
            inputs  = split_data_subset.isel(time=slice(None, -self.hparams.horizon))
            targets = split_data_subset.isel(time=slice(self.hparams.horizon, None))
            # Dimensions of inputs and targets: (grid_box, time, lat, lon)

            def transform(x: xr.DataArray) -> torch.Tensor:
                """ Transform the input and target data to the desired format. """
                with dask.config.set(**{'array.slicing.split_large_chunks': False}):
                    x = x.stack(examples=('time', 'grid_box'))  # Merge the time and grid_box dimensions into a single example dimension (new dimensions: (examples, lat, lon))
                x = x.transpose('examples', 'lat', 'lon').values   # Reorder/Reshape dimensions and convert to numpy array
                x = np.expand_dims(x, axis=1)    # Add a dummy channel dimension (needed for CNNs, Transformers, etc.)
                # Dimensions of x: (examples, channel, lat, lon) = (examples, 1, 60, 60)
                x = torch.from_numpy(x).float()  # Convert to PyTorch tensor
                return x

            # Transform the inputs and targets (in this case, the same transformation is applied to both)
            inputs = transform(inputs)
            targets = transform(targets)

            # Create the pytorch tensor dataset, which will return a tuple of (input, target) when indexed
            tensor_ds = TensorDataset(inputs, targets)
            setattr(self, f'_data_{split_name}', tensor_ds)  # Save the tensor dataset to self._data_{split_name}

    # ----- Data loaders -----
    # Basically, data loaders just wrap the corresponding TensorDatasets in a PyTorch DataLoader (e.g. defining the batch size)
    # Important: You usually shuffle the training data, but not the validation and test data!
    def _shared_dataloader_kwargs(self) -> dict:
        return dict(num_workers=multiprocessing.cpu_count(), pin_memory=True)  # Use multiprocessing and pin memory for faster data loading

    def train_dataloader(self) -> DataLoader:
        return DataLoader(
            dataset=self._data_train,
            batch_size=self.hparams.batch_size,
            shuffle=True,
            **self._shared_dataloader_kwargs(),
        )

    def _shared_evaluation_dataloader_kwargs(self) -> dict:
        # Disable shuffling and potentially use a larger batch size for evaluation
        return dict(**self._shared_dataloader_kwargs(), batch_size=self.hparams.eval_batch_size, shuffle=False)

    def val_dataloader(self) -> DataLoader:
        return DataLoader(dataset=self._data_val, **self._shared_evaluation_dataloader_kwargs())

    def test_dataloader(self) -> DataLoader:
        return DataLoader(dataset=self._data_test, **self._shared_evaluation_dataloader_kwargs())
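
To see what `setup()` produces, the sketch below re-implements the same input/target shift and `transform` steps on a tiny synthetic array (2 grid boxes, 10 days, a 4x4 grid; all sizes are made up). Each value encodes its time index plus 100 times its grid-box index, so we can check that every target lies exactly `horizon` days after its input, in the same grid box.

```python
import numpy as np
import pandas as pd
import torch
import xarray as xr
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the merged boxes: dims (grid_box, time, lat, lon) = (2, 10, 4, 4)
n_box, n_time, horizon = 2, 10, 1
values = (np.arange(n_time)[None, :, None, None] + 100 * np.arange(n_box)[:, None, None, None]) * np.ones((1, 1, 4, 4))
ds = xr.DataArray(values, dims=('grid_box', 'time', 'lat', 'lon'),
                  coords={'grid_box': np.arange(n_box), 'time': pd.date_range('2019-01-01', periods=n_time)})

# Same steps as in OISSTv2DataModule.setup()
inputs = ds.isel(time=slice(None, -horizon))
targets = ds.isel(time=slice(horizon, None))

def transform(x: xr.DataArray) -> torch.Tensor:
    x = x.stack(examples=('time', 'grid_box')).transpose('examples', 'lat', 'lon').values
    return torch.from_numpy(np.expand_dims(x, axis=1)).float()  # add the channel dimension

X, Y = transform(inputs), transform(targets)
print(X.shape)               # (examples, channel, lat, lon) = (9 * 2, 1, 4, 4)
print(torch.unique(Y - X))   # every target is `horizon` days after its input, in the same grid box

loader = DataLoader(TensorDataset(X, Y), batch_size=4, shuffle=True)
x_batch, y_batch = next(iter(loader))
print(x_batch.shape, y_batch.shape)
```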


## Define the model

In [5]:
import torch.nn as nn

class ConvBlock(nn.Module):
    """ A simple convolutional block with BatchNorm and GELU activation. """
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3, stride: int = 1, padding: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.norm = nn.BatchNorm2d(out_channels)  # a normalization layer for improved/more stable training
        self.activation = nn.GELU()  # a non-linearity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        x = self.norm(x)
        x = self.activation(x)
        return x

class ConvNet(nn.Module):
    """ A simple convolutional network. """

    def __init__(self, channels_in, channels_out, channels_hidden):
        super().__init__()
        dim = channels_hidden
        # Define the convolutional layers
        self.conv1 = ConvBlock(channels_in, dim, kernel_size=3, padding=1)
        self.conv2 = ConvBlock(dim, dim, kernel_size=3, padding=1)
        self.conv3 = ConvBlock(dim, dim // 2, kernel_size=3, padding=1)
        self.head = nn.Conv2d(dim // 2, channels_out, kernel_size=1, padding=0)


    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h1 = self.conv1(x)
        h2 = self.conv2(h1)
        h2 = h1 + h2  # Residual connection
        h3 = self.conv3(h2)
        h4 = self.head(h3)
        return h4
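
As a sanity check on the architecture, the number of trainable parameters can be counted by hand. A `Conv2d` layer with bias has $C_{in} C_{out} k^2 + C_{out}$ parameters, and a `BatchNorm2d` layer over $C$ channels has $2C$ learnable parameters (a scale and a shift per channel; its running statistics are buffers, not parameters). With `channels_hidden = 64`, as used when instantiating the model in the training section:

$$
\begin{aligned}
\texttt{conv1}&: 1 \cdot 64 \cdot 3^2 + 64 + 2 \cdot 64 = 768\\
\texttt{conv2}&: 64 \cdot 64 \cdot 3^2 + 64 + 2 \cdot 64 = 37{,}056\\
\texttt{conv3}&: 64 \cdot 32 \cdot 3^2 + 32 + 2 \cdot 32 = 18{,}528\\
\texttt{head}&: 32 \cdot 1 \cdot 1^2 + 1 = 33\\
\text{total}&: 768 + 37{,}056 + 18{,}528 + 33 = 56{,}385
\end{aligned}
$$

This agrees with the 56.4 K trainable parameters in the model summary and the logged `num_params` of 56385.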

    

## Define the LightningModule (i.e. how to train the model)

In [6]:
class LitConvNet(pl.LightningModule):
    def __init__(self, hidden_dim: int = 32, learning_rate: float = 1e-3, **kwargs):
        super().__init__()
        # Save the hyperparameters to self.hparams
        self.save_hyperparameters()

        # Define the neural network architecture
        # channels_in = channels_out = 1 because inputs and outputs consist of a single variable, i.e. SSTs
        self.model = ConvNet(channels_in=1, channels_out=1, channels_hidden=hidden_dim)  

        # The loss function. The mean squared error is the usual go-to for regression
        self.loss_function = nn.MSELoss()

        # Some metrics to track during/after training. Here you should add any metrics that help you judge the model's performance on evaluation data.
        self.val_metrics = nn.ModuleDict({'val/mae': torchmetrics.MeanAbsoluteError()})
        self.test_metrics = nn.ModuleDict({'test/mae': torchmetrics.MeanAbsoluteError()})

        self._start_epoch_time = None  # To track how long each training epoch takes

    # ---------------------------------------------- Training methods START
    def on_train_start(self):
        # Compute the number of parameters in the model and log it
        num_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
        self.log('num_params', float(num_params))

    def on_train_epoch_start(self):
        self._start_epoch_time = time.time()

    def training_step(self, batch: Tuple[torch.Tensor, torch.Tensor], batch_idx: int) -> dict:
        """ Defines a single training loop/iteration over one batch of training data. """
        x, y = batch   # x = inputs, y = targets (or ground truth)
        y_hat = self.model(x)  # y_hat = model predictions
        loss = self.loss_function(y_hat, y)
        self.log('train/loss', loss, on_step=True, on_epoch=True, prog_bar=True)
        # The returned loss below will be used by PyTorch Lightning to update the model weights
        return {'loss': loss}  

    def on_train_epoch_end(self):
        # Note: on_train_epoch_end (unlike training_epoch_end) is also supported by PyTorch Lightning >= 2.0
        train_time = time.time() - self._start_epoch_time  # Time spent in the training epoch
        self.log("time_train", train_time)
    # ------------------------------------------------ Training methods END

    def evaluation_step(self, batch: Tuple[torch.Tensor, torch.Tensor],
                        batch_idx: int,
                        torch_metrics: nn.ModuleDict,
                        split_name: str, # 'val' or 'test'
                        **kwargs
                        ) -> Dict[str, float]:
        """ Defines a single evaluation loop/iteration over one batch of data. """
        x, y = batch
        y_hat = self.model(x)  # predict with the model
        log_dict = {f'{split_name}/loss': self.loss_function(y_hat, y)}
        # Compute metrics
        for metric_name, metric in torch_metrics.items():
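            # Keep these two steps separate: calling the metric updates its internal state with this batch,
            # and logging the metric *object* (not a value) lets Lightning compute and reset it at the end of each epoch
            metric(y_hat, y)
            log_dict[metric_name] = metric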
        self.log_dict(log_dict, on_step=False, on_epoch=True, **kwargs)  # log metric objects
        return log_dict

    # Here, we will use the same evaluation loop for validation and testing
    def validation_step(self, batch: Tuple[torch.Tensor, torch.Tensor], batch_idx: int) -> Dict[str, float]:
        return self.evaluation_step(batch, batch_idx, split_name='val', torch_metrics=self.val_metrics, prog_bar=True)

    def test_step(self, batch: Tuple[torch.Tensor, torch.Tensor], batch_idx: int) -> Dict[str, float]:
        return self.evaluation_step(batch, batch_idx, split_name='test', torch_metrics=self.test_metrics)

    def predict_step(self, batch, batch_idx):
        """ For prediction/inference - might be test (or any other kind of) data. """
        x, y = batch
        y_hat = self.model(x)  # predict with the model
        return {'preds': y_hat, 'targets': y}

    def configure_optimizers(self) -> torch.optim.Optimizer:
        """ Define which optimization algorithm to use """
        return torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)

## Logging and experiment tracking with Wandb

In [7]:
import wandb
# We'll use Weights & Biases (wandb) to log the training/validation loss (+other metrics), as well as the used hyperparameters
run = wandb.init(project='GCC-ClimateML-workshop',
                 name='OISSTv2-SimpleConvNet',  # name of the training/model run
                 anonymous='allow')  # so that you don't need to create an account for running this notebook
# For convenience, we'll visualize the logged metrics directly below
# Note that it can take a short while until the charts show below...
# ... when they do show, keep an eye out for the 'train/loss_epoch' and 'val/loss' charts
run.display(height=720)

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc




True

## Training the model
Now, we just need to instantiate the model and datamodule from above with specific (hyper-)parameters.

In [8]:
model = LitConvNet(hidden_dim=64, learning_rate=5e-5)
datamodule = OISSTv2DataModule(data_dir=DATA_DIR, batch_size=64)
accelerator = 'gpu' if torch.cuda.is_available() else 'cpu'   # GPUs will make training much faster!
print("Using accelerator:", accelerator)  # This should print 'gpu' if a GPU is available (e.g. a Colab GPU runtime)

Using accelerator: gpu


As a last step, we will define a ``Trainer`` instance. This is a useful PyTorch Lightning abstraction that runs the training/evaluation loops for you and takes care of many other helpful things, such as easily running on a GPU (or multiple of them!).
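
Roughly, the `accumulate_grad_batches` and `gradient_clip_val` arguments used below correspond to the following plain-PyTorch loop. This is a simplified sketch on random data with a hypothetical one-layer stand-in model, not Lightning's actual implementation (which also handles devices, logging, validation, leftover batches at the end of an epoch, etc.). With a dataloader batch size of 4 and 8 accumulated batches, each optimizer step sees an effective batch of 32 examples; in this notebook, it is 64 * 32 = 2048.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Hypothetical stand-ins (random data, one-layer model), named so as not to overwrite `model` above
toy_model = nn.Conv2d(1, 1, kernel_size=3, padding=1)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
toy_loss_function = nn.MSELoss()
toy_loader = DataLoader(TensorDataset(torch.randn(64, 1, 8, 8), torch.randn(64, 1, 8, 8)), batch_size=4, shuffle=True)

accumulate_grad_batches, gradient_clip_val, max_epochs = 8, 1.0, 2
num_optimizer_steps = 0
for epoch in range(max_epochs):
    toy_optimizer.zero_grad()
    for batch_idx, (x, y) in enumerate(toy_loader):
        # Scale the loss so that the accumulated gradient is an average over the accumulated batches
        loss = toy_loss_function(toy_model(x), y) / accumulate_grad_batches
        loss.backward()  # gradients are summed into .grad across backward() calls
        if (batch_idx + 1) % accumulate_grad_batches == 0:
            # Rescale all gradients jointly if their total norm exceeds gradient_clip_val
            torch.nn.utils.clip_grad_norm_(toy_model.parameters(), max_norm=gradient_clip_val)
            toy_optimizer.step()
            toy_optimizer.zero_grad()
            num_optimizer_steps += 1

print(num_optimizer_steps)  # 64 / 4 = 16 batches per epoch -> 2 optimizer steps per epoch -> 4 in total
```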

In [9]:
trainer = pl.Trainer(max_epochs=5,  # How many epochs to run (runs over the full training set)
                     accelerator=accelerator, devices=1,   # CPU or GPU or GPUs
                     accumulate_grad_batches=32,   # Accumulate gradients over 32 batches per optimizer step, i.e. an effective batch size of 64 * 32 = 2048
                     gradient_clip_val=1.0,    # Another trick that stabilizes training by rescaling the gradients whenever their norm exceeds 1.0
                     logger=pl.loggers.WandbLogger(anonymous='allow', log_model=True)  # Use the wandb logger to log the train/val losses/metrics
                     )
trainer.fit(model, datamodule=datamodule)

  "There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse"
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name          | Type       | Params
---------------------------------------------
0 | model         | ConvNet    | 56.4 K
1 | loss_function | MSELoss    | 0     
2 | val_metrics   | ModuleDict | 0     
3 | test_metrics  | ModuleDict | 0     
---------------------------------------------
56.4 K    Trainable params
0         Non-trainable params
56.4 K    Total params
0.226     Total estimated model params size (MB)


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.


## Test the model

In [10]:
trainer.test(model, datamodule=datamodule)
wandb.finish()

INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test/loss          0.050045158714056015
        test/mae             0.159410759806633
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────



Run history:
epoch                ▁▁▁▂▂▂▄▄▄▅▅▅▅▇▇▇█
num_params           ▁
test/loss            ▁
test/mae             ▁
time_train           █▃▂▁▁
train/loss_epoch     █▁▁▁▁
train/loss_step      ▂▃▅█▁▃
trainer/global_step  ▁▁▁▂▃▃▃▅▅▅▆▆▆▇███
val/loss             █▃▂▆▁
val/mae              █▃▂▇▁

Run summary:
epoch                5.0
num_params           56385.0
test/loss            0.05005
test/mae             0.15941
time_train           81.55572
train/loss_epoch     0.13401
train/loss_step      0.12711
trainer/global_step  330.0
val/loss             0.06741
val/mae              0.18554


In [11]:
val_results = trainer.predict(model, dataloaders=datamodule.val_dataloader())
# val_results is a list with one dictionary per batch, holding that batch's predictions ('preds') and targets ('targets')
# E.g. access the predictions of the third batch with val_results[2]['preds']

INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Predicting: 2112it [00:00, ?it/s]
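
To evaluate the predictions as a whole rather than batch by batch, the per-batch dictionaries can be concatenated. The sketch uses a hypothetical stand-in list, `example_results`, filled with random tensors of the same layout (one `{'preds', 'targets'}` dictionary per batch, each of shape `(batch, 1, 60, 60)`); with the real output, use `val_results` instead. Besides the overall MAE and RMSE, averaging the absolute errors over examples gives a per-grid-point error map.

```python
import torch

# Hypothetical stand-in for `val_results` (random tensors): one {'preds', 'targets'} dict per batch
torch.manual_seed(0)
example_results = [{'preds': torch.randn(b, 1, 60, 60), 'targets': torch.randn(b, 1, 60, 60)} for b in (64, 64, 56)]

preds = torch.cat([batch['preds'] for batch in example_results], dim=0)      # (num_examples, 1, 60, 60)
targets = torch.cat([batch['targets'] for batch in example_results], dim=0)
errors = preds - targets

mae = errors.abs().mean()                       # overall mean absolute error
rmse = errors.pow(2).mean().sqrt()              # overall root mean squared error
mae_map = errors.abs().mean(dim=0).squeeze(0)   # (60, 60): MAE at each grid point, averaged over all examples
print(preds.shape, f"MAE={mae.item():.3f}", f"RMSE={rmse.item():.3f}", mae_map.shape)
```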

In [15]:
val_results[0]['preds']

tensor([[[[ 0.8864,  0.7448,  0.8347,  ...,  0.6780,  0.6514,  0.4023],
          [ 1.0362,  0.9145,  0.8960,  ...,  0.6990,  0.6703,  0.5543],
          [ 1.1100,  0.9534,  0.8112,  ...,  0.7237,  0.6806,  0.5771],
          ...,
          [ 0.8871,  0.8013,  0.5501,  ...,  1.6312,  1.5555,  1.6974],
          [ 0.7308,  0.7892,  0.4464,  ...,  1.6631,  1.5696,  1.3546],
          [ 0.6321,  0.5918,  0.4389,  ...,  1.6022,  1.4698,  0.9617]]],


        [[[ 0.3242,  0.2615,  0.3293,  ...,  0.6173,  0.6314,  0.3993],
          [ 0.4343,  0.3999,  0.4142,  ...,  0.6010,  0.5795,  0.5244],
          [ 0.5099,  0.4714,  0.4209,  ...,  0.5725,  0.5889,  0.5356],
          ...,
          [ 1.4705,  1.3415,  1.2717,  ...,  0.7325,  0.8132,  0.8179],
          [ 1.0623,  1.2658,  1.1550,  ...,  0.7462,  0.8928,  0.7006],
          [ 0.9007,  0.9285,  1.0464,  ...,  0.7312,  0.7687,  0.5071]]],


        [[[ 0.4753,  0.4700,  0.5928,  ...,  0.7301,  0.7327,  0.4874],
          [ 0.5599,  0.550