# Sea ice forecasting using the IceNet library

## Context
### Purpose
This notebook demonstrates how to use the [IceNet library](https://pypi.org/project/icenet/) to forecast sea ice with a model trained on climate reanalysis and observational data.

### Description
[IceNet](https://github.com/icenet-ai/icenet/) is a Python library that covers the sea ice forecasting workflow end to end: downloading data, processing it, training a model and producing predictions. Users can interact with IceNet either through the Python interface or through a set of command-line interfaces (CLI) that provide high-level access to the same functionality.

This notebook uses the Python API to forecast sea ice from a reduced dataset, chosen to keep the demonstration small.

### Modelling approach
IceNet is a probabilistic, deep learning sea ice forecasting system. It uses an ensemble of U-Net networks to generate daily forecasts of sea ice conditions, trained on climate reanalysis and sea ice observational data at 25 km resolution. The original IceNet research model, published in [Nature Communications](https://www.nature.com/articles/s41467-021-25257-4), was trained on climate simulations and observational data to forecast the next 6 months of monthly-averaged sea ice concentration maps. That version advanced the range of accurate sea ice forecasts, outperforming a state-of-the-art dynamical model (ECMWF SEAS5) in seasonal forecasts of summer sea ice, particularly for extreme sea ice events.

### Highlights

 * [1. Setup](#Setup): Set up the environment and project structure.
 * [2. Download](#Download): Download sea ice concentration data for training, using the built-in downloader, for the first quarter of 2020.
 * [3. Process](#Process): Process the downloaded data by renormalising variables as needed and generating cached datasets to speed up training.
 * [4. Train](#Train): Train the neural network and save a checkpoint of the best result.
 * [5. Predict](#Predict): Predict for defined dates.
 * [6. Visualisation](#Visualisation): Visualise the prediction output.

### Contributions

#### Notebook
* James Byrne (Notebook Author), British Antarctic Survey, [@JimCircadian](https://github.com/JimCircadian)
* Bryn Noel Ubald (Notebook Author), British Antarctic Survey, [@bnubald](https://github.com/bnubald)

#### Modelling codebase
* James Byrne (Code author)
* Tom Andersson (Science author)

__Please raise issues [in this repository](https://github.com/icenet-ai/icenet-notebooks/issues) to suggest updates to this notebook!__ 

Contact me at _jambyr \<at\> bas.ac.uk_ for anything else...

#### Modelling publications
Andersson, T.R., Hosking, J.S., Pérez-Ortiz, M. et al. Seasonal Arctic sea ice forecasting with probabilistic deep learning. Nat Commun 12, 5124 (2021). https://doi.org/10.1038/s41467-021-25257-4

> [!NOTE]  
> 
> IceNet has developed significantly since the inclusion of the existing notebook (relates to issue #6) based on the work in the original paper.
>
> The original paper and notebook used a combination of climate simulations and observational data to forecast the next 6 months of monthly-averaged sea ice concentration. Since then, the original code has been refactored into a new library icenet as showcased in this notebook.
> This library supports sea ice forecasting on a daily resolution rather than monthly-averaged. It has been developed significantly since the original paper, and there are multiple ways of interacting with the library to help enable development of sea ice forecasting and model development.

#### Involved organisations
The Alan Turing Institute and British Antarctic Survey

___
# Setup

## Load libraries
Load some of the common libraries required.

In [1]:
import os
import random
import warnings
warnings.filterwarnings(action='ignore')

import numpy as np
import pandas as pd
import tensorflow as tf

# We also set the logging level so that we get some feedback from the API
import logging
logging.basicConfig(level=logging.INFO)

The following cell imports the IceNet classes needed for the downloaders. Instantiating these classes defines how IceNet interacts with the upstream APIs and data interfaces used to source each type of data.

In [2]:
from icenet.data.sic.mask import Masks
from icenet.data.interfaces.cds import ERA5Downloader
from icenet.data.sic.osisaf import SICDownloader

## Set project structure
*The cell below creates a separate folder to save the notebook outputs. This makes it easier for the reader to inspect inputs and outputs stored within a defined destination folder. Don't remove the lines below.*

In [3]:
notebook_folder = './notebook'
os.makedirs(notebook_folder, exist_ok=True)  # No error if the folder already exists

___
# Download

In this section, we download all required data with our extended date range. All downloaders inherit a `download` method from the `Downloader` class in [`icenet.data.producers`](https://github.com/icenet-ai/icenet/blob/main/icenet/data/producers.py), which also contains two other data-producing classes, `Generator` (which `Masks` inherits from) and `Processor` (used in the next section), each providing abstract implementations that multiple classes derive from.
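The inheritance pattern described above can be pictured with a toy sketch. The class names below (`ToyProducer`, `ToyDownloader`, `ToySICDownloader`) and their attributes are hypothetical stand-ins, not IceNet's real classes; the sketch only illustrates how an abstract base can require a `download` method that concrete downloaders then implement.

```python
from abc import ABC, abstractmethod


class ToyProducer(ABC):
    """Hypothetical base class: records an identifier and a data path."""

    def __init__(self, identifier, path="./data"):
        self.identifier = identifier
        self.path = path


class ToyDownloader(ToyProducer):
    """Hypothetical abstract downloader: subclasses must implement download()."""

    @abstractmethod
    def download(self):
        """Fetch data and return the list of files written."""


class ToySICDownloader(ToyDownloader):
    """Hypothetical concrete downloader for a list of date strings."""

    def __init__(self, dates, **kwargs):
        super().__init__("toy_sic", **kwargs)
        self.dates = list(dates)

    def download(self):
        # No network access here: only report which files would be produced
        return [f"{self.path}/{self.identifier}/{d}.nc" for d in self.dates]


files = ToySICDownloader(["2020-01-01", "2020-01-02"]).download()
print(files)
```

Because `ToyDownloader` leaves `download` abstract, trying to instantiate it directly raises a `TypeError`, which is how an abstract base enforces a common interface on its subclasses.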

### Mask data

We start by generating the masks used for training and prediction. These cover regions where sea ice does not form, land regions, and the [polar hole](https://blogs.egu.eu/divisions/cr/2016/10/14/image-of-the-week-the-polar-hole/).
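To make the idea of combining several masks concrete, here is a minimal sketch on a toy 3x4 grid. The grids, the variable names and the `combine_masks` helper are all hypothetical; the real masks are produced by `Masks.generate` and may be combined differently inside IceNet.

```python
# Toy 3x4 grids (hypothetical values); True marks a cell to exclude
land = [
    [True, False, False, False],
    [False, False, False, False],
    [False, False, False, True],
]
no_ice = [
    [False, False, False, True],
    [False, False, False, False],
    [True, False, False, False],
]
polar_hole = [
    [False, False, False, False],
    [False, True, True, False],
    [False, False, False, False],
]


def combine_masks(*masks):
    """Return a grid that is True only where no input mask excludes the cell."""
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    return [
        [not any(mask[r][c] for mask in masks) for c in range(n_cols)]
        for r in range(n_rows)
    ]


usable = combine_masks(land, no_ice, polar_hole)
n_usable = sum(cell for row in usable for cell in row)
print(n_usable, "of", len(usable) * len(usable[0]), "cells usable")
```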

In [4]:
masks = Masks(north=False, south=True)
masks.generate(save_polarhole_masks=False)

INFO:root:siconca ice_conc_sh_ease2-250_cdr-v2p0_200001021200.nc already exists
INFO:root:Saving ./data/masks/south/masks/active_grid_cell_mask_01.npy
INFO:root:siconca ice_conc_sh_ease2-250_cdr-v2p0_200002021200.nc already exists
INFO:root:Saving ./data/masks/south/masks/active_grid_cell_mask_02.npy
INFO:root:siconca ice_conc_sh_ease2-250_cdr-v2p0_200003021200.nc already exists
INFO:root:Saving ./data/masks/south/masks/active_grid_cell_mask_03.npy
INFO:root:siconca ice_conc_sh_ease2-250_cdr-v2p0_200004021200.nc already exists
INFO:root:Saving ./data/masks/south/masks/active_grid_cell_mask_04.npy
INFO:root:siconca ice_conc_sh_ease2-250_cdr-v2p0_200005021200.nc already exists
INFO:root:Saving ./data/masks/south/masks/active_grid_cell_mask_05.npy
INFO:root:siconca ice_conc_sh_ease2-250_cdr-v2p0_200006021200.nc already exists
INFO:root:Saving ./data/masks/south/masks/active_grid_cell_mask_06.npy
INFO:root:siconca ice_conc_sh_ease2-250_cdr-v2p0_200007021200.nc already exists
INFO:root:Savi

## Climate and Ocean data

Climate and ocean data are obtained from the [Climate Data Store (CDS)](https://cds.climate.copernicus.eu/).

The climate data used for training is from the [ERA5](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview) reanalysis, which covers the global climate from 1940 to the present.

The ocean data is from [ORAS5](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-oras5?tab=overview), which also uses a reanalysis approach and contains global ocean and sea-ice reanalysis data.

Since both are reanalysis products, they combine physical models with observational data. As a result, the downloaded data has no temporal or spatial gaps. Both datasets are produced by the reanalysis systems of ECMWF (European Centre for Medium-Range Weather Forecasts).

Please see the above links for more details on these datasets.

The IceNet downloader for this data uses the CDS API, which requires registration and configuration of a token before downloading. Registration is free; see [here](https://cds.climate.copernicus.eu/api-how-to) for instructions on how to set this up.
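Before attempting a download, it can help to check that a CDS configuration file is in place. The sketch below assumes the plain-text layout with one `url:` line and one `key:` line that the CDS instructions describe for `~/.cdsapirc`. The helper names are hypothetical and the credentials are placeholders written to a temporary file.

```python
import os
import tempfile


def read_cds_config(path):
    """Parse `name: value` lines from a CDS API config file into a dict."""
    config = {}
    with open(path) as fh:
        for line in fh:
            if ":" in line:
                # Split on the first colon only, since URLs and keys contain colons
                name, value = line.split(":", 1)
                config[name.strip()] = value.strip()
    return config


def cds_config_looks_complete(path):
    """Return True if the file exists and defines both `url` and `key`."""
    if not os.path.exists(path):
        return False
    config = read_cds_config(path)
    return bool(config.get("url")) and bool(config.get("key"))


# Demonstrate on a throwaway file with placeholder credentials
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".cdsapirc")
    with open(demo_path, "w") as fh:
        fh.write("url: https://example.invalid/api\nkey: 00000:placeholder-key\n")
    demo_ok = cds_config_looks_complete(demo_path)
    demo_missing = cds_config_looks_complete(os.path.join(tmp, "absent"))

print(demo_ok, demo_missing)
```

To check a real setup, the same helper could be called as `cds_config_looks_complete(os.path.expanduser("~/.cdsapirc"))`. It only checks that the two entries are present, not that the key is valid.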

Assuming you have configured the CDS API key correctly, you will be able to download using the following IceNet class. Since the key is personal and should not be shared, the call to download an example dataset is shown below, but not used in this demonstrator notebook.

```python
era5 = ERA5Downloader(
    var_names=["tas", "zg", "uas", "vas"],      # Names of the variables to download
    dates=[                                     # Dates to download the variable data for
        pd.to_datetime(date).date()
        for date in pd.date_range("2020-01-01", "2020-04-30", freq="D")
    ],
    path="./data",                              # Location to download data to (default is `./data`)
    delete_tempfiles=True,                      # Whether to delete temporary downloaded files
    levels=[None, [250, 500], None, None],      # Pressure levels per variable (only `zg` uses them here)
    max_threads=4,                              # Maximum number of concurrent downloads
    north=False,                                # Whether to download data for the northern hemisphere
    south=True,                                 # Whether to download data for the southern hemisphere
    use_toolbox=False)                          # Experimental, alternative download method
era5.download()                                 # Start downloading
```

The `ERA5Downloader` inherits from `ClimateDownloader`, from which several implementations derive their functionality. Two particularly useful methods shown below allow the downloaded data to be converted to the same grid and orientation as the OSISAF sea-ice concentration (SIC) data in the next cell.

```python
era5.regrid()                                   # Map data onto common EASE2 grid
era5.rotate_wind_data()                         # Rotate wind data to correct orientation
```
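Wind components such as `uas` and `vas` are expressed relative to east and north, whereas the axes of a polar projection grid generally do not line up with those directions, so the vectors need rotating onto the grid. The following is a minimal sketch of rotating a single vector by a given angle. How the per-cell angle is obtained and which sign convention applies are assumptions left open here, and IceNet's own implementation may differ.

```python
import math


def rotate_vector(u, v, angle_rad):
    """Rotate the vector (u, v) anticlockwise by angle_rad radians."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return (u * cos_a - v * sin_a, u * sin_a + v * cos_a)


# A 10 m/s wind along the first axis, rotated by 90 degrees
u_rot, v_rot = rotate_vector(10.0, 0.0, math.radians(90))

# Rotation changes direction only, so the wind speed is unchanged
speed_before = math.hypot(10.0, 0.0)
speed_after = math.hypot(u_rot, v_rot)
print(round(v_rot, 6), speed_before, round(speed_after, 6))
```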

## Sea-ice concentration (SIC) data

The sea-ice concentration data used for training is obtained from [OSI SAF](https://osi-saf.eumetsat.int/products/sea-ice-products).

The SIC is defined as the fraction of a grid cell that is covered in sea-ice.
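As a small worked example of this definition, consider a grid cell at the nominal 25 km resolution mentioned earlier, which covers 625 km². The ice area used below is a made-up value for illustration, and the helper function is hypothetical.

```python
CELL_SIZE_KM = 25.0  # Nominal grid resolution mentioned in this notebook


def sea_ice_concentration(ice_area_km2, cell_area_km2):
    """Return the fraction of a grid cell covered by sea ice (0 to 1)."""
    if not 0 <= ice_area_km2 <= cell_area_km2:
        raise ValueError("ice area must lie between 0 and the cell area")
    return ice_area_km2 / cell_area_km2


cell_area = CELL_SIZE_KM ** 2                         # 625 km^2
sic_value = sea_ice_concentration(250.0, cell_area)   # Illustrative ice area
print(cell_area, sic_value)
```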

The `SICDownloader` class has an interface similar to that of the `ERA5Downloader` class.

In [5]:
sic = SICDownloader(
    dates=[
        pd.to_datetime(date).date()             # Dates to download the variable data for
        for date in pd.date_range("2020-01-01", "2020-04-07", freq="D")
    ],
    delete_tempfiles=True, # Whether to delete temporary downloaded files
    north=False,           # Boolean: Whether to download data for the northern hemisphere
    south=True,            # Boolean: Whether to download data for the southern hemisphere
    parallel_opens=True,   # Boolean: Whether to use `dask.delayed` to open and preprocess multiple files in parallel
)

sic.download()

INFO:root:Downloading SIC datafiles to .temp intermediates...
INFO:root:Excluding 92 dates already existing from 98 dates requested.
INFO:root:FTP opening
INFO:root:Existing file needs concatenating: ./data/osisaf/south/siconca/2020.nc -> ./data/osisaf/south/siconca/old.2020.nc
INFO:root:Saving ./data/osisaf/south/siconca/2020.nc
INFO:root:Opening for interpolation: ['./data/osisaf/south/siconca/2020.nc']
INFO:root:Processing 0 missing dates
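The figure of 98 requested dates in the log can be checked by hand. This sketch uses only the standard library rather than `pd.date_range`, and the `daily_range` helper is a hypothetical name.

```python
from datetime import date, timedelta


def daily_range(start, end):
    """Return every date from start to end, both inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


requested = daily_range(date(2020, 1, 1), date(2020, 4, 7))
n_requested = len(requested)  # 2020 is a leap year: 31 + 29 + 31 + 7 days
n_to_download = n_requested - 92  # 92 dates were already present in this run
print(n_requested, n_to_download)
```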


___
# Process

Similarly to the downloaders, each data producer (be it a `Downloader` or `Generator`) has a corresponding `Processor` that converts the products under `./data/` into a normalised, preprocessed dataset under `./processed/`.

Firstly, to make life a bit easier, we set up some variables. In this case we're creating a train/validate/test split out of the 2020 data in a fairly naive manner.

In [6]:
processing_dates = dict(
    train=[pd.to_datetime(el) for el in pd.date_range("2020-01-01", "2020-03-06")],
    val=[pd.to_datetime(el) for el in pd.date_range("2020-03-11", "2020-03-31")],
    test=[pd.to_datetime(el) for el in pd.date_range("2020-03-08", "2020-03-09")],
)
processed_name = "notebook_api_data"
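A quick sanity check on a split like this is to count the dates in each set and confirm that no date appears in more than one. The sketch below rebuilds the same three ranges with the standard library; `daily_range` is a hypothetical helper.

```python
from datetime import date, timedelta
from itertools import combinations


def daily_range(start, end):
    """Return every date from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


splits = {
    "train": daily_range(date(2020, 1, 1), date(2020, 3, 6)),
    "val": daily_range(date(2020, 3, 11), date(2020, 3, 31)),
    "test": daily_range(date(2020, 3, 8), date(2020, 3, 9)),
}

counts = {name: len(dates) for name, dates in splits.items()}

# Collect any dates shared between each pair of splits
overlaps = {
    (a, b): sorted(set(splits[a]) & set(splits[b]))
    for a, b in combinations(splits, 2)
}
has_overlap = any(overlaps.values())
print(counts, has_overlap)
```

Disjoint start dates do not by themselves rule out information leaking between sets, since each sample's forecast window extends forward past its start date. This is one reason the split is described as naive.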

Next, we create the data processors and configure them for the dataset we want to create.

The following cell imports the processor classes for the downloaded data.

In [7]:
from icenet.data.processors.era5 import IceNetERA5PreProcessor
from icenet.data.processors.meta import IceNetMetaPreProcessor
from icenet.data.processors.osi import IceNetOSIPreProcessor

In [8]:
osi = IceNetOSIPreProcessor(
    ["siconca"],                # Absolute normalised variables
    [],                         # Variables defined as deviations from an aggregated norm
    processed_name,
    processing_dates["train"],
    processing_dates["val"],
    processing_dates["test"],
    linear_trends=tuple(),
    north=False,
    south=True,
)

meta = IceNetMetaPreProcessor(
    processed_name,
    north=False,
    south=True,
)

This demonstrator does not use the ERA5 climate reanalysis data mentioned above, since each user must set up a private API key to access the CDS API. If the key were set up, the ERA5 data could also be preprocessed with:

```python
pp = IceNetERA5PreProcessor(
    ["uas", "vas"],             # Absolute normalised variables
    ["tas", "zg500", "zg250"],  # Variables defined as deviations from an aggregated norm
    processed_name,
    processing_dates["train"],
    processing_dates["val"],
    processing_dates["test"],
    linear_trends=tuple(),
    north=False,
    south=True
)
```
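The two variable lists passed to the processors distinguish variables that are normalised on their absolute values from variables expressed as deviations from an aggregated norm. The sketch below illustrates the difference with two hypothetical helpers and made-up numbers. The min-max scaling and the constant reference values are assumptions chosen for illustration and are not necessarily the scheme IceNet applies.

```python
def min_max_normalise(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def anomalies(values, reference):
    """Express each value as its deviation from a reference (norm) value."""
    return [v - r for v, r in zip(values, reference)]


wind = [2.0, 4.0, 6.0]                 # Made-up absolute values
temperature = [271.0, 273.0, 275.0]    # Made-up values in kelvin
temperature_norm = [272.0, 272.0, 272.0]  # Made-up aggregated norm

wind_scaled = min_max_normalise(wind)
temperature_anom = anomalies(temperature, temperature_norm)
print(wind_scaled, temperature_anom)
```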

Next, we initialise the data processors using `init_source_data` which scans the data source directories to understand what data is available for processing based on the parameters. Since we named the processed data `"notebook_api_data"` above, it will create a data loader config file, `loader.notebook_api_data.json`, in the current directory.

In [9]:
osi.init_source_data(
    lag_days=1,
)
osi.process()

meta.process()

INFO:root:Processing 66 dates for train category
INFO:root:Including lag of 1 days
INFO:root:Including lead of 93 days
INFO:root:No data found for 2019-12-31, outside data boundary perhaps?
INFO:root:Processing 21 dates for val category
INFO:root:Including lag of 1 days
INFO:root:Including lead of 93 days
INFO:root:Processing 2 dates for test category
INFO:root:Including lag of 1 days
INFO:root:Including lead of 93 days
INFO:root:Got 1 files for siconca
INFO:root:Opening files for siconca
INFO:root:Filtered to 98 units long based on configuration requirements
INFO:root:No normalisation for siconca
INFO:root:Loading configuration ./loader.notebook_api_data.json
INFO:root:Writing configuration to ./loader.notebook_api_data.json
INFO:root:Loading configuration ./loader.notebook_api_data.json
INFO:root:Writing configuration to ./loader.notebook_api_data.json
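The lag and lead figures in the log suggest that each start date needs source data from a window of days around it. The window rule used below (from `lag_days` before the start date to `lead_days` after it) is an assumption inferred from the log rather than taken from IceNet's code, and `required_window` is a hypothetical helper. Under that assumption, the first training date needs data for 2019-12-31, which is consistent with the "No data found" message.

```python
from datetime import date, timedelta


def required_window(start_date, lag_days, lead_days):
    """Return the first and last dates of source data assumed to be needed."""
    first = start_date - timedelta(days=lag_days)
    last = start_date + timedelta(days=lead_days)
    return first, last


first_needed, last_needed = required_window(
    date(2020, 1, 1), lag_days=1, lead_days=93
)
print(first_needed, last_needed)
```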


At this point the preprocessed data is ready to be converted into a network dataset, or to have a dataset configuration created for it.

### Dataset creation

Now we can create a dataset configuration for training the network. This can include cached data for the network, in the form of a set of tfrecords compatible with `TFRecordDataset`. To achieve this we create the `IceNetDataLoader`, which can generate both `IceNetDataSet` configurations (which provide the functionality needed for training and prediction) and individual data samples for direct use.

In [10]:
from icenet.data.loaders import IceNetDataLoaderFactory

implementation = "dask"
loader_config = "loader.notebook_api_data.json"
dataset_name = "api_dataset"
lag = 1

dl = IceNetDataLoaderFactory().create_data_loader(
    implementation,
    loader_config,
    dataset_name,
    lag,
    n_forecast_days=7,
    north=False,
    south=True,
    output_batch_size=4,
    generate_workers=2)

INFO:root:Loading configuration loader.notebook_api_data.json


We can see the loader config contains information about the data sources included and also the different dates to use for the training, validation and test sets:

In [11]:
dl._config

{'sources': {'osisaf': {'name': 'notebook_api_data',
   'implementation': 'IceNetOSIPreProcessor',
   'anom': [],
   'abs': ['siconca'],
   'dates': {'train': ['2020_01_01',
     '2020_01_02',
     '2020_01_03',
     '2020_01_04',
     '2020_01_05',
     '2020_01_06',
     '2020_01_07',
     '2020_01_08',
     '2020_01_09',
     '2020_01_10',
     '2020_01_11',
     '2020_01_12',
     '2020_01_13',
     '2020_01_14',
     '2020_01_15',
     '2020_01_16',
     '2020_01_17',
     '2020_01_18',
     '2020_01_19',
     '2020_01_20',
     '2020_01_21',
     '2020_01_22',
     '2020_01_23',
     '2020_01_24',
     '2020_01_25',
     '2020_01_26',
     '2020_01_27',
     '2020_01_28',
     '2020_01_29',
     '2020_01_30',
     '2020_01_31',
     '2020_02_01',
     '2020_02_02',
     '2020_02_03',
     '2020_02_04',
     '2020_02_05',
     '2020_02_06',
     '2020_02_07',
     '2020_02_08',
     '2020_02_09',
     '2020_02_10',
     '2020_02_11',
     '2020_02_12',
     '2020_02_13',
     '202

At this point we can either use `generate` or `write_dataset_config_only` to produce a ready-to-go `IceNetDataSet` configuration. Both of these will generate a dataset config, `dataset_config.api_dataset.json` (recall we set the dataset name as `api_dataset` above).

In [12]:
dl.generate()

INFO:distributed.http.proxy:To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
INFO:distributed.scheduler:State start
INFO:distributed.diskutils:Found stale lock file and directory '/tmp/dask-scratch-space/scheduler-j7wfdrl3', purging
INFO:distributed.scheduler:  Scheduler at:     tcp://127.0.0.1:32927
INFO:distributed.scheduler:  dashboard at:  http://127.0.0.1:46714/status
INFO:distributed.scheduler:Registering Worker plugin shuffle
INFO:distributed.nanny:        Start Nanny at: 'tcp://127.0.0.1:37166'
INFO:distributed.nanny:        Start Nanny at: 'tcp://127.0.0.1:45192'
INFO:distributed.scheduler:Register worker <WorkerState 'tcp://127.0.0.1:46161', name: 0, status: init, memory: 0, processing: 0>
INFO:distributed.scheduler:Starting worker compute stream, tcp://127.0.0.1:46161
INFO:distributed.core:Starting established connection to tcp://127.0.0.1:46024
INFO:distributed.scheduler:Register worker <WorkerState '

To generate samples from this dataset, we can use the `.generate_sample()` method, which returns the inputs `x`, the outputs `y` and the sample weights `sw`:

In [13]:
x, y, sw = dl.generate_sample(pd.Timestamp("2020-03-08"))

In [14]:
print(f"type(x): {type(x)}, x.shape: {x.shape}")
print(f"type(y): {type(y)}, y.shape: {y.shape}")
print(f"type(sw): {type(sw)}, sw.shape: {sw.shape}")

type(x): <class 'numpy.ndarray'>, x.shape: (432, 432, 4)
type(y): <class 'numpy.ndarray'>, y.shape: (432, 432, 7, 1)
type(sw): <class 'numpy.ndarray'>, sw.shape: (432, 432, 7, 1)
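Sample weights typically control how much each grid cell contributes to the training error. The sketch below shows a generic weighted mean squared error on four made-up cells, where a weight of zero removes a cell from the calculation. It is an illustration of the general idea under that assumption; the `weighted_mse` helper is hypothetical and IceNet's own loss functions may be defined differently.

```python
def weighted_mse(y_true, y_pred, weights):
    """Mean squared error in which each cell counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted_sum = sum(
        w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)
    )
    return weighted_sum / total_weight


y_true = [0.0, 0.5, 1.0, 1.0]    # Made-up target concentrations
y_pred = [0.9, 0.5, 0.5, 1.0]    # Made-up predictions
weights = [0.0, 1.0, 1.0, 1.0]   # First cell excluded, e.g. a masked cell

masked_error = weighted_mse(y_true, y_pred, weights)
unmasked_error = weighted_mse(y_true, y_pred, [1.0] * 4)
print(round(masked_error, 4), round(unmasked_error, 4))
```

The large error in the first cell affects only the unweighted result, because its weight of zero removes it from the weighted one.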


___
# Train

For single runs we can programmatically call the same method used by the CLI. `train_model` defines the training process from start to finish. The [`model-ensembler`](https://github.com/JimCircadian/model-ensembler) works outside the API, controlling multiple CLI submissions. An ensemble can be customised by looking at the configuration in [the pipeline repository](https://github.com/antarctica/IceNet-Pipeline). That said, if workflow system integration (e.g. Airflow) is desired, integrating via this method is the way to go.

In [15]:
from icenet.data.dataset import IceNetDataSet

dataset_config = f"dataset_config.{dataset_name}.json"
dataset = IceNetDataSet(dataset_config, batch_size=4)
strategy = tf.distribute.get_strategy()

INFO:root:Loading configuration dataset_config.api_dataset.json
INFO:root:Training dataset path: ./network_datasets/api_dataset/south/train
INFO:root:Validation dataset path: ./network_datasets/api_dataset/south/val
INFO:root:Test dataset path: ./network_datasets/api_dataset/south/test


In [16]:
dataset._config

{'identifier': 'api_dataset',
 'implementation': 'DaskMultiWorkerLoader',
 'channels': ['siconca_abs_1', 'cos_1', 'land_1', 'sin_1'],
 'counts': {'train': 66, 'val': 21, 'test': 2},
 'dtype': 'float32',
 'loader_config': '/data/hpcdata/users/bryald/git/turing/icenet-edsbook/loader.notebook_api_data.json',
 'missing_dates': [],
 'n_forecast_days': 7,
 'north': False,
 'num_channels': 4,
 'shape': [432, 432],
 'south': True,
 'dataset_path': './network_datasets/api_dataset',
 'generate_workers': 2,
 'loss_weight_days': True,
 'output_batch_size': 4,
 'var_lag': 1,
 'var_lag_override': {}}
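The `counts` and `output_batch_size` entries above can be related to the number of cached files per split. Assuming each cached file holds up to `output_batch_size` samples (an assumption, not something stated in the config), a ceiling division gives figures that match the "17 train, 6 val and 1 test filenames" line reported in the training log.

```python
import math

counts = {"train": 66, "val": 21, "test": 2}  # Copied from the dataset config
output_batch_size = 4                         # Copied from the dataset config

# Assumed rule: one file per group of up to output_batch_size samples
files_per_split = {
    name: math.ceil(n_samples / output_batch_size)
    for name, n_samples in counts.items()
}
print(files_per_split)
```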

You can obtain the data loader that was used to create the dataset config via the `.get_data_loader()` method:

In [17]:
dataset.get_data_loader()

INFO:root:Loading configuration /data/hpcdata/users/bryald/git/turing/icenet-edsbook/loader.notebook_api_data.json


<icenet.data.loaders.dask.DaskMultiWorkerLoader at 0x7f8346eb1050>

We can use the `train_model` function to train the network.

In [18]:
from icenet.model.train import train_model

run_name = "api_test_run"
seed = 42

trained_path, history = train_model(
    run_name=run_name,
    dataset=dataset,
    epochs=10,
    n_filters_factor=0.3,
    seed=seed,
    strategy=strategy,
    training_verbosity=2,
)

INFO:root:Creating network folder: ./results/networks/api_test_run
INFO:root:Adding tensorboard callback


Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_1 (InputLayer)        [(None, 432, 432, 4)]        0         []                            
                                                                                                  
 conv2d (Conv2D)             (None, 432, 432, 19)         703       ['input_1[0][0]']             
                                                                                                  
 conv2d_1 (Conv2D)           (None, 432, 432, 19)         3268      ['conv2d[0][0]']              
                                                                                                  
 batch_normalization (Batch  (None, 432, 432, 19)         76        ['conv2d_1[0][0]']            
 Normalization)                                                                               

INFO:root:Datasets: 17 train, 6 val and 1 test filenames
INFO:root:Reducing datasets to 1.0 of total files
INFO:root:Reduced: 17 train, 6 val and 1 test filenames
INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 1/10

Epoch 1: val_rmse improved from inf to 42.91624, saving model to ./results/networks/api_test_run/api_test_run.network_api_dataset.42.h5
17/17 - 24s - loss: 381.2150 - binacc: 20.0991 - mae: 43.0826 - rmse: 45.7920 - mse: 2389.5820 - val_loss: 334.8374 - val_binacc: 31.8200 - val_mae: 40.7788 - val_rmse: 42.9162 - val_mse: 2205.0591 - lr: 1.0000e-04 - 24s/epoch - 1s/step


INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 2/10

Epoch 2: val_rmse improved from 42.91624 to 41.52808, saving model to ./results/networks/api_test_run/api_test_run.network_api_dataset.42.h5
17/17 - 18s - loss: 296.4303 - binacc: 35.1847 - mae: 34.7180 - rmse: 40.3800 - mse: 2084.1836 - val_loss: 313.5265 - val_binacc: 31.8206 - val_mae: 39.4137 - val_rmse: 41.5281 - val_mse: 2141.2537 - lr: 1.0000e-04 - 18s/epoch - 1s/step


INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 3/10

Epoch 3: val_rmse improved from 41.52808 to 38.95157, saving model to ./results/networks/api_test_run/api_test_run.network_api_dataset.42.h5
17/17 - 17s - loss: 236.1483 - binacc: 33.2971 - mae: 31.2712 - rmse: 36.0410 - mse: 1891.0059 - val_loss: 275.8294 - val_binacc: 31.8497 - val_mae: 36.6451 - val_rmse: 38.9516 - val_mse: 2134.0237 - lr: 1.0000e-04 - 17s/epoch - 1s/step


INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 4/10

Epoch 4: val_rmse improved from 38.95157 to 35.11137, saving model to ./results/networks/api_test_run/api_test_run.network_api_dataset.42.h5
17/17 - 17s - loss: 156.9327 - binacc: 42.4368 - mae: 24.5099 - rmse: 29.3806 - mse: 1725.0334 - val_loss: 224.1228 - val_binacc: 39.9531 - val_mae: 31.9187 - val_rmse: 35.1114 - val_mse: 2318.2991 - lr: 1.0000e-04 - 17s/epoch - 1s/step


INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 5/10

Epoch 5: val_rmse improved from 35.11137 to 31.01671, saving model to ./results/networks/api_test_run/api_test_run.network_api_dataset.42.h5
17/17 - 17s - loss: 91.7016 - binacc: 58.6492 - mae: 17.3086 - rmse: 22.4591 - mse: 1577.4656 - val_loss: 174.8969 - val_binacc: 47.1699 - val_mae: 26.9526 - val_rmse: 31.0167 - val_mse: 2398.3252 - lr: 1.0000e-04 - 17s/epoch - 1s/step


INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 6/10

Epoch 6: val_rmse improved from 31.01671 to 28.99465, saving model to ./results/networks/api_test_run/api_test_run.network_api_dataset.42.h5
17/17 - 18s - loss: 54.0500 - binacc: 73.6725 - mae: 11.9489 - rmse: 17.2426 - mse: 1494.6504 - val_loss: 152.8362 - val_binacc: 51.4039 - val_mae: 24.3532 - val_rmse: 28.9946 - val_mse: 2482.8401 - lr: 1.0000e-04 - 18s/epoch - 1s/step


INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 7/10

Epoch 7: val_rmse improved from 28.99465 to 28.83835, saving model to ./results/networks/api_test_run/api_test_run.network_api_dataset.42.h5
17/17 - 18s - loss: 37.2584 - binacc: 91.0525 - mae: 8.9673 - rmse: 14.3158 - mse: 1522.3521 - val_loss: 151.1930 - val_binacc: 53.1953 - val_mae: 23.6947 - val_rmse: 28.8384 - val_mse: 2638.8867 - lr: 1.0000e-04 - 18s/epoch - 1s/step


INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 8/10

Epoch 8: val_rmse did not improve from 28.83835
17/17 - 17s - loss: 29.8080 - binacc: 93.6963 - mae: 7.4060 - rmse: 12.8047 - mse: 1588.6393 - val_loss: 161.5891 - val_binacc: 53.0573 - val_mae: 24.4566 - val_rmse: 29.8133 - val_mse: 2734.4316 - lr: 1.0000e-04 - 17s/epoch - 1s/step


INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 9/10

Epoch 9: val_rmse did not improve from 28.83835
17/17 - 17s - loss: 25.8362 - binacc: 94.4058 - mae: 6.4922 - rmse: 11.9212 - mse: 1666.8054 - val_loss: 172.9691 - val_binacc: 53.2802 - val_mae: 25.3236 - val_rmse: 30.8453 - val_mse: 2782.1943 - lr: 1.0000e-04 - 17s/epoch - 1s/step


INFO:root:
Setting learning rate to: 9.999999747378752e-05



Epoch 10/10

Epoch 10: val_rmse did not improve from 28.83835
17/17 - 16s - loss: 23.3059 - binacc: 94.8245 - mae: 5.9408 - rmse: 11.3224 - mse: 1730.7455 - val_loss: 175.7175 - val_binacc: 53.2063 - val_mae: 25.6026 - val_rmse: 31.0894 - val_mse: 2765.2197 - lr: 1.0000e-04 - 16s/epoch - 954ms/step


INFO:root:Saving network to: ./results/networks/api_test_run/api_test_run.network_api_dataset.42.h5


INFO:tensorflow:Assets written to: ./results/networks/api_test_run/api_test_run.model_api_dataset.42/assets


INFO:tensorflow:Assets written to: ./results/networks/api_test_run/api_test_run.model_api_dataset.42/assets
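The training log can be summarised by picking out the epoch with the lowest validation RMSE, which is the one whose checkpoint was last saved. The values below are copied by hand from the log above rather than read from the returned `history` object.

```python
# Per-epoch values copied from the training log (epochs 1 to 10)
train_rmse = [45.7920, 40.3800, 36.0410, 29.3806, 22.4591,
              17.2426, 14.3158, 12.8047, 11.9212, 11.3224]
val_rmse = [42.9162, 41.5281, 38.9516, 35.1114, 31.0167,
            28.9946, 28.8384, 29.8133, 30.8453, 31.0894]

# Epochs are numbered from 1, list positions from 0
best_epoch = min(range(len(val_rmse)), key=val_rmse.__getitem__) + 1
best_val_rmse = val_rmse[best_epoch - 1]

# Difference between validation and training error at the final epoch
final_gap = val_rmse[-1] - train_rmse[-1]
print(best_epoch, best_val_rmse, round(final_gap, 4))
```

The best validation score occurs at epoch 7, consistent with the "did not improve" messages for epochs 8 to 10. Training error keeps falling while validation error rises, which may indicate overfitting on this deliberately small dataset.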


As can be seen, the training workflow is fairly standard for deep learning networks, with `train_model` wrapping the training process together with a lot of customisable supporting functionality.

For finer programmatic control, the training function can be split into its constituent steps.

___
# Predict

In much the same manner as `train_model`, the `predict_forecast` method acts as a convenient entry point for workflow system integration and CLI use, as well as an overridable method upon which to base custom implementations. Using the method directly relies on loading from a prepared (but perhaps not cached) dataset.

Some parameters are fed to `predict_forecast` that ideally shouldn't need to be specified (like `seed` and `n_filters_factor`) and might seem contextually odd. They're used to locate the appropriate saved network. *This will be cleaned up in a future version*.
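To illustrate how a value like `seed` can help locate a saved network, the sketch below rebuilds the checkpoint path that appears in the training log. The filename pattern is inferred from that log line only, and `network_path` is a hypothetical helper rather than part of IceNet.

```python
import posixpath


def network_path(run_name, dataset_name, seed, base="./results/networks"):
    """Build a checkpoint path following the pattern seen in the training log."""
    filename = f"{run_name}.network_{dataset_name}.{seed}.h5"
    return posixpath.join(base, run_name, filename)


checkpoint = network_path("api_test_run", "api_dataset", 42)
print(checkpoint)
```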

In [19]:
from icenet.model.predict import predict_forecast

# Follows the naming convention used by the CLI version
output_dir = os.path.join(".", "results", "predict",
                          "custom_run_forecast",
                          f"{run_name}.{seed}")

predict_forecast(
    dataset_config=dataset_config,
    network_name=run_name,
    n_filters_factor=0.3,
    output_folder=output_dir,
    seed=seed,
    start_dates=[pd.to_datetime(el).date()
                 for el in pd.date_range("2020-03-08", "2020-03-09")],
    test_set=True,
)

INFO:root:Loading configuration dataset_config.api_dataset.json
INFO:root:Training dataset path: ./network_datasets/api_dataset/south/train
INFO:root:Validation dataset path: ./network_datasets/api_dataset/south/val
INFO:root:Test dataset path: ./network_datasets/api_dataset/south/test
INFO:root:Loading configuration /data/hpcdata/users/bryald/git/turing/icenet-edsbook/loader.notebook_api_data.json
INFO:root:Loading model from ./results/networks/api_test_run/api_test_run.network_api_dataset.42.h5...
INFO:root:Datasets: 34 train, 12 val and 2 test filenames
INFO:root:Processing test batch 1, item 0 (date 2020-03-08)
INFO:root:Running prediction 2020-03-08
INFO:root:Saving 2020-03-08 - forecast output (1, 432, 432, 7)
INFO:root:Processing test batch 1, item 1 (date 2020-03-09)
INFO:root:Running prediction 2020-03-09
INFO:root:Saving 2020-03-09 - forecast output (1, 432, 432, 7)


___
# Visualisation

In [23]:
!printf "2020-03-08\n2020-03-09" | tee predict_dates.csv

2020-03-08
2020-03-09
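The dates file can also be written from Python instead of the shell, which may be more portable across platforms. This is a minimal sketch using a temporary directory; `write_dates_file` is a hypothetical helper, and the one-date-per-line format simply mirrors the `printf` output above.

```python
import os
import tempfile
from datetime import date, timedelta


def write_dates_file(path, start, n_days):
    """Write n_days consecutive ISO dates, one per line, and return them."""
    dates = [start + timedelta(days=i) for i in range(n_days)]
    with open(path, "w") as fh:
        fh.write("\n".join(d.isoformat() for d in dates))
    return dates


# Write to a throwaway location and read the result back
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "predict_dates.csv")
    written = write_dates_file(demo_file, date(2020, 3, 8), 2)
    with open(demo_file) as fh:
        contents = fh.read()

print(contents)
```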

In [24]:
!icenet_output -m -o ./results/predict custom_run_forecast api_dataset predict_dates.csv

[14-03-24 17:15:49 :INFO    ] - Loading configuration ./dataset_config.api_dataset.json
[14-03-24 17:15:49 :INFO    ] - Training dataset path: ./network_datasets/api_dataset/south/train
[14-03-24 17:15:49 :INFO    ] - Validation dataset path: ./network_datasets/api_dataset/south/val
[14-03-24 17:15:49 :INFO    ] - Test dataset path: ./network_datasets/api_dataset/south/test
  cube = iris.load_cube(path, 'sea_ice_area_fraction')
  cube = iris.load_cube(path, 'sea_ice_area_fraction')
  cube = iris.load_cube(path, 'sea_ice_area_fraction')
  cube = iris.load_cube(path, 'sea_ice_area_fraction')
[14-03-24 17:15:51 :INFO    ] - Post-processing 2020-03-08
[14-03-24 17:15:51 :INFO    ] - Post-processing 2020-03-09
[14-03-24 17:15:51 :INFO    ] - Dataset arr shape: (2, 432, 432, 7, 2)
[14-03-24 17:15:51 :INFO    ] - Applying active grid cell masks
[14-03-24 17:15:51 :INFO    ] - Land masking the forecast output
[14-03-24 17:15:51 :INFO    ] - Applying zeros to land mask
[14-03-24 17:15:51 :INFO 

In [25]:
from icenet.plotting.video import xarray_to_video as xvid
from icenet.data.sic.mask import Masks
from IPython.display import HTML
import xarray as xr, pandas as pd, datetime as dt

# Load our output prediction file
ds = xr.open_dataset("results/predict/custom_run_forecast.nc")
land_mask = Masks(south=True, north=False).get_land_mask()
ds.info()

xarray.Dataset {
dimensions:
	time = 2 ;
	yc = 432 ;
	xc = 432 ;
	leadtime = 7 ;

variables:
	int32 Lambert_Azimuthal_Grid() ;
		Lambert_Azimuthal_Grid:grid_mapping_name = lambert_azimuthal_equal_area ;
		Lambert_Azimuthal_Grid:longitude_of_projection_origin = 0.0 ;
		Lambert_Azimuthal_Grid:latitude_of_projection_origin = -90.0 ;
		Lambert_Azimuthal_Grid:false_easting = 0.0 ;
		Lambert_Azimuthal_Grid:false_northing = 0.0 ;
		Lambert_Azimuthal_Grid:semi_major_axis = 6378137.0 ;
		Lambert_Azimuthal_Grid:inverse_flattening = 298.257223563 ;
		Lambert_Azimuthal_Grid:proj4_string = +proj=laea +lon_0=0 +datum=WGS84 +ellps=WGS84 +lat_0=-90.0 ;
	float32 sic_mean(time, yc, xc, leadtime) ;
		sic_mean:long_name = mean sea ice area fraction across ensemble runs of icenet model ;
		sic_mean:standard_name = sea_ice_area_fraction ;
		sic_mean:short_name = sic ;
		sic_mean:valid_min = 0 ;
		sic_mean:valid_max = 1 ;
		sic_mean:ancillary_variables = sic_stddev ;
		sic_mean:grid_mapping = Lambert_Azimuth

In [26]:
# Get the forecast start date
forecast_date = ds.time.values[0]
print(forecast_date)

2020-03-08T00:00:00.000000000


In [27]:
fc = ds.sic_mean.isel(time=0).drop_vars("time").rename(dict(leadtime="time"))
fc['time'] = [pd.to_datetime(forecast_date) \
              + dt.timedelta(days=int(e)) for e in fc.time.values]

anim = xvid(fc, 15, figsize=4, mask=land_mask)
HTML(anim.to_jshtml())

INFO:root:Inspecting data
INFO:root:Initialising plot
INFO:root:Animating
INFO:root:Not saving plot, will return animation
INFO:matplotlib.animation:Animation.save using <class 'matplotlib.animation.HTMLWriter'>
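The relabelling step in the cell above converts lead times into calendar dates by adding each lead time, in days, to the forecast start date. The sketch below shows the same arithmetic with the standard library. It assumes the seven lead times are numbered 1 to 7, which should be confirmed against `ds.leadtime.values`.

```python
from datetime import datetime, timedelta

forecast_start = datetime(2020, 3, 8)
lead_times = range(1, 8)  # Assumed values: lead times of 1 to 7 days

# Each forecast frame is valid on the start date plus its lead time
valid_dates = [forecast_start + timedelta(days=lt) for lt in lead_times]
print(valid_dates[0].date(), valid_dates[-1].date())
```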


## Summary

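This notebook has demonstrated the use of the IceNet library for

* Downloading OSI SAF sea-ice concentration data and generating masks with `SICDownloader` and `Masks`.
* Preprocessing the data and building a cached dataset with `IceNetOSIPreProcessor`, `IceNetMetaPreProcessor` and the data loader.
* Training a U-Net with `train_model` and producing forecasts with `predict_forecast`.
* Post-processing and visualising the forecast with `icenet_output` and `xarray_to_video`.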

## Additional information
**Dataset**: OSI SAF sea ice concentration for the southern hemisphere, 1 January to 7 April 2020.

**Codebase**: IceNet v0.2.7, as reported by `icenet.__version__` in the final cells of this notebook.

**License**: The code in this notebook is licensed under the MIT License. The Environmental Data Science book is licensed under the Creative Commons by Attribution 4.0 license. See further details [here](https://github.com/alan-turing-institute/environmental-ds-book/blob/master/LICENSE.md).

**Contact**: If you have a suggestion or want to report an issue with this notebook, feel free to [create an issue](https://github.com/alan-turing-institute/environmental-ds-book/issues/new/choose) or send a direct message to [environmental.ds.book@gmail.com](mailto:environmental.ds.book@gmail.com).

In [28]:
import icenet
icenet_version = icenet.__version__
print(f'IceNet version: {icenet_version}')

IceNet version: 0.2.7


In [29]:
from datetime import date
print(f'Last tested: {date.today()}')

Last tested: 2024-03-14
