# Forecasting Weather using FourCastNet

In this notebook, we will introduce you to the FourCastNet architecture and build, train and validate a modified version of FourCastNet. FourCastNet is built on the Adaptive Fourier Neural Operator (AFNO), which was introduced in the [Solving the Darcy-Flow using AFNO notebook](../Operators/Darcy_Flow_using_Adaptive_Fourier_Neural_Operators.ipynb).

#### Contents of the Notebook

- [FourCastNet - An overview of the Architecture](#FourCastNet---An-overview-of-the-Architecture)
- [Forecasting weather using FourCastNet](#Forecasting-weather-using-FourCastNet)
    - [Problem Description](#Problem-Description)
        - [A brief on the ERA5 Reanalysis Dataset:](#A-brief-on-the-ERA5-Reanalysis-Dataset:)
    - [Step 1: Loading the Data](#Step-1:-Loading-the-Data)
    - [Step 2: Creating the FourCastNet Model](#Step-2:-Creating-the-FourCastNet-Model)
    - [Step 3: Creating the domain and adding Constraints](#Step-3:-Creating-the-domain-and-adding-Constraints)
    - [Step 4: Adding Validators](#Step-4:-Adding-Validators)
    - [Step 5: Hydra Configuration](#Step-5:-Hydra-Configuration)
    - [Step 6: Solver and Training the model](#Step-6:-Solver-and-Training-the-model)
    - [Visualising the solution](#Visualising-the-solution)

#### Learning Outcomes
- How to load the ERA5 dataset into Modulus
- How to define the FourCastNet architecture in Modulus
- How to train FourCastNet
- How to generate weather forecasts and quantitatively assess performance


## FourCastNet - An overview of the Architecture

FourCastNet uses the Adaptive Fourier Neural Operator (AFNO) model. This architecture is appealing because it is specifically designed for high-resolution inputs and synthesizes several key recent advances in deep learning into one model. Namely, it combines the Fourier Neural Operator (FNO) learning approach of <a href="https://arxiv.org/abs/2010.08895" rel="nofollow">Li et al. [2021a]</a>, which has been shown to perform well in modelling challenging PDE systems, with a powerful Vision Transformer (ViT) backbone.
<center><img src="images/fcn_arch.webp" alt="Drawing" /></center>

The AFNO model frames the token-mixing operation as a continuous global convolution, implemented efficiently in the Fourier domain with FFTs, which allows dependencies across the spatial and channel dimensions to be modelled flexibly and scalably. With this design, the spatial mixing complexity is reduced to $O(N \log N)$, where $N$ is the number of image patches or tokens. This scaling makes AFNO well suited to high-resolution data such as the 0.25° resolution used by FourCastNet, as well as to potential future work at even higher resolutions. In the original FNO formulation, the operator learning approach showed impressive results in solving turbulent Navier-Stokes systems, so incorporating it into a data-driven atmospheric model is a natural choice.
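To get a feel for the scaling claim, the short calculation below counts the tokens for the 720 × 1440 grid with $p = 8$ and compares quadratic token mixing ($N^2$, as in the self-attention of a standard ViT) with FFT-based mixing ($N \log N$). It is a back-of-the-envelope sketch: constant factors, the channel dimension and the details of the FFT implementation are ignored, and the helper names are hypothetical.

```python
import math

def token_count(n_lat, n_lon, patch_size):
    """Number of non-overlapping p x p patches (tokens) on an n_lat x n_lon grid."""
    return (n_lat // patch_size) * (n_lon // patch_size)

def mixing_cost_ratio(n_tokens):
    """Quadratic (N^2) versus FFT-style (N log2 N) token-mixing cost, constants ignored."""
    return n_tokens**2 / (n_tokens * math.log2(n_tokens))

N = token_count(720, 1440, 8)      # 90 * 180 = 16200 tokens
ratio = mixing_cost_ratio(N)       # roughly 1.2e3, i.e. about three orders of magnitude
print(N, round(ratio))
```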
In this model, the input variables on the 720 × 1440 latitude-longitude grid are first projected to a 2D grid ($h \times w$) of patches (with a small patch size $p \times p$, e.g. $p = 8$), with each patch represented as a $d$-dimensional token. The sequence of patches is then fed, along with a positional encoding, to a series of AFNO layers. Each layer, given an input tensor of patches of shape $h \times w \times d$, performs spatial mixing followed by channel mixing. Spatial mixing happens in the Fourier domain as follows:
<strong>Step 1</strong>: Transform the tokens to the Fourier domain with
$$z_{m,n} = [\mathrm{DFT}(X)]_{m,n}$$
where $m, n$ index the patch location and DFT denotes a 2D discrete Fourier transform.
<strong>Step 2</strong>: Apply token weighting in the Fourier domain, and promote sparsity with a soft-thresholding and shrinkage operation:
$$\tilde{z}_{m,n} = S_\lambda(\mathrm{MLP}(z_{m,n}))$$
where $S_\lambda(x) = \mathrm{sign}(x)\max(|x| - \lambda, 0)$ with the sparsity-controlling parameter $\lambda$, and $\mathrm{MLP}(\cdot)$ is a 2-layer multi-layer perceptron with block-diagonal weight matrices that are shared across all patches.
<strong>Step 3</strong>: Apply the inverse Fourier transform to return to the patch domain, and add a residual connection:
$$y_{m,n} = [\mathrm{IDFT}(\tilde{Z})]_{m,n} + X_{m,n}$$
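The three steps can be sketched in NumPy for a single $h \times w \times d$ token grid. This is an illustrative sketch, not Modulus' implementation, and the function names are hypothetical. It assumes a real FFT (`np.fft.rfft2`/`np.fft.irfft2`), which exploits the Hermitian symmetry of the spectrum of a real input; complex block-diagonal MLP weights with ReLU applied to the real and imaginary parts separately; and soft-shrinkage likewise applied to the real and imaginary parts. Biases, batching and the channel-mixing MLP that follows spatial mixing are omitted.

```python
import numpy as np

def soft_shrink(x, lam):
    """S_lambda(x) = sign(x) * max(|x| - lambda, 0), applied elementwise to a real array."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def block_diag_mlp(z, w1, w2):
    """Two-layer MLP with block-diagonal complex weights, shared across all Fourier modes.

    z: (h, w_modes, d) complex; w1, w2: (k, d // k, d // k) complex, one block per channel group.
    """
    h, wm, d = z.shape
    k = w1.shape[0]
    zb = z.reshape(h, wm, k, d // k)                     # split channels into k blocks
    pre = np.einsum("hwkc,kcb->hwkb", zb, w1)            # block-wise complex matmul
    hidden = np.maximum(pre.real, 0.0) + 1j * np.maximum(pre.imag, 0.0)  # ReLU on re/im parts
    out = np.einsum("hwkb,kbc->hwkc", hidden, w2)
    return out.reshape(h, wm, d)

def afno_spatial_mixing(x, w1, w2, lam=0.01):
    """x: (h, w, d) real token grid -> (h, w, d) after AFNO spatial mixing."""
    h, w, _ = x.shape
    z = np.fft.rfft2(x, axes=(0, 1), norm="ortho")                      # Step 1: DFT over patches
    m = block_diag_mlp(z, w1, w2)                                        # Step 2a: token weighting
    z_tilde = soft_shrink(m.real, lam) + 1j * soft_shrink(m.imag, lam)  # Step 2b: sparsify
    y = np.fft.irfft2(z_tilde, s=(h, w), axes=(0, 1), norm="ortho")     # Step 3: inverse DFT
    return y + x                                                         # residual connection

# Usage on a small random token grid (sizes chosen only for illustration)
rng = np.random.default_rng(0)
h, w, d, k = 8, 16, 32, 4
x = rng.standard_normal((h, w, d))
shape = (k, d // k, d // k)
w1 = 0.1 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
w2 = 0.1 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
y = afno_spatial_mixing(x, w1, w2)
print(y.shape)  # (8, 16, 32)
```

With all-zero weights the Fourier branch vanishes and the layer reduces to the identity, which is a handy sanity check for the residual connection.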


## Forecasting weather using FourCastNet

### Problem Description

<strong>FourCastNet</strong>, short for Fourier ForeCasting Neural Network, is a global data-driven weather forecasting model that provides accurate short- to medium-range global predictions at 0.25° resolution. FourCastNet generates a week-long forecast in less than 2 seconds, orders of magnitude faster than the ECMWF Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction (NWP) model, with comparable or better accuracy. It is trained on a small subset of the ERA5 reanalysis dataset from the ECMWF, which consists of hourly estimates of several atmospheric variables at a latitude and longitude resolution of 0.25°. Given an initial condition from the ERA5 dataset as input, FourCastNet recursively applies an Adaptive Fourier Neural Operator (AFNO) network to predict the dynamics of these variables at later time steps. In the current iteration, FourCastNet forecasts 20 atmospheric variables. These variables, listed in the table below, are sampled from the ERA5 dataset at a temporal resolution of 6 hours.
<center><img src="images/fcn_table.png" alt="Drawing" style="width:600px" /></center>


The goal of FourCastNet is to forecast the modelled variables on short- to medium-range time scales of up to 10 days. FourCastNet is initialized using an initial condition from the ERA5 reanalysis dataset.
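The recursive forecasting loop can be sketched as below: each 6-hour prediction is fed back in as the next input, so a 10-day forecast takes 40 forward passes. `toy_model` is a hypothetical stand-in for the trained network, used only so that the example runs; the function name `autoregressive_rollout` is likewise illustrative.

```python
import numpy as np

HOURS_PER_STEP = 6  # the 20 ERA5 variables are sampled every 6 hours

def autoregressive_rollout(step_fn, x0, lead_time_hours):
    """Feed each prediction back in as the next input; returns an (n_steps + 1, ...) array."""
    if lead_time_hours % HOURS_PER_STEP:
        raise ValueError("lead time must be a multiple of 6 hours")
    states = [x0]
    for _ in range(lead_time_hours // HOURS_PER_STEP):
        states.append(step_fn(states[-1]))
    return np.stack(states)

# Hypothetical stand-in for the trained network: damp every field by 10% per step.
toy_model = lambda x: 0.9 * x
x0 = np.ones((20, 4, 8))                      # (channels, lat, lon), tiny grid for illustration
forecast = autoregressive_rollout(toy_model, x0, lead_time_hours=10 * 24)
print(forecast.shape)                         # (41, 20, 4, 8): initial state + 40 six-hour steps
```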

To train FourCastNet we use the ERA5 dataset. The fully trained model was trained on the years 1979 to 2015, but in our case we restrict training to 6 months of data. When testing its performance, we use ERA5 data from 2017 that was not included in training. Please see the original paper for a description of the 20 variables used and the preprocessing applied to the ERA5 dataset; these variables are specifically chosen to model important processes that influence low-level winds and precipitation.
<strong>Note:</strong> In this notebook we will walk through the contents of the <a href="../../source_code/fourcastnet/fourcastnet.py" rel="nofollow"><code>fourcastnet.py</code></a> script.


#### A brief on the ERA5 Reanalysis Dataset: 

ERA5 is a global atmospheric reanalysis dataset produced by the [European Centre for Medium-Range Weather Forecasts (ECMWF)](https://www.ecmwf.int/). It provides comprehensive and high-quality information on various atmospheric variables, such as temperature, wind, pressure, and humidity, at various vertical levels and at a high spatial resolution of 31 km.

Reanalysis refers to a technique in meteorology and climatology where historical observations, such as surface and satellite measurements, weather balloon data, and other sources of information, are combined with a numerical weather model to produce a consistent and continuous record of the state of the atmosphere over time. Reanalysis datasets are useful for studying climate variability and change, as well as for providing input data for weather and climate models.

The ERA5 reanalysis dataset provides hourly data from 1979 to the present, and it is widely used in climate research, weather forecasting, and environmental studies. The dataset has undergone several improvements compared to previous versions of the ERA series, such as higher spatial resolution, improved data assimilation methods, and the inclusion of new observations, resulting in more accurate and reliable data.

Please refer to the following [link](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=form) for the full list of variables available in the dataset; we restrict ourselves to the subset of 20 variables listed in the table above.

### Step 1: Loading the Data

We load the ERA5 data into Modulus by defining a custom `modulus.sym.dataset.Dataset` in [`fourcastnet/src/dataset.py`](../../source_code/fourcastnet/src/dataset.py).

For our training script, the ERA5 datasets are initialized using the following:

```python
import modulus

from modulus.sym.hydra.config import ModulusConfig
from modulus.sym.key import Key
from modulus.sym.domain import Domain
from modulus.sym.domain.constraint import SupervisedGridConstraint
from modulus.sym.domain.validator import GridValidator
from modulus.sym.solver import Solver
from modulus.sym.utils.io import GridValidatorPlotter

from src.dataset import ERA5HDF5GridDataset
from src.fourcastnet import FourcastNetArch
from src.loss import LpLoss


@modulus.sym.main(config_path="conf", config_name="config_FCN")
def run(cfg: ModulusConfig) -> None:

    # load training and test data
    channels = list(range(cfg.custom.n_channels))
    train_dataset = ERA5HDF5GridDataset(
        cfg.custom.training_data_path,
        chans=channels,
        tstep=cfg.custom.tstep,
        n_tsteps=cfg.custom.n_tsteps,
        patch_size=cfg.arch.afno.patch_size,
    )
    test_dataset = ERA5HDF5GridDataset(
        cfg.custom.test_data_path,
        chans=channels,
        tstep=cfg.custom.tstep,
        n_tsteps=cfg.custom.n_tsteps,
        patch_size=cfg.arch.afno.patch_size,
        n_samples_per_year=20,
    )
```
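The dataset class reads the HDF5 files itself; conceptually, each training sample pairs the state at time $t$ with the states `tstep`, `2 * tstep`, ..., `n_tsteps * tstep` steps later. The NumPy sketch below illustrates that pairing on an in-memory `(T, C, H, W)` array. It is a hypothetical simplification, not the code in `src/dataset.py`: the function names are made up, and cropping the grid to a multiple of `patch_size` (ERA5's 0.25° grid has 721 latitudes, while FourCastNet uses 720) is an assumption about why `patch_size` is passed to the dataset.

```python
import numpy as np

def make_sample(frames, idx, tstep=1, n_tsteps=1, patch_size=8):
    """Build one (input, targets) pair from a (T, C, H, W) stack of 6-hourly frames.

    Hypothetical stand-in for an ERA5 grid dataset; the crop keeps only whole p x p patches.
    """
    _, _, H, W = frames.shape
    Hc, Wc = H - H % patch_size, W - W % patch_size
    x = frames[idx, :, :Hc, :Wc]                          # state at time t
    ys = [frames[idx + k * tstep, :, :Hc, :Wc]            # states at t + k * tstep * 6 h
          for k in range(1, n_tsteps + 1)]
    return x, ys

def num_samples(n_frames, tstep, n_tsteps):
    """Number of valid start indices: the last target must still lie inside the file."""
    return n_frames - tstep * n_tsteps

# Usage on a tiny synthetic stack: 10 frames, 2 channels, 9 x 17 grid
frames = np.arange(10 * 2 * 9 * 17, dtype=np.float32).reshape(10, 2, 9, 17)
x, ys = make_sample(frames, idx=3, tstep=1, n_tsteps=2, patch_size=8)
print(x.shape, len(ys), num_samples(10, tstep=1, n_tsteps=2))  # (2, 8, 16) 2 8
```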

### Step 2: Creating the FourCastNet Model

Next, we need to define FourCastNet as a custom Modulus architecture. This model is found in [`fourcastnet/src/fourcastnet.py`](../../source_code/fourcastnet/src/fourcastnet.py), which is a wrapper class around Modulus' AFNO model. FourCastNet has two training phases: the first uses single-step prediction and the second uses two-step prediction. This small wrapper allows AFNO to be executed for any number of time steps, `n_tsteps`, using autoregressive forward passes.

We can then instantiate the model as follows:

```python
    # define input/output keys
    input_keys = [Key(k, size=train_dataset.nchans) for k in train_dataset.invar_keys]
    output_keys = [Key(k, size=train_dataset.nchans) for k in train_dataset.outvar_keys]

    # make list of nodes to unroll graph on
    model = FourcastNetArch(
        input_keys=input_keys,
        output_keys=output_keys,
        img_shape=test_dataset.img_shape,
        patch_size=cfg.arch.afno.patch_size,
        embed_dim=cfg.arch.afno.embed_dim,
        depth=cfg.arch.afno.depth,
        num_blocks=cfg.arch.afno.num_blocks,
    )
    nodes = [model.make_node(name="FCN")]
```
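The hyperparameters passed to `FourcastNetArch` determine the tensor shapes inside the network. The sketch below, assuming the 720 × 1440 grid, 20 channels and the `arch.afno` values from the configuration in Step 5, splits a `(C, H, W)` field into non-overlapping $p \times p$ patches and derives the token grid, the raw patch size before the linear projection to `embed_dim`, the channel-block size of the block-diagonal MLP, and the number of Fourier modes produced by a real 2D FFT. `patchify` and `afno_shapes` are hypothetical helpers, not part of Modulus, and Modulus' AFNO may lay out these tensors differently.

```python
import numpy as np

def patchify(x, p):
    """Split a (C, H, W) field into an (H//p, W//p, C*p*p) grid of flattened patches."""
    C, H, W = x.shape
    h, w = H // p, W // p
    x = x[:, : h * p, : w * p]                       # drop any remainder rows/columns
    x = x.reshape(C, h, p, w, p).transpose(1, 3, 0, 2, 4)
    return x.reshape(h, w, C * p * p)

def afno_shapes(img_shape, n_channels, patch_size, embed_dim, num_blocks):
    """Derived shapes for an AFNO configuration (hypothetical helper)."""
    H, W = img_shape
    h, w = H // patch_size, W // patch_size
    if embed_dim % num_blocks:
        raise ValueError("embed_dim must be divisible by num_blocks")
    return {
        "token_grid": (h, w),                          # patches along latitude/longitude
        "raw_patch_dim": n_channels * patch_size**2,  # values per patch before projection
        "block_size": embed_dim // num_blocks,        # channels per block-diagonal block
        "rfft_modes": (h, w // 2 + 1),                # Fourier modes from a real 2D FFT
    }

shapes = afno_shapes((720, 1440), n_channels=20, patch_size=8, embed_dim=512, num_blocks=8)
print(shapes)
# {'token_grid': (90, 180), 'raw_patch_dim': 1280, 'block_size': 64, 'rfft_modes': (90, 91)}

tokens = patchify(np.zeros((20, 720, 1440), dtype=np.float32), p=8)
print(tokens.shape)  # (90, 180, 1280)
```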

### Step 3: Creating the domain and adding Constraints 

With the custom dataset for loading the ERA5 data and the FourCastNet model created, the next step is setting up the Modulus training domain. A standard data-driven grid constraint is created:

```python
    # make domain
    domain = Domain()

    # add constraints to domain
    supervised = SupervisedGridConstraint(
        nodes=nodes,
        dataset=train_dataset,
        batch_size=cfg.batch_size.grid,
        loss=LpLoss(),
        num_workers=cfg.custom.num_workers.grid,
    )
    domain.add_constraint(supervised, "supervised")
```
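The constraint above uses `LpLoss` from `src/loss.py`. A common definition of such a loss in the FNO literature is the relative $L^p$ error $\|\hat{y} - y\|_p / \|y\|_p$, averaged over the batch. The NumPy sketch below assumes that definition (check `src/loss.py` for the exact version used here) and also shows how per-step losses could be summed when the model is unrolled over `n_tsteps` steps, in the spirit of the `loss: sum` entry in the configuration. The function names and the `x_t1`/`x_t2` output keys are hypothetical.

```python
import numpy as np

def relative_lp_loss(pred, true, p=2):
    """Relative Lp error ||pred - true||_p / ||true||_p per sample, averaged over the batch."""
    b = pred.shape[0]
    err = np.linalg.norm((pred - true).reshape(b, -1), ord=p, axis=1)
    ref = np.linalg.norm(true.reshape(b, -1), ord=p, axis=1)
    return float(np.mean(err / ref))

def multistep_loss(preds, targets, p=2):
    """Sum the per-step losses of an unrolled model (output keys are hypothetical)."""
    return sum(relative_lp_loss(preds[k], targets[k], p) for k in targets)

# Usage: a batch of 2 samples, 20 channels, tiny 16 x 32 grid, two unrolled steps
rng = np.random.default_rng(0)
truth = {k: rng.standard_normal((2, 20, 16, 32)) for k in ("x_t1", "x_t2")}
perfect = multistep_loss(truth, truth)                                  # 0.0
doubled = multistep_loss({k: 2 * v for k, v in truth.items()}, truth)  # 1.0 per step, ~2.0 total
```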

### Step 4: Adding Validators


We can now add the validator in the same fashion as in the previous notebook.

```python
    # add validator
    val = GridValidator(
        nodes,
        dataset=test_dataset,
        batch_size=cfg.batch_size.validation,
        plotter=GridValidatorPlotter(n_examples=5),
        num_workers=cfg.custom.num_workers.validation,
    )
    domain.add_validator(val, "test")
```
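`GridValidator` compares the network's predictions against the test set and writes the results to disk. For a weather-specific score, the FourCastNet paper reports a latitude-weighted RMSE, which down-weights grid points near the poles, where latitude-longitude cells cover the least area. The NumPy sketch below is not what `GridValidator` computes; it assumes equally spaced latitudes from 90° to -90° and weights proportional to $\cos(\text{lat})$ normalized to mean one, and its function names are hypothetical.

```python
import numpy as np

def latitude_weights(n_lat):
    """cos(latitude) weights on equally spaced latitudes 90..-90, normalized to mean 1."""
    lat = np.linspace(90.0, -90.0, n_lat)
    w = np.cos(np.deg2rad(lat))
    return w / w.mean()

def weighted_rmse(pred, true):
    """Latitude-weighted RMSE per channel for (C, n_lat, n_lon) fields."""
    w = latitude_weights(pred.shape[-2])[:, None]   # broadcast over longitude
    return np.sqrt(np.mean(w * (pred - true) ** 2, axis=(-2, -1)))

# Usage on a coarse 73 x 144 grid (2.5 degree spacing) with 20 channels
true = np.zeros((20, 73, 144))
uniform_error = weighted_rmse(np.full_like(true, 2.0), true)   # 2.0 for every channel

pole_only = np.zeros_like(true)
pole_only[:, 0, :] = 2.0                                        # error only at the North Pole row
pole_error = weighted_rmse(pole_only, true)                     # ~0: cos(90 deg) weight vanishes
```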

### Step 5: Hydra Configuration

The configuration is similar to the one used in the AFNO (Darcy flow) notebook. In addition, we have added the `custom.tstep` and `custom.n_tsteps` parameters, which define, respectively, the time delta between the AFNO's input and output time steps (in multiples of 6 hours, typically set to 1) and the number of time steps FourCastNet is unrolled over during training. The contents of [`config_FCN.yaml`](../../source_code/fourcastnet/conf/config_fcn.yaml) are shown below.

```yaml
defaults :
  - modulus_default
  - arch:
      - afno
  - scheduler: cosine_annealing
  - optimizer: adam
  - loss: sum
  - _self_

arch:
  afno:
    patch_size: 8
    embed_dim: 512
    depth: 10
    num_blocks: 8

optimizer:
  lr: 0.0005

scheduler:
  T_max: 80000

custom:
  n_channels: 20
  tstep: 1
  n_tsteps: 1
  training_data_path: "/workspace/python/source_code/fourcastnet/data/train" # Training dataset path here
  test_data_path:     "/workspace/python/source_code/fourcastnet/data/test" # Test dataset path here
  num_workers:
    grid: 4
    validation: 4
  tag:

batch_size:
  grid: 1
  validation: 1

training:
  amp: true
  rec_constraint_freq: 10000
  rec_results_freq : 1000
  save_network_freq: 1000
  print_stats_freq: 100
  summary_freq: 1000
  max_steps : 71000 
```
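The `scheduler` section sets `T_max: 80000` for cosine annealing, while `training.max_steps` is 71000. Assuming the scheduler follows the formula of PyTorch's `CosineAnnealingLR` with its default `eta_min = 0` and is stepped once per training step, the learning rate at a given step can be computed as in the sketch below (the function name is hypothetical). Under these assumptions the learning rate has not fully decayed when training stops: it is still about 1.5e-5 at step 71000.

```python
import math

def cosine_annealing_lr(step, base_lr=0.0005, t_max=80000, eta_min=0.0):
    """Learning rate after `step` scheduler steps under cosine annealing (no restarts)."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

for step in (0, 40000, 71000, 80000):
    print(step, cosine_annealing_lr(step))
```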

### Step 6: Solver and Training the model

Once the domain and the configuration are set up, the `Solver` can be defined and training can be started, as in earlier notebooks.

```python
    # make solver
    slv = Solver(cfg, domain)

    # start solver
    slv.solve()


if __name__ == "__main__":
    run()
```

Before starting training, note that TensorBoard can be used to visualize the loss values and the convergence of the validator we just created. In the Jupyter environment, select the directory in which the checkpoints will be stored by clicking the small checkbox next to it; the option to launch TensorBoard then appears for that directory. Once TensorBoard is open, switch between the SCALARS, IMAGES, TEXT and TIME SERIES tabs to view validation results and other information related to training.

For this application, make sure you are inside the `/jupyter_notebook/FourCastNet` folder before launching TensorBoard.


<center><img src="../projectile/images/tensorboard.png" alt="Drawing" style="width:900px" /></center>

*Given the time and GPU memory constraints of this tutorial, we start from a pre-trained model rather than training from scratch. Reusing a model that has already been trained on a large dataset for the same task greatly reduces the time and resources needed, while still letting us walk through the training workflow and check that the performance metrics meet the desired criteria. We will train the model for just 1000 steps, which takes around 5-10 minutes on an A100 GPU.*

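```python
import os

# Single-process settings for PyTorch distributed, read by Modulus at start-up;
# they are inherited by the training script launched in the next cell
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["MASTER_ADDR"] = "localhost"
```

```python
# Launch training (IPython shell escape, run from this notebook's directory)
!python ../../source_code/fourcastnet/fourcastnet.py
```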

### Visualising the solution

Validation results are written to the checkpoint (output) directory at the frequency set by the `rec_results_freq` parameter in the training configuration. The validator's output folder contains plots of several validation predictions, some of which are shown below.


FourCastNet validation predictions. Left to right: input at $t=0$, true value at $t=1$, predicted value at $t=1$, and the difference between the true and predicted values. With `tstep: 1`, one time step corresponds to 6 hours.

<center><img src="images/test_prediction_0.png" alt="Drawing" style="width: 1200px;"/></center>
<center><img src="images/test_prediction_1.png" alt="Drawing" style="width: 1200px;"/></center>
<center><img src="images/test_prediction_2.png" alt="Drawing" style="width: 1200px;"/></center>
<center><img src="images/test_prediction_3.png" alt="Drawing" style="width: 1200px;"/></center>
<center><img src="images/test_prediction_4.png" alt="Drawing" style="width: 1200px;"/></center>

--- 

Don't forget to check out additional [Open Hackathons Resources](https://www.openhackathons.org/s/technical-resources) and join our [OpenACC and Hackathons Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.

---

# Licensing

Copyright © 2023 OpenACC-Standard.org.  This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials may include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.