# Core: Training Physics-ML models using PhysicsNemo (Data-driven workflows)

In this notebook, we will emulate a physical system governed by a diffusion equation (Darcy flow) using a data-driven AI model.

#### Contents of the Notebook
- [Problem Statement](#Problem-Statement:-Developing-a-surrogate-model-for-the-Darcy-Flow-system)
- [Fourier Neural Operator](#Fourier-Neural-Operator)
- [Solving the Darcy Flow problem using FNO](#Solving-the-Darcy-Flow-Problem-using-FNO)
    - [Step 1: Load data and initialize model](#Step-1:-Load-data-and-initialize-model)
    - [Step 2: Setup optimizer and scheduler](#Step-2:-Setup-optimizer-and-scheduler)
    - [Step 3: Setup utilities to validate the model](#Step-3:-Setup-utilities-to-validate-the-model)
    - [Step 4: Setup training loop and train the model](#Step-4:-Setup-training-loop-and-train-the-model)

#### Learning Objectives
- How to use PhysicsNemo utilities to set up data-driven training for physical systems

The data-driven approach uses large datasets to train models that make predictions or decisions. These datasets help neural networks learn the features present in unstructured data and, in effect, "learn the physics" that governs the system. Training such models can provide insight into the system that is impossible or computationally expensive to obtain with traditional techniques. One example is predicting the transient response of a system from its initial condition: if the AI model is trained on enough data, we can expect it to capture the causality and other governing phenomena present in that data. The trained model can then run inference on an unseen initial condition and produce outputs much faster than traditional techniques such as solving the set of PDEs that govern the system. Other examples can be found in design optimization, inverse modeling, and digital twins, among other domains.

NVIDIA PhysicsNemo makes it easy to set up such problems. PhysicsNemo is built on top of PyTorch and provides several utilities and models specifically targeted at Physics-ML. These utilities are interoperable with PyTorch, so users can flexibly use PhysicsNemo to augment their existing deep learning or simulation workflows. In the following example, we will use a Fourier Neural Operator (FNO) model from NVIDIA PhysicsNemo to develop a surrogate model that learns the mapping between the permeability and the pressure field of a Darcy flow system. The learned mapping should hold for a distribution of permeability fields and not just for a single solution. The traditional approach would be to solve the governing equation for every new permeability field. In the surrogate modeling approach, we instead train the model on enough pairs of input permeability and output pressure fields that it "learns" the correlation between the two (the underlying physics) and uses it to infer accurately on new, unseen inputs.

## Problem Statement: Developing a surrogate model for the Darcy Flow system

We will demonstrate the use of FNO on a 2D Darcy flow problem. The Darcy PDE is a second-order, elliptic PDE as shown below. One can also think of the Darcy PDE as a diffusion equation which can be used to parameterize a variety of systems including flow through porous media, elastic materials and heat conduction. 

\begin{equation}
-\nabla \cdot \left(k(\textbf{x})\nabla u(\textbf{x})\right) = f(\textbf{x}), \quad \textbf{x} \in D,
\end{equation}

in which $u(\textbf{x})$ is the flow pressure, $k(\textbf{x})$ is the permeability field and $f(\cdot)$ is the
forcing function. 

Here you will define the domain as a 2D unit square $D=\left\{x,y \in (0,1)\right\}$ with the boundary condition $u(\textbf{x})=0, \textbf{x}\in\partial D$. FNO requires a structured Euclidean input such that $D = \textbf{x}_{i}$ where $i \in \mathbb{N}_{N\times N}$. Thus both the permeability and pressure fields are discretized into 2D matrices $\textbf{K}, \textbf{U} \in \mathbb{R}^{N \times N}$.
This problem develops a surrogate model that learns the mapping between a permeability field and the pressure field,
$\textbf{K} \rightarrow \textbf{U}$, for a distribution of permeability fields $\textbf{K} \sim p(\textbf{K})$.
This is a key distinction compared to <em>solving the PDE</em>: you are <em>not</em> learning just a single solution but rather a <em>distribution</em>.

<center><img src="images/darcy-problem-statement.png" alt="Drawing" style="width:600px" /></center>
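
To make concrete what the surrogate replaces, the sketch below solves the Darcy equation once for a given permeability field using a simple 5-point finite-difference scheme in NumPy. It is only an illustration under stated assumptions, not the solver used to generate the dataset: the function name `solve_darcy`, the grid size, the arithmetic averaging of $k$ at cell faces, and the constant forcing $f=1$ are all hypothetical choices.

```python
import numpy as np

def solve_darcy(k, f=1.0):
    """Solve -div(k grad u) = f on the unit square with u = 0 on the boundary.

    k is an (N, N) array of permeability values at the interior grid nodes.
    Returns an (N, N) array with the pressure at the same nodes.
    """
    n = k.shape[0]
    h = 1.0 / (n + 1)  # node spacing; boundary nodes are not stored
    # pad with edge values so face permeabilities next to the boundary are defined
    kp = np.pad(k, 1, mode="edge")
    A = np.zeros((n * n, n * n))
    b = np.full(n * n, f, dtype=float)
    for i in range(n):
        for j in range(n):
            row = i * n + j
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                # permeability at the face between node (i, j) and its neighbour
                k_face = 0.5 * (kp[i + 1, j + 1] + kp[i + 1 + di, j + 1 + dj])
                A[row, row] += k_face / h**2
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < n:
                    A[row, ni * n + nj] -= k_face / h**2
                # neighbours outside the grid are boundary nodes where u = 0
    return np.linalg.solve(A, b).reshape(n, n)

# every new permeability field requires a new linear solve
k_uniform = np.ones((15, 15))
u = solve_darcy(k_uniform)
print(u.shape, float(u.max()))
```

Each call to `solve_darcy` handles exactly one permeability field, which is the cost the trained surrogate is meant to avoid at inference time.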


We can attempt to solve this problem using a variety of neural network architectures. However, the choice of network can be improved by introducing inductive bias into its design. PhysicsNemo contains a library of models with strong inductive biases that are useful for physics-based problems. For this problem, we will use the Fourier Neural Operator, which is built around Fourier transforms and as a result offers a few key benefits, such as resolution invariance, which we will see shortly. Before we dive into the code, we will cover the theory behind FNOs.

### Fourier Neural Operator

The Fourier neural operator (FNO) is a data-driven architecture that can be used to parameterize a distribution of PDE solutions. The key feature of FNO is the spectral convolution: an operation that places the integral kernel in Fourier space.

<center><img src="images/fourier-layer.png" alt="Drawing" style="width:900px" /></center>

The spectral convolution (Fourier integral operator) is defined as follows:
\begin{equation}
(\mathcal{K}(\mathbf{w})\phi)(x) = \mathcal{F}^{-1}\left(R_{\mathbf{w}}\cdot \left(\mathcal{F}\phi\right)\right)(x), \quad \forall x \in D
\end{equation}
where $\mathcal{F}$ and $\mathcal{F}^{-1}$ are the forward and inverse Fourier transforms, respectively.
$R_{\mathbf{w}}$ is the transformation which contains the learnable parameters $\mathbf{w}$. Note this operator is calculated
over the entire <em>structured Euclidean</em> domain $D$ discretized with $n$ points.
Fast Fourier Transform (FFT) is used to perform the Fourier transforms efficiently, and the resulting transformation $R_{\mathbf{w}}$ is just a finite size matrix of learnable weights. Inside the spectral convolution, the Fourier coefficients are truncated to only the lower modes which in turn allows explicit control over the dimensionality of the spectral space and linear operator.
The FNO model is the composition of a fully-connected "lifting" layer, $L$ spectral convolutions with point-wise linear skip connections and a decoding point-wise fully-connected neural network at the end.
\begin{equation}
u_{net}(\Phi;\theta) = \mathcal{Q}\circ \sigma(W_{L} + \mathcal{K}_{L}) \circ ... \circ \sigma(W_{1} + \mathcal{K}_{1})\circ \mathcal{P}(\Phi), \quad \Phi=\left\{\phi(x); \forall x \in D\right\}
\end{equation}
in which $\sigma(W_{i} + \mathcal{K}_{i})$ is the spectral convolution layer $i$ with the point-wise linear transform $W_{i}$ and activation function $\sigma(\cdot)$. $\mathcal{P}$ is the point-wise lifting network that projects the input into a higher-dimensional latent space, $\mathcal{P}: \mathbb{R}^{d_{in}} \rightarrow \mathbb{R}^{k}$.
Similarly, $\mathcal{Q}$ is the point-wise fully-connected decoding network, $\mathcal{Q}: \mathbb{R}^{k} \rightarrow \mathbb{R}^{d_{out}}$. Since all fully-connected components of FNO are point-wise operations, the model is invariant to the resolution of the discretized input.
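
The composition above can be sketched in a few lines of PyTorch. This is a minimal 1D illustration under our own assumptions, not the PhysicsNemo implementation: the class names `SpectralConv1d` and `TinyFNO1d`, the layer widths, the number of modes, and the GELU activation are all hypothetical choices made only to mirror $\mathcal{P}$, $W_i$, $\mathcal{K}_i$ and $\mathcal{Q}$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralConv1d(nn.Module):
    """K_i: multiply the lowest Fourier modes by learnable complex weights."""

    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat)
        )

    def forward(self, x):  # x: (batch, channels, n)
        x_hat = torch.fft.rfft(x, dim=-1)
        out_hat = torch.zeros_like(x_hat)
        # keep only the lowest modes and mix channels mode by mode
        out_hat[..., : self.modes] = torch.einsum(
            "bim,iom->bom", x_hat[..., : self.modes], self.weight
        )
        return torch.fft.irfft(out_hat, n=x.shape[-1], dim=-1)

class TinyFNO1d(nn.Module):
    def __init__(self, d_in=1, d_out=1, width=8, modes=4, num_layers=2):
        super().__init__()
        self.lift = nn.Conv1d(d_in, width, kernel_size=1)  # P (point-wise)
        self.spectral = nn.ModuleList(
            [SpectralConv1d(width, modes) for _ in range(num_layers)]
        )
        self.skip = nn.ModuleList(
            [nn.Conv1d(width, width, kernel_size=1) for _ in range(num_layers)]
        )  # W_i (point-wise)
        self.project = nn.Conv1d(width, d_out, kernel_size=1)  # Q (point-wise)

    def forward(self, phi):
        v = self.lift(phi)
        for spectral, skip in zip(self.spectral, self.skip):
            v = F.gelu(skip(v) + spectral(v))  # sigma(W_i + K_i)
        return self.project(v)

model = TinyFNO1d()
y32 = model(torch.randn(2, 1, 32))
y64 = model(torch.randn(2, 1, 64))  # same weights, different resolution
print(y32.shape, y64.shape)
```

Because the only learnable tensors are the point-wise layers and the fixed-size spectral weights, the same model accepts inputs of different grid sizes without any change to its parameters.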

<strong>Note:</strong> While FNO is technically invariant to the resolution of the discretized domain $D$, this domain <em>must</em> be a structured grid in Euclidean space. The inputs to FNO are analogous to images, but the model is invariant to the image resolution.
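
The resolution invariance can be checked numerically on a small example. The NumPy sketch below is an illustration under simplifying assumptions (a single-channel, periodic 1D signal, random weights standing in for the trained $R_{\mathbf{w}}$, and a hypothetical helper `spectral_conv_1d`); it applies the same spectral weights to one band-limited function sampled on two grids.

```python
import numpy as np

def spectral_conv_1d(x, weights):
    """Multiply the lowest Fourier modes of x by the given weights.

    x: real samples of a periodic signal on a uniform grid.
    weights: complex array with one entry per retained mode.
    """
    n = x.shape[-1]
    modes = weights.shape[-1]
    # norm="forward" puts the 1/n factor in the forward transform, so the
    # coefficients do not depend on the number of grid points
    x_hat = np.fft.rfft(x, norm="forward")
    out_hat = np.zeros_like(x_hat)
    out_hat[:modes] = weights * x_hat[:modes]  # truncate to the lowest modes
    return np.fft.irfft(out_hat, n=n, norm="forward")

def signal(n):
    """Sample a band-limited periodic function on n uniform grid points."""
    t = np.arange(n) / n
    return np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)

rng = np.random.default_rng(0)
weights = rng.normal(size=4) + 1j * rng.normal(size=4)  # stand-in for R_w

y_coarse = spectral_conv_1d(signal(32), weights)
y_fine = spectral_conv_1d(signal(64), weights)
# the fine grid contains every coarse grid point at its even indices
print(np.allclose(y_fine[::2], y_coarse))
```

The weights are indexed by Fourier mode and not by grid point, which is why the two outputs agree wherever the grids overlap.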

<center><img src="images/fno-architecture.png" alt="Drawing" style="width:1000px" /></center>

## Solving the Darcy Flow Problem using FNO

With the problem definition and the theory explained, let's dive into training an FNO model for this problem. Let's start with a few imports. Here, we will import the model and some utilities for logging and checkpointing from PhysicsNemo, while the rest will come from PyTorch and other libraries. In the subsequent sections, we will see more utilities from PhysicsNemo and how they can simplify setting up such problems.

In [None]:
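import os

# Describe a single-process run (rank 0 of world size 1) through the
# environment variables used for distributed setup
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "15678"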

import hydra
from omegaconf import DictConfig
import torch
import numpy as np
import matplotlib.pyplot as plt
from hydra.utils import to_absolute_path
import torch.nn.functional as F
from torch.utils.data import DataLoader
from itertools import chain
from physicsnemo.models.fno import FNO
from physicsnemo.launch.logging import LaunchLogger
from physicsnemo.launch.utils.checkpoint import save_checkpoint
from utils import HDF5MapStyleDataset

from hydra import compose, initialize

# load the hydra config
initialize(version_base="1.3", config_path="conf")
cfg = compose(config_name="config")

### Step 1: Load data and initialize model

For a data-driven problem, the first step is to load the data to be processed for training. Here, `HDF5MapStyleDataset` is a PyTorch `Dataset` that handles the data loading and the relevant transformations. Its details are not relevant for this text, so we won't cover them here; interested users can refer to the [`utils.py`](./utils.py) file. This dataset is then passed to a PyTorch `DataLoader` that samples the dataset and produces batched output.

The model is an FNO imported from PhysicsNemo and is interoperable with other PyTorch utilities such as dataloaders and optimizers.
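
For readers unfamiliar with the map-style `Dataset` and `DataLoader` pattern, here is a minimal sketch. It is not the `HDF5MapStyleDataset` from `utils.py`: the class `InMemoryPairs` is hypothetical, it returns only two items per sample, and random tensors stand in for the permeability and pressure fields.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class InMemoryPairs(Dataset):
    """Hypothetical stand-in for a map-style dataset of (input, target) fields."""

    def __init__(self, num_samples=8, resolution=16):
        generator = torch.Generator().manual_seed(0)
        # random tensors stand in for permeability and pressure fields
        self.inputs = torch.rand(
            num_samples, 1, resolution, resolution, generator=generator
        )
        self.targets = torch.rand(
            num_samples, 1, resolution, resolution, generator=generator
        )

    def __len__(self):
        return self.inputs.shape[0]

    def __getitem__(self, idx):
        # a map-style dataset returns one sample for a given integer index
        return self.inputs[idx], self.targets[idx]

loader = DataLoader(InMemoryPairs(), batch_size=2, shuffle=True)
invar, outvar = next(iter(loader))
# the DataLoader stacks samples along a new leading batch dimension
print(len(loader), invar.shape, outvar.shape)
```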

In [None]:
LaunchLogger.initialize()

dataset = HDF5MapStyleDataset(to_absolute_path("../../source_code/core/datasets/Darcy_241/train.hdf5"), device="cuda")
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

model_branch = FNO(
    in_channels=1,
    out_channels=1,
).to("cuda") # move the model to GPU

### Step 2: Setup optimizer and scheduler

Next, we initialize an Adam optimizer to train the weights of the model. The optimizer updates the weights of the neural network based on the loss gradient information. We will use the default Adam optimizer from PyTorch for this purpose. It is typically good practice to reduce the learning rate as training progresses, since this helps to minimize oscillations and improves convergence. For this example, an exponentially decaying learning rate is used to achieve this.
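
With `ExponentialLR`, each call to `scheduler.step()` multiplies the learning rate by `gamma`, so after $t$ scheduler steps the learning rate is $\text{lr}_t = \text{lr}_0 \, \gamma^{t}$. The short sketch below shows this on a dummy parameter; the values of `start_lr` and `gamma` are hypothetical and are not the ones stored in the notebook's config.

```python
import torch

start_lr, gamma = 1e-3, 0.9  # hypothetical values; the notebook reads them from cfg
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=start_lr)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

lrs = []
for step in range(5):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()  # no gradients were computed, so the parameter is unchanged
    scheduler.step()  # multiplies the learning rate by gamma

print(lrs)
```

Note that how fast the learning rate decays depends on how often `scheduler.step()` is called, for example once per batch or once per epoch.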

In [None]:
optimizer = torch.optim.Adam(
    chain(model_branch.parameters()),
    betas=(0.9, 0.999),
    lr=cfg.start_lr,
    weight_decay=0.0,
)

scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=cfg.gamma)

### Step 3: Setup utilities to validate the model

We are almost ready to train our model, but before we do that, we would like to evaluate the model on a dataset that is not part of the training data to gauge its out-of-sample performance. This can be done with a validation/test dataset. We will also define a `validation_step` that computes metrics such as the validation error and plots the results on these samples.
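
As a small illustration of what such a metric measures, the sketch below computes the mean squared error on a toy prediction. It also computes a relative $L_2$ error, which is our own addition for illustration and is not used in the notebook; the helper names `mse` and `relative_l2` and the toy arrays are hypothetical.

```python
import numpy as np

def mse(pred, true):
    """Mean squared error, averaged over all grid points."""
    return float(np.mean((pred - true) ** 2))

def relative_l2(pred, true):
    """Norm of the error divided by the norm of the reference field."""
    return float(np.linalg.norm(pred - true) / np.linalg.norm(true))

true = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = true + 0.1  # a prediction that is off by 0.1 everywhere

print(mse(pred, true), relative_l2(pred, true))
```

The MSE depends on the scale of the field, while the relative error is normalized by the magnitude of the reference, which can make it easier to compare across samples.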

In [None]:
validation_dataset = HDF5MapStyleDataset(to_absolute_path("../../source_code/core/datasets/Darcy_241/validation.hdf5"), device="cuda")
validation_dataloader = DataLoader(validation_dataset, batch_size=1, shuffle=False)

os.makedirs("./results/", exist_ok=True)

def validation_step(model, dataloader, epoch):
    """Validation Step"""
    model.eval()

    with torch.no_grad():
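        loss_epoch = 0.0
        for data in dataloader:
            invar, outvar, _, _ = data
            # keep only the first input channel and restore the channel dimension
            out = model(invar[:, 0].unsqueeze(dim=1))

            # accumulate as a Python float so the returned value is a plain number
            loss_epoch += F.mse_loss(out, outvar).item()

        # convert the last validation batch to numpy for plotting
        outvar = outvar.detach().cpu().numpy()
        predvar = out.detach().cpu().numpy()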

        # plotting
        fig, ax = plt.subplots(1, 3, figsize=(25, 5))

        d_min = np.min(outvar[0, 0, ...])
        d_max = np.max(outvar[0, 0, ...])

        im = ax[0].imshow(outvar[0, 0, ...], vmin=d_min, vmax=d_max)
        plt.colorbar(im, ax=ax[0])
        im = ax[1].imshow(predvar[0, 0, ...], vmin=d_min, vmax=d_max)
        plt.colorbar(im, ax=ax[1])
        im = ax[2].imshow(np.abs(predvar[0, 0, ...] - outvar[0, 0, ...]))
        plt.colorbar(im, ax=ax[2])

        ax[0].set_title("True")
        ax[1].set_title("Pred")
        ax[2].set_title("Difference")
        
        fig.savefig(f"./results/results_{epoch}.png")
        plt.close()
        return loss_epoch / len(dataloader)

### Step 4: Setup training loop and train the model

Great, now we are ready to start training our FNO model! This process typically involves computing the loss (difference between the model prediction and truth) across the entire training dataset and repeating this process for a few epochs until a desired accuracy is reached. Here, we define the training loop as follows: we use one loop to iterate through all the training data (remember we are using a batched dataloader) and another loop to repeat the process for a few training epochs. 

The inner loop enumerates data from the DataLoader, and on each pass of the loop does the following:

1. Gets a batch of training data from the DataLoader
2. Zeros the optimizer’s gradients
3. Performs a forward pass, that is, gets predictions from the model for an input batch
4. Calculates the loss for that set of predictions vs. the labels on the dataset
5. Calculates the backward gradients over the learning weights
6. Tells the optimizer to perform one learning step (adjust the model’s learning weights based on the observed gradients for this batch, according to the optimization algorithm chosen)
7. Adjusts the learning rate based on the chosen scheduler scheme

This is a typical PyTorch training workflow. For more information, refer to the [PyTorch training tutorial](https://pytorch.org/tutorials/beginner/introyt/trainingyt.html).
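
The steps above can be sketched on a toy problem that runs in seconds on a CPU. Everything in this sketch is a placeholder chosen for illustration: the synthetic data, the 1x1 convolution standing in for the FNO, and the learning rate, decay factor and number of epochs.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# synthetic regression data standing in for (permeability, pressure) pairs
x = torch.randn(64, 1, 8, 8)
y = 2.0 * x + 1.0
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = torch.nn.Conv2d(1, 1, kernel_size=1)  # placeholder for the FNO
optimizer = torch.optim.Adam(model.parameters(), lr=1e-1)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

losses = []
for epoch in range(30):                 # outer loop over epochs
    running = 0.0
    for inputs, targets in loader:      # get a batch of training data
        optimizer.zero_grad()           # zero the gradients
        pred = model(inputs)            # forward pass
        loss = F.mse_loss(pred, targets)  # loss against the labels
        loss.backward()                 # backward gradients
        optimizer.step()                # one learning step
        scheduler.step()                # adjust the learning rate
        running += loss.item()
    losses.append(running / len(loader))  # mean training loss of this epoch

print(losses[0], losses[-1])
```

Here the scheduler is stepped once per batch, so the learning rate decays by a factor of `gamma` for every mini-batch and not once per epoch.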

In [None]:
for epoch in range(cfg.max_epochs):
    # wrap epoch in launch logger for console logs
    with LaunchLogger(
        "train",
        epoch=epoch,
        num_mini_batch=len(dataloader),
        epoch_alert_freq=10,
    ) as log:
        # validation_step() switches the model to eval mode, so switch back
        model_branch.train()
        for data in dataloader:
            optimizer.zero_grad()
            truevar = data[1]
            
            # compute forward pass
            outvar = model_branch(data[0][:, 0].unsqueeze(dim=1))
            
            # compute data loss
            loss_data = F.mse_loss(outvar, truevar)
            
            # Compute total loss
            loss = loss_data
            
            # Backward pass and optimizer and learning rate update
            loss.backward()
            optimizer.step()
            scheduler.step()
            
        log.log_epoch({"Learning Rate": optimizer.param_groups[0]["lr"]})
    
    # test model on validation dataset
    with LaunchLogger("valid", epoch=epoch) as log:
        error = validation_step(model_branch, validation_dataloader, epoch)
        log.log_epoch({"Validation error": error})

    # save checkpoint
    save_checkpoint(
        "./checkpoints",
        models=[model_branch],
        optimizer=optimizer,
        scheduler=scheduler,
        epoch=epoch,
    )


You can now visualize the results of the training by looking at the outputs from the `results` directory. That completes our introductory example on training a data-driven model. As a next step, we will take these learnings and train a data-driven global weather prediction model. Please continue to [the next notebook](../weather/Notebook_2.ipynb). 

# Important: Free up GPU Memory!

Run the cell below to free up GPU memory after training the model and before moving to the next notebook.

In [None]:
import os
os._exit(00)

--- 

Don't forget to check out additional [Open Hackathons Resources](https://www.openhackathons.org/s/technical-resources) and join our [OpenACC and Hackathons Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.

---

# Licensing

Copyright © 2023 OpenACC-Standard.org.  This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials may include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.
