# *checkpoint_schedule* application: Adjoint-Based Gradient with Burger's Equation

This user example shows adjoint-based gradient computation using the *checkpointing_schedules* package. We initially define the adjoint-based gradient problem and then present the forward and adjoint solvers prescribed by the *checkpointing_schedules* package.

### Defining the application

Let us consider a one-dimensional (1D) problem aiming to compute the gradient/sensitivity of an objective functional $I$ with respect to a control parameter. The objective functional is given by the expression:

$$
I(u) = \int_{\Omega} \frac{1}{2} u(x, \tau)u(x, \tau) \, dx
\tag{1}
$$

This measures the energy of a 1D velocity variable $u = u(x, \tau)$ at a time $\tau$, where $u$ is governed by the 1D viscous Burgers equation, a non-linear equation for the advection and diffusion of momentum:

$$
\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2} = 0.
\tag{2}
$$

Here, $x \in [0, L]$ is the space variable, and $t \in \mathbb{R}^{+}$ represents the time variable. The boundary condition is $u(0, t) = u(L, t) = 0$, where $L$ is the length of the 1D domain. The initial condition is given by $u(0, t) = u_0 = \sin(\pi x)$.

The control parameter is the initial condition $u_0$. Thus, the objective is to compute the adjoint-based gradient of the cost function $I(u)$ with respect to $u_0$.

This example sets the adjoint equation from the continuous formulation, meaning the adjoint PDE is obtained from the continuous forward PDE (Partial Differential Equation). The adjoint-based gradient is given by the expression:

$$
\frac{d I}{d u_0} = \int_{\Omega}  \lambda(x, 0) \delta u_0 \, dx,
\tag{3}
$$

where $\lambda(x, 0)$ is the adjoint variable governed by the adjoint system:

$$
\frac{\partial \lambda}{\partial t} - \lambda \frac{\partial u}{\partial x} + u \frac{\partial \lambda}{\partial x} + \nu \frac{\partial^2 \lambda}{\partial x^2} = 0,
\tag{4}
$$

satisfying the boundary condition $\lambda (0, t) = \lambda(L, t) = 0$. In this case, the initial condition is $\lambda (x, \tau) = u(x, \tau)$.

Once the adjoint equation is solved from the initial time $t = \tau$ to the final time $\tau = 0$ the adjoint equation depends on the forward solution (see the adjoint equation (4)), it is necessary to store the forward state variable. Storing the entire forward state in preparation for the adjoint calculation has a memory footprint linear in the number of time steps. For sufficiently large problems this will exhaust the
memory of any computer system. To overcome this kind of problem, checkpointing algorithms are used to reduce the memory usage.

#### Checkpointing manager implementation

As shown in the [illustrative example](https://nbviewer.org/github/firedrakeproject/checkpoint_schedules/blob/main/docs/notebooks/tutorial.ipynb), we define the `CheckpointingManager` class to manage the execution of forward and adjoint models with a checkpointing schedule. The `CheckpointingManager` contains the `execute` method, which performs each action specified in the checkpointing schedule (`_schedule`). Within the `execute` method, the single-dispatch generic function `action` function is employed. `action` is overloaded by the `action_forward`, `action_adjoint`, `action_copy`, `action_move`, `action_end_forward`, and `action_end_reverse` functions. These functions correspond to different actions reached during the iterations over the elements of the `cp_schedule`.

For instance, if the action is `Forward`, the `action_forward` function is called. Within this function, the necessary code is implemented to progress the forward equation. In this particular example, the forward solver is executed by calling `self.equation.forward`. Here, `self.equation` is an attribute of `CheckpointingManager`. Similarly, the adjoint solver is executed by calling `self.equation.adjoint` within the `action_reverse` function.


In [107]:
import functools, sys
from checkpoint_schedules import *

class CheckpointingManager:
    """Manage the forward and backward solvers.

    Attributes
    ----------
    schedule : CheckpointSchedule
        The schedule created by `checkpoint_schedules` package.
    equation : object
        An equation object used to solve the forward and adjoint solvers.
    
    Notes
    -----
    The `equation` object contains methods to execute the forward and adjoint. In 
    addition, it contains methods to copy data from one storage to another, and
    to set the initial condition for the adjoint.
    """
    def __init__(self, schedule, equation):
        self.max_n = sys.maxsize
        self.equation = equation
        self.reverse_step = 0
        self._schedule = schedule
        
    def execute(self):
        """Execute forward and adjoint using checkpointing.
        """
        @functools.singledispatch
        def action(cp_action):
            raise TypeError("Unexpected action")

        @action.register(Forward)
        def action_forward(cp_action):
            n1 = cp_action.n1
            if (
                isinstance(self._schedule, SingleMemoryStorageSchedule) 
                or isinstance(self._schedule, SingleDiskStorageSchedule)
            ): 
                self.equation.forward(cp_action.n0, n1, storage=cp_action.storage,
                                      single_storage=True, write_adj_deps=cp_action.write_adj_deps)
            else:    
                self.equation.forward(cp_action.n0, n1, storage=cp_action.storage,
                                      write_adj_deps=cp_action.write_adj_deps, write_ics=cp_action.write_ics)
            if n1 > self.equation.model["max_n"]:
                n1 = min(n1, self.equation.model["max_n"])
                self._schedule.finalize(n1)
            

        @action.register(Reverse)
        def action_reverse(cp_action):
            if self.reverse_step == 0:
                self.equation.adjoint_initial_condition()
            self.equation.adjoint(cp_action.n0, cp_action.n1, cp_action.clear_adj_deps)
            self.reverse_step += cp_action.n1 - cp_action.n0
            
        @action.register(Copy)
        def action_copy(cp_action):
            self.equation.copy_data(cp_action.n, cp_action.from_storage, cp_action.to_storage)

        @action.register(Move)
        def action_move(cp_action):
            self.equation.copy_data(cp_action.n, cp_action.from_storage, cp_action.to_storage, move=True)
            
        @action.register(EndForward)
        def action_end_forward(cp_action):
            if self._schedule.max_n is None:
                self._schedule._max_n = self.max_n
            assert self.reverse_step == 0
            
        @action.register(EndReverse)
        def action_end_reverse(cp_action):
            if self._schedule.max_n != self.reverse_step:
                raise ValueError("The number of steps in the reverse phase"
                                 "is different from the number of steps in the"
                                 "forward phase.")
            
        self.reverse_step = 0
        for _, cp_action in enumerate(self._schedule):
            action(cp_action)
            if isinstance(cp_action, EndReverse):
                break


#### Burger's equation implementation

The sensitivity computation is performed by solving the forward and adjoint equations. The forward equation is 1D viscous Burgers equation (2). For this current sensitivity problem, the adjoint equation is given by (4). 

To solve forward and adjoint solvers, we implement `BurgersEquation` class that execute the forward and adjoint solvers. In addition, the `BurgersEquation` has the `copy_data` that copies the data from one storage type to another, and `adjoint_initial_condition` that sets the adjoint initial condition.

Both the forward and adjoint systems are discretised using the Finite Element Method (FEM). We use the first-order Lagrange basis functions to discretise the spatial domain. The backward finite difference method is employed to discretise the equations in time.

In [108]:
import numpy as np
from enum import Enum
import os
from scipy.sparse.linalg import spsolve
from scipy.sparse import lil_matrix
from scipy.optimize import newton


class BurgersEquation:
    """This class is capable to solve the time-dependent forward 
    and adjoint burger's equation.

    Attributes
    ----------
    model : dict
        The model parameters containing the essential information to solve
        the burger's equation.
    init_condition : array
        The initial condition used to solve the forward burger's equation.
    mesh : array
        The spatial mesh.
    """
    def __init__(self, model, forward_initial_condition, mesh):
        self.model = model
        self.mesh = mesh
        self.snapshots = {StorageType.RAM: {}, StorageType.DISK: {}}
        self.forward_work_memory = {StorageType.WORK: {}}
        self.forward_initial_condition = forward_initial_condition
        self.forward_final_solution = None
        self.forward_work_memory[StorageType.WORK][0] = forward_initial_condition
        self.adjoint_work_memory = {StorageType.WORK: {}}

    def _mass_matrix(self):
        """This function assembles the mass matrix.

        Returns
        -------
        M : scipy.sparse.lil_matrix
            The mass matrix.
        
        Notes
        -----
        The mass matrix is assembled a linear spatial basis functions.
        """
        num_nodes = self.model["nx"]
        local_matrix = (1 / 6) * np.array([[2, 1], [1, 2]])
        M = lil_matrix((num_nodes, num_nodes))
        for i in range(num_nodes - 1):
            M[i:i + 2, i:i + 2] += local_matrix
        return M
    
    def _stiffness_matrix(self):
        """This function assembles the stiffness matrix.

        Returns
        -------
        K : scipy.sparse.lil_matrix
            The stiffness matrix.
        """
        num_nodes = self.model["nx"]
        h = self.model["lx"] / self.model["nx"] 
        # 1D mesh is uniform. Thus, the mesh spacing is constant.
        b = self.model["nu"] / (h ** 2)
        local_stiffness = np.array([[-1, 1], [1, -1]])
        K = lil_matrix((num_nodes, num_nodes))
        for i in range(num_nodes - 1):
            K[i:i + 2, i:i + 2] += b * local_stiffness
        return K
    
    def _convection_matrix(self, u, adjoint=False):
        """This function assembles the convection matrix.

        Parameters
        ----------
        u : array
            State vector.

        Returns
        -------
        C : scipy.sparse.lil_matrix
            The convection matrix.
        """
        num_nodes = self.model["nx"]
        # 1D mesh is uniform. Thus, the mesh spacing is constant.
        h = self.model["lx"] / self.model["nx"] 
        C = lil_matrix((num_nodes, num_nodes))
        C[0, 0] = - 1 / 2 * u[0] / h
        C[num_nodes - 1, num_nodes - 1] = 1 / 2 * u[num_nodes - 1] / h
        C[0, 1] = 1/2 * u[0] / h
        C[num_nodes - 1, num_nodes - 2] = - 1 / 2 * u[num_nodes - 2] / h
        for i in range(1, num_nodes - 1):
            # Convection term uses the approach described in [1] and [2], 
            # which assumes `u` constant over an element.
            # [1] Kutluay SE, Esen AL, Dag I. Numerical solutions of the Burgers’ equation
            # by the least-squares quadratic B-spline finite element method. Journal of
            # computational and Applied Mathematics. 2004 May 1;167(1):21-33.
            # [2] Dogan A. A Galerkin finite element approach to Burgers' equation.
            # Applied mathematics and computation. 2004 Oct 5;157(2):331-46.
            C[i, i - 1] = - 1 / 2 * u[i - 1] / h
            C[i, i] = 1 / 2 * (u[i - 1] - u[i]) / h
            C[i, i + 1] = 1 / 2 * u[i] / h
        if adjoint:
            c_local = np.array([[1/3, 1/6], [1/3, 1/6]])
            for i in range(num_nodes - 1):
                C[i:i + 2, i:i + 2] += c_local * (u[i + 1] - u[i]) / h

        return C
    
    def forward(
            self, n0, n1, storage=None, write_adj_deps=False,
            write_ics=False, single_storage=False
    ):
        """Solve the non-linear forward burger's equation in time.

        Parameters
        ----------
        n0 : int
            Initial time step.
        n1 : int
            Final time step.
        storage : StorageType, optional
            The storage type, which can be StorageType.RAM, StorageType.DISK,
            StorageType.WORK, or StorageType.NONE.
        write_adj_deps : bool, optional
            Whether the adjoint dependency data will be stored.
        write_ics : bool, optional
            Whether the forward restart data will be stored.
        single_storage : bool, optional
            This parameter is used to indicated whether a checkpointing schedule
            is single storage or not. Single storage means that no checkpointing
            algorithm (eg, `Revolve`, `HRevole`) is employed. 
        """
        M = self._mass_matrix()
        K = self._stiffness_matrix()
        def non_linear(u_new, u):
            """Define the non-linear system.

            Parameters
            ----------
            u_new : array
                Forward solution at the `n + 1` time step.
            u : array
                Forward solution at the `n` time step.
            """
            C = self._convection_matrix(u_new)
            # Set the boundary conditions.
            u[0] = u[self.model["nx"] - 1] = 0
            F = M * u_new - M * u + self.model["dt"] * (- K * u_new + C * u_new)
            return F

        # Get the initial condition
        u = self.forward_work_memory[StorageType.WORK][n0]
        if not single_storage:
            del self.forward_work_memory[StorageType.WORK][n0]
        u_new = u.copy()
        n1 = min(n1, self.model["max_n"])
        step = n0
        while step < n1:
            if ((write_ics and step == n0)
                or (write_adj_deps and storage != StorageType.WORK)):
                self._store_data(u, step, storage,
                                write_adj_deps=write_adj_deps,
                                write_ics=write_ics)
            u_new = newton(lambda u_new: non_linear(u_new, u), u)
            u = u_new.copy()
            if single_storage and storage == StorageType.WORK:
                self.forward_work_memory[StorageType.WORK][step] = u_new
            step += 1
        self.forward_work_memory[StorageType.WORK][step] = u_new
        if n1 == self.model["max_n"]:
            self.forward_final_solution = u_new.copy()

    def adjoint(self, n0, n1, clear_adj_deps):
        """Execute the adjoint equation in time.

        Parameters
        ---------
        n0 : int
            Initial time step.
        n1 : int
            Final time step.
        clear_adj_deps : bool
            If `True`, the adjoint dependency data will be cleared.
        """
        u = self.adjoint_work_memory[StorageType.WORK][n1]
        del self.adjoint_work_memory[StorageType.WORK][n1]
        u_new = np.zeros(self.model["nx"])
        steps = n1 - n0
        t = n1
        M = self._mass_matrix()
        K = self._stiffness_matrix()
        for _ in range(steps):
            u[0] = u[self.model["nx"] - 1] = 0
            C = self._convection_matrix(self.forward_work_memory[StorageType.WORK][t],
                                        adjoint=True)
            A = M - self.model["dt"] * (K + C)
            d = M * u
            u_new = spsolve(M, d)
            u = u_new.copy()
            if clear_adj_deps:
                del self.forward_work_memory[StorageType.WORK][t]
            t -= 1
        self.adjoint_work_memory[StorageType.WORK][n0] = u_new
    
    def compute_gradient(self):
        """Compute the adjoint-based gradient.
        """
        u_adj = self.adjoint_work_memory[StorageType.WORK][0]
        return  np.trapz(u_adj * np.sin(np.pi * self.mesh), self.mesh)
    
    def _store_on_disk(self, data, step, adj_deps=False):
        """Store the forward data on disk.

        Parameters
        ----------
        data : array
            The forward data.
        step : int
            The time step.
        adj_deps : bool, optional
            If True, the data is stored in the adjoint dependencies folder.
        """
        if adj_deps:
            file_name = "adj_deps/fwd_"+ str(step) +".npy"
            with open(file_name, "wb") as f:
                np.save(f, data)
        else:
            file_name = "fwd_data/fwd_"+ str(step) +".npy"
            with open(file_name, "wb") as f:
                np.save(f, data)
        self.snapshots[StorageType.DISK][step] = file_name

    def copy_data(self, step, from_storage, to_storage, move=False):
        """Load the forward data from disk.

        Parameters
        ----------
        file_name : str
            The file name.
        step : int
            The time step.

        Returns
        -------
        data : array
            The loaded data.
        """
        if from_storage == StorageType.DISK:
            file_name = self.snapshots[StorageType.DISK][step]
            with open(file_name, "rb") as f:
                if to_storage == StorageType.RAM:
                    self.snapshots[StorageType.RAM][step] = np.load(f)
                elif to_storage == StorageType.WORK:
                    self.forward_work_memory[StorageType.WORK][step] = np.load(f)
                if move:
                    os.remove(file_name)
        elif from_storage == StorageType.RAM:
            self.forward_work_memory[StorageType.WORK][step] = self.snapshots[StorageType.RAM][step]
            if move:
                del self.snapshots[StorageType.RAM][step]

    def _store_data(self, data, t, storage, write_adj_deps=False, write_ics=False):
        """Store the forward data.

        Parameters
        ----------
        data : array
            The forward data.
        t : int
            The time step.
        storage : StorageType
            The storage type.
        write_adj_deps : bool, optional
            If `True`, the adjoint dependency data will be stored.
        write_ics : bool, optional
            If `True`, the forward restart data will be stored.
        """
        if storage == StorageType.DISK:
            if write_adj_deps:
                self._store_on_disk(data, t, adj_deps=write_adj_deps)
            if write_ics:
                self._store_on_disk(data, t)
        elif storage == StorageType.RAM:
            self.snapshots[storage][t] = data


    def adjoint_initial_condition(self):
        """Set the adjoint initial condition.
        """
        u = self.forward_final_solution
        self.adjoint_work_memory[StorageType.WORK][self.model["max_n"]] = u



### Adjoint-based sensitivity computations

The purpose of this adjoint-based sensitivity computation is to use every checkpointing approach available in the
`checkpoint_schedules` package and verify if the quantitative results provided by the solver remain consistent. 

Below, we define the `model` dictionary containing the parameters required for the forward and adjoint solvers. The `model` dictionary is then passed to the `BurgersEquation` class. Additionally, we set up the 1D mesh and the initial condition for the forward Burgers' solver.

In [109]:
model = {"lx": 1,   # lenght domain
         "nx": 80, # number of nodes
         "dt": 0.001, # time step
         "T": 2, # final time
         "nu": 0.01, # viscosity
         "max_n": 200, # total steps
         "chk_ram": 10, # number of checkpoints in RAM
         "chk_disk": 10, # number of checkpoints on disk
        }

mesh = np.linspace(0, model["lx"], model["nx"]) # create the spatial grid
u0 = np.sin(np.pi*mesh) # initial condition

Initially, we compute an adjoint-based sensitivity without employing any checkpointing approach. 
The sensitivity computations use two methods: (i) central finite difference and (ii) adjoint. The central finite difference method is given by:
$$ \frac{\partial I}{\partial u_0} = \frac{I(u_0 + \epsilon) - I(u_0 - \epsilon)}{2 \epsilon} \tag{5}$$
where $\epsilon$ is a small perturbation. 

The adjoint method is employed to compute the sensitivity of the kinetic energy with respect to the initial condition $u_0$ is given by (3).

In [110]:
# Solve the sensitivity using central finite differences method.
u0_p = 1.01*np.sin(np.pi*mesh)  # Initial condition.
burger = BurgersEquation(model, u0_p, mesh)  # Create the burger's equation object.
n0 = 0
n1 = model["max_n"]
burger.forward(0, model["max_n"])  # Solve the forward equation for model["max_n"] steps.
u_p = burger.forward_final_solution  # Get the final solution.
e_p = 0.5 * u_p**2  # Energy

u0_m = 0.99 * np.sin(np.pi*mesh)  # Initial condition.
burger = BurgersEquation(model, u0_m, mesh)  # Create the burger's equation object.
burger.forward(0, model["max_n"])  # Solve the forward equation.
u_m = burger.forward_final_solution  # Get the final solution.
e_m = 0.5 * u_m**2  # Energy

# Compute the sensitivity.
integ2 = np.trapz(e_p, mesh)
integ1 = np.trapz(e_m, mesh)
du0 = 0.01
fd_sensitivity = (integ2 - integ1) / (2 * du0)
print("Sensitivity using central finite differences method: ", fd_sensitivity)

Sensitivity using central finite differences method:  0.48237197121444586


Above, we computed the sensitivity of the objective functional $I(u(x, \tau))$ with respect to the initial condition $u_0$ using the central finite difference method. The perturbation $\epsilon$ is set to $10^{-2} \cdot u_0$, and the sensitivity is calculated at the final time $\tau = 0.5$.

The adjoint-based sensitivity is initially computed using the `SingleMemoryStorageSchedule` checkpointing approach. The `SingleMemoryStorageSchedule` stores the forward data of each time-step in working memory. As explained in the [notebook with illustrative example](https://nbviewer.org/github/firedrakeproject/checkpoint_schedules/blob/main/docs/notebooks/tutorial.ipynb), this schedule does not require the maximal step (`model["max_n"]`).


In [111]:
schedule = SingleMemoryStorageSchedule()  # create the checkpointing schedule
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing

Next, we compute the adjoint-based sensitivity and compare it with the sensitivity obtained using the central finite difference method without any checkpointing approach.

In [112]:
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
adj = burger.adjoint_work_memory[StorageType.WORK][0]  # get the adjoint solution
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

Sensitivity using adjoint method:  0.4686945163615982
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.835458042558434


The following example shows the usage of the `SingleDiskStorageSchedule` schedule. In this case, the forward data used in the adjoint computations is only stored on disk. The `SingleDiskStorageSchedule` schedule does not require the definition of the maximal step `model["max_n"]` before the execution of the forward solver. 

In [113]:
schedule = SingleDiskStorageSchedule()  # create the checkpointing schedule
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

Sensitivity using adjoint method:  0.46869451662679296
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.8354579875812025


With the example above, we do not move any data from the disk to the work in memory, i.e., we copy the data from the disk and keep this data on disk. The next example shows the usage of the `SingleDiskStorageSchedule` schedule with the `mode_data=True` argument. In this case, the forward data used in the adjoint compuations stored on disk is moved to the work in memory, i.e., we copy the data from the disk and remove this data from the disk. 

In [114]:
schedule = SingleDiskStorageSchedule(move_data=True)  # create the checkpointing schedule
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

Sensitivity using adjoint method:  0.46869451662679296
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.8354579875812025


The following example uses the `Revolve` schedule [1]. The `Revolve` algorithm requires the definition of the maximal step `model["max_n"]` before the execution of the forward solver, and also the specification of the number of checkpoints to be stored in RAM (Random Access Memory). 

`model["chk_ram"]` indicates the number of steps for which the forward data is stored in RAM 


In [115]:
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
schedule = Revolve(model["max_n"], model["chk_ram"]) # create the checkpointing schedule
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

Sensitivity using adjoint method:  0.46869451662679296
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.8354579875812025


The `DiskRevolve` [3] schedule requires the definition of the maximal step `model["max_n"]` before the execution of the forward solver, and the maximum number of checkpoints to be saved in RAM. This schedule automatically computes the number of checkpoints to store on disk, considering factors such as:
- Computational cost to execute the forward solver in one step.
- Computational cost to store the forward data on disk.

In [116]:
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
schedule = DiskRevolve(model["max_n"], model["chk_ram"]) # create the checkpointing schedule
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

Sensitivity using adjoint method:  0.46869451662679296
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.8354579875812025


The `PeriodicDiskRevolve` [4] schedule also requires the definition of the maximal step `model["max_n"]` before the execution of the forward solver. Additionally, this schedule requires the specification of the number of checkpoints intended to be stored in RAM, with the schedule automatically calculating the number of steps for which the forward data is stored on disk.

In [117]:
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
schedule = PeriodicDiskRevolve(model["max_n"], model["chk_ram"]) # create the checkpointing schedule
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

We use periods of size  11
Sensitivity using adjoint method:  0.46869451662679296
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.8354579875812025


In the following examples, we employ the `HRevolve` [5] and `MultistageCheckpointSchedule` [2] schedules. These checkpointing schedules requires the definition of the maximal step `model["max_n"]` before the execution of the forward solver. Additionally, the specification of the maximum number of checkpoints allowed to be stored in both RAM and on disk.

In [118]:
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
schedule = HRevolve(model["max_n"], model["chk_ram"], model["chk_disk"]) # create the checkpointing schedule
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

Sensitivity using adjoint method:  0.46869451662679296
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.8354579875812025


In [119]:
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
schedule = MultistageCheckpointSchedule(model["max_n"], model["chk_ram"], model["chk_disk"]) # create the checkpointing schedule
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

Sensitivity using adjoint method:  0.46869451662679296
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.8354579875812025


`TwoLevelCheckpointSchedule` [7] does not require the maximal step `model["max_n"]` to be defined before the execution of the forward solver, it saves the forward restart data on disk based on the `period` argument. For instance, if `period = 10`, the forward restart data is stored on disk every ten steps.

During the adjoint computation, the user can define additional forward restart data storage in RAM or on disk. This is carried out according to the `binomial_storage` argument. The additional number of checkpoints stored is set by the second argument of the `TwoLevelCheckpointSchedule` class. In the example below, we set `model["chk_ram"]` as the additional number of checkpoints to be stored during the adjoint computation, and the storage type is on disk (`binomial_storage=StorageType.RAM`).

In [120]:
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
period = 10
schedule = TwoLevelCheckpointSchedule(period, model["chk_ram"], binomial_storage=StorageType.RAM) # create the checkpointing schedule
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

Sensitivity using adjoint method:  0.46869451662679296
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.8354579875812025


Now, we define to store the additional forward restart data on dink during the adjoint computation with the storage type `binomial_storage=StorageType.DISK`, as shown below.

In [121]:
burger = BurgersEquation(model, u0, mesh) # create the burger's equation object
period = 10
schedule = TwoLevelCheckpointSchedule(period, model["chk_disk"], binomial_storage=StorageType.DISK) # create the checkpointing schedule
manager = CheckpointingManager(schedule, burger)  # create the checkpointing manager
manager.execute()  # execute the forward and adjoint solvers using checkpointing
adj_sensitivity = burger.compute_gradient()  # compute the adjoint-based sensitivity
print("Sensitivity using adjoint method: ", adj_sensitivity)
print("sensitivity using central finite differences method: ", fd_sensitivity)
error = np.abs(adj_sensitivity - fd_sensitivity)/np.abs(fd_sensitivity)
print("Relative difference in percentage: ", error*100)

Sensitivity using adjoint method:  0.46869451662679296
sensitivity using central finite differences method:  0.48237197121444586
Relative difference in percentage:  2.8354579875812025


### References

[1] Griewank, A., & Walther, A. (2000). Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation. ACM Transactions on Mathematical Software (TOMS), 26(1), 19-45., doi: https://doi.org/10.1145/347837.347846

[2] Stumm, P., & Walther, A. (2009). Multistage approaches for optimal offline checkpointing. SIAM Journal on Scientific Computing, 31(3), 1946-1967. https://doi.org/10.1137/080718036

[3] Aupy, G., Herrmann, J., Hovland, P., & Robert, Y. (2016). Optimal multistage algorithm for adjoint computation. SIAM Journal on Scientific Computing, 38(3), C232-C255. DOI: https://doi.org/10.1145/347837.347846.

[4] Aupy, G., & Herrmann, J. (2017). Periodicity in optimal hierarchical checkpointing schemes for adjoint computations. Optimization Methods and Software, 32(3), 594-624. doi: https://doi.org/10.1080/10556788.2016.1230612

[5] Herrmann, J. and Pallez (Aupy), G. (2020). H-Revolve: a framework for adjoint computation on synchronous hierarchical platforms. ACM Transactions on Mathematical Software (TOMS), 46(2), 1-25. DOI: https://doi.org/10.1145/3378672.

[6] Maddison, J. R. (2023). On the implementation of checkpointing with high-level algorithmic differentiation. arXiv preprint arXiv:2305.09568. https://doi.org/10.48550/arXiv.2305.09568.

[7] Pringle, G. C., Jones, D. C., Goswami, S., Narayanan, S. H. K., and  Goldberg, D. (2016). Providing the ARCHER community with adjoint modelling tools for high-performance oceanographic and cryospheric computation. https://nora.nerc.ac.uk/id/eprint/516314.

[8] Goldberg, D. N., Smith, T. A., Narayanan, S. H., Heimbach, P., and Morlighem, M. (2020). Bathymetric Influences on Antarctic Ice‐Shelf Melt Rates. Journal of Geophysical Research: Oceans, 125(11), e2020JC016370. doi: https://doi.org/10.1029/2020JC016370.