
PI-DeepONets with Zero Coordinate Shift (ZCS)

Zero Coordinate Shift (ZCS) is a low-level technique for maximizing the memory and time efficiency of training physics-informed DeepONets (Leng et al., 2023). In this tutorial, we explain how to activate ZCS in an existing DeepXDE script. Typically, ZCS reduces both GPU memory consumption and training wall time by an order of magnitude.

Prerequisite

Your current script can be easily equipped with ZCS if you are using

  • TensorFlow 2.x, PyTorch, or Paddle as the backend (use PyTorch for best performance),
  • dde.data.PDEOperatorCartesianProd as the data class, and
  • dde.nn.DeepONetCartesianProd as the network class.

Usage

Switching to ZCS requires two steps.

Step 1: Replace the classes as shown in the following table

FROM                                   TO
deepxde.data.PDEOperatorCartesianProd  deepxde.zcs.PDEOperatorCartesianProd
deepxde.Model                          deepxde.zcs.Model
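
For example, a minimal sketch of the swap, assuming your script builds the data and model along these lines (the names pde_problem, func_space, eval_pts, num_function, and net are placeholders for objects defined elsewhere in your script):

# Before (standard DeepXDE):
data = dde.data.PDEOperatorCartesianProd(pde_problem, func_space, eval_pts, num_function)
model = dde.Model(data, net)

# After (ZCS) -- the constructors take the same arguments:
data = dde.zcs.PDEOperatorCartesianProd(pde_problem, func_space, eval_pts, num_function)
model = dde.zcs.Model(data, net)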

Step 2: Change the PDE equation(s) to the ZCS format

In DeepXDE, the user function for the PDE equation(s) is declared as

def pde(x, u, v):
    # ...

To use ZCS, we first create a deepxde.zcs.LazyGrad object, passing x and u as the arguments. Derivatives of u w.r.t. x of any order can then be computed by LazyGrad.compute(orders), where the tuple orders specifies the derivative order along each coordinate of x. For example, the Laplace equation ($u_{xx} + u_{yy} = 0$) can be coded as

def pde(x, u, v):
    grad_u = dde.zcs.LazyGrad(x, u)
    du_xx = grad_u.compute((2, 0))
    du_yy = grad_u.compute((0, 2))
    return du_xx + du_yy

Note: deepxde.zcs.LazyGrad is smart enough to avoid re-calculating any lower-order derivatives if a higher-order one has been calculated based on them. For example, in the above function, if you add du_x = grad_u.compute((1, 0)) after du_xx = grad_u.compute((2, 0)), du_x will be returned instantly from a cache inside grad_u without extra computation.
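
As a small illustration of this caching behavior (the returned expression is arbitrary and only meant to exercise the cache):

def pde(x, u, v):
    grad_u = dde.zcs.LazyGrad(x, u)
    du_xx = grad_u.compute((2, 0))  # differentiates twice w.r.t. x; u_x is computed on the way and cached
    du_x = grad_u.compute((1, 0))   # served from the internal cache, no extra autograd pass
    return du_xx + du_x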

These two steps are all you need!

Example 1: Diffusion reaction

In this example, we activate ZCS in the demo of the diffusion-reaction equation. The PDE is $u_t - D u_{xx} + k u^2 - v = 0$, which is implemented in the original script as

def pde(x, u, v):
    D = 0.01
    k = 0.01
    du_t = dde.grad.jacobian(u, x, j=1)
    du_xx = dde.grad.hessian(u, x, j=0)
    return du_t - D * du_xx + k * u ** 2 - v

In the ZCS script, along with replacing the classes as in Step 1, we change the PDE to the following, where each order tuple gives the derivative orders along the columns of x (here x first, then t):

def pde(x, u, v):
    D = 0.01
    k = 0.01
    grad_u = dde.zcs.LazyGrad(x, u)
    du_t = grad_u.compute((0, 1))
    du_xx = grad_u.compute((2, 0))
    return du_t - D * du_xx + k * u ** 2 - v
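
For context, here is a minimal sketch of how the rest of the script can wire the ZCS classes together; the hyperparameters below are illustrative (loosely following the original demo), and pde_problem stands for the dde.data.TimePDE instance defined there:

import numpy as np
import deepxde as dde

# Function space and sensor locations for the branch input
func_space = dde.data.GRF(length_scale=0.2)
eval_pts = np.linspace(0, 1, num=50)[:, None]

# ZCS data and model classes replace their standard counterparts (Step 1)
data = dde.zcs.PDEOperatorCartesianProd(
    pde_problem, func_space, eval_pts, num_function=100, batch_size=32
)
net = dde.nn.DeepONetCartesianProd(
    [50, 128, 128, 128], [2, 128, 128, 128], "tanh", "Glorot normal"
)
model = dde.zcs.Model(data, net)
model.compile("adam", lr=0.0005)
losshistory, train_state = model.train(iterations=10000)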

The GPU memory and wall time we measured on an NVIDIA V100 (with CUDA 12.2) are reported below. For these measurements, we increased the number of points in the domain from 200 to 4000, as 200 is likely insufficient for real applications. Time is measured over 1000 iterations.

BACKEND     METHOD     GPU / MB   TIME / s
PyTorch     Aligned    5779       186
PyTorch     Unaligned  5873       117
PyTorch     ZCS        655        11
TensorFlow  Aligned    9205       73 (with jit)
TensorFlow  Unaligned  11694      70 (with jit)
TensorFlow  ZCS        591        35 (no jit)
Paddle      Aligned    5805       197
Paddle      Unaligned  6923       385
Paddle      ZCS        1353       15

ZCS with JIT compilation is on our TODO list.

Example 2: Stokes flow

The Problem

In this example, we use a PI-DeepONet to solve the Stokes system for a fluid. The domain is a 2D square filled with liquid, whose lid moves horizontally at a prescribed, variable speed. The full equations and boundary conditions are

$$\begin{aligned}
\mu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) - \frac{\partial p}{\partial x} &= 0, \quad x\in(0,1),\ y\in(0,1);\\
\mu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right) - \frac{\partial p}{\partial y} &= 0, \quad x\in(0,1),\ y\in(0,1);\\
\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} &= 0, \quad x\in(0,1),\ y\in(0,1);\\
u(x,1)=u_1(x),\quad v(x,1) &= 0, \quad x\in(0,1);\\
u(x,0)=v(x,0)=p(x,0) &= 0, \quad x\in(0,1);\\
u(0,y)=v(0,y) &= 0, \quad y\in(0,1);\\
u(1,y)=v(1,y) &= 0, \quad y\in(0,1).
\end{aligned}$$

We attempt to learn the operator mapping from $u_1(x)$ to $\{u, v, p\}(x, y)$, with $u_1(x)$ sampled from a Gaussian process. The true solution for validation is computed using FreeFEM++ following this tutorial.
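
For reference, input functions like $u_1(x)$ can be drawn from DeepXDE's Gaussian random field utility; the length scale and resolution below are illustrative assumptions, not values taken from the script:

import numpy as np
import deepxde as dde

# Sample 10 random lid-speed profiles u1(x) on [0, 1]
func_space = dde.data.GRF(length_scale=0.2)
features = func_space.random(10)
xs = np.linspace(0, 1, num=100)[:, None]
u1_samples = func_space.eval_batch(features, xs)  # shape: (10, 100)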

PDE implementation

Without ZCS, the script with aligned points implements the PDE as

def pde(xy, uvp, aux):
    mu = 0.01
    # first order
    du_x = dde.grad.jacobian(uvp, xy, i=0, j=0)
    dv_y = dde.grad.jacobian(uvp, xy, i=1, j=1)
    dp_x = dde.grad.jacobian(uvp, xy, i=2, j=0)
    dp_y = dde.grad.jacobian(uvp, xy, i=2, j=1)
    # second order
    du_xx = dde.grad.hessian(uvp, xy, component=0, i=0, j=0)
    du_yy = dde.grad.hessian(uvp, xy, component=0, i=1, j=1)
    dv_xx = dde.grad.hessian(uvp, xy, component=1, i=0, j=0)
    dv_yy = dde.grad.hessian(uvp, xy, component=1, i=1, j=1)
    motion_x = mu * (du_xx + du_yy) - dp_x
    motion_y = mu * (dv_xx + dv_yy) - dp_y
    mass = du_x + dv_y
    return motion_x, motion_y, mass

Accordingly, the script with ZCS implements the PDE as

def pde(xy, uvp, aux):
    mu = 0.01
    u, v, p = uvp[..., 0:1], uvp[..., 1:2], uvp[..., 2:3]
    grad_u = dde.zcs.LazyGrad(xy, u)
    grad_v = dde.zcs.LazyGrad(xy, v)
    grad_p = dde.zcs.LazyGrad(xy, p)
    # first order
    du_x = grad_u.compute((1, 0))
    dv_y = grad_v.compute((0, 1))
    dp_x = grad_p.compute((1, 0))
    dp_y = grad_p.compute((0, 1))
    # second order
    du_xx = grad_u.compute((2, 0))
    du_yy = grad_u.compute((0, 2))
    dv_xx = grad_v.compute((2, 0))
    dv_yy = grad_v.compute((0, 2))
    motion_x = mu * (du_xx + du_yy) - dp_x
    motion_y = mu * (dv_xx + dv_yy) - dp_y
    mass = du_x + dv_y
    return motion_x, motion_y, mass

Both implementations should be self-explanatory.

Results

After 50,000 iterations of training, the relative errors for both velocity and pressure should converge to around 10%. The following figure shows the true and the predicted solutions for $u_1(x) = x(1-x)$. Note that ZCS does not affect the accuracy of the resulting model -- it only speeds up training while saving GPU memory. You may want to decrease the number of iterations for a quicker run.

[Figure: true and predicted solutions for $u_1(x) = x(1-x)$]

The memory and time measurements on an NVIDIA A100 (80 GB, CUDA 12.2) are given below. Note that the wall time is measured over 100 iterations.

BACKEND     METHOD   GPU / MB   TIME / s
PyTorch     Aligned  70630      431
PyTorch     ZCS      4067       17
TensorFlow  Aligned  Failed     Failed
TensorFlow  ZCS      8632       81

Aligned failed with TensorFlow (v2.15.0) because graph compilation with @tf.function (with jit_compile either on or off) got stuck on both machines we tested on. If you manage to run it successfully, please report the results in an issue.