# Physics-Aware Recurrent Convolutions

<a href="https://colab.research.google.com/github/stephenbaek/padl-yonsei/blob/master/labs/05_parc_harmonic_oscillator.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a> <br/>

While the physics-aware deep learning methods we've seen so far, including PINN and neural operators, provide a mathematically more rigorous way of fitting neural networks to dynamics data, they are also known to be biased towards low-frequency, "relatively linear" dynamics. For problems that involve strongly nonlinear behavior or stiffness, such as high-frequency features, sharp gradients, fast transients, or discontinuities, physics-aware recurrent convolutions (PARC) might be a more reliable choice. In this session, we revisit the problem of modeling harmonic oscillators with neural networks, this time with a higher frequency, and compare PINN, neural operators, and PARC on this stiffer problem.

### Recap on (Damped) Harmonic Oscillators

Here, we brush up again on the basic theories of harmonic oscillators.

In classical mechanics, a harmonic oscillator is a system in which a mass experiences a restoring force proportional to its displacement from equilibrium. If we let $x$ be the displacement of the mass $m$ from the equilibrium $x=0$, then by Hooke's law the restoring force $F_r$ has the following form:

$$F_r=-kx.$$

Also, from Newton's second law of motion, $F=m\frac{\partial^2 x}{\partial t^2}$, where $\frac{\partial^2 x}{\partial t^2}$ is the second-order derivative of $x$ with respect to time $t$ (the acceleration), we have the following relationship:

$$m\frac{\partial^2 x}{\partial t^2} + kx = 0.$$

In an ideal harmonic oscillator as above, the mass oscillates indefinitely with constant amplitude and frequency. However, real-world oscillators are usually damped: they experience a resistive force that opposes their motion, so they gradually lose energy and their oscillations decrease in amplitude over time. Such a damping force, say $F_d$, can be modeled as being proportional to the velocity $\frac{\partial x}{\partial t}$ of the mass, or $F_d=-c\frac{\partial x}{\partial t}$, where $c$ is called the damping coefficient.

Newton's second law for damped harmonic oscillators then becomes:

$$F_\text{total} = F_r + F_d = -kx-c\frac{\partial x}{\partial t} = m\frac{\partial^2 x}{\partial t^2},$$

which can be rewritten into a more intuitive form:

$$\frac{\partial^2 x}{\partial t^2} + 2\zeta\omega_0\frac{\partial x}{\partial t} + \omega_0^2 x = 0,$$

where $\omega_0:=\sqrt{\frac{k}{m}}$ is the undamped angular frequency of the oscillator and $\zeta:=\frac{c}{2\sqrt{mk}}$ is called the damping ratio.

The damping ratio critically determines the behavior of the system: whether it returns to equilibrium slowly without oscillating (overdamping; $\zeta>1$), oscillates with a gradually decreasing amplitude (underdamping; $\zeta<1$), or returns to equilibrium as quickly as possible without oscillating (critical damping; $\zeta=1$).
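
As a quick illustration, the definitions of $\omega_0$ and $\zeta$ can be turned into a small helper that reports the damping regime. This is only a sketch: the function names `damping_parameters` and `damping_regime`, the tolerance, and the values of $m$, $c$, and $k$ are illustrative assumptions, not part of the lab code.

```python
import math

def damping_parameters(m, c, k):
    """Return (omega0, zeta) for mass m, damping coefficient c, and stiffness k."""
    omega0 = math.sqrt(k / m)              # undamped angular frequency
    zeta = c / (2.0 * math.sqrt(m * k))    # damping ratio
    return omega0, zeta

def damping_regime(zeta, tol=1e-12):
    """Classify the oscillator by its damping ratio."""
    if zeta < 0:
        raise ValueError("the damping ratio must be non-negative")
    if abs(zeta - 1.0) <= tol:
        return "critically damped"
    return "underdamped" if zeta < 1.0 else "overdamped"

# Hypothetical physical constants, chosen only for illustration
m, c, k = 1.0, 2.4, 1600.0
omega0, zeta = damping_parameters(m, c, k)
print(omega0, zeta, damping_regime(zeta))
```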

Oftentimes, it is more convenient to use the state space representation of a damped harmonic oscillator, expressing the system's dynamics as a set of first-order differential equations that describe its state at any given time. The state vector, typically denoted by $\mathbf{x}(t)$, includes position $x$ and velocity $\dot{x}:=\frac{\partial x}{\partial t}$, which together fully describe the system's current condition:

$$\mathbf{x}(t) = \begin{bmatrix} x(t) \\ \dot{x}(t) \end{bmatrix}.$$

The original second-order differential equation of the harmonic oscillator above then becomes a system of first-order differential equations:

$$\frac{d}{dt}\mathbf{x}(t)=\begin{bmatrix} 0 & 1 \\ -\omega_0^2 & -2\zeta\omega_0\end{bmatrix}\mathbf{x}(t),$$

where the matrix
$$A:=\begin{bmatrix} 0 & 1 \\ -\omega_0^2 & -2\zeta\omega_0\end{bmatrix}$$
is called the system matrix.
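
The eigenvalues of the system matrix summarize the dynamics: their real part gives the decay rate and their imaginary part the oscillation frequency. The sketch below builds $A$ row by row from the second-order equation, i.e. $\ddot{x} = -\omega_0^2 x - 2\zeta\omega_0\dot{x}$, and compares its eigenvalues with $-\zeta\omega_0 \pm i\sqrt{1-\zeta^2}\,\omega_0$. The helper `eig2x2` and the parameter values are assumptions made for illustration.

```python
import cmath
import math

def eig2x2(A):
    """Eigenvalues of a 2x2 matrix, computed from its trace and determinant."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

omega0, zeta = 40.0, 0.03  # illustrative values

# First row:  d(x)/dt     = x_dot
# Second row: d(x_dot)/dt = -omega0^2 * x - 2*zeta*omega0 * x_dot
A = [[0.0, 1.0],
     [-omega0**2, -2.0 * zeta * omega0]]

lam_plus, lam_minus = eig2x2(A)
decay_rate = -lam_plus.real             # compare with zeta * omega0
damped_frequency = abs(lam_plus.imag)   # compare with sqrt(1 - zeta^2) * omega0

print(lam_plus, lam_minus)
print(decay_rate, zeta * omega0)
print(damped_frequency, math.sqrt(1.0 - zeta**2) * omega0)
```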

Finally, in the underdamped case where $\zeta < 1$, the analytical solution for the position of a damped harmonic oscillator can be expressed as a damped sinusoidal oscillation (the velocity component of the state follows by differentiating with respect to time):

$$x(t) = ae^{-\zeta\omega_0 t}\sin(\sqrt{1-\zeta^2}\omega_0t + \varphi),$$

where the amplitude $a$ and phase $\varphi$ are coefficients determined from the initial conditions.
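
For completeness, here is a short derivation of those coefficients for the position component. The symbols $x_0$ and $v_0$ for the initial position and velocity are notation introduced here for convenience. Evaluating the solution and its time derivative at $t=0$ gives

$$x_0 = a\sin\varphi, \qquad v_0 = -\zeta\omega_0\, a\sin\varphi + \sqrt{1-\zeta^2}\,\omega_0\, a\cos\varphi.$$

Substituting the first relation into the second yields $a\cos\varphi = \frac{v_0 + \zeta\omega_0 x_0}{\sqrt{1-\zeta^2}\,\omega_0}$, and dividing $a\sin\varphi$ by $a\cos\varphi$ gives

$$\tan\varphi = \frac{\sqrt{1-\zeta^2}\,\omega_0\, x_0}{v_0 + \zeta\omega_0 x_0}, \qquad a = \frac{x_0}{\sin\varphi} = \sqrt{x_0^2 + \left(\frac{v_0 + \zeta\omega_0 x_0}{\sqrt{1-\zeta^2}\,\omega_0}\right)^2}.$$

When $v_0 + \zeta\omega_0 x_0 = 0$, the phase is $\varphi = \pi/2$. These expressions require $\zeta < 1$, since the damped frequency $\sqrt{1-\zeta^2}\,\omega_0$ appears in a denominator.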

### Data Generation

Building upon the theoretical background above, let's generate data for training and validating neural networks. Most of the code below is reused from the previous sessions, except that we now use a higher angular frequency ($\omega_0$) and a lower damping ratio ($\zeta$).

In [None]:
import matplotlib.pyplot as plt
import os
import numpy as np
from scipy import integrate

import torch
import torch.nn as nn
import torch.nn.functional as F

from PIL import Image
import IPython.display

device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dtype = torch.float32

def save_progress(save_dir, prefix, i):
    file = os.path.join(save_dir, prefix + "_%.6i.png"%(i+1))
    plt.savefig(file, bbox_inches='tight', pad_inches=0.1, dpi=100, facecolor="white")
    return file

def make_gif(imagefiles, output_path, fps=20):
    imgs = [Image.open(file) for file in imagefiles]
    imgs[0].save(fp=output_path, format='GIF', append_images=imgs[1:], save_all=True, duration=int(1000/fps), loop=0)  # loop=0 repeats forever

def reset_parameters(model):
    for layer in model.children():
        if hasattr(layer, 'reset_parameters'):
            layer.reset_parameters()

In [None]:
def determineCoefficients(X0, zeta, omega0):
    ''' Determine amplitude a and phase phi.

    Inputs:
        X0: torch.tensor of size (2,) containing initial position and velocity.
        zeta: damping ratio (must be < 1, i.e., underdamped).
        omega0: undamped angular frequency.

    Returns:
        Amplitude a and phase phi that match the initial condition X0.
    '''
    # The sinusoidal solution degenerates at zeta = 1 (the damped frequency becomes 0)
    assert zeta < 1, "zeta must be underdamped (zeta < 1)"

    omega_d = torch.sqrt(torch.tensor(1-zeta**2, dtype=dtype))*omega0  # damped frequency
    a_sin = X0[0]                                 # a*sin(phi) = x(0)
    a_cos = (X0[1] + zeta*omega0*X0[0])/omega_d   # a*cos(phi), from dx/dt at t=0
    # atan2 covers a_cos = 0 (phi = pi/2) and X0[0] = 0 without dividing by zero
    phi = torch.atan2(a_sin, a_cos)
    a = torch.sqrt(a_sin**2 + a_cos**2)

    return a, phi

# Conventional numerical solver for comparison
def ode(X, t, omega0, zeta):
    """
    X = [x, dx]: State vector
    omega0 = sqrt(k/m): Undamped angular frequency of the oscillator.
    zeta = 0.5*c/sqrt(m*k): Damping ratio.
    """
    x, dx = X
    ddx = -2*zeta*omega0*dx - (omega0**2)*x
    return [dx, ddx]


# Initial condition and system parameters
X0 = torch.tensor([1, 0], dtype=dtype)   # initial position and velocity
omega0 = 40                              # angular frequency (higher than in the previous sessions)
zeta = 0.03                              # damping ratio (underdamped)

# Determine coefficients
a, phi = determineCoefficients(X0, zeta, omega0)

# Create ground truth data
Nt = 1024    # number of time steps
Tmax = 1
t = torch.linspace(0, Tmax, Nt, dtype=dtype)

# Compute the analytic solution (ground truth) for the state x(t) and the velocity dx/dt
ezot = torch.exp(-zeta*omega0*t)
sqrt_omega = torch.sqrt(torch.tensor(1-zeta**2, dtype=dtype))*omega0
position = a*ezot*torch.sin(sqrt_omega*t + phi)
velocity = -a*zeta*omega0*ezot*torch.sin(sqrt_omega*t + phi) + a*ezot*sqrt_omega*torch.cos(sqrt_omega*t + phi)
GT = torch.stack([position, velocity]).T

# Numerical Solution - ODE Solver (SciPy works on NumPy arrays, so convert the tensors explicitly)
X = integrate.odeint(ode, X0.numpy(), t.numpy(), args = (omega0, zeta))

plt.figure(figsize=(16,4))
plt.subplot(1,2,1)
plt.plot(t, GT[:,0], color='black', label = "Ground Truth")
plt.plot(t, X[:,0], color='crimson', linestyle='dashed', label = "ODE Solver")
plt.grid()
plt.xlabel("t")
plt.ylabel("x")
plt.title('Position')
plt.legend()
plt.subplot(1,2,2)
plt.plot(t, GT[:,1], color='black', label = "Ground Truth")
plt.plot(t, X[:,1], color='crimson', linestyle='dashed', label = "ODE Solver")
plt.grid()
plt.xlabel("t")
plt.ylabel("dx/dt")
plt.title('Velocity')
plt.legend()

In [None]:
Tmax_sample = 0.5       # maximum time for training sample
sample_stride = 8       # stride between samples

Nt_sample = int(Nt*Tmax_sample/Tmax)

t_sample = torch.unsqueeze(t[0:Nt_sample:sample_stride], -1)
GT_sample = GT[0:Nt_sample:sample_stride,:]

training_data = torch.hstack((t_sample, GT_sample))  # columns: t, x, dx/dt
training_data[:,1] += torch.randn(len(training_data))*0.01  # Emulate noisy observations of x
training_data[:,2] += torch.randn(len(training_data))*0.01  # ... and of dx/dt

plt.figure(figsize=(16,4))
plt.subplot(1,2,1)
plt.plot(t, GT[:,0], color='black', label = "Ground Truth")
plt.plot(t, X[:,0], color='crimson', linestyle='dashed', label = "ODE Solver")
plt.scatter(training_data[:,0], training_data[:,1], color='orange', label='Training Data')
plt.grid()
plt.xlabel("t")
plt.ylabel("x")
plt.title('Position')
plt.legend()
plt.subplot(1,2,2)
plt.plot(t, GT[:,1], color='black', label = "Ground Truth")
plt.plot(t, X[:,1], color='crimson', linestyle='dashed', label = "ODE Solver")
plt.scatter(training_data[:,0], training_data[:,2], color='orange', label='Training Data')
plt.grid()
plt.xlabel("t")
plt.ylabel("dx/dt")
plt.title('Velocity')
plt.legend()
plt.show()

### PINN

Let's start with PINN. A few sessions ago, we saw that PINN worked quite well in modeling harmonic oscillators with a lower frequency and stronger damping. This time, as we just saw above, the system is stiffer, with reduced damping and a higher oscillation frequency. Let's see how PINN behaves on this data.
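
To get a feel for how demanding this setup is, the short calculation below counts the oscillation cycles in the time window and how many training samples and collocation points fall within one cycle. It is a back-of-the-envelope sketch that uses the values from the data generation cells above ($\omega_0=40$, $\zeta=0.03$, `Nt = 1024`, `Tmax = 1`, `sample_stride = 8`) and `N_COLLOCATION_POINTS = 60` from the PINN training cell; the variable names are otherwise illustrative.

```python
import math

omega0, zeta = 40.0, 0.03
Nt, Tmax = 1024, 1.0
sample_stride = 8
n_collocation = 60   # N_COLLOCATION_POINTS in the PINN training cell

omega_d = omega0 * math.sqrt(1.0 - zeta**2)   # damped angular frequency
period = 2.0 * math.pi / omega_d              # duration of one oscillation cycle
cycles = Tmax / period                        # cycles within [0, Tmax]

dt = Tmax / (Nt - 1)                          # spacing of torch.linspace(0, Tmax, Nt)
samples_per_period = period / (sample_stride * dt)

collocation_spacing = Tmax / (n_collocation - 1)
collocation_per_period = period / collocation_spacing

print(f"cycles in [0, Tmax]:           {cycles:.2f}")
print(f"training samples per cycle:    {samples_per_period:.2f}")
print(f"collocation points per cycle:  {collocation_per_period:.2f}")
```

With these settings the signal completes a little over six cycles in $[0, 1]$, with roughly twenty training samples but fewer than ten collocation points per cycle.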

In [None]:
class Backbone(nn.Module):
    def __init__(self, dtype=dtype):
        super().__init__()

        self.fc1 = nn.Linear(1, 32, dtype=dtype)  # input dim = 1 (t)
        self.fc2 = nn.Linear(32, 32, dtype=dtype)  # hidden dims = 32, 32, 32
        self.fc3 = nn.Linear(32, 32, dtype=dtype)  #
        self.out = nn.Linear(32, 1, dtype=dtype)  # output dim = 1 (x)

        self.dtype = dtype

    def forward(self, x):
        x = self.fc1(x)
        # x = nn.SiLU()(x)
        x = nn.Tanh()(x)
        x = self.fc2(x)
        # x = nn.SiLU()(x)
        x = nn.Tanh()(x)
        x = self.fc3(x)
        # x = nn.SiLU()(x)
        x = nn.Tanh()(x)
        return self.out(x)
    
model = Backbone().to(device)

In [None]:
reset_parameters(model)
optimizer = torch.optim.Adam(model.parameters(),lr=1e-4)
# lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5000, gamma=0.5)
files = []

save_dir = 'results/harmonic_stiff/pinn'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
    
MAX_ITER = 200000
N_COLLOCATION_POINTS = 60
Trange=len(training_data)
# Trange=1
input = training_data[:Trange,:1].clone().to(device)    # t
output = training_data[:Trange,1:2].clone().to(device)  # x
collocation_t = torch.linspace(0,Tmax,N_COLLOCATION_POINTS)
collocation_pts = torch.unsqueeze(collocation_t, -1).clone().requires_grad_(True).to(device)
viz_t = torch.unsqueeze(t,axis=-1).clone().requires_grad_(True).to(device)
for iter in range(MAX_ITER):
    optimizer.zero_grad()
    prediction = model(input)
    data_loss = torch.mean((output-prediction)**2)
    
    prediction_colloc = model(collocation_pts)
    dx  = torch.autograd.grad(prediction_colloc, collocation_pts, torch.ones_like(prediction_colloc), create_graph=True)[0]
    ddx  = torch.autograd.grad(dx, collocation_pts, torch.ones_like(dx), create_graph=True)[0]
    residual = ddx + 2*zeta*omega0*dx + (omega0**2)*prediction_colloc
    physics_loss = torch.mean(residual**2)
    loss = 10000*data_loss + (1e-4)*physics_loss
    # loss = data_loss

    loss.backward()
    optimizer.step()
    # lr_scheduler.step()

    print(f"{iter+1}/{MAX_ITER} - loss: {loss.detach().cpu().numpy():.5e}, physics_1: {physics_loss.detach().cpu().numpy():.5e}", end='\r')
    
    # plot the result as training progresses
    if (iter+1) % 1000 == 0: 
        prediction = model(viz_t)
        dx  = torch.autograd.grad(prediction, viz_t, torch.ones_like(prediction))[0]

        plt.figure(figsize=(16,4))
        plt.subplot(1,2,1)
        plt.plot(t, GT[:,0], color='black', label = "Ground Truth")
        # plt.plot(t, X[:,0], color='crimson', linestyle='dashed', label = "ODE Solver")
        plt.plot(t, prediction[:,0].detach().cpu(), color='deepskyblue', label = "PINN")
        plt.scatter(collocation_t, np.zeros_like(collocation_t), color='olive', label = "Collocation Points", s=10)
        plt.scatter(training_data[:Trange,0], training_data[:Trange,1], color='orange', label='Training Data')
        plt.grid()
        plt.xlabel("t")
        plt.ylabel("x")
        plt.title(f'Position (Iteration={iter+1})')
        plt.legend()
        plt.subplot(1,2,2)
        plt.plot(t, GT[:,1], color='black', label = "Ground Truth")
        # plt.plot(t, X[:,1], color='crimson', linestyle='dashed', label = "ODE Solver")
        plt.plot(t, dx.detach().cpu(), color='deepskyblue', label = "PINN")
        plt.scatter(collocation_t, np.zeros_like(collocation_t), color='olive', label = "Collocation Points", s=10)
        plt.scatter(training_data[:Trange,0], training_data[:Trange,2], color='orange', label='Training Data')
        plt.grid()
        plt.xlabel("t")
        plt.ylabel("dx/dt")
        plt.title('Velocity')
        plt.legend()

        files.append(save_progress(save_dir, 'pinn', iter))
    
        if (iter+1) % 10000 == 0: plt.show()
        else: plt.close("all")

In [None]:
make_gif(files, "results/harmonic_stiff/pinn_stiff.gif")
IPython.display.Image(filename="results/harmonic_stiff/pinn_stiff.gif") 

### Neural Operator

In [None]:
class SpectralConv1d(nn.Module):
    def __init__(self, in_channels, out_channels, n_freqs):
        super(SpectralConv1d, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.n_freqs = n_freqs  # Number of Fourier frequencies to be kept

        self.scale = 1 / (in_channels * out_channels)
        self.weights = nn.Parameter(self.scale * torch.rand(in_channels, out_channels, self.n_freqs, dtype=torch.cfloat))


    def forward(self, x):
        # Fourier Transform
        x_ft = torch.fft.rfft(x) # [batch, channels, signal_length] -> [batch, channels, signal_length//2 + 1]
        
        # Weighted sum (neural network operation)
        out_ft = torch.zeros(x.shape[0], self.out_channels, x.size(-1)//2 + 1,  device=x.device, dtype=torch.cfloat)
        out_ft[:, :, :self.n_freqs] = torch.einsum("bix,iox->box", x_ft[:, :, :self.n_freqs], self.weights)

        # Inverse Fourier Transform
        x = torch.fft.irfft(out_ft, n=x.size(-1))
        return x
    
class FNO1d(nn.Module):
    def __init__(self, width):
        super(FNO1d, self).__init__()
        self.width = width
        self.fc0 = nn.Linear(2, self.width) # Inputs: time series (x, dx/dt) for the first few time steps

        n_freq = 8
        self.conv0 = SpectralConv1d(self.width, self.width, n_freq)
        self.conv1 = SpectralConv1d(self.width, self.width, n_freq)
        self.conv2 = SpectralConv1d(self.width, self.width, n_freq)
        self.conv3 = SpectralConv1d(self.width, self.width, n_freq)

        self.skip0 = nn.Conv1d(self.width, self.width, 1)
        self.skip1 = nn.Conv1d(self.width, self.width, 1)
        self.skip2 = nn.Conv1d(self.width, self.width, 1)
        self.skip3 = nn.Conv1d(self.width, self.width, 1)

        self.fc1 = nn.Linear(self.width, 128)
        self.fc2 = nn.Linear(128, 2)  # Outputs: time series (x, dx/dt) for the following time steps

    def forward(self, x):
        # Step 1: map to a larger dimensional feature space
        x = self.fc0(x)         # [Batch, Nx, C] -> [Batch, Nx, Width]
        x = x.permute(0, 2, 1)  # [Batch, Nx, Width] -> [Batch, Width, Nx]

        # Step 2: Integral operators u' = (W + K)(u).
        x = self.skip0(x) + self.conv0(x)
        # x = nn.Tanh()(x)
        x = nn.SiLU()(x)

        x = self.skip1(x) + self.conv1(x)
        # x = nn.Tanh()(x)
        x = nn.SiLU()(x)

        x = self.skip2(x) + self.conv2(x)
        # x = nn.Tanh()(x)
        x = nn.SiLU()(x)

        x = self.skip3(x) + self.conv3(x)
        # x = nn.Tanh()(x)
        x = nn.SiLU()(x)

        # Step 3: project from feature space to output space
        x = x.permute(0, 2, 1)  # [Batch, Width, Nx] -> [Batch, Nx, Width]
        x = self.fc1(x)         # [Batch, Nx, Width] -> [Batch, Nx, 128]
        x = F.relu(x)
        x = self.fc2(x)         # [Batch, Nx, 128] -> [Batch, Nx, 2]
        
        # The output has two channels (x, dx/dt), so this squeeze leaves the shape unchanged
        x = x.squeeze(-1)       # no-op here: [Batch, Nx, 2] stays [Batch, Nx, 2]
        
        return x
    
model = FNO1d(width=64).to(device)

In [None]:
reset_parameters(model)
optimizer = torch.optim.Adam(model.parameters(),lr=1e-4, weight_decay=1e-5)
# lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.8)

files = []
import os
save_dir = 'results/harmonic_stiff/fno'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
   
MAX_ITER = 20000
fno_window = len(training_data)//2
input = torch.unsqueeze(training_data[0:fno_window,1:].clone(), 0).to(device)    # x, dx/dt @ first 32 frames
output = training_data[fno_window:2*fno_window,1:].clone().to(device)             # x, dx/dt for the next 32 frames
for iter in range(MAX_ITER):
    model.train(True)

    optimizer.zero_grad()
    prediction = model(input)[0]
    loss = torch.mean((output[:,0]-prediction[:,0])**2) + 0.0001*torch.mean((output[:,1]-prediction[:,1])**2)
    # loss = torch.mean((output-prediction)**2)

    loss.backward()
    optimizer.step()
    # lr_scheduler.step()

    print(f"{iter+1}/{MAX_ITER} - loss: {loss.detach().cpu().numpy():.5e}", end='\r')
    
    # roll out the prediction
    if (iter+1) % 100 == 0: 
        # viz_t = torch.unsqueeze(t,axis=-1).clone().requires_grad_(True)
        viz_t = training_data[0:fno_window,0:1].clone()
        sequence = input.clone()
        for i in range(3):
            model.train(False)
            prediction = model(sequence[:,-fno_window:,:])
            sequence = torch.cat([sequence, prediction], dim=1)
            viz_t = torch.cat([viz_t, training_data[0:fno_window,0:1] + viz_t[-1,0] + viz_t[1,0]], dim=0)

        plt.figure(figsize=(16,4))
        plt.subplot(1,2,1)
        plt.plot(t, GT[:,0], color='black', label = "Ground Truth")
        # plt.plot(t, X[:,0], color='crimson', linestyle='dashed', label = "ODE Solver")
        plt.plot(viz_t, sequence[0,:,0].detach().cpu(), color='deepskyblue', label = "FNO")
        plt.scatter(training_data[:fno_window*2,0], training_data[:fno_window*2,1], color='orange', label='Training Data')
        plt.grid()
        plt.xlabel("t")
        plt.ylabel("x")
        plt.title(f'Position (Iteration={iter+1})')
        plt.legend()
        plt.subplot(1,2,2)
        plt.plot(t, GT[:,1], color='black', label = "Ground Truth")
        # plt.plot(t, X[:,1], color='crimson', linestyle='dashed', label = "ODE Solver")
        plt.plot(viz_t, sequence[0,:,1].detach().cpu(), color='deepskyblue', label = "FNO")
        plt.scatter(training_data[:fno_window*2,0], training_data[:fno_window*2,2], color='orange', label='Training Data')
        plt.grid()
        plt.xlabel("t")
        plt.ylabel("dx/dt")
        plt.title('Velocity')
        plt.legend()

        files.append(save_progress(save_dir, 'fno', iter))
    
        if (iter+1) % 1000 == 0: plt.show()
        else: plt.close("all")

In [None]:
make_gif(files, "results/harmonic_stiff/fno_stiff.gif")
IPython.display.Image(filename="results/harmonic_stiff/fno_stiff.gif") 

### PARC
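
Before training, it is worth checking what the integration step itself does at this sampling rate. The sketch below is a side calculation, not part of the original experiment: it applies forward Euler and Heun's method (the second-order scheme left commented out in the training cell) to the *exact* linear dynamics and compares the per-step amplitude factor with the true decay $e^{-\zeta\omega_0 \Delta t}$. The step size assumes the training samples are `sample_stride = 8` apart on `torch.linspace(0, 1, 1024)`.

```python
import cmath
import math

omega0, zeta = 40.0, 0.03
dt = 8 / 1023   # assumed spacing of the training samples

# One eigenvalue of the exact system; its complex conjugate behaves identically
lam = complex(-zeta * omega0, omega0 * math.sqrt(1.0 - zeta**2))
z = lam * dt

# Magnitude by which one step scales the oscillation amplitude
growth_exact = abs(cmath.exp(z))          # true solution: exp(-zeta*omega0*dt)
growth_euler = abs(1 + z)                 # forward Euler: x + dt*f(x)
growth_heun = abs(1 + z + 0.5 * z * z)    # Heun: x + 0.5*dt*(f(x) + f(x + dt*f(x)))

print(f"exact: {growth_exact:.5f}")
print(f"Euler: {growth_euler:.5f}")
print(f"Heun:  {growth_heun:.5f}")
```

For these values, forward Euler on the exact dynamics amplifies the oscillation at every step (factor above 1), whereas Heun's factor stays below 1 and close to the true decay. Since the differentiator is trained through the Euler rollout, it is free to learn an update that compensates for this discretization error, so its output need not coincide with the true time derivative.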

In [None]:
class Differentiator(nn.Module):
    def __init__(self, dtype=torch.float32):
        super().__init__()

        self.fc1 = nn.Linear(2, 32, dtype=dtype)  # input dim = 2 (x, dx)
        self.fc2 = nn.Linear(32, 16, dtype=dtype)  # hidden dims = 32, 16
        self.out = nn.Linear(16, 2, dtype=dtype)  # output dim = 2 (dx, ddx)

        self.dtype = dtype

    def forward(self, x):
        x = self.fc1(x)
        # x = nn.SiLU()(x) 
        x = nn.LeakyReLU()(x) 
        x = self.fc2(x)
        # x = nn.SiLU()(x) 
        x = nn.LeakyReLU()(x) 
        return self.out(x)
    
diff = Differentiator().to(device)

In [None]:
reset_parameters(diff)

optimizer = torch.optim.Adam(diff.parameters(), lr=1e-4)
# lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.8)

files = []
import os
save_dir = 'results/harmonic_stiff/parc'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
    

MAX_ITER = 10000
dt = training_data[1,0].clone().to(device)
X0_device = X0.clone().to(device)
for iter in range(MAX_ITER):
    optimizer.zero_grad()
    x = X0_device
    for i in range(len(training_data)-1):
        # dx0 = diff(x)
        # dx1 = diff(x + dt*dx0)
        # x = x + 0.5*dt*(dx0 + dx1)
        pred_dx = diff(x)
        x = x + pred_dx*dt   # simple forward Euler should work fine for this toy example
        if i == 0:
            loss = (x[0] - training_data[i+1,1])**2 + 0.001*(x[1] - training_data[i+1,2])**2
            # loss = torch.abs(x[0] - training_data[i+1,1]) + torch.abs(x[1] - training_data[i+1,2])
        else:
            loss += (x[0] - training_data[i+1,1])**2 + 0.001*(x[1] - training_data[i+1,2])**2
            # loss += torch.abs(x[0] - training_data[i+1,1]) + torch.abs(x[1] - training_data[i+1,2])
    loss /= len(training_data)-1
    
    loss.backward()
    optimizer.step()
    # lr_scheduler.step()

    print(f"{iter+1}/{MAX_ITER} - loss: {loss.detach().cpu().numpy():.5e}", end='\r')
    
    # plot the result as training progresses
    if (iter+1) % 100 == 0: 
        x = X0_device
        prediction = [[0, x[0].detach().cpu().numpy(), x[1].detach().cpu().numpy()]]
        for i in range(len(training_data)*2):
            pred_dx = diff(x)
            x = x + pred_dx*dt
            # dx0 = diff(x)
            # dx1 = diff(x + dt*dx0)
            # x = x + 0.5*dt*(dx0 + dx1)
            # after step i the state corresponds to time (i+1)*dt
            prediction.append([(i+1)*dt.detach().cpu().numpy(), x[0].detach().cpu().numpy(), x[1].detach().cpu().numpy()])
        prediction = np.array(prediction)

        plt.figure(figsize=(16,4))
        plt.subplot(1,2,1)
        plt.plot(t, GT[:,0], color='black', label = "Ground Truth")
        # plt.plot(t, X[:,0], color='crimson', linestyle='dashed', label = "ODE Solver")
        plt.plot(prediction[:,0], prediction[:,1], color='deepskyblue', label = "PARC")
        plt.scatter(training_data[:,0], training_data[:,1], color='orange', label='Training Data')
        plt.grid()
        plt.xlabel("t")
        plt.ylabel("x")
        plt.title(f'Position (Iteration={iter+1})')
        plt.legend()
        plt.subplot(1,2,2)
        plt.plot(t, GT[:,1], color='black',     label = "Ground Truth")
        # plt.plot(t, X[:,1], color='crimson', linestyle='dashed', label = "ODE Solver")
        plt.plot(prediction[:,0], prediction[:,2], color='deepskyblue', label = "PARC")
        plt.scatter(training_data[:,0], training_data[:,2], color='orange', label='Training Data')
        plt.grid()
        plt.xlabel("t")
        plt.ylabel("dx/dt")
        plt.title('Velocity')
        plt.legend()
        
        files.append(save_progress(save_dir, 'parc', iter))
    
        if (iter+1) % 500 == 0: plt.show()
        else: plt.close("all")

In [None]:
make_gif(files, "results/harmonic_stiff/parc_stiff.gif")
IPython.display.Image(filename="results/harmonic_stiff/parc_stiff.gif") 