## Setup

In [None]:
%%javascript
MathJax.Hub.Config({
      TeX: { equationNumbers: { autoNumber: "AMS" } }
    });MathJax.Hub.Queue(
  ["resetEquationNumbers", MathJax.InputJax.TeX],
  ["PreProcess", MathJax.Hub],
  ["Reprocess", MathJax.Hub]
);

In [None]:
from math import exp
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

# Introduction

This notebook demonstrates how to use the code in the `nnpde1.py` module. The module solves a 1st-order partial differential equation (PDE) with Dirichlet boundary conditions specified on all boundaries, using a neural network with a single hidden layer.

This work is based on the paper ["Artificial Neural Networks for Solving Ordinary and Partial Differential Equations", by Lagaris et al, *IEEE Transactions on Neural Networks, Volume 9, No. 5*, September 1998](http://ieeexplore.ieee.org/document/712178/). Note that the notation used in this notebook and the associated Python code differs slightly from that used in the Lagaris paper.

## The algorithm

Consider an $m$-dimensional space containing vectors $\vec x = (x_1,x_2,...,x_m)$. Any 1st-order PDE for the scalar function $\psi(\vec x)$ can be written in the form:

\begin{equation}
G(\vec x,\psi,\vec \nabla \psi) = 0
\end{equation}

The problem is to find a suitable solution to the PDE using a neural network.

The network is trained using a set of $n$ training points $\vec x_i$ ($1 \leq i \leq n$). This work assumes that the vectors $\vec x_i$ have been scaled so that each component $x_{ij} \in [0,1]$, $1 \leq j \leq m$. The training points need not be evenly-spaced. Note that only the independent variable vectors $\vec x_i$  of the training points are needed - the estimated value of the solution at the training points is obtained using a trial solution. For a 1st-order PDE with all Dirichlet boundary conditions, the trial solution has the form:

\begin{equation}
\psi_t(\vec x_i,\vec p) = A(\vec x_i) + P(\vec x_i) N(\vec x_i,\vec p)
\end{equation}

or:

\begin{equation}
\psi_{ti} = A_i + P_i N_i
\end{equation}

where $\psi_{ti}$ is the value of the trial solution at the current training point $\vec x_i$, $A_i$ is a function which yields the boundary conditions on each boundary, $P_i$ is a function which forces the product $P_i N_i$ to vanish at the boundaries, and $N_i$ is the floating-point output from an unspecified neural network with network parameters $\vec p$. Note that this trial solution satisfies the boundary conditions by construction - at any boundary, the second term vanishes, leaving $\psi_t(\vec x_i)=A(\vec x_i)$, which is just the specified boundary condition on that boundary.
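
As a concrete illustration, the following minimal sketch builds a trial solution for the one-dimensional case with a single Dirichlet condition at $x=0$. The choices $A(x)=A$ and $P(x)=x$ follow the form Lagaris et al. use for 1st-order ODEs; the function name `trial_solution` and the random stand-in for the network output are hypothetical and are not taken from the module.

```python
import numpy as np

def trial_solution(x, N, A):
    """Trial solution psi_t = A + P(x) * N, with the assumed choice P(x) = x."""
    return A + x * N

# Stand-in for the network output: any values will do for this check.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 5)
N = rng.normal(size=x.size)

psi_t = trial_solution(x, N, A=2.0)

# At the boundary x = 0 the second term vanishes, so psi_t equals A there
# regardless of what the network outputs.
print(psi_t[0])
```

Because the boundary value is fixed by construction, training only has to shape the solution in the interior of the domain.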

Training is done in a set of *epochs*. A training epoch consists of presenting the neural network with each of the $n$ training points $\vec x_i$, one at a time. For each input point $\vec x_i$, the network output $N_i$ is computed. Once all $n$ points have been presented, the epoch is complete, and the error function $E$ is computed. The problem definition provides an analytical form for the error function, as the sum of squared errors (SSE) for each of the training points:

\begin{equation}
E = \sum_{i=1}^{n} \left( G(\vec x_i,\psi_{ti}, \vec \nabla \psi_{ti}) \right)^2 =
\sum_{i=1}^{n} G_i^2
\end{equation}

Once this error function is computed, the parameters in the neural network are adjusted to reduce the error. Eventually, a minimum of $E$ is attained, and the resulting final form of $\psi_t(\vec x)$ is used as the solution to the original PDE.
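
The error function can be sketched in a few lines. The example below borrows the one-dimensional ODE from the worked example later in this notebook, $dy/dx = x(1-y)$, and writes it as a residual $G$. The helper names `G` and `sse` are hypothetical; the module may organize this computation differently.

```python
import numpy as np

def G(x, psi, dpsi_dx):
    """Residual of the example ODE dpsi/dx = x * (1 - psi), written as G = 0."""
    return dpsi_dx - x * (1 - psi)

def sse(x, psi, dpsi_dx):
    """Sum of squared residuals E over all training points."""
    return np.sum(G(x, psi, dpsi_dx)**2)

x = np.linspace(0, 1, 10)

# The exact solution 1 + exp(-x^2/2) gives a residual of (numerically) zero.
psi_exact = 1 + np.exp(-x**2 / 2)
dpsi_exact = -x * np.exp(-x**2 / 2)
E_exact = sse(x, psi_exact, dpsi_exact)

# A perturbed candidate no longer satisfies the ODE, so E becomes positive.
E_perturbed = sse(x, psi_exact + 0.1 * x, dpsi_exact + 0.1)
print(E_exact, E_perturbed)
```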

## Computing the network output $N_i$

The neural network used in this work has a simple structure. One input node for each component of $\vec x_i$ is used (for a total of $m$ input nodes) to provide the training data. Each input node is fully-connected to each of a set of $H$ hidden nodes, each using a sigmoid transfer function. Each hidden node is connected to the single output node, which uses a linear transfer function with a weight for the signal from each hidden node.

During each step of a training epoch, the input to the network is just the training point $\vec x_i$. Each input node $j$ receives one component $x_{ij}$, and emits that value as output. These outputs are sent to each of the $H$ hidden nodes. At each hidden node $k$, the input values $x_{ij}$ are combined and scaled by the equation:

\begin{equation}
z_{ik} = \sum_{j=1}^m w_{jk} x_{ij} + u_k
\end{equation}

where  $w_{jk}$ is the weight for input $x_{ij}$ at hidden node $k$, and $u_k$ is the bias at hidden node $k$. This combined value is then used as the input to a sigmoidal transfer function:

\begin{equation}
\sigma_{ik} = \sigma(z_{ik})
\end{equation}

where:

\begin{equation}
\sigma(z) = \frac {1}{1+e^{-z}}
\end{equation}

A plot of this transfer function and its first several derivatives is provided below.

In [None]:
def sigma(z):
    return 1 / (1 + exp(-z))

def dsigma_dz(z):
    return exp(-z) / (1 + exp(-z))**2

def d2sigma_dz2(z):
    return (
        2 * exp(-2 * z) / (1 + exp(-z))**3 - exp(-z) / (1 + exp(-z))**2
    )

def d3sigma_dz3(z):
    return (
        6 * exp(-3 * z) / (1 + exp(-z))**4
        - 6 * exp(-2 * z) / (1 + exp(-z))**3
        + exp(-z) / (1 + exp(-z))**2
    )

z = np.arange(-5, 5, 0.01)
n = len(z)
s = np.zeros(n)
ds_dz = np.zeros(n)
d2s_dz2 = np.zeros(n)
d3s_dz3 = np.zeros(n)
for i in range(n):
    s[i] = sigma(z[i])
    ds_dz[i] = dsigma_dz(z[i])
    d2s_dz2[i] = d2sigma_dz2(z[i])
    d3s_dz3[i] = d3sigma_dz3(z[i])
# Raw strings keep the LaTeX backslashes from being read as escape sequences.
plt.plot(z, s, label = r"$\sigma(z)$")
plt.plot(z, ds_dz, label = r"$d\sigma/dz$")
plt.plot(z, d2s_dz2, label = r"$d^2\sigma/dz^2$")
plt.plot(z, d3s_dz3, label = r"$d^3\sigma/dz^3$");
plt.xlabel("z")
plt.ylabel(r"$\sigma(z)$ or derivative")
plt.title(r"Figure 1: The $\sigma$-function and its first three derivatives")
plt.legend();
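
The derivatives of the sigmoid can also be written in terms of $\sigma$ itself, which avoids repeated exponentials and works directly on NumPy arrays. The sketch below is an alternative formulation, not the code used by the module; the function name `sigma_derivatives` is hypothetical. It uses the identities $\sigma^{(1)} = \sigma(1-\sigma)$, $\sigma^{(2)} = \sigma^{(1)}(1-2\sigma)$, and $\sigma^{(3)} = \sigma^{(1)}(1-6\sigma+6\sigma^2)$.

```python
import numpy as np

def sigma(z):
    return 1 / (1 + np.exp(-z))

def sigma_derivatives(z):
    """Return sigma and its first three derivatives, expressed via sigma itself."""
    s = sigma(z)
    s1 = s * (1 - s)
    s2 = s1 * (1 - 2 * s)
    s3 = s1 * (1 - 6 * s + 6 * s**2)
    return s, s1, s2, s3

z = np.arange(-5, 5, 0.01)
s, s1, s2, s3 = sigma_derivatives(z)

# Cross-check the first derivative against the exponential form.
e = np.exp(-z)
print(np.allclose(s1, e / (1 + e)**2))
```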

Once the $\sigma_{ik}$ are computed, they are all passed to the single output node, where they are processed by a linear transfer function to create the network output for the current input point $\vec x_i$:

\begin{equation}
N_i = \sum_{k=1}^{H}v_k\sigma_{ik}
\end{equation}

where $v_k$ is the weight applied to the output from hidden node $k$ at the output node. Once $N_i$ has been computed, the trial function $\psi_{ti}$ is computed. Next, we need the gradient of the trial function, $\vec \nabla \psi_{ti}$:

\begin{equation}
\vec \nabla \psi_{ti} = \vec \nabla (A_i + P_i N_i)
\end{equation}

Each component $j$ of this gradient may be written as:

\begin{equation}
\frac {\partial \psi_{ti}}{\partial x_{ij}} =
\frac {\partial A_i}{\partial x_{ij}} + \frac {\partial}{\partial x_{ij}} \left( P_i N_i \right) =
\frac {\partial A_i}{\partial x_{ij}} + P_i \frac {\partial N_i}{\partial x_{ij}} + \frac {\partial P_i}{\partial x_{ij}} N_i
\end{equation}

Several other intermediate derivatives are needed to compute the values used in the error function $E$. The derivatives of $A_i$ and $P_i$ with respect to $x_{ij}$ are computed from their known analytical forms, which may vary based on the problem under investigation. The values of the network derivatives $\frac {\partial N_i}{\partial x_{ij}}$ are computed analytically using the known form of the network and its weights and biases.

\begin{equation}
\frac {\partial N_i}{\partial x_{ij}} =
\frac {\partial}{\partial x_{ij}} \sum_{k=1}^{H} v_k \sigma_{ik} =
\sum_{k=1}^{H} v_k \frac {\partial \sigma_{ik}}{\partial x_{ij}} =
\sum_{k=1}^{H} v_k \frac {\partial \sigma_{ik}}{\partial z_{ik}} \frac {\partial z_{ik}}{\partial x_{ij}} = \sum_{k=1}^{H} v_k w_{jk} \sigma_{ik}^{(1)}
\end{equation}

where the derivatives of $\sigma$ are given by:

\begin{equation}
\sigma^{(k)} = \frac {d^k \sigma}{dz^k}
\end{equation}

With the values of $\psi_{ti}$ and $\vec \nabla \psi_{ti}$, we can now compute the values of $G_i$, and then the error function $E$ for the current epoch.
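
The forward pass and the input gradient described above can be sketched with NumPy as follows. The array shapes, the random initialization range, and the function name `network_output` are assumptions made for this illustration; they are not taken from the module. A central finite difference is used as a sanity check on the analytical gradient.

```python
import numpy as np

def sigma(z):
    return 1 / (1 + np.exp(-z))

def network_output(x, w, u, v):
    """N_i and its gradient for x (n, m), w (m, H), u (H,), v (H,)."""
    z = x @ w + u              # z_ik = sum_j w_jk x_ij + u_k
    s = sigma(z)               # sigma_ik
    s1 = s * (1 - s)           # sigma_ik^(1)
    N = s @ v                  # N_i = sum_k v_k sigma_ik
    dN_dx = (s1 * v) @ w.T     # dN_i/dx_ij = sum_k v_k w_jk sigma_ik^(1)
    return N, dN_dx

rng = np.random.default_rng(0)
n, m, H = 4, 2, 3
x = rng.uniform(0, 1, size=(n, m))
w = rng.uniform(-1, 1, size=(m, H))
u = rng.uniform(-1, 1, size=H)
v = rng.uniform(-1, 1, size=H)

N, dN_dx = network_output(x, w, u, v)

# Central finite difference in the first input component, as a sanity check.
h = 1e-6
step = np.zeros(m)
step[0] = h
N_plus, _ = network_output(x + step, w, u, v)
N_minus, _ = network_output(x - step, w, u, v)
fd = (N_plus - N_minus) / (2 * h)
print(np.allclose(dN_dx[:, 0], fd, atol=1e-6))
```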

## Updating the network parameters

The network parameters are the weights and biases of the hidden and output nodes. For an $m$-dimensional input point $\vec x_i$ and a set of $H$ hidden nodes, we have a total of $N_p = (m+2)H$ parameters: a weight for each $x_{ij}$ at each hidden node ($mH$), a bias for each hidden node ($H$), and an output weight for each hidden node ($H$).

Since the objective function to minimize is the error $E$, the value of each network parameter $p_{jk}$ (where $p_{jk}$ represents $v_k$, $u_k$, or $w_{jk}$) is updated using a scaled Newton's method:

\begin{equation}
p_{jk,new}=p_{jk} - \eta \frac {\frac {\partial E}{\partial {p_{jk}}}} {\frac {\partial^2 E}{\partial {p_{jk}^2}}}
\end{equation}

where $\eta$ is the *learning rate* (usually $\eta < 1$). The learning rate is used to reduce the chance of solution instability due to large values of the correction factor in Newton's method. The derivatives of $E$ are computed using the known form of $G_i$, the network, and the network parameters.

This process of parameter updates is repeated until the specified maximum number of training epochs has been applied. Note that the same set of training points is presented during each epoch.
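
The update rule can be illustrated on a toy problem with a single parameter. The quadratic error function below and the helper name `newton_step` are hypothetical choices for this sketch only; in the network, the same update is applied to every weight and bias using the derivatives of $E$ derived in the next section.

```python
def newton_step(p, dE_dp, d2E_dp2, eta):
    """One scaled Newton update for a single parameter."""
    return p - eta * dE_dp / d2E_dp2

# Toy error function E(p) = (p - 3)^2, with its minimum at p = 3.
p = 0.0
eta = 0.5
for epoch in range(50):
    dE_dp = 2 * (p - 3)
    d2E_dp2 = 2.0
    p = newton_step(p, dE_dp, d2E_dp2, eta)
print(p)
```

For a quadratic error, $\eta = 1$ would reach the minimum in a single step; a smaller $\eta$ converges more slowly but damps the large steps that occur when the second derivative is close to zero.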

## Computing the derivatives of $E$

The first partial derivative of $E$ with respect to any network parameter $p_{jk}$ is given by:

\begin{equation}
\frac {\partial E}{\partial p_{jk}} =
\frac {\partial}{\partial p_{jk}} \sum_{i=1}^{n} G_i^2 =
2 \sum_{i=1}^n G_i \frac {\partial G_i}{\partial p_{jk}}
\end{equation}

The second partial derivative of $E$ with respect to any network parameter $p_{jk}$ is given by:

\begin{equation}
\frac {\partial^2 E}{\partial p_{jk}^2} =
\frac {\partial}{\partial p_{jk}} 2 \sum_{i=1}^n G_i \frac {\partial G_i}{\partial p_{jk}} =
2 \sum_{i=1}^n \left[G_i \frac {\partial^2 G_i}{\partial p_{jk}^2} + \left(\frac {\partial G_i}{\partial p_{jk}} \right)^2 \right]
\end{equation}
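
The two expressions above translate directly into array operations. In the sketch below, the residual $G_i(p) = e^{p x_i} - 2$ is a toy function chosen only because its derivatives are simple; it is not one of the notebook's problems, and the helper name `error_derivatives` is hypothetical. Finite differences of $E$ provide a sanity check.

```python
import numpy as np

def error_derivatives(G, dG_dp, d2G_dp2):
    """First and second derivatives of E = sum(G_i^2) for one parameter."""
    dE_dp = 2 * np.sum(G * dG_dp)
    d2E_dp2 = 2 * np.sum(G * d2G_dp2 + dG_dp**2)
    return dE_dp, d2E_dp2

x = np.linspace(0, 1, 10)

def residual(p):
    """Toy residual G_i(p) = exp(p * x_i) - 2."""
    return np.exp(p * x) - 2

def E(p):
    return np.sum(residual(p)**2)

p = 0.7
dE_dp, d2E_dp2 = error_derivatives(
    residual(p), x * np.exp(p * x), x**2 * np.exp(p * x)
)

# Finite-difference estimates of the same quantities, as a sanity check.
h = 1e-4
dE_fd = (E(p + h) - E(p - h)) / (2 * h)
d2E_fd = (E(p + h) - 2 * E(p) + E(p - h)) / h**2
print(abs(dE_dp - dE_fd) < 1e-5, abs(d2E_dp2 - d2E_fd) < 1e-4)
```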

The general forms of the partial derivatives of $G_i$ are:

\begin{equation}
\frac {\partial G_i}{\partial p_{jk}} =
\frac {\partial G_i}{\partial \psi_{ti}} \frac {\partial \psi_{ti}}{\partial p_{jk}} +
\sum_{l=1}^m \frac {\partial G_i}{\partial \left(\frac {\partial \psi_{ti}}{\partial x_{il}}\right)}
\frac {\partial \left(\frac {\partial \psi_{ti}}{\partial x_{il}}\right)}{\partial p_{jk}}
\end{equation}

\begin{equation}
\frac {\partial^2 G_i}{\partial p_{jk}^2} =
\frac {\partial G_i}{\partial \psi_{ti}}
\frac {\partial^2 \psi_{ti}}{\partial p_{jk}^2} +
\frac {\partial^2 G_i}{\partial \psi_{ti}^2} \left(\frac {\partial \psi_{ti}}{\partial p_{jk}}\right)^2 +
\sum_{l=1}^m \left( \frac {\partial G_i}{\partial \left(\frac {\partial \psi_{ti}}{\partial x_{il}}\right)} \frac {\partial^2 \left(\frac {\partial \psi_{ti}}{\partial x_{il}}\right)}{\partial p_{jk}^2} +
\frac {\partial^2 G_i}{\partial \left(\frac {\partial \psi_{ti}}{\partial x_{il}}\right)^2 } \left(\frac {\partial \frac {\partial \psi_{ti}}{\partial x_{il}}}{\partial p_{jk}} \right)^2 \right)
\end{equation}

The partial derivatives of $N_i$ with respect to the network parameters are:

\begin{equation}
\frac {\partial N_i}{\partial p_{jk}} =
\frac {\partial}{\partial p_{jk}} \sum_{l=1}^H v_l \sigma_{il} =
\sum_{l=1}^H \left( v_l \frac {\partial \sigma_{il}}{\partial p_{jk}} + \frac {\partial v_l}{\partial p_{jk}} \sigma_{il} \right) =
\sum_{l=1}^H \left( v_l \sigma_{il}^{(1)} \frac {\partial z_{il}}{\partial p_{jk}} + \frac {\partial v_l}{\partial p_{jk}} \sigma_{il} \right)
\end{equation}

\begin{equation}
\frac {\partial^2 N_i}{\partial p_{jk}^2} =
\sum_{l=1}^H \left(
v_l \sigma_{il}^{(1)} \frac {\partial^2 z_{il}}{\partial p_{jk}^2} +
v_l \sigma_{il}^{(2)} \left( \frac {\partial z_{il}}{\partial p_{jk}} \right)^2 +
2 \frac {\partial v_l}{\partial p_{jk}} \sigma_{il}^{(1)} \frac {\partial z_{il}}{\partial p_{jk}} +
\frac {\partial^2 v_l}{\partial p_{jk}^2} \sigma_{il}
\right)
\end{equation}

We now need the cross-partials of $N_i$ with respect to the input components $x_{ij}$ and the network parameters $p_{jk}$.

\begin{equation}
\frac {\partial^2 N_i}{\partial p_{jk} \partial x_{ij}} =
\frac {\partial}{\partial p_{jk}} \sum_{l=1}^{H} v_l w_{jl} \sigma_{il}^{(1)} =
\sum_{l=1}^{H} \left(
v_l w_{jl} \sigma_{il}^{(2)} \frac {\partial z_{il}}{\partial p_{jk}} +
v_l \frac {\partial w_{jl}}{\partial p_{jk}} \sigma_{il}^{(1)} +
\frac {\partial v_l}{\partial p_{jk}} w_{jl} \sigma_{il}^{(1)}
\right)
\end{equation}

\begin{equation}
\frac {\partial^3 N_i}{\partial p_{jk}^2 \partial x_{ij}} = \sum_{l=1}^H \left [
v_l w_{jl} \sigma_{il}^{(2)} \frac {\partial^2 z_{il}}{\partial p_{jk}^2} +
v_l w_{jl} \sigma_{il}^{(3)} \left ( \frac {\partial z_{il}}{\partial p_{jk}} \right )^2 +
v_l \frac {\partial w_{jl}}{\partial p_{jk}} \sigma_{il}^{(2)} \frac {\partial z_{il}}{\partial p_{jk}} +
\frac {\partial v_l}{\partial p_{jk}} w_{jl} \sigma_{il}^{(2)} \frac {\partial z_{il}}{\partial p_{jk}} + \\
v_l \frac {\partial w_{jl}}{\partial p_{jk}} \sigma_{il}^{(2)} \frac {\partial z_{il}}{\partial p_{jk}} +
v_l \frac {\partial^2 w_{jl}}{\partial p_{jk}^2} \sigma_{il}^{(1)} +
\frac {\partial v_l}{\partial p_{jk}} \frac {\partial w_{jl}}{\partial p_{jk}} \sigma_{il}^{(1)} +
\frac {\partial v_l}{\partial p_{jk}} w_{jl} \sigma_{il}^{(2)} \frac {\partial z_{il}}{\partial p_{jk}} +
\frac {\partial v_l}{\partial p_{jk}} \frac {\partial w_{jl}}{\partial p_{jk}} \sigma_{il}^{(1)} +
\frac {\partial^2 v_l}{\partial p_{jk}^2} w_{jl} \sigma_{il}^{(1)}
\right ] 
\end{equation}

Most of these expressions can now be simplified using the following relations between the network parameters:

\begin{equation}
\frac {\partial v_k}{\partial v_l} = \delta_{kl},
\frac {\partial v_k}{\partial w_*} = \frac {\partial v_k}{\partial u_*} = 0,
\frac {\partial^2 v_k}{\partial p_*^2} = 0
\end{equation}


\begin{equation}
\frac {\partial w_{jk}}{\partial w_{il}} = \delta_{ij} \delta_{kl},
\frac {\partial w_{jk}}{\partial v_*} = \frac {\partial w_{jk}}{\partial u_*} = 0,
\frac {\partial^2 w_{jk}}{\partial p_*^2} = 0
\end{equation}

\begin{equation}
\frac {\partial u_k}{\partial u_l} = \delta_{kl},
\frac {\partial u_k}{\partial v_*} = \frac {\partial u_k}{\partial w_*} = 0,
\frac {\partial^2 u_k}{\partial p_*^2} = 0
\end{equation}

\begin{equation}
\frac {\partial z_{ik}}{\partial v_*} = 0,
\frac {\partial z_{ik}}{\partial w_{jl}} = x_{ij} \delta_{kl},
\frac {\partial z_{ik}}{\partial u_l} = \delta_{kl}
\end{equation}

In particular, for the parameters of hidden node $k$ itself:

\begin{equation}
\frac {\partial z_{ik}}{\partial u_k} = 1
\end{equation}

\begin{equation}
\frac {\partial z_{ik}}{\partial w_{jk}} = x_{ij}
\end{equation}
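
Applying these relations to the first derivatives of $N_i$ gives $\partial N_i / \partial v_k = \sigma_{ik}$, $\partial N_i / \partial u_k = v_k \sigma_{ik}^{(1)}$, and $\partial N_i / \partial w_{jk} = v_k \sigma_{ik}^{(1)} x_{ij}$. The sketch below evaluates these simplified forms and compares them with finite differences. The array shapes and the function names are assumptions made for this illustration, not the layout used by the module.

```python
import numpy as np

def sigma(z):
    return 1 / (1 + np.exp(-z))

def network(x, w, u, v):
    """N_i for x (n, m), w (m, H), u (H,), v (H,)."""
    return sigma(x @ w + u) @ v

def dN_dparams(x, w, u, v):
    """First derivatives of N_i with respect to each v_k, u_k, and w_jk."""
    s = sigma(x @ w + u)
    s1 = s * (1 - s)
    dN_dv = s                                      # (n, H): sigma_ik
    dN_du = s1 * v                                 # (n, H): v_k sigma_ik^(1)
    dN_dw = x[:, :, None] * (s1 * v)[:, None, :]   # (n, m, H): v_k sigma_ik^(1) x_ij
    return dN_dv, dN_du, dN_dw

rng = np.random.default_rng(1)
n, m, H = 5, 2, 3
x = rng.uniform(0, 1, size=(n, m))
w = rng.uniform(-1, 1, size=(m, H))
u = rng.uniform(-1, 1, size=H)
v = rng.uniform(-1, 1, size=H)

dN_dv, dN_du, dN_dw = dN_dparams(x, w, u, v)

# Central finite difference with respect to the single weight w[1, 2].
h = 1e-6
dw = np.zeros_like(w)
dw[1, 2] = h
fd_w = (network(x, w + dw, u, v) - network(x, w - dw, u, v)) / (2 * h)
print(np.allclose(dN_dw[:, 1, 2], fd_w, atol=1e-6))
```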

# Walking through an example problem

We will now walk through a complete problem that illustrates how to use the `nnode1` code to solve a 1st-order ODE initial value problem (IVP).

## Define the ODE to solve

Consider the simple 1st-order ODE, defined on the range $x=[0, 1]$:

\begin{equation}
\frac {dy}{dx} + x y = x
\end{equation}

This can be rearranged into the standard form (1):

\begin{equation}
\frac {dy}{dx} = f(x,y) = x(1-y)
\end{equation}

The analytical solution to this equation is:

\begin{equation}
y(x) = 1 + e^{-x^2/2}
\end{equation}

The analytical solution and its derivative are shown in the figure below.

In [None]:
def ya(x):
    return 1 + exp(-x**2 / 2)

# Define the 1st analytical derivative.
def dya_dx(x):
    return -x * exp(-x**2 / 2)

# Define the original differential equation:
def F(x, y):
    return x * (1 - y)

# Define the 1st y-partial derivative of the differential equation.
def dF_dy(x, y):
    return -x

# Define the 2nd y-partial derivative of the differential equation.
def d2F_dy2(x, y):
    return 0

In [None]:
xmin = 0
xmax = 1
n = 100
dx = (xmax - xmin) / n
#x = np.arange(xmin, xmax, dx) # Give division by zero error
x = np.linspace(xmin,xmax,n)
y = np.zeros(n)
dy_dx = np.zeros(n)
for i in range(n):
    y[i] = ya(x[i])
    dy_dx[i] = dya_dx(x[i])
plt.plot(x, y, label = "$y$")
plt.plot(x, dy_dx, label = "$dy/dx$")
plt.xlabel("x")
#plt.ylabel("$d^ky/dx^k$")
plt.legend()
plt.title("Figure 2: Analytical solution of $dy/dx = x(1-y)$");

## Other required definitions

The code above also defines several derivative functions that the neural network requires to compute its gradients. In addition to the ODE itself (defined as the function `F`), the first two partial derivatives of the ODE function $f(x,y)$ with respect to $y$ are required (defined as the functions `dF_dy` and `d2F_dy2`).
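
Hand-written derivative functions are easy to get wrong, so a quick consistency check can be worthwhile before training. The sketch below redefines the functions with `np.exp` so they accept arrays (an assumption for this illustration; the definitions above use `math.exp` on scalars), then confirms that the analytical solution satisfies the ODE and that `dF_dy` agrees with a finite difference of `F`.

```python
import numpy as np

def ya(x):
    return 1 + np.exp(-x**2 / 2)

def dya_dx(x):
    return -x * np.exp(-x**2 / 2)

def F(x, y):
    return x * (1 - y)

def dF_dy(x, y):
    return -x

x = np.linspace(0, 1, 11)
y = ya(x)

# The analytical solution should satisfy dy/dx = F(x, y) at every point.
ode_ok = np.allclose(dya_dx(x), F(x, y))

# dF/dy should match a central finite difference of F in y.
h = 1e-6
fd = (F(x, y + h) - F(x, y - h)) / (2 * h)
fd_ok = np.allclose(dF_dy(x, y), fd, atol=1e-6)
print(ode_ok, fd_ok)
```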

## Define the boundary conditions

Boundary conditions for the ODE must be set before the solution is attempted. For a 1st-order ODE IVP, only a Dirichlet boundary condition (BC), a specified value of the solution, is possible. In this case, we will always use the value of $y_t(0)$, denoted as $A$. For this problem, we are interested in a solution over the range $x=[0,1]$. The boundary conditions are therefore:

\begin{equation}
\begin{aligned}
x_{min} &= 0 \\
x_{max} &= 1 \\
A &= y(0) = 2
\end{aligned}
\end{equation}


## Create the training data

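For the purposes of this example, an evenly spaced set of training points will be used to train the neural network. Note that `np.linspace` includes both endpoints, so the initial point ($y_t(0)=A$) is part of the training set as generated below. Because the trial solution satisfies the initial condition by construction, the error in $y_t$ at that point is zero, which artificially improves the accuracy measures of the solution. The commented-out `np.arange` line generates a training set that excludes the initial point.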

In [None]:
nt = 10
dx = (xmax - xmin) / nt
xt = np.linspace(xmin,xmax,nt)
#xt = np.arange(xmin, xmax, dx) + dx

Note that repeated runs of the same ODE will usually result in slightly different solutions, due to the random number generator. To ensure repeatable results, seed the random number generator with a fixed value before each run.
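
The effect of seeding can be seen in isolation with the short sketch below. It assumes that the random draws come from NumPy's global random number generator, which is consistent with the `np.random.seed(0)` calls used in the cells of this notebook; the uniform range is an arbitrary choice for illustration.

```python
import numpy as np

np.random.seed(0)
first = np.random.uniform(-1, 1, size=5)

np.random.seed(0)
second = np.random.uniform(-1, 1, size=5)

# Without re-seeding, the next draw continues the sequence and differs.
third = np.random.uniform(-1, 1, size=5)

# Re-seeding with the same value reproduces the same draws.
print(np.array_equal(first, second), np.array_equal(first, third))
```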

## Train the model to solve the ODE

We can now train the network. The call below shows the minimum set of arguments required by the `nnode1()` function. All tunable parameters (learning rate `eta`, hidden layer size `nhid`, number of training epochs `maxepochs`) are given default values (0.01, 10, 1000, respectively). The training function returns the computed values of $y$ and $\frac {dy}{dx}$ at the training points.

In [None]:
from nnode1 import nnode1
A = ya(xmin)
np.random.seed(0)
(yt, dyt_dx) = nnode1(xt, F, dF_dy, d2F_dy2, A)

Plot the results of this training run.

In [None]:
plt.plot(xt, yt, label = "$y$")
plt.plot(xt, dyt_dx, label = "$dy/dx$")
plt.xlabel("x")
plt.ylabel("$d^ky/dx^k$")
plt.legend()
plt.title("Figure 3: Trained solution and derivative");

Plot the error in the estimated solution and derivative.

In [None]:
y = np.zeros(nt)
dy_dx = np.zeros(nt)
for i in range(nt):
    y[i] = ya(xt[i])
    dy_dx[i] = dya_dx(xt[i])
plt.plot(xt, yt - y, label = "$y_t-y$")
plt.plot(xt, dyt_dx - dy_dx, label = "$dy_t/dx - dy/dx$")
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 4: Error in trained solution");

Now try repeating the analysis with a larger number of hidden nodes, and plot the error.

In [None]:
np.random.seed(0)
(yt, dyt_dx) = nnode1(xt, F, dF_dy, d2F_dy2, A, nhid = 20)
y = np.zeros(nt)
dy_dx = np.zeros(nt)
for i in range(nt):
    y[i] = ya(xt[i])
    dy_dx[i] = dya_dx(xt[i])
plt.plot(xt, yt - y, label = "$y_t-y$")
plt.plot(xt, dyt_dx - dy_dx, label = "$dy_t/dx - dy/dx$")
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 5: Error in trained solution (20 hidden nodes)");

Now try repeating the analysis with a slightly larger learning rate, and plot the error.

In [None]:
np.random.seed(0)
(yt, dyt_dx) = nnode1(xt, F, dF_dy, d2F_dy2, A, eta = 0.02)
y = np.zeros(nt)
dy_dx = np.zeros(nt)
for i in range(nt):
    y[i] = ya(xt[i])
    dy_dx[i] = dya_dx(xt[i])
plt.plot(xt, yt - y, label = "$y_t-y$")
plt.plot(xt, dyt_dx - dy_dx, label = "$dy_t/dx - dy/dx$")
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
# A raw string keeps the backslash in \eta from being read as an escape sequence.
plt.title(r"Figure 6: Error in trained solution ($\eta$ = 0.02)");

Now try repeating the analysis with a larger number of training epochs, and plot the error.

In [None]:
np.random.seed(0)
(yt, dyt_dx) = nnode1(xt, F, dF_dy, d2F_dy2, A, maxepochs = 2000)
y = np.zeros(nt)
dy_dx = np.zeros(nt)
for i in range(nt):
    y[i] = ya(xt[i])
    dy_dx[i] = dya_dx(xt[i])
plt.plot(xt, yt - y, label = "$y_t-y$")
plt.plot(xt, dyt_dx - dy_dx, label = "$dy_t/dx - dy/dx$")
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 7: Error in trained solution (2000 epochs)");

## Using an ODE definition module

Rather than entering ODE definitions in this notebook, the required definitions can be entered in a separate Python module, and imported. For example, the previous code is also encapsulated in the module `ode00.py`, and can be imported:

In [None]:
import ode00
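
For reference, a definition module of this kind might look like the sketch below. This is a hypothetical layout, not the actual contents of `ode00.py`: the attribute names (`ymin`, `F`, `dF_dy`, `d2F_dy2`, `ya`, `dya_dx`) are the ones this notebook reads from the module, and the bodies simply mirror the definitions entered earlier in this notebook.

```python
"""Hypothetical layout of an ODE definition module (not the actual ode00.py)."""
from math import exp

# Initial condition y(0), passed to nnode1() in place of A.
ymin = 2

def F(x, y):
    """Right-hand side of the ODE dy/dx = F(x, y)."""
    return x * (1 - y)

def dF_dy(x, y):
    """First partial derivative of F with respect to y."""
    return -x

def d2F_dy2(x, y):
    """Second partial derivative of F with respect to y."""
    return 0

def ya(x):
    """Analytical solution, used only to measure the error."""
    return 1 + exp(-x**2 / 2)

def dya_dx(x):
    """Analytical first derivative, used only to measure the error."""
    return -x * exp(-x**2 / 2)
```

Keeping each problem in its own module means the training call stays the same and only the module name changes.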

We can now run the net using the information in this module.

In [None]:
np.random.seed(0)
(yt, dyt_dx) = nnode1(xt, ode00.F, ode00.dF_dy, ode00.d2F_dy2, ode00.ymin)
y = np.zeros(nt)
dy_dx = np.zeros(nt)
for i in range(nt):
    y[i] = ode00.ya(xt[i])
    dy_dx[i] = ode00.dya_dx(xt[i])
plt.plot(xt, yt - y, label = "$y_t-y$")
plt.plot(xt, dyt_dx - dy_dx, label = "$dy_t/dx - dy/dx$")
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 8: Error in trained solution (using ode00.py)");

Other examples of 1st-order ODEs from the Lagaris paper have been provided in the files `lagaris01.py` and `lagaris02.py`. Run them in the same fashion.

In [None]:
np.random.seed(0)
import lagaris01
(yt, dyt_dx) = nnode1(xt, lagaris01.F, lagaris01.dF_dy,
                      lagaris01.d2F_dy2, lagaris01.ymin, nhid = 40)
y = np.zeros(nt)
dy_dx = np.zeros(nt)
for i in range(nt):
    y[i] = lagaris01.ya(xt[i])
    dy_dx[i] = lagaris01.dya_dx(xt[i])
plt.plot(xt, yt - y, label = "$y_t-y$")
plt.plot(xt, dyt_dx - dy_dx, label = "$dy_t/dx - dy/dx$")
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 9: Error in trained solution (using lagaris01.py)");

In [None]:
np.random.seed(0)
import lagaris02
(yt, dyt_dx) = nnode1(xt, lagaris02.F, lagaris02.dF_dy, lagaris02.d2F_dy2,
                      lagaris02.ymin, nhid = 40)
y = np.zeros(nt)
dy_dx = np.zeros(nt)
for i in range(nt):
    y[i] = lagaris02.ya(xt[i])
    dy_dx[i] = lagaris02.dya_dx(xt[i])
plt.plot(xt, yt - y, label = "$y_t-y$")
plt.plot(xt, dyt_dx - dy_dx, label = "$dy_t/dx - dy/dx$")
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 10: Error in trained solution (using lagaris02.py)");