## Setup

In [None]:
%%javascript
MathJax.Hub.Config({
  TeX: { equationNumbers: { autoNumber: "AMS" } }
});
MathJax.Hub.Queue(
  ["resetEquationNumbers", MathJax.InputJax.TeX],
  ["PreProcess", MathJax.Hub],
  ["Reprocess", MathJax.Hub]
);

In [None]:
from math import exp, cos, sin
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

# Introduction

This notebook demonstrates how to use the code in the `nnode2bvp.py` module. The module allows the user to solve 2nd-order ordinary differential equation (ODE) boundary value problems (BVPs) using a neural network with a single hidden layer.

This work is based on the paper "Artificial Neural Networks for Solving Ordinary and Partial Differential Equations", by Lagaris et al., *IEEE Transactions on Neural Networks, Volume 9, No. 5*, September 1998. Note that the notation used in this notebook and the associated Python code is slightly different from that used in the Lagaris paper.

## The algorithm

Any 2nd-order ODE that can be solved explicitly for its highest derivative can be written in the form:

\begin{equation}
\frac {d^2y}{dx^2} = f\left(x,y,\frac {dy}{dx}\right)
\end{equation}

The problem is to find a suitable solution to the ODE using a neural network.

The network is trained using a set of training points defined on the domain of interest. The training points need not be evenly spaced. Note that only the independent-variable values $x_i$ of the training points are needed: the estimated value of the solution at those points is obtained from a trial solution. A BVP posed on a finite interval can always be scaled and mapped onto the range $[0,1]$, and this code assumes that such a mapping has already been performed. For a 2nd-order ODE BVP, the trial solution has the form:

\begin{equation}
y_t(x_i) = A (1 - x_i) + B x_i + x_i (1 - x_i) N(x_i)
\end{equation}

or:

\begin{equation}
y_{ti} = A (1 - x_i) + B x_i + x_i (1 - x_i) N_i
\end{equation}

where $y_{ti}=y_t(x_i)$ is the value of the trial solution at the current training point $x_i$, $A = y(0)$ and $B = y(1)$ are the boundary values, and $N_i=N(x_i)$ is the single-valued floating-point output of the neural network (its structure is described below). Note that this trial solution satisfies the boundary conditions by construction: at $x=0$, the second and third terms vanish, leaving $y_t(0)=A$, while at $x=1$, the first and third terms vanish, leaving $y_t(1)=B$.
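A quick numerical illustration of this property, as a minimal sketch: `trial_solution` and the stand-in for the network output are hypothetical and chosen only for illustration (they are not taken from `nnode2bvp.py`). Any function can play the role of $N$; the boundary values are unaffected.

```python
import numpy as np

def trial_solution(x, A, B, N):
    """Evaluate y_t(x) = A(1 - x) + B x + x(1 - x) N(x) for an arbitrary callable N."""
    x = np.asarray(x, dtype=float)
    return A * (1 - x) + B * x + x * (1 - x) * N(x)

# Hypothetical stand-in for the network output; its values do not matter at x = 0 or x = 1.
N = lambda x: 3.0 * np.sin(7.0 * x) + 2.0
A, B = 0.0, 0.688938          # boundary values of the example problem below (rounded)
yt = trial_solution(np.array([0.0, 0.5, 1.0]), A, B, N)
print(yt[0], yt[2])           # the boundary values A and B
```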

Training proceeds in *epochs*. A training epoch consists of presenting the neural network with each of the $n$ training points $x_i$, one at a time. For each input value $x_i$, the network output $N_i$ is computed. Once all $n$ points have been presented, the epoch is complete, and the error function $E$ is computed. Since the problem definition (1) provides an analytical form $f\left(x,y,\frac{dy}{dx}\right)$ for the second derivative of the solution $y(x)$, the error function used here is the sum of squared errors (SSE):

\begin{equation}
E = \sum_{i=1}^{n}\left(\frac{\partial^2 y_{ti}}{\partial x_i^2}-f_i\right)^2
\end{equation}

where $\frac {\partial^2 y_{ti}}{\partial x_i^2}$ is the second derivative of the trial function with respect to $x$, evaluated at $x_i$. For an exact solution, this derivative would equal $f_i = f\left(x_i,y_{ti},\frac {dy_{ti}}{dx_i}\right)$, the right-hand side of (1) evaluated with the trial solution and its derivative in place of $y$ and $\frac{dy}{dx}$. Once this error function is computed, the weights and biases of the neural network are adjusted to reduce the error. Training continues until $E$ reaches a minimum (possibly a local one), and the resulting final form of $y_t(x)$ is used as the solution to the original ODE.

## Computing the network output $N_i$

The neural network used in this work has a simple structure. A single input node is used to provide the training data. The input node is fully connected to a set of $H$ hidden nodes. Each hidden node is connected to a single output node, which uses a linear transfer function: a weighted sum of the signals from the hidden nodes.

During each step of a training epoch, the input to the network, and thus the output of the single input node, is just the training point $x_i$. This single output is then sent to each of the $H$ hidden nodes. At each hidden node, the input value $x_i$ is scaled by the equation:

\begin{equation}
z_{ij} = w_jx_i + u_j
\end{equation}

where $z_{ij}$ is the scaled value of $x_i$ at hidden node $j$, $w_j$ is the weight at node $j$, and $u_j$ is the bias at node $j$. This scaled value is then used as the input to a sigmoidal transfer function:

\begin{equation}
\sigma_{ij} = \sigma(z_{ij})
\end{equation}

where:

\begin{equation}
\sigma(z) = \frac {1}{1+e^{-z}}
\end{equation}

A plot of this transfer function and its first three derivatives is provided below.

In [None]:
def sigma(z):
    return 1 / (1 + exp(-z))

def dsigma_dz(z):
    return exp(-z) / (1 + exp(-z))**2

def d2sigma_dz2(z):
    return (
        2 * exp(-2 * z) / (1 + exp(-z))**3 - exp(-z) / (1 + exp(-z))**2
    )

def d3sigma_dz3(z):
    return (
        6 * exp(-3 * z) / (1 + exp(-z))**4
        - 6 * exp(-2 * z) / (1 + exp(-z))**3
        + exp(-z) / (1 + exp(-z))**2
    )

z = np.linspace(-5, 5, 1001)
n = len(z)
s = np.zeros(n)
ds_dz = np.zeros(n)
d2s_dz2 = np.zeros(n)
d3s_dz3 = np.zeros(n)
for i in range(n):
    s[i] = sigma(z[i])
    ds_dz[i] = dsigma_dz(z[i])
    d2s_dz2[i] = d2sigma_dz2(z[i])
    d3s_dz3[i] = d3sigma_dz3(z[i])
# Raw strings keep the LaTeX backslashes from being read as (invalid) escape sequences.
plt.plot(z, s, label=r"$\sigma(z)$")
plt.plot(z, ds_dz, label=r"$d\sigma/dz$")
plt.plot(z, d2s_dz2, label=r"$d^2\sigma/dz^2$")
plt.plot(z, d3s_dz3, label=r"$d^3\sigma/dz^3$")
plt.xlabel("z")
plt.ylabel(r"$\sigma(z)$ or derivative")
plt.title(r"Figure 1: The $\sigma$-function and its first three derivatives")
plt.legend();
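The exponential expressions in the cell above are equivalent to the standard identities $\sigma' = \sigma(1-\sigma)$, $\sigma'' = \sigma'(1-2\sigma)$ and $\sigma''' = \sigma''(1-2\sigma) - 2(\sigma')^2$, which let all three derivatives be computed from a single evaluation of $\sigma$. A vectorized sketch using these identities, checked against the expressions above (the function name `sigma_and_derivatives` is illustrative, not from `nnode2bvp.py`):

```python
import numpy as np

def sigma_and_derivatives(z):
    """Return sigma(z) and its first three derivatives, evaluated elementwise."""
    s0 = 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))
    s1 = s0 * (1.0 - s0)                        # d(sigma)/dz
    s2 = s1 * (1.0 - 2.0 * s0)                  # d^2(sigma)/dz^2
    s3 = s2 * (1.0 - 2.0 * s0) - 2.0 * s1**2    # d^3(sigma)/dz^3
    return s0, s1, s2, s3

# Compare with the explicit exponential forms used in the plotting cell above.
z_check = np.linspace(-5, 5, 1001)
s0, s1, s2, s3 = sigma_and_derivatives(z_check)
e = np.exp(-z_check)
print(np.allclose(s1, e / (1 + e)**2),
      np.allclose(s2, 2 * e**2 / (1 + e)**3 - e / (1 + e)**2),
      np.allclose(s3, 6 * e**3 / (1 + e)**4 - 6 * e**2 / (1 + e)**3 + e / (1 + e)**2))
```

Unlike `math.exp`, `np.exp` operates elementwise on arrays, so no explicit loop over `z` is needed.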

Once the $\sigma_{ij}$ are computed, they are all passed to the single output node, where they are summed by a linear transfer function to create the network output for the current value of $x_i$:

\begin{equation}
N_i = \sum_{j=1}^{H}v_j\sigma_{ij}
\end{equation}

where $v_j$ is the weight applied to hidden node $j$ at the output node. Once $N_i$ has been computed, the trial function $y_{ti}$ is computed from (3). We then compute the derivatives of the trial function, which are needed both to evaluate $f_i$ from (1) (since $f$ may depend on $\frac{dy}{dx}$) and to compute the error:

\begin{equation}
\frac{\partial y_{ti}}{\partial x_i} = B - A + x_i (1 - x_i) \frac {\partial N_i}{\partial x_i} + (1 - 2 x_i) N_i
\end{equation}

\begin{equation}
\frac{\partial^2 y_{ti}}{\partial x_i^2} = x_i (1 - x_i) \frac {\partial^2 N_i}{\partial x_i^2} + 2 (1 - 2 x_i) \frac {\partial N_i}{\partial x_i} - 2 N_i
\end{equation}

The values of the network derivatives $\frac {\partial N_i}{\partial x_i}$ and $\frac {\partial^2 N_i}{\partial x_i^2}$ are computed analytically using the known form of the network and its weights and biases.

\begin{equation}
\frac {\partial N_i}{\partial x_i} =
\frac {\partial}{\partial x_i} \sum_{j=1}^{H} v_j \sigma_{ij} =
\sum_{j=1}^{H} v_j \frac {\partial \sigma_{ij}}{\partial x_i} = \sum_{j=1}^{H} v_j \frac {\partial \sigma_{ij}}{\partial z_{ij}} \frac {\partial z_{ij}}{\partial x_i} = \sum_{j=1}^{H} v_j w_j \sigma_{ij}^{(1)}
\end{equation}

\begin{equation}
\frac {\partial^2 N_i}{\partial x_i^2} =
\frac {\partial}{\partial x_i} \sum_{j=1}^{H} v_j w_j \sigma_{ij}^{(1)} =
\sum_{j=1}^{H} v_j w_j \frac {\partial \sigma_{ij}^{(1)}}{\partial x_i} =
\sum_{j=1}^{H} v_j w_j \frac {\partial \sigma_{ij}^{(1)}}{\partial z_{ij}} \frac {\partial z_{ij}}{\partial x_i} =
\sum_{j=1}^{H} v_j w_j^2 \sigma_{ij}^{(2)}
\end{equation}

where $\sigma_{ij}^{(k)} = \sigma^{(k)}(z_{ij})$, and the derivatives of $\sigma$ are denoted by:

\begin{equation}
\sigma^{(k)} = \frac {d^k \sigma}{dz^k}
\end{equation}

We now have all the values we need to compute the error function $E$ for the current epoch (4).
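The forward computations described so far can be collected into a single vectorized sketch for one epoch: the hidden-node inputs $z_{ij}$, the network output $N_i$ and its first two $x$-derivatives, the trial solution (3) and its derivatives, and the error (4). This is an illustration under our own naming (`forward`, `A_bc`, `B_bc` and `f_example` are hypothetical), not the code in `nnode2bvp.py`. It processes all training points at once rather than one at a time; since each point is computed independently, the values are the same. The example right-hand side is the ODE used later in this notebook.

```python
import numpy as np

def forward(x, w, u, v, A, B, f):
    """Evaluate the network, the trial solution, its derivatives and the SSE error.

    x: training points, shape (n,); w, u, v: hidden-node weights, hidden-node
    biases and output weights, each shape (H,); A, B: boundary values y(0), y(1);
    f: right-hand side f(x, y, dy_dx) of the ODE, accepting NumPy arrays.
    """
    x = np.asarray(x, dtype=float)
    z = np.outer(x, w) + u              # z_ij = w_j x_i + u_j, shape (n, H)
    s = 1.0 / (1.0 + np.exp(-z))        # sigma_ij
    s1 = s * (1.0 - s)                  # sigma^(1)_ij
    s2 = s1 * (1.0 - 2.0 * s)           # sigma^(2)_ij
    N = s @ v                           # N_i = sum_j v_j sigma_ij
    dN = s1 @ (v * w)                   # dN_i/dx_i = sum_j v_j w_j sigma^(1)_ij
    d2N = s2 @ (v * w**2)               # d2N_i/dx_i^2 = sum_j v_j w_j^2 sigma^(2)_ij
    yt = A * (1 - x) + B * x + x * (1 - x) * N
    dyt = B - A + x * (1 - x) * dN + (1 - 2 * x) * N
    d2yt = x * (1 - x) * d2N + 2 * (1 - 2 * x) * dN - 2 * N
    fi = f(x, yt, dyt)                  # ODE right-hand side at the trial solution
    E = np.sum((d2yt - fi)**2)          # sum of squared errors
    return yt, dyt, d2yt, E

# Example: random parameters for H = 10 hidden nodes.
rng = np.random.default_rng(0)
w, u, v = rng.uniform(-1, 1, size=(3, 10))
xs = np.linspace(0, 1, 10)
A_bc, B_bc = 0.0, np.exp(-0.2) * np.sin(1.0)
f_example = lambda x, y, dy: -0.2 * np.exp(-x / 5) * np.cos(x) - 0.2 * dy - y
yt, dyt, d2yt, E = forward(xs, w, u, v, A_bc, B_bc, f_example)
print(E)
```

A finite-difference comparison of `dyt` and `d2yt` against `yt` is a cheap way to validate derivative formulas like these.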

## Updating the network parameters

The network parameters are the weights and biases of the hidden nodes and the weights of the output node. For $H$ hidden nodes, there are $N_p = 3H$ parameters in total: a weight $w_j$ and a bias $u_j$ for each hidden node, and an output weight $v_j$ for each hidden node.

Since the objective function to minimize is the error $E$, the value of each network parameter $p_j$ (where $p_j$ represents $v_j$, $u_j$, or $w_j$) is updated using a scaled Newton's method:

\begin{equation}
p_{j,new}=p_j - \eta \frac {\frac {\partial E}{\partial {p_j}}} {\frac {\partial^2 E}{\partial {p_j^2}}}
\end{equation}

where $\eta$ is the *learning rate* (usually $\eta < 1$). The learning rate reduces the chance of instability caused by large values of the numerator, or small values of the denominator, in the update equation above. The derivatives of $E$ are computed from (4) using the known form of the network and its internal parameters.

This process of parameter updates is repeated until convergence is achieved (an error minimum is reached), or until the specified maximum number of training epochs has been applied. Note that the same set of training points is presented during each epoch.
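The update rule can be sketched as follows, assuming the first and second partial derivatives of $E$ for every parameter have already been collected into arrays. Each parameter is updated using only its own derivatives, so this is a diagonal approximation to a full Newton step; `newton_update` is a hypothetical name, not the function used in `nnode2bvp.py`.

```python
import numpy as np

def newton_update(p, dE_dp, d2E_dp2, eta=0.01):
    """Scaled Newton step applied to each parameter independently:
    p_new = p - eta * (dE/dp) / (d2E/dp2).

    Cross-derivatives between different parameters are ignored.
    """
    return p - eta * np.asarray(dE_dp) / np.asarray(d2E_dp2)

# Toy check on E(p) = sum((p - c)**2), where dE/dp = 2(p - c) and d2E/dp2 = 2.
c = np.array([1.0, -2.0, 0.5])
p = np.zeros(3)
for epoch in range(50):
    p = newton_update(p, 2 * (p - c), 2 * np.ones_like(p), eta=0.1)
# Each step removes 10% of the remaining distance to the minimum at p = c.
print(p)
```

With $\eta = 1$ and an exactly quadratic, separable $E$, a single step would land on the minimum; a smaller $\eta$ trades speed for robustness. If $\frac{\partial^2 E}{\partial p_j^2}$ is negative or close to zero, the step can point uphill or become very large, which is the instability the learning rate is intended to reduce.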

## Computing the derivatives of $E$

The first partial derivative of $E$ with respect to any network parameter $p_j$ is given by:

\begin{equation}
\frac {\partial E}{\partial p_j} =
\frac {\partial}{\partial p_j} \sum_{i=1}^{n}\left(\frac{\partial^2 y_{ti}}{\partial x_i^2}-f_i\right)^2 =
2 \sum_{i=1}^n \left(\frac {\partial^2 y_{ti}}{\partial x_i^2} - f_i\right)\left(\frac {\partial^3 y_{ti}}{\partial p_j \partial x_i^2} - \frac {\partial f_i}{\partial p_j}\right)
\end{equation}

This expression requires the first partial derivatives of $\frac {\partial^2 y_{ti}}{\partial x_i^2}$ and $f_i$ with respect to the network parameters.

The second partial derivative of $E$ with respect to any network parameter $p_j$ is given by:

\begin{equation}
\frac {\partial^2 E}{\partial p_j^2} =
\frac {\partial}{\partial p_j} 2 \sum_{i=1}^n \left(\frac {\partial^2 y_{ti}}{\partial x_i^2} - f_i\right)\left(\frac {\partial^3 y_{ti}}{\partial p_j \partial x_i^2} - \frac {\partial f_i}{\partial p_j}\right)=
2 \sum_{i=1}^n \left[\left(\frac {\partial^2 y_{ti}}{\partial x_i^2} - f_i\right)\left(\frac {\partial^4 y_{ti}}{\partial p_j^2 \partial x_i^2} - \frac {\partial^2 f_i}{\partial p_j^2}\right) + \left(\frac {\partial^3 y_{ti}}{\partial p_j \partial x_i^2} - \frac {\partial f_i}{\partial p_j}\right)^2\right]
\end{equation}

This expression requires the second partial derivatives of $\frac {\partial^2 y_{ti}}{\partial x_i^2}$ and $f_i$ with respect to the network parameters.
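Once the per-point residual $r_i = \frac{\partial^2 y_{ti}}{\partial x_i^2} - f_i$ and its first two derivatives with respect to a parameter $p_j$ are known, both derivatives of $E$ reduce to simple sums. A minimal sketch (the function `error_derivatives` is a hypothetical name), checked on a toy residual $r_i(p) = c_i p + d_i p^2$ whose derivatives are known exactly:

```python
import numpy as np

def error_derivatives(r, dr_dp, d2r_dp2):
    """First and second derivatives of E = sum_i r_i**2 with respect to one parameter p.

    r:       residuals r_i = d2yt_i/dx_i^2 - f_i
    dr_dp:   dr_i/dp   = d3yt_i/(dp dx_i^2) - df_i/dp
    d2r_dp2: d2r_i/dp2 = d4yt_i/(dp^2 dx_i^2) - d2f_i/dp^2
    """
    r, dr_dp, d2r_dp2 = (np.asarray(a, dtype=float) for a in (r, dr_dp, d2r_dp2))
    dE = 2.0 * np.sum(r * dr_dp)
    d2E = 2.0 * np.sum(r * d2r_dp2 + dr_dp**2)
    return dE, d2E

# Toy residual r_i(p) = c_i p + d_i p**2 evaluated at p0.
rng = np.random.default_rng(1)
c, d = rng.normal(size=(2, 5))
p0 = 0.3
dE, d2E = error_derivatives(c * p0 + d * p0**2, c + 2 * d * p0, 2 * d)
print(dE, d2E)
```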

The general forms of the partial derivatives of $\frac {\partial^2 y_{ti}}{\partial x_i^2}$ are:

\begin{equation}
\frac {\partial^3 y_{ti}}{\partial p_j \partial x_i^2} =
\frac {\partial}{\partial p_j} \left(x_i (1 - x_i) \frac {\partial^2 N_i}{\partial x_i^2} + 2 (1 - 2 x_i) \frac {\partial N_i}{\partial x_i} - 2 N_i\right) =
x_i (1 - x_i) \frac {\partial^3 N_i}{\partial p_j \partial x_i^2} + 2 (1 - 2 x_i) \frac {\partial^2 N_i}{\partial p_j \partial x_i} - 2 \frac {\partial N_i}{\partial p_j}
\end{equation}

\begin{equation}
\frac {\partial^4 y_{ti}}{\partial p_j^2 \partial x_i^2} =
\frac {\partial}{\partial p_j} \left(x_i (1 - x_i) \frac {\partial^3 N_i}{\partial p_j \partial x_i^2} + 2 (1 - 2 x_i) \frac {\partial^2 N_i}{\partial p_j \partial x_i} - 2 \frac {\partial N_i}{\partial p_j}\right) =
x_i (1 - x_i) \frac {\partial^4 N_i}{\partial p_j^2 \partial x_i^2} + 2 (1 - 2 x_i) \frac {\partial^3 N_i}{\partial p_j^2 \partial x_i} - 2 \frac {\partial^2 N_i}{\partial p_j^2}
\end{equation}

The partial derivatives of $N_i$ with respect to the network parameters are (note that terms $j \neq k$ vanish):

\begin{equation}
\frac {\partial N_i}{\partial p_j} = \frac {\partial}{\partial p_j} \sum_{k=1}^H v_k \sigma_{ik} = v_j \frac {\partial \sigma_{ij}}{\partial p_j} + \frac {\partial v_j}{\partial p_j} \sigma_{ij} = v_j \sigma_{ij}^{(1)}\frac {\partial z_{ij}}{\partial p_j} + \frac {\partial v_j}{\partial p_j}\sigma_{ij}
\end{equation}

\begin{equation}
\frac {\partial^2 N_i}{\partial p_j^2} = \frac {\partial}{\partial p_j} \left(v_j \sigma_{ij}^{(1)}\frac {\partial z_{ij}}{\partial p_j} + \frac {\partial v_j}{\partial p_j}\sigma_{ij}\right) =
v_j \sigma_{ij}^{(2)} \left(\frac {\partial z_{ij}}{\partial p_j}\right)^2 + v_j \sigma_{ij}^{(1)} \frac {\partial^2 z_{ij}}{\partial p_j^2} + 2 \frac {\partial v_j}{\partial p_j} \sigma_{ij}^{(1)} \frac {\partial z_{ij}}{\partial p_j} + \frac {\partial^2 v_j}{\partial p_j^2} \sigma_{ij}
\end{equation}

The partial derivatives of $f_i$ with respect to the network parameters are:

\begin{equation}
\frac{\partial f_i}{\partial p_j} = \frac{\partial f_i}{\partial y_{ti}} \frac{\partial y_{ti}}{\partial p_j}
\end{equation}

\begin{equation}
\frac{\partial^2 f_i}{\partial p_j^2} = \frac{\partial f_i}{\partial y_{ti}} \frac{\partial^2 y_{ti}}{\partial p_j^2} + \frac{\partial^2 f_i}{\partial y_{ti}^2} \left(\frac{\partial y_{ti}}{\partial p_j}\right)^2
\end{equation}

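The definitions of $\frac {\partial f_i}{\partial y_{ti}}$ and $\frac {\partial^2 f_i}{\partial y_{ti}^2}$ are obtained from (1), and from (3), $\frac{\partial y_{ti}}{\partial p_j} = x_i (1 - x_i) \frac{\partial N_i}{\partial p_j}$ and $\frac{\partial^2 y_{ti}}{\partial p_j^2} = x_i (1 - x_i) \frac{\partial^2 N_i}{\partial p_j^2}$. Note that these expressions account only for the dependence of $f$ on $y_{ti}$; when $f$ also depends on $\frac{dy}{dx}$ (as in the example below), the full chain rule contains additional terms involving the partial derivatives of $f$ with respect to $\frac{dy}{dx}$.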

We now need the cross-partials of $N_i$ with respect to $x_i$ and the network parameters $p_j$. Again, the terms $j \neq k$ vanish.

\begin{equation}
\begin{split}
\frac {\partial^2 N_i}{\partial p_j \partial x_i} =
\frac {\partial}{\partial p_j} \left(\sum_{k=1}^{H} v_k w_k \sigma_{ik}^{(1)}\right) =
v_j w_j \sigma_{ij}^{(2)} \frac {\partial z_{ij}}{\partial p_j} + v_j \frac {\partial w_j}{\partial p_j} \sigma_{ij}^{(1)} + \frac {\partial v_j}{\partial p_j} w_j \sigma_{ij}^{(1)}
\end{split}
\end{equation}

\begin{equation}
\begin{split}
\frac {\partial^3 N_i}{\partial p_j^2 \partial x_i} &=
\frac {\partial}{\partial p_j} \left(v_j w_j \sigma_{ij}^{(2)} \frac {\partial z_{ij}}{\partial p_j} + v_j \frac {\partial w_j}{\partial p_j} \sigma_{ij}^{(1)} + \frac {\partial v_j}{\partial p_j} w_j \sigma_{ij}^{(1)}\right) \\
&= v_j w_j \sigma_{ij}^{(2)} \frac {\partial^2 z_{ij}}{\partial p_j^2} + v_j w_j \sigma_{ij}^{(3)} \left(\frac {\partial z_{ij}}{\partial p_j}\right)^2 + v_j \frac {\partial w_j}{\partial p_j} \sigma_{ij}^{(2)} \frac {\partial z_{ij}}{\partial p_j} + \frac {\partial v_j}{\partial p_j} w_j \sigma_{ij}^{(2)} \frac {\partial z_{ij}}{\partial p_j} \\
&\quad + v_j \frac {\partial w_j}{\partial p_j} \sigma_{ij}^{(2)} \frac {\partial z_{ij}}{\partial p_j} + v_j \frac {\partial^2 w_j}{\partial p_j^2} \sigma_{ij}^{(1)} + \frac {\partial v_j}{\partial p_j} \frac {\partial w_j}{\partial p_j} \sigma_{ij}^{(1)} \\
&\quad + \frac {\partial v_j}{\partial p_j} w_j \sigma_{ij}^{(2)} \frac {\partial z_{ij}}{\partial p_j} + \frac {\partial v_j}{\partial p_j} \frac {\partial w_j}{\partial p_j} \sigma_{ij}^{(1)} + \frac {\partial^2 v_j}{\partial p_j^2} w_j \sigma_{ij}^{(1)}
\end{split}
\end{equation}

Most of these expressions can now be simplified using the following relations between the network parameters:

\begin{equation}
\frac {\partial p_j}{\partial p_k} = \delta_{jk}
\end{equation}


\begin{equation}
\frac {\partial^2 p_j}{\partial p_k^2} = 0
\end{equation}

\begin{equation}
\frac {\partial z_{ij}}{\partial v_j} = 0
\end{equation}

\begin{equation}
\frac {\partial z_{ij}}{\partial u_j} = 1
\end{equation}

\begin{equation}
\frac {\partial z_{ij}}{\partial w_j} = x_i
\end{equation}
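Substituting these relations into the general expressions for $\frac{\partial N_i}{\partial p_j}$ and $\frac{\partial^2 N_i}{\partial p_j \partial x_i}$ gives simple per-parameter formulas, for example $\frac{\partial N_i}{\partial w_j} = v_j x_i \sigma_{ij}^{(1)}$ and $\frac{\partial^2 N_i}{\partial w_j \partial x_i} = v_j \sigma_{ij}^{(1)} + v_j w_j x_i \sigma_{ij}^{(2)}$. A sketch of that substitution follows; the functions `network` and `parameter_derivatives` are hypothetical helpers written for this illustration, and the results can be checked against finite differences.

```python
import numpy as np

def network(x, w, u, v):
    """Network output N_i and its x-derivative dN_i/dx_i at each point x_i."""
    z = np.asarray(x, dtype=float)[:, None] * w + u
    s = 1.0 / (1.0 + np.exp(-z))
    return s @ v, (s * (1.0 - s)) @ (v * w)

def parameter_derivatives(x, w, u, v):
    """Derivatives of N_i and dN_i/dx_i w.r.t. the parameters v_j, u_j and w_j.

    Returns two dicts keyed by "v", "u", "w"; entry [i, j] of each (n, H) array is
    the derivative at x_i with respect to the parameter of hidden node j.
    """
    xc = np.asarray(x, dtype=float)[:, None]   # shape (n, 1)
    z = xc * w + u                             # z_ij = w_j x_i + u_j
    s = 1.0 / (1.0 + np.exp(-z))
    s1 = s * (1.0 - s)                         # sigma^(1)_ij
    s2 = s1 * (1.0 - 2.0 * s)                  # sigma^(2)_ij
    dN_dp = {
        "v": s,                                # dz/dv = 0, dv/dv = 1
        "u": v * s1,                           # dz/du = 1
        "w": v * s1 * xc,                      # dz/dw = x_i
    }
    d2N_dp_dx = {
        "v": w * s1,
        "u": v * w * s2,
        "w": v * s1 + v * w * s2 * xc,         # first term from dw/dw = 1, second from dz/dw = x_i
    }
    return dN_dp, d2N_dp_dx

rng = np.random.default_rng(2)
w, u, v = rng.uniform(-1, 1, size=(3, 4))
xs = np.linspace(0, 1, 6)
dN_dp, d2N_dp_dx = parameter_derivatives(xs, w, u, v)
print(dN_dp["w"].shape)  # (6, 4)
```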

# Walking through an example problem

We will now walk through a complete problem that illustrates how to use the `nnode2bvp` code to solve a 2nd-order ODE BVP.

## Define the ODE to solve

Consider the simple 2nd-order ODE, defined on the range [0, 1] (this is Problem 3, equation (29) from Lagaris et al.):

\begin{equation}
\frac {d^2y}{dx^2} + \frac {1}{5} \frac {dy}{dx} + y = -\frac {1}{5} e^{-\frac {x}{5}} \cos(x)
\end{equation}

This can be rearranged into the standard form (1):

\begin{equation}
\frac {d^2y}{dx^2} = f\left(x,y,\frac {dy}{dx}\right) = -\frac {1}{5} e^{-\frac {x}{5}} \cos(x) - \frac {1}{5} \frac {dy}{dx} - y
\end{equation}

The analytical solution to this equation is:

\begin{equation}
y(x) = e^{-\frac {x}{5}} \sin(x)
\end{equation}

The analytical solution and its first two derivatives are shown in the figure below.

In [None]:
# Define the analytical solution.
def ya(x):
    return exp(-x / 5) * sin(x)

# Define the 1st analytical derivative.
def dya_dx(x):
    return 1 / 5 * exp(-x / 5) * (5 * cos(x) - sin(x))

# Define the 2nd analytical derivative.
def d2ya_dx2(x):
    return (
        -2 / 25 * exp(-x / 5) * (5 * cos(x) + 12 * sin(x))
    )

# Define the original differential equation:
def F(x, y, dy_dx):
    return -1 / 5 * exp(-x / 5) * cos(x) - 1 / 5 * dy_dx - y

# Define the 1st y-partial derivative of the differential equation.
def dF_dy(x, y, dy_dx):
    return -1

# Define the 2nd y-partial derivative of the differential equation.
def d2F_dy2(x, y, dy_dx):
    return 0
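Before training, it can be worth confirming that the analytical solution actually satisfies the ODE as coded. The sketch below re-creates the functions above in vectorized NumPy form (the `_v` suffix is a naming choice for this illustration, to avoid replacing the notebook's definitions) and evaluates the residual $\frac{d^2y}{dx^2} - f\left(x,y,\frac{dy}{dx}\right)$ on a grid; it should be at the level of floating-point round-off.

```python
import numpy as np

def ya_v(x):
    return np.exp(-x / 5) * np.sin(x)

def dya_dx_v(x):
    return 1 / 5 * np.exp(-x / 5) * (5 * np.cos(x) - np.sin(x))

def d2ya_dx2_v(x):
    return -2 / 25 * np.exp(-x / 5) * (5 * np.cos(x) + 12 * np.sin(x))

def F_v(x, y, dy_dx):
    return -1 / 5 * np.exp(-x / 5) * np.cos(x) - 1 / 5 * dy_dx - y

xs = np.linspace(0, 1, 101)
# Residual of the ODE with the analytical solution substituted; should be ~0.
residual = d2ya_dx2_v(xs) - F_v(xs, ya_v(xs), dya_dx_v(xs))
print(np.max(np.abs(residual)))
```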

In [None]:
xmin = 0
xmax = 1
n = 100
x = np.linspace(xmin, xmax, num = n)
y = np.zeros(n)
dy_dx = np.zeros(n)
d2y_dx2 = np.zeros(n)
for i in range(n):
    y[i] = ya(x[i])
    dy_dx[i] = dya_dx(x[i])
    d2y_dx2[i] = d2ya_dx2(x[i])
plt.xlabel('x')
plt.ylabel('$d^ky/dx^k$')
plt.plot(x, y, label = 'y(x)')
plt.plot(x, dy_dx, label = "$dy/dx$")
plt.plot(x, d2y_dx2, label = "$d^2y/dx^2$")
plt.legend()
plt.title("Figure 2: Lagaris problem 3 analytical solution and derivatives");

## Other required definitions

The code above also defines several functions that are required to compute the gradients during training. In addition to the ODE itself (the function `F`), the first two partial derivatives of the ODE function $f\left(x,y,\frac{dy}{dx}\right)$ with respect to $y$ are required (the functions `dF_dy` and `d2F_dy2`). The analytical solution and its derivatives (`ya`, `dya_dx` and `d2ya_dx2`) are not passed to the solver; they are used only to check the results.
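When supplying your own ODE, a quick way to catch mistakes in `dF_dy` and `d2F_dy2` is to compare them with central finite differences of `F`. A minimal sketch; the helper `check_y_derivatives`, its step size and its tolerance are illustrative choices, not part of `nnode2bvp.py`:

```python
from math import exp, cos

def check_y_derivatives(F, dF_dy, d2F_dy2, x, y, dy_dx, h=1e-4, tol=1e-5):
    """Compare analytical y-derivatives of F(x, y, dy_dx) with central finite differences.

    Returns True if both supplied derivatives match the finite-difference
    estimates to within tol at the point (x, y, dy_dx).
    """
    f_plus, f0, f_minus = F(x, y + h, dy_dx), F(x, y, dy_dx), F(x, y - h, dy_dx)
    fd1 = (f_plus - f_minus) / (2 * h)
    fd2 = (f_plus - 2 * f0 + f_minus) / h**2
    return (abs(fd1 - dF_dy(x, y, dy_dx)) < tol
            and abs(fd2 - d2F_dy2(x, y, dy_dx)) < tol)

# The example ODE (repeated here so the sketch is self-contained).
def F_example(x, y, dy_dx):
    return -1 / 5 * exp(-x / 5) * cos(x) - 1 / 5 * dy_dx - y

ok = check_y_derivatives(F_example, lambda x, y, dy_dx: -1, lambda x, y, dy_dx: 0,
                         x=0.3, y=0.5, dy_dx=0.8)
print(ok)  # True
```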

## Define the boundary conditions

Boundary conditions for the ODE must be set before the solution is attempted. This code supports only Dirichlet boundary conditions (BCs) for a 2nd-order ODE BVP: the values $A = y_t(0)$ and $B = y_t(1)$ are specified, and the trial solution (3) satisfies them by construction. For this problem, we are interested in a solution over the range $[0,1]$. The boundary conditions are therefore:

\begin{equation}
\begin{split}
x_{min} &= 0 \\
x_{max} &= 1 \\
A &= y(0) = 0 \\
B &= y(1) = e^{-1/5} \sin(1) = 0.688938...
\end{split}
\end{equation}
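This problem is already posed on $[0,1]$, so no mapping is needed. For a problem posed on a general interval $[a,b]$, one way to perform the mapping the code assumes is the change of variable $\xi = (x-a)/(b-a)$. Since $\frac{dy}{dx} = \frac{1}{b-a}\frac{dy}{d\xi}$ and $\frac{d^2y}{dx^2} = \frac{1}{(b-a)^2}\frac{d^2y}{d\xi^2}$, the ODE becomes $\frac{d^2y}{d\xi^2} = (b-a)^2 f\left(a + (b-a)\xi, y, \frac{1}{b-a}\frac{dy}{d\xi}\right)$, with the same boundary values at $\xi = 0$ and $\xi = 1$. The wrapper below is a hypothetical helper, not part of `nnode2bvp.py`; note that `dF_dy` and `d2F_dy2` would need the same $(b-a)^2$ factor.

```python
import numpy as np

def map_to_unit_interval(f, a, b):
    """Rewrite y'' = f(x, y, y') on [a, b] as an ODE in xi = (x - a) / (b - a) on [0, 1].

    Returns g(xi, y, dy_dxi) such that d2y/dxi2 = g(xi, y, dy_dxi).
    """
    L = b - a
    def g(xi, y, dy_dxi):
        x = a + L * xi                   # original independent variable
        dy_dx = dy_dxi / L               # chain rule: dy/dx = (1/L) dy/dxi
        return L**2 * f(x, y, dy_dx)     # d2y/dxi2 = L**2 d2y/dx2
    return g

# Example: y'' = -y on [0, pi/2], with exact solution y = sin(x).
a, b = 0.0, np.pi / 2
g = map_to_unit_interval(lambda x, y, dy_dx: -y, a, b)
xi = np.linspace(0, 1, 11)
L = b - a
Y = np.sin(a + L * xi)                   # the solution expressed in terms of xi
print(np.allclose(g(xi, Y, L * np.cos(a + L * xi)), -L**2 * Y))  # True
```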


## Create the training data

For the purposes of this example, an evenly spaced set of training points will be used to train the neural network.

In [None]:
nt = 10
xt = np.linspace(xmin, xmax, num = nt)

Note that repeated runs on the same ODE will usually produce slightly different solutions, due to the use of random numbers during training. To ensure repeatable results, seed NumPy's random number generator with a fixed value before each run, as the cells below do with `np.random.seed(0)`.

## Train the model to solve the ODE

We can now train the network. The call below shows the minimum arguments required to call the `nnode2bvp()` function. All tunable parameters (learning rate `eta`, number of hidden nodes `nhid`, and maximum number of training epochs `maxepochs`) have default values (0.01, 10, and 1000, respectively). The training function returns the computed values of $y$, $\frac {dy}{dx}$ and $\frac {d^2y}{dx^2}$ at the training points.

In [None]:
from nnode2bvp import nnode2bvp
A = ya(xmin)
B = ya(xmax)
np.random.seed(0)
(yt, dyt_dx, d2yt_dx2) = nnode2bvp(xt, F, dF_dy, d2F_dy2, A, B)

Plot the results of this training run.

In [None]:
plt.plot(xt, yt, label = '$y_t$')
plt.plot(xt, dyt_dx, label = '$dy_t/dx$')
plt.plot(xt, d2yt_dx2, label = '$d^2y_t/dx^2$')
plt.xlabel('$x_t$')
plt.ylabel('$y_t$ or derivative')
plt.legend()
plt.title("Figure 3: Trained solution and derivatives");

Plot the error in the estimated solution and derivatives.

In [None]:
y = np.zeros(nt)
dy_dx = np.zeros(nt)
d2y_dx2 = np.zeros(nt)
for i in range(nt):
    y[i] = ya(xt[i])
    dy_dx[i] = dya_dx(xt[i])
    d2y_dx2[i] = d2ya_dx2(xt[i])
plt.plot(xt, yt - y, label = '$y_t-y$')
plt.plot(xt, dyt_dx - dy_dx, label = '$dy_t/dx-dy/dx$')
plt.plot(xt, d2yt_dx2 - d2y_dx2, label = '$d^2y_t/dx^2-d^2y/dx^2$')
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 4: Error in trained solution");

Now try repeating the analysis with a larger number of hidden nodes, and plot the error.

In [None]:
np.random.seed(0)
(yt, dyt_dx, d2yt_dx2) = nnode2bvp(xt, F, dF_dy, d2F_dy2, A, B, nhid = 20)
y = np.zeros(nt)
dy_dx = np.zeros(nt)
d2y_dx2 = np.zeros(nt)
for i in range(nt):
    y[i] = ya(xt[i])
    dy_dx[i] = dya_dx(xt[i])
    d2y_dx2[i] = d2ya_dx2(xt[i])
plt.plot(xt, yt - y, label = '$y_t-y$')
plt.plot(xt, dyt_dx - dy_dx, label = '$dy_t/dx-dy/dx$')
plt.plot(xt, d2yt_dx2 - d2y_dx2, label = '$d^2y_t/dx^2-d^2y/dx^2$')
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 5: Error in trained solution (20 hidden nodes)");

Now try repeating the analysis with a slightly larger learning rate, and plot the error.

In [None]:
np.random.seed(0)
(yt, dyt_dx, d2yt_dx2) = nnode2bvp(xt, F, dF_dy, d2F_dy2, A, B,
                                nhid = 20, eta = 0.011)
y = np.zeros(nt)
dy_dx = np.zeros(nt)
d2y_dx2 = np.zeros(nt)
for i in range(nt):
    y[i] = ya(xt[i])
    dy_dx[i] = dya_dx(xt[i])
    d2y_dx2[i] = d2ya_dx2(xt[i])
plt.plot(xt, yt - y, label = '$y_t-y$')
plt.plot(xt, dyt_dx - dy_dx, label = '$dy_t/dx-dy/dx$')
plt.plot(xt, d2yt_dx2 - d2y_dx2, label = '$d^2y_t/dx^2-d^2y/dx^2$')
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title(r"Figure 6: Error in trained solution (20 hidden nodes, $\eta$ = 0.011)");

Now try repeating the analysis with a larger number of training epochs, and plot the error.

In [None]:
np.random.seed(0)
(yt, dyt_dx, d2yt_dx2) = nnode2bvp(xt, F, dF_dy, d2F_dy2, A, B, nhid = 20, maxepochs = 2000)
y = np.zeros(nt)
dy_dx = np.zeros(nt)
d2y_dx2 = np.zeros(nt)
for i in range(nt):
    y[i] = ya(xt[i])
    dy_dx[i] = dya_dx(xt[i])
    d2y_dx2[i] = d2ya_dx2(xt[i])
plt.plot(xt, yt - y, label = '$y_t-y$')
plt.plot(xt, dyt_dx - dy_dx, label = '$dy_t/dx-dy/dx$')
plt.plot(xt, d2yt_dx2 - d2y_dx2, label = '$d^2y_t/dx^2-d^2y/dx^2$')
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 7: Error in trained solution (20 hidden nodes, 2000 epochs)");
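Comparing runs by eye from the plots is approximate. A small helper that reduces each error curve to its maximum absolute value and its root-mean-square (RMS) value makes the comparison quantitative. `error_summary` is a hypothetical name, not part of `nnode2bvp.py`, and the arrays in the example are synthetic, not training results:

```python
import numpy as np

def error_summary(yt, dyt_dx, d2yt_dx2, y, dy_dx, d2y_dx2):
    """Max-absolute and RMS errors of a trained solution and its first two derivatives."""
    summary = {}
    pairs = (("y", yt, y), ("dy/dx", dyt_dx, dy_dx), ("d2y/dx2", d2yt_dx2, d2y_dx2))
    for name, estimate, exact in pairs:
        err = np.asarray(estimate, dtype=float) - np.asarray(exact, dtype=float)
        summary[name] = (np.max(np.abs(err)), np.sqrt(np.mean(err**2)))
    return summary

# Synthetic example arrays (not results from nnode2bvp).
demo = error_summary([0.0, 1.1], [1.0, 0.9], [0.0, -0.2],
                     [0.0, 1.0], [1.0, 1.0], [0.0, 0.0])
for name, (emax, erms) in demo.items():
    print(f"{name}: max |err| = {emax:.3e}, RMS = {erms:.3e}")
```

In this notebook, the same call could be made after any of the training runs above as `error_summary(yt, dyt_dx, d2yt_dx2, y, dy_dx, d2y_dx2)`, with the analytical arrays evaluated at `xt`.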

## Using an ODE definition module

Rather than entering ODE definitions in this notebook, the required definitions can be placed in a separate Python module and imported. For example, the ODE definitions above are also encapsulated in the module `lagaris03bvp.py`, which can be imported:

In [None]:
import lagaris03bvp
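The contents of `lagaris03bvp.py` are not reproduced in this notebook. Judging only from the attributes used in the next cell (`F`, `dF_dy`, `d2F_dy2`, `ymin`, `ymax`, `ya` and `dya_dx`), such a module might look roughly like the sketch below; the actual file may differ. Here `ymin` and `ymax` are assumed to hold the boundary values $A$ and $B$, since they are passed where $A$ and $B$ were passed earlier.

```python
"""Hypothetical sketch of an ODE definition module such as lagaris03bvp.py.

Lagaris et al. problem 3 (BVP form), on x in [0, 1]:
    d2y/dx2 = f(x, y, dy/dx) = -1/5 exp(-x/5) cos(x) - 1/5 dy/dx - y
"""
from math import exp, cos, sin

# Boundary values (assumed meaning of ymin/ymax): A = y(0), B = y(1).
ymin = 0.0
ymax = exp(-1 / 5) * sin(1)

def F(x, y, dy_dx):
    """Right-hand side of the ODE."""
    return -1 / 5 * exp(-x / 5) * cos(x) - 1 / 5 * dy_dx - y

def dF_dy(x, y, dy_dx):
    """1st partial derivative of F with respect to y."""
    return -1

def d2F_dy2(x, y, dy_dx):
    """2nd partial derivative of F with respect to y."""
    return 0

def ya(x):
    """Analytical solution."""
    return exp(-x / 5) * sin(x)

def dya_dx(x):
    """1st derivative of the analytical solution."""
    return 1 / 5 * exp(-x / 5) * (5 * cos(x) - sin(x))
```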

We can now run the net using the information in this module.

In [None]:
np.random.seed(0)
(yt, dyt_dx, d2yt_dx2) = nnode2bvp(xt, lagaris03bvp.F, lagaris03bvp.dF_dy, lagaris03bvp.d2F_dy2,
                                lagaris03bvp.ymin, lagaris03bvp.ymax)
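y = np.zeros(nt)
dy_dx = np.zeros(nt)
d2y_dx2 = np.zeros(nt)
for i in range(nt):
    y[i] = lagaris03bvp.ya(xt[i])
    dy_dx[i] = lagaris03bvp.dya_dx(xt[i])
    # 2nd analytical derivative, from the definition earlier in this notebook.
    d2y_dx2[i] = d2ya_dx2(xt[i])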
plt.plot(xt, yt - y, label = '$y_t-y$')
plt.plot(xt, dyt_dx - dy_dx, label = '$dy_t/dx-dy/dx$')
plt.plot(xt, d2yt_dx2 - d2y_dx2, label = '$d^2y_t/dx^2-d^2y/dx^2$')
plt.xlabel('$x_t$')
plt.ylabel('Error')
plt.legend()
plt.title("Figure 8: Error in trained solution (using lagaris03bvp.py)");