# Physics-Informed Neural Network (PINN) in Julia

This is a simple showcase of how PINNs can learn the solution to (partial) differential equations without labelled data, by leveraging automatic differentiation to train on a residual loss of the boundary value problem.

We will consider the 1D Poisson equation
$$
\begin{cases}
\frac{\partial^2 u}{\partial x^2} &= - f(x), \qquad & x \in \Omega = (0, 1)
\\
u(0) &= 0 = u(1)
\end{cases}
$$

For $f(x) = \sin(\pi x)$, the analytical solution is $\hat{u}(x) = \frac{1}{\pi^2} \sin(\pi x)$. We aim to train a shallow neural network to learn the mapping $x \mapsto u$.
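
As a quick sanity check, the stated analytical solution can be verified numerically. The sketch below is not part of the training pipeline; it approximates $u''$ with a central finite difference (the helper `second_difference` and its step size are illustrative assumptions) and confirms that the PDE residual and the boundary values are close to zero.

```julia
# Right-hand side and candidate solution, as stated in the text.
f(x) = sin(π * x)
u_exact(x) = sin(π * x) / π^2

# Central finite difference for u''(x). The step size is an illustrative
# choice, not a tuned value.
second_difference(u, x; step=1e-4) = (u(x + step) - 2 * u(x) + u(x - step)) / step^2

xs = range(0.1, stop=0.9, length=9)
pde_residuals = [second_difference(u_exact, x) + f(x) for x in xs]

println("max |u'' + f| = ", maximum(abs, pde_residuals))
println("u(0) = ", u_exact(0.0), ", u(1) = ", u_exact(1.0))
```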

This can be done by choosing collocation points, i.e., random points within the domain at which we enforce the PDE. The residual of the PDE, evaluated on the network output at these points, contributes to the loss. Additionally, we penalize violations of the homogeneous Dirichlet boundary conditions. As such, our loss is

$$
\mathcal{L} = \alpha_{int} \frac{1}{2N} \sum_{i=1}^N \left( \frac{\partial^2 u}{\partial x^2}\bigg|_{x_i} + f(x_i)  \right)^2  + \alpha_{bc} \frac{1}{2 \cdot 2} \left( u(0)^2 + u(1)^2 \right)
$$

with hyperparameters $\alpha_{int}$ and $\alpha_{bc}$ that weight the two components of the loss. Our neural network is parameterized by its weights and biases. We can backpropagate from this loss into the parameter space to obtain a gradient, which guides a gradient-based optimizer (here we will use Adam).
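
To make the formula concrete, here is a minimal sketch that evaluates the loss for two hand-picked candidate functions instead of a neural network. The function `pinn_loss` and its keyword arguments are hypothetical names introduced for this illustration only; the candidates are passed as callables for $u$ and $u''$.

```julia
using Statistics: mean

f(x) = sin(π * x)

# Loss from the formula above. `u` and `d2u` are callables for a candidate
# solution and its second derivative (hypothetical interface for this sketch).
function pinn_loss(u, d2u, xs; α_int=1.0, α_bc=1.0)
    interior_loss = 0.5 * mean((d2u.(xs) .+ f.(xs)) .^ 2)
    boundary_loss = 0.25 * (u(0.0)^2 + u(1.0)^2)
    return α_int * interior_loss + α_bc * boundary_loss
end

# Fixed (non-random) collocation points, only to keep the sketch reproducible.
xs = collect(range(0.05, stop=0.95, length=10))

# The analytical solution makes both terms vanish up to round-off ...
loss_exact = pinn_loss(x -> sin(π * x) / π^2, x -> -sin(π * x), xs)
# ... whereas u ≡ 0 satisfies the boundary conditions but not the PDE.
loss_zero = pinn_loss(x -> 0.0, x -> 0.0, xs)

println("loss(analytical solution) = ", loss_exact)
println("loss(u ≡ 0)               = ", loss_zero)
```

The second value is strictly positive even though the boundary term is zero, which is why both components are needed to pin down the solution.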

**Julia's reverse-mode automatic differentiation ecosystem does not, at the time of writing, properly support higher-order autodiff.** This matters because a naive implementation needs three autodiff passes: two to obtain the second derivative inside the loss, and a third to obtain the gradient with respect to the parameters.

## Employed architecture

This intro follows the work of Lagaris et al. ([https://arxiv.org/abs/physics/9705023](https://arxiv.org/abs/physics/9705023)), who use a neural network with **one hidden layer**. By the universal approximation theorem, such a network can approximate any continuous function on a compact domain to arbitrary accuracy. Hence, provided the hidden layer is chosen large enough, it should also be able to approximate the solution to the PDE. The forward pass of the network becomes

$$
u = v^T \sigma.(w x + b)
$$

We assume our network to be a scalar-to-scalar map, hence

$$
x \in \mathbb{R}, w \in \mathbb{R}^h, b \in \mathbb{R}^h, v \in \mathbb{R}^h, u \in \mathbb{R}
$$

with $h$ being the size of the hidden dimension.

### Analytical Input-Output derivative

**Our goal is to reduce this to a single application of the Julia reverse-mode autodiff engine: the pullback from the loss to the parameter space.** As such, we want hand-coded derivatives of the network output with respect to its input.

Differentiating this simple shallow network with respect to $x$ gives

$$
\begin{aligned}
\frac{\partial u}{\partial x} &= (v \odot w)^T \sigma'(w x + b)
\\
\frac{\partial^2 u}{\partial x^2} &= (v \odot w.^2)^T \sigma''(w x + b)
\\
\frac{\partial^l u}{\partial x^l} &= (v \odot w.^l)^T \sigma^{(l)}(w x + b)
\end{aligned}
$$
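
The formulas above can be checked against finite differences. The following is a minimal sketch for the scalar (unbatched) case with a small random network; all names carrying a `_scalar` or `_demo` suffix are introduced here for illustration, and the sigmoid derivative identities used are the standard ones.

```julia
using Random

σ(z) = 1 / (1 + exp(-z))
dσ(z) = σ(z) * (1 - σ(z))
d2σ(z) = dσ(z) * (1 - 2 * σ(z))

rng_demo = MersenneTwister(0)
h_demo = 5
# Random parameters in [-1, 1]; the values are arbitrary.
w = 2 .* rand(rng_demo, h_demo) .- 1
b = 2 .* rand(rng_demo, h_demo) .- 1
v = 2 .* rand(rng_demo, h_demo) .- 1

# Network output and its hand-coded input derivatives.
u_scalar(x) = v' * σ.(w .* x .+ b)
du_scalar(x) = (v .* w)' * dσ.(w .* x .+ b)
d2u_scalar(x) = (v .* w .^ 2)' * d2σ.(w .* x .+ b)

# Central finite differences as an independent reference.
x0, δ = 0.3, 1e-4
fd_first = (u_scalar(x0 + δ) - u_scalar(x0 - δ)) / (2 * δ)
fd_second = (u_scalar(x0 + δ) - 2 * u_scalar(x0) + u_scalar(x0 - δ)) / δ^2

println("first derivative error:  ", abs(du_scalar(x0) - fd_first))
println("second derivative error: ", abs(d2u_scalar(x0) - fd_second))
```

Both errors should be small and limited by the accuracy of the finite-difference approximation rather than by the analytical formulas.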

### The batched case

For all practical applications, we want to query our network in batches, i.e., for multiple inputs at the same time. Following the Julia convention of storing samples along the last axis, we denote the collection of inputs as $x \in \mathbb{R}^{1 \times N}$ and the collection of outputs as $u \in \mathbb{R}^{1 \times N}$. As such, the forward pass becomes

$$
u = V \cdot \sigma.(W \cdot x .+ b)
$$

with the sizes

$$
x \in \mathbb{R}^{1 \times N}, W \in \mathbb{R}^{h \times 1}, b \in \mathbb{R}^h, V \in \mathbb{R}^{1 \times h}, u \in \mathbb{R}^{1 \times N}
$$
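
A small shape check may help here. The sketch below uses illustrative sizes and hand-picked parameter values (names with a `_demo` or `_batch` suffix are assumptions of this example) and confirms that the batched expression reproduces the unbatched formula column by column.

```julia
σ(z) = 1 / (1 + exp(-z))

h_demo, N_demo = 4, 3                                     # illustrative sizes
W = reshape(collect(1.0:h_demo), h_demo, 1) ./ h_demo     # h × 1
b = zeros(h_demo)                                         # h
V = ones(1, h_demo)                                       # 1 × h
x = [0.0 0.5 1.0]                                         # 1 × N

hidden = σ.(W * x .+ b)    # (h × 1) * (1 × N) gives h × N; b broadcasts over columns
u_batch = V * hidden       # (1 × h) * (h × N) gives 1 × N

# Column-by-column evaluation with the unbatched formula for comparison.
u_single = [vec(V)' * σ.(vec(W) .* xi .+ b) for xi in vec(x)]

println(size(hidden), " ", size(u_batch))
println(vec(u_batch) ≈ u_single)
```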

### Properties of the sigmoid

The higher derivatives of the sigmoid can be expressed in terms of its primal output, so they come almost for free once the forward pass has been computed

$$
\begin{aligned}
\sigma(x) &= \frac{1}{1 + e^{-x}}
\\
\sigma' &= \sigma (1 - \sigma)
\\
\sigma'' &= \sigma (1 - \sigma) \left( 1- 2\sigma \right) = \sigma' \left( 1- 2\sigma \right)
\end{aligned}
$$
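
For completeness, these identities follow from the quotient rule and the product rule, using $1 - \sigma = \frac{e^{-x}}{1 + e^{-x}}$:

$$
\begin{aligned}
\sigma'(x) &= \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma (1 - \sigma)
\\
\sigma''(x) &= \sigma' (1 - \sigma) - \sigma \sigma' = \sigma' (1 - 2\sigma)
\end{aligned}
$$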



In [None]:
# Statistics provides `mean`, which the loss functions below rely on
using Optimisers, Zygote, Plots, Random, Distributions, Statistics

In [None]:
SEED = 42
N_collocation_points = 50
HIDDEN_DEPTH = 100
LEARNING_RATE = 1e-3
N_EPOCHS = 40_000
BC_LOSS_WEIGHT = 100.0

In [None]:
rhs_function(x) = sin(π * x)
analytical_solution(x) = sin(π * x) / π^2

In [None]:
rng = MersenneTwister(SEED)

In [None]:
sigmoid(x) = 1.0 / (1.0 + exp(-x))

In [None]:
function initialize_parameters()
    # Initialize the weights according to the Xavier Glorot initializer
    uniform_limit = sqrt(6 / (1 + HIDDEN_DEPTH))
    W = rand(
        rng,
        Uniform(-uniform_limit, +uniform_limit),
        HIDDEN_DEPTH,
        1,
    )
    V = rand(
        rng,
        Uniform(-uniform_limit, +uniform_limit),
        1,
        HIDDEN_DEPTH,
    )
    b = zeros(HIDDEN_DEPTH)
    parameters = (; W, V, b)
    return parameters
end

In [None]:
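# Create an initial set of parameters; the cells below use `parameters` before training starts
parameters = initialize_parameters()
# Loss formulations to compare during training
methds = [:log10, :direct]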


In [None]:
network_forward(x, p) = p.V * sigmoid.(p.W * x .+ p.b)

In [None]:
x_line = reshape(collect(range(0.0, stop=1.0, length=100)), (1, 100))

In [None]:
# Plot initial prediction of the network (together with the analytical solution)
plot(x_line[:], network_forward(x_line, parameters)[:], label="initial prediction")
plot!(x_line[:], analytical_solution.(x_line[:]), label="analytical_solution")

In [None]:
function network_output_and_first_two_derivatives(x, p)
    activated_state = sigmoid.(p.W * x .+ p.b)
    sigmoid_prime = activated_state .* (1.0 .- activated_state)
    sigmoid_double_prime = sigmoid_prime .* (1.0 .- 2.0 .* activated_state)

    output = p.V * activated_state
    first_derivative = (p.V .* p.W') * sigmoid_prime
    second_derivative = (p.V .* p.W' .* p.W') * sigmoid_double_prime

    return output, first_derivative, second_derivative
end

In [None]:
_output, _first_derivative, _second_derivative = network_output_and_first_two_derivatives(x_line, parameters)

In [None]:
_first_derivative

In [None]:
_zygote_first_derivative = Zygote.gradient(x -> sum(network_forward(x, parameters)), x_line)[1]

In [None]:
interior_collocation_points = rand(rng, Uniform(0.0, 1.0), (1, N_collocation_points))

In [None]:
boundary_collocation_points = [0.0 1.0]

In [None]:
function loss_forward_direct(p)
    # Only the second derivative enters the PDE residual u'' + f
    _, _, second_derivative = network_output_and_first_two_derivatives(
        interior_collocation_points,
        p,
    )

    interior_residuals = second_derivative .+ rhs_function.(interior_collocation_points)

    interior_loss = 0.5 * mean(interior_residuals.^2)

    # Homogeneous Dirichlet conditions: the network output should vanish at x = 0 and x = 1
    boundary_residuals = network_forward(boundary_collocation_points, p) .- 0.0

    boundary_loss = 0.5 * mean(boundary_residuals.^2)

    # The interior weight is implicitly 1; only the boundary term is re-weighted
    total_loss = interior_loss + BC_LOSS_WEIGHT * boundary_loss
    return total_loss
end

# Same loss on a logarithmic scale; reusing the direct loss keeps the two definitions in sync
loss_forward_log10(p) = log10(loss_forward_direct(p))

In [None]:
loss_forward_direct(parameters)

In [None]:
out, back = Zygote.pullback(loss_forward_direct, parameters)

In [None]:
back(1.0)[1]

In [None]:
opt = Adam(LEARNING_RATE)

In [None]:
# Both histories store log10 of the loss so that the curves are directly comparable
loss_history_direct = Float64[]
loss_history_log10 = Float64[]

for method in methds
    println("Training with method: $method")

    if method == :direct
        parameters = initialize_parameters()
        opt_state = Optimisers.setup(opt, parameters)
        for i in 1:N_EPOCHS
            loss, back = Zygote.pullback(loss_forward_direct, parameters)
            push!(loss_history_direct, log10(loss))
            grad, = back(1.0)
            opt_state, parameters = Optimisers.update(opt_state, parameters, grad)
            if i % 100 == 0
                println("Epoch: $i, Loss: $loss")
            end
        end
    elseif method == :log10
        parameters = initialize_parameters()
        opt_state = Optimisers.setup(opt, parameters)
        for i in 1:N_EPOCHS
            # Here the loss is already on a log10 scale
            loss, back = Zygote.pullback(loss_forward_log10, parameters)
            push!(loss_history_log10, loss)
            grad, = back(1.0)
            opt_state, parameters = Optimisers.update(opt_state, parameters, grad)
            if i % 100 == 0
                println("Epoch: $i, Loss: $loss")
            end
        end
    end
end

In [None]:
# Both histories are already on a log10 scale, so no yscale is needed
plot(loss_history_direct, label="direct loss function", xlabel="epoch", ylabel="log10(loss)")
plot!(loss_history_log10, label="log10 of loss function")

In [None]:
plot(x_line[:], network_forward(x_line, parameters)[:], label="final prediction")
plot!(x_line[:], analytical_solution.(x_line[:]), label="analytical_solution")