# Tutorial: Introduction to Differentiable Programming

This tutorial introduces the reader to the fundamental concepts of differentiable programming through ideas such as the 'Neural ODE' and 'universal' differential equations. At its core, differentiable programming aims to exploit inherent problem structure to allow for significant model simplifications. Central to this endeavour are automatic differentiation (AD) toolkits, which obtain gradients of individual functions as well as of complete program executions. With current state-of-the-art AD toolkits we can take derivatives of almost any function 'f(x)'. This is an especially enticing prospect given the excessive computational cost that current frameworks incur to construct and train a model, as they always start training from scratch.

Remark: Current frameworks are unfortunately not yet ready for large-scale deployments coupled with already validated scientific simulators.

 
 ## List of References:
 
 **Reference 1.** [Neural Ordinary Differential Equations](https://arxiv.org/abs/1806.07366)
 
 **Reference 2.** [Universal Differential Equations for Scientific Machine Learning](https://arxiv.org/abs/2001.04385)
 
 **Reference 3.** [What Is Differentiable Programming?](https://fluxml.ai/2019/02/07/what-is-differentiable-programming.html)
 
 **Reference 4.** [Differentiable Control Problems](https://fluxml.ai/2019/03/05/dp-vs-rl.html)
 
 
 ## Outline:
 
 **Section 1.** [Neural Ordinary Differential Equations](#ode)
 
 **Section 2.** [Universal Differential Equations](#universal)
 
 **Section 3.** [Exercise - Implement your own Neural Differential Equation](#ex)

# 1: Neural Ordinary Differential Equations <a name="ode"></a>

The core idea of the neural ODE is to parameterize the derivative of the latent state with a neural network. A standard ODE solver can then be applied to solve the resulting equations, which lets us explicitly trade off computation speed against accuracy. This is given in the following algorithm:

<img src="imgs/NeuralODE.png" width="750" height="350" />

(Source: Neural Ordinary Differential Equations)
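
In symbols, following the notation of Reference 1, the hidden state $h(t)$ evolves according to a network $f$ with parameters $\theta$, and the output is obtained by integrating from $t_0$ to $t_1$:

$$
\frac{dh(t)}{dt} = f(h(t), t, \theta), \qquad h(t_1) = h(t_0) + \int_{t_0}^{t_1} f(h(t), t, \theta)\, dt
$$

The solver's tolerance controls how accurately this integral is approximated, which is where the trade-off between speed and accuracy enters.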

In [None]:
using DifferentialEquations

Set up the Lotka-Volterra equations, best known for modelling the population dynamics of predator-prey systems.

In [None]:
function lotka_volterra(du, u, p, t)
    x, y = u
    alpha, beta, delta, gamma = p
    du[1] = dx = alpha * x - beta * x * y
    du[2] = dy = -delta * y + gamma * x * y
end;
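
Written out, the function above encodes the following system, with $x$ the prey population, $y$ the predator population, and $p = (\alpha, \beta, \delta, \gamma)$:

$$
\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = -\delta y + \gamma x y
$$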

Set the artificial initial conditions

In [None]:
u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
p = [1.5, 1.0, 3.0, 1.0];

Instantiate the ODE problem with DiffEq.jl

In [None]:
prob = ODEProblem(lotka_volterra, u0, tspan, p);

Solve the ODE.

In [1]:
sol = solve(prob);
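
To make concrete what a solver does with such a problem, here is a minimal, dependency-free sketch of a fixed-step classical Runge-Kutta (RK4) integrator applied to the same equations. The names `rk4_solve` and `lv_rhs!` are hypothetical helpers introduced for illustration only. This is not how DiffEq.jl works internally: its default solvers use adaptive step sizes.

```julia
# Hypothetical copy of the Lotka-Volterra right-hand side, so that this
# block runs on its own.
function lv_rhs!(du, u, p, t)
    x, y = u
    alpha, beta, delta, gamma = p
    du[1] = alpha * x - beta * x * y
    du[2] = -delta * y + gamma * x * y
    return nothing
end

# Hypothetical helper: fixed-step RK4 for an in-place right-hand side
# f(du, u, p, t). Illustration only; not the DiffEq.jl API.
function rk4_solve(f, u0, tspan, p; nsteps=1000)
    t0, t1 = tspan
    h = (t1 - t0) / nsteps
    u = copy(u0)
    k1, k2, k3, k4 = similar(u), similar(u), similar(u), similar(u)
    ts = [t0]
    us = [copy(u)]
    t = t0
    for _ in 1:nsteps
        f(k1, u, p, t)
        f(k2, u .+ (h / 2) .* k1, p, t + h / 2)
        f(k3, u .+ (h / 2) .* k2, p, t + h / 2)
        f(k4, u .+ h .* k3, p, t + h)
        # Weighted average of the four slope estimates
        u .+= (h / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
        t += h
        push!(ts, t)
        push!(us, copy(u))
    end
    return ts, us
end

ts_demo, us_demo = rk4_solve(lv_rhs!, [1.0, 1.0], (0.0, 10.0), [1.5, 1.0, 3.0, 1.0])
```

Increasing `nsteps` makes the result more accurate at the price of more function evaluations, which is the same speed versus accuracy trade-off that adaptive solvers expose through their tolerances.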

Visualize the initial results with Plots.jl

In [None]:
using Plots
plot(sol)

By expressing the initial condition and the time span as functions of the parameters, we can generalize the problem definition:

In [None]:
u0_f(p, t0) = [p[2], p[4]];
tspan_f(p) = (0.0, 10 * p[4]);
p = [1.5, 1.0, 3.0, 1.0];
prob = ODEProblem(lotka_volterra, u0_f, tspan_f, p);

We first solve this ODE classically:

In [None]:
p = [1.5, 1.0, 3.0, 1.0];
prob = ODEProblem(lotka_volterra, u0, tspan, p);
sol = solve(prob, Tsit5(), saveat=0.1);
A = sol[1, :];

In [None]:
# Plot result
plot(sol)
t = 0:0.1:10.0
scatter!(t, A)

And then with the more 'modern' interface of Flux.jl and DiffEqFlux.jl

In [None]:
# Import Flux and DiffEqFlux
using Flux, DiffEqFlux

Solve using the interface of DiffEqFlux

In [None]:
concrete_solve(prob, Tsit5(), u0, p, saveat=0.1);

This solve can now be used like a layer of a neural network: the ODE forms the predictive part of the model, and its parameters are what we train.

In [None]:
p = [2.2, 1.0, 2.0, 0.4]
params = Flux.params(p)

function predict_rd()
    concrete_solve(prob, Tsit5(), u0, p, saveat=0.1)[1, :]
end

loss_rd() = sum(abs2, x-1 for x in predict_rd())
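
In formula form, the loss above penalizes the squared deviation of the first state $x$ from 1 at every saved time point $t_i = 0, 0.1, \dots, 10$:

$$
L(p) = \sum_i \bigl(x(t_i; p) - 1\bigr)^2
$$

Minimizing it therefore searches for parameters $p$ under which the first population stays close to constant.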

We can now employ the full machinery of Flux.jl, here with an ADAM optimizer, to minimize this loss:

In [None]:
data = Iterators.repeated((), 100)
opt = ADAM(0.1)
cb = function ()
    display(loss_rd())
    # update parameter p with remake
    display(plot(solve(remake(prob, p=p), Tsit5(), saveat=0.1), ylim=(0, 6)))
end

# Display ODE
cb()

Flux.train!(loss_rd, params, data, opt, cb = cb)
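
For readers curious about what the optimizer does at each step, the following is a minimal, dependency-free sketch of the ADAM update rule on a toy quadratic. The function `adam_minimize`, its keyword names, and the toy objective are assumptions made for illustration. This is not the Flux.jl implementation, and the toy loss only mimics the shape of `loss_rd` without solving an ODE.

```julia
# Hypothetical sketch of the ADAM update; not the Flux.jl API.
function adam_minimize(grad, theta0; eta=0.1, beta1=0.9, beta2=0.999,
                       epsilon=1e-8, iters=500)
    theta = copy(theta0)
    m = zero(theta)   # running mean of the gradients
    v = zero(theta)   # running mean of the squared gradients
    for k in 1:iters
        g = grad(theta)
        m .= beta1 .* m .+ (1 - beta1) .* g
        v .= beta2 .* v .+ (1 - beta2) .* g .^ 2
        mhat = m ./ (1 - beta1^k)   # bias correction for the early steps
        vhat = v ./ (1 - beta2^k)
        theta .-= eta .* mhat ./ (sqrt.(vhat) .+ epsilon)
    end
    return theta
end

# Toy objective standing in for loss_rd: squared deviation from 1.
toy_loss(theta) = sum(abs2, theta .- 1)
toy_grad(theta) = 2 .* (theta .- 1)

theta_opt = adam_minimize(toy_grad, [2.2, 0.4])
```

In the actual training call the gradient is not written by hand as in `toy_grad`; it is obtained by differentiating through the ODE solve.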

A third option is to embed the ODE directly into the neural network, as done below

In [None]:
# Embed the ODE into a multilayer perceptron
m = Chain(
    Dense(28^2, 32, relu),
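    # Illustrative only: this requires an ODE that takes 32 parameters and
    # whose saved first state has 32 entries, to match Dense(32, 10)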
    p -> concrete_solve(prob, Tsit5(), u0, p, saveat=0.1)[1,:],
    Dense(32, 10),
    softmax)

The overarching requirement across all these approaches to defining and solving an ODE is that we must always be able to define the forward pass of the solver. This holds for both machine learning and scientific computing. Next, we validate against the example from the Neural ODE paper.

In [None]:
# Using the example from the Neural ODE release paper
u0 = Float32[2.; 0.]
datasize = 30
tspan = (0.0f0, 1.5f0)

function trueODEfunc(du, u, p, t)
    true_A = [-0.1 2.0; -2.0 -0.1]
    du .= ((u.^3)'true_A)'
end

t = range(tspan[1], tspan[2], length=datasize)
prob = ODEProblem(trueODEfunc, u0, tspan)
ode_data = Array(solve(prob, Tsit5(), saveat=t));
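
The expression `((u.^3)'true_A)'` in the function above is a compact way of writing the following cubic system, where the power is taken elementwise:

$$
\frac{du}{dt} = A^\top u^{3}, \qquad A = \begin{pmatrix} -0.1 & 2.0 \\ -2.0 & -0.1 \end{pmatrix}
$$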

Write down the forward pass using the higher-level abstraction of a 'NeuralODE'

In [None]:
# Define the forward pass
dudt = Chain(x -> x.^3,
             Dense(2, 50, tanh),
             Dense(50, 2))
ps = Flux.params(dudt)
n_ode = NeuralODE(dudt, tspan, Tsit5(), saveat=t);
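
To illustrate what such a forward pass computes, here is a minimal sketch that does the same job by hand, without Flux or DiffEqFlux. The type `TinyNet`, the function `neural_ode_forward`, the weight initialization, and the fixed-step RK4 integrator are all assumptions made for illustration; `NeuralODE` itself delegates the integration to the solver passed to it.

```julia
using Random

# Hypothetical stand-in for Chain(x -> x.^3, Dense(2, 50, tanh), Dense(50, 2)):
# plain arrays instead of Flux layers.
struct TinyNet
    W1::Matrix{Float32}
    b1::Vector{Float32}
    W2::Matrix{Float32}
    b2::Vector{Float32}
end

TinyNet(rng) = TinyNet(0.1f0 .* randn(rng, Float32, 50, 2), zeros(Float32, 50),
                       0.1f0 .* randn(rng, Float32, 2, 50), zeros(Float32, 2))

# The network output is interpreted as the derivative du/dt.
(net::TinyNet)(u) = net.W2 * tanh.(net.W1 * (u .^ 3) .+ net.b1) .+ net.b2

# Forward pass: integrate du/dt = net(u) with fixed-step RK4 and store the
# state at every requested time point.
function neural_ode_forward(net, u0, ts; substeps=20)
    out = zeros(Float32, length(u0), length(ts))
    u = copy(u0)
    out[:, 1] = u
    for i in 2:length(ts)
        h = (ts[i] - ts[i-1]) / substeps
        for _ in 1:substeps
            k1 = net(u)
            k2 = net(u .+ (h / 2) .* k1)
            k3 = net(u .+ (h / 2) .* k2)
            k4 = net(u .+ h .* k3)
            u = u .+ (h / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
        end
        out[:, i] = u
    end
    return out
end

net_demo = TinyNet(MersenneTwister(0))
u0_demo = Float32[2.0, 0.0]
ts_demo = range(0.0f0, 1.5f0, length=30)
pred_demo = neural_ode_forward(net_demo, u0_demo, ts_demo)
```

The result has one column per saved time point, which is the shape that is compared against `ode_data` in the loss.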

Define the prediction function for the neural network and a classical loss metric

In [None]:
function predict_n_ode()
    n_ode(u0)
end

loss_n_ode() = sum(abs2, ode_data .- predict_n_ode());

Train the ODE like we would normally train a neural network, using a callback function and 'Flux.train!'. The plot will be shown multiple times.

In [None]:
# Train the neural network
data = Iterators.repeated((), 1000)
opt = ADAM(0.1)
cb = function ()
    display(loss_n_ode())
    # plot current prediction against data
    cur_pred = predict_n_ode()
    pl = scatter(t, ode_data[1, :], label="data")
    scatter!(pl, t, cur_pred[1, :], label="prediction")
    display(plot(pl))
end

# Display the ODE with the initial parameter values
cb()

ps = Flux.params(n_ode)
Flux.train!(loss_n_ode, ps, data, opt, cb=cb)

# 2: Universal Differential Equations <a name="universal"></a>

Working from the realization that a neural network layer, with or without an embedded ODE, is always just a differentiable function, a successor to the Neural ODE was developed: the 'universal differential equation'. Possible types of universal differential equations that we can implement are:

- Universal Ordinary Differential Equations (UODEs)
- Universal Stochastic Differential Equations (USDEs)
- Universal Delay Differential Equations (UDDEs)
- Universal Differential-Algebraic Equations (UDAEs)
- Universal Boundary Value Problems (UBVPs)
- Universal Partial Differential Equations (UPDEs)
- Universal Hybrid (Event-Driven) Differential Equations
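
As a rough guide to what these have in common, the ordinary case can be written as a known model structure $f$ into which a universal approximator $U_\theta$ (for example a neural network) is embedded. The notation below is a simplified sketch in the spirit of Reference 2, not a verbatim reproduction of its general definition:

$$
\frac{du}{dt} = f\bigl(u, t, U_\theta(u, t)\bigr)
$$

A neural ODE is the special case in which $f$ consists only of the network, $\frac{du}{dt} = U_\theta(u, t)$.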

## 2.1 Solving with adjoints - Partial Neural Adjoint

In [None]:
using DiffEqFlux, Flux, OrdinaryDiffEq

Define the initial conditions

In [None]:
u0 = Float32[0.8; 0.8]
tspan = (0.0f0, 25.0f0);

Set up the neural network and retrieve its parametrization

In [None]:
ann = Chain(Dense(2, 10, tanh), Dense(10,1))

p1, re = Flux.destructure(ann)
p2 = Float32[-2.0, 1.1]
p3 = [p1; p2]
ps = Flux.params(p3, u0);

Define the forward pass and solve the ODE

In [None]:
function dudt_(du, u, p, t)
    x, y = u
    du[1] = re(p[1:41])(u)[1]
    du[2] = p[end-1] * y + p[end] * x
end

prob = ODEProblem(dudt_, u0, tspan, p3)
concrete_solve(prob, Tsit5(), u0, p3, abstol=1e-8, reltol=1e-6);
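
The index range `1:41` in the function above follows from the size of the network. The sketch below recomputes that number and shows how the flat parameter vector is laid out; `dense_param_count` and the `_demo` variables are hypothetical names used only for this illustration, and the parameter values are random stand-ins.

```julia
# Number of trainable values in a dense layer: weight matrix plus bias vector.
dense_param_count(nin, nout) = nin * nout + nout

# Dense(2, 10, tanh) contributes 30 values, Dense(10, 1) contributes 11.
n_ann = dense_param_count(2, 10) + dense_param_count(10, 1)
n_total = n_ann + 2   # plus the two linear coefficients stored in p2

# Layout of the flat vector p3 = [p1; p2]
p3_demo = rand(Float32, n_total)
nn_part = p3_demo[1:n_ann]          # what re(...) receives
linear_part = p3_demo[end-1:end]    # coefficients of y and x in du[2]
```

Changing the network architecture changes `n_ann`, so the hard-coded `41` would need to be updated accordingly.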

Set up the adjoint to obtain the gradients

In [None]:
function predict_adjoint()
    Array(concrete_solve(prob, Tsit5(), u0, p3, saveat=0.0:0.1:25.0, abstol=1e-8, reltol=1e-6))
end

loss_adjoint() = sum(abs2, x-1 for x in predict_adjoint());
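
For background, the continuous adjoint method described in Reference 1 obtains these gradients by solving a second ODE backwards in time. With state $z(t)$, dynamics $f$, loss $L$, and adjoint $a(t) = \partial L / \partial z(t)$, the equations read:

$$
\frac{da(t)}{dt} = -a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial z}, \qquad \frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial \theta}\, dt
$$

Which sensitivity algorithm the library actually selects depends on its defaults and options, so this should be read as the underlying idea rather than a description of the exact computation performed here.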

Train the neural network and have a look at the result

In [None]:
data = Iterators.repeated((), 100)
opt = ADAM(0.1)
cb = function ()
    display(loss_adjoint())
end

# Display the initial loss
cb()

Flux.train!(loss_adjoint, ps, data, opt, cb = cb)

## 2.2 Solving with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm

In [None]:
using DiffEqFlux, Flux, OrdinaryDiffEq, Optim, Zygote

Define the initial conditions

In [None]:
u0 = Float32[0.8, 0.8]
tspan = (0.0f0, 25.0f0);

Set up the neural network

In [None]:
ann = Chain(Dense(2, 10, tanh), Dense(10, 1))

p1, re = Flux.destructure(ann)
p2 = Float32[0.5, -0.5]
p3 = [p1; p2]
ptrain = [p3; u0];

Write down the forward pass and the ODE solve

In [None]:
# Define the forward pass
function dudt_(du, u, p, t)
    x, y = u
    du[1] = re(p[1:41])(u)[1]
    du[2] = p[end-1] * y + p[end] * x
end

prob = ODEProblem(dudt_, u0, tspan, p3)
concrete_solve(prob, Tsit5(), u0, p3, abstol=1e-8, reltol=1e-6);

Set up the adjoint, including gradients taken with Zygote for a faster convergence of the optimizer

In [None]:
# Set up the adjoint
function predict_adjoint(fullp)
    # fullp = [p3; u0]: the last two entries are u0, everything before them is p3
    Array(concrete_solve(prob, Tsit5(), fullp[end-1:end], fullp[1:end-2], saveat=0.0:0.1:25.0, abstol=1e-8, reltol=1e-6))
end

loss_adjoint(fullp) = sum(abs2, x-1 for x in predict_adjoint(fullp))

function loss_adjoint_gradient!(G, fullp)
    G .= Zygote.gradient(loss_adjoint, fullp)[1]
end;

Train using BFGS including first-order gradients of the adjoint taken with Zygote

In [None]:
optimize(loss_adjoint, loss_adjoint_gradient!, ptrain, BFGS())
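
As a reminder of what BFGS does with those gradients, the textbook form of the method keeps an approximation $H_k$ of the inverse Hessian and updates it from successive gradient differences. With step length $\alpha_k$ chosen by a line search:

$$
x_{k+1} = x_k - \alpha_k H_k \nabla L(x_k), \qquad s_k = x_{k+1} - x_k, \qquad y_k = \nabla L(x_{k+1}) - \nabla L(x_k)
$$

$$
H_{k+1} = \bigl(I - \rho_k s_k y_k^\top\bigr) H_k \bigl(I - \rho_k y_k s_k^\top\bigr) + \rho_k s_k s_k^\top, \qquad \rho_k = \frac{1}{y_k^\top s_k}
$$

This is the standard formulation; details such as the line search and the initial $H_0$ are implementation choices of the optimization library.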

# 3: Exercise - Implement your own Neural Differential Equation <a name="ex"></a>

- Implement your own differential equation.
    - Compare the performance of the optimization-based BFGS algorithm and the neural adjoint on your own problem.
- Construct universal differential equations on the basis of recurrent neural networks and convolutional neural networks.
- Implement the Korteweg-de Vries (KdV) equation and examine the framework's behavior when dealing with high-order derivatives.
- Looking at the [Trebuchet example](https://github.com/FluxML/model-zoo/tree/cdda5cad3e87b216fa67069a5ca84a3016f2a604/games/differentiable-programming/trebuchet) for control policies in differentiable programming, develop a control policy for the above two equations.
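
For the KdV exercise, one common normalization of the equation is given below; other references use different signs and scalings of the coefficients, so check the convention of your source:

$$
\frac{\partial u}{\partial t} + 6 u \frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0
$$

The third-order spatial derivative is the high-order term referred to in the exercise.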