[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/jolin-io/KI2022-tutorial-universal-differential-equations/main?filepath=03%20deep%20dive%20into%20universal%20differential%20equations.ipynb)

<a href="https://www.jolin.io" target="_blank" rel="noreferrer noopener">
<img src="https://www.jolin.io/assets/Jolin/Jolin-Banner-Website-v1.1-darkmode.webp">
</a>

# Deep dive into Universal Differential Equations in <img height="60px" style='height:60px;display:inline;' alt="Julia" src="https://julialang.org/assets/infra/logo.svg">

Outline of this extensive deep dive:
1. Scientific Machine Learning with UDEs
    1. Differential Equations
    2. DiffEq within Machine Learning
    3. Machine Learning within DiffEq
    4. Machine Learning within DiffEq - alternative perspective
    5. More UDEs
2. Symbolic Regression with DataDrivenDiffEq
    1. Symbolic regression without UDE
    2. Symbolic regression with UDE

# Scientific Machine Learning with UDEs

The term Universal Differential Equations was introduced in the paper [Universal Differential Equations for Scientific
Machine Learning by Rackauckas et al. 2021](https://arxiv.org/pdf/2001.04385.pdf).

**UDE is about using machine learning as part of differential equation problems.** As such it is one way of combining scientific model-based approaches with machine learning techniques, which is often named scientific machine learning. 

Another combination of machine learning and differential equations is, for example, physics-informed neural networks (PINNs). These are not today's topic, but have a look at [NeuralPDE.jl](https://github.com/SciML/NeuralPDE.jl) if you are interested.

Here is an overview of the scientific machine learning ecosystem as described in the UDE paper:
![](./assets/overview_sciml_ecosystem.png)

This is a huge ecosystem. Today we focus mostly on the last layer: implementing differential equations which depend directly on neural networks.

In [None]:
import DifferentialEquations, DiffEqSensitivity, DiffEqFlux
import Symbolics, ModelingToolkit, DataDrivenDiffEq
import Optimization, OptimizationOptimisers, OptimizationOptimJL
import Lux, ComponentArrays
import Plots, Random, Statistics, StatsBase, DelimitedFiles

using CommonSolve: solve

rng = Random.default_rng()
Random.seed!(rng, 12345)

## DifferentialEquations.jl

Example [Lotka-Volterra equations](https://en.wikipedia.org/wiki/Lotka%E2%80%93Volterra_equations): Population of rabbits and foxes

<center>

rabbits: $ x^\prime = \alpha x - \beta x y $

</center>

the rate of change of the prey's population is given by its own growth rate ($\alpha$) minus the rate at which it is preyed upon ($\beta$).


<center>

foxes: $ y^\prime = \gamma x y - \delta y $

</center>

the rate of change of the predator's population depends upon the rate at which it consumes prey ($\gamma$), minus its intrinsic death rate ($\delta$)
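
As a small worked step (not part of the original tutorial), the two equations can be solved for their equilibria by setting both derivatives to zero:

$$x(\alpha - \beta y) = 0, \qquad y(\gamma x - \delta) = 0$$

Besides the trivial solution $(x, y) = (0, 0)$, this gives the coexistence equilibrium

$$x^* = \frac{\delta}{\gamma}, \qquad y^* = \frac{\alpha}{\beta}$$

With the parameter values used in the code cell below ($\alpha = 1.5$, $\beta = 1$, $\delta = 3$, $\gamma = 1$) this is $x^* = 3$, $y^* = 1.5$. The populations cycle around this point.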

In [None]:
function lotka_volterra(du, u, p, t)
    x, y = u
    α, β, δ, γ = p
    du[1] = dx = α*x - β*x*y
    du[2] = dy = -δ*y + γ*x*y
end
u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
p = [1.5, 1.0, 3.0, 1.0]
ode_prob = DifferentialEquations.ODEProblem(lotka_volterra, u0, tspan, p)

In [None]:
ode_sol = solve(ode_prob, saveat=0.1)
Plots.plot(ode_sol)
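
To make concrete what `solve` does conceptually, here is a minimal sketch of a fixed-step classical Runge-Kutta (RK4) integrator in plain Julia. It is not what DifferentialEquations.jl runs (the library chooses its own, typically adaptive, solver). The names `lv_rhs`, `rk4_step` and `integrate_rk4` are hypothetical helpers introduced only for this illustration.

```julia
# Out-of-place right-hand side with the same equations as `lotka_volterra` above.
function lv_rhs(u, p)
    x, y = u
    α, β, δ, γ = p
    return [α * x - β * x * y, -δ * y + γ * x * y]
end

# One classical Runge-Kutta (RK4) step of size h.
function rk4_step(f, u, p, h)
    k1 = f(u, p)
    k2 = f(u .+ (h / 2) .* k1, p)
    k3 = f(u .+ (h / 2) .* k2, p)
    k4 = f(u .+ h .* k3, p)
    return u .+ (h / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
end

# Integrate with a fixed step size and store every state.
function integrate_rk4(f, u0, p, tspan, h)
    ts = collect(tspan[1]:h:tspan[2])
    us = Vector{typeof(u0)}(undef, length(ts))
    us[1] = u0
    for i in 2:length(ts)
        us[i] = rk4_step(f, us[i - 1], p, h)
    end
    return ts, us
end

ts, us = integrate_rk4(lv_rhs, [1.0, 1.0], [1.5, 1.0, 3.0, 1.0], (0.0, 10.0), 0.01)
```

A fixed step size keeps the sketch short; an adaptive solver would instead adjust the step size to meet the requested tolerances.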

## DiffEq within Machine Learning

Here the differential equation is the model, and we learn its parameters via gradient-based optimization: the loss is differentiated through the ODE solver.
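
The idea can be sketched without any packages on a much simpler problem. The following illustration fits the rate `k` of the decay equation $u^\prime = -k u$ by plain gradient descent. Everything in it is an assumption made for the sketch: the synthetic target, the explicit Euler integrator, the finite-difference gradient (the notebook uses automatic differentiation via `AutoZygote` instead) and the helper names.

```julia
# Simulate u' = -k*u with explicit Euler steps and return the trajectory.
function simulate_decay(k; u0 = 1.0, h = 0.01, nsteps = 200)
    us = Vector{Float64}(undef, nsteps + 1)
    us[1] = u0
    for i in 1:nsteps
        us[i + 1] = us[i] + h * (-k * us[i])
    end
    return us
end

# Synthetic "measurements" generated from a known rate, for illustration only.
k_true = 0.5
target = simulate_decay(k_true)

# Sum of squared errors between simulation and target.
decay_loss(k) = sum(abs2, simulate_decay(k) .- target)

# Central finite difference as a stand-in for automatic differentiation.
fd_gradient(f, k; ϵ = 1e-6) = (f(k + ϵ) - f(k - ϵ)) / (2ϵ)

# Plain gradient descent on the single parameter k.
function fit_rate(k_start; lr = 0.005, iters = 500)
    k = k_start
    for _ in 1:iters
        k -= lr * fd_gradient(decay_loss, k)
    end
    return k
end

k_fit = fit_rate(1.5)
```

The cells below follow the same pattern, with the ODE solver in place of the Euler loop and Adam in place of plain gradient descent.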

In [None]:
function predict(parameters, ode_prob=ode_prob, t=ode_sol.t)
    solve(ode_prob, saveat = t, p = parameters)
end
function loss_function(parameters, data)
    pred = Array(predict(parameters))[1,:]
    return sum(abs2, pred .- data)
end

In [None]:
ps_initial = Random.rand!(similar(ode_prob.p))
Plots.plot(predict(ps_initial))

In [None]:
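# start the optimization from the original ODE parameters
ps_initial = ode_prob.p
# target value: the loss rewards keeping the rabbit population at 1.0
data = 1.0
loss_function(ps_initial, data)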

In [None]:
losses = Float64[]
function callback(p, l)
    push!(losses, l)
    if length(losses) % 50 == 0
        Plots.plot(losses, show = :inline, yscale = :log10,
            label = "loss", xlabel = "#epochs", ylabel="loss (log10 scale)")
    end
    return false  # return bool `halt`
end

ps_trained = let data = data
    minimizer = ps_initial
    opt_function = Optimization.OptimizationFunction(
        (ps, data) -> loss_function(ps, data),
        Optimization.AutoZygote(),
    )
    for (optimizer, maxiters) = [
            (OptimizationOptimisers.Adam(0.1), 300),
            (OptimizationOptimisers.Adam(0.01), 500),
        ]
        opt_prob = Optimization.OptimizationProblem(opt_function, minimizer, data)
        opt_sol = solve(opt_prob, optimizer,
            callback = callback, maxiters = maxiters)
        minimizer = opt_sol.minimizer
    end
    minimizer
end

👉 experiment with the optimizers [Adam](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.Adam) and try different configurations

👉 plot the prediction before and after training

In [None]:
# your space

## Machine Learning within DiffEq

The data has been taken from https://jmahaffy.sdsu.edu/courses/f00/math122/labs/labj/q3v1.htm
(Originally published in E. P. Odum (1953), Fundamentals of Ecology, Philadelphia, W. B. Saunders)

The code was updated from the slightly outdated [official UDE paper example](https://github.com/ChrisRackauckas/universal_differential_equations/blob/master/LotkaVolterra/hudson_bay.jl).

### First some data

In [None]:
hudson_bay_data = DelimitedFiles.readdlm("assets/hudson_bay_data.dat", '\t', Float32, '\n')

In [None]:
# normalize time to start at 0
t = hudson_bay_data[:, 1] .- hudson_bay_data[1, 1]
tspan = (t[begin], t[end])

# Measurements of prey and predator
X = Matrix(transpose(hudson_bay_data[:, 2:3]))
# Normalize the data; since the data domain is strictly positive
# we just need to divide by the maximum
xscale = maximum(X, dims =2)
X .= 1f0 ./ xscale .* X

# Plot the data
Plots.scatter(t, transpose(X), xlabel = "t", ylabel = "x(t), y(t)")
Plots.plot!(t, transpose(X), xlabel = "t", ylabel = "x(t), y(t)")

### The machine learning part

In [None]:
# Gaussian RBF as activation
rbf(x) = exp.(-(x.^2))

# Define the network 2->5->5->5->2
model_lux = Lux.Chain(
    Lux.Dense(2,5,rbf),
    Lux.Dense(5,5, rbf),
    Lux.Dense(5,5, tanh),
    Lux.Dense(5,2)
)

In [None]:
ps_lux, st_lux = Lux.setup(rng, model_lux)

### Bringing ML into the differential equations

In [None]:
# Define the hybrid model
function ude_dynamics!(du,u, p, t)
    u_pred, _st_lux = model_lux(u, p.ps_lux, st_lux) # Network prediction
    # We assume a linear birth rate for the prey
    du[1] = p.ps_ode[1]*u[1] + u_pred[1]
    # We assume a linear decay rate for the predator
    du[2] = -p.ps_ode[2]*u[2] + u_pred[2]
end

# Get the initial parameters, first two are linear birth/decay of prey and predator
p_initial = ComponentArrays.ComponentVector((
    ps_ode = rand(rng, Float32, 2),
    ps_lux = ps_lux,
))
ode_prob_nn = DifferentialEquations.ODEProblem(ude_dynamics!, X[:, 1], tspan, p_initial)

### Training

Training is a bit more elaborate. We first use a special training loss provided by `DiffEqFlux`. It is called `multiple_shoot`, which essentially divides the training data into pieces and learns on the individual pieces instead of learning everything at once.

For more details on `multiple_shoot` see the [DiffEqFlux.jl documentation](https://diffeqflux.sciml.ai/stable/examples/multiple_shooting/).
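
As a rough illustration of the idea (a sketch under stated assumptions, not the library's internals): the time points are split into short groups that share their boundary point, each group is simulated separately, and a continuity term penalizes gaps where one segment ends and the next begins. The helpers `overlapping_groups` and `continuity_penalty` and the numbers in `segments` are hypothetical and exist only for this example.

```julia
# Split indices 1:n into groups of `group_size` points where neighbouring
# groups share one boundary index (hypothetical helper for illustration).
function overlapping_groups(n, group_size)
    return [i:min(n, i + group_size - 1) for i in 1:(group_size - 1):(n - 1)]
end

# Penalty for gaps between the end of one segment and the start of the next.
# `segments` holds one predicted trajectory (a vector of values) per group.
function continuity_penalty(segments)
    return sum(abs(segments[i][end] - segments[i + 1][1]) for i in 1:(length(segments) - 1))
end

groups = overlapping_groups(11, 5)        # 1:5, 5:9, 9:11
segments = [[1.0, 2.0, 3.0], [3.5, 4.0], [4.0, 5.0]]
penalty = continuity_penalty(segments)    # |3.0 - 3.5| + |4.0 - 4.0|
```

In the cell below, `group_size` plays the role of the group length and `continuity_term` is the weight given to such a penalty.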

In [None]:
# parameters for Multiple Shooting
group_size = 5
continuity_term = 200.0f0

function shooting_loss(parameters)
    loss_compare(data, pred) = sum(abs2, data - pred)
    loss, pred = DiffEqFlux.multiple_shoot(
        parameters, X, t, ode_prob_nn, loss_compare, DifferentialEquations.Vern7(), group_size;
        continuity_term = continuity_term)
    loss
end

Define a standard predictor and loss.

The loss comes with an extra penalty which forces parameters to be small.
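
Written out, the `loss` function defined in the next cell computes the following. The symbols $N$, $\hat{X}$, $\theta_{\mathrm{NN}}$, $M$ and $\mu$ are notation introduced here: $N$ is the number of time points, $\hat{X}(\theta)$ the prediction, $\theta_{\mathrm{NN}}$ the $M$ neural network parameters, and $\mu = 10^{-3}$ the value of `factor_penalty`.

$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \lVert X_i - \hat{X}_i(\theta) \rVert^2 + \mu \, \frac{1}{M} \sum_{j=1}^{M} \theta_{\mathrm{NN},j}^2$$

The first term is the mean squared error over the time points, the second is the mean squared size of the network weights.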

In [None]:
function predict(parameters, t = t)
    solve(
        ode_prob_nn,
        DifferentialEquations.Vern7(),
        p = parameters,
        saveat = t,
        abstol = 1e-6, reltol = 1e-6,
        sensealg = DiffEqSensitivity.ForwardDiffSensitivity()
    )
end

function loss(parameters)
    X_pred = Array(predict(parameters))
    loss_diff = sum(abs2, X - X_pred) / size(X, 2)
    loss_penalty = sum(abs2, parameters.ps_lux) / length(parameters.ps_lux)
    factor_penalty = convert(eltype(parameters), 1e-3)
    loss_diff + factor_penalty * loss_penalty 
end

👉 run both losses and visualize the prediction

In [None]:
# your space

In [None]:
# your space

Let's train

In [None]:
# Container to track the losses
losses = Float32[]

# Callback to show the loss during training
callback(parameters, args...) = begin
    l = loss(parameters) # Equivalent L2 loss
    push!(losses, l)
    if length(losses) % 50 == 0
        Plots.plot(losses, show = :inline, yscale = :log10,
            label = "loss", xlabel = "#epochs", ylabel="loss (log10 scale)")
    end
    return false  # return bool `halt`
end

We train in two stages: first with the multiple-shooting loss and a larger learning rate, then with the standard loss and a smaller learning rate.

In [None]:
minimizer = p_initial

for (opt_alg, maxiters, loss_func) = [
        (OptimizationOptimisers.Adam(0.1), 100, shooting_loss),
        (OptimizationOptimisers.Adam(0.01), 100, loss),
    ]
    opt_func = Optimization.OptimizationFunction((ps, _) -> loss_func(ps), Optimization.AutoZygote())   
    opt_prob = Optimization.OptimizationProblem(opt_func, minimizer) 
    opt_sol = solve(opt_prob, opt_alg, maxiters = maxiters, callback = callback)
    minimizer = opt_sol.minimizer
    println("Training loss after $(length(losses)) iterations: $(losses[end])")
end
p_trained = minimizer

Did it work out?

In [None]:
# Interpolate the solution
tsample = t[1]:0.5:t[end]
X_pred = Array(predict(p_trained, tsample))
# Measurements vs. UDE prediction
Plots.scatter(t, X', label = ["Measurements" nothing], xlabel = "t", ylabel = "x(t), y(t)")
Plots.plot!(tsample, X_pred', label = ["UDE Approximation" nothing])

Oh! We need to improve.

👉 adapt the training procedure (the number of iterations, the [Adam](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.Adam) config, ...) to make our model fit the data at least reasonably well

### Simulating the future

👉 now that the training looks good, let's check whether the model is stable in the long run

simulate our `ode_prob_nn` for some time into the future (hint: you may want to change `tspan`)

In [None]:
# your space

## Machine Learning within DiffEq - alternative perspective

The famous paper **Neural Ordinary Differential Equations (Chen et al. 2019)** introduced the following intuition for Neural Ordinary Differential Equations.

Residual Neural Network (discrete difference layers)
$$h_{t+1} = h_t + f(h_t, \theta_t)$$

Neural Ordinary Differential Equations
$$\frac{dh(t)}{dt} = f(h(t), t, \theta)$$

![](https://www.jolin.io/assets/examples/NeuralODE-Comparing-ResNet.png)
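
The connection between the two formulas can be checked numerically: a residual update is an explicit Euler step of the ODE with step size 1. The following is a minimal sketch in plain Julia. The weights `W_demo` and `b_demo` are arbitrary illustration values (not trained parameters), and all function names are hypothetical.

```julia
# A small fixed "layer" f(h) = tanh.(W*h .+ b) with arbitrary weights.
W_demo = [0.0 -1.0; 1.0 0.0]
b_demo = [0.1, -0.2]
layer(h) = tanh.(W_demo * h .+ b_demo)

# Residual block: h_{t+1} = h_t + f(h_t)
resnet_step(h) = h .+ layer(h)

# Explicit Euler step for dh/dt = f(h) with step size Δt
euler_step(h, Δt) = h .+ Δt .* layer(h)

h0 = [1.0, 0.5]

# With Δt = 1 the Euler step is exactly the residual update.
same_update = resnet_step(h0) == euler_step(h0, 1.0)

# Many small steps over the same unit of "time" approximate the continuous flow.
function euler_flow(h, nsteps)
    Δt = 1 / nsteps
    for _ in 1:nsteps
        h = euler_step(h, Δt)
    end
    return h
end

h_coarse = euler_flow(h0, 1)     # one residual block
h_fine = euler_flow(h0, 1000)    # close to the ODE solution at t = 1
```

A Neural ODE replaces the fixed number of blocks with an ODE solver that decides how many steps to take.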

## More UDEs

One key aspect of Julia's scientific machine learning stack is the immense range of features it provides.

Here is a short summary from the UDE paper:

![UDE features](./assets/ude_overview_features.png)

And here are the benchmarks:
![UDE benchmarks](./assets/ude_benchmarks.png)

# Symbolic regression

Symbolic regression is the discipline of fitting mathematical formulas to given data. We use DataDrivenDiffEq.jl.

First, generate the basis functions: multivariate polynomials up to degree 5.


In [None]:
Symbolics.@variables u[1:2]
b = DataDrivenDiffEq.polynomial_basis(u, 5)
basis = DataDrivenDiffEq.Basis(b, u)

## Symbolic regression without UDE

Direct Identification via SINDy + Collocation (estimates derivative)
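
The core of SINDy is a sparse regression: evaluate a library of candidate terms on the data, fit the derivatives by least squares, and drop terms with small coefficients. Below is a minimal sketch of sequentially thresholded least squares in plain Julia. It is not DataDrivenDiffEq's implementation, it uses a single threshold instead of the range of thresholds passed to `STLSQ` in the notebook, and the sample points and helper names are made up for the illustration.

```julia
using LinearAlgebra

# Candidate library: all monomials x^i * y^j with total degree <= 2.
library(x, y) = [1.0, x, y, x^2, x * y, y^2]

# Sequentially thresholded least squares: fit, zero out small coefficients,
# refit on the remaining terms, repeat.
function stlsq_sketch(Θ, target, threshold; iters = 10)
    ξ = Θ \ target
    for _ in 1:iters
        small = abs.(ξ) .< threshold
        ξ[small] .= 0.0
        big = .!small
        any(big) || break
        ξ[big] = Θ[:, big] \ target
    end
    return ξ
end

# Synthetic samples of the prey equation dx = 1.5x - 1.0xy on a small grid
# (exact derivatives, no noise; illustration only).
grid = [(x, y) for x in 0.5:0.5:2.0 for y in 0.5:0.5:1.5]
Θ_demo = permutedims(reduce(hcat, [library(x, y) for (x, y) in grid]))
dx_demo = [1.5 * x - 1.0 * x * y for (x, y) in grid]

ξ_demo = stlsq_sketch(Θ_demo, dx_demo, 0.1)
```

Only the coefficients of `x` and `x*y` survive the thresholding here. With real measurements the derivatives are not known exactly, which is why the notebook estimates them by collocation first.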

In [None]:
# Create the problem using a gaussian kernel for collocation
full_problem = DataDrivenDiffEq.ContinuousDataDrivenProblem(X, t, DataDrivenDiffEq.GaussianKernel())
# Create the thresholds which should be used in the search process
λ = Float32.(exp10.(-7:0.1:5))
# Create an optimizer for the SINDy problem
opt = DataDrivenDiffEq.STLSQ(λ)

full_res = solve(full_problem, basis, opt,
    maxiter = 10_000, progress = true, denoise = true, normalize = true)

println(full_res)

In [None]:
println(DataDrivenDiffEq.result(full_res))

In [None]:
println(DataDrivenDiffEq.parameter_map(full_res))

In [None]:
full_pred = full_res(full_problem.X, full_res.parameters, full_problem.t)

In [None]:
p1 = Plots.plot(full_problem.t, full_problem.DX[1,:], label="collocation du1")
Plots.plot!(full_problem.t, full_pred[1,:], label="symbolic regression du1")

p2 = Plots.plot(full_problem.t, full_problem.DX[2,:], label="collocation du2")
Plots.plot!(full_problem.t, full_pred[2,:], label="symbolic regression du2")

Plots.plot(p1, p2, layout=(2,1))

## Symbolic regression with UDE

We want to apply symbolic regression to the neural network part.

Importantly, the neural net only captured the **interactions** between predators and prey.
The **linear parts** were already given (structurally) and fit separately, so they don't matter here.
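
In formulas, with $p_1, p_2$ standing for the two entries of `ps_ode` and $\mathrm{NN}_1, \mathrm{NN}_2$ for the two network outputs (notation introduced here), the hybrid model from `ude_dynamics!` reads

$$x^\prime = p_1 x + \mathrm{NN}_1(x, y), \qquad y^\prime = -p_2 y + \mathrm{NN}_2(x, y)$$

Comparing term by term with the Lotka-Volterra equations above, a perfectly trained network would satisfy

$$\mathrm{NN}_1(x, y) \approx -\beta x y, \qquad \mathrm{NN}_2(x, y) \approx \gamma x y$$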

First, let's look at what we actually learned in our neural network. 

In [None]:
# standard Lotka Volterra 
p_ideal = ode_prob.p
Y_ideal = [
    -p_ideal[2] * (X_pred[1,:] .* X_pred[2,:])'
    p_ideal[4] * (X_pred[1,:] .* X_pred[2,:])'
]

# prediction of global data driven approach, minus linear learned terms
full_pred2 = full_res(X_pred, full_res.parameters, tsample)
full_problem_DX_nn_only = full_pred2 - [1, -1] .* p_trained.ps_ode .* X_pred

# Neural network guess
Y_pred, _st_lux = model_lux(X_pred, p_trained.ps_lux, st_lux)

In [None]:
p1 = Plots.plot(tsample, Y_ideal[1,:], label = "Ideal Lotka Volterra")
Plots.plot!(tsample, full_problem_DX_nn_only[1,:], label = "symbolic regression without ude")
Plots.plot!(tsample, Y_pred[1,:], label = "UDE")

p2 = Plots.plot(tsample, Y_ideal[2,:], label = "Ideal Lotka Volterra")
Plots.plot!(tsample, full_problem_DX_nn_only[2,:], label = "symbolic regression without ude")
Plots.plot!(tsample, Y_pred[2,:], label = "UDE")

Plots.plot(p1, p2, layout=(2,1))

As this looks reasonable, let's start the symbolic regression.

We can now directly specify a functional relationship between inputs and outputs (and don't need to deal with derivatives here).

In [None]:
nn_problem = DataDrivenDiffEq.DataDrivenProblem(X_pred, Y=Y_pred)

In [None]:
nn_res = solve(nn_problem, basis, opt, maxiter = 10_000, progress = true, normalize = false, denoise = true)
println(nn_res)

In [None]:
println(DataDrivenDiffEq.result(nn_res))

In [None]:
println(DataDrivenDiffEq.parameter_map(nn_res))

In [None]:
nn_pred = nn_res(X_pred, nn_res.parameters)

In [None]:
p1 = Plots.plot(nn_pred[1,:], label="symbolic regression")
Plots.plot!(Y_pred[1,:], label="NeuralODE prediction")

p2 = Plots.plot(nn_pred[2,:], label="symbolic regression")
Plots.plot!(Y_pred[2,:], label="NeuralODE prediction")

Plots.plot(p1, p2, layout=(2,1))

👉 apply `DataDrivenProblem` to the ideal case

In [None]:
# your space

# That was the deep dive into Universal Differential Equations in Julia - Thank you for participating 🙂

I've prepared a **bonus topic** about combining differential equations with Bayesian inference, i.e. probabilistic parameter and error estimation: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/jolin-io/KI2022-tutorial-universal-differential-equations/main?filepath=04%20introduction%20to%20bayesian%20differential%20equations.ipynb)

If you have questions or suggestions, or you are just interested in Julia, contact me:
- Stephan Sahm stephan.sahm@jolin.io

### Further Material

- [Blog Post DiffEqFlux.jl](https://julialang.org/blog/2019/01/fluxdiffeq/)
- [Documentation DiffEqFlux.jl](https://diffeqflux.sciml.ai/stable/)
- [Paper Neural Ordinary Differential Equations (Chen et al. 2019)](https://arxiv.org/abs/1806.07366)
- [Paper Universal Differential Equations for SciML (Rackauckas et al. 2020)](https://arxiv.org/abs/2001.04385)
- [Documentation DataDrivenDiffEq.jl](https://datadriven.sciml.ai/stable), [linear ODE example](https://datadriven.sciml.ai/stable/examples/2_linear_continuous_system/), [nonlinear ODE example](https://datadriven.sciml.ai/stable/examples/4_nonlinear_continuous_system/)


#### Supported by [Jolin.io](https://www.jolin.io)

Jolin.io is an IT consultancy for high-performance computing and data science.

We are there to help you if you want to
- try out Julia at your company,
- transition Matlab, Fortran, R, Python, etc. to Julia, or
- speed up your existing code