# Training a network against data from the Burgers equation

This notebook trains a Convolutional Neural Network (CNN) on data from the Burgers equation. See the model_1d folder for the code that generated the data.

A simple setup for a surrogate model predicts the next time step from the current one. This all starts from a mathematical model, which typically involves Partial Differential Equations (PDEs). Here we simply collect the dynamic variables of the model in a vector $x$ at discrete times $t_0,t_1,\ldots,t_K$ and denote these as $x_k$. We can represent the numerical model as:

$x_{k+1}=M(x_k)$, starting from a given initial condition $x_0$.

Next, we run the numerical model and collect $x_0,x_1,\ldots,x_K$ as training data for our Machine Learning (ML) model. The aim of the ML model is to approximate the numerical model, so we call it a surrogate model. Its main purpose is to provide a faster, though slightly less accurate, version of the model. We denote the surrogate model as:

$\hat{x}_{k+1}=\hat{M}(\hat{x}_k,\theta)$

starting with $\hat{x}_0=x_0$, which is the true initial condition.

This model has $\hat{x}_k$ as input and $\hat{x}_{k+1}$ as output, and $\theta$ is the vector of ML model parameters to be estimated. For a given vector of parameters, one can sequentially compute $\hat{x}_1$, $\hat{x}_2$, $\hat{x}_3$, etc. This is called a rollout.
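
The rollout can be sketched in a few lines of plain Julia. This is a minimal illustration, not the notebook's actual model: the function `surrogate` (a scaled periodic shift) and the helper `rollout` are hypothetical names introduced here, and the scalar `θ` stands in for the parameter vector.

```julia
# Toy stand-in for the surrogate M̂(x, θ): a scaled periodic shift
# (hypothetical, for illustration only)
surrogate(x, θ) = θ .* circshift(x, 1)

# Rollout: apply the surrogate repeatedly, feeding each prediction
# back in as the input of the next step
function rollout(model, x0, θ, nsteps)
    xs = Vector{typeof(x0)}(undef, nsteps + 1)
    xs[1] = x0                       # x̂_0 = x_0, the true initial condition
    for k in 1:nsteps
        xs[k + 1] = model(xs[k], θ)  # x̂_{k+1} = M̂(x̂_k, θ)
    end
    return xs
end

x0 = Float32[1, 2, 3, 4]
xs = rollout(surrogate, x0, 0.5f0, 3)  # states x̂_0, x̂_1, x̂_2, x̂_3
```

Only `x0` comes from the true model; every later state depends on the surrogate's own previous output.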

## Training

For training, the most basic approach is to use one-step-ahead predictions: given the true state $x_k$, the surrogate should accurately reproduce the output $x_{k+1}$ of the numerical model. We can write this as a loss function:

$J(\theta) = \sum_{k=0}^{K-1} \| x_{k+1} - \hat{M}(x_k,\theta)\|^2$

Viewed as a supervised learning method, we have the generated data $x_k$ as inputs and $x_{k+1}$ as outputs. As is common in such cases, the inputs are collected in an array $X$ and the outputs in an array $Y$:

$X=[x_0,x_1,\ldots,x_{K-1}]$

$Y=[x_1,x_2,\ldots,x_{K}]$

We can generalize $\hat{M}$ to act on each column of such an array, which gives:

$Y \approx \hat{M}(X,\theta)$
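
As a minimal sketch of this bookkeeping, the snippet below builds $X$ and $Y$ from a short made-up trajectory and evaluates the one-step-ahead loss. The trajectory, the additive toy surrogate, and the name `J` are assumptions for illustration. The arrays here are plain matrices with one state per column, whereas the notebook's own arrays carry an extra channel dimension.

```julia
# Hypothetical toy trajectory x_0, ..., x_K with K = 4 and 3 grid points per state
trajectory = [Float32[k, 2k, 3k] for k in 0:4]

# Columns of X are x_0 ... x_{K-1}; columns of Y are x_1 ... x_K
X = reduce(hcat, trajectory[1:end-1])
Y = reduce(hcat, trajectory[2:end])

# Toy surrogate acting on every column at once
# (assumption for illustration: M̂(x, θ) = x .+ θ)
surrogate(X, θ) = X .+ θ

# One-step-ahead loss: sum of squared errors over all columns
J(θ) = sum(abs2, Y .- surrogate(X, θ))

J(zeros(Float32, 3))   # loss for a surrogate that predicts "no change"
J(Float32[1, 2, 3])    # loss for the increment that matches this toy trajectory
```

Because each column of `Y` is the successor of the same column of `X`, shuffling the columns of both arrays with the same permutation leaves this loss unchanged.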

Unfortunately there is no guarantee that reasonably accurate one-step-ahead predictions lead to accurate rollouts, since the errors can accumulate quickly.
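
A deliberately simple scalar example shows how this can happen. The maps `M_true` and `M_hat` and the helper `rollout_final` are hypothetical and unrelated to the Burgers data: the true state stays constant, while the surrogate overshoots by 1% per step.

```julia
# True model and a slightly wrong surrogate (both hypothetical scalar maps)
M_true(x) = x          # true dynamics: the state stays constant
M_hat(x) = 1.01 * x    # surrogate with a 1% error per step

# Apply a model nsteps times, feeding its output back in as input
function rollout_final(model, x0, nsteps)
    x = x0
    for _ in 1:nsteps
        x = model(x)
    end
    return x
end

x0 = 1.0

# Error of a single prediction made from the true state
one_step_error = abs(M_hat(x0) - M_true(x0))

# Error after a 100-step rollout of the surrogate against the true model
rollout_error = abs(rollout_final(M_hat, x0, 100) - rollout_final(M_true, x0, 100))
```

Here the one-step error is about 0.01, while the rollout error after 100 steps is roughly 1.7, since the surrogate's errors compound as $1.01^{100}$.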


In [11]:
# Load Packages

# switch to the directory where this script is located
cd(@__DIR__)

# Packages
using Pkg
Pkg.activate(".")
#Pkg.instantiate()
using Flux
using BSON
using Plots, Measures
using JLD2

# optional gpu usage
const use_gpu = false
if use_gpu
    using CUDA
end

[32m[1m  Activating[22m[39m project at `~/src_nobackup/julia_ml_tests.jl.git/training_1d_flux`


In [16]:
# Load the data

input_file = joinpath("..", "model_1d", "burgers1d_periodic.jld2")
if !isfile(input_file)
    # stop here: load() below would fail anyway with a less helpful message
    error("Data file not found: $input_file")
end
println("Loading data from $input_file")
data=load(input_file)

# Convert data to input and output arrays

n_times=length(data["solution"])
n_points=length(data["solution"][1])

# Create inputs X and outputs Y
# u is the solution, t is the time, x is the spatial coordinate
X = zeros(Float32,n_points,1,n_times-1) # Float32 for GPU compatibility, 1 channel since only one variable u
Y = zeros(Float32,n_points,1,n_times-1) 

for t in 1:n_times-1
    X[:,:,t] .= data["solution"][t]
    Y[:,:,t] .= data["solution"][t+1]
end

# show only the sizes; showing X and Y themselves dumps the full arrays
@show size(X)
@show size(Y)

# some metadata for plotting
output_times = data["times"][2:end]
output_x = data["grid"]
x0 = X[:,:,1] # initial condition for rollout
nsteps = length(output_times)
nothing

Loading data from ../model_1d/burgers1d_periodic.jld2


In [None]:
# Settings

# training
epochs = 10
batch_size = 32
learning_rate = 0.001

# model

In [4]:
# Limit the size of the training set
# Note that the training set is not shuffled; e.g. all the zeros come first
n_train = 2000
idx = rand(1:size(x_train,4),n_train)
x_train = x_train[:,:,:,idx]
y_train = y_train[idx]
y_train_onehot = Flux.onehotbatch(y_train, 0:9)

nothing

In [5]:
# move data to gpu
if use_gpu
    x_train_gpu = gpu(x_train)
    y_train_onehot_gpu = gpu(y_train_onehot)
    y_train_gpu = gpu(y_train)
    x_test_gpu = gpu(x_test)
    y_test_onehot_gpu = gpu(y_test_onehot)
    y_test_gpu = gpu(y_test)
else
    x_train_gpu = x_train
    y_train_onehot_gpu = y_train_onehot
    y_train_gpu = y_train
    x_test_gpu = x_test
    y_test_onehot_gpu = y_test_onehot
    y_test_gpu = y_test
end

nothing

In [7]:

# Define the LeNet-5 model -> http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf
model = Chain(
    Conv((5,5),1 => 6, relu),
    MaxPool((2,2)),
    Conv((5,5),6 => 16, relu),
    MaxPool((2,2)),
    Flux.flatten,
    Dense(256=>120,relu),
    Dense(120=>84, relu),
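    # NOTE: sigmoid squashes the logits into (0,1) before softmax, so the
    # cross-entropy cannot fall below about 1.46 for 10 classes
    Dense(84=>10, sigmoid),
    softmax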
) 

if use_gpu
    model_gpu = gpu(model)
else
    model_gpu = model
end

nothing
     


In [8]:
# Loss and accuracy

# Cross-entropy loss comparing the model predictions for the inputs "x"
# with the true labels "y_onehot" in one-hot encoding
function loss(model, x, y_onehot)
    return Flux.crossentropy(model(x), y_onehot)
end

# Fraction of samples in "x" for which the predicted class
# matches the integer label in "y" (labels are 0:9, onecold returns 1:10)
function accuracy(model, x, y)
    mx = model(x)
    n = length(y)
    return count((Flux.onecold(mx) .- 1) .== y) / n
end;

In [9]:
#model = model_gpu |>cpu
#a1_cpu=accuracy(model,x_train[:,:,:,1:10],y_train[1:10])
#mx=model(x_train[:,:,:,1:10])
#mx_1c=Flux.onecold(mx)


In [10]:
# some testing only
# @time a1_gpu=accuracy(model_gpu,x_train_gpu,y_train_gpu)
# @time a1_cpu=accuracy(model,x_train,y_train)
# println("Accuracy on GPU: ",a1_gpu)
# println("Accuracy on CPU: ",a1_cpu)
# @time l1_gpu=loss(model_gpu,x_train_gpu,y_train_onehot_gpu)
# @time l1_cpu=loss(model,x_train,y_train_onehot)
# println("Loss on GPU: ",l1_gpu)
# println("Loss on CPU: ",l1_cpu)

In [11]:
# Set up the data loader for the training data
# (x_train_gpu and y_train_onehot_gpu refer to the CPU arrays when use_gpu is false)
train_data = Flux.DataLoader((x_train_gpu, y_train_onehot_gpu), shuffle=true, batchsize=64)
train_data

32-element DataLoader(::Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, OneHotArrays.OneHotMatrix{UInt32, CuArray{UInt32, 1, CUDA.Mem.DeviceBuffer}}}, shuffle=true, batchsize=64)
  with first element:
  (28×28×1×64 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, 10×64 OneHotMatrix(::CuArray{UInt32, 1, CUDA.Mem.DeviceBuffer}) with eltype Bool,)

In [12]:

# Initialize the Adam optimizer with default settings
optimizer = Flux.setup(Adam(), model_gpu)

# Train the model and display the accuracy and loss after each epoch
for epoch in 1:5
    Flux.train!(loss, model_gpu, train_data, optimizer)
    # NOTE: we are careless with compute here, re-running the model for each line below
    println("epoch: $(epoch)")
    println("   Accuracy for training: $(accuracy(model_gpu,x_train_gpu,y_train_gpu))")
    println("   Accuracy for testing : $(accuracy(model_gpu,x_test_gpu,y_test_gpu))")
    println("   Loss for training    : $(loss(model_gpu,x_train_gpu,y_train_onehot_gpu))")
    println("   Loss for testing     : $(loss(model_gpu,x_test_gpu,y_test_onehot_gpu))")
end
     


epoch: 1
   Accuracy for training: 0.731
   Accuracy for testing : 0.735
   Loss for training    : 1.9271423
   Loss for testing     : 1.9248906
epoch: 2
   Accuracy for training: 0.7965
   Accuracy for testing : 0.8009
   Loss for training    : 1.6718405
   Loss for testing     : 1.669309
epoch: 3
   Accuracy for training: 0.847
   Accuracy for testing : 0.8366
   Loss for training    : 1.6163288
   Loss for testing     : 1.6166949
epoch: 4
   Accuracy for training: 0.888
   Accuracy for testing : 0.8821
   Loss for training    : 1.5865014
   Loss for testing     : 1.5919523
epoch: 5
   Accuracy for training: 0.915
   Accuracy for testing : 0.9014
   Loss for training    : 1.5627532
   Loss for testing     : 1.5727644
