# Internship Finn Sherry @ Sioux Mathware

---

# Bayesian grey-box system identification for thermal effects

Last update: 20-07-2022

$\renewcommand{\vec}[1]{\boldsymbol{\mathrm{#1}}}$
$\newcommand{\covec}[1]{\hat{\vec{#1}}}$
$\newcommand{\mat}[1]{\boldsymbol{\mathrm{#1}}}$
$\newcommand{\inv}[1]{#1^{-1}}$
$\newcommand{\given}{\, \vert \,}$
$\newcommand{\haslaw}{\sim}$
$\newcommand{\problaw}[1]{p(#1)}$
$\newcommand{\Expectation}{\mathbb{E}}$
$\newcommand{\Variance}{\mathbb{V}}$
$\newcommand{\Geometric}{\textrm{Geom}}$
$\newcommand{\NegBin}{\textrm{NB}}$
$\newcommand{\Poisson}{\textrm{Pois}}$
$\newcommand{\Bernoulli}{\textrm{Bern}}$
$\newcommand{\NormDist}{\mathcal{N}}$
$\newcommand{\GammaDist}{\textrm{Gamma}}$
$\newcommand{\ExpDist}{\textrm{Exp}}$
$\newcommand{\Uniform}{\textrm{Uniform}}$
$\newcommand{\Binomial}{\textrm{Binom}}$
$\newcommand{\BetaDist}{\textrm{Beta}}$
$\newcommand{\BetaFunc}{\textrm{B}}$
$\newcommand{\setify}[1]{\mathbb{#1}}$
$\newcommand{\NatSet}{\setify{N}}$
$\newcommand{\IntSet}{\setify{Z}}$
$\newcommand{\RealSet}{\setify{R}}$
$\newcommand{\CompSet}{\setify{C}}$
$\newcommand{\QuatSet}{\setify{H}}$
$\newcommand{\FieldSet}{\setify{K}}$
$\newcommand{\define}{:=}$
$\newcommand{\enifed}{=:}$
$\newcommand{\loss}{\ell}$
$\newcommand{\risk}{\textrm{R}}$
$\newcommand{\MSE}{\textrm{MSE}}$
$\newcommand{\norm}[1]{\lVert #1 \rVert}$
$\newcommand{\InnerProduct}[2]{\left( #1 , #2 \right)}$
$\newcommand{\kilogram}{\textrm{kg}}$
$\newcommand{\metre}{\textrm{m}}$
$\newcommand{\watt}{\textrm{W}}$
$\newcommand{\joule}{\textrm{J}}$
$\newcommand{\kelvin}{\textrm{K}}$
$\newcommand{\second}{\textrm{s}}$
$\newcommand{\centi}{\textrm{c}}$
$\newcommand{\bigO}{\mathcal{O}}$

The primary goal of this project is to apply Bayesian techniques in order to identify parameters in a grey-box model of a thermal setup by observing the evolution in time of temperatures at a limited number of locations in the setup. Such a Bayesian approach could simultaneously estimate the relevant parameters based on easily gathered calibration data, obviating the need for difficult, expensive, or even impossible experiments to directly determine the quantities of interest from the setup. 

In this notebook, we will first discuss some rudimentary aspects of Bayesian learning. Subsequently, we will describe the setup, and derive a simple but reasonable grey-box model. We will also look at a number of possible model extensions, which might make it more accurate. Due to the complexity of the problem, it will turn out not to be feasible to naively apply Bayesian techniques to identify the parameters; we will have to make use of an approximate method. We will therefore experiment with three (families of) approximate techniques, unscented Kalman filters, Variational Bayes, and Markov Chain Monte Carlo, to see which is best suited to our needs. Having chosen an approach, we will perform various experiments. For this, we will use our models to generate test data, and subsequently investigate the quality of parameter estimates. We finally conclude by summarising our results, and discussing potentially interesting open questions.

## Bayesics
The goal of grey-box system identification is to get parameter estimates to allow for accurate state prediction. We will make use of Bayesian techniques to identify these parameters. What makes these techniques Bayesian is their reliance on Bayes' Rule:
$$\underbrace{p(\vec{x} \mid \vec{y})}_{\textrm{posterior}} = \frac{\overbrace{p(\vec{y} \mid \vec{x})}^{\textrm{likelihood}} \overbrace{p(\vec{x})}^{\textrm{prior}}}{\underbrace{p(\vec{y})}_{\textrm{evidence}}}.$$
Bayes' Rule relates four different distributions:
- _Prior_: this encodes our prior knowledge of the parameter $\vec{x}$ before we observe $\vec{y}$;
- _Likelihood_: this says how the measurement $\vec{y}$ depends on the parameter $\vec{x}$;
- _Evidence_: this represents the prior probability of observing $\vec{y}$. Note that, by the Law of Total Probability, 
$$p(\vec{y}) = \int p(\vec{y} \mid \vec{x}) p(\vec{x}) d\vec{x},$$
so that the right-hand side of Bayes' Rule is properly normalised;
- _Posterior_: this encodes our posterior knowledge of the parameter $\vec{x}$ after we observe $\vec{y}$.

Using a Bayesian approach has two main benefits compared to e.g. [Maximum Likelihood Estimation (MLE)](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation):
- The prior allows you to incorporate your knowledge of the parameter that is being estimated;
- You get more than just a point estimate: for instance, the width of the distribution gives you an idea of how confident you can be about your point estimate.

We first became acquainted with Bayesian inference by considering the problem of [learning the bias of a coin](Bayesics_Coin.ipynb), which is the quintessential introductory application of Bayesian learning. We expanded on this by tackling the Multi-Armed Bandit (MAB) problem. 
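
As a minimal sketch of what such an analytical update looks like, consider the coin example with a Beta prior on the bias. The numbers (a flat prior, 7 heads and 3 tails) and the helper names `beta_mean`, `beta_var`, and `update` are illustrative assumptions, not taken from the linked notebook:

```julia
# Conjugate update for the bias θ of a coin (illustrative numbers).
# Prior: θ ~ Beta(a, b); likelihood: each toss is Bernoulli(θ).
beta_mean(a, b) = a / (a + b)
beta_var(a, b) = a * b / ((a + b)^2 * (a + b + 1))

# After observing `heads` heads and `tails` tails, the posterior is Beta(a + heads, b + tails)
update(a, b, heads, tails) = (a + heads, b + tails)

a0, b0 = 1.0, 1.0               # flat prior on [0, 1]
a1, b1 = update(a0, b0, 7, 3)   # posterior parameters

println("prior:     mean = ", beta_mean(a0, b0), ", var = ", beta_var(a0, b0))
println("posterior: mean = ", beta_mean(a1, b1), ", var = ", beta_var(a1, b1))
```

The posterior variance is smaller than the prior variance, which illustrates the second benefit listed above: the width of the posterior quantifies how much the data have taught us.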

## System description
A schematic depiction of the setup is shown in the figure below. We must develop reasonable models for it, with which we can perform grey-box system identification. For a full derivation of the dynamics of the system, I refer to Chapter 2 of my report. 
<center><img src='setup_schema.png'/></center>
In short, the setup consists of three metal blocks which have been lined up, with resistive nylon pads interposed. The temperature can be measured using thermistors at arbitrary places on the setup; for simplicity we assume that we measure the temperature at a single spot on each block, which we call $T_1$, $T_2$, and $T_3$. The temperatures will evolve due to a number of different factors; we will only consider the influence of conduction, convection, radiation, and the user-controlled input heat. 

By assuming that conduction within blocks is so fast that there are no temperature differences within a block, we may model the system using a [lumped-element model](https://en.wikipedia.org/wiki/Lumped-element_model), governed by the following system of ODEs:
$$\frac{d}{dt}\begin{pmatrix} m_1 c_{p, 1} T_1 \\ m_2 c_{p, 2} T_2 \\ m_3 c_{p, 3} T_3 \end{pmatrix} = 
\underbrace{\begin{pmatrix} -k_{12} & k_{12} & 0 \\ k_{12} & -(k_{12} + k_{23}) & k_{23} \\ 0 & k_{23} & -k_{23} \end{pmatrix} \begin{pmatrix} T_1 \\ T_2 \\ T_3 \end{pmatrix}}_{\textrm{conduction}} + \underbrace{\begin{pmatrix} h(T_1, T_a, 1, t) \\ h(T_2, T_a, 2, t) \\ h(T_3, T_a, 3, t) \end{pmatrix}}_{\textrm{convection}} + \underbrace{\sigma \begin{pmatrix} A_1 \varepsilon_1 (T_a^4 - T_1^4) \\ A_2 \varepsilon_2 (T_a^4 - T_2^4) \\ A_3 \varepsilon_3 (T_a^4 - T_3^4) \end{pmatrix}}_{\textrm{radiation}} + \underbrace{\begin{pmatrix} \Phi_1 \\ \Phi_2 \\ \Phi_3 \end{pmatrix}}_{\textrm{input}}.$$
Convection is notoriously hard to model. Essentially the gold standard is Newton's law of cooling (Clercx (2015), Eq. (8.17)), which says that the convective heat flow is linear in the difference between the temperature of the block and the ambient temperature, i.e. 
$$h(T_i, T_a, i, t) = h_a A_i (T_a - T_i),$$
for some constant $h_a$, where $A_i$ is the surface area of block $i$. Furthermore, the role of radiation will often be negligible, so we leave it out. With these simplifications, our governing equations become
$$\frac{d}{dt}\begin{pmatrix} m_1 c_{p, 1} T_1 \\ m_2 c_{p, 2} T_2 \\ m_3 c_{p, 3} T_3 \end{pmatrix} = 
\begin{pmatrix} -k_{12} & k_{12} & 0 \\ k_{12} & -(k_{12} + k_{23}) & k_{23} \\ 0 & k_{23} & -k_{23} \end{pmatrix} \begin{pmatrix} T_1 \\ T_2 \\ T_3 \end{pmatrix} + h_a \begin{pmatrix} A_1 (T_a - T_1) \\ A_2 (T_a - T_2) \\ A_3 (T_a - T_3) \end{pmatrix} + \begin{pmatrix} \Phi_1 \\ \Phi_2 \\ \Phi_3 \end{pmatrix},$$
or, more compactly, 
$$ \mat{M} \dot{\vec{T}} = \mat{K} \vec{T} + \vec{\Phi} + h_a \mat{A} (\vec{T}_a - \vec{T}).$$
In these equations, we can distinguish three types of quantities:
1. Measured/observed quantities: e.g. $T_i$, $\Phi_i$. These may vary over time, and are known up to a given accuracy due to measurement noise;

2. Known constants: e.g. $m_i$, $c_{p, i}$, $T_a$. These are fully known, and are constant over time. This is reasonable for quantities such as mass (which can be easily measured) and specific heat capacity (which is a material property that, according to the Dulong-Petit Law, is roughly constant for metals over a wide range of temperatures; Carter (2000), Ch. 16). It is perhaps less reasonable for the ambient temperature (due to e.g. the setup heating up its surroundings);

3. Unknown constants: e.g. $k_{ij}$, $h_a$. These are not known a priori, for instance because there is no simple physical way to measure or derive their values. For example, the conduction coefficients $k_{ij}$ can vary depending on how tightly the blocks have been put together. In this project, we want to identify these constants using Bayesian inference techniques.
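
To make the compact form $\mat{M} \dot{\vec{T}} = \mat{K} \vec{T} + \vec{\Phi} + h_a \mat{A} (\vec{T}_a - \vec{T})$ concrete, the sketch below assembles the matrices for illustrative parameter values of the same order as those used later in this notebook. The function name `rhs` is hypothetical. It also checks two properties that follow directly from the equations: conduction only redistributes heat between blocks, and with all blocks at ambient temperature and no input nothing changes.

```julia
using LinearAlgebra

# Illustrative values (same order of magnitude as used later in this notebook)
mcp = [1e3, 1.5e3, 0.8e3]       # heat capacities m_i c_{p,i} (J/K)
areas = [1e-2, 1.5e-2, 2e-2]    # surface areas A_i (m^2)
k12, k23, h_a = 3.0, 2.0, 10.0  # conduction (W/K) and convection (W m^-2 K^-1) coefficients
T_a = 290.0                     # ambient temperature (K)

M = Diagonal(mcp)
A = Diagonal(areas)
K = [-k12 k12 0.0;
     k12 -(k12 + k23) k23;
     0.0 k23 -k23]

# Right-hand side of M * dT/dt = K T + Φ + h_a A (T_a - T), solved for dT/dt
rhs(T, Φ) = M \ (K * T + Φ + h_a * A * (fill(T_a, 3) - T))

T = [330.0, 270.0, 310.0]
dT = rhs(T, zeros(3))

# The columns of K sum to zero, so conduction adds no net heat to the system
conduction_power = sum(K * T)

# At ambient temperature with no input, the system is in equilibrium
dT_equilibrium = rhs(fill(T_a, 3), zeros(3))
```

Here `dT[1]` evaluates to roughly $-0.18~\kelvin~\second^{-1}$: the first block loses $180~\watt$ to its colder neighbour and only $4~\watt$ to the surroundings, consistent with conduction acting faster than convection.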

#### Magnitude Analysis
Unfortunately, the physical setup was not completed in time for us to experiment with "real" data. Instead, we will have to make do with artificial data, i.e. data that we have generated ourselves. So that we may experiment with parameters that are somewhat realistic, we will now estimate their typical sizes. Furthermore, we will estimate the magnitude and typical time scales of the heat exchange mechanisms. Such time scales are good to know, as they can help us design experiments that generate data on which inference is more effective. A summary of the typical magnitudes of our parameters can be found in the table below; for the derivation I refer to Section 2.2 of my report:

| Parameter | Description | Magnitude |
| :-: | :-: | :-: |
| $A$ | Surface area | $10^{-2}~\metre^2$ |
| $V$ | Volume | $10^{-3}~\metre^3$ |
| $m c_p$ | Heat capacity | $10^3~\joule~\kelvin^{-1}$ |
| $k$ | Conduction coefficient | $10^0~\watt~\kelvin^{-1}$ |
| $\tau_{\textrm{cond}}$ | Conduction time scale | $10^3~\second$ |
| $h_a$ | Convection coefficient | $10^1~\watt~\metre^{-2}~\kelvin^{-1}$ |
| $\tau_{\textrm{conv}}$ | Convection time scale | $10^4~\second$ |
| $B$ | Input amplitude | $10^1~\watt$ |
| $\omega$ | Input angular frequency | $10^{-3}~\second^{-1}$ |
| $\sigma$ | Measurement standard deviation | $10^{-2}~\kelvin$ |

It is important to choose a good time scale on which to observe as well as an appropriate number of observations:
- The typical time scale of conduction is $\tau_{\text{cond}} = \bigO(10^3~\second)$, while the typical time scale of convection is $\tau_{\text{conv}} = \bigO(10^4~\second)$. Intuitively, we can only learn about the value of parameters when these parameters actually influence the observations. For instance, if we start with all temperatures being equal, we will not be able to learn anything about the conduction coefficients, because conduction will not play an important role in the evolution of the temperatures. Therefore, it seems like it would be optimal to observe over the typical time scale over which parameters are relevant. That makes this problem quite interesting, since we are trying to identify parameters whose influence plays out on time scales that differ by an order of magnitude. Consequently, we cannot choose an observation horizon that is optimal for all the parameters simultaneously. On the other hand, if all parameters were relevant on similar time scales, we might encounter unidentifiability, i.e. that the various origins of the influences on the observations cannot be distinguished.
- In general, increasing the number of observations should improve the quality of the parameter estimates. However, it will also increase computational complexity, which could be undesirable in practice. Moreover, it would be unrealistic to assume that we can poll the temperature sensors at an arbitrary rate. 

After a bit of trial-and-error, we have decided to make 100 observations ($+1$ "observation" of the initial condition), with an observation horizon of $500~\second$. Hence, we perform a measurement every $5~\second$. A more structured way of deciding on the measurement setup would be useful.
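
The time scales in the table can be reproduced with a few lines of arithmetic. The sketch below assumes that each time scale is the ratio of the heat capacity to the conductance of the corresponding mechanism, i.e. $\tau_{\textrm{cond}} = m c_p / k$ and $\tau_{\textrm{conv}} = m c_p / (h_a A)$; this is the natural definition for a lumped-element model, but the derivation in the report may differ in detail. The helper `cooling` is hypothetical.

```julia
# Order-of-magnitude values from the table above
mcp = 1e3    # heat capacity (J/K)
k   = 1e0    # conduction coefficient (W/K)
h_a = 1e1    # convection coefficient (W m^-2 K^-1)
A   = 1e-2   # surface area (m^2)

# Assumed definitions: heat capacity divided by the conductance of each mechanism
τ_cond = mcp / k
τ_conv = mcp / (h_a * A)

# For a single block with convection only, Newton's law of cooling gives
# T(t) = T_a + (T_0 - T_a) * exp(-t / τ_conv)
cooling(t, T_0, T_a) = T_a + (T_0 - T_a) * exp(-t / τ_conv)

# Fraction of the initial excess temperature that remains after the 500 s horizon
remaining = (cooling(500.0, 330.0, 290.0) - 290.0) / (330.0 - 290.0)

println("τ_cond ≈ ", τ_cond, " s, τ_conv ≈ ", τ_conv, " s")
println("excess temperature remaining after 500 s: ", remaining)
```

Under these assumptions, the $500~\second$ horizon covers about half a conduction time scale but only a twentieth of a convection time scale, so we might expect the data to be less informative about $h_a$ than about the conduction coefficients.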

## Exploring Approximate Techniques
For simple problems, such as [identifying the bias of a coin](Bayesics_Coin.ipynb), we can apply Bayesian techniques analytically, producing simple equations that allow us to determine the posterior. Unfortunately, we cannot find closed form descriptions of how to update our beliefs for this problem, since the model is too complicated. Moreover, as is often the case, it is computationally intractable to apply Bayes' rule numerically (Särkkä (2013), Sec. 1.4). Bayes' rule tells us that
$$\problaw{\vec{x} \given \vec{y}} = \frac{\problaw{\vec{y} \given \vec{x}} \problaw{\vec{x}}}{\problaw{\vec{y}}}.$$
We know the likelihood $\problaw{\vec{y} \given \vec{x}}$ and the prior $\problaw{\vec{x}}$. To get the posterior $\problaw{\vec{x} \given \vec{y}}$ directly from Bayes' rule we would have to determine the evidence $\problaw{\vec{y}}$, which involves calculating
$$\problaw{\vec{y}} = \int \problaw{\vec{y} \given \vec{x}} \problaw{\vec{x}} d\vec{x}.$$
In general, computing this integral is problematic. One issue is that the likelihood $\problaw{\vec{y} \given \vec{x}}$ is often expensive to calculate: in our case it would involve solving, at every evaluation, the system of ODEs which describe our grey-box model. Furthermore, it is typically a very high dimensional integral, and numerical methods like quadrature will consequently perform poorly (Betancourt (2017)). 
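
To illustrate both points, the sketch below computes the evidence for the one-dimensional coin problem with a simple midpoint rule, and then counts how many likelihood evaluations a tensor-product grid would need in higher dimensions. The data (7 heads, 3 tails), the flat prior, and the function names are illustrative assumptions.

```julia
# Evidence for a coin with a flat prior after 7 heads and 3 tails (illustrative numbers),
# approximated with the midpoint rule on n points.
likelihood(θ) = θ^7 * (1 - θ)^3
prior(θ) = 1.0

function evidence_midpoint(n)
    h = 1 / n
    return sum(likelihood((i - 0.5) * h) * prior((i - 0.5) * h) for i in 1:n) * h
end

evidence = evidence_midpoint(1_000)

# Exact value for comparison: the Beta function B(8, 4) = 7! 3! / 11!
exact = factorial(7) * factorial(3) / factorial(11)

# A tensor-product grid with n points per dimension needs n^d likelihood evaluations
grid_evaluations(n, d) = big(n)^d

println("midpoint rule: ", evidence, ", exact: ", exact)
println("evaluations for 100 points in 4 dimensions: ", grid_evaluations(100, 4))
```

In one dimension this is cheap and accurate. With four unknowns and 100 grid points per dimension, however, we would already need $10^8$ likelihood evaluations, each of which requires solving the system of ODEs.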

Since we cannot rely on fully analytical approaches, we have to resort to approximate techniques. We consider three different methods: Unscented Kalman filters, Variational Bayes, and Markov Chain Monte Carlo, which each have their pros and cons.

### Unscented Kalman filter

First, we will try to apply an unscented Kalman filter (UKF), for which we use ForneyLab.jl [Cox (2019)](https://github.com/biaslab/ForneyLab.jl), a package for Julia developed by [BIASlab](https://biaslab.github.io/). For a background on the UKF, I refer to Särkkä (2013), Ch. 5. The experiments we ran to test the viability can be found in [this Julia notebook](Explore_UKF.ipynb).

The primary issue with using a UKF from ForneyLab is that it is not yet compatible with vector-based models. Hence, we would either need to implement this ourselves, or find some other package which already includes it. Due to the limited time we have for this project, we will not use a UKF.

### Variational Bayes

We will next try to apply Variational Bayes (VB). Essentially, VB works by approximating complicated distributions with ones that are easier to deal with. For instance, we might pretend some variables are independent, so that the probability density function factors. We take an entire family of such simple distributions, and then try to find the one that is closest to our true distribution in an iterative manner. It is possible to break the computations down into simple update steps, which are performed based on a factor graph which describes the relations between our variables: this makes VB suitable for online estimation on limited hardware. However, this does mean that VB is not one generic method: we need to implement corresponding "nodes" for the update steps. 
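
For reference, the idea of "finding the closest simple distribution" is usually formalised as follows. This is the standard textbook formulation, stated here as background rather than as a description of what any particular package implements; the family $\mathcal{Q}$ and the factors $q_i$ are notation introduced for this sketch. Closeness is measured with the Kullback-Leibler divergence, and pretending that variables are independent corresponds to a mean-field family:
$$q^* = \operatorname*{arg\,min}_{q \in \mathcal{Q}} \mathrm{KL}\left(q(\vec{x}) \,\Vert\, \problaw{\vec{x} \given \vec{y}}\right), \qquad \mathcal{Q} = \left\{ q : q(\vec{x}) = \prod_i q_i(x_i) \right\}.$$
The posterior itself is unknown, but the divergence can be minimised indirectly, because
$$\log \problaw{\vec{y}} = \underbrace{\Expectation_{q}\left[\log \frac{\problaw{\vec{y} \given \vec{x}} \problaw{\vec{x}}}{q(\vec{x})}\right]}_{\textrm{evidence lower bound}} + \mathrm{KL}\left(q(\vec{x}) \,\Vert\, \problaw{\vec{x} \given \vec{y}}\right).$$
Since the left-hand side does not depend on $q$, maximising the evidence lower bound, which only involves the likelihood and the prior, is equivalent to minimising the divergence.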

Since the corresponding nodes are not implemented in ForneyLab.jl (or in ReactiveMP.jl [Bagaev (2021)](https://github.com/biaslab/ReactiveMP.jl), another package by BIASlab), we have to look around elsewhere. [Luttinen (2013)](https://link.springer.com/chapter/10.1007/978-3-642-40988-2_20) wrote a paper about using VB to identify the state update matrix in a Linear State-Space Model (LSSM). For this, he developed the Python package [BayesPy](https://github.com/bayespy). In fact, it contains a bunch of tricks in order to speed up inference, which is not so relevant to our problem. 

While Julia can call Python (and a whole bunch of other popular languages), we have applied BayesPy in a separate [Python Jupyter notebook](Explore_VB.ipynb).

There are several issues with VB that make it unsuitable for application in this project, although most of them could probably be resolved by investing a sufficient amount of time:
- Even without real process noise, the inference is very sensitive to the prior of the innovation $\nu$;
- Automatic relevance determination (ARD), as currently applied in BayesPy, does not make a lot of sense for our problem. We would have to implement a componentwise (instead of columnwise) ARD. That might help with the sensitivity to the prior of the innovation too;
- Our problem is somewhat more complex, involving also convection and input heats. It is not clear how these can be added using existing software;
- It is not clear how to convert knowledge of the distribution of components in matrices into posteriors for the underlying parameters.

Consequently I will not make use of VB.

### Markov Chain Monte Carlo
The final approximate method we will consider is Markov Chain Monte Carlo (MCMC). Using MCMC we approximate our posterior by first constructing a Markov chain. It has to be ergodic, so that a stationary distribution exists and can be interpreted as an equilibrium distribution. The Markov chain we construct should have the desired posterior as stationary distribution. This can be done using a Metropolis-Hastings (MH) algorithm, such as Hamiltonian Monte Carlo (HMC). For a brief background on MCMC, MH, and HMC, I refer to Section 3.2 of my report. [Betancourt (2017)](http://arxiv.org/abs/1701.02434) provides some deeper intuition about HMC.

Since we are constructing a Markov chain, MCMC generates new samples iteratively:
- We start with the current sample $\vec{z}_n$;
- We somehow stochastically perturb $\vec{z}_n$ to $\hat{\vec{z}}_{n + 1}$. The most basic method is to use a random walk: this is the original Metropolis algorithm [Metropolis (1953)](https://doi.org/10.1063/1.1699114). We can associate a distribution $g(\hat{\vec{z}}_{n + 1} \given \vec{z}_n)$ with this. 
- We calculate the following acceptance probability:
$$\rho(\vec{z}_n, \hat{\vec{z}}_{n + 1}) = \min\left\{1, \frac{\problaw{\hat{\vec{z}}_{n + 1} \given \vec{y}}}{\problaw{\vec{z}_n \given \vec{y}}}\frac{g(\vec{z}_n \given \hat{\vec{z}}_{n + 1})}{g(\hat{\vec{z}}_{n + 1} \given \vec{z}_n)}\right\};$$
- We accept the proposal with probability $\rho(\vec{z}_n, \hat{\vec{z}}_{n + 1})$. If we accept, then we continue with $\vec{z}_{n + 1} = \hat{\vec{z}}_{n + 1}$, else we reuse $\vec{z}_{n + 1} = \vec{z}_n$.

This indeed produces a Markov chain, which under some assumptions of regularity has the desired equilibrium properties (for more details, see e.g. [Robert (2004)](http://link.springer.com/10.1007/978-1-4757-4145-2), Ch. 7). 
We can initialise our Markov chain in multiple different places if we want to check ergodicity. Notably, we only need a function that is proportional to the posterior to construct our Markov chain: Bayes' rule tells us that the product of the likelihood and the prior is such a function. 

When we start off sampling, we will in general not yet be in equilibrium. We therefore first draw a number of samples to reach equilibrium. This is called the _burn-in period_, and these samples are discarded. We then continue to sample: since we are in equilibrium, these samples will come from the stationary distribution, and by construction therefore approximate the posterior. 
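
The steps above, including the burn-in, can be sketched with a random-walk Metropolis sampler in a few lines. This is a toy illustration on the coin problem (7 heads, 3 tails, flat prior, so the exact posterior is $\BetaDist(8, 4)$ with mean $2/3$), not the sampler used in the rest of this notebook; the function name, step size, and sample counts are arbitrary choices. Note that only the unnormalised posterior, i.e. likelihood times prior, is needed.

```julia
using Random, Statistics

# Unnormalised log-posterior for a coin bias θ after 7 heads and 3 tails with a flat prior
log_target(θ) = 0 < θ < 1 ? 7 * log(θ) + 3 * log(1 - θ) : -Inf

function random_walk_metropolis(log_target, z0, n_samples; step = 0.2, rng = Random.default_rng())
    chain = Vector{Float64}(undef, n_samples)
    z = z0
    for n in 1:n_samples
        # Symmetric Gaussian proposal, so the ratio of proposal densities g cancels
        proposal = z + step * randn(rng)
        # Log of the acceptance probability ρ = min(1, target ratio)
        log_ρ = min(0.0, log_target(proposal) - log_target(z))
        if log(rand(rng)) < log_ρ
            z = proposal   # accept the proposal
        end
        chain[n] = z       # on rejection, the current sample is reused
    end
    return chain
end

rng = MersenneTwister(1)
chain = random_walk_metropolis(log_target, 0.5, 25_000; rng = rng)
burn_in = 5_000
samples = chain[(burn_in + 1):end]   # discard the burn-in period

println("posterior mean ≈ ", mean(samples), " (exact: ", 2 / 3, ")")
```

Proposals outside $(0, 1)$ have zero posterior density and are therefore always rejected, which is how the support of the prior is respected without any special handling.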

The exploratory application of MCMC to our problem can be found in [this Julia Jupyter notebook](Explore_MCMC.ipynb). From these experiments, it seems like MCMC, implemented by combining Turing.jl [Ge (2018)](https://github.com/TuringLang/Turing.jl) with DifferentialEquations.jl [Rackauckas (2017)](https://github.com/SciML/DifferentialEquations.jl), is a good approximate technique for performing Bayesian inference:
- The method is fairly general, which means that it should not be necessary to develop (parts of) software packages;
- The method seems to be fairly insensitive to how informative the priors are, although this might change if we were to introduce some process noise;
- The accuracy (in terms of MSE) of the inference seems to scale with the measurement noise in a roughly linear way.

MCMC is fairly slow however, and so it would not be suitable for online parameter estimation. It would be more suited to occasional calibrations. This is not a problem for my project, however. Hence, during the rest of this project we will work with MCMC. 

## Results
We will experiment by expanding our physical and probabilistic models, and testing combinations of different models for generation and identification.

In [None]:
using Pkg
Pkg.activate(".");
Pkg.instantiate();
IJulia.clear_output();
using Turing, DifferentialEquations, Random, LinearAlgebra, StatsBase # Computational
using Measures, LaTeXStrings, StatsPlots # Formatting
default(label="");
Random.seed!(987654321); 

### Lumped-Element Model
#### Physical Model
We will start with our most basic lumped-element model: the temperature does not vary within a block, and radiation is ignored. This is the same model as we used in [our notebook for MCMC](Explore_MCMC.ipynb). Here, we will pay more attention to getting realistic system parameter values. The necessary packages have already been loaded in the cell at the start of this section. 

With the following functions, we can then define the input heats and the system of ODEs that govern the basic lumped-element model:

In [None]:
# Input heat
Φ(B, ω, t) = B * sin(ω * t) # Julia allows you to define simple functions by assignment

# System of ODEs governing our basic lumped-element model
function LSSM_lump(dT, T, p, t)
    mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, ω_1, ω_2, ω_3, T_a, k12, k23, h_a = p
    # Conduction
    dT[1] = k12 * (T[2] - T[1]) / mcp_1
    dT[2] = (k12 * (T[1] - T[2]) + k23 * (T[3] - T[2])) / mcp_2
    dT[3] = k23 * (T[2] - T[3]) / mcp_3
    # Convection
    dT[1] += h_a * A_1 * (T_a - T[1]) / mcp_1
    dT[2] += h_a * A_2 * (T_a - T[2]) / mcp_2
    dT[3] += h_a * A_3 * (T_a - T[3]) / mcp_3
    # Input
    dT[1] += Φ(B_1, ω_1, t) / mcp_1
    dT[2] += Φ(B_2, ω_2, t) / mcp_2
    dT[3] += Φ(B_3, ω_3, t) / mcp_3
end

#### Simulate Data
Next, we will simulate data to perform inference on. First, we will define the constants. Of these, only $k_{12}$, $k_{23}$, and $h_a$ are unknown: the ambient temperature $T_a$ and the initial temperatures $\vec{T}_0$ are assumed to be fully known (although it would not be very difficult to treat them as unknowns). 

In [None]:
# Time horizon
sample_size = 100
Δ = 5 # Final time t = 5 x 10^2 s
time = [Δ * i for i in 0:sample_size] 
# Constants of blocks
true_mcp_1 = 1e3
true_mcp_2 = 1.5e3
true_mcp_3 = 0.8e3
true_A_1 = 1e-2
true_A_2 = 1.5e-2
true_A_3 = 2e-2
# Input heat parameters
true_B_1 = 3e1
true_B_2 = -3e1
true_B_3 = 1.5e1
true_ω_1 = 3e-2
true_ω_2 = 2.4e-2
true_ω_3 = 3.8e-2
# Temperatures
true_T_a = 290.
T_0 = [330., 270., 310.]
# Unknown constants
true_k12 = 3e0
true_k23 = 2e0
true_h_a = 1e1

With these parameters, we can now generate our observations. The measurement noise is quite small, with a standard deviation of roughly $10^{-2}~\kelvin$:

In [None]:
# Plot observations over true evolution
function observ_plot(solution, data, horizon)
    p_obs = plot(horizon, solution; legend = true, xlim = (horizon[1], horizon[end]), ylim = (260, 340), linecolors = ["red" "blue" "orange"], labels = [L"$T_1$ true" L"$T_2$ true" L"$T_3$ true"], xlabel = L"t~(\textrm{s})", ylabel = L"T~(\textrm{K})", size = (1200, 400), bottommargin = 6mm, leftmargin = 6mm)
    scatter!(p_obs, horizon, data', markercolors = ["red" "blue" "orange"], labels = [L"$T_1$ observed" L"$T_2$ observed" L"$T_3$ observed"]) # Use the `horizon` argument rather than the global `time`
    return p_obs
end

In [None]:
true_p = [true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, true_T_a, true_k12, true_k23, true_h_a] # First known, then unknown parameters
# Solve the system numerically using DifferentialEquations.jl
LSSM_dynamics_lump = ODEProblem(LSSM_lump, T_0, (time[1], time[end]), true_p)
true_sol = solve(LSSM_dynamics_lump, Tsit5(); saveat = Δ, verbose = false)
true_σ = 1e-2
y = Array(true_sol) + true_σ * randn(size(Array(true_sol)))
plot_obs = observ_plot(Array(true_sol)', y, time)
# savefig(plot_obs, "Results\\Expand\\lump_obs.pdf")

This seems like an ideal problem for a heat map. We can imagine the 1D thermal setup laid out along the vertical axis, evolving over time along the horizontal axis. 

In [None]:
# Show labels at 6 evenly spaced points (5 intervals) on the time axis
function heatmap_time(horizon, time_step)
    steps = 5
    N = length(horizon)
    axis = Vector{String}(undef, N)
    axis .= "" 
    step = div(N, steps)
    for i ∈ 0:steps
        axis[i * step + 1] = string(i * step * time_step)
    end
    return axis
end

# If there are few slices, show label for each slice.
function heatmap_few_temps(slice_count)
    axis = string.(1:slice_count)
    return axis
end

# If there are many slices, only show labels for 9 slices
function heatmap_many_temps(slice_count)
    steps = 9
    axis = Vector{String}(undef, slice_count)
    axis .= "" 
    step = div(slice_count, 2 * steps)
    for i ∈ 1:steps
        axis[(2 * i - 1) * step] = string((2 * i - 1) * step)
    end
    return axis
end

# Create heatmap of true temperature evolution
function state_heatmap(solution, horizon, time_step)
    time_axis = heatmap_time(horizon, time_step)
    time_grid_end = length(time_axis)
    slice_count = size(solution)[1]
    if slice_count < 24
        block_axis = heatmap_few_temps(slice_count)
    else
        block_axis = heatmap_many_temps(slice_count)
    end
    block_grid_end = length(block_axis)
    hmap = heatmap(solution, xticks = (1:time_grid_end, time_axis), yticks = (1:block_grid_end, block_axis), xlabel = L"t~(\textrm{s})", ylabel = "Slice")
    return hmap
end

In [None]:
hmap = state_heatmap(Array(true_sol), time, Δ)
# savefig(hmap, "Results\\Expand\\lump_obs_heatmap.pdf")

#### Probabilistic Model
Next, we define the probabilistic model, i.e. the model that is used for the identification. The parameters we wish to identify are $h_a$, $k_{12}$, and $k_{23}$. It would not make sense for any of these to be negative, since this would lead to heat flowing from cold sources into hot sinks, in contravention of the [Second Law of Thermodynamics](https://en.wikipedia.org/wiki/Laws_of_thermodynamics#Second_law) Carter (2000), Ch. 6. Hence, we make use of Gamma priors. Finally, we treat the standard deviation of the measurement noise as an unknown quantity. This might help by absorbing any other sources of (process) noise. It would again be sensible to use a positively supported prior like the Gamma distribution for this. We pose reasonably broad priors (this does not seem to be very important in this case: it seems like there is a lot of learning going on).
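
When choosing Gamma priors it helps to convert between the moments we have in mind and the parameters the distribution expects. The sketch below does this for the shape-scale parametrisation $\GammaDist(\alpha, \theta)$, which has mean $\alpha \theta$ and variance $\alpha \theta^2$. The helper names `gamma_shape_scale` and `gamma_moments` are hypothetical and not part of any package.

```julia
# Shape-scale Gamma(α, θ) has mean αθ and variance αθ², so
# α = mean² / variance and θ = variance / mean.
function gamma_shape_scale(mean, variance)
    α = mean^2 / variance
    θ = variance / mean
    return α, θ
end

# Inverse direction: moments from shape and scale
gamma_moments(α, θ) = (mean = α * θ, variance = α * θ^2)

# A prior with mean 10 and variance 10 corresponds to shape 10 and scale 1
println(gamma_shape_scale(10.0, 10.0))
# A tighter prior with mean 10 and standard deviation 1
println(gamma_shape_scale(10.0, 1.0^2))
```

One caveat worth keeping in mind: a Gamma distribution with shape below 1 has a density that is unbounded near zero, so a prior with a small mean and comparatively large variance concentrates a lot of mass at very small values.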

In [None]:
# Tell Turing our probabilistic model is based on the lumped-element model, with normally distributed noise
@model function fit_LSSM_lump(data, system, syst_consts)
    # Gamma in Distributions.jl is shape-scale, not shape-rate
    σ ~ Gamma(1e-2, 1e0) # E[σ] = 10^-2, Var(σ) = 10^-2 
    k12 ~ Gamma(1e0, 1e0) # E[k] = 10^0, Var(k) = 10^0
    k23 ~ Gamma(1e0, 1e0) 
    h_a ~ Gamma(1e1, 1e0) # E[h_a] = 10^1, Var(h_a) = 10^1
    T_a, mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, ω_1, ω_2, ω_3, Δ = syst_consts
    p = [mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, ω_1, ω_2, ω_3, T_a, k12, k23, h_a]
    predicted = solve(system, Tsit5(); p = p, saveat = Δ, verbose = false)

    # `predicted.u[i]` is the predicted state vector at the i-th saved time
    for i ∈ 1:length(predicted.u)
        data[:, i] ~ MvNormal(predicted.u[i], σ^2 * I)
    end
end

With Turing.jl and DifferentialEquations.jl, defining the probabilistic model is very easy. First, we pose the priors. With MCMC, we do not have to worry about conjugate priors. In general, as long as the support of the prior is infinite, semi-infinite, or an interval, as is the case for most nice continuous random variables, Turing.jl is able to work (under the hood, it automatically performs a transformation to make the support infinite, which is required for HMC (derived) methods). We subsequently take in all of the known parameters. Next, we determine how, given a certain sample of unknown parameters, the temperatures would evolve. Finally, we give the corresponding distribution of each observation, in this case a normal distribution with mean equal to the temperature predicted based on the current sample and variance $\sigma^2$, where $\sigma$ is itself one of the sampled parameters. The likelihood (for that sample of unknown parameters) will then be the product of all these distributions, i.e. the joint distribution (since the noise of each measurement is independent).
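
Written out by hand, the log of this product of normal densities is a sum over all observations. The function below is a minimal sketch of that computation for illustration only (Turing.jl does this bookkeeping for us); the name `gaussian_loglikelihood` and the toy matrices are hypothetical.

```julia
# Log-likelihood of all observations for one sample of the unknown parameters,
# assuming independent Gaussian measurement noise with standard deviation σ.
# `data` and `predicted` are matrices with one column per observation time.
function gaussian_loglikelihood(data, predicted, σ)
    n = length(data)
    residual = sum(abs2, data .- predicted)   # sum of squared prediction errors
    return -0.5 * n * log(2π * σ^2) - residual / (2 * σ^2)
end

predicted = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # toy prediction: 3 blocks, 2 observation times
perfect = gaussian_loglikelihood(predicted, predicted, 1.0)
shifted = gaussian_loglikelihood(predicted .+ 1.0, predicted, 1.0)

println("log-likelihood with zero residual: ", perfect)
println("log-likelihood with every entry off by 1: ", shifted)
```

The second term penalises parameter samples whose predicted trajectories lie far from the data, while the first term penalises large values of $\sigma$; together they determine which samples the sampler favours.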

#### Inference
Finally, we can perform the sampling. We make use of the [No-U-Turn Sampler (NUTS)](https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo#No_U-Turn_Sampler) in [Turing.jl](https://github.com/TuringLang/Turing.jl), which can be seen as building on Hamiltonian Monte Carlo (HMC). One advantage of this sampler is that we only have to choose the target acceptance rate: it will automatically tune a number of internal parameters (e.g. step size) to achieve this acceptance rate. A popular target acceptance rate is 0.65, since Beskos et al. in their 2010 paper ["Optimal Tuning of the Hybrid Monte-Carlo Algorithm"](https://arxiv.org/abs/1001.4460v1) showed that this is asymptotically the optimal choice in the sense of balancing the cost of proposal generation with the typical number of required proposals per sample. We take 2500 samples per chain, for which Turing automatically chooses a burn-in period of 1000 samples. To check whether we have reached equilibrium after the burn-in period, we sample three independent Markov chains: if the resulting empirical posteriors roughly correspond, we can be reasonably confident that we had reached equilibrium before taking the "real" samples.

In [None]:
model_lump = fit_LSSM_lump(y, LSSM_dynamics_lump, [true_T_a, true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, Δ]);
chain_lump = sample(model_lump, NUTS(0.65), MCMCSerial(), 2500, 3; verbose = false, progress = true)

In order to plot the results, we sample from the Markov chain. For each set of sampled parameters, we generate a trajectory using our system of ODEs, which we overlay on the same plot. This is an alternative to trying to compute credible intervals directly using the posteriors. We only take 300 samples instead of making use of all samples in order to reduce the size of the resulting image. Too many elements in the figure will make it slow to generate, and even slow to scroll past.

In [None]:
# Plot sampled trajectories and observations over true evolution
function sampled_traj_plot(params_samples, solution, data, horizon, time_step, all_params, id_count, system, obs_slices = nothing)
    p_traj = plot(; legend = true, xlim = (horizon[1], horizon[end]), ylim = (260, 340), ylabel = L"T~(\textrm{K})", size = (1200, 400), bottommargin = 6mm, leftmargin = 6mm)
    params_cur = copy(all_params)
    for (i, params_row) ∈ enumerate(eachrow(params_samples))
        params_cur[(end - id_count):end] .= params_row[(end - id_count):end]
        traj_cur = solve(system, Tsit5(); p = params_cur, saveat = time_step)
        if size(Array(traj_cur))[1] == 3
            plot!(p_traj, traj_cur; alpha = 0.05, linecolors = ["red" "blue" "orange"], label = "")
        else
            plot!(p_traj, horizon, Array(traj_cur)[obs_slices, :]'; alpha = 0.05, linecolors = ["red" "blue" "orange"], label = "")
        end
    end
    if isnothing(obs_slices)
        plot!(p_traj, solution, linecolors = ["red" "blue" "orange"], linewidth = 1, labels = [L"T_1" L"T_2" L"T_3"]) 
    else 
        plot!(p_traj, horizon, Array(solution)[obs_slices, :]', linecolors = ["red" "blue" "orange"], linewidth = 1, labels = [L"T_1" L"T_2" L"T_3"]) 
    end
    scatter!(p_traj, horizon, data', xlabel = L"t~(\textrm{s})", markercolors = ["red" "blue" "orange"])
    return p_traj
end

In [None]:
posterior_samples = Array(sample(chain_lump, 300; replace = false))
plot_sol_app = sampled_traj_plot(posterior_samples, true_sol, y, time, Δ, true_p, 2, LSSM_dynamics_lump)
# savefig(plot_sol_app, "Results\\Expand\\lump_states.pdf")

We cannot see the sampled trajectories, because they lie so close to the true trajectories. Since the dynamics resulting from the samples are not far off from the real dynamics, the inference appears to have been highly successful.

Let's have a look at the (approximate) posteriors of the parameters. We will quantify the quality of the estimates by calculating the MSE of the posteriors with respect to the true value. The bias-variance decomposition makes this an easy computation.
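
For reference, the decomposition can be written out as follows. Here $\theta^*$ denotes the true value and $\theta$ is distributed according to the posterior; the symbol $\theta^*$ is introduced only for this note.
$$\MSE(\theta) \define \Expectation[(\theta - \theta^*)^2] = (\Expectation[\theta] - \theta^*)^2 + \Variance(\theta).$$
Only the posterior mean and variance are needed. In practice both are replaced by the sample mean and sample variance of the chain, so the reported number is an estimate of the MSE rather than its exact value.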

In [None]:
# Compute MSE of true value compared to posterior
get_mean(θ::Symbol, chain) = (summarize(chain[[θ]]).nt[:mean])[1]
get_var(θ::Symbol, chain) = (summarize(chain[[θ]]).nt[:std])[1]^2
MSE(θ::Symbol, true_θ, chain) = (get_mean(θ, chain) - true_θ)^2 + get_var(θ, chain)

# Format MSE as nice string for on plot
sci_not(value, sigdigits = 2) = replace("$(round(value, sigdigits = sigdigits))", r"e(-?\d+)" => s"\\times 10^{\1}") # Regex magic

function MSE_string(θ::Symbol, true_θ, chain)
    MSE_θ = MSE(θ, true_θ, chain)
    label = latexstring("\\textrm{MSE} = " * sci_not(MSE_θ))
    return label
end

# Plot marginal posterior
function marg_post_plot(θ::Symbol, true_θ, chain, ymax = nothing)
    if ymax === nothing
        p_θ = density(chain[[θ]], title = L"%$θ", label = "", legend = true)      
    else
        p_θ = density(chain[[θ]], title = L"%$θ", label = "", legend = true, ylim = (0, ymax))
    end
    vline!(p_θ, [true_θ], label = L"True $%$θ$")
    MSE_label = MSE_string(θ, true_θ, chain)
    annotate!(p_θ, (0.05, 0.95), (MSE_label, 12, :left))
    return p_θ
end

In [None]:
p_σ = marg_post_plot(:σ, true_σ, chain_lump, 600)
p_k12 = marg_post_plot(:k12, true_k12, chain_lump, 900)
p_k23 = marg_post_plot(:k23, true_k23, chain_lump, 600)
p_h_a = marg_post_plot(:h_a, true_h_a, chain_lump, 14)
plot_marg_post = plot(p_σ, p_k12, p_k23, p_h_a, size = (1200, 800), leftmargin = 8mm, bottommargin = 6mm)
# savefig(plot_marg_post, "Results\\Expand\\lump_params.pdf")

Indeed, the empirical posteriors of the unknown parameters look good: they peak close to the true values, and are rather narrow (width of $\sim 10^{-1}$ for convection, $\sim 10^{-3}$ for conduction). One should bear in mind that the observations are random due to the noise, so small deviations of the MAP from the true value do not necessarily indicate something has gone wrong. If we ran the tests again, we would see the MAPs shift slightly. 

On the other hand, the approximate posterior of the standard deviation of the measurement noise looks very far from the true value. This does appear to suggest a systematic problem with the inference. I believe this is due to the fact that our discretisation effectively introduces process noise. This process noise is indistinguishable from measurement noise, and simply gets added on top. However, this does not have to be a problem: we are not really interested in learning the measurement noise, and the fact that it can absorb process noise is a nice property.

In [None]:
corn = cornerplot(posterior_samples, label = [L"σ", L"k_{12}", L"k_{23}", L"h_a"], size = (1800, 1800), leftmargin = 8mm, bottommargin = 6mm)
# savefig(corn, "Results\\Expand\\lump_corner.pdf")

#### Validation
The previous experiment gives a promising result: it seems like the parameters identified using a simpler model can accurately reproduce the true physics. One issue, however, is that the perceived accuracy may be due to overfitting. In this section, we would like to put the parameter estimates combined with the simple lumped-element model to the test. We could do this in multiple ways:
1) We could change some of the known parameters;
2) We could simulate further time, i.e. we identify based on say the first $500~\second$, and compare the predicted and true trajectories over the subsequent $500~\second$;
3) We could identify on a subset of the data, say the first $100~\second$, and compare the predicted and true trajectories on the remaining data;
4) We could change the input heats and initial temperatures.

The first method does not seem practically applicable: in the true setup, these types of constants cannot be changed without disassembly, which should probably be followed by more calibration anyway. In our case, the second method does not sound like it would be very useful, either: the state of our setup naturally moves towards an equilibrium over time, and therefore, the errors would become smaller over time. This means that we would not be able to properly assess how good the state prediction is. To mitigate this issue, we could make use of the third approach. Finally, changing the input heats and initial temperatures seems like something that would be relevant in practice, and so the fourth method seems reasonable. We have decided to use the latter of these two viable methods, since it would seem to more drastically change the observed data.

In [None]:
# # Block input signal: works, but seems to make stuff unstable
# function Φ_val(B, t_start, t_end, t)
#     if t_start < t < t_end
#         return B
#     else 
#         return 0
#     end
# end

# Smoothed block input signal, might reduce numerical instabilities
sigmoid(x, a = 0, l = 1) = 1 / (1 + exp(-l * (x - a)))
Φ_val_smooth(B, t_start, t_end, t, l = 2e-1) = B * sigmoid(t, t_start, l) * sigmoid(-t, -t_end, l)

# System of ODEs to validate the lumped-element model for prediction
function LSSM_lump_val(dT, T, p, t)
    mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, t_start_1, t_start_2, t_start_3, t_end_1, t_end_2, t_end_3, T_a, k12, k23, h_a = p
    # Conduction
    dT[1] = k12 * (T[2] - T[1]) / mcp_1
    dT[2] = (k12 * (T[1] - T[2]) + k23 * (T[3] - T[2])) / mcp_2
    dT[3] = k23 * (T[2] - T[3]) / mcp_3
    # Convection
    dT[1] += h_a * A_1 * (T_a - T[1]) / mcp_1
    dT[2] += h_a * A_2 * (T_a - T[2]) / mcp_2
    dT[3] += h_a * A_3 * (T_a - T[3]) / mcp_3
    # Input
    dT[1] += Φ_val_smooth(B_1, t_start_1, t_end_1, t) / mcp_1
    dT[2] += Φ_val_smooth(B_2, t_start_2, t_end_2, t) / mcp_2
    dT[3] += Φ_val_smooth(B_3, t_start_3, t_end_3, t) / mcp_3
end
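
To get a feel for the smoothed block signal, the following minimal sketch evaluates it at a few time points. The two function definitions are repeated from the cell above so that the block runs on its own; the block height and switching times are hypothetical values chosen purely for illustration.

```julia
# Definitions repeated from the cell above so that this block is self-contained
sigmoid(x, a = 0, l = 1) = 1 / (1 + exp(-l * (x - a)))
Φ_val_smooth(B, t_start, t_end, t, l = 2e-1) = B * sigmoid(t, t_start, l) * sigmoid(-t, -t_end, l)

# Hypothetical block: height 10 W, switched on at t = 100 s and off at t = 200 s
B, t_on, t_off = 10.0, 100.0, 200.0

before = Φ_val_smooth(B, t_on, t_off, 0.0)    # long before the block: essentially zero
edge   = Φ_val_smooth(B, t_on, t_off, t_on)   # at the rising edge: about half the height
inside = Φ_val_smooth(B, t_on, t_off, 150.0)  # well inside the block: close to the full height
println((before, edge, inside))
```

With the default steepness `l = 2e-1`, each edge is smeared out over a few tens of seconds, so a block much narrower than that would not reach its nominal height.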

We now generate the validation data set, which consists of the trajectories for the true parameter values.

In [None]:
# Plot input signals used for validation
function val_input_plot(B_1, B_2, B_3, t_start_1, t_start_2, t_start_3, t_end_1, t_end_2, t_end_3, horizon)
    ymax = round(maximum([B_1, B_2, B_3]), sigdigits = 2, RoundUp)
    p_input = plot(horizon, Φ_val_smooth.(B_1, t_start_1, t_end_1, horizon), title = "Input", xlim = (horizon[1], horizon[end]), ylim = (0, ymax), label = L"Φ_1", linecolor = "red", xlabel = L"t~(\textrm{s})", ylabel = L"Φ~(\textrm{W})", legend = true)
    plot!(p_input, horizon, Φ_val_smooth.(B_2, t_start_2, t_end_2, horizon), label = L"Φ_2", linecolor = "blue")
    plot!(p_input, horizon, Φ_val_smooth.(B_3, t_start_3, t_end_3, horizon), label = L"Φ_3", linecolor = "orange")
    return p_input
end

In [None]:
# Randomly choose parameters of smoothed block waves for validation input signals
val_B_1, val_B_2, val_B_3 = rand(Gamma(2.5e1, 2e0), 3)
val_t_start_1, val_t_start_2, val_t_start_3 = rand(Gamma(5e1, 4e0), 3)
val_b_width_1, val_b_width_2, val_b_width_3 = rand(Gamma(2e1, 5e0), 3)
val_t_end_1, val_t_end_2, val_t_end_3 = val_t_start_1 + val_b_width_1, val_t_start_2 + val_b_width_2, val_t_start_3 + val_b_width_3
# Time horizon
sample_size_val = 1000
Δ_val = time[end] / sample_size_val
time_val = [Δ_val * i for i in 0:sample_size_val] 
val_input_plot(val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, time_val)
val_p = [true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, true_T_a, true_k12, true_k23, true_h_a] # First known, then unknown parameters
LSSM_dynamics_lump_val = ODEProblem(LSSM_lump_val, T_0, (time_val[1], time_val[end]), val_p)
val_sol = solve(LSSM_dynamics_lump_val, Tsit5(); saveat = Δ_val, verbose = false);

We then generate the trajectories for 300 sampled sets of parameters. To see how "good" these sampled trajectories are, we create a number of different diagrams, including a plot of the residuals, and histograms of the MSE between the true and the sampled trajectories. 

In [None]:
# Compute residuals of sampled trajectories compared to true trajectories
function compute_resid(traj, solution, obs_slices = nothing)
    if size(Array(solution))[1] == 3
        resid = Array(traj) - Array(solution)
    elseif size(Array(traj))[1] == 3 && size(Array(solution))[1] > 3
        resid = Array(traj) - Array(solution)[obs_slices, :]
    else
        resid = Array(traj)[obs_slices, :] - Array(solution)[obs_slices, :]
    end
    return resid
end

# Plot summaries of validation
function sampled_traj_val_plot(params_samples, solution, horizon, time_step, all_params, σ, id_count, system, obs_slices = nothing)
    p_traj = plot(; legend = true, xlim = (horizon[1], horizon[end]), title = "Trajectories", ylim = (260, 340), ylabel = L"T~(\textrm{K})", bottommargin = 6mm, leftmargin = 6mm)
    p_error = plot(; legend = true, xlim = (horizon[1], horizon[end]), title = "Residual", xlabel = L"t~(\textrm{s})", ylabel = L"ΔT~(\textrm{K})", labels = [L"ΔT_1" L"ΔT_2" L"ΔT_3"], bottommargin = 6mm, leftmargin = 6mm)
    emp_mse = Matrix{Float64}(undef, 3, size(params_samples)[1])
    params_cur = copy(all_params)
    for (i, params_row) ∈ enumerate(eachrow(params_samples))
        params_cur[(end - id_count):end] .= params_row[(end - id_count):end]
        traj_cur = solve(system, Tsit5(); p = params_cur, saveat = time_step)
        if size(Array(traj_cur))[1] == 3
            plot!(p_traj, traj_cur; alpha = 0.05, linecolors = ["red" "blue" "orange"], label = "")
        else
            plot!(p_traj, horizon, Array(traj_cur)[obs_slices, :]'; alpha = 0.05, linecolors = ["red" "blue" "orange"], label = "")
        end
        resid_cur = compute_resid(traj_cur, solution, obs_slices)
        emp_mse[:, i] = mean(abs2, resid_cur, dims = 2)
        plot!(p_error, horizon, resid_cur'; alpha = 0.05, linecolors = ["red" "blue" "orange"], label = "")
    end
    hline!(p_error, [(2 * σ) -(2 * σ)], linestyles = [:dash :dash], linecolors = ["black" "black"], labels = [L"\pm 2 \sigma" ""])
    if size(Array(solution))[1] == 3
        plot!(p_traj, solution, linecolors = ["red" "blue" "orange"], linewidth = 1, labels = [L"T_1" L"T_2" L"T_3"]) 
    else 
        plot!(p_traj, horizon, Array(solution)[obs_slices, :]', linecolors = ["red" "blue" "orange"], linewidth = 1, xlabel = L"t~(\textrm{s})", labels = [L"T_1" L"T_2" L"T_3"]) 
    end
    h_mse_1 = histogram(emp_mse[1, :], normalize = :pdf, title = L"\textrm{MSE}~T_1", xlabel = L"\textrm{MSE}~(\textrm{K}^2)", ylabel = "Density", color = "red")
    h_mse_2 = histogram(emp_mse[2, :], normalize = :pdf, title = L"\textrm{MSE}~T_2", xlabel = L"\textrm{MSE}~(\textrm{K}^2)", ylabel = "Density", color = "blue")
    h_mse_3 = histogram(emp_mse[3, :], normalize = :pdf, title = L"\textrm{MSE}~T_3", xlabel = L"\textrm{MSE}~(\textrm{K}^2)", ylabel = "Density", color = "orange")
    return p_traj, p_error, h_mse_1, h_mse_2, h_mse_3
end

In [None]:
p_in = val_input_plot(val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, time_val)
p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3 = sampled_traj_val_plot(posterior_samples, val_sol, time_val, Δ_val, val_p, true_σ, 2, LSSM_dynamics_lump_val)
plot_val = plot(p_in, p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3, size = (1800, 800), layout = (2, 3), leftmargin = 10mm, bottommargin = 10mm)
# savefig(plot_val, "Results\\Validation\\lump_val.pdf")

The top left diagram shows the used input signals; the start, end, and height of the blocks have been chosen randomly. Next, we have plotted the corresponding true trajectories, with the sampled trajectories overlaid. The top right diagram shows the residuals, i.e. the difference between each sampled trajectory and the true trajectory. For small $t$, the residuals grow: this is due to the fact that the initial temperatures are treated as known constants. Eventually, the residuals tend to stabilise, with the difference remaining less than about $10^{-2}~\kelvin$, which is good considering the typical temperatures are $10^2~\kelvin$ and the measurement accuracy is also about $10^{-2}~\kelvin$. The histograms on the bottom row show the MSE for each temperature. Each data point represents the MSE over an entire sampled trajectory. If the mean error were normally distributed, which does not sound unreasonable, then we would expect the MSE to follow a (scaled) $\chi^2$-distribution. This indeed appears to be the case looking at the histograms.
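
One way to make the $\chi^2$ heuristic more precise is the following rough sketch, under assumptions that have not been verified here. Suppose a sampled trajectory differs from the true one through a single scalar parameter deviation $z \haslaw \NormDist(0, \tau^2)$, and that to first order the residual at the time points $t_1, \ldots, t_N$ is $r(t_n) \approx z \, g(t_n)$ for some fixed sensitivity $g$. The symbols $z$, $\tau$, $g$, and $N$ are introduced only for this illustration. Then
$$\MSE = \frac{1}{N} \sum_{n = 1}^N r(t_n)^2 \approx \left(\frac{1}{N} \sum_{n = 1}^N g(t_n)^2\right) z^2, \textrm{ with } \frac{z^2}{\tau^2} \haslaw \chi^2_1,$$
so the MSE would be a scaled $\chi^2_1$ variable. With several jointly Gaussian parameter deviations, the same argument would give a weighted sum of independent $\chi^2_1$ variables instead.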

### Radiative Model
When deriving our physical model, we assumed that we could neglect the influence of thermal radiation. That might not be reasonable, however: the heat transfer due to radiation will be on the order of $1~\watt$, compared to $10~\watt$ and $100~\watt$ for the conduction and the convection, respectively, or roughly 1% of the total heat transfer. In the following section, we also take radiation into account.
#### Physical Model
According to the [Stefan-Boltzmann Law](https://en.wikipedia.org/wiki/Stefan%E2%80%93Boltzmann_law), the heat flux into a block (i.e. the additive inverse of the heat flux out of a block) will be 
$$q = -\varepsilon \sigma (T^4 - T_a^4),$$
where $\sigma = 5.67 \times 10^{-8}~\watt~\metre^{-2}~\kelvin^{-4}$ is the Stefan-Boltzmann constant (Bouwens (2013), p. 8), and $\varepsilon \in [0, 1]$ is the so-called _emissivity_, which depends on e.g. how smooth the surface is. For instance, for polished metals the emissivity can be as low as $10^{-2}$, while for anodised aluminium it can be on the order of unity. 

Radiation would be an interesting effect to additionally model:
- The emissivity depends heavily on e.g. how clean the surface is, and hence probably is not known very well;
- Radiation would be the only term in our physical model that is nonlinear in the temperature. Consequently, we would no longer be dealing with a _linear_ state space model. However, Turing.jl + DifferentialEquations.jl should be able to deal with it;
- The emissivity $\varepsilon \in [0, 1]$, and consequently it makes sense to pose a Uniform (or more generally Beta) prior. This is something that would probably get messy if we were trying to deal with this problem analytically or using VB, but for MCMC it should not matter (HMC/NUTS actually requires parameters to have unbounded support, but for nice prior distributions this can easily be achieved with a bijector, i.e. a bijective transform. Turing does this automatically for us).
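
To illustrate what such a bijective transform does, here is a minimal hand-written sketch using the logit map for a parameter with a Uniform prior on $(0, 1)$. The function names are hypothetical and this is not meant to reflect how Turing implements the transform internally; it only shows the change of variables that makes the support unbounded.

```julia
# Illustrative logit/logistic pair (hypothetical names, not Turing's implementation)
logit(ε) = log(ε / (1 - ε))        # maps (0, 1) onto the real line
logistic(z) = 1 / (1 + exp(-z))    # inverse map, real line onto (0, 1)

# Log-density of z = logit(ε) when ε ~ Uniform(0, 1):
# log p_z(z) = log p_ε(ε) + log|dε/dz|, with p_ε(ε) = 1 and dε/dz = ε (1 - ε)
function logpdf_unconstrained(z)
    ε = logistic(z)
    return log(ε) + log(1 - ε)
end

ε = 0.85
z = logit(ε)                       # unconstrained coordinate the sampler would work in
println((z, logistic(z), logpdf_unconstrained(0.0)))
```

The sampler can move freely in $z$, and every value of $z$ maps back to a valid emissivity; the Jacobian term keeps the implied prior on $\varepsilon$ unchanged.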

#### Simulate Data
We can easily add the term corresponding to the influence of thermal radiation to the physical model ODEs.

In [None]:
# Emissivities
true_ε1 = 0.85
true_ε2 = 0.89
true_ε3 = 0.92

# System of ODEs governing our radiative model
function RSSM_lump(dT, T, p, t)
    mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, ω_1, ω_2, ω_3, T_a, k12, k23, h_a, ε1, ε2, ε3 = p
    σ_sb = 5.67e-8 # Stefan-Boltzmann constant
    # Conduction
    dT[1] = k12 * (T[2] - T[1]) / mcp_1
    dT[2] = (k12 * (T[1] - T[2]) + k23 * (T[3] - T[2])) / mcp_2
    dT[3] = k23 * (T[2] - T[3]) / mcp_3
    # Convection
    dT[1] += h_a * A_1 * (T_a - T[1]) / mcp_1
    dT[2] += h_a * A_2 * (T_a - T[2]) / mcp_2
    dT[3] += h_a * A_3 * (T_a - T[3]) / mcp_3
    # Radiation
    dT[1] -= A_1 * ε1 * σ_sb * (T[1]^4 - T_a^4) / mcp_1
    dT[2] -= A_2 * ε2 * σ_sb * (T[2]^4 - T_a^4) / mcp_2
    dT[3] -= A_3 * ε3 * σ_sb * (T[3]^4 - T_a^4) / mcp_3    
    # Input
    dT[1] += Φ(B_1, ω_1, t) / mcp_1
    dT[2] += Φ(B_2, ω_2, t) / mcp_2
    dT[3] += Φ(B_3, ω_3, t) / mcp_3
end

To get an idea of the effect of radiation, we visualise the true temperature evolution with the observations overlaid.

In [None]:
true_p = [true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, true_T_a, true_k12, true_k23, true_h_a, true_ε1, true_ε2, true_ε3] # First known, then unknown parameters
RSSM_dynamics_lump = ODEProblem(RSSM_lump, T_0, (time[1], time[end]), true_p)
true_sol = solve(RSSM_dynamics_lump, Tsit5(); saveat = Δ, verbose = false)
true_σ = 1e-2
y = Array(true_sol) + true_σ * randn(size(Array(true_sol)))
plot_obs = observ_plot(Array(true_sol)', y, time)
# savefig(plot_obs, "Results\\Expand\\rad_rad_obs.pdf")

In [None]:
hmap = state_heatmap(Array(true_sol), time, Δ)
# savefig(hmap, "Results\\Expand\\rad_rad_obs_heatmap.pdf")

The influence of radiation appears to be limited: the trajectories do not look very different to those generated by the lumped-element model. This makes sense since we expect the power associated with radiation to be one and two orders of magnitude smaller than those associated with the conduction and convection, respectively. It will be interesting to see whether the inference will be able to "pick out" the contribution of the radiation.

#### Probabilistic Model
It is also straightforward to adapt the probabilistic model to take into account the unknown emissivities: we simply pose Beta priors, which have support $[0, 1]$. Typically, we would have a reasonable idea of the emissivity. For instance, for anodised aluminium, we know it is about $0.85$, maybe $\pm 0.1$. Hence, we choose hyperparameters for our Beta priors such that they have as mean $0.85$ and as standard deviation $0.1$. We determine these hyperparameters using the Method of Moments: we know that if $X \haslaw \BetaDist(\alpha, \beta)$, then (Berkum (2016)):
$$\Expectation[X] = \frac{\alpha}{\alpha + \beta}, \textrm{ and } \Variance(X) = \frac{\alpha \beta}{(\alpha + \beta + 1) (\alpha + \beta)^2} = \Expectation[X] \cdot \frac{\beta}{(\alpha + \beta + 1) (\alpha + \beta)}.$$
If we solve these equations for $\alpha$ and $\beta$, we find
$$\alpha = \left(\frac{\Expectation[X] (1 - \Expectation[X])}{\Variance(X)} - 1\right) \Expectation[X], \textrm{ and } \beta = \left(\frac{\Expectation[X] (1 - \Expectation[X])}{\Variance(X)} - 1\right) (1 - \Expectation[X]).$$
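
As a quick sanity check of these formulas, the sketch below computes the hyperparameters for an example mean of $0.85$ and standard deviation of $0.1$, and then plugs them back into the moment formulas. The helper names are hypothetical and chosen for this illustration only; the block uses no packages.

```julia
# Hypothetical helper: Beta hyperparameters from a target mean m and standard deviation s
function beta_from_moments(m, s)
    fac = m * (1 - m) / s^2 - 1    # must be positive, i.e. we need s^2 < m (1 - m)
    return fac * m, fac * (1 - m)
end

# Mean and variance of Beta(α, β), as in the formulas above
beta_mean(α, β) = α / (α + β)
beta_var(α, β) = α * β / ((α + β + 1) * (α + β)^2)

α, β = beta_from_moments(0.85, 0.1)
println((α, β, beta_mean(α, β), sqrt(beta_var(α, β))))
```

Note that the method of moments only yields valid (positive) hyperparameters when the requested variance is smaller than $\Expectation[X] (1 - \Expectation[X])$.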

In [None]:
# Calculate Beta distribution hyperparameters to get given mean and standard deviation
function calc_beta_params(mean, std)
    var = std^2
    fac = (mean * (1 - mean) / var - 1)
    α = fac * mean
    β = fac * (1 - mean)
    return α, β
end

# Tell Turing our probabilistic model is based on the radiative model, with normally distributed noise
@model function fit_RSSM_lump(data, system, syst_consts)
    # Gamma in Distributions.jl is shape-scale, not shape-rate
    σ ~ Gamma(1e-2, 1e0) # E[σ] = 10^-2, Var(σ) = 10^-2 
    k12 ~ Gamma(1e0, 1e0) # E[k] = 10^0, Var(k) = 10^0
    k23 ~ Gamma(1e0, 1e0) 
    h_a ~ Gamma(1e1, 1e0) # E[h_a] = 10^1, Var(h_a) = 10^1
    prior_α, prior_β = calc_beta_params(0.85, 0.1) # Compute reasonable emissivity prior hyperparameters
    ε1 ~ Beta(prior_α, prior_β)
    ε2 ~ Beta(prior_α, prior_β)
    ε3 ~ Beta(prior_α, prior_β)
    T_a, mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, ω_1, ω_2, ω_3, Δ = syst_consts
    p = [mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, ω_1, ω_2, ω_3, T_a, k12, k23, h_a, ε1, ε2, ε3]
    predicted = solve(system, Tsit5(); p = p, saveat = Δ, verbose = false)

    for i ∈ 1:length(predicted)
        data[:, i] ~ MvNormal(predicted[i], σ^2 * I)
    end
end

#### Inference
Finally, we can perform the sampling...

In [None]:
model_rad_rad = fit_RSSM_lump(y, RSSM_dynamics_lump, [true_T_a, true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, Δ]);
chain_rad_rad = sample(model_rad_rad, NUTS(0.65), MCMCSerial(), 2500, 3; verbose = false, progress = true)

... and plot the results:

In [None]:
posterior_samples = Array(sample(chain_rad_rad, 300; replace = false))
plot_sol_app = sampled_traj_plot(posterior_samples, true_sol, y, time, Δ, true_p, 5, RSSM_dynamics_lump)
# savefig(plot_sol_app, "Results\\Expand\\rad_rad_states.pdf")

The inference appears to have been highly successful once again: the sampled trajectories are again too close to the true trajectories to see them. 

We again look at the (approximate) posteriors of the parameters. I did not want 7 plots, so I have not plotted the posterior of the measurement noise standard deviation. It is similar to the one we saw before, suggesting that the radiation term does not cause much more process noise.

In [None]:
p_k12 = marg_post_plot(:k12, true_k12, chain_rad_rad, 300)
p_k23 = marg_post_plot(:k23, true_k23, chain_rad_rad, 200)
p_h_a = marg_post_plot(:h_a, true_h_a, chain_rad_rad, 1.5)
prior_α, prior_β = calc_beta_params(0.85, 0.1)
p_ε1 = marg_post_plot(:ε1, true_ε1, chain_rad_rad, 8)
plot!(p_ε1, Beta(prior_α, prior_β), label = "Prior", linestyle = :dash, color = "black")
p_ε2 = marg_post_plot(:ε2, true_ε2, chain_rad_rad, 8)
plot!(p_ε2, Beta(prior_α, prior_β), label = "Prior", linestyle = :dash, color = "black")
p_ε3 = marg_post_plot(:ε3, true_ε3, chain_rad_rad, 8)
plot!(p_ε3, Beta(prior_α, prior_β), label = "Prior", linestyle = :dash, color = "black")
plot_marg_post = plot(p_k12, p_k23, p_h_a, p_ε1, p_ε2, p_ε3, size = (1800, 800), leftmargin = 8mm, bottommargin = 6mm)
# savefig(plot_marg_post, "Results\\Expand\\rad_rad_params.pdf")

The posteriors of all the parameters are very good: peaking near the true value, and quite narrow. Notably, however, the posteriors of the parameters $h_a$, $k_{12}$, and $k_{23}$ are a bit wider than before. Maybe unidentifiability plays a role in this? The posteriors of the emissivities are a little bit taller than the priors, but it seems like we were not able to learn a lot about them from these measurements.

In [None]:
corn = cornerplot(posterior_samples[:, 2:end], label = [L"k_{12}", L"k_{23}", L"h_a", L"ε_1", L"ε_2", L"ε_3"], size = (2400, 2400), leftmargin = 8mm, bottommargin = 6mm)
# savefig(corn, "Results\\Expand\\rad_rad_corner.pdf")

#### Validation
We perform a similar validation to before. 

In [None]:
# System of ODEs to validate the radiative model for prediction
function RSSM_lump_val(dT, T, p, t)
    mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, t_start_1, t_start_2, t_start_3, t_end_1, t_end_2, t_end_3, T_a, k12, k23, h_a, ε1, ε2, ε3 = p
    σ_sb = 5.67e-8 # Stefan-Boltzmann constant
    # Conduction
    dT[1] = k12 * (T[2] - T[1]) / mcp_1
    dT[2] = (k12 * (T[1] - T[2]) + k23 * (T[3] - T[2])) / mcp_2
    dT[3] = k23 * (T[2] - T[3]) / mcp_3
    # Convection
    dT[1] += h_a * A_1 * (T_a - T[1]) / mcp_1
    dT[2] += h_a * A_2 * (T_a - T[2]) / mcp_2
    dT[3] += h_a * A_3 * (T_a - T[3]) / mcp_3
    # Radiation
    dT[1] -= A_1 * ε1 * σ_sb * (T[1]^4 - T_a^4) / mcp_1
    dT[2] -= A_2 * ε2 * σ_sb * (T[2]^4 - T_a^4) / mcp_2
    dT[3] -= A_3 * ε3 * σ_sb * (T[3]^4 - T_a^4) / mcp_3    
    # Input
    dT[1] += Φ_val_smooth(B_1, t_start_1, t_end_1, t) / mcp_1
    dT[2] += Φ_val_smooth(B_2, t_start_2, t_end_2, t) / mcp_2
    dT[3] += Φ_val_smooth(B_3, t_start_3, t_end_3, t) / mcp_3
end

In [None]:
val_p = [true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, true_T_a, true_k12, true_k23, true_h_a, true_ε1, true_ε2, true_ε3] 
RSSM_dynamics_lump_val = ODEProblem(RSSM_lump_val, T_0, (time_val[1], time_val[end]), val_p)
val_sol = solve(RSSM_dynamics_lump_val, Tsit5(); saveat = Δ_val, verbose = false);
p_in = val_input_plot(val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, time_val)
p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3 = sampled_traj_val_plot(posterior_samples, val_sol, time_val, Δ_val, val_p, true_σ, 5, RSSM_dynamics_lump_val)
plot_val = plot(p_in, p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3, size = (1800, 800), layout = (2, 3), leftmargin = 10mm, bottommargin = 10mm)
# savefig(plot_val, "Results\\Validation\\rad_rad_val.pdf")

The results of this validation are similar to the previous one. This makes sense, since we are still using the same model to generate the data as to identify the parameters. The residuals are somewhat larger, which makes sense since our parameter posteriors are wider. For some reason, the residuals start to diverge from $t \approx 300~\second$: we don't know the cause of this.

#### Inference: Probabilistic Model ≠ Physical Model
It would be interesting to know whether neglecting the radiation in the model we use for identification has a significant impact on the quality of the inference, because it can give us some intuition about whether we need to worry about small contributions that we have ignored/have not thought of. Obviously in this way we will not get estimates for the emissivities.

In [None]:
model_rad_nonrad = fit_LSSM_lump(y, LSSM_dynamics_lump, [true_T_a, true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, Δ]);
chain_rad_nonrad = sample(model_rad_nonrad, NUTS(0.65), MCMCSerial(), 2500, 3; verbose = false, progress = true)

In [None]:
true_p = [true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, true_T_a, true_k12, true_k23, true_h_a]
posterior_samples = Array(sample(chain_rad_nonrad, 300; replace = false))
plot_sol_app = sampled_traj_plot(posterior_samples, true_sol, y, time, Δ, true_p, 2, LSSM_dynamics_lump)
# savefig(plot_sol_app, "Results\\Expand\\rad_nonrad_states.pdf")

Despite the fact that the generative and identification models differ, the state estimates still appear to be very good. 

In [None]:
p_σ = marg_post_plot(:σ, true_σ, chain_rad_nonrad, 600)
p_k12 = marg_post_plot(:k12, true_k12, chain_rad_nonrad, 900)
p_k23 = marg_post_plot(:k23, true_k23, chain_rad_nonrad, 600)
p_h_a = marg_post_plot(:h_a, true_h_a, chain_rad_nonrad, 14)
plot_marg_post = plot(p_σ, p_k12, p_k23, p_h_a, size = (1200, 800), leftmargin = 8mm, bottommargin = 6mm)
# savefig(plot_marg_post, "Results\\Expand\\rad_nonrad_params.pdf")

On the other hand, we see that the MAP estimate of each parameter is far from the true value. The estimate of the convection coefficient is particularly bad, which I think makes sense because the influence of radiation is most similar to that of convection: this means that convection takes over the role of radiation. Notably, the MAP estimate of the measurement noise standard deviation is about one order of magnitude greater than before. This is caused by the fact that the influences that we do not model behave somewhat like process noise. 
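
The similarity between radiation and convection can be made explicit with a first-order expansion around the ambient temperature. This is a sketch that assumes $\lvert T - T_a \rvert \ll T_a$; here $\sigma$ is the Stefan-Boltzmann constant (not the noise standard deviation), and $h_{\textrm{eff}}$ is a symbol introduced for this note only.
$$T^4 - T_a^4 = 4 T_a^3 (T - T_a) + \bigO\left((T - T_a)^2\right), \textrm{ so } -\varepsilon \sigma (T^4 - T_a^4) \approx 4 \varepsilon \sigma T_a^3 (T_a - T).$$
This has the same form as the convective flux $h_a (T_a - T)$. A model without a radiation term can therefore compensate, to first order, through an effective coefficient $h_{\textrm{eff}} \approx h_a + 4 \varepsilon \sigma T_a^3$, which would push the estimate of $h_a$ away from its true value.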

In [None]:
corn = cornerplot(posterior_samples, label = [L"σ", L"k_{12}", L"k_{23}", L"h_a"], size = (1800, 1800), leftmargin = 8mm, bottommargin = 6mm)
# savefig(corn, "Results\\Expand\\rad_nonrad_corner.pdf")

In [None]:
val_p = [true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, true_T_a, true_k12, true_k23, true_h_a, true_ε1, true_ε2, true_ε3] 
RSSM_dynamics_lump_val = ODEProblem(RSSM_lump_val, T_0, (time_val[1], time_val[end]), val_p)
val_sol = solve(RSSM_dynamics_lump_val, Tsit5(); saveat = Δ_val, verbose = false);
val_p = [true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, true_T_a, true_k12, true_k23, true_h_a] 
p_in = val_input_plot(val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, time_val)
p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3 = sampled_traj_val_plot(posterior_samples, val_sol, time_val, Δ_val, val_p, true_σ, 2, LSSM_dynamics_lump_val)
plot_val = plot(p_in, p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3, size = (1800, 800), layout = (2, 3), leftmargin = 10mm, bottommargin = 10mm)
# savefig(plot_val, "Results\\Validation\\rad_nonrad_val.pdf")

### Sliced Blocks Model
In our [exploration of MCMC](Explore_MCMC.ipynb), as well as the previous examples, we saw that MCMC works well when we use the simple lumped-element model (with blocks with internally constant temperature) for both the physical and the probabilistic model. To derive our governing equations, we implicitly assumed that conduction within blocks occurs at a much higher rate than any other heat transfer process, so that each block has a single temperature throughout. This is only realistic if the internal conduction is much faster than the conduction between blocks. In the following section, we will first discuss how to augment the physical model to allow for temperature gradients within blocks. We will then perform inference on data generated in this way, using two different probabilistic models: the simpler lumped-element model we have used up to now, and the sliced blocks model we propose in this section.
#### Physical Model
One way we can deal with this is by chopping each block up into (equally sized) slices, which exchange heat between each other. We assume that slices within a block conduct in a known way. This seems sensible because internal conduction coefficients of metals are well-known physical constants (though they may depend on temperature, I think it reasonable to neglect this in the temperature ranges we are dealing with). We only observe the temperature of the central slice of each block, which is also where the external heat gets put into the system (this probably is not a very sensible thing to do in real life...), reusing the input heats from above. Finally, we again neglect the influence of radiation.
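
Written out for an interior slice $i$, the model described above would read as follows. This is a sketch: the notation $C_i$ (heat capacity of slice $i$), $A_i$ (its surface area), $k_i$ (conduction coefficient between slices $i$ and $i + 1$), and $\Phi_i$ (input heat) is chosen here for illustration.
$$C_i \frac{\mathrm{d} T_i}{\mathrm{d} t} = k_{i - 1} (T_{i - 1} - T_i) + k_i (T_{i + 1} - T_i) + h_a A_i (T_a - T_i) + \Phi_i(t).$$
Here $k_i$ is the known internal coefficient when both slices lie in the same block, and the unknown $k_{12}$ or $k_{23}$ at an interface between two blocks. The input $\Phi_i$ is nonzero only for the central slice of each block, and for the first and last slice the conduction term towards the missing neighbour drops out.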

#### Simulate Data
To perform inference in a reasonable amount of time, we need to use fewer observations. We also have to limit the number of slices in order to keep the computation time down. If you are only interested in seeing the evolution of the temperatures with known parameters, however, it is no problem to use 100 slices per block.

In [None]:
# time horizon
sample_size = 25
Δ = 20 # Final time t = 500 s
time = [Δ * i for i in 0:sample_size] 

# Create the slices
slices_per_block = 5 # This should work for any positive integer (hopefully 🤞), including 1. More slices will make everything slower
total_slices = 3 * slices_per_block
mid_of_block = ceil(Int64, slices_per_block / 2)
observed_nodes = [i * slices_per_block + mid_of_block for i in 0:2]
# Conduction coefficients
int_vs_ext_cond = 1e0
true_k1 = 2e1 * slices_per_block * int_vs_ext_cond
true_k2 = 2.5e1 * slices_per_block * int_vs_ext_cond
true_k3 = 2.2e1 * slices_per_block * int_vs_ext_cond
true_ks_int = [true_k1, true_k2, true_k3]

# Heat capacities
true_Cs = Vector{Float64}(undef, total_slices)
true_Cs[1:slices_per_block] .= true_mcp_1 / slices_per_block
true_Cs[(slices_per_block + 1):(2 * slices_per_block)] .= true_mcp_2 / slices_per_block
true_Cs[(2 * slices_per_block + 1):(3 * slices_per_block)] .= true_mcp_3 / slices_per_block

# Slice surface areas
true_As = Vector{Float64}(undef, total_slices)
true_As[1:slices_per_block] .= true_A_1 / slices_per_block
true_As[(slices_per_block + 1):(2 * slices_per_block)] .= true_A_2 / slices_per_block
true_As[(2 * slices_per_block + 1):(3 * slices_per_block)] .= true_A_3 / slices_per_block

# Initial temperatures
T_0_slice = Vector{Float64}(undef, total_slices)
T_0_slice[1:slices_per_block] .= T_0[1]
T_0_slice[(slices_per_block + 1):(2 * slices_per_block)] .= T_0[2]
T_0_slice[(2 * slices_per_block + 1):(3 * slices_per_block)] .= T_0[3]

Using the true parameters, we can solve this system with DifferentialEquations.jl, and visualise the evolution of the temperatures, latent and observed. 

In [None]:
# Heat-transfer contributions, each updating dT in place
function Φ_cond!(dT, T, ks, Cs, N_s)
    dT[1] = ks[1] * (T[2] - T[1]) / Cs[1]
    dT[2:(N_s - 1)] .= @. (ks[1:(N_s - 2)] * (T[1:(N_s - 2)] - T[2:(N_s - 1)]) + ks[2:(N_s - 1)] * (T[3:N_s] - T[2:(N_s - 1)])) / Cs[2:(N_s - 1)]
    dT[N_s] = ks[N_s - 1] * (T[N_s - 1] - T[N_s]) / Cs[N_s] 
end

function Φ_conv!(dT, T, h_a, T_a, As, Cs)
    dT .+= @. h_a * As * (T_a - T) / Cs
end

function Φ_input!(dT, T, B_1, B_2, B_3, ω_1, ω_2, ω_3, t, Cs, obs_slices)
    dT[obs_slices[1]] += Φ(B_1, ω_1, t) / Cs[obs_slices[1]]
    dT[obs_slices[2]] += Φ(B_2, ω_2, t) / Cs[obs_slices[2]]
    dT[obs_slices[3]] += Φ(B_3, ω_3, t) / Cs[obs_slices[3]]
end

function LSSM_slice(dT, T, p, t)
    s, obs_slices, Cs, As, B_1, B_2, B_3, ω_1, ω_2, ω_3, T_a, ks_int, k12, k23, h_a = p
    N_s = 3 * s
    # Put conduction coefficient between adjacent slices in list
    ks = Vector{Any}(undef, N_s - 1) # DifferentialEquations uses automatic differentiation. This converts Float64 to complicated ForwardDiff type. ks must contain elements of both types.
    ks[1:(s - 1)] .= ks_int[1]
    ks[s] = k12
    ks[(s + 1):(2 * s - 1)] .= ks_int[2]
    ks[2 * s] = k23
    ks[(2 * s + 1):(3 * s - 1)] .= ks_int[3]
    
    Φ_cond!(dT, T, ks, Cs, N_s)
    Φ_conv!(dT, T, h_a, T_a, As, Cs)
    Φ_input!(dT, T, B_1, B_2, B_3, ω_1, ω_2, ω_3, t, Cs, obs_slices)
end

Note that if we increase the conduction coefficients within the block, our previous assumption of a homogeneous internal temperature should become more reasonable. Indeed, if we put them at around $10^5$, we essentially see just 3 lines.
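
The effect of the internal conduction coefficient on the temperature spread within a block can be illustrated with a deliberately simplified sketch: a single block of 3 slices with unit heat capacities, conduction only (no convection, no heat input), and made-up initial temperatures. The function `block_spread` is hypothetical and not part of the model code; it uses the matrix exponential to solve the linear system exactly, so no ODE solver is needed.

```julia
using LinearAlgebra

# Temperature spread (max - min) within a conduction-only chain of 3 slices
# with unit heat capacities, after time t. All numbers are illustrative.
function block_spread(k, t; T0 = [300.0, 310.0, 320.0])
    # Laplacian of a 3-node chain, scaled by the conduction coefficient k
    A = k * [-1.0 1.0 0.0; 1.0 -2.0 1.0; 0.0 1.0 -1.0]
    T = exp(A * t) * T0  # exact solution of dT/dt = A * T
    return maximum(T) - minimum(T)
end

for k in (0.1, 1.0, 10.0)
    println("k = ", k, ": spread after 1 s = ", block_spread(k, 1.0))
end
```

For this particular initial condition the spread decays as $20 \, e^{-k t}$, so a larger internal conduction coefficient makes the block look homogeneous on a much shorter time scale.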

In [None]:
true_p_slice = [slices_per_block, observed_nodes, true_Cs, true_As, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, true_T_a, true_ks_int, true_k12, true_k23, true_h_a]
LSSM_dynamics_slice = ODEProblem(LSSM_slice, T_0_slice, (time[1], time[end]), true_p_slice)
true_sol = solve(LSSM_dynamics_slice, Tsit5(); saveat = Δ, verbose = false)
plot_obs = plot(true_sol, title = "Evolution of All Temperatures", xlabel = L"t~(\textrm{s})", ylabel = L"T~(\textrm{K})", legend = nothing, xlim = (time[1], time[end]), ylim = (260, 340), size = (1200, 400), bottommargin = 6mm, leftmargin = 6mm)
# savefig(plot_obs, "Results\\Expand\\sliced_obs.pdf")

I think a heatmap will make the internal temperature differences clearer.

In [None]:
hmap = state_heatmap(Array(true_sol), time, Δ)
# savefig(hmap, "Results\\Expand\\sliced_obs_heatmap_slow_fine.pdf")

Recall that we still only observe a single temperature per block (namely the one in the centre slice). 

In [None]:
T_obs = Array(true_sol)[observed_nodes, :]
true_σ = 1e-2
y = T_obs + true_σ * randn(size(T_obs))
plot_obs = observ_plot(T_obs', y, time)

#### Inference: Probabilistic Model ≠ Physical Model
Next, we will perform the inference. It is interesting to consider the following: we have generated our data using some contrived physical model. In the previous example, we have subsequently performed inference using the physical model as our probabilistic model. In practice, however, our data will not come from a known physical model. Instead, it will be generated in some complicated way, which is modelled to a certain degree by some known physical model. It therefore feels like cheating to use the equations generating the data as the probabilistic model. 

Hence, in the next two subsections we will perform inference on data generated using our sliced blocks model, using two different probabilistic models: the lumped-element and sliced blocks models. In this subsection, the probabilistic model is the lumped-element model, so that the physical and probabilistic models differ. 

In [None]:
true_p = [true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, true_T_a, true_k12, true_k23, true_h_a] # First known, then unknown parameters
LSSM_dynamics_lump = ODEProblem(LSSM_lump, T_0, (time[1], time[end]), true_p)
model_slice_lump = fit_LSSM_lump(y, LSSM_dynamics_lump, [true_T_a, true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, Δ]);
chain_slice_lump = sample(model_slice_lump, NUTS(0.65), MCMCSerial(), 2500, 3; verbose = false, progress = true)

We once again plot some sampled trajectories.

In [None]:
posterior_samples = Array(sample(chain_slice_lump, 100; replace = false))
plot_sol_app = sampled_traj_plot(posterior_samples, true_sol, y, time, Δ, true_p, 2, LSSM_dynamics_lump, observed_nodes)
# savefig(plot_sol_app, "Results\\Expand\\slice_lump_states.pdf")

Here, as when we used the linear state space model on data generated by the model including the influence of radiation, we see that the sampled trajectories are quite close together, but there are systematic errors with respect to the true trajectories. 

In [None]:
p_σ = marg_post_plot(:σ, true_σ, chain_slice_lump, 40)
p_k12 = marg_post_plot(:k12, true_k12, chain_slice_lump, 60)
p_k23 = marg_post_plot(:k23, true_k23, chain_slice_lump, 40)
p_h_a = marg_post_plot(:h_a, true_h_a, chain_slice_lump, 1)
plot_marg_post = plot(p_σ, p_k12, p_k23, p_h_a, size = (1200, 800), leftmargin = 6mm, bottommargin = 6mm)
# savefig(plot_marg_post, "Results\\Expand\\slice_lump_params.pdf")

Similarly, the empirical posteriors for the conduction coefficients again show sizable biases. I think this is caused by the fact that the temperature differences at the boundaries of the blocks will be smaller than one would expect if the temperatures were constant within the blocks, because the internal conduction will tend to even things out. This causes the conduction coefficients to be underestimated. The standard deviation of the measurement noise is once again massively overestimated. 

Playing around with the value of `int_vs_ext_cond`, we see that inference gets worse as we make the internal conduction coefficients smaller. This makes sense, since we expect our assumption of a homogeneous block temperature to become more reasonable as the internal conduction becomes faster.
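
One way to make this argument slightly more quantitative is the following heuristic, which relies on assumptions that are not part of the models above: steady state, and no convection or heat input at the slices between two observed centre slices. The heat flowing between the observed slices of blocks 1 and 2 then passes through a chain of conductances in series. With $s$ slices per block and centre index $m = \lceil s / 2 \rceil$, there are $s - m$ internal links in block 1 (coefficient $k_1$), one interface link (coefficient $k_{12}$), and $m - 1$ internal links in block 2 (coefficient $k_2$), so that the lumped-element model effectively sees

$$
\frac{1}{k_{12}^{\textrm{eff}}} = \frac{s - m}{k_1} + \frac{1}{k_{12}} + \frac{m - 1}{k_2}
\quad \Longrightarrow \quad
k_{12}^{\textrm{eff}} \leq k_{12},
$$

with $k_{12}^{\textrm{eff}} \to k_{12}$ as $k_1, k_2 \to \infty$. This is consistent with both the underestimation of the conduction coefficients and the dependence on `int_vs_ext_cond`, but in the actual transient setup with convection it should be read as an indication only.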

In [None]:
corn = cornerplot(posterior_samples, label = [L"σ", L"k_{12}", L"k_{23}", L"h_a"], size = (1800, 1800), leftmargin = 8mm, bottommargin = 6mm)
# savefig(corn, "Results\\Expand\\slice_lump_corner.pdf")

In [None]:
function Φ_input_val!(dT, T, B_1, B_2, B_3, t_start_1, t_start_2, t_start_3, t_end_1, t_end_2, t_end_3, t, Cs, obs_slices)
    dT[obs_slices[1]] += Φ_val_smooth(B_1, t_start_1, t_end_1, t) / Cs[obs_slices[1]]
    dT[obs_slices[2]] += Φ_val_smooth(B_2, t_start_2, t_end_2, t) / Cs[obs_slices[2]]
    dT[obs_slices[3]] += Φ_val_smooth(B_3, t_start_3, t_end_3, t) / Cs[obs_slices[3]]
end

function LSSM_slice_val(dT, T, p, t)
    s, obs_slices, Cs, As, B_1, B_2, B_3, t_start_1, t_start_2, t_start_3, t_end_1, t_end_2, t_end_3, T_a, ks_int, k12, k23, h_a = p
    N_s = 3 * s
    # Put conduction coefficient between adjacent slices in list
    ks = Vector{Any}(undef, N_s - 1) 
    ks[1:(s - 1)] .= ks_int[1]
    ks[s] = k12
    ks[(s + 1):(2 * s - 1)] .= ks_int[2]
    ks[2 * s] = k23
    ks[(2 * s + 1):(3 * s - 1)] .= ks_int[3]
    Φ_cond!(dT, T, ks, Cs, N_s)
    Φ_conv!(dT, T, h_a, T_a, As, Cs)
    Φ_input_val!(dT, T, B_1, B_2, B_3, t_start_1, t_start_2, t_start_3, t_end_1, t_end_2, t_end_3, t, Cs, obs_slices)
end

In [None]:
val_p = [slices_per_block, observed_nodes, true_Cs, true_As, val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, true_T_a, true_ks_int, true_k12, true_k23, true_h_a]  
LSSM_dynamics_slice_val = ODEProblem(LSSM_slice_val, T_0_slice, (time_val[1], time_val[end]), val_p)
val_sol = solve(LSSM_dynamics_slice_val, Tsit5(); saveat = Δ_val, verbose = false);
val_p = [true_mcp_1, true_mcp_2, true_mcp_3, true_A_1, true_A_2, true_A_3, val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, true_T_a, true_k12, true_k23, true_h_a] 
p_in = val_input_plot(val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, time_val)
p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3 = sampled_traj_val_plot(posterior_samples, val_sol, time_val, Δ_val, val_p, true_σ, 2, LSSM_dynamics_lump_val, observed_nodes)
plot_val = plot(p_in, p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3, size = (1800, 800), layout = (2, 3), leftmargin = 10mm, bottommargin = 10mm)
# savefig(plot_val, "Results\\Validation\\slice_lump_val.pdf")

#### Inference: Probabilistic Model = Physical Model
<font size='10' color='red'> __WARNING: VERY SLOW 😥__</font>

In this subsection, the probabilistic model is the sliced blocks model, so that the physical and probabilistic models coincide.

In [None]:
@model function fit_LSSM_slice(data, system, syst_consts) 
    # Gamma in Distributions.jl is shape-scale, not shape-rate
    σ ~ Gamma(1e-2, 1e0) # E[σ] = 10^-2, Var(σ) = 10^-2 
    k12 ~ Gamma(1e0, 1e0) # E[k] = 10^0, Var(k) = 10^0
    k23 ~ Gamma(1e0, 1e0) 
    h_a ~ Gamma(1e1, 1e0) # E[h_a] = 10^1, Var(h_a) = 10^1
    # s, mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, ω_1, ω_2, ω_3, T_a, k1, k2, k3, Δ = syst_consts
    # p = [s, mcp_1, mcp_2, mcp_3, A_1, A_2, A_3, B_1, B_2, B_3, ω_1, ω_2, ω_3, T_a, k1, k2, k3, k12, k23, h_a]
    s, obs_slices, Cs, As, B_1, B_2, B_3, ω_1, ω_2, ω_3, T_a, ks_int, Δ = syst_consts
    p = [s, obs_slices, Cs, As, B_1, B_2, B_3, ω_1, ω_2, ω_3, T_a, ks_int, k12, k23, h_a]
    predicted = solve(system, Tsit5(); p = p, saveat = Δ, verbose = false)
    for i ∈ 1:length(predicted[1, :])
        data[:, i] ~ MvNormal(predicted[obs_slices, i], σ^2 * I)
    end
end
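
The moments quoted in the prior comments follow from the shape-scale parameterisation. For $X \haslaw \GammaDist(\alpha, \theta)$ with shape $\alpha$ and scale $\theta$,

$$
\Expectation[X] = \alpha \theta, \qquad \Variance[X] = \alpha \theta^2 .
$$

For example, $\sigma \haslaw \GammaDist(10^{-2}, 1)$ gives $\Expectation[\sigma] = 10^{-2}$ and $\Variance[\sigma] = 10^{-2}$, while $h_a \haslaw \GammaDist(10, 1)$ gives $\Expectation[h_a] = 10$ and $\Variance[h_a] = 10$. Since the scale equals $1$ in all four priors, the shape-rate parameterisation (with rate $1 / \theta$) would yield the same distributions here; the distinction only matters once a scale different from $1$ is used.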

Due to the complexity of the probabilistic model, inference is very slow: for each sample from the Markov chain, the trajectories (including the latent ones) have to be computed numerically. For that reason, I have decided to take only 1000 samples instead of 2500. On top of that, I only sample from a single Markov chain, whereas in the other examples I used three independent chains. Multiple chains are mostly useful for diagnostic purposes: by checking whether the empirical posteriors of the chains are roughly equal, one can determine whether equilibrium has been reached in the burn-in stage. Moreover, more chains also mean more samples (although it would be more efficient to sample from a single chain, since then you only have to throw away a single set of burn-in samples). We simply trust that equilibrium has been reached, and make do with fewer samples.
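
To make the diagnostic use of multiple chains concrete, here is a minimal sketch of the classical potential scale reduction factor ($\hat{R}$), which compares the variance between chains to the variance within chains. The function `rhat` and the draws are made up for illustration; this is a textbook version, not the implementation used by any particular package.

```julia
using Statistics

# Potential scale reduction factor for one parameter (hypothetical helper).
# `draws` is an (n_samples × n_chains) matrix of post-burn-in draws.
function rhat(draws::AbstractMatrix)
    n = size(draws, 1)
    chain_means = vec(mean(draws, dims = 1))
    chain_vars = vec(var(draws, dims = 1))
    W = mean(chain_vars)       # within-chain variance
    B = n * var(chain_means)   # between-chain variance
    var_plus = (n - 1) / n * W + B / n
    return sqrt(var_plus / W)
end

# Made-up draws: two chains that agree, and two chains stuck in different regions.
agree = [1.0 1.5; 2.0 2.5; 3.0 3.5; 4.0 4.5]
disagree = [1.0 101.0; 2.0 102.0; 3.0 103.0; 4.0 104.0]
println(rhat(agree))     # around 1: no sign of disagreement
println(rhat(disagree))  # far above 1: the chains have not mixed
```

With a single chain, as used in this subsection, the between-chain variance cannot be computed, which is why convergence has to be taken on trust here.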

In [None]:
model_slice_slice = fit_LSSM_slice(y, LSSM_dynamics_slice, [slices_per_block, observed_nodes, true_Cs, true_As, true_B_1, true_B_2, true_B_3, true_ω_1, true_ω_2, true_ω_3, true_T_a, true_ks_int, Δ]);
chain_slice_slice = sample(model_slice_slice, NUTS(0.65), MCMCSerial(), 1000, 1; verbose = false, progress = true) 

Let's see whether the improvement in the inference was worth the wait:

In [None]:
posterior_samples = Array(sample(chain_slice_slice, 100; replace = false))
plot_sol_app = sampled_traj_plot(posterior_samples, true_sol, y, time, Δ, true_p_slice, 2, LSSM_dynamics_slice, observed_nodes)
# savefig(plot_sol_app, "Results\\Expand\\slice_slice_states.pdf")

That is a good sign: the sampled trajectories once again overlap with the true trajectory.

In [None]:
p_σ = marg_post_plot(:σ, true_σ, chain_slice_slice, 50)
p_k12 = marg_post_plot(:k12, true_k12, chain_slice_slice, 80)
p_k23 = marg_post_plot(:k23, true_k23, chain_slice_slice, 60)
p_h_a = marg_post_plot(:h_a, true_h_a, chain_slice_slice, 1.25)
plot_marg_post = plot(p_σ, p_k12, p_k23, p_h_a, size = (1200, 800), leftmargin = 6mm, bottommargin = 6mm)
# savefig(plot_marg_post, "Results\\Expand\\slice_slice_params.pdf")

Indeed, the MAP estimates for the parameters are very close to the true values, although the variance is somewhat larger than when we generated data with the lumped-element model. Using the sliced blocks model for the identification has significantly reduced the estimate for the standard deviation of the measurement noise, but it is still quite large. It might therefore be interesting to investigate what the sources of this overestimation are: by dealing with these, it could be possible to improve the identification of the parameters of interest. 

In [None]:
corn = cornerplot(Array(posterior_samples), label = [L"σ", L"k_{12}", L"k_{23}", L"h_a"], size = (1800, 1800), leftmargin = 8mm, bottommargin = 6mm)
# savefig(corn, "Results\\Expand\\slice_slice_corner.pdf")

In [None]:
val_p = [slices_per_block, observed_nodes, true_Cs, true_As, val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, true_T_a, true_ks_int, true_k12, true_k23, true_h_a]  
LSSM_dynamics_slice_val = ODEProblem(LSSM_slice_val, T_0_slice, (time_val[1], time_val[end]), val_p)
val_sol = solve(LSSM_dynamics_slice_val, Tsit5(); saveat = Δ_val, verbose = false);
p_in = val_input_plot(val_B_1, val_B_2, val_B_3, val_t_start_1, val_t_start_2, val_t_start_3, val_t_end_1, val_t_end_2, val_t_end_3, time_val)
p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3 = sampled_traj_val_plot(posterior_samples, val_sol, time_val, Δ_val, val_p, true_σ, 2, LSSM_dynamics_slice_val, observed_nodes)
plot_val = plot(p_in, p_out_traj, p_out_error, h_mse_1, h_mse_2, h_mse_3, size = (1800, 800), layout = (2, 3), leftmargin = 10mm, bottommargin = 10mm)
# savefig(plot_val, "Results\\Validation\\slice_slice_val.pdf")

## Bibliography
- Abramovich, F., & Ritov, Y. (2013).  _Statistical Theory: A Concise Introduction_. CRC Press.
- Murphy, K. (2022). _Probabilistic Machine Learning: An Introduction_. URL: [https://github.com/probml/pml-book](https://github.com/probml/pml-book).
- Clercx, H. (2015). _Lecture Notes ‘Physics of Transport Phenomena’_. ISBN: 9788578110796
- Carter, A. (2000). _Classical and Statistical Thermodynamics_. Pearson. 
- Cox, A., van de Laar, T., & de Vries, B. (2019). _ForneyLab: A factor graph approach to automated design of Bayesian signal processing algorithms_. DOI: [10.1016/j.ijar.2018.11.002](https://doi.org/10.1016/j.ijar.2018.11.002). URL: [https://github.com/biaslab/ForneyLab.jl](https://github.com/biaslab/ForneyLab.jl).
- Särkkä, S. (2013). _Bayesian Filtering and Smoothing_. Cambridge University Press. ISBN: 9781482211849. DOI: [10.1017/CBO9781139344203](https://doi.org/10.1017/CBO9781139344203).
- Bagaev, D., & de Vries, B. (2021). _ReactiveMP.jl: a Julia package for automatic Bayesian inference on a factor graph with reactive message passing_. URL: [https://github.com/biaslab/ReactiveMP.jl/releases/tag/v1.3.1](https://github.com/biaslab/ReactiveMP.jl/releases/tag/v1.3.1).
- Luttinen, J. (2013). 'Fast variational Bayesian linear state-space model'. In: _European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD)_. Vol. 8188. pp. 305-320. ISBN: 9783642409875. DOI: [10.1007/978-3-642-40988-2_20](https://doi.org/10.1007/978-3-642-40988-2_20).
- Betancourt, M. (2017). _A Conceptual Introduction to Hamiltonian Monte Carlo_. DOI: [10.48550/ARXIV.1701.02434](https://doi.org/10.48550/ARXIV.1701.02434). arXiv: [1701.02434](http://arxiv.org/abs/1701.02434).
- Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). 'Equation of State Calculations by Fast Computing Machines'. In: _The Journal of Chemical Physics 21.6_. pp. 1087-1092. ISSN: 00219606. DOI: [10.1063/1.1699114](https://doi.org/10.1063/1.1699114).
- Robert, C. P., & Casella, G. (2004). _Monte Carlo Statistical Methods_. Springer Texts in Statistics. ISBN: 978-1-4419-1939-7. DOI: [10.1007/978-1-4757-4145-2](https://doi.org/10.1007/978-1-4757-4145-2). URL: [http://link.springer.com/10.1007/978-1-4757-4145-2](http://link.springer.com/10.1007/978-1-4757-4145-2).
- Ge, H., Xu, K., & Ghahramani, Z. (2018). 'Turing: a language for flexible probabilistic inference'. In: _International Conference on Artificial Intelligence and Statistics (AISTATS)_. pp. 1682-1690. URL: [http://proceedings.mlr.press/v84/ge18b.html](http://proceedings.mlr.press/v84/ge18b.html)
- Rackauckas, C., & Nie, Q. (2017). 'DifferentialEquations.jl–a performant and feature-rich ecosystem for solving differential equations in Julia'. In: _Journal of Open Research Software 5.1_. p. 15
- Bouwens, R. E. A., de Groot, P. A. M., Kranendonk, W., van Lune, P., Prop - van den Berg, C. M., van Riswick, J. A. M. H., & Westra, J. J. (2013). _BINAS_. 6th ed. Noordhoff Uitgevers. ISBN: 9789001817497. URL: [https://view.publitas.com/noordhoff-voortgezet-onderwijs-groningen/binas-6e-ed-havo-vwo-informatieboek-9789001817497/page/2-3](https://view.publitas.com/noordhoff-voortgezet-onderwijs-groningen/binas-6e-ed-havo-vwo-informatieboek-9789001817497/page/2-3)
- Berkum, E. E. M., & Di Bucchianico, A. (2016). _Statistical Compendium_. Eindhoven University of Technology. 