# Empirical Dynamic Modeling via Machine Learning

Typically, the dynamics are described by a nonlinear ODE:

$$ \dot{x}(t) = f(x(t)) $$

As an example of bottom-up modeling, consider the 2-D model of yeast glycolysis from *Bier, Bakker, & Westerhoff (Biophys. J. 78:1087-1093, 2000)*:

$$\begin{split} \dot{x}_1 & = 2 k_1 x_1 x_2 - \frac{k_p x_1}{x_1 + K_m} \\
\dot{x}_2 & = V_{in} - k_1 x_1 x_2
\end{split}$$

This ODE can be solved through numerical integration:

$$ x(t) = x(t_0) + \int_{t_0}^{t} f(x(t')) \,d{t'} $$ 
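As a minimal sketch of this integration step, the glycolysis model above can be integrated with a hand-written classical Runge-Kutta (RK4) scheme. The parameter values, initial condition, and time grid below are illustrative placeholders, not necessarily the values used in the cited paper, and the helper names `glycolysis` and `rk4_integrate` are hypothetical.

```python
import numpy as np

# Placeholder parameters for illustration only (assumed, not taken from the paper).
V_IN, K1, KP, KM = 0.36, 0.02, 6.0, 12.0

def glycolysis(x):
    """Right-hand side f(x) of the 2-D glycolysis model."""
    x1, x2 = x
    dx1 = 2.0 * K1 * x1 * x2 - KP * x1 / (x1 + KM)
    dx2 = V_IN - K1 * x1 * x2
    return np.array([dx1, dx2])

def rk4_integrate(f, x0, t):
    """Integrate dx/dt = f(x) on the time grid t with classical RK4."""
    x = np.empty((len(t), len(x0)))
    x[0] = x0
    for n in range(len(t) - 1):
        h = t[n + 1] - t[n]
        k1 = f(x[n])
        k2 = f(x[n] + 0.5 * h * k1)
        k3 = f(x[n] + 0.5 * h * k2)
        k4 = f(x[n] + h * k3)
        x[n + 1] = x[n] + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

# Assumed initial condition and time grid.
t = np.linspace(0.0, 100.0, 10001)
trajectory = rk4_integrate(glycolysis, np.array([4.0, 3.0]), t)
```

Each row of `trajectory` holds $(x_1, x_2)$ at the corresponding entry of `t`. In practice an adaptive library solver would usually be preferred over a fixed-step scheme.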

Now, in data-driven modeling, we are given time-series measurements:

$$\{x(t_n)\},$$

from which we create training data by estimating the derivatives numerically,

$$\dot{x}(t_i), x(t_i)$$

as target-feature pairs for every time point $i = 1, \dots, n$. This gives us a supervised learning problem:

$$ \arg\min_f \sum_{i=1}^n \| f(x(t_i)) - \dot{x}(t_i) \|^2 $$

that is solved to find the function $f$ that best describes the data through a machine learning method.
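The following sketch illustrates the pipeline on synthetic data, under simplifying assumptions: the measurements come from a damped linear oscillator (an illustrative system, not one from these notes), the derivatives are estimated with finite differences via `np.gradient`, and $f$ is restricted to a linear model $f(x) = A x$ fitted by ordinary least squares. Any regression method (for example, a neural network or random forest) could take the place of the linear fit.

```python
import numpy as np

# Synthetic measurements of dx/dt = A x (assumed test system).
A_true = np.array([[-0.1, 2.0], [-2.0, -0.1]])
t = np.linspace(0.0, 10.0, 2001)
decay = 2.0 * np.exp(-0.1 * t)
x = np.stack([decay * np.cos(2.0 * t), -decay * np.sin(2.0 * t)], axis=1)

# Targets: derivatives estimated from the time series by finite differences.
dxdt = np.gradient(x, t, axis=0, edge_order=2)

# Supervised learning step: least-squares fit of x @ W ≈ dxdt, so A = W.T.
W, *_ = np.linalg.lstsq(x, dxdt, rcond=None)
A_est = W.T

# Training loss: sum of squared residuals over all time points.
loss = np.sum((x @ W - dxdt) ** 2)
```

On this noise-free data `A_est` should be close to `A_true`. With noisy measurements, the finite-difference step amplifies noise, so smoothing before differentiation may be needed.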

# General Direction

Performance metrics:

1. Wasserstein Distance
2. Dynamic Time Warping
3. Kullback-Leibler Divergence

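Unlike the Kullback-Leibler divergence, the Wasserstein distance takes the geometry of the underlying metric space into account: it depends on how far probability mass has to be moved, not only on how much the two distributions overlap.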

![](https://i.stack.imgur.com/7rxeM.png)
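A small numerical sketch of this difference, assuming 1-D histograms on a shared uniform grid: a point mass is compared against a nearby and a distant copy of itself. The helper names are hypothetical, and the KL divergence is smoothed with a small `eps` because it is otherwise infinite for distributions with disjoint support.

```python
import numpy as np

def wasserstein_1d(p, q, dx=1.0):
    """W1 distance between two histograms on the same uniform 1-D grid."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), smoothed with eps to avoid division by zero."""
    p = (p + eps) / np.sum(p + eps)
    q = (q + eps) / np.sum(q + eps)
    return np.sum(p * np.log(p / q))

def point_mass(index, size=10):
    """Histogram with all probability mass in a single bin."""
    h = np.zeros(size)
    h[index] = 1.0
    return h

p = point_mass(2)
q_near = point_mass(3)  # shifted by one bin
q_far = point_mass(8)   # shifted by six bins

w_near, w_far = wasserstein_1d(p, q_near), wasserstein_1d(p, q_far)
kl_near, kl_far = kl_divergence(p, q_near), kl_divergence(p, q_far)
```

Here the Wasserstein distance grows with the size of the shift, while the smoothed KL divergence is the same for both shifts because it only sees that the supports do not overlap.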

# Executive Summary

1. One-step learning method $\rightarrow$ learn the dynamics by computing derivatives from the time-series data, then solve the resulting optimization problem with machine learning methods (no governing equations are assumed).
$$ \dot{x} = f(x, u) $$
2. LmmNet $\rightarrow$ learn the dynamics by embedding a supervised learning problem inside a linear multistep method. This method assumes a highly idealized setting in which the data points are sampled at regular intervals; this limitation can be overcome with a 'data augmentation' strategy.
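To illustrate the idea behind the second method, here is a minimal sketch that embeds a regression problem in the simplest implicit linear multistep scheme, the trapezoidal rule (one-step Adams-Moulton). It is not the LmmNet implementation: a linear model $f(x) = A x$ stands in for the neural network, the synthetic oscillator data are an assumption, and all variable names are hypothetical. Note that no derivative estimates are needed, only regularly spaced samples with step `h`.

```python
import numpy as np

# Regularly sampled synthetic data from dx/dt = A x (assumed test system).
A_true = np.array([[-0.1, 2.0], [-2.0, -0.1]])
h = 0.005
t = h * np.arange(2001)
decay = 2.0 * np.exp(-0.1 * t)
x = np.stack([decay * np.cos(2.0 * t), -decay * np.sin(2.0 * t)], axis=1)

# Trapezoidal rule: x[n+1] - x[n] = (h / 2) * (f(x[n+1]) + f(x[n])).
# With f(x) = A x the scheme is linear in A, so minimizing the squared
# multistep residual reduces to an ordinary least-squares problem.
targets = x[1:] - x[:-1]
features = 0.5 * h * (x[1:] + x[:-1])
W, *_ = np.linalg.lstsq(features, targets, rcond=None)
A_lmm = W.T

# Residual of the multistep scheme under the fitted model.
residual = targets - features @ W
```

For a nonlinear model such as a neural network, the same residual would be minimized with a gradient-based optimizer instead of a closed-form least-squares solve.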

We evaluate the one-step method and LmmNet extensively on canonical systems and on complex biochemical problems.

## Harmonic Oscillator

* 2-D Linear Oscillator
* We evaluate the performance of both methods on test data of the harmonic oscillator with cubic dynamics.
* With increasing augmentation, we get better performance.

## Linear Oscillator

* 3-D Linear Oscillator
* We evaluate performance using DTW, the Wasserstein distance, and MSE.
* With increasing augmentation, we get better performance.
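For reference, dynamic time warping (DTW) between two 1-D series can be sketched with the textbook dynamic-programming recursion below. This is a plain $O(nm)$ version with an absolute-difference local cost. The function name and the shifted-sine example are assumptions for illustration, and this is not necessarily the implementation used for the evaluation.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# A predicted trajectory that matches the true one up to a small time shift.
t = np.linspace(0.0, 2.0 * np.pi, 100)
true_series = np.sin(t)
shifted_series = np.sin(t - 0.5)

dtw = dtw_distance(true_series, shifted_series)
pointwise_l1 = np.sum(np.abs(true_series - shifted_series))
mse = np.mean((true_series - shifted_series) ** 2)
```

Because DTW is allowed to realign the two series in time, it penalizes a pure phase shift less than a pointwise comparison does. This is one reason to report it alongside MSE.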

## Lorenz System

* We show that our methods accurately identify the attractor dynamics.
* 3-dimensional

## Hopf Bifurcation

* We show that our methods can identify the bifurcation.
* 3-dimensional

## 2-D Glycolysis

* We show that training on only two time series already gives 'very good' predictions on the test data.
* 2-D

## Cell Cycle

* The performance (accuracy) of both LmmNet and one-step learning improves with more data.
* We also find that the methods are able to identify the dynamics of the 7 biochemical species.
* We simulate the results from Tyson and the experiments from Solomon (the S-shaped MPF-cyclin curve).
* 1993

## Metabolic Pathway in E. coli

* We model this pathway using LmmNet (the first test of LmmNet on real data).
* We extract mechanistic insights using post-hoc explainability.