# Dynamical Systems

Typically, the dynamics are described by a nonlinear ODE:

$$ \dot{x}(t) = f(x(t)) $$

Bottom-up modeling example:

$$\begin{split} \dot{x_1} & = 2 k_1 x_1 x_2 -  \frac{k_p x_1}{x_1 + K_m} \\
\dot{x_2} & = V_{in} - k_1 x_2 x_1
\end{split}$$

2-D Model of Yeast Glycolysis from *Bier, Bakker, & Westerhoff (Biophys. J. 78:1087-1093, 2000)*.

Integrating the ODE from $t_0$ to $t$ gives the equivalent integral form:

$$ x(t) = x(t_0) + \int_{t_0}^{t} f(x(t')) \,d{t'} $$
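As a rough illustration of evaluating this integral numerically, the sketch below integrates the 2-D glycolysis model with SciPy's `solve_ivp`. The parameter values, initial condition, and time span are illustrative assumptions, not values stated in these notes.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameter values (assumed, not taken from these notes).
K1, KP, KM, V_IN = 0.02, 6.0, 13.0, 0.36


def glycolysis_rhs(t, x):
    """Right-hand side f(x) of the 2-D glycolysis model."""
    x1, x2 = x
    dx1 = 2.0 * K1 * x1 * x2 - KP * x1 / (x1 + KM)
    dx2 = V_IN - K1 * x2 * x1
    return [dx1, dx2]


# Integrate from an assumed initial state and sample on a regular grid.
t_eval = np.linspace(0.0, 100.0, 1001)
sol = solve_ivp(glycolysis_rhs, (0.0, 100.0), [4.0, 3.0],
                t_eval=t_eval, rtol=1e-8, atol=1e-10)

print(sol.y.shape)  # rows are x1(t) and x2(t)
```

The sampled trajectory `sol.y` is the kind of time series that the data-driven approach takes as its starting point.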

Data-driven modeling: given time-series measurements

$$\{x(t_n)\},$$

create training pairs

$$\left(\dot{x}(t_i),\, x(t_i)\right)$$

of targets and features, one for every time point $i = 1, \dots, n$. This gives us a supervised learning problem:

$$ \arg \min_f \sum_{i=1}^n \| f(x(t_i)) - \dot{x}(t_i) \|^2 $$

whose solution is the function $f$ that best describes the data, found with a machine learning method.
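A minimal sketch of this regression, assuming a harmonic oscillator as the test system and a linear model $f(x) = Ax$ as a stand-in for a general machine learning method. Derivatives are approximated with finite differences via `np.gradient`, and the least-squares problem is solved in closed form.

```python
import numpy as np

# Synthetic time series of a harmonic oscillator: x1' = x2, x2' = -x1.
dt = 0.01
t = np.arange(0.0, 10.0, dt)
X = np.column_stack([np.cos(t), -np.sin(t)])   # features x(t_i)

# Targets: finite-difference approximation of the time derivative.
dX = np.gradient(X, dt, axis=0, edge_order=2)

# Least-squares fit of dX ≈ X @ A.T, i.e. a linear model f(x) = A x.
B, *_ = np.linalg.lstsq(X, dX, rcond=None)
A_hat = B.T

print(np.round(A_hat, 3))
```

For noise-free data the recovered matrix should be close to the true system matrix with rows `[0, 1]` and `[-1, 0]`. With noisy measurements the finite-difference targets degrade quickly, which is one motivation for smoothing or for avoiding explicit derivatives.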

*Kevin Siswandi*, 29 June 2020.

# General Direction

Traditional Supervised Learning problem:
$$ f: X \rightarrow Y$$
Here $X$ are the features and $Y$ are the targets. In our case, we learn $f'$ instead. Two approaches:
1. Manually compute derivatives from time-series data.
2. Embed the supervised learning problem in the framework of a multistep method.

The advantage of the latter approach is that we do not need to artificially calculate derivatives. So far, we have:
1. Checked the correctness of our implementation in test problems -- Harmonic Oscillator, Hopf (Normal Form) Bifurcation, Lorenz Attractor, etc.
2. Applied this method to investigate the 2-D Yeast Glycolytic Oscillator -- studied the damped oscillation regime, oscillatory regime, identified bifurcation using ML simulations, investigated the behaviour of ML predictions with different parameters (ongoing).
3. Uncertainty Quantification
4. Explanation of the model's predictions (based on what the model has learnt from data)
5. Gain insights from ML simulations -- cell-cycle model
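To illustrate the multistep embedding, here is a minimal sketch that assumes the trapezoidal rule (a one-step Adams-Moulton scheme), $x_{n+1} - x_n = \tfrac{h}{2}\left(f(x_{n+1}) + f(x_n)\right)$, and a linear model $f(x) = Ax$ standing in for whatever learner is used in practice. The model is fitted by minimizing the residual of the scheme on a harmonic oscillator time series, so no derivatives are computed from the data.

```python
import numpy as np

# Harmonic oscillator time series: x1' = x2, x2' = -x1.
h = 0.01
t = np.arange(0.0, 10.0, h)
X = np.column_stack([np.cos(t), -np.sin(t)])

# Trapezoidal-rule residual with f(x) = A x:
#   x_{n+1} - x_n = (h / 2) * A (x_{n+1} + x_n)
D = X[1:] - X[:-1]               # left-hand side of the scheme
S = 0.5 * h * (X[1:] + X[:-1])   # what A multiplies on the right-hand side

# Least-squares solution of D ≈ S @ A.T.
B, *_ = np.linalg.lstsq(S, D, rcond=None)
A_multistep = B.T

print(np.round(A_multistep, 3))
```

The unknown function enters only through the scheme's residual, so the fit uses the raw states directly. Swapping the linear model for a nonlinear regressor turns the closed-form solve into an iterative optimization of the same residual.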

References:
* https://www.nature.com/articles/s41540-020-0126-z
* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1993813/

# Predicting Pathway Dynamics using Machine Learning

In the first half of the project, we use experimental proteomics and metabolomics data to predict the metabolic pathway dynamics of *Escherichia coli* strains that produce either limonene or isopentenol. In bioengineering, we often want to predict the cell behavior that results from engineered changes in DNA. This is made possible by recent technological advances:
1. New synthetic biology capabilities, such as the gene-editing tool CRISPR-Cas9
2. Availability of proteomics and metabolomics data

Turning these data into actionable insights is not trivial. While stoichiometric models ignore enzyme kinetics and cannot accurately capture the dynamics, kinetic models also have limitations:
* the kinetic parameters are estimated from in vitro measurements, which may not be valid under in vivo conditions
* knowledge of the kinetic rate law for each specific reaction is required.

On the other hand, an ML approach could be automatically applied to any new pathway, improve in accuracy with more training data, and help in capturing complex dynamic relationships that are otherwise unknown. It also enables faster development of predictive pathway dynamics since all required knowledge is inferred from the experimental data.

![title](image/machine-learning.jpg)

The first step is to join the metabolite and protein tables and create the training data set, consisting of:
* smoothed features (using [Savitzky-Golay](https://scipy-cookbook.readthedocs.io/items/SavitzkyGolay.html) low-pass filter)
* targets obtained by taking the derivative of the interpolated (smoothed) measurements
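The two ingredients above can be sketched on a synthetic signal as follows. The signal, noise level, window length, and polynomial order are illustrative assumptions. Real proteomics and metabolomics time series would first need interpolation onto a regular grid.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy measurement on a regular time grid (assumed for illustration).
rng = np.random.default_rng(0)
dt = 0.05
t = np.arange(0.0, 10.0 + dt / 2, dt)
x_true = np.sin(t)
x_noisy = x_true + 0.02 * rng.standard_normal(t.size)

# Features: Savitzky-Golay smoothed measurements.
x_smooth = savgol_filter(x_noisy, window_length=21, polyorder=3)

# Targets: time derivative of the smoothed signal.
dx_dt = np.gradient(x_smooth, dt)

print(x_smooth.shape, dx_dt.shape)
```

`savgol_filter` also accepts `deriv=1` and `delta=dt` to return the derivative of the fitted polynomials directly, which is an alternative to differencing the smoothed signal.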

Here we will use TPOT to automate the ML pipeline: https://github.com/EpistasisLab/tpot. Cross-validation with learning curves is used to compare pipelines. For an interpretation of the learning curve, see: https://scikit-learn.org/stable/modules/learning_curve.html#learning-curve
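As a rough sketch of how a cross-validated learning curve can be computed, the example below uses scikit-learn's `learning_curve` with a `Ridge` regressor on synthetic data. Both the estimator and the data are placeholders: in the actual workflow the estimator would be a pipeline proposed by TPOT, and the data would be the smoothed features and derivative targets.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

# Synthetic regression data standing in for (features, derivative targets).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(200)

# 5-fold cross-validated scores at five increasing training-set sizes.
train_sizes, train_scores, val_scores = learning_curve(
    Ridge(alpha=1.0), X, y,
    cv=5, train_sizes=np.linspace(0.2, 1.0, 5), scoring="r2",
)

for n, tr, va in zip(train_sizes,
                     train_scores.mean(axis=1),
                     val_scores.mean(axis=1)):
    print(f"n_train={n:3d}  train R2={tr:.3f}  validation R2={va:.3f}")
```

Comparing pipelines then amounts to comparing their validation curves: a persistent gap between training and validation scores suggests overfitting, while two low, converged curves suggest the model is too simple.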

## Applications

Remarks: Limonene is a bio-based jet fuel while isopentenol is a gasoline replacement.

* Accelerate the design of microbes/pathways that produce biofuel
* Provide a new way to guide bioengineering efforts

References:
* Main paper: [Costello & Martin, Nature 2018](https://www.nature.com/articles/s41540-018-0054-3)
* Dataset: https://doi.org/10.1016/j.cels.2016.04.004