# Protein Folding Application: Two State Folding

This notebook demonstrates an application of a two-state reversible kinetic model. We will use a physics-informed neural network (PINN) to predict how much folded protein is present at a given time $t$.

```python
import torch
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
```

## The Two State Model

The fundamental kinetic model of protein folding is the two-state kinetic model. The model denotes the unfolded and folded states of a protein as $U$ and $N$, respectively. It applies primarily to single-domain proteins and assumes that folding and unfolding are reversible, unaided by external factors, and free of stable intermediates.

$$\require{mhchem} \Large{\ce{U <=>[\, k_f \,][\, k_u \,] N}}$$

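Two rate constants drive this process: the folding rate constant $k_f$ and the unfolding rate constant $k_u$. The equilibrium constant $K_{eq}$ is the ratio of these two rate constants. It is defined here for the unfolding direction, so at equilibrium it equals $[U]_{eq}/[N]_{eq}$.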

$$\large{K_{eq} = \frac{k_u}{k_f}}$$

$K_{eq}$ is a key value because it determines the stability of the single-domain protein, $\Delta G^{UN}$, at a temperature $T$ in kelvin, where $R$ is the gas constant.

$$\large{\Delta G^{UN} = -RT\ln(K_{eq})}$$
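
As a quick numerical illustration of the two formulas above, the sketch below computes $K_{eq}$ and $\Delta G^{UN}$ from a pair of rate constants. The function name `unfolding_free_energy` and the rate constants are hypothetical values chosen for illustration; they are not taken from any dataset in this notebook. The gas constant is expressed in kJ/(mol·K), so the result is in kJ/mol.

```python
import math

# Gas constant in kJ/(mol*K); the unit choice is an assumption of this sketch.
R = 8.314462618e-3


def unfolding_free_energy(k_f, k_u, temperature):
    """Return (K_eq, delta_G) using K_eq = k_u / k_f and delta_G = -R*T*ln(K_eq)."""
    k_eq = k_u / k_f
    delta_g = -R * temperature * math.log(k_eq)
    return k_eq, delta_g


# Hypothetical rate constants (1/s) and room temperature (K), for illustration only.
k_eq, delta_g = unfolding_free_energy(k_f=10.0, k_u=0.1, temperature=298.15)
print(f"K_eq = {k_eq:.3g}, dG = {delta_g:.2f} kJ/mol")
```

With $k_f > k_u$ the ratio is below 1, so the logarithm is negative and $\Delta G^{UN}$ comes out positive, which corresponds to a protein that favors the folded state.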

The differential equation describing the change in the folded population under this kinetic model is:

$$\large{\frac{d[N]}{dt} = k_f[U] - k_u[N]}$$

The following mass balance is enforced, where $[N]_0$ and $[U]_0$ are the initial populations of folded and unfolded protein, respectively. This is effectively the law of conservation of mass (mass can be neither created nor destroyed).

$$\large{[N]_0 + [U]_0 = [N] + [U]}$$ 

The mass balance lets us substitute for $[U]$ and write the ODE entirely in terms of $[N]$, so we only need to track the change of a single population.

$$\large{\frac{d[N]}{dt} = k_f([N]_0 + [U]_0 - [N]) - k_u[N]}$$

Tracking only the folded population is sufficient because the mass balance implies that the change in $[U]$ is the opposite of the change in $[N]$.

$$\large{\frac{d[N]}{dt} = -\frac{d[U]}{dt}}$$
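
The substituted ODE can be checked numerically before any closed-form solution is derived. The sketch below integrates it with a plain forward Euler scheme, which is chosen here only for simplicity and is not necessarily the integrator used elsewhere in this notebook. The function names, rate constants, initial populations, and step size are all illustrative assumptions.

```python
def folding_rhs(n, k_f, k_u, total):
    """d[N]/dt with [U] eliminated through the mass balance [U] = total - [N]."""
    return k_f * (total - n) - k_u * n


def euler_integrate(n0, u0, k_f, k_u, dt, steps):
    """Forward Euler integration of the folded population [N](t)."""
    total = n0 + u0  # conserved quantity [N] + [U]
    n = n0
    trajectory = [n]
    for _ in range(steps):
        n += dt * folding_rhs(n, k_f, k_u, total)
        trajectory.append(n)
    return trajectory


# Hypothetical parameters: start fully unfolded and integrate to t = 5.
trajectory = euler_integrate(n0=0.0, u0=1.0, k_f=2.0, k_u=0.5, dt=0.001, steps=5000)
n_final = trajectory[-1]
u_final = 1.0 - n_final  # [U] recovered from the mass balance
print(f"[N] = {n_final:.4f}, [U] = {u_final:.4f}")
```

Because $[U]$ is always recovered as the total minus $[N]$, the two populations sum to the initial total at every step by construction.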

## Solving for $[N]$

The differential equation above is a linear first-order ODE. With the initial condition $[N](t = 0) = [N]_0$, it has the following solution.

$$\large{[N](t) = \frac{k_f([N]_0 + [U]_0)}{(k_f + k_u)} + \left([N]_0 - \frac{k_f([N]_0 + [U]_0)}{(k_f + k_u)}\right)e^{-(k_f + k_u)t}}$$ 
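
For readers who want the intermediate steps, this expression can be obtained with an integrating factor. The shorthand $C = [N]_0 + [U]_0$ and $k = k_f + k_u$ is introduced here only for brevity and is not part of the notation used elsewhere. Collecting the $[N]$ terms of the ODE on the left gives:

$$\large{\frac{d[N]}{dt} + k[N] = k_f C}$$

Multiplying both sides by $e^{kt}$ turns the left-hand side into a total derivative:

$$\large{\frac{d}{dt}\left([N]e^{kt}\right) = k_f C e^{kt}}$$

Integrating from $0$ to $t$ with $[N](0) = [N]_0$ gives:

$$\large{[N](t)e^{kt} - [N]_0 = \frac{k_f C}{k}\left(e^{kt} - 1\right)}$$

Dividing by $e^{kt}$ and regrouping recovers the expression above once $C$ and $k$ are expanded:

$$\large{[N](t) = \frac{k_f C}{k} + \left([N]_0 - \frac{k_f C}{k}\right)e^{-kt}}$$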

As $t \rightarrow \infty$, the exponential term vanishes and the function approaches the following limit, which we define as $[N]_{eq}$. This is the expected folded population once equilibrium is reached.

$$\large{[N](t \rightarrow \infty) = [N]_{eq} = \frac{k_f([N]_0 + [U]_0)}{(k_f + k_u)}}$$ 

Finally, we can substitute $[N]_{eq}$ to simplify the solution above. The resulting expression below will be used to generate the data for training our neural networks.

$$\large{[N](t) = [N]_{eq} + \left([N]_0 - [N]_{eq}\right)e^{-(k_f + k_u)t}}$$ 
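
A minimal sketch of how the closed-form solution above could be turned into data is shown below. The function name `folded_population`, the parameter values, the time grid, and the added Gaussian noise are all assumptions made for illustration; the data generation actually used for training may differ.

```python
import numpy as np


def folded_population(t, n0, u0, k_f, k_u):
    """Closed-form [N](t) for the two-state model."""
    n_eq = k_f * (n0 + u0) / (k_f + k_u)  # equilibrium folded population
    return n_eq + (n0 - n_eq) * np.exp(-(k_f + k_u) * t)


# Hypothetical setup: start fully unfolded and sample 50 time points up to t = 5.
t = np.linspace(0.0, 5.0, 50)
n_true = folded_population(t, n0=0.0, u0=1.0, k_f=2.0, k_u=0.5)

# Optional measurement noise (an assumption, not part of the kinetic model).
rng = np.random.default_rng(0)
n_noisy = n_true + rng.normal(scale=0.01, size=t.shape)
```

Here `n_true` starts at $[N]_0$ and relaxes toward $[N]_{eq}$, while `n_noisy` mimics imperfect measurements of the same curve.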


## Why apply a neural network to a known ODE?

The purpose is to test whether black-box methods can be taught to make predictions that adhere to physical laws. Having a known solution to this ODE is convenient: we can generate a dataset to train the model and then check whether the neural network's predictions respect the physical laws and constraints we see in the real world.
