# Protein Folding Application: Two State Folding

This notebook demonstrates an application of a two-state reversible kinetic model. We will use a physics-informed neural network (PINN) to predict how much folded protein is present at a given time $t$.

In [None]:
import torch
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

## The Two State Model

The simplest kinetic model of protein folding is the two-state model. The two states are the unfolded state $U$ and the folded state $N$ of the protein. This model is relevant for single-domain proteins, which are small enough to fold and unfold reversibly without external assistance. It assumes that the protein has no known intermediates (stable structures that are neither completely folded nor completely unfolded).

$$\require{mhchem} \Large{\ce{U <=>[\, k_f \,][\, k_u \,] N}}$$

Two rate constants drive this process: the rate constant of folding $k_f$ and the rate constant of unfolding $k_u$. The equilibrium constant $K_{eq}$ is the ratio of these two rate constants.

$$\large{K_{eq} = \frac{k_u}{k_f}}$$

$K_{eq}$ is a key value because it determines the stability of the single-domain protein, $\Delta G^{UN}$, at a given temperature $T$ in kelvin, where $R$ is the gas constant.

$$\large{\Delta G^{UN} = -RT\ln(K_{eq})}$$
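As a quick numerical illustration of the two formulas above, here is a minimal sketch that follows this notebook's definition $K_{eq} = k_u / k_f$. The rate constants, the temperature, and the function names are assumptions chosen only for the example; they do not come from measured data.

```python
import numpy as np

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def equilibrium_constant(k_f, k_u):
    """K_eq as defined in this notebook: the ratio k_u / k_f."""
    return k_u / k_f


def stability(k_f, k_u, T=298.15):
    """Delta G^{UN} = -R * T * ln(K_eq), returned in kJ/mol."""
    return -R * T * np.log(equilibrium_constant(k_f, k_u))


# Hypothetical rate constants (1/s), chosen only for illustration.
k_f, k_u = 10.0, 0.1
K_eq = equilibrium_constant(k_f, k_u)
dG = stability(k_f, k_u)
print(f"K_eq = {K_eq:.3g}, dG_UN = {dG:.2f} kJ/mol")
```

With these example values folding is faster than unfolding, so $K_{eq} < 1$ and the formula yields a positive $\Delta G^{UN}$.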

The change in the folded population under this kinetic model is described by the following differential equation.

$$\large{\frac{d[N]}{dt} = k_f[U] - k_u[N]}$$

The following mass balance is enforced, where $[N]_0$ and $[U]_0$ are the initial populations of folded and unfolded protein, respectively. This is effectively conservation of mass: protein is neither created nor destroyed.

$$\large{[N]_0 + [U]_0 = [N] + [U]}$$ 

The mass balance lets us substitute for $[U]$ and write the ODE entirely in terms of $[N]$, so we only need to track the change of a single population.

$$\large{\frac{d[N]}{dt} = k_f([N]_0 + [U]_0 - [N]) - k_u[N]}$$

Tracking only the folded population is sufficient because the mass balance implies that the change in $[U]$ is the opposite of the change in $[N]$.

$$\large{\frac{d[N]}{dt} = -\frac{d[U]}{dt}}$$
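The single-population ODE can also be integrated numerically, which gives a way to check the mass balance in practice. The sketch below uses a hand-written classical Runge-Kutta (RK4) stepper in NumPy; the function names `dN_dt` and `integrate_rk4`, the rate constants, and the initial populations are hypothetical choices for illustration, not part of the original notebook.

```python
import numpy as np


def dN_dt(N, k_f, k_u, total):
    """Right-hand side of the ODE after substituting [U] = total - [N]."""
    return k_f * (total - N) - k_u * N


def integrate_rk4(N0, U0, k_f, k_u, t_end, n_steps):
    """Integrate d[N]/dt with the classical fourth-order Runge-Kutta method."""
    total = N0 + U0  # conserved total population
    dt = t_end / n_steps
    t = np.linspace(0.0, t_end, n_steps + 1)
    N = np.empty(n_steps + 1)
    N[0] = N0
    for i in range(n_steps):
        k1 = dN_dt(N[i], k_f, k_u, total)
        k2 = dN_dt(N[i] + 0.5 * dt * k1, k_f, k_u, total)
        k3 = dN_dt(N[i] + 0.5 * dt * k2, k_f, k_u, total)
        k4 = dN_dt(N[i] + dt * k3, k_f, k_u, total)
        N[i + 1] = N[i] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    U = total - N  # the unfolded population follows from the mass balance
    return t, N, U


# Hypothetical values: all protein starts unfolded.
t, N, U = integrate_rk4(N0=0.0, U0=1.0, k_f=2.0, k_u=0.5, t_end=5.0, n_steps=500)
print(N[-1], U[-1])
```

Because $[U]$ is reconstructed from the mass balance, `N + U` stays equal to the initial total at every step, and any increase in `N` is matched by an equal decrease in `U`.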

## Solving for $[N]$

The ODE above is a linear first-order ODE. Its general solution is given below, where $c$ is a constant that can be determined from an initial condition.

$$\large{[N](t) = \frac{k_f([N]_0 + [U]_0)}{(k_f + k_u)} + ce^{-(k_f + k_u)t}}$$

For example, suppose a protein in acidic solution is returned to physiological pH. No folded protein is present at $t = 0$, so the initial condition is $[N](t = 0) = [N]_0 = 0$. Applying the initial condition $[N](0) = [N]_0$ lets us solve for $c$, which gives the specific solution for $[N]$, written here for a general $[N]_0$.

$$\large{[N](t) = \frac{k_f([N]_0 + [U]_0)}{(k_f + k_u)} + \left([N]_0 - \frac{k_f([N]_0 + [U]_0)}{(k_f + k_u)}\right)e^{-(k_f + k_u)t}}$$ 
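For reference, the constant $c$ in this solution follows from evaluating the general solution at $t = 0$, where the exponential equals $1$:

$$\large{[N]_0 = \frac{k_f([N]_0 + [U]_0)}{(k_f + k_u)} + c \quad \Rightarrow \quad c = [N]_0 - \frac{k_f([N]_0 + [U]_0)}{(k_f + k_u)}}$$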

As $t \rightarrow \infty$, $[N](t)$ approaches the following limit, which we define as $[N]_{eq}$. This is the expected folded population once equilibrium is reached.

$$\large{[N](t \rightarrow \infty) = [N]_{eq} = \frac{k_f([N]_0 + [U]_0)}{(k_f + k_u)}}$$ 

Finally, we can substitute $[N]_{eq}$ to simplify the ODE solution above. The solution below will be used to generate data to train our neural networks.

$$\large{[N](t) = [N]_{eq} + \left([N]_0 - [N]_{eq}\right)e^{-(k_f + k_u)t}}$$ 
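A minimal sketch of how this closed-form solution could be used to generate training data is shown below. The function name `folded_population`, the rate constants, the time range, and the number of samples are all assumptions made for illustration; the notebook's actual data-generation settings may differ.

```python
import numpy as np


def folded_population(t, N0, U0, k_f, k_u):
    """Analytic solution [N](t) = N_eq + (N0 - N_eq) * exp(-(k_f + k_u) * t)."""
    N_eq = k_f * (N0 + U0) / (k_f + k_u)  # equilibrium folded population
    return N_eq + (N0 - N_eq) * np.exp(-(k_f + k_u) * t)


# Hypothetical settings: 50 random time points in [0, 5), all protein unfolded at t = 0.
rng = np.random.default_rng(0)
t_train = np.sort(rng.uniform(0.0, 5.0, size=50))
N_train = folded_population(t_train, N0=0.0, U0=1.0, k_f=2.0, k_u=0.5)
print(t_train[:3], N_train[:3])
```

The samples here are noise-free; measurement noise could be added afterwards if the goal is to mimic experimental data.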


## Why apply a neural network to a known ODE?

The purpose is to test whether black-box methods can in fact be taught to make predictions that adhere to physical laws. Having a known solution to this ODE is convenient: we can use it to generate a dataset to train the model, and to test whether neural networks can respect the physical laws and constraints we see in the real world.