# Fitting dynamical models in latent space
To run this notebook, you need:
- Pre-processed cytokine time series in the `data/processed/` folder
- The input weights of a neural network and the min and max cytokine concentrations used to scale the data, in `data/trained-networks`

Those files are available in the data repository hosted online, or you can generate them yourself from raw cytokine data using [`cytokine-pipeline`](https://github.com/tjrademaker/cytokine-pipeline).
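
As a rough illustration of how these two inputs are used further down, the sketch below min-max scales a toy array and projects it onto two latent space nodes with a weight matrix. All numbers and the names `x`, `x_min`, `x_max` and `weights` are made up for the example; in the notebook the min, max and weights are loaded from the files listed above rather than computed from the data.

```python
import numpy as np

# Toy data: 3 time points x 4 cytokines (hypothetical values)
x = np.array([[0.0, 2.0, 4.0, 1.0],
              [1.0, 3.0, 6.0, 2.0],
              [2.0, 4.0, 8.0, 3.0]])

# Assumption for this sketch only: min and max are taken from the toy data itself
x_min = x.min(axis=0)
x_max = x.max(axis=0)

# Min-max scaling: each cytokine is mapped to the interval [0, 1]
x_scaled = (x - x_min) / (x_max - x_min)

# Hypothetical input weights: 4 cytokines -> 2 latent space nodes
weights = np.array([[0.5, -0.2],
                    [0.1, 0.4],
                    [-0.3, 0.3],
                    [0.2, 0.1]])

# Projection to latent space: one (Node 1, Node 2) pair per time point
latent = x_scaled @ weights
print(latent.shape)
```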

## Definition of the different models 
### Notation
$LS_1, LS_2$: cytokine concentrations projected in latent space

$LS_1, LS_2$: cytokine integrals projected in latent space

## Constant velocity model
See supplementary information

## Constant force model
Intermediate model between constant velocity and force model with matching; removed from the paper. 

## Force model with matching ("Sigmoid")
There are two versions: "Sigmoid_freealpha", in which the kinetic rate parameters $\alpha$ and $\beta$ are fitted separately, and "Sigmoid_fixalpha" (selected with `fit = "Sigmoid"` in the code), in which $\alpha$ is fixed at $\frac{1}{20} \, \mathrm{h^{-1}}$ and $\gamma = \beta / \alpha$ is fitted.

### Equations for $LS_1$, $LS_2$ (concentrations)
The equations are similar for both nodes, but to capture the initial dip in node 2, we need to square the bounded exponential $(1 - e^{-\alpha t})$ of the first phase.

$$ LS_1(t) = \left(1 - e^{-\alpha t} \right) \left( \frac{a_0 \cos{\theta} + v_1}{e^{\beta(t - t_0)} + 1} - v_1 \right)$$

$$ LS_2(t) = \left(1 - e^{-\alpha t} \right) \left( \frac{(a_0 \sin{\theta} + v_2)(1 - e^{-\alpha t})}{e^{\beta(t - t_0)} + 1} - v_2 \right)$$
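
A minimal sketch of these two equations in Python, for scalar times only. The function names `ls1` and `ls2` and all parameter values are hypothetical and chosen for illustration; this is not the fitting code from `ltspcyt`.

```python
import math

def ls1(t, a0, t0, theta, v1, alpha, beta):
    """Concentration along node 1 at time t."""
    rise = 1.0 - math.exp(-alpha * t)                 # bounded exponential, first phase
    decay = 1.0 / (math.exp(beta * (t - t0)) + 1.0)   # sigmoidal switch around t0
    return rise * ((a0 * math.cos(theta) + v1) * decay - v1)

def ls2(t, a0, t0, theta, v2, alpha, beta):
    """Concentration along node 2: the bounded exponential multiplies the first term twice."""
    rise = 1.0 - math.exp(-alpha * t)
    decay = 1.0 / (math.exp(beta * (t - t0)) + 1.0)
    return rise * ((a0 * math.sin(theta) + v2) * rise * decay - v2)

# Hypothetical parameter values, for illustration only
params = dict(a0=2.0, t0=20.0, theta=0.6, alpha=0.05, beta=0.5)
for t in (0.0, 10.0, 30.0, 72.0):
    print(t, ls1(t, v1=0.1, **params), ls2(t, v2=0.1, **params))
```

Both functions vanish at $t = 0$ and tend to $-v_1$ and $-v_2$ respectively at long times, once the sigmoidal term has switched off.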

### Equations for $LS_1$, $LS_2$ (integrals)
The equations above can be integrated analytically, so we can fit the integrals first. Define $\tau = \alpha t$, $\tau_0 = \alpha t_0$, and $\gamma = \beta / \alpha$. The result is


$$ LS_1(t) = \frac{a_0 \cos{\theta}+ v_1}{\alpha} \left( I(\tau, \tau_0, \gamma) - \frac{1}{\gamma} \ln{\left(e^{-\gamma \tau} + e^{-\gamma \tau_0}  \right)} \right)  
- \frac{v_1}{\alpha} \left(\tau + e^{-\tau} \right) + K_1 $$

$$ LS_2(t) = \frac{a_0 \sin{\theta} + v_2}{\alpha} \left( 2I(\tau, \tau_0, \gamma) - \tfrac12 I(2 \tau, 2\tau_0, \tfrac{\gamma}{2}) - \tfrac{1}{\gamma} \ln{\left(e^{-\gamma \tau} + e^{-\gamma \tau_0}  \right)} \right)  
- \frac{v_2}{\alpha} \left(\tau + e^{-\tau} \right) + K_2 $$ 

where the $K_i$ are chosen to ensure $LS_i(0) = 0$. The complicated part is the integral $I(\tau, \tau_0, \gamma)$, which is

$$ I(\tau, \tau_0, \gamma) = \int \mathrm{d} \tau \frac{-e^{-\tau}}{e^{\gamma (\tau - \tau_0) }+ 1} $$

The special case $n = \frac{1}{\gamma} \in \mathbb{N}^+$ can be solved using partial fractions:


$$ I(\tau) = e^{-\tau} + e^{-\tau} \sum_{j=1}^{n-1} \frac{n}{n-j} (-1)^j e^{(\tau - \tau_0)\frac{j}{n}} \\
+ (-1)^n n e^{-\tau_0} \ln{ \left(e^{-\tau/n} + e^{-\tau_0 / n} \right)} $$
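
The closed form above can be checked numerically. The sketch below implements it for $\gamma = 1/n$, compares its finite-difference derivative with the integrand of $I$, and then uses it in the integral of $LS_1$ given earlier, comparing against a trapezoid integration of the $LS_1$ concentration. The function names and parameter values are hypothetical, and subtracting the antiderivative at $\tau = 0$ stands in for the constant $K_1$.

```python
import math

def integrand(tau, tau0, gamma):
    """Integrand of I(tau, tau0, gamma)."""
    return -math.exp(-tau) / (math.exp(gamma * (tau - tau0)) + 1.0)

def I_special(tau, tau0, n):
    """Closed form of I for gamma = 1/n, with n a positive integer."""
    total = math.exp(-tau)
    for j in range(1, n):
        total += (math.exp(-tau) * n / (n - j) * (-1) ** j
                  * math.exp((tau - tau0) * j / n))
    total += ((-1) ** n * n * math.exp(-tau0)
              * math.log(math.exp(-tau / n) + math.exp(-tau0 / n)))
    return total

def ls1_concentration(t, a0, t0, theta, v1, alpha, beta):
    rise = 1.0 - math.exp(-alpha * t)
    return rise * ((a0 * math.cos(theta) + v1) / (math.exp(beta * (t - t0)) + 1.0) - v1)

def ls1_integral(t, a0, t0, theta, v1, alpha, n):
    """Analytical integral of LS_1 from 0 to t, for beta = alpha / n."""
    gamma, tau0 = 1.0 / n, alpha * t0

    def antiderivative(tau):
        log_term = math.log(math.exp(-gamma * tau) + math.exp(-gamma * tau0)) / gamma
        return ((a0 * math.cos(theta) + v1) / alpha * (I_special(tau, tau0, n) - log_term)
                - v1 / alpha * (tau + math.exp(-tau)))

    # Subtracting the value at tau = 0 plays the role of the constant K_1
    return antiderivative(alpha * t) - antiderivative(0.0)

# Check 1: the derivative of the closed form should match the integrand
tau, tau0, h = 0.7, 1.5, 1e-5
deriv_errors = []
for n in (1, 2, 3):
    deriv = (I_special(tau + h, tau0, n) - I_special(tau - h, tau0, n)) / (2 * h)
    deriv_errors.append(abs(deriv - integrand(tau, tau0, 1.0 / n)))

# Check 2: analytical integral of LS_1 vs trapezoid rule (hypothetical parameters)
params = dict(a0=2.0, t0=20.0, theta=0.6, v1=0.1, alpha=0.05)
n, t_end, steps = 2, 40.0, 4000
dt = t_end / steps
values = [ls1_concentration(i * dt, beta=params["alpha"] / n, **params)
          for i in range(steps + 1)]
numeric = dt * (sum(values) - 0.5 * (values[0] + values[-1]))
analytic = ls1_integral(t_end, n=n, **params)
print(max(deriv_errors), abs(numeric - analytic))
```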

For the general case, Mathematica suggests the following answer, valid when $\frac{1}{\gamma} \notin \mathbb{N}^+$:

$$ I(\tau) = e^{-\tau} {}_2F_1(1, \frac{-1}{\gamma}; 1 - \frac{1}{\gamma}; -e^{\gamma(\tau - \tau_0)}) $$
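
To make the structure of this result concrete, the sketch below evaluates it with the power series ${}_2F_1(1, b; b+1; z) = \sum_k \frac{b}{b+k} z^k$, where $b = -1/\gamma$, and checks that its derivative matches the integrand of $I$. This is only an illustration: the series converges only for $|z| < 1$, that is $\tau < \tau_0$, and the function name `I_series` and the numerical values are hypothetical. A library routine such as `scipy.special.hyp2f1(a, b, c, z)` would be the natural choice for the full range.

```python
import math

def I_series(tau, tau0, gamma, n_terms=200):
    """I(tau) from the power series of 2F1(1, b; b + 1; z), with b = -1/gamma.

    Only valid for tau < tau0 (|z| < 1) and 1/gamma not a positive integer.
    """
    b = -1.0 / gamma
    z = -math.exp(gamma * (tau - tau0))
    hyp = sum(b / (b + k) * z ** k for k in range(n_terms))
    return math.exp(-tau) * hyp

def integrand(tau, tau0, gamma):
    """Integrand of I(tau, tau0, gamma)."""
    return -math.exp(-tau) / (math.exp(gamma * (tau - tau0)) + 1.0)

# Finite-difference check with hypothetical values (1/gamma = 2.5 is not an integer)
tau, tau0, gamma, h = 0.7, 1.5, 0.4, 1e-5
deriv = (I_series(tau + h, tau0, gamma) - I_series(tau - h, tau0, gamma)) / (2 * h)
print(deriv, integrand(tau, tau0, gamma))
```

The denominator $b + k$ vanishes for some $k$ exactly when $1/\gamma$ is a positive integer, which is why that case needs the separate partial fractions result.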

where ${}_2F_1$ is Gauss' hypergeometric function. This result can be derived from ${}_2F_1(0, b; c; z) = 1$ and from the recurrence relation
(NIST's *Digital Library of Mathematical Functions*, eq. 15.5.20)

$$ z(1-z) \frac{d}{dz} {}_2F_1(a, b; c; z) = (c-a) \, {}_2F_1(a-1, b; c; z) \\ + (a -c +bz) \, {}_2F_1(a, b; c; z) $$
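
As a sketch of that derivation, set $a = 1$, $b = -\frac{1}{\gamma}$, $c = 1 - \frac{1}{\gamma}$, and write $F(z) = {}_2F_1(1, b; c; z)$ with $z = -e^{\gamma(\tau - \tau_0)}$. Then $c - a = b$, $a - c + bz = -b(1 - z)$, and ${}_2F_1(0, b; c; z) = 1$, so the relation reduces to

$$ z \frac{dF}{dz} = \frac{b}{1 - z} - b F $$

Using $\frac{dz}{d\tau} = \gamma z$ and $\gamma b = -1$, the derivative of the proposed answer is

$$ \frac{d}{d\tau} \left( e^{-\tau} F \right) = e^{-\tau} \left( -F + \gamma z \frac{dF}{dz} \right) = \frac{-e^{-\tau}}{1 - z} = \frac{-e^{-\tau}}{e^{\gamma(\tau - \tau_0)} + 1} $$

which is the integrand of $I(\tau, \tau_0, \gamma)$.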

In [None]:
import warnings
warnings.filterwarnings("ignore")

import pickle
import sys, os
from time import perf_counter  # For timing
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

In [None]:
from ltspcyt.scripts.adapt_dataframes import set_standard_order, sort_SI_column
from ltspcyt.scripts.latent_space import import_mutant_output
from ltspcyt.scripts.neural_network import import_WT_output

# Curve fitting functions
from ltspcyt.scripts.sigmoid_ballistic import return_param_and_fitted_latentspace_dfs

In [None]:
from scipy.special import seterr as special_seterr
special_seterr(loss="warn");

In [None]:
%matplotlib inline

In [None]:
peptides=["N4", "Q4", "T4", "V4", "G4", "E1", "A2", "Y3", "A8", "Q7"]
concentrations=["1uM","100nM","10nM","1nM"]
fit_vars = {
    "Constant velocity": ["v0", "t0", "theta", "vt"],
    "Constant force": ["F", "t0", "theta", "vt"],
    "Sigmoid": ["a0", "tau0", "theta", "v1", "gamma"],
    "Sigmoid_freealpha": ["a0", "tau0", "theta", "v1", "alpha", "beta"],
}

In [None]:
# Import data and neural network parts needed to fit models
df_WT = import_WT_output()
minmaxfile = os.path.join("data", "trained-networks", "min_max-thomasRecommendedTraining.hdf")
df_min = pd.read_hdf(minmaxfile, key="df_min")
df_max = pd.read_hdf(minmaxfile, key="df_max")
projmat = np.load(os.path.join("data", "trained-networks", "mlp_input_weights-thomasRecommendedTraining.npy"))
print(df_min)

In [None]:
cytokines=df_min.index.get_level_values("Cytokine")
times=np.arange(1,73)

In [None]:
df=df_WT.unstack("Time").loc[:,("integral",cytokines,times)].stack("Time")
df=(df - df_min)/(df_max - df_min)
df_proj=pd.DataFrame(np.dot(df, projmat),index=df.index,columns=["Node 1","Node 2"])
df_proj.columns.name = "Node"

In [None]:
# Defining the dataset(s) to use
# Multiple datasets to populate parameter space better
# Activation_2 is the 100k T cells condition of the old combined experiment Activation_TCellNumber_1,
# the naive data of which is in TCellNumber_3. Hence, they share the same data for 100k naive T cells.
# Only TCellNumber_3 is needed; do not use Activation_2 unless you want to look at blast cells specifically.
subset_exp = [
    "Activation_1", "Activation_3",  
    "PeptideComparison_1", "PeptideComparison_2", "PeptideComparison_3", "PeptideComparison_4",
    "TCellNumber_1", "TCellNumber_2", "TCellNumber_3", "TCellNumber_4",
    "HighMI_1-1", "HighMI_1-2", "HighMI_1-3", "HighMI_1-4"
]
df_proj_exp = df_proj.loc[subset_exp]

In [None]:
# Model choice and curve fit of that model
# Advice: run twice, once with "Sigmoid_freealpha" and regul_rate = 0.4,
# once with "Constant velocity" and regul_rate = 1.0
# Save both dataframes of fitted parameters, they are used in other parts of this project. 
# There is also a supplementary figure using fits from Sigmoid (fixed alpha), regul_rate 0.4. 
# (to show the impact of fitting alpha too or keeping it constant)

fit = "Sigmoid_freealpha"
#fit = "Sigmoid"
regul_rate = 0.4
#fit = "Constant velocity"
#regul_rate = 1.0

# File names specification for that model and regularization rate
name_specs = "{}_reg{}".format(fit.replace(" ", "_"), str(round(regul_rate, 2)).replace(".", ""))

start_time = perf_counter()

ret = return_param_and_fitted_latentspace_dfs(df_proj_exp, fit, reg_rate=regul_rate)
df_params, df_compare, df_hess, df_v2v1 = ret

end_time = perf_counter()
print("Time to fit (s): ", end_time - start_time)
del start_time, end_time

nparameters = len(fit_vars[fit])
print(df_hess.median())  # Hessian matrix: inverse of the covariance. 

In [None]:
# Plotting the fit of concentrations vs time
dataset = subset_exp[1]  # Select the data set to plot here
tcellnum = "100k"
df_compare_sel = df_compare.xs(tcellnum, level="TCellNumber", axis=0).xs(dataset, level="Data", axis=0)
print(df_compare_sel.index.names)
df_compare_sel.columns.names = ["Variable"]
data=df_compare_sel.loc[(peptides,concentrations,slice(None),slice(None),"concentration"),:]
h=sns.relplot(data=data.stack().reset_index(),x="Time",y=0,kind="line",sort=False,
            hue="Peptide",hue_order=peptides,
            col="Concentration",col_order=concentrations,row="Variable",
            style="Processing type", height=3.5)
#h.fig.savefig(os.path.join("figures", "fits", 
#   "concentrations_{}_{}.pdf".format(name_specs, dataset)), transparent=True)
plt.show()
plt.close()

In [None]:
# Plotting the fit of integrals vs time
data=df_compare_sel.loc[(peptides,concentrations,slice(None),slice(None),"integral"),:]
h=sns.relplot(data=data.stack().reset_index(),x="Time",y=0,kind="line",sort=False,
            hue="Peptide",hue_order=peptides,
            col="Concentration",col_order=concentrations,row="Variable",
            style="Processing type", height=3.5)
#h.fig.savefig(os.path.join("figures", "fits", 
#   "integrals_{}_{}.pdf".format(name_specs, dataset)), transparent=True)
plt.show()
plt.close()

In [None]:
# Plotting the latent space ballistic trajectories LS_1 vs LS_2
data=df_compare_sel.loc[(peptides,concentrations,slice(None),slice(None),"integral"),:]
h=sns.relplot(data=data.reset_index(), x="Node 1",y="Node 2", kind="line", sort=False,
            hue="Peptide",hue_order=peptides,
            size="Concentration",size_order=concentrations,
            style="Processing type")
#h.fig.savefig(os.path.join("figures", "fits", 
#   "LS1_vs_LS2_{}_{}.pdf".format(name_specs, dataset)), transparent=True)
plt.show()
plt.close()

In [None]:
# Uncomment to save df_compare and df_params to be reused elsewhere for plotting, e.g. main_plotting_scripts/
#df_compare.to_hdf(os.path.join("results", "fits", "df_compare_{}_selectdata.hdf".format(name_specs)), key="df_compare", mode="w")
#df_params.to_hdf(os.path.join("results", "fits", "df_params_{}_selectdata.hdf".format(name_specs)), key="df_params", mode="w")