### Collective Variables

Collective variables (CVs) are structural parameters that can be calculated throughout a simulation.  
They are mathematical functions that map the high-dimensional atomic coordinates of a molecular system onto a lower-dimensional representation of the system's state or its progress along a reaction pathway.

* Simplify complex dynamics: Instead of analyzing all the coordinates, CVs let us focus on a few important features.
* Quantify structural changes: They capture meaningful changes such as conformational shifts, ligand binding, or chemical reactions.

In MD simulations, the system evolves according to variations in the energy landscape, which is defined by the chosen force field. When you bias a simulation using a CV, the resulting change to the landscape must be accounted for to calculate the forces accurately, which requires the gradient of the CV with respect to the atomic positions. JAX is therefore used here for its efficient, vectorized, and parallel computation of CV gradients across all frames.

These CVs are implemented within PLUMED enhanced sampling methods via PYCV, a Python-based interface that facilitates the rapid prototyping of new CVs for analyzing trajectories or for use within biasing functions.



This repository contains:
* Examples of differentiated collective variables.
* Tests for JAX automatic differentiation on edge cases such as the `max`, `min`, and `sort` functions.


# **Collective Variables in Molecular Dynamics**

### **Overview**
Collective variables (CVs) are structural parameters that can be computed throughout a molecular dynamics (MD) simulation. They serve as mathematical functions that map high-dimensional atomic coordinates into a lower-dimensional representation, capturing key aspects of the system's state or progress along a reaction pathway.

### **Why Use Collective Variables?**
- **Simplify Complex Dynamics**: Instead of analyzing all atomic coordinates, CVs focus on essential features that describe the system's behavior.
- **Quantify Structural Changes**: They provide insight into meaningful transformations, such as ligand binding, conformational shifts, or chemical reactions.

### **CVs in Enhanced Sampling & PLUMED Integration**
In MD simulations, the system evolves according to variations in the **energy landscape**, which is determined by the force field. When biasing a simulation using a CV, the landscape modification must be accounted for to correctly compute forces.

This repository leverages **JAX** for **efficient, vectorized, and parallel computation** of CV gradients across all frames. The CVs are implemented within **PLUMED** enhanced sampling methods via **PYCV**, a Python-based interface that enables rapid prototyping of new CVs for trajectory analysis or biasing functions.
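As a minimal sketch of what vectorized gradient computation across frames can look like, the example below differentiates a simple interatomic distance with `jax.value_and_grad` and maps it over a frame axis with `jax.vmap`. The function name `distance_cv`, the atom indices, and the toy coordinates are assumptions made for illustration; they are not taken from this repository.

```python
import jax
import jax.numpy as jnp


def distance_cv(positions, i=0, j=1):
    """Distance between atoms i and j for one frame of shape (n_atoms, 3)."""
    return jnp.linalg.norm(positions[i] - positions[j])


# Value and gradient for a single frame, then vectorized over the frame axis.
value_and_grad = jax.value_and_grad(distance_cv)
batched_value_and_grad = jax.jit(jax.vmap(value_and_grad))

# Two toy frames with two atoms each (hypothetical coordinates).
frames = jnp.array([
    [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]],
])

values, grads = batched_value_and_grad(frames)
# values has shape (n_frames,); grads has shape (n_frames, n_atoms, 3)
```

Here `grads[f, a]` is the derivative of the CV in frame `f` with respect to the position of atom `a`, which is the quantity a biasing method needs in order to turn a bias on the CV into forces on atoms.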

## **Contents of This Repository**
- **Differentiated Collective Variable Examples**: Implementations of CVs with automatic differentiation.
- **JAX Auto-Diff Tests**: Edge-case testing on operations such as `max`, `min`, and `sort` to ensure correct differentiation behavior.
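The kind of edge-case check described above might look like the following sketch. The helper names (`largest`, `smallest`, `sum_two_smallest`) and the input vectors are hypothetical, and the comment on tied maxima reflects behavior observed in JAX rather than a documented guarantee, so it is worth re-checking against the installed version.

```python
import jax
import jax.numpy as jnp


def largest(x):
    return jnp.max(x)


def smallest(x):
    return jnp.min(x)


def sum_two_smallest(x):
    # Sorting is differentiable: the gradient is routed back to the
    # original positions of the selected entries.
    return jnp.sort(x)[:2].sum()


x_distinct = jnp.array([3.0, 1.0, 2.0])
x_tied = jnp.array([1.0, 3.0, 3.0])

g_max = jax.grad(largest)(x_distinct)            # 1 at the largest entry
g_min = jax.grad(smallest)(x_distinct)           # 1 at the smallest entry
g_sort = jax.grad(sum_two_smallest)(x_distinct)  # 1 at the two smallest entries

# With tied maxima the function is not differentiable in the classical
# sense; JAX still returns a (sub)gradient whose entries sum to 1.
g_tied = jax.grad(largest)(x_tied)
```

Ties are the case most worth testing for a CV, because two atoms or distances swapping rank during a trajectory is exactly where the gradient can change discontinuously.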



# The Center of Curvature Collective Variable

## Introduction
This code defines a **collective variable** that finds the **center of curvature** of three points, that is, the center of the circle passing through them.  
The **center of the circle** is the intersection of the **perpendicular bisectors** of any two segments formed by the three points.  
Below is the **mathematical derivation**.

---

## **Theory**  

Given three points **$\mathbf{p}_1$**, **$\mathbf{p}_2$**, and **$\mathbf{p}_3$**,  
the **segment vectors** $\mathbf{v}_{12}$ and $\mathbf{v}_{23}$ can be constructed as:

$$
\mathbf{v}_{12} = \mathbf{p}_2 - \mathbf{p}_1
$$

$$
\mathbf{v}_{23} = \mathbf{p}_3 - \mathbf{p}_2
$$


To calculate the vectors **perpendicular** to these segments, we first define **$\mathbf{N}$**,  
a vector **perpendicular to the plane** formed by **$\mathbf{v}_{12}$** and **$\mathbf{v}_{23}$**:  


$$
\mathbf{N} = \mathbf{v}_{12} \times \mathbf{v}_{23}
$$

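The **direction vectors of the perpendicular bisectors**, **$\mathbf{v}_{12}^\perp$** and **$\mathbf{v}_{23}^\perp$**, lie in the plane of the three points and are perpendicular to their respective segments. They are calculated as: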

$$
\mathbf{v}_{12}^\perp = \mathbf{v}_{12} \times \mathbf{N}
$$

$$
\mathbf{v}_{23}^\perp = \mathbf{v}_{23} \times \mathbf{N}
$$



These vectors are then **normalized** to unit length; from here on, the same symbols denote the unit vectors:

$$
\mathbf{v}_{12}^\perp = \frac{\mathbf{v}_{12}^\perp}{\|\mathbf{v}_{12}^\perp\|}
$$
$$
\mathbf{v}_{23}^\perp = \frac{\mathbf{v}_{23}^\perp}{\|\mathbf{v}_{23}^\perp\|}
$$

The **parametric equations** of the perpendicular bisectors of the two segments have the form:

$$
\mathbf{x} = \mathbf{m}_{12} + t_1 \mathbf{v}_{12}^\perp
$$

$$
\mathbf{x} = \mathbf{m}_{23} + t_2 \mathbf{v}_{23}^\perp
$$

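where:
- **$\mathbf{m}_{12}$** and **$\mathbf{m}_{23}$** are the **midpoints** of the segments, $\mathbf{m}_{12} = (\mathbf{p}_1 + \mathbf{p}_2)/2$ and $\mathbf{m}_{23} = (\mathbf{p}_2 + \mathbf{p}_3)/2$.
- **$\mathbf{x}$** is the **center of curvature**, the point where the two bisectors intersect.

Setting the two expressions for $\mathbf{x}$ equal to each other gives: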

$$
t_1 \mathbf{v}_{12}^\perp - t_2 \mathbf{v}_{23}^\perp = \mathbf{m}_{23} - \mathbf{m}_{12}
$$

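This can be rewritten in **matrix form** as:

$$
\mathbf{A} \cdot \mathbf{t} = \mathbf{b}
$$

with $\mathbf{A} = [\mathbf{v}_{12}^\perp,\; -\mathbf{v}_{23}^\perp]$ (a $3 \times 2$ matrix), $\mathbf{t} = (t_1, t_2)^\top$, and $\mathbf{b} = \mathbf{m}_{23} - \mathbf{m}_{12}$. The system has three equations and two unknowns, so it is solved using the **least-squares method**.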
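The derivation above can be sketched in JAX as follows. This is an illustrative implementation under stated assumptions, not necessarily the repository's code: the function name `center_of_curvature` and the test points are hypothetical, and the least-squares step is done by solving the 2x2 normal equations (`jnp.linalg.lstsq` would be an alternative).

```python
import jax
import jax.numpy as jnp


def center_of_curvature(p1, p2, p3):
    """Center of the circle through three non-collinear 3D points."""
    v12 = p2 - p1
    v23 = p3 - p2
    n = jnp.cross(v12, v23)               # normal to the plane of the points

    # In-plane directions perpendicular to each segment, normalized.
    v12_perp = jnp.cross(v12, n)
    v23_perp = jnp.cross(v23, n)
    v12_perp = v12_perp / jnp.linalg.norm(v12_perp)
    v23_perp = v23_perp / jnp.linalg.norm(v23_perp)

    m12 = 0.5 * (p1 + p2)                 # segment midpoints
    m23 = 0.5 * (p2 + p3)

    # A t = b with A = [v12_perp, -v23_perp], b = m23 - m12.
    A = jnp.stack([v12_perp, -v23_perp], axis=1)   # shape (3, 2)
    b = m23 - m12
    t = jnp.linalg.solve(A.T @ A, A.T @ b)         # least squares via normal equations

    return m12 + t[0] * v12_perp


# Three points on a unit circle centered at (1, 2, 0) (hypothetical data).
p1 = jnp.array([2.0, 2.0, 0.0])
p2 = jnp.array([1.0, 3.0, 0.0])
p3 = jnp.array([0.0, 2.0, 0.0])

center = center_of_curvature(p1, p2, p3)           # approximately [1, 2, 0]
radius = jnp.linalg.norm(center - p1)              # approximately 1

# Derivatives of the center with respect to each point, as a biasing
# method would need them: a tuple of three (3, 3) Jacobians.
jacobians = jax.jacobian(center_of_curvature, argnums=(0, 1, 2))(p1, p2, p3)
```

The construction breaks down when the three points are collinear, because $\mathbf{N}$ vanishes and the normalization divides by zero; a production CV would need to guard against that case.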



# The ENGap Collective Variable

*Date: 28 December 2024*

The following document includes two sections:  
- Theory behind the code.  
- How to use the code.

---

## The Theory behind the code

A protein's dynamics can be followed through snapshots, or "frames," captured at successive time steps. In each frame the residues occupy particular positions: some cooperate toward folding, others resist it and create tension, and the role of the rest remains unclear.

This work operates under the premise that a protein's behavior is determined by its constituent residues. Building on the energy gap method, the **ENGap Collective Variable** was created for use in a biasing potential integrated into PLUMED.

The energy gap is defined as the difference between the smallest and second-smallest eigenvalues of the energy matrix, normalized by the average difference between consecutive eigenvalues. Each frame has an energy gap value, **ENG(t)**, and a spectral standard deviation, **SDENG(t)**. Frames with larger ENG(t) and smaller SDENG(t) may reflect a native protein state (see [Meli et al.](#meli2020)).

### Energy Gap Definitions

**ENG(t)** is defined as:

$$
\text{ENG}(t) = \frac{\Delta \lambda_{1-2}(t)}{\langle \Delta \lambda(t) \rangle}
$$

and the spectral standard deviation is given by:

$$
\text{SDENG}(t) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\lambda_i - \bar{\lambda}\right)^2}
$$
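As a small worked sketch of these two definitions, the functions below compute ENG and SDENG from a vector of eigenvalues. The function names and the example spectrum are hypothetical; the only assumptions are that $\Delta \lambda_{1-2}$ is the gap between the two smallest eigenvalues and that $\langle \Delta \lambda \rangle$ is the mean gap between consecutive sorted eigenvalues, as stated in the text.

```python
import jax.numpy as jnp


def energy_gap(eigvals):
    """ENG: gap between the two smallest eigenvalues over the mean gap."""
    lam = jnp.sort(eigvals)          # ascending order
    gaps = jnp.diff(lam)             # differences between consecutive eigenvalues
    return gaps[0] / jnp.mean(gaps)


def spectral_std(eigvals):
    """SDENG: population standard deviation (1/n) of the eigenvalues."""
    return jnp.std(eigvals)


# Hypothetical spectrum for one frame.
eigvals = jnp.array([-4.0, -1.0, 0.0, 1.0])

eng = energy_gap(eigvals)       # gaps are [3, 1, 1], so ENG = 3 / (5/3) = 1.8
sdeng = spectral_std(eigvals)   # sqrt(14 / 4), about 1.871
```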

#### ENGap Collective Variable Computation

The Collective Variable (CV) is computed as:

$$
\text{CV}(t) = \alpha(t) \cdot \text{ENG}(t) - \beta(t) \cdot \text{SDENG}(t)
$$

where **$\alpha(t)$** and **$\beta(t)$** are weights modulating **ENG(t)** and **SDENG(t)**, respectively. These weights are adjusted dynamically based on the computed ENG and SDENG values:

- $\alpha \leftarrow 1.1 \times \alpha$ if $\text{ENG}(t) > \text{threshold}_{\text{ENG}}$
- $\beta \leftarrow 0.9 \times \beta$ if $\text{SDENG}(t) < \text{threshold}_{\text{SDENG}}$

The threshold values are defined in [Meli et al.](#meli2020).
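The weight update and the CV formula can be sketched as below. The function names are hypothetical, and the threshold values used in the example are placeholders chosen only to exercise both branches; they are not the values from Meli et al. Whether the CV for a frame uses the weights before or after the update is not specified above, so the example applying the updated weights is an assumption.

```python
import jax.numpy as jnp


def update_weights(alpha, beta, eng, sdeng, threshold_eng, threshold_sdeng):
    """Scale alpha up by 1.1 and beta down by 0.9 when the thresholds are crossed."""
    alpha = jnp.where(eng > threshold_eng, alpha * 1.1, alpha)
    beta = jnp.where(sdeng < threshold_sdeng, beta * 0.9, beta)
    return alpha, beta


def engap_cv(alpha, beta, eng, sdeng):
    """CV(t) = alpha(t) * ENG(t) - beta(t) * SDENG(t)."""
    return alpha * eng - beta * sdeng


# Placeholder inputs and thresholds (NOT the published threshold values).
eng, sdeng = 1.8, 1.87
alpha, beta = update_weights(1.0, 1.0, eng, sdeng,
                             threshold_eng=1.0, threshold_sdeng=1.0)

# ENG exceeds its threshold, so alpha becomes 1.1; SDENG does not fall
# below its threshold, so beta stays at 1.0.
cv = engap_cv(alpha, beta, eng, sdeng)   # 1.1 * 1.8 - 1.0 * 1.87 = 0.11
```

Using `jnp.where` rather than a Python `if` keeps the update compatible with `jax.jit` and `jax.vmap`, at the cost of evaluating both branches.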

To use this CV in a PLUMED biasing potential, it must first be differentiated with respect to the atomic positions $\mathbf{r}$:

$$
\frac{\partial \text{CV}}{\partial \mathbf{r}} = \alpha(t) \cdot \frac{\partial \text{ENG}}{\partial \lambda_i} \cdot \frac{\partial \lambda_i}{\partial M} \cdot \frac{\partial M}{\partial D} \cdot \frac{\partial D}{\partial \mathbf{r}} - \beta(t) \cdot \frac{\partial \text{SDENG}}{\partial \lambda_i} \cdot \frac{\partial \lambda_i}{\partial M} \cdot \frac{\partial M}{\partial D} \cdot \frac{\partial D}{\partial \mathbf{r}}
$$

**Where:**

- **$\alpha$:** Weight modulating the influence of ENG, indicating its sensitivity to eigenvalues.
- **$\beta$:** Weight modulating the influence of SDENG, indicating its sensitivity to eigenvalues.
- **$\lambda_i$:** The $i$-th eigenvalue of the energy matrix $M$.
- **$M$:** The energy matrix computed using Lennard-Jones and Coulombic interaction models.
- **$D$:** The matrix of pairwise Euclidean distances between atoms.
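With automatic differentiation the chain rule above does not have to be coded term by term: `jax.grad` propagates through positions, distances, energy matrix, eigenvalues, and the CV in one call. The sketch below illustrates this under loud assumptions: the single-`sigma`/`epsilon` Lennard-Jones form, the parameter values, the toy coordinates, and the charges are all placeholders, and the repository's actual energy matrix may be built differently. The weights $\alpha$ and $\beta$ are treated as constants during differentiation.

```python
import jax
import jax.numpy as jnp


def distance_matrix(r):
    """Pairwise distances D; the diagonal is set to 1 to avoid sqrt(0) gradients."""
    diff = r[:, None, :] - r[None, :, :]
    sq = jnp.sum(diff ** 2, axis=-1)
    return jnp.sqrt(sq + jnp.eye(r.shape[0]))


def energy_matrix(d, charges, sigma=0.3, epsilon=1.0, coulomb_k=138.935):
    """Assumed pair-energy matrix M: Lennard-Jones plus Coulomb, zero diagonal.

    sigma (nm) and epsilon (kJ/mol) are placeholder values; coulomb_k is the
    electric conversion factor in GROMACS units (kJ mol^-1 nm e^-2).
    """
    off_diagonal = 1.0 - jnp.eye(d.shape[0])
    sr6 = (sigma / d) ** 6
    lj = 4.0 * epsilon * (sr6 ** 2 - sr6)
    coulomb = coulomb_k * charges[:, None] * charges[None, :] / d
    return off_diagonal * (lj + coulomb)


def engap_cv(r, charges, alpha=1.0, beta=1.0):
    """CV = alpha * ENG - beta * SDENG, as a function of atomic positions."""
    m = energy_matrix(distance_matrix(r), charges)
    lam = jnp.linalg.eigvalsh(m)             # eigenvalues in ascending order
    gaps = jnp.diff(lam)
    eng = gaps[0] / jnp.mean(gaps)
    sdeng = jnp.std(lam)
    return alpha * eng - beta * sdeng


# Toy system of four atoms (hypothetical positions in nm and charges in e).
r = jnp.array([[0.0, 0.0, 0.0],
               [0.5, 0.1, 0.0],
               [0.2, 0.6, 0.1],
               [0.7, 0.5, 0.4]])
charges = jnp.array([0.4, -0.3, 0.2, -0.5])

cv_value = engap_cv(r, charges)
cv_gradient = jax.grad(engap_cv)(r, charges)   # dCV/dr, shape (n_atoms, 3)
```

One caveat: the derivative of an eigendecomposition is ill-defined when eigenvalues are degenerate, so frames with repeated eigenvalues of $M$ may produce non-finite gradients and deserve a dedicated test.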

---

## How to use the code

This code is designed to analyze molecular dynamics simulation data by processing GROMACS-generated TPR and XTC files using the MDAnalysis library. The core functionality is based on extracting protein structure and atomic properties from these files and converting them into arrays suitable for efficient computation with JAX.

### Code's Input Files

To use the code, the following files are needed:

```plaintext
tpr_file = "/path/to/your/tpr/file"
xtc_file = "/path/to/your/xtc/file"
lj_contents = "/path/to/your/tpr_contents.txt"
