## Frustratometer in Python
<a href="https://colab.research.google.com/github/engelberger/frustrapy/blob/main/FrustraPy_colab.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Introduction

The concept of frustration in proteins refers to the presence of conflicting interactions within the protein structure. These conflicts arise when local interactions are not optimally stabilizing, which leaves the structure with some degree of energetic frustration. What follows is a summary; please refer to the original papers for more details:

* [Protein Frustratometer 2: a tool to localize energetic frustration in protein molecules, now with electrostatics](https://academic.oup.com/nar/article/44/W1/W356/2499321)
* [FrustratometeR: an R-package to compute local frustration in protein structures, point mutants and MD simulations](https://academic.oup.com/bioinformatics/article/37/18/3038/6171179)


## There are three main types of frustration

* *Highly frustrated*: Highly frustrated regions in a protein are those where the local interactions are significantly destabilizing compared to what would be expected in an idealized, energetically minimized structure. These regions often play crucial roles in protein function, such as binding sites, allosteric sites, or regions involved in conformational changes. For example, in an enzyme, the active site might be highly frustrated to allow for substrate binding and catalysis, which require a certain degree of flexibility and adaptability.
* *Neutral*: Neutral regions in a protein are those where the local interactions are neither significantly stabilizing nor destabilizing. These regions may not directly contribute to protein function but are essential for maintaining the overall structural integrity of the protein. Neutral regions can serve as a buffer between highly frustrated and minimally frustrated regions, allowing for the necessary flexibility and stability balance within the protein.
* *Minimally frustrated*: Minimally frustrated regions in a protein are those where the local interactions are highly optimized and stabilizing. These regions typically form the stable core of the protein and are essential for maintaining the native folded state. Minimally frustrated regions often consist of hydrophobic residues that pack tightly together, forming a stable foundation for the protein structure. For example, in the case of globular proteins, the hydrophobic core is usually minimally frustrated, contributing to the overall stability of the folded state.
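
The three classes above can be read as bins over a per-contact or per-residue frustration score. The following is a minimal sketch of that binning, assuming the sign convention used by the formulas later in this notebook (larger scores mean more destabilizing interactions). The function name `classify_frustration` and both cutoff values are hypothetical placeholders chosen for illustration; they are not taken from frustrapy or from the papers cited above.

```python
def classify_frustration(score, high_cutoff=1.0, minimal_cutoff=-1.0):
    """Return the frustration class for one score.

    Assumes larger scores mean more destabilizing interactions.
    Both cutoffs are illustrative placeholders, not published values.
    """
    if score >= high_cutoff:
        return "highly frustrated"
    if score <= minimal_cutoff:
        return "minimally frustrated"
    return "neutral"


# Three made-up scores, one per class
scores = [1.5, 0.2, -1.8]
labels = [classify_frustration(s) for s in scores]
print(labels)
```

Keeping the cutoffs as parameters makes it easy to substitute the thresholds of whichever frustration index you actually work with.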

## Significance of Frustration in Proteins

* Protein folding: During the protein folding process, the polypeptide chain navigates through an energy landscape to reach its native state. The concept of minimal frustration suggests that evolution has optimized the folding landscape to minimize energetic conflicts, allowing proteins to fold efficiently and avoid getting trapped in non-native states.
* Allostery: Allosteric regulation in proteins often involves highly frustrated regions that undergo conformational changes upon ligand binding or other perturbations. These frustrated regions allow for the propagation of allosteric signals throughout the protein structure, enabling long-range communication and regulation of protein function.
* Protein-protein interactions: Protein interfaces often contain a mix of highly frustrated and minimally frustrated regions. The highly frustrated regions may contribute to the specificity and adaptability of the interaction, while the minimally frustrated regions provide stability to the complex. The balance between frustration and stability at the interface is crucial for the formation and regulation of protein complexes.

In [13]:
# @title Install
%pip install -q biopython igraph leidenalg
%pip install -q -U kaleido pandas scikit-learn matplotlib seaborn plotly


Note: you may need to restart the kernel to use updated packages.


In [11]:
# @title Frustratometer in Python
mode = "singleresidue" # @param ["configurational", "singleresidue", "mutational"]
pdbs_dir = "/content" # @param {type:"string"}
results_dir = "/home/ceramirez/github/frustrapy/Results_example" # @param {type:"string"}
example = True # @param {type:"boolean"}
overwrite = True # @param {type:"boolean"}
debug = False # @param {type:"boolean"}

import os

# Import the functions from the frustrapy package
import frustrapy.functions as frustrapy_functions

# If example is True, download the example PDB files and use the example paths
if example:
    !wget -q http://www.rcsb.org/pdb/files/1fhj.pdb -O 1fhj.pdb
    !wget -q http://www.rcsb.org/pdb/files/2dn1.pdb -O 2dn1.pdb
    !wget -q http://www.rcsb.org/pdb/files/1m6k.pdb -O 1m6k.pdb

    pdbs_dir = "/home/ceramirez/github/frustrapy"
    results_dir = "/home/ceramirez/github/frustrapy/Results_example"

# Remove any previous results; example runs always start from a clean directory
if example or overwrite:
    # Convert results_dir to an absolute path before deleting its contents
    results_dir = os.path.abspath(results_dir)
    os.system(f'rm -rf "{results_dir}"/*')

plots_dir_dict = frustrapy_functions.dir_frustration(
    pdbs_dir=pdbs_dir,
    mode=mode,
    results_dir=results_dir,
    debug=debug
)



The results directory /home/ceramirez/github/frustrapy/Results_example has been created.
-----------------------------Filtering-----------------------------
/home/ceramirez/github/frustrapy/Results_example/1fhj.done/1fhj.pdb
1fhj.pdb
/home/ceramirez/github/frustrapy/Results_example/1fhj.done/1fhj.pdb_equivalences.txt
HI
/home/ceramirez/github/frustrapy/Results_example/1fhj.done/1fhj.pdb
1fhj.pdb
/home/ceramirez/github/frustrapy/Results_example/1fhj.done/1fhj.pdb_equivalences.txt
-----------------------------Preparing files-----------------------------
Pdb file: 1fhj
Output file: 1fhj
-----------------------------Setting options-----------------------------
-----------------------------Calculating-----------------------------
LAMMPS (3 Mar 2020)
Reading data file ...
  orthogonal box = (-20000 -20000 -20000) to (20000 20000 20000)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  1722 atoms
  scanning bonds ...
  3 = max bonds/atom
  reading bonds ...
  2288 bonds
Finding 1-2 1-3 1
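
The cell above clears previous results by shelling out to `rm -rf`, which only works where that command is available. A portable alternative using only the standard library is sketched below. The helper name `clear_directory` is hypothetical and not part of frustrapy. Unlike `rm -rf dir/*`, this version also removes hidden files inside the directory.

```python
import shutil
from pathlib import Path


def clear_directory(directory):
    """Delete everything inside `directory`, keeping the directory itself."""
    root = Path(directory).resolve()
    # Create the directory if it does not exist yet, so callers can rely on it
    root.mkdir(parents=True, exist_ok=True)
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectories are removed recursively
            shutil.rmtree(entry)
        else:
            # Files and symlinks are unlinked without following the link
            entry.unlink()
    return root
```

With this helper, the overwrite step could become `results_dir = str(clear_directory(results_dir))`.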

### Plots

In [2]:
print(plots_dir_dict)

{'1fhj': {}, '2dn1': {}, '1m6k': {}}


In [18]:
# Show the figure objects returned for each PDB
for pdb, plots in plots_dir_dict.items():
    for plot_name, fig in plots.items():
        fig.show()

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

In [17]:
%pip install "nbformat>=4.2.0"

Note: you may need to restart the kernel to use updated packages.
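
If inline rendering keeps failing, the figures can be written to standalone HTML files instead. The sketch below assumes that the values stored in `plots_dir_dict` are Plotly figures, as the `fig.show()` call and the nbformat error suggest, and therefore expose Plotly's `write_html` method. The helper name `save_figures` and the output file naming are hypothetical.

```python
from pathlib import Path


def save_figures(plots_by_pdb, out_dir):
    """Write every figure in a {pdb: {plot_name: figure}} dict to an HTML file.

    Assumes each figure exposes Plotly's `write_html(path)` method.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdb, plots in plots_by_pdb.items():
        for name, fig in plots.items():
            # One file per figure, named after the PDB ID and the plot key
            path = out_dir / f"{pdb}_{name}.html"
            fig.write_html(str(path))
            written.append(path)
    return written
```

Calling `save_figures(plots_dir_dict, "plots_html")` returns the list of files written. An empty inner dictionary, as in the printed output above, produces no files.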


## Frustration modes you can calculate with the Frustratometer in this notebook

| Frustration Mode | Description | Mathematical Formula | Example Calculation |
|------------------|-------------|----------------------|---------------------|
| Configurational  | Compares the native energy of each contact in the protein to a set of decoy energies from random variants of the same contact. A contact is considered frustrated if its native energy is higher than the average of the decoys. | $F_c = \frac{E_n - \langle E_d \rangle}{\sigma_d}$ <br><br> $E_n$ = native energy of contact <br> $\langle E_d \rangle$ = mean energy of decoys <br> $\sigma_d$ = standard deviation of decoy energies | Native contact energy $E_n = -2.5$ <br> Mean decoy energy $\langle E_d \rangle = -5.2$ <br> Decoy std dev $\sigma_d = 1.8$ <br><br> $F_c = \frac{-2.5 - (-5.2)}{1.8} = 1.5$ <br><br> $F_c > 0$, so contact is frustrated |
| Mutational       | Compares the native energy of each contact to the average energy of all possible mutations of the amino acids forming that contact. A contact is considered frustrated if mutating it makes the energy more favorable on average. | $F_m = \frac{E_n - \langle E_m \rangle}{\sigma_m}$ <br><br> $E_n$ = native energy of contact <br> $\langle E_m \rangle$ = mean energy of all mutations <br> $\sigma_m$ = standard deviation of mutation energies | Native contact energy $E_n = -4.2$ <br> Mean mutation energy $\langle E_m \rangle = -6.8$ <br> Mutation std dev $\sigma_m = 2.1$ <br><br> $F_m = \frac{-4.2 - (-6.8)}{2.1} \approx 1.2$ <br><br> $F_m > 0$, so contact is frustrated |
| Single Residue   | Calculates the average frustration of all contacts a single residue is involved in. Residues whose contacts are frustrated on average are considered frustrated. | $F_r = \frac{1}{N} \sum_{i=1}^N F_{c,i}$ <br><br> $F_{c,i}$ = configurational frustration of $i$th contact <br> $N$ = total number of contacts residue is involved in | Residue involved in 3 contacts: <br> $F_{c,1} = 0.8$ <br> $F_{c,2} = 1.2$ <br> $F_{c,3} = -0.5$ <br><br> $F_r = \frac{1}{3}(0.8 + 1.2 - 0.5) = 0.5$ <br><br> $F_r > 0$, so residue is net frustrated |

In a nutshell:
- Configurational frustration compares native contact energy to decoys
- Mutational frustration compares native contact energy to average mutation energy  
- Single residue frustration averages configurational frustration over all of a residue's contacts

The key equations are:

$F_c = \frac{E_n - \langle E_d \rangle}{\sigma_d}$ (configurational)

$F_m = \frac{E_n - \langle E_m \rangle}{\sigma_m}$ (mutational)  

$F_r = \frac{1}{N} \sum_{i=1}^N F_{c,i}$ (single residue)

Where $E_n$ is the native energy, $\langle E_d \rangle$ and $\langle E_m \rangle$ are mean decoy and mutation energies, and $\sigma_d$ and $\sigma_m$ are the standard deviations of the decoy and mutation energy distributions.
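
The three equations can be checked numerically with a few lines of Python. The sketch below reproduces the worked examples from the table above. The function names are hypothetical and are not part of frustrapy, which computes these quantities internally.

```python
def frustration_score(native_energy, reference_mean, reference_std):
    """Normalized score (E_n - <E>) / sigma, used for both F_c and F_m."""
    if reference_std <= 0:
        raise ValueError("reference_std must be positive")
    return (native_energy - reference_mean) / reference_std


def single_residue_frustration(contact_scores):
    """Average the configurational scores of all contacts of one residue."""
    if not contact_scores:
        raise ValueError("at least one contact score is required")
    return sum(contact_scores) / len(contact_scores)


# Values taken from the example calculations in the table
f_c = frustration_score(-2.5, -5.2, 1.8)             # configurational
f_m = frustration_score(-4.2, -6.8, 2.1)             # mutational
f_r = single_residue_frustration([0.8, 1.2, -0.5])   # single residue

print(round(f_c, 2), round(f_m, 2), round(f_r, 2))  # prints: 1.5 1.24 0.5
```

All three values are positive, which matches the "frustrated" verdicts in the table under this notebook's sign convention.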

