## Biopython for checking residue depth in a structure

Notebook developed from looking into ['Trouble with Biopython residue depth'](https://www.biostars.org/p/9603764/) and making [msms_wrapper](https://github.com/rinikerlab/msms_wrapper) work with MyBinder.

Used sessions launched from [the 'binderized_no_PyVista' branch of my fork of `msms_wrapper`](https://github.com/fomightez/msms_wrapper/tree/binderized_no_PyVista) to develop this because more of the related software is already installed or present there. ([This link should continue to work to launch such a session](https://mybinder.org/v2/gh/fomightez/msms_wrapper/binderized_no_PyVista?labpath=examples.ipynb) even if I mess up the badge there.)

Also works in sessions launched from my [cl_demo-binder repo](https://github.com/fomightez/cl_demo-binder).

#### Get msms and unpack and install

This section handles getting msms from https://ccsb.scripps.edu/msms/downloads/. To reach that page by hand, click 'Download MSMS binaries here' [here](https://ccsb.scripps.edu/mgltools/) (the page referenced by the [Bio.PDB.ResidueDepth module](https://biopython.org/docs/1.75/api/Bio.PDB.ResidueDepth.html) documentation), and then click the 'Downloads' option on the navigation bar on the left side of the screen.

Some disclosures and addressing a licensing issue....

Note: While the Riniker Lab has put this wrapper under the MIT licence, `msms` is not. The msms_wrapper & effort to make it work with MyBinder-served Jupyter sessions are independent projects, and not affiliated with the `msms` program. You can obtain `msms` from https://ccsb.scripps.edu/msms/. For more information about the algorithm, see:

Sanner, M. F., Olson A.J. & Spehner, J.-C. (1996). Reduced Surface: An Efficient Way to Compute Molecular Surfaces. Biopolymers 38:305-320.

If you qualify for the license detailed there, you can run the next cell in this session to get and install msms:

In [1]:
msms_soft_zipped = "msms_i86_64Linux2_2.6.1.tar.gz"
import os
cwd = os.getcwd() #record current working directory silently; `cwd = !pwd` wouldn't be silent
!mkdir -p ~/.local/bin
%cd -q ~/.local/bin
!curl -OL https://ccsb.scripps.edu/msms/download/933/{msms_soft_zipped}
!tar xzf {msms_soft_zipped}
msms_soft_fn = msms_soft_zipped.replace("_i",".x").replace("Linux2_2","Linux2.2")[:-7]
!ln -s ~/.local/bin/{msms_soft_fn} msms
#Restore to initial current working directory
%cd -q {cwd} 
print("***`msms` has been installed and an alias set in the system path that matches what the `msms_wrapper.py` script expects.***")

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  579k  100  579k    0     0   459k      0  0:00:01  0:00:01 --:--:--  459k
***`msms` has been installed and an alias set in the system path that matches what the `msms_wrapper.py` script expects.***
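
The filename handling in the install cell is easy to misread, so here is a small standalone check of what those string operations produce. It only demonstrates the string manipulation; that the result matches the name of the binary inside the archive is an assumption carried over from the install cell.

```python
# Derive the unpacked binary's name from the archive name, as the install cell does.
msms_soft_zipped = "msms_i86_64Linux2_2.6.1.tar.gz"

msms_soft_fn = (
    msms_soft_zipped
    .replace("_i", ".x")               # "msms_i86..." becomes "msms.x86..."
    .replace("Linux2_2", "Linux2.2")   # underscore to dot in the version part
    [:-len(".tar.gz")]                 # drop the 7-character ".tar.gz" suffix
)
print(msms_soft_fn)  # msms.x86_64Linux2.2.6.1
```

Writing the slice as `[:-len(".tar.gz")]` rather than `[:-7]` makes it clearer which suffix is being removed.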


#### Check msms installation

The next cell should give usage for `msms` if all worked.

In [2]:
!msms -h

Usage : msms parameters 
  -probe_radius float : probe sphere radius, [1.5]
  -density float      : surface points density, [1.0]
  -hdensity float     : surface points high density, [3.0]
  -surface <tses,ases>: triangulated or Analytical SES, [tses]
  -no_area            : turns off the analytical surface area computation
  -socketName servicename : socket connection from a client
  -socketPort portNumber : socket connection from a client
  -xdr                : use xdr encoding over socket
  -sinetd             : inetd server connection
  -noh                : ignore atoms with radius 1.2
  -no_rest_on_pbr     : no restart if pb. during triangulation
  -no_rest            : no restart if pb. are encountered
  -if filename        : sphere input file
  -of filename        : output for triangulated surface
  -af filename        : area file
  -no_header         : do not add comment line to the output
  -free_vertices      : turns on computation for isolated RS vertices
  -all_components
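
The same check can be done from Python, which is handy before calling anything in `Bio.PDB.ResidueDepth`. This is a minimal sketch using the standard library's `shutil.which`; the helper name `find_executable` is made up for illustration, and it assumes the directory holding the `msms` link is on the `PATH` seen by the kernel.

```python
import shutil

def find_executable(name="msms"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)

path = find_executable("msms")
if path is None:
    print("msms not found on PATH; surface generation would be expected to fail.")
else:
    print(f"msms found at {path}")
```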

---------

#### Demo use of Bio.PDB.ResidueDepth module

This section essentially runs the code blocks from [Biopython's 'Bio.PDB.ResidueDepth module' page](https://biopython.org/docs/1.75/api/Bio.PDB.ResidueDepth.html) to see if everything works.

In [3]:
#Install Biopython here since not included in my msms_wrapper fork at this time
%pip install Biopython

Collecting Biopython
  Downloading biopython-1.84-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Downloading biopython-1.84-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.2/3.2 MB[0m [31m63.3 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: Biopython
Successfully installed Biopython-1.84
Note: you may need to restart the kernel to use updated packages.


In [4]:
#Get the structure they reference in example
!curl -OL https://files.rcsb.org/download/1a8o.pdb.gz
!gunzip 1a8o.pdb.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18565  100 18565    0     0  47125      0 --:--:-- --:--:-- --:--:-- 47239


In [5]:
from Bio.PDB.ResidueDepth import ResidueDepth
from Bio.PDB.PDBParser import PDBParser
parser = PDBParser()
structure = parser.get_structure("1a8o", "1a8o.pdb")
model = structure[0]
rd = ResidueDepth(model)
print(rd['A',(' ', 152, ' ')])

(np.float64(1.7654032513871405), np.float64(1.999282370848168))


In [6]:
from Bio.PDB.ResidueDepth import get_surface
surface = get_surface(model)
surface

array([[ 3.113, 35.393,  9.268],
       [ 4.232, 34.474,  8.82 ],
       [ 3.7  , 33.354,  9.061],
       ...,
       [34.357, 31.816, 18.811],
       [34.357, 31.816, 17.705],
       [33.557, 32.841, 16.759]])

In [7]:
from Bio.PDB.ResidueDepth import min_dist
coord = (1.113, 35.393,  9.268)
dist = min_dist(coord, surface)
dist

np.float64(1.1839548133269278)

In [8]:
from Bio.PDB.ResidueDepth import residue_depth
chain = model['A']
res152 = chain[152]
rd = residue_depth(res152, surface)
rd

np.float64(1.7654032513871405)
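
To make the numbers above less opaque, here is a NumPy sketch of the idea as I understand it: the distance of a point to the surface is the smallest distance to any surface vertex, and a residue's depth is the average of that distance over its atoms. The function names and toy coordinates are made up for illustration, and this is not Biopython's own code.

```python
import numpy as np

def min_dist_sketch(coord, surface):
    """Smallest Euclidean distance from one point to any surface vertex."""
    diffs = np.asarray(surface, dtype=float) - np.asarray(coord, dtype=float)
    return float(np.sqrt((diffs ** 2).sum(axis=1).min()))

def residue_depth_sketch(atom_coords, surface):
    """Mean of the per-atom distances to the surface."""
    return float(np.mean([min_dist_sketch(c, surface) for c in atom_coords]))

# Toy data (made up for illustration): two surface vertices and a two-atom "residue".
toy_surface = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]])
toy_atoms = [(0.0, 0.0, 1.0), (3.0, 4.0, 2.0)]

print(min_dist_sketch(toy_atoms[0], toy_surface))    # 1.0
print(residue_depth_sketch(toy_atoms, toy_surface))  # 1.5
```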

---------

### Loop on specified chains to make a dataframe with residue depth

Work with the code from ['Trouble with Biopython residue depth'](https://www.biostars.org/p/9603764/), adapting it using the code from [Biopython's 'Bio.PDB.ResidueDepth module' page](https://biopython.org/docs/1.75/api/Bio.PDB.ResidueDepth.html).

In [9]:
# Install Pandas here since not included in my msms_wrapper fork at this time
%pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [10]:
#Get a simple structure for now; get working with simpler first.
!curl -OL https://files.rcsb.org/download/1crn.pdb.gz
!gunzip 1crn.pdb.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10699  100 10699    0     0  31248      0 --:--:-- --:--:-- --:--:-- 31192


In [11]:
from Bio.PDB.ResidueDepth import residue_depth
from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import is_aa #based on https://biopython.org/docs/1.75/api/Bio.PDB.Polypeptide.html
parser = PDBParser()
structure = parser.get_structure("1crn", "1crn.pdb")
from Bio.PDB.ResidueDepth import get_surface
model = structure[0]
# The next step gives something like the following if msms has not been downloaded, installed, and set to work as `msms`:
# `RuntimeError: Failed to generate surface file using command: ; msms -probe_radius 1.5 -if /tmp/tmpilsljfx9 -of /tmp/tmpyh3l4u4w > /tmp/tmpawwp1gs7`
surface = get_surface(model)
#residue_depth_calculator = ResidueDepth(model)

selected_chains = ['s','I' ,'A']

# Calculate residue depth
residue_data = []
for model in structure:
    for chain in model:
        if chain.id in selected_chains:  # Check if the chain is in the selected list
            for residue in chain:
                if is_aa(residue):  # Only consider amino acid residues
                    depth = residue_depth(model[chain.id][residue.get_id()], surface)
                    residue_data.append({
                        'Residue ID': residue.get_id()[1],
                        'Residue Name': residue.get_resname(),
                        'Depth': depth
                    })

# Create a DataFrame from the residue data
import pandas as pd
depths_df = pd.DataFrame(residue_data)

In [12]:
depths_df.head()

|   | Residue ID | Residue Name | Depth |
|---|---|---|---|
| 0 | 1 | THR | 2.065941 |
| 1 | 2 | THR | 2.009514 |
| 2 | 3 | CYS | 3.352963 |
| 3 | 4 | CYS | 3.92389 |
| 4 | 5 | PRO | 2.336881 |


In [13]:
depths_df.tail()

|   | Residue ID | Residue Name | Depth |
|---|---|---|---|
| 41 | 42 | GLY | 1.709602 |
| 42 | 43 | ASP | 1.671992 |
| 43 | 44 | TYR | 2.311689 |
| 44 | 45 | ALA | 1.767627 |
| 45 | 46 | ASN | 1.758436 |
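
The `selected_chains` list asks for three chain IDs, yet the 46 rows here are consistent with only one of them being present in this small structure. Since the loop silently skips IDs it does not find, a quick check like the sketch below can make that visible. The helper name `split_chains` is hypothetical, and the `available_ids` list is a stand-in for `[chain.id for chain in model]`.

```python
def split_chains(available_ids, requested_ids):
    """Split requested chain IDs into those present and those missing."""
    available = set(available_ids)
    present = [cid for cid in requested_ids if cid in available]
    missing = [cid for cid in requested_ids if cid not in available]
    return present, missing

# Stand-in for `[chain.id for chain in model]`; a single chain "A" is assumed here.
available_ids = ["A"]
present, missing = split_chains(available_ids, ["s", "I", "A"])
print(present)  # ['A']
print(missing)  # ['s', 'I']
```

When more than one requested chain is present, it would probably also help to store `chain.id` in each row so residues from different chains can be told apart in the dataframe.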


Now that we've established that the basics work, let's expand this to try OP's target structure and chains.

In [39]:
#Get the structure
PDB_id = '5xtd'
!curl -OL https://files.rcsb.org/download/{PDB_id}.pdb.gz
!gunzip {PDB_id}.pdb.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1230k  100 1230k    0     0  10927      0  0:01:55  0:01:55 --:--:--  1771


In [41]:
from Bio.PDB.ResidueDepth import residue_depth
from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import is_aa #based on https://biopython.org/docs/1.75/api/Bio.PDB.Polypeptide.html
parser = PDBParser()
structure = parser.get_structure("Complex_1", f"{PDB_id}.pdb")
from Bio.PDB.ResidueDepth import get_surface
model = structure[0]
# The next step gives something like the following if msms has not been downloaded, installed, and set to work as `msms`:
# `RuntimeError: Failed to generate surface file using command: ; msms -probe_radius 1.5 -if /tmp/tmpilsljfx9 -of /tmp/tmpyh3l4u4w > /tmp/tmpawwp1gs7`
surface = get_surface(model)
#residue_depth_calculator = ResidueDepth(model)

selected_chains = ['s','I' ,'A']

# Calculate residue depth
residue_data = []
for model in structure:
    for chain in model:
        if chain.id in selected_chains:  # Check if the chain is in the selected list
            for residue in chain:
                if is_aa(residue):  # Only consider amino acid residues
                    depth = residue_depth(model[chain.id][residue.get_id()], surface)
                    residue_data.append({
                        'Residue ID': residue.get_id()[1],
                        'Residue Name': residue.get_resname(),
                        'Depth': depth
                    })

# Create a DataFrame from the residue data
import pandas as pd
depths_df = pd.DataFrame(residue_data)

srdf: un sommet est faux
srdf: un sommet est faux
srdf: un sommet est faux
srdf: un sommet est faux
srdf: un sommet est faux
srdf: un sommet est faux
sphere_mange_arete: inconcistence
sphere_mange_arete: inconcistence
srdf: un sommet est faux
srdf: un sommet est faux
srdf: un sommet est faux
sphere_mange_arete: inconcistence
sphere_mange_arete: inconcistence
sphere_mange_arete: inconcistence


RuntimeError: Failed to generate surface file using command:
msms -probe_radius 1.5 -if /tmp/tmpcfeqlo0o -of /tmp/tmprpzgyx3s > /tmp/tmp61fj6qyd

That fails, apparently because msms has trouble with something in this structure file. Maybe the discontinuous chains are the cause?

OP didn't post such warnings.
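
One way to start probing the discontinuous-chain guess is to list where residue numbering jumps within a chain. The sketch below is only a diagnostic aid under that unconfirmed hypothesis; the helper name `find_numbering_gaps` and the example numbers are made up, and a jump in numbering does not by itself prove a physical break in the chain.

```python
def find_numbering_gaps(residue_numbers):
    """Return (previous, next) pairs where consecutive residue numbers jump by more than 1."""
    gaps = []
    for prev, nxt in zip(residue_numbers, residue_numbers[1:]):
        if nxt - prev > 1:
            gaps.append((prev, nxt))
    return gaps

# Made-up numbering for illustration, not taken from 5xtd.
example_numbers = [1, 2, 3, 10, 11, 20]
print(find_numbering_gaps(example_numbers))  # [(3, 10), (11, 20)]
```

With a parsed structure, the numbers for one chain could be gathered with something like `[res.get_id()[1] for res in model[chain_id] if is_aa(res)]`, where `chain_id` is one of the selected chains. Residues that differ only by insertion code share a number, and this sketch does not flag those.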