---
badges: true
author: "Samdani Ansar"
categories:
- Structural Bioinformatics
date: '2023-04-23'
title: ProDy cheatsheet
description: Different utilities available in ProDy
toc: true
image: images/2ETR.png

---

Run this notebook in Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/samdani1593/samdani1593.github.io/blob/main/posts/2023-04-24-Prody-cheatsheet.ipynb)



This post covers a few ProDy functions that are useful for analysing protein structures.

# Install dependencies

In [None]:
!pip install -U ProDy

In [None]:
#@title *Download Input files*
%%bash
mkdir -p data
cd data
DATA_DIR_PATH='https://raw.githubusercontent.com/samdani1593/samdani1593.github.io/main/posts/data'
for f in '2etr.pdb' '2etr.cif'
do
    wget "$DATA_DIR_PATH/$f"
done

In [86]:
from prody import *

# Read structures

In [87]:
#@title **Read pdb file**

pdbfile = parsePDB('data/2etr.pdb')
pdbfile

<AtomGroup: 2etr (6617 atoms)>

In [4]:
#@title **Fetch file from PDB database**
pdb_path = fetchPDB('2etr') # Downloads the compressed .pdb.gz file to the current directory and returns its path
pdb_path

'D:\\sam\\git-work\\my_website\\posts\\2etr.pdb.gz'

In [6]:
#@title **Read PDB from a file-like object**
with open('data/2etr.pdb','r') as input_file:
    pdbfile = parsePDBStream(input_file) # Useful when the PDB content comes from a stream rather than a file path
# parsePDBStream(StringIO(lig)) # If the variable lig holds the PDB lines as a string, wrap it in io.StringIO and read it this way
pdbfile


<AtomGroup: Unknown (6617 atoms)>

In [32]:
input_file = open('data/2etr.pdb','r')
type(input_file)

_io.TextIOWrapper
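
The stream cell above works because an open file and an `io.StringIO` buffer expose the same reading interface. The sketch below illustrates that point with plain Python only; it does not call ProDy. The helper `count_atom_records` and the two-line `pdb_text` are made up for the illustration, with values copied from the first records of 2etr.

```python
import io

# Hypothetical in-memory PDB content (two ATOM records, for illustration only)
pdb_text = (
    "ATOM      1  N   SER A   6      27.702 138.394   9.519  1.00115.02           N\n"
    "ATOM      2  CA  SER A   6      26.663 137.778   8.643  1.00116.22           C\n"
)

def count_atom_records(stream):
    """Count ATOM/HETATM lines from any object that can be iterated line by line."""
    return sum(1 for line in stream if line.startswith(("ATOM", "HETATM")))

stream = io.StringIO(pdb_text)   # behaves like a file opened with open(..., 'r')
n_records = count_atom_records(stream)
print(n_records)

stream.seek(0)                   # rewind before handing the stream to another reader
first_line = stream.readline()
print(first_line[:4])
```

A stream is consumed as it is read, so rewind it with `seek(0)` before reusing it.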

In [7]:
#@title **Read CIF file**
pdbfile = parseMMCIF('data/2etr.cif')
pdbfile

<AtomGroup: 2etr (6617 atoms)>

In [8]:
#@title **Read CIF from a file-like object**
with open('data/2etr.cif','r') as input_file:
    pdbfile = parseMMCIFStream(input_file) # Useful when the CIF content comes from a stream rather than a file path
pdbfile

<AtomGroup: Unknown (6617 atoms)>

# Write structures

In [9]:
#@title **Save as pdbfile**
writePDB('data/protein.pdb',pdbfile)

'data/protein.pdb'

In [12]:
#@title **Save as a PDB string (in-memory buffer)**
import io
out_file = io.StringIO()
writePDBStream(out_file,pdbfile)
out_file.getvalue()

'REMARK AtomGroup Unknown\nATOM      1  N   SER A   6      27.702 138.394   9.519  1.00115.02         A N  \nATOM      2  CA  SER A   6      26.663 137.778   8.643  1.00116.22         A C  \nATOM      3  C   SER A   6      25.901 136.668   9.366  1.00116.20         A C  \nATOM      4  O   SER A   6      26.407 136.082  10.323  1.00116.59         A O  \nATOM      5  CB  SER A   6      27.296 137.248   7.363  1.00116.58         A C  \nATOM      6  N   PHE A   7      24.686 136.388   8.896  1.00116.16         A N  \nATOM      7  CA  PHE A   7      23.831 135.356   9.486  1.00117.81         A C  \nATOM      8  C   PHE A   7      24.351 133.948   9.201  1.00119.46         A C  \nATOM      9  O   PHE A   7      24.428 133.113  10.108  1.00120.04         A O  \nATOM     10  CB  PHE A   7      22.388 135.506   8.986  1.00117.24         A C  \nATOM     11  CG  PHE A   7      21.417 134.545   9.622  1.00116.70         A C  \nATOM     12  CD1 PHE A   7      21.112 134.637  10.978  1.00116.66     

# Selection

## Using select

In [63]:
pdbfile.numChains()

2

In [61]:
pdbfile.numAtoms('protein')

6473

In [69]:
#@title **Select protein atoms**
protein = pdbfile.select('protein') # Select protein residues
protein

<Selection: 'protein' from 2etr (6473 atoms)>

In [55]:
hetero_group = pdbfile.select('hetero') # Select hetero groups
hetero_group.getResnames(),hetero_group.getResnums(),hetero_group.getChids()

(array(['Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27',
        'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27',
        'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27',
        'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27', 'Y27',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
        'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH', 'HOH',
      

In [56]:
water = pdbfile.select('water') # Select water
water

<Selection: 'water' from 2etr (108 atoms)>

In [51]:
ligand = pdbfile.select('chain A and resname Y27') # Select chain A ligand with residue name Y27
ligand

<Selection: 'chain A and resname Y27' from 2etr (18 atoms)>

In [88]:
protein_water = pdbfile.select('protein or water') # To select both protein and water, use "or", not "and"
protein_water
# With "and", an atom must satisfy both conditions at once; no atom is flagged as both protein and water

<Selection: 'protein or water' from 2etr (6581 atoms)>
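
The difference between "and" and "or" in a selection string is the difference between a set intersection and a set union. The sketch below uses plain Python sets with made-up atom indices; the names `protein_idx` and `water_idx` are hypothetical and unrelated to ProDy objects.

```python
# Toy atoms: each tuple is (atom index, category); the values are made up
toy_atoms = [(0, "protein"), (1, "protein"), (2, "water"), (3, "ligand")]

protein_idx = {i for i, kind in toy_atoms if kind == "protein"}
water_idx = {i for i, kind in toy_atoms if kind == "water"}

both_conditions = protein_idx & water_idx    # like 'protein and water': atom must match both
either_condition = protein_idx | water_idx   # like 'protein or water': atom must match one

print(both_conditions)            # set()
print(sorted(either_condition))   # [0, 1, 2]
```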

In [90]:
lig = pdbfile.select('not water and hetero not chain B')
lig

<Selection: 'not water and hetero not chain B' from 2etr (18 atoms)>

In [95]:
protein.select('resnum 216 resname ASP name CA CB chain A')

<Selection: '(resnum 216 res...) and (protein)' from 2etr (2 atoms)>

resnum - residue number

resname - residue three-letter code

name - atom name

altloc - alternate location

'resnum 5' selects residue 5 (all insertion codes)

'resnum 5A' selects residue 5 with insertion code A

'resnum 5_' selects residue 5 with no insertion code

To specify an empty value, use '_'

'to' is all inclusive, e.g. 'resnum 1 to 4' means '1 <= resnum <= 4'

':' is left inclusive, e.g. 'resnum 1:4' means '1 <= resnum < 4'

Consecutive use of ':', however, specifies a discrete range of numbers, e.g. 'resnum 1:4:2' means 'resnum 1 3'
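
The range rules above map directly onto Python's `range`. The sketch below only mirrors the rules as stated; it is not ProDy's selection parser, and the helper names `resnum_to` and `resnum_colon` are hypothetical.

```python
def resnum_to(first, last):
    """'resnum first to last': both ends are included."""
    return list(range(first, last + 1))

def resnum_colon(first, stop, step=1):
    """'resnum first:stop[:step]': the stop value is excluded, as in a Python slice."""
    return list(range(first, stop, step))

print(resnum_to(1, 4))         # [1, 2, 3, 4]
print(resnum_colon(1, 4))      # [1, 2, 3]
print(resnum_colon(1, 4, 2))   # [1, 3]
```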

## Indexing

In [98]:
protein[0] #atom index

<Atom: N from 2etr (index 0)>

In [108]:
protein['A'][216] # An integer index on a Chain returns an Atom here, not residue 216

<Atom: O from 2etr (index 216)>

In [110]:
# Indexing a Chain instance with a tuple returns a Residue.
protein['A'][(216,)]

<Residue: ASP 216 from Chain A from 2etr (8 atoms)>

In [99]:
protein['A'] #chain selection

<Chain: A from 2etr (400 residues, 3247 atoms)>

In [100]:
protein['A',216] # chain and residue number selection

<Residue: ASP 216 from Chain A from 2etr (8 atoms)>

In [101]:
protein['A',216]['CA'] # chain, residue number, atom name selection

<Atom: CA from 2etr (index 1729)>

In [112]:
protein[:10] # slice and select first 10 atoms

<Selection: 'index 0 to 9' from 2etr (10 atoms)>

In [113]:
protein[::2] # Select every second atom (even indices)

<Selection: 'index 0 2 4 6 8... 6468 6470 6472' from 2etr (3237 atoms)>
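
The slices above follow ordinary Python slice rules, which can be checked on a plain `range` of the same length. This sketch assumes only the atom count of 6473 reported for the protein selection earlier; the variable names are made up.

```python
n_protein_atoms = 6473                  # atom count reported for the 'protein' selection
positions = range(n_protein_atoms)

first_ten = positions[:10]              # same positions as protein[:10]
every_second = positions[::2]           # same positions as protein[::2]

print(first_ten[0], first_ten[-1])      # 0 9
print(len(every_second))                # 3237
print(every_second[-1])                 # 6472
```

The counts agree with the selection strings shown in the outputs: `index 0 to 9` and 3237 atoms ending at index 6472.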

# Fetch properties

In [3]:
#@title **Fetch chain, residue and atom properties**

pdbfile = parsePDB('data/2etr.pdb')
pdbfile.numChains() # For number of chains

2

In [4]:
pdbfile.numResidues() # Number of residues

908

In [5]:
pdbfile.numAtoms() #Number of atoms

6617

In [11]:
set(pdbfile.getAltlocs()) # Alternate location

{' '}

In [12]:
set(pdbfile.getChids()) # All Chain IDs

{'A', 'B'}

In [13]:
set(pdbfile.getElements()) # Element symbols

{'C', 'N', 'O', 'S'}

In [14]:
set(pdbfile.getResnames()) # Residue names

{'ALA',
 'ARG',
 'ASN',
 'ASP',
 'CYS',
 'GLN',
 'GLU',
 'GLY',
 'HIS',
 'HOH',
 'ILE',
 'LEU',
 'LYS',
 'MET',
 'PHE',
 'PRO',
 'SER',
 'THR',
 'TRP',
 'TYR',
 'VAL',
 'Y27'}

In [15]:
set(pdbfile.getResnums()) # Residue numbers

{5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 51,
 52,
 53,
 54,
 55,
 56,
 57,
 58,
 59,
 60,
 61,
 62,
 63,
 64,
 65,
 66,
 67,
 68,
 69,
 70,
 71,
 72,
 73,
 74,
 75,
 76,
 77,
 78,
 79,
 80,
 81,
 82,
 83,
 84,
 85,
 86,
 87,
 88,
 89,
 90,
 91,
 92,
 93,
 94,
 95,
 96,
 97,
 98,
 99,
 100,
 101,
 102,
 103,
 104,
 105,
 106,
 107,
 108,
 109,
 110,
 111,
 112,
 113,
 114,
 115,
 116,
 117,
 118,
 119,
 120,
 121,
 122,
 123,
 124,
 125,
 126,
 127,
 128,
 129,
 130,
 131,
 132,
 133,
 134,
 135,
 136,
 137,
 138,
 139,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 147,
 148,
 149,
 150,
 151,
 152,
 153,
 154,
 155,
 156,
 157,
 158,
 159,
 160,
 161,
 162,
 163,
 164,
 165,
 166,
 167,
 168,
 169,
 170,
 171,
 172,
 173,
 174,
 175,
 176,
 177,
 178,
 179,
 180,
 181,
 182,
 183,
 184,
 185,
 186,
 187,
 1

In [49]:
pdbfile['A'].getSequence()

'SFETRFEKMDNLLRDPKSEVNSDCLLDGLDALVYDLDFPALRKNKNIDNFLSRYKDTINKIRDLRMKAEDYEVVKVIGRGAFGEVQLVRHKSTRKVYAMKLLSKFEMIKRSDSAFFWEERDIMAFANSPWVVQLFYAFQDDRYLYMVMEYMPGGDLVNLMSNYDVPEKWARFYTAEVVLALDAIHSMGFIHRDVKPDNMLLDKSGHLKLADFGTCMKMNKEGMVRCDTAVGTPDYISPEVLKSQGGDGYYGRECDWWSVGVFLYEMLVGDTPFYADSLVGTYSKIMNHKNSLTFPDDNDISKEAKNLICAFLTDREVRLGRNGVEEIKRHLFFKNDQWAWETLRDTVAPVVPDLSSDIDTSNFDDLEEDKGEEETFPIPKAFVGNQLPFVGFTYYSNRRY'
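
The sequence string uses one-letter amino acid codes. As a rough sketch of how three-letter residue names translate into such a string, the snippet below uses the standard 20-residue mapping; the helper `to_sequence` and the `"X"` placeholder for unknown residues are assumptions made for the illustration, not ProDy behaviour.

```python
# Standard three-letter to one-letter amino acid codes
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def to_sequence(resnames, unknown="X"):
    """Translate residue names (one per residue) into a one-letter string."""
    return "".join(THREE_TO_ONE.get(name, unknown) for name in resnames)

# First five residues of chain A, read off the sequence printed above
print(to_sequence(["SER", "PHE", "GLU", "THR", "ARG"]))   # SFETR
```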

# Geometric calculations

In [24]:
#@title **Calculate Center of Geometry**
query_atom = calcCenter(ligand) # COG
query_atom

array([ 51.83716667, 101.54977778,  29.635     ])

In [26]:
#@title **Calculate Center of Mass**
query_atom = calcCenter(ligand, weights=ligand.getMasses())  # Center of Mass
query_atom

array([ 51.81354888, 101.55519646,  29.60240548])
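
The two centers differ only in the weights given to each atom. The sketch below computes both for a made-up three-atom fragment in plain Python; the coordinates, masses and the helper `center` are illustrative assumptions, not values from 2etr.

```python
# Made-up coordinates (in Å) and masses for a three-atom fragment: C, O, N
toy_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
toy_masses = [12.011, 15.999, 14.007]

def center(points, weights=None):
    """Weighted mean of 3D points; equal weights give the center of geometry."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    return tuple(
        sum(w * p[axis] for w, p in zip(weights, points)) / total
        for axis in range(3)
    )

cog = center(toy_coords)                       # center of geometry
com = center(toy_coords, weights=toy_masses)   # center of mass
print(cog)
print(com)
```

The center of mass is pulled towards the heavier atoms, which is why the two results above differ slightly for the ligand.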

In [96]:
#@title **Calculate Distance**
DFG_CA = protein.select('chain A and resnum 216 and ca') # Select 216-ASP CA atom
calcDistance(DFG_CA, query_atom).round(2)

array([6.23])

In [97]:
calcDistance(protein['A',216]['CA'],query_atom).round(2)

6.23

In [44]:
#@title **Calculate Angle**
calcAngle(protein['A',216]['CA'], protein['A',124]['CA'], protein['A',217]['CA'])

37.06595294876384
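
For three atoms, the angle is measured at the middle one. The sketch below shows the underlying arithmetic for a distance and an angle in plain Python, using made-up points; the helper names are hypothetical and it needs Python 3.8 or newer for `math.dist`.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.dist(a, b)

def angle(a, b, c):
    """Angle in degrees at the middle point b, between the vectors b->a and b->c."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_theta))

# Made-up points forming a right angle at p_b
p_a, p_b, p_c = (3.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 4.0, 0.0)
print(distance(p_a, p_c))
print(angle(p_a, p_b, p_c))
```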

In [75]:
#@title **Calculate Dihedral angle**
calcDihedral(protein['A',215]['CA'],protein['A',216]['CA'],protein['A',217]['CA'],protein['A',218]['CA'])

-177.15828339826356
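
A dihedral angle is the signed angle between the plane through the first three atoms and the plane through the last three. The sketch below uses the common `atan2` formulation in plain Python; it is an illustration with made-up points, not necessarily how ProDy computes the value internally.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees (between -180 and 180) for four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up points: the fourth point is rotated a quarter turn out of the plane
d_plus = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0))
d_minus = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, -1.0))
print(round(d_plus, 1), round(d_minus, 1))
```

A value close to ±180°, as in the output above, means the four atoms are nearly planar in a trans arrangement.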

In [67]:
#@title **Calculate phi angle**
calcPhi(protein['A',216])

66.22053940730052

In [37]:
#@title **Calculate psi angle**
calcPsi(protein['A',216])

120.62391299111084

In [74]:
#@title **Calculate omega angle**
calcOmega(protein['A',216])

-176.76556684282522

# Select residues based on distance cut-off

## Using select function

In [19]:
active_site = protein.select('within 4 of mol', mol=ligand.toAtomGroup())  # Select protein atoms within 4 Å of the ligand
# The name used in the selection string (mol) is supplied as a keyword argument; here the ligand selection is first converted to an AtomGroup.
active_site.getResnums()
# This returns one residue number per selected atom (hence the repeats), not the atom-pair distances.

array([ 82,  90, 103, 153, 154, 155, 155, 155, 156, 156, 156, 156, 156,
       202, 203, 203, 205, 215, 216, 216, 216, 368, 368])
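
Conceptually, a distance-based selection keeps every atom that lies within the cutoff of at least one query atom. The sketch below shows that idea by brute force in plain Python. The coordinates are made up, the helper `within` is hypothetical, and treating the cutoff as inclusive (`<=`) is an assumption.

```python
import math

# Made-up data (Å): (residue number, atom name, xyz) for the "protein" side
toy_protein_atoms = [
    (90, "CG2", (0.0, 0.0, 0.0)),
    (153, "CE", (3.0, 0.0, 0.0)),
    (216, "CB", (10.0, 0.0, 0.0)),
]
toy_ligand_coords = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]

def within(cutoff, atoms, query_coords):
    """Keep every atom that lies within cutoff of at least one query point."""
    return [
        atom for atom in atoms
        if any(math.dist(atom[2], q) <= cutoff for q in query_coords)
    ]

close_atoms = within(4.0, toy_protein_atoms, toy_ligand_coords)
close_resnums = [resnum for resnum, _, _ in close_atoms]
print(close_resnums)   # [90, 153]
```

The result lists atoms only; which query atom each one is close to, and at what distance, is not recorded.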

## Find neighbouring atoms

In [21]:
nearby = findNeighbors(ligand, 4, protein) # Returns atom pairs that lie within the cutoff distance
# Each item holds one atom from each input selection and the distance between them
nearby

[(<Atom: O23 from 2etr (index 6473)>,
  <Atom: CE from 2etr (index 1234)>,
  3.5261898),
 (<Atom: C36 from 2etr (index 6476)>,
  <Atom: CB from 2etr (index 1727)>,
  3.84263),
 (<Atom: C36 from 2etr (index 6476)>,
  <Atom: CB from 2etr (index 1732)>,
  3.9769566),
 (<Atom: C35 from 2etr (index 6477)>,
  <Atom: CB from 2etr (index 1732)>,
  3.8783326),
 (<Atom: C35 from 2etr (index 6477)>,
  <Atom: O from 2etr (index 1624)>,
  3.7216585),
 (<Atom: C35 from 2etr (index 6477)>,
  <Atom: OD1 from 2etr (index 1635)>,
  3.4280102),
 (<Atom: C34 from 2etr (index 6478)>,
  <Atom: CB from 2etr (index 1732)>,
  3.9284542),
 (<Atom: C34 from 2etr (index 6478)>,
  <Atom: OD1 from 2etr (index 1734)>,
  3.5688658),
 (<Atom: C32 from 2etr (index 6480)>,
  <Atom: CG2 from 2etr (index 692)>,
  3.5779414),
 (<Atom: C41 from 2etr (index 6481)>,
  <Atom: OD1 from 2etr (index 1734)>,
  3.5450618),
 (<Atom: C41 from 2etr (index 6481)>,
  <Atom: OD1 from 2etr (index 1635)>,
  3.853823),
 (<Atom: N43 from 2et
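
The result is a list of (atom, atom, distance) tuples, one per pair within the radius. A brute-force sketch of such a pair search in plain Python is given below. The coordinates and atom labels are made up, `neighbor_pairs` is a hypothetical helper, and the inclusive cutoff is an assumption.

```python
import math

# Made-up coordinates (Å); the labels only mimic PDB atom names
toy_query = [("N43", (0.0, 0.0, 0.0)), ("C16", (6.0, 0.0, 0.0))]
toy_target = [("OD1", (2.8, 0.0, 0.0)), ("CB", (6.0, 3.5, 0.0)), ("CA", (20.0, 0.0, 0.0))]

def neighbor_pairs(atoms1, radius, atoms2):
    """Return (name1, name2, distance) for every pair no farther apart than radius."""
    pairs = []
    for name1, xyz1 in atoms1:
        for name2, xyz2 in atoms2:
            d = math.dist(xyz1, xyz2)
            if d <= radius:
                pairs.append((name1, name2, d))
    return pairs

contact_pairs = neighbor_pairs(toy_query, 4.0, toy_target)
for name1, name2, d in contact_pairs:
    print(name1, name2, round(d, 2))
```

Checking every pair is fine for a small ligand but scales with the product of the two atom counts.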

In [76]:
for f in nearby:
    print(f[0].getResnum(),f[0].getName(),f[0].getResname(),f[1].getResnum(),f[1].getName(),f[1].getResname(),f[2])

416 O23 Y27 153 CE MET 3.5261898
416 C36 Y27 215 CB ALA 3.84263
416 C36 Y27 216 CB ASP 3.9769566
416 C35 Y27 216 CB ASP 3.8783326
416 C35 Y27 202 O ASP 3.7216585
416 C35 Y27 203 OD1 ASN 3.4280102
416 C34 Y27 216 CB ASP 3.9284542
416 C34 Y27 216 OD1 ASP 3.5688658
416 C32 Y27 90 CG2 VAL 3.5779414
416 C41 Y27 216 OD1 ASP 3.5450618
416 C41 Y27 203 OD1 ASN 3.853823
416 N43 Y27 216 CG ASP 3.8245695
416 N43 Y27 216 OD1 ASP 2.7895243
416 N43 Y27 203 OD1 ASN 2.7911913
416 N43 Y27 203 CG ASN 3.8788848
416 C14 Y27 205 CD1 LEU 3.9651945
416 C16 Y27 156 N MET 3.7972507
416 C16 Y27 103 CB ALA 3.7334185
416 C16 Y27 154 O GLU 3.5957477
416 C16 Y27 156 CG MET 3.884422
416 N11 Y27 155 CA TYR 3.9425724
416 N11 Y27 155 C TYR 3.9601307
416 N11 Y27 156 N MET 3.0231633
416 N11 Y27 156 CA MET 3.783572
416 N11 Y27 103 CB ALA 3.5249634
416 N11 Y27 155 CD1 TYR 3.7977254
416 N11 Y27 156 CB MET 3.5842628
416 N11 Y27 154 O GLU 3.925532
416 N11 Y27 156 CG MET 3.9491124
416 C12 Y27 156 N MET 3.8375065
416 C12 Y27 155