# PB assignation

This notebook demonstrates how to use the PBxplore API to assign Protein Block (PB) sequences.

In [1]:
from pprint import pprint
import urllib.request
import os

# print date & versions
import datetime
print("Date & time:",datetime.datetime.now())
import sys
print("Python version:", sys.version)

Date & time: 2017-03-13 10:26:07.996272
Python version: 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609]


In [2]:
import pbxplore as pbx
print("PBxplore version:", pbx.__version__)

PBxplore version: 1.3.5


## Use the built-in structure parser

### Assign PB for a single structure

The `pbxplore.chains_from_files()` function is the preferred way to read PDB and PDBx/mmCIF files with PBxplore. This function takes a list of file paths as its argument, and yields each chain it can read from these files. It provides a single interface to read PDB and PDBx/mmCIF files, single-model and multi-model files, and a single file or a collection of files.

Here we want to read a single file with a single model and a single chain. Therefore, we need the first and only record that is yielded by `pbxplore.chains_from_files()`. This record contains a name for the chain, and the chain itself as a `pbxplore.structure.structure.Chain` object. Note that, even to read a single file, the path must be given as a list to `pbxplore.chains_from_files()`.

In [3]:
pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/1BTA.pdb', '1BTA.pdb')

structure_reader = pbx.chains_from_files([pdb_name])
chain_name, chain = next(structure_reader)
print(chain_name)
print(chain)

1BTA.pdb | chain A
Chain A / model : 1434 atoms


Protein Blocks are assigned from the dihedral angles of the backbone, so these angles must be calculated first. The `pbxplore.structure.structure.Chain.get_phi_psi_angles()` method calculates them and returns them in a form that can be passed directly to the assignment function.

The dihedral angles are returned as a dictionary. Each key of this dictionary is a residue number, and each value is a dictionary holding the phi and psi angles of that residue in degrees. An angle that cannot be calculated, such as phi for the first residue, is `None`.

In [4]:
dihedrals = chain.get_phi_psi_angles()
pprint(dihedrals)

{1: {'phi': None, 'psi': -171.65563134448544},
 2: {'phi': -133.80467711845586, 'psi': 153.74322760775027},
 3: {'phi': -134.6617568892695, 'psi': 157.30476083095581},
 4: {'phi': -144.49159910635186, 'psi': 118.59706956501037},
 5: {'phi': -100.12866913978127, 'psi': 92.98634825528089},
 6: {'phi': -83.48980457968895, 'psi': 104.23730726195485},
 7: {'phi': -64.77163869310709, 'psi': -43.25159835828049},
 8: {'phi': -44.47885842536948, 'psi': -25.89184262616925},
 9: {'phi': -94.90790101955957, 'psi': -47.182577907117775},
 10: {'phi': -41.312671692330014, 'psi': 133.73743399231304},
 11: {'phi': -119.15122785547305, 'psi': -11.827895864023617},
 12: {'phi': -174.21196552933984, 'psi': 175.87239770676175},
 13: {'phi': -56.61341695443224, 'psi': -45.74767617535588},
 14: {'phi': -50.78226415072095, 'psi': -45.3742585970337},
 15: {'phi': -57.93584481869442, 'psi': -43.329444361460844},
 16: {'phi': -55.20960354113049, 'psi': -56.47559202715399},
 17: {'phi': -64.51979885245254, 'psi':
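
To make the layout of this dictionary concrete, here is a minimal sketch that builds one by hand and lists the residues with an undefined angle. The angle values and the helper name `residues_with_missing_angles` are made up for illustration and are not part of PBxplore. The missing psi on the last residue is an assumption that mirrors the missing phi on the first residue shown above.

```python
# Hypothetical hand-written angles (in degrees) for a four-residue chain.
# Layout: residue number -> {'phi': ..., 'psi': ...}, as in the output above.
dihedrals = {
    1: {'phi': None, 'psi': -171.7},   # phi is undefined for the first residue
    2: {'phi': -133.8, 'psi': 153.7},
    3: {'phi': -134.7, 'psi': 157.3},
    4: {'phi': -144.5, 'psi': None},   # assumption: psi is undefined for the last residue
}


def residues_with_missing_angles(dihedrals):
    """Return the sorted residue numbers for which phi or psi is None."""
    return sorted(
        resid
        for resid, angles in dihedrals.items()
        if angles['phi'] is None or angles['psi'] is None
    )


print(residues_with_missing_angles(dihedrals))
```

Any dictionary with this shape, whatever its origin, is the kind of input the assignment step works on.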

The dihedral angles can be passed to the `pbxplore.assign()` function, which assigns a Protein Block to each residue and returns the PB sequence as a string. Note that the first two and the last two residues are assigned the joker block `Z`, because some of the dihedral angles they require cannot be calculated.

In [5]:
pb_seq = pbx.assign(dihedrals)
print(pb_seq)

ZZdddfklonbfklmmmmmmmmnopafklnoiaklmmmmmnoopacddddddehkllmmmmngoilmmmmmmmmmmmmnopacdcddZZ
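
The `Z` blocks at both ends follow from the definition of a Protein Block: a five-residue fragment described by eight consecutive dihedral angles centred on residue i, from psi(i-2) to phi(i+2). The sketch below illustrates that windowing on a toy dictionary. It is not PBxplore's implementation; the helper name `pb_angle_window` and the angle values are hypothetical.

```python
def pb_angle_window(dihedrals, resid):
    """Return the 8 angles centred on `resid`, or None if any is unavailable.

    Order: psi(i-2), phi(i-1), psi(i-1), phi(i), psi(i), phi(i+1), psi(i+1), phi(i+2).
    """
    wanted = [
        (resid - 2, 'psi'),
        (resid - 1, 'phi'), (resid - 1, 'psi'),
        (resid, 'phi'), (resid, 'psi'),
        (resid + 1, 'phi'), (resid + 1, 'psi'),
        (resid + 2, 'phi'),
    ]
    window = []
    for neighbour, name in wanted:
        # A residue outside the chain, or an undefined angle, breaks the window.
        angle = dihedrals.get(neighbour, {}).get(name)
        if angle is None:
            return None
        window.append(angle)
    return window


# Hypothetical six-residue chain with made-up angles (degrees).
toy = {i: {'phi': -60.0, 'psi': -45.0} for i in range(1, 7)}
toy[1]['phi'] = None   # first residue: no phi
toy[6]['psi'] = None   # last residue: no psi

assignable = [resid for resid in sorted(toy) if pb_angle_window(toy, resid) is not None]
print(assignable)
```

In this toy chain only residues 3 and 4 have a complete window, which matches the two `Z` blocks at each end of the sequence above.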


### Assign PB for several models of a single file

A single PDB file can contain several models. In that case, reading only the first chain is not enough. Instead, we iterate over all the chains yielded by `pbxplore.chains_from_files()`.

In [6]:
pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/2LFU.pdb', '2LFU.pdb')

for chain_name, chain in pbx.chains_from_files([pdb_name]):
    dihedrals = chain.get_phi_psi_angles()
    pb_seq = pbx.assign(dihedrals)
    print('* {}'.format(chain_name))
    print('  {}'.format(pb_seq))

* 2LFU.pdb | model 1 | chain A
  ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ
* 2LFU.pdb | model 2 | chain A
  ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghiadddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddfklpaccdddddfklmlmgcdehiaddddddfklmmgopZZ
* 2LFU.pdb | model 3 | chain A
  ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpacddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddefklpaccddddddfkgojbdfehpaddddddfkbccfbgZZ
* 2LFU.pdb | model 4 | chain A
  ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddddddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddfklmpccdddddddehiabghehiaddddddfklpccfkZZ
* 2LFU.pdb | model 5 | chain A
  ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadddddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddfklmaccddddddfbgghiafehiadddddddfklpacfZZ
* 2LFU.pdb | model 6 | chain A

Read 10 chain(s) in 2LFU.pdb


### Assign PB for a set of structures

The `pbxplore.chains_from_files()` function can also handle several chains from several files.

In [7]:
files = ['1BTA.pdb', '2LFU.pdb', '3ICH.pdb']
for pdb_name in files:
    urllib.request.urlretrieve('https://files.rcsb.org/view/{0}'.format(pdb_name), pdb_name)

print('The following files will be used:')
pprint(files)
for chain_name, chain in pbx.chains_from_files(files):
    dihedrals = chain.get_phi_psi_angles()
    pb_seq = pbx.assign(dihedrals)
    print('* {}'.format(chain_name))
    print('  {}'.format(pb_seq))

The following files will be used:
['1BTA.pdb', '2LFU.pdb', '3ICH.pdb']
* 1BTA.pdb | chain A
  ZZdddfklonbfklmmmmmmmmnopafklnoiaklmmmmmnoopacddddddehkllmmmmngoilmmmmmmmmmmmmnopacdcddZZ


Read 1 chain(s) in 1BTA.pdb


* 2LFU.pdb | model 1 | chain A
  ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ
* 2LFU.pdb | model 2 | chain A
  ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghiadddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddfklpaccdddddfklmlmgcdehiaddddddfklmmgopZZ
* 2LFU.pdb | model 3 | chain A
  ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpacddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddefklpaccddddddfkgojbdfehpaddddddfkbccfbgZZ
* 2LFU.pdb | model 4 | chain A
  ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddddddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddfklmpccdddddddehiabghehiaddddddfklpccfkZZ
* 2LFU.pdb | model 5 | chain A
  ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadddddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddfklmaccddddddfbgghiafehiadddddddfklpacfZZ
* 2LFU.pdb | model 6 | chain A

Read 10 chain(s) in 2LFU.pdb
Read 1 chain(s) in 3ICH.pdb
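
When chains come from several files, the chain name is the only link back to the source of each PB sequence. The sketch below splits such a name into its parts. It relies only on the `file | model N | chain X` layout visible in the output above, which is an observed pattern and not a documented format; `parse_chain_label` is a hypothetical helper, not a PBxplore function.

```python
def parse_chain_label(label):
    """Split a label such as '2LFU.pdb | model 1 | chain A' into a dictionary."""
    source, *fields = [part.strip() for part in label.split('|')]
    info = {'source': source}
    for field in fields:
        # Each remaining field looks like '<key> <value>', e.g. 'model 1'.
        key, _, value = field.partition(' ')
        info[key] = value
    return info


print(parse_chain_label('1BTA.pdb | chain A'))
print(parse_chain_label('2LFU.pdb | model 1 | chain A'))
```

Values are kept as strings; convert the model number with `int()` if it is needed for sorting.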


### Assign PB for frames in a trajectory

PB sequences can also be assigned from a trajectory. To do so, we use the `pbxplore.chains_from_trajectory()` function, which takes the path to a trajectory and the path to the corresponding topology as arguments. Any file format readable by MDAnalysis can be used. Apart from its arguments, `pbxplore.chains_from_trajectory()` works the same way as `pbxplore.chains_from_files()`: it yields one chain per frame.


In [8]:
topology, _ = urllib.request.urlretrieve('https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj.gro', 
                                         'psi_md_traj.gro')
trajectory, _ = urllib.request.urlretrieve('https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj.xtc', 
                                           'psi_md_traj.xtc')

for chain_name, chain in pbx.chains_from_trajectory(trajectory, topology):
    dihedrals = chain.get_phi_psi_angles()
    pb_seq = pbx.assign(dihedrals)
    print('* {}'.format(chain_name))
    print('  {}'.format(pb_seq))

 ctime or size or n_atoms did not match
Frame 1/225.


* psi_md_traj.xtc | frame 0
  ZZfkbcnopabfklmmmmpckbccdfbfkbcghiaghidfklmmmmmpcfklccZZ
* psi_md_traj.xtc | frame 1
  ZZfkbcnhpabfklmmmmpckbccdfbfkbcchiachidfklmmmmmbdfklccZZ
* psi_md_traj.xtc | frame 2
  ZZfkbcnhpabfklmmmmcckbccdfbfkbcchiacdddfklmmmmmbdfklpcZZ
* psi_md_traj.xtc | frame 3
  ZZfkbcnhpacfklmmmmpckbccdfbfkbcehiaedddfklmmmpccdfklpcZZ
* psi_md_traj.xtc | frame 4
  ZZfkbmnopabfklmmmmmmmbccddbfkbcghiaehidfklmmmmmcdfklpcZZ
* psi_md_traj.xtc | frame 5
  ZZfkbcnlpabfklmmmmpmkbccdfbfkbcchiacdddfklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 6
  ZZfkbcnopabfklmmmnofkbccdfbfkbcchiacdddfklmmoccbdfklpcZZ
* psi_md_traj.xtc | frame 7
  ZZfkbmnopabfklmmmmcfkbccdddfkbcchiaghidfklmmmccbdfklpcZZ
* psi_md_traj.xtc | frame 8
  ZZfkbcnhpabfklmmmmockbccdfbfkbcehiaehiafklmmmmcbcfklccZZ
* psi_md_traj.xtc | frame 9
  ZZfkbcnhpabfklmmmmcckbccdfbfkbcghiaehidfklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 10
  ZZfkbcnopabfklmmmmpmkbccdfbfklcghiaehidfklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 11
  ZZfkbcghpab

Frame 100/225.



* psi_md_traj.xtc | frame 108
  ZZfkbckbpcbfklmmmmpmkbccdfbfkbcehiaghidfklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 109
  ZZfkbcnbpabfklmmmmpmkbccdfbfklcghiaghiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 110
  ZZfkbcnbpabfklmmmnockbccdfbfklcehiaghiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 111
  ZZfkbcnbpabfklmmmmockbccdfbfklcghiaehiafklmmmnmpacfbacZZ
* psi_md_traj.xtc | frame 112
  ZZfkbcfbpcbfklmmmmockbccdfbfklcghiaghiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 113
  ZZfkbcnhpabfklmmmmpckbccdfbfklcchiaehiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 114
  ZZfkbcehpabfklmmmmpmkbccdfbfkbcghiaghiafklmmmmmccdfbacZZ
* psi_md_traj.xtc | frame 115
  ZZfkbcnbpabfklmmmmomkbccdfbfkbcehiaghiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 116
  ZZfkbcnbpabfklmmmmomkbccdfbfklcghiaghiafklmmmmmcccfbccZZ
* psi_md_traj.xtc | frame 117
  ZZfkbcfbpabfklmmmmomkbccdfbfkbcghiaghiafklmmmmmccdfbccZZ
* psi_md_traj.xtc | frame 118
  ZZfkbcnbpcbfklmmmmockbccdfbfkbcghiaghiafklmmmmmpccfbpcZZ
* psi_md_traj.xtc | 

Frame 200/225.
Frame 225/225.


## Use a different structure parser

Provided the dihedral angles are formatted as expected by `pbxplore.assign()`, the source of these angles does not matter. For instance, other PDB parsers can be used with PBxplore.
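
As a parser-independent illustration, the sketch below converts a list of (phi, psi) pairs in radians into the dictionary layout used earlier in this notebook, with angles in degrees. The function name `dihedrals_from_radians`, its `first_resid` parameter, and the sample angles are assumptions made for this example. Residues are simply numbered in order, which may differ from the numbering in the original file.

```python
import math


def dihedrals_from_radians(phi_psi_pairs, first_resid=1):
    """Build {resid: {'phi': deg, 'psi': deg}} from (phi, psi) pairs in radians."""
    def to_degrees(angle):
        # Undefined angles stay None; defined ones are converted to degrees.
        return None if angle is None else math.degrees(angle)

    return {
        resid: {'phi': to_degrees(phi), 'psi': to_degrees(psi)}
        for resid, (phi, psi) in enumerate(phi_psi_pairs, start=first_resid)
    }


# Made-up angles for a three-residue fragment.
pairs = [(None, math.pi), (-math.pi / 2, math.pi / 2), (-math.pi / 3, None)]
converted = dihedrals_from_radians(pairs)
print(converted[2])
```

The resulting dictionary has the same shape as the one returned by `get_phi_psi_angles()`, so it should be usable wherever that dictionary is.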

### BioPython

In [9]:
import Bio
import Bio.PDB
import math
print("BioPython version:", Bio.__version__)

pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/2LFU.pdb', '2LFU.pdb')

parser = Bio.PDB.PDBParser()
builder = Bio.PDB.PPBuilder()
for model in parser.get_structure("2LFU", pdb_name):
    for chain in model:
        # A chain with gaps is split into several polypeptides;
        # each polypeptide is assigned separately.
        for poly in builder.build_peptides(chain):
            dihedrals = {}
            # BioPython returns angles in radians (None when undefined);
            # pbx.assign() expects degrees.
            for resid, (phi, psi) in enumerate(poly.get_phi_psi_list(), start=1):
                if phi is not None:
                    phi = math.degrees(phi)
                if psi is not None:
                    psi = math.degrees(psi)
                dihedrals[resid] = {'phi': phi, 'psi': psi}
            print(model, chain)
            pb_seq = pbx.assign(dihedrals)
            print(pb_seq)

BioPython version: 1.68
<Model id=0> <Chain id=A>
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ
<Model id=1> <Chain id=A>
ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghiadddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddfklpaccdddddfklmlmgcdehiaddddddfklmmgopZZ
<Model id=2> <Chain id=A>
ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpacddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddefklpaccddddddfkgojbdfehpaddddddfkbccfbgZZ
<Model id=3> <Chain id=A>
ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddddddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddfklmpccdddddddehiabghehiaddddddfklpccfkZZ
<Model id=4> <Chain id=A>
ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadddddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddfklmaccddddddfbgghiafehiadddddddfklpacfZZ
<Model id=5> <Chain id=A>
ZZmghbacfkbccdd