# Homology Modelling

### Author: William Glass

This notebook shows examples of how to perform homology modelling in KinoML. 

In [1]:
# get relevant imports
from kinoml.modeling.homology import HomologyModel
from kinoml.core.proteins import ProteinStructure

To start, we need a template structure on which to base our homology model.

In [2]:

hm = HomologyModel()

# If we already have a prepared structure (e.g. one prepared using Spruce TK), we can load it from a file
structure = ProteinStructure.from_file("/Users/williamglass/Desktop/4yne_protein.pdb")

# If we just want to grab a structure from the PDB, we can use the `from_name` method of `ProteinStructure`, e.g.:
structure_pdb = ProteinStructure.from_name('4yne')

# Once we have our structure, we need to extract its sequence. Note that this will often not be the canonical sequence.
sequence = structure.sequence

We now have our template structure and its sequence. If, for some reason, we did not have access to the structure but did have its sequence, we could run a BLAST search to find a PDB structure to use as our template:

In [3]:
model_template = hm.get_pdb_template(sequence)

@> Blast searching NCBI PDB database for "GSGIR..."
@> Blast search completed in 9.2s.


In [4]:
model_template.metadata

{'id': '4yne'}

In this toy example our BLAST search using the `4YNE` sequence has, as expected, returned the `4YNE` structure in the PDB as the "best" structure for us to use as our template.
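Because a sequence extracted from a structure often differs from the canonical one (for example through expression tags or truncated constructs), it can help to check where the two overlap. The following is a minimal sketch using only the Python standard library; the function name `longest_shared_stretch` and both toy sequences are made up for illustration and are not part of KinoML.

```python
from difflib import SequenceMatcher


def longest_shared_stretch(construct_seq, canonical_seq):
    """Return (start_in_construct, start_in_canonical, length) of the longest exact match."""
    matcher = SequenceMatcher(None, construct_seq, canonical_seq, autojunk=False)
    match = matcher.find_longest_match(0, len(construct_seq), 0, len(canonical_seq))
    return match.a, match.b, match.size


# Toy sequences, made up for illustration (not taken from 4YNE or UniProt)
canonical = "MKTAYIAKQRQISFVKSHFSRQ"
construct = "GSG" + canonical[5:18]  # hypothetical tag followed by a truncated region

start_c, start_u, length = longest_shared_stretch(construct, canonical)
print(f"construct[{start_c}:{start_c + length}] matches canonical[{start_u}:{start_u + length}]")
```

An exact-match search like this only locates the shared core; point mutations in the construct would split the match, so a proper pairwise alignment is the more robust choice for real data.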

Typically, we will want to search the PDB with a query sequence and find the most relevant PDB structure to use as our template. We can use the `get_uniprot_sequence` method to obtain the full canonical sequence from UniProt based on the UniProt ID.

In [5]:
up_seq = hm.get_uniprot_sequence('P04629')

In [6]:
up_seq

'MLRGGRRGQLGWHSWAAGPGSLLAWLILASAGAAPCPDACCPHGSSGLRCTRDGALDSLHHLPGAENLTELYIENQQHLQHLELRDLRGLGELRNLTIVKSGLRFVAPDAFHFTPRLSRLNLSFNALESLSWKTVQGLSLQELVLSGNPLHCSCALRWLQRWEEEGLGGVPEQKLQCHGQGPLAHMPNASCGVPTLKVQVPNASVDVGDDVLLRCQVEGRGLEQAGWILTELEQSATVMKSGGLPSLGLTLANVTSDLNRKNVTCWAENDVGRAEVSVQVNVSFPASVQLHTAVEMHHWCIPFSVDGQPAPSLRWLFNGSVLNETSFIFTEFLEPAANETVRHGCLRLNQPTHVNNGNYTLLAANPFGQASASIMAAFMDNPFEFNPEDPIPVSFSPVDTNSTSGDPVEKKDETPFGVSVAVGLAVFACLFLSTLLLVLNKCGRRNKFGINRPAVLAPEDGLAMSLHFMTLGGSSLSPTEGKGSGLQGHIIENPQYFSDACVHHIKRRDIVLKWELGEGAFGKVFLAECHNLLPEQDKMLVAVKALKEASESARQDFQREAELLTMLQHQHIVRFFGVCTEGRPLLMVFEYMRHGDLNRFLRSHGPDAKLLAGGEDVAPGPLGLGQLLAVASQVAAGMVYLAGLHFVHRDLATRNCLVGQGLVVKIGDFGMSRDIYSTDYYRVGGRTMLPIRWMPPESILYRKFTTESDVWSFGVVLWEIFTYGKQPWYQLSNTEAIDCITQGRELERPRACPPEVYAIMRGCWQREPQQRHSIKDVHARLQALAQAPPVYLDVLG'
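For readers who want to see what such a lookup could involve, here is a rough standalone sketch that retrieves a FASTA record from the UniProt REST service and strips the header. It is an assumption that this resembles what `get_uniprot_sequence` does internally; the helper names `parse_fasta` and `fetch_uniprot_sequence` are hypothetical, and the toy FASTA record is made up so the parsing step can be checked without network access.

```python
from urllib.request import urlopen

# UniProt REST endpoint for a single entry in FASTA format
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return the sequence from single-record FASTA text (header line dropped)."""
    lines = [line.strip() for line in text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def fetch_uniprot_sequence(accession, timeout=30):
    """Download one UniProt entry as FASTA and return its sequence (needs network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urlopen(url, timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Offline check of the parser with a made-up record
example = ">toy|X00000|TOY_EXAMPLE made-up record\nMLRGG\nRRGQL\n"
print(parse_fasta(example))
```

Calling `fetch_uniprot_sequence("P04629")` should then return the same kind of string as `up_seq` above, provided the service is reachable.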

We can then run a BLAST search using this canonical sequence to search the PDB for the best template to use.

In [7]:
model_template2 = hm.get_pdb_template(up_seq)

@> Blast searching NCBI PDB database for "MLRGG..."
@> Blast search completed in 8.3s.


In [8]:
model_template2.metadata

{'id': '2ifg'}

**NOTE: `get_pdb_template()` is a WIP** 
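When the query is a full-length canonical sequence, the top BLAST hit may cover only part of it, so "best" usually has to balance identity against coverage. The sketch below shows one possible selection rule; it is not how `get_pdb_template()` is implemented. The `BlastHit` class, the `pick_template` function, the 70% threshold, and the three hits are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    pdb_id: str
    percent_identity: float  # identity over the aligned region
    percent_overlap: float   # fraction of the query covered by the hit


def pick_template(hits, min_overlap=70.0):
    """Return the most identical hit among those covering enough of the query, or None."""
    eligible = [hit for hit in hits if hit.percent_overlap >= min_overlap]
    if not eligible:
        return None
    return max(eligible, key=lambda hit: (hit.percent_identity, hit.percent_overlap))


# Made-up hits with placeholder identifiers (not real PDB entries or BLAST results)
hits = [
    BlastHit("hitA", 100.0, 12.0),  # perfect identity but covers little of the query
    BlastHit("hitB", 92.0, 85.0),
    BlastHit("hitC", 64.0, 98.0),
]

best = pick_template(hits)
print(best.pdb_id)
```

With these toy values the rule rejects `hitA` for low coverage and prefers `hitB` over `hitC` on identity; the right threshold depends on which domain of the target is to be modelled.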

A structure (either user-prepared or downloaded from the PDB with `ProteinStructure.from_name()`) can be used as the template onto which we build our homology model.

We also need a target sequence, which is downloaded from the UniProt server using `HomologyModel.get_uniprot_sequence()` as shown above.

This information can be used in `HomologyModel.get_alignment()` to produce an alignment file of the two sequences. Sequences are automatically trimmed if there are large gaps in the template structure (**NOTE: this is still in the testing phase**).
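To give a feel for what trimming can mean here, the sketch below removes leading and trailing alignment columns in which the template has no residues, leaving internal gaps for the modelling step to handle. This is only one plausible reading of the trimming described above, not KinoML's actual logic; `trim_to_template` and the aligned toy strings are hypothetical.

```python
def trim_to_template(template_aln, target_aln, gap="-"):
    """Drop leading/trailing alignment columns where the template has only gaps."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    covered = [i for i, residue in enumerate(template_aln) if residue != gap]
    if not covered:
        return "", ""
    start, end = covered[0], covered[-1] + 1
    return template_aln[start:end], target_aln[start:end]


# Made-up pre-aligned sequences: the target overhangs the template at both ends
template_aln = "----GSGIRKV--LE----"
target_aln = "MLRGGAGIRKVAALEPQQR"

trimmed_template, trimmed_target = trim_to_template(template_aln, target_aln)
print(trimmed_template)
print(trimmed_target)
```

Unmodelled overhangs have no template coordinates to restrain them, which is why removing them before model building is a common precaution.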

In [9]:
hm.get_alignment(structure_pdb, up_seq, pdb_entry=True)


                         MODELLER 9.24, 2020/04/06, r11614

     PROTEIN STRUCTURE MODELLING BY SATISFACTION OF SPATIAL RESTRAINTS


                     Copyright(c) 1989-2020 Andrej Sali
                            All Rights Reserved

                             Written by A. Sali
                               with help from
              B. Webb, M.S. Madhusudhan, M-Y. Shen, G.Q. Dong,
          M.A. Marti-Renom, N. Eswar, F. Alber, M. Topf, B. Oliva,
             A. Fiser, R. Sanchez, B. Yerkovich, A. Badretdinov,
                     F. Melo, J.P. Overington, E. Feyfant
                 University of California, San Francisco, USA
                    Rockefeller University, New York, USA
                      Harvard University, Cambridge, USA
                   Imperial Cancer Research Fund, London, UK
              Birkbeck College, University of London, London, UK


Kind, OS, HostName, Kernel, Processor: 4, Darwin Will-G-TMCS-Mac 19.6.0 x86_64
Date and time of compilation  

After producing our alignment file, we can pass its newly generated path to MODELLER to produce a homology model. In this example we generate a single model; in a real run we would generate a large number of models and score them to select the "best" one for simulations.
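As a sketch of that scoring step, the snippet below picks the lowest-scoring successful model from a list of per-model records. The record layout (keys `name`, `failure`, `DOPE score`) is assumed to resemble the per-model dictionaries MODELLER reports after a run; whether `get_model` exposes them this way is not confirmed here. The function `best_model` is hypothetical and the scores are made up.

```python
def best_model(outputs, score_key="DOPE score"):
    """Return the successfully built model with the lowest (most favourable) score."""
    successful = [m for m in outputs if m.get("failure") is None and score_key in m]
    if not successful:
        raise ValueError("no successfully built models to choose from")
    return min(successful, key=lambda m: m[score_key])


# Made-up records for illustration (not real MODELLER results)
outputs = [
    {"name": "model_1.pdb", "failure": None, "DOPE score": -30210.5},
    {"name": "model_2.pdb", "failure": None, "DOPE score": -31877.2},
    {"name": "model_3.pdb", "failure": "optimization failed"},
]

print(best_model(outputs)["name"])
```

Lower DOPE scores indicate more favourable models, so the selection uses `min`; a different assessment score may need the opposite ordering.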

In [10]:
hm.get_model(structure_pdb, hm.alipath, num_models=1)

2.545      1436.5       1.000
 7 Coulomb point-point electrostatic p:       0       0      0   0.000   0.000      0.0000       1.000
 8 H-bonding potential                :       0       0      0   0.000   0.000      0.0000       1.000
 9 Distance restraints 1 (CA-CA)      :       0       0      0   0.000   0.000      0.0000       1.000
10 Distance restraints 2 (N-O)        :       0       0      0   0.000   0.000      0.0000       1.000
11 Mainchain Phi dihedral restraints  :       0       0      0   0.000   0.000      0.0000       1.000
12 Mainchain Psi dihedral restraints  :       0       0      0   0.000   0.000      0.0000       1.000
13 Mainchain Omega dihedral restraints:      41       4      7   0.230   0.230      84.181       1.000
14 Sidechain Chi_1 dihedral restraints:      32       0      4   1.321   1.321      23.653       1.000
15 Sidechain Chi_2 dihedral restraints:      24       0      1   1.534   1.534      13.705       1.000
16 Sidechain Chi_3 dihedral restraints:    