# 7 Using the Protein Data Bank

In this chapter, we will cover the following recipes:
- Finding a protein in multiple databases
- Introducing Bio.PDB
- Extracting more information from a PDB file
- Computing molecular distances on a PDB file
- Performing geometric operations
- Implementing a basic PDB parser
- Animating with PyMol
- Parsing mmCIF files with Biopython

## Introduction
Proteomics is the study of proteins, including their function and structure. One of the main objectives of this field is to characterize the 3D structure of proteins.

In this chapter, we will mostly focus on processing data from the PDB. We will see how to parse PDB files, perform some geometric computations, and visualize molecules. 

Throughout this chapter, we will use a classic example of a protein: tumor protein p53, which is involved in regulating the cell cycle and related processes such as apoptosis.

## Finding a protein in multiple databases
Before we start on structural biology proper, we will see how to access existing proteomic databases such as UniProt. We will query UniProt for our gene of interest, *TP53*, and take it from there.


In [4]:
import requests
server = 'http://www.uniprot.org/uniprot'
def do_request(server, ID='', **kwargs):
    '''Send a REST query to UniProt.
    '''
    params = ''
    req = requests.get('%s/%s%s' % (server, ID, params), params=kwargs)
    if not req.ok:
        req.raise_for_status()
    return req

# Fetch the p53 gene entries
req = do_request(server, query='gene:p53 AND reviewed:yes', format='tab', 
                 columns='id,entryname,length,organism,organism-id,database(PDB),database(HGNC)',
                 limit='50')

Let's check the result as follows:

In [11]:
import pandas as pd
import io

uniprot_list = pd.read_table(io.StringIO(req.text))
uniprot_list.rename(columns={'Organism ID': 'ID'}, inplace=True)
print(uniprot_list)  # or just uniprot_list on IPython

     Entry  Length                                           Organism     ID  \
0   Q8SPZ3     387               Delphinapterus leucas (Beluga whale)   9749   
1   P56423     393  Macaca fascicularis (Crab-eating macaque) (Cyn...   9541   
2   P79820     352  Oryzias latipes (Japanese rice fish) (Japanese...   8090   
3   O12946     366  Platichthys flesus (European flounder) (Pleuro...   8260   
4   P02340     387                               Mus musculus (Mouse)  10090   
5   Q92143     342  Xiphophorus maculatus (Southern platyfish) (Pl...   8083   
6   O93379     376  Ictalurus punctatus (Channel catfish) (Silurus...   7998   
7   O09185     393  Cricetulus griseus (Chinese hamster) (Cricetul...  10029   
8   P61260     393          Macaca fuscata fuscata (Japanese macaque)   9543   
9   P56424     393                    Macaca mulatta (Rhesus macaque)   9544   
10  P25035     396  Oncorhynchus mykiss (Rainbow trout) (Salmo gai...   8022   
11  Q9TUB2     386                      
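
The request above targets the `http://www.uniprot.org/uniprot` endpoint with `format='tab'` and `columns=...`. UniProt has since moved its REST service, so the following is a minimal sketch of how an equivalent query URL might be built for the newer service. The base URL, the `fields` and `size` parameter names, the field identifiers, and the `reviewed:true` syntax are all assumptions to check against the current UniProt documentation; the column headers returned may also differ from the table shown here. The sketch only builds the URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Assumption: base URL of the newer UniProt REST search service.
UNIPROT_SEARCH = 'https://rest.uniprot.org/uniprotkb/search'


def build_uniprot_url(query, fields, size=50, fmt='tsv'):
    """Build a search URL (hypothetical helper; no request is sent)."""
    params = {
        'query': query,              # search expression
        'format': fmt,               # assumed name of the tab-separated format
        'fields': ','.join(fields),  # assumed replacement for 'columns'
        'size': size,                # assumed replacement for 'limit'
    }
    return '%s?%s' % (UNIPROT_SEARCH, urlencode(params))


# Field identifiers below are assumptions, not taken from this chapter.
url = build_uniprot_url(
    'gene:p53 AND reviewed:true',
    ['accession', 'id', 'length', 'organism_name', 'organism_id', 'xref_pdb'],
)
print(url)
```

The resulting string could then be passed to `requests.get(url)` in place of the `do_request` call.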

Now, we can get the human p53 ID and use Biopython to retrieve and parse the SwissProt record:

In [12]:
from Bio import ExPASy, SwissProt
p53_human = uniprot_list[uniprot_list.ID == 9606]['Entry'].tolist()[0]
handle = ExPASy.get_sprot_raw(p53_human)
sp_rec = SwissProt.read(handle)

Let's take a look at the p53 record as follows:

In [13]:
print(sp_rec.entry_name, sp_rec.sequence_length, sp_rec.gene_name)
print(sp_rec.description)
print(sp_rec.organism, sp_rec.seqinfo)
print(sp_rec.sequence)

P53_HUMAN 393 Name=TP53; Synonyms=P53;
RecName: Full=Cellular tumor antigen p53; AltName: Full=Antigen NY-CO-13; AltName: Full=Phosphoprotein p53; AltName: Full=Tumor suppressor p53;
Homo sapiens (Human). (393, 43653, 'AD5C149FD8106131')
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
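
The SwissProt record also carries cross-references to other databases, which is one way to find PDB entries for the protein. The sketch below assumes that `sp_rec.cross_references` is a list of tuples whose first element is the database name and whose second is the identifier. The helper `pdb_ids_from_xrefs` and the sample tuples are hypothetical stand-ins rather than values copied from the real record.

```python
def pdb_ids_from_xrefs(cross_references):
    """Return the identifiers of all PDB cross-references, in order."""
    return [xref[1] for xref in cross_references
            if len(xref) > 1 and xref[0] == 'PDB']


# Hypothetical stand-in for sp_rec.cross_references; the trailing
# fields are placeholders, not real annotation.
sample_xrefs = [
    ('EMBL', 'ID-PLACEHOLDER', 'extra', 'fields'),
    ('PDB', '1TUP', 'method', 'resolution', 'chains'),
    ('PDB', '1OLG', 'method', 'resolution', 'chains'),
    ('GO', 'ID-PLACEHOLDER', 'extra', 'fields'),
]

print(pdb_ids_from_xrefs(sample_xrefs))
```

With a real record, the call would be `pdb_ids_from_xrefs(sp_rec.cross_references)`.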


## Introducing Bio.PDB
Here, we will introduce Biopython's PDB module to deal with the Protein Data Bank. 

First, let's retrieve our models of interest as follows:

In [16]:
from __future__ import print_function
from Bio import PDB
repository = PDB.PDBList()
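# Request the legacy PDB format so the files are saved as pdbXXXX.ent
repository.retrieve_pdb_file('1TUP', pdir='.', file_format='pdb')
repository.retrieve_pdb_file('1OLG', pdir='.', file_format='pdb')
repository.retrieve_pdb_file('1YCQ', pdir='.', file_format='pdb')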

parser = PDB.PDBParser()
p53_1tup = parser.get_structure('P 53 - DNA Binding', 'pdb1tup.ent')
p53_1olg = parser.get_structure('P 53 - Tetramerization','pdb1olg.ent')
p53_1ycq = parser.get_structure('P 53 - Transactivation','pdb1ycq.ent')

def print_pdb_headers(headers, indent=0):
    ind_text = ' ' * indent
    for header, content in headers.items():
        if isinstance(content, dict):
            print('\n%s%20s:' % (ind_text, header))
            print_pdb_headers(content, indent + 4)
            print()
        elif isinstance(content, list):
            print('%s%20s:' % (ind_text, header))
            for elem in content:
                print('%s%21s %s' % (ind_text, '->', elem))
        else:
            print('%s%20s: %s' % (ind_text, header, content))
            
print_pdb_headers(p53_1tup.header)



Downloading PDB structure '1TUP'...
Desired structure doesn't exists
Downloading PDB structure '1OLG'...
Desired structure doesn't exists
Downloading PDB structure '1YCQ'...
Desired structure doesn't exists

FileNotFoundError: [Errno 2] No such file or directory: 'pdb1tup.ent'
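
The `FileNotFoundError` in this output arises because parsing went ahead even though none of the downloads succeeded. A minimal sketch of a more defensive approach is to check which files are actually on disk before calling the parser. The helper names are hypothetical, and the `pdb<id>.ent` naming is assumed from the filenames used in the cell above. Biopython is imported lazily, so the two file-checking helpers run without it.

```python
import os


def expected_pdb_filename(pdb_id):
    """File name assumed for a structure saved in legacy PDB format."""
    return 'pdb%s.ent' % pdb_id.lower()


def missing_structures(pdb_ids, pdir='.'):
    """Return the IDs whose files are not present in pdir."""
    return [pdb_id for pdb_id in pdb_ids
            if not os.path.isfile(
                os.path.join(pdir, expected_pdb_filename(pdb_id)))]


def parse_available(pdb_ids, pdir='.'):
    """Parse only the structures whose files exist (needs Biopython)."""
    from Bio import PDB  # lazy import: the helpers above do not need it
    parser = PDB.PDBParser(QUIET=True)
    structures = {}
    for pdb_id in pdb_ids:
        path = os.path.join(pdir, expected_pdb_filename(pdb_id))
        if os.path.isfile(path):
            structures[pdb_id] = parser.get_structure(pdb_id, path)
    return structures


print(missing_structures(['1TUP', '1OLG', '1YCQ']))
```

If the printed list is not empty, the downloads need to be retried (or the files fetched manually) before the structures can be parsed.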

In [5]:
import nglview as nv
view = nv.show_structure_file(nv.datafiles.PDB)
view



In [None]:
# Clear all representations to try new ones
view.clear_representations()
# Add licorice without hydrogens
view.add_licorice('not hydrogen')

# Add licorice without hydrogens, colored blue
view.clear_representations()
view.add_licorice('not hydrogen', color='blue')

# Add a surface for the CA atoms
view.clear_representations()
view.add_surface('.CA', opacity=0.3)

# Combine different representations
view.clear_representations()
view.add_surface('.CA', opacity=0.3)
view.add_licorice('not hydrogen')

# Switch to a cartoon representation for a cleaner view
view.clear_representations()
view.add_cartoon()
view.add_surface(opacity=0.3)

# Make sure to call render_image in a separate cell
view.render_image()

# Then call _display_image
# If you save this notebook to an HTML file, the image will be displayed
view._display_image()
