<h1><center style="font-family:newtimeroman;font-size:150%; border-radius:50px; padding: 20px; color: yellow; background-color: black">Bioinformatics-PDB Analysis And Visualization</center></h1>
<center><img style="border-radius: 20px 20px 20px 20px" src='https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRqWo9wC4dIdWK3fM4gyvGsgsj0xpVYEo90zw&s' height=300px width=500px border-radius=55%></center><br>

<div style="border-radius:50px; padding: 20px; background-color: black; font-size:120%; text-align:left">

<h3 align="left"><font color= white >About the notebook and data:</font></h3>
    
<p><font color=#fffb00>
The PDB format is a standard text file format for storing the three-dimensional atomic coordinates of biomolecules, primarily proteins. It is widely used across the Protein Data Bank (PDB), and many programs can read and write PDB files. Alongside the coordinates, each file carries metadata such as authors, references, and experimental methods, giving a fuller description of the structure.

   <div class="container">
        <h1><font color= yellow>PDB Analysis and Visualization</font></h1>
        <div class="section">
            <h2><font color= yellow>What You'll Learn</font></h2>
            <p>This notebook teaches you how to analyze and visualize <span class="highlight">Protein Data Bank (PDB)</span> files. You'll explore:</p>
            <ul>
                <li>Managing directories and files for PDB data.</li>
                <li>Extracting key information like <span class="highlight">ligands, chains, resolution, and organism details</span>.</li>
                <li>Visualizing molecular structures using <span class="highlight">nglview</span>.</li>
                <li>Customizing visualizations to highlight specific molecular components.</li>
                <li>Saving and exporting visualizations for further use.</li>
            </ul>
        </div>
        <div class="section">
            <h2><font color= yellow>Why It's Useful</font></h2>
            <p>This notebook is perfect for learners interested in <span class="highlight">structural bioinformatics</span> and molecular visualization. It provides hands-on experience with real-world data and tools.</p>
        </div>
        <div class="section">
            <h2><font color= yellow>Get Started</font></h2>
            <p>Follow the steps in the notebook to unlock the power of PDB analysis and visualization!</p>
        </div>
    </div>

 <center><button type="button"><a href="https://www.rcsb.org/structure/2V0Z">RCSB page for PDB entry 2V0Z</a></button></center>
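
PDB files are fixed-column text: each field of a coordinate record sits at a set character position. The sketch below slices one ATOM record into named fields. The sample record is built from illustrative values (not taken from 2V0Z), and `parse_atom_record` is a hypothetical helper name used only for this example.

```python
# Build a sample ATOM record piece by piece so every field lands in its column.
# The values are illustrative and do not come from 2V0Z.
SAMPLE_ATOM = (
    "ATOM".ljust(6)      # columns 1-6: record name
    + "1".rjust(5)       # columns 7-11: atom serial number
    + " "                # column 12: blank
    + " N".ljust(4)      # columns 13-16: atom name
    + " "                # column 17: alternate location indicator
    + "GLY"              # columns 18-20: residue name
    + " "                # column 21: blank
    + "C"                # column 22: chain identifier
    + "1".rjust(4)       # columns 23-26: residue sequence number
    + " " * 4            # column 27 (insertion code) and columns 28-30 (blank)
    + f"{11.104:8.3f}{13.207:8.3f}{2.1:8.3f}"  # columns 31-54: x, y, z
)


def parse_atom_record(line: str) -> dict:
    """Slice one ATOM/HETATM record into named fields by column position."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


atom = parse_atom_record(SAMPLE_ATOM)
print(atom)
```

Slicing by column is generally safer than `str.split()` for coordinate records, because adjacent fields can run together without a separating space.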

# <p style="background-color:black; font-family:calibri; color:#FF00FF; font-size:170%; text-align:center; border-radius:30px 30px;">1. The First Step | Import Libraries</p>

In [1]:
# Import required libraries
import os
import pandas as pd
import nglview as nv




In [2]:
# List all files in the current directory
os.listdir()

['.DS_Store',
 'pdb_files',
 '.gitignore',
 'bioinformatics-pdb-analysis-and-visualization.ipynb',
 'venv']

In [3]:
# Build the path to the PDB file
path_of_file = os.path.join("pdb_files", "2v0z.pdb")
print(path_of_file)

pdb_files/2v0z.pdb


In [4]:
# Open the file and read it into a list of lines
with open(path_of_file) as fp:
    data = fp.readlines()

In [5]:
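# Preview the first 20 lines of the PDB file
# Each line keeps its trailing newline, so print() adds a blank line between records
for line in data[:20]:
    print(line)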

HEADER    HYDROLASE                               21-MAY-07   2V0Z              

TITLE     CRYSTAL STRUCTURE OF RENIN WITH INHIBITOR 10 (ALISKIREN)              

CAVEAT     2V0Z    NAG C 1327 HAS WRONG CHIRALITY AT ATOM C1 NAG O 1328 HAS     

CAVEAT   2 2V0Z    WRONG CHIRALITY AT ATOM C1                                   

COMPND    MOL_ID: 1;                                                            

COMPND   2 MOLECULE: RENIN;                                                     

COMPND   3 CHAIN: C, O;                                                         

COMPND   4 SYNONYM: ANGIOTENSINOGENASE;                                         

COMPND   5 EC: 3.4.23.15;                                                       

COMPND   6 ENGINEERED: YES                                                      

SOURCE    MOL_ID: 1;                                                            

SOURCE   2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;                                   

SOURCE   3 ORGAN
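
Every line in the preview begins with a record name in its first six columns (HEADER, TITLE, COMPND, and so on). As a rough sketch of how to get an overview of a file, the snippet below tallies lines by record name. The sample lines are copied from the preview, and `count_record_types` is a hypothetical helper name. In the notebook you would pass `data` instead.

```python
from collections import Counter

# A few records copied from the preview; a real run would use `data`.
sample_lines = [
    "HEADER    HYDROLASE                               21-MAY-07   2V0Z",
    "TITLE     CRYSTAL STRUCTURE OF RENIN WITH INHIBITOR 10 (ALISKIREN)",
    "COMPND    MOL_ID: 1;",
    "COMPND   2 MOLECULE: RENIN;",
    "COMPND   3 CHAIN: C, O;",
    "SOURCE    MOL_ID: 1;",
]


def count_record_types(lines):
    """Tally lines by the record name stored in columns 1-6."""
    return Counter(line[:6].strip() for line in lines if line.strip())


record_counts = count_record_types(sample_lines)
print(record_counts.most_common())
```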

# <p style="background-color:black; font-family:calibri; color:#FF00FF; font-size:170%; text-align:center; border-radius:30px 30px;">2. The Second Step | Analyze the PDB File</p>

In [6]:
# Extract the PDB ID from the file name
file_name = os.path.basename(path_of_file)
pdb_id = file_name.split(".")[0]
print(f"The PDB ID is {pdb_id}")

The PDB ID is 2v0z


In [7]:
# Count the lines in the file
print(f"There are {len(data)} lines in this file.")

There are 6695 lines in this file.


In [8]:
# Extract the header and title text (everything after column 10)
header = data[0][10:]
title = data[1][10:]
print(f"Header : {header}")
print(f"Title : {title}")

Header : HYDROLASE                               21-MAY-07   2V0Z              

Title : CRYSTAL STRUCTURE OF RENIN WITH INHIBITOR 10 (ALISKIREN)              



In [9]:
# Collect the heterogen (ligand) names
# HETNAM records start at column 1; matching on the prefix avoids
# picking up other lines that merely mention "HETNAM".
het_name = []
for lines in data:
    if lines.startswith("HETNAM"):
        het_name.append(lines)
ligand_1 = het_name[1].split()[1]
ligand_2 = het_name[0].split()[1]
ligand_1_complete_name = het_name[1][10:]
ligand_2_complete_name = het_name[0][10:]
print(f"Heterogen molecule 1 is {ligand_1} and its complete name is {ligand_1_complete_name}")
print(f"Heterogen molecule 2 is {ligand_2} and its complete name is {ligand_2_complete_name}")


Heterogen molecule 1 is C41 and its complete name is  C41 ALISKIREN

Heterogen molecule 2 is NAG and its complete name is  NAG 2-ACETAMIDO-2-DEOXY-BETA-D-GLUCOPYRANOSE
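
Indexing into the list of HETNAM lines works for this file, but it depends on the order and number of records. A more general sketch, assuming the standard HETNAM layout (heterogen ID in columns 12-14, name text from column 16), collects the names into a dictionary. `make_hetnam` and `parse_hetnam` are hypothetical helper names, and the REVDAT line in the sample is made up to show that non-HETNAM records are skipped.

```python
def make_hetnam(het_id, text, continuation=""):
    """Build a HETNAM record with the ID in columns 12-14 and text from column 16."""
    return f"HETNAM  {continuation:>2} {het_id:>3} {text}"


def parse_hetnam(lines):
    """Map each heterogen ID to its chemical name."""
    names = {}
    for line in lines:
        if not line.startswith("HETNAM"):
            continue
        het_id = line[11:14].strip()
        text = line[15:70].strip()
        # Continuation lines repeat the ID; their text is appended as-is.
        names[het_id] = names.get(het_id, "") + text
    return names


sample_lines = [
    make_hetnam("NAG", "2-ACETAMIDO-2-DEOXY-BETA-D-GLUCOPYRANOSE"),
    make_hetnam("C41", "ALISKIREN"),
    "REVDAT   2   (hypothetical line that mentions HETNAM)",
]

ligand_names = parse_hetnam(sample_lines)
print(ligand_names)
```

With a dictionary, the number of ligands is simply `len(ligand_names)` rather than a hard-coded value.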



In [10]:
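# Find the protein chains
# Each chain ends with a TER record. Matching on the prefix skips other
# lines that merely contain "TER" (for example the MASTER record), so the
# last two entries are the TER records of chains O and C.
chain_name = []
for lines in data:
    if lines.startswith("TER"):
        chain_name.append(lines)
chain_o = chain_name[-1].split()[3]
chain_c = chain_name[-2].split()[3]
print(f"This file contains 2 chains: chain {chain_o} and chain {chain_c}")

This file contains 2 chains: chain O and chain C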


In [11]:
# Find the resolution
# The REMARK 2 record holds the resolution; stop at the first match
for lines in data:
    if lines.startswith("REMARK   2 RESOLUTION."):
        resolution = lines.split()[3]
        break
print(f"Resolution is {resolution} Angstrom")


Resolution is 2.20 Angstrom
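
The resolution is kept as a string in the cell above. A minimal sketch for turning it into a number, assuming the usual `REMARK   2 RESOLUTION.    2.20 ANGSTROMS.` layout, is shown below. `parse_resolution` is a hypothetical helper name. It returns `None` when no numeric value is present, which is the case for entries that report `NOT APPLICABLE`.

```python
import re


def parse_resolution(lines):
    """Return the resolution in angstroms as a float, or None if it is not given."""
    for line in lines:
        if line.startswith("REMARK   2 RESOLUTION."):
            match = re.search(r"(\d+\.\d+)\s+ANGSTROMS", line)
            return float(match.group(1)) if match else None
    return None


xray_lines = ["REMARK   2", "REMARK   2 RESOLUTION.    2.20 ANGSTROMS."]
no_value_lines = ["REMARK   2 RESOLUTION. NOT APPLICABLE."]

xray_resolution = parse_resolution(xray_lines)
missing_resolution = parse_resolution(no_value_lines)
print(xray_resolution, missing_resolution)
```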


In [12]:
# Find the EC number
for lines in data:
    if "EC:" in lines:
        # Drop the trailing ";" instead of slicing a fixed number of characters
        ec_number = lines.split()[3].rstrip(";")
print(f"EC Number of this protein is {ec_number}")

EC Number of this protein is 3.4.23.15


In [13]:
# Find the source organism
for lines in data:
    if "ORGANISM_SCIENTIFIC:" in lines:
        # Keep everything after the key, without the trailing ";"
        protein_source = lines.split(":", 1)[1].strip().rstrip(";")
print(f"Source of this protein: {protein_source}")

Source of this protein: HOMO SAPIENS


In [14]:
# Read the last residue number of a chain from its TER record
# NOTE: field 4 of a TER record is the residue sequence number of the
# chain's final residue, not an atom count. The variable name
# `number_of_atoms` is kept because later cells refer to it.
chain_name = []
for lines in data:
    if lines.startswith("TER"):
        chain_name.append(lines)
chain_o = chain_name[-1].split()[3]
chain_c = chain_name[-2].split()[3]
number_of_atoms = chain_name[-1].split()[4]
print(f"This file contains 2 chains: {chain_o} and {chain_c}. Chain {chain_o} ends at residue number {number_of_atoms}")

This file contains 2 chains: O and C. Chain O ends at residue number 326
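
A TER record carries a serial number, residue name, chain ID, and residue sequence number, so the value 326 read from its last field is the number of the chain's final residue rather than an atom count. To count atoms per chain, one option is to tally the ATOM records by the chain identifier in column 22. The sketch below runs on a few made-up records, not on 2V0Z. `make_atom` and `atoms_per_chain` are hypothetical helper names.

```python
from collections import Counter


def make_atom(serial, atom_name, res_name, chain_id, res_seq, record="ATOM"):
    """Build a minimal coordinate record (columns 1-26 only) for the demo."""
    return f"{record:<6}{serial:>5} {atom_name:<4} {res_name:>3} {chain_id}{res_seq:>4}"


def atoms_per_chain(lines):
    """Count ATOM records per chain; HETATM records (ligands, water) are ignored."""
    return Counter(line[21] for line in lines if line.startswith("ATOM"))


sample_lines = [
    make_atom(1, " N", "GLY", "C", 1),
    make_atom(2, " CA", "GLY", "C", 1),
    make_atom(3, " C", "GLY", "C", 1),
    "TER",
    make_atom(4, " N", "SER", "O", 1),
    make_atom(5, " CA", "SER", "O", 1),
    make_atom(6, " O", "HOH", "O", 2001, record="HETATM"),
]

atom_counts = atoms_per_chain(sample_lines)
print(atom_counts)
```

In the notebook, `atoms_per_chain(data)` would give one count per chain without relying on list positions.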


In [15]:
# Find the organism taxonomy ID
for lines in data:
    if "ORGANISM_TAXID:" in lines:
        # Drop the trailing ";" instead of slicing a fixed number of characters
        organism_tax_id = lines.split()[3].rstrip(";")
print(f"The ORGANISM TAXID is {organism_tax_id}")

The ORGANISM TAXID is 9606
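
The EC number, organism name, and taxonomy ID all come from `KEY: value;` tokens in the COMPND and SOURCE records, so they can be read in one pass instead of one loop per field. The sketch below is one possible approach. `parse_key_values` is a hypothetical helper name, and the continuation number on the ORGANISM_TAXID line is assumed because that line does not appear in the preview.

```python
def parse_key_values(lines, record):
    """Collect `KEY: value;` tokens from COMPND or SOURCE records into a dict."""
    text = " ".join(line[10:].strip() for line in lines if line.startswith(record))
    fields = {}
    for token in text.split(";"):
        if ":" in token:
            key, value = token.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


sample_lines = [
    "COMPND    MOL_ID: 1;",
    "COMPND   2 MOLECULE: RENIN;",
    "COMPND   3 CHAIN: C, O;",
    "COMPND   4 SYNONYM: ANGIOTENSINOGENASE;",
    "COMPND   5 EC: 3.4.23.15;",
    "COMPND   6 ENGINEERED: YES",
    "SOURCE    MOL_ID: 1;",
    "SOURCE   2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;",
    "SOURCE   4 ORGANISM_TAXID: 9606;",  # continuation number assumed
]

compnd = parse_key_values(sample_lines, "COMPND")
source = parse_key_values(sample_lines, "SOURCE")
print(compnd["EC"], source["ORGANISM_SCIENTIFIC"], source["ORGANISM_TAXID"])
```

This sketch keeps only one value per key, so a file that describes several molecules (several `MOL_ID` blocks) would need the tokens grouped per molecule first.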


In [16]:
# Find missing residues in the protein
for lines in data:
    if "REMARK 465" in lines:
        print(lines.strip()[10:])



 MISSING RESIDUES
 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)

   M RES C SSSEQI
     ASN C   160A
     GLU O   160
     ASN O   160A
     SER O   160B
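
Printing the REMARK 465 block is enough for reading, but a structured list is easier to count or filter. The sketch below uses a regular expression to keep only the data rows and skip the explanatory text. The pattern and the name `parse_missing_residues` are assumptions for illustration, and the sample rows are the ones printed above with their `REMARK 465` prefix restored.

```python
import re

# Optional model number, residue name, chain ID, sequence number, insertion code.
MISSING_RESIDUE = re.compile(
    r"^REMARK 465\s+(?:(\d+)\s+)?([A-Z0-9]{1,3})\s+(\S)\s+(-?\d+)([A-Z]?)\s*$"
)


def parse_missing_residues(lines):
    """Return (residue name, chain, sequence number, insertion code) tuples."""
    residues = []
    for line in lines:
        match = MISSING_RESIDUE.match(line.rstrip())
        if match:
            _model, res_name, chain, seq, icode = match.groups()
            residues.append((res_name, chain, int(seq), icode))
    return residues


sample_lines = [
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465   M RES C SSSEQI",
    "REMARK 465     ASN C   160A",
    "REMARK 465     GLU O   160",
    "REMARK 465     ASN O   160A",
    "REMARK 465     SER O   160B",
]

missing = parse_missing_residues(sample_lines)
print(missing)
```

A result like this could replace a hard-coded "Yes" flag with an actual count, for example `len(missing)`.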


In [17]:
# Find missing atoms in the protein
for lines in data:
    if "REMARK 470" in lines:
        print(lines.strip()[10:])


 MISSING ATOM
 THE FOLLOWING RESIDUES HAVE MISSING ATOMS (M=MODEL NUMBER;
 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER;
 I=INSERTION CODE):
   M RES CSSEQI  ATOMS
     SER C 159    CB   OG
     GLU C 160    CA   C    O    CB   CG   CD   OE1
     GLU C 160    OE2
     SER C 160B   OG
     GLN C 160C   CB   CG   CD   OE1  NE2
     SER O 159    CA   C    O    CB   OG


# <p style="background-color:black; font-family:calibri; color:#FF00FF; font-size:170%; text-align:center; border-radius:30px 30px;">3. The Third Step | Create DataFrame</p>

In [18]:
# Create a DataFrame
data_file = {
    "Header": header.split()[0],
    "Title": title.strip(),
    "Pdb_id": pdb_id,
    "Ligand name 1": ligand_1,
    "Ligand name 2": ligand_2,
    "number of ligands": 2,
    "Chain name 1": chain_o,
    "Chain name 2": chain_c,
    "number of chain": 2,
    "resolution": resolution,
    "EC Number": ec_number,
    "Protein Source": protein_source,
    # Read from the TER record: the chain's last residue number, not an atom count
    "Last residue number": number_of_atoms,
    "Organism Tax Id": organism_tax_id,
    "MISSING RESIDUES": "Yes",
}
df = pd.DataFrame(data_file, index=[1])
df

,Header,Title,Pdb_id,Ligand name 1,Ligand name 2,number of ligands,Chain name 1,Chain name 2,number of chain,resolution,EC Number,Protein Source,Last residue number,Organism Tax Id,MISSING RESIDUES
1,HYDROLASE,CRYSTAL STRUCTURE OF RENIN WITH INHIBITOR 10 (...,2v0z,C41,NAG,2,O,C,2,2.2,3.4.23.15,HOMO SAPIENS,326,9606,Yes


# <p style="background-color:black; font-family:calibri; color:#FF00FF; font-size:170%; text-align:center; border-radius:30px 30px;">4. The Fourth Step | Visualize All Molecules</p>

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title></title>
    <style>
        body {
            font-family: Arial, sans-serif;
            line-height: 1.6;
            margin: 20px;
            background-color: #f4f4f9;
        }
        .representation {
            margin-bottom: 20px;
            padding: 15px;
            border: 1px solid #ddd;
            border-radius: 5px;
            background-color: #fff;
        }
        .representation h2 {
            margin-top: 0;
            color: #333;
        }
        .representation p {
            margin: 0;
            color: #555;
        }
    </style>
</head>
<body>
    <h1>Different Types of Protein Representations</h1>

<div class="representation">
        <h2>Cartoon</h2>
        <p>Displays the protein backbone as a cartoon-like ribbon, highlighting secondary structures such as alpha-helices and beta-sheets. This is useful for understanding the overall structure of the protein.</p>
</div>

<div class="representation">
    <h2>Ball+Stick</h2>
    <p>Shows atoms as balls and bonds as sticks, providing a detailed view of the atomic structure. This is ideal for examining small molecules or specific regions of the protein.</p>
</div>

<div class="representation">
    <h2>Licorice</h2>
    <p>Similar to Ball+Stick, but atoms are drawn with the same radius as the bonds, so the structure appears as uniform sticks that emphasize the connectivity between atoms.</p>
</div>

<div class="representation">
    <h2>Spacefill</h2>
    <p>Represents atoms as spheres with radii proportional to their van der Waals radii, giving a sense of the molecular volume.</p>
</div>

<div class="representation">
    <h2>Hyperball</h2>
    <p>A variation of Ball+Stick in which bonds are drawn as smooth, curved shapes that blend into the atom spheres, making it easier to see the connections in complex structures.</p>
</div>

<div class="representation">
    <h2>Surface</h2>
    <p>Displays the molecular surface, which is useful for visualizing the shape and accessible surface area of the protein.</p>
</div>

<div class="representation">
    <h2>Base</h2>
    <p>Highlights the base of nucleotides in DNA or RNA structures.</p>
    </div>
</body>
</html>

In [19]:
# Show all molecules in the ball+stick representation
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("ball+stick")
view



NGLWidget()

In [20]:
# Show all molecules in the cartoon representation
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("cartoon")
view

NGLWidget()

In [21]:
# Show all molecules in the licorice representation
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("licorice")
view

NGLWidget()

In [22]:
# Show all molecules in the spacefill representation
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("spacefill")
view

NGLWidget()

In [23]:
# Show all molecules in the hyperball representation
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("hyperball")
view

NGLWidget()

# <p style="background-color:black; font-family:calibri; color:#FF00FF; font-size:170%; text-align:center; border-radius:30px 30px;">5. The Fifth Step | Visualization Based On Specific Parts Of The Molecule</p>

In [24]:
# Show ligands
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("hyperball", "ligand")
view

NGLWidget()

In [25]:
# Show protein
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("cartoon", "protein")
view

NGLWidget()

In [26]:
# Show the protein and ligand in different representations
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("cartoon", "protein")
view.add_representation("hyperball", "ligand")
view

NGLWidget()

In [27]:
# Show water
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("hyperball", "water", color="blue")
view

NGLWidget()

In [28]:
# Show the ligand and water
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("hyperball", "water", color="blue")
view.add_representation("ball+stick", "ligand")
view

NGLWidget()

In [29]:
# Show the protein and water
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("cartoon", "protein")
view.add_representation("hyperball", "water", color="blue")
view

NGLWidget()

In [30]:
# Show All The Molecules
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("surface", "protein", opacity=0.9)
view.add_representation("hyperball", "water", color="blue")
view.add_representation("ball+stick", "ligand", color="red")
view

NGLWidget()

In [31]:
# Show the protein and ligand with different color
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("cartoon", "protein", color="blue", opacity=0.4)
view.add_representation("licorice", "ligand", color="red", opacity=0.9)
view

NGLWidget()
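
The second argument of `add_representation` in the cells above is an NGL selection string such as `"protein"`, `"ligand"`, or `"water"`. Selections can also be narrower. The sketch below builds such strings, assuming the NGL selection syntax in which `[C41]` selects by residue name, `:C` selects a chain, `10-20` selects a residue range, and terms are combined with `and`. `ngl_selection` is a hypothetical helper, not part of nglview.

```python
def ngl_selection(res_name=None, chain=None, residues=None):
    """Build an NGL selection string from optional residue name, chain, and range."""
    parts = []
    if res_name:
        parts.append(f"[{res_name}]")  # residue name, e.g. the ligand C41
    if chain:
        parts.append(f":{chain}")  # chain identifier
    if residues:
        first, last = residues
        parts.append(f"{first}-{last}")  # residue number range
    return " and ".join(parts) if parts else "all"


ligand_in_chain_c = ngl_selection(res_name="C41", chain="C")
chain_o_segment = ngl_selection(chain="O", residues=(10, 20))
print(ligand_in_chain_c)
print(chain_o_segment)

# In the notebook this could be used as, for example:
# view.add_representation("licorice", selection=ligand_in_chain_c, color="red")
```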

# <p style="background-color:black; font-family:calibri; color:#FF00FF; font-size:170%; text-align:center; border-radius:30px 30px;">6. The Sixth Step | Visualization With Contacts</p>

In [32]:
# Show the contacts together with the protein and ligand
view = nv.show_file(path_of_file)
view.clear_representations()
view.add_representation("cartoon", "protein", color="blue", opacity=0.4)
view.add_representation("licorice", "ligand", color="red", opacity=0.9)
view.add_representation("contact")
view

NGLWidget()

# <p style="background-color:black; font-family:calibri; color:#FF00FF; font-size:170%; text-align:center; border-radius:30px 30px;">7. The Seventh Step | Save Image</p>

In [33]:
# Create the output directory (no error if it already exists)
os.makedirs("image", exist_ok=True)

In [34]:
# render the image
view.render_image()

Image(value=b'', width='99%')

In [35]:
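# Download the image through the browser
# download_image() saves to the browser's download folder, so pass a plain file name rather than a path
view.download_image(filename="2v0z.png")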
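
The `Image(value=b'', width='99%')` output above suggests that the widget returned by `render_image()` is still empty when the cell finishes, with the picture arriving afterwards. Assuming the widget's `value` attribute holds the PNG bytes once rendering is done, the bytes could be written to the `image` directory from a later cell. `save_png_bytes` is a hypothetical helper, and the demo writes a placeholder byte string to a temporary directory instead of a real render.

```python
import tempfile
from pathlib import Path


def save_png_bytes(png_bytes: bytes, path) -> Path:
    """Write image bytes to `path`, creating parent directories as needed."""
    if not png_bytes:
        raise ValueError("image is still empty; run this after rendering has finished")
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(png_bytes)
    return target


# Demo with placeholder bytes (the 8-byte PNG signature), not a real image.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_png_bytes(b"\x89PNG\r\n\x1a\n", Path(tmp_dir) / "image" / "demo.png")
    saved_size = saved.stat().st_size

# In the notebook, split across two cells so the render can complete:
# image = view.render_image()                       # cell 1
# save_png_bytes(image.value, "image/2v0z.png")     # cell 2
```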