In [1]:
import nglview as ng



In [14]:
# https://www.rcsb.org/structure/1FJS
# ng.show_structure_file('1fjs.pdb')
view = ng.show_pdbid("1fjs")
view.add_surface(selection="protein", opacity=0.2)
view.add_licorice('ALA')
view

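The cells below read a local `1fjs.pdb`, but the notebook does not show how that file is obtained. Here is a minimal sketch that downloads it, assuming network access and RCSB's standard `https://files.rcsb.org/download/<ID>.pdb` URL. The helper names `rcsb_pdb_url` and `fetch_pdb` are hypothetical.

```python
import urllib.request
from pathlib import Path

# RCSB download URL pattern for legacy-format PDB files (assumed; IDs are upper-cased)
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Return the RCSB download URL for a four-character PDB ID."""
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper())


def fetch_pdb(pdb_id: str, dest_dir: str = ".") -> Path:
    """Download <pdb_id>.pdb (lower-case name) into dest_dir unless it already exists."""
    dest = Path(dest_dir) / f"{pdb_id.lower()}.pdb"
    if not dest.exists():
        urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), dest)
    return dest


# Usage (needs network access the first time):
# fetch_pdb("1fjs")  # writes ./1fjs.pdb
```

Skipping the download when the file already exists prevents each rerun of the notebook from fetching it again.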

In [21]:
def clean_pdb(input_file, output_file):
    """
    Copy a PDB file, dropping HETATM and CONECT records.

    This cleaning procedure is not universally appropriate
    (e.g., a tightly bound ligand or a functional active-site water molecule).

    HETATM stands for "HETero ATom":
        These records hold atoms in non-standard residues such as ligands,
        inhibitors, solvent molecules (like water), ions, and modified amino acids.
        Removing them keeps only the standard residues of the protein chains.
        Note that a modified amino acid inside a chain is removed too, leaving a gap.

    CONECT stands for "CONnECTivity":
        These records specify the connectivity between atoms, particularly for
        non-standard residues. They define chemical bonds that might not be
        inferred from standard geometry. Removing them simplifies the file by
        eliminating these explicit bond definitions.
    """
    with open(input_file, 'r') as f_in, open(output_file, 'w') as f_out:
        for line in f_in:
            # Skip hetero-atom coordinates and explicit connectivity records
            if line.startswith(('HETATM', 'CONECT')):
                continue
            f_out.write(line)
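The docstring warns that dropping every HETATM record is not always appropriate. Here is a minimal sketch of a variant that keeps selected hetero residues (for example a bound ligand or structural waters) by residue name. The function `clean_pdb_keep` and its `keep_resnames` parameter are hypothetical. The sketch assumes the standard fixed-column PDB layout, with the residue name in columns 18-20.

```python
def clean_pdb_keep(input_file, output_file, keep_resnames=()):
    """Like clean_pdb, but keep HETATM records whose residue name is in keep_resnames."""
    keep = {name.strip().upper() for name in keep_resnames}
    with open(input_file) as f_in, open(output_file, "w") as f_out:
        for line in f_in:
            # Explicit connectivity is always dropped, as in clean_pdb
            if line.startswith("CONECT"):
                continue
            # Residue name occupies columns 18-20 (0-based slice 17:20)
            if line.startswith("HETATM") and line[17:20].strip().upper() not in keep:
                continue
            f_out.write(line)


# Usage (residue name is illustrative): keep crystal waters, drop everything else
# clean_pdb_keep('1fjs.pdb', '1fjs_protein_water.pdb', keep_resnames=('HOH',))
```

Any retained hetero group still needs force-field parameters before `pdb2gmx` can process it.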

In [22]:
clean_pdb('1fjs.pdb', '1fjs_protein.pdb')
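A quick way to confirm the cleaning step is to compare record types before and after. Here is a minimal sketch using `collections.Counter` over the first six columns of each line. The helper name `record_counts` is hypothetical.

```python
from collections import Counter


def record_counts(pdb_file):
    """Count PDB record types (first six columns, stripped), ignoring blank lines."""
    with open(pdb_file) as fh:
        return Counter(line[:6].strip() for line in fh if line.strip())


# Usage in the notebook (assumes both files exist):
# before, after = record_counts('1fjs.pdb'), record_counts('1fjs_protein.pdb')
# print(before['HETATM'], after['HETATM'], after['CONECT'])  # both "after" values should be 0
```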

In [28]:
# Always check your .pdb file for MISSING entries (REMARK 465 lists missing residues,
# REMARK 470 lists missing atoms): they mark atoms or whole residues not present in the crystal structure.
# Terminal regions are often absent, and may not present a problem for dynamics.
"""
Protein crystal structures often have incomplete regions, particularly at the N-terminal and C-terminal ends
(the beginning and end of the protein chain).
When you see "MISSING" in a PDB file, it means those parts weren't resolved in the experimental structure
determination.

The note about terminal regions "not presenting a problem for dynamics" means:

1. Missing terminal regions (ends of the protein) are typically flexible and disordered in solution,
   which is why they often don't show up clearly in crystal structures
2. For molecular dynamics simulations, having these terminal regions missing is usually acceptable because:
   - They're often not crucial for the core protein function or structure
   - They typically extend into the solvent and don't participate in important structural interactions
   - The dynamics of the rest of the protein can still be studied meaningfully without them

However, if missing regions are in the middle of the protein or in functionally important areas
(like active sites or binding interfaces), that would be more problematic for a simulation.
Those gaps would need to be addressed by modeling the missing sections before running dynamics.
"""
!grep MISSING 1fjs.pdb
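`grep` only shows that residues are missing. The note above says the important question is whether they are terminal or internal. Here is a minimal sketch that parses the REMARK 465 table and labels each missing residue relative to the residue numbers observed in ATOM records for the same chain. It assumes the standard `M RES C SSSEQI` table layout. The helper names are hypothetical. "Terminal" is judged purely by residue number, so unusual numbering schemes can mislead it.

```python
import re
from collections import defaultdict


def missing_residues(pdb_file):
    """Parse REMARK 465 rows into (chain, resname, seq_with_icode) tuples."""
    missing, in_table = [], False
    with open(pdb_file) as fh:
        for line in fh:
            if not line.startswith("REMARK 465"):
                continue
            fields = line[10:].split()
            if fields[:3] == ["M", "RES", "C"]:
                in_table = True  # header row; residue rows follow
                continue
            if in_table and len(fields) >= 3:
                resname, chain, seqi = fields[-3:]  # optional model number is ignored
                missing.append((chain, resname, seqi))
    return missing


def classify_missing(pdb_file):
    """Label each missing residue 'terminal' or 'internal' w.r.t. observed ATOM residues."""
    observed = defaultdict(list)
    with open(pdb_file) as fh:
        for line in fh:
            if line.startswith("ATOM"):
                observed[line[21]].append(int(line[22:26]))  # chain ID, resSeq
    result = []
    for chain, resname, seqi in missing_residues(pdb_file):
        seq = int(re.match(r"-?\d+", seqi).group())  # drop any insertion code
        seqs = observed.get(chain)
        if not seqs:
            kind = "no observed atoms"
        elif seq < min(seqs) or seq > max(seqs):
            kind = "terminal"
        else:
            kind = "internal"
        result.append((chain, resname, seqi, kind))
    return result


# Usage: internal gaps are the ones that would need modeling before dynamics
# [r for r in classify_missing('1fjs.pdb') if r[3] == 'internal']
```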

In [24]:
view = ng.show_structure_file("1fjs_protein.pdb")
#view.add_surface(selection="protein", opacity=0.2)
view.add_licorice('ALA')
view


# Topology
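`pdb2gmx` prints a per-chain table of residues and atoms and warns about atoms with zero or fractional occupancy. Here is a minimal sketch that computes comparable numbers from the cleaned file, so the log can be cross-checked. The function names are hypothetical, and the counting rules are this sketch's own. Results may differ from `pdb2gmx` if the file contains alternate locations or if `pdb2gmx` defines its occupancy categories differently.

```python
from collections import defaultdict


def chain_summary(pdb_file):
    """Return {chain_id: (n_residues, n_atoms)} counted from ATOM records."""
    residues = defaultdict(set)
    atoms = defaultdict(int)
    with open(pdb_file) as fh:
        for line in fh:
            if not line.startswith("ATOM"):
                continue
            chain = line[21]                              # column 22: chain ID
            residues[chain].add((line[22:26], line[26]))  # resSeq + insertion code
            atoms[chain] += 1
    return {chain: (len(residues[chain]), atoms[chain]) for chain in atoms}


def occupancy_report(pdb_file):
    """Return (n_zero, n_partial): ATOM records with occupancy 0, and neither 0 nor 1."""
    n_zero = n_partial = 0
    with open(pdb_file) as fh:
        for line in fh:
            if not line.startswith("ATOM"):
                continue
            occ = float(line[54:60])  # columns 55-60: occupancy
            if occ == 0.0:
                n_zero += 1
            elif occ != 1.0:
                n_partial += 1
    return n_zero, n_partial


# Usage:
# chain_summary('1fjs_protein.pdb'); occupancy_report('1fjs_protein.pdb')
```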

In [29]:
!gmx pdb2gmx -f 1fjs_protein.pdb -o 1fjs_processed.gro -water tip3p -ff "charmm27"

                     :-) GROMACS - gmx pdb2gmx, 2025.0 (-:

Executable:   /usr/local/gromacs/bin/gmx
Data prefix:  /usr/local/gromacs
Working dir:  /home/ubuntu/src/gromacs
Command line:
  gmx pdb2gmx -f 1fjs_protein.pdb -o 1fjs_processed.gro -water tip3p -ff charmm27

Using the Charmm27 force field in directory charmm27.ff

going to rename charmm27.ff/aminoacids.r2b
Opening force field file /usr/local/gromacs/share/gromacs/top/charmm27.ff/aminoacids.r2b

going to rename charmm27.ff/rna.r2b
Opening force field file /usr/local/gromacs/share/gromacs/top/charmm27.ff/rna.r2b
Reading 1fjs_protein.pdb...
Read 'COAGULATION FACTOR XA; COAGULATION FACTOR XA', 2236 atoms

Analyzing pdb file
Splitting chemical chains based on TER records or chain id changing.

There are 2 chains and 0 blocks of water and 286 residues with 2236 atoms

  chain  #res #atoms
  1 'A'   234   1852  
  2 'L'    52    384  

there were 14 atoms with zero occupancy and 20 atoms with          occupancy unequal to one (ou