<h1>MD Setup<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span><ul class="toc-item"><li><span><a href="#Input-parameters" data-toc-modified-id="Input-parameters-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Input parameters</a></span></li><li><span><a href="#Libraries-and-Functions" data-toc-modified-id="Libraries-and-Functions-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Libraries and Functions</a></span></li><li><span><a href="#Visualizing-initial-complex-structure" data-toc-modified-id="Visualizing-initial-complex-structure-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Visualizing initial complex structure</a></span></li></ul></li><li><span><a href="#Fix-protein-structure" data-toc-modified-id="Fix-protein-structure-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Fix protein structure</a></span><ul class="toc-item"><li><span><a href="#Perform-all-checks" data-toc-modified-id="Perform-all-checks-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Perform all checks</a></span></li></ul></li><li><span><a href="#Extract-complex-bcl-2-bax" data-toc-modified-id="Extract-complex-bcl-2-bax-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Extract complex bcl-2-bax</a></span></li><li><span><a href="#Extracting-Protein,-Ligand-and-Protein-Ligand-Complex" data-toc-modified-id="Extracting-Protein,-Ligand-and-Protein-Ligand-Complex-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Extracting Protein, Ligand and Protein-Ligand Complex</a></span><ul class="toc-item"><li><span><a href="#Visualizing-3D-structures" data-toc-modified-id="Visualizing-3D-structures-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Visualizing 3D structures</a></span></li><li><span><a href="#Fix-backbone" data-toc-modified-id="Fix-backbone-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Fix backbone</a></span></li><li><span><a href="#Quilarity" 
data-toc-modified-id="Quilarity-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Chirality</a></span></li></ul></li></ul></div>

## Introduction
This notebook describes how to **prepare the protein and ligand molecules**. The protein is the Bcl-2-Bax complex (PDB code 2XA0), and the ligand is the small molecule genistein (PubChem CID 5280961).

### Input parameters
- Input:
  - **complex**: CIF file of the Bcl-2-Bax complex (PDB code 2XA0).

### Libraries and Functions

In [2]:
import nglview               # For visualizing 3D structures
import ipywidgets            # For organizing 3D structures in panels
from IPython.display import Image       # For showing images
import os, zipfile           # Python utilities



### Visualizing initial complex structure

In [65]:
view = nglview.show_structure_file("input.pdb", default=False)
view.add_representation(repr_type   = 'cartoon', 
                        selection   = 'not het',
                        colorScheme = 'atomindex')
view.center()
view


<a id="fix"></a>
***
## Fix protein structure
**Checking** and **fixing** (if needed) the protein structure:<br>
- **Modeling** **missing side-chain atoms**, modifying incorrect **amide assignments**, choosing **alternative locations**.<br>
- **Checking** for missing **backbone atoms**, **heteroatoms**, **modified residues** and possible **atomic clashes**.
***
**Tools** used:
- [check_structure](https://biobb-structure-checking.readthedocs.io/en/latest/command_line_usage.html) (Structure Checking from MDWeb)
check_structure exposes the [MDWeb structure checking](http://mmb.irbbarcelona.org/MDWeb2/help.php?id=checking) tools as a command-line utility intended to prepare a structure for molecular dynamics simulation. It offers structure manipulation (selecting models or chains, removing components of the system, completing side chains and backbone) and quality checks such as residue chirality, amide orientation and van der Waals clashes.

- [Modeller](https://salilab.org/modeller/): fills missing fragments by comparative modelling.
***
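The check_structure calls in this notebook all share the shape `check_structure -i INPUT [-o OUTPUT] COMMAND [OPTIONS]`. As a minimal sketch (not part of biobb), the hypothetical helpers `build_check_structure_cmd` and `run_check_structure` below assemble such a call as an argument list and run it with `subprocess`, which makes it easier to script several checks in sequence. Actually running a command assumes `check_structure` is on the `PATH`.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_check_structure_cmd(input_path: str,
                              command: str,
                              output_path: Optional[str] = None,
                              options: Sequence[str] = ()) -> List[str]:
    """Assemble `check_structure -i IN [-o OUT] COMMAND [OPTIONS...]` as a list."""
    cmd = ["check_structure", "-i", input_path]
    if output_path is not None:
        cmd += ["-o", output_path]
    cmd.append(command)
    cmd.extend(options)
    return cmd


def run_check_structure(cmd: List[str]) -> int:
    """Echo and run the command; returns its exit code (needs check_structure on PATH)."""
    print(">>>", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


# Example: the amide fix used later in this notebook
amide_cmd = build_check_structure_cmd("fixed_main.pdb", "amide",
                                      output_path="fixed_main_amide.pdb",
                                      options=["--fix", "all"])
print(shlex.join(amide_cmd))
# run_check_structure(amide_cmd)  # uncomment to actually run it
```

Building an argument list instead of a shell string avoids quoting problems with paths that contain spaces.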

### Perform all checks

In [7]:
import os
os.getcwd()

'/home/lg/repos/javerianaMD/simulation/PR-biobb/pdb_1g5m'
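Several later cells stop with `ERROR: fetching/parsing structure` or `Unexisting input file`, and the working directory printed above is `.../pdb_1g5m` rather than a 2XA0 folder. A quick sanity check is to list which of the expected files are actually present. `missing_files` and `EXPECTED_INPUTS` are illustrative names; the list only collects input file names that appear in this notebook.

```python
from pathlib import Path
from typing import Iterable, List

# Input file names read by the cells of this notebook (illustrative; adjust as needed)
EXPECTED_INPUTS = ["input.cif", "input.pdb", "inputs/2xa0.cif"]


def missing_files(paths: Iterable[str], base_dir: str = ".") -> List[str]:
    """Return the entries of `paths` that are not regular files under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


print("Working directory:", Path.cwd())
print("Missing inputs:", missing_files(EXPECTED_INPUTS) or "none")
```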

In [4]:
# Check all 
!check_structure -i "input.cif" checkall

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

ERROR: fetching/parsing structure from input.cif
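The message does not say why parsing failed. Two likely causes are that `input.cif` is not in the working directory (the run that succeeds further down reads `inputs/2xa0.cif`) or that the file content does not match its extension. The sketch below is a simple heuristic, not part of check_structure: mmCIF files open with a `data_` block, while PDB files use fixed record names such as `HEADER` or `ATOM`.

```python
from pathlib import Path

# Fixed-width record names that commonly open a PDB-format file
PDB_RECORDS = {"HEADER", "TITLE", "COMPND", "REMARK", "CRYST1", "MODEL", "ATOM", "HETATM"}


def sniff_structure_format(path: str) -> str:
    """Return 'mmcif', 'pdb', 'unknown', 'empty' or 'missing' for `path`."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open(errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and mmCIF comment lines
            if stripped.startswith("data_"):
                return "mmcif"
            if line[:6].strip() in PDB_RECORDS:
                return "pdb"
            return "unknown"
    return "empty"


print("input.cif looks like:", sniff_structure_format("input.cif"))
```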


In [3]:
# Fix backbone: missing backbone atoms, chain breaks and terminal caps
!check_structure -i "input.cif" -o "fixed_main.pdb" backbone --fix_atoms All --fix_chain All --add_caps All

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

ERROR: fetching/parsing structure from input.cif
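The `backbone` command looks for missing backbone atoms and chain breaks before fixing them. To illustrate only the idea behind break detection, the sketch below flags consecutive Cα atoms that lie farther apart than a cutoff. Bonded neighbours in a trans peptide sit about 3.8 Å apart; the 4.2 Å threshold is an assumption, not check_structure's actual criterion.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def find_chain_breaks(ca_coords: Sequence[Coord],
                      max_ca_ca: float = 4.2) -> List[Tuple[int, float]]:
    """Flag gaps between consecutive CA atoms.

    Returns (i, distance) for every pair (i, i + 1) whose CA-CA distance in
    Angstrom exceeds `max_ca_ca` (an assumed cutoff; about 3.8 A is typical).
    """
    breaks = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if d > max_ca_ca:
            breaks.append((i, round(d, 2)))
    return breaks


# Three residues where the last one is displaced: a gap after index 1
print(find_chain_breaks([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.0, 0.0, 0.0)]))
```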


In [70]:
view = nglview.show_structure_file("fixed_main.pdb", default=False)
view.add_representation(repr_type = 'cartoon',colorScheme = 'atomindex')
view.center()
view


In [47]:
# Fix amide
!check_structure -i "fixed_main.pdb" -o "fixed_main_amide.pdb" amide --fix all

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

Structure fixed_main.pdb loaded
 Title: 
 Experimental method: unknown
 Resolution (A): N.A.

 Num. models: 1
 Num. chains: 4 (A: Protein, B: Protein, C: Protein, D: Protein)
 Num. residues:  475
 Num. residues with ins. codes:  0
 Num. HETATM residues:  21
 Num. ligands or modified residues:  0
 Num. water mol.:  21
 Num. atoms:  3531

Running amide. Options: --fix all
5 unusual contact(s) involving amide atoms found
 ARG A107.NH2 ASN C73.ND2     2.933 A
 GLN B118.OE1 LYS D57.O       2.907 A
 ASN B143.OD1 ASP D68.OD1     2.845 A
 ASN B143.ND2 GLY B145.N      2.601 A
 ASN B143.ND2 ARG B146.N      2.551 A
Amide residues fixed all (3)
Rechecking
2 unusual contact(s) involving amide atoms found
 GLN B118.NE2 LEU D59.N       2.605 A
 ASN B143.OD1 ASP D71.OD2     3.062 A
Final Num. models: 1
Final Num. chains: 4 (A: Protein, B: Prot
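The amide report lists each suspicious contact as `RES CHAIN+NUMBER.ATOM RES CHAIN+NUMBER.ATOM DISTANCE A`. A small parser, sketched below with a hypothetical `Contact` record, turns these lines into structured data, for example to see which contacts are between chains. The sample text is copied from the first report above.

```python
import re
from typing import List, NamedTuple

# Atom spec such as "A107.NH2": chain ID, residue number, atom name
_SPEC = re.compile(r"^(?P<chain>[A-Za-z0-9])(?P<resnum>-?\d+)\.(?P<atom>\S+)$")


class Contact(NamedTuple):
    res1: str
    chain1: str
    resnum1: int
    atom1: str
    res2: str
    chain2: str
    resnum2: int
    atom2: str
    distance: float  # Angstrom


def parse_contacts(log_text: str) -> List[Contact]:
    """Extract contact lines of the form 'ARG A107.NH2 ASN C73.ND2  2.933 A'."""
    contacts = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 6 or parts[5] != "A":
            continue
        m1, m2 = _SPEC.match(parts[1]), _SPEC.match(parts[3])
        if m1 is None or m2 is None:
            continue
        try:
            distance = float(parts[4])
        except ValueError:
            continue
        contacts.append(Contact(parts[0], m1["chain"], int(m1["resnum"]), m1["atom"],
                                parts[2], m2["chain"], int(m2["resnum"]), m2["atom"],
                                distance))
    return contacts


SAMPLE = """\
 ARG A107.NH2 ASN C73.ND2     2.933 A
 GLN B118.OE1 LYS D57.O       2.907 A
 ASN B143.OD1 ASP D68.OD1     2.845 A
 ASN B143.ND2 GLY B145.N      2.601 A
 ASN B143.ND2 ARG B146.N      2.551 A
"""

# Keep only contacts between different chains
inter_chain = [c for c in parse_contacts(SAMPLE) if c.chain1 != c.chain2]
for c in inter_chain:
    print(f"{c.res1} {c.chain1}{c.resnum1}.{c.atom1} - "
          f"{c.res2} {c.chain2}{c.resnum2}.{c.atom2}: {c.distance} A")
```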

## Extract complex bcl-2-bax

In [52]:
# Keep only chains A and C to obtain a single Bcl-2-Bax complex
!check_structure -i "fixed_main_amide.pdb" -o "fixed_complex.pdb" chains --select A,C

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

Structure fixed_main_amide.pdb loaded
 Title: 
 Experimental method: unknown
 Resolution (A): N.A.

 Num. models: 1
 Num. chains: 4 (A: Protein, B: Protein, C: Protein, D: Protein)
 Num. residues:  475
 Num. residues with ins. codes:  0
 Num. HETATM residues:  21
 Num. ligands or modified residues:  0
 Num. water mol.:  21
 Num. atoms:  3531

Running chains. Options: --select A,C
4 Chain(s) detected
 A: Protein
 B: Protein
 C: Protein
 D: Protein
Selecting chain(s) A,C
Final Num. models: 1
Final Num. chains: 2 (A: Protein, C: Protein)
Final Num. residues:  241
Final Num. residues with ins. codes:  0
Final Num. HETATM residues:  14
Final Num. ligands or modified residues:  0
Final Num. water mol.:  14
Final Num. atoms:  1769
Structure saved on fixed_complex.pdb
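Conceptually, `chains --select A,C` keeps only the atoms whose chain identifier is A or C. check_structure does this on a parsed structure; as a rough, text-level sketch, a PDB file can instead be filtered on column 22, which holds the chain ID in ATOM and HETATM records. Like the output above (14 waters kept), this sketch also keeps the HETATM records of the selected chains. The output file name in the usage comment is hypothetical.

```python
from typing import Iterable, List, Set

# Six-character record names that carry a chain ID in column 22
COORD_RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def select_chains(pdb_lines: Iterable[str], chains: Set[str]) -> List[str]:
    """Keep coordinate records whose chain ID (column 22) is in `chains`.

    Header and other non-coordinate records are dropped; a final END is added.
    """
    kept = []
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line[:6] in COORD_RECORDS and len(line) > 21 and line[21] in chains:
            kept.append(line)
    kept.append("END")
    return kept


# Usage on a real file (output name is hypothetical):
# with open("fixed_main_amide.pdb") as src:
#     selected = select_chains(src, {"A", "C"})
# with open("fixed_complex_sketch.pdb", "w") as dst:
#     dst.write("\n".join(selected) + "\n")
```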


In [63]:
view = nglview.show_structure_file("fixed_complex.pdb", default=False)
view.add_representation(repr_type   = 'cartoon', colorScheme = 'atomindex')
view.center()
view


## Extracting Protein, Ligand and Protein-Ligand Complex
***
**Building Blocks** used:
 - [extract_molecule](https://biobb-structure-utils.readthedocs.io/en/latest/utils.html#module-utils.extract_molecule): Class to extract a molecule from a 3D structure (a wrapper of the Structure Checking tool).

 - [extract_heteroatoms](https://biobb-structure-utils.readthedocs.io/en/latest/utils.html#module-utils.extract_heteroatoms): Class to extract hetero-atoms from a 3D structure using Biopython.

 - [cat_pdb](https://biobb-structure-utils.readthedocs.io/en/latest/utils.html#module-utils.cat_pdb): Class to concatenate two PDB structures into a single PDB file.

In [23]:
# Extracting Protein, Ligand and Protein-Ligand Complex to three different files
# Import modules
from biobb_structure_utils.utils.extract_heteroatoms import extract_heteroatoms
from biobb_structure_utils.utils.extract_molecule import extract_molecule
from biobb_structure_utils.utils.cat_pdb import cat_pdb

# Create inputs/outputs
inputFile   = "input.pdb"
ligandID    = "GEN"          # ligand residue name, as printed in the output below
protein     = "protein.pdb"
ligandFile  = "ligand.pdb"
complexFile = "complex.pdb"
print(">>> Files:", inputFile, ligandID, protein, ligandFile, complexFile)

>>> Files: input.pdb GEN protein.pdb ligand.pdb complex.pdb


In [24]:
# Extract molecule
print ("\n>>> Extracting molecule...")
prop = {
     'remove_tmp' : False
}

extract_molecule (input_structure_path = "input.pdb",
                  output_molecule_path = "protein.pdb",
                  properties = prop);


>>> Extracting molecule...
2022-05-02 22:48:19,757 [MainThread  ] [INFO ]  ExtractMolecule: Unexisting input file, exiting


SystemExit: ExtractMolecule: Unexisting input file

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
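The building blocks stop with `SystemExit` when an input file is missing, which aborts the whole cell. One defensive pattern, sketched here with a hypothetical helper `run_if_inputs_exist`, is to check the inputs first and catch `SystemExit`, so that a notebook run reports the failed step and carries on.

```python
import sys
from pathlib import Path
from typing import Callable, Iterable


def run_if_inputs_exist(step: Callable[..., object], inputs: Iterable[str], **kwargs) -> bool:
    """Run `step(**kwargs)` only if every path in `inputs` is an existing file.

    Returns True when the step finished, False when it was skipped or raised
    SystemExit (as the biobb building blocks above do on a missing input).
    """
    name = getattr(step, "__name__", repr(step))
    missing = [p for p in inputs if not Path(p).is_file()]
    if missing:
        print(f"Skipping {name}: missing {missing}")
        return False
    try:
        step(**kwargs)
    except SystemExit as exc:
        print(f"{name} exited: {exc}", file=sys.stderr)
        return False
    return True


# Usage with the cell above (extract_molecule imported from biobb_structure_utils):
# run_if_inputs_exist(extract_molecule, ["input.pdb"],
#                     input_structure_path="input.pdb",
#                     output_molecule_path="protein.pdb",
#                     properties={"remove_tmp": False})
```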


In [25]:
# Extract ligand 
print ("\n>>> Extracting ligand...")
prop = {
     'heteroatoms' : [{"name": ligandID}]
}

extract_heteroatoms (input_structure_path   = "input.pdb",
                     output_heteroatom_path = "ligand.pdb", 
                     properties             = prop);


>>> Extracting ligand...
2022-05-02 22:48:36,071 [MainThread  ] [INFO ]  ExtractHeteroAtoms: Unexisting input file, exiting


SystemExit: ExtractHeteroAtoms: Unexisting input file
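`extract_heteroatoms` selects hetero residues by name, here `GEN`. A stdlib approximation (not the building block's implementation) filters HETATM records on the residue-name field, columns 18 to 20. This only works if the input actually contains a `GEN` residue: the 2XA0 summaries printed in this notebook report `Num. ligands or modified residues: 0`, so the genistein coordinates would have to come from a separately prepared file. File names in the usage comment are hypothetical.

```python
from typing import Iterable, List


def extract_residues(pdb_lines: Iterable[str], resname: str) -> List[str]:
    """Return HETATM records whose residue name (columns 18-20) equals `resname`."""
    wanted = resname.strip().upper()
    return [line.rstrip("\n") for line in pdb_lines
            if line.startswith("HETATM") and line[17:20].strip() == wanted]


# Usage (hypothetical output name): write the GEN residue to its own file
# with open("input.pdb") as src:
#     ligand_lines = extract_residues(src, "GEN")
# with open("ligand_sketch.pdb", "w") as dst:
#     dst.write("\n".join(ligand_lines + ["END"]) + "\n")
```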

In [26]:
# Create complex
print ("\n>>> Creating complex...")
cat_pdb (input_structure1      = "protein.pdb", 
         input_structure2      = "ligand.pdb",
         output_structure_path = "complex.pdb");


>>> Creating complex...
2022-05-02 22:48:36,734 [MainThread  ] [INFO ]  CatPDB: Unexisting input file, exiting


SystemExit: CatPDB: Unexisting input file
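`cat_pdb` merges the protein and ligand files into `complex.pdb`. A minimal text-level sketch of the same idea keeps the ATOM, HETATM and TER records from both inputs and writes a single END. It does not renumber atom serials or check for clashing chain IDs, which a real tool may handle differently; the output name in the usage comment is hypothetical.

```python
from typing import Iterable, List

# Records carried over from each input; everything else (HEADER, END, ...) is dropped
KEEP_RECORDS = ("ATOM  ", "HETATM", "TER")


def concat_pdb(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Concatenate the coordinate records of two PDB files and end with one END."""
    merged = [line.rstrip("\n") for line in first if line.startswith(KEEP_RECORDS)]
    merged += [line.rstrip("\n") for line in second if line.startswith(KEEP_RECORDS)]
    merged.append("END")
    return merged


# Usage (hypothetical output name):
# with open("protein.pdb") as prot, open("ligand.pdb") as lig:
#     lines = concat_pdb(prot, lig)
# with open("complex_sketch.pdb", "w") as out:
#     out.write("\n".join(lines) + "\n")
```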

### Visualizing 3D structures 

In [27]:
# Show structures: protein, ligand and protein-ligand complex
# The three files are written by the extraction cells above; since those cells failed,
# nglview cannot read them and raises the ValueError shown below.
view1 = nglview.show_structure_file(protein)
view1._remote_call('setSize', target='Widget', args=['350px','400px'])
view2 = nglview.show_structure_file(ligandFile)
view2.add_representation(repr_type='ball+stick')
view2._remote_call('setSize', target='Widget', args=['350px','400px'])
view3 = nglview.show_structure_file(complexFile)
# ligandName was undefined; assumed to be the ligand residue name stored in ligandID
view3.add_representation(repr_type='licorice', radius='.5', selection=ligandID)
view3._remote_call('setSize', target='Widget', args=['350px','400px'])
ipywidgets.HBox([view1, view2, view3])

ValueError: you must provide file extension if using file-like object or text content

<img src='images/img01.png' style='float: center;width:50%'></img>

In [64]:
# Show all possible structure errors
!check_structure -i inputs/2xa0.cif checkall

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

Structure inputs/2xa0.cif loaded
 PDB id: 2XA0
 Title: Crystal structure of BCL-2 in complex with a BAX BH3 peptide
 Experimental method: X-RAY DIFFRACTION
 Keywords: APOPTOSIS
 Resolution (A): 2.70

 Num. models: 1
 Num. chains: 4 (A: Protein, B: Protein, C: Protein, D: Protein)
 Num. residues:  347
 Num. residues with ins. codes:  0
 Num. HETATM residues:  21
 Num. ligands or modified residues:  0
 Num. water mol.:  21
 Num. atoms:  2733

Running models.
1 Model(s) detected
Single model found
Running chains.
4 Chain(s) detected
 A: Protein
 B: Protein
 C: Protein
 D: Protein
Running inscodes.
No residues with insertion codes found
Running altloc.
No residues with alternative location labels detected
Running rem_hydrogen.
No residues with Hydrogen atoms found
Running add_hydrogen.
116 Residues requiring selection on adding H a
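Each check_structure run starts with a summary block of indented `key: value` lines. Parsing it, as sketched below with a hypothetical `parse_summary`, makes it easy to compare runs; for example, `inputs/2xa0.cif` reports 347 residues and 2733 atoms, while `fixed_main.pdb` reports 475 residues and 3531 atoms.

```python
from typing import Dict


def parse_summary(log_text: str) -> Dict[str, str]:
    """Collect the indented 'key: value' lines of a check_structure summary."""
    summary = {}
    for line in log_text.splitlines():
        if not line.startswith(" ") or ":" not in line:
            continue
        key, _, value = line.strip().partition(":")
        summary[key.strip()] = value.strip()
    return summary


# Header lines copied from the checkall run above
SAMPLE = """\
Structure inputs/2xa0.cif loaded
 PDB id: 2XA0
 Resolution (A): 2.70
 Num. residues:  347
 Num. atoms:  2733
Running models.
"""

info = parse_summary(SAMPLE)
print(info["PDB id"], int(info["Num. residues"]), int(info["Num. atoms"]))
```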

### Fix backbone

In [29]:
# Model missing side-chain atoms (runs `check_structure ... fixside --fix ALL`, see log below)
# Import modules
import os
from biobb_model.model.fix_side_chain import fix_side_chain

# Create inputs/outputs: protein.pdb -> protein_fixed.pdb
fixed_pdb = os.path.splitext(protein)[0] + '_fixed.pdb'

# Create and launch bb
fix_side_chain(input_pdb_path  = protein,
               output_pdb_path = fixed_pdb);

2022-05-02 22:48:43,310 [MainThread  ] [INFO ]  check_structure -i protein.pdb -o protein_fixed.pdb --force_save fixside --fix ALL

2022-05-02 22:48:43,311 [MainThread  ] [INFO ]  Exit code 1

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =


2022-05-02 22:48:43,312 [MainThread  ] [INFO ]  ERROR: fetching/parsing structure from protein.pdb

2022-05-02 22:48:43,313 [MainThread  ] [INFO ]  Removed: []


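<a id="Quilarity"></a>
### Chirality
As recorded, the cell below runs the `amide` check on the original `inputs/2xa0.cif` (writing `fix_complex_chiral.pdb`); side-chain chirality is checked by check_structure's separate `chiral` command.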

In [30]:
!check_structure -i inputs/2xa0.cif -o "fix_complex_chiral.pdb" amide --fix All

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

Structure inputs/2xa0.cif loaded
 PDB id: 2XA0
 Title: Crystal structure of BCL-2 in complex with a BAX BH3 peptide
 Experimental method: X-RAY DIFFRACTION
 Keywords: APOPTOSIS
 Resolution (A): 2.70

 Num. models: 1
 Num. chains: 4 (A: Protein, B: Protein, C: Protein, D: Protein)
 Num. residues:  347
 Num. residues with ins. codes:  0
 Num. HETATM residues:  21
 Num. ligands or modified residues:  0
 Num. water mol.:  21
 Num. atoms:  2733

Running amide. Options: --fix All
5 unusual contact(s) involving amide atoms found
 ARG A107.NH2 ASN C73.ND2     2.933 A
 GLN B118.OE1 LYS D57.O       2.907 A
 ASN B143.OD1 ASP D68.OD1     2.845 A
 ASN B143.ND2 GLY B145.N      2.601 A
 ASN B143.ND2 ARG B146.N      2.551 A
Amide residues fixed All (3)
Rechecking
2 unusual contact(s) involving amide atoms found
 GLN B118.NE2 LEU D59.N       2.