<h1>MD Setup<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span><ul class="toc-item"><li><span><a href="#Input-parameters" data-toc-modified-id="Input-parameters-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Input parameters</a></span></li><li><span><a href="#Libraries-and-Functions" data-toc-modified-id="Libraries-and-Functions-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Libraries and Functions</a></span></li><li><span><a href="#Visualizing-initial-complex-structure" data-toc-modified-id="Visualizing-initial-complex-structure-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Visualizing initial complex structure</a></span></li></ul></li><li><span><a href="#Fix-protein-structure" data-toc-modified-id="Fix-protein-structure-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Fix protein structure</a></span><ul class="toc-item"><li><span><a href="#Perform-all-checks" data-toc-modified-id="Perform-all-checks-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Perform all checks</a></span></li></ul></li><li><span><a href="#Extract-complex-bcl-2-bax" data-toc-modified-id="Extract-complex-bcl-2-bax-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Extract complex bcl-2-bax</a></span></li><li><span><a href="#Create-protein-system-topology" data-toc-modified-id="Create-protein-system-topology-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Create protein system topology</a></span></li></ul></div>

## Introduction
This notebook describes the process of **preparing protein and ligand molecules**. The protein is the bcl-2-bax complex (PDB code 2xa0), and the ligand is the small molecule Genistein (PubChem CID 5280961).

### Input parameters
 - **input.cif**: CIF file of the bcl-2-bax complex (PDB code 2xa0).
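
If `input.cif` is not already on disk, it could be fetched from the RCSB PDB. The sketch below is not part of the original workflow: it assumes the public `files.rcsb.org/download/<ID>.cif` endpoint and network access, and the helper names `rcsb_cif_url` and `fetch_cif` are hypothetical.

```python
from urllib.request import urlretrieve


def rcsb_cif_url(pdb_code: str) -> str:
    """Build the RCSB download URL for an mmCIF entry (assumed endpoint)."""
    return f"https://files.rcsb.org/download/{pdb_code.upper()}.cif"


def fetch_cif(pdb_code: str, out_path: str = "input.cif") -> str:
    """Download the mmCIF file to out_path; requires network access."""
    urlretrieve(rcsb_cif_url(pdb_code), out_path)
    return out_path


# Example (uncomment to download the structure used in this notebook):
# fetch_cif("2xa0")
```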

### Libraries and Functions

In [1]:
import nglview               # For visualizing 3D structures
import ipywidgets            # For organizing 3D structures in panels
from IPython.display import Image       # For showing images



### Visualizing initial complex structure

In [16]:
view = nglview.show_structure_file("input.cif", default=False)
view.add_representation(repr_type   = 'cartoon', 
                        selection   = 'not het',
                        colorScheme = 'atomindex')
view.center()
view

NGLWidget()

<a id="fix"></a>
***
## Fix protein structure
**Checking** and **fixing** (if needed) the protein structure:<br>
- **Modeling** **missing side-chain atoms**, modifying incorrect **amide assignments**, choosing **alternative locations**.<br>
- **Checking** for missing **backbone atoms**, **heteroatoms**, **modified residues** and possible **atomic clashes**.
***
**Tools** used:
 - [check_structure](https://biobb-structure-checking.readthedocs.io/en/latest/command_line_usage.html) (Structure Checking from MDWeb)
check_structure provides [MDWeb structure checking](http://mmb.irbbarcelona.org/MDWeb2/help.php?id=checking) as a command-line utility. It is intended to prepare a structure for molecular dynamics simulation. It includes structure manipulation options such as selecting models or chains, removing components of the system, and completing side chains and backbone, as well as quality checks such as residue chirality, amide orientation, and vdW clashes.

- [Modeller](https://salilab.org/modeller/): fills in missing fragments using comparative modelling.
***

### Perform all checks

In [17]:
# Check all 
!check_structure -i "input.cif" checkall

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

Structure input.cif loaded
 PDB id: 2XA0
 Title: Crystal structure of BCL-2 in complex with a BAX BH3 peptide
 Experimental method: X-RAY DIFFRACTION
 Keywords: APOPTOSIS
 Resolution (A): 2.70

 Num. models: 1
 Num. chains: 4 (A: Protein, B: Protein, C: Protein, D: Protein)
 Num. residues:  347
 Num. residues with ins. codes:  0
 Num. HETATM residues:  21
 Num. ligands or modified residues:  0
 Num. water mol.:  21
 Num. atoms:  2733

Running models.
1 Model(s) detected
Single model found
Running chains.
4 Chain(s) detected
 A: Protein
 B: Protein
 C: Protein
 D: Protein
Running inscodes.
No residues with insertion codes found
Running altloc.
No residues with alternative location labels detected
Running rem_hydrogen.
No residues with Hydrogen atoms found
Running add_hydrogen.
116 Residues requiring selection on adding H atoms
 

In [19]:
# Fix backbone: add missing backbone atoms and rebuild chain breaks (uses Modeller)
!check_structure -i "input.cif" -o "fixed_main.pdb" backbone --fix_atoms All --fix_chain All --add_caps None

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

Structure input.cif loaded
 PDB id: 2XA0
 Title: Crystal structure of BCL-2 in complex with a BAX BH3 peptide
 Experimental method: X-RAY DIFFRACTION
 Keywords: APOPTOSIS
 Resolution (A): 2.70

 Num. models: 1
 Num. chains: 4 (A: Protein, B: Protein, C: Protein, D: Protein)
 Num. residues:  347
 Num. residues with ins. codes:  0
 Num. HETATM residues:  21
 Num. ligands or modified residues:  0
 Num. water mol.:  21
 Num. atoms:  2733

Running backbone. Options: --fix_atoms All --fix_chain All --add_caps None
2 Residues with missing backbone atoms found
 ASP A31    OXT
 ASP B31    OXT
2 Backbone breaks found
 ASP A31    - VAL A92    
 ASP B31    - VAL B92    
No unexpected backbone links
Main chain fixes

                         MODELLER 10.2, 2021/11/15, r12267

     PROTEIN STRUCTURE MODELLING BY SATISFACTION OF SPATIAL RESTR
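
The backbone breaks reported above (ASP 31 to VAL 92 in chains A and B) can be understood as gaps in the residue numbering. The following is a minimal sketch that works on numbering alone, assuming contiguous numbering within a chain; it is not how check_structure itself detects breaks, and `find_breaks` is a hypothetical helper.

```python
def find_breaks(residue_numbers):
    """Return (before, after, n_missing) for each gap in sorted residue numbers."""
    breaks = []
    for prev, curr in zip(residue_numbers, residue_numbers[1:]):
        if curr - prev > 1:
            breaks.append((prev, curr, curr - prev - 1))
    return breaks


# Toy numbering around the break reported for chain A (ASP A31 - VAL A92)
chain_a = [29, 30, 31, 92, 93, 94]
breaks = find_breaks(chain_a)
print(breaks)  # [(31, 92, 60)]
```

Under that assumption each gap spans 60 residues, or 120 over chains A and B, which matches the increase from 347 residues in `input.cif` to the 467 residues reported when `fixed_main.pdb` is loaded.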

In [24]:
view = nglview.show_structure_file("fixed_main.pdb", default=False)
view.add_representation(repr_type = 'cartoon',colorScheme = 'atomindex')
view.center()
view

NGLWidget()

In [21]:
# Fix amide
!check_structure -i "fixed_main.pdb" -o "fixed_main_amide.pdb" amide --fix all

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

Structure fixed_main.pdb loaded
 Title: 
 Experimental method: unknown
 Resolution (A): N.A.

 Num. models: 1
 Num. chains: 4 (A: Protein, B: Protein, C: Protein, D: Protein)
 Num. residues:  467
 Num. residues with ins. codes:  0
 Num. HETATM residues:  21
 Num. ligands or modified residues:  0
 Num. water mol.:  21
 Num. atoms:  3515

Running amide. Options: --fix all
5 unusual contact(s) involving amide atoms found
 ARG A107.NH2 ASN C73.ND2     2.933 A
 GLN B118.OE1 LYS D57.O       2.907 A
 ASN B143.OD1 ASP D68.OD1     2.845 A
 ASN B143.ND2 GLY B145.N      2.601 A
 ASN B143.ND2 ARG B146.N      2.551 A
Amide residues fixed all (3)
Rechecking
2 unusual contact(s) involving amide atoms found
 GLN B118.NE2 LEU D59.N       2.605 A
 ASN B143.OD1 ASP D71.OD2     3.062 A
Final Num. models: 1
Final Num. chains: 4 (A: Protein, B: Prot
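
Because the recheck still lists two contacts, it can be convenient to pull them out of the log for inspection. The snippet below is a sketch that assumes the contact-line layout shown in the output above (`RES chainNUM.ATOM RES chainNUM.ATOM distance A`); the regular expression and the `parse_contacts` helper are illustrative, not part of check_structure.

```python
import re

# One contact line, e.g. " GLN B118.NE2 LEU D59.N       2.605 A"
CONTACT = re.compile(
    r"(\w{3}) (\w)(-?\d+)\.(\w+)\s+(\w{3}) (\w)(-?\d+)\.(\w+)\s+([\d.]+) A"
)


def parse_contacts(text):
    """Extract (atom1, atom2, distance_in_angstrom) tuples from contact lines."""
    contacts = []
    for m in CONTACT.finditer(text):
        r1, c1, n1, a1, r2, c2, n2, a2, dist = m.groups()
        contacts.append((f"{r1} {c1}{n1}.{a1}", f"{r2} {c2}{n2}.{a2}", float(dist)))
    return contacts


# Lines copied from the "Rechecking" part of the output above
log = """
 GLN B118.NE2 LEU D59.N       2.605 A
 ASN B143.OD1 ASP D71.OD2     3.062 A
"""
remaining = parse_contacts(log)
for atom1, atom2, dist in remaining:
    print(atom1, atom2, dist)
```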

## Extract complex bcl-2-bax

In [22]:
!check_structure -i "fixed_main_amide.pdb" -o "fixed.pdb" chains --select A,C

=                   BioBB structure checking utility v3.8.1                   =
=                 A. Hospital, P. Andrio, J.L. Gelpi 2018-21                  =

Structure fixed_main_amide.pdb loaded
 Title: 
 Experimental method: unknown
 Resolution (A): N.A.

 Num. models: 1
 Num. chains: 4 (A: Protein, B: Protein, C: Protein, D: Protein)
 Num. residues:  467
 Num. residues with ins. codes:  0
 Num. HETATM residues:  21
 Num. ligands or modified residues:  0
 Num. water mol.:  21
 Num. atoms:  3515

Running chains. Options: --select A,C
4 Chain(s) detected
 A: Protein
 B: Protein
 C: Protein
 D: Protein
Selecting chain(s) A,C
Final Num. models: 1
Final Num. chains: 2 (A: Protein, C: Protein)
Final Num. residues:  237
Final Num. residues with ins. codes:  0
Final Num. HETATM residues:  14
Final Num. ligands or modified residues:  0
Final Num. water mol.:  14
Final Num. atoms:  1761
Structure saved on fixed.pdb
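
The three fixing steps (backbone, amide, chain selection) form a chain in which each output file is the next input. A minimal sketch of scripting that chain from Python follows; it only reuses the `check_structure` options already shown in the cells above, and the `check_structure_cmd` helper is hypothetical. The commands are printed rather than executed.

```python
import shlex


def check_structure_cmd(in_path, out_path, command, *options):
    """Assemble a check_structure call as an argument list."""
    return ["check_structure", "-i", in_path, "-o", out_path, command, *options]


steps = [
    check_structure_cmd("input.cif", "fixed_main.pdb", "backbone",
                        "--fix_atoms", "All", "--fix_chain", "All",
                        "--add_caps", "None"),
    check_structure_cmd("fixed_main.pdb", "fixed_main_amide.pdb",
                        "amide", "--fix", "all"),
    check_structure_cmd("fixed_main_amide.pdb", "fixed.pdb",
                        "chains", "--select", "A,C"),
]

for cmd in steps:
    print(shlex.join(cmd))
    # To actually run a step: subprocess.run(cmd, check=True)
```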


## Create protein system topology
**Building GROMACS topology** corresponding to the protein structure.<br>
- Force field used is [**amber99sb-ildn**](https://dx.doi.org/10.1002%2Fprot.22711): 
    * AMBER **parm99** force field with **corrections on backbone** (sb) and **side-chain torsion potentials** (ildn).<br>
    
- The water model used is [**spc/e**](https://pubs.acs.org/doi/abs/10.1021/j100308a038).<br>
- Adding **hydrogen atoms** if missing. Automatically identifying **disulfide bridges**. <br>

Generating two output files: 
- **GROMACS structure** (gro file)
- **GROMACS topology** ZIP compressed file containing:
    - *GROMACS topology top file* (top file)
    - *GROMACS position restraint file/s* (itp file/s)
    
***
**Building Blocks** used:
 - [Pdb2gmx](https://biobb-md.readthedocs.io/en/latest/gromacs.html#module-gromacs.pdb2gmx) from **biobb_md.gromacs.pdb2gmx**

The GROMACS pdb2gmx module reads a .pdb (or .gro) file, reads some database files, adds hydrogens to the molecules, and generates coordinates in GROMACS (GROMOS) format, or optionally .pdb format, together with a topology in GROMACS format. These files can subsequently be processed to generate a run input file.

***
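
To make the mapping from the building block's properties to the underlying GROMACS call explicit, the sketch below rebuilds the command line that appears in the building block's log further down. It is a reconstruction from that log, not the biobb_md implementation; the `pdb2gmx_command` helper and its default file names (`p2g.top`, `posre.itp`, taken from the log) are assumptions.

```python
def pdb2gmx_command(input_pdb, output_gro, properties,
                    top_name="p2g.top", posre_name="posre.itp"):
    """Assemble the gmx pdb2gmx argument list from force field and water settings."""
    return [
        "gmx", "-nobackup", "-nocopyright", "pdb2gmx",
        "-f", input_pdb,                    # input structure
        "-o", output_gro,                   # output coordinates
        "-p", top_name,                     # output topology
        "-water", properties["water_type"],
        "-ff", properties["force_field"],
        "-i", posre_name,                   # position restraint file
    ]


prop = {"force_field": "amber99sb-ildn", "water_type": "spce"}
cmd = pdb2gmx_command("fixed.pdb", "protein_pdb2gmx.gro", prop)
print(" ".join(cmd))
```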

In [23]:
# Create Protein system topology
# Import module
from biobb_md.gromacs.pdb2gmx import pdb2gmx

# Create inputs/outputs
prop = {
    'force_field' : 'amber99sb-ildn',
    'water_type': 'spce'
}

# Create and launch bb
pdb2gmx(input_pdb_path      = "fixed.pdb",
        output_gro_path     = "protein_pdb2gmx.gro",
        output_top_zip_path = "protein_pdb2gms_top.zip",
        properties          = prop)

2022-05-04 18:23:23,333 [MainThread  ] [INFO ]  GROMACS Pdb2gmx 20191 version detected
2022-05-04 18:23:23,334 [MainThread  ] [INFO ]  Not using any container
2022-05-04 18:23:24,040 [MainThread  ] [INFO ]  gmx -nobackup -nocopyright pdb2gmx -f fixed.pdb -o protein_pdb2gmx.gro -p p2g.top -water spce -ff amber99sb-ildn -i posre.itp

2022-05-04 18:23:24,041 [MainThread  ] [INFO ]  Exit code 0

2022-05-04 18:23:24,042 [MainThread  ] [INFO ]  
Using the Amber99sb-ildn force field in directory amber99sb-ildn.ff

going to rename amber99sb-ildn.ff/aminoacids.r2b
going to rename amber99sb-ildn.ff/dna.r2b
going to rename amber99sb-ildn.ff/rna.r2b
Reading fixed.pdb...
Read '', 1761 atoms
Analyzing pdb file
Splitting chemical chains based on TER records or chain id changing.
Moved all the water blocks to the end
There are 2 chains and 2 blocks of water and 237 residues with 1761 atoms

  chain  #res #atoms
  1 'A'   197   1536  
  2 'C'    26    211  
  3 ' '    10     10  (only water)
  4 ' '   

2022-05-04 18:23:24,044 [MainThread  ] [INFO ]  Compressing topology to: protein_pdb2gms_top.zip
2022-05-04 18:23:24,045 [MainThread  ] [INFO ]  Ignored file amber99sb-ildn.ff/forcefield.itp
2022-05-04 18:23:24,070 [MainThread  ] [INFO ]  Ignored file amber99sb-ildn.ff/spce.itp
2022-05-04 18:23:24,071 [MainThread  ] [INFO ]  Ignored file amber99sb-ildn.ff/ions.itp
2022-05-04 18:23:24,076 [MainThread  ] [INFO ]  Adding:
2022-05-04 18:23:24,077 [MainThread  ] [INFO ]  ['p2g.top', 'p2g_Protein_chain_A.itp', 'p2g_Protein_chain_C.itp', 'posre_Protein_chain_A.itp', 'posre_Protein_chain_C.itp']
2022-05-04 18:23:24,083 [MainThread  ] [INFO ]  to: /home/lg/repos/javerianaMD/simulation/PR-biobb/pdb_2xa0/protein_pdb2gms_top.zip
2022-05-04 18:23:24,084 [MainThread  ] [INFO ]  Removed: ['p2g.top']


0
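
A quick sanity check after this step is to confirm that the topology archive contains the files listed in the "Adding:" log line. The sketch below uses only the standard library; `missing_members` is a hypothetical helper, and the demo runs on a throwaway archive with empty placeholder files rather than on the real output.

```python
import os
import tempfile
import zipfile


def missing_members(zip_path, expected):
    """Return the expected file names that are absent from the ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    return [name for name in expected if name not in present]


# File names taken from the "Adding:" log line above
expected = [
    "p2g.top",
    "p2g_Protein_chain_A.itp",
    "p2g_Protein_chain_C.itp",
    "posre_Protein_chain_A.itp",
    "posre_Protein_chain_C.itp",
]

# Demo on a temporary archive that deliberately lacks the last file
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo_top.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        for name in expected[:-1]:
            archive.writestr(name, "")
    demo_missing = missing_members(demo_zip, expected)

print(demo_missing)  # ['posre_Protein_chain_C.itp']
```

On the real output, `missing_members("protein_pdb2gms_top.zip", expected)` would be expected to return an empty list if the archive matches the log.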