<a href="https://colab.research.google.com/github/mosdef-hub/CECAM-MoSDeF-Workshop/blob/main/biomolecule_workflow/Biomolecule-Workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MoSDeF Biomolecule Workflow

## Tutorial summary
This tutorial introduces molecular simulation practitioners to building a complex molecular system with the [MoSDeF](https://github.com/mosdef-hub) framework. After creating the initial system, we demonstrate how to identify missing forcefield parameters and parameterize the system using [Foyer](https://github.com/mosdef-hub/foyer) and [GMSO](https://github.com/mosdef-hub/gmso). Together, these steps show the strengths of the [MoSDeF](https://github.com/mosdef-hub) simulation framework for creating modular, complex, and custom molecular systems for computational chemistry simulations.

## Learning Objectives:
* How to build up a molecule of interest
* How to use visualization to identify specific atoms in a [mBuild.Compound](https://github.com/mosdef-hub/mbuild/blob/468028b5d0185c7325f91ee4fce7e50e73d1306d/mbuild/compound.py#L57)
* How to use GMSO to debug parameterization
* How to add missing parameters to forcefield XML files
---

## Set up environment on Google Colab
---

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install_miniforge()

In [None]:
import condacolab
condacolab.check()

!conda install mamba
!mamba install anaconda-client -n base

!git clone https://github.com/mosdef-hub/CECAM-MoSDeF-Workshop

!git clone https://github.com/kierannp/protein_builder.git
%cd protein_builder
!pip install .
%cd ..
!pip install --upgrade ipykernel

!mamba install -c conda-forge py3Dmol nglview mbuild hoomd unyt forcefield-utilities fresnel gsd openbabel
!mamba install gmso=0.11.2

In [None]:
# Import Libraries
import mbuild as mb
import gmso
import numpy as np

## Exercise 1: Building `mBuild.Compounds`

For this exercise, build two molecules (glycine and alanine), save each to a variable, and visualize them. HINT: The SMILES string for glycine is `C(C(=O)O)N` and the SMILES string for alanine is `CC(C(=O)O)N`. How do you create a molecule in `mBuild` from a SMILES string?

In [None]:
# Exercise 1
glycine = ???
alanine = ???

### <font color="red"><b>Exercise 1 Answer</b></font>

<details>
    <summary>Click once to show answer!</summary>
    
    class Glycine(mb.Compound):
        def __init__(self):
            super(Glycine, self).__init__()
            glycine = mb.load("C(C(=O)O)N",smiles=True)
            self.add(glycine)
            self.name = "Glycine"

    class Alanine(mb.Compound):
        def __init__(self):
            super(Alanine, self).__init__()
            alanine = mb.load('CC(C(=O)O)N',smiles=True)
            self.add(alanine)
            self.name = "Alanine"

    glycine = Glycine()
    alanine = Alanine()

    display(glycine.visualize())
    display(alanine.visualize())
</details>

## Exercise 2: Building molecular systems

For this exercise we want you to build a glycine-alanine protein, save it to a variable, and visualize it. We suggest that you use `NGLView` (see the code block below) to identify which atoms in glycine and alanine must be removed in order to make room for the peptide bond. After that, identify the indices of the atoms between which the peptide bond will form, joining the carboxyl group of one residue to the amino group of the other.

In [7]:
# These lines enable the NGLView widget in Colab
# from google.colab import output
# output.enable_custom_widget_manager()

glycine.visualize(backend='NGLView')


#### Protein Background
---

Amino acids have both amino and carboxylic acid functional groups and possess a generic structure as seen below:

![image info](https://github.com/mosdef-hub/CECAM-MoSDeF-Workshop/blob/main/images/Amino_acid_generic_structure.png?raw=1)

Proteins are polypeptide polymers held together by peptide bonds that form between amino acids. The two ends of a protein are called the N-terminus and the C-terminus, shown as the green and blue molecular regions in the image below, respectively.

<img src="https://github.com/mosdef-hub/CECAM-MoSDeF-Workshop/blob/main/images/Tetrapeptide_structural_formulae_v.1.png?raw=1" alt="C/N-terminus" width="600"/>
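
As a rough bookkeeping check on the peptide bond described above, the sketch below counts atoms by element before and after joining glycine and alanine. It assumes the usual condensation picture, in which the carboxyl group of one residue loses an OH and the amine of the other loses an H. The helper name `condense` is hypothetical, and the code uses only the Python standard library rather than mBuild.

```python
from collections import Counter

# Element counts for the neutral amino acids (from their molecular formulas).
glycine = Counter({"C": 2, "H": 5, "N": 1, "O": 2})  # C2H5NO2
alanine = Counter({"C": 3, "H": 7, "N": 1, "O": 2})  # C3H7NO2


def condense(acid_side, amine_side):
    """Hypothetical helper: atom counts after forming one peptide bond.

    Assumes the carboxyl of `acid_side` loses O and H (the OH group) and
    the amine of `amine_side` loses one H; together they leave as water.
    """
    removed_from_acid = Counter({"O": 1, "H": 1})
    removed_from_amine = Counter({"H": 1})
    return (acid_side - removed_from_acid) + (amine_side - removed_from_amine)


dipeptide = condense(glycine, alanine)
print(dict(dipeptide))
print("atoms before:", sum(glycine.values()) + sum(alanine.values()))
print("atoms after: ", sum(dipeptide.values()))
```

If the removals in Exercise 2 are chosen correctly, the joined compound should contain the same number of particles as the "atoms after" count, which can be compared against the `n_particles` attribute of the resulting `mb.Compound`.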

In [None]:
# Exercise 2
glycine.remove(glycine[???])
alanine.remove(alanine[???])
                       
N_port = mb.Port(
    anchor=current_acid.amine, 
    orientation=[0, 1, -1], 
    separation=0.1
)

mb.force_overlap(
    move_this=C_terminal,
    from_positions=C_terminal['head'],
    to_positions=N_terminal['tail']
)

### <font color="red"><b>Exercise 2 Answer</b></font>

<details>
    <summary>Click once to show answer!</summary>


</details>

## Exercise 3: Solvating our system

Now that we have our protein built we can solvate it to see how it behaves.

In [None]:
# control energy minimization
chain.energy_minimize(steps=50)
chain.visualize()

In [None]:
# Exercise 3
water = mb.load(???, smiles=True)
water.name="H2O"
packed_box = mb.fill_box([???, ???], n_compounds=[1,1000], box=[10,10,10])
print(packed_box.print_hierarchy(show_tree=False))
packed_box.visualize()

In [None]:
# save out and reload current state for future use
packed_box.save("solvated_protein.pdb", overwrite=True)
reloaded_pdb = mb.load("solvated_protein.pdb")  # other formats include xyz, gro, lammpsdata, sdf, mol2, hoomdxml, json

In [None]:
! head "solvated_protein.pdb"

### <font color="red"><b>Exercise 3 Answer</b></font>

<details>
    <summary>Click once to show answer!</summary>

        water = mb.load("O", smiles=True)
        water.name="H2O"
        packed_box = mb.fill_box([chain, water], n_compounds=[1,1000], box=[10,10,10])
        print(packed_box.print_hierarchy(show_tree=False))
        packed_box.visualize()
</details>
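
The `fill_box` call in this exercise places 1000 water molecules in a 10 x 10 x 10 box. Since mBuild expresses lengths in nanometers, a quick calculation shows how dilute that is compared with liquid water. The sketch below is plain Python arithmetic, not part of the mBuild API; the function names are hypothetical and the volume taken up by the peptide is ignored.

```python
AVOGADRO = 6.02214076e23     # particles per mole
WATER_MOLAR_MASS = 18.01528  # g/mol
NM3_TO_CM3 = 1e-21           # 1 nm^3 expressed in cm^3


def water_density(n_waters, box_lengths_nm):
    """Mass density (g/cm^3) of n_waters molecules in a rectangular box."""
    lx, ly, lz = box_lengths_nm
    volume_cm3 = lx * ly * lz * NM3_TO_CM3
    mass_g = n_waters * WATER_MOLAR_MASS / AVOGADRO
    return mass_g / volume_cm3


def waters_for_density(target_g_cm3, box_lengths_nm):
    """Number of water molecules needed to reach a target mass density."""
    lx, ly, lz = box_lengths_nm
    volume_cm3 = lx * ly * lz * NM3_TO_CM3
    return round(target_g_cm3 * volume_cm3 * AVOGADRO / WATER_MOLAR_MASS)


box = (10, 10, 10)  # nm, as in the exercise
print(f"{water_density(1000, box):.4f} g/cm^3")
print(waters_for_density(1.0, box), "waters for about 1 g/cm^3")
```

With these inputs the box comes out at roughly 0.03 g/cm^3, so the system in this exercise is closer to a vapor than to bulk liquid. That keeps packing fast for a tutorial; a production setup would typically use a smaller box or more water molecules.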

## Exercise 4: Forcefielding our system

For this exercise we want you to load the forcefield parameters for our water (SPC/E) and our small protein, then apply them to our system using `gmso`.

In [None]:
gaff_forcefield = gmso.ForceField("./CECAM-MoSDeF-Workshop/forcefields/gaff.xml")
spce_forcefield = gmso.ForceField("./CECAM-MoSDeF-Workshop/forcefields/spce.xml")

In [None]:
# Exercise 4
from gmso.parameterization import apply

forcefield_matchingDict = {"Protein":???, "H2O":???}
gmso_top = packed_box.to_gmso()
parameterized_top = apply(
    gmso_top, forcefield_matchingDict, identify_connections=True,
)

### <font color="red"><b>Exercise 4 Answer</b></font>

<details>
    <summary>Click once to show answer!</summary>

        from gmso.parameterization import apply

        forcefield_matchingDict = {"Protein":gaff_forcefield, "H2O":spce_forcefield}
        gmso_top = packed_box.to_gmso()
        parameterized_top = apply(
            gmso_top, forcefield_matchingDict, identify_connections=True,
        )
</details>

### What Happened
This error indicates that we have particles in our mBuild system that are missing parameters in our XML forcefield file. We will show how to correct this below.
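
To make the idea of "missing parameters" concrete, the sketch below compares the angle class triplets a topology needs against those a forcefield defines. It is a plain-Python illustration and not GMSO's actual matching logic; the function names are hypothetical. It treats a-b-c and c-b-a as the same angle, and uses the two triplets named in Exercise 5 as the missing ones.

```python
def canonical(triplet):
    """Treat a-b-c and c-b-a as the same angle by ordering the end atoms."""
    a, b, c = triplet
    return (a, b, c) if a <= c else (c, b, a)


def missing_angle_types(required, available):
    """Return required angle class triplets with no match in `available`."""
    known = {canonical(t) for t in available}
    return sorted({canonical(t) for t in required} - known)


# Illustrative data: the two missing triplets and their suggested stand-ins
# come from this tutorial; the remaining entry is made up for the example.
available_in_forcefield = [
    ("c1", "c3", "n3"),
    ("c3", "c2", "oh"),
    ("hc", "c3", "hc"),
]
required_by_topology = [
    ("n2", "c3", "c1"),  # same angle as c1~c3~n2, written in reverse
    ("c3", "c1", "oh"),
    ("hc", "c3", "hc"),
]

for triplet in missing_angle_types(required_by_topology, available_in_forcefield):
    print("missing angle type:", "~".join(triplet))
```

Every triplet reported this way needs an entry in the forcefield file before parameterization can succeed.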

## Exercise 5: Forcefield correction

See if you can figure out which entries to add to the forcefield file (gaff.xml) so that our system parameterizes properly. <br><br>
*Hint: We suggest using the existing (c1, c3, n3) parameters for the missing (c1, c3, n2) angle, and the existing (c3, c2, oh) parameters for the missing (c3, c1, oh) angle.*
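
The numeric values used for the (c1, c3, n3) substitution in this exercise look consistent with a unit and convention change from Amber-style angle parameters. The sketch below assumes that the Amber file stores the angle term as K*(theta - theta0)^2 in kcal/mol/rad^2 and degrees, and that the harmonic angle template uses 0.5*k*(theta - theta_eq)^2 in kJ/mol/rad^2 and radians. The input values 67.2 and 112.73 are inferred by back-converting the numbers in the exercise, not read from the source file, and the function name is hypothetical.

```python
import math

KCAL_TO_KJ = 4.184  # thermochemical calorie


def amber_angle_to_harmonic(k_amber_kcal, theta_deg):
    """Convert an Amber-style angle term K*(theta - theta0)**2 into
    (k, theta_eq) for the form 0.5*k*(theta - theta_eq)**2.

    Returns k in kJ/mol/rad**2 and theta_eq in radians.
    """
    k = 2.0 * k_amber_kcal * KCAL_TO_KJ  # factor 2 absorbs the missing 1/2
    theta_eq = math.radians(theta_deg)
    return k, theta_eq


# Assumed gaff.dat-style values (kcal/mol/rad**2, degrees), inferred from
# the (c1, c3, n3) numbers used in this exercise.
k, theta_eq = amber_angle_to_harmonic(67.2, 112.73)
print(k, theta_eq)
```

The same conversion can be applied to the second substituted angle once its entry has been located in the reference parameter file.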

In [None]:
# Exercise 5
import unyt as u
from gmso.lib.potential_templates import PotentialTemplateLibrary  

ff = gmso.ForceField("./CECAM-MoSDeF-Workshop/forcefields/gaff.xml")        
templates = PotentialTemplateLibrary()  
harmonic_angle = templates["HarmonicAnglePotential"]
expression = harmonic_angle.expression  
name = harmonic_angle.name
variables = harmonic_angle.independent_variables     
first_params = {  
    "k": 562.3296 * u.kJ/u.mol/u.rad**2,  # substituted from the (c1, c3, n3) entry in the gaff.dat file linked in ff.name below
    "theta_eq":1.9675096657732076*u.rad
}
second_params = ???
        
first_angle_type = gmso.AngleType(        
    name=name, parameters=first_params,       
    expression=expression,
    independent_variables=variables,    
    member_classes = ("c1", "c3", "n2")
)
second_angle_type = ???

ff.angle_types["c1~c3~n2"] = first_angle_type
ff.angle_types["c3~c1~oh"] = ??? 
ff.version = 1.1 # updated version since we made modifications to the forcefield
ff.name = "gaff with added parameters from https://github.com/choderalab/ambermini/blob/master/share/amber/dat/leap/parm/gaff.dat"
ff.to_xml("gaff_added.xml")

### <font color="red"><b>Exercise 5 Answer</b></font>


<details>
    <summary>Click once to show the answer!</summary>
      
      import unyt as u
      from gmso.lib.potential_templates import PotentialTemplateLibrary 
      ff = gmso.ForceField("./CECAM-MoSDeF-Workshop/forcefields/gaff.xml")        
      templates = PotentialTemplateLibrary()  
      harmonic_angle = templates["HarmonicAnglePotential"]
      expression = harmonic_angle.expression  
      name = harmonic_angle.name
      variables = harmonic_angle.independent_variables     
      first_params = {  
          "k": 562.3296 * u.kJ/u.mol/u.rad**2,
          "theta_eq":1.9675096657732076*u.rad
      }
      second_params = {  
          "k": 571.5344 * u.kJ/u.mol/u.rad**2,
          "theta_eq":2.0099211665966696*u.rad
      }
                
      first_angle_type = gmso.AngleType(        
          name=name, parameters=first_params,       
          expression=expression,
          independent_variables=variables,    
          member_classes = ("c1", "c3", "n2")
      )
      second_angle_type = gmso.AngleType(        
          name=name, parameters=second_params,       
          expression=expression,
          independent_variables=variables,    
          member_classes = ("c3", "c1", "oh")
      )

      ff.angle_types["c1~c3~n2"] = first_angle_type
      ff.angle_types["c3~c1~oh"] = second_angle_type          
      ff.version = 1.1
      ff.name = "gaff with added parameters from https://github.com/choderalab/ambermini/blob/master/share/amber/dat/leap/parm/gaff.dat"
      ff.to_xml("gaff_added.xml")

</details>

## Conclusion
---

This assumes you saved the new forcefield XML as `gaff_added.xml` in Exercise 5. If you weren't able to get the forcefield working, use `gaff_ANSWER.xml`, as the cell below does.

In [None]:
gaff_forcefield = gmso.ForceField("./CECAM-MoSDeF-Workshop/forcefields/gaff_ANSWER.xml")
gmso_top = packed_box.to_gmso()
forcefield_matchingDict = {"Protein":gaff_forcefield, "H2O":spce_forcefield}
parameterized_top = apply(
    gmso_top, forcefield_matchingDict, identify_connections=True,
)