<a href="https://colab.research.google.com/github/mosdef-hub/CECAM-MoSDeF-Workshop/blob/main/slitpore_workflow/Slitpore-Workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Carbon Slitpore Workflow**
---
<figure>
  <center>
  <img src="https://github.com/mosdef-hub/CECAM-MoSDeF-Workshop/blob/main/images/slitpore_sims.png?raw=1" alt="Water_in_slitpore" style="width:80%">
  <figcaption>Water filling a slitpore from <a href="https://cassandra-mc.readthedocs.io/en/latest/theory/theory.html#grand-canonical-monte-carlo">GCMC Cassandra</a> simulations.</figcaption>
  </center>
</figure>


## Overview
"Porous carbon materials are used for separation, purification, and catalysis purposes. While the adsorption and phase behavior of nonpolar fluids in carbon pores has been studied extensively, our understanding regarding adsorption of water in carbonaceous materials is still rudimentary. Nevertheless, the structure and the thermodynamic properties of water confined in hydrophobic regions are of importance in many scientific disciplines such as chemistry, geology, nanotechnology, and biology. Water adsorption in hydrophobic materials is typically characterized by negligible adsorption at low relative pressures, sudden and complete pore filling by a capillary-condensation mechanism, and large adsorption/desorption hysteresis loops."

    - Striolo, A.; Chialvo, A. A.; Cummings, P. T.; Gubbins, K. E. Water Adsorption in Carbon-Slit Nanopores. Langmuir, 2003, 19 (20), 8583–8591.

The study above was recreated in 2020 by Cummings et al. using open-source molecular modeling software, with a focus on the Molecular Simulation Design Framework (MoSDeF).

    - Peter Cummings, Clare McCabe, Christopher Iacovella, et al. Open-Source Molecular Modeling Software in Chemical Engineering Focusing on the Molecular Simulation Design Framework. Authorea. November 30, 2020.


## Learning Objectives
This notebook provides interactive examples that will help learners use MoSDeF tools to:
1. Create molecules using different methods in `mBuild`
2. Load a force field from XML and inspect the ForceField object with `GMSO`
3. Parameterize a system with a force field and inspect the parameterized object
4. Save out the topology and use it to run a Cassandra MC simulation using `mosdef_cassandra`

## Tutorial Contents
0. Set up environment on Google Colab
1. Construct System with mBuild
    1. Exercise 1a - Create a molecule with mbuild
    2. Exercise 1b - Pack a box of solvent
2. Load a ForceField
    1. Exercise 2 - Load and inspect a force field from XML
3. Parameterization
    1. Exercise 3 - Parameterize a compound/topology and summarize the parameterized object
4. Save out to Cassandra files
    1. Exercise 4 - Save a `.mcf` file from a typed Topology
5. Set up Cassandra input file and run simulation (optional)

## Software stack setup
After running the first cell below, the kernel will restart. This is necessary for the conda dependencies, but you'll need to wait for the restart to finish before running the second cell.


## Working with Google Colab
A few aspects of these Colab notebooks can be a little tricky:

1. If the output is very long, for example from the mamba command in the second cell, scrolling past the output can feel onerous. In this case, scrolling up and down in the narrow grey area between the sidebar menu and the cells can help you navigate.

2. If the output is a visualization of a molecule or simulation configuration, scrolling up or down will zoom in or out if the cursor is over the visualization. In these cases, take some care to scroll outside of the visualization.

3. To run a cell, either click the run button (right facing triangle) or hit `shift + enter`

## __0. Set up environment on Google Colab__
----

In [None]:
# Note: Run this cell first and by itself.
# The kernel will restart after this step.
# An error may pop up stating that the session crashed
# for an unknown reason; this is expected.
!pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
import condacolab
condacolab.check()

!conda install mamba

!mamba install anaconda-client -n base
!git clone https://github.com/mosdef-hub/CECAM-MoSDeF-Workshop
!mamba env update -n base -f CECAM-MoSDeF-Workshop/environment.yml
!pip install --upgrade ipykernel

%cd CECAM-MoSDeF-Workshop/slitpore_workflow

## __0. Import packages__
---

In [None]:
import warnings
warnings.filterwarnings("ignore")

# Import Libraries
import os
import mbuild as mb
import gmso
from porebuilder import GraphenePore

## __1. Construct System with mBuild__
----
- The chemical system can be constructed with mBuild, the hierarchical molecular constructor of the MoSDeF software suite. The single most important data structure in mBuild is `Compound`:
    - A `Compound` can act as a particle at the lowest level, or as a container holding other `Compound`s (e.g., a residue, a molecule, etc.).
    - This setup allows smaller `Compound`s (e.g., molecules) to be constructed individually and then combined into one bigger system, i.e., by adding them to a new `Compound` container.

- The library offers several ways to load or create molecules/systems, e.g., loading from common file formats such as .xyz, .mol2, and .pdb, from a SMILES string, using internal recipes, or using user-constructed recipes.
- Below, we demonstrate two methods of creating a molecule: using a SMILES string to create a water molecule, and using a user-defined recipe to build a carbon slitpore.
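
The particle/container idea behind `Compound` can be illustrated without mBuild. The sketch below uses a hypothetical `ToyCompound` class (not part of mBuild). A node with no children is a particle, and any other node is a container. Counting particles recurses through every level, similar in spirit to mBuild's `particles()` and `n_particles`.

```python
# Hypothetical ToyCompound, NOT the mBuild API: a leaf node is a particle,
# a node with children is a container (residue, molecule, system, ...).
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCompound:
    name: str
    children: List["ToyCompound"] = field(default_factory=list)

    def add(self, child: "ToyCompound") -> None:
        """Nest another compound inside this container."""
        self.children.append(child)

    def particles(self):
        """Yield leaf particles, recursing through every container level."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.particles()

    @property
    def n_particles(self) -> int:
        return sum(1 for _ in self.particles())


def make_water() -> ToyCompound:
    """Build one molecule individually: a container of three particles."""
    water = ToyCompound("H2O")
    for atom in ("O", "H", "H"):
        water.add(ToyCompound(atom))
    return water


# Combine individually built molecules into one bigger container.
system = ToyCompound("system")
for _ in range(3):
    system.add(make_water())

print(system.n_particles)  # 9 (3 waters x 3 atoms)
```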

In [None]:
# load molecules from their daylight SMILES string
# https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
water = mb.load("O", smiles=True)

"""Visualization utilities"""
print(water.print_hierarchy(show_tree=False)) # outside Colab, water.print_hierarchy() alone prints the tree

In [None]:
water.visualize() # visualize molecule atoms and bonds

### Exercise 1a - Create and visualize a system with mBuild
1. Create and visualize a molecule of your choice with mBuild using a SMILES string
    - Tip: searching for the molecule name + "SMILES" usually returns the input you need
    - Note: you will need to set the option `smiles=True` in `mb.load()`
    - Tip: look at how we created the water molecule two cells above
1. mBuild also supports loading a molecule/system from various file formats
    - Download a PDB file from https://files.rcsb.org/view/1OIL.pdb using `wget`
    - Load the file and visualize it with `mbuild`

In [None]:
# Exercise 1a.1
compound = mb.???(???, smiles=???)
compound.???()

In [None]:
# Exercise 1a.2
!wget -O 1OIL.pdb https://files.rcsb.org/view/1OIL.pdb
protein = mb.???("1OIL.pdb") # can also load .mol2, .xyz, .hoomdxml, .gro, etc.
protein.???()

### <font color="red"><b>Exercise 1a Example Answer</b></font>

<details>
    <summary>Click once to show/hide the answer!</summary>
    
        # Loading from a SMILES string
         
        caffeine = mb.load("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", smiles=True)
        caffeine.visualize()


        # Loading from a pdb file
    
    
        !wget -O 1OIL.pdb https://files.rcsb.org/view/1OIL.pdb
        protein = mb.load("1OIL.pdb")
        protein.visualize()
        
</details>

- Creating a compound from a recipe gives us more control over the structure, such as bond lengths and angles. This level of detail is important for some simulation engines.
    - For example, most Monte Carlo simulation engines do not handle harmonic bonds, so it is important for the input structure to have the correct bond lengths.
- mBuild comes with a few core recipes located in `mbuild.lib`, but users can also subclass `mbuild.Compound` to build their own structures.
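
The following sketch shows why recipes help rigid-bond engines. It places a water molecule's atoms directly from an O-H bond length of 0.1 nm and an H-O-H angle of 109.47 degrees (the SPC geometry), then checks the result. It is plain Python for illustration, not mBuild's `WaterSPC` implementation.

```python
import math

# Assumed SPC-style rigid geometry: O-H = 0.1 nm, H-O-H = 109.47 degrees.
R_OH = 0.1           # nm
THETA_DEG = 109.47   # degrees


def rigid_water(r_oh=R_OH, theta_deg=THETA_DEG):
    """Return O, H1, H2 coordinates (nm) with O at the origin, in the xy plane."""
    theta = math.radians(theta_deg)
    o = (0.0, 0.0, 0.0)
    h1 = (r_oh, 0.0, 0.0)
    h2 = (r_oh * math.cos(theta), r_oh * math.sin(theta), 0.0)
    return o, h1, h2


def angle_deg(a, center, b):
    """Angle a-center-b in degrees."""
    u = [ai - ci for ai, ci in zip(a, center)]
    v = [bi - ci for bi, ci in zip(b, center)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.dist(a, center) * math.dist(b, center))
    return math.degrees(math.acos(cos_t))


o, h1, h2 = rigid_water()
bond1 = math.dist(o, h1)
bond2 = math.dist(o, h2)
angle = angle_deg(h1, o, h2)
print(round(bond1, 4), round(bond2, 4), round(angle, 2))  # 0.1 0.1 109.47
```

Because the coordinates are built from the target geometry rather than relaxed with a bonded potential, an engine that keeps bonds fixed receives exactly the intended bond lengths and angle.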

In [None]:
# Load structure from recipes - Water, delivered with mBuild
from mbuild.lib.molecules.water import WaterSPC
water = WaterSPC()
water_box = mb.fill_box(water, box=[5,5,5], n_compounds=100)

"""Visualization utilities"""
print(water_box.print_hierarchy(show_tree=False))  # outside Colab, water_box.print_hierarchy() alone prints the tree

In [None]:
water_box.visualize()

In [None]:
# Load structure from recipes - Graphene, custom built
graphene = GraphenePore(pore_length=4,
                        pore_depth=4,
                        n_sheets=1,
                        pore_width=2,
                        slit_pore_dim=1)
graphene.translate(-graphene.center)
# Try changing the n_sheets to form more layers
"""Visualization utility"""
graphene.visualize()

### Exercise 1b - Fill box and solvate
- mBuild uses PACKMOL as the backend for packing molecules into a box and solvating a solute. These functionalities are stored under `mbuild.packing`, with the two most frequently used methods being `packing.fill_box` and `packing.solvate`. Here, we will test out the `fill_box` method.
    - Create a packed box of ethanol following the procedure below:
        * Create an ethanol molecule using a SMILES string
        * Pack a box of ethanol using `packing.fill_box`; you will need to provide
            * `compound`: the molecule to be packed (expected type `mb.Compound`)
            * `n_compounds`: the number of molecules (expected type `int`)
            * `box`: the size of the box (defined as [x, y, z], all in nm)
        * Visualize the packed box
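
Choosing `n_compounds` for a liquid box is mostly arithmetic: n = ρ V N_A / M. The helper below is a hypothetical convenience function, not part of mBuild. The densities and molar masses are approximate room-temperature values used as assumptions.

```python
# Hypothetical helper (not mBuild): estimate molecules per box at a target density.
AVOGADRO = 6.02214076e23  # 1/mol
NM3_TO_CM3 = 1e-21        # 1 nm^3 = 1e-21 cm^3


def n_molecules(box_nm, density_g_cm3, molar_mass_g_mol):
    """Number of molecules in an orthorhombic box (edge lengths in nm)."""
    lx, ly, lz = box_nm
    volume_cm3 = lx * ly * lz * NM3_TO_CM3
    mass_g = density_g_cm3 * volume_cm3
    return int(mass_g / molar_mass_g_mol * AVOGADRO)


# Assumed values: ethanol 0.789 g/cm^3, 46.07 g/mol; water 0.997 g/cm^3, 18.015 g/mol
print(n_molecules([3, 3, 3], 0.789, 46.07))   # ethanol, about 278
print(n_molecules([5, 5, 5], 0.997, 18.015))  # water, about 4166
```

By this estimate, the 5 nm water box built earlier with 100 molecules is far below liquid density. That is fine for a demonstration, but worth keeping in mind when picking `n_compounds`.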

In [None]:
# Exercise 1b.1
ethanol = mb.load(???, smiles=???)
ethanol_box = mb.???(compound=???,
                     n_compounds=???,
                     box=???)
ethanol_box.visualize()

### <font color="red"><b>Exercise 1b Example Answer</b></font>

<details>
    <summary>Click once to show/hide the answer!</summary>
    
        ethanol = mb.load("CCO", smiles=True)
        ethanol_box = mb.fill_box(compound=ethanol,
                                  n_compounds=200,
                                  box=[3, 3, 3])
        ethanol_box.visualize()
</details>

## __2. Load A ForceField__
----

- In the MoSDeF ecosystem, force fields are stored in XML format, which contains information about the version, combining rule, atom types, connection types, and associated DOIs. Each atom type also includes a `def`, which stores the SMARTS definition, and a `doi`, which stores the original paper that the parameters are sourced from.
- Currently, two XML formats are supported by MoSDeF tools. One is an extended version of the OpenMM XML format. The other is newly developed to include additional information that we believe is beneficial for performing TRUE (Transparent, Reproducible, Usable by others, and Extensible) research.
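
The snippet below shows what such a file contains. It parses a hypothetical, heavily trimmed force field written in the Foyer/OpenMM-style convention, using only the standard library. The atom type, parameter values, and DOI are placeholders rather than the contents of `carbon.xml`, and this is not how GMSO loads force fields internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed Foyer/OpenMM-style XML; all values are placeholders.
TOY_XML = """
<ForceField name="toy" version="0.0.1" combining_rule="geometric">
  <AtomTypes>
    <Type name="C_graphene" class="C" element="C" mass="12.011"
          def="[C]" desc="graphene carbon" doi="10.0000/placeholder"/>
  </AtomTypes>
  <NonbondedForce coulomb14scale="0.5" lj14scale="0.5">
    <Atom type="C_graphene" charge="0.0" sigma="0.34" epsilon="0.23"/>
  </NonbondedForce>
</ForceField>
"""

root = ET.fromstring(TOY_XML.strip())
print("combining rule:", root.get("combining_rule"))

# Each atom type carries its SMARTS definition (def) and source reference (doi).
for atype in root.iter("Type"):
    print(atype.get("name"), "SMARTS:", atype.get("def"), "doi:", atype.get("doi"))

# Nonbonded parameters are stored per atom type.
for atom in root.iter("Atom"):
    print(atom.get("type"), "sigma =", atom.get("sigma"), "nm")
```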

In [None]:
carbon_forcefield = gmso.ForceField("../forcefields/carbon.xml")
carbon_forcefield

In [None]:
"""Basic attributes of each atom type"""
for name, atype in carbon_forcefield.atom_types.items():
    print(atype)
    print("SMARTS definition:", atype.definition)
    print("Potential expression")
    display(atype.expression)
    print(atype.parameters)

In [None]:
spce_forcefield = gmso.ForceField("../forcefields/spce.xml")
spce_forcefield

In [None]:
"""Basic attributes of each connection type"""
for name, btype in spce_forcefield.bond_types.items():
    print(btype)
    print("Potential expression")
    display(btype.expression)
    print(btype.parameters)

### Exercise 2 - Load a force field and inspect some of its attributes
1. Load the OPLS-AA force field at `"../forcefields/oplsaa.xml"` into an object named `oplsaa`
2. Inspect the force field
    - Try calling `oplsaa.__dict__` to see all the attributes that a force field has
    - What are the combining rule and scaling factors of this force field?
3. Inspect some attributes of an atom type
    - Inspect the potential expression
    - Look at other notable attributes, such as its SMARTS definition and parameters
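
The combining rule sets the Lennard-Jones parameters for unlike pairs, such as water oxygen interacting with pore carbon. The sketch below compares the Lorentz-Berthelot rule with the geometric rule used by OPLS-AA. The σ and ε values are illustrative assumptions (SPC/E-like oxygen, graphite-like carbon), not values read from the workshop XML files.

```python
import math

# Illustrative parameters (assumptions): sigma in nm, epsilon in kJ/mol.
SIGMA_O, EPS_O = 0.3166, 0.650   # SPC/E-like oxygen
SIGMA_C, EPS_C = 0.340, 0.2328   # graphite-like carbon


def lorentz_berthelot(s1, e1, s2, e2):
    """Arithmetic-mean sigma, geometric-mean epsilon."""
    return 0.5 * (s1 + s2), math.sqrt(e1 * e2)


def geometric(s1, e1, s2, e2):
    """Geometric-mean sigma and epsilon (the rule used by OPLS-AA)."""
    return math.sqrt(s1 * s2), math.sqrt(e1 * e2)


def lj(r, sigma, eps):
    """12-6 Lennard-Jones energy: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)


for rule in (lorentz_berthelot, geometric):
    s, e = rule(SIGMA_O, EPS_O, SIGMA_C, EPS_C)
    r_min = 2 ** (1 / 6) * s  # position of the LJ minimum, where U = -eps
    print(f"{rule.__name__}: sigma={s:.4f} nm, eps={e:.4f} kJ/mol, "
          f"U(r_min)={lj(r_min, s, e):.4f} kJ/mol")
```

Both rules give the same ε here. The geometric σ is always slightly smaller than the arithmetic one, so the choice of rule shifts the O-C contact distance.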

In [None]:
# Start your exercise here
oplsaa = gmso.???()

### <font color="red"><b>Exercise 2 Example Answer</b></font>

<details>
    <summary>Click once to show/hide the answer!</summary>
    
        oplsaa = gmso.ForceField("../forcefields/oplsaa.xml")
        print(oplsaa.__dict__)
</details>

## __3. Parameterization__
----
- MoSDeF's backend data structure supports automatic atom typing and parameterization (mapping the atom types and connection types stored in a loaded force field onto a GMSO structure).
- This is done internally using Foyer, which performs graph matching between the molecular bond graph (of the GMSO Topology object) and the atom types' SMARTS strings. The algorithm for this process is outlined in this [paper](https://www.journals.elsevier.com/computational-materials-science).
- The parameterization step creates a typed Topology, which is ready to be saved to various file formats and read by the corresponding simulation codes.
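
Foyer matches SMARTS patterns against the bond graph. The toy sketch below keeps only the core idea: each atom gets the first rule whose element and bonded-neighbor elements match. The rules and type names are hypothetical. Real SMARTS matching, with rings, aromaticity, and override rules, is considerably richer.

```python
# Toy rule-based atom typing on a bond graph (simplified stand-in, NOT Foyer).
from collections import Counter

# Ethanol: atom index -> element, plus an undirected bond list.
elements = {0: "C", 1: "C", 2: "O", 3: "H", 4: "H", 5: "H", 6: "H", 7: "H", 8: "H"}
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]

# Build the adjacency list of the bond graph.
neighbors = {i: [] for i in elements}
for a, b in bonds:
    neighbors[a].append(b)
    neighbors[b].append(a)

# (hypothetical type name, element, exact neighbor-element counts)
RULES = [
    ("O_alcohol", "O", {"C": 1, "H": 1}),
    ("H_hydroxyl", "H", {"O": 1}),
    ("C_alpha", "C", {"C": 1, "O": 1, "H": 2}),
    ("C_methyl", "C", {"C": 1, "H": 3}),
    ("H_alkyl", "H", {"C": 1}),
]


def assign_type(idx):
    """Return the first rule matching the atom's element and neighbor environment."""
    env = Counter(elements[n] for n in neighbors[idx])
    for name, element, required in RULES:
        if elements[idx] == element and env == Counter(required):
            return name
    raise ValueError(f"no atom type matches atom {idx}")


types = {i: assign_type(i) for i in elements}
print(types)
```

An atom that matches no rule raises an error. Parameterizing an exotic molecule with a given force field can fail in the same way when no atom type covers some chemical environment.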

In [None]:
from gmso.parameterization import apply

graphene_top = graphene.to_gmso()
single_water_top = water.to_gmso()
water_top = water_box.to_gmso()

graphene_ptop = apply(graphene_top, carbon_forcefield, identify_connections=True)
single_water_ptop = apply(single_water_top, spce_forcefield, identify_connections=True)
water_ptop = apply(water_top, spce_forcefield, identify_connections=True)

In [None]:
# Iterable attributes of the typed Topology
# graphene_ptop.sites
# graphene_ptop.bonds
# graphene_ptop.angles
# graphene_ptop.dihedrals
# graphene_ptop.impropers

display(graphene_ptop.sites[0].atom_type.expression)
print(f"{graphene_ptop.sites[0].atom_type.parameters}")

In [None]:
"""Utility to output system as Dataframe"""
single_water_ptop.to_dataframe(site_attrs=["atom_type.parameters"])

In [None]:
"""Utility to output system as Dataframe"""
graphene_ptop.to_dataframe(site_attrs=["atom_type.parameters"])

### Exercise 3 - Parameterize your molecule
1. Use the OPLS-AA force field to try to parameterize the molecule you created in Exercise 1a (this may or may not succeed, depending on how exotic your molecule is)
    - Start by converting your compound to a GMSO `Topology`
    - Use the `apply` method to perform the parameterization
    - Summarize all the atom types in a dataframe
2. Open the docstring for `Topology.to_dataframe`
    - See how you can modify the dataframe output to get the information you need
    

In [None]:
### Start your exercise here
topology = compound.to_gmso() # smiles string compound generated above
apply(???,
      ???,
      identify_connections=True)

topology.???(site_attrs=["atom_type.parameters"])

### <font color="red"><b>Exercise 3 Example Answer</b></font>

<details>
    <summary>Click once to show/hide the answer!</summary>
    
        # Parameterize the created compound with the OPLS-AA force field

        topology = compound.to_gmso()
        apply(topology,
              oplsaa,
              identify_connections=True)

        topology.to_dataframe(site_attrs=["atom_type.parameters"])

        # Print out the docstring of Topology.to_dataframe
        help(gmso.Topology.to_dataframe)  # Run this in a new cell

</details>

## __4. Save out to Cassandra files__
----
- The GMSO data structure provides direct support for multiple simulation engines, including GROMACS, LAMMPS, HOOMD-blue, GOMC, and Cassandra. This includes the ability to save a typed Topology directly to molecular input files that the corresponding engines can read.
- In this example, we are writing out the file into Cassandra file format (`.mcf` or molecular connectivity file).

In [None]:
# Saving out file and inspect the output
graphene_ptop.save("graphene.mcf", overwrite=True)
!cat graphene.mcf

### Exercise 4 - Save out the parameterized Water
Use similar syntax as above, save out the `.mcf` for the parameterized water (the `water_ptop` object created above) and print out the file (using `!cat`)
    

In [None]:
### Start your exercise here
water_ptop.???(???, overwrite=???)
!cat ???

### <font color="red"><b>Exercise 4 Example Answer</b></font>

<details>
    <summary>Click once to show/hide the answer!</summary>
    
        water_ptop.save("water.mcf", overwrite=True)
        !cat water.mcf

</details>

## __5. Set up Cassandra input file and run simulation (Optional)__
----
- In this step, we will use `mosdef_cassandra`, developed by Ryan DeFever et al. from the Maginn Group. The library provides utilities that interface the core MoSDeF software stack with Cassandra, as well as a Python interface for defining run-time parameters of the Monte Carlo simulation. Both `mosdef_cassandra` and `Cassandra` are installable through the `conda-forge` channel (for Linux and macOS on Intel architecture).
- Here, we will run a short equilibration simulation just to demonstrate that the files we wrote out are syntactically correct and ready to be used as input for a simulation engine. For the complete workflow, please refer to the original paper and its supplementary GitHub repository:
    - Peter Cummings, Clare McCabe, Christopher Iacovella, et al. Open-Source Molecular Modeling Software in Chemical Engineering Focusing on the Molecular Simulation Design Framework. Authorea. November 30, 2020.
    - https://github.com/mosdef-hub/mosdef_slitpore
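
In GCMC, the chemical potential μ controls how readily molecules are inserted. The sketch below evaluates the textbook insertion acceptance probability (Frenkel and Smit form) for a water molecule. The box volume and energy change are made-up inputs. Cassandra's input convention for μ may include shifts not shown here, so this is a conceptual illustration, not what Cassandra computes internally.

```python
import math

# Textbook GCMC insertion acceptance:
#   acc = min(1, V / (Lambda^3 (N + 1)) * exp(beta * (mu - dU)))
# mu is the full chemical potential here (assumption; Cassandra may differ).
H = 6.62607015e-34       # Planck constant, J s
KB = 1.380649e-23        # Boltzmann constant, J/K
NA = 6.02214076e23       # Avogadro constant, 1/mol
AMU = 1.66053906660e-27  # atomic mass unit, kg


def de_broglie_nm(mass_amu, temperature_k):
    """Thermal de Broglie wavelength in nm."""
    m = mass_amu * AMU
    return H / math.sqrt(2 * math.pi * m * KB * temperature_k) * 1e9


def insertion_acceptance(volume_nm3, n_mols, mu_kj_mol, du_kj_mol,
                         temperature_k, mass_amu):
    """Probability of accepting an N -> N+1 insertion with energy change dU."""
    beta = 1.0 / (KB * NA * temperature_k / 1000.0)  # 1/(kJ/mol)
    lam3 = de_broglie_nm(mass_amu, temperature_k) ** 3
    arg = volume_nm3 / (lam3 * (n_mols + 1)) * math.exp(beta * (mu_kj_mol - du_kj_mol))
    return min(1.0, arg)


lam = de_broglie_nm(18.015, 300.0)
print(f"Lambda(water, 300 K) = {lam:.4f} nm")
for mu in (-48.0, -42.0, -36.0):
    # Made-up inputs: empty 64 nm^3 box, insertion with dU = 0.
    p = insertion_acceptance(64.0, 0, mu, 0.0, 300.0, 18.015)
    print(f"mu = {mu} kJ/mol -> acceptance = {p:.3g}")
```

Raising μ makes insertions exponentially more likely. This is why sweeping μ (equivalently, relative pressure) traces out an adsorption isotherm.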

In [None]:
import mosdef_cassandra as mc
import unyt as u

# set variables
n_steps = 10000
temperature = 300 * u.K
mu = -36.0 * u.kJ / u.mol

# Create box and species list
box_list = [graphene]
species_list = [graphene_ptop,
                single_water_ptop]

# Specify mols at start of the simulation
mols_in_boxes = [[1, 0]]  # one graphene pore (a single species molecule), no water yet
mols_to_add = [[0, 100]]

# Create MC system
system = mc.System(box_list,
                   species_list,
                   mols_in_boxes=mols_in_boxes,
                   mols_to_add=mols_to_add)
moves = mc.MoveSet("gcmc", species_list)

# Specify the restricted insertion
restricted_type = [[None, "slitpore"]]
restricted_value = [[None, 0.5 * 4.0 * u.nm]]
moves.add_restricted_insertions(
    species_list, restricted_type, restricted_value
)


# Set thermodynamic properties
thermo_props = [
    "energy_total",
    "energy_intervdw",
    "energy_interq",
    "nmols",
]

default_args = {
    "run_name" : "gcmc",
    "cutoff_style": "cut",
    "charge_style": "ewald",
    "rcut_min": 0.5 * u.angstrom,
    "vdw_cutoff": 9.0 * u.angstrom,
    "charge_cutoff": 9.0 * u.angstrom,
    "properties": thermo_props,
    "angle_style": ["fixed", "fixed"],
    "coord_freq": 1000,
    "prop_freq": 100,
}

custom_args = {**default_args}

mc.run(
    system=system,
    moveset=moves,
    run_type="equilibration",
    run_length=n_steps,
    temperature=temperature,
    chemical_potentials=["none", mu],
    **custom_args,
)

In [None]:
# Viewing the output file
!cat gcmc.out.log

In [None]:
# Visualization of the final frame
lines = !grep -n MC_STEP gcmc.out.xyz | tail -n 1 | awk -F':' '{{print $$1}}'
total= !cat gcmc.out.xyz|wc -l
last = int(total[0])-int(lines[0])+2
!tail -n $last gcmc.out.xyz > viz2.xyz
system = mb.load("viz2.xyz")
system.visualize()
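
The cell above extracts the final frame with shell commands. A pure-Python sketch of the same idea follows. It assumes, as the `grep`/`tail` pipeline does, that each frame starts with an atom-count line followed by a comment line containing `MC_STEP`. The two-frame trajectory is synthetic.

```python
def last_frame(lines):
    """Return the final frame of a multi-frame .xyz, starting at its atom-count line.

    Assumes each frame is: "<n_atoms>", a comment line containing "MC_STEP",
    then one line per atom (the same assumption the grep/tail commands make).
    """
    marker = max(i for i, line in enumerate(lines) if "MC_STEP" in line)
    return lines[marker - 1:]


# Synthetic two-frame trajectory for illustration only.
traj = [
    "2", "MC_STEP: 1000", "O 0.0 0.0 0.0", "H 0.1 0.0 0.0",
    "2", "MC_STEP: 2000", "O 1.0 0.0 0.0", "H 1.1 0.0 0.0",
]
frame = last_frame(traj)
print("\n".join(frame))

# In the notebook, the same helper could write viz2.xyz:
# with open("gcmc.out.xyz") as f:
#     frame = last_frame(f.read().splitlines())
# with open("viz2.xyz", "w") as f:
#     f.write("\n".join(frame) + "\n")
```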

- This is what the resulting system looks like if we let the MC simulation run its course:
![snapshot](https://github.com/mosdef-hub/CECAM-MoSDeF-Workshop/blob/main/images/slitpore_sims.png?raw=1)
- The output can be used as the initial configuration for subsequent Molecular Dynamics (MD) simulations using _GROMACS_, _LAMMPS_, or _HOOMD-blue_, all of which are supported by MoSDeF.



# Recap
----
- In summary, we used `mBuild` to initialize a graphene slitpore and SPC/E water, then used `GMSO` to load force fields and apply the relevant parameters to the created systems. Finally, we used `mosdef_cassandra`, the Python interface to Cassandra, to run a short GCMC simulation that inserts water into the slitpore. The output configuration can be used for other simulations, e.g., NVT MD to study intra-pore water dynamics, or desorption with MC.
- The developers of Cassandra from the Maginn group will give a tutorial later today if you are interested in this simulation engine.