# MoSDeF - A Molecular Simulation Design Framework

## Foyer Overview
Foyer is designed to be a force-field-agnostic tool for performing atom-typing that can output to various file formats. A key aspect of Foyer is the ability to unambiguously define force field usage rules in a compact format that is:
- simultaneously human- and machine-readable
- amenable to automatic validation
- easy to disseminate
- easy to evolve and expand.

### File Format
Foyer force fields are defined in a single XML file that contains both the 'rules' required for atomtyping and the force field parameters.

The Foyer XML format is an extension of the [OpenMM forcefield XML format](http://docs.openmm.org/7.0.0/userguide/application.html#creating-force-fields). The only differences reside in the `AtomTypes` section, where several additional attributes are available (which we will examine in a moment) that allow for atomtyping.

The `AtomTypes` section of a Foyer XML file is similar to that used in OpenMM force field XML files; however, each `Type` in a Foyer XML file supports four additional attributes not found in OpenMM:
* `def` - SMARTS string describing the chemical substructure of this atom type, as discussed later (follow [this link](https://github.com/mosdef-hub/foyer/blob/master/docs/smarts.md) for more on SMARTS-based atomtyping using Foyer)
* `desc` - Brief description of the atomtype
* `doi` - DOI reference for parameters associated with this atomtype
* `overrides` - One or more atomtypes to 'override', providing precedence to this atomtype (discussed later)
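To see these attributes from a program's point of view, here is a small sketch that reads them back with Python's standard-library `xml.etree.ElementTree`. It is not how Foyer itself loads force fields: the `foyer_attributes` helper is hypothetical, and the XML snippet simply reuses the `opls_146` entry from the `overrides` example later in this notebook (which has no `doi` attribute).

```python
import xml.etree.ElementTree as ET

# One Foyer-style AtomTypes entry (the benzene hydrogen from the overrides
# example later in this notebook). It defines def, desc, and overrides, but no doi.
FOYER_XML = """
<ForceField>
  <AtomTypes>
    <Type name="opls_146" class="HA" element="H" mass="1.00800"
          def="[H][C;%opls_145]" overrides="opls_144" desc="benzene H"/>
  </AtomTypes>
</ForceField>
"""

# The four attributes Foyer adds on top of OpenMM's Type element
FOYER_ATTRS = ("def", "desc", "doi", "overrides")


def foyer_attributes(xml_text):
    """Map each atom type name to its Foyer-specific attributes (None if absent)."""
    root = ET.fromstring(xml_text.strip())
    return {
        atom_type.get("name"): {key: atom_type.get(key) for key in FOYER_ATTRS}
        for atom_type in root.iter("Type")
    }


print(foyer_attributes(FOYER_XML))
```

Everything else on the `Type` element (`name`, `class`, `element`, `mass`) is standard OpenMM.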


Let's quickly look at a Foyer XML file:


In [None]:
!cat utils/oplsaa-alcohol.xml

## Defining Chemical Context (i.e., "Rules") using SMARTS

Classical force fields are typically able to achieve high accuracy by creating sets of highly specific fitting parameters (i.e., atom types), in which each atom type describes an interaction site within a different chemical context. The chemical context is typically defined by the bonded environment of an interaction site (e.g., the number of bonds and the identity of the bonded neighbors) and may also consider, among other factors, the bonded environment of the neighbors, and/or the specific molecule/structure within which the interaction site is included.  Chemical context effectively defines the 'rules' for when an atomtype should apply. 

Foyer relies upon [SMARTS](http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html) to define the chemical context of an atom type. SMARTS is a language for describing chemical structures and substructures; in Foyer we tend to focus on substructures (that is, we aren't using SMARTS to describe the entire molecule, but rather to uniquely differentiate each atom type).

Consider defining the chemical context of the atoms in a methane molecule ($CH_4$) using SMARTS. The first thing to note is that multiple valid SMARTS strings could describe the same atom.
- For example, the simplest way to describe the carbon atom in methane would be ```[C]```.
- While in some cases such a simple definition may be sufficient, typically we also wish to define the number of bonds, e.g., ```[C;X4]```. Note that ```;``` represents ```AND```, so this pattern states that the atom is a carbon AND has 4 bonds.
- Often it is useful to further define the identity of the bonded neighbors, here four hydrogens: ```[C;X4](H)(H)(H)(H)```.
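To make this increasing specificity concrete, the sketch below hand-codes what each of the three patterns asks of an atom, as plain-Python checks on a toy molecular graph. This is only an illustration. Foyer parses the SMARTS string and performs subgraph matching rather than using hand-written checks, and the dictionary-based graph here is a hypothetical representation. Because every hydrogen is explicit, a simple neighbor count stands in for `X`.

```python
# Toy molecular graphs (hypothetical representation): an element symbol per
# atom index, plus each atom's bonded neighbors. All hydrogens are explicit.
METHANE = {
    "elements": ["C", "H", "H", "H", "H"],
    "bonds": {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]},
}
ETHANE = {
    "elements": ["C", "C", "H", "H", "H", "H", "H", "H"],
    "bonds": {0: [1, 2, 3, 4], 1: [0, 5, 6, 7],
              2: [0], 3: [0], 4: [0], 5: [1], 6: [1], 7: [1]},
}


def is_carbon(mol, i):
    """[C]: the atom is a carbon."""
    return mol["elements"][i] == "C"


def is_carbon_x4(mol, i):
    """[C;X4]: a carbon AND exactly 4 connections."""
    return is_carbon(mol, i) and len(mol["bonds"][i]) == 4


def is_methane_carbon(mol, i):
    """[C;X4](H)(H)(H)(H): additionally, all 4 neighbors are hydrogens."""
    return is_carbon_x4(mol, i) and all(
        mol["elements"][j] == "H" for j in mol["bonds"][i]
    )


for name, mol in [("methane", METHANE), ("ethane", ETHANE)]:
    print(name, is_carbon(mol, 0), is_carbon_x4(mol, 0), is_methane_carbon(mol, 0))
# methane True True True
# ethane True True False
```

Only the most specific pattern separates the methane carbon from an ethane carbon.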

### Exercise 1

Modify the "mff_test1.xml" file to define SMARTS strings for the atom types of simple linear alkanes in our fictitious force field, "mosdef_ff".
The atom types for "mosdef_ff" are:

* `mff_0`: carbon in methane ($CH_4$)
* `mff_1`: carbon in a terminal methyl group of a linear chain ($-CH_3$)
* `mff_2`: carbon in an interior methylene group of a linear chain ($-CH_2-$)
* `mff_3`: a hydrogen atom

After defining the appropriate SMARTS strings, run the test suite to check your work. The tests use pytest to compare the atom types assigned by your force field against those listed in reference mol2 files for methane, ethane, and propane.

In [None]:
%cd mff_test1/
!py.test -v --tb=line
%cd ..
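Roughly, the comparison the tests perform can be sketched as follows: read the expected type of each atom from a reference mol2 file and check it against the type your force field assigned. This is a hedged illustration, not the actual test code in `mff_test1/`. The mol2 text and the `assigned` list are hypothetical, and the parser only handles the simple layout shown. In the Tripos mol2 format, the atom type is the sixth field of each `@<TRIPOS>ATOM` record.

```python
# Hypothetical mol2 content for methane; the real reference files may differ.
METHANE_MOL2 = """\
@<TRIPOS>MOLECULE
methane
 5 4 1 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 C     0.0000    0.0000    0.0000 mff_0    1 RES     0.0000
      2 H     0.6300    0.6300    0.6300 mff_3    1 RES     0.0000
      3 H    -0.6300   -0.6300    0.6300 mff_3    1 RES     0.0000
      4 H    -0.6300    0.6300   -0.6300 mff_3    1 RES     0.0000
      5 H     0.6300   -0.6300   -0.6300 mff_3    1 RES     0.0000
@<TRIPOS>BOND
     1     1     2 1
     2     1     3 1
     3     1     4 1
     4     1     5 1
"""


def mol2_atom_types(mol2_text):
    """Return the atom type (6th field) of every record in the ATOM section."""
    atom_types = []
    in_atom_section = False
    for raw_line in mol2_text.splitlines():
        line = raw_line.strip()
        if line.startswith("@<TRIPOS>"):
            # Only the ATOM section carries per-atom types
            in_atom_section = line == "@<TRIPOS>ATOM"
            continue
        if in_atom_section and line:
            atom_types.append(line.split()[5])
    return atom_types


# Hypothetical atomtyping result, in the same atom order as the mol2 file
assigned = ["mff_0", "mff_3", "mff_3", "mff_3", "mff_3"]
assert assigned == mol2_atom_types(METHANE_MOL2)
```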

### Working with coarse-grained and united-atom force fields

Foyer allows non-atomistic types to be defined within SMARTS, so coarse-grained and united-atom force fields can be handled as well. Non-elemental species are defined by prepending an underscore to the name of the custom "element".

For example, consider defining SMARTS for a simple homopolymer composed of "A" beads, where terminal beads are of atom type `cg_term` and middle beads are of type `cg_mid`. Here, the key difference between the two atom types is the number of bonds (1 for `cg_term`, 2 for `cg_mid`):

- `cg_term` : `[_A;X1]`
- `cg_mid` : `[_A;X2]`
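Because the bond count is the only thing separating the two types, a toy version of this typing is easy to write by hand. The plain-Python sketch below uses hypothetical helper names. It builds a five-bead `_A` chain and applies the two rules. Foyer itself would reach the same result by matching the SMARTS above against the bonded topology.

```python
def linear_chain_bonds(n_beads):
    """Neighbor lists for a linear chain (bead i bonded to bead i + 1)."""
    bonds = {i: [] for i in range(n_beads)}
    for i in range(n_beads - 1):
        bonds[i].append(i + 1)
        bonds[i + 1].append(i)
    return bonds


def cg_atom_type(element, n_neighbors):
    """Mimic the two rules: [_A;X1] -> cg_term, [_A;X2] -> cg_mid."""
    if element == "_A" and n_neighbors == 1:
        return "cg_term"
    if element == "_A" and n_neighbors == 2:
        return "cg_mid"
    return None  # no rule matches this bead


bonds = linear_chain_bonds(5)
types = [cg_atom_type("_A", len(bonds[i])) for i in range(5)]
print(types)  # ['cg_term', 'cg_mid', 'cg_mid', 'cg_mid', 'cg_term']
```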

Similarly, the following lines could be used to describe beads representing $-CH_2-$ groups in a polymer using the TraPPE force field:

```xml
<Type name="CH2_sp3" class="CH2" element="_CH2" mass="14.02700"
      def="[_CH2;X2]([_CH3,_CH2])[_CH3,_CH2]"
      desc="Alkane CH2, united atom" doi="10.1021/jp972543+"/>
```
  
Here, the SMARTS definition `[_CH2;X2]([_CH3,_CH2])[_CH3,_CH2]` states that, for atom type `CH2_sp3`:

- the bead is a `_CH2` with 2 bonded neighbors, i.e., `[_CH2;X2]`
- each of those neighbors can be either a `_CH2` or a `_CH3`, i.e., `([_CH3,_CH2])[_CH3,_CH2]` (note that `,` represents an OR).
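The same logic can be spelled out by hand for a united-atom chain. The plain-Python sketch below uses a hypothetical helper and bead lists and is not Foyer code. It checks which beads of united-atom n-butane satisfy the `CH2_sp3` rule, then shows that a `_CH2` bonded to some other bead type (here a made-up `_X`) would not match.

```python
ALLOWED_NEIGHBORS = {"_CH3", "_CH2"}


def matches_ch2_sp3(beads, bonds, i):
    """Hand-coded check equivalent to [_CH2;X2]([_CH3,_CH2])[_CH3,_CH2]."""
    neighbors = bonds[i]
    return (
        beads[i] == "_CH2"                                      # [_CH2 ...]
        and len(neighbors) == 2                                 # ;X2
        and all(beads[j] in ALLOWED_NEIGHBORS for j in neighbors)  # [_CH3,_CH2]
    )


# United-atom n-butane: CH3-CH2-CH2-CH3
butane = ["_CH3", "_CH2", "_CH2", "_CH3"]
butane_bonds = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print([matches_ch2_sp3(butane, butane_bonds, i) for i in range(4)])
# [False, True, True, False]

# A _CH2 bonded to a hypothetical non-alkane bead "_X" does not match
other = ["_CH3", "_CH2", "_X"]
other_bonds = {0: [1], 1: [0, 2], 2: [1]}
print(matches_ch2_sp3(other, other_bonds, 1))  # False
```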



### Using `overrides` to set rule precedence

Force fields often contain atom types that ostensibly have matching chemical context (at least in terms of the local bonded environment) but require different parameters. For example, consider defining a force field for both alkenes and benzene in a single file for the OPLS force field:

```xml
<ForceField>
  <AtomTypes>
    <Type name="opls_141" class="CM" element="C" mass="12.01100"
          def="[C;X3](C)(C)C" desc="alkene C (R2-C=)"/>
    <Type name="opls_142" class="CM" element="C" mass="12.01100"
          def="[C;X3](C)(C)H" desc="alkene C (RH-C=)"/>
    <Type name="opls_144" class="HC" element="H" mass="1.00800"
          def="[H][C;X3]" desc="alkene H"/>
    <Type name="opls_145" class="CA" element="C" mass="12.01100"
          def="[C;X3;r6]1[C;X3;r6][C;X3;r6][C;X3;r6][C;X3;r6][C;X3;r6]1"
          overrides="opls_142"/>
    <Type name="opls_146" class="HA" element="H" mass="1.00800"
          def="[H][C;%opls_145]" overrides="opls_144" desc="benzene H"/>
  </AtomTypes>
</ForceField>
```

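When atom-typing a benzene molecule, the carbon atoms in the ring match the SMARTS patterns for both `opls_142` (an alkene carbon) and `opls_145` (a benzene carbon). Likewise, the ring hydrogens match both `opls_144` (an alkene hydrogen) and `opls_146` (a benzene hydrogen).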


`Foyer` allows rule precedence to be stated explicitly via the `overrides` attribute added to the XML file format. Because of this, atom type usage rules can be encoded in any order within the file, which eliminates incorrect rule ordering as a source of error. Here, setting `overrides="opls_142"` on `opls_145` indicates that whenever the `opls_145` pattern matches, it supersedes `opls_142`.


`Foyer` iteratively evaluates all rules on all interaction sites in the system. For each interaction site, it maintains a "whitelist" of rules that evaluate to `True` and a "blacklist" of rules that have been superseded by another matching rule (i.e., those listed in the `overrides` attribute of a whitelisted rule). If the force field is implemented correctly, the set difference between an interaction site's whitelist and blacklist yields exactly one atom type. In this example, the difference between the whitelist (`opls_142` and `opls_145`) and the blacklist (`opls_142` only) is `opls_145`.
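The whitelist/blacklist bookkeeping described above reduces to a set difference. Below is a minimal plain-Python sketch of that step for a single interaction site. It assumes the SMARTS matching has already produced the set of matching rule names. `resolve_atom_type` is a hypothetical helper, not part of Foyer's API. Raising an error when zero or several types remain is this sketch's own way of flagging a force field that is not implemented correctly.

```python
def resolve_atom_type(matched_rules, overrides):
    """Resolve one site's atom type from the rules that matched it.

    matched_rules: names of rules whose SMARTS matched this site (the whitelist)
    overrides: maps a rule name to the set of rule names it overrides
    """
    whitelist = set(matched_rules)
    blacklist = set()
    for rule in whitelist:
        # Only rules that actually matched can push others onto the blacklist
        blacklist |= overrides.get(rule, set())
    remaining = whitelist - blacklist
    if len(remaining) != 1:
        raise ValueError(f"expected exactly one atom type, found {sorted(remaining)}")
    return remaining.pop()


# The overrides attributes from the alkene/benzene XML above
OVERRIDES = {"opls_145": {"opls_142"}, "opls_146": {"opls_144"}}

print(resolve_atom_type({"opls_142", "opls_145"}, OVERRIDES))  # opls_145 (ring carbon)
print(resolve_atom_type({"opls_144", "opls_146"}, OVERRIDES))  # opls_146 (ring hydrogen)
print(resolve_atom_type({"opls_142"}, OVERRIDES))              # opls_142 (alkene carbon)
```

Because the blacklist is built from the matched rules' `overrides` rather than from file order, shuffling the `Type` entries in the XML does not change the result.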


## Applying force fields with `Foyer` and generating data files

To run a simulation of any of the systems we've built with mBuild, we would need to apply a force field and write the necessary data files. mBuild handles all of this through a single `save` method. We can pass it the force field to apply (which uses `Foyer` under the hood) and the name of the file to create, and the output format is determined by the file extension.


First, let's consider how we would write to Gromacs `TOP` and `GRO` formats.

The `GRO` format contains no force field information, so we do not have to pass a force field file to `save` when writing to this format.

We will also specify a `residues` argument. Here, it tells mBuild to treat every `Compound` named `Octanol` as a separate residue.

In [None]:
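import mbuild as mb

# Assumption: rebuild the octanol box from the mBuild overview via a SMILES string
octanol = mb.load('CCCCCCCCO', smiles=True)
octanol.name = 'Octanol'  # residues='Octanol' below matches on this name
# Pack 50 octanol molecules into a 3 nm x 3 nm x 3 nm box
octanol_box = mb.fill_box(compound=octanol, n_compounds=50, box=[3, 3, 3])

octanol_box.save('system.gro', residues='Octanol', overwrite=True)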


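Unlike `GRO`, the `TOP` format contains force field information, so this time we need a force field file. We'll reuse the OPLS alcohol force field file from earlier, so let's take another quick look at it.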

In [None]:
!cat utils/oplsaa-alcohol.xml

With this force field XML file, Foyer will use the SMARTS strings to atom-type our system and then apply the proper force field parameters. We'll call `save` again, this time passing our force field file and changing the output format from GRO to TOP. OPLS uses geometric mixing rules rather than Lorentz-Berthelot, so we also pass this to `save` via `combining_rule`.

**Note:** The warning message about unparameterized impropers can be safely ignored, as OPLS does not include any impropers for our system. By default, Foyer warns the user if improper parameters are not specified for all possible impropers. It exits with an error if bond, angle, or dihedral parameters are not specified for all possible bonds, angles, and dihedrals. (This behavior can be overridden if desired.)

In [None]:
octanol_box.save('system.top', forcefield_files='utils/oplsaa-alcohol.xml', residues='Octanol',
                 combining_rule='geometric', overwrite=True)
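For reference, the two combining rules differ only in how the cross-interaction $\sigma$ is computed. Geometric mixing uses $\sigma_{ij} = \sqrt{\sigma_i \sigma_j}$, whereas Lorentz-Berthelot uses $\sigma_{ij} = (\sigma_i + \sigma_j)/2$. Both use $\epsilon_{ij} = \sqrt{\epsilon_i \epsilon_j}$. The sketch below evaluates those formulas with a hypothetical `combine_lj` helper, which is not an mBuild or Foyer function. Its `"geometric"`/`"lorentz"` labels mirror the `combining_rule` values accepted by mBuild's `save`, and the numbers are purely illustrative.

```python
import math


def combine_lj(sigma_i, eps_i, sigma_j, eps_j, rule="geometric"):
    """Cross-interaction Lennard-Jones parameters for a pair of atom types."""
    eps_ij = math.sqrt(eps_i * eps_j)  # both rules combine epsilon geometrically
    if rule == "geometric":
        sigma_ij = math.sqrt(sigma_i * sigma_j)
    elif rule == "lorentz":  # Lorentz-Berthelot: arithmetic mean for sigma
        sigma_ij = 0.5 * (sigma_i + sigma_j)
    else:
        raise ValueError(f"unknown combining rule: {rule!r}")
    return sigma_ij, eps_ij


# Illustrative values only (not taken from any force field file)
print(combine_lj(0.40, 1.0, 0.10, 0.25, rule="geometric"))  # sigma ~0.2, eps 0.5
print(combine_lj(0.40, 1.0, 0.10, 0.25, rule="lorentz"))    # sigma 0.25, eps 0.5
```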

Finally, to confirm that these files were written correctly, let's take a quick peek.

In [None]:
!cat system.top

This concludes the general MoSDeF overview. For more in-depth tutorials into mBuild and Foyer, refer to the [mosdef_tutorials repository](https://github.com/mosdef-hub/mosdef_tutorials) or use our [Binder link](https://mybinder.org/v2/gh/mosdef-hub/mosdef_tutorials/master).