# Introduction to GMSO's base data structures

In this notebook, we will explore the base data structures, Pythonic features, and design decisions of the `gmso` package.

In particular, we will cover the following aspects of the package:

1. The abstract base classes module
2. Extending `gmso.abc`: `Site` vs. `Atom`, with an example implementing a new site
3. Core Classes:
    - Sites
    - Connections
    - Potentials
    - Topology
    - ForceField
4. Modules gmso.lib, gmso.formats, gmso.external
5. XML Schema for the GMSO ForceField (focus on units via `unyt` and functional forms via `sympy`)
6. Limitations, future plans

## Module gmso.abc
This module provides the abstract base classes for all other core data structures used in gmso. Our abstract base classes inherit from [pydantic](https://pydantic-docs.helpmanual.io/)'s `BaseModel` class which provides type hints as well as runtime data validation together with out-of-the-box serialization. The module structure is as follows:
```
gmso/abc 
├── abstract_connection.py 
├── abstract_potential.py 
├── abstract_site.py 
├── gmso_base.py 
```


1. [`gmso_base.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/abc/gmso_base.py): Defines the class `GMSOBase`, the base class for all our other classes. It tweaks pydantic's `BaseModel` class to provide ID-based hashing and to inject numpydoc-style docstrings generated from the fields of the class.

1. [`abstract_site.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/abc/abstract_site.py): Defines the `Site` class, which provides a plain topology site with the following attributes: (a) name, (b) position, (c) label.

1. [`abstract_potential.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/abc/abstract_potential.py): Defines the `AbstractPotential` class, which is the base class for our `ParametricPotential` and `PotentialTemplate` classes.

1. [`abstract_connection.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/abc/abstract_connection.py): Defines the `Connection` class, which is the base class for our `Bond`, `Angle`, `Dihedral` and `Improper` classes.
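
To make "ID-based hashing" concrete, here is a minimal plain-Python sketch. It is not the code of `GMSOBase`; the class `IdHashed` and its single field are hypothetical. It only illustrates the consequence of hashing on identity: two objects with identical field values remain distinct members of a set or dictionary.

```python
class IdHashed:
    """Hypothetical stand-in for a base class that hashes by object identity."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Hash on the object's identity, not on its field values.
        return hash(id(self))

    def __eq__(self, other):
        # Two objects compare equal only if they are the same object.
        return self is other


h1 = IdHashed(name="h")
h2 = IdHashed(name="h")

sites = {h1, h2}
print(len(sites))  # both objects are kept, despite identical fields
print(h1 == h2)    # same values, different identity
```

With value-based hashing, the two hydrogen-like objects above would collapse into one set entry, which would be awkward for a container holding many sites that share a name.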

## Example Implementation: Bead

The `Bead` class can now be implemented as a subclass of the abstract `Site` class. We can use the existing attributes of the superclass, such as `name` and `position`, and define new attributes and methods for `Bead`. The goal is to consolidate as many universal characteristics of a generic topology site as possible into a base class (`Site`) and to tailor its downstream usage to the needs of a particular site (such as an `Atom` or a `Bead`). Using `Site` to create a `Bead` class is shown below:

In [57]:
import warnings
warnings.simplefilter('ignore')
import unyt as u
from pydantic import Field

from gmso.abc.abstract_site import Site


class Bead(Site):
    __base_doc__ = "Basic Bead class inheriting from the Site Class"
    
my_bead = Bead()
my_bead.name  # On inheritance, the `name` field defaults to the class name (`Bead` in this case)

'Bead'

In [58]:
# Documentation is injected automatically as well
%pdoc Bead

## Core Classes
In `gmso` we define the following core classes. All of our core classes make use of `gmso.abc` to define a particular site, connection or potential. The structure of the `gmso.core` module is as follows:

```
gmso/core/
├── angle.py
├── angle_type.py
├── atom.py
├── atom_type.py
├── bond.py
├── bond_type.py
├── box.py
├── dihedral.py
├── dihedral_type.py
├── element.py
├── forcefield.py
├── improper.py
├── improper_type.py
├── parametric_potential.py
└── topology.py
```


### Sites
1. [`atom.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/atom.py): Defines the class `gmso.core.atom.Atom` which inherits from `gmso.abc.abstract_site.Site` to define an `Atom`.

In [59]:
import json
import unyt as u
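class UnytJsonEncoder(json.JSONEncoder):
    """JSON encoder that serializes unyt arrays as a flat list plus a unit string."""

    def default(self, obj):
        if isinstance(obj, u.unyt_array):
            return {
                'array': obj.ravel().tolist(),
                'unit': str(obj.units),
            }

        # Defer to the base class for every other type
        return super().default(obj)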

In [60]:
from gmso.core.atom import Atom
from pprint import pprint
atom1 = Atom(name='atom1', charge=2.0*u.elementary_charge)
atom2 = Atom(name='atom2', charge=1.0*u.elementary_charge)

# Dumping the model as json is easy
pprint(json.dumps(atom1.dict(by_alias=True, exclude_unset=True), cls=UnytJsonEncoder, indent=2))

('{\n'
 '  "name": "atom1",\n'
 '  "charge": {\n'
 '    "array": [\n'
 '      3.2043532416e-19\n'
 '    ],\n'
 '    "unit": "C"\n'
 '  }\n'
 '}')



### Connections
1. [`angle.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/angle.py): Defines the class `gmso.core.angle.Angle` which inherits from `gmso.abc.abstract_connection.Connection` to define a 3-partner connection between `Atoms`.

2. [`bond.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/bond.py): Defines the class `gmso.core.bond.Bond` which inherits from `gmso.abc.abstract_connection.Connection` to define a 2-partner connection between `Atoms`.

3. [`dihedral.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/dihedral.py): Defines the class `gmso.core.dihedral.Dihedral` which inherits from `gmso.abc.abstract_connection.Connection` to define a 4-partner connection (dihedral) between `Atoms`.

4. [`improper.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/improper.py): Defines the class `gmso.core.improper.Improper` which inherits from `gmso.abc.abstract_connection.Connection` to define a 4-partner connection (improper) between `Atoms`.

In [61]:
from gmso import Bond
bond = Bond(connection_members=[atom1, atom2])
bond.connection_members

(<Atom, id 139915731034832>, <Atom, id 139915727337344>)


### Potentials
1. [`parametric_potential.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/parametric_potential.py): Defines the class `gmso.core.parametric_potential.ParametricPotential`, which inherits from `gmso.abc.abstract_potential.AbstractPotential`. It stores the functional form of a potential as a `sympy` expression and the parameters of the potential expression as `unyt_quantity` objects.

2. [`atom_type.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/atom_type.py): Defines the class `gmso.core.atom_type.AtomType` which inherits from `gmso.core.parametric_potential.ParametricPotential` to describe properties for an `AtomType`.

3. [`bond_type.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/bond_type.py): Defines the class `gmso.core.bond_type.BondType` which inherits from `gmso.core.parametric_potential.ParametricPotential` to describe properties for a `BondType`.

4. [`angle_type.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/angle_type.py): Defines the class `gmso.core.angle_type.AngleType` which inherits from `gmso.core.parametric_potential.ParametricPotential` to describe properties for an `AngleType`.

5. [`dihedral_type.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/dihedral_type.py) and [`improper_type.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/improper_type.py): Define the classes `gmso.core.dihedral_type.DihedralType` and `gmso.core.improper_type.ImproperType`, which inherit from `gmso.core.parametric_potential.ParametricPotential` and describe properties for a `DihedralType` and an `ImproperType`, respectively.

In [62]:
from gmso.core.parametric_potential import ParametricPotential

# The potential expression is handled by a separate expression class
new_potential = ParametricPotential(
            name='mypotential',
            expression='a*x+b',
            parameters={
                'a': 1.0*u.g,
                'b': 1.0*u.m
            },
            independent_variables={'x'}
)

try:
    new_potential.independent_variables = 'y'
except ValueError as e:
    print(e)

symbol y is not in expression's free symbols. Cannot use an independent variable which doesn't exist in the expression's free symbols {x, a, b}
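
The check behind this error can be approximated without `gmso`. The sketch below uses the standard-library `ast` module in place of `sympy` (which `gmso` itself relies on) to collect the free symbols of an expression and reject an independent variable that is not among them. The helper names `free_symbols` and `check_independent_variables` are hypothetical, and the error text is not the one `gmso` emits.

```python
import ast


def free_symbols(expression):
    """Return the variable names in a Python-syntax arithmetic expression."""
    tree = ast.parse(expression, mode="eval")
    # Names used as call targets (e.g. `cos` in `cos(phi)`) are functions, not symbols.
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - called


def check_independent_variables(expression, independent_variables):
    """Raise ValueError if an independent variable is absent from the expression."""
    symbols = free_symbols(expression)
    missing = set(independent_variables) - symbols
    if missing:
        raise ValueError(
            f"{sorted(missing)} not in the expression's free symbols {sorted(symbols)}"
        )
    return symbols


print(sorted(check_independent_variables("a*x+b", {"x"})))  # ['a', 'b', 'x']

try:
    check_independent_variables("a*x+b", {"y"})
except ValueError as err:
    print(err)
```

Validating at assignment time, as the cell above shows `ParametricPotential` doing, keeps the expression, its parameters and its independent variables consistent with one another.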


### Topologies
1. [`topology.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/topology.py): Defines the class `gmso.core.topology.Topology` which is the main data structure responsible for interactions between various potentials, connections and sites to form a chemical topology representation.

In [65]:
from gmso import Topology, Atom, Bond
methane_top = Topology()
c = Atom(name='c')
h1 = Atom(name='h1')
h2 = Atom(name='h2')
h3 = Atom(name='h3')
h4 = Atom(name='h4')
ch1 = Bond(connection_members=[c,h1])
ch2 = Bond(connection_members=[c,h2])
ch3 = Bond(connection_members=[c,h3])
ch4 = Bond(connection_members=[c,h4])
methane_top.add_site(c, update_types=False)
methane_top.add_site(h1, update_types=False)
methane_top.add_site(h2, update_types=False)
methane_top.add_site(h3, update_types=False)
methane_top.add_site(h4, update_types=False)
methane_top.add_connection(ch1, update_types=False)
methane_top.add_connection(ch2, update_types=False)
methane_top.add_connection(ch3, update_types=False)
methane_top.add_connection(ch4, update_types=False)
methane_top.update_topology()
print(methane_top)

<Topology 5 sites, 4 connections, id: 139915723078672>


### Forcefield
1. [`forcefield.py`](https://github.com/mosdef-hub/gmso/blob/3ff3829cb4bc492b41e5e520d26d35c09c5338a4/gmso/core/forcefield.py): Defines the class `ForceField`, the in-memory representation of the gmso forcefield schema. An example schema, `tip3p.xml`, is shown below:

```xml
<?xml version='1.0' encoding='UTF-8'?>
<ForceField name="TIP3P" version="0.0.1"> 
  <!-- Defines metadata as units -->
  <FFMetaData>
    <Units energy="kJ/mol" mass="amu" charge="elementary_charge" distance="nm"/>
  </FFMetaData>
  <!-- Potentials can be grouped together by expression and can have optional names -->
  <AtomTypes expression="4*epsilon * ((sigma/r)**12 - (sigma/r)**6)">
     <!--   Units for parameters are defined in the tags    -->
    <ParametersUnitDef parameter="epsilon" unit="kJ/mol"/>
    <ParametersUnitDef parameter="sigma" unit="nm"/>
    <AtomType name="opls_111" element="O" charge="-0.834" mass="16" definition="O" description="water O">
      <Parameters>
        <Parameter name="epsilon" value="0.636386"/>
        <Parameter name="sigma" value="0.315061"/>
      </Parameters>
    </AtomType>
    <AtomType name="opls_112" element="H" charge="0.417" mass="1.011" definition="H">
      <Parameters>
        <Parameter name="epsilon" value="0.0"/>
        <Parameter name="sigma" value="1.0"/>
      </Parameters>
    </AtomType>
  </AtomTypes>
  <BondTypes expression="0.5 * k * (r-r_eq)**2">
    <ParametersUnitDef parameter="k" unit="kJ/mol/nm**2"/>
    <ParametersUnitDef parameter="r_eq" unit="nm"/>
    <BondType name="BondType-Harmonic-1" type1="opls_111" type2="opls_112">
      <Parameters>
        <Parameter name="k" value="502416.0"/>
        <Parameter name="r_eq" value="0.09572"/>
      </Parameters>
    </BondType>
  </BondTypes>
  <AngleTypes expression="0.5 * k * (theta - theta_eq)**2">
    <ParametersUnitDef parameter="k" unit="kJ/(mol*radian**2)"/>
    <ParametersUnitDef parameter="theta_eq" unit="radian"/>
    <AngleType name="AngleType-Harmonic-1" type1="opls_112" type2="opls_111" type3="opls_112">
      <Parameters>
        <Parameter name="k" value="682.02"/>
        <Parameter name="theta_eq" value="1.824218134"/>
      </Parameters>
    </AngleType>
  </AngleTypes>
</ForceField>
```

In [71]:
from gmso import ForceField 
from gmso.tests.utils import get_path
water = ForceField(get_path('tip3p.xml'))
water.bond_types

{'opls_111~opls_112': <BondType BondType-Harmonic-1, id 139915721491392>}
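
To see how the pieces of the schema relate, the sketch below reads a trimmed copy of the `BondTypes` group from `tip3p.xml` with Python's standard `xml.etree.ElementTree`. It is not how `gmso` loads a force field: units stay plain strings and the expression stays a string. It only shows that each `ParametersUnitDef` is declared once per group and applies to every `Parameter` of the same name in that group. The `type1~type2` key mimics the one visible in the output above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the BondTypes group from tip3p.xml.
XML = """
<ForceField name="TIP3P" version="0.0.1">
  <BondTypes expression="0.5 * k * (r-r_eq)**2">
    <ParametersUnitDef parameter="k" unit="kJ/mol/nm**2"/>
    <ParametersUnitDef parameter="r_eq" unit="nm"/>
    <BondType name="BondType-Harmonic-1" type1="opls_111" type2="opls_112">
      <Parameters>
        <Parameter name="k" value="502416.0"/>
        <Parameter name="r_eq" value="0.09572"/>
      </Parameters>
    </BondType>
  </BondTypes>
</ForceField>
"""

root = ET.fromstring(XML)
group = root.find("BondTypes")

# Units are declared once per group and shared by every potential in it.
units = {d.get("parameter"): d.get("unit") for d in group.findall("ParametersUnitDef")}

bond_types = {}
for bond_type in group.findall("BondType"):
    key = "~".join([bond_type.get("type1"), bond_type.get("type2")])
    # Pair each numeric value with the unit declared for that parameter name.
    bond_types[key] = {
        p.get("name"): (float(p.get("value")), units[p.get("name")])
        for p in bond_type.iter("Parameter")
    }

print(group.get("expression"))
print(bond_types)
```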

## Modules gmso.lib, gmso.formats, gmso.external
Module `gmso.lib` defines a library of potential templates that are lazily loaded from JSON files. For example, the `OPLSTorsionPotential` is defined as a JSON file, as shown below. The `PotentialTemplate` class, which also inherits from `gmso.abc.abstract_potential.AbstractPotential`, is wrapped by a singleton class and is immutable.

```json
{
  "name": "OPLSTorsionPotential",
  "expression": "0.5 * k0 + 0.5 * k1 * (1 + cos(phi)) + 0.5 * k2 * (1 - cos(2*phi)) + 0.5 * k3 * (1 + cos(3*phi)) + 0.5 * k4 * (1 - cos(4*phi))",
  "independent_variables": "phi"
}
```

In [80]:
from gmso.lib.potential_templates import PotentialTemplateLibrary
opls = PotentialTemplateLibrary()['OPLSTorsionPotential']
opls.dict(by_alias=True)

{'name': 'OPLSTorsionPotential',
 'potential_expression': <PotentialExpression, expression: 0.5*k0 + 0.5*k1*(cos(phi) + 1) + 0.5*k2*(1 - cos(2*phi)) + 0.5*k3*(cos(3*phi) + 1) + 0.5*k4*(1 - cos(4*phi)), 1 independent variables>}
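
The lazy-loading, singleton and immutability ideas described in this section can be sketched with the standard library alone. This is not the `gmso.lib` implementation: `Template`, `TemplateLibrary`, the `HarmonicBond` file and the temporary directory are hypothetical stand-ins. Each JSON file is parsed only on first lookup, the library is a single shared instance, and loaded templates are frozen.

```python
import json
import tempfile
from dataclasses import dataclass, FrozenInstanceError
from pathlib import Path


@dataclass(frozen=True)
class Template:
    """Immutable record of a potential's functional form (hypothetical)."""
    name: str
    expression: str
    independent_variables: str


class TemplateLibrary:
    """Singleton that loads `<name>.json` from a directory on first access."""
    _instance = None

    def __new__(cls, directory):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._directory = Path(directory)
            cls._instance._cache = {}
        return cls._instance

    def __getitem__(self, name):
        if name not in self._cache:  # parse the file only once
            with open(self._directory / f"{name}.json") as f:
                self._cache[name] = Template(**json.load(f))
        return self._cache[name]


# Write one template to a temporary directory so the sketch is self-contained.
directory = tempfile.mkdtemp()
harmonic = {
    "name": "HarmonicBond",
    "expression": "0.5 * k * (r-r_eq)**2",
    "independent_variables": "r",
}
(Path(directory) / "HarmonicBond.json").write_text(json.dumps(harmonic))

library = TemplateLibrary(directory)
template = library["HarmonicBond"]
print(template.expression)
print(library is TemplateLibrary(directory))  # True: one shared instance
print(template is library["HarmonicBond"])    # True: served from the cache

try:
    template.name = "renamed"
except FrozenInstanceError:
    print("templates are immutable")
```

Freezing the templates means a shared functional form cannot be altered by one caller behind another's back, which matters once a single library instance is handed to every user.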

Modules `gmso.formats` and `gmso.external` define file writers to different simulation engines and converters to/from external packages.