This notebook walks through the process of creating a new collection using QCSubmit and running MM calculations, such as optimizations and torsiondrives, using QCEngine and QCFractal.

Here we run everything inside a snowflake server, which is deleted when the notebook is shut down, but the same workflow applies to a persistent local QCArchive instance with many workers. For information on how to set one up, see [here](http://docs.qcarchive.molssi.org/projects/qcfractal/en/latest/setup_server.html).

## You will need:
- [The openforcefield toolkit master branch from github](https://github.com/openforcefield/openforcefield)
- [The qcsubmit master branch from github](https://github.com/openforcefield/qcsubmit)
- [The QCEngine openmm refactor from github](https://github.com/jthorton/QCEngine/tree/openmm_cmiles)
- QCFractal
- [OpenMMforcefields](https://github.com/openmm/openmmforcefields)

## Plan:

- First we will download a small optimization dataset from the public QCArchive
- Then we will create a new optimization dataset using qcsubmit from the final geometries of the optimizations
- Then we will compute the dataset using openff and gaff with the new branch of qcengine

### Note
During testing, three optimizations failed due to missing BCCs (bond charge corrections).

In [1]:
# create a link to the public qcarchive 
from qcportal import FractalClient
client = FractalClient()

We have chosen the `OpenFF Gen 2 Opt Set 2 Coverage` collection because it contains only 373 records, so let's collect it using qcsubmit. The output below shows that only the 359 optimizations that completed without error are pulled from the server. For speed, we also pull only the final molecule of each optimization.

In [2]:
from qcsubmit.results import OptimizationCollectionResult

%time result = OptimizationCollectionResult.from_server(client, "default", "OpenFF Gen 2 Opt Set 2 Coverage", final_molecule_only=True)

requested molecules 359
requested results 359
CPU times: user 769 ms, sys: 161 ms, total: 930 ms
Wall time: 17.1 s
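
As a quick check of the counts quoted above, the following sketch works out how many records were left out of the download. The numbers are the ones reported in this notebook; the variable names are only illustrative and are not part of qcsubmit.

```python
# Record counts quoted in this notebook
total_records = 373   # records in the collection
pulled_records = 359  # optimizations that completed without error

# Records that were not pulled because they were incomplete or errored
skipped = total_records - pulled_records
fraction_pulled = pulled_records / total_records

print(f"{skipped} records skipped, {fraction_pulled:.1%} pulled")
```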


Now let's make a new optimization dataset ready to be computed using MM. Note that we use the `mm_extras` flag to make sure the extras on each molecule are included, as they are needed for accurate typing of the molecule.

In [3]:

dataset = result.create_optimization_dataset(dataset_name="mm optimizations", description="testing mm optimizations", tagline="mm optimizations", mm_extras=True)

Now let's set up the run with Parsley and OpenMM, and submit the dataset to a local QCArchive instance that we can spin up here.

In [4]:
from qcfractal import FractalSnowflakeHandler

server = FractalSnowflakeHandler()

# Obtain a FractalClient to the server
client = FractalClient(server)
client

Let's set the program, method and basis, and rename the spec in line with what we want to compute.

In [5]:
dataset.program = "openmm"
dataset.method = "openff_unconstrained-1.0.0"
dataset.basis = "smirnoff"
dataset.spec_name = "parsley"
dataset.spec_description = "default parsley spec"
dataset.metadata.long_description_url = "https://www.test.org/"

In [6]:
# now submit the dataset to the local fractal instance
dataset.submit(client=client, await_result=False)

359

Now that the dataset has been submitted, we can pull it from the database and query its status.

In [7]:
client.list_collections()

collection,name,tagline
OptimizationDataset,mm optimizations,mm optimizations


In [73]:
opt_ds = client.get_collection("OptimizationDataset", "mm optimizations")

This can be buggy and may need running twice if you get an error the first time.
Note that the optimizations should now have started but can take some time with only one worker; to spin up more workers, use a local QCArchive instance.
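
Rather than rerunning a cell by hand when the first attempt fails, the call can be wrapped in a small retry helper. This is a minimal sketch in plain Python: `call_with_retry` and `flaky_status` are hypothetical names introduced here for illustration, and the stand-in function only simulates a call that fails once.

```python
import time


def call_with_retry(func, attempts=2, delay=1.0):
    """Call func(); if it raises, wait `delay` seconds and try again.

    The last exception is re-raised once `attempts` tries have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Hypothetical stand-in for a query that fails on its first call only
calls = {"count": 0}


def flaky_status():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return "ok"


outcome = call_with_retry(flaky_status, attempts=2, delay=0.0)
print(outcome, "after", calls["count"], "calls")
```

In this notebook the same helper could wrap the status query, for example `call_with_retry(lambda: opt_ds.status("parsley"))`.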

In [75]:
opt_ds.status("parsley")

status,parsley
COMPLETE,356
ERROR,3
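
The status counts above can be turned into fractions with a few lines of plain Python. The dictionary below simply copies the numbers from the output; it is a sketch for reading the result and does not query the server.

```python
# Status counts copied from the output above
status_counts = {"COMPLETE": 356, "ERROR": 3}

# The total should match the number of entries submitted earlier (359)
total = sum(status_counts.values())

for status, count in status_counts.items():
    print(f"{status:<9}{count:>4}  {count / total:.1%}")

error_rate = status_counts["ERROR"] / total
```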


In [11]:
record = opt_ds.get_record("c1cc1nc(=o)c[n@@]2c[c@h](co2)o-0", "parsley")

In [18]:
# we can load the initial and final molecule to inspect the difference in geometry
record.get_initial_molecule()

NGLWidget()

In [19]:
record.get_final_molecule()

NGLWidget()

In [25]:
dataset.dataset["c1cc1nc(=o)c[n@@]2c[c@h](co2)o-0"]

DatasetEntry(index='c1cc1nc(=o)c[n@@]2c[c@h](co2)o-0', initial_molecules=[Molecule(name='C8H14N2O3', formula='C8H14N2O3', hash='71c887a')], attributes={'canonical_explicit_hydrogen_smiles': '[H]C1(C(C1([H])N([H])C(=O)C([H])([H])N2C(C(C(O2)([H])[H])([H])O[H])([H])[H])([H])[H])[H]', 'canonical_isomeric_explicit_hydrogen_mapped_smiles': '[H:23][C@:7]1([C:4]([N@:9]([O:12][C:5]1([H:20])[H:21])[C:8]([H:24])([H:25])[C:1](=[O:11])[N:10]([H:26])[C:6]2([C:2]([C:3]2([H:16])[H:17])([H:14])[H:15])[H:22])([H:18])[H:19])[O:13][H:27]', 'canonical_isomeric_explicit_hydrogen_smiles': '[H][C@]1(C([N@](OC1([H])[H])C([H])([H])C(=O)N([H])C2(C(C2([H])[H])([H])[H])[H])([H])[H])O[H]', 'canonical_isomeric_smiles': 'C1CC1NC(=O)C[N@@]2C[C@H](CO2)O', 'canonical_smiles': 'C1CC1NC(=O)CN2CC(CO2)O', 'inchi_key': 'VODRAFCALIZTJH-SSDOTTSWSA-N', 'molecular_formula': 'C8H14N2O3', 'provenance': 'cmiles_0+unknown_openeye_2019.Apr.2', 'standard_inchi': 'InChI=1S/C8H14N2O3/c11-7-3-10(13-5-7)4-8(12)9-6-1-2-6/h6-7,11H,1-5H2,(H,

During testing, some molecules failed the optimization stage with Parsley; the failed molecules can be identified here.

In [69]:
for index in opt_ds.df.index:
    if opt_ds.df.loc[index].parsley.status.value == "ERROR":
        print(index)

c[s@](=[n-])(=o)nc1cc1-0
c[s@](=[n-])(=o)nc1cc1-1
cc1(cop(=s)(oc1)[s-])c-0
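
The failed indices printed above can be grouped by molecule to see how many distinct molecules are affected. This sketch assumes that the trailing `-N` on each index is a per-molecule entry counter (for example a conformer number); that reading is an assumption based on the index format shown here, not something confirmed in this notebook. Splitting on the last hyphen keeps charges such as `[n-]` intact.

```python
from collections import defaultdict

# Failed indices copied from the output above
failed = [
    "c[s@](=[n-])(=o)nc1cc1-0",
    "c[s@](=[n-])(=o)nc1cc1-1",
    "cc1(cop(=s)(oc1)[s-])c-0",
]

# Map each SMILES-like prefix to the list of numeric suffixes that failed
by_molecule = defaultdict(list)
for index in failed:
    # rsplit on the last "-" so hyphens inside the SMILES are preserved
    smiles, suffix = index.rsplit("-", 1)
    by_molecule[smiles].append(int(suffix))

for smiles, suffixes in by_molecule.items():
    print(f"{smiles}: {len(suffixes)} failed entries {suffixes}")
```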


In [67]:
record = opt_ds.get_record("cc1(cop(=s)(oc1)[s-])c-0", "parsley")

In [68]:
record.get_error()

