# Using QCArchive with the OpenFF Toolkit

Here we show how to create OpenFF molecules safely from data in the QCArchive using the CMILES entries. This transformation relies on the `"canonical_isomeric_explicit_hydrogen_mapped_smiles"` entry, which fixes the atom ordering to match the ordering used in the database.
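
In a mapped SMILES, every atom (hydrogens included) carries an atom-map number, and the toolkit's `Molecule.from_mapped_smiles` places the atom with map number n at index n - 1. The standard-library sketch below illustrates only that ordering rule. `atom_order_from_mapped_smiles` is a hypothetical helper, and its regular expression is a simplification that assumes every atom is written in brackets with a map number, as in a fully mapped explicit-hydrogen SMILES.

```python
import re
from typing import List

# Bracket atom with an atom-map number, e.g. "[C:1]", "[C@@H:3]", "[Cl:4]".
# Simplified: captures the element symbol and the map index; isotope digits
# are skipped and other bracket contents (chirality, H counts, charges) ignored.
MAPPED_ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]:]*:(\d+)\]")


def atom_order_from_mapped_smiles(mapped_smiles: str) -> List[str]:
    """Return element symbols ordered by atom-map index (map n -> index n - 1)."""
    pairs = [(int(index), symbol) for symbol, index in MAPPED_ATOM.findall(mapped_smiles)]
    indices = sorted(index for index, _ in pairs)
    if indices != list(range(1, len(pairs) + 1)):
        raise ValueError("map indices must be exactly 1..N with no gaps or repeats")
    ordered = [""] * len(pairs)
    for index, symbol in pairs:
        ordered[index - 1] = symbol
    return ordered


# Methanol with explicit, mapped hydrogens: map numbers 1 and 2 put C and O first.
print(atom_order_from_mapped_smiles("[H:3][C:1]([H:4])([H:5])[O:2][H:6]"))
# ['C', 'O', 'H', 'H', 'H', 'H']
```

The position of an atom in the string does not matter; only its map number decides where it lands, which is what lets the toolkit reproduce the database ordering.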

First, load the client you wish to connect to; in this case, we use the public instance.

In [1]:
import qcelemental
import qcportal

from openff.toolkit import Molecule

client = qcportal.PortalClient("https://api.qcarchive.molssi.org:443")

client.list_datasets()

[{'id': 35,
  'dataset_type': 'torsiondrive',
  'dataset_name': 'OpenFF Fragmenter Phenyl Benchmark',
  'record_count': 454},
 {'id': 36,
  'dataset_type': 'torsiondrive',
  'dataset_name': 'OpenFF Group1 Torsions',
  'record_count': 820},
 {'id': 41,
  'dataset_type': 'optimization',
  'dataset_name': 'OpenFF Optimization Set 1',
  'record_count': 937},
 {'id': 42,
  'dataset_type': 'torsiondrive',
  'dataset_name': 'Fragment Stability Benchmark',
  'record_count': 86},
 {'id': 43,
  'dataset_type': 'optimization',
  'dataset_name': 'SMIRNOFF Coverage Set 1',
  'record_count': 1132},
 {'id': 45,
  'dataset_type': 'optimization',
  'dataset_name': 'OpenFF VEHICLe Set 1',
  'record_count': 25500},
 {'id': 48,
  'dataset_type': 'torsiondrive',
  'dataset_name': 'SMIRNOFF Coverage Torsion Set 1',
  'record_count': 585},
 {'id': 49,
  'dataset_type': 'optimization',
  'dataset_name': 'OpenFF NCI250K Boron 1',
  'record_count': 189},
 {'id': 50,
  'dataset_type': 'optimization',
  ...
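
`client.list_datasets()` returns plain Python dictionaries with the keys shown above, so ordinary list processing is enough to narrow the catalogue down, for example to the optimization datasets used next. A sketch on a few records copied from the output; with a live client you would pass `client.list_datasets()` instead, and `names_of_type` is a hypothetical helper.

```python
from collections import Counter

# A few records copied from the list_datasets() output above.
datasets = [
    {"id": 35, "dataset_type": "torsiondrive", "dataset_name": "OpenFF Fragmenter Phenyl Benchmark", "record_count": 454},
    {"id": 36, "dataset_type": "torsiondrive", "dataset_name": "OpenFF Group1 Torsions", "record_count": 820},
    {"id": 41, "dataset_type": "optimization", "dataset_name": "OpenFF Optimization Set 1", "record_count": 937},
    {"id": 43, "dataset_type": "optimization", "dataset_name": "SMIRNOFF Coverage Set 1", "record_count": 1132},
]


def names_of_type(records, dataset_type):
    """Return the names of all datasets of the given type."""
    return [record["dataset_name"] for record in records if record["dataset_type"] == dataset_type]


print(names_of_type(datasets, "optimization"))
# Number of datasets per type
print(Counter(record["dataset_type"] for record in datasets))
```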

Data in the QCArchive is organized into [datasets](https://molssi.github.io/QCFractal/user_guide/datasets.html#using-datasets). Now let us grab a molecule from an optimization dataset.

In [2]:
dataset = client.get_dataset(
    dataset_type="optimization",
    dataset_name="Kinase Inhibitors: WBO Distributions",
)

Take an arbitrary entry from the dataset.

In [3]:
entry = dataset.get_entry(entry_name=dataset.entry_names[-1])

We can view the entry in detail by looking at the dictionary representation.

In [4]:
entry.dict()

{'name': 'c1cc(c(cc1f)[c@h]2ccc[n@@]2c3ccn4c(n3)c(cn4)nc(=o)[n@@]5cc[c@@h](c5)o)f-77',
 'initial_molecule': {'schema_name': 'qcschema_molecule',
  'schema_version': 2,
  'validated': True,
  'symbols': array(['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',
         'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'N', 'N', 'N', 'N', 'N',
         'N', 'O', 'O', 'F', 'F', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
         'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
         'H'], dtype='<U1'),
  'geometry': array([[ -3.58885944,  -7.23090986,  -7.2551731 ],
         [ -1.59367236,  -5.53173037,  -7.53744682],
         [  6.1103569 , -11.50984081,  -1.98214119],
         [ -0.63241771, -10.70971649,  -7.29841514],
         [  2.96313464,  -7.14609281,   4.48874654],
         [  5.68603826, -11.62984009,   0.50333312],
         [  1.36285019,  -9.01051898,  -7.58079293],
         [  2.83086228,  -5.82940284,   2.19505472],
         [ -3.1082523 ,  -9.81994888, ...
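
Each entry in `symbols` pairs with the row of `geometry` at the same index, so the symbol list alone is enough to recover the molecular formula. A minimal sketch of the Hill convention (carbon first, then hydrogen, then the remaining elements alphabetically; all elements alphabetically when there is no carbon). `hill_formula` is a hypothetical helper, not the toolkit's `Molecule.to_hill_formula`.

```python
from collections import Counter


def hill_formula(symbols):
    """Build a Hill-system formula string from a sequence of element symbols."""
    counts = Counter(symbols)
    if "C" in counts:
        # Carbon first, hydrogen second (if present), everything else alphabetical.
        first = ["C"] + (["H"] if "H" in counts else [])
        order = first + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    # Counts of 1 are omitted, as is conventional.
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


# Element tallies taken from the 53-entry symbols array above.
symbols = ["C"] * 21 + ["N"] * 6 + ["O"] * 2 + ["F"] * 2 + ["H"] * 22
print(hill_formula(symbols))  # C21H22F2N6O2
```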

Now we can make a molecule using a few different input options.

In [5]:
# first, make a molecule using this entry object
molecule_from_entry = Molecule.from_qcschema(entry)

# we could have also used the dictionary representation of the object
molecule_from_dict = Molecule.from_qcschema(entry.dict())

assert molecule_from_entry == molecule_from_dict

molecule = molecule_from_entry
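
The assertion passes because an entry object and its `.dict()` form describe the same data. One common way for a function to accept either form is to normalize its input to a dictionary first. The sketch below shows that pattern only, assuming the object exposes a `.dict()` method as the entry above does; `as_record_dict` and `FakeEntry` are hypothetical names, not part of the toolkit or QCPortal.

```python
from typing import Any, Dict


def as_record_dict(record: Any) -> Dict[str, Any]:
    """Return a plain dict, whether given a dict or an object with a .dict() method."""
    if isinstance(record, dict):
        return record
    to_dict = getattr(record, "dict", None)
    if callable(to_dict):
        return to_dict()
    raise TypeError(f"cannot convert {type(record).__name__} to a dict")


class FakeEntry:
    """Hypothetical stand-in for a QCPortal dataset entry."""

    def __init__(self, name: str) -> None:
        self.name = name

    def dict(self) -> Dict[str, Any]:
        return {"name": self.name}


entry_like = FakeEntry("example-entry")
assert as_record_dict(entry_like) == as_record_dict(entry_like.dict())
```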

In [6]:
# check that the molecule has been ordered to match the ordering used in the database
# by comparing the atomic numbers of both objects in order

# first, let's get the initial molecule from the database
# note that this molecule uses an object model from QCArchive, _not_ the toolkit
initial_molecule: qcelemental.models.Molecule = [
    *client.query_molecules(molecule_hash=entry.initial_molecule.get_hash())
][0]

for atoms in zip(molecule.atoms, initial_molecule.atomic_numbers):
    assert atoms[0].atomic_number == atoms[1]

# can compare other things, too
print(molecule.to_hill_formula(), initial_molecule.get_molecular_formula())

# QCArchive molecules don't store all the information the toolkit needs, such as stereochemistry;
# that's why the Molecule.from_qcschema() method exists at all

C21H22F2N6O2 C21F2H22N6O2
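
The two strings describe the same composition and differ only in element order: the toolkit's Hill formula puts hydrogen right after carbon, while the QCElemental string here is alphabetical. Comparing element counts instead of raw strings makes the check independent of ordering. A minimal sketch; `formula_counts` is a hypothetical helper that only handles flat formulas without parentheses or charges.

```python
import re
from collections import Counter


def formula_counts(formula):
    """Parse a flat formula such as 'C21H22F2N6O2' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


toolkit_formula = "C21H22F2N6O2"
qcelemental_formula = "C21F2H22N6O2"
assert formula_counts(toolkit_formula) == formula_counts(qcelemental_formula)
print(dict(formula_counts(toolkit_formula)))
```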


In [7]:
# we can also compare the graph representations of the molecules to make sure they are in the same order
import networkx as nx

# make a graph of the initial molecule using networkx and the data in the record
initial_network = nx.Graph()
for index, atomic_number in enumerate(initial_molecule.atomic_numbers):
    initial_network.add_node(index, atomic_number=atomic_number)

for bond in initial_molecule.connectivity:
    initial_network.add_edge(*bond[:2])

# now we can use the isomorphism check to get the atom mapping
isomorphic, atom_map = Molecule.are_isomorphic(
    molecule,
    initial_network,
    return_atom_map=True,
    aromatic_matching=False,
    formal_charge_matching=False,
    bond_order_matching=False,
    bond_stereochemistry_matching=False,
    atom_stereochemistry_matching=False,
)

# we can check if the graph was found to be isomorphic and whether or not the
# atom mappings are in the same order
assert isomorphic

for index1, index2 in atom_map.items():
    assert index1 == index2
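
Because the atom map found above is the identity, the same conclusion can also be checked without an isomorphism search: two molecules share an atom ordering if their sequences of atomic numbers match and their bond sets match index for index. A pure-Python sketch; `same_atom_order` is a hypothetical helper, and like the `bond[:2]` slice above it assumes each connectivity entry begins with the two atom indices (QCElemental stores `(atom1, atom2, bond_order)` triples). This check is stricter than isomorphism: it fails for any reordering, even a chemically valid one.

```python
def bond_set(bonds):
    """Undirected bonds as a set of frozensets of the two atom indices."""
    return {frozenset(bond[:2]) for bond in bonds}


def same_atom_order(atomic_numbers_a, bonds_a, atomic_numbers_b, bonds_b):
    """True if both molecules list the same elements in the same order with identical bonds."""
    return list(atomic_numbers_a) == list(atomic_numbers_b) and bond_set(bonds_a) == bond_set(bonds_b)


# Water written twice: once with (i, j) pairs, once with (j, i, order) triples.
print(same_atom_order([8, 1, 1], [(0, 1), (0, 2)], [8, 1, 1], [(1, 0, 1.0), (2, 0, 1.0)]))  # True
# Same molecule, but a hydrogen listed first: a different atom order.
print(same_atom_order([8, 1, 1], [(0, 1), (0, 2)], [1, 8, 1], [(1, 0), (1, 2)]))  # False
```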

Now that we have seen how to make the molecule, let's also get its geometry, since so far it has none.

`Molecule.from_qcschema` builds the molecule from the entry's mapped SMILES, which carries no coordinates, so the geometry has to be pulled down from the server. Since we didn't pass the `client` object to `Molecule.from_qcschema` earlier, the `Molecule` objects we created don't have conformers. (QCArchive's use of "geometry" can be thought of as interchangeable with the toolkit's use of "conformer".)
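
One unit detail matters when moving coordinates between the two object models: QCSchema (and therefore QCElemental) stores `geometry` in bohr, while toolkit conformers carry explicit units and are usually expressed in angstrom. The conversion is a single scale factor. A sketch on the first two rows of the geometry printed earlier, using the CODATA 2018 Bohr radius; `qcelemental.constants.conversion_factor("bohr", "angstrom")` should provide the same factor, and `bohr_to_angstrom` is a hypothetical helper.

```python
# CODATA 2018 Bohr radius in angstrom.
BOHR_TO_ANGSTROM = 0.529177210903


def bohr_to_angstrom(geometry):
    """Scale an N x 3 list of coordinates from bohr to angstrom."""
    return [[coordinate * BOHR_TO_ANGSTROM for coordinate in row] for row in geometry]


# First two rows of initial_molecule.geometry from entry.dict() above (bohr).
rows_bohr = [
    [-3.58885944, -7.23090986, -7.2551731],
    [-1.59367236, -5.53173037, -7.53744682],
]
for row in bohr_to_angstrom(rows_bohr):
    print([round(value, 4) for value in row])
```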

In [8]:
# the molecule was built without any geometry, so there are no conformers yet
assert molecule.n_conformers == 0

# if we also want the input geometry for the molecule, we just need to pass the relevant client instance
molecule_with_conformer = Molecule.from_qcschema(entry, client=client)

# check that there is a conformer
assert molecule_with_conformer.n_conformers == 1

In [9]:
# OpenFF Toolkit `Molecule` objects can be converted back into QCArchive molecules, as long as they have at least one conformer
from openff.toolkit.utils.exceptions import InvalidConformerError

try:
    qc_molecule = molecule.to_qcschema()
except InvalidConformerError:
    qc_molecule = molecule_with_conformer.to_qcschema()

qc_molecule

This transformation unlocks the functionality of [QCEngine](https://molssi.github.io/QCEngine/) (computing energies, gradients, Hessians, etc. with a variety of methods).

Here we will try to compute the energy using RDKit (only run this cell if QCEngine and RDKit are installed).
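
The cell below needs both QCEngine and RDKit, since QCEngine's `"rdkit"` program runs the UFF calculation through RDKit. One way to check for both before running it, using only the standard library (`have_modules` is a hypothetical helper):

```python
import importlib.util


def have_modules(*names):
    """True if every named top-level module can be found in this environment."""
    return all(importlib.util.find_spec(name) is not None for name in names)


can_run_rdkit_energy = have_modules("qcengine", "rdkit")
print("QCEngine and RDKit available:", can_run_rdkit_energy)
```

`find_spec` only locates the module without importing it, so the check is cheap and has no side effects.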

In [10]:
import qcengine

# set up the RDKit task
rdkit_task = {
    "schema_name": "qcschema_input",
    "schema_version": 2,
    "molecule": qc_molecule,
    "driver": "energy",
    "model": {"method": "uff", "basis": None},
    "keywords": {"scf_type": "df"},
}

# now let's compute the energy using QCEngine and RDKit
result = qcengine.compute(rdkit_task, "rdkit")

In [11]:
# note the result is in atomic units (hartrees)
print(result.return_result)

0.05583595251732078
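
The value is easier to interpret in molar units. A sketch converting the printed energy with CODATA 2018 factors; QCElemental should expose the same numbers via `qcelemental.constants.conversion_factor("hartree", "kcal/mol")`.

```python
# CODATA 2018 conversion factors from hartree (per particle) to molar energies.
HARTREE_TO_KCAL_PER_MOL = 627.5094740631
HARTREE_TO_KJ_PER_MOL = 2625.4996394799

energy_hartree = 0.05583595251732078  # UFF energy printed above

energy_kcal_per_mol = energy_hartree * HARTREE_TO_KCAL_PER_MOL
energy_kj_per_mol = energy_hartree * HARTREE_TO_KJ_PER_MOL
print(f"{energy_kcal_per_mol:.3f} kcal/mol, {energy_kj_per_mol:.3f} kJ/mol")
```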
