# Using QCArchive with the OpenFF Toolkit

Here we show how to create OpenFF molecules safely from data in the QCArchive using the CMILES entries. The conversion relies on the `"canonical_isomeric_explicit_hydrogen_mapped_smiles"` attribute, which lets the toolkit build the molecule with the same atom ordering used in the archive.
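
To make the role of the mapped SMILES more concrete, here is a minimal, standard-library-only sketch of how atom map indices can define an atom ordering. It is not how the toolkit parses SMILES (that is done by a full cheminformatics backend); the regular expression, the helper `symbols_in_map_order`, and the methanol string are all illustrative assumptions.

```python
import re

# Matches a bracket atom that carries an atom-map index, e.g. [C:1], [C@@H:7], [nH:3].
# Group 1 is the element symbol, group 2 is the map index.
MAPPED_ATOM = re.compile(r"\[\d*(se|as|[A-Z][a-z]?|[bcnops])[^\]:]*:(\d+)\]")


def symbols_in_map_order(mapped_smiles: str) -> list[str]:
    """Return element symbols sorted by their atom-map index."""
    atoms = [
        (int(index), symbol.capitalize())
        for symbol, index in MAPPED_ATOM.findall(mapped_smiles)
    ]
    return [symbol for _, symbol in sorted(atoms)]


# Hypothetical mapped SMILES for methanol (not taken from the archive).
# The hydrogens appear first in the string, but the map indices put C and O first.
methanol = "[H:3][C:1]([H:4])([H:5])[O:2][H:6]"
print(symbols_in_map_order(methanol))
```

The order of atoms in the string does not matter; only the map indices do, which is why the resulting molecule can line up with the stored geometry.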

First, create a client for the server you wish to connect to; in this case, we use the public instance.

In [1]:
import qcportal

from openff.toolkit import Molecule

client = qcportal.PortalClient("https://api.qcarchive.molssi.org:443")

print(client.list_datasets_table())

  id  type              name
----  ----------------  ----------------------------------------------------------------------------------
  35  torsiondrive      OpenFF Fragmenter Phenyl Benchmark
  36  torsiondrive      OpenFF Group1 Torsions
  41  optimization      OpenFF Optimization Set 1
  42  torsiondrive      Fragment Stability Benchmark
  43  optimization      SMIRNOFF Coverage Set 1
  45  optimization      OpenFF VEHICLe Set 1
  48  torsiondrive      SMIRNOFF Coverage Torsion Set 1
  49  optimization      OpenFF NCI250K Boron 1
  50  optimization      OpenFF Discrepancy Benchmark 1
  57  torsiondrive      OpenFF Substituted Phenyl Set 1
  68  optimization      Pfizer Discrepancy Optimization Dataset 1
  69  optimization      FDA Optimization Dataset 1
  70  torsiondrive      Pfizer Discrepancy Torsion Dataset 1
  71  gridoptimization  OpenFF Trivalent Nitrogen Set 1 (deprecated)
 148  reaction          AlkIsod14
 149  reaction          BHPERI26
 151  singlepoint       OpenFF Opt

Data in the QCArchive is organized into [datasets](https://molssi.github.io/QCFractal/user_guide/datasets.html#using-datasets). Let's grab a molecule from an optimization dataset.

In [2]:
dataset = client.get_dataset(
    dataset_type="optimization",
    dataset_name="Kinase Inhibitors: WBO Distributions",
)

Take an arbitrary entry from the dataset.

In [3]:
entry = dataset.get_entry(entry_name=dataset.entry_names[-1])

We can view the entry in detail by looking at the dictionary representation.

In [4]:
entry.dict()

{'name': 'c1cc(c(cc1f)[c@h]2ccc[n@@]2c3ccn4c(n3)c(cn4)nc(=o)[n@@]5cc[c@@h](c5)o)f-77',
 'initial_molecule': {'schema_name': 'qcschema_molecule',
  'schema_version': 2,
  'validated': True,
  'symbols': array(['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',
         'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'N', 'N', 'N', 'N', 'N',
         'N', 'O', 'O', 'F', 'F', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
         'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
         'H'], dtype='<U1'),
  'geometry': array([[ -3.58885944,  -7.23090986,  -7.2551731 ],
         [ -1.59367236,  -5.53173037,  -7.53744682],
         [  6.1103569 , -11.50984081,  -1.98214119],
         [ -0.63241771, -10.70971649,  -7.29841514],
         [  2.96313464,  -7.14609281,   4.48874654],
         [  5.68603826, -11.62984009,   0.50333312],
         [  1.36285019,  -9.01051898,  -7.58079293],
         [  2.83086228,  -5.82940284,   2.19505472],
         [ -3.1082523 ,  -9.81994888,  

Now we can make a molecule using a few different input options.

In [5]:
# first make a molecule using the entry object
molecule_from_entry = Molecule.from_qcschema(entry)

# we could have also used the dictionary representation of the object
molecule_from_dict = Molecule.from_qcschema(entry.dict())

assert molecule_from_entry == molecule_from_dict

molecule = molecule_from_entry

In [6]:
# first let's get the initial molecule from the database
initial_molecule = client.get_molecules(entry.initial_molecule.id)

# note that this molecule uses an object model from QCArchive, _not_ the toolkit
print(type(initial_molecule))

# we check that the molecule has been ordered to match the ordering used in the database
# by printing out the atomic numbers of both objects in order

for atoms in zip(molecule.atoms, initial_molecule.atomic_numbers):
    print(atoms[0].atomic_number, atoms[1])
    assert atoms[0].atomic_number == atoms[1]

# can compare other things, too
print(molecule.to_hill_formula(), initial_molecule.get_molecular_formula())

# QCArchive molecules don't store all information the
# toolkit needs, like bond orders and formal charges;
# that's why there is a Molecule.from_qcschema() method at all

<class 'qcelemental.models.molecule.Molecule'>
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
6 6
7 7
7 7
7 7
7 7
7 7
7 7
8 8
8 8
9 9
9 9
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
C21H22F2N6O2 C21F2H22N6O2
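
The two formula strings printed last differ only in element ordering: the first follows the Hill convention (carbon, then hydrogen, then the rest alphabetically), while the second lists every element alphabetically. A small sketch of an order-independent comparison follows; `formula_counts` is a hypothetical helper written for this illustration, and it only handles flat formulas without parentheses or charges.

```python
import re
from collections import Counter


def formula_counts(formula: str) -> Counter:
    """Parse a flat formula such as 'C21H22F2N6O2' into element counts."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number means a single atom of that element.
        counts[symbol] += int(number) if number else 1
    return counts


hill = "C21H22F2N6O2"          # printed by molecule.to_hill_formula()
alphabetical = "C21F2H22N6O2"  # printed by initial_molecule.get_molecular_formula()

# The strings differ, but the compositions are identical.
assert hill != alphabetical
assert formula_counts(hill) == formula_counts(alphabetical)
```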


In [7]:
# we can also compare the graph representations of the molecules to make sure they are in the same order
import networkx as nx

# make a graph of the initial molecule using networkx and the data in the record
initial_network = nx.Graph()
for index, atomic_number in enumerate(initial_molecule.atomic_numbers):
    initial_network.add_node(index, atomic_number=atomic_number)

for bond in initial_molecule.connectivity:
    initial_network.add_edge(*bond[:2])
# now we can use the toolkit's isomorphism check to get the atom mapping
isomorphic, atom_map = Molecule.are_isomorphic(
    molecule,
    initial_network,
    return_atom_map=True,
    aromatic_matching=False,
    formal_charge_matching=False,
    bond_order_matching=False,
    bond_stereochemistry_matching=False,
    atom_stereochemistry_matching=False,
)

# we can check if the graph was found to be isomorphic and whether or not the
# atom mappings are in the same order
assert isomorphic
print(atom_map)
for index1, index2 in atom_map.items():
    assert index1 == index2

{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20, 21: 21, 22: 22, 23: 23, 24: 24, 25: 25, 26: 26, 27: 27, 28: 28, 29: 29, 30: 30, 31: 31, 32: 32, 33: 33, 34: 34, 35: 35, 36: 36, 37: 37, 38: 38, 39: 39, 40: 40, 41: 41, 42: 42, 43: 43, 44: 44, 45: 45, 46: 46, 47: 47, 48: 48, 49: 49, 50: 50, 51: 51, 52: 52}
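
Since the expected result is the identity mapping, the same property can also be checked directly, without a graph-isomorphism search: the atomic numbers must agree index by index and the sets of bonded index pairs must be equal. The sketch below uses toy data for water rather than the archive molecule, and `same_ordering` is a hypothetical helper, not part of the toolkit or QCElemental.

```python
def same_ordering(numbers_a, bonds_a, numbers_b, bonds_b) -> bool:
    """True if both molecules list the same atoms and bonds at the same indices."""
    # Treat each bond as an unordered pair so that (0, 1) and (1, 0) compare equal.
    edges_a = {frozenset(bond[:2]) for bond in bonds_a}
    edges_b = {frozenset(bond[:2]) for bond in bonds_b}
    return list(numbers_a) == list(numbers_b) and edges_a == edges_b


# Toy data for water: O at index 0, H at indices 1 and 2.
# Bonds are (i, j, order) tuples, the layout that bond[:2] unpacks in the cell above.
water_numbers = [8, 1, 1]
water_bonds = [(0, 1, 1.0), (0, 2, 1.0)]

# Same molecule with bonds listed in a different order and direction: still a match.
assert same_ordering(water_numbers, water_bonds, water_numbers, [(2, 0, 1.0), (1, 0, 1.0)])

# Atoms reordered (H first): the indices no longer line up.
assert not same_ordering(water_numbers, water_bonds, [1, 8, 1], [(0, 1, 1.0), (1, 2, 1.0)])
```

This check is stricter than isomorphism: it fails for molecules that are the same graph but indexed differently, which is exactly the case the ordering test is meant to catch.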


In [8]:
mol = Molecule.from_qcschema(entry)
mol



NGLWidget()

In [9]:
# OpenFF Toolkit `Molecule` objects can be converted back into QCArchive molecules,
# as long as there are conformer(s)

qc_molecule = molecule.to_qcschema()

qc_molecule

NGLWidget()

This transformation unlocks the functionality of [QCEngine](https://molssi.github.io/QCEngine/) (computing energies, gradients, Hessians, etc. with a variety of different methods).

Here we will try to compute the energy using RDKit. Only run this cell if QCEngine is installed.

In [10]:
import qcengine

# set up the RDKit task
rdkit_task = {
    "schema_name": "qcschema_input",
    "schema_version": 2,
    "molecule": qc_molecule,
    "driver": "energy",
    "model": {"method": "uff", "basis": None},
    "keywords": {"scf_type": "df"},
}

# now let's compute the energy using qcengine and RDKit and print the result
result = qcengine.compute(rdkit_task, "rdkit")

In [11]:
# note that the result is in hartree, the atomic unit of energy
print(result.return_result)

0.05583595251732078
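
Force-field energies are more commonly quoted in kcal/mol than in hartree. The following sketch converts the value printed above using a hard-coded conversion factor; the constant and helper names are illustrative assumptions, and a units library could be used in practice instead of a literal constant.

```python
# 1 hartree expressed in kcal/mol (CODATA 2018 value).
HARTREE_TO_KCAL_PER_MOL = 627.5094740631


def hartree_to_kcal_per_mol(energy_hartree: float) -> float:
    """Convert an energy from hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_PER_MOL


# The UFF energy printed by the previous cell.
uff_energy_hartree = 0.05583595251732078
uff_energy_kcal = hartree_to_kcal_per_mol(uff_energy_hartree)
print(f"{uff_energy_kcal:.2f} kcal/mol")
```

For this molecule the UFF energy comes out at roughly 35 kcal/mol.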
