This is a guide on how to locate results in QCArchive, inspect them, and download them using QCSubmit.

To get everything you need, please create a conda environment using the following commands:

`conda create -n temp_qcsubmit -c conda-forge -c omnia/label/rc -c omnia -c openeye qcsubmit`

`conda activate temp_qcsubmit`

`pip install basis_set_exchange`
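
Before moving on, it can help to confirm that the new environment actually provides the packages used below. This is a minimal sketch using only the standard library. The helper name `check_packages` is made up for this guide, and the importable module names are assumed to match the package names in the commands above.

```python
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


# Module names assumed from the install commands above.
status = check_packages(["qcportal", "qcsubmit", "basis_set_exchange"])
for name, available in status.items():
    print(f"{name}: {'found' if available else 'MISSING'}")
```

Any package reported as missing should be installed before running the cells that follow.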

First, let's load qcportal and connect to the public client.

In [1]:
import qcportal as ptl

# this is our way of interfacing with the archive
client = ptl.FractalClient()

To look at the available datasets, use `list_collections` and pass the type we are interested in, which in this case is an optimization dataset.

In [2]:
client.list_collections("OptimizationDataset")

collection,name,tagline
OptimizationDataset,FDA Optimization Dataset 1,
OptimizationDataset,JGI Metabolite Set 1,
OptimizationDataset,Kinase Inhibitors: WBO Distributions,
OptimizationDataset,OpenFF Discrepancy Benchmark 1,
OptimizationDataset,OpenFF Ehrman Informative Optimization v0.1,
OptimizationDataset,OpenFF Ehrman Informative Optimization v0.2,
OptimizationDataset,OpenFF Full Optimization Benchmark 1,
OptimizationDataset,OpenFF Gen 2 Opt Set 1 Roche,
OptimizationDataset,OpenFF Gen 2 Opt Set 2 Coverage,
OptimizationDataset,OpenFF Gen 2 Opt Set 3 Pfizer Discrepancy,


Now we can pull out the collection we want.

In [3]:
dataset = client.get_collection("OptimizationDataset", "OpenFF Protein Fragments v1.0")

We can now look at the dataset's metadata and all of its records through the `data` attribute.

In [4]:
dataset.data

DataModel(id='284', name='OpenFF Protein Fragments v1.0', collection='optimizationdataset', provenance={'qcsubmit': '0+untagged.119.gac29b70.dirty', 'openforcefield': '0.7.0', 'openeye': '2019.Oct.2'}, tags=['openff'], tagline='Constrained optimization of various protein fragments.', description='An optimization dataset using geometric.', group='default', visibility=True, view_url_hdf5=None, view_url_plaintext=None, view_metadata=None, view_available=False, metadata={'submitter': 'joshuahorton', 'creation_date': '2020-07-06', 'collection_type': 'OptimizationDataset', 'dataset_name': 'OpenFF Protein Fragments v1.0', 'short_description': 'Constrained optimization of various protein fragments.', 'long_description_url': 'https://github.com/openforcefield/qca-dataset-submission/tree/master/2020-07-06-OpenFF-Protein-Fragments-Initial', 'long_description': 'An optimization dataset using geometric.', 'elements': ['H', 'C', 'N', 'O']}, records={'gly_ala_ser-0': OptEntry(name='gly_ala_ser-0', in

Now we can pull out a record and inspect the optimization trajectory. A record is loaded by passing its name together with the specification it was computed under, because the same collection can be computed several times with different specifications. Here we use the dataset's dataframe index to get the name of the first record.

In [56]:
record = dataset.get_record(dataset.df.index[0], "default")


To see all of the information related to the optimization, call the `dict` method. The output shows the spec used for the optimization, including the geomeTRIC settings. The number of steps needed to reach convergence is equal to the number of results in the trajectory.

In [57]:
record.dict()

{'id': '21272848',
 'hash_index': 'e32bb593fa4efdb26f0ef9aec379aa5e04b9ea7e',
 'procedure': 'optimization',
 'program': 'geometric',
 'version': 1,
 'protocols': {},
 'extras': {},
 'stdout': '24673348',
 'stderr': None,
 'error': None,
 'task_id': None,
 'manager_name': 'PacificResearchPlatform2-openff-qca2-6585bb9ff8-dw6lx-78b7b826-e963-465a-b7ec-dc6a957b59e9',
 'status': <RecordStatusEnum.complete: 'COMPLETE'>,
 'modified_on': datetime.datetime(2020, 7, 21, 19, 33, 19, 628202),
 'created_on': datetime.datetime(2020, 7, 21, 17, 45, 59, 356869),
 'provenance': {'creator': 'geomeTRIC',
  'version': '0.9.7.2',
  'routine': 'geometric.run_json.geometric_run_json',
  'cpu': 'Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz',
  'qcengine_version': 'v0.15.0',
  'username': 'qcfractal',
  'wall_time': 3608.0276279449463,
  'hostname': 'openff-qca2-6585bb9ff8-dw6lx'},
 'schema_version': 1,
 'initial_molecule': '14774456',
 'qc_spec': {'driver': <DriverEnum.gradient: 'gradient'>,
  'method': 'b3lyp-d3
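
The timing fields in this printout can be turned into something easier to read. The sketch below is only an illustration: `timing_summary` is a hypothetical helper, the values are copied from the output above, and `wall_time` is assumed to be in seconds. The gap between `created_on` and `modified_on` presumably also includes time spent waiting in the queue.

```python
import datetime


def timing_summary(record_dict):
    """Summarise how long a record took, using keys shown in the printout."""
    elapsed = record_dict["modified_on"] - record_dict["created_on"]
    wall_seconds = record_dict["provenance"]["wall_time"]  # assumed to be seconds
    return {
        "elapsed_minutes": elapsed.total_seconds() / 60,
        "wall_minutes": wall_seconds / 60,
    }


# Values copied from the printout above; a live record.dict() could be passed instead.
example = {
    "created_on": datetime.datetime(2020, 7, 21, 17, 45, 59, 356869),
    "modified_on": datetime.datetime(2020, 7, 21, 19, 33, 19, 628202),
    "provenance": {"wall_time": 3608.0276279449463},
}
summary = timing_summary(example)
print(f"elapsed: {summary['elapsed_minutes']:.1f} min, "
      f"wall time: {summary['wall_minutes']:.1f} min")
```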

If you have nglview installed and configured, you can also view the initial and final molecules, or any other molecule in the trajectory.

In [59]:
record.get_initial_molecule()

NGLWidget()

In [60]:
record.get_final_molecule()

NGLWidget()

We can also inspect the individual gradient calls that built the trajectory.

In [11]:
results = record.get_trajectory()
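
Since every gradient call in the trajectory carries an energy, one quick sanity check is to look at how the energy changes from step to step. The sketch below works on a plain list of floats. The energies are hypothetical and not taken from this record. With a live trajectory the list might be built from each result's `properties.return_energy`, but that attribute path is an assumption not shown in this guide.

```python
def energy_changes(energies):
    """Return the energy change between each pair of consecutive steps."""
    return [later - earlier for earlier, later in zip(energies, energies[1:])]


# Hypothetical energies in hartree, one per gradient call.
energies = [-1026.8901, -1026.8928, -1026.8934, -1026.8935]
changes = energy_changes(energies)
print(f"{len(energies)} steps, final change {changes[-1]:.1e} hartree")
```

A well-behaved optimization would typically show changes that shrink towards zero as it converges.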

Here we get a detailed summary of the QM calculation, including any SCF properties we requested, such as the dipole moment or bond orders, as well as a breakdown of the energy components.

In [14]:
results[0].dict()

{'id': '21274435',
 'hash_index': None,
 'procedure': 'single',
 'program': 'psi4',
 'version': 1,
 'protocols': {},
 'extras': {'qcvars': {'2-BODY DISPERSION CORRECTION ENERGY': -0.06702804,
   'B3LYP-D3(BJ) DISPERSION CORRECTION ENERGY': -0.06702804,
   'CURRENT DIPOLE X': 3.4552804858537676,
   'CURRENT DIPOLE Y': 5.55045163574899,
   'CURRENT DIPOLE Z': -1.357263630651112,
   'CURRENT ENERGY': -1026.8934551382044,
   'CURRENT REFERENCE ENERGY': -1026.8934551382044,
   'DFT FUNCTIONAL TOTAL ENERGY': -1026.8264270982045,
   'DFT TOTAL ENERGY': -1026.8934551382044,
   'DFT VV10 ENERGY': 0.0,
   'DFT XC ENERGY': -111.72796964190206,
   'DISPERSION CORRECTION ENERGY': -0.06702804,
   'NUCLEAR REPULSION ENERGY': 1548.9458971516774,
   'ONE-ELECTRON ENERGY': -4477.0952707170945,
   'PCM POLARIZATION ENERGY': 0.0,
   'PE ENERGY': 0.0,
   'SCF DIPOLE X': 3.4552804858537676,
   'SCF DIPOLE Y': 5.55045163574899,
   'SCF DIPOLE Z': -1.357263630651112,
   'SCF ITERATION ENERGY': -1026.893455138
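
As a small worked example of using these values, the sketch below copies a few entries from the printout into a plain dictionary. It checks that the functional energy plus the dispersion correction reproduces the total DFT energy, then computes the magnitude of the SCF dipole vector. The units of the dipole components are not stated in the output, so the magnitude is simply in whatever units the components use.

```python
import math

# Values copied from the printout above.
qcvars = {
    "DFT FUNCTIONAL TOTAL ENERGY": -1026.8264270982045,
    "DISPERSION CORRECTION ENERGY": -0.06702804,
    "DFT TOTAL ENERGY": -1026.8934551382044,
    "SCF DIPOLE X": 3.4552804858537676,
    "SCF DIPOLE Y": 5.55045163574899,
    "SCF DIPOLE Z": -1.357263630651112,
}

# Functional energy plus dispersion correction should reproduce the total.
recombined = (qcvars["DFT FUNCTIONAL TOTAL ENERGY"]
              + qcvars["DISPERSION CORRECTION ENERGY"])
consistent = math.isclose(recombined, qcvars["DFT TOTAL ENERGY"], abs_tol=1e-8)
print("energy components consistent:", consistent)

# Magnitude of the dipole vector, in the same units as its components.
dipole = math.sqrt(sum(qcvars[f"SCF DIPOLE {axis}"] ** 2 for axis in "XYZ"))
print(f"dipole magnitude: {dipole:.3f}")
```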

As this is a constrained optimization, we can also check that the constraints held the requested dihedrals in place, using the `measure` function on each molecule. For example, we know from the dict printout above that one of this molecule's constraints is a freeze with the following details: `{'type': 'dihedral', 'indices': [24, 26, 28, 31]}`. We can measure this dihedral in the initial and final molecules of the trajectory to confirm that it stayed fixed.

In [15]:
initial_dihedral = record.get_initial_molecule().measure([24, 26, 28, 31])
final_dihedral = record.get_final_molecule().measure([24, 26, 28, 31])
print(initial_dihedral, final_dihedral)

128.1878653554156 128.18961901433315
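
For four atom indices, `measure` reports a dihedral angle in degrees. To illustrate what that number means geometrically, the sketch below computes a dihedral from four Cartesian points in plain Python. It is not the library's implementation, the function name and test points are made up for this guide, and the sign convention may differ from that of `measure`.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z)."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    # Three consecutive bond vectors and the normals of the two planes they span.
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    # atan2 form keeps the sign and avoids the instability of acos near 0 and 180.
    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: the last atom is rotated a quarter turn about the central bond.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"{angle:.1f}")
```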


Now we can build a function that checks whether all constraints on a record are still satisfied at the end of an optimization.


In [18]:
def check_constraints(record) -> bool:
    """
    For the given record, find all constraints and ensure that they are satisfied.
    Note: only works for freeze-type constraints.
    """
    # grab the freeze constraints
    constraints = record.keywords["constraints"]["freeze"]
    # load the initial and final molecules
    initial = record.get_initial_molecule()
    final = record.get_final_molecule()
    # check each constraint in order
    for constraint in constraints:
        indices = constraint["indices"]
        # round() gives a tolerance of about half a unit (0.5 degrees for a dihedral);
        # a larger change means the constraint has not held.
        if round(initial.measure(indices) - final.measure(indices)) != 0:
            return False
    return True

Now let's apply this to the record and check all constraints.

In [20]:
check_constraints(record)

True
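
One caveat: `check_constraints` compares angles by plain subtraction, so a dihedral that sits near the ±180° boundary could be reported as broken even if it barely moved. A possible refinement is sketched below. The helper names and the 0.5° default tolerance are choices made for this example, not part of QCPortal or QCSubmit.

```python
def angle_difference(a, b):
    """Smallest signed difference a - b between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def dihedral_held(initial, final, tolerance=0.5):
    """True if two dihedrals (in degrees) agree within `tolerance` after wrapping."""
    return abs(angle_difference(initial, final)) <= tolerance


# The two values measured earlier in this guide.
print(dihedral_held(128.1878653554156, 128.18961901433315))
# A hypothetical pair straddling the boundary; plain subtraction would give about 359.8.
print(dihedral_held(179.9, -179.9))
```

This wrapping only makes sense for angular measurements. Frozen distances would still be compared directly.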

## Extracting results

Here we will look at extracting results of the collection using QCSubmit.


In [23]:
from qcsubmit.results import OptimizationCollectionResult
# here we tell qcsubmit to pull down the final molecule data only for each optimization.
opt_result = OptimizationCollectionResult.from_server(client=client, spec_name="default", dataset_name="OpenFF Protein Fragments v1.0", final_molecule_only=True)

requested molecules 576
requested results 576


The structure of `opt_result` is very similar to the archive, but it can convert to OFF molecules directly and all results are included automatically, so we can view the energy and gradient of the optimized molecule. Like molecules are also collapsed into one record with multiple entries. For example, the results class found only 16 unique molecules with a total of 576 geometries.

In [26]:
opt_result.n_molecules

16

In [27]:
opt_result.n_results

576
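
The idea of collapsing many results into a smaller number of unique molecules can be pictured as a simple grouping step. The sketch below is not QCSubmit's internal code. It only shows the pattern, using made-up identifiers and record names.

```python
from collections import defaultdict


def group_by_molecule(results):
    """Group (identifier, record_name) pairs into one list of names per identifier."""
    grouped = defaultdict(list)
    for identifier, record_name in results:
        grouped[identifier].append(record_name)
    return dict(grouped)


# Hypothetical identifiers and record names, for illustration only.
results = [
    ("mol_A", "mol_A-0"),
    ("mol_A", "mol_A-1"),
    ("mol_B", "mol_B-0"),
]
grouped = group_by_molecule(results)
n_results = sum(len(names) for names in grouped.values())
print(len(grouped), "unique molecules,", n_results, "results")
```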

Let's get the first unique molecule and check how many different starting geometries were supplied for the molecule, and therefore how many optimizations were done.

In [33]:
record = opt_result.collection["C[C@@H]([C+](N[C@@H](CO)[C+](NC)[O-])[O-])N[C+](CN[C+](C)[O-])[O-]"]
record.n_entries

28

Now we can look at the first entry and inspect the WBO/MBO (Wiberg and Mayer bond order) arrays along with the final energy and gradient.

In [37]:
record.entries[0].final_molecule

SingleResult(molecule=Molecule(name='C11H20N4O5', formula='C11H20N4O5', hash='d10aa5e'), wbo=array([[0.0, 0.8986590363529622, 0.02970844171073566, ...,
        7.14912016419347e-11, 6.715223644015865e-10,
        1.2862760642511557e-10],
       [0.8986590363529622, 0.0, 0.8964714191799869, ...,
        5.454171193756835e-10, 4.485318804841463e-09,
        8.144458504935133e-10],
       [0.02970844171073566, 0.8964714191799869, 0.0, ...,
        1.9347711711421823e-11, 1.5660045827796298e-10,
        3.110923232490498e-11],
       ...,
       [7.14912016419347e-11, 5.454171193756835e-10,
        1.9347711711421823e-11, ..., 0.0, 0.03036847657652554,
        0.02910618699794112],
       [6.715223644015865e-10, 4.485318804841463e-09,
        1.5660045827796298e-10, ..., 0.03036847657652554, 0.0,
        0.029275168003054557],
       [1.2862760642511557e-10, 8.144458504935133e-10,
        3.110923232490498e-11, ..., 0.02910618699794112,
        0.029275168003054557, 0.0]], dtype=object), m
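
A bond-order matrix like this is symmetric, with one row and column per atom, so a common way to read it is to list the atom pairs whose bond order exceeds some cutoff. The sketch below uses the top-left 3×3 block of the printout as a nested list. The helper name and the 0.5 threshold are arbitrary choices for illustration.

```python
def bonded_pairs(bond_orders, threshold=0.5):
    """Return (i, j, order) for each atom pair i < j whose bond order exceeds threshold."""
    pairs = []
    for i, row in enumerate(bond_orders):
        # Only scan the upper triangle, since the matrix is symmetric.
        for j in range(i + 1, len(row)):
            if row[j] > threshold:
                pairs.append((i, j, row[j]))
    return pairs


# Top-left 3x3 block of the matrix printed above.
wbo = [
    [0.0, 0.8986590363529622, 0.02970844171073566],
    [0.8986590363529622, 0.0, 0.8964714191799869],
    [0.02970844171073566, 0.8964714191799869, 0.0],
]
pairs = bonded_pairs(wbo)
for i, j, order in pairs:
    print(f"atoms {i}-{j}: bond order {order:.3f}")
```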

We can also get the final geometry as an OFF molecule.

In [45]:
off_mol = record.entries[0].get_final_molecule()
off_mol

NGLWidget()

In this format we have the ability to write the molecule to file or use any of the other toolkit methods.

In [52]:
off_mol.to_file("pro1.pdb", "pdb")