Metal binding data, trajectories, and the start of stereochemistry (#706)

* Small change, mostly to redox - only calculate reduction potential vs. SHE (conversions to other electrodes can happen on front-end)

* Small type bugfix

* Test changes to reflect changes to redox

* Added InChI and InChI-key to MoleculeMetadata (for searching, mostly)

* Now have a way to extract optimization trajectories from Q-Chem Task docs

* API endpoint for geometry optimization trajectories

* We really are adding new features at this point - starting work on BindingDoc for metal binding properties

* Draft binding document. Now the fun part: builders and tests!

* Realized that it makes more sense to have different sub-docs within MetalBindingDoc, one for each metal in the molecule

* Beginning work on metal binding builder. This one will be... complicated. Lots of moving parts

* Progress on binding builder

* Full draft of the binding builder???

* Small tweaks

* Small tweak

* Enable multiple metal binding methods

* Metal binding should be working; just need tests

* Small fix to thermo

* Trying to see if I can speed up the extremely slow association builder

* Moving InChI from MoleculeMetadata to MoleculeDoc, where it really matters

* Small bugfix

* Some bugfixes with metal binding and summary

* New tests and test files and all that

* Tests pass; let's go

* Beginning of metal_binding API endpoint

* Flipped the sign of the binding energy/enthalpy/entropy/free energy

* Tests pass

* Draft query operator

* All looks good; just need a test for new query operator

* API tests pass

* Getting rid of some lint

* Shut up, mypy!

* More mypy suggestions. These ones are, admittedly, less bad

* Once more for the road

* Can we please get rid of mypy

* PLEASE

* Missed something

* Just ignoring everything

* More lint

* It's taking me longer to get type checks to pass than it did for me to write this code

* Lint

* Ahhhh mypy stop!

* Also this

* Did the creator of mypy hate us all?

* Union

* ThermoDocs with corrections weren't being validated with the way that level of theory was treated previously

* Was never passing kwargs along during building... this might explain something about builder memory usage

* Long story, but basically, this should resolve a long-standing and mysterious issue that I've had with mrun

* Index mismatch

* Fix solvent synonyms

* Small change requested by Orion

* Small tweaks to builder comments

* Small change to NBO bonding

* bugfix on metal bonding

* Another bugfix; NBO metal bonding detection should work now?

* One more small bugfix

* Now use pymatgen graph hashing; also fix hashing to use undirected graph (small change, but major impact)

* Update test

* Revert test

* Fix tests

* Seems I missed one test

* Include hash (and SMILES, cause it was already there) at the task level

* Ready to build molecules (assoc and collection) based on hashes

* Bugfix

* Fix linting

* Linting JSON files, because that's apparently a good use of our time

* Trying to make linter happy with black

* Pre-commit, don't fail me now

* mypy, my bitter enemy

* Man, I hate setting up these builder tests

* Linter fix

* Whoops, forgot to undo a change that I made for internal testing
espottesmith committed Jun 16, 2023
1 parent c7ae50e commit 1774f3c
Showing 62 changed files with 2,141 additions and 593 deletions.
5 changes: 5 additions & 0 deletions emmet-api/emmet/api/core/documentation.py
@@ -293,6 +293,11 @@
"description": "Route for molecular bonding data. See the `MoleculeBondingDoc` schema for a full list \
of fields returned by this route.",
},
{
"name": "Molecules Metal Binding",
"description": "Route for data regarding metal binding to molecules. See the `MetalBindingDoc` schema \
for a full list of fields returned by this route.",
},
{
"name": "Molecules Orbitals",
"description": "Route for molecular orbital information obtained via Natural Bonding Orbital analysis. \
Empty file.
130 changes: 130 additions & 0 deletions emmet-api/emmet/api/routes/molecules/metal_binding/query_operators.py
@@ -0,0 +1,130 @@
from typing import Any, Dict, Optional, Union
from fastapi import Query
from maggma.api.query_operator import QueryOperator
from maggma.api.utils import STORE_PARAMS


class BindingDataQuery(QueryOperator):
    """
    Method to generate a query on binding data.
    """

    def query(
        self,
        metal_element: Optional[str] = Query(
            None,
            description="Element symbol for coordinated metal, e.g. 'Li' for lithium or 'Mg' for magnesium",
        ),
        min_metal_partial_charge: Optional[float] = Query(
            None, description="Minimum metal partial charge."
        ),
        max_metal_partial_charge: Optional[float] = Query(
            None, description="Maximum metal partial charge."
        ),
        min_metal_partial_spin: Optional[float] = Query(
            None,
            description="Minimum metal partial spin (only meaningful for open-shell systems).",
        ),
        max_metal_partial_spin: Optional[float] = Query(
            None,
            description="Maximum metal partial spin (only meaningful for open-shell systems).",
        ),
        min_metal_assigned_charge: Optional[float] = Query(
            None,
            description="Minimum charge of the metal, determined by analyzing partial charges/spins.",
        ),
        max_metal_assigned_charge: Optional[float] = Query(
            None,
            description="Maximum charge of the metal, determined by analyzing partial charges/spins.",
        ),
        min_metal_assigned_spin: Optional[Union[int, float]] = Query(
            None,
            description="Minimum spin multiplicity of the metal, determined by analyzing partial spins.",
        ),
        max_metal_assigned_spin: Optional[Union[int, float]] = Query(
            None,
            description="Maximum spin multiplicity of the metal, determined by analyzing partial spins.",
        ),
        min_number_coordinate_bonds: Optional[int] = Query(
            None, description="Minimum number of atoms coordinated to the metal."
        ),
        max_number_coordinate_bonds: Optional[int] = Query(
            None, description="Maximum number of atoms coordinated to the metal."
        ),
        min_binding_energy: Optional[float] = Query(
            None, description="Minimum binding electronic energy (units: eV)"
        ),
        max_binding_energy: Optional[float] = Query(
            None, description="Maximum binding electronic energy (units: eV)"
        ),
        min_binding_enthalpy: Optional[float] = Query(
            None, description="Minimum binding enthalpy (units: eV)"
        ),
        max_binding_enthalpy: Optional[float] = Query(
            None, description="Maximum binding enthalpy (units: eV)"
        ),
        min_binding_entropy: Optional[float] = Query(
            None, description="Minimum binding entropy (units: eV/K)"
        ),
        max_binding_entropy: Optional[float] = Query(
            None, description="Maximum binding entropy (units: eV/K)"
        ),
        min_binding_free_energy: Optional[float] = Query(
            None, description="Minimum binding free energy (units: eV)"
        ),
        max_binding_free_energy: Optional[float] = Query(
            None, description="Maximum binding free energy (units: eV)"
        ),
    ) -> STORE_PARAMS:
        crit: Dict[str, Any] = dict()  # type: ignore

        if metal_element:
            crit["binding_data.metal_element"] = metal_element

        d = {
            "metal_partial_charge": [
                min_metal_partial_charge,
                max_metal_partial_charge,
            ],
            "metal_partial_spin": [min_metal_partial_spin, max_metal_partial_spin],
            "metal_assigned_charge": [
                min_metal_assigned_charge,
                max_metal_assigned_charge,
            ],
            "metal_assigned_spin": [min_metal_assigned_spin, max_metal_assigned_spin],
            "number_coordinate_bonds": [
                min_number_coordinate_bonds,
                max_number_coordinate_bonds,
            ],
            "binding_energy": [min_binding_energy, max_binding_energy],
            "binding_enthalpy": [min_binding_enthalpy, max_binding_enthalpy],
            "binding_entropy": [min_binding_entropy, max_binding_entropy],
            "binding_free_energy": [min_binding_free_energy, max_binding_free_energy],
        }

        for entry in d:
            key = "binding_data." + entry
            if d[entry][0] is not None or d[entry][1] is not None:  # type: ignore
                crit[key] = dict()

            if d[entry][0] is not None:  # type: ignore
                crit[key]["$gte"] = d[entry][0]  # type: ignore

            if d[entry][1] is not None:  # type: ignore
                crit[key]["$lte"] = d[entry][1]  # type: ignore

        return {"criteria": crit}

    def ensure_indexes(self):  # pragma: no cover
        return [
            ("binding_data.metal_element", False),
            ("binding_data.metal_partial_charge", False),
            ("binding_data.metal_partial_spin", False),
            ("binding_data.metal_assigned_charge", False),
            ("binding_data.metal_assigned_spin", False),
            ("binding_data.number_coordinate_bonds", False),
            ("binding_data.binding_energy", False),
            ("binding_data.binding_enthalpy", False),
            ("binding_data.binding_entropy", False),
            ("binding_data.binding_free_energy", False),
        ]
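Reviewer note: the range handling in `BindingDataQuery.query` can be read as the small standalone sketch below. It is not part of the commit; `build_range_criteria` is a hypothetical helper name, and the sketch deliberately leaves out FastAPI and maggma so it runs on its own. It shows how each (min, max) pair becomes a MongoDB-style `$gte`/`$lte` filter under the `binding_data.` prefix, and that a bound of `0` is kept because the check is `is not None` rather than truthiness.

```python
from typing import Dict, Optional, Tuple

# Hypothetical helper (not in emmet): mirrors the min/max -> $gte/$lte logic
# of BindingDataQuery.query without any web-framework dependencies.
def build_range_criteria(
    ranges: Dict[str, Tuple[Optional[float], Optional[float]]],
    prefix: str = "binding_data.",
) -> Dict[str, Dict[str, float]]:
    crit: Dict[str, Dict[str, float]] = {}
    for field, (lower, upper) in ranges.items():
        bounds: Dict[str, float] = {}
        if lower is not None:  # 0 and 0.0 are valid bounds
            bounds["$gte"] = lower
        if upper is not None:
            bounds["$lte"] = upper
        if bounds:  # skip fields where neither bound was supplied
            crit[prefix + field] = bounds
    return crit


example = build_range_criteria(
    {
        "binding_energy": (None, -1.5),       # only an upper bound
        "number_coordinate_bonds": (4, 6),    # both bounds
        "metal_partial_spin": (None, None),   # no bounds: omitted
    }
)
print(example)
```

Fields with no bounds are omitted entirely, so an unfiltered request produces an empty criteria dictionary.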
54 changes: 54 additions & 0 deletions emmet-api/emmet/api/routes/molecules/metal_binding/resources.py
@@ -0,0 +1,54 @@
from maggma.api.resource import ReadOnlyResource
from emmet.core.molecules.metal_binding import MetalBindingDoc

from maggma.api.query_operator import PaginationQuery, SortQuery, SparseFieldsQuery

from emmet.api.routes.molecules.molecules.query_operators import (
    MultiMPculeIDQuery,
    ExactCalcMethodQuery,
    FormulaQuery,
    ChemsysQuery,
    ElementsQuery,
    ChargeSpinQuery,
)
from emmet.api.routes.molecules.metal_binding.query_operators import BindingDataQuery
from emmet.api.routes.molecules.utils import MethodQuery, MultiPropertyIDQuery
from emmet.api.core.settings import MAPISettings
from emmet.api.core.global_header import GlobalHeaderProcessor


def metal_binding_resource(metal_binding_store):
    resource = ReadOnlyResource(
        metal_binding_store,
        MetalBindingDoc,
        query_operators=[
            MultiMPculeIDQuery(),
            ExactCalcMethodQuery(),
            FormulaQuery(),
            ChemsysQuery(),
            ElementsQuery(),
            ChargeSpinQuery(),
            MethodQuery(),
            BindingDataQuery(),
            MultiPropertyIDQuery(),
            SortQuery(),
            PaginationQuery(),
            SparseFieldsQuery(
                MetalBindingDoc,
                default_fields=[
                    "molecule_id",
                    "property_id",
                    "solvent",
                    "method",
                    "last_updated",
                ],
            ),
        ],
        header_processor=GlobalHeaderProcessor(),
        tags=["Molecules Metal Binding"],
        sub_path="/metal_binding/",
        disable_validation=True,
        timeout=MAPISettings().TIMEOUT,
    )

    return resource
36 changes: 14 additions & 22 deletions emmet-api/emmet/api/routes/molecules/redox/query_operators.py
@@ -11,48 +11,40 @@ class RedoxPotentialQuery(QueryOperator):

     def query(
         self,
-        electrode: str = Query(
-            "H", description="Reference electrode to be queried (e.g. 'H', 'Li', 'Mg')."
-        ),
         min_reduction_potential: Optional[float] = Query(
-            None,
-            description="Minimum reduction potential using the selected reference electrode.",
+            None, description="Minimum reduction potential."
         ),
         max_reduction_potential: Optional[float] = Query(
-            None,
-            description="Maximum reduction potential using the selected reference electrode.",
+            None, description="Maximum reduction potential."
         ),
         min_oxidation_potential: Optional[float] = Query(
-            None,
-            description="Minimum oxidation potential using the selected reference electrode.",
+            None, description="Minimum oxidation potential."
         ),
         max_oxidation_potential: Optional[float] = Query(
-            None,
-            description="Maximum oxidation potential using the selected reference electrode.",
+            None, description="Maximum oxidation potential."
         ),
     ) -> STORE_PARAMS:
         crit: Dict[str, Any] = dict()  # type: ignore

         d = {
-            "oxidation_potentials": [min_oxidation_potential, max_oxidation_potential],
-            "reduction_potentials": [min_reduction_potential, max_reduction_potential],
+            "oxidation_potential": [min_oxidation_potential, max_oxidation_potential],
+            "reduction_potential": [min_reduction_potential, max_reduction_potential],
         }

-        for entry in d:
-            key = entry + "." + electrode
-            if d[entry][0] is not None or d[entry][1] is not None:
+        for key in d:
+            if d[key][0] is not None or d[key][1] is not None:
                 crit[key] = dict()

-            if d[entry][0] is not None:
-                crit[key]["$gte"] = d[entry][0]
+            if d[key][0] is not None:
+                crit[key]["$gte"] = d[key][0]

-            if d[entry][1] is not None:
-                crit[key]["$lte"] = d[entry][1]
+            if d[key][1] is not None:
+                crit[key]["$lte"] = d[key][1]

         return {"criteria": crit}

     def ensure_indexes(self):  # pragma: no cover
         return [
-            ("oxidation_potentials", False),
-            ("reduction_potentials", False),
+            ("oxidation_potential", False),
+            ("reduction_potential", False),
         ]
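Reviewer note: the commit message says potentials are now stored only versus SHE, with conversions to other electrodes left to the front-end. A minimal sketch of such a conversion follows. It is not code from this commit: `REFERENCE_VS_SHE` and `convert_from_she` are hypothetical names, and the reference values are textbook aqueous standard reduction potentials assumed for illustration, which may differ from whatever constants the project actually uses.

```python
# Standard reduction potentials of each reference couple vs. SHE, in volts.
# ASSUMPTION: textbook aqueous values, not taken from the emmet codebase.
REFERENCE_VS_SHE = {
    "H": 0.0,     # SHE itself
    "Li": -3.04,  # Li+/Li
    "Mg": -2.37,  # Mg2+/Mg
}


def convert_from_she(potential_vs_she: float, electrode: str) -> float:
    """Re-reference a potential given vs. SHE to another electrode.

    E(vs. ref) = E(vs. SHE) - E_ref(vs. SHE)
    """
    return potential_vs_she - REFERENCE_VS_SHE[electrode]


# A reduction potential of -1.0 V vs. SHE, expressed vs. Li+/Li.
print(round(convert_from_she(-1.0, "Li"), 2))
```

Keeping a single stored reference means a min/max filter on `reduction_potential` or `oxidation_potential` is always interpreted versus SHE; a client that thinks in another reference would need to shift its bounds before querying.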
47 changes: 41 additions & 6 deletions emmet-api/emmet/api/routes/molecules/tasks/query_operators.py
@@ -3,9 +3,9 @@
 from fastapi import Query
 from typing import Optional

+from monty.json import jsanitize

-# TODO: might need these utils once pmg changes are in place (see below)
-# from emmet.api.routes.tasks.utils import calcs_reversed_to_trajectory, task_to_entry
+from emmet.api.routes.molecules.tasks.utils import calcs_reversed_to_trajectory


 class MultipleTaskIDsQuery(QueryOperator):
@@ -93,8 +93,43 @@ def post_process(self, docs, query):
         return d


-# TODO: class TrajectoryQuery(QueryOperator):
-# Need to write Trajectory class in pmg for Molecules
+class TrajectoryQuery(QueryOperator):
+    """
+    Method to generate a query on calculation trajectory data from task documents
+    """
+
+    def query(
+        self,
+        task_ids: Optional[str] = Query(
+            None, description="Comma-separated list of task_ids to query on"
+        ),
+    ) -> STORE_PARAMS:
+        crit = {}
+
+        if task_ids:
+            crit.update(
+                {
+                    "task_id": {
+                        "$in": [task_id.strip() for task_id in task_ids.split(",")]
+                    }
+                }
+            )
+
+        return {"criteria": crit}

-# TODO: class EntryQuery(QueryOperator):
-# Need to write MoleculeEntry class in pmg
+    def post_process(self, docs, query):
+        """
+        Post processing to generate trajectory data
+        """
+
+        d = [
+            {
+                "task_id": doc["task_id"],
+                "trajectories": jsanitize(
+                    calcs_reversed_to_trajectory(doc["calcs_reversed"])
+                ),
+            }
+            for doc in docs
+        ]
+
+        return d
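Reviewer note: the `task_ids` handling in `TrajectoryQuery.query` reduces to the standalone sketch below. It is not part of the commit; `task_ids_to_criteria` is a hypothetical name, and the ID strings are made up for illustration. It shows that whitespace around each comma-separated ID is stripped and that a missing or empty parameter yields no filter at all.

```python
from typing import Any, Dict, Optional


# Hypothetical helper (not in emmet): same parsing as TrajectoryQuery.query,
# minus the FastAPI Query wrapper.
def task_ids_to_criteria(task_ids: Optional[str]) -> Dict[str, Any]:
    crit: Dict[str, Any] = {}
    if task_ids:  # None and "" both mean "no filter"
        crit["task_id"] = {"$in": [t.strip() for t in task_ids.split(",")]}
    return crit


# Made-up IDs, only to show the whitespace handling.
print(task_ids_to_criteria("id-1, id-2 ,id-3"))
```

One consequence worth noting: a trailing comma (for example `"id-1,"`) would add an empty string to the `$in` list, since the split result is not filtered.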
21 changes: 16 additions & 5 deletions emmet-api/emmet/api/routes/molecules/tasks/resources.py
@@ -10,13 +10,12 @@
 from emmet.api.routes.molecules.tasks.query_operators import (
     DeprecationQuery,
     MultipleTaskIDsQuery,
-    # TODO:
-    # TrajectoryQuery,
+    TrajectoryQuery,
     # EntryQuery,
 )
 from emmet.api.core.global_header import GlobalHeaderProcessor
 from emmet.api.core.settings import MAPISettings
-from emmet.core.tasks import DeprecationDoc
+from emmet.core.tasks import DeprecationDoc, TrajectoryDoc
 from emmet.core.qchem.task import TaskDocument

 timeout = MAPISettings().TIMEOUT
@@ -65,5 +64,17 @@ def task_deprecation_resource(task_store):
     return resource


-# TODO: def trajectory_resource(task_store):
-# TODO: def entries_resource(task_store):
+def trajectory_resource(task_store):
+    resource = ReadOnlyResource(
+        task_store,
+        TrajectoryDoc,
+        query_operators=[TrajectoryQuery(), PaginationQuery()],
+        key_fields=["task_id", "calcs_reversed"],
+        tags=["Molecules Tasks"],
+        sub_path="/tasks/trajectory/",
+        header_processor=GlobalHeaderProcessor(),
+        timeout=timeout,
+        disable_validation=True,
+    )
+
+    return resource
