Update for NCI hackathon #186

Merged 10 commits on Nov 1, 2022
27 changes: 22 additions & 5 deletions docs/CHANGELOG.md
@@ -1,5 +1,22 @@
# Changelog

## [v7.0.0](https://github.com/GENESIS-EFRC/reaction-network/tree/v7.0.0) (2022-10-27)

[Full Changelog](https://github.com/GENESIS-EFRC/reaction-network/compare/v6.1.1...v7.0.0)

**Implemented enhancements:**

- `graph-tool` is a slow/cumbersome to install dependency [\#84](https://github.com/GENESIS-EFRC/reaction-network/issues/84)

**Closed issues:**

- ray cannot parallelise the job among mutiple CPUs on HPC cluster [\#132](https://github.com/GENESIS-EFRC/reaction-network/issues/132)

**Merged pull requests:**

- Test github actions [\#176](https://github.com/GENESIS-EFRC/reaction-network/pull/176) ([mattmcdermott](https://github.com/mattmcdermott))
- Swap graph-tool for rustworkx [\#175](https://github.com/GENESIS-EFRC/reaction-network/pull/175) ([mattmcdermott](https://github.com/mattmcdermott))

## [v6.1.1](https://github.com/GENESIS-EFRC/reaction-network/tree/v6.1.1) (2022-10-25)

[Full Changelog](https://github.com/GENESIS-EFRC/reaction-network/compare/v6.1.0...v6.1.1)
@@ -10,19 +27,19 @@

## [v6.1.0](https://github.com/GENESIS-EFRC/reaction-network/tree/v6.1.0) (2022-10-25)

-[Full Changelog](https://github.com/GENESIS-EFRC/reaction-network/compare/v6.0.2...v6.1.0)
+[Full Changelog](https://github.com/GENESIS-EFRC/reaction-network/compare/v6.0.1...v6.1.0)

**Merged pull requests:**

- hopefully fix broken release workflow [\#173](https://github.com/GENESIS-EFRC/reaction-network/pull/173) ([mattmcdermott](https://github.com/mattmcdermott))

-## [v6.0.2](https://github.com/GENESIS-EFRC/reaction-network/tree/v6.0.2) (2022-10-25)
+## [v6.0.1](https://github.com/GENESIS-EFRC/reaction-network/tree/v6.0.1) (2022-10-25)

-[Full Changelog](https://github.com/GENESIS-EFRC/reaction-network/compare/v6.0.1...v6.0.2)
+[Full Changelog](https://github.com/GENESIS-EFRC/reaction-network/compare/v6.0.2...v6.0.1)

-## [v6.0.1](https://github.com/GENESIS-EFRC/reaction-network/tree/v6.0.1) (2022-10-25)
+## [v6.0.2](https://github.com/GENESIS-EFRC/reaction-network/tree/v6.0.2) (2022-10-25)

-[Full Changelog](https://github.com/GENESIS-EFRC/reaction-network/compare/v6.0.0...v6.0.1)
+[Full Changelog](https://github.com/GENESIS-EFRC/reaction-network/compare/v6.0.0...v6.0.2)

**Merged pull requests:**

1 change: 1 addition & 0 deletions docs/reference/builders/retrosynthesis.md
@@ -0,0 +1 @@
::: rxn_network.builders.retrosynthesis
1 change: 1 addition & 0 deletions docs/reference/builders/schema.md
@@ -0,0 +1 @@
::: rxn_network.builders.schema
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -58,7 +58,7 @@ plotting = ["graphistry==0.28.0"]
tests = ["pytest==7.1.3", "pytest-cov==4.0.0"]
strict = [
    "numba==0.56.2",
-    "pymatgen==2022.09.21",
+    "pymatgen==2022.10.22",
    "jobflow==0.1.9",
    "ray==2.0.0",
    "mp-api==0.27.5",
Empty file.
196 changes: 196 additions & 0 deletions src/rxn_network/builders/retrosynthesis.py
@@ -0,0 +1,196 @@
""" Builder(s) for generating synthesis recipe documents."""
from datetime import datetime
from math import ceil
from typing import Any, Dict, Optional

from maggma.builders import Builder
from maggma.core import Store
from maggma.stores import GridFSStore
from maggma.utils import grouper
from monty.json import MontyDecoder, jsanitize

from rxn_network.builders.schema import (
    ComputedSynthesisRecipe,
    ComputedSynthesisRecipesDoc,
)
from rxn_network.core.composition import Composition
from rxn_network.core.cost_function import CostFunction


class SynthesisRecipeBuilder(Builder):  # pragma: no cover
    """
    Build a synthesis recipe document from the reaction results from EnumeratorWF.
    """

    def __init__(
        self,
        tasks: Store,
        recipes: Store,
        cf: CostFunction,
        tasks_fs: Optional[GridFSStore] = None,
        recipes_fs: Optional[GridFSStore] = None,
        query: Optional[Dict] = None,
        **kwargs,
    ):
        self.tasks = tasks
        self.tasks_fs = tasks_fs
        self.recipes = recipes
        self.recipes_fs = recipes_fs
        self.cf = cf
        self.query = query
        self.kwargs = kwargs

        sources = [tasks]
        targets = [recipes]

        if tasks_fs:
            sources.append(tasks_fs)
        if recipes_fs:
            targets.append(recipes_fs)

        super().__init__(sources=sources, targets=targets, **kwargs)

    def ensure_indexes(self):
        """
        Ensures indexes for the tasks, (tasks_fs), and recipes collections.
        """
        self.tasks.ensure_index(self.tasks.key)
        self.tasks.ensure_index(self.tasks.last_updated_field)
        self.recipes.ensure_index(self.recipes.key)
        self.recipes.ensure_index(self.recipes.last_updated_field)

        if self.tasks_fs:
            self.tasks_fs.ensure_index(self.tasks_fs.key)
        if self.recipes_fs:
            self.recipes_fs.ensure_index(self.recipes_fs.key)

    def prechunk(self, number_splits: int):
        """
        Prechunk method to perform chunking by the key field.
        """
        keys = self._find_to_process()

        N = ceil(len(keys) / number_splits)

        for split in grouper(keys, N):
            yield {"query": {self.tasks.key: {"$in": list(split)}}}

    def get_items(self):
        """Get the items to process."""
        to_process_task_ids = self._find_to_process()

        self.total = len(to_process_task_ids)
        self.logger.info(f"Processing {self.total} task docs for synthesis recipes")

        for task_id in to_process_task_ids:
            task = self.tasks.query_one({"task_id": task_id})
            if self.tasks_fs:
                rxns = self.tasks_fs.query_one(
                    {"task_id": task_id},
                )["rxns"]
                task["rxns"] = rxns
                if not rxns:
                    self.logger.warning(
                        f"Missing rxns from GridFSStore for task_id {task_id}"
                    )
            else:
                if not task.get("rxns"):
                    self.logger.warning(f"Missing rxns in task {task_id}")

            if task is not None:
                yield task
            else:
                pass

    def process_item(self, item):
        """Creates a synthesis recipe document from the task document."""
        item = MontyDecoder().process_decoded(item)

        task_id = item["task_id"]
        task_label = item["task_label"]
        rxns = item["rxns"]
        targets = item["targets"]
        elements = item["elements"]
        chemsys = item["chemsys"]
        added_elements = item["added_elements"]
        added_chemsys = item["added_chemsys"]
        enumerators = item["enumerators"]
        mu_func = None  # incorporate this?

        if len(targets) > 1:
            self.logger.warning(
                f"Enumerator has multiple targets for task_id {item['task_id']}"
            )
            self.logger.warning("Selecting first target...")

        target = item["targets"][0]
        target_comp = Composition(target)

        self.logger.debug(f"Creating synthesis recipes for {item['task_id']}")

        rxns = rxns.get_rxns()
        costs = [self.cf.evaluate(rxn) for rxn in rxns]
        recipes = [
            ComputedSynthesisRecipe.from_computed_rxn(
                rxn, cost=cost, target=target_comp, mu_func=mu_func
            )
            for rxn, cost in zip(rxns, costs)
        ]

        d: Dict[str, Any] = {}

        d["task_id"] = task_id
        d["task_label"] = task_label
        d["last_updated"] = datetime.utcnow()
        d["recipes"] = recipes
        d["target_composition"] = target_comp
        d["target_formula"] = target
        d["elements"] = elements
        d["chemsys"] = chemsys
        d["added_elements"] = added_elements
        d["added_chemsys"] = added_chemsys
        d["enumerators"] = enumerators
        d["cost_function"] = self.cf

        doc = ComputedSynthesisRecipesDoc(**d)

        return jsanitize(
            doc.dict(),
            strict=True,
            allow_bson=True,
        )

    def update_targets(self, items):
        """
        Inserts the new synthesis recipe docs into the Synthesis Recipes collection.
        Stores recipes in GridFS if a recipes GridFSStore is provided.
        """
        docs = list(filter(None, items))

        if len(docs) > 0:
            self.logger.info(f"Found {len(docs)} synthesis recipe docs to update")

            if self.recipes_fs:
                recipes = []
                for d in docs:
                    d["use_gridfs"] = True
                    recipe = {"task_id": d["task_id"], "recipes": d.pop("recipes")}
                    recipes.append(recipe)

                self.recipes_fs.update(
                    recipes, key="task_id", additional_metadata=["task_id"]
                )

            self.recipes.update(docs)
        else:
            self.logger.info("No items to update")

    def _find_to_process(self):
        self.logger.info("Synthesis Recipe builder started.")

        self.logger.info("Setting up indexes.")
        self.ensure_indexes()

        task_keys = set(self.tasks.distinct("task_id", criteria=self.query))
        updated_tasks = set(self.recipes.newer_in(self.tasks, exhaustive=True))
        return updated_tasks & task_keys
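
Two data-shaping steps in the builder can be tried out without a database: the key chunking in `prechunk` and the separation of the large `recipes` payload in `update_targets`. The following is a minimal standalone sketch, assuming plain Python lists and dicts in place of maggma stores. The helper names `chunk_keys` and `split_recipes` are hypothetical and do not appear in this PR, and `itertools.islice` stands in for `maggma.utils.grouper`.

```python
from itertools import islice
from math import ceil
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def chunk_keys(keys: Iterable[str], number_splits: int) -> Iterator[List[str]]:
    """Yield lists of keys, each holding at most ceil(n / number_splits) keys."""
    ordered = sorted(keys)  # sorted only so the illustration is deterministic
    if not ordered:
        return  # nothing to process; also avoids a chunk size of zero
    size = ceil(len(ordered) / number_splits)
    it = iter(ordered)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            break
        yield chunk


def split_recipes(
    docs: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Move each doc's "recipes" payload into a separate list keyed by task_id."""
    slim_docs: List[Dict[str, Any]] = []
    payloads: List[Dict[str, Any]] = []
    for doc in docs:
        slim = dict(doc)  # shallow copy; the builder itself mutates in place
        payloads.append({"task_id": slim["task_id"], "recipes": slim.pop("recipes")})
        slim["use_gridfs"] = True
        slim_docs.append(slim)
    return slim_docs, payloads


# Five task ids split across two workers give chunks of three and two keys.
chunks = list(chunk_keys({"t1", "t2", "t3", "t4", "t5"}, number_splits=2))
queries = [{"query": {"task_id": {"$in": chunk}}} for chunk in chunks]

# The small metadata doc and the large recipes payload go to separate targets.
slim_docs, payloads = split_recipes([{"task_id": "t1", "recipes": ["r1", "r2"]}])
```

In the builder, each query dict would be handed to a separate worker, and the two lists returned by `split_recipes` correspond to what is written to `recipes` and `recipes_fs` respectively.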