# Atomate2 Workflow Ingestor
###  Kat Nykiel, Alejandro Strachan
School of Materials Engineering and Birck Nanotechnology Center, Purdue University, West Lafayette, Indiana 47907, United States

## Abstract
This [Sim2L](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0264492) allows researchers to share density functional theory calculations performed with atomate2 and make them findable, accessible, interoperable, and reusable ([FAIR](https://www.go-fair.org/fair-principles/)).

Specifically, this tool caches the individual steps that compose a workflow. In atomate2 and FireWorks terminology, each green box in the diagram below represents a *firework* (an individual step), while the overall diagram is a *workflow*. Users upload their atomate2 workflows, and the Sim2L stores the intermediate steps and provides the final outputs of the workflow along with relevant metadata. The fireworks containing VASP runs are not stored directly in this tool, but in nanoHUB's [vaspingestor tool](https://nanohub.org/tools/vaspingestor).

![sample atomate2 workflow](./notebooks/elastic-workflow.png)

Using this Sim2L, researchers can satisfy data-sharing requirements such as those called for by the [US Office of Science and Technology Policy](https://www.whitehouse.gov/ostp/news-updates/2022/08/25/ostp-issues-guidance-to-make-federally-funded-research-freely-available-without-delay/).


## Atomate2 Workflow Data Ingestor Input/Output Overview

For robustness, the wflowingestor Sim2L stores each step (firework) of the atomate2 workflow as an individual entry in the database.

The Sim2L takes the following inputs: 

- the fw_doc, as a dictionary
- the workflow id
- the workflow graph associated with the fw_doc, as a dictionary
- the Author associated with the dataset
- a Tag to identify the specific dataset
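
As a rough illustration of the inputs listed above, the sketch below assembles them for a single firework as a plain Python dictionary. The key names, the example values, and the `build_ingest_inputs` helper are all hypothetical and are not taken from the Sim2L; the actual input names are defined in the wflowingestor notebook.

```python
import json


def build_ingest_inputs(fw_doc, workflow_id, workflow_graph, author, tag):
    """Collect the five inputs for one firework (hypothetical key names)."""
    # The README states that fw_doc and the workflow graph are dictionaries.
    if not isinstance(fw_doc, dict):
        raise TypeError("fw_doc must be a dictionary")
    if not isinstance(workflow_graph, dict):
        raise TypeError("workflow_graph must be a dictionary")

    inputs = {
        "fw_doc": fw_doc,
        "workflow_id": workflow_id,
        "workflow_graph": workflow_graph,
        "author": author,
        "tag": tag,
    }
    # Fail early if anything cannot be serialized to JSON.
    json.dumps(inputs)
    return inputs


# Made-up example: a three-step workflow, ingesting the last step.
example_inputs = build_ingest_inputs(
    fw_doc={"fw_id": 3, "name": "elastic fit"},
    workflow_id="wf-001",
    workflow_graph={"1": [2], "2": [3], "3": []},  # parent -> children
    author="A. Researcher",
    tag="demo-dataset",
)
```

Because each firework is stored as its own entry, a helper like this would be called once per step, with the same workflow id and graph each time.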

The Sim2L ingests the results and extracts the following outputs: 

- the fw_id of this document
- simplified outputs
  - structure
  - doc type (elastic tensor fitting, etc.)
- the vaspingestor squid_id, if returned
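
The outputs listed above can be pictured as a small summary record per firework. The sketch below is only an assumption about how such a record might be built: the `output`, `structure`, and `doc_type` field names and the `simplify_outputs` helper are hypothetical, and the real fw_doc schema may differ.

```python
from typing import Optional


def simplify_outputs(fw_doc: dict, squid_id: Optional[str] = None) -> dict:
    """Reduce a firework document to the outputs named in this README.

    All field names read from fw_doc are assumptions for illustration.
    """
    output = fw_doc.get("output", {})
    return {
        "fw_id": fw_doc.get("fw_id"),
        "structure": output.get("structure"),
        "doc_type": fw_doc.get("doc_type", "unknown"),
        # Only set when the vaspingestor returned an id for this step.
        "squid_id": squid_id,
    }


# Made-up firework document without an associated VASP run.
summary = simplify_outputs(
    {"fw_id": 3, "doc_type": "elastic", "output": {"structure": {"sites": []}}}
)
```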

All inputs and outputs are indexed in the ResultsDB: https://nanohub.org/results.  

The Sim2L itself can be found here: [wflowingestor.ipynb](simtool/wflowingestor.ipynb)


## Upload your workflow and make your results FAIR
This notebook shows how to upload a workflow to the Sim2L. Once you upload your data, nanoHUB automatically indexes it into the results database.

[Upload your workflow and share it](notebooks/ingest-workflow.ipynb)


## Query the ResultsDB and explore workflows

- how to query for all steps of a specific workflow (e.g., for a given structure)
- how to query for all steps of a given type (e.g., elastic constants)
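
The two kinds of query listed above amount to filtering per-firework records by workflow or by step type. The sketch below shows that logic on an in-memory list; it does not call the ResultsDB, and the record fields (`workflow_id`, `fw_id`, `doc_type`) and their values are hypothetical stand-ins for whatever the query notebook actually returns.

```python
# Made-up records, one per firework, standing in for query results.
records = [
    {"workflow_id": "wf-001", "fw_id": 2, "doc_type": "elastic"},
    {"workflow_id": "wf-001", "fw_id": 1, "doc_type": "relax"},
    {"workflow_id": "wf-002", "fw_id": 3, "doc_type": "elastic"},
]


def steps_for_workflow(records, workflow_id):
    """Return every step of one workflow, ordered by fw_id."""
    matches = [r for r in records if r.get("workflow_id") == workflow_id]
    return sorted(matches, key=lambda r: r["fw_id"])


def steps_of_type(records, doc_type):
    """Return every step of a given type, across all workflows."""
    return [r for r in records if r.get("doc_type") == doc_type]


workflow_steps = steps_for_workflow(records, "wf-001")
elastic_steps = steps_of_type(records, "elastic")
```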

Below, we demonstrate how to use this tool to reproduce the results of "XYZ publication".

[Query the database for workflows and analyze results](notebooks/query-workflow.ipynb)