# VASP Ingestor Workflow
### Kat Nykiel, Dr. Alejandro Strachan

## Load Atomate2 TaskDocuments

This Sim2L uses parsed VASP results in the form of TaskDocuments from [atomate2](https://github.com/materialsproject/atomate2). 

These documents are obtained from VASP output using Atomate2's [VaspDrone](https://materialsproject.github.io/atomate2/reference/atomate2.vasp.drones.VaspDrone.html#atomate2.vasp.drones.VaspDrone) and saved as JSON files. For example, running the following in a directory of VASP results produces a TaskDocument JSON file:

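```
# Import libraries
import json
from atomate2.vasp.drones import VaspDrone
from monty.json import jsanitize

# Parse the VASP outputs in the current working directory into a TaskDoc
drone = VaspDrone()
doc = drone.assimilate()

# Convert to plain JSON-serializable types
# (newer, Pydantic v2-based releases prefer doc.model_dump() over doc.dict())
doc = jsanitize(doc.dict(), recursive_msonable=True)

# Save results as a JSON file
with open('doc.json', 'w', encoding='utf-8') as f_o:
    json.dump(doc, f_o)
```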

These documents contain most of the information about a VASP run; however, their large size makes them hard to query directly. The purpose of this Sim2L is to extract relevant features from this schema so that they can be used in machine learning workflows.
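
The extraction step can be pictured as flattening a handful of nested fields into a one-level record. The sketch below is a minimal illustration under stated assumptions, not the Sim2L's actual implementation: the helper names (`get_path`, `extract_features`, `FEATURE_PATHS`) are hypothetical, and the field paths (`formula_pretty`, `nsites`, `output.energy_per_atom`, `output.bandgap`) are assumed to exist in the TaskDocument JSON. Missing fields come back as `None` instead of raising.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical feature selection: output name -> dotted path into the TaskDocument.
# These paths are assumptions, not necessarily the fields the Sim2L extracts.
FEATURE_PATHS = {
    "formula": "formula_pretty",
    "nsites": "nsites",
    "energy_per_atom": "output.energy_per_atom",
    "bandgap": "output.bandgap",
}


def get_path(doc: Mapping[str, Any], dotted: str) -> Any:
    """Follow a dotted key path through nested dicts; return None if any key is missing."""
    value: Any = doc
    for key in dotted.split("."):
        if not isinstance(value, Mapping) or key not in value:
            return None
        value = value[key]
    return value


def extract_features(doc: Mapping[str, Any],
                     paths: Mapping[str, str] = FEATURE_PATHS) -> dict:
    """Flatten the selected fields of a TaskDocument dict into a one-level record."""
    return {name: get_path(doc, path) for name, path in paths.items()}
```

Applied to many documents, the records stack naturally into a table, e.g. `pd.DataFrame([extract_features(d) for d in docs])`, where `docs` is a hypothetical list of loaded TaskDocument dictionaries.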

In [None]:
# Import libraries
import numpy as np
import json
import os
import pandas as pd

# Import nanoHUB-specific libraries
import nanohubremote as nr
from simtool import findInstalledSimToolNotebooks,searchForSimTool
from simtool import getSimToolInputs,getSimToolOutputs,Run

Here we load an example TaskDocument:

In [None]:
# Load json file
with open('./../examples/doc.json','r') as f:
    doc = json.load(f)   
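
Before handing the document to the Sim2L, it can help to see how large and how deeply structured it is. The helper below is a hypothetical convenience, not part of the Sim2L: it reports the top-level keys of the loaded `doc` dictionary and the size of its JSON serialization in bytes.

```python
import json


def summarize_doc(doc: dict) -> dict:
    """Report the top-level keys and serialized JSON size of a loaded TaskDocument."""
    return {
        "n_top_level_keys": len(doc),
        "top_level_keys": sorted(doc.keys()),
        "json_size_bytes": len(json.dumps(doc).encode("utf-8")),
    }
```

For example, `summarize_doc(doc)["json_size_bytes"]` gives a quick measure of why querying the raw document directly is impractical.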

## Load Sim2L

In [None]:
# Load the Sim2L
simToolName = "vaspingestor"
simToolLocation = searchForSimTool(simToolName)
for key in simToolLocation.keys():
    print("%18s = %s" % (key,simToolLocation[key]))
    
installedSimToolNotebooks = findInstalledSimToolNotebooks(simToolName,returnString=True)
print(installedSimToolNotebooks)

In [None]:
# Get the list of inputs
inputs = getSimToolInputs(simToolLocation)
print(inputs)

In [None]:
# Get the list of outputs
outputs = getSimToolOutputs(simToolLocation)
print(outputs)

## Submit Sim2L sequentially

In [None]:
inputs['doc'].value = doc
inputs['author'].value = "Kat Nykiel"
inputs['dataset'].value = "example"

In [None]:
r = Run(simToolLocation,inputs)

In [None]:
r.getResultSummary()

## Submit Sim2L in parallel
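The previous example submitted a single Sim2L job and blocked the notebook until the job finished. This works for a small number of jobs, but it is often desirable to submit many jobs at once. The cells below do this through the nanoHUB web services API (`nanohubremote`): `submitTool` returns a job ID, which is then used to check the job status and retrieve the results.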

TODO: modify script for VASPingestor tool once installed
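
Submitting several jobs starts with collecting several TaskDocuments. A minimal sketch, assuming the documents are stored as `*.json` files in a single directory (the directory layout and the `load_task_docs` name are hypothetical):

```python
import json
from pathlib import Path


def load_task_docs(directory: str) -> list[dict]:
    """Load every *.json TaskDocument in `directory`, sorted by filename."""
    docs = []
    for path in sorted(Path(directory).glob("*.json")):
        with open(path, "r", encoding="utf-8") as f:
            docs.append(json.load(f))
    return docs
```

Each loaded document could then be assigned to `params['doc'].current` and passed to `session.submitTool(params)` in a loop, collecting the returned job IDs so that all jobs are queued before any results are retrieved. Sorting by filename keeps the job order reproducible.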

In [None]:
# Reuse the current nanoHUB session 
auth_data = {
    'grant_type' : 'tool',
}
with open(os.environ["SESSIONDIR"]+"/resources") as file:
    lines = [line.split(" ", 1) for line in file.readlines()]
    properties = {line[0].strip(): line[1].strip() for line in lines if len(line)==2}
    auth_data["sessiontoken"] = properties["session_token"]
    auth_data["sessionnum"] = properties["sessionid"]
    
# Create a nanoHUB web services session
session = nr.Sim2l(auth_data)

In [None]:
# Query tool inputs
toolname = 'vaspingestor'
params = session.getToolParameters(toolname)
pd.DataFrame([p.to_dict() for p in params.values()])

In [None]:
# Modify inputs
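params['doc'].current = doc  # TaskDocument loaded earlier in this notebook
params['author'].current = "Kat Nykiel"
params['dataset'].current = "example"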

In [None]:
# Submit Sim2L job
job_id = session.submitTool(params)

In [None]:
# Check job status
import time
import pprint
from IPython.display import clear_output

for i in range(100):
    pprint.pprint(session.checkStatus(job_id['job_id']))
    time.sleep(5)
    clear_output(wait=True)
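
The loop above always polls 100 times, even after the job has finished. A sketch of an early-exit alternative follows; `poll_until` is a hypothetical helper, and the completion test is left to a caller-supplied predicate because the exact structure of the dictionary returned by `session.checkStatus` is not assumed here.

```python
import time
from typing import Any, Callable


def poll_until(check: Callable[[], Any], is_done: Callable[[Any], bool],
               interval: float = 5.0, max_tries: int = 100) -> Any:
    """Call `check()` every `interval` seconds until `is_done(status)` is true.

    Returns the last status seen; gives up after `max_tries` calls.
    """
    status = None
    for _ in range(max_tries):
        status = check()
        if is_done(status):
            break
        time.sleep(interval)
    return status
```

Usage would look like `poll_until(lambda: session.checkStatus(job_id['job_id']), is_done=my_predicate)`, where `my_predicate` is written after inspecting the status dictionary printed by the cell above.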

In [None]:
# Retrieve results
results = session.getResults(job_id['job_id'])

## View Sim2L ResultsDB
To view the results cached by the Sim2L, query the nanoHUB results database (ResultsDB). The cell below lists the input and output fields stored for this tool:

In [None]:
from IPython.display import display

tool = 'vaspingestor'

# Locate the installed Sim2L and its notebooks
installedSimToolNotebooks = findInstalledSimToolNotebooks(tool,returnString=True)
print(installedSimToolNotebooks)
simToolLocation = searchForSimTool(tool)

# Query the ResultsDB explorer for the Sim2L's stored input and output names
req_json = session.requestPost('dbexplorer/dbexplorer/tool_detail?simtool=true', data={'tool': tool})
req_json = req_json.json()
parameters = req_json['results']

# Tabulate the names without overwriting the `inputs`/`outputs` objects used earlier
input_df = pd.DataFrame(list(parameters[0][tool]['input'].keys()), columns=["Inputs"])
output_df = pd.DataFrame(list(parameters[0][tool]['output'].keys()), columns=["Outputs"])
display(input_df)
display(output_df)