# iprPy High-throughput Runner

- - -

**Lucas M. Hale**, [lucas.hale@nist.gov](mailto:lucas.hale@nist.gov?Subject=ipr-demo), *Materials Science and Engineering Division, NIST*.

**Chandler A. Becker**, [chandler.becker@nist.gov](mailto:chandler.becker@nist.gov?Subject=ipr-demo), *Office of Data and Informatics, NIST*.

**Zachary T. Trautt**, [zachary.trautt@nist.gov](mailto:zachary.trautt@nist.gov?Subject=ipr-demo), *Materials Measurement Science Division, NIST*.

Version: 2017-05-03

[Disclaimers](http://www.nist.gov/public_affairs/disclaimer.cfm) 
 
- - -

## Introduction

The runner routine performs prepared calculation instances within a run directory. Multiple runners can operate simultaneously on the same run directory and/or the same database. The runner also handles errors and dependencies in a simple manner. Each runner records its progress in a log file, which is saved in the runner-log directory.

The full description of the runner's processes is provided here:

1. The runner randomly selects a calculation instance in the run directory.

2. It bids on the calculation. If the bid fails, the runner returns to step 1.
    
    1. If a .bid file is already in the calculation instance folder, the bid fails.
    
    2. A .bid file is created and the runner pauses for a few seconds. It then checks whether the instance still exists and whether it holds the winning bid (lowest pid). If not, the bid fails.

3. A check is done to verify that the calculation instance folder contains calc\_\*.py and calc\_\*.in files, and that an associated record exists in the database. If any of these are missing:

    1. The calculation instance is archived as a gzipped tar file and placed in the orphan directory. 
    
    2. The calculation instance folder is deleted from the run directory, and the runner returns to step 1. 

4. The runner checks any parent record .xml and .json files in the calculation instance folder for the status field. 

    1. If the status is 'error', a results.json file is created from the partial record with an error message stating that the parent record issued an error. The runner then skips to step 6.
    
    2. If the status is 'not calculated', the up-to-date version of the parent record is retrieved from the database. 
    
        1. If the status has changed, the new version of the parent record is copied to the run directory, and step 3 is tried again.
        
        2. If the status is still 'not calculated', the runner tries step 2 for the parent calculation's instance.

5. The runner runs the Python calculation script as a subprocess.

6. If an error is raised, a results.json file is created from the partial record and the error message. 

7. When the calculation is complete, the results.json file produced is uploaded to the database as the calculation's record.

8. The calculation instance folder is archived in the database as a gzipped tar file.

9. The calculation instance folder is deleted from the run directory, and the runner returns to step 1.
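The bidding in step 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in runner.py: the function name `bid`, the `<pid>.bid` file naming, and the `pause` and `pid` parameters are all hypothetical and chosen only to mirror the description above.

```python
import glob
import os
import time


def bid(calc_dir, pause=5.0, pid=None):
    """Return True if this process wins the bid on calc_dir (illustrative sketch)."""
    pid = os.getpid() if pid is None else pid

    # Step 2.1: an existing .bid file means another runner already bid.
    if glob.glob(os.path.join(calc_dir, '*.bid')):
        return False

    # Step 2.2: place a bid. Naming the file '<pid>.bid' is an assumption.
    try:
        with open(os.path.join(calc_dir, '%d.bid' % pid), 'w') as f:
            f.write(str(pid))
    except OSError:
        # The folder vanished or is not writable.
        return False

    # Pause so that any competing runners can place their bids too.
    time.sleep(pause)

    # The instance may have been removed by another runner in the meantime.
    if not os.path.isdir(calc_dir):
        return False

    # Collect the pids of all bids now present; the lowest pid wins.
    bids = []
    for path in glob.glob(os.path.join(calc_dir, '*.bid')):
        stem = os.path.splitext(os.path.basename(path))[0]
        if stem.isdigit():
            bids.append(int(stem))
    return bool(bids) and min(bids) == pid
```

Because every competing runner applies the same lowest-pid rule after the pause, at most one of them sees its own pid as the minimum, which is what lets several runners share a run directory without a central lock.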


This capability can be accessed in one of three ways:

1. Load iprPy in a Python script and directly call the function.

2. Run the associated script with an input parameter file.

3. Use the iprPy command line with the corresponding option. 

The underlying code can be found in [iprPy/highthroughput/runner.py](../../iprPy/highthroughput/runner.py).

## runner(dbase, run_directory, orphan_directory=None)

This accesses the function directly through the iprPy package.

Arguments:

- __dbase__ is an iprPy.Database object.

- __run_directory__ is the directory containing the calculation instances to run.

- __orphan_directory__ is the directory in which to place incomplete calculation instances. If not specified, a directory named 'orphan' at the same level as the run_directory is created/used.
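The default location of the orphan directory, together with the archiving described in step 3 of the Introduction, can be sketched as follows. This is a hedged illustration using only the Python 3 standard library; the helper names `default_orphan_directory` and `archive_to_orphan` are hypothetical and are not part of iprPy.

```python
import os
import shutil
import tarfile


def default_orphan_directory(run_directory):
    """Return an 'orphan' directory path at the same level as run_directory."""
    parent = os.path.dirname(os.path.abspath(run_directory))
    return os.path.join(parent, 'orphan')


def archive_to_orphan(calc_dir, run_directory, orphan_directory=None):
    """Archive calc_dir as a gzipped tar file in the orphan directory,
    then delete calc_dir. Returns the path of the archive."""
    if orphan_directory is None:
        orphan_directory = default_orphan_directory(run_directory)

    # Create the orphan directory if it does not exist yet.
    os.makedirs(orphan_directory, exist_ok=True)

    # Name the archive after the calculation instance folder (an assumption).
    name = os.path.basename(os.path.normpath(calc_dir))
    archive = os.path.join(orphan_directory, name + '.tar.gz')

    with tarfile.open(archive, 'w:gz') as tar:
        tar.add(calc_dir, arcname=name)

    # Remove the instance folder only after the archive has been written.
    shutil.rmtree(calc_dir)
    return archive
```

With a run directory at `/data/run`, for example, `default_orphan_directory` resolves to `/data/orphan`.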

## $./runner.py runner.in

This runs the associated standalone script with an input parameter file.

Input file parameters:

- __database__ is the 'style host' for initializing the database.

- __database_*__ defines the value of any other database initialization parameter named \*.

- __run_directory__ is the directory containing the calculation instances to run.

- __orphan_directory__ is the directory in which to place incomplete calculation instances. If not specified, a directory named 'orphan' at the same level as the run_directory is created/used.
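As a rough illustration of how these parameters relate to one another, the sketch below parses an input file and separates the database settings. The file layout (one whitespace-separated `key value` pair per line, with `#` starting a comment), the function names, and every value in `EXAMPLE` are assumptions made for this sketch; the actual parsing is done by iprPy and may differ.

```python
# Hypothetical input file content; all values are placeholders.
EXAMPLE = """
database        mystyle myhost      # 'style host'
database_user   someone             # becomes the extra parameter 'user'
run_directory   /path/to/run
"""


def parse_runner_input(text):
    """Parse 'key value' lines into a dict (assumed file layout)."""
    params = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ''
    return params


def split_database(params):
    """Return (style, host, extra) from the database-related parameters."""
    # 'database' holds the style and host separated by whitespace.
    style, host = params['database'].split(None, 1)
    # Any 'database_*' key supplies an additional initialization parameter.
    prefix = 'database_'
    extra = {key[len(prefix):]: value
             for key, value in params.items()
             if key.startswith(prefix)}
    return style, host, extra


params = parse_runner_input(EXAMPLE)
style, host, extra = split_database(params)
```

Here `orphan_directory` is omitted, so the runner would fall back to the 'orphan' directory next to the run directory.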

## $./iprPy runner [database] [run_directory]

This uses the associated command line option.

Command line options:

- __database__ is the name associated with database settings saved to the [.iprPy settings](iprPy.highthroughput.settings.ipynb). If not given, the currently stored database names are listed and a prompt is given.

- __run_directory__ is the name associated with a run directory saved to the settings. If not given, the currently stored run_directory names are listed and a prompt is given.

## Demonstration

Demonstration is left for the tutorials.

- - -

__Docs Navigation:__

Tutorial:

1. [Basics](../tutorial/1 Basics.ipynb)

Reference:

- [iprPy](../reference/iprPy.ipynb)

- [iprPy.calculations](../reference/iprPy.convert.ipynb)

- [iprPy.databases](../reference/iprPy.databases.ipynb)

- [iprPy.highthroughput](../reference/iprPy.highthroughput.ipynb)

- [iprPy.input](../reference/iprPy.input.ipynb)

- [iprPy.prepare](../reference/iprPy.prepare.ipynb)

- [iprPy.records](../reference/iprPy.records.ipynb)

- [iprPy.tools](../reference/iprPy.tools.ipynb)