# 4. Brief iprPy overview as an extension of yabadaba

While iprPy was one of the original packages that yabadaba branched out from, it can now be thought of as extending yabadaba's capabilities to support performing calculations individually and in high throughput.

*Thinking of iprPy in this way is new and still in development.  Ideally, a goal for the upcoming year is to separate the core functionality of iprPy from the IPR-specific content (yabadabado?), but much work remains.  As such, some of what is described here is currently only conceptual and reflects desired features rather than existing ones.*


## Extending yabadaba

At the most basic level, the design of iprPy extends what yabadaba does in three primary ways:

1. A base Calculation class extends the base Record class with functionality for managing the execution of a calculation function.
2. (*planned*) InputValue and ResultValue classes extend Value with calculation-centric operations.
3. The Database class is also extended with methods supporting the design and execution of high throughput workflows of calculations.
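
The first point can be illustrated with a small, self-contained sketch. The classes below are hypothetical stand-ins written only for this illustration. They are not the yabadaba Record or iprPy Calculation classes, and the method and status names are assumptions. The sketch only shows the general idea of a record subclass that also manages running a function and storing its results.

```python
# Hypothetical stand-ins; these are NOT the yabadaba or iprPy classes.
from typing import Any, Callable, Dict


class SketchRecord:
    """Holds named values and can export them as a flat dict."""

    def __init__(self, name: str, **values: Any) -> None:
        self.name = name
        self.values: Dict[str, Any] = dict(values)

    def metadata(self) -> Dict[str, Any]:
        return {"name": self.name, **self.values}


class SketchCalculation(SketchRecord):
    """Extends the record with an attached function, a status and results."""

    def __init__(self, name: str, func: Callable[..., Dict[str, Any]],
                 **inputs: Any) -> None:
        super().__init__(name, **inputs)
        self.func = func
        self.status = "not calculated"
        self.results: Dict[str, Any] = {}

    def run(self) -> None:
        """Call the function with the stored inputs and keep what it returns."""
        try:
            self.results = self.func(**self.values)
            self.status = "finished"
        except Exception as err:
            self.results = {"error": str(err)}
            self.status = "error"

    def metadata(self) -> Dict[str, Any]:
        # Inputs, status and results all end up in the same flat description
        return {**super().metadata(), "status": self.status, **self.results}


def add_energies(e1: float, e2: float) -> Dict[str, Any]:
    """Toy calculation function used only for this example."""
    return {"total": e1 + e2}


calc_sketch = SketchCalculation("demo", add_energies, e1=1.5, e2=2.0)
calc_sketch.run()
print(calc_sketch.metadata())
```

Keeping inputs, status and results in the same object is what lets a calculation be stored and queried like any other record.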

<img src="Data_transformations.png" width="800">

## Calculation basics

In [6]:
# https://ipython.org/
from IPython.display import display, Markdown

import iprPy

Calculations are modular, just like records.

In [4]:
calc = iprPy.load_calculation('energy_check')

Each calculation has Markdown documentation files that can be accessed from the Calculation object.

In [8]:
# Display main docs and theory
display(Markdown(calc.maindoc))
display(Markdown(calc.theorydoc))

# energy_check calculation style

**Lucas M. Hale**, [lucas.hale@nist.gov](mailto:lucas.hale@nist.gov?Subject=ipr-demo), *Materials Science and Engineering Division, NIST*.

Idea suggested by Udo v. Toussaint (Max-Planck-Institute f. Plasmaphysics)

## Introduction

The energy_check calculation style provides a quick check if the energy of an atomic configuration matches with an expected one.

### Version notes

### Additional dependencies

### Disclaimers

- [NIST disclaimers](http://www.nist.gov/public_affairs/disclaimer.cfm)

- Small variations in the energy are to be expected due to numerical precisions. 


## Method and Theory

The calculation performs a quick run 0 (no relaxation) energy calculation on a given atomic configuration using a given potential and compares the computed potential energy versus an expected energy value. 

The Calculation.run() method runs a calculation from a text-based input file.  The template version of this file can be accessed as Calculation.template.

In [12]:
calc.run

<bound method Calculation.run of <iprPy.calculation.energy_check.EnergyCheck.EnergyCheck object at 0x7f109fc57710>>

In [10]:
print(calc.template)

# Input script for iprPy calculation energy_check

# Calculation Metadata
branch                          <branch>

# LAMMPS and MPI Commands
lammps_command                  <lammps_command>
mpi_command                     <mpi_command>

# Interatomic Potential
potential_file                  <potential_file>
potential_kim_id                <potential_kim_id>
potential_kim_potid             <potential_kim_potid>
potential_dir                   <potential_dir>

# Initial System Configuration
load_file                       <load_file>
load_style                      <load_style>
load_options                    <load_options>
family                          <family>
symbols                         <symbols>
box_parameters                  <box_parameters>

# Input/Output Units
length_unit                     <length_unit>
pressure_unit                   <pressure_unit>
energy_unit                     <energy_unit>
force_unit                      <force_unit>
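
Each `<key>` placeholder in the template is meant to be replaced by an actual value before the file is used. A minimal sketch of that substitution follows, using only the standard library. `fill_template` is a hypothetical helper written for this illustration rather than an iprPy function, the template is abbreviated, and the values are made up. Treating a missing value as an empty string is also an assumption.

```python
import re
from typing import Dict


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Replace each <key> placeholder with values[key], or '' if absent."""
    def substitute(match: "re.Match[str]") -> str:
        return values.get(match.group(1), "")
    return re.sub(r"<(\w+)>", substitute, template)


# Abbreviated version of the template printed above
example_template = (
    "# Input script (abbreviated)\n"
    "lammps_command                  <lammps_command>\n"
    "mpi_command                     <mpi_command>\n"
    "energy_unit                     <energy_unit>\n"
)

# Hypothetical values for illustration only
example_values = {"lammps_command": "lmp", "energy_unit": "eV"}

filled = fill_template(example_template, example_values)
print(filled)
```

Terms without a value, such as `mpi_command` here, are left blank in this sketch.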



Markdown documentation describing each input term is also available.

In [9]:
display(Markdown(calc.templatedoc))

# energy_check Input Terms

## Calculation Metadata

Specifies metadata descriptors common to all calculation styles.

- __branch__: A metadata group name that the calculation can be parsed by. Primarily meant for differentiating runs with different settings parameters.

## LAMMPS and MPI Commands

Specifies the external commands for running LAMMPS and MPI.

- __lammps_command__: The path to the executable for running LAMMPS on your system. Don't include command line options.
- __mpi_command__: The path to the MPI executable and any command line options to use for calling LAMMPS to run in parallel on your system. LAMMPS will run as a serial process if not given.

## Interatomic Potential

Specifies the interatomic potential to use and the directory where any associated parameter files are located.

- __potential_file__: The path to the potential_LAMMPS or potential_LAMMPS_KIM record that defines the interatomic potential to use for LAMMPS calculations.
- __potential_kim_id__: If potential_file is a potential_LAMMPS_KIM record, this allows for the specification of which version of the KIM model to use by specifying a full kim model id.  If not given, the newest known version of the kim model will be assumed.
- __potential_kim_potid__: Some potential_LAMMPS_KIM records are associated with multiple potential entries.  This allows for the clear specification of which potential (by potid) to associate with those kim models.  This will affect the list of available symbols for the calculation.
- __potential_dir__: The path to the directory containing any potential parameter files (eg. eam.alloy setfl files) that are needed for the potential. If not given, then any required files are expected to be in the working directory where the calculation is executed.

## Initial System Configuration

Specifies the file and options to load for the initial atomic configuration.

- __load_file__: The path to the initial configuration file to load.
- __load_style__: The atomman.load() style indicating the format of the load_file.
- __load_options__: A space-delimited list of key-value pairs for optional style-specific arguments used by atomman.load().
- __family__: A metadata descriptor for relating the load_file back to the original crystal structure or prototype that the load_file was based on.  If not given, will use the family field in load_file if load_style is 'system_model', or the file's name otherwise.
- __symbols__: A space-delimited list of the potential's atom-model symbols to associate with the loaded system's atom types.  Required if load_file does not contain symbol/species information.
- __box_parameters__: Specifies new box parameters to scale the loaded configuration by. Can be given either as a list of three or six numbers: 'a b c' for orthogonal boxes, or 'a b c alpha beta gamma' for triclinic boxes. The a, b, c parameters are in units of length and the alpha, beta, gamma angles are in degrees.

## Input/Output Units

Specifies the default units to use for the other input keys and to use for saving to the results file.

- __length_unit__: The unit of length to use. Default value is 'angstrom'.
- __pressure_unit__: The unit of pressure to use.  Default value is 'GPa'.
- __energy_unit__: The unit of energy to use.  Default value is 'eV'.
- __force_unit__: The unit of force to use.  Default value is 'eV/angstrom'.


Alternatively, the underlying calculation function can be directly accessed as Calculation.calc().

In [11]:
calc.calc

<function iprPy.calculation.energy_check.energy_check.energy_check(lammps_command: str, system: atomman.core.System.System, potential: <function Potential at 0x7f10a9aba0c0>, mpi_command: Optional[str] = None) -> dict>

## High throughput outline

This is an extremely brief overview of how the high throughput workflow is set up.

1. The workflow is designed around the text input files.  When you run a single calculation from a text file, each input parameter can have only a single value.  In the file, this means that the first term on each line (the "key") must be unique.  Alternatively, the input content can be given in Python as a flat dict where each term has a single value.

2. High throughput runs of the calculation can then be specified by giving multiple values for some of the inputs.  For simple high throughput operations, this can be done using a text input where some input parameter terms show up on repeated lines.  Alternatively, the dict version of the input can have lists of values.

3. When defining the Calculation class, the input parameters are divided into sets depending on whether they are allowed to have multiple values and, if so, whether they should be iterated in conjunction with any other parameters.  Additional metadata also indicates which input fields should be compared to determine uniqueness.

4. The lists of values can also be automatically built based on existing records/calculations in the database.  This allows for the design of workflows consisting of multiple calculations.  This does, however, require users to define the "buildcombos" functions that map records to input values.

5. Running the calculations is then a two-step process: "prepare" generates the set of new calculations to run, and "runner" sequentially executes the calculations.  Multiple runners can operate on the same set of prepared calculations, allowing flexible control over how many jobs are active at once and how many cores each job uses.

6. Finally, for established multi-calculation workflows, "master_prepare" branches can be defined for a Calculation that collect the standard input and buildcombos sets.  In this way, the default values are specified and the user only has to give modifications.
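
The ideas in steps 1 to 3, plus the uniqueness check behind "prepare", can be sketched in plain Python. This is only an illustration under stated assumptions: `parse_input`, `expand`, and `prepare` are hypothetical helpers and not the iprPy API, the file names and values are made up, and treating `#` as the start of a comment is an assumption about the input format.

```python
from itertools import product
from typing import Dict, List, Sequence


def parse_input(text: str) -> Dict[str, List[str]]:
    """Collect 'key value' lines into a dict of value lists."""
    terms: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumed comment syntax
        parts = line.split(None, 1)
        if len(parts) == 2:  # blank lines and keys without values are skipped
            terms.setdefault(parts[0], []).append(parts[1].strip())
    return terms


def expand(terms: Dict[str, List[str]],
           paired: Sequence[str] = ()) -> List[Dict[str, str]]:
    """Build one flat dict per combination of input values.

    Keys in `paired` are iterated together (zip); all other keys are
    iterated independently (Cartesian product).
    """
    paired = [k for k in paired if k in terms]
    if len({len(terms[k]) for k in paired}) > 1:
        raise ValueError("paired keys need the same number of values")
    rows = list(zip(*(terms[k] for k in paired))) if paired else [()]
    free = [k for k in terms if k not in paired]
    combos = []
    for row in rows:
        for choice in product(*(terms[k] for k in free)):
            combo = dict(zip(paired, row))
            combo.update(zip(free, choice))
            combos.append(combo)
    return combos


def prepare(combos: List[Dict[str, str]], existing: List[Dict[str, str]],
            compare_keys: Sequence[str]) -> List[Dict[str, str]]:
    """Keep only combinations whose compare_keys values are not yet known."""
    seen = {tuple(r.get(k) for k in compare_keys) for r in existing}
    new = []
    for combo in combos:
        signature = tuple(combo.get(k) for k in compare_keys)
        if signature not in seen:
            seen.add(signature)
            new.append(combo)
    return new


# Hypothetical input: repeated keys give multiple values
text = """
# each load_file is paired with its own symbols
load_file       Al.poscar
load_file       Cu.poscar
symbols         Al
symbols         Cu
box_parameters  4.05 4.05 4.05
box_parameters  3.61 3.61 3.61
energy_unit     eV
"""

combos = expand(parse_input(text), paired=("load_file", "symbols"))
already_done = [{"load_file": "Al.poscar", "box_parameters": "4.05 4.05 4.05"}]
to_run = prepare(combos, already_done, ["load_file", "box_parameters"])
print(len(combos), len(to_run))
```

Each dict in `to_run` is a flat, single-valued input set of the kind described in step 1, so a runner could execute them one at a time.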