# Workflow Manager

This Notebook provides a means of preparing and/or running iprPy calculations.

Note that this Notebook mostly outlines the code/steps associated with the work and is *not* the optimal means of preparing and running calculations. This is especially true for runners: ideally, each runner should be a truly separate process so that its behavior can be observed and managed independently.

For preparing and running on a cluster, the suggestions are:
- Copy the prepare cells below that you wish to use into a Python script.  Submit a job to the cluster for the prepare script.  It only needs to be a serial process, but can take a long time depending on how many calculations are being prepared.
- Submit separate jobs for each runner you wish to be active.  These can easily be based on the iprPy runner command line.

Example prepare Python scripts that correspond to the content below can be found in the bin/prepare/ directory of the iprPy repository.

In [1]:
# import libraries
import numpy as np
import potentials
import atomman as am

# https://github.com/usnistgov/iprPy
import iprPy
print('iprPy version', iprPy.__version__)

iprPy version 0.11.2


---

## 1. Load the database

This is the database where the calculation records will be added, and which is searched for existing calculations to skip.

In [2]:
database = iprPy.load_database('master')
print(database)

database style mongo at localhost:27017.iprPy


---

## 2. Define global prepare terms

All prepare terms are collected into a dictionary making it easy to pass along to the underlying prepare methods.

In [3]:
prepare_terms = {}

### 2.1. Executable terms

These are basic terms that specify executables and some options that are required by most calculations in the workflow.

- __lammps_command__ is the primary LAMMPS executable to use.
- __mpi_command__ is the MPI command to use.  Leave {np_per_runner} as a variable.

In [4]:
prepare_terms['lammps_command'] =        'E:/LAMMPS/2020-03-03/bin/lmp_mpi'
prepare_terms['mpi_command'] =           'mpiexec -localonly {np_per_runner}'
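
The `{np_per_runner}` placeholder is left unfilled so that a processor count can be substituted later. As a minimal sketch of that idea, assuming the substitution behaves like Python's `str.format` (the mechanism iprPy actually uses is not shown in this Notebook, and `fill_mpi_command` is a hypothetical helper):

```python
# Illustrative only: fill the placeholder the way str.format would.
mpi_template = 'mpiexec -localonly {np_per_runner}'

def fill_mpi_command(template, np_per_runner):
    """Return the MPI command with the processor count substituted (hypothetical helper)."""
    return template.format(np_per_runner=np_per_runner)

print(fill_mpi_command(mpi_template, 4))
```

Writing a literal number in place of `{np_per_runner}` would fix the processor count for every pool, which is presumably why the term is meant to stay a variable.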

### 2.2. Old LAMMPS executables (optional)

Some older implementations of potentials will no longer work with the most current version of LAMMPS.  These options allow for alternate LAMMPS executables to be automatically selected as needed.  Note that this is only important if you want to compare the different versions of a given potential as all current active potentials in the NIST database are compatible with the newest LAMMPS.

- __lammps_command_snap_1__: SNAP version 1 needs LAMMPS between 8 Oct 2014 and 30 May 2017.
- __lammps_command_snap_2__: SNAP version 2 needs LAMMPS between 3 Dec 2018 and 12 June 2019.
- __lammps_command_old__: Some older implementations of potentials need LAMMPS before 30 Oct 2019.

In [5]:
prepare_terms['lammps_command_snap_1'] = 'E:/LAMMPS/2017-01-27/bin/lmp_mpi'
prepare_terms['lammps_command_snap_2'] = 'E:/LAMMPS/2019-06-05/bin/lmp_mpi'
prepare_terms['lammps_command_old'] =    'E:/LAMMPS/2019-06-05/bin/lmp_mpi'
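
As a quick sanity check, the example paths above can be compared against the date windows quoted in the list. The sketch below assumes that the dated directory names (such as `2017-01-27`) are the LAMMPS version dates and that the window bounds are inclusive; both are assumptions made for illustration, and `in_window` is a hypothetical helper.

```python
from datetime import date

# Date windows quoted in the list above; bounds treated as inclusive (an assumption).
windows = {
    'lammps_command_snap_1': (date(2014, 10, 8), date(2017, 5, 30)),
    'lammps_command_snap_2': (date(2018, 12, 3), date(2019, 6, 12)),
    'lammps_command_old':    (date.min, date(2019, 10, 30)),
}

# Version dates read off the example executable paths used in this Notebook.
builds = {
    'lammps_command_snap_1': date(2017, 1, 27),
    'lammps_command_snap_2': date(2019, 6, 5),
    'lammps_command_old':    date(2019, 6, 5),
}

def in_window(term):
    """Return True if the build date for term lies inside its window."""
    start, end = windows[term]
    return start <= builds[term] <= end

for term in builds:
    print(term, in_window(term))
```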

---

## 3. Specify the LAMMPS potentials to use

Most master prepare operations use buildcombos functions that build records based on all or a selection of interatomic potentials.  In the buildcombos functions, this is achieved by calling database.potdb.get_lammps_potentials() and iterating over all of the returned results.  To limit which potentials are used, any terms in prepare_terms that start with "potential_" are passed to the underlying get_lammps_potentials() as kwargs.  Then, only calculations that correspond to the matching LAMMPS potentials are prepared.
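
The prefix-based pass-through can be pictured with a small sketch. It is not iprPy's implementation: `select_potential_kwargs` is a hypothetical helper, and stripping the prefix is an assumption inferred from the fact that this Notebook sets `potential_id` while `get_lammps_potentials()` accepts `id`.

```python
def select_potential_kwargs(terms, prefix='potential_'):
    """Hypothetical helper: collect the prefixed terms and strip the prefix."""
    return {key[len(prefix):]: value
            for key, value in terms.items()
            if key.startswith(prefix)}

# Made-up terms for illustration
example_terms = {
    'lammps_command': 'lmp_mpi',
    'potential_status': 'active',
    'potential_symbols': ['Cu'],
}
print(select_potential_kwargs(example_terms))
```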

In practice, however, preparing calculations for a large number of potentials all at once proved problematic: too much time is spent building the parameter combinations to check before any new calculations are even prepared.  To get around this, the code below:

- Calls database.potdb.get_lammps_potentials() directly to obtain a list of the potentials matching the conditions wanted.
- Saves the ids for the selected potentials to the all_lmppot_ids list.
- Uses the yield_lmppot_ids() function defined below to divide all_lmppot_ids into sets smaller than a specific size, which are then prepared separately.



In [6]:
def yield_lmppot_ids(delta=100):
    """
    Divides the selected interatomic potentials into smaller sets for
    preparing.  This helps avoid having the prepare methods generate too
    many possible calculation variations to test in one go.  The ids are
    read from the global all_lmppot_ids list.
    
    Parameters
    ----------
    delta : int, optional
        The number of potentials to prepare at one time.  Default value is 100.
    
    Yields
    ------
    list
        The next set of at most delta potential ids.
    """
    i=0
    for i in range(delta, len(all_lmppot_ids), delta):
        print(f'Using potential #s {i-delta} to {i-1}\n')
        yield all_lmppot_ids[i-delta:i]
        
    print(f'Using potential #s {i} to {len(all_lmppot_ids)-1}\n')
    yield all_lmppot_ids[i:len(all_lmppot_ids)]
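
To see the batching behavior in isolation, the following standalone sketch splits a list of made-up ids in the same spirit. The `chunked` helper is hypothetical and takes the list as an argument rather than reading a global; unlike the function above, it yields nothing at all for an empty list.

```python
def chunked(ids, size=100):
    """Yield consecutive slices of ids containing at most size items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

example_ids = [f'pot-{n}' for n in range(250)]   # made-up ids for illustration
print([len(batch) for batch in chunked(example_ids, 100)])
```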

### 3.1. Option #1: Select potential ids to prepare

Most useful get_lammps_potentials() parsing terms:

- __id__ *(str or list)* The unique record id(s) labeling the records to parse by.  Alternatively, specify the ids directly as shown in Option #2.
- __potid__ *(str or list, optional)* The unique record id(s) labeling the associated potential records to parse by.
- __pair_style__ *(str or list, optional)* LAMMPS pair_style(s) to parse by.
- __status__ *(None, str or list, optional)* Limits the search by the status of the LAMMPS implementations: "active", "superseded" and/or "retracted".
- __symbols__ *(str or list, optional)* Model symbol(s) to parse by.  Typically correspond to elements for atomic potential models.
- __elements__ *(str or list, optional)* Element(s) in the model to parse by.

In [10]:
lmppots, lmppots_df = database.potdb.get_lammps_potentials(return_df = True,
    status = 'active', # 'active' does current potential versions, None does all (old and bad versions as well)
    #potid = ['1999--Mishin-Y-Farkas-D-Mehl-M-J-Papaconstantopoulos-D-A--Ni'],
    #pair_style = ['eam', 'eam/alloy', 'eam/fs'],
    symbols = ['Cu'],
)
all_lmppot_ids = np.unique(lmppots_df.id).tolist()
print(len(all_lmppot_ids), 'potential ids found')

101 potential ids found


### 3.2. Option #2: Specify potential ids directly

Or, if you already know which potential implementations you want to use, you can specify them directly.

In [7]:
# Potential settings
all_lmppot_ids = [
    '2019--Plummer-G--Ti-Al-C--LAMMPS--ipr1',
    '2019--Plummer-G--Ti-Si-C--LAMMPS--ipr1',
    '2021--Plummer-G--Ti-Al-C--LAMMPS--ipr1',
#    '2022--Hiremath-P--W--LAMMPS--ipr1',
#    '2022--Mendelev-M-I--Ni-Nb--LAMMPS--ipr1'
]
print(len(all_lmppot_ids), 'potential ids found')

3 potential ids found


---

## 4. Prepare pools

The prepared calculations are divided into separate "pools" based on the calculation type and where they are positioned in the global NIST workflow.

By default, the iprPy runner methods are each assigned a set number of processors to work with and then run through the calculations in a pool by randomly selecting them. As such, individual pools should be used for different steps along the workflow, as well as for calculations that will be assigned different numbers of processors.

The prepare options associated with each pool are

- __styles__ lists the iprPy calculation styles to prepare in the pool.  By default, these will use the pre-defined "main" branch, but alternate branches can be selected by giving the branch name after a colon, e.g. E_vs_r_scan:bop.
- __run_directory__ is the name of the specific run directory where the pool is located.  All prepared calculations will be created in this run directory.
- __np_per_runner__ is the number of processors each runner will be assigned to use for the underlying simulations.
- __num_lmppot_ids__ is the number of potentials to prepare at a given time.
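
The `style:branch` convention used in __styles__ can be illustrated with a short sketch. `split_style` is a hypothetical helper written for this example, not an iprPy function; it only shows how a term such as `E_vs_r_scan:bop` separates into a style name and a branch, with "main" as the fallback.

```python
def split_style(term, default_branch='main'):
    """Hypothetical helper: split 'style:branch' into (style, branch)."""
    style, _, branch = term.partition(':')
    return style, branch or default_branch

for term in ['isolated_atom', 'E_vs_r_scan:bop', 'relax_static:from_dynamic']:
    print(split_style(term))
```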

Each pool section provides a description of the calculation styles and what prepare options are set as the default by the master prepare method.  Any calculation-specific prepare values, including those modified by master prepare, can be directly changed by adding terms to prepare_terms.

**NOTE** Some pools use results from earlier pools as inputs, meaning that not all calculation combinations can be generated until all parent calculations are finished.  You therefore either need to wait to run the Jupyter cell until all parent calculations have finished, or run the cell multiple times as parent calculations are completed.

### 4.1. Pool #1: Basic potential evaluations and scans

These are basic potential evaluation methods and initial energy scans.  None of these are prepared based on inputs from any other calculation. 

#### isolated_atom
Evaluates the energy of a single atom in isolation.

- buildcombos lammpspotential potential_file intpot

#### diatom_scan
Evaluates the energy of a pair of atoms at various interatomic spacings.

- buildcombos diatom potential_file intpot
- minimum_r 0.02 angstrom
- maximum_r 10.0 angstrom
- number_of_steps_r 500

#### E_vs_r_scan
Evaluates the energy of crystal prototypes subjected to a volumetric scan.

- buildcombos crystalprototype load_file prototype
- sizemults 10 10 10
- minimum_r 0.5 angstrom
- maximum_r 6.0 angstrom
- number_of_steps_r 276

#### E_vs_r_scan:bop
Is a variation of E_vs_r_scan specifically for bop potentials where the minimum r value is increased.

- buildcombos crystalprototype load_file prototype
- prototype_potential_pair_style bop
- sizemults 10 10 10
- minimum_r 2.0 angstrom
- maximum_r 6.0 angstrom
- number_of_steps_r 201
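
The default r ranges and step counts listed for the three scan styles imply the same spacing between evaluated points. The sketch below works this out, assuming that both endpoints are included in the count (an assumption about how the scan points are laid out, not something stated in this Notebook); `scan_spacing` is a hypothetical helper.

```python
def scan_spacing(minimum_r, maximum_r, number_of_steps_r):
    """Spacing between evaluated r values, assuming both endpoints are included."""
    return (maximum_r - minimum_r) / (number_of_steps_r - 1)

# Default values copied from the lists above (all in angstroms)
scans = {
    'diatom_scan':     (0.02, 10.0, 500),
    'E_vs_r_scan':     (0.5, 6.0, 276),
    'E_vs_r_scan:bop': (2.0, 6.0, 201),
}
for name, settings in scans.items():
    print(name, round(scan_spacing(*settings), 6))
```

Under that assumption each scan samples r every 0.02 angstrom, so changing a range without adjusting the step count changes the resolution.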

In [8]:
# Specify master prepare options
styles = [
    'isolated_atom',
    'diatom_scan',
    'E_vs_r_scan:bop',
    'E_vs_r_scan',
]
run_directory = 'master_1'
np_per_runner = '1'
num_lmppot_ids = 100

# Setup and run master_prepare
prepare_terms['styles']        = ' '.join(styles)
prepare_terms['run_directory'] = run_directory
prepare_terms['np_per_runner'] = np_per_runner
for lmppot_ids in yield_lmppot_ids(num_lmppot_ids):
    prepare_terms['potential_id'] = lmppot_ids
    database.master_prepare(**prepare_terms)

Using potential #s 0 to 2

Preparing calculation isolated_atom branch main
1003 existing calculation records found
3 matching interatomic potentials found
3 calculation combinations to check
0 new records to prepare

Preparing calculation diatom_scan branch main
2169 existing calculation records found
3 matching interatomic potentials found
18 calculation combinations to check
0 new records to prepare

Preparing calculation E_vs_r_scan branch bop
28188 existing calculation records found
19 matching crystal prototypes found
0 matching interatomic potentials found
1 invalid calculations skipped
0 calculation combinations to check

Preparing calculation E_vs_r_scan branch main
28188 existing calculation records found
19 matching crystal prototypes found
3 matching interatomic potentials found
252 calculation combinations to check
0 new records to prepare



### 4.2. Pool #2: Crystal relaxations

These perform crystal structure relaxations based on a guess structure and an interatomic potential.

#### relax_box
Relaxes a crystal structure by only altering box dimensions to zero pressure while keeping all atoms in the same box-relative positions.

- buildcombos atomicreference load_file reference
- buildcombos atomicparent load_file parent
- parent_record calculation_E_vs_r_scan
- parent_load_key minimum-atomic-system
- parent_status finished
- sizemults 10 10 10
- atomshift 0.05 0.05 0.05
- strainrange 1e-6

#### relax_static
Relaxes a crystal structure using energy/force minimization plus a simultaneous box relax.

- buildcombos atomicreference load_file reference
- buildcombos atomicparent load_file parent
- parent_record calculation_E_vs_r_scan
- parent_load_key minimum-atomic-system
- parent_status finished
- sizemults 10 10 10
- atomshift 0.05 0.05 0.05
- energytolerance 0.0
- forcetolerance 1e-10 eV/angstrom
- maxiterations 10000
- maxevaluations 100000
- maxatommotion 0.01 angstrom
- maxcycles 100
- cycletolerance 1e-10

#### relax_dynamic
Relaxes a crystal structure using an nph barostat plus a Langevin thermostat set at 0 K.  This evolves the system while damping out forces over time.

- buildcombos atomicreference load_file reference
- buildcombos atomicparent load_file parent
- parent_record calculation_E_vs_r_scan
- parent_load_key minimum-atomic-system
- parent_status finished
- sizemults 10 10 10
- atomshift 0.05 0.05 0.05
- temperature 0.0
- integrator nph+l
- thermosteps 1000
- runsteps 10000
- equilsteps 0

In [9]:
# Specify master prepare options
styles = [
    'relax_box',
    'relax_static',
    'relax_dynamic',
]
run_directory = 'master_2'
np_per_runner = '1'
num_lmppot_ids = 100

# Setup and run master_prepare
prepare_terms['styles']        = ' '.join(styles)
prepare_terms['run_directory'] = run_directory
prepare_terms['np_per_runner'] = np_per_runner
for lmppot_ids in yield_lmppot_ids(num_lmppot_ids):
    prepare_terms['potential_id'] = lmppot_ids
    database.master_prepare(**prepare_terms)

Using potential #s 0 to 2

Preparing calculation relax_box branch main
131889 existing calculation records found
6587 matching atomic references found
3 matching interatomic potentials found
3 matching interatomic potentials found
252 matching atomic parents found
1010 calculation combinations to check
0 new records to prepare

Preparing calculation relax_static branch main
175836 existing calculation records found
6587 matching atomic references found
3 matching interatomic potentials found
3 matching interatomic potentials found
252 matching atomic parents found
1010 calculation combinations to check
0 new records to prepare

Preparing calculation relax_dynamic branch main
102839 existing calculation records found
6587 matching atomic references found
3 matching interatomic potentials found
3 matching interatomic potentials found
252 matching atomic parents found
1010 calculation combinations to check
0 new records to prepare



### 4.3. Pool #3: Further crystal relaxations

This performs further crystal relaxations on the results of pool #2. 

#### relax_static:from_dynamic
Takes the resulting structures of relax_dynamic and subjects them to an energy/force minimization plus box relaxation.

- buildcombos atomicarchive load_file archive
- archive_record calculation_relax_dynamic
- archive_branch main
- archive_load_key final-system
- archive_status finished
- sizemults 1 1 1
- energytolerance 0.0
- forcetolerance 1e-10 eV/angstrom
- maxiterations 10000
- maxevaluations 100000
- maxatommotion 0.01 angstrom
- maxcycles 100
- cycletolerance 1e-10

In [None]:
# Specify master prepare options
styles = [
    'relax_static:from_dynamic'
]
run_directory = 'master_3'
np_per_runner = '1'
num_lmppot_ids = 100

# Setup and run master_prepare
prepare_terms['styles']        = ' '.join(styles)
prepare_terms['run_directory'] = run_directory
prepare_terms['np_per_runner'] = np_per_runner
for lmppot_ids in yield_lmppot_ids(num_lmppot_ids):
    prepare_terms['potential_id'] = lmppot_ids
    database.master_prepare(**prepare_terms)

### 4.4. Pool #4: Crystal space group analysis

These evaluate the crystal space group information for the relaxed structures computed above and for the initial prototype and DFT structures used.

#### crystal_space_group:prototype
Evaluates the crystal space group information for the prototype structures.  Only needs to be done once per prototype.

- buildcombos crystalprototype load_file proto


#### crystal_space_group:reference
Evaluates the crystal space group information for DFT relaxed structures.  Only needs to be done once per structure.

- buildcombos atomicreference load_file ref


#### crystal_space_group:relax
Evaluates the crystal space group information for the final structures of the finished relax_static and relax_box calculations.

- buildcombos atomicarchive load_file archive1
- buildcombos atomicarchive load_file archive2
- archive1_record calculation_relax_static
- archive1_load_key final-system
- archive1_status finished
- archive2_record calculation_relax_box
- archive2_load_key final-system
- archive2_status finished

In [10]:
# Specify master prepare options
styles = [
    #'crystal_space_group:prototype',
    #'crystal_space_group:reference',
    'crystal_space_group:relax',
]
run_directory = 'master_4'
np_per_runner = '1'
num_lmppot_ids = 100

# Setup and run master_prepare
prepare_terms['styles']        = ' '.join(styles)
prepare_terms['run_directory'] = run_directory
prepare_terms['np_per_runner'] = np_per_runner
for lmppot_ids in yield_lmppot_ids(num_lmppot_ids):
    prepare_terms['potential_id'] = lmppot_ids
    database.master_prepare(**prepare_terms)

Using potential #s 0 to 2

Preparing calculation crystal_space_group branch relax
194852 existing calculation records found
583 matching atomic archives found
798 matching atomic archives found
1381 calculation combinations to check
1381 new records to prepare



### 4.5.  Further styles coming soon...

## 5. Runner

Once calculations have been prepared, you can then start runner jobs to perform them.  

Options for managing runners:

- Use the cell below to call runner() for the database.  This will perform one calculation at a time until finished or stopped.  Not recommended unless you only want one runner active at any given time.
- Open a separate terminal for each runner you wish to be active and call the "iprPy runner" command with the specific database and run directory you want each to use.  This isolates each runner in action and allows for runners to operate on the same or different databases and run directories.
- Submit runner jobs to a cluster that has access to the run directory and the database.
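
For the separate-terminal and cluster options, the command lines can be assembled ahead of time. The sketch below is only illustrative: `runner_commands` is a hypothetical helper, and the positional argument order (database name, then run directory) is an assumption that should be checked against the iprPy command line documentation before use.

```python
def runner_commands(database_name, run_directory, num_runners):
    """Hypothetical helper: one 'iprPy runner' command line per desired runner."""
    # Argument order is an assumption, not confirmed by this Notebook.
    command = ['iprPy', 'runner', database_name, run_directory]
    return [list(command) for _ in range(num_runners)]

for command in runner_commands('master', 'master_4', 3):
    # Each printed line could be pasted into its own terminal or job script.
    print(' '.join(command))
```

Launching each command as its own process keeps the runners isolated from one another, in line with the recommendation at the top of this Notebook.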


In [None]:
database.runner(run_directory='master_4')

Runner started with pid 56228
5d73c2f8-2825-4e66-a2c7-9929633404f5
sim calculated successfully

08b57466-c5cf-4354-9c93-a9f2d618863c
sim calculated successfully

9a378027-2556-4bea-9f29-b4f5231cf40d
sim calculated successfully

9f3e62d9-7ec9-42e8-99a8-b75e17eb4659
sim calculated successfully

ab93a7f9-0b8c-46f2-bf1e-f86b108e7194
sim calculated successfully

5354b670-b22f-4016-a69d-331cf907208e
sim calculated successfully

23fedbb6-2196-4e48-860e-3bd1ec20e148
sim calculated successfully

1c34eddc-90e5-404c-9fb1-d2b342787067
sim calculated successfully

339fcfaf-f1e4-4a75-a59b-13676908f5a3
sim calculated successfully

24d85c3b-6972-402e-90b7-ae815701fcca
sim calculated successfully

35427d61-7107-4896-b775-9d7ced3483f8
sim calculated successfully

7a20020c-0048-4ba3-8ccc-2547193311eb
sim calculated successfully

9db98b3c-0058-4b5a-a78a-18302b01d469
sim calculated successfully

539dadbd-173a-4da5-8727-1a3f54a02f16
sim calculated successfully

192a7a34-d662-4170-85e5-4caf338edbfa
sim calcu