# Database Control

This Notebook collects the commands for managing the calculation databases used in high-throughput calculations. This includes:

1. Defining databases for easy access.

2. Specifying the local run_directories where calculations will be placed/performed.

3. Uploading/updating the reference records to a database based on the iprPy/library.

4. Checking the number and status of records within a database.

5. Cleaning records in a database by resetting errored calculations and removing excess \*.bid files.

6. Copying/removing database records.

7. Forgetting stored database information.

__Global workflow details:__

The commands offered by this Notebook are outside the global workflow, with the exception that new databases can be defined here before use in the other Notebooks.

**Library imports**

In [1]:
# Standard Python libraries
from __future__ import (print_function, division, absolute_import,
                        unicode_literals)

# https://github.com/usnistgov/iprPy
import iprPy
print('iprPy version', iprPy.__version__)

iprPy version 0.8.3


## 1. Define databases

Settings for accessing databases can be stored under simple names for easy access later.

The **list_databases()** function returns a list of all of the names for the stored databases.

In [2]:
print(iprPy.list_databases())

['test', 'iprhub-local', 'iprhub', 'Al_for_Kamal', 'PN', 'fccedge', 'master', 'demo', 'potentials', 'master_local']


The **set_database()** function allows for database access information to be saved under a simple name.

In [3]:
# Define local mongo database to save files to
iprPy.set_database(name='master', style='mongo', host='localhost', port=27017, database='iprPy')

Database master already defined.
Overwrite? (yes or no): yes


In [3]:
# Specify remote MDCS database to save files to
host = 'https://iprhub.nist.gov/'
user = 'lmh1'
pswd = 'C:/Users/lmh1/Documents/iprhub/iprhub_password.txt'
cert = 'C:/Users/lmh1/Documents/iprhub/2019-04-iprhub-ca.pem'

# Define mdcs database called iprhub
iprPy.set_database(name='iprhub', style='mdcs', host=host, user=user, pswd=pswd, cert=cert)

Database iprhub already defined.
Overwrite? (yes or no): yes


In [7]:
# Define local database called master_local
host = 'C:/Users/lmh1/Documents/calculations/ipr/master'
iprPy.set_database(name='master_local', style='local', host=host)

Enter the database's host: C:/Users/lmh1/Documents/calculations/ipr/master
Enter any other database parameters as key, value
Exit by leaving key blank
key: 


Test that the databases are set by loading each one and printing it.

In [2]:
master = iprPy.load_database('master')
print(master)

database style mongo at localhost:27017.iprPy


In [4]:
iprhub = iprPy.load_database('iprhub')
print(iprhub)

database style mdcs at https://iprhub.nist.gov/


In [5]:
local = iprPy.load_database('master_local')
print(local)

database style local at C:\Users\lmh1\Documents\calculations\ipr\master
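
The idea of storing access settings under a simple name can be sketched as a small name-to-settings mapping saved to disk. This is an illustration only and not iprPy's implementation: the `SettingsStore` class, its JSON file format, and its method names are all hypothetical.

```python
import json
import os
import tempfile


class SettingsStore:
    """Hypothetical name -> settings registry persisted as a JSON file."""

    def __init__(self, path):
        self.path = path

    def _read(self):
        # A missing file simply means nothing has been stored yet
        if not os.path.isfile(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def list_names(self):
        return list(self._read())

    def set(self, name, overwrite=False, **settings):
        data = self._read()
        # Refuse to silently replace an existing definition
        if name in data and not overwrite:
            raise ValueError('%s already defined' % name)
        data[name] = settings
        with open(self.path, 'w') as f:
            json.dump(data, f)

    def load(self, name):
        return self._read()[name]


# Store one set of access settings in a throwaway file and read it back
store = SettingsStore(os.path.join(tempfile.mkdtemp(), 'settings.json'))
store.set('master', style='mongo', host='localhost', port=27017)
print(store.list_names())
print(store.load('master'))
```

In this sketch, redefining an existing name raises an error unless `overwrite=True` is passed, which plays the same role as the interactive "Overwrite?" prompt shown in the outputs above.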


## 2. Define run directories

The high-throughput calculations are prepared and executed using local directories.  The paths to these directories can be saved and stored using simple names for easy access later.

The **list_run_directories()** function returns a list of all of the names for the stored run directories.

In [5]:
print(iprPy.list_run_directories())

['test1', 'test2', 'test3', 'test4', 'iprhub-local1', 'iprhub-local2', 'iprhub-local3', 'iprhub-local4', 'iprhub1', 'iprhub2', 'iprhub3', 'iprhub4', 'Al_for_Kamal_1', 'fccedge1', 'master_1', 'demo_1', 'demo_2', 'demo_3', 'demo_4', 'master_4', 'master_2', 'master_3']


The **set_run_directory()** function allows for a local run directory to be saved under a simple name. For best results, define a separate run_directory for each combination of database and number of cores.

In [6]:
# Define running directories for up to four cores
torun = 'C:\\Users\\lmh1\\Documents\\calculations\\ipr\\torun\\master\\'
iprPy.set_run_directory(name='master_1', path=torun + '1')
iprPy.set_run_directory(name='master_2', path=torun + '2')
iprPy.set_run_directory(name='master_3', path=torun + '3')
iprPy.set_run_directory(name='master_4', path=torun + '4')

run_directory master_1 already defined.
Overwrite? (yes or no): no
run_directory master_4 already defined.
Overwrite? (yes or no): no


The **load_run_directory()** function accesses the stored directory path associated with a run directory's name.

In [7]:
run_directory = iprPy.load_run_directory('master_1')
print(run_directory)

C:\Users\lmh1\Documents\calculations\ipr\torun\master\1


## 3. Build database by copying reference records into it

The **build_refs()** method copies the reference records in iprPy/library to the database for use in high-throughput calculations.

Parameters

- **lib_directory** (*str, optional*) The directory path for the library.  If not given, then it will use the iprPy/library directory.

- **refresh** (*bool or list, optional*) If False (default) only new reference records are added.  If True, all existing reference records are refreshed by deleting the current ones in the database and uploading the references in lib_directory.  If a list is given, then only the reference record styles named in the list are refreshed.
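
The three forms of the refresh parameter described above can be sketched as a small helper that decides which reference record styles would have their existing records replaced. This is only an illustration of the bool-or-list convention: `styles_to_refresh` and the style name `example_reference_style` are hypothetical and are not part of iprPy.

```python
def styles_to_refresh(refresh, available_styles):
    """Return the record styles whose existing records would be replaced."""
    if refresh is True:
        # Refresh everything
        return list(available_styles)
    if refresh is False:
        # Only add new records, replace nothing
        return []
    # Otherwise treat refresh as a list of style names
    unknown = [style for style in refresh if style not in available_styles]
    if unknown:
        raise ValueError('unknown record styles: %s' % unknown)
    return list(refresh)


# 'example_reference_style' is a placeholder name used only for this sketch
styles = ['potential_LAMMPS', 'example_reference_style']
print(styles_to_refresh(False, styles))
print(styles_to_refresh(True, styles))
print(styles_to_refresh(['potential_LAMMPS'], styles))
```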

Upload library records to database

In [3]:
database = master
refresh = False
#refresh = ['potential_LAMMPS']

database.build_refs(refresh=refresh)

## 4. Check record numbers and status

The **check_records()** method checks how many records of a given style are stored in the database.  For calculation record styles, it also reports how many are complete, how many still need to run, and how many issued errors.

In [4]:
database.check_records('potential_LAMMPS')

In database style mongo at localhost:27017.iprPy :
- 283 of style potential_LAMMPS


In [5]:
database.check_records('calculation_relax_box')

In database style mongo at localhost:27017.iprPy :
- 35997 of style calculation_relax_box
 - 24883 are complete
 - 0 still to run
 - 11114 issued errors
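
The breakdown above is a tally of records by status, with the three categories adding up to the total (24883 + 0 + 11114 = 35997). A minimal sketch of such a tally follows. The `summarize` function is hypothetical, and the status labels `'finished'`, `'not calculated'`, and `'error'` are assumptions made for this example rather than values taken from the output above.

```python
from collections import Counter


def summarize(statuses):
    """Count records by status; the status labels here are assumed."""
    counts = Counter(statuses)
    return {
        'total': len(statuses),
        'complete': counts['finished'],
        'still to run': counts['not calculated'],
        'issued errors': counts['error'],
    }


# Four made-up records for illustration
statuses = ['finished', 'finished', 'error', 'not calculated']
summary = summarize(statuses)
for label, count in summary.items():
    print(label, count)
```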


## 5. Copy records between databases

The **copy_records()** method copies records from the current database to another database.  Either a record style or a list of records can be given to select which records are copied, but not both.

Parameters
        
- **dbase2** (*iprPy.Database*) The database to copy to.

- **record_style** (*str, optional*) The record style to copy.  If neither record_style nor records is given, then the available record styles will be listed and the user prompted to pick one.  Cannot be given with records.

- **records** (*list, optional*) A list of iprPy.Record objects from the current database to copy to dbase2.  Allows the user full control over which records to copy/update.  Cannot be given with record_style.

- **includetar** (*bool, optional*) If True, the tar archives will be copied along with the records. If False, only the records will be copied. (Default is True).

- **overwrite** (*bool, optional*) If False (default) only new records and tars will be copied. If True, all existing content will be updated.
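
The effect of the overwrite option can be sketched with plain dictionaries standing in for the two databases. This is a simplified illustration, not how iprPy stores or transfers records: `copy_entries` is a hypothetical helper, and the record names and contents are made up.

```python
def copy_entries(source, dest, overwrite=False):
    """Copy name -> content entries from source to dest.

    Returns the number of entries written to dest.
    """
    written = 0
    for name, content in source.items():
        # Skip entries that already exist unless overwriting was requested
        if name in dest and not overwrite:
            continue
        dest[name] = content
        written += 1
    return written


source = {'record-a': 'new a', 'record-b': 'new b'}
dest = {'record-a': 'old a'}

# Default behavior: only the entry missing from dest is added
added = copy_entries(source, dest)
print(added, dest['record-a'])

# With overwrite=True, existing entries are updated as well
updated = copy_entries(source, dest, overwrite=True)
print(updated, dest['record-a'])
```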

In [9]:
source_database = local
dest_database = master

record_style = 'calculation_E_vs_r_scan'
#records = source_database.get_records(...)

source_database.copy_records(dest_database, 
                             record_style=record_style,
                             #records=records,
                             includetar=True,
                             overwrite=True,
                            )

947 records to try to copy
947 records added/updated
947 tars added/updated


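In [6]:
# Fetch a single record from the local database by name
record = local.get_record(name='0a31c1dc-0377-406e-a6b4-36843e4cd39c')

In [7]:
# Retrieve the tar archive associated with that record
tar = local.get_tar(record=record, raw=True)

In [8]:
# Add the archive to the master database
master.add_tar(record=record, tar=tar)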

## 5. Clean calculation records

The **clean_records()** method resets errored calculations of a specified record style.  Cleaning a record style means:

- Resetting any calculations that issued errors back into a run_directory

- Removing any .bid files in the calculation folders in the run_directory

This is useful for resetting and rerunning calculations that may have failed for reasons external to the calculation's method, e.g. runners terminated early, parameter conflicts for a limited number of potentials, or debugging calculations.

__WARNING:__ Conflicts may occur if you clean a run_directory that active runners are operating on, because the .bid files are what prevent multiple runners from working on the same calculation at the same time.
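
The .bid removal step can be sketched with the standard library alone. This is not the code that clean_records() runs: `remove_bid_files` is a hypothetical helper, and the layout assumed here (one folder per calculation directly inside the run_directory, with any .bid files at the top level of that folder) is an assumption made for the example.

```python
import tempfile
from pathlib import Path


def remove_bid_files(run_directory):
    """Delete every *.bid file in the immediate subfolders of run_directory.

    Returns the number of files removed.
    """
    removed = 0
    # Collect the matches first so files are not deleted while still scanning
    for bid in list(Path(run_directory).glob('*/*.bid')):
        bid.unlink()
        removed += 1
    return removed


# Build a throwaway run directory containing one calculation folder
run_directory = Path(tempfile.mkdtemp())
calc = run_directory / 'calc-0001'
calc.mkdir()
(calc / '1234.bid').write_text('')
(calc / 'calc.in').write_text('')

removed = remove_bid_files(run_directory)
print(removed, sorted(p.name for p in calc.iterdir()))
```

Only files ending in .bid are touched, so the calculation's own input files stay in place. As the warning above notes, doing this while runners are active removes the markers they rely on.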

In [11]:
#database.clean_records('calculation_E_vs_r_scan', 'demo_1')

## 6. Destroy calculation records

The **destroy_records()** method deletes all records of a specified style.  This is useful if you want to reset any library records or rerun calculations with different parameters.

**WARNING:** The deletion is permanent, even for local database styles.

In [12]:
#database.destroy_records('calculation_E_vs_r_scan')

## 7. Forget database information

The **unset_database()** and **unset_run_directory()** functions remove the saved settings for databases and run directories, respectively.

**NOTE:** Only the stored access information is removed; the records in a database and the files in a run_directory remain.

In [13]:
# Clear out existing definitions
#iprPy.unset_database(name='demo')
#iprPy.unset_run_directory(name='demo_1')
#iprPy.unset_run_directory(name='demo_2')
#iprPy.unset_run_directory(name='demo_3')
#iprPy.unset_run_directory(name='demo_4')