# Database Manager

The Database Manager Notebook collects the commands for defining the database settings used to set up and perform calculation workflows.  At least one database must be defined before any calculation workflow can be performed.  In particular, this Notebook allows for

- Defining databases,
- Specifying the local run_directories where calculations will be placed/performed,
- Copying/uploading reference records to the database,
- Checking the number and status of records within a database,
- Cleaning records in a database by resetting any that issued errors, and
- Copying/removing database records.

__Global workflow details:__

The commands offered by this Notebook are outside the global workflow, with the exception that new databases can be defined here before use in the other Notebooks.

**Library imports**

In [1]:
from pathlib import Path

# https://github.com/usnistgov/iprPy
import iprPy
print('iprPy version', iprPy.__version__)

iprPy version 0.11.2


## 1. Define databases

Settings for accessing databases can be stored under simple names for easy access later.

The **list_databases** attribute of iprPy.settings lists the names of all stored databases.  As the cell below shows, it is accessed without parentheses.

In [2]:
print(iprPy.settings.list_databases)

['potentials_local', 'potentials', 'master', 'iprhub', 'test']


The **set_database()** function allows for database access information to be saved under a simple name.

### 1.1 Mongo database

In [3]:
# Define local mongo database to save files to
iprPy.settings.set_database(name='master', style='mongo', host='localhost', port=27017, database='iprPy')

Database master already defined.
Overwrite? (yes or no): no


In [20]:
master = iprPy.load_database('master')
print(master)

database style mongo at localhost:27017.iprPy


### 1.2 CDCS database

In [19]:
# Specify remote CDCS database to save files to
host = 'https://potentials.nist.gov/'

# Define cdcs database called potentials
iprPy.settings.set_database(name='potentials', style='cdcs', host=host)

Database potentials already defined.
Overwrite? (yes or no): no


In [6]:
remote = iprPy.load_database('potentials')
print(remote)

database style cdcs at https://potentials.nist.gov/


### 1.3 Local database

In [4]:
# Define local database called test
host = 'E:/calculations/ipr/test'
iprPy.settings.set_database(name='test', style='local', host=host)

Database test already defined.
Overwrite? (yes or no): no


In [11]:
local = iprPy.load_database('potentials_local')
print(local)

database style local at C:\Users\lmh1\Documents\library


In [12]:
testdb = iprPy.load_database('test')
print(testdb)

database style local at E:\calculations\ipr\test


## 2. Define run directories

The high-throughput calculations are prepared and executed using local directories.  The paths to these directories can be saved and stored using simple names for easy access later.

The **list_run_directories** attribute of iprPy.settings lists the names of all stored run directories.  As the cell below shows, it is accessed without parentheses.

In [16]:
print(iprPy.settings.list_run_directories)

['master_1', 'master_2', 'master_3', 'master_4', 'master_5', 'master_6', 'master_7', 'master_8', 'iprhub_1', 'iprhub_2', 'iprhub_3', 'iprhub_4', 'iprhub_5', 'iprhub_6', 'iprhub_7', 'iprhub_8', 'test_1', 'test_2', 'test_3', 'test_4', 'test_5', 'test_6', 'test_7', 'test_8']


The **set_run_directory()** function allows for a local run directory to be saved under a simple name. For best functionality, each run_directory should be for a unique database and number of cores.

In [15]:
# Define running directories for up to eight cores
torun_root = Path('e:/calculations/ipr/torun')
dbname = 'master'

for i in range(8):
    iprPy.settings.set_run_directory(name=f'{dbname}_{i+1}',
                                     path=Path(torun_root, dbname, f'{i+1}'))
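
Before registering anything, it can help to preview which names and paths the loop above would produce. The sketch below uses only pathlib and does not call iprPy; `preview_run_directories` is a hypothetical helper introduced here for illustration, and the root path is the one used in the cell above.

```python
from pathlib import Path

def preview_run_directories(root, dbname, max_cores):
    """Return {name: path} following the '<dbname>_<n>' naming used above.

    Hypothetical helper for illustration only; nothing is saved to iprPy.
    """
    root = Path(root)
    return {f'{dbname}_{n}': root / dbname / str(n)
            for n in range(1, max_cores + 1)}

preview = preview_run_directories('e:/calculations/ipr/torun', 'master', 8)
for name, path in preview.items():
    print(name, path.as_posix())
```

Checking the mapping first makes it easier to confirm that each name corresponds to exactly one database and core count before the settings are stored.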

The **load_run_directory()** function accesses the stored directory path associated with a run directory's name.

In [17]:
run_directory = iprPy.load_run_directory('master_1')
print(run_directory)

E:\calculations\ipr\torun\master\1


## 3. Get reference records

The easiest way to get all of the reference records for the database is to

1. Download/clone https://github.com/lmhale99/potentials-library
2. Define a "local"-style database that points to the directory where the repository from step 1 was saved.
3. Either use that database directly or copy its records from there to another database.


**Database.copy_references()** copies all records of the "reference" styles from a source database to another database.

Parameters

- __dest__ (*Database*) The destination database to copy the reference records to.
- __includetar__ (*bool, optional*) If True, the tar archives will be copied along with the records. If False, only the records will be copied. (Default is True).
- __overwrite__ (*bool, optional*) If False (default) only new records and tars will be copied. If True, all existing content will be updated.


In [14]:
# Copy from "local" database, where the git repository is saved
source = local

# To the "test" database defined above
dest = testdb

source.copy_references(dest, includetar=False, overwrite=False)

potential_LAMMPS
476 records to try to copy
0 records added/updated
potential_LAMMPS_KIM
450 records to try to copy
0 records added/updated
crystal_prototype
19 records to try to copy
0 records added/updated
reference_crystal
6587 records to try to copy
0 records added/updated
free_surface
146 records to try to copy
0 records added/updated
stacking_fault
11 records to try to copy
0 records added/updated
point_defect
38 records to try to copy
0 records added/updated
dislocation
6 records to try to copy
0 records added/updated


## 4. Check record numbers and status

The **check_records()** method reports how many records of a given style are stored in the database.  If the style is a calculation style, it also reports how many of those records finished successfully, have not finished, or issued errors.

In [15]:
database = testdb

In [16]:
database.check_records('potential_LAMMPS')

In database style local at E:\calculations\ipr\test:
- 476 of style potential_LAMMPS
 - 0 finished
 - 0 not finished
 - 0 issued errors


In [17]:
database.check_records('calculation_phonon')

In database style local at E:\calculations\ipr\test:
- 17 of style calculation_phonon
 - 11 finished
 - 3 not finished
 - 3 issued errors
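
The status breakdown above amounts to a tally over the per-record statuses. A minimal sketch of that tally follows, using a synthetic list that mirrors the calculation_phonon counts shown above. The status strings are labels assumed for this illustration, and the code does not query a database or reflect how iprPy stores statuses.

```python
from collections import Counter

# Synthetic status list mirroring the calculation_phonon counts shown above.
# The status strings are assumed labels chosen for this illustration.
statuses = ['finished'] * 11 + ['not finished'] * 3 + ['error'] * 3

counts = Counter(statuses)
print(f'- {len(statuses)} of style calculation_phonon')
print(f" - {counts['finished']} finished")
print(f" - {counts['not finished']} not finished")
print(f" - {counts['error']} issued errors")
```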


## 5. Copy records between databases

The **copy_records()** method copies records from the current database to another database.  Either a list of records or a record style can be given to select what is copied, but not both.

Parameters
        
- **dbase2** (*iprPy.Database*) The database to copy to.

- **record_style** (*str, optional*) The record style to copy.  If record_style and records not given, then the available record styles will be listed and the user prompted to pick one.  Cannot be given with records.

- **records** (*list, optional*) A list of iprPy.Record objects from the current database to copy to dbase2.  Allows the user full control on which records to copy/update.  Cannot be given with record_style.

- **includetar** (*bool, optional*) If True, the tar archives will be copied along with the records. If False, only the records will be copied. (Default is True).

- **overwrite** (*bool, optional*) If False (default) only new records and tars will be copied. If True, all existing content will be updated.
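
The selection and overwrite rules described by these parameters can be illustrated with a toy model. In the sketch below, databases are plain dicts and `copy_records_sketch` is a hypothetical function written for this example; it is not iprPy's implementation, it ignores tar archives, and it raises an error where the real method would prompt the user to pick a record style.

```python
def copy_records_sketch(source, dest, record_style=None, records=None,
                        overwrite=False):
    """Toy model of the record_style/records and overwrite rules.

    source and dest are dicts mapping record name -> (style, content).
    Returns the number of records added or updated in dest.
    """
    if record_style is not None and records is not None:
        raise ValueError('record_style and records cannot both be given')
    if records is None:
        if record_style is None:
            raise ValueError('give either record_style or records')
        # Select every record of the requested style
        records = [name for name, (style, _) in source.items()
                   if style == record_style]

    changed = 0
    for name in records:
        if name in dest and not overwrite:
            continue  # overwrite=False: existing records are left untouched
        dest[name] = source[name]
        changed += 1
    return changed

source = {'a': ('calculation_isolated_atom', 'v2'),
          'b': ('calculation_isolated_atom', 'v1'),
          'c': ('calculation_phonon', 'v1')}
dest = {'a': ('calculation_isolated_atom', 'v1')}

n_new = copy_records_sketch(source, dest,
                            record_style='calculation_isolated_atom')
content_after_first = dest['a'][1]
n_all = copy_records_sketch(source, dest,
                            record_style='calculation_isolated_atom',
                            overwrite=True)
print(n_new, content_after_first, n_all, dest['a'][1])
```

With overwrite=False only the missing record is copied, while overwrite=True also refreshes the record that already existed in the destination.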

In [21]:
source = master
dest = testdb

record_style = 'calculation_isolated_atom'
#records = source.get_records(...)

source.copy_records(dest, 
                    record_style=record_style,
                    #records=records,
                    includetar=True,
                    overwrite=True,
                    )

1003 records to try to copy
1003 records added/updated
1003 tars added/updated


## 6. Clean calculation records

The **clean_records()** method resets errored calculations of a specified record style.  Cleaning a record style means:

- Resetting any calculations that issued errors back into a run_directory

- Removing any .bid files in the calculation folders in the run_directory

This is useful for resetting and rerunning calculations that may have failed for reasons external to the calculation's method, e.g. runners that terminated early, parameter conflicts for a limited number of potentials, or calculations that are being debugged.

__WARNING:__ Conflicts may occur if you clean a run_directory that active runners are operating on, because the .bid files are what prevent multiple runners from working on the same calculation at the same time.
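
The second cleaning step, removing .bid files from the calculation folders, can be sketched with pathlib alone. The example below is a minimal illustration under stated assumptions: `remove_bid_files` is a hypothetical helper rather than an iprPy function, the folder and file names are made up, and everything runs inside a temporary directory so that no real run_directory is touched.

```python
import tempfile
from pathlib import Path

def remove_bid_files(run_directory):
    """Delete every '*.bid' file found in the folders directly inside
    run_directory and return the names of the deleted files."""
    removed = []
    for bid in sorted(Path(run_directory).glob('*/*.bid')):
        bid.unlink()
        removed.append(bid.name)
    return removed

with tempfile.TemporaryDirectory() as tmp:
    calc = Path(tmp, 'calc_a')       # hypothetical calculation folder
    calc.mkdir()
    (calc / 'runner1.bid').touch()   # hypothetical bid left by a runner
    (calc / 'calc.in').touch()       # hypothetical calculation input
    removed = remove_bid_files(tmp)
    remaining = sorted(p.name for p in calc.iterdir())

print(removed, remaining)
```

Only the .bid files are deleted; the other files in each calculation folder stay in place, which is why the folder can be picked up again by a runner.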

In [None]:
database = testdb

In [7]:
database.clean_records(record_style='calculation_E_vs_r_scan', run_directory='test_1')

1 records to clean


In [8]:
record = database.get_record('694a24f9-aae6-43d4-9110-5e8c41968da8')

In [7]:
database.clean_records(records=[record])

Select a run_directory:
1 master_1
2 master_2
3 master_3
4 master_4
5 master_5
6 master_6
7 master_7
8 master_8
9 iprhub_1
: 

 4


1 records to clean


## 7. Destroy calculation records

The **destroy_records()** method deletes all records of a specified style.  This is useful if you want to reset any library records or rerun calculations with different parameters.

**WARNING:** This is a permanent delete even for local database styles.

In [22]:
database = testdb

In [23]:
database.destroy_records('calculation_isolated_atom')

1005 records found to be destroyed
Delete records? (must type yes): yes
1005 records successfully deleted


## 8. Forget database information

The **unset_database()** and **unset_run_directory()** functions remove the saved settings for a database or a run directory, respectively.

**NOTE:** Only the stored access information is removed.  The records in a database and the files in a run_directory remain.

In [13]:
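# Clear out existing definitions
# (called on iprPy.settings for consistency with set_database and set_run_directory above)
#iprPy.settings.unset_database(name='demo')
#iprPy.settings.unset_run_directory(name='demo_1')
#iprPy.settings.unset_run_directory(name='demo_2')
#iprPy.settings.unset_run_directory(name='demo_3')
#iprPy.settings.unset_run_directory(name='demo_4')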