# Database Control

The Database Manager Notebook collects the commands for defining the database settings used to set up and perform calculation workflows.  At least one database must be defined before any calculation workflow can be performed.  In particular, this Notebook allows for:

- Defining databases,
- Specifying the local run_directories where calculations will be placed/performed,
- Copying/uploading reference records from the library to the databases,
- Checking the number and status of records within a database,
- Cleaning records in a database by resetting any that issued errors, and
- Copying/removing database records.

__Global workflow details:__

The commands offered by this Notebook are outside the global workflow, with the exception that new databases can be defined here before use in the other Notebooks.

**Library imports**

In [1]:
from pathlib import Path

# https://github.com/usnistgov/iprPy
import iprPy
print('iprPy version', iprPy.__version__)

settings = iprPy.Settings()

iprPy version 0.10.2


## 1. Define databases

Settings for accessing databases can be stored under simple names for easy access later.

The **list_databases** property returns a list of the names of all stored databases.

In [2]:
print(settings.list_databases)

['master', 'iprhub', 'master_local', 'library_local', 'potentials']


The **set_database()** function allows for database access information to be saved under a simple name.

### 1.1 Mongo database

In [3]:
# Define local mongo database to save files to
settings.set_database(name='master', style='mongo', host='localhost', port=27017, database='iprPy')

Database master already defined.
Overwrite? (yes or no): no
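
The overwrite prompt above protects an existing definition from being replaced by accident. A minimal sketch of that idea follows, using a toy in-memory registry. The `NamedSettings` class and its `overwrite` flag are hypothetical stand-ins for illustration only; they are not iprPy's `Settings` class, which prompts interactively and stores the definitions persistently.

```python
class NamedSettings:
    """Toy registry of named database settings (hypothetical, not iprPy.Settings)."""

    def __init__(self):
        self._databases = {}

    @property
    def list_databases(self):
        """Names of all stored database definitions."""
        return list(self._databases)

    def set_database(self, name, overwrite=False, **access):
        """Store access info under name. Returns True if saved, False if kept as-is."""
        if name in self._databases and not overwrite:
            # Plays the role of answering "no" at the overwrite prompt
            return False
        self._databases[name] = dict(access)
        return True


registry = NamedSettings()

# First definition is stored
saved_first = registry.set_database('master', style='mongo', host='localhost',
                                    port=27017, database='iprPy')

# Second definition with the same name is rejected unless overwrite=True
saved_again = registry.set_database('master', style='local',
                                    host='e:/calculations/ipr/master')

print(registry.list_databases, saved_first, saved_again)
```

With `overwrite=False` the second call leaves the original mongo definition untouched, which is the outcome of answering "no" in the cell above.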


In [4]:
master = iprPy.load_database('master')
print(master)

database style mongo at localhost:27017.iprPy


### 1.2 CDCS database

In [5]:
# Specify remote CDCS database to save files to
host = 'https://potentials.nist.gov/'
user = 'lmh1'
pswd = 'XXXXXXXXXXXXXXXXX'

# Define CDCS database called potentials
settings.set_database(name='potentials', style='cdcs', host=host, user=user, pswd=pswd)

Database potentials already defined.
Overwrite? (yes or no): no


In [6]:
remote = iprPy.load_database('potentials')
print(remote)

database style cdcs at https://potentials.nist.gov/


### 1.3 Local database

In [7]:
# Define local database called master_local
host = 'e:/calculations/ipr/master'
settings.set_database(name='master_local', style='local', host=host)

Database master_local already defined.
Overwrite? (yes or no): no


In [8]:
local = iprPy.load_database('master_local')
print(local)

database style local at E:\calculations\ipr\master


## 2. Define run directories

The high-throughput calculations are prepared and executed using local directories.  The paths to these directories can be saved and stored using simple names for easy access later.

The **list_run_directories** property returns a list of the names of all stored run directories.

In [9]:
print(settings.list_run_directories)

['master_1', 'master_2', 'master_3', 'master_4', 'master_5', 'master_6', 'master_7', 'master_8']


The **set_run_directory()** function allows for a local run directory to be saved under a simple name. For best functionality, each run_directory should be for a unique database and number of cores.

In [15]:
# Define run directories for up to eight cores
torun_root = Path('e:/calculations/ipr/torun')
dbname = 'master'

for i in range(8):
    settings.set_run_directory(name = f'{dbname}_{i+1}', 
                               path = Path(torun_root, dbname, f'{i+1}'))
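
To preview which names and paths the loop above registers without changing any stored settings, the same naming scheme can be built into a plain dictionary. This is only a sketch: `PurePosixPath` is used so the printed paths look the same on every operating system, whereas the notebook itself uses `Path`, which renders with backslashes on Windows.

```python
from pathlib import PurePosixPath

# Same root and database name as in the cell above
torun_root = PurePosixPath('e:/calculations/ipr/torun')
dbname = 'master'

# Map each run directory name to the path it would be saved with
run_directories = {
    f'{dbname}_{i+1}': PurePosixPath(torun_root, dbname, f'{i+1}')
    for i in range(8)
}

for name, path in run_directories.items():
    print(name, path)
```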

The **load_run_directory()** function accesses the stored directory path associated with a run directory's name.

In [10]:
run_directory = iprPy.load_run_directory('master_1')
print(run_directory)

E:\calculations\ipr\torun\master\1


In [11]:
database = local

## 3. Build database by copying reference records into it

The **build_refs()** method copies the reference records in iprPy/library to the database for use in high-throughput calculations.

Parameters

- __lib_directory__ (*str or path, optional*) The directory path for the library.  If not given, then it will use the iprPy library directory.
- __refresh__ (*bool or list, optional*) If False (default) only new reference records are added.  If True, all existing reference records are refreshed by deleting the current ones in the database and uploading the references in lib_directory.  If a list is given, then only the reference record styles named in the list are refreshed.
- __include__ (*str or list, optional*) The reference record style(s) to copy to the database.  If not given, all record styles found in lib_directory are uploaded.
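
How `include` and `refresh` combine can be sketched with a small helper that only reports what would happen to each record style. The function `plan_build` is hypothetical and does not touch any database. It follows the parameter descriptions above, and additionally assumes that a single style name given as a string is treated like a one-item list.

```python
def plan_build(available, include=None, refresh=False):
    """Return {style: action} for a build_refs-like call (illustrative only)."""
    # include: None means every style found in the library
    if include is None:
        selected = list(available)
    elif isinstance(include, str):
        selected = [include]
    else:
        selected = list(include)

    # refresh: bool applies to all selected styles, list names specific styles
    if isinstance(refresh, bool):
        refresh_styles = set(selected) if refresh else set()
    elif isinstance(refresh, str):
        refresh_styles = {refresh}  # assumption: a str acts like a one-item list
    else:
        refresh_styles = set(refresh)

    return {
        style: 'refresh' if style in refresh_styles else 'add new only'
        for style in selected
    }


available = ['crystal_prototype', 'dislocation', 'free_surface', 'potential_LAMMPS']
plan = plan_build(available,
                  include=['dislocation', 'free_surface'],
                  refresh=['dislocation'])
print(plan)
```

Here only the `dislocation` references would be deleted and uploaded again, while `free_surface` would just receive any references not yet in the database.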

In [13]:
database = master
#refresh = False
refresh = False
include = [
    'crystal_prototype',
    'dislocation',
    'free_surface', 
    'point_defect',
    'potential_LAMMPS',
    'reference_crystal',
    #'relaxed_crystal',
    'stacking_fault',
]
#refresh = include = 'dislocation'

database.build_refs(refresh=refresh, include=include)

## 4. Check record numbers and status

The **check_records()** method reports how many records of a given style are stored in the database.  If the style is a calculation record, it also displays how many are unfinished, how many issued errors, and how many finished successfully.

In [14]:
database.check_records('potential_LAMMPS')

In database style mongo at localhost:27017.iprPy:
- 333 of style potential_LAMMPS


In [None]:
database.check_records('calculation_phonon')
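
The status breakdown reported for calculation records amounts to a tally over each record's status. A minimal sketch with the standard library follows. The helper `summarize_status` is hypothetical, and the status labels `'not calculated'`, `'error'` and `'finished'` are assumptions made for this example rather than values confirmed by the text above.

```python
from collections import Counter


def summarize_status(statuses):
    """Tally a list of status labels into the categories check_records describes."""
    counts = Counter(statuses)
    return {
        'total': len(statuses),
        'unfinished': counts['not calculated'],  # assumed label
        'errors': counts['error'],               # assumed label
        'finished': counts['finished'],          # assumed label
    }


# Made-up statuses for four calculation records
example = ['finished', 'finished', 'error', 'not calculated']
summary = summarize_status(example)
print(summary)
```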

## 5. Copy records between databases

The **copy_records()** method copies records from the current database to another database.  Either a record style or a list of records can be given to select what is copied, but not both.

Parameters
        
- **dbase2** (*iprPy.Database*) The database to copy to.

- **record_style** (*str, optional*) The record style to copy.  If record_style and records not given, then the available record styles will be listed and the user prompted to pick one.  Cannot be given with records.

- **records** (*list, optional*) A list of iprPy.Record objects from the current database to copy to dbase2.  Allows the user full control on which records to copy/update.  Cannot be given with record_style.

- **includetar** (*bool, optional*) If True, the tar archives will be copied along with the records. If False, only the records will be copied. (Default is True).

- **overwrite** (*bool, optional*) If False (default) only new records and tars will be copied. If True, all existing content will be updated.
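
The interplay of `record_style`, `records` and `overwrite` can be sketched with two plain dictionaries standing in for databases. Everything here is hypothetical: `copy_records_sketch` is not the iprPy method, records are represented by their names instead of iprPy.Record objects, tar archives are left out, and the interactive prompt shown when neither selector is given is replaced by an error.

```python
def copy_records_sketch(source, dest, record_style=None, records=None, overwrite=False):
    """Copy entries between dict-based toy databases; returns the number copied."""
    if record_style is not None and records is not None:
        raise ValueError('record_style and records cannot both be given')
    if record_style is None and records is None:
        # The real method prompts for a style here; the sketch simply stops
        raise ValueError('give either record_style or records')

    if records is None:
        names = [name for name, rec in source.items() if rec['style'] == record_style]
    else:
        names = list(records)

    copied = 0
    for name in names:
        if name in dest and not overwrite:
            continue  # existing content is left alone unless overwrite=True
        dest[name] = dict(source[name])
        copied += 1
    return copied


source_db = {
    'a': {'style': 'calculation_E_vs_r_scan', 'status': 'finished'},
    'b': {'style': 'calculation_E_vs_r_scan', 'status': 'finished'},
    'c': {'style': 'potential_LAMMPS', 'status': None},
}
dest_db = {'a': {'style': 'calculation_E_vs_r_scan', 'status': 'not calculated'}}

n_new = copy_records_sketch(source_db, dest_db, record_style='calculation_E_vs_r_scan')
status_before = dest_db['a']['status']
n_all = copy_records_sketch(source_db, dest_db, record_style='calculation_E_vs_r_scan',
                            overwrite=True)
print(n_new, status_before, n_all, dest_db['a']['status'])
```

The first call adds only the missing record `b`; the second call, with `overwrite=True`, also updates the stale copy of `a`.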

In [9]:
source_database = local
dest_database = master

record_style = 'calculation_E_vs_r_scan'
#records = source_database.get_records(...)

source_database.copy_records(dest_database, 
                             record_style=record_style,
                             #records=records,
                             includetar=True,
                             overwrite=True,
                            )

947 records to try to copy
947 records added/updated
947 tars added/updated


In [6]:
record = local.get_record(name='0a31c1dc-0377-406e-a6b4-36843e4cd39c')

In [7]:
tar = local.get_tar(record=record, raw=True)

In [8]:
master.add_tar(record=record, tar=tar)

## 6. Clean calculation records

The **clean_records()** method resets errored calculations of a specified record style.  Cleaning a record style means:

- Resetting any calculations that issued errors back into a run_directory

- Removing any .bid files in the calculation folders in the run_directory

This is useful for resetting and rerunning calculations that may have failed for reasons external to the calculation's method, e.g. runners that terminated early, parameter conflicts for a limited number of potentials, or calculations being debugged.

__WARNING:__ Conflicts may occur if you clean a run_directory that active runners are operating on, as the .bid files are used to avoid multiple runners working on the same calculation at the same time.

In [7]:
database.clean_records(record_style='calculation_E_vs_r_scan', run_directory='test_1')

1 records to clean


## 7. Destroy calculation records

The **destroy_records()** method deletes all records of a specified style.  This is useful if you want to reset any library records or rerun calculations with different parameters.

**WARNING:** This is a permanent delete even for local database styles.

In [12]:
#database.destroy_records('calculation_surface_energy_static')

## 8. Forget database information

The **unset_database()** and **unset_run_directory()** functions remove the saved settings for databases and run directories, respectively.

**NOTE:** Only the stored access information is removed; the records in a database and the files in a run_directory remain.

In [13]:
# Clear out existing definitions
#settings.unset_database(name='demo')
#settings.unset_run_directory(name='demo_1')
#settings.unset_run_directory(name='demo_2')
#settings.unset_run_directory(name='demo_3')
#settings.unset_run_directory(name='demo_4')