# Database class

This Notebook outlines the Database class, which lets users interact with the database hosted at https://potentials.nist.gov.

Currently, the Database class provides in-depth support for three primary record types:
- __citations__, which contain bibliographic information for the articles associated with interatomic potentials,
- __potentials__, which provide the descriptions of the different interatomic models as they appear on the NIST Interatomic Potentials Repository, and
- __lammps_potentials__, which provide the metadata needed to generate LAMMPS input command lines for the LAMMPS-compatible potential implementations.

There is also basic support allowing for generic searching and downloading of any other record types hosted there.

Library imports

In [1]:
try:
    # Check if potentials has been installed
    import potentials
except ImportError:
    # Install if need be and print message
    !pip install potentials
    print('!!!!! RESTART NOTEBOOK KERNEL TO USE POTENTIALS !!!!!')
else:
    # Other imports
    import time
    from pathlib import Path
    # display and HTML live in IPython.display; IPython.core.display is deprecated for this use
    from IPython.display import display, HTML

## 1. Database initialization

The potentials.Database() class serves as the primary means of accessing records from the potentials database. Depending on the initialization settings, the class can access records from a CDCS instance and/or from a local directory. 

### Option #1: Default 

The default initialization with no parameters interacts only with https://potentials.nist.gov, and no records are loaded into memory. This is best for lightweight apps that perform single searches for potentials, as every search requires an HTTP request to the database.

In [2]:
s = time.time()

db_remote_no_load = potentials.Database()

print(f'init took {time.time()-s} seconds')

init took 0.0 seconds


### Option #2: Loading records from remote

Specifying load=True or load=[styles] will download all records of the indicated style(s) from https://potentials.nist.gov and store them in memory. Loading may take a few seconds, but all subsequent searches will be much faster because they operate on the loaded records. This is best for involved database explorations of the most current hosted potentials.

Records can also be (re)loaded after initialization with the load_all(), load_citations(), load_potentials() and load_lammps_potentials() methods.
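
The trade-off between Option #1 and Option #2 can be illustrated with a toy model. This is only a sketch: ToySource is a hypothetical stand-in that counts requests, not part of the potentials package, and the real class may issue more than one request when loading.

```python
class ToySource:
    """Hypothetical stand-in for a remote database that counts requests."""

    def __init__(self, records):
        self._records = records
        self.requests = 0

    def query(self, element):
        # One request per search, as in the no-load case
        self.requests += 1
        return [r for r in self._records if r['element'] == element]

    def fetch_all(self):
        # One request that returns everything, as in the load case
        self.requests += 1
        return list(self._records)


source = ToySource([{'id': 'toy-1', 'element': 'Al'},
                    {'id': 'toy-2', 'element': 'Fe'}])
searches = ['Al', 'Fe', 'Al']

# Without loading: every search goes to the remote source
for element in searches:
    source.query(element)
requests_without_load = source.requests

# With loading: fetch once, then search the in-memory copy
source.requests = 0
loaded = source.fetch_all()
for element in searches:
    matches = [r for r in loaded if r['element'] == element]
requests_with_load = source.requests

print(requests_without_load, requests_with_load)
```

The more searches a session performs, the more the single up-front load pays off.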

In [5]:
s = time.time()

db_remote_load = potentials.Database(load=True, verbose=True)

print(f'init took {time.time()-s} seconds')

Loaded 0 local citations
Loaded 0 remote citations
No citations loaded
Loaded 0 local potentials
Failed to load potentials from remote
No potentials loaded
Loaded 0 local LAMMPS potentials
Loaded 0 remote LAMMPS potentials
No LAMMPS potentials loaded
init took 683.3278486728668 seconds


### Option #3: Remote plus local

Giving a localpath lets the database interact with a local directory as well as https://potentials.nist.gov. This allows for

- the creation of a local copy of https://potentials.nist.gov by downloading all records, and
- user-defined or modified potentials to be integrated with the hosted ones.

NOTE: Searching the local records requires that they first be loaded. During loading, the local records are accessed before the remote ones. If the names of any records from the two locations are the same, the local versions are retained in the loaded set.
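
The precedence rule in the note can be sketched as a merge of two name-to-record mappings. This is a minimal illustration under stated assumptions, not the library's implementation: merge_records and the record names are hypothetical.

```python
def merge_records(local, remote):
    """Combine two name -> record mappings, keeping the local record on a name clash."""
    merged = dict(local)                  # local records are read first
    new_from_remote = 0
    for name, record in remote.items():
        if name not in merged:            # remote only fills in names missing locally
            merged[name] = record
            new_from_remote += 1
    return merged, new_from_remote


# Hypothetical records: 'potential-A' exists in both locations
local = {'potential-A': 'locally modified A', 'potential-C': 'user-defined C'}
remote = {'potential-A': 'hosted A', 'potential-B': 'hosted B'}

merged, n_new = merge_records(local, remote)
print(merged['potential-A'])
print(n_new)
```

The count of remote records that were actually added plays the same role as the " - N new" lines that appear in the verbose output.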

In [6]:
# cwd/testdb used here for testing purposes. SET TO REAL PATH FOR REAL OPERATIONS!
localpath = Path('testdb')

In [7]:
s = time.time()

db_remote_local = potentials.Database(localpath=localpath, load=True, verbose=True)

print(f'init took {time.time()-s} seconds')

Loaded 332 local citations
Loaded 0 remote citations
 - 0 new
Loaded 293 local potentials
Failed to load potentials from remote
Loaded 278 local LAMMPS potentials
Loaded 0 remote LAMMPS potentials
 - 0 new
init took 925.5792889595032 seconds


A local copy of the database (or a limited set of records) can easily be built using the download_all(), download_citations(), download_potentials() and download_lammps_potentials() methods. These methods will save all downloaded records to localpath.

In [8]:
db_remote_local.download_all(verbose=True)

0 citation records copied to localpath


SSLError: HTTPSConnectionPool(host='potentials.nist.gov', port=443): Max retries exceeded with url: /rest/template/5e0a34edab2f7c00263cf2c6/ (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')")))

### Option #4: Local only

If the records are stored locally, then the remote options can be turned off. This avoids performing any HTTP requests during normal operations. As such, it is much faster and can be used on resources that do not have internet access.

In [None]:
s = time.time()

db_local = potentials.Database(localpath=localpath, load=True, remote=False, verbose=True)

print(f'init took {time.time()-s} seconds')
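
Options #2 through #4 differ only in the localpath and remote settings, so the choice can be made programmatically. The helper below is a hypothetical convenience, not part of the potentials package; it only assembles the keyword arguments shown in the cells above.

```python
from pathlib import Path


def database_settings(localpath=None, have_network=True):
    """Build keyword arguments for potentials.Database (hypothetical helper)."""
    if localpath is None and not have_network:
        raise ValueError('need a localpath, network access, or both')

    # load=True keeps the records in memory; remote follows network availability
    settings = {'load': True, 'remote': have_network}
    if localpath is not None:
        settings['localpath'] = Path(localpath)
    return settings


offline = database_settings('testdb', have_network=False)
online = database_settings(have_network=True)
print(offline)
print(online)
```

The resulting dictionary would then be passed on as potentials.Database(**offline).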

### Option #5: Authorized user

Uploading new content to https://potentials.nist.gov is restricted to users who have accounts and write permissions. Please contact [potentials@nist.gov](mailto:potentials@nist.gov) if you are interested in contributing content to the database.

In [None]:
db_auth = potentials.Database(username='lmh1')

## 2. Get methods

The various get methods search the available records for matches.

### 2.1 get_citation()

get_citation() searches the citations and returns a single citation for a given DOI. The method first checks the loaded citations, then https://potentials.nist.gov, and finally CrossRef for a match.
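
The lookup order described above amounts to a fallback chain. The sketch below shows the pattern with hypothetical stand-ins for the three sources; first_match is not a library function, and what the real method does when nothing matches is not specified here (the sketch raises an error).

```python
def first_match(doi, sources):
    """Try each (label, lookup) pair in order and return the first hit."""
    for label, lookup in sources:
        result = lookup(doi)
        if result is not None:
            return label, result
    raise ValueError(f'no citation found for DOI {doi}')


# Hypothetical stand-ins: nothing loaded, one hosted record, CrossRef never reached
loaded = {}
hosted = {'10.1016/j.actamat.2003.11.026': 'record from the hosted database'}
sources = [
    ('loaded', loaded.get),
    ('remote', hosted.get),
    ('crossref', lambda doi: None),
]

label, record = first_match('10.1016/j.actamat.2003.11.026', sources)
print(label)
```

Because the loaded citations are checked first, a database with loaded records can answer without contacting either remote service.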

In [None]:
cit = db_remote_load.get_citation('10.1016/j.actamat.2003.11.026', verbose=True)

In [None]:
cit = db_remote_no_load.get_citation('10.1016/j.actamat.2003.11.026', verbose=True)

The returned Citation object allows the content to be retrieved as a BibTeX string, a Python dictionary, or formatted HTML.

In [None]:
print(cit.bibtex)

In [None]:
cit.asdict()

In [None]:
display(HTML(cit.html()))

### 2.2 get_potentials(), get_potential()

These methods search the hosted potentials and return the descriptions hosted on the NIST Interatomic Potentials Repository. get_potentials() always returns a list of matches, while get_potential() returns a single match if exactly one is found and raises an error otherwise. Both methods accept the following search parameters:

- __id__ (*str or list, optional*) Potential ID(s) to search for.  These are unique identifiers derived from the publication information and the elemental system being modeled.

- __key__ (*str or list, optional*) UUID4 key(s) to search for.  Each entry has a unique random-generated UUID4 key.

- __author__ (*str or list, optional*) Author string(s) to search for.

- __year__ (*int or list, optional*) Publication year(s) to search for.

- __element__ (*str or list, optional*) Element model(s) to search for.

- __localpath__ (*str, optional*) Path to a local directory to check for records first.  If not given, the localpath value set during object initialization is used.  If no localpath is given here or during initialization, then only the remote database will be loaded.

- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.
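
The difference between the plural and singular methods, and the way a list-valued parameter widens a search, can be sketched on plain dictionaries. This is an illustration only: find_all, find_one, the exact-equality matching, the ValueError, and the toy records are all assumptions rather than the library's behavior.

```python
def as_list(value):
    """Wrap a single value in a list so scalars and lists are handled alike."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def find_all(records, **criteria):
    """Return records satisfying every criterion; a list matches any of its values."""
    return [
        record for record in records
        if all(record[field] in as_list(wanted)
               for field, wanted in criteria.items())
    ]


def find_one(records, **criteria):
    """Return the single matching record, or raise if there are zero or several."""
    matches = find_all(records, **criteria)
    if len(matches) != 1:
        raise ValueError(f'expected exactly one match, found {len(matches)}')
    return matches[0]


# Toy records with hypothetical ids
records = [
    {'id': 'toy-2003-Fe', 'year': 2003, 'element': 'Fe'},
    {'id': 'toy-2007-Fe', 'year': 2007, 'element': 'Fe'},
    {'id': 'toy-1999-Al', 'year': 1999, 'element': 'Al'},
]

several = find_all(records, element='Fe', year=[2003, 2007])
single = find_one(records, element='Al', year=1999)
print([r['id'] for r in several])
print(single['id'])
```

Use the plural form when the number of matches is unknown, and the singular form when any ambiguity should stop the workflow.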


In [None]:
# Get all Mendelev Fe potentials from 2003 and 2007
pots = db_local.get_potentials(author='Mendelev', element='Fe', year=[2003, 2007])
for pot in pots:
    print(pot.id)

In [None]:
# Get the Mishin Al potential from 1999
pot = db_local.get_potential(author='Mishin', element='Al', year=1999)
print(pot.id)

The returned Potential object(s) can be viewed as HTML.

In [None]:
display(HTML(pot.html()))

Individual values can also be accessed as attributes. All associated attributes can be seen using asdict().

In [None]:
print(pot.asdict())

In [None]:
print(pot.recorddate)

### 2.3 get_lammps_potentials(), get_lammps_potential()

These methods search the LAMMPS potential records that assist with building LAMMPS input command lines, and can also download or locate the LAMMPS potential parameter files. get_lammps_potentials() always returns a list of matches, while get_lammps_potential() returns a single match if exactly one is found and raises an error otherwise. Both methods accept the following parameters:

- __id__ (*str or list, optional*) The id value(s) to limit the search by.
        
- __key__ (*str or list, optional*) The key value(s) to limit the search by.
        
- __potid__ (*str or list, optional*) The potid value(s) to limit the search by.
        
- __potkey__ (*str or list, optional*) The potkey value(s) to limit the search by.
        
- __status__ (*str or list, optional*) The status value(s) to limit the search by.
        
- __pair_style__ (*str or list, optional*) The pair_style value(s) to limit the search by.
        
- __element__ (*str or list, optional*) The included elemental model(s) to limit the search by.
        
- __symbol__ (*str or list, optional*) The included symbol model(s) to limit the search by.
        
- __verbose__ (*bool, optional*) If True, informative print statements will be used.
        
- __get_files__ (*bool, optional*) If True, then the parameter files for the matching potentials will also be retrieved and copied to the working directory. If False (default) and the parameter files are in the library, then the returned objects' pot_dir path will be set appropriately.

In [None]:
# Get all potentials with the bop pair style
lmppots = db_local.get_lammps_potentials(pair_style='bop')
for lmppot in lmppots:
    print(lmppot.id)

In [None]:
# Get the LAMMPS potential associated with the 1999 Mishin Al potential found above
lmppot = db_local.get_lammps_potential(potkey=pot.key)
print(lmppot.id)

The returned LAMMPSPotential object(s) can be used to generate LAMMPS command lines for the potential.  If the parameter files for the LAMMPS potential are located in the localpath directory, then the command lines will point to the correct location.

In [None]:
print(lmppot.pair_info())

In [None]:
print(lmppot.pair_info(symbols=['Al', 'Al']))

If the parameter files have not been downloaded, or you want to copy them to the working directory, then you can use the get_files setting.

In [None]:
lmppot = db_local.get_lammps_potential(potkey=pot.key, get_files=True, verbose=True)
print(lmppot.pair_info())

## 3. Custom searches

The Database class also provides access to the underlying tools used to perform the database searches.

- __Database.citations_df, Database.potentials_df, and Database.lammps_potentials_df__: if records of the given style have been loaded, then each of these is a pandas.DataFrame containing the dictionary representations of all the loaded record objects.

- __Database.citations, Database.potentials, and Database.lammps_potentials__: if records of the given style have been loaded, then these are numpy arrays containing all the loaded record objects.  The order of the records in these arrays is the same as in the corresponding DataFrames, so any conditional searches on the DataFrame values can be directly applied to these arrays.

- __Database.cdcs__: the underlying cdcs.CDCS API client. Accessing this directly allows custom-built queries and REST calls to be constructed.
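
Because the DataFrames and the record arrays share the same order, a boolean condition built on a DataFrame can index the matching array directly. The sketch below demonstrates the pattern on toy data; ToyRecord and its values are hypothetical stand-ins for loaded record objects.

```python
import numpy as np
import pandas as pd


class ToyRecord:
    """Hypothetical stand-in for a loaded record object."""

    def __init__(self, id, pair_style):
        self.id = id
        self.pair_style = pair_style

    def asdict(self):
        return {'id': self.id, 'pair_style': self.pair_style}


# An object array of records and a DataFrame built from them in the same order
records = np.array([ToyRecord('toy-1', 'eam/alloy'),
                    ToyRecord('toy-2', 'bop'),
                    ToyRecord('toy-3', 'eam/alloy')], dtype=object)
df = pd.DataFrame([record.asdict() for record in records])

# Build the condition on the DataFrame, then apply it to the array
mask = (df['pair_style'] == 'eam/alloy').to_numpy()
selected = records[mask]
selected_ids = [record.id for record in selected]
print(selected_ids)
```

With loaded records, the same pattern should apply to pairs such as db_local.lammps_potentials_df and db_local.lammps_potentials.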

In [None]:
db_local.lammps_potentials_df

## 4. Other database records

https://potentials.nist.gov also hosts record types other than the three primary potentials-centric styles. The Database class has some basic methods supporting these other record types as well.

- __download_records()__ downloads all records of a given template (style) to the localpath.

- __get_record()__ retrieves a single record of a given template by name, either from localpath or from the remote database.  The retrieved record is returned as a DataModelDict.DataModelDict object.

