# ASE databases

**ase-db: A database for ASE calculations**

The ASE database feature offers a powerful way to explore and manage batches of calculations. It is often convenient to parse a number of output files scattered across a directory structure and collect all the useful data in a single file.

**ASE supports three backends:**
    
* `SQLite3` is a fully featured relational database system that stores databases in local files. It is fast and widely used for data serialisation in software. The file format is binary: you can inspect and modify ASE-generated SQLite3 files with regular sqlite3 tools, but some fields (e.g. atomic positions) are stored as binary blobs.
* `JSON` is a simple text-based format for data serialisation. It can be a good choice for long-term archiving and publication support: it will always be readable, though its human-friendliness is overrated...
* `PostgreSQL` is a more traditional server-hosted relational database system. It may suit a group sharing data, but the configuration is correspondingly more complex.

**Resources:**
* For a more exhaustive overview, see the official ASE tutorial: https://wiki.fysik.dtu.dk/ase/ase/db/db.html#querying
* A nice notebook giving an overview of the topic, though it uses old Python 2 syntax: https://github.com/WMD-group/ASE-Tutorials/blob/master/ase-db/ase-db.ipynb
* ASE example of incorporating databases into a workflow: https://wiki.fysik.dtu.dk/ase/ase/db/db.html?highlight=database. Note that the `sqlite3` library might not be available in every Python installation on supercomputers (HAWK, for example, has this issue)!
* Server side databases using ASE db: https://ase-workshop.materialsmodeling.org/assets/talks/kirsten-t-winther.pdf


To get into the details, let's create some toy models.

First, we create H₂, CO₂ and O₂ molecules and calculate their potential energies with the EMT calculator:

```python
from ase.build import molecule
from ase.calculators.emt import EMT


H2 = molecule("H2")
CO2 = molecule("CO2")
O2 = molecule("O2")

# Attach an EMT calculator to each toy model and calculate its energy
for atoms in [H2, CO2, O2]:
    atoms.calc = EMT()
    print(atoms.symbols, ":", atoms.get_potential_energy(), "eV")
```

```
H2 : 1.1588634908331514 eV
CO2 : 0.8862766630335832 eV
O2 : 0.9226813260843298 eV
```


Now that we have some data, let's put it into a database.

To open (or create) a database, we use the `connect()` function from `ase.db`. By default, new data is appended to an existing database; to start a fresh database, set `append=False` when connecting (note: this discards any existing data in that file!).

Importantly, we don't need to repeatedly open and close the database to add data. Using `connect()` in a `with` statement keeps the database object open for as long as needed to add all the information, and closes it automatically once the block is finished. For the SQLite3 backend, the writes inside the block are also committed together at the end, which is much faster than committing after every single write (and it guarantees the output is closed, which is good practice!).

```python
from ase.db import connect

with connect("test_database.db", append=False) as db:
    print("Database is now connected:", db) # Now we have a db object
```

```
Database is now connected: <ase.db.sqlite.SQLite3Database object at 0x7fdacec8af70>
```


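A `with` block cannot continue into the next notebook cell, so the database above has already been closed. To add data, we reconnect (this time appending, which is the default) and call `db.write()` inside a new `with` block. We pass in the `Atoms` object directly, plus any additional information we want: here a `comment` for every molecule and, for hydrogen, a `data` dictionary holding the H-H bond distance and the initial structure. `db.write()` returns the ID of the row it creates.

```python
import numpy as np # Needed for bond distance calculation
from ase.build import molecule
from ase.db import connect

with connect("test_database.db") as db:
    db.write(H2, comment="hydrogen",
             data={'HH_distance': np.linalg.norm(H2[0].position - H2[1].position),
                   'initial_structure': molecule("H2")})
    db.write(CO2, comment="carbon dioxide")
    last_id = db.write(O2, comment="oxygen")  # ID of the newly written row

print(last_id)
```

```
3
```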

Once the `with` block ends, the database is closed, and we can start extracting information from it.

In the simplest case, `read()` from `ase.io` can pull `Atoms` objects straight out of the database: append `@` and a selection query (e.g. `comment=oxygen`) to the file name. You can then, for example, visualise any selected `Atoms` object, and the same syntax also opens the selection natively in the ASE GUI.

```python
from ase.io import read
# Find entries whose comment is "oxygen"
atoms_list = read("test_database.db@comment=oxygen")
print("Oxygen", [atoms.symbols for atoms in atoms_list])

# Find all entries containing at least one oxygen atom (O>0)
atoms_list = read("test_database.db@O>0")
print("Oxygen containing species", [atoms.symbols for atoms in atoms_list])
```

```
Oxygen [Symbols('O2')]
Oxygen containing species [Symbols('CO2'), Symbols('O2')]
```


Whilst this is convenient, each database entry holds more than just the structure. We can access this extra information by selecting a row of interest, which `db.select()` returns as an `AtomsRow` object.

Let's connect to the database again to read some data, starting by finding the ID of the hydrogen molecule:

```python
with connect("test_database.db") as db:

    # Retrieving an AtomsRow entry
    for row in db.select("H2"):
        print("Id of entry containing H2:", row.id)
```

```
Id of entry containing H2: 1
```


This gives us the ID of the H₂ entry. Let's fetch that entry with `db.get()` and list all the keys (i.e. the stored fields) we can extract from the database for each entry:

```python
with connect("test_database.db") as db:
    H2_row = db.get(row.id)  # Fetch the full AtomsRow by its ID
    print("Keys:", list(H2_row))
```

```
Keys: ['comment', 'id', 'unique_id', 'ctime', 'mtime', 'user', 'numbers', 'positions', 'cell', 'pbc', 'calculator', 'calculator_parameters', 'energy', 'forces']
```


Notice that `data` is not among the keys printed, despite being saved above! The `data` dictionary is more flexible than key-value pairs such as `comment` (it can hold almost anything, here a float and an `Atoms` object), but its contents cannot be used in selection queries. If you know it exists, you can still access the information directly through the row:

```python
print("Dictionary of all additional data:", row.data)
print("HH_distance is:", row.data.get("HH_distance"), "Angstrom")
```

```
Dictionary of all additional data: {'HH_distance': 0.737166, 'initial_structure': Atoms(symbols='H2', pbc=False)}
HH_distance is: 0.737166 Angstrom
```


As a final note, besides Python, you can also easily explore the data by typing in the terminal/command line:

`ase db filename.db -w`

This lets you inspect all the data in the database interactively in your browser: after running the command, navigate to http://0.0.0.0:5000/ (or http://localhost:5000/).