# ASE databases

**ase-db: A database for ASE calculations**

The ASE database module offers a powerful way to explore and manage batches of calculations. For example, it is very convenient to parse a number of output files scattered across a directory structure into a single file containing all the useful data.

**ASE supports three backends:**
    
* SQLite3 is a fully-featured relational database system that stores databases in local files. It is fast and widely used for data serialisation in software. The file format is binary: you can inspect and modify ASE-generated SQLite3 files with regular sqlite3 tools, but some fields (e.g. atomic positions) are stored as binary blobs.
* JSON is a simple text-based format for data serialisation. It can be a good choice for long-term archiving and publication support. It will always be readable, but its human-friendliness is overrated...
* PostgreSQL is a more traditional server-hosted relational database system. This might be suitable for a group sharing data, but the configuration is correspondingly more complex.

**Resources:**
* For a more exhaustive overview, see the official ASE documentation: https://wiki.fysik.dtu.dk/ase/ase/db/db.html#querying
* A nice notebook with an overview of the topic, though it uses old Python 2 syntax: https://github.com/WMD-group/ASE-Tutorials/blob/master/ase-db/ase-db.ipynb
* ASE example of incorporating databases into a workflow: https://wiki.fysik.dtu.dk/ase/ase/db/db.html?highlight=database. Note that the sqlite3 library might not be readily available for all Python versions on supercomputers (e.g. HAWK has this issue).
* Server side databases using ASE db: https://ase-workshop.materialsmodeling.org/assets/talks/kirsten-t-winther.pdf
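Because the sqlite3 module may be missing from some Python builds on clusters, a script can test for it before choosing a file name. The helpers `sqlite3_available` and `choose_db_filename` below are hypothetical (not part of ASE); they simply fall back to a `.json` name so that the JSON backend is used instead. Note that checking with a real import is more reliable than looking for the module spec, since the `sqlite3` package can exist while its compiled `_sqlite3` extension is absent.

```python
def sqlite3_available() -> bool:
    """Return True if the sqlite3 module (and its C extension) imports."""
    try:
        import sqlite3  # noqa: F401
    except ImportError:
        return False
    return True


def choose_db_filename(stem: str) -> str:
    """Hypothetical helper: SQLite3 file name if possible, else JSON."""
    return stem + (".db" if sqlite3_available() else ".json")


name = choose_db_filename("results")
print(name)
```

The returned name can then be passed to `connect()` unchanged.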


Let's create some toy models:

```python
from ase.db import connect
from ase.build import molecule
from ase.calculators.emt import EMT
import numpy as np

H2 = molecule("H2")
CO2 = molecule("CO2")
O2 = molecule("O2")

# Attach an EMT calculator to each toy model and calculate its energy,
# so that the results are stored when the molecules are written
for atoms in [H2, CO2, O2]:
    atoms.calc = EMT()
    atoms.get_potential_energy()
```


```python
# By default, connect() appends entries to an existing database.
# We use append=False here to start from an empty file, so that
# re-running this example does not create duplicate entries.

# Using connect(...) in a with-statement lets us add all relevant entries
# while accessing the file only once, which avoids IO overhead
with connect("test_database.db", append=False) as db:
    # Keyword arguments after the Atoms object are stored as key-value pairs;
    # the data argument takes a dictionary of additional information
    db.write(H2, comment="hydrogen",
             data={"HH_distance": np.linalg.norm(H2[0].position - H2[1].position),
                   "initial_structure": molecule("H2")})
    db.write(CO2, comment="carbon dioxide")
    db.write(O2, comment="oxygen")
```

**Extracting information from the database**

```python
# The database can also be read with the ase.io.read parser, using the
# "filename@selection" syntax; the ASE GUI accepts the same syntax,
# e.g. ase gui test_database.db@comment=oxygen
from ase.io import read

# With a selection string, read() returns a list of Atoms objects
atoms = read("test_database.db@comment=oxygen")
print("Oxygen", atoms)

# Find all entries containing oxygen (more than zero O atoms)
atoms_list = read("test_database.db@O>0")
print("Oxygen containing species", atoms_list)
```


Oxygen [Atoms(symbols='O2', pbc=False, initial_magmoms=..., calculator=SinglePointCalculator(...))]
Oxygen containing species [Atoms(symbols='CO2', pbc=False, calculator=SinglePointCalculator(...)), Atoms(symbols='O2', pbc=False, initial_magmoms=..., calculator=SinglePointCalculator(...))]


```python
# db.select() yields AtomsRow objects, which contain more information for
# data analysis than plain Atoms and can be converted with .toatoms()
db = connect("test_database.db")

# Retrieving the AtomsRow entries that match the selection "H2"
for row in db.select("H2"):
    # Check the id of the entry (or entries) you have just selected
    print("Id of entry containing H2:", row.id)
```

Id of entry containing H2: 1


```python
# Retrieving an AtomsRow entry
for row in db.select("H2"):
    # Exploring a database you are not familiar with?
    # List all keys available for this entry
    H2_row = db.get(row.id)
    print("Keys:", [key for key in H2_row])
```

Keys: ['comment', 'id', 'unique_id', 'ctime', 'mtime', 'user', 'numbers', 'positions', 'cell', 'pbc', 'calculator', 'calculator_parameters', 'energy', 'forces']


Notice that `data` is not among the keys printed. Data entries allow more flexibility in what can be stored, but they are not searchable with selection queries. The information stored there is still easy to access:

```python
# Retrieving an AtomsRow entry
for row in db.select("H2"):
    print("Dictionary of all additional data:", row.data)
    print("HH_distance is:", row.data.get("HH_distance"))
```


Dictionary of all additional data: {'HH_distance': 0.737166, 'initial_structure': Atoms(symbols='H2', pbc=False)}
HH_distance is: 0.737166


You can also easily explore the data by typing in the terminal/command line:

`ase db filename.db -w`

This lets you inspect all data contained in the database interactively in your browser, by going to http://0.0.0.0:5000/.