# Getting `ase.db` to do what we want

James Kermode, June 2015

The [ASE database module](https://wiki.fysik.dtu.dk/ase/ase/db/db.html) is a lightweight way to store atomic configurations. However, it doesn't yet do everything we need:

 * No support for storing original output files from calculations along with records
 * No command line tool for extracting configurations to files
 * No easy way to include arbitrary per-frame and per-atom data

I've extended `ase.db.core.Database.write()` and `ase.db.row.AtomsRow.toatoms()` so that all data in the `Atoms.info` and `Atoms.arrays` dictionaries can be stored in and restored from database records, and added some new command line options to `ase-db` to do what we need:

 * `-o/--store-original-file` - attach the original input file to the database record as a string in the `data` section
 * `-x/--extract-original-file` - write the original input files stored with `-o` back to disk

 * `-W/--write-to-file [type]:filename` - write rows matching a query to files, optionally specifying the format
 * `-A/--all-data` - include the contents of `atoms.info` (per-frame dictionary) and `atoms.arrays` (per-atom dictionary) in database records
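Conceptually, `-A` has to decide where each entry of `atoms.info` and `atoms.arrays` ends up in a record. Stock ASE only accepts simple scalar values as searchable key-value pairs, while the `data` section can hold arbitrary objects. The sketch below assumes (our assumption, not necessarily how the extended `write()` does it) that scalar `info` entries with identifier-like keys become key-value pairs and everything else, including per-atom arrays other than `numbers` and `positions`, goes into `data`. `split_for_db` is a hypothetical helper, and plain lists stand in for NumPy arrays:

```python
def split_for_db(info, arrays, skip_arrays=("numbers", "positions")):
    """Hypothetical helper: split per-frame and per-atom data for a db record.

    Scalars (bool, int, float, str) stored under identifier-like keys become
    searchable key-value pairs; every other info entry, plus every per-atom
    array not listed in ``skip_arrays``, goes into the free-form data dict.
    """
    key_value_pairs, data = {}, {}
    for key, value in info.items():
        if isinstance(value, (bool, int, float, str)) and key.isidentifier():
            key_value_pairs[key] = value
        else:
            data[key] = value
    for key, value in arrays.items():
        if key not in skip_arrays:
            data[key] = value
    return key_value_pairs, data


# Plain-Python stand-ins for atoms.info and atoms.arrays of an 8-atom cell;
# 'tags_per_frame' is an invented non-scalar entry for illustration
info = {"integer_info": 42, "real_info": 217.0, "config_type": "diamond",
        "tags_per_frame": [1, 2, 3]}
arrays = {"numbers": [14] * 8, "positions": [[0.0, 0.0, 0.0]] * 8,
          "array_data": [1] * 8}
kvp, data = split_for_db(info, arrays)
print(kvp)   # searchable scalars
print(data)  # everything else
```

Keeping scalars searchable is what makes queries such as `config_type=diamond` possible, while bulky per-atom arrays stay out of the query columns.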

## Generate some dummy data

```python
import glob
import numpy as np
from ase.io import read, write
from ase.lattice import bulk  # newer ASE releases: from ase.build import bulk
from ase.calculators.singlepoint import SinglePointCalculator

N = 10

!rm -f dump*.xyz dump*.cif
for i in range(N):
    atoms = bulk('Si', crystalstructure='diamond', a=5.43, cubic=True)
    # rattle() uses a fixed default seed, so pass a different one per frame
    # or every configuration ends up identical
    atoms.rattle(seed=i)

    # simulate a calculation with random results
    e = np.random.uniform()
    f = np.random.uniform(size=(len(atoms), 3))
    s = np.random.uniform(size=(3, 3))
    atoms.calc = SinglePointCalculator(atoms, energy=e, forces=f, stress=s)

    # add some arbitrary per-frame (info) and per-atom (arrays) data
    atoms.info['integer_info'] = 42
    atoms.info['real_info'] = 217.0
    atoms.info['config_type'] = 'diamond'
    atoms.new_array('array_data', np.ones_like(atoms.numbers))

    atoms.write('dump_%03d.xyz' % i, format='extxyz')
```
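Because the files are written in extended XYZ format, the entries added to `atoms.info` travel on each frame's second (comment) line as `key=value` pairs, which is what lets them survive the round trip. A simplified sketch of splitting such a line back into typed values, using a hand-written example line rather than the exact output of ASE's writer; `parse_comment_line` is a hypothetical helper, and ASE's own extxyz reader handles many more cases (escaped quotes, per-atom `Properties` columns, and so on):

```python
import shlex


def _convert(token):
    """Convert one scalar token: int, then float, then T/F logical, else string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    if token in ("T", "F"):
        return token == "T"
    return token


def parse_comment_line(line):
    """Split an extxyz-style comment line into a dict of typed values."""
    info = {}
    # shlex keeps quoted values such as Lattice="..." together as one token
    for item in shlex.split(line):
        key, sep, value = item.partition("=")
        if not sep:
            info[key] = True  # a bare key is treated as a flag
            continue
        parts = value.split()
        # single values become scalars, space-separated values become lists
        info[key] = _convert(parts[0]) if len(parts) == 1 else [_convert(p) for p in parts]
    return info


line = ('Lattice="5.43 0.0 0.0 0.0 5.43 0.0 0.0 0.0 5.43" '
        'Properties=species:S:1:pos:R:3 integer_info=42 real_info=217.0 '
        'config_type=diamond pbc="T T T"')
info = parse_comment_line(line)
print(info["integer_info"], info["real_info"], info["config_type"], info["pbc"])
```

Note that `real_info` only comes back as a float if it was written as one, which is why the dummy data stores `217.0` rather than `217`.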

## Make a `.db` file and add our `.xyz` files, attaching original files with `-o`

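```python
!rm -f tmp.db
for addfile in sorted(glob.glob("dump*.xyz")):
    # -A: include info/arrays, -o: attach the original file, -a: add the structure from file
    !ase-db tmp.db -Aoa $addfile
```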

```python
!ase-db tmp.db
```

## Extract records from database in various file formats

* First, let's write record #1 to standard output (`-`) in VASP `POSCAR` format

```python
!ase-db tmp.db 1 -W vasp:-
```
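The `-W` argument packs two things into one string: an optional ASE format name before the colon and a target filename after it, with `-` standing for standard output. A sketch of how such a spec could be split and expanded per row, assuming that a printf-style placeholder such as `%02d` is filled with the row id (our reading of the `dump_%02d.cif` example in this notebook, not documented behaviour). `parse_write_spec` and `target_for_row` are hypothetical helpers:

```python
def parse_write_spec(spec):
    """Split '[format:]filename' into (format or None, filename)."""
    fmt, sep, filename = spec.partition(":")
    if not sep:
        # No colon: the whole spec is a filename; the writer guesses the format
        return None, spec
    return fmt or None, filename


def target_for_row(filename, row_id):
    """Fill a printf-style placeholder (e.g. '%02d') with the row id, if present."""
    if filename == "-":
        return filename  # '-' means standard output
    return filename % row_id if "%" in filename else filename


print(parse_write_spec("vasp:-"))
print(parse_write_spec("dump_%02d.cif"))
print(target_for_row("dump_%02d.cif", 7))
```

Splitting on the first colon only keeps the parser simple; a real implementation would need extra care for filenames that themselves contain colons.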

* Now, we extract the same record in extended XYZ format, including all the additional data (`-A` option)

```python
!ase-db tmp.db 1 -AW -
```

* A more complicated query: we extract all configs with `energy>0.7` and write each one to a separate CIF file

```python
!ase-db tmp.db 'energy>0.7' -W dump_%02d.cif
!ls dump*.cif
```

## Recover original files with `-x` argument

```python
!rm -f dump*.xyz
!ase-db tmp.db -x
!ls dump*.xyz
!cat dump_000.xyz
```

## Access via the Python API

```python
from ase.db import connect

con = connect('tmp.db')
for row in con.select('id=1'):
    # add_to_info_and_arrays comes from the extended toatoms() described above
    atoms = row.toatoms(add_to_info_and_arrays=True)
    print(atoms, 'original_file', atoms.info['original_file_name'])
```