# Interface with data storage formats

Many data storage formats are widely used, each for its own purposes.
Our code produces a variety of data that can be stored in files for efficiency and, sometimes, readability.
However, the data structures we use are custom-made and are not directly supported by the existing interfaces to those formats.
Since we still want the advantages of these formats, the best solution is a lightweight interface of our own that helps store our data in them.

For now, we mainly use YAML and HDF5 to safely and efficiently store the data produced by our code.
The framework presented here can be reused in the future if a new data format becomes the norm.

## Generic Data Wrapper

This abstract class provides the basic framework for our wrapper class, and defines how the wrapper is expected to work.

The wrapper takes in data in the form of a dictionary, converts the keys and values into supported data types, and stores them in the data format it is designed for.

Before taking in the data, the wrapper needs to know the names (dictionary keys) of the data items and how to process them when storing them to or reading them from the data file. We refer to these processing steps as "actions": a save action takes a piece of data and stores it in the data file in a supported format; a load action reads the data from the data file and converts it to the format expected by our code.

In addition to actions, as a shortcut, it is also possible to specify a data type, which is linked to a load action and a save action.

Meanwhile, since many parts of our code are frequently reused for different purposes, we also want the data wrapper to be easy to reuse along with the code it is associated with. We overloaded the addition operator so that two data wrapper instances can be combined to store data from both parts in the same file.

The wrapper has two ways to save or load data:
1. "dumps" and "loads" save data to and load data from a text string, when the data format is text-based.
2. "dump" and "load" save data to and load data from a file; this is the only way to save or load data if the format is binary.
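
The design described above can be illustrated with a small sketch. This is not the actual abstract class: `SketchWrapper`, its `ACTIONS` and `DTYPES` tables, and the use of JSON as the text format are all assumptions made for illustration. It only shows how named save/load actions, the data type shortcut, the overloaded addition operator, and the two pairs of save/load methods might fit together.

```python
import json


class SketchWrapper:
    """Hypothetical stand-in for the generic wrapper (not the real API)."""

    # Each named action is a (save, load) pair of callables.
    ACTIONS = {
        "int": (int, int),
        # JSON has no complex type, so store [real, imag] and rebuild on load.
        "complex": (lambda z: [z.real, z.imag], lambda pair: complex(*pair)),
    }
    # Shortcut: a data type linked to one of the named actions.
    DTYPES = {int: "int", complex: "complex"}

    def __init__(self):
        # name -> (save, load); dicts keep registration order.
        self.items = {}

    def add_item(self, name, action=None, dtype=None):
        if action is None:
            action = self.DTYPES[dtype]
        self.items[name] = self.ACTIONS[action]

    def __add__(self, other):
        # Combine two wrappers so both sets of items go into one file.
        combined = type(self)()
        combined.items = {**self.items, **other.items}
        return combined

    def dumps(self, data):
        converted = {name: save(data[name])
                     for name, (save, _) in self.items.items()}
        return json.dumps(converted)

    def loads(self, text):
        raw = json.loads(text)
        return {name: load(raw[name])
                for name, (_, load) in self.items.items()}

    def dump(self, data, filename):
        with open(filename, "w") as handle:
            handle.write(self.dumps(data))

    def load(self, filename):
        with open(filename) as handle:
            return self.loads(handle.read())


w1 = SketchWrapper()
w1.add_item("natoms", action="int")
w2 = SketchWrapper()
w2.add_item("phase", dtype=complex)

both = w1 + w2
text = both.dumps({"natoms": 3, "phase": 1j})
restored = both.loads(text)
```

A binary format would implement only `dump` and `load`, since there is no meaningful text string to return from `dumps`.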

## YAMLWrapper

YAML is a very popular data format, similar to JSON, that is used in Python to store data in a readable and easily writable form. However, the default yaml module has a number of shortcomings that make the resulting YAML file harder to read than we would like.

Thus, derived from the generic wrapper, this YAML wrapper allows us to store various data from our code in
YAML in a consistent and readable format, provides more flexibility when writing the file by hand, and saves us
the trouble of converting the data into the safe types supported by the default yaml engine.

When reading and writing data, the wrapper keeps the data in the same order in which the names were registered with the wrapper.

In [1]:
import numpy as np

from principia_materia import Fraction
from principia_materia.io_interface.yaml_wrapper import YAMLWrapper

Below is a demonstration of how to use the YAMLWrapper.
First, let's create a wrapper to process the following data:
1.  "lattice_vector", an array of floats that represents the basis vectors of a lattice
2.  "point_group", a string, name of the point group of the lattice

In [2]:
yw1 = YAMLWrapper(default_filename="demo1.yml", title="Lattice")
yw1.add_item("lattice_vector", action="2d-array", optional=True)
yw1.add_item("point_group", dtype=str)

This wrapper allows us to process both the lattice vector and the point group, for example:

In [3]:
data1 = {
    "lattice_vector": np.array([
        [0.0, 0.5, 0.5],
        [0.5, 0.0, 0.5],
        [0.5, 0.5, 0.0],
    ]),
    "point_group": "Oh"
}

In [4]:
print(yw1.dumps(data1))

# Lattice
lattice_vector:
  [[ 0.00000000,  0.50000000,  0.50000000],
   [ 0.50000000,  0.00000000,  0.50000000],
   [ 0.50000000,  0.50000000,  0.00000000]]
point_group: Oh


In [5]:
yw1.loads("""\
# Lattice
lattice_vector:
  [[ 0.00000000,  0.50000000,  0.50000000],
   [ 0.50000000,  0.00000000,  0.50000000],
   [ 0.50000000,  0.50000000,  0.00000000]]
point_group: Oh""")

OrderedDict([('lattice_vector',
              array([[0. , 0.5, 0.5],
                     [0.5, 0. , 0.5],
                     [0.5, 0.5, 0. ]])),
             ('point_group', 'Oh')])

Now, we can create another wrapper to store the following information:
1. "natoms", number of atoms
2. "orbital", name of the orbital for the atoms, choice between "s", "p" and "d"
3. "qpoint", a q-point, array of Fraction

In [6]:
yw2 = YAMLWrapper(default_filename="demo2.yml", title="Extra lattice info")
yw2.add_item("natoms", action="int")
yw2.add_item("orbital", choice=["s", "p", "d"])
yw2.add_item("qpoint", action="fraction-array")

In [7]:
data2 = {
    "natoms": 3,
    "orbital": "p",
    "qpoint": np.array([Fraction(1, 2), Fraction(0, 1), Fraction(0, 1)]),
}

In [8]:
print(yw2.dumps(data2))

# Extra lattice info
natoms:  3
orbital: p
qpoint: [1/2, 0, 0]


In [9]:
yw2.loads("""\
# Extra lattice info
natoms:  3
orbital: p
qpoint: [1/2, 0, 0]""")

OrderedDict([('natoms', 3),
             ('orbital', 'p'),
             ('qpoint',
              array([Fraction(1, 2), Fraction(0, 1), Fraction(0, 1)], dtype=object))])
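
The text form `[1/2, 0, 0]` shown above hints at what a save/load action pair does for an array of fractions. The following is only a minimal sketch of that idea: `save_fractions` and `load_fractions` are hypothetical names, it uses Python's standard `fractions.Fraction` and plain lists rather than the `Fraction` from `principia_materia` and NumPy object arrays, and the actual `fraction-array` action may be implemented differently.

```python
from fractions import Fraction


def save_fractions(values):
    """Save action: turn a sequence of fractions into a bracketed string."""
    return "[" + ", ".join(str(value) for value in values) + "]"


def load_fractions(text):
    """Load action: parse the bracketed string back into Fraction objects."""
    inner = text.strip().strip("[]")
    return [Fraction(token.strip()) for token in inner.split(",")]


qpoint = [Fraction(1, 2), Fraction(0, 1), Fraction(0, 1)]
qpoint_text = save_fractions(qpoint)
print(qpoint_text)  # prints: [1/2, 0, 0]
qpoint_back = load_fractions(qpoint_text)
```

Storing fractions as text such as `1/2` keeps them exact and easy to edit by hand, which a floating-point representation would not.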

We can combine the two wrappers to process all the information together:

In [10]:
yw3 = yw1 + yw2
yw3.title = "All the information"

In [11]:
data = data1.copy()
data.update(data2.copy())
data

{'lattice_vector': array([[0. , 0.5, 0.5],
        [0.5, 0. , 0.5],
        [0.5, 0.5, 0. ]]),
 'point_group': 'Oh',
 'natoms': 3,
 'orbital': 'p',
 'qpoint': array([Fraction(1, 2), Fraction(0, 1), Fraction(0, 1)], dtype=object)}

In [12]:
print(yw3.dumps(data))

# All the information
lattice_vector:
  [[ 0.00000000,  0.50000000,  0.50000000],
   [ 0.50000000,  0.00000000,  0.50000000],
   [ 0.50000000,  0.50000000,  0.00000000]]
point_group: Oh
natoms:  3
orbital: p
qpoint: [1/2, 0, 0]


In [13]:
yw3.loads(yw3.dumps(data))

OrderedDict([('lattice_vector',
              array([[0. , 0.5, 0.5],
                     [0.5, 0. , 0.5],
                     [0.5, 0.5, 0. ]])),
             ('point_group', 'Oh'),
             ('natoms', 3),
             ('orbital', 'p'),
             ('qpoint',
              array([Fraction(1, 2), Fraction(0, 1), Fraction(0, 1)], dtype=object))])

## HDF5Wrapper

HDF5 is a popular data format for storing large amounts of data very efficiently.

Our wrapper interfaces many of the data types used in our code with the data types supported by the default h5py module.

It works in a similar way to the YAMLWrapper, except that it doesn't support loading from or saving to a text string.
However, when using the `dump` and `load` methods, instead of providing a path to an HDF5 file, one can provide an h5py file handle, which makes it possible to save the data into a subgroup of an HDF5 file.