In [1]:
import os
import qm
from entropylab import *
import numpy as np
import matplotlib.pyplot as plt

# Using the HDF5 backend

The data persistence layer of Entropy supports saving your experiment results to either a SQL-based database (most simply a SQLite db) or the [HDF5 data format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format). 

By default, experiment results are saved to an HDF5 file and additional data is saved to the SQL database. 

**Note**: this is a change from earlier versions, where all data was saved to a single SQL file. 

This notebook demonstrates the feature, shows how to deactivate it, and explains how to migrate your existing databases.

We start by setting up the database files. The database entry point is a call to `SqlAlchemyDB` with the db file path. 

In [2]:
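db_file = 'docs_cache/tutorial.db'
hdf5_file = 'docs_cache/tutorial.hdf5'

# Remove leftovers from previous runs; check each file separately so a
# missing .hdf5 file does not raise FileNotFoundError
for path in (db_file, hdf5_file):
    if os.path.exists(path):
        os.remove(path)

db = SqlAlchemyDB(db_file)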


INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running stamp_revision  -> 04ae19b32c08


This creates two files: the `.db` file and, next to it in the same folder, a second file with the `.hdf5` extension. 

You can view the contents of HDF5 files directly with programs such as [HDFView](https://www.hdfgroup.org/downloads/hdfview/), or programmatically. In Python this can be done with the h5py and PyTables packages, and many other programming languages and environments (e.g. MATLAB) have tools for working with HDF5 files. 
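
As a minimal sketch of the programmatic route, the snippet below uses h5py to list every dataset in an HDF5 file. The helper name `list_datasets` and the demo file with its `demo_group/res` layout are stand-ins made up for illustration; they do not reflect the layout Entropy uses internally. Point the helper at your own `.hdf5` file to see its actual structure.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Return (name, shape, dtype) for every dataset in the HDF5 file at `path`."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


# Build a small stand-in file; its layout is hypothetical, not Entropy's
demo_path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("demo_group/res", data=np.array([1, 2, 3, 4]))

for name, shape, dtype in list_datasets(demo_path):
    print(name, shape, dtype)
```

Opening the file in read-only mode (`"r"`) keeps the inspection from modifying the results file.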

## Turning the feature off

As mentioned above, the HDF5 feature is turned on by default.

You can turn it off by using the `enable_hdf5_storage` flag on the `SqlAlchemyDB` function.



In [None]:
db = SqlAlchemyDB(path="mydb.db", enable_hdf5_storage=False)  # via the constructor


**Note**: Turning this feature off builds a DB with all results contained in the SQL file. If you later want to turn it on, you will need to migrate the DB using the upgrade tool we supply (introduced later in this notebook).

### Turning HDF5 off using a configuration file

You can also turn off the HDF5 feature by default through a configuration file, instead of passing the feature flag. 
To do this, create a file called `setting.toml` next to the `.py` file running your Entropy graph. 
The `.toml` file should have the following contents:

```toml
[toggles]
hdf5_storage = false
```

Even if the `.toml` file is present, the `SqlAlchemyDB` feature flag value overrides the behavior. 
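
The precedence described above can be sketched as a small function. This is only an illustration of the rule, not Entropy's implementation: `resolve_hdf5_enabled` is a hypothetical helper, the `settings` dictionary stands in for the parsed `.toml` contents, and using `None` to mean "flag not passed" is an assumption of the sketch.

```python
from typing import Any, Mapping, Optional


def resolve_hdf5_enabled(flag: Optional[bool], settings: Mapping[str, Any]) -> bool:
    """Hypothetical helper: decide whether HDF5 storage is active."""
    if flag is not None:
        # An explicit enable_hdf5_storage value always wins
        return flag
    # Otherwise fall back to the [toggles] table, defaulting to on
    return bool(settings.get("toggles", {}).get("hdf5_storage", True))


toml_off = {"toggles": {"hdf5_storage": False}}  # stand-in for the parsed file

print(resolve_hdf5_enabled(None, toml_off))  # no flag: the file decides
print(resolve_hdf5_enabled(True, toml_off))  # the flag overrides the file
print(resolve_hdf5_enabled(None, {}))        # no flag, no file: the default applies
```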

## Seeing experiment results as they are saved in HDF5

The following example graph generates a result, which is then saved to the HDF5 file.

In [4]:
db=SqlAlchemyDB(path=db_file,enable_hdf5_storage=True)
er = ExperimentResources(db)
def node_operation():
    return {'res':np.array([1,2,3,4])}

node1 = PyNode(label="first_node", program=node_operation,output_vars={'res'})
experiment = Graph(resources=er, graph={node1}, story="run_a") 
handle = experiment.run()

INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.


## Migration from sqlite

To migrate an existing DB to HDF5, use the following snippet:

In [5]:
file_to_migrate = 'some_sqlite_db.db'
results_backend.sqlalchemy.upgrade_db(path=file_to_migrate)

INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 1318a586f31d, Initial migration
INFO  [alembic.runtime.migration] Running upgrade 1318a586f31d -> 04ae19b32c08, Add col saved_in_hdf5


The migration tool copies the data from the SQLite file to the HDF5 file. This may take a while on larger database files. Get a coffee while it's working. 
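
Before running a long migration, it can help to check which revision a database file is already at. The sketch below reads the `alembic_version` table that Alembic keeps inside the SQLite file. It assumes that `04ae19b32c08`, the revision shown in the log above, is the one that adds HDF5 support; later Entropy releases may define newer revisions. The helper name `current_revision` and the demo database are made up for illustration.

```python
import os
import sqlite3
import tempfile

HDF5_REVISION = "04ae19b32c08"  # taken from the migration log above


def current_revision(db_path):
    """Return the Alembic revision stored in a SQLite file, or None if unstamped."""
    con = sqlite3.connect(db_path)
    try:
        has_table = con.execute(
            "SELECT 1 FROM sqlite_master "
            "WHERE type='table' AND name='alembic_version'"
        ).fetchone()
        if not has_table:
            # No version table: the DB was never stamped by Alembic
            return None
        row = con.execute("SELECT version_num FROM alembic_version").fetchone()
        return row[0] if row else None
    finally:
        con.close()


# Stand-in database so the sketch runs without a real Entropy DB
demo_db = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(demo_db)
con.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
con.execute("INSERT INTO alembic_version VALUES (?)", (HDF5_REVISION,))
con.commit()
con.close()

print(current_revision(demo_db) == HDF5_REVISION)
```

A result of `None` or an older revision suggests the file still needs the upgrade step.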