# Explore data using ezHDF
ezHDF also offers a convenient API for exploring data in an HDF5 file, provided the data was stored with ezHDF's hdf_store. Let's first load the file we just stored. Remember to use mode = 'r' or 'a': using 'w' will erase all your stored data.
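Because opening with mode 'w' wipes an existing file, it can be worth guarding against it in your own scripts. The sketch below is a hypothetical helper (safe_mode is not part of ezHDF) that uses only the standard library: it passes 'r' and 'a' through unchanged and refuses 'w' when the target file already exists.

```python
import os

# Hypothetical helper, not part of ezHDF: stop mode='w' from truncating
# an HDF5 file that already holds data.
ALLOWED_MODES = ("r", "a", "w")


def safe_mode(path, mode):
    """Return `mode` unchanged, or raise if it would erase an existing file."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if mode == "w" and os.path.exists(path):
        raise FileExistsError(
            f"{path} already exists; mode='w' would erase it. Use 'r' or 'a'."
        )
    return mode
```

You could then pass its result as the mode argument, e.g. mode = safe_mode(os.path.join(wkdir, 'my_hdf.h5'), 'r'), assuming hdf_store reads the file at wkdir joined with hdf_name.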

In [1]:
# make the local ezHDF checkout importable (adjust this path to your machine)
import sys
sys.path.insert(0, '/Users/shutingpi/Dropbox/ezHDF')
from ezHDF.ezHDF import hdf_store
import pandas as pd



In [2]:
wkdir = '/Users/shutingpi/Dropbox/ezHDF/example/'
store = hdf_store(wkdir = wkdir, hdf_name = 'my_hdf.h5', mode = 'r')  # read-only; mode = 'w' would erase the file

# Check info of the file

In [3]:
store.info()


--- ezHDF hdf_store info ---

dataset name: data1
column names:
   ['str0', 'int1', 'float2', 'float3', 'str4', 'str5', 'str6']
column dtype: s , i , f , f , s , s , s
n_rows: 10000
n_container: 10000

dataset name: data2
column names:
   ['str0', 'int1', 'str2', 'str3', 'float4', 'float5', 'str6']
column dtype: s , i , s , s , f , f , s
n_rows: 20000
n_container: 20000



# Explore a dataset
To explore the dataset "data1", simply call ds_explorer. It creates a dataset explorer object that you can use to manipulate the data.

In [10]:
ds = store.ds_explorer(ds_name = 'data1')

# Check information of the ds object
Just print it; this shows all the key information about the dataset.

In [5]:
print(ds)

ezHDF hanlder object:
  dataset name: data1
  column names: 
  [str0 , int1 , float2 , float3 , str4 , str5 , str6]
  column dtype:
  [s , i , f , f , s , s , s]
  size of data: 10000
  size of container 10000


# Fetch data by slicing
Now you can fetch data by slicing with a row index and column names. The result is conveniently returned as **a pandas DataFrame!**  
  
Note that:  
* Columns cannot be selected with a slice such as 1:3. Use a single column name or index, or a list of them.
* In HDF5, data is loaded into RAM only when you slice it, so with a huge dataset you should slice out one chunk at a time rather than the whole thing.
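For a dataset too large to load at once, the row slice can be advanced in fixed-size steps so that only one block is read into RAM at a time. Below is a minimal sketch: iter_chunks is a hypothetical helper that relies only on the ds[rows, columns] slicing shown in this notebook. The row count is passed in explicitly (e.g. n_rows from store.info()) because the sketch does not assume the explorer object exposes it as an attribute.

```python
def iter_chunks(ds, n_rows, columns, chunk_size=1000):
    """Yield ds[start:stop, columns] for consecutive row blocks.

    Hypothetical helper: each slice is read from disk only when the
    generator reaches it, so one block is materialised at a time.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)  # last block may be shorter
        yield ds[start:stop, columns]
```

For example, with total initialised to 0, looping over iter_chunks(ds, 10000, ['float2'], chunk_size=2000) and adding chunk['float2'].sum() each time would sum float2 over data1 while handling one 2000-row DataFrame at a time.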

In [6]:
# slice using a single column name
df = ds[0:5,'str0']
print(df)

                str0
0           hJNDpcpA
1  jGwbJmIFYwSwhjeVh
2            ZNpYSwb
3           nQjBoPjp
4             BTQMgU


In [7]:
# you can also slice using a list of column names
df = ds[0:5,['str0','float2','str5']]
print(df)

                str0    float2                str5
0           hJNDpcpA  0.423237     doKrkpzJmKGDDix
1  jGwbJmIFYwSwhjeVh  0.979484             nJIaonN
2            ZNpYSwb  0.963585         PLRYcItYPoe
3           nQjBoPjp  0.381746               JseDV
4             BTQMgU  0.309559  XyYvSGFkcaQNPNpgmS


In [8]:
# you can also slice using a numeric column index
df = ds[0:5,2]
print(df)

     float2
0  0.423237
1  0.979484
2  0.963585
3  0.381746
4  0.309559


In [11]:
# or a list of numeric column indexes
df = ds[0:5, [0,2,4,6]]
print(df)

                str0    float2                  str4                 str6
0           hJNDpcpA  0.423237        bZUXqFubbyCKCP       vySjHlNofqeNeE
1  jGwbJmIFYwSwhjeVh  0.979484        LVdZZEyENWdcvZ            edBTaHlEA
2            ZNpYSwb  0.963585  SoVCJtYtiuAAOXpYmxny  OcjYwHwrtFErHYhzGMN
3           nQjBoPjp  0.381746    fmlKBIMbLjBvJYDitK                yuGNa
4             BTQMgU  0.309559    AeYzNJgdDynHqEyWDD              beiMCJY
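In Python, an expression such as ds[0:5, [0,2,4,6]] reaches the object's __getitem__ method as the tuple (slice(0, 5), [0, 2, 4, 6]). The sketch below shows one plausible way the column part could be resolved into names under the rules in the note above: a name, a position, or a list of them is accepted, and a slice is rejected. resolve_columns is hypothetical, not ezHDF's actual implementation, and it also allows names and positions to be mixed in one list, which ezHDF may not.

```python
def resolve_columns(key, column_names):
    """Turn the column part of a ds[rows, cols] key into a list of names.

    Hypothetical sketch: accepts a column name, an integer position, or a
    list of these; rejects slices such as 1:3.
    """
    if isinstance(key, slice):
        raise TypeError("column slices like 1:3 are not supported; use a list")
    keys = key if isinstance(key, list) else [key]
    names = []
    for k in keys:
        if isinstance(k, int):
            names.append(column_names[k])  # position -> name
        elif isinstance(k, str):
            if k not in column_names:
                raise KeyError(k)
            names.append(k)
        else:
            raise TypeError(f"unsupported column key: {k!r}")
    return names


# column names of data1, as reported by store.info()
COLUMNS = ['str0', 'int1', 'float2', 'float3', 'str4', 'str5', 'str6']
```

With this list, resolve_columns([0, 2, 4, 6], COLUMNS) returns ['str0', 'float2', 'str4', 'str6'], the same columns printed in the last example.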


# Enjoy it!